
Use higher quality name lists #8

@cmrn

Currently the data used to decode SLKs is the top 1,000 female and male first names and the top 89,000 last names from the 1990 US Census. This works relatively well, but there are a couple of ways it could be improved:

  1. Use an Australian dataset. This would reduce errors arising from demographic differences between Australia and the US. For example, Hispanic names are more common in the US than in Australia.
  2. Sort by popularity in the year of birth. We can guess the name more accurately by ranking the possible names by their popularity in the specified birth year.
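As a rough illustration of point 2, the sketch below filters a name list down to the names consistent with the key and orders them by their count in the birth year. It assumes the SLK-581 convention of keying the given name by its 2nd and 3rd letters, and it uses a hypothetical `year → NAME → count` map; the function names and the counts are made up for illustration, not taken from the project or from the South Australian data.

```typescript
// Hypothetical data shape: counts.get(year).get(NAME) = number of births
// registered with that first name in that year (names stored upper-case).
type NameCountsByYear = Map<number, Map<string, number>>;

// Assumption: SLK-581 keys the given name by its 2nd and 3rd letters.
// Names shorter than three letters are skipped here for simplicity
// (the real padding rules for short names are not modelled).
function matchesGivenNameLetters(name: string, letters: string): boolean {
  const upper = name.toUpperCase();
  return upper.length >= 3 && upper.slice(1, 3) === letters.toUpperCase();
}

// Return the names consistent with the key letters, most popular in the
// birth year first. Names with no record in that year get a count of 0.
function rankForBirthYear(
  names: string[],
  letters: string,
  counts: NameCountsByYear,
  birthYear: number,
): string[] {
  const yearCounts = counts.get(birthYear) ?? new Map<string, number>();
  const countOf = (n: string): number => yearCounts.get(n.toUpperCase()) ?? 0;
  return names
    .filter((n) => matchesGivenNameLetters(n, letters))
    .sort((a, b) => countOf(b) - countOf(a));
}

// Example with invented counts (not real South Australian figures).
const counts: NameCountsByYear = new Map([
  [1985, new Map([["MICHAEL", 300], ["NICHOLAS", 250], ["MICAH", 5]])],
]);
const ranked = rankForBirthYear(["Micah", "Nicholas", "Michael", "Sarah"], "IC", counts, 1985);
console.log(ranked); // ["Michael", "Nicholas", "Micah"]
```

"Sarah" is dropped because its 2nd and 3rd letters are "AR", and the remaining candidates are ordered by their 1985 counts.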

The best candidate dataset to make these improvements is the Popular Baby Names dataset from South Australia.

Some things to consider:

  • How can we efficiently load this data into the web browser? I think it's important that the page, once loaded, can operate offline (for privacy-conscious users).
  • What happens if the name is not in the birth-year dataset? The popularity in other years would be better than no sorting at all.
  • The South Australian dataset only contains first names. Is there an equivalent dataset for last names?
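One possible answer to the first two questions, sketched under assumptions: the name counts could be bundled into the page's JavaScript as a compact `year,NAME,count` text block (a hypothetical format, with invented figures), so no network request is needed after the page loads. Names missing from the birth year then fall back to their total across all years, with an alphabetical tie-break. `parseBundledNames` and `rankWithFallback` are illustrative names, not existing project code.

```typescript
// Hypothetical compact format bundled with the page so it works offline:
// one "year,NAME,count" record per line. The figures are invented.
const BUNDLED_NAMES = `1985,MICHAEL,300
1985,NICHOLAS,250
1990,JAYDEN,40
1995,JAYDEN,180`;

type NameCountsByYear = Map<number, Map<string, number>>;

// Parse the bundled text into year -> NAME -> count.
function parseBundledNames(text: string): NameCountsByYear {
  const byYear: NameCountsByYear = new Map();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const [yearStr, name, countStr] = trimmed.split(",");
    const year = Number(yearStr);
    let yearCounts = byYear.get(year);
    if (yearCounts === undefined) {
      yearCounts = new Map<string, number>();
      byYear.set(year, yearCounts);
    }
    yearCounts.set(name.toUpperCase(), Number(countStr));
  }
  return byYear;
}

// Fallback popularity: each name's total across every year in the data.
function totalAcrossYears(counts: NameCountsByYear): Map<string, number> {
  const totals = new Map<string, number>();
  counts.forEach((yearCounts) => {
    yearCounts.forEach((n, name) => {
      totals.set(name, (totals.get(name) ?? 0) + n);
    });
  });
  return totals;
}

// Sort by birth-year count, then by all-years total, then alphabetically,
// so a name absent from the birth year still beats a name never seen at all.
function rankWithFallback(names: string[], counts: NameCountsByYear, birthYear: number): string[] {
  const yearCounts = counts.get(birthYear) ?? new Map<string, number>();
  const totals = totalAcrossYears(counts);
  const inYear = (n: string): number => yearCounts.get(n.toUpperCase()) ?? 0;
  const overall = (n: string): number => totals.get(n.toUpperCase()) ?? 0;
  return [...names].sort(
    (a, b) => inYear(b) - inYear(a) || overall(b) - overall(a) || a.localeCompare(b),
  );
}

const counts = parseBundledNames(BUNDLED_NAMES);
const ranked = rankWithFallback(["Zed", "Jayden", "Nicholas", "Michael"], counts, 1985);
console.log(ranked); // ["Michael", "Nicholas", "Jayden", "Zed"]
```

"Jayden" has no 1985 record but appears in other years, so it ranks ahead of "Zed", which appears nowhere. A plain text block like this also compresses well when served gzipped, though the real size of the South Australian data would need checking.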

Many thanks to @crgentle on Twitter for suggesting this!
