Building a Tool to Calculate Age from Name Preferences – Step 1/?
I’m on a quest to build a tool that guesses your age based on which names you prefer. Today I worked on gathering and processing name data from the Social Security Administration. The national-level data came as a bunch of CSV files disguised as ordinary text files.
So far, I’ve written a little Python script that reads each of these files into a Pandas DataFrame and concatenates them into one big DataFrame with all the name information:
import pandas as pd

# Inclusive year range; add 1 because range() excludes its upper bound.
start_year = int(input("Start Year: "))
end_year = 1 + int(input("End Year: "))

combined_data = pd.DataFrame()
for year in range(start_year, end_year):
    print(f"Processing {year}")
    # The files have no header row, so supply the column names here.
    year_names = pd.read_csv(f"names/yob{year}.txt", header=None, names=["name", "gender", "occurrence"])
    combined_data = pd.concat([combined_data, year_names])

print(f"COMBINED\n{combined_data}")
print(combined_data.shape)
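One thing the combined DataFrame seems to lose is the year each row came from, since the year only appears in the file name. Here is a minimal sketch of one way to keep it. The function name load_names and the extra "year" column are my own hypothetical additions, not part of the script above, and it assumes the same yob<year>.txt naming. It also collects the per-year frames in a list and concatenates once at the end, which avoids copying the growing DataFrame on every pass.

```python
from pathlib import Path

import pandas as pd


def load_names(directory, start_year, end_year):
    """Read yob<year>.txt for start_year..end_year (inclusive) into one DataFrame.

    Hypothetical helper: assumes headerless files with three columns
    (name, gender, occurrence), as in the script above.
    """
    frames = []
    for year in range(start_year, end_year + 1):
        path = Path(directory) / f"yob{year}.txt"
        frame = pd.read_csv(path, header=None, names=["name", "gender", "occurrence"])
        # The year is only in the file name, so record it as a column (assumed name).
        frame["year"] = year
        frames.append(frame)
    # A single concat at the end; raises ValueError if the year range is empty.
    return pd.concat(frames, ignore_index=True)
```

Calling load_names("names", start_year, end_year) with the directory from the script above and the inclusive years typed in would then give one row per name, gender, and year, which is the shape an age guesser would likely need.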
I’ll comment on this script later. It’s 23:59 and this blog post has to get out there!