I’m on a quest to build a tool that guesses your age based on which names you prefer. Today I worked on gathering and processing name data from the Social Security Administration. The national-level data came as a bunch of CSV files disguised as ordinary text files.

So far, I’ve written a little Python script that reads all of these files into Pandas DataFrames and concatenates them into one big DataFrame with all the name information:

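import pandas as pd

# Years are read interactively; the end year is inclusive, hence the + 1.
start_year = int(input("Start Year: "))
end_year = 1 + int(input("End Year: "))
year_frames = []

for year in range(start_year, end_year):
    print(f"Processing {year}")
    # header=None because the files are read without a header row; column names are supplied here.
    year_names = pd.read_csv(f"names/yob{year}.txt", header=None, names=["name", "gender", "occurrence"])
    year_frames.append(year_names)

# Concatenate once at the end instead of growing a DataFrame inside the loop.
combined_data = pd.concat(year_frames)

print(f"COMBINED\n{combined_data}")
print(combined_data.shape)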
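
One thing the combined DataFrame does not keep is which year each row came from, and an age guesser will presumably need that. Here is a minimal sketch of one way to hold on to it, assuming the same `yobYYYY.txt` file naming as above. The function name `load_names`, its `folder` parameter, and the extra `year` column are hypothetical names made up for this illustration; they are not part of the SSA data or of the script above.

```python
from pathlib import Path

import pandas as pd


def load_names(folder, start_year, end_year):
    """Read yob<year>.txt for start_year..end_year (inclusive) into one DataFrame."""
    frames = []
    for year in range(start_year, end_year + 1):
        path = Path(folder) / f"yob{year}.txt"
        frame = pd.read_csv(path, header=None, names=["name", "gender", "occurrence"])
        # Hypothetical extra column: remember which file (year) each row came from.
        frame["year"] = year
        frames.append(frame)
    # ignore_index=True gives the result one continuous 0..n-1 index
    # instead of repeating each file's own row numbers.
    return pd.concat(frames, ignore_index=True)
```

Calling `load_names("names", start_year, end_year)` with the two years typed at the prompt (without the `+ 1`, since the function treats the end year as inclusive) would return the same rows as the script, plus the `year` column.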

I’ll comment on this later. It’s 23:59 and this blog post has to get out there!