Building a Tool to Calculate Age from Name Preferences – Step 2/?
Today I reworked the code a bit, added the ability to choose between genders when building the name database, and generated a unique name list at the end.
#import the wonderful pandas library
import pandas as pd
#input parameters to request the right information
start_year = int(input("Start Year: "))
#quick shortcut that inputs default start and end years if zero is typed
if start_year == 0:
    start_year = 1880
    end_year = 2018
else:
    end_year = 1 + int(input("End Year: "))
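#the data files mark gender as M or F, so normalize whatever the user types
gender = input("Gender (M/F): ").strip().upper()
#create an empty dataframe to hold the final dataset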
combined_data = pd.DataFrame()
#iterate through csv files to produce dataframe of top 100 names from each year
for year in range(start_year, end_year):
    year_names = pd.read_csv(f'names/yob{year}.txt', header=None, names=['name', 'gender', 'occurrence'])
    gendered_year_names = year_names[year_names['gender'] == gender]
    #sort into a new dataframe instead of in place, which avoids pandas' SettingWithCopyWarning on a filtered slice
    gendered_year_names = gendered_year_names.sort_values(by='occurrence', ascending=False)
    gendered_year_names_100 = gendered_year_names.head(100)
    combined_data = pd.concat([combined_data, gendered_year_names_100])
#create and print a list of unique names from selected years
unique_names = combined_data['name'].unique()
print(unique_names)
print(len(unique_names))
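For anyone who wants to try the filter, top-N, and unique steps without downloading the name files, here is a minimal sketch of the same idea as a function. The function name `top_names`, the `frames_by_year` dictionary, and the sample rows are all made up for illustration; they are not part of the project. It also swaps the sort-then-head step for pandas' `nlargest` and concatenates once at the end, which is just one possible variation.

```python
import io

import pandas as pd

COLUMNS = ['name', 'gender', 'occurrence']


def top_names(frames_by_year, gender, n=100):
    """Return the unique names that rank in the top n for any year (hypothetical helper)."""
    yearly_top = []
    for year, frame in frames_by_year.items():
        # keep only the rows for the requested gender
        gendered = frame[frame['gender'] == gender]
        # nlargest picks the n rows with the highest occurrence, already sorted
        yearly_top.append(gendered.nlargest(n, 'occurrence'))
    # concatenate once at the end instead of growing a dataframe inside the loop
    combined = pd.concat(yearly_top, ignore_index=True)
    return combined['name'].unique()


# invented sample rows in the same name,gender,occurrence layout as the yob files
year_a = pd.read_csv(io.StringIO("Mary,F,30\nAnna,F,20\nEmma,F,10\nJohn,M,40\n"),
                     header=None, names=COLUMNS)
year_b = pd.read_csv(io.StringIO("Anna,F,25\nRuth,F,15\nMary,F,5\nJohn,M,50\n"),
                     header=None, names=COLUMNS)

result = top_names({1: year_a, 2: year_b}, gender='F', n=2)
print(list(result))
print(len(result))
```

With real data, the dictionary could be filled by calling `pd.read_csv` on each `names/yob{year}.txt` file, the same way the loop above does.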
I also commented my code like a decent human being. Look for this project on GitHub soon.