Building a Tool to Calculate Age from Name Preferences – Step 2/?

Today I reworked the code a bit, added the ability to choose between genders when building the name database, and generated a unique name list at the end.

#import the wonderful pandas library
import pandas as pd


#input parameters to request the right information
start_year = int(input("Start Year: "))
#quick shortcut that inputs default start and end years if zero is typed
if start_year == 0:
    start_year = 1880
    end_year = 2018
else:
    end_year = 1 + int(input("End Year: "))
#the data files label gender as M or F, so normalize whatever is typed
gender = input("Gender (M or F): ").strip().upper()


#collect each year's top 100 names here, then combine them once after the loop
yearly_top_names = []


#iterate through csv files to collect the top 100 names from each year
for year in range(start_year, end_year):
    year_names = pd.read_csv(f'names/yob{year}.txt', header=None, names=['name', 'gender', 'occurrence'])
    #sort a filtered copy instead of sorting in place, which can trigger pandas' SettingWithCopyWarning
    gendered_year_names = year_names[year_names['gender'] == gender].sort_values(by='occurrence', ascending=False)
    yearly_top_names.append(gendered_year_names.head(100))


#combine the yearly results into the final dataset
combined_data = pd.concat(yearly_top_names)


#create and print a list of unique names from selected years
unique_names = combined_data['name'].unique()
print(unique_names)
print(len(unique_names))

The unique names from each year's top 100 male names, 1880 to 2017
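
For anyone who wants to try the filter, sort, and combine steps without downloading the name files, here is a minimal sketch on made-up data. The helper name `top_names_for_gender`, the two tiny sample years, and all the counts are assumptions for illustration only, not real figures and not part of the script above.

```python
import pandas as pd


def top_names_for_gender(year_names, gender, n=100):
    """Return the n most frequent names of one gender from a single year's data."""
    matches = year_names[year_names["gender"] == gender]
    # sort_values returns a new DataFrame, so nothing is modified in place
    return matches.sort_values(by="occurrence", ascending=False).head(n)


# made-up sample data standing in for two yearly files (counts are invented)
sample_years = {
    1880: pd.DataFrame({
        "name": ["John", "Mary", "William", "James"],
        "gender": ["M", "F", "M", "M"],
        "occurrence": [90, 70, 80, 50],
    }),
    1881: pd.DataFrame({
        "name": ["John", "James", "William", "Anna"],
        "gender": ["M", "M", "M", "F"],
        "occurrence": [85, 82, 60, 40],
    }),
}

# keep only the top 2 male names per year so the effect is visible on tiny data
yearly_top = [top_names_for_gender(frame, "M", n=2) for frame in sample_years.values()]
combined_sample = pd.concat(yearly_top, ignore_index=True)

# unique() keeps names in order of first appearance
unique_sample_names = combined_sample["name"].unique()
print(unique_sample_names)
print(len(unique_sample_names))
```

John makes the top 2 in both sample years but is only counted once, which is the same reason the real list is much shorter than 100 names times the number of years.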

I also commented my code like a decent human being. Look for this project on GitHub soon.