-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code Review for Project #1
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall comments:
- Good division of work into bite-sized functions
- Unfortunately, many of the functions are single-purpose because they hard-code application logic in them. Think about whether you can factor things like the names of the directories out to make more generally useful functions that you could reuse elsewhere
Most of the things I comment on below are stylistic or for greater intelligibility of the code. However the use of \
for directory separators means that this code doesn't work on non-Windows OSes (and fails in a way that gives the runner very little information about what went wrong)
|
||
def verify_dirs_exist(): | ||
#This notebook requires a few directories | ||
dirs = ["download", "download\csv", "download\excel"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these directories intended to be literally named download\csv
and download\excel
? Or are csv
and excel
intended to be subdirectories under download
? On non-Windows OSes, \
is not the directory separator character, so this does the former. You can use os.path.join('download', 'csv')
to work cross-platform.
If the excel
and csv
directories are meant to be subdirectories of download
, you don't need to make download
separately. os.makedirs
operates recursively.
full_path = os.path.join(curpath, d) # join cwd with proposed d | ||
create_dir_if_not_exists(full_path) | ||
|
||
def create_dir_if_not_exists(full_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.makedirs
already does the file existence check, essentially.
try:
os.makedirs(full_path)
print(f"Created directory {full_path}")
except FileExistsError:
print(f"Directory {full_path} already existed")
The try/except version is slightly preferable because in the if version, there's a tiny chance that someone could create the directory between when you do the if and when you do the makedirs.
import glob | ||
import datetime as dt | ||
|
||
def verify_dirs_exist(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming of this function is a bit misleading. It doesn't just verify that they exist, it actually creates them if they don't exist. I think it should be called ensure_dirs_exist
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also make this function take in the list of directories to create, so that it's re-usable.
else: | ||
print("Directory ", full_path, " already existed") | ||
|
||
def generate_file_name_from_url(url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment with a sample URL and the file name you're generating from it would make this much easier to understand.
print("Directory ", full_path, " already existed") | ||
|
||
def generate_file_name_from_url(url): | ||
month_year = url.split("\\")[-1].split("_")[-1].split(".")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does url.split("\\")
do anything for these URLs? It looks like a sample url is https://www.huduser.gov/portal/datasets/usps/ZIP_COUNTY_032010.xlsx
, which doesn't contain the \
character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the (rare?) cases where a regular expression might make things clearer...
m = re.search('ZIP_COUNTY_(\d\d)(\d\d\d\d)\.xlsx', url)
month, year = m.groups()
#open and get the excel dataframe | ||
excel_df = process_excel_file(full_file_path) | ||
#merge the excel file with the fips data | ||
merged_df = fips_df.merge(excel_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if this function did not refer to fits_df
, which is defined outside of the function. That should be passed in as an argument or defined within the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same is true for cur_year
print(csv_path) | ||
try: | ||
merged_df.to_csv(csv_path, encoding='utf-8', index=False) | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This except
is too broad. When I first tried to run this, the code did nothing because it was trying to write to a directory that didn't exist. It didn't show me the error because this clause swallowed it. If you're expecting a particular error, you should catch it as specifically as possible.
merged_df.to_csv(csv_path, encoding='utf-8', index=False) | ||
except: | ||
#once we get to a Q that hasn't happened yet, we'll get an XLDRerror | ||
print("Operation has completed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which operation has completed? This message could be more descriptive. "All files have been downloaded and processed"?
try: | ||
merged_df.to_csv(csv_path, encoding='utf-8', index=False) | ||
except: | ||
#once we get to a Q that hasn't happened yet, we'll get an XLDRerror |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more straightforward way to detect this? Check to see if the input is empty? Check if year > now.year or year == now.year and month > now.month
?
import os | ||
import time | ||
import pandas as pd | ||
import glob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you use glob
in this code.
Oh, one thing I forgot to mention. Your requirements.txt needs |
WonderingStar to perform code review & LearningSomethingNew to update code on feedback.