Welcome to the Distance Data codebase! It provides tools for analyzing and visualizing distance data between zip codes, along with data preprocessing and manipulation utilities. It is designed to work with CSV files containing distance data and a zip code database.
The main features of the Distance Data codebase include:
- Reading and writing distance data to/from CSV files.
- Plotting histograms of distance data.
- Filtering distance data based on desired state.
- Computing patient/dispensary ratios and distance*patient values.
- Adding latitude and longitude information to zip codes in dispensary and patient data.
To use the Distance Data codebase, follow these steps:
- Clone the repository:
    git clone https://github.com/DoctorGoose/PAMJ.git
- Navigate to the codebase directory:
    cd PAMJ
- Install the required dependencies (zipfile is part of the Python standard library and does not need to be installed):
    pip install pandas numpy matplotlib geopandas
To read distance data from a CSV file, use the following code:
import pandas as pd
df = pd.read_csv('Distance Data.csv')
To write distance data to a CSV file, use the following code:
df.to_csv('Distance Data.csv', index=False)
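Zip codes such as 02134 lose their leading zero when pandas infers an integer column. The sketch below shows one way to keep them as text on both read and write. It assumes the zip code column is named `Zipcode` (the name used in the dispensary and patient files; the column in `Distance Data.csv` may be named differently) and uses an in-memory stand-in for the CSV file.

```python
import io

import pandas as pd

# Stand-in for 'Distance Data.csv'; the column names and values are assumptions.
csv_text = "Zipcode,Nearest Disp Distance\n02134,3.5\n19104,1.2\n"

# dtype=str on the zip code column keeps leading zeros intact.
df_demo = pd.read_csv(io.StringIO(csv_text), dtype={"Zipcode": str})

# Write back to CSV text and read it again the same way.
round_trip = pd.read_csv(
    io.StringIO(df_demo.to_csv(index=False)), dtype={"Zipcode": str}
)
```

With a real file, pass the path instead of the `io.StringIO` object and keep the same `dtype` argument.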
To plot a histogram of the "Nearest Disp Distance" column in the distance data, use the following code:
import matplotlib.pyplot as plt
plt.hist(df['Nearest Disp Distance'])
plt.show()
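For a histogram that is easier to read and can be saved to disk, something like the following may help. It is a sketch on made-up distances; the bin count, axis labels, and output filename are illustrative choices, not part of the codebase.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative distances; in practice use df['Nearest Disp Distance'].
distances = pd.Series(
    [0.5, 1.2, 3.4, 7.8, 12.0, 2.2, 5.1], name="Nearest Disp Distance"
)

fig, ax = plt.subplots()
# Drop missing values first, then draw five equal-width bins.
counts, bin_edges, _ = ax.hist(distances.dropna(), bins=5, edgecolor="black")
ax.set_xlabel("Distance to nearest dispensary")
ax.set_ylabel("Number of zip codes")
ax.set_title("Nearest dispensary distance")
fig.savefig("nearest_disp_distance_hist.png", dpi=150)
```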
To filter the distance data to include only zip codes from a desired state (e.g., Pennsylvania), use the following code:
desired_state = 'PA'
filtered_df = df[df['state'] == desired_state]
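The equality filter above matches only exact values. If the `state` column contains inconsistent casing or stray whitespace, or if several states are needed at once, a variant along these lines could be used. The sample rows are invented for illustration, and the sketch assumes `state` holds two-letter abbreviations.

```python
import pandas as pd

# Illustrative rows; only the 'state' column name comes from the filter above.
df_demo = pd.DataFrame({
    "Zipcode": ["19104", "08540", "15213", "10001"],
    "state": ["PA", "nj", " pa", "NY"],
})

desired_states = ["PA", "NJ"]

# Normalize whitespace and case before comparing, then keep rows in any desired state.
state_clean = df_demo["state"].str.strip().str.upper()
filtered_demo = df_demo[state_clean.isin(desired_states)].copy()
```

The `.copy()` avoids pandas warnings if new columns are later added to the filtered frame.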
To compute the patient/dispensary ratio and distance*patient values, use the following code:
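# Avoid division by zero: with no dispensaries, fall back to the raw patient count.
df['Patient/Dispensary Ratio'] = df.apply(lambda row: row['Patient Count'] if row['Dispensary Count'] == 0 else row['Patient Count']/row['Dispensary Count'], axis=1)
# Weight each zip code's nearest-dispensary distance by its patient count.
df['Distance*Patient'] = df['Nearest Disp Distance'] * df['Patient Count']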
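The row-wise `apply` above can be slow on large files. A vectorized sketch that should give the same values (patient count divided by dispensary count, or the patient count itself where there are no dispensaries) is shown below on made-up numbers.

```python
import pandas as pd

# Made-up counts and distances for illustration.
df_demo = pd.DataFrame({
    "Patient Count": [10, 4, 6],
    "Dispensary Count": [2, 0, 3],
    "Nearest Disp Distance": [1.5, 8.0, 0.5],
})

has_dispensary = df_demo["Dispensary Count"] != 0
# Replace zero counts with 1 so the division leaves Patient Count unchanged there.
safe_divisor = df_demo["Dispensary Count"].where(has_dispensary, 1)

df_demo["Patient/Dispensary Ratio"] = df_demo["Patient Count"] / safe_divisor
df_demo["Distance*Patient"] = (
    df_demo["Nearest Disp Distance"] * df_demo["Patient Count"]
)
```

For the three sample rows the ratios are 5.0, 4.0, and 2.0, and the distance*patient values are 15.0, 32.0, and 3.0.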
To load the dispensary and patient data with zip code latitude and longitude information and count the dispensaries and patients in each zip code, use the following code:
df_dispo = pd.read_csv('DispoZipLatLong.csv')
df_pat = pd.read_csv('PatientZipLatLong.csv')
zipcodes_dispo = df_dispo['Zipcode'].unique()
zipcodes_pat = df_pat['Zipcode'].unique()
zipcode_counts_dispo = df_dispo['Zipcode'].value_counts()
zipcode_counts_pat = df_pat['Zipcode'].value_counts()
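# Align the two counts on zip code; a zip code missing from one file gets a count of 0.
zipcode_counts_combined = pd.concat([zipcode_counts_dispo, zipcode_counts_pat], axis=1)
zipcode_counts_combined.columns = ['Dispensary Count', 'Patient Count']
zipcode_counts_combined = zipcode_counts_combined.fillna(0).astype(int)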
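If coordinates need to be attached from a separate zip code database, a left merge on the zip code is one way to do it. This is a minimal sketch: the `zip_db` table, its `Latitude` and `Longitude` column names, and the approximate coordinates are hypothetical and only illustrate the shape of the operation.

```python
import pandas as pd

# Hypothetical zip code database; column names and coordinates are illustrative only.
zip_db = pd.DataFrame({
    "Zipcode": ["19104", "15213"],
    "Latitude": [39.96, 40.44],
    "Longitude": [-75.20, -79.95],
})

# Stand-in for patient or dispensary rows keyed by zip code.
df_pat_demo = pd.DataFrame({"Zipcode": ["19104", "15213", "19104", "99999"]})

# Left merge keeps every patient row; zip codes missing from the database get NaN.
# validate raises an error if the database lists a zip code more than once.
df_pat_geo = df_pat_demo.merge(
    zip_db, on="Zipcode", how="left", validate="many_to_one"
)

# Zip codes that could not be matched, for follow-up.
unmatched = df_pat_geo.loc[df_pat_geo["Latitude"].isna(), "Zipcode"].tolist()
```

Both tables should store zip codes with the same type (ideally strings), otherwise the merge will silently match nothing.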
The Distance Data codebase is maintained by DoctorGoose.
Contributions to the Distance Data codebase are welcome! If you encounter any issues or have suggestions for improvements, please open an issue on GitHub.
The Distance Data codebase is licensed under the MIT License.