Fix column naming in process_df for Pandas compatibility (#215)
This commit adjusts the process_df function in citibike.py so that the count column has the same name across Pandas versions. Newer Pandas versions name the column produced by 'value_counts' as 'count', so the existing rename of column 0 to 'users_count' no longer matches it. The fix replaces groupby(...).value_counts().reset_index() with groupby(...).size().reset_index(name='users_count'), which sets the column name to 'users_count' explicitly regardless of the Pandas version.
carlosfab authored Nov 21, 2023
1 parent 0bdbd60 commit 9718827
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions advanced_tutorials/citibike/features/citibike.py
@@ -62,8 +62,9 @@ def process_df(original_df, month, year):
     df_res = original_df[["started_at", "start_station_id"]]
     df_res.started_at = pd.to_datetime(df_res.started_at)
     df_res.started_at = df_res.started_at.dt.floor('d')
-    df_res = df_res.groupby(["started_at",
-                             "start_station_id"]).value_counts().reset_index()
+    df_res = (df_res.groupby(['started_at', 'start_station_id'])
+              .size()
+              .reset_index(name='users_count'))
     df_res = df_res.rename(columns={"started_at": "date",
                                     "start_station_id": "station_id",
                                     0: "users_count"})
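
The change in this hunk can be tried in isolation. The following is a minimal sketch, not the repository's actual function: the sample rides and the helper name `count_rides_per_station_day` are hypothetical, and the `month` and `year` parameters of `process_df` are omitted because they do not appear in the changed lines. It shows that `groupby(...).size().reset_index(name=...)` fixes the count column's name explicitly, so the result should not depend on how a given Pandas version names `value_counts` output.

```python
import pandas as pd

# Hypothetical sample rides; the real input comes from the Citi Bike trip data.
rides = pd.DataFrame({
    "started_at": [
        "2023-04-01 08:15:00",
        "2023-04-01 17:40:00",
        "2023-04-01 09:05:00",
        "2023-04-02 12:30:00",
    ],
    "start_station_id": ["A1", "A1", "B2", "A1"],
})


def count_rides_per_station_day(original_df):
    """Count rides per (day, start station), mirroring the patched lines."""
    # Work on a copy so the column assignment below does not touch the input.
    df_res = original_df[["started_at", "start_station_id"]].copy()
    # Truncate each timestamp to its calendar day.
    df_res["started_at"] = pd.to_datetime(df_res["started_at"]).dt.floor("d")
    # size() returns one row count per group; reset_index(name=...) names
    # that count column explicitly instead of relying on a default name.
    df_res = (df_res.groupby(["started_at", "start_station_id"])
              .size()
              .reset_index(name="users_count"))
    return df_res.rename(columns={"started_at": "date",
                                  "start_station_id": "station_id"})


daily_counts = count_rides_per_station_day(rides)
print(daily_counts)
```

With the column named at `reset_index`, the `0: "users_count"` entry left in the `rename` mapping above no longer has a column to match. By default, `rename` ignores mapping keys that are absent, so it should be harmless here.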