cal-itp · shweta487 · Jan 13, 2025 · Jan 2, 2025 · Jan 2, 2025 · Jan 13, 2025
@@ -28,7 +28,7 @@ Below is a series of tips, tricks and use-cases for managing data throughout the
 
 ### GCS
 
-Our team often uses Google Cloud Storage (GCS) for object storage. If you haven't set up your Google authentication, go [here](https://docs.calitp.org/data-infra/analytics_tools/notebooks.html#connecting-to-warehouse) for the instructions. For a walkthrough on how to use GCS buckets, go [here](https://docs.calitp.org/data-infra/analytics_tools/storing_data.html#in-gcs).
+Our team often uses Google Cloud Storage (GCS) for object storage. If you haven't set up your Google authentication, go [here](https://docs.calitp.org/data-infra/analytics_tools/jupyterhub.html#connecting-to-the-warehouse) for the instructions. For a walkthrough on how to use GCS buckets, go [here](https://docs.calitp.org/data-infra/analytics_tools/storing_data.html#in-gcs).
 
 By putting data on GCS, anybody on the team can use/access/replicate the data without having to transfer data files between machines.
 

@@ -20,12 +20,14 @@ This section is geared towards data analysts who are new to Python. The followin
 - If you are new to Python, take a look at [all the Python tutorials](https://www.linkedin.com/learning/search?keywords=python&u=36029164) available through Caltrans. There are many introductory Python courses [such as this one.](https://www.linkedin.com/learning/python-essential-training-18764650/getting-started-with-python?autoplay=true&u=36029164)
 - [Joris van den Bossche's Geopandas Tutorial](https://github.com/jorisvandenbossche/geopandas-tutorial)
 - [Practical Python for Data Science by Jill Cates](https://www.practicalpythonfordatascience.com/intro.html)
+- [General pandas Functions](https://pandas.pydata.org/pandas-docs/stable/reference/general_functions.html)
 - [Ben-Gurion University of the Negev - Geometric operations](https://geobgu.xyz/py/geopandas2.html)
 - [Geographic Thinking for Data Scientists](https://geographicdata.science/book/notebooks/01_geo_thinking.html)
 - [PyGIS Geospatial Tutorials](https://pygis.io/docs/a_intro.html)
 - [Python Courses, compiled by our team](https://docs.google.com/spreadsheets/d/1Omow8F0SUiMx1jyG7GpbwnnJ5yWqlLeMH7SMtKxwG80/edit?usp=sharing)
 - [Why Dask?](https://docs.dask.org/en/stable/why.html)
 - [10 Minutes to Dask](https://docs.dask.org/en/stable/10-minutes-to-dask.html)
+- [Jupyter Notebook Tutorial](https://www.youtube.com/watch?v=LW2Rye_l8L0)
 
 ### Books
 

@@ -60,7 +60,6 @@ Here are some resources data analysts have collected and referenced, that will h
 - When working with data sets where the "merge on" column is a string data type, it can be difficult to get the DataFrames to join. For example, df1 lists <i>County of Sonoma, Human Services Department, Adult and Aging Division</i>, but df2 references the same department as: <i>County of Sonoma (Human Services Department) </i>.
   - Potential Solution #1: [fill in a column in one DataFrame that has a partial match with the string values in another one.](https://stackoverflow.com/questions/61811137/based-on-partial-string-match-fill-one-data-frame-column-from-another-dataframe)
   - Potential Solution #2: [use the package fuzzymatcher. This will require you to carefully comb through for any bad matches.](https://pbpython.com/record-linking.html)
-  - Potential Solution #3: [if you don't have too many values, use a dictionary.](https://github.com/cal-itp/data-analyses/blob/main/drmt_grants/TIRCP_functions.py#:~:text=%23%23%23%20RECIPIENTS%20%23%23%23,%7D)
 
 (dates)=
 

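A minimal sketch of the partial-match idea from the string-merge hunk above, using only the standard library rather than `fuzzymatcher`. It treats two names as a match when every word of the shorter name appears in the longer one. The helper names (`name_tokens`, `partial_match`) and the "County of Marin" row are hypothetical, and any matches would still need a manual check before merging.

```python
import re
from typing import FrozenSet, Iterable, Optional


def name_tokens(name: str) -> FrozenSet[str]:
    """Lowercase a name and split it into words, ignoring punctuation."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))


def partial_match(name: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose words all appear in `name`, else None."""
    target = name_tokens(name)
    for candidate in candidates:
        tokens = name_tokens(candidate)
        # A candidate matches when its word set is a subset of the target's.
        if tokens and tokens <= target:
            return candidate
    return None


# Example values: the Sonoma strings come from the docs text; Marin is made up.
df1_name = "County of Sonoma, Human Services Department, Adult and Aging Division"
df2_names = [
    "County of Marin (Human Services Department)",
    "County of Sonoma (Human Services Department)",
]
matched = partial_match(df1_name, df2_names)
```

The matched value could then be written to a helper column in one DataFrame and used as the merge key.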
@@ -46,7 +46,7 @@ In order to save data being used in a report, you can use two methods:
 
 Watch the screencast below and read the additional information to begin.
 
-**Note**: To access Google Cloud Storage you will need to have set up your Google authentication. If you have yet to do so, [follow these instructions](https://docs.calitp.org/data-infra/analytics_tools/notebooks.html#connecting-to-warehouse).
+**Note**: To access Google Cloud Storage you will need to have set up your Google authentication. If you have yet to do so, [follow these instructions](https://docs.calitp.org/data-infra/analytics_tools/jupyterhub.html#connecting-to-the-warehouse).
 
 (storing-new-data-screencast)=
 

@@ -101,15 +101,21 @@ jupyter nbconvert --to html --no-input --no-prompt my_notebook.ipynb
 weasyprint my_notebook.html my_notebook.pdf
 ```
 
-- There are assignments that require you to rerun the same notebook for different values and save each of these new notebooks in PDF format. This  essentially combines parameterization principles using `papermill`  with the `weasyprint` steps above. You can reference the code that was used to generate the CSIS scorecards [here](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/run_papermill.py). This script iterates over [this notebook](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/sb1_scorecard.ipynb) to produce 50+ PDF files for each of the nominated projects.
+- There are assignments that require you to rerun the same notebook for different values and save each of these new notebooks in PDF format. This essentially combines parameterization principles using papermill with the weasyprint steps above. You can reference the code that was used to generate the CSIS scorecards [here](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/_make_scorecard.py). This script iterates over [this notebook](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/08_csis_scorecard.ipynb) to produce PDF files for each of the nominated projects found [here](<https://console.cloud.google.com/storage/browser/calitp-analytics-data/data-analyses/general_csis/scorecards?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&project=cal-itp-data-infra>).
 
-  Briefly, the script above does the following:
+Note: Viewers may need access to the private CSIS repository.
 
-  - Automates the naming of the new PDF files by taking away punctuation that isn't allowed.
-  - Saves the notebook as html files.
-  - Converts the html files to PDF.
-  - Saves each PDF to the folder (organized by district) to our GCS.
-  - Deletes irrelevant files.
+Briefly, the script above does the following:
+
+- Automates the naming of the new PDF files by taking away punctuation that isn't allowed.
+
+- Saves the notebook as html files.
+
+- Converts the html files to PDF.
+
+- Saves each PDF to the folder (organized by district) to our GCS.
+
+- Deletes irrelevant files.
 
 - Here are some tips and tricks when converting notebooks to HTML before PDF conversions.
 

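The first three steps listed in the hunk above (clean the file name, save as HTML, convert to PDF) could be sketched roughly as follows. This is not the CSIS script itself, which may be private: the notebook name, parameter name, project name, and output folder are all hypothetical, and the GCS upload and cleanup steps are left out. The sketch only builds the command lines, so it runs without `papermill`, `nbconvert`, or `weasyprint` installed.

```python
import re
from typing import List


def clean_filename(name: str) -> str:
    """Remove punctuation that is awkward in file names; spaces become underscores."""
    no_punct = re.sub(r"[^\w\s-]", "", name)
    return re.sub(r"\s+", "_", no_punct.strip())


def build_commands(notebook: str, param_name: str, value: str, out_dir: str) -> List[List[str]]:
    """Build the papermill, nbconvert, and weasyprint commands for one parameter value."""
    stem = clean_filename(value)
    executed = f"{out_dir}/{stem}.ipynb"
    html = f"{out_dir}/{stem}.html"
    pdf = f"{out_dir}/{stem}.pdf"
    return [
        # Run the notebook once with this parameter value.
        ["papermill", notebook, executed, "-p", param_name, value],
        # Same nbconvert flags as the docs, plus an explicit output directory.
        ["jupyter", "nbconvert", "--to", "html", "--no-input", "--no-prompt",
         "--output-dir", out_dir, executed],
        # Convert the HTML file to PDF.
        ["weasyprint", html, pdf],
    ]


# Hypothetical inputs for illustration only.
commands = build_commands(
    "scorecard.ipynb", "project_name", "SR-99: Fresno (Phase 2)", "out"
)
```

To actually produce the files, each command could be passed to `subprocess.run(cmd, check=True)` inside a loop over the parameter values.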
@@ -210,7 +210,7 @@ build_my_reports:
     git add portfolio/sites/my_report.yml
 ```
 
-### Delete Portfolio/ Refresh Index Page
+### Redeploying Portfolio/ Refresh Index Page
 
 When redeploying your portfolio with new content and there’s an old version with existing files or content on your portfolio site or in your local environment, it’s important to clean up the old files before adding new content.