Curriculum Docs Update #3613

Merged: 4 commits merged on Jan 13, 2025

2 changes: 1 addition & 1 deletion docs/analytics_new_analysts/03-data-management.md
@@ -28,7 +28,7 @@ Below is a series of tips, tricks and use-cases for managing data throughout the

### GCS

Our team often uses Google Cloud Storage (GCS) for object storage. If you haven't set up your Google authentication, go [here](https://docs.calitp.org/data-infra/analytics_tools/notebooks.html#connecting-to-warehouse) for the instructions. For a walkthrough on how to use GCS buckets, go [here](https://docs.calitp.org/data-infra/analytics_tools/storing_data.html#in-gcs).
Our team often uses Google Cloud Storage (GCS) for object storage. If you haven't set up your Google authentication, go [here](https://docs.calitp.org/data-infra/analytics_tools/jupyterhub.html#connecting-to-the-warehouse) for the instructions. For a walkthrough on how to use GCS buckets, go [here](https://docs.calitp.org/data-infra/analytics_tools/storing_data.html#in-gcs).

By putting data on GCS, anybody on the team can use/access/replicate the data without having to transfer data files between machines.

2 changes: 2 additions & 0 deletions docs/analytics_new_analysts/overview.md
@@ -20,12 +20,14 @@ This section is geared towards data analysts who are new to Python. The followin
- If you are new to Python, take a look at [all the Python tutorials](https://www.linkedin.com/learning/search?keywords=python&u=36029164) available through Caltrans. There are many introductory Python courses [such as this one.](https://www.linkedin.com/learning/python-essential-training-18764650/getting-started-with-python?autoplay=true&u=36029164)
- [Joris van den Bossche's Geopandas Tutorial](https://github.com/jorisvandenbossche/geopandas-tutorial)
- [Practical Python for Data Science by Jill Cates](https://www.practicalpythonfordatascience.com/intro.html)
- [pandas General Functions](https://pandas.pydata.org/pandas-docs/stable/reference/general_functions.html)
- [Ben-Gurion University of the Negev - Geometric operations](https://geobgu.xyz/py/geopandas2.html)
- [Geographic Thinking for Data Scientists](https://geographicdata.science/book/notebooks/01_geo_thinking.html)
- [PyGIS Geospatial Tutorials](https://pygis.io/docs/a_intro.html)
- [Python Courses, compiled by our team](https://docs.google.com/spreadsheets/d/1Omow8F0SUiMx1jyG7GpbwnnJ5yWqlLeMH7SMtKxwG80/edit?usp=sharing)
- [Why Dask?](https://docs.dask.org/en/stable/why.html)
- [10 Minutes to Dask](https://docs.dask.org/en/stable/10-minutes-to-dask.html)
- [Jupyter Notebook Tutorial](https://www.youtube.com/watch?v=LW2Rye_l8L0)

### Books

1 change: 0 additions & 1 deletion docs/analytics_tools/knowledge_sharing.md
@@ -60,7 +60,6 @@ Here are some resources data analysts have collected and referenced, that will h
- When working with datasets where the "merge on" column is a string, it can be difficult to get the DataFrames to join. For example, df1 lists <i>County of Sonoma, Human Services Department, Adult and Aging Division</i>, but df2 references the same department as <i>County of Sonoma (Human Services Department)</i>.
- Potential Solution #1: [fill in a column in one DataFrame that has a partial match with the string values in another one.](https://stackoverflow.com/questions/61811137/based-on-partial-string-match-fill-one-data-frame-column-from-another-dataframe)
- Potential Solution #2: [use the package fuzzymatcher. This will require you to carefully comb through for any bad matches.](https://pbpython.com/record-linking.html)
- Potential Solution #3: [if you don't have too many values, use a dictionary.](https://github.com/cal-itp/data-analyses/blob/main/drmt_grants/TIRCP_functions.py#:~:text=%23%23%23%20RECIPIENTS%20%23%23%23,%7D)
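
As a rough illustration of Solutions #2 and #3 together, the sketch below uses Python's standard-library `difflib` (swapped in here in place of fuzzymatcher) to propose a best match for each name, then stores the results in a dictionary you can review and correct by hand. The helper names `normalize` and `build_crosswalk` and the `0.6` cutoff are assumptions for illustration, not part of any existing team code.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase a name, turn punctuation into spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def build_crosswalk(left_names, right_names, cutoff=0.6):
    """Map each name in left_names to its closest match in right_names.

    Names with no candidate scoring at least `cutoff` map to None,
    so they can be reviewed and filled in by hand.
    """
    # Keep the original spelling so the crosswalk returns real df2 values.
    lookup = {normalize(name): name for name in right_names}
    crosswalk = {}
    for name in left_names:
        matches = difflib.get_close_matches(
            normalize(name), list(lookup), n=1, cutoff=cutoff
        )
        crosswalk[name] = lookup[matches[0]] if matches else None
    return crosswalk


df1_names = ["County of Sonoma, Human Services Department, Adult and Aging Division"]
df2_names = [
    "County of Sonoma (Human Services Department)",
    "County of Marin (Public Works)",
]
crosswalk = build_crosswalk(df1_names, df2_names)
for source, match in crosswalk.items():
    print(f"{source!r} -> {match!r}")
```

In pandas, a dictionary like this can be applied with `df1["name"].map(crosswalk)` to create a column that matches df2 exactly before merging. As with fuzzymatcher, check every proposed match before relying on it.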

(dates)=

2 changes: 1 addition & 1 deletion docs/analytics_tools/storing_data.md
@@ -46,7 +46,7 @@ In order to save data being used in a report, you can use two methods:

Watch the screencast below and read the additional information to begin.

**Note**: To access Google Cloud Storage you will need to have set up your Google authentication. If you have yet to do so, [follow these instructions](https://docs.calitp.org/data-infra/analytics_tools/notebooks.html#connecting-to-warehouse).
**Note**: To access Google Cloud Storage you will need to have set up your Google authentication. If you have yet to do so, [follow these instructions](https://docs.calitp.org/data-infra/analytics_tools/jupyterhub.html#connecting-to-the-warehouse).

(storing-new-data-screencast)=

20 changes: 13 additions & 7 deletions docs/publishing/sections/2_static_files.md
@@ -101,15 +101,21 @@ jupyter nbconvert --to html --no-input --no-prompt my_notebook.ipynb
weasyprint my_notebook.html my_notebook.pdf
```

- There are assignments that require you to rerun the same notebook for different values and save each of these new notebooks in PDF format. This essentially combines parameterization principles using `papermill` with the `weasyprint` steps above. You can reference the code that was used to generate the CSIS scorecards [here](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/run_papermill.py). This script iterates over [this notebook](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/sb1_scorecard.ipynb) to produce 50+ PDF files for each of the nominated projects.
- Some assignments require you to rerun the same notebook for different values and save each resulting notebook as a PDF. This combines notebook parameterization using papermill with the weasyprint steps above. You can reference the code that was used to generate the CSIS scorecards [here](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/_make_scorecard.py). This script iterates over [this notebook](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/08_csis_scorecard.ipynb) to produce a PDF file for each nominated project; the resulting PDFs are stored [here](<https://console.cloud.google.com/storage/browser/calitp-analytics-data/data-analyses/general_csis/scorecards?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&project=cal-itp-data-infra>).
Review comment (Member): Add a note that a viewer may need access to the CSIS repo, which is private.

Note: Viewers may need access to the private CSIS repository to open these links.

Briefly, the script above does the following:

- Automates the naming of the new PDF files by removing punctuation that isn't allowed in file names.

- Saves each notebook as an HTML file.

- Converts the HTML files to PDF.

- Saves each PDF to its district folder in our GCS bucket.

- Deletes files that are no longer needed.

- Here are some tips and tricks for converting notebooks to HTML before converting them to PDF.
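
The CSIS script referenced above lives in a private repository, so the following is not a copy of it. It is a minimal sketch of the steps listed above, assuming one template notebook with a cell tagged `parameters` and a hypothetical `project_name` parameter; the helper names `clean_filename`, `conversion_commands`, and `make_pdf` are also hypothetical. It reuses the `jupyter nbconvert` and `weasyprint` commands from this page and papermill's `execute_notebook` function.

```python
import re
import subprocess
from pathlib import Path


def clean_filename(name: str) -> str:
    """Remove punctuation that is awkward in file names and join words with underscores."""
    name = re.sub(r"[^\w\s-]", "", name)  # drops characters such as / , : ( ) &
    return re.sub(r"\s+", "_", name.strip())


def conversion_commands(notebook: Path) -> list:
    """Commands that convert an executed notebook to HTML (code hidden), then to PDF."""
    html = notebook.with_suffix(".html")
    pdf = notebook.with_suffix(".pdf")
    return [
        # nbconvert writes the HTML next to the notebook by default.
        ["jupyter", "nbconvert", "--to", "html", "--no-input", "--no-prompt", str(notebook)],
        ["weasyprint", str(html), str(pdf)],
    ]


def make_pdf(template: str, project_name: str, out_dir: Path) -> Path:
    """Run the template notebook for one project and keep only the resulting PDF."""
    import papermill as pm  # imported here so the helpers above work without papermill

    out_dir.mkdir(parents=True, exist_ok=True)
    notebook = out_dir / f"{clean_filename(project_name)}.ipynb"
    # Hypothetical parameter name; the template needs a cell tagged "parameters".
    pm.execute_notebook(template, str(notebook), parameters={"project_name": project_name})
    for cmd in conversion_commands(notebook):
        subprocess.run(cmd, check=True)
    # Delete the intermediate notebook and HTML files, keeping only the PDF.
    notebook.unlink()
    notebook.with_suffix(".html").unlink()
    return notebook.with_suffix(".pdf")
```

A driver loop would call `make_pdf` once per project (for example, writing into one folder per district) and then upload each PDF to GCS, a step this sketch leaves out.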

2 changes: 1 addition & 1 deletion docs/publishing/sections/5_analytics_portfolio_site.md
@@ -210,7 +210,7 @@ build_my_reports:
git add portfolio/sites/my_report.yml
```
### Delete Portfolio/ Refresh Index Page
### Redeploying Portfolio/ Refresh Index Page
When you redeploy your portfolio with new content and an old version still has files or content on your portfolio site or in your local environment, clean up those old files before adding the new content.