feat(lesson): activity 1 - clean code, pep8, pseudocode
lwasser authored Oct 11, 2024
2 parents d704748 + 0adc044 commit 54cc454
Showing 4 changed files with 152 additions and 98 deletions.
4 changes: 0 additions & 4 deletions README.md
@@ -1,15 +1,11 @@
# <img src="https://www.pyopensci.org/images/logo.png" width=100 /> pyOpenSci Lessons
<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
[![All Contributors](https://img.shields.io/badge/all_contributors-3-orange.svg?style=flat-square)](#contributors-)
<!-- ALL-CONTRIBUTORS-BADGE:END -->

[![All Contributors](https://img.shields.io/github/all-contributors/pyOpenSci/lessons?color=ee8449)](#contributors-)

![GitHub release (latest by date)](https://img.shields.io/github/v/release/pyopensci/lessons?color=purple&display_name=tag&style=plastic)

[![DOI](https://zenodo.org/badge/855935338.svg)](https://doi.org/10.5281/zenodo.13910245)


[![CircleCI](https://circleci.com/gh/pyOpenSci/python-package-guide.svg?style=svg)](https://circleci.com/gh/pyOpenSci/python-package-guide)

## What is pyOpenSci?
148 changes: 148 additions & 0 deletions clean-modular-code/activity-1/clean-code-activity-1.md
@@ -0,0 +1,148 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.16.4
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

(clean-code-activity-1)=
# Clean, Modular Code: Activity 1

Writing clean, modular code takes practice, but it is a habit worth building. As you apply clean code strategies over time, they will become second nature. Cleaner code is easier to read, maintain, and share, and it makes your work easier for others (and your future self) to use.

In this exercise, you'll focus on using three key clean code strategies:

1. **Use expressive names**: Assign meaningful names to all variables and functions to make your code more readable. [Learn more about expressive names.](../python-expressive-code.md)
2. **Use a Python style guide (PEP 8) for consistent syntax**: Adhere to [PEP 8 Python code style rules](../python-pep-8.md), including proper spacing and naming conventions, to maintain a consistent and readable codebase.
3. **Identify opportunities to make your code DRY (Don't Repeat Yourself)**: In this activity, you will use pseudocode to identify areas where the code can be simplified and made DRY. In the next activity, you will implement DRY best practices using loops and functions.

By practicing these strategies, you will be well on your way to writing clean, efficient, and maintainable code.
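
As a small illustration of the first two strategies, the sketch below writes the same calculation twice. The function and variable names are invented for this example and are not part of the activity code.

```python
# Hard to read: cryptic names and no spacing around operators.
def f(x,y):return x/y*100

# Easier to read: expressive snake_case names and PEP 8 spacing.
def calculate_percent_cited(cited_count, total_count):
    """Return the percentage of papers that have been cited."""
    return cited_count / total_count * 100


percent_cited = calculate_percent_cited(cited_count=25, total_count=200)
print(percent_cited)
```

Both functions return the same value. Only the second one tells a reader what the inputs mean and what the result represents.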

## Activity data

This activity begins with some code that you "inherited from a former lab mate". This code processes [Crossref](https://www.crossref.org/) citation data for [The Journal of Open Source Software (JOSS)](https://joss.theoj.org/) publications. The data are pulled directly from the Crossref API but then modified with specific "features" to help you learn better coding practices when completing the activities.
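
To give a feel for how this kind of data is nested, here is a minimal sketch that parses one made-up record. The DOI, title, count, and date are invented for illustration. The field names mirror the ones used in the activity code, and the real files may differ in detail (for example, some records may omit the month or day).

```python
import json
from datetime import date

# A made-up record that mimics the nesting of a Crossref response.
# All values are invented for illustration only.
record_text = """
{
  "DOI": "10.0000/example.00001",
  "title": ["An example software paper"],
  "is-referenced-by-count": 3,
  "published": {"date-parts": [[2024, 10, 11]]}
}
"""

record = json.loads(record_text)

# The title is stored as a list, so the string is its first item.
title = record["title"][0]

# The date is stored as a nested list of [year, month, day].
year, month, day = record["published"]["date-parts"][0]
published_date = date(year, month, day)

print(title, published_date.isoformat())
```

Notice that both the title and the date need one level of unpacking before they are usable. Keep this in mind when you read the inherited code.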

## Your goal

Your goal is to take the code below and turn it into a script that has the following characteristics:

* The code uses clean, expressive naming conventions
* The code follows the PEP 8 style guide

In this part of the activity, you will create pseudocode to identify:

* How the code could be more DRY. Are elements repeated?
* How the code could be made more modular.

In the next activity, you will modify the code.
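
Pseudocode can be as simple as plain-language comments that describe each step. The sketch below uses an unrelated, made-up task (the lists and names are invented) to show how writing out the steps makes repetition easy to spot.

```python
# Pseudocode, written as comments, for a made-up script that
# summarizes two lists of citation counts:
#
# 1. Add up the counts in the first list.
# 2. Divide by the number of items to get the mean.
# 3. Add up the counts in the second list.            <- repeats step 1
# 4. Divide by the number of items to get the mean.   <- repeats step 2
# 5. Print both means.

counts_2023 = [4, 0, 7, 1]
counts_2024 = [2, 2, 5]

total_2023 = sum(counts_2023)
mean_2023 = total_2023 / len(counts_2023)

total_2024 = sum(counts_2024)
mean_2024 = total_2024 / len(counts_2024)

print(mean_2023, mean_2024)
```

Steps that appear more than once in your pseudocode are good candidates for a function or a loop.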

### If you want to use an LLM to support your learning

The cleanup steps discussed above are things an LLM (Large Language Model) like ChatGPT or Anthropic's Claude can help you with. However, remember that LLMs often return wrong or partially wrong answers. This means that if you use LLMs, you must have a keen eye to catch issues in LLM-generated code.

If you are using an LLM, provide it with descriptive, leading prompts that help it perform the task better. For example, you might write:

> Make the variable names in the code below more expressive.

or

> Identify areas of the code below that could be more DRY. Write pseudocode that identifies the processing steps that are repeated in the code.

:::{important}
If you use an LLM for this activity, consider adding any odd or incorrect code it returns to our shared workshop document.
:::

## Your task

The workflow below should open the data stored in the `data/` directory for this activity. The code should run if you have a proper Python environment set up.

## Part 1 - evaluate with a partner

To begin, look at the code.

* Create a list of any issues that you see with it.
* What is the code supposed to do?
* Does the code run?
* What ideas do you have to make it more efficient?

```{code-cell} ipython3
import os
import numpy as np
import json
from glob import glob
from pathlib import Path
import numpy as np
path = "data/part-1-data.json"
with open(path, "r") as z:
    x = json.load(z)
import pandas as pd
a=pd.json_normalize(x)
b=['publisher', 'DOI', 'type', 'author','is-referenced-by-count', 'title', 'published.date-parts']
df=a.filter(items=b)
for i,r in df.iterrows():
    l = r["published.date-parts"][0]
    df.at[i, 'title'] = df.at[i, 'title'][0]
    s = f"{l[0]}-{l[1]:02d}-{l[2]:02d}"
    d = pd.to_datetime(s, format='%Y-%m-%d')
    df.at[i, 'published_date'] = d
df.drop("published.date-parts", axis=1, inplace=True)
print(df.shape)
path="data/part-1-datab.json"
with open(path, "r") as z:
    x=json.load(z)
a=pd.json_normalize(x)
b=['publisher', 'DOI', 'type', 'author','is-referenced-by-count', 'title', 'published.date-parts']
df2=a.filter(items=b)
for i, r in df2.iterrows():
    l=r["published.date-parts"][0]
    df2.at[i, 'title'] = df.at[i, 'title'][0]
    s=f"{l[0]}-{l[1]:02d}-{l[2]:02d}"
    d=pd.to_datetime(s, format='%Y-%m-%d')
    df2.at[i, 'published_date']=d
df2.drop("published.date-parts", axis=1, inplace=True)
df_combined = pd.concat([df, df2], axis=0)
df_combined.shape
```

:::{admonition} Part 2
:class: tip

Take the code above and clean it up:

* Make the code PEP 8 compliant.
* Add expressive names to make it more readable.
* Add a docstring to the top of the script to help a user understand what the code does.
:::
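
If you have not written a script-level docstring before, here is a minimal sketch of one. The script itself is made up for illustration. The point is that the docstring is the first statement in the file and summarizes what the script does, what it expects, and what it produces.

```python
"""Summarize citation counts for a small set of papers.

This made-up script shows where a module docstring goes: it is the
first statement in the file, before any imports.

Input
-----
A list of citation counts (integers), defined below.

Output
------
Prints the total number of citations.
"""

citation_counts = [3, 0, 12]
total_citations = sum(citation_counts)
print(total_citations)
```

The section headings inside the docstring are one common convention, not a requirement. A short, accurate summary matters more than the exact layout.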

:::{admonition} Part 3
:class: tip

Evaluate the code to determine whether it could be more DRY.

* Create a list of items that you notice are repeated and could be cleaned up in the code.
* Write pseudocode that describes what the code intends to do step by step.
:::

:::{admonition} On your own 1
:class: tip

Begin cleaning up the code. In the next activity, we will discuss loops and functions in more detail.
:::
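
As a preview of where this is heading, the sketch below shows one way that repeated steps can be folded into a function and a loop. It uses a made-up task with invented data and names, not the activity code, so treat it as an illustration of the idea rather than a solution.

```python
def calculate_mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Invented data for illustration only.
counts_by_year = {
    2023: [4, 0, 7, 1],
    2024: [2, 2, 5],
}

# One loop replaces a copy-pasted block for each year.
mean_by_year = {}
for year, counts in counts_by_year.items():
    mean_by_year[year] = calculate_mean(counts)

print(mean_by_year)
```

Adding a third year now means adding one entry to the dictionary instead of copying and editing another block of code.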
90 changes: 0 additions & 90 deletions clean-modular-code/activity-1/clean-code-part-1.md

This file was deleted.

8 changes: 4 additions & 4 deletions clean-modular-code/intro-clean-code.md
@@ -19,7 +19,7 @@ jupyter:
name: python3
---

<!-- #region editable=true slideshow={"slide_type": ""} tags=["hide-content"] -->
<!-- #region editable=true slideshow={"slide_type": ""} -->
:::{toctree}
:hidden:
:caption: Lessons
Expand Down Expand Up @@ -51,14 +51,13 @@ Tests & Checks <checks-conditionals/tests-checks>
:caption: Activities
:maxdepth: 2

Clean Code: Activity 1 <activity-1/clean-code-part-1>
Clean Code: Activity 1 <activity-1/clean-code-activity-1>
Clean Code: Activity 2 <activity-2/clean-code-part-2>
Clean Code: Activity 3 <activity-3/clean-code-part-3>
:::

<!-- #endregion -->

<!-- #region editable=true slideshow={"slide_type": ""} -->
# Write Clean, Modular, DRY Code

:::{note}
@@ -68,8 +67,9 @@ After completing this lesson, you will be able to:
* Apply the PEP 8 Style Guide standards to your **Python** code.
:::

"Pythonic" code is code that follows the conventions and best practices of the Python programming language. It emphasizes code that is clear, concise, and readable--principles that adhere to Python's design philosophy. <link to zen of python>
"Pythonic" code is code that follows the conventions and best practices of the Python programming language. It emphasizes code that is clear, concise, and readable--principles that adhere to Python's design philosophy.

<!-- #region editable=true slideshow={"slide_type": ""} -->
Pythonic code also takes full advantage of Python's features which include:

* list comprehensions,
