Goals:
- Participants can create a new Jupyter notebook, open an existing one, or run a simple Python script from within an IDE (stretch goal: run it from the command line)
- Participants can list the main data types and basic objects used in Python
- Participants can write code for simple Python tasks: math, loops, functions, and logical (conditional) statements
Content:
1.1. Introduction – What Python Is and Why It Matters (TBD)
1.2. How to Python – Anaconda, Conda, Jupyter, IDEs (Lindsay)
- Installing Python, Jupyter, etc.
- Ideally make this pre-work, so participants have at least attempted to install the basics on their computers before the workshop
- Overview of working within the chosen IDE (either VS Code or PyCharm, to be decided)
1.3. Working with Code (Stu)
- Documenting/organizing work
- Writing code
- Python syntax and practice
- Troubleshooting and help
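As a rough sketch of the kind of short exercise section 1.3 might use (not a finalized lesson), the snippet below combines the basic syntax named in the module goals (math, a loop, a function, and a conditional) with a docstring for documentation and a `try`/`except` block to show how an error can be caught and read. The function name `describe_numbers` and its inputs are hypothetical.

```python
def describe_numbers(values):
    """Return the total, the mean, and the count of even numbers in `values`.

    Raises ValueError if `values` is empty, since the mean is undefined.
    """
    if len(values) == 0:
        raise ValueError("values must not be empty")

    total = 0
    even_count = 0
    for v in values:          # loop over each number
        total += v            # basic math: running sum
        if v % 2 == 0:        # logical statement: is v even?
            even_count += 1

    mean = total / len(values)
    return total, mean, even_count


print(describe_numbers([1, 2, 3, 4]))   # (10, 2.5, 2)

# Troubleshooting: catch the error and print its message instead of crashing.
try:
    describe_numbers([])
except ValueError as err:
    print(f"Problem: {err}")

# Getting help: help() prints the docstring written above.
help(describe_numbers)
```

Raising a clear error with a readable message, rather than letting a `ZeroDivisionError` surface from inside the function, is one way to make troubleshooting easier for whoever calls the code later.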
1.4. Core data structure concepts (Lindsay)
- Variables, data types, lists, dicts, tuples, etc.
- Methods/functions and how they work with data
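A minimal sketch of the section 1.4 concepts, using made-up values: basic data types, a list, a tuple, and a dict, plus the contrast between a method that changes its object in place (`list.append`) and a function that returns a new object (`sorted`). All variable names here are illustrative only.

```python
# Variables and basic data types
count = 3            # int
price = 4.5          # float
label = "apples"     # str
in_stock = True      # bool
print(type(count).__name__, type(price).__name__,
      type(label).__name__, type(in_stock).__name__)   # int float str bool

# List: ordered and mutable
fruits = ["pear", "apple"]
fruits.append("fig")        # method: modifies the list in place (returns None)
print(sorted(fruits))       # function: returns a NEW sorted list
print(fruits)               # the original order is unchanged

# Tuple: ordered and immutable; handy for fixed groups of values
point = (3, 4)
x, y = point                # unpack into two variables

# Dict: maps keys to values
stock = {"pear": 5, "apple": 0}
stock["fig"] = 2            # add a new key
for fruit, qty in stock.items():
    if qty == 0:
        print(f"{fruit} is out of stock")

# Strings are immutable: string methods return a new string
shout = label.upper()
print(label, shout)         # apples APPLES
```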
Goals:
- Participants can read data into pandas DataFrames from standard sources (CSV, Excel)
- Participants can view a dataset and report basic information about it
- Participants can remove or replace missing (NA) data
Content:
2.1. Getting data (Lindsay)
- Using pandas
- Reading data from existing sources
- Creating data
- Viewing data
- Basic information about the dataset (overall stats, data types)
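A minimal sketch of the section 2.1 steps, assuming pandas is installed. To keep it self-contained, a small made-up CSV is read from an in-memory string via `io.StringIO`; in the workshop this would be a real file path. The Excel line is left commented out because `pd.read_excel` needs an actual file and, for `.xlsx`, the `openpyxl` package. The column names `site`, `year`, `count`, and `region` are hypothetical.

```python
import io

import pandas as pd

# Reading data from an existing source. With a real file this would be
# pd.read_csv("path/to/file.csv"); StringIO stands in for the file here.
csv_text = """site,year,count
A,2020,12
A,2021,
B,2020,7
B,2021,9
"""
df = pd.read_csv(io.StringIO(csv_text))

# Excel files work much the same way (requires openpyxl for .xlsx):
# df_xl = pd.read_excel("path/to/file.xlsx", sheet_name=0)

# Creating data directly from a dict of columns
sites = pd.DataFrame({"site": ["A", "B"], "region": ["North", "South"]})

# Viewing data
print(df.head())      # first few rows
print(df.shape)       # (rows, columns)

# Basic information about the dataset
df.info()             # column dtypes and non-null counts
print(df.dtypes)
print(df.describe())  # summary statistics for numeric columns
```

Note that the blank `count` value is read as `NaN`, which also makes that column a float column; this leads directly into the cleaning topics in 2.2.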
2.2. Data Cleaning (Stu)
- Dealing with missing (e.g. NA) data
- Changing data types
- Changing values
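A minimal sketch of the section 2.2 cleaning steps on a small made-up DataFrame, assuming pandas and NumPy are installed. Whether to drop or fill missing values depends on the dataset; both options are shown only for comparison, and filling with `0` is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "year": ["2020", "2021", "2020", "2021"],   # stored as text
    "count": [12, np.nan, 7, 9],
    "status": ["ok", "OK", "bad", "ok"],
})

# Dealing with missing data
print(df.isna().sum())                   # missing values per column
dropped = df.dropna(subset=["count"])    # option 1: remove rows missing a count
filled = df.fillna({"count": 0})         # option 2: replace missing counts with 0

# Changing data types: text years -> integers
filled["year"] = filled["year"].astype(int)

# Changing values
filled["status"] = filled["status"].str.lower()                  # normalize case
filled["status"] = filled["status"].replace({"bad": "flagged"})  # recode a value
print(filled)
```

Both `dropna` and `fillna` return new DataFrames, so the original `df` keeps its missing value unless the result is assigned back.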
Goals:
- Participants can explore their datasets to understand their contents
- Participants can summarize a dataset at various levels of aggregation
- Participants can describe the differences between ways of combining datasets (left/inner joins, concatenations, etc.)
- Participants can produce summary reports, descriptive statistics, and statistical tests for different types of data
- Participants can produce charts to display statistics, including bar, line, and scatter plots
Content:
3.1. Exploring data structures with pandas (Stu)
- Selecting rows and columns
- Grouping rows and columns
- Sorting data
- Creating columns (transform)
- Joining datasets
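A minimal sketch of the section 3.1 operations on two small made-up tables, assuming pandas is installed. It also illustrates the join goal above: a left join keeps every row of the left table (site `C` has no match, so its `region` is missing), an inner join keeps only rows matched in both, and `pd.concat` stacks rows rather than matching on a key.

```python
import pandas as pd

counts = pd.DataFrame({
    "site": ["A", "A", "B", "B", "C"],
    "year": [2020, 2021, 2020, 2021, 2021],
    "count": [12, 15, 7, 9, 4],
})
sites = pd.DataFrame({"site": ["A", "B"], "region": ["North", "South"]})

# Selecting rows and columns
recent = counts.loc[counts["year"] == 2021, ["site", "count"]]

# Grouping: one summary row per site
per_site = counts.groupby("site", as_index=False)["count"].sum()

# Sorting
ranked = per_site.sort_values("count", ascending=False)

# Creating columns: each row's share of its year's total (transform keeps row alignment)
counts["share"] = counts["count"] / counts.groupby("year")["count"].transform("sum")

# Joining datasets
left = counts.merge(sites, on="site", how="left")     # all 5 rows; C gets NaN region
inner = counts.merge(sites, on="site", how="inner")   # only sites in both tables (4 rows)

# Concatenation: append new rows with the same columns
more = pd.DataFrame({"site": ["C"], "year": [2020], "count": [3]})
stacked = pd.concat([counts[["site", "year", "count"]], more], ignore_index=True)
```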
3.2. Graphical depictions of data (Visualizations) (Lindsay)
- Distributions (e.g. line and bar histograms)
- Comparisons (e.g. bar charts and line charts)
- Relationships (e.g. scatter and bubble plots)
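A minimal sketch of the three chart families in section 3.2, assuming pandas plotting on top of matplotlib (the library for the workshop is not specified in the outline). The data are randomly generated, and the `Agg` backend is chosen only so the code runs without a display; in a Jupyter notebook the charts would simply appear inline.

```python
import matplotlib

matplotlib.use("Agg")   # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year": np.repeat([2019, 2020, 2021], 20),
    "x": rng.normal(size=60),
})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=60)
df["size"] = rng.uniform(10, 100, size=60)

fig, axes = plt.subplots(2, 2, figsize=(9, 7))

# Distribution: histogram of one variable
df["x"].plot.hist(bins=15, ax=axes[0, 0], title="Distribution of x")

# Comparison: bar chart of a grouped summary
mean_y = df.groupby("year")["y"].mean()
mean_y.plot.bar(ax=axes[0, 1], title="Mean y by year")

# Comparison over an ordered axis: line chart
mean_y.plot.line(marker="o", ax=axes[1, 0], title="Mean y over time")

# Relationship: scatter plot; varying point size makes it a bubble plot
df.plot.scatter(x="x", y="y", s=df["size"].to_numpy(), alpha=0.5,
                ax=axes[1, 1], title="y vs x")

fig.tight_layout()
# fig.savefig("charts.png")  # uncomment to save the figure to a file
```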
Goals:
- Participants can find and download data from the bcdata catalog
- Participants can produce a basic HTML report from a Jupyter notebook (.ipynb)
- Participants can describe what the .fit() and .predict() methods do in a basic scikit-learn pipeline
- Participants can explain the difference between regression and classification techniques, and the scoring metrics typically used for each
Content:
4.1. Publishing/Reporting (Lindsay)
4.2. Fetching data from bcdata (Lindsay)
4.3. Advanced Pandas (Lindsay)
4.4. Machine Learning use cases – Scikit-learn (Stu)
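A minimal sketch of the `.fit()`/`.predict()` pattern and the regression-versus-classification scoring goal, assuming scikit-learn is installed. It uses scikit-learn's synthetic `make_regression` and `make_classification` generators in place of workshop data, and the model choices (linear and logistic regression) are illustrative. `.fit()` learns parameters (here the scaler's means and the model's coefficients) from training data only; `.predict()` applies them to rows the model has not seen.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Regression: predict a continuous number
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

reg = make_pipeline(StandardScaler(), LinearRegression())
reg.fit(X_train, y_train)        # learn scaling + coefficients from training rows
y_pred = reg.predict(X_test)     # apply them to held-out rows
r2 = r2_score(y_test, y_pred)    # typical regression scores: R^2, MAE, RMSE
mae = mean_absolute_error(y_test, y_pred)
print(f"R^2: {r2:.3f}  MAE: {mae:.2f}")

# Classification: predict a category label
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    Xc, yc, test_size=0.25, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(Xc_train, yc_train)
labels = clf.predict(Xc_test)           # predicted class labels (0 or 1)
acc = accuracy_score(yc_test, labels)   # typical classification scores: accuracy, precision, recall, F1
print(f"Accuracy: {acc:.3f}")
```

Scoring on a held-out test split, rather than on the training rows, is what makes these numbers an estimate of performance on new data.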