diff --git a/docs/clean-prep/index.rst b/docs/clean-prep/index.rst index 1002fbf5a..26833f138 100644 --- a/docs/clean-prep/index.rst +++ b/docs/clean-prep/index.rst @@ -15,6 +15,10 @@ we also use several small, specialised libraries like :doc:`dedupe systems like `Great Expectations `_ or `MobyDQ `_. +.. tip:: + `cusy seminar: Cleanse and validate data with Python + `_ + Overview -------- diff --git a/docs/data-processing/index.rst b/docs/data-processing/index.rst index f63ea9e26..e7e2b2904 100644 --- a/docs/data-processing/index.rst +++ b/docs/data-processing/index.rst @@ -17,6 +17,10 @@ three tools in more detail that make data accessible: * :doc:`httpx/index` * :doc:`intake/index` +.. tip:: + `Read, write and provide data with Python + `_ + .. seealso:: `pandas I/O API `_ The pandas I/O API is a set of top level ``reader`` functions that diff --git a/docs/index.rst b/docs/index.rst index ab4485c63..5e627242b 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -23,8 +23,86 @@ This tutorial is not intended to be an introduction to Python or programming in general; for that there is the :doc:`python-basics:index` tutorial. Instead, it is intended to show the Python data science stack – libraries such as :doc:`/workspace/ipython/index`, :doc:`/workspace/numpy/index`, -:doc:`/workspace/pandas/index`, :doc:`pyviz:matplotlib/index` and related tools -– so that you can subsequently effectively analyse and visualise your data. +:doc:`/workspace/pandas/index`, and related tools – so that you can +then analyse your data effectively. We also offer the `Jupyter Tutorial +`_ and the `PyViz +Tutorial `_ as well +as the instructions for `data visualisation +`_ from the `cusy Design +System `_.
+ +All tutorials serve as seminar documents for our harmonised training courses: + ++---------------+--------------------------------------------------------------+ +| Duration | Topic | ++===============+==============================================================+ +| 3 days | `Introduction to Python`_ | ++---------------+--------------------------------------------------------------+ +| 2 days | `Advanced Python`_ | ++---------------+--------------------------------------------------------------+ +| 2 days | `Design patterns in Python`_ | ++---------------+--------------------------------------------------------------+ +| 2 days | `Efficient testing with Python`_ | ++---------------+--------------------------------------------------------------+ +| 1 day | `Software documentation with Sphinx`_ | ++---------------+--------------------------------------------------------------+ +| 2 days | `Technical writing`_ | ++---------------+--------------------------------------------------------------+ +| 3 days | `Jupyter notebooks for efficient data science workflows`_ | ++---------------+--------------------------------------------------------------+ +| 2 days | `Numerical calculations with NumPy`_ | ++---------------+--------------------------------------------------------------+ +| 2 days | `Analysing data with pandas`_ | ++---------------+--------------------------------------------------------------+ +| 3 days | `Read, write and provide data with Python`_ | ++---------------+--------------------------------------------------------------+ +| 2 days | `Cleanse and validate data with Python`_ | ++---------------+--------------------------------------------------------------+ +| 5 days | `Visualising data with Python`_ | ++---------------+--------------------------------------------------------------+ +| 1 day | `Designing data visualisations`_ | ++---------------+--------------------------------------------------------------+ +| 2 days | `Create dashboards`_ |
++---------------+--------------------------------------------------------------+ +| 3 days | `Versioned and reproducible storage of code and data`_ | ++---------------+--------------------------------------------------------------+ +| Subscription | `News from Python for data science`_ | +| of 2 hours | | +| per quarter | | ++---------------+--------------------------------------------------------------+ + +.. _`Introduction to Python`: + https://cusy.io/en/our-training-courses/introduction-to-python +.. _`Advanced Python`: + https://cusy.io/en/our-training-courses/advanced-python +.. _`Design patterns in Python`: + https://cusy.io/en/our-training-courses/design-patterns-in-python +.. _`Efficient testing with Python`: + https://cusy.io/en/our-training-courses/efficient-testing-with-python +.. _`Software documentation with Sphinx`: + https://cusy.io/en/our-training-courses/software-documentation-with-sphinx +.. _`Technical writing`: + https://cusy.io/en/our-training-courses/technical-writing +.. _`Jupyter notebooks for efficient data science workflows`: + https://cusy.io/en/our-training-courses/jupyter-notebooks-for-efficient-data-science-workflows +.. _`Numerical calculations with NumPy`: + https://cusy.io/en/our-training-courses/numerical-calculations-with-numpy +.. _`Analysing data with pandas`: + https://cusy.io/en/our-training-courses/analysing-data-with-pandas +.. _`Read, write and provide data with Python`: + https://cusy.io/en/our-training-courses/read-write-and-provide-data-with-python +.. _`Cleanse and validate data with Python`: + https://cusy.io/en/our-training-courses/cleanse-and-validate-data-with-python +.. _`Visualising data with Python`: + https://cusy.io/en/our-training-courses/visualising-data-with-python +.. _`Designing data visualisations`: + https://cusy.io/en/our-training-courses/designing-data-visualisations +.. _`Create dashboards`: + https://cusy.io/en/our-training-courses/create-dashboards +.. 
_`Versioned and reproducible storage of code and data`: + https://cusy.io/en/our-training-courses/versioned-and-reproducible-storage-of-code-and-data +.. _`News from Python for data science`: + https://cusy.io/en/our-training-courses/news-from-python-for-data-science .. toctree:: :hidden: diff --git a/docs/productive/dvc/index.rst b/docs/productive/dvc/index.rst index 11558a188..7e302b500 100644 --- a/docs/productive/dvc/index.rst +++ b/docs/productive/dvc/index.rst @@ -9,21 +9,22 @@ For data analysis, and especially machine learning, it is extremely valuable to be able to reproduce different versions of analyses that have been carried out with different data sets and parameters. However, in order to obtain reproducible analyses, both the data and the model (including the algorithms, -parameters, etc.) must be versioned. Versioning data for reproducible analysis -is a bigger problem than versioning models because of the size of the data. -Tools like `DVC `_ help manage data by allowing users to -transfer it to a remote data store using a :doc:`Git <../git/index>` like -workflow. This simplifies the retrieval of certain versions of data in order to -reproduce an analysis. - -DVC was developed to be able to use ML models and data sets together and to -manage them in a comprehensible manner. It works with different version -managements, but does not need them. In contrast to `DataLad +parameters, :abbr:`etc. (et cetera)`) must be versioned. Versioning data for +reproducible analysis is a bigger problem than versioning models because of the +size of the data. Tools like `DVC `_ help manage data by +allowing users to transfer it to a remote data store using a :doc:`Git +<../git/index>` like workflow. This simplifies the retrieval of certain versions +of data in order to reproduce an analysis. + +DVC was developed to be able to use :abbr:`ML (Machine Learning)` models and +data sets together and to manage them in a comprehensible manner. 
It works with +different version control systems, but does not need them. In contrast to `DataLad `_/`git-annex `_, for example, it is not limited to Git as version management, but can also be used together with Mercurial, see `github.com/crobarcro/dvc/dvc/scm.py `_. It also uses its -own system for storing files with support for SSH and HDFS, among others. +own system for storing files with support for :abbr:`SSH (Secure Shell)` and +:abbr:`HDFS (Hadoop Distributed File System)`, among others. DataLad, on the other hand, focuses more on discovering and consuming datasets, which are then easily managed with Git. DVC, on the other hand, stores each step @@ -35,6 +36,10 @@ visualizing DAGs, see, for example, :doc:`visualisation of DAGs External dependencies can also be specified with :ref:`dvc remote `. +.. tip:: + `Versioned and reproducible storage of code and data + `_ + .. seealso:: * `Tutorial `_ * `Documentation `_ diff --git a/docs/productive/git/index.rst b/docs/productive/git/index.rst index 4ecf63631..75171da9a 100644 --- a/docs/productive/git/index.rst +++ b/docs/productive/git/index.rst @@ -21,6 +21,10 @@ local repository can contain specific changes. However, Git can not only be used in a distributed way, it is also performant, secure and flexible. +.. tip:: + `Versioned and reproducible storage of code and data + `_ + Performance ----------- diff --git a/docs/productive/qa/code-smells.rst b/docs/productive/qa/code-smells.rst index 928f401c8..5d5c50431 100644 --- a/docs/productive/qa/code-smells.rst +++ b/docs/productive/qa/code-smells.rst @@ -10,6 +10,10 @@ design of a programme. For example, the overuse of isinstance checks against concrete classes is a code smell, as it makes the programme more difficult to extend to deal with new types in the future. +.. 
tip:: + `Design patterns in Python + `_ + Recognising code smells ----------------------- diff --git a/docs/workspace/numpy/index.rst b/docs/workspace/numpy/index.rst index 63b178379..49a34ecd6 100644 --- a/docs/workspace/numpy/index.rst +++ b/docs/workspace/numpy/index.rst @@ -30,6 +30,10 @@ the main functionality of NumPy: array-oriented programming and thinking is an important step on the way to becoming a data scientist. +.. tip:: + `cusy seminar: Numerical calculations with NumPy + `_ + .. seealso:: * `Home `_ diff --git a/docs/workspace/pandas/index.rst b/docs/workspace/pandas/index.rst index 9ad4ade6c..66f727e77 100644 --- a/docs/workspace/pandas/index.rst +++ b/docs/workspace/pandas/index.rst @@ -26,6 +26,10 @@ Python code. Mostly pandas is used to :doc:`/data-processing/serialisation-formats/json/index` data * prepare machine learning +.. tip:: + `Analysing data with pandas + `_ + .. seealso:: * `Home `_