From 130ef8ede7aeec566621a5b2041d6f5a240d1eb3 Mon Sep 17 00:00:00 2001
From: Veit Schiele
Date: Tue, 10 Sep 2024 21:30:39 +0200
Subject: [PATCH 1/7] :wrench: Add codespell

* Add pre-commit check
* Fix spelling
---
 .pre-commit-config.yaml                              |  5 +++++
 docs/clean-prep/string-matching.ipynb                |  2 +-
 docs/data-processing/apis/grpc/index.rst             |  2 +-
 docs/data-processing/apis/index.rst                  |  4 ++--
 docs/data-processing/geodata.rst                     |  2 +-
 docs/data-processing/nosql/column-oriented-db.rst    |  2 +-
 docs/data-processing/nosql/document-oriented-db.rst  |  2 +-
 docs/data-processing/nosql/graph-db.rst              |  2 +-
 docs/data-processing/nosql/key-value-store.rst       |  2 +-
 docs/data-processing/nosql/object-db.rst             |  4 ++--
 docs/data-processing/nosql/xml-db.rst                |  2 +-
 docs/data-processing/pandas-io.rst                   |  2 +-
 docs/data-processing/postgresql/index.rst            |  4 ++--
 docs/data-processing/postgresql/pganalyze.rst        |  2 +-
 .../serialisation-formats/json/index.rst             |  2 +-
 .../serialisation-formats/protobuf.rst               |  2 +-
 .../xml-html/beautifulsoup.ipynb                     |  2 +-
 .../serialisation-formats/yaml/index.rst             |  2 +-
 docs/genindex.rst                                    |  2 +-
 docs/performance/asyncio-example.ipynb               |  4 ++--
 docs/productive/envs/index.rst                       |  2 +-
 docs/productive/git/advanced/bisect.rst              |  6 +++---
 docs/productive/git/advanced/hooks/index.rst         |  6 +++---
 docs/productive/git/advanced/jupyter-notebooks.rst   |  2 +-
 docs/productive/git/advanced/vs-code/index.rst       |  2 +-
 docs/productive/git/install-config.rst               |  4 ++--
 docs/productive/git/review.rst                       | 12 ++++++------
 docs/productive/git/workflows/split-repos.rst        |  2 +-
 docs/productive/qa/black.rst                         |  6 +++---
 docs/productive/qa/requests/adapters.py              |  6 +++---
 docs/productive/qa/requests/sessions.py              |  4 ++--
 docs/productive/qa/requests/utils.py                 |  2 +-
 docs/productive/security.rst                         |  3 +--
 docs/workspace/ipython/display.ipynb                 |  4 ++--
 docs/workspace/ipython/importing.ipynb               |  2 +-
 .../workspace/ipython/unix-shell/create-delete.ipynb |  4 ++--
 docs/workspace/ipython/unix-shell/grep-find.ipynb    |  2 +-
 pyproject.toml                                       |  5 +++++
 38 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 2694507c0..84372006b 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -38,3 +38,8 @@ repos:
       - id: sphinx-lint
         args: [--jobs=1]
         types: [rst]
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.3.0
+    hooks:
+      - id: codespell
+        args: [--toml, pyproject.toml]
diff --git a/docs/clean-prep/string-matching.ipynb b/docs/clean-prep/string-matching.ipynb
index 42511ae6e..cfd26a09a 100644
--- a/docs/clean-prep/string-matching.ipynb
+++ b/docs/clean-prep/string-matching.ipynb
@@ -40,7 +40,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 2. Imort"
+    "## 2. Import"
    ]
   },
   {
diff --git a/docs/data-processing/apis/grpc/index.rst b/docs/data-processing/apis/grpc/index.rst
index 9edfab8b6..05e1303ec 100644
--- a/docs/data-processing/apis/grpc/index.rst
+++ b/docs/data-processing/apis/grpc/index.rst
@@ -114,7 +114,7 @@ specifies the communication between clients and servers:

    #. First the stream is started by the client with a mandatory ``Call Header``
    #. followed by optional ``Initial-Metadata``
-   #. followd by optional ``Payload Messages``.
+   #. followed by optional ``Payload Messages``.

 The contents of ``Call Header`` and ``Initial Metadata`` are sent as HTTP/2
 headers compressed with ``HPACK``.
diff --git a/docs/data-processing/apis/index.rst b/docs/data-processing/apis/index.rst
index 83c98a6fc..f0f7b22c0 100644
--- a/docs/data-processing/apis/index.rst
+++ b/docs/data-processing/apis/index.rst
@@ -2,8 +2,8 @@
 ..
 .. SPDX-License-Identifier: BSD-3-Clause

-**A**\pplication **P**\rogramming **I**\nterface (API)
-======================================================
+Application Programming Interface (API)
+=======================================

 APIs can be used to provide the data.
:doc:`fastapi/index` is a library that can generate APIs and documentation
based on `OpenAPI `_
diff --git a/docs/data-processing/geodata.rst b/docs/data-processing/geodata.rst
index 20b81a880..ed98871fb 100644
--- a/docs/data-processing/geodata.rst
+++ b/docs/data-processing/geodata.rst
@@ -473,7 +473,7 @@ Oceanography
       https://raster.shields.io/github/license/mvdh7/PyCO2SYS

 `pyoos `_
-    High level data collection library for met/ocean data publically available.
+    High level data collection library for met/ocean data publicly available.

     .. image:: https://raster.shields.io/github/stars/ioos/pyoos
diff --git a/docs/data-processing/nosql/column-oriented-db.rst b/docs/data-processing/nosql/column-oriented-db.rst
index 456385e52..7fa990744 100644
--- a/docs/data-processing/nosql/column-oriented-db.rst
+++ b/docs/data-processing/nosql/column-oriented-db.rst
@@ -38,7 +38,7 @@ Examples of column-oriented database systems are :term:`Cassandra`,
 | | *Keyspaces* databases; no | | |
 | | logical structure, no scheme | | |
 +------------------------+--------------------------------+--------------------------------+--------------------------------+
-| **Query langauge** | `Cassandra Query Language | `Hypertable Query Language | Java Client API, Thrift/REST |
+| **Query language** | `Cassandra Query Language | `Hypertable Query Language | Java Client API, Thrift/REST |
 | | (CQL)`_ | (HQL)`_ | API |
 +------------------------+--------------------------------+--------------------------------+--------------------------------+
 | **Transactions, | :term:`Eventual Consistency` | :term:`MVCC – Multiversion | :term:`ACID` per line, |
diff --git a/docs/data-processing/nosql/document-oriented-db.rst b/docs/data-processing/nosql/document-oriented-db.rst
index 38cf6786a..a0f47820b 100644
--- a/docs/data-processing/nosql/document-oriented-db.rst
+++ b/docs/data-processing/nosql/document-oriented-db.rst
@@ -34,7 +34,7 @@ OrientDB and ArangoDB.
 | **Data model** | Flexible scheme with | Flexible scheme | Essentially | Multi-Model | Multi-model: documents, graphs |
 | | denormalised model | | :term:`Key/Value pair` | | and :term:`Key/value pair` |
 +------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+
-| **Query langauge** | jQuery, :term:`MapReduce` | REST, :term:`MapReduce` | Key filter, :term:`MapReduce`, | `Gremlin`_ |`ArangoDB Query Language (AQL)`_|
+| **Query language** | jQuery, :term:`MapReduce` | REST, :term:`MapReduce` | Key filter, :term:`MapReduce`, | `Gremlin`_ |`ArangoDB Query Language (AQL)`_|
 | | | | link walking, no ad-hoc | | |
 | | | | queries possible | | |
 +------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+
diff --git a/docs/data-processing/nosql/graph-db.rst b/docs/data-processing/nosql/graph-db.rst
index 3632da9c5..ffbfc7590 100644
--- a/docs/data-processing/nosql/graph-db.rst
+++ b/docs/data-processing/nosql/graph-db.rst
@@ -70,7 +70,7 @@ Typical graph databases are Neo4j, OrientDB and ArangoDB.
 | **Data model** | :term:`Property graph model` | Multi-Model | Multi-model: documents, graphs |
 | | | and :term:`Key/value pair` |
 +------------------------+--------------------------------+--------------------------------+
-| **Query langauge** | REST, `Cypher`_, `Gremlin`_ | `Extended SQL`_, `Gremlin`_ | ArangoDB Query Language (AQL)`_|
+| **Query language** | REST, `Cypher`_, `Gremlin`_ | `Extended SQL`_, `Gremlin`_ | ArangoDB Query Language (AQL)`_|
 +------------------------+--------------------------------+--------------------------------+
 | **Transactions, |* :term:`Two-phase locking | :term:`ACID` | :term:`ACID`, |
 | concurrency** | (2PL)` | | :term:`MVCC – Multiversion |
diff --git a/docs/data-processing/nosql/key-value-store.rst b/docs/data-processing/nosql/key-value-store.rst
index aeee43d14..95d13c793 100644
--- a/docs/data-processing/nosql/key-value-store.rst
+++ b/docs/data-processing/nosql/key-value-store.rst
@@ -37,7 +37,7 @@ Key/value database systems are e.g. Riak, Cassandra, Redis and MongoDB.
 | | | to databases; no logical | lists, sets and sorted sets | |
 | | | structure, no scheme | | |
 +------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+
-| **Query langauge** | Keyfilter, :term:`MapReduce`, | `Cassandra Query Language | | jQuery, :term:`MapReduce` |
+| **Query language** | Keyfilter, :term:`MapReduce`, | `Cassandra Query Language | | jQuery, :term:`MapReduce` |
 | | Link walking, no ad hoc queries| (CQL)`_ | | |
 | | possible | | | |
 +------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+
diff --git a/docs/data-processing/nosql/object-db.rst b/docs/data-processing/nosql/object-db.rst
index 0894c6c77..740374820 100644
--- a/docs/data-processing/nosql/object-db.rst
+++ b/docs/data-processing/nosql/object-db.rst
@@ -70,7 +70,7 @@ Examples of object database systems are ZODB.
 | **Data model** | PersistentList, PersistentMapping, |
 | | BTree |
 +------------------------+----------------------------------------+
-| **Query langauge** | |
+| **Query language** | |
 +------------------------+----------------------------------------+
 | **Transactions, | :term:`ACID` |
 | concurrency** | |
@@ -81,7 +81,7 @@ Examples of object database systems are ZODB.
 | **Remarks** | |
 +------------------------+----------------------------------------+

-.. _`ZODB`: hhttp://www.zodb.org/
+.. _`ZODB`: https://zodb.org/en/latest/
 .. _`Objectivity/DB`: https://www.objectivity.com/products/objectivitydb/
 .. _`zopefoundation/ZODB`: https://github.com/zopefoundation/ZODB
 .. _`zodb.org/en/latest/tutorial.html`: https://zodb.org/en/latest/tutorial.html
diff --git a/docs/data-processing/nosql/xml-db.rst b/docs/data-processing/nosql/xml-db.rst
index f3dc0ce2d..7b919bb98 100644
--- a/docs/data-processing/nosql/xml-db.rst
+++ b/docs/data-processing/nosql/xml-db.rst
@@ -30,7 +30,7 @@ Examples of XML database systems are eXist and MonetDB.
 +------------------------+------------------------------------------------+------------------------------------------------+------------------------------------------------+
 | **Data model** | XML | XML, column-oriented data structure | XML |
 +------------------------+------------------------------------------------+------------------------------------------------+------------------------------------------------+
-| **Query langauge** | :term:`XQuery`, :term:`XPATH` | SQL | :term:`XQuery`, :term:`XPATH` |
+| **Query language** | :term:`XQuery`, :term:`XPATH` | SQL | :term:`XQuery`, :term:`XPATH` |
 +------------------------+------------------------------------------------+------------------------------------------------+------------------------------------------------+
 | **Transactions, | | :term:`Optimistic Concurrency ` | :term:`ACID`, XQuery Locks |
 | concurrency** | | | |
diff --git a/docs/data-processing/pandas-io.rst b/docs/data-processing/pandas-io.rst
index ad7eb6761..d86b65e5f 100644
--- a/docs/data-processing/pandas-io.rst
+++ b/docs/data-processing/pandas-io.rst
@@ -59,7 +59,7 @@ including
 +----------------------------------------------------+------------------------------------------------------------------------------------------+
 | :doc:`pandas:reference/api/pandas.read_sql_table` | reads an entire SQL table (with |
 | | :doc:`postgresql/sqlalchemy`) as a pandas DataFrame |
-| | (corresponds to a query that selects everything Rin this |
+| | (corresponds to a query that selects everything in this |
 | | table with ``read_sql``) |
 +----------------------------------------------------+------------------------------------------------------------------------------------------+
 | :doc:`pandas:reference/api/pandas.read_stata` | reads a data set from the |
diff --git a/docs/data-processing/postgresql/index.rst b/docs/data-processing/postgresql/index.rst
index 3b5acbbaf..42e6ad911 100644
--- a/docs/data-processing/postgresql/index.rst
+++ b/docs/data-processing/postgresql/index.rst
@@ -5,8 +5,8 @@
 PostgreSQL
 ==========

-Basic funtions
---------------
+Basic functions
+---------------

 ACID compliant
     ACID (**A** tomicity, **C** onsistency, **I** solation, **D** urability) is
diff --git a/docs/data-processing/postgresql/pganalyze.rst b/docs/data-processing/postgresql/pganalyze.rst
index 37b9145d4..d9f7b9f41 100644
--- a/docs/data-processing/postgresql/pganalyze.rst
+++ b/docs/data-processing/postgresql/pganalyze.rst
@@ -136,7 +136,7 @@ The result can then look like this, for example:
     2021/02/06 06:40:19 I [server1] Log test successful

 If the test was successful, the *Collector* must be restarted for the
-confiugration to take effect:
+configuration to take effect:

 .. code-block:: console
diff --git a/docs/data-processing/serialisation-formats/json/index.rst b/docs/data-processing/serialisation-formats/json/index.rst
index 649ff690e..9f1b1b514 100644
--- a/docs/data-processing/serialisation-formats/json/index.rst
+++ b/docs/data-processing/serialisation-formats/json/index.rst
@@ -17,7 +17,7 @@ Overview
 | | | JavaScript: ``NaN`` and ``Infinity`` become ``null``. |
 | | | |
 | | | Note that the JSON syntax also don’t support comments |
-| | | and you have to work arround for example with a |
+| | | and you have to work around for example with a |
 | | | ``__comment__`` key/value pair. |
 +-----------------------+-------+-------------------------------------------------------+
 | Standardisation | \+ | JSON has a formal strongly typed `standard`_ (see |
diff --git a/docs/data-processing/serialisation-formats/protobuf.rst b/docs/data-processing/serialisation-formats/protobuf.rst
index 6a4af2bc9..cc16679c4 100644
--- a/docs/data-processing/serialisation-formats/protobuf.rst
+++ b/docs/data-processing/serialisation-formats/protobuf.rst
@@ -20,7 +20,7 @@ Overview
 | Language support | ++ | The protobuf format is well supported by many |
 | | | programming languages. |
 +-----------------------+-------+-------------------------------------------------------+
-| Human readability | -\- | Protobuf ist not designed to be human readable. |
+| Human readability | -\- | Protobuf is not designed to be human readable. |
 +-----------------------+-------+-------------------------------------------------------+
 | Speed | ++ | Protobuf is very fast, especially in C++. |
 +-----------------------+-------+-------------------------------------------------------+
diff --git a/docs/data-processing/serialisation-formats/xml-html/beautifulsoup.ipynb b/docs/data-processing/serialisation-formats/xml-html/beautifulsoup.ipynb
index 40d863da8..38aebc075 100644
--- a/docs/data-processing/serialisation-formats/xml-html/beautifulsoup.ipynb
+++ b/docs/data-processing/serialisation-formats/xml-html/beautifulsoup.ipynb
@@ -70,7 +70,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "4. To structure the code, we create a new function `get_dom` (**D**ocument **O**bject **M**odel) that includes all the previous code:"
+    "4. To structure the code, we create a new function `get_dom` (Document Object Model) that includes all the previous code:"
    ]
   },
   {
diff --git a/docs/data-processing/serialisation-formats/yaml/index.rst b/docs/data-processing/serialisation-formats/yaml/index.rst
index 333b3431d..85c2d4218 100644
--- a/docs/data-processing/serialisation-formats/yaml/index.rst
+++ b/docs/data-processing/serialisation-formats/yaml/index.rst
@@ -14,7 +14,7 @@ Overview
 | | | floats and dates. YAML even supports references and |
 | | | external data. |
 +-----------------------+-------+-------------------------------------------------------+
-| Standardisation | \+ | YAML is a strongly tpyed formal standard, but it’s |
+| Standardisation | \+ | YAML is a strongly typed formal standard, but it’s |
 | | | hard to find schema validators. |
 +-----------------------+-------+-------------------------------------------------------+
 | Schema-IDL | +- | Partly with `Kwalify`_, `Rx`_ and built-in language |
diff --git a/docs/genindex.rst b/docs/genindex.rst
index c49e92140..5355836b5 100644
--- a/docs/genindex.rst
+++ b/docs/genindex.rst
@@ -2,7 +2,7 @@
 ..
 .. SPDX-License-Identifier: BSD-3-Clause

-.. Workarround for displaying the index in the toc
+.. Workaround for displaying the index in the toc

 Index
 =====
diff --git a/docs/performance/asyncio-example.ipynb b/docs/performance/asyncio-example.ipynb
index 0ec30d195..7ff8ab080 100644
--- a/docs/performance/asyncio-example.ipynb
+++ b/docs/performance/asyncio-example.ipynb
@@ -18,7 +18,7 @@
    "source": [
     "If you get `RuntimeError: This event loop is already running`, [nest-asyncio] might help you.\n",
     "\n",
-    "Ihr könnt das Paket installieren mit\n",
+    "You can install the package with\n",
     "\n",
     "``` bash\n",
     "$ pipenv install nest-asyncio\n",
@@ -246,7 +246,7 @@
    "source": [
     "### Third-party libraries\n",
     "\n",
-    "* [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) has helpfull things like fixtures for `event_loop`, `unused_tcp_port`, and `unused_tcp_port_factory`; and the ability to create your own [asynchronous fixtures](https://pytest-asyncio.readthedocs.io/en/latest/reference/fixtures/index.html).\n",
+    "* [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) has helpful things like fixtures for `event_loop`, `unused_tcp_port`, and `unused_tcp_port_factory`; and the ability to create your own [asynchronous fixtures](https://pytest-asyncio.readthedocs.io/en/latest/reference/fixtures/index.html).\n",
     "* [asynctest](https://asynctest.readthedocs.io/en/latest/index.html) has helpful tooling, including coroutine mocks and [exhaust_callbacks](https://asynctest.readthedocs.io/en/latest/asynctest.helpers.html#asynctest.helpers.exhaust_callbacks) so we don’t have to manually await tasks.\n",
     "* [aiohttp](https://docs.aiohttp.org/en/stable/) has some really nice built-in test utilities."
    ]
diff --git a/docs/productive/envs/index.rst b/docs/productive/envs/index.rst
index a7f7626cd..52d33b3c6 100644
--- a/docs/productive/envs/index.rst
+++ b/docs/productive/envs/index.rst
@@ -13,7 +13,7 @@ the Python package manager :term:`pip`, the call would look like this:

     $ python -m pip install --no-deps --require-hashes ----only-binary=:all:

 Dedicated environments (for example with :doc:`pipenv/index`, :term:`devpi` and
-:doc:`Spack ` simplify this if you save the file with ther
+:doc:`Spack ` simplify this if you save the file with their
 specifications, for example ``Pipfile``, ``Pipfile.lock``, ``package-lock.json``
 :abbr:`etc (et cetera)`. In this way, you and others can reproduce the
 environments.
diff --git a/docs/productive/git/advanced/bisect.rst b/docs/productive/git/advanced/bisect.rst
index 6bfc3264b..fa50e6dd5 100644
--- a/docs/productive/git/advanced/bisect.rst
+++ b/docs/productive/git/advanced/bisect.rst
@@ -48,8 +48,8 @@ This means that only log₂(n+1) commits need to be tested.

     $ git show HEAD
     commit 2ddcca36c8bcfa251724fe342c8327451988be0d
-    Autor: Linus Torvalds
-    Datum: Sa 3. Mai 11:59:44 2008 -0700
+    Author: Linus Torvalds
+    Date: Sat May 3 11:59:44 2008 -0700

         Linux 2.6.26-rc1

@@ -64,7 +64,7 @@ This means that only log₂(n+1) commits need to be tested.
 -EXTRAVERSION =
 + SUBLEVEL = 26
 + EXTRAVERSION = -rc1
- NAME = Funky Weasel ist Jiggy wit it
+ NAME = Funky Weasel is Jiggy with it

  # * DOKUMENTATION *
diff --git a/docs/productive/git/advanced/hooks/index.rst b/docs/productive/git/advanced/hooks/index.rst
index 199f2dbf9..1460c2ee9 100644
--- a/docs/productive/git/advanced/hooks/index.rst
+++ b/docs/productive/git/advanced/hooks/index.rst
@@ -11,13 +11,13 @@ in a Git repository, including:
 +---------------+-------------------------------------------------------+
 | Command | Hook |
 +===============+=======================================================+
-| ``comit`` | ``comit-msg``, ``pre-commit`` |
+| ``commit`` | ``commit-msg``, ``pre-commit`` |
 +---------------+-------------------------------------------------------+
-| ``merge`` | ``pre-merge``, ``comit-msg`` |
+| ``merge`` | ``pre-merge``, ``commit-msg`` |
 +---------------+-------------------------------------------------------+
 | ``rebase`` | ``pre-rebase`` |
 +---------------+-------------------------------------------------------+
-| ``pull`` | ``pre-merge``, ``comit-msg`` |
+| ``pull`` | ``pre-merge``, ``commit-msg`` |
 +---------------+-------------------------------------------------------+
 | ``push`` | ``pre-push`` |
 +---------------+-------------------------------------------------------+
diff --git a/docs/productive/git/advanced/jupyter-notebooks.rst b/docs/productive/git/advanced/jupyter-notebooks.rst
index c318fde22..9994fa9b6 100644
--- a/docs/productive/git/advanced/jupyter-notebooks.rst
+++ b/docs/productive/git/advanced/jupyter-notebooks.rst
@@ -191,7 +191,7 @@ Set up

        *.ipynb filter=nbstrip_jq

-#. If you then use ``git add`` to add your notebok to the stage area, the
+#. If you then use ``git add`` to add your notebook to the stage area, the
    ``nbstrip_jq`` filter will be applied.

 .. note::
diff --git a/docs/productive/git/advanced/vs-code/index.rst b/docs/productive/git/advanced/vs-code/index.rst
index 9c9e2983b..9e232fbce 100644
--- a/docs/productive/git/advanced/vs-code/index.rst
+++ b/docs/productive/git/advanced/vs-code/index.rst
@@ -48,7 +48,7 @@ select changes. If necessary, you will receive more specific commit actions in
 undo it with :menuselection:`Git: Undo Last Commit` in the
 :menuselection:`Command Palette` (:kbd:`⇧ ⌘ P`).

-The sorce control icon in the activity bar on the left shows you how many
+The source control icon in the activity bar on the left shows you how many
 changes you have made in your repository. Selecting the icon will give you a
 more detailed overview of your changes. Selecting a single file will show you
 the line-by-line text changes. You can also use the editor on the right to make
diff --git a/docs/productive/git/install-config.rst b/docs/productive/git/install-config.rst
index d2c60e043..517bf5aa7 100644
--- a/docs/productive/git/install-config.rst
+++ b/docs/productive/git/install-config.rst
@@ -355,12 +355,12 @@ among others:
 | /instance.log | :file:`logs/instance.log` | in the root directory of the |
 | | | repository. |
 +-------------------------------+-----------------------------------+-------------------------------+
-| .. code-block:: console | :file:`instance.log`, | Usualy the pattern match |
+| .. code-block:: console | :file:`instance.log`, | Usually the pattern match |
 | | :file:`logs/instance.log` | files in any directory. |
 | instance.log | | |
 +-------------------------------+-----------------------------------+-------------------------------+
 | .. code-block:: console | :file:`instance0.log`, | A question mark fits exactly |
-| | :file:`instance1.log`, | on a charater. |
+| | :file:`instance1.log`, | on a character. |
 | | instance?.log | but not |
 | | | :file:`instance.log` or |
 | | | :file:`instance10.log` |
diff --git a/docs/productive/git/review.rst b/docs/productive/git/review.rst
index 2bc9ad71f..c01f009b3 100644
--- a/docs/productive/git/review.rst
+++ b/docs/productive/git/review.rst
@@ -223,12 +223,12 @@ help. For relative timestamps you can use ``--date=relative``:

 .. code-block:: console

     $ git reflog --date=relative
-    12bc4d4 (HEAD -> main, my-feature) HEAD@{vor 37 Minuten}: merge my-feature-branch: Fast-forward
-    900844a HEAD@{vor 37 Minuten}: checkout: moving from my-feature-branch to main
-    12bc4d4 (HEAD -> main, my-feature-branch) HEAD@{vor 37 Minuten}: commit (amend): Add my feature and more
-    982d93a HEAD@{vor 38 Minuten}: commit: Add my feature
-    900844a HEAD@{vor 39 Minuten}: checkout: moving from main to my-feature-branch
-    900844a HEAD@{vor 40 Minuten}: commit (initial): Initial commit
+    12bc4d4 (HEAD -> main, my-feature) HEAD@{37 minutes ago}: merge my-feature-branch: Fast-forward
+    900844a HEAD@{37 minutes ago}: checkout: moving from my-feature-branch to main
+    12bc4d4 (HEAD -> main, my-feature-branch) HEAD@{37 minutes ago}: commit (amend): Add my feature and more
+    982d93a HEAD@{38 minutes ago}: commit: Add my feature
+    900844a HEAD@{39 minutes ago}: checkout: moving from main to my-feature-branch
+    900844a HEAD@{40 minutes ago}: commit (initial): Initial commit

 And for absolute timestamps you can also use ``--date=iso``:
diff --git a/docs/productive/git/workflows/split-repos.rst b/docs/productive/git/workflows/split-repos.rst
index 71eb867b2..0516324dd 100644
--- a/docs/productive/git/workflows/split-repos.rst
+++ b/docs/productive/git/workflows/split-repos.rst
@@ -11,7 +11,7 @@ Splitting repos
 It is often useful to divide a large Git repository into multiple smaller ones.
 This can be necessary in a project that has grown over time, or if you want to
 manage a sub-project in a separate repository.
-Of couse you could simply create a new repository and copy the files,
+Of course you could simply create a new repository and copy the files,
 but you would also loose the entire version history.

 Here I describe how you can split a Git repository without losing the associated
diff --git a/docs/productive/qa/black.rst b/docs/productive/qa/black.rst
index 1b66180b6..5f951d584 100644
--- a/docs/productive/qa/black.rst
+++ b/docs/productive/qa/black.rst
@@ -9,9 +9,9 @@ Black
 deterministic format.

 .. seealso::
-   Was lesbaren Code auszeichnet, ist gut beschrieben im Trey Hunners Blog-Post
-   `Craft Your Python Like Poetry
-   `_.
+   What characterises readable code is well described in Trey Hunner's blog post
+   `Craft Your Python Like Poetry
+   `_.

 Installation
 ------------
diff --git a/docs/productive/qa/requests/adapters.py b/docs/productive/qa/requests/adapters.py
index e4f571609..5fbc329bc 100644
--- a/docs/productive/qa/requests/adapters.py
+++ b/docs/productive/qa/requests/adapters.py
@@ -195,7 +195,7 @@ def init_poolmanager(
             maxsize=maxsize,
             block=block,
             strict=True,
-            **pool_kwargs
+            **pool_kwargs,
         )

     def proxy_manager_for(self, proxy, **proxy_kwargs):
@@ -221,7 +221,7 @@ def proxy_manager_for(self, proxy, **proxy_kwargs):
                 num_pools=self._pool_connections,
                 maxsize=self._pool_maxsize,
                 block=self._pool_block,
-                **proxy_kwargs
+                **proxy_kwargs,
             )
         else:
             proxy_headers = self.proxy_headers(proxy)
@@ -231,7 +231,7 @@ def proxy_manager_for(self, proxy, **proxy_kwargs):
                 num_pools=self._pool_connections,
                 maxsize=self._pool_maxsize,
                 block=self._pool_block,
-                **proxy_kwargs
+                **proxy_kwargs,
             )

         return manager
diff --git a/docs/productive/qa/requests/sessions.py b/docs/productive/qa/requests/sessions.py
index 610560df4..f73068a60 100644
--- a/docs/productive/qa/requests/sessions.py
+++ b/docs/productive/qa/requests/sessions.py
@@ -166,7 +166,7 @@ def resolve_redirects(
         cert=None,
         proxies=None,
         yield_requests=False,
-        **adapter_kwargs
+        **adapter_kwargs,
     ):
         """Receives a Response. Returns a generator of Responses or Requests."""
@@ -270,7 +270,7 @@ def resolve_redirects(
             cert=cert,
             proxies=proxies,
             allow_redirects=False,
-            **adapter_kwargs
+            **adapter_kwargs,
         )

         extract_cookies_to_jar(self.cookies, prepared_request, resp.raw)
diff --git a/docs/productive/qa/requests/utils.py b/docs/productive/qa/requests/utils.py
index b9127529a..5a26ea858 100644
--- a/docs/productive/qa/requests/utils.py
+++ b/docs/productive/qa/requests/utils.py
@@ -176,7 +176,7 @@ def super_len(o):
         current_position = total_length
     else:
         if hasattr(o, "seek") and total_length is None:
-            # StringIO and BytesIO have seek but no useable fileno
+            # StringIO and BytesIO have seek but no usable fileno
             try:
                 # seek to end of file
                 o.seek(0, 2)
diff --git a/docs/productive/security.rst b/docs/productive/security.rst
index 12daf81c3..d43820009 100644
--- a/docs/productive/security.rst
+++ b/docs/productive/security.rst
@@ -136,8 +136,7 @@ security-oriented best practices for open source software development:
 * at least one static code analysis tool applied to each planned major
   production release

-You can also get a corresponding badge with the `OpenSSF Best Practices Badge
-Programm `_.
+You can also get a corresponding badge with the `OpenSSF Best Practices Badge Program `_.
 Continuous testing
 ------------------
diff --git a/docs/workspace/ipython/display.ipynb b/docs/workspace/ipython/display.ipynb
index 90b69ffc3..5292dabc2 100644
--- a/docs/workspace/ipython/display.ipynb
+++ b/docs/workspace/ipython/display.ipynb
@@ -204,7 +204,7 @@
    {
     "data": {
      "application/javascript": [
-      "alert(\"Dies ist ein Beispiel für eine durch IPython angezeigte Javascript-Warnung.\")"
+      "alert(\"This is an example of a Javascript warning displayed by IPython.\")"
      ],
      "text/plain": [
       ""
@@ -219,7 +219,7 @@
    "\n",
    "\n",
    "welcome = Javascript(\n",
-    "    'alert(\"Dies ist ein Beispiel für eine durch IPython angezeigte Javascript-Warnung.\")'\n",
+    "    'alert(\"This is an example of a Javascript warning displayed by IPython.\")'\n",
    ")\n",
    "display(welcome)"
   ]
diff --git a/docs/workspace/ipython/importing.ipynb b/docs/workspace/ipython/importing.ipynb
index e446c4b34..546ba1d1a 100644
--- a/docs/workspace/ipython/importing.ipynb
+++ b/docs/workspace/ipython/importing.ipynb
@@ -315,7 +315,7 @@
    {
     "data": {
      "application/javascript": [
-      "alert(\"Dies ist ein Beispiel für eine durch IPython angezeigte Javascript-Warnung.\")"
+      "alert(\"This is an example of a Javascript warning displayed by IPython.\")"
      ],
      "text/plain": [
       ""
diff --git a/docs/workspace/ipython/unix-shell/create-delete.ipynb b/docs/workspace/ipython/unix-shell/create-delete.ipynb
index dcaeca7b0..aa11cfbab 100644
--- a/docs/workspace/ipython/unix-shell/create-delete.ipynb
+++ b/docs/workspace/ipython/unix-shell/create-delete.ipynb
@@ -232,7 +232,7 @@
    "id": "peripheral-timer",
    "metadata": {},
    "source": [
-    "## Transfering files"
+    "## Transferring files"
    ]
   },
   {
@@ -286,7 +286,7 @@
    "* `-D` targets only the following domain name\n",
    "* `-nH` avoids creating a subdirectory for the websites content\n",
    "* `-m` mirrors with time stamping, time stamping, infinite recursion depth, and preservation of FTP directory settings\n",
-    "* `-q` supresses the output to the screen"
+    "* `-q` suppresses the output to the screen"
   ]
  },
 {
diff --git a/docs/workspace/ipython/unix-shell/grep-find.ipynb b/docs/workspace/ipython/unix-shell/grep-find.ipynb
index 77f2ee102..d650111f8 100644
--- a/docs/workspace/ipython/unix-shell/grep-find.ipynb
+++ b/docs/workspace/ipython/unix-shell/grep-find.ipynb
@@ -228,7 +228,7 @@
    "id": "e17d432c",
    "metadata": {},
    "source": [
-    "With `-type f` the search ist restricted to files."
+    "With `-type f` the search is restricted to files."
    ]
   },
   {
diff --git a/pyproject.toml b/pyproject.toml
index 9c630bc9b..cbd20d500 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -31,6 +31,7 @@ docs = [
 dev = [
     "Python4DataScience[docs]",
     "pre-commit",
+    "codespell",
 ]

 [project.urls]
@@ -39,3 +40,7 @@ dev = [

 [tool.setuptools]
 packages = []
+
+[tool.codespell]
+skip = "*.csv, *.pdf, *.ipynb, ./docs/_build/*, ./styles/*"
+ignore-words-list = "fo, AAS, ans, Groth, Ned, Redict, redict, reStructedText, splitted"

From 2421f1e1949d696ac11cd1a104834cba678fef23 Mon Sep 17 00:00:00 2001
From: Veit Schiele
Date: Tue, 10 Sep 2024 22:29:30 +0200
Subject: [PATCH 2/7] :wrench: Add vale

* Fix spelling mistakes
---
 .gitignore                                           |  3 +
 .vale.ini                                            | 11 ++-
 README.rst                                           |  8 +-
 .../nosql/document-oriented-db.rst                   |  4 +-
 docs/performance/index.rst                           |  2 +-
 docs/productive/cite/software/doi.rst                |  4 +-
 docs/productive/cite/software/hermes.rst             |  6 +-
 docs/productive/cite/software/index.rst              |  1 -
 docs/productive/dvc/metrics.rst                      |  2 +-
 docs/productive/git/best-practices.rst               |  8 +-
 docs/productive/git/install-config.rst               |  2 +-
 docs/productive/qa/pysa.rst                          |  2 +-
 docs/productive/security.rst                         |  8 +-
 pyproject.toml                                       |  1 +
 styles/.gitignore                                    |  3 +
 styles/cusy/Polite.yml                               | 11 ---
 styles/cusy/Spelling.yml                             | 11 ---
 styles/cusy/ignore.txt                               | 87 ------------------
 18 files changed, 35 insertions(+), 139 deletions(-)
 create mode 100644 styles/.gitignore
 delete mode 100644 styles/cusy/Polite.yml
 delete mode 100644 styles/cusy/Spelling.yml
 delete mode 100644 styles/cusy/ignore.txt

diff --git a/.gitignore b/.gitignore
index 81d946e0b..c9a61b838 100644 --- a/.gitignore +++ b/.gitignore @@ -30,3 +30,6 @@ panel-examples deploy-panel.html test.png pyviz.pkl + +# vale +styles/* diff --git a/.vale.ini b/.vale.ini index dc6f73e2c..64878020c 100644 --- a/.vale.ini +++ b/.vale.ini @@ -2,12 +2,11 @@ # # SPDX-License-Identifier: BSD-3-Clause -StylesPath = ./styles +StylesPath = styles MinAlertLevel = suggestion -[*.{md,rst}] -BasedOnStyles = cusy +Packages = https://github.com/cusyio/cusy-vale/archive/refs/tags/v0.1.0.zip -vale.Redundancy = YES -vale.Repetition = YES -vale.GenderBias = YES +[*.{md,rst}] +TokenIgnores = (:linenos:) +BasedOnStyles = cusy-en diff --git a/README.rst b/README.rst index d1b51ce8c..1cf603e61 100644 --- a/README.rst +++ b/README.rst @@ -130,7 +130,7 @@ Installation You can find the PDF at ``docs/_build/latex/jupytertutorial.pdf``. -#. Install vnd run ale to check spelling +#. Install and run Vale to check spelling You can install Vale with: .. code-block:: console $ brew install vale - You can install the parser for Restructuredtext with: + You can install the parser for reStructuredText with: .. code-block:: console @@ -148,7 +148,7 @@ Installation * `Vale installation `_ * `Vale formats `_ - Now you can check the RestructuredText files with: + Now you can check the reStructuredText files with: .. code-block:: console @@ -176,6 +176,6 @@ suggestions. The following guidelines help us to maintain the German translation of the tutorial: -* Write commit messages in Englisch +* Write commit messages in English * Start commit messages with a `Gitmoji `__ * Stick to English names of files and folders. diff --git a/docs/data-processing/nosql/document-oriented-db.rst b/docs/data-processing/nosql/document-oriented-db.rst index a0f47820b..8e2477e02 100644 --- a/docs/data-processing/nosql/document-oriented-db.rst +++ b/docs/data-processing/nosql/document-oriented-db.rst @@ -45,8 +45,8 @@ OrientDB and ArangoDB.
| | |* distributed systems: | | | | | | | :term:`BASE` | | | | +------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+ -| **Replication, | Master-Slave replikation, | Master-master replication | Multi-master replication | Multi-Master-Replikation, | Master-slave replication, | -| skaling** | Auto-Sharding | | | Sharding | sharding | +| **Replication, | Master-Slave replication, | Master-master replication | Multi-master replication | Multi-Master-Replication, | Master-slave replication, | +| scaling** | Auto-Sharding | | | Sharding | sharding | +------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+ | **Remarks** | `BSON` with a maximum | | | | | | | document size of 16 MB. | | | | | diff --git a/docs/performance/index.rst b/docs/performance/index.rst index 3ba1c8339..30bdf5302 100644 --- a/docs/performance/index.rst +++ b/docs/performance/index.rst @@ -199,7 +199,7 @@ Special data structures Select compiler --------------- -Faster Cpython +Faster CPython ~~~~~~~~~~~~~~ At PyCon US in May 2021, Guido van Rossum presented `Faster CPython diff --git a/docs/productive/cite/software/doi.rst b/docs/productive/cite/software/doi.rst index 024f36800..b52f06b83 100644 --- a/docs/productive/cite/software/doi.rst +++ b/docs/productive/cite/software/doi.rst @@ -17,7 +17,7 @@ example of the Jupyter tutorial: Object Identifier)` for your upload. Leave the form open to upload your software later. -#. Create or modify the :doc:`codemeta`- und :doc:`cff` files in your software +#. Create or modify the :doc:`codemeta`- and :doc:`cff` files in your software directory. #. Include the badge in the :file:`README` file of your software: @@ -49,4 +49,4 @@ example of the Jupyter tutorial: #. Create a new release: .. 
figure:: github-release.png - :alt: Github releases + :alt: GitHub releases diff --git a/docs/productive/cite/software/hermes.rst b/docs/productive/cite/software/hermes.rst index 669fd474b..d7bfbee78 100644 --- a/docs/productive/cite/software/hermes.rst +++ b/docs/productive/cite/software/hermes.rst @@ -44,10 +44,10 @@ repositories. token `_ in your user profile with the name :samp:`HERMES workflow` and the scopes - :guilabel:`deposit:actions` und :guilabel:`deposit:write`: + :guilabel:`deposit:actions` and :guilabel:`deposit:write`: .. image:: zenodo-personal-access-token.png - :alt: Zenodo: Neues persönliches Zugangstoken + :alt: Zenodo: New personal access token #. Copy the newly created token to a new `GitHub secret `_ @@ -55,7 +55,7 @@ repositories. Variables --> Actions --> New repository secret`: .. image:: github-new-action-secret.png - :alt: GitHub: Neues Action-Secret + :alt: GitHub: New action secret #. Configure the GitHub action diff --git a/docs/productive/cite/software/index.rst b/docs/productive/cite/software/index.rst index 5f0a33521..8c7574f52 100644 --- a/docs/productive/cite/software/index.rst +++ b/docs/productive/cite/software/index.rst @@ -93,7 +93,6 @@ Tools :doc:`git2prov` generates PROV data from the information in a Git repository. - generiert PROV-Daten aus den Informationen eines Git-Repository. :doc:`hermes` simplifies the publication of research software by continuously retrieving existing metadata in :doc:`cff`, :doc:`codemeta` and :doc:`Git diff --git a/docs/productive/dvc/metrics.rst b/docs/productive/dvc/metrics.rst index 6fb16ca51..1a0f9fdac 100644 --- a/docs/productive/dvc/metrics.rst +++ b/docs/productive/dvc/metrics.rst @@ -11,7 +11,7 @@ experiments. `evaluate.py `_ -calculates the AUC (**A** rea **U** nder the **C** urve). It uses the test data +calculates the :abbr:`AUC (Area Under the Curve)`. It uses the test data set, reads the features from the file ``features/test.pkl`` and creates the metrics file ``auc.metric``. 
It can be identified as a DVC metric with the ``-M`` option of `dvc run `_, diff --git a/docs/productive/git/best-practices.rst b/docs/productive/git/best-practices.rst index 78ca3b810..be56c9590 100644 --- a/docs/productive/git/best-practices.rst +++ b/docs/productive/git/best-practices.rst @@ -243,7 +243,7 @@ can activate it globally with: $ git config --global commit.cleanup scissors -Git then starts each new commit message with the *Scissorsr* line: +Git then starts each new commit message with the *Scissors* line: .. code-block:: ini @@ -282,8 +282,8 @@ You should perform the following maintenance work regularly: Validate the repo ~~~~~~~~~~~~~~~~~ -The command ``git fsck`` checks whether all objects in the internal -datastructure of git are consistently connected with each other. +The command ``git fsck`` checks whether all objects in the internal data +structure of git are consistently connected with each other. Compresses the repo ~~~~~~~~~~~~~~~~~~~ @@ -332,7 +332,7 @@ template, for example, in a stage called ``secrets-detection`` in your The template creates secret detection jobs in your CI/CD pipeline and searches the source code of your project for secrets. The results are saved as a `Secret -Detection Report Artefakt +Detection Report Artifact `_ that you can download and analyse later. diff --git a/docs/productive/git/install-config.rst b/docs/productive/git/install-config.rst index 517bf5aa7..53a37ecbb 100644 --- a/docs/productive/git/install-config.rst +++ b/docs/productive/git/install-config.rst @@ -73,7 +73,7 @@ Specify your name and email address as follows: :samp:`$ git config --global user.email "{EMAIL-ADDRESS}"` defines the email address :samp:`{EMAIL-ADDRESS}` that will be linked to your commit transactions. 
-For better readability, activate the coloring of the command line output: +For better readability, activate the colouring of the command line output: :samp:`$ git config --global color.ui auto` diff --git a/docs/productive/qa/pysa.rst b/docs/productive/qa/pysa.rst index 505d85dc1..c27d38c6d 100644 --- a/docs/productive/qa/pysa.rst +++ b/docs/productive/qa/pysa.rst @@ -54,7 +54,7 @@ Pyre can be called, for example with The ``--save-results-to`` option stores detailed results in ``./taint-output.json``. -Pysa postprozessor +Pysa postprocessor ------------------ Installation diff --git a/docs/productive/security.rst b/docs/productive/security.rst index d43820009..9985f6b20 100644 --- a/docs/productive/security.rst +++ b/docs/productive/security.rst @@ -74,11 +74,11 @@ tells you that you should investigate the situation more closely. You can also display the activities of a project with badges, for example: .. image:: https://img.shields.io/github/commit-activity/y/veit/python4datascience - :alt: Jährliche Commit-Aktivität + :alt: Annual commit activity .. image:: https://img.shields.io/github/commit-activity/m/veit/python4datascience - :alt: Monatliche Commit-Aktivität + :alt: Monthly commit activity .. image:: https://img.shields.io/github/commit-activity/w/veit/python4datascience - :alt: Wöchentliche Commit-Aktivität + :alt: Weekly commit activity Is there a safety concept for the project? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -175,7 +175,7 @@ Risk: Medium `Static code analysis `_ tests the source code before the application is executed. This can prevent known -bug classes from being accidentally introduced into the codebase. +bug classes from being accidentally introduced into the code base. 
To check for vulnerabilities, you can use `bandit `_, which you can also integrate into your diff --git a/pyproject.toml b/pyproject.toml index cbd20d500..e075a26de 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -32,6 +32,7 @@ dev = [ "Python4DataScience[docs]", "pre-commit", "codespell", + "vale", ] [project.urls] diff --git a/styles/.gitignore b/styles/.gitignore new file mode 100644 index 000000000..888cacecc --- /dev/null +++ b/styles/.gitignore @@ -0,0 +1,3 @@ +# ignore everything except .gitignore +* +!.gitignore diff --git a/styles/cusy/Polite.yml b/styles/cusy/Polite.yml deleted file mode 100644 index 71369642c..000000000 --- a/styles/cusy/Polite.yml +++ /dev/null @@ -1,11 +0,0 @@ -# SPDX-FileCopyrightText: 2021 Veit Schiele -# -# SPDX-License-Identifier: BSD-3-Clause - -extends: existence -message: 'Do not use “%s” in technical documentation.' -level: warning -ignorecase: true -tokens: - - please - - thank you diff --git a/styles/cusy/Spelling.yml b/styles/cusy/Spelling.yml deleted file mode 100644 index a0ee6690c..000000000 --- a/styles/cusy/Spelling.yml +++ /dev/null @@ -1,11 +0,0 @@ -# SPDX-FileCopyrightText: 2021 Veit Schiele -# -# SPDX-License-Identifier: BSD-3-Clause - -extends: spelling -message: "Spelling check: '%s'?" 
-dicpath: /Library/Spelling -dictionaries: - - en_GB -level: warning -ignore: styles/cusy/ignore.txt diff --git a/styles/cusy/ignore.txt b/styles/cusy/ignore.txt deleted file mode 100644 index 03c1a53d3..000000000 --- a/styles/cusy/ignore.txt +++ /dev/null @@ -1,87 +0,0 @@ -Ansible -Apache -Appmode -Arcpy -Atlassian -attrs -Bitbucket -Bullard -California -Cassandra -cffi -Codecov -cookiecutter -Codacy -concretisation -crobarcro -Cusy -cyclomatic -Cython -Dask -DataCite -Dataverse -Distrowatch -DOI -dotfiles -dvc -Ecography -Esri -FastAPI -Gigascience -GitHub -GitLab -Gitleaks -Grafana -Graphviz -Hadoop -HoloViz -hotfix -Homebrew -Howison -Hstore -Hypertable -Intersphinx -Javascript -Jinja -Jupyter -Jython -Kaggle -Kotlin -Kwalify -Lua -Manylinux -Mapnik -mathjax -Matplotlib -Modin -multiscale -neurocomputing -neuroinformatics -monorepo -Magics -Mypy -Numba -Numpy -Pandoc -Pipenv -Plotly -Prettyfier -Protobuf -psychonomic -Psycopg -Pydantic -Pysa -Pytype -Redis -Riak -Roboflow -Rossum -sharding -slideshow -Spack -Starlette -Stata -Vue -Vuetify -Wireshark -Zenodo From 9948abe71d5c72b19938be8ba7dfe14f1b9875d2 Mon Sep 17 00:00:00 2001 From: Veit Schiele Date: Thu, 26 Sep 2024 22:48:39 +0200 Subject: [PATCH 3/7] :memo: Update the file system libraries --- docs/data-processing/file-systems.rst | 298 +++++++++++++++++++ docs/data-processing/index.rst | 9 +- docs/data-processing/remote-file-systems.rst | 17 -- 3 files changed, 303 insertions(+), 21 deletions(-) create mode 100644 docs/data-processing/file-systems.rst delete mode 100644 docs/data-processing/remote-file-systems.rst diff --git a/docs/data-processing/file-systems.rst b/docs/data-processing/file-systems.rst new file mode 100644 index 000000000..13e8f3c79 --- /dev/null +++ b/docs/data-processing/file-systems.rst @@ -0,0 +1,298 @@ +.. SPDX-FileCopyrightText: 2021 Veit Schiele +.. +.. 
SPDX-License-Identifier: BSD-3-Clause + +File systems +============ + +High-level APIs +--------------- + +`PyFilesystem `_ + works with files and directories in archives, in storages, in the cloud, + :abbr:`etc (et cetera)`. + + .. image:: + https://raster.shields.io/github/stars/pyfilesystem/pyfilesystem2 + .. image:: + https://raster.shields.io/github/contributors/pyfilesystem/pyfilesystem2 + .. image:: + https://raster.shields.io/github/commit-activity/y/pyfilesystem/pyfilesystem2 + .. image:: + https://raster.shields.io/github/license/pyfilesystem/pyfilesystem2 + + Integrated file systems: + + * `AppFS `_ for predefined storage + locations in operating systems where applications can store data + * `FTPFS `_ for working with FTP + servers + * `MemoryFS `_ for caches, temporary data storage, unit tests, :abbr:`etc. (et cetera)` that exist in the + working memory + * `MountFS `_ for a virtual file + system that can mount other file systems + * `MultiFS `_ for a virtual file + system that combines other file systems + * `OSFS `_ for the OS file system + * `TarFS `_ reads and writes + compressed tar archives + * `TempFS `_ contains temporary + data + * `ZipFS `_ reads and writes Zip + files + + File systems of the PyFilesystem organization on GitHub: + + * `DropBoxFS `_ + * `S3FS `_ + * `WebDavFS `_ + + File systems from third-party developers: + + * `fs_basespace `_ for read access + to the Illumina Basespace + * `fs.dropboxfs `_ for Dropbox + * `fs.imapfs `_ for Imap + * `fs.googledrivefs `_ for + Google Drive + * `fs.onedrivefs `_ for OneDrive + * `fs.smbfs `_ for Samba + * `fs.sshfs `_ for SSH with + :ref:`paramiko` + * `mp-fs-wsgidav `_ for + WsgiDAV + +.. _fsspec: + +`fsspec `__ + Unified Python interface for many local, remote and embedded file systems + and byte storages. 
If you already use :doc:`/workspace/pandas/index`, + :doc:`/data-processing/intake/index`, :doc:`/performance/dask` or :doc:`DVC + ` in your project, for example, ``fsspec`` is already + available. + + .. image:: + https://raster.shields.io/github/stars/fsspec/filesystem_spec + .. image:: + https://raster.shields.io/github/contributors/fsspec/filesystem_spec + .. image:: + https://raster.shields.io/github/commit-activity/y/fsspec/filesystem_spec + .. image:: + https://raster.shields.io/github/license/fsspec/filesystem_spec + + In addition to the `integrated implementations + `_, + there are also many extensions, for example: + + * `abfs `_ for the Azure Blob Service + * `adl `_ for the Azure DataLake storage + * `alluxiofs `_ for the Alluxio + distributed cache + * `boxfs `_ for access to Box file storage + * `dropbox `_ for access to + Dropbox shares + * `dvc `_ for accessing a DVC repository + as a file system + * `gcsfs `_ for Google Cloud Storage + * `gdrive `_ for access to Google Drive + and shares + * `huggingface_hub + `_ + for access to the Hugging Face Hub file system + * `lakefs `_ for lakeFS + datalakes + * `ocifs `_ for access to the Oracle Cloud + Object Storage + * `ossfs `_ for the Alibaba Cloud (Aliyun) + object storage system (OSS) + * `p9fs `_ for :abbr:`9P (Plan 9 + Filesystem Protocol)` servers + * `s3fs `__ for Amazon S3 and other + compatible storage + * `wandbfs `_ for accessing Wandb data + * `webdav4 `_ for WebDAV + +.. seealso:: + `Rclone `_ is a command line programme for managing + files on cloud storage. It supports more than 70 cloud storages. You can find + an example of its use with Python in `rclone.py + `_. + +Specialised libraries +--------------------- + +`PyArrow `_ + Apache Arrow Python bindings for the `Hadoop Distributed File System (HDFS) + `_ + and other :ref:`fsspec`-compatible file systems. + + .. seealso:: + `Using fsspec-compatible filesystems with Arrow + `_ + + .. 
image:: + https://raster.shields.io/github/stars/apache/arrow + .. image:: + https://raster.shields.io/github/contributors/apache/arrow + .. image:: + https://raster.shields.io/github/commit-activity/y/apache/arrow + .. image:: + https://raster.shields.io/github/license/apache/arrow + +.. _paramiko: + +`paramiko `__ + Python implementation of the SSHv2 protocol, which offers both client and + server functions. It forms the basis for the high-level SSH library `Fabric + `_. + + .. image:: + https://raster.shields.io/github/stars/paramiko/paramiko + .. image:: + https://raster.shields.io/github/contributors/paramiko/paramiko + .. image:: + https://raster.shields.io/github/commit-activity/y/paramiko/paramiko + .. image:: + https://raster.shields.io/github/license/paramiko/paramiko + +`boto3 `_ + AWS SDK for Python facilitates integration with Amazon S3, Amazon EC2, + Amazon DynamoDB and others. + + .. image:: + https://raster.shields.io/github/stars/boto/boto3 + .. image:: + https://raster.shields.io/github/contributors/boto/boto3 + .. image:: + https://raster.shields.io/github/commit-activity/y/boto/boto3 + .. image:: + https://raster.shields.io/github/license/boto/boto3 + +`azure-storage-blob `_ + Azure Storage Blobs client library for Python. + + .. image:: + https://raster.shields.io/github/stars/Azure/azure-sdk-for-python + .. image:: + https://raster.shields.io/github/contributors/Azure/azure-sdk-for-python + .. image:: + https://raster.shields.io/github/commit-activity/y/Azure/azure-sdk-for-python + .. image:: + https://raster.shields.io/github/license/Azure/azure-sdk-for-python + +`oss2 `_ + Python SDK for the Alibaba Cloud Object Storage. + + .. image:: + https://raster.shields.io/github/stars/aliyun/aliyun-oss-python-sdk + .. image:: + https://raster.shields.io/github/contributors/aliyun/aliyun-oss-python-sdk + .. image:: + https://raster.shields.io/github/commit-activity/y/aliyun/aliyun-oss-python-sdk + .. 
image:: + https://raster.shields.io/github/license/aliyun/aliyun-oss-python-sdk + +`minio `_ + MinIO Python Client SDK for Amazon S3 compatible cloud storage. + + .. image:: + https://raster.shields.io/github/stars/minio/minio-py + .. image:: + https://raster.shields.io/github/contributors/minio/minio-py + .. image:: + https://raster.shields.io/github/commit-activity/y/minio/minio-py + .. image:: + https://raster.shields.io/github/license/minio/minio-py + +`PyDrive2 `_ + Python wrapper library of the `google-api-python-client + `_, which simplifies + many common Google Drive API tasks. + + .. image:: + https://raster.shields.io/github/stars/iterative/PyDrive2 + .. image:: + https://raster.shields.io/github/contributors/iterative/PyDrive2 + .. image:: + https://raster.shields.io/github/commit-activity/y/iterative/PyDrive2 + .. image:: + https://raster.shields.io/github/license/iterative/PyDrive2 + +`Qcloud COSv5 SDK `_ + Python SDK for the Tencent Cloud Object Storage (COS). + + .. image:: + https://raster.shields.io/github/stars/tencentyun/cos-python-sdk-v5 + .. image:: + https://raster.shields.io/github/contributors/tencentyun/cos-python-sdk-v5 + .. image:: + https://raster.shields.io/github/commit-activity/y/tencentyun/cos-python-sdk-v5 + .. image:: + https://raster.shields.io/github/license/tencentyun/cos-python-sdk-v5 + +`linode_api4 `_ + Python bindings for the Linode API v4. + + .. image:: + https://raster.shields.io/github/stars/linode/linode_api4-python + .. image:: + https://raster.shields.io/github/contributors/linode/linode_api4-python + .. image:: + https://raster.shields.io/github/commit-activity/y/linode/linode_api4-python + .. image:: + https://raster.shields.io/github/license/linode/linode_api4-python + +`airfs `_ + brings standard Python I/O to various storages (such as Alibaba Cloud OSS, + Amazon Web Services S3, GitHub, Microsoft Azure Blobs Storage and Files + Storage, OpenStack Swift/Object Store). + + ..
image:: + https://raster.shields.io/github/stars/JGoutin/airfs + .. image:: + https://raster.shields.io/github/contributors/JGoutin/airfs + .. image:: + https://raster.shields.io/github/commit-activity/y/JGoutin/airfs + .. image:: + https://raster.shields.io/github/license/JGoutin/airfs + +`yandex-s3 `_ + Asyncio-compatible SDK for Yandex Object Storage. + + .. image:: + https://raster.shields.io/github/stars/mrslow/yandex-s3 + .. image:: + https://raster.shields.io/github/contributors/mrslow/yandex-s3 + .. image:: + https://raster.shields.io/github/commit-activity/y/mrslow/yandex-s3 + .. image:: + https://raster.shields.io/github/license/mrslow/yandex-s3 + +Dormant projects +---------------- + +`PyDrive `_ + Python wrapper library of the `google-api-python-client + `_, which simplifies + many common Google Drive API tasks. + + .. image:: + https://raster.shields.io/github/stars/googlearchive/PyDrive + .. image:: + https://raster.shields.io/github/contributors/googlearchive/PyDrive + .. image:: + https://raster.shields.io/github/commit-activity/y/googlearchive/PyDrive + .. image:: + https://raster.shields.io/github/license/googlearchive/PyDrive + +`digital-ocean-spaces `_ + Python client for Digital Ocean Spaces with an inbuilt shell. + + .. image:: + https://raster.shields.io/github/stars/ChariotDev/digital-ocean-spaces + .. image:: + https://raster.shields.io/github/contributors/ChariotDev/digital-ocean-spaces + .. image:: + https://raster.shields.io/github/commit-activity/y/ChariotDev/digital-ocean-spaces + .. image:: + https://raster.shields.io/github/license/ChariotDev/digital-ocean-spaces diff --git a/docs/data-processing/index.rst b/docs/data-processing/index.rst index e7e2b2904..9d30ddf37 100644 --- a/docs/data-processing/index.rst +++ b/docs/data-processing/index.rst @@ -9,9 +9,10 @@ You can get an overview of public repositories with research data e.g. in :doc:`opendata`. 
In addition to specific Python libraries for accessing -:doc:`/data-processing/remote-file-systems` and :doc:`/data-processing/geodata`, we will -introduce you to different :doc:`serialisation formats ` and -three tools in more detail that make data accessible: +:doc:`/data-processing/file-systems` and :doc:`/data-processing/geodata`, we +will introduce you to different :doc:`serialisation formats +` and three tools in more detail that make data +accessible: * :doc:`/data-processing/pandas-io` * :doc:`httpx/index` @@ -61,7 +62,7 @@ Python packages to :doc:`clean up and validate data <../clean-prep/index>`. serialisation-formats/index intake/index httpx/index - remote-file-systems + file-systems geodata postgresql/index nosql/index diff --git a/docs/data-processing/remote-file-systems.rst b/docs/data-processing/remote-file-systems.rst deleted file mode 100644 index 843a55099..000000000 --- a/docs/data-processing/remote-file-systems.rst +++ /dev/null @@ -1,17 +0,0 @@ -.. SPDX-FileCopyrightText: 2022 Veit Schiele -.. -.. SPDX-License-Identifier: BSD-3-Clause - -Remote file systems -=================== - -`boto3 `_ - S3 -`azure-storage-blob `_ - Azure -`pydrive2 `_ - Google Drive -`paramiko `_ - SSH -`PyArrow `_ - HDFS From 3babb58be804a3ba4c41f36b863ee1c5a8d930d9 Mon Sep 17 00:00:00 2001 From: Veit Schiele Date: Fri, 27 Sep 2024 08:48:53 +0200 Subject: [PATCH 4/7] :pencil2: Fix internal link --- docs/performance/parallelise-pandas.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/performance/parallelise-pandas.rst b/docs/performance/parallelise-pandas.rst index 7f8aada93..f3848a180 100644 --- a/docs/performance/parallelise-pandas.rst +++ b/docs/performance/parallelise-pandas.rst @@ -44,9 +44,9 @@ The restrictions refer to ``pd.read_json``, which is only implemented for Dask ---- -:ref:`Dask DataFrame ` is a large parallel DataFrame made up of -multiple pandas DataFrames. 
Here, the ``dask.dataframe`` API is a subset of the -pandas API, although there are minor changes. +:ref:`/performance/dask.ipynb#dask-dataframe` is a large parallel DataFrame made +up of multiple pandas DataFrames. Here, the ``dask.dataframe`` API is a subset +of the pandas API, although there are minor changes. .. seealso:: From cd1425a8fdb2f92135f54765e5530c11f361a831 Mon Sep 17 00:00:00 2001 From: Veit Schiele Date: Tue, 1 Oct 2024 13:06:16 +0200 Subject: [PATCH 5/7] :arrow_up: Update sphinx-lint --- .pre-commit-config.yaml | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 84372006b..944e22022 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -21,6 +21,11 @@ repos: - id: check-json types: [file] # override `types: [json]` files: \.(json|ipynb)$ + - repo: https://github.com/sphinx-contrib/sphinx-lint + rev: v1.0.0 + hooks: + - id: sphinx-lint + types: [rst] - repo: https://github.com/pycqa/isort rev: 5.13.2 hooks: @@ -32,12 +37,6 @@ repos: rev: 24.8.0 hooks: - id: black - - repo: https://github.com/sphinx-contrib/sphinx-lint - rev: v0.9.1 - hooks: - - id: sphinx-lint - args: [--jobs=1] - types: [rst] - repo: https://github.com/codespell-project/codespell rev: v2.3.0 hooks: From 10fcee0749334f455d0ad00c2eeb771299bd4c63 Mon Sep 17 00:00:00 2001 From: Veit Schiele Date: Tue, 1 Oct 2024 13:08:18 +0200 Subject: [PATCH 6/7] :wrench: Add blacken-docs * Fix Python syntax --- .pre-commit-config.yaml | 7 +++++++ docs/data-processing/postgresql/db-api.rst | 4 ++++ docs/data-processing/postgresql/ipython-sql.rst | 10 +++++----- docs/productive/envs/pipenv/env.rst | 2 +- docs/productive/envs/spack/combinatorial-builds.rst | 4 ++++ docs/productive/git/advanced/bisect.rst | 4 ++++ docs/productive/qa/code-smells.rst | 6 +++--- docs/workspace/ipython/extensions.rst | 4 ++++ 8 files changed, 32 insertions(+), 9 deletions(-) diff --git a/.pre-commit-config.yaml 
b/.pre-commit-config.yaml index 944e22022..5a5a5dfc3 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -37,6 +37,13 @@ repos: rev: 24.8.0 hooks: - id: black + - repo: https://github.com/adamchainz/blacken-docs + rev: "1.18.0" + hooks: + - id: blacken-docs + args: [--line-length=79] + additional_dependencies: + - black - repo: https://github.com/codespell-project/codespell rev: v2.3.0 hooks: diff --git a/docs/data-processing/postgresql/db-api.rst b/docs/data-processing/postgresql/db-api.rst index 985ec2262..a7ce19928 100644 --- a/docs/data-processing/postgresql/db-api.rst +++ b/docs/data-processing/postgresql/db-api.rst @@ -26,6 +26,8 @@ Connection Example: + .. blacken-docs:off + .. code-block:: python import driver @@ -45,6 +47,8 @@ Connection conn.commit() conn.close() + .. blacken-docs:on + Cursor `Cursor objects `_ are used to manage the context of a ``.fetch*()`` method. diff --git a/docs/data-processing/postgresql/ipython-sql.rst b/docs/data-processing/postgresql/ipython-sql.rst index 914619943..41687c2ab 100644 --- a/docs/data-processing/postgresql/ipython-sql.rst +++ b/docs/data-processing/postgresql/ipython-sql.rst @@ -22,7 +22,7 @@ First steps #. First, ipython-sql is activated in your notebook with - .. code-block:: python + .. code-block:: ipython In [1]: %load_ext sql @@ -30,13 +30,13 @@ First steps `_ is used to connect to the database: - .. code-block:: python + .. code-block:: ipython In [2]: %sql postgresql:// #. Then you can create a table, for example: - .. code-block:: python + .. code-block:: ipython In [3]: %%sql postgresql:// ....: CREATE TABLE accounts (login, name, email) @@ -44,7 +44,7 @@ First steps #. You can query the contents of the ``accounts`` table with - .. code-block:: python + .. code-block:: ipython In [4]: result = %sql select * from accounts @@ -88,7 +88,7 @@ pandas If pandas is installed, the ``DataFrame`` method can be used: -.. code-block:: python +.. 
code-block:: ipython In [5]: result = %sql SELECT * FROM accounts diff --git a/docs/productive/envs/pipenv/env.rst b/docs/productive/envs/pipenv/env.rst index 284ace6fc..122a8f253 100644 --- a/docs/productive/envs/pipenv/env.rst +++ b/docs/productive/envs/pipenv/env.rst @@ -28,7 +28,7 @@ and ``$ pipenv run`` will automatically load it: Loading .env environment variables... … -.. code-block:: python +.. code-block:: pycon >>> import os >>> os.environ["USERNAME"] diff --git a/docs/productive/envs/spack/combinatorial-builds.rst b/docs/productive/envs/spack/combinatorial-builds.rst index 991a8f94d..21e597a92 100644 --- a/docs/productive/envs/spack/combinatorial-builds.rst +++ b/docs/productive/envs/spack/combinatorial-builds.rst @@ -163,6 +163,8 @@ Spack provides a ``spec`` syntax for describing custom DAGs: * Spack packages are simple Python scripts: + .. blacken-docs:off + .. code-block:: python from spack import * @@ -198,6 +200,8 @@ Spack provides a ``spec`` syntax for describing custom DAGs: make() make("install") + .. blacken-docs:on + * Dependencies in Spack can be optional: * You can define *named variants*, for example in diff --git a/docs/productive/git/advanced/bisect.rst b/docs/productive/git/advanced/bisect.rst index fa50e6dd5..c7cb7bdc0 100644 --- a/docs/productive/git/advanced/bisect.rst +++ b/docs/productive/git/advanced/bisect.rst @@ -130,6 +130,8 @@ complicated changes in behaviour. For performance tests, we need a test programme that can perform multiple runs and determine the minimum time while eliminating possible noise: +.. blacken-docs:off + .. code-block:: python from subprocess import run @@ -153,6 +155,8 @@ eliminating possible noise: print("Fast enough") raise SystemExit(0) +.. blacken-docs:on + The programme executes :samp:`python perftest.py {PARAM}` ten times and measures the time for each execution. It then compares the minimum execution time with a limit value of ``X`` seconds. 
If the minimum time is above the limit value, it diff --git a/docs/productive/qa/code-smells.rst b/docs/productive/qa/code-smells.rst index a8a8aa937..d3a79abcf 100644 --- a/docs/productive/qa/code-smells.rst +++ b/docs/productive/qa/code-smells.rst @@ -59,14 +59,14 @@ method, and ``get_thumbnail()`` a property: thumbnail_resolution = 128 def __init__(self, path): - ... + "..." def crop(self, width, height): - ... + "..." @property def thumbnail(self): - ... + "..." return thumb Objects that should be functions diff --git a/docs/workspace/ipython/extensions.rst b/docs/workspace/ipython/extensions.rst index f25a9a07c..be8edc39a 100644 --- a/docs/workspace/ipython/extensions.rst +++ b/docs/workspace/ipython/extensions.rst @@ -54,6 +54,8 @@ Writing IPython extensions An IPython extension is an importable Python module that has special functions for loading and unloading: +.. blacken-docs:off + .. code-block:: python def load_ipython_extension(ipython): @@ -64,5 +66,7 @@ for loading and unloading: def unload_ipython_extension(ipython): # If you want your extension to be unloadable, put that logic here. +.. blacken-docs:on + .. 
seealso:: * :label:`defining_magics` From 96dc51da01a5c002922dd758fb676bd0ab692f59 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 7 Oct 2024 23:52:16 +0000 Subject: [PATCH 7/7] [pre-commit.ci] pre-commit autoupdate MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pre-commit/pre-commit-hooks: v4.6.0 → v5.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.6.0...v5.0.0) - [github.com/psf/black: 24.8.0 → 24.10.0](https://github.com/psf/black/compare/24.8.0...24.10.0) - [github.com/adamchainz/blacken-docs: 1.18.0 → 1.19.0](https://github.com/adamchainz/blacken-docs/compare/1.18.0...1.19.0) --- .pre-commit-config.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 5a5a5dfc3..1817f8751 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -10,7 +10,7 @@ ci: repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.6.0 + rev: v5.0.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer @@ -34,11 +34,11 @@ repos: entry: isort --profile=black name: isort (python) - repo: https://github.com/psf/black - rev: 24.8.0 + rev: 24.10.0 hooks: - id: black - repo: https://github.com/adamchainz/blacken-docs - rev: "1.18.0" + rev: "1.19.0" hooks: - id: blacken-docs args: [--line-length=79]