pybase

This is a codebase for basic Python utilities.

Dependencies

To install the dependencies:

pip install -r requirements.txt

To set up PySpark, make sure Java and Spark are available on the machine. Then, set the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to point to the Python executable.
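
As a quick sanity check, the variables can also be set programmatically before creating a Spark session. This is a minimal sketch, assuming pyspark is installed and the current interpreter is the one the workers should use:

import os
import sys

# Point both the workers and the driver to the current Python interpreter
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("pybase").getOrCreate()
print(spark.range(5).count())  # 5
spark.stop()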

Press to get the instructions for PySpark on Linux or macOS in a Conda environment

To set these variables every time the environment is activated, we can follow the steps of this guide. First, get the path of the Conda environment where pybase is installed:

CONDA_ENV=$(conda env list | grep pybase | awk '{print $NF}')

Then, create the file $CONDA_ENV/etc/conda/activate.d/env_vars.sh and add:

#!/bin/sh
# Resolve the path of the pybase Conda environment
CONDA_ENV=$(conda env list | grep pybase | awk '{print $NF}')
# Point PySpark to the environment's Python interpreter
export PYSPARK_PYTHON=$CONDA_ENV/bin/python
export PYSPARK_DRIVER_PYTHON=$CONDA_ENV/bin/python
# Save the current SPARK_HOME and point it to the local Spark installation
export SPARK_HOME_BACKUP=$SPARK_HOME
export SPARK_HOME=/home/root/installer/spark

This will export the variables every time we do conda activate pybase. To unset these variables when we deactivate the environment, create the file $CONDA_ENV/etc/conda/deactivate.d/env_vars.sh and add:

#!/bin/sh
unset PYSPARK_PYTHON
unset PYSPARK_DRIVER_PYTHON
export SPARK_HOME=$SPARK_HOME_BACKUP
unset SPARK_HOME_BACKUP
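
After conda activate pybase, a quick way to confirm that the activation hook ran is to inspect the environment from Python. A minimal sketch (the variable names are the ones exported above):

import os

# These should point to the Python interpreter of the pybase environment
print(os.environ.get("PYSPARK_PYTHON"))
print(os.environ.get("PYSPARK_DRIVER_PYTHON"))
# This should point to the local Spark installation
print(os.environ.get("SPARK_HOME"))
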
Press to get the instructions for PySpark on Windows in a Conda environment

To set these variables every time the environment is activated, we can follow the steps of this guide. First, get the path of the environment where pybase is installed:

for /f "delims=" %A in ('conda env list ^| grep pybase ^| awk "{print $NF}"') do set "CONDA_ENV=%A"

Then, create the file %CONDA_ENV%\etc\conda\activate.d\env_vars.bat and add:

@echo off
REM Resolve the path of the pybase Conda environment
for /f "delims=" %%A in ('conda env list ^| grep pybase ^| awk "{print $NF}"') do set "CONDA_ENV=%%A"
REM Point PySpark to the environment's Python interpreter
set PYSPARK_PYTHON=%CONDA_ENV%\python.exe
set PYSPARK_DRIVER_PYTHON=%CONDA_ENV%\python.exe
REM Save and clear SPARK_HOME and PYTHONPATH to avoid conflicts
set SPARK_HOME_BACKUP=%SPARK_HOME%
set SPARK_HOME=
set PYTHONPATH_BACKUP=%PYTHONPATH%
set PYTHONPATH=

This will set the variables every time we do conda activate pybase. To unset these variables when we deactivate the environment, create the file %CONDA_ENV%\etc\conda\deactivate.d\env_vars.bat and add:

@echo off
set PYSPARK_PYTHON=
set PYSPARK_DRIVER_PYTHON=
set SPARK_HOME=%SPARK_HOME_BACKUP%
set SPARK_HOME_BACKUP=
set PYTHONPATH=%PYTHONPATH_BACKUP%
set PYTHONPATH_BACKUP=

See more details on how to install PySpark on Windows here.

Press to get the instructions for CUDA and CuDNN on Linux or macOS

TODO

Press to get the instructions for CUDA and CuDNN on Windows
  1. Check the capability of your GPU here.
  2. Select the version of CUDA toolkit you want to download. The latest version can be found here.
  3. Download the corresponding CuDNN based on the CUDA version here.
  4. Copy the three files from the unzipped directory to the CUDA X.X install location. For reference, the NVIDIA team keeps them in folders that mirror the CUDA layout, so all you have to do is copy each file from:
    • {unzipped dir}/bin/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\bin
    • {unzipped dir}/include/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\include
    • {unzipped dir}/lib/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\lib

See the full installation guide here.
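
Once CUDA and CuDNN are installed, a quick way to verify that they are visible is from Python. A minimal sketch, assuming PyTorch is available in the environment (it is not necessarily part of requirements.txt):

import torch

# True if the CUDA driver and toolkit are correctly set up
print(torch.cuda.is_available())
# cuDNN version detected by PyTorch
print(torch.backends.cudnn.version())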

Doctests

To execute the tests:

pytest --doctest-modules --continue-on-collection-errors --durations 0 --disable-warnings
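
The flag --doctest-modules makes pytest collect the examples embedded in docstrings and run them as tests. A minimal sketch of what such a function looks like (the add function is hypothetical, not part of pybase):

def add(a, b):
    """Add two numbers.

    Examples:
        >>> add(2, 3)
        5
    """
    return a + b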

To execute coverage and see the report:

coverage run playground.py
coverage report

To see more details on the result, the following command will generate an HTML report (in the htmlcov folder) where the coverage details can be examined line by line:

coverage html

To handle variable outputs in doctests, add # doctest: +ELLIPSIS at the end of the execution line and substitute the variable part of the output with an ellipsis (...). An example can be found in the file timer.py.

Original:

>>> "Time elapsed {}".format(t)
'Time elapsed 0:00:1.9875734'

With ellipsis:

>>> "Time elapsed {}".format(t) # doctest: +ELLIPSIS
'Time elapsed 0:00:...'
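
A self-contained sketch of the same idea (elapsed_message is a hypothetical helper, not the actual code in timer.py):

import time
from datetime import timedelta

def elapsed_message(start):
    """Format the time elapsed since ``start``.

    Examples:
        >>> start = time.time()
        >>> time.sleep(0.1)
        >>> elapsed_message(start)  # doctest: +ELLIPSIS
        'Time elapsed 0:00:00...'
    """
    return "Time elapsed {}".format(timedelta(seconds=time.time() - start))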

To skip a test, one can also add: # doctest: +SKIP.
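
For instance, a sketch of skipping an example that depends on external state (the file name is arbitrary):

>>> import os
>>> os.remove("some_file.txt")  # doctest: +SKIP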

To handle exceptions, you can just add the Traceback info, then ... and then the exception:

>>> raise ValueError("Something bad happened")
Traceback (most recent call last):
    ...
ValueError: Something bad happened

To execute a context manager with doctests:

>>> with TemporaryDirectory() as td:
...     print(td.name)
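
With the standard library's tempfile.TemporaryDirectory, the context manager yields the directory path as a string, so a complete, checkable example might look like this (assuming Unix-style paths):

>>> from tempfile import TemporaryDirectory
>>> with TemporaryDirectory() as td:
...     print(td)  # doctest: +ELLIPSIS
/...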

Documentation

For the documentation, I'm using the Google style for docstrings.
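
A short sketch of what a Google-style docstring looks like (the function is hypothetical):

def divide(a, b):
    """Divide two numbers.

    Args:
        a (float): Numerator.
        b (float): Denominator.

    Returns:
        float: The result of the division.

    Raises:
        ZeroDivisionError: If ``b`` is zero.

    Examples:
        >>> divide(6, 3)
        2.0
    """
    return a / b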

To add a code block that can be rendered with sphinx:

.. code-block:: python

    import sys
    print(sys.executable) 

This is equivalent, keeping the Python syntax highlighting:

Code::

    import sys
    print(sys.executable)

To add a note:

.. note::

    This is a note

or

Note:
    This is a note

Install libraries with different Python versions

In the requirements.txt file, you can restrict a library to specific Python versions. For example:

dask[dataframe]>=0.17.1;python_version=='3.6'
dask>=0.17.1;python_version>='3.7'
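
These version restrictions are PEP 508 environment markers, which pip evaluates against the running interpreter before installing each requirement. A minimal sketch of how a marker is evaluated, assuming the packaging package is installed:

from packaging.markers import Marker

# True when running on Python 3.7 or newer
print(Marker("python_version >= '3.7'").evaluate())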