This is a codebase for basic Python utilities.
To install the dependencies:
pip install -r requirements.txt
To set up PySpark, make sure Java and Spark are available on the machine. Then set the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to point to the Python executable.
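As a quick sanity check, the sketch below reports whether both variables are set and point to an existing file. The helper name check_pyspark_env is hypothetical and not part of this codebase.

```python
import os


def check_pyspark_env(environ=None):
    """Map each PySpark variable to True if it points to an existing file."""
    # Fall back to the real process environment when no mapping is given
    environ = os.environ if environ is None else environ
    names = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")
    # An unset variable becomes "", which os.path.isfile reports as False
    return {name: os.path.isfile(environ.get(name, "")) for name in names}


if __name__ == "__main__":
    for name, ok in check_pyspark_env().items():
        print("{}: {}".format(name, "OK" if ok else "missing or not a file"))
```

Passing a plain dictionary instead of os.environ makes the check easy to try without touching the real environment.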
Instructions for PySpark on Linux or macOS in a Conda environment:
To set these variables every time the environment is activated, we can follow the steps of this guide. First, get the path where the Conda environment pybase is installed:
CONDA_ENV=$(conda env list | grep pybase | awk '{print $NF}')
Then, create the file $CONDA_ENV/etc/conda/activate.d/env_vars.sh and add:
#!/bin/sh
CONDA_ENV=$(conda env list | grep pybase | awk '{print $NF}')
export PYSPARK_PYTHON=$CONDA_ENV/bin/python
export PYSPARK_DRIVER_PYTHON=$CONDA_ENV/bin/python
export SPARK_HOME_BACKUP=$SPARK_HOME
export SPARK_HOME=/home/root/installer/spark
This exports the variables every time we run conda activate pybase.
To unset these variables when we deactivate the environment, create the file $CONDA_ENV/etc/conda/deactivate.d/env_vars.sh and add:
#!/bin/sh
unset PYSPARK_PYTHON
unset PYSPARK_DRIVER_PYTHON
export SPARK_HOME=$SPARK_HOME_BACKUP
unset SPARK_HOME_BACKUP
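The two files can also be created from Python. The following is a minimal sketch, not part of this codebase: the helper write_conda_hooks is hypothetical, and it writes the environment path directly into the activation script instead of calling conda env list again.

```python
from pathlib import Path

# Activation script; {env} and {spark_home} are filled in by write_conda_hooks
ACTIVATE_TEMPLATE = """#!/bin/sh
export PYSPARK_PYTHON={env}/bin/python
export PYSPARK_DRIVER_PYTHON={env}/bin/python
export SPARK_HOME_BACKUP=$SPARK_HOME
export SPARK_HOME={spark_home}
"""

# Deactivation script, identical to the one shown above
DEACTIVATE_SCRIPT = """#!/bin/sh
unset PYSPARK_PYTHON
unset PYSPARK_DRIVER_PYTHON
export SPARK_HOME=$SPARK_HOME_BACKUP
unset SPARK_HOME_BACKUP
"""


def write_conda_hooks(conda_env, spark_home):
    """Write both env_vars.sh hooks under conda_env and return their paths."""
    env = Path(conda_env)
    activate = env / "etc" / "conda" / "activate.d" / "env_vars.sh"
    deactivate = env / "etc" / "conda" / "deactivate.d" / "env_vars.sh"
    for path in (activate, deactivate):
        # The hook directories do not exist in a fresh environment
        path.parent.mkdir(parents=True, exist_ok=True)
    activate.write_text(ACTIVATE_TEMPLATE.format(env=env, spark_home=spark_home))
    deactivate.write_text(DEACTIVATE_SCRIPT)
    return activate, deactivate
```

With the environment active, a call such as write_conda_hooks(os.environ["CONDA_PREFIX"], "/home/root/installer/spark") would write both files, assuming Conda has set CONDA_PREFIX for the active environment.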
Instructions for PySpark on Windows in a Conda environment:
To set these variables every time the environment is activated, we can follow the steps of this guide. First, get the path where the environment pybase is installed. The command below assumes that grep and awk are available on the PATH:
for /f "delims=" %A in ('conda env list ^| grep pybase ^| awk "{print $NF}"') do set "CONDA_ENV=%A"
Then, create the file %CONDA_ENV%\etc\conda\activate.d\env_vars.bat and add:
@echo off
for /f "delims=" %%A in ('conda env list ^| grep pybase ^| awk "{print $NF}"') do set "CONDA_ENV=%%A"
set PYSPARK_PYTHON=%CONDA_ENV%\python.exe
set PYSPARK_DRIVER_PYTHON=%CONDA_ENV%\python.exe
set SPARK_HOME_BACKUP=%SPARK_HOME%
set SPARK_HOME=
set PYTHONPATH_BACKUP=%PYTHONPATH%
set PYTHONPATH=
This sets the variables every time we run conda activate pybase.
To unset these variables when we deactivate the environment, create the file %CONDA_ENV%\etc\conda\deactivate.d\env_vars.bat and add:
@echo off
set PYSPARK_PYTHON=
set PYSPARK_DRIVER_PYTHON=
set SPARK_HOME=%SPARK_HOME_BACKUP%
set SPARK_HOME_BACKUP=
set PYTHONPATH=%PYTHONPATH_BACKUP%
set PYTHONPATH_BACKUP=
See more details on how to install PySpark on Windows here.
Instructions for CUDA and cuDNN on Linux or macOS:
TODO
Instructions for CUDA and cuDNN on Windows:
- Check the capability of your GPU here.
- Select the version of CUDA toolkit you want to download. The latest version can be found here.
- Download the cuDNN release that matches your CUDA version here.
- Copy the cuDNN files from the unzipped directory to the CUDA X.X install location. The NVIDIA archive already groups them into matching directories, so you only need to copy the contents of each one:
- {unzipped dir}/bin/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\bin
- {unzipped dir}/include/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\include
- {unzipped dir}/lib/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\lib
See the full installation guide here.
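The copy step can be scripted. The sketch below is only an illustration: copy_cudnn_files is a hypothetical helper, both paths are supplied by the caller, and the dirs_exist_ok argument requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def copy_cudnn_files(unzipped_dir, cuda_dir):
    """Copy the contents of bin, include and lib from unzipped_dir into cuda_dir."""
    copied = []
    for sub in ("bin", "include", "lib"):
        src = Path(unzipped_dir) / sub
        dst = Path(cuda_dir) / sub
        if not src.is_dir():
            continue  # skip folders that the archive does not contain
        # dirs_exist_ok=True merges the files into the existing CUDA folders
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied
```

Writing under C:\Program Files usually requires an elevated prompt, so the script would need to run with administrator rights.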
To execute the tests:
pytest --doctest-modules --continue-on-collection-errors --durations 0 --disable-warnings
To execute coverage and see the report:
coverage run playground.py
coverage report
To see more details, the following command generates an HTML report where the coverage can be examined line by line:
coverage html
To handle variable outputs in doctest, add # doctest: +ELLIPSIS at the end of the execution line and replace the variable part of the expected output with three dots (...). An example can be found in the file timer.py.
Original:
>>> "Time elapsed {}".format(t)
'Time elapsed 0:00:1.9875734'
With ellipsis:
>>> "Time elapsed {}".format(t) # doctest: +ELLIPSIS
'Time elapsed 0:00:...'
To skip a test, add # doctest: +SKIP at the end of the execution line.
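Both directives can be tried in a self-contained script. This is a minimal sketch: elapsed_message is a hypothetical function, not the code in timer.py, and the doctest runner is built explicitly so that only this docstring is executed.

```python
import doctest
import time


def elapsed_message(start):
    """Format the time elapsed since start (a time.perf_counter() value).

    >>> elapsed_message(time.perf_counter())  # doctest: +ELLIPSIS
    'Time elapsed 0.0... s'
    >>> elapsed_message("not a number")  # doctest: +SKIP
    'never executed, so this output is never compared'
    """
    return "Time elapsed {:.4f} s".format(time.perf_counter() - start)


# Collect and run only the examples in the docstring above
runner = doctest.DocTestRunner(verbose=False)
namespace = {"elapsed_message": elapsed_message, "time": time}
for test in doctest.DocTestFinder().find(elapsed_message, globs=namespace):
    runner.run(test)
print("failures:", runner.failures)
```

The second example would raise a TypeError if it ran; it does not count as a failure because +SKIP keeps doctest from executing it.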
To handle exceptions, add the Traceback header, then ..., and then the exception type and message exactly as Python prints them (the message is not wrapped in quotes):
>>> raise ValueError("Something bad happened")
Traceback (most recent call last):
    ...
ValueError: Something bad happened
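The same pattern works inside a function docstring. In this sketch, parse_positive is a hypothetical function used only to show a passing traceback example next to a normal one.

```python
import doctest


def parse_positive(text):
    """Convert text to a positive integer.

    >>> parse_positive("3")
    3
    >>> parse_positive("-3")
    Traceback (most recent call last):
        ...
    ValueError: expected a positive integer, got -3
    """
    value = int(text)
    if value <= 0:
        raise ValueError("expected a positive integer, got {}".format(value))
    return value


# Run the two examples from the docstring above
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(parse_positive, globs={"parse_positive": parse_positive}):
    runner.run(test)
print("failures:", runner.failures)
```

doctest ignores the traceback body, so only the header line and the final exception line need to match.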
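To execute a context manager with doctests, indent the body after the ... continuation prompt. TemporaryDirectory() yields the directory path as a string when used in a with statement, so the path is printed directly. The path changes on every run, so the example is marked with +SKIP:
>>> from tempfile import TemporaryDirectory
>>> with TemporaryDirectory() as td:
...     print(td)  # doctest: +SKIP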
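When the output inside the with block is deterministic, the example can run as a real test. In this sketch, count_files is a hypothetical function, and the whole with statement counts as a single doctest example.

```python
import doctest
import os
from tempfile import TemporaryDirectory


def count_files(directory):
    """Count the entries in a directory.

    >>> with TemporaryDirectory() as td:
    ...     open(os.path.join(td, "a.txt"), "w").close()
    ...     print(count_files(td))
    1
    """
    return len(os.listdir(directory))


# Run the single example from the docstring above
runner = doctest.DocTestRunner(verbose=False)
namespace = {"count_files": count_files, "os": os, "TemporaryDirectory": TemporaryDirectory}
for test in doctest.DocTestFinder().find(count_files, globs=namespace):
    runner.run(test)
print("failures:", runner.failures)
```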
For the documentation, I'm using the Google Style.
To add a code block that can be rendered with Sphinx:
.. code-block:: python

    import sys
    print(sys.executable)
This is equivalent and is also rendered with Python syntax:
Code::

    import sys
    print(sys.executable)
To add a note:
.. note::

    This is a note

or
Note:
    This is a note
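Put together, a Google Style docstring with a note and an example might look like the sketch below. The function scale is hypothetical. Rendering the Note, Args and Returns sections with Sphinx assumes the sphinx.ext.napoleon extension is enabled.

```python
def scale(values, factor=2.0):
    """Multiply every value by a constant factor.

    Note:
        The input list is not modified; a new list is returned.

    Args:
        values (list of float): Numbers to scale.
        factor (float): Multiplier applied to each number.

    Returns:
        list of float: The scaled numbers.

    Examples:
        >>> scale([1.0, 2.5], factor=2.0)
        [2.0, 5.0]
    """
    return [value * factor for value in values]
```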
In the requirements.txt file, you can restrict a requirement to specific Python versions by adding an environment marker after a semicolon. For example:
dask[dataframe]>=0.17.1;python_version=='3.6'
dask>=0.17.1;python_version>='3.7'
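To see which of these lines applies to a given interpreter, the markers can be evaluated with the third-party packaging library. This is a sketch that assumes packaging is installed. The helper applicable is hypothetical, and the Python version is passed in explicitly rather than read from the running interpreter.

```python
from packaging.requirements import Requirement

lines = [
    "dask[dataframe]>=0.17.1;python_version=='3.6'",
    "dask>=0.17.1;python_version>='3.7'",
]


def applicable(lines, python_version):
    """Return the requirement lines whose marker holds for python_version."""
    selected = []
    for line in lines:
        req = Requirement(line)
        # A requirement without a marker always applies
        if req.marker is None or req.marker.evaluate({"python_version": python_version}):
            selected.append(line)
    return selected


print(applicable(lines, "3.6"))
print(applicable(lines, "3.9"))
```

Markers are compared as versions, not as strings, so "3.10" satisfies python_version>='3.7'.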