
General

We use Conda as our main tool for environment management. For each of our Python repos, you should find a related Conda environment.
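As a rough sketch of that workflow (the file and environment names below are illustrative assumptions, not the actual repo contents), recreating such an environment usually looks like this:

```bash
# Recreate a repo's environment from its environment file
# (file name and environment name are assumptions for illustration)
conda env create -f environment.yml
conda activate my-project
```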

Miniconda

Miniconda is recommended, as it contains only the essential features needed to build useful Conda environments.
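A minimal sketch of a typical Miniconda installation on Linux (the installer URL reflects the usual download location and may change over time):

```bash
# Download and run the Miniconda installer, then reload the shell config
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
```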

Why Conda?

Why should you even use Conda? Imagine a very real situation you can run into when working with ML software.

Naive approach

Let's assume you don't use Conda and install everything with pip (the standard Python package installer) directly on your host machine.

  1. You want to install Nvidia CUDA on your host machine to make use of your powerful GPUs, so you install the newest, fastest version (11.8 at the time of writing). You spend an hour getting a successful installation with drivers that work well with your system.
  2. Then you want to install your favorite Machine Learning framework, PyTorch. You go to the PyTorch website and see that the newest CUDA version supported by PyTorch 1.13.0 is 11.7 - lower than yours. 😠
  3. You spend half an hour downgrading your CUDA to 11.7 and successfully install PyTorch. You are very happy!
  4. Then you see a cool new neural network implemented in a different popular framework, TensorFlow. You want to install it on your host, but you see it works only with CUDA 11.2. What to do now? 🤯
  5. To make you even sadder, you stumble upon a repository with an amazing neural network built in an older version of PyTorch or TensorFlow. You might even need an older version of Python or of 1000 different libraries - you are now doomed. ☠️

In short, you can easily get stuck in Dependency Hell 😈

This is basically a path every practitioner has had to follow, and it takes a lot of time to learn the tools and practices involved. To begin with, it might take you a few attempts before you can set up CUDA on Ubuntu successfully. Then it will randomly break for no apparent reason. Every time you fix something, you gain valuable experience that lets you solve similar problems faster next time.

Conda (correct) approach

To save yourself a lot of time, use Conda. Nowadays you will even see Conda recommended as the installation method for these frameworks. You can install a different CUDA toolkit version in each separate environment (those local CUDA libraries are compatible with your host driver, which can be newer).

  1. You install the newest CUDA driver for your host
  2. You set up an environment for PyTorch
  3. Then you set up a different environment for TensorFlow
  4. Then you can set up further environments for older versions of these frameworks... (see the sketch below)
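A minimal sketch of what steps 2-4 might look like (the environment names and version pins are illustrative assumptions, not a prescribed setup):

```bash
# One environment per framework, each with its own CUDA toolkit
conda create -n torch-env pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
conda create -n tf-env -c conda-forge python=3.9 cudatoolkit=11.2 cudnn
# (TensorFlow itself is then typically installed with pip inside tf-env)
conda activate torch-env   # switch to whichever environment the project needs
```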

It's as easy as it gets 😃. You can set up a new environment for any new project you want. The only drawback of using Conda is storage usage - many environments take up a lot of space. Sometimes it's worth preparing environments you can share between a few projects.
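When disk space becomes an issue, the usual housekeeping looks roughly like this (the environment name is an assumption for illustration):

```bash
conda env list                    # see which environments you have
conda remove -n old-env --all     # delete an environment you no longer need
conda clean --all                 # clear cached package tarballs and indexes
```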

Conda vs pip venv vs Docker

venv with pip

One of the most popular environment solutions for Python is a virtual environment (venv) combined with a pip requirements.txt file, which installs the listed packages. You can get separation between many environments this way, but you can't guarantee that the installed package versions will be compatible with each other - you might run into dependency errors. For example, if you list the cudnn package in a Conda environment, Conda's solver will adjust the cudnn version to match the selected CUDA; pip performs no such cross-package resolution.
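For comparison, a minimal sketch of the venv-plus-pip workflow (the paths and file names are the conventional ones, assumed here):

```bash
python -m venv .venv               # create an isolated environment in ./.venv
source .venv/bin/activate          # activate it (on Linux/macOS)
pip install -r requirements.txt    # install the listed packages; versions are not checked against CUDA
```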

Docker

Docker is a wonderful tool - really. It allows you to run many containers on one host, where each container is separated and runs a different environment (or even OS) without any conflicts. Containers are as isolated from the host as possible, but you can still mount drives, expose them to the network, and allow them to use GPUs or USB devices (like the ZED Camera, check this) - though it takes some hassle. Configuring Docker containers involves a lot of overhead whenever they need extra resources, and it requires a fair amount of knowledge. If we had to deploy our code to 1000 Okońs, then Docker would be the best tool to use.
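For instance, a hedged sketch of running GPU code in a container (the image tag and script name are assumptions; --gpus all requires the NVIDIA Container Toolkit on the host):

```bash
# Run a training script inside an isolated PyTorch container with GPU access
docker run --gpus all --rm -v "$PWD":/workspace -w /workspace \
    pytorch/pytorch:1.13.0-cuda11.7-cudnn8-runtime python train.py
```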