TEH, Chi-En edited this page Oct 3, 2022 · 2 revisions

Python has become the most popular language for machine learning in recent years. It is also a common language for people who have just started learning programming in general.

Installation

You can find a rather comprehensive installation guide (unofficial) at https://realpython.com/installing-python/. Here, we make some remarks to complement what the webpage did not mention, update, or emphasize very well.

Choosing a version

The latest version is usually good enough for most purposes. Just be aware that some libraries may not catch up with the latest version right after the release. For example, Tensorflow only started supporting Python 3.10 a few months after the release (read more here). You can always visit the library homepage to check it.

Of course, you should also avoid installing a Python version that is too old or about to reach end of support. For example, support for Python 3.7 is scheduled to end on 2023-06-27, meaning that any bugs found after that date will no longer be fixed.
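If you are not sure which version you are currently running, Python can tell you itself. The snippet below is a minimal sketch; the minimum version (3, 8) is only a hypothetical threshold chosen for illustration, so replace it with whatever your libraries require.

```python
import sys

# Hypothetical minimum version for a project; adjust to your own needs.
MINIMUM = (3, 8)

def version_ok(current=sys.version_info, minimum=MINIMUM):
    """Return True when the given interpreter version is at least `minimum`."""
    return tuple(current[:2]) >= tuple(minimum)

# Report the version of the interpreter running this script.
print("Running Python", ".".join(str(n) for n in sys.version_info[:3]))
print("New enough:", version_ok())
```

The comparison only looks at the major and minor numbers, which is usually what library compatibility notes refer to.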

Operating Systems

Pick your poison:

Windows 🪟

Python does not come directly with Windows OS.

There are at least two popular ways of getting Python installed on a Windows machine: through the Anaconda distribution or the full official installer. For whatever reason, Anaconda has become quite popular at MSU in various CS classes. But here we would recommend against it for now (see here for its downsides).

Also, pay attention to whether you are downloading the 32-bit version or the 64-bit version. Most likely, you are using a 64-bit machine.
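Once Python is installed, you can double-check which build you ended up with. The sketch below reports the bitness of the Python interpreter itself (not of the operating system), using only the standard library. The function name interpreter_bits is our own, made up for this example.

```python
import struct
import sys

def interpreter_bits():
    """Size of a C pointer, in bits, for the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8

print(interpreter_bits(), "bit Python")

# An equivalent check based on the largest native integer index.
print("64-bit build:", sys.maxsize > 2**32)
```

Note that a 32-bit Python can run on a 64-bit machine, so a result of 32 here means you probably picked the wrong installer rather than that your machine is 32-bit.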


Mac OS 🍎

Mac OS typically comes with Python. So you may or may not need to install it again. Use your terminal to check the current version installed (see here).

There are at least three popular ways of getting Python installed on a Macbook: through the Anaconda distribution, the Homebrew package manager, or the full official installer. For whatever reason, Anaconda has become quite popular at MSU in various CS classes. But we would recommend against it for now (see here for its downsides). Homebrew is usually fine if you are already familiar with it. Otherwise, just stick to the official installer. See here for instructions.

Also, pay attention to the chip that your Macbook is using. That will determine whether you should get the universal installer or the Intel-only installer. Read here.
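If a Python interpreter is already available on your Mac, it can report the chip family for you. This is a rough sketch: the helper describe_chip and its labels are our own, and the mapping is only meaningful on Mac OS. A Python that was itself built for Intel may report x86_64 even on an Apple silicon machine, so treat the answer as a hint.

```python
import platform

def describe_chip(machine):
    """Map a platform.machine() string to a chip family label (Mac OS only)."""
    if machine == "arm64":
        return "Apple silicon"
    if machine == "x86_64":
        return "Intel"
    return "unknown (" + machine + ")"

# platform.system() returns "Darwin" on Mac OS.
print(platform.system(), describe_chip(platform.machine()))
```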


Linux 🐧

Many distros already come with Python installed. If it is already the version you need, then you don't have to install it again.

There are many ways to get Python installed on a Linux machine. Downloading through the Anaconda distribution is certainly one way, but we would recommend against it for now (see here for its downsides). Other than that, please check out https://realpython.com/installing-python/#how-to-install-python-on-linux.


Package management

When using Python, you almost always want to import some packages to save time and effort. Consider a simple example of computing the sample standard deviation for a given list x_list = [2, 3, 5, 7], which can be implemented without importing any libraries as below:

x_list = [2, 3, 5, 7]
n = len(x_list)
x_mean = sum(x_list) / n
# Sample standard deviation: divide the sum of squared deviations by n - 1
x_stdev = (sum([(x - x_mean)**2 for x in x_list]) / (n - 1))**0.5 # 2.217355782608345

But unless this is your home assignment, why would you want to waste your time reinventing the wheel? 🦽

During actual development, we almost always prefer to use code written by other people, which has probably been optimized and tested numerous times. The Python standard library includes a module called statistics, which we can import to calculate the sample standard deviation with ease:

import statistics
x_stdev = statistics.stdev(x_list) # 2.217355782608345
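To convince yourself that the two approaches agree, you can run them side by side. The sketch below reuses the same x_list and compares the results with math.isclose, since floating-point results should not be compared with ==. The variable names manual and library are ours.

```python
import math
import statistics

x_list = [2, 3, 5, 7]

# Hand-written sample standard deviation (divides by n - 1).
n = len(x_list)
x_mean = sum(x_list) / n
manual = (sum((x - x_mean) ** 2 for x in x_list) / (n - 1)) ** 0.5

# The same quantity from the standard library.
library = statistics.stdev(x_list)

print(math.isclose(manual, library))  # True

# statistics.pstdev is the population version, which divides by n instead.
print(statistics.pstdev(x_list))
```

Picking the right function (stdev versus pstdev) is exactly the kind of detail that is easy to get wrong when writing the formula by hand.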

Package manager

Once you have Python installed, you are already equipped with more than 100 packages that can be readily imported from the standard library. But to create artificial intelligence apps, we often need many more specialized libraries or packages, i.e. third-party libraries. This is when a package manager can be helpful.

There are at least three popular package managers used by Python developers: pip, conda, and brew. We will only focus on pip, the standard package manager for Python. When a package is created or updated, pip almost always has the latest version, whereas the other two depend on whether the package authors are active on those platforms as well. Also, starting from Python 3.4 (~2014), pip is included with the standard installer. If you have Python, you most likely already have pip installed too. 😄
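To check whether pip is really available to your interpreter, you can ask Python whether it can locate the pip module. This is a small sketch using the standard library; the helper name has_module is our own.

```python
import importlib.util

def has_module(name):
    """Return True if the interpreter can locate a top-level module `name`."""
    return importlib.util.find_spec(name) is not None

print("pip available:", has_module("pip"))
```

If this prints False, pip was probably left out during installation, and the installation guide linked above covers how to add it.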

Python Package Index

The Python Package Index (PyPI) is where all Python packages can be installed from and published to. Let's take pytest, a popular Python testing tool, for example.


To install, simply open up your terminal and type

pip install pytest

Depending on the package you are installing, the installation can take anywhere from a few seconds to hours (very rarely). Assuming there are no errors, this single line of "pip install something" is all you need. But before you go crazy and start installing a hundred packages with pip, please read the next section about Virtual Environment. Otherwise, you proceed at your own risk. 💀
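After an installation, you may want to confirm from Python which version you actually got. The sketch below uses importlib.metadata from the standard library (available from Python 3.8); the helper name installed_version is our own.

```python
from importlib import metadata

def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# Prints a version string if pytest is installed, otherwise None.
print("pytest:", installed_version("pytest"))
```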

🚨 Because literally anyone could publish to PyPI, sometimes it is not clear which is the official download page. For example, could you easily tell which one is the official pytest package among all the search results?


There are three ways to counter this:

  1. The real one usually looks nicer, with logos, links, description, etc. :trollface:
  2. Don't use the search engine within PyPI. Use something like Google.
  3. Visit the package's homepage, e.g. pytest at here, and they will tell you how to install.

To read more, visit https://realpython.com/what-is-pip/.


Virtual Environment

We have already explained the usefulness of packages in Python development in the previous section. When developing a real project, you will often find yourself installing numerous packages. Occasionally, some of these packages only work well with one another at particular versions.

So here is a hypothetical scenario: You have two projects, project A and project B. Project A, for whatever reason, can only work with pandas of version 1.3 or above. On the other hand, project B can only work with an older version of pandas, version 1.1. How can you deal with this? Use virtual environments!
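To see why a single shared installation cannot satisfy both projects, here is a toy sketch of the scenario. The candidate version strings and the helper functions are made up for illustration, and the version parsing only handles simple dotted numbers; real tools use a much more careful comparison.

```python
def parse_version(text):
    """Turn a simple dotted version such as '1.3.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def project_a_ok(version):
    # Project A: works with pandas 1.3 or above.
    return parse_version(version) >= (1, 3)

def project_b_ok(version):
    # Project B: works only with pandas 1.1.x.
    return parse_version(version)[:2] == (1, 1)

# Illustrative candidates for the one pandas version installed globally.
candidates = ["1.1.5", "1.2.4", "1.3.0", "1.4.2"]

shared = [v for v in candidates if project_a_ok(v) and project_b_ok(v)]
print(shared)  # [] : no single version keeps both projects happy
```

With one environment per project, each project simply gets its own pandas version and the conflict disappears.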

The goal of a virtual environment is to create an isolated development environment such that all the code and dependencies only live within that environment. This minimizes the chance of having one project "polluting" the other project. In professional development, we almost always want to use a virtual environment, even if the project is small and we know there are no conflicting packages (yet).

There are many tools we can use to manage virtual environments, such as virtualenv, pyenv, and pipenv. But here we will only talk about two: venv and conda.

The venv package

Since Python 3.3 (first released in 2012), venv has been included in the standard library, so no extra installation is needed once you have Python.

Create a virtual environment

To create a virtual environment, first open your terminal (e.g. PowerShell for Windows, Terminal for Mac OS and Linux); this tutorial does not support Windows' Command Prompt. Then, create a directory for your project:

mkdir my-first-project

Navigate into the project

cd my-first-project

Now create a virtual environment inside this project directory as follows:

python -m venv env_project

The -m flag tells Python to run the venv module, and env_project is the directory that will be created to store the environment. Very often, people prefer a shorter name, e.g. python -m venv env or python -m venv venv. We will continue using the long name here to make things clearer.
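The same venv module can also be called from Python code, which is occasionally handy in setup scripts. The following is only a sketch: the helper make_env is our own, and it builds the environment in a temporary directory so that nothing is left behind. Unlike the command above, it passes with_pip=False to keep the example fast, so the resulting environment has no pip.

```python
import tempfile
import venv
from pathlib import Path

def make_env(project_dir, name="env_project"):
    """Create a virtual environment under project_dir and return its path."""
    env_dir = Path(project_dir) / name
    # with_pip=False skips bootstrapping pip; the command-line tool installs it by default.
    venv.create(env_dir, with_pip=False)
    return env_dir

# Build the environment in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_env(tmp)
    print(sorted(p.name for p in env_dir.iterdir()))
    has_cfg = (env_dir / "pyvenv.cfg").is_file()

print("pyvenv.cfg created:", has_cfg)
```

For day-to-day work, the one-line terminal command is still the simpler choice.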

Now that you have created the environment, you can take a peek at it:

ls env_project

Windows 🪟

On Windows, you should see something like:

    Directory: C:\Users\aiclub\my-first-project\env_project

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
da----         9/30/2022  10:30 AM                Include
da----         9/30/2022  10:30 AM                Lib
da----         9/30/2022  10:30 AM                Scripts
-a----         9/30/2022  10:30 AM            165 pyvenv.cfg

Mac OS 🍎 or Linux 🐧

On Mac OS or Linux, you should see something like:

bin  include  lib  lib64  pyvenv.cfg  share

Activate the environment

To start using the environment, we have to activate it.

Windows 🪟

env_project/Scripts/activate

Depending on your shell configuration, you may or may not see a prefix (env_project) added to your terminal. Either way, you can check if your environment has been activated properly by enquiring the Python being used right now:

(gcm python).Path

The gcm command in PowerShell is a shorthand for Get-Command. You should see something like:

C:\Users\aiclub\my-first-project\env_project\Scripts\python.exe

This tells you that you are indeed using the python.exe under env_project. You can also check pip in a similar manner.

Mac OS 🍎 or Linux 🐧

On Mac OS or Linux, run:

source env_project/bin/activate

Depending on your shell configuration, you may or may not see a prefix (env_project) added to your terminal. Either way, you can check if your environment has been activated properly by enquiring the Python being used right now:

which python

You should then see something like:

/home/aiclub/my-first-project/env_project/bin/python

This tells you that you are indeed using the python executable under env_project. You can also check pip in a similar manner.
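If you prefer one check that works the same way on all three operating systems, Python's standard library offers rough equivalents of gcm and which. This is a minimal sketch; it simply prints the paths and leaves it to you to confirm that they point inside env_project.

```python
import shutil
import sys

# Path of the interpreter that is running this script.
print("sys.executable:", sys.executable)

# First match found on PATH for each command, or None if there is no match.
print("python on PATH:", shutil.which("python"))
print("pip on PATH:", shutil.which("pip"))
```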

In all three operating systems, you can deactivate the virtual environment by simply typing deactivate.

Once you are inside a virtual environment, any pip installation will be contained within the environment itself.
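A script can also find out for itself whether it is running inside a virtual environment. The sketch below relies on the fact that, inside an environment created by venv, sys.prefix points to the environment while sys.base_prefix points to the Python installation it was created from. The function name in_virtualenv is our own.

```python
import sys

def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix

print("Inside a virtual environment:", in_virtualenv())
print("Environment root:", sys.prefix)
```

This can be a useful guard at the top of a setup script, to avoid installing packages globally by accident.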

To read more, visit https://realpython.com/python-virtual-environments-a-primer/.

conda

Some of you might have heard of Anaconda. While conda and Anaconda are closely related, they do not refer to the same thing. In short, conda is an environment and package manager, like venv but more versatile, whereas Anaconda is a collection of more than 100 packages, including conda itself as well as popular data science packages such as numpy, scipy, ipython notebook, etc.

While Anaconda is pretty common in many computer science classes nowadays, it is actually bloated because it installs many packages that you may not be using at all. Also, all those packages are by default installed globally, meaning that they are not contained properly within a virtual environment. Of course, you can always create a virtual environment with Anaconda, but then you will be installing even more packages.

Here are some pros and cons for using conda.

Pros:

  • Conda can install not only Python packages but also packages from other languages, such as C++, Rust, FORTRAN, LaTeX, etc. This is especially useful when managing a project that uses multiple languages.
  • Conda checks for conflicts before installation better than many other environment managers.
  • If you are installing a package that uses a GPU, e.g. Tensorflow with CUDA, conda is often the easiest way to get the required GPU libraries. See https://www.tensorflow.org/install/pip.

Cons:

  • Installing packages with conda install is much slower than pip install.
  • Sometimes the latest release might not be available on conda until much later.
  • Some less popular packages are only available via pip but not conda. (You can still invoke pip within conda, even though that is considered bad practice.)

If you still prefer conda, use Miniconda instead of Anaconda. Miniconda is just like Anaconda but without all those optional packages that come pre-installed.
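If you are unsure whether the Python you are running belongs to a conda environment, the following heuristic may help. It is only a sketch: it relies on conda environments containing a conda-meta directory and on conda setting the CONDA_DEFAULT_ENV variable when an environment is activated. The function name in_conda_env is our own.

```python
import os
import sys

def in_conda_env():
    """Heuristic: conda environments contain a `conda-meta` directory."""
    return os.path.isdir(os.path.join(sys.prefix, "conda-meta"))

print("conda environment:", in_conda_env())

# Name of the active conda environment, or None if no such variable is set.
print("CONDA_DEFAULT_ENV:", os.environ.get("CONDA_DEFAULT_ENV"))
```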

To learn how to manage an environment with conda, visit https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html.


Learning Resources

Beginners

Intermediate/Advanced