Python
Python has become the most popular language for machine learning in recent years. It is also a common language for people who have just started learning programming in general.
You can find a rather comprehensive (unofficial) installation guide at https://realpython.com/installing-python/. Here, we add some remarks on points that the guide does not mention, has not updated, or does not emphasize enough.
The latest version is usually good enough for most purposes. Just be aware that some libraries may not support a new Python release right away. For example, TensorFlow only started supporting Python 3.10 a few months after its release (read more here). You can always check the library's homepage to see which Python versions it supports.
Of course, do not install a Python version that is too old or close to its end of life either. For example, support for Python 3.7 is scheduled to end on 2023-06-27, meaning that any bugs found after that date will no longer be fixed.
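Once Python is installed, you can check from within Python whether the interpreter you are running is new enough for a given library. A minimal sketch; the `(3, 8)` minimum below is only a placeholder for whatever the library's homepage actually requires:

```python
import sys

# Placeholder minimum version; replace it with what your library requires.
MIN_VERSION = (3, 8)

def is_supported(min_version=MIN_VERSION):
    """Return True if the running interpreter is at least min_version."""
    # sys.version_info compares like a tuple: (major, minor, micro, ...)
    return sys.version_info >= min_version

print(sys.version)     # full version string of the running interpreter
print(is_supported())  # True on Python 3.8 or newer
```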
Pick your poison:
Windows
Python does not come preinstalled on Windows.
There are at least two popular ways of getting Python installed on a Windows machine: through the Anaconda distribution or through the full official installer. For whatever reason, Anaconda has become quite popular in various CS classes at MSU, but we recommend against it for now (see here for its downsides).
Also, pay attention to whether you are downloading the 32-bit version or the 64-bit version. Most likely, you are using a 64-bit machine.
![image](https://user-images.githubusercontent.com/21100851/188505342-6815cbd9-d90f-4ceb-921b-10bf271c4da0.png)
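After installing, you can confirm from within Python that you got a 64-bit interpreter. A small sketch using only the standard library; note that it reports the bitness of the Python build you are running, not of Windows itself:

```python
import struct
import sys

def interpreter_bits():
    """Return 64 or 32, the pointer size (in bits) of the running Python."""
    # struct.calcsize("P") is the size of a C pointer in bytes
    return struct.calcsize("P") * 8

print(interpreter_bits())   # 64 for a 64-bit Python
print(sys.maxsize > 2**32)  # True only on a 64-bit Python
```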
Mac OS
Mac OS may already come with Python, so you may not need to install it again. Use your terminal to check which version is installed (see here).
There are at least three popular ways of getting Python installed on a Mac: through the Anaconda distribution, the Homebrew package manager, or the full official installer. For whatever reason, Anaconda has become quite popular in various CS classes at MSU, but we recommend against it for now (see here for its downsides). Homebrew is usually fine if you are already familiar with it. Otherwise, just stick to the official installer. See here for instructions.
Also, pay attention to which chip your Mac uses (Apple silicon or Intel). That determines whether you should get the universal installer or the Intel-only installer. Read here.
![image](https://user-images.githubusercontent.com/21100851/188505673-2a3c14aa-5d68-482a-9628-1a6eec4f3868.png)
Linux
Many Linux distributions already come with Python installed. If it is already the version you need, you do not have to install it again.
There are many ways to get Python installed on a Linux machine. Downloading through the Anaconda distribution is certainly one way, but we would recommend against it for now (see here for its downsides). Other than that, please check out https://realpython.com/installing-python/#how-to-install-python-on-linux.
When using Python, you almost always want to import some packages to save time and effort. Consider a simple example: computing the sample standard deviation of the list `x_list = [2, 3, 5, 7]`, which can be implemented without importing any libraries as below:
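```python
x_list = [2, 3, 5, 7]
n = len(x_list)
x_mean = sum(x_list) / n  # sample mean
# Sum of squared deviations, divided by n - 1, then square-rooted
x_stdev = (sum([(x - x_mean)**2 for x in x_list]) / (n - 1))**0.5  # 2.217355782608345
```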
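For reference, the snippet above computes the sample standard deviation, which divides by $n - 1$ rather than $n$:

```math
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```

For `x_list = [2, 3, 5, 7]`, $\bar{x} = 4.25$, the squared deviations sum to $14.75$, and $s = \sqrt{14.75 / 3} \approx 2.2174$.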
But unless this is your homework assignment, why waste your time reinventing the wheel? 🦽
During actual development, we almost always prefer to use code written by other people, which has probably been optimized and tested many times. The Python standard library includes a module called `statistics`, with which we can calculate the sample standard deviation with ease:
```python
import statistics

x_stdev = statistics.stdev(x_list)  # 2.217355782608345
```
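The `statistics` module also distinguishes the sample standard deviation (`stdev`, dividing by n - 1) from the population standard deviation (`pstdev`, dividing by n). A short sketch comparing the two on the same list:

```python
import math
import statistics

x_list = [2, 3, 5, 7]
n = len(x_list)

sample_sd = statistics.stdev(x_list)       # sum of squared deviations / (n - 1)
population_sd = statistics.pstdev(x_list)  # sum of squared deviations / n

# Both are built on the same sum of squared deviations (14.75 here)
print(math.isclose(sample_sd**2 * (n - 1), population_sd**2 * n))  # True
print(round(population_sd, 4))  # 1.9203
```

Use `stdev` when your data is a sample drawn from a larger population, as in the example above.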
Once you have Python installed, you already have access to more than 100 modules and packages that can be readily imported from the standard library. But to create artificial intelligence apps, we often need many more specialized libraries, i.e. third-party libraries. This is where a package manager comes in handy.
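To see for yourself how much the standard library offers, Python 3.10 and newer expose the names of all standard-library modules as `sys.stdlib_module_names`. A small sketch; on older versions the attribute does not exist, so the code checks for it first:

```python
import sys

# sys.stdlib_module_names (Python 3.10+) is a frozenset of top-level
# standard-library module names, including private ones such as "_abc".
names = getattr(sys, "stdlib_module_names", None)

if names is None:
    print("sys.stdlib_module_names requires Python 3.10 or newer")
else:
    print(len(names))             # well over 100
    print("statistics" in names)  # True
```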
There are at least three popular package managers used by Python developers: pip, conda, and brew. We will only focus on pip, the standard package manager for Python. When a package is created or updated, pip almost always has the latest version, whereas for the other two it depends on whether the package authors also publish there. Also, since Python 3.4 (~2014), pip has been included with the standard installer, so if you have Python, you most likely already have pip installed too.
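Because pip ships alongside Python, you can also run it through the interpreter itself with `python -m pip`, which guarantees that packages go to the same Python you are running. A minimal sketch; the helper name `pip_command` is our own and not part of pip:

```python
import subprocess
import sys

def pip_command(*args):
    """Build a command that runs pip as a module of the current interpreter."""
    # sys.executable is the absolute path of the running Python
    return [sys.executable, "-m", "pip", *args]

# Shows pip's version and which Python it belongs to (or an error if pip is missing)
result = subprocess.run(pip_command("--version"), capture_output=True, text=True)
print(result.stdout or result.stderr)
```

The same pattern works for installing, e.g. `subprocess.run(pip_command("install", "pytest"))`.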
The Python Package Index (PyPI) is the official repository that Python packages are installed from and published to. Let's take pytest, a popular Python testing tool, as an example.
To install, simply open up your terminal and type
```shell
pip install pytest
```
Depending on the package, the installation can take anywhere from a few seconds to (very rarely) hours. Assuming there are no errors, this single line of `pip install something` is all you need. But before you go crazy and start installing a hundred packages with pip, please read the next section about virtual environments. Otherwise, you do so at your own risk.
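Once the installation finishes, you can confirm it from Python. `importlib.metadata` (in the standard library since Python 3.8) reads the version of an installed distribution; the `installed_version` wrapper below, which returns `None` when the package is missing, is just a convenience helper for this example:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version("pytest"))  # a version string if pytest is installed, else None
```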
Because literally anyone can publish to PyPI, it is sometimes unclear which page is the official one. For example, could you easily tell which one is the official pytest package among all the search results?
There are three ways to counter this:
- The real one usually looks nicer, with logos, links, description, etc.
- Don't use the search engine within PyPI. Use something like Google.
- Visit the package's homepage (e.g. pytest's homepage), which will tell you how to install it.
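To find a package's homepage without guessing, you can also ask PyPI's JSON API (`https://pypi.org/pypi/<project>/json`) for the links its authors declared. A sketch using only the standard library; keep in mind that this metadata is written by whoever uploaded the package, so treat it as a hint rather than proof that the package is the official one:

```python
import json
import urllib.request

def pypi_info(project):
    """Fetch a project's metadata from PyPI's JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]

def declared_links(info):
    """Collect the homepage and project URLs declared in the metadata."""
    links = dict(info.get("project_urls") or {})
    if info.get("home_page"):
        links.setdefault("Homepage", info["home_page"])
    return links

try:
    info = pypi_info("pytest")
except OSError as exc:  # no network, DNS failure, HTTP error, timeout
    print("Could not reach PyPI:", exc)
else:
    print(info["name"], "-", info["summary"])
    for label, link in declared_links(info).items():
        print(f"{label}: {link}")
```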
To read more, visit https://realpython.com/what-is-pip/.
We have already explained the usefulness of packages in Python development in the previous section. When developing a real project, you will often find yourself installing numerous packages. Occasionally, some of these packages only work well with one another at particular versions.
So here is a hypothetical scenario: You have two projects, project A and project B. Project A, for whatever reason, can only work with pandas of version 1.3 or above. On the other hand, project B can only work with an older version of pandas, version 1.1. How can you deal with this? Use virtual environments!
The goal of a virtual environment is to create an isolated development environment such that all the code and dependencies only live within that environment. This minimizes the chance of having one project "polluting" the other project. In professional development, we almost always want to use a virtual environment, even if the project is small and we know there are no conflicting packages (yet).
There are many tools we can use to manage virtual environments, such as `virtualenv`, `pyenv`, and `pipenv`. But here we will only talk about two: `venv` and `conda`.
Since Python 3.3 (released in 2012), `venv` has been included in the standard library, so no extra installation is needed once you have Python.
To create a virtual environment, first open your terminal (e.g. PowerShell for Windows, Terminal for Mac OS and Linux); this tutorial does not support Windows' Command Prompt. Then, create a directory for your project:
```shell
mkdir my-first-project
```
Navigate into the project directory:
```shell
cd my-first-project
```
Now create a virtual environment inside this project directory as follows:
```shell
python -m venv env_project
```
The `-m` flag tells Python to run the `venv` module, and `env_project` is the directory that will be created to store the environment. Very often, people prefer a shorter name, e.g. `python -m venv env` or `python -m venv venv`. We will continue using the long name here to make things clearer.
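The same environment can also be created from Python code through the standard library's `venv` module. A sketch that builds a throwaway environment in a temporary directory (with `with_pip=False` to keep it fast) and checks where its interpreter lives: `Scripts\python.exe` on Windows and `bin/python` on Mac OS and Linux. The helper `env_python` is hypothetical, written for this example:

```python
import os
import tempfile
import venv
from pathlib import Path

def env_python(env_dir):
    """Return the path of the interpreter inside a virtual environment."""
    if os.name == "nt":  # Windows
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "env_project"
    # Roughly `python -m venv --without-pip env_project`; like the command
    # line, use symlinks everywhere except Windows.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    cfg_ok = (env_dir / "pyvenv.cfg").exists()
    exe_ok = env_python(env_dir).exists()
    print(cfg_ok, exe_ok)  # expected: True True
```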
Now that you have created the environment, you can take a peek at it:
```shell
ls env_project
```
On Windows, you should see something like:
```
    Directory: C:\Users\aiclub\my-first-project\env_project

Mode          LastWriteTime    Length Name
----          -------------    ------ ----
da----    9/30/2022 10:30 AM          Include
da----    9/30/2022 10:30 AM          Lib
da----    9/30/2022 10:30 AM          Scripts
-a----    9/30/2022 10:30 AM      165 pyvenv.cfg
```
On Mac OS or Linux, you should see:
```
bin  include  lib  lib64  pyvenv.cfg  share
```
To start using the environment, we have to activate it. On Windows (PowerShell), run:
```powershell
env_project/Scripts/activate
```
Depending on your shell configuration, you may or may not see the prefix `(env_project)` added to your terminal prompt. Either way, you can check whether your environment has been activated properly by checking which Python is being used right now:
```powershell
(gcm python).Path
```
The `gcm` command in PowerShell is shorthand for `Get-Command`. You should see something like:
```
C:\Users\aiclub\my-first-project\env_project\Scripts\python.exe
```
This tells you that you are indeed using the `python.exe` under `env_project`. You can also check `pip` in a similar manner.
On Mac OS or Linux, run:
```shell
source env_project/bin/activate
```
Depending on your shell configuration, you may or may not see the prefix `(env_project)` added to your terminal prompt. Either way, you can check whether your environment has been activated properly by checking which Python is being used right now:
```shell
which python
```
You should then see something like:
```
/home/aiclub/my-first-project/env_project/bin/python
```
This tells you that you are indeed using the `python` executable under `env_project`. You can also check `pip` in a similar manner.
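Besides `which python` or `gcm python`, you can check from inside Python whether you are running in a virtual environment. Per the `venv` documentation, `sys.prefix` points to the environment directory while `sys.base_prefix` still points to the base installation it was created from. A small sketch (the function name is our own):

```python
import sys

def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment directory and differs
    # from sys.base_prefix, the Python installation it was created from.
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
print(sys.prefix)  # the env_project directory when that environment is active
```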
On all three operating systems, you can deactivate the virtual environment by simply typing `deactivate`.
Once you are inside a virtual environment, any `pip` installation will be contained within the environment itself.
To read more, visit https://realpython.com/python-virtual-environments-a-primer/.
Some of you might have heard of Anaconda. While conda and Anaconda are closely related, they do not refer to the same thing. In short, conda is an environment manager like `venv` that is also a package manager, whereas Anaconda is a distribution of more than 100 packages, including conda as well as popular data science packages such as NumPy, SciPy, Jupyter Notebook, etc.
While Anaconda is pretty common in many computer science classes nowadays, it is actually bloated because it installs many packages that you may never use. Also, all those packages are installed globally by default, meaning that they are not contained properly within a virtual environment. Of course, you can always create a virtual environment with Anaconda, but then you will be installing even more packages.
Here are some pros and cons for using conda.
Pros:
- Conda can install not only Python packages but also packages from other languages, such as C++, Rust, FORTRAN, LaTeX, etc. This is especially useful when managing a project that uses multiple languages.
- Conda checks for dependency conflicts before installation more thoroughly than many other package managers.
- If you are installing a package that uses a GPU, e.g. TensorFlow with CUDA, conda makes it much easier to install the required GPU libraries. See https://www.tensorflow.org/install/pip.
Cons:
- Installing packages with `conda install` is much slower than with `pip install`.
- Sometimes the latest release might not be available on conda until much later.
- Some less popular packages are only available via `pip` but not `conda`. (You can still invoke `pip` within a `conda` environment, even though that is considered bad practice.)
If you still prefer conda, use Miniconda instead of Anaconda. Miniconda is just like Anaconda but without all those optional packages that come pre-installed.
To learn how to manage an environment with conda, visit https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html.
- Python Tutorial for Beginners: https://youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7
- Python for Beginners: https://youtu.be/kqtD5dpn9C8
- NumPy for absolute beginners (very well written!): https://numpy.org/doc/stable/user/absolute_beginners.html
- The Python Language Reference (official): https://docs.python.org/3/reference/index.html
- The Python Standard Library (official): https://docs.python.org/3/library/index.html
- Python OOP Tutorials: https://youtube.com/playlist?list=PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc
- NumPy for beginners: https://youtu.be/lLRBYKwP8GQ
- Pandas tutorials: https://youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS