As a preview of what's to come, here is how you would check if a number is in a range in most other languages:
if 3 <= num and num < 10:
    pass
And here's how you do it in Python:
if 3 <= num < 10:
    pass
Tuples are immutable lists. Once they are created, they can't be changed. They are used everywhere in Python.
>>> exam = ('Final Exam', 91)
>>> exam[0]
>>> exam[1]
# Try to change our grade
>>> exam[1] = 100
# A tuple with one element
>>> singleton = ('one element',)
Tuples can be returned from functions.
from statistics import mean, stdev
def compute_statistics(nums):
    # Returns a 3-tuple: (mean, maximum, sample standard deviation)
    return mean(nums), max(nums), stdev(nums)
Tuples can also be unpacked into separate variables.
>>> name, grade = exam
>>> name
>>> grade
You can also unpack multiple layers of tuples (or any iterable).
>>> player = ('Shaquille', 'O\'Neal', (7, 1))
>>> first_name, last_name, (feet, inches) = player
>>> first_name
>>> last_name
>>> feet
>>> inches
Tuples are often convenient ways of quickly packaging related values together.
However, indexing them by number (e.g. exam[0]) may be confusing. Instead, we can name a tuple's elements by using namedtuple.
from collections import namedtuple
Height = namedtuple('Height', 'feet inches')
Player = namedtuple('Player', 'first_name last_name height')
>>> player = Player('Shaquille', 'O\'Neal', Height(7, 1))
>>> player.first_name
>>> player.height.feet
# namedtuples behave like regular tuples (so they can be unpacked)
>>> first_name, last_name, height = player
>>> last_name
>>> height.inches
Python has many commonly used string manipulation functions.
>>> names = 'Foo, Bar, Baz'
>>> names.split(', ')
>>> ' and '.join(names.split(', '))
>>> time_line = 'Time: 23.4'
>>> time_line.startswith('Time: ')
# Combining with unpacking provides a powerful way to parse strings
>>> _, seconds = time_line.split(': ')
>>> seconds = float(seconds)
>>> seconds
Python 3.6 introduced f-strings, a convenient way to interpolate variables into strings.
>>> name = 'Forrest Gump'
>>> birth_year = 1944
>>> skills = ['ping pong', 'running']
>>> f'{name} was born in {birth_year} and is good at: {skills}'
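f-strings also accept a format specification after a colon and a conversion such as !r. A small sketch, using made-up values (seconds and place are assumptions introduced here for illustration):

```python
name = 'Forrest Gump'
seconds = 23.4567
place = 7

summary = f'{name} finished in {seconds:.1f}s'  # one decimal place
bib = f'{place:03d}'                            # zero-padded to width 3
debug = f'{name!r}'                             # repr() of the value

print(summary)
print(bib)
print(debug)
```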
Arguments can be given default values.
def discount_price(price, discount=0.5):
    # discount is the fraction taken off the price
    return price * (1 - discount)
>>> discount_price(10)
>>> discount_price(10, 0.75)
Just be careful! Default arguments are only evaluated once (when the function is defined), not at every function call.
def record_discount(discount, previous_discounts=[]):
    previous_discounts.append(discount)
    return previous_discounts
>>> record_discount(0.5)
>>> record_discount(0.75)
The better approach uses None instead:
def record_discount(discount, previous_discounts=None):
    if previous_discounts is None:
        previous_discounts = []
    previous_discounts.append(discount)
    return previous_discounts
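To make the difference visible, the sketch below puts both versions side by side. The names record_discount_shared and record_discount_fresh are hypothetical, introduced here only so the two can coexist in one file.

```python
def record_discount_shared(discount, previous_discounts=[]):
    # The default list is created once, when the function is defined
    previous_discounts.append(discount)
    return previous_discounts


def record_discount_fresh(discount, previous_discounts=None):
    # A new list is created on every call that omits the argument
    if previous_discounts is None:
        previous_discounts = []
    previous_discounts.append(discount)
    return previous_discounts


# Copy the shared results, since both calls return the very same list
shared_first = list(record_discount_shared(0.5))
shared_second = list(record_discount_shared(0.75))
fresh_first = record_discount_fresh(0.5)
fresh_second = record_discount_fresh(0.75)

print(shared_second)  # the earlier 0.5 is still in the list
print(fresh_second)   # only 0.75
```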
Arguments can and should be passed by name to reduce ambiguity. Compare the following:
# What do these parameters mean?
search_tweets('@SwiftOnSecurity', 20, True, False)
# Much more clear...
search_tweets('@SwiftOnSecurity', retweets=False, popular=True, limit=20)
def search_tweets(query, limit, popular, retweets=True):
    pass
Using names makes your code clearer and means you don't have to remember parameter order!
You can even require that arguments be passed by name by using *.
def search_tweets(query, *, limit, popular, retweets=True):
    pass
# Raises TypeError: limit, popular, and retweets must now be passed by name
search_tweets('@SwiftOnSecurity', 20, True, False)
Let's use our newfound knowledge of arguments to make a function that sums its arguments.
sum(1, 2, 3) # 6
sum(0) # 0
def sum(start, *nums):
    total = start
    for num in nums:
        total += num
    return total
This new * syntax allows us to collect any extra arguments passed to the function. We can also use it to expand iterables as if they were passed as individual arguments.
nums = [1, 2, 3]
sum(*nums)
# is the same as...
sum(1, 2, 3)
This syntax even works for tuple unpacking!
>>> first, *rest = [1, 2, 3]
>>> first
>>> rest
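The starred name does not have to come last. A short sketch with made-up scores, showing a star in the middle and the case where nothing is left over:

```python
scores = [55, 70, 82, 91, 99]

# The starred name soaks up whatever the other names leave behind
lowest, *middle, highest = scores

# It always receives a list, even an empty one
head, *tail = [42]

print(lowest, middle, highest)
print(head, tail)
```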
In fact, Python already has a built-in sum function that is much more powerful.
>>> sum([1, 2, 3])
# The second parameter is the starting value to add values to
>>> sum([[1, 2, 3], [4, 5, 6]], [])
Python provides many more functional tools:
>>> min([10, 2, 22, 15])
>>> max([10, 2, 22, 15])
>>> sorted([10, 2, 22, 15])
>>> sorted([10, 2, 22, 15], reverse=True)
All of these functional tools allow you to specify a key which tells the function what value to compare.
>>> Graded = namedtuple('Graded', 'name grade')
>>> grades = [Graded('Midterm', 80), Graded('Final', 90), Graded('Project', 0)]
>>> min(grades, key=lambda x: x.grade)
>>> max(grades, key=lambda x: x.grade)
>>> sorted(grades, key=lambda x: x.grade, reverse=True)
lambdas are convenient ways of expressing one line functions. The above one is equivalent to:
def get_grade(x):
    return x.grade
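For simple attribute lookups like this one, the standard library's operator.attrgetter is another way to build the key function. A sketch that reuses the Graded tuple from above and checks that both spellings agree:

```python
from collections import namedtuple
from operator import attrgetter

Graded = namedtuple('Graded', 'name grade')
grades = [Graded('Midterm', 80), Graded('Final', 90), Graded('Project', 0)]

by_lambda = sorted(grades, key=lambda x: x.grade)
by_attrgetter = sorted(grades, key=attrgetter('grade'))
lowest = min(grades, key=attrgetter('grade'))

print([g.name for g in by_attrgetter])
print(lowest.name)
```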
If you were to port iteration over a list from another language to Python, you may arrive at:
for i in range(len(nums)):
    print(nums[i])
But we don't like explicit indices, because they are error-prone. Instead, we can just iterate over the list itself:
for num in nums:
    print(num)
If we need the index, we use enumerate:
for i, num in enumerate(nums):
    print(i, num)
What if we want to consider values from two lists at once? You may be inclined to write:
for i, name in enumerate(students):
    print(f'{name} got a {grades[i]} in the class')
Instead, we use zip to pair values from iterables together:
for name, grade in zip(students, grades):
    print(f'{name} got a {grade} in the class')
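One detail worth knowing is that zip stops as soon as its shortest input runs out. A sketch with hypothetical sample data (the names and grades are made up) that also shows the common dict(zip(...)) idiom:

```python
students = ['Ada', 'Grace', 'Linus']  # hypothetical sample data
grades = [95, 88]                     # one grade is missing

# zip stops at the shorter input, so 'Linus' is silently dropped
pairs = list(zip(students, grades))

# Pairs of (key, value) can be fed straight into dict()
grade_by_student = dict(zip(students, grades))

print(pairs)
print(grade_by_student)
```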
Python is built around the idea of iterables and provides many standard library and language features for interacting with and creating them.
We've already seen that list is an iterable and dict is an iterable, which also provides other ways of iteration (.items() and .values()). Most collections in Python are iterables. Even files are iterables!
We know how to iterate through collections, but let's see how we can build our own iterables.
Generators are a powerful tool for building efficient iterables. Consider replicating Python's range():
def our_range(stop):
    num = 0
    nums = []
    while num < stop:
        print(f'adding {num}')
        nums.append(num)
        num += 1
    return nums
What's the problem with this approach? Every number is added to the list before the loop receives the first one:
for num in our_range(3):
    print(f'received {num}')
Generators use the yield keyword to pause their execution and hand a value to the iterator's consumer.
def our_range(stop):
    num = 0
    while num < stop:
        print(f'yielding {num}')
        yield num
        num += 1
Now, the numbers are generated on the fly:
for num in our_range(3):
    print(f'received {num}')
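Because values are produced only on demand, a generator can even describe a sequence with no end. A minimal sketch, assuming a hypothetical naturals() generator (not part of the examples above) and itertools.islice from the standard library to take a finite slice of it:

```python
from itertools import islice


def naturals():
    # Never terminates on its own; the consumer decides when to stop
    num = 0
    while True:
        yield num
        num += 1


first_five = list(islice(naturals(), 5))
print(first_five)
```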
The for ... in syntax can also be used inside an expression (a comprehension) to tersely build lists, dicts, and even generators.
>>> [x * 2 for x in range(4)]
>>> sum(x * 2 for x in range(4))
>>> {x: x * 2 for x in range(4)}
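Comprehensions can also filter with a trailing if, and curly braces without a colon build a set. A short sketch with made-up inputs:

```python
# Keep only the even numbers, then square them
evens_squared = [x * x for x in range(10) if x % 2 == 0]

# A set comprehension drops duplicates
initials = {name[0] for name in ['Michael', 'Meredith', 'Pam']}

print(evens_squared)
print(sorted(initials))
```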
In many ways, object orientation in Python looks exactly like other languages you may be familiar with like Java. There are a few subtle differences, though.
class Course:
    def __init__(self, number, name, instructor):
        self.number = number
        self.name = name
        self.instructor = instructor
        self._student_grades = {}
    def add_student_grade(self, student, grade):
        self._student_grades[student] = grade
    def curve(self, amount):
        for student, grade in self._student_grades.items():
            self.add_student_grade(student, grade + amount)
Notably, Python uses a lot of __x__ methods (we call them dunder methods). Above, we see that the constructor is called __init__. Additionally, all methods take an explicit self parameter, which is similar to this in C++ or Java.
Python has no visibility controls and instead prefers the convention of prefixing private/protected properties with an _.
>>> csf = Course('601.229', 'CSF', 'Peter')
>>> csf.number
>>> csf.add_student_grade('Johnny Hopkins', 85)
Python supports traditional Java-like inheritance (as well as multiple inheritance, which we won't delve into).
class PassFailCourse(Course):
    def add_student_grade(self, student, *, passed):
        super().add_student_grade(student, 100 if passed else 0)
We use super() to access methods from ancestor classes. Other methods (in this example, __init__ and curve) are inherited as we would expect.
In Python, methods that operate on the class itself rather than on an instance are called class methods, and we denote them by using a decorator.
class Course:
    # ...
    @classmethod
    def retrieve_from_db(cls, number):
        # Look up name and instructor in your database ...
        return cls(number, name, instructor)
csf = Course.retrieve_from_db('601.229')
Decorators are a powerful Python feature that allows you to augment function behavior. They are functions that are passed another function and return a new function with modified behavior.
def cache(func):
    func._cache = {}
    def wrapped(url):
        if url not in func._cache:
            func._cache[url] = func(url)
        return func._cache[url]
    return wrapped
We then can use cache as a decorator:
@cache
def slow_web_request(url):
    print('requesting page')
    # ...
    return f'page: {url}'
This is equivalent to:
slow_web_request = cache(slow_web_request)
>>> slow_web_request('http://www.google.com')
>>> slow_web_request('http://www.apple.com')
>>> slow_web_request('http://www.apple.com')
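The version above only works for functions that take a single url argument. A more general version accepts any hashable arguments and uses functools.wraps to preserve the wrapped function's name and docstring:
from functools import wraps
def cache(func):
    func._cache = {}
    @wraps(func)
    def wrapped(*args, **kwargs):
        frozen_args = (args, tuple(sorted(kwargs.items())))
        if frozen_args not in func._cache:
            func._cache[frozen_args] = func(*args, **kwargs)
        return func._cache[frozen_args]
    return wrapped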
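The standard library ships a ready-made decorator for this kind of caching, functools.lru_cache. The sketch below uses a hypothetical slow_square function and a calls list (both introduced here for illustration) to show that a repeated argument skips the function body:

```python
from functools import lru_cache

calls = []  # records every time the function body actually runs


@lru_cache(maxsize=None)
def slow_square(n):
    calls.append(n)
    return n * n


results = [slow_square(4), slow_square(4), slow_square(5)]

print(results)
print(calls)
print(slow_square.cache_info())
```

Like the hand-written version, lru_cache requires the arguments to be hashable.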
There are a ton of dunder methods you can define on your classes.
class Person:
    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
    def __hash__(self):
        # __hash__ must return an int, so hash the tuple of fields
        return hash((self.first_name, self.last_name))
    def __eq__(self, other):
        if not isinstance(other, Person):
            return False
        return self.first_name == other.first_name \
            and self.last_name == other.last_name \
            and self.age == other.age
    def __lt__(self, other):
        return self.age < other.age
    def __str__(self):
        return f'{self.last_name}, {self.first_name} is {self.age} years old'
Defining all of these dunder methods can be tedious, so Python 3.7 introduced dataclasses, which give a much more terse way to express classes.
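from dataclasses import dataclass, field
# unsafe_hash=True generates __hash__; a plain @dataclass sets __hash__ to None
@dataclass(unsafe_hash=True)
class Person:
    first_name: str
    last_name: str
    # Compared in __eq__, but left out of the hash (as in the class above)
    age: int = field(hash=False)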
Enums allow you to replace magic constants with efficient, named placeholders.
from enum import Enum, auto
class Color(Enum):
    RED = auto()
    BLUE = auto()
    PURPLE = auto()
>>> Color.RED
>>> Color.RED.name
>>> Color.RED == Color.BLUE
>>> list(Color)
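Members can also be looked up by name or by value, which helps when converting from strings or stored integers. A sketch that repeats the Color definition so it runs on its own:

```python
from enum import Enum, auto


class Color(Enum):
    RED = auto()
    BLUE = auto()
    PURPLE = auto()


by_name = Color['BLUE']  # look up a member by its name
by_value = Color(1)      # auto() numbers members starting at 1
names = [color.name for color in Color]

print(by_name, by_value, names)
```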
Python prefers a "do first, ask for forgiveness later" approach to exceptional cases at runtime. This has the benefit of avoiding race conditions between checking a condition and acting on it.
IndexError and KeyError are common when indexing into a sequence or accessing a key in a dict-like data structure.
>>> cities = ['Boston', 'New York City', 'Los Angeles']
>>> cities[4]
>>> capitals = {'Maryland': 'Annapolis', 'Connecticut': 'Hartford'}
>>> capitals['Hawaii']
We handle exceptions in Python using a try/except (and maybe an else/finally).
ValueError is commonly used when an argument is in an unexpected format.
def get_birth_year():
    while True:
        try:
            birth_year = int(input('Enter birth year: '))
        except ValueError:
            pass
        else:
            if 1900 <= birth_year <= 2018:
                return birth_year
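How try, except, else, and finally interact is easier to see with a trace. The sketch below uses a hypothetical parse_int helper and an events list (both introduced here for illustration) to record which clauses run:

```python
events = []


def parse_int(text):
    try:
        value = int(text)
    except ValueError:
        events.append('except')
        return None
    else:
        # Runs only when the try block raised nothing
        events.append('else')
        return value
    finally:
        # Runs on every path, even though the branches above return
        events.append('finally')


good = parse_int('42')
bad = parse_int('forty-two')

print(good, bad)
print(events)
```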
The Python standard library has a plethora of useful data structures.
Many common tasks in Python can be written as one-liners by combining these data structures in clever ways.
>>> unique_nums = set([5, 8, 4, 5, 8])
>>> unique_nums
>>> unique_nums.add(6)
>>> 6 in unique_nums
>>> len(unique_nums)
>>> unique_nums - set([5, 4])
>>> unique_nums | set([8, 4])
Sets (like most other data structures) can even be created by providing an iterable (like a generator):
>>> set(x * 2 for x in range(8))
A defaultdict behaves like a dict, but creates and returns a default value when a key is not found.
from collections import defaultdict
names = ['Michael', 'Dwight', 'Jim', 'Pam', 'Meredith']
names_by_first_letter = defaultdict(list)
for name in names:
    names_by_first_letter[name[0]].append(name)
print(names_by_first_letter)
Without it, we'd have to write something like:
names_by_first_letter = {}
for name in names:
    try:
        letter_names = names_by_first_letter[name[0]]
    except KeyError:
        letter_names = names_by_first_letter[name[0]] = []
    finally:
        letter_names.append(name)
When implementing Dijkstra's algorithm, a defaultdict is a convenient way to store the distances from the source node.
import sys
distance_from_source = defaultdict(lambda: sys.maxsize)
distance_from_source[source] = 0
If we just want to return a default value (and not also store that default value), we can use a regular dict:
>>> nicknames = {'Robert': 'Bob', 'Katherine': 'Kate'}
>>> nicknames.get('Brenda', 'Brenda')
>>> nicknames
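The difference between the two lookups shows up in what is left behind afterwards. A sketch with hypothetical data (the visits counter is an assumption introduced here):

```python
from collections import defaultdict

plain = {'Robert': 'Bob'}
fallback = plain.get('Brenda', 'Brenda')  # returns the default, stores nothing

visits = defaultdict(int)
count = visits['Brenda']                  # returns 0 and stores it

print('Brenda' in plain)   # the regular dict is unchanged
print('Brenda' in visits)  # the defaultdict gained a key
```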
To keep track of counts of just about anything (including other iterables) use Counter.
>>> from collections import Counter
>>> spellings = Counter(['color', 'colour', 'color', 'color', 'colour'])
>>> spellings.most_common()
>>> spellings.update(['color', 'color'])
>>> spellings.most_common(1)
list has constant-time append and pop (from the end). If you need constant-time operations at both ends (a queue, instead of a stack), use a deque.
>>> from collections import deque
>>> checkout = deque(['oranges', 'blueberries'])
>>> checkout.append('strawberries')
>>> checkout
>>> checkout.appendleft('eggs')
>>> checkout
>>> checkout.popleft()
File handling (and resource management) in Python is a breeze thanks to context managers. These provide automatic cleanup of resources by using a with statement.
# By default, files are opened for reading
with open('essay.txt') as f:
    essay = f.read()
This is almost equivalent to the following, but with one important distinction: exception handling!
f = open('essay.txt')
essay = f.read()
# What if an exception happens here?
f.close()
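What the with statement adds is roughly a try/finally around the body. A sketch of that expansion, using a temporary file created on the spot (an assumption made here so the example runs anywhere):

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('hello')

f = open(path)
try:
    essay = f.read()
finally:
    # Runs whether or not read() raised, so the file is always closed
    f.close()

os.remove(path)
print(essay, f.closed)
```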
# By default, files are opened for reading
with open('essay.txt') as f:
    essay = f.read()
translated = translate(essay, from_language='english',
                       to_language='french')
# 'w' truncates the file, then opens it for writing
with open('essay_translated.txt', 'w') as f:
    print(translated, file=f)
Reading a file line-by-line is also straightforward.
# Each line in urls.txt contains a url
with open('urls.txt') as f:
    for url in f:
        # Each line still ends with its newline, so strip it
        download_file(url.strip())
Python makes reading CSVs super simple.
import csv
with open('games.csv', newline='') as f:
    for home_team, away_team, score in csv.reader(f):
        print(f'{away_team} @ {home_team}: {score}')
And handling compressed data just requires a slight modification.
import csv
import gzip
# 'rt' opens the file in text mode; gzip.open defaults to binary
with gzip.open('games.csv.gz', 'rt', newline='') as f:
    for home_team, away_team, score in csv.reader(f):
        print(f'{away_team} @ {home_team}: {score}')
Python provides a serialization method that works out of the box for most built-in types and almost every class you'll create:
import pickle
with open('dump.pkl', 'wb') as f:
    pickle.dump(my_object, f)
# Then later...
with open('dump.pkl', 'rb') as f:
    my_object = pickle.load(f)
JSON serialization can be done in an almost identical way.
import json
with open('dump.json', 'w') as f:
    json.dump(my_dict, f)
# Then later...
with open('dump.json', 'r') as f:
    my_dict = json.load(f)
JSON can be dumped to and loaded from a string with json.dumps() and json.loads().
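A round trip through a string looks like this. The record dict is hypothetical sample data; the last two lines show one thing to watch for, which is that JSON has no tuple type.

```python
import json

record = {'name': 'Forrest Gump', 'skills': ['ping pong', 'running']}

encoded = json.dumps(record)   # dict -> str
decoded = json.loads(encoded)  # str -> dict

# Tuples are written as JSON arrays, so they come back as lists
height = json.loads(json.dumps((7, 1)))

print(encoded)
print(height)
```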
Python 3.4 added a new API for handling file system paths in a cross-platform manner.
from pathlib import Path
# Finds all files of the form logs/*.log
log_files = (Path.cwd() / 'logs').glob('*.log')
# If files have form 20181112.log, builds a list of the timestamps
log_timestamps = [p.stem for p in log_files]
data_dir = Path.cwd() / 'dir'
# We can even open files given a path
train_file = (data_dir / 'train').open()
dev_file = (data_dir / 'dev').open()
test_file = (data_dir / 'test').open()
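Paths can also be taken apart without touching the disk. The sketch below uses PurePosixPath, the filesystem-free variant from the same module, so the output is the same on every platform. The file name is the hypothetical one from the comment above.

```python
from pathlib import PurePosixPath

log = PurePosixPath('logs') / '20181112.log'

print(log)         # the joined path
print(log.name)    # final component, with extension
print(log.stem)    # final component, without extension
print(log.suffix)  # the extension, including the dot
print(log.parent)  # everything but the final component

renamed = log.with_suffix('.txt')
print(renamed)
```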
If you have some computationally expensive process run on an iterable, Python's standard library makes it easy to distribute it across your computer's cores.
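for x in inputs:
    print(long_computation(x))
For an effortless speedup:
from multiprocessing import Pool
# On platforms that spawn worker processes (Windows, macOS), run this under `if __name__ == '__main__':`
with Pool() as pool:
    for x in pool.map(long_computation, inputs):
        print(x)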
Python is often used for building command line utilities, so parsing arguments is a common task. Fortunately, the standard library provides an incredibly comprehensive library for doing most of the heavy lifting.
from argparse import ArgumentParser, FileType
parser = ArgumentParser(description='processes some input files')
parser.add_argument('files', type=FileType('r'), nargs='+', help='input files')
parser.add_argument('--mode', choices=list(Mode), default=Mode.CONCAT)
parser.add_argument('--limit', type=int, default=10,
                    help='limit the number of lines')
parser.add_argument('-v', dest='verbose', action='store_true')
# Parses arguments like:
# foo.txt bar.txt -v --limit 100
#
args = parser.parse_args()
# A list of already opened files passed as CLI args
args.files
# The mode enum
args.mode
# The limit passed (or the default)
args.limit
Learning how to use argparse is a key Python skill.
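parse_args also accepts an explicit list of strings, which is handy for trying a parser out without a real command line. A cut-down sketch of the parser above, with plain strings for files instead of FileType and without the --mode option (both simplifications made here so it runs on its own):

```python
from argparse import ArgumentParser

parser = ArgumentParser(description='processes some input files')
parser.add_argument('files', nargs='+', help='input files')
parser.add_argument('--limit', type=int, default=10,
                    help='limit the number of lines')
parser.add_argument('-v', dest='verbose', action='store_true')

# Pass the arguments as a list instead of reading sys.argv
args = parser.parse_args(['foo.txt', 'bar.txt', '-v', '--limit', '100'])
defaults = parser.parse_args(['foo.txt'])

print(args.files, args.limit, args.verbose)
print(defaults.files, defaults.limit, defaults.verbose)
```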
Python's standard library also makes it really simple to test your code. You have no excuse not to!
from my_package import Calculator
from unittest import main, TestCase
class TestCalculator(TestCase):
    def setUp(self):
        self.calculator = Calculator()
    def test_add(self):
        self.assertEqual(4, self.calculator.add(2, 2))
    def test_divide_by_zero(self):
        # Python's built-in exception for division by zero is ZeroDivisionError
        with self.assertRaises(ZeroDivisionError):
            self.calculator.divide(2, 0)
if __name__ == '__main__':
    main()
We'll combine everything that we've learned to build a tool that uses requests and bs4 to check if a person is still alive (according to Wikipedia).
$ python3 -m pip install pipenv
$ pipenv install requests bs4
- As with any skill, the only way to improve is to practice!
- Find small tasks that need automating or menial things and try to write a script to help you
- Try out Python libraries like Django, Flask, and PyTorch for more domain specific experience
- Watch talks by Raymond Hettinger (he's a Python core developer and a fantastic speaker)