the_pythonic_way contains a collection of useful snippets and recommendations on how to program in a "Pythonic way". People who come to Python from other languages like Java, C# or R might find these snippets helpful. the_pythonic_way can be used as a comprehensive Python cheatsheet, where you can look up short answers and keywords for how to do things in Python, so that you can easily google further.
- Coding Styles (PEP 8, Google Style)
- Environment Management
- Pythonic Objects, Protocols, and ABCs (abc, UML)
- Data Structures
  - Tuple and Tuple Unpacking (tuple)
  - List and List comprehension
  - Deque, Synchronized Queue, Priority Queue, Counter
  - String (f-strings, string manipulations, regular expression)
  - Common Sequence Operations and Indexing
  - Dictionary and Dictionary comprehension (bunch)
  - Set
  - Enumeration (enum34)
  - Matrix with numpy (numpy, memmap, bcolz)
  - Dataframe with pandas (pandas)
- Numeric and Bit Operators
- Control Flows (compound statements)
- Functional Programming
- Modules
- Exceptions (throw exception)
- Debugging (pdb, ipdb)
- IO Operations
  - Csv, Json, and Yaml (csv, json, ujson, PyYAML)
  - Data Class (attrs, marshmallow, sqlalchemy)
  - Path operations (pathlib, os.path)
  - Pretty Print (tabulate, termcolor)
  - Progress bar (tqdm)
- Process and Parallel Processing
- Networking
- Testing
- CLI
- Dataclass
- Typing
- Scikit-learn (LabelEncoder)
- Jupyter (jupyter)
- Microservices (12 factor)
PEP 8 is highly recommended reading, at least once in your life. It lists the coding conventions accepted across the whole Python ecosystem. The Google Python Style Guide is also a very good and comprehensive reference.
import os  # lower_case for module/package names, as short as possible (abbreviations are fine)


class MyBag:  # CapitalizedWords for class names
    """Example class with types documented in the docstring.

    You should always use CapitalizedWords for class names.
    """
    MAX_OVERFLOW = 10  # ALL_CAPITALIZED for constants

    def __init__(self):
        self.data = []

    # Blank line between two methods (two blank lines around top-level definitions)
    def add_x(self, x):
        """Example of a method, which should have a lower_case_with_underscores name.

        Args:
            x (int): The first parameter.

        Returns:
            bool: True if successful, False otherwise.
        """
        self.data.append(x)
        return True

    def _hidden_add(self, x):
        """Use a _single_leading_underscore as a weak "internal use" indicator."""
        pass


# Use 4 spaces per indentation level
# Limit all lines to 79 characters
# Use UTF-8 file encoding

# With a hanging indent: no arguments on the first line, and add further
# indentation so the arguments stand out from the function body.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)


var_one, var_two, var_three, var_four = 1, 2, 3, 4
# Otherwise, align continuation lines with the opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# The closing brace/bracket/parenthesis on multi-line constructs may line up
# under the first non-whitespace character of the last line of the list:
my_list = [
    1, 2, 3,
    4, 5, 6,
    ]
You can define an .editorconfig file at the root folder of your Python project. It is recognized by many common editors and IDEs, such as PyCharm, and by GitHub. These tools automatically apply the rules defined in the .editorconfig file. A common setting for Python is:
root = true
[*]
charset = utf-8
indent_style = space
indent_size = 4
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.{html,js,css,json,yml,yaml}]
indent_size = 2
[Makefile]
indent_style = tab
You can use Pylint to analyze your code and check its quality.
pip gives you access to thousands of battle-tested Python libraries. It is a big part of what makes Python so convenient and powerful. In a Python project, you usually find a requirements.txt file, which lists all the Python packages used in the project.
Flask>=0.12.2
flask-restplus>=0.10.1
PyYAML>=3.12
requests>=2.18.4
pyOpenSSL>=17.3.0
cryptography==1.9
All the requirements can be installed by using pip:
pip install -r requirements.txt
conda is a highly recommended way to manage Python environments. You can create a separate Python environment for each of your projects. Installing packages with conda install can also be more robust than using pip, since conda installs pre-compiled binary packages, including their non-Python dependencies.
Example of how to define a conda environment in a conda_environment.yml file:
name: ProjectChangeTheWorld
dependencies:
  - python=2.7.13
  - pip  # needed for the pip: section below
  - pip:
    - Flask>=0.12.2
    - flask-restplus>=0.10.1
    - PyYAML>=3.12
    - requests>=2.18.4
    - pyOpenSSL>=17.3.0
    - cryptography==1.9
How to create or update a conda environment from the conda_environment.yml file, and activate it:
conda env update --file conda_environment.yml
conda activate ProjectChangeTheWorld
The three most popular and important built-in data structures in Python are tuple, list and dict; list and dict also come with an elegant comprehension syntax. Next come str and set. A generator is a powerful and elegant way to iterate lazily over a stream of values without storing them all in memory. numpy arrays are the most important data structure for scientific computing that deals with matrix operations. For data scientists, pandas and its DataFrame are the most popular way to represent a data table.
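As a minimal sketch of the generator idea above, the hypothetical `squares` function yields values one at a time instead of building a whole list in memory; a generator expression gives the same lazy behaviour inline.

```python
from typing import Iterator


def squares(n: int) -> Iterator[int]:
    """Yield the squares of 0, 1, ..., n-1, one at a time."""
    for i in range(n):
        yield i * i  # execution pauses here until the next value is requested


gen = squares(5)   # nothing is computed yet
first = next(gen)  # runs the body up to the first yield: 0
rest = list(gen)   # consumes the remaining values: [1, 4, 9, 16]

# A generator expression is the lazy counterpart of a list comprehension
total = sum(x * x for x in range(5))  # 30, without building an intermediate list
```

Once a generator is exhausted it stays empty, so call `squares(5)` again if you need a second pass.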
A tuple is an immutable sequence, typically used to store a collection of heterogeneous data (data of different types). Tuples are an important concept in functional programming, where values are immutable. A tuple can be used as a flexible version of a C struct, and it is commonly used to pack multiple results returned by a function. Note that tuple is a sequence type, so all common sequence operations and indexing can be used.
tp = ('a', 1, 'b', 2)   # declare a tuple
tp = 'a', 1, 'b', 2     # you do not need parentheses to declare a tuple
a, av, b, bv = tp       # unpack a tuple into variables, useful
def f(tp):              # concatenate the strings at even indices
    return (''.join(tp[::2]), sum(tp[1::2]))  # and sum the numbers at odd indices
name, total = f(tp)     # unpack the tuple returned by a function: ('ab', 3)
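Two related idioms, shown as a small sketch: `collections.namedtuple` gives tuple fields a name, which is closer to the C struct comparison above, and starred assignment unpacks a tuple whose length you do not fix in advance. The `Point` type is purely illustrative.

```python
from collections import namedtuple

# A named tuple is still an immutable tuple, but fields are also accessible by name
Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)
x, y = p                  # unpacks like a plain tuple
moved = p._replace(x=10)  # "modifying" returns a new tuple; p is unchanged

# Extended unpacking: the starred name collects the remaining items into a list
first, *middle, last = (1, 2, 3, 4, 5)

# Swapping two variables is just tuple packing followed by unpacking
a, b = 1, 2
a, b = b, a
```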
Lists are mutable sequences, typically used to store collections of homogeneous items (items of the same data type).
## Declare list
xs = [1, 2, 3, 4, 5] # declare a list
ys = list(range(5))    # convert a range, generator, tuple, ... to a list with list(.)
xts = tuple(xs)        # convert a list into a tuple, which is immutable and hashable (usable as a dict key)
a,b,c = [1, 2, 3] # unpack list to multiple variables; number of variables must equal length of list
## List comprehension
[f(x) for x in xs] # map function f(.) to each element of xs. xs can be any sequence like list, string, tuple.
[f(x) for x in xs if is_g(x)] # filter with is_g(.) and map f(.) to each list element
[f(x) for xs in xss for x in xs] # map function f(.) to each element in a list_of_list and flatten it
[[f(x) for x in xs] for xs in xss] # nested list comprehension, apply f(.) to each element of list_of_list, preserving structure
[f(x, y) for (x, y) in zip(xs, ys)] # map a function f(.,.) over pairs of elements taken from xs and ys
## Modify list
xs.append(6) # append 6 to the end of list xs
xs.extend(ys) # append all elements of ys to the end of xs
xs.insert(0, 6) # insert 6 to position 0 of xs
xs.pop() # remove last element of xs and return its value
xs.pop(2) # remove element at index 2 of xs
xs.reverse() # reverse the elements of xs (in-place)
## Sort a list
sorted(xs, reverse=True) # return a sorted copy of xs (descending if reverse=True, ascending by default)
[x[0] for x in sorted(enumerate(xs), key=lambda tup:tup[1])] # return the indices that would sort xs (argsort)
## Unique list
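list(set(xs)) # return a new list with the unique elements of xs (original order is not preserved)
list(dict.fromkeys(xs)) # unique elements of xs, keeping first-seen order (Python 3.7+)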
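A small runnable version of the sorting and deduplication lines above, using made-up sample data. Note the difference between `sorted(.)`, which returns a new list, and `list.sort()`, which sorts in place and returns None.

```python
scores = [30, 10, 20, 10]

desc = sorted(scores, reverse=True)  # new list, descending: [30, 20, 10, 10]
scores_copy = scores.copy()
result = scores_copy.sort()          # sorts in place and returns None

# "argsort": indices that would sort the list (sorting is stable, so ties keep their order)
order = [i for i, _ in sorted(enumerate(scores), key=lambda tup: tup[1])]

unique_in_order = list(dict.fromkeys(scores))  # keeps the first occurrence of each value

# key= accepts any function, e.g. sort words by length, then alphabetically
words = sorted(['pear', 'fig', 'apple', 'kiwi'], key=lambda w: (len(w), w))
```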
In Python, str is a sequence of characters, which means all common sequence operations can be used. Here I only list string-specific operations: string formatting, string manipulation, and regular expressions.
Python supports multiple ways to format text strings, including %-formatting, str.format(), and string.Template. But in most cases, the best way to format strings in Python is Literal String Interpolation, usually referred to as f-strings.
# String format
st = 'one'; n_1 = 2; n_2 = 3.0
'%s %d %f' % (st, n_1, n_2) # %-formatting
'{} {} {}'.format('one', 2, 3.0) # str.format()
f'{st} {n_1} {n_2}' # Literal String Interpolation, or f-strings
# String manipulation
' spacious '.strip() # Remove leading and trailing spaces
st.startswith('on')              # True: check if st starts with 'on'
st.startswith('ne', 1)           # True: an optional start (and end) index can be given
''.join(['1', '22', 'xx'])       # join a list of strings into a single string
', '.join(['1', '22', 'xx'])     # join a list of strings with ', ' as separator
'1,2,3'.split(',')               # split a string using ',' as separator
'1,2,3'.split(',', maxsplit=1)   # split at most maxsplit times using ',' as separator (returns ['1', '2,3'])
st.replace('ne', 'NE')           # return a copy of the string with all occurrences of 'ne' replaced by 'NE'
# String search
# Find the index where a substring first begins in a string
str1 = "this is string example....wow!!!"
str2 = "exam"
str1.find(str2)          # lowest index in str1 where str2 is found (15), or -1 if not found
str1.find(str2, 10, 20)  # same, but search only in str1[10:20]; find() takes positional arguments only
str1.index(str2)         # like find(), but raises ValueError when str2 is not found
# Regular Expression
import re
text = "He was carefully disguised but captured quickly by police."
re.findall(r"\w+ly", text) # find all occurrences of a pattern (return ['carefully', 'quickly'])
for m in re.finditer(r"\w+ly", text):  # find all matches with their positions
    print(f'{m.start()}-{m.end()}: {m.group(0)}')
re.search(r"\w+ly", text)     # return a match object for the first match of the pattern (or None)
                              # the match object has info about where the match occurs
re.sub(r"\w+ly", '__', text)  # replace all occurrences of the pattern with the string '__'
re.sub(r"\w+ly", f, text)     # replace each match m with f(m); f receives a match object and returns a string
re.split(r"but", text) # split string into list of strings using any substring that satisfies the pattern
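Beyond plain substitution, f-strings accept the same format specifications as str.format() after a colon, which covers most day-to-day formatting needs. A short sketch with arbitrary values (the `=` specifier requires Python 3.8+):

```python
price, count, big, name = 3.14159, 42, 1234567, 'widget'

two_decimals = f'{price:.2f}'   # fixed-point with 2 decimals: '3.14'
zero_padded = f'{count:05d}'    # zero-padded to width 5: '00042'
grouped = f'{big:,}'            # thousands separator: '1,234,567'
right_aligned = f'{name:>8}|'   # right-aligned in a field of width 8: '  widget|'
percent = f'{0.25:.0%}'         # percentage with no decimals: '25%'
with_repr = f'{name!r}'         # use repr() instead of str(): "'widget'"
debug = f'{count=}'             # self-documenting expression: 'count=42'
```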
All sequence data structures, including tuple, list, and string, accept the following common sequence operations.
xs = [3, 5, 7, 9] # xs is a sequence, can be list, tuple, or string
ys = [2, 4, 6, 8]
x = 1 # x is an item
x in xs # check if x is in xs
x not in xs # check if x is not in xs
xs.count(x) # number of items equal to x in xs
xs + ys                   # create a new sequence by concatenating xs and ys
xs*3                      # create a new sequence by concatenating xs with itself 3 times
len(xs); min(xs); max(xs) # length, min, and max value of xs
# Indexing
xs[1]                     # item at index 1 (the second item); indexing always starts at 0
xs[0:2]                   # slice of xs from index 0 up to (not including) 2
xs[0:3:2]                 # slice of xs from index 0 up to 3 with step 2 (indices 0 and 2)
xs[::2]                   # all items at even index
xs[1::2]                  # all items at odd index
xs[:-1]                   # all items except the last
xs.index(x, 1, 3)         # index of the first occurrence of x within xs[1:3]; raises ValueError if not found
#Looping
[f(x) for x in xs] # map function f(.) to each list element
[f(idx, x) for idx, x in enumerate(xs)] # use enumerate(.) to loop through elements of a sequence with index
#Unpacking
a, b, c = xs # unpack elements of xs into a, b, and c; len(xs) must equal 3
f(*xs) # the single star operator unpacks the sequence xs into positional arguments of f(.)
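Slicing follows the half-open convention `xs[start:stop:step]`: `start` is included, `stop` is excluded, and negative indices count from the end. A short check on the same `xs` as above:

```python
xs = [3, 5, 7, 9]

head = xs[0:2]            # indices 0 and 1: [3, 5]
last = xs[-1]             # last item: 9
last_two = xs[-2:]        # last two items: [7, 9]
reversed_copy = xs[::-1]  # reversed copy: [9, 7, 5, 3]
shallow = xs[:]           # shallow copy: equal to xs, but a different object
sub = 'python'[1:4]       # slicing works the same on strings: 'yth'

ys = xs[:]
ys[1:3] = [0]             # on lists, slice assignment can change the length: [3, 0, 9]
```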
{x: x for x in xs}              # create a dict from a list (dict comprehension)
dct = {'a': 1, 'b': 2}
[(k, dct[k]) for k in dct]      # create a list of tuples from a dict (same as list(dct.items()))
for key, value in dct.items():  # loop through keys and values
    print(key, value)
# Use `bunch` for a dict with attribute-style access; it is JSON- and YAML-friendly
from bunch import Bunch
b = Bunch()
b.hello = 'world'
b['hello'] += "!"               # b.hello is now 'world!'
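For plain dicts, the standard library's `collections.Counter` (listed in the table of contents) and `collections.defaultdict` cover most counting and grouping tasks. A sketch with made-up words:

```python
from collections import Counter, defaultdict

words = ['apple', 'avocado', 'banana', 'apple', 'cherry']

lengths = {w: len(w) for w in words}           # dict comprehension; duplicate keys collapse
inverted = {v: k for k, v in lengths.items()}  # swap keys and values (later keys win on clashes)
missing = lengths.get('durian', 0)             # default value instead of a KeyError

counts = Counter(words)                        # count occurrences of hashable items
top = counts.most_common(1)                    # the most frequent item with its count

by_letter = defaultdict(list)                  # missing keys start as an empty list
for w in words:
    by_letter[w[0]].append(w)                  # group words by their first letter
```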
#Use enum34
#TODO
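Since Python 3.4 the `enum` module is part of the standard library (enum34 is its backport for older versions). A minimal sketch with a hypothetical `Color` enumeration; `auto()` needs Python 3.6+:

```python
from enum import Enum, auto


class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = auto()  # auto() picks the next value, here 3


c = Color.GREEN
by_value = Color(1)         # look up a member by value: Color.RED
by_name = Color['BLUE']     # look up a member by name: Color.BLUE
members = [m.name for m in Color]  # iteration follows definition order
```

Members are singletons, so compare them with `is` (or `==`), never with their raw values.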
#TODO: Use numpy
#TODO: Use pandas
#TODO: if, while, for
#TODO: def, lambda, * star o
#TODO: import, isort, PYTHONPATH
In Python, we usually prefer raising exceptions to returning error codes.
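A minimal sketch of that style, with hypothetical names (`InsufficientFundsError`, `withdraw`, `try_withdraw`): the function raises a specific exception, and the caller catches exactly that exception instead of checking a return code.

```python
class InsufficientFundsError(Exception):
    """Raised when a withdrawal exceeds the available balance."""


def withdraw(balance, amount):
    """Return the new balance; raise instead of returning an error code."""
    if amount > balance:
        raise InsufficientFundsError(f'cannot withdraw {amount} from {balance}')
    return balance - amount


def try_withdraw(balance, amount):
    """Return (new_balance, error_message); the caller decides how to handle failure."""
    try:
        new_balance = withdraw(balance, amount)
    except InsufficientFundsError as ex:  # catch the specific error, not every Exception
        return balance, str(ex)
    else:                                 # runs only if no exception was raised
        return new_balance, None
```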
Use the logging package when you catch an exception, so that you get the full exception traceback:
import logging
try:
    1 / 0  # change the world (here: an operation that fails)
except Exception:
    logging.exception('Failed to change the world')  # logs at ERROR level, including the traceback
You can set up a custom logger (with file or stream handlers) like this:
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s:%(name)s:%(message)s')
file_handler = logging.FileHandler('sample.log')
file_handler.setLevel(logging.ERROR)
file_handler.setFormatter(formatter)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.addHandler(stream_handler)
...
logger.info("Everything is working fine, sir!")
try:
    ...
except Exception:
    logger.exception("Alert, Alert!")  # call inside an except block so the traceback is included
Create a pdb breakpoint by putting this line wherever you like:
import pdb; pdb.set_trace()
# Python 3.7+: the built-in breakpoint() does the same
I usually prefer to experiment with code in Jupyter. You can use ipdb to debug in Jupyter. To trigger a debugging session in Jupyter, add the following line to your code where you want to start debugging:
from IPython.core.debugger import set_trace; set_trace()
json, yaml
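A minimal sketch of the standard-library `json` and `csv` modules, round-tripping a small made-up record set through in-memory strings (PyYAML offers a similar `yaml.safe_load`/`yaml.safe_dump` pair, not shown here):

```python
import csv
import io
import json

rows = [{'name': 'Alice', 'age': 24}, {'name': 'Bob', 'age': 19}]

# JSON: dumps/loads work on strings, dump/load on open file objects
text = json.dumps(rows, indent=2)
restored = json.loads(text)

# CSV: DictWriter/DictReader map rows to dicts; open real files with newline=''
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'age'])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)                            # rewind before reading back
read_back = list(csv.DictReader(buffer))  # note: CSV values come back as strings
```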
You can either use the os package:
import os
# Join paths
os.path.join('path/the/dir', 'subdir/inside')
# Change the working directory to the script's directory, so that opening files works with relative paths
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)
Or you can use pathlib, which is a much more convenient way to deal with paths and files in Python.
from pathlib import Path
in_file_1 = Path.cwd() / "in" / "input.xlsx" #No need for complicated os.path.join(.)
[file for file in Path.cwd().iterdir() if not file.is_dir() ] #Easy to iterate a directory (cwd in this case)
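A few more `pathlib` operations, shown as a sketch inside a temporary directory so it does not touch your files; the file names are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    report = root / 'out' / 'report.txt'
    report.parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    report.write_text('hello\n')                      # create or overwrite a text file

    content = report.read_text()
    parts = (report.name, report.stem, report.suffix)
    txt_files = sorted(p.name for p in root.rglob('*.txt'))  # recursive glob
    exists_inside = report.exists()

exists_after = report.exists()  # the temporary directory has been removed by now
```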
Use tabulate
to pretty print tabular data, and termcolor
to colorize terminal output.
#Pretty print for tabular data, use tabulate
from tabulate import tabulate
print(tabulate({"Name": ["Alice", "Bob"], "Age": [24, 19]}, headers="keys"))
#Color for terminal output, to easily get attention, use termcolor
from termcolor import colored, cprint
text = colored('Hello, World!', 'red', attrs=['reverse', 'blink']); print(text)
cprint('Hello, World!', 'green', 'on_red')
print_red_on_cyan = lambda x: cprint(x, 'red', 'on_cyan')
print_red_on_cyan('Hello, World!')
Instantly make your loops show a smart progress meter. Just wrap any iterable with tqdm(iterable), and you’re done!
from tqdm import tqdm
for i in tqdm(range(10000)):
    ...
# Use setproctitle to set the title of the Python process; useful with ps to see which process to kill.
from setproctitle import setproctitle, getproctitle
setproctitle('my_process_title')  # placeholder title
getproctitle()  # return the current process title
The recommended way to spawn multiple processes in Python is to use the multiprocessing package.
import multiprocessing


def worker(num):
    """Process worker function."""
    print('Worker:', num)


if __name__ == '__main__':  # always guard with __main__ to prevent recursive spawning
    p1 = multiprocessing.Process(target=worker, args=(1,))
    p2 = multiprocessing.Process(target=worker, args=(2,))
    p1.start()
    p2.start()
    p1.join()  # wait for worker 1
    p2.join()  # wait for worker 2
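When the same function has to be applied to many inputs, `multiprocessing.Pool` distributes the work over a pool of worker processes. A minimal sketch; `square` and `parallel_squares` are hypothetical stand-ins for real CPU-bound work, and the worker function must be defined at module top level so it can be pickled:

```python
import multiprocessing


def square(x):
    """Stand-in for a CPU-bound task; must be a top-level function to be picklable."""
    return x * x


def parallel_squares(values, processes=2):
    """Apply square() to every value using a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(square, values)  # results keep the input order


if __name__ == '__main__':  # required where new processes re-import the main module
    print(parallel_squares(range(10)))
```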
As one of the most downloaded packages on PyPI, requests has everything you need when interacting with REST APIs:
import requests
params = {'param_1': 1, 'param_2': 2}
resp = requests.get('https://api.github.com/events', params, auth=(username, password))
resp = requests.post('http://httpbin.org/post', data = {'key':'value'})
resp = requests.put('http://httpbin.org/put', data = {'key':'value'})
resp = requests.delete('http://httpbin.org/delete')
resp = requests.head('http://httpbin.org/get')
resp = requests.options('http://httpbin.org/get')
flask is one of the most common ways to create a microservice in Python, and it has many extensions that you can add to your service. A popular one is flask-restplus, which automatically documents your API and gives you a Swagger page for free.
You can also try connexion, which is built on top of flask and follows the "API design first" principle.
from flask import Flask
app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello World!"


if __name__ == "__main__":
    app.run()
#sklearn.preprocessing.LabelEncoder
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit([1, 2, 2, 6])
le.classes_
# array([1, 2, 6])
le.transform([1, 1, 2, 6])
# array([0, 0, 1, 2]...)
le.inverse_transform([0, 0, 1, 2])
# array([1, 1, 2, 6])
# Include plots from matplotlib in the Jupyter notebook
%matplotlib inline
# Enable more aggressive (greedy) Tab autocompletion
%config IPCompleter.greedy=True
#TODO: Jupyter Widget!
I use Jupyter and Vim 8 together to program. I first experiment with my code in Jupyter and then reorganize it using Vim 8. Code can be debugged in Jupyter with ipdb. To keep sessions alive, I use tmux. I set up Vim 8 similarly to what is introduced here: Use VIM as a python IDE
Useful shortcuts in VIM. Some I configured myself, some are default behavior of VIM or its plugins.
Shortcut | Function |
---|---|
"+ | Select + (system clipboard) as vim register. Needs vim-gtk |
Ctrl-c ; Ctrl-p | Copy to and paste from the system clipboard |
tab | Indent the whole line with spaces, even in insert mode |
shift-tab | Remove one level of indentation from the whole line |
space | Toggle code fold |
Ctrl-y | Run YAPF for code reformatting |
bb | In insert mode, insert a breakpoint |
\d | Go to code definition |
\r | Rename |
Ctrl-o | Go to previous location |
Ctrl-space | Autocomplete with jedi-vim |
:Isort | Run isort to sort and clean the import section |
F3 | Run AutoFormat (calls autopep8) |
Vim search pattern in all files: :vimgrep /pattern/ ./**/*.py | cw
Note: To use tmux-yank while running tmux over an SSH connection, you must have X11 forwarding enabled. Add the -X option to the ssh command, and make sure the SSH server has X11 forwarding enabled.