markdown source builds
Auto-generated via {sandpaper}
Source  : 9446aa0
Branch  : main
Author  : Robert Chisholm <[email protected]>
Time    : 2024-02-08 10:33:45 +0000
Message : First Review Changes - Part 1 (#13)

Various improvements from first review of the content
* Typographic corrections
* Increased pred-prey profiling solution detail.
* Replaced links to download with manual anchors with download attribute
* Explain call-stack
* Screenshot of timeline profiler viztracer
* Additional figures to explain hardware architecture of a computer
* Corrected discussion of Python built-ins
* Added greater context to code provided for exercises
* Added greater detail to testing coverage
* Acknowledgements page
actions-user committed Feb 8, 2024
1 parent 94dc8e2 commit 3c282d1
Showing 19 changed files with 2,038 additions and 372 deletions.
20 changes: 20 additions & 0 deletions acknowledgements.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
---
title: Acknowledgements
---

**Funding**

The development of this course was funded by the [University of Sheffield](https://www.sheffield.ac.uk) to support researchers working with their [Stanage](https://docs.hpc.shef.ac.uk/en/latest/stanage/index.html#gsc.tab=0) HPC system.

**Authorship**

The initial materials were authored by [Robert Chisholm](https://www.sheffield.ac.uk/dcs/people/research-staff/robert-chisholm), with support from various colleagues within the university's [Research Software Engineering](https://rse.shef.ac.uk) and [Research IT](https://www.sheffield.ac.uk/it-services/research) teams.

Additional consulting was provided by James Kilbane, a close friend (and general rubber duck).

**Resources**

Most of the content was drawn from the education and experience of the author; however, the resources below provided inspiration:

* [High Performance Python, 2nd Edition](https://www.oreilly.com/library/view/high-performance-python/9781492055013/): This excellent book goes far deeper than this short course in explaining how to maximise performance in Python; it inspired the examples [memory allocation is not free](optimisation-memory.html#memory-allocation-is-not-free) and [vectorisation](optimisation-memory.html#memory-allocation-is-not-free).
* [What scientists must know about hardware to write fast code](https://viralinstruction.com/posts/hardware/): This notebook provides an array of hardware lessons relevant to programming for performance, which could similarly be found in most undergraduate Computer Science courses. Although the notebook is grounded in Julia, a lower-level language than Python, it refers to hardware, so many of the same lessons are covered in the [memory episode](optimisation-memory.html).
6 changes: 3 additions & 3 deletions config.yaml
Original file line number Diff line number Diff line change
@@ -64,16 +64,16 @@ episodes:
- profiling-lines.md
- profiling-conclusion.md
- optimisation-introduction.md
- optimisation-data-structures-algorithms.md
- optimisation-minimise-python.md
- optimisation-use-latest.md
- optimisation-memory.md
- optimisation-list-tuple.md
- optimisation-dict-set.md
- optimisation-minimise-python.md
- optimisation-conclusion.md

# Information for Learners
learners:
- setup.md
- acknowledgements.md

# Information for Instructors
instructors:
Binary file added fig/annotated-motherboard.jpg
1,540 changes: 1,540 additions & 0 deletions fig/hardware.ai


Binary file added fig/hardware.png
Binary file added fig/predprey_out.png
Binary file added fig/snakeviz-predprey-table.png
Binary file added fig/testsuite-dir.png
Binary file added fig/viztracer-example.png
20 changes: 10 additions & 10 deletions md5sum.txt
Original file line number Diff line number Diff line change
@@ -1,20 +1,20 @@
"file" "checksum" "built" "date"
"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-01-03"
"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-01-03"
"config.yaml" "71b7cc873eb97b0f2c6a1f8d878a817f" "site/built/config.yaml" "2024-01-29"
"config.yaml" "b413b2dfbce4f70e178cae4d6d2d6311" "site/built/config.yaml" "2024-02-08"
"index.md" "5d420b7de3ab84e1eda988e6bc4d58b4" "site/built/index.md" "2024-01-29"
"links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-01-03"
"episodes/profiling-introduction.md" "a043fb5f1f772b7415f32175810c5f1e" "site/built/profiling-introduction.md" "2024-01-29"
"episodes/profiling-functions.md" "85294a5fa905fc2ea9dd5068164aed40" "site/built/profiling-functions.md" "2024-01-29"
"episodes/profiling-lines.md" "639730e60d1dee7cfa6624a24de92abe" "site/built/profiling-lines.md" "2024-01-29"
"episodes/profiling-introduction.md" "a0163cbc57865b4fad063468ac4c0a41" "site/built/profiling-introduction.md" "2024-02-08"
"episodes/profiling-functions.md" "4ea67773010619ae5fbaa2dc69ecc4f6" "site/built/profiling-functions.md" "2024-02-08"
"episodes/profiling-lines.md" "8bd8cf015fcc38cdb004edf5fad75a65" "site/built/profiling-lines.md" "2024-02-08"
"episodes/profiling-conclusion.md" "340969a321636eb94fff540191a511e7" "site/built/profiling-conclusion.md" "2024-01-29"
"episodes/optimisation-introduction.md" "ae3baa53a96cab9c1aace409de6c7634" "site/built/optimisation-introduction.md" "2024-01-29"
"episodes/optimisation-use-latest.md" "33531063e2b4d3b473f3f066cea65a14" "site/built/optimisation-use-latest.md" "2024-01-29"
"episodes/optimisation-memory.md" "ae7bb4df0f5b640f6000d65c1ee145b1" "site/built/optimisation-memory.md" "2024-01-29"
"episodes/optimisation-list-tuple.md" "9e9a398923bf1137ce92fa6e78446746" "site/built/optimisation-list-tuple.md" "2024-01-29"
"episodes/optimisation-dict-set.md" "64b8261d0c29bea3135e48501e6f8b56" "site/built/optimisation-dict-set.md" "2024-01-29"
"episodes/optimisation-minimise-python.md" "efab1af49121b0a197dab94e49b6ff30" "site/built/optimisation-minimise-python.md" "2024-01-29"
"episodes/optimisation-introduction.md" "496655bd664412eacb982024994d60b0" "site/built/optimisation-introduction.md" "2024-02-08"
"episodes/optimisation-data-structures-algorithms.md" "75dbff01d990fa1e99beec4b24b2b0ad" "site/built/optimisation-data-structures-algorithms.md" "2024-02-08"
"episodes/optimisation-minimise-python.md" "4af3642c2a613a36d8d0ffb056225083" "site/built/optimisation-minimise-python.md" "2024-02-08"
"episodes/optimisation-use-latest.md" "829f7a813b0a9a131fa22e6dbb534cf7" "site/built/optimisation-use-latest.md" "2024-02-08"
"episodes/optimisation-memory.md" "52c4b2884410050c9646cf987d2aa50e" "site/built/optimisation-memory.md" "2024-02-08"
"episodes/optimisation-conclusion.md" "e4a79aa1713310c75bc0ae9e258641c2" "site/built/optimisation-conclusion.md" "2024-01-29"
"instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2024-01-03"
"learners/setup.md" "50d49ff7eb0ea2d12d75773ce1decd45" "site/built/setup.md" "2024-01-29"
"learners/acknowledgements.md" "c4064263d442f147d3796cb3dfa7b351" "site/built/acknowledgements.md" "2024-02-08"
"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-01-03"
Original file line number Diff line number Diff line change
@@ -1,25 +1,204 @@
---
title: "Dictionaries & Sets"
title: "Data Structures & Algorithms"
teaching: 0
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- What's the most efficient way to construct a list?
- When should Tuples be used?
- When should generator functions be used?
- When are sets appropriate?
- How are sets used in Python?
- What is the best way to search a list?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Able to identify appropriate use-cases for dictionaries and sets
- Able to use dictionaries and sets effectively
- Able to summarise how Lists and Tuples work behind the scenes.
- Able to identify appropriate use-cases for tuples.
- Able to use generator functions in appropriate situations.
- Able to utilise dictionaries and sets effectively
- Able to use `bisect_left()` to perform a binary search of a list or array

::::::::::::::::::::::::::::::::::::::::::::::::

## Lists

Lists are a fundamental data structure within Python.

They are implemented as a form of dynamic array, found in many programming languages under different names (C++: `std::vector`, Java: `ArrayList`, R: `vector`, Julia: `Vector`).

They allow direct and sequential element access, with the convenience of appending items.

This is achieved by internally storing items in a static array.
This array can, however, be longer than the list, so the current length of the list is stored alongside the array.
When an item is appended, the list checks whether it has enough spare space to add the item to the end.
If it doesn't, it will reallocate a larger array, copy across the elements, and deallocate the old array, before copying the new item to the end and incrementing the counter which tracks the list's length.

The amount the internal array grows by depends on the particular list implementation's growth factor.
CPython, for example, uses [`newsize + (newsize >> 3) + 6`](https://github.com/python/cpython/blob/a571a2fd3fdaeafdfd71f3d80ed5a3b22b63d0f7/Objects/listobject.c#L74), which works out to an over-allocation of roughly 12.5%.

![The relationship between the number of appends to an empty list, and the number of internal resizes in CPython.](episodes/fig/cpython_list_allocations.png){alt='A line graph displaying the relationship between the number of calls to append() and the number of internal resizes of a CPython list. It has a logarithmic relationship, at 1 million appends there have been 84 internal resizes.'}

This has two implications:

* If you are creating large static lists, they will use up to 12.5% excess memory.
* If you are growing a list with `append()`, there will be large amounts of redundant allocations and copies as the list grows.
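
The reallocation behaviour can be observed with `sys.getsizeof()`, which reports a list's current memory footprint. This is an illustrative sketch; the exact sizes and resize points vary between CPython versions:

```python
import sys

li = []
last_size = sys.getsizeof(li)
print(f"len=0 size={last_size}B")
for i in range(32):
    li.append(i)
    size = sys.getsizeof(li)
    if size != last_size:
        # A change in reported size means the internal array was reallocated
        print(f"len={len(li)} size={size}B")
        last_size = size
```

Only a handful of lines are printed: most appends fit within the spare capacity left by the previous reallocation.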

### List Comprehension

If creating a list via `append()` is undesirable, the natural alternative is to use a list comprehension.

List comprehensions can be twice as fast at building lists as using `append()`.
This is primarily because a list comprehension allows Python to offload much of the computation to faster C code.
General Python loops, in contrast, can be used for much more, so they remain as Python bytecode during execution, which has additional overheads.

This can be demonstrated with the below benchmark:

```python
from timeit import timeit

def list_append():
    li = []
    for i in range(100000):
        li.append(i)

def list_preallocate():
    li = [0] * 100000
    for i in range(100000):
        li[i] = i

def list_comprehension():
    li = [i for i in range(100000)]

repeats = 1000
# timeit() returns the total time in seconds across all repeats
print(f"Append: {timeit(list_append, number=repeats):.2f}s")
print(f"Preallocate: {timeit(list_preallocate, number=repeats):.2f}s")
print(f"Comprehension: {timeit(list_comprehension, number=repeats):.2f}s")
```

`timeit` is used to run each function 1000 times, giving the totals below:

```output
Append: 3.50s
Preallocate: 2.48s
Comprehension: 1.69s
```

Results will vary between Python versions, hardware and list lengths, but in this example the list comprehension was 2x faster, with pre-allocation faring in the middle. Although these are small differences per list, they can soon add up if you are regularly creating lists.

## Tuples

In contrast, Python's tuples are immutable static arrays (similar to strings); their elements cannot be modified, and they cannot be resized.

These two limitations greatly reduce their potential use-cases: they are only suitable for groups of immutable properties.

Tuples can still be joined with the `+` operator, similar to concatenating lists, however the result is always a newly allocated tuple (without a list's over-allocation).

Python caches a large number of short (1-20 element) tuples. This greatly reduces the cost of creating and destroying them during execution, at the cost of a slight memory overhead.

This can be easily demonstrated with Python's `timeit` module in your console.

```sh
>python -m timeit "li = [0,1,2,3,4,5]"
10000000 loops, best of 5: 26.4 nsec per loop

>python -m timeit "tu = (0,1,2,3,4,5)"
50000000 loops, best of 5: 7.99 nsec per loop
```

It takes 3x as long to allocate a short list as a tuple of equal length. This gap only grows with length, as the tuple's cost remains roughly static whereas the cost of allocating the list grows slightly.
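
The difference in memory layout can also be illustrated with `sys.getsizeof()` (exact byte counts vary between CPython versions and platforms):

```python
import sys

li = [0, 1, 2, 3, 4, 5]
tu = (0, 1, 2, 3, 4, 5)

# A tuple stores just a fixed-length header and its elements;
# a list additionally tracks its length and spare capacity.
print(f"list:  {sys.getsizeof(li)} bytes")
print(f"tuple: {sys.getsizeof(tu)} bytes")
```

The tuple of equal length is consistently the smaller of the two.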


## Generator Functions

You may not even require your data to be stored in a list or tuple if it is only accessed once and in sequence.

Generators are special functions that use `yield` rather than `return`. Each time a value is requested, the generator resumes computation until the next `yield` statement is reached, returning the next value.

This avoids needing to allocate a data structure, and can greatly reduce memory utilisation.

Common examples for generators include:

* Reading from a large file that may not fit in memory.
* Any generated sequence where the required length is unknown.
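
As a sketch of the first use-case (the file name and keyword in the commented usage are hypothetical), a generator can process a large file one line at a time:

```python
def read_matching_lines(path, keyword):
    """Yield lines from the file at path that contain keyword.

    Only one line is held in memory at a time, so this works
    for files far larger than the available RAM.
    """
    with open(path) as f:
        for line in f:
            if keyword in line:
                yield line.rstrip("\n")

# Hypothetical usage: the loop consumes matching lines lazily.
# for line in read_matching_lines("huge.log", "ERROR"):
#     print(line)
```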

The below example demonstrates how a generator function (`fibonacci_generator()`) differs from one that simply returns a constructed list (`fibonacci_list()`).

```python
from timeit import timeit

N = 1000000
repeats = 1000

def fibonacci_generator():
    a = 0
    b = 1
    while True:
        yield b
        a, b = b, a + b

def fibonacci_list(max_val):
    rtn = []
    a = 0
    b = 1
    while b < max_val:
        rtn.append(b)
        a, b = b, a + b
    return rtn

def test_generator():
    t = 0
    max_val = N
    for i in fibonacci_generator():
        if i > max_val:
            break
        t += i

def test_list():
    li = fibonacci_list(N)
    t = 0
    for i in li:
        t += i

def test_list_long():
    t = 0
    max_val = N
    li = fibonacci_list(max_val * 10)
    for i in li:
        if i > max_val:
            break
        t += i

print(f"Gen: {timeit(test_generator, number=repeats):.5f}s")
print(f"List: {timeit(test_list, number=repeats):.5f}s")
print(f"List_long: {timeit(test_list_long, number=repeats):.5f}s")
```

The performance of `test_generator()` and `test_list()` is comparable, however `test_list_long()`, which generates a list with 5 extra elements (35 vs 30), is consistently slower.

```output
Gen: 0.00251s
List: 0.00256s
List_long: 0.00332s
```

Unlike list comprehensions, a generator function will normally involve a Python loop. Therefore, its performance is typically slower than a list comprehension, where much of the computation can be offloaded to the CPython back-end.
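
Relatedly, generator *expressions* provide the same lazy evaluation with comprehension-like syntax. A small illustrative sketch (the function names are hypothetical):

```python
from timeit import timeit

# The list comprehension materialises all 100000 values before summing;
# the generator expression feeds sum() one value at a time.
def sum_list():
    return sum([i * i for i in range(100000)])

def sum_gen():
    return sum(i * i for i in range(100000))

print(f"List comprehension:   {timeit(sum_list, number=100):.3f}s")
print(f"Generator expression: {timeit(sum_gen, number=100):.3f}s")
```

Both produce identical results; the generator expression avoids allocating the intermediate list, trading a little speed for constant memory use.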

::::::::::::::::::::::::::::::::::::: callout

The use of `max_val` in the previous example moves the value of `N` from global to local scope.

The Python interpreter checks local scope first when resolving variables, so accessing local variables is slightly faster than accessing global ones. This is most visible when a variable is accessed regularly, such as within a loop.

Replacing the use of `max_val` with `N` inside `test_generator()` causes the function to consistently perform a little slower than `test_list()`, whereas before the change it would normally be a little faster.

:::::::::::::::::::::::::::::::::::::::::::::
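
The local-scope effect described above can be sketched in isolation. This is an illustrative benchmark (the function names are hypothetical, and timings will vary by machine):

```python
from timeit import timeit

N = 1000000

def use_global():
    t = 0
    for i in range(10000):
        if i < N:  # 'N' is looked up in global scope on every iteration
            t += 1
    return t

def use_local():
    max_val = N  # a single global lookup; 'max_val' is then a fast local
    t = 0
    for i in range(10000):
        if i < max_val:
            t += 1
    return t

print(f"Global: {timeit(use_global, number=1000):.3f}s")
print(f"Local:  {timeit(use_local, number=1000):.3f}s")
```

The local-variable version is usually slightly faster, though the margin is small and version-dependent.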


## Dictionaries

Dictionaries are another fundamental Python data-structure.
@@ -163,15 +342,15 @@ uniqueListSort: 2.67ms
:::::::::::::::::::::::::::::::::
:::::::::::::::::::::::::::::::::::::::::::::::

## Checking Existence
## Searching

Independent of the performance to construct a unique set (as covered in the previous), it's worth identifying the performance to search the data-structure to retrieve an item or check whether it exists.
Independent of the performance to construct a unique set (as covered in the previous section), it's worth identifying the performance to search the data-structure to retrieve an item or check whether it exists.

The performance of a hashing data structure is subject to the load factor and the number of collisions. An item that hashes with no collision can be checked almost directly, whereas one with collisions will probe until it finds the correct item or an empty slot. In the worst possible case, whereby all inserted items have collided, this would mean checking every single item. In practice, hashing data-structures are designed to minimise the chance of this happening, and most items should be found or identified as missing with a single access.

In contrast, if searching a list or array, the default approach is to start at the first item and check all subsequent items until the correct item has been found. If the correct item is not present, this will require the entire list to be checked. Therefore the worst case is similar to that of the hashing data-structure; however, it is guaranteed to occur whenever the item is missing. Similarly, on average we would expect an item to be found halfway through the list, meaning that an average search will require checking half of the items.

If the list or array is however sorted a binary search can be used. A binary search divides the list in half and checks which half the target item would be found in, this continues recursively until the search is exhausted whereby the item should be found or dismissed. This is significantly faster than performing a linear search of the list, checking `log N` items every time.
If however the list or array is sorted, a binary search can be used. A binary search divides the list in half and checks which half the target item would be found in, this continues recursively until the search is exhausted whereby the item should be found or dismissed. This is significantly faster than performing a linear search of the list, checking a total of `log N` items every time.

The below code demonstrates these approaches and their performance.

@@ -232,10 +411,13 @@ binary_search_list: 5.79ms

These results are subject to change based on the number of items and the proportion of searched items that exist within the list. However, the pattern is likely to remain the same. Linear searches should be avoided!
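
As a minimal sketch of the binary-search approach (the helper name is illustrative), the standard library's `bisect_left()` can be wrapped into a membership test; the list must be sorted first:

```python
from bisect import bisect_left

def binary_contains(sorted_list, item):
    """Return True if item is in sorted_list, using binary search."""
    i = bisect_left(sorted_list, item)  # leftmost insertion point for item
    return i != len(sorted_list) and sorted_list[i] == item

data = sorted([41, 7, 19, 3, 23])  # data must be sorted: [3, 7, 19, 23, 41]
print(binary_contains(data, 19))  # True
print(binary_contains(data, 20))  # False
```

Note that `bisect_left()` returns an insertion point, not a found index, so the element at that position must still be compared against the target.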


::::::::::::::::::::::::::::::::::::: keypoints

- List comprehension should be preferred when constructing lists.
- Where appropriate, Tuples and Generator functions should be preferred over Python lists.
- Dictionaries and sets are appropriate for storing a collection of unique data with no intrinsic order for random access.
- When used appropriately, dictionaries and sets are significantly faster than lists.
- If a list or array is used in-place of a set, it should be sorted and searched using `bisect_left()` (binary search).
- If searching a list or array is required, it should be sorted and searched using `bisect_left()` (binary search).

::::::::::::::::::::::::::::::::::::::::::::::::