This is the information and code for the Python North West workshop, held on 19th May 2022.
Before starting, ensure you have a suitable Python environment and a code editor to hand. We'll also be SQLite Browser for our SQL queries and viewing the SQLite database. There are install instructions on the website.
Next up, download the GoodReads SnakeReads book database.
For the first part of the workshop we'll be learning how to explore the database and insert new data.
In the second part of the workshop we'll take what we've learned and create a Python script to read from the database and insert data.
We'll be learning about the main "CRUD" operations and exploring the SnakeReads database.
First, let's try some querying or reading data
- Open the database
- Go to the "Execute SQL" tab
SELECT book_id, title, authors FROM books WHERE original_publication_year = 1937;
SELECT book_id, title, authors, original_publication_year
FROM books
WHERE authors like '%iain%banks%'
SELECT
book_id, title, authors, original_publication_year
FROM books
WHERE authors like '%iain%banks%'
ORDER BY original_publication_year;
SELECT
book_id, title, authors, original_publication_year
FROM books
WHERE authors like '%iain%banks%'
ORDER BY original_publication_year
LIMIT 5
Now lets try adding some data. This data set is missing python books!
- Replace that entered SQL with
INSERT INTO books(authors, title, original_title, original_publication_year, isbn13)
VALUES ('Luciano Ramalho', 'Fluent Python', 'Fluent Python', 2015, '9781491946008')
;
Feel free to replace those things with your own favourite python book, or other book!
UPDATE statements - changing something. Let's give Luciano his full name:
UPDATE books
SET authors = 'Luciano Gama de Sousa Ramalho'
WHERE title='Fluent Python'
We can also delete out data. Let's say user_id 4000 has left our site:
DELETE FROM ratings WHERE user_id = 4000;
Link tables together to get more information using JOIN statements.
Which books are on user #315's to read pile?
SELECT book_id FROM to_read WHERE user_id = 315;
... whole bunch of IDs, not so helpful. We could look them up 'by hand', as it were:
SELECT
book_id,
title,
authors
FROM books
WHERE book_id IN (
54,
374,
1758,
1941,
2218,
6082
)
But a join does this for us!
SELECT
books.book_id,
title,
authors
FROM to_read
LEFT JOIN books ON books.book_id = to_read.book_id
WHERE user_id = 315
How many books are on people's to read pile?
SELECT count(*) FROM to_read;
and for the hobbit? You might recall from ealier it has a book_id
of 7
SELECT count(*) FROM to_read WHERE book_id = 7;
What else can we do? Lets go to the ratings table:
SELECT avg(rating) FROM ratings;
Not so useful, this is our average rating across all books. It's quite high (almost 4!) but this is the "best books" extract.
So lets use GROUP BY
to compute it per book (And a LIMIT
clause to get the top 5):
SELECT
book_id,
avg(rating) as avg_rating
FROM ratings
GROUP BY book_id
ORDER BY avg_rating DESC
LIMIT 5;
This is perhaps still not very useful, it's just the book_id
But we have ids so, we can join!
-- this is a common table expression
-- that same query within a query
with best_books as (
SELECT
book_id,
avg(rating) as avg_rating
FROM ratings
GROUP BY book_id
ORDER BY avg_rating DESC
LIMIT 5
)
SELECT
books.book_id,
books.title,
books.authors,
avg_rating
FROM best_books
LEFT JOIN books ON best_books.book_id = books.book_id;
Or more complicated queries
with total_ratings as (
SELECT book_id,
count(*) as total_reviews,
count(case when rating = 5 then 1 end) as five_star -- `count` counts the number of non-null rows, and `case` emits null if there's no match
FROM ratings
GROUP BY book_id
), best_books as (
SELECT
book_id,
five_star,
total_reviews,
five_star * 1.0 / total_reviews as prop_five_star
FROM total_ratings
)
SELECT title, authors, five_star, total_reviews, prop_five_star
FROM best_books
LEFT JOIN books ON best_books.book_id = books.book_id
ORDER BY prop_five_star DESC
LIMIT 5
;
We'll be using the sqlite3
module to talk to our database, which is part of the Python standard library. The docs are here
(Note: you may want to explore using the third-party pysqlite3 library, as this has some nice additional features. It can be installed via pip install pysqlite3-binary
. More information can be found here
Use this code to setup a connection to your books database.
If you run the code as currently implemented, it shows an example of transactions. If this is run on a fresh download of the database it should produce the following output:
== Inserting book (Architecting Python) ================================
New book id: 9983
to_read table (after the first insert):
row['book_id']=9983 row['user_id']=100000
== Inserting book with an error (Fluent Python)===========================
New book id: 9984
to_read table (just before the error, still in the transaction):
row['book_id']=9983 row['user_id']=100000
row['book_id']=9984 row['user_id']=100000
Both our books are in the to read pile
Ooops, we failed!
== Querying after the error, outside the transaction =======================
to_read table (just after the error):
row['book_id']=9983 row['user_id']=100000
Books table:
9983 Architecture Patterns With Python Harry J.W. Percival, Bob Gregory