On macOS, run brew install python. On Ubuntu, run sudo apt-get install python python-pip.
If you don't have csvkit, you'll need to install it:
pip2 install csvkit
The following program reads CSV data from STDIN, keeps only the rows where a purchase for water cost more than $1000, and writes those rows back to STDOUT.
#!/usr/bin/env python2
import csv
import sys

# read data from STDIN and split on each newline
data = sys.stdin.read().splitlines()
# use python's csv library to create a csv reader and a writer
reader = csv.DictReader(data)
writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
# write the header (first line of the csv)
writer.writeheader()
# loop through the rows in the original csv
for row in reader:
    # filter rows
    if row['PURPOSE'] == 'WATER' and float(row['AMOUNT']) > 1000:
        # write rows that match above filter
        writer.writerow(row)
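To try the filter logic without downloading the full file, the same condition can be wrapped in a function and run on a few in-memory lines. This is only a minimal sketch, not part of the assignment: the function name filter_rows and the sample rows are made up for illustration, and only the PURPOSE and AMOUNT columns come from the program above.

```python
import csv


def filter_rows(lines, purpose='WATER', minimum=1000):
    """Return the rows whose PURPOSE matches and whose AMOUNT exceeds minimum."""
    reader = csv.DictReader(lines)
    matches = []
    for row in reader:
        if row['PURPOSE'] == purpose and float(row['AMOUNT']) > minimum:
            matches.append(row)
    return matches


# made-up sample data; the real file has more columns than these two
sample = [
    'PURPOSE,AMOUNT',
    'WATER,1500.00',
    'WATER,20.00',
    'OFFICE SUPPLIES,3000.00',
]

for row in filter_rows(sample):
    print(row['PURPOSE'] + ' ' + row['AMOUNT'])
```

With the sample above, only the WATER row with an amount of 1500.00 passes both conditions. The other WATER row is too small and the OFFICE SUPPLIES row has the wrong purpose.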
- cd into your assignments directory
- save this program as filter.py
- make the script executable (hint: chmod)
- curl the 2017 quarter 1 expenditure file from https://projects.propublica.org/congress/assets/staffers/2017Q1-house-disburse-detail.csv and pipe it into ./filter.py
  note: you have to use the -N flag on curl
  curl -N "https://projects.propublica.org/congress/assets/staffers/2017Q1-house-disburse-detail.csv" | ./filter.py
- redirect the output into a file called expensive_water.csv
- pipe expensive_water.csv into csvstat and redirect that into a file called expensive_water_summary.txt
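csvstat does the summarizing for you, but it can help to know roughly what numbers to expect from it. As a rough sanity check (not a replacement for csvstat, whose report covers every column and is formatted differently), a few lines of Python can compute basic figures for the AMOUNT column. The function name summarize_amounts and the sample rows below are hypothetical.

```python
import csv


def summarize_amounts(lines):
    """Return the count, min, max and total of the AMOUNT column."""
    amounts = [float(row['AMOUNT']) for row in csv.DictReader(lines)]
    if not amounts:
        return {'count': 0}
    return {
        'count': len(amounts),
        'min': min(amounts),
        'max': max(amounts),
        'total': sum(amounts),
    }


# made-up stand-in for the contents of expensive_water.csv
sample = ['PURPOSE,AMOUNT', 'WATER,1500.00', 'WATER,2500.50']
print(summarize_amounts(sample))
```

To run it against the real output, an open file can be passed in place of the list, for example summarize_amounts(open('expensive_water.csv')), since csv.DictReader accepts any iterable of lines.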
- Modify the above program to get another subset of the data that is interesting to you.
- Redirect the output to a file called output.csv.
- Write a short description. It can be as short as two sentences.
- Put it in a file called description.txt.
- Slackcat the contents of description.txt to the #assignments slack channel: cat description.txt | slackcat -s -c assignments
- Pipe output.csv into csvstat and redirect that into a file called summary.txt.
- You will learn how to submit summary.txt and output.csv via github tomorrow.
- in order to pipe an existing file into csvstat, use cat to send the contents of the file to stdout first
- use ./filter.py to run the program
- remember the . refers to the current directory, so ./filter.py means run the filter.py script that is located in the current directory
- make sure filter.py has a shebang on top. the shebang is #!/usr/bin/env python2. without the shebang, the shell won't know how to execute your script
- use the 2017 quarter 1 file, quarter 2 might have some issues
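For the "modify the above program" step, one possible variation is sketched below. It keeps rows whose PURPOSE contains a keyword and skips rows whose AMOUNT cannot be parsed as a number. The keyword FOOD, the helper names parse_amount and filter_by_keyword, and the sample rows are all assumptions for illustration. Nothing above says which values PURPOSE takes besides WATER, or what the issues with the quarter 2 file actually are.

```python
import csv


def parse_amount(text):
    """Return text as a float, or None if it cannot be parsed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def filter_by_keyword(lines, keyword, minimum):
    """Keep rows whose PURPOSE contains keyword and whose AMOUNT exceeds minimum."""
    matches = []
    for row in csv.DictReader(lines):
        amount = parse_amount(row.get('AMOUNT'))
        if amount is None:
            continue  # skip rows with a missing or malformed amount
        if keyword in (row.get('PURPOSE') or '') and amount > minimum:
            matches.append(row)
    return matches


# made-up sample rows, including one with a malformed amount
sample = [
    'PURPOSE,AMOUNT',
    'FOOD & BEVERAGE,250.00',
    'FOOD & BEVERAGE,not-a-number',
    'WATER,1500.00',
]

for row in filter_by_keyword(sample, 'FOOD', 100):
    print(row['PURPOSE'] + ' ' + row['AMOUNT'])
```

Skipping unparseable rows keeps the script from crashing partway through a large file. The trade-off is that bad rows are dropped silently, so it may be worth counting them if the totals matter.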