Sprint Day One report - Brisbane #2

richyvk · 2017-06-01T06:51:39Z

Hi all

So, we've had a day of talking! We've deliberated a lot. We in Brisbane have concluded that the existing lesson is too Pandas-heavy, but the Software Carpentry gapminder lesson could work really well as the basis for the LC Python lesson.

So, handing over to whoever wants to take this up, or we'll be working on it tomorrow. We've imported the SC lesson into the data-lessons account; the repo URL is: https://github.com/data-lessons/library-python-intro

Stuff we think needs doing:

Remove the Pandas stuff from the lesson - we've deemed Pandas out of scope for this lesson.
Change wording and examples to be more library relevant in the rest of the lesson.

But, we figure a lot of it can pretty much stay as it is!

We have failed to come up with one single compelling 'superpower' example to run through the lesson. But, some more ideas we've had for examples (a lot of these might be useful for certain episodes):

Deleting rogue punctuation from an Excel file.
Comparing two sets of data for differences, eg two sets of article IDs - one held locally and one on a vendor database - and you want to know which are missing from each - ie sets.
Cleaning webpages of non-preferred language - eg you have a list of preferred terms for things (think branding etc) and you identify pages that use non-preferred alternatives to this language.
Cleaning dates in Excel.
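
As a rough illustration of the article ID comparison idea, here is a minimal sketch using Python sets. The IDs and variable names are made up for the example; in practice the two sets would be built from whatever local and vendor exports you have.

```python
# Hypothetical article IDs; real ones would be read from two exported files.
local_ids = {"A100", "A101", "A102", "A104"}
vendor_ids = {"A101", "A102", "A103", "A104"}

# Set difference: IDs held locally that the vendor does not list.
missing_from_vendor = local_ids - vendor_ids

# Set difference the other way: IDs the vendor lists that are not held locally.
missing_from_local = vendor_ids - local_ids

# Set intersection: IDs present in both places.
in_both = local_ids & vendor_ids

print("Missing from vendor:", sorted(missing_from_vendor))
print("Missing locally:", sorted(missing_from_local))
```

This only works when the IDs match exactly, so it suits identifiers better than free-text fields like titles.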

That's pretty much it from us for today. We'll get stuck in again tomorrow with editing the lesson. But go for it in the meantime if you want to!

libADS · 2017-06-01T07:34:43Z

"Comparing two sets of data for differences, eg two sets of article IDs - one locally and one on a vendor database - and you want to know which are missing form each - ie sets."

I just had to do this in the past few days, so this is a reasonable use case to me :)

drjwbaker · 2017-06-01T07:40:25Z

Pad for this at http://pad.software-carpentry.org/lc-new-python

richyvk · 2017-06-02T02:59:58Z

@libADS Can I ask how you did your comparing? Excel? Manually?

libADS · 2017-06-02T03:06:40Z

@richyvk Initially in Python, after loading the CSV files into memory. The tricky part was not necessarily having exact matches; for instance, titles might be spelled slightly differently between two files of articles. In the end I imported the data into a local Postgres instance, which lets me try different queries much faster. For fuzzy matching in Python I used:

from difflib import SequenceMatcher

def similar(a, b):
    # Similarity ratio between 0.0 (nothing in common) and 1.0 (identical)
    return SequenceMatcher(None, a, b).ratio()

Postgres has an extension to do this too.
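
For anyone wanting to try the fuzzy matching approach, here is a rough sketch of how a similarity function like the one above might be used to pair up titles across two lists. The `best_match` helper, the 0.85 threshold, the lowercasing step and the sample titles are all assumptions made for illustration, not what was actually run.

```python
from difflib import SequenceMatcher


def similar(a, b):
    # Lowercasing first is an added assumption so that case differences are ignored.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(title, candidates, threshold=0.85):
    """Return the candidate most similar to title, or None if none reaches threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similar(title, c))
    return best if similar(title, best) >= threshold else None


# Hypothetical titles: the first vendor title has a spelling slip.
local_titles = ["Introduction to Library Science", "Data Carpentry for Librarians"]
vendor_titles = ["Introduction to Libary Science", "Metadata Basics"]

for title in local_titles:
    print(title, "->", best_match(title, vendor_titles))
```

The threshold is a judgment call: set it too low and unrelated titles get paired, too high and near misses are lost, so it is worth checking a sample of matches by eye.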

sclayton29 mentioned this issue Jun 1, 2017

Updating language and examples to be more library relevant data-lessons/library-python-intro-DEPRECATED#4

Closed

drjwbaker mentioned this issue Jun 1, 2017

Report on Python work data-lessons/library-python-intro-DEPRECATED#6

Closed
