Word-bothering made easy: quick text explorations with Python
- Do it the way you can remember how to do it, not the nicest possible way
- Try stuff with list and string manipulation before getting fancy
- Put your English dictionaries in Python dictionaries for much faster searches
- If something gets complicated, save it for another day. Algorithm works on words but not sentences? Let's play with words today.
- If something isn't quite right, is it worth the trouble to fix it? My cmudict lookup only gets the first pronunciation of each word. ¯\_(ツ)_/¯ Good enough. My list of fishes stolen from Wikipedia includes some things that aren't fishes. ¯\_(ツ)_/¯ >< o>
- Not every exploration comes up with anything interesting.
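The dictionary tip and the "first pronunciation only" shortcut fit in a few lines. This is a minimal sketch, assuming the plain-text cmudict layout where each line is a word followed by its phonemes, comment lines start with `;;;`, and alternate pronunciations are marked like `READ(1)`. The sample lines and the helper name `load_first_pronunciations` are made up for illustration; swap in the real file when you have it.

```python
# A few lines in the style of the cmudict text file (illustrative sample only).
SAMPLE = """\
;;; comment lines start with three semicolons
FISH  F IH1 SH
READ  R EH1 D
READ(1)  R IY1 D
WORD  W ER1 D
"""

def load_first_pronunciations(lines):
    """Map each lowercase word to the phoneme list of its first pronunciation."""
    pronunciations = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blanks and comments
        word, *phonemes = line.split()
        if "(" in word:
            continue  # alternate pronunciation such as READ(1): good enough to skip
        pronunciations.setdefault(word.lower(), phonemes)
    return pronunciations

pron = load_first_pronunciations(SAMPLE.splitlines())

# Dictionary lookups are hash lookups, so they stay fast however many words you load.
print(pron["read"])
print("fish" in pron)
```

Checking `"fish" in pron` against a dictionary does not scan the whole wordlist the way `in` on a plain list does, which is where the speedup comes from.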
- The CMU Pronouncing Dictionary is my favorite wordlist for word-bothering: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
- PronouncingJS is a handy JavaScript wrapper for the CMU Pronouncing Dictionary: https://github.com/aparrish/pronouncingjs
- NLTK is a natural language toolkit for Python: http://www.nltk.org/
- PATTERN is another natural language toolkit for Python: http://www.clips.ua.ac.be/pattern
- RITA - language libraries for Java and JavaScript https://rednoise.org/rita/
- WORDNIK - online API, usable from the browser or through an API library https://www.wordnik.com/
- KIMONO LABS - scrapes web sites and extracts info for you https://www.kimonolabs.com/
- SOWPODS - Scrabble dictionary https://code.google.com/p/scrabblehelper/source/browse/trunk/ScrabbleHelper/src/dictionaries/
- DARIUS' CORPORA - all kinds of lists of things https://github.com/dariusk/corpora
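A plain wordlist like the ones above goes a long way with nothing but string and dictionary manipulation. Here is a rough sketch of an anagram finder, assuming a one-word-per-line list has already been read into a Python list. The tiny `WORDS` sample and the function names are stand-ins for illustration, not part of any of these libraries.

```python
from collections import defaultdict

# Stand-in for a real wordlist read with something like open(path).read().split()
WORDS = ["bream", "amber", "carp", "crap", "eel", "lee", "pike"]

def build_anagram_index(words):
    """Group words under a key made of their letters in sorted order."""
    index = defaultdict(list)
    for word in words:
        key = "".join(sorted(word.lower()))
        index[key].append(word)
    return index

def anagrams_of(word, index):
    """Return the other words in the index that use exactly the same letters."""
    key = "".join(sorted(word.lower()))
    return [w for w in index.get(key, []) if w != word.lower()]

index = build_anagram_index(WORDS)
print(anagrams_of("bream", index))
print(anagrams_of("pike", index))
```

Building the index once and looking words up by their sorted letters is the same dictionary trick as before: one pass over the list, then every query is a single lookup.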