This Anki add-on scrapes dictionary definition data from weblio.jp, pulling definitions from the all-wonderful 三省堂 大辞林 (Daijirin), a Japanese dictionary. It then parses the result into HTML that can be injected into note fields. A template is included in the file daijirin-scraper-example-card-layout.apkg, which you can double-click to import directly into Anki (desktop version).
This is my first Python project. The goal was to learn more of the Anki codebase and automate a time-consuming process of adding definitions to cards.
It does not support multi-level nested definitions. At that point, I think you would be doing yourself a disservice by listing out an entire definition on a flash card. Also, given Weblio's archaic HTML structure, building something that could extract multi-level nests would not be an efficient use of effort, and I need to focus my time and energy on other things (like studying Japanese and building more tools to help people learn languages). If someone would like to implement this feature, I am open to contributions.
If you need to reference a word with a lengthy multi-nested list of definitions, I would suggest finding the one or two specific definitions and copy-pasting them from the website.
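To illustrate what "single-level" means here, below is a minimal sketch of turning a flat list of definitions into an HTML snippet for a note field. The function name `definitions_to_html` and the `<b>`/`<ol>` markup are my own assumptions for illustration; the add-on's actual output format may differ.

```python
from html import escape


def definitions_to_html(term, definitions):
    """Render a flat (single-level) list of definitions as an HTML snippet.

    Hypothetical helper: the markup is an assumption, not the add-on's
    real output. Nested sub-definitions are deliberately not handled.
    """
    # Escape each definition so stray '<' or '&' cannot break the note field.
    items = "".join(f"<li>{escape(d)}</li>" for d in definitions)
    return f"<b>{escape(term)}</b><ol>{items}</ol>"
```

Calling `definitions_to_html("言葉", ["first sense", "second sense"])` would give one bold headword followed by a two-item ordered list.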
Click here to go to the shared addon page.
- With the Anki main window open, go to Tools > Addons or type Ctrl + Shift + A
- Click on the Get Add-ons... button
- Copy and paste the following code into your Anki addons dialog: 311119199
- Click Ok, wait for Daijirin Dictionary Scraper to appear on the addons list, close the window and restart Anki.
- At the main window click Add or type A to open an editor dialogue. You will see a small green book button in the top right row of editor icons. Click it to begin using.
This project began as a command line script. The script adds the definitions to a text file (definitions.txt) which could be copied to clipboard and pasted into Anki.
- Python v3.6+
- bs4 (Beautiful Soup v4)
- requests
- sys
- os
- pyperclip
After Python and pip are installed, you can get the additional non-standard packages by running:
pip install -r requirements.txt
from this project's directory.
I suggest using Git Bash or some other bash terminal emulator on Windows.
To run the script and add definitions to the text file, run the following command:
/path/to/python.exe /path/to/daijirin_scraper.py 言葉
(言葉 can obviously be replaced with any term you would like to try)
You can alternatively make an alias as a shortcut for running the script like this (anything typed after the alias is appended to the command automatically, so no $1 is needed):
alias daijirin="/path/to/python(.exe) /path/to/daijirin_scraper.py"
Then you could just run it like this
daijirin 言葉
instead of this
/path/to/python(.exe) /path/to/daijirin_scraper.py 言葉
If the entered term was found, it will be printed to the console along with the definitions and added to definitions.txt. If the term could not be found, an error will print stating that no terms matched. In some cases you might need to search for the term manually on weblio.jp.
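The lookup flow described above can be sketched roughly as follows. This is not the script's actual code: `build_url`, `record`, the numbered entry layout, and the error wording are hypothetical, and the URL pattern is an assumption based on how weblio.jp result pages are usually addressed. The scraping step itself is left out; `definitions` stands for whatever it returns.

```python
import sys
from urllib.parse import quote

# Assumed URL pattern for a weblio.jp lookup page.
BASE_URL = "https://www.weblio.jp/content/"


def build_url(term):
    """Percent-encode the term so Japanese text is safe inside a URL."""
    return BASE_URL + quote(term)


def record(term, definitions, path="definitions.txt"):
    """Print the term with its definitions and append them to the text file.

    Returns False (and prints an error) when no definitions were found.
    """
    if not definitions:
        print(f"No terms matched '{term}'.", file=sys.stderr)
        return False
    lines = [term] + [f"{i}. {d}" for i, d in enumerate(definitions, 1)]
    entry = "\n".join(lines) + "\n"
    print(entry, end="")
    # Append mode keeps earlier lookups until they are cut or cleared.
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
    return True
```

Opening the file in append mode is what lets several lookups pile up in definitions.txt before you paste them into Anki.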
You can view your stored definitions with:
daijirin list
Once you have the desired definitions in definitions.txt, you can run:
daijirin cut
This will cut the definitions from the text file into your clipboard so you can paste into your Anki card fields. After running this command, definitions.txt will be empty so that it is ready when you want to use it next.
If you want to clear the definitions.txt file without copying them due to a mistake or otherwise, you can run:
daijirin clear
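For anyone curious how the list, cut, and clear commands relate to each other, here is a minimal sketch. The function names are hypothetical and this is not necessarily how the script implements them. The only third-party call used is `pyperclip.copy`, and the `copy` parameter is an assumption I added so the clipboard step can be swapped out.

```python
from pathlib import Path


def list_definitions(path):
    """Return the stored definitions, or an empty string if the file is missing."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else ""


def clear_definitions(path):
    """Empty the file without copying anything."""
    Path(path).write_text("", encoding="utf-8")


def cut_definitions(path, copy=None):
    """Copy the stored definitions to the clipboard, then empty the file."""
    if copy is None:
        # Imported lazily so list and clear work without pyperclip installed.
        import pyperclip
        copy = pyperclip.copy
    text = list_definitions(path)
    copy(text)
    clear_definitions(path)
    return text
```

In this sketch, cut is simply list followed by a clipboard copy and a clear, which is why the file is empty afterwards.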
Your terminal will need a font with Japanese glyphs. I suggest OsakaMono. You will also need to set your PYTHONIOENCODING variable to UTF-8 by running
export PYTHONIOENCODING=utf-8
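If you are unsure whether the setting took effect, a small check like the one below can tell you which encoding Python is using for output. This is a sketch I am adding for troubleshooting, not part of the script; `is_utf8` and `warn_if_not_utf8` are hypothetical names.

```python
import codecs
import sys


def is_utf8(encoding):
    """True when an encoding name is an alias of UTF-8 (e.g. 'UTF8', 'utf_8')."""
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except (LookupError, TypeError):
        # Unknown encoding name, or no encoding reported at all.
        return False


def warn_if_not_utf8(stream=sys.stdout):
    """Warn on stderr when the stream cannot be relied on to print Japanese."""
    encoding = getattr(stream, "encoding", None)
    if is_utf8(encoding):
        return True
    print(
        f"Output encoding is {encoding!r}; try: export PYTHONIOENCODING=utf-8",
        file=sys.stderr,
    )
    return False
```

Running `warn_if_not_utf8()` in the same terminal session you use for the script should return True once the variable is exported.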
If you are using Hyper, you will need the following inside your preferences file (.hyper.js):
env: {
  LANG: 'en_US.UTF-8'
},
You may obviously have other properties alongside LANG inside of env, but LANG is needed for this application.
Consult Google for help with changing your terminal font :)