An OCR tool for the GNU operating system that uses Transformers.
Supports Xorg and Wayland.
ocr.mp4
This Manga OCR application is likely the most suckless and lightweight option available. It is designed to work best with a tiling window manager. It needs only a few dependencies, most of which you probably already have installed. However, it still relies on large Python libraries to work. To isolate the bloat, these libraries are installed in a dedicated folder. If your computer is rather slow, use Tesseract instead.
Install from the AUR.
If you want to package this program for your distribution and know how to do it, please create a pull request. Otherwise, read the section below.
The steps below are for people who can't access the AUR.
Step 1. Install the following dependencies if they are not installed.
GNOME
Step 2. Install the program using Makefile.
git clone 'https://github.com/Ajatt-Tools/transformers_ocr.git'
cd -- 'transformers_ocr'
sudo make install
Before you start, download manga-ocr data:
transformers_ocr download
The files will be saved to ~/.local/share/manga_ocr.
To show a help page, run transformers_ocr help.
To OCR text on a manga page, run:
transformers_ocr recognize
Bind the command to a keyboard shortcut using your WM's config. This enables you to call the OCR from anywhere, as shown in the demo video.
For example, if you use i3wm, add this line to the config file.
bindsym $mod+o exec --no-startup-id transformers_ocr recognize
The first run will take longer than usual.
Additional files will be downloaded and saved to ~/.cache/huggingface.
On the first run, transformers_ocr launches a listener process that runs in the background and reads any new screenshots passed to it.
To speed up the first run, add the command below to autostart (using ~/.profile, ~/.xinitrc, etc.).
transformers_ocr start
Quite often one sentence, phrase or a chunk of meaning is split between two or more speech bubbles. This is a problem because if you take a screenshot of the whole area, including the area between the speech bubbles, you will likely end up with junk in the results. Processing each bubble separately is also not ideal since you want to analyze the entire sentence in GoldenDict, add it to Anki, etc.
A solution is to have transformers-ocr hold text for you. It will recognize one speech bubble, remember it, then wait for another, and only copy the text from all bubbles together when you're done.
To use this feature, add a new keyboard shortcut to the config file of your WM, for example Mod+Shift+o. Example for i3wm:
bindsym $mod+Shift+o exec --no-startup-id transformers_ocr hold
screencast.mp4
Every time you call hold, a speech bubble will be recognized and saved for later. Finally, call recognize using the usual keyboard shortcut to copy the last speech bubble and all the saved ones together. The list of saved bubbles will be emptied when calling recognize.
Optionally, you can create a config file.
mkdir -p ~/.config/transformers_ocr
touch ~/.config/transformers_ocr/config
Each line must have this format: key=value. Lines that start with # are ignored.
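To check what a config file in this format contains, a small shell helper can be handy. The sketch below is not part of transformers_ocr: config_get is a hypothetical function, and the choice to let the last occurrence of a key win is an assumption (it matches the habit of appending settings with >>), not documented behavior of the program.

```sh
# Hypothetical helper, not shipped with transformers_ocr.
# Usage: config_get KEY FILE
# Prints the value stored for KEY in a key=value file and returns 0,
# or returns 1 if the key is absent.
config_get() {
    key=$1
    file=$2
    value=''
    found=1
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        case $line in
            "$key="*)
                # Assumption: a later line overrides an earlier one.
                value=${line#"$key"=}
                found=0
                ;;
        esac
    done < "$file"
    if [ "$found" -eq 0 ]; then
        printf '%s\n' "$value"
    fi
    return "$found"
}

# Example:
#   config_get force_cpu ~/.config/transformers_ocr/config
```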
The --image-path argument can be used to parse image files manually rather than relying on a screenshot application. It can also be used to add support for other screenshot applications.
Example usage in zsh:
flameshot_path=$(mktemp -u --suffix .png)
# cli usage for flameshot with no copy
flameshot gui --path "$flameshot_path" --delay 100
transformers_ocr recognize --image-path "$flameshot_path"
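The same idea can be wrapped in a function that works with any screenshot tool. This is only a sketch: ocr_with is a hypothetical name, and it assumes the screenshot command takes the output file path as its last argument. The screenshot file is left in place, as in the example above.

```sh
# Hypothetical wrapper, not shipped with transformers_ocr.
# Usage: ocr_with SCREENSHOT_COMMAND [ARGS...]
# Assumption: the screenshot command accepts the output path as its last argument.
ocr_with() {
    shot_dir=$(mktemp -d) || return 1
    shot_path="$shot_dir/screenshot.png"
    # Take the screenshot, then make sure a non-empty file was written
    # (the selection may have been cancelled).
    if "$@" "$shot_path" && [ -s "$shot_path" ]; then
        transformers_ocr recognize --image-path "$shot_path"
    else
        echo "ocr_with: no screenshot was taken" >&2
        return 1
    fi
}

# Examples (assuming these tools are installed):
#   ocr_with maim --select             # Xorg
#   ocr_with grim -g "$(slurp)"        # Wayland
```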
Instead of copying text to the clipboard, you may want to pass it as an argument to an external application. In the example below, clip_command is set to goldendict, which lets you send recognized text directly to GoldenDict and keep the system clipboard for other tasks.
echo 'clip_command=goldendict %TEXT%' >> ~/.config/transformers_ocr/config
transformers_ocr stop
transformers_ocr start
If %TEXT% is passed as a parameter, it will be replaced with the actual text in the speech bubble. If not, the text will be passed to the stdin of the called program.
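The two delivery modes can be illustrated with a short shell model. This is not the program's actual code: send_text is a hypothetical function, it expects the command already split into words, and it only treats %TEXT% as a placeholder when it is a whole argument.

```sh
# Hypothetical model of the behavior described above.
# Usage: send_text TEXT COMMAND [ARGS...]
send_text() {
    text=$1
    shift
    replaced=no
    n=$#
    # Rotate through the arguments once, swapping %TEXT% for the text.
    while [ "$n" -gt 0 ]; do
        arg=$1
        shift
        if [ "$arg" = '%TEXT%' ]; then
            set -- "$@" "$text"
            replaced=yes
        else
            set -- "$@" "$arg"
        fi
        n=$((n - 1))
    done
    if [ "$replaced" = yes ]; then
        "$@"                         # text was passed as an argument
    else
        printf '%s' "$text" | "$@"   # no placeholder: text goes to stdin
    fi
}

# Examples:
#   send_text 'some text' goldendict '%TEXT%'   # argument mode
#   send_text 'some text' cat                   # stdin mode
```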
To force the program to run on the CPU, add this setting:
echo 'force_cpu=yes' >> ~/.config/transformers_ocr/config