- Homepage and Reference: https://alternat.readthedocs.io/
alternat automates the image alt-text generation workflow by offering ready-to-use methods for downloading images (Collection, in alternat lingo) and then generating alt-text.
alternat features are grouped into two tasks: Collection and Generation.
Collection
Collection offers convenience methods to download images. It uses Puppeteer (headless Chrome) to automate the website crawling and image download process.
Generation
Generation offers convenience methods to generate alt-texts through the following drivers:
- Azure API - Uses the Azure API for image captioning and OCR. Note that Azure is a paid service.
- Google API - Uses the Google API for image captioning and OCR. Note that Google is a paid service.
- Open Source - Uses free, open-source alternatives for OCR and image captioning.
Supported image file formats: jpeg, jpg and png.
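As a minimal sketch (not part of alternat itself), the snippet below shows one way to pre-filter a folder so that only files with the supported extensions are considered before running generation. The helper name `find_supported_images` is hypothetical and introduced only for illustration.

```python
from pathlib import Path

# Extensions listed as supported in this README.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_supported_images(directory):
    """Return sorted paths of files in `directory` with a supported extension.

    Hypothetical helper for illustration only; it is not an alternat API.
    The comparison is case-insensitive, so "photo.JPG" is also accepted.
    """
    root = Path(directory)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

The function only inspects file names; it does not verify that the file contents are valid images.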
- Download and install Docker Desktop for Mac using this link: docker-desktop
- Clone this repo: https://github.com/keplerlab/alternat.git
- Change your directory to your cloned repo.
- Open a terminal and run the following commands:
cd <path-to-repo>  # you need to be in your repo folder
docker-compose up
- In a new terminal window, open a shell inside the Docker container to run alternat from the command line:
docker-compose exec alternat bash
- You can use this shell to run the collect or generate command-line applications.
Please refer to the OS-specific installation guides for macOS, Ubuntu and Windows.
To generate alternative text for a single image or for a folder containing multiple images, use the command-line application's generation stage.
To run the generation stage alone, use one of the following commands:
# To run on a single file, results will be collected under "results/generate"
# The image extensions supported are: .jpg, .jpeg, .png.
python app.py generate --output-dir-path="./results" --input-image-file-path="./sample/images_with_text/sample1.png"
or
# To run on an entire directory, results will be collected under "results/generate"
# The image extensions supported are: .jpg, .jpeg, .png.
python app.py generate --input-dir-path="./sample/images_with_text" --output-dir-path="./results"
or
# To generate alt-text using specific driver (like azure, google or open source)
# Do not forget to add the credentials to their respective config files when using azure and google
# azure needs SUBSCRIPTION_KEY and ENDPOINT URL
# google needs ABSOLUTE_PATH_TO_CREDENTIALS_FILE (a credential json file)
python app.py generate --output-dir-path="./results" --input-image-file-path="./sample/images_with_text/sample1.png" --driver-config-file-path="./sample/generator_driver_conf/azure.json"
Sample images are located at sample/images and sample/images_with_text.
The first stage is called the collection stage. It can be used to crawl and download images from any website URL. To run the collection stage, use the following commands:
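If you want to drive the generation stage from a script, the commands above can be assembled programmatically. The following is a minimal sketch, assuming it is run from the repo folder where app.py lives. The helper names `build_generate_command` and `run_generate` are hypothetical, and the rule that exactly one of a single image or a directory is given is an assumption of this sketch, not documented alternat behaviour. Only the flags shown in the commands above are used.

```python
import subprocess
import sys


def build_generate_command(output_dir, image_file=None, input_dir=None,
                           driver_config=None):
    """Build the argument list for `python app.py generate`.

    Hypothetical helper; it uses only the flags shown in this README.
    """
    # Assumption of this sketch: pass either a single image or a directory.
    if (image_file is None) == (input_dir is None):
        raise ValueError("Provide exactly one of image_file or input_dir")

    cmd = [sys.executable, "app.py", "generate",
           f"--output-dir-path={output_dir}"]
    if image_file is not None:
        cmd.append(f"--input-image-file-path={image_file}")
    else:
        cmd.append(f"--input-dir-path={input_dir}")
    # The driver config is optional, as in the first two commands above.
    if driver_config is not None:
        cmd.append(f"--driver-config-file-path={driver_config}")
    return cmd


def run_generate(**kwargs):
    """Run the generation stage and raise if the process exits with an error."""
    subprocess.run(build_generate_command(**kwargs), check=True)
```

For example, `run_generate(output_dir="./results", input_dir="./sample/images_with_text")` corresponds to the directory command shown above.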
# To run the collection
python app.py collect --url=<WEBSITE_URL> --output-dir-path=./DATADUMP
# To run the collection recursively
python app.py collect --download-recursive --url=<WEBSITE_URL> --output-dir-path=./DATADUMP
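The collection commands can be wrapped in the same way. This is a minimal sketch rather than an alternat API: `build_collect_command` is a hypothetical helper, and the http(s) check is an assumption added here to catch obviously malformed input before the crawler starts.

```python
import sys
from urllib.parse import urlparse


def build_collect_command(url, output_dir, recursive=False):
    """Build the argument list for `python app.py collect`.

    Hypothetical helper; it uses only the flags shown in this README.
    """
    # Assumption of this sketch: only absolute http(s) URLs are accepted.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an http(s) URL, got: {url!r}")

    cmd = [sys.executable, "app.py", "collect"]
    if recursive:
        cmd.append("--download-recursive")
    cmd += [f"--url={url}", f"--output-dir-path={output_dir}"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repo folder.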
Please refer to the FAQ/Troubleshooting section of the alternat documentation, or raise a GitHub issue.
- For open-source OCR, we use the EasyOCR project https://github.com/JaidedAI/EasyOCR by Rakpong Kittinaradorn.
- For open-source caption generation, we use model training and inference scripts based on the method at https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning by Sagar Vinodababu.
- For web crawling, we use the Apify wrapper over the Puppeteer library https://apify.com/.