llm-transformers

Plugin for llm adding support for 🤗 Hugging Face Transformers pipeline tasks.

Installation

Install this plugin in the same environment as LLM.

llm install llm-transformers

Some pipelines that accept audio or video inputs require the ffmpeg executable to be installed. The document-question-answering pipeline uses pytesseract, which requires the tesseract executable.
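
For example, on macOS both executables can typically be installed with Homebrew (assuming Homebrew is available; other platforms have equivalent packages):

$ brew install ffmpeg tesseract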

Usage

This plugin exposes 🤗 Hugging Face transformers pipelines. The llm model name is transformers, and the pipeline task and/or Hugging Face model are specified as model options, e.g.:

$ llm -m transformers -o task text-generation "A dog has"
$ llm -m transformers -o model facebook/musicgen-small "techno music"

If only -o task <task> is specified, the default model for that task is used. If only -o model <model> is specified, the task is inferred from the model. If both are specified, the model must be compatible with the task.
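
For example, a compatible task and model can be specified together (gpt2 is just an illustrative choice of text-generation model; output omitted):

$ llm -m transformers -o task text-generation -o model gpt2 "A dog has"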

Transformers logging is verbose, so it is disabled by default. Specify the -o verbose True model option to enable it.
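
For example, to see Transformers log output while generating (output omitted):

$ llm -m transformers -o verbose True -o task text-generation "A dog has"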

Most 🤗 Hugging Face models are freely accessible, but some require accepting a license agreement and using a Hugging Face API token that has access to the model. You can store the token with llm keys set huggingface, set the HF_TOKEN env var, or use the --key option to llm.
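
For example, to store a token once (llm prompts for the value; paste your own token):

$ llm keys set huggingface
Enter key: <your Hugging Face token>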

$ llm -m transformers -o model meta-llama/Llama-3.2-1B "A dog has"
Error: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Llama-3.2-1B.
$ llm --key hf_******************** -m transformers -o model meta-llama/Llama-3.2-1B "A dog has"
A dog has been named as the killer of a woman who was found dead in her home.

Some pipelines generate binary (audio, image, video) output. These are written to a temporary file and the path to the file is returned. A specific file can be specified with the -o output <path.suffix> model option; the suffix determines the file type (e.g. .png vs .jpg etc).
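
For example (the output path and task here are illustrative; the generated file path is echoed back, as documented above):

$ llm -m transformers -o task depth-estimation -o output /tmp/depth.png http://images.cocodataset.org/val2017/000000039769.jpg
/tmp/depth.png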

Pipelines can be tuned by passing additional keyword arguments to the pipeline call, specified as a JSON string in the -o kwargs '<json>' model option. See the documentation for a specific pipeline for information on its supported keyword arguments.
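
For example (max_new_tokens is a generation argument accepted by text-generation pipelines; output omitted):

$ llm -m transformers -o task text-generation -o kwargs '{"max_new_tokens": 5}' "A dog has"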

Transformer Pipeline Tasks

You can list available tasks with:

$ llm transformers list-tasks

The audio-classification task takes an audio URL or path, for example:

$ llm -m transformers -o task audio-classification https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac
_unknown_ (0.9972336888313293)
left (0.0019911774434149265)
yes (0.0003051063104066998)
down (0.0002108386834152043)
stop (0.00011406492558307946)

The automatic-speech-recognition task transcribes speech in an audio URL or path:

$ llm -m transformers -o task automatic-speech-recognition https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac
HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOWER FAT AND SAUCE

The depth-estimation task accepts an image URL or path as input and generates an image file as output:

$ llm -m transformers -o task depth-estimation http://images.cocodataset.org/val2017/000000039769.jpg
/var/folders/b1/1j9kkk053txc5krqbh0lj5t00000gn/T/tmpjvp9uo7x.png

The document-question-answering task requires a context option, which is a path or URL to an image:

$ llm -m transformers -o task document-question-answering -o context https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png "What is the invoice number?"
us-001

The feature-extraction task is not supported.

fill-mask requires a placeholder in the prompt; this is typically <mask> but differs between models:

$ llm -m transformers -o task fill-mask "My <mask> is about to explode"
My brain is about to explode (score=0.09140042215585709)
My heart is about to explode (score=0.07742168009281158)
My head is about to explode (score=0.05137857422232628)
My fridge is about to explode (score=0.029346412047743797)
My house is about to explode (score=0.02866862528026104)

The image-classification task takes an image URL or path:

$ llm -m transformers -o task image-classification https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png
macaw (0.9905233979225159)
African grey, African gray, Psittacus erithacus (0.005603480152785778)
toucan (0.001056905253790319)
sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita (0.0006811501225456595)
lorikeet (0.0006714339251630008)

The image-feature-extraction task is not supported.

The image-segmentation task takes an image URL or path and writes one output image per detected segment:

$ llm -m transformers -o task image-segmentation https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png
/var/folders/b1/1j9kkk053txc5krqbh0lj5t00000gn/T/tmp0z8zvd8i.png (bird: 0.999439)
/var/folders/b1/1j9kkk053txc5krqbh0lj5t00000gn/T/tmpik_7r5qn.png (bird: 0.998787)
$ llm -m transformers -o task image-segmentation -o output /tmp/segment.png https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png
/tmp/segment-00.png (bird: 0.999439)
/tmp/segment-01.png (bird: 0.998787)

The image-to-image task takes an image URL or path and generates a new image file:

$ llm -m transformers -o task image-to-image http://images.cocodataset.org/val2017/000000039769.jpg
/var/folders/b1/1j9kkk053txc5krqbh0lj5t00000gn/T/tmpczogz6cb.png

The image-to-text task generates a caption for an image URL or path:

$ llm -m transformers -o task image-to-text https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png
two birds are standing next to each other

The mask-generation task is not supported.

The object-detection task takes an image URL or path and returns JSON with a bounding box for each detected object:

$ llm -m transformers -o task object-detection https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png
[
    {
        "score": 0.9966394901275635,
        "label": "bird",
        "box": {
            "xmin": 69,
            "ymin": 171,
            "xmax": 396,
            "ymax": 507
        }
    },
    {
        "score": 0.999381422996521,
        "label": "bird",
        "box": {
            "xmin": 398,
            "ymin": 105,
            "xmax": 767,
            "ymax": 507
        }
    }
]

The question-answering task requires a context option containing the text to query:

$ llm -m transformers -o task question-answering -o context "My name is Wolfgang and I live in Berlin" "Where do I live?"
Berlin

The summarization task summarizes the prompt text. Specify additional pipeline keyword args with the kwargs model option, e.g. to bound the summary length:

$ llm -m transformers -o task summarization "An apple a day, keeps the doctor away"
 An apple a day, keeps the doctor away from your doctor away . An apple every day is an apple that keeps you from going to the doctor . The apple is the best way to keep your doctor from getting a doctor's orders, according to the author of The Daily Mail
$ llm -m transformers -o task summarization -o kwargs '{"min_length": 2, "max_length": 7}' "An apple a day, keeps the doctor away"
 An apple a day

The table-question-answering task takes a required context option - a path to a CSV file.

$ cat <<EOF > /tmp/t.csv
> Repository,Stars,Contributors,Programming language
> Transformers,36542,651,Python
> Datasets,4512,77,Python
> Tokenizers,3934,34,"Rust, Python and NodeJS"
> EOF
$ llm -m transformers -o task table-question-answering -o context /tmp/t.csv "How many stars does the transformers repository have?"
AVERAGE > 36542
$ llm -m transformers -o task table-question-answering -o context /tmp/t.csv "How many contributors do all Python language repositories have?"
SUM > 651, 77

The text2text-generation task generates text from a structured prompt:

$ llm -m transformers -o task text2text-generation "question: What is 42 ? context: 42 is the answer to life, the universe and everything"
the answer to life, the universe and everything

The text-classification task classifies the prompt text:

$ llm -m transformers -o task text-classification "We are very happy to show you the 🤗 Transformers library"
POSITIVE (0.9997681975364685)

The text-generation task continues the prompt text. Some text-generation models can also be chatted with, using llm chat:

$ llm -m transformers -o task text-generation "I am going to elect"
I am going to elect the president of Mexico and that president should vote for our president," he said. "That's not very popular. That's not the American way. I would not want voters to accept the fact that that guy's running a
$ llm -m transformers -o task text-generation -o model HuggingFaceH4/zephyr-7b-beta -o kwargs '{"max_new_tokens": 2}' "What is the capital of France? Answer in one word."
Paris
$ llm chat -m transformers -o task text-generation -o model HuggingFaceH4/zephyr-7b-beta -o kwargs '{"max_new_tokens": 25}'
Chatting with transformers
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> What is the capital of France?
The capital of France is Paris (French: Paris). The official name of the city is "Ville de Paris"
> What question did I just ask you?
Your question was: "What is the capital of France?"
> quit

The text-to-audio task generates audio; the response is the path to the generated audio file:

$ llm -m transformers -o kwargs '{"generate_kwargs": {"max_new_tokens": 100}}' -o model facebook/musicgen-small "techno music"
/var/folders/b1/1j9kkk053txc5krqbh0lj5t00000gn/T/tmpoueh05y6.wav
$ llm -m transformers -o task text-to-audio "Hello world"
/var/folders/b1/1j9kkk053txc5krqbh0lj5t00000gn/T/tmpmpwhkd8p.wav
$ llm -m transformers -o task text-to-audio -o model facebook/mms-tts-eng -o output /tmp/speech.flac "Hello world"
/tmp/speech.flac

The token-classification task identifies entities such as names and places in the prompt:

$ llm -m transformers -o task token-classification "My name is Sarah and I live in London"
Sarah (I-PER: 0.9982994198799133)
London (I-LOC: 0.998397171497345)

For translation tasks, substitute the from and to language codes into the task name, e.g. translating from en to fr uses the task translation_en_to_fr:

$ llm -m transformers -o task translation_en_to_fr "How old are you?"
 quel âge êtes-vous?
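
Other language pairs follow the same pattern, assuming the selected model supports them, e.g. English to German (output omitted):

$ llm -m transformers -o task translation_en_to_de "How old are you?"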

The video-classification task expects a video path or URL as the prompt:

$ llm -m transformers -o task video-classification https://huggingface.co/datasets/Xuehai/MMWorld/resolve/main/Amazing%20street%20dance%20performance%20from%20Futunity%20UK%20-%20Move%20It%202013/Amazing%20street%20dance%20performance%20from%20Futunity%20UK%20-%20Move%20It%202013.mp4
dancing ballet (0.006608937866985798)
spinning poi (0.006111182738095522)
air drumming (0.005756791681051254)
singing (0.005747966933995485)
punching bag (0.00565463537350297)

The visual-question-answering task requires a context option - a path or URL to an image:

$ llm -m transformers -o task visual-question-answering -o context https://huggingface.co/datasets/Narsil/image_dummy/raw/main/lena.png "What is she wearing?"
hat (0.9480269551277161)
fedora (0.00863664224743843)
clothes (0.003124270820990205)
sun hat (0.002937435172498226)
nothing (0.0020962499547749758)

The zero-shot-classification task requires a comma-separated list of candidate labels to be specified in the context model option:

$ llm -m transformers -o task zero-shot-classification -o context "urgent,not urgent,phone,tablet,computer" "I have a problem with my iphone that needs to be resolved asap!!"
urgent (0.5036348700523376)
phone (0.4788002371788025)
computer (0.012600351125001907)
not urgent (0.0026557915844023228)
tablet (0.0023087668232619762)

The zero-shot-image-classification task requires a comma-separated list of candidate labels to be specified in the context model option. The prompt is a path or URL to an image:

$ llm -m transformers -o task zero-shot-image-classification -o context "black and white,photorealist,painting" https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png
black and white (0.9736384749412537)
photorealist (0.02141517587006092)
painting (0.004946451168507338)

The zero-shot-audio-classification task requires a comma-separated list of candidate labels to be specified in the context model option. The prompt is a path or URL to an audio file:

$ llm -m transformers -o task zero-shot-audio-classification -o context "Sound of a bird,Sound of a dog" https://huggingface.co/datasets/s3prl/Nonspeech/resolve/main/animal_sound/n52.wav
Sound of a bird (0.9998763799667358)
Sound of a dog (0.00012355657236184925)

The zero-shot-object-detection task requires a comma-separated list of candidate labels to be specified in the context model option. The prompt is a path or URL to an image. The response is JSON and includes a bounding box for each detected object:

$ llm -m transformers -o task zero-shot-object-detection -o context "cat,couch" http://images.cocodataset.org/val2017/000000039769.jpg
[
    {
        "score": 0.2868139445781708,
        "label": "cat",
        "box": {
            "xmin": 324,
            "ymin": 20,
            "xmax": 640,
            "ymax": 373
        }
    },
    {
        "score": 0.2537268102169037,
        "label": "cat",
        "box": {
            "xmin": 1,
            "ymin": 55,
            "xmax": 315,
            "ymax": 472
        }
    },
    {
        "score": 0.12082991003990173,
        "label": "couch",
        "box": {
            "xmin": 4,
            "ymin": 0,
            "xmax": 642,
            "ymax": 476
        }
    }
]

Development

To set up this plugin locally, first check out the code and install uv. Use uv sync to create a venv and install dependencies, then run the tests and lint checks:

$ uv sync --dev
$ uv run pytest
$ uv run ruff check
$ uv run ruff format --check