Simple adapter for many language models - remote (Hugging Face, OpenAI, AnthropicAI, CohereAI) and local (transformers library).
Facilitates loading of many new models (Guanaco, Falcon, Vicuna, etc.) in 16-, 8- and 4-bit modes.
It also supports embedding models (OpenAI, CohereAI, Sentence Transformers).
🚧 This is experimental software. Anything can change without any notice.
pip install git+https://github.com/mobarski/aidapter.git
Note: the dependencies of each vendor API are not installed automatically and must be installed manually.
features:
- simple, unified API to many models (remote and local)
- batching
- parallel calls
- caching
- usage tracking
- automatic retries
- response priming
completion:
>>> import aidapter
>>> model = aidapter.model('openai:gpt-3.5-turbo') # uses OPENAI_API_KEY env variable
>>> model.complete('2+2=')
'4'
>>> model.complete(['2+2=','7*6=']) # parallel calls
['4', '42']
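The parallel call above can be pictured as a thread pool: each prompt is sent as a separate request and the results come back in input order. This is a minimal sketch, assuming a hypothetical `complete_one` function that stands in for a single API call; it is not aidapter's internal code.

```python
from concurrent.futures import ThreadPoolExecutor


def complete_one(prompt: str) -> str:
    # Hypothetical stand-in for a single remote API call
    answers = {"2+2=": "4", "7*6=": "42"}
    return answers.get(prompt, "")


def complete_many(prompts: list[str], workers: int = 4) -> list[str]:
    # executor.map preserves input order, so result i belongs to prompts[i]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(complete_one, prompts))


print(complete_many(["2+2=", "7*6="]))  # ['4', '42']
```

Threads suit this case because the work is I/O-bound (waiting on HTTP responses); the default of 4 workers mirrors the documented `model.workers` default.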
embeddings:
>>> model = aidapter.model('sentence-transformers:multi-qa-mpnet-base-dot-v1')
>>> vector = model.embed('mighty indeed')
>>> vector[:5]
[-0.07946087, -0.2150347, -0.33358946, 0.18340564, 0.16403404]
>>> vectors = model.embed(['this is the way', 'so say we all']) # parallel / batch processing
>>> [x[:5] for x in vectors]
[[0.037638217, -0.30608281, -0.3064257, -0.46715638, -0.2608084],
[-0.063842215, -0.16669855, -0.22363697, -0.2893797, 0.060464755]]
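The vectors returned by `model.embed` can be compared directly. Since `multi-qa-mpnet-base-dot-v1` is meant to be scored with the dot product, a minimal ranking sketch in plain Python might look like this; the short vectors are made-up toy values, not real model output, and `dot` / `rank` are illustrative helpers, not part of aidapter.

```python
def dot(a: list[float], b: list[float]) -> float:
    # Dot-product similarity between two equal-length vectors
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def rank(query: list[float], docs: list[list[float]]) -> list[int]:
    # Indices of docs, most similar to the query first
    scores = [dot(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


query = [1.0, 0.0, 0.5]  # toy vectors, not real embeddings
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.4], [0.2, 0.0, 0.1]]
print(rank(query, docs))  # [1, 2, 0]
```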
multiple models:
>>> m1 = aidapter.model('transformers:ehartford/Wizard-Vicuna-13B-Uncensored:4bit') # 4 bit mode
>>> m2 = aidapter.model('anthropic:claude-instant-v1') # uses ANTHROPIC_API_KEY env variable
persistent cache and usage tracking:
>>> import shelve
>>> model.cache = shelve.open('/tmp/aidapter.cache') # persistent disk cache
>>> model.usage = shelve.open('/tmp/aidapter.usage') # persistent usage tracking
or, with diskcache:
>>> import diskcache as dc
>>> model.cache = dc.Cache('/tmp/aidapter.cache') # persistent disk cache
>>> model.usage = dc.Cache('/tmp/aidapter.usage') # persistent usage tracking
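Both `shelve` objects and `diskcache.Cache` behave like dictionaries, which is presumably why either can be assigned to `model.cache`. Below is a minimal sketch of a dict-backed completion cache; `make_cache_key` and `fake_complete` are hypothetical illustrations, and aidapter's own key logic (`get_cache_key`) may work differently.

```python
import hashlib
import json
import os
import shelve
import tempfile


def make_cache_key(prompt: str, **kwargs) -> str:
    # Hypothetical key: hash of the prompt plus sorted keyword arguments,
    # so identical requests map to the same cache entry
    payload = json.dumps({"prompt": prompt, "kwargs": kwargs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


calls = []


def fake_complete(prompt: str, **kwargs) -> str:
    # Stand-in for a real model call; records each invocation
    calls.append(prompt)
    return "4"


def cached_complete(cache, prompt: str, **kwargs) -> str:
    # Works with any dict-like store (dict, shelve, diskcache.Cache)
    key = make_cache_key(prompt, **kwargs)
    if key not in cache:
        cache[key] = fake_complete(prompt, **kwargs)
    return cache[key]


path = os.path.join(tempfile.mkdtemp(), "demo.cache")
with shelve.open(path) as cache:
    cached_complete(cache, "2+2=", temperature=0)
    cached_complete(cache, "2+2=", temperature=0)  # served from the cache
print(len(calls))  # 1
```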
function calling interface*:
>>> def get_weather(city):
...     "get weather info for a city; city must be all caps after ISO country code and a : separator (e.g. FR:PARIS)"
...     ...
>>> model = aidapter.model('openai:gpt-3.5-turbo-0613')
>>> model.complete("What's the weather in the capital of Poland?", functions=[get_weather])
{'function_name': 'get_weather', 'arguments': {'city': 'PL:WARSAW'}}
* currently, this works only with selected OpenAI models
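The functions passed this way are never executed; their signatures (and presumably their docstrings, as in the example above) describe them to the model. OpenAI's function-calling API expects each function as a JSON-schema-style description with `name`, `description` and `parameters`. This is a minimal sketch of such a conversion, assuming every parameter is a string; `function_to_schema` is hypothetical and aidapter's actual conversion may differ.

```python
import inspect


def get_weather(city):
    "get weather info for a city; city must be all caps after ISO country code and a : separator (e.g. FR:PARIS)"
    ...


def function_to_schema(fn) -> dict:
    # Hypothetical conversion: every parameter is declared as a string and
    # parameters without a default value are marked as required
    params = inspect.signature(fn).parameters
    properties = {name: {"type": "string"} for name in params}
    required = [name for name, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


print(function_to_schema(get_weather)["parameters"])
# {'type': 'object', 'properties': {'city': {'type': 'string'}}, 'required': ['city']}
```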
use last_hidden_state from any transformer as an embedding*:
>>> model = aidapter.model('transformers:RWKV/rwkv-raven-1b5')
>>> model.raw_embed_one('mighty indeed')[:5]
[0.14850381016731262, -0.021324729546904564, 0.09214707463979721, 0.34308338165283203, -0.11288302391767502]
* requires additional normalization over a corpus; the API will change
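The note above says raw `last_hidden_state` vectors need normalization over a corpus before they are useful. One common approach, sketched here as an assumption rather than what aidapter will implement, is per-dimension standardization: subtract each dimension's corpus mean and divide by its standard deviation. The helper names and toy vectors are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standardizer(corpus: list[list[float]]):
    # Per-dimension mean and population standard deviation over the corpus
    dims = list(zip(*corpus))
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]  # constant dimension: avoid division by zero
    return means, stds


def standardize(vec: list[float], means: list[float], stds: list[float]) -> list[float]:
    return [(x - m) / s for x, m, s in zip(vec, means, stds)]


corpus = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]  # toy raw vectors
means, stds = fit_standardizer(corpus)
print(standardize([3.0, 10.0], means, stds))  # [0.0, 0.0]
```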
aidapter.model(model_id, **api_kwargs) -> model
model_id
- model identifier in the format <vendor_name>:<model_name>
api_kwargs
- default API arguments
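The identifier format can be illustrated with a small parser. This is a minimal sketch: `parse_model_id` is hypothetical, and treating a trailing `:4bit`, `:8bit` or `:16bit` as a loading option is an assumption based on the transformers examples in this README.

```python
def parse_model_id(model_id: str) -> dict:
    # Hypothetical parser for "<vendor_name>:<model_name>[:<option>]".
    # Split on the first colon only; model names may contain "/".
    vendor, _, rest = model_id.partition(":")
    if not vendor or not rest:
        raise ValueError(f"expected <vendor_name>:<model_name>, got {model_id!r}")
    name, option = rest, None
    # Assumption: a trailing bit-mode suffix such as ":4bit" is a loading option
    head, sep, tail = rest.rpartition(":")
    if sep and tail in {"4bit", "8bit", "16bit"}:
        name, option = head, tail
    return {"vendor": vendor, "model": name, "option": option}


print(parse_model_id("transformers:ehartford/Wizard-Vicuna-13B-Uncensored:4bit"))
# {'vendor': 'transformers', 'model': 'ehartford/Wizard-Vicuna-13B-Uncensored', 'option': '4bit'}
```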
model.complete(prompt, system='', start='', stop=[], limit=100, temperature=0, functions=[], cache='use', debug=False) -> str | list | dict
prompt
- main prompt or list of prompts
system
- system prompt
start
- text appended to the end of the prompt and prepended to the response (aka response priming)
stop
- list of strings upon which to stop generating
limit
- maximum number of tokens to generate before stopping (aka max_new_tokens, max_tokens_to_sample)
temperature
- amount of randomness
functions
- list of functions available to the model (none of them is executed; only their signatures are used)
cache
- cache usage:
  - use: use the cache if the temperature is 0 (default)
  - skip: don't use the cache
  - force: use the cache even if the temperature is not 0
debug
- if True, the function returns a dictionary (or a list of dictionaries) containing internal objects / values
FULL_PROMPT = system + prompt + start
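The prompt assembly and cache rules above amount to a few lines of logic. This is a minimal sketch, assuming plain string concatenation and the cache semantics listed above; the function names are hypothetical and not part of aidapter.

```python
def full_prompt(prompt: str, system: str = "", start: str = "") -> str:
    # FULL_PROMPT = system + prompt + start (plain string concatenation)
    return system + prompt + start


def primed_response(completion: str, start: str = "") -> str:
    # The priming text is also prepended to what the caller gets back
    return start + completion


def use_cache(cache: str, temperature: float) -> bool:
    # Hypothetical reading of the cache modes: use | skip | force
    if cache == "use":
        return temperature == 0
    if cache == "skip":
        return False
    if cache == "force":
        return True
    raise ValueError(f"unknown cache mode: {cache!r}")


print(full_prompt("2+2=", system="You are a calculator. ", start="The answer is"))
# You are a calculator. 2+2=The answer is
print(use_cache("use", temperature=0.7))  # False
```

Skipping the cache for non-zero temperatures by default makes sense because a sampled answer is not expected to be reproducible; `force` overrides that when repeatability matters more than variety.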
model.embed(input, limit=None) -> list | list[list]
input
- text or list of texts
limit
- limit the vector length to the first n dimensions (default = None = no limit)
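The return type follows the input: a single text yields one vector (`list`), a list of texts yields a list of vectors (`list[list]`), and `limit` keeps only the first n dimensions. This is a minimal sketch of that contract; `fake_embed_one` is a hypothetical stand-in, not a real embedding model.

```python
from typing import Optional, Union

Vector = list[float]


def fake_embed_one(text: str) -> Vector:
    # Hypothetical stand-in for a real embedding model
    return [float(len(text)), float(text.count(" ")), 1.0, 0.5]


def embed(input: Union[str, list[str]], limit: Optional[int] = None) -> Union[Vector, list[Vector]]:
    # A single text gives one vector; a list of texts gives a list of vectors.
    # Slicing with limit=None keeps the full vector.
    if isinstance(input, str):
        return fake_embed_one(input)[:limit]
    return [fake_embed_one(t)[:limit] for t in input]


print(embed("mighty indeed", limit=2))  # [13.0, 1.0]
```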
model configuration:
model.workers
- number of concurrent workers for parallel completion (default=4)
model.show_progress
- show a progress bar when performing parallel completion (default=False)
model.retry_tries
- maximum number of retry attempts (default=5)
model.retry_delay
- initial delay between retry attempts (default=0.1)
model.retry_backoff
- multiplier applied to the delay between retry attempts (default=3)
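The three retry settings combine into an exponential backoff. Assuming `retry_tries` counts total attempts (as the `tries` parameter of the `retry` package does) and that the delay is in seconds (the README does not say), the defaults give up to five attempts with waits of roughly 0.1, 0.3, 0.9 and 2.7 seconds between them. This is a minimal sketch; `call_with_retry` is hypothetical and may differ from aidapter's actual retry code, for example in which exceptions it retries.

```python
import time


def call_with_retry(fn, tries: int = 5, delay: float = 0.1, backoff: float = 3, sleep=time.sleep):
    # Call fn() up to `tries` times; the wait grows geometrically:
    # delay, delay*backoff, delay*backoff**2, ...
    for attempt in range(1, tries + 1):
        try:
            return fn()
        except Exception:
            if attempt == tries:
                raise  # out of attempts: re-raise the last error
            sleep(delay)
            delay *= backoff


waits = []
attempts = {"n": 0}


def flaky():
    # Fails twice, then succeeds
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retry(flaky, sleep=waits.append))  # ok
```

Passing `sleep` as a parameter keeps the sketch testable: here the waits are recorded in a list instead of actually pausing.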
supported models:
- openai:gpt-4
- openai:gpt-4-32k
- openai:gpt-3.5-turbo
- openai:text-davinci-003
- openai:code-davinci-002
- ...
API key env. variable: OPENAI_API_KEY
- anthropic:claude-v1
- anthropic:claude-instant-v1
- anthropic:claude-v1-100k
- anthropic:claude-instant-v1-100k
- ...
API key env. variable: ANTHROPIC_API_KEY
- cohere:command
- cohere:command-light
- ...
API key env. variable: CO_API_KEY
- transformers:TheBloke/guanaco-7B-HF
- transformers:tiiuae/falcon-7b
- transformers:RWKV/rwkv-raven-3b
- transformers:ehartford/Wizard-Vicuna-13B-Uncensored
- transformers:roneneldan/TinyStories-33M
- ...
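A remote model only works if its vendor's key is set. Below is a small pre-flight check, sketched under the assumption that the environment variables listed above are the only requirement; `missing_api_key` is hypothetical and not part of aidapter.

```python
import os
from typing import Optional

# Environment variables listed in this README for each remote vendor;
# local transformers models need no key.
API_KEY_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "cohere": "CO_API_KEY",
}


def missing_api_key(model_id: str) -> Optional[str]:
    # Hypothetical check: return the name of the unset variable, or None
    vendor = model_id.split(":", 1)[0]
    var = API_KEY_ENV.get(vendor)
    if var and not os.environ.get(var):
        return var
    return None
```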
- initial support for HF API models
- removed old HF implementation
- OpenAI's embeddings now use BaseModelV2
- as_iter option in BaseModelV2.transform
- removed BaseModelV2.register_progress
- handle cache=False in BaseModelV2.transform_many
- hf2 brand renamed to huggingface
- initial support for HF API embeddings
- BaseModelV2
- cleaner code
- diskcache support
- batch + threads support
- retry configuration
- progress update
- initial support for the functions argument (works only with selected OpenAI models)
- initial support for raw_embed_one in transformers (for creating embeddings from ANY transformer models)
- fix: kw handling in get_cache_key
- limit option for embedding models
- initial support for embedding models (requires more work with batch / parallel processing):
  - OpenAI
  - Cohere
  - Sentence Transformers
- response priming (start option)
- stop option for transformers
- anthropic usage: tokens, characters
- transformers usage: tokens, characters
- remove prompt from transformers output
- removed kvdb
- usage['time']
- fixed pad_token_id
- fixed limit in transformer models
- initial support for local transformers models:
  - float16 (add ":16bit" to the model name)
  - load_in_8bit (add ":8bit" to the model name)
  - load_in_4bit (add ":4bit" to the model name)
- cache = use | skip | force
- shelve-based persistence (for cache and usage)
- kvdb import fix
- Cohere models
- disk cache
- OpenAI instruct models
- Anthropic models (ANTHROPIC_API_KEY env variable)
- complete: debug option
- BaseModel.RENAME_KWARGS
- pip install
- limit handling
- parallel calls / cache / usage tracking / retries
- OpenAI chat models
- HF API text generation
- llama.cpp models (GGML!)
- strangulate BaseModel with BaseModelV2