- Minuet AI
- Features
- Requirements
- Installation
- Configuration
- API Keys
- Prompt
- Providers
- Commands
- Contributing
- Acknowledgement
Minuet AI: Dance with Intelligence in Your Code 💃.
Minuet AI brings the grace and harmony of a minuet to your coding process, much as dancers move during a minuet.
- AI-powered code completion with dual modes:
  - Specialized prompts and various enhancements for chat-based LLMs on code completion tasks.
  - Fill-in-the-middle (FIM) completion for compatible models (DeepSeek, Codestral, and others).
- Support for multiple AI providers (OpenAI, Claude, Gemini, Codestral, Huggingface, and OpenAI-compatible services)
- Customizable configuration options
- Streaming support to enable completion delivery even with slower LLMs
- Support for nvim-cmp, blink-cmp, and virtual text frontends
With nvim-cmp / blink-cmp frontend:
With virtual text frontend:
- Neovim 0.10+.
- plenary.nvim
- optional: nvim-cmp
- optional: blink.cmp
- An API key for at least one of the supported AI providers
specs = {
    {
        'milanglacier/minuet-ai.nvim',
        config = function()
            require('minuet').setup {
                -- Your configuration options here
            }
        end,
    },
    { 'nvim-lua/plenary.nvim' },
    -- optional, if you are using virtual-text frontend, nvim-cmp is not
    -- required.
    { 'hrsh7th/nvim-cmp' },
    -- optional, if you are using virtual-text frontend, blink is not required.
    { 'Saghen/blink.cmp' },
}
Given the response speed and rate limits of LLM services, we recommend that you either invoke minuet completion manually or use a cost-effective model such as codestral for auto-completion.
Setting up with virtual text:
require('minuet').setup {
    virtualtext = {
        auto_trigger_ft = {},
        keymap = {
            -- accept whole completion
            accept = '<A-A>',
            -- accept one line
            accept_line = '<A-a>',
            -- accept n lines (prompts for number)
            accept_n_lines = '<A-z>',
            -- Cycle to prev completion item, or manually invoke completion
            prev = '<A-[>',
            -- Cycle to next completion item, or manually invoke completion
            next = '<A-]>',
            dismiss = '<A-e>',
        },
    },
}
Setting up with nvim-cmp:
require('cmp').setup {
    sources = {
        -- Include minuet as a source to enable autocompletion
        { name = 'minuet' },
        -- and your other sources
    },
    performance = {
        -- It is recommended to increase the timeout duration due to
        -- the typically slower response speed of LLMs compared to
        -- other completion sources. This is not needed when you only
        -- need manual completion.
        fetching_timeout = 2000,
    },
}

-- If you wish to invoke completion manually,
-- the following configuration binds the `A-y` key
-- to invoke the completion manually.
require('cmp').setup {
    mapping = {
        ["<A-y>"] = require('minuet').make_cmp_map(),
        -- and your other keymappings
    },
}
Setting up with blink-cmp:
require('blink-cmp').setup {
    keymap = {
        -- Manually invoke minuet completion.
        ['<A-y>'] = require('minuet').make_blink_map(),
    },
    sources = {
        -- Enable minuet for autocomplete
        default = { 'lsp', 'path', 'buffer', 'snippets', 'minuet' },
        -- For manual completion only, remove 'minuet' from default
        providers = {
            minuet = {
                name = 'minuet',
                module = 'minuet.blink',
                score_offset = 8, -- Gives minuet higher priority among suggestions
            },
        },
    },
}
LLM Provider Examples:
Fireworks (llama-3.3-70b):
require('minuet').setup {
    provider = 'openai_compatible',
    provider_options = {
        openai_compatible = {
            api_key = 'FIREWORKS_API_KEY',
            end_point = 'https://api.fireworks.ai/inference/v1/chat/completions',
            model = 'accounts/fireworks/models/llama-v3p3-70b-instruct',
            name = 'Fireworks',
            optional = {
                max_tokens = 256,
                top_p = 0.9,
            },
        },
    },
}
-- you can use deepseek with both openai_fim_compatible or openai_compatible provider
require('minuet').setup {
    provider = 'openai_fim_compatible',
    provider_options = {
        openai_fim_compatible = {
            api_key = 'DEEPSEEK_API_KEY',
            name = 'deepseek',
            optional = {
                max_tokens = 256,
                top_p = 0.9,
            },
        },
    },
}

-- or
require('minuet').setup {
    provider = 'openai_compatible',
    provider_options = {
        openai_compatible = {
            end_point = 'https://api.deepseek.com/v1/chat/completions',
            api_key = 'DEEPSEEK_API_KEY',
            name = 'deepseek',
            optional = {
                max_tokens = 256,
                top_p = 0.9,
            },
        },
    },
}
require('minuet').setup {
    provider = 'openai_fim_compatible',
    n_completions = 1, -- recommended for local models to save resources
    provider_options = {
        openai_fim_compatible = {
            api_key = 'TERM',
            name = 'Ollama',
            end_point = 'http://localhost:11434/v1/completions',
            model = 'qwen2.5-coder:14b',
            optional = {
                max_tokens = 256,
                top_p = 0.9,
            },
        },
    },
}
Minuet AI comes with the following defaults:
default_config = {
    -- Enable or disable auto-completion. Note that you still need to add
    -- Minuet to your cmp/blink sources. This option controls whether cmp/blink
    -- will attempt to invoke minuet when minuet is included in cmp/blink
    -- sources. This setting has no effect on manual completion; Minuet will
    -- always be enabled when invoked manually. You can use the command
    -- `MinuetToggle` to toggle this option.
    cmp = {
        enable_auto_complete = true,
    },
    blink = {
        enable_auto_complete = true,
    },
    virtualtext = {
        -- Specify the filetypes to enable automatic virtual text completion,
        -- e.g., { 'python', 'lua' }. Note that you can still invoke manual
        -- completion even if the filetype is not on your auto_trigger_ft list.
        auto_trigger_ft = {},
        -- specify file types where automatic virtual text completion should be
        -- disabled. This option is useful when auto-completion is enabled for
        -- all file types i.e., when auto_trigger_ft = { '*' }
        auto_trigger_ignore_ft = {},
        keymap = {
            accept = nil,
            accept_line = nil,
            accept_n_lines = nil,
            -- Cycle to next completion item, or manually invoke completion
            next = nil,
            -- Cycle to prev completion item, or manually invoke completion
            prev = nil,
            dismiss = nil,
        },
    },
    provider = 'codestral',
    -- the maximum total characters of the context before and after the cursor
    -- 16000 characters typically equate to approximately 4,000 tokens for
    -- LLMs.
    context_window = 16000,
    -- when the total characters exceed the context window, the ratio of
    -- context before cursor and after cursor, the larger the ratio the more
    -- context before cursor will be used. This option should be between 0 and
    -- 1, context_ratio = 0.75 means the ratio will be 3:1.
    context_ratio = 0.75,
    throttle = 1000, -- only send the request every x milliseconds, use 0 to disable throttle.
    -- debounce the request in x milliseconds, set to 0 to disable debounce
    debounce = 400,
    -- Control notification display for request status
    -- Notification options:
    -- false: Disable all notifications (use boolean false, not string "false")
    -- "debug": Display all notifications (comprehensive debugging)
    -- "verbose": Display most notifications
    -- "warn": Display warnings and errors only
    -- "error": Display errors only
    notify = 'warn',
    -- The request timeout, measured in seconds. When streaming is enabled
    -- (stream = true), setting a shorter request_timeout allows for faster
    -- retrieval of completion items, albeit potentially incomplete.
    -- Conversely, with streaming disabled (stream = false), a timeout
    -- occurring before the LLM returns results will yield no completion items.
    request_timeout = 3,
    -- If completion item has multiple lines, create another completion item
    -- only containing its first line. This option only has impact for cmp and
    -- blink. For virtualtext, no single line entry will be added.
    add_single_line_entry = true,
    -- The number of completion items encoded as part of the prompt for the
    -- chat LLM. For FIM model, this is the number of requests to send. It's
    -- important to note that when 'add_single_line_entry' is set to true, the
    -- actual number of returned items may exceed this value. Additionally, the
    -- LLM cannot guarantee the exact number of completion items specified, as
    -- this parameter serves only as a prompt guideline.
    n_completions = 3,
    -- Defines the length of non-whitespace context after the cursor used to
    -- filter completion text. Set to 0 to disable filtering.
    -- Example: With after_cursor_filter_length = 3 and context:
    -- "def fib(n):\n|\n\nfib(5)" (where | represents cursor position),
    -- if the completion text contains "fib", then "fib" and subsequent text
    -- will be removed. This setting filters repeated text generated by the
    -- LLM. A large value (e.g., 15) is recommended to avoid false positives.
    after_cursor_filter_length = 15,
    -- proxy port to use
    proxy = nil,
    provider_options = {
        -- see the documentation in each provider in the following part.
    },
    -- see the documentation in the `Prompt` section
    default_template = {
        template = '...',
        prompt = '...',
        guidelines = '...',
        n_completion_template = '...',
    },
    default_fim_template = {
        default_prompt = '...',
        default_suffix = '...',
    },
    default_few_shots = { '...' },
}
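The interaction between context_window and context_ratio described in the comments above can be sketched in plain Lua. This is only an illustration under stated assumptions, not Minuet's actual truncation logic: split_context is a hypothetical helper, lengths are counted in bytes with the # operator, and budget left unused on one side is not handed to the other.

```lua
-- Hypothetical helper (not part of Minuet): trim the text before and after
-- the cursor so that together they fit into `context_window`.
local function split_context(before, after, context_window, context_ratio)
    -- Nothing to trim when everything already fits.
    if #before + #after <= context_window then
        return before, after
    end
    local before_budget = math.floor(context_window * context_ratio)
    local after_budget = context_window - before_budget
    -- Keep the text closest to the cursor: the tail of `before` and the
    -- head of `after`. Guard against a zero budget, because s:sub(-0)
    -- would return the whole string.
    local kept_before = before_budget > 0 and before:sub(-before_budget) or ''
    local kept_after = after:sub(1, after_budget)
    return kept_before, kept_after
end

local kept_before, kept_after =
    split_context(string.rep('a', 100), string.rep('b', 100), 40, 0.75)
print(#kept_before, #kept_after) -- 30 and 10, i.e. a 3:1 split
```

With context_ratio = 0.75 three quarters of the window go to the text before the cursor, which matches the 3:1 ratio mentioned in the comment.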
Minuet AI requires API keys to function. Set the following environment variables:
- OPENAI_API_KEY for OpenAI
- CODESTRAL_API_KEY for Codestral
- HF_API_KEY for Huggingface
- Custom environment variable for OpenAI-compatible services (as specified in your configuration)
Note: Provide the name of the environment variable to Minuet, not the actual value. For instance, pass OPENAI_API_KEY to Minuet, not the value itself (e.g., sk-xxxx).
If using Ollama, you need to assign an arbitrary, non-null environment variable as a placeholder for it to function.
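Because Minuet is given the name of an environment variable rather than the key itself, it can help to confirm that the variable is actually set in the environment Neovim runs in. The following is a minimal sketch in plain Lua; env_key_is_set is a hypothetical helper, not a Minuet function.

```lua
-- Hypothetical helper (not part of Minuet): report whether the environment
-- variable with the given NAME is set to a non-empty value.
local function env_key_is_set(var_name)
    local value = os.getenv(var_name)
    return value ~= nil and value ~= ''
end

-- These strings are variable NAMES; the secret itself never appears in the
-- configuration.
for _, name in ipairs({ 'OPENAI_API_KEY', 'CODESTRAL_API_KEY' }) do
    print(name, env_key_is_set(name) and 'set' or 'missing')
end
```

The same check applies to the Ollama placeholder: any variable that is set and non-empty will do.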
See prompt for the default prompt used by minuet and instructions on customization.
Note that minuet employs two distinct prompt systems:
- A system designed for chat-based LLMs (OpenAI, OpenAI-Compatible, Claude, and Gemini)
- A separate system designed for Codestral and OpenAI-FIM-compatible models
The following is the default configuration for OpenAI:
provider_options = {
    openai = {
        model = 'gpt-4o-mini',
        system = "see [System Prompt] section for the default value",
        few_shots = "see [System Prompt] section for the default value",
        chat_input = "See [Prompt Section for default value]",
        stream = true,
        optional = {
            -- pass any additional parameters you want to send to OpenAI request,
            -- e.g.
            -- stop = { 'end' },
            -- max_tokens = 256,
            -- top_p = 0.9,
        },
    },
}
The following configuration is not the default, but it is recommended to prevent request timeouts caused by generating too many tokens.
provider_options = {
    openai = {
        optional = {
            max_tokens = 256,
        },
    },
}
The following is the default configuration for Claude:
provider_options = {
    claude = {
        max_tokens = 512,
        model = 'claude-3-5-haiku-20241022',
        system = "see [System Prompt] section for the default value",
        few_shots = "see [System Prompt] section for the default value",
        chat_input = "See [Prompt Section for default value]",
        stream = true,
        optional = {
            -- pass any additional parameters you want to send to claude request,
            -- e.g.
            -- stop_sequences = nil,
        },
    },
}
Codestral is a text completion model, not a chat model, so the system prompt and few-shot examples do not apply. Note that you should use a Codestral API key (CODESTRAL_API_KEY) rather than a general Mistral API key, as they use different endpoints. To use the Mistral endpoint, modify the end_point parameter in the configuration.
The following is the default configuration for Codestral:
provider_options = {
    codestral = {
        model = 'codestral-latest',
        end_point = 'https://codestral.mistral.ai/v1/fim/completions',
        api_key = 'CODESTRAL_API_KEY',
        stream = true,
        template = {
            prompt = "See [Prompt Section for default value]",
            suffix = "See [Prompt Section for default value]",
        },
        optional = {
            stop = nil, -- the identifier to stop the completion generation
            max_tokens = nil,
        },
    },
}
The following configuration is not the default, but it is recommended to prevent request timeouts caused by generating too many tokens.
provider_options = {
    codestral = {
        optional = {
            max_tokens = 256,
            stop = { '\n\n' },
        },
    },
}
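As a sketch of the endpoint switch mentioned in the Codestral notes, the fragment below points the codestral provider at Mistral's general API instead. Both values are assumptions that do not come from this document: the URL is Mistral's general FIM endpoint as I understand it, and MISTRAL_API_KEY is only a conventional variable name. Check both against Mistral's documentation and your own environment.

```lua
require('minuet').setup {
    provider = 'codestral',
    provider_options = {
        codestral = {
            -- Assumed: Mistral's general FIM endpoint (not the Codestral one).
            end_point = 'https://api.mistral.ai/v1/fim/completions',
            -- Assumed variable name; use whichever environment variable
            -- holds your Mistral key.
            api_key = 'MISTRAL_API_KEY',
        },
    },
}
```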
The following is the default configuration for Gemini:
provider_options = {
    gemini = {
        model = 'gemini-1.5-flash-latest',
        system = "see [System Prompt] section for the default value",
        few_shots = "see [System Prompt] section for the default value",
        chat_input = "See [Prompt Section for default value]",
        stream = true,
        optional = {},
    },
}
The following configuration is not the default, but it is recommended to prevent request timeouts caused by generating too many tokens. You can also adjust the safety settings following the example:
provider_options = {
    gemini = {
        optional = {
            generationConfig = { maxOutputTokens = 256 },
            safetySettings = {
                {
                    category = 'HARM_CATEGORY_DANGEROUS_CONTENT', -- assumed example category
                    threshold = 'BLOCK_ONLY_HIGH',
                },
            },
        },
    },
}
Use any provider compatible with OpenAI's chat completion API.
For example, you can set the end_point to your local Ollama server's chat completion URL to use Ollama.
Note that not all OpenAI-compatible services support streaming. If your service does not support it, set stream=false to disable streaming.
The following config is the default.
provider_options = {
    openai_compatible = {
        model = 'llama-3.3-70b-versatile',
        system = "see [System Prompt] section for the default value",
        few_shots = "see [System Prompt] section for the default value",
        chat_input = "See [Prompt Section for default value]",
        end_point = 'https://api.groq.com/openai/v1/chat/completions',
        api_key = 'GROQ_API_KEY',
        name = 'Groq',
        stream = true,
        optional = {
            stop = nil,
            max_tokens = nil,
        },
    },
}
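To make the Ollama and streaming remarks for this provider concrete, here is a hedged sketch of an openai_compatible setup. The chat endpoint URL is an assumption (Ollama's OpenAI-compatible route on its default port); the model name and the 'TERM' placeholder variable are borrowed from the Ollama FIM example earlier in this document. stream = false is shown only to illustrate how to turn streaming off for services that lack it.

```lua
require('minuet').setup {
    provider = 'openai_compatible',
    provider_options = {
        openai_compatible = {
            -- Assumed: Ollama's OpenAI-compatible chat endpoint on the default port.
            end_point = 'http://localhost:11434/v1/chat/completions',
            -- Any set, non-empty environment variable works as a placeholder.
            api_key = 'TERM',
            name = 'Ollama',
            model = 'qwen2.5-coder:14b',
            -- Disable streaming only if your service does not support it.
            stream = false,
        },
    },
}
```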
Use any provider compatible with OpenAI's completion API. This request uses the text completion API, not chat completion, so system prompts and few-shot examples are not applicable.
For example, you can set the end_point to http://localhost:11434/v1/completions to use Ollama.
Refer to the Completions Legacy section of the OpenAI documentation for details.
Please note that not all OpenAI-compatible services support streaming. If your service does not support streaming, you should set stream=false to disable it.
Additionally, for Ollama users, it is essential to verify whether the model's template supports FIM completion. For example, qwen2.5-coder's template supports FIM. However, it may come as a surprise to some users that not every code model's template supports FIM; if yours does not, you should use a model whose template does.
provider_options = {
    openai_fim_compatible = {
        model = 'deepseek-chat',
        end_point = 'https://api.deepseek.com/beta/completions',
        api_key = 'DEEPSEEK_API_KEY',
        name = 'Deepseek',
        stream = true,
        template = {
            prompt = "See [Prompt Section for default value]",
            suffix = "See [Prompt Section for default value]",
        },
        optional = {
            stop = nil,
            max_tokens = nil,
        },
    },
}
The following configuration is not the default, but it is recommended to prevent request timeouts caused by generating too many tokens.
provider_options = {
    openai_fim_compatible = {
        optional = {
            max_tokens = 256,
            stop = { '\n\n' },
        },
    },
}
Currently, only text completion models on Huggingface are supported, so the system prompt and few-shot examples do not apply.
provider_options = {
    huggingface = {
        end_point = 'https://api-inference.huggingface.co/models/bigcode/starcoder2-3b',
        type = 'completion',
        strategies = {
            completion = {
                markers = {
                    prefix = '<fim_prefix>',
                    suffix = '<fim_suffix>',
                    middle = '<fim_middle>',
                },
                strategy = 'PSM', -- PSM, SPM or PM
            },
        },
        optional = {
            parameters = {
                -- The parameter specifications for different LLMs may vary.
                -- Ensure you specify the parameters after reading the API
                -- documentation.
                stop = nil,
                max_tokens = nil,
                do_sample = nil,
            },
        },
    },
}
The change_provider command allows you to change the provider after Minuet has been set up.
Example usage: Minuet change_provider claude
The change_model command allows you to change both the provider and model in one command. The format is provider:model.
Example usage: Minuet change_model gemini:gemini-1.5-pro-latest
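The provider:model format can be illustrated with a small plain-Lua sketch. parse_provider_model is a hypothetical helper, not Minuet's own parser; it splits at the first colon only, on the assumption that model names may themselves contain colons (as qwen2.5-coder:14b does).

```lua
-- Hypothetical helper (not Minuet's parser): split "provider:model" at the
-- first colon. Returns nil, nil when the argument has no colon.
local function parse_provider_model(arg)
    local provider, model = arg:match('^([^:]+):(.+)$')
    return provider, model
end

print(parse_provider_model('gemini:gemini-1.5-pro-latest'))
print(parse_provider_model('openai_fim_compatible:qwen2.5-coder:14b'))
```

Splitting at the first colon keeps the second example's model name intact as qwen2.5-coder:14b.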
Note: For openai_compatible and openai_fim_compatible providers, the model completions in cmdline are determined by the name field in your configuration. For example, if you configured:
provider_options.openai_compatible.name = 'Fireworks'
When entering Minuet change_model openai_compatible: in the cmdline, you'll see model completions specific to the Fireworks provider.
Enable or disable autocompletion for nvim-cmp or blink.cmp. While Minuet must be added to your cmp/blink sources, this command only controls whether Minuet is triggered during autocompletion. The command does not affect manual completion behavior: Minuet remains active and available when manually invoked.
Example usage: Minuet blink toggle, Minuet blink enable, Minuet blink disable
Enable or disable the automatic display of virtual-text completion in the current buffer.
Example usage: Minuet virtualtext toggle, Minuet virtualtext enable, Minuet virtualtext disable
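If you run these commands often, they can be bound to keys. The sketch below uses Neovim's vim.keymap.set with the commands shown above; the key choices (<leader>mp, <leader>mv) and descriptions are arbitrary assumptions, so adapt them to your own setup.

```lua
-- Switch the provider to Claude (key choice is illustrative).
vim.keymap.set('n', '<leader>mp', '<cmd>Minuet change_provider claude<cr>', {
    desc = 'Minuet: switch provider to claude',
})

-- Toggle automatic virtual-text completion in the current buffer.
vim.keymap.set('n', '<leader>mv', '<cmd>Minuet virtualtext toggle<cr>', {
    desc = 'Minuet: toggle virtual text',
})
```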
Minuet offers the following functions to customize your key mappings:
-- accept whole completion
-- accept by line
-- accept n lines (prompts for number)
-- whether the virtual text is visible in current buffer
You can configure the icons of minuet by using the following snippet (referenced from cmp's wiki):
local cmp = require('cmp')
-- Map item kinds to the icons of your choice.
local kind_icons = {}
cmp.setup {
    formatting = {
        format = function(entry, vim_item)
            -- Kind icons
            vim_item.kind = string.format('%s %s', kind_icons[vim_item.kind] or '', vim_item.kind) -- This concatenates the icons with the name of the item kind
            -- Source
            vim_item.menu = ({
                minuet = ""
            })[entry.source.name]
            return vim_item
        end,
    },
}
When using Minuet with auto-complete enabled, you may occasionally experience a noticeable delay when pressing <CR> to move to the next line. This occurs because Minuet triggers autocompletion at the start of a new line, while cmp blocks the <CR> key, awaiting Minuet's response.
To address this issue, consider the following solutions:
- Unbind the <CR> key from your cmp keymap.
- Utilize cmp's internal API to avoid blocking calls, though be aware that this API may change without prior notice.
Here's an example of the second approach using Lua:
local cmp = require 'cmp'
opts.mapping = {
    ['<CR>'] = cmp.mapping(function(fallback)
        -- use the internal non-blocking call to check if cmp is visible
        if cmp.core.view:visible() then
            cmp.confirm { select = true }
        else
            fallback()
        end
    end),
}
With nvim-cmp:
{
    'milanglacier/minuet-ai.nvim',
    config = function()
        require('minuet').setup {
            -- Your configuration options here
        }
    end,
},
{
    'nvim-cmp',
    optional = true,
    opts = function(_, opts)
        -- if you wish to use autocomplete
        table.insert(opts.sources, 1, {
            name = 'minuet',
            group_index = 1,
            priority = 100,
        })

        opts.performance = {
            -- It is recommended to increase the timeout duration due to
            -- the typically slower response speed of LLMs compared to
            -- other completion sources. This is not needed when you only
            -- need manual completion.
            fetching_timeout = 2000,
        }

        opts.mapping = vim.tbl_deep_extend('force', opts.mapping or {}, {
            -- if you wish to use manual complete
            ['<A-y>'] = require('minuet').make_cmp_map(),
        })
    end,
},
With blink-cmp:
-- set the following line in your config/options.lua
vim.g.lazyvim_blink_main = true

-- then add the following plugin specs
{
    'milanglacier/minuet-ai.nvim',
    config = function()
        require('minuet').setup {
            -- Your configuration options here
        }
    end,
},
{
    'saghen/blink.cmp',
    optional = true,
    opts = {
        keymap = {
            ['<A-y>'] = {
                function(cmp)
                    cmp.show { providers = { 'minuet' } }
                end,
            },
        },
        sources = {
            -- if you want to use auto-complete
            default = { 'minuet' },
            providers = {
                minuet = {
                    name = 'minuet',
                    module = 'minuet.blink',
                    score_offset = 100,
                },
            },
        },
    },
},
- Implement RAG on the codebase and encode the codebase information into the request to the LLM.
Contributions are welcome! Please feel free to submit a Pull Request.
- cmp-ai: Reference for the integration with nvim-cmp.
- continue.dev: not a Neovim plugin, but I found a lot of LLM models there.
- copilot.lua: Reference for the virtual text frontend.