Home

Node Usage Instructions

Models

API LLM Nodes and Their Loading Nodes

You can directly input system prompts and user prompts on the node, or use system prompt input and user prompt input to input, accepting string type inputs. system input is generally used to mount mask nodes. Essentially, it is no different from an input box.
The large model node can also accept the output of tool nodes from the tools interface and accept string inputs from the file_content interface. These inputs will be used as the model's knowledge base, and relevant content will be searched and input into the model based on word vector similarity.
The is_memory of the large model node can determine whether the large model has memory. You can change is_memory to disable, then run it, and the model will clear the previous conversation records. Switch back to enable, and the model will retain your conversation records in subsequent runs.
You can view the model's response in the current round of conversation through assistant_response, and you can also view the history of multiple rounds of conversation through history.
Even if external parameters remain unchanged, the large model node will always run because the large model always has different answers to the same question.
Input:
- is_tools_in_sys_prompt: Determines whether the information of tools will be input into the system prompt. If input into the system prompt, it can unlock tool capabilities for some models without tool capabilities.
- is_memory: When enabled, the LLM gains memory. If not enabled, it will clear memory each time and start over.
- is_locked: When you do not change any parameters, it directly returns the result of the last conversation, saving computing power and stabilizing the output results of the LLM.
- main_brain: Determines whether the large model is the model interfacing with the user. When disabled, the LLM node can be used as a tool for another LLM node.
- conversation_rounds: Determines the number of conversation rounds for the LLM. When the number of rounds is exceeded, only the most recent conversation rounds will be read.
- historical_record: Can load previous conversation records into the LLM to continue the last chat.
- tools: Input is the tool call interface of the LLM, and tool output is the interface for using this LLM node as a tool, generally not used.
- Imgbb api key is optional. If you use visual functions and do not fill in this key, it will be transmitted to OpenAI in base64 encoding. If you add a key, it will generate a URL after uploading to the image bed and pass the URL to OpenAI. Not filling it will not affect usage, but it will affect the readability of the conversation record.
Output:
- assistant_response: Text output of the LLM
- history: Conversation records of the LLM
- Tool: Enabled when the LLM is used as a tool for another LLM. In most cases, it can be ignored.
- Image: Under construction, useful in the future.
The LLM adapts to GPT-4's visual functions. You can input the imgbb_api_key into the imgbb API key. After filling it in, your image will be passed to GPT in URL format. If not filled, it will be passed in image encoding format.
The large model node can customize the model name, API_KEY, and base_url. Currently, only OpenAI type API interface calls are supported. It can be combined with One API to connect to any large model API.

Local LLM Nodes and Their Loading Nodes

The local LLM loader node has been greatly adjusted, and you no longer need to choose the model type yourself. The llava loader node and GGUF loader node have been re-added. The model type on the local LLM model chain node has been changed to three options: LLM, VLM-GGUF, and LLM-GGUF, corresponding to directly loading LLM models, loading VLM models, and loading GGUF format LLM models. Support for VLM models and GGUF format LLM models has been restored. Now local calls can be compatible with more models! Example workflows: LLM_local, llava, GGUF.
Fill in the model's project folder in model_name_or_path, compatible with all models that can be compatible with transformers. You can also fill in the repo ID on Hugging Face to directly pull the model.
The remaining parameters are consistent with the API LLM nodes.

VLM-GGUF Model Loader

ckpt_path and clip_path should be filled with the absolute paths of the LLM’s GGUF file and the CLIP’s GGUF file, respectively.
max_ctx is the maximum context length of the LVM model. If this length is exceeded, the model will automatically truncate.
gpu_layers is the number of layers of the LVM model on the GPU.
n_threads is the number of threads of the LVM model on the CPU.

LLM-GGUF Model Loader

Same as above, but only the absolute path of the LLM’s GGUF file needs to be filled in.

VLM Local Loader

Similar to the LLM local loader, but it only supports models like llama3.2-vision. When using this node to load, you need to set the model type on the model chain to LVM (testing). This loader is still in testing and cannot adapt to many models.

Embedding Model Loader

The file_content node can input a string, which will be used as the input of the word embedding model. The model will search this string and return the most relevant text content based on the question.
k is the number of paragraphs returned. chuck_size is the size of each text block when splitting the text, with a default of 200. chuck_overlap is the overlap size between each text block when splitting the text, with a default of 50.
Input embedding_path to call the word embedding model in this folder.

Other Loaders

Load_File Node

The path to read the file is in comfyui_LLM_party/file, you can put the file you want to read in this path, and then fill in the file name in this node.
You can choose absolute path input, in which case path can accept an absolute path.
The output is a string that contains all the text information in the file.
The adapted file formats are: ".docx", ".txt", ".pdf", ".xlsx", ".csv", ".py", ".js", ".java", ".c", ".cpp", ".html", ".css", ".sql", ".r", ".swift"

Load_Folder Node

folder_path can accept an absolute path of a folder, and this node will automatically read all the files in the folder.
The output is a string that contains all the text information in the folder.
The adapted file formats are: ".docx", ".txt", ".pdf", ".xlsx", ".csv", ".py", ".js", ".java", ".c", ".cpp", ".html", ".css", ".sql", ".r", ".swift"

Load_url_content Node

Can convert all web page content in a url into an md format output.
The output is a string that contains all the text information on the web page.

Load_Wikipedia Node

Can return all content related to the question in Wikipedia.

Load Model Names

Load model names from config.ini to facilitate model selection on the loader.

Load Keyword Retriever

Return the most relevant text paragraphs based on the question.
k is the number of paragraphs returned. chuck_size is the size of each text block when splitting the text, with a default of 200. chuck_overlap is the overlap size between each text block when splitting the text, with a default of 50.

Excel Iterator

Return the content of the Excel file row by row, facilitating row-by-row processing. Each time comfyui runs, it will return the next row's content.
Can be used with comfyui's auto-execution to iteratively batch process your work.
Input the absolute path of the Excel file you want to process in path.
is_reload determines whether to reset the returned row count.

Text Iterator

Segment the input content and return it segment by segment, facilitating segment-by-segment processing. Each time comfyui runs, it will return the next segment's content.
Can be used with comfyui's auto-execution to iteratively batch process your work.
Input the text content you want to process in file_content.
is_reload determines whether to reset the returned segment count.

Image Iterator

Return images from the folder specified in folder_path one by one.
Can be used with comfyui's auto-execution to iteratively batch process your work.
is_reload determines whether to reset the returned index.
Supports image formats ".png", ".jpg", ".jpeg", ".gif", ".bmp".

Google Search Loader

Input your google_api_key and cse_id to use this node to search for relevant content based on keyword.
Control paper_num to turn pages and view later search results.
In web search mode, this node will return the top 10 URLs and summaries from Google search.
In image search mode, this node will return the top 10 image URLs from Google search.

Bing Search Loader

Input your bing_api_key to use this node to search for relevant content based on keyword.
Control paper_num to turn pages and view later search results.
In search mode, this node will return the top 10 URLs and summaries from Bing search.
In image search mode, this node will return the top 10 image URLs from Bing search.

Persona

Classifier persona and Super Large Classifier persona Nodes

You can use this persona node as the system_prompt_input of the LLM node, allowing the large model to have the personality of the persona.
LLM will classify user_prompt according to the categories described on the classifier persona node.
Can be used in conjunction with classifier functions to output different categories of text to different workflows.

Custom persona

prompt is the system_prompt_input that will be input into the LLM node, which can contain some variables, such as: "You are an intelligent customer service about {app}"
prompt_template contains the corresponding rules for the variables in the prompt, generally in json format, which can be filled in as follows: {"app":"chatgpt"}, at this time, {app} in the prompt will be automatically replaced with chatgpt.

Load_Persona Node

Can return a preset persona persona, which can be used as the system_prompt_input of the large model, allowing the large model to have the personality of the persona.
The persona folder contains the persona of the image prompt assistant and DAN. You can add more personas to this folder for your use.

Translation persona

Translates language_A to language_B, translating the user_prompt into the corresponding language and then returning the translated content.
tone is the tone, which can be freely specified.
degree is the degree of translation, ranging from 0 to 10, with the tone becoming progressively stronger.

Functions

Classifier Function and Super Large Classifier Function Nodes

Can split the string processed by the LLM with a classifier persona into multiple strings, which can be used in conjunction with string logic to control the execution of the corresponding workflow.
A model with a high level of intelligence is required to achieve stable classification output. The author of this node does not recommend using it and suggests using string logic and string extraction nodes instead.

String Logic

option contains the following options: "A contain B", "A not contain B", "A relate to B", "A not relate to B", "A equal B", "A not equal B", "A is null", "A is not null" for selection.
When the condition is true, if will output A string, else will output an empty string, is_true will output true, is_false will output false, otherwise else will output A string, if will output an empty string, is_true will output false, is_false will output true.

Text Display Function

Can directly display the input string on the comfyui interface.

Send to WeCom, DingTalk, Feishu

Send the input string to WeCom, DingTalk, or Feishu. You need to configure the webhook addresses for Feishu, DingTalk, and WeCom in advance.

Extract String

Extract the desired content from the input string. substring will return the content between the first occurrence of start_string and end_string.
remaining_string will return the other characters after removing the content between the first occurrence of start_string and end_string.
You can reuse this node to repeatedly extract multiple substrings that meet the criteria from the string.

OpenAI Text-to-Speech

Convert the input string to speech. You need to configure the OpenAI api_key in advance.
voice can control the voice tone. You can refer to the official OpenAI documentation.

Play Audio

Play the input audio file. The input is the absolute path of an audio file.

Omost Decoder

Convert the code output by the Omost model into conditioning and mask.
mode includes three different fusion modes: greedy is the greedy algorithm, fusion first fuses conditioning and then fuses the mask by averaging according to weights before decoding, and block first fuses conditioning and mask by blocks before decoding.
strength is the weight of the mask.

Omost Settings

Facilitate users to output relevant code according to their needs. The output can be copied into the code generated by Omost for replacement.

Listen to Audio

Press press_key to start listening to audio. Press release_key to stop listening and return the path of the recorded audio file.

OpenAI Speech Recognition

Convert the input audio file to text. You need to configure the OpenAI api_key in advance.

Replace String

The node will replace old_string in input_string with new_string and return the replaced string.

CosyVoice Text-to-Speech

Convert the input string to speech. Although it is a free interface, it only supports Chinese and is prone to errors, so it is not recommended.

API Function

Fill in the URL of the website to be accessed.
Fill in the api_key of the API.
Fill in the parameters of the API. You need to use the parameter dictionary function to input.

Parameter Dictionary Function

Convert the input key and value into a dictionary to facilitate the use of the API function.
value can be a string, dictionary, or list. You can construct any JSON dictionary you want by combining the list and dictionary-related nodes in the combination node.

Get String

Convert the input text to string output.

Clear Model

Connect the two any interfaces to any connection line in the workflow.
The node will exit the model from memory when it is executed.

chatTTS Voice Synthesis

1. Node Input

Input Name	Description
text	Text to be converted to speech
model_path	Path to the TTS model
save_path	Path to save the audio file
seed	Seed for fixed voice tone
temperature	Effect similar to LLM
top_P	Effect similar to LLM
top_K	Effect similar to LLM
enableRefine	Whether to enable optimization
oral_param	Parameter to control the degree of orality when optimization is enabled
laugh_param	Parameter to control the degree of laughter when optimization is enabled
break_param	Parameter to control the length of pauses when optimization is enabled
is_enable	Whether to enable this node
load_mode	HF: Download model from Huggingface custom: Call model from model_path local: Directly call the model file in the current workspace path (usually the root directory of ComfyUI)

2. Node Output

Output Name	Description
audio	Path to save the audio file

JSON File Parsing

alt text

show_json_file: Read the JSON file and output it as a string
value_by_key: Get the value corresponding to the key in the JSON file by setting the parameter key

JSON Value Extraction

alt text Get the value corresponding to the key in the string by setting the parameter key (the string must be in JSON format, otherwise it cannot be parsed)

Convert String Paragraph to JSON

alt text The node will split the string based on the input character sep and convert the text to a JSON formatted string

Feishu Bot Send Message & Feishu Bot Read Group History