
Hayhooks

Hayhooks makes it easy to deploy and serve Haystack pipelines as REST APIs.

It provides a simple way to wrap your Haystack pipelines with custom logic and expose them via HTTP endpoints, including OpenAI-compatible chat completion endpoints. With Hayhooks, you can quickly turn your Haystack pipelines into API services with minimal boilerplate code.



Quick start

Install the package

Start by installing the package:

pip install hayhooks

Configuration

Currently, you can configure Hayhooks by:

  • Setting the environment variables in a .env file in the root of your project.
  • Passing the supported arguments and options to the hayhooks run command (as shown below).
  • Passing the environment variables directly to the hayhooks command.
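
For example, to override the host and port on the command line (the exact option names may vary between versions; run hayhooks run --help to check):

hayhooks run --host 0.0.0.0 --port 1416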

Environment variables

The following environment variables are supported:

  • HAYHOOKS_HOST: The host on which the server will listen.
  • HAYHOOKS_PORT: The port on which the server will listen.
  • HAYHOOKS_PIPELINES_DIR: The path to the directory containing the pipelines.
  • HAYHOOKS_ROOT_PATH: The root path of the server.
  • HAYHOOKS_ADDITIONAL_PYTHONPATH: Additional path to add to the Python path.
  • HAYHOOKS_DISABLE_SSL: Boolean flag to disable SSL verification when making requests from the CLI.
  • HAYHOOKS_SHOW_TRACEBACKS: Boolean flag to show tracebacks on errors during pipeline execution and deployment.
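
For example, a minimal .env file might look like this (the values are illustrative):

HAYHOOKS_HOST=localhost
HAYHOOKS_PORT=1416
HAYHOOKS_PIPELINES_DIR=./pipelines
HAYHOOKS_SHOW_TRACEBACKS=true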

CLI commands

The hayhooks package provides a CLI to manage the server and the pipelines. Run hayhooks <command> --help to get more information about any command.

CLI commands are essentially wrappers around the server's HTTP API. The full API reference is available at http://HAYHOOKS_HOST:HAYHOOKS_PORT/docs or http://HAYHOOKS_HOST:HAYHOOKS_PORT/redoc.

hayhooks run     # Start the server
hayhooks status  # Check the status of the server and show deployed pipelines

hayhooks pipeline deploy-files <path_to_dir>   # Deploy a pipeline using PipelineWrapper
hayhooks pipeline deploy <pipeline_name>       # Deploy a pipeline from a YAML file
hayhooks pipeline undeploy <pipeline_name>     # Undeploy a pipeline

Start Hayhooks

Let's start Hayhooks:

hayhooks run

This will start the Hayhooks server on HAYHOOKS_HOST:HAYHOOKS_PORT.

Deploy a pipeline

Now, we will deploy a pipeline to chat with a website. We have created an example in the examples/chat_with_website_streaming folder.

In the example folder, we have two files:

  • chat_with_website.yml: The pipeline definition in YAML format.
  • pipeline_wrapper.py (mandatory): A pipeline wrapper that uses the pipeline definition.

Why a pipeline wrapper?

The pipeline wrapper provides a flexible foundation for deploying Haystack pipelines by allowing users to:

  • Choose their preferred pipeline initialization method (YAML files, Haystack templates, or inline code)
  • Define custom pipeline execution logic with configurable inputs and outputs
  • Optionally expose OpenAI-compatible chat endpoints with streaming support for integration with interfaces like open-webui

The pipeline_wrapper.py file must contain an implementation of the BasePipelineWrapper class (see here for more details).

A minimal PipelineWrapper looks like this:

from pathlib import Path
from typing import List
from haystack import Pipeline
from hayhooks import BasePipelineWrapper

class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        pipeline_yaml = (Path(__file__).parent / "chat_with_website.yml").read_text()
        self.pipeline = Pipeline.loads(pipeline_yaml)

    def run_api(self, urls: List[str], question: str) -> str:
        result = self.pipeline.run({"fetcher": {"urls": urls}, "prompt": {"query": question}})
        return result["llm"]["replies"][0]

It contains two methods:

setup()

This method will be called when the pipeline is deployed. It should initialize the self.pipeline attribute as a Haystack pipeline.

You can initialize the pipeline in many ways: loading it from a YAML definition file (as in the example above), using a Haystack pipeline template, or building it directly in code.
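
For instance, setup() could build the pipeline in code instead of loading a YAML file. A minimal sketch (the components below are illustrative; any valid Haystack pipeline works):

from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from hayhooks import BasePipelineWrapper

class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        # Build the pipeline in code instead of loading a YAML definition
        pipeline = Pipeline()
        pipeline.add_component("fetcher", LinkContentFetcher())
        pipeline.add_component("converter", HTMLToDocument())
        pipeline.connect("fetcher.streams", "converter.sources")
        self.pipeline = pipeline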

run_api(Pydantic-compatible arguments) -> (Pydantic-compatible type)

This method will be used to run the pipeline in API mode, when you call the {pipeline_name}/run endpoint.

You can define the input arguments of the method according to your needs. The input arguments will be used to generate a Pydantic model that will be used to validate the request body. The same will be done for the response type.

NOTE: Since Hayhooks will dynamically create the Pydantic models, you need to make sure that the input arguments are JSON-serializable.

To deploy the pipeline, run:

hayhooks pipeline deploy-files -n chat_with_website examples/chat_with_website

This will deploy the pipeline with the name chat_with_website. Any error encountered during deployment will be printed to the console and shown in the server logs.
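
Once deployed, the pipeline can be called over HTTP. Here's a minimal sketch using Python's requests library, assuming the request body mirrors the run_api arguments (check the generated schema at /docs if it differs):

import requests

# The body fields mirror the run_api arguments (urls, question)
response = requests.post(
    "http://localhost:1416/chat_with_website/run",
    json={"urls": ["https://haystack.deepset.ai"], "question": "What is Haystack?"},
)
print(response.json())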

Additional dependencies

After installing the Hayhooks package, you may need to install additional dependencies during pipeline deployment in order to correctly initialize the pipeline instance when the wrapper's setup() method is called. For instance, the chat_with_website pipeline requires the trafilatura package, which is not installed by default.

⚠️ Sometimes you may need to enable tracebacks in hayhooks to see the full error message. You can do this by setting the HAYHOOKS_SHOW_TRACEBACKS environment variable to true or 1.
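
For example, you could start the server with tracebacks enabled:

HAYHOOKS_SHOW_TRACEBACKS=true hayhooks run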

Then, assuming you've installed the Hayhooks package in a virtual environment, you will need to install the additional required dependencies yourself by running:

pip install trafilatura

OpenAI-compatible endpoints generation

Hayhooks can automatically generate OpenAI-compatible endpoints if you implement the run_chat_completion method in your pipeline wrapper.

This will make Hayhooks compatible with fully-featured chat interfaces like open-webui, so you can use it as a backend for your chat interface.

[Screenshot: open-webui OpenAI connection settings]

To enable the automatic generation of OpenAI-compatible endpoints, you only need to implement the run_chat_completion method in your pipeline wrapper.

Let's update the previous example to add a streaming response:

from pathlib import Path
from typing import Generator, List, Union
from haystack import Pipeline
from hayhooks import get_last_user_message, BasePipelineWrapper, log


URLS = ["https://haystack.deepset.ai", "https://www.redis.io", "https://ssi.inc"]


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        ...  # Same as before

    def run_api(self, urls: List[str], question: str) -> str:
        ...  # Same as before

    def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Union[str, Generator]:
        log.trace(f"Running pipeline with model: {model}, messages: {messages}, body: {body}")

        question = get_last_user_message(messages)
        log.trace(f"Question: {question}")

        # Plain pipeline run, will return a string
        result = self.pipeline.run({"fetcher": {"urls": URLS}, "prompt": {"query": question}})
        return result["llm"]["replies"][0]

run_chat_completion(model: str, messages: List[dict], body: dict) -> Union[str, Generator]

Unlike the run_api method, run_chat_completion has a fixed signature and will be called with the arguments specified in the OpenAI-compatible endpoint:

  • model: The name of the Haystack pipeline which is called.
  • messages: The list of messages from the chat in the OpenAI format.
  • body: The full body of the request.

Some notes:

  • Since we have only the user messages as input here, the question is extracted from the last user message and the urls argument is hardcoded.
  • In this example, the run_chat_completion method returns a string, so open-webui will receive the response all at once and show the pipeline output in the chat in a single message.
  • The body argument contains the full request body, which may be used to extract more information such as the temperature or the max_tokens (see the OpenAI API reference for more information, and the sketch below).
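
For instance, run_chat_completion could read generation parameters from the request body and pass them along to the pipeline. This is a sketch under the assumption that the pipeline's llm component is a standard Haystack generator accepting generation_kwargs:

    # Inside the PipelineWrapper class from the example above
    def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Union[str, Generator]:
        question = get_last_user_message(messages)

        # Collect optional OpenAI-style parameters from the request body
        generation_kwargs = {}
        if body.get("temperature") is not None:
            generation_kwargs["temperature"] = body["temperature"]
        if body.get("max_tokens") is not None:
            generation_kwargs["max_tokens"] = body["max_tokens"]

        result = self.pipeline.run({
            "fetcher": {"urls": URLS},
            "prompt": {"query": question},
            "llm": {"generation_kwargs": generation_kwargs},
        })
        return result["llm"]["replies"][0]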

Here's how it looks from the open-webui side:

[Screenshot: chat completion response shown all at once in open-webui]

Streaming responses in OpenAI-compatible endpoints

Hayhooks provides a streaming_generator utility function that can be used to stream the pipeline output to the client.

Let's update the run_chat_completion method of the previous example:

from pathlib import Path
from typing import Generator, List, Union
from haystack import Pipeline
from hayhooks import get_last_user_message, BasePipelineWrapper, log, streaming_generator


URLS = ["https://haystack.deepset.ai", "https://www.redis.io", "https://ssi.inc"]


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        ...  # Same as before

    def run_api(self, urls: List[str], question: str) -> str:
        ...  # Same as before

    def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Union[str, Generator]:
        log.trace(f"Running pipeline with model: {model}, messages: {messages}, body: {body}")

        question = get_last_user_message(messages)
        log.trace(f"Question: {question}")

        # Streaming pipeline run, will return a generator
        return streaming_generator(
            pipeline=self.pipeline,
            pipeline_run_args={"fetcher": {"urls": URLS}, "prompt": {"query": question}},
        )

Now, if you run the pipeline and call one of the following endpoints:

  • {pipeline_name}/chat
  • /chat/completions
  • /v1/chat/completions

You will see the pipeline output being streamed in OpenAI-compatible format to the client and you'll be able to see the output in chunks.
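
For example, you can call the OpenAI-compatible endpoint with the official openai Python client pointed at the Hayhooks server (the base URL below assumes the default local setup used elsewhere in this README):

from openai import OpenAI

# Point the client at the Hayhooks server instead of api.openai.com
client = OpenAI(base_url="http://localhost:1416/v1", api_key="not-relevant")

stream = client.chat.completions.create(
    model="chat_with_website",  # name of the deployed pipeline
    messages=[{"role": "user", "content": "What is Haystack?"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)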

Here's how it looks from the open-webui side:

[Screenshot: streamed chat completion response in open-webui]

Integration with Haystack's OpenAIChatGenerator

Since Hayhooks is OpenAI-compatible, it can be used as a backend for the Haystack OpenAIChatGenerator.

Assuming you have a Haystack pipeline named chat_with_website_streaming and you have deployed it using Hayhooks, here's an example script of how to use it with the OpenAIChatGenerator:

from haystack.components.generators.chat.openai import OpenAIChatGenerator
from haystack.utils import Secret
from haystack.dataclasses import ChatMessage
from haystack.components.generators.utils import print_streaming_chunk

client = OpenAIChatGenerator(
    model="chat_with_website_streaming",
    api_key=Secret.from_token("not-relevant"),  # This is not used, you can set it to anything
    api_base_url="http://localhost:1416/v1/",
    streaming_callback=print_streaming_chunk,
)

client.run([ChatMessage.from_user("Where are the offices of SSI?")])
# > The offices of Safe Superintelligence Inc. (SSI) are located in Palo Alto, California, and Tel Aviv, Israel.

# > {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text='The offices of Safe Superintelligence Inc. (SSI) are located in Palo Alto, California, and Tel Aviv, Israel.')], _name=None, _meta={'model': 'chat_with_website_streaming', 'index': 0, 'finish_reason': 'stop', 'completion_start_time': '2025-02-11T15:31:44.599726', 'usage': {}})]}

Run Hayhooks programmatically

A Hayhooks app instance can be created programmatically using the create_app function. This is useful if you want to add custom routes or middleware to Hayhooks.

Here's an example script:

import uvicorn
from hayhooks.settings import settings
from fastapi import Request
from hayhooks import create_app

# Create the Hayhooks app
hayhooks = create_app()


# Add a custom route
@hayhooks.get("/custom")
async def custom_route():
    return {"message": "Hi, this is a custom route!"}


# Add a custom middleware
@hayhooks.middleware("http")
async def custom_middleware(request: Request, call_next):
    response = await call_next(request)
    response.headers["X-Custom-Header"] = "custom-header-value"
    return response


if __name__ == "__main__":
    uvicorn.run("app:hayhooks", host=settings.host, port=settings.port)
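
Assuming the script above is saved as app.py (so that uvicorn can import app:hayhooks), you can start the server with:

python app.py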

Deploy a pipeline using only its YAML definition

⚠️ This way of deployment is not maintained anymore and will be deprecated in the future.

We still support Hayhooks' former way of deploying a pipeline.

The former hayhooks deploy command has been renamed to hayhooks pipeline deploy and can be used to deploy a pipeline from its YAML definition file only.

For example:

hayhooks pipeline deploy -n chat_with_website examples/chat_with_website/chat_with_website.yml

This will deploy the pipeline with the name chat_with_website from the YAML definition file examples/chat_with_website/chat_with_website.yml. You can then check the generated docs at http://HAYHOOKS_HOST:HAYHOOKS_PORT/docs or http://HAYHOOKS_HOST:HAYHOOKS_PORT/redoc, looking at the POST /chat_with_website endpoint.

Deployment

For detailed deployment guidelines, see deployment_guidelines.md.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.