This document will guide you through the steps to contribute a new component. You'll add and test an operator that takes a string `target` as input and returns a `"Hello, ${target}!"` string as the component output.

In order to add a new component, you need to:
- Define the component configuration. This determines the tasks that can be performed by the component and their input and output parameters. The console frontend uses the configuration files to render the component in the pipeline editor page.
- Implement the component interfaces so `pipeline-backend` can execute the component without knowing its implementation details.
- Initialize the component, i.e., include the implementation of the component interfaces as a dependency in the `pipeline-backend` execution.
This guide builds on top of other documents:

- `pipeline-backend`'s README explains the basic concepts in the VDP domain.
- `component`'s README digs deeper into the Component entity, its structure and functionalities.
- The repository's contribution guidelines document the conventions you'll need to follow when contributing to this repository. They also contain a guide on how to set up your development environment, in case you want to see your component in action.
- If you find yourself wanting to know more, visit the Instill Docs.
$ cd $MY_WORKSPACE/pipeline-backend/pkg/component
$ mkdir -p operator/hello/v0 && cd $_
Components are isolated in their own packages under their component type (`ai`, `data`, `application`, etc.). The package is versioned so that, in case a breaking change needs to be introduced (e.g., supporting a new major version in a vendor API), existing pipelines using the previous version of the component can keep being triggered.
At the end of this guide, this will be the structure of the package:
operator/hello/v0
├── .compogen
│   └── extra-bottom.mdx
├── assets
│   └── hello.svg
├── config
│   ├── definition.json
│   └── tasks.json
├── main.go
├── main_test.go
└── README.mdx
Create a `config` directory and add the files `definition.json`, `tasks.json`, and `setup.json` (optional). Together, these files define the behavior of the component.
The `definition.json` file describes the high-level information of the component.
{
  "id": "hello",
  "uid": "e05d3d71-779c-45f8-904d-e90a050ca3b2",
  "title": "Hello",
  "type": "COMPONENT_TYPE_OPERATOR",
  "description": "'Hello, world' operator used as a template for adding components",
  "spec": {},
  "availableTasks": [
    "TASK_GREET"
  ],
  "documentationUrl": "https://www.instill.tech/docs/component/operator/hello",
  "icon": "assets/hello.svg",
  "version": "0.1.0",
  "sourceUrl": "https://github.com/instill-ai/pipeline-backend/pkg/component/blob/main/operator/hello/v0",
  "releaseStage": "RELEASE_STAGE_ALPHA",
  "public": true
}
This file defines the component properties:

- `id` is the ID of the component. It must be unique.
- `uid` is a UUID string that must not be already taken by another component. Once it is set, it must not change.
- `title` is the display title of the component.
- `description` is a short sentence describing the purpose of the component. It should be written in imperative tense.
- `spec` contains the parameters required to configure the component that are independent from its tasks, e.g., the API token of a vendor. In general, only AI, data or application components need such parameters.
- `availableTasks` defines the tasks the component can perform.
  - When a component is created in a pipeline, one of the tasks has to be selected, i.e., a configured component can only execute one task.
  - Task configurations are defined in `tasks.json`.
- `documentationUrl` points to the official documentation of the component.
- `icon` is the local path to the icon that will be displayed in the console when creating the component. If left blank, a placeholder icon will be shown.
- `version` must be a SemVer string. It is encouraged to keep a tidy version history.
- `sourceUrl` points to the codebase that implements the component. It is used by the documentation generation tool and is also part of the component definition list endpoint.
- `releaseStage` describes the release stage of the component. Unimplemented stages (`RELEASE_STAGE_COMING_SOON` or `RELEASE_STAGE_OPEN_FOR_CONTRIBUTION`) will hide the component from the console (i.e., it can't be used in pipelines), but it will still appear in the component definition list endpoint.
- `public` indicates whether the component is visible to the public.
The `tasks.json` file describes the task details of the component. Each key should be in the format `TASK_NAME`.
{
  "TASK_GREET": {
    "shortDescription": "Greet someone / something",
    "title": "Greet",
    "input": {
      "description": "Input",
      "uiOrder": 0,
      "properties": {
        "target": {
          "uiOrder": 0,
          "description": "The target of the greeting",
          "acceptFormats": [
            "string"
          ],
          "title": "Greeting target",
          "format": "string"
        }
      },
      "required": [
        "target"
      ],
      "title": "Input",
      "type": "object"
    },
    "output": {
      "description": "The greeting sentence",
      "uiOrder": 0,
      "properties": {
        "greeting": {
          "description": "A greeting sentence addressed to the target",
          "uiOrder": 0,
          "required": [],
          "title": "Greeting",
          "type": "string",
          "format": "string"
        }
      },
      "required": [
        "greeting"
      ],
      "title": "Output",
      "type": "object"
    }
  }
}
This file defines the input and output schema of each task.

Properties within a Task:

- `title` is used by the console to provide the title of the task in the component.
- `description` and `shortDescription` are used by the console to provide a description of the task in the component. If `shortDescription` does not exist, it will be the same as `description`.
- `input` is a JSON Schema that describes the input of the task.
- `output` is a JSON Schema that describes the output of the task.
Properties within `input` and `output` objects:

- `required` indicates whether the property is required.
- `format` describes the format of the field, which can be `string`, `number`, `boolean`, `file`, `document`, `image`, `video`, `audio`, `array`, or `object`.
- `title` is used by the console to provide the title of the property in the component.
- `description` is used by the console to provide information about this task in the component.
- `shortDescription` is a concise version of `description`, used to fit smaller spaces such as a component form field. If this value is empty, the `description` value will be used.
- `uiOrder` defines the order in which the properties will be rendered in the component.
Properties within `input` objects:

- `acceptFormats` is an array indicating the data types of acceptable input fields. It should be an array of Instill Format.
  - Currently, we do not support the `time` type. When the input is a `date` or `datetime`, it should be represented as a string. The `date` or `datetime` will be automatically parsed in the UTC timezone by the YAML parser. Please ensure this point is noted in the documentation, as is done for the `start-to-read-date` field in the Slack component.
- `instillSecret` indicates that the data must reference a secret and cannot be used in plaintext.
Properties within `output` objects:

- `format` indicates the data type of the output field, which should be one of `string`, `number`, `boolean`, `file`, `document`, `image`, `video`, `audio`, `array`, or `object`. Please refer to Instill Format for more details.
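As an illustration of a non-scalar format, a hypothetical output property holding a list of greetings might be declared as follows. This snippet is not part of the hello component; the `greetings` field name is invented for the example, and it assumes the schema accepts standard JSON Schema `items` for array element types:

```json
{
  "greetings": {
    "description": "A list of greeting sentences",
    "uiOrder": 0,
    "title": "Greetings",
    "type": "array",
    "format": "array",
    "items": {
      "type": "string",
      "format": "string"
    }
  }
}
```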
See the example recipe to understand how these fields map to the recipe of a pipeline when configured to use this operator.
For components that need to set up some configuration before execution (typically, components that connect with 3rd-party applications or services requiring a connection), `setup.json` can be used to describe these configurations. The format is the same as that of the `input` objects in `tasks.json`.
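As an illustration, a hypothetical `setup.json` for a component that authenticates against a vendor API might look like this. The `api-key` field is invented for the example (the hello operator itself needs no setup):

```json
{
  "additionalProperties": false,
  "properties": {
    "api-key": {
      "description": "API key used to authenticate requests against the vendor",
      "instillSecret": true,
      "acceptFormats": [
        "string"
      ],
      "uiOrder": 0,
      "title": "API Key",
      "format": "string"
    }
  },
  "required": [
    "api-key"
  ],
  "title": "Setup",
  "type": "object"
}
```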
The setup of a component can be defined within the recipe as key-value fields, or as a reference to a Connection (see the Integrations doc for more information). Certain components support OAuth 2.0 integrations. If you want your component to support this sort of connection:

- In `setup.json`, add the OAuth information under the `instillOAuthConfig` property.
  - `authUrl` contains the address where the authorization code can be requested.
  - `accessUrl` contains the address where the authorization code can be exchanged for an access token.
  - `scopes` contains the permissions that will be associated with the access token. Refer to the vendor you want to integrate with in order to identify the scopes required by the different tasks implemented by the component. Note that if the component is later extended to require more scopes, users will need to create a new connection in order to leverage the new functionality.
- The OAuth 2.0 exchange for an access token is implemented in the frontend. Make sure to engage with Instill Product in order to prioritise the OAuth support for this component.
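A sketch of what such an `instillOAuthConfig` block might look like — the URLs and scopes below are placeholders for an imaginary vendor, not values from any real integration:

```json
{
  "instillOAuthConfig": {
    "authUrl": "https://vendor.example.com/oauth/authorize",
    "accessUrl": "https://vendor.example.com/oauth/token",
    "scopes": [
      "read:greetings",
      "write:greetings"
    ]
  }
}
```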
`pipeline-backend` communicates with components through the `IComponent` interface, defined in the `base` package. This package also provides base implementations for these interfaces, so the `hello` component only needs to override the following methods:

- `CreateExecution(ComponentExecution) (IExecution, error)` returns an implementation of the `IExecution` interface. A base execution implementation is passed in so that only the behaviour of the `Execute` method needs to be defined.
- `Execute(context.Context, []*base.Job) error` is the most important function in the component. All the data manipulation takes place here.
Paste the following code into a `main.go` file in `operator/hello/v0`:
package hello

import (
	"context"
	"fmt"
	"sync"

	_ "embed"

	"github.com/instill-ai/pipeline-backend/pkg/component/base"
)

const (
	taskGreet = "TASK_GREET"
)

var (
	//go:embed config/definition.json
	definitionJSON []byte
	//go:embed config/tasks.json
	tasksJSON []byte

	once sync.Once
	comp *component
)

type component struct {
	base.Component
}

// Init returns an implementation of IComponent that implements the greeting
// task.
func Init(bc base.Component) *component {
	once.Do(func() {
		comp = &component{Component: bc}
		err := comp.LoadDefinition(definitionJSON, nil, tasksJSON, nil)
		if err != nil {
			panic(err)
		}
	})

	return comp
}

type execution struct {
	base.ComponentExecution
}

func (c *component) CreateExecution(x base.ComponentExecution) (base.IExecution, error) {
	if x.Task != taskGreet {
		return nil, fmt.Errorf("unsupported task")
	}

	return &execution{ComponentExecution: x}, nil
}

func (e *execution) Execute(ctx context.Context, jobs []*base.Job) error {
	return nil
}
The `hello` operator created in the previous section doesn't implement any logic. This section adds the greeting logic to the `Execute` method. Let's modify the following methods (make sure `google.golang.org/protobuf/types/known/structpb` is in the import list, as the `execute` field relies on it):
type execution struct {
	base.ComponentExecution

	execute func(*structpb.Struct) (*structpb.Struct, error)
}

func (c *component) CreateExecution(x base.ComponentExecution) (base.IExecution, error) {
	e := &execution{ComponentExecution: x}

	// A simple if statement would be enough in a component with a single
	// task. If the number of tasks grows, this is where the execution task
	// would be selected.
	switch x.Task {
	case taskGreet:
		e.execute = e.greet
	default:
		return nil, fmt.Errorf("unsupported task")
	}

	return e, nil
}

func (e *execution) Execute(ctx context.Context, jobs []*base.Job) error {
	// An execution might take several inputs. One result will be returned
	// for each one of them, containing the execution output for that set of
	// parameters.
	for _, job := range jobs {
		input, err := job.Input.Read(ctx)
		if err != nil {
			return err
		}

		output, err := e.execute(input)
		if err != nil {
			return err
		}

		err = job.Output.Write(ctx, output)
		if err != nil {
			return err
		}
	}

	return nil
}

func (e *execution) greet(in *structpb.Struct) (*structpb.Struct, error) {
	out := new(structpb.Struct)

	target := in.Fields["target"].GetStringValue()
	greeting := "Hello, " + target + "!"

	out.Fields = map[string]*structpb.Value{
		"greeting": structpb.NewStringValue(greeting),
	}

	return out, nil
}
The `errmsg` package allows us to attach end-user messages to our errors.
func (e *execution) greet(in *structpb.Struct) (*structpb.Struct, error) {
	out := new(structpb.Struct)

	greetee := in.Fields["target"].GetStringValue()
	if greetee == "Voldemort" {
		return nil, errmsg.AddMessage(
			fmt.Errorf("invalid greetee"),
			"He-Who-Must-Not-Be-Named can't be greeted.",
		)
	}

	greeting := "Hello, " + greetee + "!"
	out.Fields = map[string]*structpb.Value{
		"greeting": structpb.NewStringValue(greeting),
	}

	return out, nil
}
The middleware in `pipeline-backend` will capture these messages in order to return human-friendly errors to API clients and console users.
Before testing your component in 💧 Instill VDP, we can unit test its behaviour. The following code covers the newly added logic by replicating how the `pipeline-backend` workers execute the component logic. Create a `main_test.go` file containing the following code:
package hello

import (
	"context"
	"testing"

	"go.uber.org/zap"
	"google.golang.org/protobuf/types/known/structpb"

	qt "github.com/frankban/quicktest"

	"github.com/instill-ai/pipeline-backend/pkg/component/base"
	"github.com/instill-ai/pipeline-backend/pkg/component/internal/mock"
	"github.com/instill-ai/x/errmsg"
)

func TestOperator_Execute(t *testing.T) {
	c := qt.New(t)
	ctx := context.Background()

	bc := base.Component{Logger: zap.NewNop()}
	component := Init(bc)

	c.Run("ok - greet", func(c *qt.C) {
		exec, err := component.CreateExecution(base.ComponentExecution{
			Component: component,
			Task:      taskGreet,
		})
		c.Assert(err, qt.IsNil)

		pbIn, err := structpb.NewStruct(map[string]any{"target": "bolero-wombat"})
		c.Assert(err, qt.IsNil)

		ir, ow, eh, job := mock.GenerateMockJob(c)
		ir.ReadMock.Return(pbIn, nil)
		ow.WriteMock.Optional().Set(func(ctx context.Context, output *structpb.Struct) (err error) {
			greeting := output.Fields["greeting"].GetStringValue()
			// Within a minimock setter we can't use `c.Assert`, so we use
			// the `mock.Equal` helper built into the mock package instead.
			mock.Equal(greeting, "Hello, bolero-wombat!")

			return nil
		})
		eh.ErrorMock.Optional()

		err = exec.Execute(ctx, []*base.Job{job})
		c.Assert(err, qt.IsNil)
	})

	c.Run("nok - invalid greetee", func(c *qt.C) {
		exec, err := component.CreateExecution(base.ComponentExecution{
			Component: component,
			Task:      taskGreet,
		})
		c.Assert(err, qt.IsNil)

		pbIn, err := structpb.NewStruct(map[string]any{"target": "Voldemort"})
		c.Assert(err, qt.IsNil)

		ir, ow, eh, job := mock.GenerateMockJob(c)
		ir.ReadMock.Return(pbIn, nil)
		// The output is never written for this input, so no setter is needed.
		ow.WriteMock.Optional()
		eh.ErrorMock.Optional()

		err = exec.Execute(ctx, []*base.Job{job})
		c.Assert(err, qt.ErrorMatches, "invalid greetee")
		c.Assert(errmsg.Message(err), qt.Matches, "He-Who-Must-Not-Be-Named can't be greeted.")
	})
}

func TestOperator_CreateExecution(t *testing.T) {
	c := qt.New(t)

	bc := base.Component{Logger: zap.NewNop()}
	operator := Init(bc)

	c.Run("nok - unsupported task", func(c *qt.C) {
		task := "FOOBAR"

		_, err := operator.CreateExecution(base.ComponentExecution{
			Component: operator,
			Task:      task,
		})
		c.Check(err, qt.ErrorMatches, "unsupported task")
	})
}
In our testing methodology, we use two main approaches for mocking external services:
- In some components, we mock only the interface and skip testing the actual client, as with services like Slack and HubSpot.
- In other components, we create a fake server for test purposes, as with OpenAI integrations.
Moving forward, we plan to standardize on the second approach, integrating all components to use a fake server setup for testing.
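As a sketch of the fake-server approach, a vendor API can be stood up in-process with the standard `httptest` package. This is a generic illustration, not the actual OpenAI component tests; the endpoint, payload, and `fetchGreeting` helper are invented for the example:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newFakeVendor spins up an in-process HTTP server that mimics a vendor API.
// The response payload is invented for this illustration.
func newFakeVendor() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"greeting":"Hello, wombat!"}`)
	}))
}

// fetchGreeting stands in for a component's vendor client. In a real test,
// the client's base URL would be pointed at the fake server instead of the
// vendor's production address.
func fetchGreeting(baseURL string) (string, error) {
	resp, err := http.Get(baseURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	srv := newFakeVendor()
	defer srv.Close()

	out, err := fetchGreeting(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // → {"greeting":"Hello, wombat!"}
}
```

Because the server runs inside the test process, these tests stay fast and hermetic while still exercising the component's real HTTP client code path.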
The last step before being able to use the component in 💧 Instill VDP is loading the `hello` operator. This is done in the `Init` function in `store.go`:
package store

import (
	// ...

	"github.com/instill-ai/pipeline-backend/pkg/component/operator/hello/v0"
)

// ...

func Init(logger *zap.Logger) *Store {
	baseComp := base.Component{Logger: logger}

	once.Do(func() {
		compStore = &Store{
			componentUIDMap: map[uuid.UUID]*component{},
			componentIDMap:  map[string]*component{},
		}
		// ...
		compStore.Import(hello.Init(baseComp))
	})

	return compStore
}
Re-run your local `pipeline-backend` build:
$ make rm && make dev
$ docker exec pipeline-backend go run ./cmd/init # this will load the new component into the database
$ docker exec -d pipeline-backend go run ./cmd/worker # run without -d in a separate terminal if you want to access the logs
$ docker exec pipeline-backend go run ./cmd/main
Head to the console at http://localhost:3000/ (the default password is `password`) and create a pipeline.

- In the variable component, add a `who` text field.
- Create a hello operator and reference the variable input field by adding `${variable.who}` to the `target` field.
- In the output component, add a `greeting` output value that references the hello output by introducing `${hello-0.output.greeting}`.
You can copy the recipe from the example. If you introduce a `Wombat` string value in the input form and run the pipeline, you should see `Hello, Wombat!` in the response.
variable:
  who:
    title: Who
    description: Who should be greeted?
    format: string
component:
  hello-0:
    type: hello
    task: TASK_GREET
    input:
      target: ${variable.who}
output:
  greeting:
    title: Greeting
    description: The output greeting
    value: ${hello-0.output.greeting}
Documentation helps users integrate the component in their pipelines. A good component definition will have clear names for its fields, together with useful descriptions. The information described in `definition.json` and `tasks.json` is enough to understand how a component should be used. `compogen` is a tool that parses the component configuration and builds a `README.mdx` file displaying this information in a human-readable way. To generate the document, just add the following line at the top of `operator/hello/v0/main.go`:
//go:generate compogen readme ./config ./README.mdx
Then, go to the base of the `pipeline-backend` repository and run:
$ make gen-component-doc
The documentation of the component can be extended with the `--extraContents` flag:
$ mkdir -p operator/hello/v0/.compogen
$ echo '### Final words
Thank you for reading!' > operator/hello/v0/.compogen/extra-bottom.mdx
//go:generate compogen readme ./config ./README.mdx --extraContents bottom=.compogen/extra-bottom.mdx
Check `compogen`'s README for more information.
The version of a component is useful to track its evolution and to set expectations about its stability. When the interface of a component (defined by its configuration files) changes, its version should change following the Semantic Versioning guidelines.
- Patch versions are intended for bug fixes.
- Minor versions are intended for backwards-compatible changes, e.g., a new task or a new input field with a default value.
- Major versions are intended for backwards-incompatible changes.
  - At this point, since there might be pipelines using the previous version, a new package MUST be created, e.g., `operator/json/v0` -> `operator/json/v1`.
- Build and pre-release labels are discouraged, as components are shipped as part of 💧 Instill VDP and they aren't likely to need such fine-grained version control.
It is recommended to start a component at `v0.1.0`. A major version 0 is intended for rapid development.
The `releaseStage` property in `definition.json` indicates the stability of a component.
- A component skeleton (with only the minimal configuration files and a dummy implementation of the interfaces) may use the Coming Soon or Open For Contribution stages in order to communicate publicly about upcoming components. The major and minor versions in this case MUST be 0.
- Alpha pre-releases are used in initial implementations, intended to gather feedback and issues from early adopters. Breaking changes are acceptable at this stage.
- Beta pre-releases are intended for stable components that don't expect breaking changes.
- General availability indicates production readiness. A broad adoption of the beta version in production indicates the transition to GA is ready.
The typical version and release stage evolution of a component might look like this:
| Version | Release Stage       |
|---------|---------------------|
| 0.1.0   | RELEASE_STAGE_ALPHA |
| 0.1.1   | RELEASE_STAGE_ALPHA |
| 0.1.2   | RELEASE_STAGE_ALPHA |
| 0.2.0   | RELEASE_STAGE_ALPHA |
| 0.2.1   | RELEASE_STAGE_ALPHA |
| 0.3.0   | RELEASE_STAGE_BETA  |
| 0.3.1   | RELEASE_STAGE_BETA  |
| 0.4.0   | RELEASE_STAGE_BETA  |
| 1.0.0   | RELEASE_STAGE_GA    |