Track workflow state of BOM processing / analysis in database to make it accessible to multiple API server instances #664

nscuro · 2023-07-11T12:06:42Z

For CI/CD use-cases, Dependency-Track offers a mechanism that allows clients to poll whether the BOM they just uploaded is still being processed. "Processing" in this context refers to:

Consumption (Parsing)
Ingestion (Sync parsed data with database)
Vulnerability analysis
Policy Evaluation

This is an important capability, allowing for implementation of quality gates and reporting in CI/CD pipelines.

The mechanism works by assigning identifiers (UUIDs) to events in Alpine's in-memory event system. As long as an event associated to a given identifier can be found in its internal queue, the identifier is considered to be "processing":

Events can be chained, such that a BomUploadEvent will trigger a VulnerabilityAnalysisEvent, which will trigger a PolicyEvaluationEvent, and so on. The event identifier is inherited by chained events.

As everything happens in-memory, this mechanism does not work when multiple instances of the API server are used in an active-active deployment. For the functionality to continue to work, the state of processing steps must be persistent to external storage.

Decoupled from #633.

Implementation Status

Design

Note
The goal for now is not to build a multi-purpose workflow engine, but to track state of one specific workflow. In a future iteration, we may invest more time into coming up with a generic workflow engine concept.

What to track

BOM consumption (parsing, de-duplication, etc.)
BOM processing (ingestion into database)
Vulnerability analysis
Repository metadata analysis
Policy evaluation
Metrics update

The order of execution as of today is as follows:

flowchart LR
    A(BOM_CONSUMPTION) --> B(BOM_PROCESSING)
    B --> C(VULN_ANALYSIS)
    B --> D(REPO_META_ANALYSIS)
    C --> E(POLICY_EVALUATION)
    E --> F(METRICS_UPDATE)
    B -. BOM contains no <br/>components to analyze .-> F

Note
Completion of repository metadata analysis can currently not be tracked. We'll need something similar to what we introduced in DependencyTrack/hyades-apiserver#40 for vulnerability analysis completion tracking. For the initial implementation, it may be OK to not track it.

States

There are multiple states a processing step can be in:

PENDING: Initial state
COMPLETED: Completion detected; No failures
FAILED: Completion detected; Failures
CANCELLED: Never started because a precondition failed
- e.g. vulnerability analysis will be cancelled when BOM ingestion failed
NOT_APPLICABLE: The step is not applicable to the subject of the analysis
- e.g. the analysis was triggered for an existing project, without BOM upload
TIMED_OUT: The deadline for the step completion was exceeded; Step is unlikely to complete

stateDiagram-v2
    [*] --> PENDING
    PENDING --> COMPLETED
    COMPLETED --> [*]
    PENDING --> FAILED
    FAILED --> [*]
    PENDING --> CANCELLED
    CANCELLED --> [*]
    [*] --> NOT_APPLICABLE
    NOT_APPLICABLE --> [*]
    PENDING --> TIMED_OUT
    TIMED_OUT --> COMPLETED
    TIMED_OUT --> FAILED

Each step is responsible for updating its own state.

The overall state can be considered to be complete, if there's no step in PENDING state.

When a step failure is detected, a "failure reason" message must be persisted. If multiple steps fail, (rough) failure details for each step must be available.

There should be a deadline mechanism, which automatically transitions steps from PENDING into TIMED_OUT state. Steps in TIMED_OUT state communicate that it is unlikely that a terminal state can be reached (COMPLETED, FAILED). However, it is still possible (e.g. due to significant consumer lag, events may arrive late).

Workflow

The API server will act as workflow orchestrator, kicking off new steps as needed. This implies that completion of a step must be registered by the API server.

To illustrate how the initial stage of the workflow should look like:

sequenceDiagram
    Client ->>+ BomResource: Upload BOM
    BomResource ->> BomResource: Validate BOM
    BomResource ->> BomResource: Generate Correlation ID (UUID)
    BomResource ->> Database: Create workflow steps
    Note over BomResource, Database: Token=<CORRELATION_ID>, Step=BOM_CONSUMPTION<br/>Token=<CORRELATION_ID>, Step=BOM_PROCESSING<br/>Token=<CORRELATION_ID>, Step=VULN_ANALYSIS<br/>Status=PENDING
    BomResource ->> BomUploadProcessingTask: Dispatch BomUploadEvent
    BomResource ->>- Client: Correlation ID
    par
        loop Continuously
            Client ->> BomResource: Poll Status
            Note over Client, BomResource: GET /api/v1/bom/token/<CORRELATION_ID>/status
        end
    and
        BomUploadProcessingTask ->> Database: Update step start time
        activate BomUploadProcessingTask
        Note over BomUploadProcessingTask, Database: Step=BOM_CONSUMPTION<br/>StartedAt=<NOW>
        BomUploadProcessingTask ->> BomUploadProcessingTask: Consume BOM<br/>(Parse, De-dupe)
        alt Consumption succeeded
            BomUploadProcessingTask ->> Database: Update step status
            Note over BomUploadProcessingTask, Database: Step=BOM_CONSUMPTION<br/>Status=COMPLETED<br/>UpdatedAt=<NOW>
            BomUploadProcessingTask ->> Database: Process BOM<br/>(Insert, Update, Delete)
            alt Processing succeeded
                BomUploadProcessingTask ->> Database: Update step status
                Note over BomUploadProcessingTask, Database: Step=BOM_PROCESSING<br/>Status=COMPLETED<br/>UpdatedAt=<NOW>
                BomUploadProcessingTask ->> Kafka: Publish events to "dtrack.vuln-analysis.component" topic
                BomUploadProcessingTask ->> Kafka: Publish events to "dtrack.repo-meta-analysis.component" topic
            else Processing failed
                BomUploadProcessingTask ->> Database: Update step status
                Note over BomUploadProcessingTask, Database: Step=BOM_PROCESSING<br/>Status=FAILED<br/>UpdatedAt=<NOW>
                BomUploadProcessingTask ->> Database: Cancel follow-up steps
            Note over BomUploadProcessingTask, Database: Step=VULN_ANALYSIS<br/>Status=CANCELLED
            end
        else Consumption failed
            BomUploadProcessingTask ->> Database: Update step Status
            Note over BomUploadProcessingTask, Database: Status=FAILED<br/>UpdatedAt=<NOW>
            BomUploadProcessingTask ->> Database: Cancel follow-up steps
            Note over BomUploadProcessingTask, Database: Step=BOM_PROCESSING<br/>Status=CANCELLED
        end
        deactivate BomUploadProcessingTask
    end

When triggering a re-analysis of an already existing project, the workflow may be kicked off as follows:

sequenceDiagram
    Client ->>+ FindingResource: Re-Analyze project
    FindingResource ->> FindingResource: Generate Correlation ID (UUID)
    FindingResource ->> Database: Create workflow steps
    Note over FindingResource, Database: Token=<CORRELATION_ID>, Step=VULN_ANALYSIS<br/>Token=<CORRELATION_ID>, Step=POLICY_EVALUATION<br/>Status=PENDING
    FindingResource ->> Kafka: Publish events to "dtrack.vuln-analysis.component" topic
    FindingResource ->>- Client: Correlation ID
    loop Continuously
        Client ->> FindingResource: Poll Status
        Note over Client, FindingResource: GET /api/v1/bom/token/<CORRELATION_ID>/status
    end

Proposed Table Schema

Each step of the workflow will be represented in a dedicated row. This allows us to add or remove steps without altering the database schema (see original version of the schema further down below), or even add steps while the workflow is running. It also plays better with concurrent writes, as no two threads / instances will need to modify the same row.

Name	Type	Nullable	Example
ID	`SERIAL`	❌	1
PARENT_STEP_ID	`SERIAL FK`	✅	0
TOKEN	`VARCHAR(36)`	❌	`484d9eaa-7ea4-4476-97d6-f36327b5a626`
STARTED_AT	`TIMESTAMP`	✅	`1999-01-08 04:05:06`
UPDATED_AT	`TIMESTAMP`	✅	`1999-01-08 04:05:06`
STEP	`VARCHAR(64)`	❌	`METRICS_UPDATE`
STATUS	`VARCHAR(64)`	❌	`PENDING`
FAILURE_REASON	`TEXT`	✅	`Failed to acquire database connection`

Potential Future Improvements:

Do we need/want to capture the order in which steps are supposed to be executed?
Do we need/want to capture metadata of the overall workflow (who triggered it, when was it triggered, correlation id, ...)?

Original Version

Name	Type	Nullable	Example
TOKEN	`VARCHAR(36)`	❌	`484d9eaa-7ea4-4476-97d6-f36327b5a626`
STARTED_AT	`TIMESTAMP`	❌	`1999-01-08 04:05:06`
UPDATED_AT	`TIMESTAMP`	❌	`1999-01-08 04:05:06`
BOM_CONSUMPTION	`VARCHAR(64)`	❌	`PENDING`
BOM_PROCESSING	`VARCHAR(64)`	❌	`PENDING`
VULN_ANALYSIS	`VARCHAR(64)`	❌	`PENDING`
REPO_META_ANALYSIS	`VARCHAR(64)`	❌	`PENDING`
POLICY_EVALUATION	`VARCHAR(64)`	❌	`PENDING`
METRICS_UPDATE	`VARCHAR(64)`	❌	`PENDING`
FAILURE_REASON	`TEXT`	✅	-

FAILURE_REASON is a field of unlimited length. It either holds no value (NULL), or a JSON object listing failure details per step, e.g.:

{
  "BOM_PROCESSING": "Failed to acquire database connection"
}

Where applicable, the "detailed" status of a step is tracked in a dedicated table. For example, for vulnerability analysis:

hyades/commons-persistence/src/main/resources/migrations/postgres/V0.0.1__API-Server-4.8.2.sql

Lines 158 to 171 in e70c47f

    
           CREATE TABLE "VULNERABILITYSCAN" 
        
           ( 
        
               "ID" BIGSERIAL, 
        
               "EXPECTED_RESULTS" int4 NOT NULL, 
        
               "RECEIVED_RESULTS" int4 NOT NULL, 
        
               "STARTED_AT" timestamptz NOT NULL, 
        
               "STATUS" varchar(255) NOT NULL, 
        
               "TARGET_IDENTIFIER" varchar(255) NOT NULL, 
        
               "TARGET_TYPE" varchar(255) NOT NULL, 
        
               "TOKEN" varchar(255) NOT NULL, 
        
               "UPDATED_AT" timestamptz NOT NULL, 
        
               "VERSION" int8 NOT NULL, 
        
               CONSTRAINT "VULNERABILITYSCAN_PK" PRIMARY KEY ("ID") 
        
           );

Retention

Rows in the table should be cleaned up on a recurring basis.

This could be as simple as scheduling a job that executes this SQL query:

DELETE FROM org.dependencytrack.model.WorkflowStep WHERE this.updatedAt < :threshold

A retention time of 1-3 days since the last update should be reasonable.

REST API endpoints

The existing endpoint to check whether a BOM is still being processed should continue to work as expected:

GET /api/v1/bom/token/<CORRELATION_ID>

{
  "processing": true
}

An additional endpoint may be added, which allows for retrieval of the individual step states:

GET /api/v1/workflow/token/<CORRELATION_ID>/status

[
  {
    "step": "BOM_CONSUMPTION",
    "status": "COMPLETED",
    "startedAt": "1999-01-08 04:05:06",
    "updatedAt": "1999-01-08 04:05:06"
  },
  {
    "step": "BOM_PROCESSING",
    "status": "FAILED",
    "startedAt": "1999-01-08 04:05:06",
    "updatedAt": "1999-01-08 04:05:06",
    "failureReason": "Failed to acquire database connection"
  },
  {
    "step": "VULN_ANALYSIS",
    "status": "CANCELLED",
  }
]

If all a client cares about are vulnerability analysis results, they could stop polling immediately after vulnerabilityAnalysis transitions into the COMPLETED state.

The text was updated successfully, but these errors were encountered:

So it can be revisited and reviewed without going through loads of issues. Signed-off-by: nscuro <[email protected]>

So it can be revisited and reviewed without going through loads of issues. Closes #664 Signed-off-by: nscuro <[email protected]>

Signed-off-by: mehab <[email protected]>

nscuro added enhancement New feature or request p1 Critical bugs that prevent DT from being used, or features that must be implemented ASAP proposal size/M Medium effort component/api-server labels Jul 11, 2023

nscuro added this to Hyades Jul 11, 2023

nscuro moved this to Todo in Hyades Jul 11, 2023

nscuro mentioned this issue Jul 11, 2023

Externalize BOM ingestion pipeline #633

Open

nscuro changed the title ~~WIP: Track processing / analysis status in database to make it accessible to multiple API server instances~~ Track processing / analysis status in database to make it accessible to multiple API server instances Jul 11, 2023

nscuro changed the title ~~Track processing / analysis status in database to make it accessible to multiple API server instances~~ Track workflow state of BOM processing / analysis in database to make it accessible to multiple API server instances Jul 12, 2023

This was referenced Jul 14, 2023

Implement data model and queries for workflow state of BOM processing / analysis #671

Closed

Achieve horizontal scalability of API server #375

Closed

VithikaS mentioned this issue Jul 18, 2023

Create model for workflow engine DependencyTrack/hyades-apiserver#241

Merged

5 tasks

This was referenced Jul 20, 2023

Implement REST endpoints to query workflow states #684

Closed

Implement workflow initiation upon BOM upload #685

Closed

Implement cleanup task for cleaning up and timing out workflow steps #686

Closed

nscuro added size/XL Higher effort meta and removed size/M Medium effort proposal labels Jul 20, 2023

sahibamittal mentioned this issue Jul 21, 2023

Implement workflow status for vulnerability analysis stage #693

Closed

This was referenced Jul 25, 2023

Implement workflow status for metrics update task #700

Closed

Implement workflow status for policy evaluation task #701

Closed

nscuro moved this from Todo to In Progress in Hyades Jul 26, 2023

sahibamittal mentioned this issue Jul 27, 2023

Isuue-693 : workflow for vuln analysis stage DependencyTrack/hyades-apiserver#252

Merged

5 tasks

nscuro pinned this issue Jul 28, 2023

nscuro added a commit that referenced this issue Aug 1, 2023

Decouple design docs from #664 into markdown file

f121e96

So it can be revisited and reviewed without going through loads of issues. Signed-off-by: nscuro <[email protected]>

nscuro added a commit that referenced this issue Aug 1, 2023

Decouple design docs from #664 into markdown file

225d453

So it can be revisited and reviewed without going through loads of issues. Closes #664 Signed-off-by: nscuro <[email protected]>

nscuro mentioned this issue Aug 1, 2023

Decouple design docs from #664 into markdown file #715

Merged

sahibamittal closed this as completed in #715 Aug 1, 2023

sahibamittal pushed a commit that referenced this issue Aug 1, 2023

Decouple design docs from #664 into markdown file (#715)

f85659c

github-project-automation bot moved this from In Progress to Done in Hyades Aug 1, 2023

nscuro unpinned this issue Aug 2, 2023

mehab pushed a commit that referenced this issue Sep 12, 2023

Decouple design docs from #664 into markdown file (#715)

be3264f

Signed-off-by: mehab <[email protected]>

nscuro mentioned this issue Jan 15, 2024

Proper feedback on BOM upload DependencyTrack/dependency-track#3379

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track workflow state of BOM processing / analysis in database to make it accessible to multiple API server instances #664

Track workflow state of BOM processing / analysis in database to make it accessible to multiple API server instances #664

nscuro commented Jul 11, 2023 •

edited

Loading

Track workflow state of BOM processing / analysis in database to make it accessible to multiple API server instances #664

Track workflow state of BOM processing / analysis in database to make it accessible to multiple API server instances #664

Comments

nscuro commented Jul 11, 2023 • edited Loading

Implementation Status

Design

What to track

States

Workflow

Proposed Table Schema

Retention

REST API endpoints

nscuro commented Jul 11, 2023 •

edited

Loading