Postgres backend support #33

cgillum · 2023-10-06T20:44:57Z

This PR adds support for postgres as as a built-in alternative to sqlite for storing orchestration state. The idea is to support an open-source database that can be used more easily in scaled-out deployments.

I've also updated the project to target go 1.20 by default since 1.18 is no longer supported.

discovered after testing w/dapr

ItalyPaleAle · 2023-10-12T22:26:19Z

backend/postgres/postgres.go

+
+	"google.golang.org/protobuf/encoding/protojson"
+
+	_ "github.com/jackc/pgx/v5/stdlib"


Although it's a bit more of a change, I recommend using pgx's native APIs and not the stdlib wrapper.

The authors of pgx recommend using the stdlib wrapper only when you're trying to use the same code for multiple databases.

The native driver has much better performance.

The native driver offers some additional things (see my other comments below)

ItalyPaleAle · 2023-10-12T22:27:15Z

backend/postgres/schema.sql

+CREATE SCHEMA IF NOT EXISTS durabletask;
+
+CREATE TABLE IF NOT EXISTS durabletask.Instances (
+    InstanceID VARCHAR PRIMARY KEY NOT NULL,


Since these are UUIDs, consider using the native UUID type for Postgres. It's more efficient as it stores the UUID as a binary sequence (internally). This is especially helpful since the UUID is used as primary key, so it's in many indexes

We don't restrict instance IDs to be UUIDs. This is just our convention for defaults. Many users, however, will desire to provide their own instance IDs to match some business entity or to implement the singleton pattern.

Makes sense

ItalyPaleAle · 2023-10-12T22:27:27Z

backend/postgres/schema.sql

+    Name VARCHAR NOT NULL, -- the type name of the orchestration or entity
+    Version VARCHAR NULL, -- the version of the orchestration (optional)
+    RuntimeStatus VARCHAR NOT NULL,
+    CreatedTime TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,


For all timestaps, I recommend using "TIMESTAMP WITH TIME ZONE"

ItalyPaleAle · 2023-10-12T22:28:41Z

backend/postgres/schema.sql

+    Input VARCHAR NULL,
+    Output VARCHAR NULL,
+    CustomStatus VARCHAR NULL,
+    FailureDetails JSON NULL,


Are you planning on querying the fields inside the FailureDetails section? If not, best to store JSON in a VARCHAR (i.e. TEXT) column. The JSON column (and in theory, JSONB should be used instead) is useful to be able to query inside a JSON document, but if it's just for storing data, using TEXT/VARCHAR is more efficient.

ItalyPaleAle · 2023-10-12T22:29:24Z

backend/postgres/schema.sql

+CREATE INDEX IF NOT EXISTS IX_Instances_CreatedTime ON durabletask.Instances(CreatedTime);
+
+CREATE TABLE IF NOT EXISTS durabletask.History (
+    InstanceID VARCHAR NOT NULL,


Would it make sense to make all these InstanceID's foreign key references? This way, if the record is deleted from the Instances table, the deletion is cascaded

ItalyPaleAle · 2023-10-12T22:45:09Z

backend/postgres/postgres.go

+		be.workerName,     // LockedBy for Instances table
+		newLockExpiration, // Updated LockExpiration for Instances table
+		now,               // LockExpiration for Instances table
+		now,               // VisibleTime for NewEvents table


Duplicate. Can use $3 twice and pass one parameter only

ItalyPaleAle · 2023-10-12T22:45:29Z

backend/postgres/postgres.go

+			UPDATE durabletask.NewEvents SET DequeueCount = DequeueCount + 1, LockedBy = $1 WHERE ctid IN (
+				SELECT ctid FROM durabletask.NewEvents
+				WHERE InstanceID = $2 AND (VisibleTime IS NULL OR VisibleTime <= $3)
+				LIMIT 1000


Magic number?

ItalyPaleAle · 2023-10-12T22:46:28Z

backend/postgres/postgres.go

+		}
+
+		e := new(protos.HistoryEvent)
+		err := protojson.Unmarshal(eventPayload, e)


Why not storing protobuf inside the DB instead? It's more efficient than JSON anyways, as long as we don't care that users may want to modify the data themselves by hand?

Debuggability. It's incredibly painful to try and inspect the database when things go wrong if we store everything as protobuf (which is what we do currently in Dapr and in the SQLite database). I thought the performance vs. debuggability tradeoff would be worthwhile.

That can make sense. I would still recommend changing the column types to TEXT however, unless you need to perform queries that look into the JSON data.

Or... https://github.com/mpartel/postgres-protobuf :)

ItalyPaleAle · 2023-10-12T22:47:12Z

backend/postgres/postgres.go

+		if ctx.Err() != nil {
+			return nil, ctx.Err()
+		}


Suggested change

if ctx.Err() != nil {

return nil, ctx.Err()

}

if errors.Is(err, ctx.Err()) {

return nil, ctx.Err()

}

ItalyPaleAle · 2023-10-12T22:52:25Z

backend/postgres/postgres.go

+	_, err = tx.ExecContext(ctx, "INSERT INTO durabletask.NewEvents (InstanceID, EventPayload) VALUES ($1, $2)", string(wi.InstanceID), bytes)
+	if err != nil {
+		if ctx.Err() != nil {
+			return ctx.Err()
+		}
+		return fmt.Errorf("failed to insert into NewEvents table: %w", err)
+	}
+
+	dbResult, err := tx.ExecContext(ctx, "DELETE FROM durabletask.NewTasks WHERE SequenceNumber = $1 AND LockedBy = $2", wi.SequenceNumber, wi.LockedBy)


For funsies, you could perform both these queries in a single statement to avoid the use of a transaction, so you'd reduce the number of round-trips from 4 to 1 and avoid locking the tables...

WITH del AS ( DELETE FROM durabletask.NewTasks WHERE SequenceNumber = $1 AND LockedBy = $2 ) INSERT INTO durabletask.NewEvents (InstanceID, EventPayload) VALUES ($3, $4)

See for example: https://dba.stackexchange.com/questions/267243/want-to-delete-then-insert-with-single-statement-using-a-cte-in-postgres

ItalyPaleAle · 2023-10-12T23:04:59Z

backend/postgres/schema.sql

+    ExecutionID VARCHAR NOT NULL,
+    Name VARCHAR NOT NULL, -- the type name of the orchestration or entity
+    Version VARCHAR NULL, -- the version of the orchestration (optional)
+    RuntimeStatus VARCHAR NOT NULL,


Consider using an ENUM type here: https://www.postgresql.org/docs/current/datatype-enum.html

ItalyPaleAle · 2023-10-12T23:38:59Z

backend/postgres/postgres.go

+	}
+	defer tx.Rollback()
+
+	row := tx.QueryRowContext(ctx, "SELECT 1 FROM durabletask.Instances WHERE InstanceID = $1", string(id))


Here's another option to write this method without using transactions and a single atomic query:

For the History table, make InstanceID a foreign key reference to Instances.InstanceID, with an "ON DELETE CASCADE". This means that if the row is deleted from Instances, it's automatically (and atomically) deleted from History too.

Next, use this query that performs both a lookup and a deletion atomically, without the need for a transaction:

WITH del AS ( DELETE FROM durabletask.Instances WHERE InstanceID = $1 AND RuntimeStatus IN ('COMPLETED', 'FAILED', 'TERMINATED') RETURNING InstanceID ), delcount AS ( SELECT count(*) AS count FROM del ) SELECT delcount.count FROM durabletask.Instances, delcount WHERE durabletask.Instances.InstanceID = $1

This query has 3 possible states:

No rows returned: this means the instance doesn't exist (you can return api.ErrInstanceNotFound)

Returned value is "0" -> The instance exists but it's running (you can return api.ErrNotCompleted)

Returned value is "1" -> The instance exists and it's been deleted successfully (it was in completed/failed/terminated state)

balchua · 2023-12-30T00:56:38Z

Hi i am also interested in having postgres support, do you have the timeline on when this will land?
Thanks

ivan-penchev · 2024-08-15T05:50:42Z

Any updates, if the team would be working on this :)?

cgillum added 3 commits October 6, 2023 20:41

Postgres backend support

50b5a78

Switch from pq to pgx

7507685

Various fixes & improvements

5a5e683

discovered after testing w/dapr

ItalyPaleAle reviewed Oct 12, 2023

View reviewed changes

balchua mentioned this pull request Jan 14, 2024

Issue with distributed traces and multiple instances of the worker #59

Closed

acx1729 mentioned this pull request Nov 21, 2024

Postgres Support #86

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Postgres backend support #33

Postgres backend support #33

cgillum commented Oct 6, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

cgillum Oct 24, 2023

ItalyPaleAle Oct 24, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

cgillum Oct 24, 2023

ItalyPaleAle Oct 24, 2023

ItalyPaleAle Oct 24, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

ItalyPaleAle Oct 12, 2023

balchua commented Dec 30, 2023

ivan-penchev commented Aug 15, 2024


		"google.golang.org/protobuf/encoding/protojson"

		_ "github.com/jackc/pgx/v5/stdlib"

Postgres backend support #33

Are you sure you want to change the base?

Postgres backend support #33

Conversation

cgillum commented Oct 6, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

balchua commented Dec 30, 2023

ivan-penchev commented Aug 15, 2024