Snowflake is a PostgreSQL extension that provides an int8
and sequence
based unique ID solution to optionally replace the PostgreSQL built-in bigserial
data type.
Internally, snowflakes
are 64 bit integers represented externally as bigint
values. The 64 bits are divided into bit fields:
bit 63 - unused (sign of int8)
bits 22-62 - timestamp with millisecond precision
bits 10-21 - counter within one millisecond
bits 0-9 - unique PostgreSQL node number set in postgresql.conf
-
The timestamp is a 41-bit unsigned value representing millisecond precision and an epoch of 2023-01-01.
-
The counter is a 12-bit unsigned value that increments per ID allocation. This provides for 4096 unique IDs per millisecond, or 4 million IDs per second.
-
The
node number
is a 10-bit unique identifier of the PostgreSQL instance inside a global cluster. This value is set with the GUCsnowflake.node
in thepostgresql.conf
file.
With this design each Snowflake ID is unique within one sequence
across multiple PostgreSQL instances in a distributed cluster.
Installing Snowflake with pgEdge binaries
To use pgEdge binaries to install Snowflake, go to pgeEdge Github and install the pgEdge CLI; then use the CLI to create a PostgreSQL node with the Snowflake extension installed:
./pgedge install pg16 --start : install snowflake
Installing Snowflake from source code
If you're installing Snowflake from source code, we assume that you're familiar with how to build standard PostgreSQL extensions from source. Before building Snowflake:
- Build a copy of PostgreSQL in your preferred version.
- Add pg_config to your $PATH environment variable.
Then, you can build the snowflake extension
cd contrib
git clone https://github.com/pgEdge/snowflake.git
cd snowflake
USE_PGXS=1 make
USE_PGXS=1 make install
After installing the Snowflake extension with the pgEdge binary or from source code, connect to your Postgres database and create the extension with the command:
CREATE EXTENSION snowflake;
The Snowflake extension uses a custom GUC named snowflake.node
that determines the node
part of each Snowflake generated by the PostgreSQL instance. The permitted values are 1 thru 1023.
This GUC has an invalid default value (on purpose); if not set, the Snowflake extension will throw an exception on a call to snowflake.nextval()
. This is intended to prevent you from accidentally missing this GUC in your postgresql.conf
file.
If you intend to use Snowflake in a multi-node, distributed, or replicated setup, it is important to set the GUC to a unique value for each PostgreSQL instance. There is no protection in place to prevent assigning multiple PostgreSQL instances in a multi-master cluster the same node number.
If you are using the spock extension, you can use the pgedge spock sequence-convert
command at the psql command line to convert your existing sequence definitions into Snowflake sequences. Note that this converts the sequence definition; existing values in a sequence column will not change. The command syntax is:
pgedge spock.sequence_convert sequence_name
Where
sequence_name
is the name of a sequence; you can use wildcards when specifying the sequence name to convert sequences in more than one table.
For example, the following command:
[pgedge]$ ./pgedge spock.sequence_convert my_sequence
Converts the sequences in tables that reside in the public
schema of the acctg
database to use Snowflake sequences. If you invoke the command on one node of a replicating cluster, the table definition updates are propagated to the other nodes in the cluster.
You can query the following Snowflake functions at the psql command line to return useful information about tables that use Snowflake sequences.
snowflake.nextval([sequence regclass])
Generates the next Snowflake in the specified sequence. If no sequence is specified, the internal, database-wide sequence snowflake.id_seq
is used. For example, the following query returns the next number in the orders_id_seq
sequence:
acctg=# SELECT * FROM snowflake.nextval('orders_id_seq'::regclass);
nextval
--------------------
136169504773242881
(1 row)
snowflake.currval([sequence regclass])
Returns the current value of the specified sequence. This value is undefined until the function snowflake.nextval()
has been called for the sequence in the current session. For example, the following query returns the current value of the orders_id_seq
sequence:
acctg=# SELECT * FROM snowflake.currval('orders_id_seq'::regclass);
currval
--------------------
136169504773242881
(1 row)
snowflake.get_epoch(snowflake int8)
Returns the timestamp part of a Snowflake as EPOCH
(seconds since 2023-01-01) as a NUMERIC
value with precision of three digits. For example, the timestamp of the following sequence value is 1704996539.845
seconds past Jan. 1, 2023:
acctg=# SELECT * FROM snowflake.get_epoch(136169504773242881);
get_epoch
----------------
1704996539.845
(1 row)
You can use the to_timestamp(snowflake.get_epoch(<value>))
function to convert the epoch into a timestamp. For example:
acctg=# SELECT to_timestamp(snowflake.get_epoch(136169504773242881));
to_timestamp
----------------------------
2024-01-11 13:08:59.845-05
(1 row)
snowflake.get_count(snowflake int8)
Returns the count part of the given Snowflake as int4
value; count resets to 0
for each new millisecond. For example:
acctg=# SELECT snowflake.get_count(136169504773242881);
get_count
-----------
3
(1 row)
snowflake.get_node(snowflake int8)
Returns the value of snowflake.node
in postgresql.conf
for the host of the specified Snowflake sequence. For example, the Snowflake sequence number 136169504773242881
resides on Node 1
:
acctg=# SELECT * FROM snowflake.get_node(136169504773242881);
get_node
----------
1
(1 row)
snowflake.format(snowflake int8)
Returns a jsonb
object of the given Snowflake in the form: {"node": 1, "ts": "2023-10-16 17:57:26.361+00", "count": 0}
. For example:
acctg=# SELECT * FROM snowflake.format(136169504773242881);
format
-----------------------------------------------------------
{"id": 1, "ts": "2024-01-11 13:08:59.845-05", "count": 0}
(1 row)
If you query the table meta-data before performing a sequence conversion with the \d+ orders
command, the PostgreSQL sequence is defined in the Default
column:
acctg=# \d+ orders;
Table "public.orders"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
----------+-------------------+-----------+----------+------------------------------------+----------+-------------+--------------+-------------
customer | character varying | | | | extended | | |
invoice | character varying | | | | extended | | |
id | bigint | | not null | nextval('orders_id_seq'::regclass) | plain | | |
Indexes:
"orders_pkey" PRIMARY KEY, btree (id)
Access method: heap
After performing a conversion, the \d+ table_name
command shows a Default
value of snowflake.nextval('orders_id_seq'::regclass)
:
acctg=# \d+ orders
Table "public.orders"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Descripti
on
----------+-------------------+-----------+----------+----------------------------------------------+----------+-------------+--------------+----------
---
customer | character varying | | | | extended | | |
invoice | character varying | | | | extended | | |
id | bigint | | not null | snowflake.nextval('orders_id_seq'::regclass) | plain | | |
Indexes:
"orders_pkey" PRIMARY KEY, btree (id)
Access method: heap