Skip to content

pgEdge/snowflake

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Snowflake Sequences for PostgreSQL

Snowflake is a PostgreSQL extension that provides an int8 and sequence based unique ID solution to optionally replace the PostgreSQL built-in bigserial data type.

Internally, snowflakes are 64 bit integers represented externally as bigint values. The 64 bits are divided into bit fields:

bit  63    - unused (sign of int8)
bits 22-62 - timestamp with millisecond precision
bits 10-21 - counter within one millisecond
bits 0-9   - unique PostgreSQL node number set in postgresql.conf
  • The timestamp is a 41-bit unsigned value representing millisecond precision and an epoch of 2023-01-01.

  • The counter is a 12-bit unsigned value that increments per ID allocation. This provides for 4096 unique IDs per millisecond, or 4 million IDs per second.

  • The node number is a 10-bit unique identifier of the PostgreSQL instance inside a global cluster. This value is set with the GUC snowflake.node in the postgresql.conf file.

With this design each Snowflake ID is unique within one sequence across multiple PostgreSQL instances in a distributed cluster.

Installation

Installing Snowflake with pgEdge binaries

To use pgEdge binaries to install Snowflake, go to pgeEdge Github and install the pgEdge CLI; then use the CLI to create a PostgreSQL node with the Snowflake extension installed:

./pgedge install pg16 --start : install snowflake

Installing Snowflake from source code

If you're installing Snowflake from source code, we assume that you're familiar with how to build standard PostgreSQL extensions from source. Before building Snowflake:

  1. Build a copy of PostgreSQL in your preferred version.
  2. Add pg_config to your $PATH environment variable.

Then, you can build the snowflake extension

cd contrib
git clone https://github.com/pgEdge/snowflake.git
cd snowflake
USE_PGXS=1 make
USE_PGXS=1 make install

After installing the Snowflake extension with the pgEdge binary or from source code, connect to your Postgres database and create the extension with the command:

CREATE EXTENSION snowflake;

Configuring Snowflake

The Snowflake extension uses a custom GUC named snowflake.node that determines the node part of each Snowflake generated by the PostgreSQL instance. The permitted values are 1 thru 1023.

This GUC has an invalid default value (on purpose); if not set, the Snowflake extension will throw an exception on a call to snowflake.nextval(). This is intended to prevent you from accidentally missing this GUC in your postgresql.conf file.

If you intend to use Snowflake in a multi-node, distributed, or replicated setup, it is important to set the GUC to a unique value for each PostgreSQL instance. There is no protection in place to prevent assigning multiple PostgreSQL instances in a multi-master cluster the same node number.

Converting a PostgreSQL Sequence to a Snowflake Sequence with the Spock extension

If you are using the spock extension, you can use the pgedge spock sequence-convert command at the psql command line to convert your existing sequence definitions into Snowflake sequences. Note that this converts the sequence definition; existing values in a sequence column will not change. The command syntax is:

pgedge spock.sequence_convert sequence_name

Where

sequence_name is the name of a sequence; you can use wildcards when specifying the sequence name to convert sequences in more than one table.

For example, the following command:

[pgedge]$ ./pgedge spock.sequence_convert my_sequence

Converts the sequences in tables that reside in the public schema of the acctg database to use Snowflake sequences. If you invoke the command on one node of a replicating cluster, the table definition updates are propagated to the other nodes in the cluster.

Snowflake Functions

You can query the following Snowflake functions at the psql command line to return useful information about tables that use Snowflake sequences.

snowflake.nextval([sequence regclass]) Generates the next Snowflake in the specified sequence. If no sequence is specified, the internal, database-wide sequence snowflake.id_seq is used. For example, the following query returns the next number in the orders_id_seq sequence:

acctg=# SELECT * FROM snowflake.nextval('orders_id_seq'::regclass);
      nextval       
--------------------
 136169504773242881
(1 row)

snowflake.currval([sequence regclass]) Returns the current value of the specified sequence. This value is undefined until the function snowflake.nextval() has been called for the sequence in the current session. For example, the following query returns the current value of the orders_id_seq sequence:

acctg=# SELECT * FROM snowflake.currval('orders_id_seq'::regclass);
      currval       
--------------------
 136169504773242881
(1 row)

snowflake.get_epoch(snowflake int8) Returns the timestamp part of a Snowflake as EPOCH (seconds since 2023-01-01) as a NUMERIC value with precision of three digits. For example, the timestamp of the following sequence value is 1704996539.845 seconds past Jan. 1, 2023:

acctg=# SELECT * FROM snowflake.get_epoch(136169504773242881);
   get_epoch    
----------------
 1704996539.845
(1 row)

You can use the to_timestamp(snowflake.get_epoch(<value>)) function to convert the epoch into a timestamp. For example:

acctg=# SELECT to_timestamp(snowflake.get_epoch(136169504773242881));
        to_timestamp        
----------------------------
 2024-01-11 13:08:59.845-05
(1 row)

snowflake.get_count(snowflake int8) Returns the count part of the given Snowflake as int4 value; count resets to 0 for each new millisecond. For example:

acctg=# SELECT snowflake.get_count(136169504773242881);
 get_count 
-----------
         3
(1 row)

snowflake.get_node(snowflake int8) Returns the value of snowflake.node in postgresql.conf for the host of the specified Snowflake sequence. For example, the Snowflake sequence number 136169504773242881 resides on Node 1:

acctg=# SELECT * FROM snowflake.get_node(136169504773242881);
 get_node 
----------
        1
(1 row)

snowflake.format(snowflake int8) Returns a jsonb object of the given Snowflake in the form: {"node": 1, "ts": "2023-10-16 17:57:26.361+00", "count": 0}. For example:

acctg=# SELECT * FROM snowflake.format(136169504773242881);
                          format                           
-----------------------------------------------------------
 {"id": 1, "ts": "2024-01-11 13:08:59.845-05", "count": 0}
(1 row)

Reviewing Sequence Definitions in your Table Meta-data

If you query the table meta-data before performing a sequence conversion with the \d+ orders command, the PostgreSQL sequence is defined in the Default column:

acctg=# \d+ orders;
                                                             Table "public.orders"
  Column  |       Type        | Collation | Nullable |              Default               | Storage  | Compression | Stats target | Description 
----------+-------------------+-----------+----------+------------------------------------+----------+-------------+--------------+-------------
 customer | character varying |           |          |                                    | extended |             |              | 
 invoice  | character varying |           |          |                                    | extended |             |              | 
 id       | bigint            |           | not null | nextval('orders_id_seq'::regclass) | plain    |             |              | 
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
Access method: heap

After performing a conversion, the \d+ table_name command shows a Default value of snowflake.nextval('orders_id_seq'::regclass):

acctg=# \d+ orders
                                                                  Table "public.orders"
  Column  |       Type        | Collation | Nullable |                   Default                    | Storage  | Compression | Stats target | Descripti
on 
----------+-------------------+-----------+----------+----------------------------------------------+----------+-------------+--------------+----------
---
 customer | character varying |           |          |                                              | extended |             |              | 
 invoice  | character varying |           |          |                                              | extended |             |              | 
 id       | bigint            |           | not null | snowflake.nextval('orders_id_seq'::regclass) | plain    |             |              | 
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
Access method: heap