This extension for PostgreSQL provides a new data type uuid_v1
for
RFC 4122 version 1 UUID values that is more efficient than the standard
UUID data type in many scenarios.
Many applications use UUID's to provide unique identifiers for various reasons.
While many simply use random UUID's, applications that are design to operate in distributed environments and/or where concurrent execution is a major design aspect, also allow to or default to using version 1 UUID's, which provide clear requirements for generating identifiers concurrently inside the same machine or across many machines without the need to synchronize remotely and yet providing guarantees for global uniqueness (or at least within the space of the application).
This makes them very attractive to use if an application wants to really scale out but is generating a lot of data records (throughput).
After the extension has been installed (see the Build/Install section), you can start using the data type as follows:
CREATE EXTENSION uuid_v1;
CREATE TABLE my_log (
id uuid_v1 PRIMARY KEY,
...
);
To provide additional features when using the uuid_v1
data type, the following
functions are provided.
The function uuid_v1_get_timestamp(uuid_v1)
extracts the timestamp into an
instance of the PostgreSQL type timestamp with time zone
, e.g.:
SET timezone TO 'Asia/Tokyo';
SELECT uuid_v1_get_timestamp('b647e96b-862d-11e9-ae2b-db6f0f573554');
uuid_v1_get_timestamp
-------------------------------
2019-06-04 03:30:50.132721+09
(1 row)
The function uuid_v1_get_epoch(uuid_v1)
extracts the timestamp into a
64-bit float representing the epoch (also known as Unix time) with
microsecond precision, e.g.:
SELECT uuid_v1_get_epoch('b647e96b-862d-11e9-ae2b-db6f0f573554');
uuid_v1_get_epoch
-------------------
1559586650.132721
(1 row)
As opposed to function uuid_v1_get_timestamp(uuid_v1)
, this function is
safe to use for indexing, since it is immutable (see also the
official documentation about volatility and its impact).
The function uuid_v1_get_clockseq(uuid_v1)
provides direct access to the
clock sequence value and returns a smallint
value, e.g.:
SELECT uuid_v1_get_clockseq('4938f30e-8449-11e9-ae2b-e03f49467033');
uuid_v1_get_clockseq
----------------------
11819
(1 row)
Please also note that the bits of the variant are not part of the clock sequence value in compliance with the standard.
The function uuid_v1_get_node(uuid_v1)
returns the node value (where it was
generated initially) as bytea
,
e.g.:
SELECT uuid_v1_get_node('b647e96b-862d-11e9-ae2b-db6f0f573554');
uuid_v1_get_node
------------------
\xdb6f0f573554
(1 row)
Instances of the uuid_v1
data type can be compared to each other using the
standard operators <
, >
, <=
, >=
, =
, <>
and !=
, which internally
prefers comparison against the timestamp component, which also is the default
internal sort order (as opposed to byte order for the standard
uuid
type).
In addition, you can also compare a uuid_v1
to a timestamp with time zone
:
uuid_v1 <~ timestamp with time zone
(UUID older than timestamp)uuid_v1 <=~ timestamp with time zone
(UUID older than or equal to timestamp)uuid_v1 >~ timestamp with time zone
(UUID newer than timestamp)uuid_v1 >=~ timestamp with time zone
(UUID newer than or equal to timestamp)uuid_v1 =~ timestamp with time zone
(UUID at timestamp)uuid_v1 <>~ timestamp with time zone
(UUID not at timestamp)
ATTENTION: Please note that comparison is done using the full timestamp
precision, so a value such as 2019-06-11 10:02:19
will be interpreted as
2019-06-11 10:02:19.000000
and will not match a UUID at timestamp
2019-06-11 10:02:19.000001
. If you need more control over the matching, you
can combine functions uuid_v1_get_timestamp(uuid_v1)
and
date/time functions:
SELECT * FROM events WHERE date_trunc('minute', uuid_v1_get_timestamp(id)) = '2019-06-11 10:02:00';
...or:
SELECT * FROM events WHERE uuid_v1_get_timestamp(id)::date = '2019-06-11';
If you have such a use-case and a lot of data, consider creating an appropriate index using function
uuid_v1_get_epoch
instead to avoid re-calculating the date/time value from the UUID for each and every row on each and every query execution.
Straight forward but please ensure that you have the necessary PostgreSQL development headers in-place as well as PGXS (which should be made available with installing the development package).
make
To build the extension for a non-default PostgreSQL version, supply the
PG_CONFIG
variable pointing to the specific pg_config
location, e.g.:
make PG_CONFIG=/usr/lib/postgresql/11/bin/pg_config
Some basic tests are included by making use of pg_regress
which can be run with:
make installcheck
Please make sure you have an apropriate super-user account at the target cluster.
If your target PostgreSQL installation doesn't listen on standard port 5432,
you can adapt it by specifying REGRESS_PORT
variable, e.g.:
make installcheck REGRESS_PORT=5433
This also requires PGXS as it figures out where to find the installation:
sudo make install
If you want to install it into a non-default PostgreSQL installation, just
specify the path to the respective pg_config
binary, e.g.:
sudo make PG_CONFIG=/usr/lib/postgresql/11/bin/pg_config install
In this case, please also make sure you have compiled it against the desired PostgreSQL version.
In order to integrate this extension into a Docker image, you will need to compile and install it inside your own image.
The file example.Dockerfile
provides an example of how
to achieve this for the Debian based PostgreSQL image.
Build the Image:
docker build -t ancoron/pg-uuid-v1:local -f example.Dockerfile .
Start a Server:
docker run --rm --name pg-uuid -p 15432:5432 -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_DB=test ancoron/pg-uuid-v1:local
Connect via psql:
psql --username=postgres --password --dbname=test --host=127.0.0.1 --port 15432
Create Extensions:
CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION uuid_v1;
Now you are ready to use the uuid_v1
data type and its functions.
As this data type is tailored to version 1 UUID's we can optimize the internal behavior much better than as it is the case for the standard UUID data type in PostgreSQL.
First, the parsing of a UUID string input value has been optimized to execute ~33% faster than the standard UUID parser does.
In addition - and this is most important when having an index at a UUID type column - the internal structure is completely different but still needs only 16 bytes of value storage. The internal structure is now optimized for the time-series nature of version 1 UUID's (which usually is created using the current time). To reflect this fact, the default sort order is no longer tied to the byte values of the UUID but to the 60-bit parsed UUID timestamp. This has lead to a speed-up of factor 6-7 for internal B-Tree comparison logic.
Another nice side effect is that the UUID values can now benefit from the
PostgreSQL B-Tree "fastpath", which optimizes for ever-inceasing index values
by basically caching the right-most index page. This means that most of the
time, an INSERT into an indexed table column will not need to search for the
relevant index page and we're getting a nice fastpath hit-rate. The standard
UUID spends ~50% of the time during INSERT's in this index page searching,
while this should be less than 1% for the data type uuid_v1
.
When larger numbers of UUID's need to be converted into a string representation the performance of the conversion method plays a significant role.
Testing (using COPY) has revealed that the implementation for the uuid_v1
data
type is ~5 times faster compared to the standard UUID output. However, the
resulting overall performance benefit (e.g. using COPY
with format text
) is
limited to a speedup factor of ~ 1.5 due to unrelated processing in PostgreSQL.
The uuid_v1
data type also comes with additional comparison operators. One set
of operators can be used to efficiently compare uuid_v1
values to
timestamp with time zone
values, which means that time-series queries
are actually possible directly against these UUID's, e.g.:
SELECT *
FROM my_log
WHERE id >=~ '2019-03-01' AND id <~ '2019-04-01'
ORDER BY id;
...giving you an execution plan such as the following:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_log_pkey on my_log
Index Cond: ((id >=~ '2019-03-01 00:00:00+09'::timestamp with time zone) AND (id <~ '2019-04-01 00:00:00+09'::timestamp with time zone))
(2 rows)
...and with aggregation:
WITH pre_sort AS (
SELECT uuid_v1_get_timestamp(id) AS ts
FROM my_log
WHERE id >=~ '2019-03-01' AND id <~ '2019-04-01'
ORDER BY id DESC
)
SELECT date(ts), count(*)
FROM pre_sort
GROUP BY 1
ORDER BY 1 DESC
...execution plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Sort Key: (date(pre_sort.ts)) DESC
CTE pre_sort
-> Index Only Scan Backward using t_uuid_v1_pkey on t_uuid_v1
Index Cond: ((id >=~ '2019-03-01 00:00:00+09'::timestamp with time zone) AND (id <~ '2019-04-01 00:00:00+09'::timestamp with time zone))
-> HashAggregate
Group Key: date(pre_sort.ts)
-> CTE Scan on pre_sort
(8 rows)