We are the boutique analytics consultancy that turns disorganised data into real business value. Get in touch to learn more about how Tasman can help solve your organisations data challenges.
Identity resolution (sometimes referred to as 'identity stitching' or 'identity backstitching') is the process by which multiple user identities are unified into a single profile. It is a critical step in the tracking process to ensure accurate measurement of user behaviour across multiple apps or sessions, and the creation of a single customer view.
We find all the different identifiers that are connected together, these identifiers are called vertices, and links between them, called edges. In an identity graph, each node/vertex represents a user identifier, and a link/edge exists where two or more identifiers have at some point in time been captured together.
Each connected component (set of connected edges and vertices) of the identity graph represents a single user profile, and enables the building of a comprehensive view of each user across multiple sessions and apps.
Identity graphs are constantly evolving, and a user profile may have lots of different identifiers over its lifetime.
As an overview, our approach takes a set of events, and creates user profiles using this connected component algorithm. For this approach you need an understanding of SQL and dbt to implement/orchestrate the work.
You need to select which identifiers you wish to use in this graph, the ones chosen in this example are a backend user_id
, the anonymous_id
from the events, and an email
. You only need 2 to make this work.
The output table will provide you for every id found the profile_id
for that individual user profile.
ORIGINAL_ID | PROFILE_ID |
---|---|
anon1 | anon1 |
user1 | anon1 |
[email protected] | anon1 |
[email protected] | anon1 |
anon2 | anon2 |
user2 | anon2 |
[email protected] | anon2 |
anon4 | anon4 |
user4 | anon4 |
[email protected] | anon4 |
anon5 | anon5 |
anon6 | anon5 |
user5 | anon5 |
[email protected] | anon5 |
[email protected] | anon5 |
You can then use this table to join back on your events table and find the way that they join together to create a single user.
Key Features:
- 🔥 Uses a conected component algorithm to resolve identities of users
- ⚙️ Configure the number of times you would like the resolution to run.
- You need a table of identify events in the format: event_id, user_id, anonymous_id, email, event_time
This package currently supports Snowflake.
This package has been written and is maintained by Tasman Analytics
If you find a bug, or for any questions please open an issue on GitHub.