🥪🏭 A simple CLI for generating synthetic Jaffle Shop data.

dbt-labs/jaffle-shop-generator

Latest commit: 8e841be · Apr 7, 2024 (59 commits)

🥪 Jaffle Shop Generator 🏭

The Jaffle Shop Generator, or jafgen, is a Python package with a simple command-line tool for generating a synthetic data set suitable for analytics engineering practice or demonstrations.

Installation

If you have pipx installed, it is an ideal way to run jafgen. You can generate data without installing anything in the local workspace using the following:

pipx run jafgen [options]

You can also install jafgen into your project or workspace, ideally in a virtual environment.

pip install jafgen
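The two installation routes above can be told apart programmatically. The following is a minimal sketch, not part of jafgen itself: the helper name `jafgen_launcher` and its `which` parameter are hypothetical, and it only checks which executables are on PATH before choosing between `pipx run jafgen` and a pip-installed `jafgen`.

```python
import shutil
from typing import Callable, List, Optional


def jafgen_launcher(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a command prefix for invoking jafgen (hypothetical helper).

    `which` is injectable so the lookup can be replaced in tests; by
    default it is shutil.which, which searches PATH for an executable.
    """
    if which("pipx"):
        # pipx runs jafgen without installing it into the local workspace.
        return ["pipx", "run", "jafgen"]
    if which("jafgen"):
        # jafgen was installed with pip, e.g. into an active virtual environment.
        return ["jafgen"]
    raise RuntimeError("Neither pipx nor jafgen was found on PATH.")
```

Calling `jafgen_launcher()` returns a list that can be extended with jafgen's argument and options before being run.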

Use

jafgen takes one argument:

  • [int] Years to generate data for. The default is 1 year.

The following options are available:

  • --pre sets a prefix for the generated files in the format [prefix]_[file_name].csv. It defaults to raw.

Generate a simulation spanning 3 years, from 2016 to 2019, with a prefix of cool:

jafgen 3 --pre cool
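For scripted use, the same invocation can be assembled in Python. This is a rough sketch under stated assumptions: the helper names `build_jafgen_args` and `output_filename` are hypothetical, the defaults mirror the ones described above (1 year, prefix raw), and the file name `customers` in the usage note is only an illustrative placeholder.

```python
from typing import List


def build_jafgen_args(years: int = 1, prefix: str = "raw") -> List[str]:
    """Build the argument list for a jafgen call (hypothetical helper).

    Uses only what the README documents: one positional argument for the
    number of years, and the --pre option for the file prefix.
    """
    return ["jafgen", str(years), "--pre", prefix]


def output_filename(prefix: str, file_name: str) -> str:
    """Apply the documented naming format [prefix]_[file_name].csv."""
    return f"{prefix}_{file_name}.csv"
```

`build_jafgen_args(3, "cool")` yields the arguments for the example above and can be passed to `subprocess.run(..., check=True)` when jafgen is installed. `output_filename("cool", "customers")` shows how the prefix is applied to a generated file name.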

Purpose

Finding a good data set for practicing, learning, or teaching analytics engineering can be difficult. Most open datasets are great for machine learning: they offer single wide tables that you can manipulate and analyze. Full, real relational databases, on the other hand, are generally kept private by the companies that own them. Not only that, but they're a bit too real: getting them into a state that a beginner or intermediate practitioner can understand requires a substantial amount of analytics engineering transformation.

Approach

Coming soon.

Contribution

Coming soon.