singer-aws

CLI tool for running Singer taps and targets, managing their module dependencies, and persisting state files and config files in AWS.

ToDo:

package the whole project as Docker image
create infrastructure: S3 buckets, SSM parameter store items (manually), SingerTap IAM role tied to S3 bucket and allowed to access Redshift
store logs in logs/ folder

Install singer-aws module

Currently, this module can only be installed locally, by running pip install . at the root of the project.

Add new tap (or target) integration

Add an item to the list in singer_project_config.yml. You may give that item a custom name (different from the tap module name); just remember to add a module: ... parameter with the valid name of the tap. This is useful if, for example, you want to maintain two tap integrations based on the same tap library (one tied to release 0.9.1 and the other tied to 1.0.0): you would assign each a different alias.
Add a folder in ./taps/... with a name corresponding to the tap alias.
List all modules needed to install the tap in a ./taps/tap-alias/requirements.txt file.
Create a virtual environment for the tap by running: singer-aws-install
If you want to store the tap config in the project folder, add it as ./taps/tap-alias/config.json. Otherwise, make sure the tap configuration is available as an environment variable or in AWS Parameter Store (see the Tap and target config files section).

The same steps apply when adding a target integration; just replace tap with target when following the instructions.
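The folder and requirements steps above could be scripted. The following is a minimal sketch only, not part of singer-aws: the scaffold_integration helper and the tap-example alias are hypothetical, and the sketch assumes the layout described above (taps/ or targets/, one folder per alias, one requirements.txt inside it).

```python
from pathlib import Path


def scaffold_integration(project_root, kind, alias, requirements):
    """Create <kind>s/<alias>/requirements.txt under project_root.

    Hypothetical helper: kind is "tap" or "target", alias is the name
    used for the item in singer_project_config.yml.
    """
    if kind not in ("tap", "target"):
        raise ValueError("kind must be 'tap' or 'target'")
    folder = Path(project_root) / f"{kind}s" / alias
    folder.mkdir(parents=True, exist_ok=True)
    req_file = folder / "requirements.txt"
    # One pinned module per line, as pip expects.
    req_file.write_text("\n".join(requirements) + "\n")
    return req_file


if __name__ == "__main__":
    import tempfile

    # Demonstrated in a throwaway directory so nothing real is touched.
    root = tempfile.mkdtemp()
    print(scaffold_integration(root, "tap", "tap-example", ["tap-example==1.0.0"]))
```

After scaffolding, the alias still has to be added to singer_project_config.yml by hand, and singer-aws-install run to build the virtual environment.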

Tap and target config files

When preparing a Singer config file (for either a tap or a target), the following sources are tried in order:

Use the config.json file found in the taps/tap-alias/ (or targets/target-name/) directory.
Create a config.json file based on the TAP_TAPNAME_CONFIG or TARGET_TARGETNAME_CONFIG environment variable.
Create a config.json file based on an SSM parameter with a path like /ssm_prefix/TAP_TAPNAME_CONFIG or /ssm_prefix/TARGET_TARGETNAME_CONFIG. The value of ssm_prefix is configurable in the singer_project_config.yml file, e.g. /acme_singer_project/
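The lookup order above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual singer-aws code: the function names are hypothetical, the variable name is assumed to be the upper-cased tap or target name with hyphens turned into underscores, the folder is assumed to be named kind-name (e.g. tap-adwords), and the SSM lookup is passed in as a callable so the sketch runs without AWS access.

```python
import json
import os
from pathlib import Path


def env_var_name(kind, name):
    # Assumed naming rule: TAP_<NAME>_CONFIG or TARGET_<NAME>_CONFIG.
    return f"{kind.upper()}_{name.upper().replace('-', '_')}_CONFIG"


def resolve_config(kind, name, project_root=".", ssm_prefix="/",
                   fetch_ssm=None, environ=None):
    """Return the config dict from the first source that provides one."""
    environ = os.environ if environ is None else environ

    # 1. config.json stored in the project folder.
    local = Path(project_root) / f"{kind}s" / f"{kind}-{name}" / "config.json"
    if local.is_file():
        return json.loads(local.read_text())

    # 2. JSON document held in an environment variable.
    var = env_var_name(kind, name)
    if environ.get(var):
        return json.loads(environ[var])

    # 3. JSON document held in an SSM parameter under the project prefix.
    if fetch_ssm is not None:
        return json.loads(fetch_ssm(ssm_prefix.rstrip("/") + "/" + var))

    raise FileNotFoundError(f"no config found for {kind} {name}")


if __name__ == "__main__":
    fake_env = {"TAP_ADWORDS_CONFIG": '{"start_date": "2020-01-01"}'}
    print(resolve_config("tap", "adwords", project_root="/nonexistent",
                         environ=fake_env))
```

With boto3 installed, fetch_ssm could be something like lambda path: boto3.client("ssm").get_parameter(Name=path, WithDecryption=True)["Parameter"]["Value"].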

Schema discovery

In the command line, export an AWS_PROFILE that has permissions to read from SSM and read/write to S3:

export AWS_PROFILE=your_profile

run

singer-aws-discover --tap adwords

Inspecting singer catalogs for subsequent discoveries

Subsequent schema discoveries (e.g. updating the schema or selecting new fields for replication) can be tedious, because the schema-discovery utility (see credits below) only shows what is available, not what is already selected, so you effectively start the catalog selection from scratch with each discovery. To make this workflow smoother, there is a singer-aws-inspect command that shows the current state of a given tap's catalog. For instance:

singer-aws-inspect --tap adwords

would print:

...
inspecting stream: ads...
    ❌ stream is NOT SELECTED
    • found 6 available properties
    • found 0 selected properties
    • selected properties: []
inspecting stream: accounts...
    ❌ stream is NOT SELECTED
    • found 6 available properties
    • found 0 selected properties
    • selected properties: []
inspecting stream: KEYWORDS_PERFORMANCE_REPORT...
    ✅ stream is SELECTED
    • found 139 available properties
    • found 12 selected properties
    • selected properties: ['account', 'campaignID', 'campaign', 'clicks', 'conversions', 'cost', 'keyword', 'day', 'device', 'customerID', 'keywordID', 'impressions']
...
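To illustrate where numbers like these can come from, here is a minimal sketch that summarizes one stream of a Singer catalog. It is not the singer-aws-inspect implementation: summarize_stream is a hypothetical helper, and it assumes the standard Singer catalog layout, where selection is recorded in metadata entries (an empty breadcrumb for the stream itself, ["properties", name] for a field). Automatic inclusion and selected-by-default flags are ignored for simplicity.

```python
def summarize_stream(stream):
    """Summarize the selection state of one Singer catalog stream entry."""
    available = list(stream.get("schema", {}).get("properties", {}))
    stream_selected = False
    selected = []
    for entry in stream.get("metadata", []):
        breadcrumb = entry.get("breadcrumb", [])
        is_selected = entry.get("metadata", {}).get("selected") is True
        if not breadcrumb:
            # An empty breadcrumb describes the stream as a whole.
            stream_selected = is_selected
        elif len(breadcrumb) == 2 and breadcrumb[0] == "properties" and is_selected:
            selected.append(breadcrumb[1])
    return {
        "stream": stream.get("stream"),
        "selected": stream_selected,
        "available": len(available),
        "selected_properties": selected,
    }


if __name__ == "__main__":
    # Tiny made-up catalog entry, for illustration only.
    example = {
        "stream": "ads",
        "schema": {"properties": {"id": {}, "clicks": {}}},
        "metadata": [
            {"breadcrumb": [], "metadata": {"selected": True}},
            {"breadcrumb": ["properties", "clicks"], "metadata": {"selected": True}},
        ],
    }
    summary = summarize_stream(example)
    print(f"inspecting stream: {summary['stream']}...")
    print(f"    found {summary['available']} available properties")
    print(f"    selected properties: {summary['selected_properties']}")
```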

Credits

singer-aws-discover command leverages an amazing utility: singer-discover. Thank you @chrisgoddard!

Running tap | target

In the command line, export an AWS_PROFILE that has permissions to read from SSM and read/write to S3:

export AWS_PROFILE=your_profile

run

singer-aws-sync --tap adwords --target redshift

which will run a command identical to:

venv/tap-adwords/bin/tap-adwords --config taps/tap-adwords/config.json | venv/target-redshift/bin/target-redshift --config targets/target-redshift/config.json > state.json

or, if a state file already exists for the given tap (state files are kept in a remote S3 bucket), the following command will be executed instead:

venv/tap-adwords/bin/tap-adwords --config taps/tap-adwords/config.json --properties taps/tap-adwords/catalog.json --state states_in/tap-adwords-state.json | venv/target-redshift/bin/target-redshift --config targets/target-redshift/config.json > states_out/tap-adwords-state.json

☝️ The state file (in the states_in/ directory) is fetched on the fly before executing the tap | target. Specify the S3 bucket name and prefix in the states_bucket parameter inside the singer_project_config.yml file.

You can pass --ignore-state to ignore previous state files when executing the tap. The tap then uses the start_date specified in its config and replicates all streams from that date:

singer-aws-sync --tap adwords --target redshift --ignore-state
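The choice between the two command forms described in this section can be sketched as below. This is a hedged illustration, not the actual singer-aws code: build_sync_command is a hypothetical helper, the paths follow the examples shown in this section, and it is an assumption that --ignore-state simply falls back to the first (stateless) form.

```python
import shlex


def build_sync_command(tap, target, state_in=None, state_out="state.json",
                       ignore_state=False):
    """Build the tap | target shell pipeline as a single string."""
    tap_cmd = [
        f"venv/tap-{tap}/bin/tap-{tap}",
        "--config", f"taps/tap-{tap}/config.json",
    ]
    # Catalog and state are only passed when a previous state file is used.
    if state_in and not ignore_state:
        tap_cmd += [
            "--properties", f"taps/tap-{tap}/catalog.json",
            "--state", state_in,
        ]
    target_cmd = [
        f"venv/target-{target}/bin/target-{target}",
        "--config", f"targets/target-{target}/config.json",
    ]
    # The target writes the new state to stdout, which is redirected to a file.
    return f"{shlex.join(tap_cmd)} | {shlex.join(target_cmd)} > {shlex.quote(state_out)}"


if __name__ == "__main__":
    print(build_sync_command("adwords", "redshift"))
    print(build_sync_command(
        "adwords", "redshift",
        state_in="states_in/tap-adwords-state.json",
        state_out="states_out/tap-adwords-state.json",
    ))
```

Building the command from argument lists and quoting each piece with shlex keeps the pipeline safe to pass to a shell even if a path contains spaces.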

What this project is not intended for

This project will not create schemas in your target data warehouse. You still need to create them yourself, just as you would when working with a Singer tap directly.
