A Terraform module which deploys the Snowplow Redshift Loader on an EC2 node.
This module by default collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us. It is very simple information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to include a valid email address at which we can reach you.
To disable telemetry, simply set the variable telemetry_enabled = false.
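As a minimal sketch of the two telemetry settings described above, the fragment below shows where each variable would go on the loader module. The email address is a placeholder, and the required inputs are omitted for brevity (they appear in the full usage example in this README).

```hcl
module "rs_loader" {
  source = "snowplow-devops/redshift-loader-ec2/aws"

  # ... required inputs omitted; see the full usage example in this README ...

  # Option A: keep telemetry on and attach a contact address (placeholder value)
  user_provided_id = "ops@example.com"

  # Option B: switch telemetry off entirely
  # telemetry_enabled = false
}
```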
For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
The Redshift Loader loads shredded events from an S3 bucket into Redshift.
For more information on how it works, see this overview.
To configure Redshift, please refer to the quick start guide.
NOTE: You will need to ensure that the loader can access the cluster over whatever port is configured for the cluster (generally 5439). If running in the same VPC as the pipeline, you can allowlist the security group of the loader directly (module output = sg_id).
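One way to allowlist the loader, sketched under the assumption that the Redshift cluster's security group is managed outside this module: the variable redshift_sg_id and the resource name are hypothetical, while sg_id is the module output listed in this README.

```hcl
# Hypothetical input: the ID of the security group attached to your Redshift cluster
variable "redshift_sg_id" {
  description = "Security group ID of the Redshift cluster (assumed to be managed elsewhere)"
  type        = string
}

# Allow inbound traffic from the loader's security group on the cluster port
resource "aws_security_group_rule" "redshift_ingress_from_loader" {
  type                     = "ingress"
  from_port                = 5439 # should match the value passed as redshift_port
  to_port                  = 5439
  protocol                 = "tcp"
  security_group_id        = var.redshift_sg_id
  source_security_group_id = module.rs_loader.sg_id
}
```

If your cluster listens on a different port, the two port values would need to change accordingly.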
Duration settings such as folder_monitoring_period or retry_period should be given in the documented duration format.
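For illustration, the fragment below sets three duration inputs using the same string values that this README lists as their defaults. It is a sketch only: other values would need to follow the documented duration format, and the required inputs are omitted.

```hcl
module "rs_loader" {
  source = "snowplow-devops/redshift-loader-ec2/aws"

  # ... required inputs omitted; see the full usage example in this README ...

  # Durations are passed as strings; these values are the module defaults
  folder_monitoring_period = "8 hours"
  retry_period             = "10 min"
  health_check_timeout     = "1 min"
}
```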
Normally, this module would be used as part of our quick start guide. However, you can also use it standalone for a custom setup.
See example below:
# Note: This should be the same bucket that is used by the transformer to produce data to load
module "s3_pipeline_bucket" {
  source      = "snowplow-devops/s3-bucket/aws"
  bucket_name = "your-bucket-name"
}
# Note: This should be the same queue that is passed to the transformer to produce data to load
resource "aws_sqs_queue" "rs_message_queue" {
  content_based_deduplication = true
  kms_master_key_id           = "alias/aws/sqs"
  name                        = "rs-loader.fifo"
  fifo_queue                  = true
}
module "transformer_stsv" {
  source                  = "snowplow-devops/transformer-kinesis-ec2/aws"
  name                    = "transformer-server-stsv"
  vpc_id                  = var.vpc_id
  subnet_ids              = var.subnet_ids
  stream_name             = module.enriched_stream.name
  s3_bucket_name          = module.s3_pipeline_bucket.id
  s3_bucket_object_prefix = "transformed/good/shredded/tsv"
  window_period_min       = 1
  sqs_queue_name          = aws_sqs_queue.rs_message_queue.name
  transformation_type     = "shred"
  default_shred_format    = "TSV"
  ssh_key_name            = "your-key-name"
  ssh_ip_allowlist        = ["0.0.0.0/0"]
  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
module "rs_loader" {
  source                      = "snowplow-devops/redshift-loader-ec2/aws"
  name                        = "rs-loader-server"
  vpc_id                      = var.vpc_id
  subnet_ids                  = var.subnet_ids
  sqs_queue_name              = aws_sqs_queue.rs_message_queue.name
  redshift_host               = "<HOST>"
  redshift_database           = "<DATABASE>"
  redshift_port               = <PORT>
  redshift_schema             = "<SCHEMA>"
  redshift_loader_user        = "<LOADER_USER>"
  redshift_password           = "<PASSWORD>"
  redshift_aws_s3_bucket_name = module.s3_pipeline_bucket.id
  ssh_key_name                = "your-key-name"
  ssh_ip_allowlist            = ["0.0.0.0/0"]
  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
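The example above uses a literal placeholder for redshift_password. One possible alternative, sketched here and not prescribed by the module, is to pass the password through a Terraform variable marked as sensitive so that it is redacted from plan and apply output. The variable name is an assumption chosen for illustration.

```hcl
# Hypothetical variable; it could be supplied via the TF_VAR_redshift_password
# environment variable rather than written into a .tf file
variable "redshift_password" {
  description = "Password for the Redshift loader user"
  type        = string
  sensitive   = true
}

module "rs_loader" {
  source = "snowplow-devops/redshift-loader-ec2/aws"

  # ... other inputs as in the usage example in this README ...

  redshift_password = var.redshift_password
}
```

Note that marking a variable as sensitive only hides it from CLI output; the value is still written to the Terraform state, so the state itself should be stored securely.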
Name | Version |
---|---|
terraform | >= 1.0.0 |
aws | >= 3.72.0 |
Name | Version |
---|---|
aws | >= 3.72.0 |
Name | Source | Version |
---|---|---|
instance_type_metrics | snowplow-devops/ec2-instance-type-metrics/aws | 0.1.2 |
service | snowplow-devops/service-ec2/aws | 0.2.0 |
telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |
Name | Type |
---|---|
aws_cloudwatch_log_group.log_group | resource |
aws_iam_instance_profile.instance_profile | resource |
aws_iam_policy.iam_policy | resource |
aws_iam_policy.sts_credentials_policy | resource |
aws_iam_role.iam_role | resource |
aws_iam_role.sts_credentials_role | resource |
aws_iam_role_policy_attachment.policy_attachment | resource |
aws_iam_role_policy_attachment.sts_credentials_policy_attachement | resource |
aws_security_group.sg | resource |
aws_security_group_rule.egress_tcp_443 | resource |
aws_security_group_rule.egress_tcp_80 | resource |
aws_security_group_rule.egress_tcp_redshift | resource |
aws_security_group_rule.egress_udp_123 | resource |
aws_security_group_rule.egress_udp_statsd | resource |
aws_security_group_rule.ingress_tcp_22 | resource |
aws_caller_identity.current | data source |
aws_iam_policy_document.sts_credentials_role | data source |
aws_region.current | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
name | A name which will be prepended to the resources created | string | n/a | yes |
redshift_aws_s3_bucket_name | AWS bucket name where data to load is stored | string | n/a | yes |
redshift_database | Redshift database name | string | n/a | yes |
redshift_host | Redshift cluster hostname | string | n/a | yes |
redshift_loader_user | Name of the user that will be used for loading data | string | n/a | yes |
redshift_password | Password for redshift_loader_user used by loader to perform loading | string | n/a | yes |
redshift_schema | Redshift schema name | string | n/a | yes |
sqs_queue_name | SQS queue name | string | n/a | yes |
ssh_key_name | The name of the SSH key-pair to attach to all EC2 nodes deployed | string | n/a | yes |
subnet_ids | The list of subnets to deploy Loader across | list(string) | n/a | yes |
vpc_id | The VPC to deploy Loader within | string | n/a | yes |
amazon_linux_2_ami_id | The AMI ID to use, which must be based off of Amazon Linux 2; by default the latest community version is used | string | "" | no |
app_version | Version of rdb loader redshift | string | "5.8.0" | no |
associate_public_ip_address | Whether to assign a public ip address to this instance | bool | true | no |
cloudwatch_logs_enabled | Whether application logs should be reported to CloudWatch | bool | true | no |
cloudwatch_logs_retention_days | The length of time in days to retain logs for | number | 7 | no |
config_override_b64 | App config uploaded as a base64 encoded blob. This variable facilitates dev flow; if the config is incorrect this can break the deployment. | string | "" | no |
custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Stream Shredder | list(object({ | [] | no |
default_iglu_resolvers | The default Iglu Resolvers that will be used by Stream Shredder | list(object({ | [ | no |
folder_monitoring_enabled | Whether folder monitoring should be activated or not | bool | false | no |
folder_monitoring_period | How often the folder should be checked by folder monitoring | string | "8 hours" | no |
folder_monitoring_since | Specifies since when folder monitoring will check | string | "14 days" | no |
folder_monitoring_until | Specifies until when folder monitoring will check | string | "6 hours" | no |
health_check_enabled | Whether health check should be enabled or not | bool | false | no |
health_check_freq | Frequency of health check | string | "1 hour" | no |
health_check_timeout | How long to wait for a response for health check query | string | "1 min" | no |
iam_permissions_boundary | The permissions boundary ARN to set on IAM roles created | string | "" | no |
instance_type | The instance type to use | string | "t3a.micro" | no |
java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
redshift_aws_s3_folder_monitoring_stage_url | AWS bucket URL of folder monitoring stage - must be within 'redshift_aws_s3_bucket_name' (NOTE: must be set if 'folder_monitoring_enabled' is true) | string | "" | no |
redshift_aws_s3_folder_monitoring_transformer_output_stage_url | AWS bucket URL of transformer output stage - must be within 'redshift_aws_s3_bucket_name' (NOTE: must be set if 'folder_monitoring_enabled' is true) | string | "" | no |
redshift_jsonpaths_bucket | S3 path that holds JSONPaths | string | "" | no |
redshift_max_error | Redshift max error setting which controls amount of acceptable loading errors | number | 1 | no |
redshift_port | Redshift port | number | 5439 | no |
retry_period | How often batch of failed folders should be pulled into a discovery queue | string | "10 min" | no |
retry_queue_enabled | Whether retry queue should be enabled or not | bool | false | no |
retry_queue_interval | Artificial pause after each failed folder being added to the queue | string | "10 min" | no |
retry_queue_max_attempt | How many attempts to make for each folder | number | -1 | no |
retry_queue_size | How many failures should be kept in memory | number | -1 | no |
sentry_dsn | DSN for Sentry instance | string | "" | no |
sentry_enabled | Whether Sentry should be enabled or not | bool | false | no |
sp_tracking_app_id | App id for Snowplow tracking | string | "" | no |
sp_tracking_collector_url | Collector URL for Snowplow tracking | string | "" | no |
sp_tracking_enabled | Whether Snowplow tracking should be activated or not | bool | false | no |
ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | [ | no |
statsd_enabled | Whether Statsd should be enabled or not | bool | false | no |
statsd_host | Hostname of StatsD server | string | "" | no |
statsd_port | Port of StatsD server | number | 8125 | no |
stdout_metrics_enabled | Whether logging metrics to stdout should be activated or not | bool | false | no |
tags | The tags to append to this resource | map(string) | {} | no |
telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
webhook_collector | URL of webhook collector | string | "" | no |
webhook_enabled | Whether webhook should be enabled or not | bool | false | no |
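The inputs table notes that both folder monitoring stage URLs must be set when folder_monitoring_enabled is true, and that both must sit inside redshift_aws_s3_bucket_name. A rough sketch of such a configuration follows. The s3:// URL form and the "monitoring/" prefix are assumptions made for illustration; the transformer output path reuses the prefix from the usage example in this README.

```hcl
module "rs_loader" {
  source = "snowplow-devops/redshift-loader-ec2/aws"

  # ... other required inputs omitted; see the usage example in this README ...

  redshift_aws_s3_bucket_name = module.s3_pipeline_bucket.id

  folder_monitoring_enabled = true

  # Assumed layout: both URLs use the s3:// scheme and point inside the pipeline bucket
  redshift_aws_s3_folder_monitoring_stage_url                    = "s3://${module.s3_pipeline_bucket.id}/monitoring/"
  redshift_aws_s3_folder_monitoring_transformer_output_stage_url = "s3://${module.s3_pipeline_bucket.id}/transformed/good/shredded/tsv/"
}
```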
Name | Description |
---|---|
asg_id | ID of the ASG |
asg_name | Name of the ASG |
sg_id | ID of the security group attached to the Redshift Loader servers |
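If the module is instantiated as rs_loader, as in the usage example in this README, its outputs could be re-exported from the root configuration roughly as follows. The output names on the left are hypothetical.

```hcl
# Re-export the loader's security group ID, e.g. for allowlisting on the Redshift side
output "rs_loader_sg_id" {
  description = "Security group attached to the Redshift Loader servers"
  value       = module.rs_loader.sg_id
}

# Re-export the Auto Scaling Group name, e.g. for dashboards or alarms
output "rs_loader_asg_name" {
  description = "Name of the ASG running the Redshift Loader"
  value       = module.rs_loader.asg_name
}
```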
The Terraform AWS Redshift Loader on EC2 project is Copyright 2023-present Snowplow Analytics Ltd.
Licensed under the Snowplow Community License. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.