A terraform module for deploying the Datafold infrastructure on AWS.

datafold/terraform-aws-datafold

Datafold AWS module

This repository provisions resources on AWS, preparing them for a deployment of the application on an EKS cluster.

About this module

Prerequisites

  • An AWS account, preferably a new isolated one.
  • Terraform >= 1.4.6
  • A customer contract with Datafold
    • The application does not work without credentials supplied by sales
  • Access to our public helm-charts repository

This deployment will create the following resources:

  • AWS VPC
  • AWS subnet
  • AWS S3 bucket for clickhouse backups
  • AWS external load balancer
  • AWS ACM certificate, unless preregistered and provided
  • Three EBS volumes for local data storage
  • AWS RDS Postgres database
  • An EKS cluster
  • Service accounts for the EKS cluster to perform actions outside of its cluster boundary:
    • Provisioning existing EBS volumes
    • Updating load balancer target group to point to specific pods in the cluster
    • Rescaling the nodegroup between 1-2 nodes

Negative scope

  • This module will not provision DNS names in your zone.

How to use this module

  • See the example for a potential setup, which has dependencies on our helm-charts

Create the bucket and dynamodb table for terraform state file:

  • Use the files in bootstrap to create a terraform state bucket and a dynamodb lock table.
  • Run ./run_bootstrap.sh to create them. Enter the deployment_name when the question is asked.
    • The deployment_name is important. It is used for the k8s namespace, the Datadog unified logging tags, and other places.
    • Suggestion: company-datafold
  • Transfer the name of that bucket and table into the backend.hcl (symlinked into both infra and application)
  • Set the target_account_profile and region where the bucket / table are stored.
  • backend.hcl is only about where the terraform state file is located.
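
As a rough sketch, a backend.hcl for the S3 backend could look like the fragment below. The bucket and table names are hypothetical placeholders; use the names that run_bootstrap.sh actually created. Because the file is symlinked into both infra and application, this sketch assumes the state key is set separately in each directory's own backend block.

```hcl
# backend.hcl: partial S3 backend configuration.
# All values are placeholders, not names taken from this repository.
bucket         = "company-datafold-terraform-state" # state bucket from bootstrap
dynamodb_table = "company-datafold-terraform-lock"  # lock table from bootstrap
region         = "us-east-1"                        # region of the bucket and table
profile        = "target_account_profile"           # AWS profile for the target account
```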

The example directory contains a single deployment example, which cleanly separates the underlying runtime infra from the application deployment into kubernetes. Some specific elements from the infra directory are copied and encrypted into the application directory.

Setting up the infrastructure:

  • It is easiest if you have full admin access in the target project.
  • Pre-create the ACM certificate you want to use on AWS and validate it in your DNS.
  • Pre-create a symmetric encryption key that is used to encrypt/decrypt secrets of this deployment.
    • Use the alias instead of the mrk link. Put that into locals.tf
  • Refer to that certificate in main.tf using its domain name: (Replace "datafold.acme.com")
  • Change the settings in locals.tf (the versions in infra and application are sym-linked)
    • provider_region = which region you want to deploy in.
    • aws_profile = The profile you want to use to issue the deployments. Targets the deployment account.
    • kms_profile = Can be the same profile, unless you want the encryption key elsewhere.
    • kms_key = A pre-created symmetric KMS key. Its only purpose is encryption/decryption of deployment secrets.
    • deployment_name = The name of the deployment, used in the kubernetes namespace, container naming and the datadog "deployment" Unified Tag.
  • Run terraform init -backend-config=../backend.hcl in both application and infra directory.
  • Our team will reach out to give you two secrets files:
    • application_secrets.yaml goes into the application directory.
    • infra_secrets.yaml goes into the infra directory.
    • Encrypt both files with sops and call both secrets.yaml
  • Run terraform apply in infra directory. This should complete ok.
    • Check in the console if you see the load balancer, the EKS cluster, etc.
  • Run terraform apply in application directory.
    • Check the settings made in the main.tf file. Maybe you want to set "datadog.install" to false.
    • Check with your favourite kubernetes tool if you see the namespace and several datafold pods running there.
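
For orientation, the locals.tf settings listed above might be filled in roughly as follows. This is a sketch only: every value is a placeholder, and the exact form expected for kms_key (shown here with an alias/ prefix) is an assumption, so compare against the locals.tf shipped in the example directory.

```hcl
# locals.tf: shared by infra and application through the symlink.
# All values below are hypothetical placeholders.
locals {
  provider_region = "us-east-1"              # region to deploy in
  aws_profile     = "target_account_profile" # profile that targets the deployment account
  kms_profile     = "target_account_profile" # same profile unless the key lives elsewhere
  kms_key         = "alias/company-datafold" # alias of the pre-created symmetric KMS key (assumed format)
  deployment_name = "company-datafold"       # used for the k8s namespace and datadog tag
}
```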

About subnets and where they get created

By default the module deploys into two availability zones, because the default private and public subnet settings each list two CIDR ranges.

Which AZs are used depends on which AZs get selected and in which order; the ordering is alphabetical. In us-east-1 there can be as many as six AZs.

The module sorts the AZs and then iteratively deploys a public and a private subnet per entry, specifying its AZ. Thus:

  • [10.0.0.0/24] will get deployed in us-east-1a
  • [10.0.1.0/24] will get deployed in us-east-1b

To deploy to three AZs, override the public/private subnet settings with three CIDR ranges each. The module then iterates across three elements, and the order of the AZs stays the same by default.
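
A three-AZ override might look like the excerpt below, using the vpc_private_subnets and vpc_public_subnets inputs. The third CIDR range in each list is an illustrative choice inside the default 10.0.0.0/16 VPC CIDR, not a value from this repository.

```hcl
# Excerpt of the arguments passed to the module; other inputs omitted.
# Each list position maps to one AZ in the sorted AZ order.
vpc_private_subnets = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
vpc_public_subnets  = ["10.0.100.0/24", "10.0.101.0/24", "10.0.102.0/24"]
```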

You can add an "exclusion list" of AZ IDs. The AZ ID is not the same as the AZ name: AWS shuffles the mapping between AZ names and actual locations across AWS accounts. This means that your us-east-1a might be use1-az1 for you, but use1-az4 for another account. So if you need to match AZs, match availability zone IDs, not availability zone names. The AZ ID is visible in the EC2 console on the "settings" screen, which lists the enabled AZs with their IDs and names.

To select particular AZ IDs, put the ones you do not want in az_id_exclude_filter, which is a list. That way you can restrict the deployment to only the AZs you want. Unfortunately it is an exclude filter and not an include filter, so if AWS adds AZs later, a new AZ could end up being selected unless you exclude it as well.
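
As an illustration, restricting a us-east-1 deployment to three AZ IDs could be sketched as below. The excluded IDs are arbitrary examples. Note that az_id_exclude_filter is named in the text above but does not appear in the Inputs table, so where exactly it is set is an assumption here.

```hcl
# Excerpt only. Exclude the AZ IDs that must not be used;
# every AZ ID not listed here remains eligible.
az_id_exclude_filter = ["use1-az3", "use1-az5", "use1-az6"]
```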

The good news is that once the letters are in use, I'd expect them to stay tied to their AZ ID for as long as the account exists. Only new accounts get a fresh shuffle. So from a terraform state perspective, things should at least be consistent.

Initializing the application

The deployment is created and the initjob should have created the databases and done the initialization of the site settings.

If that didn't complete successfully, try to restart the job.

Once the deployment is complete and the initjob has succeeded, set install to false for the initjob in config.yaml:

initjob:
  install: false

Alternatively, here are the manual steps to achieve the same:

Establish a shell into the <deployment>-dfshell container. It is likely that the scheduler and server containers are crashing in a loop.

All we need to do is run these commands:

  1. ./manage.py clickhouse create-tables
  2. ./manage.py database create-or-upgrade
  3. ./manage.py installation set-new-deployment-params

Now all containers should be up and running.

Requirements

Name Version
aws >= 4.8.0
dns 3.2.1

Providers

Name Version
aws >= 4.8.0
random n/a

Modules

Name Source Version
clickhouse_backup ./modules/clickhouse_backup n/a
database ./modules/database n/a
eks ./modules/eks n/a
load_balancer ./modules/load_balancer n/a
networking ./modules/networking n/a
security ./modules/security n/a

Resources

Name Type

Inputs

Name Description Type Default Required
alb_certificate_domain Pass a domain name like example.com to this variable in order to enable ALB HTTPS listeners. Terraform will try to find an AWS certificate that is issued and matches the requested domain, so please make sure that you have already issued a certificate for that domain. string n/a yes
apply_major_upgrade Sets the flag to allow AWS to apply major upgrade on the maintenance plan schedule. bool false no
aws_auth_accounts List of account maps to add to the aws-auth configmap list(any) [] no
aws_auth_users List of user maps to add to the aws-auth configmap list(any) [] no
backend_app_port The target port to use for the backend services number 80 no
clickhouse_data_size EBS volume size for clickhouse data in GB number 40 no
clickhouse_logs_size EBS volume size for clickhouse logs in GB number 40 no
clickhouse_s3_bucket Bucket where clickhouse backups are stored string "clickhouse-backups-abcguo23" no
create_aws_auth_configmap Whether to create the AWS authentication configmap bool false no
create_rds_kms_key Set to true to create a separate KMS key (Recommended). bool true no
create_ssl_cert Creates an SSL certificate if set. bool n/a yes
database_name RDS database name string "datafold" no
db_instance_tags The extra tags to be applied to the RDS instance. map(any) {} no
db_parameter_group_tags The extra tags to be applied to the parameter group map(any) {} no
db_subnet_group_tags The extra tags to be applied to the DB subnet group map(any) {} no
default_node_disk_size Disk size for a node in GB number 40 no
deploy_vpc_flow_logs Activates the VPC flow logs if set. bool false no
deployment_name Name of the current deployment. string n/a yes
dhcp_options_domain_name Specifies DNS name for DHCP options set string "" no
dhcp_options_domain_name_servers Specify a list of DNS server addresses for DHCP options set list(string) ["AmazonProvidedDNS"] no
dhcp_options_tags Tags applied to the DHCP options set. map(string) {} no
dns_egress_cidrs List of Internet addresses to which the application has access list(string) [] no
ebs_extra_tags The extra tags to be applied to the EBS volumes map(any) {} no
ebs_iops IOPS of EBS volume number 3000 no
ebs_throughput Throughput of EBS volume number 1000 no
ebs_type Type of EBS volume string "gp3" no
enable_dhcp_options Flag to use custom DHCP options for DNS resolution. bool false no
environment Global environment tag to apply on all datadog logs, metrics, etc. string n/a yes
host_override Overrides the default domain name used to send links in invite emails and page links. Useful if the application is behind cloudflare for example. string "" no
ingress_enable_http_sg Whether regular HTTP traffic should be allowed to access the load balancer bool false no
k8s_cluster_version Ref. https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html string "1.29" no
k8s_module_version EKS terraform module version string "~> 19.7" no
lb_idle_timeout The time in seconds that the connection is allowed to be idle. number 120 no
lb_internal Set to true to make the load balancer internal and not exposed to the internet. bool false no
manage_aws_auth_configmap Determines whether to manage the aws-auth configmap bool false no
managed_node_grp Ref. https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest/submodules/eks-managed-node-group any n/a yes
managed_node_grp_default Ref. https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt list(any) [] no
nat_gateway_public_ip Public IP of the NAT gateway when reusing the NAT gateway instead of recreating string "" no
private_subnet_tags The extra tags to be applied to the private subnets map(any) {} no
propagate_intra_route_tables_vgw If intra subnets should propagate traffic. bool false no
propagate_private_route_tables_vgw If private subnets should propagate traffic. bool false no
propagate_public_route_tables_vgw If public subnets should propagate traffic. bool false no
provider_azs List of availability zones to consider. If empty, the modules will determine this dynamically. list(string) [] no
provider_region The AWS region in which the infrastructure should be deployed string n/a yes
public_subnet_tags The extra tags to be applied to the public subnets map(any) {} no
rds_allocated_storage The size of RDS allocated storage in GB number 20 no
rds_backups_replication_retention_period RDS backup replication retention period number 14 no
rds_backups_replication_target_region RDS backup replication target region string null no
rds_extra_tags The extra tags to be applied to the RDS instance map(any) {} no
rds_instance EC2 instance type for PostgreSQL RDS database. Available instance groups: t3, m4, m5. Available instance classes: medium and higher. string "db.t3.medium" no
rds_kms_key_alias RDS KMS key alias. string "datafold-rds" no
rds_max_allocated_storage The upper limit the database can grow in GB number 100 no
rds_param_group_family The DB parameter group family to use string "postgres15" no
rds_port Port the RDS database should be listening on. number 5432 no
rds_ro_username RDS read-only user name (not currently used). string "datafold_ro" no
rds_username Overrides the default RDS user name that is provisioned. string "datafold" no
rds_version Postgres RDS version to use. string "15.5" no
redis_data_size Redis EBS volume size in GB number 10 no
s3_clickhouse_backup_tags The extra tags to be applied to the S3 clickhouse backup bucket map(any) {} no
self_managed_node_grp Ref. https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest/submodules/self-managed-node-group any {} no
self_managed_node_grp_default Configuration for the self managed node group any {} no
self_managed_node_grp_instance_type Ref. https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt string "THe instance type for the self managed node group." no
sg_tags The extra tags to be applied to the security group map(any) {} no
tags Tags to apply to the general module any {} no
use_default_rds_kms_key Flag whether or not to use the default RDS KMS encryption key. Not recommended. bool false no
vpc_cidr The CIDR of the new VPC, used if vpc_id is not set string "10.0.0.0/16" no
vpc_id The VPC ID of an existing VPC to deploy the cluster in. Creates a new VPC if not set. string "" no
vpc_private_subnets The private subnet CIDR ranges when a new VPC is created. list(string) ["10.0.0.0/24", "10.0.1.0/24"] no
vpc_propagating_vgws ID's of virtual private gateways to propagate. list(any) [] no
vpc_public_subnets The public network CIDR ranges list(string) ["10.0.100.0/24", "10.0.101.0/24"] no
vpc_tags The extra tags to be applied to the VPC map(any) {} no
vpc_vpn_gateway_id ID of the VPN gateway to attach to the VPC string "" no
whitelisted_egress_cidrs List of Internet addresses the application can access going outside list(string) n/a yes
whitelisted_ingress_cidrs List of CIDRs that can pass through the load balancer list(string) n/a yes
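
A minimal module call that sets only the inputs marked as required in the table above might look like the sketch below. Several things here are assumptions rather than facts from this repository: the registry address in source (derived from the repository name), the module label "datafold", all literal values, and the keys inside managed_node_grp, which are borrowed from the upstream EKS managed node group submodule. The example directory remains the authoritative reference.

```hcl
module "datafold" {
  # Assumed registry address; pin a version in practice.
  source = "datafold/datafold/aws"

  deployment_name = "company-datafold" # placeholder
  environment     = "production"       # placeholder datadog environment tag
  provider_region = "us-east-1"        # placeholder

  # Use the pre-created, validated ACM certificate instead of creating one.
  alb_certificate_domain = "datafold.acme.com"
  create_ssl_cert        = false

  # Placeholder CIDRs; narrow these to your own networks.
  whitelisted_ingress_cidrs = ["0.0.0.0/0"]
  whitelisted_egress_cidrs  = ["0.0.0.0/0"]

  # Assumed shape, following the EKS managed node group submodule inputs.
  managed_node_grp = {
    instance_types = ["r5.2xlarge"] # hypothetical instance type
    min_size       = 1
    max_size       = 2
    desired_size   = 1
  }
}
```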

Outputs

Name Description
clickhouse_access_key The access key of the IAM user doing the clickhouse backups.
clickhouse_data_size The size in GB of the clickhouse EBS data volume
clickhouse_data_volume_id The EBS volume ID where clickhouse data will be stored.
clickhouse_logs_size The size in GB of the clickhouse EBS logs volume
clickhouse_logs_volume_id The EBS volume ID where clickhouse logs will be stored.
clickhouse_password The generated clickhouse password to be used in the application deployment
clickhouse_s3_bucket The location of the S3 bucket where clickhouse backups are stored
clickhouse_s3_region The region where the S3 bucket is created
clickhouse_secret_key The secret key of the IAM user doing the clickhouse backups.
cloud_provider A string describing the type of cloud provider to be passed onto the helm charts
cluster_name The name of the EKS cluster
cluster_scaler_role_arn The ARN of the role that is able to scale the EKS cluster nodes.
db_instance_id The ID of the RDS database instance
deployment_name The name of the deployment
domain_name The domain name to be used in DNS configuration
k8s_load_balancer_controller_role_arn The ARN of the role provisioned so the k8s cluster can edit the target group through the AWS load balancer controller.
lb_name The name of the external load balancer
load_balancer_ips The load balancer IP when it was provisioned.
postgres_database_name The name of the pre-provisioned database.
postgres_host The DNS name for the postgres database
postgres_password The generated postgres password to be used by the application
postgres_port The port configured for the RDS database
postgres_username The postgres username to be used by the application
redis_data_size The size in GB of the Redis data volume.
redis_data_volume_id The EBS volume ID of the Redis data volume.
redis_password The generated redis password to be used in the application deployment
security_group_id The security group ID managing ingress from the load balancer
target_group_arn The ARN to the target group where the pods need to be registered as targets.
vpc_cidr The CIDR of the entire VPC
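
To hand these values to the application deployment, the calling configuration can re-export them. The sketch below assumes the module was instantiated under the hypothetical label "datafold". Marking the password output as sensitive keeps it out of plan and apply output.

```hcl
# Re-export selected module outputs from the root configuration.
output "postgres_host" {
  value = module.datafold.postgres_host
}

output "postgres_password" {
  value     = module.datafold.postgres_password
  sensitive = true # avoid printing the generated password
}

output "target_group_arn" {
  value = module.datafold.target_group_arn
}
```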
