This repository contains a pair of Terraform templates that solve the problem of telling Terraform to use a backend stored in a Google Cloud Storage bucket while also creating that bucket with Terraform.
Terraform is an infrastructure-as-code tool that requires a backend, in the form of a file, for storing state. "State" here refers to the persisted record of the cloud resources currently being managed through a Terraform template. This backend can be stored in different ways; the default is locally, meaning on the machine from which the Terraform CLI is run. In that case, the state file is stored on that machine.
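For concreteness, the local default can be written out explicitly. This is a minimal sketch, not taken from the repo:

```hcl
# Sketch: Terraform's default backend, made explicit. With no backend block
# at all, state is written to terraform.tfstate in the working directory.
terraform {
  backend "local" {
    path = "terraform.tfstate"
  }
}
```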
However, when multiple people want to manage resources with Terraform from different machines, it can be helpful to store the state in a mutually accessible storage bucket in the cloud. But Terraform cannot create resources before a backend is loaded. So if we want Terraform to create the bucket that holds the state, we run into a recursive (AKA chicken-and-egg) problem: we need the state in the bucket to use Terraform, but we want Terraform to create the bucket. There is no obvious way to get started.
The insight that leads to the solution is that the state of the state bucket does not actually need to live in the same state file as everything else. Most of the infrastructure can be managed with state stored in the state bucket, as originally intended, so that all developers can work on that infrastructure together; this lives in a "main" template. The state bucket itself, however, can live in a separate "state storage" template. Like the main template, the state storage template produces a state of its own. In other words, the state bucket both holds a state (for the rest of the infrastructure in the main template) and is itself managed by a state. Because we don't expect the state bucket to change much, we have much more freedom in choosing where to keep the state that manages it. In this case, the local machine, Terraform's default location, is a good fit. That local state can be backed up informally if desired, for example to Google Drive.
However, this creates a problem with reusing information from one template in another. For example, you may want to prevent bucket name collisions by appending a random ID to a fixed string to form the state bucket's name. But there is then no obvious way to refer to that bucket from the other template: the random ID is itself a resource managed by the state storage template, and Terraform templates cannot reference dynamically created values from another template.
My solution, at least for the Google Cloud environment, is to use GCP VM metadata to hold the information from the state storage template and make it accessible to the main template. VM metadata consists of key-value pairs. I use metadata because the keys and values can be specified in a Terraform template, and they are accessible to other VMs and to Google Cloud Shell. (Google Cloud Shell is a convenience VM with many developer tools, designed to be used from the GCP console for quick GCP-related tasks.) Metadata can be associated with individual VMs or with a project as a whole; in this case, I use project metadata.
Specifically, I create a `random_id` resource in the state storage template, along with a bucket and a metadata resource. (There are actually two metadata resources, but only one is necessary to solve this problem.) The name of the bucket is the random ID concatenated with a base string. This same concatenated string is also passed into the metadata as the value for a hard-coded key. After we `init` and `apply` this template, we have a bucket with the concatenated string as its name, as well as that same string in the metadata. I use wrappers for some of the `terraform` commands for unrelated reasons.
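The core of such a state storage template might look like the following sketch. The resource names, the base string `tf-state`, and the metadata key `tf-state-bucket` are illustrative assumptions, not necessarily the repo's actual values:

```hcl
# Illustrative sketch only; names and the metadata key are assumptions.
resource "random_id" "state_suffix" {
  byte_length = 4
}

# Bucket name = base string + random ID, to avoid global name collisions.
resource "google_storage_bucket" "state" {
  name     = "tf-state-${random_id.state_suffix.hex}"
  location = "us-east1"
  # force_destroy defaults to false, so the bucket (and the state inside it)
  # is protected from accidental destruction.
}

# Publish the bucket name as project-wide metadata under a hard-coded key,
# so other templates and VMs can look it up.
resource "google_compute_project_metadata_item" "state_bucket_name" {
  key   = "tf-state-bucket"
  value = google_storage_bucket.state.name
}
```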
In the same directory as the main template, I have a wrapper for `terraform init`. In that wrapper, I use the GCP CLI (AKA `gcloud`) to get the metadata value containing the state bucket name. This value is supplied, through Terraform's and GCP's special syntax, to the `terraform init` argument `-backend-config`. (There are also wrappers for the other `terraform` commands, again for unrelated reasons.) Notably, on most VMs there is a way to get metadata using only cURL, without the GCP CLI, but this does not work from the GCP Cloud Shell VM for reasons specific to the GCP implementation of Cloud Shell.
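A hedged sketch of what such an init wrapper can do, assuming the metadata key `tf-state-bucket` (the function name and key are illustrative, not the repo's actual code):

```shell
# Hypothetical sketch of the init wrapper logic; the metadata key
# "tf-state-bucket" is an assumption.
tf_init_with_gcs_backend() {
  # Look up the state bucket name from project-wide metadata via gcloud.
  bucket=$(gcloud compute project-info describe \
    --format="value(commonInstanceMetadata.items.tf-state-bucket)")
  # Supply the bucket name to the partial "gcs" backend at init time.
  terraform init -backend-config="bucket=${bucket}"
}

# The wrapper script would then simply invoke:
# tf_init_with_gcs_backend
```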
The code is intended to run on Google Cloud Shell, although it may work from other environments. First make sure the Google Cloud Compute API is enabled for the project; this is necessary for VM metadata to be created. To use it, clone the GitHub repo to Cloud Shell. Then, create a new directory outside the cloned repo for your main infrastructure. This can be another repo if desired. Move the `terraform_main` directory from the original cloned repo to the new repo/directory. Then, add the code for the infrastructure you want to the `main.tf` file; in other words, `terraform_main/main.tf` in the new directory/repo is to be used as starter code. Do not hard-code the resource location here; use `var.location` instead, and modify the location in the state storage repo during the next step if `us-east1` is not desired.
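Under those instructions, the starter `terraform_main/main.tf` might be sketched roughly as follows. The empty backend block and the variable wiring are assumptions about the repo's layout, and the example resource is hypothetical:

```hcl
terraform {
  # Partial "gcs" backend: the bucket name is deliberately omitted and is
  # supplied at init time via -backend-config in the init wrapper.
  backend "gcs" {}
}

# Location is a variable rather than a hard-coded region.
variable "location" {
  default = "us-east1"
}

# Hypothetical example of user-added infrastructure using var.location:
resource "google_storage_bucket" "app_data" {
  name     = "example-app-data" # assumed name
  location = var.location
}
```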
Then, navigate to the `terraform_state_storage` directory in the original repo. Run `terraform init`, and then `sh terraform_apply_wrapper.sh`. Back up the Terraform state in this directory informally if desired, for example to Google Drive.
Then, navigate to the `terraform_main` directory containing the `main.tf` you added infrastructure to. Run `sh terraform_init_wrapper.sh`. The terminal output should show that state was initialized with the "gcs" backend, which stands for Google Cloud Storage. Finally, use Terraform as you normally would, except that when running `plan`, `apply`, and `destroy`, use the provided wrappers (e.g. `sh terraform_apply_wrapper.sh`).
If at any time you want to get rid of everything, you must destroy the main infrastructure first. You must then manually remove the Terraform state from the bucket (for example, in the GCP console), along with any backups that may have been created by object versioning. This is because `force_destroy` is set to false on the state storage bucket as a precaution. Then, navigate to the `terraform_state_storage` directory in its repo and run the destroy wrapper to destroy the state bucket. The cloned repo with the state bucket template can then be deleted. (Don't forget to delete any backups of the state from that template as well, if you originally created them.)
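One way to sketch that manual removal from Cloud Shell is with `gsutil`; the function name and bucket name here are assumptions:

```shell
# Hypothetical cleanup sketch: remove every object in the state bucket,
# including old versions, before destroying the bucket itself.
cleanup_state_objects() {
  bucket="$1"
  # -a deletes all versions of each object, covering backups created by
  # object versioning; -m parallelizes the removal.
  gsutil -m rm -a "gs://${bucket}/**"
}

# Usage (assumed bucket name):
# cleanup_state_objects tf-state-abc123
```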