Commit

add post 2024-04-14-block-search-engine-indexing (#70)
lucascantor authored Apr 14, 2024
1 parent 3ccd9c3 commit 2ac9e5c
Showing 4 changed files with 89 additions and 0 deletions.
1 change: 1 addition & 0 deletions content/posts/2023/10-16-an-iphone-app-built-for-two.md
@@ -4,6 +4,7 @@ description:
date: 2023-10-16
tags:
- iOS Dev
disclaimer:
---

## Hello Again 👋🏼
@@ -6,6 +6,7 @@ tags:
- Infrastructure as Code
- Terraform
- AWS
disclaimer:
---

After years without a good solution to my "static AWS IAM user secrets" problem, I've recently set up [Dynamic Provider Credentials](https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials) for AWS in my Terraform Cloud org.
1 change: 1 addition & 0 deletions content/posts/2024/01-05-tines-case-study.md
@@ -4,6 +4,7 @@ description:
date: 2024-01-05
tags:
- No-Code
disclaimer:
---

I'm excited to announce that [Tines has published a case study](https://www.tines.com/case-studies/intercom) based on my recent work deploying it as a business process automation platform for IT, InfoSec, and the entire company at Intercom! 🎉
86 changes: 86 additions & 0 deletions content/posts/2024/04-14-block-search-engine-indexing.md
@@ -0,0 +1,86 @@
---
title: Block Search Engine Indexing of CloudFront Content with a Custom Response Headers Policy
description:
date: 2024-04-14
tags:
- Security
- Infrastructure as Code
- Terraform
- AWS
disclaimer:
---

If, like me, you use AWS CloudFront as a CDN to serve content stored in an S3 bucket, you might not want search engines to index that content. While researching a solution for myself, I found plenty of forum discussions and blog posts suggesting you can accomplish this with a simple `robots.txt` file stored at the root of your S3 bucket. [Google's robots.txt documentation](https://developers.google.com/search/docs/crawling-indexing/robots/intro) warns against doing this, however:

> Warning: Don't use a robots.txt file as a means to hide your web pages (including PDFs and other text-based formats supported by Google) from Google search results.
>
> If other pages point to your page with descriptive text, Google could still index the URL without visiting the page. If you want to block your page from search results, use another method such as password protection or noindex.

Reviewing [Google's noindex documentation](https://developers.google.com/search/docs/crawling-indexing/block-indexing), it seemed clear to me that adding an `X-Robots-Tag: noindex` HTTP response header to every response from my CloudFront distribution would be the best way to achieve my goal, regardless of the file types in my S3 bucket:

> A response header can be used for non-HTML resources, such as PDFs, video files, and image files.

Thankfully, CloudFront supports attaching custom [response headers policies](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/modifying-response-headers.html) directly to CloudFront distributions, so I didn't need to set up a dedicated web server or even a Lambda function for this. Even better, this was very easy to configure entirely via Terraform!

Creating the custom headers policy itself:

```hcl
resource "aws_cloudfront_response_headers_policy" "custom_headers_policy" {
  name    = "CustomHeadersPolicy"
  comment = "Adds a set of custom headers to every response"
  custom_headers_config {
    items {
      header   = "X-Robots-Tag"
      value    = "noindex"
      override = true # replace any X-Robots-Tag header returned by the origin
    }
  }
}
```

Setting the custom headers policy's ID as the `response_headers_policy_id` argument of a CloudFront distribution's `default_cache_behavior`:

```hcl
resource "aws_cloudfront_distribution" "cdn" {
  aliases = [
    "example.com",
  ]
  default_root_object = "index.html"
  default_cache_behavior {
    allowed_methods = [
      "GET",
      "HEAD",
    ]
    cached_methods = [
      "GET",
      "HEAD",
    ]
    cache_policy_id            = var.managed_cloudfront_caching_optimized_policy_id
    compress                   = true
    response_headers_policy_id = aws_cloudfront_response_headers_policy.custom_headers_policy.id
    target_origin_id           = var.target_origin_id
    viewer_protocol_policy     = "redirect-to-https"
  }
  enabled         = true
  is_ipv6_enabled = true
  origin {
    domain_name = "example.com.s3.amazonaws.com"
    origin_id   = "S3-example.com"
    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.identities["example.com"].cloudfront_access_identity_path
    }
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
      locations        = []
    }
  }
  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate.example_com.arn
    minimum_protocol_version = "TLSv1.2_2021"
    ssl_support_method       = "sni-only"
  }
}
```
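
One way to confirm the header is actually being served, still within Terraform, might be a `check` block. This is only a sketch under a few assumptions that are not part of the setup above: Terraform 1.5 or later (where `check` blocks are available), the `hashicorp/http` provider, and a publicly reachable object at `https://example.com/`. The names `x_robots_tag_header` and `cdn_root` are hypothetical. Header names are compared case-insensitively, since I have not verified how the provider normalizes them.

```hcl
# Sketch only: requests the site root after apply and reports a warning
# if the X-Robots-Tag header is missing or is not "noindex".
check "x_robots_tag_header" {
  # Scoped data source from the hashicorp/http provider (assumed available).
  data "http" "cdn_root" {
    url = "https://example.com/" # assumed URL; point this at your own distribution
  }

  assert {
    # Collect the values of any header named X-Robots-Tag, ignoring case,
    # and require that one of them is exactly "noindex".
    condition = contains(
      [
        for name, value in data.http.cdn_root.response_headers :
        lower(value) if lower(name) == "x-robots-tag"
      ],
      "noindex",
    )
    error_message = "Expected an X-Robots-Tag: noindex response header from the CloudFront distribution."
  }
}
```

A failed `check` assertion is reported as a warning and does not block `terraform apply`, so this works as a lightweight sanity check and not as a hard gate.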
