Commit

Add new bot-blocking rules and add structure to Traffic Policy gallery (#912)

This PR adds some new rules around bots/crawlers (also coming to the
ngrok blog 8/29).

As I was adding those rules, I started to feel as though the gallery was
getting kind of unruly and disorganized (mostly my fault for dumping in
10 new ones last week), and thought I'd create a rough structure for
now. Open to ideas about categories/naming!

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
joelhans and autofix-ci[bot] authored Aug 30, 2024
1 parent af12378 commit fd2525b
Showing 2 changed files with 120 additions and 43 deletions.
114 changes: 71 additions & 43 deletions docs/http/traffic-policy/gallery.mdx
@@ -6,7 +6,9 @@ pagination_label: Rule Gallery
import {
AddCompression,
AddRobotsTxt,
AddRobotsTxtSpecific,
BlockCountries,
BlockSpecificBots,
CustomResponse,
Deny,
DeprecateVersion,
@@ -24,107 +26,133 @@ import {
# Rule Gallery

Explore a curated collection of example configurations spanning from common to
unconventional use cases for the Traffic Policy module.

A number of these examples come from a longer article about how ngrok [makes policy management accessible](https://ngrok.com/blog-post/api-gateway-policy-management-examples) to developers, including a simple Go-based application for testing these and other configurations.

See the following categories for specific expressions and actions:

- [Authentication](#authentication)
- [Rate limiting](#rate-limiting)
- [Block unwanted requests](#block-unwanted-requests)
- [Other](#other)
## Authentication

### Add JWT authentication and key-based rate limiting

Building from our [Auth0 guide](https://ngrok.com/docs/integrations/auth0/jwt-action/), these rules also add rate limiting based on your consumers' JWTs.
<JWTsRateLimiting />

## Rate limiting

### Rate limit for specific endpoint

This rule applies rate limiting of `30` requests per second to the endpoint
`/api/videos`.

<RateLimit />
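The `<RateLimit />` component renders a config defined elsewhere in this repo. As a rough sketch of the shape such a rule takes, following the style of the gallery's other `ConfigExample` exports — the field names and values under `config` here are assumptions based on ngrok's rate-limit action, not copied from this PR:

```javascript
// Hypothetical rule object in the style of the gallery's ConfigExample exports.
// `algorithm`, `capacity`, `rate`, and `bucket_key` are assumed field names;
// check the rendered gallery for the authoritative config.
const rateLimitVideos = {
  inbound: [
    {
      name: "Rate limit /api/videos",
      expressions: ["req.url.contains('/api/videos')"],
      actions: [
        {
          type: "rate-limit",
          config: {
            name: "videos",
            algorithm: "sliding_window",
            capacity: 30, // 30 requests...
            rate: "1s", // ...per second (window syntax assumed)
            bucket_key: ["conn.client_ip"],
          },
        },
      ],
    },
  ],
};
console.log(rateLimitVideos.inbound[0].actions[0].type); // rate-limit
```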

### Rate limit API consumers based on authentication status

Create a low rate limit for unauthenticated (likely free) users, while allowing authenticated users a higher level of capacity.

<RateLimitAuthentication />
### Rate limit API consumers based on pricing tiers

Using a naming scheme in your upstream servers, and API calls using a `tier` header, you can quickly customize access to your API based on any number of pricing tiers.

<RateLimitPricing />

## Block unwanted requests

### Disallow bots and crawlers with a `robots.txt`

This rule returns a custom response with a [`robots.txt` file](https://developers.google.com/search/docs/crawling-indexing/robots/intro) to deny search engine or AI crawlers on all paths.

<AddRobotsTxt />

You can also extend the expression above to create specific rules for crawlers based on their user agent strings, like `ChatGPT-User` and `GPTBot`.

<AddRobotsTxtSpecific />
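If you block more than one or two crawlers, generating the `robots.txt` body can keep the rule readable. A hypothetical helper — the bot names here are examples only:

```javascript
// Build a robots.txt body with one group per blocked crawler.
// Groups are separated by a blank line, per robots.txt convention.
const bots = ["ChatGPT-User", "GPTBot"];
const robotsTxt = bots
  .map((bot) => `User-agent: ${bot}\r\nDisallow: /`)
  .join("\r\n\r\n");
console.log(robotsTxt);
```

The resulting string is what would go in the `content` field of the `custom-response` action.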
### Block bots and crawlers by user agent

In addition to, or instead of, denying bots and crawlers with a `robots.txt` file, you can also act only on incoming requests that contain specific strings in the [`req.user_agent` request variable](/docs/http/traffic-policy/expressions/variables.mdx#requser_agent).

You can extend the expression to include additional user agents by extending `(chatgpt-user|gptbot)` like so: `(chatgpt-user|gptbot|anthropic|claude|any|other|user-agent|goes|here)`.
<BlockSpecificBots />
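The expression above is CEL, not JavaScript, but its RE2 pattern behaves much like a JavaScript regex, with `(?i)` mapping to the `i` flag. A quick Node.js sketch of the matching logic — the user-agent strings are illustrative samples, not captured traffic:

```javascript
// JavaScript approximation of the `req.user_agent.matches(...)` check;
// the RE2 inline flag (?i) becomes the `i` flag here.
const botPattern = /(chatgpt-user|gptbot)\/\d+/i;

console.log(botPattern.test("Mozilla/5.0; compatible; GPTBot/1.1")); // true
console.log(botPattern.test("Mozilla/5.0 (Windows NT 10.0) Chrome/126.0")); // false
```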

### Deny non-GET requests

This rule denies all inbound traffic that is not a GET request.
<Deny />

### Custom response for unauthorized requests

This rule sends a custom response with status code `401` and body `Unauthorized`
for requests without an Authorization header.

<CustomResponse />

### Block traffic from specific countries

Remain compliant with data regulations or sanctions by blocking requests originating from one or more countries using their respective [ISO country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

<BlockCountries />

### Limit request sizes

Prevent excessively large user uploads, like text or images, that might cause performance or availability issues for your upstream service.

<LimitSize />

## Other

### User agent filtering

We deliver tailored content to Microsoft Edge users by examining the
`User-Agent` header for the case-insensitive string `(?i)edg/` succeeded by
digits `\d`. To see how this works in practice, explore the following
[regex101 demonstration](https://regex101.com/r/3NPVub/1).

To ensure correct decoding from YAML/JSON, it's necessary to properly escape the
`\d` sequence. In YAML, if your string is not enclosed in quotes, use a single
escape: `\\d`. However, when your string is wrapped in quotes, either in YAML or
JSON, you need to double-escape: `\\\\d` for accurate decoding.

<UserAgentFilter />
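To see the double-escaping rule in action, you can run the quoted-string case through a JSON parser. This Node.js sketch only illustrates the decoding step, not the policy engine itself:

```javascript
// A quoted JSON string needs `\\d` in its text, which means four
// backslashes in this source literal; decoding yields a single `\d`.
const decoded = JSON.parse('"(?i)edg/\\\\d+"');
console.log(decoded); // (?i)edg/\d+

// Dropping the RE2-style (?i) and using the `i` flag gives an equivalent test:
const edgePattern = new RegExp(decoded.replace("(?i)", ""), "i");
console.log(edgePattern.test("Mozilla/5.0 ... Edg/126.0.2592.87")); // true
```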

### Deprecate an API version

By including an `X-Api-Version` header in your API reference or developer documentation, you can quickly return a helpful error message to consumers of a deprecated version, encouraging them to explore the new version.

<DeprecateVersion />

### Manipulate request headers

Add new headers to requests to give your upstream service more context about the consumer, which in turn allows for richer functionality, such as localized languages and pricing.

<ManipulateHeaders />

### Add compression

Quickly ensure all JSON responses are [compressed](/docs/http/traffic-policy/actions/compress-response.mdx) en route to your API consumer. If your upstream service already handles compression, ngrok skips this step.

<AddCompression />

### Enforce TLS version

Prevent obsolete and potentially vulnerable browsers, SDKs, or CLI tools like `curl` from attempting to access your API.

<EnforceTLS />

### Log unsuccessful events

Connect your API to ngrok's [event logging system](/docs/obs/index.mdx) for smarter troubleshooting of your API gateway and upstream services.

<LogUnsuccessful />

49 changes: 49 additions & 0 deletions traffic-policy/gallery.mdx
@@ -87,6 +87,7 @@ export const AddRobotsTxt = () => (
config={{
inbound: [
{
name: "Add `robots.txt` to deny all bots and crawlers",
expressions: ["req.url.contains('/robots.txt')"],
actions: [
{
@@ -106,6 +107,54 @@
/>
);

export const AddRobotsTxtSpecific = () => (
<ConfigExample
config={{
inbound: [
{
name: "Add `robots.txt` to deny specific bots and crawlers",
expressions: ["req.url.contains('/robots.txt')"],
actions: [
{
type: "custom-response",
config: {
status_code: 200,
content: "User-agent: ChatGPT-User\\r\\nDisallow: /",
headers: {
"content-type": "text/plain",
},
},
},
],
},
],
}}
/>
);

export const BlockSpecificBots = () => (
<ConfigExample
config={{
inbound: [
{
name: "Block specific bots by user agent",
expressions: [
"req.user_agent.matches('(?i).*(chatgpt-user|gptbot)/\\\\d+.*')",
],
actions: [
{
type: "deny",
config: {
status_code: 404,
},
},
],
},
],
}}
/>
);

export const JWTsRateLimiting = () => (
<ConfigExample
config={{
