Commit

Add new bot-blocking rules and add structure to Traffic Policy gallery (#912)

This PR adds some new rules around bots/crawlers (also coming to the
ngrok blog 8/29).

As I was adding those rules, I started to feel as though the gallery was
getting kind of unruly and disorganized (mostly my fault for dumping in
10 new ones last week), and thought I'd create a rough structure for
now. Open to ideas about categories/naming!

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
joelhans and autofix-ci[bot] authored Aug 30, 2024
1 parent af12378 commit fd2525b
Showing 2 changed files with 120 additions and 43 deletions.
114 changes: 71 additions & 43 deletions docs/http/traffic-policy/gallery.mdx
@@ -6,7 +6,9 @@ pagination_label: Rule Gallery
import {
AddCompression,
AddRobotsTxt,
AddRobotsTxtSpecific,
BlockCountries,
BlockSpecificBots,
CustomResponse,
Deny,
DeprecateVersion,
@@ -24,107 +26,133 @@ import {
# Rule Gallery

Explore a curated collection of example configurations spanning from common to
unconventional use cases for the Traffic Policy module.

A number of these examples come from a longer article about how ngrok [makes policy management accessible](https://ngrok.com/blog-post/api-gateway-policy-management-examples) to developers, including a simple Go-based application for testing these and other configurations.

See the following categories for specific expressions and actions:

- [Authentication](#authentication)
- [Rate limiting](#rate-limiting)
- [Block unwanted requests](#block-unwanted-requests)
- [Other](#other)
## Authentication

### Add JWT authentication and key-based rate limiting

Building from our [Auth0 guide](https://ngrok.com/docs/integrations/auth0/jwt-action/), these rules also add rate limiting based on your consumers' JWTs.
<JWTsRateLimiting />

## Rate limiting

### Rate limit for specific endpoint

This rule applies rate limiting of `30` requests per second to the endpoint
`/api/videos`.

<RateLimit />
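The `<RateLimit />` component renders a config defined elsewhere in this repo. As a rough sketch of the shape such a rule takes, following the style of the gallery's other `ConfigExample` exports — the field names and values under `config` here are assumptions based on ngrok's rate-limit action, not copied from this PR:

```javascript
// Hypothetical rule object in the style of the gallery's ConfigExample exports.
// `algorithm`, `capacity`, `rate`, and `bucket_key` are assumed field names;
// check the rendered gallery for the authoritative config.
const rateLimitVideos = {
  inbound: [
    {
      name: "Rate limit /api/videos",
      expressions: ["req.url.contains('/api/videos')"],
      actions: [
        {
          type: "rate-limit",
          config: {
            name: "videos",
            algorithm: "sliding_window",
            capacity: 30, // 30 requests...
            rate: "1s", // ...per second (window syntax assumed)
            bucket_key: ["conn.client_ip"],
          },
        },
      ],
    },
  ],
};
console.log(rateLimitVideos.inbound[0].actions[0].type); // rate-limit
```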

### Rate limit API consumers based on authentication status

Create a low rate limit for unauthenticated (likely free) users, while allowing authenticated users a higher level of capacity.

<RateLimitAuthentication />
### Rate limit API consumers based on pricing tiers

Using a naming scheme in your upstream servers, and API calls using a `tier` header, you can quickly customize access to your API based on any number of pricing tiers.

<RateLimitPricing />

## Block unwanted requests

### Disallow bots and crawlers with a `robots.txt`

This rule returns a custom response with a [`robots.txt` file](https://developers.google.com/search/docs/crawling-indexing/robots/intro) to deny search engine or AI crawlers on all paths.

<AddRobotsTxt />

You can also extend the expression above to create specific rules for crawlers based on their user agent strings, like `ChatGPT-User` and `GPTBot`.

<AddRobotsTxtSpecific />
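If you block more than one or two crawlers, generating the `robots.txt` body can keep the rule readable. A hypothetical helper — the bot names here are examples only:

```javascript
// Build a robots.txt body with one group per blocked crawler.
// Groups are separated by a blank line, per robots.txt convention.
const bots = ["ChatGPT-User", "GPTBot"];
const robotsTxt = bots
  .map((bot) => `User-agent: ${bot}\r\nDisallow: /`)
  .join("\r\n\r\n");
console.log(robotsTxt);
```

The resulting string is what would go in the `content` field of the `custom-response` action.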
### Block bots and crawlers by user agent

In addition to, or instead of, denying bots and crawlers with a `robots.txt` file, you can also act only on incoming requests that contain specific strings in the [`req.user_agent` request variable](/docs/http/traffic-policy/expressions/variables.mdx#requser_agent).

You can extend the expression to include additional user agents by extending `(chatgpt-user|gptbot)` like so: `(chatgpt-user|gptbot|anthropic|claude|any|other|user-agent|goes|here)`.
<BlockSpecificBots />
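The expression above is CEL, not JavaScript, but its RE2 pattern behaves much like a JavaScript regex, with `(?i)` mapping to the `i` flag. A quick Node.js sketch of the matching logic — the user-agent strings are illustrative samples, not captured traffic:

```javascript
// JavaScript approximation of the `req.user_agent.matches(...)` check;
// the RE2 inline flag (?i) becomes the `i` flag here.
const botPattern = /(chatgpt-user|gptbot)\/\d+/i;

console.log(botPattern.test("Mozilla/5.0; compatible; GPTBot/1.1")); // true
console.log(botPattern.test("Mozilla/5.0 (Windows NT 10.0) Chrome/126.0")); // false
```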

### Deny non-GET requests

This rule denies all inbound traffic that is not a GET request.
<Deny />

### Custom response for unauthorized requests

This rule sends a custom response with status code `401` and body `Unauthorized`
for requests without an Authorization header.

<CustomResponse />

### Block traffic from specific countries

Remain compliant with data regulations or sanctions by blocking requests originating from one or more countries using their respective [ISO country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

<BlockCountries />

### Limit request sizes

Prevent excessively large user uploads, like text or images, that might cause performance or availability issues for your upstream service.

<LimitSize />

## Other

### User agent filtering

We deliver tailored content to Microsoft Edge users by examining the
`User-Agent` header for the case-insensitive string `(?i)edg/` succeeded by
digits `\d`. To see how this works in practice, explore the following
[regex101 demonstration](https://regex101.com/r/3NPVub/1).

To ensure correct decoding from YAML/JSON, it's necessary to properly escape the
`\d` sequence. In YAML, if your string is not enclosed in quotes, use a single
escape: `\\d`. However, when your string is wrapped in quotes, either in YAML or
JSON, you need to double-escape: `\\\\d` for accurate decoding.

<UserAgentFilter />
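To see the double-escaping rule in action, you can run the quoted-string case through a JSON parser. This Node.js sketch only illustrates the decoding step, not the policy engine itself:

```javascript
// A quoted JSON string needs `\\d` in its text, which means four
// backslashes in this source literal; decoding yields a single `\d`.
const decoded = JSON.parse('"(?i)edg/\\\\d+"');
console.log(decoded); // (?i)edg/\d+

// Dropping the RE2-style (?i) and using the `i` flag gives an equivalent test:
const edgePattern = new RegExp(decoded.replace("(?i)", ""), "i");
console.log(edgePattern.test("Mozilla/5.0 ... Edg/126.0.2592.87")); // true
```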

### Deprecate an API version

By including an `X-Api-Version` header in your API reference or developer documentation, you can quickly return a helpful error message to consumers of a deprecated version, encouraging them to explore the new version.

<DeprecateVersion />

### Manipulate request headers

Add new headers to requests to give your upstream service more context about the consumer, which in turn allows for richer functionality, such as localized languages and pricing.

<ManipulateHeaders />

### Add compression

Quickly ensure all JSON responses are [compressed](/docs/http/traffic-policy/actions/compress-response.mdx) en route to your API consumer. If your upstream service already handles compression, ngrok skips this step.

<AddCompression />

### Enforce TLS version

Prevent obsolete and potentially vulnerable browsers, SDKs, or CLI tools like `curl` from attempting to access your API.

<EnforceTLS />

### Log unsuccessful events

Connect your API to ngrok's [event logging system](/docs/obs/index.mdx) for smarter troubleshooting of your API gateway and upstream services.

<LogUnsuccessful />

49 changes: 49 additions & 0 deletions traffic-policy/gallery.mdx
@@ -87,6 +87,7 @@ export const AddRobotsTxt = () => (
config={{
inbound: [
{
name: "Add `robots.txt` to deny all bots and crawlers",
expressions: ["req.url.contains('/robots.txt')"],
actions: [
{
@@ -106,6 +107,54 @@
/>
);

export const AddRobotsTxtSpecific = () => (
<ConfigExample
config={{
inbound: [
{
name: "Add `robots.txt` to deny specific bots and crawlers",
expressions: ["req.url.contains('/robots.txt')"],
actions: [
{
type: "custom-response",
config: {
status_code: 200,
content: "User-agent: ChatGPT-User\\r\\nDisallow: /",
headers: {
"content-type": "text/plain",
},
},
},
],
},
],
}}
/>
);

export const BlockSpecificBots = () => (
<ConfigExample
config={{
inbound: [
{
name: "Block specific bots by user agent",
expressions: [
"req.user_agent.matches('(?i).*(chatgpt-user|gptbot)/\\\\d+.*')",
],
actions: [
{
type: "deny",
config: {
status_code: 404,
},
},
],
},
],
}}
/>
);

export const JWTsRateLimiting = () => (
<ConfigExample
config={{
