Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first steps towards data-plane-gateway deprecation #1628

Merged
merged 8 commits into from
Sep 20, 2024
Merged

Conversation

jgraettinger
Copy link
Member

@jgraettinger jgraettinger commented Sep 16, 2024

Description:

  • Add new /authorize/user/task and /authorize/user/collection APIs for authorizing and retrieving addresses and access tokens for interacting with the hosting data-plane of a task or collection.
    • As part of this, incorporate user_grants into authorization snapshots and refactor RBAC evaluation
  • Bring in native Gazette support for grpc-web interfaces for journals and shards
    • This provides broker / reactor-hosted APIs which are equivalent with how the UI interacts with DPG today
  • Update flowctl to make it aware of the control-plane API and to use the new authorization APIs.
  • Add a "network" Go module which implements the connector networking feature
    • Remove DPG from local stacks.

Issue #1627

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)


This change is Reviewable

@jgraettinger jgraettinger added the change:planned This is a planned change label Sep 16, 2024
@jgraettinger
Copy link
Member Author

jgraettinger commented Sep 16, 2024

Calling /authorize/user/task:

$ curl -v -H "origin: http://localhost:3000" -H "authorization: Bearer $CONTROL_TOKEN" -H "content-type: application/json" --data-binary '{"task":"johnnyCo/hello/source-hello-world"}' http://localhost:8675/authorize/user/task | jq
* Host localhost:8675 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying [::1]:8675...
* Connected to localhost (::1) port 8675
> POST /authorize/user/task HTTP/1.1
> Host: localhost:8675
> User-Agent: curl/8.5.0
> Accept: */*
> origin: http://localhost:3000
> authorization: Bearer (trim)
> content-type: application/json
> Content-Length: 44
> 
} [44 bytes data]
< HTTP/1.1 200 OK
< content-type: application/json
< content-length: 2007
< vary: origin, access-control-request-method, access-control-request-headers
< access-control-allow-origin: http://localhost:3000
< date: Mon, 16 Sep 2024 02:29:42 GMT
< 
{ [2007 bytes data]
100  2051  100  2007  100    44   446k  10029 --:--:-- --:--:-- --:--:--  500k
* Connection #0 to host localhost left intact
{
  "brokerAddress": "http://localhost:8080",
  "brokerToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJjYXAiOjEwLCJleHAiOjE3MjY0NTY0MjMsImlhdCI6MTcyNjQ1Mzc4MiwiaXNzIjoibG9jYWwtY2x1c3Rlci5kcC5lc3R1YXJ5LWRhdGEuY29tIiwic2VsIjp7ImluY2x1ZGUiOnsibGFiZWxzIjpbeyJuYW1lIjoibmFtZSIsInZhbHVlIjoib3BzL3Rhc2tzL3B1YmxpYy9sb2NhbC1jbHVzdGVyL2xvZ3MvMGViODMwYTI3OTgwMDE5YS9raW5kPWNhcHR1cmUvbmFtZT1qb2hubnlDbyUyRmhlbGxvJTJGc291cmNlLWhlbGxvLXdvcmxkL3Bpdm90PTAwIn0seyJuYW1lIjoibmFtZSIsInZhbHVlIjoib3BzL3Rhc2tzL3B1YmxpYy9sb2NhbC1jbHVzdGVyL3N0YXRzLzBlYjgzMGEyNzk4MDAxOWEva2luZD1jYXB0dXJlL25hbWU9am9obm55Q28lMkZoZWxsbyUyRnNvdXJjZS1oZWxsby13b3JsZC9waXZvdD0wMCJ9XX19LCJzdWIiOiI2ZjMxYmUxYS1kODc1LTQ1M2QtOTA4ZS03OGQ0ZjJlZjFlMWYiLCJwcmVmaXhlcyI6WyJvcHMvdGFza3MvcHVibGljL2xvY2FsLWNsdXN0ZXIvbG9ncy8wZWI4MzBhMjc5ODAwMTlhL2tpbmQ9Y2FwdHVyZS9uYW1lPWpvaG5ueUNvJTJGaGVsbG8lMkZzb3VyY2UtaGVsbG8td29ybGQvcGl2b3Q9MDAiLCJvcHMvdGFza3MvcHVibGljL2xvY2FsLWNsdXN0ZXIvc3RhdHMvMGViODMwYTI3OTgwMDE5YS9raW5kPWNhcHR1cmUvbmFtZT1qb2hubnlDbyUyRmhlbGxvJTJGc291cmNlLWhlbGxvLXdvcmxkL3Bpdm90PTAwIl19.a4ALlxPQGcZzqLGLl9o8y_0M-_6yv4Fpn8MmFoHXO2w",
  "opsLogsJournal": "ops/tasks/public/local-cluster/logs/0eb830a27980019a/kind=capture/name=johnnyCo%2Fhello%2Fsource-hello-world/pivot=00",
  "opsStatsJournal": "ops/tasks/public/local-cluster/stats/0eb830a27980019a/kind=capture/name=johnnyCo%2Fhello%2Fsource-hello-world/pivot=00",
  "reactorAddress": "http://localhost:9000",
  "reactorToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJjYXAiOjI2MjE1NCwiZXhwIjoxNzI2NDU2NDIzLCJpYXQiOjE3MjY0NTM3ODIsImlzcyI6ImxvY2FsLWNsdXN0ZXIuZHAuZXN0dWFyeS1kYXRhLmNvbSIsInNlbCI6eyJpbmNsdWRlIjp7ImxhYmVscyI6W3sibmFtZSI6ImlkIiwidmFsdWUiOiJjYXB0dXJlL2pvaG5ueUNvL2hlbGxvL3NvdXJjZS1oZWxsby13b3JsZC8wZWI4MzE1YTkyMDBlYzAwLyIsInByZWZpeCI6dHJ1ZX1dfX0sInN1YiI6IjZmMzFiZTFhLWQ4NzUtNDUzZC05MDhlLTc4ZDRmMmVmMWUxZiIsInByZWZpeGVzIjpbImpvaG5ueUNvL2hlbGxvL3NvdXJjZS1oZWxsby13b3JsZCJdfQ.cD_LMSd_ycu-eUmAiOAyI2dK41ym6jaBly5XtvyHpjE",
  "retryMillis": 0,
  "shardIdPrefix": "capture/johnnyCo/hello/source-hello-world/0eb8315a9200ec00/"
}

Calling /authorize/user/collection:

$ curl -v -H "origin: http://localhost:3000" -H "authorization: Bearer $CONTROL_TOKEN" -H "content-type: application/json" --data-binary '{"collection":"johnnyCo/hello/events"}' http://localhost:8675/authorize/user/collection | jq
* Host localhost:8675 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying [::1]:8675...
* Connected to localhost (::1) port 8675
> POST /authorize/user/collection HTTP/1.1
> Host: localhost:8675
> User-Agent: curl/8.5.0
> Accept: */*
> origin: http://localhost:3000
> authorization: Bearer (trim)
> content-type: application/json
> Content-Length: 38
> 
} [38 bytes data]
< HTTP/1.1 200 OK
< content-type: application/json
< content-length: 695
< vary: origin, access-control-request-method, access-control-request-headers
< access-control-allow-origin: http://localhost:3000
< date: Mon, 16 Sep 2024 02:31:22 GMT
< 
{ [695 bytes data]
100   733  100   695  100    38   539k  30206 --:--:-- --:--:-- --:--:--  715k
* Connection #0 to host localhost left intact
{
  "brokerAddress": "http://localhost:8080",
  "brokerToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJjYXAiOjEwLCJleHAiOjE3MjY0NTczMDIsImlhdCI6MTcyNjQ1Mzg4MiwiaXNzIjoibG9jYWwtY2x1c3Rlci5kcC5lc3R1YXJ5LWRhdGEuY29tIiwic2VsIjp7ImluY2x1ZGUiOnsibGFiZWxzIjpbeyJuYW1lIjoiZXN0dWFyeS5kZXYvY29sbGVjdGlvbiIsInZhbHVlIjoiam9obm55Q28vaGVsbG8vZXZlbnRzIn0seyJuYW1lIjoibmFtZSIsInZhbHVlIjoiam9obm55Q28vaGVsbG8vZXZlbnRzLzBlYjgzMTVhOTIwMGVjMDAvIiwicHJlZml4Ijp0cnVlfV19fSwic3ViIjoiNmYzMWJlMWEtZDg3NS00NTNkLTkwOGUtNzhkNGYyZWYxZTFmIiwicHJlZml4ZXMiOlsiam9obm55Q28vaGVsbG8vZXZlbnRzLzBlYjgzMTVhOTIwMGVjMDAvIl19.3Tl3VKOxGGq-W6-ODFPJd4ICIfg-j2DJrBACRe1g4LY",
  "journalNamePrefix": "johnnyCo/hello/events/0eb8315a9200ec00/",
  "retryMillis": 0
}

Using brokerAddress and brokerToken:

$ curl -vv --insecure -H "Origin: http://localhost:3000"  -H "authorization: Bearer $BROKER_TOKEN"  --data-binary '{"selector":{"include":{"labels":[]},"exclude":{"labels": []}}}' http://localhost:8080/v1/journals/list 
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> POST /v1/journals/list HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.5.0
> Accept: */*
> Origin: http://localhost:3000
> authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJjYXAiOjEwLCJleHAiOjE3MjY0NTczMDIsImlhdCI6MTcyNjQ1Mzg4MiwiaXNzIjoibG9jYWwtY2x1c3Rlci5kcC5lc3R1YXJ5LWRhdGEuY29tIiwic2VsIjp7ImluY2x1ZGUiOnsibGFiZWxzIjpbeyJuYW1lIjoiZXN0dWFyeS5kZXYvY29sbGVjdGlvbiIsInZhbHVlIjoiam9obm55Q28vaGVsbG8vZXZlbnRzIn0seyJuYW1lIjoibmFtZSIsInZhbHVlIjoiam9obm55Q28vaGVsbG8vZXZlbnRzLzBlYjgzMTVhOTIwMGVjMDAvIiwicHJlZml4Ijp0cnVlfV19fSwic3ViIjoiNmYzMWJlMWEtZDg3NS00NTNkLTkwOGUtNzhkNGYyZWYxZTFmIiwicHJlZml4ZXMiOlsiam9obm55Q28vaGVsbG8vZXZlbnRzLzBlYjgzMTVhOTIwMGVjMDAvIl19.3Tl3VKOxGGq-W6-ODFPJd4ICIfg-j2DJrBACRe1g4LY
> Content-Length: 63
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 200 OK
< Access-Control-Allow-Headers: Cache-Control, Content-Language, Content-Length, Content-Type, Expires, Last-Modified, Pragma, Authorization
< Access-Control-Allow-Methods: GET, POST, OPTIONS
< Access-Control-Allow-Origin: http://localhost:3000
< Content-Type: application/json
< Grpc-Metadata-Content-Type: application/grpc
< Date: Mon, 16 Sep 2024 02:35:03 GMT
< Transfer-Encoding: chunked
< 
{"result":{"header":{"processId":{"zone":"local","suffix":"cool-terrapin"},"route":{"primary":-1},"etcd":{"clusterId":"14841639068965178418","memberId":"10276657743932975437","revision":"85","raftTerm":"2"}},"journals":[{"spec":{"name":"johnnyCo/hello/events/0eb8315a9200ec00/pivot=00","replication":3,"labels":{"labels":[{"name":"app.gazette.dev/managed-by","value":"estuary.dev/flow"},{"name":"content-type","value":"application/x-ndjson"},{"name":"estuary.dev/build","value":"0eb8315a9700019a"},{"name":"estuary.dev/collection","value":"johnnyCo/hello/events"},{"name":"estuary.dev/key-begin","value":"00000000"},{"name":"estuary.dev/key-end","value":"ffffffff"}]},"fragment":{"length":"536870912","compressionCodec":"GZIP","stores":["gs://estuary-trial/collection-data/"],"refreshInterval":"300s","pathPostfixTemplate":"utc_date={{.Spool.FirstAppendTime.Format \"2006-01-02\"}}/utc_hour={{.Spool.FirstAppendTime.Format \"15\"}}"},"flags":4,"maxAppendRate":"4194304"},"modRevision":"83","route":{"members":[{"zone":"local","suffix":"cool-terrapin"}],"endpoints":["http://localhost:8080"]},"createRevision":"83"}]}}
* Connection #0 to host localhost left intact

@jgraettinger jgraettinger force-pushed the johnny/dpg branch 3 times, most recently from 03f0e76 to ca30917 Compare September 17, 2024 20:35
Track managed data-plane metadata that users care about.

Also add an `enable_l2` toggle to toggle whether data-planes are
included in L2 roll-ups.
Add `evaluate()` for evaluation of a generic policy, while encapsulating
Snapshot refresh and retry semantics. We'll use this shortly for
additional authorization APIs.
Add UserGrants table.

Refactor RBAC search into a joint search that's generalized over
both user and role grants.
…utes

`/authorize/user/task` enables UI shard listings/status and retrieval
of task logs, as well as access to private connector networking.

`/authorize/user/collection` enables UI journal listing and data preview.

Both offer temporary support for the current data-plane-gateway,
which implements legacy authorization checks using claimed prefixes.

Also introduce an address rewrite mechanism for mapping an internal
data-plane legacy service address into the data-plane-gateway address in
external call contexts.

Issue #1627
Don't configure Router with a default service address.

Instead, journal and shard Client instances are configured with a
default service address and metadata which is _used_ by Router when
picking a route.

This makes it possible to cheaply clone Client instances and give each a
different service address and authorization header, while still using
the same underlying pool of gRPC connections.
…ions

This change introduces the agent API to `flowctl`, which is the
proverbial straw which motivated a deeper refactor of flowctl
configuration.

As a headline feature, `flowctl` supports the new task and collection
authorization APIs and uses them in support of serving existing
subcommands for reading collections, previews, and read ops logs or
stats.

Clean up management of access and refresh tokens by obtaining access
tokens or generating refresh tokens prior to calling into a particular
sub-command. Preserve the ability to run `flowctl` in an unauthenticated
mode.

Make it easier to use `flowctl` against a local stack by introducing
alternative defaults when running under a "local" profile.

Also fix handling of single-use refresh tokens, where we must retain the
updated secret after using it to generate a new access token. We could
now consider having `flowctl` create single-use refresh tokens rather
than multi-use ones, but I didn't want to take that step just yet.

Also fix mis-ordering of output when reading journals.

Also fix OffsetNotYetAvailable error when reading a journal in non-blocking mode.

Issue #1627
Move connector networking entirely into this repo, from the legacy
data-plane-gatweay repo, and significantly retool it along the way to:

* Improve latency and throughput of HTTP reverse-proxy cases,
  by allowing the reverse proxy to use multiple pooled connections
  built atop network proxy RPCs with reasonable idle timeouts.

  This improves concurrency as many HTTP/2 requests can be in flight at
  once, and improves latency to the end user by ammortizing connections
  to reduce aggregate TCP and TLS startup time.

* Improve user-facing error experience around misconfigurations,
  by often assuming an HTTP protocol and yielding a more informative
  error.

* Overhauling metrics that we collect.

* Updating the authorization flow, laying groundwork for the
  UI to use the /authorize/user/task API (but not requiring it just
  yet).
Bring in Gazette updates for grpc-web gateways and net.Listener customization.

Deeply rework Tiltfile to remove data-plane-gateway,
by creating a self-signed TLS CA and Certificate that are used by the
broker and reactor (and may be used by other services if desired).

Use a naming strategy of `thing.flow.localhost`, because many libraries
like rustls don't accept wildcard certs of a toplevel `*.localhost`
but will happily accept `*.flow.localhost`.

Update for changes to --broker.allow-origin, --consumer.allow-origin,
and --flow.dashboard, as well as extracting explicit arguments into
environment variables.
Copy link
Member

@psFried psFried left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, and good riddance to DPG! 🪦

alter table data_planes add column cidr_blocks cidr[] not null default '{}';
alter table data_planes add column enable_l2 boolean not null default false;
alter table data_planes add column gcp_service_account_email text;
alter table data_planes add column ssh_private_key text;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: These could use some comment on columns


alter table data_planes add column aws_iam_user_arn text;
alter table data_planes add column cidr_blocks cidr[] not null default '{}';
alter table data_planes add column enable_l2 boolean not null default false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for excluding a data plane from the L2 rollups? Is this for increased data privacy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed in VC, but this is to be able to temporarily turn off data planes without it breaking L2 reporting.

axum::Extension(claims): axum::Extension<super::ControlClaims>,
super::Request(request): super::Request<Request>,
) -> axum::response::Response {
super::wrap(async move { do_authorize_user_task(&app, claims, request).await }).await
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally fine to merge as is, but I wanted to mention this because I'd seen a comment you'd left about it: Axum lets you impl IntoResponse for an error type, which allows your handler to just return a Result<T, E> where T and E both implement IntoResponse. They have an example of this here

pub user_refresh_token: Option<RefreshToken>,

#[serde(skip)]
is_local: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional, just tossing out the idea: Instead of all the matches in the get_* functions, it might be a bit cleaner to have a struct like

struct ProfileDefaults {
  agent_url: Url,
  dashboard_url: Url,
  ...
}

and

#[serde(skip)]
profile_defaults: ProfileDefaults

}
cancel() // We don't use the returned context.

/* TODO(johnny): Inspect claims once UI is updated to use /authorize/user/task API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to confirm my uderstanding is that we won't actually verify authZ at this stage, but it won't actually allow unauthorized access because this endpoint can't be reached in our current public data plane. So we'll need to update the UI to use the new authorize api, and then enforce authZ here before setting up any of the new private data planes. That sound right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's right

userHandledCounter.WithLabelValues(task, port, proto, "OK").Inc()
}

func (f *Frontend) serveConnErr(conn net.Conn, status int, body string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😎

} else if len(shards) == 0 {
conn.sniErr = errors.New("the requested subdomain does not match a known task and port combination")
} else {
conn.resolved = newResolvedSNI(conn.parsed, &shards[0].Spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine for now, but it might not be ideal to always pick the first shard in cases where there's multiple shards and the hostname doesn't specify a key+rclock. It seems better to pick on randomly in order to distribute the load. I don't feel super confident that random selection would be ideal either, but wanted to raise the idea since splitting source-http-ingest shards had been on my mind lately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used to dial a shard, it's only used to fetch task metadata, and any shard will do -- we're just plucking out the shard ID , port protocol, and whether it's public.

We list shards again and randomize them when picking a primary to dial.

@jgraettinger jgraettinger merged commit 0034fa3 into master Sep 20, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:planned This is a planned change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants