
automatically mount service account tokens when needed #3888

Merged 6 commits into master on Jun 12, 2024

Conversation

@piax93 (Contributor) commented Jun 5, 2024

I thought I would need to jump through many hoops, but a change this simple should work.

  • For new service instances, the list of volumes gets updated dynamically on top of what is loaded from the config, so everything is easy peasy there.
  • For existing instances, I saw that the configuration hash is calculated over the whole pod spec, minus a couple of fields, but definitely including volumes, so if a service that was not required to do auth at some point becomes required to, the hashes should mismatch and a new pod gets deployed.

The only thing I'm not crazy about is hardcoding that "authenticating.yaml". Maybe I could place that in a field of the system config? Although that may be a way to add a step without much of a benefit. Very open to suggestions here.
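(For context on the hash point above, a minimal, hypothetical sketch of hashing a pod spec minus a few excluded fields; the field names and exclusion list here are illustrative, not the actual paasta implementation.)

```python
import hashlib
import json
from typing import Any, Dict

# Illustrative only: the real paasta code excludes a different set of fields.
EXCLUDED_FIELDS = ("replicas",)

def hash_pod_spec(pod_spec: Dict[str, Any]) -> str:
    """Hash the pod spec minus a couple of fields; volumes stay included,
    so adding the projected SA token volume changes the hash and forces a bounce."""
    filtered = {k: v for k, v in pod_spec.items() if k not in EXCLUDED_FIELDS}
    serialized = json.dumps(filtered, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```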

@piax93 force-pushed the u/mpiano/SEC-18955_deploy branch from fa5aab9 to c36e89d on June 5, 2024 at 15:27
@danielpops (Member)

> For existing instances, I saw that the configuration hash is calculated over the whole pod spec, minus a couple of fields, but definitely including volumes, so if a service that was not required to do auth at some point becomes required to, the hashes should mismatch and a new pod gets deployed.

One thought that comes to mind is that now a change in ServiceA's yelpsoa-configs can cause a bounce for ServiceB, which I think is a necessary evil for this, but it's worth thinking about how we can make this clear to users since it likely wouldn't be intuitive (maybe jenkins posts this detail as a PR comment or something). I also could envision eventually unconditionally mounting the token for all services, regardless of whether they currently need to authenticate or not, which would alleviate this problem but could have other downsides (e.g. additional load on api server).

Review thread on tests/test_kubernetes_tools.py (outdated, resolved)
@danielpops (Member)

> The only thing I'm not crazy about is hardcoding that "authenticating.yaml". Maybe I could place that in a field of the system config? Although that may be a way to add a step without much of a benefit. Very open to suggestions here.

No strong opinions here. Right now afaict this value is only in this repo and in puppet, which isn't too widespread and probably fine to be hard coded.
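(If the filename were ever promoted to system config, the fallback could be as small as this sketch; the config key here is hypothetical, and the hardcoded name stays as the default.)

```python
DEFAULT_AUTH_SERVICES_FILE = "authenticating.yaml"  # the currently hardcoded name

def get_auth_services_filename(system_config: dict) -> str:
    # Hypothetical key; falls back to the hardcoded filename when unset.
    return system_config.get("authenticating_services_file", DEFAULT_AUTH_SERVICES_FILE)
```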

@nemacysts (Member) commented Jun 6, 2024

> One thought that comes to mind is that now a change in ServiceA's yelpsoa-configs can cause a bounce for ServiceB, which I think is a necessary evil for this, but it's worth thinking about how we can make this clear to users since it likely wouldn't be intuitive (maybe jenkins posts this detail as a PR comment or something).

@danielpops i think as long as we're not bouncing every service when this happens, this isn't the end of the world: part of the paasta contract is that a paasta instance can be restarted at any time (see here: "PaaSTA reserves the right to cause a bounce (our term for when all of your service’s containers are replaced) on your service at any time.")

(which isn't to say we should yolo restart - but we do generally expect that service bounces are safe at any time)

@nemacysts (Member)

@danielpops @piax93 i guess i should confirm: what's the worst case scenario for how many services would be bounced when updating the auth rules for 1 service? could we end up bouncing everything should there be a change that involves internalapi?

@nemacysts (Member)

@danielpops thanks for the example - that makes things a lot clearer. i think in steady-state operation this probably wouldn't be too bad, but while we're rolling out this might cause a lot of churn (and some services like y-m are a bit noisy when restarted - which is made worse by the fact that it has many callers so it would presumably be in many authorization.yamls)

@nemacysts (Member)

(i've started a CI thread to see what folks think here)

@piax93 (Contributor, Author) commented Jun 7, 2024

@nemacysts @danielpops I think one easy solution for the issue is to keep a separate list of critical services which will always have the token mounted, so we can be sure they are not affected by someone referencing/removing them from an authorization config.
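(A rough sketch of that idea, assuming a system-config list of always-authenticating services; the names below are hypothetical, not existing paasta helpers.)

```python
from typing import Iterable, Set

def services_needing_token(
    referenced_services: Iterable[str],    # services found in any authorization.yaml
    always_authenticating: Iterable[str],  # critical services pinned in system config
) -> Set[str]:
    # Pinned services keep the token mounted no matter what, so adding or
    # removing references to them in authorization configs never causes a bounce.
    return set(referenced_services) | set(always_authenticating)
```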

@danielpops (Member)

@nemacysts a bounce will only happen when a service that previously was not referenced in any other service's authorization.yaml is added, OR the opposite: a service previously referenced in other services' authorization.yaml files has all references removed.

Examples

Service starts being referenced

security_happyhour/authorization.yaml starting state:

authorization:
  enabled: true
  rules:
  - name: service-level granularity
    description: The description
    identity_groups:
      services:
        - yelp_main

anti_phishing is added to a rule:

authorization:
  enabled: true
  rules:
  - name: service-level granularity
    description: The description
    identity_groups:
      services:
        - yelp_main
        - anti_phishing

When this ships it will cause anti_phishing to bounce

Service no longer referenced

security_happyhour/authorization.yaml starting state:

authorization:
  enabled: true
  rules:
  - name: service-level granularity
    description: The description
    identity_groups:
      services:
        - yelp_main
        - anti_phishing

yelp_main removed from authorization.yaml file and not referenced in any other authorization.yaml files:

authorization:
  enabled: true
  rules:
  - name: service-level granularity
    description: The description
    identity_groups:
      services:
        - anti_phishing

When this ships it will cause yelp_main to bounce
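(To make the trigger explicit, here is a rough sketch of how the set of token-mounting services could be derived from authorization.yaml files like the ones above; the file layout and helper names are assumptions, not the actual paasta code.)

```python
import glob
import os
from typing import Set

import yaml

def get_referenced_services(soa_dir: str) -> Set[str]:
    """Collect every service named in any <service>/authorization.yaml rule."""
    referenced: Set[str] = set()
    for path in glob.glob(os.path.join(soa_dir, "*", "authorization.yaml")):
        with open(path) as f:
            config = yaml.safe_load(f) or {}
        for rule in config.get("authorization", {}).get("rules", []):
            referenced.update(rule.get("identity_groups", {}).get("services", []))
    return referenced

# A service only bounces when it crosses this boundary in either direction:
# first reference added anywhere, or last reference removed everywhere.
def needs_token(service: str, soa_dir: str) -> bool:
    return service in get_referenced_services(soa_dir)
```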

@danielpops (Member)

@nemacysts if yelp-main is in 10 authorization.yaml's and removed from one of them, a bounce wouldn't happen. The bounce would only happen when yelp-main is removed from all of them, since it would then no longer have the projected token volume mount, which changes the pod spec and causes a bounce. So from that perspective, the removal is a much less likely scenario for us to encounter.

The main problem is that the unsuspecting PR author and reviewer probably wouldn't realize that adding yelp-main to service-a's authorization.yaml could cause a bounce for yelp-main

@piax93 force-pushed the u/mpiano/SEC-18955_deploy branch from 50fc76e to f997e49 on June 10, 2024 at 09:20
@piax93 force-pushed the u/mpiano/SEC-18955_deploy branch from a6df45f to 7e86692 on June 10, 2024 at 12:54
if (
    token_config
    and service_name in get_authenticating_services(soa_dir)
    and not any(volume == token_config for volume in config_volumes)
A reviewer (Member) commented:

i think it might be worth doing a more generic dedupe in the future - presumably having user-supplied duplicates is also something we'd like to avoid on top of preventing folks from colliding with the auto-mounted token-config volume

i.e., we probably wanna figure out a central place to dedupe all volume configs and handle all of them consistently :p

get_pod_volumes() has us dedupe a couple different types of volume mounts, but not all of them ;_;
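(Something like the following could be the shape of that central dedupe; this is a sketch only, not the existing get_pod_volumes logic.)

```python
import json
from typing import Dict, List

def dedupe_volumes(volumes: List[Dict]) -> List[Dict]:
    """Drop exact-duplicate volume configs while preserving order, covering
    user-supplied duplicates as well as auto-added ones like the token volume."""
    seen = set()
    deduped = []
    for volume in volumes:
        key = json.dumps(volume, sort_keys=True)  # assumes JSON-serializable configs
        if key not in seen:
            seen.add(key)
            deduped.append(volume)
    return deduped
```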

@jfongatyelp (Contributor) left a comment:

Appreciate that we'll be able to control this set of always-authenticating services in puppet/system configs. hopefully there aren't too many services that can't deal with 'surprise' bounces well.

def get_projected_sa_volumes(self) -> List[ProjectedSAVolume]:
    config_volumes = super().get_projected_sa_volumes()
    token_config = (
        load_system_paasta_config().get_service_auth_token_volume_config()
A reviewer (Contributor) commented:

So each service needs only a single token if it needs to auth to any number of services?

And the token format will be basically the same across all services (coming from a system-level paasta config, not service/instance-level)?

@piax93 (Contributor, Author) replied:

The "PaaSTA Services" of CEP-2167 describes in more detail, but in short, yes. The only thing that we can actually control is the token audience, a that's going to be the same everywhere. Identity is then inferred from other metadata fields.

Comment on lines +2707 to +2708
def get_service_auth_token_volume_config(self) -> ProjectedSAVolume:
    return self.config_dict.get("service_auth_token_settings", {})
A reviewer (Contributor) commented:

lol this difference in name is going to trip me up in the future, I just know it.

@piax93 (Contributor, Author) replied:

lol, apologies, I'm a terrible human being. I created this config first thinking that each JSON file is loaded into a different "namespace", but then realized that everything gets flattened out, so I started adding word after word to the key.

@piax93 piax93 merged commit bc305ea into master Jun 12, 2024
10 checks passed
@piax93 piax93 deleted the u/mpiano/SEC-18955_deploy branch June 12, 2024 15:22