-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEOS-1560: expose entities in api for transformpiitext #2836
NEOS-1560: expose entities in api for transformpiitext #2836
Conversation
nickzelei
commented
Oct 22, 2024
•
edited
Loading
edited
- adds new rpc in transformers service that exposes available pii entities
- updates transform_pii_text config to allow specifying allowed entities
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
@@ -501,4 +519,9 @@ service TransformersService { | |||
rpc IsTransformerNameAvailable(IsTransformerNameAvailableRequest) returns (IsTransformerNameAvailableResponse) {} | |||
rpc ValidateUserJavascriptCode(ValidateUserJavascriptCodeRequest) returns (ValidateUserJavascriptCodeResponse) {} | |||
rpc ValidateUserRegexCode(ValidateUserRegexCodeRequest) returns (ValidateUserRegexCodeResponse) {} | |||
|
|||
// Retrieve a list of available Pii recognizers for use with the TransformPiiText transformer | |||
rpc GetTransformPiiRecognizers(GetTransformPiiRecognizersRequest) returns (GetTransformPiiRecognizersResponse) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this here because it seemed more generic than just shoving it in the anonymization service since it is specific to the transformPiitext transformer.
wdyt @alishakawaguchi, @evisdrenova ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this referring to the getEntities endpoint that we talked about or something else? If so, are we just using recognizers instead of entities everywhere? I like entities because that's industry standard i.e. Named Entity Recognition model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah actually good call, I'm surfacing the wrong function. I should be surfacing the entities, not the entity recognziers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is kind of beside my original question though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah back to your original point - IMO i'd put it in the anonymizer service since it retrieves the entities from the presidio api and functionally is currently only used for anonymization even if it is consumed by a transformer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only problem is that ceases to be true once we wire it up to the async job flow. it's a supporting value for a transformer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which i think is okay imo - you're still getting those values from presidio it's just being used by the transformer, for ex. we have the connection rpcs in the connection_data service even though it's an input into a job. but i think you can go either way with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the transformer service makes the most sense
// Configure a list of recognizers to be used for PII analysis. If not provided or empty, all recognizers are used | ||
// If this is specified, any ad-hoc, or deny_recognizers names must also be provided. | ||
// To see available builtin recognizers, call the GetPiiTextRecognizers() RPC method to see what is available for your account. | ||
repeated string allowed_recognizers = 4 [(buf.validate.field).repeated = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we think about this name?
Will it be confusing when looking at the property above called deny_recognizers
?
Should we have a different name here?
wdyt: @alishakawaguchi @evisdrenova ?
Maybe it makes more sense once we rename the deny_recognizers
to just be recognizers
or adhoc_recognizers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think it shoudl be called something different since it's not technically a recognizer as we have it defined in the deny_recognizer
from a type perspective. I think it might be be clearer to call it something like allowed_terms
or allowed_words
or something along that line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I was calling it the wrong thing as I was exposing the wrong function. this is will more clearly be named allowed_entities
now
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2836 +/- ##
==========================================
- Coverage 37.33% 30.36% -6.98%
==========================================
Files 296 297 +1
Lines 34693 34743 +50
==========================================
- Hits 12953 10549 -2404
- Misses 20210 23023 +2813
+ Partials 1530 1171 -359 ☔ View full report in Codecov by Sentry. |
return nil, fmt.Errorf("was unable to retrieve available recognizers: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("was unable to retrieve available recognizers: %w", err) | |
} | |
return nil, fmt.Errorf("was unable to retrieve available entities: %w", err) | |
} |
return nil, fmt.Errorf("received non-200 response from recognizer api: %s %d %s", resp.Status(), resp.StatusCode(), string(resp.Body)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("received non-200 response from recognizer api: %s %d %s", resp.Status(), resp.StatusCode(), string(resp.Body)) | |
} | |
return nil, fmt.Errorf("received non-200 response from entities api: %s %d %s", resp.Status(), resp.StatusCode(), string(resp.Body)) | |
} |
baseSystemTransformerSourceMap = map[mgmtv1alpha1.TransformerSource]*mgmtv1alpha1.SystemTransformer{} | ||
|
||
allSystemTransformersSourceMap = map[mgmtv1alpha1.TransformerSource]*mgmtv1alpha1.SystemTransformer{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
later would be nice to show all the transformers regardless of plan but have some disabled with like an "upgrade now" button to incentivize users to upgrade in order to get access to those transformers if they're in like a personal plan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - just left a couple of small comments