Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEOS-1560: expose entities in api for transformpiitext #2836

Conversation

nickzelei
Copy link
Member

@nickzelei nickzelei commented Oct 22, 2024

  • adds new rpc in transformers service that exposes available pii entities
  • updates transform_pii_text config to allow specifying allowed entities

Copy link

linear bot commented Oct 22, 2024

Copy link

vercel bot commented Oct 22, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
neosync-docs ⬜️ Ignored (Inspect) Visit Preview Oct 22, 2024 11:15pm

@nickzelei nickzelei self-assigned this Oct 22, 2024
Copy link

github-actions bot commented Oct 22, 2024

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedOct 22, 2024, 11:16 PM

@@ -501,4 +519,9 @@ service TransformersService {
rpc IsTransformerNameAvailable(IsTransformerNameAvailableRequest) returns (IsTransformerNameAvailableResponse) {}
rpc ValidateUserJavascriptCode(ValidateUserJavascriptCodeRequest) returns (ValidateUserJavascriptCodeResponse) {}
rpc ValidateUserRegexCode(ValidateUserRegexCodeRequest) returns (ValidateUserRegexCodeResponse) {}

// Retrieve a list of available Pii recognizers for use with the TransformPiiText transformer
rpc GetTransformPiiRecognizers(GetTransformPiiRecognizersRequest) returns (GetTransformPiiRecognizersResponse) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this here because it seemed more generic than just shoving it in the anonymization service since it is specific to the transformPiitext transformer.
wdyt @alishakawaguchi, @evisdrenova ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this referring to the getEntities endpoint that we talked about or something else? If so, are we just using recognizers instead of entities everywhere? I like entities because that's industry standard i.e. Named Entity Recognition model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah actually good call, I'm surfacing the wrong function. I should be surfacing the entities, not the entity recognziers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is kind of beside my original question though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah back to your original point - IMO i'd put it in the anonymizer service since it retrieves the entities from the presidio api and functionally is currently only used for anonymization even if it is consumed by a transformer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only problem is that ceases to be true once we wire it up to the async job flow. it's a supporting value for a transformer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which i think is okay imo - you're still getting those values from presidio it's just being used by the transformer, for ex. we have the connection rpcs in the connection_data service even though it's an input into a job. but i think you can go either way with it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the transformer service makes the most sense

// Configure a list of recognizers to be used for PII analysis. If not provided or empty, all recognizers are used
// If this is specified, any ad-hoc, or deny_recognizers names must also be provided.
// To see available builtin recognizers, call the GetPiiTextRecognizers() RPC method to see what is available for your account.
repeated string allowed_recognizers = 4 [(buf.validate.field).repeated = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we think about this name?

Will it be confusing when looking at the property above called deny_recognizers?

Should we have a different name here?
wdyt: @alishakawaguchi @evisdrenova ?

Maybe it makes more sense once we rename the deny_recognizers to just be recognizers or adhoc_recognizers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think it shoudl be called something different since it's not technically a recognizer as we have it defined in the deny_recognizer from a type perspective. I think it might be be clearer to call it something like allowed_terms or allowed_words or something along that line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I was calling it the wrong thing as I was exposing the wrong function. this is will more clearly be named allowed_entities now

@nickzelei nickzelei added the enhancement New feature or request label Oct 22, 2024
Copy link

codecov bot commented Oct 22, 2024

Codecov Report

Attention: Patch coverage is 17.80822% with 60 lines in your changes missing coverage. Please review.

Project coverage is 30.36%. Comparing base (dd554a2) to head (f038598).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...ces/mgmt/v1alpha1/transformers-service/entities.go 0.00% 24 Missing ⚠️
...alpha1/transformers-service/system_transformers.go 40.74% 16 Missing ⚠️
backend/internal/cmds/mgmt/serve/connect/cmd.go 0.00% 15 Missing ⚠️
...1/transformers-service/userdefined_transformers.go 0.00% 4 Missing ⚠️
...ices/mgmt/v1alpha1/transformers-service/service.go 0.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (dd554a2) and HEAD (f038598). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (dd554a2) HEAD (f038598)
2 1
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2836      +/-   ##
==========================================
- Coverage   37.33%   30.36%   -6.98%     
==========================================
  Files         296      297       +1     
  Lines       34693    34743      +50     
==========================================
- Hits        12953    10549    -2404     
- Misses      20210    23023    +2813     
+ Partials     1530     1171     -359     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nickzelei nickzelei marked this pull request as ready for review October 22, 2024 22:13
Comment on lines 38 to 39
return nil, fmt.Errorf("was unable to retrieve available recognizers: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("was unable to retrieve available recognizers: %w", err)
}
return nil, fmt.Errorf("was unable to retrieve available entities: %w", err)
}

Comment on lines 41 to 42
return nil, fmt.Errorf("received non-200 response from recognizer api: %s %d %s", resp.Status(), resp.StatusCode(), string(resp.Body))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("received non-200 response from recognizer api: %s %d %s", resp.Status(), resp.StatusCode(), string(resp.Body))
}
return nil, fmt.Errorf("received non-200 response from entities api: %s %d %s", resp.Status(), resp.StatusCode(), string(resp.Body))
}

Comment on lines +634 to +636
baseSystemTransformerSourceMap = map[mgmtv1alpha1.TransformerSource]*mgmtv1alpha1.SystemTransformer{}

allSystemTransformersSourceMap = map[mgmtv1alpha1.TransformerSource]*mgmtv1alpha1.SystemTransformer{}
Copy link
Contributor

@evisdrenova evisdrenova Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

later would be nice to show all the transformers regardless of plan but have some disabled with like an "upgrade now" button to incentivize users to upgrade in order to get access to those transformers if they're in like a personal plan

Copy link
Contributor

@evisdrenova evisdrenova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - just left a couple of small comments

@nickzelei nickzelei merged commit b43319b into main Oct 22, 2024
16 of 19 checks passed
@nickzelei nickzelei deleted the nick/neos-1560-expose-entities-allowdeny-in-api-for-transformpiitext branch October 22, 2024 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants