# Additional - 2 - User Data Interaction

Additional-2 requirements are a set of requirements that must be met in addition to the baseline if an AI/ML application either uses user data as input to the model or interacts with end-users in some way.

There are 19 questions in total, all of them yes/no questions that should be answered by the development team/ML engineer for the application. The last 3 questions apply to generative AI applications only.

Questions highlighted in bold indicate those which require minor testing before the question can be answered.

| CATEGORY | DESCRIPTION |
| --- | --- |
| Data | Is user behavioral data being collected/used as input to the model without the users' knowledge? |
| | Is the training user data representative enough of different user population groups? |
| | Does the input to the AI model contain any sensitive and often privileged information, such as health, legal, financial, or biometric data? Do you have processes in place to prevent loss or misconfiguration of this data? |
| | Is your application collecting data from specific vulnerable or protected groups, such as children? |
| | Is user data properly pre-processed (recommended data splits are used, attribute information is made available, sampling biases are reduced, data is correctly formatted, and metadata is recorded)?<br>See the pre-processing sketch after this table. |
| | Are users notified about the collection and use of their data for training the AI/ML system? |
| | Do users have the choice to revoke consent to the use of their data as AI/ML input at any time? |
| | Have retention policies been put in place for AI/ML training data that contains user information? |
| Artefact | Can users ask questions or provide feedback about the output if it is incorrect? |
| | Do you provide users with information on how the AI/ML application works? Have you tested whether the output/actions of the model can be misinterpreted? |
| | Have checks been performed to test the model output for different user groups? Is the correctness of the results documented? |
| | Can one or more user groups be uniquely identified by the features of this model, either directly or indirectly?<br>You might need to consult with the de-identification team to answer this question. |
| | Does the output allow logging of recent interactions? |
| | Can the predicted output contain user data?<br>Note that this user data need not be personal information - it can also be behavioral inferences, such as shopping habits or the last movie watched. |
| | Are users notified that they are interacting with an AI/ML system? |
| | Are users provided with enough information about the potential benefits and risks of using the AI/ML system? |
| | Do users know the consequences of their input?<br>Such consequences can include, but are not limited to, the input being used as feedback to the model, or for any other purpose. For example, notifying the user that hiding an advertisement could prevent them from receiving further tailored advertisements. |
| Model | Are users affected adversely when the system fails? Have the failure modes that could affect users been documented? |
| System/Infrastructure | Are adequate security and privacy controls provided to users if they choose to interact with the AI/ML model? |
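The pre-processing question above is the most hands-on item in this table. The snippet below is a minimal, illustrative sketch (not part of the requirement set) of two of the listed practices: stratified data splits to reduce sampling bias across user groups, and recording split metadata. It assumes pandas and scikit-learn are available; the column names and toy data are invented purely for illustration.

```python
# Minimal, illustrative pre-processing sketch. Assumptions: pandas and
# scikit-learn are available; the toy columns ("age_group", "feature_x",
# "label") are invented for this example only.
import json

import pandas as pd
from sklearn.model_selection import train_test_split

# Toy user dataset; "age_group" stands in for a user-group attribute.
df = pd.DataFrame({
    "age_group": ["18-25", "26-40", "41-65"] * 40,
    "feature_x": list(range(120)),
    "label": [0, 1] * 60,
})

# Stratify on the user-group attribute so the train/test splits keep the
# same proportion of groups (one way to reduce sampling bias).
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["age_group"], random_state=42
)

def group_counts(frame: pd.DataFrame) -> dict:
    # Per-group row counts, cast to plain ints so they serialize cleanly.
    return {k: int(v) for k, v in frame["age_group"].value_counts().items()}

# Record split metadata so the pre-processing step is auditable.
metadata = {
    "split_sizes": {"train": len(train_df), "test": len(test_df)},
    "group_counts": {"train": group_counts(train_df), "test": group_counts(test_df)},
    "random_state": 42,
}
print(json.dumps(metadata, indent=2))
```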
## Generative AI
| CATEGORY | DESCRIPTION |
| --- | --- |
| Artefact | Do you check the scope of the output of the model? Can there be cases where the output can be counterproductive for users?<br>To prevent this, models typically limit the scope of what they produce as output. For example, a retail customer service chatbot should not provide healthcare information. See the scope-check sketch after this table. |
| | Can the AI/ML system automatically make decisions for a user without involving the user in the process?<br>For example, if an AI/ML system detects that some customers use more data than their original plan allows, can it automatically enroll them in a higher data plan without consulting them? |
| Model | Does the AI system automatically label users? Do you check the labels for correctness?<br>Incorrect labeling can lead to incorrect predictions. |
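For the output-scope question, a scope check usually sits between the model and the user. The sketch below is only a toy illustration of the idea using a keyword denylist; the terms, the `enforce_output_scope` function name, and the fallback message are assumptions, and a production system would more likely rely on a topic classifier or a dedicated guardrail service.

```python
# Minimal sketch of an output-scope check for a retail customer-service
# chatbot (assumption: a simple keyword denylist stands in for a real
# topic classifier or guardrail service).
OUT_OF_SCOPE_TERMS = {"diagnosis", "dosage", "prescription", "medical advice"}

FALLBACK = (
    "I can only help with questions about orders, returns, and products. "
    "For anything else, please contact the appropriate service."
)

def enforce_output_scope(model_output: str) -> str:
    """Return the model output only if it stays within the retail scope,
    otherwise return a safe fallback message."""
    lowered = model_output.lower()
    if any(term in lowered for term in OUT_OF_SCOPE_TERMS):
        return FALLBACK
    return model_output

if __name__ == "__main__":
    print(enforce_output_scope("Your order #123 ships tomorrow."))       # passes through
    print(enforce_output_scope("For that symptom the usual dosage is...")) # replaced by fallback
```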