Hi,
First and foremost, congratulations on this amazing initiative :). I've seen that there is an ongoing discussion about incorporating RAI (Responsible AI) concepts into the project. Our research (by me, @abelgomez, and @jcabot) has taken a similar approach: we developed a structured domain-specific language (DSL) to describe datasets, built on top of RAI documentation initiatives such as those studied by the Croissant Task Force team (Datasheets for datasets…). It would be interesting to see whether some parts of our proposal could facilitate the discussion about adopting RAI concepts.
In particular, we identified specific provenance aspects that could help users discover, search for, and evaluate data. For instance:
- A text dataset gathered from Australian speakers could reduce the accuracy of a conversational ML model intended to work in the US because of differences in language style. As a user searching for text data, I care about the dataset's target demographics.
- A dataset annotated by a team of crowd workers from a single country (via Amazon Mechanical Turk) might not fit my use case because, for example, the annotations may carry a geographical bias. In that case, I would prefer a similar dataset annotated by experts or by the authors.
While searching for data, users may also be interested in aspects such as the profiles of the various teams involved, the infrastructure used, the type of annotation (e.g., bounding boxes, entity annotation), or the type of gathering (e.g., physical data collection, secondary analysis).
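To make the search use case concrete, here is a minimal Python sketch of how provenance metadata like this could drive dataset filtering. Everything in it is hypothetical and chosen only for illustration: the field names (`target_demographics`, `annotator_profile`, `annotator_countries`, `annotation_type`, `gathering_method`), the `search` helper, and the made-up catalog entries. They are not Croissant vocabulary, nor the syntax of the DSL mentioned in this post.

```python
from dataclasses import dataclass, field


# Hypothetical provenance record; field names are illustrative assumptions,
# not Croissant properties or the DSL's actual syntax.
@dataclass
class Provenance:
    target_demographics: list[str]  # e.g. speaker regions such as "AU" or "US"
    annotator_profile: str  # e.g. "crowd", "expert", or "author"
    annotator_countries: list[str] = field(default_factory=list)
    annotation_type: str = ""  # e.g. "bounding boxes", "entity annotation"
    gathering_method: str = ""  # e.g. "physical data collection", "secondary analysis"


@dataclass
class Dataset:
    name: str
    provenance: Provenance


def suitable(ds: Dataset, region: str, allowed_profiles: set[str]) -> bool:
    """True if the dataset targets `region` and its annotators have an allowed profile."""
    p = ds.provenance
    return region in p.target_demographics and p.annotator_profile in allowed_profiles


def search(datasets: list[Dataset], region: str, allowed_profiles: set[str]) -> list[str]:
    """Return the names of datasets that pass the provenance filter."""
    return [ds.name for ds in datasets if suitable(ds, region, allowed_profiles)]


# Made-up catalog entries mirroring the two examples above.
catalog = [
    Dataset("dialogues-au", Provenance(["AU"], "expert", annotation_type="entity annotation")),
    Dataset("dialogues-us-crowd", Provenance(["US"], "crowd", ["US"], "entity annotation")),
    Dataset("dialogues-us-expert", Provenance(["US"], "expert", annotation_type="entity annotation")),
]

print(search(catalog, "US", {"expert", "author"}))
# → ['dialogues-us-expert']
```

The point of the sketch is only that, once provenance is recorded as structured fields rather than free text, questions like "US speakers, annotated by experts or authors" become simple filters.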
I plan to attend the next meeting, so we can also meet there and see whether this makes sense for the project.
Congrats again on the initiative!!
Joan
P.S.: We also developed a VSCode plugin to experiment with the DSL, if you want to take a look: https://www.youtube.com/watch?app=desktop&v=Bf3bhWB-UJY