
This repository contains comments that have been annotated for attributes that correlate with prosocial or antisocial outcomes in online conversation.


conversationai/Bridging-Comments-Benchmark-Dataset


Bridging Comments Benchmark Dataset

This repository currently contains 11,973 comments that have been annotated for attributes that correlate with prosocial or constructive outcomes in online conversation. These attributes are Reasoning, Curiosity, Respect, Compassion, Alienation, and Moral Outrage.

This work is a collaboration among:

  • Ruta Wheelock and Scott Friedman at SIFT, a research and development consulting company that uses NLP and other technologies to make the information flow between humans and technology better for both sides,
  • Sonja Schmer-Galunder, Glenn and Deborah Renwick Leadership Professor in AI and Ethics at the University of Florida, and
  • Zaria Jalan, Alyssa Chvasta, and Emily Saltz as part of the Conversation AI project, a collaborative research effort at Jigsaw exploring ML as a tool for better discussions online.

Background

Current annotation practices raise many issues, including Western-centric bias, poor working conditions, risks from exploitative power imbalances, and a lack of diverse representation among annotators. Within the machine learning community, the emphasis on data-hungry models and on optimizing interrater reliability has prioritized data quantity over data quality. However, data labeling, especially for complex linguistic constructs such as moral justifications of harm, intentionality, or constructive conversation, is a highly qualitative task of induction and interpretation that often requires social and cultural knowledge of the context in which the text is embedded. Theory-driven definitions of these constructs, however well informed academically, often clash with an annotator's intuitive understanding.

In a forthcoming paper, we will describe the results of our annotation work addressing some of these problems, including qualitative and quantitative methods for increasing interrater reliability while improving conceptual understanding, as well as how we took the situatedness of annotation workers and their working conditions into consideration. Here we publish the resulting benchmark dataset for assessing constructive conversations.

Methods

Curation and labeling

The dataset is composed of 11,973 comments from Civil Comments, a publicly available dataset of comments posted on independent and international news sites between 2015 and 2017. The data was labeled for six attributes: constructive (Reasoning), curiosity, respect, empathy (Compassion), alienation, and moral outrage. Because these attributes are rare, the candidate comments were first scored by a proprietary model and then filtered by score to increase the proportion of in-class comments. The data was sent to a pool of 7 annotators in three batches, which allowed the data sampling to be improved iteratively. Later batches constrain the minimum and maximum text length and limit the amount of text dealing with Canadian politics by dropping comments containing the terms “Trudeau” and “Canada”. Each comment received 4 annotations from the annotator pool. Additionally, 698 of the 11,973 examples have identity terms labeled within the Civil Comments dataset.
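The filtering steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline the authors used: the model score, the `min_score` threshold, and the character-length window are hypothetical placeholders (the README does not name the model or give the actual values), and the term filter assumes a comment is dropped if it contains either “Trudeau” or “Canada” as a case-sensitive substring. `Candidate`, `keep_comment`, and `filter_batch` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Terms the README names for limiting Canadian-politics content.
EXCLUDED_TERMS = ("Trudeau", "Canada")


@dataclass
class Candidate:
    text: str
    score: float  # hypothetical model score in [0, 1] for the target attribute


def keep_comment(c: Candidate, min_score: float = 0.5, min_len: int = 20,
                 max_len: int = 1000, later_batch: bool = True) -> bool:
    """Return True if a candidate passes the (assumed) filters."""
    # Score filter: keep only comments the model rates as likely in-class.
    if c.score < min_score:
        return False
    if later_batch:
        # Later batches: enforce a length window (measured in characters, by assumption).
        if not (min_len <= len(c.text) <= max_len):
            return False
        # Drop comments mentioning either term (case-sensitive substring match, by assumption).
        if any(term in c.text for term in EXCLUDED_TERMS):
            return False
    return True


def filter_batch(candidates: Iterable[Candidate], **kwargs) -> List[Candidate]:
    """Apply keep_comment to every candidate in a batch."""
    return [c for c in candidates if keep_comment(c, **kwargs)]


# Toy batch (invented examples, not taken from the dataset).
batch = [
    Candidate("Could you explain what you mean by that?", 0.9),
    Candidate("The Trudeau government announced a new housing plan today.", 0.8),
    Candidate("ok", 0.95),
    Candidate("I see your point, and I appreciate the detail here.", 0.2),
]
kept = filter_batch(batch)
print([c.text for c in kept])  # ['Could you explain what you mean by that?']
```

With these placeholder settings only the first comment survives: the second mentions “Trudeau”, the third is too short, and the fourth scores below the threshold. Passing `later_batch=False` mimics an early batch, where only the score filter applies.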

Definitions

| Label | Definition |
|---|---|
| Reasoning | Makes specific or well-reasoned points to provide a fuller understanding of the topic without disrespect or provocation. |
| Curiosity | Attempts to clarify or ask follow-up questions to better understand another person or topic. |
| Respect | Shows deference or appreciation to others, or acknowledges the validity of another person. |
| Compassion | Expressions of care and concern for others, understanding the feelings or viewpoint of others, including support or condolences. |
| Alienation | Portrays someone as inferior, implies a lack of belonging, or frames the statement in an "us vs. them" context. |
| Moral outrage | Anger, disgust, or frustration directed toward other people or entities who seem to violate the author’s ethical values or standards. |
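Each comment received four annotations (see Curation and labeling), but this README does not say how those annotations are combined or which agreement statistic was used. The sketch below assumes each annotator gives a binary present/absent judgment per attribute, collapses the four votes with a strict-majority rule (a 2-2 tie counts as absent, an arbitrary choice), and measures agreement with Fleiss' kappa, a standard chance-corrected statistic for a fixed number of raters per item, swapped in here purely for illustration. The function names and toy votes are hypothetical.

```python
from typing import List, Sequence

Votes = Sequence[int]  # one 0/1 judgment per annotator for a single attribute (assumed binary)


def positive_fraction(votes: Votes) -> float:
    """Share of annotators who marked the attribute as present (a soft label)."""
    return sum(votes) / len(votes)


def majority_label(votes: Votes) -> int:
    """1 if a strict majority marked the attribute present; ties map to 0."""
    return int(2 * sum(votes) > len(votes))


def fleiss_kappa(items: Sequence[Votes]) -> float:
    """Fleiss' kappa for binary votes with the same number of raters per item."""
    n_items = len(items)
    n_raters = len(items[0])
    if n_raters < 2 or any(len(v) != n_raters for v in items):
        raise ValueError("need the same number (>= 2) of annotations per item")
    total_pos = 0
    agreement_sum = 0.0
    for votes in items:
        pos = sum(votes)
        neg = n_raters - pos
        total_pos += pos
        # P_i: fraction of annotator pairs that agree on this item.
        agreement_sum += (pos * pos + neg * neg - n_raters) / (n_raters * (n_raters - 1))
    p_bar = agreement_sum / n_items
    # Expected chance agreement from the overall category proportions.
    p_pos = total_pos / (n_items * n_raters)
    p_e = p_pos ** 2 + (1 - p_pos) ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every vote is in one category")
    return (p_bar - p_e) / (1 - p_e)


# Toy votes from four annotators on four comments (not real dataset values).
items: List[List[int]] = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
print([majority_label(v) for v in items])  # [1, 0, 0, 1]
print(round(fleiss_kappa(items), 3))  # 0.407
```

On these toy votes kappa works out to exactly 11/27. `positive_fraction` is kept alongside the hard majority rule for readers who would rather train on the full vote share than discard the information in split decisions.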

Copyright and license

All data in this repository is made available under the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. A full copy of the license can be found at https://creativecommons.org/publicdomain/zero/1.0/

