DAMSL tags don't match #1

Open
lzfelix opened this issue Jan 14, 2018 · 5 comments

@lzfelix

lzfelix commented Jan 14, 2018

First, I would like to thank you for making this code available.

Section 1c of the coder's manual contains the table of the 42 clustered labels, although the table actually has 43 rows, as you mention on your page. However, one of these classes is "% -", which does not occur in the dataset (I scanned it and found 0 matches). If the classes "% -" and "%" are merged (since they have similar meanings), we are back to 42 classes, as desired. This seems to be what Stolcke et al. [1] did, as shown in their Table 2. I also noticed that on your page "% -" has the same full count as "%".
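The merge described above can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the function names `normalize_tag` and `count_tags` are hypothetical, and the spellings "% -" and "%-" for the unused class are assumptions about how it might appear if it ever did occur.

```python
from collections import Counter

# Assumed spellings of the "% -" class; both are folded into "%".
MERGED_INTO_PERCENT = {"% -", "%-"}


def normalize_tag(tag: str) -> str:
    """Strip surrounding whitespace and fold "% -" into "%"."""
    tag = tag.strip()
    return "%" if tag in MERGED_INTO_PERCENT else tag


def count_tags(tags):
    """Count normalized tags, e.g. to check how many classes remain."""
    return Counter(normalize_tag(t) for t in tags)


if __name__ == "__main__":
    # Toy input, not real corpus data.
    sample = ["sd", "%", "% -", "b", "%-", "sd"]
    print(count_tags(sample))
```

Running `count_tags` over every tag in the corpus and checking the number of distinct keys would be one way to confirm that 42 classes remain after the merge.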

[1] Stolcke, Andreas, et al. "Dialogue act modeling for automatic tagging and recognition of conversational speech." Computational linguistics 26.3 (2000): 339-373.

Thanks.

@ruizheliUOA

@lzfelix You are right. When I processed this dataset, I did not find any "% -" tag either. I also could not work out how to handle the "+" tag: it is not among the 42 tags, yet it occurs more than 10,000 times in the original dataset. Should the "+" tag be replaced with the tag of the previous utterance from the same speaker?

@tnlin

tnlin commented Aug 25, 2018

@ruizheliUOA Same here, I don't know what to do with the "+" tag. I have checked many papers but found no answer... Replacing it with the tag of the previous utterance from the same speaker seems reasonable.

Reference:
1997_Switchboard SWBD-DAMSL Shallow-Discourse-Function Annotation
2000_Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
2017_Unsupervised Dialogue Act Induction using Gaussian Mixtures

@lzfelix

lzfelix commented Aug 25, 2018

To my understanding, you can either do that or simply disregard these utterances, depending on your problem.
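Both options mentioned here can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: utterances are assumed to be `(speaker, tag, text)` tuples in dialogue order, and the function name `resolve_plus_tags` and its `strategy` parameter are hypothetical.

```python
def resolve_plus_tags(utterances, strategy="inherit"):
    """Handle "+" tags in a list of (speaker, tag, text) tuples.

    strategy="inherit": replace "+" with the same speaker's previous tag.
    strategy="drop":    discard every utterance tagged "+".
    """
    if strategy not in ("inherit", "drop"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    last_tag = {}  # speaker -> most recent resolved tag
    resolved = []
    for speaker, tag, text in utterances:
        if tag == "+":
            if strategy == "drop":
                continue
            previous = last_tag.get(speaker)
            if previous is None:
                # Nothing to inherit from, so the utterance is skipped.
                continue
            tag = previous
        last_tag[speaker] = tag
        resolved.append((speaker, tag, text))
    return resolved


if __name__ == "__main__":
    # Toy dialogue, not real corpus data.
    dialogue = [
        ("A", "sd", "I went to the"),
        ("B", "b", "Uh-huh."),
        ("A", "+", "store yesterday."),
    ]
    print(resolve_plus_tags(dialogue, strategy="inherit"))
    print(resolve_plus_tags(dialogue, strategy="drop"))
```

Inheriting keeps every utterance in the training data, while dropping avoids assigning a label that the annotators never gave; which is preferable depends on the task.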

@tnlin

tnlin commented Sep 11, 2018

FYI, this paper discusses the "+" label (finally...):
AAAI 2005 Dialogue Act Classification Based on Intra-Utterance Features
http://staffwww.dcs.shef.ac.uk/people/Y.Wilks/papers/AAAI05_A.pdf

@Shrinidhi-C

Utterances marked as "+" are continuations of an utterance that was interrupted.
They should be concatenated with the interrupted utterance that they continue.
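A minimal sketch of this concatenation, assuming utterances are `(speaker, tag, text)` tuples in dialogue order and that a "+" segment continues the same speaker's most recent utterance. The function name `merge_continuations` is hypothetical, and a "+" segment with no earlier utterance from its speaker is simply left unchanged here.

```python
def merge_continuations(utterances):
    """Append each "+" segment to the same speaker's latest utterance."""
    merged = []
    last_index = {}  # speaker -> position in `merged` of their latest utterance
    for speaker, tag, text in utterances:
        if tag == "+" and speaker in last_index:
            i = last_index[speaker]
            prev_speaker, prev_tag, prev_text = merged[i]
            # Keep the original tag and extend the text with the continuation.
            merged[i] = (prev_speaker, prev_tag, prev_text + " " + text)
        else:
            last_index[speaker] = len(merged)
            merged.append((speaker, tag, text))
    return merged


if __name__ == "__main__":
    # Toy dialogue, not real corpus data.
    dialogue = [
        ("A", "sd", "I went to the"),
        ("B", "b", "Uh-huh."),
        ("A", "+", "store yesterday."),
    ]
    print(merge_continuations(dialogue))
```

Note that the merged utterance stays at the position of the interrupted segment, so the interrupting turn now follows the complete utterance.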
