Feature/149-Add-benchmark-datasets #157

NoB0 · 2024-05-14T13:02:42Z

What's changed?

Reorganize the data folder: dialogues in the DialogueKit (DK) format are now saved under data/datasets/<dataset_name>
Add a script that downloads the ReDial dataset from its source and processes it to extract items, ratings, and dialogues in the DialogueKit format
Add a script that artificially augments ReDial dialogues with dialogue acts and information needs. This is needed to train TUS
Item collections (including items and ratings) are now stored in data/item_collections
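
For reviewers unfamiliar with the new layout, here is a rough sketch of the extraction step described above. It is not the code in scripts/datasets/redial/format_redial.py. The function names (`dataset_dir`, `item_collection_dir`, `split_record`) are hypothetical. The input field names follow the public ReDial JSONL release as I understand it, and the output dictionary is only a simplified stand-in for the DialogueKit dialogue format, so treat both as assumptions.

```python
import re
from pathlib import Path
from typing import Any, Dict, List, Tuple

DATA_ROOT = Path("data")


def dataset_dir(dataset_name: str) -> Path:
    """Folder holding the DK-formatted dialogues of one dataset."""
    return DATA_ROOT / "datasets" / dataset_name


def item_collection_dir() -> Path:
    """Folder holding item collections (items and ratings)."""
    return DATA_ROOT / "item_collections"


def split_record(
    record: Dict[str, Any],
) -> Tuple[Dict[str, str], List[Tuple[str, str, int]], Dict[str, Any]]:
    """Splits one ReDial-style record into items, ratings, and a dialogue.

    Assumed input fields: conversationId, initiatorWorkerId, movieMentions,
    initiatorQuestions, messages (each with senderWorkerId and text).
    """
    # Items: movie id -> title. Guard against a non-dict placeholder.
    mentions = record.get("movieMentions")
    items = dict(mentions) if isinstance(mentions, dict) else {}

    # Ratings: (user id, movie id, "liked" label) from the seeker's answers.
    user_id = str(record["initiatorWorkerId"])
    questions = record.get("initiatorQuestions")
    ratings = []
    if isinstance(questions, dict):
        for movie_id, answer in questions.items():
            ratings.append((user_id, movie_id, int(answer["liked"])))

    # Dialogue: the initiator (seeker) is the USER, the other party the AGENT.
    # "@<movie id>" mentions are replaced by the title when it is known.
    turns = []
    for message in record.get("messages", []):
        is_user = str(message["senderWorkerId"]) == user_id
        text = re.sub(
            r"@(\d+)",
            lambda m: items.get(m.group(1), m.group(0)),
            message["text"],
        )
        turns.append(
            {"participant": "USER" if is_user else "AGENT", "utterance": text}
        )

    # Simplified, assumed approximation of a DK dialogue entry.
    dialogue = {
        "conversation ID": str(record["conversationId"]),
        "conversation": turns,
    }
    return items, ratings, dialogue
```

Under this sketch, the dialogues returned for ReDial would be written below `dataset_dir("redial")`, while the items and ratings would go under `item_collection_dir()`, mirroring the split between data/datasets/<dataset_name> and data/item_collections.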

Part of #149

github-actions · 2024-05-14T13:12:14Z


kbalog

LGTM with some comments.

data/datasets/README.md

scripts/datasets/information_need_annotation/information_need_annotator.py

scripts/datasets/information_need_annotation/information_need_prompt_default.txt

scripts/datasets/information_need_annotation/information_need_annotator.py

scripts/datasets/redial/format_redial.py

NoB0 added 2 commits May 14, 2024 14:59

Reorganize data folder

60584c2

Add script to format ReDial dataset to DK format

b3dc9b9

NoB0 added 4 commits May 14, 2024 15:57

Fix pre-commit

0f951d4

Merge branch 'main' into feature/149-Add-benchmark-datasets

fd2405d

Add augmentation script

c754725

Remove unnecessary comments

7d4fd7c

NoB0 marked this pull request as ready for review October 1, 2024 09:45

NoB0 added 4 commits October 1, 2024 11:45

Formatting

5e6a271

Add type annotation

567ae66

Add serialization to IN

1be7048

Reorganize scripts

a666f3f

NoB0 requested a review from kbalog October 1, 2024 10:22

Quick fix

bccda4c

kbalog approved these changes Oct 1, 2024


Address review comments

c15bdb5

NoB0 merged commit f5bfd46 into main Oct 1, 2024
5 checks passed

NoB0 deleted the feature/149-Add-benchmark-datasets branch October 1, 2024 14:14

NoB0 mentioned this pull request Oct 1, 2024

Add benchmark datasets #149

Open