
An Icelandic error corpus annotated for mistakes related to spelling, grammar, and other issues.




Icelandic Error Corpus (IceEC)

Version 1.1

Copyright 2021 Anton Karl Ingason, Lilja Björk Stefánsdóttir, Þórunn Arnardóttir, Xindan Xu.

Repository: https://github.com/antonkarl/iceErrorCorpus

Contact: [email protected]

License: Creative Commons Attribution 4.0 International (CC BY 4.0; see the repository for the license text)

The Icelandic Error Corpus (IceEC) is a collection of texts in modern Icelandic annotated for mistakes related to spelling, grammar, and other issues. The texts are organized by genre. The current version includes sentences from student essays, online news texts and Wikipedia articles.

Sentences within each student essay had to be shuffled because of the license under which the essays were originally published. The online news texts and the Wikipedia articles did not need to be shuffled.
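The README does not say how the shuffling was carried out. As a rough illustration only, shuffling sentences within each text (and never across texts) could look like the sketch below. The function name `shuffle_within_texts`, the dictionary layout, and the sample sentences are all hypothetical and are not taken from the corpus or its tooling.

```python
import random


def shuffle_within_texts(texts, seed=0):
    """Return a copy of `texts` with the sentences of each text shuffled.

    `texts` maps a (hypothetical) text identifier to its list of sentences.
    Sentences never move between texts; only their order inside a text changes.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    shuffled = {}
    for text_id, sentences in texts.items():
        permuted = list(sentences)  # copy, so the input is left untouched
        rng.shuffle(permuted)
        shuffled[text_id] = permuted
    return shuffled


# Made-up sample data, not actual corpus content.
essays = {
    "essay_1": ["Sentence A.", "Sentence B.", "Sentence C."],
    "essay_2": ["Sentence D.", "Sentence E."],
}
result = shuffle_within_texts(essays)
```

Each text keeps exactly the same sentences, so sentence-level annotations remain usable, but the original running order of an essay can no longer be read off the data.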

Citing the Error Corpus:

Anton Karl Ingason, Lilja Björk Stefánsdóttir, Þórunn Arnardóttir, and Xindan Xu. 2021. The Icelandic Error Corpus (IceEC). Version 1.1. (https://github.com/antonkarl/iceErrorCorpus)

The project is funded by the Icelandic Government as part of the Language Technology Programme for Icelandic 2019–2023, which is described in the following publication:

Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson. 2020. Language Technology Programme for Icelandic 2019–2023. Proceedings of LREC 2020 (https://arxiv.org/pdf/2003.09244.pdf)

Íslensk villumálheild (IceEC)

Version 1.1

Copyright 2021 Anton Karl Ingason, Lilja Björk Stefánsdóttir, Þórunn Arnardóttir, Xindan Xu.

Repository: https://github.com/antonkarl/iceErrorCorpus

Contact: [email protected]

License: Creative Commons Attribution 4.0 International (CC BY 4.0; see the license text in the repository).

The Icelandic Error Corpus is a collection of texts in modern Icelandic that are annotated for errors, e.g. in spelling, grammar, and more. The texts are classified by genre. This version contains sentences from student essays, news from online media, and Wikipedia articles.

The sentences in the student essays had to be shuffled because of the license under which they were originally published, but the sentences in the news texts and the Wikipedia articles did not need to be shuffled.

Citation:

Anton Karl Ingason, Lilja Björk Stefánsdóttir, Þórunn Arnardóttir, and Xindan Xu. 2021. The Icelandic Error Corpus (IceEC). Version 1.1. (https://github.com/antonkarl/iceErrorCorpus)

This project is funded by the Icelandic treasury as part of the Language Technology Programme for Icelandic 2019–2023. The programme is described in more detail in the following article:

Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson. 2020. Language Technology Programme for Icelandic 2019–2023. Proceedings of LREC 2020 (https://arxiv.org/pdf/2003.09244.pdf)
