Turning silver into gold: Error-focused corpus reannotation with active learning

Pierre André Ménard; Antoine Mougeot

doi:10.26615/978-954-452-056-4_088

Computer Research Institute of Montreal

Research output: Chapter in book, report, or conference proceedings › Contribution to conference proceedings › Peer-reviewed

3 Citations (Scopus)

Abstract

While high-quality gold-standard annotated corpora are crucial for most tasks in natural language processing, many annotated corpora published in recent years, whether created by annotators or by tools, contain noisy annotations. These corpora can be viewed as more silver than gold standards, even if they are used in evaluation campaigns or to compare systems' performance. As upgrading a silver corpus to gold level is still a challenge, we explore the application of active learning techniques to detect errors using four datasets designed for document classification and part-of-speech tagging. Our results show that the proposed method for the seeding step improves the chance of finding incorrect annotations by a factor of 2.73 when compared to random selection, a 14.71% increase from the baseline methods. Our query method increases error detection precision by a factor of 1.78 on average against random selection, an increase of 61.82% compared to other query approaches.
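The "factor against random selection" figures in the abstract can be read as a ratio between the precision of a selection method and the precision expected from picking items at random. The sketch below illustrates that reading only; this record does not give the paper's exact definition, and the function names `detection_precision` and `lift_over_random` and the toy corpus are assumptions made for illustration.

```python
def detection_precision(flagged, true_errors):
    """Fraction of flagged items that are actually mislabelled."""
    flagged = set(flagged)
    if not flagged:
        return 0.0
    return len(flagged & set(true_errors)) / len(flagged)


def lift_over_random(flagged, true_errors, corpus_size):
    """Precision of a selection divided by the corpus error rate.

    The corpus error rate is the precision one would expect, on average,
    from selecting items uniformly at random.
    """
    base_rate = len(set(true_errors)) / corpus_size
    if base_rate == 0:
        raise ValueError("corpus contains no errors; lift is undefined")
    return detection_precision(flagged, true_errors) / base_rate


# Hypothetical toy corpus: 100 items, of which items 0..24 are mislabelled.
true_errors = set(range(25))
# Hypothetical selection of 8 items, 4 of which are real errors.
flagged = [1, 2, 3, 4, 90, 91, 92, 93]

print(detection_precision(flagged, true_errors))    # 0.5
print(lift_over_random(flagged, true_errors, 100))  # 2.0
```

Under this reading, a factor of 2.73 would mean that an item chosen by the seeding method is 2.73 times as likely to carry an incorrect annotation as an item drawn at random.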

Original language	English
Title of host publication	International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
Editors	Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Publisher	Incoma Ltd
Pages	758-767
Number of pages	10
ISBN (Electronic)	9789544520557
DOIs	https://doi.org/10.26615/978-954-452-056-4_088
Publication status	Published - 2019
Externally published	Yes
Event	12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 - Varna, Bulgaria. Duration: 2 Sept 2019 → 4 Sept 2019

Publication series

Name	International Conference Recent Advances in Natural Language Processing, RANLP
Volume	2019-September
ISSN (Print)	1313-8502

Conference

Conference	12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
Country/Territory	Bulgaria
City	Varna
Period	2/09/19 → 4/09/19

Access to document

10.26615/978-954-452-056-4_088

Cite this

Ménard, P. A., & Mougeot, A. (2019). Turning silver into gold: Error-focused corpus reannotation with active learning. In G. Angelova, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings (pp. 758-767). (International Conference Recent Advances in Natural Language Processing, RANLP; Vol. 2019-September). Incoma Ltd. https://doi.org/10.26615/978-954-452-056-4_088

Ménard, Pierre André; Mougeot, Antoine. / Turning silver into gold: Error-focused corpus reannotation with active learning. International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. Editor / Galia Angelova; Ruslan Mitkov; Ivelina Nikolova; Irina Temnikova. Incoma Ltd, 2019. p. 758-767 (International Conference Recent Advances in Natural Language Processing, RANLP).

@inproceedings{39dc6d4ce17b4c3aa6f401270c19ed6d,

title = "Turning silver into gold: Error-focused corpus reannotation with active learning",

abstract = "While high-quality gold-standard annotated corpora are crucial for most tasks in natural language processing, many annotated corpora published in recent years, whether created by annotators or by tools, contain noisy annotations. These corpora can be viewed as more silver than gold standards, even if they are used in evaluation campaigns or to compare systems' performance. As upgrading a silver corpus to gold level is still a challenge, we explore the application of active learning techniques to detect errors using four datasets designed for document classification and part-of-speech tagging. Our results show that the proposed method for the seeding step improves the chance of finding incorrect annotations by a factor of 2.73 when compared to random selection, a 14.71\% increase from the baseline methods. Our query method increases error detection precision by a factor of 1.78 on average against random selection, an increase of 61.82\% compared to other query approaches.",

author = "M{\'e}nard, Pierre Andr{\'e} and Mougeot, Antoine",

note = "Publisher Copyright: {\textcopyright} 2019 Association for Computational Linguistics (ACL). All rights reserved.; 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 ; Conference date: 02-09-2019 Through 04-09-2019",

year = "2019",

doi = "10.26615/978-954-452-056-4\_088",

language = "English",

series = "International Conference Recent Advances in Natural Language Processing, RANLP",

publisher = "Incoma Ltd",

pages = "758--767",

editor = "Galia Angelova and Ruslan Mitkov and Ivelina Nikolova and Irina Temnikova",

booktitle = "International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings",

}

Ménard, PA & Mougeot, A 2019, Turning silver into gold: Error-focused corpus reannotation with active learning. in G Angelova, R Mitkov, I Nikolova & I Temnikova (eds), International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. International Conference Recent Advances in Natural Language Processing, RANLP, vol. 2019-September, Incoma Ltd, pp. 758-767, 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria, 2/09/19. https://doi.org/10.26615/978-954-452-056-4_088

Turning silver into gold: Error-focused corpus reannotation with active learning. / Ménard, Pierre André; Mougeot, Antoine.
International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. Ed. / Galia Angelova; Ruslan Mitkov; Ivelina Nikolova; Irina Temnikova. Incoma Ltd, 2019. p. 758-767 (International Conference Recent Advances in Natural Language Processing, RANLP; Vol. 2019-September).

TY - GEN

T1 - Turning silver into gold: Error-focused corpus reannotation with active learning

T2 - 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019

AU - Ménard, Pierre André

AU - Mougeot, Antoine

PY - 2019

Y1 - 2019

N2 - While high-quality gold-standard annotated corpora are crucial for most tasks in natural language processing, many annotated corpora published in recent years, whether created by annotators or by tools, contain noisy annotations. These corpora can be viewed as more silver than gold standards, even if they are used in evaluation campaigns or to compare systems' performance. As upgrading a silver corpus to gold level is still a challenge, we explore the application of active learning techniques to detect errors using four datasets designed for document classification and part-of-speech tagging. Our results show that the proposed method for the seeding step improves the chance of finding incorrect annotations by a factor of 2.73 when compared to random selection, a 14.71% increase from the baseline methods. Our query method increases error detection precision by a factor of 1.78 on average against random selection, an increase of 61.82% compared to other query approaches.

UR - https://www.scopus.com/pages/publications/85076477516

U2 - 10.26615/978-954-452-056-4_088

DO - 10.26615/978-954-452-056-4_088

M3 - Contribution to conference proceedings

AN - SCOPUS:85076477516

T3 - International Conference Recent Advances in Natural Language Processing, RANLP

SP - 758

EP - 767

BT - International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings

A2 - Angelova, Galia

A2 - Mitkov, Ruslan

A2 - Nikolova, Ivelina

A2 - Temnikova, Irina

PB - Incoma Ltd

Y2 - 2 September 2019 through 4 September 2019

ER -

Ménard PA, Mougeot A. Turning silver into gold: Error-focused corpus reannotation with active learning. In Angelova G, Mitkov R, Nikolova I, Temnikova I, editors, International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. Incoma Ltd. 2019. p. 758-767. (International Conference Recent Advances in Natural Language Processing, RANLP). doi: 10.26615/978-954-452-056-4_088