TY - GEN
T1 - Turning silver into gold
T2 - 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
AU - Ménard, Pierre André
AU - Mougeot, Antoine
N1 - Publisher Copyright: © 2019 Association for Computational Linguistics (ACL). All rights reserved.
PY - 2019
Y1 - 2019
N2 - While high-quality gold standard annotated corpora are crucial for most tasks in natural language processing, many annotated corpora published in recent years, created by annotators or tools, contain noisy annotations. These corpora can be viewed as more silver than gold standards, even if they are used in evaluation campaigns or to compare systems' performances. As upgrading a silver corpus to gold level is still a challenge, we explore the application of active learning techniques to detect errors using four datasets designed for document classification and part-of-speech tagging. Our results show that the proposed method for the seeding step improves the chance of finding incorrect annotations by a factor of 2.73 when compared to random selection, a 14.71% increase from the baseline methods. Our query method provides an increase in the error detection precision on average by a factor of 1.78 against random selection, an increase of 61.82% compared to other query approaches.
AB - While high-quality gold standard annotated corpora are crucial for most tasks in natural language processing, many annotated corpora published in recent years, created by annotators or tools, contain noisy annotations. These corpora can be viewed as more silver than gold standards, even if they are used in evaluation campaigns or to compare systems' performances. As upgrading a silver corpus to gold level is still a challenge, we explore the application of active learning techniques to detect errors using four datasets designed for document classification and part-of-speech tagging. Our results show that the proposed method for the seeding step improves the chance of finding incorrect annotations by a factor of 2.73 when compared to random selection, a 14.71% increase from the baseline methods. Our query method provides an increase in the error detection precision on average by a factor of 1.78 against random selection, an increase of 61.82% compared to other query approaches.
UR - https://www.scopus.com/pages/publications/85076477516
U2 - 10.26615/978-954-452-056-4_088
DO - 10.26615/978-954-452-056-4_088
M3 - Contribution to conference proceedings
AN - SCOPUS:85076477516
T3 - International Conference Recent Advances in Natural Language Processing, RANLP
SP - 758
EP - 767
BT - International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
A2 - Angelova, Galia
A2 - Mitkov, Ruslan
A2 - Nikolova, Ivelina
A2 - Temnikova, Irina
PB - Incoma Ltd
Y2 - 2 September 2019 through 4 September 2019
ER -