
Multilingual sentence-level bias detection in Wikipedia

  • University of Montreal
  • Computer Research Institute of Montreal

Research output: Chapter in book, report, or conference proceedings › Conference contribution › Peer-reviewed

16 Citations (Scopus)

Abstract

We propose a multilingual method for the extraction of biased sentences from Wikipedia, and use it to create corpora in Bulgarian, French and English. Sifting through the revision history of the articles that at some point had been considered biased and later corrected, we retrieve the last tagged and the first untagged revisions as the before/after snapshots of what was deemed a violation of Wikipedia's neutral point of view policy. We extract the sentences that were removed or rewritten in that edit. The approach yields sufficient data even in the case of relatively small Wikipedias, such as the Bulgarian one, where 62k articles produced 5k biased sentences. We evaluate our method by manually annotating 520 sentences for Bulgarian and French, and 744 for English. We assess the level of noise and analyze its sources. Finally, we exploit the data with well-known classification methods to detect biased sentences. Code and datasets are hosted at https://github.com/crim-ca/wiki-bias.
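The before/after extraction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the crim-ca/wiki-bias repository. It assumes revisions arrive in chronological order as raw wikitext. It treats an English `{{POV}}`/`{{NPOV}}` template as the bias tag, although tag names differ across language editions. It uses a naive regex sentence splitter and `difflib.SequenceMatcher` to find sentences that were deleted or replaced. The names `Revision`, `TAG_RE`, `find_before_after` and `removed_or_rewritten` are hypothetical.

```python
import difflib
import re
from dataclasses import dataclass

# Hypothetical tag pattern (assumption): matches English {{POV}}, {{NPOV}},
# {{POV-section|...}}. Other language editions use different template names.
TAG_RE = re.compile(r"\{\{\s*N?POV\b[^}]*\}\}", re.IGNORECASE)


@dataclass
class Revision:
    rev_id: int
    text: str  # raw wikitext of this revision


def is_tagged(rev: Revision) -> bool:
    return TAG_RE.search(rev.text) is not None


def find_before_after(revisions: list[Revision]) -> list[tuple[Revision, Revision]]:
    """Pair each last-tagged revision with the first untagged revision that follows it."""
    return [
        (prev, curr)
        for prev, curr in zip(revisions, revisions[1:])
        if is_tagged(prev) and not is_tagged(curr)
    ]


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def removed_or_rewritten(before: str, after: str) -> list[str]:
    """Sentences of `before` that were deleted or replaced in `after`."""
    old = split_sentences(TAG_RE.sub("", before))  # drop the tag itself
    new = split_sentences(TAG_RE.sub("", after))
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    biased = []
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):
            biased.extend(old[i1:i2])
    return biased


if __name__ == "__main__":
    history = [
        Revision(1, "Paris is the capital of France."),
        Revision(2, "{{POV}} Paris is the capital of France. "
                    "It is by far the most beautiful city on Earth."),
        Revision(3, "Paris is the capital of France. It is often described as beautiful."),
    ]
    for before, after in find_before_after(history):
        # prints: 2 3 ['It is by far the most beautiful city on Earth.']
        print(before.rev_id, after.rev_id, removed_or_rewritten(before.text, after.text))
```

Comparing whole sentences rather than tokens keeps the output at the sentence level the abstract targets. The trade-off is that any sentence touched by even a one-word fix counts as rewritten, which is one plausible source of the noise the authors measure.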

Original language: English
Title of host publication: International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
Editors: Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Publisher: Incoma Ltd
Pages: 42-51
Number of pages: 10
ISBN (Electronic): 9789544520557
Publication status: Published - 2019
Externally published: Yes
Event: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 - Varna, Bulgaria
Duration: 2 Sept 2019 → 4 Sept 2019

Publication series

Name: International Conference Recent Advances in Natural Language Processing, RANLP
Volume: 2019-September
ISSN (Print): 1313-8502

Conference

Conference: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
Country/Territory: Bulgaria
City: Varna
Period: 2/09/19 → 4/09/19
