Abstract
Transformer-based models have shown outstanding results in natural language processing but face challenges in applications such as classifying small-scale clinical texts, especially with constrained computational resources. This study presents a customized Mixture of Experts (MoE) Transformer model for classifying small-scale French clinical texts at CHU Sainte-Justine Hospital. The MoE-Transformer addresses the dual challenges of effective training with limited data and low-resource computation suitable for in-house hospital use. Despite the success of biomedical pre-trained models such as CamemBERT-bio, DrBERT, and AliBERT, their high computational demands make them impractical for many clinical settings. Our MoE-Transformer model not only outperforms DistilBERT, CamemBERT, FlauBERT, and standard Transformer models on the same dataset but also achieves impressive results: an accuracy of 87%, precision of 87%, recall of 85%, and F1-score of 86%. While the MoE-Transformer does not surpass the performance of biomedical pre-trained BERT models, it can be trained at least 190 times faster, offering a viable alternative for settings with limited data and computational resources. Although the MoE-Transformer addresses the challenges of generalization gaps and sharp minima, it still shows some limitations for efficient and accurate clinical text classification; nevertheless, it represents a significant advancement in the field. It is particularly valuable for classifying small French clinical narratives within the privacy constraints and limited computational resources of a hospital setting.
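The abstract names a Mixture of Experts Transformer but gives no implementation detail. As a purely illustrative aid, here is a minimal PyTorch sketch of a token-level MoE feed-forward block of the kind the title suggests: a small gating network produces per-token routing weights over a handful of expert feed-forward networks, and their outputs are mixed accordingly. All names, sizes, and the soft (dense) gating choice are assumptions for illustration, not the authors' code or configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Token-level mixture-of-experts feed-forward block.

    Illustrative sketch only: the expert count, hidden sizes, and
    dense softmax gating are assumptions, not the paper's settings.
    """

    def __init__(self, d_model=256, d_hidden=512, num_experts=4):
        super().__init__()
        # Each expert is an ordinary position-wise feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        # The gate (router) scores each expert for every token.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        weights = F.softmax(self.gate(x), dim=-1)  # (B, T, E) routing weights
        # With few experts it is simplest to run them all and mix their
        # outputs by the gate weights (sparse top-k routing is a common
        # alternative for larger expert counts).
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, D, E)
        return torch.einsum("btde,bte->btd", expert_out, weights)

# Quick shape check on random token embeddings.
layer = MoEFeedForward()
tokens = torch.randn(2, 16, 256)
print(layer(tokens).shape)  # torch.Size([2, 16, 256])
```

In a Transformer encoder, a block like this would replace the usual feed-forward sublayer, letting experts specialize on different token patterns while keeping the per-layer parameter budget modest.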
| Original language | English |
|---|---|
| Pages (from-to) | 261-274 |
| Number of pages | 14 |
| Journal | IEEE Journal of Translational Engineering in Health and Medicine |
| Volume | 13 |
| DOIs | |
| Status | Published - 2025 |
Fingerprint
These are the main terms or phrases associated with "Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset". These topic labels are generated from the title and abstract of the publication. Together they form a unique fingerprint.