TY - GEN
T1 - Data Augmentation and Class Imbalance Compensation Using CTGAN to Improve Gas Detection Systems
AU - Mahinnezhad, Shima
AU - Mahinnezhad, Shirin
AU - Kaur, Kuljeet
AU - Shih, Andy
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - The use of sensors in gas detection systems for environmental monitoring is strongly affected by sensor drift over time, which reduces classification accuracy. This drift can be minimized by using machine learning models trained on sensor data. Here, two different machine learning models are trained on the Gas Sensor Array Drift Dataset. However, this dataset, which was collected over three years, suffers not only from drift but also from class imbalance. As a result, machine learning models perform poorly on it. To address these problems, this paper introduces a methodology for data compensation and augmentation using Conditional Tabular Generative Adversarial Networks (CTGAN). This methodology counteracts class imbalance and limits drift by increasing the diversity of the dataset, which in turn improves the accuracy of machine learning models for gas detection systems. With class imbalance compensation, Multi-Layer Perceptron (MLP) and Support Vector Machine (SVM) models improved their classification accuracy in five batches, by up to 20% in some batches. With data augmentation, they achieved higher accuracy in six batches, with improvements exceeding 10% in some batches. These results highlight the effectiveness and reliability of synthetic data generation for tabular sensor data.
KW - CTGAN
KW - Data augmentation
KW - Data balancing
KW - Gas detection
KW - Gas sensor
UR - https://www.scopus.com/pages/publications/85197759595
U2 - 10.1109/I2MTC60896.2024.10561121
DO - 10.1109/I2MTC60896.2024.10561121
M3 - Contribution to conference proceedings
AN - SCOPUS:85197759595
T3 - Conference Record - IEEE Instrumentation and Measurement Technology Conference
BT - I2MTC 2024 - Instrumentation and Measurement Technology Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Instrumentation and Measurement Technology Conference, I2MTC 2024
Y2 - 20 May 2024 through 23 May 2024
ER -