TY - GEN
T1 - Cluster-Based Symbolic Compression of Time Series for Scalable Forecasting and Analysis
AU - Jararweh, Yaser
AU - Daraghmeh, Mustafa
AU - Agarwal, Anjali
AU - Kaur, Kuljeet
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The increasing volume and velocity of time series data make it very challenging to perform effective and scalable predictive modeling. This paper presents a novel time series compression and transformation pipeline to address this challenge. We have combined three key components in a novel approach: sliding window segmentation for identifying local patterns, centroid-based clustering for converting continuous values into discrete symbols, and run-length encoding for compact representation of similar symbolic sequences. This combination allows us to convert raw time series data into easily understandable numerical symbolic representations, capturing essential time patterns while significantly reducing the volume of data. We utilized real-world metrics from Azure Function system-wide traces, including the number of applications, functions, invocations, and average execution time, which were collected every 5 minutes over a 14-day period, to evaluate the proposed method. Experimental results demonstrate impressive compression ratios while preserving pattern interpretations with lightweight computation, highlighting the method's effectiveness in reducing the learning cost for regression and forecasting tasks. The resulting model-agnostic representation can be easily applied to a variety of machine learning architectures, including recurrent and transformer-based models, providing an effective solution for scalable time series analytics in various fields, including both cloud and edge computing environments.
KW - Pattern Preservation
KW - Run-Length Encoding
KW - Symbolic Representation
KW - Time-Series Compression
UR - https://www.scopus.com/pages/publications/105026281751
U2 - 10.1109/GACLM67198.2025.11231829
DO - 10.1109/GACLM67198.2025.11231829
M3 - Contribution to conference proceedings
AN - SCOPUS:105026281751
T3 - 2025 2nd International Generative AI and Computational Language Modelling Conference, GACLM 2025
SP - 256
EP - 261
BT - 2025 2nd International Generative AI and Computational Language Modelling Conference, GACLM 2025
A2 - Lloret, Jaime
A2 - Jararweh, Yaser
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Generative AI and Computational Language Modelling Conference, GACLM 2025
Y2 - 18 August 2025 through 21 August 2025
ER -