Cluster-Based Symbolic Compression of Time Series for Scalable Forecasting and Analysis

  • Yaser Jararweh
  • Mustafa Daraghmeh
  • Anjali Agarwal
  • Kuljeet Kaur

Research output: Contribution to Book/Report types, Contribution to conference proceedings, peer-review

Abstract

The increasing volume and velocity of time series data make effective, scalable predictive modeling challenging. This paper presents a novel time series compression and transformation pipeline to address this challenge. The pipeline combines three components: sliding window segmentation to identify local patterns, centroid-based clustering to convert continuous values into discrete symbols, and run-length encoding to compactly represent similar symbolic sequences. Together, these steps convert raw time series data into interpretable numerical symbolic representations that capture essential temporal patterns while significantly reducing data volume. To evaluate the proposed method, we used real-world metrics from Azure Function system-wide traces, including the number of applications, functions, and invocations and the average execution time, collected every 5 minutes over a 14-day period. Experimental results demonstrate high compression ratios with preserved pattern interpretability and lightweight computation, highlighting the method's effectiveness in reducing the learning cost of regression and forecasting tasks. The resulting model-agnostic representation can be applied to a variety of machine learning architectures, including recurrent and transformer-based models, providing an effective solution for scalable time series analytics in fields such as cloud and edge computing.
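The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which window features, clustering algorithm, or parameters the paper uses. The sketch assumes that each window is summarized by its mean, that the centroids come from a plain 1-D k-means with evenly spaced starting points, and that k is at least 2. All function names and default values are hypothetical.

```python
from itertools import groupby
from statistics import mean


def window_means(series, width, step):
    """Slide a window over the series; summarize each window by its mean (assumed feature)."""
    return [mean(series[i:i + width])
            for i in range(0, len(series) - width + 1, step)]


def kmeans_1d(values, k, iterations=20):
    """Plain 1-D k-means (assumed clustering); needs k >= 2 and len(values) >= 1."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids evenly spaced between the min and max value.
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids


def symbolize(values, centroids):
    """Replace each value with the index of its nearest centroid (the discrete symbol)."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]


def run_length_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(s, len(list(run))) for s, run in groupby(symbols)]


def compress(series, width=3, step=3, k=3):
    """Window -> cluster -> symbolize -> run-length encode."""
    features = window_means(series, width, step)
    centroids = kmeans_1d(features, k)
    return run_length_encode(symbolize(features, centroids)), centroids


# Toy series (not data from the paper): 18 samples become 4 (symbol, length) pairs.
series = [1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 9, 9, 9, 1, 1, 1]
encoded, centroids = compress(series)
print(encoded)  # [(0, 2), (1, 2), (2, 1), (0, 1)]
```

The centroids are returned alongside the encoded runs so that each symbol can be mapped back to an approximate value, which is one way such a representation could stay interpretable.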

Original language: English
Title of host publication: 2025 2nd International Generative AI and Computational Language Modelling Conference, GACLM 2025
Editors: Jaime Lloret, Yaser Jararweh
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 256-261
Number of pages: 6
ISBN (Electronic): 9798331594060
Publication status: Published - 2025
Event: 2nd International Generative AI and Computational Language Modelling Conference, GACLM 2025 - Valencia, Spain
Duration: 18 Aug 2025 to 21 Aug 2025

Publication series

Name: 2025 2nd International Generative AI and Computational Language Modelling Conference, GACLM 2025

Conference

Conference: 2nd International Generative AI and Computational Language Modelling Conference, GACLM 2025
Country/Territory: Spain
City: Valencia
Period: 18/08/25 to 21/08/25

Keywords

  • Pattern Preservation
  • Run-Length Encoding
  • Symbolic Representation
  • Time-Series Compression
