The automated detection of anomalies in data sets makes it possible to respond to changes more quickly and effectively
Individual data points or trends in a data set that deviate from the expected patterns are considered anomalies. They provide valuable insights into potential trends, changing consumer behavior, or possible sources of error in the applications used. These insights are primarily used to adjust strategy or modify business processes where necessary.
A univariate time series is a sequence of data points for a single variable, recorded at regular intervals. Such series are often used to monitor key performance indicators or industrial processes. In the age of the Internet of Things (IoT) and connected real-time data sources, numerous applications produce important data that changes over time. Analyzing such time series provides valuable insights for many applications.
The specific characteristics of time series mean that anomaly detection often involves particular challenges.
Anomaly detection involves finding patterns that deviate from normal behavior. A distinction is usually made between three types of anomalies: point anomalies, contextual anomalies, and collective anomalies.
Depending on whether the available data is labeled (each data point marked as normal or abnormal) or unlabeled, anomaly detection can be implemented in three different ways: supervised, semi-supervised, or unsupervised. Detection can be performed either in the time domain or in a transformed domain (e.g. the frequency domain).
In the case of time series, anomaly detection must take into account that the data forms an ordered sequence. Typical examples of anomalies in time series from a business perspective include unexpected spikes and drops, trend changes, and level shifts.
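To make these anomaly types concrete, the following sketch (entirely synthetic data and a hypothetical helper function, not taken from any real system) generates a series containing both a point anomaly and a level shift, then flags deviations with a simple rolling z-score test:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily KPI: baseline noise, one point anomaly (spike),
# and one level shift halfway through the series.
n = 200
series = rng.normal(loc=100.0, scale=2.0, size=n)
series[60] += 15.0          # unexpected spike (point anomaly)
series[120:] += 10.0        # level shift

def rolling_zscore_anomalies(x, window=30, threshold=3.0):
    """Flag points that deviate strongly from a trailing window's mean."""
    flags = []
    for i in range(window, len(x)):
        past = x[i - window:i]
        z = (x[i] - past.mean()) / past.std()
        if abs(z) > threshold:
            flags.append(i)
    return flags

anomalies = rolling_zscore_anomalies(series)
print(anomalies)
```

Note that a simple z-score test catches the spike and the onset of the level shift, but it treats both identically; distinguishing a transient spike from a persistent shift requires the model-based methods discussed below.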
Statistics-based: One common method decomposes the time series into trend, seasonal, and residual components and then applies the median absolute deviation (MAD) to the residual for robust anomaly detection. Another method is based on Robust Principal Component Analysis (RPCA), which separates a low-rank representation of the data from noise and anomalies through repeated singular value decomposition (SVD), applying thresholds to the singular values and reconstruction errors at each iteration.
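A minimal sketch of the decomposition approach, assuming a known seasonal period and using a moving-average trend estimate in place of a full STL decomposition (the series, the injected anomaly, and all thresholds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Weekly-seasonal synthetic series with an injected point anomaly.
period = 7
n = 28 * period
t = np.arange(n)
series = 0.05 * t + 5 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.5, n)
series[100] += 8.0  # point anomaly

# 1) Trend: centered moving average over one full season
#    (edge effects from zero-padding are ignored in this sketch).
kernel = np.ones(period) / period
trend = np.convolve(series, kernel, mode="same")

# 2) Seasonal component: average detrended value per position in the cycle.
detrended = series - trend
seasonal = np.array([detrended[i::period].mean() for i in range(period)])
seasonal = np.tile(seasonal, n // period)

# 3) Residual and a robust threshold via the median absolute deviation (MAD).
resid = series - trend - seasonal
mad = np.median(np.abs(resid - np.median(resid)))
threshold = 3.0 * 1.4826 * mad  # 1.4826 scales MAD to a std-dev equivalent
anomalies = np.where(np.abs(resid - np.median(resid)) > threshold)[0]
print(anomalies)
```

Because the threshold is built from the median rather than the mean, the injected spike barely influences it, which is exactly why MAD-based cutoffs are preferred over standard-deviation cutoffs here.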
Prediction-based: This category includes methods such as the moving average, autoregressive moving average (ARMA) models and their extensions (e.g. ARIMA), exponential smoothing, and Kalman filters. These are used to build a prediction model for the signal; anomalies are then detected by comparing the predicted and observed signals using statistical tests.
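A simplified prediction-based detector, using simple exponential smoothing as the one-step-ahead forecaster (a stand-in for the ARMA/ARIMA or Kalman-filter models named above; the data and thresholds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# KPI series with a sudden drop that a one-step-ahead forecast should catch.
series = rng.normal(50.0, 1.0, 150)
series[100] -= 12.0  # unexpected drop

def ses_anomalies(x, alpha=0.3, threshold=4.0):
    """Simple exponential smoothing as a one-step-ahead predictor.

    A point is flagged when its prediction error exceeds `threshold`
    times the running estimate of the error's standard deviation.
    """
    level = x[0]
    errors = []
    flagged = []
    for i in range(1, len(x)):
        err = x[i] - level           # prediction error vs. forecast `level`
        if len(errors) > 10:
            sigma = np.std(errors)
            if abs(err) > threshold * sigma:
                flagged.append(i)
        errors.append(err)
        level = level + alpha * err  # update the smoothed level
    return flagged

flags = ses_anomalies(series)
print(flags)
```

The same comparison logic applies unchanged if the smoothing forecast is replaced by an ARIMA or Kalman-filter prediction; only the forecasting step differs.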
Hidden Markov model-based (HMM): These methods model the system as a hidden Markov model: a finite state machine whose internal states cannot be observed directly; only the outputs they emit are visible. It is assumed that the normal time series is generated by such a hidden process. The model assigns probabilities to observed data sequences, and observations that are highly improbable under the model are flagged as anomalies.
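As a simplified illustration of the probabilistic idea, the sketch below uses a fully observable Markov chain over discretized values rather than a true HMM (a real HMM would additionally infer the hidden states, e.g. with the Baum-Welch algorithm); transitions with very low learned probability are flagged:

```python
import numpy as np

rng = np.random.default_rng(3)

# Discretize a smooth periodic series into states and learn which
# state-to-state transitions are probable under normal behavior.
series = np.sin(np.linspace(0, 20 * np.pi, 1000)) + rng.normal(0, 0.05, 1000)
series[500] = 3.0  # observation far outside the learned behavior

n_states = 8
bins = np.linspace(series.min(), series.max(), n_states + 1)[1:-1]
states = np.digitize(series, bins)

# Estimate the transition matrix with Laplace smoothing
# (the +1 counts avoid zero probabilities for unseen transitions).
counts = np.ones((n_states, n_states))
for a, b in zip(states[:-1], states[1:]):
    counts[a, b] += 1
trans = counts / counts.sum(axis=1, keepdims=True)

# Flag points reached via a transition of very low learned probability.
log_probs = np.log(trans[states[:-1], states[1:]])
anomalies = np.where(log_probs < np.log(0.02))[0] + 1
print(anomalies)
```

The key property carries over to full HMMs: the model scores how probable each observation is given the sequence so far, and improbable observations are exactly the anomalies.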
Decision-based: Current methods include long short-term memory (LSTM) networks, a type of recurrent neural network. Classification and regression trees (CART) are also used to perform binary classification (normal vs. abnormal); extreme gradient boosting (XGBoost), which trains ensembles of CARTs, is a popular implementation. Both approaches can also be applied as prediction-based methods.
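To illustrate the tree-based idea in miniature, the following toy sketch fits a single decision stump, i.e. one node of a CART, to labeled deviation features; a real system would grow full trees, or an XGBoost ensemble, over many features (all data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)

# Labeled training data: one feature per point (e.g. absolute deviation
# from a local forecast) and a label (0 = normal, 1 = anomaly). In practice
# these features would be derived from the time series; here they are
# drawn synthetically so normal and abnormal points are separable.
normal = np.abs(rng.normal(0, 1, 300))
abnormal = np.abs(rng.normal(8, 1, 15))
features = np.concatenate([normal, abnormal])
labels = np.concatenate([np.zeros(300), np.ones(15)])

def best_stump(x, y):
    """Find the single threshold split minimizing misclassifications.

    This is one node of a CART; a full tree (or an XGBoost ensemble)
    applies such splits recursively across many features.
    """
    best_t, best_err = None, len(y) + 1
    for t in np.unique(x):
        pred = (x > t).astype(float)
        err = np.sum(pred != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

threshold, errors = best_stump(features, labels)
print(threshold, errors)
```

A stump trained on well-separated features achieves near-zero error; real anomaly data is rarely this clean, which is why boosted ensembles of many such splits are used in practice.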
There are various key performance indicators (KPIs) in e-commerce that are suitable for time series analysis.
The diagram shows an example of anomaly detection in a consumer goods retailer’s sales time series. The time series (blue) includes both trend and seasonal components; point anomalies are highlighted in red. The detected increases and decreases are then analyzed further to determine their causes and controlling factors.