Authors: Monika Wysoczańska, Paweł Kotowski. Image source: [4]
In the previous post we gave you a brief introduction to anomaly detection. There we reviewed some potential real-life applications and outlined common Machine Learning techniques used for this task. In this post we cover anomaly detection for one specific type of data: time series.
The data generated by many applications comes from a continuous temporal process and is therefore acquired and represented as a series of events. In such cases the temporal component plays a key role in outlier analysis. This scenario arises in many applications, such as sensor data, mechanical system diagnosis, medical data, network intrusion data or financial posts.
Let’s look at a simple example. Imagine you have a device and you monitor its CPU usage. As a result you obtain data like that shown in the plot below.
As you can see, there is an event with suspiciously high CPU usage. This may indicate a problem that occurred while the device was running. Detecting such anomalous events may therefore turn out to be crucial for maintaining your device.
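One simple way to flag such a spike automatically is a rolling z-score: compare each new reading with the mean and standard deviation of the readings just before it. The snippet below is only a minimal sketch of that idea, not the method used by the authors. The function name `rolling_zscore_anomalies`, the window of 10 samples, the threshold of 3 standard deviations and the CPU readings themselves are all assumptions made for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=10, threshold=3.0):
    """Return indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations.

    Hypothetical helper: window and threshold are illustrative defaults.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against,
            # so this sketch simply skips the sample.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up CPU usage readings in percent, with a single spike at index 15.
cpu_usage = [21, 23, 22, 24, 22, 21, 23, 22, 24, 23,
             22, 21, 23, 24, 22, 95, 23, 22, 21, 24]

print(rolling_zscore_anomalies(cpu_usage))  # prints [15]
```

Note that the spike itself inflates the standard deviation of the following windows, so a second anomaly shortly after the first could be masked. More robust statistics, such as the median and the median absolute deviation, are a common way to reduce that effect.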
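Due to the temporal character of time series data, outliers can be divided mainly into three groups [3]: point outliers, where a single observation deviates at a specific time instant; subsequence outliers, where a run of consecutive observations is jointly unusual even if no single point is; and outlier time series, where an entire series behaves unusually, which can only be detected when the input consists of several series.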
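This brings us to another aspect of anomaly detection in time series: the type of input data. Here the applied approaches can be divided into two groups [3]: univariate, where a single time-dependent variable is analyzed, and multivariate, where several time-dependent variables are analyzed together.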
With many features, the situation becomes complicated, since some outliers do not appear as outlying observations when each dimension is considered separately and therefore will not be detected by a univariate criterion. Thus, all the features need to be considered together using a multivariate approach.
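To make this concrete, the sketch below builds two strongly correlated features and adds one observation whose individual values are unremarkable but whose combination breaks the correlation. It compares per-feature z-scores with the Mahalanobis distance, which accounts for the covariance between features. The data, the helper names `zscores` and `mahalanobis_2d`, and the cut-off of 2.5 are all assumptions chosen for illustration; this is one possible multivariate criterion, not the one prescribed by the post.

```python
from math import sqrt
from statistics import mean, stdev


def zscores(values):
    """Per-feature (univariate) z-scores."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean,
    using the sample covariance matrix inverted in closed form."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = var_x * var_y - cov_xy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
        distances.append(sqrt(max(d2, 0.0)))
    return distances


# Made-up data: the second feature roughly follows the first,
# except for the last observation (3, 8).
points = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 4.0), (5, 5.2), (6, 5.8),
          (7, 7.1), (8, 8.0), (9, 9.1), (10, 9.9), (3, 8)]

zx = zscores([p[0] for p in points])
zy = zscores([p[1] for p in points])
distances = mahalanobis_2d(points)

# Illustrative cut-off; in practice it depends on the data and sample size.
flagged = [i for i, d in enumerate(distances) if d > 2.5]

print("per-feature |z| of last point:", abs(zx[-1]), abs(zy[-1]))
print("flagged by Mahalanobis distance:", flagged)
```

For the last observation both per-feature z-scores stay below 1 in absolute value, so a univariate rule would pass it, while its Mahalanobis distance is the largest in the sample and it is the only point flagged.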
In practice, we are usually given multiple features in which to search for anomalous behaviour, although this depends heavily on the particular problem. In some cases, we may be given hundreds of such variables to analyze but be allowed to use only a few of them in production. This problem arises especially in computationally constrained environments. In such cases a robust feature selection method is crucial for choosing the final set of features that will be used in production.
In this post we outlined the most important aspects of anomaly detection in time series data. In the next one, we will introduce some specific applications of outlier detection methods in the automotive industry and show how we leverage Machine Learning to solve this task at Robotec.ai.
[1] https://deepai.org/machine-learning-glossary-and-terms/anomaly-detection (Accessed: 03.02.2021)
[2] Charu C. Aggarwal. 2016. Outlier Analysis (2nd ed.). Springer Publishing Company, Incorporated.
[3] A review on outlier/anomaly detection in time series data: https://arxiv.org/abs/2002.04236
[4] https://pixabay.com/pl/photos/czas-zegar-budzik-pastelowe-kolory-3435879/