This is the supporting page for our paper Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress.
Keywords: anomaly detection, benchmark datasets, deep learning, time series analysis.
Update (Sep. 9, 2021): Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE), doi:10.1109/TKDE.2021.3112126.
Update (Aug. 14, 2021): The Hexagon ML/UCR Time Series Anomaly Archive is now available: https:
ACM SIGKDD 2021 Time Series Anomaly Detection Contest
For more details, go to the contest landing page.
We are delighted to announce that there will be a time series anomaly detection contest under the auspices of ACM SIGKDD 2021.
The contest will provide 200 time series datasets, divided into a train phase and a test phase. The task is to build your model on the train phase and predict the location of the single anomaly in the test phase.
We hope to encourage the development of general-purpose time series anomaly detection algorithms, thus the test datasets come from various domains: medicine, industry, human behavior, animal behavior, etc.
- First Prize: $2000 USD
- Second Prize: $1000 USD
- Third Prize: $500 USD
For the top 15 participants, we will provide a certificate with rank. All other participants will get a participation certificate.
UCR Time Series Anomaly Archive
You can download the Hexagon ML/UCR Time Series Anomaly Archive from https:
Update (Jul. 12, 2021): We are planning to release all data with labels and provenance during SIGKDD 2021.
Update (Mar. 15, 2021): The anomaly detection contest is now live. See Time Series Anomaly Detection Contest at SIGKDD 2021 for details.
Update (Jan. 7, 2021): Due to an unforeseen situation, the contest is delayed (but not canceled). We will update this page once more information is available. We will publicly release the UCR Time Series Anomaly Archive once our anomaly detection contest is over. The contest will be hosted by a third party and is expected to be announced in the next 2 to 6 weeks on major sites, including Reddit (r/MachineLearning), KDnuggets, and DBWorld.
The E0509m dataset contains 15,000 datapoints.
- E0509m dataset: e0509m.csv. SHA1:
The added-noise E0509m dataset also contains 15,000 datapoints. We use
e0509m_rand_50 = e0509m + randn(size(e0509m)) * 50;
to generate the added noise version. You can also get it from
- Pre-generated added-noise E0509m dataset: e0509m_rand_50.csv. SHA1:
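For readers without MATLAB, the added-noise generation above can be sketched in NumPy. This is a stand-in, not the exact script: a synthetic series replaces the real e0509m data, and the seed is our addition (MATLAB's `randn` is unseeded by default).

```python
import numpy as np

def add_gaussian_noise(x, scale=50.0, seed=0):
    # NumPy equivalent of MATLAB's  x + randn(size(x)) * 50
    rng = np.random.default_rng(seed)
    return x + rng.standard_normal(x.shape) * scale

# Synthetic stand-in for the real e0509m series (15,000 datapoints).
series = np.sin(np.linspace(0.0, 100.0, 15000))
series_rand_50 = add_gaussian_noise(series)
```

The noise standard deviation of 50 is large relative to the signal, which is exactly the point of the experiment in Section 4.2.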
The pre-trained Telemanom model is trained with the default parameters from its paper1:
- Pre-trained Telemanom model: e0509m.h5. SHA1:
The training and testing datasets in npy format for Telemanom:
- Training dataset: training_e0509m.npy. SHA1:
- Testing datasets
Gallery of Problems with Benchmark Datasets
We also show examples from some other datasets that are occasionally used for anomaly detection.
In some cases we give text or comments; in others we simply show a screen dump of a MATLAB figure.
- Presentation: Gallery_of_Benchmark_Problems.pptx. SHA1:
Original Figures in the Paper
The presentation below contains all the original figures in our paper.
In some cases we give text or comments on how they are plotted; in others we simply show a screen dump of a MATLAB figure.
- Presentation: anomaly_detection_benchmark_figures.pptx. SHA1:
All MATLAB scripts are written with MATLAB R2018b.
- Brute-force one-liners on Yahoo Benchmark: bruteforceYahooBatch.m. SHA1:
- Collect the rightmost anomaly’s location in Yahoo A1 Benchmark: scanYahooA1.m. SHA1:
- Matrix Profile to compute Discord5 scores: mpx.m. SHA1:
Reproduce Results in the Paper
All original figures in the paper can be found in the slides in section Slides: Original Figures in the Paper.
Section 2.2: Triviality
| Dataset | Solvable with | # Time Series Solved | # Time Series in Dataset | Percent |
| --- | --- | --- | --- | --- |
The numbers in Table 1 result from our brute-force algorithm:
bruteforceYahooBatch(1, '/path/to/yahoo-a1-benchmark/', 'A1Benchmark.csv');
bruteforceYahooBatch(1, '/path/to/yahoo-a2-benchmark/', 'A2Benchmark.csv');
bruteforceYahooBatch(0, '/path/to/yahoo-a3-benchmark/', 'A3Benchmark.csv');
bruteforceYahooBatch(0, '/path/to/yahoo-a4-benchmark/', 'A4Benchmark.csv');
Detailed explanations of our brute-force algorithm and its results (individual bs) can be found in the slides in section Slides: Gallery of Problems with Benchmark Datasets.
bruteforceYahooBatch.m can be found in section MATLAB Scripts; the Yahoo Benchmark2 datasets can be downloaded from here. The output files A1Benchmark.csv through A4Benchmark.csv will be stored in the current working directory.
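To illustrate what "solvable with a one-liner" means, here is a hedged NumPy sketch of one such detector, not necessarily one of the exact one-liners searched by bruteforceYahooBatch.m: a single expression over the first difference already locates a spike anomaly.

```python
import numpy as np

def oneliner_diff(ts):
    # One-liner detector: report the point with the largest
    # absolute first difference.
    return int(np.argmax(np.abs(np.diff(ts)))) + 1

# Synthetic series with a single spike anomaly at index 237.
ts = np.zeros(500)
ts[237] = 10.0
predicted = oneliner_diff(ts)  # → 237
```

If such a trivial expression scores a perfect detection on a benchmark time series, that series cannot distinguish between competing algorithms.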
Section 2.5: Run-to-failure Bias
To produce Fig. 10, we run
locations = scanYahooA1('/path/to/yahoo-a1-benchmark/', 1);
to collect the rightmost anomaly’s location in Yahoo A1 Benchmark.
The returned variable
locations will store the start index of all anomalies. If no anomalies are found, the corresponding index will be
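Run-to-failure bias can be made visible by normalizing each anomaly's start index by its series length; values clustered near 1.0 mean the anomalies sit near the end of their series. The helper below is our own illustration (the sentinel convention for "no anomaly" and the toy numbers are assumptions, not the behavior of scanYahooA1.m):

```python
import numpy as np

def relative_positions(starts, lengths):
    # Fraction of the way through each series where its rightmost
    # anomaly begins; entries with a negative start (no anomaly) are skipped.
    return np.array([s / n for s, n in zip(starts, lengths) if s >= 0])

# Toy example: two of three series have their anomaly in the last 10%.
fracs = relative_positions([900, 950, -1], [1000, 1000, 1000])
```

A histogram of such fractions over a whole benchmark, as in Fig. 10, reveals whether the anomaly locations themselves leak information.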
Section 4.2: Algorithms should be Explained with Reference to their Invariances
To produce Fig. 13, we first use
discord_original = mpx(e0509m, 100, 300);
discord_rand_50 = mpx(e0509m_rand_50, 100, 300);
to compute Discord5 scores.
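mpx.m computes the matrix profile efficiently; as a rough stand-in for readers without MATLAB, discord scores can also be computed by brute force, scoring each z-normalized subsequence by its distance to its nearest non-overlapping neighbor. This is a sketch under our own choices of window size and exclusion zone, not a reimplementation of mpx.m:

```python
import numpy as np

def discord_scores(ts, m):
    # Score each length-m subsequence by the Euclidean distance to its
    # nearest non-overlapping z-normalized neighbor. O(n^2 m): a naive
    # stand-in for the matrix profile.
    subs = np.lib.stride_tricks.sliding_window_view(ts, m).astype(float)
    subs = (subs - subs.mean(axis=1, keepdims=True)) \
           / (subs.std(axis=1, keepdims=True) + 1e-12)
    scores = np.empty(len(subs))
    for i in range(len(subs)):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - m + 1):i + m] = np.inf  # trivial-match exclusion zone
        scores[i] = d.min()
    return scores

# Sine wave with an injected bump: the top discord lands near the bump.
t = np.linspace(0.0, 8.0 * np.pi, 400)
ts = np.sin(t)
ts[200:205] += 3.0
scores = discord_scores(ts, 20)
```

The highest-scoring subsequence overlaps the injected bump, which is the behavior the discord baseline relies on in Fig. 13.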
For Telemanom1, we use the default parameters in their paper for training. The training dataset is the first 3,000 datapoints (normalized to [0, 1]) of the original E0509m dataset. The remaining 12,000 datapoints (normalized to [0, 1]) of the original and added-noise E0509m datasets are used for computing the anomaly scores in Fig. 13.
Both the training and testing datasets can be found in section Datasets:
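The split and normalization described above can be sketched as follows. Normalizing each segment to [0, 1] independently is our reading of the setup, and a synthetic random walk stands in for the real series:

```python
import numpy as np

def to_unit_range(x):
    # Min-max normalize to [0, 1].
    return (x - x.min()) / (x.max() - x.min())

rng = np.random.default_rng(0)
series = rng.standard_normal(15000).cumsum()  # stand-in for e0509m

train = to_unit_range(series[:3000])   # first 3,000 datapoints for training
test = to_unit_range(series[3000:])    # remaining 12,000 for anomaly scoring
```

Saving these arrays with `np.save` yields npy files of the same shape as the training_e0509m.npy / testing files listed above.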
1. K. Hundman et al., “Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding,” Proc. 24th ACM SIGKDD Intl. Conf. Knowledge Discovery & Data Mining (KDD 18), 2018, pp. 387-395.
2. N. Laptev, S. Amizadeh, and Y. Billawala, “S5 - A Labeled Anomaly Detection Dataset, version 1.0 (16M),” Mar. 2015.
3. S. Ahmad et al., “Unsupervised Real-Time Anomaly Detection for Streaming Data,” Neurocomputing, vol. 262, 2017, pp. 134-147.
4. Y. Su et al., “Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network,” Proc. 25th ACM SIGKDD Intl. Conf. Knowledge Discovery & Data Mining (KDD 19), 2019, pp. 2828-2837.