This is the supporting page for our paper Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress.
Keywords: Anomaly detection, benchmark datasets, deep learning, and time series analysis.
Update (Jan. 25, 2022): Extended abstract accepted by 38th IEEE International Conference on Data Engineering (ICDE2022), TKDE Poster Track.
Update (Sep. 9, 2021): Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE), doi:10.1109/TKDE.2021.3112126.
Update (Aug. 14, 2021): The Hexagon ML/UCR Time Series Anomaly Archive is now available: https:
ACM SIGKDD 2021 Time Series Anomaly Detection Contest
For more details, go to the contest landing page.
We are delighted to announce that there will be a time series anomaly detection contest under the auspices of ACM SIGKDD 2021.
The contest will provide 200 time series datasets, divided into a train phase and a test phase. The task is to build your model on the train phase and predict the location of the single anomaly in the test phase.
We hope to encourage the development of generalpurpose time series anomaly detection algorithms, thus the test datasets come from various domains: medicine, industry, human behavior, animal behavior, etc.
 First Prize: $2000 USD
 Second Prize: $1000 USD
 Third Prize: $500 USD
For the top 15 participants, we will provide a certificate with rank. All other participants will get a participation certificate.
Datasets
UCR Time Series Anomaly Archive
You can download the Hexagon ML/UCR Time Series Anomaly Archive from https:
Update (Jul. 12, 2021): We’re planning to release all data with labels and provenance during SIGKDD 2021.
Update (Mar. 15, 2021): The anomaly detection contest is now live. See Time Series Anomaly Detection Contest at SIGKDD 2021 for details.
Update (Jan. 7, 2021): Due to unforeseen situation, the contest is delayed (but not canceled). We will update this page once more information is available.
We will publicly release the UCR Time Series Anomaly Archive once our anomaly detection contest is over. The contest will be hosted by a third party and expected to be announced in the next 2 to 6 weeks on major sites, including Reddit (r/MachineLearning), KDnuggets and DBWorld.
E0509m
Dataset
Below are the original and added noise E0509m
dataset used to produce Fig. 13 and the pretrained Telemanom^{1} model.
Original E0509m
dataset contains 15,000
datapoints.
 Original
E0509m
dataset: e0509m.csv. SHA1:549fe34ea23e143029172b4b15f8fbd5f793d511
The added noise E0509m
dataset also contains 15,000
datapoints. We use
e0509m_rand_50 = e0509m + randn(size(e0509m)) * 50;
to generate the added noise version. You can also get it from
 Pregenerated added noise
E0509m
dataset: e0509m_rand_50.csv. SHA1:438901ad44d2ab9916d45b0d2502f83554723376
The pretrained Telemanom model is trained with default parameters in its paper^{1}:
 Pretrained Telemanom model: e0509m.h5. SHA1:
2d559897f948a0dab9c6eca245b8339b32048378
The training and testing datasets in npy
format for Telemanom:
 Training dataset: training_e0509m.npy. SHA1:
88e2559b1b15253c30666e8f080db8c22a34c624
 Testing datasets
 Original
e0509m
dataset: testing_e0509m.npy. SHA1:9209c644ea7d486307ea5fef2e69a245bd6fd479
 Added noise
e0509m
dataset: testing_e0509m_rand_50.npy. SHA1:55c60fe24b04b1a09b95c33f6f9a1c409c6f8ede
 Original
Slides
Gallery of Problems with Benchmark Datasets
Below presentation contains dozens of additional examples from Yahoo^{2}, Numenta^{3}, NASA^{1} and Pei’s Lab (OMNI)^{4} that yield to one line solutions.
We also show some examples for some other datasets that are occasionally used for the anomaly detection.
In some cases we give some text or comments, in others we just do a screen dump of a MATLAB figure.
 Presentation: Gallery_of_Benchmark_Problems.pptx. SHA1:
a82f17ce679161690dc8cbb6362ae9ef7b95e496
Original Figures in the Paper
Below presentation contains all original figures in our paper.
In some cases we give some text or comments on how they are plotted, in others we just do a screen dump of a MATLAB figure.
 Presentation: anomaly_detection_benchmark_figures.pptx. SHA1:
40ae021206099c842a285b316302b5f534d78a5a
MATLAB Scripts
All MATLAB scripts are written with MATLAB R2018b.
 Bruteforce oneliners on Yahoo Benchmark: bruteforceYahooBatch.m. SHA1:
b79742d4997dbdf5bd8750c9351f1d1aa3ee803a
 Collect the rightmost anomaly’s location in Yahoo A1 Benchmark: scanYahooA1.m. SHA1:
54dd44f6e184cb821ddf9e18b6ef7facdcac4dc3
 Matrix Profile to compute Discord^{5} scores: mpx.m. SHA1:
61e339e9c0338396b5bc30e2ea1a276a59cfb9b0
Reproduce Results in the Paper
All original figures in the paper can be found in the slides in section Slides: Original Figures in the Paper.
Section 2.2: Triviality
Table. 1
Dataset  Solvable with  # Time Series Solved  # Time Series in Dataset  Percent 

A1  (3)  30  67  44.8% 
(4)  14  20.9%  
Subtotal  44  67  65.7%  
A2  (3)  40  100  40.0% 
(4)  57  57.0%  
Subtotal  97  100  97.0%  
A3  (5)  84  100  84.0% 
(6)  14  14.0%  
Subtotal  98  100  98.0%  
A4  (5)  39  100  39.0% 
(6)  38  38.0%  
Subtotal  77  100  77.0%  
Total  316  367  86.1% 
The numbers in Table. 1 results from our bruteforce algorithm:
bruteforceYahooBatch(1, '/path/to/yahooa1benchmark/', 'A1Benchmark.csv');
bruteforceYahooBatch(1, '/path/to/yahooa2benchmark/', 'A2Benchmark.csv');
bruteforceYahooBatch(0, '/path/to/yahooa3benchmark/', 'A3Benchmark.csv');
bruteforceYahooBatch(0, '/path/to/yahooa4benchmark/', 'A4Benchmark.csv');
Detailed explanations on our bruteforce algorithm and its results (individual u
, k
, c
and b
s) can be found in the slides in section Slides: Gallery of Problems with Benchmark Datasets.
The bruteforceYahooBatch.m
can be found in section MATLAB Scripts; Yahoo Benchmark^{2} datasets can be downloaded from here; The output files A1Benchmark.csv
, A2Benchmark.csv
, A3Benchmark.csv
and A4Benchmark.csv
will be stored in the current working directory.
Section 2.5: Runtofailure Bias
Fig. 10
To produce Fig. 10, we run
locations = scanYahooA1('/path/to/yahooa1benchmark/', 1);
to collect the rightmost anomaly’s location in Yahoo A1 Benchmark.
The returned variable locations
will store the start index of all anomalies. If no anomalies are found, the corresponding index will be 1
.
The scanYahooA1.m
can be found in section MATLAB Scripts; Yahoo A1 Benchmark^{2} datasets can be downloaded from here.
Section 4.2: Algorithms should be Explained with Reference to their Invariances
Fig. 13
To produce Fig. 13, we first use
discord_original = mpx(e0509m, 100, 300);
discord_rand_50 = mpx(e0509m_rand_50, 100, 300);
to compute Discord^{5} scores.
The mpx.m
can be found in section MATLAB Scripts; Both original and added noise E0509m
dataset can be found in section Datasets: E0509m
dataset.
For Telemanom^{1}, we use the default parameters in their paper for training. The training datasets are the first 3,000
datapoints (normalized to [0, 1]
) in original E0509m
dataset.
The remaining 12,000
datapoints (normalized to [0, 1]
) in original and added noise E0509m
dataset are used for computing anomaly scores in Fig. 13.
Both the training and testing datasets can be found in section Datasets: E0509m
dataset.
References

K. Hundman et al., “Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding,” Proc. 24th ACM SIGKDD Intl. Conf. Knowledge Discovery & Data Mining (KDD 18), 2018, pp. 387395. ↩ ↩^{2} ↩^{3} ↩^{4}

N. Laptev, S. Amizadeh and Y. Billawala, “S5  A Labeled Anomaly Detection Dataset, version 1.0 (16M),” Mar. 2015. ↩ ↩^{2} ↩^{3}

S. Ahmad et al., “Unsupervised RealTime Anomaly Detection for Streaming Data,” Neurocomputing, vol. 262, 2017, pp. 134147. ↩

Y. Su et al., “Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network,” Proc. 25th ACM SIGKDD Intl. Conf. Knowledge Discovery & Data Mining (KDD 19), 2019, pp. 28282837. ↩

T. Nakamura et al, “MERLIN: ParameterFree Discovery of Arbitrary Length Anomalies in Massive Time Series Archives,” Proc. 20th IEEE Intl. Conf. Data Mining (ICDM 20), to be published, 2020. ↩ ↩^{2}