This is the supporting page for our paper When is Early Classification of Time Series Meaningful?
Keywords: Early classification, time series analysis, data mining.
Update (Jan. 25, 2022): Extended abstract accepted by the 38th IEEE International Conference on Data Engineering (ICDE 2022), TKDE Poster Track.
Update (Aug. 24, 2021): Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE), doi:10.1109/TKDE.2021.3108580.
@Article{Wu2021a,
author = {Renjie Wu and Audrey Der and Eamonn Keogh},
journal = {{IEEE} Transactions on Knowledge and Data Engineering},
title = {When is Early Classification of Time Series Meaningful?},
year = {2022},
number = {8},
pages = {3779--3785},
volume = {34},
doi = {10.1109/TKDE.2021.3108580},
publisher = {Institute of Electrical and Electronics Engineers ({IEEE})},
}
Code
Section 4: Table 1
Code for the early time series classification algorithms evaluated in Table 1:
- ECTS [1]: ijcai09Code.rar (link retrieved from Z. Xing's website [2])
- EDSC [3]: TSFS-2.rar (link retrieved from Z. Xing's website [2])
- Rel. Class. [4]: Early_Classification_For_Web.zip (link retrieved from the paper [4])
Section 5: Fig. 9
Code for computing the holdout classification error-rate in Fig. 9 (bottom):
- nibble_right_side.py. Requires at least Python 3.6. SHA1:
43baabcd682581407137b0f229786b36f5877be6
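For convenience, a downloaded file can be checked against the listed hash with a few lines of Python. This is a minimal sketch (not part of the original release), using nibble_right_side.py only as an example path:
# Compute the SHA1 digest of a file and compare it with the checksum listed above.
import hashlib

def sha1_of(path, chunk_size=1 << 20):
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

print(sha1_of('nibble_right_side.py'))
# Should print 43baabcd682581407137b0f229786b36f5877be6 if the download is intact.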
Appendix B: 2nd Q&A
Related code for the test on TEASER [5] mentioned in Appendix B: 2nd Q&A:
- TEASER classifier: https://www2.informatik.hu-berlin.de/~schaefpa/teaser/
- TEASER tester: TEASERTester.java. Requires at least JDK 11. SHA1:
ae72c3022807b04fd302ffa6fa5dae09622c5d62
- Result plotter: TEASERResultPlotter.m. Written with MATLAB R2018b. SHA1:
55df8113fc5adc521c02cfa46b1831442cbf1e0c
To run the above testing code, the following changes must be applied to TEASER's code:
src/main/java/sfa/classification/Classifier.java
--- Original/src/main/java/sfa/classification/Classifier.java	2020-07-23 05:50:30.000000000 -0700
+++ Modified/src/main/java/sfa/classification/Classifier.java	2021-02-03 18:11:10.656257400 -0800
@@ -317,8 +317,8 @@
     public Double[] labels;
     public AtomicInteger correct;

-    double[][] probabilities;
-    int[] realLabels;
+    public double[][] probabilities;
+    public int[] realLabels;

     public Predictions(Double[] labels, int bestCorrect) {
src/main/java/sfa/classification/TEASERClassifier.java
--- Original/src/main/java/sfa/classification/TEASERClassifier.java	2020-07-23 05:50:30.000000000 -0700
+++ Modified/src/main/java/sfa/classification/TEASERClassifier.java	2021-02-04 18:42:54.635397100 -0800
@@ -36,9 +36,9 @@
   public static int MAX_WINDOW_LENGTH = 250;

   // the trained TEASER model
-  EarlyClassificationModel model;
+  public EarlyClassificationModel model;

-  WEASELClassifier slaveClassifier;
+  public WEASELClassifier slaveClassifier;

   public TEASERClassifier() {
     slaveClassifier = new WEASELClassifier();
src/main/java/sfa/timeseries/TimeSeriesLoader.java
--- Original/src/main/java/sfa/timeseries/TimeSeriesLoader.java	2020-07-23 05:50:30.000000000 -0700
+++ Modified/src/main/java/sfa/timeseries/TimeSeriesLoader.java	2021-02-04 01:05:41.448100200 -0800
@@ -39,7 +39,7 @@
       }

       // switch between old " " and new separator "," in the UCR archive
-      String separator = (line.contains(",") ? "," : " ");
+      String separator = (line.contains(",") ? "," : line.contains("\t") ? "\t" : " ");
       String[] columns = line.split(separator);

       double[] data = new double[columns.length];
We also provide a pre-compiled jar with both the modified TEASER classifier and the testing code packaged:
- TEASERTester.jar. SHA1:
6fdf842e4ffa06eadbfb1d4ac6729b2051d7840c
Although it should run with JRE 1.8 (we have not tested this), we highly recommend using at least JRE 11.
Data
GunPoint dataset
The GunPoint dataset can be found in the UCR Time Series Classification Archive.
EDSC [3] requires the training dataset to be sorted by class label, so we provide a pre-sorted copy:
- GunPoint_TRAIN_sorted_by_class.tsv. SHA1:
dca5c92fa6fc9331318eeeb5616d6928d7aa68cd
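For reference, a copy sorted in this way can be produced with a short Python sketch along the following lines (assuming pandas is available; this is a minimal illustration, not necessarily how the file above was created, and re-serializing may change the floating-point formatting):
# Sort a UCR-format TSV by its class label (the first column) and write it back out.
import pandas as pd

train = pd.read_csv('GunPoint_TRAIN.tsv', sep='\t', header=None)
train_sorted = train.sort_values(by=0, kind='mergesort')  # stable sort on the label column
train_sorted.to_csv('GunPoint_TRAIN_sorted_by_class.tsv', sep='\t', header=False, index=False)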
Section 4: Table 1
We tested ECTS [1], EDSC [3], and Rel. Class. [4] on a "denormalized" version of the GunPoint testing dataset, created by adding a random number in the range [-1, 1] to each instance:
- GunPoint_RandomShift_TEST.tsv. SHA1:
0e3ea1463af156470e5cc8c3d1eb10f73d678270
This dataset follows the UCR format: the first column is the class label, and the remaining columns are the data points (in this case, 150 of them).
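For illustration only, a denormalized test set of this kind could be generated with a Python sketch like the one below (assuming NumPy; the seed, offsets, and number formatting here will not match the released file):
# Shift each test instance by a random offset drawn uniformly from [-1, 1].
import numpy as np

rng = np.random.default_rng(0)               # illustrative seed only
data = np.loadtxt('GunPoint_TEST.tsv')       # UCR format: label, then 150 data points per row
labels, series = data[:, :1], data[:, 1:]
shifts = rng.uniform(-1.0, 1.0, size=(series.shape[0], 1))   # one offset per instance
np.savetxt('GunPoint_RandomShift_TEST.tsv', np.hstack([labels, series + shifts]), delimiter='\t')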
Section 4: Fig. 7
The ECG snippet recorded from two different chest locations:
- chf01m.mat. SHA1:
4ee5f109a1378860f73649ffe401b1b86a384b50
Appendix B: 2nd Q&A
We tested TEASER [5] on 10,000 random walks of length 150, which we generated with the following MATLAB function:
function generateRandomWalkDataset(outputTSVPath)
    % Generate 10,000 random walks of length 150 in UCR format and write them
    % to a tab-separated file at outputTSVPath.
    rng(1919186795);  % fixed seed for reproducibility
    testingDataset = zeros(10000, 151);
    testingDataset(:, 1) = -1;  % 1st column: class label (-1 for every instance)
    for i = 1:10000
        % Columns 2..151: a length-150 random walk (cumulative sum of Gaussian steps)
        testingDataset(i, 2:end) = cumsum(randn(1, 150));
    end
    dlmwrite(outputTSVPath, testingDataset, '\t');
end
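For example, calling generateRandomWalkDataset('/path/to/RandomWalk_TEST.tsv') (the path is a placeholder) writes the 10,000 × 151 tab-separated dataset to that location.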
We also provide the pre-generated random walks:
- RandomWalk_TEST.tsv. SHA1:
343056f26637ee7bb66b0b32e88fa04f42605f1f
This dataset follows the UCR format: the first column is the class label, and the remaining columns are the data points (in this case, 150 of them).
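As a quick sanity check (not part of the original release), the file can be verified to have the expected shape and labels with a few lines of Python, assuming NumPy is available:
import numpy as np

walks = np.loadtxt('RandomWalk_TEST.tsv', delimiter='\t')
assert walks.shape == (10000, 151)     # 10,000 random walks: class label + 150 data points each
assert (walks[:, 0] == -1).all()       # every instance carries the class label -1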
Slides
The presentation below contains the results of the test mentioned in Appendix B: 2nd Q&A. The related code is available in section Code: Appendix B: 2nd Q&A, and the datasets can be found in section Data: Appendix B: 2nd Q&A.
- TEASER_on_GunPoint_RandomWalk.pptx. SHA1:
bcbe816ff5b4835806aa4c395922d6225bf7fc3f
For some slides we provide text or comments; for others, we simply include a screen capture of a MATLAB figure.
Reproduce Results in the Paper
Section 4: Table 1
Algorithm | Accuracy (normalized) | Accuracy (denormalized) |
---|---|---|
(min. support = 0) ECTS [1] | 86.7% | 68.7% |
(min. support = 0) RelaxedECTS [1] | 86.7% | 68.7% |
EDSC-CHE [3] | 94.7% | 62.7% |
EDSC-KDE [3] | 95.3% | 58.7% |
(τ = 0.1) Rel. Class. [4] | 90.0% | 70.0% |
(τ = 0.1) LDG Rel. Class. [4] | 91.3% | 71.3% |
Related code is available in section Code: Section 4: Table 1. Datasets can be found in section Data.
ECTS & RelaxedECTS
First, we edit DataSetInformation.h to set up the metadata:
// GunPoint, change paths below to the actual ones.
const int DIMENSION=150; // length of time series
const int ROWTRAINING=50; // size of training data
const int ROWTESTING=150; // size of testing data
const char* trainingFileName="/path/to/GunPoint_TRAIN.tsv";
// If running on original GunPoint testing dataset, use the following:
// const char* testingFileName="/path/to/GunPoint_TEST.tsv";
const char* testingFileName="/path/to/GunPoint_RandomShift_TEST.tsv";
const char* trainingIndexFileName="/path/to/ECTS_GunPoint_Training_Index";
const char* testingIndexFileName="<not used>";
const char* DisArrayFileName="/path/to/ECTS_GunPoint_DisArray";
// If running on original GunPoint testing dataset, use the following:
// const char* ResultfileName="/path/to/ECTS_GunPoint_Result.txt";
const char* ResultfileName="/path/to/ECTS_GunPoint_RandomShift_Result.txt";
const int NofClasses=2;
const int Classes[]={1,2};
We then run IndexBuilding.cpp to build the training index. Next, we edit line 36 in Hierarchical.cpp:
33
34 // Algorithm parameters: minimal support
35 double MinimalSupport=0;
36 int strictversion=1; // 1 strict version, 0 loose version
We set strictversion (line 36) to choose which version of ECTS to execute: 0 for RelaxedECTS and 1 for ECTS.
Finally, we run Hierarchical.cpp and collect the reported accuracy.
EDSC-KDE & EDSC-CHE
Similarly, we first edit DataSetInformation.h to set up the metadata:
// GunPoint, change paths below to the actual ones.
const int DIMENSION=150; // length of time series
const int ROWTRAINING=50; // size of training data
const int ROWTESTING=150; // size of testing data
const char* trainingFileName="/path/to/GunPoint_TRAIN_sorted_by_class.tsv";
// If running on original GunPoint testing dataset, use the following:
// const char* testingFileName="/path/to/GunPoint_TEST.tsv";
const char* testingFileName = "/path/to/GunPoint_RandomShift_TEST.tsv";
const char* resultFileName="/path/to/EDSC_Features_GunPoint.txt";
const char* path="/path/to/GunPoint_Folder/";
const int NofClasses= 2;
const int Classes[]={1,2};
const int ClassIndexes[]={0,24}; // the data is sorted by class
const int ClassNumber[]={24,26};
Then, we edit line 105 in ByInstanceDP.cpp:
105 int option=2; // 1: using thresholdAll, 2: using the KDE cut
106 int DisArrayOption=2; // 1: naive version, 2: fast version
107 int MaximalK=DIMENSION/2; // maximal length
108 int MinK=5;
109 double boundThrehold=3; // parameter of the Chebyshev's inequality
110 double recallThreshold=0;
111 double probablityThreshold=0.95;
112 int alpha=3;
We set option (line 105) to choose which EDSC variant to execute: 1 for EDSC-CHE and 2 for EDSC-KDE.
Finally, we run ByInstanceDP.cpp and collect the reported accuracy.
Rel. Class. & LDG Rel. Class.
We first edit loadDataset.m to add metadata for GunPoint_RandomShift:
% ... omitted ...
switch(lower(name))
% ... omitted ...
case 'gunpoint_randomshift'
% Change paths below to the actual ones
test = load('/path/to/GunPoint_RandomShift_TEST.tsv');
ts_l = test(:,1);
ts_d = test(:,2:end);
train = load('/path/to/GunPoint_TRAIN.tsv');
tr_l = train(:,1);
tr_d = train(:,2:end);
% ... omitted ...
end
Then we edit lines 4, 7, 8, and 10 in Run_all_experiments.m to set up the parameters:
 3 %%%% User Inputs %%%%%%%%
 4 dataset = 'GunPoint_RandomShift'; % 'GunPoint' for the original one
 5 disp(['Loading ' dataset ' data']);
 6
 7 constraint_type = 'Naive';
 8 pred_type = 'Corr';
 9
10 use_LDG = 1;
11
12 %%%%%%%%%%%%%%%%%%%%%%%%%%
Note that line 10 (use_LDG) determines which version to execute: 0 for Rel. Class. and 1 for LDG Rel. Class.
Finally, we run Run_all_experiments.m and collect the reported accuracy.
Section 5: Fig. 9
To produce Fig. 9, we first edit lines 131 and 132 in nibble_right_side.py:
129 # Parameters
130 NUM_K = 1
131 train_name = 'GunPoint/GunPoint_TRAIN.tsv' # path to file
132 test_name = 'GunPoint/GunPoint_TEST.tsv' # path to file
133 datasetname = 'GunPoint'
changing the two paths to the actual location of the GunPoint dataset. Then we can simply run
python nibble_right_side.py
to generate the error-rate curves. The output files are:
- gunpoint_r_nibble.svg: plots the holdout classification error-rate curves on both the training and testing datasets.
- gunpoint_r_nibble.txt: stores all the data points of the two curves.
Fig. 9 (bottom) shows the reverse of the testing error-rate curve in the output.
Appendix B: 2nd Q&A
The results are presented in the slide deck in section Slides.
To produce the results in the slides, we first run the Java code
java -jar "/path/to/TEASERTester.jar" "GunPoint (TRAIN) + RandomWalk (TEST)" "/path/to/GunPoint_TRAIN.tsv" "/path/to/RandomWalk_TEST.tsv" "/path/to/output.json"
to generate the early predictions at each snapshot for every exemplar in the RandomWalk testing dataset.
The pre-compiled TEASERTester.jar can be found in section Code: Appendix B: 2nd Q&A; GunPoint_TRAIN.tsv and RandomWalk_TEST.tsv can be found in section Data; /path/to/output.json determines where the results will be written.
Then we run the MATLAB script TEASERResultPlotter.m (which can be found in section Code: Appendix B: 2nd Q&A):
TEASERResultPlotter("/path/to/output.json")
to draw the figures in the slides.
References
[1] Z. Xing et al., “Early Classification on Time Series,” Knowledge and Information Systems, vol. 31, no. 1, pp. 105-127, 2012.
[2] Z. Xing, “Zhengzheng Xing: PhD Research.”
[3] Z. Xing et al., “Extracting Interpretable Features for Early Classification on Time Series,” in Proc. SIAM Intl. Conf. Data Mining, 2011, pp. 247-258.
[4] N. Parrish et al., “Classifying with Confidence from Incomplete Information,” J. Machine Learning Research, vol. 14, no. 1, pp. 3561-3589, 2013.
[5] P. Schäfer and U. Leser, “TEASER: Early and Accurate Time Series Classification,” Data Mining and Knowledge Discovery, vol. 34, no. 5, pp. 1336-1362, 2020.