This is the supporting page for our paper When is Early Classification of Time Series Meaningful?

Keywords: Early classification, time series analysis, data mining.

Update (Jan. 25, 2022): Extended abstract accepted by the 38th IEEE International Conference on Data Engineering (ICDE 2022), TKDE Poster Track.

Update (Aug. 24, 2021): Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE), doi:10.1109/TKDE.2021.3108580.

@Article{Wu2021a,
  author    = {Renjie Wu and Audrey Der and Eamonn Keogh},
  journal   = {{IEEE} Transactions on Knowledge and Data Engineering},
  title     = {When is Early Classification of Time Series Meaningful?},
  year      = {2021},
  doi       = {10.1109/tkde.2021.3108580},
  publisher = {Institute of Electrical and Electronics Engineers ({IEEE})},
}

Code

Section 4: Table. 1

Code for evaluated early time series classification algorithms in Table. 1:

Section 5: Fig. 9

Code for computing the holdout classification error-rate in Fig. 9 (bottom):

  • nibble_right_side.py. Requires at least Python 3.6. SHA1: 43baabcd682581407137b0f229786b36f5877be6
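
The SHA1 digest listed above can be used to confirm that a downloaded copy is intact. The following is a minimal sketch using Python's standard hashlib module; the helper name sha1_of_file is an assumption made for illustration and is not part of the provided code.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing sha1_of_file("nibble_right_side.py") with the digest shown in the list above indicates whether the download matches the published file.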

Appendix B: 2nd Q&A

Related code for the test on TEASER [5] mentioned in Appendix B: 2nd Q&A:

To run the above testing code, the following changes must be applied to TEASER’s code:

  • src/main/java/sfa/classification/Classifier.java
    --- Original/src/main/java/sfa/classification/Classifier.java  2020-07-23 05:50:30.000000000 -0700
    +++ Modified/src/main/java/sfa/classification/Classifier.java  2021-02-03 18:11:10.656257400 -0800
    @@ -317,8 +317,8 @@
         public Double[] labels;
         public AtomicInteger correct;
        
    -    double[][] probabilities;
    -    int[] realLabels;
    +    public double[][] probabilities;
    +    public int[] realLabels;
    
    
         public Predictions(Double[] labels, int bestCorrect) {
    
  • src/main/java/sfa/classification/TEASERClassifier.java
    --- Original/src/main/java/sfa/classification/TEASERClassifier.java  2020-07-23 05:50:30.000000000 -0700
    +++ Modified/src/main/java/sfa/classification/TEASERClassifier.java  2021-02-04 18:42:54.635397100 -0800
    @@ -36,9 +36,9 @@
       public static int MAX_WINDOW_LENGTH = 250;
    
       // the trained TEASER model
    -  EarlyClassificationModel model;
    +  public EarlyClassificationModel model;
       
    -  WEASELClassifier slaveClassifier;
    +  public WEASELClassifier slaveClassifier;
    
       public TEASERClassifier() {
         slaveClassifier = new WEASELClassifier();
    
  • src/main/java/sfa/timeseries/TimeSeriesLoader.java
    --- Original/src/main/java/sfa/timeseries/TimeSeriesLoader.java  2020-07-23 05:50:30.000000000 -0700
    +++ Modified/src/main/java/sfa/timeseries/TimeSeriesLoader.java  2021-02-04 01:05:41.448100200 -0800
    @@ -39,7 +39,7 @@
             }
    
             // switch between old " " and new separator "," in the UCR archive
    -        String separator = (line.contains(",") ? "," : " ");
    +        String separator = (line.contains(",") ? "," : line.contains("\t") ? "\t": " ");
             String[] columns = line.split(separator);
    
             double[] data = new double[columns.length];
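
The last change above makes the loader accept tab-separated files in addition to comma- and space-separated ones. The same precedence can be sketched in Python as follows; the function names detect_separator and parse_row are hypothetical and only illustrate the logic of the patched line, not TEASER's actual API.

```python
def detect_separator(line):
    """Pick the column separator for one line of a UCR-style file.

    Mirrors the precedence of the patched Java line: comma first,
    then tab, otherwise a single space.
    """
    if "," in line:
        return ","
    if "\t" in line:
        return "\t"
    return " "


def parse_row(line):
    """Split a line into floats: [class label, datapoint 1, datapoint 2, ...]."""
    separator = detect_separator(line)
    # Empty columns (e.g. from repeated spaces) are skipped; this is a
    # convenience of this sketch, not something the Java code is known to do.
    return [float(column) for column in line.strip().split(separator) if column]
```

For example, parse_row("1\t0.5\t-0.25") splits on tabs, which the original two-way check (comma or space) would not have handled.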
    

We also provide a pre-compiled jar with both the modified TEASER classifier and the testing code packaged:

Although it should run with JRE 1.8 (we did not test this), we highly recommend using at least JRE 11.

Data

GunPoint dataset

The GunPoint dataset can be found in the UCR Time Series Classification Archive.

EDSC [3] requires the training dataset to be sorted by the class label. Thus, we provide a pre-sorted one:

Section 4: Table. 1

We tested ECTS [1], EDSC [3], and Rel. Class. [4] on a “denormalized” version of the GunPoint testing dataset, created by adding to each instance a random number in the range [-1, 1]:

This dataset follows the UCR format, where the 1st column is the class label and the remaining columns are the datapoints (in this case, of length 150).
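
The "denormalization" described above can be sketched as follows. This is an illustration under stated assumptions, not the script used to build the provided file: the offset is assumed to be drawn uniformly from [-1, 1], the seed is arbitrary, and the function name random_shift_rows is hypothetical.

```python
import random


def random_shift_rows(rows, low=-1.0, high=1.0, seed=0):
    """Add one random offset per instance to every datapoint, keeping the label.

    `rows` is a list of [label, x1, x2, ...] in UCR layout. The uniform
    distribution and the seed are assumptions of this sketch.
    """
    rng = random.Random(seed)
    shifted = []
    for row in rows:
        offset = rng.uniform(low, high)  # a single offset for the whole series
        shifted.append([row[0]] + [value + offset for value in row[1:]])
    return shifted
```

Because every datapoint of an instance is moved by the same amount, the shape of each time series is unchanged; only its mean differs from the z-normalized original.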

Section 4: Fig. 7

The ECG snippet recorded from two different chest locations:

  • chf01m.mat. SHA1: 4ee5f109a1378860f73649ffe401b1b86a384b50

Appendix B: 2nd Q&A

We tested TEASER [5] with 10,000 random walks of length 150. We use the following MATLAB function:

function generateRandomWalkDataset(outputTSVPath)
    rng(1919186795);

    testingDataset = zeros(10000, 151);
    testingDataset(:, 1) = -1; % Class: -1

    for i = 1:10000
        testingDataset(i, 2:end) = cumsum(randn(150, 1));
    end

    dlmwrite(outputTSVPath, testingDataset, '\t');
end

to generate the data. We also provide the pre-generated random walks:

This dataset follows the UCR format, where the 1st column is the class label and the remaining columns are the datapoints (in this case, of length 150).
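
For readers without MATLAB, the generator above can be approximated in Python. This sketch will not reproduce the provided file number for number, because Python's random module and MATLAB's rng use different generators; the function names and the default seed are assumptions for illustration.

```python
import random
from itertools import accumulate


def generate_random_walks(count=10000, length=150, label=-1, seed=0):
    """Return `count` rows of [label, walk_1, ..., walk_length].

    Each walk is the cumulative sum of standard normal steps, as in the
    MATLAB function above. The seed here is arbitrary.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(count):
        steps = [rng.gauss(0.0, 1.0) for _ in range(length)]
        rows.append([label] + list(accumulate(steps)))
    return rows


def write_tsv(rows, path):
    """Write rows as tab-separated values, one instance per line."""
    with open(path, "w") as handle:
        for row in rows:
            handle.write("\t".join(repr(value) for value in row) + "\n")
```

Calling write_tsv(generate_random_walks(), "/path/to/RandomWalk_TEST.tsv") produces a file with the same layout as the pre-generated one: 10,000 rows, each with the class label -1 followed by 150 datapoints.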

Slides

The presentation below contains the results of the test mentioned in Appendix B: 2nd Q&A. Related code is available in section Code: Appendix B: 2nd Q&A. Datasets can be found in section Data: Appendix B: 2nd Q&A.

In some cases, we give some text or comments. In others, we just do a screen dump of a MATLAB figure.

Reproduce Results in the Paper

Section 4: Table. 1

The accuracy of six early classification algorithms

Algorithm                            Normalized   Denormalized
ECTS [1] (min. support = 0)          86.7%        68.7%
RelaxedECTS [1] (min. support = 0)   86.7%        68.7%
EDSC-CHE [3]                         94.7%        62.7%
EDSC-KDE [3]                         95.3%        58.7%
Rel. Class. [4] (τ = 0.1)            90.0%        70.0%
LDG Rel. Class. [4] (τ = 0.1)        91.3%        71.3%

Related code is available in section Code: Section 4: Table. 1. Datasets can be found in section Data.
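
To make the size of the effect in the table explicit, the drop in accuracy (in percentage points) can be computed directly from the reported numbers. The snippet below is a small arithmetic sketch; the accuracies are copied from the table, while the dictionary layout and the function name accuracy_drop are illustrative choices.

```python
# Accuracies in percent, copied from the table: (normalized, denormalized).
ACCURACY = {
    "ECTS": (86.7, 68.7),
    "RelaxedECTS": (86.7, 68.7),
    "EDSC-CHE": (94.7, 62.7),
    "EDSC-KDE": (95.3, 58.7),
    "Rel. Class.": (90.0, 70.0),
    "LDG Rel. Class.": (91.3, 71.3),
}


def accuracy_drop(table):
    """Return the normalized-minus-denormalized accuracy per algorithm."""
    return {
        name: round(normalized - denormalized, 1)
        for name, (normalized, denormalized) in table.items()
    }


for name, drop in accuracy_drop(ACCURACY).items():
    print(f"{name}: -{drop} percentage points")
```

Every algorithm loses at least 18 percentage points when the test data is shifted, and both EDSC variants lose more than 30.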

ECTS & RelaxedECTS

First, we edit DataSetInformation.h to set up metadata:

// GunPoint, change paths below to the actual ones.
const int DIMENSION=150;   // length of time series
const int ROWTRAINING=50;  // size of training data
const int ROWTESTING=150;  // size of testing data
const char* trainingFileName="/path/to/GunPoint_TRAIN.tsv";
// If running on original GunPoint testing dataset, use the following:
// const char* testingFileName="/path/to/GunPoint_TEST.tsv";
const char* testingFileName="/path/to/GunPoint_RandomShift_TEST.tsv";
const char* trainingIndexFileName="/path/to/ECTS_GunPoint_Training_Index";
const char* testingIndexFileName="<not used>";
const char* DisArrayFileName="/path/to/ECTS_GunPoint_DisArray";
// If running on original GunPoint testing dataset, use the following:
// const char* ResultfileName="/path/to/ECTS_GunPoint_Result.txt";
const char* ResultfileName="/path/to/ECTS_GunPoint_RandomShift_Result.txt";
const int NofClasses=2;
const int Classes[]={1,2};

We then run IndexBuilding.cpp to build the training index. Next, we edit line 36 in Hierarchical.cpp:

// Hierarchical.cpp, lines 33-36
// Algorithm parameters: minimal support

double MinimalSupport=0;
int strictversion=1; // 1 strict version, 0 loose version

to choose which version of ECTS to execute: 0 for RelaxedECTS and 1 for ECTS.

Finally, we run Hierarchical.cpp and collect the reported accuracy.
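
Since DIMENSION, ROWTRAINING, ROWTESTING, and Classes in DataSetInformation.h are hard-coded, it can help to cross-check them against the data files before compiling. The helper below is a hypothetical convenience sketch, not part of the ECTS code; it assumes a UCR-format file with one instance per line and the class label in the first column.

```python
def summarize_ucr_file(path):
    """Return (number of rows, series length, sorted class labels)."""
    rows = 0
    length = None
    classes = set()
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            # Accept comma-, tab-, or space-separated columns.
            columns = line.replace(",", " ").split()
            classes.add(int(float(columns[0])))
            if length is None:
                length = len(columns) - 1
            elif length != len(columns) - 1:
                raise ValueError("rows have different lengths")
            rows += 1
    return rows, length, sorted(classes)
```

The three returned values for the training file should agree with ROWTRAINING, DIMENSION, and Classes, and the row count for the testing file should agree with ROWTESTING.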

EDSC-KDE & EDSC-CHE

Similarly, we edit DataSetInformation.h to set up metadata first:

// GunPoint, change paths below to the actual ones.
const int DIMENSION=150;   // length of time series
const int ROWTRAINING=50;  // size of training data
const int ROWTESTING=150;  // size of testing data
const char* trainingFileName="/path/to/GunPoint_TRAIN_sorted_by_class.tsv";
// If running on original GunPoint testing dataset, use the following:
// const char* testingFileName="/path/to/GunPoint_TEST.tsv";
const char* testingFileName = "/path/to/GunPoint_RandomShift_TEST.tsv";
const char* resultFileName="/path/to/EDSC_Features_GunPoint.txt";
const char* path="/path/to/GunPoint_Folder/";
const int NofClasses= 2;
const int Classes[]={1,2};
const int ClassIndexes[]={0,24}; // the data is sorted by class
const int ClassNumber[]={24,26};

Then, we edit line 105 in ByInstanceDP.cpp:

// ByInstanceDP.cpp, lines 105-112
int option=2; // 1: using thresholdAll, 2: using the KDE cut
int DisArrayOption=2; // 1: naive version, 2: fast version
int MaximalK=DIMENSION/2; // maximal length
int MinK=5;
double boundThrehold=3; // parameter of the Chebyshev's inequality
double recallThreshold=0;
double probablityThreshold=0.95;
int alpha=3;

to choose which EDSC to execute: 1 for EDSC-CHE and 2 for EDSC-KDE.

Finally, we run ByInstanceDP.cpp and collect the reported accuracy.
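
The ClassIndexes and ClassNumber constants in the EDSC DataSetInformation.h above depend on the training data being sorted by class. The following sketch shows how such values can be derived; the function names are hypothetical and the code is not part of EDSC.

```python
from collections import Counter


def sort_by_class(rows):
    """Stable sort of [label, x1, x2, ...] rows by class label."""
    return sorted(rows, key=lambda row: row[0])


def class_layout(sorted_labels):
    """Return (classes, start indexes, counts) for labels sorted by class."""
    counts = Counter(sorted_labels)
    classes = sorted(counts)
    indexes = []
    start = 0
    for label in classes:
        indexes.append(start)  # index of the first instance of this class
        start += counts[label]
    return classes, indexes, [counts[label] for label in classes]
```

For a training set with 24 instances of class 1 followed by 26 instances of class 2, class_layout returns ([1, 2], [0, 24], [24, 26]), which corresponds to the Classes, ClassIndexes, and ClassNumber values shown above.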

Rel. Class. & LDG Rel. Class.

We first edit loadDataset.m to add metadata for GunPoint_RandomShift:

% ... omitted ...

switch(lower(name))
    % ... omitted ...

    case 'gunpoint_randomshift'
        % Change paths below to the actual ones
        test = load('/path/to/GunPoint_RandomShift_TEST.tsv');
        ts_l = test(:,1);
        ts_d = test(:,2:end);
        train = load('/path/to/GunPoint_TRAIN.tsv');
        tr_l = train(:,1);
        tr_d = train(:,2:end);
    
    % ... omitted ...
end

Then we edit lines 4, 7, 8, and 10 in Run_all_experiments.m to set up parameters:

% Run_all_experiments.m, lines 3-12
%%%% User Inputs %%%%%%%%
dataset = 'GunPoint_RandomShift'; % 'GunPoint' for the original one
disp(['Loading ' dataset ' data']);

constraint_type = 'Naive';
pred_type = 'Corr';

use_LDG = 1;

%%%%%%%%%%%%%%%%%%%%%%%%%%

Note that line 10 determines which version to execute: 0 for Rel. Class. and 1 for LDG Rel. Class.

Finally, we run Run_all_experiments.m and collect the reported accuracy.

Section 5: Fig. 9

Fig. 9 in the paper
Fig. 9. (top) A typical example from GunPoint annotated to show where the discriminating region is. (bottom) The holdout classification error-rate of every prefix of the GunPoint data from lengths 20 to 150 (the full length of the data).

To produce Fig. 9, we first edit lines 131 and 132 in nibble_right_side.py:

# nibble_right_side.py, lines 129-133
# Parameters
NUM_K = 1
train_name = 'GunPoint/GunPoint_TRAIN.tsv' # path to file
test_name = 'GunPoint/GunPoint_TEST.tsv' # path to file
datasetname = 'GunPoint'

to point to the actual paths of the GunPoint dataset files. Then we can simply run

python nibble_right_side.py

to generate error-rate curves. The output files are:

  • gunpoint_r_nibble.svg, plots the holdout classification error-rate curves on both the training and testing datasets.
  • gunpoint_r_nibble.txt, stores all the datapoints of the two curves.

Fig. 9 (bottom) shows the reverse of the testing error-rate curve in the output.
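
The idea of a per-prefix holdout error-rate can be illustrated with a short sketch. This is not the code in nibble_right_side.py, whose internals are not reproduced on this page: it assumes a one-nearest-neighbor classifier with Euclidean distance on raw (not re-normalized) prefixes, and the function names are hypothetical.

```python
import math


def nearest_label(query, train_rows, prefix_length):
    """Label of the training row closest to `query` on the first `prefix_length` points."""
    best_label, best_dist = None, math.inf
    for row in train_rows:
        # Squared Euclidean distance; zip stops at the truncated query prefix.
        dist = sum((a - b) ** 2 for a, b in zip(query[:prefix_length], row[1:]))
        if dist < best_dist:
            best_label, best_dist = row[0], dist
    return best_label


def prefix_error_rates(train_rows, test_rows, min_length=20):
    """Map each prefix length to the fraction of misclassified test rows.

    Rows use the UCR layout [label, x1, x2, ...].
    """
    full_length = len(train_rows[0]) - 1
    rates = {}
    for prefix_length in range(min_length, full_length + 1):
        errors = sum(
            1
            for row in test_rows
            if nearest_label(row[1:], train_rows, prefix_length) != row[0]
        )
        rates[prefix_length] = errors / len(test_rows)
    return rates
```

With the GunPoint training and testing rows loaded, prefix_error_rates(train_rows, test_rows, min_length=20) yields one error-rate per prefix length from 20 to 150, which is the kind of curve plotted in Fig. 9 (bottom).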

Appendix B: 2nd Q&A

The results are presented in the presentation in section Slides.

To produce the results in the slides, we first run the Java code

java -jar "/path/to/TEASERTester.jar" "GunPoint (TRAIN) + RandomWalk (TEST)" "/path/to/GunPoint_TRAIN.tsv" "/path/to/RandomWalk_TEST.tsv" "/path/to/output.json"

to generate the early predictions of each snapshot on every exemplar in the RandomWalk testing dataset.

The pre-compiled TEASERTester.jar can be found in section Code: Appendix B: 2nd Q&A; GunPoint_TRAIN.tsv and RandomWalk_TEST.tsv can be found in section Data; /path/to/output.json determines where results will be written.

Then we run the MATLAB script TEASERResultPlotter.m (which can be found in section Code: Appendix B: 2nd Q&A):

TEASERResultPlotter("/path/to/output.json")

to draw the figures in the slides.

References

  1. Z. Xing et al., “Early Classification on Time Series,” Knowledge and Information Systems, vol. 31, no. 1, pp. 105-127, 2012.

  2. Z. Xing, “Zhengzheng Xing: PhD Research”.

  3. Z. Xing et al., “Extracting Interpretable Features for Early Classification on Time Series,” in Proc. SIAM Intl. Conf. Data Mining, 2011, pp. 247-258.

  4. N. Parrish et al., “Classifying with Confidence from Incomplete Information,” J. Machine Learning Research, vol. 14, no. 1, pp. 3561-3589, 2013.

  5. P. Schäfer and U. Leser, “TEASER: Early and Accurate Time Series Classification,” Data Mining and Knowledge Discovery, vol. 34, no. 5, pp. 1336-1362, 2020.
