PRCIS: Pattern Recognition Comparison in Series

We are happy to announce that PRCIS has been accepted to ICKG 2022.

Audrey Der, Chin-Chia Michael Yeh, Renjie Wu, Junpeng Wang, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh

emails: {ader003, rwu034}@ucr.edu, {miyeh, junpenwa, yazheng, zzhuang, liawang, wzhan}@visa.com, eamonn@cs.ucr.edu

Resources and Code

  • Paper (arXiv; PDF)

  • LINK TO REPOSITORY: Contains the codebase and subsets of data (when subsets were used).

Note: This is a supplementary website intended to be referenced in tandem with its corresponding paper, and is not meant to be used alone.

Note: "PRECIS" was the original spelling of the method's name. Any remaining instances of that spelling are leftovers from the renaming to "PRCIS".

Quickstart

Notation

In the paper, the dictionary parameters S and L denote the size of the dictionary and the length of the patterns within it, respectively. Because the codebase was written over an extended period of time, the variable names used for these parameters vary.

  • S may be referred to as NUMPAT ("number of patterns") for short.

  • L may be referred to as WINLEN ("window length"), CYCLELEN ("cycle length"), or something along the lines of "pattern length".

A Short Tutorial

The codebase uses Experiment objects as defined below for generating Yeh Dictionaries and calculating distance matrices (regardless of dictionary creation method).

class Experiment:
    def __init__(self, distmet, dict_settings, algyield=True, multivariate=False, downsamplefactor=1):
        self.distmet = distmet  # distance measure: "DTW", "ED", or "PRECIS"
        self.numpatt = dict_settings[0]  # S: number of patterns in the dictionary
        self.cyclelen = dict_settings[1]  # L: length of each pattern
        # whether to yield to the dictionary method, or to exclude any generated patterns not of exactly this length
        self.algyield = algyield
        # multivariate PRECIS extension; only used during the development of this work, not presented in the paper
        self.multivariate = multivariate
        # typically untouched; only used during the development of this work, not presented in the paper
        self.downsamplefactor = downsamplefactor

Here is a simple snippet that creates a Yeh Dictionary from each time series and then computes a PRECIS distance matrix:

exp = Experiment("PRECIS", [4, 150])  # S = 4 patterns, each of length L = 150
use_dicts = []
for ts in dataset:  # dataset: an iterable of time series
    # idxs is a list of (start, end) index tuples, one per pattern extracted from ts
    d, idxs = exp.make_exemplar(ts)
    use_dicts.append(d)
distmat = exp.distmat_from_dicts(use_dicts)

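The Yeh Dictionary creation method is called directly inside the class methods and is used automatically by make_exemplar. To use a different dictionary method, build the dictionaries without calling Experiment.make_exemplar; the distance matrix can still be computed from them, regardless of how they were created.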
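As a minimal sketch of swapping in a different dictionary method, the snippet below builds each dictionary from randomly chosen windows instead of calling Experiment.make_exemplar. The function random_window_dictionary, the toy dataset, the half-open (start, end) index convention, and the list-of-lists dictionary format are all assumptions made for illustration and are not part of the PRCIS codebase; the format that distmat_from_dicts actually expects should be checked against the repository.

```python
import random


def random_window_dictionary(ts, numpat, winlen, seed=0):
    """Hypothetical dictionary method: pick `numpat` random windows of
    length `winlen` from the time series `ts`."""
    if len(ts) < winlen:
        raise ValueError("time series is shorter than the window length")
    rng = random.Random(seed)
    n_starts = len(ts) - winlen + 1
    # Sample distinct start positions when possible; otherwise allow repeats.
    if numpat <= n_starts:
        starts = sorted(rng.sample(range(n_starts), numpat))
    else:
        starts = sorted(rng.choices(range(n_starts), k=numpat))
    # Assumption: (start, end) pairs are half-open, i.e. ts[start:end].
    idxs = [(s, s + winlen) for s in starts]
    patterns = [list(ts[s:e]) for s, e in idxs]
    return patterns, idxs


# Toy data standing in for `dataset` (not data from the paper).
dataset = [[float((i * k) % 7) for i in range(40)] for k in (1, 2, 3)]

use_dicts = []
all_idxs = []
for ts in dataset:
    d, idxs = random_window_dictionary(ts, numpat=4, winlen=10)
    use_dicts.append(d)
    all_idxs.append(idxs)

# With the real codebase, the dictionaries would then be passed on, e.g.:
# distmat = Experiment("PRECIS", [4, 10]).distmat_from_dicts(use_dicts)
```

Keeping the return value in the same (dictionary, idxs) shape as make_exemplar means the rest of the tutorial loop does not need to change.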


Clustering

Note: Figure placement indicators may not be accurate when viewed on a mobile device.

Note: Rival methods not pictured on this website can be viewed in their notebooks in the GitHub repository.

  • OPSD_CLU.ipynb: (left, top) OPSD: two-month snippets of the electrical power demand data from four randomly selected countries in Europe.

    • Includes:

      • (Not Pictured) OPSD Random Day Strawman

  • WeAllWalk_CLU.ipynb: (left, bottom) We-All-Walk Dendrogram.

    • Includes:

      • Catch22:

        • All features

        • (Not Pictured) FS features (as determined during classification)

      • (Not Pictured) Random Non-Obvious Holiday

  • TaipeiMRT_CLU.ipynb: (left, middle) Taipei MRT Clustering

  • NASAMill_CLU.ipynb: (right, top) NASA Mill Dataset

    • (Not Pictured) k-shape

    • (Not Pictured) Folder of Results: Cluster by Period

  • (bottom, right) Due to the sensitive nature of the data, we cannot share the dataset or code used to generate the Business Merchant figures at this time. We thank you for your understanding.
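As a rough illustration of how a pairwise distance matrix can be turned into clusters, the sketch below runs a small average-linkage agglomerative clustering in plain Python. It is not the code used in the notebooks listed above: the function average_linkage_clusters, the choice of average linkage, the list-of-lists matrix format, and the toy distances are all assumptions made for illustration.

```python
def average_linkage_clusters(distmat, k):
    """Repeatedly merge the two closest clusters (average linkage)
    until only `k` clusters remain. `distmat` is a square, symmetric
    list of lists of pairwise distances."""
    n = len(distmat)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of series")
    clusters = [[i] for i in range(n)]  # start with one cluster per series

    def avg_dist(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(distmat[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy distance matrix: series 0 and 1 are close, as are series 2 and 3.
distmat = [
    [0.0, 1.0, 9.0, 8.0],
    [1.0, 0.0, 7.0, 9.0],
    [9.0, 7.0, 0.0, 2.0],
    [8.0, 9.0, 2.0, 0.0],
]
clusters = average_linkage_clusters(distmat, k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Recording the order of the merges, rather than stopping at k clusters, is what a dendrogram visualizes.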

Links to Papers and/or Datasets

Classification

Please see the paper for the table of results.

Datasets:

Anomaly Detection

  • BVD_AD.ipynb: PRECIS and Anomaly Detection

Links to Datasets Used

Note: The MATLAB implementation of Telemanom was used to produce the following runs.