
LiPCoT: Linear Predictive Coding based Tokenizer for Self-Supervised Learning of Time Series Data via BERT

LiPCoT (Linear Predictive Coding based Tokenizer for time series) is a novel tokenizer that encodes time series data into a sequence of tokens, enabling self-supervised learning of time series with existing language model architectures such as BERT.

Main Article: LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models (https://arxiv.org/abs/2408.07292)

  • Unlike traditional time series tokenizers that rely heavily on CNN encoders for feature generation, LiPCoT employs stochastic modeling through linear predictive coding to create a latent space for time series, providing a compact yet rich representation of the inherent stochastic nature of the data (a minimal LPC sketch follows this list).
  • LiPCoT is computationally efficient and can effectively handle time series data with varying sampling rates and lengths, overcoming common limitations of existing time series tokenizers.
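
To make the core idea concrete, here is a minimal sketch of LPC feature extraction via the Levinson-Durbin recursion. The function name, model order, and synthetic signal are illustrative only; the repository's notebooks implement the actual LiPCoT pipeline.

    import numpy as np

    def lpc_coefficients(x: np.ndarray, order: int) -> np.ndarray:
        """Fit an AR(order) model to x; return its monic LPC coefficients."""
        # Biased autocorrelation estimates r[0..order]
        r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0]
        for i in range(1, order + 1):
            # Reflection coefficient for this step of the recursion
            acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
            k = -acc / err
            new_a = a.copy()
            new_a[i] = k
            for j in range(1, i):
                new_a[j] = a[j] + k * a[i - j]
            a = new_a
            err *= (1.0 - k * k)  # prediction error shrinks at each order
        return a

    # Toy example: a noisy sinusoid, 250 samples
    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 250)) + 0.1 * rng.standard_normal(250)
    print(lpc_coefficients(x, order=8))

Each window of a time series yields one such coefficient vector, which serves as a compact stochastic description of that window.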

Citation

If you use this dataset or code in your research, please cite the following paper:

    @misc{anjum2024lipcot,
        title={LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models},
        author={Md Fahim Anjum},
        year={2024},
        eprint={2408.07292},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
    }

Dataset

We use an EEG dataset of 28 Parkinson's disease (PD) and 28 control participants.

  • The original dataset can be found at link. The data are in .mat format, and you need MATLAB to load them. (This is not needed unless you are interested in the original EEG data.)
  • The raw CSV dataset used for this repo can be found at link. Download it to run all steps in this repo.

How to run

  • If you want to run all steps:
    • Download the raw CSV dataset and place it in the data/raw folder
    • Run Steps 1-7
  • If you want to run only the BERT models, run Steps 4-6. There is no need to download the raw data, as the processed dataset is included in this repo.

Steps

1. Data Preparation

First, the data must be processed. The data_processing notebook loads the raw data and prepares the training, validation, and test datasets.
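
For orientation, a sketch of a subject-wise stratified split is shown below. The exact split strategy and file layout are defined in the notebook; the numbers here only mirror the 28 PD / 28 control cohort.

    import numpy as np
    from sklearn.model_selection import train_test_split

    subjects = np.arange(56)                  # 28 PD + 28 control participants
    labels = np.array([1] * 28 + [0] * 28)    # 1 = PD, 0 = control

    # Hold out a stratified test set, then carve a validation set from the rest
    train_subj, test_subj, y_train, y_test = train_test_split(
        subjects, labels, test_size=0.2, stratify=labels, random_state=42)
    train_subj, val_subj, y_train, y_val = train_test_split(
        train_subj, y_train, test_size=0.2, stratify=y_train, random_state=42)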

2. Tokenization via LiPCoT

The data_tokenizer notebook tokenizes the data using the LiPCoT model.
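
Conceptually, tokenization maps each window's LPC feature vector to a discrete token id. The k-means codebook below is an illustrative stand-in for the quantization step; the actual method is implemented in the notebook and described in the paper.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(features: np.ndarray, vocab_size: int = 64) -> KMeans:
        """Cluster LPC feature vectors; each cluster index becomes a token id."""
        return KMeans(n_clusters=vocab_size, n_init=10, random_state=0).fit(features)

    def tokenize(features: np.ndarray, codebook: KMeans) -> np.ndarray:
        return codebook.predict(features)

    # features: (num_windows, lpc_order) array, e.g. stacked lpc_coefficients outputs
    features = np.random.default_rng(0).standard_normal((1000, 8))
    codebook = build_codebook(features, vocab_size=64)
    token_ids = tokenize(features, codebook)  # one token per window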

3. Prepare tokenized dataset for BERT

The data_prepare notebook prepares the datasets for the BERT models. If you are downloading from GitHub, everything up to this step has already been done for you.
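
A minimal sketch of packaging token sequences with the Hugging Face datasets library is shown below; the field names and toy values are assumptions, and the exact format is produced by the notebook.

    from datasets import Dataset

    sequences = [[5, 12, 7, 33], [2, 9, 9, 41], [8, 8, 19, 3], [14, 2, 30, 6]]  # toy token ids
    labels = [1, 0, 1, 0]                                                       # 1 = PD, 0 = control

    ds = Dataset.from_dict({"input_ids": sequences, "labels": labels})
    ds = ds.train_test_split(test_size=0.25, seed=42)
    print(ds)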

4. Self-supervised learning via BERT

The pretrain_bert notebook conducts self-supervised pretraining of the BERT model.

If you are running the code with the data from GitHub, start with this step.
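
The sketch below shows the general shape of masked-language-model pretraining with Hugging Face transformers. The vocabulary size, model dimensions, masking rate, and special-token ids are illustrative; the notebook holds the actual settings.

    import torch
    from transformers import BertConfig, BertForMaskedLM

    config = BertConfig(vocab_size=69,         # assumed: 64-token codebook + special tokens
                        hidden_size=128, num_hidden_layers=4,
                        num_attention_heads=4, intermediate_size=256)
    model = BertForMaskedLM(config)

    input_ids = torch.randint(5, 69, (8, 32))  # toy batch of token sequences
    labels = input_ids.clone()
    mask = torch.rand(input_ids.shape) < 0.15  # mask ~15% of positions (BERT's usual rate)
    labels[~mask] = -100                       # loss is computed only on masked positions
    input_ids[mask] = 4                        # assumed [MASK] token id

    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()                            # one pretraining step (optimizer omitted)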

5. Classification task: with pretrained BERT

The finetune_bert notebook fine-tunes the pretrained BERT model for binary classification.
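
In outline, fine-tuning loads the pretrained encoder and attaches a two-class head; the checkpoint path below is a placeholder for wherever the pretraining step saved its weights.

    import torch
    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained(
        "path/to/pretrained_bert", num_labels=2)  # placeholder checkpoint path

    input_ids = torch.randint(5, 69, (4, 32))     # toy LiPCoT token batch
    labels = torch.tensor([1, 0, 1, 0])           # 1 = PD, 0 = control
    out = model(input_ids=input_ids, labels=labels)
    out.loss.backward()                           # one fine-tuning step (optimizer omitted)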

6. Classification task: without pretraining

The finetune_bert_without_pretrain notebook uses a randomly initialized BERT model and fine-tunes it for classification.
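
The no-pretraining baseline builds the same architecture from a config, so its weights are randomly initialized (dimensions assumed, as above):

    from transformers import BertConfig, BertForSequenceClassification

    config = BertConfig(vocab_size=69, hidden_size=128, num_hidden_layers=4,
                        num_attention_heads=4, intermediate_size=256, num_labels=2)
    model = BertForSequenceClassification(config)  # random weights, no pretraining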

7. Classification task: CNN-based architectures

  1. The cnn_classifier notebook uses the CNN model described in Oh et al. (2018)
  2. The deepnet_classifier notebook uses the Deep Convolutional Network described in Schirrmeister et al. (2017)
  3. The shallownet_classifier notebook uses the Shallow Convolutional Network described in Schirrmeister et al. (2017)
  4. The eegnet_classifier notebook uses EEGNet as described here (a generic sketch of such a classifier follows this list)
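
As a generic stand-in for these baselines, the PyTorch sketch below shows the overall shape of a 1-D CNN classifier for EEG windows; the cited papers and the notebooks define the actual layer configurations.

    import torch
    import torch.nn as nn

    class SimpleEEGCNN(nn.Module):
        """Illustrative 1-D CNN: conv blocks, global pooling, linear head."""
        def __init__(self, in_channels: int = 1, num_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(in_channels, 16, kernel_size=7, padding=3),
                nn.BatchNorm1d(16), nn.ReLU(), nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=7, padding=3),
                nn.BatchNorm1d(32), nn.ReLU(), nn.AdaptiveAvgPool1d(1))
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x).squeeze(-1))

    logits = SimpleEEGCNN()(torch.randn(4, 1, 250))  # (batch, channels, samples)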