This repository contains an implementation of the Multi-WaveNet model for time series forecasting, applied here to intra-day Bitcoin exchange data. Separate WaveNet models are trained to forecast the prices on multiple exchanges, each conditioned on the others' histories. The WaveNets are then coupled at generation time to produce arbitrary out-of-sample dynamic forecasts. The code is a bit messy, and I don't have any plans to extensively refactor it, so use it as-is.
This code was written for Python 2.7. I believe it should also work with Python 3.6, but I haven't tested it. The package versions used can be found in requirements.txt.
Download the Bitcoin Historical Data and unzip it into the ./data folder. The data can be found here: Bitcoin Historical Data - Kaggle
As of writing this (Nov 2018), this data set includes 2 exchanges: Bitstamp and Coinbase. Unzipping the data should result in 2 CSV files.
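As a rough illustration of how the two CSV files can be combined, the sketch below extracts the close prices for one UTC date from each file and keeps only the minutes present on both exchanges. It assumes the Kaggle column names Timestamp (Unix seconds) and Close, with missing minutes recorded as NaN or left empty; the helper names load_close_prices and align_exchanges are hypothetical and are not part of this repository's code.

```python
import csv
import math
import time


def load_close_prices(lines, date):
    """Return {unix_timestamp: close_price} for one UTC date ("YYYY-MM-DD").

    lines: an open CSV file or any iterable of CSV lines with a header row.
    Assumes Kaggle-style columns "Timestamp" (Unix seconds) and "Close";
    rows whose Close is empty or NaN are skipped.
    """
    prices = {}
    for row in csv.DictReader(lines):
        ts = int(float(row["Timestamp"]))
        if time.strftime("%Y-%m-%d", time.gmtime(ts)) != date:
            continue
        try:
            close = float(row["Close"])
        except ValueError:  # empty or unparseable field
            continue
        if not math.isnan(close):
            prices[ts] = close
    return prices


def align_exchanges(*series):
    """Keep only timestamps present on every exchange.

    Returns (timestamps, rows) where rows[i][k] is exchange k's close
    price at timestamps[i].
    """
    shared = sorted(set.intersection(*(set(s) for s in series)))
    rows = [[s[ts] for s in series] for ts in shared]
    return shared, rows
```

Passing each opened CSV file together with a date such as 2018-10-10 yields one dict per exchange, which align_exchanges merges into rows of per-minute prices.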
The model can be run on a single date with:
python main.py --date 2018-10-10 --plot
where --plot saves a plot. See --help for additional hyperparameters and arguments. Playing around with the number of filters is interesting, but it quickly leads to overfitting. You can also access the TensorBoard logs with:
tensorboard --logdir=logs
Then visit localhost:6006 in your web browser.
The general architecture used for this implementation was taken from:
The original WaveNet paper:
Some code was also taken from the Fast WaveNet model:
The model was run on one day's data at a time, at 1-minute resolution. It was fit on data up to 6 pm and then generated a dynamic forecast for the rest of the day. A sample forecast for the most recent day in the data set, along with additional samples, can be found in the /images folder.
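To make the arithmetic of the daily split concrete: at 1-minute resolution a day has 1440 steps, so fitting up to 6 pm uses the first 1080 minutes and leaves 360 steps to forecast. The sketch below assumes the cutoff is 18:00 UTC (the README does not state the time zone) and that all timestamps fall on the same UTC day; split_at_cutoff and forecast_steps are hypothetical helpers, not functions from this repository.

```python
import time


def split_at_cutoff(timestamps, rows, cutoff_hour=18):
    """Split aligned rows into (fit, held_out) at cutoff_hour:00 UTC.

    Assumes all timestamps fall on the same UTC day. Rows stamped before
    the cutoff are used to fit the model; the rest are kept aside to
    compare against the dynamic forecast.
    """
    fit, held_out = [], []
    for ts, row in zip(timestamps, rows):
        if time.gmtime(ts).tm_hour < cutoff_hour:
            fit.append(row)
        else:
            held_out.append(row)
    return fit, held_out


def forecast_steps(cutoff_hour=18, minutes_per_day=24 * 60):
    """Number of 1-minute steps from the cutoff to the end of the day."""
    return minutes_per_day - cutoff_hour * 60
```

With the default arguments, forecast_steps() returns 360, the length of the dynamic forecast for a full day of data.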
The network is very close to what is described in (1). It uses a causal, conditional input layer with parameterized skip connections, followed by a stack of dilated convolutions (filter size 2, dilation 2^l for layer l = 0, 1, 2, ...), each with a residual connection similar to ResNet; the output is then generated by a 1x1 convolution. For this example, we use 7 convolution layers, which gives a receptive field of 1 + (1 + 2 + ... + 64) = 128 one-minute time steps, or about 2 hours.
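The receptive-field arithmetic and the causal, dilated structure can be checked with a toy single-channel sketch. It deliberately omits the gating, skip connections, conditioning and 1x1 output layer, so it only illustrates how stacking filter-size-2 convolutions with dilations 1, 2, 4, ... reaches back 128 steps without looking into the future. The function names are hypothetical and the layers are purely linear, unlike the actual model.

```python
def receptive_field(num_layers, filter_size=2):
    """Receptive field of stacked dilated causal convs with dilation 2**l."""
    return 1 + (filter_size - 1) * sum(2 ** l for l in range(num_layers))


def dilated_causal_conv(x, weights, dilation):
    """Filter-size-2 causal conv: y[t] = w0 * x[t - dilation] + w1 * x[t].

    Positions before the start of the sequence are treated as zeros
    (left padding), so y[t] never depends on x[t + 1], x[t + 2], ...
    """
    w0, w1 = weights
    return [w0 * (x[t - dilation] if t >= dilation else 0.0) + w1 * x[t]
            for t in range(len(x))]


def residual_stack(x, layer_weights):
    """Apply one dilated conv per layer (dilation 1, 2, 4, ...) plus a residual add."""
    for l, weights in enumerate(layer_weights):
        conv = dilated_causal_conv(x, weights, 2 ** l)
        x = [xi + ci for xi, ci in zip(x, conv)]
    return x
```

Feeding a unit impulse at t = 0 through 7 such layers produces nonzero outputs only for t = 0 to 127, matching receptive_field(7) == 128.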
The key to this implementation is that multiple WaveNet networks run in parallel, one for the price on each exchange included in the Bitcoin data. Each network is trained to predict the price on its own exchange from that exchange's past prices, conditioned on the past prices of the other exchanges. Because all historical data is known at training time, these networks can be trained independently of one another (asynchronously).
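One way to picture the per-exchange training data is a windowed framing: each network sees the recent history of every exchange and learns to predict only its own exchange's next price. The sketch below is an assumption for illustration (the actual model is a convolutional WaveNet, not a fixed-window regressor); make_training_pairs and training_sets are hypothetical names, and rows[t][k] is assumed to be exchange k's price at minute t.

```python
def make_training_pairs(rows, target, window):
    """Build (history, next_value) pairs for the network of exchange `target`.

    history:    the previous `window` rows; each row holds every exchange's
                price, so both the target's own past and the other
                exchanges' past (the conditioning inputs) are included.
    next_value: the target exchange's price at the following minute.
    """
    pairs = []
    for t in range(window, len(rows)):
        history = rows[t - window:t]
        pairs.append((history, rows[t][target]))
    return pairs


def training_sets(rows, window):
    """One independent training set per exchange; each can be fit separately."""
    num_exchanges = len(rows[0])
    return [make_training_pairs(rows, k, window) for k in range(num_exchanges)]
```

Since no training set depends on another network's output, the networks can be fit in any order or in parallel.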
At generation time, the networks are coupled: each model forecasts a single time step t+1, and these predictions are then fed back into every model as input for the next step, producing a dynamic forecast. By training a separate network for each feature, we can make arbitrary out-of-sample forecasts into the future while maintaining the multivariate time series.
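The coupled generation loop can be sketched independently of the network internals. Here each trained model is stood in for by a hypothetical callable that maps the most recent rows (one price per exchange) to its exchange's next price; this interface and the name coupled_forecast are assumptions, not the repository's API.

```python
def coupled_forecast(history, models, steps, window):
    """Dynamic multi-step forecast with one model per exchange.

    history: list of rows, each row holding every exchange's price.
    models:  models[k] is a callable mapping the last `window` rows to
             exchange k's next price (a hypothetical interface).
    Returns the list of forecast rows, one per step.
    """
    rows = [list(r) for r in history]  # copy so the caller's data is untouched
    forecasts = []
    for _ in range(steps):
        context = rows[-window:]
        # Every model predicts t+1 from the same shared context...
        next_row = [model(context) for model in models]
        # ...and the joint prediction is fed back to all models for the next step.
        rows.append(next_row)
        forecasts.append(next_row)
    return forecasts
```

For example, with two toy models where the first copies the second exchange's last price and the second adds 1 to the first exchange's last price, starting from [[0.0, 10.0]] the first three forecast rows are [10.0, 1.0], [1.0, 11.0] and [11.0, 2.0]: each model's output at one step becomes the other model's input at the next.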