Stock Forecasting with LSTM
Introduction
As indicated in a previous blog post, time-series models are designed to predict future values based on previously observed values. In other words, the input is a signal (time series) defined by observations taken sequentially in time. However, time-series forecasting models such as ARIMA have their own limitations when it comes to non-stationary data (i.e. data whose statistical properties, such as the mean and standard deviation, are not constant but instead vary over time). An example of a non-stationary time series is a stock's price (not to be confused with its returns) over time.
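As a quick illustration (not part of the original pipeline), the Augmented Dickey-Fuller test from statsmodels makes this distinction concrete: on a price series it typically fails to reject the unit-root (non-stationarity) hypothesis, while on returns it typically rejects it. This sketch assumes a price series like the appl_df['Open'] column downloaded below.
from statsmodels.tsa.stattools import adfuller

prices = appl_df['Open'].dropna()        # raw prices: typically non-stationary
returns = prices.pct_change().dropna()   # returns: typically stationary
print('price p-value: ', adfuller(prices)[1])    # large p -> cannot reject unit root
print('return p-value:', adfuller(returns)[1])   # small p -> reject unit root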
As discussed in a previous blog post here, there have been attempts to predict stock outcomes (e.g. price, return, etc.) using time-series analysis algorithms, but the performance is subpar and cannot be used to efficiently predict the market. Note that this is a technical tutorial and is not intended as guidance for buying stocks.
LSTM
LSTM stands for Long Short-Term Memory, a member of the recurrent neural network (RNN) family used for sequence data in deep learning. Unlike standard feedforward, fully connected neural network layers, RNNs, and LSTMs in particular, have feedback loops that enable them to store information over a period of time, also referred to as a memory capacity.
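As a minimal, illustrative sketch (not from the original post), the snippet below shows the shape contract of a Keras LSTM layer: it consumes a batch of sequences and returns one hidden-state vector per sequence, which is the "memory" it has accumulated over the timesteps.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

toy = Sequential()
toy.add(LSTM(8, input_shape=(5, 3)))   # 5 timesteps, 3 features per step
x = np.random.rand(2, 5, 3)            # batch of 2 toy sequences
print(toy.predict(x).shape)            # (2, 8): one 8-unit state per sequence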
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
import yfinance as yf
from yahoofinancials import YahooFinancials
%matplotlib inline
The first step is to download the data from Yahoo Finance. We start with the Apple stock.
appl_df = yf.download('AAPL',
                      start='2018-01-01',
                      end='2019-12-31',
                      progress=False)
appl_df.head()
| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2018-01-02 | 42.540001 | 43.075001 | 42.314999 | 43.064999 | 41.380238 | 102223600 |
| 2018-01-03 | 43.132500 | 43.637501 | 42.990002 | 43.057499 | 41.373032 | 118071600 |
| 2018-01-04 | 43.134998 | 43.367500 | 43.020000 | 43.257500 | 41.565216 | 89738400 |
| 2018-01-05 | 43.360001 | 43.842499 | 43.262501 | 43.750000 | 42.038452 | 94640000 |
| 2018-01-08 | 43.587502 | 43.902500 | 43.482498 | 43.587502 | 41.882305 | 82271200 |
and plot it using pandas' plotting function:
appl_df['Open'].plot(title="Apple's stock price")
Here we convert the stock price to daily stock returns and plot them:
appl_df['Open'] = appl_df['Open'].pct_change()
appl_df['Open'].plot(title="Apple's stock return")
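For reference (a toy illustration, not from the original post), pct_change computes the simple return r_t = p_t / p_{t-1} - 1:
s = pd.Series([100.0, 110.0, 99.0])
print(s.pct_change().round(2).tolist())   # [nan, 0.1, -0.1]
Note that the first element is always NaN, which is why the pre-processing below calls dropna.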
From previous experience with deep learning models, we know that we have to scale our data for optimal performance. In our case, we'll use Scikit-Learn's MinMaxScaler to scale our dataset to numbers between zero and one.
sc = MinMaxScaler()
Here we create a univariate pre-processing function that performs three steps for a given time series: min-max scaling, creating lags, and splitting the data into train and test sets.
def preproc(data, lag, ratio):
    # keep the first column ('Open') and drop missing values
    data = data.dropna().iloc[:, 0:1]
    Dates = data.index.unique()
    # min-max scale the series to [0, 1]
    data.iloc[:, 0] = sc.fit_transform(data.iloc[:, 0].values.reshape(-1, 1)).ravel()
    # create the lagged columns shift_1 ... shift_{lag-1}
    for s in range(1, lag):
        data['shift_{}'.format(s)] = data.iloc[:, 0].shift(s)
    # the lags are the features, the current value is the target
    X_data = data.dropna().drop(['Open'], axis=1)
    y_data = data.dropna()[['Open']]
    # chronological train/test split
    index = int(round(len(X_data) * ratio))
    X_data_train = X_data.iloc[:index, :]
    X_data_test = X_data.iloc[index + 1:, :]
    y_data_train = y_data.iloc[:index, :]
    y_data_test = y_data.iloc[index + 1:, :]
    return X_data_train, X_data_test, y_data_train, y_data_test, Dates
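To make the lag construction concrete, here is a toy illustration (the values are arbitrary) of what the shift calls produce; after dropna, each row holds the current value plus its predecessors:
toy = pd.DataFrame({'Open': [1.0, 2.0, 3.0, 4.0]})
toy['shift_1'] = toy['Open'].shift(1)
toy['shift_2'] = toy['Open'].shift(2)
print(toy.dropna())
#    Open  shift_1  shift_2
# 2   3.0      2.0      1.0
# 3   4.0      3.0      2.0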
Then we apply the univariate pre-processing to the Apple data:
a, b, c, d, e = preproc(appl_df, 25, 0.90)
As a second ticker and an additional variable to improve our model's performance, we focus on the S&P 500 index ETF, SPY.
spy_df = yf.download('SPY',
                     start='2018-01-01',
                     end='2019-12-31',
                     progress=False)
spy_df.head()
Here we convert the stock price to daily stock returns and plot them:
spy_df['Open'] = spy_df['Open'].pct_change()
spy_df['Open'].plot(title="SPY's stock return")
Here we create a multivariate pre-processing function that restricts two time series to their common dates and then applies the same three steps of min-max scaling, lag creation, and train/test splitting to each.
def preproc2(data1, data2, lag, ratio):
    # restrict both series to the trading dates they have in common
    common_dates = list(set(data1.index) & set(data2.index))
    data1 = data1[data1.index.isin(common_dates)]
    data2 = data2[data2.index.isin(common_dates)]
    X1 = preproc(data1, lag, ratio)
    X2 = preproc(data2, lag, ratio)
    return X1, X2
Then we apply the multivariate pre-processing to both the SPY and Apple data:
dataLSTM = preproc2(spy_df, appl_df, 25, 0.90)
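For clarity (a sanity check, not part of the original pipeline): preproc2 returns a pair of tuples, dataLSTM[0] for SPY and dataLSTM[1] for Apple, each holding (X_train, X_test, y_train, y_test, dates).
# dataLSTM[0] -> SPY splits, dataLSTM[1] -> Apple splits
print(dataLSTM[1][0].shape)   # Apple X_train: (n_train, 24) with lag=25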
Here we load the necessary libraries for the deep learning model:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import keras.backend as K
from keras.callbacks import EarlyStopping
In order to run the models, the data should be transformed into NumPy arrays
a = a.values
b = b.values
c = c.values
d = d.values
and reshaped into the (samples, timesteps, features) format that the LSTM layer expects; here each sample is one timestep with 24 lag features
X_train_t = a.reshape(a.shape[0], 1, 24)
X_test_t = b.reshape(b.shape[0], 1, 24)
Here we define a simple Sequential model with two LSTM layers and two dense layers:
K.clear_session()
early_stop = EarlyStopping(monitor='loss', patience=1, verbose=1)
model = Sequential()
model.add(LSTM(12, input_shape=(1, 24), return_sequences=True))
model.add(LSTM(6))
model.add(Dense(6))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
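Optionally (not in the original post, but a standard Keras call), we can print a summary to verify how the (1, 24) input flows through the two LSTM layers:
model.summary()   # layer output shapes and parameter counts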
We train the model for up to 100 epochs, with early stopping:
model.fit(X_train_t, c,
          epochs=100, batch_size=1, verbose=1,
          callbacks=[early_stop])
Epoch 1/100
429/429 [==============================] - 2s 1ms/step - loss: 1.3043
Epoch 2/100
429/429 [==============================] - 0s 1ms/step - loss: 0.9467
...
Epoch 00029: early stopping
Here we create a rolling forecast that feeds each prediction back in as an input to predict the next value.
ypredr = []
# start from the first test sample
st = X_test_t[0].reshape(1, 1, 24)
tmp = st
ptmp = st
val = model.predict(st)
ypredr.append(val.tolist()[0])
for i in range(1, X_test_t.shape[0]):
    # prepend the latest prediction and drop the oldest lag
    tmp = np.append(val, tmp[0, 0, 0:-1])
    tmp = tmp.reshape(1, 1, 24)
    ptmp = np.vstack((ptmp, tmp))
    val = model.predict(tmp)
    ypredr.append(val.tolist()[0])
The plot below shows the rolling forecast, which bases each forecast on the 24 data points forecasted beforehand. This should be contrasted with the one-point forecast, which bases each forecast on the 24 data points observed beforehand.
plt.plot(ypredr,color="green", label = "Rolling prediction")
plt.legend()
plt.show()
y_pred = model.predict(X_test_t)
plt.plot(d, label = "Real data")
plt.plot(y_pred, label = "One point prediction")
plt.plot(ypredr, label = "Rolling prediction")
plt.legend()
plt.show()
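One caveat: both forecasts live on the min-max-scaled axis. A minimal sketch to map them back to raw returns, assuming sc is still the scaler fitted on the Apple series inside preproc:
y_pred_orig = sc.inverse_transform(y_pred)                            # one-point
ypredr_orig = sc.inverse_transform(np.array(ypredr).reshape(-1, 1))   # rolling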
Here we move to multivariate models. First, to run the models, the data should be transformed into NumPy arrays.
# dataLSTM[0] holds the SPY splits and dataLSTM[1] the Apple splits,
# since preproc2 was called as preproc2(spy_df, appl_df, ...)
Aa = dataLSTM[1][0].values   # Apple X_train
Ab = dataLSTM[1][1].values   # Apple X_test
Ac = dataLSTM[1][2].values   # Apple y_train (the target)
Ad = dataLSTM[1][3].values   # Apple y_test
X_train_A = Aa.reshape(Aa.shape[0], 1, 24)
X_test_A = Ab.reshape(Ab.shape[0], 1, 24)
Sa = dataLSTM[0][0].values   # SPY X_train
Sb = dataLSTM[0][1].values   # SPY X_test
Sc = dataLSTM[0][2].values   # SPY y_train
Sd = dataLSTM[0][3].values   # SPY y_test
X_train_S = Sa.reshape(Sa.shape[0], 1, 24)
X_test_S = Sb.reshape(Sb.shape[0], 1, 24)
Here we load the additional libraries needed for the multivariate deep learning model:
from keras.layers import concatenate
from keras.layers import Dropout
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Input
from keras.models import Model
import keras.backend as K
from keras.callbacks import EarlyStopping
Here we define a model with the Keras functional API, using two LSTM branches concatenated together, followed by two dense layers with dropout:
early_stop = EarlyStopping(monitor='loss', patience=1, verbose=1)
input1 = Input(shape=(1, 24))   # the 24 Apple lags
x1 = LSTM(6)(input1)
input2 = Input(shape=(1, 24))   # the 24 SPY lags
x2 = LSTM(6)(input2)
con = concatenate(inputs=[x1, x2])   # merge the two LSTM branches
x3 = Dense(50)(con)
x3 = Dropout(0.3)(x3)
output = Dense(1, activation='sigmoid')(x3)
n_net = Model(inputs=[input1, input2], outputs=output)
n_net.compile(loss='mean_squared_error', optimizer='adam')
and train the model for up to 100 epochs:
n_net.fit(x=[X_train_A, X_train_S], y=Ac, epochs=100, batch_size=1, verbose=1,
          callbacks=[early_stop])
Epoch 1/100
429/429 [==============================] - 0s 832us/step - loss: 0.7942
Epoch 2/100
429/429 [==============================] - 0s 808us/step - loss: 0.7825
...
Epoch 14/100
429/429 [==============================] - 0s 802us/step - loss: 0.7143
Epoch 00014: early stopping
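Before plotting, a quick sanity check (an addition, assuming the variable names above): the test-set MSE of the multivariate model.
test_mse = n_net.evaluate(x=[X_test_A, X_test_S], y=Ad, verbose=0)
print('test MSE:', test_mse)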
As with the univariate model, we contrast the one-point forecast, which bases each prediction on the 24 observed data points, with the rolling forecast, which bases each prediction on the 24 Apple data points forecasted beforehand (the SPY inputs remain the observed test values). First, the one-point forecast:
y_pred = n_net.predict([X_test_A,X_test_S])
plt.plot(Ad, label = "Real data")
plt.plot(y_pred, label = "One point prediction")
plt.legend()
plt.show()
ypredr = []
st = X_test_A[0].reshape(1, 1, 24)
sst = X_test_S[0].reshape(1, 1, 24)
tmp = st
ptmp = st
val = n_net.predict([tmp, sst])
ypredr.append(val.tolist()[0])
for i in range(1, X_test_A.shape[0]):
    # roll the Apple lags forward with the latest prediction,
    # but keep using the observed SPY lags
    tmp = np.append(val, tmp[0, 0, 0:-1])
    tmp = tmp.reshape(1, 1, 24)
    sst = X_test_S[i].reshape(1, 1, 24)
    ptmp = np.vstack((ptmp, tmp))
    val = n_net.predict([tmp, sst])
    ypredr.append(val.tolist()[0])
plt.plot(ypredr, color="green", label = "Rolling prediction")
plt.legend()
plt.show()
y_pred = n_net.predict([X_test_A,X_test_S])
plt.plot(Ad, label = "Real data")
plt.plot(y_pred, label = "One point prediction")
plt.plot(ypredr, label = "Rolling prediction")
plt.legend()
plt.show()