Stock Forecasting with LSTM, Part 2: Keras Tuner


Introduction

As discussed in a previous blog post here, there have been attempts to predict stock outcomes (e.g. price, return, etc.) using recurrent neural networks, and more specifically LSTMs. LSTM stands for Long Short-Term Memory, a member of the recurrent neural network (RNN) family used for sequence data in deep learning. Unlike standard feedforward fully connected layers, RNNs, and LSTMs in particular, have feedback loops that enable them to retain information over time, often referred to as a memory capacity. The example in that post showed that the performance is subpar and cannot be used to efficiently predict the market. One way to improve it is to tune the hyperparameters of the network, such as the number of layers, activation functions, and regularization. This tutorial highlights the use of the Keras Tuner package to tune an LSTM network for time series analysis. Note that this is a technical tutorial and is not intended to guide people into buying stocks.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
import yfinance as yf
from yahoofinancials import YahooFinancials
%matplotlib inline

The first step is to download the data from Yahoo Finance; here we focus on the Apple stock (AAPL).

appl_df = yf.download('AAPL', 
                      start='2018-01-01', 
                      end='2020-12-31', 
                      progress=False)

appl_df.head()
                 Open       High        Low      Close  Adj Close     Volume
Date
2018-01-02  42.540001  43.075001  42.314999  43.064999  41.380238  102223600
2018-01-03  43.132500  43.637501  42.990002  43.057499  41.373032  118071600
2018-01-04  43.134998  43.367500  43.020000  43.257500  41.565216   89738400
2018-01-05  43.360001  43.842499  43.262501  43.750000  42.038452   94640000
2018-01-08  43.587502  43.902500  43.482498  43.587502  41.882305   82271200
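
Before moving on, a quick sanity check on the downloaded frame (a small sketch; the output will vary with the date range) shows the number of trading days, any missing values, and the actual date range returned:

print(appl_df.shape)                              # number of trading days and columns
print(appl_df.isna().sum())                       # missing values per column
print(appl_df.index.min(), appl_df.index.max())   # first and last dates in the data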

We can plot the opening price using pandas' built-in plotting function.

appl_df['Open'].plot(title="Apple's stock price")

Here we convert the stock price to daily returns, where the return on day t is (P_t − P_{t−1}) / P_{t−1} (this is what pct_change computes), and plot the result.

appl_df['Open']=appl_df['Open'].pct_change()
appl_df['Open'].plot(title="Apple's stock return")

From previous experience with deep learning models, we know that we have to scale our data for optimal performance. In our case, we’ll use Scikit-Learn’s StandardScaler, which standardizes the data to zero mean and unit variance (unlike a min-max scaler, it does not bound the values between zero and one).

sc = StandardScaler()
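
As a quick illustration (a toy sketch, not part of the pipeline), StandardScaler subtracts the mean and divides by the standard deviation, and the transformation can be reversed later with inverse_transform:

toy = np.array([[1.0], [2.0], [3.0], [4.0]])
toy_sc = StandardScaler()
scaled = toy_sc.fit_transform(toy)                 # subtract mean, divide by std
print(scaled.ravel())                              # roughly [-1.34, -0.45, 0.45, 1.34]
print(toy_sc.inverse_transform(scaled).ravel())    # recovers [1., 2., 3., 4.]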

Here we create a univariate pre-processing function that performs three steps for a given time series: scaling (with the StandardScaler defined above), creating lagged features, and splitting the data into train and test sets.

def preproc(data, lag, ratio):
    # keep only the first column (the daily returns) and drop missing values
    data = data.dropna().iloc[:, 0:1]
    Dates = data.index.unique()
    # scale the series with the StandardScaler defined above
    data.iloc[:, 0] = sc.fit_transform(data.iloc[:, 0].values.reshape(-1, 1)).ravel()
    # create shift-1 ... shift-(lag-1) lagged copies as features
    for s in range(1, lag):
        data['shift_{}'.format(s)] = data.iloc[:, 0].shift(s)
    # drop rows with missing lags; the lags are the features, 'Open' is the target
    X_data = data.dropna().drop(['Open'], axis=1)
    y_data = data.dropna()[['Open']]
    # chronological train/test split
    index = int(round(len(X_data) * ratio))
    X_data_train = X_data.iloc[:index, :]
    X_data_test = X_data.iloc[index + 1:, :]
    y_data_train = y_data.iloc[:index, :]
    y_data_test = y_data.iloc[index + 1:, :]
    return X_data_train, X_data_test, y_data_train, y_data_test, Dates

Then we apply the univariate pre-processing to the Apple data, using 24 lagged returns as features and a 90/10 train/test split:

a, b, c, d, e = preproc(appl_df, 25, 0.90)
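
A quick check of the resulting shapes (a sketch; the exact row counts depend on the downloaded date range) confirms that roughly 90% of the observations end up in the training set and that each row carries 24 lagged returns:

print(a.shape, b.shape)   # train and test feature sets, 24 lag columns each
print(c.shape, d.shape)   # train and test targets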

In order to run the models, the data should be transformed into NumPy arrays

a = a.values
b = b.values
c = c.values
d = d.values

and reshaped into the three-dimensional form (samples, timesteps, features) that the LSTM layer expects; here we use a single timestep with 24 lag features.

X_train_t = a.reshape(a.shape[0], 1, 24)   # (samples, timesteps=1, features=24)
X_test_t = b.reshape(b.shape[0], 1, 24)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import backend as K
import keras_tuner as kt
from keras_tuner.tuners import RandomSearch
from keras_tuner.engine.hyperparameters import HyperParameters

Here we define a simple Sequential model with two LSTM and two Dense layers

K.clear_session()
early_stop = EarlyStopping(monitor='loss', patience=1, verbose=1)
model = Sequential()
model.add(LSTM(12, input_shape=(1, 24), return_sequences=True))  # first LSTM returns the full sequence
model.add(LSTM(6))                                               # second LSTM returns only its final state
model.add(Dense(6))
model.add(Dense(1))                                              # single output: next-day scaled return
model.compile(loss='mean_squared_error', optimizer='adam')

and fit the model

model.fit(X_train_t, c,
          epochs=100, batch_size=1, verbose=1,
          callbacks=[early_stop])
Epoch 1/100
431/431 [==============================] - 2s 973us/step - loss: 0.9624
Epoch 2/100
431/431 [==============================] - 0s 947us/step - loss: 1.0980
Epoch 3/100
431/431 [==============================] - 0s 947us/step - loss: 0.9530
Epoch 4/100
431/431 [==============================] - 0s 945us/step - loss: 0.8867
Epoch 5/100
431/431 [==============================] - 0s 943us/step - loss: 0.8433
Epoch 6/100
431/431 [==============================] - 0s 942us/step - loss: 0.5886
Epoch 7/100
431/431 [==============================] - 0s 956us/step - loss: 0.6192
Epoch 8/100
431/431 [==============================] - 0s 973us/step - loss: 0.5257
Epoch 9/100
431/431 [==============================] - 0s 957us/step - loss: 0.4120
Epoch 10/100
431/431 [==============================] - 0s 946us/step - loss: 0.3625
Epoch 11/100
431/431 [==============================] - 0s 944us/step - loss: 0.3114
Epoch 12/100
431/431 [==============================] - 0s 943us/step - loss: 0.3296
Epoch 13/100
431/431 [==============================] - 0s 944us/step - loss: 0.2298
Epoch 14/100
431/431 [==============================] - 0s 945us/step - loss: 0.2337
Epoch 15/100
431/431 [==============================] - 0s 945us/step - loss: 0.2314
Epoch 16/100
431/431 [==============================] - 0s 942us/step - loss: 0.2489
Epoch 17/100
431/431 [==============================] - 0s 940us/step - loss: 0.2131
Epoch 18/100
431/431 [==============================] - 0s 943us/step - loss: 0.1688
Epoch 19/100
431/431 [==============================] - 0s 947us/step - loss: 0.1759
Epoch 20/100
431/431 [==============================] - 0s 974us/step - loss: 0.1767
Epoch 21/100
431/431 [==============================] - 0s 1ms/step - loss: 0.1603
Epoch 22/100
431/431 [==============================] - 1s 1ms/step - loss: 0.1526
Epoch 23/100
431/431 [==============================] - 0s 1ms/step - loss: 0.1666
Epoch 24/100
431/431 [==============================] - 0s 945us/step - loss: 0.1575
Epoch 00024: early stopping
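
Before turning to the tuner, we can get a baseline number for this hand-picked architecture on the held-out set (a quick sketch; since the model was compiled with only a loss, evaluate returns the test MSE on the scaled returns):

baseline_mse = model.evaluate(X_test_t, d, verbose=0)   # MSE on the scaled test returns
print(baseline_mse)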

This hand-tuned model is in contrast to the tuner approach, where ranges for the hyperparameters ("hp") are specified inside a model-building function that the tuner calls for each trial.

def build_model(hp):
    model = Sequential()
    # first LSTM layer: the number of units is a tunable hyperparameter
    model.add(LSTM(hp.Int('input_unit', min_value=32, max_value=128, step=32), return_sequences=True, input_shape=(1, 24)))
    # a tunable number of additional stacked LSTM layers, each with tunable units
    for i in range(hp.Int('n_layers', 1, 10)):
        model.add(LSTM(hp.Int(f'lstm_{i}_units', min_value=32, max_value=128, step=32), return_sequences=True))
    model.add(LSTM(6))
    # both Dropout layers reuse the same 'Dropout_rate' name, so they share one tuned value
    model.add(Dropout(hp.Float('Dropout_rate', min_value=0, max_value=0.5, step=0.1)))
    model.add(Dense(6))
    model.add(Dropout(hp.Float('Dropout_rate', min_value=0, max_value=0.5, step=0.1)))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
    return model

Thereafter the tuner object is defined; here we use RandomSearch, but other options such as Hyperband and Bayesian optimization are available as well.

tuner = kt.RandomSearch(
        build_model,
        objective='mse',            # metric to minimize, as added in model.compile
        max_trials=10,              # number of hyperparameter combinations to try
        executions_per_trial=3      # models trained per combination to average out noise
        )
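
For reference, switching to another search strategy only means changing the tuner constructor; a minimal sketch with Hyperband (the argument values shown are illustrative, not tuned) could look like:

tuner_hb = kt.Hyperband(
        build_model,
        objective='mse',
        max_epochs=20,   # maximum epochs any single model is trained for
        factor=3         # downsampling factor between Hyperband brackets
        )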

Instead of fitting the model directly, you run the search function on the tuner object; it accepts the same arguments as fit.

tuner.search(
        x=X_train_t,
        y=c,
        epochs=20,
        batch_size=128,
        validation_data=(X_test_t,d),
)
Trial 10 Complete [00h 00m 17s]
mse: 0.8778028885523478

Best mse So Far: 0.8115118543306986
Total elapsed time: 00h 02m 14s
INFO:tensorflow:Oracle triggered exit
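
Keras Tuner also keeps a log of all trials, which can be printed for a quick comparison of the sampled configurations:

tuner.results_summary(num_trials=10)   # hyperparameter values and scores for the best trials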

Once the hyperparameter search is done, it is possible to retrieve the best model

best_model = tuner.get_best_models(num_models=1)[0]
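
It is also possible to retrieve the winning hyperparameter values themselves, which is useful for rebuilding or retraining the model from scratch:

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)                        # e.g. number of layers, units per layer, dropout rate
# model = tuner.hypermodel.build(best_hp)    # rebuild a fresh model with these values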

We can also print information on the best model's architecture

best_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 1, 64)             22784     
_________________________________________________________________
lstm_1 (LSTM)                (None, 1, 64)             33024     
_________________________________________________________________
lstm_2 (LSTM)                (None, 1, 96)             61824     
_________________________________________________________________
lstm_3 (LSTM)                (None, 6)                 2472      
_________________________________________________________________
dropout (Dropout)            (None, 6)                 0         
_________________________________________________________________
dense (Dense)                (None, 6)                 42        
_________________________________________________________________
dropout_1 (Dropout)          (None, 6)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 7         
=================================================================
Total params: 120,153
Trainable params: 120,153
Non-trainable params: 0
_________________________________________________________________

as well as evaluate its performance. Here we use the same visualization approach as discussed in a previous blog post: a rolling prediction that feeds each new prediction back in as a lag feature, and a one-step-ahead prediction on the true test features.

ypredr = []
# start the rolling prediction from the first test sample
st = X_test_t[0].reshape(1, 1, 24)
tmp = st
ptmp = st
val = best_model.predict(st)
ypredr.append(val.tolist()[0])
for i in range(1, X_test_t.shape[0]):
    # shift the lag window: prepend the latest prediction, drop the oldest lag
    tmp = np.append(val, tmp[0, 0, 0:-1])
    tmp = tmp.reshape(1, 1, 24)
    ptmp = np.vstack((ptmp, tmp))
    val = best_model.predict(tmp)
    ypredr.append(val.tolist()[0])
plt.plot(ypredr, color="green", label="Rolling prediction")
plt.legend()
plt.show()

y_pred = best_model.predict(X_test_t)   # one-step-ahead predictions on the true test features
plt.plot(d, label="Real data")
plt.plot(y_pred, label="One point prediction")
plt.plot(ypredr, label="Rolling prediction")
plt.legend()
plt.show()
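
Finally, the predictions are on the standardized scale; to express them as actual daily returns one can invert the scaler fitted during pre-processing, and evaluate gives a single summary number on the test set (a quick sketch):

print(best_model.evaluate(X_test_t, d, verbose=0))             # [loss, mse] on the scaled test returns
y_pred_returns = sc.inverse_transform(y_pred.reshape(-1, 1))   # predictions back on the daily-return scale
y_true_returns = sc.inverse_transform(d.reshape(-1, 1))        # true test returns for comparison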