The Kalman Filter and Pairs Trading

Imagine this scenario: you are a statistical arbitrage trader at a prop desk or hedge fund. As such, you routinely hold an inventory of ETF exposure that you must hedge.

The previous night, you instructed your overnight traders to calculate the hedge ratios for a matrix of ETFs.

The next morning before the market opens, your junior traders eagerly present their results for your inspection. Liking what you see, you load the hedge ratios into your trading platform and wait for the open.

When the market first opens for trading, you re-balance your hedges according to the new ratios. Afterwards, you watch in horror as your hedges do not perform as expected. What went wrong?

Every good trader knows they have to adapt when conditions in the market change, so why do we demand otherwise from our trading models? The traders in our example relied on static hedge ratios to power their trading logic. As a result, they opened themselves up to what is known as parameter risk.

Updating your parameters as new information becomes available is one way to protect yourself from this under-appreciated trading risk. By far the most widely used model for accomplishing this in a trading scenario is the Kalman Filter. It is useful when you are dealing with a linear model such as pairs trading, which in its simplest form reduces to trading the residual of a linear regression:

{\bf Y}_{t} = {\boldsymbol \beta }_{t}*{\bf X}_{t} + {\bf e}_{t}

Where {\bf Y}_{t} is the current price of the first stock, {\bf X}_{t} is the current price of the second stock, {\boldsymbol \beta }_{t} is our current hedge ratio and {\bf e}_{t} is the current spread price we are trading. We could also estimate the hedge ratio using the log changes in X and Y, instead of their levels. This would be more likely to be the case in a High Frequency Trading scenario, where all we care about are price changes.
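To make the regression concrete, here is a minimal sketch of a static (non-adaptive) hedge ratio estimated by ordinary least squares. The prices are synthetic and the "true" beta and alpha are invented for illustration; the intercept plays the role of the alpha parameter that shows up in the model output later in this post.

```python
import numpy as np

# Synthetic prices, invented purely for illustration (not market data).
rng = np.random.default_rng(0)
x = 100.0 + np.cumsum(rng.normal(0.0, 1.0, 500))             # "stock X"
true_beta, true_alpha = 0.6, 5.0                             # assumed values
y = true_alpha + true_beta * x + rng.normal(0.0, 0.5, 500)   # "stock Y"

# Ordinary least squares fit of y = beta * x + alpha.
design = np.column_stack([x, np.ones_like(x)])
(beta_hat, alpha_hat), *_ = np.linalg.lstsq(design, y, rcond=None)

# The residual of the fit is the spread a pairs trader would watch.
spread = y - (beta_hat * x + alpha_hat)
```

This gives a single hedge ratio for the whole sample, which is exactly the kind of static parameter that got the traders in our story into trouble.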

The Kalman Filter allows us to vary the hedge ratio over time. For example, suppose we assume the hedge ratio follows a random walk, i.e.

{\boldsymbol \beta}_{t} = {\boldsymbol \beta}_{t-1} + {\bf w}_{t}

Where {\boldsymbol \beta}_{t} is the current state of the hedge ratio, {\boldsymbol \beta}_{t-1} is the last state and {\bf w}_{t} is random white noise with mean of zero and volatility {\boldsymbol \sigma}_{w}.

The Kalman Filter was designed for estimating the “hidden state” of a linear Gaussian model like Pairs Trading. The filter is based on a system of two equations:

Transition Equation: {\bf x}_{t+1} = {\bf A}_{t} {\bf x}_{t} + {\bf w}_{t}

Observation Equation: {\bf z}_{t} = {\bf H}_{t} {\bf x}_{t} + {\bf e}_{t}


  • {\bf x}_{t} is the current hidden state (e.g. our hedge ratio),
  • {\bf A}_{t} is the transition matrix (e.g. the identity matrix, \bf I ),
  • {\bf z}_{t} is the latest observation vector (e.g. the log change of stock Y),
  • {\bf H}_{t} is the latest observation matrix (e.g. the log change of stock X),
  • {\bf w}_{t}, {\bf e}_{t} are Gaussian white noise terms with mean zero and variances {\sigma}_{w}^{2}, {\sigma}_{e}^{2}.
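The two equations translate into a short predict/update loop. What follows is a minimal NumPy sketch under stated assumptions, not the implementation behind any particular library: the state is [beta, alpha], the transition matrix is the identity, the observation matrix is [x_t, 1], and the noise variances q and r are hand-picked to match a synthetic data set. The function name kalman_step and all parameter values are hypothetical.

```python
import numpy as np

def kalman_step(mean, cov, x_obs, y_obs, q, r):
    """One predict/update cycle for y = beta * x + alpha + noise.

    mean: state estimate [beta, alpha]; cov: its 2x2 covariance.
    q: transition noise variance (assumed equal for both states).
    r: observation noise variance.
    """
    # Predict: A is the identity, so the mean carries over and
    # the covariance grows by the transition noise.
    cov_pred = cov + q * np.eye(2)

    # Observation matrix H_t = [x_t, 1] maps the state to a predicted y.
    H = np.array([[x_obs, 1.0]])
    innovation = y_obs - (H @ mean)[0]        # forecast error
    S = (H @ cov_pred @ H.T)[0, 0] + r        # innovation variance
    K = (cov_pred @ H.T)[:, 0] / S            # Kalman gain

    # Update: move the state toward the new observation.
    mean_new = mean + K * innovation
    cov_new = cov_pred - np.outer(K, H[0]) @ cov_pred
    return mean_new, cov_new, innovation

# Synthetic data with a random-walk hedge ratio (illustrative only).
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(0.0, 1.0, n)                            # stand-in for log changes of X
beta_true = 1.0 + np.cumsum(rng.normal(0.0, 0.01, n))  # drifting hedge ratio
y = beta_true * x + rng.normal(0.0, 0.1, n)            # stand-in for log changes of Y

mean, cov = np.zeros(2), np.eye(2)
beta_path = np.empty(n)
for t in range(n):
    mean, cov, _ = kalman_step(mean, cov, x[t], y[t], q=1e-4, r=1e-2)
    beta_path[t] = mean[0]
```

After a short warm-up, beta_path should follow beta_true as it drifts. In practice q and r are not known in advance; their ratio controls how quickly the filter adapts to new data.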

Let’s look at a concrete example of the Kalman Filter in action to get a better understanding of how to create and use this model for pairs trading.

First, let’s import slicematrixIO and create our client, which will do the heavy lifting. Make sure to replace the api key below with your own key:

from slicematrixIO import SliceMatrix

# placeholder value: replace with your own api key
api_key = "YOUR_API_KEY"
sm = SliceMatrix(api_key)

Next, let’s import some useful Python modules such as Pandas, pandas-datareader, and NumPy:

import pandas as pd
from pandas_datareader import data as web
import datetime as dt
import numpy as np

In this example, we’ll use the Kalman Filter to estimate the hedge ratio for AAPL and SPY, so let’s go ahead and grab the price data for each symbol:

start = dt.datetime(2016, 1, 1)
end = dt.datetime(2017, 3, 27)
aapl = web.DataReader("AAPL", 'yahoo', start, end)[['Close']]
spy  = web.DataReader("SPY", 'yahoo', start, end)[['Close']]
combo = pd.concat([aapl, spy], axis = 1)

We can now feed the price data into our KalmanOLS pipeline to create a machine learning model which will 1) estimate the current hedge ratio and 2) allow us to update our hedge ratio as new price data becomes available.

Let’s create the Kalman Filter model and get the current state of the model.

kf = sm.KalmanOLS(dataset = combo)

Which should output something like:

{u'cov': [[0.005238177410697875, -1.2409605965399522],
  [-1.240960596539952, 294.9887852793271]],
 u'mean': [0.5861620741470888, 0.003405671636829966]}

The part we are really interested in is the “mean”, which contains our pairs trading model’s beta (hedge ratio) and alpha parameters, respectively.

Now suppose we observe two new prices for AAPL and SPY:

aapl_px = 139.34
spy_px = 237.97

We can now update our trading model by using the KalmanOLS’s update method:

kf.update(X = spy_px, Y = aapl_px)

Which will output the new state:

{u'cov': [[0.005244396950524588, -1.243807056418034],
  [-1.243807056418034, 295.98877717694285]],
 u'mean': [0.5855216995507875, 0.0034038488248975996]}
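One way to read these two states is through the spread they imply for the new prices. The sketch below is plain Python and makes no SliceMatrix calls. It copies the two “mean” vectors printed above and assumes the model is AAPL = beta * SPY + alpha, which follows from passing SPY as X and AAPL as Y. The helper name spread is hypothetical.

```python
# State means copied from the outputs above, ordered [beta, alpha].
prior_mean = [0.5861620741470888, 0.003405671636829966]
posterior_mean = [0.5855216995507875, 0.0034038488248975996]

aapl_px = 139.34
spy_px = 237.97

def spread(mean, x_px, y_px):
    """Residual of y = beta * x + alpha for a given state mean."""
    beta, alpha = mean
    return y_px - (beta * x_px + alpha)

# Spread implied by the state *before* it saw the new prices.
forecast_error = spread(prior_mean, spy_px, aapl_px)

# Spread implied by the state *after* the update.
post_update_residual = spread(posterior_mean, spy_px, aapl_px)
```

With these numbers the forecast error comes out at about -0.15 (AAPL printed roughly 15 cents below what the old hedge ratio implied), while the post-update residual is close to zero because the filter has just absorbed the new observation into beta.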

For a full working example of this process, please check out this Kalman Filter and Pairs Trading example Jupyter Notebook.

