Fun with FIX protocol: parsing the CME secdef.dat file

Recently we received a request for a tutorial on splicing front-month futures contracts into one continuous series. This is a non-trivial task, so we decided to split it into several parts. The first step is to find a simple way to build a database of expiration dates. Luckily for us, FIX protocol data can solve this problem and many more.

The Financial Information eXchange (FIX) protocol is a messaging standard that underpins much of modern electronic trading: it gives many different financial systems a common language so they can communicate and work together. While most traders leave FIX to their developers, traders and quants who learn the protocol will find the payoff well worth the initial learning curve.

Exchanges publish a wealth of information in FIX format, the CME Group in particular (which includes the CME, CBOT, NYMEX, KCBOT, and COMEX). The CME releases a security definition (secdef) file containing information on just about every product traded on the exchange, including expiration dates.

For the purposes of this tutorial, we are only interested in a subset of the available FIX tags contained within the secdef file, specifically:

  • Tag 55 – Symbol: the underlying/root symbol for options, spreads, butterflies, and outrights
  • Tag 107 – Security Description: detailed information on the instrument
  • Tag 865 – Event Type (we are looking for entries equal to 7 = Last Eligible Trade Date)
  • Tag 866 – Event Date (when Tag 865 == 7, this is the final trading day)
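To make the tag=value format concrete, here is a minimal sketch of how a single FIX record breaks apart. Fields are separated by the SOH control character (\x01) and each field has the form tag=value. The sample record below is hypothetical and heavily abbreviated (a real secdef line carries many more tags, and the activation date shown is made up), and the helper name parse_fix_fields is our own. Returning ordered pairs instead of a dict preserves repeated tags such as 865/866, which sit inside a repeating group (tag 864, NoEvents) and can appear more than once per record.

```python
SOH = "\x01"  # FIX field delimiter (ASCII "Start of Heading")


def parse_fix_fields(message):
    """Split one raw FIX message into an ordered list of (tag, value) pairs."""
    fields = []
    for field in message.rstrip("\r\n").split(SOH):
        if not field:
            continue  # the trailing delimiter produces an empty string
        tag, sep, value = field.partition("=")  # split on the first '=' only
        if sep:  # ignore anything that is not tag=value
            fields.append((tag, value))
    return fields


# Hypothetical, abbreviated security definition (35=d) record, not real CME data
sample = SOH.join([
    "35=d", "55=ZB", "107=ZBM5",
    "864=2", "865=5", "866=20141219", "865=7", "866=20150619",
]) + SOH

pairs = parse_fix_fields(sample)
for tag, value in pairs:
    if tag in ("55", "107", "865", "866"):
        print(tag, value)
```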

The secdef file can be downloaded here:

ftp://ftp.cmegroup.com/fix/Production/secdef.dat.gz

The first step is to decompress the file, which is easy with the gunzip command-line utility on Linux:

gunzip secdef.dat.gz
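If you would rather skip the separate decompression step, Python's standard gzip module can read the compressed file directly. The sketch below is one way to do that; iter_secdef_lines is a hypothetical helper name, and decoding as ASCII (with undecodable bytes replaced) is an assumption about the file's contents. The self-check uses a tiny in-memory gzip stream rather than the real download.

```python
import gzip
import io


def iter_secdef_lines(source):
    """Yield lines from a gzipped secdef file (path or binary file object) without unpacking it to disk."""
    # "rt" = read in text mode; gzip decompresses on the fly
    with gzip.open(source, "rt", encoding="ascii", errors="replace") as f:
        for line in f:
            yield line.rstrip("\r\n")


# Self-check with a hypothetical two-record file held in memory
demo = io.BytesIO(gzip.compress(b"55=ZB\x01107=ZBH5\x01\n55=ZB\x01107=ZBM5\x01\n"))
lines = list(iter_secdef_lines(demo))

# With the real download you would write:
# for line in iter_secdef_lines("secdef.dat.gz"):
#     ...
```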

Now we can parse the secdef.dat file:

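import pandas as pd

# FIX tags we want to keep (see the list above)
tags = {'55', '107', '865', '866'}

rows = []
with open("secdef.dat") as f:
    for line in f:
        row_dict = {}
        # FIX fields are separated by the SOH control character (\x01)
        for element in line.rstrip('\r\n').split('\x01'):
            # split on the first '=' only; a value may itself contain '='
            tag, sep, val = element.partition('=')
            if sep and tag in tags:
                # note: if a tag repeats (e.g. several 865/866 events), the last value wins
                row_dict[tag] = val
        rows.append(row_dict)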
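One caveat with the loop above: tags 865 and 866 belong to a repeating group, so a single record can list several events (for example an activation date as well as the last eligible trade date), and a plain dict keeps whichever 866 value happens to come last. If you want to be sure you capture the date paired with 865 = 7 regardless of order, a minimal sketch like the following pairs each EventDate with the EventType just before it. The function name and the last_trade_date key are our own choices, not anything defined by the CME.

```python
SOH = "\x01"            # FIX field delimiter
WANTED = {"55", "107"}  # tags copied straight into the record


def parse_secdef_line(line):
    """Return symbol, description and last eligible trade date for one secdef record."""
    record = {}
    event_type = None  # most recent 865 (EventType) seen in this record
    for field in line.rstrip("\r\n").split(SOH):
        tag, sep, value = field.partition("=")
        if not sep:
            continue  # empty trailing field or malformed text
        if tag in WANTED:
            record[tag] = value
        elif tag == "865":
            event_type = value
        elif tag == "866":
            if event_type == "7":  # 7 = Last Eligible Trade Date
                record["last_trade_date"] = value
            event_type = None  # each 866 consumes the 865 before it
    return record
```

Used in place of the inner loop, this becomes rows = [parse_secdef_line(line) for line in f].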

Now rows holds one dictionary per security definition; we can turn it into a pandas DataFrame for easy manipulation and storage later on.

df = pd.DataFrame(rows)

print(df[df['55'] == 'ZB'])

This should print something like:

Index    107                  55   865  866
343092   ZBH5-ZBU5            ZB   7    20150320
353274   ZBH5                 ZB   7    20150320
367367   ZB:BF H5-M5-U5       ZB   7    20150320
369357   ZBM5-ZBU5            ZB   7    20150619
370995   IN:ZBH5L062515MAY30  ZB   7    20150320
372165   ZBM5                 ZB   7    20150619
373638   ZBH5-ZBM5            ZB   7    20150320
428531   ZBU5                 ZB   7    20150921
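For splicing, only the outright contracts matter, so the spreads, butterflies, and options in this output have to be filtered out. Here is a minimal sketch that works directly on the rows list of dicts and returns outrights sorted by last trade date. It relies on a heuristic assumption: outright descriptions contain no '-', ':' or spaces. That holds for the ZB rows copied from the output above but may not hold for every product; the standard FIX tag 167 (SecurityType), if your copy of the file carries it, may be a more reliable filter. outright_expirations is a hypothetical helper name.

```python
from datetime import datetime


def outright_expirations(rows, root):
    """Return [(contract, last_trade_date)] for the outrights of `root`, sorted by date."""
    expirations = {}
    for row in rows:
        desc = row.get("107", "")
        if row.get("55") != root or row.get("865") != "7":
            continue  # other product, or not a Last Eligible Trade Date event
        if any(ch in desc for ch in "-: "):
            continue  # heuristic: spreads, butterflies and options
        expirations[desc] = datetime.strptime(row["866"], "%Y%m%d").date()
    return sorted(expirations.items(), key=lambda item: item[1])


# Rows copied from the sample ZB output
sample_rows = [
    {"55": "ZB", "107": "ZBH5-ZBU5", "865": "7", "866": "20150320"},
    {"55": "ZB", "107": "ZBH5", "865": "7", "866": "20150320"},
    {"55": "ZB", "107": "ZB:BF H5-M5-U5", "865": "7", "866": "20150320"},
    {"55": "ZB", "107": "ZBM5-ZBU5", "865": "7", "866": "20150619"},
    {"55": "ZB", "107": "IN:ZBH5L062515MAY30", "865": "7", "866": "20150320"},
    {"55": "ZB", "107": "ZBM5", "865": "7", "866": "20150619"},
    {"55": "ZB", "107": "ZBH5-ZBM5", "865": "7", "866": "20150320"},
    {"55": "ZB", "107": "ZBU5", "865": "7", "866": "20150921"},
]

for contract, last_trade in outright_expirations(sample_rows, "ZB"):
    print(contract, last_trade.isoformat())
```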

Now you have the expiration dates, the first ingredient for splicing contracts. The next step is to store them in a database for later use (a pandas HDFStore, for example). Of course, to get real value from this data you should record it periodically, so that you build up a history of expiration dates.
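As a sketch of recording snapshots over time, the following uses Python's built-in sqlite3 module instead of an HDFStore (a swap made for portability, since HDFStore needs the PyTables package). Each run stores the rows with 865 = 7 under the date the file was downloaded, and the (snapshot_date, contract) primary key combined with INSERT OR IGNORE makes re-running the same day harmless. The table layout, store_snapshot, and the sample rows and snapshot dates are illustrative assumptions.

```python
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE IF NOT EXISTS expirations (
    snapshot_date TEXT NOT NULL,  -- day the secdef file was downloaded
    root          TEXT NOT NULL,  -- tag 55
    contract      TEXT NOT NULL,  -- tag 107
    last_trade    TEXT NOT NULL,  -- tag 866 where 865 == 7, as YYYYMMDD
    PRIMARY KEY (snapshot_date, contract)
)
"""


def store_snapshot(conn, rows, snapshot_date):
    """Insert one day's expiration dates; returns the number of candidate rows offered."""
    records = [
        (snapshot_date.isoformat(), row["55"], row["107"], row["866"])
        for row in rows
        if row.get("865") == "7" and all(k in row for k in ("55", "107", "866"))
    ]
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        # duplicates of an existing (snapshot_date, contract) pair are skipped
        conn.executemany("INSERT OR IGNORE INTO expirations VALUES (?, ?, ?, ?)", records)
    return len(records)


conn = sqlite3.connect(":memory:")  # use a file such as "expirations.db" in practice
sample_rows = [
    {"55": "ZB", "107": "ZBH5", "865": "7", "866": "20150320"},
    {"55": "ZB", "107": "ZBM5", "865": "7", "866": "20150619"},
]
store_snapshot(conn, sample_rows, date(2015, 1, 5))
store_snapshot(conn, sample_rows, date(2015, 1, 5))  # same-day re-run is ignored
count = conn.execute("SELECT COUNT(*) FROM expirations").fetchone()[0]
```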

In the next installment of this series, we will cover the next step in splicing futures contracts together into one continuous historical series.
