How much data should I use to build a trading strategy?

High Frequency Trading is a young profession. At meetups, high frequency traders are likely to refer to the years prior to 2008 as “ancient history”. As a group, their attention spans might seem short, and HFT strategies resemble their creators to a startling degree. However, successful traders of any age remember their worst trades and the events leading up to big losses. This dichotomy of thought, using short term, high resolution datasets to “inform” trading models while keeping a long term human perspective watching over the loop, is a hallmark of successful HFT shops.

This is why our ears perked up when, about a week ago, fellow quant blogger Michael Harris asked “Are Historical Data Prior to 2009 Obsolete for Developing Trading Systems?” The article correctly points out that the answer depends on your context, e.g. are you a short term or a long term trader? Mr. Harris’ post examines some “short term” trade setups around the 2009 pivot point and compares the results. (Check it out; we’re not going to spoil the plot for you.)

That being said, “Short term vs long term” has as many interpretations as there are humans on the planet. One man’s short term is another man’s eternity. Thus we thought we would join the conversation and share our perspective from the short term context, where our “short term” is measured in microseconds and “long-term” is 24 hours.

In this world, data prior to two weeks ago might be obsolete, let alone 2009. 

Selecting the appropriate amount of data to use in the formulation of a trading model is a complicated question, even for experienced practitioners. If your target holding period is 5 minutes, does it really make sense to generate today’s hedge ratios using data from 2009? On the other hand, the knowledge of what happened to the market after 2:42pm ET on May 6th 2010 might save your strategy from ruin today!

This highlights two contrasting heuristics which guide our own modeling (warning: some of this might be regarded as heresy, but these rules were forged from the friction of melding theory and reality). First we divide a trading model’s parameters into two groups: A) Risk Management Parameters and B) everything else. Group A includes practical parameters that can save your job, such as position limits, volatility awareness, execution throttles, etc. The appropriate amount of data to use for optimizing any particular parameter then depends on which group it belongs to:

  1. Group B: The size of your dataset should be proportionate to your holding period. E.g. high frequency traders might use 1-2 weeks of high res data to generate hedge ratios, triggers, and general business logic. This is especially true for any parameters which rely on moving targets like correlation and volatility.
  2. Group A: The size of your dataset should include every bit of information YOU have ever consumed. You cannot offload these to traditional backtesting. Do you want your trading model in the market if a commercial plane hits a skyscraper? How do you incorporate into your quant model your knowledge of the 1907 credit event, or the fact that October 19th 1987 happened? These events certainly seem applicable any time the market's liquidity vanishes, producing what are figuratively called “swiss-cheese markets” because there are so many holes in the order book.
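To make the Group B heuristic concrete, here is a minimal sketch rather than our production method. It assumes the hedge ratio is a plain OLS slope between two return series, and it uses a hypothetical sizing rule (`lookback_bars`) in which the lookback grows linearly with the holding period. The data is a toy series in which the relationship between the two legs changes halfway through.

```python
from statistics import mean


def hedge_ratio(x, y):
    """OLS slope of y on x, i.e. cov(x, y) / var(x)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    if var == 0:
        raise ValueError("x has zero variance; hedge ratio is undefined")
    return cov / var


def lookback_bars(holding_period_bars, multiple):
    """Hypothetical sizing rule: lookback grows linearly with holding period."""
    return holding_period_bars * multiple


# Toy returns: the relationship between the two legs changes halfway through.
x_old = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0]
x_new = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0]
x = x_old + x_new
y = [2.0 * v for v in x_old] + [0.5 * v for v in x_new]

n = lookback_bars(holding_period_bars=2, multiple=3)  # 6 most recent bars
full_history_ratio = hedge_ratio(x, y)       # blends both regimes
recent_ratio = hedge_ratio(x[-n:], y[-n:])   # reflects only the current regime

print(full_history_ratio, recent_ratio)
```

In this toy case the full-history estimate sits between the old and new regimes and matches neither, which is the sense in which stale data can be worse than no data for moving targets like correlation.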

Risk management controls are your first defense against tail risk. Did the bid/ask spread just widen out to 500% of its normal amount? Did the market move more than 5% in the last 500 milliseconds? If your strategy was designed for these moments, trade on, trader! If not, save yourself some green and implement some risk limits while considering the following: if you’re trading all the time, sooner or later you will encounter a tail event with a probability approaching 1.0.
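The two questions above translate directly into pre-trade checks. What follows is a rough sketch under stated assumptions, not a complete risk system: the names `RiskLimits`, `spread_breached`, `move_breached` and `should_halt` are hypothetical, "normal" spread is assumed to be supplied by the caller, and the price move is measured as the high-low range inside the window. The last function illustrates the "probability of 1.0" remark, assuming (unrealistically) an independent tail probability `p` per trading period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    max_spread_multiple: float = 5.0  # spread at 500% of its normal amount
    max_move_fraction: float = 0.05   # 5% price move
    move_window_ms: int = 500         # measured over the last 500 milliseconds


def spread_breached(bid, ask, normal_spread, limits):
    """True if the quoted spread is at least the allowed multiple of normal."""
    return (ask - bid) >= limits.max_spread_multiple * normal_spread


def move_breached(ticks, now_ms, limits):
    """ticks: iterable of (timestamp_ms, price) with positive prices.

    True if the high-low range inside the window exceeds the limit.
    """
    start = now_ms - limits.move_window_ms
    window = [price for ts, price in ticks if start <= ts <= now_ms]
    if len(window) < 2:
        return False
    lo, hi = min(window), max(window)
    return (hi - lo) / lo > limits.max_move_fraction


def should_halt(bid, ask, normal_spread, ticks, now_ms, limits):
    """Stop trading if either limit is breached."""
    return (spread_breached(bid, ask, normal_spread, limits)
            or move_breached(ticks, now_ms, limits))


def prob_at_least_one_tail_event(p, n):
    """1 - (1 - p)**n, assuming independent periods with tail probability p."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    return 1.0 - (1.0 - p) ** n


limits = RiskLimits()
calm_ticks = [(0, 100.0), (400, 100.1)]
crash_ticks = [(0, 100.0), (200, 99.0), (400, 93.0)]

print(should_halt(100.00, 100.01, 0.01, calm_ticks, 450, limits))
print(should_halt(100.00, 100.10, 0.01, crash_ticks, 450, limits))
print(prob_at_least_one_tail_event(0.001, 10_000))
```

Under the independence assumption, even a one-in-a-thousand event per period becomes a near certainty after ten thousand periods, which is why these limits belong in the strategy from day one rather than after the first loss.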

Risk management parameters are more than a cheap hedge for exogenous tail risk; they can also save your strategy from itself. As GestaltU recently observed, it’s a dirty secret that eventually All Strategies “Blow Up”.

H/T to Quantocracy for providing the forum for this kind of quant-minded discussion.



