Timing TSLA with Google Trends

Historical search volume presents a wealth of information. Recently, we presented a way to use Google Trends to predict stock market direction, looked at the correlation between searches for the term “stock market” and the value of the VIX, and used searches for “farmville” to value ZNGA post-IPO.

Today we want to present a back-of-the-envelope relationship we came across while looking at the historical volume for search terms related to Tesla Motors (TSLA). We extracted the historical search time series data with the Python class gtrends.

When we looked at a plot of TSLA’s realized volatility (rolling 20-day, not annualized) against search levels for the term “tsla”, we saw an interesting relationship. The current level of search volume is significantly correlated with realized volatility (correlation of 0.44).

It would seem that increases in TSLA’s share price volatility occur simultaneously with bursts of search volume.
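As a rough illustration of the quantities involved, here is a minimal sketch of how a rolling 20-day realized volatility and its correlation with search volume might be computed. It assumes realized volatility means the rolling standard deviation of daily log returns, and that both series are aligned on the same daily dates. The data is synthetic, and the names realized_vol, close and search are hypothetical stand-ins rather than anything from our actual dataset.

```python
import numpy as np
import pandas as pd


def realized_vol(close: pd.Series, window: int = 20) -> pd.Series:
    """Rolling standard deviation of daily log returns (not annualized)."""
    log_ret = np.log(close).diff()
    return log_ret.rolling(window).std()


# Synthetic stand-in data: a random-walk price and random search levels.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2014-01-01", periods=250)
close = pd.Series(200 * np.exp(np.cumsum(rng.normal(0, 0.02, 250))), index=dates)
search = pd.Series(rng.uniform(20, 100, 250), index=dates)

rv = realized_vol(close)

# Drop the warm-up rows where the rolling window is not yet full,
# then compute the Pearson correlation between the two series.
df = pd.DataFrame({"RV": rv, "search": search}).dropna()
corr = df["RV"].corr(df["search"])
print(f"{len(df)} observations, correlation = {corr:.2f}")
```

Because the synthetic series are independent, the correlation printed here carries no meaning; the point is only the mechanics of the calculation.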

We wanted to test the relationship for possible cointegration. Just from an intuitive standpoint, it would seem reasonable for causation to flow both ways, for example:

  • high volatility might create more demand for information from investors / traders leading to higher search volume, and
  • higher search volume might be indicative of an intent to act from investors / traders, leading to more volatility in the future

Thus there could be an equilibrium relationship between search volume and realized volatility. This could be valuable to traders looking to time their entries and exits into TSLA.

One way to test for cointegration is to perform a linear regression where realized volatility is the dependent variable and search volume is the independent variable. Then we test the residuals to see if they are stationary using an Augmented Dickey-Fuller test (ADF).

Suppose our search data is contained within a CSV file named “tsla.csv”. We can load it into a pandas dataframe with:

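import pandas as pd
# Load the search and volatility data
tsla = pd.read_csv("tsla.csv")
# Rename the columns: 'tsla' is search volume, 'RV' is realized volatility
tsla.columns = ['date', 'x', 'm', 's', 'tsla', 'RV']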

tsla[‘RV’] contains the realized volatility and tsla[‘tsla’] contains search volume. Now we can perform a linear regression using statsmodels:

import statsmodels.formula.api as smf
# Regress realized volatility on search volume
model = smf.ols(formula="RV ~ tsla", data=tsla)
res = model.fit()
res.summary()

which should produce the OLS regression summary table.

Both the intercept and the coefficient for search volume are significant. Now we can extract the model residuals and test them for stationarity:

import statsmodels.tsa.stattools as sms
adf = sms.adfuller(res.resid)
adf[1]

which will produce something like:

0.00059748187021358898

This is the p-value for the ADF test, and it is well under the 0.05 threshold we like to use. By this measure we can cautiously conclude that search volume and realized volatility are cointegrated, but as we said earlier, these results are preliminary and require further investigation before going into production.


You can learn to use the gtrends class yourself by reading our forthcoming eBook, Intro to Social Data for Traders, which is available for pre-order now. The release date is only 9 days away, on February 26th, 2015, so pre-order now to make sure you get your copy upon its release.
