**Historical search volume presents a wealth of information**. Recently, we presented a way to use Google Trends to predict stock market direction, looked at the correlation between searches for the term “stock market” and the value of the VIX, and used searches for “farmville” to value ZNGA post-IPO.

Today we want to present a back-of-the-envelope relationship we came across while looking at the historical search volume for terms related to Tesla Motors (TSLA). We extracted the historical search time series with the Python class *gtrends*.

When we plotted TSLA’s realized volatility (rolling 20-day window, not annualized) against search volume for the term “tsla”, we saw an interesting relationship: the **current level of search volume is significantly correlated with realized volatility**, with a correlation coefficient of 0.44.
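The post does not spell out how the rolling 20-day realized volatility was computed. A common choice, and the assumption in this minimal sketch, is the standard deviation of daily log returns over a 20-day window, left unannualized. The function name `realized_vol` and the random price and search series are hypothetical stand-ins for the real TSLA data, so the printed correlation will not reproduce the 0.44 figure.

```python
import numpy as np
import pandas as pd

def realized_vol(close: pd.Series, window: int = 20) -> pd.Series:
    """Rolling standard deviation of daily log returns (not annualized)."""
    log_ret = np.log(close).diff()
    return log_ret.rolling(window).std()

# Hypothetical data: a random-walk price series and a search-volume series
rng = np.random.default_rng(0)
dates = pd.bdate_range("2014-01-01", periods=250)
close = pd.Series(200 * np.exp(np.cumsum(rng.normal(0, 0.02, 250))), index=dates)
search = pd.Series(rng.integers(20, 100, 250), index=dates)

rv = realized_vol(close)
# Drop the warm-up rows where the 20-day window is not yet full
df = pd.DataFrame({"RV": rv, "tsla": search}).dropna()
print(df["RV"].corr(df["tsla"]))  # Pearson correlation coefficient
```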

It would seem that **increases in TSLA’s share price volatility occur simultaneously with bursts of search volume**.

We wanted to test the relationship for possible cointegration. Intuitively, it seems reasonable for causation to flow both ways, for example:

- **high volatility** might create more demand for information from investors and traders, leading to higher search volume, and
- **higher search volume** might indicate an intent to act by investors and traders, leading to more volatility in the future.

Thus there could be an equilibrium relationship between search volume and realized volatility, which could be **valuable to traders looking to time their entries into and exits from TSLA**.
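The second direction (search volume leading volatility) suggests looking at correlations at different lags, not just on the same day. The sketch below uses a hypothetical helper, `lead_lag_corr`, to correlate today's search volume with realized volatility `lag` days later (negative lags look at past volatility). The data is synthetic and built so that volatility trails search volume by two days; a lagged correlation like this hints at timing but does not by itself establish causation.

```python
import numpy as np
import pandas as pd

def lead_lag_corr(search: pd.Series, rv: pd.Series, max_lag: int = 5) -> pd.Series:
    """Correlation of search volume today with realized volatility `lag` days later."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        # shift(-lag) moves RV from day t + lag onto day t; NaN pairs are ignored by corr
        out[lag] = search.corr(rv.shift(-lag))
    return pd.Series(out, name="corr")

# Synthetic data: RV responds to search volume with a two-day delay plus noise
rng = np.random.default_rng(1)
search = pd.Series(rng.normal(50, 10, 300))
rv = 0.001 * search.shift(2) + rng.normal(0, 0.002, 300)

cc = lead_lag_corr(search, rv)
print(cc.idxmax())  # lag with the strongest correlation for this synthetic data
```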

One way to test for cointegration is the Engle-Granger two-step approach: run a linear regression with realized volatility as the dependent variable and search volume as the independent variable, then use an Augmented Dickey-Fuller (ADF) test to check whether the residuals are stationary.

Suppose our data (search volume plus realized volatility) is contained in a CSV file named “tsla.csv”. We can load it into a pandas DataFrame with:

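```python
import pandas as pd

# Load the daily search-volume / realized-volatility data
tsla = pd.read_csv("tsla.csv")

# Rename the columns; 'tsla' holds search volume and 'RV' realized volatility
tsla.columns = ["date", "x", "m", "s", "tsla", "RV"]
```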
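If the first column of `tsla.csv` holds dates and the rolling volatility leaves blank rows at the start of the sample, a slightly more defensive load may help. This sketch assumes ISO-formatted dates in the first column; the in-memory CSV text is made up, and `x`, `m`, `s` are filled with placeholder zeros because the post does not say what they contain.

```python
import io
import pandas as pd

# Made-up sample with the same six-column layout; the first RV value is blank
csv_text = """date,x,m,s,tsla,RV
2015-01-02,0,0,0,55,
2015-01-05,0,0,0,61,0.031
2015-01-06,0,0,0,58,0.029
"""

tsla = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])
tsla.columns = ["date", "x", "m", "s", "tsla", "RV"]
# Index by date and drop rows where either series is missing
tsla = tsla.set_index("date").dropna(subset=["tsla", "RV"])
print(tsla[["tsla", "RV"]])
```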

`tsla['RV']` contains the realized volatility and `tsla['tsla']` contains the search volume. Now we can perform a linear regression using statsmodels:

```python
import statsmodels.formula.api as smf

# Ordinary least squares: realized volatility regressed on search volume
model = smf.ols(formula="RV ~ tsla", data=tsla)
res = model.fit()
print(res.summary())
```

which prints the regression summary table.

Both the intercept and the coefficient for search volume are significant. Now we can extract the model residuals and test them for stationarity:

```python
import statsmodels.tsa.stattools as sms

# ADF test on the regression residuals; element [1] of the result is the p-value
adf = sms.adfuller(res.resid)
print(adf[1])
```

which will produce something like:

```
0.00059748187021358898
```

This is the p-value of the ADF test, and it is well under the 0.05 threshold we like to use. By this measure we can cautiously conclude that search volume and realized volatility are cointegrated, but, as we said earlier, these results are preliminary and require further investigation before being used in production.

You can learn to use the *gtrends* class yourself by reading our forthcoming eBook, Intro to Social Data for Traders, which is available for pre-order now. The **release date** is only 9 days away, on **February 26th, 2015**, so pre-order now to make sure you get your copy upon its release.

