Given yesterday’s post showing the link between trading volume and stock valuation, it seems natural to investigate ways to **predict today’s volume**. It’s all well and good to know that high volume coincides with volatility and falling prices, but **how do we know we are going to be in a high-volume period?**

It would be nice if we could quickly identify which trading days are likely to be high volume in real time. As such, we wanted to test how well the daily volume could be predicted using just the first minute’s trading volume after the 9:30am open.

**The first chart plots the opening minute’s trading volume for SPY against the rest of the trading volume for the day** (i.e. total daily volume minus first minute’s volume). As we can see there is a decently **high correlation between the first minute’s volume and the rest of the day’s volume (0.52)**, but the relationship is hard to capture with a simple linear regression. The wedge shape of the scatter plot — complete with increasing volatility around the mean, a classic sign of heteroskedasticity — suggests that we might use another technique to describe the relationship: the Clayton Copula.
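The wedge-shaped relationship can be illustrated in a few lines. The data below are synthetic stand-ins (the SPY series itself isn’t reproduced here), but the two checks — a Pearson correlation and a comparison of residual spread between the upper and lower halves of the opening-minute range — are the same ones you would run on the real series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real SPY series: rest-of-day volume whose
# dispersion grows with the opening minute's volume (heteroskedasticity)
open_vol = rng.lognormal(mean=13.5, sigma=0.6, size=2000)   # ~first-minute shares
rest_vol = 60 * open_vol + rng.normal(0, 25 * open_vol)     # noise scales with x

# Pearson correlation between the opening minute and the rest of the day
corr = np.corrcoef(open_vol, rest_vol)[0, 1]

# Crude heteroskedasticity check: residual spread in the top vs bottom half of x
resid = rest_vol - np.polyval(np.polyfit(open_vol, rest_vol, 1), open_vol)
lo = open_vol < np.median(open_vol)
hi = open_vol >= np.median(open_vol)
spread_ratio = resid[hi].std() / resid[lo].std()   # >> 1 means a widening wedge

print(round(corr, 2), round(spread_ratio, 1))
```

On the synthetic series the spread ratio comes out well above 1, which is exactly the fan shape visible in the scatter plot.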

We first highlighted copulas when we showed how to simulate correlated random walks with a Gumbel copula. Again we will make use of the ambhas package for Python. **Given today’s first-minute trading volume, we can use a copula to describe the conditional distribution of the total daily trading volume**. We estimated copulas for 8 different securities, including ETFs and listed stocks (using intraday data from 2013 onward). To test the realism of the estimated relationships, we **plotted the actual results against a simulated dataset using the copula parameters**:
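For readers without ambhas installed, the fit-then-simulate check can be sketched from scratch on synthetic data. The formulas below are the standard Clayton ones — theta recovered from Kendall’s tau, and simulation via the inverse of the conditional copula — not the ambhas internals:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)

# Synthetic volumes standing in for the real intraday data
x = rng.lognormal(13.5, 0.6, 3000)             # first-minute volume
y = 60 * x * rng.lognormal(0, 0.4, 3000)       # rest-of-day volume

# Fit: Clayton theta from Kendall's tau, theta = 2*tau / (1 - tau)
tau, _ = kendalltau(x, y)
theta = 2 * tau / (1 - tau)

# Simulate via the inverse conditional copula: draw u, w ~ U(0,1), then
# v = ((w**(-theta/(1+theta)) - 1) * u**(-theta) + 1) ** (-1/theta)
n = 3000
u, w = rng.uniform(size=n), rng.uniform(size=n)
v = ((w ** (-theta / (1 + theta)) - 1) * u ** (-theta) + 1) ** (-1 / theta)

# Map the uniforms back through the empirical quantiles (the marginal EDFs)
x_sim = np.quantile(x, u)
y_sim = np.quantile(y, v)

# The simulated cloud should reproduce the dependence of the actual data
sim_tau, _ = kendalltau(x_sim, y_sim)
print(round(tau, 2), round(sim_tau, 2))
```

Scattering `(x_sim, y_sim)` next to `(x, y)` is the realism check described above: if the copula family fits, the two clouds share the same wedge shape.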

Once we have estimated the parameters for each copula relationship, we can use them to **predict the total volume for the day from the first minute’s trading volume.**

Suppose we have the volume data for SPY saved in a pandas dataframe called *df*, where *df['open_volume']* is the first minute’s volume and *df['rest_volume']* is the total trading volume for the remainder of the day. We can create the copula with the following code:

**from ambhas.copula import Copula**

**copula = Copula(df['open_volume'], df['rest_volume'], "clayton")**

Now suppose we observe that 1.1 million shares traded in SPY in the first minute (actual data from today) and we want to make a prediction at 9:31am for how much more volume will have traded by 4:00pm. We can do this with the *estimate* function:

**est = copula.estimate(1100000)**

**print(est)**

which produces the following:

**(array(105502144.80198874), array(30314961.556911916), array(80369536.36541207), array(128374067.47211906))**

The first value in the tuple is the mean of the prediction, i.e. given 1.1 million shares traded in the first minute we would predict around 106.6 million shares for the full day (1.1 million plus 105.5 million). The second value is the standard deviation of the prediction. The final two numbers are the lower and upper bounds of the interquartile range (IQR).
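To make the mechanics concrete, here is a sketch of the conditional-quantile math a Clayton copula implies. The theta, the percentile, and the historical volumes below are made-up illustrative values — not the fitted parameters behind the output above:

```python
import numpy as np

# Hypothetical inputs for illustration only
theta = 1.8                                    # assumed Clayton dependence parameter
hist = np.sort(np.random.default_rng(2).lognormal(18.3, 0.35, 5000))  # rest-of-day history

def conditional_quantile(u, w, theta):
    """w-th quantile of V given U = u under a Clayton copula, obtained by
    inverting the conditional CDF C(v|u) = u**(-theta-1) * (u**-theta + v**-theta - 1)**(-1/theta - 1)."""
    return ((w ** (-theta / (1 + theta)) - 1) * u ** (-theta) + 1) ** (-1 / theta)

u = 0.40                                       # today's first-minute volume percentile
qs = [float(np.quantile(hist, conditional_quantile(u, w, theta)))
      for w in (0.25, 0.50, 0.75)]             # IQR bounds around the median
print([round(q / 1e6, 1) for q in qs], "million shares")
```

Plugging w = 0.25 and w = 0.75 into the inverse conditional CDF and mapping back through the historical quantiles is what produces IQR-style bounds like the ones in the tuple above.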

The actual volume came in a little shy of 80 million, right at the lower bound of the IQR. Looking at the marginal distribution of total daily volume, we can see that the mean prediction of roughly 106 million is in the “meat” of the distribution, by which we mean it’s not particularly high given the historical record. Given today’s gains in SPY, this jibes with our **hypothesis that low volume coincides with benign (read: gently rising) market environments.**

You can try it out for yourself by downloading the data here: export_volume_spy (sorry it’s in xlsx; WordPress didn’t like our CSV file)

Categories: Python, Quantitative Trading

Did you test the strategy that is short SPY if first-minute volume is higher than a threshold, and long otherwise?

We are in the preliminary stages of investigating this phenomenon. So far we have found evidence that 1) volatility and high volume tend to cluster, and 2) high volume and recently falling prices tend to coincide. Whether this is a stand-alone trading strategy has yet to be determined, but it seems promising at least as part of an ensemble.

Also, why did you use copulas and not WLS?

Indeed, Weighted Least Squares (WLS) can address the problem of heteroskedasticity, but it assumes a linear relationship, and many implementations limit you to Gaussian errors as well.

Archimedean copulas are a different way of thinking about multivariate relationships. They make use of Sklar’s theorem (http://en.wikipedia.org/wiki/Copula_%28probability_theory%29#Sklar.27s_theorem), and their representation allows for easy simulation using random uniforms, the copula, and the marginal EDFs (empirical distribution functions).
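To illustrate the comparison, here is a minimal WLS fit on synthetic heteroskedastic volume data, using the closed form (scale each row by the inverse error standard deviation, then run ordinary least squares) rather than a library call:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.lognormal(13.5, 0.6, 2000)
y = 60 * x + rng.normal(0, 25 * x)            # error std grows with x

X = np.column_stack([np.ones_like(x), x])     # intercept + slope design

# OLS: unweighted least squares, still unbiased but inefficient here
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# WLS with weights 1/x (error std proportional to x):
# equivalent to OLS after scaling each row of X and y by 1/x
w = 1.0 / x
beta_wls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

print(beta_ols[1], beta_wls[1])               # both slopes near the true 60
```

This handles the unequal variances, but note the point above: the fit is still a straight line through the wedge, whereas the copula captures the full shape of the conditional distribution.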
