Mining the StockTwits home stream

Financial network research has traditionally looked at price correlation. Social networks like StockTwits present a new opportunity to visualize the structure of the market as a whole. Today we took a look at the StockTwits home stream. When you follow another account, their tweets automatically get piped into an aggregated feed known as your home stream. This is the feed of tweets you see when you first log in to the StockTwits homepage.

Each tweet can reference a collection of financial instrument symbols. These references take the form of “cashtags” which are symbols preceded by the $-sign. This convention allows a quick way to identify the subject matter of a tweet. Some tweets contain a single reference while others mention multiple symbols.
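As a rough illustration of how cashtags might be pulled out of tweet text, here is a minimal sketch. The regular expression and the function name `extract_cashtags` are our own assumptions for this example; StockTwits' actual symbol rules may well differ (for instance for futures or forex symbols).

```python
import re

# Assumed pattern: "$" followed by a letter, then letters, digits, dots or underscores.
# This is an illustrative guess, not the official StockTwits cashtag grammar.
CASHTAG_RE = re.compile(r"\$([A-Za-z][A-Za-z0-9._]*)")

def extract_cashtags(text):
    """Return the unique symbols in a tweet, upper-cased, in order of first appearance."""
    seen = []
    for match in CASHTAG_RE.finditer(text):
        # Drop trailing punctuation such as the period that ends a sentence.
        symbol = match.group(1).rstrip("._").upper()
        if symbol and symbol not in seen:
            seen.append(symbol)
    return seen

if __name__ == "__main__":
    print(extract_cashtags("Long $AAPL and $goog, still watching $AAPL."))
```

Requiring a letter right after the $-sign keeps dollar amounts such as "$5" from being counted as symbols.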

We wanted to see whether there is a pattern in which symbols get mentioned together. By analogy with price correlation, we built a mention correlation matrix that estimates the probability of seeing any two symbols in the same tweet. When we saw that the mention correlation matrix was sparse, we immediately thought of Minimum Spanning Trees, which keep only the strongest links needed to connect every symbol.

The figure below represents the network built from the last 800 tweets in our home stream. Stocks that are connected by an edge are more likely to be mentioned within the same tweet. The chart is 8000 x 8000, so please click to expand and explore.


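We defined the correlation between two symbols as the probability that they would be mentioned together in the same tweet. To calculate this matrix, we counted the number of co-occurrences for every pair of symbols and then divided each column by its diagonal element, which holds the total number of tweets mentioning that column's symbol. After this normalization, the entry in row i, column j estimates the probability that symbol i appears in a tweet that mentions symbol j, so the diagonal is all ones and the matrix is not necessarily symmetric.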
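The calculation described above can be sketched in a few lines. This is only an illustration under our own assumptions: each tweet has already been reduced to a list of symbols, and the function name `mention_matrix` is hypothetical. Plain nested lists stand in for whatever matrix type was actually used.

```python
from itertools import combinations

def mention_matrix(tweets):
    """tweets: list of symbol lists, one list per tweet.

    Returns (symbols, counts, corr), where counts[i][j] is the number of tweets
    mentioning both symbols i and j, and corr is counts with each column divided
    by its diagonal element.
    """
    symbols = sorted({s for tweet in tweets for s in tweet})
    index = {s: i for i, s in enumerate(symbols)}
    n = len(symbols)
    counts = [[0] * n for _ in range(n)]

    for tweet in tweets:
        unique = sorted(set(tweet))  # count each symbol once per tweet
        for s in unique:
            counts[index[s]][index[s]] += 1  # diagonal: total mentions
        for a, b in combinations(unique, 2):
            counts[index[a]][index[b]] += 1  # off-diagonal: co-occurrences
            counts[index[b]][index[a]] += 1

    # Column normalization: corr[i][j] estimates P(i mentioned | j mentioned).
    corr = [[counts[i][j] / counts[j][j] for j in range(n)] for i in range(n)]
    return symbols, counts, corr

if __name__ == "__main__":
    sample = [["SPY", "QQQ"], ["SPY"], ["SPY", "QQQ"], ["F"]]
    symbols, counts, corr = mention_matrix(sample)
    for name, row in zip(symbols, corr):
        print(name, [round(x, 2) for x in row])
```

Every symbol in the matrix appears in at least one tweet, so the diagonal is never zero and the division is safe.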

We also looked at the top-mentioned symbols within our home stream:


Top mentions in MKTSTK’s home tweet stream
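A ranking like this one takes only a few lines to produce. The sketch below assumes each tweet has already been reduced to a list of symbols; the function name `top_mentions` is our own.

```python
from collections import Counter

def top_mentions(tweets, k=10):
    """Return the k most-mentioned symbols as (symbol, tweet_count) pairs.

    Each symbol is counted at most once per tweet.
    """
    counts = Counter(s for tweet in tweets for s in set(tweet))
    return counts.most_common(k)

if __name__ == "__main__":
    sample = [["SPY", "QQQ"], ["SPY"], ["F", "SPY"], ["QQQ"]]
    print(top_mentions(sample, k=2))
```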

In total there were about 290 different symbols mentioned in our stream. The correlation matrix was extremely sparse: most tweets mention only one symbol. The symbols that form the clusters in our MST are mostly market indices or index ETFs, as well as some influential individual equities. Traditional industry groups are also clearly visible, such as the cluster around Ford (F) or the Yahoo-Google cluster (YHOO-GOOG).

Another notable cluster is the one centered around the $STUDY cashtag; this tag flags tweets meant as learning lessons for the StockTwits community.

The sparseness of the correlation matrix produces characteristically long tails within the graph. These are stocks that are more likely to be mentioned on their own.
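To make the tree construction concrete, here is a minimal sketch using Kruskal's algorithm. It takes a list of symbols and a square matrix of mention correlations. The choice of distance is an assumption on our part: we average the two directions of the correlation and use one minus that value, so pairs that are never mentioned together sit at the maximum distance of 1. Other distance definitions are equally plausible, and the function name `minimum_spanning_tree` is hypothetical.

```python
from itertools import combinations

def minimum_spanning_tree(symbols, corr):
    """Kruskal's algorithm over all symbol pairs.

    Returns a list of (symbol_a, symbol_b, distance) edges.
    """
    n = len(symbols)
    # Assumed distance: 1 minus the average of the two directed correlations.
    edges = sorted(
        (1.0 - (corr[i][j] + corr[j][i]) / 2.0, i, j)
        for i, j in combinations(range(n), 2)
    )

    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((symbols[i], symbols[j], dist))
    return tree

if __name__ == "__main__":
    symbols = ["F", "QQQ", "SPY"]
    corr = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 2.0 / 3.0],
            [0.0, 1.0, 1.0]]
    for edge in minimum_spanning_tree(symbols, corr):
        print(edge)
```

Under this distance, symbols that only ever appear alone are tied at distance 1 from everything else, so the order in which they attach to the tree is decided by tie-breaking rather than by the data.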

Another way to visualize financial correlation networks is with heatmaps, which represent the connections between assets by color.

Adapted from Intro to Social Data by Thomas Pendergrass

Mining the StockTwits Homestream Series

Part I: Minimum Spanning Trees
Part II: Filtered Correlation Network for the S&P 500
Part III: Information Flow
