Mining the StockTwits home stream: Part II

Back in January we wrote a piece showing how to visualize the StockTwits home stream. In that article we built a co-occurrence matrix for the unique stock symbols we observed in our home stream. Because the matrix was sparse, we chose to use Minimum Spanning Trees to visualize the co-occurrence relationships.

Today we are extending our work on correlation filtering to filter a co-occurrence matrix based on the tweets found in our current home stream. The graphs on this page show which stock symbols (or cashtags) are likely to occur within the same tweet.
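To make the idea of a filtered co-occurrence matrix concrete, here is a minimal sketch in Python. It is not the code behind the graphs on this page: the cashtag regex, the function names, the sample tweets, and the simple minimum-count threshold (a stand-in for the correlation filter) are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed cashtag format: "$" followed by letters, digits or "_",
# with at least one letter so that prices like "$5" are skipped.
CASHTAG = re.compile(r"\$([A-Za-z0-9]*[A-Za-z][A-Za-z0-9_]*)")

def extract_cashtags(text):
    """Return the sorted list of unique, upper-cased cashtags in one tweet."""
    return sorted({m.upper() for m in CASHTAG.findall(text)})

def cooccurrence_counts(tweets):
    """Count how often each unordered pair of cashtags shares a tweet."""
    counts = Counter()
    for text in tweets:
        for pair in combinations(extract_cashtags(text), 2):
            counts[pair] += 1
    return counts

def filter_edges(counts, min_count=2):
    """Keep pairs seen at least min_count times (hypothetical stand-in filter)."""
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Made-up sample tweets, for illustration only.
sample_tweets = [
    "$SPX and $ES_F both fading into the close",
    "Watching $ES_F vs $SPX basis",
    "$EURUSD bid after the data, $ES_F follows",
]
edges = filter_edges(cooccurrence_counts(sample_tweets), min_count=2)
```

Each surviving pair can be read as one edge of the graph, with the count as a possible edge weight.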

The label sizes are scaled by the relative importance of each symbol. The bigger the label, the more links a symbol shares with the rest of the graph.
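The label scaling can be sketched as follows, assuming that "importance" is simply the number of links a symbol has (its degree) and that sizes are interpolated linearly between a minimum and a maximum. The point sizes, the function names, and the sample edges are hypothetical, not taken from the graphs themselves.

```python
from collections import Counter

def node_degrees(edges):
    """Count the links attached to each symbol in a list of (a, b) edges."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

def label_sizes(degrees, min_size=8.0, max_size=32.0):
    """Map each symbol's degree linearly onto [min_size, max_size]."""
    if not degrees:
        return {}
    lo, hi = min(degrees.values()), max(degrees.values())
    span = hi - lo
    sizes = {}
    for node, degree in degrees.items():
        # If every node has the same degree, give them all the largest label.
        frac = (degree - lo) / span if span else 1.0
        sizes[node] = min_size + frac * (max_size - min_size)
    return sizes

# Made-up edges, for illustration only.
sample_edges = [
    ("SPX", "ES_F"),
    ("SPX", "AAPL"),
    ("SPX", "MSFT"),
    ("ES_F", "EURUSD"),
]
sizes = label_sizes(node_degrees(sample_edges))
```

A logarithmic or square-root mapping would be a reasonable alternative when a few hub symbols dominate the link counts.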

For instance, we can see that clusters form around familiar concepts: the tag for the S&P 500 index ($SPX) is connected to two clusters made up mostly of stocks that are components of the index.



The FX market forms another distinct cluster, with $EURUSD forming a bridge between the $MACRO cashtag and S&P 500 futures ($ES_F). This is a complex network: FX is linked to equities via stock futures, and macroeconomic-themed tweets also tend to occur with updates for the US interest rate markets ($ZB_F, $GE_F) and FX futures ($6J_F).

Want to learn how to mine social data from StockTwits, Twitter, Google Trends, and Estimize? Be sure to check out our new book: Intro to Social Data for Traders by our very own Thomas Pendergrass.

Some clusters do not necessarily follow orthodox logic, and are thus interesting to researchers such as ourselves. Consider the network formed by diverse stocks like QIHU, URBN, CASY, and KANG. There is a definite consumer theme, but the addition of some more exotic names calls for a deeper look into the process going on.

Perhaps it is an example of talking one’s book? Perhaps it is the network identifying important undocumented relationships. One interesting extension of this graph would be to link the users who sent the most tweets related to particular cashtags in this graph. This would allow a quick test of the preceding hypothesis. It would also be a good way to show which users are joining broadly-linked discussions about cashtag clusters.
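One way the user-linking extension might look is sketched below. It assumes tweets are available as (user, text) pairs; the user names, the sample tweets, the regex, and the function name are all made up for illustration. If a single user accounts for most of the tweets across an odd cluster, that would lend some support to the talking-one's-book reading.

```python
import re
from collections import Counter, defaultdict

# Assumed cashtag format: "$" plus letters, digits or "_", with at least one letter.
CASHTAG = re.compile(r"\$([A-Za-z0-9]*[A-Za-z][A-Za-z0-9_]*)")

def top_users_per_cashtag(tweets, top_n=3):
    """For each cashtag, list the users who mentioned it most, with counts."""
    per_tag = defaultdict(Counter)
    for user, text in tweets:
        # Count each cashtag at most once per tweet.
        for tag in {m.upper() for m in CASHTAG.findall(text)}:
            per_tag[tag][user] += 1
    return {tag: users.most_common(top_n) for tag, users in per_tag.items()}

# Hypothetical users and tweets, for illustration only.
sample = [
    ("alice", "$QIHU breakout"),
    ("alice", "$QIHU and $KANG moving together"),
    ("bob", "$KANG looks heavy"),
    ("alice", "$URBN $QIHU"),
]
top_users = top_users_per_cashtag(sample)
```

The resulting user-to-cashtag pairs could then be added to the graph as a second type of edge.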

Mining the StockTwits Homestream Series

Part I: Minimum Spanning Trees
Part II: Filtered Correlation Network for the S&P 500
Part III: Information Flow

We have also included an alternative graph with a white background and a slightly different configuration. It highlights both the relative importance of each node and the connected clusters in the graph. We used a modified version of this for our lead image on the MKTSTK homepage:

