Recently MKTSTK had the honor to collaborate with StockTwits in building a prototype sentiment stock screener.
The process seemed straightforward; we were able to utilize the Python programming language to partition a universe of 7000 stocks into new classes using the combination of price and sentiment.
When you really broke it down, however, the process was incredibly complex for those uninitiated in the dark arts of data science. Moreover, for anyone who didn’t know a programming language, it would have been impossible to replicate the screen in a reasonable time frame.
Then we did a back-of-the-envelope calculation of how long it would take a perfect human data entry agent to complete the analysis. That is, assuming our agent never makes a data entry error, how long would it take a human to do the same work?
We measured this by manually performing a sample of the tasks our script did automatically, then extrapolating across the whole dataset. This includes looking through each message by hand and recording the sentiment data. To give you an idea of the scale, $AAPL alone had over 13,000 messages for the week…
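As a rough illustration of the "recording the sentiment data" step (not the actual script), here is a minimal Python sketch. The message layout, including the `sentiment` field and the sample messages themselves, is a made-up assumption for the example and not the real StockTwits format.

```python
from collections import Counter


def tally_sentiment(messages):
    """Count sentiment tags across messages; untagged messages are counted too."""
    counts = Counter()
    for msg in messages:
        # "sentiment" is a hypothetical field name used only for this sketch.
        sentiment = msg.get("sentiment")
        counts[sentiment if sentiment else "untagged"] += 1
    return counts


# Made-up sample messages, for illustration only.
sample = [
    {"id": 1, "body": "$AAPL breaking out", "sentiment": "Bullish"},
    {"id": 2, "body": "$AAPL earnings tonight", "sentiment": None},
    {"id": 3, "body": "$AAPL looks heavy", "sentiment": "Bearish"},
    {"id": 4, "body": "$AAPL new highs soon", "sentiment": "Bullish"},
]

counts = tally_sentiment(sample)
print(dict(counts))
```

A script does this in a fraction of a second per thousand messages; the human equivalent is reading every message and keeping a tally by hand.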
In total, we processed around 129,000 messages. Our human agents would have to iterate repeatedly through chunks of raw messages. The raw format is not meant to be human readable, mind you, so isolating the sentiment tag in each message (if it exists at all) is a non-trivial task. Thus we allocated about 15 seconds to each message for extracting and recording the sentiment (we were averaging a little more than this once you included downloading the message chunk, finding the sentiment, then finding the next message id to jump to). That means processing the messages alone would have taken over 1.9 million seconds, or about 22 days.
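The estimate above is simple enough to check in a few lines of Python. The variable names are ours; the inputs are the figures quoted in the text.

```python
MESSAGES = 129_000            # approximate messages processed
SECONDS_PER_MESSAGE = 15      # time allotted per message for a human agent
SECONDS_PER_DAY = 24 * 60 * 60

total_seconds = MESSAGES * SECONDS_PER_MESSAGE
total_days = total_seconds / SECONDS_PER_DAY

print(f"{total_seconds:,} seconds, or about {total_days:.1f} days of nonstop work")
```

This works out to 1,935,000 seconds, a little over 22 days with no breaks.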
But that’s just half the battle. Now our agents have to count all the sentiment data and merge it with price data. In the end it would take around 2 months, give or take a few days. Our script took around 4 hours to run in total. In other words, it would take a human over 5.1 million seconds to run the scan (and by human we mean one who never slept the whole time, or maybe a team of perfect data entry agents…) versus 14,400 seconds for the computer. That’s around 360 times faster.
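For readers who want to check the ratio, here is a short Python sketch. It assumes the "around 2 months" figure means roughly 60 days, which is our reading of the numbers rather than something stated outright.

```python
SECONDS_PER_DAY = 24 * 60 * 60

human_days = 60                      # assumption: "around 2 months" taken as 60 days
human_seconds = human_days * SECONDS_PER_DAY
script_seconds = 4 * 60 * 60         # the script's roughly 4-hour run time

speedup = human_seconds / script_seconds
# How many script-days match one year of nonstop human work at this ratio.
script_days_per_human_year = 365 / speedup

print(f"human: {human_seconds:,} s, script: {script_seconds:,} s")
print(f"speedup: about {speedup:.0f}x")
print(f"one human year is about {script_days_per_human_year:.2f} script days")
```

With exactly 60 days the ratio comes out to 360; using the rounded 5.1 million seconds instead gives about 354, so "around 360" holds either way.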
In other words, for this particular task our script can do in 1 day what our human agent can do in a year. It’s safe to say that in this context, programming can give one almost superhuman abilities.
There’s always a consistent stream of chatter about the coming labor apocalypse stemming from robots replacing humans. Far from it; right now we are witnessing the benefits of having a small minority of coding-literate individuals.
It’s hard to imagine what would happen if everyone learned to program: we could unlock tremendous gains in productivity for years to come. The flip side is this: as coding literacy becomes mainstream, the gap between literates and illiterates will widen, especially in data-heavy fields. We can see this in trading already: how many young kids out there really expect to land a Junior Trader job without knowing a bit of coding? None?
There was a time when very few people knew how to read. Widespread literacy enabled leaps in productivity that were unthinkable at the time. It is MKTSTK’s belief that we are living through a similar moment right now…