How to get free intraday stock data with Python

While high frequency traders need access to update-by-update market data, medium frequency algorithmic traders can usually get by with something less granular. Oftentimes we spend a lot of money on data feeds only to realize we are downsampling the frequency for historical analysis. While libraries like Pandas give us access to daily historical data, getting intraday stock prices can be trickier. That's why we wrote this script, which lets you download up to 15 trading days' worth of intraday stock data from Google Finance (not real-time, 30 minutes delayed, but it's great for quick and dirty backtesting).

Make sure you have NumPy and Pandas installed first; you can do this using pip:

pip install numpy pandas

Then copy and run the following script:

import pandas as pd
import numpy as np
import urllib2
import datetime as dt
import matplotlib.pyplot as plt

def get_google_data(symbol, period, window):
    url_root = 'http://www.google.com/finance/getprices?i='
    url_root += str(period) + '&p=' + str(window)
    url_root += 'd&f=d,o,h,l,c,v&df=cpct&q=' + symbol
    response = urllib2.urlopen(url_root)
    data = response.read().split('\n')
    # actual data starts at index = 7
    # first line contains the full timestamp,
    # every other line is an offset of period from that timestamp
    parsed_data = []
    anchor_stamp = ''
    end = len(data)
    for i in range(7, end):
        cdata = data[i].split(',')
        if 'a' in cdata[0]:
            # first record carries the anchor timestamp
            anchor_stamp = cdata[0].replace('a', '')
            cts = int(anchor_stamp)
        else:
            try:
                coffset = int(cdata[0])
                cts = int(anchor_stamp) + (coffset * period)
                parsed_data.append((dt.datetime.fromtimestamp(float(cts)),
                                    float(cdata[1]), float(cdata[2]),
                                    float(cdata[3]), float(cdata[4]),
                                    float(cdata[5])))
            except (ValueError, IndexError):
                pass  # skip time zone offsets and blank lines in the feed
    df = pd.DataFrame(parsed_data)
    df.columns = ['ts', 'o', 'h', 'l', 'c', 'v']
    df.index = df.ts
    del df['ts']
    return df

 

Now we can use get_google_data(symbol, period_in_seconds, number_of_days). For example, suppose we want 5-minute intraday data for SPY for the last 2 weeks (10 trading days):

spy = get_google_data('SPY', 300, 10)

This returns a dataframe with OHLC data for each 5-minute bar, plus volume (interestingly, the current day's volume comes back as 0, but it gets updated by the next day).
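To make the anchor/offset scheme concrete, here is a self-contained sketch that parses a few hypothetical response rows (the timestamps and prices are made up; columns follow the d,o,h,l,c,v order requested in the URL):

```python
import datetime as dt

period = 300  # 5-minute bars, matching the i= parameter in the URL

# Hypothetical rows in Google's format: the first data row carries an absolute
# Unix timestamp prefixed with 'a'; later rows give an offset in whole periods.
sample_rows = [
    "a1331752200,128.80,128.90,128.75,128.85,1000",
    "1,128.90,129.00,128.85,128.95,1200",
    "2,129.00,129.10,128.95,129.05,900",
]

bars = []
anchor = None
for row in sample_rows:
    fields = row.split(',')
    if fields[0].startswith('a'):
        anchor = int(fields[0][1:])          # strip the 'a' prefix
        ts = anchor
    else:
        ts = anchor + int(fields[0]) * period
    # keep (timestamp, close); close is the 5th field in d,o,h,l,c,v
    bars.append((dt.datetime.fromtimestamp(ts), float(fields[4])))

print(bars[1][0] - bars[0][0])  # consecutive bars are one period apart
```

The same anchor-plus-offset arithmetic is what the try/except branch in get_google_data performs on the live feed.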

An easy extension of this functionality is to look at multiple intraday stock prices. Suppose we want to visualize the correlation structure of the Nasdaq 100 components. We could use the get_google_data function in a loop like so:

# Nasdaq 100 component tickers
data = pd.read_csv("https://s3.amazonaws.com/static.quandl.com/tickers/nasdaq100.csv")

volume = []
closes = []
good_tickers = []
for ticker in data['ticker'].values.tolist():
    print ticker,
    try:
        vdata = get_google_data(ticker, 60, 10)
        cdata = vdata[['c']]
        closes.append(cdata)
        vdata = vdata[['v']]
        volume.append(vdata)
        good_tickers.append(ticker)
    except Exception:
        print "x",

closes = pd.concat(closes, axis = 1)
closes.columns = good_tickers

diffs = np.log(closes).diff().dropna(axis = 0, how = "all").dropna(axis = 1, how = "any")
diffs.head()

The diffs dataframe contains the log differences for the Nasdaq 100 components we could download. Now we can visualize the correlation for these symbols using a Jupyter Notebook and slicematrixIO-python.
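Before sending anything to an external service, the log-difference step is easy to sanity check on synthetic prices (the tickers and values below are made up):

```python
import numpy as np
import pandas as pd

# Synthetic closes for three hypothetical tickers; AAA and BBB move in lockstep
closes = pd.DataFrame({
    "AAA": [100.0, 100.5, 101.0, 100.8],
    "BBB": [200.0, 201.0, 202.0, 201.6],
    "CCC": [50.0, 49.8, 49.9, 50.1],
})

# Same transform as above: log prices, first difference, drop the NaN row
diffs = np.log(closes).diff().dropna(axis=0, how="all").dropna(axis=1, how="any")

corr = diffs.corr()  # pairwise correlation of the log returns
print(corr.shape)    # (3, 3)
```

Because AAA and BBB move proportionally, their log returns are identical and their correlation comes out at 1, which is exactly the kind of structure the network visualization below surfaces.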

First let’s create an Isomap which will learn the underlying structure of the market and embed a network graph in 2 dimensions so we can visualize the complex relationships between individual stock symbols:


from slicematrixIO import SliceMatrix
sm = SliceMatrix(api_key, region = "us-west-1")

iso = sm.Isomap(dataset = diffs, D = 2, K = 6)
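If you want to experiment without an API key, scikit-learn ships an open-source Isomap that produces a comparable 2-D embedding. A minimal sketch on random synthetic returns (the shapes are made up and this is not the SliceMatrix API):

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
returns = rng.normal(size=(50, 8))  # 50 observations of 8 hypothetical symbols

# Embed the 8 symbols in 2 dimensions; K = 6 neighbors mirrors the call above
embedding = Isomap(n_neighbors=6, n_components=2).fit_transform(returns.T)
print(embedding.shape)  # (8, 2)
```

Each row of the embedding is a symbol's 2-D coordinate, which you could scatter-plot as a rough stand-in for the network graph below.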

Now we can visualize the resulting network using the notebook module:

from slicematrixIO.notebook import GraphEngine
viz = GraphEngine(sm)

viz.init_style()
viz.init_data()

viz.drawNetworkGraph(iso, height = 500, min_node_size = 10, charge = -250, color_map = "Heat", graph_style = "dark", label_color = "rgba(255, 255, 255, 0.8)")

Which will produce the following interactive graph in the notebook:

[Image: network.png, the interactive Isomap network graph of the Nasdaq 100 components]

For a full working example of this visualization check out the notebook:

Manifold Learning and Visualization: Nasdaq 100


Interested in Pairs Trading?

Check out SliceMatrix: a unique tool for visualizing the stock market, including views of filtered correlation networks and minimum spanning trees.



Want to learn how to mine social data sources like Google Trends, StockTwits, Twitter, and Estimize? Make sure to download our book Intro to Social Data for Traders

9 replies »

  1. Finally, I've downloaded and installed the necessary modules and wrote the code as presented, with some adjustment to the urllib2.urlopen changed to urllib.request.urlopen as I am using Python 3.4, and I got this error when I entered spy = get_google_data('SPY', 300, 10):

    Traceback (most recent call last):
    File "", line 1, in
    spy = get_google_data('SPY', 300, 10)
    File "", line 6, in get_google_data
    data = response.read().split('\n')
    TypeError: Type str doesn't support the buffer API

    What is wrong?


    • not sure yet right now, we use 2.7, what’s changed in urllib2 in 3.4? looks like response is already a string from that error, maybe the .read() has become redundant? have the function return the response var and see what it looks like


    • First, thanks mktstk for the script.
      Now, to make it work with python 3, it’s necessary to make some modifications:

      import urllib.request

      python 3 unified urllib2 and urllib into a single library

      response = urllib.request.urlopen(url_root)

      urlopen is now under the 'request' sub-library

      data = response.read().split(b'\n')

      In python 3 you need to specify a 'b' prefix on any literal you use with 'bytes' data, so:

      for i in range(7, len(data)-1):
          cdata = data[i].split(b',')
          if b'a' in cdata[0]:
              # first record carries the anchor timestamp
              anchor_stamp = cdata[0].replace(b'a', b'')
              cts = int(anchor_stamp)
          else:
              coffset = int(cdata[0])
              cts = int(anchor_stamp) + (coffset * period)
              parsed_data.append((dt.datetime.fromtimestamp(float(cts)), float(cdata[1]), float(cdata[2]), float(cdata[3]), float(cdata[4]), float(cdata[5])))

      Now it works on Python 3!

      Anyway, thanks again for the script!
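An alternative to sprinkling b'' literals through the parser is to decode the response once up front, so the rest of the code keeps its original string literals. A small sketch (the payload bytes here are invented):

```python
# Hypothetical payload, standing in for the bytes that urllib.request's
# response.read() returns in Python 3
raw = b"a1331752200,128.80,128.90,128.75,128.85,1000\n1,128.90,129.00,128.85,128.95,1200\n"

# Decode once; everything downstream can keep plain-str '\n' and ',' literals
data = raw.decode("utf-8").split("\n")

print(data[0].split(",")[0])  # a1331752200
```

With this approach the only change to the original function is the added .decode("utf-8") after response.read().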

