SIC lookup by stock symbol

We wanted a quick (and free) way to look up the SIC code for a stock symbol in Python. You can do this manually on the SEC’s EDGAR website, but with a list of hundreds of stock symbols that gets old pretty quickly.

The following code makes use of Python’s BeautifulSoup package to extract the SIC code from the SEC’s website. To install Beautiful Soup on your CoCo, open a connection and type:

sudo apt-get install python-bs4

Then you can import Beautiful Soup in Python, along with urllib2, which the function below uses to fetch the page:

import urllib2
from bs4 import BeautifulSoup as bs

Then we can define a function to grab the SIC code:

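def query_sic(symbol):
  # Build the EDGAR company-search URL for this ticker symbol
  url = 'https://www.sec.gov/cgi-bin/browse-edgar?CIK=' + symbol.upper()
  url += '&Find=Search&owner=exclude&action=getcompany'
  # Download the results page and parse the HTML
  res = urllib2.urlopen(url).read()
  soup = bs(res)
  # The SIC code is the text of the tenth link on the page
  return int(soup.find_all('a')[9].contents[0])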

No doubt there are more robust ways to do this; if the SEC adds or removes a link, the hard-coded index will need adjustment. Nevertheless, we thought this quick-and-dirty method would provide value, as our own Google searches on the subject did not yield any useful results.
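One way to reduce the dependence on link position is to look for the link by what it points to rather than by where it sits. The sketch below is not the method used above: it is written for Python 3, swaps Beautiful Soup for the standard library's html.parser so it runs without extra packages, and assumes that the SIC link's URL carries a SIC= query parameter. The extract_sic helper and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class SicLinkFinder(HTMLParser):
    """Records the SIC value from the first link whose URL has a SIC parameter."""

    def __init__(self):
        super().__init__()
        self.sic = None

    def handle_starttag(self, tag, attrs):
        # Only look at <a> tags, and stop once a code has been found.
        if tag != 'a' or self.sic is not None:
            return
        href = dict(attrs).get('href') or ''
        values = parse_qs(urlparse(href).query).get('SIC')
        if values and values[0].isdigit():
            self.sic = int(values[0])


def extract_sic(html):
    """Return the SIC code found in the page's links, or None if there is none."""
    finder = SicLinkFinder()
    finder.feed(html)
    return finder.sic


# Hypothetical fragment standing in for a downloaded company page.
sample = (
    '<a href="/index.htm">Home</a>'
    '<a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=3571&amp;owner=exclude">3571</a>'
)
print(extract_sic(sample))
```

Returning None when no matching link exists lets the caller decide how to handle an unknown symbol, instead of failing with an IndexError.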

You could also extend this code to scrape the human-readable name of the sector. Let’s redefine the function to accept an additional parameter, readable, which defaults to True and controls which format the function returns:

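def query_sic(symbol, readable=True):
  # Build the EDGAR company-search URL for this ticker symbol
  url = 'https://www.sec.gov/cgi-bin/browse-edgar?CIK=' + symbol.upper()
  url += '&Find=Search&owner=exclude&action=getcompany'
  # Download the results page and parse the HTML
  res = urllib2.urlopen(url).read()
  soup = bs(res)
  if readable:
    # The name sits between ' - ' and 'State location' in the first paragraph
    return soup.p.text.split(' - ')[1].split('State location')[0].strip()
  else:
    # The numeric code is the text of the tenth link on the page
    return int(soup.find_all('a')[9].contents[0])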
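To see how the two split calls in the readable branch isolate the name, here is a small Python 3 sketch that runs them on a plain string. The sample text is hypothetical; it only assumes the paragraph has the code, then ' - ', then the name, then the words 'State location'.

```python
# Hypothetical text, assumed to resemble the first paragraph of the company page.
text = 'SIC: 3571 - ELECTRONIC COMPUTERS\nState location: CA | State of Inc.: CA'

# Everything after the first ' - ' separator.
after_dash = text.split(' - ')[1]

# Everything before the 'State location' label.
name = after_dash.split('State location')[0]

# Trim any whitespace left over between the name and the label.
print(name.strip())
```

If the paragraph contains no ' - ' separator, the [1] index raises an IndexError, which is the typical failure for an unrecognized symbol.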

Some examples:

symbol = "AAPL"
code = query_sic(symbol)

This sets code to “ELECTRONIC COMPUTERS”.

Or you can look up a whole list of symbols:

import pandas as pd

codes = []
symbols = ['AAPL', 'MSFT', 'CSCO', 'GILD', 'GE', 'V']
for symbol in symbols:
  codes.append(query_sic(symbol))

codes = pd.DataFrame(codes, index = symbols)
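With hundreds of symbols, a single bad ticker or dropped connection would stop the loop above partway through. A minimal sketch of a more forgiving loop follows. The lookup_all helper, its delay parameter, and fake_lookup are hypothetical names introduced here, and fake_lookup stands in for query_sic so the example runs offline.

```python
import time


def lookup_all(symbols, lookup, delay=0.0):
    """Call lookup(symbol) for each symbol; store None when a lookup fails."""
    results = {}
    for symbol in symbols:
        try:
            results[symbol] = lookup(symbol)
        except Exception:
            # Unknown symbol, network error, or a page layout change.
            results[symbol] = None
        time.sleep(delay)  # pause between requests
    return results


def fake_lookup(symbol):
    """Hypothetical stand-in for query_sic so the sketch runs offline."""
    table = {'AAPL': 3571}
    return table[symbol]  # raises KeyError for unknown symbols


print(lookup_all(['AAPL', 'NOPE'], fake_lookup))
```

In real use you would pass query_sic as the lookup argument and choose a delay of your own; the resulting dictionary can be handed to pandas in the same way as the list above.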

Lead image licensed under CC-BY-SA 2.5 from EvaK
