How to download FOMC statements

Earlier in the week we charted the evolution of Fedspeak from Bernanke to Yellen. In the process, we needed a quick and easy way to grab the text from a collection of FOMC statements. To do this we made use of the Python library BeautifulSoup, which you might remember from our post on SIC code lookup. The following function can be used to extract the text from a given FOMC statement URL:

def get_paragraphs(url):
  # Fetch the page and parse it with BeautifulSoup
  res = urllib2.urlopen(url)
  res = res.read()
  soup = bs(res)
  # The statement text lives in the div with id 'leftText'
  statement = soup.find('div', attrs={'id': 'leftText'})
  # Skip the first and last <p>, which hold page boilerplate
  paragraphs = statement.findAll('p')[1:-1]
  return paragraphs

For example, the URL for this week’s FOMC statement is:

url = ""

Thus we can download the paragraphs into a list using:

paragraphs = get_paragraphs(url)
print paragraphs[0].text.strip()

This should output the first paragraph of Yellen’s latest statement:

u"Information received since the Federal Open Market Committee met in January suggests that economic growth has moderated somewhat. Labor market conditions have improved further, with strong job gains and a lower unemployment rate. A range of labor market indicators suggests that underutilization of labor resources continues to diminish. Household spending is rising moderately; declines in energy prices have boosted household purchasing power. Business fixed investment is advancing, while the recovery in the housing sector remains slow and export growth has weakened. Inflation has declined further below the Committee's longer-run objective, largely reflecting declines in energy prices. Market-based measures of inflation compensation remain low; survey-based measures of longer-term inflation expectations have remained stable."

Now let’s suppose we wanted to process each statement as a whole, extracting the top collocations: pairs of words that occur together more often than chance would predict.
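To make "more often than chance would predict" concrete, here is a minimal pure-Python sketch of pointwise mutual information (PMI), one common collocation score (nltk exposes it as BigramAssocMeasures.pmi). The toy corpus and the min_count cutoff are illustrative choices, not part of the original post:

```python
from collections import Counter
from math import log

def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information,
    log2( P(w1,w2) / (P(w1)*P(w2)) ), keeping only pairs seen at
    least min_count times (similar in spirit to nltk's
    apply_freq_filter)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = float(len(tokens))
    n_bi = float(len(tokens) - 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # raw PMI over-rewards one-off pairs
        p_pair = count / n_bi
        p_independent = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        scores[(w1, w2)] = log(p_pair / p_independent, 2)
    return sorted(scores, key=scores.get, reverse=True)

# Toy corpus (made up for illustration)
tokens = ("the labor market improved and the labor market strengthened "
          "while the outlook for the economy improved").split()
print(pmi_bigrams(tokens))  # → [('labor', 'market'), ('the', 'labor')]
```

Note that "labor market" outranks "the labor" even though "the" is more frequent: PMI rewards pairs whose co-occurrence is surprising given how common each word is on its own.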

We can do this using Python’s nltk library. If you have a CoCo, this library comes pre-installed; otherwise you can install it with pip using the following command:

pip install nltk

Note that word_tokenize relies on nltk’s punkt tokenizer models, so you may also need to run nltk.download('punkt') once. (Check out this post for instructions on installing BeautifulSoup.)

Next we can import the required modules and get started:

from bs4 import BeautifulSoup as bs
import urllib2
import numpy as np
import nltk
from nltk import word_tokenize
from nltk.collocations import *

url = ""

def get_paragraphs(url):
  # Fetch the page and parse it with BeautifulSoup
  res = urllib2.urlopen(url)
  res = res.read()
  soup = bs(res)
  # The statement text lives in the div with id 'leftText'
  statement = soup.find('div', attrs={'id': 'leftText'})
  # Skip the first and last <p>, which hold page boilerplate
  paragraphs = statement.findAll('p')[1:-1]
  return paragraphs

paragraphs = get_paragraphs(url)

speech = []
for paragraph in paragraphs:
  tokens = word_tokenize(paragraph.text.strip())
  speech.append(tokens)

# Flatten the per-paragraph token lists into one sequence of words
speech = np.concatenate(speech, axis = 0)
text = nltk.Text(speech)
text.collocations()

This should output something like:

Building collocations list
labor market; federal funds; funds rate; target range; maximum
employment; agency mortgage-backed; economic activity; mortgage-backed
securities; policy accommodation; rate remains; inflation
expectations; market indicators; Committee expects; Committee judges;
market conditions; Committee continues
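As an aside, the np.concatenate call above is only used to flatten the list of per-paragraph token lists into a single sequence; if you would rather avoid the numpy dependency, the standard library's itertools.chain does the same job:

```python
from itertools import chain

# Per-paragraph token lists, as produced by the tokenizing loop above
token_lists = [['Information', 'received'], ['Labor', 'market']]
flat = list(chain.from_iterable(token_lists))
print(flat)  # → ['Information', 'received', 'Labor', 'market']
```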

Now you have the basic building blocks to perform Natural Language Processing (NLP) on FOMC statements. Modifying this code to scrape the ECB or BOE websites wouldn’t take much additional effort.
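The main thing that changes between sites is the container element holding the statement text, so one way to generalize is to make the container id a parameter. The sketch below uses Python 3's built-in html.parser (rather than BeautifulSoup) so it is fully self-contained, and the 'releaseContent' id and HTML snippet are hypothetical — inspect each central bank's page to find the real container:

```python
from html.parser import HTMLParser

class DivParagraphs(HTMLParser):
    """Collect the text of <p> tags inside the <div> with a given id."""
    def __init__(self, div_id):
        HTMLParser.__init__(self)
        self.div_id = div_id
        self.depth = 0      # div-nesting depth inside the target div (0 = outside)
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            if self.depth > 0:
                self.depth += 1               # nested div inside the target
            elif dict(attrs).get('id') == self.div_id:
                self.depth = 1                # entered the target div
        elif tag == 'p' and self.depth > 0:
            self.in_p = True
            self.paragraphs.append('')

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth > 0:
            self.depth -= 1
        elif tag == 'p':
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

# Hypothetical snippet standing in for a downloaded statement page
page = ('<div id="nav"><p>menu</p></div>'
        '<div id="releaseContent"><p>First paragraph.</p>'
        '<p>Second paragraph.</p></div>')
parser = DivParagraphs('releaseContent')
parser.feed(page)
print(parser.paragraphs)  # → ['First paragraph.', 'Second paragraph.']
```

The same parser then works across sites by swapping in each page's container id, the way 'leftText' was used for the Federal Reserve pages above.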

Want to learn how to mine social data sources like Estimize, StockTwits, Twitter, and Google Trends? Make sure to download our new book Intro to Social Data for Traders by our very own Thomas Pendergrass.
