Data Commons

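A repository of combined government transparency data sets. Our initial release will combine state and federal campaign contribution data from the National Institute on Money in State Politics (NIMSP) and the Center for Responsive Politics (CRP), respectively.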

Across the political data available to us, almost every record can be represented as a transaction between entities:

  • A gave $2000 to X's campaign
  • A paid lobbyist B to meet with X
  • X requested earmark for $1 million to A
  • Agency D awarded a contract to A for $6 million
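
The four examples above share one shape: a source, a recipient, a kind of transaction, and sometimes an amount or an intermediary. A minimal Python sketch of such a record follows. The class and field names (Transaction, source, recipient, kind, amount, via) are assumptions made for illustration and are not taken from any Data Commons schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """One record linking two entities (hypothetical schema, for illustration)."""
    source: str                        # entity initiating the transaction
    recipient: str                     # entity on the receiving end
    kind: str                          # e.g. "contribution", "lobbying", "earmark", "contract"
    amount: Optional[Decimal] = None   # dollar amount, when one is reported
    via: Optional[str] = None          # intermediary, such as a lobbyist


# The four examples from the list above, expressed as records.
EXAMPLES = [
    Transaction("A", "X", "contribution", Decimal("2000")),
    Transaction("A", "X", "lobbying", via="B"),
    Transaction("X", "A", "earmark", Decimal("1000000")),
    Transaction("Agency D", "A", "contract", Decimal("6000000")),
]


def total_received(records, entity):
    """Sum the reported amounts of all transactions whose recipient is `entity`."""
    return sum(
        (r.amount for r in records if r.recipient == entity and r.amount is not None),
        Decimal("0"),
    )
```

Once every data set is reduced to this shape, a question such as "how much did A receive in total?" becomes a single pass over the records, provided "A" means the same entity in every data set.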

The biggest challenge is not reconciling the transactions but matching the transaction participants across data sets. Each data set has its own representation for entities, usually with different IDs and different names. We need a way to look at two data sets and decide that entity A from the CRP data is the same organization as entity Z from the NIMSP data. We also need to keep track of each entity's attributes, such as its original CRP and NIMSP IDs.
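
To make the matching problem concrete, here is a rough sketch of one naive approach: normalize each name and pair up entities whose normalized names are equal. This is not how Matchbox decides matches. The normalization rule, the suffix list, the dict layout with "id" and "name" keys, and the function names are all assumptions chosen for illustration.

```python
import re

# Suffixes dropped before comparing names (an illustrative, incomplete list).
_SUFFIXES = {"inc", "corp", "co", "llc", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in _SUFFIXES)


def propose_matches(crp_entities, nimsp_entities):
    """Pair entities whose normalized names are equal.

    Each entity is a dict with "id" and "name" keys (a hypothetical layout).
    Returns (crp_id, nimsp_id) candidate pairs; the original IDs are kept
    so a reviewer can trace each candidate back to its source data set.
    """
    by_name = {}
    for entity in nimsp_entities:
        by_name.setdefault(normalize(entity["name"]), []).append(entity["id"])
    return [
        (crp["id"], nimsp_id)
        for crp in crp_entities
        for nimsp_id in by_name.get(normalize(crp["name"]), [])
    ]
```

Exact comparison of normalized names misses abbreviations and misspellings and can pair unrelated organizations that share a name, so the output is best treated as a list of candidates to review rather than a final answer.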

Our first tool, Matchbox, allows us to load and store entities from each data set. We can then merge records that are deemed to represent the same entity. You can interact with Matchbox using the included Python module or by calling the basic API over HTTP.
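
The load, store, and merge workflow described above can be sketched with a toy in-memory store. This is not Matchbox's Python module or HTTP API, neither of which is documented here. The EntityStore class, its methods, and the attribute names in the usage example are hypothetical.

```python
import itertools


class EntityStore:
    """Toy in-memory entity store (illustrative; not Matchbox's real interface)."""

    def __init__(self):
        self._entities = {}
        self._ids = itertools.count(1)

    def load(self, name, **attributes):
        """Store one entity and return its newly assigned id."""
        entity_id = next(self._ids)
        self._entities[entity_id] = {
            "names": {name},
            "attributes": {key: {value} for key, value in attributes.items()},
        }
        return entity_id

    def get(self, entity_id):
        """Return the stored record for `entity_id`."""
        return self._entities[entity_id]

    def merge(self, keep_id, drop_id):
        """Fold `drop_id` into `keep_id`, preserving every name and attribute value."""
        if keep_id == drop_id:
            raise ValueError("cannot merge an entity with itself")
        keep = self._entities[keep_id]
        drop = self._entities.pop(drop_id)
        keep["names"] |= drop["names"]
        for key, values in drop["attributes"].items():
            keep["attributes"].setdefault(key, set()).update(values)
        return keep_id


# Usage: load one record from each data set, then merge them.
store = EntityStore()
a = store.load("Acme Inc", crp_id="C1")       # made-up CRP identifier
z = store.load("ACME, INC.", nimsp_id="N9")   # made-up NIMSP identifier
store.merge(a, z)
```

Keeping every name and attribute value as a set means a merge never discards the original CRP or NIMSP ID, so merged records can still be traced back to their source data.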

We will add features to Matchbox over the next several months, including name standardization algorithms, importers and exporters, and a web-based administration interface.

1818 N Street NW, Suite 300
Washington, DC 20036
202.742.1520