Recent Posts

Don't Use Zip Codes Unless You Have To

Many of us in the labs found it thrilling to watch the internet community unite around opposition to the SOPA and PIPA bills yesterday. Even more gratifying was seeing how many participating websites used our APIs to help visitors find their elected representatives. This kind of use is exactly why we built those tools, and why we'll always make them freely available to anyone who wants to make government more accessible to its citizens.

Still, I'd be lying if I said we don't occasionally wince when we see someone using our services in a less-than-ideal way. It's completely understandable, mind you: the problem of figuring out who represents a given citizen is tougher than you might think. But we hate to think that anyone is getting bad information about which office to call -- talking to the people who represent you should be simple and easy! Since this comes up with some frequency, it's probably worth talking about the nature of these problems and how to avoid them.

TL;DR: Looking up congressional districts by zip code is inherently problematic. Our latitude/longitude-based API methods are much more accurate, and should be used whenever possible.

The first complication is probably obvious: zip codes and congressional districts aren't the same thing. A zip code can span more than one district (or even more than one state!), so if you want to support zip lookups for your users, you'll have to support cases where more than one matching district is returned. Our API accounts for this, but it's important that your code do so, too. We err on the side of returning inclusive results when a zip might belong to multiple congressional districts.
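To make the "more than one district" case concrete, here is a minimal sketch of how calling code might react to a lookup result. The function name and the `{"state": ..., "number": ...}` record shape are assumptions made for illustration, not the API's documented response format.

```python
from typing import Dict, List


def describe_districts(districts: List[Dict[str, str]]) -> str:
    """Turn a list of candidate districts into a message for the user.

    Each item is assumed to look like {"state": "MD", "number": "8"};
    that shape is hypothetical and only serves this example.
    """
    if not districts:
        return "No district found for that zip code."

    labels = sorted(f"{d['state']}-{d['number']}" for d in districts)
    if len(labels) == 1:
        return f"Your district is {labels[0]}."

    # More than one match: don't guess, ask for something more precise.
    return (
        "That zip code overlaps several districts ("
        + ", ".join(labels)
        + "). Please enter a full street address so we can narrow it down."
    )
```

The important part is the last branch: when a zip matches several districts, the code asks for better input instead of silently picking the first result.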

Unfortunately, things are actually more complicated than that. Most people don't realize it, but zip codes describe postal delivery routes -- the actual routes that mail carriers travel -- not geographically bounded areas. Zip codes are lines, in other words, while congressional districts are polygons. This means that mapping zips to congressional districts is an inherently imperfect process. The government uses something called a zip code tabulation area (ZCTA) to approximate the geographic footprint of a given zip as a polygon, and this is what we use to map zip codes to congressional districts. But it really is just an approximation -- it's far from perfect.

It's much better to skip the zip code step entirely and look up your location against the congressional district shapefiles published by the Census Bureau, using a precise geographic coordinate pair instead of a vague zip code. Thanks to the Chicago Tribune News App Team's excellent Boundary Service project, we offer exactly this capability. If you can, we strongly encourage you to get a precise latitude/longitude pair from your users (either by geolocating them or geocoding their full address), then use it to determine their representatives.
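For the curious, the idea behind a coordinate lookup can be sketched with a plain ray-casting point-in-polygon test. This is only an illustration: the function names and the toy square "districts" in the usage note are made up, real district shapes can have holes and multiple parts, and this is not a description of how the Boundary Service is implemented.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray running east
    from the point crosses. An odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_districts(point: Point, districts: Dict[str, List[Point]]) -> List[str]:
    """Return the name of every district whose polygon contains the point."""
    return [name for name, poly in districts.items() if point_in_polygon(point, poly)]
```

With two hypothetical square districts, `{"A": [(0, 0), (2, 0), (2, 2), (0, 2)], "B": [(2, 0), (4, 0), (4, 2), (2, 2)]}`, calling `find_districts((1, 1), ...)` picks out only "A". A single coordinate pair lands in one polygon, which is exactly what a zip code can't promise.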

"But what about house.gov's ZIP+4 congressional lookup tool?" I hear you asking. It's true, many House offices use this tool to determine who your representative is (and whether you're allowed to email them). Unfortunately, just because this tool is on an official site doesn't mean it's perfect. Here in the Labs, Kaitlin (who lives in Maryland) can't write her representative because the ZIP+4 tool gives incorrect results. Besides, not that many people know their full nine-digit ZIP+4 code.

So if you can, use latitude/longitude pairs. If you can't, and have to depend on zips, we'll supply results that are very, very good -- but not as good as real coordinates would allow.

Broadcasters' Public Files Should Be Published Online (and it's absurd that we're even having this conversation)

Luigi passed along a couple of links to a great/infuriating On the Media segment about the new rules the FCC is considering related to the online disclosure of political ad purchases.

To run through the issue quickly: every broadcast station is required to keep a "public file" of paper records related to campaign ad purchases. These records show basic information about how an ad was purchased, who bought it and when it aired. As the name implies, the file is available for public inspection, but only if you show up at the station and ask for it.

The FCC has proposed a rule that would require the public file to be posted online. We feel that this is an obvious and overdue step, and have submitted comments to the rulemaking saying as much. After all, it's 2012--it's absurd to claim that information is "public" if it isn't also online. And this information is particularly important: with Citizens United enabling a new flood of money into our political system--with less accountability!--keeping track of the ways in which wealth is deployed to move political opinion is more important than ever. The public file is a vital source of this kind of information.

The first OTM segment, which features Steven Waldman, does a good job of explaining all of this. The second one mostly just makes your blood boil. In it, Jack Goodman, a lobbyist for the National Association of Broadcasters, makes the case that posting the public file online would represent an onerous burden on broadcast stations.

Clearly, this is nonsense. As Waldman notes, Goodman is claiming that his would be "the first industry to use the internet to become less efficient." I've seen what the public file looks like. Yeah, there's a bunch of stuff in there, but obviously not too much to fax to the FCC once a day (or, preferably, enter into a modern electronic records-keeping system--perhaps one supplied by the FCC--instead of continuing to record everything on paper like it's 1970).

But forget for a moment how ridiculous Goodman's argument is. Consider how outrageous it is that he's even making it. This is one of the underappreciated pathologies that lobbying produces. If you're an organization like the NAB and you have a staff lobbyist, whenever an issue comes along--however minor--your lobbyist can be counted on to make a fuss about it. That's what they're paid to do, right? Here we have a disclosure burden that is basically the bureaucratic equivalent of your office manager announcing that expense reports have to be filed using a webform. Yet for some reason we're now having a national conversation about it.

It's absolutely dumbfounding to have an effort to make money in politics more transparent weighed against someone not wanting to use the fax machine. And yet here we are. That's the magic of the lobbying industry.

The FEC's New Mobile Site Could Use Some Work

Last Friday the Federal Election Commission announced the launch of a new mobile interface. You should try it for yourself at http://fec.gov/mobile/. The site declares itself to be a beta, which I suspect you'll agree is something of an understatement.

Let's call a spade a spade: there's no use pretending this is good. To begin with, there are obvious superficial problems: graphs lack units, graphics have been resized in a lossy way, and the damn thing doesn't work on most Android devices.

Worse, there are substantive errors. Look at Herman Cain's cash on hand. Why are debts listed as a share of positive assets? Look at the Bachmann campaign's receipts. Why is "total contributions"--which should reflect the entire pie--just a slice? (It's not 50% because other slices seem to have incorrectly counted overlap, too.) Why don't any of the line items below the graphs reflect the fact that some are components of others?

We asked the FEC for comment, but so far they've declined. Once the powers that be over there have a closer look, I'm confident they'll agree that the mobile site is a mess.

It's hard to know what to say about all of this. Part of Sunlight's mission is to encourage government agencies to embrace technology more fully. We don't want to send mixed messages by jumping down their throats when they actually try to do so. Sure, we gave FAPIIS a hard time, but that was because the site's creators were obviously and deliberately undermining the idea of public oversight. By contrast, I don't think anyone who worked on the FEC Mobile site intended to do a bad job.

And of course there's a fundamental question. Obviously the bits that are relaying incorrect information are a problem. But assuming those get fixed, is a half-hearted attempt like this better than nothing? I suppose there might be some poor, twisted soul who will enjoy listening to FEC meeting audio while they're at the gym (though frankly, if such a person existed I suspect they'd already be working here). But as a general matter it's difficult to imagine anyone needing a mobile interface to a set of campaign finance data that's as narrowly conceived as this one.

To their credit, it doesn't seem as if this mobile interface was created at the expense of the organization's much more important responsibility to publish data--a mission that, by and large, the FEC fulfills ably and with steadily increasing sophistication. There's always room for improvement, but the truly pressing needs, like reliable identifiers for contributors and meaningful enforcement of campaign finance law, are beyond the reach of the organization's technical staff.

Still, it's a bit amazing to see obviously wrong numbers attached to a product that Chairperson Bauerly has been quoted as endorsing appreciatively. Among those of us concerned about America's campaign finance system and the effect it has on our democracy, there is a sense that the FEC's leadership does not take its mission particularly seriously. The release of shoddy work like this mobile site does little to dispel that impression.

The data behind Capitol words

Last Monday we launched an update to our Capitol Words project, which indexes and tokenizes the Congressional Record daily. With the launch behind us and the dust starting to settle, I'd like to walk through how we get from raw text to attributed, searchable quotations, and provide some examples of how you can interact with the data directly.

Before delving into how it works, though, it's important to acknowledge the myriad developers whose work on this project has made it possible. I'm only the most recent steward of the site; the bulk of the data legwork for this iteration was handled by Aaron Bycoffe and Jessy Kate Schingler, and the web interface owes its beauty to Caitlin Weber and Ali Felski. Timball provided the hardware, and the list continues from contributions to the scrapers all the way back to the original conception and implementation of the idea by Josh Ruihley and Garrett Schure. It's the combined efforts of everyone involved that brought us the site that's available today.

Now, without further ado...

House Approves Sweeping Open Data Standards

At a Friday hearing, the House of Representatives significantly raised the bar on open data by passing a resolution requiring that a wide variety of crucial House legislative information be published online, in open formats, and at permanent predictable URLs. Daniel Schuman covered this on the Sunlight Foundation blog on Friday.

The new standards create a new central website, run by the Clerk of the House, that will host all House bills, resolutions, amendments, and conference reports. These documents will be online on January 1, 2012, and will be in XML.

Beyond that, the standards require committees to post their amendments, votes, hearing notices, which bills and resolutions they're considering, and lots of other documents. The Clerk is charged with building tools for committees to post this information to the new website; in the meantime, committees must post them to their own website, in PDF. Committees are also encouraged to post this information in XML, and "should expect XML formats to become mandatory in the future".

This is hugely valuable information that, to date, has been extremely difficult to discover in a reliable way. To get House legislation, one either needs to scrape THOMAS.gov (a Sisyphean ordeal), or to rely on the good work of people who've already done it. Committee information is terribly fragmented, and in some cases (such as committee votes and amendments) there is no way to get it at all, short of hiring people to go sit in committee rooms and record what goes on (a practice that forms the basis for a number of business models here in DC). This is the beginning of bringing much needed order to chaos, and sunlight to the legislative process.

These standards demonstrate excellent leadership on the part of the House, and offer a modern vision for how a legislative body should view its responsibilities to the public. The Senate should hear the sound of a gauntlet being thrown. The Committee's action is in keeping with Speaker Boehner's and Majority Leader Cantor's April call for the House Clerk to release legislative data in machine readable formats. It is very gratifying to see this call taken so seriously.

Name Standardization: Problems and a Solution

Name standardization, on its surface, would appear to be a primarily aesthetic problem (no pun intended). People's names can be listed "last, first" or "first last". Simple, right? Not exactly. When you're naming different things -- people vs. organizations, for instance -- and dealing with different ordering, capitalization styles, honorifics, suffixes, metadata or other additional info embedded in names (e.g. political party signifiers, company departments or locations), or just general cruft and typos, name standardization is a thorny problem. Add to that the fact that there are no universal identifiers for people or companies in many datasets, names rarely (if ever) come split into their constituent parts, and we are often expected to link data via little more than a name string, and you can see how relevant the issue is to the world of open government data.
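To give a flavor of the cases listed above, here is a deliberately small sketch that handles "last, first" ordering, a few honorifics and suffixes, all-caps input, and a parenthesized party marker. The function name and the word lists are assumptions made for this example; it is not the solution the post's title refers to, and it ignores organizations entirely.

```python
import re

# Small illustrative lists, nowhere near exhaustive.
HONORIFICS = {"mr", "mrs", "ms", "dr", "hon", "sen", "rep"}
ROMAN_SUFFIXES = {"ii", "iii", "iv"}
SUFFIXES = {"jr", "sr"} | ROMAN_SUFFIXES
NOT_NAME_PARTS = HONORIFICS | SUFFIXES


def _fix_case(token: str) -> str:
    # Only recase tokens that are entirely upper or lower case, so that
    # mixed-case names such as "McDonald" are left alone.
    if token.isupper() or token.islower():
        return token.capitalize()
    return token


def standardize_person_name(raw: str) -> str:
    """Return 'First Middle Last Suffix' from 'Last, First' or 'First Last'."""
    # Drop parenthesized metadata such as a party marker: "(D)", "(R-OH)".
    cleaned = re.sub(r"\([^)]*\)", " ", raw)

    # "Last, First ..." becomes "First ... Last"; only the first comma is
    # treated as the separator between surname and the rest.
    if "," in cleaned:
        last, rest = cleaned.split(",", 1)
        cleaned = f"{rest} {last}"

    tokens = [t.strip(".,") for t in cleaned.split()]
    tokens = [t for t in tokens if t]

    suffixes = [t for t in tokens if t.lower() in SUFFIXES]
    core = [t for t in tokens if t.lower() not in NOT_NAME_PARTS]

    parts = [_fix_case(t) for t in core]
    parts += [s.upper() if s.lower() in ROMAN_SUFFIXES else s.capitalize() for s in suffixes]
    return " ".join(parts)
```

Even this toy version shows why the problem is thorny: it would mangle "O'BRIEN", treat a person actually named "Rep" as an honorific, and has no idea what to do with a company name.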

Sunlight in ACM's XRDS

Those of you who were computer science majors in college may have belonged to your school’s student chapter of the ACM (Association for Computing Machinery). If you were a dues paying member, you likely received their quarterly magazine XRDS (called Crossroads when I was a student).

The latest issue of XRDS is themed around “CS in Service of Democracy”, and I’ve contributed an article about Sunlight Labs to the issue. If you’re able to get a copy, you’ll also find articles by friends of Sunlight like Josh Tauberer of GovTrack and POPVOX, and Harlan Yu and Stephen Schultze, who built RECAP.

My article is reprinted after the jump.

FederalRegister.gov Wins Innovation Award

Remember the inspiring story of FederalRegister.gov 2.0, and its humble beginnings as Apps For America finalist GovPulse.us? Well, the team behind the site has won another commendation, this time from ACUS:

According to its website, the Administrative Conference of the United States is an independent federal agency dedicated to improving the administrative process through consensus-driven applied research, providing nonpartisan expert advice and recommendations for improvement of federal agency procedures. In a writeup about FederalRegister.gov, ACUS describes some lessons learned that other agencies should take to heart:

  1. Make your data available in bulk so others can use it.
  2. Work with volunteers in the community and encourage them to develop new applications with your data.
  3. If the volunteers come up with something great, work with them and use those components on the government web site.
  4. Make the source code for the government web site open source so other agencies and other non-governmental organizations can make customized versions.

We at Sunlight Labs could not agree more. Congratulations to the team at FederalRegister.gov!

Labs Update: December 2011

It’s the most wonderful time of year… Montgomery County property tax payment time! It’s also the holidays, which are quite nice as well. Things are wrapping up here in the Labs before we head off for winter break. We have a lot going on right now and even more big plans for next year.

In tangentially related news, Scott Weiland released a holiday album. I can sense your blank stare from here… please don’t let it distract you from reading the rest of this post.

Influence Explorer

The Data Commons team has launched a redesign of Influence Explorer that greatly improves navigation on long, complex profile pages. As you scroll, the navigation bars stay with you so that you know which data set you are currently viewing and can jump between them quickly. The year selector also follows you so that you can easily switch to different year views.

Ryan and Lee have been working closely with Ethan to dig through the data stored in Influence Explorer. Interested in reading up on lobbyist bundling for the Super Committee? How about the political ties behind Zuccotti Park? Want to find out how lobbying can reduce your tax rate?

In addition to all this lovely work, the team has been acquiring more timely campaign contribution data from the FEC, exploring the federal regulatory process and upgrading the server infrastructure.

Open States Project

James and contributors have been knocking out the states, bringing us ever closer to 50 + DC. Kentucky, Oregon, Idaho, Arkansas and Nevada have all graduated from experimental status based on several months of stability. North Dakota and South Carolina were also recently added to the API.

James has been prepping the Boston Sunlight office, new home of the Open States Project. He just hired a new developer and has secured office space. I patiently await an invitation to the opening party.

Congress for Android

Eric released a major update to the Congress App for Android that includes a visual redesign and information on what’s coming up in the next couple of days on the floor of Congress. This is a really great release and Eric did a lot of great work on the new redesign. He’s got many plans for new features that will be included over the next year, so stay tuned!

The section in which I post Chris' update verbatim

Chris wishes that there was a more eloquent and loquacious manner in which she could describe her continued work in the mobile game app and the 180 Project. Alas, these projects defy description as the day-to-day minutia of design eventually amounts to: move this there, rinse repeat. However, Chris is pleased to report that the completion of the 180 Project is in her sights, barring any timeline disrupting events. She is coding, thus all is well.

Team Sysadmin

Tim has been involved in the long and arduous process of upgrading our office network. As it currently stands, the new fiber connection is a frustrating 15 feet from the office. Tim can see it from the ceiling tiles above our server room, but it is caught up in insurance, contractor and building management turmoil. To ease his mind, he’s been configuring our new Juniper Junos EX-series switches. It’ll be like a cute little ISP here in Sunlight’s office!

Team C-Level Executive

Tom is freshly back from the TAI Bridging Session and News Foo. Aside from that he’s been working on filling our open positions and some end-of-year planning stuff.

Tidbits

  • Expanding on the Sunlight Labs Olympics, we’ll be participating in the Sunlight Foundation Olympics early next week. Results will be posted shortly thereafter!
  • We now have 40 instances running on Amazon EC2. I’m sure we know what’s on each of those boxes, right?
  • Drew has been lending a hand to reporting to keep their projects running while we search for someone to fill the open position.
  • Dan and Capitol Words. Soon. Promise.
  • Eric and Andrew have begun a project to gather the data needed to connect bills and laws to the regulatory process. This effort should yield lots of bulk data over the next month or two for the legal and legislative communities to use.
  • Kaitlin has updated the video endpoint in the Real Time Congress API to support some upcoming changes to our Roku apps.
  • Upwardly Mobile is coming together nicely. There will be lots of great things to show early in January.
  • Renaissance man Luigi Montanez authored How can software engineers help make government better? in the latest issue of the ACM’s XRDS (Crossroads) magazine.
  • The hottest Labs holiday gift this season is Well Dressed’s El Gordo burrito.

When working with raw meat for your holiday meals, remember: though Sunlight is said to be the best of disinfectants, bleach is better.

Sunlight at the International Open Data Hackathon

This past Saturday was the second annual International Open Data Hackathon, a globally coordinated day for people to gather and hack on open public data from the world's governments. As part of this, POPVOX hosted an Open Data event here in DC at the MLK Memorial Public Library.

Several Sunlighters showed up, and we had a pretty great time. Andrew and I came expecting to work alone on our project, an ambitious attempt to bridge the data gap between legislation and the regulations it generates, which we're tentatively titling Crosslaws. Instead, after we (and everyone else) described our project to the room at the start of the day, we had 6 people come to our table and ask how they could help - 5 of whom weren't developers at all.

Despite Andrew and I not having any obvious tasks to hand out, after we explained the finer points of the work, everyone figured out their own valuable research and development to do for the entire course of the day, from scholarly articles to actual parsing code. You can find some of our group's notes on the Crosslaws wiki, as well as an overview of what's left to be done (there's a lot!).

Drew and Daniel went to the hackathon to work on their statistical analysis of USASpending data, using Benford's Law. They were hoping to find a stats wizard to help rigorously test the findings, and while they weren't able to find one, their search was still fruitful. The project did attract interest from a handful of very thoughtful people, and they had a long discussion that helped refine the goals of the project. Drew was very thankful for that, as he came away from the hackathon better focused on a concrete goal. At the end of the day, they had the parser and downloader written, but weren't able to download enough data to test it thoroughly. You can find Drew's team's code on Github.
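For readers who haven't run into Benford's Law: in many naturally occurring sets of numbers, the leading digit d shows up with probability log10(1 + 1/d), so about 30% of values start with a 1. A rough sketch of that kind of check follows. The function names are made up for this example, and this is not the code from Drew's team's repository.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(value: float) -> int:
    """First significant digit of a nonzero number.

    Scientific notation puts that digit first, which sidesteps the
    floating point surprises of dividing by powers of ten.
    """
    return int(format(abs(value), ".15e")[0])


def first_digit_shares(amounts: Iterable[float]) -> Dict[int, float]:
    """Observed share of each leading digit, ignoring zero amounts."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def chi_square_statistic(amounts: Iterable[float]) -> float:
    """Sum over digits of (observed - expected)^2 / expected, using counts."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)
    total = len(digits)
    return sum(
        (counts.get(d, 0) - total * benford_expected(d)) ** 2
        / (total * benford_expected(d))
        for d in range(1, 10)
    )
```

A large statistic only says the digits don't look Benford-like. Deciding whether that means anything for a particular spending dataset is the part that calls for a stats wizard.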

In general, it was a fantastic crop of people who showed up on a Saturday morning at the MLK Library, from awesome self-directed policy people, to talented folks from the DC and federal governments. My project got real momentum from it, and we'll be capitalizing on that momentum with more work over the next couple months. Given all that, the hackathon felt like a real success to me, and I'm looking forward to next year's.

1818 N Street NW, Suite 300
Washington, DC 20036
202.742.1520