Looking At The NYTimes Congress API
- Written by
- James
- Date
- 01/08/2009 6:05 p.m.
Earlier this week the New York Times released their Congress API, their second politics API (following the release of their Campaign Finance API late last year). Here at Sunlight Labs we are always happy to see new APIs that wrap government data, and there is definitely a lot to like here, although there are some things that will hopefully change to make the API more useful to the community at large.
The Good
There can never be too many ways to access this kind of government data. The more that are out there, the more they will be used, and ultimately that is one of our primary goals here at Sunlight Labs. There are a few really nice design decisions that are worth noting.
- The API does more than provide details on Members of Congress; its main focus is their roll call votes over time.
- The data goes back as far as any readily available sources go. This is a nice touch that makes life easier for people looking to do historical analysis of the behavior of Congress. (Data goes back to at least 1992, and as far back as 1947 for Senate biographical data.)
- The RESTful API is easy to use and experiment with from a browser. The four methods are easy to understand and don't require jumping through hoops. They also use standard HTTP error codes.
- They have chosen to use Bioguide IDs as their primary keys, which means that they haven't introduced yet another identifier for Members of Congress. [*]
- The data returned is reasonably minimal throughout most of the API. You get back pretty much what you ask for, without a ton of irrelevant information to sort through.
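To make two of the points above concrete, here is a minimal Python sketch. It assumes that Bioguide IDs follow the usual pattern of one uppercase letter plus six digits, and that a client sorts responses by the standard HTTP status classes. The function names `is_bioguide_id` and `describe_status` are my own, not part of the NYT API, and the exact status codes the Congress API returns are not listed here, so treat the mapping as illustrative.

```python
import re

# Assumption: a Bioguide ID is one uppercase letter followed by six digits.
BIOGUIDE_RE = re.compile(r"[A-Z]\d{6}")


def is_bioguide_id(value: str) -> bool:
    """Return True if value has the shape of a Bioguide ID."""
    return BIOGUIDE_RE.fullmatch(value) is not None


def describe_status(code: int) -> str:
    """Map a standard HTTP status code to a coarse outcome (hypothetical helper)."""
    if 200 <= code < 300:
        return "ok"
    if code == 404:
        return "not found"
    if code in (401, 403):
        return "not authorized"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unexpected"
```

Because the API sticks to shared identifiers and standard status codes, helpers like these can be reused across any service that keys on Bioguide IDs rather than rewritten per API.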
The Bad
As nice as the new API is from a technical perspective, there are a few problems that a would-be user might run into. These problems primarily stem from the Terms of Use.
- The limit of 5,000 requests per day seems a bit restrictive. (It is prominently noted that this is subject to change, but there is no indication whether that change would be upward or downward.)
- Section 1.e.vi of the ToS states that you may not use the API for any service that competes with products or services offered by NYT. This comes across as overly broad, as one can imagine that a lot of this vote information would be useful to organizations that could in some way be considered "in competition with the NY Times" (such as local news organizations).
- Section v of the attribution restrictions prohibits archiving of data for access by users "at any future date after you have finished using the service", which would prohibit something like building an application that used the NY Times as an initial source for votes but had no need to hit the Congress API regularly.
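For anyone planning around the 5,000 requests per day limit, a client-side counter is one way to avoid running into it unexpectedly. The sketch below is an assumption-laden illustration: the `DailyQuota` class is hypothetical, and it resets on the local calendar day, which may not match how the NYT actually counts a day.

```python
from datetime import date


class DailyQuota:
    """Client-side request budget that resets when the calendar day changes."""

    def __init__(self, limit=5000):
        self.limit = limit   # 5,000 per day is the limit cited in the post
        self.day = None      # day the current count belongs to
        self.used = 0        # requests counted so far on that day

    def try_acquire(self, today=None):
        """Count one request if budget remains; return False once exhausted."""
        today = today or date.today()
        if today != self.day:
            # Assumption: the budget resets at local midnight.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

A caller would check `quota.try_acquire()` before each request and back off until the next day when it returns False.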
The Ugly
OK, not much is really ugly about this API; as I said, it is quite simple and elegant. At the time of this post, however, output is XML only and, let's face it, XML is ugly. (I have hopes that this will change, as the Campaign Finance API offers JSON, XML, or serialized PHP, with JSON by default.)
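To be fair to XML, the parsing cost is small either way. Here is a rough Python comparison using only the standard library. The payloads are invented for illustration (the element names, the placeholder ID "X000000", and the flat structure are all assumptions, not the actual Congress API response format).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads; the real API's element names and nesting may differ.
XML_SAMPLE = "<member><id>X000000</id><name>Example Member</name></member>"
JSON_SAMPLE = '{"id": "X000000", "name": "Example Member"}'


def member_from_xml(text):
    """Flatten the direct children of the root element into a dict."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def member_from_json(text):
    """JSON maps onto native dicts directly."""
    return json.loads(text)
```

Both functions produce the same dictionary for these samples. The difference is that the XML version has to decide how to map elements onto native types, while the JSON version gets that mapping for free.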
The New York Times should be applauded for their effort in creating both this and the Campaign Finance API. Hopefully the future holds more of this from them, and we look forward to seeing what they release next. I hope that some of the more troublesome provisions can be revised or clarified so that the API is beneficial to a wider audience.
[*] As the maintainer of the Sunlight Labs API (primarily focused on providing ID lookups for legislators), I cannot emphasize enough how much I appreciate new websites and services using one of the standard IDs.
Discussion
Thanks for the nice write-up. Maybe we can offer some friendly collective pressure so that the NYT loosens up on the API TOS? I'll sign such a petition.
Thanks indeed for the kind words and thoughtful criticism. Andrei and Derek have worked long hours to get this in shape for the 111th, and there's a lot more to come.
Just to address a couple points you raised (and David, perhaps this will address your concerns too):
1) The 5,000 limit is relatively low, I agree. I think since we're new to this, we wanted to err on the low side to begin with and ramp up as demand increased. This was largely a technical consideration, since the APIs are on a new platform for us. If usage demands it, we'll re-examine the limits and bump them upwards. We can make exceptions on a case-by-case basis, so if this is a show-stopping limitation for you, drop me a note. We'll see what we can do.
2) I'm not a lawyer, but my understanding of the noncompete clause is that it refers specifically to financial competition. In other words, we wouldn't want someone taking our API, turning it into a product or service and then selling it. (At least not without talking to us first.) It has nothing at all to do with who can and cannot use the API, just how.
3) The "archiving data" clause is fairly boilerplate, and, frankly, pretty reasonable I think. My understanding is this restriction is intended to prevent users from systematically downloading all of the data, and storing it in perpetuity -- even after you stop using the API and/or your account is terminated. That isn't the intended use of the API, and from a data integrity standpoint makes perfect sense.
We also aren't alone in having a limitation like this in place. CRP's terms, for example, have roughly the same restriction.
4) Finally, there's the question of local media organizations using the API. The legalese here is a little dense, and perhaps we should make this clearer -- but we absolutely allow other news organizations to use our API. In fact, we specifically lifted the commercial use restriction with that in mind.
There's two reasons we did this: First, any lawyer will tell you there's no hard and fast rule about what constitutes commercial use. Technically, a blogger with Google ad words on his or her site could be considered a commercial entity. Secondly, The Times considers this API to be part of its journalistic mission, and wanted it to be as open as possible.
It seemed odd to us to release something that users like the DNC and RNC -- which are both noncommercial -- could take advantage of, but for-profits like TPMMuckraker could not. We felt the best solution was to drop that restriction entirely.
Anyway, I hope that addresses some of the issues you raised, and thanks again for the mention.
Yuck. Sorry about the comment blob above. That had line breaks when I posted it... I did forget to include my email, which is aron [at] nytimes.com.
Aron,
A couple questions:
"3) The "archiving data" clause is fairly boilerplate, and, frankly, pretty reasonable I think. My understanding is this restriction is intended to prevent users from systematically downloading all of the data, and storing it in perpetuity"
Is there data that you can really claim in this new API that is really your own? Isn't all of the data here coming from public sources? Why prevent people from caching it locally?
The API terms of service say you may not "use the NYT APIs for any commercial purpose or in any product or service that competes with products or services offered by NYT."
The "or in any" actually nullifies the commercial categorization earlier in the sentence. This means that because Sunlight Labs has an API which could be construed to compete with the New York Times API, we can't use it to begin with!
Certainly, I know you welcome us to use it, but that isn't the point. The point is, the API licensing agreement you've got needs to be revisited and adjusted so that it is truly "open."
We're happy to help in any way that we can.
"We also aren't alone in having a limitation like this in place. CRP's terms, for example, have roughly the same restriction."
Great, they set a precedent. ::bangs his head::
But, it's okay for the NYTimes API, since it's not like you couldn't find all that on GovTrack anyway... ::cough cough::
"XML is ugly" - What's so bad about XML? Depending on the usage JSON may be more convenient, however in most cases either one will require parsing by the application.