Recent Posts

Meet the New Federal Register

If you haven't already, be sure to check out the new federalregister.gov, which launched last night. For some of you, the site might bring to mind govpulse, one of the winners of our second Apps for America contest. That's no coincidence: GPO and NARA, the agencies responsible for maintaining the FR, sought out Andrew, Dave and Bob -- the folks behind govpulse -- and asked them to help build the new site.

As you can imagine, those of us at Sunlight are pretty excited about this. It's a great validation of the work of the Labs community, and a wonderful example of what's possible when government stays open to the transformative possibilities offered by technology.

Government Data and the Case for Not Running Me Over

Over the weekend I was clearing out my RSS, and was pleasantly surprised to find Sunlight's work in an unexpected place. TheWashCycle is my favorite DC bike blog, and its author has started a series of posts designed to address arguments that are commonly faced by cycling advocates. One of those is that cyclists don't pay for roads — that the gas tax pays for them — and consequently folks on bikes aren't entitled to the use of roads, or are less entitled to space on the road than motorists, or shouldn't have a say in how roads are built.

As it turns out, the assumption that cyclists don't pay for roads is wrong. The WashCycle post linked to some work that we did for Pew's Subsidyscope project, which shows that gas taxes are paying for a decreasing share of our roads. In 2007 taxes and fees related to auto use covered only half the bill. The shortfall is made up by general revenues and debt — and though the specifics of the story play out differently from state to state it's safe to say that cyclists pay taxes that help build roads.

Share of Highway Funds by Source

I mention all this not simply to highlight some pro-cyclist propaganda — though of course, as a daily bike commuter, I'm glad to do that, too — but rather to point this out as an example of what open government data can accomplish.

A Few Git Tips

This weekend I had the opportunity to attend Scott Chacon's Advanced Git class at Jumpstart Lab. Scott works for GitHub and maintains the Git project's website. He's also written a book, Pro Git, and the handy reference site Git Reference.

Scott spent a good bit of time going over the fundamentals of Git--the different types of objects stored in its database and how they point to one another. I had seen all this before when I first started using Git, but I wasn't ready to really understand it then. If you've ever felt that Git was a bit mysterious or scary, I'd highly recommend going over the basics again. Try this article and these two sections of Scott's book.
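To make the object database a little less mysterious, here is a minimal sketch (mine, not something from Scott's class) of how Git names a blob, the object type that stores file contents. It assumes a repository using Git's default SHA-1 object format; the function name git_blob_id is made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would give a blob with this content."""
    # Git hashes a short header ("blob <size in bytes>" plus a NUL byte)
    # followed by the raw file contents.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should agree with: printf 'hello world\n' | git hash-object --stdin
    print(git_blob_id(b"hello world\n"))
```

Because the ID depends only on the content, two identical files anywhere in a repository share a single blob, and trees and commits can point at objects purely by hash.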

Here are some other useful tips I picked up:

The Health 2.0 Developer Challenge

The Health 2.0 Developer Challenge launched last week, and I've been embarrassingly remiss in mentioning it. Hopefully, many of you are already in the loop and excited about the project. Let me take a second and fill the rest of you in.

There are a lot of app contests and hackathons and dev challenges around these days. But I think this is one worth getting excited about, for three reasons.

Labs Olympics: Automate your life with geocron


Last month, for the two-day internal app competition we had at Sunlight Labs, Jeremy, Kaitlin, and I built geocron. Jeremy had a specific problem that just needed to be solved. When reaching his Red Line Metro station during the commute home, he'd have to physically take out his iPhone and send a text message to his wife, asking to be picked up. Surely, such actions can and should be automated, and that's where geocron comes in. By combining the Google Latitude API with old-fashioned cron jobs, we've created a utility that can send automated email, SMS, or webhook payloads depending on the time of day and the place you're located.
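The core idea, firing an action only when both a time window and a location match, can be sketched roughly as follows. This is an illustration rather than geocron's actual code: the Rule class, rule_matches, and the coordinates in the usage note are all hypothetical, and fetching the position (from Latitude or anywhere else) is left out entirely.

```python
import math
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Rule:
    """A hypothetical rule: be within radius_m of a point during a time window."""
    name: str
    lat: float
    lon: float
    radius_m: float
    start: time
    end: time


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def rule_matches(rule, lat, lon, now):
    """True when the position is inside the rule's radius and time window."""
    # Simplification: the window is assumed not to wrap past midnight.
    in_window = rule.start <= now.time() <= rule.end
    in_range = haversine_m(rule.lat, rule.lon, lat, lon) <= rule.radius_m
    return in_window and in_range
```

A cron job could then poll the current position every minute or so and send the email, SMS, or webhook for each rule that matches, for example Rule("pickup", 38.9, -77.0, 200.0, time(17, 0), time(19, 0)) for an evening commute.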

Grading the new USA.gov

USA.gov, the official portal for U.S. government information and services, just launched a new design. Since we took a stab at redesigning it ourselves back in January of '09, we thought we'd see if they took any of our advice.

Guest Post: Calling All Phoenix Area Civic Hackers

Marc Chung is one of the organizers who helped make the Great American Hackathon a success, and is a friend of Sunlight. He's asked for a little space on the Labs blog to announce his new Phoenix-area open data group, and we're only too happy to oblige. Read on for the details.

I'm Marc Chung, a computer scientist who is passionate about bringing technologists together to improve our world.

Last year, I organized the Phoenix edition of the Great American Hackathon. That weekend a local gathering of developers decided to contribute time towards building a parser (http://sunlightlabs.com/blog/2009/hotness-arizona/) for the Arizona State Legislature. The work was done as part of the Fifty State Project, which supports organizations like MapLight and OpenCongress.

After the hackathon, I was contacted by several journalists and developers who were very excited by the work we did and just as eager to offer their assistance on future civic hacking initiatives. In the short time since GAH '09, we've been working with to extract useful information from public data in an effort to shed more light on how state governments work.

Combining the interests of these two groups was inevitable and so today, along with Mark Ng and Brian Shaler, I'd like to announce PhxData, a group to unite technologists in the Phoenix area who are engaged in data mining, parsing, visualization, etc. It also serves as a platform for journalists and government officials to connect with civic hackers who want to take public data and make it useful.

Check out our website: http://phxdata.org

If you're a data scientist, journalist, government official, statistician, developer or designer who would like to work on exploring data in the interest of pursuing greater government transparency for the state of Arizona, you should join this group.

Cole§law: Visualizing the US Legal Code

To take a break from the routine and our official projects, Sunlight Labs organized an internal "Labs Olympics", in which teams would compete for outrageous prizes by building an extracurricular project. This installment brings you the contribution from "Team Intern".

The Team

As Team Intern, we felt we had something to prove. Could four unseasoned new recruits withstand the blazing glory of the veteran Sunlighters? On the team were Charlie DeTar (from MIT, working at Sunlight Labs on Transparency Data), Dan Schneiderman (from RIT, working on the Fifty State Project), Michael Stephens (from RPI, also with the Fifty State Project) and Ryan Wold (consultant, working on the National Data Catalog).

The Process

We started off on Monday morning with a couple of vague ideas of what we might work on (Some sort of direct message/Twitter bot for RSS feeds? Something to do with mapping?). We kicked it off with a brainstorming session for a couple of hours, putting ideas on Post-it notes, sorting them into categories, and pruning, and we eventually settled on a "Legalese Translator" service: a wiki which lets people annotate legalese documents – such as Terms of Service and Privacy Policies – with more human-readable summaries and eye-catching icons indicating major problem areas (such as the company asserting they can change the TOS at any time). We started poking around the MediaWiki codebase to see what it would take to write a few extensions to suit our needs. After spending a couple of hours on this, we started to second-guess ourselves: would we be able to pull off something worthy of a demo? Challenges included coming up with a taxonomy of legal problems (none of us are lawyers), coming up with enough seed data to make the wiki work, and the realization that the vast majority of the work in a project like this would involve community management, expectation setting, and organization, none of which was a particular strength for any of us.

So, at 1pm on Monday, with 1/4 of the allotted time already consumed, we shifted gears. Gathered around a whiteboard, we almost instantly converged on another topic: mapping the complex references in bodies of law. Legal code tends to refer to itself, often in noodly, snaky paths that are hard to traverse, and most of the laws were written before such a thing as "hypertext" existed. This stayed in our general topic area of "legalese", but gave us a much more finite and concrete objective: visualizing and navigating references in laws. We started exploring a few different bodies of law to choose one for the project, and settled on the US Code – a gargantuan body comprising more than 50 titles broken into more than 60,000 sections with a decidedly complex subsection hierarchy. To get started, we made use of Cornell University's XML translation of the code. For the rest of the day, we worked on importing the code into a relational database from which we could generate the reference hierarchies necessary for our navigation and visualization tools. And a name... we needed a name. Since we were dealing with the law in a shredded and stringy form, we decided to call it "Coleslaw", or if you prefer, "Cole§law".
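For a rough feel of what "generating the reference hierarchies" involves, here is a minimal sketch that pulls cross-references out of section text and builds a graph from them. It is an illustration under loud assumptions, not the code we wrote: it works on plain text rather than Cornell's XML, it recognizes only citations shaped like "section 552a of title 5" (real citation forms are far more varied), and the function names are made up.

```python
import re
from collections import defaultdict

# Assumption: only citations of the form "section <number> of title <number>".
CITATION = re.compile(
    r"section\s+(\d+[A-Za-z]*)\s+of\s+title\s+(\d+)", re.IGNORECASE
)


def extract_references(text):
    """Return (title, section) pairs cited in text, in order of appearance."""
    return [(int(title), section) for section, title in CITATION.findall(text)]


def build_reference_graph(sections):
    """Map each (title, section) key to the set of sections its text cites.

    sections is a dict of {(title, section): section_text}. Sections that
    cite nothing are left out of the result.
    """
    graph = defaultdict(set)
    for key, text in sections.items():
        for target in extract_references(text):
            graph[key].add(target)
    return dict(graph)
```

Each (source, target) pair maps naturally onto a row in a references table, which is the kind of structure a navigation or visualization tool can query.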

The US Code is awfully complex. Among the 50 titles of the US Code, there are 168,000 references – including those within and between sections. Now on to the eye candy.

Elena's Inbox: How Not to Release Data

screenshot of elenasinbox.com

On Friday @BobBrigham tweeted a suggestion: put the just-released Elena Kagan email dump into a Gmail-style interface. I thought this was a pretty cool idea, so I started hacking away at it over the weekend. You can see the finished results at elenasinbox.com.

I'm really pleased that people have found the site useful and interesting, but the truth is that a lot of the emails in the system are garbage: they're badly formatted, duplicative or missing information. For instance, one of the most-visited pages on the site is the thread with the subject "Two G-rated Jewish jokes" -- understandably, given that it's the most potentially-scandalous-sounding subject line on the first page of results. Unfortunately, if you click through you'll see that there's no content in the messages.

The site was admittedly a bit rushed, but in this case it isn't the code that's to blame. If you go through the source PDF, you'll see that the content is missing there, too. It looks like it might have been redacted, but the format of the document is confusing enough that it's difficult to be sure.

But the source documents' problems go beyond ambiguous formatting. A lot of the junky content on the site comes from the junk it was built from -- there's not much we can do about it. To give you some idea of the problem, consider these strings:

Labs Olympics: Sunlight 2D

Recently, the Labs broke into teams and spent two days doing projects entirely of our own devising, given free rein. Our team consisted of two developers, a designer, and Sunlight's prized sysadmin. So for our project, we wanted to do something for the office that blended software and design with the physical world. Inspired by some recent internal work in inventorying items using QR codes, we thought it'd be fun to make a system that lets Sunlighters print out QR codes for anything they wanted.

What people do with those codes is up to them: document internal events for posterity, lead coworkers on a scavenger hunt, plant jokes, write QR slam poetry, whatever. The design goal here was to make it dirt easy, through their computer's browser or their mobile phone, for a Sunlighter to print out a QR code with some text and/or a picture attached.


1818 N Street NW, Suite 300
Washington, DC 20036
202.742.1520