Everything We Know About Data.gov
- Written by: Clay Johnson
- Date: 05/22/2009 10:28 a.m.
Now that Data.gov is out, I thought I'd take a look under the hood to see what's in there and what's missing, and try to figure out what's coming.
First off, searching Twitter for the phrase "Data.gov congratulations" turns up enough evidence that hmiller23 and Jerad Speigel of the Phase One Consulting Group built the site. I asked them on Twitter, and they said, "It Uses LAMP."
Right now the site is short on data. Federal CIOs: There are hundreds of us waiting to do interesting things with your data. Invest in putting it up on Data.gov now. You will be rewarded.
Right now the breakdown of the files looks like this:
In terms of number of datasets per agency, here's what we're looking at:
So the US Geological Survey represents roughly half the data (which may also be why the available datasets are in KML or ESRI formats).
That's the thing that really must change now, and it's what will determine the success of Data.gov. There are a lot of datasets that the federal government has that have not been included: big datasets like the FACA Database and the FARA Database. And what about OMB's own Federal Budget?
But that's not stopping us. Already, in less than 24 hours, we have one entry to the contest. Go ahead and play FBI Fugitive Concentration!
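If you want to run this kind of breakdown yourself, here is a minimal sketch in Python. It assumes the catalog has been saved as a list of records with `agency` and `format` fields; those field names, the helper functions, and the sample rows are all hypothetical placeholders, not an actual Data.gov export or real counts.

```python
from collections import Counter

# Placeholder records: the field names and values are illustrative
# assumptions, not real Data.gov catalog entries or counts.
SAMPLE_CATALOG = [
    {"agency": "USGS", "format": "KML"},
    {"agency": "USGS", "format": "ESRI"},
    {"agency": "NOAA", "format": "CSV"},
    {"agency": "USGS", "format": "KML"},
]


def tally(records, field):
    """Count how many records share each value of `field`."""
    return Counter(record[field] for record in records)


def shares(counts):
    """Convert raw counts into fractions of the total (empty dict if no data)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: value / total for key, value in counts.items()}


if __name__ == "__main__":
    for field in ("agency", "format"):
        counts = tally(SAMPLE_CATALOG, field)
        fractions = shares(counts)
        print(f"Datasets by {field}:")
        for key, count in counts.most_common():
            print(f"  {key}: {count} ({fractions[key]:.0%})")
```

Swap the placeholder list for whatever you scrape or download from the catalog, and the same two functions give you both the per-agency and per-format numbers.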
Discussion
These are very interesting - thanks for compiling.
The large number of Shapefiles is disconcerting, considering it's not an open format, just a well reverse-engineered one.
However, alternatives such as KML will have very large file sizes, and CSVs lack rich geometry for visualization.
We need to push formats such as SQLite as an alternative for compact, rich, open data formats and tools to support their use.
I don't understand your comment. Shapefiles can be easily loaded into Oracle and ESRI products. KML and CSV can easily store the same data as shapefiles. They are just simple formats. SQLite is a database product. It's sort of like comparing apples to toaster ovens.
So the next graph that is missing should be the raw data vs. tools. Most of the data is in tool form. These tools don't allow developers to do anything. It's far from transparency. The only raw data is USGS, NWS, and NOAA. I sure the heck don't want to mash up earthquake patterns and how they affect the migration of birds!
Why isn't any of the USA Spending data in here?
RAW data is transparency. I can make a tool that makes bad data look good any day of the week.
i like alan howlett
OpenStreetMap is already using quite a bit of the US government's spatial data. I would say that it is a pretty darn good visualization of government data :).
Not to be nitpicky, but I'd like to hear more about how it uses LAMP. The HTTP responses identify the server as Microsoft-IIS/6.0. I checked because the URLs look RESTful and I wondered if it was built w/ Django. I was hoping to see some open technologies in the .gov domain.
The shapefile format has been published by ESRI since 1998. Get the spec at: http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
I think having the raw data is outstanding, but agree 100% that I'm more interested in program spending than the soil samples.
Let's hope they put lots more up in the coming months.
They also need to get the metadata more standardized (but then again, don't we all).
http://www.datamartist.com/datagov-looking-at-the-us-governments-data
This post inspired another look at Data.gov collection statistics by file size, in addition to agency and format. It uses Google Docs so that it updates when new sets are added and so the source spreadsheet is available.
Hi Clay, great breakdown of information. Hopefully they will keep adding information over the next 12 months.