Recovery.gov's Systemic Failure
- Written by
- Clay Johnson
- Date
- 11/10/2009 1:55 p.m.
The new Recovery.gov, which we've written about and even nearly bid on, has certainly taken the government huge steps forward in terms of disclosing information, but it is not without controversy. The press is questioning the program, pointing to wasteful spending or bad data. The White House fired back with a "reality check" (their words), saying that few of the reports have gone through the "extensive three-week review" and that the data might be particularly misleading at this point.
In short, what’s happened is that the Recovery Accountability and Transparency Board has launched a website, asked citizens to report waste, fraud and abuse on that website, and filled it with data that they knew was either questionable or blatantly inaccurate. This doesn’t sound productive for the recipients of the funds, the government or the citizens seeking to monitor this spending. What’s the point of reporting waste, fraud or abuse if none of the data is correct? Some of the more glaring errors are now being corrected, but at significant political cost to the administration. Unfortunately, we're still not confident that the data will be good enough for the public to meaningfully contribute to the search for waste or fraud.
It's great that the government is taking this review seriously. When Sunlight met with the Recovery Accountability and Transparency Board (as part of our work with the Coalition for an Accountable Recovery) at the beginning of the fall, they said outright that they were going to work on "data integrity" but not "data quality" -- that would be left up to the reporting recipients and agencies. We were left worried that nobody would ultimately take responsibility for what was reported. It sounds like the RAT Board has realized that it isn't enough to publish bad data and call their job done.
No amount of technology can fix the data quality problem if there are thousands of people entering data. You’re going to get mis-reported data, erroneous data, and yes, waste, fraud and abuse. What’s worse, it’ll all look the same.
In the short term, there are systemic things that the Recovery board can do to highlight the problem and work toward solving it:
- Instead of couching the data posted as “not being entirely accurate,” make that the point: get in front of it and put the data in a “public review” category. Deliberately ask the public for feedback and to point out data that could be inaccurate. The only way to solve the problem today is to have a different set of thousands of people reviewing the data.
- Make a “report bad data” button on the website that makes it easier for the people auditing the data at various agencies to see where data errors are occurring. Instead of just publishing the data, ask citizens and the media to spot bad data and report it back in.
- After that data has been reported back in, require the reporters of the data (or at least the agencies who are responsible for it) to review the data -- and add consequences for those who fail to do so, or who approve data that's later shown to be faulty (this is likely beyond what the RAT Board can do -- at some point OMB needs to get serious about this problem).
- Keep an agency-by-agency tally of bad-data reports and make it public and in real time. Report this as a percentage of data reported so there's no incentive to report no data to game the system.
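The tally in the last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the agency names, the record counts, and the `bad_data_rates` function are all hypothetical, and nothing here reflects how Recovery.gov actually stores or exposes its data. It shows why a percentage is a fairer number than a raw count, and why an agency that reports nothing should not be rewarded with a clean 0%.

```python
from collections import Counter


def bad_data_rates(records_reported, bad_data_flags):
    """Return {agency: percent of its reported records flagged as bad}.

    records_reported: {agency: number of records the agency reported}
    bad_data_flags:   iterable of (agency, record_id) pairs, one per
                      "report bad data" click (hypothetical input format)

    An agency that reported nothing gets None instead of 0.0, so that
    reporting no data never looks like a perfect score.
    """
    # Count each flagged record once, however many people flagged it.
    flagged = Counter(agency for agency, _ in set(bad_data_flags))

    rates = {}
    for agency, total in records_reported.items():
        if total == 0:
            rates[agency] = None
        else:
            rates[agency] = round(100.0 * flagged[agency] / total, 1)
    return rates


# Made-up numbers, for illustration only.
reported = {"Agency A": 2000, "Agency B": 50, "Agency C": 0}
flags = (
    [("Agency A", i) for i in range(40)]
    + [("Agency B", i) for i in range(5)]
    + [("Agency B", 0)]  # a second person flags the same record
)

rates = bad_data_rates(reported, flags)
for agency, rate in rates.items():
    print(agency, "no data reported" if rate is None else f"{rate}%")
```

With these made-up numbers, Agency A has eight times as many flagged records as Agency B but a much lower error rate, and Agency C shows up as having reported nothing rather than as error-free.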
The point is to close the feedback loop so that the consequence of reporting bad data, whether intentional or otherwise, is more cost for the recipient of the funds. While human disclosure may be a burden, erroneous disclosure should trigger a heavier burden to improve accuracy.
I commend the government for publishing data and giving people early access to it. Great work: the sooner data can get out to the public, the better, even if that means the data is inaccurate to start out with. But the lesson learned is that you have to be overly explicit about the accuracy of the data, and invite people to report not only waste, fraud and abuse but also data that is less than perfect. Solid data doesn't happen when only part of the chain is responsible for its accuracy. It happens when the whole chain is responsible, including the people consuming it.
Discussion
Closing the feedback loop is a very important principle. With the judiciary, we've repeatedly asked that they establish a Chief Privacy Officer position, a Best Current Practice in most modern corporations and government agencies. That way, if the PACER system happens to inadvertently publish personal information, there is a mechanism for the parties affected to notify the courts.
A point of contact, "report bad data" buttons, and other mechanisms to make publishing a two-way street reflect a core principle that all agencies should pay attention to, particularly those, like the ones that run Recovery.gov, that are responsible for stupendous sums of money.
I agree that the Recovery.gov folks should provide an easy way for people to report "bad data". I wish more emphasis had been placed on highlighting what is known and what is not known. For example, recovery.gov should have something like ProPublica's Stimulus Progress Bar (http://projects.propublica.org/tables/stimulus-spending-progress), with all the details made available so that we can verify all the totals. Instead, ProPublica's staff has to work a fair amount of their investigative journalism mojo to compute those totals and keep them up to date.
I think the real acid test of all government data is whether reporters are satisfied with its accuracy, and clearly reporters are finding lots of problems with Recovery.gov data. But outside of the relatively simple screw-ups that we ink-stained wretches are used to finding in government data (some data entered in the wrong fields, transposed zip codes, etc.), the biggest problems so far have been with the job numbers and the way State Fiscal Stabilization Fund numbers have been reported.
I think the Associated Press called the "Jobs Saved" metric murky math months ago, and even some recipients of stimulus funds get that -- like Hastings, Neb. (see details here: http://bit.ly/1NZKfE)
"...according to the formulas provided by DOE the replacement of street lamps will create 2.29 jobs. The replacement of traffic signal bulbs will create .31 jobs. Actually no new employees will be hired, we simply applied the DOE formulas to arrive at these numbers."
Another example of this was found in California, where the state was counting as "saved jobs" those of tenured professors--who by definition can't be fired (the Financial Times unearthed that tidbit: http://bit.ly/43OAdP).
I don't think the Recovery Accountability and Transparency Board is to blame for the bad job numbers -- as I recall, it's the White House that wanted those numbers, and came up with the formula for calculating them in a way that can produce 2.6 jobs even though no one has been hired, or 900-some jobs saved or created at places that employ only 500-some.
As for the State Fiscal Stabilization Fund numbers -- this was the money shipped out to states to close the holes in their budgets caused by declining tax revenues in the wake of 2008's economic meltdown. In California, the money is paying for, among other things, teacher salaries and prison guard salaries all over the state (I may be mistaken, but these are the two biggest "job creator/saver" projects in the stimulus data), but it's reported as all being spent in Sacramento -- as if Sacramento were home to 50,000-some teachers and 18,000-some prison guards. (See how we get back to jobs and job numbers?)
I'm a fan of points #3 and 4. It's essential to attribute the data (good and bad) to whoever reported it. Reporting agencies won't take their task seriously without the threat of embarrassment (too cynical?).
Bill's point about my home state fudging numbers with tenured professors makes me think that somebody—perhaps the RAT—will need to start creating something like "data style guides" for reporting agencies to set clearer expectations and add another layer of accountability.
FYI, Dave McClure touches on data quality in the latest Dotgov Buzz, acknowledging that it's a "recurring problem in government:" http://www.usa.gov/dotgovbuzz/1009.html#dave
Agreed to all of this. Error discovery is also something that may happen away from Recovery.gov. Other sites may present the same data with different interfaces and visualizations that make some errors more apparent than on Recovery.gov itself.
It would be important to have a service interface for Recovery.gov that would enable error flags to come in from a variety of sources. We discuss this in our recommendations for Recovery.gov web services (http://escholarship.org/uc/item/0fv601z8).