Statistically Improbable Words

We should create a service that finds the words each lawmaker uses that are statistically improbable for a given day, month, year, or term. Take, for instance, "biologics": http://capitolwords.org/lawmaker/E000215/ At the time of writing, it turns out to be one of Anna Eshoo's most frequently used words. What makes this interesting is that representatives often use words like this because they were supplied in talking-points memos from lobbyists or other outside influences. See, for instance, this: http://www.nytimes.com/2009/11/15/us/politics/15health.html A list of statistically improbable words (or, better, phrases) for a given date range, compared with what's normal, seems pretty useful for finding this kind of thing fairly automatically.
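
One way the comparison "with what's normal" might be sketched is a smoothed log-ratio of a word's relative frequency in a date range (the foreground) against its relative frequency in a longer baseline (the background). This is only an illustration under assumptions: the function names, the crude tokenizer, the add-one smoothing, and the sample text are all invented here, and a real service would more likely use a proper significance test over Capitol Words data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def improbable_words(foreground, background, min_count=2, top_n=10):
    """Rank foreground words by smoothed log-ratio of relative frequency vs. background."""
    fg = Counter(tokenize(foreground))
    bg = Counter(tokenize(background))
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab = len(set(fg) | set(bg))

    scores = {}
    for word, count in fg.items():
        if count < min_count:
            continue  # skip words too rare in the foreground to say anything about
        # Add-one smoothing so words absent from the background do not divide by zero.
        p_fg = (count + 1) / (fg_total + vocab)
        p_bg = (bg[word] + 1) / (bg_total + vocab)
        scores[word] = math.log(p_fg / p_bg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Invented sample text, not real Congressional Record data.
speeches_this_month = "biologics pathway for biologics and the health bill biologics"
speeches_baseline = "the health bill and the budget and the committee for the health bill"

for word, score in improbable_words(speeches_this_month, speeches_baseline):
    print(f"{word}\t{score:.2f}")
```

A positive score means the word is used more often in the date range than the baseline would suggest; common words such as "the" score at or below zero and drop out of the top of the list.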

Discussion

  1. Pete Skomoroch 11/16/2009 5:47 p.m. (permalink)

    Some good references on how to calculate this can be found at this link:

    http://alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html

  2. Pete Skomoroch 11/25/2009 9:23 p.m. (permalink)

    I'm planning on working on this at the Cloudera SF Hackathon on Sunday, Dec 13th.

    Start Time: Sunday, December 13, 2009 at 12:00pm
    End Time: Monday, December 14, 2009 at 12:00am

    Location:
    1409 Chapin Ave, Third Floor, Burlingame, CA

  3. Will Holcomb 12/02/2009 11:47 a.m. (permalink)

    Something like tf-idf (http://en.wikipedia.org/wiki/Tf%E2%80%93idf) could be useful as well, depending on the size of the dataset.

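
The tf-idf suggestion from the discussion could be sketched roughly as follows, treating everything one lawmaker said in a period as a single "document". This is a minimal pure-Python illustration, not a proposed implementation: the `tf_idf` helper, the placeholder lawmaker names, and the toy tokens are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps a lawmaker name to a list of tokens; returns per-lawmaker tf-idf scores."""
    n_docs = len(docs)
    # Document frequency: in how many lawmakers' texts each word appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            # Term frequency times inverse document frequency.
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return scores


# Invented toy data; names and tokens are placeholders.
docs = {
    "lawmaker_a": "biologics health bill biologics".split(),
    "lawmaker_b": "health bill budget".split(),
    "lawmaker_c": "health budget committee".split(),
}
scores = tf_idf(docs)
top = max(scores["lawmaker_a"], key=scores["lawmaker_a"].get)
print(top)
```

Words that every lawmaker uses get an inverse document frequency of zero, so only vocabulary that is distinctive to one lawmaker rises to the top.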

1818 N Street NW, Suite 300
Washington, DC 20036
202.742.1520