Open Data Day 2013
Here are the final contents of the Open Data Day DC 2013 event hackpad, originally at https://hackpad.com/OpenDataDay-2013-DC-XOyEe1yIUK2.
OpenDataDay 2013 DC
Paste your links here that we’ll show as you summarize what your group did during the day, e.g.: visualizations, github repos, screenshots of the work you did, etc.
My Group Name - http://www.example.org/your_link_to_something
OpenData DevOps - http://github.com/OpenDataDevOps/minus
- Created baseline CentOS 6.3 virtual machine and began building Puppet manifests
- Planning for 3 flavors eventually:
- Researchers (external to organization)
- Researchers/Analysts (within organization)
- VM to contain all tools necessary to get started immediately with open data analysis or development -- should be a minimal setup process (3 steps?)
- Also future plans to generate OpenData DevOps community to continue effort in this direction
- Use the instructions in the README file on Github (above) to try out our baseline image!
#OpenDCZoning: Density + Transit - map: http://bit.ly/13dCJhb (@andrewsalzberg + @b__k)
- google doc (simplified zoning sheet): http://bit.ly/132EoGG
DC Public School Data
- piles of data to clean and use: https://docs.google.com/spreadsheet/ccc?key=0AnaXKp9bt6OXdFY1Y0hiZ3hJNmdla3FQbmlSaGZxZkE&usp=sharing#gid=0
- simple enrollment graph: http://i.imgur.com/5qxNdhg.jpg
- mapped Accountability Categories: http://imgur.com/5G0blO7
- relationship between Accountability Categories and school neighborhood walkability: http://imgur.com/SnnApCa
- We learned a tremendous amount about the current state of school data, and the possibility of integrating data sources of eclectic sorts in a future tool for parents.
- participants: Harlan Harris, Sandra Moscoso, Tom Shen, Aaron Schumacher, Alex Cruz, Jami Ansell
Extraction of PACER legal citations
- Git repo: https://github.com/dvogel/pacer-recap-citations
- Downloads PDFs
- Converts PDFs to PGMs
- OCRs the PGMs
- Next steps:
- OCR tuning
- Extract legal citations using https://github.com/unitedstates/citation
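As a rough illustration of the citation-extraction step, here is a minimal Python sketch that pulls U.S. Code citations out of OCR'd text with a regular expression. It is only a stand-in: it does not use the unitedstates/citation library linked above, and the pattern and the function name `find_usc_citations` are assumptions made for this example. It only handles citations shaped like "18 U.S.C. § 1001".

```python
import re

# Assumed pattern: "<title> U.S.C. § <section>", tolerating missing periods
# and a missing section sign, since OCR often drops or mangles both.
USC_PATTERN = re.compile(
    r"\b(\d{1,2})\s+U\.?\s?S\.?\s?C\.?\s*(?:§+\s*)?(\d+[a-z]?)",
    re.IGNORECASE,
)


def find_usc_citations(text):
    """Return (title, section) pairs for each U.S. Code citation found."""
    return [(int(m.group(1)), m.group(2)) for m in USC_PATTERN.finditer(text)]
```

Real filings cite many other sources (case reporters, public laws, the CFR), so a pattern like this is probably only useful for checking OCR quality before switching to a proper citation parser.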
Open 211 / 990 : District Commons data pool - Open211DataCommons
- Cleaned and merged data from three different social service directories (2-1-1, Bread for the City, the BRIDGE Project).
- Merged this clean file with a database including Federal 990 data of every non-profit in DC (via Urban Institute)
- Geocoded the data!
- Next steps:
- stand file up in an Open Data Catalog (which can also host other local data sets) - 80% ready
- prepare tagsonomy-coding interface + instructions for service providers and other community members, so that we can organize the data
- then, import data into apps such as local wiki: www.districtcommons.org and Open211
- Establish cooperative data management plan
- To work on this project in the future, join : https://groups.google.com/group/districtcommons/subscribe
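For anyone picking this up, the clean-and-merge step might look roughly like the Python sketch below. It is not the code the group used: the record layout (dicts with a `name` field) and the helper names are assumptions for illustration, and matching on a normalized name is a deliberately simple rule that will miss organizations listed under different names.

```python
import re


def normalize_name(name):
    """Lowercase, strip punctuation, and collapse whitespace for matching."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def merge_directories(*directories):
    """Merge lists of record dicts, keyed on the normalized 'name' field.

    Earlier directories win when two sources disagree on a field; later
    sources only fill in fields that are still missing or empty.
    """
    merged = {}
    for directory in directories:
        for record in directory:
            key = normalize_name(record["name"])
            target = merged.setdefault(key, {})
            for field, value in record.items():
                if value and not target.get(field):
                    target[field] = value
    return list(merged.values())
```

Passing the most trusted directory first means its values are kept wherever sources conflict.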
Street Tree Species Mapping // DC
Millennium Challenge Corporation
IPython notebooks - Curriculum building using open data sets
Final Project link here: https://notebook-quiyq.picloud.com/b20db244-6e40-4962-bb9d-8a75daa60935 (pw: ju88al0)
Picloud notebook to visualize global earthquake data as a map. There were a few speedbumps with mpl toolkits not loading in the picloud basemap, but setting up an environment in linux helped with this. Ideally, we would be able to create a short suite of exercises in python for an introductory curriculum that allows for interactive programming on the same template. The example is a map, but ideally we could also create a time series of, perhaps, USDA crop data; a network graph (node/edge diagram) relating NGOs to the relevant data that they collect and provide; and a scraper in python to scrape data (maybe nutrition data from the USDA). Together these would make a full suite of short exercises under a food security or relevant climate conditions theme.
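A first exercise in that spirit could be as small as the sketch below, which filters an earthquake CSV down to the stronger events before any plotting happens. The column names (`latitude`, `longitude`, `mag`) and the function name are assumptions for illustration; check them against whatever feed the notebook actually loads.

```python
import csv
import io


def strong_quakes(csv_text, min_magnitude=5.0):
    """Return (latitude, longitude, magnitude) for quakes at or above a magnitude."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            point = (float(row["latitude"]), float(row["longitude"]), float(row["mag"]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed values
        if point[2] >= min_magnitude:
            points.append(point)
    return points
```

The resulting list of points can be handed to any plotting library, which keeps the data-cleaning part of the exercise separate from the map itself.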
Screencaps + random data (sorry) here: https://www.dropbox.com/sh/e9obo2lur4otgad/yWP2ZIBRmR
Proposal for new Open Data community on StackExchange
Here is the proposal I was talking about - http://area51.stackexchange.com/proposals/51674/open-data
We need folks interested in open data to sign up and follow the proposal, suggest questions that the community may address, and vote up/down questions that are already proposed.
This new community may serve as a neutral ground where open data geeks and data owners/experts can meet to talk about data and tools to better use the data that is already available on open data catalogs such as data.gov and data.gov.uk. It could also be a starting point to discover new datasets inside government and help liberate them for the public good.
this page: http://bit.ly/opendatadc
eventbrite page: http://opendataday2013dc.eventbrite.com/
hashtag: #opendataday #dc
organizers: Josh Tauberer (GovTrack.us @JoshData), Eric Mill (Sunlight Foundation @konklone), Katherine Townsend (USAID @DiploKat), Dmitry Kachaev (Presidential Innovation Fellow/Millennium Challenge Corporation @kachok), Sam Lee (@OpenNotion & The World Bank), and Julia Bezgacheva (The World Bank @ulkins). You can reach Josh at email@example.com.
event theme song: http://www.youtube.com/watch?v=4Fyomh-Y3fg
international info: http://opendataday.org/ and http://opendataday.org/map/
wifi: GUEST / password: nehokjiok7
Attendees, feel free to edit this section and add other logistical details that you want to share with others.
Here’s what we’re working on this weekend. Add your project once you get started. At the end of the day, add your name next to the project you worked on and add any links to project "outputs" so we all can find it again later.
Introductory Open Data Workshop -
- Intro to Open Data Workshop
- get your hands dirty with open data, without having to code. [Also see http://eaves.ca/2013/02/20/three-ways-anyone-can-contribute-to-open-data-day/]
- Measuring poverty using data - let’s find data sources to help us track poverty.
- Watching for corruption - money that’s getting lost, what data can we use to detect this and to root out corruption?
- If you like researching data, exploring it, finding out if it’s viable.
http://dc.opendataday.org/p/datakind - our hackpad
- DataKind - hackpad for existing dataset
Sam Lee, World Bank -
- Contract/Procurement Visualization Sprint (Visualizing.org) - http://www.visualizing.org/sprint/global-development-sprint
- Daniel Schuman - Identify Related Legislation
- Goal: Similar to the "Discover Bill Pedigree" project listed below, it would be helpful to identify similar legislation introduced by different authors in the same congress, or in different congresses. Any bills above a particular threshold of similarity (95%?) would be identified as virtually identical or related pieces of legislation.
- Pad Page: Not sure if it makes sense to create a new pad page or add to the Bill Pedigree. Leaving it blank for now -- feel free to create one.
- Why does this matter?
- Often times, a hearing will be held on a bill, but another bill with the same text will be the one that gets voted on. It’s important to be able to connect the hearing to the bill, as the mark-up hearing will explain the bill’s contents.
- Often times, multiple versions of the same bill will be introduced. This is often evidence of lobbyists at work or of people introducing "message" bills. Being able to identify when this happens would allow reporters and advocates to trace the influence of money in politics and to see when Members are just trying to send a message.
- Often times, a bill that is considered in one congress (the 113th) went through the entire process in an earlier congress (the 112th, for example). Unless you happen to know that, there’s no way you’d know there was a debate, markup, hearing, and discussion on the previous iteration. It’s helpful to connect the two so that context is easily discoverable.
- Happy to chat about this. @danielschuman. For more explanation, see: http://sunlightfoundation.com/blog/2010/07/29/apps-for-thomas-3-wishes/
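One way to put a number on "similarity" is to compare the sets of overlapping word n-grams ("shingles") in two bill texts using the Jaccard index. The Python sketch below is an illustration under that assumption, not something built at the event; the function names, the 5-word shingle size, and the all-pairs loop are choices made for the example.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=5):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def related_bills(bills, threshold=0.95, size=5):
    """Yield (id_a, id_b, score) for every pair of bills at or above threshold.

    `bills` maps a bill identifier to its full text. Comparing all pairs is
    quadratic, so this only suits small collections.
    """
    ids = sorted(bills)
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            score = jaccard_similarity(bills[id_a], bills[id_b], size)
            if score >= threshold:
                yield id_a, id_b, score
```

For a whole congress worth of bills, the pairwise loop would likely need to be replaced with something like MinHash or an inverted index over shingles.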
- Cato, Jim Harper - Discover Bill Pedigree
- Contact info: firstname.lastname@example.org & @jim_harper
- We’re marking up bills in Congress with semantically rich XML. (Take a look: http://deepbills.dancingmammoth.com/download) But many bills repeat the same text - sometimes bills are identical, sometimes one bill incorporates another. We need to find identity between texts in bills and copy our markup from one to the other.
- I (Chad Robinson) was helping out with this project. I am interested in continuing to participate on this project as time allows. @MrChadRobinson on twitter, mr dot chad dot robinson @ gmail.com.
- Devise an algorithm to easily discover when "pieces" of bills are lifted and reused in other bills, and which bills they came from.
- (And some kind of structured way of expressing "this paragraph in this bill is from this other paragraph in this other bill".)
- Finally find out what bill a paragraph in an Omnibus bill came from!
- I (Alexander Furnas) was helping on this part. I have some workable code, but need to throw it on a more powerful box. I am interested in continuing to work on the project. @AlexanderFurnas on twitter and email@example.com
- Hopefully, use this information to reuse semantic information we know about one of those pieces and apply it to the others.
- This "semantic information" is inline "CatoXML" markup which tags information like references to agencies, acts, U.S. Code and other legislation, and appropriation years and dollar amounts.
- Since right now this is laboriously human-tagged information and there is a surprising amount of duplication among bills (e.g. House bills reintroduced in the Senate, Omnibus Bills, etc), being able to "reuse" semantic information reliably would increase the amount of inline semantic data available in legislation and decrease the amount of time needed to produce it.
- (See also "Identify Related Legislation", listed above.)
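To make the "this paragraph in this bill is from this other paragraph in this other bill" idea concrete, here is a small Python sketch that matches each paragraph of one bill against paragraphs of candidate source bills using the standard library's `difflib`. It is not the code mentioned above; the record fields, the blank-line paragraph split, and the 0.9 cutoff are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def split_paragraphs(text):
    """Split bill text into non-empty paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def find_reused_paragraphs(target_id, target_text, sources, min_ratio=0.9):
    """Match each paragraph of a target bill to its closest source paragraph.

    `sources` maps bill identifiers to full texts. Returns a list of dicts,
    one per target paragraph whose best match scores at least `min_ratio`.
    """
    matches = []
    for t_index, t_para in enumerate(split_paragraphs(target_text)):
        best = None
        for source_id, source_text in sources.items():
            for s_index, s_para in enumerate(split_paragraphs(source_text)):
                # autojunk=False avoids difflib's heuristic that ignores
                # very frequent characters in long sequences.
                ratio = SequenceMatcher(None, t_para, s_para, autojunk=False).ratio()
                if ratio >= min_ratio and (best is None or ratio > best["ratio"]):
                    best = {
                        "bill": target_id,
                        "paragraph": t_index,
                        "source_bill": source_id,
                        "source_paragraph": s_index,
                        "ratio": ratio,
                    }
        if best is not None:
            matches.append(best)
    return matches
```

Each returned record could serve as the pointer for copying inline markup from the source paragraph to the matching one, though comparing every paragraph against every other is slow and would need a faster first pass on real bill collections.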
- Map Kathmandu on OpenStreetMap #mapktm
- We map Kathmandu, Nepal on OpenStreetMap as part of a broader Open Data for Resilience Initiative
- Announcement post http://mapbox.com/blog/open-data-day-mapping-kathmandu/
- Instructions http://wiki.openstreetmap.org/wiki/Open_Data_Day_2013_Kathmandu_Mapping
- Look for @crysb, @lxbarth, @nas_smith or @ian_villeda for an introduction
- Participants, put your info here: Andrew Wiseman (USAID), Amy Noreuil (USAID), Chad Blevins (USAID), Holly Krambeck (World Bank), Gaurav Tiwari (@gnstiwari, Syracuse University).
- Colored areas are today’s edits in Kathmandu on OpenStreetMap by team Nepal and team DC
Greg Elin, FCC -
- Provisioning computers to do the work is a huge problem. Let’s learn how to take devops practices and package data together with virtual machines with vagrant, and do it.
DemocracyMap - http://api.democracymap.org/#get-involved
Phil Ashlock - @philipashlock
- Creating a map of representation at every level of government. The biggest challenge is at the local level, where the variety of representation is intense. We’ll be scraping, identifying, and collecting data.
Open 211 / Local Data Commons - Greg Bloom
- Background in this brief memo. What health+social services are available to people in need? DC doesn’t have a master source of information about this. We’re going to take old 2-1-1 data, new federal government data, and local agencies’ own directories, clean and merge them, and stand them up for apps like Open211.
- Consolidate and clean data
- Complete a CSV -> API importer (Matt Bailey (felbot) says we’re "80%" on this)
- Create an API for the data
- Make a simple interface to enable laypeople to categorize data
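The first half of a CSV -> API importer is turning rows into clean records that an API could serve as JSON. The Python sketch below shows one way to do that with only the standard library; it is not the importer mentioned above, and the function names and the choice to drop empty fields are assumptions for illustration.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, dropping empty values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Skip unnamed overflow columns (key None) and blank or missing cells.
        record = {k.strip(): v.strip() for k, v in row.items() if k and v and v.strip()}
        if record:
            records.append(record)
    return records


def records_to_json(records):
    """Serialize records as the JSON body an API endpoint could return."""
    return json.dumps(records, indent=2, sort_keys=True)
```

Dropping empty fields keeps the JSON compact, at the cost that clients cannot tell "blank in the source" from "column not present".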
Symetri Health - Jon Glassman & Sumit Sood
- Working with immense health datasets to predict someone’s chances of being diagnosed with cancer. Interfaces, data, visualizations.
- It was great to meet everyone! Please reach out to us: firstname.lastname@example.org & email@example.com
Girl Development -
- Teaches girls how to code. Would like to make some IPython notebooks using PiCloud, to help people program with Python without having to open a terminal. Wolfram Alpha-style visualizations of data.
Thomas Levine (thomaslevine.com)
- Pick your brains about federal buildings in DC - I have a study I want to run about them, and need to find a great candidate.
MCC Data Effort
Chantale Wong - Millennium Challenge Corporation -
- Visualizing performance data on our projects, our $9B investments over 20 countries. How do we visualize this?
- Here is data:
APIs based on above data:
Application based on KPI data:
Andrew from DC! -
- Open DC Zoning Data (@andrewsalzberg)
- If you’re interested in: planning, land use, the future of your city - help open up and visualize DC’s (and the DC metro area’s) zoning code:
- see description of the project here: http://andrewsalzberg.wordpress.com/2013/02/22/lets-build-an-open-zoning-data-standard/
- First step: create simple zoning key based on http://secondcityzoning.org/zones . Google doc for this is starting up here: https://docs.google.com/spreadsheet/ccc?key=0AkOnFG7c1TBDdFlielp6TElNd0FuRWpjYVBGYmRLZmc#gid=0
- Next step: make visualizations of land use/density in DC along these lines: http://tiles.mapbox.com/asalzberg/map/Chicago_Zoning_Transit#13.00/41.8872/-87.6354
- For bonus points: Actually adopt the entire Chicago Second City Zoning project: http://secondcityzoning.org/about#code on github. [This is way beyond my (@andrewsalzberg) abilities - help welcome!]
- Civic Commons Wiki entry here: http://wiki.civiccommons.org/Land_Use_%26_Zoning
Justin Grimes - Visualizing DC budget data!
Dmitry Kachaev - Make a better Data.gov today.
Let’s create a better platform for data.gov (e.g. API) that we can use. Check out alpha.data.gov for inspiration.
- Finding vacant and blighted properties in DC and putting them to use. Especially affects the poor. Integrating a number of datasets and analyzing them to discover evidence of fraud that will help the city prioritize resources. Scrape, sort, expose data - integrate it with mapping software (GIS).
Harlan Harris - Code for DC -
- DC School Decisions
- An ongoing project of Code for DC: http://www.codefordc.org/. The aim is to do something interesting and valuable for DC parents and students, helping them better navigate school lotteries and decisions. We are collecting data about schools, and how decisions are made about schools and lotteries, with the goal of aggregating open data, collecting new data, and eventually/hopefully building decision-support tools.
- Data ideas spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0AnaXKp9bt6OXdFY1Y0hiZ3hJNmdla3FQbmlSaGZxZkE&usp=sharing
- Contacts: Sandra Moscoso-Mills (@sandramoscoso), Harlan Harris (@harlanh)
- Team: Harlan Harris (@harlanh), Sandra Moscoso-Mills (@sandramoscoso), Aaron Schumacher (@planarrowspace), Jami Ansell, Tom Shen, Alex Cruz (Computer Science student from Puerto Rico by way of internship in Baltimore at DoD)
- Need: data scraping & cleaning, database setup, data visualization. Thanks!!!
- End result for project: rich dataset about individual schools
Drew Vogel -
- Parse dates and legal citations from RECAP documents, creating a database useful for investigating the claim that federal prosecutors have been relying on increasing complexity to secure convictions/pleas.
- Background: http://www.forbes.com/sites/harveysilverglate/2013/01/03/black-whitey-how-the-feds-disable-criminal-defense/
- Project needs:
- Write a scraper for RECAP PDFs (or find out how to download all of them from archive.org) http://archive.org/details/usfederalcourts
- Process PDFs through OCR tool like Tesseract -- https://code.google.com/p/tesseract-ocr/
- Use citation parser -- https://github.com/unitedstates/citation
- Write/find a suitable date parser
- Possible follow-on: bubble motion diagram with # of citations on the x-axis and the duration of the case on the y-axis. (e.g. http://bost.ocks.org/mike/nations/)
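For the date parser item, a small regex-plus-`strptime` approach may be enough to get started. The Python sketch below recognizes just two formats ("January 3, 2013" and "01/03/2013"); the formats, the names, and the assumption of English month names (which `%B` only matches under an English or C locale) are choices for this example, not anything decided at the event.

```python
import re
from datetime import datetime

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Each entry pairs a regex that finds candidate text with the strptime
# format used to validate and parse it.
DATE_PATTERNS = [
    (re.compile(r"\b(?:%s) \d{1,2}, \d{4}\b" % MONTHS), "%B %d, %Y"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
]


def find_dates(text):
    """Return the dates found in text as datetime.date objects, in order."""
    found = []
    for pattern, date_format in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(0), date_format).date()
            except ValueError:
                continue  # e.g. "02/30/2013" looks like a date but is not one
            found.append((match.start(), parsed))
    return [parsed for _, parsed in sorted(found)]
```

OCR output will likely contain misread digits and broken month names, so anything this misses should be logged for review rather than silently dropped.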
Measuring Voting Behavior
- I’m interested in a better way to rank lawmakers by voting behavior, moving beyond the simplistic linear designs of National Journal or CQ. I’m thinking social network analysis. I’m in a red Phillies hat near the front, or firstname.lastname@example.org
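One possible starting point for a network view is an edge list where each pair of lawmakers is weighted by how often they vote the same way. The Python sketch below is only an illustration of that idea; the input layout (lawmaker -> roll call -> vote) and the function name are assumptions, and agreement rate is just one of many possible edge weights.

```python
from itertools import combinations


def agreement_edges(votes, min_shared=1):
    """Build weighted edges between lawmakers from roll-call votes.

    `votes` maps lawmaker -> {roll_call_id: vote}. Each edge weight is the
    share of roll calls both members voted on where they voted the same way.
    Pairs with fewer than `min_shared` votes in common are left out.
    """
    edges = {}
    for a, b in combinations(sorted(votes), 2):
        shared = set(votes[a]) & set(votes[b])
        if len(shared) < min_shared:
            continue
        agreed = sum(1 for roll_call in shared if votes[a][roll_call] == votes[b][roll_call])
        edges[(a, b)] = agreed / len(shared)
    return edges
```

The resulting dictionary can be loaded into a graph library for clustering or centrality measures, which is where the ranking question would actually get answered.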
Rob Baker, Ushahidi (email@example.com) -
OpenTreeMap: Let’s create an open map of DC’s trees, editable by anyone (a la OpenStreetMap), that also calculates the positive environmental and financial impact they provide, through the FOSS OpenTreeMap platform.
- Installing a new instance of OpenTreeMap for DC tree data on an EC2 micro
- OTM on github: https://github.com/azavea/OpenTreeMap/
- More info about OTM: http://www.azavea.com/products/opentreemap/
- Importing available data
- Possible data sources:
- data.dc.gov, http://data.dc.gov/Metadata.aspx?id=540 (link to download KML and SHP files at the bottom) (edit: apparently we need written permission to use #notquiteopendataday)
- CaseyTrees.org: Have done inspiring, extensive work on this over the past decade
- DCGrove.org: Current initiative for planting new trees
- Possibly help in installing all of the required dependencies and patches (see install instructions at github)
- Contacting those other organizations about this project and sharing data and ideas.
- Research into importing SHP files to OpenTreeMap
Recently the World Bank conducted a global social media outreach initiative called "What Will It Take". The objective was to get opinions from people around the world on what they think it will take to end poverty. I have a dataset of 20,000 tweets (in multiple languages), tagged by sector and country. Is anyone interested in exploring this data with me?
StackExchange Q&A Site Proposals:
These are really easy, and everyone can do it. There are two relevant proposals to stand up StackExchange Q&A sites, one for all things Washington DC, and another for Open Data. We’re in the process of defining the proposed sites, and we need your help getting the sites to Beta. So, if you want to take a 5 minute break today:
- Go to:
- http://area51.stackexchange.com/proposals/51247/washington-d-c/ (for Washington DC)
- http://area51.stackexchange.com/proposals/51674/open-data (for Open Data)
- Log in or sign up for an account
- "Follow" the proposals
- Add interesting questions that you’ve always wanted to know about DC and Open Data (separately)
- Most importantly (and easiest) -- vote on your favorite questions!
That’s it! Thanks!
Here are some ideas we know participants are interested in:
Raise awareness for DC’s Advisory Neighborhood Commissions system.
Open 211: Community resource data commons. Consolidating data about all District of Columbia’s non-profit services (from federal and local gov, and non-profit agencies) -- background in this brief memo.
consolidate and clean data
stand up in ’open data catalog’ (here?)
consider different taxonomies (one, two) and possibly prototype a ’crosswalk’
Complete a CSV -> API importer (?)
Visualizing DC’s blighted property and/or zoning data.
Visualizing government spending data.
Transportation data: Bikeshare trips <http://greatergreaterwashington.org/post/14048/bicycling-is-the-fastest-way-to-travel-in-downtown-dc/> (Matt Caywood), your personal WMATA ride history, etc.
Compare hospital costs using Medicare data.
Build a Pre-Fab Data Scraping/Cleaning Workstation (@auremoser).
Designing a Better ____. What would your ideal government data portal do? Use Design Thinking principles to explore how the portal serves its users. The Bootcamp Bootleg at http://dschool.stanford.edu/use-our-methods/ is a great resource to start with. Try redesigning Data.gov or Data.dc.gov (compare to http://data.gov.uk/ or http://hub.healthdata.gov/), http://www.whitehouse.gov/open, https://petitions.whitehouse.gov/ (compare to http://epetitions.direct.gov.uk/).
Here are some more ideas for inspiration:
A GTFS Archiver (so we can get historical data for analysis).
DC Chapter of Code for America Brigade.
Redefining #OpenData for 2013.
Recommendations for the DC Government’s #OpenData.
Write a Knight News Challenge proposal.
Deploy The State Decoded for the laws in your home state.
Open Data Census. http://okfn.org/events/open-data-day-2013/census/
TransparencyCamp! May 4-5 (An open gov unconference brought to you by the Sunlight Foundation and the George Washington University School of Media and Public Policy.) Learn more about TCamp at http://transparencycamp.org/ Register before March and save $10. Registration: https://tcamp13.eventbrite.com/ Thanks!
If you post photos on twitter or flickr, put links here!