Weekend Wrap-up 2014

Fill in what you accomplished at Open Data Day 2014.

Project Name

Team Members: X, Y, Z, AA, BB, CC ....

Link: URL to a project website, github repository, etc.

What did you accomplish? We did a data analysis of XYZ and found out ABC. The results will influence DEF.

Paste at least one image --- it can be a screenshot --- showing what you worked on. It might be a chart, a map, your code, etc.

Add the image(s) to the Tumblr: http://opendatadaydc.tumblr.com/


Mapping Scanned Documents (Literally)

Team Members: 


What did you accomplish? 


Team Members

Steven C. Burgess

Hayley Brown

Claude Concolato

Joshua K. Farrar

Ben Klemens

Christine Zhang




What did we accomplish?

The goal was to understand the scope and potential of the data for addressing the following issue:

As a starting point, we compiled data from the Demographic and Health Surveys (DHS) as well as the AIDS Indicator Surveys (AIS) for Tanzania.  The AIS surveys exist for 2007-2008 and for 2011-2012.  The DHS surveys exist for 2004-2005 and for 2010.  

We used the AIS surveys to compare HIV prevalence among households for the two periods, as the drug raids in Tanzania began in late 2008 and continued until 2010.  

We were able to establish 2007-2008 as the "before raid" period and 2011-2012 as the "after raid" period.  Because the raids were concentrated in the capital city of Dodoma, we disaggregated the data by region, using cities of comparable size as counterfactuals.  Specifically, we chose the cities of Arusha and Mwanza.

HIV prevalence fell slightly in Dodoma between the two periods (-0.74%). Over the same interval it fell in Mwanza (-2.81%) and rose in Arusha (+3.75%).  However, the small sample size for certain regions presents an issue:
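One way to read these three figures is as a simple difference-in-differences: the change in Dodoma minus the change in a comparison city. The sketch below only rearranges the numbers reported above. The function name and the idea of averaging the two comparison cities are assumptions made for illustration, not necessarily the calculation the team ran.

```python
# Changes in HIV prevalence between the 2007-2008 and 2011-2012 AIS rounds,
# exactly as reported in the text above.
changes = {"Dodoma": -0.74, "Mwanza": -2.81, "Arusha": 3.75}


def diff_in_diff(treated, comparisons, changes):
    """Change in the treated city minus the mean change in the comparison cities."""
    baseline = sum(changes[city] for city in comparisons) / len(comparisons)
    return changes[treated] - baseline


vs_mwanza = diff_in_diff("Dodoma", ["Mwanza"], changes)
vs_arusha = diff_in_diff("Dodoma", ["Arusha"], changes)
vs_both = diff_in_diff("Dodoma", ["Mwanza", "Arusha"], changes)

print(f"Dodoma vs Mwanza: {vs_mwanza:+.2f}")
print(f"Dodoma vs Arusha: {vs_arusha:+.2f}")
print(f"Dodoma vs both:   {vs_both:+.2f}")
```

The sign of the estimate flips depending on which comparison city is used, which is consistent with the caution about small samples.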

Overall, we gained a better understanding of the different datasets and were able to disaggregate the data on the household, individual, and regional levels.  Although we may not be able to make causal statements about the impacts of the raids per se, it is extremely useful to have this disaggregated data so that we may understand how better to target preventative enforcement efforts related to the fake pharmaceutical industry in countries such as Tanzania.

See the slide show for further details and next steps: http://klemens.org/asst/bust.pdf

Philippines Road Safety and Transit Data Hack

Team Members:

Holly Krambeck (hkrambeck at worldbank dot org) 

Thore Fechner (t.fechner at uni-muenster dot de)

Dave Johnson (davj56 at gmail)

Sara Thurman (srthurman at gmail)

Aaron Dibner-Dunlap (aaron dot dibnerdunlap at gmail)

Travis Korte (tkorte at datainnovation dot org)

Li Qu (lqu at worldbank dot org)

Carlos Morales (carlos at dbsinc dot us)




(1) Using the new Philippine national road accident database, Thore and David developed a demonstration road accident visualization platform that can be used by traffic management agencies to optimize traffic enforcer assignments and to more accurately target investments for improving road safety.  This demonstration will be shared with the Cebu and Manila traffic management agencies, and feedback will be used to help design the next phase of the program. 

(2) Using the Philippine Transit Information Service (GTFS database), Sara and Aaron  began to tackle the legacy left by a transit planning agency that continuously added routes to a system without the benefit of a complete system route map. They created a visualization of the corridors with the most route redundancies and learned that some corridors have as many as 62 overlapping route assignments. Aaron identified which sets of routes have the most significant overlap, to further support reduction of route redundancies. These analyses will be shared with the Department of Transport and Communications for use in their on-going jeepney route rationalization plan. 
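A rough sketch of how route redundancy could be counted is shown here. It assumes each route has already been reduced to an ordered list of stop IDs (in GTFS this would come from joining trips.txt and stop_times.txt). The toy routes, the function names, and the use of Jaccard similarity as the overlap measure are all illustrative assumptions, not the team's actual method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: route_id -> ordered stop_ids of one representative trip.
routes = {
    "R1": ["A", "B", "C", "D"],
    "R2": ["B", "C", "D", "E"],
    "R3": ["X", "B", "C"],
}


def segments(stops):
    """Consecutive stop pairs along a route, ignoring direction of travel."""
    return {frozenset(pair) for pair in zip(stops, stops[1:])}


def routes_per_segment(routes):
    """Map each segment to the set of routes that run along it."""
    serving = defaultdict(set)
    for route_id, stops in routes.items():
        for seg in segments(stops):
            serving[seg].add(route_id)
    return serving


def pairwise_overlap(routes):
    """Jaccard similarity of the segment sets for every pair of routes."""
    segs = {route_id: segments(stops) for route_id, stops in routes.items()}
    result = {}
    for a, b in combinations(sorted(segs), 2):
        union = segs[a] | segs[b]
        result[(a, b)] = len(segs[a] & segs[b]) / len(union) if union else 0.0
    return result


seg_counts = routes_per_segment(routes)
busiest = max(seg_counts, key=lambda seg: len(seg_counts[seg]))
overlaps = pairwise_overlap(routes)

print(sorted(busiest), len(seg_counts[busiest]))
print(overlaps)
```

Counting routes per segment identifies the most redundant corridors, while the pairwise score points at specific route pairs that could be merged.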

(3) Outside of certain main corridors, there are no formal stations or stops for buses or jeepneys in Metro Manila -- these vehicles are flagged at the side of the road by passengers, resulting in accident-prone traffic conditions. Drawing upon both road accident and transit databases, Sara and Aaron worked together to create a visualization of the relationship between transit route/stop density and accidents in Metro Manila. Where there is a strong relationship, the World Bank will propose to the Department of Highways and Public Works, the Department of Transport and Communications, and the Metro Manila Development Authority to investigate these hot spots further as candidates for constructing formal bus/jeepney stations. 

(4) While knowing the total number of accidents in an intersection is very valuable, normalizing the number of accidents against traffic volumes also provides valuable information for targeting scarce resources towards making road segments and intersections safer. Travis used road safety and taxi travel speed data from Cebu City and developed a visualization using R and QGIS to show areas with the highest likelihood of accidents (i.e., most dangerous), normalized by traffic volume.
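The normalization idea can be illustrated with a few lines of Python (the team's own work used R and QGIS). The segment records, the `volume` field, and the per-1,000-vehicles scaling below are hypothetical; how traffic volume was actually derived from the taxi data is not described here.

```python
# Hypothetical road segments with accident counts and traffic volumes.
road_segments = [
    {"segment": "S1", "accidents": 40, "volume": 20000},
    {"segment": "S2", "accidents": 10, "volume": 2000},
    {"segment": "S3", "accidents": 5, "volume": 0},  # no volume data
]


def accident_rate(accidents, volume, per=1000):
    """Accidents per `per` vehicles; None when no volume is available."""
    if volume <= 0:
        return None
    return accidents / volume * per


# Attach a rate to every segment, then rank those that have one.
rated = [
    {**seg, "rate": accident_rate(seg["accidents"], seg["volume"])}
    for seg in road_segments
]
ranked = sorted(
    (seg for seg in rated if seg["rate"] is not None),
    key=lambda seg: seg["rate"],
    reverse=True,
)

for seg in ranked:
    print(seg["segment"], round(seg["rate"], 2))
```

In this toy data S1 has four times as many accidents as S2 but ranks lower once volume is taken into account, which is the effect the normalization is meant to expose.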


Manila Road Accident Visualization (I)

Manila Road Accident Visualization (II)

Road segments that have relatively high accident rates for the amount of traffic that passes through them.

Relationship between route density (by stop) and high concentrations of road accidents in Metro Manila

Preliminary visualization of route overlap - correlations in stops in bus routes. More refinement needed, but this is a first pass (red is high overlap):

Corruption: Is There a Clue in the Data?

Can we identify patterns in procurement/contract award data from multiple sources?  Will these patterns be red flags for corruption?

Team: Betsy Wiramidjaja, Francis Gagnon, Alex Habershon, Sutirtha Roy, Chris Pease, Joan-Josep (Pep) Vallbé, SooJin Choi, Emily Rose McRae, Austin Ngo, Regina Lam,  Sayan Chaki (sayan.chaki@gmail.com), Jeremy Shankle (jpshankle@gmail.com), Katherine Mereand-Sinha 

Link: https://docs.google.com/spreadsheet/ccc?key=0Arwn_EjAjOu5dGw2WDdZVElZcUJGbDFwcnVzWXp6Unc#gid=0



The team has created a spreadsheet of countries and begun to identify national procurement websites and the detailed contract award data contained in them.  

The sites and their data have been ranked according to a scale devised by the team. It includes World Bank lending in the last five years, the Corruption Perception Index (Transparency International), and a homemade scale to measure the ease of use and usefulness of the data.

Data from the procurement sites has been ingested by xxx and yyy. It is being combined with data from the World Bank and other development banks.

World Bank API: https://finances.worldbank.org/resource/kdui-wcs3.json
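A minimal sketch for pulling a few rows from this endpoint follows. It assumes the endpoint accepts Socrata-style `$limit` and `$offset` query parameters, as the `/resource/<id>.json` URL pattern suggests; the helper names are made up for illustration, and the field names in the response are deliberately not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://finances.worldbank.org/resource/kdui-wcs3.json"


def build_url(limit=5, offset=0):
    """Build a paged request URL (assumes Socrata-style paging parameters)."""
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"{BASE_URL}?{query}"


def fetch_rows(limit=5, offset=0, timeout=30):
    """Download one page of rows as a list of dicts (requires network access)."""
    with urlopen(build_url(limit, offset), timeout=timeout) as response:
        return json.load(response)


def field_names(rows):
    """Collect every key that appears in any row, to discover the schema."""
    names = set()
    for row in rows:
        names.update(row)
    return sorted(names)
```

Calling `field_names(fetch_rows())` is a quick way to see which columns are available before deciding how to join this data with the procurement sites.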

Interface Sample

Russia Clear Spending API: https://www.mashape.com/infoculture/clearspending-ru-as-russian-government-spending

API Sample Result:





DC Municipal Regulations Conversion

The Municipal Regulations of D.C. should be browsable in the same formats, and cross-references should be easily identifiable and navigable. The D.C. Regulations are just as important as the law itself when determining what rules citizens must live by.  Right now it is very difficult to search these regulations or to identify and follow code references, and there is no API or even a bulk download for developers to use.  Current site:  http://dcregs.dc.gov/Search/DCMRSearchByTitle.aspx

Team Members:  Chris Birk, Bill Hunt, Leili Slutz, Matt Steinberg, Keith Porcaro

Link: http://regulations.dev.dcdecoded.org/



What did you accomplish?  We were able to get most (~98%) of the regulations imported to some degree.  Problems with file naming and the usual government data issues made importing difficult.  

DC Education Data (One of many)

Hackpad: DC Education- School Chooser and More


Open Data Directory

Team Members:  Clare Zimmerman, Louis Fettet, Jessica Carsten, Marcus Louie, Ally Palanzi, Beth Shankle, Peter Dudka

Link: Open Data Directory, Front-end Repo -- GH Page, Cron Job Repo

Accomplishments:  We compiled a listing of over 90 data.json files with information about the data in those catalogs. We successfully deployed a Solr instance and are working to get data into it. We also designed and implemented the front end. We will be continuing this work at the next Code for DC meeting!
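As an illustration of what "information about the data in those catalogs" could look like, the sketch below summarizes a single data.json document. It assumes the Project Open Data layout, where the file is either a bare list of dataset records or an object with a `dataset` list, and where `keyword` may be a list or a comma-separated string. The sample catalog and function names are invented for the example and are not taken from the team's repositories.

```python
import json
from collections import Counter


def datasets_in(catalog):
    """Return the dataset records from either catalog layout."""
    if isinstance(catalog, list):
        return catalog
    if isinstance(catalog, dict):
        return catalog.get("dataset", [])
    return []


def keywords_of(dataset):
    """Normalize the keyword field to a list of stripped strings."""
    raw = dataset.get("keyword", [])
    if isinstance(raw, str):
        raw = raw.split(",")
    return [kw.strip() for kw in raw if kw.strip()]


def summarize(catalog_text):
    """Count datasets and tally keywords for one data.json document."""
    datasets = datasets_in(json.loads(catalog_text))
    keywords = Counter()
    for dataset in datasets:
        keywords.update(keywords_of(dataset))
    return {
        "count": len(datasets),
        "titles": [d.get("title", "(untitled)") for d in datasets],
        "keywords": keywords,
    }


# Hypothetical two-record catalog used only to exercise the functions.
sample = json.dumps({
    "dataset": [
        {"title": "Street Trees", "keyword": ["trees", "environment"]},
        {"title": "Bus Stops", "keyword": "transit, environment"},
    ]
})
summary = summarize(sample)
print(summary["count"], summary["keywords"].most_common(1))
```

Records summarized this way could then be posted to the search index as flat documents.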

DC Tree Map

Team Members: Rob Baker (@rrbaker, rrbaker at gmail dot com)

Link: http://cdb.io/1hed90I

What did you accomplish?

Following up from the 2013 OpenDataDay and some great visualizations created by Gena Wirth to view DC tree data, I’ve taken the open street tree data shapefile from the DC Data Catalog, cleaned it up a bit, and brought it into CartoDB using MapBox tiles made specifically for the district. Next steps: incorporate the great work of CaseyTrees into the Azavea OpenTreeMap2 platform, not only to visualize Urban Forestry Administration (UFA) data but also to show more recent tree plantings and calculate the economic and environmental benefits.

Watch this space for future developments: http://dctreemap.org.

Practical Open Data for Nepal: Data-Driven Stories & Visualizations

We found shape files of 75 districts in Nepal that weren’t easily available. We’ve normalized and visualized them. They are now publicly available on Code for Nepal.

We analyzed School Performance data in Nepal and built a prototype of an app to rank schools based on the performance. It will help parents and students find good schools in different areas: http://demo.skylinenet.net/nepal/ 

We scraped some datasets from Nepal Census 2011 and DHS. 

We wrote a blog post highlighting the number of people who are illiterate. 

We visualized the percent of female candidates that got elected per district in year 2064.

We clustered more than 5000 schools by subject areas. 

And we are launching Code for Nepal, practical data for Nepal: www.codefornepal.org

Team members: 

Jacob Menajovsky, jjmenajosky at gmail dot com

Hatim Waholani, waholanihatim at gmail dot com

Michael Branan, mbranan at gmail dot com

Seth Miller, sethmiller at gmail dot com

Sam Lee, sammyslee at gmail dot com

Amanda Makulee, amandakulec at jsi dot com

Ravi Kumar, kumarav4 at gmail dot com 

Phil Brondyke, pbrondyke at ndi dot org



Team: Ashish Sinha, Alex Lyte

1) Pitch

On Tuesday, March 4th, the President will unveil his Administration’s budget request - reflecting an estimated $3.777 trillion in spending priorities for 2015. With publicly available tax data, we can build a database and online interface that allows anyone to deconstruct federal budget programs into geo-specific data (zip code, county, state, and congressional districts). By making it easier to manipulate budget data to be more proximate and relevant to audiences, messaging to the public can be more effective in articulating the spending priorities and trade-offs that are being made by their elected representatives. 

We’d like a web application that provides context to large federal expenditures. This app might provide a form for users to enter an expenditure amount and a zip code (or county), and return the percentage (and actual amount) of that zip code’s tax revenue that would be consumed by that expenditure. This would give users an idea of what a federal expenditure means to them or to their county. 

2) Problems

Tools to manipulate budget data into different forms already exist; it’s not novel (See Washington Post, National Priorities Project, and Brookings for examples). 

The needs that remain unmet are:

1. Existing tools do not allow users to input their own budget numbers. Typically, the tools only allow you to manipulate a curated list of programs or entire departments. This limits the ability of reporters, civil society, and activists to tackle specific federal programs/activities that are not already part of an existing tool. 

2. Existing tools (for the most part) are not open for others to use and customize.

3. Although the data is available publicly, the hurdle for an organization or activist to make sense of the .csv files, to create and maintain a database, and to create an interface is high enough to deter most from attempting to do this on their own. 

Hypothetical Use Case - Two local firehouses and a police station are being shut down in Cheshire County due to lower local tax revenues. Representative Strangelove is not supporting increased stimulus funding because of concerns over the growing deficit, but he is pushing for a new $10 billion expenditure, over the next 10 years, which would upgrade nuclear weapons. Using this tool, a reporter/lobbyist/activist will be able to tell people in Cheshire County that Rep. Strangelove is pushing for them to spend $2.3 million each year to make our nukes even more deadly at the expense of fire safety and preventing crime. 
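The arithmetic behind a figure like the one in this use case can be sketched as a proportional allocation: a locality's share of an expenditure equals its share of national tax revenue. The allocation rule, the function name, and every revenue number below are assumptions for illustration; the revenue totals were simply picked so the result matches the $2.3 million in the example and are not real Statistics of Income figures.

```python
def local_share(expenditure, local_tax, national_tax):
    """Allocate an expenditure to a locality in proportion to its tax revenue.

    Returns (local_amount, percent_of_local_revenue).
    """
    if national_tax <= 0:
        raise ValueError("national_tax must be positive")
    local_amount = expenditure * local_tax / national_tax
    percent = 100 * local_amount / local_tax if local_tax else 0.0
    return local_amount, percent


# $10 billion spread evenly over 10 years, as in the use case above.
annual_expenditure = 10_000_000_000 / 10

# Hypothetical revenue totals chosen to reproduce the example's figure.
county_tax = 2_300_000_000
national_tax = 1_000_000_000_000

amount, percent = local_share(annual_expenditure, county_tax, national_tax)
print(f"${amount:,.0f} per year, {percent:.2f}% of county tax revenue")
```

Under this rule the percentage is the same for every locality (expenditure divided by national revenue), so the dollar amount is the figure that varies by zip code or county.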

3) Resources: What resources (people, skills, data, tech, etc) do you have?

Ashish Sinha - Stakeholder/Project Coordinator

Data Available (.csv files):

Statistics of Income - Tax Stats - State Data

Statistics of Income - Tax Stats - County Data

Statistics of Income - Tax Stats - Zip Code Data

Census Boundary Geodata - .shp files for state, county, congressional district (still looking for zip code geodata - maybe OpenStreetMap?)

4) Needs

- Tax data broken out by Congressional District

- Other stakeholders, subject matter experts, and technical leads welcome!

5) Questions

- Do we need this tool? Are there existing tools that do a good enough job?

- Where do we get the data and can we understand the data?

- Where do we store the data?  

- How do we create an easy to use interface for users? Can we make an interface that is customizable by someone without a programming background?

- While there is no specific timeline for this project yet, how much work would be involved to make this happen?

The Workshops

Team: Eric Mill, Leah Bannon, Shannon Turner, Max Richman

We taught the skillz! See these other hackpads: Workshop: Open Data | Workshop: Open Collaboration | Workshop: Intro to Python

Workshop Attendee Notes

A Map of Human Conflict and Violence, showing Development Interventions


Jiro Tominaga (jtominaga at worldbank dot org); Ken Chomitz (kchomitz at worldbank dot org); Alex McKenzie (amckenzie at worldbank dot org), Claude Concolato (claude dot mercury at gmail dot com), Neal Sidhwaney (nealsid at gmail dot com), Maria Kail (mkail at worldbank dot org), Jes Skillman (jes dot skillman at gmail dot com)


We tried to address the question of how best to use maps to present conflict information and relate it to development interventions. The aspiration is to use geo-referenced data to show how conflict may affect or disrupt development actions, or how interventions are targeting conflict events. In addition to the spatial dimension, we need to look at the temporal dimension.

To constrain this effort, we decided to focus on one country: Nigeria, and use data from the Uppsala Conflict Data Program, and the list of geo-referenced points from related World Bank interventions/projects. Initial explorations were done with Tableau and GoogleMaps. 

We did some data exploration using Tableau, which was helpful to identify relevant variables, and issues/problems with data.

Neal Sidhwaney prepared an interesting GoogleMaps mashup prototype, showing available World Bank assistance projects as purple circles, along with markers that show conflicts from 1990 - 2010.

We also prepared a density plot to represent conflict information about Nigeria; shading represents the number of fatalities.

Ideas to follow up

- heat map of violence, weighted by the best estimate of fatalities [best_est]

- slider that shows years of violence [year]

- color can show the type of violence (1 = state, 2 = non-state, 3 = one-sided) [type_of_violence]

- pop-up label:

   - conflict_name

   - best_est

   - start and end date 

- slider that shows progression over time (only shows 2005 - 2007), 3 year running window 

- Different maps for each sector (easy to read) - drop down selection filter

- 20 km buffer around violence zones
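Two of the ideas above, the 3-year running window and the 20 km buffer, can be prototyped without any GIS tooling. The sketch below uses the `year` and `best_est` fields named in the list; the `latitude` and `longitude` keys, the project records, and all sample values are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def events_in_window(events, end_year, window=3):
    """Events whose year falls in the `window` years ending at end_year."""
    start_year = end_year - window + 1
    return [e for e in events if start_year <= e["year"] <= end_year]


def projects_near_violence(projects, events, buffer_km=20):
    """Projects lying within buffer_km of at least one conflict event."""
    return [
        p for p in projects
        if any(
            haversine_km(p["latitude"], p["longitude"], e["latitude"], e["longitude"]) <= buffer_km
            for e in events
        )
    ]


# Hypothetical sample records.
events = [
    {"year": 2003, "best_est": 12, "latitude": 10.0, "longitude": 7.0},
    {"year": 2006, "best_est": 30, "latitude": 9.0, "longitude": 7.0},
]
projects = [
    {"name": "P1", "latitude": 9.1, "longitude": 7.0},   # roughly 11 km from the 2006 event
    {"name": "P2", "latitude": 10.0, "longitude": 7.0},  # roughly 111 km from the 2006 event
]

window_events = events_in_window(events, end_year=2007)  # covers 2005 - 2007
fatalities = sum(e["best_est"] for e in window_events)
nearby = [p["name"] for p in projects_near_violence(projects, window_events)]
print(fatalities, nearby)
```

Moving `end_year` one step at a time gives the frames for the slider, and the per-window fatality totals could serve as heat map weights.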

Issues to address:

- better documentation of WB project data!

- would be useful to have data on when the location-specific actions took place, as opposed to project-level timelines

- some projects have multiple points, some projects have no points (e.g. polio immunization) 

- maybe get the old data from completed projects? 

Mapping Nigeria’s extractive industry

School of Data tested the Task Manager as a tool for mapping oil infrastructure with 7 volunteers at Open Data Day in Washington DC on February 22nd.


School of Data organised a sprint to map pipelines and parts of the oil infrastructure in Nigeria. As oil spill monitoring in Nigeria slowly improves, it is becoming clear that having a full picture of the Nigerian oil infrastructure is important for ensuring accountability of the oil industry and improving the opportunities for local citizens to engage. 

The DC Open Data Day event focused on the learning journey, serving a wide range of approximately 350 people. At School of Data, we are keen to use projects to frame technical learning and lower the barrier to entry for new data makers. Events like this provide opportunities for participants to choose between making (hacking on projects) or learning (taking workshops). The success of every event hinges on every participant being both a teacher and a learner. Some of the workshops included Introduction to Open Data, Open Mapping, Introduction to Open Collaboration with GitHub, and Introduction to Python.  This is a list of all the projects created by the DC community. School of Data’s methodology is to provide opportunities for people to learn these types of skills while building a core project. 

Mapping Nigeria Oil Infrastructure

With assistance from Mikel Maron at Groundtruth and guidance from OpenOil we were able to use the Humanitarian OpenStreetMap Task Manager to map the infrastructure. Using the Nigerian oil spill monitor, we looked at spill sites to identify pipelines and oil infrastructure. We cross-checked the spills with Wikimapia and other satellite data and then used the Open Street Map task manager to break up an oil concession and get to work on the edits!
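The "break up an oil concession" step amounts to dividing an area into small tasks that volunteers can claim one at a time. The sketch below shows that idea on a plain bounding box with an even grid. It is not the Task Manager's own tiling logic, and the bounding box coordinates are hypothetical.

```python
def split_bbox(min_lon, min_lat, max_lon, max_lat, rows, cols):
    """Divide a bounding box into rows x cols equally sized task cells.

    Each cell is returned as (min_lon, min_lat, max_lon, max_lat).
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    cell_width = (max_lon - min_lon) / cols
    cell_height = (max_lat - min_lat) / rows
    cells = []
    for row in range(rows):
        for col in range(cols):
            cells.append((
                min_lon + col * cell_width,
                min_lat + row * cell_height,
                min_lon + (col + 1) * cell_width,
                min_lat + (row + 1) * cell_height,
            ))
    return cells


# Hypothetical bounding box, split into a 4 x 4 grid of tasks.
tasks = split_bbox(5.0, 4.0, 7.0, 6.0, rows=4, cols=4)
print(len(tasks), tasks[0], tasks[-1])
```

Smaller cells mean more tasks but less ground for each volunteer to cover, which suits a short sprint with a handful of mappers.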

Here you can see the spill data:

 Here’s our task manager and current progress as we follow the pipelines:

Here’s an example of some infrastructure that we found and tagged for more investigation:

Here’s an example of the final result in OSM: