Explorations in Data Science and Health


Newly elected Congresswoman Alexandria Ocasio-Cortez has been in the news a lot these last couple of months. She was elected in the November 6, 2018 general election to represent New York's 14th Congressional District. Now let me ask you a question: do you know how many votes Congresswoman Ocasio-Cortez received in Precinct 037/78 in her district? Go ahead, I will wait . . . You probably went looking on CNN or the NYTimes and came up short. They don't share this kind of detailed data.

I recently wanted to do a visualization of US House and Senate voting in the most recent election, and I found that there exists no publicly accessible dataset that contains these results. You read that right. In one of the most established democracies in the world, it's actually very difficult to access election results. News organizations and universities subscribe to one of a number of companies that synthesize this information, such as the AP Elections API or the CQ Elections Results Collection, but again these are closed subscription sources.

That is why, with my open-source contribution this week, I wanted to highlight the work of two organizations that are working to change this: OpenElections and the MIT Election Lab. But there is simply far too much work to do, as shown in their status chart:

MIT is actually even further behind with data covering just 22 states.

Almost everyone in the United States is assigned a voting precinct. These precincts are very small. Our example, Precinct 037/78, had just 46 votes out of more than 690,000 people who live in the 14th District. The vast majority of this precinct is made up of the Bronx Zoo. Many precincts are combined into a polling place. In this case almost 20 precincts all vote at Giordano MS. The totals by precinct are then bubbled up through the elections hierarchy until finally a vote total is published. The official totals aren't published in a standard format and aren't universally available online, meaning it's extremely difficult to assemble these for the 435 members of the US House, let alone for the over 500,000 elected officials nationwide.1

These two organizations try to wrangle all of this together. Through web scraping, volunteer data entry, and Freedom of Information Act requests, they somehow get hold of the spreadsheets and PDFs that contain this information and make it freely available.

The data challenge

Let's return to our sample precinct, 037/78. What is needed to process this data? New York is one of the jurisdictions that at least publishes spreadsheets of the results. Many other agencies only publish PDFs or just have images of paper records, which cannot be machine processed. For Congresswoman Ocasio-Cortez, the results data file is over 4,400 lines long with no headings. For each of the 450 precincts in her district, the following ten lines are repeated:

Public Counter 41
Manually Counted Emergency 0
Absentee / Military 7
Federal 0
Affidavit 0
Alexandria Ocasio-Cortez (Democratic) 39
Anthony Pappas (Republican) 4
Elizabeth Perri (Conservative) 0
Joseph Crowley (Working Families) 1
Joseph Crowley (Women's Equality) 2
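To give a feel for what processing this involves, here is a minimal Python sketch that parses one ten-line record like the one above. It assumes each line is a label followed by a single integer, exactly as shown in the sample; the real file's column layout may differ, and the names parse_record, parse_file and candidate_totals are my own, not part of any OpenElections tooling. Note that Joseph Crowley appears on two party lines, so his votes have to be added together.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the sample record: "Name (Party Line) 39".
CANDIDATE_RE = re.compile(r"^(?P<name>.+?) \((?P<party>[^()]+)\) (?P<votes>\d+)$")
# Counter lines such as "Absentee / Military 7".
COUNTER_RE = re.compile(r"^(?P<label>.+?) (?P<count>\d+)$")

LINES_PER_PRECINCT = 10  # the post says ten lines repeat for each precinct

SAMPLE = """\
Public Counter 41
Manually Counted Emergency 0
Absentee / Military 7
Federal 0
Affidavit 0
Alexandria Ocasio-Cortez (Democratic) 39
Anthony Pappas (Republican) 4
Elizabeth Perri (Conservative) 0
Joseph Crowley (Working Families) 1
Joseph Crowley (Women's Equality) 2
"""


def parse_record(lines):
    """Parse one precinct record into ballot counters and per-party votes."""
    counters = {}
    votes = defaultdict(dict)  # candidate -> {party line -> votes}
    for raw in lines:
        line = raw.strip()
        match = CANDIDATE_RE.match(line)
        if match:
            votes[match["name"]][match["party"]] = int(match["votes"])
            continue
        match = COUNTER_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {line!r}")
        counters[match["label"]] = int(match["count"])
    return {"counters": counters, "votes": dict(votes)}


def parse_file(lines):
    """Split a headerless list of lines into fixed-size precinct records."""
    lines = [line for line in lines if line.strip()]
    if len(lines) % LINES_PER_PRECINCT:
        raise ValueError("line count is not a multiple of the record size")
    return [
        parse_record(lines[i:i + LINES_PER_PRECINCT])
        for i in range(0, len(lines), LINES_PER_PRECINCT)
    ]


def candidate_totals(record):
    """Sum each candidate's votes across all party lines."""
    return {name: sum(by_party.values()) for name, by_party in record["votes"].items()}


record = parse_record(SAMPLE.strip().splitlines())
totals = candidate_totals(record)
print(totals["Joseph Crowley"])  # 3 (1 Working Families + 2 Women's Equality)
print(sum(totals.values()))      # 46, the precinct total quoted earlier
```

The fixed chunk size is the fragile part: a single missing or extra line would shift every record after it, which is why parse_file refuses input whose length is not a multiple of ten.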

This is a nightmare to parse, but at least it is parseable. My open-source contribution this week was to parse the results of the 2013 special election for the Tennessee House of Representatives 91st district. Not glamorous. The results were available on the Tennessee Secretary of State's website, but only as a PDF. I ended up copying the numbers out by hand and submitting them: just 65 data points out of the thousands needed.