Pulling down district level voter registration #22

Open
chrisdick14 opened this issue Jan 31, 2017 · 17 comments

Comments

@chrisdick14
Contributor

Would like a time-series for redistricting work.

@Smahoney37

Attached:
a file containing the availability of granular data by state - not necessarily district.
Arizona data - not cleaned

Arizona.zip

Availability of Data.txt

@chrisdick14
Contributor Author

Awesome work! We will have to start assigning people to each of the states we can pull.

@gvpeek

gvpeek commented Feb 24, 2017

I'm interested to jump in and help. I was thinking I could start with Texas, since that's where I am. But I'm happy to take on some other states too. That being said, I had some questions...

  1. Is there a list of states people are assigned to?
  2. What is the ideal state of the data, cleaned to a certain spec or just raw for now?
  3. Is there a place for these files to be committed or are they just living in comments for now?

@KirkHadley

Hi,

So I'm not really sure where the best place to put these would be, but I have voter files for CO, CT, DC, DE, FL, GA, MI, NC, OK, RI, UT, and WA, covering various recent years (the farthest back is ~2008). Would that be helpful?

@kflanagan

You can find the current NC registered voter info here
https://data.world/kflanagan/nc-statewide-voter-info
Along with it is the SQL statement to create columns

@chrisdick14
Contributor Author

@KirkHadley and @kflanagan we can definitely use this information. However, this is slightly different data than we have been using in the past so let me think about where we want to store it, and how it will fit into our current structure.

@KirkHadley

@chrisdick14 I actually have that file for every NC election since 2005. Should I upload it to data.world?
@kflanagan Has any thought been put into standardizing election results at the state level? If so, I have every state's state-level election results at the district level and am more than happy to share.

@kflanagan

@KirkHadley and @chrisdick14 The source for the data I posted is the state; here's their link. I don't know if there are efforts to standardize, but given the state of things at the federal level I doubt it.
https://s3.amazonaws.com/dl.ncsbe.gov/data/ncvoter_Statewide.zip

@chrisdick14
Contributor Author

@KirkHadley and @kflanagan there are two things we can do for these data. (1) You can post them yourself on data.world and tag them with 'd4d' and 'election transparency' (as well as any other tags you want to use), or (2) we can have you send us the data and we can upload directly to the d4d election transparency data.world page. I am totally fine either way. I agree about the standardization. The Open Elections Project has been doing some of this work: https://github.com/openelections/openelections-results-nc

I think one thing we could do, if we can get results from several states, is agree on a format moving forward and put something out there, if that is something you all are interested in.

@kflanagan

Given that I had already put the NC data up on data.world, I just went and tagged them with d4d and election transparency. That'll get us started. I don't know what's best: the states keep their own formats, so is it a good use of time to re-format every time they update the data? I think NC updates weekly. Could we use data.world's SQL-like queries to present the data in a way that lets folks query across states?
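
One way to avoid re-formatting every state's file on each update might be to keep a small per-state column map and apply it at load time, so queries can run against a shared schema. A minimal sketch in Python; the state keys, source headers, and common field names below are all made up for illustration and would need to be checked against each state's actual file layout:

```python
import csv
import io

# Hypothetical per-state column maps: source header -> common field name.
# None of these headers are taken from a real state file.
COLUMN_MAPS = {
    "STATE_A": {"county_name": "county", "party_code": "party", "cong_dist": "district"},
    "STATE_B": {"County": "county", "Party": "party", "CD": "district"},
}


def normalize_rows(state, text, delimiter=","):
    """Yield one dict per voter record, renamed into the common schema."""
    mapping = COLUMN_MAPS[state]
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        out = {"state": state}
        for source_name, common_name in mapping.items():
            # Missing columns become empty strings rather than raising.
            out[common_name] = (row.get(source_name) or "").strip()
        yield out
```

With this approach the raw state files stay untouched, and only the map needs updating when a state changes its layout.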

@chrisdick14
Contributor Author

@kflanagan I think that is a great idea. Especially with data that are coming out that regularly. I think if there were some 'clean' datasets we needed for projects we could pull the requisite data from your larger file and post it in the cleaned format that we end up using for analysis.

This is really fantastic. We are having a hackathon this weekend and who knows, someone may end up using these data in their analyses!

@kflanagan

I found a flaw in my logic. It seems big data sets don't work so well on data.world: the file is too large to extract from the archive. Maybe I'll try to upload the raw data, but of course the uncompressed file may be too big to upload as well. Perhaps we need to point at the county-by-county info for NC. I'll take a look at it this evening.
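
If the full file is too big to host, another option might be to read the archive as a stream and upload only district-level counts. A rough sketch using Python's standard `zipfile` and `csv` modules; the member name, column names, tab delimiter, and `latin-1` encoding are assumptions that would need to be verified against the real file:

```python
import csv
import io
import zipfile
from collections import Counter


def count_by_district(zip_path, member, district_col, party_col,
                      delimiter="\t", encoding="latin-1"):
    """Count registrations per (district, party) without extracting the archive.

    zip_path may be a filesystem path or a file-like object.
    Column names, delimiter, and encoding are assumptions, not known facts
    about any particular state's file.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open() decompresses lazily, so rows are read one at a time.
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.DictReader(text, delimiter=delimiter):
                counts[(row[district_col], row[party_col])] += 1
    return counts
```

The resulting counts are small enough to post anywhere, and they no longer contain individual records.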

@chrisdick14
Contributor Author

Let me know how big the data set would be. We can chat with the data.world folks and see if there is a workaround. If not, we may have some other options that I am exploring now to upload the data and make it public.

@KirkHadley

So I have voter files on a good number of states (I'm a squirrel with these things). Details on sizes and such:

State: total size, number of voter files, range of years

  • CO: 15 GB, 14, 2013-17
  • CT: 8.4 GB, 7, 2013-14, 2016-17
  • DC: 131 MB, 6, 2014
  • DE: 1.1 GB, 7, 2013-15
  • FL: 103 GB, 47, 2012-17
  • MI: 31 GB, 8, 2014-16
  • NC: 134 GB, 51, 2012-17
  • OK: 3.6 GB, 6, 2014-16
  • RI: 250 MB, 7, 2012-15
  • UT: 569 MB, 6, 2014
  • WA: 1.8 GB, 12, 2006-17

@kflanagan

@KirkHadley is that the same voter file that's found at https://s3.amazonaws.com/dl.ncsbe.gov/data/ncvoter_Statewide.zip but with multiple years?

@chrisdick14
Contributor Author

Ok, those are going to be too big for data.world I think. We are going to have to come up with another solution to host these. Let me do some asking around and see what we can find.

@alistaire47

Hi, I'm Edward. I'm new and happy to help. To get rolling I scraped the relevant PDFs off of the DC BoE site in the link above to see how hard the PDFs are to parse. The answer is (predictably) not terribly easy, but possible.

Given that, what data do we want?

  • Since DC is all one district, just the whole city, or wards or precincts?
  • What time frames? They publish monthly, so everything, yearly (start? end?), before elections (which?)?

I also saw on their website that you can get the whole voter file on CD-ROM (yeah) for $2 (yeah). It's not clear how it handles formerly registered voters, but it's as granular as you can get. Since it's individual-level data, though, it's at least dubious to republish it unaggregated, even though it's all public. I'm not sure we want it, but it's entirely possible to assemble a national voter file; e.g. you can grab the Ohio CD CSVs at will.
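
Whichever source we use, one possibility is to publish only aggregated counts per ward or precinct for each snapshot, which would also give the time series asked for at the top of this issue. A small sketch of stitching snapshots together, assuming each snapshot has already been reduced to a mapping of unit name to count and is labeled with a sortable date string such as `YYYY-MM` (both are assumptions about how we would store things):

```python
from collections import defaultdict


def build_time_series(snapshots):
    """Combine per-snapshot counts into one date-sorted series per unit.

    snapshots: iterable of (date_string, {unit_name: count}) pairs.
    Date strings are assumed to sort chronologically (e.g. "2017-01").
    """
    series = defaultdict(dict)
    for date, counts in snapshots:
        for unit, n in counts.items():
            series[unit][date] = n
    # Units missing from a snapshot simply have no entry for that date.
    return {unit: sorted(by_date.items()) for unit, by_date in series.items()}
```

Leaving gaps unfilled keeps it obvious which months were actually published for each unit.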

6 participants