Standardize precinct identifier format #144

nvkelso · 2018-07-31T06:43:58Z

The identifier should generally be the state FIPS code (AA) + county FIPS code (AAA) + precinct ID (AAAAAAA*).
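A minimal sketch of building and splitting such a key, assuming the state and county parts are zero-padded FIPS codes and the precinct part is variable length (the `*` above suggests it is not fixed-width, so it is left unpadded). `make_precinct_key` and `split_precinct_key` are hypothetical helper names, not part of the project:

```python
def make_precinct_key(state_fips: str, county_fips: str, precinct_id: str) -> str:
    """Build a state FIPS (2) + county FIPS (3) + precinct ID key."""
    s, c, p = state_fips.strip(), county_fips.strip(), precinct_id.strip()
    if not (s.isdigit() and len(s) <= 2):
        raise ValueError(f"bad state FIPS: {state_fips!r}")
    if not (c.isdigit() and len(c) <= 3):
        raise ValueError(f"bad county FIPS: {county_fips!r}")
    if not p:
        raise ValueError("missing precinct ID")
    # Zero-pad the fixed-width parts, e.g. "6" -> "06" and "37" -> "037".
    return s.zfill(2) + c.zfill(3) + p


def split_precinct_key(key: str) -> tuple[str, str, str]:
    """Invert make_precinct_key: the first five characters are fixed-width."""
    if len(key) < 6:
        raise ValueError(f"key too short: {key!r}")
    return key[:2], key[2:5], key[5:]


print(make_precinct_key("6", "37", "0001234"))  # 060370001234
```

Because only the precinct part varies in length, the key can always be split back apart by position.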

Sometimes there is both a precinct name and an ID; perhaps we should include both variants? (Though extra columns inflate the DBF.)

migurski · 2018-08-01T18:41:15Z

Those precinct IDs come from the Census, but only in cases where a state participated in the 2010 VTD program right?

nvkelso · 2018-08-01T18:54:05Z

We invent them for state and local sources. We should be more consistent there... And there should be a crosswalk with other precinct data providers/sources.


migurski · 2018-08-01T19:51:05Z

For PlanScore, I’ve been assigning them artisanal integers. It works really well internally, but it's not something I’ve exposed generally.

nvkelso · 2018-08-01T22:24:57Z

Please make them public!


nvkelso · 2018-09-03T18:51:00Z

In #146, all state IDs are now FIPS codes, and there's a common field format: 2 characters for state; 32 characters for county (which should be ssCCC, but some data comes as longer name strings and that's not normalized yet); and 255 characters for precinct (which should be normalized too, but has the same problem as county).
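A rough check of records against that field template might look like the sketch below. It assumes the widths are maximum lengths and that a normalized county is exactly five digits (ssCCC) whose first two digits match the state; the field names `state`, `county` and `precinct` are placeholders, not necessarily the project's real column names:

```python
import re

# Maximum widths from the common field template described for #146.
# The field names are placeholders.
FIELD_WIDTHS = {"state": 2, "county": 32, "precinct": 255}

# Normalized county: 2-digit state FIPS followed by 3-digit county FIPS.
COUNTY_RE = re.compile(r"[0-9]{5}")


def check_record(record: dict[str, str]) -> list[str]:
    """Return a list of problems found in a single record (empty if none)."""
    problems = []
    for field, width in FIELD_WIDTHS.items():
        if len(record.get(field, "")) > width:
            problems.append(f"{field} longer than {width} chars")
    county = record.get("county", "")
    if county and not COUNTY_RE.fullmatch(county):
        # e.g. a county name string that has not been normalized yet
        problems.append("county not normalized to ssCCC")
    elif county and county[:2] != record.get("state", ""):
        problems.append("county prefix does not match state FIPS")
    return problems
```

Running this over every record would give a count of how much county and precinct data still needs normalizing.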

sigpwned · 2018-09-16T18:02:17Z

First, let me say how thrilled I was when I came across this project. Because it contains precinct-level geodata for the whole country, I think it can be the hub for any GIS or mapping project involving election data. I know it's a great starting point for some work I plan to do!

Regarding precincts, I reviewed precinct labels from a number of states and was disappointed to find that there is little shared rhyme or reason among them. Some use numeric codes; some use physical location names, like "city hall"; some use a combination of the two; others seem not to include labels at all. If the goal is to standardize precinct labels in a way more general than "uppercase, split on non-alphanumeric characters, and join with single whitespace," this project will have to come up with its own novel naming scheme. I'm not sure there is a "right" answer, on its face.
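For reference, the baseline normalization quoted above could be sketched like this. It assumes ASCII labels: non-ASCII letters such as "ñ" count as non-alphanumeric here and would become split points. `normalize_label` is a hypothetical name:

```python
import re


def normalize_label(label: str) -> str:
    """Uppercase, split on runs of non-alphanumeric characters, rejoin with single spaces."""
    tokens = re.split(r"[^A-Z0-9]+", label.upper())
    # Leading/trailing separators produce empty tokens; drop them.
    return " ".join(t for t in tokens if t)


print(normalize_label("City Hall #2"))  # CITY HALL 2
```

This makes "pct-014a" and "PCT 014A" compare equal, but it does nothing to reconcile a numeric code with a place name, which is the harder problem described above.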

However, I think we can optimize the labeling for some common use cases. The work I plan to do involves joining this data set to other precinct-level data sets, e.g. data sets from here. I think a good way of standardizing the labels would be:

Look for other precinct-level data sets that are available
Study how they label their precincts
Choose a method that makes joining to as many different data sets as easy as possible

For example, let's say we find 10 such data sets. It's likely they'll all be at least a little different. But if we find that they all use place names to identify precincts, then we'd want to make sure to preserve place names when they're available in this data set. Because all the data sets will be different we won't find any scheme that's perfect, but we can at least find some objective measure for "better."
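One possible objective measure is the join rate: the fraction of another data set's labels that match ours after both sides pass through the same labeling scheme. The sketch below assumes labels are plain strings and that a "scheme" is just a string-to-string function; `join_rate`, `best_scheme` and the scheme names are hypothetical:

```python
from typing import Callable, Iterable, Sequence


def join_rate(our_labels: Iterable[str], their_labels: Iterable[str],
              key: Callable[[str], str]) -> float:
    """Fraction of another data set's labels that match one of ours under `key`."""
    ours = {key(label) for label in our_labels}
    theirs = [key(label) for label in their_labels]
    if not theirs:
        return 0.0
    return sum(1 for label in theirs if label in ours) / len(theirs)


def best_scheme(our_labels: Iterable[str],
                other_datasets: Sequence[Iterable[str]],
                schemes: dict[str, Callable[[str], str]]) -> str:
    """Name of the scheme with the highest mean join rate over all other data sets."""
    ours = list(our_labels)  # reused once per scheme and data set
    others = [list(d) for d in other_datasets]

    def mean_rate(name: str) -> float:
        rates = [join_rate(ours, other, schemes[name]) for other in others]
        return sum(rates) / len(rates) if rates else 0.0

    return max(schemes, key=mean_rate)
```

Averaging across data sets weights each one equally; weighting by size or importance would be an equally reasonable choice.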

I also think that what @nvkelso said about data crosswalks is really important. If this data set is going to become a hub, then it needs to be as easy as possible for other people to pick up and use for their own purposes. To that point, I think that encouraging people to publish any crosswalks they create would be A Good Thing. (For example, when I do the join to the data sets linked above, I'll be happy to share a "join table" that maps this data set to those data sets.) Those joins make both this data and the joined data more useful, and any data that joins to either can now be mapped to both.
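The "mapped to both" property is just composition of join tables. A minimal sketch, assuming each crosswalk is a list of (from_id, to_id) pairs and allowing many-to-many rows, since a precinct in one source can map to several in another; `compose` is a hypothetical name:

```python
from collections import defaultdict

Pair = tuple[str, str]


def compose(a_to_b: list[Pair], b_to_c: list[Pair]) -> list[Pair]:
    """Chain crosswalk A->B with B->C to get A->C (many-to-many allowed)."""
    index = defaultdict(list)
    for b, c in b_to_c:
        index[b].append(c)
    # .get() avoids inserting empty lists for B values with no match in C
    return sorted({(a, c) for a, b in a_to_b for c in index.get(b, [])})
```

Rows of A whose B value never appears in the second crosswalk simply drop out, which is worth reporting when a crosswalk is published.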

Here are some data sets that I think it could be useful to review when trying to decide on a standard. I'm sure there are others, but hopefully these are a good start:

Just a couple of thoughts I had while working elsewhere in the data set, for whatever they're worth. Hopefully they make sense.

nvkelso · 2018-09-17T06:23:01Z

Hi @sigpwned, thanks for your kind words and thoughtful comments. I really like the idea of x-walk concordance "join" tables with other precinct datasets.

I've been wondering if this project should allow both precinct "identifier" and precinct "name" columns when both those are available in the upstream sources to make this a little easier.

sigpwned · 2018-09-17T15:45:35Z

I've been wondering if this project should allow both precinct "identifier" and precinct "name" columns when both those are available in the upstream sources to make this a little easier.

That's an interesting idea! And it's knocked some ideas loose for me. Let me try to dump my brain while the thoughts are fresh.

Based on my understanding, the goal of this project is:

Every US precinct is represented by one record in the dataset with a unique (state_fips, county_fips, precinct_id) key.

Here are a few thoughts on getting there:

1. All records should now have state FIPS codes.
2. All records with counties should now have county FIPS codes, or will soon, per Standardize use of name and FIPS code in state and county fields #135.
3. All records without counties should receive county labels soon, per Standardize use of name and FIPS code in state and county fields #135.
4. There are duplicate (state_fips, county_fips, precinct_id) keys in the data set.
5. Some records have no precinct_id.

Items 4 and 5 above are potentially significant issues.

Regarding 4, it's difficult to know if these "duplicate" rows represent one precinct with the region split into multiple geometries, or if the rows are actually mislabeled. The only way I can think of to make that determination is to compare this data to other precinct-level data. Once we know that:

If the rows represent one precinct, then I recommend we merge the duplicates into one row having the ST_Union of their respective geometries.
If the rows are mislabeled, then I recommend we change the precinct_id labels to make them unique, e.g. by appending A, B, C, and so on.
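The relabeling option could be sketched as follows, assuming rows are (state_fips, county_fips, precinct_id) tuples and the suffix goes on precinct_id. Merging geometries (the ST_Union option) needs a GIS library and is not shown. Because appending "A" can collide with an ID that already ends in "A", the sketch checks uniqueness afterwards. `relabel_duplicates` is a hypothetical name:

```python
from collections import Counter
from string import ascii_uppercase

Row = tuple[str, str, str]  # (state_fips, county_fips, precinct_id)


def relabel_duplicates(rows: list[Row]) -> list[Row]:
    """Append A, B, C, ... to the precinct_id of every duplicated key."""
    counts = Counter(rows)
    seen: Counter = Counter()
    out = []
    for row in rows:
        if counts[row] == 1:
            out.append(row)  # unique keys pass through unchanged
            continue
        idx = seen[row]
        seen[row] += 1
        if idx >= len(ascii_uppercase):
            raise ValueError(f"more than 26 duplicates of {row!r}")
        state, county, precinct = row
        out.append((state, county, precinct + ascii_uppercase[idx]))
    if len(set(out)) != len(out):
        raise ValueError("suffixing collided with an existing precinct_id")
    return out
```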

Regarding 5, it's much like 4, except that all of the records without an ID should be treated as sharing the same (empty) label. Teasing these apart into "real" labels is going to be fairly manual work, unfortunately. We probably can't cheat by comparing to a "known good" precinct data set, because if that data set existed, presumably we'd be using it instead of the data we have. At the very least, we should be able to use this map or one like it to do the assignments.

We're free to assign whatever IDs we like to updated rows. I think it would be wise to make those IDs look as much like other precinct ID labels as possible, but the reality is that new IDs are completely at our discretion. Any crosswalk we publish is essentially a relabeling anyway, so users can substitute new labels if they wish.

Regarding keeping two precinct_id columns, I think it's a fine idea, but ultimately users will have to pick one column for any work they do. Fundamentally, it would be our first crosswalk, so we can publish that separately if we want to, or leave it integrated into the data set as a separate column. They're basically the same thing.
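To illustrate why the two layouts are "basically the same thing": a second column can be split out into a standalone crosswalk and joined back without losing anything, provided the primary key is unique. A sketch assuming rows are dicts; the column names `precinct_id` and `precinct_name` and the functions are hypothetical:

```python
def split_crosswalk(rows: list[dict[str, str]], key_field: str, alt_field: str):
    """Move alt_field out of each row into a separate key -> alt crosswalk.

    Assumes key_field is already unique across rows.
    """
    crosswalk = {row[key_field]: row[alt_field] for row in rows}
    slim = [{k: v for k, v in row.items() if k != alt_field} for row in rows]
    return slim, crosswalk


def rejoin(slim: list[dict[str, str]], crosswalk: dict[str, str],
           key_field: str, alt_field: str) -> list[dict[str, str]]:
    """Inverse of split_crosswalk: join the crosswalk column back on."""
    return [{**row, alt_field: crosswalk[row[key_field]]} for row in slim]
```

The round trip only holds once items 4 and 5 are fixed; with duplicate keys, the dict in `split_crosswalk` would silently keep just the last name.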

In any case, I think the plan of attack here should be to finish out #135 since we're close, and then generate a report sizing up 4 and 5 above, per state. We won't really know how much work this step will be until we have that report.
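Such a per-state report could start from something like the sketch below, assuming rows are (state_fips, county_fips, precinct_id) tuples where a missing ID is None or blank; `issue_report` and the count names are hypothetical:

```python
from collections import Counter
from typing import Optional

Row = tuple[str, str, Optional[str]]  # (state_fips, county_fips, precinct_id)


def _missing(precinct_id: Optional[str]) -> bool:
    return precinct_id is None or not precinct_id.strip()


def issue_report(rows: list[Row]) -> dict[str, dict[str, int]]:
    """Per state FIPS, count rows with a duplicated key (item 4) or no precinct_id (item 5)."""
    key_counts = Counter(row for row in rows if not _missing(row[2]))
    report: dict[str, dict[str, int]] = {}
    for row in rows:
        if _missing(row[2]):
            column = "missing_id"
        elif key_counts[row] > 1:
            column = "duplicate_rows"
        else:
            continue  # a clean row; states with no issues are left out
        counts = report.setdefault(row[0], {"duplicate_rows": 0, "missing_id": 0})
        counts[column] += 1
    return report
```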

Just my two cents. How does that seem to everyone else?

sigpwned · 2018-10-02T16:03:07Z

Once we have #135 closed and the $state_fips$county_fips vs $county_fips format standardized, I see this issue as the next "big thing." Any thoughts on the above? With the benefit of more thought, I'm more confident that trying to standardize the precinct values is probably not useful, because they're so different.
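Standardizing on `$state_fips$county_fips` could look roughly like this, assuming county values arrive either as a bare county code (CCC, possibly with leading zeros dropped) or as ssCCC. Four-digit values are rejected as ambiguous, and names need a separate lookup. `normalize_county` is a hypothetical name:

```python
def normalize_county(state_fips: str, county: str) -> str:
    """Return county as a 5-character ssCCC value."""
    state = state_fips.strip().zfill(2)
    county = county.strip()
    if county.isdigit() and len(county) <= 3:
        # Bare county code: prefix the state, e.g. ("30", "31") -> "30031"
        return state + county.zfill(3)
    if county.isdigit() and len(county) == 5:
        if county[:2] != state:
            raise ValueError(f"county {county!r} is not in state {state!r}")
        return county
    # Names and 4-digit values need a lookup table or a human decision.
    raise ValueError(f"unrecognized county value {county!r}")
```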

Here's where I think we are:

1. All records should now have state FIPS codes.
2. All records with counties should now have county FIPS codes, or will soon, per Standardize use of name and FIPS code in state and county fields #135.
3. All records without counties should receive county labels soon, per Standardize use of name and FIPS code in state and county fields #135.
4. There are duplicate (state_fips, county_fips, precinct_id) keys in the data set.
5. Some records have no precinct_id.

I think fixing the last two items above is the top priority. I'm not sure what the best way to approach that is, per the above, but I haven't thought too hard about it yet either.

Once this is handled, in whatever way is deemed best, I think the next priority would be building crosswalks everywhere. How easy that is will probably depend on how we do this work.

Again, just my two cents. What does everyone else think?

nvkelso mentioned this issue Sep 2, 2018

cleanups Montana, New Hampshire #143

Merged

nvkelso mentioned this issue Sep 3, 2018

Fixups for Arizaona, Kansas, common field template, identifiers #146

Merged

sigpwned mentioned this issue Sep 15, 2018

Standardize use of name and FIPS code in state and county fields #135

Open

ummel mentioned this issue Oct 20, 2018

How did NYT create their map? #195

Closed
