Map charge descriptions to crosswalk categories #6
Comments
It turns out that charge descriptions map more reliably to IUCR categories than ILCS statutes do, but they're still not perfect: 15.6% of unique charge descriptions map to multiple IUCR categories. Under Miscellaneous convictions data I've uploaded two JSON files:
More to come... |
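A rough sketch of how a figure like the share of charge descriptions with multiple IUCR categories could be computed. This is not the code from this project: the function names and the sample rows (including the category labels) are made up for illustration, and it assumes the input is a flat list of (charge description, IUCR category) pairs.

```python
from collections import defaultdict


def build_mapping(pairs):
    """Collect the set of IUCR categories seen for each charge description."""
    categories = defaultdict(set)
    for chrgdesc, category in pairs:
        categories[chrgdesc].add(category)
    return categories


def split_mapping(categories):
    """Separate descriptions with exactly one category from those with several."""
    singles = {d: next(iter(c)) for d, c in categories.items() if len(c) == 1}
    multiples = {d: sorted(c) for d, c in categories.items() if len(c) > 1}
    return singles, multiples


# Hypothetical sample rows, not real records from the convictions data.
pairs = [
    ("FORGERY", "Forgery"),
    ("FORGERY", "Forgery"),
    ("ATT. FORGERY", "Forgery"),
    ("ATT. FORGERY", "Battery"),
    ("ATT. FORGERY", "Burglary"),
    ("RETAIL THEFT", "Theft"),
]

singles, multiples = split_mapping(build_mapping(pairs))
# Share of unique charge descriptions that map to more than one category.
ambiguous_share = len(multiples) / (len(singles) + len(multiples))
```

Using a set per description means repeated rows with the same category do not count as ambiguity; only genuinely different categories end up in `multiples`.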
@bepetersn. I'll take a look at this today. We might just have to make our own call for mapping the ambiguous ones and document it clearly. |
@bepetersn I'm a little confused by these. For example, this mapping from
I don't see why this description would match either of these categories. Are the JSON files based on ILCS -> IUCR, on charge description, or on a combination? Do you have code that implements your methodology for generating these mappings? |
First attempt towards #6; create a JSON file mapping chrgdesc to category
See my code at: #10 Several notes:
|
@bepetersn I'll take a look at #10. Thanks for clarifying the "ATT. FORGERY" issue. It sounds like there might be some disconnect for some records between the statute and the chargedesc. I'll take a quick peek and let you know what I find. It shouldn't matter that you looked at dispositions since that's the source for the convictions anyway. It just means that you're looking through more records. The mappings from charge description to IUCR category won't capture records that didn't have an IUCR code calculated from statute. I think the first step moving forward would be to start making our own mapping from |
@bepetersn, FYI I've uploaded a recent snapshot of the database to drive. |
@bepetersn, I took a look at just the "ATT.FORGERY" case. It seems like there might have been some difficulty parsing the statute field to get the IUCR code/category which your management command was using to grab the categories. This makes sense because, at least for these dispositions, it looks like they tried to cram two different statutes into one field. 😿
The output is:
720-5/17-3 maps to forgery, so I'm not sure where the Battery and Burglary mappings are coming from. This makes me think that some of the more weird mappings might be due to parse issues. I wonder if we're better off making our own map of chargedesc to category. Have you done any more digging into this? |
Hey @ghing, I did some more digging. The majority of the multiples that we saw before were of the type that you said: coming from parsing errors, whether in the ILCS or IUCR modules. After removing a bug in my I believe I also got cases where there were multiple IUCRs associated with a charge description but all with the same IUCR category to feed into the mapping. Finally, after adding a check to make sure the category is found in the IUCR crosswalk along the lines of what I talked about in #14, the number of multiples went down to 3 (I might be able to get it to none). I need to run a check to see how many of the convictions I'm actually able to reliably account for using this new mapping of charge descriptions to IUCR categories, but I'm somewhat hopeful. For now, the new I'm also going to upload my code tonight. |
So the share of convictions for which I was able to make a one-to-one mapping from charge description to IUCR category was 80.39% this time, which leaves 27,743 records missed. A little bit worse, but I think we could make it better. |
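A coverage check along these lines could look roughly like the sketch below. It assumes a `singles` dict of charge description to category and a list with one charge description per conviction; both names and the sample values are hypothetical, not taken from the project code.

```python
def coverage(chrgdescs, singles):
    """Return (mapped, missed, percent mapped) for a list of charge descriptions."""
    total = len(chrgdescs)
    mapped = sum(1 for desc in chrgdescs if desc in singles)
    missed = total - mapped
    pct = 100.0 * mapped / total if total else 0.0
    return mapped, missed, pct


# Hypothetical one-to-one mapping and conviction records.
singles = {"FORGERY": "Forgery", "RETAIL THEFT": "Theft"}
convictions = ["FORGERY", "RETAIL THEFT", "RETAIL THEFT", "ATT. FORGERY", "FORGERY"]

mapped, missed, pct = coverage(convictions, singles)
```

Note that this counts conviction records, not unique charge descriptions, so a handful of common unmapped descriptions can account for a large share of the misses.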
@bepetersn, let's hold off on working on this further until I finish a pass on my drug queries so we can figure out the best approach for this. I think we'll want to focus on our areas of interest rather than trying to get a clean category for every charge. |
Ok. You should see the two new files, though. I've mostly got the mapping created. The multiples are 246 items long. We can really easily roll most of them up into the categorizations you are defining (most of them are going to map to a property crime, a few to sexual assault, etc.) The other 1300-some charge descriptions map to a single category, and we should be able to decide how to roll up these single categories into property/sexual/drug/violent really easily too. The only thing I really want to do still is turn these JSON files into a CSV table. |
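Turning the two JSON files into a single CSV table might look something like this. It is only a sketch: the JSON shapes (an object of description to category for the singles, and description to a list of categories for the multiples), the column names, and the sample data are all assumptions.

```python
import csv
import io
import json


def mappings_to_rows(singles, multiples):
    """Flatten both mappings into (chrgdesc, category, ambiguous) rows."""
    rows = [(desc, cat, "no") for desc, cat in singles.items()]
    for desc, cats in multiples.items():
        rows.extend((desc, cat, "yes") for cat in cats)
    return sorted(rows)


def write_csv(rows, fileobj):
    """Write a header plus the flattened rows to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["chrgdesc", "category", "ambiguous"])
    writer.writerows(rows)


# Hypothetical file contents; the real files would be read with json.load().
singles = json.loads('{"FORGERY": "Forgery", "RETAIL THEFT": "Theft"}')
multiples = json.loads('{"ATT. FORGERY": ["Battery", "Burglary"]}')

output = io.StringIO()
write_csv(mappings_to_rows(singles, multiples), output)
```

Emitting one row per (description, category) pair keeps the ambiguous descriptions in the same table, with the `ambiguous` column marking the ones that still need a manual call.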
@bepetersn, ok. I'll take a look at the new multiples file. |
I've been doing some fixes to ILCS statute parsing and also looked through the duplicates and made mappings in this spreadsheet. In many cases the mapping is genuinely ambiguous but we should be able to map them to our broader categories: violent, property, drug, index/nonindex, etc. |
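One way to picture the roll-up: an ambiguous charge description can still be assigned a broad group when every candidate IUCR category falls into the same group. The lookup table and function below are purely illustrative assumptions, not the contents of the spreadsheet or the project's actual groupings.

```python
# Hypothetical category -> broad group lookup, for illustration only.
BROAD_GROUPS = {
    "Burglary": "property",
    "Theft": "property",
    "Battery": "violent",
    "Assault": "violent",
}


def roll_up(categories, broad_groups=BROAD_GROUPS):
    """Return the shared broad group for the candidates, or None if unresolved."""
    if not categories or any(c not in broad_groups for c in categories):
        return None  # empty input or a category with no known group
    groups = {broad_groups[c] for c in categories}
    return groups.pop() if len(groups) == 1 else None


resolved = roll_up(["Burglary", "Theft"])      # both candidates are property crimes
unresolved = roll_up(["Battery", "Burglary"])  # candidates disagree, needs a manual call
```

Anything that comes back as `None` is a candidate for a documented manual decision.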
Add statute-based queries to group together violent, nonviolent, affecting women and drug crimes (and index crimes within each) to handle cases where an ILCS statute doesn't map to an IUCR code or the mapping is ambiguous. Addresses sc3/cook-convictions#83, sc3#7, sc3#6
For statutes with multiple IUCR possibilities, make a CSV mapping between charge descriptions and crosswalk categories. Also map to IUCR codes, if an obvious mapping can be made, and eventually to our own categories of interest.