Reconcile English/British/foreign citations #42
Grajzl and Murrell's helpful guide to the English Reports and to how they constructed their database of pre-1765 case reports: http://www.econweb.umd.edu/~murrell/articles/AppendicesMachineCaselawJOIE.pdf
"The source of our data and the starting point for our corpus construction and processing was …"
Here's a start on a database of UK law reports, adapted from the second English edition (1892) of Joseph Story's Commentaries on Equity. It's probably incomplete, but hopefully not by much.
So, given perfect OCR, we can at least use this dataset to match a citation to a UK reporter. So far I haven't attempted to correct common OCR errors in English report citations. To do that, it would help to have the output of our general regex run on the Story volume mentioned above.
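A minimal sketch of that matching step, assuming clean (post-OCR) text and a reporter table keyed by abbreviation. The four entries below are an illustrative sample, not rows from the actual Story-derived dataset, and `match_reporters` is a hypothetical helper name.

```python
import re

# Illustrative sample of a reporter table (abbreviation -> reporter name).
# The real dataset adapted from Story's Commentaries is much larger.
REPORTERS = {
    "Ves.": "Vesey Junior",
    "Atk.": "Atkyns",
    "P. Wms.": "Peere Williams",
    "Mer.": "Merivale",
}

# Try longer abbreviations first so a long form wins over any shorter
# abbreviation that happens to be its prefix.
_ALTERNATION = "|".join(
    re.escape(abbr) for abbr in sorted(REPORTERS, key=len, reverse=True)
)

# Standard form: volume, reporter abbreviation, page (e.g. "2 Atk. 406").
CITE_RE = re.compile(rf"\b(\d+)\s+({_ALTERNATION})\s*(\d+)\b")


def match_reporters(text):
    """Return (volume, reporter name, page) for each recognized citation."""
    return [
        (int(vol), REPORTERS[abbr], int(page))
        for vol, abbr, page in CITE_RE.findall(text)
    ]
```

For example, `match_reporters("See 2 Atk. 406 and 3 P. Wms. 99.")` yields one tuple per citation, with the abbreviation resolved to the reporter's name. OCR-damaged abbreviations simply fail to match, which is why the regex output on the Story volume would be useful for spotting common errors.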
[Image: a typical page in the English Reports.]
The plain text is not expensive to acquire. This page makes clear that the English Reports pose two complications we won't usually encounter with American reports:
1) Multiple cases can be reported on a single page, so citation "addresses" are not unique.
2) Many private reporters had such limited runs that they produced only one volume, so the standard citation form has no volume number.
Neither of these derails the main project. We will either miss citations to the obscure private reporters or write special-purpose regexes to find them. So far as I can tell, there is no CAP equivalent for UK case reports. There are things we could do to create more meaningful connections in the data, but these should all go on the back burner behind the main project.
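One way to handle both complications, sketched under assumptions: the regex makes the volume optional for reporters flagged as single-volume (recording them as volume 1), and the page index maps each (volume, reporter, page) address to a list of cases rather than to a single case. The reporter lists, the treatment of Cary as single-volume, the placeholder case names, and the helper names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical split of reporters by citation form.
MULTI_VOLUME = ["Ves.", "Atk."]  # cited as "vol abbr page"
SINGLE_VOLUME = ["Cary"]  # assumed single-volume for illustration: "abbr page"


def _alt(abbrs):
    """Build a regex alternation, longest abbreviation first."""
    return "|".join(re.escape(a) for a in sorted(abbrs, key=len, reverse=True))


CITE_RE = re.compile(
    rf"\b(?:(?P<vol>\d+)\s+(?P<multi>{_alt(MULTI_VOLUME)})"
    rf"|(?P<single>{_alt(SINGLE_VOLUME)}))\s*(?P<page>\d+)\b"
)


def parse_citations(text):
    """Yield (volume, reporter, page) addresses; single-volume reporters get 1."""
    for m in CITE_RE.finditer(text):
        if m.group("multi"):
            yield (int(m.group("vol")), m.group("multi"), int(m.group("page")))
        else:
            yield (1, m.group("single"), int(m.group("page")))


def build_page_index(cases):
    """Map each (volume, reporter, page) address to every case on that page."""
    index = defaultdict(list)
    for name, address in cases:
        index[address].append(name)
    return index
```

Looking up `index.get(address, [])` for a parsed address returns every candidate case on the cited page; choosing among them would need the case name or other context from the citing text.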
@kfunk074 Two questions about the status of this one.
I don’t know what I don’t know. I think it’s a pretty extensive list, and I don’t know where to look to find more, though there may well be more out there. Many are single-volume, but that’s the only hang-up to finding them with a general regex search.
For future reference, this database might be helpful as a UK CAP alternative. I have yet to suss out how comprehensive it is: https://swarb.co.uk/its-what-we-do/
We have essentially detected the British citations, unless there are some reporters that fall out of the …
Not sure how I missed this before. A complete database of the English Reports appears to be here: http://www.commonlii.org/uk/cases/EngR/
It seems to include hand-keyed parallel citations that could link to our detected cases and let us extract at least the dates of the decisions. I'll see if they can share their data files.
Behold, the English Reports. It turns out each case has one and only one parallel cite, so no extra table is needed for that. The second table here maps volume number to court jurisdiction. We have the full text too, just not in table form yet. Getting the full text into a table is low priority, I would think. Edit: File too big. Download the CSV here.
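Because each case carries exactly one parallel cite, a dictionary keyed by that cite is enough to link a detected citation to its English Reports entry, and the volume-to-court table then supplies the jurisdiction. This sketch assumes column names (`nominate_cite`, `er_volume`, `er_page`, `year`, `court`) and uses made-up rows ("Hyp." is not a real reporter); the actual CSV's headers and contents may differ, and detected citations would first need normalizing to the same string form.

```python
import csv
import io

# Hypothetical extracts of the two tables; placeholders, not the real data.
CASES_CSV = """nominate_cite,er_volume,er_page,year
1 Hyp. 10,5,200,1750
1 Hyp. 12,5,203,1751
"""
VOLUMES_CSV = """er_volume,court
5,Court A
"""


def load_tables(cases_text, volumes_text):
    """Index cases by their single parallel cite and courts by ER volume."""
    by_cite = {
        row["nominate_cite"]: row
        for row in csv.DictReader(io.StringIO(cases_text))
    }
    court_by_volume = {
        row["er_volume"]: row["court"]
        for row in csv.DictReader(io.StringIO(volumes_text))
    }
    return by_cite, court_by_volume


def resolve(citation, by_cite, court_by_volume):
    """Return ER location, year, and court for a detected citation, or None."""
    row = by_cite.get(citation)
    if row is None:
        return None  # citation not found in the English Reports table
    return {
        "er_cite": f"{row['er_volume']} ER {row['er_page']}",
        "year": int(row["year"]),
        "court": court_by_volume.get(row["er_volume"]),
    }
```

With the real files, `io.StringIO(...)` would be swapped for `open(path, newline="")`; the lookup logic stays the same.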
A few pointers, as I review Phil's data:
|
The complete, clean, final, and godly English Reports are here: https://drive.google.com/drive/folders/1QpwUQHIxzAJdeUG15CdNPioT5HBilyKY?usp=sharing
The CSV file contains everything described above, as well as the clean years, titles, and word counts from Peter Murrell's data. It is ready whenever you're ready to tackle the integration.
CAP has only U.S. cases and does not detect citations of English (or any foreign) reporters, nor would it help much if it did, as the case text and metadata will not be in CAP.