Reconcile English/British/foreign citations #42

kfunk074 · 2021-11-12T19:53:32Z

CAP has only U.S. cases and does not detect citations of English (or any foreign) reporters, nor would it help much if it did, as the case text and metadata will not be in CAP.

Prepare an Excel list of foreign reporters and their common abbreviations
Construct specific regex detectors for foreign reporters
Explore open-source corpora of English reports spanning the relevant time period (1800-1920)

kfunk074 · 2021-11-12T21:37:20Z

Grajzl and Murrell's helpful guide to the English Reports, and how they constructed their database of pre-1765 case reports: http://www.econweb.umd.edu/~murrell/articles/AppendicesMachineCaselawJOIE.pdf

"The source of our data and the starting point for our corpus construction and processing was
a digitized database of English Reports, obtained from Juta and Company (Pty) Ltd (English
Reports (1260-1865), n.d.). The resultant database consists of 129,042 nominate reports of
decisions rendered in the English courts of law between the early 13th century and the mid-19th
century."

kfunk074 · 2021-11-13T00:57:44Z

English Case Reports.xlsx

Here's a start on a database of UK law reports, adapted from the second English edition (1892) of Joseph Story's Commentaries on Equity. It's probably incomplete, but hopefully not very.

kfunk074 · 2021-11-13T01:06:55Z

So with perfect OCR, we can at least use this dataset to match a citation to a UK reporter. To this point I haven't attempted to correct common OCR errors on English reports. To do that, it would be helpful to have the output of our general regex run on the Story volume I mention above.
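A minimal sketch of that matching step, assuming the spreadsheet is exported as a plain list of standard abbreviations. The names `normalizeReporter`, `reporterIndex`, and `lookup` are hypothetical, not part of the existing detector. It only forgives differences in case, spacing, and punctuation; it makes no attempt to correct OCR errors.

```go
package main

import "strings"

// normalizeReporter lowercases an abbreviation and drops periods, commas,
// and whitespace, so that "P. Wms." and "P.Wms" compare equal.
func normalizeReporter(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if r == '.' || r == ',' || r == ' ' || r == '\t' {
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

// reporterIndex maps a normalized abbreviation to its standard form.
// (Hypothetical type: the real whitelist may carry more columns.)
type reporterIndex map[string]string

// newReporterIndex builds the index from the standard abbreviations.
func newReporterIndex(standard []string) reporterIndex {
	idx := make(reporterIndex, len(standard))
	for _, s := range standard {
		idx[normalizeReporter(s)] = s
	}
	return idx
}

// lookup returns the standard abbreviation for a raw reporter string,
// and false if the reporter is not in the list.
func (idx reporterIndex) lookup(raw string) (string, bool) {
	s, ok := idx[normalizeReporter(raw)]
	return s, ok
}
```

Building the index once and calling `lookup` on the reporter portion of each detected citation would label a citation as belonging to a UK reporter. Anything the normalization cannot reconcile falls through as unmatched, which is where the OCR-error output from the Story volume would help.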

Produce a general regex citations output for Story's English Equity, Gale ID: F0105632267

kfunk074 · 2021-11-13T14:32:41Z

Image of a typical page in the English Reports. The plain text is not expensive to acquire. This page makes clear there are two complications posed by the English Reports that we won't usually encounter with American reports: 1) Multiple cases can be reported on a single page, meaning citation "addresses" are not unique. 2) Many private reporters had such limited runs that they produced only one volume, so there is no volume signifier in the standard citation form. Neither of these derails the main project. We will either miss citations to the obscure private reporters or write special-purpose regexes to find them.
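One way the special-purpose regexes for single-volume reporters could look, as a sketch rather than the project's actual detector: build one pattern from a known list of abbreviations and match "Reporter page" with no leading volume number. The function and type names are hypothetical, and the list of single-volume reporters is assumed to come from the reporter spreadsheet.

```go
package main

import (
	"regexp"
	"strings"
)

// singleVolumeCite is a citation with a reporter and page but no volume.
type singleVolumeCite struct {
	Reporter string
	Page     string
}

// compileSingleVolume builds one regex matching "<abbrev> <page>" for each
// abbreviation given. Abbreviations are escaped so that "." is literal.
func compileSingleVolume(abbrevs []string) *regexp.Regexp {
	quoted := make([]string, len(abbrevs))
	for i, a := range abbrevs {
		quoted[i] = regexp.QuoteMeta(a)
	}
	return regexp.MustCompile(`\b(` + strings.Join(quoted, "|") + `)\s+(\d{1,4})\b`)
}

// findSingleVolume returns every volume-less citation found in text.
func findSingleVolume(re *regexp.Regexp, text string) []singleVolumeCite {
	var out []singleVolumeCite
	for _, m := range re.FindAllStringSubmatch(text, -1) {
		out = append(out, singleVolumeCite{Reporter: m[1], Page: m[2]})
	}
	return out
}
```

Restricting the pattern to a known list keeps it from matching ordinary prose such as an abbreviation followed by a number. A citation that does carry a volume number, such as "3 Atk. 210", is left to the general detector as long as its reporter is not in the single-volume list.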

So far as I can tell, there is no CAP equivalent for UK case reports. There are things we could do to create more meaningful connections in the data, but these should all be considered back burner to the main project.

We could "section" the cases into separate texts as we did with the Field Codes. Each text could retain its "address" in the English Reports and we could try to extract the OCR of the private reporter citations with which each report begins.
The English Report volumes are divided up by jurisdiction (King's Bench, Chancery, Exchequer, etc.) and then run chronologically. An RA could prepare a database of court personnel and corresponding dates. We could then track decisional law by court and jurist as we can with CAP.
Grajzl and Murrell are trying to topic model this corpus to death. I'll get in touch to see what if anything they've done to think about citations.

lmullen · 2022-04-22T19:49:53Z

@kfunk074 Two questions about the status of this one.

Any more (much more?) to be done to create as complete a list of English reporters as reasonable?
Any reason to think these won't be picked up by our general Go cite detector? In other words, the problem isn't detection but analysis?

kfunk074 · 2022-04-22T20:03:30Z

I don’t know what I don’t know. I think it’s a pretty extensive list, and I don’t know where to look to find more, though there may well be more out there. Many are single-volume, but that’s the only hang-up to finding them with a general regex search.

kfunk074 · 2022-08-16T15:41:15Z

For future reference, this database might be helpful as a UK CAP alternative. Have yet to suss out how comprehensive it is: https://swarb.co.uk/its-what-we-do/

lmullen · 2022-08-26T22:14:52Z

We have essentially detected the British citations, unless there are some reporters that fall outside the 1 Reporter 123 pattern. What we need is a process to reconcile them to useful information parallel to CAP.

kfunk074 · 2023-02-27T18:42:47Z

Not sure how I missed this before. A complete database of the English Reports appears to be here: http://www.commonlii.org/uk/cases/EngR/

It appears there are hand-keyed parallel citations that could link to our detected cases and allow us to extract at least the dates of the decisions. I'll see if they can share their datafiles.

kfunk074 · 2023-07-11T19:46:43Z

Behold, the English Reports. Turns out each case has one and only one parallel cite, so no extra table needed for that. The second table here matches up volume number to court jurisdiction. We have the full text too, just not in table form yet. Low priority to get full text I would think.
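A sketch of how the volume-to-court table might be used once loaded. It assumes the CSV has a header row followed by two columns, volume number then court, which is a guess at the layout rather than the file's confirmed schema. The rows are taken as already parsed (for example by `encoding/csv`), and `courtsByVolume` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
)

// courtsByVolume turns parsed CSV rows into a lookup from English Reports
// volume number to court. It assumes rows[0] is a header and that each
// later row is [volume, court]; both are assumptions about the file.
func courtsByVolume(rows [][]string) (map[int]string, error) {
	courts := make(map[int]string)
	for i, row := range rows {
		if i == 0 {
			continue // skip the header row
		}
		if len(row) < 2 {
			return nil, fmt.Errorf("row %d: want 2 columns, got %d", i+1, len(row))
		}
		vol, err := strconv.Atoi(row[0])
		if err != nil {
			return nil, fmt.Errorf("row %d: bad volume %q: %w", i+1, row[0], err)
		}
		courts[vol] = row[1]
	}
	return courts, nil
}
```

With this map, the volume number in an official English Reports citation is enough to attach a court to a case without touching the full text.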

Edit: File too big. Download the csv here.

english_reports_courts_by_volume.csv

kfunk074 · 2023-07-23T19:49:25Z

A few pointers, as I review Phil's data:

The reporter_standard entries in the whitelist now match exactly the reporter abbreviations used in the English Reports. A "raw" MOML citation should match exactly the official or nominate citation from the English Reports.
The English Reports give one and only one nominate citation for each official citation. I don't know if that's historically accurate, but I have no evidence to doubt it either, so for now we can just embrace the simplicity.
The English Reports are comprehensive through 1866 and sporadic until 1877. An entirely different set of reports, the Law Times Weekly, became the official reporter in the 1870s. I'm working with law librarians to see if a structured database of the Law Times is available, but just to be clear: the English Reports cover UK cases from 1200 to about 1870. They will only account for a fraction (half? a third?) of all cites our whitelist labels "UK." But they're comprehensive and influential, and the metadata is useful, so they are well worth plugging in now while we wait to see if anything comes of the Law Times.
The good and bad news is that the metadata is far less extensive than CAP's, and the corpus far smaller. Hopefully that helps with linking. There are three tables in the drive folder linked above: The data on each case in the reports, a table of jurisdictions by volume (the printed English Reports are organized chronologically by jurisdiction), and a table of full text reports keyed to each case id (being ironed out by Phil as of 7/23 but nearly complete). We don't need to import the full text if we don't want to burden the server with a bunch of data we're not going to use for the foreseeable future.
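The linking step described in these pointers could be sketched as an exact-match lookup keyed on both citation forms. The struct fields and function names below are hypothetical and do not reflect the actual column names in the drive folder. Because several cases can share one page, each citation string maps to a list of candidate cases rather than a single case.

```go
package main

// erCase holds the per-case metadata needed for linking.
// Field names are illustrative, not the table's real columns.
type erCase struct {
	ID       string
	Official string // citation to the English Reports themselves
	Nominate string // citation to the original private reporter
	Year     int
}

// citeIndex maps a citation string to every case printed at that address.
type citeIndex map[string][]erCase

// buildCiteIndex indexes each case under its official and nominate cites.
func buildCiteIndex(cases []erCase) citeIndex {
	idx := make(citeIndex)
	for _, c := range cases {
		idx[c.Official] = append(idx[c.Official], c)
		if c.Nominate != c.Official {
			idx[c.Nominate] = append(idx[c.Nominate], c)
		}
	}
	return idx
}

// reconcile returns the candidate cases for a raw detected citation.
// It relies on exact string equality, so the raw citation must already
// use the standardized reporter abbreviation.
func (idx citeIndex) reconcile(raw string) []erCase {
	return idx[raw]
}
```

A result with more than one candidate is a citation that lands on a page shared by several cases. Those could still yield a date when all candidates share a year, and otherwise would need to be flagged as ambiguous.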

kfunk074 · 2023-10-25T03:48:40Z

The complete, clean, final, and godly English Reports are here: https://drive.google.com/drive/folders/1QpwUQHIxzAJdeUG15CdNPioT5HBilyKY?usp=sharing

The csv file contains everything described above as well as the clean years, titles, and wordcounts from Peter Murrell's data. This is ready to integrate when you're ready to tackle the integration.

kfunk074 self-assigned this Nov 12, 2021

This was referenced Nov 13, 2021

Correct obvious OCR errors in the pre-processing stage (Go) #36

Closed

Kellen's wish list #41

Open

lmullen changed the title ~~Track English/British/Foreign Citations~~ Reconcile English/British/foreign citations Aug 26, 2022

kfunk074 mentioned this issue Jul 11, 2023

Paper for OWCAL #83

Open
