Additional data set (potential) available Bexley, OH PD #20

Open
spencer-brooks opened this issue Mar 18, 2022 · 2 comments

Comments

@spencer-brooks
Contributor

Bexley, OH is Spencer Brooks's hometown.

Data available at . The below two links are included on this page.

  • Offense Reports:
  • Accident Reports:

Potential challenges:

  • The dataset is available as electronic PDFs (the underlying HTML is accessible but might not be structured in a useful way), so parsing it would likely require significant new architecture
@mendible
Collaborator

mendible commented Nov 5, 2022

@spencer-brooks can you share how you got to the underlying HTML? Curious if this is possible for the weekend

@spencer-brooks
Contributor Author

I opened the PDF in a browser and clicked "Inspect". I also used the Python package PyPDF2, which seems like a good utility (it bypasses the HTML and goes straight to text, though it doesn't preserve all non-text elements). I wonder if there's an in-between option that allows extracting the HTML itself programmatically.

Example code:

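import PyPDF2

# Names below are for PyPDF2 >= 2.0; older releases used
# PdfFileReader / numPages / getPage / extractText instead.
with open('data.pdf', 'rb') as pdf_file:  # binary mode; closed automatically
    reader = PyPDF2.PdfReader(pdf_file)
    # Extract text page by page, separating pages with a blank line.
    text = "\n\n".join((page.extract_text() or "") for page in reader.pages)

print(text)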

Example PyPDF2 output:

 Bexley Police Department Page: 1    
 Case Report 05/03/2022
            Incident #: 22BEX-4158-OF
                Call #:   22-4158
 Date/Time Reported:  05/01/2022 1821
   Report Date/Time:  05/01/2022 2013
             Status:  Incident Open
  Reporting Officer:  Patrol Officer Mellison Davis
          Detective:  Detective Darren Briley
  Approving Officer:  Patrol Sergeant Robert Holdren
          Signature:  ______________________________
          Signature:  ______________________________
 #  OFFENSE(S)                                       ATTEMPTED    TYPE       DEGREE                  
 LOCATION TYPE:  SINGLE FAMILY HOME          Zone: North
 236 N MERKLE RD
 BEXLEY OH 43209
 1 Assault                                            N Misdemeanor 1
 2903.13                  L         636.03A                     1
                    FINE: 300.00
       CRIMINAL ACTIVITY: None/Unknown
      WEAPON/FORCED USED: Other Weapon
  AGGR. ASSAULT/HOMICIDE: Other Circumstances
 #  VICTIM(S)                                        SEX RACE       AGE  SSN        PHONE             
 1 ********************************************* F * 15 NOT AVAIL ************
 ******************************
 ******************************
 DOB: ******************************
 EMPLOYER: ******************************  ·  ************
 ETHNICITY: ********************
 VICTIM CONNECTED TO OFFENSE NUMBER(S): 1    
 #  PERSON(S)                        PERSON TYPE     SEX RACE       AGE  SSN        PHONE             
 1 ******************************** PARENT M * 53   *********** ************
 ******************************
 ******************************
 DOB: **********
 2 ******************************** PARENT F * 43   *********** ************
 ******************************
 ******************************
 DOB: **********
 3 ******************************** PARENT F * 50   *********** ************
 ******************************
 ******************************
 DOB: **********
   


 Bexley Police Department Page: 1    
  NARRATIVE FOR PATROL OFFICER MELLISON M DAVIS   
              Ref:   22BEX-4158-OF
   
On 5/1/2022 a Bexley resident advised that their child was assaulted by known parties.   
  ** Portions of this report have been redacted **
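
As a rough illustration of what parsing this output might involve, here is a minimal sketch that pulls a few "Label: value" header fields out of the extracted text with regular expressions. The `SAMPLE` string is copied from the output above; `HEADER_LABELS` and `parse_header` are hypothetical names, and the label list is an assumption based on this single report, so other reports may need different handling.

```python
import re

# Sample header lines copied from the PyPDF2 output above.
SAMPLE = """\
 Bexley Police Department Page: 1
 Case Report 05/03/2022
            Incident #: 22BEX-4158-OF
                Call #:   22-4158
 Date/Time Reported:  05/01/2022 1821
   Report Date/Time:  05/01/2022 2013
             Status:  Incident Open
"""

# Assumption: these labels are taken from the one report shown in this thread.
HEADER_LABELS = [
    "Incident #",
    "Call #",
    "Date/Time Reported",
    "Report Date/Time",
    "Status",
]


def parse_header(text):
    """Return a dict mapping each header label found in `text` to its value."""
    fields = {}
    for label in HEADER_LABELS:
        # The label must start the line (after indentation) so that
        # "Report Date/Time" does not match inside "Date/Time Reported".
        pattern = r"^[ \t]*" + re.escape(label) + r":[ \t]*(.*?)[ \t]*$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            fields[label] = match.group(1)
    return fields


if __name__ == "__main__":
    for label, value in parse_header(SAMPLE).items():
        print(f"{label}: {value}")
```

This only covers the single-line header fields. The OFFENSE(S), VICTIM(S), and PERSON(S) blocks span multiple lines with column-aligned values, so they would likely need a separate, layout-aware approach.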
