As a first step in the extraction process, we need to scope the different formats that Ex. 21 documents come in, so we know which layouts to try extracting from first. Create a document with the following:
Once we've conducted this overview, we'll have a better idea of how to create a representative sample of documents to begin extracting from, or which layout to start with.
Began creating a Google Doc to track this. So far the main issue is that sometimes the Ex. 21 data is inside an HTML `<table>`, while other times it is plain text in the HTML body. Even when using pandas `read_html` to read from an HTML table, the result needs some serious reformatting to get the schema right. This might require categorizing the documents into the different formats/schemas.
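A rough sketch of what a first-pass categorization could look like, using only the standard library. The names `classify_layout`, `_TableCounter`, the `min_rows` threshold, and the two labels `"html_table"` and `"body_text"` are all hypothetical and just for illustration; the real set of categories would come out of the scoping doc.

```python
from html.parser import HTMLParser


class _TableCounter(HTMLParser):
    """Counts <table> elements and the <tr> rows found inside them."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.rows = 0
        self._depth = 0  # nesting depth of currently open <table> tags

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TABLE> is handled too.
        if tag == "table":
            self.tables += 1
            self._depth += 1
        elif tag == "tr" and self._depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1


def classify_layout(html: str, min_rows: int = 2) -> str:
    """Label a document as 'html_table' or 'body_text'.

    Assumption: a document counts as table-based if it has at least one
    <table> containing min_rows or more <tr> rows in total. The threshold
    is a guess and would need tuning against real filings.
    """
    counter = _TableCounter()
    counter.feed(html)
    if counter.tables and counter.rows >= min_rows:
        return "html_table"
    return "body_text"


# Made-up example documents, not real filings.
table_doc = (
    "<html><body><table>"
    "<tr><td>Sub A</td><td>Delaware</td></tr>"
    "<tr><td>Sub B</td><td>Texas</td></tr>"
    "</table></body></html>"
)
body_doc = "<html><body><p>Sub A (Delaware)</p><p>Sub B (Texas)</p></body></html>"

print(classify_layout(table_doc))
print(classify_layout(body_doc))
```

Documents labeled `"html_table"` would be the candidates for `read_html`, and the rest would need a text-based approach. This only separates the two cases described above; it says nothing about the schema differences within the table-based documents.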