Context
We are building a RAG-based solution that involves ingesting a large number of PDF files. We integrated docling, and it does a great job at PDF parsing, especially table extraction.
Expectation
Docling removes the headers and footers from each file, which is what we needed as part of cleanup.
Feature requirement
In our knowledge base, many files contain table-of-contents (ToC) pages. Ingesting them creates a lot of noise during retrieval. When we spoke with @cau-git, he mentioned that ToC pages could be added as a new category instead of being classified as tables.
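Until a dedicated ToC category exists, a rough post-processing filter might reduce the noise. The sketch below is only a heuristic and does not use any docling API: it assumes the text of each page is already available as a plain string, and it treats a page as ToC-like when most of its lines end in dot leaders followed by a page number. The names `TOC_LINE`, `looks_like_toc`, `drop_toc_pages`, and the `min_ratio` threshold are hypothetical and chosen for illustration.

```python
import re

# Assumed ToC entry shape: a title, dot leaders, then a trailing page number,
# e.g. "1.2 Setup ........ 14". Real documents may need a different pattern.
TOC_LINE = re.compile(r"^\s*\S.*?\.{3,}\s*\d{1,4}\s*$")


def looks_like_toc(page_text: str, min_ratio: float = 0.5) -> bool:
    """Return True when at least min_ratio of the non-empty lines look like ToC entries."""
    lines = [ln for ln in page_text.splitlines() if ln.strip()]
    if not lines:
        return False
    hits = sum(1 for ln in lines if TOC_LINE.match(ln))
    return hits / len(lines) >= min_ratio


def drop_toc_pages(pages: list[str]) -> list[str]:
    """Keep only the pages that do not look like a table of contents."""
    return [p for p in pages if not looks_like_toc(p)]
```

For example, `drop_toc_pages(page_texts)` could be applied to the per-page text before chunking and embedding. Because the check only matches dot-leader lines, ToC pages laid out with plain spacing or as real tables would still slip through, which is why a native ToC category in the parser would be the more reliable fix.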