Support for various document upload formats #571
There are multiple dataprep components here (not yet supported by the Helm chart deployment). Do any of them satisfy your requirement?
Discussed this recently with Padma. @yongfengdu The DocSum application currently supports PDF, DOCX, audio, and MP4 video formats. However, while the data-prep service may also support images, the DocSum app does not currently use the data-prep service. P.S. This ticket would be more relevant for the Comps (or Examples) repo, where such support is implemented, than for this (k8s integration) one.
This is not just a feature request for an upload data format. The embedding, retrieval, ranking, and LLM components would all need to support it. It's a huge feature.
I think Padma was referring to summarization of its OCR'ed text content. @Padmaapparao?
Need to be able to upload "forms" as text or Word documents, scanned images, PDF documents, JSON files, JPEG/PNG images, MP4 and other video clips, and audio clips for Q&A, summarization, etc. with OPEA RAG.
The pipeline should be able to consume any type of upload and extract its content (chunk...)
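A minimal sketch of the requested behavior, assuming a routing step that picks an extractor per file type before the extracted text is chunked. The route names (`plain_text`, `image_ocr`, `speech_to_text`, and so on) and the helpers `route_for` and `chunk_text` are hypothetical and are not OPEA's dataprep API; real extractors (PDF parsing, OCR, speech-to-text) would plug in behind each route.

```python
from pathlib import Path

# Hypothetical mapping from file extension to an extraction route.
# These names are illustrative only, not identifiers used by OPEA.
EXTRACTION_ROUTES = {
    ".txt": "plain_text",
    ".json": "json_fields",
    ".docx": "docx_parser",
    ".pdf": "pdf_text_or_ocr",   # text layer first, OCR for scanned pages
    ".png": "image_ocr",
    ".jpg": "image_ocr",
    ".jpeg": "image_ocr",
    ".mp3": "speech_to_text",
    ".wav": "speech_to_text",
    ".mp4": "video_audio_to_text",  # extract the audio track, then transcribe
}


def route_for(filename: str) -> str:
    """Return the extraction route for an upload, or raise if unsupported."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTRACTION_ROUTES[suffix]
    except KeyError:
        raise ValueError(f"unsupported upload format: {suffix or '<none>'}") from None


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted text into fixed-size character chunks that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Routing by extension keeps the sketch dependency-free; a production service would more likely inspect the file content as well, since extensions can be missing or wrong.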