This repository has been archived by the owner on Oct 29, 2024. It is now read-only.

feat(document): improve convert to markdown #310

Merged: chuang8511 merged 5 commits into main from chunhao/ins-5953 on Sep 5, 2024

chuang8511 (Member)

Because

  • the current converter does not extract images from PDFs
  • the current converter does not recognise tables or LaTeX

This commit

  • improves the pdfplumber-based PDF conversion and extracts images as base64
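
The description does not show how the extracted image data ends up in the Markdown output. As a minimal sketch, assuming the image bytes are already available and are embedded as a base64 data URI, the encoding step might look like the following. The function name `image_to_markdown`, its parameters, and the data-URI form are illustrative assumptions and are not taken from this PR.

```python
import base64


def image_to_markdown(image_bytes: bytes,
                      mime_type: str = "image/png",
                      alt_text: str = "image") -> str:
    """Return a Markdown image tag that embeds the bytes as a base64 data URI.

    Hypothetical helper: the real converter may emit images differently.
    """
    # b64encode returns bytes; decode to ASCII so it can be placed in text.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"![{alt_text}](data:{mime_type};base64,{encoded})"


if __name__ == "__main__":
    # Placeholder bytes stand in for an image extracted from a PDF page.
    print(image_to_markdown(b"abc"))
```

Embedding the data inline keeps the Markdown self-contained, at the cost of larger output for image-heavy PDFs.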

codecov bot commented Aug 28, 2024

Codecov Report

Attention: Patch coverage is 7.14286% with 65 lines in your changes missing coverage. Please review.

Project coverage is 35.55%. Comparing base (f63657a) to head (492199a).
Report is 24 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| operator/document/v0/pdf_to_markdown_converter.go | 0.00% | 41 Missing ⚠️ |
| operator/document/v0/markdown_transformer.go | 0.00% | 17 Missing ⚠️ |
| ...erator/document/v0/convert_document_to_markdown.go | 30.00% | 7 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #310      +/-   ##
==========================================
- Coverage   37.19%   35.55%   -1.64%     
==========================================
  Files         154      187      +33     
  Lines       19452    24110    +4658     
==========================================
+ Hits         7235     8573    +1338     
- Misses      11161    14216    +3055     
- Partials     1056     1321     +265     

| Flag | Coverage Δ |
|---|---|
| unittests | 35.55% <7.14%> (-1.64%) ⬇️ |

@chuang8511 chuang8511 merged commit b531c13 into main Sep 5, 2024
8 checks passed
@chuang8511 chuang8511 deleted the chunhao/ins-5953 branch September 5, 2024 13:38
jvallesm pushed a commit that referenced this pull request Sep 10, 2024
🤖 I have created a release *beep* *boop*
---


## [0.27.0-beta](v0.26.0-beta...v0.27.0-beta) (2024-09-10)


### Features

* add asana component ([#311](#311)) ([d0e1335](d0e1335))
* add chroma component ([#300](#300)) ([9fa57f6](9fa57f6))
* add Freshdesk component ([#274](#274)) ([b22203f](b22203f))
* **compogen:** add reference to integrations on Setup block ([#323](#323)) ([497f785](497f785))
* **document:** improve convert to markdown ([#310](#310)) ([b531c13](b531c13))
* **document:** improve pdf to markdown ([#320](#320)) ([adae85a](adae85a))
* **instill:** adopt latest Instill Model protobufs ([#315](#315)) ([52b7d78](52b7d78))
* **text:** add table and list concept into markdown chunking ([#317](#317)) ([60d5fdb](60d5fdb))


### Bug Fixes

* **chroma:** change interface ([#319](#319)) ([1da8988](1da8988))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).