fix: add export to xml and html (#17)
* added the XML export

Signed-off-by: Peter Staar <[email protected]>

* reformatted all

Signed-off-by: Peter Staar <[email protected]>

* fixed tests

Signed-off-by: Peter Staar <[email protected]>

* added the DocumentTokens class

Signed-off-by: Peter Staar <[email protected]>

* updating the to-xml method

Signed-off-by: Peter Staar <[email protected]>

* updating the to-xml method

Signed-off-by: Peter Staar <[email protected]>

* fixed the to-md method

Signed-off-by: Peter Staar <[email protected]>

* added the strict-text in the to-md method

Signed-off-by: Peter Staar <[email protected]>

* added page-tokens

Signed-off-by: Peter Staar <[email protected]>

* updated the location/page tokens

Signed-off-by: Peter Staar <[email protected]>

* small fix to have correct special document-tokens

Signed-off-by: Peter Staar <[email protected]>

* reformatted the code

Signed-off-by: Peter Staar <[email protected]>

---------

Signed-off-by: Peter Staar <[email protected]>
PeterStaar-IBM authored Sep 18, 2024
1 parent 2f55d92 commit 9bc256e
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docling_core/types/doc/document.py
@@ -410,21 +410,21 @@ def get_special_tokens(
         special_tokens = [token.value for token in cls]

         # Adding dynamically generated row and col tokens
-        for i in range(0, max_rows):
+        for i in range(0, max_rows + 1):
             special_tokens += [f"<row_{i}>", f"</row_{i}>"]

-        for i in range(0, max_cols):
+        for i in range(0, max_cols + 1):
             special_tokens += [f"<col_{i}>", f"</col_{i}>"]

         for i in range(6):
             special_tokens += [f"<section-header-{i}>", f"</section-header-{i}>"]

         # Adding dynamically generated page-tokens
-        for i in range(0, max_pages):
+        for i in range(0, max_pages + 1):
             special_tokens.append(f"<page_{i}>")

         # Adding dynamically generated location-tokens
-        for i in range(0, max(page_dimension[0], page_dimension[1])):
+        for i in range(0, max(page_dimension[0] + 1, page_dimension[1] + 1)):
             special_tokens.append(f"<loc_{i}>")

         return special_tokens
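
The four changed lines all appear to address the same off-by-one: Python's `range(0, n)` stops at `n - 1`, so a token for the maximum index itself (for example `<page_3>` when `max_pages` is 3) was never generated. The sketch below illustrates that effect in isolation. `indexed_tokens` and its `inclusive` flag are hypothetical names used only for this illustration and are not part of `docling_core`; reading the `max_*` arguments as inclusive upper bounds is an assumption inferred from the diff.

```python
from typing import List


def indexed_tokens(prefix: str, max_index: int, inclusive: bool) -> List[str]:
    """Hypothetical helper: build <prefix_i> tokens for i = 0 .. max_index.

    With inclusive=False the loop mirrors the old range(0, max_index);
    with inclusive=True it mirrors the new range(0, max_index + 1).
    """
    stop = max_index + 1 if inclusive else max_index
    return [f"<{prefix}_{i}>" for i in range(0, stop)]


before = indexed_tokens("page", 3, inclusive=False)
after = indexed_tokens("page", 3, inclusive=True)

print(before)  # ['<page_0>', '<page_1>', '<page_2>']
print(after)   # ['<page_0>', '<page_1>', '<page_2>', '<page_3>']

# The location-token bound max(w + 1, h + 1) is equivalent to max(w, h) + 1.
w, h = 100, 50
print(max(w + 1, h + 1) == max(w, h) + 1)  # True
```

Under that assumption, the fix matters whenever an element sits exactly on the last row, column, page, or coordinate, since its token would otherwise be missing from the special-token list.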