
Writing large-ish sheets/XLSX files (30-50 MB) is very slow or never finishes when writing from std::strings #241

Open
og-yona opened this issue Mar 15, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments


og-yona commented Mar 15, 2024

Hello!

I'm working on a project where I need to handle semi-large CSV and XLSX files, and I tried adding OpenXLSX to my project to handle reading and writing the XLSX part.

Reading the example XLSX file (5 sheets) into std::string storage finishes in 7 seconds, which is very nice and fast.

But when I try to write the same data from std::strings to a new XLSX file, the process gets dramatically slower the more data and sheets it has already written. In practice, OpenXLSX was unable to finish writing the data back to XLSX: I waited for 1.5 hours and had to kill the process because it was seemingly stuck writing one sheet.

Writing one cell at a time or one row at a time makes basically no difference.

My problem might be related to this issue: #154

Is there any way to skip the shared-strings checks and just write everything as plain strings? Or does anyone have other tips that might make writing files actually usable when dealing with larger amounts of random string data?


og-yona appended "from std::strings" to the title on Mar 15, 2024.
og-yona (Author) commented Mar 15, 2024

Answering my own question, for future reference in case someone else has the same issue.

Looking around the OpenXLSX source files, I managed to find a workaround, at least for my case:

In XLCellValue.cpp I commented out lines 402, 405 and 409:

```cpp
// ===== Set the type attribute.
m_cellNode->attribute("t").set_value("s");
// ===== Get or create the index in the XLSharedStrings object.
auto index = (m_cell->m_sharedStrings.stringExists(stringValue) ? m_cell->m_sharedStrings.getStringIndex(stringValue)
                                                                : m_cell->m_sharedStrings.appendString(stringValue));
// ===== Set the text of the value node.
m_cellNode->child("v").text().set(index);
```

and uncommented lines 412 and 413 instead:

```cpp
// m_cellNode->attribute("t").set_value("str");
// m_cellNode->child("v").text().set(stringValue);
```

I left the following lines (415-419) commented out, because uncommenting them as well caused problems:

```cpp
// auto s = std::string_view(stringValue);
// if (s.front() == ' ' || s.back() == ' ') {
//     if (!m_cellNode->attribute("xml:space")) m_cellNode->append_attribute("xml:space");
//     m_cellNode->attribute("xml:space").set_value("preserve");
// }
```

Saving my earlier example file now takes less than 30 seconds, which is roughly what I was hoping for.

Edit: I accidentally closed the issue; I'm not sure whether my comment/uncomment tweak actually counts as solving the whole issue.

aral-matrix (Collaborator) commented

I subscribed myself to this issue so I can look into it eventually.
In the case you describe, the shared-strings logic might indeed benefit from maintaining an ordered set in memory and writing the XML only once (from that set) when the document is saved.

@aral-matrix aral-matrix self-assigned this Jan 10, 2025
@aral-matrix aral-matrix added the enhancement New feature or request label Jan 10, 2025

2 participants