Very large form conversion fails due to timeout #171
Comments
6 minutes is longer than the default TCP timeout interval of 5 minutes, which is probably why. Possibly the solution is to change the |
Closing after discussion with @lognaturel. We can always reopen if it's still an issue. |
I've tried with v2024.2.1 on three large forms: 15.6MB, 11.9MB, and 9.7MB. All three upload, convert, and preview just fine. |
Just ran into a 496KB form with 6728 rows that uploads but fails conversion with a 504.
I can confirm that increasing proxy_read_timeout in files/nginx/odk.conf.template from 2m to 5m works. Any danger in making that the default? |
That's interesting to me, as the proxy_read_timeout is set on Backend (paths that start with /v1). To me, that means that the timeout isn't happening in the communication between Backend and pyxform-http, but between the client and Backend. I wonder how much time pyxform is taking to convert the form vs. how much time Backend is taking to do other things like creating the form schema. (I'm reminded a little of getodk/central-backend#662.)
I don't know the answer to that offhand, but it's certainly something we could look into. We've seen a 504 in another issue recently as well, #691.
Just wanted to note that I'm a little surprised by this size! If you've been able to upload a 15MB form, I'm surprised that 496KB would be a problem. |
I think that's a great question. @yanokwa could you please send the form to me and @lindsay-stevens so we can see what the pyxform perf is like? |
File size can be unrelated to the amount of useful data. For example, having lots of direct cell formatting can result in lots of "styles" saved for the document, or there could be lots of comments, or used-but-empty cells beyond the main table range, or other XLSX features. File size can also be unrelated to XLSForm complexity - it could be a very large but simple form, or a smaller and complex one. For pyxform, complexity generally means lots of things that have to be parsed, checked, cross-referenced / looked up: e.g. pyxform reference replacements (like
As for the 496KB form with 6728 rows, it has: ~1000 answerable questions, ~2000 notes, ~500 calculate items, ~1000 groups (up to 4 levels of nesting), ~2000 constraints, ~2000 pyxform references, ~50 choice lists with a total ~500 options. No repeats, media, or translations. Uncompressed file size is ~4MB, of which ~2MB is the survey sheet, and ~200KB for ~400 document comments. On the survey sheet, when saved as CSV (excluding
When I run XLS2XForm conversion with pyxform master (b65e7277 ~v2.2.0), the 496KB file takes ~5 minutes. It takes ~6 seconds to read the data and prepare the initial internal form structure, and the rest is spent in the
So, definitely room for improvement in pyxform in regards to processing large forms. These kinds of issues don't become obvious until there's a certain quantity of e.g. cross references, groups, nesting levels, etc. Although I think the docs advise against designing large forms? Or at least limiting complexity. |
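For anyone wanting to check where conversion time goes on their own form, profiling is one option. This is a minimal sketch using Python's standard `cProfile` and `pstats`. The `convert` and `slow_lookup` functions are hypothetical stand-ins for a conversion that resolves many references with a linear search. They are not pyxform code.

```python
import cProfile
import io
import pstats


def slow_lookup(names, target):
    # Linear scan: stands in for a per-reference search over all elements.
    return names.index(target)


def convert(names):
    # Hypothetical stand-in for a conversion step that resolves every reference.
    return [slow_lookup(names, name) for name in names]


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


names = [f"q{i}" for i in range(2000)]
result, report = profile_call(convert, names)
print(report)
```

In the report, a function with a high call count and a large share of cumulative time is the usual sign of a lookup repeated once per reference.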
Super useful information. Thank you, @lindsay-stevens! One thing I'm wondering is whether we should wait to see whether changes to pyxform would improve the situation, or whether we should go ahead and up the Central timeout. I'm also wondering, could different endpoints have different timeouts? Maybe it's not a bad idea to keep the timeout lower for most endpoints, but I also don't see much harm in increasing the timeout for the form upload endpoint. I also want to make sure I understood what @yanokwa wrote here:
Does this mean that Central was able to create the form eventually? That is, sometime after the 504 was received, you could see the form in the form list? I think it's probably surprising to users that an error message doesn't mean that the request failed. If others think so too, I could file a separate issue about that. We could change how that error is messaged on Frontend ("Your request took too long, and Central stopped waiting for it. It may eventually complete."). Or we could set up Backend to actually stop working and roll back the database transaction after 2 minutes (or whatever we decide for the request timeout). That way, an error response would mean that the request was unsuccessful / had no effect. |
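The second option above (an error response means the request had no effect) can be illustrated with a small sketch. Python and SQLite are stand-ins here, not Central's actual stack. The `create_form` helper, the `forms` table, and the timeout values are all hypothetical. The sketch only checks the deadline after the slow step returns. It does not interrupt work that is still running.

```python
import sqlite3
import time


class RequestTimedOut(Exception):
    """Raised when the work overran the request deadline."""


def create_form(conn, form_id, slow_step, timeout_s):
    # Hypothetical helper: returns True only if the form row was committed.
    deadline = time.monotonic() + timeout_s
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO forms (id) VALUES (?)", (form_id,))
            slow_step()  # stands in for conversion / schema creation
            if time.monotonic() > deadline:
                raise RequestTimedOut(form_id)
    except RequestTimedOut:
        return False
    return True


def count_forms(conn):
    return conn.execute("SELECT COUNT(*) FROM forms").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (id TEXT PRIMARY KEY)")
conn.commit()

fast_ok = create_form(conn, "small", lambda: None, timeout_s=5.0)
slow_ok = create_form(conn, "large", lambda: time.sleep(0.05), timeout_s=0.01)
```

With this shape, the row inserted for the slow request is rolled back, so a failed response and an absent form always go together.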
Is that maybe a typo? Given the calls you list, I'd expect that to be higher.
That's a good idea.
That's a little bit of a terrifying message? But I can't immediately think of something better... |
Nope, I never saw the form after the 504.
I'd prefer to wait to see how quickly @lindsay-stevens can get a performance increase shipped. We have proxy_read_timeout as workaround if someone needs it. |
Validate really took 6 seconds. It's pyxform doing some weird stuff. Currently working on a branch here - that form is now producing the same output in about 20 seconds, including Validate time. The first commit tidied up some dict management and cached each element's XPath which brought it to about 45s total. The second commit is starting to tackle the main issue which seems to be
@yanokwa / @lognaturel let me know if you want to ship this pronto and I'll prioritise adding test cases (so far just been using that form and cross checking with existing test suite). Otherwise I could keep going to optimise for large forms with repeats and many references - might help XLSForm/pyxform#453 as well. |
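To illustrate the general idea of caching each element's XPath, here is a minimal sketch. The `Element` class is hypothetical and is not pyxform's actual class or the code on the branch. It assumes elements are not renamed or moved after the first lookup, since the cached value would then go stale.

```python
from functools import cached_property


class Element:
    """Hypothetical form element; not pyxform's actual class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    @cached_property
    def xpath(self):
        # Computed once per element, then stored on the instance, so
        # repeated lookups do not walk the parent chain again.
        if self.parent is None:
            return "/" + self.name
        return self.parent.xpath + "/" + self.name


root = Element("data")
group = Element("household", root)
question = Element("age", group)
print(question.xpath)
```

Without the cache, every reference lookup would rebuild the path by walking up through all ancestor groups, which adds up with thousands of references and several levels of nesting.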
🕺 💃 🪩
Let's keep going a bit and see what you find! I think we'll aim to release |
Let's close this issue for now. If we get more similar form conversion timeouts, let's look at them from a |
It sounds like the request actually did fail for @yanokwa, so maybe we don't need to make any big change here. I still think a better error message would be nice though, something other than "Something went wrong: error code 504." I have a PR up for that at getodk/central-frontend#1052.
If 504s do sometimes fail, then maybe we don't need to worry our users unnecessarily. I've removed the second sentence. |
When I upload a 1.6MB XLSForm on the test server, it fails with "Something went wrong: the server returned an invalid error."
The pyxform and service containers don't show any errors, but the nginx container does.
The XML version uploads just fine.
Obviously something is timing out, but I don't know yet where the timeout is happening. Findings so far...