Vague suggestion: Utilities for parsing strings #166

Aran-Fey · 2025-01-14T02:20:06Z

Parsing strings has turned out to be unexpectedly challenging. I've spent the last hour trying to figure out what text encoding is used in multipart form data, and I still haven't got a clue. There's no Content-Type/charset header anywhere to be found, and some sources say it's utf8 while others say HTTP requests are ISO-8859-1.

So, it would be very nice if this module had builtin support for parsing strings. Ideally in a way that works together with the ListTarget so that we can also parse lists of strings.

siddhantgoel · 2025-01-14T18:05:52Z

I think the encoding might depend on a bunch of different things, at least going by the RFC. Could you post the raw request body that you're working with?

Aran-Fey · 2025-01-14T20:35:19Z

Here's an example request where the file_names parameter should be "Eine größere Textdatei.txt":

b'------geckoformboundaryc1734bfb1ebb04d62438bb4100c2be6\r\nContent-Disposition: form-data; name="file_names"\r\n\r\nEine gr\xc3\xb6\xc3\x9fere Textdatei.txt\r\n------geckoformboundaryc1734bfb1ebb04d62438bb4100c2be6\r\nContent-Disposition: form-data; name="file_types"\r\n\r\ntext/plain\r\n------geckoformboundaryc1734bfb1ebb04d62438bb4100c2be6\r\nContent-Disposition: form-data; name="file_sizes"\r\n\r\n17\r\n------geckoformboundaryc1734bfb1ebb04d62438bb4100c2be6\r\nContent-Disposition: form-data; name="file_streams"; filename="Eine gr\xc3\xb6\xc3\x9fere Textdatei.txt"\r\nContent-Type: text/plain\r\n\r\nM\xc3\xa4use \xc3\xbcberleben\r\n------geckoformboundaryc1734bfb1ebb04d62438bb4100c2be6\r\nContent-Disposition: form-data; name="dummy"\r\n\r\ndummy\r\n------geckoformboundaryc1734bfb1ebb04d62438bb4100c2be6--\r\n'

It seems to use utf-8, which matches my website's document.characterSet. Not sure if that's a coincidence or not.
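For anyone who wants to check this themselves, here is a minimal sketch that decodes the file_names value from the request above under both candidate encodings. The variable names are only illustrative, and this says nothing about what other browsers or clients send.

```python
# Bytes of the file_names field, copied from the request body above.
raw = b"Eine gr\xc3\xb6\xc3\x9fere Textdatei.txt"

# Decoding as UTF-8 gives back the expected file name.
as_utf8 = raw.decode("utf-8")

# ISO-8859-1 maps every byte to a code point, so decoding never fails,
# but each two-byte UTF-8 sequence turns into two unrelated characters.
as_latin1 = raw.decode("iso-8859-1")

print(as_utf8)
print(repr(as_latin1))
```

Since ISO-8859-1 never raises a decoding error, a failed decode can't be used to tell the two apart; only the UTF-8 result matches the expected name.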

siddhantgoel · 2025-01-15T16:34:44Z

I guess if there's no hint anywhere as to how the browser/client encoded the data, it's hard to say how the string should be obtained on the server side. The RFC has the following piece of text that I found relevant.

In practice, many widely deployed implementations do not supply a
charset parameter in each part, but rather, they rely on the notion
of a "default charset" for a multipart/form-data instance.
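To make the "default charset" idea concrete, here is a rough sketch of what a decoding helper could look like. It is plain Python and not part of this library; `decode_field`, `decode_fields`, and the UTF-8 default are assumptions made for illustration only.

```python
from typing import Iterable, List, Optional


def decode_field(
    raw: bytes,
    part_charset: Optional[str] = None,
    default_charset: str = "utf-8",  # assumed default, not mandated by this thread
) -> str:
    """Decode one form field, preferring the part's own charset if it has one."""
    charset = part_charset or default_charset
    return raw.decode(charset)


def decode_fields(
    raws: Iterable[bytes],
    default_charset: str = "utf-8",
) -> List[str]:
    """Decode several values of the same field, e.g. values collected in a list."""
    return [decode_field(raw, default_charset=default_charset) for raw in raws]


# Values taken from the request body posted earlier in this thread.
print(decode_field(b"M\xc3\xa4use \xc3\xbcberleben"))
print(decode_fields([b"text/plain", b"Eine gr\xc3\xb6\xc3\x9fere Textdatei.txt"]))
```

The caller would still have to decide what the default is, for example the charset the page itself was served with.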
