You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Apr 14, 2022. It is now read-only.
@RatanShreshtha pointed to this PR: urllib3/urllib3#1715, asking whether it applies to us. I looked into it and realized it involves a larger issue that I don't understand, so I'm moving it from chat into a proper issue :-)
So: Body encoding is tricky. In HTTP/1.0, the only option was to use Content-Length. Of course this caused problems for streaming uploads. So HTTP/1.1 added the option to use Transfer-Encoding: chunked. For response bodies, this is fine – we can handle whichever one the server gives us. But for request bodies, we have a problem: we have to pick which one we're using before we've talked to the server, and we don't know yet whether it's a HTTP/1.0 server or a HTTP/1.1 server!
So in some cases you MUST use Content-Length, because the server doesn't support anything else, and in some cases you MUST use Transfer-Encoding: chunked, because you don't know the length of the data ahead of time and can't afford to generate it all up front in-memory (either because it's too big, or because your semantics actually demand streaming, e.g. consider a Travis worker streaming logs up to the central Travis server). There's no way to automatically detect which case we're in. So, it kind of has to be exposed to the user as an option they can set.
In upstream urllib3, this is done through the chunked=True/False kwarg. In our branch currently, I guess the only way to do it is by explicitly setting the Content-Length or Transfer-Encoding headers? And if you don't set either, then we unconditionally use Transfer-Encoding: chunked, see _add_transport_headers in connectionpool.py.
I don't think our current thing is going to work... there are a lot of cases where using Content-Length is easy and free (e.g. when the body is specified as an explicit string, or a file whose length we can query). And since Content-Length is more broadly compatible, we need to be using it in these cases; anything else will be a regression compared to other HTTP libraries. But there are a bunch of other cases that are trickier, and I'm not sure what we want to do with them.
One thing we definitely know is what the user passed as the body. If there's no body, that's always Content-Length: 0 (or possibly no header at all, for some methods like GET where Content-Length: 0 is the default). If the body is a fixed-length thing we know up front, then that's always Content-Length: <the length>. Or it could be something more complicated, like an arbitrary iterable or a file object that doesn't support seek/tell/fstat.
So one option would be to say that we default to Content-Length in the former cases, and default to Transfer-Encoding: chunked in the latter case, and if you want to override that then you can set the header explicitly.
Another option would be to do like urllib3 does now, and have an explicit chunked= kwarg. Then we have to decide:
If we get chunked=False + and iterable body, what do we do? I guess the options are:
Immediately read the whole iterable into memory, so we can figure out its length
Give an error, saying that the user should either use chunked=True or set a Content-Length
If the user doesn't specify chunked=, what's the default? I guess the options are:
chunked=False
chunked="guess"
This is a 2x2 space of possibilities.
The text was updated successfully, but these errors were encountered:
This makes sense for a higher-level API but we're not at this level yet, so for now I think we can look at the headers and if they're not there just perform a simple auto-detection to see if we have an iterator.
See urllib3/urllib3@3e586ef, we already look at existing headers, and already check if the body exists, so we only need to check if the body is an iterator to avoid chunking when it's not. requests uses this code:
is_stream = (
hasattr(data, '__iter__')
and not isinstance(data, (basestring, list, tuple, Mapping)))
And I guess that's enough for now? I don't think we want to provide the chunked parameter, because advanced users can already specify this differently, and well, urllib3 has too many parameters!
We need to make sure that this plays well with retries: if the first request uses chunked encoding, then we should continue to do so with other requests. This is the bug in urllib3 that prompted urllib3#1715 and urllib#1734.
I'd say the way Requests handles this is the proper technique and has worked well for them. I'll note here that I'd like to change how multipart forms are encoded so a typical use-case for them can rely on Content-Length instead of chunking.
@RatanShreshtha pointed to this PR: urllib3/urllib3#1715, asking whether it applies to us. I looked into it and realized it involves a larger issue that I don't understand, so I'm moving it from chat into a proper issue :-)
So: Body encoding is tricky. In HTTP/1.0, the only option was to use
Content-Length
. Of course this caused problems for streaming uploads. So HTTP/1.1 added the option to useTransfer-Encoding: chunked
. For response bodies, this is fine – we can handle whichever one the server gives us. But for request bodies, we have a problem: we have to pick which one we're using before we've talked to the server, and we don't know yet whether it's a HTTP/1.0 server or a HTTP/1.1 server!So in some cases you MUST use
Content-Length
, because the server doesn't support anything else, and in some cases you MUST useTransfer-Encoding: chunked
, because you don't know the length of the data ahead of time and can't afford to generate it all up front in-memory (either because it's too big, or because your semantics actually demand streaming, e.g. consider a Travis worker streaming logs up to the central Travis server). There's no way to automatically detect which case we're in. So, it kind of has to be exposed to the user as an option they can set.In upstream urllib3, this is done through the
chunked=True/False
kwarg. In our branch currently, I guess the only way to do it is by explicitly setting theContent-Length
orTransfer-Encoding
headers? And if you don't set either, then we unconditionally useTransfer-Encoding: chunked
, see_add_transport_headers
inconnectionpool.py
.I don't think our current thing is going to work... there are a lot of cases where using
Content-Length
is easy and free (e.g. when the body is specified as an explicit string, or a file whose length we can query). And sinceContent-Length
is more broadly compatible, we need to be using it in these cases; anything else will be a regression compared to other HTTP libraries. But there are a bunch of other cases that are trickier, and I'm not sure what we want to do with them.One thing we definitely know is what the user passed as the body. If there's no body, that's always
Content-Length: 0
(or possibly no header at all, for some methods likeGET
whereContent-Length: 0
is the default). If the body is a fixed-length thing we know up front, then that's alwaysContent-Length: <the length>
. Or it could be something more complicated, like an arbitrary iterable or a file object that doesn't supportseek
/tell
/fstat
.So one option would be to say that we default to
Content-Length
in the former cases, and default toTransfer-Encoding: chunked
in the latter case, and if you want to override that then you can set the header explicitly.Another option would be to do like urllib3 does now, and have an explicit
chunked=
kwarg. Then we have to decide:chunked=False
+ and iterable body, what do we do? I guess the options are:chunked=True
or set aContent-Length
chunked=
, what's the default? I guess the options are:chunked=False
chunked="guess"
This is a 2x2 space of possibilities.
The text was updated successfully, but these errors were encountered: