This module provides a fast incremental non-blocking parser for RFC7578
multipart/form-data
, as well as blocking alternatives for easier use in
WSGI or CGI applications:
PushMultipartParser
: Incremental and non-blocking (SansIO) parser suitable for ASGI, asyncio and other time or memory constrained environments.MultipartParser
: Streaming parser that yields memory- or disk-bufferedMultipartPart
instances.parse_form_data(environ)
andis_form_request(environ)
: Convenience functions for WSGI applications with support for bothmultipart/form-data
andapplication/x-www-form-urlencoded
form submissions.
pip install multipart
- Pure python single file module with no dependencies.
- Well tested with inputs from actual browsers and HTTP clients. 100% test coverage.
- Parses multiple GB/s on modern hardware (see benchmarks).
- Quickly rejects malicious or broken inputs and emits useful error messages.
- Enforces configurable memory and disk resource limits to prevent DoS attacks.
Scope: This parser implements multipart/form-data
as defined by HTML5
and RFC7578 and aims to support all browsers or HTTP clients in use today.
Legacy browsers are supported to some degree, but only if those workarounds do
not impact performance or security. In detail this means:
- Just
multipart/form-data
, not suitable for email parsing. - No
multipart/mixed
support (deprecated in RFC7578). - No
base64
orquoted-printable
transfer encoding (deprecated in RFC7578). - No
encoded-word
orname=_charset_
encoding markers (deprecated in HTML5). - No support for clearly broken clients (e.g. invalid line breaks or headers).
Here are some basic examples for the most common use cases. There are more parameters and features available than shown here, so check out the docstrings (or your IDEs built-in help) to get a full picture.
For WSGI application developers we strongly suggest using the parse_form_data
helper function. It accepts a WSGI environ
dictionary and parses both types
of form submission (multipart/form-data
and application/x-www-form-urlencoded
)
based on the actual content type of the request. You'll get two MultiDict
instances in return, one for text fields and the other for file uploads:
from multipart import parse_form_data, is_form_request
def wsgi(environ, start_response):
if is_form_request(environ):
forms, files = parse_form_data(environ)
title = forms["title"] # type: string
upload = files["upload"] # type: MultipartPart
upload.save_as(...)
Note that form fields that are too large to fit into memory will end up as
MultipartPart
instances in the files
dict instead. This is to protect
your app from running out of memory or crashing. MultipartPart
instances are
buffered to temporary files on disk if they exceed a certain size. The default
limits should be fine for most use cases, but can be configured if you need to.
See MultipartParser
for details.
Most WSGI web frameworks already have multipart functionality built in, but you may still get better throughput for large files (or better limits control) by switching parsers:
forms, files = multipart.parse_form_data(flask.request.environ)
If you are in the unfortunate position to have to rely on CGI, but can't use
cgi.FieldStorage
anymore, it's possible to build a minimal WSGI environment
from a CGI environment and use that with parse_form_data
. This is not a real
WSGI environment, but it contains enough information for parse_form_data
to do its job. Do not forget to add proper error handling.
import sys, os, multipart
environ = dict(os.environ.items())
environ['wsgi.input'] = sys.stdin.buffer
forms, files = multipart.parse_form_data(environ)
The parse_form_data
helper may be convenient, but it expects a WSGI
environment and parses the entire request in one go before it returns any
results. Using MultipartParser
directly gives you more control and also
allows you to process MultipartPart
instances as soon as they arrive:
from multipart import parse_options_header, MultipartParser
def wsgi(environ, start_response):
content_type, params = parse_options_header(environ["CONTENT_TYPE"])
if content_type == "multipart/form-data":
stream = environ["wsgi.input"]
boundary = params["boundary"]
charset = params.get("charset", "utf8")
parser = MultipartParser(stream, boundary, charset)
for part in parser:
if part.filename:
print(f"{part.name}: File upload ({part.size} bytes)")
part.save_as(...)
elif part.size < 1024:
print(f"{part.name}: Text field ({part.value!r})")
else:
print(f"{part.name}: Test field, but too big to print :/")
The MultipartParser
handles IO and file buffering for you, but relies on
blocking APIs. If you need absolute control over the parsing process and want to
avoid blocking IO at all cost, then have a look at PushMultipartParser
, the
low-level non-blocking incremental multipart/form-data
parser that powers
all the other parsers in this library:
from multipart import PushMultipartParser, MultipartSegment
async def process_multipart(reader: asyncio.StreamReader, boundary: str):
with PushMultipartParser(boundary) as parser:
while not parser.closed:
chunk = await reader.read(1024*64)
for result in parser.parse(chunk):
if isinstance(result, MultipartSegment):
print(f"== Start of segment: {result.name}")
if result.filename:
print(f"== Client-side filename: {result.filename}")
for header, value in result.headerlist:
print(f"{header}: {value}")
elif result: # Result is a non-empty bytearray
print(f"[received {len(result)} bytes of data]")
else: # Result is None
print(f"== End of segment")
Code and documentation are available under MIT License (see LICENSE).