Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(esp_http_client): Fix host header for IPv6 address literal (IDFGH-14640) #15388

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thelazt
Copy link

@thelazt thelazt commented Feb 13, 2025

An IPv6 IP that occurs in the 'Host:' header of an HTTP request must be enclosed in square brackets

Description

When performing an HTTP request to an IPv6 address via URL, for example

esp_http_client_config_t config = {
    .url = "http://[2001:db8::1]/path",
};

the HTTP request header of the ESP IDF will contain

Host: 2001:db8::1

This IPv6 format is invalid and webservers will refuse to process the request.
The correct host in the header would be

Host: [2001:db8::1]

Background

HTTP RFC 7230 Section 5.4 defines the Host header field as:

Host = uri-host [ ":" port ] ; Section 2.7.1

and explicitly references Section 2.7.1, which specifies that host needs to conform to RFC 3986 Section 3.2.2.

The RFC 3986 Section 3.2.2 mandates that, a host identified by an Internet Protocol literal address, version 6 or later, needs to be enclosed in square brackets within an http URI:

host        = IP-literal / IPv4address / reg-name
IP-literal = "[" ( IPv6address / IPvFuture  ) "]"

Therefore, an IPv6 occuring in the Host header of an HTTP request must be enclosed by square brackets.

Fix

The attached solution will test for a : character in the _get_host_header helper function to determine if the host is an IPv6 address, and in such a case create the header hostname with the address in square brackets.
This is a very simple approach and effectively the same check as GNU Wget uses.

I have also considered alternatives:
One would be testing for an valid IPv6 address (e.g., using inet_pton), however, this would probably add overhead and additional dependencies (lwip).

Another solutions would employ the parser to track if the URL contains an IPv6 literal.
While this would be a rather clean solution, it would require to changes components/http_parser/http_parser.c, in particular tracking the occurance of s_http_host_v6_start.
Such a solution would be similar to Curl, which sets an IPv6 IP flag.
However, for ESP IDF such a solution would mean to adjust the HTTP parser (which, to the best of my knowledge, is an outdated version originating from the now unmaintained Node.js HTTP parser).
Furthermore, it would not handle the case of providing config->host (instead of config->url) for the esp_http_client_init call.

Related

There have been similar issues in other projects, e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=45891

Testing

Without the fix, a request to a lighttpd) server fails.
With the fix, it succeeds.
Using network package capturing, the fixed host header can be seen:
esp_http_client_ipv6_pcap


Checklist

Before submitting a Pull Request, please ensure the following:

  • 🚨 This PR does not introduce breaking changes.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed.
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

Copy link

github-actions bot commented Feb 13, 2025

Messages
📖 🎉 Good Job! All checks are passing!

👋 Hello thelazt, we appreciate your contribution to this project!


📘 Please review the project's Contributions Guide for key guidelines on code, documentation, testing, and more.

🖊️ Please also make sure you have read and signed the Contributor License Agreement for this project.

Click to see more instructions ...


This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modify the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect ...


We do welcome contributions in the form of bug reports, feature requests and pull requests via this public GitHub repository.

This GitHub project is public mirror of our internal git repository

1. An internal issue has been created for the PR, we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved, we synchronize it into our internal git repository.
4. In the internal git repository we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
- At this point we may do some adjustments to the proposed change, or extend it by adding tests or documentation.
5. If the change is approved and passes the tests it is merged into the default branch.
5. On next sync from the internal git repository merged change will appear in this public GitHub repository.

Generated by 🚫 dangerJS against f31a0f7

@github-actions github-actions bot changed the title fix(esp_http_client): Fix host header for IPv6 address literal fix(esp_http_client): Fix host header for IPv6 address literal (IDFGH-14640) Feb 13, 2025
@espressif-bot espressif-bot added the Status: Opened Issue is new label Feb 13, 2025
@safocl
Copy link
Contributor

safocl commented Feb 13, 2025

and additional dependencies (lwip).

isn't lwip part of the network stack for esp32?
https://github.com/espressif/esp-idf/tree/master/components/lwip

@thelazt
Copy link
Author

thelazt commented Feb 14, 2025

isn't lwip part of the network stack for esp32?

Yes, the function in question is implemented in components/lwip/lwip/src/api/sockets.c - my comment referred to the includes of the esp_http_client.c file, but this point may indeed be rather negligible.

My main concern with this alternative approach is whether the higher overhead of inet_pton over strchr is worth it, since this call is made on every HTTP request:

if (host != NULL && host[0] != '[' && strchr(host, ':') != NULL)

vs

char tmp[INET6_ADDRSTRLEN];
if (inet_pton(AF_INET6, host, tmp))

I cannot see any valid real-world cases where this would make a difference, and since GNU Wget uses the same strchr check, I think this simpler and leaner approach is sufficient.

@safocl
Copy link
Contributor

safocl commented Feb 14, 2025

since this is a static function -- maybe it's worth checking somewhere higher in the API?

@safocl
Copy link
Contributor

safocl commented Feb 14, 2025

esp_http_client_config_t config = {
.url = "http://[2001:db8::1]/path",
};

if config with a "http://\[2001:db8::1\]/path" url string?

perhaps it is simply necessary to change the documentation, indicating that the user should monitor the valid value of this entity

@thelazt
Copy link
Author

thelazt commented Feb 14, 2025

if config with a "http://\[2001:db8::1\]/path" url string?

perhaps it is simply necessary to change the documentation, indicating that the user should monitor the valid value of this entity

This does not solve the problem:
Since the URL http://\[2001:db8::1\]/path is invalid, it will not pass the project's HTTP parser: http_parser_parse_url (inside the esp_http_client_set_url call) will return an error and the HTTP client initialization will fail.

For the (valid) URL http://[2001:db8::1]/path, the HTTP parser will return the following values:

  • UF_SCHEMA = http
  • UF_HOST = 2001:db8::1
  • UF_PATH = /path

The value of UF_HOST will be assigned to client->connection_info.host -- it is crucial that the value does not contain square brackets, since it is also used for establishing the transport connection.
We have also verified this part on the ESP: We get errors in the log and no packets recorded by tcpdump.

But there are additional problems along the way:
Even if we extend the HTTP parser by introducing a new HTTP parser URL field that contains the IPv6 address including the square brackets (we have implemented this as well), we have to store this host header value in an additional field in connection_info_t (requiring additional memory to be allocated just to cover the case of an IPv6 address as host).

Furthermore, this will not solve the case of calling the HTTP client with host and path instead of URL, as no HTTP parser will be called.

With

esp_http_client_config_t config = {
    .host = "2001:db8::1",
    .path = "/path"
};

the _get_host_header will not contain the square brackets.

If we use

esp_http_client_config_t config = {
    .host = "[2001:db8::1]",
    .path = "/path"
};

instead, the HTTP header would be correct, but this config will not succeed in establishing a connection to the host (because esp_transport does not handle hosts with square brackets).

since this is a static function -- maybe it's worth checking somewhere higher in the API?

With the findings from above, I am convinced that there is no appropriate place to integrate this somewhere higher in the API.
Also, I don't think there is any way to send a proper HTTP request using an IPv6 address with the current ESP IDF, so tweaking the documentation will not prevent this problem.

Instead, this is a bug in the code that is probably unlikely to occur often (as some web servers may be quite forgiving of bad headers, and using IPv6 addresses for HTTP is not that common).
However, since we discovered the bug in real-world scenarios, and the proposed fix is quite simple and cheap (in terms of overhead), I think it is worth considering merging it.

By the way: To test the results of the HTTP parser, one can use the following snippet - with URLs to test as CLI parameters.

#include <stdio.h>
#include <string.h>
#include "http_parser.h"

#define xstr(s) str(s)
#define str(s) #s

int main(int argc, const char * argv[]) {
	printf("HTTP Parser version: %d.%d.%d\n", HTTP_PARSER_VERSION_MAJOR, HTTP_PARSER_VERSION_MINOR, HTTP_PARSER_VERSION_PATCH);
	for (int i = 1; i < argc; i++) {
		struct http_parser_url purl;
		const char * url = argv[i];
		printf("\nURL: '%s'", url);
		if (http_parser_parse_url(url, strlen(url), 0, &purl) != 0) {
			puts(" could not be parsed!");
			continue;
		} else {
			puts(":");
		}
#define printfield(NAME) do { \
		if (purl.field_set & (1 << NAME)) { \
			printf(" - " xstr(NAME) " = \"%.*s\"\n", purl.field_data[NAME].len, url + purl.field_data[NAME].off); \
		} \
	} while (0)
		printfield(UF_SCHEMA);
		printfield(UF_HOST);
		printfield(UF_PORT);
		printfield(UF_PATH);
		printfield(UF_QUERY);
		printfield(UF_FRAGMENT);
		printfield(UF_USERINFO);
	}
	return 0;
}

@nileshkale123
Copy link
Collaborator

Hello @thelazt ,

Thank you for your contribution.
The changes look good to me, but I have a small code update request. Please make the update so we can proceed with merging this PR.

@thelazt
Copy link
Author

thelazt commented Feb 17, 2025

Thanks for your comments and suggestions, @nileshkale123

I agree that there is redundancy in the code that can be removed.
I would suggest a slightly different variant, which I think would slightly improve readability and prevent any future static analyzer from complaining about an unused data argument:

    // Check if host is an IPv6 address literal that needs square brackets according to RFC3986
    bool is_ipv6 = (host != NULL && host[0] != '[' && strchr(host, ':') != NULL);
    if (port != DEFAULT_HTTP_PORT && port != DEFAULT_HTTPS_PORT) {
        err = asprintf(&host_name, is_ipv6 ? "[%s]:%d" : "%s:%d", host, port);
    } else {
        err = asprintf(&host_name, is_ipv6 ? "[%s]" : "%s", host);
    }

I have -Wformat-extra-args in mind, but the current static analyzers I have tried are happy with your version as well.

Please let me know if you are happy with my version or if I should rather stick to your suggestion, I will make the changes accordingly.

@thelazt thelazt force-pushed the esp_http_client_ipv6reference branch 3 times, most recently from f831080 to 4e4c3b6 Compare February 17, 2025 13:58
@mahavirj
Copy link
Member

sha=4e4c3b6f8b07424cd4e42001eb088a797c9a91af

@mahavirj mahavirj added the PR-Sync-Merge Pull request sync as merge commit label Feb 17, 2025
@safocl
Copy link
Contributor

safocl commented Feb 17, 2025

in the code bool is_ipv6 = (host != NULL && host[0] != '[' && strchr(host, ':') != NULL); the code host != NULL && should be taken out as a separate check (if it is necessary to do this here at all). otherwise format will depend on whether host is nullptr (and nullptr will be passed to the function asprintf)

@safocl
Copy link
Contributor

safocl commented Feb 17, 2025

review on code is not visible?

@thelazt
Copy link
Author

thelazt commented Feb 17, 2025

in the code bool is_ipv6 = (host != NULL && host[0] != '[' && strchr(host, ':') != NULL); the code host != NULL && should be taken out as a separate check (if it is necessary to do this here at all). otherwise format will depend on whether host is nullptr (and nullptr will be passed to the function asprintf)

According to POSIX, printf("%s", NULL) is indeed undefined behavior.
However, this case is usually handled in actual printf implementations (by printing (null)).

Regarding whether the check is needed:
Right now, there are only two calls to the function in question, both in esp_http_client_init, and both paths have null pointer checks.
Accordingly, it is currently ensured that the host is not a null pointer when calling _get_host_header.
Still, I think it is good practice to make sure in the same scope that we are not dereferencing a null pointer (with host[0]).

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: Opened Issue is new labels Feb 18, 2025
@safocl
Copy link
Contributor

safocl commented Feb 18, 2025

in the code bool is_ipv6 = (host != NULL && host[0] != '[' && strchr(host, ':') != NULL); the code host != NULL && should be taken out as a separate check (if it is necessary to do this here at all). otherwise format will depend on whether host is nullptr (and nullptr will be passed to the function asprintf)

According to POSIX, printf("%s", NULL) is indeed undefined behavior. However, this case is usually handled in actual printf implementations (by printing (null)).

Regarding whether the check is needed: Right now, there are only two calls to the function in question, both in esp_http_client_init, and both paths have null pointer checks. Accordingly, it is currently ensured that the host is not a null pointer when calling _get_host_header. Still, I think it is good practice to make sure in the same scope that we are not dereferencing a null pointer (with host[0]).

this is logic bug -- is_ipv6 depend on host != NULL -- when it turn false the either "%s:%d" or "%s" will be as fmt -- no other checks will be performed.

@safocl
Copy link
Contributor

safocl commented Feb 18, 2025

Still, I think it is good practice to make sure in the same scope that we are not dereferencing a null pointer (with host[0]).

so such a check needs to be taken outside the given expression. This is not the invariant of is_ipv6 expression.

@safocl
Copy link
Contributor

safocl commented Feb 18, 2025

Regarding whether the check is needed:
Right now, there are only two calls to the function in question, both in esp_http_client_init, and both paths have null pointer checks.
Accordingly, it is currently ensured that the host is not a null pointer when calling _get_host_header.

That's why the check was not carried out until your correction.

@thelazt
Copy link
Author

thelazt commented Feb 18, 2025

@safocl: I am sorry, but I am not sure what you are recommending:

Should I just remove the host != NULL check in is_ipv6 or, in additon to this change, also introduce an if (host == NULL) { return NULL; } check right before the is_ipv6 declaration?

@safocl
Copy link
Contributor

safocl commented Feb 18, 2025

@safocl: I am sorry, but I am not sure what you are recommending:

Should I just remove the host != NULL check in is_ipv6 or, in additon to this change, also introduce an if (host == NULL) { return NULL; } check right before the is_ipv6 declaration?

either this or that. I think we need to remove this check.

An IPv6 IP that occurs in the 'Host:' header of an HTTP request must be enclosed
in square brackets (RFC3986 section 3.2.2).

Searches for ':' in the host string to efficiently determine if the host is an
IPv6 IP address.
@thelazt thelazt force-pushed the esp_http_client_ipv6reference branch from 4e4c3b6 to f31a0f7 Compare February 19, 2025 09:43
@thelazt
Copy link
Author

thelazt commented Feb 19, 2025

either this or that. I think we need to remove this check.

@safocl I made a small change regarding your request:
The host != NULL is now asserted before the first access to the variable (and thus no longer part of the is_ipv6 assignment).

My intention is to prevent an invalid memory access in case the _get_host_header function is used in another part of the code in the future without proper host checking, but also to allow for easy optimization.

I hope this has allayed your concerns.

@safocl
Copy link
Contributor

safocl commented Feb 19, 2025

either this or that. I think we need to remove this check.

@safocl I made a small change regarding your request: The host != NULL is now asserted before the first access to the variable (and thus no longer part of the is_ipv6 assignment).

My intention is to prevent an invalid memory access in case the _get_host_header function is used in another part of the code in the future without proper host checking, but also to allow for easy optimization.

I hope this has allayed your concerns.

Yes, that's okay. Thank you.

@nileshkale123
Copy link
Collaborator

sha=f31a0f7f61156aaf4fe81cb57cef4d9229b83603

@nileshkale123 nileshkale123 added the PR-Sync-Update Pull request sync fetch new changes label Feb 21, 2025
@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Reviewing Issue is being reviewed labels Feb 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR-Sync-Merge Pull request sync as merge commit PR-Sync-Update Pull request sync fetch new changes Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants