www/caddy: v2.9.0 breaks proxying OPNSense Web GUI #4471

Closed
3 tasks done
dcrdev opened this issue Jan 15, 2025 · 19 comments
Assignees
Labels
upstream Third party issue

Comments

@dcrdev

dcrdev commented Jan 15, 2025

Important notices
Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug
The update to caddy-v2.9.0 breaks proxying the OPNsense WebGUI: there are intermittent cases where Caddy returns an HTTP 400. Several others have reported this here: https://caddy.community/t/access-to-opnsense-throws-a-400-bad-request-after-upgrading-caddy-from-2-8-4-to-2-9/29525

To Reproduce
Steps to reproduce the behaviour:

Add a handler pointing to OPNsense:

  • https
  • skip TLS verify.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
Image

Relevant log files

2025-01-15T23:18:14	Error	caddy	"warn","ts":"2025-01-15T23:18:14Z","logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","upstream":"hell01-router01.dcrdev.com:8443","duration":0.02584911,"request":{"remote_ip":"10.210.1.231","remote_port":"60519","client_ip":"10.210.1.231","proto":"HTTP/2.0","method":"GET","host":"hell01-router01.dcrdev.com","uri":"/apple-touch-icon-precomposed.png","headers":{"Cookie":["REDACTED"],"Accept":["*/*"],"X-Forwarded-For":["10.210.1.231"],"X-Forwarded-Proto":["https"],"X-Forwarded-Host":["hell01-router01.dcrdev.com"],"User-Agent":["com.apple.WebKit.Networking/20620.1.16.11.8 CFNetwork/1568.300.101 Darwin/24.2.0"],"Accept-Language":["en-GB,en;q=0.9"],"Accept-Encoding":["gzip, deflate, br"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"hell01-router01.dcrdev.com"}},"error":"writing: http2: stream closed"}	
2025-01-15T23:09:28	Error	caddy	"warn","ts":"2025-01-15T23:09:28Z","logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","upstream":"hell01-router01.dcrdev.com:8443","duration":0.130546344,"request":{"remote_ip":"10.210.1.231","remote_port":"60333","client_ip":"10.210.1.231","proto":"HTTP/2.0","method":"GET","host":"hell01-router01.dcrdev.com","uri":"/api/diagnostics/cpu_usage/stream","headers":{"X-Forwarded-Proto":["https"],"X-Forwarded-Host":["hell01-router01.dcrdev.com"],"Accept":["text/event-stream"],"Sec-Fetch-Site":["same-origin"],"Pragma":["no-cache"],"Sec-Fetch-Mode":["cors"],"Cache-Control":["no-cache"],"User-Agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.2 Safari/605.1.15"],"Sec-Fetch-Dest":["empty"],"Accept-Language":["en-GB,en;q=0.9"],"X-Forwarded-For":["10.210.1.231"],"Cookie":["REDACTED"],"Accept-Encoding":["gzip, deflate, br"],"Referer":["https://hell01-router01.dcrdev.com/ui/core/dashboard"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"hell01-router01.dcrdev.com"}},"error":"reading: context canceled"}
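
For anyone comparing their own logs against the two entries above, here is a minimal Python sketch that pulls out the fields that matter (upstream, URI, client protocol, error). It rests on two assumptions that are not confirmed in this thread: that the log view separates its columns with tabs, and that it dropped the leading `{"level":` from Caddy's JSON record. The function name `parse_opnsense_caddy_line` and the shortened sample line (with a placeholder upstream host) are made up for illustration.

```python
import json


def parse_opnsense_caddy_line(line: str) -> dict:
    """Parse one log row of the form: time <TAB> severity <TAB> process <TAB> payload."""
    timestamp, severity, process, payload = line.rstrip().split("\t", 3)
    # Assumption: the payload is Caddy's JSON record with its leading
    # '{"level":' cut off by the log view, so it is put back before parsing.
    record = json.loads('{"level":' + payload)
    request = record.get("request", {})
    return {
        "time": timestamp,
        "level": record["level"],
        "upstream": record.get("upstream"),
        "uri": request.get("uri"),
        "client_proto": request.get("proto"),
        "error": record.get("error"),
    }


# Shortened stand-in for the second log entry; the upstream host is a placeholder.
sample = (
    '2025-01-15T23:09:28\tError\tcaddy\t'
    '"warn","ts":"2025-01-15T23:09:28Z","logger":"http.handlers.reverse_proxy",'
    '"msg":"aborting with incomplete response","upstream":"router.example:8443",'
    '"request":{"proto":"HTTP/2.0","method":"GET",'
    '"uri":"/api/diagnostics/cpu_usage/stream"},'
    '"error":"reading: context canceled"}\t'
)

print(parse_opnsense_caddy_line(sample))
```

Note that `client_proto` is only the protocol reported in the log entry for the client-to-Caddy connection; it says nothing about the protocol Caddy used towards the upstream.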

Additional context

  1. Clearing browser cache restores service temporarily (<1 min)
  2. This issue seems to be specific to proxying the OPNsense web GUI; other proxied services seem unaffected.
  3. Downgrading the plugin fixes the issue

Environment
OPNsense 24.7.12-amd64
FreeBSD 14.1-RELEASE-p6
Browsers: Safari / Chrome

@Monviech
Member

In that thread there is a solution that you can try.

Open the advanced options in the handler and choose HTTP Version - HTTP 1.1

@Monviech
Member

The weird part is that I cannot reproduce it. I have tested this with multiple OPNsense WebGUIs of different versions, and it always works for me.

That means it cannot be a general issue for everybody, it must be something specific.

@Monviech Monviech added the upstream Third party issue label Jan 16, 2025
@Monviech
Member

For now I guess it's an upstream issue or a browser issue, so it might be best to go to:
https://github.com/caddyserver/caddy/issue

I cannot own an upstream ticket since I cannot reproduce it in my environment.

@dcrdev
Author

dcrdev commented Jan 16, 2025

Yeah that works - straight away.

I'm curious: when you tested this, did you simply check whether it loads, or did you wait a bit? The issue doesn't manifest immediately; it takes a minute or two to kick in.

@Monviech
Member

Monviech commented Jan 16, 2025

Please tell me on which page you wait. When I look at the logs it seems like you wait on the Dashboard?

/api/diagnostics/cpu_usage/stream

I just tested whether it loads and clicked around a bit; I didn't wait anywhere for too long.

@dcrdev
Author

dcrdev commented Jan 16, 2025

It's not specific to any page - but this is a fairly consistent approach for triggering it:

  1. Clear cache
  2. Login
  3. Do something else for 5 mins
  4. Come back and try going somewhere in the UI
  5. You should get a 400
  6. 400 will remain and UI will be inaccessible until next cache clear

@Monviech
Member

I still cannot replicate the issue:

  1. Closed Chrome on my MacBook and opened a new incognito window so there's no cache
  2. Logged in and waited on the dashboard
  3. Minimized the tab
  4. Did something else for 10 minutes or so
  5. Came back and clicked something in the GUI
  6. No error

There must be a difference in the OPNsense or Caddy setup of people that experience the issue (like you, someone on the forum this morning https://forum.opnsense.org/index.php?topic=45233.msg226083, and the others in the Caddy forum that did not use this plugin), and others that do not experience it (like me and others we do not know about.)

For now there is not much I can do. Since you have this issue you could see if something can be found out with your logs if you open an issue upstream.

@bucky2076

@Monviech
Member

I got some more information here from a different user:

https://www.reddit.com/r/opnsense/comments/1i36sq7/comment/m7m7rl0/?context=3

They used TLS TRUST Pool instead of tls insecure skip verify, like I suggest in the OPNsense documentation:

https://docs.opnsense.org/manual/how-tos/caddy.html#reverse-proxy-the-opnsense-webgui

@NG1973

NG1973 commented Jan 17, 2025

I switched my setup from 'TLS Insecure Skip Verify' to 'TLS Trust Pool'; however, I still get the 400 Bad Request error when I set the 'HTTP Version' back to default. It only connects successfully after setting it to HTTP/1.1.

@Monviech
Member

Monviech commented Jan 17, 2025

@NG1973 Can you test if it happens if you specifically select HTTP/2.0 ? What is the exact error? Can you enable the Caddy debug logs and post the requests and the error here?

@Monviech
Member

I finally got the issue reproduced locally so I will try to hunt this down.

@Monviech Monviech self-assigned this Jan 17, 2025
@Monviech
Member

Monviech commented Jan 17, 2025

Can anybody verify that it also stops happening when HTTP/3 is deselected in General Settings (only HTTP/1.1 and HTTP/2 selected)?

The handler should be default for this test with the default "HTTP/1.1, HTTP/2" selected.

@Monviech
Member

I enabled some debug logging in lighttpd.

As soon as h3 is used in the request from the client to Caddy, the GET request from Caddy to the upstream lighttpd contains a body.

In RFC7231 it is stated:
https://www.rfc-editor.org/rfc/rfc7231#section-4.3.1

   A payload within a GET request message has no defined semantics;
   sending a payload body on a GET request might cause some existing
   implementations to reject the request.

This means that lighttpd, with its default settings, is not obligated to serve a GET request that carries a body. It responds with a 400 instead.

2025-01-17 20:11:24: (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.76/src/request.c.317) GET/HEAD with content-length -> 400
2025-01-17 20:11:24: (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.76/src/h2.c.2380) fd:8 id:3 resp: :status: 400
2025-01-17 20:11:24: (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.76/src/h2.c.2369) fd:8 id:3 resp: content-type: text/html
2025-01-17 20:11:24: (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.76/src/h2.c.2369) fd:8 id:3 resp: content-length: 162

This is controlled by the following setting in lighttpd (which should not be changed without reason)
https://redmine.lighttpd.net/projects/lighttpd/wiki/Server_http-parseoptsDetails

server.http-parseopts = ( "method-get-body" => "enable" )
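
To make the rule concrete, here is a toy Python model of the decision described above. It is not lighttpd's actual code: the function name `lighttpd_like_status` and its parameters are invented for illustration, and it only encodes what the log line and the `method-get-body` option suggest, namely that GET/HEAD with a request body is rejected unless the option is enabled.

```python
def lighttpd_like_status(method: str, has_request_body: bool,
                         method_get_body: bool = False) -> int:
    """Toy model of the check; returns the status the request would get.

    method_get_body mirrors the "method-get-body" parse option
    (False models the default, True models "enable").
    """
    if method.upper() in ("GET", "HEAD") and has_request_body and not method_get_body:
        # Corresponds to the log line "GET/HEAD with content-length -> 400".
        return 400
    # Placeholder for "request is handed to the normal handler".
    return 200


print(lighttpd_like_status("GET", has_request_body=False))                        # 200
print(lighttpd_like_status("GET", has_request_body=True))                         # 400
print(lighttpd_like_status("GET", has_request_body=True, method_get_body=True))   # 200
print(lighttpd_like_status("POST", has_request_body=True))                        # 200
```

In this model an ordinary GET without a body is never affected, which matches the observation that only requests altered on the way through the proxy trigger the 400.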

This behavior will only happen when a browser (e.g. Chrome) uses h3 (HTTPS over QUIC) to communicate with Caddy.
It started happening after the Caddy binary was updated from v2.8.4 to v2.9.1 because that update also bumped the http3 Go package dependency.

More explained here:
caddyserver/caddy#6678 (comment)

@dcrdev
Author

dcrdev commented Jan 18, 2025

> Can anybody verify that it also stops from happening when HTTP/3 is deselected in General Settings (Only HTTP/1.1 and HTTP/2 selected)
>
> The handler should be default for this test with the default "HTTP/1.1, HTTP/2" selected.

Can confirm that this works

@Monviech
Member

Monviech commented Jan 20, 2025

@dcrdev Thank you for confirming. I have merged a PR that will disable HTTP/3 by default for people who update later or (re)install the plugin. It's not live yet, but it will probably be in a new plugin revision soon.

For now that seemed like the best option.

@gstrauss

gstrauss commented Jan 21, 2025

> This is controlled by the following setting in lighttpd (which should not be changed without reason) https://redmine.lighttpd.net/projects/lighttpd/wiki/Server_http-parseoptsDetails
>
> server.http-parseopts = ( "method-get-body" => "enable" )

FYI, this setting is safe to change as far as lighttpd itself is concerned; lighttpd handles it properly.

The reason the lighttpd default is to reject GET/HEAD with request body is RFC9110 warnings about potential avenues for abuse, as mholt notes in caddyserver/caddy#6678 (comment)

If a HTTP/1.1 proxy vulnerable to request smuggling proxies requests to another proxy which translates HTTP/1.1 to HTTP/2 or to HTTP/3 when connecting to an origin server, this scenario might occur, and not only for HTTP/1.x requests to the origin server. If lighttpd is that origin server, the default in lighttpd is to reject GET/HEAD with request body, as a paranoid default, since GET/HEAD without request body is by far the most common historical behavior for GET/HEAD.

lighttpd provides the configuration option to tell lighttpd to serve the GET/HEAD request, which may include processing the request body, or ignoring the request body, depending on the lighttpd handler.
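
For anyone who wants to check how a particular origin server treats a GET that carries a body, here is a rough Python probe. It is a sketch under assumptions: it speaks plain HTTP/1.1 rather than the HTTP/2 connection Caddy used in the logs above, it skips certificate verification on the assumption that the web GUI uses a self-signed certificate, and the names `build_get_with_body` and `probe` as well as the host in the usage comment are hypothetical.

```python
import socket
import ssl


def build_get_with_body(host: str, path: str = "/", body: bytes = b"x") -> bytes:
    """Build a raw HTTP/1.1 GET request that carries a Content-Length and a body."""
    head = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def probe(host: str, port: int, use_tls: bool = True, timeout: float = 5.0) -> int:
    """Send the GET-with-body request and return the HTTP status code of the reply."""
    conn = socket.create_connection((host, port), timeout=timeout)
    try:
        if use_tls:
            ctx = ssl.create_default_context()
            # Assumption: self-signed certificate, so verification is disabled.
            # check_hostname must be turned off before verify_mode is relaxed.
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
            conn = ctx.wrap_socket(conn, server_hostname=host)
        conn.sendall(build_get_with_body(host))
        status_line = conn.makefile("rb").readline()  # e.g. b"HTTP/1.1 400 Bad Request\r\n"
        return int(status_line.split()[1])
    finally:
        conn.close()


# Usage (hypothetical host and port):
#   probe("opnsense.example", 8443)
```

If the explanation in this thread holds, a lighttpd with default parse options should answer such a request with 400, and one with "method-get-body" enabled should not.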

@Monviech
Member

Monviech commented Jan 21, 2025

@gstrauss Thank you for the explanation. I was hesitant because I'm not an expert on the lighttpd webserver. When I see an option that's not enabled by default, I assume there is a reason for it (e.g. it's a secure default).

EDIT: I'll offer a PR in opnsense/core to enable this setting and see how the discussion goes. Maybe.

@Monviech
Member

Since this behavior has been hotfixed with a configuration change in the caddy plugin, I'll close this as solved.

#4482
