Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proxy JMAP requests in a Murder #3564

Merged
merged 17 commits into from
Sep 10, 2021
Merged

Proxy JMAP requests in a Murder #3564

merged 17 commits into from
Sep 10, 2021

Conversation

ksmurchison
Copy link
Contributor

@ksmurchison ksmurchison commented Jul 20, 2021

[edit from elliefm: There's tests for this in cyrusimap/cassandane/pull/151]

@ksmurchison
Copy link
Contributor Author

ksmurchison commented Jul 22, 2021

I think this is finally done (other than Cass test). It properly proxys JMAP requests and JMAP/WS connections over both HTTP/1.1 and HTTP/2.
It might be easier to look at the changes as a whole, rather than the individual commits but YMMV.

@ksmurchison ksmurchison requested review from rsto and elliefm July 22, 2021 14:05
Copy link
Contributor

@elliefm elliefm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the bigger picture well enough to spot potential abstraction problems, but taken line by line it seems okay except for a couple of possible edge cases

Copy link
Member

@rsto rsto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sane to me, but bear in mind that I haven't grokked all of the murder code, yet. I'd like to approve when we see a Cassandane test working.
I can confirm that my local build does not yield any compiler errors. However, the whole current Cassandane Murder test suite fails with valgrind errors.

@ksmurchison
Copy link
Contributor Author

ksmurchison commented Jul 28, 2021

I've looked into adding a JMAP Murder test to Cass and it looks non-trivial. I might need some help in getting the infrastructure built, mainly getting a second httpd instance running.

@ksmurchison ksmurchison force-pushed the jmap-murder-proxy branch 2 times, most recently from 6712eca to fda7e7e Compare July 29, 2021 14:35
@elliefm
Copy link
Contributor

elliefm commented Jul 29, 2021

I've looked into adding a JMAP Murder test to Cass and it looks non-trivial. I might need some help in getting the infrastructure built, mainly getting a second httpd instance running.

Do you have a working local setup? If so, can you send me your imapd.conf files from the frontend, backend, and mupdate server? (If the frontend and mupdate server are the same server, that's actually great, because that's the setup Cassandane uses.)

I figured out the long way how to get it working for IMAP back when I wrote the Murder suite, but I'm not sure what the config needs to look like to have it proxy HTTP as well -- but I know the infrastructure fairly well, so once I know what the config needs to look like I think I can make Cassandane do it.

@ksmurchison
Copy link
Contributor Author

I have frontend and mupdate as the same server. Other than that, its the same setup as IMAP in a Murder, with httpd on frontend and backend (with JMAP module enabled)

@elliefm
Copy link
Contributor

elliefm commented Jul 29, 2021

The trouble is, I don't understand the IMAP murder config quite well enough to know how to generalise it, I just know what I needed to do to wrestle it into working 😅 So it'd really help to see examples of working HTTP, cause e.g. I'm not sure where I'll need to configure the port numbers?

Though I suppose you might be using port 80/443 by default and not actually have port numbers explicitly configured, whereas Cassandane needs to sticks everything on :910x. Hmm.

@ksmurchison
Copy link
Contributor Author

Fair enough. I will cobble together a minimal config for you

@elliefm
Copy link
Contributor

elliefm commented Jul 30, 2021

Great, thanks!

@ksmurchison
Copy link
Contributor Author

Actually, do any of the tests in Murder.pm actually proxy a client request from the frontend to the backend? They all appear to talk to a frontend or backend directly.

@elliefm
Copy link
Contributor

elliefm commented Aug 1, 2021

No idea really. The ones that talk to the frontend expect the frontend to do whatever it needs to do to service the request. If none of them exercise anything that needs to be proxied, then we need to add some tests for things that do get proxied.

I might be misunderstanding how murder proxying works. I wrote these tests mostly in the dark, so I don't expect them to be either correct or thorough as they are. They're better than the nothing we previously had, but they still need a lot more work.

@ksmurchison
Copy link
Contributor Author

@elliefm So, I took a look at the Cass Murder config and I think we can use the "include port in servername" trick for JMAP, but this trick can only work for one protocol at a time. So, we wouldn't be able to have an IMAP service run on the backend, just HTTP. This is because any service running on the proxy (IMAP, JMAP, POP) will try to connect to the port encoded in servername.

@elliefm
Copy link
Contributor

elliefm commented Aug 4, 2021

Ahh interesting. So I think, then, that instead of having just $want->{murder} as the only flag, we'd want to have separate $want->{imapmurder} and $want->{jmapmurder} (and corresponding :IMAPMurder and :JMAPMurder magics to set those flags), and then have _create_instances set up whichever one is wanted, and die hard if both are wanted. And then the existing murder tests would need to be switched to be explicitly IMAP, and then we'd be able to add new tests that are explicitly JMAP. I don't know if we'd want to split the IMAP and JMAP murder tests into two separate suites, or keep them in the Murder suite and just use the :IMAPMurder / :JMAPMurder magics to configure them correctly.

@ksmurchison
Copy link
Contributor Author

This sounds like a reasonable plan to me.

@elliefm
Copy link
Contributor

elliefm commented Aug 25, 2021

@ksmurchison A first stab at what we've discussed about testing this is in cyrusimap/cassandane#151

@elliefm
Copy link
Contributor

elliefm commented Aug 27, 2021

@ksmurchison What behaviour would you expect to see when making a request directly against a backend that does not have that user's account on it?

I have three tests so far:

  • connect to frontend, expect JMAP request to succeed, with a "via:" header indicating that it was proxied (works)
  • connect directly to backend1 (where the account is), expect JMAP request to succeed, without a "via:" header because it was not proxied (works)
  • connect directly to backend2 (account does not exist), expect it to not work, but in what way?

At the moment I get an "internal server error" from backend2, and I'm not sure if this is the correct thing to expect, or if something is awry. I think I would expect some kind of "mailbox does not exist" or "access denied", or something along those lines, not "internal server error".

But it's worth keeping in mind that the sasl replacement Cassandane uses returns "OK" for any user unless the username is the string "bad", so we're in a maybe slightly unusual situation where a user can successfully authenticate to a backend that their account is not on -- and maybe that's why it's winding up in an internal server error situation instead.

What do you think?

Here's the full transaction:

---------- cassandane Fri Aug 27 09:57:54 2021

<1630022274<POST /jmap/upload/cassandane/ HTTP/1.1
Host: 127.0.0.1:9103
Content-Type: text/plain
User-Agent: Mail-JMAPTalk/0.15
Authorization: Basic ...                 
Content-Length: 9

some test>1630022274>HTTP/1.1 500 Internal Server Error
Date: Thu, 26 Aug 2021 23:57:54 GMT
Connection: Upgrade
Upgrade: h2c
Alt-Svc: h2c=":9103"
Vary: Accept-Encoding
Content-Type: text/html; charset=utf-8
Content-Length: 553

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>500 Internal Server Error</title>
  </head>
  <body>
    <h1>Internal Server Error</h1>
    <p>The server encountered an internal error.</p>
    <hr>
    <address>Cyrus-HTTP/3.5.0-alpha0-1135-gc9eec8fafe Cyrus-SASL/2.1.27 Lib/XML2.9.4 Jansson/2.12 OpenSSL/1.1.1d Nghttp2/1.36.0 Wslay/1.1.1 Zlib/1.2.11 Brotli/1.0.7 Xapian/1.5.0 LibiCal/3.0 ICU4C/68.2 SQLite/3.27.2 Server at 127.0.0.1 Port 9103</address>
  </body>
</html>

@elliefm
Copy link
Contributor

elliefm commented Aug 27, 2021

Ahh, syslog reveals where that error comes from:

Aug 27 11:05:47 debian 0105450103/http[28707]: could not autoprovision calendars for userid cassandane: Invalid user
Aug 27 11:05:47 debian 0105450103/http[28707]: auth_success returned error: 500 Internal Server Error

@elliefm
Copy link
Contributor

elliefm commented Aug 27, 2021

If I take "caldav" and "carddav" out of httpmodules (leaving just "jmap"), then jmap direct to backend2 responds 404 not found, which makes sense.

So I guess maybe we've newly discovered a murder bug in the calendar autoprovisioning, but that's not related to this PR.

@elliefm
Copy link
Contributor

elliefm commented Aug 27, 2021

I think I'm happy with this, based on my WIP tests. It successfully proxies a simple JMAP request when it's supposed to, and doesn't proxy it when it's not supposed to.

I think/assume the JMAP implementation is fairly consistent, such that if one JMAP command proxies correctly they all should -- but if there's any that are known to be weird and we should test specifically, please let me know!

@rsto You also mentioned wanting to see a Cassandane test for this -- are the tests in cyrusimap/cassandane#151 enough to satisfy you that this PR works? (Half of that PR is infrastructure changes; the interesting tests are Murder.jmap_frontend_commands, Murder.jmap_backend1_commands, and Murder.jmap_backend2_commands)

I'm leaving the Cassandane tests PR as a draft for now because I have more work/refactoring I want to do on the infrastructure part.

@elliefm
Copy link
Contributor

elliefm commented Sep 10, 2021

Hmm.

Running MurderJMAP.frontend_commands with this branch but rebased onto current master, under valgrind, finds a memory leak in the frontend httpd.

Since the source for the build I'm testing doesn't exist anywhere as its own entity, the line numbers quoted below might not correspond with line numbers in your local files. So you'll probably wanna run this yourself to get real line numbers...

logs[1470] Valgrind errors from file /dev/shm/cass/0301420304/vglogs/httpd.9891
logs[1470] ==9891== 32 bytes in 1 blocks are definitely lost in loss record 79 o
logs[1470] ==9891==    at 0x483577F: malloc (vg_replace_malloc.c:299)
logs[1470] ==9891==    by 0x4CC7477: xmalloc (xmalloc.c:55)
logs[1470] ==9891==    by 0x4C0829F: prot_addwaitevent (prot.c:548)
logs[1470] ==9891==    by 0x1AD8CA: proxy_findserver (proxy.c:185)
logs[1470] ==9891==    by 0x1CE7A0: meth_post (http_jmap.c:481)
logs[1470] ==9891==    by 0x1A15E6: process_request (httpd.c:1953)
logs[1470] ==9891==    by 0x1A1BEF: http1_input (httpd.c:2091)
logs[1470] ==9891==    by 0x1A1F5D: cmdloop (httpd.c:2197)
logs[1470] ==9891==    by 0x19EE34: service_main (httpd.c:1054)
logs[1470] ==9891==    by 0x1CD8EB: main (service.c:647)
logs[1470] ==9891== 

Looks like the thing that's being leaked is a struct prot_waitevent in (struct backend *)->timeout. It's usually cleaned up by proxy_downserver() specifically, but is notably left alone by backend_disconnect(). So I wonder if we're winding up on a codepath where backend_disconnect is called rather than proxy_downserver.

Though, I've just noticed that cmdloop (httpd.c:2197) is:

        if (ret == HTTP_SHUTDOWN) {
            syslog(LOG_WARNING,
                   "Shutdown file: \"%s\", closing connection", conn->close_str);
===>        shut_down(0);
        }

So we're actually in the process of shutting down when we allocate this new prot_waitevent that then gets leaked? That's weird.

It's also weird that I don't see that "Shutdown file: ..., closing connection" log line anywhere. There is a "Received bad client magic byte string, closing connection" from the backend1 httpd, and an "end of file reached, closing connection" from the frontend httpd.

For what it's worth it's the frontend httpd that the leak is being reported in.

I'm really confused by the valgrind backtrace telling me we came from the "Shutdown file:" block, but that log line not appearing in syslog. So I don't trust the report, but I don't know how to proceed further.

@elliefm
Copy link
Contributor

elliefm commented Sep 10, 2021

It's also weird that I don't see that "Shutdown file: ..., closing connection" log line anywhere. There is a "Received bad client magic byte string, closing connection" from the backend1 httpd, and an "end of file reached, closing connection" from the frontend httpd.

Oh, I changed branch while briefly distracted with something else and forgot to change back, so disregard everything I said about being in a shut down, I had the wrong line numbers myself! 😅

cmdloop (httpd.c:2197) is actually this, which is just ordinary input, nothing to do with shut_down. That makes more sense!

            else if (!ret) {
                /* HTTP/1.x request */
===>            http1_input(&txn);
            }

@ksmurchison ksmurchison merged commit 44b572d into master Sep 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants