Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close on WriteResponse errors #15

Merged
merged 3 commits into from
Feb 24, 2025

Conversation

lattwood
Copy link
Contributor

@lattwood lattwood commented Feb 7, 2025

It turns out when net/rpc calls WriteResponse, it doesn't do anything with the error! (it does correctly propagate errors when WriteRequest is called though)

This means this codec needs to handle connection disposal, because there's no way for someone using net/rpc to find out about the error and handle it- we're in the twilight zone.

Additionally, because net/rpc is frozen upstream, there is a lack of desire to fix even the logging aspect, let alone the underlying defect: golang/go#71559

The WriteResponse and WriteRequest methods both use the same underlying write method, so this fix took three commits-

  • adding the first test coverage to this package, to prove de-DRYing doesn't break anything
  • de-DRYing by inlining write into WriteResponse and WriteRequest
  • the actual fix, along with a test for it

This puts an end to two years of "spooky behaviour at a distance" on our large and geographically distributed Nomad cluster.

I will also be opening a PR against hashicorp/consul-net-rpc for the same code change.

requested tags: @schmichael, @tgross

Fixes: hashicorp/nomad#23305

Introduce tests covering both WriteRequest and WriteResponse paths to
ensure that when inlining the implementation of MsgpackCodec.write in
the next commit, that it behaves correctly.

The tests validate the following functionality:
- proper round-trip encoding/decoding of RPC requests and responses when
  using buffered and unbuffered I/O
- the codec decodes even when a nil target is provided
- subsequent read/write operations immediately return io.EOF after the
  codec is closed
Remove the shared write() helper function by inlining its logic into
both WriteResponse and WriteRequest.
Modify WriteResponse to call cc.Close() immediately if an error occurs
while encoding the response header, encoding the body, or flushing the
buffered writer. This prevents further use of a codec that encountered
an error, ensuring that subsequent operations return io.EOF. This is
*not* needed for WriteRequest as net/rpc propagates that error to the
caller, and a comment has been added to that method explaining the
difference.
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for sticking with this, I know it's been a long and winding road!

schmichael added a commit to hashicorp/go-msgpack that referenced this pull request Feb 7, 2025
schmichael added a commit to hashicorp/go-msgpack that referenced this pull request Feb 7, 2025
tgross pushed a commit to hashicorp/go-msgpack that referenced this pull request Feb 20, 2025
Port the fix from hashicorp/net-rpc-msgpackrpc#15

**For consistency only.** To the best of my knowledge no HashiCorp tools use this code. They only use the serialization code from this package and all use hashicorp/net-rpc-msgpackrpc for RPC.

Also add @hashicorp/nomad-eng to CODEOWNERS as in hashicorp/go-sockaddr#75

Team will also need repo write access.
schmichael added a commit to schmichael/net-rpc-msgpackrpc that referenced this pull request Feb 20, 2025
Add @hashicorp/nomad-eng as owners. Nomad needs to merge  hashicorp#15 to fix hashicorp/nomad#23305

@hashicorp/nomad-eng needs write access as well.
@schmichael schmichael mentioned this pull request Feb 20, 2025
@tgross tgross merged commit 9cb600a into hashicorp:master Feb 24, 2025
2 checks passed
tgross added a commit that referenced this pull request Feb 24, 2025
Update go-msgpack to pick up the bug fix described in
hashicorp/go-msgpack#26, which mirrors the work done
here in #15
tgross added a commit that referenced this pull request Feb 24, 2025
Update go-msgpack to pick up the bug fix described in
hashicorp/go-msgpack#26, which mirrors the work done
here in #15

Also bump go version for CI to a supported version.
tgross added a commit that referenced this pull request Feb 24, 2025
Update go-msgpack to pick up the bug fix described in
hashicorp/go-msgpack#26, which mirrors the work done
here in #15

Also bump go version for CI to a supported version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pending allocations stuck in pending state after adoption by a new deployment
3 participants