Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) Refactor protocol sockets (TCP, UDP, UNIX) #6929

Conversation

straight-shoota
Copy link
Member

Closes #5778

The protocol socket implementations previously all inherited Socket (and hence IO) which leads to problems in the type hierarchy. This is more detailed in the issue.

This PR splits the Sockets library into a lower level API Socket::Raw (previously Socket) which gives full access to the underlying OS socket implementation. Protocol sockets only wrap an instance of this raw socket and provide a protocol-specific API wrapper.Old type hierarchy:

Old type hierarchy:

  • IO
    • Socket < IO::Buffered
      • TCPSocket
        • TCPServer < Server::Socket
      • UNIXSocket
        • UNIXServer < Server::Socket
      • UDPSocket

New type hierarchy:

  • IO
    • Socket::Raw < IO::Buffered
    • TCPSocket
    • UNIXSocket
  • TCPServer < Server::Socket
  • UNIXServer < Server::Socket
  • UDPSocket

Implementation

The protocol sockets are simply wrapping a Socket::Raw and delegating all socket interaction to that. They are immutable and could just be structs to be super-lightweight.
However, since IO is a class, and TCPSocket and UNIXSocket need to inherit IO, they have to be classes (for this use case we could think about revisting #4901 or something else).
All other implementations (server sockets and UDPSocket) are structs.

They sometimes share similar behaviour in delegating IO operations, TCP settings etc. but there is not really a point in creating a subordinate hierarchy for this. Instead, src/socket/delegates.cr provides a few macros that define common methods in the socket implementations. This is perhaps not really elegant, but the best solution I could come up with. They could later be expanded directly into the respective types to avoid macro calls in favour of a little bit code duplication, once we're settled on the general design.

Based on #6717

@ysbaddaden
Copy link
Contributor

I don't like Socket::Raw. It's confusing because it's not a RAW socket. I really prefer Socket < IO which makes much more sense in the general case. The refactor is far too complex for my taste: many macros that inject methods that delegate to composed objects...

Obviously, TCPServer/UNIXServer shouldn't inherit TCPSocket/UNIXServer but compose them, and just implement a few methods (open and accept), maybe UDPSocket should do same, but that's about it. No need to make it all more complex than necessary.

@straight-shoota
Copy link
Member Author

Yes, Socket::Raw could just be Socket. I'd prefer to not call it Socket and keep it more in the background because people shouldn't use it typically. But it could have a different name than Raw.

I explicitly mentioned that the macros can be removed later in the process, it just keeps it easier while refactoring this PR. But we just need a lot of delegators for this to work.

I don't get your last paragraph. Could you elaborate what you mean exactly?

@straight-shoota
Copy link
Member Author

Okay, let's please not bikeshed about the completely indifferent naming of a class. It should be clear to everyone what its purpose is. Naming can be discussed once we settle on the general API design.

Copy link
Contributor

@j8r j8r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick review to share some thoughts

src/socket/raw.cr Show resolved Hide resolved
src/socket/raw.cr Show resolved Hide resolved
src/socket/raw.cr Outdated Show resolved Hide resolved
src/socket/unix_server.cr Show resolved Hide resolved
@bcardiff
Copy link
Member

I would prefer to refactor what the comment suggested in #4277 (comment) , and what I think @ysbaddaden is saying: change TCPServer and UnixServer to not inherit Socket and narrow the interface of them.

I see value in exposing as much functionality of the OS, but I would rather don't do it at the same time.

@straight-shoota
Copy link
Member Author

@bcardiff

change TCPServer and UnixServer to not inherit Socket and narrow the interface of them.

That's the main part of what this PR does.
It also removes TCPSocket and UNIXSocket inheriting from Socket to provide a clean API focused on what's relevant to the respective protocol.

I see value in exposing as much functionality of the OS, but I would rather don't do it at the same time.

I'm not sure what you mean. This PR doesn't expose anything more or less than before. It just applies composition instead of inheritance to the socket implementations.

@j8r Thanks for your review, but I currently don't care about implementation details. We still need to figure out the API design. I marked your comments as resolved but will revisit them when it's time.

@@ -483,6 +554,61 @@ class Socket::Raw < IO
arg
end

# Returns `true` if the Nable algorithm is disabled.
def tcp_nodelay?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these tcp_ methods should go on TCP Socket/Server and use {get,set}sockopt directly. I imagine Socket::Raw to have "no sugar", and the methods on it would be exactly the BSD Socket API. That even includes no read/write calls, using send and recv methods only. Then TCP::Socket would be an IO but delegate read/write to recv/send. The less complexity is in Socket::Raw, the easier it is to port.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh great, I had intended to detail this option in the description but forgot to mention it. I'm fine with that 👍

# Creates a new raw socket for TCP protocol.
#
# Consider using `TCPSocket` or `TCPServer` instead.
def self.tcp(family : Family, *,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also rather these constructors went, people are definitely capable of using :TCP/:UDP enums.

@@ -12,7 +20,15 @@ class IPSocket < Socket
IPAddress.from(sockaddr, addrlen)
end

# Returns the `IPAddress` for the remote end of the IP socket or `nil` if it
# is not connected.
def remote_address?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt that I needed it at some point during refactoring the specs. It's not completely thought out, but it came in handy.
But it's certainly up for debate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's even better to remove it from this PR and discuss it separately.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Oct 13, 2018

Socket.new is the generic way to access sockets, without any arbitrary limitation or requiring to create a new type, composing a confusing new type, and require macros to inject method delegation. Like it or not, this is meant to stay —BSD sockets can be considered a confusing mess, I agree, yet limiting what we can achieve with them would be nonsensical.

The consensus among the core team is for the TCPServer abstraction to stop inheriting from TCPSocket, because a server isn't an IO steam. Nothing more.

Unless this is leading to something more complex than beneficiary, I'll gladly accept a pull request doing just that; same for UNIXServer of course.

I'm more wary of breaking the UDPSocket < Socket inheritance, that can quickly become a messy batch of delegated methods, and I'd like to see this tackled later.

@RX14
Copy link
Contributor

RX14 commented Oct 13, 2018

The consensus among the core team is for the TCPServer abstraction to stop inheriting from TCPSocket, because a server isn't an IO steam. Nothing more.

That's not true.

I don't think anyone is proposing arbitrary limitations to sockets here, in fact just to seperate concerns cleanly between a BSD-api Socket class and idiomatic Crystal TCP::* classes.

However, if separating the server classes is something we do all agree on, we can do that first. But I would like to see that as a first step not the last.

`Socket::Address` and `Socket::Addrinfo` conceptually don't dependent on
`Socket`. The shared constants and lib includes are extracted to common.cr to separate
these concerns more cleary.
These methods are variants of `#local_address` and `#remote_address`
that don't raise when the socket is not connected but return `nil`.
Explicitly communicates dependencies between Socket library components.
With upcoming changes, `Socket` will no longer be the supertype of the
socket implementations returned by `Socket::Server#accept` (i.e. `TCPSocket`,
`UNIXSocket`). These sockets have in common that they provide a stream
interface - and that's already the actual use case of `Socket::Server`.

There are some use cases for socket servers that accept clients that
don't operate as streams. These are however relatively rare and
most importantly, semantically different from the actual purpose of
`Socket::Server`. An interface where `#accept` could return stream and
non-stream clients doesn't provide any useful value. Pretty much the
only thing they have in common is that you can close them.
The raw socket implementation will continue to provide a full interface
to the Socket API provided by the operating system.
Users should typically use the socket implementations for specific
protocols and the raw socket should stay in the background because it is
only useful for very specific needs.
The socket implementations are transformed from a huge inheritance
hierarchy to component based composition. The specific protocol sockets
all wrap a `Socket::Raw` and only provide the features relevant to the
protocol. Thus, for example `UDPSocket` and the server sockets no
longer inherit from `IO` and their respective client socket implementations.

Old type hierarchy:

* IO
  * Socket
    * TCPSocket
      * TCPServer < Server::Socket
    * UNIXSocket
      * UNIXServer < Server::Socket
    * UDPSocket

New type hierarchy:

* IO
  * Socket::Raw
  * TCPSocket
  * UNIXSocket
* TCPServer < Server::Socket
* UNIXServer < Server::Socket
* UDPSocket
@straight-shoota
Copy link
Member Author

straight-shoota commented Oct 15, 2018

I've added a commit that moves protocol-specific methods from Socket::Raw to the protocol socket types as proposed by @RX14 in #6929 (comment)

This still uses macros for some of the duplicated code, but that's just for easier development and can be resolved later on in this PR so that the (duplicated) code lives directly inside the respective types.

The downside of this is that the protocol sockets now reference LibC constants as arguments to getsockopt/setsockopt which might actually make it more complicated to port. Otherwise it was just Socket::Raw that has platform-dependent code.

I am not sure however about refactoring IO out of Socket::Raw because it is essentially designed as an IO. For now, I have left all IO-related methods in Socket::Raw.

@jkthorne
Copy link
Contributor

I still like Socket over Socket::Raw I know this was mentioned as bikeshedding earlier but I think it adds an empty namespace and I think it makes things less clear.

@asterite
Copy link
Member

Sorry if this was discussed before, but what's the problem with tcp socket and the other types being IOs?

@asterite
Copy link
Member

Also, class vs struct isn't very important here, one doesn't create thousands of sockets or servers.

@straight-shoota
Copy link
Member Author

@asterite TCPSocket should be an IO, but TCPServer or UDPSocket not.

@asterite
Copy link
Member

asterite commented Oct 15, 2018

Then I don't understand the difference between Socket and Socket::Raw

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BSD sockets are versatile. They can be IO, they can send/receive messages, they can be neither but servers that will create IO sockets, or something entirely different. I.e. BSD sockets are confusing by design and they can be whatever.

Are there any real world issues with the current inheritance solution that would require a redesign? As far as I recall, there has been one complaint about an invalid example in TCPServer. Nothing else.

The "issue" at hand seems purely theoretical, like "a developer could use TCPServer or UDPSocket wrongly and we must absolutely prevent it". The proposed solution introduces a complex refactor, that trades inherited methods for macros that inject lots of methods that merely delegate; it introduces a confusing Socket::Raw type without any reason (just use Socket).

I believe good documentation of socket types, how they're expected to be used, would fix the theoretical issue in a more efficient manner than this complex redesign.

@straight-shoota
Copy link
Member Author

@ysbaddaden Crystal is a language with a focus on type safety. The stdlib currently makes this wobbly in places like this, where inheritance completely misguides the actual behaviour of a class. UDPSocket can't be used as an IO so it shouldn't inherit it. TCPServer is not a TCPSocket so it shouldn't inherit. etc. see #5778 for details. It's really confusing to have an API where protocol specific implementations inherit all the multi-purpose features of the parent Socket type making a lot of methods available where they don't make any sense. This obscures the purpose of these classes and makes it really hard for users to understand what they're actually supposed to do with them.

The proposed solution introduces a complex refactor,

Well, the API is really wonky right now. This PR cleans it up. So the PR itself has some complexity (although it should be comprehensible when looking at the individual commits) but it reduces overall complexity by separating concerns and making the API simpler and thus easier to understand and use.

that trades inherited methods for macros that inject lots of methods that merely delegate;

I've already mentioned it several times now that the delegator macros are not meant to stay. Please stop repeating this nonsense argument again and again.

it introduces a confusing Socket::Raw type without any reason (just use Socket).

How is Socket::Raw in any way more confusing than current Socket? It's just been renamed because 90% of socket users won't use it directly. As already mentioned, the naming is completely irrelevant to the structural architecture and can be up for discussion. If you insist that Socket::Raw should stay being called Socket, I'm fine with that. I don't think it's the wisest idea, but it doesn't matter much.

@asterite
Copy link
Member

Is Socket::Raw exposed to the user? If not then the name doesn't mater. Otherwise, maybe it can be named Socket::Base or something like that.

@straight-shoota
Copy link
Member Author

@asterite Yes, it's a public API. It can be used to directly access the OS socket, for example for less common socket types (like Bluetooth connections, Unix datagram) where we don't provide protocol-specific socket implementations.

Socket::Base is also fine.

@asterite
Copy link
Member

So, I'm seeing now that Socket is a module and no longer a class. Can't Socket::Raw be renamed to Socket then? I think this will remove the confusion.

@straight-shoota
Copy link
Member Author

straight-shoota commented Oct 16, 2018

Yes, it could be renamed back to Socket. See #6929 (comment) #6929 (comment)

But discussing the naming that class is really not the main concern right now.

@RX14
Copy link
Contributor

RX14 commented Oct 16, 2018

If you look at the TCPSocket docs right now as a new user who doesn't know the BSD socket API already, you're lost. And that's a sign of a terrible API. The BSD sockets API is confusing. Added to that, the inheritance makes it such that all the important methods are lost from the page.

That's why I want to remove all the inheritance and replace it with composition. Not because of type safety, but because the current API is impossible to effectively document.

The socket API isn't meant to be object-oriented, so it shouldn't be. There should be one Socket class. Then, to make that Socket class easy to use, it makes sense to wrap that Socket class in TCPClient and TCPServer classes.

Making Socket not be an IO I don't really care about. If we don't want that, Socket can keep being an IO.

So to set clear goals (for me):

  • I want Socket to be a pure representation of the whole of the BSD API, and usable for weird edgecases, including platform-specific stuff like netlink sockets.
  • I want {TCP,UDP,UNIX}::{Client,Server} to be simple to use and well documented classes which hide the complexities of the underlying sockets API.

The only way I can think to achieve that is to remove the inheritance.

Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still feel more comfortable if the initial work would be to only split the server vs socket issue.

But it others want to overhaul the socket hierarchy let's try to be sure that all previous use cases are supported.

# ```
# UNIXServer.new("/tmp/dgram.sock", Socket::Type::DGRAM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is a UNIXServer created with DRAM type? Some docs left use dgram.sock but I am a bit lost how is that functionality preserved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using UNIXServer/UNIXSocket with DGRAM type is no longer supported. It would be too weird and contradict the purpose of this PR if a stream socket (implementing IO) could be used as a datagram socket.

We could add an additional type for this but I don't think this use case is anywhere common enough for that. If you want to send datagrams over UNIX sockets, you should simply use Socket::Raw.

@@ -0,0 +1,180 @@
module Socket
# :nodoc:
macro delegate_close
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of these macros could be modules that are included. If so, I'm ok with reusing code using that artifact.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe. But I'm not sure it makes much sense. It doesn't hurt to just duplicate those methods (it's typically only two or three uses for each). I would only use a module if it communicates some sort of common behaviour that could be useful as an individual trait.

Maybe the IO delegator methods could be put in a IO::Delegator module. This might actually be reusable in other places (HTTP::Server::Response, HTTP::UnknownLengthContent, or IO::Hexdump come to mind).

@straight-shoota
Copy link
Member Author

straight-shoota commented Oct 16, 2018

100% agree to @RX14.

Compare https://crystal-lang.org/api/0.26.1/TCPSocket.html with https://straight-shoota.github.io/crystal/sockets-refactor/TCPSocket.html

@bcardiff

I would still feel more comfortable if the initial work would be to only split the server vs socket issue.

This could probably be split into separate PRs. I'm happy to reorganize the commits if it helps to push this forward.

@RX14
Copy link
Contributor

RX14 commented Oct 16, 2018

This is still a big PR, is there any way we could split this up more once we're all agreed on the general approach?

For example, separating out some of the minor commits to a separate file, and I think at this point we're mostly agreed that Socket doesn't have to change names.

@RX14
Copy link
Contributor

RX14 commented Oct 16, 2018

Also, we can leave moving the getsockopt calls to the wrapper classes till later if ever. For now it's best to leave them on Socket and wrap them.

@straight-shoota
Copy link
Member Author

The first six commits are only preparatory and can be split off. In fact, I had already submitted #6707 and #6717 as solitary PRs but they didn't receive much appreciation. The other PRs created in preparation for this #6700 #6703 #6704 #6705 #6706 #6710 #6711 and #6909 have already been merged, though 👍

Also, we can leave moving the getsockopt calls to the wrapper classes till later if ever. For now it's best to leave them on Socket and wrap them.

I'm not sure if that's better. But we can leave it out and focus on porting the Sockets API to windows first. This needs fibers, though.

@straight-shoota
Copy link
Member Author

Closing stale PR.

@straight-shoota straight-shoota deleted the jm/feature/sockets-refactor branch November 19, 2020 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor Sockets API
8 participants