Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read() write() functions return types. #333

Closed
Jakio815 opened this issue Jan 21, 2024 · 4 comments
Closed

read() write() functions return types. #333

Jakio815 opened this issue Jan 21, 2024 · 4 comments

Comments

@Jakio815
Copy link
Collaborator

The return types of the read() and write() functions seem quite odd.

read()

int read_from_socket(int socket, size_t num_bytes, unsigned char* buffer);
//  * @return 0 for success, 1 for EOF, and -1 for an error.

int read_from_socket_close_on_error(int* socket, size_t num_bytes, unsigned char* buffer);
//  * @return 0 for success, -1 for failure.

void read_from_socket_fail_on_error( int* socket, size_t num_bytes, unsigned char* buffer, lf_mutex_t* mutex, char* format, ...);
//  @return The number of bytes read, or 0 if an EOF is received, or a negative number for an error.

write()

int write_to_socket(int socket, size_t num_bytes, unsigned char* buffer);
//  * @return 0 for success, -1 for failure.

int write_to_socket_close_on_error(int* socket, size_t num_bytes, unsigned char* buffer);
//  * @return 0 for success, -1 for failure.

void write_to_socket_fail_on_error( int* socket, size_t num_bytes, unsigned char* buffer, lf_mutex_t* mutex, char* format, ...);

Prob1. Return values.

So, it currently returns 0 for success, -1 for failure.

This seems not to be a general practice for read() and write() functions.

According to the man page of read(),

       On success, the number of bytes read is returned (zero indicates
       end of file), and the file position is advanced by this number.
       It is not an error if this number is smaller than the number of
       bytes requested; this may happen for example because fewer bytes
       are actually available right now (maybe because we were close to
       end-of-file, or because we are reading from a pipe, or from a
       terminal), or because read() was interrupted by a signal.  See
       also NOTES.

       On error, -1 is returned, and [errno](https://man7.org/linux/man-pages/man3/errno.3.html) is set to indicate the error.
       In this case, it is left unspecified whether the file position
       (if any) changes.

To summarize,

return:
     >0: number_of_bytes_successfully_read,
      0: EOF (no bytes read)
     -1: error

Prob2. Parameter orders

ssize_t read(int fd, void buf[.count], size_t count);
int read_from_socket(int socket, size_t num_bytes, unsigned char* buffer);

The order of the parameters are quite different. It can be quite confusing.

Seems to need some refactoring.

Why it should be changed?

On PR #330, I am working on a network driver to enable other network protocols to work. Especially targeting to make TCP based encryption protocols such as TLS/SSL work.

int read_from_netdrv(netdrv_t netdrv, unsigned char *buffer, size_t num_bytes)

Below is the read function of OpenSSL's TLS library. Details are here.

 #include <openssl/ssl.h>

 int SSL_read_ex(SSL *ssl, void *buf, size_t num, size_t *readbytes);
 int SSL_read(SSL *ssl, void *buf, int num);

 int SSL_peek_ex(SSL *ssl, void *buf, size_t num, size_t *readbytes);
 int SSL_peek(SSL *ssl, void *buf, int num);

DESCRIPTION
SSL_read_ex() and SSL_read() try to read num bytes from the specified ssl
 into the buffer buf. On success SSL_read_ex() will store the number of bytes
  actually read in *readbytes.

SSL_peek_ex() and SSL_peek() are identical to SSL_read_ex() and SSL_read()
 respectively except no bytes are actually removed from the underlying BIO
  during the read, so that a subsequent call to SSL_read_ex() or SSL_read()
   will yield at least the same bytes.

Return values

SSL_read_ex() and SSL_peek_ex() will return 1 for success or 0 for failure. 
Success means that 1 or more application data bytes have been read from
 the SSL connection. Failure means that no bytes could be read from the
 SSL connection. Failures can be retryable (e.g. we are waiting for more 
bytes to be delivered by the network) or non-retryable (e.g. a fatal network
 error). In the event of a failure call SSL_get_error(3) to find out
 the reason which indicates whether the call is retryable or not.

For SSL_read() and SSL_peek() the following return values can occur:

> 0
The read operation was successful. The return value is the number of bytes
 actually read from the TLS/SSL connection.

<= 0
The read operation was not successful, because either the connection was 
closed, an error occurred or action must be taken by the calling process. Call 
SSL_get_error(3)with the return value ret to find out the reason.

Old documentation indicated a difference between 0 and -1, and that -1 was 
retryable. You should instead call SSL_get_error() to find out if it's retryable.

It follows the basic read() function style.

For the scalability of the code the socket read and writes seem to need some refactoring.

@edwardalee
Copy link
Contributor

This is good feedback, but I should point out that read_from_socket() and the other related functions were not intended to replace read(). They wrap read() and should be used in situations where you need to read a certain number of bytes before you can proceed. The reason that read() returns sometimes having not read all the bytes is so that applications don't stall because of a slow socket connection. The only time you should use read_from_socket() instead of read() is when there is nothing that the calling thread can meaningfully do until it has read the specified number of bytes. This is a common situation in federates and in the RTI, so rather than repeating the while loop everywhere, we provide the read_from_socket() function to avoid code duplications.

The argument order can certainly be changed. This would be a good time to do that. Can you suggest a specific alternative?

@Jakio815
Copy link
Collaborator Author

Jakio815 commented Jan 23, 2024

I see the point. But the problem is that the current read_from_socket() cannot handle situations when the total bytes to read are unknown. This can happen when there is encryption.

The part I am stuck is in rti_remote.c's federate_info_thread_TCP(). So, this part only reads one byte, which is the message header. However when the message was encrypted, it is impossible to only decrypt the first byte (header of LF), so the total message must be read.

There can be a solution such as not encrypting the header byte, and only encrypting the payload. However, this will make the implementation much more complicated.
For example, if we are trying to use TLS, a simple API int SSL_write(SSL *ssl, const void *buf, int num) will write num bytes from the buffer buf into the specified ssl connection. If we try to attach the header to the buffer, we would manually need to take care of the encryption.
This will not happen only on TLS, but other network stacks that have already made APIs.
So, I don't agree with this idea.

Another point to mention is that the logic itself should be also fixed in rti_remote.c's federate_info_thread_TCP(). This is kind of a different issue, but the logic reading one byte, passing to each cases, and reading additional bytes does not have to be like this.

This is also why I propose to read the entire buffer at once, and then process the buffer, not the socket itself.
And to do this, we would need to make the read_from_socket() function read unknown number of bytes.

Suggestion

int read_from_socket(int socket, size_t num_bytes, unsigned char* buffer) {
    if (socket < 0) {
        // Socket is not open.
        errno = EBADF;
        return -1;
    }
    ssize_t bytes_read = 0;
    int retry_count = 0;
    while (bytes_read < (ssize_t)num_bytes) {
        ssize_t more = read(socket, buffer + bytes_read, num_bytes - (size_t)bytes_read);
        if(more < 0 && retry_count++ < NUM_SOCKET_RETRIES 
                && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)) {
            // Those error codes set by the socket indicates
            // that we should try again (@see man errno).
            lf_print_warning("Reading from socket failed. Will try again.");
            lf_sleep(DELAY_BETWEEN_SOCKET_RETRIES);
            continue;
        } else if (more < 0) {
            // A more serious error occurred.
            return -1;
        } else if (more == 0) {
            // EOF received.
            bytes_read += more;
            return bytes_read;
        }
        bytes_read += more;
    }
    return bytes_read;
}

This will return the actually read bytes from the socket.

For example,

int MAX_READ_SIZE = 100;
unsigned char buffer[MAX_READ_SIZE];
int bytes_read = read_from_socket(socket, MAX_READ_SIZE, buffer);

The bytes_read will indicate the actual read bytes.

It isn't that hard to change the logic to return the actually read bytes. The only part changed is the last 7 lines, that returns the actual bytes read.

Of course there will be some additional work.

  • Change other read and writes.
  • Change the return logics where use the read and write functions.

Conclusion

As the comment has become too long, the main point of this comment is that the current read_from_socket() cannot read unknown sized buffers.

@edwardalee
Copy link
Contributor

I see your point about encryption, but I don't think we have the right solution yet.

I think there are a couple of problems with your approach. First, as written above, the only time read_from_socket will return anything other than num_bytes is when either an error occurs (it returns -1) or an EOF occurs (the socket will be closed). This is why the current read_from_socket has only three return values. It is because there really are only three return values.

Second, always reading the full buffer will likely introduce deadlock in programs with feedback loops. Consider that a federate may not be able to produce an output until it has received an input. In the above code, it will not get a chance to process the input until it has filled the buffer. But it can't fill the buffer because subsequent messages won't arrive until an output is sent.

I think that right approach if you don't know how many bytes to read is to read as many as you can (no while loop and no repeating with errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR). The code then has to try to process the buffer, and only if the message is incomplete should it continue reading. If the message is complete and there is another subsequent message after it in the buffer, then it will need to begin processing that message in a similar fashion.

Also, even without feedback, this strategy could introduce large delays. Consider a simple program where fed1 sends messages to fed2 (no feedback). The first message is a short one sent at startup. The second message is sent a week later. The destination, fed2, will likely not process the first message until a week later because it will wait to fill its input buffer.

If I'm right about this, then the read_from_socket wrapper function is not useful when you don't know how many bytes to read. The code needs to be built using the original read function, and it's tricky code to write because the buffer may contain any number of messages where the final one is likely incomplete.

@Jakio815
Copy link
Collaborator Author

Thanks for your feedback.

I kind of decided to make a different approach with the network driver. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants