Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

- #PXC-347: Non-PRIM during membership change #101

Open
wants to merge 1 commit into
base: 3.x
Choose a base branch
from
Open

- #PXC-347: Non-PRIM during membership change #101

wants to merge 1 commit into from

Conversation

sysprg
Copy link

@sysprg sysprg commented Dec 17, 2016

Timers in gcomm (which implemented using ASIO library)
can be triggered even during normal shutdown sequence -
even after interruption of the main event loop in gcomm
subsystem and completion of the corresponding thread.

For example, during normal shutdown sequence and after
termination of the main event loop in gcomm subsystem and
even after completion of the corresponding thread by the
close() function (from the gcs_gcomm.cpp file), the
following call path still possible:

handle_timers -> handle_inactivity_timer -> check_inactive()

Function handle_install_timer() also may called from the
"gcomm::evs::Proto::handle_timers()" (from evs_proto.cpp
module).

Also, gcomm::GMCast::reconnect() function (from gmcast.cpp
file) may be called from the gcomm::GMCast::handle_timers().

Thus, we have a 100% accurate evidence that after the
termination of the main gcomm message loop (using the
close() method from the gcs_gcomm.cpp file), we still
continue to handle events associated with the timers.

Perhaps this is not a direct bug of the server, but an
unexpected feature of the implementation of the timers
in ASIO, which allows posing the events associated with
timers even after an interruption of the main event loop
(using "terminate" function).

For example, we see the following lines in the "epoll_reactor::run(...)"
method from the epoll_reactor.ipp file, which is used in most of our
test systems:

    if (ptr == &interrupter_)
    {
      // No need to reset the interrupter since we're leaving the descriptor
      // in a ready-to-read state and relying on edge-triggered notifications
      // to make it so that we only get woken up when the descriptor's epoll
      // registration is updated.

#if defined(ASIO_HAS_TIMERFD)
      if (timer_fd_ == -1)
	check_timers = true;
#else // defined(ASIO_HAS_TIMERFD)
      check_timers = true;
#endif // defined(ASIO_HAS_TIMERFD)
    }
#if defined(ASIO_HAS_TIMERFD)
    else if (ptr == &timer_fd_)
    {
 ...
     check_timers = true;
    }
#endif // defined(ASIO_HAS_TIMERFD){code}

As we can see, even after interruption of the main loop through
the "interrupter_" handler, the ASIO code sets the "check_timers"
variable to "true", and then it can add all the events associated
with the timers to the queue:

  if (check_timers)
  {
    mutex::scoped_lock common_lock(mutex_);
    timer_queues_.get_ready_timers(ops);
...

Later, the timer handlers, which called after the termination
of main loop during normal shutdown sequence, can form false
"view", in which all the nodes are declared as "partitioned".

I think that it is sending this "view" over the network, and
this leads to the transfer of entire cluster into NON-PRIMARY
state.

Therefore, to fix this, I propose to make two changes:

  1. We need to stop responding to the timers after termination
    of the main gcomm-related event loop (by the close() function
    from the gcs_gcomm.cpp file).

  2. I offer completely disable support of the timerfd feature
    in ASIO, which has already proved to be a source of error.
    I have already told about one of these problems with ASIO
    developers (by publishing PR in their GitHub) and I even
    got a positive reaction to the pull request in their GitHub,
    but yet there is no official release of the ASIO, which
    fixes all the problems related to timerfd feature.

Shutting down the timerfd feature does not affect the
performance of real applications - I was not able to fix
the difference in the tests.

Unfortunately, I cannot attach TC to the patch, because
the only way to make TC for it (that I came up with) requires
DEBUG_SYNC, but during normal shutdown sequence, it stops
working, as management of the DEBUG_SYNC in MTR is actually
carried out through conventional SQL operators, and the
system is already in the process of shutdown.

Timers in gcomm (which implemented using ASIO library)
can be triggered even during normal shutdown sequence -
even after interruption of the main event loop in gcomm
subsystem and completion of the corresponding thread.

For example, during normal shutdown sequence and after
termination of the main event loop in gcomm subsystem and
even after completion of the corresponding thread by the
close() function (from the gcs_gcomm.cpp file), the
following call path still possible:

handle_timers -> handle_inactivity_timer -> check_inactive()

Function handle_install_timer() also may called from the
"gcomm::evs::Proto::handle_timers()" (from evs_proto.cpp
module).

Also, gcomm::GMCast::reconnect() function (from gmcast.cpp
file) may be called from the gcomm::GMCast::handle_timers().

Thus, we have a 100% accurate evidence that after the
termination of the main gcomm message loop (using the
close() method from the gcs_gcomm.cpp file), we still
continue to handle events associated with the timers.

Perhaps this is not a direct bug of the server, but an
unexpected feature of the implementation of the timers
in ASIO, which allows posing the events associated with
timers even after an interruption of the main event loop
(using "terminate" function).

For example, we see the following lines in the "epoll_reactor::run(...)"
method from the epoll_reactor.ipp file, which is used in most of our
test systems:

```
    if (ptr == &interrupter_)
    {
      // No need to reset the interrupter since we're leaving the descriptor
      // in a ready-to-read state and relying on edge-triggered notifications
      // to make it so that we only get woken up when the descriptor's epoll
      // registration is updated.

#if defined(ASIO_HAS_TIMERFD)
      if (timer_fd_ == -1)
	check_timers = true;
#else // defined(ASIO_HAS_TIMERFD)
      check_timers = true;
#endif // defined(ASIO_HAS_TIMERFD)
    }
#if defined(ASIO_HAS_TIMERFD)
    else if (ptr == &timer_fd_)
    {
 ...
     check_timers = true;
    }
#endif // defined(ASIO_HAS_TIMERFD){code}
```

As we can see, even after interruption of the main loop through
the "interrupter_" handler, the ASIO code sets the "check_timers"
variable to "true", and then it can add all the events associated
with the timers to the queue:

```
  if (check_timers)
  {
    mutex::scoped_lock common_lock(mutex_);
    timer_queues_.get_ready_timers(ops);
...
```

Later, the timer handlers, which called after the termination
of main loop during normal shutdown sequence, can form false
"view", in which all the nodes are declared as "partitioned".

I think that it is sending this "view" over the network, and
this leads to the transfer of entire cluster into NON-PRIMARY
state.

Therefore, to fix this, I propose to make two changes:

1) We need to stop responding to the timers after termination
of the main gcomm-related event loop (by the close() function
from the gcs_gcomm.cpp file).

2) I offer completely disable support of the timerfd feature
in ASIO, which has already proved to be a source of error.
I have already told about one of these problems with ASIO
developers (by publishing PR in their GitHub) and I even
got a positive reaction to the pull request in their GitHub,
but yet there is no official release of the ASIO, which
fixes all the problems related to timerfd feature.

Shutting down the timerfd feature does not affect the
performance of real applications - I was not able to fix
the difference in the tests.

Unfortunately, I cannot attach TC to the patch, because
the only way to make TC for it (that I came up with) requires
DEBUG_SYNC, but during normal shutdown sequence, it stops
working, as management of the DEBUG_SYNC in MTR is actually
carried out through conventional SQL operators, and the
system is already in the process of shutdown.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant