Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict fencing #7141

Merged
merged 3 commits into from
Aug 11, 2022
Merged

Strict fencing #7141

merged 3 commits into from
Aug 11, 2022

Conversation

grafin
Copy link
Member

@grafin grafin commented May 17, 2022

Current leader fencing implementation didn't really guarantee that old
leader would resing it's leadership before new leader could be elected.
That made it possible for several "leaders" coexist in cluster for some
time.

I changed replication_disconnect_timeout so that it is twice
as short for current raft leader (2*replication_timeout) if strict
fencing is enabled. Assuming that replication_timeout is the same for every
replica in replicaset this guarantees that leader will resign it's
leadership before anyone could start elections.

Old fencing behaviour can be enabled by setting fencing to soft mode.
This is useful when connection death timeouts shouldn't be affected
(e.g. different replication_timeouts are set to prioritize some replicas
as leader over the others).

Closes #7110

@grafin grafin force-pushed the grafin/fencing-fix branch 3 times, most recently from ccaadbb to dbac4aa Compare May 17, 2022 08:14
@grafin grafin requested a review from sergepetrenko May 17, 2022 09:00
@grafin grafin assigned grafin and sergepetrenko and unassigned grafin May 17, 2022
@grafin grafin marked this pull request as ready for review May 17, 2022 10:03
@grafin grafin added the integration-ci Enables integration tests for a pull request label May 27, 2022
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for the patch! Looks fine with some minor corrections.
Please, find my comments below.

changelogs/unreleased/strict_fencing.md Outdated Show resolved Hide resolved
src/box/lua/load_cfg.lua Show resolved Hide resolved
test/replication-luatest/election_strict_fencing_test.lua Outdated Show resolved Hide resolved
@grafin grafin force-pushed the grafin/fencing-fix branch from dbac4aa to 0cc9858 Compare June 2, 2022 09:32
@grafin
Copy link
Member Author

grafin commented Jun 2, 2022

@sergepetrenko Thanks for review! I've fixed/commented all found problems, please take a look, when possible.

@grafin grafin requested a review from sergepetrenko June 2, 2022 10:20
@grafin grafin assigned sergepetrenko and unassigned grafin Jun 2, 2022
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes!
Only a couple of minor comments left.

changelogs/unreleased/strict_fencing.md Show resolved Hide resolved
src/box/replication.h Outdated Show resolved Hide resolved
@sergepetrenko sergepetrenko assigned grafin and unassigned sergepetrenko Jun 3, 2022
@grafin grafin force-pushed the grafin/fencing-fix branch 2 times, most recently from a7b3fe8 to 8daceee Compare June 3, 2022 11:50
@grafin grafin requested a review from sergepetrenko June 3, 2022 11:52
@grafin grafin assigned sergepetrenko and unassigned grafin Jun 3, 2022
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix!
Please, proceed with Vlad's review.

@sergepetrenko sergepetrenko requested a review from Gerold103 June 3, 2022 12:13
@sergepetrenko sergepetrenko removed their assignment Jun 3, 2022
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!

src/box/xrow_io.cc Outdated Show resolved Hide resolved
changelogs/unreleased/strict_fencing.md Show resolved Hide resolved
changelogs/unreleased/strict_fencing.md Show resolved Hide resolved
src/box/box.cc Outdated Show resolved Hide resolved
src/box/raft.h Outdated Show resolved Hide resolved
src/box/raft.h Outdated Show resolved Hide resolved
src/box/replication.cc Outdated Show resolved Hide resolved
src/box/replication.h Outdated Show resolved Hide resolved
test/replication-luatest/election_strict_fencing_test.lua Outdated Show resolved Hide resolved
test/replication-luatest/election_strict_fencing_test.lua Outdated Show resolved Hide resolved
@grafin grafin force-pushed the grafin/fencing-fix branch 2 times, most recently from ca079e8 to d5ea8e3 Compare July 25, 2022 10:37
@grafin grafin added full-ci Enables all tests for a pull request and removed integration-ci Enables integration tests for a pull request labels Jul 25, 2022
@sergepetrenko sergepetrenko self-requested a review July 25, 2022 11:22
@sergepetrenko sergepetrenko self-assigned this Jul 25, 2022
@coveralls
Copy link

coveralls commented Jul 25, 2022

Coverage Status

Coverage increased (+0.01%) to 84.278% when pulling e34de9f on grafin:grafin/fencing-fix into f2d23b4 on tarantool:master.

@grafin
Copy link
Member Author

grafin commented Jul 25, 2022

@sergepetrenko @Gerold103 I've rewrote the test, using special socket proxy test diff. All other issues were fixed before. Rereview when you have time.

@sergos @ligurio You may be interested in reviewing proxy commit.

@grafin grafin requested a review from Gerold103 July 25, 2022 14:15
@grafin grafin force-pushed the grafin/fencing-fix branch from d5ea8e3 to b94aa05 Compare July 26, 2022 06:41
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job with the proxy module!

test/replication-luatest/election_fencing_test.lua Outdated Show resolved Hide resolved
@sergepetrenko sergepetrenko removed their assignment Aug 1, 2022
@grafin grafin force-pushed the grafin/fencing-fix branch 2 times, most recently from 5631bce to 3b08340 Compare August 2, 2022 17:40
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes, good stuff with the proxy! The patch looks good enough to be merged, but nonetheless I leave some comments if you feel like fixing them.

test/luatest_helpers/proxy/connection.lua Show resolved Hide resolved
test/luatest_helpers/proxy/connection.lua Outdated Show resolved Hide resolved
test/luatest_helpers/proxy/proxy.lua Show resolved Hide resolved
changelogs/unreleased/strict_fencing.md Outdated Show resolved Hide resolved
changelogs/unreleased/strict_fencing.md Outdated Show resolved Hide resolved
src/box/raft.h Outdated Show resolved Hide resolved
test/replication-luatest/election_fencing_test.lua Outdated Show resolved Hide resolved
@grafin grafin added do not merge Not ready to be merged and removed full-ci Enables all tests for a pull request labels Aug 5, 2022
@grafin grafin force-pushed the grafin/fencing-fix branch from 3b08340 to a1b59a5 Compare August 5, 2022 14:37
grafin added 2 commits August 5, 2022 17:38
Before, we used to modify box.cfg.replication to reproduce network
problems in our test. This worked fine in most situations, but doesn't
work in others: when instance gets disconnected by modifying
box.cfg.replication, it closes its connection immediately (in terms of
realtime), and this is noticed almost immediately by its neighbours in
replica set (because they receive EOF). This made it impossible to test
some things, that rely on specific timeouts in our code (e.g. strict
fencing).

This commits adds helper, which acts as UNIX socket proxy, and can block
connection transparently for tarantool instances. It makes it possible
to write some tests, that were not possible before. It is also possible
to inject arbitrary packets between instance, which are interconnected
via proxy.

Usage:

                  +-------------------+
                  |tarantool server 1 |
                  +-------------------+
                            |
                            |
                            |
                   .-----------------.
                  (   /tmp/test-out   )
                   `-----------------'
                            |
                            |
                            |
                  +-------------------+
                  |       proxy       |
                  +-------------------+
                            |
                            |
                            |
                   .-----------------.
          +-------(   /tmp/test-in    )--------+
          |        `-----------------'         |
          |                                    |
          |                                    |
          |                                    |
+-------------------+                +-------------------+
|tarantool server 2 |                |tarantool server 3 |
+-------------------+                +-------------------+

tarantool server 1 init.lua:
box.cfg{listen = '/tmp/test-out'}
box.once("schema", function()
    box.schema.user.grant('guest', 'super')
end)

tarantool server 2 and tarantool server 3 init.lua:
box.cfg{replication = '/tmp/test-in'}

proxy init.lua:
-- Import proxy helper
Proxy = require('test.luatest_helpers.proxy.proxy')

-- Create proxy, which will (when started) listen on client_socket_path
-- and accept connection when client tries to connect. The accepted
-- socket connection is then passed to new Connection instance.
proxy = Proxy:new({
    -- Path to UNIX socket, where proxy will await new connections.
    client_socket_path = '/tmp/test-in',

    -- Path to UNIX socket where tarantool server is listening.
    server_socket_path = '/tmp/test-out',

    -- Table, describing how to process client socket. Optional.
    -- Defaults used and described:
    process_client = {
        -- function(connection) which, if not nil, will be called once
        -- before client socket processing loop.
        pre = nil,

        -- function(connection, data) which, if not nil, will be called
        -- in loop, when new data is received from client socket.
        -- Connection.forward_to_server(connection, data) will:
        -- 1) Connect server socket to server_socket_path, if server
        --    socket is not connected.
        -- 2) Write data to server socket, if connected and writable.
        func = Connection.forward_to_server,

        -- function(connection) which, if not nil, will be called once
        -- after client socket processing loop.
        -- Connection.close_client_socket(connection) will shutdown and
        -- close client socket, if it is connected.
        post = Connection.close_client_socket,
    },

    -- Table, describing how to process server socket. Optional.
    -- Defaults used and described:
    process_server = {
        -- function(connection) which, if not nil, will be called once
        -- before server socket processing loop.
        pre = nil,

        -- function(connection, data) which, if not nil, will be called
        -- in loop, when new data is received from server socket.
        -- Connection.forward_to_client(connection, data) will write data
        -- to client socket, if it is connected and writable
        func = Connection.forward_to_client,

        -- function(connection) which, if not nil, will be called once
        -- after server socket processing loop.
        -- Connection.close_server_socket(connection) will shutdown and
        -- close server socket, if it is connected.
        post = Connection.close_server_socket,
    }

})

-- Bind client socket (defined by proxy.client_socket_path) and start
-- accepting connections on it in a new fiber. If opts.force is set to
-- true, it will remove proxy.client_socket_path file before binding to
-- it. After proxy is started it will accept client connections and
-- create Connection instance for each connection.
proxy:start({force = false})

-- Stop accepting new connetions on client socket and join the fiber,
-- created by proxy:start(), and close client socket. Also stop all
-- active connections (see Connection:stop()).
proxy:stop()

-- Pause accepting new connections and pause all active connections (see
-- Connection:pause()).
proxy:pause()

-- Resume accepting new connections and resume all paused connections
-- (see Connection:resume())
proxy:resume()

-- Connection class:
Connection:new({
    {
        -- Socket which is already created (by Proxy class for example).
        -- Optional, may be nil.
        client_socket = '?table',

        -- Path to connect server socket to. Will try to connect on
        -- initialization, and in Connection.forward_to_server.
        -- Can connect manually by calling
        -- Connection:connect_server_socket().
        server_socket_path = 'string',

        -- See Proxy:new()
        process_client = '?table',

          -- See Proxy:new()
        process_server = '?table',
    },
})

-- Start processing client socket, using functions from
-- Connection:process_client.
Connection:start()

-- Connect server socket to Connection.server_socket_path (if not
-- connected already). Start processing server socket, if successfully
-- connected (using functions from Connection.process_server).
Connection:connect_server_socket()

-- Pause processing packets (both incoming from client socket and server
-- socket).
Connection:pause()

-- Resume processing packets (both incoming from client socket and
-- server socket).
Connection:resume()

-- Close server socket, if open.
Connection:close_server_socket()

-- Close client socket, if open.
Connection:close_client_socket()

-- Close client and server sockets, if open, and wait for processing
-- fibers to die.
Connection:stop()

NO_DOC=test helpers
NO_CHANGELOG=test helpers
Currently box_raft asserts that raft is initialized when it is called.
For strict fencing box_raft will be called in
replication_disconnect_timeout to set different timeouts for leader and
follower. Sometimes replication_disconnect_timeout is called before raft
is initialized.

This commit changes box_raft behaviour, removing the assertion and
returning NULL instead of pointer to global raft state, if raft isn't
initialized. This makes it possible to call box_raft even before raft
has been initialized, checking that return value isn't NULL.

Assuming that this assertion didn't trigger anywhere
else, there is no need to check for box_raft returning NULL anywhere
except new calls. Even if in future this will change it will trigger
segmentation fault and the problem could be easily localized.

Part of tarantool#7110

NO_DOC=internal changes
NO_TEST=internal changes
NO_CHANGELOG=internal changes
@grafin grafin force-pushed the grafin/fencing-fix branch from a1b59a5 to 017889b Compare August 5, 2022 14:40
@grafin grafin added full-ci Enables all tests for a pull request and removed do not merge Not ready to be merged labels Aug 5, 2022
@grafin grafin requested a review from Gerold103 August 5, 2022 14:52
@grafin
Copy link
Member Author

grafin commented Aug 5, 2022

@Gerold103 Thanks for the review! I've fixed issues you pointed out. Take a look at the proxy commit (lots of renaming there), and test in the last commit.

Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

src/box/raft.h Outdated Show resolved Hide resolved
With current leader fencing implementation old leader doesn't resign
it's leadership before new leader may be elected. Because of this
several "leaders" might coexist in replicaset for some time.

This commit changes replication_disconnect_timeout that it is twice
as short for current raft leader (2*replication_timeout) if strict
fencing is enabled. Assuming that replication_timeout is the same for
every replica in replicaset this makes it less probable that new
leader can be elected before old one resigns it's leadership.

Old fencing behaviour can be enabled by setting fencing to soft mode.
This is useful when connection death timeouts shouldn't be affected
(e.g. different replication_timeouts are set to prioritize some
replicas as leader over the others).

Closes tarantool#7110

@TarantoolBot document
Title: Strict fencing

In `box.cfg` option `election_fencing_enabled` is deprecated in favor
of `election_fencing_mode`. `election_fencing_mode` can be set to one
of the following values:
'off' - fencing turned off (same as `election_fencing_enabled` set to
false before).
Connection death timeout is 4*replication_timeout for all nodes.

'soft' (default) - fencing turned on, but connection death timeout is
the same for leader and followers in replicaset. This is enough to
solve cluster being readonly and not being to elect a new leader in
some situations because of pre-vote.
Connection death timeout is 4*replication_timeout for all nodes.

'strict' - fencing turned on. In this mode leader tries its best to
resign leadership before new leader can be elected. This is achived
by halving death timeout on leader.
Connection death timeout is 4*replication_timeout for followers and
2*replication_timout for current leader.
@grafin grafin force-pushed the grafin/fencing-fix branch from 017889b to e34de9f Compare August 11, 2022 04:02
@kyukhin kyukhin merged commit 64ae9a0 into tarantool:master Aug 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full-ci Enables all tests for a pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Strict" fencing mode
5 participants