Strict fencing #7141
Conversation
Force-pushed from ccaadbb to dbac4aa
Hi! Thanks for the patch! It looks fine apart from some minor corrections.
Please find my comments below.
Force-pushed from dbac4aa to 0cc9858
@sergepetrenko Thanks for the review! I've fixed or commented on all the problems you found; please take a look when possible.
Thanks for the fixes!
Only a couple of minor comments left.
Force-pushed from a7b3fe8 to 8daceee
Thanks for the quick fix!
Please proceed with Vlad's review.
Thanks for the patch!
Force-pushed from ca079e8 to d5ea8e3
@sergepetrenko @Gerold103 I've rewritten the test using the special socket proxy helper; see the test diff. All other issues were fixed earlier. Please re-review when you have time. @sergos @ligurio You may be interested in reviewing the proxy commit.
Force-pushed from d5ea8e3 to b94aa05
Nice job with the proxy module!
Force-pushed from 5631bce to 3b08340
Thanks for the fixes, good stuff with the proxy! The patch looks good enough to be merged, but I'll still leave a few comments in case you feel like addressing them.
Force-pushed from 3b08340 to a1b59a5
Before, we used to modify `box.cfg.replication` to reproduce network problems in our tests. This worked fine in most situations, but not in others: when an instance gets disconnected by modifying `box.cfg.replication`, it closes its connection immediately (in real time), and its neighbours in the replica set notice this almost immediately because they receive EOF. This made it impossible to test things that rely on specific timeouts in our code (e.g. strict fencing).

This commit adds a helper which acts as a UNIX socket proxy and can block a connection transparently for tarantool instances. It makes it possible to write tests that were not possible before. It is also possible to inject arbitrary packets between instances which are interconnected via the proxy.

Usage:

```
        +-------------------+
        |tarantool server 1 |
        +-------------------+
                  |
                  |
                  |
         .-----------------.
        (   /tmp/test-out   )
         `-----------------'
                  |
                  |
                  |
        +-------------------+
        |       proxy       |
        +-------------------+
                  |
                  |
                  |
         .-----------------.
+-------(   /tmp/test-in    )--------+
|        `-----------------'         |
|                                    |
|                                    |
+-------------------+ +-------------------+
|tarantool server 2 | |tarantool server 3 |
+-------------------+ +-------------------+
```

tarantool server 1 init.lua:

```lua
box.cfg{listen = '/tmp/test-out'}
box.once("schema", function()
    box.schema.user.grant('guest', 'super')
end)
```

tarantool server 2 and tarantool server 3 init.lua:

```lua
box.cfg{replication = '/tmp/test-in'}
```

proxy init.lua:

```lua
-- Import proxy helper.
Proxy = require('test.luatest_helpers.proxy.proxy')

-- Create a proxy, which will (when started) listen on client_socket_path
-- and accept connections when a client tries to connect. Each accepted
-- socket connection is then passed to a new Connection instance.
proxy = Proxy:new({
    -- Path to the UNIX socket where the proxy will await new connections.
    client_socket_path = '/tmp/test-in',
    -- Path to the UNIX socket where the tarantool server is listening.
    server_socket_path = '/tmp/test-out',
    -- Table describing how to process the client socket. Optional.
    -- Defaults used and described:
    process_client = {
        -- function(connection) which, if not nil, will be called once
        -- before the client socket processing loop.
        pre = nil,
        -- function(connection, data) which, if not nil, will be called
        -- in a loop when new data is received from the client socket.
        -- Connection.forward_to_server(connection, data) will:
        -- 1) Connect the server socket to server_socket_path, if the
        --    server socket is not connected.
        -- 2) Write data to the server socket, if connected and writable.
        func = Connection.forward_to_server,
        -- function(connection) which, if not nil, will be called once
        -- after the client socket processing loop.
        -- Connection.close_client_socket(connection) will shutdown and
        -- close the client socket, if it is connected.
        post = Connection.close_client_socket,
    },
    -- Table describing how to process the server socket. Optional.
    -- Defaults used and described:
    process_server = {
        -- function(connection) which, if not nil, will be called once
        -- before the server socket processing loop.
        pre = nil,
        -- function(connection, data) which, if not nil, will be called
        -- in a loop when new data is received from the server socket.
        -- Connection.forward_to_client(connection, data) will write data
        -- to the client socket, if it is connected and writable.
        func = Connection.forward_to_client,
        -- function(connection) which, if not nil, will be called once
        -- after the server socket processing loop.
        -- Connection.close_server_socket(connection) will shutdown and
        -- close the server socket, if it is connected.
        post = Connection.close_server_socket,
    }
})

-- Bind the client socket (defined by proxy.client_socket_path) and start
-- accepting connections on it in a new fiber. If opts.force is set to
-- true, the proxy.client_socket_path file is removed before binding to
-- it. After the proxy is started it accepts client connections and
-- creates a Connection instance for each connection.
proxy:start({force = false})

-- Stop accepting new connections on the client socket, join the fiber
-- created by proxy:start(), and close the client socket. Also stop all
-- active connections (see Connection:stop()).
proxy:stop()

-- Pause accepting new connections and pause all active connections (see
-- Connection:pause()).
proxy:pause()

-- Resume accepting new connections and resume all paused connections
-- (see Connection:resume()).
proxy:resume()

-- Connection class:
Connection:new({
    {
        -- Socket which is already created (by the Proxy class, for
        -- example). Optional, may be nil.
        client_socket = '?table',
        -- Path to connect the server socket to. The connection is
        -- attempted on initialization and in Connection.forward_to_server.
        -- It can also be made manually by calling
        -- Connection:connect_server_socket().
        server_socket_path = 'string',
        -- See Proxy:new().
        process_client = '?table',
        -- See Proxy:new().
        process_server = '?table',
    },
})

-- Start processing the client socket, using functions from
-- Connection.process_client.
Connection:start()

-- Connect the server socket to Connection.server_socket_path (if not
-- connected already). Start processing the server socket, if successfully
-- connected (using functions from Connection.process_server).
Connection:connect_server_socket()

-- Pause processing packets (both incoming from the client socket and the
-- server socket).
Connection:pause()

-- Resume processing packets (both incoming from the client socket and
-- the server socket).
Connection:resume()

-- Close the server socket, if open.
Connection:close_server_socket()

-- Close the client socket, if open.
Connection:close_client_socket()

-- Close the client and server sockets, if open, and wait for the
-- processing fibers to die.
Connection:stop()
```

NO_DOC=test helpers
NO_CHANGELOG=test helpers
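To make the intended use concrete, here is a minimal sketch of how a test might drive the proxy to simulate a network hang between the instances; it follows the API described in the commit message above, but the paths, the timing, and the surrounding test flow are illustrative assumptions, not taken from the actual test in this patch.

```lua
-- A sketch only: paths and test flow are illustrative.
local Proxy = require('test.luatest_helpers.proxy.proxy')

local proxy = Proxy:new({
    client_socket_path = '/tmp/test-in',   -- replicas connect here
    server_socket_path = '/tmp/test-out',  -- the master listens here
})
proxy:start({force = true})

-- ... start the tarantool instances wired up as in the diagram above ...

-- Freeze traffic without closing the sockets. Unlike dropping
-- box.cfg.replication, the peers get no EOF, so the connection death
-- timeout (and hence fencing) is actually exercised.
proxy:pause()
-- ... wait longer than the death timeout and check the expected state ...
proxy:resume()

proxy:stop()
```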
Currently box_raft asserts that raft is initialized when it is called. For strict fencing, box_raft will be called in replication_disconnect_timeout to set different timeouts for the leader and the followers, and sometimes replication_disconnect_timeout is called before raft is initialized.

This commit changes the behaviour of box_raft: the assertion is removed, and NULL is returned instead of a pointer to the global raft state if raft isn't initialized. This makes it possible to call box_raft even before raft has been initialized, checking that the return value isn't NULL. Assuming the old assertion never triggered anywhere else, there is no need to check for box_raft returning NULL anywhere except in the new calls. Even if that changes in the future, it will trigger a segmentation fault and the problem will be easy to localize.

Part of tarantool#7110

NO_DOC=internal changes
NO_TEST=internal changes
NO_CHANGELOG=internal changes
Force-pushed from a1b59a5 to 017889b
@Gerold103 Thanks for the review! I've fixed the issues you pointed out. Take a look at the proxy commit (lots of renaming there) and at the test in the last commit.
Looking good!
With the current leader fencing implementation, the old leader doesn't resign its leadership before a new leader may be elected. Because of this, several "leaders" might coexist in the replica set for some time.

This commit changes replication_disconnect_timeout so that it is twice as short for the current raft leader (2*replication_timeout) if strict fencing is enabled. Assuming that replication_timeout is the same for every replica in the replica set, this makes it less probable that a new leader can be elected before the old one resigns its leadership.

The old fencing behaviour can be kept by setting fencing to soft mode. This is useful when connection death timeouts shouldn't be affected (e.g. when different replication_timeouts are set to prioritize some replicas over others as leader).

Closes tarantool#7110

@TarantoolBot document
Title: Strict fencing

The `box.cfg` option `election_fencing_enabled` is deprecated in favor of `election_fencing_mode`. `election_fencing_mode` can be set to one of the following values:

'off' - fencing turned off (same as `election_fencing_enabled` set to false before). Connection death timeout is 4*replication_timeout for all nodes.

'soft' (default) - fencing turned on, but the connection death timeout is the same for the leader and the followers in the replica set. This is enough to solve the cluster being read-only and unable to elect a new leader in some situations because of pre-vote. Connection death timeout is 4*replication_timeout for all nodes.

'strict' - fencing turned on. In this mode the leader tries its best to resign leadership before a new leader can be elected. This is achieved by halving the death timeout on the leader. Connection death timeout is 4*replication_timeout for followers and 2*replication_timeout for the current leader.
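For reference, a minimal `box.cfg` sketch of the new option as described above; only `election_fencing_mode` is the subject of this change, the surrounding values (`replication_timeout`, `election_mode`) are illustrative.

```lua
box.cfg{
    replication_timeout = 1,      -- seconds; illustrative value
    election_mode = 'candidate',  -- this node takes part in elections
    -- 'off'    - no fencing; death timeout is 4 * replication_timeout
    -- 'soft'   - fencing on; 4 * replication_timeout everywhere (default)
    -- 'strict' - fencing on; the leader uses 2 * replication_timeout
    election_fencing_mode = 'strict',
}
```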
Force-pushed from 017889b to e34de9f
The current leader fencing implementation didn't really guarantee that the old
leader would resign its leadership before a new leader could be elected.
That made it possible for several "leaders" to coexist in the cluster for some
time.
I changed replication_disconnect_timeout so that it is twice
as short for the current raft leader (2*replication_timeout) if strict
fencing is enabled. Assuming that replication_timeout is the same for every
replica in the replica set, this guarantees that the leader will resign its
leadership before anyone else could start elections.
The old fencing behaviour can be kept by setting fencing to soft mode.
This is useful when connection death timeouts shouldn't be affected
(e.g. when different replication_timeouts are set to prioritize some replicas
over others as leader).
Closes #7110
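To make the timing argument concrete, here is a small illustrative calculation; the 2x and 4x factors come from the description above, while the concrete timeout value is just an assumed example.

```lua
-- Illustrative numbers only.
local replication_timeout = 1  -- seconds, assumed equal on every node

-- With election_fencing_mode = 'strict':
local leader_death_timeout   = 2 * replication_timeout  -- 2 s on the leader
local follower_death_timeout = 4 * replication_timeout  -- 4 s on followers

-- If all links to the leader die at t = 0, the leader notices and resigns
-- by t = 2 s, while no follower can consider the leader dead and start an
-- election before t = 4 s.
assert(leader_death_timeout < follower_death_timeout)
```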