Redundancy for bib database? #113

ruben-herold · 2014-11-03T14:06:01Z

hi,

is there any kind of plan for redundancy, like ipvs+keepalived?
From my point of view, something must replicate the bib database across servers, and
keepalived (VRRP) can do the IP takeovers ...

ydahhrk · 2014-11-03T20:41:07Z

Warning: This is my first time reading about these topics. You might want to correct me if I'm not making sense.

I'm assuming Keepalived is IPVS plus VRRP (I don't know what the "checkers" are, though). IPVS provides server redundancy while VRRP provides router redundancy.

With that in mind...

Er, this looks like two separate questions.

IPVS: Well, IPVS happens to be implemented as a Netfilter module that hooks itself after filtering. Unless we discover a problem with my reasoning later in testing, it looks like Jool and IPVS should coexist in harmony once we can move Jool away from the beginning of Netfilter's input chain (which is how we're planning to fix issue #41, "Need to enable firewall-like features on the NAT64", anyway). So yeah, I guess there are plans to support server-side redundancy. We hadn't considered the test case, however.
And I ~~guess~~ you could always chain Jool and IPVS in separate machines.
VRRP: This one is harder. Because only Jool knows its tables, database synchronization would have to be coded into Jool. Because sessions also contain important information for normal NAT64 operation, the session tables might also have to be synced. Keepalived's VRRP implementation looks reusable though.

Summary: IPVS should work when we fix #41. VRRP sounds like a reasonable objective afterwards.

Connection synchronization hasn't proved to be too heavy?

ruben-herold · 2014-11-04T13:42:52Z

Just to clarify what I mean:

First we need a mechanism for syncing the database and sessions between the servers, like:
http://www.linuxvirtualserver.org/docs/sync.html

Once that is done we have two servers with the same bib and session data, so if someone installs keepalived on both servers, failover is possible, because with VRRP you can transfer the server IP addresses (IPv4 and IPv6) between the servers.

I never wanted to run ipvs on the same systems! It was only an example of how it could work.

ydahhrk · 2014-11-05T22:35:15Z

OK.
Tentatively adding to milestone 3.3.0.

toreanderson · 2014-11-21T20:08:25Z

@ruben-herold, correct me if I'm wrong, but I do not think implementation of this issue really has anything to do with IPVS or VRRP per se. VRRP and IPVS are just one of many methods an administrator can use to fail over traffic from one Jool instance to another. (Another way would be, e.g., to advertise the pool4/pool6 networks from the Jool servers using BGP, and then change the advertisements such that the previously inactive instance suddenly gets all the traffic.) If all the stateful stuff is kept in sync between two (or more) Jool nodes, existing sessions won't drop if traffic suddenly shifts from one instance to another.

I think it would be better to compare this feature to the Netfilter conntrackd, which can do exactly this for Netfilter's connection tracking tables.

Tore

ydahhrk · 2014-11-21T22:32:00Z

I think I'm the one who phrased it poorly in this comment.

I did not mean we wanted to make Keepalived the only possible way to administer redundant Jool instances; I meant we're using it to get acquainted with this kind of setup.

What's stopping us is a couple of loose ends in the database synchronization algorithm (here's a little rant). I understand there's no standard for this, so I guess we'll steal some ideas from conntrackd.

(BTW: It looks like anyone with a GitHub account can edit the wiki, in case you want to use it for something)

toreanderson · 2014-11-22T11:23:47Z

You might want to consider active/active/.../active scenarios as well. For example, if the operator is having a router or switch load balance between multiple Jool instances using ECMP. Perhaps you could facilitate that by announcing any changes to the state table to a multicast group which all the cluster members can subscribe to. In order to limit the amount of state replication traffic, another idea could be to only synchronize long-lived sessions (as it's usually not a problem if short-lived HTTP requests and such get interrupted half-way through).

(Just thinking out loud here.)

Tore
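The two ideas above (announce state changes to a multicast group, and only replicate long-lived sessions) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not how Jool does it: the group address, port, age threshold, session fields, and JSON encoding are all hypothetical.

```python
import json
import socket

# Hypothetical values for illustration; none of these come from Jool.
SYNC_GROUP = "239.1.1.1"   # administratively scoped IPv4 multicast group
SYNC_PORT = 6400
MIN_AGE_SECONDS = 30.0     # sessions younger than this are not replicated


def select_long_lived(sessions, now, min_age=MIN_AGE_SECONDS):
    """Return the sessions that have existed for at least min_age seconds."""
    return [s for s in sessions if now - s["created_at"] >= min_age]


def encode_session(session):
    """Serialize a session dict to bytes (JSON chosen only for readability)."""
    return json.dumps(session, sort_keys=True).encode("utf-8")


def announce(sessions, now):
    """Send every long-lived session to the multicast group.

    Returns the number of sessions announced.
    """
    selected = select_long_lived(sessions, now)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                       socket.IPPROTO_UDP) as sock:
        # TTL 1 keeps the announcements on the local link.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        for session in selected:
            sock.sendto(encode_session(session), (SYNC_GROUP, SYNC_PORT))
    return len(selected)
```

Every cluster member that joins the group would receive the same announcements, which is why this approach does not care whether the setup is active/passive or active/active.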

ruben-herold · 2014-11-24T11:42:29Z

Multicast distribution looks to me like the way to go. It will not break any type of setup: active/active, active/passive, and so on.

If I remember right, Tomcat uses multicast for session replication between cluster nodes.

Fixed at least this batch of bugs:

1. A couple of concurrency issues:
   - can_send was global, shared by all namespaces, where each queue should probably have its own.
   - joold was storing a pointer to the session database, which could disappear from under its nose.
2. foreach_cb() was using a status code as if it were an error code, which led some sessions to be synchronized redundantly.
3. foreach users were not handling negative error codes, which could lead to weird stuff happening.
4. The pending data bit was being set even on packets lacking a response header, which led to some sessions being corrupted.
5. Some continuous spin locking and unlocking. This defeats the point of a semaphore to some extent.

- Fixed a compilation issue included in the previous commit.
- Sessions could theoretically list themselves faster than they could be synchronized. This would strangle the available memory. When this happens the code just drops the sessions now. (From the joold queue, not the session database.) From joold's point of view this is also pretty bad, but at least it doesn't bring the whole kernel down.
- Tweaked the configuration options a bit. The userspace app isn't aware of these changes yet though.
- --advertise was skipping sessions.
- Added lots of documentation and made the design of the joold_node structure a little more intuitive and less error-prone.
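The "drop from the queue instead of exhausting memory" behavior can be illustrated with a small bounded queue. This is a Python sketch under assumptions, not joold's kernel code: the class name, the capacity parameter, and the choice to drop the newest arrival (rather than the oldest entry) are all hypothetical.

```python
from collections import deque


class BoundedSyncQueue:
    """Queue of sessions waiting to be synchronized, capped at `capacity`.

    When the queue is full, new sessions are dropped from the queue only;
    the caller's session database is not touched.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.pending = deque()
        self.dropped = 0

    def push(self, session):
        """Queue a session; return False (and count a drop) if full."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1
            return False
        self.pending.append(session)
        return True

    def pop_batch(self, max_items):
        """Remove and return up to max_items sessions in arrival order."""
        batch = []
        while self.pending and len(batch) < max_items:
            batch.append(self.pending.popleft())
        return batch
```

A dropped session simply never reaches the peers, so a failover would lose that connection, but memory use stays bounded no matter how fast sessions are created.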

- Added the configuration options in the userspace app.
- Removed a redundant ACK to an ACK in the kernel.
- Fixed lots of flushing logic; it was preventing lots of syncs when needed and committing other syncs when not needed.
- Improved synchronization of lifetimes. It's fairly well tested. I'm still not happy with the synchronized timeouts during an --advertise though.

There were quite a few small errors synchronizing the session timeouts. Also removed the sessions' creation time, since it's no longer being used. I think all that's left is to figure out where Rob came up with 2048 as the maximum packet size limit, and to try a better number that is less likely to induce fragmentation. It should be ready after that.
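As a rough worked example of why 2048 can induce fragmentation: the largest UDP payload that fits in a single IP packet is the link MTU minus the IP and UDP headers. The sketch below assumes an IPv4 header without options (20 bytes), an IPv6 header without extension headers (40 bytes), and an 8-byte UDP header; the function name is made up for illustration.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, fixed header without extension headers
UDP_HEADER = 8    # bytes


def max_udp_payload(mtu, ipv6=False):
    """Largest UDP payload that fits in one unfragmented IP packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    payload = mtu - ip_header - UDP_HEADER
    if payload <= 0:
        raise ValueError("MTU too small for IP and UDP headers")
    return payload
```

With a typical Ethernet MTU of 1500, `max_udp_payload(1500)` gives 1472 for IPv4 and `max_udp_payload(1500, ipv6=True)` gives 1452 for IPv6, so a 2048-byte message would not fit in one packet on such a link.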

Still finding flaws, unfortunately.

- As stated earlier, the Netlink message maximum was inducing fragmentation when the packet became IP/UDP. Added a new configuration flag, --synch-max-payload.
- joold was breaking the assumption of the session expiration queue (that sessions always arrive sorted by expiration date). So when joold adds a session to the database, its correct slot now has to be sequentially searched.
- Packed the joold_session structure. The structure had a different size depending on the machine's word size, which would never do since it's part of a communication protocol.
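To illustrate why a wire structure needs a fixed, padding-free layout, here is a Python sketch using the standard `struct` module. The record layout and field names are hypothetical and are not the real joold_session; the point is only that the `!` prefix selects network byte order with standard sizes and no alignment padding, so the encoded size does not depend on the machine.

```python
import ipaddress
import struct

# Hypothetical session record (NOT the real joold_session layout):
# IPv6 address, IPv6 port, IPv4 address, IPv4 port, protocol, lifetime in ms.
# "!" means network byte order, standard sizes, no padding.
SESSION_FORMAT = "!16sH4sHBI"
SESSION_SIZE = struct.calcsize(SESSION_FORMAT)


def pack_session(addr6, port6, addr4, port4, proto, lifetime_ms):
    """Encode one session into a fixed-size byte string."""
    return struct.pack(
        SESSION_FORMAT,
        ipaddress.IPv6Address(addr6).packed, port6,
        ipaddress.IPv4Address(addr4).packed, port4,
        proto, lifetime_ms,
    )


def unpack_session(data):
    """Decode a byte string produced by pack_session."""
    raw6, port6, raw4, port4, proto, lifetime_ms = struct.unpack(
        SESSION_FORMAT, data)
    return (str(ipaddress.IPv6Address(raw6)), port6,
            str(ipaddress.IPv4Address(raw4)), port4,
            proto, lifetime_ms)


def sessions_per_packet(max_payload):
    """How many whole session records fit in a payload of max_payload bytes."""
    return max_payload // SESSION_SIZE
```

Because every record has the same size on every node, a payload limit such as the one set by --synch-max-payload translates directly into a whole number of sessions per packet.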

ydahhrk · 2016-09-26T22:26:36Z

https://jool.mx/en/session-synchronization.html

ydahhrk added the New feature label Nov 3, 2014

ydahhrk self-assigned this Nov 3, 2014

ydahhrk added this to the 3.3.0 milestone Nov 5, 2014

ydahhrk assigned dhfelix and unassigned ydahhrk Nov 14, 2014

ydahhrk mentioned this issue Nov 21, 2014

Feature request: Support for RFC 6145 (SIIT - Stateless IP/ICMP Translation) #116

Closed

ydahhrk modified the milestones: 3.4.0, 3.3.0 Nov 24, 2014

ydahhrk unassigned dhfelix Mar 9, 2015

ydahhrk assigned rolivasnic Nov 12, 2015

ydahhrk mentioned this issue Nov 12, 2015

Add Device Driver mode #140

Open

ydahhrk modified the milestones: 4.1.0, 3.5.0 Nov 23, 2015

ydahhrk added the Merged (needs review) label Feb 17, 2016

ydahhrk added Status: Tested Needs release and removed Merged (needs testing) labels Sep 5, 2016

ydahhrk unassigned rolivasnic Sep 26, 2016

ydahhrk closed this as completed Sep 26, 2016

ydahhrk removed the Status: Tested Needs release label Sep 26, 2016
