
PBM communication Internals

Akira Kurogane edited this page Oct 21, 2019

PBM Connection topology

pbm-agent processes first connect to a single mongod process on localhost, and then automatically create connections to some other nodes in the same cluster (if it is a cluster). As such, pbm-agents only participate in the backups and restores of a single cluster or non-sharded replicaset.

The pbm CLI also only works with a single cluster or non-sharded replicaset at a time. It is a stateless command, though, so it can connect to different clusters from one invocation to the next without the operations on those clusters or non-sharded replicasets conflicting with each other.

Definition: "the replica set with the PBM control collections" means the configsvr replica set in a cluster, or the replicaset itself if it is a non-sharded replicaset.

The pbm CLI connects to the replica set with the PBM control collections using a replica set connection, i.e. a URI with a "replicaSet=XXXXX" option in it. (This means it will automatically find the current primary, and automatically switch to the new primary if there is an election.)

The pbm-agent processes connect using a standalone connection, and this should only be to localhost. If an agent is connected to a shard node then, once that connection is established, it uses the info in the {"_id": "shardIdentity"} document of the admin.system.version collection and/or the isMaster command to automatically make a new replicaset connection to the replica set with the PBM control collections.
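The shard-to-configsvr hop starts by turning the config server address recorded on the shard into a replica set name plus a host list. The sketch below assumes the shardIdentity document stores it in a `configsvrConnectionString` field of the form `<replSetName>/<host:port>,<host:port>`, which is how mongod records it as far as we know. The function `parseRSConnString` and the host names are hypothetical; this is an illustration, not the code in pbm.go's New().

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseRSConnString splits a replica set seed string such as
// "configrs/cfgA:27019,cfgB:27019" into the set name and its hosts.
// Assumption: this is the format of shardIdentity.configsvrConnectionString.
func parseRSConnString(s string) (rsName string, hosts []string, err error) {
	name, hostList, ok := strings.Cut(s, "/")
	if !ok || name == "" || hostList == "" {
		return "", nil, fmt.Errorf("not a <replSet>/<hosts> string: %q", s)
	}
	for _, h := range strings.Split(hostList, ",") {
		if h = strings.TrimSpace(h); h != "" {
			hosts = append(hosts, h)
		}
	}
	if len(hosts) == 0 {
		return "", nil, errors.New("no hosts in connection string")
	}
	return name, hosts, nil
}

func main() {
	// A value as it might appear in {"_id": "shardIdentity"} in admin.system.version
	// (hypothetical host names).
	rs, hosts, err := parseRSConnString("configrs/configsvrA:27019,configsvrB:27019,configsvrC:27019")
	if err != nil {
		panic(err)
	}
	fmt.Println(rs, hosts)
}
```

With the set name and hosts in hand, the agent can open the replica set connection described above; an agent that is not on a shard can skip this step.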

Relevant code: pbm.go's New()

Authentication and Authorization

The ability to connect to the mongod nodes and make reads and writes in the PBM control collections is the only form of authentication and authorization used by PBM.

The pbm CLI and pbm-agent processes both authenticate as a MongoDB user. No specific user name is required, but it should be a user defined in the "admin" database.

N.b. In a cluster the user needs to be created on every shard as well as on the configsvr replicaset. So connect to the primary of the configsvr replicaset and run the createUser command there, then repeat the same thing for every shard. (This is a requirement for DBA-use accounts in general with MongoDB; it is not special to PBM.)

Programmatic reuse of same user name and password

When pbm-agent automatically makes new connections to other parts of the topology, it reuses the same username and password. E.g. from the URI mongodb://myuser:mypass@localhost:27018/ it makes the new URI mongodb://myuser:mypass@configsvrA:27019,.../?replicaSet=configrs. In theory the pbm CLI and pbm-agent could connect as different users, but this is not tested. There may also be slight differences between the privileges the pbm CLI uses and those the pbm-agent processes use, but as of v1.0 there is no plan to separate and reduce them. This part is being kept simple for now.
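The credential reuse in that example can be sketched with Go's standard net/url package: take the userinfo from the localhost URI and attach it to a new multi-host URI carrying the replicaSet option. This is a minimal illustration, not PBM's actual code; the function `reuseCredentials` and the second host `configsvrB:27019` are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// reuseCredentials builds a replica set URI for rsName/hosts that carries
// the same username and password as localURI (hypothetical helper).
func reuseCredentials(localURI, rsName string, hosts []string) (string, error) {
	local, err := url.Parse(localURI)
	if err != nil {
		return "", err
	}
	u := url.URL{
		Scheme:   "mongodb",
		User:     local.User,               // same user:password as the localhost connection
		Host:     strings.Join(hosts, ","), // seed list of the target replica set
		Path:     "/",
		RawQuery: url.Values{"replicaSet": {rsName}}.Encode(),
	}
	return u.String(), nil
}

func main() {
	uri, err := reuseCredentials("mongodb://myuser:mypass@localhost:27018/", "configrs",
		[]string{"configsvrA:27019", "configsvrB:27019"})
	if err != nil {
		panic(err)
	}
	fmt.Println(uri) // mongodb://myuser:mypass@configsvrA:27019,configsvrB:27019/?replicaSet=configrs
}
```

Copying the parsed Userinfo rather than re-splitting the string keeps any percent-encoded characters in the password intact.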

Relevant code: pbm.go's New()

Required role grants

The roles that PBM uses are: "readWrite" on every collection in the "admin" db, plus the built-in roles "backup", "restore" and "clusterMonitor". The restore stages need the most permissions. In theory the permissions could be reduced, as long as the PBM user can still upgrade its own privileges during restores (by having the userAdmin privilege in "admin" or the ability to "readWrite" on admin.system.users), but as of v1.0 the person installing PBM must grant the complete roles from the start.
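The grants listed above map onto a single createUser command, which (per the note about clusters) would be run against the "admin" database on the configsvr primary and again on each shard's primary. The sketch below only builds and prints that command document as JSON for illustration; the user name "pbmuser" and password "secretpwd" are placeholders, and a real client would send the document as an ordered command (for example a bson.D with the Go driver) rather than JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roleSpec is one entry of the createUser "roles" array.
type roleSpec struct {
	Role string `json:"role"`
	DB   string `json:"db"`
}

// createUserCmd mirrors the shape of MongoDB's createUser command.
// "createUser" must be the first field of the command; encoding/json
// emits struct fields in declaration order, so it comes first here.
type createUserCmd struct {
	CreateUser string     `json:"createUser"`
	Pwd        string     `json:"pwd"`
	Roles      []roleSpec `json:"roles"`
}

// pbmUserCmd returns a createUser command granting the roles listed for PBM v1.0.
func pbmUserCmd(user, pwd string) createUserCmd {
	return createUserCmd{
		CreateUser: user,
		Pwd:        pwd,
		Roles: []roleSpec{
			{Role: "readWrite", DB: "admin"}, // readWrite on the admin db's collections
			{Role: "backup", DB: "admin"},
			{Role: "restore", DB: "admin"},
			{Role: "clusterMonitor", DB: "admin"},
		},
	}
}

func main() {
	b, err := json.MarshalIndent(pbmUserCmd("pbmuser", "secretpwd"), "", "  ")
	if err != nil {
		panic(err)
	}
	// Run against the admin db on the configsvr primary, then on every shard primary.
	fmt.Println(string(b))
}
```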

(A cluster or non-sharded replicaset that does not have authorization enabled should presumably allow the pbm-agent and pbm CLI to connect and do everything they need, but this has not been tested.)

PBM control collections

Communication between the pbm CLI and the pbm-agent processes is done via collections in the cluster or non-sharded replicaset itself. The CLI starts an operation by inserting a new pbmCmd document. The agents are always watching this collection, and respond to each new command. They in turn update other collections as they proceed.
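The command flow can be modeled in-process to show its key property: one insert by the CLI is seen by every watching agent. This is a toy simulation, not PBM code. Go channels stand in for the agents' watch on admin.pbmCmd (this page does not say which watch mechanism the agents use), and the types `Cmd` and `cmdCollection` and the function `runAgents` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Cmd is a hypothetical, simplified stand-in for a pbmCmd document.
type Cmd struct {
	Op   string // "backup" or "restore"
	Name string
}

// cmdCollection simulates admin.pbmCmd: every inserted document is
// delivered to every watcher, as each agent's watch would see it.
type cmdCollection struct {
	mu       sync.Mutex
	watchers []chan Cmd
}

func (c *cmdCollection) Watch() <-chan Cmd {
	ch := make(chan Cmd, 8)
	c.mu.Lock()
	c.watchers = append(c.watchers, ch)
	c.mu.Unlock()
	return ch
}

func (c *cmdCollection) Insert(cmd Cmd) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, ch := range c.watchers {
		ch <- cmd
	}
}

func (c *cmdCollection) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, ch := range c.watchers {
		close(ch)
	}
}

// runAgents starts n watching agents, lets the "CLI" insert cmds,
// and returns the commands each agent saw, in order.
func runAgents(n int, cmds []Cmd) [][]Cmd {
	coll := &cmdCollection{}
	seen := make([][]Cmd, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		ch := coll.Watch() // subscribe before the CLI inserts anything
		wg.Add(1)
		go func(id int, ch <-chan Cmd) {
			defer wg.Done()
			for cmd := range ch {
				seen[id] = append(seen[id], cmd) // a real agent would start the operation here
			}
		}(i, ch)
	}
	for _, cmd := range cmds {
		coll.Insert(cmd) // the pbm CLI's only action: insert a command document
	}
	coll.Close()
	wg.Wait()
	return seen
}

func main() {
	for i, s := range runAgents(3, []Cmd{{Op: "backup", Name: "example"}}) {
		fmt.Printf("agent %d saw %v\n", i, s)
	}
}
```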

Collection | Purpose
admin.pbmConfig | Stores a single document, with the remote storage config as one nested document.
admin.pbmCmd | Holds the documents inserted by the pbm CLI to start an operation (backup or restore).
admin.pbmOp | Lock structure. For each replicaset, the "winning" agent that does the backup is the one that writes itself in first.
admin.pbmBackup | The status/log of an operation. Contains the op type and parameters (e.g. the remote storage being saved to, for a backup op). Each replicaset has its own processing state in a nested object.
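The "first writer wins" rule for admin.pbmOp can be simulated with concurrent agents racing for one lock per replicaset. This sketch assumes the lock acts like an insert that fails on a duplicate key per replicaset name; the real pbmOp document schema is not described on this page, and the names `opLock`, `TryAcquire` and `race` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// opLock simulates admin.pbmOp under the assumption of at most one lock
// entry per replicaset: the first write for a replicaset succeeds,
// later ones fail as a duplicate key would.
type opLock struct {
	mu     sync.Mutex
	holder map[string]string // replicaset name -> winning agent
}

// TryAcquire returns true only for the first agent that writes itself in for rs.
func (l *opLock) TryAcquire(rs, agent string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, taken := l.holder[rs]; taken {
		return false // another agent of this replicaset already won
	}
	l.holder[rs] = agent
	return true
}

// race lets every agent of every replicaset try the lock concurrently
// and returns the winner per replicaset.
func race(agents map[string][]string) map[string]string {
	lock := &opLock{holder: map[string]string{}}
	var wg sync.WaitGroup
	for rs, names := range agents {
		for _, name := range names {
			wg.Add(1)
			go func(rs, name string) {
				defer wg.Done()
				lock.TryAcquire(rs, name)
			}(rs, name)
		}
	}
	wg.Wait()
	return lock.holder
}

func main() {
	winners := race(map[string][]string{
		"configrs": {"cfgA:27019", "cfgB:27019", "cfgC:27019"},
		"shard1rs": {"s1a:27018", "s1b:27018", "s1c:27018"},
	})
	for rs, w := range winners {
		fmt.Printf("%s: backup done by %s\n", rs, w)
	}
}
```

Which agent wins is nondeterministic; the only guarantee being modeled is exactly one winner per replicaset.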

See Backup stage Internals and Restore stage Internals.

Handy external references: