- Install MariaDB
- Slurm database & QOS
- Setup Slurm PAM plugin
- Accounting in Slurm
Install MariaDB now if you want to store the accounting data that Slurm provides. I only install it on the server node, buhpc3, which also serves as our SlurmDB node.
yum install mariadb-server mariadb-devel -y
We’ll set up MariaDB later. It only needs to be installed before the Slurm RPMs are built.
Make sure the MariaDB packages were installed before you built the Slurm RPMs:
rpm -q mariadb-server mariadb-devel
rpm -ql slurm-sql | grep accounting_storage_mysql.so
Start the MariaDB service:
systemctl start mariadb
systemctl enable mariadb
systemctl status mariadb
Make sure to configure the MariaDB database's root password as instructed at first invocation of the mariadb service, or run this command:
/usr/bin/mysql_secure_installation
Choose a suitable database password for the slurm user. Now follow the accounting page instructions (using -p to enter the database password):
# mysql -p
mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'password_of_database' with grant option;
mysql> SHOW VARIABLES LIKE 'have_innodb';
mysql> create database slurm_acct_db;
mysql> quit;
The following is recommended for /etc/my.cnf, but on CentOS 7 you should create a new file /etc/my.cnf.d/innodb.cnf containing:
[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
The innodb_buffer_pool_size might be even larger, like 50%-80% of the server's RAM size.
To implement this change you have to shut down the database and remove logfiles:
systemctl stop mariadb
rm /var/lib/mysql/ib_logfile?
systemctl start mariadb
You can check the current setting in MySQL like so:
mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
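The 50%-80% guideline for innodb_buffer_pool_size can be turned into a concrete number with a little shell arithmetic. This is only a sketch: the helper names pool_size_mb and mb_to_bytes are hypothetical, and the 16384 MB figure is an assumed example, not a measurement from buhpc3. The byte conversion is included because SHOW VARIABLES reports the buffer pool size in bytes.

```shell
# Hypothetical helper: print an innodb_buffer_pool_size value (in MB, with
# the "M" suffix used in innodb.cnf) for a given total RAM in MB and a
# percentage of that RAM.
pool_size_mb() {
    total_mb=$1
    percent=$2
    echo "$(( total_mb * percent / 100 ))M"
}

# Hypothetical helper: convert MB to bytes, to compare a configured value
# with what SHOW VARIABLES prints.
mb_to_bytes() {
    echo "$(( $1 * 1024 * 1024 ))"
}

# Assumed example: a server with 16384 MB of RAM, using 50% of it.
pool_size_mb 16384 50
# The 1024M from the example config, expressed in bytes.
mb_to_bytes 1024
```

On Linux, the total RAM in MB can be read with awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo and passed as the first argument.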
While slurmdbd will work with a flat text file for recording job completions and such, this configuration will not allow "associations" between a user and an account. A database allows such a configuration.
MySQL or MariaDB is the preferred database. To enable this database support one only needs to have the development package for the database they wish to use on the system. Slurm uses the InnoDB storage engine in MySQL to make rollback possible. This must be available on your MySQL installation or rollback will not work.
slurmdbd requires its own configuration file called slurmdbd.conf. Start by copying the example file:
cp /etc/slurm/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
The file slurmdbd.conf should be only on the computer where slurmdbd executes and should only be readable by the user which executes slurmdbd (e.g. "slurm"). It must be protected from unauthorized access since it contains a database login name and password. See the slurmdbd.conf man-page for a more complete description of the configuration parameters.
Set up files and permissions:
chown slurm: /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
touch /var/log/slurm/slurmdbd.log
chown slurm: /var/log/slurm/slurmdbd.log
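Since slurmdbd.conf holds the database password, it can be worth checking that the mode really ended up as 600. The following is a minimal sketch, assuming GNU stat (as shipped on CentOS); check_conf_mode is a hypothetical helper name, and the demo runs on a throwaway temp file rather than the real config.

```shell
# Hypothetical helper: report whether a file has mode 600.
# Returns 0 if it does, 1 if the mode differs, 2 if stat fails.
check_conf_mode() {
    file=$1
    # GNU stat: %a prints the octal permission bits, e.g. 600
    mode=$(stat -c '%a' "$file") || return 2
    if [ "$mode" = "600" ]; then
        echo "ok: $file is mode 600"
    else
        echo "warning: $file is mode $mode, expected 600"
        return 1
    fi
}

# Demo on a temporary file standing in for the config file.
demo=$(mktemp)
chmod 600 "$demo"
check_conf_mode "$demo"
rm -f "$demo"
```

On the slurmdbd host the same check would be run as check_conf_mode /etc/slurm/slurmdbd.conf.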
Configure some of the /etc/slurm/slurmdbd.conf variables:
DbdHost=XXXX # Replace by the slurmdbd server hostname
SlurmUser=slurm
StorageHost=localhost
StoragePass=password # The above defined database password
StorageLoc=slurm_acct_db
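Because the example file may leave some of these variables commented out, a quick scan for missing keys can catch an incomplete slurmdbd.conf before the daemon is started. This is a sketch only: missing_keys is a hypothetical helper, it matches plain Key=value lines without regard to case, and it does not validate the values themselves.

```shell
# Hypothetical helper: print every key from the argument list that has no
# "Key=value" line in the given config file (one missing key per line).
missing_keys() {
    conf=$1
    shift
    for key in "$@"; do
        # -i: ignore case; leading whitespace before the key is tolerated
        grep -qi "^[[:space:]]*${key}=" "$conf" || echo "$key"
    done
}

# Demo on a temporary file that lacks StoragePass and StorageLoc.
demo=$(mktemp)
printf 'DbdHost=XXXX\nSlurmUser=slurm\nStorageHost=localhost\n' > "$demo"
missing_keys "$demo" DbdHost SlurmUser StorageHost StoragePass StorageLoc
rm -f "$demo"
```

Against the real file the call would be missing_keys /etc/slurm/slurmdbd.conf DbdHost SlurmUser StorageHost StoragePass StorageLoc; no output means all five keys are set.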
If you run the slurmdbd database daemon on the same server as the slurmctld service, the database must be started first. In addition, all Slurm daemons require the MUNGE service.
Locally customized systemd files must be placed in /etc/systemd/system/, and slurmdbd must depend on the database service, so a more correct solution is:
Copy the delivered service files:
cp /usr/lib/systemd/system/slurmctld.service /usr/lib/systemd/system/slurmd.service /usr/lib/systemd/system/slurmdbd.service /etc/systemd/system/
Add the prerequisite After= services to the file /etc/systemd/system/slurmdbd.service:
[Unit]
Description=Slurm DBD accounting daemon
After=network.target mariadb.service
ConditionPathExists=/etc/slurm/slurmdbd.conf
...
On compute nodes /etc/systemd/system/slurmd.service should be modified:
[Unit]
Description=Slurm node daemon
After=network.target munge.service
ConditionPathExists=/etc/slurm/slurm.conf
...
Only if you use a local database, add the prerequisite After= services to the file /etc/systemd/system/slurmctld.service so that slurmctld depends on the slurmdbd and MUNGE services:
[Unit]
Description=Slurm controller daemon
After=network.target slurmdbd.service munge.service
ConditionPathExists=/etc/slurm/slurm.conf
...
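After editing the copied unit files, it is easy to check that the intended services actually appear on the After= line. The sketch below parses a unit file directly; unit_starts_after is a hypothetical helper, and it assumes a single After= line with space-separated entries, as in the examples above.

```shell
# Hypothetical helper: succeed if the After= line of a unit file names the
# given service. Assumes one After= line with space-separated entries.
unit_starts_after() {
    unit_file=$1
    service=$2
    # Split "After=a b c" into one token per line, then look for an exact,
    # literal match of the service name.
    grep '^After=' "$unit_file" | tr ' =' '\n\n' | grep -qxF "$service"
}

# Demo on a temporary file standing in for slurmdbd.service.
demo=$(mktemp)
printf '[Unit]\nAfter=network.target mariadb.service\n' > "$demo"
if unit_starts_after "$demo" mariadb.service; then
    echo "ordering ok"
else
    echo "mariadb.service missing from After="
fi
rm -f "$demo"
```

Note that systemd only picks up edited unit files after systemctl daemon-reload has been run, and systemctl show -p After slurmdbd.service prints the ordering that systemd is actually using.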
Start the slurmdbd service:
systemctl enable slurmdbd
systemctl start slurmdbd
systemctl status slurmdbd
In slurm.conf you must configure accounting so that the database is used through the slurmdbd database daemon:
AccountingStorageType=accounting_storage/slurmdbd
Setting this up will help block users from casually accessing the compute nodes.
# ssh test@buhpc1
test@buhpc1's password:
Access denied: user alicia (uid=1450) has no active jobs on this node.
Connection closed by 192.168.126.32
First make sure Slurm’s PAM module has been installed. It is supplied by the slurm-pam_slurm package:
# ls -l /usr/lib64/security/pam_slurm_adopt.so
-rwxr-xr-x. 1 root root 26368 May 25 14:27 /usr/lib64/security/pam_slurm_adopt.so
Enable PAM module in Slurm:
## add a line in /etc/slurm/slurm.conf
UsePAM=1
Enable the PAM module for sshd:
## on all nodes other than login nodes, add this line in /etc/pam.d/sshd before any other account setting
account required pam_slurm.so
Configure the pam_access module to always allow the admin group (hpcadmins). If you have other rules in access.conf, be careful with their ordering, because only the first matching rule applies:
# cat /etc/security/access.conf
+ : root (hpcadmins) : ALL
- : ALL : ALL
Add a pam.d file for slurm: /etc/pam.d/slurm
auth required pam_localuser.so
account required pam_unix.so
session required pam_limits.so
Enable Accounting in Slurm:
# add these lines in /etc/slurm/slurm.conf
# ACCOUNTING
AccountingStorageEnforce=1
AccountingStorageLoc=/opt/slurm/acct
AccountingStorageType=accounting_storage/slurmdbd
JobCompLoc=/opt/slurm/jobcomp
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
Restart the services:
systemctl restart slurmctld ## on the control node
systemctl restart slurmd
systemctl restart slurmdbd