Setup Slurm PAM Plugin & QOS & Accounting in Slurm

  1. Install MariaDB
  2. Slurm database & QOS
  3. Setup Slurm PAM plugin
  4. Accounting in Slurm

 

Install MariaDB

You can install MariaDB to store the accounting data that Slurm collects. If you want to keep accounting data, now is the time to install it. I install it only on the server node, buhpc3, which I also use as our SlurmDBD node.

yum install mariadb-server mariadb-devel -y

We'll set up MariaDB later; it just needs to be installed before the Slurm RPMs are built.

 

Slurm database & QOS

Make sure the MariaDB packages were installed before you built the Slurm RPMs:

rpm -q mariadb-server mariadb-devel
rpm -ql slurm-sql | grep accounting_storage_mysql.so

Start the MariaDB service:

systemctl start mariadb
systemctl enable mariadb
systemctl status mariadb

Make sure to configure the MariaDB database's root password as instructed at first invocation of the mariadb service, or run this command:

/usr/bin/mysql_secure_installation 

Choose a suitable database password for the slurm user. Then follow the instructions from the Slurm accounting page (-p makes mysql prompt for the MariaDB root password):

 # mysql -p
mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'password_of_database' with grant option;
mysql> SHOW VARIABLES LIKE 'have_innodb';
mysql> create database slurm_acct_db;
mysql> quit;
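If you prefer to run these statements non-interactively, a small shell function can print the SQL for you to pipe into mysql. This is a minimal sketch, not part of Slurm or MariaDB: slurm_db_sql is a hypothetical helper name, it creates the database before granting (using "if not exists" so a re-run is harmless), and it simply rejects passwords containing a single quote or backslash instead of escaping them.

```shell
#!/usr/bin/env bash
# Sketch: print the accounting-database SQL with the slurm user's password
# filled in. slurm_db_sql is a hypothetical helper, not a Slurm/MariaDB tool.
slurm_db_sql() {
  local db_pass="$1"
  # Refuse passwords that would need escaping inside a SQL string literal.
  case "$db_pass" in
    "" | *"'"* | *'\'*)
      echo "slurm_db_sql: password is empty or contains ' or \\" >&2
      return 1
      ;;
  esac
  cat <<SQL
create database if not exists slurm_acct_db;
grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by '${db_pass}' with grant option;
SHOW VARIABLES LIKE 'have_innodb';
SQL
}
```

Example use, which reads the slurm user's password without echoing it and keeps it out of the shell history; mysql then prompts for the MariaDB root password:

read -rs DBPASS
slurm_db_sql "$DBPASS" | mysql -u root -p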
MySQL configuration

The following is recommended for /etc/my.cnf, but on CentOS 7 you should create a new file /etc/my.cnf.d/innodb.cnf containing:

[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900 

The innodb_buffer_pool_size can be even larger, for example 50%-80% of the server's RAM.
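To turn that percentage into a concrete value, the sketch below reads MemTotal from /proc/meminfo (Linux only) and prints an innodb.cnf body with the buffer pool sized accordingly. buffer_pool_mb and suggest_innodb_cnf are hypothetical helper names; the other two settings are copied unchanged from the file above, and the result is rounded down to whole megabytes.

```shell
#!/usr/bin/env bash
# Sketch: size innodb_buffer_pool_size as a percentage of physical RAM.
# buffer_pool_mb and suggest_innodb_cnf are hypothetical helper names.

# buffer_pool_mb TOTAL_KB PERCENT -> whole megabytes (integer arithmetic)
buffer_pool_mb() {
  local total_kb="$1" pct="$2"
  echo $(( total_kb * pct / 100 / 1024 ))
}

# suggest_innodb_cnf [PERCENT] -> innodb.cnf body sized from MemTotal (Linux)
suggest_innodb_cnf() {
  local pct="${1:-50}" total_kb
  total_kb="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)" || return 1
  [ -n "$total_kb" ] || return 1
  cat <<EOF
[mysqld]
innodb_buffer_pool_size=$(buffer_pool_mb "$total_kb" "$pct")M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
EOF
}
```

For example, suggest_innodb_cnf 50 > /etc/my.cnf.d/innodb.cnf writes the file for a 50% setting; on a 16 GiB server that gives a buffer pool of 8192M.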

To apply this change, shut down the database and remove the old InnoDB log files before starting it again:

systemctl stop mariadb
rm /var/lib/mysql/ib_logfile?
systemctl start mariadb 

You can check the current setting in MySQL like so:

mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SlurmDBD Configuration

While slurmdbd will work with a flat text file for recording job completions and similar data, this configuration does not allow "associations" between a user and an account. A database allows such a configuration.

MySQL or MariaDB is the preferred database. To enable database support, you only need the development package for your chosen database installed on the system where Slurm is built. Slurm uses the InnoDB storage engine in MySQL to make rollback possible; InnoDB must be available in your MySQL installation or rollback will not work.

slurmdbd requires its own configuration file called slurmdbd.conf. Start by copying the example file:

cp /etc/slurm/slurmdbd.conf.example /etc/slurm/slurmdbd.conf 

The slurmdbd.conf file should exist only on the computer where slurmdbd runs, and it should be readable only by the user that runs slurmdbd (e.g. "slurm"). Protect it from unauthorized access, since it contains a database login name and password. See the slurmdbd.conf man page for a complete description of the configuration parameters.

Set up files and permissions:

chown slurm: /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
touch /var/log/slurm/slurmdbd.log
chown slurm: /var/log/slurm/slurmdbd.log
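The copy, chown, chmod and touch steps above can be gathered into one function that is safe to re-run. This is a sketch: prepare_slurmdbd_files is a hypothetical helper, its optional arguments (config directory, log directory, owner) exist only so it can be tried on a scratch directory, and the defaults match the paths used in this guide. It copies the example file only when slurmdbd.conf does not exist yet, so edits you already made are kept.

```shell
#!/usr/bin/env bash
# Sketch: create slurmdbd.conf and slurmdbd.log with the ownership and mode
# used in this guide. prepare_slurmdbd_files is a hypothetical helper.
prepare_slurmdbd_files() {
  local etc_dir="${1:-/etc/slurm}" log_dir="${2:-/var/log/slurm}" owner="${3:-slurm}"
  local conf="$etc_dir/slurmdbd.conf" log="$log_dir/slurmdbd.log"
  # Copy the example only once, so an already edited slurmdbd.conf survives.
  if [ ! -e "$conf" ]; then
    cp "$etc_dir/slurmdbd.conf.example" "$conf" || return 1
  fi
  # The file holds the database password: owner-only read/write.
  chown "$owner": "$conf" && chmod 600 "$conf" || return 1
  # Make sure the log file exists and belongs to the slurmdbd user.
  mkdir -p "$log_dir" && touch "$log" && chown "$owner": "$log"
}
```

Run as root with no arguments, it works on /etc/slurm and /var/log/slurm for the slurm user.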

 

Configure some of the /etc/slurm/slurmdbd.conf variables:

DbdHost=XXXX    # Replace with the slurmdbd server hostname (buhpc3 in this guide)
SlurmUser=slurm
StorageHost=localhost
StoragePass=password    # The slurm database password chosen above
StorageLoc=slurm_acct_db
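If you want to script these edits rather than make them by hand, a helper like the one below sets KEY=VALUE in the file. It is a minimal sketch under stated assumptions: set_conf_key is hypothetical, it matches keys case-insensitively (Slurm documents its parameter names as case-insensitive), it leaves commented-out lines alone, and it writes the result back with cat so slurmdbd.conf keeps its owner and 600 mode.

```shell
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in a Slurm-style config file such as slurmdbd.conf.
# set_conf_key is a hypothetical helper. It replaces the first active
# (uncommented) KEY= line, drops later active duplicates, and appends
# KEY=VALUE if no active line exists. Commented-out lines are untouched.
set_conf_key() {
  local file="$1" tmp
  tmp="$(mktemp)" || return 1
  # Pass key/value through the environment so awk does not treat
  # backslashes in a password as escape sequences.
  K="$2" V="$3" awk '
    BEGIN { k = ENVIRON["K"]; v = ENVIRON["V"]; prefix = tolower(k) "="; done = 0 }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (tolower(substr(line, 1, length(prefix))) == prefix) {
        if (!done) { print k "=" v; done = 1 }
        next
      }
      print
    }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Overwrite in place so ownership and permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, set_conf_key /etc/slurm/slurmdbd.conf StorageHost localhost sets one of the values above; repeat it for each key.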
Customize the Slurm service files

If the slurmdbd daemon and its database run on the same server as the slurmctld service, the database must be started first. In addition, all Slurm daemons require the MUNGE service.

Locally customized systemd unit files must be placed in /etc/systemd/system/, where they take precedence over the packaged units in /usr/lib/systemd/system/. Since slurmdbd must also depend on the database service, proceed as follows:

Copy the delivered service files:

cp /usr/lib/systemd/system/slurmctld.service /usr/lib/systemd/system/slurmd.service /usr/lib/systemd/system/slurmdbd.service /etc/systemd/system/ 

Add the prerequisite After= services to the file /etc/systemd/system/slurmdbd.service:

[Unit]
Description=Slurm DBD accounting daemon
After=network.target mariadb.service
ConditionPathExists=/etc/slurm/slurmdbd.conf
... 

On compute nodes, modify /etc/systemd/system/slurmd.service so that slurmd starts after MUNGE:

[Unit]
Description=Slurm node daemon
After=network.target munge.service
ConditionPathExists=/etc/slurm/slurm.conf
... 

Only if you use a local database, add the prerequisite After= services to the file /etc/systemd/system/slurmctld.service so that slurmctld depends on the slurmdbd and MUNGE services:

[Unit]
Description=Slurm controller daemon
After=network.target slurmdbd.service munge.service
ConditionPathExists=/etc/slurm/slurm.conf
... 
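An alternative technique, which this guide does not use, is a systemd drop-in: instead of copying the whole unit file, you add only the extra ordering. After= is a list setting, so a drop-in's entries are added to those in the packaged unit; note that After= only orders startup and does not pull the other unit in. The sketch below is hypothetical (add_after_dropin and the file name 10-after.conf are my choices), and UNIT_DIR exists only to allow a dry run in a scratch directory. systemctl edit slurmdbd creates the same kind of override interactively.

```shell
#!/usr/bin/env bash
# Sketch: add After= ordering to a unit through a drop-in file instead of a
# full copy. add_after_dropin is a hypothetical helper; UNIT_DIR defaults to
# /etc/systemd/system, where local overrides belong.
add_after_dropin() {
  local unit="$1"; shift
  local dir="${UNIT_DIR:-/etc/systemd/system}/$unit.d"
  mkdir -p "$dir" || return 1
  # Remaining arguments become a space-separated After= list.
  printf '[Unit]\nAfter=%s\n' "$*" > "$dir/10-after.conf"
}
```

For example, add_after_dropin slurmdbd.service mariadb.service followed by systemctl daemon-reload; systemctl cat slurmdbd then shows the packaged unit together with the drop-in.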
Start the slurmdbd service

Start the slurmdbd service:

systemctl enable slurmdbd
systemctl start slurmdbd
systemctl status slurmdbd 
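slurmdbd can take a moment to connect to the database, and slurmctld depends on it, so in a setup script it helps to wait until the service is active before moving on. This is a generic sketch: retry is a hypothetical helper that runs a command up to N times with a fixed pause between attempts.

```shell
#!/usr/bin/env bash
# Sketch: run a command up to TRIES times, sleeping DELAY seconds between
# attempts. retry is a hypothetical helper. Usage: retry TRIES DELAY CMD...
retry() {
  local tries="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= tries; i++ )); do
    "$@" && return 0
    # No pause after the final failed attempt.
    if (( i < tries )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, retry 10 3 systemctl is-active --quiet slurmdbd waits up to roughly 30 seconds for slurmdbd to report itself active.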
Configure database accounting in slurm.conf

In slurm.conf (see the slurm.conf man page), configure accounting so that the database is used through the slurmdbd daemon:

AccountingStorageType=accounting_storage/slurmdbd 

 

Setup Slurm PAM plugin

Setting this up helps block users from casually logging in to compute nodes where they have no active jobs:

# ssh test@buhpc1
test@buhpc1's password: 
Access denied: user alicia (uid=1450) has no active jobs on this node.
Connection closed by 192.168.126.32

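First make sure Slurm's PAM modules are installed. They are supplied by the slurm-pam_slurm package, which provides both pam_slurm_adopt.so and pam_slurm.so (the module configured in /etc/pam.d/sshd below):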

# ls -l /usr/lib64/security/pam_slurm_adopt.so 
-rwxr-xr-x. 1 root root 26368 May 25 14:27 /usr/lib64/security/pam_slurm_adopt.so

Enable PAM module in Slurm:

## add a line in /etc/slurm/slurm.conf
UsePAM=1

Enable the PAM module in sshd:

## on compute nodes (not login nodes), add this line to /etc/pam.d/sshd before any other account entry
account    required     pam_slurm.so

Configure the pam_access module to always allow the admin group (hpcadmins). You may have other rules in access.conf; order them carefully, because only the first matching rule applies:

# cat /etc/security/access.conf
+ : root (hpcadmins) : ALL
- : ALL : ALL
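To see why ordering matters, here is a deliberately simplified model of "first matching rule wins". It is a sketch, not the real pam_access parser: first_access_match is hypothetical, it ignores the origins field, EXCEPT clauses, netgroups and wildcards, and understands only ALL, plain user names and (group) entries.

```shell
#!/usr/bin/env bash
# Simplified sketch of pam_access first-match evaluation (NOT the real parser).
# first_access_match is hypothetical. Usage: first_access_match FILE USER "GROUPS"
# Prints "+" or "-" from the first matching rule, or "none" if nothing matches.
first_access_match() {
  local file="$1" user="$2" perm users origins entry g
  local -a entries groups
  read -r -a groups <<<"$3"
  while IFS=':' read -r perm users origins; do
    perm="${perm//[[:space:]]/}"                        # "+ " -> "+"
    [[ "$perm" == "+" || "$perm" == "-" ]] || continue  # skip comments/blanks
    read -r -a entries <<<"$users"
    for entry in "${entries[@]}"; do
      if [[ "$entry" == ALL || "$entry" == "$user" ]]; then
        echo "$perm"; return 0
      fi
      for g in "${groups[@]}"; do
        if [[ "$entry" == "($g)" ]]; then echo "$perm"; return 0; fi
      done
    done
  done < "$file"
  echo none
}
```

With the two rules above, root and members of hpcadmins hit the "+" line first, and everyone else falls through to "- : ALL : ALL". If the two lines were swapped, the catch-all deny would match first and lock out the administrators too.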

Add a PAM configuration file for Slurm, /etc/pam.d/slurm:

auth    required        pam_localuser.so
account required        pam_unix.so
session required        pam_limits.so

 

Accounting in Slurm

Enable Accounting in Slurm:

# add these lines in /etc/slurm/slurm.conf
# ACCOUNTING
AccountingStorageEnforce=1
AccountingStorageLoc=/opt/slurm/acct
AccountingStorageType=accounting_storage/slurmdbd

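# jobcomp/slurmdbd is not among the JobCompType values documented in slurm.conf(5); verify this line for your Slurm version
JobCompLoc=/opt/slurm/jobcomp
JobCompType=jobcomp/slurmdbd

JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux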

Restart the services, starting with the database daemon:

systemctl restart slurmdbd   ## on the slurmdbd node
systemctl restart slurmctld  ## on the control node
systemctl restart slurmd     ## on every compute node
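The same order can be wrapped in a function per node role. This sketch assumes, as in the local-database case described earlier, that slurmdbd and slurmctld share the server node; restart_slurm is a hypothetical helper, and setting SYSTEMCTL=echo gives a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: restart Slurm daemons in dependency order for one node role.
# restart_slurm is a hypothetical helper; SYSTEMCTL=echo makes it a dry run.
restart_slurm() {
  local role="$1" sc="${SYSTEMCTL:-systemctl}"
  case "$role" in
    server)
      # slurmctld depends on slurmdbd, so restart the database daemon first.
      "$sc" restart slurmdbd && "$sc" restart slurmctld ;;
    compute)
      "$sc" restart slurmd ;;
    *)
      echo "usage: restart_slurm server|compute" >&2
      return 2 ;;
  esac
}
```

Run restart_slurm server on the server node, then restart_slurm compute on each compute node.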

 

Attention!!!

- Make sure /etc/slurm/slurm.conf stays identical on all nodes.
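One quick way to check this is to compare a checksum of slurm.conf from every node. The sketch below is hypothetical (conf_checksums_match is my name for it): it reads "HOST CHECKSUM" lines on standard input, succeeds only if all checksums are equal, and otherwise lists the hosts grouped by checksum on standard error. Empty input counts as a failure.

```shell
#!/usr/bin/env bash
# Sketch: verify that every host reported the same slurm.conf checksum.
# conf_checksums_match is a hypothetical helper; input lines: HOST CHECKSUM
conf_checksums_match() {
  awk '
    NF >= 2 { hosts[$2] = hosts[$2] " " $1; n++ }
    END {
      if (n == 0) exit 1
      k = 0
      for (s in hosts) k++
      if (k == 1) exit 0
      # More than one distinct checksum: show which hosts differ.
      for (s in hosts) print "checksum " s ":" hosts[s] > "/dev/stderr"
      exit 1
    }'
}
```

For example, from the server node (extend the host list to all of your nodes):

for h in buhpc1 buhpc3; do echo "$h $(ssh "$h" md5sum /etc/slurm/slurm.conf | cut -d' ' -f1)"; done | conf_checksums_match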