shine cannot unload modules when using the lnet.service #211

Open
btravouillon opened this issue Aug 30, 2019 · 3 comments

@btravouillon
Contributor

I'm using lnet.service and /etc/lnet.conf to configure LNet on my servers and clients:

[root@mds1 ~]# grep -v "^#" /etc/lnet.conf 
net:
    - net type: tcp1
      local NI(s):
        - interfaces:
              0: eth0

This service loads the lnet module, configures LNet, then imports /etc/lnet.conf.

[root@mds1 ~]# systemctl cat lnet.service|grep Exec
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
ExecStop=/usr/sbin/lustre_rmmod ptlrpc
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs

shine stop reports an error while trying to remove the Lustre modules from the kernel:

[root@mds1 ~]# shine stop
[17:53] In progress for 4 component(s) on oss[1-2] ...
oss1: Unload modules failed
oss1: >> rmmod: ERROR: Module ksocklnd is in use
oss2: Unload modules failed
oss2: >> rmmod: ERROR: Module ksocklnd is in use
mds1: Unload modules failed
mds1: >> rmmod: ERROR: Module ksocklnd is in use
Stop successful.
= FILESYSTEM STATUS (scratch) =
TYPE # STATUS  NODES
---- - ------  -----
MGT  1 offline mds1
MDT  1 offline mds1
OST  4 offline oss[1-2]

shine would need to unconfigure LNet before trying to remove the lnet module from the kernel.
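
To see which modules are still held when rmmod reports "is in use", one option is to read the use counts from /proc/modules, where the third field of each line is the module's reference count. This is only a minimal diagnostic sketch: the helper names `module_use_counts` and `busy_modules` are made up for illustration and are not part of shine.

```python
def module_use_counts(proc_modules_text):
    """Map module name -> use count from /proc/modules-formatted text.

    Each line looks like: name size use_count dependencies state address.
    Lines whose use count is not a number (for example "-") are skipped.
    """
    counts = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


def busy_modules(proc_modules_text, names):
    """Return the modules from `names` that are loaded with a non-zero use count."""
    counts = module_use_counts(proc_modules_text)
    return [name for name in names if counts.get(name, 0) > 0]


if __name__ == "__main__":
    # Reads the live module table; only meaningful on a Linux host.
    with open("/proc/modules") as handle:
        print(busy_modules(handle.read(), ["ksocklnd", "lnet", "libcfs"]))
```

If ksocklnd shows up in that list while LNet is still configured, that would match the failure above and suggests running `lnetctl lnet unconfigure` before the unload.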

The simpler solution would be to stop unloading the modules when running shine stop. :-)
I can rebase and enhance https://review.gerrithub.io/c/cea-hpc/shine/+/367989

Then we could plan to add support for the lnet.service if you believe this is worthwhile.

@martinetd

I think running lnetctl lnet configure + lnetctl import <configured file> after module load, and running lnetctl lnet unconfigure before module unload, might make more sense.
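
As a rough sketch of that ordering, the sequences can be written down as plain command lists. The commands and paths are copied from the lnet.service unit quoted earlier in this issue. The function names (`lnet_start_commands`, `lnet_stop_commands`, `run_commands`) are hypothetical and this is not how shine is implemented.

```python
import shlex
import subprocess


def lnet_start_commands(conf_path="/etc/lnet.conf"):
    """Commands to run at start: load the module, configure LNet, import the config."""
    return [
        ["/sbin/modprobe", "lnet"],
        ["/usr/sbin/lnetctl", "lnet", "configure"],
        ["/usr/sbin/lnetctl", "import", conf_path],
    ]


def lnet_stop_commands():
    """Commands to run at stop, mirroring the unit's ExecStop lines.

    LNet is unconfigured before the last modules are removed.
    """
    return [
        ["/usr/sbin/lustre_rmmod", "ptlrpc"],
        ["/usr/sbin/lnetctl", "lnet", "unconfigure"],
        ["/usr/sbin/lustre_rmmod", "libcfs", "ldiskfs"],
    ]


def run_commands(commands, dry_run=True):
    """Render each command as a shell-quoted string and run it unless dry_run is set."""
    rendered = []
    for argv in commands:
        rendered.append(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    for line in run_commands(lnet_start_commands() + lnet_stop_commands()):
        print(line)
```

The dry-run default only prints the commands, so the ordering can be reviewed on a machine without Lustre installed.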

The lnet.service really is too far from how shine expects the system to be configured, but having an /etc/lnet.conf would be much more flexible than kernel module parameters.

@degremont
Collaborator

I think both are doable.

Supporting lnetctl import /etc/lnet.conf is definitely something useful that Shine should support.

Delegating module and router support to external scripts is fine with me, as an optional step. Relying on the module_unload=false feature should be able to achieve that? We would need to update the current patch to disable StartRouter/StopRouter, or add additional flags.

@martinetd

martinetd commented Sep 19, 2019

Pushed https://review.gerrithub.io/c/cea-hpc/shine/+/468899 as a draft, 100% untested code. I will work on that tomorrow morning if life allows, but comments on the overall architecture are welcome earlier.
(EDIT: I didn't go for an external script, but that would work for me too; happy to change what I started with in that direction.)
