attach/detach command line interface changes broke nightly tests #2668
I just confirmed the log, but I am not sure why the device /dev/nvme1 failed to detach with the "not a block or character device" error.
Configuration
git version : "1.11.1-41-ga9f667a"
nvme-cli 2.11
Paths
prefixdir : /usr/local
##### Creating 109
##### Attaching 109
##### Running IOs in 109
Testcase main ... /dev/nvme1 is not a block or character device
Usage: nvme detach-ns <device> [OPTIONS]
Detach the given namespace from the given controller; de-activates the given
namespace's ID. A namespace must be attached to a controller before IO
commands may be directed to that namespace.
Options:
[ --verbose, -v ] --- Increase output verbosity
[ --output-format=<FMT>, -o <FMT> ] --- Output format: normal|json|binary
[ --namespace-id=<NUM>, -n <NUM> ] --- namespace to attach
[ --controllers=<LIST>, -c <LIST> ] --- optional comma-sep controller id list
[ --timeout=<NUM>, -t <NUM> ] --- timeout value, in milliseconds
FAIL
...
##### Detaching 1
#### Deleting 1
##### Detaching 2
teardown: ctrl: /dev/nvme1, ns1: /dev/nvme1n1, default_nsid: 1, flbas: 0
static int open_dev_direct(struct nvme_dev **devp, char *devstr, int flags, __u32 nsid)
{
	...
	if (!is_chardev(dev) && !is_blkdev(dev)) {
		nvme_show_error("%s is not a block or character device", devstr);
		errno = ENODEV;
		err = -1;
		goto err_close;
	}
Sorry, I just realized the error also occurs on master, as tested by run-nightly-tests #88: https://github.com/linux-nvme/nvme-cli/actions/runs/12878797474. (Added)
As far as I understand the situation, before your change the test would open the char device. Adding @keithbusch and @MaisenbacherD for awareness.
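(For context, a minimal sketch of the char-vs-block distinction being discussed; the device paths are taken from the logs above and assumed to exist:)

import os
import stat

ctrl_dev = "/dev/nvme1"    # controller node: a character device
ns_dev = "/dev/nvme1n1"    # namespace node: a block device

print(stat.S_ISCHR(os.stat(ctrl_dev).st_mode))  # expected True for the controller
print(stat.S_ISBLK(os.stat(ns_dev).st_mode))    # expected True for the namespace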
@MaisenbacherD Can we add a check and ...
I think we need something like this for the ...:

if err == 0:
    # enumerate new namespace block device
    self.nvme_reset_ctrl()
    # check if new namespace block device exists
    err = 0 if stat.S_ISBLK(os.stat(self.ns1).st_mode) else 1
Thank you! Noted. By the way, can we also add the -v option for the NS management commands?

tokunori@tokunori-desktop:~/nvme-cli$ nvme-build detach-ns /dev/nvme0 -n 2 -v
[sudo] password for tokunori:
blkdev: /dev/nvme0n2, fd: -1
dev: /dev/nvme0, fd: 3
warning: empty controller-id list will result in no actual change in namespace attachment
NVMe status: Invalid Command Opcode: A reserved coded value or an unsupported value in the command opcode field(0x1)
NS management and attachment not supported
tokunori@tokunori-desktop:~/nvme-cli$
About the blkdev checking: as you mentioned, it is done in ...
By the way, do we need to execute the nvme tests as root, the same as nvme-cli? I failed to install nose2 after switching to su, so I ran:

tokunori@tokunori-desktop:~/nvme-cli$ nose2 --verbose --start-dir tests nvme_create_max_ns_test
test_attach_detach_ns (nvme_create_max_ns_test.TestNVMeCreateMaxNS.test_attach_detach_ns)
Testcase main ...
Using nvme binary 'nvme'
setup: ctrl: /dev/nvme0, ns1: /dev/nvme0n1, default_nsid: 1, flbas: 0
Usage: nvme id-ctrl <device> [OPTIONS]
Send an Identify Controller command to the given device and report
information about the specified controller in human-readable or binary
format. May also return vendor-specific controller attributes in hex-dump if
requested.
Options:
[ --verbose, -v ] --- Increase output verbosity
[ --output-format=<FMT>, -o <FMT> ] --- Output format: normal|json|binary
[ --vendor-specific, -V ] --- dump binary vendor field
[ --raw-binary, -b ] --- show identify in binary format
[ --human-readable, -H ] --- show identify in readable format
[ --timeout=<NUM>, -t <NUM> ] --- timeout value, in milliseconds
FAIL
======================================================================
FAIL: test_attach_detach_ns (nvme_create_max_ns_test.TestNVMeCreateMaxNS.test_attach_detach_ns)
Testcase main
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tokunori/nvme-cli/tests/nvme_create_max_ns_test.py", line 53, in setUp
self.nsze = int(self.get_ncap() /
^^^^^^^^^^^^^^^
File "/home/tokunori/nvme-cli/tests/nvme_test.py", line 285, in get_ncap
return int(self.get_id_ctrl_field_value("tnvmcap"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tokunori/nvme-cli/tests/nvme_test.py", line 301, in get_id_ctrl_field_value
self.assertEqual(err, 0, "ERROR : reading id-ctrl failed")
AssertionError: 1 != 0 : ERROR : reading id-ctrl failed
----------------------------------------------------------------------
Ran 1 test in 0.079s
FAILED (failures=1)
Bummer, this would have been too easy. Yes, this test suite needs to be run as root and, as noted before, it will write to the physical disk /dev/nvme0n1. So be very careful here. This was also the reason why these tests are disabled by default.
Okay, I will retry executing as root later, since I failed to install nose2 after running 'sudo su' earlier.
I'd rather revert the change instead of stacking stuff on top of each other. I'll try to find some time to look into it. If I don't figure it out, I'll revert the offending change.
Noted. A TP error is fixed by the PR #2673 changes, which also enable verbose debugging. But the cause has still not been found. Thank you.
By the way, the TP's create_and_validate_ns() function checks the nsid parameter passed from the caller via the id-ns command, but the NSID is selected by the controller for the create-ns command, so is this really a correct NS validation check? Note: the PR above also adds the blkdev check for the created NS.
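(A hedged sketch of how the TP could instead capture the controller-selected NSID; the parsed output line ("create-ns: Success, created nsid:N") and the helper name are assumptions, not existing test-suite code:)

import re
import subprocess

def create_ns_get_nsid(self, nsze, ncap, flbas, dps):
    """ Hypothetical helper: create a namespace and return the NSID the
        controller actually selected, parsed from nvme-cli's output
        (assumed to contain "create-ns: Success, created nsid:N").
    """
    create_ns_cmd = f"{self.nvme_bin} create-ns {self.ctrl} " + \
        f"--nsze={nsze} --ncap={ncap} --flbas={flbas} --dps={dps}"
    proc = subprocess.Popen(create_ns_cmd, shell=True,
                            stdout=subprocess.PIPE, encoding='utf-8')
    output = proc.communicate()[0]
    if proc.returncode != 0:
        return None
    match = re.search(r"created nsid:\s*(\d+)", output)
    return int(match.group(1)) if match else None

Validation could then run id-ns against the returned NSID rather than the caller's assumed value.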
I will also try to allocate some time to look into this issue :)
ERROR: test_attach_detach_ns (nvme_create_max_ns_test.TestNVMeCreateMaxNS.test_attach_detach_ns)
Testcase main
----------------------------------------------------------------------
Traceback (most recent call last):
File "/__w/nvme-cli/nvme-cli/tests/nvme_create_max_ns_test.py", line 86, in test_attach_detach_ns
err = self.create_and_validate_ns(nsid,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/__w/nvme-cli/nvme-cli/tests/nvme_test.py", line 399, in create_and_validate_ns
err = 0 if stat.S_ISBLK(os.stat(self.ctrl + "n" + str(nsid)).st_mode) else 1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/dev/nvme1n2'
No worries, this happens when working on code. The important thing is to fix it :)
|
I think we want something like the below, completely untested:

import subprocess
import os
import inotify.adapters
import time

def attach_ns(self, ctrl_id, nsid):
    """ Wrapper for attaching the namespace.
        - Args:
            - ctrl_id : controller id to which namespace to be attached.
            - nsid : new namespace id.
        - Returns:
            - 0 on success, error code on failure.
    """
    attach_ns_cmd = f"{self.nvme_bin} attach-ns {self.ctrl} " + \
        f"--namespace-id={str(nsid)} --controllers={ctrl_id} --verbose"
    err = subprocess.call(attach_ns_cmd, shell=True, stdout=subprocess.DEVNULL)
    if err != 0:
        return err

    # Device path
    device_path = f"{self.ctrl}n{str(nsid)}"
    watch_dir = os.path.dirname(device_path) or "/dev"

    i = inotify.adapters.Inotify()
    i.add_watch(watch_dir)  # Setup inotify first

    # Check immediately if the device already exists (avoids race condition)
    if os.path.exists(device_path) and os.path.isfile(device_path):
        i.remove_watch(watch_dir)  # Cleanup inotify before returning
        return 0

    # Wait for inotify event
    success = False
    start_time = time.time()
    timeout = 5  # seconds
    for event in i.event_gen(yield_nones=False, timeout_s=timeout):
        (_, type_names, path, filename) = event
        if "IN_CREATE" in type_names and filename == os.path.basename(device_path):
            success = True
            break
        if time.time() - start_time > timeout:
            break

    i.remove_watch(watch_dir)  # Cleanup inotify
    return 0 if success else 1
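(As a usage note: os.path.isfile() checks for regular files and returns False for block device nodes, so the immediate-existence check in the sketch above would never take the early-return path; the revised version further below addresses this with stat.S_ISBLK().)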
Looks good; this changes the TP to wait for the blkdev by watching for 5 seconds. Thank you. (Added)
This version here is slightly better and even tested. The pyinotify package is more common:

import subprocess
import os
import pyinotify
import stat
import time

class EventHandler(pyinotify.ProcessEvent):
    def __init__(self, target_file):
        self.target_file = target_file
        self.file_created = False

    def process_IN_CREATE(self, event):
        if event.pathname == self.target_file:
            self.file_created = True

def attach_ns(self, ctrl_id, nsid):
    """ Wrapper for attaching the namespace.
        - Args:
            - ctrl_id : controller id to which namespace to be attached.
            - nsid : new namespace id.
        - Returns:
            - 0 on success, error code on failure.
    """
    attach_ns_cmd = f"{self.nvme_bin} attach-ns {self.ctrl} " + \
        f"--namespace-id={str(nsid)} --controllers={ctrl_id} --verbose"
    err = subprocess.call(attach_ns_cmd, shell=True, stdout=subprocess.DEVNULL)
    if err != 0:
        return err

    # Device path
    device_path = f"{self.ctrl}n{str(nsid)}"
    watch_dir = os.path.dirname(device_path) or "/dev"

    # Start watching directory before checking file existence
    wm = pyinotify.WatchManager()
    handler = EventHandler(device_path)
    notifier = pyinotify.Notifier(wm, handler)
    wm.add_watch(watch_dir, pyinotify.IN_CREATE)

    # Correct check: see if the block device already exists
    if os.path.exists(device_path) and stat.S_ISBLK(os.stat(device_path).st_mode):
        return 0

    # Wait for the event or timeout
    start_time = time.time()
    timeout = 5  # seconds
    while time.time() - start_time < timeout:
        notifier.process_events()
        if handler.file_created and stat.S_ISBLK(os.stat(device_path).st_mode):
            return 0
        if notifier.check_events(timeout=500):  # 500ms
            notifier.read_events()

    return 1  # Timeout, device not created
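(As a usage note: if adding an inotify dependency to the CI container is not desirable, a plain polling loop gives the same 5-second wait with coarser latency. A sketch under the same assumptions, with device_path built as above; the helper name is hypothetical:)

import os
import stat
import time

def wait_for_blkdev(device_path, timeout=5.0, interval=0.1):
    """ Hypothetical fallback: poll until device_path exists as a block
        device, or give up after timeout seconds.
        Returns 0 on success, 1 on timeout, matching the wrappers above.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if stat.S_ISBLK(os.stat(device_path).st_mode):
                return 0
        except FileNotFoundError:
            pass  # node not created yet, keep polling
        time.sleep(interval)
    return 1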
@igaw did you test this via nose2 or did you run the test code section by hand?
Sorry - just saw your PR and that you updated the container to install python3-pyinotify. This helps :)
No, I didn't run the whole test suite locally; I just factored out the function to see if it actually does anything useful. I am very carefully trying to avoid overwriting my local disk :)
Unfortunately the test still failed with the latest TP, as below.

Run nose2 --verbose --start-dir tests \
nvme_attach_detach_ns_test \
nvme_compare_test \
...
nvme_writeuncor_test \
nvme_writezeros_test
shell: sh -e {0}
test_attach_detach_ns (nvme_attach_detach_ns_test.TestNVMeAttachDetachNSCmd.test_attach_detach_ns)
Testcase main ... ok
test_nvme_compare (nvme_compare_test.TestNVMeCompareCmd.test_nvme_compare)
Testcase main ... skipped because: Optional NVM Command 'Compare' (NVMCMPS) not supported
test_copy (nvme_copy_test.TestNVMeCopy.test_copy)
Testcase main ... ok
test_attach_detach_ns (nvme_create_max_ns_test.TestNVMeCreateMaxNS.test_attach_detach_ns)
Testcase main ... FAIL
NVMe status: Namespace Already Attached: The controller is already attached to the namespace specified(0x4118)

The test failed because the NS is already attached, but I am not sure which NS number actually failed; it seems to still fail on NS 2, the same as in the log below.

##### Testing max_ns: 128
##### Creating 1
##### Attaching 1
##### Running IOs in 1
Running io: dd if=/dev/nvme1n1 of=/dev/null bs=512 count=1 > /dev/null 2>&1
Running io: dd if=/dev/zero of=/dev/nvme1n1 bs=512 count=1 > /dev/null 2>&1
##### Creating 2
##### Attaching 2
teardown: ctrl: /dev/nvme1, ns1: /dev/nvme1n1, default_nsid: 1, flbas: 0
Using nvme binary '/__w/nvme-cli/nvme-cli/.build-ci/nvme'
setup: ctrl: /dev/nvme1, ns1: /dev/nvme1n1, default_nsid: 1, flbas: 0

By the way, delete_all_ns() is executed before the NSs are created by setUp(), but it looks like it is not checked whether all NSs are actually deleted:

def delete_all_ns(self):
    """ Wrapper for deleting all the namespaces.
        - Args:
            - None
        - Returns:
            - None
    """
    delete_ns_cmd = f"{self.nvme_bin} delete-ns {self.ctrl} " + \
        "--namespace-id=0xFFFFFFFF"
    self.assertEqual(self.exec_cmd(delete_ns_cmd), 0)
    list_ns_cmd = f"{self.nvme_bin} list-ns {self.ctrl} --all " + \
        "--output-format=json"
    proc = subprocess.Popen(list_ns_cmd,
                            shell=True,
                            stdout=subprocess.PIPE,
                            encoding='utf-8')
    self.assertEqual(proc.wait(), 0, "ERROR : nvme list-ns failed")
    json_output = json.loads(proc.stdout.read())
    self.assertEqual(len(json_output['nsid_list']), 0,
                     "ERROR : deleting all namespace failed")

(Added)
Sorry, let me correct the above: the delete_all_ns() function does seem to correctly check whether the NSs are deleted, via the following part, so I am not sure why the "Namespace Already Attached" error happened.

list_ns_cmd = f"{self.nvme_bin} list-ns {self.ctrl} --all " + \
    "--output-format=json"
proc = subprocess.Popen(list_ns_cmd,
                        shell=True,
                        stdout=subprocess.PIPE,
                        encoding='utf-8')
self.assertEqual(proc.wait(), 0, "ERROR : nvme list-ns failed")
json_output = json.loads(proc.stdout.read())
self.assertEqual(len(json_output['nsid_list']), 0,
                 "ERROR : deleting all namespace failed")
There is a difference below in the behavior of the NS attachment commands introduced by the blkdev changes in commit c276f3d. I just realized the cause of the issue, as below. (Added)

static int open_blkdev_direct(char *dev, int flags, __u32 nsid)
{
	_cleanup_free_ char *blkdev = NULL;
	int fd = -1;

	if (is_nvme_dev(dev) && nsid) {
		if (asprintf(&blkdev, "%sn%d", dev, nsid) < 0)
			blkdev = NULL;
	}

	if (blkdev) {
		fd = open(blkdev, flags);
		print_info("blkdev: %s, fd: %d\n", blkdev, fd);
	}

	if (fd < 0) {
		fd = open(dev, flags);
		print_info("dev: %s, fd: %d\n", dev, fd);
	}

	return fd;
}
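(If the cause is that the CLI now resolves and opens the per-namespace block node, a symmetric test-side wait for the node to disappear after detach-ns might also help; a hypothetical sketch mirroring the attach-side wait above:)

import os
import time

def wait_for_blkdev_removal(device_path, timeout=5.0, interval=0.1):
    """ Hypothetical counterpart to the attach-side wait: poll until the
        namespace block node is gone after detach-ns, or time out.
        Returns 0 once the node has disappeared, 1 on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not os.path.exists(device_path):
            return 0
        time.sleep(interval)
    return 1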
It appears #2665 breaks our nightly builds:
https://github.com/linux-nvme/nvme-cli/actions/runs/12878797474/job/35905289626#step:8:761
@ikegami-t could you have a look?
BTW, be careful running these tests locally. These tests do write to the local hardware, e.g. /dev/nvme0n1