Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate buildroot and add support for QEMU Linux target #667

Merged
merged 5 commits into from
Mar 20, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions devlib/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@
ChromeOsTarget,
)

from devlib.target_runner import QEMUTargetRunner

from devlib.host import (
PACKAGE_BIN_DIRECTORY,
LocalConnection,
Expand Down
35 changes: 34 additions & 1 deletion devlib/target.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
#

import asyncio
from contextlib import contextmanager
import io
import base64
import functools
Expand Down Expand Up @@ -493,7 +494,7 @@ async def setup(self, executables=None):
# Check for platform dependent setup procedures
self.platform.setup(self)

# Initialize modules which requires Buxybox (e.g. shutil dependent tasks)
# Initialize modules which requires Busybox (e.g. shutil dependent tasks)
self._update_modules('setup')

await self.execute.asyn('mkdir -p {}'.format(quote(self._file_transfer_cache)))
Expand Down Expand Up @@ -1071,6 +1072,38 @@ async def write_value(self, path, value, verify=True, as_root=True):
else:
raise

@contextmanager
def make_temp(self, is_directory=True, directory='', prefix='devlib-test'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would path be a more descriptive parameter name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's working directory?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind the function is designed to create a new temp directory/file under the path provided from the user. The default location is just set to the working directory.

Having said this, after checking the documentation for tempfile.mkstemp I see they use dir as parameter name so not a strong opinion on this.

"""
Creates temporary file/folder on target and deletes it once it's done.

:param is_directory: Specifies if temporary object is a directory, defaults to True.
:type is_directory: bool, optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This , optional part is not recognized by Sphinx and results in a broken class reference:

foobar.py:1: WARNING: py:class reference target not found: optional

The typical way an optional parameter is implemented in Python is by using None as default value. It follows that the type to give with Sphinx would become bool or None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see , optional on https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html. Is that wrong doc for Sphinx?
How do you verify docstring?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be the right doc, but I think that either the doc is wrong or it's actually the doc on how to contribute to Sphinx itself, which may use something else than plain autodoc. That syntax is part of the numpy style, which requires the napoleon Sphinx extension. Apparently that napoleon extension pre-processes the docstring before autodoc sees it.

How do you verify docstring?

You build the doc and check for warnings (lisa-doc-build for LISA). I tried by modifying a docstring in LISA. Since devlib's doc is not configured with autodoc, building the doc won't do anything as nothing will touch those docstrings.

Either way, if the function accepts a type T or None (which is the standard way of making a parameter "optional" in the common understanding of the term[1]), then it's actually more accurate to state T or None.

[1] You can make a parameter actually optional without any default value with the variable keyword argument syntax, but that's definitely not something you'd do on a daily basis.

Copy link
Collaborator

@douglas-raillard-arm douglas-raillard-arm Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You build the doc and check for warnings

Another way would be to setup a test sphinx project, which is very easy too. Or port devlib to use autodoc. It's not particularly difficult, just an invasive PR (all existing function documentations need to be moved back as docstrings). If you were to do that, it would probably take ~1week (or maybe less) and I think it would be reasonable to as @marcbonnici to freeze merging any other stuff so you don't end up with conflicts while working on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will not go for updating entire devlib repo now :)
Would you be OK if I just remove , optional part from docstring? Otherwise, it does not seem trivial to verify docstrings' Sphinx compatibility in devlib.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes just go for replacing , optional by or None

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If implementing a temporary code freeze to make such changes would be beneficial I'm happy to coordinate to do so.


:param directory: Temp object will be created under this directory,
metin-arm marked this conversation as resolved.
Show resolved Hide resolved
defaults to :attr:`Target.working_directory`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked that

:attr:`Target.working_directory`

actually works when building a Sphinx project ? This would requires working_directory to be an item with its own docstring so that it's a valid reference target, which is very unlikely considered it's an instance attribute. Short of having a proper reference, you could just use:

 ``Target.working_directory``

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

:type directory: str, optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


:param prefix: Prefix of temp object's name, defaults to 'devlib-test'.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the defaults to 'devlib-test' as this will be already shown to the user if the documentation is built with autodoc. It currently is not, but since it's not this docstring will currently not appear anywhere anyway, so really it's done in preparation for the day where the switch actually happens. That day we will also get intersphinx working for free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

:type prefix: str, optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


:yield: Full path of temp object.
:rtype: str
"""

directory = directory or self.working_directory
temp_obj = None
try:
cmd = f'mktemp -p {directory} {prefix}-XXXXXX'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directory needs to be escaped with shlex.quote(). Same goes for prefix or any user-input, just like when building SQL queries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if is_directory:
cmd += ' -d'

temp_obj = self.execute(cmd).strip()
yield temp_obj
finally:
if temp_obj is not None:
self.remove(temp_obj)

def reset(self):
try:
self.execute('reboot', as_root=self.needs_su, timeout=2)
Expand Down
267 changes: 267 additions & 0 deletions devlib/target_runner.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,267 @@
# Copyright 2024 ARM Limited
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to expose that module publicly. This is certainly not needed for the "normal devlib operation", and for now is really only needed for testing devlib/WA/LISA. These 3 projects could just use a private module on a best-effort basis without having to fully commit in front of the entire world to backward compatibility in any of these features.

This is especially important given that android-related things tend to change on a regular basis, and that overall this emulation business is pretty complex, so we may want at some point to use it in a different way, or switch to a different backend (e.g. use VirtualBox) or whatever. We don't want to be constrained by backward compat when doing such changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""
Target runner and related classes are implemented here.
"""

import logging
import os
import signal
import subprocess
import time
from platform import machine

from devlib.exception import (TargetStableError, HostError)
from devlib.target import LinuxTarget
from devlib.utils.misc import get_subprocess, which
from devlib.utils.ssh import SshConnection


class TargetRunner:
"""
A generic class for interacting with targets runners.

It mainly aims to provide framework support for QEMU like target runners
(e.g., :class:`QEMUTargetRunner`).

:param runner_cmd: The command to start runner process (e.g.,
``qemu-system-aarch64 -kernel Image -append "console=ttyAMA0" ...``).
:type runner_cmd: str

:param target: Specifies type of target per :class:`Target` based classes.
:type target: Target

:param connect: Specifies if :class:`TargetRunner` should try to connect
target after launching it, defaults to True.
:type connect: bool, optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


:param boot_timeout: Timeout for target's being ready for SSH access in
seconds, defaults to 60.
:type boot_timeout: int, optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


:raises HostError: if it cannot execute runner command successfully.

:raises TargetStableError: if Target is inaccessible.
"""

def __init__(self,
runner_cmd,
target,
connect=True,
boot_timeout=60):
self.boot_timeout = boot_timeout
self.target = target

self.logger = logging.getLogger(self.__class__.__name__)

self.logger.info('runner_cmd: %s', runner_cmd)

try:
self.runner_process = get_subprocess(list(runner_cmd.split()))
except Exception as ex:
raise HostError(f'Error while running "{runner_cmd}": {ex}') from ex

if connect:
self.wait_boot_complete()

def __enter__(self):
"""
Complementary method for contextmanager.

:return: Self object.
:rtype: TargetRunner
"""

return self

def __exit__(self, *_):
"""
Exit routine for contextmanager.

Ensure :attr:`TargetRunner.runner_process` is terminated on exit.
"""

self.terminate()

def wait_boot_complete(self):
"""
Wait for target OS to finish boot up and become accessible over SSH in at most
:attr:`TargetRunner.boot_timeout` seconds.

:raises TargetStableError: In case of timeout.
"""

start_time = time.time()
elapsed = 0
while self.boot_timeout >= elapsed:
try:
self.target.connect(timeout=self.boot_timeout - elapsed)
self.logger.info('Target is ready.')
return
# pylint: disable=broad-except
except BaseException as ex:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rational for catching BaseException here instead of Exception ? Any exception that is not a subclass of Exception is typically meant to be fatal and never caught (apart from very specific scenarios). For example, hitting ctrl-C will raise a KeyboardInterrupt, which is a BaseException (like any other exception), but not an Exception. Treating it as such would be a mistake most of time time, e.g. if you had some retry logic and caught BaseException, hitting ctrl-C would just spin it for another round rather than exiting as the user would expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced it with Exception.

self.logger.info('Cannot connect target: %s', ex)

time.sleep(1)
elapsed = time.time() - start_time

self.terminate()
raise TargetStableError(f'Target is inaccessible for {self.boot_timeout} seconds!')

def terminate(self):
"""
Terminate :attr:`TargetRunner.runner_process`.
"""

if self.runner_process is None:
return

try:
self.runner_process.stdin.close()
self.runner_process.stdout.close()
self.runner_process.stderr.close()

if self.runner_process.poll() is None:
self.logger.debug('Terminating target runner...')
os.killpg(self.runner_process.pid, signal.SIGTERM)
# Wait 3 seconds before killing the runner.
self.runner_process.wait(timeout=3)
except subprocess.TimeoutExpired:
self.logger.info('Killing target runner...')
os.killpg(self.runner_process.pid, signal.SIGKILL)
Comment on lines +133 to +144
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there is a good reason to do so, this code should use Popen.kill() rather than fetching the PID and manually passing it to os.killpg(). Popen.kill() has extra checks that will vastly reduce the race condition window between the process terminating and the PID being recycled. As it stands, runner_process.pid could have finished a long time ago, its PID recycled, and we then just randomly kill that other process that is totally unrelated to devlib.

Also, even better: use Popen as a context manager. It will close stdin/stdout/stderr and wait for the process to complete, which is basically what is happening here (there won't be any timeout feature, but is it really needed ?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We first try to terminate the process gracefully, and then forcefully kill it if the process does not exit in 3 seconds.

Replaced the logic in terminate() with Popen.kill() for now. Did you mean removing wait_boot_complete() as well and use Popen as context manager?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could call Popen.__exit__ in terminate. That won't forcefully kill it though, so if it needs a signal to exit, you can simplify all that into:

p = self.runner_process
p.kill(...)
p.__exit__(None, None, None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done on #677 .



class QEMUTargetRunner(TargetRunner):
"""
Class for interacting with QEMU runners.

:class:`QEMUTargetRunner` is a subclass of :class:`TargetRunner` which performs necessary
groundwork for launching a guest OS on QEMU.

:param qemu_settings: A dictionary which has QEMU related parameters. The full list
of QEMU parameters is below:
* ``kernel_image``: This is the location of kernel image (e.g., ``Image``) which
will be used as target's kernel.

* ``arch``: Architecture type. Defaults to ``aarch64``.

* ``cpu_types``: List of CPU ids for QEMU. The list only contains ``cortex-a72`` by
default. This parameter is valid for Arm architectures only.

* ``initrd_image``: This points to the location of initrd image (e.g.,
``rootfs.cpio.xz``) which will be used as target's root filesystem if kernel
does not include one already.

* ``mem_size``: Size of guest memory in MiB.

* ``num_cores``: Number of CPU cores. Guest will have ``2`` cores by default.

* ``num_threads``: Number of CPU threads. Set to ``2`` by defaults.

* ``cmdline``: Kernel command line parameter. It only specifies console device in
default (i.e., ``console=ttyAMA0``) which is valid for Arm architectures.
May be changed to ``ttyS0`` for x86 platforms.

* ``enable_kvm``: Specifies if KVM will be used as accelerator in QEMU or not.
Enabled by default if host architecture matches with target's for improving
QEMU performance.
:type qemu_settings: Dict

:param connection_settings: the dictionary to store connection settings
of :attr:`Target.connection_settings`, defaults to None.
:type connection_settings: Dict, optional

:param make_target: Lambda function for creating :class:`Target` based
object, defaults to :func:`lambda **kwargs: LinuxTarget(**kwargs)`.
:type make_target: func, optional

:Variable positional arguments: Forwarded to :class:`TargetRunner`.

:raises FileNotFoundError: if QEMU executable, kernel or initrd image cannot be found.
"""

def __init__(self,
qemu_settings,
connection_settings=None,
# pylint: disable=unnecessary-lambda
make_target=lambda **kwargs: LinuxTarget(**kwargs),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be eta-reduced (i.e. the lambda layer is useless, LinuxTarget is a class and as such is callable). That will also allow dropping the explicit mention of the default value in the docstring, as Target should be formatted properly by autodoc when building the Sphinx doc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

**args):
self.connection_settings = {
'host': '127.0.0.1',
'port': 8022,
'username': 'root',
'password': 'root',
'strict_host_check': False,
}

if connection_settings is not None:
self.connection_settings = self.connection_settings | connection_settings
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union of dict is brought by PEP 584, which was implemented in Python 3.8. For now, devlib supports down to 3.7. The backward-compatible equivalent would be:

        if connection_settings is not None:
            self.connection_settings = {**self.connection_settings, **connection_settings}

and you can further avoid a separate path with:

self.connection_settings = {**self.connection_settings, **(connection_settings or {})}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


qemu_args = {
'kernel_image': '',
'arch': 'aarch64',
'cpu_type': 'cortex-a72',
'initrd_image': '',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the consuming code checks to see if the value is fals-y. This could be avoided by simply not having the key by default. That typically results in cleaner code, as "having a value" means "the value must be used", and "not having a value" will result in a KeyError that can be explicitly handled. Otherwise, you end up with in-band signaling (some values are "good to use" and others are "not good to use"), on top of always having the risk of a KeyError because it's just a dict after all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Removed empty values.

'mem_size': 512,
'num_cores': 2,
'num_threads': 2,
'cmdline': 'console=ttyAMA0',
'enable_kvm': True,
}

qemu_args = qemu_args | qemu_settings
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto on PEP 584

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


qemu_executable = f'qemu-system-{qemu_args["arch"]}'
qemu_path = which(qemu_executable)
if qemu_path is None:
raise FileNotFoundError(f'Cannot find {qemu_executable} executable!')

if not os.path.exists(qemu_args["kernel_image"]):
raise FileNotFoundError(f'{qemu_args["kernel_image"]} does not exist!')

# pylint: disable=consider-using-f-string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree with pylint here, that would make the code more readable. If that was not possible for whatever reason, I'd still use the keyword arguments of format() to provide named placeholders, rather than positional ones. Otherwise it's very easy to shift everything by one location and break the code, making it needlessly harder to modify in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to f-string.

qemu_cmd = '''\
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the \ does not seem to really serve any purpose does it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prevents prepending leading spaces in following line.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK having the space or not won't change anything. If it does, it's probably a bug in devlib. You can also just use .strip() on the final string if you really needed to trim it. And if you want to preserve normal code indentation level in a multiline string without wanting the indentation inside the string itself, you can use textwrap.dedent

Copy link
Contributor Author

@metin-arm metin-arm Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having leading spaces is not a functional issue. Just a cosmetic one.
Will use .strip().

{} -kernel {} -append "{}" -m {} -smp cores={},threads={} -netdev user,id=net0,hostfwd=tcp::{}-:22 \
-device virtio-net-pci,netdev=net0 --nographic\
'''.format(
qemu_path,
qemu_args["kernel_image"],
qemu_args["cmdline"],
qemu_args["mem_size"],
qemu_args["num_cores"],
qemu_args["num_threads"],
self.connection_settings["port"],
)

if qemu_args["initrd_image"]:
if not os.path.exists(qemu_args["initrd_image"]):
raise FileNotFoundError(f'{qemu_args["initrd_image"]} does not exist!')

qemu_cmd += f' -initrd {qemu_args["initrd_image"]}'

if qemu_args["arch"] == machine():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that whatever Python returns in platform.machine() actually agrees with qemu arch names ? This is typically a can of worm, where half the world uses arm64, the other half aarch64 and I'm not even talking about 32bits arm. x86 is usually ok, but even there you can end up with x86-64 vs x86_64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relaxed the check by machine().startswith('x86').

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct though ? I know that for now arm64 hosts are not really supported in LISA, but devlib might have a different standing there, and we also don't want to prevent it from working the day the rest (e.g. adb) is actually ready.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new version of the code in #677:

        if qemu_args['arch'].startswith('x86') and machine().startswith('x86'):
            if qemu_args["enable_kvm"]:
                qemu_cmd += ' --enable-kvm'
        else:
            qemu_cmd += f' -machine virt -cpu {qemu_args["cpu_type"]}'

I want to enable KVM only for x86 and specify -machine virt -cpu ... only if the target machine/guest is Arm.
Will this hurt devlib's current arm64 support?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to enable KVM only for x86

Is KVM not working on aarch64 ?

specify -machine virt -cpu ... only if the target machine/guest is Arm

What does -machine virt do that we don't want done on x86 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is KVM not working on aarch64 ?

Works for sure, but there are some restrictions: https://stackoverflow.com/a/71220201.

What does -machine virt do that we don't want done on x86 ?

-machine virt is unsupported for x86_64. qemu-system-x86_64 -machine help | grep -i virt returns no match.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at those restrictions and they don't seem to be too problematic. No-one is running an arm 32bits host these days for any CI system AFAIK, and -cpu host -machine gic-version=host does not sound too bad considered the speed benefits (unless this has a consequence I can't imagine).

-machine virt is unsupported for x86_64

Fair enough, so that's really attached to x86 target itself and not assuming the host is x86.

Make sure that in both cases, you add a comment stating the reason why this code takes those decisions so it can be revisited easily in the future (e.g. one day x86_64 might support virt).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-cpu host and gic-version options are not supported for qemu-system-aarch64 on an x86_64 host.
Also https://wiki.ubuntu.com/ARM64/QEMU says that it's not possible to use KVM accelerator for a different architecture (e.g., x86_64 host, aarch64 guest) which makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a small improvement regarding the host/guest checks. If we don't have any open comments left here, let's continue reviewing on #677 please.

if qemu_args["enable_kvm"]:
qemu_cmd += ' --enable-kvm'
else:
qemu_cmd += f' -machine virt -cpu {qemu_args["cpu_type"]}'

self.target = make_target(connect=False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assigning to self.target is the responsibility of the base class's __init__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

conn_cls=SshConnection,
connection_settings=self.connection_settings)

super().__init__(runner_cmd=qemu_cmd,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All user input that makes it to qemu_cmd must be escaped using shlex.quote(). I know this is cumbersome, but the only other way to deal with that problem is to ask for the command as a list like subprocess does by default. devlib does not for various reasons (although we could easily add such an API, maybe we should ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this is addressed by self.runner_process = get_subprocess(list(runner_cmd.split())) in TargetRunner class?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_subprocess(list(runner_cmd.split()))

This is broken. Either the runner_cmd is already a list and nothing needs to be split, or it's not and it needs to be split with shlex.split(). Just using str.split() will wreck any escaping that was applied. So considered the situation, we should be manipulating only lists here, and qemu_cmd should be a list as well. We could use strings and then use shlex.split() but that's pointless and will just require extra work to create the string (shlex.quote on every single user input)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted qemu_cmd to a list of string/parameters.

target=self.target,
**args)
43 changes: 25 additions & 18 deletions devlib/utils/ssh.py
Original file line number Diff line number Diff line change
Expand Up @@ -367,25 +367,32 @@ def __init__(self,
else:
logger.debug('Using SFTP for file transfer')

self.client = self._make_client()
atexit.register(self.close)

# Use a marker in the output so that we will be able to differentiate
# target connection issues with "password needed".
# Also, sudo might not be installed at all on the target (but
# everything will work as long as we login as root). If sudo is still
# needed, it will explode when someone tries to use it. After all, the
# user might not be interested in being root at all.
self._sudo_needs_password = (
'NEED_PASSWORD' in
self.execute(
# sudo -n is broken on some versions on MacOSX, revisit that if
# someone ever cares
'sudo -n true || echo NEED_PASSWORD',
as_root=False,
check_exit_code=False,
self.client = None
try:
self.client = self._make_client()
atexit.register(self.close)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to this PR but we should probably fix this: if we keep creating connection objects (and we do because of the multithreading support), we end up:

  • accumulating atexit handlers without ever removing them
  • leaking the connection object, as a reference to it will stay alive in the method's __self__

let me know if you want to get your hands dirty in low level Python, otherwise I'll do it

Copy link
Contributor Author

@metin-arm metin-arm Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will leave it up to you :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are #679


# Use a marker in the output so that we will be able to differentiate
# target connection issues with "password needed".
# Also, sudo might not be installed at all on the target (but
# everything will work as long as we login as root). If sudo is still
# needed, it will explode when someone tries to use it. After all, the
# user might not be interested in being root at all.
self._sudo_needs_password = (
'NEED_PASSWORD' in
self.execute(
# sudo -n is broken on some versions on MacOSX, revisit that if
# someone ever cares
'sudo -n true || echo NEED_PASSWORD',
as_root=False,
check_exit_code=False,
)
)
)

except BaseException:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the cases where BaseException is appropriate: we want to cleanup the situation as far as we can in case we get interrupted, even if the user asked for the interruption. Since we re-raise the exception, we are not going to accidentally mask it. The only bad thing I could see happen is this code being called from a loop that retries in case of failures. If __init__ failed because the user used ctrl-c, leading to a broken state in the client, leading to client.close() to raise an exception. Then this exception would propagate, and the retry logic would swallow it and retry. The fix would be ensuring the initial exception is always preserved:

        except BaseException as e:
            try:
                if self.client is not None:
                    self.client.close()
            except BaseException:
                raise e

Copy link
Contributor Author

@metin-arm metin-arm Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see following error log if I include the snippet above:

Exception ignored in: <function ConnectionBase.__del__ at 0x78aa6ff9fa30>
Traceback (most recent call last):
  File "/data_nvme1n1/devlib/devlib/utils/misc.py", line 960, in wrapper
    return f_(*args, **kwargs)
  File "/data_nvme1n1/devlib/devlib/connection.py", line 121, in __del__
    self.close()
  File "/data_nvme1n1/devlib/devlib/utils/misc.py", line 960, in wrapper
    return f_(*args, **kwargs)
  File "/data_nvme1n1/devlib/devlib/connection.py", line 111, in close 
    self._close()
  File "/data_nvme1n1/devlib/devlib/utils/misc.py", line 960, in wrapper
    return f_(*args, **kwargs)
  File "/data_nvme1n1/devlib/devlib/utils/ssh.py", line 727, in _close 
    self.client.close()
AttributeError: 'NoneType' object has no attribute 'close'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug that needs to be fixed separately. When __init__ is called, the object is already created, as __init__ is not a constructor per say, it only initializes an already-constructed object. That means that even if __init__ ends up raising, __del__ will still be called. So __del__ needs to be prepared to deal with an object where __init__ never ran (or e.g. raised on the first line before setting any attribute).

I'm not quite sure why that problem is masked by your initial code though, but I suppose it doesn't really matter

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: actually there is an issue with my code, it should be:

        except BaseException as e:
            try:
                if self.client is not None:
                    self.client.close()
            finally:
                raise e

The reason you saw that issue is because the exception was discarded and never re-raised in the self.client.close() path. This means that the metaprogramming __del__ relies on told it that the object finished initializing cleanly, but in reality it did not and was missing an attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied this patch.

if self.client is not None:
self.client.close()
raise

def _make_client(self):
if self.strict_host_check:
Expand Down
1 change: 1 addition & 0 deletions doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ Contents:
derived_measurements
platform
connection
tools

Indices and tables
==================
Expand Down
Loading