SONiC VM with vrnetlab does not come up healthy #2365

Open
mfzhsn opened this issue Dec 24, 2024 · 7 comments

@mfzhsn

mfzhsn commented Dec 24, 2024

Build: 20241224.7
Commit: ea4ccc0377
Version: 202405

Issue:

[root@srossonic sonic-lab]# docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED         STATUS                     PORTS                                 NAMES
779e86c4ecde   ghcr.io/nokia/srlinux            "/tini -- fixuid -q …"   7 minutes ago   Up 7 minutes                                                     srl
d5562a89fd15   vrnetlab/sonic_sonic-vs:202405   "/bin/bash"              7 minutes ago   Up 7 minutes (unhealthy)   22/tcp, 443/tcp, 5000/tcp, 8080/tcp   sonic

I waited over 10 minutes and nothing happened. Also, when I exec into the sonic vrnetlab container, nothing is running inside it:

[root@srossonic sonic-lab]# docker exec -ti sonic bash
root@sonic:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 22:25 pts/0    00:00:00 /bin/bash
root         128       0  0 22:34 pts/1    00:00:00 bash
root         134     128  0 22:34 pts/1    00:00:00 ps -ef

Step-1: Built SONiC image

drwxr-xr-x 2 root root         58 Dec 24 16:16 docker
-rw-r--r-- 1 root root        269 Dec 24 16:16 Makefile
-rw-r--r-- 1 root root       1305 Dec 24 16:16 README.md
-rw-r--r-- 1 root root 6123618304 Dec 24 16:19 sonic-vs-202405.qcow2
[root@srossonic sonic]# make
for IMAGE in sonic-vs-202405.qcow2; do \
	echo "Making $IMAGE"; \
	make IMAGE=$IMAGE docker-build; \
	make IMAGE=$IMAGE docker-clean-build; \
done
Making sonic-vs-202405.qcow2
make[1]: Entering directory '/home/vrnetlab/sonic'
--> Cleaning docker build context
rm -f docker/*.qcow2* docker/*.tgz* docker/*.vmdk* docker/*.iso docker/*.xml docker/*.bin
rm -f docker/healthcheck.py docker/vrnetlab.py
Building docker image using sonic-vs-202405.qcow2 as vrnetlab/sonic_sonic-vs:202405
cp ../common/* docker/
make IMAGE=$IMAGE docker-build-image-copy
make[2]: Entering directory '/home/vrnetlab/sonic'
cp sonic-vs-202405.qcow2* docker/
make[2]: Leaving directory '/home/vrnetlab/sonic'
(cd docker; docker build --build-arg http_proxy= --build-arg HTTP_PROXY= --build-arg https_proxy= --build-arg HTTPS_PROXY= --build-arg IMAGE=sonic-vs-202405.qcow2 --build-arg VERSION=202405 --label "vrnetlab-version=$(git log -1 --format=format:"Commit: %H from %aD")" -t vrnetlab/sonic_sonic-vs:202405 .)
[+] Building 84.7s (10/10) FINISHED                                                                                                                                                                          docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                                   0.0s
 => => transferring dockerfile: 495B                                                                                                                                                                                   0.0s
 => [internal] load .dockerignore                                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                                        0.0s
 => [internal] load metadata for public.ecr.aws/docker/library/debian:bookworm-slim                                                                                                                                    1.1s
 => [1/5] FROM public.ecr.aws/docker/library/debian:bookworm-slim@sha256:1537a6a1cbc4b4fd401da800ee9480207e7dc1f23560c21259f681db56768f63                                                                              2.7s
 => => resolve public.ecr.aws/docker/library/debian:bookworm-slim@sha256:1537a6a1cbc4b4fd401da800ee9480207e7dc1f23560c21259f681db56768f63                                                                              0.0s
 => => sha256:1537a6a1cbc4b4fd401da800ee9480207e7dc1f23560c21259f681db56768f63 8.56kB / 8.56kB                                                                                                                         0.0s
 => => sha256:b73bf02f32434c9be21adf83b9aedf33e731784d8d2dacbbd3ce5f4993f2a2de 1.02kB / 1.02kB                                                                                                                         0.0s
 => => sha256:a815f2ceb3b0c8e16829cfa5c6b5a96dad4d17f5e35be3d52ee81ce2e3cc0ced 453B / 453B                                                                                                                             0.0s
 => => sha256:bc0965b23a04fe7f2d9fb20f597008fcf89891de1c705ffc1c80483a1f098e4f 28.23MB / 28.23MB                                                                                                                       0.7s
 => => extracting sha256:bc0965b23a04fe7f2d9fb20f597008fcf89891de1c705ffc1c80483a1f098e4f                                                                                                                              1.9s
 => [internal] load build context                                                                                                                                                                                     42.5s
 => => transferring context: 6.13GB                                                                                                                                                                                   40.6s
 => [2/5] RUN apt-get update -qy    && apt-get install -y --no-install-recommends    bridge-utils    iproute2    python3-ipy    qemu-kvm    qemu-utils    socat    ssh    sshpass    && rm -rf /var/lib/apt/lists/*   19.1s
 => [3/5] COPY sonic-vs-202405.qcow2* /                                                                                                                                                                               16.0s
 => [4/5] COPY *.py /                                                                                                                                                                                                  0.0s
 => [5/5] COPY backup.sh /                                                                                                                                                                                             0.0s
 => exporting to image                                                                                                                                                                                                25.1s
 => => exporting layers                                                                                                                                                                                               25.1s
 => => writing image sha256:b0ac5c5b2016cd5eef22783269ce8d85dbd2b43e87b41e3b1f84bbcfbc183bd6                                                                                                                           0.0s
 => => naming to docker.io/vrnetlab/sonic_sonic-vs:202405                                                                                                                                                              0.0s
make[1]: Leaving directory '/home/vrnetlab/sonic'
make[1]: Entering directory '/home/vrnetlab/sonic'
--> Cleaning docker build context
rm -f docker/*.qcow2* docker/*.tgz* docker/*.vmdk* docker/*.iso docker/*.xml docker/*.bin
rm -f docker/healthcheck.py docker/vrnetlab.py
make[1]: Leaving directory '/home/vrnetlab/sonic'
mfzhsn added the bug and sonic labels Dec 24, 2024
@hellt
Member

hellt commented Dec 25, 2024

Did you check docker logs for sonic node?
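
For anyone following along, besides `docker logs sonic`, the health state that Docker reports as `(unhealthy)` can be dumped with `docker inspect --format '{{json .State.Health}}' sonic`. Below is a minimal sketch that condenses that JSON into one line. The helper name `summarize_health` is hypothetical and the sample input is purely illustrative; it is not output from the node in this issue.

```python
import json


def summarize_health(health_json: str) -> str:
    """Condense the JSON printed by
    `docker inspect --format '{{json .State.Health}}' <container>`."""
    health = json.loads(health_json)
    status = health.get("Status", "unknown")
    streak = health.get("FailingStreak", 0)
    # "Log" holds the most recent health probe results; it may be null.
    log = health.get("Log") or []
    last = log[-1] if log else {}
    output = (last.get("Output") or "").strip()
    return (
        f"status={status} failing_streak={streak} "
        f"last_exit={last.get('ExitCode')} last_output={output!r}"
    )


# Illustrative input only; not taken from the container in this issue.
sample = (
    '{"Status": "unhealthy", "FailingStreak": 3, '
    '"Log": [{"ExitCode": 1, "Output": "example probe output\\n"}]}'
)
print(summarize_health(sample))
```

The last probe's exit code and output are usually the quickest hint as to what the health check is waiting for.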

@sudurais

I think the VM gets shut down after a few syncd/swss restarts, due to some issue with the config or something else. However, no log is captured outside the VM to show what went wrong. Is that something containerlab can capture?

root@sonic-1:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  2 23:32 pts/0    00:00:37 python3 /launch.py --username admin --password admin --hostname sonic-1 --connection-mode tc --trace
root         369       0  0 23:57 pts/1    00:00:00 bash
root         375     369  0 23:57 pts/1    00:00:00 ps -ef
root@sonic-1:/#
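
The two `ps -ef` listings in this thread differ in a useful way: the first shows only `/bin/bash`, while this one shows `launch.py` but no QEMU process. A rough sketch of that check follows. The helper name `classify_processes` is hypothetical, and matching on the substring `qemu-system` is an assumption about how the QEMU binary appears in the process list.

```python
def classify_processes(ps_output: str) -> dict:
    """Report whether `ps -ef` output shows the launcher and a QEMU process."""
    rows = ps_output.strip().splitlines()[1:]  # skip the header row
    commands = []
    for row in rows:
        # ps -ef has 8 columns; the 8th (CMD) may itself contain spaces.
        parts = row.split(None, 7)
        if len(parts) == 8:
            commands.append(parts[7])
    return {
        "launcher": any("launch.py" in cmd for cmd in commands),
        # Assumption: the VM process name contains "qemu-system".
        "qemu": any("qemu-system" in cmd for cmd in commands),
    }


first_listing = """\
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 22:25 pts/0    00:00:00 /bin/bash
root         128       0  0 22:34 pts/1    00:00:00 bash
root         134     128  0 22:34 pts/1    00:00:00 ps -ef
"""

second_listing = """\
UID PID PPID C STIME TTY TIME CMD
root 1 0 2 23:32 pts/0 00:00:37 python3 /launch.py --username admin --password admin --hostname sonic-1 --connection-mode tc --trace
root 369 0 0 23:57 pts/1 00:00:00 bash
root 375 369 0 23:57 pts/1 00:00:00 ps -ef
"""

print(classify_processes(first_listing))
print(classify_processes(second_listing))
```

In the first case the launcher never started (PID 1 is a plain shell); in the second the launcher is alive but the VM process is gone, which matches the VM having shut down.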

@sudurais

Upon qemu exit (for whatever reason; I have yet to troubleshoot what's wrong with the config or SONiC), the log looks like this:

root@sonic:~#'
2024-12-31 00:00:13,492: vrnetlab DEBUG writing to serial console: 'sleep 1'
2024-12-31 00:00:13,492: launch INFO completed bootstrap configuration
2024-12-31 00:00:13,492: launch TRACE Backup file /config/config_db.json exists
Copying startup config file to the VM...
Waiting for VM's SSH to become available... (Attempt 1/30)
Waiting for VM's SSH to become available... (Attempt 2/30)
Waiting for VM's SSH to become available... (Attempt 3/30)
SSH connection established.
Running command: /usr/local/bin/sonic-cfggen -j /tmp/config_db.json --write-to-db
Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json
2024-12-31 00:00:47,135: launch INFO Startup complete in: 0:01:26.024013
2024-12-31 00:24:04,548: vrnetlab INFO STDOUT:
2024-12-31 00:24:04,548: vrnetlab INFO STDERR:
2024-12-31 00:24:04,549: vrnetlab INFO STDOUT:
2024-12-31 00:24:04,549: vrnetlab INFO STDERR:
2024-12-31 00:24:04,549: vrnetlab INFO STDOUT:
2024-12-31 00:24:04,549: vrnetlab INFO STDERR:
2024-12-31 00:24:04,550: vrnetlab INFO STDOUT:
2024-12-31 00:24:04,550: vrnetlab INFO STDERR:
2024-12-31 00:24:04,550: vrnetlab INFO STDOUT:
2024-12-31 00:24:04,550: vrnetlab INFO STDERR:
2024-12-31 00:24:04,551: vrnetlab INFO STDOUT:
2024-12-31 00:24:04,551: vrnetlab INFO STDERR:
2024-12-31 00:24:04,551: vrnetlab INFO STDOUT:
2024-12-31 00:24:04,551: vrnetlab INFO STDERR:
2024-12-31 00:24:04,552: vrnetlab INFO STDOUT:

The clab Docker container continues to run and keeps printing the empty STDOUT/STDERR messages above.
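
To put a number on how long the VM stayed up in this run, the timestamps of the "Startup complete" line and the first empty `STDOUT:` line can be subtracted. A small sketch, assuming every log line starts with a `YYYY-MM-DD HH:MM:SS,mmm` stamp followed by `: ` as in the excerpt above (the helper name `log_time` is hypothetical):

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted in this thread.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def log_time(line: str) -> datetime:
    """Parse the leading timestamp of a log line."""
    stamp = line.split(": ", 1)[0]
    return datetime.strptime(stamp, FMT)


startup = "2024-12-31 00:00:47,135: launch INFO Startup complete in: 0:01:26.024013"
first_empty = "2024-12-31 00:24:04,548: vrnetlab INFO STDOUT:"

gap = log_time(first_empty) - log_time(startup)
print(gap)
```

For this excerpt the gap works out to a little over 23 minutes between startup completing and the first empty STDOUT line.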

@hellt
Member

hellt commented Dec 31, 2024

Maybe the first thing to try is to remove any startup config and see if the node comes up.

@sudurais

Yeah, I tried removing the startup config as well. The VM doesn't stay up beyond 7-10 minutes. It comes up fine and even BGP peers are established (if a startup config is given), but the entire VM disappears after some time. The same image runs fine in the sonic-mgmt environment. To confirm: there is no issue with the image and, of course, no config at all.

@sudurais

sudurais commented Jan 2, 2025

In my case, the issue is due to memory.

sonic-1:
  kind: sonic-vm
  image: vrnetlab/sonic_sonic-vs:2024.05
  startup-config: /home/ubuntu/config_db.json
  cpu: 4
  memory: 4GB

4GB isn't enough. The memory has to be raised not just in the topology config but in launch.py (in vrnetlab) as well:


class SONiC_vm(vrnetlab.VM):
    def __init__(self, hostname, username, password, conn_mode):
        disk_image = "/"
        for e in os.listdir("/"):
            if re.search(".qcow2$", e):
                disk_image = "/" + e
                break
        super(SONiC_vm, self).__init__(
            username, password, disk_image=disk_image, ram=8196
        )
        self.qemu_args.extend(["-smp", "2"])
        self.nic_type = "virtio-net-pci"
        self.conn_mode = conn_mode
        self.num_nics = 10
        self.hostname = hostname

Also, self.num_nics being set to 10 explains why Ethernet interfaces work only up to eth10 and not beyond.

hellt removed the bug label Jan 5, 2025
@hellt
Member

hellt commented Jan 18, 2025

Hi @sudurais
I wonder if 4GB RAM is not sufficient for a bare config as well? It used to work with 2024.05 when I tried it.

If the memory bump is needed only when some config is provided to the box, then I can document this behavior, and users will have to manually set the memory to a higher value using this method: https://containerlab.dev/manual/vrnetlab/#tuning-qemu-parameters
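
As a rough illustration of what an environment-based override amounts to, here is a sketch of a launcher preferring an environment variable over its hard-coded RAM default. This is not vrnetlab's actual code: the helper `resolve_ram_mb` is hypothetical, and the variable name `QEMU_MEMORY` with a value in MB is an assumption based on my reading of the linked page, so please check that page for the exact name and units.

```python
import os


def resolve_ram_mb(default_mb: int, env=None) -> int:
    """Return the VM RAM in MB, preferring an override from the environment."""
    env = os.environ if env is None else env
    # Assumption: the override variable is QEMU_MEMORY and holds megabytes.
    raw = env.get("QEMU_MEMORY", "").strip()
    if not raw:
        return default_mb
    if not raw.isdigit() or int(raw) <= 0:
        raise ValueError(f"QEMU_MEMORY must be a positive integer (MB), got {raw!r}")
    return int(raw)


print(resolve_ram_mb(4096, env={}))
print(resolve_ram_mb(4096, env={"QEMU_MEMORY": "8192"}))
```

With something like this, a hard-coded default in launch.py only matters when the node definition does not set the variable.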

Let me know if this solves it for you
