'--ulimit host' permits more open files in the container #17681

Closed
debarshiray opened this issue Mar 2, 2023 · 3 comments · May be fixed by #24243
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@debarshiray
Member

debarshiray commented Mar 2, 2023

Issue Description

Compare:

[rishi@topinka ~]$ podman run -it --rm --env TERM=$TERM --ulimit host \
    registry.fedoraproject.org/fedora:38 ulimit -n
524288
[rishi@topinka ~]$ ulimit -n
1024

Or:

[rishi@topinka ~]$ podman run -it --rm --env TERM=$TERM --ulimit host \
    registry.fedoraproject.org/fedora:38 ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127760
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 524288
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 127760
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

... and:

[rishi@topinka ~]$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127760
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 127760
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

Is this expected?

Steps to reproduce the issue

  1. See above.

Describe the results you received

ulimit -n has a higher value inside the container than on the host.

Describe the results you expected

I expected ulimit -n to have the same value inside the container and on the host.

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: '
  cpuUtilization:
    idlePercent: 99.14
    systemPercent: 0.16
    userPercent: 0.7
  cpus: 16
  distribution:
    distribution: fedora
    variant: workstation
    version: "36"
  eventLogger: journald
  hostname: topinka
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.14-100.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 22417584128
  memTotal: 33553547264
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 4h 26m 32.00s (Approximately 0.17 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/rishi/.config/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 2
    stopped: 5
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/rishi/.local/share/containers/storage
  graphRootAllocated: 1695606808576
  graphRootUsed: 262605983744
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 15
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/rishi/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.1
  Built: 1676629882
  BuiltTime: Fri Feb 17 11:31:22 2023
  GitCommit: ""
  GoVersion: go1.18.10
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Fedora 36 Workstation

Additional information

No response

@debarshiray debarshiray added the kind/bug Categorizes issue or PR as related to a bug. label Mar 2, 2023
@giuseppe
Member

giuseppe commented Mar 2, 2023

This is caused by: https://github.com/containers/podman/blob/main/cmd/podman/early_init_linux.go#L19

We raise the soft limit so that Podman itself does not hit the limit.

Is it causing any issues?

@debarshiray
Member Author

Thanks for tracking that down, @giuseppe!

No, I am not aware of any problems caused by it at the moment. I noticed it in passing when debugging something else, and decided to file an issue just in case this wasn't intentional.

Feel free to close this issue, if you want to. :)

@giuseppe giuseppe closed this as completed Mar 7, 2023
@debarshiray
Member Author

Recently, @bilelmoussaoui ran into problems with the higher number of permitted file descriptors (i.e., ulimit -n) inside a rootless Podman container compared to the host. He was trying to track down a file descriptor leak that was hitting the limit on the host, but the higher limit inside the container was camouflaging it.

debarshiray added a commit to debarshiray/toolbox that referenced this issue Jun 30, 2023
Note that the soft limit for the number of open file descriptors cannot
be tested at the moment because Podman sets the Toolbx container to have
a value higher than the host's [1].

[1] containers/podman#17681

containers#213
debarshiray added a commit to debarshiray/toolbox that referenced this issue Jun 30, 2023
Note that the soft limit for the maximum number of open file descriptors
cannot be tested at the moment because Podman sets the Toolbx container
to have a value higher than the host's [1].

[1] containers/podman#17681

containers#213
debarshiray added a commit to debarshiray/toolbox that referenced this issue Jul 1, 2023
Note that the soft limit for the maximum number of open file descriptors
cannot be tested at the moment because Podman sets the Toolbx container
to have a value higher than the host's [1].

[1] containers/podman#17681

containers#213
debarshiray added a commit to debarshiray/toolbox that referenced this issue Jul 1, 2023
Podman sets the Toolbx container's soft limit for the maximum number of
open file descriptors to the host's hard limit, which is often greater
than the host's soft limit.

[1] containers/podman#17681

containers#213
debarshiray added a commit to debarshiray/toolbox that referenced this issue Jul 1, 2023
Podman sets the Toolbx container's soft limit for the maximum number of
open file descriptors to the host's hard limit, which is often greater
than the host's soft limit [1].

[1] containers/podman#17681

containers#213
debarshiray added a commit to debarshiray/toolbox that referenced this issue Jul 4, 2023
The following caveats must be noted:

  * Podman sets the Toolbx container's soft limit for the maximum number
    of open file descriptors to the host's hard limit, which is often
    greater than the host's soft limit [1].

  * The ulimit(1) options -b, -k, -P and -T don't work on Fedora 38
    because the corresponding resource arguments for getrlimit(2) are
    absent from the operating system.  These are RLIMIT_SBSIZE,
    RLIMIT_KQUEUES, RLIMIT_NPTS and RLIMIT_PTHREAD respectively.

[1] containers/podman#17681

containers#213
debarshiray added a commit to debarshiray/toolbox that referenced this issue Jul 4, 2023
The following caveats must be noted:

  * Podman sets the Toolbx container's soft limit for the maximum number
    of open file descriptors to the host's hard limit, which is often
    greater than the host's soft limit [1].

  * The ulimit(1) options -P, -T, -b, and -k don't work on Fedora 38
    because the corresponding resource arguments for getrlimit(2) are
    absent from the operating system.  These are RLIMIT_NPTS,
    RLIMIT_PTHREAD, RLIMIT_SBSIZE and RLIMIT_KQUEUES respectively.

[1] containers/podman#17681

containers#213
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 4, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023
debarshiray added a commit to debarshiray/podman that referenced this issue Oct 14, 2024
Starting from commit 9126b45 ("Up default Podman rlimits to
avoid max open files"), Podman started bumping its soft limit for the
maximum number of open file descriptors (RLIMIT_NOFILE or ulimit -n) to
permit exposing a large number of ports to a container.  This was later
fine-tuned in commit a2c1a2d ("podman: bump RLIMIT_NOFILE also
without CAP_SYS_RESOURCE").

Unfortunately, this also increases the limits for 'podman exec' sessions
running in containers created with:
  $ podman create --network host --ulimit host ...

This is what Toolbx uses to provide a containerized interactive command
line environment for software development and troubleshooting the host
operating system.

It confuses developers and system administrators debugging a process
that's leaking file descriptors and crashing on the host OS.  The
crashes either don't reproduce inside the container or they take a lot
longer to reproduce, both of which are frustrating.

Therefore, it would be good to retain the limits, at least for this
specific scenario.

It turns out that since this code was written, the Go runtime has had
two interesting changes.

Starting from Go 1.19 [1], the Go runtime bumps the soft limit for
RLIMIT_NOFILE for all Go programs [2].  This means that there's no
longer any need for Podman to bump its own limits, because it switched
from requiring Go 1.18 to 1.20 in commit 4dd58f2 ("Move golang
requirement from 1.18 to 1.20").  It's probably good to still log the
detected limits, in case Go's behaviour changes.
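The runtime behaviour described in [1] and [2] can be checked with a sketch that only reads and logs the limits, roughly what the message suggests Podman keep doing. `logNofileLimits` is a hypothetical name, and the printed values depend on the host. When built with Go 1.19 or later on Linux, the reported soft limit is expected to already match the hard limit, even if the shell that launched the program had a lower soft limit.

```go
package main

import (
	"log"
	"syscall"
)

// logNofileLimits is a hypothetical helper that logs the soft and hard
// RLIMIT_NOFILE limits of the current process without changing them.
func logNofileLimits() (syscall.Rlimit, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return lim, err
	}
	log.Printf("RLIMIT_NOFILE: soft=%d hard=%d", lim.Cur, lim.Max)
	return lim, nil
}

func main() {
	if _, err := logNofileLimits(); err != nil {
		log.Fatal(err)
	}
}
```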

Not everybody was happy with this [3], because the higher limits got
propagated to child processes spawned by Go programs.  Among other
things, this can break old programs using select(2) [4].  So, Go's
behaviour was fine-tuned to restore the original soft limit for
RLIMIT_NOFILE when forking a child process [5].

With these two changes in Go, which Podman already uses, if the bumping
of RLIMIT_NOFILE is left to the Go runtime, then the limits are no
longer increased for 'podman exec' sessions.  Otherwise, if Podman
continues to bump the soft limit for RLIMIT_NOFILE on its own, then it
prevents the Go runtime from restoring the original limits when forking,
and leads to the higher limits in 'podman exec' sessions.

The existing 'podman run --ulimit host ... ulimit -Hn' test in
test/e2e/run_test.go was extended to also check the soft limit.  The
similar test for 'podman exec' was moved from test/e2e/toolbox_test.go
to test/e2e/exec_test.go for consistency and because there's nothing
Toolbx specific about it.  The test was similarly extended, and updated
to be more idiomatic.

Due to the behaviour of the Go runtime noted above, and since the tests
are written in Go, the current or soft limit for RLIMIT_NOFILE returned
by syscall.Getrlimit() is the same as the hard limit.

The Alpine Linux image doesn't have a standalone binary for 'ulimit' and
it's picky about the order in which the options are listed.  The -H or
-S must come first, followed by a space, and then the -n.
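A sketch of how such a command line might be assembled follows. `ulimitNofileCommand` is a hypothetical helper, `alpine` stands in for whichever image a test actually uses, and the only flags shown are ones already mentioned in this thread. The point is the argument order: `-S` or `-H` first, then a space, then `-n`, all passed to `sh -c` because the image has no standalone `ulimit` binary.

```go
package main

import "fmt"

// ulimitNofileCommand is a hypothetical helper returning a command that
// prints the soft ("S") or hard ("H") limit for open files. The -S or -H
// comes first, followed by a space and then -n, and the shell builtin is
// run through sh -c since there is no standalone ulimit binary to exec.
func ulimitNofileCommand(kind string) ([]string, error) {
	switch kind {
	case "S", "H":
		return []string{"sh", "-c", "ulimit -" + kind + " -n"}, nil
	default:
		return nil, fmt.Errorf("unknown limit kind %q, want \"S\" or \"H\"", kind)
	}
}

func main() {
	for _, kind := range []string{"S", "H"} {
		cmd, err := ulimitNofileCommand(kind)
		if err != nil {
			panic(err)
		}
		// "alpine" is a placeholder image name for illustration only.
		argv := append([]string{"podman", "run", "--rm", "--ulimit", "host", "alpine"}, cmd...)
		fmt.Printf("%q\n", argv)
	}
}
```

Running it prints the two argument vectors, the first being `["podman" "run" "--rm" "--ulimit" "host" "alpine" "sh" "-c" "ulimit -S -n"]`.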

[1] https://go.dev/doc/go1.19#runtime

[2] Go commit 8427429c592588af ("os: raise open file rlimit at startup")
    golang/go@8427429c592588af
    golang/go#46279

[3] containerd/containerd#8249

[4] http://0pointer.net/blog/file-descriptor-limits.html

[5] Go commit f5eef58e4381259c ("syscall: restore original NOFILE ...")
    golang/go@f5eef58e4381259c
    golang/go#46279

Fixes: containers#17681

Signed-off-by: Debarshi Ray <[email protected]>