Investigate limited number of layers on aarch64 docker #75

Closed
clalancette opened this issue May 16, 2017 · 26 comments

@clalancette
Contributor

See #73 for a description of the problem, and a link to problematic builds. We've currently worked around the problem with #74, but reverting that commit should show the problem again for debugging.

@nuclearsandwich
Member

This is a sort of time bomb / hot potato for whichever buildfarmer next touches the Dockerfile and re-triggers this issue.

moby/moby#1171 looks very promising. I might get a chance to circle back to this during my buildfarmercop reign.

@vielmetti

moby/moby#27384 is perhaps the root cause of this. Docker 1.12.x on ARM64 is compiled with Go 1.7, and Go has a bug that's tickled by Docker.

The good news is that Go 1.9 looks set to be available as a binary release for ARM64 from the Go project, and within some reasonable timeframe after that there should be a fresh ARM64 binary of Docker that can be easily consumed.

In the meantime the best strategy is to reduce the number of layers you use. There are various Docker optimization strategies that would shrink the Dockerfile, eliminating layers and build steps and improving build performance along the way. Looking at

https://github.com/ros2/ci/blob/master/linux_docker_resources/Dockerfile

in particular, successive RUN commands can usually be merged, e.g.

RUN foo
RUN bar

is almost always equivalent, for your purposes, to

RUN foo && bar

and that wipes out a layer. You might also look at this (vintage) writeup

https://blog.tutum.co/2014/10/22/how-to-optimize-your-dockerfile/

that's pretty useful. Newest Docker has some more tricks up its sleeve, so don't go completely overboard on optimizing, but there are small changes that should help stability.
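As a rough illustration of the saving, here is a minimal shell sketch. It assumes that counting RUN, COPY and ADD instructions is a good enough proxy for the number of filesystem layers a Dockerfile produces. The function name `count_layer_instructions` and the sample file names are made up for this example, and `foo` / `bar` are the placeholders from above.

```shell
#!/bin/sh
# Rough proxy (an assumption): count the instructions that add a filesystem layer.
count_layer_instructions() {
    # $1: path to a Dockerfile; instruction names are matched case-insensitively
    grep -ciE '^[[:space:]]*(RUN|COPY|ADD)[[:space:]]' "$1"
}

# Write two throwaway Dockerfiles into a temporary directory.
workdir=$(mktemp -d)

cat > "$workdir/Dockerfile.split" <<'EOF'
FROM ubuntu:xenial
RUN foo
RUN bar
EOF

cat > "$workdir/Dockerfile.merged" <<'EOF'
FROM ubuntu:xenial
RUN foo && bar
EOF

count_layer_instructions "$workdir/Dockerfile.split"    # prints 2
count_layer_instructions "$workdir/Dockerfile.merged"   # prints 1
```

Running the same function against the real Dockerfile gives a quick estimate of how close a change brings it to the depth at which builds start failing.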

@vielmetti

My understanding is that this has been resolved by upgrading the version of Docker on the arm64 machines to one that has better file system stability.

@nuclearsandwich
Member

My understanding is that this has been resolved by upgrading the version of Docker on the arm64 machines to one that has better file system stability.

We haven't upgraded this Jenkins instance yet. What was previously referred to as the ROS 2 buildfarm lives at http://ci.ros2.org and provides continuous integration and build archiving for ROS 2 omnibus builds. In the past couple of weeks we have also created a ROS 2 buildfarm at http://build.ros2.org, based more closely on the ROS buildfarm (build.ros.org), to build the Xenial debs for the beta2 release of ROS 2.

The Docker deb that I built last weekend has been deployed to the package buildfarm at http://build.ros2.org. Depending on the CI load today, it might be a good day to upgrade Docker on the CI farm and see if that does indeed resolve this.

@vielmetti

There's an updated version of Docker 1.12.x also available through a PPA that should fix this particular issue. @nuclearsandwich - if you have not yet resolved this issue, I'd like to help you work through testing it.

@nuclearsandwich
Member

@vielmetti it's been a while since I've looked at this, as I've been pulled away to other matters.

It looks like we're still running 1.12.6 on the aarch64 host. I've got the deb of 17.06-ce that I built for the buildfarm, which we could also use to test the resolution.

I think @wjwwood has the buildfarm shift currently but I don't know if anyone has bandwidth to update docker and test without the workaround leading up to beta 3.

@wjwwood
Member

wjwwood commented Aug 23, 2017

I think @wjwwood has the buildfarm shift currently but I don't know if anyone has bandwidth to update docker and test without the workaround leading up to beta 3.

I don't at the moment. But if someone can show it fixes the issue I can work on upgrading the machines in the background (less work than testing it out I think).

@clalancette
Contributor Author

FWIW, I pushed a branch to https://github.com/ros2/ci called failing-docker that restores the problematic Docker code we had before. I kicked off a build using that branch here: http://ci.ros2.org/job/ci_turtlebot-demo_linux-aarch64/96/console . Early indications are that it does not show the problem we previously had, but I haven't spent time investigating why.

@mikaelarguedas
Member

mikaelarguedas commented Aug 23, 2017

I think that the problem is a matter of the number of layers and not of a specific command. Given that we reduced the total number of layers in the Dockerfile, the referenced job will not prove whether it's fixed or not (it is very close though, given that it's now 49 layers deep and the job crashes as soon as we reach 50 layers).
I increased the number of layers on that branch and ran CI with it. It fails with the expected error, so that branch can be used for testing.

Edit: actually it failed at step 46 and not 50 o_O, but it does exhibit the error message described in the original issue:
error creating aufs mount to /var/lib/docker/aufs/mnt/2a5e9a9926e8feb37fc35aedba7e40299e347f493a1e73da38255e9cb1376f2c: invalid argument

@clalancette
Contributor Author

@mikaelarguedas Ah, I didn't realize that we had reduced the layers. Thanks for the update.

@vielmetti

vielmetti commented Aug 23, 2017

The instructions for installing this newly fixed version of Docker 1.12 are as follows:

A build with option #1 suggested in comment #2 is now available in a PPA
(Many thanks to mwhudson!) and is ready for test. To install:

$ sudo add-apt-repository ppa:mwhudson/devirt
$ sudo apt-get update

Then apt-get upgrade or apt-get install docker.io containerd runc should
get the rebuilt versions.

The related bug report for reference is https://bugs.launchpad.net/bugs/1702979

mikaelarguedas changed the title from "Investigate why Docker on aarch64 doesn't work with certain combinations of RUN and ADD commands" to "Investigate limited number of layers on aarch64 docker" on Sep 14, 2017

@mikaelarguedas
Member

@nuclearsandwich do you think that you will have spare cycles in the near future to try a more recent docker on the packet machines? (I don't remember if a newer kernel was needed as well or if newer docker could be enough)

@vielmetti

Two things of note -

The newest Docker installs very easily with the instructions to use get.docker.com or test.docker.com. Uninstall the old docker.io first and the script will grab keys and install the latest docker-ce package on Arm.
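A minimal sketch of that upgrade path, assuming a Debian/Ubuntu host with `sudo` and `curl` available. The helper names `maybe_run` and `upgrade_to_docker_ce`, the `DRY_RUN` variable, and the local file name `get-docker.sh` are all made up for this example. By default the function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print each command (default) or run it when DRY_RUN=0 is set.
maybe_run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_to_docker_ce() {
    # Remove the distro package first so it does not conflict with docker-ce.
    maybe_run sudo apt-get remove -y docker.io
    # Download the convenience script, then run it as root.
    maybe_run curl -fsSL https://get.docker.com -o get-docker.sh
    maybe_run sudo sh get-docker.sh
    # Confirm which version ended up installed.
    maybe_run docker version
}
```

Calling `upgrade_to_docker_ce` prints the four steps; `DRY_RUN=0 upgrade_to_docker_ce` would actually execute them.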

Also, it's much easier to get Arm base images these days that do what you want, because there's support for multi-arch images in the main docker library (i.e., you can say "FROM ubuntu" and it will do the right thing).

@nuclearsandwich
Member

Also, it's much easier to get Arm base images these days that do what you want, because there's support for multi-arch images in the main docker library. (ie. you can say "FROM ubuntu" and it will do the right thing).

whoa do you have a link for more info on this? cc @tfoote @ruffsl

@nuclearsandwich
Member

nuclearsandwich commented Sep 15, 2017

do you think that you will have spare cycles in the near future to try a more recent docker on the packet machines? (I don't remember if a newer kernel was needed as well or if newer docker could be enough)

To answer this question, yes. I can give this a shot Monday or this afternoon. Since we're not wrangling the CI hosts with puppet we can use the script from get.docker.com or the 17.06 deb I built for the buildfarm hosts.

@vielmetti

@nuclearsandwich
Member

nuclearsandwich commented Sep 15, 2017

Updated the CI host to 17.07 with the get.docker.com script @vielmetti linked.

Running 3 builds:
CI Linux ARM64
CI 🐢🤖
CI TURTLEBOT (failing docker branch)
mikael/failing-docker

All three have made it past the Dockerfile building stage, so this looks like it does the thing. I suggest we let the nightlies run over the weekend and, if nothing bad happens, update the other Linux hosts to keep the same version of Docker everywhere.

/cc @sloretz as build farmer.

@mikaelarguedas
Member

@nuclearsandwich Can you please retrigger the turtlebot job to use the mikael/failing-docker branch? The failing-docker branch was not failing the last time we tried and has not been deleted since. @clalancette FYI

@nuclearsandwich
Member

nuclearsandwich commented Sep 15, 2017

Added http://ci.ros2.org/job/ci_turtlebot-demo_linux-aarch64/102/ to the list of builds above.

Edit: er.. http://ci.ros2.org/job/ci_linux-aarch64/548/

@nuclearsandwich
Member

and 💥

23:00:01 Step 46/59 : RUN (apt-get update || true) && apt-get install --no-install-recommends -y python3-dev
23:00:02 error creating aufs mount to /var/lib/docker/aufs/mnt/d7c3a82c1f164dbc8143945e6c0cb559932d36e796204081e59b36456e4f65e6: invalid argument

It seems like aufs is still part of the problem, as it's still the default driver. Since we're on Ubuntu Xenial, is there anything we need to do to try again with the overlay2 storage driver?
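For pulling the failing step out of console output like the above, a small sketch follows. It assumes the log uses the `Step N/M` lines and the `error creating aufs mount` message shown in this thread; `failing_step` is a name invented for this example.

```shell
#!/bin/sh
# Read a docker build log on stdin and print the number of the last
# "Step N/M" seen before the first aufs mount error.
# Prints nothing if the log contains no such error.
failing_step() {
    awk '
        match($0, /Step [0-9]+\/[0-9]+/) {
            s = substr($0, RSTART + 5, RLENGTH - 5)   # e.g. "46/59"
            sub(/\/.*/, "", s)                        # keep only "46"
            step = s
        }
        /error creating aufs mount/ { print step; exit }
    '
}
```

Feeding it a saved Jenkins console log (for example `failing_step < console.txt`, where `console.txt` is a hypothetical file name) makes it easy to compare the failure depth across builds.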

@tfoote
Contributor

tfoote commented Sep 15, 2017

We've been waiting to upgrade the OS before trying overlay2. It looks like it's not too hard to enable: https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/#configure-docker-with-the-overlay-or-overlay2-storage-driver . Maybe the puppet formula had an option too.
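A minimal sketch of switching the storage driver, assuming the daemon reads its configuration from `/etc/docker/daemon.json` and that no other settings live in that file yet (the function overwrites it). The function names are made up for this example. Note that images and containers created under aufs are not visible to the daemon once it runs on a different driver.

```shell
#!/bin/sh
# Print the storage driver the running daemon currently uses (e.g. "aufs").
current_storage_driver() {
    docker info --format '{{.Driver}}'
}

# Write a daemon.json that selects overlay2.
# $1: target path, defaults to /etc/docker/daemon.json (needs root there).
# Assumption: the file holds no other settings, since it is overwritten.
write_overlay2_config() {
    target=${1:-/etc/docker/daemon.json}
    cat > "$target" <<'EOF'
{
  "storage-driver": "overlay2"
}
EOF
}
```

After writing the file the daemon has to be restarted (on a systemd host such as Xenial, `sudo systemctl restart docker`), and `current_storage_driver` should then report the new driver.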

@ruffsl
Member

ruffsl commented Sep 26, 2017

whoa do you have a link for more info on this? cc @tfoote @ruffsl

I touch on this in my previous docker for arm announcement:
https://discourse.ros.org/t/announcing-ros-docker-images-for-arm-and-debian/2467

Specifically, the related ticket for this new functionality is here:
docker-library/official-images#2289

@nuclearsandwich
Member

I touch on this in my previous docker for arm announcement:

Oh nice. I must have missed it with my head down. Sorry for the spurious ping.

@vielmetti

If you're looking to test a Docker installation to see if it will crash when the file system gets too deep, I give you

https://gist.github.com/anonymous/bdafb8e961f55b2533fee8fa5221d186

If you are running an unpatched apt-get install docker.io on Ubuntu 16.04, this will fail at about layer 40.
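For readers who cannot reach the gist, a comparable test can be sketched in a few lines of shell. This is not the gist's content, just one way to build an arbitrarily deep image; the function name `gen_deep_dockerfile`, the `ubuntu:16.04` base image, and the default depth of 60 are assumptions of this example.

```shell
#!/bin/sh
# Print a Dockerfile with N RUN instructions on stdout.
# Each RUN writes a small file so that every layer has real content.
# $1: number of layers to generate (default 60)
gen_deep_dockerfile() {
    n=${1:-60}
    echo "FROM ubuntu:16.04"
    i=1
    while [ "$i" -le "$n" ]; do
        echo "RUN echo layer $i > /layer_$i"
        i=$((i + 1))
    done
}
```

One way to use it: in an empty directory, run `gen_deep_dockerfile 60 > Dockerfile` followed by `docker build .` and watch for the step at which the build stops, if it stops at all.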

@vielmetti

This is still marked as "open", but we should be OK now with anything resembling a modern Docker version.

@nuclearsandwich
Member

This is still marked as "open", but we should be OK now with anything resembling a modern Docker version.

Very good point @vielmetti. In fact we just updated to 18.09.5 last week. Thanks!
