Nvidia-Docker2 won't install in Cloudformation UserData bash script - bash

I have a cloudformation template that I have created in hopes to spin up an ec2 instance with the necessary dependencies (where these dependencies are installed as bash in UserData) to leverage GPU hardware within a docker container. The main dependencies are: 1) nvidia drivers, 2) docker, and 3) nvidia-docker2.
The first two dependencies install as expected and after several moments of running can be verified by 1) nvidia-smi, and docker --version. The third dependency however consistently does not install.
For reference here are the relevant parts of my UserData bash:
# install gpu stuff
apt-get install linux-headers-$(uname -r)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
apt-get update
apt-get -y install cuda-drivers
# install docker on system
curl https://get.docker.com | sh
systemctl start docker && systemctl enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get -y install nvidia-docker2 > /var/log/mason
# add nvidia runtime stuff
# echo "{ \"runtimes\": { \"nvidia\": { \"path\": \"/usr/bin/nvidia-container-runtime\", \"runtimeArgs\": [] } } }" >> /etc/docker/daemon.json
systemctl restart docker
I have tried to pipe the stdout from apt-get -y install nvidia-docker2 to a log file but the logs only show:
Reading package lists...
Building dependency tree...
Reading state information...
and seems to be stuck there.
Other potential helpful bits:
AMI: ubuntu 18.04 image
I will also note that I am able to SSH into the instance and install the apt-get -y install nvidia-docker2 in the command terminal without a hitch (or any user prompt or anything).
Can anyone help me figure out how to trouble shoot this issue or does anyone see any potential problems in what I have shared above? The stdout pipe to file is about the only trick I know to debug such an issue as this. Please let me know if I can update/edit this post to make this issue easier to debug.

Based on the comments.
The issue was caused by not updating ubuntu's repositories after adding nvidia-docker2 repo.
The solution was to run apt-get update after the addition of the repo.

replace:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
with:
distribution = ubuntu18.04

Related

Apt upgrade on WSL is super slow / unusable

I was trying to setup a build environment on WSL. After starting it up and running sudo apt update -y && sudo apt upgrade -y It started doing its thing. But then got super slow (20kb/s) So i deleted the whole WSL and redownloaded it... Same issue. I tried disabling IPV6 in my sysctl that also didnt work.
Any ideas?
Check the local closet mirror for you from here and update sources.list file
sudo sed -i "s/archive.ubuntu.com/us.archive.ubuntu.com/" /etc/apt/sources.list
Found this at source

Docker build fails to fetch packages from archive.ubuntu.com inside bash script used in Dockerfile

Trying to build a docker image with the execution of a pre-requisites installation script inside the Dockerfile fails for fetching packages via apt-get from archive.ubuntu.com.
Using the apt-get command inside the Dockerfile works flawless, despite being behind a corporate proxy, which is setup via the ENV command in the Dockerfile.
Anyway, executing the apt-get command from a bash-script in a terminal inside the resulting docker container or as "postCreateCommand" in a devcontainer.json of Visual Studio Code does work as expected too. But it won't work in my case for the invocation of a bash script from inside a Dockerfile.
It simply will tell:
Starting installation of package iproute2
Reading package lists...
Building dependency tree...
The following additional packages will be installed:
libatm1 libcap2 libcap2-bin libmnl0 libpam-cap libxtables12
Suggested packages:
iproute2-doc
The following NEW packages will be installed:
iproute2 libatm1 libcap2 libcap2-bin libmnl0 libpam-cap libxtables12
0 upgraded, 7 newly installed, 0 to remove and 0 not upgraded.
Need to get 971 kB of archives.
After this operation, 3,287 kB of additional disk space will be used.
Err:1 http://archive.ubuntu.com/ubuntu focal/main amd64 libcap2 amd64 1:2.32-1
Could not resolve 'archive.ubuntu.com'
... more output ...
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/libc/libcap2/libcap2_2.32-1_amd64.deb Could not resolve 'archive.ubuntu.com'
... more output ...
Just for example a snippet of the Dockerfile looks like this:
FROM ubuntu:20.04 as builderImage
USER root
ARG HTTP_PROXY_HOST_IP='http://172.17.0.1'
ARG HTTP_PROXY_HOST_PORT='3128'
ARG HTTP_PROXY_HOST_ADDR=$HTTP_PROXY_HOST_IP':'$HTTP_PROXY_HOST_PORT
ENV http_proxy=$HTTP_PROXY_HOST_ADDR
ENV https_proxy=$http_proxy
ENV HTTP_PROXY=$http_proxy
ENV HTTPS_PROXY=$http_proxy
ENV ftp_proxy=$http_proxy
ENV FTP_PROXY=$http_proxy
# it is always helpful sorting packages alpha-numerically to keep the overview ;)
RUN apt-get update && \
apt-get -y upgrade && \
apt-get -y install --no-install-recommends apt-utils dialog 2>&1 \
&& \
apt-get -y install \
default-jdk \
git \
python3 python3-pip
SHELL ["/bin/bash", "-c"]
ADD ./env-setup.sh .
RUN chmod +x env-setup.sh && ./env-setup.sh
CMD ["bash"]
The minimal version of the environment script env-setup.sh, which is supposed to be invoked by the Dockerfile, would look like this:
#!/bin/bash
packageCommand="apt-get";
sudo $packageCommand update;
packageInstallCommand="$packageCommand install";
package="iproute2"
packageInstallCommand+=" -y";
sudo $packageInstallCommand $package;
Of course the usage of variables is down to making use of a list for the packages to be installed and other aspects.
Hopefully that has covered everything essential to the question:
Why is the execution of apt-get working with a RUN and as well running the bash script inside the container after creating, but not from the very same bash script while building the image from a Dockerfile?
I was hoping to find the answer with the help of an extensive web-search, but unfortunately I was only able to find anything but an answer to this case.
As pointed out in the comment section underneath the question:
using sudo to launch the command, wiping out all the current vars set in the current environment, more specifically your proxy settings
So that is the case.
The solution is either to remove sudo from the bash script and invoke the script as root inside the Dockerfile.
Or, using sudo will work with ENV variables, just apply sudo -E.

How to rebuild all Debian packages of a system with specific flag?

I would like to rebuild/recompile all Debian packages of a machine with specific flags.
How can I do that with less command as possible?
I have found that https://debian-administration.org/article/20/Rebuilding_Debian_packages but it does not explain how to do that for all the packages installed on a system.
You can write a script that does something like this:
for each $pkg in dpkg-query -W -f '${status} ${package}\n' | sed -n 's/^install ok installed //p':
run apt-get source $pkg
run apt-get build-dep $pkg
cd $pkg-version/
run DEB_CPPFLAGS_SET="-I/foo/bar/baz" DEB_CFLAGS_SET="-g -O3" DEB_LDFLAGS_SET="-L/fruzzel/frazzel/" dpkg-buildpackage
install package with dpkg -i deb-file
cd ..
This will go through all of your installed packages and generate .deb files for each of them. Probably there are some edge cases etc. that will have to be handled. You could also leave out packages that are not built from C code etc.
Info taken from these questions:
https://unix.stackexchange.com/questions/184812/how-to-update-all-debian-packages-from-source-code
How to override dpkg-buildflags CFLAGS?
Try this approach:
dpkg --get-selections > selections
sudo dpkg --clear-selections
sudo dpkg --set-selections < selections
sudo apt-get --reinstall dselect-upgrade
Source:
https://www.linuxquestions.org/questions/linux-software-2/force-apt-get-to-redownload-and-reinstall-dependencies-as-well-873038/

How to modify jenkins dashboard?

I have installed jenkins with the following steps.
wget -q -O - https://pkg.jenkins.io/debian/jenkins-ci.org.key | sudo apt-key add -
echo deb http://pkg.jenkins.io/debian-stable binary/ | sudo tee /etc/apt/sources.list.d/jenkins.list
sudo apt-get install jenkins
Now i would like to make small change like inserting one word to the front-end of the jenkins UI(dashboard)
Can anyone help me how to proceed?
Thanks in advance.

Cannot (apt-get) install packages inside docker

I installed ubuntu 14.04 virtual machine and run docker(1.11.2). I try to build sample image (here).
docker file :
FROM java:8
# Install maven
RUN apt-get update
RUN apt-get install -y maven
....
I get following error:
Step 3: RUN apt-get update
--> Using cache
--->64345sdd332
Step 4: RUN apt-get install -y maven
---> Running in a6c1d5d54b7a
Reading package lists...
Reading dependency tree...
Reading state information...
E: Unable to locate package maven
INFO[0029] The command [/bin/sh -c apt-get install -y maven] returned a non-zero code:100
following solutions I have tried, but no success.
restarted docker here
run as apt-get -qq -y install curl here :same error :(
how can i view detailed error message ?
a
any way to fix the issue?
you may need to update os inside docker before
try to run apt-get update first, then apt-get install xxx
The cached result of the apt-get update may be very stale. Redesign the package pull according to the Docker best practices:
FROM java:8
# Install maven
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install -y maven \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
Based on similar issues I had, you want to both look at possible network issues and possible image related issues.
Network issues : you are already looking at proxy related stuff. Make sure also the iptables setup done automatically by docker has not been messed up unintentionnaly by yourself or another application. Typically, if another docker container runs with a net=host option, this can cause trouble.
Image issues : The distro you are running on in your container is not Ubuntu 14.04 but the one that java:8 was built from. If you took the java image from official library on docker hub, you have a hierarchy of images coming initially from Debian jessie. You might want to look the different Dockerfile in this hierarchy to find out where the repo setup is not the one you are looking at.
For both situations, to debug this, I recommand you run inside the latest image a shell to look the actual network and repo situation in your image. In your case
docker run -ti --rm 64345sdd332 /bin/bash
gives you a shell just before running your install maven command.
I am currently working behind proxy. it failed to download some dependency. for that you have to mention proxy configuration in docker file. ref
but, now I facing difficulty to run "mvn", "dependency:resolve" due to the proxy, maven itself block to download some dependency and build failed.
thanks buddies for your great support !
Execute 'apt-get update' and 'apt-get install' in a single RUN instruction. This is done to make sure that the latest packages will be installed. If 'apt-get install' were in a separate RUN instruction, then it would reuse a layer added by 'apt-get update', which could have been created a long time ago.
RUN apt-get update && \
apt-get install -y <tool..eg: maven>
Note: RUN instructions build your image by adding layers on top of the initial image.

Resources