I'm looking for ways to optimize the build time of our singularity HPC containers. I know that I can save some time by building them layer by layer. But still, there is room for optimization.
What I'm interested in is using/caching whatever makes sense on the host system.
CCache for C++ build artifact caching
git repo cloning
APT package downloads
I did some experiments but haven't succeeded on any of these points.
What I found so far:
CCache
I install ccache in the container and instruct the build system to use it. I know that because I'm running singularity build with sudo, the cache would be under /root. But after running the build, /root/.ccache is empty. I verified the generated CMake build files, and they definitely use ccache.
I even created a test recipe containing a %post
touch "$HOME/.ccache/test"
but the test file did not appear anywhere on the host system (not in /root and not in my user's home). Does the build step mount a container-backed directory to /root instead of the host's root dir?
Is there something more needed to be done to utilize ccache?
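(For reference, pointing the CMake-based build at ccache in %post boils down to something like the following - a sketch, the paths and project names are only illustrative:)
apt-get update && apt-get install -y ccache cmake
export CCACHE_DIR=/root/.ccache
cmake -S /opt/src/myproject -B /opt/build -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
cmake --build /opt/build -j"$(nproc)"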
Git
People suggest running e.g. git-cache-http-server (https://stackoverflow.com/a/43643622/1076564) and using git config --global url."http://gitcache:1234/".insteadOf https://.
Since singularity can read parts of the host filesystem, I think there could even be a way to have it working without a proxy program. However, if the host git repos are not inside $HOME or /tmp, how can singularity access them during build? singularity build has no --bind flag to specify additional mount directories. And using the %files section in the recipe sounds inefficient - copying everything each time the build is run.
APT
People suggest using e.g. squid-deb-proxy (https://gist.github.com/dergachev/8441335). Again, since singularity is able to read host filesystem files, I'd like to just utilize the host's /var/cache/apt. But /var is not mounted into the container by default. So the same question again - how do I mount /var/cache/apt during container build time? And is it a good idea overall? Wouldn't it damage the APT cache of the host, given that both host and container are based on the same version of Ubuntu and the same architecture?
Or does singularity do some clever APT caching itself? I've just noticed it downloaded 420 MB of packages in 25 seconds, which is possible on my connection, but not very probable given the standard speed of ubuntu mirrors.
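(For reference, the proxy approach boils down to pointing APT inside the container at the host's proxy - a sketch, assuming the proxy listens on port 8000 and <host-ip> is a placeholder:)
# in the recipe's %post
echo 'Acquire::http::Proxy "http://<host-ip>:8000";' > /etc/apt/apt.conf.d/01proxy
apt-get update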
Edit: I've created an issue on singularity repo: https://github.com/hpcng/singularity/issues/5352 .
As far as I know, there is no mechanism for caching the Singularity build when building from a definition file. You can cache the download of the base image, but that's it.
There is a GitHub issue about this, where one of the main developers of Singularity gave the following reply:
You can build a Singularity container from an existing container on disk. So you could build your base container and save it and then modify the def file to build from the existing container to save time while you prototype.
But since Singularity does not create layers there is really no way to implement this as Docker does.
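For example, the prototyping workflow described there could look like this (a sketch; base.def, base.sif and app.def are placeholder names):
# build the expensive base once
sudo singularity build base.sif base.def
app.def then bootstraps from the local image instead of rebuilding everything:
Bootstrap: localimage
From: base.sif
# iterate quickly on the cheap part only
sudo singularity build app.sif app.def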
One point about your question:
I know that I can save some time by building them layer by layer
Singularity does not have a concept of layers, so this does not apply here. Docker uses layers, and those are cached.
The workflow I typically follow when building Singularity images is to first create a Docker image from a Dockerfile and then convert that to a Singularity image. The Docker build step has caching, so that might be useful to you.
# Build Docker image
docker build --tag my_image:latest .
# Convert to Singularity format
sudo singularity build my_image.sif docker-daemon://my_image:latest
This sounds like unnecessary optimization. As mentioned, you can build from a Docker image which can take advantage of some layer caching. If you plan on a lot of iteration, you can either do that to a base docker container or create the singularity image as a sandbox and write it out to a read-only SIF once it is working as you like it. If you are making frequent code changes, you can mount the source in when running the image until it is finalized.
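For example, the sandbox route might look like this (a sketch with placeholder names):
# build a writable sandbox directory for iterating
sudo singularity build --sandbox myapp_sandbox/ myapp.def
# poke at it interactively while developing
sudo singularity shell --writable myapp_sandbox/
# freeze it into a read-only SIF once it works
sudo singularity build myapp.sif myapp_sandbox/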
Singularity does some caching on the host OS, by default to $HOME/.singularity/cache (generally in /root since most of the time it's a sudo singularity build ...). You can see more detail using singularity --verbose or singularity --debug. I believe this is mostly for caching images / layers from other formats, but I've not looked too in depth at it.
Building does not mount the host filesystem and, to the best of my knowledge, cannot be made to do so. This is by design, for reproducibility. You could copy files (e.g., the apt cache) into the image in the %files block, but that seems very hackish, it is ultimately questionable whether it would be any faster, and it opens the possibility of some strange bugs.
The %post steps are run in isolation within the container and nothing is mounted in, so again it won't be able to take advantage of any caching on the host OS.
The GitHub issue linked above shows there is a way to utilize some caches on the host. As stated by one of the Singularity developers, the host's /tmp is mounted during the %post phase of the build, and it is not possible to mount any other directory.
So utilizing the host's caches is all about making the data accessible from /tmp.
CCache
Before running the build command, mount the ccache directory into /tmp:
sudo mkdir /tmp/ccache
sudo mount --bind /root/.ccache /tmp/ccache
Then add the following line to your recipe's %post and you're done:
export CCACHE_DIR=/tmp/ccache
I'm not sure how sharing the cache with your user and not root would work, but I assume the documentation on sharing caches could help (especially setting umask for ccache).
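A possible sketch, assuming the shared-cache recipe from the ccache documentation carries over (untested; the group name is a placeholder):
# on the host: make the root-owned cache group-accessible to your user's group
sudo chgrp -R mygroup /root/.ccache
sudo chmod -R g+rwX /root/.ccache
# in the recipe's %post: have ccache create group-writable files
export CCACHE_UMASK=002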
APT
On the host, bind the apt cache dir:
sudo mkdir /tmp/apt
sudo mount --bind /var/cache/apt /tmp/apt
In your %setup or %post, create container file /etc/apt/apt.conf.d/singularity-cache.conf with the following contents:
Dir{Cache /tmp/apt}
Dir::Cache /tmp/apt;
Git
The git-cache-http-server should work seamlessly - host ports should be accessible during build. I just did not use it in the end as it doesn't support SSH auth. Another way would be to manually clone all repos to /tmp on the host and then clone them in the build process with the --reference flag, which should speed up the clone.
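A sketch of the --reference variant (repository names and paths are placeholders):
# on the host, before the build: put a mirror somewhere visible to the build
git clone --mirror https://example.com/myrepo.git /tmp/gitcache/myrepo.git
# in %post: clone normally, borrowing objects from the local mirror
git clone --reference /tmp/gitcache/myrepo.git https://example.com/myrepo.git /opt/myrepo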
Related
I am sorry for taking up your time.
I have a local docker setup and I want to copy files from my local host to my container.
But the thing is that I need a command that I can use WHILE i am inside the container.
To explain the situation further: I executed "docker exec -it CONTAINERNAME bash" to enter my container,
and now I am on /var/www/html
and I need to find a way to copy a file/folder from my local environment into that container.
Reason: I am currently writing a dockerfile which automates the process of setting things up. I need that very specific command because a Dockerfile RUN-command can only be executed while inside the container.
What I tried:
"docker cp" is a good command to use when I am outside the container but it doesn't work while in the container.
"DOCKERFILE COPY" might do the trick but I need a general shell command to double check if it really does what it is supposed to do. I must be able to reproduce the same process of my Dockerfile via manually executing the commands one by one.
Once again, I apologize for my inability to solve this problem by myself. My inexperience has caused me nothing but trouble.
Edit: I am using a Win10 64bit OS with dual monitor setup and a lefthanded mouse. My keyboard, albeit old, should possess all the necessary keys to replicate any essential keyboard-shortcuts if required. All my drivers are installed and updated.
When you build an image, you need to put into it everything your container needs for normal operation. You shouldn't copy files from the host once your image is built. You might use volumes as common storage for both the host and the container, but I don't think this is your case.
Since it is not totally clear what you are doing, I'd suggest preparing all the data you need and putting it within the Docker context, then building the image. You may also find docker-compose useful, as it at least helps to separately define the context and the path to your Dockerfile if needed.
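For completeness, both options look roughly like this when run from the host rather than from inside the container (a sketch; names and paths are placeholders):
# one-off copy from the host into a running container
docker cp ./local-file.txt CONTAINERNAME:/var/www/html/
# or share a host directory with the container as a volume
docker run -d --name CONTAINERNAME -v "$PWD/app:/var/www/html" myimage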
Is it somehow possible to build images without having Docker installed? On a Maven build of my project I'd like to produce a Docker image, but I don't want to force others to install Docker on their machines.
I can think of some VirtualBox image with Docker installed, but that is kind of a heavy solution. Is there some way to build the image with some Maven plugin only, some Go code, or an already prepared VirtualBox image for exactly this purpose?
It boils down to the question of how to use Docker without forcing users to install anything, either just for builds or even for running Docker images.
UPDATE
There are some, not really up-to-date, Maven plugins for virtual machine provisioning with Vagrant or with VirtualBox. I have also found an article about building Docker images without Docker using Bazel.
So far I see two options: either I can somehow build the images only, or I can run some VM with a Docker daemon inside (which could be used not only for builds but also for integration tests).
We can create a Docker image without Docker being installed.
Jib Maven and Gradle Plugins
Google has an open source tool called Jib that is relatively new, but
quite interesting for a number of reasons. Probably the most interesting
thing is that you don’t need docker to run it - it builds the image using
the same standard output as you get from docker build but doesn’t use
docker unless you ask it to - so it works in environments where docker is
not installed (not uncommon in build servers). You also don’t need a
Dockerfile (it would be ignored anyway), or anything in your pom.xml to
get an image built in Maven (Gradle would require you to at least install
the plugin in build.gradle).
Another interesting feature of Jib is that it is opinionated about
layers, and it optimizes them in a slightly different way than the multi-
layer Dockerfile created above. Just like in the fat jar, Jib separates
local application resources from dependencies, but it goes a step further
and also puts snapshot dependencies into a separate layer, since they are
more likely to change. There are configuration options for customizing the
layout further.
Please refer to this link: https://cloud.google.com/blog/products/gcp/introducing-jib-build-java-docker-images-better
For an example with Spring Boot, refer to https://spring.io/blog/2018/11/08/spring-boot-in-a-container
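For instance, the Jib quickstart suggests Maven can build and push an image without any change to the pom at all (a sketch; the plugin version and image name are placeholders):
mvn compile com.google.cloud.tools:jib-maven-plugin:3.3.2:build -Dimage=registry.example.com/my-app:latest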
Have a look at the following tools:
Fabric8-maven-plugin - http://maven.fabric8.io/ - good maven integration, uses a remote docker (openshift) cluster for the builds.
Buildah - https://github.com/containers/buildah - builds without a docker daemon but does have other pre-requisites.
Fabric8-maven-plugin
The fabric8-maven-plugin brings your Java applications on to Kubernetes and OpenShift. It provides a tight integration into Maven and benefits from the build configuration already provided. This plugin focuses on two tasks: building Docker images and creating Kubernetes and OpenShift resource descriptors.
fabric8-maven-plugin seems particularly appropriate if you have a Kubernetes / Openshift cluster available. It uses the Openshift APIs to build and optionally deploy an image directly to your cluster.
I was able to build and deploy their zero-config spring-boot example extremely quickly, no Dockerfile necessary, just write your application code and it takes care of all the boilerplate.
Assuming you have the basic setup to connect to OpenShift from your desktop already, it will package up the project .jar in a container and start it on Openshift. The minimum maven configuration is to add the plugin to your pom.xml build/plugins section:
<plugin>
  <groupId>io.fabric8</groupId>
  <artifactId>fabric8-maven-plugin</artifactId>
  <version>3.5.41</version>
</plugin>
then build+deploy using
$ mvn fabric8:deploy
If you require more control and prefer to manage your own Dockerfile, it can handle this too, this is shown in samples/secret-config.
Buildah
Buildah is a tool that facilitates building Open Container Initiative (OCI) container images. The package provides a command line tool that can be used to:
create a working container, either from scratch or using an image as a starting point
create an image, either from a working container or via the instructions in a Dockerfile
build images in either the OCI image format or the traditional upstream docker image format
mount a working container's root filesystem for manipulation
unmount a working container's root filesystem
use the updated contents of a container's root filesystem as a filesystem layer to create a new image
delete a working container or an image
rename a local container
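As a quick sketch of what this looks like in practice (assuming buildah is installed; image names are placeholders):
# build from an existing Dockerfile without a Docker daemon ("bud" = build-using-dockerfile)
buildah bud -t myimage:latest .
# push the result to a registry
buildah push myimage:latest docker://registry.example.com/myimage:latest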
I don't want to force others to install docker on their machines.
If by "without Docker installed" you mean without having to install Docker locally on every machine running the build, you can leverage the Docker Engine API which allow you to call a Docker Daemon from a distant host.
The Docker Engine API is a RESTful API accessed by an HTTP client such
as wget or curl, or the HTTP library which is part of most modern
programming languages.
For example, the Fabric8 Docker Maven Plugin does just that using the DOCKER_HOST parameter. You'll need a recent Docker version and you'll have to configure at least one Docker daemon properly so it can securely accept remote requests (there are lots of resources on this subject, such as the official documentation). From then on, your Docker build can be done remotely without having to install Docker locally.
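With the standard Docker CLI, the same idea is essentially the following (a sketch; the host name and port are placeholders, and the remote daemon must already be set up for TLS as described in those resources):
export DOCKER_HOST=tcp://build-daemon.example.com:2376
export DOCKER_TLS_VERIFY=1
docker build -t my_image:latest .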
Google has released Kaniko for this purpose. It should be run as a container, whether in Kubernetes, Docker or gVisor.
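A minimal sketch of invoking the Kaniko executor under Docker (image and registry names are placeholders; registry credentials still need to be provided for the push):
docker run --rm -v "$PWD":/workspace \
    gcr.io/kaniko-project/executor:latest \
    --dockerfile=/workspace/Dockerfile --context=dir:///workspace \
    --destination=registry.example.com/my_image:latest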
I was running into the same problems and did not find any solution, so I developed odagrun. It is a runner for GitLab with an integrated registry API that can update DockerHub, Microbadger, etc.
It is open source and has an MIT license.
It is ideal for creating a Docker image on the fly, without needing a Docker daemon, a root account, or any base image at all (image: scratch will do). It is currently still in development, but I use it every day.
Requirements
project repository on Gitlab
an OpenShift cluster (an openshift-online-starter will do for most medium/small projects)
An extract of how the Docker image for this project was created:
# create and push image to ImageStream:
build_rootfs:
  image: centos
  stage: build-image
  dependencies:
    - build
  before_script:
    - mkdir -pv rootfs
    - cp -v output/oc-* rootfs/
    - mkdir -pv rootfs/etc/pki/tls/certs
    - mkdir -pv rootfs/bin-runner
    - cp -v /etc/pki/tls/certs/ca-bundle.crt rootfs/etc/pki/tls/certs/ca-bundle.crt
    - chmod -Rv 777 rootfs
  tags:
    - oc-runner-shared
  script:
    - registry_push --rootfs --name=test-$CI_PIPELINE_ID --ISR --config
I have a Python script with a Windows .exe dependency, which in return relies on a (closed-source) Windows DLL. The Python script runs just fine in Ubuntu via a call to Wine.
Is it possible (and practical) to run this on AWS Lambda?
What would be involved in preparing the code package?
Update: the Lambda container image feature supports images up to 10 GB. I haven't tried it, but I think that would be a viable approach, and it wouldn't require the hacks I did below to reduce the Wine build size.
TL;DR;
Is it Possible? Yes.
Is it practical? The approach I tried is not. A better approach might be to try and put wine into different lambda layers or a custom execution environment.
Will it work for you? It depends, deployment package size and disk space are the limiting factors.
Old, somewhat hacky method to fit wine into the regular lambda environment:
I compiled a custom wine with minimal dependencies for lambda, compressed it and then put it onto S3.
Then, in the lambda at runtime, I downloaded the archive, extracted it to /tmp and ran it with a custom empty wine prefix.
My test windows executable was 64bit curl.exe.
1. Compile Wine for Lambda
From https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html, I first tried amzn-ami-hvm-2018.03.0.20181129-x86_64-gp2, but it had an older compilation environment and wouldn't configure.
With AMI amzn2-ami-hvm-2.0.20190313-x86_64-gp2 on a t3.2xlarge ec2, I was able to configure and compile. These are the commands I used, references aws-compile and building-wine:
> sudo yum groupinstall "Development Tools"
> mkdir -p ~/wine-dirs/wine-source
> git clone git://source.winehq.org/git/wine.git ~/wine-dirs/wine-source
> cd ~/wine-dirs/wine-source
> ./configure --enable-win64 --without-x --without-freetype --prefix /opt/wine
> make -j8
> sudo mkdir -p /opt/wine
> sudo chown ec2-user.ec2-user /opt/wine/
> make install
> cd /opt/
> tar zcvf ~/wine-64.tar.gz wine/
This was only a 64-bit build. It also had almost no other optional wine dependencies.
2. Reduce the size of the Wine build further
I removed a lot of optional dependencies from the wine build at compilation time, but it was still too big. /tmp is limited to 500MB.
I deleted files in the package subdirectories, including what looked like optional libs, until I got it down to around 300MB uncompressed.
I verified that wine would still run curl.exe after deleting files from the build.
3. Compress it
I created a tar.bz2 of wine and curl with default bz2 options, it ended up around 80MB. The compressed and extracted files together required about 390MB.
That way there is enough room to both download the archive and extract it to /tmp inside the lambda.
> du -h .
290M ./wine/lib64/wine
292M ./wine/lib64
276K ./wine/share/wine
8.0K ./wine/share/applications
288K ./wine/share
5.0M ./wine/curl-7.66.0-win64-mingw/bin
5.0M ./wine/curl-7.66.0-win64-mingw
12M ./wine/bin
308M ./wine
390M .
> ls
wine wine.tar.bz2
4. Upload wine.tar.bz2 to S3
Create an S3 bucket and upload the wine.tar.bz2 file to it.
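For example, with the AWS CLI (the bucket name is a placeholder):
> aws s3 mb s3://my-wine-lambda-bucket
> aws s3 cp wine.tar.bz2 s3://my-wine-lambda-bucket/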
5. Create the Lambda
Create an AWS Lambda using the python 3.7 runtime. While this uses a different underlying AMI than what wine was built on above, it still worked.
In the lambda execution role, grant access to the S3 bucket.
RAM: 1024MB. I chose this because lambda CPU power scales with the memory.
Timeout: 1 min
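The console works fine for this; a rough CLI equivalent would be (a sketch - the role ARN, handler name and deployment package are placeholders):
> aws lambda create-function --function-name wine-test --runtime python3.7 \
    --handler lambda_function.handler --memory-size 1024 --timeout 60 \
    --role arn:aws:iam::123456789012:role/wine-lambda-role --zip-file fileb://function.zip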
6. Lambda code:
I needed to follow the advice from this question and answer to change the wine prefix inside the lambda. I also turned off the display as it suggested.
e.g.:
import os
import subprocess

def handler(event, context):
    # ... download wine.tar.bz2 from S3 to /tmp (e.g. with boto3), then work from /tmp
    os.chdir("/tmp")
    subprocess.call(["tar", "-jxvf", "./wine.tar.bz2"])
    os.environ['DISPLAY'] = ''
    os.environ['WINEARCH'] = 'win64'
    os.environ['WINEPREFIX'] = '/tmp/wineprefix'
    subprocess.call(["./wine/bin/wine64", "./wine/curl-7.66.0-win64-mingw/bin/curl.exe", "http://www.stackoverflow.com"])
Success!
Windows 10. I have in folder just:
app (directory with many files)
Dockerfile (simpliest docker file)
I run "docker build ." and it just hangs.
If I remove the "app" directory, the build runs OK.
In docker file just one line:
FROM node
I didn't find any issues like that. It feels like it tries to scan the directory or something.
Any advice?
UPD: It seems that I should use .dockerignore https://docs.docker.com/engine/reference/builder/#/dockerignore-file
When you run docker build ..., the Docker client sends the context (the recursive contents of the directory) via REST to the Docker daemon for building. If that context is large, this could take some time (depending on a variety of factors: whether your daemon is local or remote, the platform, etc.).
How long are you giving it to hang before giving up? It could be that it's still just working, or that the context was so large that the client / daemon experienced an issue. Checking the (client / daemon) logs would help debug that.
And yes, a .dockerignore file (basically a .gitignore but for Docker context) is probably what you're looking for, unless you need the contents of the app directory during your build.
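For example, a minimal .dockerignore next to the Dockerfile could be just (assuming the app directory really isn't needed during the build):
# .dockerignore
app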
Your Dockerfile should be put in a directory that only includes its build context. For example, if you are building a spring-boot app, you can put the Dockerfile right under /app, as shown in this official docker sample.
Docker's documentation:
In most cases, it’s best to start with an empty directory as context and keep your Dockerfile in that directory. Add only the files needed for building the Dockerfile.
Warning: Do not use your root directory, /, as the PATH as it causes the build to transfer the entire contents of your hard drive to the Docker daemon.
I've seen that simple Docker examples put the Dockerfile in the root directory, but for complicated examples like the one I posted above, the Dockerfile is put only in its relevant directory. You can dig through the dockersamples repository and find your case.
I'm trying to create my own docker image in a ubuntu-14 system.
My docker file is like the following:
FROM scratch
RUN /bin/bash -c 'echo "hello"'
I got the error message when I run docker build .:
exec: "/bin/sh": stat /bin/sh: no such file or directory
I guess it is because /bin/sh doesn't exist in the base image "scratch". How should I solve this problem?
Docker is basically a containerising tool that helps to build systems and bring them up and running in a flash without a lot of resource utilisation as compared to Virtual Machines.
A Docker image is basically layered. If you read a Dockerfile, each command in that file leads to the creation of a new layer, and the final stack of layers after all the commands in the Dockerfile have been executed is what your container is actually started from.
The images available on Docker Hub are specially optimised for this sort of environment and are very easy to set up and build from. If you are building a container right from scratch, i.e. without any base image, then what you basically have is an empty filesystem. An empty image does not contain /bin/bash (or /bin/sh) at all, and hence it won't work for you.
The Docker container does not use any specifics from your underlying OS. Multiple docker containers will make use of the same underlying kernel in an effective manner. That's it. Nothing else.
( There is however a concept of volumes wherein the container shares a specific volume on the local underlying system )
So if you want to use /bin/bash, you need a base image that sets up the nitty-gritty of this command for your container, and then you can successfully execute it.
However, it is recommended that you use official Docker images for say Ubuntu and then install your custom stuff on top of it. The official images are right from the makers and are highly optimised for this environment.
The scratch base image is empty and does not contain /bin/sh or /bin/bash, so you should change to:
FROM ubuntu:14.04
RUN /bin/sh -c 'echo "hello"'