unshare/isolate mount namespace - linux-kernel

I'm trying to set up a Linux container with an isolated mount namespace using the unshare tool from the util-linux package:
% sudo unshare -m -f /bin/bash
So I'm expecting that bash will be launched in a new namespace where the mounts, i.e. the filesystems, are completely isolated from the host's. However, I can still modify the host FS (create/delete files on it). What am I doing wrong here?

A mount namespace only creates a separate mount tree by copying the parent tree.
You still have to remount the filesystems read-only, unmount them, mount a tmpfs over them, or pivot_root into a clean tree to prevent access. Switching to an unmapped user via user namespaces can help to some extent, but it won't prevent access to world-readable/writable files.
If you need to set up more complex namespace environments - containers, basically - you can use firejail or runc to automate those tasks based on configuration files. systemd-nspawn provides a feature set somewhere between using the primitives directly, as unshare does, and full container runtimes.
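For illustration, a minimal sketch of hiding part of the host tree with unshare alone (with the util-linux defaults, mounts made in the new namespace stay private to it):
sudo unshare -m -f /bin/bash
# Inside the namespace: hide the host's /home behind an empty tmpfs
mount -t tmpfs tmpfs /home
ls /home    # empty in the namespace, untouched on the host
# For real isolation you would instead pivot_root/chroot into a prepared root tree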

I assume the mount namespace is isolated because mounting/unmounting inside the namespace has no impact on the host FS. So I think being able to modify the host FS is a separate issue, probably related to user namespaces, but I'm not fully sure about this.

Related

How to utilize host caches in a singularity build?

I'm looking for ways to optimize the build time of our singularity HPC containers. I know that I can save some time by building them layer by layer. But still, there is room for optimization.
What I'm interested in is using/caching whatever makes sense on the host system.
CCache for C++ build artifact caching
git repo cloning
APT package downloads
I did some experiments but haven't succeeded on any of these points.
What I found so far:
CCache
I install ccache in the container and instruct the build system to use it. I know that because I'm running singularity build with sudo, the cache would be under /root. But after running the build, /root/.ccache is empty. I verified the generated CMake build files, and they definitely use ccache.
I even created a test recipe containing a %post
touch "$HOME/.ccache/test"
but the test file did not appear anywhere on the host system (not in /root and not in my user's home). Does the build step mount a container-backed directory to /root instead of the host's root dir?
Is there something more needed to be done to utilize ccache?
Git
People suggest running e.g. git-cache-http-server (https://stackoverflow.com/a/43643622/1076564) and using git config --global url."http://gitcache:1234/".insteadOf https://.
Since singularity can read parts of the host filesystem, I think there could even be a way to have it working without a proxy program. However, if the host git repos are not inside $HOME or /tmp, how can singularity access them during the build? singularity build has no --bind flag for specifying additional mount directories. And using the %files section in the recipe sounds inefficient - it would copy everything each time the build is run.
APT
People suggest using e.g. squid-deb-proxy (https://gist.github.com/dergachev/8441335). Again, since singularity is able to read host filesystem files, I'd like to just utilize the host's /var/cache/apt. But /var is not mounted into the container by default. So the same question again - how do I mount /var/cache/apt during container build time? And is it a good idea overall? Wouldn't it damage the host's APT cache, given that both host and container are based on the same Ubuntu version and architecture?
Or does singularity do some clever APT caching itself? I've just noticed it downloaded 420 MB of packages in 25 seconds, which is possible on my connection, but not very probable given the standard speed of ubuntu mirrors.
Edit: I've created an issue on the singularity repo: https://github.com/hpcng/singularity/issues/5352.
As far as I know, there is no mechanism for caching a singularity build when building from a definition file. You can cache the download of the base image, but that's it.
There is a GitHub issue about this, where one of the main developers of Singularity gave the following reply:
You can build a Singularity container from an existing container on disk. So you could build your base container and save it and then modify the def file to build from the existing container to save time while you prototype.
But since Singularity does not create layers there is really no way to implement this as Docker does.
One point about your question:
I know that I can save some time by building them layer by layer
Singularity does not have a concept of layers, so this does not apply here. Docker uses layers, and those are cached.
The workflow I typically follow when building Singularity images is to first create a Docker image from a Dockerfile and then convert that to a Singularity image. The Docker build step has caching, so that might be useful to you.
# Build Docker image
docker build --tag my_image:latest .
# Convert to Singularity format
sudo singularity build my_image.sif docker-daemon://my_image:latest
This sounds like unnecessary optimization. As mentioned, you can build from a Docker image, which can take advantage of some layer caching. If you plan on a lot of iteration, you can either do that with a base docker container or create the singularity image as a sandbox and write it out to a read-only SIF once it works as you like. If you are making frequent code changes, you can mount the source in when running the image until it is finalized.
Singularity does some caching on the host OS, by default to $HOME/.singularity/cache (generally in /root since most of the time it's a sudo singularity build ...). You can see more detail using singularity --verbose or singularity --debug. I believe this is mostly for caching images / layers from other formats, but I've not looked too in depth at it.
Building does not mount the host filesystem and, to the best of my knowledge, cannot be made to do so. This is by design, for reproducibility. You could copy files (e.g., the apt cache) into the image in the %files block, but that seems very hackish: it's questionable whether it would be any faster, and it opens the possibility of some strange bugs.
The %post steps are built in isolation within the container and nothing is mounted in, so again it won't be able to take advantage of any caching on the host OS.
The issue shows there is a way to utilize some caches on the host. As stated by one of the singularity developers, the host's /tmp is mounted during the %post phase of the build, and it is not possible to mount any other directory.
So utilizing the host's caches is all about making the data accessible from /tmp.
CCache
Before running the build command, mount the ccache directory into /tmp:
sudo mkdir /tmp/ccache
sudo mount --bind /root/.ccache /tmp/ccache
Then add the following line to your recipe's %post and you're done:
export CCACHE_DIR=/tmp/ccache
I'm not sure how sharing the cache with your user and not root would work, but I assume the documentation on sharing caches could help (especially setting umask for ccache).
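A sketch of that setup (assuming the cache directory is group-owned by a shared group, per ccache's cache-sharing documentation) would be to relax ccache's umask in %post alongside CCACHE_DIR:
export CCACHE_DIR=/tmp/ccache
export CCACHE_UMASK=002    # keep newly written cache files group-writable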
APT
On the host, bind the apt cache dir:
sudo mkdir /tmp/apt
sudo mount --bind /var/cache/apt /tmp/apt
In your %setup or %post, create container file /etc/apt/apt.conf.d/singularity-cache.conf with the following contents:
Dir{Cache /tmp/apt}
Dir::Cache /tmp/apt;
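A sketch of creating that file from %setup, which runs on the host with $SINGULARITY_ROOTFS pointing at the container's root tree:
mkdir -p ${SINGULARITY_ROOTFS}/etc/apt/apt.conf.d
printf '%s\n' 'Dir{Cache /tmp/apt}' 'Dir::Cache /tmp/apt;' > ${SINGULARITY_ROOTFS}/etc/apt/apt.conf.d/singularity-cache.conf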
Git
The git-cache-http-server should work seamlessly - host ports should be accessible during build. I just did not use it in the end as it doesn't support SSH auth. Another way would be to manually clone all repos to /tmp and then clone in the build process with the --reference flag which should speed up the clone.
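A sketch of the --reference approach (repository URL and paths are placeholders):
# On the host, before the build: keep a mirror of the repo under /tmp
git clone --mirror https://example.com/some/repo.git /tmp/gitcache/repo.git
# In %post: clone as usual, borrowing objects from the host-side mirror via /tmp
git clone --reference /tmp/gitcache/repo.git https://example.com/some/repo.git /opt/repo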

Mount container volume on (windows) host without copying

I have a container that I start like
docker run -it --mount type=bind,source=/path/to/my/data,target=/staging -v myvol:/myvol buildandoid bash -l
It has two mounts, one bind mount that I use to get data into the container, and one named volume that I use to persist data. The container is used as a reproducible Android (AOSP) build environment, so not your typical web service.
I would like to access the files on myvol from the Windows host. If I use an absolute path for the mount, e.g. -v /c/some/path:/myvol, I can do that, but I believe docker creates copies of all the files and keeps them in sync. I really want to avoid creating these files on the windows side (for space reasons, as it is several GB, and performance reasons, since NTFS doesn't seem to handle many little files well).
Can I somehow "mount" a container directory or a named volume on the host? So the exact reverse of a bind mount. I think alternatively I could install samba or sshd in the container and use that, but maybe there is something built into docker / VirtualBox to achieve this.
Use bind mounts.
https://docs.docker.com/engine/admin/volumes/bind-mounts/
By contrast, when you use a volume, a new directory is created within Docker’s storage directory on the host machine, and Docker manages that directory’s contents.
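If you accept the trade-offs of bind mounts mentioned in the question, the run could look like this (the Windows-side path is a placeholder):
docker run -it \
    --mount type=bind,source=/path/to/my/data,target=/staging \
    --mount type=bind,source=/c/aosp-output,target=/myvol \
    buildandoid bash -l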

Mount docker host volume but overwrite with container's contents

Several articles have been extremely helpful in understanding Docker's volume and data management. These two in particular are excellent:
http://container-solutions.com/understanding-volumes-docker/
http://www.alexecollins.com/docker-persistence/
However, I am not sure if what I am looking for is discussed. Here is my understanding:
When running docker run -v /host/something:/container/something the host files will overlay (but not overwrite) the container files at the specified location. The container will no longer have access to the location's previous files, but instead only have access to the host files at that location.
When defining a VOLUME in a Dockerfile, other containers may share the contents created by the image/container.
The host may also view/modify a Dockerfile volume, but only after discovering the true mountpoint using docker inspect. (usually somewhere like /var/lib/docker/vfs/dir/cde167197ccc3e138a14f1a4f7c....). However, this is hairy when Docker has to run inside a Virtualbox VM.
How can I reverse the overlay so that when mounting a volume, the container files take precedence over my host files?
I want to specify a mountpoint where I can easily access the container filesystem. I understand I can use a data container for this, or I can use docker inspect to find the mountpoint, but neither solution is a good solution in this case.
The docker 1.10+ way of sharing files would be through a volume, as in docker volume create.
That means that you can use a data volume directly (you don't need a container dedicated to a data volume).
That way, you can share and mount that volume in a container which will then keep its content in said volume.
That is more in line with how a container works: isolating memory, CPU and filesystem from the host. It is also why you cannot "mount a volume and have the container's files take precedence over the host files": that would break container isolation and expose its content to the host.
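A minimal sketch of that volume-based workflow (volume and image names are placeholders); note that when an empty named volume is mounted over a non-empty path in the image, Docker copies the image's files into the volume on first use:
docker volume create mydata
docker run -v mydata:/container/something my_image
# Find where Docker keeps the volume's data on the host
docker volume inspect mydata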
Begin your container's script by copying files from a read-only bind mount reflecting the host files to a work location in the container. End the script by copying the necessary results from the container's work location back to the host, using either the same or a different mount point.
Alternatively to the end-of-script copy, run the container without automatically removing it at the end, then run docker cp CONTAINER_NAME:CONTAINER_DIR HOST_DIR, followed by docker rm CONTAINER_NAME.
Alternatively to copying results back to the host, keep them in a separate "named" volume, provided that the container had it mounted (type=volume,src=datavol,dst=CONTAINER_DIR/work). Use the named volume with other docker run commands to retrieve or use the results.
The input files may be modified on the host during development between repeated runs of the container. Avoid shadowing them with the frozen copies in the named volume; beginning the container script by copying the input files from the host helps with this.
Using a named volume helps running the container read-only. (One may still need --tmpfs /tmp for temporary files or --tmpfs /tmp:exec if some container commands create and run executable code in the temporary location).
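Putting those pieces together, a hedged sketch of such a run (image name, paths and the build command are placeholders):
docker run --rm --read-only \
    --mount type=bind,src=/host/input,dst=/input,readonly \
    --mount type=volume,src=datavol,dst=/work \
    --tmpfs /tmp \
    my_image sh -c 'mkdir -p /work/src && cp -a /input/. /work/src/ && run_build /work/src /work/out'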

Get actual mount point of network volume in OS X CLI

In an automated system, I copy files to a mounted network volume with a shell script.
Basically I do "cp file.pdf /Volumes/NetworkShare/".
This works well until the remote system is down.
So before copying I can do a ping to detect whether it's online.
But... when it comes back online, OS X often remounts the share on a different path, "/Volumes/NetworkShare-1/".
The old path "/Volumes/NetworkShare/" still exists, although it's useless.
So, how can I find the actual mount point of this share from the OS X CLI?
I found out that diskutil does something like this for local disks, but not for network volumes. Is there an equivalent of diskutil for network volumes?
The mount command (just on its own) will list all mounted filesystems. As for why OS X is creating that extra directory, that is pretty odd. Did you manually mount the filesystem, by any chance? If you created the “NetworkShare” directory yourself, OS X’s auto mounter might do what you’re suggesting.
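A sketch of pulling the current path out of the mount table (the grep pattern assumes the share name appears in the remote path):
mount | grep -i 'NetworkShare'
# Example output: //user@server/NetworkShare on /Volumes/NetworkShare-1 (smbfs, nodev, nosuid, ...)
mount | grep -i 'NetworkShare' | sed -n 's/.* on \(\/Volumes\/[^ ]*\) .*/\1/p'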

Vagrant file structure and web root

I've read the docs and a few things still confuse me, mostly related to sync folders and database data.
I want to use the following folder structure on my host machine
ROOT
|- workFolder
||- project1
|||- project1DatabaseAndFiles
|||- project1WebRoot
||- project2
|||- project2DatabaseAndFiles
|||- project2WebRoot
||- project3
|||- project3DatabaseAndFiles
|||- project3WebRoot
And then create VMs where each VM's web root points to the appropriate projectX/projectXWebRoot folder.
From what I've read, I can only specify one remote Sync DIR. (http://docs.vagrantup.com/v2/synced-folders/). But if I create a new VM I want to specify the project name too, thereby selecting the correct host folder.
Is what I'm describing possible using Vagrant?
If I wanted another developer to use this environment, I'd like for them to have instant access to the database structure/setup etc without having to import any SQL files. Is this possible?
I'm hoping I'm just not understanding Vagrant's purpose, but this seems like a good use of shared VMs to me. Any pointers or articles that might help would be very welcome.
From what I've read, I can only specify one remote Sync DIR.
No, that is not true. You can always add more shared folders. From the manual:
This directive is used to configure shared folders on the virtual machine and may be used multiple times in a Vagrantfile.
This means you can define additional shared folders using:
config.vm.share_folder "name", "/path/on/vm", "path/on/host"
If I wanted another developer to use this environment, I'd like for them to have instant access to the database structure/setup etc without having to import any SQL files. Is this possible?
Yes, you can alter the data storage path of, say, MySQL to store it on a share on the host, so that the data is not lost when the VM is destroyed.
However, this is not as simple as it sounds. If you're using the MySQL cookbook (again, assuming you're using MySQL), you have to modify it so that the shared folder is mounted with the uid and gid of the mysql user or otherwise the user can't write to it. You can mount a share manually like this:
mount -t vboxsf -o uid=`id -u mysql` -o gid=`id -g mysql` sharename /new/data/dir
Also, if you're using Ubuntu or Debian Wheezy, Apparmor needs to be configured differently for MySQL,
as it does not allow writes to the newly configured data directory. This can be done by writing
/new/data/dir r,
/new/data/dir/** rwk,
to /etc/apparmor.d/local/usr.sbin.mysqld. This version of the mysql cookbook supports this behaviour already, so you can look up how it does that.
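A sketch of applying that change and reloading the profile (assuming the Ubuntu/Debian default profile paths):
cat <<'EOF' | sudo tee -a /etc/apparmor.d/local/usr.sbin.mysqld
/new/data/dir r,
/new/data/dir/** rwk,
EOF
sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld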
