How to delete the HDFS data in Docker containers - hadoop

I run a Hadoop cluster in Docker, mounting a local folder with -v.
Then I log in to the Hadoop cluster, cd to the mounted folder, and execute hdfs dfs -put ./data/* input/. It works.
My problem is that I cannot delete the data I copied into HDFS. I remove the containers with docker rm, but the data still exists. Right now the only way I can get rid of it is to reset Docker.
Is there any other solution?
This is my docker info
➜ hadoop docker info
Containers: 5
Running: 5
Paused: 0
Stopped: 0
Images: 1
Server Version: 1.12.3
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 22
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null bridge host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.27-moby
Operating System: Alpine Linux v3.4
OSType: linux
Architecture: x86_64
CPUs: 5
Total Memory: 11.71 GiB
Name: moby
ID: NPR6:2ZTU:CREI:BHWE:4TQI:KFAC:TZ4P:S5GM:5XUZ:OKBH:NR5C:NI4T
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 56
Goroutines: 81
System Time: 2016-11-22T08:10:37.120826598Z
EventsListeners: 2
Username: chaaaa
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
Insecure Registries:
127.0.0.0/8

This is a known issue: https://github.com/docker/for-mac/issues/371
If you can remove all images/containers, then run:
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)
docker volume rm $(docker volume ls -q)
Then stop Docker and remove its data directory:
rm -rf ~/Library/Containers/com.docker.docker/Data/*
Start Docker again and you have your GB back.

To delete data in HDFS you need to make a call similar to the one you used to put the files, in this case:
hdfs dfs -rm ./data/*
If there are directories you should add -R:
hdfs dfs -rm -R ./data/*
And finally, by default Hadoop moves deleted files/directories to a trash directory, which will be in the home of the HDFS user you are using for these requests, something like /user/<you>/.Trash/
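If you need the data gone immediately (for example to free space on the datanodes), you can bypass the trash; a minimal sketch, assuming the input/ directory from the question:
hdfs dfs -rm -r -skipTrash input
hdfs dfs -expunge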
About HDFS
Usually the namenode keeps metadata about the structure of HDFS, such as the directories and files in it and where the blocks that form them are stored (on which datanodes). The datanodes hold the HDFS data blocks themselves, but the data stored on a single datanode is usually not usable on its own, since it is typically only part of the blocks that make up the files in HDFS.
Because of this, all operations on HDFS are done through the namenode using hdfs calls, like put, get, rm, mkdir... instead of regular operating system command-line tools.
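For example, to inspect what you copied or what is sitting in the trash, stay within the hdfs client (a sketch, reusing the input/ path and the <you> placeholder from above):
hdfs dfs -ls input
hdfs dfs -du -h input
hdfs dfs -ls /user/<you>/.Trash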

Related

minikube mount crashes: mount system call fails

I am running minikube on my mac (running OSX 10.14.5)
minikube version: 1.1.0
minikube is using VirtualBox
I would like to have a single set of kubernetes yaml files that I use in different environments. Therefore, I'm trying to mount the same directory I would use in other environments into my minikube. (If there's a different way to go about this that still eases development, let me know.)
Anyway, the mount fails.
$ minikube mount /etc/vsc:/etc/vsc
📁 Mounting host path /etc/vsc into VM as /etc/vsc ...
💾 Mount options:
▪ Type: 9p
▪ UID: docker
▪ GID: docker
▪ Version: 9p2000.L
▪ MSize: 262144
▪ Mode: 755 (-rwxr-xr-x)
▪ Options: map[]
🚀 Userspace file server: ufs starting
💣 mount failed: mount: /etc/vsc: mount(2) system call failed: Connection timed out.
: Process exited with status 32

Issue: Error while running ubuntu bash shell in docker

I am running docker on my ARM-based 32-bit device.
However, when I try to run an Ubuntu bash shell as a Docker container via the command docker run -it ubuntu bash, I keep getting the following error:
docker: Error response from daemon: OCI runtime create failed:
container_linux.go:348: starting container process caused
"process_linux.go:402: container init caused \"open /dev/ptmx: no such file or directory\"": unknown.
Here's what docker info gives:
Containers: 4
Running: 0
Paused: 0
Stopped: 4
Images: 3
Server Version: 18.06.1-ce
Storage Driver: vfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.65-00273-gfa38327-dirty
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 923MiB
ID: 2PDV:3KHU:VZZM:DM6F:4MVR:TXBN:35YJ:VWP5:TMHD:GMKW:TPMI:MALC
Docker Root Dir: /opt/usr/media/docker_workdir
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
It would be great if someone could tell me what's wrong and how I can fix it.
It could be that, for one reason or another, your docker container can't find its own /dev/ptmx, or perhaps even your /dev/ altogether.
One quick solution is to do:
docker run -it -v /dev:/dev ubuntu bash
This binds your /dev/ directory to the container's, meaning that they will use the same files.
Notice that, although in and of itself this operation is harmless, in production environments it means that the isolation between the host's and the container's devices is gone.
For that reason, make sure to only ever use this trick in test environments.
It looks like your OS is missing pseudo-terminals (PTY) - a device that has the functions of a physical terminal without actually being one.
The file /dev/ptmx is a character file with major number 5 and minor number 2, usually of mode 0666 and owner.group of root.root. It is used to create a pseudo-terminal master and slave pair.
FILES
/dev/ptmx - UNIX 98 master clone device
/dev/pts/* - UNIX 98 slave devices
/dev/pty[p-za-e][0-9a-f] - BSD master devices
/dev/tty[p-za-e][0-9a-f] - BSD slave devices
Reference: http://man7.org/linux/man-pages/man7/pty.7.html
This is included in the Linux kernel by default. Maybe its absence is somehow related to your OS architecture. I'm also not sure how you can fix it; maybe try to update and upgrade the OS.
A quick workaround, if you don't need a tty, is to skip the -t flag:
docker run -i ubuntu bash
In docker run -it, -i/--interactive means "keep stdin open" and -t/--tty means "allocate a pseudo-TTY for the container". The key here is the word "interactive". If you omit -i, the container still executes /bin/bash but exits immediately; with it, the container executes /bin/bash and then patiently waits for your input. That means you now have a bash session inside the container, so you can ls, mkdir, or run any other bash command inside it.
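As a quick illustration of -i without -t, you can pipe commands into the container (a sketch; the image is only an example):
echo "ls /" | docker run -i --rm ubuntu bash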
One workable fix:
docker exec -i hello-world rm /dev/ptmx
docker exec -i hello-world mknod /dev/ptmx c 5 2
Or enable the kernel config option CONFIG_DEVPTS_MULTIPLE_INSTANCES=y.
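To check whether your kernel was built with that devpts support, assuming your device exposes its kernel config (not all do), you can try:
zcat /proc/config.gz | grep DEVPTS
grep DEVPTS /boot/config-$(uname -r)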

Postgres Docker Container data fails to mount to local

I'm trying to set up data persistence in Postgres, but when I mount the data directory to a local folder, I get this error.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
LOG: could not link file "pg_xlog/xlogtemp.25" to "pg_xlog/000000010000000000000001": Operation not permitted
FATAL: could not open file "pg_xlog/000000010000000000000001": No such file or directory
child process exited with exit code 1
initdb: removing contents of data directory "/var/lib/postgresql/data"
running bootstrap script ...
Here's my YAML file
version: '3.1'
services:
  postgres:
    restart: always
    image: postgres:9.6.4-alpine
    ports:
      - 8100:5432
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_PASSWORD: root
I'm using Docker Toolbox on Windows. The Docker machine runs in VirtualBox.
It looks like you use a shared data directory (a host dir shared into the virtual machine) for database storage.
Only two options make sense:
1) you have a trivial issue with directory permissions
2) you hit a known problem (google it!) with some VirtualBox and also VMware versions where, on some Windows versions, you cannot create symlinks in directories shared from the host to the virtual machine.
For (2), a workaround is to NOT use a shared folder to keep the data.
Either way, it's a problem which should be solved by the provider of the docker image itself, or by the provider of the virtualizer (vbox, vmware etc).
This is NOT a fault of the Windows OS, or of PostgreSQL.
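A minimal sketch of that workaround, assuming the compose file from the question but switching the ./pgdata bind mount to a named volume that stays inside the VM:
version: '3.1'
services:
  postgres:
    restart: always
    image: postgres:9.6.4-alpine
    ports:
      - 8100:5432
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_PASSWORD: root
volumes:
  pgdata: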
Looks like it has to be /mnt/sda1/var/lib/docker/volumes/psql/_data for Windows Docker Toolbox. This worked for me:
docker run -it --name psql -p 5432:5432 -v psql:/var/lib/postgresql/data postgres:9.5-alpine
"Mounts": [
    {
        "Type": "volume",
        "Name": "psql",
        "Source": "/mnt/sda1/var/lib/docker/volumes/psql/_data",
        "Destination": "/var/lib/postgresql/data",
        "Driver": "local",
        "Mode": "z",
        "RW": true,
        "Propagation": ""
    }
]
docker volume ls
DRIVER VOLUME NAME
local 65f253d220ad390337daaacf39e4d17000c36616acfe1707e41e92ab26a6a23a
local 761f7eceaed5525b70d75208a1708437e0ddfa3de3e39a6a3c069b0011688a07
local 8a42268e965e6360b230d16477ae78035478f75dc7cb3e789f99b15a066d6812
local a37e0cf69201665b14813218c6d0441797b50001b70ee51b77cdd7e5ef373d6a
local psql
Please refer to this for more info: bad mount

How to mount network Volume in Docker for Windows (Windows 10)

We're working to create a standard "data science" image in Docker in order to help our team maintain a consistent environment. In order for this to be useful for us, we need the containers to have read/write access to our company's network. How can I mount a network drive to a docker container?
Here's what I've tried using the rocker/rstudio image from Docker Hub:
This works:
docker run -d -p 8787:8787 -v //c/users/{insert user}:/home/rstudio/foobar rocker/rstudio
This does not work (where P is the mapped location of the network drive):
docker run -d -p 8787:8787 -v //p:/home/rstudio/foobar rocker/rstudio
This also does not work:
docker run -d -p 8787:8787 -v //10.1.11.###/projects:/home/rstudio/foobar rocker/rstudio
Any suggestions?
I'm relatively new to Docker, so please let me know if I'm not being totally clear.
I know this is relatively old, but for the sake of others, here is what usually works for me. In our case we use a Windows file server, so we use cifs-utils in order to map the drive. I assume that the instructions below can be applied to NFS or anything else as well.
First, you need to run the container in privileged mode so that you can mount remote folders inside the container (the --dns flag might not be required):
docker run --dns <company dns ip> -p 8000:80 --privileged -it <container name and tag>
Now (assuming CentOS with CIFS and being root in the container), hop into the container and run the following.
Install cifs-utils if it is not installed yet:
yum -y install cifs-utils
Create the local dir to be mapped:
mkdir /mnt/my-mounted-folder
Prepare a file with the username and password:
echo "username=<username-with-access-to-shared-drive>" > ~/.smbcredentials
echo "password=<password>" > ~/.smbcredentials
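Since this file contains a plaintext password, you may also want to restrict its permissions:
chmod 600 ~/.smbcredentials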
Map the remote folder:
mount <remote-shared-folder> <my-local-mounted-folder> -t cifs -o iocharset=utf8,credentials=/root/.smbcredentials,file_mode=0777,dir_mode=0777,uid=1000,gid=1000,cache=strict
Now you should have access.
Hope this helps.
I will describe my solution. I have a Synology NAS. The shared folder uses the SMB protocol.
I managed to connect it in the following way. The most important thing was to specify version 1.0 (vers=1.0). It didn't work without it! I spent two days trying to solve this issue.
version: "3"
services:
  redis:
    image: redis
    restart: always
    container_name: 'redis'
    command: redis-server
    ports:
      - '6379:6379'
    environment:
      TZ: "Europe/Moscow"
  celery:
    build:
      context: .
      dockerfile: celery.dockerfile
    container_name: 'celery'
    command: celery --broker redis://redis:6379 --result-backend redis://redis:6379 --app worker.celery_worker worker --loglevel info
    privileged: true
    environment:
      TZ: "Europe/Moscow"
    volumes:
      - .:/code
      - nas:/mnt/nas
    links:
      - redis
    depends_on:
      - redis
volumes:
  nas:
    driver: local
    driver_opts:
      type: cifs
      o: username=user,password=pass,vers=1.0
      device: "//192.168.10.10/main"
I have been searching for a solution for the last few days and just got one working.
I am running the Docker container on an Ubuntu virtual machine and I am mapping a folder on another host on the same network which is running Windows 10, but I am almost sure that the operating system where the container is running is not a problem, because the mapping is done from the container itself, so I think this solution should work on any OS.
Let's code.
First you should create the volume
docker volume create \
  --driver local \
  --opt type=cifs \
  --opt device=//<network-device-ip-folder> \
  --opt o=user=<your-user>,password=<your-pw> \
  <volume-name>
And then you have to run a container from an image
docker run \
  --name <desired-container-name> \
  -v <volume-name>:/<path-inside-container> \
  <image-name>
After this, a container is running with the volume assigned to it,
and the volume is mapped to <path-inside-container>.
If you create a file in either of these folders, it will be replicated
automatically to the other.
In case someone wants to get this running from docker-compose, I'll leave
this here:
services:
  <image-name>:
    build:
      context: .
    container_name: <desired-container-name>
    volumes:
      - <volume-name>:/<path-inside-container>
    ...
volumes:
  <volume-name>:
    driver: local
    driver_opts:
      type: cifs
      device: //<network-device-ip-folder>
      o: "user=<your-user>,password=<your-pw>"
Hope I can help
Adding to the solution by #Александр Рублев, the trick that solved this for me was reconfiguring the Synology NAS to accept the SMB version used by docker. In my case I had to enable SMBv3
I know this is old, but I found it when looking for something similar, and I see that it's still receiving comments from others, like myself, who find it.
I have figured out how to get this to work for a similar situation, though it took me a while to figure out.
The answers here are missing some key information that I'll include, possibly because it wasn't available at the time.
The CIFS storage is, I believe, only for when you are connecting to a Windows system, as I do not believe it is used by Linux at all unless that system is emulating a Windows environment.
This same thing can be done with NFS, which is less secure but is supported by almost everything.
You can create an NFS volume in a similar way to the CIFS one, just with a few changes. I'll list both so they can be seen side by side.
When using NFS on WSL2 you first need to install the NFS service into the Linux host OS. I believe CIFS requires a similar one, most likely the cifs-utils mentioned by #LevHaikin, but as I don't use it I'm not certain. In my case the host OS is Ubuntu, but you should be able to find the appropriate one by finding your system's equivalent of the nfs-common (or cifs-utils, if that's correct) installation:
sudo apt update
sudo apt install nfs-common
That's it. That will install the service so NFS works with Docker. (It took me forever to realize that was the problem, since it doesn't seem to be mentioned as needed anywhere.)
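Once nfs-common is installed, you can sanity-check the export from the host before involving Docker at all (a quick sketch; substitute your own NAS address):
showmount -e 10.11.12.13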
If using NFS, you need to have set NFS permissions for the NFS folder on the network device; in my case this is done at the shared-folder level, with the mount then pointing to a folder inside it. That's fine. (In my case the NAS that is my server mounts at #IP#/volume1/folder. Within the NAS I never see the volume1 part in the directory structure, but the full path to the shared folder is shown in the settings page when I set the NFS permissions. I'm not including the volume1 part below, as your system will likely be different.) You want the FULL PATH after the IP (use the IP as numbers, NOT the hostname), according to your NFS share, whatever it may be.
If using a CIFS device the same is true, just for CIFS permissions.
The nolock option is often needed, but may not be on your system. It just disables the ability to "lock" files.
The soft option means that if the system cannot connect to the mount directory it will not hang. If you need it to work only when the mount is there, you can change this to hard instead.
The rw (read/write) option is for read/write; ro (read-only) would be for read only.
As I don't personally use the CIFS volume, the options set are just the ones in the examples I found; whether they are necessary for you will need to be looked into.
The username and password are required and must be included for CIFS.
uid and gid are Linux user and group settings and should be set, I believe, to what your container needs, as Windows doesn't use them to my knowledge.
file_mode=0777 and dir_mode=0777 are Linux read/write permissions, essentially like chmod 0777, giving anything that can access the file read/write/execute permissions (more info in link #4 below). These should also be set for the Docker container, not the CIFS host.
noexec has to do with execution permissions, but I don't think it actually functions here; it was included in most examples I found. nosuid limits the ability to access files that are tied to a specific user ID and shouldn't need to be removed unless you know you need it; as it's a protection, I'd recommend leaving it if possible. nosetuids means it won't set the UID and GID for newly created files. nodev means no access to, or creation of, devices on the mount point. vers=1.0 is, I think, a fallback for compatibility; I personally would not include it unless there is a problem or it doesn't work without it.
In these examples I'm mounting //NET.WORK.DRIVE.IP/folder/on/addr/device to a volume named "my-docker-volume" in read/write mode. The CIFS volume uses the user supercool with password noboDyCanGue55.
NFS from the CLI
docker volume create --driver local --opt type=nfs --opt o=addr=NET.WORK.DRIVE.IP,nolock,rw,soft --opt device=:/folder/on/addr/device my-docker-volume
CIFS from the CLI (may not work if Docker is installed on a system other than Windows; it will only connect to an IP on a Windows system)
docker volume create --driver local --opt type=cifs --opt o=user=supercool,password=noboDyCanGue55,rw --opt device=//NET.WORK.DRIVE.IP/folder/on/addr/device my-docker-volume
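To check that the volume actually mounts before wiring it into a larger stack, something like this should work (a sketch; the alpine image is only an example):
docker run --rm -v my-docker-volume:/data alpine ls /data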
This can also be done within Docker Compose or Portainer.
When you do it there, you will need to add a volumes: section at the bottom of the compose file, with no indentation, on the same level as services:
In this example I am mounting the volumes:
my-nfs-volume, from //10.11.12.13/folder/on/NFS/device, in read/write mode, mounted in the container at /nfs
my-cifs-volume, from //10.11.12.14/folder/on/CIFS/device, with permissions from user supercool with password noboDyCanGue55, in read/write mode, mounted in the container at /cifs
version: '3'
services:
  great-container:
    image: imso/awesome/youknow:latest
    container_name: totally_awesome
    environment:
      - PUID=1000
      - PGID=1000
    ports:
      - 1234:5432
    volumes:
      - my-nfs-volume:/nfs
      - my-cifs-volume:/cifs
volumes:
  my-nfs-volume:
    name: my-nfs-volume
    driver_opts:
      type: "nfs"
      o: "addr=10.11.12.13,nolock,rw,soft"
      device: ":/folder/on/NFS/device"
  my-cifs-volume:
    driver_opts:
      type: "cifs"
      o: "username=supercool,password=noboDyCanGue55,uid=1000,gid=1000,file_mode=0777,dir_mode=0777,noexec,nosuid,nosetuids,nodev,vers=1.0"
      device: "//10.11.12.14/folder/on/CIFS/device/"
More details can be found here:
https://docs.docker.com/engine/reference/commandline/volume_create/
https://www.thegeekdiary.com/common-nfs-mount-options-in-linux/
https://web.mit.edu/rhel-doc/5/RHEL-5-manual/Deployment_Guide-en-US/s1-nfs-client-config-options.html
https://www.maketecheasier.com/file-permissions-what-does-chmod-777-means/

docker and image size limit

I've been reading a lot about this issue here and on other websites, but I haven't managed to find a proper solution for increasing the image size limit, which is set to 10GB by default.
A bit of background information.
I'm building a docker container:
https://bitbucket.org/efestolab/docker-buildgaffer
which downloads and builds a consistent set of libraries on top of a CentOS image (it takes a horrible amount of time and space to build).
The problem is that every single time I try to build it I hit this error:
No space left on device
Docker version:
Docker version 1.7.1, build 786b29d
Docker Info :
Containers: 1
Images: 76
Storage Driver: devicemapper
Pool Name: docker-8:7-12845059-pool
Pool Blocksize: 65.54 kB
Backing Filesystem: extfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 11.28 GB
Data Space Total: 107.4 GB
Data Space Available: 96.1 GB
Metadata Space Used: 10.51 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.137 GB
Udev Sync Supported: false
Deferred Removal Enabled: false
Data loop file: /home/_varlibdockerfiles/devicemapper/devicemapper/data
Metadata loop file: /home/_varlibdockerfiles/devicemapper/devicemapper/metadata
Library Version: 1.02.82-git (2013-10-04)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.15.9-031509-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 15.58 GiB
Name: hdd-XPS-15-9530
ID: 2MEF:IYLS:MCN5:AR5O:6IXJ:3OB3:DGJE:ZC4N:YWFD:7AAB:EQ73:LKXQ
Username: efesto
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
After stopping the service and nuking the /var/lib/docker folder,
I updated my docker startup script
/lib/systemd/system/docker.service
with these flags:
ExecStart=/usr/bin/docker -d --storage-opt dm.basesize=20G --storage-opt dm.loopdatasize=256G -H fd:// $DOCKER_OPTS
and restarted the docker service, but it still fails with the same error.
I've also been reading that it might be due to the original image I rely on (centos:6), which might have been built with the 10GB limit.
So I rebuilt the centos6 image and used that as the base for mine, but I hit the same error.
Does anyone have a reliable way to let me build this docker image fully?
If there's any other information which might be useful, just feel free to ask.
Thanks for any replies or suggestions!
L.
Found this article
Basically, edit the /etc/docker/daemon.json file to include:
{
  "storage-opts": [
    "dm.basesize=40G"
  ]
}
Restart the docker service, and it will let you create/import images larger than 10GB.
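On a systemd-based host, that restart would typically be:
sudo systemctl restart docker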
Thanks to the tests by #user2915097, I updated the kernel to version 3.16.0, installed the kernel extras, and removed and reinstalled docker.
The problem seems to be attributable to devicemapper; now, without any change to the docker command, I get:
Containers: 0
Images: 94
Storage Driver: aufs
Root Dir: /home/_varlibdockerfiles/aufs
Backing Filesystem: extfs
Dirs: 94
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-45-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 8
Total Memory: 15.58 GiB
Name: hdd-XPS-15-9530
ID: 2MEF:IYLS:MCN5:AR5O:6IXJ:3OB3:DGJE:ZC4N:YWFD:7AAB:EQ73:LKXQ
Username: efesto
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
and it finally builds images > 10GB.
L.
Since this question was asked, the storage driver here:
Storage Driver: devicemapper
is no longer used by default, and not recommended. That also means the settings for the 10GB limit no longer apply.
The overlay2 storage driver (currently enabled by default) does not have size limits of it's own. Instead, the underlying filesystem you use for /var/lib/docker is used for any available free space and inodes there. You can check that free space with:
df -h /var/lib/docker
df -ih /var/lib/docker
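If that filesystem is simply full, reclaiming space from stopped containers, unused images, and build cache is usually the first step (note that -a removes every image not currently used by a container):
docker system prune -a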
After modifying the docker daemon startup parameters, do the following:
systemctl daemon-reload
systemctl stop docker
rm -rf /var/lib/docker/*
systemctl start docker
This will remove all your images, so make sure you save them beforehand, e.g.
docker save -o something.tar.gz image_name
and reload them after starting docker, e.g.
docker load -i something.tar.gz
