HDFS as volume in cloudera quickstart docker

I am fairly new to both Hadoop and Docker.
I have been working on extending the cloudera/quickstart Docker image's Dockerfile and wanted to mount a directory from the host and map it to an HDFS location, so that performance is improved and data is persisted locally.
When I mount a volume anywhere else with -v /localdir:/someDir everything works fine, but that's not my goal. When I do -v /localdir:/var/lib/hadoop-hdfs, both the datanode and the namenode fail to start and I get: "cd /var/lib/hadoop-hdfs: Permission denied". And when I do -v /localdir:/var/lib/hadoop-hdfs/cache there is no permission denied, but the datanode and namenode (or one of them) still fail to start when the image starts, and I can't find any useful information in the log files about the reason.
Maybe someone has come across this problem, or has some other solution for putting HDFS outside the Docker container?
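This kind of permission error usually comes down to ownership: the bind-mounted host directory keeps its host owner, while the HDFS daemons in the quickstart image run as the hdfs user (see the chown answer further down). A rough pre-check sketch, where <hdfs_uid> and <hadoop_gid> are placeholders for whatever the first command prints:
docker run --rm cloudera/quickstart id hdfs
# note the printed uid/gid, then hand the host directory to that owner before mounting it
sudo chown -R <hdfs_uid>:<hadoop_gid> /localdir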

I had the same problem and managed the situation by copying the entire /var/lib directory from the container to a local directory.
From a terminal, start the cloudera/quickstart container without starting all the Hadoop services:
docker run -ti cloudera/quickstart /bin/bash
In another terminal, copy the container directory to the local directory:
mkdir /local_var_lib
docker exec your_container_id tar Ccf /var/lib - . | tar Cxf /local_var_lib -
After all the files are copied from the container to the local directory, stop the container and point /var/lib to the new target. Make sure the /local_var_lib directory contains the Hadoop directories (hbase, hadoop-hdfs, oozie, mysql, etc.).
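The tar pipe above streams the contents of /var/lib straight into /local_var_lib; a quick way to check that everything landed (directory names taken from the list above):
ls /local_var_lib
# expected to list hadoop-hdfs, hbase, oozie, mysql, ...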
Start the container:
docker run --name cloudera \
--hostname=quickstart.cloudera \
--privileged=true \
-td \
-p 2181:2181 \
-p 8888:8888 \
-p 7180:7180 \
-p 6680:80 \
-p 7187:7187 \
-p 8079:8079 \
-p 8080:8080 \
-p 8085:8085 \
-p 8400:8400 \
-p 8161:8161 \
-p 9090:9090 \
-p 9095:9095 \
-p 60000:60000 \
-p 60010:60010 \
-p 60020:60020 \
-p 60030:60030 \
-v /local_var_lib:/var/lib \
cloudera/quickstart /usr/bin/docker-quickstart

You should run:
docker exec -it "YOUR CLOUDERA CONTAINER" chown -R hdfs:hadoop /var/lib/hadoop-hdfs/

Related

Bcrypt docker passwd using --admin-passwd

What is wrong with the following command? It is intended to create a Portainer container with the admin password 'portainer':
docker run --rm -d --name "portainer" -p "127.0.0.1:9001:9000" -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer --admin-password='$2a$10$0PW6gPY0TSeYzry2RSakl.7VUVmzdmD6mQPcemiG6i2vfJGGGePYu'
It leads to a Portainer container that denies access for 'admin', saying that the password 'portainer' is invalid. Details:
I put it into a .bat file. It runs on Docker CE on Windows 10.
The long crypt string within single quotes is a bcrypt equivalent of 'portainer', the designated admin password. I created and checked it here: https://www.javainuse.com/onlineBcrypt
Prior to running the command I stopped and removed an old Portainer container, and even ran docker volume rm portainer_data.
Doubling the "$" to "$$" did not solve the issue.
The command is closely based on the official Portainer docs: https://documentation.portainer.io/v2.0/deploy/initial/
For now I have a simple workaround: simply drop the --admin-password parameter. Since I give Portainer a data volume, I can just define a password at first start. However, I'd still prefer the script-only solution. Any ideas?
Here is the solution you need:
docker run --detach \
--name=portainer-ce \
-p 8000:8000 \
-p 9000:9000 \
--restart=always \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /volume1/docker/portainer-ce:/data \
portainer/portainer-ce \
--admin-password="$(htpasswd -nb -B admin adminpwPC | cut -d ':' -f 2)"

chowning the host's bound `docker.sock` inside container breaks host docker

On a vanilla install of Docker for Mac my docker.sock is owned by my local user:
$ stat -c "%U:%G" /var/run/docker.sock
juliano:staff
Even if I add the user and group in my Dockerfile, when trying to run DinD as myself, the docker.sock mount is created as root:root.
$ docker run -it --rm \
--volume /var/run/docker.sock:/var/run/docker.sock \
--group-add staff \
--user $(id -u):$(id -g) \
"your-average-container:latest" \
/bin/bash -c 'ls -l /var/run/docker.sock'
srw-rw---- 1 root root 0 Jun 17 07:34 /var/run/docker.sock
Going the other way, running DinD as root, chowning the socket, then running commands breaks the host docker.
$ docker run -it --rm \
--volume /var/run/docker.sock:/var/run/docker.sock \
--group-add staff \
"your-average-container:latest" \
/bin/bash
$ chown juliano:staff /var/run/docker.sock
$ sudo su juliano
$ docker ps
[some valid docker output]
$ exit
$ docker ps
Error response from daemon: Bad response from Docker engine
I've seen people reporting chowning as the way to go, so maybe I'm doing something wrong.
Questions:
Why does the host docker break?
Is there some way to prevent host docker from breaking and still giving my user permission to the socket inside docker?
I believe that when you mount the volume, the owner UID/GID is set to the same values as on the host machine (the --user flag simply lets you run the command as a specific UID/GID; it has no impact on the permissions of the mounted volume).
The main question is: why would you need to chown at all? Can't you just run the commands inside the container as root?
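A common alternative to chowning the socket (not part of the answer above, and assuming a Linux host with GNU stat) is to give the container user a supplementary group matching the socket's numeric GID, which leaves the host socket untouched:
docker run -it --rm \
--volume /var/run/docker.sock:/var/run/docker.sock \
--user $(id -u):$(id -g) \
--group-add $(stat -c '%g' /var/run/docker.sock) \
"your-average-container:latest" \
/bin/bash -c 'docker ps'
On Docker for Mac the socket is provided by the Linux VM, so the GID reported by stat on the macOS side may not match what the container sees.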

h2o Driverless AI Install on GCP

I'm installing H2O Driverless AI on Google Cloud Platform on Ubuntu 16.04.
I'm following these instructions:
http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/UsingDriverlessAI.pdf
It goes well - or so I think - until step 15, the last one.
I type the following
docker run \
--rm \
-u `id -u`:`id -g` \
-p 12345:12345 \
-p 9090:9090 \
-v `pwd`/data:/data \
-v `pwd`/log:/log \
-v `pwd`/license:/license \
-v `pwd`/tmp:/tmp \
opsh2oai/h2oai-runtime
And get:
mkdir: cannot create directory '/log/20180111-180304': Permission denied
20180111-180304 corresponds to the timestamp of the action.
When I run ls, here is the list of files and folders present on the virtual machine:
data demo driverless-ai-docker-runtime-rel-1.0.5.gz install.sh jupyter license log scripts tmp
I'd be keen to hear if you've encountered a similar error or understand what I am doing wrong.
I've also tried sudo docker run (the same command as root); similar outcome.
In this case the command presented mounts `pwd`/log on the Docker host into the container's /log.
The host log folder must be writable by the user who launched the docker run command (the UID/GID passed with -u).
Alternatively, launch the container with sudo.
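A minimal sketch of the permission fix, assuming the data, log, license and tmp directories listed above sit in the current working directory and were created by another user (for example root):
# hand the bind-mounted host directories to the UID/GID passed with -u
sudo chown -R $(id -u):$(id -g) data log license tmp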

Why is DB data not being persisted from docker container?

I have created an Oracle 12c Docker instance on my Mac (Sierra). I can do everything outlined in this link (bring it up, connect to it, create a table, insert data):
https://www.toadworld.com/platforms/oracle/b/weblog/archive/2017/06/21/modularization-by-using-oracle-database-containers-and-pdbs-on-docker-engine
In the docker toolkit I have mapped a shared drive /Users/user/projects/database.
I am executing this command:
docker run --name oraclecdb \
-p 1521:1521 -p 5500:5500 \
-e ORACLE_SID=ORCLCDB \
-e ORACLE_PDB=ORCLPDB1 \
-e ORACLE_PWD=oracle \
-v /Users/user/projects/database/oradata:/home/oracle/oradata \
oracle/database:12.2.0.1-ee
"oradata" gets created, but the pluggable database never gets persisted to the shared volume. So what am I missing?
It turns out that /home/oracle/oradata should be /opt/oracle/oradata.
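For reference, this is the same command from the question with only the container-side path corrected:
docker run --name oraclecdb \
-p 1521:1521 -p 5500:5500 \
-e ORACLE_SID=ORCLCDB \
-e ORACLE_PDB=ORCLPDB1 \
-e ORACLE_PWD=oracle \
-v /Users/user/projects/database/oradata:/opt/oracle/oradata \
oracle/database:12.2.0.1-ee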

Docker intercontainer communication

I would like to run Hadoop and Flume dockerized. I have a standard Hadoop image with all the default values. I cannot see how these services can communicate with each other when placed in separate containers.
Flume's Dockerfile looks like this:
FROM ubuntu:14.04.4
RUN apt-get update && apt-get install -q -y --no-install-recommends wget
RUN mkdir /opt/java
RUN wget --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" -qO- \
https://download.oracle.com/otn-pub/java/jdk/8u20-b26/jre-8u20-linux-x64.tar.gz \
| tar zxvf - -C /opt/java --strip 1
RUN mkdir /opt/flume
RUN wget -qO- http://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz \
| tar zxvf - -C /opt/flume --strip 1
ADD flume.conf /var/tmp/flume.conf
ADD start-flume.sh /opt/flume/bin/start-flume
ENV JAVA_HOME /opt/java
ENV PATH /opt/flume/bin:/opt/java/bin:$PATH
CMD [ "start-flume" ]
EXPOSE 10000
You should link your containers. There are several ways to implement this.
1) Publish ports:
docker run -p 50070:50070 hadoop
The -p option publishes port 50070 of your Docker container on port 50070 of the host machine.
2) Link the containers (using docker-compose):
docker-compose.yml
version: '2'
services:
  hadoop:
    image: hadoop:2.6
  flume:
    image: flume:last
    links:
      - hadoop
The links option here connects your flume container to the hadoop container, so the Hadoop services can be reached from Flume under the hostname hadoop.
More info about this: https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/
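With the link in place, the flume.conf added in the Flume Dockerfile can address the NameNode by the linked hostname hadoop. A minimal sketch; the agent layout (netcat source, memory channel) and the NameNode RPC port 8020 are assumptions, not details from the question:
# agent a1: accept events on the exposed port 10000 and write them to HDFS in the hadoop container
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 10000
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://hadoop:8020/flume/events
Note that the HDFS sink also needs the Hadoop client jars on Flume's classpath, which the Dockerfile above does not install.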
