Docker intercontainer communication - hadoop

I would like to run Hadoop and Flume in Docker. I have a standard Hadoop image with all the default values. I cannot see how these services can communicate with each other when they run in separate containers.
Flume's Dockerfile looks like this:
FROM ubuntu:14.04.4
RUN apt-get update && apt-get install -q -y --no-install-recommends wget
RUN mkdir /opt/java
RUN wget --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" -qO- \
https://download.oracle.com/otn-pub/java/jdk/8u20-b26/jre-8u20-linux-x64.tar.gz \
| tar zxvf - -C /opt/java --strip 1
RUN mkdir /opt/flume
RUN wget -qO- http://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz \
| tar zxvf - -C /opt/flume --strip 1
ADD flume.conf /var/tmp/flume.conf
ADD start-flume.sh /opt/flume/bin/start-flume
ENV JAVA_HOME /opt/java
ENV PATH /opt/flume/bin:/opt/java/bin:$PATH
CMD [ "start-flume" ]
EXPOSE 10000

You need to connect your containers. There are several ways to do this.
1) Publish ports:
docker run -p 50070:50070 hadoop
The -p option binds port 50070 of the container to port 50070 of the host machine.
2) Link containers (using docker-compose)
docker-compose.yml
version: '2'
services:
  hadoop:
    image: hadoop:2.6
  flume:
    image: flume:last
    links:
      - hadoop
The links option makes the hadoop container reachable from the flume container under the hostname hadoop.
More information: https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/
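With a link in place, the Flume container can address the Hadoop container by the hostname hadoop. Linking does not wait for Hadoop to finish starting, so a start script may want to check that the port is reachable first. A minimal sketch, assuming bash is available in the Flume image and that 50070 (the port published in option 1) is the port to probe; wait_for_port is a hypothetical helper, not part of Flume or Docker:

```shell
# Hypothetical helper: poll a TCP port until it accepts connections.
# usage: wait_for_port HOST PORT [RETRIES] [DELAY_SECONDS]
wait_for_port() {
  host=$1
  port=$2
  retries=${3:-30}
  delay=${4:-1}
  attempt=0
  while [ "$attempt" -lt "$retries" ]; do
    # bash's /dev/tcp pseudo-device opens a TCP connection; the subshell
    # closes the descriptor again as soon as it exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}
```

In start-flume.sh this could be called as wait_for_port hadoop 50070 60 2 before launching the agent.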

HDFS as volume in cloudera quickstart docker

I am fairly new to both hadoop and docker.
I have been working on extending the Dockerfile of the cloudera/quickstart image, and I want to mount a directory from the host and map it to the HDFS location, so that performance improves and data is persisted locally.
When I mount a volume anywhere with -v /localdir:/someDir everything works fine, but that is not my goal. When I use -v /localdir:/var/lib/hadoop-hdfs, both the datanode and the namenode fail to start and I get: "cd /var/lib/hadoop-hdfs: Permission denied". When I use -v /localdir:/var/lib/hadoop-hdfs/cache there is no permission error, but the datanode, the namenode, or both fail to start when the container starts, and I cannot find any useful information about the reason in the log files.
Has anyone come across this problem, or does anyone have another solution for keeping HDFS data outside the Docker container?
I had the same problem and worked around it by copying the entire /var/lib directory from the container to a local directory.
From a terminal, start the cloudera/quickstart container without starting any Hadoop services:
docker run -ti cloudera/quickstart /bin/bash
In another terminal, copy the container directory to the local directory:
mkdir /local_var_lib
docker exec your_container_id tar Ccf $(dirname /var/lib) - $(basename /var/lib) | tar Cxf /local_var_lib -
After all files have been copied from the container to the local directory, stop the container and mount the local directory at /var/lib. Make sure the /local_var_lib directory contains the Hadoop directories (hbase, hadoop-hdfs, oozie, mysql, etc.).
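The "make sure" step can be checked before the container is restarted. Depending on how the archive was extracted, the directories may end up one level deeper than expected (for example under /local_var_lib/lib). A minimal sketch, where check_seeded is a hypothetical helper and the directory names are the ones listed above:

```shell
# Print each expected subdirectory that is missing under BASE.
# Returns 0 when nothing is missing, 1 otherwise.
# usage: check_seeded BASE NAME...
check_seeded() {
  base=$1
  shift
  seeded_status=0
  for name in "$@"; do
    if [ ! -d "$base/$name" ]; then
      echo "missing: $base/$name"
      seeded_status=1
    fi
  done
  return "$seeded_status"
}
```

For the layout described here it would be called as check_seeded /local_var_lib hadoop-hdfs hbase oozie mysql.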
Start the container:
docker run --name cloudera \
--hostname=quickstart.cloudera \
--privileged=true \
-td \
-p 2181:2181 \
-p 8888:8888 \
-p 7180:7180 \
-p 6680:80 \
-p 7187:7187 \
-p 8079:8079 \
-p 8080:8080 \
-p 8085:8085 \
-p 8400:8400 \
-p 8161:8161 \
-p 9090:9090 \
-p 9095:9095 \
-p 60000:60000 \
-p 60010:60010 \
-p 60020:60020 \
-p 60030:60030 \
-v /local_var_lib:/var/lib \
cloudera/quickstart /usr/bin/docker-quickstart
You should also fix the ownership of the HDFS directory inside the container:
docker exec -it "YOUR CLOUDERA CONTAINER" chown -R hdfs:hadoop /var/lib/hadoop-hdfs/

Firefox Proxy to Docker Fiddler refusing connection

I am running a docker-fiddler container on an Ubuntu 14.04 host. The container brings up Fiddler and redirects the GUI to the host, but the proxy fails. The Docker version is 1.11.1.
Firefox displays either "The connection was reset" or "The proxy server is refusing connections" depending on setups shown below.
Question:
What are the correct Firefox proxy settings, http and ssl?
What changes are needed to the docker run command line?
What changes are needed in the Dockerfile?
Note: I am hitting an http url, not https
This configuration uses localhost and assumes port forwarding. Firefox output: The connection was reset
Firefox proxy:
manual proxy
HTTP Proxy 127.0.0.1 Port 8888
SSL Proxy 127.0.0.1 Port 8888
This configuration uses the container IP. Firefox output: The proxy server is refusing connections
Firefox proxy:
manual proxy
HTTP Proxy 172.17.0.2 Port 8888
SSL Proxy 172.17.0.2 Port 8888
TL;DR
Docker Run:
docker run -d -p 8888:8888 -v /tmp/.X11-unix:/tmp/.X11-unix -e \
DISPLAY=$DISPLAY fiddler -h $HOSTNAME -v \
$HOME/.Xauthority:/home/$USER/.Xauthority
docker ps:
16a4f7531222 fiddler "mono /app/Fiddler.ex" 3 hours ago Up 3 hours 0.0.0.0:8888->8888/tcp cranky_pare
Dockerfile, based on jwieringa/docker-fiddler. I added EXPOSE 8888 and the user configuration to support bind-mounting the X server socket:
FROM debian:wheezy
RUN apt-get update \
&& apt-get install -y curl unzip \
&& rm -rf /var/lib/apt/lists/*
RUN apt-key adv --keyserver pgp.mit.edu --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
RUN echo "deb http://download.mono-project.com/repo/debian wheezy/snapshots/3.12.0 main" > /etc/apt/sources.list.d/mono-xamarin.list \
&& apt-get update \
&& apt-get install -y mono-devel ca-certificates-mono fsharp mono-vbnc nuget \
&& rm -rf /var/lib/apt/lists/*
RUN cd /tmp && curl -O http://ericlawrence.com/dl/MonoFiddler-v4484.zip
RUN unzip /tmp/MonoFiddler-v4484.zip
## I added this for X11 Display of Fiddler GUI on linux Host
RUN groupadd -g <gid> <user>
RUN useradd -d /home/<user> -s /bin/bash -m <user> -u <uid> -g <gid>
USER <user>
ENV HOME /home/<user>
# I added this also
EXPOSE 8888
ENTRYPOINT ["mono", "/app/Fiddler.exe"]
1) The host counts as a remote computer from the point of view of the docker-fiddler container, so enable:
Fiddler > Tools > Fiddler Options > Connections > [x] Allow remote computers to connect
2) Fiddler requires a restart after changing this setting, which stops the container. To keep the configuration across restarts, add a bind-mount volume to the docker run command:
-v /tmp/docker-fiddler/.mono:/home/$USER/.mono
3) Create /tmp/docker-fiddler/.mono on the host first and give $USER ownership of it. Docker should be able to do this for me, but I am not sure how.
4) Changed docker run to:
docker run -d -p 8888:8888 \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-h $HOSTNAME \
-v $HOME/.Xauthority:/home/$USER/.Xauthority \
-v /tmp/docker-fiddler/.mono:/home/$USER/.mono \
-e DISPLAY=$DISPLAY fiddler
5) For debugging, change the first line above to add debug mode (-D) and drop detached mode (-d). Doing this was key to finding the missing libraries:
docker -D run -p 8888:8888
6) Several libraries were missing. The last one was gsettings-desktop-schemas, which brings in the GNOME proxy schema that Fiddler uses. Until it was installed, the "AllowRemote" setting was not being stored in:
.mono/registry/CurrentUser/software/telerik/fiddler/values.xml:<value name="AllowRemote"
7) Several changes to the Dockerfile, including switching to Ubuntu. This creates a very large image; it might be possible to back out libglib2.0-bin and libcanberra-gtk-module:
FROM ubuntu:14.04
RUN apt-get update \
&& apt-get install -y curl unzip libglib2.0-bin libcanberra-gtk-module gsettings-desktop-schemas \
&& rm -f /etc/apt/sources.list.d/mono-xamarin* \
&& rm -rf /var/lib/apt/lists/*
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
RUN echo "deb http://download.mono-project.com/repo/debian wheezy main" > /etc/apt/sources.list.d/mono-xamarin.list \
&& apt-get update \
&& apt-get install -y mono-complete ca-certificates-mono fsharp mono-vbnc nuget \
&& rm -rf /var/lib/apt/lists/*
RUN cd /tmp && curl -O http://ericlawrence.com/dl/MonoFiddler-v4484.zip
RUN unzip /tmp/MonoFiddler-v4484.zip
RUN groupadd -g 1000 <USER>
RUN useradd -d /home/<USER> -s /bin/bash \
-m <USER> -u <UID> -g <GID>
USER <USER>
ENV HOME /home/<USER>
EXPOSE 8888
ENTRYPOINT ["mono", "/app/Fiddler.exe"]
8) Firefox proxy settings (HTTPS/SSL was not addressed):
Firefox > Edit > Preferences > Advanced > Settings
manual proxy
HTTP Proxy <container-ip> Port 8888
SSL Proxy <left this blank>
see: Install Mono on Linux
see: Docker In Practice, Miell/Sayers - CH4 Tech 26 Running GUIs, X11
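The <container-ip> value in step 8 can be looked up with docker inspect, and the proxy can be tested from the host without Firefox. A minimal sketch, assuming curl is installed on the host; container_ip and check_proxy are hypothetical helper names, and the DOCKER variable is an assumption that lets the command be previewed with DOCKER=echo:

```shell
# Assumption: DOCKER can be overridden (e.g. DOCKER=echo) to preview commands.
DOCKER=${DOCKER:-docker}

# Print the default-bridge IP address of a container.
# usage: container_ip NAME_OR_ID
container_ip() {
  "$DOCKER" inspect --format '{{ .NetworkSettings.IPAddress }}' "$1"
}

# Fetch URL through the proxy at HOST:PORT and discard the body.
# Succeeds only if the proxy accepts the connection and the request completes.
# usage: check_proxy HOST PORT URL
check_proxy() {
  curl --silent --show-error --output /dev/null --max-time 10 \
       --proxy "http://$1:$2" "$3"
}
```

With the container from the docker ps output above, this could be run as check_proxy "$(container_ip cranky_pare)" 8888 http://example.com/ (the URL is only a placeholder).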

Wrong permissions in volume in Docker container

I run Docker 1.8.1 on OS X 10.11 via a local docker-machine VM.
I have the following docker-compose.yml:
web:
  build: docker/web
  ports:
    - 80:80
    - 8080:8080
  volumes:
    - $PWD/cms:/srv/cms
My Dockerfile looks like this:
FROM alpine
# install nginx and php
RUN apk add --update \
nginx \
php \
php-fpm \
php-pdo \
php-json \
php-openssl \
php-mysql \
php-pdo_mysql \
php-mcrypt \
php-ctype \
php-zlib \
supervisor \
wget \
curl \
&& rm -rf /var/cache/apk/*
RUN mkdir -p /etc/nginx && \
mkdir -p /etc/nginx/sites-enabled && \
mkdir -p /var/run/php-fpm && \
mkdir -p /var/log/supervisor && \
mkdir -p /srv/cms
RUN rm /etc/nginx/nginx.conf
ADD nginx.conf /etc/nginx/nginx.conf
ADD thunder.conf /etc/nginx/sites-enabled/thunder.conf
ADD nginx-supervisor.ini /etc/supervisor.d/nginx-supervisor.ini
WORKDIR "/srv/cms"
VOLUME "/srv/cms"
EXPOSE 80
EXPOSE 8080
EXPOSE 22
CMD ["/usr/bin/supervisord"]
When I run everything with docker-compose up everything works fine, my volumes are mounted at the correct place.
But the permissions in the mounted folder /srv/cms look wrong. The user is "1000" and the group is "50" in the container. The webserver could not create any files in this folder, because it runs with the user "root".
1) General idea: Docker is not Vagrant. Putting two different services into one container goes against Docker best practice. Split them into two images and link the containers together instead of building one monolithic image.
See https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/ and in particular:
Avoid installing unnecessary packages
Run only one process per container
Minimize the number of layers
If you follow this:
you can remove supervisor
you can decrease the number of layers
It should look something like this (example):
FROM alpine
RUN apk add --update \
wget \
curl
RUN apk add --update \
php \
php-fpm \
php-pdo \
php-json \
php-openssl \
php-mysql \
php-pdo_mysql \
php-mcrypt \
php-ctype \
php-zlib
RUN usermod -u 1000 www-data
RUN rm -rf /var/cache/apk/*
EXPOSE 9000
For nginx it is enough to use the default image and mount your configs.
A docker-compose file could look like:
nginx:
  image: nginx
  container_name: site.dev
  volumes:
    - ./myconf1.conf:/etc/nginx/conf.d/myconf1.conf
    - ./myconf2.conf:/etc/nginx/conf.d/myconf2.conf
    - $PWD/cms:/srv/cms
  ports:
    - "80:80"
  links:
    - phpfpm
phpfpm:
  build: ./phpfpm/
  container_name: phpfpm.dev
  command: php5-fpm -F --allow-to-run-as-root
  volumes:
    - $PWD/cms:/srv/cms
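2) Add RUN usermod -u 1000 www-data to the Dockerfile of the PHP container. This fixes the permission problem because the www-data user then has the same UID (1000) as the owner of the mounted files.
On Alpine, usermod is not installed by default, so use:
RUN apk add shadow && usermod -u 1000 www-data && groupmod -g 1000 www-data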
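The value 1000 matches the owner reported in the question; on another host the numeric IDs may differ. A minimal sketch for reading them inside the container, assuming a stat that supports -c (GNU coreutils or BusyBox); owner_ids is a hypothetical helper:

```shell
# Print the numeric owner of a file or directory as "uid:gid".
# usage: owner_ids PATH
owner_ids() {
  stat -c '%u:%g' "$1"
}
```

For the setup described in the question, owner_ids /srv/cms would be expected to report 1000:50. Comparing that with the output of id -u www-data shows whether the UID passed to usermod needs to change.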

Why does "docker run" error with "no such file or directory"?

I am trying to run a container which runs an automated build. Here is the dockerfile:
FROM ubuntu:14.04
MAINTAINER pmandayam
# update dpkg repositories
RUN apt-get update
# install wget
RUN apt-get install -y wget
# get maven 3.2.2
RUN wget --no-verbose -O /tmp/apache-maven-3.2.2.tar.gz http://archive.apache.org/dist/maven/maven-3/3.2.2/binaries/apache-maven-3.2.2-bin.tar.gz
# verify checksum
RUN echo "87e5cc81bc4ab9b83986b3e77e6b3095 /tmp/apache-maven-3.2.2.tar.gz" | md5sum -c
# install maven
RUN tar xzf /tmp/apache-maven-3.2.2.tar.gz -C /opt/
RUN ln -s /opt/apache-maven-3.2.2 /opt/maven
RUN ln -s /opt/maven/bin/mvn /usr/local/bin
RUN rm -f /tmp/apache-maven-3.2.2.tar.gz
ENV MAVEN_HOME /opt/maven
# remove download archive files
RUN apt-get clean
# set shell variables for java installation
ENV java_version 1.8.0_11
ENV filename jdk-8u11-linux-x64.tar.gz
ENV downloadlink http://download.oracle.com/otn-pub/java/jdk/8u11-b12/$filename
# download java, accepting the license agreement
RUN wget --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" -O /tmp/$filename $downloadlink
# unpack java
RUN mkdir /opt/java-oracle && tar -zxf /tmp/$filename -C /opt/java-oracle/
ENV JAVA_HOME /opt/java-oracle/jdk$java_version
ENV PATH $JAVA_HOME/bin:$PATH
# configure symbolic links for the java and javac executables
RUN update-alternatives --install /usr/bin/java java $JAVA_HOME/bin/java 20000 && update-alternatives --install /usr/bin/javac javac $JAVA_HOME/bin/javac 20000
# install mongodb
RUN echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list && \
    apt-get update && \
    apt-get --allow-unauthenticated install -y mongodb-org mongodb-org-server mongodb-org-shell mongodb-org-mongos mongodb-org-tools && \
    echo "mongodb-org hold" | dpkg --set-selections && \
    echo "mongodb-org-server hold" | dpkg --set-selections && \
    echo "mongodb-org-shell hold" | dpkg --set-selections && \
    echo "mongodb-org-mongos hold" | dpkg --set-selections && \
    echo "mongodb-org-tools hold" | dpkg --set-selections
RUN mkdir -p /data/db
VOLUME /data/db
EXPOSE 27017
COPY build-script /build-script
CMD ["/build-script"]
I can build the image successfully but when I try to run the container I get this error:
$ docker run mybuild
no such file or directory
Error response from daemon: Cannot start container 3e8aa828909afcd8fb82b5a5ac89497a537bef2b930b71a5d20a1b98d6cc1dd6: [8] System error: no such file or directory
What does 'no such file or directory' mean here?
Here is my simple script:
#!/bin/bash
sudo service mongod start
mvn clean verify
sudo service mongod stop
I copy it with COPY build-script /build-script and run it with CMD ["/build-script"]. I am not sure why it is not working.
Using service isn't going to fly - the Docker base images are minimal and don't support this. If you want to run multiple processes, you can use supervisor or runit etc.
In this case, it would be simplest just to start mongo manually in the script e.g. /usr/bin/mongod & or whatever the correct incantation is.
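A minimal sketch of that approach, assuming mongod lives at /usr/bin/mongod and uses the /data/db directory created in the Dockerfile. run_with_service is a hypothetical helper, and the sketch does not wait for MongoDB to become ready, which a real build may need:

```shell
# Hypothetical helper: start SERVICE_CMD in the background, run the remaining
# arguments in the foreground, stop the service, and return the foreground
# command's exit status.
# usage: run_with_service SERVICE_CMD COMMAND [ARGS...]
run_with_service() {
  service_cmd=$1
  shift
  sh -c "$service_cmd" &
  service_pid=$!
  run_status=0
  "$@" || run_status=$?
  # Send SIGTERM to the service and reap it so no background process is left.
  kill "$service_pid" 2>/dev/null || true
  wait "$service_pid" 2>/dev/null || true
  return "$run_status"
}
```

In the build script this could replace the three service and mvn lines with run_with_service "exec /usr/bin/mongod --dbpath /data/db" mvn clean verify. The exec keeps mongod as the process that receives the signal.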
BTW the lines where you try to clean up don't have much effect:
RUN rm -f /tmp/apache-maven-3.2.2.tar.gz
...
# remove download archive files
RUN apt-get clean
These files have already been committed to a previous image layer, so doing this doesn't save any disk-space. Instead you have to delete the files in the same Dockerfile instruction in which they're added.
Also, I would consider changing the base image to a Java one, which would save a lot of work. However, you may have trouble finding one which bundles the official Oracle JDK rather than OpenJDK if that's a problem.

How to rebuild dockerfile quick by using cache?

I want to optimize my Dockerfile and keep the package cache on disk.
However, I found that every time I run docker build . it tries to fetch every file from the network.
I would like to share my cache directory during the build (e.g. /var/cache/yum/x86_64/6), but that only works with docker run -v ....
Any suggestions? (In this example only one RPM is installed; in my real case I need to install hundreds of RPMs.)
My draft Dockerfile:
FROM centos:6.4
RUN yum update -y
RUN yum install -y openssh-server
RUN sed -i -e 's:keepcache=0:keepcache=1:' /etc/yum.conf
VOLUME ["/var/cache/yum/x86_64/6"]
EXPOSE 22
The second time, I want to build a similar image:
FROM centos:6.4
RUN yum update -y
RUN yum install -y openssh-server vim
I don't want to fetch openssh-server from the internet again (it is slow). In my real case it is not one package but about 100 packages.
An update to the previous answers: current versions of docker build accept --build-arg, which passes environment variables such as http_proxy to the build without saving them in the resulting image.
Example:
# get squid
docker run --name squid -d --restart=always \
--publish 3128:3128 \
--volume /var/spool/squid3 \
sameersbn/squid:3.3.8-11
# optionally in another terminal run tail on logs
docker exec -it squid tail -f /var/log/squid3/access.log
# get squid ip to use in docker build
SQUID_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' squid)
# build your instance
docker build --build-arg http_proxy=http://$SQUID_IP:3128 .
Just use an intermediate/base image.
Base Dockerfile (build it with docker build -t custom-base . or similar):
FROM centos:6.4
RUN yum update -y
RUN yum install -y openssh-server vim
RUN sed -i -e 's:keepcache=0:keepcache=1:' /etc/yum.conf
Application Dockerfile:
FROM custom-base
VOLUME ["/var/cache/yum/x86_64/6"]
EXPOSE 22
You should use a caching proxy (e.g. Http Replicator or squid-deb-proxy), or apt-cacher-ng for Ubuntu, to cache installation packages. I think you can install this software on the host machine.
EDIT:
Option 1 - caching http proxy - easier method with modified Dockerfile:
> cd ~/your-project
> git clone https://github.com/gertjanvanzwieten/replicator.git
> mkdir cache
> replicator/http-replicator -r ./cache -p 8080 --daemon ./cache/replicator.log --static
Add to your Dockerfile (before the first RUN line):
ENV http_proxy http://172.17.42.1:8080/
You may want to clear the cache from time to time.
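The address 172.17.42.1 in the ENV line is the Docker bridge address on the answerer's host; it can differ between Docker versions and hosts. A minimal sketch for reading it, assuming the default bridge interface is called docker0 and the iproute2 ip tool is installed; first_ipv4 is a hypothetical helper:

```shell
# Read `ip -o -4 addr show` output on stdin and print the first IPv4
# address it contains, without the /prefix-length suffix.
first_ipv4() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "inet") {
        split($(i + 1), parts, "/")
        print parts[1]
        exit
      }
    }
  }'
}
```

On the host it would be used as ip -o -4 addr show docker0 | first_ipv4, and the printed address then goes into the http_proxy URL.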
Option 2 - caching transparent proxy, no modification to Dockerfile:
> cd ~/your-project
> curl -o r.zip https://codeload.github.com/zahradil/replicator/zip/transparent-requests
> unzip r.zip
> rm r.zip
> mv replicator-transparent-requests replicator
> mkdir cache
> replicator/http-replicator -r ./cache -p 8080 --daemon ./cache/replicator.log --static
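Start the replicator as a dedicated non-root user; the redirect rule below exempts that user's traffic so the replicator's own requests are not redirected back to itself.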
Set up the transparent redirect:
> iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner <replicator-user> --dport 80 -j REDIRECT --to-port 8080
Disable redirect:
> iptables -t nat -D OUTPUT -p tcp -m owner ! --uid-owner <replicator-user> --dport 80 -j REDIRECT --to-port 8080
This method is the most transparent and general, and your Dockerfile does not need to be modified. You may want to clear the cache from time to time.
