HDFS Clustering in swarm - hadoop

In a normal Docker environment, HDFS clustered images like hadoop-master and hadoop-slave work fine. But when I try to run these images in swarm mode, I am facing connectivity issues. Is clustered HDFS compatible with Docker swarm?
The service that I deployed keeps restarting and exiting every 2-3 seconds.
Can someone explain in detail how to implement HDFS clustering in swarm mode?
When I do docker logs containerid, I get
start sshd...
/bin/sh: 0: Can't open /bin/which
/etc/init.d/ssh: 424: .: Can't open /lib/lsb/init-functions.d/20-left-info-blocks
start serf...
Error connecting to Serf agent: dial tcp 127.0.0.1:7373: connection refused

Obviously, you have neither /bin/which nor LSB support installed in the image.
Install all prerequisites.
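As a minimal sketch, assuming the hadoop-master/hadoop-slave images are Debian/Ubuntu based (the question does not show the Dockerfile, so adjust the package names and package manager to your base image), an install step like this would cover the two complaints in the log:
apt-get update && apt-get install -y debianutils lsb-base
Here debianutils provides /bin/which, and lsb-base provides the /lib/lsb/init-functions scripts that the ssh init script is trying to source.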

Related

Not Able to Start Hadoop Service Locally On Windows 7

I am trying to set up Hadoop locally on my Windows 7 computer following the instructions at this link:
https://dimensionless.in/know-how-to-install-and-run-hadoop-on-windows-for-beginners/
I followed every single step, and Hadoop appears to be properly installed: running hadoop version in the Windows command prompt successfully returned the installed version, Hadoop 3.1.0.
However, it failed to start the nodes (namenode, datanode, YARN). I suspect it has to do with the port, since I use local port 9000 in the core-site configuration (hdfs://localhost:9000). I checked whether the port is open by running telnet localhost 9000, and it reported that it failed to open the port.
Can anyone provide guidance on the above, which looks to be a port issue preventing the Hadoop services from starting up?
Thank you.

copy a file from wsl to hdfs running on docker

I'm trying to copy a file from my local drive to HDFS.
I'm running Hadoop as a Docker image. I'm trying to do some MapReduce exercises, so I want to copy a data file from a local drive (let's say my D: drive) to HDFS.
I tried the command below, but it fails with ssh: connect to host localhost port 22: Connection refused:
scp -P 50070 /mnt/d/project/recreate.out root@localhost:/root
Since I'm new to Hadoop and big data, my explanation may be terrible. Please bear with me.
I'm trying to do the above from Windows Subsystem for Linux (WSL).
Regards,
crf
SCP won't move data into Hadoop, and port 50070 is not accepting connections over that protocol (SSH).
You need to set up and use a command similar to hdfs dfs -copyFromLocal. You can install the HDFS CLI in the Windows host command prompt too, so you don't need WSL to upload files...
When using Docker, I would suggest doing this:
Add a volume mount from your host to some Hadoop container, outside of the datanode and namenode directories (in other words, don't override the data that is there; mounting files here will not "upload" them to HDFS).
docker exec into this running container.
Run the above hdfs command, uploading from the mounted volume (see the sketch below).
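A rough sketch of those steps, assuming a hypothetical image name my-hadoop-image, container name hadoop, and mount point /data (none of these come from the question, so substitute your own):
docker run -d -v /mnt/d/project:/data --name hadoop my-hadoop-image    # bind-mount the host folder into the container
docker exec -it hadoop bash                                            # get a shell inside the running container
hdfs dfs -mkdir -p /user/root                                          # create the target HDFS directory if needed
hdfs dfs -copyFromLocal /data/recreate.out /user/root/                 # copy the mounted file into HDFS
The last two commands run inside the container, so the file ends up in HDFS rather than just on the container's filesystem.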

Write to HDFS running in Docker from another Docker container running Spark

I have a docker image for spark + jupyter (https://github.com/zipfian/spark-install)
I have another docker image for hadoop. (https://github.com/kiwenlau/hadoop-cluster-docker)
I am running 2 containers from the above 2 images in Ubuntu.
For the first container:
I am able to successfully launch Jupyter and run Python code:
import pyspark
sc = pyspark.SparkContext('local[*]')
rdd = sc.parallelize(range(1000))
rdd.takeSample(False,5)
For the second container:
In the host Ubuntu OS, I am able to successfully open the following in a web browser:
localhost:8088: browse all Hadoop applications
localhost:50070: browse the HDFS file system
Now I want to write to the HDFS file system (running in the 2nd container) from jupyter (running in the first container).
So I add the additional line
rdd.saveAsTextFile("hdfs:///user/root/input/test")
I get the error:
HDFS URI, no host: hdfs:///user/root/input/test
Am I giving the HDFS path incorrectly?
My understanding is that I should be able to talk to a Docker container running HDFS from another container running Spark. Am I missing anything?
Thanks for your time.
I haven't tried docker compose yet.
The URI hdfs:///user/root/input/test is missing an authority (hostname) section and port. To write to HDFS in another container, you need to fully specify the URI, make sure the two containers are on the same network, and make sure the HDFS container has the namenode and datanode ports exposed.
For example, you might have set the hostname for the HDFS container to be hdfs.container. Then you can write to that HDFS instance using the URI hdfs://hdfs.container:8020/user/root/input/test (assuming the namenode is running on 8020). Of course, you will also need to make sure that the path you're writing to has the correct permissions.
So, to do what you want:
Make sure your HDFS container has the namenode and datanode ports exposed. You can do this with an EXPOSE directive in the Dockerfile (the container you linked does not have these) or with the --expose argument when invoking docker run. The default ports are 8020 and 50010 (for the NN and DN, respectively).
Start the containers on the same network. If you just do docker run with no --network, they will start on the default network and you'll be fine. Start the HDFS container with a specific name using the --name argument.
Now modify your URI to include the proper authority (this will be the value of the --name argument you passed to docker) and the port as described above, and it should work (see the sketch below).
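A hedged sketch of those three steps with docker commands; the network name spark-hdfs and the image names are illustrative, not taken from the question, while the container name hdfs.container matches the example hostname above:
docker network create spark-hdfs
docker run -d --name hdfs.container --network spark-hdfs --expose 8020 --expose 50010 my-hadoop-image
docker run -d --name spark-jupyter --network spark-hdfs -p 8888:8888 my-spark-jupyter-image
Then, inside the notebook, write with a fully qualified URI, e.g. rdd.saveAsTextFile("hdfs://hdfs.container:8020/user/root/input/test"). On a user-defined network like this, the Spark container can resolve hdfs.container by name, so the URI's authority section points at the namenode.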

Hue configuration error - /etc/hue/conf.empty - Potential misconfiguration detected

Hi Experts,
I'm a newbie to Hadoop, the Linux environment, and Cloudera. I installed the Cloudera VM 5.7 on my machine and imported MySQL data to HDFS using Sqoop. I'm trying to execute some queries against this data using Impala, so I tried launching Hue. When I launched it, I saw a misconfiguration error.
Error:
Potential misconfiguration detected. Fix and restart Hue.
Steps I have taken to troubleshoot this issue
1) I restarted Hue using the commands below:
sudo service hue stop
sudo service hue start
2) I looked at the /etc/hue directory and could see there are two config folders: one is conf and the other is conf.empty. I couldn't figure out the problem.
But I'm still facing the same issue.
Check your internet access from Docker/the VM. After lots of messing around trying to figure out why the VMware bridge adapter wasn't working, I found my problem was Docker: you have to increase Docker's memory from the UI or the command line. Mine was set to 2 GB; I increased it to 8 GB, but 4 GB is OK.
Stop Hue:
sudo service hue stop
Restart HBase Thrift:
sudo service hbase-thrift stop
sudo service hbase-thrift start
Restart Hive:
sudo service hive-server2 stop
sudo service hive-server2 start
Start Hue:
sudo service hue start
Open http://quickstart.cloudera:8888/about/ and it should work like a charm 💫

H2O: unable to connect to h2o cluster through python

I have a 5-node Hadoop cluster running HDP 2.3.0. I set up an H2O cluster on YARN as described here.
On running the following command
hadoop jar h2odriver_hdp2.2.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 512m -nodes 3 -output /user/hdfs/H2OTestClusterOutput
I get the following output
H2O cluster (3 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
When I try to execute the command
h2o.init(ip="10.113.57.98", port=54321)
The process remains stuck at this stage. On trying to connect to the web UI at ip:54321, the browser tries endlessly to load the H2O admin page, but nothing ever displays.
On forcefully terminating the init process, I get the following error:
No instance found at ip and port: 10.113.57.98:54321. Trying to start local jar...
However, if I use H2O with Python without setting up an H2O cluster, everything runs fine.
I executed all commands as the root user. The root user has permission to read from and write to the /user/hdfs HDFS directory.
I'm not sure whether this is a permissions error or whether the port is not accessible.
Any help would be greatly appreciated.
It looks like you are using H2O2 (H2O Classic). I recommend upgrading your H2O to the latest (H2O 3). There is a build specifically for HDP2.3 here: http://www.h2o.ai/download/h2o/hadoop
Running H2O3 is a little cleaner too:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
Also, 512 MB per node is tiny - what is your use case? I would give the nodes some more memory.
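As a rough illustration only (the node count and output path echo the question, the 4g per node is just an arbitrary larger value in line with the advice above, and -disown is the option mentioned in the driver output above):
hadoop jar h2odriver.jar -nodes 3 -mapperXmx 4g -output /user/hdfs/H2OTestClusterOutput -disown
Once the cluster is up, the driver prints the ip:54321 addresses of the H2O nodes; that is the address to pass to h2o.init and to open in the browser, and -disown returns control to the shell instead of blocking.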
