Why is the NameNode not coming up in HDFS on Kubernetes? - hadoop

I tried running HDFS on Kubernetes using this project. I got the JournalNode and ZooKeeper up and running, but the NameNode keeps going into an Error state.
Looking at the logs, I see that the hostname could not be resolved. I then checked the /etc/hosts entry written by the boot-up script, and the IP and hostname for the NameNode are not set.
I'm not using a customized image, just the Helm charts as they are.
What do I need to do to get the NameNode up and running?
Helm Chart Link: https://github.com/apache-spark-on-k8s/kubernetes-HDFS/tree/master/charts
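For reference, this is roughly how I inspected the pod (the pod name below is a placeholder for whatever the chart actually creates):
kubectl logs hdfs-namenode-0                      # shows the hostname-resolution error
kubectl exec hdfs-namenode-0 -- cat /etc/hosts    # the namenode IP/hostname entry is missing here
kubectl exec hdfs-namenode-0 -- hostname -f       # what the pod resolves its own name to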

Related

How to change the container log location in a Dataproc cluster?

What is the correct way to change the container log location in a Dataproc cluster during cluster creation?
The default path is /var/log/hadoop-yarn/userlogs and I want to change it to a local SSD mount such as /mnt/1/hadoop/yarn/userlogs. I tried adding
--properties=yarn:yarn.nodemanager.log-dirs
to the gcloud dataproc clusters create command but got the error -
bash: --properties=yarn:yarn.nodemanager.log-dirs=/mnt/1/hadoop/yarn: No such file or directory
This is most likely because the local SSD gets mounted after the cluster is created. Can someone please help?
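For reference, the full invocation I'm aiming for looks roughly like this (the cluster name is a placeholder, and the property value is quoted so the shell passes it through as one argument):
gcloud dataproc clusters create my-cluster \
    --properties='yarn:yarn.nodemanager.log-dirs=/mnt/1/hadoop/yarn/userlogs'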

Write to HDFS running in Docker from another Docker container running Spark

I have a docker image for spark + jupyter (https://github.com/zipfian/spark-install)
I have another docker image for hadoop. (https://github.com/kiwenlau/hadoop-cluster-docker)
I am running 2 containers from the above 2 images in Ubuntu.
For the first container:
I am able to successfully launch jupyter and run python code:
import pyspark
sc = pyspark.SparkContext('local[*]')
rdd = sc.parallelize(range(1000))
rdd.takeSample(False,5)
For the second container:
In the host Ubuntu OS, I am able to successfully open
localhost:8088 in the web browser and see all the Hadoop applications, and
localhost:50070 to browse the HDFS file system.
Now I want to write to the HDFS file system (running in the 2nd container) from jupyter (running in the first container).
So I add the additional line
rdd.saveAsTextFile("hdfs:///user/root/input/test")
I get the error:
HDFS URI, no host: hdfs:///user/root/input/test
Am I giving the HDFS path incorrectly?
My understanding is that I should be able to talk to a Docker container running HDFS from another container running Spark. Am I missing anything?
Thanks for your time.
I haven't tried docker compose yet.
The URI hdfs:///user/root/input/test is missing an authority (hostname) section and port. To write to HDFS in another container you need to fully specify the URI, make sure the two containers are on the same network, and make sure the HDFS container has the ports for the NameNode and DataNode exposed.
For example, you might have set the hostname for the HDFS container to be hdfs.container. Then you can write to that HDFS instance using the URI hdfs://hdfs.container:8020/user/root/input/test (assuming the NameNode is listening on 8020). Of course, you will also need to make sure that the path you're writing to has the correct permissions.
So to do what you want:
Make sure your HDFS container has the NameNode and DataNode ports exposed. You can do this with an EXPOSE directive in the Dockerfile (the container you linked does not have these) or with the --expose argument when invoking docker run. The default ports are 8020 and 50010 (for the NN and DN respectively).
Start the containers on the same network. If you just do docker run with no --network they will start on the default network and you'll be fine. Start the HDFS container with a specific name using the --name argument.
Now modify your URI to include the proper authority (this will be the value of the docker --name argument you passed) and port as described above, and it should work; a sketch of these steps follows.
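Assuming placeholder image names and the default NameNode port 8020 (adjust both for your setup), and using a user-defined network because it provides name-based DNS between the containers:
docker network create hadoop-net
docker run -d --name hdfs.container --network hadoop-net \
    --expose 8020 --expose 50010 <your-hdfs-image>
docker run -d --name spark-jupyter --network hadoop-net \
    -p 8888:8888 <your-spark-jupyter-image>
Then, from the notebook, use the full authority and port:
rdd.saveAsTextFile("hdfs://hdfs.container:8020/user/root/input/test")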

Hadoop UI shows only one Datanode

I've started a Hadoop cluster composed of one master and 4 slave nodes.
Configuration seems ok:
hduser@ubuntu-amd64:/usr/local/hadoop$ ./bin/hdfs dfsadmin -report
When I open the NameNode UI (http://10.20.0.140:50070/), the Overview card seems OK - for example, the total capacity of all nodes adds up.
The problem is that in the Datanodes card I see only one DataNode.
I came across the same problem and, fortunately, solved it. I guess it is caused by 'localhost'.
Configure a different name for each IP in /etc/hosts.
Remember to restart all the machines; then things will go well.
It's because of the same hostname in both datanodes.
In your case both datanodes are registering to the namenode with the same hostname, i.e. 'localhost'. Try different hostnames; it will fix your problem.
In the UI it will show only one entry per hostname.
In the "hdfs dfsadmin -report" output you can see both.
The following tips may help you:
Check the core-site.xml and ensure that the namenode hostname is correct
Check the firewall rules in namenode and datanodes and ensure that the required ports are open
Check the logs of datanodes
Ensure that all the datanodes are up and running
As @Rahul said, the problem is because of the same hostname.
Change the hostname in the /etc/hostname file and give each host a different hostname,
and resolve each hostname to its IP address in the /etc/hosts file.
Then restart your cluster; you will see all the DataNodes in the Datanode Information tab in the browser.
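A minimal sketch of those two changes (the names and addresses below are examples only):
sudo hostnamectl set-hostname datanode1    # on systemd systems this updates /etc/hostname; use datanode2, datanode3, ... on the other slaves
and in /etc/hosts on every machine, something like:
10.20.0.140   master
10.20.0.141   datanode1
10.20.0.142   datanode2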
I had the same trouble because I used IPs instead of hostnames; [hdfs dfsadmin -report] was correct, though only one [localhost] showed up in the UI. Finally, I solved it like this:
<property>
       <name>dfs.datanode.hostname</name>                   
       <value>the name you want to show</value>
</property>
You can hardly find this in any documentation...
Sorry, it feels like it's been a while, but I'd still like to share my answer:
the root cause is in hadoop/etc/hadoop/hdfs-site.xml:
the XML file has a property named dfs.datanode.data.dir. If you set all the datanodes with the same name, then Hadoop assumes the cluster has only one datanode. So the proper way of doing it is naming every datanode with a unique name:
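To verify what each node actually has configured, the value can be printed on every datanode:
hdfs getconf -confKey dfs.datanode.data.dir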
Regards,
YUN HANXUAN
Your admin report looks absolutely fine. Please run the command below to check the HDFS disk space details.
"hdfs dfs -df /"
If the size still looks good, it's just a UI glitch.
My problem: I have 1 master node and 3 slave nodes. When I start all nodes with start-all.sh and access the master node's dashboard, I can see only one DataNode in the web UI.
My solution:
Try stopping the firewall temporarily with sudo systemctl stop firewalld. If you do not want to stop your firewalld service, then allow the DataNode ports with:
sudo firewall-cmd --permanent --add-port={PORT_NUMBER_1/tcp,PORT_NUMBER_2/tcp}; sudo firewall-cmd --reload
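For example, with the default Hadoop 2.x DataNode ports (an assumption; check which ports your distribution actually uses):
sudo firewall-cmd --permanent --add-port={50010/tcp,50020/tcp,50075/tcp}
sudo firewall-cmd --reload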
If you are using a separate user for Hadoop (in my case the hadoop user manages the Hadoop daemons), then change the owner of your DataNode and NameNode directories: sudo chown hadoop:hadoop /opt/data -R
My hdfs-site.xml config is as given in the image.
Check the daemons on the data node with the jps command; it should show output as given in the image below.
jps Output

Not able to deploy workers on Spark-1.2.0

I am new to Spark and am using spark-1.2.0 with hadoop 2.4.1. I have set up a master and four slave nodes, but two of my nodes are not starting.
I have defined the IP addresses of the nodes in the slaves file in the spark-1.2.0/conf/ directory.
But when I try to run ./sbin/start-all.sh the error is as follows:
failed to launch org.apache.spark.deploy.worker.Worker
could not find or load main class org.apache.spark.deploy.worker.Worker
This is happening for two nodes; the other two are working fine.
I've also set up spark-env.sh on the master as well as the slaves. The master also has passwordless SSH connectivity to the slaves.
I've also tried doing ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
It gives the same error as before. Can someone help me with this? Where am I making a mistake?
So I figured out the solution. For all those who are starting out with Spark, please check all the jar files in the lib folder. The spark-assembly-1.2.0-hadoop2.4.0.jar file was missing on my slave.
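A quick way to confirm this on each slave (assuming the same install path as on the master; slave-node is a placeholder hostname):
ls $SPARK_HOME/lib/spark-assembly-1.2.0-hadoop2.4.0.jar
scp $SPARK_HOME/lib/spark-assembly-1.2.0-hadoop2.4.0.jar slave-node:$SPARK_HOME/lib/    # copy it over from the master if it is missing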
I also encountered the same issue. If this is a local-mode cluster setup, then you can instead run:
./sbin/start-master.sh
./sbin/start-slave.sh spark://localhost:7077
Then run:
MASTER=spark://localhost:7077 ./bin/pyspark
I was able to execute my jobs from the shell.
Do remember to set up conf/slaves and conf/spark-env.sh as described here:
http://pulasthisupun.blogspot.com/2013/11/how-to-set-up-apache-spark-cluster-in.html
Also change localhost to your hostname.
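A minimal sketch of the two files for a Spark 1.2 standalone cluster (hostnames and sizes are placeholders):
# conf/slaves -- one worker hostname per line
worker1
worker2
worker3
worker4
# conf/spark-env.sh -- SPARK_MASTER_IP is the Spark 1.x variable name
export SPARK_MASTER_IP=master-node
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=4g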

How to connect a Mac to a Hadoop/HDFS cluster

I have CDH running in a cluster and I have SSH access to the machines. I need to connect my Mac to the cluster, so that if I do hadoop fs -ls, it shows me the contents of the cluster.
I have configured HADOOP_CONF to point to the configuration of the cluster. I am running CDH4 in my cluster. Am I missing something here? Is it possible to connect?
Is there some SSH key setup that I need to do?
There are a few things you will need to ensure to do this (a minimal sketch follows the list):
You need to set your HADOOP_CONF_DIR environment variable to point to a directory that carries config XMLs that point to your cluster.
Your Mac should be able to directly access the hosts that form your cluster (all of them). This can be done via VPN, for example - if the cluster is secured from external networks.
Your Mac should carry the same version of Hadoop that the cluster runs.
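A minimal sketch of that setup on the Mac (the gateway host and paths are placeholders; CDH typically keeps client configs under /etc/hadoop/conf on the cluster nodes):
mkdir -p ~/hadoop-conf
scp user@cluster-gateway:/etc/hadoop/conf/*.xml ~/hadoop-conf/
export HADOOP_CONF_DIR=~/hadoop-conf
hadoop fs -ls /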
