Why does the YARN ResourceManager always shut down when I submit a job? - hadoop

I am learning how to build a Hadoop cluster, and my first step is a pseudo-distributed cluster, following the guide at https://hadoop.apache.org/docs/r3.3.1/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation. I successfully started YARN by calling $HADOOP_HOME/sbin/start-dfs.sh and $HADOOP_HOME/sbin/start-yarn.sh. The output of jps is
However, when I submit a job, even one that does nothing, the ResourceManager shuts down immediately.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount input output
The console output is
and the log is
The result of running strace on the ResourceManager is
+++ killed by SIGKILL +++
I have struggled with this for days and have not figured it out. Any insight or advice would be welcome.
Oh, I forgot to mention the versions:
Hadoop 3.3.1
WSL: 2, Ubuntu 20.04
Windows 11: 22518.1000

Related

hadoop 2.9.1 failed to start DataNode

I'm new to Apache Hadoop and I'm trying to install Apache Hadoop 2.9.1 on Alpine (in a Docker container) in pseudo-distributed mode, but I get this error when I run start-dfs.sh
localhost: /usr/local/hadoop/sbin/hadoop-daemon.sh: line 131: 883 Aborted (core dumped) nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null
The NameNode and SecondaryNameNode start successfully, but the DataNode does not.
I had the exact same problem, and every version >2.9.1 also ended in a quick core dump of the DataNode in Docker.
The comment from @OneCricketeer actually led me in the right direction and should have been an acceptable answer, so here is a quick heads-up for future users:
Apparently, components of the DataNode won't work well with Alpine/Musl and switching to e.g. an Ubuntu-based parent image like 8-jdk solves this problem.
Here is a link to the Dockerfile I am currently using.
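Before chasing the core dump itself, it may help to confirm that the container really is Alpine-based. The following is only a sketch and not part of the original answer: the function name is_alpine is hypothetical, and it assumes the image ships an /etc/os-release file whose ID field is set to alpine, as Alpine images normally do.

```shell
# Hypothetical helper: succeeds if the given os-release file identifies Alpine.
# Defaults to /etc/os-release; the optional path argument only makes testing easier.
is_alpine() {
  os_release="${1:-/etc/os-release}"
  # Alpine sets ID=alpine; tolerate optional double quotes around the value.
  grep -Eq '^ID="?alpine"?$' "$os_release" 2>/dev/null
}
```

Inside the container, something like `is_alpine && echo "Alpine/musl base image"` would flag the situation described above before any time is spent on the DataNode logs.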

JobTracker and TaskTracker don't show up when I run the start-all.sh command on Ubuntu for Hadoop

The JobTracker and TaskTracker don't show up when I run the start-all.sh command on Ubuntu for Hadoop.
I do see the rest of the processes when I run the "jps" command in Unix.
I'm not sure why the JobTracker and TaskTracker are not shown. I have been following a couple of links and couldn't get my problem sorted.
Steps done:
- Formatted the NameNode multiple times
- Deleted and recreated the tmp folder multiple times with appropriate permissions
What could be the issue?
Any suggestions would really help me, as I am struggling to set up Hadoop on my laptop and I am new to it.
Try starting jobtracker and tasktracker separately.
From your hadoop HOME directory run
. bin/../libexec/hadoop-config.sh
Then from hadoop BIN directory run
hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker
You must be using a Hadoop 2.x version, where the JobTracker is replaced by the YARN ResourceManager. Using jps (a JDK is needed) you can check whether the ResourceManager is running. If it is running, its default URL is (host-name):8088, where you can check your nodes, jobs and configuration. If it is not running, start it with sbin/start-yarn.sh.
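The jps check described above could be scripted roughly as follows. This is a minimal sketch, not part of the original answer: the function name daemon_running is hypothetical, and it assumes jps prints one "<pid> <name>" pair per line.

```shell
# Hypothetical helper: reads jps-style output ("<pid> <name>" per line) on stdin
# and succeeds only if a process with exactly the given name is listed.
daemon_running() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit found ? 0 : 1 }'
}
```

From the Hadoop installation directory, a call such as `jps | daemon_running ResourceManager || sbin/start-yarn.sh` would then start YARN only when the ResourceManager is not listed.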

How to check if hdfs is running?

I would like to see if the hdfs file system for Hadoop is working properly. I know that jps lists the daemons that are running, but I don't actually know which daemons to look for.
I ran the following commands:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager
Only namenode, resourcemanager, and nodemanager appeared when I entered jps.
Which daemons are supposed to be running in order for hdfs/Hadoop to function? Also, what could you do to fix hdfs if it is not running?
Use any of the following approaches to check the status of your daemons:
The jps command lists all active daemons.
The command below is the most appropriate:
hadoop dfsadmin -report
It lists details of the DataNodes, which are in effect your HDFS storage.
Cat any file available under an HDFS path.
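To make the jps approach answer the "which daemons should I look for" question directly, the listing can be compared against the names you expect. The sketch below is illustrative only: missing_daemons is a hypothetical helper, and the daemon names passed to it are whatever you consider required for your setup.

```shell
# Hypothetical helper: reads jps-style output on stdin and prints each expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
  listing=$(cat)
  for name in "$@"; do
    # Compare against the whole second field, so "NameNode" does not
    # accidentally match "SecondaryNameNode".
    printf '%s\n' "$listing" \
      | awk -v n="$name" '$2 == n { f = 1 } END { exit f ? 0 : 1 }' \
      || printf '%s\n' "$name"
  done
}
```

For example, `jps | missing_daemons NameNode DataNode` prints nothing when both HDFS daemons are up, and prints DataNode for the situation in the question, where only the NameNode, ResourceManager and NodeManager appeared.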
I spent two weeks validating my setup (it was fine) and finally found this command:
sudo -u hdfs jps
Initially my plain jps command showed only one process, even though Hadoop 2.6 under Ubuntu LTS 14.04 was up. I had been using sudo to run the startup scripts.
Here is the startup sequence that works, with jps listing multiple processes:
sudo su hduser
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh

Hadoop Installation: Format Namenode

I'm struggling with installing Hadoop 2.2.0 on my Mac OSX 10.9.3. I essentially followed this tutorial:
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide
When I run $HADOOP_PREFIX/bin/hdfs namenode -format to format namenode, I get the message:
SHUTDOWN_MSG: Shutting down NameNode at Macintosh.local/192.168.0.103. I believe this is preventing me from successfully running the test
$HADOOP_PREFIX/bin/hadoop jar $HADOOP_PREFIX/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar $HADOOP_PREFIX/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar --shell_command date --num_containers 2 --master_memory 1024
Does anyone know how to correctly format namenode?
(Regarding the test command above, someone mentioned to me that it could have something to do with the hdfs file system not functioning properly, if this is relevant.)

Hadoop Configuration

I have started configuring Hadoop 2.1.0-beta for a single node. I followed the steps in Michael Noll's tutorial (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#configuring-single-node-clusters-first) and configured everything as described. According to jps, the NameNode, DataNode and SecondaryNameNode started fine. Then I found that there is no start-mapred.sh script, so I tried starting the JobTracker using hadoop-daemons.sh (hadoop-daemon.sh --config /home/nayan/dev/hadoop/etc/hadoop/ start jobtracker), which failed with the message "Sorry, the jobtracker command is no longer supported. You may find similar functionality with the "yarn" shell command.". I do not know which configuration changes (if any) I need to make. I made changes to the "yarn-site.xml" file, as suggested in Hadoop: The Definitive Guide, but could not proceed further. Where can I find out more about YARN? I checked the Apache site but could not figure it out.
You need to check your configuration XML files. If there is a problem in the XML, some daemons won't start.
Then try ./start-all.sh followed by jps.
You can use start-yarn.sh to start the ResourceManager and NodeManager daemons.
I usually start everything using these two commands
./start-dfs.sh
./start-yarn.sh
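The two commands above can be wrapped so that YARN is only started once the HDFS script has returned successfully. This is a sketch under assumptions, not something from the original answer: start_hadoop is a hypothetical name, and HADOOP_HOME is assumed to point at the Hadoop installation directory that contains sbin.

```shell
# Hypothetical wrapper: runs start-dfs.sh, then start-yarn.sh only if the
# first script exited successfully.
# Assumes HADOOP_HOME is set; if it is not, the shell reports the message
# below and a non-interactive shell exits.
start_hadoop() {
  sbin="${HADOOP_HOME:?HADOOP_HOME is not set}/sbin"
  "$sbin/start-dfs.sh" && "$sbin/start-yarn.sh"
}
```

Calling `start_hadoop` followed by `jps` then gives a single place to see whether either step failed.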
You should use start-dfs.sh for the HDFS daemons and start-yarn.sh for the ResourceManager and NodeManager daemons. Both are in the sbin directory of Hadoop.
./start-dfs.sh or start-dfs.sh will start only the HDFS components, while ./start-yarn.sh or start-yarn.sh will start the YARN components such as the NodeManager and ResourceManager. If you don't want to start the two sets of components separately, try this command:
./start-all.sh or start-all.sh (this command is deprecated, though).
To answer your question, use ./start-yarn.sh
Cheers!
First, you have to start the YARN daemons in the YARN (Hadoop 2.x) environment.
So start with this:
at /hadoop_installed_path/sbin$ ./start-yarn.sh
Once the YARN daemons have started, you can start the DFS daemons:
at /hadoop_installed_path/sbin$ ./start-dfs.sh
1. Check all the steps in Hadoop: The Definitive Guide.
If everything is correct, then use start-all.sh,
then run jps.
2. Sometimes you have to close the console for your changes to take effect, so close the console, reopen it, and then try jps.
Hope this helps.

Resources