Everything starts on Hadoop pseudo-distributed except the namenode - hadoop

I have Hadoop 2.9.0 on Ubuntu at
/usr/local/hadoop
But when I run start-dfs.sh, no error is shown while starting the namenode.
Yet when I type jps, only
10900 SecondaryNameNode
11047 Jps
10696 DataNode
seem to have started, but not the NameNode.
Things tried:
=> Removed the temp files and formatted the namenode with hadoop namenode -format
terminal:
blaze@blazian:/tmp$ start-dfs.sh
Starting namenodes on [localhost]
blaze@localhost's password:
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-blaze-namenode-blazian.out
blaze@localhost's password:
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-blaze-datanode-blazian.out
Starting secondary namenodes [0.0.0.0]
blaze@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-blaze-secondarynamenode-blazian.out
blaze@blazian:/tmp$ jps
10900 SecondaryNameNode
11047 Jps
10696 DataNode

You don't have passwordless SSH set up with localhost. Follow these steps and you'll be able to run the namenode.
Go to your system terminal and type:
cd (it will take you back to ~)
ssh-keygen (hit Enter three times and it will create a .ssh directory in ~)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys (this marks your localhost as a trusted source and allows passwordless SSH)
Then simply run start-all.sh and you're all set.
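For reference, here is the same setup as one copy-pasteable sequence. This is only a sketch, assuming the default key path, no existing key, and a running SSH server on localhost; the chmod step and the ssh test are extra sanity checks on top of the answer above.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa         # generate a key pair without prompting
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # trust your own key
chmod 600 ~/.ssh/authorized_keys                 # sshd ignores keys with looser permissions
ssh localhost 'echo ok'                          # should print ok without a password prompt (accept the host key if asked)
start-dfs.sh
jps                                              # NameNode should now appear in the list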

Related

Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS: Server Error

vijay@ubuntu:~$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as vijay in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: namenode is running as process 22733. Stop it first and ensure /tmp/hadoop-vijay-namenode.pid file is empty before retry.
Starting datanodes
localhost: datanode is running as process 22866. Stop it first and ensure /tmp/hadoop-vijay-datanode.pid file is empty before retry.
Starting secondary namenodes [ubuntu]
ubuntu: secondarynamenode is running as process 23072. Stop it first and ensure /tmp/hadoop-vijay-secondarynamenode.pid file is empty before retry.
Starting resourcemanager
Starting nodemanagers
vijay@ubuntu:~$ jps
23072 SecondaryNameNode
22866 DataNode
22733 NameNode
24447 Jps
I am facing a Hadoop web console error.
Currently installed: java version "19.0.1" 2022-10-18 and Hadoop 3.3.4
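Note that the "Stop it first and ensure ... .pid file is empty before retry" lines above describe their own fix: the daemons are already running or have left stale pid files behind. A hedged sketch of that sequence, using the pid-file paths printed in the warnings; it addresses only those warnings, not the web console error itself:
stop-all.sh                      # stop whatever is still running
rm -f /tmp/hadoop-vijay-*.pid    # clear the stale pid files the warnings point at
start-all.sh
jps                              # NameNode, DataNode and SecondaryNameNode should come up cleanly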

How to start Hadoop without it asking for the local machine password?

amtex@amtex-desktop:~$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
amtex@localhost's password:
localhost: starting namenode, logging to /home/amtex/Documents/installed/hadoop/logs/hadoop-amtex-namenode-amtex-desktop.out
amtex@localhost's password:
localhost: starting datanode, logging to /home/amtex/Documents/installed/hadoop/logs/hadoop-amtex-datanode-amtex-desktop.out
Starting secondary namenodes [0.0.0.0]
amtex@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /home/amtex/Documents/installed/hadoop/logs/hadoop-amtex-secondarynamenode-amtex-desktop.out
starting yarn daemons
starting resourcemanager, logging to /home/amtex/Documents/installed/hadoop/logs/yarn-amtex-resourcemanager-amtex-desktop.out
amtex@localhost's password:
localhost: starting nodemanager, logging to /home/amtex/Documents/installed/hadoop/logs/yarn-amtex-nodemanager-amtex-desktop.out
amtex@amtex-desktop:~$ jps
2404 Startup
18244 DataNode
18580 ResourceManager
18101 NameNode
18889 NodeManager
18425 SecondaryNameNode
18924 Jps
You need to set up passwordless login between the machines.
The link below has a step-by-step procedure for setting up SSH passwordless login:
http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
Hope this helps!
Based on my research, I followed these steps to avoid the above problem:
step 1: ssh-keygen -t rsa -P ""
step 2: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
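If the password prompt still appears after those two steps, it is usually a permissions problem: sshd ignores an authorized_keys file that is group- or world-writable. A small extra check on top of the two steps above:
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys
ssh localhost 'echo ok'    # should print ok with no password prompt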
With that in place, Hadoop now starts without asking for a password:
amtex@amtex-desktop:~$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/amtex/Documents/installed/hadoop/logs/hadoop-amtex-namenode-amtex-desktop.out
localhost: starting datanode, logging to /home/amtex/Documents/installed/hadoop/logs/hadoop-amtex-datanode-amtex-desktop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/amtex/Documents/installed/hadoop/logs/hadoop-amtex-secondarynamenode-amtex-desktop.out
starting yarn daemons
starting resourcemanager, logging to /home/amtex/Documents/installed/hadoop/logs/yarn-amtex-resourcemanager-amtex-desktop.out
localhost: starting nodemanager, logging to /home/amtex/Documents/installed/hadoop/logs/yarn-amtex-nodemanager-amtex-desktop.out
amtex@amtex-desktop:~$ start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /home/amtex/Documents/installed/spark/logs/spark-amtex-org.apache.spark.deploy.master.Master-1-amtex-desktop.out
amtex@amtex-desktop:~$ start-slaves.sh
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/amtex/Documents/installed/spark/logs/spark-amtex-org.apache.spark.deploy.worker.Worker-1-amtex-desktop.out
amtex@amtex-desktop:~$ jps
21523 Jps
2404 Startup
21029 NodeManager
20581 DataNode
20439 NameNode
20760 SecondaryNameNode
21353 Master
21466 Worker
20911 ResourceManager

Hadoop "not getting namenode with jps command" why namenode is not starting

hadoop1@xyzfsdemo:/usr/local/hadoop$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop1-namenode-xyzfsdemo.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop1-datanode-xyzfsdemo.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop1-secondarynamenode-xyzfsdemo.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop1-resourcemanager-xyzfsdemo.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop1-nodemanager-xyzfsdemo.out
hadoop1@xyzfsdemo:/usr/local/hadoop$ jps
52396 DataNode
53151 Jps
53072 NodeManager
52660 SecondaryNameNode
52860 ResourceManager
You need to format the namenode before starting it for the first time (note that formatting wipes any existing HDFS data, so don't repeat it on every start):
$HADOOP_INSTALL/hadoop/bin/hadoop namenode -format
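If the NameNode still does not show up in jps after that, the reason is almost always in its log file. A hedged check; the .log file sits next to the .out file named in the startup output above:
tail -n 50 /usr/local/hadoop/logs/hadoop-hadoop1-namenode-xyzfsdemo.log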

Running Hadoop in fully-distributed mode on a 5-machine cluster takes more time than on a single machine

I am running Hadoop on a cluster of 5 machines (1 master and 4 slaves). I am running a map-reduce algorithm for friends-in-common recommendation, and I am using a file with 49995 lines (49995 people, each followed by their friends).
The problem is that it takes more time to execute the algorithm on the cluster than on one machine!
I don't know whether this is normal (the file may not be big enough, so latency between machines dominates), or whether I must change something for the algorithm to run in parallel on the different nodes, but I think that is done automatically.
Typically, running the algorithm on one machine takes this:
real 3m10.044s
user 2m53.766s
sys 0m4.531s
While on the cluster it takes this time:
real 3m32.727s
user 3m10.229s
sys 0m5.545s
Here is the output when I execute the start-all.sh script on the master:
ubuntu@ip:/usr/local/hadoop-2.6.0$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-ubuntu-namenode-ip-172-31-37-184.out
slave1: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-ubuntu-datanode-slave1.out
slave2: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-ubuntu-datanode-slave2.out
slave3: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-ubuntu-datanode-slave3.out
slave4: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-ubuntu-datanode-slave4.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-ubuntu-secondarynamenode-ip-172-31-37-184.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-ubuntu-resourcemanager-ip-172-31-37-184.out
slave4: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-ubuntu-nodemanager-slave4.out
slave1: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-ubuntu-nodemanager-slave1.out
slave3: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-ubuntu-nodemanager-slave3.out
slave2: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-ubuntu-nodemanager-slave2.out
And here is the output when I execute the stop-all.sh script:
ubuntu@ip:/usr/local/hadoop-2.6.0$ sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave4: no datanode to stop
slave3: stopping datanode
slave1: stopping datanode
slave2: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave2: no nodemanager to stop
slave3: no nodemanager to stop
slave4: no nodemanager to stop
slave1: no nodemanager to stop
no proxyserver to stop
Thank you in advance!
One possible reason is that your file is not uploaded to HDFS. In other words, it is stored on a single machine, and all the other running machines have to get their data from that machine.
Before you run your MapReduce program, you can do the following steps:
1- Make sure that the HDFS is up and running. Open the link:
master:50070
where master is the IP of the node running the namenode, and check on that page that you have all the nodes live and running. So if you have 4 datanodes you should see: datanodes (4 live).
2- Call:
hdfs dfs -put yourfile /someFolderOnHDFS/yourfile
That way you have uploaded your input file to the HDFS and the data is now distributed among multiple nodes.
Try running your program now and see if it is faster.
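To confirm the data really is distributed rather than sitting on one node, a couple of standard checks can help (a hedged sketch; the path is just the placeholder used above):
hdfs dfsadmin -report                                          # lists the live datanodes and per-node usage
hdfs fsck /someFolderOnHDFS/yourfile -files -blocks -locations # shows which datanodes hold each block of the file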
Best of luck

Unable to initialize namenode, datanode, jobtracker, tasktracker in CentOS

When I give the command
for service in /etc/init.d/hadoop*
do
  sudo $service stop
done
it stops all the services,
and when I give
for service in /etc/init.d/hadoop-hdfs-*
do
  sudo $service stop
done
it stops all the services.
When I start them again, sometimes the datanode comes up and sometimes the namenode,
e.g.:
21270 NameNode
21422 Jps
21374 SecondaryNameNode
2624 HMaster
or
11070 DataNode
11422 Jps
11554 SecondaryNameNode
2554 HMaster
The same thing happens for the jobtracker and tasktracker.
I tried formatting the namenode but it didn't help.
I also changed the localhost port in
core-site.xml from 8020 to 50020
and also in mapred-site.xml from 8021 to 50020.
This time jps shows NameNode, DataNode, JobTracker and TaskTracker,
but when I check the browser at localhost:50070 and localhost:50030
it still refers to 8020 instead of 50020.
Why is this happening?
Please help.
Run the following script from the terminal to stop the running Hadoop daemons:
$HADOOP_INSTALL/hadoop/bin/stop-all.sh
Run the following script from the terminal to start the Hadoop daemons:
$HADOOP_INSTALL/hadoop/bin/start-all.sh
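After the restart, a quick sanity check (a sketch using the same $HADOOP_INSTALL convention as the answer above):
jps                                          # NameNode, DataNode, JobTracker and TaskTracker should all be listed
$HADOOP_INSTALL/hadoop/bin/hadoop fs -ls /   # confirms the client can reach the NameNode on the configured port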
