Error in loading Hadoop Distributed File System

I installed Hadoop 3.3.4 on Ubuntu 20. I ran the command for starting Hadoop:
samar@pc:~$ $HADOOP_HOME/sbin/start-all.sh
It showed the following output:
WARNING: Attempting to start all Apache Hadoop daemons as samar in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [pc]
Starting resourcemanager
Starting nodemanagers
But when I tried to access HDFS with the command
samar@pc:~$ hdfs dfs -ls
it gave this message:
ls: Call From pc/127.0.1.1 to localhost:9000 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
and the output of jps was:
10485 Jps
10101 NodeManager
9946 ResourceManager
9739 SecondaryNameNode
9533 DataNode

The NameNode did not start successfully; jps does not list it, and 9000 is the NameNode's service port, which is why the connection to localhost:9000 is refused. Are there more logs?
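The NameNode's own log usually says why it exited. A minimal sketch of how to check (the log file name depends on your user and host names, so the path below is an assumption):
samar@pc:~$ tail -n 50 $HADOOP_HOME/logs/hadoop-samar-namenode-pc.log
If the log complains about a missing or unformatted metadata directory, formatting it and restarting often fixes a fresh pseudo-distributed install (caution: this erases all HDFS metadata):
samar@pc:~$ $HADOOP_HOME/sbin/stop-all.sh
samar@pc:~$ hdfs namenode -format
samar@pc:~$ $HADOOP_HOME/sbin/start-all.sh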

Related

Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS: Server Error

vijay@ubuntu:~$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as vijay in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: namenode is running as process 22733. Stop it first and ensure /tmp/hadoop-vijay-namenode.pid file is empty before retry.
Starting datanodes
localhost: datanode is running as process 22866. Stop it first and ensure /tmp/hadoop-vijay-datanode.pid file is empty before retry.
Starting secondary namenodes [ubuntu]
ubuntu: secondarynamenode is running as process 23072. Stop it first and ensure /tmp/hadoop-vijay-secondarynamenode.pid file is empty before retry.
Starting resourcemanager
Starting nodemanagers
vijay@ubuntu:~$ jps
23072 SecondaryNameNode
22866 DataNode
22733 NameNode
24447 Jps
I am facing a Hadoop web console error.
Currently installed: java version "19.0.1" (2022-10-18) and Hadoop 3.3.4.
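Hadoop 3.3.x is only validated against Java 8 and Java 11, so a Server Error from /webhdfs/v1/ under Java 19 may simply be a JVM incompatibility. A hedged sketch of the usual remedy, pointing Hadoop at a supported JDK (the install path is an assumption for Ubuntu):
# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Restart the daemons afterwards and retry the web console.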

How to install Hadoop on M1 Mac

I followed several tutorials, and every time I start Hadoop I get these messages:
feiyechen@FEIYEdeMac-mini ~ % start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as feiyechen in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
Starting datanodes
localhost: datanode is running as process 55832. Stop it first and ensure /tmp/hadoop-feiyechen-datanode.pid file is empty before retry.
Starting secondary namenodes [FEIYEdeMac-mini.local]
FEIYEdeMac-mini.local: secondarynamenode is running as process 55966. Stop it first and ensure /tmp/hadoop-feiyechen-secondarynamenode.pid file is empty before retry.
2022-01-28 20:35:24,311 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
feiyechen@FEIYEdeMac-mini ~ % jps
55832 DataNode
57838 Jps
55966 SecondaryNameNode
57247 NameNode
The tutorial said I should see more processes after running jps, but I only have 4 items: DataNode, Jps, SecondaryNameNode, and NameNode. Does that mean I failed?
It means you have a running HDFS installation, but not YARN.
You should be able to run start-yarn.sh separately if you want the ResourceManager + NodeManager.
Otherwise, there are log files created for both the YARN processes that would include information about why they are failing.
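A minimal sketch of both suggestions (log file names vary by user and host, so the glob below is an assumption):
feiyechen@FEIYEdeMac-mini ~ % start-yarn.sh
feiyechen@FEIYEdeMac-mini ~ % jps
feiyechen@FEIYEdeMac-mini ~ % tail -n 50 $HADOOP_HOME/logs/*resourcemanager*.log
After start-yarn.sh, jps should also list ResourceManager and NodeManager; if it doesn't, the log should say why.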

Everything else starting on Hadoop pseudo-distributed except namenode

I have Hadoop 2.9.0 on Ubuntu at
/usr/local/hadoop
When I run start-dfs.sh, no error is shown while starting the namenode. But when I type jps, only
10900 SecondaryNameNode
11047 Jps
10696 DataNode
seem to have started; the namenode is not there.
Things tried:
=> Removed temp files and formatted the namenode with hadoop namenode -format
terminal:
blaze@blazian:/tmp$ start-dfs.sh
Starting namenodes on [localhost]
blaze@localhost's password:
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-blaze-namenode-blazian.out
blaze@localhost's password:
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-blaze-datanode-blazian.out
Starting secondary namenodes [0.0.0.0]
blaze@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-blaze-secondarynamenode-blazian.out
blaze@blazian:/tmp$ jps
10900 SecondaryNameNode
11047 Jps
10696 DataNode
You don't have passwordless SSH set up with localhost (note the password prompts in your terminal output). Follow these steps and you'll be able to run the namenode.
Go to your system terminal and type:
cd (it will take you to ~)
ssh-keygen (hit Enter three times; it will create a .ssh directory with a key pair in ~)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys (it makes your localhost a trusted source and grants permission for passwordless SSH)
Then simply run start-all.sh and you're all set.
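Before retrying Hadoop, a quick check that the keys took effect (the first connection may ask you to confirm the host fingerprint):
ssh localhost
exit
If ssh localhost logs you in without a password prompt, start-all.sh should no longer stall on authentication.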

java.net.ConnectException: Connection refused when trying to use hdfs

I ran into a problem when trying to use the hdfs command:
root@ec2-35-205-125-85:~# hdfs dfs -copyFromLocal ~/input/ ~/input/
copyFromLocal: Call From ip-172-32-5-110.us-west-2.compute.internal/172.32.5.110 to localhost:54310 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
The problem happens not just with -copyFromLocal but with every command that starts with hdfs, for example -ls, -mkdir, and so on.
Running the following solved the problem:
bash /usr/local/hadoop/sbin/start-all.sh
After this, all the daemons should be running. I checked with jps, which showed the following:
2033 SecondaryNameNode
2778 Jps
2325 NodeManager
2195 ResourceManager
1691 NameNode
After that, run:
hdfs namenode -format
and the error message is gone.
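For reference, the address the client was trying to reach (localhost:54310) comes from the fs.defaultFS property (fs.default.name on older releases) in core-site.xml, so if the daemons are running and the error persists, check that the configured port matches what the NameNode actually listens on. A hedged way to inspect it (the config path is an assumption; older layouts keep it under conf/ instead of etc/hadoop/):
grep -B1 -A2 'fs.default' /usr/local/hadoop/etc/hadoop/core-site.xml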

Hadoop: no node available for block blk_-5883966349607013512_1099

I am very new to Hadoop. I start Hadoop with the following command...
[gpadmin@BigData1-ahandler root]$ /usr/local/hadoop-0.20.1/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-0.20.1/logs/hadoop-gpadmin-namenode-BigData1-ahandler.out
localhost: starting datanode, logging to /usr/local/hadoop-0.20.1/logs/hadoop-gpadmin-datanode-BigData1-ahandler.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-0.20.1/logs/hadoop-gpadmin-secondarynamenode-BigData1-ahandler.out
starting jobtracker, logging to /usr/local/hadoop-0.20.1/logs/hadoop-gpadmin-jobtracker-BigData1-ahandler.out
localhost: starting tasktracker, logging to /usr/local/hadoop-0.20.1/logs/hadoop-gpadmin-tasktracker-BigData1-ahandler.out
When I try to -cat the output from the following directory, I get a "no node available" error. What does this error mean? How can I fix it, or start debugging it?
[gpadmin@BigData1-ahandler root]$ hadoop fs -cat output/d*/part-*
13/11/13 15:33:09 INFO hdfs.DFSClient: No node available for block: blk_-5883966349607013512_1099 file=/user/gpadmin/output/d15795/part-00000
13/11/13 15:33:09 INFO hdfs.DFSClient: Could not obtain block blk_-5883966349607013512_1099 from any node: java.io.IOException: No live nodes contain current block
This happens when you start the datanodes before the namenode.
When the datanodes start before the namenode, the datanode services try to check in to the namenode and fail with "namenode not found". Then, once the namenode starts, it has no datanodes checked in, so it cannot find the node holding the block of data being accessed.
You should go through the script start-all.sh and make sure that the namenode starts before the datanodes.
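If editing the script is unappealing, the daemons can also be brought up by hand in the right order with the per-daemon scripts that ship with the 0.20.x layout shown above (a sketch, assuming the same install path):
/usr/local/hadoop-0.20.1/bin/stop-all.sh
/usr/local/hadoop-0.20.1/bin/hadoop-daemon.sh start namenode
/usr/local/hadoop-0.20.1/bin/hadoop-daemons.sh start datanode
/usr/local/hadoop-0.20.1/bin/hadoop-daemon.sh start secondarynamenode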
