Failed to read data from a hadoop URL - hadoop

I am following Tom White's 'Hadoop - The Definitive Guide'.
When I try to use the Java interface to read data from a Hadoop URL, I get the following error messages:
hadoop#ubuntu:/usr/local/hadoop$ hadoop URLCat hdfs://master/hdfs/data/SampleText.txt
12/11/21 13:46:32 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 0 time(s).
12/11/21 13:46:33 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 1 time(s).
12/11/21 13:46:34 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 2 time(s).
12/11/21 13:46:35 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 3 time(s).
12/11/21 13:46:36 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 4 time(s).
12/11/21 13:46:37 INFO ipc.Client: Retrying connect to server: master/192.168.9.55:8020. Already tried 5 time(s).
The contents of the URLCat file are as follows:
import java.net.URL;
import java.io.InputStream;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

public class URLCat {
    static {
        // Register the hdfs:// URL handler (may only be set once per JVM)
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
The /etc/hosts file contents are:
127.0.0.1 localhost
127.0.1.1 ubuntu.ubuntu-domain ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# /ect/hosts Master and slaves
192.168.9.55 master
192.168.9.56 slave1
192.168.9.57 slave2
192.168.9.58 slave3

First I'd check whether the Hadoop daemons are running. A convenient tool is jps. Make sure that (at least) the namenode and the datanodes are running.
If you still can't connect, check whether the URL is correct. Since you provided hdfs://master/ (without a port number), Hadoop assumes that your namenode listens on the default port 8020, which is what you see in the logs.
Next, take a quick look at core-site.xml (fs.default.name) to check whether a custom port is defined for the filesystem URI (for example 54310); if so, use that port in the URL.
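As a cross-check, you can also read the file through the FileSystem API instead of the URL stream handler, because it always takes the namenode address from the configuration on the classpath. Here is a minimal sketch along the lines of the book's FileSystemCat example (the URI below is illustrative; adjust host, port and path):
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. hdfs://master:54310/hdfs/data/SampleText.txt
        Configuration conf = new Configuration(); // reads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri)); // open the file on HDFS
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}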

Related

Hadoop datanode -> namenode communication issue

I have a Vagrant machine running a local Hadoop installation. Hadoop was working fine until today, when Vagrant's insecure SSH key stopped working and I had to replace it. Now Hadoop is not working. In the logs I see:
17/09/18 09:35:41 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 0 time(s); maxRetries=45
17/09/18 09:36:01 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 1 time(s); maxRetries=45
17/09/18 09:36:21 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 2 time(s); maxRetries=45
17/09/18 09:36:41 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 3 time(s); maxRetries=45
The claim here is that it's a datanode -> namenode communication issue. core-site.xml contains:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mymachine:8020</value>
</property>
</configuration>
Which is correct. Running getent hosts mymachine yields 192.168.33.10, so the hostname resolves fine. I tried sudo netstat -antp | grep 8020 and got:
tcp 0 1 10.0.2.15:42002 192.168.33.10:8020 SYN_SENT 2630/java
tcp 0 1 10.0.2.15:42004 192.168.33.10:8020 SYN_SENT 2772/java
tcp 0 1 10.0.2.15:41998 192.168.33.10:8020 SYN_SENT 3312/java
So it appears that the port is also ok. However, when I do curl http://mymachine:8020 I get no reply. I checked on an identical machine, and the correct reply should be "It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon."
Any ideas?
A few things to check, in my opinion:
1. Check whether you can ssh to localhost without a password.
2. Check the permissions of the user you start Hadoop as.
3. It should be 127.0.0.1:8020 if you are running a local Hadoop on your machine, because Hadoop can then keep working even while the network is disconnected.

Hadoop Multi-Node Cluster: exception: java.net.ConnectException: Connection refused

I have set up a 4-node Hadoop cluster using http://pingax.com/install-apache-hadoop-ubuntu-cluster-setup/:
Namenode: node04
Datanode: node01
Datanode: node02
Datanode: node03
I can see only two nodes (node01, node03) running in my cluster. Node02 has a log with the following error messages:
2015-12-11 10:15:18,698 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node04/127.17.0.224:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-12-11 10:15:19,699 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node04/127.17.0.224:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-12-11 10:15:20,699 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node04/127.17.0.224:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Every node's /etc/hosts contains the following:
127.0.0.1 localhost
127.17.0.221 node01
127.17.0.222 node02
127.17.0.223 node03
127.17.0.224 node04
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
And /etc/hadoop/masters contains node04, while /etc/hadoop/slaves contains node01, node02 and node03.
Could you please help me understand how to fix this?
Thanks!
Perform these actions:
Go to node02 and run the telnet node04 9000 and ping node04 commands
to confirm there is connectivity between node02 and node04 (see the probe sketch after this list if telnet is not installed).
On all nodes, check whether core-site.xml and hdfs-site.xml have the same contents.
Check ssh and sshd, and the ssh connection between the nodes.
Check the port binding details (Hadoop Datanodes cannot find NameNode).
Also refer to https://wiki.apache.org/hadoop/ServerNotAvailable
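If telnet is not available on the nodes, a small throwaway Java probe (a hypothetical helper, not part of Hadoop) performs the same connectivity check against the namenode RPC port:
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) throws Exception {
        // Defaults match this question's setup; pass host and port as arguments to override.
        String host = args.length > 0 ? args[0] : "node04";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5000); // 5 second timeout
            System.out.println("Connected to " + host + ":" + port);
        } catch (Exception e) {
            System.out.println("Cannot reach " + host + ":" + port + ": " + e.getMessage());
        }
    }
}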

hadoop connect error with put/copyFromLocal

I was following a tutorial to install hadoop: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Now I am stuck at the "Copy local example data to HDFS" step.
The connection error I get:
12/10/26 17:29:16 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s).
12/10/26 17:29:17 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
12/10/26 17:29:18 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s).
12/10/26 17:29:19 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s).
12/10/26 17:29:20 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s).
12/10/26 17:29:21 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s).
12/10/26 17:29:22 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 6 time(s).
12/10/26 17:29:23 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 7 time(s).
12/10/26 17:29:24 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 8 time(s).
12/10/26 17:29:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
which is pretty much the same as this question:
Errors while running hadoop
The point is, I have already disabled IPv6, as described there and in the tutorial above, but it doesn't help. Is there something I have been missing?
EDIT:
I repeated the tutorial on a second machine with a freshly installed Ubuntu and compared it step by step. It turned out there was a bug in the hduser's bashrc configuration. After fixing it, everything worked fine...
I get the exact same error message if I try to run hadoop fs <anything> when the DataNode/NameNode aren't running, so I would guess the same is happening for you.
Type jps in your terminal. If everything is running, it should look like:
16022 DataNode
16524 Jps
15434 TaskTracker
15223 JobTracker
15810 NameNode
16229 SecondaryNameNode
I would wager that your DataNode or NameNode isn't running. If anything is missing from jps's printout, start it again.
After the whole configuration, run this command:
hadoop namenode -format
and then start all services with this command:
start-all.sh
This should solve your problem.
1. Go to your etc/hadoop/core-site.xml and check the value of fs.default.name. It should be as shown below:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
2. After the whole configuration, format the namenode with this command:
hadoop namenode -format
3. Then start all services with this command:
start-all.sh
This should solve your problem.
Your namenode may be in safe mode; run bin/hdfs dfsadmin -safemode leave or bin/hadoop dfsadmin -safemode leave,
then follow step 2 and step 3.

unable to check nodes on hadoop [Connection refused]

If I type http://localhost:50070 or http://localhost:9000 to see the nodes, my browser shows me nothing; I think it can't connect to the server.
I tested my hadoop with this command:
hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
but that didn't work either, and it keeps trying to connect to the server; this is the output:
12/06/06 17:25:24 INFO mapred.FileInputFormat: nrFiles = 10
12/06/06 17:25:24 INFO mapred.FileInputFormat: fileSize (MB) = 1000
12/06/06 17:25:24 INFO mapred.FileInputFormat: bufferSize = 1000000
12/06/06 17:25:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
12/06/06 17:25:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
12/06/06 17:25:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
12/06/06 17:25:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
12/06/06 17:25:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
12/06/06 17:25:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
12/06/06 17:25:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
12/06/06 17:25:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
12/06/06 17:25:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
12/06/06 17:25:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
I changed some files like this:
in conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
in conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
in conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
Thanks for your attention. If I run this command:
cat /etc/hosts
I see:
127.0.0.1 localhost
127.0.1.1 ubuntu.ubuntu-domain ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
and if I run this one:
ps axww | grep hadoop
I see this result:
2170 pts/0 S+ 0:00 grep --color=auto hadoop
so apart from the grep itself, nothing Hadoop-related is running. Do you have any idea how I can solve my problem?
There are a few things that you need to take care of before starting the Hadoop services.
Check what this returns:
hostname --fqdn
In your case this should be localhost.
Also comment out the IPv6 entries in /etc/hosts.
Did you format the namenode before starting HDFS?
hadoop namenode -format
How did you install Hadoop? The location of the log files depends on that. Usually it is "/var/log/hadoop/" if you have used Cloudera's distribution.
If you are a complete newbie, I suggest installing Hadoop using Cloudera SCM which is quite easy. I have posted my approach in installing Hadoop with Cloudera's distribution.
Also, make sure the DFS location has write permission. It usually sits at /usr/local/hadoop_store/hdfs.
That is a common cause.
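A quick way to verify that from the JVM's point of view is a tiny check like the one below (the path is only an example; point it at whatever dfs.name.dir/dfs.data.dir are set to):
import java.io.File;

public class DfsDirCheck {
    public static void main(String[] args) {
        // Example path; replace with your actual HDFS storage directory.
        File dir = new File(args.length > 0 ? args[0] : "/usr/local/hadoop_store/hdfs");
        System.out.println(dir + " exists=" + dir.exists() + " writable=" + dir.canWrite());
    }
}
Run it as the user that starts the Hadoop daemons, since canWrite() reports permissions for the current user only.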
I got the same problem and this solved it:
the problem lies with the permissions given to the folders.
Use chmod 755 or greater for the folders under
/home/username/hadoop/*
Another possibility is the namenode is not running.
You can remove the HDFS files:
rm -rf /tmp/hadoop*
Reformat HDFS:
bin/hadoop namenode -format
And restart the Hadoop services:
bin/start-all.sh (Hadoop 1.x)
or
sbin/start-all.sh (Hadoop 2.x)
Also edit your /etc/hosts file and change 127.0.1.1 to 127.0.0.1; proper DNS resolution is very important for Hadoop and a bit tricky too. Also add the following property to your core-site.xml file:
<property>
<name>hadoop.tmp.dir</name>
<value>/path_to_temp_directory</value>
</property>
The default location for this property is the /tmp directory, which gets emptied after each system restart, so you lose all your information at each restart. Also add these properties to your hdfs-site.xml file:
<property>
<name>dfs.name.dir</name>
<value>/path_to_name_directory</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/path_to_data_directory</value>
</property>
I am assuming this is your first installation of Hadoop.
At the beginning, please check whether your daemons are working. To do that, use (in a terminal):
jps
If only jps appears, that means all the daemons are down. Please check the log files, especially the namenode's. The log folder is probably somewhere around /usr/lib/hadoop/logs.
If you have permission problems, use this guide during the installation:
Good installation guide
I am only guessing with these explanations, but these are the most common problems.
Edit your conf/core-site.xml and change localhost to 0.0.0.0. Use the configuration below; that should work.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:9000</value>
</property>
</configuration>

Errors while running hadoop

haduser#user-laptop:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/input
/user/haduser/input
11/12/14 14:21:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s).
11/12/14 14:21:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
11/12/14 14:21:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s).
11/12/14 14:21:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s).
11/12/14 14:21:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s).
11/12/14 14:21:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s).
11/12/14 14:21:06 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 6 time(s).
11/12/14 14:21:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 7 time(s).
11/12/14 14:21:08 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 8 time(s).
11/12/14 14:21:09 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
I am getting the above errors when I try to copy files from /tmp/input to /user/haduser/input, even though the file /etc/hosts contains an entry for localhost.
When the jps command is run, the TaskTracker and the namenode are not listed.
What could be the problem? Could someone please help me with this?
I had similar issues: Hadoop was actually binding to IPv6, even though I had disabled IPv6 on my system.
I added export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true to $HADOOP_HOME/conf/hadoop-env.sh.
Once I added it to the environment, everything started working fine.
Hope this helps someone.
Try to ssh to your local system using the IP, in this case:
$ ssh 127.0.0.1
Once you are able to ssh successfully, run the command below to see the list of open ports:
~$ lsof -i
Look for a listening connector with the name localhost:< PORTNAME > (LISTEN).
Copy this < PORTNAME > and replace the existing port number in the value of the fs.default.name property in core-site.xml in the Hadoop conf folder.
Save core-site.xml; this should resolve the issue.
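To confirm which filesystem URI the Hadoop client actually picks up after the edit, a minimal sketch like the one below can help (it assumes the Hadoop conf directory is on the classpath; the class name is just an example):
import org.apache.hadoop.conf.Configuration;

public class ShowDefaultFs {
    public static void main(String[] args) {
        Configuration conf = new Configuration(); // loads core-site.xml from the classpath
        // fs.default.name is the old key; newer releases use fs.defaultFS.
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    }
}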
The NameNode (NN) maintains the namespace for HDFS and must be running for filesystem operations on HDFS. Check the logs to see why the NN hasn't started. The TaskTracker is not required for operations on HDFS; the NN and DNs are sufficient. Check the http://goo.gl/8ogSk and http://goo.gl/NIWoK tutorials on how to set up Hadoop on a single node and on multiple nodes.
All the files in bin are executables. Just copy the command and paste it into the terminal. Make sure the address is right, i.e. the user placeholder must be replaced with your actual username. That should do the trick.
