Hadoop DataNode IP isn't a real VM

I'm currently running a Hadoop setup with a NameNode (master-node, 10.0.1.86) and a DataNode (node1, 10.0.1.85) on two CentOS VMs.
When I run a Hive query that starts a MapReduce job, I get the following error:
"Application application_1515705541639_0001 failed 2 times due to
Error launching appattempt_1515705541639_0001_000002. Got exception:
java.net.NoRouteToHostException: No Route to Host from
localhost.localdomain/127.0.0.1 to 10.0.2.62:48955 failed on socket
timeout exception: java.net.NoRouteToHostException: No route to host;
For more details see: http://wiki.apache.org/hadoop/NoRouteToHost"
Where on earth is this IP of 10.0.2.62 coming from? It does not exist on my network and cannot be reached by ping or telnet.
I have gone through all the config files on both master-node and node1 and cannot find where this IP is being picked up. I've stopped and started both HDFS and YARN and rebooted both VMs, and both /etc/hosts files are as they should be. Any general direction on where to look next would be appreciated; I am stumped!

I didn't have any luck discovering where this rogue IP was coming from, so I ended up assigning the VM the IP address that the node-master was looking for. Sure enough, everything now works fine.
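For the record, a sketch of that workaround on a CentOS 7 VM managed by NetworkManager; the connection name (eth0) and the /24 prefix are assumptions to adjust for your network:

sudo nmcli con mod eth0 ipv4.method manual ipv4.addresses 10.0.2.62/24
sudo nmcli con up eth0
ip addr show eth0    # confirm the address took effect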

Related

Error when starting HDFS in Cloudera Manager - Address already in use when trying to bind to '/var/hdfs-sockets/dn'

I am getting an error after installation and I am not able to start the HDFS DataNode.
I always get this error:
Exception in secureMain
java.net.BindException: bind(2) error: Address already in use when trying to bind to '/var/hdfs-sockets/dn'
at org.apache.hadoop.net.unix.DomainSocket.bind0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:191)
I checked with netstat and did not find anything occupying port 50010, which is the port the DataNode runs on:
Opened streaming server at /10.0.9.6:50010.
I tried setting the parameter dfs.domain.socket.path to different paths:
/var/hdfs-sockets/dn
and
/var/hdfs-sockets
These folders exist on the NameNode server, and I also created them on the DataNode server.
I tried setting their ownership to the root user and also to the cloudera-scm user.
The same error is always thrown.
Can someone please tell me how to resolve this error? I cannot continue until the DataNode starts.
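One detail worth knowing here: bind(2) on a Unix domain socket fails with "Address already in use" whenever the target path already exists, so dfs.domain.socket.path should name a file that does not yet exist (only its parent directory should exist), not a pre-created folder. A hedged check, using the path from the error above:

ls -ld /var/hdfs-sockets/dn       # if this is a directory, that is the problem
sudo rm -rf /var/hdfs-sockets/dn  # remove it (only while the DataNode is stopped)
ls -ld /var/hdfs-sockets          # the parent dir must exist and be writable by the DataNode user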

Hadoop: hadoop fs -put error: [There are 2 datanode(s) running and 2 node(s) are excluded in this operation.]

I have installed Hadoop 2.6.5. When I try to put a file from local to HDFS, this exception occurs, and I don't know how to solve the problem. Need help...
This is going to be a networking problem. The client process (where you ran the hdfs dfs -put command) failed to connect to the DataNode hosts. I can tell from the stack trace that at this point, you have already passed the point of interacting with the NameNode, so connectivity from client to NameNode is fine.
I recommend treating this as a basic network-connectivity troubleshooting problem between the client and all DataNode hosts. Use tools like ping, nc, or telnet to test connectivity. If basic connectivity fails, resolve it by fixing the network configuration.
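A minimal sketch of that check, with hypothetical DataNode hostnames and the Hadoop 2.x default data-transfer port 50010 (see dfs.datanode.address in hdfs-site.xml for yours):

# Substitute your actual DataNode hostnames or IPs.
for host in datanode1 datanode2; do
  ping -c 1 "$host"        # basic reachability
  nc -zv "$host" 50010     # DataNode data-transfer port
done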

Hadoop | Archlinux | DFS: cannot launch start-dfs.sh

I have an issue with DFS in Hadoop. Does somebody know how to solve my problem?
[hduser@evghost ~]$ start-dfs.sh
Starting namenodes on [evghost]
Error: Please specify one of --hosts or --hostnames options and not both.
evghost: starting datanode, logging to /usr/lib/hadoop-2.7.1/logs/hadoop-hduser-datanode-evghost.out
Starting secondary namenodes [0.0.0.0]
Error: Please specify one of --hosts or --hostnames options and not both.
As you can see, something is wrong with the hosts and hostnames. I've been stuck on this for about two days and haven't found any solution on the internet. Please help.
It's an issue with DNS. If your hostname is anything other than 'localhost', you won't be able to deploy a pseudo-distributed DFS, because DNS won't return an IP address for your hostname. My hostname here was evghost; let's look:
[main@evghost ~]$ host evghost
Host evghost not found: 3(NXDOMAIN)
DNS gave no answer. There's no way around that short of running your own DNS server on your PC. Much pain, but I think it can work.
The solution is to put
localhost
in /etc/hostname and NOT anything else!
I spent two days understanding that. I hate this technology and like it at the same time.
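On a systemd-based distro like Arch, a sketch of that change (assuming you want the new hostname applied without a reboot):

sudo hostnamectl set-hostname localhost   # writes /etc/hostname and applies immediately
getent hosts "$(hostname)"                # verify the name now resolves via /etc/hosts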

java.net.ConnectException: Connection refused error when running Hive

I'm trying to work through a Hive tutorial in which I enter the following:
load data local inpath '/usr/local/Cellar/hive/0.11.0/libexec/examples/files/kv1.txt' overwrite into table pokes;
This results in the following error:
FAILED: RuntimeException java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
I see that there are some replies on SO having to do with configuring my IP address and localhost, but I'm not familiar with the concepts in those answers. I'd appreciate anything you can tell me about the fundamentals of what causes this kind of error and how to fix it. Thanks!
This is because Hive is not able to contact your NameNode.
Check whether your Hadoop services have started properly.
Run the command jps to see which services are running.
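For reference, a sketch of what jps might show on a healthy pseudo-distributed Hadoop 2.x install (PIDs are illustrative; Hadoop 1.x shows JobTracker and TaskTracker instead of ResourceManager and NodeManager):

4368 NameNode
4502 DataNode
4671 SecondaryNameNode
4823 ResourceManager
4941 NodeManager
5120 Jps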
The reason you get this error is that Hive needs Hadoop as its base, so you need to start Hadoop first.
Here are the steps:
Step 1: download Hadoop and unzip it
Step 2: cd #your_hadoop_path
Step 3: ./bin/hadoop namenode -format
Step 4: ./sbin/start-all.sh
Then go back to #your_hive_path and start Hive again.
An easy way I found is to edit the /etc/hosts file. By default it looks like:
127.0.0.1 localhost
127.0.1.1 user_user_name
Just change the 127.0.1.1 to 127.0.0.1; that's it. Restart your shell and restart your cluster with start-all.sh.
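After the edit, the file would look like this (user_user_name standing in for your actual machine name):

127.0.0.1 localhost
127.0.0.1 user_user_name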
Same question when setting up Hive.
I solved it by changing my /etc/hostname.
Formerly it contained my user_machine_name; after I changed it to localhost, everything went well.
I guess Hadoop wants to resolve your hostname using this /etc/hostname file, but it pointed to your user_machine_name while the Hadoop service was running on localhost.
I was able to resolve the issue by executing the command below:
start-all.sh
This ensures that the Hadoop services have started.
Starting Hive was then straightforward.
I had a similar problem with a connection timeout:
WARN DFSClient: Failed to connect to /10.165.0.27:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
The DFSClient was resolving the nodes by their internal IPs. Here's the solution for this:
.config("spark.hadoop.dfs.client.use.datanode.hostname", "true")

HDFS error: could only be replicated to 0 nodes, instead of 1

I've created an Ubuntu single-node Hadoop cluster in EC2.
Testing a simple file upload to HDFS works from the EC2 machine, but doesn't work from a machine outside of EC2.
I can browse the filesystem through the web interface from the remote machine, and it shows one DataNode which is reported as in service. I have opened all TCP ports in the security group from 0 to 60000(!), so I don't think it's that.
I get the error:
java.io.IOException: File /user/ubuntu/pies could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)
at org.apache.hadoop.ipc.Client.call(Client.java:905)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:928)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:811)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)
The NameNode log just shows the same error; the other logs don't seem to have anything interesting.
Any ideas?
Cheers
WARNING: The following will destroy ALL data on HDFS. Do not execute the steps in this answer unless you do not care about destroying existing data!!
You should do this:
stop all Hadoop services
delete the dfs/name and dfs/data directories
run hdfs namenode -format and answer with a capital Y
start the Hadoop services
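A minimal sketch of those steps, assuming the data lives under the default hadoop.tmp.dir of /tmp/hadoop-<user> (check dfs.name.dir and dfs.data.dir in your hdfs-site.xml first):

# DESTRUCTIVE: wipes all HDFS data
stop-all.sh
rm -rf /tmp/hadoop-$USER/dfs/name /tmp/hadoop-$USER/dfs/data
hdfs namenode -format    # answer Y when prompted
start-all.sh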
Also, check the disk space on your system and make sure the logs are not warning you about it.
This is your issue: the client can't communicate with the DataNode, because the IP that the client received for the DataNode is an internal IP and not the public IP. Take a look at this:
http://www.hadoopinrealworld.com/could-only-be-replicated-to-0-nodes/
Look at the source code of DFSClient$DFSOutputStream (Hadoop 1.2.1):
//
// Connect to first DataNode in the list.
//
success = createBlockOutputStream(nodes, clientName, false);
if (!success) {
    LOG.info("Abandoning " + block);
    namenode.abandonBlock(block, src, clientName);
    if (errorIndex < nodes.length) {
        LOG.info("Excluding datanode " + nodes[errorIndex]);
        excludedNodes.add(nodes[errorIndex]);
    }
    // Connection failed. Let's wait a little bit and retry
    retry = true;
}
The key thing to understand here is that the NameNode only provides the list of DataNodes on which to store the blocks; it does not write the data to the DataNodes itself. It is the client's job to write the data to the DataNodes using the DFSOutputStream. Before any write can begin, the code above makes sure the client can communicate with the DataNode(s); if communication with a DataNode fails, that DataNode is added to excludedNodes.
Look at the following:
Seeing this exception (could only be replicated to 0 nodes, instead of 1) means that no DataNode is available to the NameNode.
These are the cases in which a DataNode may be unavailable to the NameNode:
The DataNode's disk is full.
The DataNode is busy with its block report and block scanning.
The block size is a negative value (dfs.block.size in hdfs-site.xml).
While a write is in progress, the primary DataNode goes down (e.g., any network fluctuations between the NameNode and DataNode machines).
Whenever we append a partial chunk and call sync, then on subsequent partial-chunk appends the client should keep the previous data in its buffer.
For example, after appending "a" I called sync, and when I then try to append "b", the buffer should contain "ab".
On the server side, when the chunk is not a multiple of 512 bytes, it tries to compare the CRC of the data present in the block file with the CRC present in the meta file. But when constructing the CRC for the data present in the block, it always compares up to the initial offset. For deeper analysis, please see the DataNode logs.
Reference: http://www.mail-archive.com/hdfs-user@hadoop.apache.org/msg01374.html
I had a similar problem setting up a single-node cluster. I realized that I hadn't configured any DataNode. I added my hostname to conf/slaves, and then it worked out. Hope it helps.
I'll try to describe my setup and solution:
My setup: RHEL 7, hadoop-2.7.3
I tried to set up Standalone Operation first and then Pseudo-Distributed Operation, where the latter failed with the same issue.
However, when I started Hadoop with:
sbin/start-dfs.sh
I got the following:
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/<user>/hadoop-2.7.3/logs/hadoop-<user>-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /home/<user>/hadoop-2.7.3/logs/hadoop-<user>-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/<user>/hadoop-2.7.3/logs/hadoop-<user>-secondarynamenode-localhost.localdomain.out
which looks promising (starting datanode... with no failures), but in fact the DataNode did not exist.
Another indication was that no DataNode showed as being in operation (a snapshot, omitted here, showed the fixed working state).
I fixed the issue by doing:
rm -rf /tmp/hadoop-<user>/dfs/name
rm -rf /tmp/hadoop-<user>/dfs/data
and then starting again:
sbin/start-dfs.sh
...
I had the same error on Mac OS X 10.7 (hadoop-0.20.2-cdh3u0) because the DataNode was not starting.
start-all.sh produced the following output:
starting namenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
starting jobtracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
localhost: ssh: connect to host localhost port 22: Connection refused
After enabling SSH login via System Preferences -> Sharing -> Remote Login, it started to work.
The start-all.sh output changed to the following (note the start of the datanode):
starting namenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting datanode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting secondarynamenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
starting jobtracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting tasktracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
And I think you should make sure all the DataNodes are up before you copy to DFS; in some cases it takes a while. I think that's why the 'checking the health status' solution works: you go to the health-status web page and wait for everything to come up. My five cents.
It took me a week to figure out the problem in my situation.
When the client (your program) asks the NameNode for a data operation, the NameNode picks a DataNode and directs the client to it by giving the client the DataNode's IP.
But when the DataNode host is configured with multiple IPs and the NameNode gives you one that your client CANNOT ACCESS, the client adds that DataNode to the exclude list and asks the NameNode for a new one; eventually all DataNodes are excluded and you get this error.
So check the nodes' IP settings before you try everything else!
If all the DataNodes are running, one more thing to check is whether HDFS has enough space for your data. I could upload a small file but failed to upload a big file (30 GB) to HDFS. 'bin/hdfs dfsadmin -report' showed that each DataNode had only a few GB available.
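For example, to see the per-DataNode free space at a glance (field names as printed by the Hadoop 2.x report):

bin/hdfs dfsadmin -report | grep -E 'Name:|DFS Remaining'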
Have you tried the recommendations from the wiki? http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
I was getting this error when putting data into DFS. The solution is strange and probably inconsistent: I erased all temporary data along with the NameNode data, reformatted the NameNode, started everything up, and visited my "cluster's" DFS health page (http://your_host:50070/dfshealth.jsp). The last step, visiting the health page, was the only way I could get around the error. Once I had visited the page, putting and getting files in and out of DFS worked great!
Reformatting the node is not the solution. You will have to edit start-all.sh: start DFS, wait for it to come up completely, and then start MapReduce. You can do this with a sleep; waiting for 1 second worked for me. See the complete solution here: http://sonalgoyal.blogspot.com/2009/06/hadoop-on-ubuntu.html
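A sketch of that ordering fix (start-mapred.sh is the pre-YARN MapReduce starter; the 1-second sleep is the value that worked for the author):

start-dfs.sh
sleep 1            # let HDFS come fully up before MapReduce starts
start-mapred.sh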
I realize I'm a little late to the party, but I wanted to post this for future visitors of this page. I was having a very similar problem when I was copying files from local to HDFS, and reformatting the NameNode did not fix the problem for me. It turned out that my NameNode logs had the following error message:
2012-07-11 03:55:43,479 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-920118459-192.168.3.229-50010-1341506209533, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:883)
at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:491)
at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:462)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1628)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1514)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:113)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:381)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:171)
Apparently, this is a relatively common problem on Hadoop clusters, and Cloudera suggests increasing the nofile and epoll limits (if on kernel 2.6.27) to work around it. The tricky thing is that setting nofile and epoll limits is highly system-dependent. My Ubuntu 10.04 server required a slightly different configuration for this to work properly, so you may need to alter your approach accordingly.
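A sketch of what raising the open-file limit might look like; the user name and value below are illustrative, and as noted the details vary by system:

# Example /etc/security/limits.conf entries for the user running the DataNode:
#   hdfs  soft  nofile  16384
#   hdfs  hard  nofile  16384
ulimit -n          # show the current per-shell limit
ulimit -n 16384    # raise it for this session (subject to the hard limit)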
Don't format the NameNode immediately. Try stop-all.sh and then start everything again with start-all.sh. If the problem persists, go for formatting the NameNode.
Follow the steps below:
1. Stop dfs and yarn.
2. Remove the DataNode and NameNode directories as specified in core-site.xml.
3. Start dfs and yarn as follows:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
