I'm trying to write a simple query in Hive (just an INSERT) but I'm having issues with how MapReduce jobs are being provisioned. Containers are getting allocated correctly, but my jobs never run.
It seems that they're contacting the ResourceManager incorrectly. I have verified (via jps) that my ResourceManager is indeed running, on hostname hadoop1.personal, which every server has an entry for in /etc/hosts. The issue looks like this:
2016-09-27 09:41:55,223 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-09-27 09:41:55,224 INFO [Socket Reader #1 for port 45744] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45744
2016-09-27 09:41:55,230 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-09-27 09:41:55,230 INFO [IPC Server listener on 45744] org.apache.hadoop.ipc.Server: IPC Server listener on 45744: starting
2016-09-27 09:41:55,299 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2016-09-27 09:41:55,300 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2016-09-27 09:41:55,300 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2016-09-27 09:41:55,375 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2016-09-27 09:41:56,414 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:57,415 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:58,415 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:59,416 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:42:00,417 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
And of course it does go on for some time before eventually dying.
Now, I know that my configurations are getting picked up in some sense. Earlier in the logs, the containers say 2016-09-27 09:41:52,783 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://hadoop1.personal:8020] which is the correct NameNode to be using.
Additionally, if I go to the NodeManager configuration (i.e. http://hadoop2.personal:8042/conf) then I can see that:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1.personal</value>
<source>yarn-site.xml</source>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
<source>yarn-default.xml</source>
</property>
So the NodeManager appears to know exactly where the ResourceManager is.
This seems incredibly strange to me: the NodeManager and the ResourceManager are talking to each other just fine, but the containers are contacting the wrong scheduler address. How do I control which address the containers contact for scheduling?
As a sidenote, I have tested this both with and without IPv6 enabled as recommended in this answer. No effect.
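For reference, 0.0.0.0 is the value yarn.resourcemanager.hostname has in yarn-default.xml when nothing overrides it, so an address of 0.0.0.0:8030 suggests that the configuration the container (the MapReduce ApplicationMaster) reads does not contain the override, even though the NodeManager's does. Below is a minimal sketch of pinning the value in the yarn-site.xml loaded by the client that submits the job (presumably the Hive host) as well as by every node. It assumes hadoop1.personal is the ResourceManager, as stated above. Whether a missing client-side override is the actual cause here is an assumption; the logs do not confirm it.

```xml
<!-- yarn-site.xml (sketch): needs to be visible to the job-submitting client,
     not only to the NodeManagers -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1.personal</value>
  </property>
  <property>
    <!-- Optional: the default already expands to
         ${yarn.resourcemanager.hostname}:8030 -->
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop1.personal:8030</value>
  </property>
</configuration>
```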
I am trying to set up 2 nodes in my Hadoop "cluster" in VirtualBox. In the master node, I have set up
The master's slaves config:
hadoop@Master:/usr/local/hadoop/etc/hadoop$ cat slaves
Master
Slave
Slave's hosts file
hadoop@Slave:/usr/local/hadoop/sbin$ cat /etc/hosts
127.0.0.1 localhost
192.168.56.102 Master
127.0.0.1 Slave
Slave's core-site.xml
hadoop@Slave:/usr/local/hadoop/etc/hadoop$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Slave:9000</value>
</property>
</configuration>
In master I have run:
hadoop@Master:/usr/local/hadoop/sbin$ ./start-dfs.sh
18/03/18 12:46:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out
Slave: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-Slave.out
Master: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-Master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-Master.out
18/03/18 12:46:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop@Master:/usr/local/hadoop/sbin$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-Master.out
Slave: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-Slave.out
Master: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-Master.out
hadoop@Master:/usr/local/hadoop/sbin$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-hadoop-historyserver-Master.out
Seems like no issues here? Then I go to http://localhost:8088/cluster/nodes
But there are no nodes. Why is this? Is there any other config I should post here?
Update: Checking out my logs ...
It seems like there may be some issues starting my nodes?
hadoop@Master:/usr/local/hadoop/sbin$ tail -n 100 /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out
ulimit -a for user hadoop
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 11659
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 11659
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
hadoop@Master:/usr/local/hadoop/sbin$ tail -n 100 /usr/local/hadoop/logs/hadoop-hadoop-datanode-Master.out
ulimit -a for user hadoop
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 11659
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 11659
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Update 2: Resource manager logs
2018-03-18 16:13:29,464 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2018-03-18 16:13:29,464 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
2018-03-18 16:13:29,464 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized root queue root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
2018-03-18 16:13:29,464 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue mappings, override: false
2018-03-18 16:13:29,465 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192, vCores:4>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
2018-03-18 16:13:29,497 INFO org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher: YARN system metrics publishing service is not enabled
2018-03-18 16:13:29,498 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state
2018-03-18 16:13:29,653 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating AMRMToken
2018-03-18 16:13:29,655 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
2018-03-18 16:13:29,655 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
2018-03-18 16:13:29,656 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-03-18 16:13:29,656 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 1
2018-03-18 16:13:29,657 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.
2018-03-18 16:13:29,660 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2018-03-18 16:13:29,660 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-03-18 16:13:29,660 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 2
2018-03-18 16:13:29,660 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.
2018-03-18 16:13:29,662 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.nodelabels.event.NodeLabelsStoreEventType for class org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler
2018-03-18 16:13:29,720 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 5000
2018-03-18 16:13:29,732 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031
2018-03-18 16:13:30,072 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
2018-03-18 16:13:30,086 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-18 16:13:30,086 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8031: starting
2018-03-18 16:13:30,271 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 5000
2018-03-18 16:13:30,278 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8030
2018-03-18 16:13:30,285 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
2018-03-18 16:13:30,285 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-18 16:13:30,286 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8030: starting
2018-03-18 16:13:30,386 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 5000
2018-03-18 16:13:30,386 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8032
2018-03-18 16:13:30,390 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server
2018-03-18 16:13:30,394 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-18 16:13:30,394 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8032: starting
2018-03-18 16:13:30,405 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
2018-03-18 16:13:30,686 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2018-03-18 16:13:30,705 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2018-03-18 16:13:30,711 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined
2018-03-18 16:13:30,733 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2018-03-18 16:13:30,742 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context cluster
2018-03-18 16:13:30,743 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context logs
2018-03-18 16:13:30,743 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context static
2018-03-18 16:13:30,743 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster
2018-03-18 16:13:30,744 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2018-03-18 16:13:30,744 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2018-03-18 16:13:30,746 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /cluster/*
2018-03-18 16:13:30,746 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2018-03-18 16:13:31,369 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2018-03-18 16:13:31,371 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8088
2018-03-18 16:13:31,371 INFO org.mortbay.log: jetty-6.1.26
2018-03-18 16:13:31,426 INFO org.mortbay.log: Extract jar:file:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.4.jar!/webapps/cluster to /tmp/Jetty_0_0_0_0_8088_cluster____u0rgz3/webapp
2018-03-18 16:13:31,788 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-03-18 16:13:31,794 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2018-03-18 16:13:31,794 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-03-18 16:13:33,623 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8088
2018-03-18 16:13:33,623 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app cluster started at 8088
2018-03-18 16:13:33,752 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100
2018-03-18 16:13:33,752 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8033
2018-03-18 16:13:33,755 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server
2018-03-18 16:13:33,755 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-18 16:13:33,759 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8033: starting
2018-03-18 16:13:34,915 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from Master doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
Is the problem with the Slave node?
hadoop@Slave:/usr/local/hadoop/logs$ tail -n 50 hadoop-hadoop-datanode-Slave.log
2018-03-18 16:18:54,090 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: Slave/127.0.0.1:9000
2018-03-18 16:19:00,093 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:01,094 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:02,095 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:03,096 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:04,097 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:05,101 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:06,102 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:07,104 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:08,106 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
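The DataNode log above shows the Slave resolving its own name to 127.0.0.1 (as its hosts file says) and looking for a NameNode on itself, while start-dfs.sh started the NameNode on the Master. Below is a sketch of a core-site.xml that both nodes could share. It assumes the NameNode is meant to run on Master and to listen on port 9000, as in the original file. The Slave's /etc/hosts would likely also need to map Slave to its real address rather than 127.0.0.1; that address is not given in the post, so it is left out here.

```xml
<!-- core-site.xml (sketch), identical on Master and Slave -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <!-- Point at the host that runs the NameNode, not at the local node -->
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
</configuration>
```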
yarn-site.xml
hadoop@Master:/usr/local/hadoop/etc/hadoop$ cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
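The yarn-site.xml above does not say where the ResourceManager is, so a NodeManager on another host falls back to the default of 0.0.0.0. Separately, the SHUTDOWN message in the ResourceManager log is emitted when a NodeManager registers with less memory or fewer vcores than the scheduler minimum (1024 MB and 1 vcore in the log above). Below is a sketch that addresses both. The values 2048 and 2 are placeholders, not taken from the post, and have to fit what the VM actually has.

```xml
<!-- yarn-site.xml (sketch), same file on Master and Slave -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- Placeholder: must be >= yarn.scheduler.minimum-allocation-mb
         (1024 in the log above) -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <!-- Placeholder: must be >= yarn.scheduler.minimum-allocation-vcores
         (1 in the log above) -->
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
</configuration>
```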
I'm trying to set up Hadoop as a single-node cluster (pseudo-distributed), following the Apache guide. Right now I'm trying to run a MapReduce job using the example it provides: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z]+'
hadoop@hadoop:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z]+'
xxxx-xx-xx xx:xx:xx,xxx INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
xxxx-xx-xx xx:xx:xx,xxx INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
xxxx-xx-xx xx:xx:xx,xxx INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
xxxx-xx-xx xx:xx:xx,xxx INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
xxxx-xx-xx xx:xx:xx,xxx WARN ipc.Client: Failed to connect to server: 0.0.0.0/0.0.0.0:8032: retries get failed due to exceeded maximum allowed retries number: 10
java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
Looking this issue up online, everyone else who has this problem seems to have it with YARN, not MapReduce. My hdfs-site.xml is the same as the one in the guide:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
I ran jps, although I don't know what I'm looking for here:
hadoop@hadoop:/usr/local/hadoop$ jps
9860 DataNode
10075 SecondaryNameNode
9708 NameNode
11021 Jps
Any help is appreciated.
Edit: I looked into hadoop-hadoop-resourcemanager-hadoop.log and found this:
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8032: starting
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
xxxx-xx-xx xx:xx:xx,xxx INFO org.eclipse.jetty.util.log: Logging initialized @7307ms
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context cluster
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context logs
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context static
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: adding path spec: /cluster/*
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
xxxx-xx-xx xx:xx:xx,xxx FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.ExceptionInInitializerError
at com.google.inject.internal.cglib.reflect.$FastClassEmitter.<init>(FastClassEmitter.java:67)
at com.google.inject.internal.cglib.reflect.$FastClass$Generator.generateClass(FastClass.java:72)
...
Edit2: here's my yarn-site.xml if it helps:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
I'm using Java 9, and Hadoop does not support Java 9 yet. https://issues.apache.org/jira/browse/HADOOP-11123
I am trying to set up a multi-node Hadoop cluster, but the datanode fails to start and I need help with this. The details are below. No other setup has been done apart from this. I have only one datanode and one namenode set up as of now.
NAMENODE setup -
CORE-SITE.xml
<property>
<name>fs.defult.name</name>
<value>hdfs://192.168.1.7:9000</value>
</property>
HDFS-SITE.XML
<property>
<name>dfs.name.dir</name>
<value>/data/namenode</value>
</property>
DATANODE setup -
CORE-SITE.xml
<property>
<name>fs.defult.name</name>
<value>hdfs://192.168.1.7:9000</value>
</property>
HDFS-SITE.XML
<property>
<name>dfs.data.dir</name>
<value>/data/datanode</value>
</property>
When I run the namenode it runs fine; however, when I try to run the datanode on the other machine, whose IP is 192.168.1.8, it fails and the log says:
2017-05-13 21:26:27,744 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2017-05-13 21:26:27,862 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2017-05-13 21:26:32,908 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:34,979 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:36,041 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:37,093 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:38,162 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:39,238 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
and the datanode dies.
Is there anything else to set up?
Let me know if any other details are required. Are there any other files to change? I am using CentOS 7 to set up the environment. I have also formatted the namenode more than 2-3 times, and the permissions are correct. It looks like a connectivity issue only; however, when I scp from master to slave (namenode to datanode), it works fine.
Please suggest any other setup needed to make this work!
There is a typo in the property name in your configuration. An 'a' is missing: fs.defult.name (vs. fs.default.name).
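With the spelling corrected, the core-site.xml entry on both machines would read as follows. fs.default.name is the older key; Hadoop 2 and later treat it as a deprecated alias of fs.defaultFS, so either name works there. The question does not state which Hadoop version is in use.

```xml
<property>
  <!-- Corrected key: "default", not "defult" -->
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.7:9000</value>
</property>
```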
I run select * from customers in Hive and I get the result.
Now when I run select count(*) from customers, the job fails. In JobHistory I found 4 failed maps.
And in the map log file I have this:
2016-10-19 12:47:09,725 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-10-19 12:47:09,786 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-10-19 12:47:09,786 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2016-10-19 12:47:09,796 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2016-10-19 12:47:09,796 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1476893269614_0006, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@18aabe9c)
2016-10-19 12:47:09,878 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2016-10-19 12:47:29,958 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 0 time(s); maxRetries=45
2016-10-19 12:47:30,961 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:31,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:32,963 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:36,971 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:37,975 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:38,976 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:46,992 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:47,993 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:48,994 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:50,999 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave1/192.168.1.33:37159. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-10-19 12:47:51,002 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.NoRouteToHostException: No Route to Host from master1/192.168.1.30 to slave1:37159 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1408)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:243)
at com.sun.proxy.$Proxy9.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:132)
Caused by: java.net.NoRouteToHostException: Aucun chemin d'accès pour atteindre l'hôte cible
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
at org.apache.hadoop.ipc.Client.call(Client.java:1447)
... 4 more
And in Cloudera Manager, under Hosts > slave1 > Processes > YARN NodeManager > Log File, I found two warnings:
WARN org.apache.hadoop.hdfs.BlockReaderFactory :
I/O error constructing remote block reader.
java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, self=/192.168.1.33:56208, remote=/192.168.1.30:50010, for file /user/admin/.staging/job_1476893269614_0001/libjars/hive-hbase-handler-1.1.0-cdh5.8.2.jar, for pool BP-1641388066-192.168.1.30-1476615377122 block 1073751347_10539
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:467)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:881)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:759)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:889)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:942)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
WARN org.apache.hadoop.hdfs.DFSClient :
Failed to connect to /192.168.1.30:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, self=/192.168.1.33:56208, remote=/192.168.1.30:50010, for file /user/admin/.staging/job_1476893269614_0001/libjars/hive-hbase-handler-1.1.0-cdh5.8.2.jar, for pool BP-1641388066-192.168.1.30-1476615377122 block 1073751347_10539
java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, self=/192.168.1.33:56208, remote=/192.168.1.30:50010, for file /user/admin/.staging/job_1476893269614_0001/libjars/hive-hbase-handler-1.1.0-cdh5.8.2.jar, for pool BP-1641388066-192.168.1.30-1476615377122 block 1073751347_10539
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:467)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:881)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:759)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:889)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:942)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Any help, please? I have been stuck on this issue for a long time now. Thank you!
You get a result from your first query, select * from customers, because Hive does not run a MapReduce job to produce it.
Are you sure about your Hadoop configuration?
Did you configure the hosts file?
How to configure hosts file for Hadoop ecosystem
The firewall on the nodes is active.
Stop the firewall on the node machines.
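Before turning the firewall off entirely, it can help to probe the exact TCP port from the machine that is failing, since a successful ping says nothing about whether a given port is reachable. The following is only a minimal Python sketch; the function name `probe_port` and its return strings are illustrative assumptions, not part of any Hadoop tooling.

```python
import errno
import socket


def probe_port(host, port, timeout=3.0):
    """Try a TCP connect and classify the outcome as a short string."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH is what Java reports as NoRouteToHostException.
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        return "error: %s" % exc
```

For example, calling probe_port with the host and port taken from the failing "Retrying connect to server" log line, from the node that wrote that line, shows what that node actually sees. If the result is "no route to host" while ping works, a firewall rule rejecting the connection is a likely cause, though that is an inference and not something the logs prove.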
Check your hosts file and your hostname file.
They should match; otherwise it will throw an error like yours.
sudo gedit /etc/hosts
======
hosts
======
127.0.0.1 localhost
127.0.0.1 orienit
sudo gedit /etc/hostname
hostname
========
orienit
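The comparison between the two files can also be scripted. This is a rough Python sketch that works on the text of the files; the helper names `parse_hosts` and `check_hostname` are made up for illustration.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its list of IPs."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, []).append(ip)
    return mapping


def check_hostname(hosts_text, hostname_text):
    """Return the IPs that the machine's hostname maps to in the hosts text."""
    hostname = hostname_text.strip()
    return parse_hosts(hosts_text).get(hostname, [])
```

To use it on a real machine, pass in the contents of /etc/hosts and /etc/hostname, for instance check_hostname(open("/etc/hosts").read(), open("/etc/hostname").read()). An empty list means the hostname has no entry in the hosts file. On a multi-node cluster, a loopback result such as 127.0.0.1 is also worth a second look, because daemons may then bind to an address that the other nodes cannot reach.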
I configured an Apache Hadoop cluster with 1 NameNode and 2 DataNodes in VMware Workstation. The NameNode is working fine and passwordless SSH login is set up, but when I try to start a DataNode I get the following error.
The logs on both DataNodes show repeated "Retrying connect" messages for the NameNode, even though I can ping and connect to the NameNode without errors.
Below is the DataNode log:
2015-11-14 19:54:22,622 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = dn2.hcluster.com/192.168.155.133
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_65
************************************************************/
2015-11-14 19:54:23,447 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-11-14 19:54:23,485 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2015-11-14 19:54:23,486 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-11-14 19:54:23,486 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2015-11-14 19:54:23,876 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2015-11-14 19:54:25,720 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:27,723 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:28,726 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:29,729 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:30,733 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:31,753 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:32,755 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:33,758 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:34,762 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:35,764 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.hcluster.com/192.168.155.131:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-11-14 19:54:35,922 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to nn1.hcluster.com/192.168.155.131:9000 failed on local exception: java.net.NoRouteToHostException: No route to host
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)
at org.apache.hadoop.ipc.Client.call(Client.java:1118)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:414)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:392)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:374)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:453)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:335)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:300)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
at org.apache.hadoop.ipc.Client.call(Client.java:1093)
... 16 more
2015-11-14 19:54:35,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at dn2.hcluster.com/192.168.155.133
************************************************************/
From DataNodes 1 and 2, the NameNode and its web UI are reachable, and all 3 machines can communicate with each other via ping or passwordless SSH. Please help.
core-site.xml on the NameNode:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://nn01.hcluster.com:9000</value>
</property>
</configuration>
Make sure your NameNode is running fine. Also check the machine IP and hostname in the /etc/hosts file.
Make sure that you have added the hostname "nn01.hcluster.com" there.
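It may also be worth confirming that the hostname in core-site.xml is the same one the DataNode is trying to reach. The sketch below pulls the host out of the default filesystem URI so it can be compared by eye or in a script. The function name `namenode_host` is hypothetical; it looks for `fs.default.name` as in the question, plus `fs.defaultFS`, the newer name for the same setting.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namenode_host(core_site_xml, keys=("fs.defaultFS", "fs.default.name")):
    """Return the host part of the default filesystem URI, or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        # Each <property> holds a <name> and a <value> child element.
        if prop.findtext("name", "").strip() in keys:
            return urlparse(prop.findtext("value", "").strip()).hostname
    return None
```

With the core-site.xml posted in the question, this returns nn01.hcluster.com, whereas the DataNode log shows connections going to nn1.hcluster.com. That difference could simply be a typo in the post, or it could mean the DataNodes are reading a different configuration, so it is worth checking the file on each node.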