No nodes in Hadoop setup

I am trying to set up 2 nodes in my Hadoop "cluster" in VirtualBox. On the master node, I have set up the following.
The master's slaves config:
hadoop#Master:/usr/local/hadoop/etc/hadoop$ cat slaves
Master
Slave
Slave's hosts file
hadoop#Slave:/usr/local/hadoop/sbin$ cat /etc/hosts
127.0.0.1 localhost
192.168.56.102 Master
127.0.0.1 Slave
Slave's core-site.xml
hadoop#Slave:/usr/local/hadoop/etc/hadoop$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Slave:9000</value>
</property>
</configuration>
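Should fs.defaultFS on the Slave point at the Master instead of at itself? A sketch of what I think the property should look like on every node, assuming the NameNode runs on Master:
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>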
On the master I have run:
hadoop#Master:/usr/local/hadoop/sbin$ ./start-dfs.sh
18/03/18 12:46:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out
Slave: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-Slave.out
Master: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-Master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-Master.out
18/03/18 12:46:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop#Master:/usr/local/hadoop/sbin$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-Master.out
Slave: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-Slave.out
Master: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-Master.out
hadoop#Master:/usr/local/hadoop/sbin$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-hadoop-historyserver-Master.out
Seems like no issues here? Then I go to http://localhost:8088/cluster/nodes.
But there are no nodes. Why is this? Is there any other config I should post here?
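To double-check outside the web UI, the YARN CLI can also list registered NodeManagers (a sketch; in my case I would expect it to come back empty):
hadoop#Master:/usr/local/hadoop$ bin/yarn node -list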
Update: Checking out my logs ...
It seems like there may be some issues starting my nodes?
hadoop#Master:/usr/local/hadoop/sbin$ tail -n 100 /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out
ulimit -a for user hadoop
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 11659
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 11659
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
hadoop#Master:/usr/local/hadoop/sbin$ tail -n 100 /usr/local/hadoop/logs/hadoop-hadoop-datanode-Master.out
ulimit -a for user hadoop
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 11659
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 11659
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Update 2: Resource manager logs
2018-03-18 16:13:29,464 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2018-03-18 16:13:29,464 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
2018-03-18 16:13:29,464 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized root queue root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
2018-03-18 16:13:29,464 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue mappings, override: false
2018-03-18 16:13:29,465 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192, vCores:4>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
2018-03-18 16:13:29,497 INFO org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher: YARN system metrics publishing service is not enabled
2018-03-18 16:13:29,498 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state
2018-03-18 16:13:29,653 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating AMRMToken
2018-03-18 16:13:29,655 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
2018-03-18 16:13:29,655 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
2018-03-18 16:13:29,656 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-03-18 16:13:29,656 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 1
2018-03-18 16:13:29,657 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.
2018-03-18 16:13:29,660 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2018-03-18 16:13:29,660 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-03-18 16:13:29,660 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 2
2018-03-18 16:13:29,660 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.
2018-03-18 16:13:29,662 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.nodelabels.event.NodeLabelsStoreEventType for class org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler
2018-03-18 16:13:29,720 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 5000
2018-03-18 16:13:29,732 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031
2018-03-18 16:13:30,072 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
2018-03-18 16:13:30,086 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-18 16:13:30,086 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8031: starting
2018-03-18 16:13:30,271 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 5000
2018-03-18 16:13:30,278 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8030
2018-03-18 16:13:30,285 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
2018-03-18 16:13:30,285 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-18 16:13:30,286 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8030: starting
2018-03-18 16:13:30,386 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 5000
2018-03-18 16:13:30,386 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8032
2018-03-18 16:13:30,390 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server
2018-03-18 16:13:30,394 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-18 16:13:30,394 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8032: starting
2018-03-18 16:13:30,405 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
2018-03-18 16:13:30,686 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2018-03-18 16:13:30,705 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2018-03-18 16:13:30,711 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined
2018-03-18 16:13:30,733 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2018-03-18 16:13:30,742 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context cluster
2018-03-18 16:13:30,743 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context logs
2018-03-18 16:13:30,743 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context static
2018-03-18 16:13:30,743 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster
2018-03-18 16:13:30,744 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2018-03-18 16:13:30,744 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2018-03-18 16:13:30,746 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /cluster/*
2018-03-18 16:13:30,746 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2018-03-18 16:13:31,369 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2018-03-18 16:13:31,371 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8088
2018-03-18 16:13:31,371 INFO org.mortbay.log: jetty-6.1.26
2018-03-18 16:13:31,426 INFO org.mortbay.log: Extract jar:file:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.4.jar!/webapps/cluster to /tmp/Jetty_0_0_0_0_8088_cluster____u0rgz3/webapp
2018-03-18 16:13:31,788 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-03-18 16:13:31,794 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2018-03-18 16:13:31,794 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-03-18 16:13:33,623 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:8088
2018-03-18 16:13:33,623 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app cluster started at 8088
2018-03-18 16:13:33,752 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100
2018-03-18 16:13:33,752 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8033
2018-03-18 16:13:33,755 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server
2018-03-18 16:13:33,755 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-18 16:13:33,759 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8033: starting
2018-03-18 16:13:34,915 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from Master doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
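From what I have read, this SHUTDOWN message can mean the NodeManager advertises less memory than yarn.scheduler.minimum-allocation-mb requires. If that is the cause, yarn-site.xml would need something like the following (the values are guesses for my small VMs):
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>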
Is the problem with the Slave node?
hadoop#Slave:/usr/local/hadoop/logs$ tail -n 50 hadoop-hadoop-datanode-Slave.log
2018-03-18 16:18:54,090 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: Slave/127.0.0.1:9000
2018-03-18 16:19:00,093 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:01,094 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:02,095 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:03,096 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:04,097 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:05,101 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:06,102 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:07,104 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-03-18 16:19:08,106 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave/127.0.0.1:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
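Here the DataNode resolves Slave to 127.0.0.1 (from my /etc/hosts) and keeps dialing itself on port 9000. If I understand correctly, the hosts file should map Slave to its real VirtualBox IP instead, something like this (192.168.56.103 is a placeholder for Slave's actual host-only adapter IP):
127.0.0.1 localhost
192.168.56.102 Master
192.168.56.103 Slave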
yarn-site.xml
hadoop#Master:/usr/local/hadoop/etc/hadoop$ cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
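One thing I notice is that this yarn-site.xml never names the ResourceManager host, so workers would fall back to the 0.0.0.0 defaults. A sketch of the property I suspect is missing, assuming the ResourceManager runs on Master:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>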

Related

Hadoop MapReduce job map 100% reduce 0% - reduce is not running and node manager is shutting down

I am trying to run the hadoop wordcount example after installation:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
I tried to run with default memory settings. The moment the map job finishes, the NodeManager shuts down and the reduce job cannot start. Find the logs below:
2022-03-08 16:18:22,557 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 14844 for container-id container_1646774131562_0001_01_000001: 320.0 MB of 2 GB physical memory used; 2.7 GB of 4.2 GB virtual memory used
2022-03-08 16:18:25,266 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Done waiting for Applications to be Finished. Still alive: [application_1646774131562_0001]
2022-03-08 16:18:25,266 INFO org.apache.hadoop.ipc.Server: Stopping server on 38731
2022-03-08 16:18:25,270 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2022-03-08 16:18:25,271 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 38731
2022-03-08 16:18:25,275 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2022-03-08 16:18:25,300 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
2022-03-08 16:18:25,302 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
2022-03-08 16:18:25,302 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2022-03-08 16:18:25,303 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2022-03-08 16:18:25,304 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2022-03-08 16:18:25,306 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2022-03-08 16:18:25,306 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2022-03-08 16:18:25,307 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at ubuntu2110/127.0.1.1
************************************************************/
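The log shows the container well under its 2 GB physical limit but using 2.7 GB of the 4.2 GB virtual-memory allowance, so one commonly suggested experiment is to relax YARN's virtual-memory check in yarn-site.xml (a sketch; whether it applies here is an assumption):
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>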

Hadoop MapReduce Unable to connect to ResourceManager

I'm trying to set up Hadoop with a single-node cluster (pseudo-distributed), using the Apache guide to do so. Right now I'm trying to run a MapReduce job using the example it provides: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z]+'
hadoop#hadoop:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z]+'
xxxx-xx-xx xx:xx:xx,xxx INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
xxxx-xx-xx xx:xx:xx,xxx INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
xxxx-xx-xx xx:xx:xx,xxx INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
xxxx-xx-xx xx:xx:xx,xxx INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
xxxx-xx-xx xx:xx:xx,xxx WARN ipc.Client: Failed to connect to server: 0.0.0.0/0.0.0.0:8032: retries get failed due to exceeded maximum allowed retries number: 10
java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
Looking this issue up online, everyone else that has this problem seems to have it with YARN but not MapReduce. And my hdfs-site.xml is the same as mentioned in the guide:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
I ran jps although I don't know what I'm looking for here:
hadoop#hadoop:/usr/local/hadoop$ jps
9860 DataNode
10075 SecondaryNameNode
9708 NameNode
11021 Jps
Any help is appreciated.
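For comparison, my understanding is that a working pseudo-distributed node would also list ResourceManager and NodeManager in jps once YARN is up, roughly like this (the expected output is an assumption):
hadoop#hadoop:/usr/local/hadoop$ sbin/start-yarn.sh
hadoop#hadoop:/usr/local/hadoop$ jps
... NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, Jps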
Edit: I looked into hadoop-hadoop-resourcemanager-hadoop.log and found this:
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8032: starting
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
xxxx-xx-xx xx:xx:xx,xxx INFO org.eclipse.jetty.util.log: Logging initialized #7307ms
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context cluster
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context logs
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context static
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: adding path spec: /cluster/*
xxxx-xx-xx xx:xx:xx,xxx INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
xxxx-xx-xx xx:xx:xx,xxx FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.ExceptionInInitializerError
at com.google.inject.internal.cglib.reflect.$FastClassEmitter.<init>(FastClassEmitter.java:67)
at com.google.inject.internal.cglib.reflect.$FastClass$Generator.generateClass(FastClass.java:72)
...
Edit2: here's my yarn-site.xml if it helps:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
I'm using Java 9 and for the time being there is no Java 9 support for Hadoop yet. https://issues.apache.org/jira/browse/HADOOP-11123
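Presumably the workaround is to point Hadoop at a Java 8 JDK until Java 9 support lands; a minimal sketch in etc/hadoop/hadoop-env.sh (the JDK path is machine-specific):
# hadoop-env.sh: use a Java 8 JDK instead of Java 9 (adjust the path)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64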

Yarn Container wrong hostname when contacting ResourceManager

I'm trying to write a simple query in Hive (just an INSERT) but I'm having issues with how MapReduce jobs are being provisioned. Containers are getting allocated correctly, but my jobs never run.
It seems that they're contacting the ResourceManager incorrectly. I have verified (via JPS) that my ResourceManager is indeed running, and is running on hostname hadoop1.personal which all servers have a reference to in /etc/hosts. The issue looks like this:
2016-09-27 09:41:55,223 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-09-27 09:41:55,224 INFO [Socket Reader #1 for port 45744] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45744
2016-09-27 09:41:55,230 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-09-27 09:41:55,230 INFO [IPC Server listener on 45744] org.apache.hadoop.ipc.Server: IPC Server listener on 45744: starting
2016-09-27 09:41:55,299 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2016-09-27 09:41:55,300 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2016-09-27 09:41:55,300 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2016-09-27 09:41:55,375 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2016-09-27 09:41:56,414 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:57,415 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:58,415 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:41:59,416 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-27 09:42:00,417 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
And of course it does go on for some time before eventually dying.
Now, I know that my configurations are getting picked up in some sense. Earlier in the logs, the containers say 2016-09-27 09:41:52,783 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://hadoop1.personal:8020] which is the correct NameNode to be using.
Additionally, if I go to the NodeManager configuration (i.e. http://hadoop2.personal:8042/conf) then I can see that:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1.personal</value>
<source>yarn-site.xml</source>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
<source>yarn-default.xml</source>
</property>
So the NodeManager appears to know exactly where it needs to be at.
This seems incredibly strange to me: The NodeManager and ResourceManagers are talking together just fine, but containers are contacting the wrong scheduler. How do I control the address the containers are contacting for scheduling?
As a sidenote, I have tested this both with and without IPv6 enabled as recommended in this answer. No effect.
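One thing worth checking is whether the yarn-site.xml that the containers actually read pins the scheduler address explicitly, rather than relying on the ${yarn.resourcemanager.hostname} substitution from yarn-default.xml; a sketch, assuming the ResourceManager host is hadoop1.personal:
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop1.personal:8030</value>
</property>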

Datanode and nodemanager appear in jps but aren't starting correctly

When I execute the jps command, the datanode and nodemanager appear, but it seems they are not starting correctly, because the logs show they aren't running properly.
On the namenode machine, when I format the namenode and start the cluster, I get a VERSION file created automatically in the namenode folder I set in hdfs-site.xml:
clusterID=CID-76572234-4ef7-4e6a-8ec5-1f54fe22b17d
cTime=0
storageType=NAME_NODE
blockpoolID=BP-141486958-10.17.0.88-1463916426343
layoutVersion=-63
But on the node where the datanode is, this file is not created; the datanode folder that I set in hdfs-site.xml is created but is empty inside. I don't know if that is normal.
Nodemanager log:
STARTUP_MSG: java = 1.8.0_91
************************************************************/
2016-05-22 11:41:11,219 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
2016-05-22 11:41:12,264 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
2016-05-22 11:41:12,265 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
2016-05-22 11:41:12,266 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
2016-05-22 11:41:12,266 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
2016-05-22 11:41:12,266 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
2016-05-22 11:41:12,267 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
2016-05-22 11:41:12,286 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
2016-05-22 11:41:12,286 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.NodeManager
2016-05-22 11:41:12,326 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-05-22 11:41:12,397 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-05-22 11:41:12,398 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system started
2016-05-22 11:41:12,420 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
2016-05-22 11:41:12,421 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
2016-05-22 11:41:12,421 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: per directory file limit = 8192
2016-05-22 11:41:12,478 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: usercache path : file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache_DEL_1463913672424
2016-05-22 11:41:12,529 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
2016-05-22 11:41:12,548 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin#2dfaea86
2016-05-22 11:41:12,548 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorProcessTree : null
2016-05-22 11:41:12,549 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Physical memory check enabled: true
2016-05-22 11:41:12,549 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Virtual memory check enabled: true
2016-05-22 11:41:12,552 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: NodeManager configured with 8 G physical memory allocated to containers, which is more than 80% of the total physical memory available (3.9 G). Thrashing might happen.
2016-05-22 11:41:12,557 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for null: physical-memory=8192 virtual-memory=17204 virtual-cores=8
2016-05-22 11:41:12,596 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-05-22 11:41:12,619 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 40484
2016-05-22 11:41:12,651 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
2016-05-22 11:41:12,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
2016-05-22 11:41:12,651 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-05-22 11:41:12,652 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 40484: starting
2016-05-22 11:41:12,661 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : ubuntuslave:40484
2016-05-22 11:41:12,668 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-05-22 11:41:12,669 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8040
2016-05-22 11:41:12,671 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
2016-05-22 11:41:12,672 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8040: starting
2016-05-22 11:41:12,672 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-05-22 11:41:12,673 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer started on port 8040
2016-05-22 11:41:12,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at ubuntuslave/10.17.0.89:40484
2016-05-22 11:41:12,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:0
2016-05-22 11:41:12,676 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:8042
2016-05-22 11:41:12,749 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-05-22 11:41:12,758 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-05-22 11:41:12,763 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined
2016-05-22 11:41:12,771 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-05-22 11:41:12,773 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node
2016-05-22 11:41:12,773 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-05-22 11:41:12,773 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-05-22 11:41:12,776 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /node/*
2016-05-22 11:41:12,777 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2016-05-22 11:41:12,786 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8042
2016-05-22 11:41:12,786 INFO org.mortbay.log: jetty-6.1.26
2016-05-22 11:41:12,813 INFO org.mortbay.log: Extract jar:file:/usr/local/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-common-2.7.1.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
2016-05-22 11:41:13,010 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:8042
2016-05-22 11:41:13,010 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2016-05-22 11:41:13,316 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2016-05-22 11:41:13,324 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at masternode/10.18.0.50:8031
2016-05-22 11:41:13,417 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2016-05-22 11:41:13,426 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2016-05-22 11:41:33,471 INFO org.apache.hadoop.ipc.Client: Retrying connect to server
Datanode log:
STARTUP_MSG: java = 1.8.0_91
************************************************************/
2016-05-22 11:40:40,852 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2016-05-22 11:40:41,523 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-05-22 11:40:41,607 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-05-22 11:40:41,607 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2016-05-22 11:40:41,612 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2016-05-22 11:40:41,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is ubuntuslave
2016-05-22 11:40:41,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
2016-05-22 11:40:41,644 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2016-05-22 11:40:41,646 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2016-05-22 11:40:41,646 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 5
2016-05-22 11:40:41,739 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-05-22 11:40:41,750 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-05-22 11:40:41,768 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
2016-05-22 11:40:41,776 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-05-22 11:40:41,779 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2016-05-22 11:40:41,780 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-05-22 11:40:41,780 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-05-22 11:40:41,796 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 52013
2016-05-22 11:40:41,796 INFO org.mortbay.log: jetty-6.1.26
2016-05-22 11:40:41,990 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup#localhost:52013
2016-05-22 11:40:42,109 INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:50075
2016-05-22 11:40:42,298 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hadoopadmin
2016-05-22 11:40:42,298 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
2016-05-22 11:40:42,343 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-05-22 11:40:42,361 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2016-05-22 11:40:42,388 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2016-05-22 11:40:42,400 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
2016-05-22 11:40:42,424 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2016-05-22 11:40:42,436 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to masternode/10.18.0.50:9000 starting to offer service
2016-05-22 11:40:42,444 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2016-05-22 11:40:42,445 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-05-22 11:41:02,555 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/10.18.0.50:9000. Already tried 0 time(s); maxRetries=45
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>masternode:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>masternode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>masternode:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>masternode:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>masternode:8088</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>masternode:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoopadmin/hadooptmp</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoopadmin/hadooptmp</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
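(I also notice that dfs.name.dir and dfs.data.dir point at the same folder here; the guides I have seen use separate directories, something like this, with placeholder paths:)
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoopadmin/hadoopdata/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoopadmin/hadoopdata/datanode</value>
</property>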
masters file:
masternode
slaves file:
ubuntuslave
Do you understand why it's not working?
After looking at the datanode logs, it looks like the error is coming from the NameNode. It seems that the NameNode is down, and since it is down, the DataNode is not able to start properly.
Here is what you can do (see the sketch after this list):
A. Start the NameNode.
B. Verify that the NameNode is running properly.
C. Start the DataNode and verify it started properly.
D. Run your Spark application.
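A minimal sketch of steps A-C on the command line (assuming $HADOOP_HOME/sbin and $HADOOP_HOME/bin are on the PATH):
# on masternode: start the NameNode and verify it responds
hadoop-daemon.sh start namenode
jps                     # should now list NameNode
hdfs dfsadmin -report   # should answer once the NameNode is up
# on ubuntuslave: start the DataNode and verify it registered
hadoop-daemon.sh start datanode
jps                     # should now list DataNode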

HADOOP datanode strange things

I think I must have some misunderstanding about the datanodes in a Hadoop cluster. I have a Hadoop virtual cluster composed of master, slave1, slave2, and slave3. Master and slave1 are on one physical machine, while slave2 and slave3 are on another physical machine. When I start the cluster, in the HDFS web UI I can only see three live datanodes: slave1, master, slave2. But sometimes the three live datanodes are master, slave1, slave3. That's strange. When I ssh to the unstarted datanode, even though I execute jps and find no datanode process, I can still copy and delete files on HDFS from this node.
So I believe I must not understand datanodes correctly. I have three questions here. 1. Is there one datanode per node? 2. Why can a node which is not a datanode still read and write on HDFS? 3. Can we decide the number of datanodes?
Here is the log of the unstarted datanode:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = slave11/192.168.111.31
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
2012-08-03 17:47:07,578 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-08-03 17:47:07,595 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-08-03 17:47:07,911 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-08-03 17:47:07,915 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-08-03 17:47:09,457 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 0 time(s).
2012-08-03 17:47:10,460 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 1 time(s).
2012-08-03 17:47:11,464 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 2 time(s).
2012-08-03 17:47:19,565 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2012-08-03 17:47:19,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
2012-08-03 17:47:19,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2012-08-03 17:47:24,721 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-08-03 17:47:24,854 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-08-03 17:47:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2012-08-03 17:47:24,953 INFO org.mortbay.log: jetty-6.1.26
2012-08-03 17:47:25,665 INFO org.mortbay.log: Started SelectChannelConnector#0.0.0.0:50075
2012-08-03 17:47:25,688 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-08-03 17:47:25,690 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2012-08-03 17:47:30,717 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2012-08-03 17:47:30,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave11:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)
2012-08-03 17:47:30,764 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
2012-08-03 17:47:30,766 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:30,774 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2012-08-03 17:47:30,778 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2012-08-03 17:47:30,772 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2012-08-03 17:47:30,795 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
2012-08-03 17:47:30,816 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 52ms
2012-08-03 17:47:30,838 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Generated rough (lockless) block report in 32 ms
2012-08-03 17:47:30,840 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 2 ms
2012-08-03 17:47:31,158 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-6072482390929551157_78209
2012-08-03 17:47:33,775 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 1 ms
2012-08-03 17:47:33,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 192.168.111.31:50010 is attempting to report storage ID DS-1062340636-127.0.0.1-50010-1339803955209. Node 192.168.111.32:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:4608)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:3460)
at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:1001)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy5.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:958)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,873 INFO org.mortbay.log: Stopped SelectChannelConnector#0.0.0.0:50075
2012-08-03 17:47:33,980 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,982 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):DataXceiveServer:java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:170)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:102)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,982 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50020
2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting DataXceiveServer
2012-08-03 17:47:33,983 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
2012-08-03 17:47:33,984 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Exiting DataBlockScanner thread.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:33,987 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=DataNodeInfo
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=DataNodeInfo
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataNode.unRegisterMXBean(DataNode.java:522)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:737)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2012-08-03 17:47:33,988 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:2067)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:799)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 WARN org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: AsyncDiskService has already shut down.
2012-08-03 17:47:33,989 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
There are problems with having several DataNodes under a single hostname. You say it is virtual, so are they on different virtual machines? If so, this shouldn't be a problem...
I would check the DataNode logs for slave2 and slave3 and see why one isn't booting. The error message will be printed there; it may say something along the lines of the port already being taken.
You don't need to be on a DataNode to access HDFS. The HDFS client (such as hadoop fs -put) directly communicates with the NameNode and other DataNode processes without ever having to access the local one.
It is actually quite common on large clusters to have a separate "query node" that has access to HDFS and MapReduce, but isn't running any DataNode or TaskTracker services.
As long as you have the Hadoop packages installed and the configuration files point to the NameNode and JobTracker correctly, you can access your cluster "remotely".
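In fact, the UnregisteredDatanodeException in the log above suggests exactly that duplicate-identity problem: two nodes (192.168.111.31 and 192.168.111.32) are reporting the same storage ID, which typically happens when VMs are cloned from one image. One common remedy, sketched below using the dirpath from the log, is to clear the affected node's data directory so it re-registers with a fresh storage ID. Note this deletes the block replicas stored locally on that node:
# on the node that fails to register (slave11):
hadoop-daemon.sh stop datanode
rm -rf /app/hadoop/tmp/dfs/data     # wipes local blocks; a new storageID is generated on restart
hadoop-daemon.sh start datanode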
