Hadoop data node start failing - hadoop

I have a Hadoop cluster of 11 nodes. One node is acting as master node and 10 slave nodes running DATANODE & TASKTRACKERS.
TASK TRACKER is started on all the slave nodes.
DATANODE is only started 6 out of 10 nodes
Below is the log from /hadoop/logs/...Datanode....log file.
2014-12-03 17:55:05,057 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = trans9/192.168.0.16
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r
1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_65
************************************************************/
2014-12-03 17:55:05,371 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-12-03 17:55:05,384 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-12-03 17:55:05,385 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-12-03 17:55:05,385 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2014-12-03 17:55:05,776 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-12-03 17:55:05,789 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2014-12-03 17:55:08,850 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2014-12-03 17:55:08,865 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened data transfer server at 50010
2014-12-03 17:55:08,867 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2014-12-03 17:55:08,876 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2014-12-03 17:55:08,962 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-12-03 17:55:09,055 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-12-03 17:55:09,068 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2014-12-03 17:55:09,068 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2014-12-03 17:55:09,068 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2014-12-03 17:55:09,068 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2014-12-03 17:55:09,068 INFO org.mortbay.log: jetty-6.1.26
2014-12-03 17:55:09,804 INFO org.mortbay.log: Started SelectChannelConnector#0.0.0.0:50075
2014-12-03 17:55:09,812 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2014-12-03 17:55:09,813 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2014-12-03 17:55:09,893 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2014-12-03 17:55:09,894 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2014-12-03 17:55:09,895 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2014-12-03 17:55:09,903 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave9:50010, storageID=DS-551911532-192.168.0.31-50010-1417617118848, infoPort=50075, ipcPort=50020)
2014-12-03 17:55:09,914 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished generating blocks being written report for 1 volumes in 0 seconds
2014-12-03 17:55:09,933 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 5ms
2014-12-03 17:55:09,933 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.16:50010, storageID=DS-551911532-192.168.0.31-50010-1417617118848, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/home/ubuntu/hadoop/hadoop-data/dfs/data/current'}
2014-12-03 17:55:09,945 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-12-03 17:55:09,946 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2014-12-03 17:55:09,946 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2014-12-03 17:55:09,955 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2014-12-03 17:55:09,955 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2014-12-03 17:55:09,959 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2014-12-03 17:55:10,140 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 192.168.0.16:50010 is attempting to report storage ID DS-551911532-192.168.0.31-50010-1417617118848. Node 192.168.0.14:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:5049)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:3939)
at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:1095)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy3.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1084)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1588)
at java.lang.Thread.run(Thread.java:745)
2014-12-03 17:55:10,144 INFO org.mortbay.log: Stopped SelectChannelConnector#0.0.0.0:50075
2014-12-03 17:55:10,147 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2014-12-03 17:55:10,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2014-12-03 17:55:10,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2014-12-03 17:55:10,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2014-12-03 17:55:10,148 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50020
2014-12-03 17:55:10,148 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2014-12-03 17:55:10,149 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2014-12-03 17:55:10,149 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.16:50010, storageID=DS-551911532-192.168.0.31-50010-1417617118848, infoPort=50075, ipcPort=50020):DataXceiveServer:java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:248)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:100)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
at java.lang.Thread.run(Thread.java:745)
2014-12-03 17:55:10,149 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting DataXceiveServer
2014-12-03 17:55:10,149 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2014-12-03 17:55:10,150 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
2014-12-03 17:55:10,151 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down
2014-12-03 17:55:10,151 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.16:50010, storageID=DS-551911532-192.168.0.31-50010-1417617118848, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/home/ubuntu/hadoop/hadoop-data/dfs/data/current'}
2014-12-03 17:55:10,152 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=DataNodeInfo
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=DataNodeInfo
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataNode.unRegisterMXBean(DataNode.java:586)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:855)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1601)
at java.lang.Thread.run(Thread.java:745)
2014-12-03 17:55:10,152 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2014-12-03 17:55:10,152 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2014-12-03 17:55:10,153 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2014-12-03 17:55:10,153 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=FSDatasetState-DS-551911532-192.168.0.31-50010-1417617118848
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=FSDatasetState-DS-551911532-192.168.0.31-50010-1417617118848
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:2093)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:917)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1601)
at java.lang.Thread.run(Thread.java:745)
2014-12-03 17:55:10,159 WARN org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: AsyncDiskService has already shut down.
2014-12-03 17:55:10,159 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-12-03 17:55:10,166 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at trans9/192.168.0.16
************************************************************/

Oh, One system for name node and rest of the systems allocated to the data node is not recommendable. I recommend one system for Name node , another for Job Tracker, another system for secondary name node. Rest of the 8 systems should data node is highly recommended.
Come to your question. Format the name node to resolve this problem. Also re open Terminal. some time network also one of the reason.

This is due to incompatible cluster id problems, so format your datanode directory and start again.

Since your cluster is a tens of nodes, it is fine to have a single node with Namenode and JobTracker. Secondary Namenode should be on the other node as it requires more memory to perform periodic backups.
Coming to your question,
It could be the configuration file copying causing conflits.Same problem Answered here
And you can try copying the working datanode configuration with the appropriate changes.
If there is no data on the cluster, You can format the namenode, before this stop all the the daemons and then start again.
Hope it helps!

The fix would be as follows. This will not cause any data deletion as you would have set replication factor to more than 1.
The fix would be to check for the services running on the box. In this case, you will have tasktracker running on the box. Stop task tracker by running
hadoop-daemon stop tasktracker
Once this is done, go to the location you might have mentioned under dfs.data.dir property and delete all the files\folders here. Once this is done, run
hadoop-daemon start datanode
hadoop-daemon start tasktracker
This should bring up the datanode and the tasktracker. If you are successful in doing this, go to the namenode and run
hadoop dfsadmin -refreshNodes

Related

Hadoop Mapreduce job map 100% reduce 0% - rduction is not running and node manager is shutting down

I am trying to run the hadoop wordcount example after installation:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
I tried to run with default memory settings. The moment map job finishes nodemanager shutsdown and reduce job cannot start. Find the logs below:
2022-03-08 16:18:22,557 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 14844 for container-id container_1646774131562_0001_01_00
0001: 320.0 MB of 2 GB physical memory used; 2.7 GB of 4.2 GB virtual memory used
2022-03-08 16:18:25,266 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Done waiting for Applications to be Finished. Still alive: [application_1646774131562_0001]
2022-03-08 16:18:25,266 INFO org.apache.hadoop.ipc.Server: Stopping server on 38731
2022-03-08 16:18:25,270 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2022-03-08 16:18:25,271 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 38731
2022-03-08 16:18:25,275 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2022-03-08 16:18:25,300 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
2022-03-08 16:18:25,302 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
2022-03-08 16:18:25,302 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2022-03-08 16:18:25,303 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2022-03-08 16:18:25,304 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2022-03-08 16:18:25,306 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2022-03-08 16:18:25,306 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2022-03-08 16:18:25,307 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at ubuntu2110/127.0.1.1
************************************************************/

Datanode and nodemanger appear in jps but arent start correctly

when I execute jps command datanode and nodemanagger appears, but it seems that is not starting correctly, because if I check logs it seems that they arent running correctly.
In the namenode node when I format namenode and start the cluster I get a Version file in namenodefolder created automatically as I set in hdfs-site.xml:
clusterID=CID-76572234-4ef7-4e6a-8ec5-1f54fe22b17d
cTime=0
storageType=NAME_NODE
blockpoolID=BP-141486958-10.17.0.88-1463916426343
layoutVersion=-63
But in the node where is datanode this file its not created, the folder datanode that I set in hdfs-site.xml is created but its empty inside, I dont know if it is normal.
Nodemanager log:
STARTUP_MSG: java = 1.8.0_91
************************************************************/
2016-05-22 11:41:11,219 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
2016-05-22 11:41:12,264 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
2016-05-22 11:41:12,265 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
2016-05-22 11:41:12,266 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
2016-05-22 11:41:12,266 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
2016-05-22 11:41:12,266 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
2016-05-22 11:41:12,267 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
2016-05-22 11:41:12,286 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
2016-05-22 11:41:12,286 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.NodeManager
2016-05-22 11:41:12,326 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-05-22 11:41:12,397 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-05-22 11:41:12,398 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system started
2016-05-22 11:41:12,420 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
2016-05-22 11:41:12,421 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
2016-05-22 11:41:12,421 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: per directory file limit = 8192
2016-05-22 11:41:12,478 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: usercache path : file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache_DEL_1463913672424
2016-05-22 11:41:12,529 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
2016-05-22 11:41:12,548 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin#2dfaea86
2016-05-22 11:41:12,548 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorProcessTree : null
2016-05-22 11:41:12,549 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Physical memory check enabled: true
2016-05-22 11:41:12,549 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Virtual memory check enabled: true
2016-05-22 11:41:12,552 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: NodeManager configured with 8 G physical memory allocated to containers, which is more than 80% of the total physical memory available (3.9 G). Thrashing might happen.
2016-05-22 11:41:12,557 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for null: physical-memory=8192 virtual-memory=17204 virtual-cores=8
2016-05-22 11:41:12,596 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-05-22 11:41:12,619 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 40484
2016-05-22 11:41:12,651 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
2016-05-22 11:41:12,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
2016-05-22 11:41:12,651 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-05-22 11:41:12,652 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 40484: starting
2016-05-22 11:41:12,661 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : ubuntuslave:40484
2016-05-22 11:41:12,668 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-05-22 11:41:12,669 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8040
2016-05-22 11:41:12,671 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
2016-05-22 11:41:12,672 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8040: starting
2016-05-22 11:41:12,672 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-05-22 11:41:12,673 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer started on port 8040
2016-05-22 11:41:12,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at ubuntuslave/10.17.0.89:40484
2016-05-22 11:41:12,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:0
2016-05-22 11:41:12,676 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:8042
2016-05-22 11:41:12,749 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-05-22 11:41:12,758 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-05-22 11:41:12,763 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined
2016-05-22 11:41:12,771 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-05-22 11:41:12,773 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node
2016-05-22 11:41:12,773 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-05-22 11:41:12,773 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-05-22 11:41:12,776 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /node/*
2016-05-22 11:41:12,777 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2016-05-22 11:41:12,786 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8042
2016-05-22 11:41:12,786 INFO org.mortbay.log: jetty-6.1.26
2016-05-22 11:41:12,813 INFO org.mortbay.log: Extract jar:file:/usr/local/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-common-2.7.1.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
2016-05-22 11:41:13,010 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:8042
2016-05-22 11:41:13,010 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2016-05-22 11:41:13,316 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2016-05-22 11:41:13,324 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at masternode/10.18.0.50:8031
2016-05-22 11:41:13,417 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2016-05-22 11:41:13,426 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2016-05-22 11:41:33,471 INFO org.apache.hadoop.ipc.Client: Retrying connect to server
datanode log:
STARTUP_MSG: java = 1.8.0_91
************************************************************/
2016-05-22 11:40:40,852 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2016-05-22 11:40:41,523 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-05-22 11:40:41,607 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-05-22 11:40:41,607 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2016-05-22 11:40:41,612 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2016-05-22 11:40:41,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is ubuntuslave
2016-05-22 11:40:41,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
2016-05-22 11:40:41,644 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2016-05-22 11:40:41,646 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2016-05-22 11:40:41,646 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 5
2016-05-22 11:40:41,739 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-05-22 11:40:41,750 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-05-22 11:40:41,768 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
2016-05-22 11:40:41,776 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-05-22 11:40:41,779 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2016-05-22 11:40:41,780 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-05-22 11:40:41,780 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-05-22 11:40:41,796 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 52013
2016-05-22 11:40:41,796 INFO org.mortbay.log: jetty-6.1.26
2016-05-22 11:40:41,990 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup#localhost:52013
2016-05-22 11:40:42,109 INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:50075
2016-05-22 11:40:42,298 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hadoopadmin
2016-05-22 11:40:42,298 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
2016-05-22 11:40:42,343 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-05-22 11:40:42,361 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2016-05-22 11:40:42,388 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2016-05-22 11:40:42,400 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
2016-05-22 11:40:42,424 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2016-05-22 11:40:42,436 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to masternode/10.18.0.50:9000 starting to offer service
2016-05-22 11:40:42,444 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2016-05-22 11:40:42,445 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-05-22 11:41:02,555 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/10.18.0.50:9000. Already tried 0 time(s); maxRetries=45
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>masternode:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>masternode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>masternode:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>masternode:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>masternode:8088</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>masternode:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoopadmin/hadooptmp</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoopadmin/hadooptmp</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
masters file:
masternode
slaves file:
ubuntuslave
Do you understand why its not working?
After looking at datanode logs it looks like the error is coming because of NameNode. It seems that namenode is down. Since NameNode is down, Datanode is not able to start properly.
Here is what you can do :
A. start the namenode.
B. verify that namenode is running properly.
C. start the datanode. verify it started properly.
D. Run your spark application

Hadoop: Datanode process killed

I am currently using Hadoop-2.0.3-alpha and after I could work perfectly with HDFS (copying files into HDFS, getting success from an external framework, using the webfrontend), after a new start of my VM, the datanode process is stopping after a while. The namenode process and all yarn processes work without a problem. I installed Hadoop in a folder under an additional user, as I also still have installed Hadoop 0.2, which worked fine too.
Taking a look at the log-file of all datanode processes I got the following information:
2013-04-11 16:23:50,475 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-04-11 16:24:17,451 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-04-11 16:24:23,276 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-04-11 16:24:23,279 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-04-11 16:24:23,480 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is user-VirtualBox
2013-04-11 16:24:28,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2013-04-11 16:24:29,239 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2013-04-11 16:24:38,348 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-04-11 16:24:44,627 INFO org.apache.hadoop.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingIn putFilter)
2013-04-11 16:24:45,163 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFil ter$StaticUserFilter) to context datanode
2013-04-11 16:24:45,164 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFil ter$StaticUserFilter) to context logs
2013-04-11 16:24:45,164 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFil ter$StaticUserFilter) to context static
2013-04-11 16:24:45,355 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 0.0.0.0:50075
2013-04-11 16:24:45,508 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2013-04-11 16:24:45,536 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2013-04-11 16:24:45,576 INFO org.mortbay.log: jetty-6.1.26
2013-04-11 16:25:18,416 INFO org.mortbay.log: Started SelectChannelConnector#0.0.0.0:50075
2013-04-11 16:25:42,670 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2013-04-11 16:25:44,955 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2013-04-11 16:25:45,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
2013-04-11 16:25:47,079 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2013-04-11 16:25:47,660 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to localhost/127.0.0.1:8020 starting to offer service
2013-04-11 16:25:50,515 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-04-11 16:25:50,631 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2013-04-11 16:26:15,068 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/hadoop/workspace/hadoop_space/hadoop23/dfs/data/in_use.lock acquired by nodename 3099#user-VirtualBox
2013-04-11 16:26:15,720 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363) service to localhost/127.0.0.1:8020
java.io.IOException: Incompatible clusterIDs in /home/hadoop/workspace/hadoop_space/hadoop23/dfs/data: namenode clusterID = CID-1745a89c-fb08-40f0-a14d-d37d01f199c3; datanode clusterID = CID-bb3547b0-03e4-4588-ac25-f0299ff81e4f
at org.apache.hadoop.hdfs.server.datanode.DataStorage .doTransition(DataStorage.java:391)
at org.apache.hadoop.hdfs.server.datanode.DataStorage .recoverTransitionRead(DataStorage.java:191)
at org.apache.hadoop.hdfs.server.datanode.DataStorage .recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.in itStorage(DataNode.java:850)
at org.apache.hadoop.hdfs.server.datanode.DataNode.in itBlockPool(DataNode.java:821)
at org.apache.hadoop.hdfs.server.datanode.BPOfferServ ice.verifyAndSetNamespaceInfo(BPOfferService.java: 280)
at org.apache.hadoop.hdfs.server.datanode.BPServiceAc tor.connectToNNAndHandshake(BPServiceActor.java:22 2)
at org.apache.hadoop.hdfs.server.datanode.BPServiceAc tor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:722)
2013-04-11 16:26:16,212 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363) service to localhost/127.0.0.1:8020
2013-04-11 16:26:16,276 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363)
2013-04-11 16:26:18,396 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-04-11 16:26:18,940 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2013-04-11 16:26:19,668 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************** **********
SHUTDOWN_MSG: Shutting down DataNode at user-VirtualBox/127.0.1.1
************************************************** **********/
Any ideas? May be I made a mistake during the installation process? But it is strange, that it worked once. I also have to say, that if I am logged in as my additional user to execute the commands ./hadoop-daemon.sh start namenode and the same with the datanode, I need to add sudo.
I used this installation guide: http://jugnu-life.blogspot.ie/2012/0...rial-023x.html
By the way, I use the Oracle Java-7 version.
The problem could be that the namenode was formatted after the cluster was set up and the datanodes were not, so the slaves are still referring to the old namenode.
We have to delete and recreate the folder /home/hadoop/dfs/data on the local filesystem for the datanode.
Check your hdfs-site.xml file to see where dfs.data.dir is pointing to
and delete that folder
and then restart the datanode daemon on the machine
The steps above should recreate the folder and resolve the problem.
Please share your config info if the instructions above do not work.
DataNode dies because of incompatible Clusterids. To fix this problem
If you are using hadoop 2.X, then you have to delete everything in the folder that you have specified in hdfs-site.xml - "dfs.datanode.data.dir" (but NOT the folder itself).
The ClusterID will be maintained in that folder. Delete and restart dfs.sh. This should work!!!
You need to delete both
C:\hadoop\data\dfs\datanode and
C:\hadoop\data\dfs\namenode folders.
If you don't have this folders - open your C:\hadoop\etc\hadoop\hdfs-site.xml file and get paths for this folders for next deletion. For me it says:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/data/dfs/datanode</value>
</property>
Run command for Format namenodec:\hadoop\bin>hdfs namenode -format
Now it should work!
I think the recommended way of doing this without deleting the data directory is to simply change the clusterID variable in the datanode's VERSION file.
If you look in your daemons directory, you will see the datanode directory exmaple
data/hadoop/daemons/datanode
The VERSION file should look like this.
cat current/VERSION
#Tue Oct 14 17:31:58 CDT 2014
storageID=DS-23bf7f3a-085c-4531-808f-801ff6d52d14
clusterID=CID-bb3547b0-03e4-4588-ac25-f0299ff81e4f
cTime=0
datanodeUuid=63154929-ae68-4149-9f75-9a6558545041
storageType=DATA_NODE
layoutVersion=-55
You need to change the clusterId to the first value in the output of the message so in your case that would be CID-1745a89c-fb08-40f0-a14d-d37d01f199c3 instead of CID-bb3547b0-03e4-4588-ac25-f0299ff81e4f
The updated version should appear like this with the altered clusterId
cat current/VERSION
#Tue Oct 14 17:31:58 CDT 2014
storageID=DS-23bf7f3a-085c-4531-808f-801ff6d52d14
clusterID=CID-1745a89c-fb08-40f0-a14d-d37d01f199c3
cTime=0
datanodeUuid=63154929-ae68-4149-9f75-9a6558545041
storageType=DATA_NODE
layoutVersion=-55
Restart hadoop and the datanode should start just fine.

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception

I am getting an error in starting the data node while initiating the single node cluster set up on my machine
************************************************************/
2013-02-18 20:21:32,300 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = somnath-laptop/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
2013-02-18 20:21:32,593 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-02-18 20:21:32,618 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-02-18 20:21:32,620 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-02-18 20:21:32,620 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-02-18 20:21:33,052 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-02-18 20:21:33,056 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-02-18 20:21:37,890 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
at sun.nio.ch.IOUtil.read(IOUtil.java:224)
Any idea about how to resolve this error?
Ok got the problem solved.
Since I was using my single-node cluster through a network proxy, I had added the following property line to $HADOOP_HOME/conf/mapred-site.xml to by-pass proxy server while communicating across Hadoop daemons.
However, this time I was trying out on a direct internet connection, so had to comment out the property that I added in mapred-site.xml.
Below is the property from mapred-site.xml that I commented out:
<!--
<property>
<name>hadoop.rpc.socket.factory.class.default</name>
<value>org.apache.hadoop.net.StandardSocketFactory</value>
<final>true</final>
<description>
Prevent proxy settings set up by clients in their job configs from affecting our connectivity.
</description>
</property>
-->

HADOOP datanode strange things

I think I must have some misunderstanding about the datanodes in Hadoop Cluster. I have a hadoop virtural cluster composed of master,slave1,slave2, slave3. Master and slave1 are in a phsical machine while slave2 and slave3 are in one physical machine. When I start the cluster, in the HDFS webUI, I can only see three living datanodes, slave1,master, slave2. But sometimes, the three living datanodes are master, slave1,slave3. That's strange. I ssh to the unstarted data node, though I execute jps and found no datanode, I can still copy and delete files on HDFS on this node.
So I believe I must not understand datanode correctly. I have three questions here. 1 Is there one datanode per node? 2 Why the node which is not datanode can still read and write on HDFS? 3 can we decide the number of datanode?
Here is the log of the unstarted datanode:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = slave11/192.168.111.31
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch- 1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
2012-08-03 17:47:07,578 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-08-03 17:47:07,595 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-08-03 17:47:07,911 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-08-03 17:47:07,915 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-08-03 17:47:09,457 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 0 time(s).
2012-08-03 17:47:10,460 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 1 time(s).
2012-08-03 17:47:11,464 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 2 time(s).
2012-08-03 17:47:19,565 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2012-08-03 17:47:19,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
2012-08-03 17:47:19,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2012-08-03 17:47:24,721 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-08-03 17:47:24,854 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-08-03 17:47:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2012-08-03 17:47:24,953 INFO org.mortbay.log: jetty-6.1.26
2012-08-03 17:47:25,665 INFO org.mortbay.log: Started SelectChannelConnector#0.0.0.0:50075
2012-08-03 17:47:25,688 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-08-03 17:47:25,690 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2012-08-03 17:47:30,717 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2012-08-03 17:47:30,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave11:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)
2012-08-03 17:47:30,764 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
2012-08-03 17:47:30,766 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:30,774 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2012-08-03 17:47:30,778 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2012-08-03 17:47:30,772 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2012-08-03 17:47:30,795 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
2012-08-03 17:47:30,816 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 52ms
2012-08-03 17:47:30,838 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Generated rough (lockless) block report in 32 ms
2012-08-03 17:47:30,840 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 2 ms
2012-08-03 17:47:31,158 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-6072482390929551157_78209
2012-08-03 17:47:33,775 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 1 ms
2012-08-03 17:47:33,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 192.168.111.31:50010 is attempting to report storage ID DS-1062340636-127.0.0.1-50010-1339803955209. Node 192.168.111.32:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:4608)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:3460)
at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:1001)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy5.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:958)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,873 INFO org.mortbay.log: Stopped SelectChannelConnector#0.0.0.0:50075
2012-08-03 17:47:33,980 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,982 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):DataXceiveServer:java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:170)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:102)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,982 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50020
2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting DataXceiveServer
2012-08-03 17:47:33,983 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
2012-08-03 17:47:33,984 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Exiting DataBlockScanner thread.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:33,987 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=DataNodeInfo
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=DataNodeInfo
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataNode.unRegisterMXBean(DataNode.java:522)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:737)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2012-08-03 17:47:33,988 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:2067)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:799)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 WARN org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: AsyncDiskService has already shut down.
2012-08-03 17:47:33,989 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
There are problems having several DataNodes per single hostname. You say it is virtual, so are they on different virtual machines? If so, this shouldn't be a problem...
I would check the DataNode logs for slave2 and slave3 and see why one isn't booting. The error message will be printed there. If the error says something along the lines of the port being taken or something like that.
You don't need to be on a DataNode to access HDFS. The HDFS client (such as hadoop fs -put) directly communicates with the NameNode and other DataNode processes without ever having to access the local one.
It is actually quite common on large clusters to have a separate "query node" that has access to HDFS and MapReduce, but isn't running any DataNode or TaskTracker services.
As long as you have the Hadoop packages installed and the configuration files point to the NameNode and JobTracker correctly, you can access your cluster "remotely".

Resources