Hadoop MapReduce Yarn example - hadoop

I am following this tutorial.
I run hadoop on virtual machine with ubuntu 14.04 32bit installed. I tried many configurations and solutions for similar problems but it didn't work.
When I execute jar, I will get this error:
hduser#branislav-vm:/usr/local/hadoop$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
15/12/15 23:40:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/15 23:40:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/15 23:40:10 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/hduser/.staging/job_1450251211211_0001/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3110)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3034)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
15/12/15 23:40:10 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hduser/.staging/job_1450251211211_0001
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/hduser/.staging/job_1450251211211_0001/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3110)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3034)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
JPS:
hduser#branislav-vm:/usr/local/hadoop$ jps
15120 NameNode
16481 Jps
15458 SecondaryNameNode
15926 NodeManager
15799 ResourceManager
EDIT1:
I removed tmp and logs folder and lets do that again.
$ rm -rf /usr/local/hadoop_tmp/hdfs/namenode/
$ rm -rf /usr/local/hadoop_tmp/hdfs/datanode/
$ rm -rf /usr/local/hadoop/output
$ sudo rm -rf /tmp/hadoop-hduser/dfs
$ rm -rf /usr/local/hadoop/logs
$ start-dfs.sh
15/12/16 02:38:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-branislav-vm.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-branislav-vm.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-branislav-vm.out
15/12/16 02:39:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$ jps
15923 DataNode
16206 Jps
16095 SecondaryNameNode
Now NameNode did not start. This is hadoop-hduser-namenode-branislav-vm.out
Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
and .log file from first WARN:
2015-12-16 02:39:04,417 WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /usr/local/hadoop_tmp/hdfs/namenode does not exist
2015-12-16 02:39:04,422 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_tmp/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:644)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
2015-12-16 02:39:04,438 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:50070
2015-12-16 02:39:04,439 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2015-12-16 02:39:04,439 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2015-12-16 02:39:04,439 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2015-12-16 02:39:04,439 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_tmp/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:644)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
2015-12-16 02:39:04,441 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-12-16 02:39:04,446 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at branislav-vm/127.0.1.1
************************************************************/
YARN successfully starts.

Related

hadoop datanode failed to start with invalid location

all the services are running except datanode. when i start datanode, i get the errors below.
> ************************************************************/
2017-09-05 10:19:06,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2017-09-05 10:19:06,675 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data1/hdfs/dn should be specified as a URI in configuration files. Please update hdfs configuration.
2017-09-05 10:19:06,675 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data2/hdfs/dn should be specified as a URI in configuration files. Please update hdfs configuration.
2017-09-05 10:19:06,679 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data3/hdfs/dn should be specified as a URI in configuration files. Please update hdfs configuration.
2017-09-05 10:19:07,441 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hadoop/hzes#HZ.DATA.COM using keytab file /opt/keytab/hadoop.keytab
2017-09-05 10:19:08,208 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid dfs.datanode.data.dir /data1/hdfs/dn :
java.io.FileNotFoundException: File file:/data1/hdfs/dn does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:598)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:425)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:139)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2516)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2558)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2540)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2432)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2479)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2661)
at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
...............
2017-09-05 10:19:08,218 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: All directories in dfs.datanode.data.dir are invalid: "/data1/hdfs/dn" "/data2/hdfs/dn" "/data3/hdfs/dn"
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2567)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2540)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2432)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2479)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2661)
at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
2017-09-05 10:19:08,221 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-09-05 10:19:08,233 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
my relative configuration:
<name>dfs.datanode.data.dir</name>
<value>/data1/hdfs/dn,/data2/hdfs/dn,/data3/hdfs/dn</value>
why? i runinig all services with root
when i change the configuration to another location(eq: /home/hadoop), it works.
What are the permissions and owner of /data1/hdfs/dn,/data2/hdfs/dn,/data3/hdfs/dn directories.
You don't have to mention /hdfs/dn/, it will automatically create /dfs/dn/.

Call From kv.local/172.20.12.168 to localhost:8020 failed on connection exception, when using tera gen

I am working with hadoop teragen to check the hadoop mapreduce benchmarking with the terasort.
But when i run the following command,
hadoop jar /Users/**/Documents/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar teragen -Dmapreduce.job.maps=100 1t random-data
I got the following exception,
17/06/01 15:09:21 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
17/06/01 15:09:22 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
17/06/01 15:09:23 INFO terasort.TeraSort: Generating -727379968 using 100
17/06/01 15:09:23 INFO mapreduce.JobSubmitter: number of splits:100
17/06/01 15:09:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1496303775726_0003
17/06/01 15:09:23 INFO impl.YarnClientImpl: Submitted application application_1496303775726_0003
17/06/01 15:09:23 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1496303775726_0003/
17/06/01 15:09:23 INFO mapreduce.Job: Running job: job_1496303775726_0003
17/06/01 15:09:27 INFO mapreduce.Job: Job job_1496303775726_0003 running in uber mode : false
17/06/01 15:09:27 INFO mapreduce.Job: map 0% reduce 0%
17/06/01 15:09:27 INFO mapreduce.Job: Job job_1496303775726_0003 failed with state FAILED due to: Application application_1496303775726_0003 failed 2 times due to AM Container for appattempt_1496303775726_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://localhost:8088/proxy/application_1496303775726_0003/Then, click on links to logs of each attempt.
Diagnostics: Call From KV.local/172.20.12.168 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
java.net.ConnectException: Call From KV.local/172.20.12.168 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1473)
at org.apache.hadoop.ipc.Client.call(Client.java:1400)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1977)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:706)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:369)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1522)
at org.apache.hadoop.ipc.Client.call(Client.java:1439)
... 31 more
As the error show, it is not able to connect to localhost:8020, but when i chech the namenode web UI, it shows that the namenode is active. Please see the below screenshot:
I found many posts related to this, but none helped me out. I also checked out the hosts file, which contains the following line:
127.0.0.1 localhost
172.20.12.168 localhost
Can anybody help me out sorting out this problem?
The following procedure helped me out in solving the issue:
Stop all the services.
Delete namenode and datanode directories as specified in hdfs-site.xml.
Create new namenode and datanode directories and modify hdfs-site.xml accordingly.
in core-site.xml, make the following changes or add the following properties:
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.20.12.168/</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://172.20.12.168:8020</value>
</property>
Make the following changes in hadoop-2.6.4/etc/hadoop/hadoop-env.sh file:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home
Restart dfs, yarn and mr as follows:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

Hadoop Streaming Error No such file or directory

I study Hadoop and test Hadoop Streaming using Ruby whether my MapReduce algorithm could work without error.
So, I did next command.
hadoop jar hadoop-streaming-2.7.2.jar -files mapper.rb,reducer.rb -mapper mapper.rb -reducer reducer.rb -input test.json -output test
But, next error occurred.
dir/usercache/Kuma/appcache/application_1469093819516_0005/container_1469093819516_0005_01_000002/./mapper.rb": error=2, No such file or directory
Also, next is one job all error.
16/07/21 19:22:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/var/folders/l_/21nh6rmn6f3fn3vd55kh0b5h0000gn/T/hadoop-unjar8708804571112102348/] [] /var/folders/l_/21nh6rmn6f3fn3vd55kh0b5h0000gn/T/streamjob5933629893966751936.jar tmpDir=null
16/07/21 19:22:05 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/07/21 19:22:05 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/07/21 19:22:06 INFO mapred.FileInputFormat: Total input paths to process : 1
16/07/21 19:22:06 INFO mapreduce.JobSubmitter: number of splits:2
16/07/21 19:22:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469093819516_0005
16/07/21 19:22:06 INFO impl.YarnClientImpl: Submitted application application_1469093819516_0005
16/07/21 19:22:06 INFO mapreduce.Job: The url to track the job: http://hatanokaoruakira-no-MacBook-Air.local:8088/proxy/application_1469093819516_0005/
16/07/21 19:22:06 INFO mapreduce.Job: Running job: job_1469093819516_0005
16/07/21 19:22:13 INFO mapreduce.Job: Job job_1469093819516_0005 running in uber mode : false
16/07/21 19:22:13 INFO mapreduce.Job: map 0% reduce 0%
16/07/21 19:22:18 INFO mapreduce.Job: Task Id : attempt_1469093819516_0005_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/private/tmp/hadoop-Kuma/nm-local-dir/usercache/Kuma/appcache/application_1469093819516_0005/container_1469093819516_0005_01_000002/./mapper.rb": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 24 more
mapper.rb and reducer.rb exist at current directory which executed the command.
wordcount MapReduce for testing is running without error, so I think that Hadoop setting is ok.
Environment
Mac El Capitan
Hadoop 2.7.2(presudo distributed mode)
files specified with -files options must be in hdfs:
command:
hadoop jar hadoop-streaming.jar -files f1.txt,f2.txt -input f1.txt -output test1
Error:
Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/f1.txt
After copying files to hdfs (in your case you will need to copy to hdfs root dir - (according to error in your log)):
hadoop fs -put f1.txt /user/cloudera
hadoop fs -put f2.txt /user/cloudera
job ran with NO errors:
[cloudera#quickstart hadoop-mapreduce]$ hadoop jar hadoop-streaming.jar -files f1.txt,f2.txt -input f1.txt -output test1
packageJobJar: [] [/usr/jars/hadoop-streaming-2.6.0-cdh5.7.0.jar] /tmp/streamjob5729321067745308196.jar tmpDir=null
16/07/23 11:36:16 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/07/23 11:36:17 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/07/23 11:36:18 INFO mapred.FileInputFormat: Total input paths to process : 1
16/07/23 11:36:18 INFO mapreduce.JobSubmitter: number of splits:1
HadoopStreaming - Making_Files_Available_to_Tasks:
The -files and -archives options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.
Note: The -files and -archives options are generic options. Be sure to place the generic options before the command options, otherwise the command will fail.

start-dfs.sh datanode fails in Single node mode

I launched the script start-dfs.sh to start hdfs, but it doesn't run datanode (the namenode is runnimg and I can create folders correctly). So when I try to put data in the hdfs, I got the following error:
hdfs dfs -put history25 /mydata
16/03/05 20:42:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/05 20:42:30 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /mydata/history25._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3110)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3034)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
put: File /mydata/history25._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Thank you for you help
Start the namenode bin/hadoop namenode. Check for errors, if any.
Start the datanodes bin/hadoop datanode. Check for errors, if any.
Now start the task-tracker, job tracker using 'bin/start-mapred.sh'

Mahout Cookbook - Weird FileNotFoundException

I'm following the mahout cookbook example. In one of them, I'm getting an exception:
mahout seqdirectory -i /home/hduser/cook/lastfm/current -o /home/hduser/cook/lastfm sequencefiles/
And then, I'm getting the following exception:
14/04/29 15:45:38 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/hduser/cook/lastfm/current], --keyPrefix=[], --method=[mapreduce], --output=[/home/hduser/cook/lastfm/sequencefiles/], --startPhase=[0], --tempDir=[temp]}
14/04/29 15:45:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/29 15:45:38 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/04/29 15:45:38 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
14/04/29 15:45:38 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /home/hduser/cook/lastfm/current
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:162)
at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:91)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The problem is that the folder exists, as shown here:
hduser#ACER-V3-571G:~/cook/lastfm/current$ ls
artists.txt ArtistTags.dat README.txt tags.txt
Mahout 0.9 and Hadoop 2.2.0
jps shows me:
5435 ResourceManager
7257 Jps
5531 NodeManager
5104 DataNode
5262 SecondaryNameNode
5008 NameNode
I'm sorry, but i found the solution by myself.
Since the file isn't in the Hadoop Filesystem, I should run mahout as local.
export MAHOUT_LOCAL=TRUE
solves the problem.
Since an "export" configures a system environment variable in unix, an similar command should do it in Windows. Try:
SET MAHOUT_LOCAL=TRUE

Resources