FileNotFoundException dfs.include - Hortonworks for Windows

I tried to install Hortonworks 2.2.0.2.0.6.0-0009 for Windows on Windows Server 2012.
The installation itself completes cleanly, but when I launch "start_local_hdp_services.cmd" to start the Hadoop services, the namenode and historyserver services fail to start and produce the following logs.
For "hadoop-namenode-M1BY1HADOOP.log":
2014-03-06 09:39:06,755 ERROR org.apache.hadoop.hdfs.server.namenode.HostFileManager: failed to read include file 'c:\hdp\hadoop-2.2.0.2.0.6.0-0009/etc/hadoop/dfs.include'. Continuing to use previous include list.
java.io.FileNotFoundException: c:\hdp\hadoop-2.2.0.2.0.6.0-0009\etc\hadoop\dfs.include (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.hadoop.util.HostsFileReader.readFileToSet(HostsFileReader.java:54)
at org.apache.hadoop.hdfs.server.namenode.HostFileManager$MutableEntrySet.readFile(HostFileManager.java:265)
at org.apache.hadoop.hdfs.server.namenode.HostFileManager.refresh(HostFileManager.java:284)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.<init>(DatanodeManager.java:176)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:237)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:609)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:567)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
For "hadoop-historyserver-M1BY1HADOOP.log":
2014-03-06 09:39:20,130 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://VMHADOOP:8020/mapred/history/done]
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://VMHADOOP:8020/mapred/history/done]
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:503)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:88)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:93)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:155)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:165)
Does anybody know the reason for this error and how to solve it?
Thank you

This is a known bug (https://issues.apache.org/jira/browse/AMBARI-2355). Create empty dfs.include and dfs.exclude files in the Hadoop configuration directory and the problem should go away; a sketch follows.
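For example, from a Windows command prompt, a minimal sketch using the configuration path shown in the log above (adjust it to your own installation):
cd c:\hdp\hadoop-2.2.0.2.0.6.0-0009\etc\hadoop
rem create the empty include/exclude files the namenode expects
type NUL > dfs.include
type NUL > dfs.exclude
Afterwards, restart the services with start_local_hdp_services.cmd so the namenode re-reads the include file.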

Related

java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO Fails to start DFS

I have installed and configured Hadoop 2.6.0 on Windows.
I cannot successfully run the "sbin\start-dfs" command.
I am getting the error below:
16/12/20 13:03:56 FATAL namenode.NameNode: Failed to start namenode.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:996)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:490)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:308)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1020)
There was a similar question about running YARN, where the suggestion was that adding hadoop-2.6.0/sbin and hadoop-2.6.0/bin to the PATH would resolve the problem, but I am still facing the error.
Can anyone help me fix this?
Please make sure you have the proper access permissions on the namenode directory. Also, format the namenode and then start the HDFS services, as sketched below.
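A minimal sketch of the format-and-restart step on Windows, assuming the Hadoop bin and sbin directories are already on the PATH (note that formatting erases any existing HDFS metadata, so only do this on a fresh install):
rem reformat the namenode storage directory
hdfs namenode -format
rem start the HDFS daemons
sbin\start-dfs.cmd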

MapReduce ERROR UserGroupInformation - PriviledgedActionException

I am trying to run the following MapReduce code on my local machine:
https://github.com/Jeffyrao/warcbase/blob/extract-links/src/main/java/org/warcbase/data/ExtractLinks.java
However, I hit this exception:
[main] ERROR UserGroupInformation - PriviledgedActionException as:jeffy (auth:SIMPLE) cause:java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
Exception in thread "main" java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:144)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:155)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:625)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:391)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1287)
at org.warcbase.data.ExtractLinks.run(ExtractLinks.java:254)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.warcbase.data.ExtractLinks.main(ExtractLinks.java:270)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:140)
... 14 more
I think this problem occurs because I am trying to add a file to the DistributedCache (see my code at lines 81-86 and line 235). Any suggestion is welcome. Thanks!
I ran into a similar problem while running a Hadoop 2 job with the DistributedCache in a local environment. The cause turned out to be that Hadoop 2 not only verifies that the path itself has public read and execute permissions, but also that all of its ancestor directories have execute permission. In this case, if "/" or "/Users" does not have 755 permissions, the file will still fail to be added to the public cache.
See the method static boolean ancestorsHaveExecutePermissions(FileSystem fs, Path path, LoadingCache<Path, Future<FileStatus>> statCache) in the Hadoop class FSDownload.java.
One solution is granting execute permission on all those directories (which sounds unsafe).
A better solution is to make sure every resource file to be cached lives under /tmp, or any other folder whose ancestors already have at least 755 permissions by default; a sketch follows.
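For example, a minimal sketch (the source path matches the one in the error above; the destination under /tmp is only an illustration):
# copy the resource somewhere whose ancestors ("/" and "/tmp") are world-executable,
# and make the file itself world-readable
cp /Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt /tmp/map_backup.txt
chmod 644 /tmp/map_backup.txt
Then point the DistributedCache entry at /tmp/map_backup.txt instead of the original path.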
I met a similar problem.
I ran mahout seq2sparse with tfidf in local mode, and it raised this error:
Exception in thread "main" java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/root/title.tfidf/dictionary.file-0 is not publicly accessable and as such cannot be part of the public cache.
I found that the permission of /root is 750 by default:
drwxr-x---. 12 root root 4096 16:04 root
So I changed the permission of /root:
chmod 755 /root
Then it worked. Thanks, Yitong.
I only had to change the permissions of my home directory, as follows:
chmod go+rx /home/hadoop
This fixed the problem, since / and /home already have rx permissions for group and other users on my system. Here 'hadoop' is my Linux login/user name.

Hadoop Nodemanager and Resourcemanager not starting

I am trying to set up the latest Hadoop 2.2 single-node cluster on Ubuntu 13.10 64-bit. The OS is a fresh installation, and I have tried using both java-6 64-bit and java-7 64-bit.
After following the steps from this link, and after failing, from this other link, I am not able to start the nodemanager with:
sbin/yarn-daemon.sh start nodemanager
sudo sbin/yarn-daemon.sh start nodemanager
or the resourcemanager with:
sbin/yarn-daemon.sh start resourcemanager
sudo sbin/yarn-daemon.sh start resourcemanager
Both fail with this error:
starting nodemanager, logging to /home/hduser/yarn/hadoop-2.2.0/logs/yarn-hduser-nodemanager-ubuntu.out
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:788)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:447)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
The resourcemanager fails with a similar NoClassDefFoundError.
I have been trying this for many hours and searching Google, and nothing has worked. Please let me know what I have missed. This link and this link, found while searching for a solution, didn't work either.
I have tried using both java-6 and java-7 64-bit, with no success.
Edit
The accepted answer got rid of the exception and all the daemons now start, but there is still an exception while running jobs, described in this question.
Those instructions are stale and seem to reflect one of the very early alpha releases. Make this change: YARN_HOME -> HADOOP_YARN_HOME. The environment variable was renamed a while back. This should fix it for you; a sketch of the change is below.
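For example, in ~/.bashrc or wherever the variable is exported (the path below is only an example, taken from the log path in the question):
# old name, used by the stale instructions:
# export YARN_HOME=/home/hduser/yarn/hadoop-2.2.0
# new name:
export HADOOP_YARN_HOME=/home/hduser/yarn/hadoop-2.2.0
Then re-source the file (source ~/.bashrc) and start the daemons again.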
You can use Apache Ambari 1.4.1, which eases installation of Hadoop and many of its ecosystem components. See http://docs.hortonworks.com/#2.0 for how to install using Ambari.
You should look at this solution: add $HADOOP_HOME/share/ and its sub-directories to the classpath; a sketch follows the link.
http://www.srccodes.com/p/article/46/noclassdeffounderror-org-apache-hadoop-service-compositeservice-shell-exitcodeexception-classnotfoundexception
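A minimal sketch of what that amounts to, assuming the standard Hadoop 2.x layout under $HADOOP_HOME (the exact sub-directories you need may differ on your install):
# add the common and yarn jars under share/ to the Hadoop classpath
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/yarn/lib/*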

Error writing event data into HDFS through flume

I am using the CDH3 update 4 tarball for development purposes and have Hadoop up and running. I also downloaded the equivalent Flume tarball from Cloudera, viz. 1.1.0, and tried writing the tail of a log file into HDFS using the hdfs sink. When I run the Flume agent, it starts okay but ends up in an error when it attempts to write new event data into HDFS. I couldn't find a better group to post this question to than Stack Overflow.
Here is the Flume configuration I am using:
agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1
agent.sources.exec-source.type=exec
agent.sources.exec-source.command=tail -F /locationoffile
agent.sinks.hdfs-sink.type=hdfs
agent.sinks.hdfs-sink.hdfs.path=hdfs://localhost:8020/flume
agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess
agent.channels.ch1.type=memory
agent.channels.ch1.capacity=1000
agent.sources.exec-source.channels=ch1
agent.sinks.hdfs-sink.channel=ch1
Also, here is a small snippet of the error that is displayed in the console when the agent receives new event data and tries to write it into HDFS:
13/03/16 17:59:21 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/user/hdfs-user/flume/apacheaccess.1363436060424.tmp
13/03/16 17:59:22 WARN hdfs.HDFSEventSink: HDFS IO error
java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "sumit-HP-Pavilion-dv3-Notebook-PC/127.0.0.1"; destination host is: "localhost":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
at org.apache.hadoop.ipc.Client.call(Client.java:1164)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy9.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1215)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1173)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:272)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:261)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:78)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:805)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1060)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:369)
at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:65)
at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:49)
at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:190)
at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:157)
at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:154)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:154)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:316)
at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:718)
at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:715)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
at sun.nio.ch.IOUtil.write(IOUtil.java:71)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:861)
at org.apache.hadoop.ipc.Client.call(Client.java:1141)
... 37 more
13/03/16 17:59:27 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/user/hdfs-user/flume/apacheaccess.1363436060425.tmp
13/03/16 17:59:27 WARN hdfs.HDFSEventSink: HDFS IO error
java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "sumit-HP-Pavilion-dv3-Notebook-PC/127.0.0.1"; destination host is: "localhost":8020;
As people on the Cloudera mailing list suggest, there are two probable reasons for this error:
HDFS safe mode is turned on. Try running hadoop dfsadmin -safemode leave and see if the error goes away (see the sketch below).
The Flume and Hadoop versions are mismatched. To check this, replace the hadoop-core.jar in the flume/lib directory with the one found in Hadoop's installation folder.
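A minimal sketch of checking and leaving safe mode with the CDH3-era command syntax (run as the HDFS superuser):
# report whether the namenode is currently in safe mode
hadoop dfsadmin -safemode get
# ask the namenode to leave safe mode
hadoop dfsadmin -safemode leave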

hbase-master fails to start

I am trying to set up a Hadoop development environment. I am using CDH4 and following the installation instructions on their website, https://ccp.cloudera.com/display/CDH4DOC/.
I got to the point where I was able to install CDH4 in pseudo-distributed mode, and I am now following the part regarding "Components that require additional configuration".
I have installed the HBase master package, but when I try to start the service I get the following error:
$ sudo /sbin/service hbase-master start
starting master, logging to /var/log/hbase/hbase-hbase-master-slc01euu.out
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
I suppose it has something to do with some environment variable (I believe HADOOP_HOME), but I am not sure where to look, since all the previous processes (namenode, datanode, jobtracker, tasktracker) started with no problem.
When I check the HADOOP_HOME variable, it is undefined.
Do you have any idea how I could solve this?
Thanks a lot in advance.
Please set HADOOP_HOME to your Hadoop installation directory, e.g.:
export HADOOP_HOME=/usr/hadoop
Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.
Add a copy of hdfs-site.xml (and core-site.xml) under the ${HBASE_HOME}/conf folder; a sketch of both steps is below.
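For example, a minimal sketch of those two steps (the /etc/hadoop/conf path is CDH's usual client-config location and is only an assumption; adjust it to your setup):
# in ${HBASE_HOME}/conf/hbase-env.sh
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HBASE_CLASSPATH=${HBASE_CLASSPATH}:${HADOOP_CONF_DIR}
# or copy the Hadoop client configs into HBase's conf directory
cp /etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/core-site.xml ${HBASE_HOME}/conf/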
