task error because symlink creation failed - hadoop

My Hadoop cluster runs on 8 CentOS 6.3 machines and the Hadoop version is CDH 4.3 (installed from Cloudera Manager 4.6).
Recently I found that some of my jobs have failed tasks. The failed tasks succeed on the next attempt, but there are a lot of them (roughly 1,000 failures out of 50,000 tasks) and I'm afraid this will cause performance problems or other issues.
All of the failed tasks have the same call stack:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Creation of symlink from /var/log/hadoop-0.20-mapreduce/userlogs/job_201311140947_0002/attempt_201311140947_0002_m_051950_0 to /hdfs7/mapred/local/userlogs/job_201311140947_0002/attempt_201311140947_0002_m_051950_0 failed.
at org.apache.hadoop.mapred.TaskLog.createTaskAttemptLogDir(TaskLog.java:126)
at org.apache.hadoop.mapred.DefaultTaskController.createLogDir(DefaultTaskController.java:72)
at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:295)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:215)
I tried to create the symlink manually on the same path and encountered no problems. I'm wondering what causes this issue.
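In case it helps with diagnosis: when TaskLog.createTaskAttemptLogDir fails like this, the usual suspects are a missing or unwritable parent directory, a userlogs directory that has hit the ext3 limit of roughly 32000 subdirectories, a full disk or exhausted inodes, or a leftover entry from an earlier attempt. Here is a rough shell sketch of those checks, to run on the TaskTracker that reported the failure; the paths and IDs are simply the ones from the stack trace above and may differ per node.
JOB=job_201311140947_0002
ATTEMPT=attempt_201311140947_0002_m_051950_0
for d in /var/log/hadoop-0.20-mapreduce/userlogs/$JOB /hdfs7/mapred/local/userlogs/$JOB; do
  echo "== $d"
  ls -ld "$d"                  # parent directory present and owned by mapred?
  ls "$d" 2>/dev/null | wc -l  # ext3 caps a directory at ~32000 subdirectories
  ls -ld "$d/$ATTEMPT" 2>&1    # leftover entry from a previous attempt?
done
df -h /hdfs7; df -i /hdfs7     # a full disk or exhausted inodes also makes symlink creation fail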

Related

Launching cluster-openstack

I am working on an OpenStack project on Ubuntu 20.04. I want to create a Hadoop cluster with one master node and three worker nodes, and I have a problem with the cluster not coming up.
Status ERROR:
Creating cluster failed for the following reason(s): Failed to create trust Error ID: ef5e8b0a-8e6d-4878-bebb-f37f4fa50a88, Failed to create trust Error ID: 43157255-86af-4773-96c1-a07ca7ac66ed
Link: https://docs.openstack.org/devstack/latest/
Can you advise me about these errors? Is there anything to worry about?

Installing splice machine on HDP 2.6.5

We have tried almost all the guides we could find online for installing Splice Machine as an Ambari service.
But every time we run sqlshell.sh, it just says there is no server running and that it is unable to connect to port 1527 on localhost.
We have a simple HDP 2.6.5 sandbox and a three-node 2.6.5 cluster. We are trying to install version 2.87 of Splice Machine.
These are the guides we followed.
https://github.com/splicemachine/spliceengine/blob/branch-2.8/platforms/hdp2.6.5/docs/HDP-installation.md
That did not work on our three-node cluster.
Then we tried the sandbox with this tutorial:
https://github.com/splicemachine/splice-ambari-service
Again the same result.
Please let us know if there is anything we have missed in the guides, or whether there are any extra steps.
Can you confirm that you are running sqlshell.sh on a server that is running a RegionServer? I would look at the RegionServer log files for 'Ready to accept JDBC connections on 0.0.0.0:1527'; that is the indicator that a RegionServer is up and running. If you do not see it, check the log file for any error messages.
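For example, something along these lines (the log directory and file name pattern are assumptions; adjust for your HDP layout):
grep -l "Ready to accept JDBC connections on 0.0.0.0:1527" /var/log/hbase/hbase-*-regionserver-*.log
# If nothing matches, scan the same logs for the first errors instead:
grep -iE "ERROR|Exception" /var/log/hbase/hbase-*-regionserver-*.log | tail -n 20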
Found the answer. Apart from what is mentioned in the doc here - https://github.com/splicemachine/spliceengine/blob/branch-2.8/platforms/hdp2.6.5/docs/HDP-installation.md -
there were multiple NoClassDefFoundError errors being thrown in the HBase master logs for classes found in the spark2 jars.
One of them was:
master.HMaster: The coprocessor com.splicemachine.hbase.SpliceMasterObserver threw java.io.IOException: Unexpected exception
java.io.IOException: Unexpected exception
at com.splicemachine.si.data.hbase.coprocessor.CoprocessorUtils.getIOException(CoprocessorUtils.java:30)
at com.splicemachine.hbase.SpliceMasterObserver.start(SpliceMasterObserver.java:111)
at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$Environment.startup(CoprocessorHost.java:415)
at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.loadInstance(CoprocessorHost.java:256)
at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.loadSystemCoprocessors(CoprocessorHost.java:159)
at org.apache.hadoop.hbase.master.MasterCoprocessorHost.<init>(MasterCoprocessorHost.java:93)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:774)
at org.apache.hadoop.hbase.master.HMaster.access$900(HMaster.java:225)
at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:2038)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/spark_project/guava/util/concurrent/ThreadFactoryBuilder
at com.splicemachine.timestamp.impl.TimestampClient.<init>(TimestampClient.java:108)
at com.splicemachine.timestamp.hbase.ZkTimestampSource.initialize(ZkTimestampSource.java:62)
at com.splicemachine.timestamp.hbase.ZkTimestampSource.<init>(ZkTimestampSource.java:48)
at com.splicemachine.si.data.hbase.coprocessor.HBaseSIEnvironment.<init>(HBaseSIEnvironment.java:146)
at com.splicemachine.si.data.hbase.coprocessor.HBaseSIEnvironment.loadEnvironment(HBaseSIEnvironment.java:100)
at com.splicemachine.hbase.SpliceMasterObserver.start(SpliceMasterObserver.java:81)
... 8 more
Caused by: java.lang.ClassNotFoundException: org.spark_project.guava.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
We found that this class was in the spark-network-common jar and symlinked that jar into one of the folders on HBASE_CLASSPATH. But then it threw another NoClassDefFoundError for a different class.
We ended up symlinking all the jars in the spark2/jars folder into one of the folders on HBASE_CLASSPATH.
After this the HBase master came up successfully, started the Splice DB process, and we were able to connect with sqlshell.sh.
Note: make sure to do this symlinking on all nodes that run a RegionServer.
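A minimal sketch of that symlinking step, assuming the spark2 jars live under /usr/hdp/current/spark2-client/jars and that /usr/hdp/current/hbase-regionserver/lib is one of the directories already on HBASE_CLASSPATH (both paths are assumptions; check hbase-env.sh for your actual HBASE_CLASSPATH first):
SPARK_JARS=/usr/hdp/current/spark2-client/jars      # assumed spark2 jar location
HBASE_LIB=/usr/hdp/current/hbase-regionserver/lib   # assumed directory on HBASE_CLASSPATH
for jar in "$SPARK_JARS"/*.jar; do
  ln -sfn "$jar" "$HBASE_LIB/$(basename "$jar")"    # -sfn replaces any stale link
done
# Repeat on every HBase master and RegionServer node, then restart HBase from Ambari.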
What helped us figure this out were these two Java files in the Splice Machine source:
https://github.com/splicemachine/spliceengine/blob/master/hbase_sql/src/main/java/com/splicemachine/hbase/SpliceMasterObserver.java
and
https://github.com/splicemachine/spliceengine/blob/ee61cadf17c97a0c5d866e2b764142f8e55311a5/splice_timestamp_api/src/main/java/com/splicemachine/timestamp/impl/TimestampServer.java

Unable to schedule falcon process - Could not perform authorization operation, java.io.IOException: Couldn't set up IO streams

Hi,
I am trying to schedule a Falcon process using the Falcon CLI and the falcon service user on a Kerberized cluster. I am getting the following error message:
ERROR: Bad Request;default/org.apache.falcon.FalconWebException::org.apache.falcon.FalconException: Entity schedule failed for process: testHiveProc
Falcon app logs shows following:
Caused by: org.apache.falcon.FalconException: E0501 : E0501: Could not perform authorization operation, Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details :
Any suggestions?
Thanks.
Root cause:
Oozie was running out of processes due to the large number of scheduled jobs.
Short term solution:
Restart Oozie server
Long term solution:
- Increase the ulimit (nproc) for the Oozie user (a sketch follows below)
- Limit the number of jobs scheduled in Oozie
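A rough sketch of the long-term checks and fix; the oozie user name and the limits file path are assumptions, and the right nproc values depend on your cluster:
su - oozie -s /bin/bash -c 'ulimit -u'   # max processes/threads the Oozie user may create
ps -eLf | awk '$1 == "oozie"' | wc -l    # threads the Oozie user is currently using
# Raise the nproc limit, e.g. in /etc/security/limits.d/oozie.conf, then restart Oozie:
# oozie  soft  nproc  16384
# oozie  hard  nproc  32768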

can't run pig with single node hadoop server

I have set up a VM with Ubuntu. It runs Hadoop as a single node. Later I installed Apache Pig on it. Apache Pig runs great in local mode, but otherwise it always throws ERROR 2999: Unexpected internal error. Failed to create DataStorage
I must be missing something very obvious. Can someone help me get this running, please?
More details:
1. I assume that Hadoop is running fine because I can run MapReduce jobs in Python.
2. pig -x local runs as I expect.
3. When I just type pig, it gives me the following error:
Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.&lt;init&gt;(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.&lt;init&gt;(PigServer.java:226)
at org.apache.pig.PigServer.&lt;init&gt;(PigServer.java:215)
at org.apache.pig.tools.grunt.Grunt.&lt;init&gt;(Grunt.java:55)
at org.apache.pig.Main.run(Main.java:452)
at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.&lt;init&gt;(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.&lt;init&gt;(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 9 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
================================================================================
The link helped me understand a possible cause of the failure.
Here is what fixed my problem:
1. Recompile Pig without Hadoop.
2. Update PIG_CLASSPATH to include all the jars from $HADOOP_HOME/lib.
3. Run pig.
Thanks.
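For reference, a sketch of those three steps; the Ant target is the one older Pig releases provide for building without a bundled Hadoop, and the paths are placeholders for your own layout:
cd /path/to/pig-source                    # hypothetical checkout of the Pig source
ant clean jar-withouthadoop               # step 1: build Pig without the bundled Hadoop jar
export HADOOP_HOME=/usr/local/hadoop      # assumed Hadoop install directory
export PIG_CLASSPATH="$HADOOP_HOME/lib/*:$PIG_CLASSPATH"   # step 2: pick up the cluster's jars
./bin/pig                                 # step 3: should now talk to the running cluster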
Set your PIG_CLASSPATH to point to your correct HADOOP_HOME installation so that Pig can pick up your cluster information from core-site.xml, mapred-site.xml and hdfs-site.xml; it is better to follow the link for a correct installation.
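For example (the install path is an assumption; the point is that the directory containing core-site.xml, hdfs-site.xml and mapred-site.xml must be on PIG_CLASSPATH):
export HADOOP_HOME=/usr/local/hadoop                      # assumed install location
export PIG_CLASSPATH="$HADOOP_HOME/conf:$PIG_CLASSPATH"   # directory holding the cluster *-site.xml files
pig                                                       # starts in MapReduce mode against the cluster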
Just install Cygwin, then add the Cygwin path to the Path Environment Variable:
For details see here.

"Child Error" in Executing stream Job on multi node Hadoop cluster (cloudera distribution CDH3u0 Hadoop 0.20.2)

I am working on an 8-node Hadoop cluster, and I am trying to execute a simple streaming job with the following configuration:
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
  -D mapred.map.max.tacker.failures=10 \
  -D mared.map.max.attempts=8 \
  -D mapred.skip.attempts.to.start.skipping=8 \
  -D mapred.skip.map.max.skip.records=8 \
  -D mapred.skip.mode.enabled=true \
  -D mapred.max.map.failures.percent=5 \
  -input /user/hdfs/ABC/ \
  -output "/user/hdfs/output1/" \
  -mapper "perl -e 'while (<>) { chomp; print; }; exit;'" \
  -reducer "perl -e 'while (<>) { ~s/LR\>/LR\>\n/g; print ; }; exit;'"
I am using Cloudera's distribution of Hadoop, CDH3u0, with Hadoop 0.20.2. The problem is that the job keeps failing with this error:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
-------
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
STDERR on the datanodes:
Exception in thread "main" java.io.IOException: Exception reading file:/mnt/hdfs/06/local/taskTracker/hdfs/jobcache/job_201107141446_0001/jobToken
at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:146)
at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:159)
at org.apache.hadoop.mapred.Child.main(Child.java:107)
Caused by: java.io.FileNotFoundException: File file:/mnt/hdfs/06/local/taskTracker/hdfs/jobcache/job_201107141446_0001/jobToken does not exist.
Looking for the cause of the error, I have checked the following things, yet it still crashes and I cannot understand the reason:
1. All the temp directories are in place.
2. Memory is far more than the job could need (it is a small job).
3. Permissions verified.
4. Nothing fancy done in the configuration, just the usual stuff.
The weirdest thing is that the job sometimes runs successfully but fails most of the time. Any guidance or help regarding this issue would be really appreciated. I have been working on this error for the last 4 days and I am not able to figure out anything. Please help!
Thanks & Regards,
Atul
I have faced the same problem; it happens when the TaskTracker is not able to allocate the specified memory to the child JVM for the task.
Try executing the same job again when the cluster is not busy running many other jobs alongside this one; it should go through. Alternatively, set speculative execution to true, in which case Hadoop will execute the same task on another TaskTracker.
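If you want to try the speculative-execution route, here is a sketch of the relevant flags on a trivial identity streaming job (the property names are the old mapred.* ones for Hadoop 0.20; the output path and the 512m heap are just example values):
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
  -D mapred.map.tasks.speculative.execution=true \
  -D mapred.reduce.tasks.speculative.execution=true \
  -D mapred.child.java.opts=-Xmx512m \
  -input /user/hdfs/ABC/ \
  -output /user/hdfs/output_speculative/ \
  -mapper /bin/cat \
  -reducer /bin/cat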
