I came to work today and found my Hudson with this problem. I've tried to research it, but I didn't find anything that helps.
Here is the full stack trace:
hudson.util.IOException2: Failed to create a temp file on /home/cpcaserver5/.hudson/jobs/SVN/workspace
at hudson.FilePath.createTextTempFile(FilePath.java:966)
at hudson.tasks.CommandInterpreter.createScriptFile(CommandInterpreter.java:124)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:68)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:630)
at hudson.model.Build$RunnerImpl.build(Build.java:175)
at hudson.model.Build$RunnerImpl.doRun(Build.java:137)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:429)
at hudson.model.Run.run(Run.java:1366)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:145)
Caused by: hudson.util.IOException2: Failed to create a temporary directory in /etc/tomcat6/apache-tomcat-6.0.35/temp
at hudson.FilePath$12.invoke(FilePath.java:955)
at hudson.FilePath$12.invoke(FilePath.java:944)
at hudson.FilePath.act(FilePath.java:758)
at hudson.FilePath.act(FilePath.java:740)
at hudson.FilePath.createTextTempFile(FilePath.java:944)
... 12 more
Caused by: java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.checkAndCreate(File.java:1716)
at java.io.File.createTempFile(File.java:1804)
at hudson.FilePath$12.invoke(FilePath.java:953)
... 16 more
Email was triggered for: Failure
Sending email for trigger: Failure
It looks like you have a permissions problem. Make sure you run Jenkins/Tomcat with appropriate user permissions. Ditto if this happens on a slave: check that the slave process runs as a user that has appropriate permissions.
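For example, using the paths from the stack trace, and assuming the service account is named tomcat6 (a guess; substitute whatever user actually runs your Tomcat/Hudson process), you could check and fix ownership along these lines:

# which user actually runs the Tomcat/Hudson process?
ps -ef | grep tomcat

# who owns the directories the build needs to write to?
ls -ld /home/cpcaserver5/.hudson/jobs/SVN/workspace
ls -ld /etc/tomcat6/apache-tomcat-6.0.35/temp

# hand them to the service account (assumed here to be tomcat6)
sudo chown -R tomcat6:tomcat6 /etc/tomcat6/apache-tomcat-6.0.35/temp
sudo chown -R tomcat6:tomcat6 /home/cpcaserver5/.hudson/jobs/SVN/workspace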
Whenever I run the start-master.sh command on my local machine, I get the following error. Can someone please help me fix this issue?
Terminal Error
This is the error I get in the terminal:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.0.1-bin-hadoop2.6/logs/spark-andani-org.apache.spark.deploy.master.Master-1-andani.sakha.com.out
failed to launch org.apache.spark.deploy.master.Master:
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
Log Error
When I check the Spark log file, this is the error:
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkMaster' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
The error occurs because your sparkMaster was not able to bind to your internal IP.
Check your /etc/hosts file to see whether it points to the proper hostname; your previous IP address might have changed.
Reconfigure it and run the command once again.
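For example (the hostname below comes from the log line above; the IP is a placeholder for whatever your machine currently uses):

# find the machine's current IP address
hostname -I

# make sure /etc/hosts maps that IP to the hostname the master binds to, e.g.:
#   192.168.1.10   andani.sakha.com   andani

# alternatively, pin the master to an explicit host in conf/spark-env.sh:
#   export SPARK_MASTER_HOST=192.168.1.10

./sbin/start-master.sh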
I have several parallel Spark jobs doing the same thing; they work on separate input/output dirs, and at the end they write results to Parquet from a DataFrame, using one of the columns as a partitioner. The jobs with the biggest inputs often fail: some executors start to fail with the exceptions below, then a stage fails and starts recalculating the failed partition, and if the number of failed stage attempts reaches 4 (sometimes it doesn't, and the whole job finishes successfully), the whole job is canceled.
Stages fail with these failure reasons (from the Spark UI):
org.apache.spark.shuffle.FetchFailedException
Connection closed by peer
I tried to find clues on the Internet, and it seems the reason may be speculative execution, but I don't enable it in Spark. Any other ideas what the reason for this could be?
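For reference, speculation is off by default; one way to make that explicit, assuming the jobs are launched with spark-submit, is to set the property on the command line:

spark-submit --conf spark.speculation=false ...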
Spark job code:
sqlContext
    .createDataFrame(finalRdd, structType)   // build a DataFrame from the RDD and its schema
    .write()
    .partitionBy(PARTITION_COLUMN_NAME)      // one output subdirectory per value of the partition column
    .parquet(tmpDir);                        // write Parquet files under tmpDir
Exceptions in executors:
16/09/14 11:04:06 ERROR datasources.DynamicPartitionWriterContainer: Aborting task.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/erm/data/core/internal/ekp/stg/tmp/Z_PLAN_OPER/_temporary/0/_temporary/attempt_201609141104_0001_m_006023_0/partition=2/part-r-06023-482b0b4d-1174-4c76-b203-92b2b47c78cb.parquet] for [DFSClient_NONMAPREDUCE_1489398656_198] for client [10.117.102.72], because this file is already being created by [DFSClient_NONMAPREDUCE_-2049022202_200] on [10.117.102.15]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3152)
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /erm/data/core/internal/ekp/stg/tmp/Z_PLAN_OPER/_temporary/0/_temporary/attempt_201609141105_0001_m_006489_0/partition=2/part-r-06489-482b0b4d-1174-4c76-b203-92b2b47c78cb.parquet (inode 318361396): File does not exist. Holder DFSClient_NONMAPREDUCE_-1428957718_196 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3625)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3428)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/erm/data/core/internal/ekp/stg/tmp/Z_PLAN_OPER/_temporary/0/_temporary/attempt_201609141105_0001_m_006310_0/partition=2/part-r-06310-482b0b4d-1174-4c76-b203-92b2b47c78cb.parquet] for [DFSClient_NONMAPREDUCE_-419723425_199] for client [10.117.102.44], because this file is already being created by [DFSClient_NONMAPREDUCE_596138765_198] on [10.117.102.35]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3152)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /erm/data/core/internal/ekp/stg/tmp/Z_PLAN_OPER/_temporary/0/_temporary/attempt_201609141104_0001_m_005877_0/partition=2/part-r-05877-482b0b4d-1174-4c76-b203-92b2b47c78cb.parquet (inode 318359423): File does not exist. Holder DFSClient_NONMAPREDUCE_193375828_196 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3625)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3428)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/erm/data/core/internal/ekp/stg/tmp/Z_PLAN_OPER/_temporary/0/_temporary/attempt_201609141104_0001_m_005621_0/partition=2/part-r-05621-482b0b4d-1174-4c76-b203-92b2b47c78cb.parquet] for [DFSClient_NONMAPREDUCE_498917218_197] for client [10.117.102.36], because this file is already being created by [DFSClient_NONMAPREDUCE_-578682558_197] on [10.117.102.16]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3152)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /erm/data/core/internal/ekp/stg/tmp/Z_PLAN_OPER/_temporary/0/_temporary/attempt_201609141104_0001_m_006311_0/partition=2/part-r-06311-482b0b4d-1174-4c76-b203-92b2b47c78cb.parquet (inode 318359109): File does not exist. Holder DFSClient_NONMAPREDUCE_-60951070_198 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3625)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3428)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3284)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /erm/data/core/internal/ekp/stg/tmp/Z_PLAN_OPER/_temporary/0/_temporary/attempt_201609141104_0001_m_006215_0/partition=2/part-r-06215-482b0b4d-1174-4c76-b203-92b2b47c78cb.parquet (inode 318359393): File does not exist. Holder DFSClient_NONMAPREDUCE_-331523575_197 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3625)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3428)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/erm/data/core/internal/ekp/stg/tmp/Z_PLAN_OPER/_temporary/0/_temporary/attempt_201609141104_0001_m_006311_0/partition=2/part-r-06311-482b0b4d-1174-4c76-b203-92b2b47c78cb.parquet] for [DFSClient_NONMAPREDUCE_1869576560_198] for client [10.117.102.44], because this file is already being created by [DFSClient_NONMAPREDUCE_-60951070_198] on [10.117.102.70]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3152)
Spark UI: (screenshot of the failed stages omitted)
We use Spark 1.6 (CDH 5.8).
I am trying to run the following MapReduce code on my local machine:
https://github.com/Jeffyrao/warcbase/blob/extract-links/src/main/java/org/warcbase/data/ExtractLinks.java
However, I ran into this exception:
[main] ERROR UserGroupInformation - PriviledgedActionException as:jeffy (auth:SIMPLE) cause:java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
Exception in thread "main" java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:144)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:155)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:625)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:391)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1287)
at org.warcbase.data.ExtractLinks.run(ExtractLinks.java:254)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.warcbase.data.ExtractLinks.main(ExtractLinks.java:270)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt is not publicly accessable and as such cannot be part of the public cache.
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:140)
... 14 more
I think this problem occurs because I am trying to add a file to the DistributedCache (look at my code at lines 81-86 and line 235 of the linked file). Any suggestion is welcome. Thanks!
I ran into a similar problem while running a Hadoop 2 job with the DistributedCache in a local environment. The cause of my problem turned out to be that Hadoop 2 not only verifies that the path itself has public execute and read permissions, it also verifies that all of its ancestor directories have execute permission. In this case, if "/" or "/Users" does not have 755 permissions, the file will still fail to be added to the public cache.
See the method static boolean ancestorsHaveExecutePermissions(FileSystem fs, Path path, LoadingCache<Path, Future<FileStatus>> statCache) in the Hadoop class FSDownload.java.
One solution could be granting execute permission to all the ancestor directories (which sounds unsafe).
A better solution is to make sure all resource files to be cached are in /tmp or any other folder that has at least 755 permissions by default.
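For example (the path comes from the exception above; adjust it to your setup):

# inspect the permissions of the file and every ancestor directory
ls -ld / /Users /Users/jeffy /Users/jeffy/Documents /Users/jeffy/Documents/Eclipse /Users/jeffy/Documents/Eclipse/warcbase
ls -l /Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt

# either open up the ancestors (the unsafe option)...
chmod o+x /Users/jeffy /Users/jeffy/Documents

# ...or move the cached resource somewhere world-readable by default
cp /Users/jeffy/Documents/Eclipse/warcbase/map_backup.txt /tmp/map_backup.txt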
I ran into a similar problem.
I ran mahout seq2sparse with tf-idf in local mode, and it raised this error:
Exception in thread "main" java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Resource file:/root/title.tfidf/dictionary.file-0 is not publicly accessable and as such cannot be part of the public cache.
I found that the permissions on /root are 750 by default:
drwxr-x---. 12 root root 4096 16:04 root
So I changed the permissions on /root:
chmod 755 /root
Then it worked, so thank you, Yitong.
I had to change the permissions of only my home directory, as follows:
chmod go+rx /home/hadoop
to fix the problem, since / and /home already have rx permissions for group and other users on my system. Here 'hadoop' is my Linux login/user name.
I tried to install Hortonworks 2.2.0.2.0.6.0-0009 for Windows on a Windows Server 2012.
Everything seemed clean during the installation, except when launching "start_local_hdp_services.cmd" to start the Hadoop services. There, the namenode and historyserver services fail to start and generate the following logs.
From "hadoop-namenode-M1BY1HADOOP.log":
2014-03-06 09:39:06,755 ERROR org.apache.hadoop.hdfs.server.namenode.HostFileManager: failed to read include file 'c:\hdp\hadoop-2.2.0.2.0.6.0-0009/etc/hadoop/dfs.include'. Continuing to use previous include list.
java.io.FileNotFoundException: c:\hdp\hadoop-2.2.0.2.0.6.0-0009\etc\hadoop\dfs.include (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.hadoop.util.HostsFileReader.readFileToSet(HostsFileReader.java:54)
at org.apache.hadoop.hdfs.server.namenode.HostFileManager$MutableEntrySet.readFile(HostFileManager.java:265)
at org.apache.hadoop.hdfs.server.namenode.HostFileManager.refresh(HostFileManager.java:284)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.<init>(DatanodeManager.java:176)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:237)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:609)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:567)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
For "hadoop-historyserver-M1BY1HADOOP.log":
2014-03-06 09:39:20,130 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://VMHADOOP:8020/mapred/history/done]
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://VMHADOOP:8020/mapred/history/done]
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:503)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:88)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:93)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:155)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:165)
Does anybody know the reason for this error and can help me solve it, please?
Thank you.
It is a bug in Hadoop (https://issues.apache.org/jira/browse/AMBARI-2355); create empty dfs.include and dfs.exclude files and the problem should vanish.
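For example, from a Windows command prompt (the path comes from the namenode log above):

cd c:\hdp\hadoop-2.2.0.2.0.6.0-0009\etc\hadoop
type NUL > dfs.include
type NUL > dfs.exclude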
I am trying to set up Hadoop with Kerberos.
I am following the CDH3 Security Guide.
Things went pretty well so far (HDFS works OK, etc.), but I am getting the following error when I try to submit a job.
I run the HDFS server as the user hdfs and Hadoop as a user called mapred. I submit the job using a user called bob, who is in the mapred group.
The following are the values I have in taskcontroller.cfg:
mapred.local.dir=/opt/hadoop-work/local/
hadoop.log.dir=/opt/hadoop-1.0.3/logs
mapreduce.tasktracker.group=mapred
min.user.id=1000
The error I am getting is:
java.io.IOException: Job initialization failed (24) with output: Reading task controller config from /etc/hadoop/taskcontroller.cfg
Can't get group information for mapred - Success.
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:192)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1228)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:185)
... 8 more
The error always comes with the value given to "mapreduce.tasktracker.group=mapred" in taskcontroller.cfg.
I have been debugging and looking into it, and I think the problem is that I have set up the permissions among the different users and groups wrong.
Any help is greatly appreciated.
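For what it's worth, these are the kinds of checks that usually narrow down a "Can't get group information" message (the task-controller path is a guess based on the install prefix above; adjust it to your layout):

# does the group exist, and is bob really in it?
getent group mapred
id bob

# the setuid task-controller binary is expected to be owned by root:mapred with mode 4750
ls -l /opt/hadoop-1.0.3/bin/task-controller

# taskcontroller.cfg should be owned by root and not writable by anyone else
ls -l /etc/hadoop/taskcontroller.cfg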