java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] - hadoop

My configuration is as follows:
Spark 1.2.0 and Hadoop 2.5.0/YARN on the Cloudera CDH5 VM (CentOS 6.2), hosted on a 64-bit Windows machine with 8 GB RAM.
Below is the sequence of commands run from the spark-shell. While trying to print the cust RDD, I get the Kerberos authentication error. I am logged in to the spark-shell as the cloudera user, and the Cloudera VM is Kerberos-authenticated with cloudera@HADOOP.LOCALDOMAIN as the default principal.
Is there any way to authenticate with Kerberos from the spark-shell for normal RDD operations?
Or am I missing something? Any help is appreciated.
Below are the spark-shell commands:
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@26226a12
scala> sqlContext
res1: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@7213fc4a
scala> import sqlContext.createSchemaRDD
import sqlContext.createSchemaRDD
scala> case class Cust_flat_xml(xmldata: String)
defined class Cust_flat_xml
Tried to load from a local file path, but as the error message shows, an HDFS path is required:
scala> val cust = sc.textFile("/home/cloudera/tdaf/tdaf_xml_data/new_cust_20110630_cpy").map(_.split(" ")).map(p => Cust_flat_xml(p(0)))
15/07/01 11:11:45 INFO MemoryStore: ensureFreeSpace(260017) called with curMem=843639, maxMem=280248975
15/07/01 11:11:45 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 253.9 KB, free 266.2 MB)
15/07/01 11:11:45 INFO MemoryStore: ensureFreeSpace(21212) called with curMem=1103656, maxMem=280248975
15/07/01 11:11:45 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 20.7 KB, free 266.2 MB)
15/07/01 11:11:45 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.113.234.25:58467 (size: 20.7 KB, free: 267.2 MB)
15/07/01 11:11:45 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/07/01 11:11:45 INFO SparkContext: Created broadcast 3 from textFile at <console>:28
cust: org.apache.spark.rdd.RDD[Cust_flat_xml] = MappedRDD[9] at map at <console>:28
scala> cust.foreach(println)
15/07/01 11:12:07 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 42 for cloudera on 127.0.0.1:8020
15/07/01 11:12:07 INFO TokenCache: Got dt for hdfs://localhost.localdomain:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 127.0.0.1:8020, Ident: (HDFS_DELEGATION_TOKEN token 42 for cloudera)
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost.localdomain:8020/home/cloudera/tdaf/tdaf_xml_data/new_cust_20110630_cpy
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:765)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC.<init>(<console>:50)
at $iwC.<init>(<console>:52)
at <init>(<console>:54)
at .<init>(<console>:58)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Added new_cust_20110630_cpy to the HDFS path hdfs://localhost.localdomain:8020/spark/sparksql/input and retried:
scala> val cust = sc.textFile("/spark/sparksql/input/new_cust_20110630_cpy").map(_.split(" ")).map(p => Cust_flat_xml(p(0)))
15/07/01 11:19:06 INFO MemoryStore: ensureFreeSpace(260041) called with curMem=1124868, maxMem=280248975
15/07/01 11:19:06 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 253.9 KB, free 265.9 MB)
15/07/01 11:19:06 INFO MemoryStore: ensureFreeSpace(21212) called with curMem=1384909, maxMem=280248975
15/07/01 11:19:06 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 20.7 KB, free 265.9 MB)
15/07/01 11:19:06 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.113.234.25:58467 (size: 20.7 KB, free: 267.2 MB)
15/07/01 11:19:06 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/07/01 11:19:06 INFO SparkContext: Created broadcast 4 from textFile at <console>:28
cust: org.apache.spark.rdd.RDD[Cust_flat_xml] = MappedRDD[13] at map at <console>:28
scala>
scala> cust.foreach(println)
15/07/01 11:19:47 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 43 for cloudera on 127.0.0.1:8020
15/07/01 11:19:47 INFO TokenCache: Got dt for hdfs://localhost.localdomain:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 127.0.0.1:8020, Ident: (HDFS_DELEGATION_TOKEN token 43 for cloudera)
15/07/01 11:19:47 INFO FileInputFormat: Total input paths to process : 1
15/07/01 11:19:47 INFO SparkContext: Starting job: foreach at <console>:31
15/07/01 11:19:47 INFO DAGScheduler: Got job 0 (foreach at <console>:31) with 2 output partitions (allowLocal=false)
15/07/01 11:19:47 INFO DAGScheduler: Final stage: Stage 0(foreach at <console>:31)
15/07/01 11:19:47 INFO DAGScheduler: Parents of final stage: List()
15/07/01 11:19:47 INFO DAGScheduler: Missing parents: List()
15/07/01 11:19:47 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[13] at map at <console>:28), which has no missing parents
15/07/01 11:19:47 INFO MemoryStore: ensureFreeSpace(3080) called with curMem=1406121, maxMem=280248975
15/07/01 11:19:47 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.0 KB, free 265.9 MB)
15/07/01 11:19:47 INFO MemoryStore: ensureFreeSpace(1800) called with curMem=1409201, maxMem=280248975
15/07/01 11:19:47 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 1800.0 B, free 265.9 MB)
15/07/01 11:19:47 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 10.113.234.25:58467 (size: 1800.0 B, free: 267.2 MB)
15/07/01 11:19:47 INFO BlockManagerMaster: Updated info of block broadcast_5_piece0
15/07/01 11:19:47 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:838
15/07/01 11:19:47 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[13] at map at <console>:28)
15/07/01 11:19:47 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/07/01 11:19:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.113.234.25, ANY, 1340 bytes)
15/07/01 11:19:48 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 10.113.234.25:40605 (size: 1800.0 B, free: 267.3 MB)
15/07/01 11:19:50 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.113.234.25:40605 (size: 20.7 KB, free: 267.2 MB)
15/07/01 11:19:53 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.113.234.25, ANY, 1340 bytes)
15/07/01 11:19:53 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.113.234.25): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:246)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1179)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1169)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1159)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:237)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:230)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1457)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:301)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:679)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:642)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
at org.apache.hadoop.ipc.Client.call(Client.java:1382)
... 45 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:552)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:367)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:717)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:713)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
... 48 more
15/07/01 11:19:53 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, 10.113.234.25, ANY, 1340 bytes)
15/07/01 11:19:53 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor 10.113.234.25: java.io.IOException (Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020; ) [duplicate 1]
15/07/01 11:19:53 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, 10.113.234.25, ANY, 1340 bytes)
15/07/01 11:19:53 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor 10.113.234.25: java.io.IOException (Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020; ) [duplicate 2]
15/07/01 11:19:53 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, 10.113.234.25, ANY, 1340 bytes)
15/07/01 11:19:54 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor 10.113.234.25: java.io.IOException (Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020; ) [duplicate 3]
15/07/01 11:19:54 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, 10.113.234.25, ANY, 1340 bytes)
15/07/01 11:19:54 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4) on executor 10.113.234.25: java.io.IOException (Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020; ) [duplicate 4]
15/07/01 11:19:54 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 6, 10.113.234.25, ANY, 1340 bytes)
15/07/01 11:19:54 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor 10.113.234.25: java.io.IOException (Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020; ) [duplicate 5]
15/07/01 11:19:54 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 7, 10.113.234.25, ANY, 1340 bytes)
15/07/01 11:19:54 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on executor 10.113.234.25: java.io.IOException (Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020; ) [duplicate 6]
15/07/01 11:19:54 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
15/07/01 11:19:54 INFO TaskSchedulerImpl: Cancelling stage 0
15/07/01 11:19:54 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/07/01 11:19:54 INFO DAGScheduler: Job 0 failed: foreach at <console>:31, took 6.713733 s
15/07/01 11:19:54 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 7) on executor 10.113.234.25: java.io.IOException (Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020; ) [duplicate 7]
15/07/01 11:19:54 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 10.113.234.25): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:246)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1179)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1169)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1159)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:237)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:230)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1457)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:301)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:679)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:642)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
at org.apache.hadoop.ipc.Client.call(Client.java:1382)
... 45 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:552)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:367)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:717)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:713)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
... 48 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
scala>

Spark 1.2.0 here is running in standalone mode against a Kerberos-enabled CDH5 cluster (the Cloudera VM). Spark standalone mode cannot authenticate to a Kerberized HDFS, so the executors never obtain delegation tokens and the application cannot run this way; Kerberos is only supported when Spark runs on YARN.
For more details see this page from Cloudera:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/sg_spark_auth.html
Hope this helps.
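For reference, the sketch below is a quick way to check from the spark-shell driver how the Hadoop security layer sees the current login. It is only a diagnostic aid using the standard UserGroupInformation API; it does not make standalone mode work, since the executors still receive no delegation tokens:
import org.apache.hadoop.security.UserGroupInformation

// Inspect the login that the Hadoop client libraries see on the driver side.
val ugi = UserGroupInformation.getCurrentUser
println(s"user = ${ugi.getUserName}")
println(s"auth method = ${ugi.getAuthenticationMethod}")
println(s"has Kerberos credentials = ${ugi.hasKerberosCredentials}")
On a Kerberized CDH cluster the supported route is to kinit as the principal and launch the shell with --master yarn-client, so that YARN obtains HDFS delegation tokens on behalf of the executors.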

Related

Executing a Hive query causes the YARN ResourceManager to throw a "file does not exist" exception

I'm configuring Hive 3.1.0 to work with Hadoop 3.0.0.
This error is thrown almost immediately when I submit a simple query in Beeline that triggers MapReduce:
0: jdbc:hive2://> select count(*) from airlinedata;
18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN ql.Driver: Hive-on-MR is deprecated in Hive 2 and may not be available in the futureversions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = UUT81HC_20181011102444_2df01ff5-ca05-403c-b0e1-15f8f7715dc7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
2018-10-11 10:24:45,510 INFO [HiveServer2-Background-Pool: Thread-124] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /10.184.153.232:8032
2018-10-11 10:24:45,555 INFO [HiveServer2-Background-Pool: Thread-124] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /10.184.153.232:8032
18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:73)
at org.apache.hadoop.mapreduce.TypeConverter.toYarn(TypeConverter.java:78)
at org.apache.hadoop.mapred.ClientServiceDelegate.<init>(ClientServiceDelegate.java:120)
at org.apache.hadoop.mapred.ClientCache.getClient(ClientCache.java:68)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:343)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:423)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:70)
... 40 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder.setAppId(Lorg/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder; #36: invokevirtual
Reason:
Type 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' (current frame, stack[1]) is not assignable to 'com/google/protobuf/GeneratedMessage'
Current Frame:
bci: #36
flags: { }
locals: { 'org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
stack: { 'com/google/protobuf/SingleFieldBuilder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
Bytecode:
0x0000000: 2ab4 0011 c700 1b2b c700 0bbb 002f 59b7
0x0000010: 0030 bf2a 2bb5 000a 2ab6 0031 a700 0c2a
0x0000020: b400 112b b600 3257 2a59 b400 1304 80b5
0x0000030: 0013 2ab0
Stackmap Table:
same_frame(#19)
same_frame(#31)
same_frame(#40)
at org.apache.hadoop.mapreduce.v2.proto.MRProtos$JobIdProto.newBuilder(MRProtos.java:1017)
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.<init>(JobIdPBImpl.java:37)
... 45 more
YARN ResourceManager stack trace:
2018-10-11 10:24:49,896 INFO rmapp.RMAppImpl: application_1539226955170_0002 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
2018-10-11 10:24:49,896 INFO recovery.RMStateStore: Updating info for app: application_1539226955170_0002
2018-10-11 10:24:49,897 INFO capacity.CapacityScheduler: Application Attempt appattempt_1539226955170_0002_000002 is done. finalState=FAILED
2018-10-11 10:24:49,897 INFO rmapp.RMAppImpl: Application application_1539226955170_0002 failed 2 times due to AM Container for appattempt_1539226955170_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2018-10-11 10:24:49.876]File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1488)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1503)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:234)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002 Then click on links to logs of each attempt.
. Failing the application.
2018-10-11 10:24:49,897 INFO scheduler.AppSchedulingInfo: Application application_1539226955170_0002 requests cleared
2018-10-11 10:24:49,897 INFO rmapp.RMAppImpl: application_1539226955170_0002 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
2018-10-11 10:24:49,898 INFO capacity.LeafQueue: Application removed - appId: application_1539226955170_0002 user: UUT81HC queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2018-10-11 10:24:49,898 WARN resourcemanager.RMAuditLogger: USER=UUT81HC OPERATION=Application Finished - Failed
TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1539226955170_0002 failed 2 times due to AM Container for appattempt_1539226955170_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2018-10-11 10:24:49.876]File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1488)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1503)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:234)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002 Then click on links to logs of each attempt.
. Failing the application. APPID=application_1539226955170_0002
2018-10-11 10:24:49,898 INFO capacity.ParentQueue: Application removed - appId: application_1539226955170_0002 user: UUT81HC leaf-queue of parent: root #applications: 0
2018-10-11 10:24:49,899 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1539226955170_0002,name=select count(*) from airlinedata (Stage-1),user=UUT81HC,queue=default,state=FAILED,trackingUrl=http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002,appMasterHost=N/A,submitTime=1539228287412,startTime=1539228287413,finishTime=1539228289896,finalStatus=FAILED,memorySeconds=1482,vcoreSeconds=0,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=,applicationType=MAPREDUCE,resourceSeconds=1482 MB-seconds\, 0 vcore-seconds,preemptedResourceSeconds=0 MB-seconds\, 0 vcore-seconds
After examining how Hive executes the MapReduce job on YARN, I found that it first creates map.xml and reduce.xml in /tmp with permission drwx------ (only the owner can use them):
2018-10-11 10:24:45,133 INFO hdfs.StateChange: BLOCK* allocate blk_1073742318_1495, replicas=10.184.153.232:9866 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml
2018-10-11 10:24:45,225 INFO hdfs.StateChange: DIR* completeFile: /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:45,248 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml
2018-10-11 10:24:45,294 INFO hdfs.StateChange: BLOCK* allocate blk_1073742319_1496, replicas=10.184.153.232:9866 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
2018-10-11 10:24:45,411 INFO hdfs.StateChange: DIR* completeFile: /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:45,437 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
2018-10-11 10:24:45,772 INFO hdfs.StateChange: BLOCK* allocate blk_1073742320_1497, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar
2018-10-11 10:24:46,438 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,463 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar
2018-10-11 10:24:46,618 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split
2018-10-11 10:24:46,639 INFO hdfs.StateChange: BLOCK* allocate blk_1073742321_1498, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split
2018-10-11 10:24:46,706 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,791 INFO hdfs.StateChange: BLOCK* allocate blk_1073742322_1499, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.splitmetainfo
2018-10-11 10:24:46,870 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.splitmetainfo is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,971 INFO hdfs.StateChange: BLOCK* allocate blk_1073742323_1500, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.xml
2018-10-11 10:24:47,370 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:32:15,741 INFO blockmanagement.BlockManager: StorageInfo TreeSet fill ratio DS-d4c2a5a0-435d-4b44-b408-3cd04587cd09 : 1.0
But somehow YARN can't read those files when executing the job and throws "file does not exist". I did set permission 777 on /tmp, but these files are created by Hive itself during execution, so I can't do anything about them directly.
I suspect the problem is related to users or permissions when using Hive on Hadoop. What should I do about this?
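One thing that can be checked directly is what HDFS itself reports for the scratch directory Hive stages those plan files under. Below is a small Scala sketch against the Hadoop FileSystem API; the namenode address and the /tmp/hive/UUT81HC path are taken from the logs above and may differ on another setup:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://10.184.153.232:19000")  // namenode address from the logs above

val fs = FileSystem.get(conf)
// Print permission, owner and group for everything under the Hive scratch area.
fs.listStatus(new Path("/tmp/hive/UUT81HC")).foreach { st =>
  println(s"${st.getPermission} ${st.getOwner}:${st.getGroup} ${st.getPath}")
}
If the 700 permission does turn out to be the blocker, the knob that controls how Hive creates these staging directories is hive.scratch.dir.permission (default 700), which is more targeted than a chmod 777 on /tmp itself.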

Spark 1.6.2 in cluster mode on EMR

I submitted a Spark job on EMR. It runs well in client mode, but while running in cluster mode it shows the following error.
In the code I am using the SQS receiver from GitHub - imapi/spark-sqs-receiver, and I am trying to access Amazon S3 using s3a.
16/10/21 07:10:00 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 71, ip-172-31-54-127.ec2.internal, partition 0,NODE_LOCAL, 2115 bytes)
16/10/21 07:10:00 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-54-127.ec2.internal:43789 (size: 2.4 KB, free: 511.5 MB)
16/10/21 07:10:01 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 72, ip-172-31-54-127.ec2.internal, partition 1,NODE_LOCAL, 2115 bytes)
16/10/21 07:10:01 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 71, ip-172-31-54-127.ec2.internal): java.lang.ExceptionInInitializerError
at sqs.sqstest.Streaming$1.call(Streaming.java:146)
at sqs.sqstest.Streaming$1.call(Streaming.java:97)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:170)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:170)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sqs.sqstest.Streaming.<clinit>(Streaming.java:41)
... 15 more
16/10/21 07:10:01 INFO YarnClusterSchedulerBackend: Disabling executor 1.
16/10/21 07:10:01 INFO DAGScheduler: Executor lost: 1 (epoch 1)
16/10/21 07:10:01 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
16/10/21 07:10:01 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, ip-172-31-54-127.ec2.internal, 43789)
16/10/21 07:10:01 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
16/10/21 07:10:02 INFO YarnAllocator: Completed container container_1477032067457_0003_02_000002 on host: ip-172-31-54-127.ec2.internal (state: COMPLETE, exit status: 50)
16/10/21 07:10:02 WARN YarnAllocator: Container marked as failed: container_1477032067457_0003_02_000002 on host: ip-172-31-54-127.ec2.internal. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1477032067457_0003_02_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 50
16/10/21 07:10:02 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1477032067457_0003_02_000002 on host: ip-172-31-54-127.ec2.internal. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1477032067457_0003_02_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
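The pasted trace points at the likely cause: Streaming.<clinit>(Streaming.java:41) shows the JavaSparkContext being built in a static initializer, and that initializer runs again when the Streaming class is loaded inside the executor task (via Streaming$1.call), where no master URL is configured. A rough sketch of the usual restructuring, shown in Scala for brevity (class and app names are hypothetical; the same idea applies to a streaming context):
import org.apache.spark.{SparkConf, SparkContext}

object StreamingApp {
  def main(args: Array[String]): Unit = {
    // Build the context only in the driver's main(); in cluster mode spark-submit
    // supplies the master URL, so only the application name is set here.
    val sc = new SparkContext(new SparkConf().setAppName("sqs-streaming"))
    // ... set up the SQS receiver and the rest of the job here ...
    sc.stop()
  }
}
Keeping the context out of static fields that task closures can reach avoids the "A master URL must be set in your configuration" error on the executors.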

SparkR RStudio error

With SparkR in RStudio I am unable to read data and get an error.
How can I solve this?
environment
R:version 3.3.1
RStudio:Version 0.99.902
sparkR:Version 1.6.1
mac:Version 10.11.6
code
SPARK_HOME <- "/usr/local/Cellar/apache-spark/1.6.1/libexec"
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.4.0" "sparkr-shell"')
.libPaths(c(file.path(SPARK_HOME, "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local[3]", sparkHome=SPARK_HOME,
sparkEnvir=list(spark.driver.maemory="6g",
sparkPackages="com.databricks:spark-csv_2.10:1.4.0"))
sqlContext <- sparkRSQL.init(sc)
WARN
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
code
df <- read.df(sqlContext, "iris.csv", source="com.databricks.spark.csv", inferSchema="true")
WARN
WARN : Your hostname, xxxx-no-MacBook-Pro.local resolves to a loopback/non-reachable address: fe80:0:0:0:701f:d8ff:fe34:fd1%8, but we couldn't find any external IP address!
ERROR
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
WARN
16/07/20 14:00:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
ERROR
16/07/20 14:00:44 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
16/07/20 14:00:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/07/20 14:00:44 INFO TaskSchedulerImpl: Cancelling stage 0
16/07/20 14:00:44 INFO DAGScheduler: ResultStage 0 (first at CsvRelation.scala:267) failed in 60.099 s
16/07/20 14:00:44 INFO DAGScheduler: Job 0 failed: first at CsvRelation.scala:267, took 60.168711 s
16/07/20 14:00:44 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
I don't understand how to solve this.
Please help.
Try this
Sys.setenv(SPARK_HOME="/usr/local/Cellar/apache-spark/1.6.1/libexec")
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.4.0" "sparkr-shell"')
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R","lib")))
sc <- sparkR.init(master="local", sparkEnvir = list(spark.driver.memory="4g", spark.executor.memory="6g"))
sqlContext <- sparkRSQL.init(sc)
It works for me.

Hadoop, Apache Spark

I have installed Spark on Windows. I am trying to load a text file from the D: drive. The RDD is created normally, but when I perform any action on it I receive an error. I have tried all combinations of slashes without success.
scala> val file = sc.textFile("D:\\file\\file1.txt")
15/12/16 07:53:51 INFO MemoryStore: ensureFreeSpace(175321) called with curMem=401474, maxMem=280248975
15/12/16 07:53:51 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 171.2 KB, free 266.7 MB)
15/12/16 07:53:51 INFO MemoryStore: ensureFreeSpace(25432) called with curMem=576795, maxMem=280248975
15/12/16 07:53:51 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.8 KB, free 266.7 MB)
15/12/16 07:53:51 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:51963 (size: 24.8 KB, free: 267.2 MB)
15/12/16 07:53:51 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/12/16 07:53:51 INFO SparkContext: Created broadcast 2 from textFile at <console>:21
file: org.apache.spark.rdd.RDD[String] = D:\file\file1.txt MapPartitionsRDD[5] at textFile at <console>:21
The RDD is created normally, but when I try to perform any action on it I receive the error below:
scala> file.count()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/D:/file/file1.txt
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC.<init>(<console>:39)
at $iwC.<init>(<console>:41)
at <init>(<console>:43)
at .<init>(<console>:47)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
scala>
You have to use sc.textFile("file:///...").
By default Spark may look in HDFS; by using the file:// protocol you tell it to use the local file system.
On Windows, I got this command working:
sc.textFile("file:\\C:\\Users\\data.txt").count()
For your case, try sc.textFile("file:\\D:\\file\\file1.txt"). Also check that D:\file\file1.txt exists and that you have read permission on it; you can open the file browser and look at the permissions on the file directory and on file1.txt.
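The forward-slash URI form also works on Windows and avoids escaping backslashes; a minimal check in the shell (assuming the file really is at D:\file\file1.txt):
val file = sc.textFile("file:///D:/file/file1.txt")
file.count()   // should return the number of lines instead of an InvalidInputException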

Spark HbaseRDD giving Exception

I am trying to read from HBase using the following code:
JavaPairRDD<ImmutableBytesWritable, Result> pairRdd = ctx
.newAPIHadoopRDD(conf, TableInputFormat.class,
ImmutableBytesWritable.class,
org.apache.hadoop.hbase.client.Result.class).cache().cache();
System.out.println(pairRdd.count());
But I am getting this exception:
java.lang.IllegalStateException: unread block data
The full setup code is below:
SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
sparkConf.set("spark.master","spark://192.168.50.247:7077");
/* String [] stjars={"/home/BreakDown/SparkDemo2/target/SparkDemo2-0.0.1-SNAPSHOT.jar"};
sparkConf.setJars(stjars);*/
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaSQLContext sqlCtx = new JavaSQLContext(ctx);
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.master","192.168.50.73:60000");
conf.set("hbase.zookeeper.quorum","192.168.50.73");
conf.set("hbase.zookeeper.property.clientPort","2181");
conf.set("zookeeper.session.timeout","6000");
conf.set("zookeeper.recovery.retry","1");
conf.set("hbase.mapreduce.inputtable","employee11");
Any pointer will be of great help.
Versions: Spark 1.1.1 (Hadoop 2 build), Hadoop 2.2.0, HBase 0.98.8-hadoop2.
Please find the stack trace below:
14/12/17 21:18:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/17 21:18:46 INFO AppClient$ClientActor: Connecting to master spark://192.168.50.247:7077...
14/12/17 21:18:46 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/12/17 21:18:46 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141217211846-0035
14/12/17 21:18:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.50.253, ANY, 1256 bytes)
14/12/17 21:18:47 INFO BlockManagerMasterActor: Registering block manager 192.168.50.253:41717 with 265.4 MB RAM, BlockManagerId(0, 192.168.50.253, 41717, 0)
14/12/17 21:18:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.50.253): java.lang.IllegalStateException: unread block data
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2420)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1380)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:160)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:724)
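One thing worth ruling out, hinted at by the commented-out setJars call in the snippet above: "unread block data" during task deserialization is frequently reported when the executors are missing the application and HBase client classes that the driver serialized against. Below is a hedged sketch, written in Scala for brevity, of shipping those jars explicitly; the HBase jar paths are hypothetical and need to match the local installation:
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
  .setAppName("JavaSparkSQL")
  .setMaster("spark://192.168.50.247:7077")
  // Ship the application jar plus the HBase client jars so the executors can
  // load the HBase classes (TableInputFormat, Result, ...) referenced by the job.
  .setJars(Seq(
    "/home/BreakDown/SparkDemo2/target/SparkDemo2-0.0.1-SNAPSHOT.jar",
    "/path/to/hbase-client-0.98.8-hadoop2.jar",    // hypothetical paths
    "/path/to/hbase-common-0.98.8-hadoop2.jar",
    "/path/to/hbase-protocol-0.98.8-hadoop2.jar"
  ))
val sc = new SparkContext(sparkConf)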
