I'm trying to aggregate some data in an Oozie workflow, but the aggregation step fails.
I found two points of interest in the logs. The first is an error (?) that seems to occur repeatedly: after a container finishes, it gets killed and exits with the non-zero exit code 143.
First it finishes:
2015-05-04 15:35:12,013 INFO [IPC Server handler 7 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000048_0 is : 0.7231312
2015-05-04 15:35:12,015 INFO [IPC Server handler 19 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000048_0 is : 1.0
And then it gets killed by the ApplicationMaster:
2015-05-04 15:35:13,831 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1430730089455_0009_m_000048_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
The second point of interest is the actual error that crashes the job completely. It happens in the reduce phase; I'm not sure whether the two are related:
2015-05-04 15:35:28,767 INFO [IPC Server handler 20 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000051_0 is : 0.31450257
2015-05-04 15:35:29,930 INFO [IPC Server handler 10 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000052_0 is : 0.19511986
2015-05-04 15:35:31,549 INFO [IPC Server handler 1 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000050_0 is : 0.5324404
2015-05-04 15:35:31,771 INFO [IPC Server handler 28 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000051_0 is : 0.31450257
2015-05-04 15:35:31,890 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Error communicating with RM: Resource Manager doesn't recognize AttemptId: application_1430730089455_0009
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Resource Manager doesn't recognize AttemptId: application_1430730089455_0009
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:675)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:244)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:282)
at java.lang.Thread.run(Thread.java:695)
Caused by: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1430730089455_0009_000001 doesn't exist in ApplicationMasterService cache.
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy36.allocate(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:188)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:667)
... 3 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1430730089455_0009_000001 doesn't exist in ApplicationMasterService cache.
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy35.allocate(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 11 more
After that, the oozie:launcher job and the job that got the error just sit there indefinitely with STATE: accepted, FINALSTATUS: undefined and TRACKING UI: unassigned.
Does anyone know what is causing this error and how I can fix it?
The same workflow worked before, and I can't say that I changed anything in between...
Just in case somebody else stumbles upon this error: it seems this was caused by Hadoop running out of disk space... a pretty cryptic error for something that simple. I thought ~90 GB would be enough to work on my 30 GB dataset; I was wrong.
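If you want to rule this out quickly, here is a minimal sketch (not part of the original workflow, just an illustration) that uses the standard Hadoop FileSystem API to print how much HDFS space is left before a job is launched:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

// Minimal sketch: print HDFS capacity/usage so a nearly full cluster
// is obvious before the aggregation step is even started.
public class DfsSpaceCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FsStatus status = fs.getStatus();
        long gb = 1024L * 1024L * 1024L;
        System.out.printf("capacity=%d GB, used=%d GB, remaining=%d GB%n",
                status.getCapacity() / gb,
                status.getUsed() / gb,
                status.getRemaining() / gb);
    }
}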
I was also facing a similar issue. In my case the task completed successfully, but it took a very long time to finish and there were very large logs on the JobTracker; most of the log entries were progress reports such as the following:
2015-05-04 15:35:12,013 INFO [IPC Server handler 7 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000048_0 is : 0.7231312
2015-05-04 15:35:12,015 INFO [IPC Server handler 19 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000048_0 is : 1.0
While debugging my code I found that the distributed cache was being read inside the map function of the Mapper class. After moving the logic that reads the distributed-cache data from the map function to the configure method of the Mapper class, my issue was resolved.
The configure method should be used for this kind of operation, i.e. anything related to configuration or code that needs to run only once before the map function is called, as in the sketch below.
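A minimal sketch of that change, assuming the old org.apache.hadoop.mapred API and a tab-separated cache file (the class name and file format are illustrative, not from my actual job):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CacheAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    // Runs once per task JVM, before any map() call: read the cache file here.
    @Override
    public void configure(JobConf job) {
        try {
            Path[] cacheFiles = DistributedCache.getLocalCacheFiles(job);
            if (cacheFiles != null && cacheFiles.length > 0) {
                BufferedReader reader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String[] parts = line.split("\t", 2);
                        lookup.put(parts[0], parts.length > 1 ? parts[1] : "");
                    }
                } finally {
                    reader.close();
                }
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to read distributed cache", e);
        }
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Only look values up here; no per-record re-reading of the cache file.
        String enriched = lookup.get(value.toString());
        if (enriched != null) {
            output.collect(new Text(enriched), new IntWritable(1));
        }
    }
}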
Related
I use Windows 10, and the NodeManager is also not starting correctly. I see the following errors:
The ResourceManager is not connecting and fails with:
2021-07-07 11:01:52,473 ERROR delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2021-07-07 11:01:52,493 INFO handler.ContextHandler: Stopped o.e.j.w.WebAppContext#756b58a7{/,null,UNAVAILABLE}{/cluster}
2021-07-07 11:01:52,504 INFO server.AbstractConnector: Stopped ServerConnector#633a2e99{HTTP/1.1,[http/1.1]}{0.0.0.0:8088}
2021-07-07 11:01:52,504 INFO handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler#7b420819{/static,jar:file:/F:/hadoop_new/share/hadoop/yarn/hadoop-yarn-common-3.2.1.jar!/webapps/static,UNAVAILABLE}
2021-07-07 11:01:52,507 INFO handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler#c9d0d6{/logs,file:///F:/hadoop_new/logs/,UNAVAILABLE}
2021-07-07 11:01:52,541 INFO ipc.Server: Stopping server on 8033
2021-07-07 11:01:52,543 INFO ipc.Server: Stopping IPC Server listener on 8033
2021-07-07 11:01:52,544 INFO resourcemanager.ResourceManager: Transitioning to standby state
2021-07-07 11:01:52,544 INFO ipc.Server: Stopping IPC Server Responder
2021-07-07 11:01:52,550 INFO resourcemanager.ResourceManager: Transitioned to standby state
2021-07-07 11:01:52,554 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: 5: Access is denied.
and
2021-07-07 11:01:51,625 INFO recovery.RMStateStore: Storing RMDTMasterKey.
2021-07-07 11:01:52,158 INFO store.AbstractFSNodeStore: Created store directory :file:/tmp/hadoop-yarn-Abby/node-attribute
2021-07-07 11:01:52,186 INFO service.AbstractService: Service NodeAttributesManagerImpl failed in state STARTED
5: Access is denied.
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileOutputStreamWithMode(NativeIO.java:595)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:246)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:232)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:331)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:320)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:160)
at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.FileSystemNodeAttributeStore.recover(FileSystemNodeAttributeStore.java:95)
at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.NodeAttributesManagerImpl.initNodeAttributeStore(NodeAttributesManagerImpl.java:140)
at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.NodeAttributesManagerImpl.serviceStart(NodeAttributesManagerImpl.java:123)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:895)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1262)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1303)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1299)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1350)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1535)
2021-07-07 11:01:52,212 INFO service.AbstractService: Service RMActiveServices failed in state STARTED
org.apache.hadoop.service.ServiceStateException: 5: Access is denied.
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:895)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1262)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1303)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1299)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1350)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1535)
Caused by: 5: Access is denied.
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileOutputStreamWithMode(NativeIO.java:595)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:246)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:232)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:331)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:320)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:160)
at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.FileSystemNodeAttributeStore.recover(FileSystemNodeAttributeStore.java:95)
at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.NodeAttributesManagerImpl.initNodeAttributeStore(NodeAttributesManagerImpl.java:140)
at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.NodeAttributesManagerImpl.serviceStart(NodeAttributesManagerImpl.java:123)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
... 13 more
You are getting an access-denied error, so you probably need to run with another user. Try starting the services with a user that has more privileges, such as Administrator on Windows.
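If running everything as Administrator is not an option, you can first confirm it really is a file-permission problem by probing the store directory from the stack trace above (the path below is taken from the log; this is only a diagnostic sketch, not a fix):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Diagnostic sketch: try to create a file where the ResourceManager's
// node-attribute store failed ("file:/tmp/hadoop-yarn-Abby/node-attribute").
public class WriteProbe {
    public static void main(String[] args) throws IOException {
        File storeDir = new File("/tmp/hadoop-yarn-Abby/node-attribute");
        if (!storeDir.exists() && !storeDir.mkdirs()) {
            System.out.println("Cannot even create " + storeDir);
            return;
        }
        File probe = new File(storeDir, "probe.tmp");
        try (FileOutputStream out = new FileOutputStream(probe)) {
            out.write(0); // succeeds only if the current user has write access
            System.out.println("Write access OK for " + storeDir);
        } finally {
            probe.delete();
        }
    }
}

If the probe fails for the user that starts YARN, either grant that user write access to the directory or point the node-attribute store at a writable location in yarn-site.xml.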
We have created a high-availability Hadoop cluster with Azure Blob Storage as the default file system instead of HDFS, by following https://hadoop.apache.org/docs/stable/hadoop-azure/index.html
The Hive Thrift service started successfully, but the Spark Thrift service did not start.
I am able to use spark-shell and connect to the blob storage by referencing the hadoop-azure.jar file, but I cannot start the Thrift service.
Command used to start the Spark Thrift server:
spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --master yarn
The error details are as follows:
17/04/26 10:19:32 INFO metastore: Connected to metastore.
Exception in thread "main" java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:47)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:81)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
... 22 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
... 27 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
... 35 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
... 40 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192)
... 48 more
Caused by: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1930)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1592)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 49 more
Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
at org.apache.hadoop.fs.azure.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:128)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1909)
... 54 more
Caused by: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: OK
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:178)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:273)
at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
... 56 more
Caused by: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration
at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParser(Unknown Source)
at com.microsoft.azure.storage.core.Utility.getSAXParser(Utility.java:546)
at com.microsoft.azure.storage.blob.BlobListHandler.getBlobList(BlobListHandler.java:72)
at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1253)
at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1217)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:148)
... 57 more
17/04/26 10:19:33 INFO SparkContext: Invoking stop() from shutdown hook
17/04/26 10:19:33 INFO SparkUI: Stopped Spark web UI at http://10.0.0.4:4040
17/04/26 10:19:33 INFO YarnClientSchedulerBackend: Interrupting monitor thread
17/04/26 10:19:33 INFO YarnClientSchedulerBackend: Shutting down all executors
17/04/26 10:19:33 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/04/26 10:19:33 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
17/04/26 10:19:33 INFO YarnClientSchedulerBackend: Stopped
17/04/26 10:19:33 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/04/26 10:19:33 INFO MemoryStore: MemoryStore cleared
17/04/26 10:19:33 INFO BlockManager: BlockManager stopped
17/04/26 10:19:33 INFO BlockManagerMaster: BlockManagerMaster stopped
17/04/26 10:19:33 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/04/26 10:19:33 INFO SparkContext: Successfully stopped SparkContext
17/04/26 10:19:33 INFO ShutdownHookManager: Shutdown hook called
17/04/26 10:19:33 INFO ShutdownHookManager: Deleting directory C:\Users\labuser\AppData\Local\Temp\2\spark-11c406ec-2c53-4042-b336-9d1164c3c6f9
17/04/26 10:19:33 INFO MetricsSystemImpl: Stopping azure-file-system metrics system...
17/04/26 10:19:33 INFO MetricsSystemImpl: azure-file-system metrics system stopped.
17/04/26 10:19:33 INFO MetricsSystemImpl: azure-file-system metrics system shutdown complete.
Please help me resolve this issue. Any help would be greatly appreciated.
I have an Elastic MapReduce job which uses elasticsearch-hadoop via scalding-taps to transfer data from Amazon S3 to Amazon Elasticsearch Service. For a long time this job ran successfully. However, it has recently started failing with the following stack trace:
2016-03-02 07:28:34,003 FATAL [IPC Server handler 0 on 41019] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1456902751849_0012_m_000000_0 - exited : cascading.tuple.TupleException: unable to sink into output identifier: myindex/mytable
at cascading.tuple.TupleEntrySchemeCollector.collect(TupleEntrySchemeCollector.java:160)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:95)
at cascading.tuple.TupleEntrySchemeCollector.add(TupleEntrySchemeCollector.java:134)
at cascading.flow.stream.SinkStage.receive(SinkStage.java:90)
at cascading.flow.stream.SinkStage.receive(SinkStage.java:37)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:80)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
at com.twitter.scalding.MapFunction.operate(Operations.scala:59)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
We have enabled the "es.nodes.wan.only" setting.
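For context, this is roughly how the connector is configured; apart from es.nodes.wan.only, the values below are placeholders rather than our exact settings:

import org.apache.hadoop.conf.Configuration;

// Sketch of the elasticsearch-hadoop settings in play; endpoint, port and index are placeholders.
public class EsSinkConfig {
    public static Configuration build() {
        Configuration conf = new Configuration();
        conf.set("es.resource", "myindex/mytable");
        conf.set("es.nodes", "search-mydomain.us-east-1.es.amazonaws.com"); // placeholder endpoint
        conf.set("es.port", "443");
        conf.set("es.net.ssl", "true");
        conf.set("es.nodes.wan.only", "true"); // only contact the configured endpoint, no node discovery
        return conf;
    }
}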
What could be causing this failure?
I have set up my first simple Hadoop multi-node cluster (I am a newbie) consisting of two nodes: a master node (also acting as a slave node) and a dedicated slave node.
I am running the Hadoop "word count" example job, which finishes successfully. However, when I look in the logs I can see an error. This is from the TaskTracker log on the slave node:
2013-04-12 00:30:16,436 WARN org.apache.hadoop.mapred.TaskTracker: Failed validating JVM
java.io.IOException: JvmValidate Failed. Ignoring request from task:
attempt_201304112309_0006_m_000003_0, with JvmId: jvm_201304112309_0006_m_1485441759
at org.apache.hadoop.mapred.TaskTracker.validateJVM(TaskTracker.java:3434)
at org.apache.hadoop.mapred.TaskTracker.statusUpdate(TaskTracker.java:3504)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
2013-04-12 00:30:16,515 WARN org.apache.hadoop.mapred.DefaultTaskController: Exit code from task is : 143
2013-04-12 00:30:16,515 INFO org.apache.hadoop.mapred.DefaultTaskController: Output from DefaultTaskController's launchTask follows:
2013-04-12 00:30:16,515 INFO org.apache.hadoop.mapred.TaskController:
2013-04-12 00:30:16,516 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201304112309_0006_m_1485441759 exited with exit code 143. Number of tasks it ran: 1
Does anyone know what is happening here?
Sorry about the ugly code indentation; I couldn't figure out how to do it properly. This is my first post.
It looks like something killed your process, but since you mentioned that the job finished, Hadoop's resilience features kicked in and either ran the task on another node or retried it on the same one, getting your word count done either way. Check out the answer to JVM error code 143.
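As a side note (my own addition, not from the linked answer): exit code 143 is the usual 128 + signal-number encoding, so it corresponds to SIGTERM (15), i.e. the task JVM was told to stop rather than crashing on its own:

// 143 = 128 + 15 (SIGTERM): the task JVM was asked to terminate, it did not crash by itself.
public class ExitCode {
    public static void main(String[] args) {
        int exitCode = 143;
        int signal = exitCode - 128;
        System.out.println("Exit code " + exitCode + " => killed by signal " + signal + " (SIGTERM)");
    }
}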
I have been running a Hadoop job (the word count example) a few times on my two-node cluster setup, and it has been working fine up until now. Now I keep getting a RuntimeException which stalls the reduce process at 19%:
2013-04-13 18:45:22,191 INFO org.apache.hadoop.mapred.Task: Task:attempt_201304131843_0001_m_000000_0 is done. And is in the process of commiting
2013-04-13 18:45:22,299 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201304131843_0001_m_000000_0' done.
2013-04-13 18:45:22,318 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-13 18:45:23,181 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error while running command to get file permissions : org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:710)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:443)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)
at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:468)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)
at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Does anyone have any idea what might be causing this?
Edit: I solved it myself.
If anyone else runs into the same problem: this was caused by the /etc/hosts file on the master node. I hadn't entered the hostname and address of the slave node.
This is how my hosts file is structured on the master node:
127.0.0.1 MyUbuntuServer
192.xxx.x.xx2 master
192.xxx.x.xx3 MySecondUbuntuServer
192.xxx.x.xx3 slave
A similar problem is described here:
http://comments.gmane.org/gmane.comp.apache.mahout.user/8898
The information there might relate to another version of Hadoop. It says:
java.lang.RuntimeException: Error while running command to
get file permissions : java.io.IOException: Cannot run program
"/bin/ls": error=12, Not enough space
The solution there was to increase the heap size through mapred.child.java.opts: -Xmx1200M
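For completeness, a minimal sketch of setting that property on the old mapred API (the class is only an illustration):

import org.apache.hadoop.mapred.JobConf;

public class HeapSizeExample {
    // Sketch: raise the heap of the child task JVMs, as suggested in the linked thread.
    public static JobConf withLargerChildHeap() {
        JobConf conf = new JobConf();
        conf.set("mapred.child.java.opts", "-Xmx1200m");
        return conf;
    }
}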
See also: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/BHGYJDNKMGE