I am running spark-submit in YARN client mode. YARN has been set up on the HDP sandbox with Kerberos enabled, and the HDP sandbox is running in a Docker container on a Mac host.
When spark-submit is run from within the sandbox's Docker container it runs successfully, but when it is run from the host machine it fails immediately after the ACCEPTED state with this error:
19/07/28 00:41:21 INFO yarn.Client: Application report for application_1564298049378_0008 (state: ACCEPTED)
19/07/28 00:41:22 INFO yarn.Client: Application report for application_1564298049378_0008 (state: ACCEPTED)
19/07/28 00:41:23 INFO yarn.Client: Application report for application_1564298049378_0008 (state: FAILED)
19/07/28 00:41:23 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1564298049378_0008 failed 2 times due to AM Container for appattempt_1564298049378_0008_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: (Client.java:1558)
... 37 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
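For context, the submission from the host looks essentially like the following; the class, jar, and config path are placeholders rather than the exact values used:
# obtain a Kerberos ticket for the submitting user on the host first
$ kinit santosh
# point the client at the Hadoop/YARN configs copied from the sandbox (placeholder path)
$ export HADOOP_CONF_DIR=/path/to/sandbox-client-conf
$ spark-submit --master yarn --deploy-mode client \
    --class org.apache.spark.examples.SparkPi \
    /path/to/spark-examples.jar 10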
I could not find any more information about the failure. Any help will be greatly appreciated.
Here is the ResourceManager log:
2019-07-28 22:39:04,654 INFO resourcemanager.ClientRMService (ClientRMService.java:getNewApplicationId(341)) - Allocated new applicationId: 20
2019-07-28 22:39:10,982 INFO capacity.CapacityScheduler (CapacityScheduler.java:checkAndGetApplicationPriority(2526)) - Application 'application_1564332457320_0020' is submitted without priority hence considering default queue/cluster priority: 0
2019-07-28 22:39:10,982 INFO capacity.CapacityScheduler (CapacityScheduler.java:checkAndGetApplicationPriority(2547)) - Priority '0' is acceptable in queue : santosh for application: application_1564332457320_0020
2019-07-28 22:39:10,983 WARN rmapp.RMAppImpl (RMAppImpl.java:(473)) - The specific max attempts: 0 for application: 20 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2019-07-28 22:39:10,983 INFO collector.TimelineCollectorManager (TimelineCollectorManager.java:putIfAbsent(142)) - the collector for application_1564332457320_0020 was added
2019-07-28 22:39:10,984 INFO resourcemanager.ClientRMService (ClientRMService.java:submitApplication(648)) - Application with id 20 submitted by user santosh
2019-07-28 22:39:10,984 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleAppSubmitEvent(458)) - application_1564332457320_0020 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.50.1:8020, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh#XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20)
2019-07-28 22:39:11,011 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= [Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.50.1:8020, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh#XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20);exp=1564439951007; apps=[application_1564332457320_0020]]
2019-07-28 22:39:11,011 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:setTimerForTokenRenewal(613)) - Renew Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.50.1:8020, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh#XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20);exp=1564439951007; apps=[application_1564332457320_0020] in 86399996 ms, appId = [application_1564332457320_0020]
2019-07-28 22:39:11,011 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1259)) - Storing application with id application_1564332457320_0020
2019-07-28 22:39:11,012 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(912)) - application_1564332457320_0020 State change from NEW to NEW_SAVING on event = START
2019-07-28 22:39:11,012 INFO recovery.RMStateStore (RMStateStore.java:transition(222)) - Storing info for app: application_1564332457320_0020
2019-07-28 22:39:11,022 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(912)) - application_1564332457320_0020 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2019-07-28 22:39:11,022 INFO capacity.ParentQueue (ParentQueue.java:addApplication(494)) - Application added - appId: application_1564332457320_0020 user: santosh leaf-queue of parent: root #applications: 1
2019-07-28 22:39:11,023 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplication(990)) - Accepted application application_1564332457320_0020 from user: santosh, in queue: santosh
2019-07-28 22:39:11,023 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(912)) - application_1564332457320_0020 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2019-07-28 22:39:11,023 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(479)) - Registering app attempt : appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,024 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from NEW to SUBMITTED on event = START
2019-07-28 22:39:11,024 INFO capacity.LeafQueue (LeafQueue.java:activateApplications(911)) - Application application_1564332457320_0020 from user: santosh activated in queue: santosh
2019-07-28 22:39:11,025 INFO capacity.LeafQueue (LeafQueue.java:addApplicationAttempt(941)) - Application added - appId: application_1564332457320_0020 user: santosh, leaf-queue: santosh #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2019-07-28 22:39:11,025 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplicationAttempt(1036)) - Added Application Attempt appattempt_1564332457320_0020_000001 to scheduler from user santosh in queue santosh
2019-07-28 22:39:11,028 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2019-07-28 22:39:11,033 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1564332457320_0020_000001 container=null queue=santosh clusterResource= type=OFF_SWITCH requestedPartition=
2019-07-28 22:39:11,034 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_01_000001 Container Transitioned from NEW to ALLOCATED
2019-07-28 22:39:11,035 INFO fica.FiCaSchedulerNode (FiCaSchedulerNode.java:allocateContainer(169)) - Assigned container container_e20_1564332457320_0020_01_000001 of capacity on host sandbox-hdp.hortonworks.com:45454, which has 1 containers, used and available after allocation
2019-07-28 22:39:11,038 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken for nodeId : sandbox-hdp.hortonworks.com:45454 for container : container_e20_1564332457320_0020_01_000001
2019-07-28 22:39:11,043 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2019-07-28 22:39:11,043 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:clearNodeSetForAttempt(146)) - Clear node set for appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,044 INFO capacity.ParentQueue (ParentQueue.java:apply(1332)) - assignedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
2019-07-28 22:39:11,044 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2890)) - Allocation proposal accepted
2019-07-28 22:39:11,044 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:storeAttempt(2213)) - Storing attempt: AppId: application_1564332457320_0020 AttemptId: appattempt_1564332457320_0020_000001 MasterContainer: Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ]
2019-07-28 22:39:11,051 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2019-07-28 22:39:11,057 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2019-07-28 22:39:11,060 INFO amlauncher.AMLauncher (AMLauncher.java:run(307)) - Launching masterappattempt_1564332457320_0020_000001
2019-07-28 22:39:11,068 INFO amlauncher.AMLauncher (AMLauncher.java:launch(109)) - Setting up container Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,069 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for ApplicationAttempt: appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,069 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,265 INFO amlauncher.AMLauncher (AMLauncher.java:launch(130)) - Done launching container Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,265 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2019-07-28 22:39:11,852 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:updateAppCollectorsMap(713)) - Update collector information for application application_1564332457320_0020 with new address: sandbox-hdp.hortonworks.com:35197 timestamp: 1564332457320, 36
2019-07-28 22:39:11,854 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_01_000001 Container Transitioned from ACQUIRED to RUNNING
2019-07-28 22:39:12,833 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=yarn.async.batch.hdfs, interval=01:11.979 minutes, events=162, succcessCount=162, totalEvents=17347, totalSuccessCount=17347
2019-07-28 22:39:12,834 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:logJSON(179)) - Flushing HDFS audit. Event Size:1
2019-07-28 22:39:12,857 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:updateAppCollectorsMap(713)) - Update collector information for application application_1564332457320_0020 with new address: sandbox-hdp.hortonworks.com:35197 timestamp: 1564332457320, 37
2019-07-28 22:39:14,054 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_01_000001 Container Transitioned from RUNNING to COMPLETED
2019-07-28 22:39:14,055 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1412)) - Updating application attempt appattempt_1564332457320_0020_000001 with final state: FAILED, and exit status: -1000
2019-07-28 22:39:14,055 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
2019-07-28 22:39:14,066 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:unregisterAttempt(496)) - Unregistering app attempt : appattempt_1564332457320_0020_000001
2019-07-28 22:39:14,066 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application finished, removing password for appattempt_1564332457320_0020_000001
2019-07-28 22:39:14,066 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2019-07-28 22:39:14,067 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1538)) - The number of failed attempts is 1. The max attempts is 2
2019-07-28 22:39:14,067 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(479)) - Registering app attempt : appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,067 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from NEW to SUBMITTED on event = START
2019-07-28 22:39:14,067 INFO capacity.CapacityScheduler (CapacityScheduler.java:doneApplicationAttempt(1085)) - Application Attempt appattempt_1564332457320_0020_000001 is done. finalState=FAILED
2019-07-28 22:39:14,067 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(159)) - Application application_1564332457320_0020 requests cleared
2019-07-28 22:39:14,067 INFO capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(1003)) - Application removed - appId: application_1564332457320_0020 user: santosh queue: santosh #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2019-07-28 22:39:14,068 INFO capacity.LeafQueue (LeafQueue.java:activateApplications(911)) - Application application_1564332457320_0020 from user: santosh activated in queue: santosh
2019-07-28 22:39:14,068 INFO capacity.LeafQueue (LeafQueue.java:addApplicationAttempt(941)) - Application added - appId: application_1564332457320_0020 user: santosh, leaf-queue: santosh #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2019-07-28 22:39:14,068 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplicationAttempt(1036)) - Added Application Attempt appattempt_1564332457320_0020_000002 to scheduler from user santosh in queue santosh
2019-07-28 22:39:14,068 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2019-07-28 22:39:14,074 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1564332457320_0020_000002 container=null queue=santosh clusterResource= type=OFF_SWITCH requestedPartition=
2019-07-28 22:39:14,074 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_02_000001 Container Transitioned from NEW to ALLOCATED
2019-07-28 22:39:14,075 INFO fica.FiCaSchedulerNode (FiCaSchedulerNode.java:allocateContainer(169)) - Assigned container container_e20_1564332457320_0020_02_000001 of capacity on host sandbox-hdp.hortonworks.com:45454, which has 1 containers, used and available after allocation
2019-07-28 22:39:14,075 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken for nodeId : sandbox-hdp.hortonworks.com:45454 for container : container_e20_1564332457320_0020_02_000001
2019-07-28 22:39:14,076 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_02_000001 Container Transitioned from ALLOCATED to ACQUIRED
2019-07-28 22:39:14,076 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:clearNodeSetForAttempt(146)) - Clear node set for appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,076 INFO capacity.ParentQueue (ParentQueue.java:apply(1332)) - assignedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
2019-07-28 22:39:14,076 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2890)) - Allocation proposal accepted
2019-07-28 22:39:14,076 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:storeAttempt(2213)) - Storing attempt: AppId: application_1564332457320_0020 AttemptId: appattempt_1564332457320_0020_000002 MasterContainer: Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ]
2019-07-28 22:39:14,077 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2019-07-28 22:39:14,088 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2019-07-28 22:39:14,089 INFO amlauncher.AMLauncher (AMLauncher.java:run(307)) - Launching masterappattempt_1564332457320_0020_000002
2019-07-28 22:39:14,091 INFO amlauncher.AMLauncher (AMLauncher.java:launch(109)) - Setting up container Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,092 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for ApplicationAttempt: appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,092 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,110 INFO amlauncher.AMLauncher (AMLauncher.java:launch(130)) - Done launching container Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,110 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2019-07-28 22:39:15,056 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_02_000001 Container Transitioned from ACQUIRED to RUNNING
2019-07-28 22:39:16,752 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_02_000001 Container Transitioned from RUNNING to COMPLETED
2019-07-28 22:39:16,755 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1412)) - Updating application attempt appattempt_1564332457320_0020_000002 with final state: FAILED, and exit status: -1000
2019-07-28 22:39:16,755 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
2019-07-28 22:39:16,899 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:unregisterAttempt(496)) - Unregistering app attempt : appattempt_1564332457320_0020_000002
2019-07-28 22:39:16,900 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application finished, removing password for appattempt_1564332457320_0020_000002
2019-07-28 22:39:16,900 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2019-07-28 22:39:16,900 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1538)) - The number of failed attempts is 2. The max attempts is 2
2019-07-28 22:39:16,900 INFO rmapp.RMAppImpl (RMAppImpl.java:rememberTargetTransitionsAndStoreState(1278)) - Updating application application_1564332457320_0020 with final state: FAILED
2019-07-28 22:39:16,900 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(912)) - application_1564332457320_0020 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
2019-07-28 22:39:16,900 INFO recovery.RMStateStore (RMStateStore.java:transition(260)) - Updating info for app: application_1564332457320_0020
2019-07-28 22:39:16,900 INFO capacity.CapacityScheduler (CapacityScheduler.java:doneApplicationAttempt(1085)) - Application Attempt appattempt_1564332457320_0020_000002 is done. finalState=FAILED
2019-07-28 22:39:16,901 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(159)) - Application application_1564332457320_0020 requests cleared
2019-07-28 22:39:16,901 INFO capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(1003)) - Application removed - appId: application_1564332457320_0020 user: santosh queue: santosh #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2019-07-28 22:39:16,916 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1197)) - Application application_1564332457320_0020 failed 2 times due to AM Container for appattempt_1564332457320_0020_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: (Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1389)
... 37 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:410)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:800)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:796)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:796)
... 40 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
I get the following error while running an Oozie job.
Command:
oozie job -oozie http://10.xxx.xx.xx:11000/oozie/ -log 0000017-151029172404066-oozie-oozi-W
Logs:
2015-11-24 11:50:23,469 INFO ActionStartXCommand:543 - SERVER[hostname.abc.com] USER[oozie] GROUP[-] TOKEN[] APP[sqoop-wf] JOB[0000017-151029172404066-oozie-oozi-W] ACTION[0000017-151029172404066-oozie-oozi-W#:start:] Start action [0000017-151029172404066-oozie-oozi-W#:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2015-11-24 11:50:23,470 INFO ActionStartXCommand:543 - SERVER[hostname.abc.com] USER[oozie] GROUP[-] TOKEN[] APP[sqoop-wf] JOB[0000017-151029172404066-oozie-oozi-W] ACTION[0000017-151029172404066-oozie-oozi-W#:start:] [***0000017-151029172404066-oozie-oozi-W#:start:***]Action status=DONE
2015-11-24 11:50:23,470 INFO ActionStartXCommand:543 - SERVER[hostname.abc.com] USER[oozie] GROUP[-] TOKEN[] APP[sqoop-wf] JOB[0000017-151029172404066-oozie-oozi-W] ACTION[0000017-151029172404066-oozie-oozi-W#:start:] [***0000017-151029172404066-oozie-oozi-W#:start:***]Action updated in DB!
2015-11-24 11:50:23,567 INFO ActionStartXCommand:543 - SERVER[hostname.abc.com] USER[oozie] GROUP[-] TOKEN[] APP[sqoop-wf] JOB[0000017-151029172404066-oozie-oozi-W] ACTION[0000017-151029172404066-oozie-oozi-W#sqoop-node] Start action [0000017-151029172404066-oozie-oozi-W#sqoop-node] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2015-11-24 11:50:24,323 WARN ActionStartXCommand:546 - SERVER[hostname.abc.com] USER[oozie] GROUP[-] TOKEN[] APP[sqoop-wf] JOB[0000017-151029172404066-oozie-oozi-W] ACTION[0000017-151029172404066-oozie-oozi-W#sqoop-node] Error starting action [sqoop-node]. ErrorType [NON_TRANSIENT], ErrorCode [JA002], Message [JA002: SIMPLE authentication is not enabled. Available:[TOKEN]]
org.apache.oozie.action.ActionExecutorException: JA002: SIMPLE authentication is not enabled. Available:[TOKEN]
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:418)
at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:980)
at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1135)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
at org.apache.oozie.command.XCommand.call(XCommand.java:281)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:309)
at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy45.getDelegationToken(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getRMDelegationToken(YarnClientImpl.java:486)
at org.apache.hadoop.mapred.ResourceMgrDelegate.getDelegationToken(ResourceMgrDelegate.java:174)
at org.apache.hadoop.mapred.YARNRunner.getDelegationToken(YARNRunner.java:221)
at org.apache.hadoop.mapreduce.Cluster.getDelegationToken(Cluster.java:400)
at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1240)
at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1237)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.JobClient.getDelegationToken(JobClient.java:1236)
at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:439)
at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1178)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
... 10 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN]
at org.apache.hadoop.ipc.Client.call(Client.java:1469)
at org.apache.hadoop.ipc.Client.call(Client.java:1400)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy44.getDelegationToken(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:306)
... 29 more
2015-11-24 11:50:24,324 WARN ActionStartXCommand:546 - SERVER[hostname.abc.com] USER[oozie] GROUP[-] TOKEN[] APP[sqoop-wf] JOB[0000017-151029172404066-oozie-oozi-W] ACTION[0000017-151029172404066-oozie-oozi-W#sqoop-node] Suspending Workflow Job id=0000017-151029172404066-oozie-oozi-W
In my case, I was connecting to the YARN scheduler address instead of the YARN ResourceManager address.
In your Oozie job.properties, make sure the jobTracker URL points to the YARN ResourceManager; look for "yarn.resourcemanager.address" in your yarn-site.xml, as in the sketch below.
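A minimal job.properties sketch along those lines (host names and the application path are placeholders; take the actual value of yarn.resourcemanager.address from your yarn-site.xml, whose default port is 8032, while the scheduler's default is 8030):
nameNode=hdfs://<namenode-host>:8020
# jobTracker must be the ResourceManager address (yarn.resourcemanager.address), not the scheduler address
jobTracker=<resourcemanager-host>:8032
queueName=default
oozie.wf.application.path=${nameNode}/user/<your-user>/apps/sqoop-wf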
I'm trying to write data into HDFS using Flume NG with an exec source, but it always ends with exit code 127. It also shows a warning like:
Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null).
This is the exec.conf file:
execAgent.sources=e
execAgent.channels=memchannel
execAgent.sinks=HDFS
execAgent.sources.e.type=org.apache.flume.source.ExecSource
execAgent.sources.e.channels=memchannel
execAgent.sources.e.shell=/bin/bash
execAgent.sources.e.command=tail -f /home/sample.txt
execAgent.sinks.HDFS.type=hdfs
execAgent.sinks.HDFS.channel=memchannel
execAgent.sinks.HDFS.hdfs.path=hdfs://ip:address:port/user/flume/
execAgent.sinks.HDFS.hdfs.fileType=DataStream
execAgent.sinks.HDFS.hdfs.writeFormat=Text
execAgent.channels.memchannel.type=file
execAgent.channels.memchannel.capacity=1000
execAgent.channels.memchannel.transactionCapacity=100
execAgent.sources.e.channels=memchannel
execAgent.sinks.HDFS.channel=memchannel
This is the output I'm getting on the console:
15/04/17 06:24:54 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
15/04/17 06:24:54 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:exec.conf
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: execAgent
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:55 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [execAgent]
15/04/17 06:24:55 INFO node.AbstractConfigurationProvider: Creating channels
15/04/17 06:24:55 INFO channel.DefaultChannelFactory: Creating instance of channel memchannel type file
15/04/17 06:24:55 INFO node.AbstractConfigurationProvider: Created channel memchannel
15/04/17 06:24:55 INFO source.DefaultSourceFactory: Creating instance of source e, type org.apache.flume.source.ExecSource
15/04/17 06:24:55 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
15/04/17 06:24:56 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
15/04/17 06:24:56 INFO node.AbstractConfigurationProvider: Channel memchannel connected to [e, HDFS]
15/04/17 06:24:56 INFO node.Application: Starting new configuration:{ sourceRunners:{e=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:e,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#2577d2c2 counterGroup:{ name:null counters:{} } }} channels:{memchannel=FileChannel memchannel { dataDirs: [/root/.flume/file-channel/data] }} }
15/04/17 06:24:56 INFO node.Application: Starting Channel memchannel
15/04/17 06:24:56 INFO file.FileChannel: Starting FileChannel memchannel { dataDirs: [/root/.flume/file-channel/data] }...
15/04/17 06:24:56 INFO file.Log: Encryption is not enabled
15/04/17 06:24:56 INFO file.Log: Replay started
15/04/17 06:24:56 INFO file.Log: Found NextFileID 0, from []
15/04/17 06:24:56 INFO file.EventQueueBackingStoreFile: Preallocated /root/.flume/file-channel/checkpoint/checkpoint_1429251896225 to 16232 for capacity 1000
15/04/17 06:24:56 INFO file.EventQueueBackingStoreFileV3: Starting up with /root/.flume/file-channel/checkpoint/checkpoint_1429251896225 and /root/.flume/file-channel/checkpoint/checkpoint_1429251896225.meta
15/04/17 06:24:57 INFO file.Log: Last Checkpoint Fri Apr 17 06:24:56 UTC 2015, queue depth = 0
15/04/17 06:24:57 INFO file.Log: Replaying logs with v2 replay logic
15/04/17 06:24:57 INFO file.ReplayHandler: Starting replay of []
15/04/17 06:24:57 INFO file.ReplayHandler: read: 0, put: 0, take: 0, rollback: 0, commit: 0, skip: 0, eventCount:0
15/04/17 06:24:57 INFO file.Log: Rolling /root/.flume/file-channel/data
15/04/17 06:24:57 INFO file.Log: Roll start /root/.flume/file-channel/data
15/04/17 06:24:57 INFO tools.DirectMemoryUtils: Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)
15/04/17 06:24:57 INFO tools.DirectMemoryUtils: Direct Memory Allocation: Allocation = 1048576, Allocated = 0, MaxDirectMemorySize = 18874368, Remaining = 18874368
15/04/17 06:24:57 INFO file.LogFile: Opened /root/.flume/file-channel/data/log-1
15/04/17 06:24:57 INFO file.Log: Roll end
15/04/17 06:24:57 INFO file.EventQueueBackingStoreFile: Start checkpoint for /root/.flume/file-channel/checkpoint/checkpoint_1429251896225, elements to sync = 0
15/04/17 06:24:57 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1429251897136, queueSize: 0, queueHead: 0
15/04/17 06:24:57 INFO file.Log: Updated checkpoint for file: /root/.flume/file-channel/data/log-1 position: 0 logWriteOrderID: 1429251897136
15/04/17 06:24:57 INFO file.FileChannel: Queue Size after replay: 0 [channel=memchannel]
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: memchannel, registered successfully.
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memchannel started
15/04/17 06:24:57 INFO node.Application: Starting Sink HDFS
15/04/17 06:24:57 INFO node.Application: Starting Source e
15/04/17 06:24:57 INFO source.ExecSource: Exec source starting with command:tail -f /home/sample.txt
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: HDFS, registered successfully.
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SOURCE, name: e, registered successfully.
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: e started
15/04/17 06:24:57 INFO source.ExecSource: Command [tail -f /home/brillio/sample.txt] exited with 127
From the Flume source documentation:
1) Change the parameter execAgent.sources.e.type to exec
2) Remove the execAgent.sources.e.shell parameter from your configuration
Also check permissions to make sure the user running the agent can actually run tail -f /home/brillio/sample.txt; exit code 127 from the shell means the command could not be found or executed. A corrected configuration is sketched below.
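With those two changes applied, the configuration would look roughly like this (the HDFS URL is a placeholder for your own NameNode address):
execAgent.sources=e
execAgent.channels=memchannel
execAgent.sinks=HDFS
# use the built-in exec source alias and drop the shell parameter
execAgent.sources.e.type=exec
execAgent.sources.e.command=tail -f /home/brillio/sample.txt
execAgent.sources.e.channels=memchannel
execAgent.channels.memchannel.type=file
execAgent.channels.memchannel.capacity=1000
execAgent.channels.memchannel.transactionCapacity=100
execAgent.sinks.HDFS.type=hdfs
execAgent.sinks.HDFS.channel=memchannel
execAgent.sinks.HDFS.hdfs.path=hdfs://<namenode-host>:<port>/user/flume/
execAgent.sinks.HDFS.hdfs.fileType=DataStream
execAgent.sinks.HDFS.hdfs.writeFormat=Text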
I am trying to set up an Oozie and Sqoop workflow (I want to back up MySQL data into my HDFS).
But I am stuck when I try to start my job.
I am using Hadoop 2 (with a working HDFS node) and the latest version of Oozie.
I installed the Oozie server on my computer (I want to test it before deploying it) with the HDFS config files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml in the Oozie conf/hadoop-conf dir), while my HDFS runs on a server.
I have made a basic workflow (for testing purposes, I just want to see if Sqoop is working) like this:
<workflow-app name="Sqoop" xmlns="uri:oozie:workflow:0.4">
<start to="Sqoop"/>
<action name="Sqoop">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>yarn.resourcemanager.address:8040</job-tracker>
<name-node>hdfs://hdfs-server:54310</name-node>
<command>job --list</command>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I put this workflow into my HDFS.
I have written some Java code to start my job:
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

// Submit the workflow to the local Oozie server and poll until it is no longer RUNNING
OozieClient wc = new OozieClient("http://localhost:11000/oozie");
Properties conf = wc.createConfiguration();
conf.setProperty(OozieClient.APP_PATH, "hdfs://hdfs_server:54310/hive/testSqoop/sqoop-workflow.xml");
conf.setProperty("queueName", "default");
try {
    String jobId = wc.run(conf);
    System.out.println("Workflow job submitted");
    while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
        System.out.println("Workflow job running ...");
        System.out.println("..." + wc.getJobInfo(jobId).getStatus().toString());
        Thread.sleep(10 * 1000);
    }
    System.out.println("Workflow job completed ...");
    System.out.println(wc.getJobInfo(jobId));
} catch (Exception r) {
    r.printStackTrace();
}
In the Oozie web interface I can see my job running:
2013-05-28 12:42:30,004 INFO ActionStartXCommand:539 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[0000000-130528124140043-oozie-anth-W#:start:] Start action [0000000-130528124140043-oozie-anth-W#:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2013-05-28 12:42:30,008 WARN ActionStartXCommand:542 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[0000000-130528124140043-oozie-anth-W#:start:] [***0000000-130528124140043-oozie-anth-W#:start:***]Action status=DONE
2013-05-28 12:42:30,009 WARN ActionStartXCommand:542 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[0000000-130528124140043-oozie-anth-W#:start:] [***0000000-130528124140043-oozie-anth-W#:start:***]Action updated in DB!
2013-05-28 12:42:30,192 INFO ActionStartXCommand:539 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[0000000-130528124140043-oozie-anth-W#Sqoop] Start action [0000000-130528124140043-oozie-anth-W#Sqoop] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2013-05-28 12:42:31,389 WARN SqoopActionExecutor:542 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[0000000-130528124140043-oozie-anth-W#Sqoop] credentials is null for the action
2013-05-28 12:42:42,942 INFO SqoopActionExecutor:539 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[0000000-130528124140043-oozie-anth-W#Sqoop] checking action, external ID [job_1369126414383_0003] status [RUNNING]
2013-05-28 12:42:42,945 WARN ActionStartXCommand:542 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[0000000-130528124140043-oozie-anth-W#Sqoop] [***0000000-130528124140043-oozie-anth-W#Sqoop***]Action status=RUNNING
2013-05-28 12:42:42,946 WARN ActionStartXCommand:542 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[0000000-130528124140043-oozie-anth-W#Sqoop] [***0000000-130528124140043-oozie-anth-W#Sqoop***]Action updated in DB!
2013-05-28 12:47:43,034 INFO KillXCommand:539 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[-] STARTED WorkflowKillXCommand for jobId=0000000-130528124140043-oozie-anth-W
2013-05-28 12:47:43,328 WARN CoordActionUpdateXCommand:542 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
2013-05-28 12:47:43,328 INFO KillXCommand:539 - USER[anthonyc] GROUP[-] TOKEN[] APP[Sqoop] JOB[0000000-130528124140043-oozie-anth-W] ACTION[-] ENDED WorkflowKillXCommand for jobId=0000000-130528124140043-oozie-anth-W
And when I check the YARN web interface, I can see my job, but with status FAILED and the following diagnostics:
Application application_1369126414383_0003 failed 1 times due to AM Container for appattempt_1369126414383_0003_000001 exited with exitCode: 1 due to: .Failing this attempt.. Failing the application.
I really don't know what is wrong, and I would appreciate your advice.
Thank you!
You have to inspect the job logs to understand what is happening:
$ oozie job -log <coord_job_id>
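For example, using the workflow job id from the log above and the Oozie URL from the submission code (adjust both to your own setup):
$ oozie job -oozie http://localhost:11000/oozie -log 0000000-130528124140043-oozie-anth-W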
I have a problem starting an Oozie workflow.
Config:
<workflow-app name="Hive" xmlns="uri:oozie:workflow:0.4">
<start to="Hive"/>
<action name="Hive">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>hive-default.xml</value>
</property>
</configuration>
<script>/user/hue/oozie/workspaces/hive/hive.sql</script>
<param>INPUT_TABLE=movieapp_log_json</param>
<param>OUTPUT=/user/hue/oozie/workspaces/output</param>
<file>hive-default.xml#hive-default.xml</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
File content:
[root@localhost ~]# hadoop fs -cat /user/hue/oozie/workspaces/hive/hive.sql
SELECT * FROM ${INPUT_TABLE}
And I get this error:
2013-03-11 06:53:10,196 INFO org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#:start:] Start action [0000025-130310103217365-oozie-oozi-W#:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2013-03-11 06:53:10,197 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#:start:] [0000025-130310103217365-oozie-oozi-W#:start:]Action status=DONE
2013-03-11 06:53:10,197 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#:start:] [0000025-130310103217365-oozie-oozi-W#:start:]Action updated in DB!
2013-03-11 06:53:10,351 INFO org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] Start action [0000025-130310103217365-oozie-oozi-W#Hive] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2013-03-11 06:53:11,244 WARN org.apache.oozie.action.hadoop.HiveActionExecutor: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] credentials is null for the action
2013-03-11 06:53:13,734 INFO org.apache.oozie.action.hadoop.HiveActionExecutor: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] checking action, external ID [job_201303101032_0029] status [RUNNING]
2013-03-11 06:53:13,838 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] [0000025-130310103217365-oozie-oozi-W#Hive]Action status=RUNNING
2013-03-11 06:53:13,839 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] [0000025-130310103217365-oozie-oozi-W#Hive]Action updated in DB!
2013-03-11 06:53:41,459 INFO org.apache.oozie.servlet.CallbackServlet: USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] callback for action [0000025-130310103217365-oozie-oozi-W#Hive]
2013-03-11 06:53:41,570 INFO org.apache.oozie.action.hadoop.HiveActionExecutor: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] action completed, external ID [job_201303101032_0029]
2013-03-11 06:53:41,610 WARN org.apache.oozie.action.hadoop.HiveActionExecutor: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
2013-03-11 06:53:41,807 INFO org.apache.oozie.command.wf.ActionEndXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#Hive] ERROR is considered as FAILED for SLA
2013-03-11 06:53:41,877 INFO org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#kill] Start action [0000025-130310103217365-oozie-oozi-W#kill] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2013-03-11 06:53:41,877 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#kill] [0000025-130310103217365-oozie-oozi-W#kill]Action status=DONE
2013-03-11 06:53:41,877 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[0000025-130310103217365-oozie-oozi-W#kill] [0000025-130310103217365-oozie-oozi-W#kill]Action updated in DB!
2013-03-11 06:53:42,030 WARN org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[Hive] JOB[0000025-130310103217365-oozie-oozi-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
Any ideas?
The error is from Oozie; the workflow was not launched by the coordinator. If you started Oozie as the root user, stop the service and restart Oozie as the user that installed it, then re-run the workflow (see the sketch below).
This should solve your problem.
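A minimal sketch of that restart, assuming a standard tarball install where oozied.sh lives under the Oozie bin directory (the install path and user name here are placeholders):
# stop the server if it is currently running as root
$ sudo /usr/local/oozie/bin/oozied.sh stop
# start it again as the user that installed Oozie
$ sudo -u oozie_install_user /usr/local/oozie/bin/oozied.sh start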