Spring-XD Sqoop Configuration - sqoop

We are planning to run Sqoop jobs through Spring-XD and have configured it as suggested in the Spring-XD documentation.
Below is the configuration from the servers.yml file:
spring:
hadoop:
fsUri: hdfs://nnservice
resourceManagerHost: server-m01.mydomain.com
resourceManagerPort: 8032
resourceManagerSchedulerAddress: ${spring.hadoop.resourceManagerHost}:8030
jobHistoryAddress: server-m02.mydomain.com:10020
security:
authMethod: kerberos
userPrincipal: springxd@mydomain.COM
userKeytab: /home/springxd/springxd.keytab
namenodePrincipal: nn/_HOST@mydomain.COM
rmManagerPrincipal: rm/_HOST@mydomain.COM
config:
mapreduce.framework.name: yarn
dfs.nameservices: nnservice
dfs.ha.namenodes.nnservice: nn1, nn2
dfs.namenode.rpc-address.nnservice.nn1: server-m01.mydomain.com:8020
dfs.namenode.rpc-address.nnservice.nn2: server-m02.mydomain.com:8020
dfs.client.failover.proxy.provider.nnservice: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.automatic-failover.enabled: True
mapreduce.application.framework.path: '/hdp/apps/2.4.2.0-258/mapreduce/mapreduce.tar.gz#mr-framework'
mapreduce.application.classpath: '$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/2.4.2.0-258/hadoop/lib/hadoop-lzo-0.6.0.2.4.2.0-258.jar:/etc/hadoop/conf/secure'
yarnApplicationClasspath: '$HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*'
The Sqoop jobs terminate with the error below:
2016-10-28T10:08:35-0400 1.3.1.RELEASE INFO main mapreduce.Job - Job job_1477527264038_0051 failed with state FAILED due to: Application application_1477527264038_0051 failed 2 times due to AM Container for appattempt_1477527264038_0051_000002 exited with exitCode: 1
Any idea whether we are missing some configuration?
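Exit code 1 from the AM container is generic, so the diagnostics line alone does not say what went wrong; the container logs usually do. A minimal sketch, assuming Python 3 on the submitting machine (the function name is hypothetical, not part of Spring XD or Hadoop), that pulls the application ID and exit code out of a line like the one above and prints the `yarn logs` command to inspect next:

```python
import re

# Pattern for the YARN diagnostics line (format assumed from the log in this question).
DIAG_RE = re.compile(
    r"Application (?P<app>application_\d+_\d+) failed (?P<tries>\d+) times "
    r"due to AM Container for (?P<attempt>appattempt_\d+_\d+_\d+) "
    r"exited with\s+exitCode: (?P<code>-?\d+)"
)

def parse_am_failure(line):
    """Return app id, attempt id, attempt count and exit code, or None if no match."""
    m = DIAG_RE.search(line)
    if m is None:
        return None
    return {
        "app": m.group("app"),
        "attempt": m.group("attempt"),
        "tries": int(m.group("tries")),
        "exit_code": int(m.group("code")),
    }

line = ("Job job_1477527264038_0051 failed with state FAILED due to: "
        "Application application_1477527264038_0051 failed 2 times due to "
        "AM Container for appattempt_1477527264038_0051_000002 exited with exitCode: 1")
info = parse_am_failure(line)
# Next step: read the aggregated container logs for this application.
print(f"yarn logs -applicationId {info['app']}")
```

The same pattern also accepts negative exit codes such as -1000.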

Related

Spark streaming job on YARN cluster mode stuck in accepted, then fails with a Timeout Exception

I am running a Spark Streaming application that simply reads messages from a Kafka topic, enriches them, and writes the enriched messages to another Kafka topic.
I already tried it in Standalone mode (both client and cluster deploy mode) and in YARN client mode, successfully.
When I submit the application in cluster mode it gives me the following messages:
18/01/10 12:13:34 INFO Client: Submitting application application_1515582681419_0001 to ResourceManager
18/01/10 12:13:34 INFO YarnClientImpl: Submitted application application_1515582681419_0001
18/01/10 12:13:35 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
18/01/10 12:13:35 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1515582814080
final status: UNDEFINED
tracking URL: http://ambari1.internal:8088/proxy/application_1515582681419_0001/
user: root
18/01/10 12:13:36 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
18/01/10 12:13:37 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
It stays stuck in the ACCEPTED state for around 4-5 minutes and then exits with the following error message:
18/01/10 12:17:00 INFO InputInfoTracker: remove old batch metadata: 1515583000000 ms
18/01/10 12:17:02 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/01/10 12:17:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
18/01/10 12:17:02 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
18/01/10 12:17:02 INFO ReceiverTracker: ReceiverTracker stopped
18/01/10 12:17:02 INFO JobGenerator: Stopping JobGenerator immediately
Funny fact: if I visit the page of the application, I can see that the Spark context has been started and that it processes some messages.
Could anyone help me with this?
PS: These are the resources of my YARN cluster:
The problem might be with the YARN "App Timeline Server". Try restarting it.
Are you creating your Spark session with the master set to local? Please check this.
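On the second suggestion: properties set directly in application code take precedence over flags passed to spark-submit, so a hardcoded local master would keep the driver from registering with YARN even when the job is submitted with --deploy-mode cluster. That could explain a context that starts and processes messages while the ApplicationMaster times out. A rough check, sketched in Python with only the standard library (the function name and the sample source are hypothetical), that scans application source for such a call:

```python
import re

# Matches .master("local..."), .setMaster("local...") with either quote style.
LOCAL_MASTER_RE = re.compile(
    r"""\.(?:setMaster|master)\(\s*["']local(?:\[[^\]]*\])?["']\s*\)"""
)

def find_hardcoded_local_master(source):
    """Return (line_number, line) pairs where a local master is hardcoded."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if LOCAL_MASTER_RE.search(line):
            hits.append((number, line.strip()))
    return hits

# Hypothetical driver code, only used to demonstrate the check.
example = '''
val conf = new SparkConf().setAppName("enricher").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(5))
'''
for number, text in find_hardcoded_local_master(example):
    print(f"line {number}: {text}")
```

If the scan finds a hit, removing the call and passing the master only through spark-submit is the usual fix.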

Problems using Spark 1.6.2 for Hadoop 2.6.0 in a Hadoop 2.7.1 cluster

I have access to a Hadoop cluster, version 2.7.1, that was installed using HDP 2.4. The cluster has Spark installed, specifically:
$ cat /usr/hdp/2.4.3.0-227/spark/RELEASE
Spark 1.6.2.2.4.3.0-227 built for Hadoop 2.7.1.2.4.3.0-227
I'm trying to set up a "client" machine able to remotely connect to the cluster and deploy Spark jobs. Thus, I need to install a Spark distribution matching the versions above.
First of all, I went to the official Spark download page, but 1.6.2 is only available pre-built for Hadoop 2.6.
Then I decided to download the Spark source code and build it by following this guide. Interestingly, the required build profile for Hadoop "2.6.x and later 2.x" is hadoop-2.6, i.e. if I build Spark myself, I'll obtain the same distribution as the one available on the official Spark download page.
Thus, I went with the official pre-built distribution of Spark 1.6.2 for Hadoop 2.6.0.
It does not seem to work properly. I submitted a very simple Python script that only creates a Spark context, and it fails (only the relevant parts of the log are shown):
$ ./bin/spark-submit --master yarn --deploy-mode cluster basic.py
...
17/08/28 13:08:29 INFO Client: Requesting a new application from cluster with 8 NodeManagers
17/08/28 13:08:29 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (24576 MB per container)
17/08/28 13:08:29 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
17/08/28 13:08:29 INFO Client: Setting up container launch context for our AM
17/08/28 13:08:29 INFO Client: Setting up the launch environment for our AM container
17/08/28 13:08:29 INFO Client: Preparing resources for our AM container
17/08/28 13:08:36 INFO Client: Uploading resource file:/Users/frb/Applications/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar -> hdfs://<host>:8020/user/frb/.sparkStaging/application_1495097788339_0066/spark-assembly-1.6.2-hadoop2.6.0.jar
17/08/28 13:14:40 INFO Client: Uploading resource file:basic.py -> hdfs://<host>:8020/user/frb/.sparkStaging/application_1495097788339_0066/basic.py
17/08/28 13:14:40 INFO Client: Uploading resource file:/Users/frb/Applications/spark-1.6.2-bin-hadoop2.6/python/lib/pyspark.zip -> hdfs://<host>:8020/user/frb/.sparkStaging/application_1495097788339_0066/pyspark.zip
17/08/28 13:14:41 INFO Client: Uploading resource file:/Users/frb/Applications/spark-1.6.2-bin-hadoop2.6/python/lib/py4j-0.9-src.zip -> hdfs://<host>:8020/user/frb/.sparkStaging/application_1495097788339_0066/py4j-0.9-src.zip
17/08/28 13:14:42 INFO Client: Uploading resource file:/private/var/folders/cc/p9gx2wnn3dz8g6yf_r4308fm0000gn/T/spark-0d86f1f4-d310-423a-9d2f-90e2ff46f84e/__spark_conf__3704082754178078870.zip -> hdfs://<host>:8020/user/frb/.sparkStaging/application_1495097788339_0066/__spark_conf__3704082754178078870.zip
17/08/28 13:14:42 INFO SecurityManager: Changing view acls to: frb
17/08/28 13:14:42 INFO SecurityManager: Changing modify acls to: frb
17/08/28 13:14:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(frb); users with modify permissions: Set(frb)
17/08/28 13:14:42 INFO Client: Submitting application 66 to ResourceManager
17/08/28 13:14:42 INFO YarnClientImpl: Submitted application application_1495097788339_0066
17/08/28 13:14:48 INFO Client: Application report for application_1495097788339_0066 (state: ACCEPTED)
17/08/28 13:14:48 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1503918882943
final status: UNDEFINED
tracking URL: <host>:8088/proxy/application_1495097788339_0066/
user: frb
17/08/28 13:14:49 INFO Client: Application report for application_1495097788339_0066 (state: ACCEPTED)
...
17/08/28 13:14:52 INFO Client: Application report for application_1495097788339_0066 (state: RUNNING)
17/08/28 13:14:52 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.95.120.6
ApplicationMaster RPC port: 0
queue: default
start time: 1503918882943
final status: UNDEFINED
tracking URL: <host>:8088/proxy/application_1495097788339_0066/
user: frb
17/08/28 13:14:53 INFO Client: Application report for application_1495097788339_0066 (state: RUNNING)
...
17/08/28 13:14:59 INFO Client: Application report for application_1495097788339_0066 (state: ACCEPTED)
17/08/28 13:14:59 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1503918882943
final status: UNDEFINED
tracking URL: <host>:8088/proxy/application_1495097788339_0066/
user: frb
17/08/28 13:15:00 INFO Client: Application report for application_1495097788339_0066 (state: ACCEPTED)
17/08/28 13:15:01 INFO Client: Application report for application_1495097788339_0066 (state: RUNNING)
17/08/28 13:15:01 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.95.58.21
ApplicationMaster RPC port: 0
queue: default
start time: 1503918882943
final status: UNDEFINED
tracking URL: <host>:8088/proxy/application_1495097788339_0066/
user: frb
17/08/28 13:15:02 INFO Client: Application report for application_1495097788339_0066 (state: RUNNING)
...
17/08/28 13:15:09 INFO Client: Application report for application_1495097788339_0066 (state: FINISHED)
17/08/28 13:15:09 INFO Client:
client token: N/A
diagnostics: Max number of executor failures (4) reached
ApplicationMaster host: 10.95.58.21
ApplicationMaster RPC port: 0
queue: default
start time: 1503918882943
final status: FAILED
tracking URL: <host>:8088/proxy/application_1495097788339_0066/
user: frb
Exception in thread "main" org.apache.spark.SparkException: Application application_1495097788339_0066 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/08/28 13:15:09 INFO ShutdownHookManager: Shutdown hook called
17/08/28 13:15:09 INFO ShutdownHookManager: Deleting directory /private/var/folders/cc/p9gx2wnn3dz8g6yf_r4308fm0000gn/T/spark-0d86f1f4-d310-423a-9d2f-90e2ff46f84e
If I check the logs for this job, I see that:
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
File "basic.py", line 36, in <module>
sc = SparkContext(conf=conf)
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/pyspark.zip/pyspark/context.py", line 172, in _do_init
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 1062, in __call__
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 631, in send_command
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/disk0/hadoop/yarn/local/usercache/frb/appcache/application_1495097788339_0066/container_e03_1495097788339_0066_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
I.e. the Spark context is not created: the connection fails between the JVM running the Java gateway and the Python driver creating the Spark context.
I'm sure this is related to the Spark distribution installed on my client machine, because:
The Spark distribution on my client machine is uploaded to the cluster, so it is the one used; recall this log line from the submission:
17/08/28 13:08:36 INFO Client: Uploading resource file:/Users/frb/Applications/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar -> hdfs://<host>:8020/user/frb/.sparkStaging/application_1495097788339_0066/spark-assembly-1.6.2-hadoop2.6.0.jar
The same command works when submitted from within the cluster, i.e. when using the "Spark 1.6.2.2.4.3.0-227 built for Hadoop 2.7.1.2.4.3.0-227" version of Spark installed by HDP.
Any idea about how to fix this? Thanks!
I finally solved this:
I added the option --conf spark.yarn.jars to the spark-submit command, with the location of the Spark assembly jar in the remote Spark cluster as its value. This avoids uploading the client-side Spark assembly jar I installed (which is slow and, moreover, does not exactly match the remote version).
I added the property hdp.version to the client-side yarn-site.xml, with the HDP version of the remote Hadoop-Spark cluster as its value. This avoids a substitution error in certain paths, which in the end surfaced as the connection error described in the question.
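For illustration, the first change can be sketched as a small Python helper that assembles the spark-submit command line. This is only a sketch under assumptions: the helper name is made up and the HDFS path of the assembly jar is a placeholder, not the real location on the cluster. Note that the Spark 1.6 documentation names this property spark.yarn.jar, while spark.yarn.jars is the Spark 2.x name, so the key is left configurable:

```python
import shlex

def build_spark_submit(script, assembly_jar_uri, conf_key="spark.yarn.jars",
                       deploy_mode="cluster"):
    """Assemble a spark-submit argv that reuses the cluster-side assembly jar."""
    return [
        "./bin/spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        # conf_key defaults to the name used in the answer; Spark 1.6 documents spark.yarn.jar.
        "--conf", conf_key + "=" + assembly_jar_uri,
        script,
    ]

# Placeholder URI: replace with the actual assembly jar location on the remote cluster.
ASSEMBLY = "hdfs:///path/to/spark-assembly.jar"
cmd = build_spark_submit("basic.py", ASSEMBLY)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.run as is, which avoids shell quoting problems with the jar URI.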

Remotely connect to spark on yarn cluster in client mode

I have a remote Spark-on-YARN cluster. If I use RStudio Server (the web version) hosted on that cluster, I can connect in client mode as follows:
sc <- SparkR::sparkR.init(master = "yarn-client")
However, if I use RStudio on my local machine to connect to that Spark cluster in the same way, I get errors:
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master
...
ERROR Utils: Uncaught exception in thread nioEventLoopGroup-2-2
java.lang.NullPointerException
...
ERROR RBackendHandler: createSparkContext on org.apache.spark.api.r.RRDD failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
A more detailed error message on hadoop application tracking page is like this:
User: blueivy
Name: SparkR
Application Type: SPARK
Application Tags:
State: FAILED
FinalStatus: FAILED
Started: 27-Oct-2015 11:07:09
Elapsed: 4mins, 39sec
Tracking URL: History
Diagnostics:
Application application_1445628650748_0027 failed 2 times due to AM Container for appattempt_1445628650748_0027_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://master:8088/proxy/application_1445628650748_0027/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1445628650748_0027_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:618)
at java.lang.Thread.run(Thread.java:785)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
My local Hadoop and Spark configuration and environment are the same as on the remote cluster: Spark 1.5.1, Hadoop 2.6.0 and Ubuntu 14.04. Can anyone help me find my mistake?

Submit Job in Spark using Yarn Cluster

I am unable to submit a job in yarn-cluster mode. The job runs fine with the yarn-client option. When I submit it with yarn-cluster, only this log line appears, repeated many times:
Application report for application_1421828570504_0002 (state: ACCEPTED)
and then it fails with the following exception:
diagnostics: Application application_1421828570504_0002 failed 10 times due to AM Container for appattempt_1421828570504_0002_000010 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
You should have a look at the logs of your application:
> yarn logs --applicationId application_1421828570504_0002
This will yield some debug information from the actual run inside the Spark containers.
Since it is running locally but not on the cluster, my wild guess would be a missing SparkContext definition. Have a look at my answer to this question for a fix.
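The aggregated logs can be long, so it may help to filter them. A minimal sketch in Python (standard library only; the function names are hypothetical), assuming the yarn CLI is on the PATH and log aggregation is enabled. It uses -applicationId, the form of the flag shown in the Hadoop documentation:

```python
import subprocess

def fetch_yarn_logs(app_id):
    """Run `yarn logs` for a finished application and return its combined output."""
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", app_id],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout

def interesting_lines(log_text, markers=("ERROR", "Exception", "Caused by")):
    """Keep only the lines that contain one of the given markers."""
    return [ln for ln in log_text.splitlines() if any(m in ln for m in markers)]

# Demonstration on a log excerpt instead of a live cluster.
sample = """\
18/01/10 12:17:00 INFO InputInfoTracker: remove old batch metadata: 1515583000000 ms
18/01/10 12:17:02 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
"""
for ln in interesting_lines(sample):
    print(ln)
```

On a real cluster the two functions combine as interesting_lines(fetch_yarn_logs("application_1421828570504_0002")).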

Hadoop/YARN job FAILED - "exited with exitCode: -1000 due to: Could not find any valid local directory for nmPrivate..."

I am trying to run a MapReduce job with Hadoop, YARN and Accumulo.
I am getting the following output and cannot track down the issue. It looks like a YARN issue, but I am not sure what it is looking for. I have an nmPrivate folder at $HADOOP_PREFIX/grid/hadoop/hdfs/yarn/logs. Is this the folder it says it cannot find?
14/03/31 08:48:46 INFO mapreduce.Job: Job job_1395942264921_0023 failed with state FAILED due to: Application application_1395942264921_0023 failed 2 times due to AM Container for appattempt_1395942264921_0023_000002 exited with exitCode: -1000 due to: Could not find any valid local directory for nmPrivate/container_1395942264921_0023_02_000001.tokens
.Failing this attempt.. Failing the application.
When I test spark-submit on YARN in cluster mode:
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/local/install/spark-2.2.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.0.jar 100
I get the same error:
Application application_1532249549503_0007 failed 2 times due to AM Container for appattempt_1532249549503_0007_000002 exited with exitCode: -1000 Failing this attempt.Diagnostics: java.io.IOException: Resource file:/usr/local/install/spark-2.2.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.0.jar changed on src filesystem (expected 1531576498000, was 1531576511000
One suggestion for resolving this kind of error is to revise your core-site.xml or other Hadoop configuration files.
Finally, I fixed the error by setting the property fs.defaultFS in $HADOOP_HOME/etc/hadoop/core-site.xml.
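When fs.defaultFS is not set, Hadoop clients fall back to the local file system (file:///), which fits the "Resource file:/usr/local/..." path in the error above. A quick way to verify the setting, sketched in Python with the standard library (the function name is hypothetical, and the NameNode host and port in the sample are placeholders):

```python
import xml.etree.ElementTree as ET

def get_hadoop_property(xml_text, name):
    """Return the value of a property from Hadoop *-site.xml content, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Sample core-site.xml content; host and port are placeholders, not real values.
core_site = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""
value = get_hadoop_property(core_site, "fs.defaultFS")
print(value if value else "fs.defaultFS is not set; clients fall back to file:///")
```

To check a real installation, read the text from $HADOOP_HOME/etc/hadoop/core-site.xml and pass it to the same function.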
