Is there an official way to support both Spark 1.6.2 and 2.0.0 on a Hadoop YARN 2.7.2 cluster?

I have a cluster running Hadoop YARN 2.7.2 with dynamic allocation enabled for Spark 1.6.2.
Is there an official way to support both Spark 1.6.2 and 2.0.0? When I tried to submit an application from a Spark 2.0.0 client, the following exception occurred in the driver:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.JavaUtils.byteStringAs(Ljava/lang/String;Lorg/apache/spark/network/util/ByteUnit;)J
at org.apache.spark.internal.config.ConfigHelpers$.byteFromString(ConfigBuilder.scala:63)
at org.apache.spark.internal.config.ConfigBuilder$$anonfun$bytesConf$1.apply(ConfigBuilder.scala:197)
at org.apache.spark.internal.config.ConfigBuilder$$anonfun$bytesConf$1.apply(ConfigBuilder.scala:197)
at org.apache.spark.internal.config.TypedConfigBuilder.createWithDefaultString(ConfigBuilder.scala:131)
at org.apache.spark.internal.config.package$.<init>(package.scala:41)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:69)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:785)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:784)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

This feature is supported by Hortonworks' HDP distribution. I have a cluster running HDP 2.5 on CentOS 7, which ships Hadoop 2.7.3 together with both Spark 1.6.2 and 2.0.0.
I have not experienced any problems running either Spark or Spark2 jobs.
How did you install and configure both Spark versions? You could try the HDP sandbox and use the way Spark and Spark2 are configured there as inspiration for your own cluster.
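Spark on YARN is essentially a client-side installation, so the usual approach is to keep the two versions in separate directories and make sure each submission ships its own jars; a NoSuchMethodError like the one above typically means the 2.0.0 application master picked up 1.6.2 classes, e.g. from a shared spark.yarn.jar setting. A minimal sketch with placeholder paths (not HDP-specific):

# hypothetical install locations for the two versions
export SPARK16_HOME=/opt/spark-1.6.2-bin-hadoop2.7
export SPARK20_HOME=/opt/spark-2.0.0-bin-hadoop2.7

# submit with Spark 1.6.2 (spark.yarn.jar points at the 1.6.2 assembly)
$SPARK16_HOME/bin/spark-submit --master yarn \
  --conf spark.yarn.jar=hdfs:///spark/1.6.2/spark-assembly-1.6.2-hadoop2.7.2.jar \
  --class com.example.App app.jar

# submit with Spark 2.0.0 (spark.yarn.jars points at the 2.0.0 jars, never the 1.6 assembly)
$SPARK20_HOME/bin/spark-submit --master yarn \
  --conf spark.yarn.jars=hdfs:///spark/2.0.0/jars/* \
  --class com.example.App app.jar

If dynamic allocation is enabled, also check which version's external shuffle service is registered on the NodeManagers, since both submissions will try to use it.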

Related

SAP Vora Thrift Server Error: Instantiating dialect 'sapsql' failed

I have deployed a Cloudera CDH 5.13.1 cluster with SAP Vora 1.4 Patch 4.
When I start the Vora Thrift server everything looks fine, but as soon as I start the SAP Vora tools and log in, the following error shows up:
17/12/20 11:26:52 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.catalyst.errors.package$DialectException: Instantiating dialect 'sapsql' failed.
Reverting to default dialect 'sapsql'
at org.apache.spark.sql.SQLContext.getSQLDialect(SQLContext.scala:225)
at org.apache.spark.sql.hive.HiveContext.getSQLDialect(HiveContext.scala:577)
at org.apache.spark.sql.hive.SapHiveContext$$anonfun$1.apply(SapHiveContext.scala:54)
at org.apache.spark.sql.hive.SapHiveContext$$anonfun$1.apply(SapHiveContext.scala:54)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
at org.apache.spark.sql.hive.SapHiveContext$$anonfun$2.apply(SapHiveContext.scala:58)
at org.apache.spark.sql.hive.SapHiveContext$$anonfun$2.apply(SapHiveContext.scala:58)
at org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:43)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:231)
at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:334)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:829)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:211)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.extension.SapSQLDialect
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:177)
at org.apache.spark.sql.SQLContext.getSQLDialect(SQLContext.scala:215)
... 54 more
The installation guide says I need to grant the vora user authorization for the Hive Metastore. Since this is only a test setup, authorization is disabled in Hive; the vora user can create and drop tables in the default database and has write access to Hive's warehouse location.
How can I solve this?
This issue is caused by an incompatibility between CDH 5.13 and Vora 1.4 Patch 4. The issue is currently being investigated by SAP.
Is it an option for you to move to a newer Vora version? The current version is Vora 2.1. Since version 2.0, Vora is deployed in a Kubernetes cluster instead of on the Hadoop cluster, which could help overcome this CDH dependency issue.

How to compile Nutch 2.3.1 with HBase 1.2.6

I have to set up a Hadoop stack with Nutch 2.3.1. The supported HBase version for Hadoop 2.7.4 is 1.2.6, which I have configured and tested successfully. But when I compile Nutch and crawl a sample page, I get the following error:
/usr/local/nutch/runtime/local/bin/nutch inject urls/ -crawlId kics
InjectorJob: starting at 2017-09-21 14:20:10
InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoSuchFieldError: HBASE_CLIENT_PREFETCH_LIMIT
at org.apache.hadoop.hbase.client.HConnectionKey.<clinit>(HConnectionKey.java:43)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:267)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:194)
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:115)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Error running:
According to my searching (such as this and this), HBase 1.x can be compiled for Nutch 2.3.1, but I have no idea how. Can someone please guide me (steps, etc.)?
Apache Gora 0.7 is the version that supports HBase 1.2.3+: https://issues.apache.org/jira/browse/GORA-443
You can take a look at https://stackoverflow.com/a/39837926/582789, where I wrote how to modify Nutch 2.3.1 to work with Apache Gora 0.7. For the patch https://paste.apache.org/jjqz in that answer, use "0.7" where it shows "0.7-SNAPSHOT".
By the way, Apache Gora 0.8 was released yesterday :) Simply changing 0.7 to 0.8 should work.
http://gora.apache.org/#20-september-2017-apache-gora-08-release
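In case it helps, a rough sketch of those changes under the standard Nutch 2.3.1 source layout (file names assumed from the linked answer and GORA-443, not verified here):

# in ivy/ivy.xml, bump the Gora modules, e.g.:
#   <dependency org="org.apache.gora" name="gora-core"  rev="0.7" conf="*->default"/>
#   <dependency org="org.apache.gora" name="gora-hbase" rev="0.7" conf="*->default"/>
# keep HBaseStore as the default store in conf/gora.properties:
#   gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
# then rebuild the runtime:
cd apache-nutch-2.3.1
ant runtime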

Specify a vaild path to the correct hive jars using $HIVE_METASTORE_JARS or change spark.sql.hive.metastore.version to 1.2.1

When I try to run spark-submit on a JAR that uses HiveContext, I get the error below.
spark-defaults.conf had:
spark.sql.hive.metastore.version 0.14.0
spark.sql.hive.metastore.jars ----/external_jars/hive-metastore-0.14.0.jar
#spark.sql.hive.metastore.jars maven
I would like to use Hive Metastore version 0.14. Spark and Hadoop are on different clusters.
Can anyone help me resolve this?
16/09/19 16:52:24 INFO HiveContext: default warehouse location is /apps/hive/warehouse
Exception in thread "main" java.lang.IllegalArgumentException: Builtin jars can only be used when hive execution version == hive metastore version. Execution: 1.2.1 != Metastore: 0.14.0.
Specify a vaild path to the correct hive jars using $HIVE_METASTORE_JARS or change spark.sql.hive.metastore.version to 1.2.1.
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:254)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$cla
Try the following in your Spark code (assuming spark is your SparkContext):

import org.apache.hadoop.conf.Configuration

// explicitly register the HDFS and local filesystem implementations
val hadoopConfig: Configuration = spark.hadoopConfiguration
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hadoopConfig.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
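Separately, for the exception itself: it is thrown when Spark falls back to its built-in Hive 1.2.1 client while spark.sql.hive.metastore.version is set to 0.14.0, which usually means the spark.sql.hive.metastore.jars setting was not actually applied. A minimal spark-defaults.conf sketch (the jar path is a placeholder and must contain the full Hive 0.14 client classpath, not just hive-metastore-0.14.0.jar):

spark.sql.hive.metastore.version  0.14.0
spark.sql.hive.metastore.jars     /external_jars/hive-0.14.0/lib/*

Or, if a 0.14 metastore client is not a hard requirement, align with the built-in client as the message suggests:

spark.sql.hive.metastore.version  1.2.1
spark.sql.hive.metastore.jars     builtin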

InvalidResourceRequestException (YARN exception) while running Spark in cluster mode with YARN on Hadoop 2.4

I am using Apache Spark 1.1.0 with Hadoop 2.4; my cluster is on CDH 5.1.3.
I tried the commands below to start Spark with YARN:
./spark-shell --master yarn
./spark-shell --master yarn-client
I got the following exception:
14/10/15 21:33:32 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
appMasterRpcPort: 0
appStartTime: 1413388999108
yarnAppState: RUNNING
14/10/15 21:33:44 ERROR cluster.YarnClientSchedulerBackend: Yarn
application already ended: FAILED
======Node manager Exception ============================================
Caused by:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
Invalid resource request, requested memory < 0, or requested memory >
max configured, requestedMemory=1408, maxMemory=1024 at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:228)
at
org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
at
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:444)
at
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:396) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
at org.apache.hadoop.ipc.Client.call(Client.java:1410) at
org.apache.hadoop.ipc.Client.call(Client.java:1363) at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy11.allocate(Unknown Source) at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 20 more
According to your YARN configuration, the maximum memory an application can request for a container is 1024 MB, but the Spark client is requesting a container with 1408 MB. Either change the Spark configuration to request less RAM or raise the maximum container memory in YARN.
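For example (hypothetical values; the requested 1408 MB is likely the 1024 MB default executor memory plus the ~384 MB YARN memory overhead):

# option 1: request less memory at submission time
./spark-shell --master yarn-client --executor-memory 512m

# option 2: raise the per-container maximum in yarn-site.xml on the ResourceManager, then restart it
#   yarn.scheduler.maximum-allocation-mb = 2048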

Server IPC version 7 cannot communicate with client version 4

I am trying to submit a Hadoop MapReduce job from a CDH3u4 cluster to a cluster running CDH4.3 (the fs.default.name and mapred.job.tracker configuration parameters are set to point to the CDH4.3 cluster). The stack trace follows.
1) Can we submit a Hadoop job to a remote cluster running a different version?
2) Is there a workaround for this?
hadoop jar Standalone.jar
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:129)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:255)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:217)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1563)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1597)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1579)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
at com.poc.standalone.HDFSRemoteAccess.main(HDFSRemoteAccess.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
You need to set the HADOOP_PREFIX environment variable to point to a directory where you have a Hadoop 2.x.x version installed, e.g.:
export HADOOP_PREFIX=/path/to/hadoop-2.2.0
I faced this exception when I was trying to connect to HDFS; I'm using CDH 4.6.
I solved the issue by adding the Cloudera Maven dependencies. You can find a dependency list here.
Firstly, you should check your dependencies.
Another point: try to use the fs.defaultFS config parameter instead of (or alongside) fs.default.name, because fs.default.name is deprecated in CDH4.
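For reference, a pom.xml sketch of what those dependencies typically look like for CDH4 (the exact version string is an assumption for CDH 4.3):

<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.0.0-cdh4.3.0</version>
</dependency>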
1) You should have the dependencies of both versions and be able to switch between them.
2) Take a look here for how to keep different versions of dependencies.
