How to include Spark tests as a Maven dependency

I have inherited old code that depends on
org.apache.spark.LocalSparkContext
which is in Spark core's test sources. But the spark-core jar (correctly) does not include test-only classes, and I was unable to determine whether the Spark test classes are published as Maven artifacts of their own. What is the correct approach here?

You can add a dependency on Spark's test-jar by adding <type>test-jar</type>. For example, for Spark 1.5.1 built against Scala 2.11:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.5.1</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>
This dependency provides all the test classes of Spark, including LocalSparkContext.
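For illustration, here is a minimal sketch of how a test might then use it, assuming the Spark 1.5-era shape of the trait (your suite assigns the sc field and the trait stops the context after each test) and assuming ScalaTest is also on your test classpath; the suite name and data are made up:
import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext}
import org.scalatest.FunSuite

// Hypothetical suite; LocalSparkContext resets `sc` after each test.
class WordLengthSuite extends FunSuite with LocalSparkContext {
  test("lengths are computed on a local context") {
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
    val lengths = sc.parallelize(Seq("a", "bb", "ccc")).map(_.length).collect()
    assert(lengths.sorted.sameElements(Array(1, 2, 3)))
  }
}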

I came here hoping to find out how to do the same in SBT. As a reference for other SBT users: applying the test-jar pattern in SBT for Spark 2.0 results in:
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0" classifier "tests"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0" classifier "tests"

If you want to add the test jars in SBT, you can do it as shown below:
version := "0.1"
scalaVersion := "2.11.11"
val sparkVersion = "2.3.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "test-sources",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.0",
"org.scalatest" %% "scalatest" % "3.0.4" % "test")
Likewise, if you want to add them as Maven dependencies, you can do it as shown below:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.parent.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.parent.version}</artifactId>
    <version>${spark.version}</version>
    <classifier>tests</classifier>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.parent.version}</artifactId>
    <version>${spark.version}</version>
    <classifier>test-sources</classifier>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>

Related

TableInputFormat is not a member of package org.apache.hadoop.hbase.mapreduce

I import the TableInputFormat in my code as:
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
but it shows errors:
object TableInputFormat is not a member of package org.apache.hadoop.hbase.mapreduce
but the package org.apache.hadoop.hbase.mapreduce does have the class TableInputFormat (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html)
And I have added the libraryDependencies, including:
"org.apache.spark" % "spark-core_2.11" % "2.4.0"
"org.apache.hbase" % "hbase-server" % "2.1.1"
"org.apache.hbase" % "hbase-common" % "2.1.1"
"org.apache.hbase" % "hbase-hadoop-compat" % "2.1.1"
"org.apache.hadoop" % "hadoop-common" % "2.8.5"
TableInputFormat is in the org.apache.hadoop.hbase.mapreduce package, which is supposed to be part of the hbase-server artifact, so that needs to be added as a dependency. But I have added that dependency, so why does it still fail?
I also encountered the same problem; after I added hbase-mapreduce to the pom.xml it worked. In HBase 2.x the MapReduce integration classes, including TableInputFormat, were moved out of hbase-server into the separate hbase-mapreduce module, which is why hbase-server alone is no longer enough. Here is my pom.xml:
<!-- start of HBase -->
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>${hbase.version}</version>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>${hbase.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>${hbase.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-mapreduce -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-mapreduce</artifactId>
    <version>${hbase.version}</version>
</dependency>
<!-- end of HBase -->
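Since the question's dependency list is in sbt form, the equivalent fix there would presumably be the one-liner below (version matching the question's other HBase artifacts):
// hbase-mapreduce is the module that holds TableInputFormat in HBase 2.x
libraryDependencies += "org.apache.hbase" % "hbase-mapreduce" % "2.1.1"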

Spark 1.3 and Cassandra 3.0 problems with guava

I am trying to connect to Cassandra 3.0 from Spark 1.3. I know there is a Cassandra connector for each Spark version, but the spark-cassandra-connector-java_2.10:1.3.0 connector depends on cassandra-driver-core:2.1.5, which is why I am using the latest Cassandra connector, which depends on the latest core driver. Anyway, so far that has not been the problem. The problem, I suppose, is the com.google.guava package.
My pom looks like this:
...
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector-java_2.10</artifactId>
    <version>1.5.0-M3</version>
</dependency>
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.0-M3</version>
</dependency>
...
I have excluded Google Guava everywhere with:
<exclusions>
    <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
    </exclusion>
</exclusions>
so the only Guava left in the dependency tree is com.google.guava:guava:jar:16.0.1, under com.datastax.spark:spark-cassandra-connector-java_2.10:jar:1.5.0-M3:compile.
However I am still getting the following error:
yarn.ApplicationMaster: User class threw exception: Failed to open native connection to Cassandra at {139.19.52.111}:9042
java.io.IOException: Failed to open native connection to Cassandra at {139.19.52.111}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.ambiverse.tagging.dao.impl.DAOCassandra.createTable(DAOCassandra.java:45)
at com.ambiverse.tagging.dao.impl.DAOCassandra.createTable(DAOCassandra.java:64)
at com.ambiverse.tagging.dao.impl.DAOCassandra.savePairRDD(DAOCassandra.java:70)
at com.ambiverse.tagging.statistics.entitycorrelation.CorrelationStatisticsSparkRunner.run(CorrelationStatisticsSparkRunner.java:176)
at com.ambiverse.tagging.statistics.entitycorrelation.CorrelationStatisticsSparkRunner.main(CorrelationStatisticsSparkRunner.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)
Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.Futures.withFallback(Lcom/google/common/util/concurrent/ListenableFuture;Lcom/google/common/util/concurrent/FutureFallback;Ljava/util/concurrent/Executor;)Lcom/google/common/util/concurrent/ListenableFuture;
at com.datastax.driver.core.Connection.initAsync(Connection.java:178)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:742)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:240)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:187)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1393)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:402)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
Before somebody points me to this blog post for a solution (http://arjon.es/2015/10/12/making-hadoop-2-dot-6-plus-spark-cassandra-driver-play-nice-together/): I am using Maven as the build tool, not sbt. If you know how I can do the exact same thing with Maven, that would be great.
Although I work with Scala + sbt, I had several mismatches between different artifacts with Spark, Guava among them.
Here is how I solved it (dependencies in sbt):
val sparkVersion = "1.6.1" // alternatively "2.0.0-preview"
val sparkCassandraConnectorVersion = "1.6.0"
val scalaGuiceVersion = "4.0.1"
val cassandraUnitVersion = "3.0.0.1"
val typesafeConfigVersion = "1.3.0"
val findbugsVersion = "3.0.0"
val sparkRabbitmqVersion = "0.4.0.20160613"
val nettyAllVersion = "4.0.33.Final"
val guavaVersion = "19.0"
val jacksonVersion = "2.7.4"
val xbeanAsm5ShadedVersion = "4.5"
val commonsBeanutilsVersion = "1.8.0"
val scalaTestVersion = "2.2.6" // referenced below but undefined in the original snippet; pick your ScalaTest version
// IMPORTANT: all spark dependency magic is done in one place, to overcome the assembly mismatch errors
val sparkDependencies: List[ModuleID] = List(
  ("org.apache.spark" %% "spark-core" % sparkVersion).exclude("com.esotericsoftware.minlog", "minlog"),
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  ("com.datastax.spark" %% "spark-cassandra-connector"
    % sparkCassandraConnectorVersion).exclude("org.apache.cassandra", "cassandra-clientutil"),
  "com.stratio.receiver" % "spark-rabbitmq_1.6" % sparkRabbitmqVersion, // alternatively "0.3.0-b"
  "org.scalatest" %% "scalatest" % scalaTestVersion % "test",
  "org.apache.xbean" % "xbean-asm5-shaded" % xbeanAsm5ShadedVersion, // https://github.com/apache/spark/pull/9512/files
  "io.netty" % "netty-all" % nettyAllVersion,
  "commons-beanutils" % "commons-beanutils" % commonsBeanutilsVersion,
  "com.google.guava" % "guava" % guavaVersion,
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % jacksonVersion, // fix jackson mismatch problem
  "com.fasterxml.jackson.core" % "jackson-databind" % jacksonVersion, // fix jackson mismatch problem
  // override findbugs artifact versions (fix assembly issues)
  "com.google.code.findbugs" % "annotations" % findbugsVersion,
  "com.google.code.findbugs" % "jsr305" % findbugsVersion
).map(_.exclude("commons-collections", "commons-collections"))
I hope it will help.
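For the Maven side of the original question: the usual Maven counterpart of the shading trick in the linked blog post is the maven-shade-plugin with a relocation, so your fat jar carries a privately renamed copy of Guava that cannot clash with the one Hadoop/Spark provides. A minimal sketch (the shaded.* prefix is an arbitrary choice, not a fixed convention):
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- rewrite Guava's packages inside the fat jar -->
                    <relocation>
                        <pattern>com.google.common</pattern>
                        <shadedPattern>shaded.com.google.common</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>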

Play framework 2.3.8, org.apache.poi dependency not found

I added org.apache.poi to my dependencies, but it just does not resolve.
libraryDependencies ++= Seq(
"postgresql" % "postgresql" % "9.1-901-1.jdbc4",
"net.sf.jasperreports" % "jasperreports" % "6.0.3",
"net.sf.jasperreports" % "jasperreports-fonts" % "6.0.0",
"com.typesafe.play" %% "play-mailer" % "2.4.1",
"org.apache.poi" %% "poi" % "3.13",
javaJdbc,
javaEbean,
cache,
javaWs
)
I get an error saying that the artifact was searched for but not found. The interesting part is this:
Warning:Play 2 Compiler: ==== public: tried
Warning:Play 2 Compiler: http://repo1.maven.org/maven2/org/apache/poi/poi_2.11/3.13/poi_2.11-3.13.pom
Error:Play 2 Compiler:
(*:update) sbt.ResolveException: unresolved dependency: org.apache.poi#poi_2.11;3.13: not found
But in reality, the location of the pom file is here:
https://repo1.maven.org/maven2/org/apache/poi/poi/3.13/poi-3.13.pom
Why does the Play framework append that _2.11 suffix there?
Just remove one percent sign:
"org.apache.poi" % "poi" % "3.13",

Play Framework and Java8

I have one Java 8 project, and this project is a dependency of a Play web app.
Now whenever I try to instantiate classes from the Java 8 project in the Play 2.2.3 web app, it gives me the following error:
play.PlayExceptions$CompilationException: Compilation error[error: cannot access MongoOperations]
at play.PlayReloader$$anon$1$$anonfun$reload$2$$anonfun$apply$14$$anonfun$apply$16.apply(PlayReloader.scala:304) ~[na:na]
at play.PlayReloader$$anon$1$$anonfun$reload$2$$anonfun$apply$14$$anonfun$apply$16.apply(PlayReloader.scala:304) ~[na:na]
How do I make Play compile the code with Java 8 when I say Play "run 8080"? Why is Play unable to access the class in the Java 8 project?
FYI: my JAVA_HOME is pointing to Java 8.
Here is what my build.sbt looks like.
Note that 'content-aggregator' is my artifact, installed in my local Maven repo.
name := "web"
version := "1.0-SNAPSHOT"
resolvers += "Maven central" at "http://repo1.maven.org/maven2"
libraryDependencies ++= Seq(
javaJdbc,
javaEbean,
cache,
"de.undercouch" % "bson4jackson" % "2.1.0" force(),
"com.fasterxml.jackson.core" % "jackson-databind" % "2.1.0" force(),
"com.fasterxml.jackson.core" % "jackson-annotations" % "2.1.0" force(),
"com.fasterxml.jackson.core" % "jackson-core" % "2.1.0" force(),
"org.mongodb" % "mongo-java-driver" % "2.11.3",
"com.techr" % "content-aggregator" % "0.0.1-SNAPSHOT",
"org.jongo" % "jongo" % "1.0",
"uk.co.panaxiom" %% "play-jongo" % "0.6.0-jongo1.0"
)
play.Project.playJavaSettings
In the 'content-aggregator' (Java 8) project I am using Spring and have injected beans by autowiring.
MongoOperations is autowired in one of the classes, and Play is complaining about it.
SpringMongoConfig.java is a class from this project marked with the @Configuration annotation.
Now, in the Play project, I have created a config class which imports content-aggregator's config class.
@Configuration
@Import(SpringMongoConfig.class)
public class SpringConfig {
}

Deploy Play 2.1 RC1 Java Application to Heroku Got 'NoSuchMethodError scala.Predef$.augmentString'

This application was migrated from Play 2.0.4 to 2.1-RC1. When pushing to Heroku, I got this error in the Heroku logs. Should I use a different buildpack for Play 2.1?
2012-12-11T03:04:36+00:00 heroku[web.1]: Starting process with command `target/start -Dhttp.port=${PORT} -Dconfig.resource=prod.conf ${JAVA_OPTS}`
2012-12-11T03:04:38+00:00 app[web.1]: Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.augmentString(Ljava/lang/String;)Lscala/collection/immutable/StringOps;
2012-12-11T03:04:38+00:00 app[web.1]: at play.core.server.NettyServer$.createServer(NettyServer.scala:111)
2012-12-11T03:04:38+00:00 app[web.1]: at play.core.server.NettyServer$$anonfun$main$5.apply(NettyServer.scala:153)
2012-12-11T03:04:38+00:00 app[web.1]: at play.core.server.NettyServer$$anonfun$main$5.apply(NettyServer.scala:152)
2012-12-11T03:04:38+00:00 app[web.1]: at scala.Option.map(Option.scala:145)
2012-12-11T03:04:38+00:00 app[web.1]: at play.core.server.NettyServer$.main(NettyServer.scala:152)
2012-12-11T03:04:38+00:00 app[web.1]: at play.core.server.NettyServer.main(NettyServer.scala)
Here is my Build.scala.
val appDependencies = Seq(
javaCore, javaJdbc, javaEbean,
"org.webjars" % "bootstrap" % "2.1.1",
"postgresql" % "postgresql" % "9.1-901-1.jdbc4",
"rome" % "rome" % "1.0",
"com.typesafe" %% "play-plugins-mailer" % "2.1-SNAPSHOT",
"commons-codec" % "commons-codec" % "1.6",
"commons-io" % "commons-io" % "2.3",
"com.typesafe" % "play-plugins-inject" % "2.0.2",
"com.typesafe" %% "play-plugins-mailer" % "2.1-SNAPSHOT",
"com.typesafe.akka" % "akka-testkit" % "2.0.2",
"org.imgscalr" % "imgscalr-lib" % "4.2",
"org.codehaus.jackson" % "jackson-jaxrs" % "1.9.5",
"org.codehaus.jackson" % "jackson-xc" % "1.9.5",
"org.codehaus.jackson" % "jackson-mapper-asl" % "1.9.5",
"org.codehaus.jackson" % "jackson-core-asl" % "1.9.5",
"org.mindrot" % "jbcrypt" % "0.3m"
)
val main = play.Project(appName, appVersion, appDependencies).settings(
resolvers += "webjars" at "http://webjars.github.com/m2",
resolvers += "Mave2" at "http://repo1.maven.org/maven2",
resolvers += "jets3t" at "http://www.jets3t.org/maven2",
resolvers += "Typesafe Releases Repository" at "http://repo.typesafe.com/typesafe/releases/",
resolvers += "Typesafe Snapshots Repository" at "http://repo.typesafe.com/typesafe/snapshots/",
resolvers += Resolver.url("Typesafe Ivy Snapshots", url("http://repo.typesafe.com/typesafe/ivy-snapshots/"))(Resolver.ivyStylePatterns),
resolvers += "Daniel's Repository" at "http://danieldietrich.net/repository/snapshots/"
)
In my case, I had to remove this dependency from Build.scala:
"com.typesafe" % "play-plugins-inject" % "2.0.2"
and remove the plugin from play.plugins:
1500:com.typesafe.plugin.inject.ManualInjectionPlugin
This plugin brings in play_2.9, which has a dependency on Ehcache and causes Play to initialize its cache a second time, while play_2.10 from Play 2.1 has already initialized it.
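If the plugin itself were still needed, an alternative might be to keep it and exclude the offending transitive artifact instead. An untested sketch, assuming the transitive dependency is published as play:play_2.9.1 (check your dependency report before relying on this):
// assumption: the plugin's offending transitive dependency is play:play_2.9.1
"com.typesafe" % "play-plugins-inject" % "2.0.2" exclude("play", "play_2.9.1")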
