TableInputFormat is not a member of package org.apache.hadoop.hbase.mapreduce

I import the TableInputFormat in my code as:
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
but it shows the error:
object TableInputFormat is not a member of package org.apache.hadoop.hbase.mapreduce
However, the package org.apache.hadoop.hbase.mapreduce does have the class TableInputFormat (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html).
And I have added these libraryDependencies:
"org.apache.spark" % "spark-core_2.11" % "2.4.0",
"org.apache.hbase" % "hbase-server" % "2.1.1",
"org.apache.hbase" % "hbase-common" % "2.1.1",
"org.apache.hbase" % "hbase-hadoop-compat" % "2.1.1",
"org.apache.hadoop" % "hadoop-common" % "2.8.5"
TableInputFormat is in the org.apache.hadoop.hbase.mapreduce package, which is part of the hbase-server artifact, so I need to add that as a dependency. But I have already added that dependency, so why does it still fail?

I also encountered the same problem, but after I added "hbase-mapreduce" to the pom.xml it worked well. In HBase 2.x the MapReduce-related classes, including TableInputFormat, were moved out of hbase-server into the separate hbase-mapreduce module, so hbase-server alone no longer provides them. Here is my pom.xml:
<!-- start of HBase-->
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>${hbase.version}</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>${hbase.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-mapreduce -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-mapreduce</artifactId>
<version>${hbase.version}</version>
</dependency>
<!-- end of hbase -->
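Note for sbt users like the asker: the equivalent fix should be just the matching artifact (a sketch, assuming the same 2.1.1 version as the other HBase dependencies):
libraryDependencies += "org.apache.hbase" % "hbase-mapreduce" % "2.1.1"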

Related

Mockito: To test that a returned String is under a given length

In JUnit I want to check that the length of a returned String is under a given limit.
I am doing the following and it passes, but I want to know whether there is a better alternative for testing it.
Assertions.assertTrue(roomEntity.getRoomType().length() <= 10);
Thanks
Your attempt is good. You could use assertThat(), which may be a bit more readable.
import static org.hamcrest.Matchers.lessThanOrEqualTo;
import static org.junit.Assert.assertThat;
...
assertThat(roomEntity.getRoomType().length(), lessThanOrEqualTo(10));
Or refactor the length into its own variable.
import static org.hamcrest.Matchers.lessThanOrEqualTo;
import static org.junit.Assert.assertThat;
...
int roomTypeLength = roomEntity.getRoomType().length();
assertThat(roomTypeLength, lessThanOrEqualTo(10));
Here is the way I follow for this.
You can use one of these libraries; both work the same way.
<!-- https://mvnrepository.com/artifact/org.assertj/assertj-core -->
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
<version>3.16.1</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.easytesting/fest-assert -->
<dependency>
<groupId>org.easytesting</groupId>
<artifactId>fest-assert</artifactId>
<version>1.4</version>
<scope>test</scope>
</dependency>
Now to the code segment.
Assertions.assertThat(roomEntity.getRoomType().length()).isLessThanOrEqualTo(10);
Some other useful assertions are also included, such as isLessThan, isBetween, hasSize, etc.

XML Path expression

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<dependencyManagement>
<dependencies>
<dependency>
<groupId>javax</groupId>
<artifactId>javaee-api</artifactId>
<version>7.0</version>
</dependency>
<dependency>
<groupId>telnet</groupId>
<artifactId>telnet.service</artifactId>
<version>1.10.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>xmlthing</groupId>
<artifactId>xmlthing.service</artifactId>
<version>1.9.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>parser</groupId>
<artifactId>parser.service</artifactId>
<version>1.4.0-SNAPSHOT</version>
</dependency>
</dependencies>
</dependencyManagement>
To put it more concretely: I have a problem extracting the value of the version element where groupId=telnet. How can I get, with an XPath expression, the value 1.10.0-SNAPSHOT where groupId=telnet?
Sorry for not mentioning it before:
it should be done with Linux/Unix tools (xmllint, grep, sed, ...), anything :)
Thanks a lot!
Brgds,
S.
You can start with selecting all version elements nested in dependency elements:
//pom:dependency/pom:version
and then qualify the dependency appropriately:
//pom:dependency[pom:groupId = 'telnet']/pom:version
Of course, you need to specify the namespaces for XPath as well.
Quick PowerShell test:
PS> $x | Select-Xml '//pom:dependency[pom:groupId = ''telnet'']/pom:version' -Namespace @{pom = 'http://maven.apache.org/POM/4.0.0'} | % node
#text
-----
1.10.0-SNAPSHOT
I found the solution using xpath:
xpath -q -e "//dependency[groupId='telnet']/version/text()" pom.xml
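If only xmllint is available (one of the tools mentioned in the question), a roughly equivalent query is sketched below; xmllint's --xpath option provides no way to bind a namespace prefix, so local-name() is used to sidestep the default POM namespace:
xmllint --xpath "//*[local-name()='dependency'][*[local-name()='groupId']='telnet']/*[local-name()='version']/text()" pom.xml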

Spark 1.3 and Cassandra 3.0 problems with guava

I am trying to connect to Cassandra 3.0 from Spark 1.3. I know there is a Cassandra connector for each Spark version, but the spark-cassandra-connector-java_2.10:1.3.0 connector depends on cassandra-driver-core:2.1.5, which is why I am using the latest connector, which depends on the latest core driver. Anyway, so far this was not the problem. The problem, I suppose, is the com.google.guava package.
My pom looks like this:
...
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.5.0-M3</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.5.0-M3</version>
</dependency>
...
I have excluded Google Guava everywhere with:
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
so the only Guava left in the dependency tree is com.google.guava:guava:jar:16.0.1, under com.datastax.spark:spark-cassandra-connector-java_2.10:jar:1.5.0-M3:compile.
However I am still getting the following error:
yarn.ApplicationMaster: User class threw exception: Failed to open native connection to Cassandra at {139.19.52.111}:9042
java.io.IOException: Failed to open native connection to Cassandra at {139.19.52.111}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.ambiverse.tagging.dao.impl.DAOCassandra.createTable(DAOCassandra.java:45)
at com.ambiverse.tagging.dao.impl.DAOCassandra.createTable(DAOCassandra.java:64)
at com.ambiverse.tagging.dao.impl.DAOCassandra.savePairRDD(DAOCassandra.java:70)
at com.ambiverse.tagging.statistics.entitycorrelation.CorrelationStatisticsSparkRunner.run(CorrelationStatisticsSparkRunner.java:176)
at com.ambiverse.tagging.statistics.entitycorrelation.CorrelationStatisticsSparkRunner.main(CorrelationStatisticsSparkRunner.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)
Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.Futures.withFallback(Lcom/google/common/util/concurrent/ListenableFuture;Lcom/google/common/util/concurrent/FutureFallback;Ljava/util/concurrent/Executor;)Lcom/google/common/util/concurrent/ListenableFuture;
at com.datastax.driver.core.Connection.initAsync(Connection.java:178)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:742)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:240)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:187)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1393)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:402)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
Before somebody points me to this blog post for a solution: http://arjon.es/2015/10/12/making-hadoop-2-dot-6-plus-spark-cassandra-driver-play-nice-together/, note that I am using Maven as a build tool, not sbt. If you know how I can do the exact same thing with Maven, that would be great.
Although I work with Scala + sbt, I had several mismatches between different artifacts with Spark, and one of them was Guava.
Here is how I solved it (dependencies in sbt):
val sparkVersion = "1.6.1" // alternatively "2.0.0-preview"
val sparkCassandraConnectorVersion = "1.6.0"
val scalaGuiceVersion = "4.0.1"
val cassandraUnitVersion = "3.0.0.1"
val typesafeConfigVersion = "1.3.0"
val findbugsVersion = "3.0.0"
val sparkRabbitmqVersion = "0.4.0.20160613"
val nettyAllVersion = "4.0.33.Final"
val guavaVersion = "19.0"
val jacksonVersion = "2.7.4"
val xbeanAsm5ShadedVersion = "4.5"
val commonsBeanutilsVersion = "1.8.0"
val scalaTestVersion = "2.2.6" // missing from the original snippet; assumed here so the list below compiles
//IMPORTANT: all spark dependency magic is done in one place, to overcome the assembly mismatch errors
val sparkDependencies :List[ModuleID] = List(
("org.apache.spark" %% "spark-core" % sparkVersion).exclude("com.esotericsoftware.minlog", "minlog"),
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
("com.datastax.spark" %% "spark-cassandra-connector"
% sparkCassandraConnectorVersion).exclude("org.apache.cassandra", "cassandra-clientutil"),
"com.stratio.receiver" % "spark-rabbitmq_1.6" % sparkRabbitmqVersion,//"0.3.0-b", //,//
"org.scalatest" %% "scalatest" % scalaTestVersion % "test",
"org.apache.xbean" % "xbean-asm5-shaded" % xbeanAsm5ShadedVersion,//,//, //https://github.com/apache/spark/pull/9512/files
"io.netty" % "netty-all" % nettyAllVersion,
"commons-beanutils" % "commons-beanutils" % commonsBeanutilsVersion,
"com.google.guava" % "guava" % guavaVersion,
"com.fasterxml.jackson.module" %% "jackson-module-scala" % jacksonVersion,//fix jackson mismatch problem
"com.fasterxml.jackson.core" % "jackson-databind" % jacksonVersion,//fix jackson mismatch problem
//override findbugs artifacts versions(fix assembly issues)
"com.google.code.findbugs" % "annotations" % findbugsVersion,
"com.google.code.findbugs" % "jsr305" % findbugsVersion
).map(_.exclude("commons-collections", "commons-collections"))
I hope it will help.
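For Maven, as the question asks: the usual equivalent of that blog post's trick is to relocate Guava inside your fat jar with the maven-shade-plugin, so the Cassandra driver gets the Guava it expects regardless of what Spark/Hadoop provide. A hedged sketch (the plugin version and relocation pattern are illustrative, not taken from the question):
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<relocations>
<relocation>
<pattern>com.google.common</pattern>
<shadedPattern>shaded.com.google.common</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>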

How to include spark tests as Maven dependency

I have inherited old code that depends on
org.apache.spark.LocalSparkContext
which is in the spark core tests. But the spark core jar (correctly) does not include test-only classes. I was unable to determine if/where spark test classes have their own maven artifacts. What is the correct approach here?
You can add a dependency to the test-jar of Spark by adding <type>test-jar</type>. For example, for Spark 1.5.1 based on Scala 2.11:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>1.5.1</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
This dependency provides all the test classes of Spark, including LocalSparkContext.
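For illustration, a minimal suite using LocalSparkContext from that test-jar might look like the sketch below (assumptions: ScalaTest is on the test classpath, and the test assigns the trait's sc field itself, as Spark's own suites do):
import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext}
import org.scalatest.FunSuite

// LocalSparkContext stops the SparkContext after each test case
class WordCountSuite extends FunSuite with LocalSparkContext {
  test("countByValue counts repeated words") {
    sc = new SparkContext(new SparkConf().setMaster("local").setAppName("test"))
    val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
    assert(counts("a") === 2)
  }
}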
I came here hoping to find some inspiration for doing the same in SBT. As a reference for other SBT users: Applying the pattern of using test-jars in SBT for Spark 2.0 results in:
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0" classifier "tests"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0" classifier "tests"
If you want to add test jars, you can go ahead and add them in SBT as shown below:
version := "0.1"
scalaVersion := "2.11.11"
val sparkVersion = "2.3.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "test-sources",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.0",
"org.scalatest" %% "scalatest" % "3.0.4" % "test")
Similarly, if you want to add them to the Maven dependencies, you can do it as shown below:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
<classifier>tests</classifier>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
<classifier>test-sources</classifier>
<type>test-jar</type>
<scope>test</scope>
</dependency>
</dependencies>

Maven/Gradle way to calculate the total size of a dependency with all its transitive dependencies included

I would like to be able to perform an analysis on each of my project POMs to determine how many bytes each direct dependency introduces to the resulting package based on the sum of all of its transitive dependencies.
For example, if dependency A brings in B, C, and D, I would like to be able to see a summary showing A -> total size = (A + B + C + D).
Is there an existing Maven or Gradle way to determine this information?
Here's a task for your build.gradle:
task depsize {
doLast {
final formatStr = "%,10.2f"
final conf = configurations.default
final size = conf.collect { it.length() / (1024 * 1024) }.sum()
final out = new StringBuffer()
out << 'Total dependencies size:'.padRight(45)
out << "${String.format(formatStr, size)} Mb\n\n"
conf.sort { -it.length() }
.each {
out << "${it.name}".padRight(45)
out << "${String.format(formatStr, (it.length() / 1024))} kb\n"
}
println(out)
}
}
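Run it with gradle -q depsize (the -q flag just suppresses Gradle's own progress output).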
The task prints the total size of all dependencies and lists each one with its size in kB, sorted by size in descending order.
Update: the latest version of the task can be found in a GitHub gist.
I keep a small pom.xml template on my workstation to identify heavyweight dependencies.
Assuming you want to see the weight of org.eclipse.jetty:jetty-client with all of its transitives, create this in a new folder:
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>not-used</groupId>
<artifactId>fat</artifactId>
<version>standalone</version>
<dependencies>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-client</artifactId>
<version>LATEST</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Then cd to the folder and run mvn package and check the size of the generated fat jar. On Unix-like systems you can use du -h target/fat-standalone.jar for that.
In order to test another Maven artifact, just change the groupId:artifactId in the above template.
I do not know of a way to show the totals, but you can get a report for your project that shows per-dependency size information. Please check this Maven plugin: http://maven.apache.org/plugins/maven-project-info-reports-plugin/dependencies-mojo.html
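The linked page documents the project-info-reports:dependencies goal, so a quick way to try it should be:
mvn project-info-reports:dependencies
The generated report ends up under target/site.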
If you have a configuration that includes all the dependencies whose size you wish to calculate, you can simply put the following snippet in your build.gradle file:
def size = 0
configurations.myConfiguration.files.each { file ->
size += file.size()
}
println "Dependencies size: $size bytes"
This should print whenever you run any Gradle task, since the snippet executes as soon as the build file is evaluated.
