Storm-crawler and Elasticsearch version - maven

I'm working on getting the latest version of ES (5.x) working with StormCrawler.
I followed the instructions mentioned here: I cloned the repo, ran mvn clean install to build it, and then entered all the mvn commands mentioned here, and it all worked.
What I'm confused about is the version number in the pom.xml file:
<dependency>
<groupId>com.digitalpebble.stormcrawler</groupId>
<artifactId>storm-crawler-elasticsearch</artifactId>
<version>1.4</version>
</dependency>
Do I enter 1.5 there or keep it as 1.4? I'm still trying to get better with Maven and the Java build process.

If you are building the project locally after cloning the repo, try:
mvn archetype:generate -DarchetypeGroupId=com.digitalpebble.stormcrawler -DarchetypeArtifactId=storm-crawler-archetype -DarchetypeVersion=1.5-SNAPSHOT
You can then edit the pom.xml and add the dependency for the Elasticsearch module:
<dependency>
<groupId>com.digitalpebble.stormcrawler</groupId>
<artifactId>storm-crawler-elasticsearch</artifactId>
<version>1.5-SNAPSHOT</version>
</dependency>

StormCrawler 1.5 should be released soon and, as suggested by @nullpointer, you need to change the version to 1.5-SNAPSHOT; the tutorial was based on SC 1.4, which uses ES 2.x.
See blog for potential issues when upgrading to ES5.

You have to keep it as 1.4, because this is the latest version of the storm-crawler-elasticsearch plugin.

Related

Elasticsearch plugin for PySpark 3.1.1

I used Elasticsearch Spark 7.12.0 with PySpark 2.4.5 successfully. Both read and write worked perfectly. Now that I'm testing the upgrade to Spark 3.1.1, this integration doesn't work anymore. My PySpark code did not change between 2.4.5 and 3.1.1.
Is there a compatible plugin? Has anyone got this to work with PySpark 3.1.1?
The error:
Try using the package org.elasticsearch:elasticsearch-spark-30_2.12:7.13.1
The error you're seeing (java.lang.NoClassDefFoundError: scala/Product$class) usually indicates that you are trying to use a package built for an incompatible version of Scala.
If you are using the most recent zip package from Elasticsearch, as of the date of your question, it is still built for Scala 2.11, as per the conversation here:
https://github.com/elastic/elasticsearch-hadoop/pull/1589
You can confirm the version of Scala used to build your PySpark by doing
spark-submit --version
from the command line. After the Spark logo it will say something like
Using Scala version 2.12.10
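Since the question is about PySpark, here is a minimal Python sketch of pulling the Scala binary version (the "2.12" part) out of that banner line. The function name scala_binary_version is hypothetical, and the sketch assumes you have already captured the text that spark-submit --version prints; it does not run Spark itself.

```python
import re
from typing import Optional


def scala_binary_version(banner: str) -> Optional[str]:
    """Return the Scala binary version (e.g. "2.12") found in the banner text.

    Assumes the banner contains a line like "Using Scala version 2.12.10",
    as shown in the answer above. Returns None if no such line is present.
    """
    match = re.search(r"Using Scala version (\d+)\.(\d+)", banner)
    if match is None:
        return None
    # Only major.minor matters for the artifact suffix (_2.11, _2.12, ...).
    return f"{match.group(1)}.{match.group(2)}"


print(scala_binary_version("Using Scala version 2.12.10"))  # prints 2.12
```

The major.minor pair is what has to match the suffix of the connector artifact you pick (for example the _2.12 in elasticsearch-spark-30_2.12).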
You need to take a look at this page:
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html
On that page you can see the compatibility matrix.
Elastic gives you some info on "installation" for Hadoop here: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html
For Spark, it provides this:
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-30_2.12</artifactId>
<version>7.14.0</version>
</dependency>
Now if you're using PySpark, you may be unfamiliar with Maven, so I can appreciate that being handed a Maven dependency is not that helpful.
Here's a minimal way to get Maven to fetch the jar for you, without having to get into the weeds of an unfamiliar tool.
Install Maven (apt install maven)
Create a new directory
In that directory, create a file called pom.xml
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>spark-es</groupId>
<artifactId>spark-esj</artifactId>
<version>1</version>
<dependencies>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-30_2.12</artifactId>
<version>7.14.0</version>
</dependency>
</dependencies>
</project>
Save that file and create an additional directory called "targetdir" (it could be called anything)
Then run:
mvn dependency:copy-dependencies -DoutputDirectory=targetdir
You'll find your jar in targetdir.
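If you would rather let Spark fetch the connector itself, the same Maven coordinates can be passed to pyspark or spark-submit with the --packages option. Below is a minimal sketch of assembling that coordinate string. The helper name es_spark_coordinate is hypothetical, and the naming pattern (elasticsearch-spark-20 for Spark 2.x, elasticsearch-spark-30 for Spark 3.x, followed by the Scala binary version) is an assumption inferred from the artifacts mentioned in this thread; check the compatibility matrix to confirm that a given combination is actually published.

```python
def es_spark_coordinate(spark_version: str, scala_binary: str, es_version: str) -> str:
    """Build a Maven coordinate for the Elasticsearch Spark connector.

    Assumption: the artifact is named elasticsearch-spark-<NN>_<scala>,
    where NN is 20 for Spark 2.x and 30 for Spark 3.x. Other Spark
    versions are rejected rather than guessed.
    """
    spark_major = spark_version.split(".")[0]
    artifact_prefix = {"2": "elasticsearch-spark-20", "3": "elasticsearch-spark-30"}
    if spark_major not in artifact_prefix:
        raise ValueError(f"no known connector naming for Spark {spark_version}")
    return f"org.elasticsearch:{artifact_prefix[spark_major]}_{scala_binary}:{es_version}"


coordinate = es_spark_coordinate("3.1.1", "2.12", "7.14.0")
# Prints a command line you could run by hand.
print(f"pyspark --packages {coordinate}")
```

This only builds a string; it does not verify that the artifact exists in a repository.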

Can't resolve maven dependency with beam-runners-google-cloud-dataflow-java and bigtable-client-core

I am trying to run Java code from a Maven project that uses both beam-runners-google-cloud-dataflow-java and bigtable-client-core, and I cannot get Maven to properly reconcile the dependencies between the two. When I run it and attempt to create a BigtableDataClient, I get the following error:
java.lang.NoSuchFieldError: TE_HEADER
at io.grpc.netty.shaded.io.grpc.netty.Utils.<clinit> (Utils.java:74)
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.<clinit> (NettyChannelBuilder.java:72)
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider.builderForAddress (NettyChannelProvider.java:37)
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider.builderForAddress (NettyChannelProvider.java:23)
at io.grpc.ManagedChannelBuilder.forAddress (ManagedChannelBuilder.java:39)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel (InstantiatingGrpcChannelProvider.java:242)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel (InstantiatingGrpcChannelProvider.java:198)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel (InstantiatingGrpcChannelProvider.java:185)
at com.google.api.gax.rpc.ClientContext.create (ClientContext.java:160)
at com.google.cloud.bigtable.data.v2.stub.EnhancedBigtableStub.create (EnhancedBigtableStub.java:151)
at com.google.cloud.bigtable.data.v2.BigtableDataClient.create (BigtableDataClient.java:138)
at com.google.cloud.bigtable.data.v2.BigtableDataClient.create (BigtableDataClient.java:130)
...
I can only conclude this is due to an issue with version conflict on the relevant libraries (either grpc-netty or grpc-netty-shaded); I'm using 1.17 for grpc-netty and 1.23 for grpc-netty-shaded. I've tried using dependencyManagement to force the use of version 1.23.0 for both grpc-netty and grpc-netty-shaded, and then tried 1.17 for both, but this doesn't help. I've also tried using earlier versions of both the Beam runners and bigtable-client-core, and this doesn't help either.
The relevant Maven dependencies are:
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
<version>2.15.0</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigtable</groupId>
<artifactId>bigtable-client-core</artifactId>
<version>1.12.1</version>
</dependency>
I looked at the code for Utils.java (https://github.com/grpc/grpc-java/blame/master/netty/src/main/java/io/grpc/netty/Utils.java), and I don't see any evidence that I'd be using any earlier version that might not have this constant (it's been there since version 1.7).
I'm completely baffled as to what the issue is here. How do I identify the dependency conflict? Is there another way I can find what version of the class Maven is actually looking at here?
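One way to check which jars actually provide a given class is to scan them directly. The sketch below is a generic approach, not something taken from the Beam or Bigtable documentation. It assumes the project's resolved jars have been copied into one directory (for example with mvn dependency:copy-dependencies -DoutputDirectory=targetdir), and the helper name jars_containing is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def jars_containing(class_name: str, jar_dir: Path) -> List[Path]:
    """Return every jar under jar_dir that contains the given class.

    class_name is a fully qualified Java class name, e.g.
    "io.grpc.netty.shaded.io.grpc.netty.Utils" from the stack trace.
    More than one hit means several jars ship the same class, which is
    a common source of NoSuchFieldError / NoSuchMethodError.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(jar_dir.rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return hits


# Example usage (assumes a "targetdir" directory populated as described above):
# for jar in jars_containing("io.grpc.netty.shaded.io.grpc.netty.Utils", Path("targetdir")):
#     print(jar)
```

The jar file names usually include the version, so the output shows which versions ended up on the classpath. Running mvn dependency:tree is a complementary check, since it shows which dependency pulled each of those jars in.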

New Version of Spring-integration-aws in Maven

I am able to find spring-integration-aws version 0.5.0. Is there a newer release that works with Spring 4.0 and Spring Integration 4.0?
I cannot find it in Maven repo.
Regards
Karthik
Nope. It's really my top-priority task for the next several weeks, right after the SpringOne conference.
Meanwhile you can use something like this before the official RELEASE:
<dependency>
<groupId>org.springframework.integration</groupId>
<artifactId>spring-integration-aws</artifactId>
<version>1.0.0.BUILD-SNAPSHOT</version>
</dependency>
It is available from the http://repo.spring.io/snapshot/ repository.

Where can I find influxdb-java version 2.0?

I found the page https://github.com/influxdb/influxdb-java. I use InfluxDB 0.9, and the Java API for it is influxdb-java 2.0, with this Maven dependency:
<dependency>
<groupId>org.influxdb</groupId>
<artifactId>influxdb-java</artifactId>
<version>2.0</version>
</dependency>
But I cannot find this version; the latest I can find is 1.5. Please tell me what I can do and how I can find and download this jar. Thank you very much.
You can also use JitPack until it gets published:
https://jitpack.io/#influxdb/influxdb-java/influxdb-java-2.0
If you can't find it in the Maven repositories, you can download the release from GitHub: https://github.com/influxdb/influxdb-java/releases/tag/influxdb-java-2.0
Then install it manually into your local Maven repository or your Nexus/Archiva/Artifactory instance.

spring maven repository issue (blog references but wanting to use newer version)

I have created a maven web project using the below site.
http://www.mkyong.com/maven/how-to-create-a-web-application-project-with-maven/
I have performed all the steps given there and executed a simple hello world program. Now, I have to include spring dependencies into my eclipse web project.
So
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring</artifactId>
<version>3.1.2</version>
</dependency>
I added the above configuration inside the dependencies tag. Now Maven reports the following:
unable to find jars from the repositories (local and as well as remote)
It suggested executing the command:
mvn install -artifactid=springframework (something like this)
But when I specified version 2.5.6, it was resolved correctly. Is the problem that version 3.1.2 is unavailable in the Maven repository? How do I get the latest versions if Maven cannot resolve them?
It also suggested downloading the jar manually and putting it in the local repository.
The Maven coordinates changed over time.
Try:
<dependency>
<groupId>org.springframework</groupId>
<artifactId>org.springframework.core</artifactId>
<version>3.1.2.RELEASE</version>
</dependency>
OR Try:
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>3.1.2.RELEASE</version>
</dependency>
I looked for an all-in-one POM or dependency, but "spring-full" appears to exist only for 1.2.x and "spring" only for 2.5.x. CHECKED: I can't find one. I've been using separate modules in all projects for some time (this is better anyway: fine-grained dependencies).
The location you can search is at http://ebr.springsource.com/repository/app/
For 3.1.2, see http://ebr.springsource.com/repository/app/library/version/detail?name=org.springframework.spring&version=3.1.2.RELEASE&searchType=librariesByName&searchQuery=spring
Spring have changed their repository URL and online locations at least 3 times to my knowledge over the past 4 years. So I'd look for current information on their website about setting up a Maven <repositories> config to obtain their JARs. Beware of articles with out-of-date information :(
Also notice that the artifactId is different in the two examples; this is another gotcha with Spring. The "org.springframework.core" coordinates are the EBR, OSGi-compliant versions of their software. The "spring-core" coordinates are the older pre-OSGi ones. Find what works for you and don't mix them in the same project. For example, I am using "spring-core" because I use 3.2.0.M2, which is a milestone release. But the production release EBR coordinates are the best to use.
Sorry for so many edits... but it has been a minefield even if you understand the heritage of getting Spring Source software.

Resources