Google Cloud Spark ElasticSearch TransportClient connection exception - elasticsearch

I am using Spark on Google Cloud and I have the following code to connect to an Elasticsearch database:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

import java.net.InetAddress;
import java.net.UnknownHostException;

// Builds a transport client pointed at the given Elasticsearch host and transport port.
public TransportClient openConnection(String ipAddress, int ipPort) throws UnknownHostException {
    Settings settings = Settings.settingsBuilder().put("cluster.name", "elasticsearch").build();
    TransportClient client = TransportClient.builder().settings(settings).build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(ipAddress), ipPort));
    return client;
}
When I run it locally, i.e. with spark-submit --master local[*], everything runs OK. When I run it on a Google Cloud Spark cluster I get the following exception:
java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.threadpool.ThreadPool
at org.elasticsearch.client.transport.TransportClient$Builder.build(TransportClient.java:131)
at javaTools.ElasticSearchConnection.openConnection(ElasticSearchConnection.java:24)
The last method referenced (openConnection) is the connection code shown above.
The code is uploaded to Google Cloud as a fat JAR created with sbt assembly, so all libraries used are bundled, except for the native Java ones.
I am thinking it might be some library dependency issue, since the same JAR runs fine on my local computer and is able to connect to the Elasticsearch server, but it fails to run on the Spark cluster on Google Cloud. Both the local and cloud versions of Spark are the same, 1.6.0.

The problem is caused by conflicting Guava versions used by Spark and Elasticsearch. The solution can be found in this StackOverflow question.
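One common workaround, assuming the fat JAR is built with sbt-assembly as described above, is to shade Guava inside the assembly so that Elasticsearch's transport client sees the Guava version it expects instead of the older one bundled with Spark. A minimal sketch for build.sbt (the shaded package name is arbitrary):
assemblyShadeRules in assembly := Seq(
  // rename Guava's packages inside the fat JAR so they no longer clash with Spark's Guava
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)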

Related

Why can't I import JsonFormat from package com.google.protobuf.util?

Currently, I'm working with Google Cloud (Google Pub/Sub) using the Java client, but I cannot import the JsonFormat class from the package com.google.protobuf.util. I'm using IntelliJ IDEA and have already tried Invalidate Caches / Restart. What am I missing?
My build.gradle
// google cloud
implementation platform('com.google.cloud:libraries-bom:22.0.0')
implementation 'com.google.cloud:google-cloud-pubsub'
In the latest versions of the Google Cloud BOM, the dependency that provides com.google.protobuf.util (com.google.protobuf:protobuf-java-util) is pulled in with runtime scope, so it is not available at compile time. Not sure why they changed it...
Adding the dependency manually with compile scope (Maven) or implementation (Gradle) should help.
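For the Gradle build shown above, that would look something like the line below (these coordinates are the artifact that actually contains com.google.protobuf.util; the explicit version is only illustrative and can be omitted if your BOM already manages it):
implementation 'com.google.protobuf:protobuf-java-util:3.17.3'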
Source:
https://mvnrepository.com/artifact/com.google.cloud/google-cloud-pubsub/1.114.7

Connect Spring Boot 2.4.x to neo4j-3.4.x

I'm trying to connect a Spring Boot 2.4.6 service to a Neo4j 3.4.18-enterprise server, but I get the following error:
Caused by: org.neo4j.driver.exceptions.ClientException: The server does not support any of the protocol versions supported by this driver. Ensure that you are using driver and server versions that are compatible with one another.
at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:143)
at org.neo4j.driver.internal.InternalSession.run(InternalSession.java:69)
at org.neo4j.driver.internal.InternalSession.run(InternalSession.java:51)
at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:37)
at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:43)
... 99 more
Reading this compatibility matrix, I'm assuming (or rather hoping) there should be a way to make Spring Boot 2.4.x work with Neo4j 3.4.x.
Here's a docker command to start a neo4j server:
docker run --publish=7474:7474 --publish=7687:7687 --volume=$HOME/neo4j/data:/data --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes --env=NEO4J_AUTH=neo4j/test neo4j:3.4-enterprise
And here's a github repository with a simple test to reproduce the issue:
@SpringBootTest
class SpringBootNeo4jCompatibilityTestApplicationTests {

    @Autowired
    private Neo4jClient neo4jClient;

    @Test
    void testNeo4jConnection_whenQueryIsRun_thenNoExceptionShouldBeThrown() {
        neo4jClient.query("MATCH (n) RETURN n")
                .run();
    }
}
The test fails when run against Neo4j 3.4.x, but it passes when run against Neo4j 3.5.6-enterprise.
Could you please suggest a way to make this connection work?
Thank you.
Neo4j 3.4 is not maintained anymore; upgrade to at least 3.5: https://neo4j.com/developer/kb/neo4j-supported-versions/
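For reference, the same docker command as above can simply be pointed at a supported image, e.g. (the tag here is just an example):
docker run --publish=7474:7474 --publish=7687:7687 --volume=$HOME/neo4j/data:/data --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes --env=NEO4J_AUTH=neo4j/test neo4j:3.5-enterprise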

Cannot import ArangoDriver class in Spring Boot

I have been trying to run an application in batch mode to upload data to ArangoDB. But to start the driver in batch mode I tried importing the ArangoDriver class, and the class is not accessible: it gives an import error. I am able to import all the other packages in com.arangodb. I tried creating an instance as per the tutorial available here: https://www.arangodb.com/2014/11/arangodb-java-driver-batch-asynchronous-mode/ but I am not able to import anything. Have there been new changes to ArangoDB? How do I resolve this?

Spark application built on a local server, how can we access a Hive warehouse located in HDInsight

I'm trying to connect to the Hive warehouse directory located in HDInsight from a local Spark instance, using an IntelliJ Maven project.
I am using Spark 1.6 with Scala and a Maven project.
Thrift server details:
System.setProperty("hive.metastore.uris", "thrift://hnaz.xyz123.internal.cloudapp.net:1403")
I am trying to access the tables of the Hive warehouse.
Code :
package Test

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode, sources}

object TestHive {
  def main(args: Array[String]): Unit = {
    // get spark configuration
    val conf = new SparkConf()
      .setAppName("SparkHiveTest")
    conf.setMaster("local[*]")
    System.setProperty("hive.metastore.uris", "thrift://hnaz.xyz123.internal.cloudapp.net:1403")
    import org.apache.spark.sql.hive.HiveContext
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    implicit val sqlContext = new SQLContext(sc)
    import org.apache.spark.sql.functions._
    import sqlContext.implicits._
    // this is the line that fails to compile (see the errors quoted below)
    val df1 = sqlContext.sql(s"use $data_profiling, sqlContext.sql("show tables")");
  }
}
POM Dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>compile</scope>
    <!-- provided -->
</dependency>
It is throwing the following errors:
" Error:(22, 37) not found: value data_Test
val df1 = sqlContext.sql(s"use $data_Test, sqlContext.sql("show tables")");"
Error:(22, 74) value tables is not a member of StringContext
val df1 = sqlContext.sql(s"use $data_Test, sqlContext.sql("show tables")");
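As a side note on the two compile errors above: they come from nesting the second sql(...) call inside the interpolated string. A minimal sketch of what was presumably intended, using the hiveContext created in the code and a hypothetical database name:
// hypothetical database name; in the original code the interpolated value was never defined
val dataProfiling = "data_profiling"
hiveContext.sql(s"USE $dataProfiling")      // switch to the database first
val df1 = hiveContext.sql("SHOW TABLES")    // then list its tables as a separate statement
df1.show()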
Thanks a ton. I have only one doubt: my Spark runs on a local server, and Hive is located on HDInsight. How can I access HDInsight Hive from local Spark? I don't have a Spark cluster on HDInsight; I am using client mode.
Note: The setMaster method is used to specify a local cluster. If you want to run the application on a cluster in HDInsight, you replace the argument local[*] with the URL spark://<host>:<port>, where <host> and <port> are the IP address and port number of the edge node in the cluster.
This tutorial demonstrates how to use the Azure Toolkit for IntelliJ plug-in to develop Apache Spark applications written in Scala, and then submit them to an HDInsight Spark cluster directly from the IntelliJ integrated development environment (IDE). You can use the plug-in in a few ways:
• Develop and submit a Scala Spark application on an HDInsight Spark cluster.
• Access your Azure HDInsight Spark cluster resources.
• Develop and run a Scala Spark application locally.
Additional details: With Azure HDInsight 4.0 you can integrate Apache Spark and Apache Hive with the Hive Warehouse Connector.
The Apache Hive Warehouse Connector (HWC) is a library that allows you to work more easily with Apache Spark and Apache Hive by supporting tasks such as moving data between Spark DataFrames and Hive tables, and also directing Spark streaming data into Hive tables. Hive Warehouse Connector works like a bridge between Spark and Hive. It supports Scala, Java, and Python for development.
The Hive Warehouse Connector allows you to take advantage of the unique features of Hive and Spark to build powerful big-data applications. Apache Hive offers support for database transactions that are Atomic, Consistent, Isolated, and Durable (ACID). For more information on ACID and transactions in Hive, see Hive Transactions. Hive also offers detailed security controls through Apache Ranger and Low Latency Analytical Processing not available in Apache Spark.
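For illustration, HWC usage from Spark typically looks roughly like the sketch below (Scala; the session-builder and executeQuery calls follow the HDInsight 4.0 HWC documentation, while the table name and the required spark-submit configuration such as the HiveServer2 JDBC URL are assumptions and omitted here):
import com.hortonworks.hwc.HiveWarehouseSession

// build a Hive Warehouse Connector session on top of an existing SparkSession
val hive = HiveWarehouseSession.session(spark).build()

// run a query against Hive-managed (ACID) tables and get the result back as a Spark DataFrame
val df = hive.executeQuery("SELECT * FROM some_hive_table LIMIT 10")
df.show()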
Hope this helps.

conflicts between spring-mongo and spring-cassandra

1. It works well with spring-mongo in a microservice.
2. It works well with spring-cassandra in a microservice.
The project is built with Gradle.
But when I add spring-cassandra to the spring-mongo service there are errors, even if I just import the spring-cassandra dependencies without any Cassandra code:
compile "org.springframework.data:spring-cql:1.5.0.M1"
compile "org.springframework.data:spring-data-cassandra:1.5.0.M1"
compile "com.datastax.cassandra:cassandra-driver-core:3.0.1"
The error trace is too long, so here is just a summary of the exceptions thrown:
org.springframework.beans.factory.BeanCreationException
com.datastax.driver.core.exceptions.NoHostAvailableException
It is not caused by one class. When I run the tests with Gradle, all test cases throw this exception and the Spring Boot application cannot start, although it worked well before. I have not added any Java code that calls the DataStax driver to connect to Cassandra, so I don't know why there is an exception about a Cassandra connection.
Solved this issue by putting the Mongo and Cassandra repositories/models into different packages.
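In Spring Data terms, separating the packages usually goes together with pointing @EnableMongoRepositories and @EnableCassandraRepositories at their respective base packages, so that each store only scans its own repositories and entities; the exact configuration depends on the project, so treat this as a pointer rather than a drop-in fix.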
