Apache Spark and gRPC - gradle

I'm writing an application that uses Apache Spark. For communicating with a client, I would like to use gRPC.
In my Gradle build file, I use
dependencies {
    compile 'org.apache.spark:spark-core_2.11:1.5.2'
    compile 'org.apache.spark:spark-sql_2.11:1.5.2'
    compile 'io.grpc:grpc-all:0.13.1'
    ...
}
When leaving out gRPC, everything works fine. However, when gRPC is included, I can create the build but not execute it, because the packages pull in different versions of Netty. Spark seems to use netty-all, which contains the same classes (but with potentially different method signatures) as the Netty that gRPC uses.
I tried shading (using com.github.johnrengelman.shadow), but somehow it still does not work. How can I approach this problem?

The general solution to this sort of thing is shading with relocation. See the answer to a similar problem with protobuf dependencies: https://groups.google.com/forum/#!topic/grpc-io/ABwMhW9bU34
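For the Gradle setup above, a minimal sketch of that approach with the Shadow plugin might look like the following. The plugin and dependency versions are illustrative, compileOnly assumes Gradle 2.12+, and it treats Spark as provided by the cluster so that only gRPC and its Netty end up shaded and relocated:
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '1.2.3'   // illustrative version
}

dependencies {
    // Spark is provided at runtime by the cluster, so keep it out of the fat jar
    compileOnly 'org.apache.spark:spark-core_2.11:1.5.2'
    compileOnly 'org.apache.spark:spark-sql_2.11:1.5.2'
    compile 'io.grpc:grpc-all:0.13.1'
}

shadowJar {
    // Move gRPC's Netty 4.1.x under a private package so it cannot clash with
    // the Netty 4.0.x classes Spark already has on the classpath
    relocate 'io.netty', 'myapp.shaded.io.netty'
}
The key piece is the relocate rule: it rewrites both the bundled Netty classes and the references to them inside the gRPC classes, so the two Netty versions no longer collide.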

I think the problem is that Spark uses Netty 4.0.x while gRPC uses Netty 4.1.0.

Related

How do I programmatically install Maven libraries to a cluster using init scripts?

Have been trying for a while now and I'm sure the solution is simple enough, just struggling to find it. I'm pretty new, so be easy on me!
It's a requirement to do this using a pre-made init script, which is then selected in the UI when configuring the cluster.
I am trying to install com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.18 on a cluster on Azure Databricks. Following the documentation's example (which installs a PostgreSQL driver), an init script is produced using the following command:
dbutils.fs.put("/databricks/scripts/postgresql-install.sh","""
#!/bin/bash
wget --quiet -O /mnt/driver-daemon/jars/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar""", True)
My question is, what is the /mnt/driver-daemon/jars/postgresql-42.2.2.jar section of this code? And what would I have to do to make this work for my situation?
Many thanks in advance.
/mnt/driver-daemon/jars/postgresql-42.2.2.jar here is the output path where the jar file will be put. But that alone won't help, because the jar won't be on the classpath and won't be found by Spark. Jars need to be put into the /databricks/jars/ directory, where Spark picks them up automatically.
However, this method of downloading jars only works for jars without dependencies, and that is not the case for libraries like the EventHubs connector: they won't work unless their dependencies are downloaded as well. Instead, it's better to use the Cluster UI or the Libraries API (or the Jobs API for jobs); with these methods, all dependencies are fetched as well.
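As an illustration, a hedged sketch of installing the connector through the Libraries API with curl (the workspace URL, personal access token, and cluster ID are placeholders; the endpoint shown is the 2.0 Libraries API):
# Queue the connector for installation on the cluster; Databricks resolves
# the Maven coordinates and all transitive dependencies itself.
curl -X POST https://<databricks-instance>/api/2.0/libraries/install \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{
        "cluster_id": "<cluster-id>",
        "libraries": [
          { "maven": { "coordinates": "com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.18" } }
        ]
      }'
Because the library is specified as Maven coordinates rather than a single jar, the transitive dependencies are installed on the cluster as well.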
P.S. Really, though, instead of using the EventHubs connector, it's better to use the Kafka protocol, which EventHubs supports as well. There are several reasons for that:
It's better from a performance standpoint
It's better from a stability standpoint
The Kafka connector is included in DBR, so you don't need to install anything extra
You can read how to use Spark + EventHubs + Kafka connector in the EventHubs documentation.
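For orientation only, a hedged sketch of what reading from Event Hubs over its Kafka-compatible endpoint with Structured Streaming can look like; the namespace, hub, scope, and secret names are placeholders, and the kafkashaded login-module class name is an assumption about how DBR shades Kafka, so check the details against the EventHubs documentation:
# Event Hubs exposes a Kafka-compatible endpoint on port 9093
bootstrap_servers = "<eventhubs-namespace>.servicebus.windows.net:9093"
connection_string = dbutils.secrets.get("<scope>", "<eventhubs-connection-string-key>")

# The JAAS config authenticates with the Event Hubs connection string;
# on Databricks the Kafka classes are shaded under the kafkashaded prefix.
jaas_config = (
    'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
    'username="$ConnectionString" password="{}";'.format(connection_string)
)

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrap_servers)
      .option("subscribe", "<event-hub-name>")        # each Event Hub behaves like a Kafka topic
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.sasl.jaas.config", jaas_config)
      .load())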

Is it possible to use all camel components using HotSpot

I noticed that there are only a few Camel extensions available to use in native mode. I am wondering if it's still possible to use the other Camel components if you don't compile to native? And if so, is it useful to go that way, or should we, for example, stick to Spring Boot?
Note that not every Camel extension necessarily needs a Quarkus one. Basically, a Quarkus extension is needed only if we need to tune the Camel extension for GraalVM (add reflection declarations, for instance). The interesting thing is that you can even do the work manually to make your Camel extension work in GraalVM mode and then report back, so that we can create a proper extension for future use.
In JVM mode, all Camel extensions should work flawlessly. If you encounter an issue, please open a GitHub issue and we will take a look at it.
As to whether using Quarkus in JVM mode is worth it, I'm obviously partial, but I think the Quarkus approach is beneficial even in JVM mode. You still get some of the benefits of better boot time and reduced memory usage. Obviously, depending on your application, they might not be important to you.

Cannot kotlinc-native compile restful service with Spring

Quick question: Is it possible to convert a Kotlin + Spring RESTful web service to a Linux native application?
It works properly when run on the JVM, but I get compilation errors when I try to build it using kotlinc-native for Linux.
I cannot find any definitive statement about whether I am trying to do something that is unsupported. Am I trying to swim upstream, here?
Thank you for your comments and help!
Mike
Here are some details...
Ubuntu 18.04
Latest versions of stable dependencies
You cannot link a JVM library with Kotlin/Native. You must have a C library to be able to link against it, or convert the Java code to Kotlin.
If what you need is an embedded server executable, you could check out the Ktor project.
The project has some native client examples and an embedded native HTTP/2 push example.

Flink for embedded stream processing in OSGi

I would like to use Apache Flink to process event inside an application.
My tests on a standalone JVM worked reasonably well, though Flink is a really big dependency.
I also tried to get it running in OSGi but gave up for now because of the many dependencies.
So my question is:
How small can I make Flink? I currently tried with the Maven dependency on flink-streaming-java.
Unfortunately this depends on or embeds (only listing the questionable ones):
flink-shaded-hadoop2
kryo
zookeeper
netty
jetty
apache http client
apache http core
scala
akka
jackson
It also looks like several jars embed the same libs again and again, like some Google libs and ASM.
So is there some way to get a slimmer version of Flink for local usage that does not depend on so many libs?
Many of the dependencies are required for Apache Flink's primary use cases, namely distributed stream and batch processing:
Zookeeper for high-availability in case of (process) failures
Netty for data network transfer
Jetty for monitoring via REST API and web dashboard
Akka (and transitively Scala) for coordination of distributed processes
Most of these libraries are tightly coupled with the system and cannot be easily switched off or excluded.
I am sorry, there is no stripped-down version for local stream processing.

Dart with Maven (in Spring Boot App)

I like Dart; I have been playing with it for a while. I'd like to integrate it with my Maven web app project based on Spring Boot.
I suppose the correct way is to use the dart-maven-plugin. But I'm not sure how to properly glue it in place. Spring Boot has its own structure, Maven as well, and Dart doesn't make that any easier.
I will probably need an entry point for the Dart part, meaning the Spring Boot templates folder needs to include the HTML resources from Dart.
I would appreciate any idea, best practices.
PS: the aforementioned dart-maven-plugin is not very active; should I be wary of using it at all, as I don't see any progress there compared to Dart itself?
UPDATE
So this can be a solution (note I have only one so-called "entry point" .dart file so far):
normal Dart structure in src/main/dart
use the dart-maven-plugin's pub build command to build into ${project.build.directory}/dart
maven-resources-plugin:copy-resources from ${project.build.directory}/dart/web to ${project.build.directory}/classes/public/ (sketched below)
make the war
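A hedged sketch of the copy step in the pom; the dart-maven-plugin execution itself is omitted because its goals and configuration depend on the plugin version, so only the standard maven-resources-plugin part is shown:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-dart-build-output</id>
      <!-- runs after the Dart build has written its output to target/dart -->
      <phase>prepare-package</phase>
      <goals>
        <goal>copy-resources</goal>
      </goals>
      <configuration>
        <!-- Spring Boot serves classes/public/ at the context root -->
        <outputDirectory>${project.build.directory}/classes/public</outputDirectory>
        <resources>
          <resource>
            <directory>${project.build.directory}/dart/web</directory>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>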
I'm still able to use Intellij's Dart integration from src/main/dart.
The Spring Boot maps classes/public/ folder to / so the dart file and html files are loaded properly.
It's not ideal, but it works so far. Please feel free to write down any comments.
I have tried a few times to use Dart in a Maven project myself and always ran into some problems. Right now I'm developing my Dart apps in a separate module that I build with pub, and it connects to the Maven-based Java backends over REST.
This has several advantages for me, for example:
I can use pub and avoid problems with outdated Maven plugins
I use the serving mechanism that fits best for the static Dart code and assets (in my case a Docker image with nginx)
I have a clean separation of backend and frontend code with a tailored REST API
As I like the microservice approach, I also use Spring Session together with Zuul (via Spring Cloud).
If you want to combine Dart with HTML generated from, for example, JSPs or another templating engine, then this isn't a good approach for you. But IMHO Dart is not yet very well suited to these kinds of architectures.
