How can I use an Elasticsearch plugin in a JVM local node? - elasticsearch

I'm in the process of adding support for unicode normalization in ES with the help of the ICU analysis plugin. Installing this in a dedicated cluster is relatively easy, but I also need this plugin to be available during testing, where we use a JVM local node. Since it's a JVM local node I can't simply call the commands as explained in the plugin documentation. How can I get my plugin to work for this local node?

After digging through the Elasticsearch source code I figured out the answer, and it is stupidly simple: just make sure the plugins are on your classpath and ES will pick them up automatically. In my case, adding the plugin as a dependency in my pom.xml was enough.
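For anyone who needs a concrete picture, here is a minimal sketch of what that looks like, assuming the 1.x-era API that "JVM local node" implies and the org.elasticsearch:elasticsearch-analysis-icu artifact on the test classpath (versions must match your ES version; this sketch is my reconstruction, not part of the original answer):

```
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;

import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class LocalNodeWithIcu {
    public static void main(String[] args) {
        // Start an in-JVM local node; any plugin jar on the classpath
        // (e.g. elasticsearch-analysis-icu pulled in via pom.xml) is
        // discovered automatically, no "plugin install" command needed.
        Node node = nodeBuilder()
                .local(true)
                .settings(ImmutableSettings.settingsBuilder()
                        .put("path.data", "target/es-data"))
                .node();

        Client client = node.client();

        // ICU analysis components such as the "icu_normalizer" token filter
        // can now be referenced in index settings created through this client.

        node.close();
    }
}
```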

Related

Is the Apache APISIX Java plugin runner production-ready?

I am trying to implement an API gateway that has Java plugin support. I have analyzed Kong, APIMan, and APISIX, of which APISIX seems to be the best fit. But when I look at the Java plugin support, the GitHub page for the Java plugin runner says "This project is currently considered experimental."
https://github.com/apache/apisix-java-plugin-runner
So I wanted to check with the community: is that plugin really experimental, and is there any other way to use APISIX in production with Java plugins enabled?
Any other options for Java-enabled plugin API gateways are also welcome.
For reference, I'll quote the reply I received to the same question on the Slack channel from one of the project's maintainers.
In fact, from some information I’ve gathered, there are already some users using it in production environments.
I can’t give an answer about whether it should be marked as production-ready or not. Here are some facts.
The design pattern, API interface and custom development approach of this project have not changed significantly since its inception, and should not see a major upheaval in the future, as it follows some common gateway design approaches in the Java world.
It is currently used in a rather primitive way, requiring you to clone the project source code, whereas in the Java world mature projects are imported as dependencies by defining their GAV coordinates in the dependency file.
Based on this, I think it is now production-ready in terms of stability, but not enough on other levels.
Apache APISIX Slack channel has the same question, link: https://the-asf.slack.com/archives/CUC5MN17A/p1653908139962639
Back to this question: Java Plugin Runner is used in production by some community users.
Here's the thing: from what I know, it has been used in a production environment in China (you can refer to the community bi-weekly talk for this).
So I would suggest you try it.
Also, the plugin is still relatively easy to adapt, and I think it can be customized to suit your needs.

How do I programmatically install Maven libraries to a cluster using init scripts?

I have been trying for a while now and I'm sure the solution is simple enough; I'm just struggling to find it. I'm pretty new, so go easy on me!
It's a requirement to do this using a pre-made init script, which is then selected in the UI when configuring the cluster.
I am trying to install com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.18 to a cluster on Azure Databricks. Following the documentation's example (which installs a PostgreSQL driver), they produce an init script using the following command:
```
dbutils.fs.put("/databricks/scripts/postgresql-install.sh","""
#!/bin/bash
wget --quiet -O /mnt/driver-daemon/jars/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar""", True)
```
My question is, what is the /mnt/driver-daemon/jars/postgresql-42.2.2.jar section of this code? And what would I have to do to make this work for my situation?
Many thanks in advance.
/mnt/driver-daemon/jars/postgresql-42.2.2.jar here is the output path where the jar file will be put. But that makes little sense, because this location is not on the classpath and the jar won't be found by Spark. Jars need to be put into the /databricks/jars/ directory, where they will be picked up by Spark automatically.
But this method of downloading jars works only for jars without dependencies, and for libraries like the EventHubs connector that is not the case: they won't work unless their dependencies are downloaded as well. Instead, it's better to use the Cluster UI or the Libraries API (or the Jobs API for jobs); with these methods, all dependencies are fetched as well.
P.S. But really, instead of using the EventHubs connector, it's better to use the Kafka protocol, which EventHubs also supports. There are several reasons for that:
It's better from a performance standpoint.
It's better from a stability standpoint.
The Kafka connector is included in DBR, so you don't need to install anything extra.
You can read how to use Spark + EventHubs via the Kafka connector in the EventHubs documentation.
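If it helps, here is a rough sketch of that Kafka-protocol route using Spark Structured Streaming's built-in Kafka source (this example is mine, not from the original answer). The namespace, event hub name and connection string are placeholders; the kafkashaded... login module class name is the Databricks-shaded variant, and outside Databricks you would use org.apache.kafka.common.security.plain.PlainLoginModule instead.

```
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class EventHubsViaKafka {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("eventhubs-via-kafka")
                .getOrCreate();

        // Placeholder: the full Event Hubs connection string for the namespace.
        String connectionString = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...";

        // The Event Hubs Kafka endpoint uses SASL_SSL/PLAIN with the literal user "$ConnectionString".
        String jaas = "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" password=\"" + connectionString + "\";";

        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
                .option("kafka.security.protocol", "SASL_SSL")
                .option("kafka.sasl.mechanism", "PLAIN")
                .option("kafka.sasl.jaas.config", jaas)
                .option("subscribe", "<event-hub-name>")
                .load();

        // The event body arrives in the Kafka "value" column as bytes.
        StreamingQuery query = stream.selectExpr("CAST(value AS STRING) AS body")
                .writeStream()
                .format("console")
                .start();

        query.awaitTermination();
    }
}
```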

Send arbitrary local jars to YARN container classpath

I'm using Apache Twill (v 0.10) to build a YARN application. I've observed that jars which are not referenced by my application code are not picked up and sent to the containers' classpath. I checked the YarnTwillPreparer class to see how the dependencies are decided. However, I'm still not clear on what I need to do to force some additional jars to be sent to each of the YARN containers.
I think there must be a simple and elegant way to achieve this. A precise code snippet would be most welcome, but any pointer would also be good.
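A hedged sketch of what I believe the Twill API allows, based on my recollection of the TwillPreparer interface around that version; the method names (withDependencies, withResources) are assumptions to verify against the 0.10 javadoc, and MyTwillApp and the Gson class are placeholders for your own application and for a jar your code never references directly.

```
import java.io.File;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.twill.api.TwillController;
import org.apache.twill.api.TwillRunnerService;
import org.apache.twill.yarn.YarnTwillRunnerService;

import com.google.gson.Gson;

public class LaunchWithExtraJars {
    public static void main(String[] args) {
        TwillRunnerService runner =
                new YarnTwillRunnerService(new YarnConfiguration(), "zkhost:2181");
        runner.start();

        TwillController controller = runner.prepare(new MyTwillApp()) // placeholder application
                // Classes listed here are treated as dependencies even though the
                // application code never references them, so their jars get bundled.
                .withDependencies(Gson.class)
                // Extra files to localize to every container, made available
                // through the runnable's ClassLoader.
                .withResources(new File("/opt/libs/extra-lib.jar").toURI())
                .start();
    }
}
```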

How to write from Apache Flink to Elasticsearch

I am trying to connect Flink to Elasticsearch, and when I run the Maven project I get this error:
Or, as another way to do it, I am using this example: https://github.com/keiraqz/KafkaFlinkElastic
The example you linked depends on various Flink modules with different versions, which is highly discouraged. Try setting them all to one version and see if that fixes the issue.

Use NiFi 1.5.0 processor in NiFi 1.2.0

There is one particular processor, mentioned below, that I am interested in; it has one extra feature in 1.5.0 compared to 1.2.0, so I want to use that.
Processor Name: QueryDatabaseTable
Is there any way I can just upgrade the processor or add this processor without upgrading whole NiFi?
I see there are two approaches.
The above processor is stored as the nifi-standard-nar-x.x.x.nar-unpacked directory, so just copy the NAR from 1.5.0 and put it in 1.2.0. I am not sure whether NiFi will recognize the new processor version after this.
The above processor is part of the following file, so create a new processor out of it and deploy it on 1.2.0; I'm not sure how complicated that will be, though.
https://github.com/apache/nifi/blob/dd58a376c9050bdb280e29125cce4c55701b29df/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/QueryDatabaseTableTest.java
Would someone let me know which would be the better option, and also where I can find the NAR file or source code for the above processor? I don't see a separate NAR file for this processor.
It might be worth a try to simply copy in the NAR, but in this case I'm pretty sure it won't work. There were lots of core framework changes between 1.2.0 and 1.5.0, and also the standard NAR has the standard-services-api NAR as a dependency, so you'd likely need to copy that one as well, etc.
A general approach for backporting is to find the Jira case that has the feature/fix you want, use the link in the Jira to get to the Github Pull Request that added/fixed it, then create a branch from your baseline (nifi-1.2.0, e.g.) and cherry-pick the commits. If the changes are to a single bundle, you can simply build that NAR from the POM in its bundle directory (nifi-standard-bundle, e.g.). Then you can replace your existing NAR with the one you built, creating a kind of "hotfix NAR".
I would think this is mainly an addition to the existing answer, but in general it is possible to create new processors. As such it may be wise to create a 'QueryDatabaseTable2' processor which is the same as the new one (or similar to it).
