Use NiFi 1.5.0 processor in NiFi 1.2.0 - apache-nifi

There is one particular processor, named below, that I am interested in; it has one extra feature in 1.5.0 compared to 1.2.0, and I want to use that feature.
Processor Name: QueryDatabaseTable
Is there any way I can upgrade just the processor, or add this processor, without upgrading the whole NiFi installation?
I see there are two approaches.
The above processor is stored in the nifi-standard-nar-x.x.x.nar bundle (which NiFi unpacks at runtime as nifi-standard-nar-x.x.x.nar-unpacked). So I could just copy the NAR from 1.5.0 and drop it into 1.2.0. I am not sure whether NiFi will recognize this new processor version after that.
The above processor is part of the following file, so I could create a new processor out of it and deploy it on 1.2.0; I am not sure how complicated that would be, though.
https://github.com/apache/nifi/blob/dd58a376c9050bdb280e29125cce4c55701b29df/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/QueryDatabaseTableTest.java
Would someone let me know which would be the better approach, and also where I can find the NAR file or source code of the above processor? I don't see a separate NAR file for this processor.

It might be worth a try to simply copy in the NAR, but in this case I'm pretty sure it won't work. There were lots of core framework changes between 1.2.0 and 1.5.0, and also the standard NAR has the standard-services-api NAR as a dependency, so you'd likely need to copy that one as well, etc.
A general approach for backporting is to find the Jira case that has the feature/fix you want, use the link in the Jira to get to the Github Pull Request that added/fixed it, then create a branch from your baseline (nifi-1.2.0, e.g.) and cherry-pick the commits. If the changes are to a single bundle, you can simply build that NAR from the POM in its bundle directory (nifi-standard-bundle, e.g.). Then you can replace your existing NAR with the one you built, creating a kind of "hotfix NAR".

This is mainly an addition to the existing answer, but in general it is possible to create new processors. As such, it may be wise to create a 'QueryDatabaseTable2' processor that is the same as (or similar to) the 1.5.0 version; a minimal skeleton of what that could look like is sketched below.
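A rough sketch of such a custom processor class (the package name and the stubbed body are hypothetical; in practice the body would be the QueryDatabaseTable source copied from 1.5.0, along with the abstract/helper classes it depends on):

package com.example.nifi.processors;  // hypothetical package in your own custom bundle

import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

// Stub only: copy the 1.5.0 QueryDatabaseTable implementation into this class under the new name
// so it does not clash with the built-in 1.2.0 processor.
@Tags({"sql", "select", "jdbc", "query", "database"})
@CapabilityDescription("Backported copy of the NiFi 1.5.0 QueryDatabaseTable processor, deployed as a separate processor.")
public class QueryDatabaseTable2 extends AbstractProcessor {

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        // Copied 1.5.0 logic goes here.
    }
}

The new class also has to be listed in META-INF/services/org.apache.nifi.processor.Processor inside your NAR so that NiFi discovers it.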

Related

How do I programmatically install Maven libraries to a cluster using init scripts?

I have been trying for a while now and I'm sure the solution is simple enough; I'm just struggling to find it. I'm pretty new, so be easy on me!
It's a requirement to do this using a premade init script, which is then selected in the UI when configuring the cluster.
I am trying to install com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.18 on a cluster on Azure Databricks. Following the documentation's example (which installs a PostgreSQL driver), they produce an init script using the following command:
dbutils.fs.put("/databricks/scripts/postgresql-install.sh","""
#!/bin/bash
wget --quiet -O /mnt/driver-daemon/jars/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar""", True)
My question is, what is the /mnt/driver-daemon/jars/postgresql-42.2.2.jar section of this code? And what would I have to do to make this work for my situation?
Many thanks in advance.
Here, /mnt/driver-daemon/jars/postgresql-42.2.2.jar is the output path where the jar file will be written. But it makes no sense, as this jar won't be put on the classpath and won't be found by Spark. Jars need to be put into the /databricks/jars/ directory, where they are picked up by Spark automatically.
However, this method of downloading jars only works for jars without dependencies, and for libraries like the EventHubs connector that is not the case - they won't work unless their dependencies are downloaded as well. Instead it's better to use the Cluster UI or the Libraries API (or the Jobs API for jobs) - with these methods, all dependencies are fetched as well.
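As an illustration, a minimal Java sketch of calling the Libraries API install endpoint is below. The workspace URL, personal access token and cluster id are placeholders, and the endpoint and payload shape reflect my understanding of the Libraries API, so verify them against the Databricks REST API documentation:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class InstallClusterLibrary {
    public static void main(String[] args) throws Exception {
        // Placeholders: fill in your workspace URL, personal access token and cluster id.
        String workspaceUrl = "https://<your-workspace>.azuredatabricks.net";
        String token = "<personal-access-token>";
        String payload = "{"
                + "\"cluster_id\": \"<cluster-id>\","
                + "\"libraries\": [{\"maven\": {"
                + "\"coordinates\": \"com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.18\"}}]"
                + "}";

        // POST /api/2.0/libraries/install resolves the Maven coordinates,
        // including transitive dependencies, and installs them on the running cluster.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(workspaceUrl + "/api/2.0/libraries/install"))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}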
P.S. But really, instead of using the EventHubs connector, it's better to use the Kafka protocol, which EventHubs supports as well. There are several reasons for that:
It's better from a performance standpoint
It's better from a stability standpoint
The Kafka connector is included in DBR, so you don't need to install anything extra
You can read how to use Spark with EventHubs via the Kafka protocol in the EventHubs documentation.
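For illustration, a minimal Java sketch of reading an Event Hub through its Kafka endpoint with Structured Streaming might look like the following. The namespace, hub name and connection string are placeholders, and the shaded "kafkashaded." login module prefix is what Databricks runtimes use, so check it against the EventHubs/Databricks documentation:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EventHubsViaKafka {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("eventhubs-via-kafka")
                .getOrCreate();

        // EventHubs exposes a Kafka-compatible endpoint on port 9093.
        String bootstrap = "<namespace>.servicebus.windows.net:9093";
        // On Databricks the Kafka client is shaded; outside Databricks drop the "kafkashaded." prefix.
        String jaas = "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" "
                + "password=\"<event-hubs-connection-string>\";";

        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", bootstrap)
                .option("kafka.security.protocol", "SASL_SSL")
                .option("kafka.sasl.mechanism", "PLAIN")
                .option("kafka.sasl.jaas.config", jaas)
                .option("subscribe", "<event-hub-name>")
                .load();

        // The Kafka source ships with DBR, so nothing extra has to be installed.
        stream.selectExpr("CAST(value AS STRING) AS body")
                .writeStream()
                .format("console")
                .start()
                .awaitTermination();
    }
}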

java.util.ServiceConfigurationError Provider not a subtype while using OSGi bundle

I'm creating a Liferay 7.1 OSGi bundle which has some external dependencies. To save time, we opted to embed the external JARs in our OSGi bundle. I've managed to create a bnd file which includes all of the Elasticsearch dependencies and puts them on the bundle classpath. I've used the source code from GitHub (https://github.com/liferay/liferay-portal/blob/master/modules/apps/portal-search-elasticsearch6/portal-search-elasticsearch6-impl/build.gradle) and the bnd.bnd file to check what's imported.
When activating the bundle, an exception is thrown:
The activate method has thrown an exception
java.util.ServiceConfigurationError: org.elasticsearch.common.xcontent.XContentBuilderExtension: Provider org.elasticsearch.common.xcontent.XContentElasticsearchExtension not a subtype
at java.util.ServiceLoader.fail(ServiceLoader.java:239)
at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.elasticsearch.common.xcontent.XContentBuilder.<clinit>(XContentBuilder.java:118)
at org.elasticsearch.common.settings.Setting.arrayToParsableString(Setting.java:1257)
The XContentBuilderExtension is from elasticsearch-x-content-6.5.0.jar, and the XContentElasticsearchExtension class is included in elasticsearch-6.5.0.jar. Both are Included Resources and have been put on the bundle classpath.
The activate method initializes a TransportClient in my other jar, hence the error happens on activation ;).
Edit:
I've noticed that this error does NOT occur when installing the bundle for the first time, or when the portal restarts; it only occurs when I uninstall and reinstall the bundle. (This is functionality I really prefer to have!) Maybe a stupid thought, but could it be that there is some 'hanging thread'? That the bundle is not correctly installed, or that the TransportClient is still alive? I'm checking this out. Any hints are welcome!
Edit 2:
I fear this is an incompatibility between SPI and OSGi. I've checked: the High Level REST Client has the same issue (but with another extension). I'm going to try the Low Level REST Client; this should work, as it has minimal dependencies, I'm guessing. I'm still very curious why the incompatibility is there. I'm certainly no expert on OSGi, nor on SPI. (Time to learn new stuff!)
This seems like a case where OSGi uses your bundle to resolve a dependency from another bundle, probably one that used your bundle to resolve a package when the system started.
Looking at the symptoms: it does not occur on boot or restart, and the error is 'not a subtype'.
When OSGi uses that bundle to resolve a dependency, it keeps a copy around, even when you remove the bundle. When the bundle comes back, a package that was previously used by another bundle may still be around, and you can end up with a class that has two versions of itself, from different classloaders; they are not the same class and therefore not a subtype.
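To illustrate the mechanism (a self-contained sketch; the jar path is a placeholder), the same class defined by two different class loaders fails exactly the check that ServiceLoader performs before throwing "not a subtype":

import java.net.URL;
import java.net.URLClassLoader;

public class NotASubtypeDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder jar containing the service interface/provider classes.
        URL jar = new URL("file:/tmp/elasticsearch-x-content-6.5.0.jar");

        // Two isolated class loaders (null parent), similar to an old bundle revision
        // still being wired to other bundles after an uninstall/reinstall.
        try (URLClassLoader first = new URLClassLoader(new URL[]{jar}, null);
             URLClassLoader second = new URLClassLoader(new URL[]{jar}, null)) {

            Class<?> a = first.loadClass("org.elasticsearch.common.xcontent.XContentBuilderExtension");
            Class<?> b = second.loadClass("org.elasticsearch.common.xcontent.XContentBuilderExtension");

            System.out.println(a == b);                // false: same name, different defining loaders
            System.out.println(a.isAssignableFrom(b)); // false: this is the "not a subtype" check
        }
    }
}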
Expose only what is necessary, to minimize the effects of this. Import only what needs importing. If you are using the Liferay Gradle configuration to include the dependency inside the bundle, stop - it's a terrible way to include it, as it exposes a lot. If you are using the bnd file to include a resource and create an entry for the additional classpath location, do not expose it if not necessary. If you have several bundles using one as a dependency, make sure about the versions they use and whether they exchange objects of the problematic class; if they do, extra care is required.
PS: you can include attributes when exporting and/or importing in order to be more specific and avoid using packages from the wrong origin.
You can have two Elasticsearch connections inside one Java app, and Liferay by default does not expose the connection that it holds.
A way around it is to rebuild the Liferay ES connector. It's not a big deal, because you don't need to change the code, only the OSGi descriptor, to expose more services.
I did it in one POC project and it worked fine. The tricky thing is rebuilding the Liferay jar, but that is explained by Pettry in his Google-like-search blog posts: https://community.liferay.com/blogs/-/blogs/creating-a-google-like-search (it is a series, but it's kind of hard to navigate in the new Liferay blogs; Google will probably help). Either way, it is all nicely documented here: https://github.com/peerkar/liferay-gsearch
The only thing that then needs to be done is to add org.elasticsearch.* to the export section of the bnd.bnd file. You will then be able to work with the native Elasticsearch API.

Spring boot and javascript node_modules

I'm currently building a Spring Boot application which also uses some JavaScript. I use yarn as a package manager to manage the different JS libraries.
Now I wonder how I would include these resources in my Spring Boot project. Simply including the whole node_modules folder as a resource seems like overhead to me, as it doesn't necessarily contain only the required sources (to me it is more like my local Maven repository path). How do I identify which JavaScript resources should be included in my jar in the end, so that I can also reference them in my Thymeleaf HTML templates?
I already found the frontend-maven-plugin (https://github.com/eirslett/frontend-maven-plugin), which helps me install all my yarn dependencies during the build, but it doesn't take care of the build process itself, as far as I can see.
Thanks for your help!
Perhaps you should consider using webpack or some other JavaScript bundler/task runner to bundle your JavaScript and required dependencies into a single file. Then you can simply include that bundled file in your jar. For example: http://justincalleja.com/2016/04/17/serving-a-webpack-bundle-in-spring-boot/

How can I use an Elasticsearch plugin in a JVM local node?

I'm in the process of adding support for unicode normalization in ES with the help of the ICU analysis plugin. Installing this in a dedicated cluster is relatively easy, but I also need this plugin to be available during testing, where we use a JVM local node. Since it's a JVM local node I can't simply call the commands as explained in the plugin documentation. How can I get my plugin to work for this local node?
After digging through the source code of Elasticsearch I figured out the answer, and it is stupidly simple: Just make sure the plugins are in your classpath and ES will pick them up automatically. In my case adding the plugin to my pom.xml was enough.

hbase and osgi - can't find hbase-default.xml

As HBase is not yet available as an OSGi-fied bundle, I managed to create the bundle with the Maven Felix plugin (HBase 0.92 and the corresponding hadoop-core 1.0.0), and both bundles start up in OSGi :)
The hbase-default.xml is also added to the resulting bundle. When I open the resulting OSGi jar, the structure looks like this:
org/
META-INF
hbase-default.xml
This was achieved with <Include-Resource>#${pkgArtifactId}-${pkgVersion}.jar!/hbase-default.xml</Include-Resource>
The problem comes up when I actually want to connect to HBase: hbase-default.xml cannot be found, and thus I cannot create any configuration.
The HBase OSGi bundle is used from within another OSGi bundle that is supposed to obtain an HBase connection and query the database. That OSGi bundle is in turn used by an RCP application.
My question is: where do I have to put hbase-default.xml so that it is found when the bundle is started? Or why does it not realize that the file exists?
Thank you for any hints.
-- edit
I found a decompiler so I could view the source where the configuration is loaded (hadoop-core, which does not provide any sources via Maven), and I now see that the thread's context class loader is used (and, if that is not available, the class loader of the Configuration class itself). So it seems that it can't find the resource, but according to the description it should also check the parents (though who is the parent in an OSGi environment?).
I tried to get the resource from the OSGi bundle that is supposed to use HBase, where I added hbase-default.xml to the created jar file (see above), and there I do get a resource via the thread's context class loader. When I explored the code a bit more, I realized that there is no way to set the class loader for HBaseConfiguration (although it would be possible to set the class loader for the 'plain' Hadoop Configuration that HBaseConfiguration inherits from, the creation procedure of HBaseConfiguration does not allow it, as it simply creates a new object within its create() method).
I really hope you have some idea how to get this up and running :)
Thread.currentThread().setContextClassLoader(HBaseConfiguration.class.getClassLoader());
Make sure the HBaseConfiguration class is loaded in your OSGi bundle. HBase makes use of the thread context class loader in order to load resources (hbase-default.xml and hbase-site.xml). Setting the TCCL will allow you to load the defaults and override them later.
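A small sketch of how that can be wrapped safely (restoring the original TCCL afterwards), assuming the bundle that embeds the HBase jars can see HBaseConfiguration; the helper class name is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public final class HBaseConfigFactory {

    private HBaseConfigFactory() {
    }

    // HBaseConfiguration.create() resolves hbase-default.xml / hbase-site.xml through the
    // thread context class loader, so point the TCCL at the class loader of the bundle
    // that actually contains those resources, then restore the previous TCCL.
    public static Configuration create() {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        try {
            current.setContextClassLoader(HBaseConfiguration.class.getClassLoader());
            return HBaseConfiguration.create();
        } finally {
            current.setContextClassLoader(original);
        }
    }
}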
If hbase-default.xml is in a .jar file that is on the classpath, that file can normally be found by a Java program.
I have read the hbase mailing list.
check your pom.xml:
In the 'process-resources' phase, the '###VERSION###' placeholder in hbase-default.xml is replaced with the actual version string. However, if this phase's configuration element is set to 'target' instead of 'tasks', the replacement does not occur.
You could have a look at your pom.xml and correct the element to 'tasks' if that is the case.
I faced this issue and actually fixed it by putting hbase-site.xml in the bundle which I was calling HBase from. I found this advice here:
Using this component in OSGi: This component is fully functional in an OSGi environment however, it requires some actions from the user. Hadoop uses the thread context class loader in order to load resources. Usually, the thread context classloader will be the bundle class loader of the bundle that contains the routes. So, the default configuration files need to be visible from the bundle class loader. A typical way to deal with it is to keep a copy of core-default.xml in your bundle root. That file can be found in the hadoop-common.jar.
