Apache MiNiFi - PutElasticsearch - apache-nifi

I made a flow that processes real-time data from a local server and sends the relevant data to Elasticsearch. I use MiNiFi, but when I run MiNiFi it returns the following error.
Does anyone know where the issue is?
Thanks
ERROR [Timer-Driven Process Thread-10] o.a.n.p.elasticsearch.PutElasticsearch5 PutElasticsearch5[id=4ed70cbe-9838-35cd-0000-000000000000] PutElasticsearch5[id=4ed70cbe-9838-35cd-0000-000000000000] failed to process due to java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.Version; rolling back session: {}
java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.Version
at org.elasticsearch.common.io.stream.StreamOutput.<init>(StreamOutput.java:73)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:60)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:57)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:47)
at org.elasticsearch.common.xcontent.XContentBuilder.builder(XContentBuilder.java:67)
at org.elasticsearch.common.settings.Setting.arrayToParsableString(Setting.java:698)
at org.elasticsearch.common.settings.Setting.lambda$listSetting$26(Setting.java:656)
at org.elasticsearch.common.settings.Setting$2.getRaw(Setting.java:660)
at org.elasticsearch.common.settings.Setting.get(Setting.java:300)
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:164)
at org.elasticsearch.client.transport.TransportClient.newPluginService(TransportClient.java:81)
at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:106)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:228)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:69)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:65)
at org.apache.nifi.processors.elasticsearch.AbstractElasticsearch5TransportClientProcessor.getTransportClient(AbstractElasticsearch5TransportClientProcessor.java:230)
at org.apache.nifi.processors.elasticsearch.AbstractElasticsearch5TransportClientProcessor.createElasticsearchClient(AbstractElasticsearch5TransportClientProcessor.java:170)
at org.apache.nifi.processors.elasticsearch.AbstractElasticsearch5Processor.setup(AbstractElasticsearch5Processor.java:94)
at org.apache.nifi.processors.elasticsearch.PutElasticsearch5.onTrigger(PutElasticsearch5.java:177)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1122)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

In order to reduce its footprint, MiNiFi Java ships with only the standard bundle of processors. To use the other processors that are present in a standard NiFi deployment, you need to put the appropriate NAR file into the "lib" directory of the MiNiFi deployment.
For "PutElasticSearch" you need "nifi-elasticsearch-nar-<version>.nar", where "<version>" is the version of NiFi that your version of MiNiFi is built off of. Version 0.4.0 of MiNiFi Java uses NiFi 1.5.0.
For more information, and for a list of the processors that come bundled with MiNiFi out of the box, see the "MiNiFi Java Agent Quick Start" documentation, section "Using Processors Not Packaged with MiNiFi"[1]. For more information on how the different versions of MiNiFi correspond to versions of the NiFi framework, see here[2].
[1] https://nifi.apache.org/minifi/minifi-java-agent-quick-start.html
[2] https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Versioning+and+Toolkit+Compatibility

Related

NiFi doesn't start on macOS Big Sur

I have installed NiFi using Homebrew following the instructions on this page.
When I start NiFi using
nifi start
I get the following:
Java home: /usr/local/opt/openjdk@11/libexec/openjdk.jdk/Contents/Home
NiFi home: /usr/local/Cellar/nifi/1.15.0/libexec
Bootstrap Config File: /usr/local/Cellar/nifi/1.15.0/libexec/conf/bootstrap.conf
Error: Could not find or load main class org.apache.nifi.bootstrap.RunNiFi
Caused by: java.lang.ClassNotFoundException: org.apache.nifi.bootstrap.RunNiFi
I also see this error in the nifi-app.log
2021-12-08 13:06:37,463 ERROR [Write-Ahead Local State Provider Maintenance] o.a.n.c.s.p.l.WriteAheadLocalStateProvider Failed to checkpoint Write-Ahead Log used to stor$
java.io.FileNotFoundException: ./state/local/partition-0/1.journal (No such file or directory)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
at org.wali.MinimalLockingWriteAheadLog$Partition.rollover(MinimalLockingWriteAheadLog.java:788)
at org.wali.MinimalLockingWriteAheadLog.checkpoint(MinimalLockingWriteAheadLog.java:534)
at org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider$CheckpointTask.run(WriteAheadLocalStateProvider.java:286)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Any ideas?
I got the same error. Mine was due to port 8443 being occupied. You can either find what is using port 8443, or change nifi.web.https.port=8443 in conf/nifi.properties to something else, as in the example below. I hope this helps if you are still facing the issue.
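For example, first see which process is holding the port, then move NiFi to an arbitrary free one (9444 below is just an example, not a recommendation):
lsof -i :8443
and in conf/nifi.properties:
nifi.web.https.port=9444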

Kafka Streams - Volumes to state stores causes "Failed to delete the state directory" errors

Short overview:
I have a service written in Scala using kafka-streams, running inside a dedicated Docker container.
To let the state-store dirs be stored where I want, I created a volume mapping the state-store dir inside the container to a dir outside. Once I did that, I started seeing exceptions like this in the container logs:
Failed to delete the state directory.
java.nio.file.DirectoryNotEmptyException: /usr/src/app/stateStores/MyServiceStreams___20/0_0
at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.kafka.common.utils.Utils$2.postVisitDirectory(Utils.java:763)
at org.apache.kafka.common.utils.Utils$2.postVisitDirectory(Utils.java:746)
at java.nio.file.Files.walkFileTree(Files.java:2688)
at java.nio.file.Files.walkFileTree(Files.java:2742)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:746)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:290)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:253)
at org.apache.kafka.streams.KafkaStreams$2.run(KafkaStreams.java:795)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It doesn't seem to affect any functionality; it just spams the logs.
Analysing the logs shows the following line before the error message:
Deleting obsolete state directory 0_0 for task 0_0 as 79998936ms has elapsed (cleanup delay is 6000000ms)
Increasing the state.cleanup.delay.ms param didn't help.
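For reference, this is roughly how the parameter is set (a sketch in Scala; the application id, bootstrap servers and value are placeholders, not my exact code):
import java.util.Properties
import org.apache.kafka.streams.StreamsConfig

val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "MyServiceStreams")    // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")   // placeholder
props.put(StreamsConfig.STATE_DIR_CONFIG, "/usr/src/app/stateStores") // the mounted volume
// cleanup delay in milliseconds; the default is 600000 (10 minutes)
props.put(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, "86400000")    // 24 hours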
[Edit]
Some technical details:
The /usr/src/app/stateStores/ dirs are owned by root.
From inside the container I can remove the dirs (running as root); from outside I cannot (not root).
kafka-streams version: 2.1.0
Please assist

Debugging a Spark standalone cluster with IDEA

I am trying to debug a Spark application on a local cluster using a master and worker nodes. I have been successful at setting up the master and worker nodes using the Spark standalone cluster manager with start-master.sh, and it works. But I want to see how a Spark application works in the Spark cluster, so I want to start the cluster in debug mode. I read the start-master.sh code, mocked the args and started the org.apache.spark.deploy.master.Master main method. Unfortunately it gets a NoClassDefFoundError and I can't open the web UI. I want to know where the problem is.
The Error is :
Exception in thread "dispatcher-event-loop-1" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/thread/ThreadPool
at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:81)
at org.apache.spark.deploy.master.ui.MasterWebUI.initialize(MasterWebUI.scala:48)
at org.apache.spark.deploy.master.ui.MasterWebUI.<init>(MasterWebUI.scala:43)
at org.apache.spark.deploy.master.Master.onStart(Master.scala:131)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:122)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:216)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.util.thread.ThreadPool
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
My debug configuration is: [screenshot of the IDEA run configuration, not reproduced here]
Thanks!
I would suggest not even using a Spark standalone cluster for debugging.
You can run Spark locally in your IDE with breakpoints.
Spark gives you the option to run locally, pointing at the local filesystem instead of HDFS.
Please follow the following link to learn more about how to write test cases for local mode in Spark:
http://bytepadding.com/big-data/spark/word-count-in-spark/
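A minimal sketch of that approach (Scala; the input path is a placeholder): run the main method from the IDE and breakpoints will hit, since everything executes in a single JVM.
import org.apache.spark.sql.SparkSession

object DebugApp {
  def main(args: Array[String]): Unit = {
    // local[*] runs the driver and executors inside this JVM, so IDE breakpoints work
    val spark = SparkSession.builder()
      .appName("debug-app")
      .master("local[*]")
      .getOrCreate()

    // read from the local filesystem instead of HDFS (path is a placeholder)
    val counts = spark.read.textFile("src/test/resources/words.txt")
      .rdd
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    spark.stop()
  }
}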

Accessing HDFS from docker-hadoop-spark-workbench via Zeppelin

I have installed https://github.com/big-data-europe/docker-hadoop-spark-workbench
Then I started it up with docker-compose up. I navigated to the various URLs mentioned in the git readme and everything appears to be up.
I then started a local apache zeppelin with:
./bin/zeppelin.sh start
In the Zeppelin interpreter settings I then navigated to the Spark interpreter and updated the master to point to the local cluster installed with Docker:
master: updated from local[*] to spark://localhost:8080
I then run in a notebook the following code:
import org.apache.hadoop.fs.{FileSystem,Path}
FileSystem.get( sc.hadoopConfiguration ).listStatus( new Path("hdfs:///")).foreach( x => println(x.getPath ))
I get this exception in the Zeppelin logs:
INFO [2017-12-15 18:06:35,704] ({pool-2-thread-2} Paragraph.java[jobRun]:362) - run paragraph 20171212-200101_1553252595 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@32d09a20
WARN [2017-12-15 18:07:37,717] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2064) - Job 20171212-200101_1553252595 is finished, status: ERROR, exception: null, result: %text java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:398)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:387)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:843)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
How can I access HDFS from Zeppelin and Java/Spark code?
The reason for the exception is that the sparkSession object is null in Zeppelin for some reason.
Reference:
https://github.com/apache/zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
private SparkContext createSparkContext_2() {
  return (SparkContext) Utils.invokeMethod(sparkSession, "sparkContext");
}
It might be a configuration-related issue. Please cross-verify the settings/configuration and the Spark cluster settings, and make sure that Spark itself is working fine.
Reference: https://zeppelin.apache.org/docs/latest/interpreter/spark.html
Hope this helps.
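One concrete setting worth cross-checking (an assumption about this setup, not something confirmed by the logs above): a standalone Spark master normally accepts submissions on port 7077, while 8080 is only its web UI, so the interpreter's master property would usually look more like:
master: spark://localhost:7077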

Getting ClusterBlockException while running queries using node client

My Elasticsearch cluster (version 2.0) is started and the node client is built successfully, but for some reason I'm getting the following error while running queries using the node client.
20:15:15.479 [Pool:entitytaskscheduler: Thread#1] DEBUG c.b.o.e.t.c.DataCollectorStatusUpdateTask - collectors updated due to agent reconnected:{}
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.<init>(TransportSearchTypeAction.java:116)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:73)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:67)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:64)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:53)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:99)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:44)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
at com.hidden.ppp.management.dc.DataCollectorPollStatusDAOESImpl.findDCIdsUpdatedInTime(DataCollectorPollStatusDAOESImpl.java:151)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTask.execute(DataCollectorStatusUpdateTask.java:199)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTaskRunner.run(DataCollectorStatusUpdateTaskRunner.java:27)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20:15:15.558 [Pool:entitytaskscheduler: Thread#1] WARN c.b.o.m.d.DataCollectorPollStatusDAOESImpl - blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
20:15:15.558 [Pool:entitytaskscheduler: Thread#1] DEBUG c.b.o.e.t.c.DataCollectorStatusUpdateTask - collectors for which polls updated after epoc time:1453128243336 - dcids: []
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.<init>(TransportSearchTypeAction.java:116)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:73)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:67)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:64)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:53)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:99)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:44)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
at com.hidden.ppp.management.dc.DataCollectorPollStatusDAOESImpl.findDCIdsNotUpdatedInTime(DataCollectorPollStatusDAOESImpl.java:182)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTask.execute(DataCollectorStatusUpdateTask.java:204)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTaskRunner.run(DataCollectorStatusUpdateTaskRunner.java:27)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
I've even disabled multicast as per this post - still no luck. Surprisingly, I can access Elasticsearch from Sense. Any clues on what is going wrong?
I faced the same error message and was not able to understand the problem first. I was developing a node client Java application on my laptop, using an Elasticsearch data node on a remote server. For production use, I needed to deploy the Java application on this remote server.
I configured the Java application to talk to the local host only (being on the same host now):
elasticsearch.discovery.zen.ping.unicast.hosts=127.0.0.1
And got the same exception
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
Looking at the logs I also found this entry:
[WARN] [TP-Processor2] DiscoveryService.waitForInitialState -> [cerbera] waited for 30s and no initial state was set by the discovery
So basically, the question was: Why doesn't it find the Elasticsearch data node? I changed port ranges and also played with the multicast setting - without success.
Finally, I checked elasticsearch.yml and found that the data node was not listening on localhost (127.0.0.1), but on the Ethernet interface 192.168.1.2:
network.host: 192.168.1.2
http.port: 9200
The final change was simple: I just needed to reconfigure the node client to talk to the correct interface:
elasticsearch.discovery.zen.ping.unicast.hosts=192.168.1.2
Now my node client is talking to Elasticsearch via the correct interface. Job done.
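For anyone doing this in code rather than via a properties file, here is a sketch of the equivalent Elasticsearch 2.x node-client setup (Scala; the cluster name is a placeholder, and the elasticsearch.discovery.zen.ping.unicast.hosts property above simply feeds the discovery setting shown below):
import org.elasticsearch.common.settings.Settings
import org.elasticsearch.node.NodeBuilder

// node client: joins the cluster for discovery but holds no data
val node = NodeBuilder.nodeBuilder()
  .settings(Settings.settingsBuilder()
    .put("cluster.name", "my-cluster")                       // placeholder
    .put("discovery.zen.ping.unicast.hosts", "192.168.1.2")) // the interface from elasticsearch.yml
  .client(true)
  .node()
val client = node.client()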
I had the same problem (using k8s). I finally replaced my Elastic image and the issue was solved:
I moved from 6.5.4-debian-9-r41 to 6.8.16-debian-10-r5 (using Bitnami images).
I know it is not the best answer, but I really tried the suggested answers and nothing worked for me, so my recommendation is to update to a newer version (Docker makes that easy :) ).
