Faster way of developing and Testing new Nifi Processor

I am developing a new NiFi processor for my data flow. I make code changes in Eclipse, build a new .nar file, and copy it to the NiFi lib directory to test it.
On every .nar update NiFi needs a restart, which takes a significant amount of time.
Is there a better way of testing a new .nar in NiFi? Restarting NiFi for every small change slows development down.

There are a few options for rapid prototyping and testing that make developing Apache NiFi processors easier.
Model your code in ExecuteScript -- using the ExecuteScript processor means you can make code changes to the domain-related code (whatever you type into the processor Script Body property or a file referenced by Script File) without having to build anything or restart the application. You can replay the same flowfiles through the updated code using the provenance replay feature. You can also test your scripts directly with Matt Burgess' NiFi Script Tester tool. Once you have acceptable behavior, take the script body and migrate it to a custom processor that can be deployed.
Use the unit testing and integration testing features of NiFi -- the test harnesses and "runners" provided by the core framework will allow you to simulate flow scenarios in automated tests before deploying the entire application. It takes a little time to build out the first flow, but once you do, it's a repeatable and understandable process which you can use to cover edge cases and ensure desired behavior.

Check how testing is done for the standard NiFi processors and do the same. For example, look at the DBCP service: https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-standard-services/nifi-dbcp-service-bundle/nifi-dbcp-service/src
You don't need to start NiFi to run those tests.

Related

Understand JMeter technicalities for comparing distributed vs independent JMeter engines

I have a few questions on the technical details of JMeter, mostly pertaining to the distributed setup vs. independent JMeter engines (since the JMeter controller can become a bottleneck when there are several JMeter load generators). It would be great if anybody could help with the understanding here:
How is the JMeter distributed setup orchestrated by the JMeter controller (i.e. the master or client)? Can we use the same logic to synchronize a test among independent JMeter engines (independent mode)?
Is there a way to pool connections across vUsers?
What is the function of ASYNC_QUEUE in the backend listener, and what are its expected side effects in independent mode (mentioned above)? What happens when the queue is full?
Is there a way for JMeter to execute JavaScript or act as a headless browser?
How does DNS resolution happen in JMeter? Does it resolve for each vUser?

Your "question" looks like a compilation of interview questions rather than something connected with your single current concern and I don't think it's a proper place/way to ask it, I believe it should be: one post - one question.
Whatever
How is JMeter distributed setup orchestrated by JMeter controller - the JMeter master sends the .jmx script to the slaves and collects results from them. Theoretically you can implement your own mechanism for delivering the test plan and any dependencies to the individual JMeter engines and starting the test on all of them at the same time. Then you will need to collect the .jtl result files from the engines and combine them into a single one.
Is there a way to pool connections across vUsers? - JMeter does it internally
When the queue is full, no new sample results are taken for processing by the backend listener, so the results won't be "realtime" anymore; you will see new results as free slots appear in the queue.
For JMeter per se - no. AJAX calls can be simulated using the Parallel Controller, but for client-side performance testing, JavaScript execution profiling and rendering speed measurement you will need to use a real browser, whether normal or headless. There is a WebDriver Sampler plugin providing JMeter integration with Selenium.
DNS resolution is dependent on underlying OS and/or JVM DNS resolution implementation, there is DNS Cache Manager which enables overriding hosts entries and using custom DNS resolver so each thread looks up the IP address on its own
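
For the independent-engines approach, combining the per-engine .jtl files could look roughly like the sketch below. It is only an illustration under some assumptions: the results are saved in CSV format, every file starts with the same header row, the first column is the epoch-millisecond timeStamp, and no field contains an embedded line break. The class and method names (JtlMerger, merge, timestampOf) are made up for this example and are not part of JMeter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class JtlMerger {
    // Merges CSV-format JTL contents, one list of lines per engine.
    // Assumption: each non-empty list starts with the same header row.
    static List<String> merge(List<List<String>> files) {
        List<String> rows = new ArrayList<>();
        String header = null;
        for (List<String> lines : files) {
            if (lines.isEmpty()) {
                continue;
            }
            if (header == null) {
                header = lines.get(0); // keep the header only once
            }
            for (String line : lines.subList(1, lines.size())) {
                if (!line.trim().isEmpty()) {
                    rows.add(line);
                }
            }
        }
        // Order samples from all engines by their start timestamp.
        rows.sort(Comparator.comparingLong(JtlMerger::timestampOf));
        List<String> merged = new ArrayList<>();
        if (header != null) {
            merged.add(header);
        }
        merged.addAll(rows);
        return merged;
    }

    // Assumption: the first CSV column is the numeric timeStamp.
    static long timestampOf(String row) {
        return Long.parseLong(row.substring(0, row.indexOf(',')));
    }
}
```

The lines of each file can be read with `java.nio.file.Files.readAllLines` and the merged list written back with `Files.write`; the combined file can then be opened in a listener or fed to the dashboard report generator.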

Why it is recommended to run load test in non gui mode in jmeter

I'm monitoring the connect time and latency from the JMeter machine while running in GUI mode, and both are within acceptable limits.
Should we strictly follow non-GUI mode even though I am able to perform the load test in GUI mode?
I'm targeting 250 TPS and am able to achieve that. I have increased the memory, and CPU and memory usage on the load generator stay below 60%.
Should I go for non-GUI mode?

The main limitation is that each event in the queue is being handled by a single event dispatch thread which will act as the bottleneck on your JMeter side.
My expectation is that your "250 TPS" looks like:
[image not preserved]
while it should look like:
[image not preserved]
So check what your load pattern looks like using, e.g., the Transactions per Second listener (installable via the JMeter Plugins Manager).
Also check how your JVM behaves, especially when it comes to garbage collection. This can be done with, e.g., JVisualVM; most probably you will see the same "chainsaw" pattern.

You don't need to follow JMeter best practices, but you will need non-GUI mode if:
you run into issues achieving specific goals (such as TPS)
your machine can't run the GUI or has low resources
you execute JMeter from a script or a build tool such as Jenkins
Also it's better to be familiar with JMeter CLI (non GUI) and its report capabilities
JMeter supports dashboard report generation to get graphs and statistics from a test plan.
It will also be needed for distributed testing:
Consider running multiple CLI JMeter instances on multiple machines using distributed mode (or not)
The CLI is also useful for parameterising tests:
The "loops" property can then be defined on the JMeter command-line:
jmeter … -Jloops=12

Forking a JVM process per feature file?

I have a number of feature files in my cucumber scenario test suite.
I run the tests by launching Cucumber using the CLI.
These are the steps which occur when the test process is running:
We create a static instance of a class which manages the lifecycle of testcontainers for my cucumber tests.
This currently involves three containers: (i) Postgres DB (with our schema applied), (ii) Axon Server (event store), (iii) a separate application container.
We use Spring's new @DynamicPropertySource to set the values of our data source, event store, etc. so that the cucumber process can connect to the containers.
@Before each scenario we perform some clean up on the testcontainers.
This is so that each scenario has a clean slate.
It involves truncating data in tables (postgres container), resetting all events in our event store (Axon Server container), and some other work for our application (resetting relevant tracking event processors), etc.
Although the tests pass fine, the problem is by default it takes far too long for the test suite to run. So I am looking for a way to increase parallelism to speed it up.
Adding the arguments --threads <n> will not work because the static containers will be in contention (and I have tried this and as expected it fails).
The way I see it, there are different options for parallelism which would work:
1. Each scenario launches its own Spring application context (essentially forking a JVM), gets its own containers deployed and runs tests that way.
2. Each feature file launches its own Spring application context (essentially forking a JVM), gets its own containers deployed and runs each scenario serially (as it would normally).
I think in an ideal world we would go for 1 (see *). But this would require a machine with a lot of memory and CPUs (which I do not have access to). And so option 2 would probably make the most sense for me.
My questions are:
is it possible to configure cucumber to fork JVMs which run assigned feature files (which matches option 2 above?)
what is the best way to parallelise this situation (with testcontainers)?
* Having each scenario deployed and tested independently agrees with the cucumber docs which state: "Each scenario should be independent; you should be able to run them in any order or in parallel without one scenario interfering with another. Each scenario should test exactly one thing so that when it fails, it fails for a clear reason. This means you wouldn’t reuse one scenario inside another scenario."

This isn't really a question for stack overflow. There isn't a single correct answer - mostly it depends. You may want to try https://softwareengineering.stackexchange.com/ in the future.
No. This is not possible. Cucumber does not support forking the JVM. Surefire however does support forking and you may be able to utilize this by creating a runner for each feature file.
However I would reconsider the testing strategy and possibly the application design too.
To execute tests in parallel your system has to support parallel invocations. So I would not consider resetting your database and event store for each test a good practice.
Instead consider writing your tests in such a way that each test uses its own isolated set of resources. So for example if you are testing users, you create randomized users for each test. If these users are part of an organization, you create a random organization, etc.
This isn't always possible. Some applications are designed with implicit singleton resources in the code. In this case you'll have to refactor the application to make these resources explicit.
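
As a rough illustration of the "own isolated set of resources" idea, a step definition can generate names that are unique per call, so scenarios running in parallel never touch the same rows. The helper below is hypothetical (TestData and unique are names invented for this sketch) and only uses the JDK:

```java
import java.util.UUID;

class TestData {
    // Returns a name that is different on every call, e.g. "user-3f2a...".
    // A random UUID makes collisions between parallel scenarios practically impossible.
    static String unique(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }
}
```

A scenario would then create its organization as `TestData.unique("org")` and its users as `TestData.unique("user")`, and assert only on those entities, so no truncation or event store reset is needed between scenarios.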
Alternatively consider pushing your Cucumber tests down the stack. You can test business logic at any abstraction level. It doesn't have to be an integration test. Then you can use JUnit with Surefire instead and use Surefire to create multiple forks.

start dataflow using .bat file?

How can I start the dataflow I created without accessing the Apache NiFi interface? Is it possible to trigger a run by running a .bat file? I am new to Apache NiFi and somewhat clueless about its limitations.
I saved the dataflow as a template and want to start it without accessing the Apache NiFi interface.

There are several ways to start a processor.
Timer driven
This is the default mode. The Processor will be scheduled
to run on a regular interval. The interval at which the Processor is
run is defined by the 'Run Schedule' option (see below).
CRON driven
When using the CRON driven scheduling mode, the Processor is scheduled
to run periodically, similar to the Timer driven scheduling mode.
However, the CRON driven mode provides significantly more flexibility
at the expense of increasing the complexity of the configuration. The
CRON driven scheduling value is a string of six required fields and
one optional field, each separated by a space.
Event driven
When this mode is selected, the Processor will be triggered to run by
an event, and that event occurs when FlowFiles enter Connections
feeding this Processor. This mode is currently considered experimental
and is not supported by all Processors. When this mode is selected,
the 'Run Schedule' option is not configurable, as the Processor is not
triggered to run periodically but as the result of an event.
Additionally, this is the only mode for which the 'Concurrent Tasks'
option can be set to 0. In this case, the number of threads is limited
only by the size of the Event-Driven Thread Pool that the
administrator has configured.
You can read more about it in the Scheduling part of the NiFi User Guide.
If you specifically want to start a processor from a .bat file, you can use cURL. For that, your flow must start with either ListenHTTP or HandleHttpRequest. E.g. if ListenHTTP listens on port 8089 and your NiFi instance is accessible via my-nifi-instance.com, then you will have a webhook like my-nifi-instance.com:8089/webhook, and a request to it will initiate the flow.
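
If cURL is not available on the machine, the same request can be sent from a small Java program (Java 11 or later) that the .bat file launches. This is only a sketch: FlowTrigger, buildTrigger and trigger are names invented here, the payload and content type are arbitrary, and it assumes the listener accepts a plain POST on the webhook URL described above (for ListenHTTP that means its Base Path property is set to match the path you call).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class FlowTrigger {
    // Builds a POST request whose body becomes the content of the incoming FlowFile.
    static HttpRequest buildTrigger(String listenerUrl, String payload) {
        return HttpRequest.newBuilder(URI.create(listenerUrl))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }

    // Sends the request and returns the HTTP status code reported by the listener.
    static int trigger(String listenerUrl, String payload)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildTrigger(listenerUrl, payload), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Calling `FlowTrigger.trigger(url, "start")` from a `main` method gives you something the .bat file can run with the `java` command. The processors still have to be in the running state in NiFi; the request only delivers a FlowFile that sets the flow in motion.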
Since you are asking a very basic question, I encourage you to start with reading the Apache NiFi User Guide.

can we have any jmeter control over the threads?

Can we have any control over the threads?
Say I have 10 threads and have provided my test data in a .csv file. Can I control which thread picks which data, and maybe add some delay for a few of the threads?
Also, can someone suggest a book or online content where I can find information on the internals of JMeter? For example, when we run a test plan, what happens on the memory side, how the different properties files are read, how responses are received, how threads work internally, etc.
Thanks,
Abhishek

JMeter is a very flexible and powerful tool. In theory, anything is possible; it all depends on what your testing goals are. Even things not supported by JMeter can be coded in Java and easily integrated with a Java Sampler. Your question indicates you have not spent a lot of time experimenting with the tool, but hopefully my answer jump-starts that process for you.
JMeter has a lot of control features that can be used in conjunction with CSV data to control the flow of a thread. For example, use the CSV data to correctly enter the right block of a Switch Controller, validate an If Controller, or control the number of loops in a Loop Controller. Be sure you read the entire Getting Started Guide and familiarize yourself with the Component Reference Guide.
In terms of how things work internally, your best bet is to build the JMeter project from source in an IDE like Eclipse. You can then step through the entire program in as much detail as you want.
Tutorial: Build JMeter from Source
Also, the /bin/jmeter file has a decent number of comments about how to properly configure JVM memory for a JMeter process.
You probably want to install at least the most basic JMeter Plugin Package.
Lastly, if you need one thread to control the behavior of another thread, you can use FIFO Queues or set properties via BeanShell. Unlike runtime variables, properties are global and not unique to a thread.
props.put("key","value");
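
The difference between global properties and per-thread runtime variables can be pictured with plain Java. The sketch below is an analogy, not JMeter code: `props` stands in for JMeter's shared properties object and `vars` for its per-thread variables, and the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class SharedVsLocal {
    // Stand-in for the global props: one instance shared by every thread.
    static final Properties props = new Properties();
    // Stand-in for per-thread vars: each thread lazily gets its own map.
    static final ThreadLocal<Map<String, String>> vars =
            ThreadLocal.withInitial(HashMap::new);

    // One thread writes to both stores; returns what a second thread sees:
    // index 0 is the value read from props, index 1 the value read from vars.
    static String[] readFromOtherThread() {
        Thread writer = new Thread(() -> {
            props.put("key", "value");
            vars.get().put("key", "value");
        });
        writer.start();
        join(writer);

        String[] seen = new String[2];
        Thread reader = new Thread(() -> {
            seen[0] = props.getProperty("key"); // visible: shared store
            seen[1] = vars.get().get("key");    // null: the reader has its own map
        });
        reader.start();
        join(reader);
        return seen;
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

This is why a value put into props by one thread can steer another thread, while a value stored in a runtime variable stays private to the thread that set it.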
