Develop Apache Kafka producer and load testing using JMeter

Is it possible to use JMeter to push messages to Apache Kafka?
How would I implement a producer (in Java) to push messages to Kafka?
Regards,
Anand

I thought there was an answer earlier, maybe not. Have you taken a look at these? I'm using the original kafkameter myself.
https://github.com/BrightTag/kafkameter
https://github.com/EugeneYushin/new-api-kafkameter
and tutorials on kafkameter:
http://www.technix.in/load-testing-apache-kafka-using-kafkameter
http://codyaray.com/2014/07/custom-jmeter-samplers-and-config-elements
For use outside of JMeter, I've found it easier to write a producer load tool in, say, Ruby, Python, or Node.js than in Java, but that's personal preference. Load scalability is another matter, but other languages make it easier to prototype a producer tool.
Update:
Since the original post, there's now another solution/option for JMeter:
https://github.com/GSLabDev/pepper-box
and rather than post specific tutorials about it, you're better off googling some mix of terms like "Pepper-Box kafka jmeter" and going over the tutorial results, as there are quite a few. The ones from BlazeMeter should be good.

Yes, you can go with JMeter using the external libraries given by @David above. Just to add, I'd recommend having two different programs for the consumer and the producer, so that you have more control over what's going on, such as optimizing and changing the property files in the config according to your requirements. Even though JMeter makes load testing sound easy, I'm not sure you'll be able to measure the efficiency of message consumption or production, such as identifying the number of messages published or consumed within a certain period (i.e., if you're dealing with a large number of messages).
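To that point about measuring consumption, one option is a small standalone consumer that tracks messages per second. Here's a minimal sketch using the standard Kafka Java client; the broker address, topic name, and group id are placeholder assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ThroughputConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "load-test-group");         // hypothetical group id
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("load-test-topic")); // assumed topic
            long total = 0;
            long start = System.currentTimeMillis();
            while (total < 1_000_000) { // stop after a fixed message budget
                total += consumer.poll(Duration.ofSeconds(1)).count();
                long elapsedMs = System.currentTimeMillis() - start;
                if (elapsedMs > 0) {
                    System.out.printf("consumed %d msgs, ~%d msg/s%n",
                            total, total * 1000 / elapsedMs);
                }
            }
        }
    }
}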
There's also a Kafka producer sample given in the official docs.
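For the producer side the question asks about, a minimal sketch along the same lines (again, the broker address and topic are assumptions to replace with your own):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class LoadTestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("acks", "1"); // leader-only acks; tune for your durability needs
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // try-with-resources closes the producer, flushing any buffered records
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                producer.send(new ProducerRecord<>("load-test-topic", // assumed topic
                        Integer.toString(i), "message-" + i));
            }
        }
    }
}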

Related

Karate WebSocket how to listen to multiple messages within one session?

For our integration tests we have a scenario where we want to listen to a set number of messages predefined by the environment we use. I've seen that it's possible to listen to multiple messages by opening up a new connection, but that doesn't allow for much flexibility.
Have you read the docs? Because as far as I know, if you define a "handler" function, you can use the same connection for multiple messages and choose when you want to stop: https://github.com/intuit/karate#websocket
Also see: https://stackoverflow.com/a/67870765/143475
But if you have a very specific need or custom logic, maybe the best thing is to write a small piece of Java "glue" code, and you will get all the flexibility that you want. You may be able to re-use Karate's Java API such as com.intuit.karate.http.WebSocketClient - but this is not documented, and it is potentially an area where you can research / contribute code.
Here is a good example of the flexibility that the Java-interop approach provides: https://twitter.com/KarateDSL/status/1417023536082812935
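As a sketch of that glue-code idea (using the JDK's built-in java.net.http.WebSocket client from Java 11+ rather than Karate's undocumented one), something like this collects a fixed number of messages over a single connection; the URL and expected count come from your environment, and it assumes each message arrives as a single text frame:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.List;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Collects a fixed number of text messages from one WebSocket connection.
public class WsCollector implements WebSocket.Listener {
    private final List<String> messages = new CopyOnWriteArrayList<>();
    private final CountDownLatch remaining;

    public WsCollector(int expectedMessages) {
        this.remaining = new CountDownLatch(expectedMessages);
    }

    @Override
    public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
        messages.add(data.toString());
        remaining.countDown();
        ws.request(1); // ask the client for the next frame
        return null;
    }

    public List<String> collect(String url, long timeoutSeconds) throws InterruptedException {
        WebSocket ws = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(URI.create(url), this)
                .join();
        remaining.await(timeoutSeconds, TimeUnit.SECONDS); // wait for the expected count
        ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
        return messages;
    }
}

A Karate feature could then call this via Java interop, e.g. Java.type('WsCollector'), and assert on the returned list.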

Difference between Apache NiFi and StreamSets

I am planning to do a class project and was going through a few technologies where I can automate or set up the flow of data between systems, and found that there are a couple of them, i.e. Apache NiFi and StreamSets (to my knowledge). What I couldn't understand is the difference between them and the use-cases where each can be used. I am new to this, and if anyone can explain it a bit, it would be highly appreciated. Thanks
Suraj,
Great question.
My response is as a member of the open source Apache NiFi project management committee and as someone who is passionate about the dataflow management domain.
I've been involved in the NiFi project since it was started in 2006. My knowledge of Streamsets is relatively limited so I'll let them speak for it as they have.
The key thing to understand is that NiFi was built to do one really important thing really well, and that is 'Dataflow Management'. Its design is based on a concept called Flow Based Programming, which you may want to read about and reference for your project: https://en.wikipedia.org/wiki/Flow-based_programming
There are already many systems which produce data such as sensors and others. There are many systems which focus on data processing like Apache Storm, Spark, Flink, and others. And finally there are many systems which store data like HDFS, relational databases, and so on. NiFi purely focuses on the task of connecting those systems and providing the user experience and core functions necessary to do that well.
Here are some of the key functions and design choices that were made to make that effective:
1) Interactive command and control
The job of someone trying to connect systems is to be able to rapidly and efficiently interact with the constant streams of data they see. NiFi's UI allows you to do just that: as the data is flowing, you can add features to operate on it, fork off copies of data to try new approaches, adjust current settings, see recent and historical stats, view helpful in-line documentation, and more. Almost all other systems, by comparison, have a model that is design-and-deploy oriented, meaning you make a series of changes and then deploy them. That model is fine and can be intuitive, but for the dataflow management job it means you don't get the interactive, change-by-change feedback that is so vital to quickly building new flows or to safely and efficiently correcting or improving the handling of existing data streams.
2) Data Provenance
A very unique capability of NiFi is its ability to generate fine-grained and powerful traceability details for where your data comes from, what is done to it, where it's sent, and when each of those happens in the flow. This is essential to effective dataflow management for a number of reasons, but for someone in the early exploration phase working on a project, the most important thing this gives you is awesome debugging flexibility. You can set up your flows, let things run, and then use provenance to actually prove that it did exactly what you wanted. If something didn't happen as you expected, you can fix the flow, replay the object, and repeat. Really helpful.
3) Purpose built data repositories
NiFi's out-of-the-box experience offers very powerful performance even on really modest hardware or virtual environments. This is because of the flowfile and content repository design, which gives us the high-performance yet transactional semantics we want as data works its way through the flow. The flowfile repository is a simple write-ahead-log implementation, and the content repository provides an immutable, versioned content store. That in turn means we can 'copy' data by only ever adding a new pointer (not actually copying bytes), or we can transform data by simply reading from the original and writing out a new version. Again, very efficient. Couple that with the provenance capability I mentioned a moment ago and it provides a really powerful platform.
Another really key thing to understand here is that, in the business of connecting systems, you don't always get to dictate things like the size of the data involved. The NiFi API was built to honor that fact, so it lets processors receive, transform, and send data without ever having to load the full objects into memory. These repositories also mean that in most flows the majority of processors do not even touch the content at all. However, you can easily see from the NiFi UI precisely how many bytes are actually being read or written, so again you get really helpful information for establishing and observing your flows. This design also means NiFi can support back-pressure and pressure-release naturally, and these are really critical features for a dataflow management system.
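To give a rough feel for that streaming API, here is an abbreviated sketch of a custom processor (a real one would also declare its relationships and supported properties; the byte-wise uppercase transform is just a stand-in) showing how content is read and written as streams rather than loaded into memory:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.io.StreamCallback;

public class StreamingTransform extends AbstractProcessor {

    static final Relationship SUCCESS = new Relationship.Builder()
            .name("success").build();

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // Content is streamed through; the full payload is never held in memory.
        flowFile = session.write(flowFile, new StreamCallback() {
            @Override
            public void process(InputStream in, OutputStream out) throws IOException {
                int b;
                while ((b = in.read()) != -1) {
                    out.write(Character.toUpperCase(b)); // trivial byte-wise transform
                }
            }
        });
        session.transfer(flowFile, SUCCESS);
    }
}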
It was mentioned previously by the folks from the StreamSets company that NiFi is file-oriented. I'm not really sure what the difference is between a file or a record or a tuple or an object or a message in generic terms, but the reality is that when data is in the flow, it is 'a thing that needs to be managed and delivered'. That is what NiFi does. Whether you have lots of really high-speed tiny things or you have large things, and whether they came from a live audio stream off the Internet or from a file sitting on your hard drive, it doesn't matter. Once it is in the flow, it is time to manage and deliver it. That is what NiFi does.
It was also mentioned by the StreamSets company that NiFi is schemaless. It is accurate that NiFi does not force conversion of data from whatever it is originally into some special NiFi format, nor do we have to convert it back to some format for follow-on delivery. It would be pretty unfortunate if we did, because it would mean even the most trivial of cases would have problematic performance implications; luckily, NiFi does not have that problem. Further, had we gone that route, handling diverse datasets like media (images, video, audio, and more) would be difficult. We're on the right track, and NiFi is used for things like that all the time.
Finally, as you continue with your project and if you find there are things you'd like to see improved or that you'd like to contribute code we'd love to have your help. From https://nifi.apache.org you can quickly find information on how to file tickets, submit patches, email the mailing list, and more.
Here are a couple of fun recent NiFi projects to check out:
https://www.linkedin.com/pulse/nifi-ocr-using-apache-read-childrens-books-jeremy-dyer
https://twitter.com/KayLerch/status/721455415456882689
Good luck on the class project! If you have any questions, the users@nifi.apache.org mailing list would love to help.
Thanks
Joe
Both Apache NiFi and StreamSets Data Collector are Apache-licensed open source tools.
Hortonworks does have a commercially supported variant called Hortonworks DataFlow (HDF).
While both have a lot of similarities, such as a web-based UI and a focus on ingesting data, there are a few key differences. They also both consist of processors linked together to perform transformations, serialization, etc.
NiFi processors are file-oriented and schemaless. This means that a piece of data is represented by a FlowFile (this could be an actual file on disk, or some blob of data acquired elsewhere). Each processor is responsible for understanding the content of the data in order to operate on it. Thus if one processor understands format A and another only understands format B, you may need to perform a data format conversion in between those two processors.
NiFi can be run standalone, or as a cluster using its own built-in clustering system.
StreamSets Data Collector (SDC), however, takes a record-based approach. This means that as data enters your pipeline (whether it's JSON, CSV, etc.), it is parsed into a common format, so the responsibility of understanding the data format is no longer placed on each individual processor, and any processor can be connected to any other processor.
SDC also runs standalone, and also in a clustered mode, but it runs atop Spark on YARN/Mesos instead, leveraging existing cluster resources you may have.
NiFi has been around for about the last 10 years (but less than 2 years in the open source community).
StreamSets was released to the open source community a little bit later in 2015. It is vendor agnostic, and as far as Hadoop goes Hortonworks, Cloudera, and MapR are all supported.
Full Disclosure: I am an engineer who works on StreamSets.
They are very similar for data ingest scenarios.
Apache NiFi (HDP) is more mature and StreamSets is more lightweight.
Both are easy to use and both have strong capabilities.
They have companies behind them: Hortonworks and Cloudera.
Obviously there are more contributors working on NiFi than on StreamSets, and of course NiFi has more enterprise deployments in production.
Two of the key differentiators between the two, IMHO, are:
1) Apache NiFi is a Top Level Apache project, meaning it has gone through the incubation process described at http://incubator.apache.org/policy/process.html and can accept contributions from developers around the world who follow the standard Apache process, which ensures software quality. StreamSets is Apache LICENSED, meaning anyone can reuse the code, etc., but the project is not managed as an Apache project. In fact, in order to even contribute to StreamSets, you are REQUIRED to sign a contract: https://streamsets.com/contributing/. Contrast this with the Apache NiFi contributor guide, which wasn't written by a lawyer: https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide#ContributorGuide-HowtocontributetoApacheNiFi
2) StreamSets "runs atop Spark on YARN/Mesos instead, leveraging existing cluster resources you may have", which imposes a bit of a restriction if you want to deploy your dataflows further toward the edge, where the devices that are generating the data live. Apache MiNiFi, a sub-project of NiFi, can run on a single Raspberry Pi, while I am fairly confident that StreamSets cannot, as YARN or Mesos require more resources than a Raspberry Pi provides.
Disclosure: I am a Hortonworks employee

Advantages of HornetQ vs ActiveMQ vs Qpid

I was browsing for open source messaging software, and after a good bit of research I came across these three products. I've taken them out for a preliminary test drive, having had them handle messages for queues and topics, and from what I've read all three are good picks for an open source messaging solution for most companies. What I was wondering is: what advantages do these products have over one another? I'm particularly interested in messaging throughput (including persistent messaging throughput), security, scalability, reliability, support, routing capabilities, administrative options such as metrics and monitoring, and generally how well each program runs in a large business environment.
Check out http://queues.io/
From their site:
The goal is to create a quality list of queues with a collection of articles, blog posts, slides, and videos about them. After reading the linked articles, you should have a good idea about: the pros and cons of each queue, a basic understanding of how the queue works, and what each queue is trying to achieve. Basically, you should have all the information you need to decide which queue will best fit your needs.
'messaging' covers a lot of options - and there must be at least a dozen different types of technologies that could be the right answer. having built many production messaging environments using a variety of technologies/approaches, i'd say a better understanding of your requirements would help.
are you needing subject-based subscriptions? do you need multicast delivery? do you need dynamic subscribers/listeners? would your listeners be requerying for best sources even after finding an acceptable publisher/feed?
do you need guaranteed delivery? delivery confirmation? is your publisher storing any undelivered messages, or do you need the messaging system to do that for you automagically? how often does your feed data go stale - e.g. email-ish alerts can be store-and-forward, but real-time pricing data is only valid for a short interval (and then probably needs to go away rather than cause confusion)?
how volatile is your network topology? are your subscribers (or publishers) expecting to live at a fixed address? or are they mobile devices? could they appear to you over more complex internetwork topologies requiring registration and possibly imposing routing restrictions? if so any idea the frequency of these topology changes?
do you only need a java interface? are any of your subscribers to be integrated into windows components (like feeds into excel)?
if you're only interested in experience comparing the similar products you named then perhaps you have already thought through these topics.
as to products, in my experience TIBCO is still the leader in throughput and scalability, especially in a real-time environment. IBM MQ would be next, especially in a store-and-forward architecture. with both of those products you get a level of support on which you can justify betting a fundamental part of your business systems. there's a reason both of those have been around for a couple of decades.
another often overlooked option is Tuxedo - it provides not only messaging but a proven transactional capability that remains unparalleled. Oracle continue to be committed to this product and, again, the level of support available is second to none.
i love open source solutions and am always glad to find production quality software for free - but if you are creating a fundamental part of your business infrastructure, then an active community still might not indicate whether a particular voluntary project is the best bet.
my 2c worth. hope it helps.
First, I am no expert in this, but maybe I can give you some pointers to think about.
ActiveMQ and Qpid are both under the Apache umbrella and are message queues. But Qpid is an implementation of the AMQP specification.
AMQP is a protocol specification at the wire level, so messages can be exchanged with other AMQP message queues (e.g. RabbitMQ).
ActiveMQ and HornetQ are queues that you can use with a JMS API. The Java Message Service is a specification on an API level.
But you have the option to access Qpid via a JMS API, too.
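To make the API-level distinction concrete, here is a minimal JMS send sketch. The ActiveMQ ConnectionFactory and broker URL are assumptions for illustration; only that first line is broker-specific, so with HornetQ or Qpid you would swap in their JMS ConnectionFactory and keep the rest:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsQuickSend {
    public static void main(String[] args) throws Exception {
        // Only this line is broker-specific; everything below is plain JMS.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("TEST.QUEUE"); // hypothetical queue name
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("hello from JMS"));
        } finally {
            connection.close();
        }
    }
}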
I think performance is a secondary thought. To have an active community is more important.
http://x-aeon.com/wp/2013/04/10/a-quick-message-queue-benchmark-activemq-rabbitmq-hornetq-qpid-apollo/
This benchmark includes some performance numbers to help you decide, with both persistent and transient results.

What are zeromq use cases?

Could you give some examples of ZeroMQ use cases?
Let's say you want to have a bulletin board of some kind, and you want to allow only some people to see it, by subscribing to the bulletin board.
This can be done using the publisher/subscriber model of ZeroMQ.
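For a concrete feel, here is a minimal pub/sub sketch assuming the JeroMQ Java binding (org.zeromq:jeromq); the endpoint and topic prefix are arbitrary examples:

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class BulletinBoard {
    public static void main(String[] args) throws InterruptedException {
        try (ZContext context = new ZContext()) {
            ZMQ.Socket publisher = context.createSocket(SocketType.PUB);
            publisher.bind("tcp://*:5556");

            ZMQ.Socket subscriber = context.createSocket(SocketType.SUB);
            subscriber.connect("tcp://localhost:5556");
            subscriber.subscribe("news".getBytes(ZMQ.CHARSET)); // topic prefix filter

            Thread.sleep(100); // give the subscription time to propagate

            publisher.send("news hello subscribers");
            System.out.println(subscriber.recvStr());
        }
    }
}

Subscribers receive only messages whose prefix matches their subscription, which is what gives you the "only some people see it" behavior.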
Now, let's say you need to send some asynchronous messages. That is, when a message is sent from system A and needs to get to system B, it is guaranteed to be delivered later, even if systems A and B cannot communicate at the moment when that message is sent. You can imagine a use case being SMS messages.
This can be done using the asynchronous messaging model of ZeroMQ.
Basically, a messaging solution like ZeroMQ will allow you to reliably broadcast or send a "message", whatever that message may be, to some other party with as little hassle as possible.
Please see the ZeroMQ blog -- they regularly post usage stories about different deployments, language bindings, etc.
IPython uses ZeroMQ for its parallel computing features, the Qt console, and the notebook.
A while back, Rick Olson created a "clone" of Dropbox: https://gist.github.com/122849a52c5b33c5d890
My personal use of this library is cross language communication:
I pass data between Python and Haskell
They also have an excellent guide that offers a complete look into the possible use cases and their real-time applications.
And if you have more time, you can go through the whole website.

Tool for posting test messages onto a JMS queue? [closed]

Can anyone recommend a tool for quickly posting test messages onto a JMS queue?
Description:
The tool should allow the user to enter some data, perhaps an XML payload, and then submit it to a queue. I should be able to test the consumer without the producer.
This answer doesn't apply to all JMS brokers, but if you happen to be using Apache ActiveMQ, the web-based admin console (by default at http://localhost:8161/admin) allows you to manually send text messages to topics or queues. It's handy for debugging.
HermesJMS seems to be a rather powerful client for interacting with JMS providers. In my opinion, it is pretty unintuitive and hard to set up, though. (At least I'm mostly failing at it...)
Other, more user-friendly clients are often vendor-specific. Sonic Message Manager is a very nice and simple-to-use open-source JMS client for SonicMQ. It would be great to have a client like that working with different providers.
ActiveMQ's web-based admin console has a big deficiency - you cannot specify any headers / custom properties when posting a message.
I came across a neat FOSS tool that can post a message and also specify headers/properties:
http://sourceforge.net/projects/activemqbrowser/
HTH
Apache JMeter is a tool (written for the Java platform) which allows:
sending messages to a queue (point-to-point)
publishing/subscribing to a topic
sending both persistent and non-persistent messages
sending text, map, and object messages
Apache ActiveMQ includes ProducerTool and ConsumerTool example sources (Java) with many command-line configuration options. As they are based on the JMS API, using them with other message brokers should be easy with minor modifications.
IBM provides a free, powerful command-line tool called perfharness.
Although aimed at benchmarking JMS providers, it's really good at generating (and consuming) test messages. You can use data either generated randomly or taken from a file.
The power features include sending and consuming messages at a fixed rate, using a specific number of threads, using either JMS or native MQ, etc. It generates statistics telling you exactly how fast your queue is performing (hence the name).
The only down side is that it's not super intuitive, given the number of operations it supports.
I recommend the approach of @Will: using the Web Console of ActiveMQ, which lets you post messages and browse queues or delete messages easily.
Another approach I often use is to take a directory of files as sample data and use a Camel route to move the messages from the directory to a JMS queue - or to take them from a queue and save them to disk, etc.
e.g.
from("file://someDirectory").
to("activemq:MyQueue");
This would move all the files from someDirectory and send them to an ActiveMQ queue called MyQueue. If you'd rather leave the files in place you can use the URI "file://someDirectory?noop=true".
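If you want to try that outside an existing deployment, a minimal standalone version might look like the following (a sketch in Camel 2.x style, assuming camel-core and the activemq-camel component are on the classpath and a broker is listening on the assumed URL tcp://localhost:61616):

import org.apache.activemq.camel.component.ActiveMQComponent;
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class FileToJmsRoute {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        // register the "activemq" component against a local broker (assumed URL)
        context.addComponent("activemq",
                ActiveMQComponent.activeMQComponent("tcp://localhost:61616"));
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                from("file://someDirectory?noop=true") // leave the source files in place
                        .to("activemq:MyQueue");
            }
        });
        context.start();
        Thread.sleep(10_000); // let the route drain the directory, then shut down
        context.stop();
    }
}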
For more details see
the file endpoint in Camel
a sample Camel example routing from files to JMS
the various enterprise integration patterns Camel supports
Also, if the JMS broker supports JMX like ActiveMQ does, you can use JConsole to post messages and do a lot more.
ActiveMQ has a web console for sending test messages (as mentioned above), but if your provider doesn't have this, it might be easiest to just write a console app/web page to post test messages. Sending a message in JMS isn't too hard; you might get the most benefit from just writing your own test client.
If you can use Spring in Java, it has some really powerful utilities, check out the JmsTemplate.
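As a sketch of that approach (the local ActiveMQ ConnectionFactory and queue name here are placeholder assumptions), JmsTemplate reduces posting a test message to a few lines:

import javax.jms.ConnectionFactory;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.springframework.jms.core.JmsTemplate;

public class TestMessagePoster {
    public static void main(String[] args) {
        // assumed local ActiveMQ broker; any JMS ConnectionFactory works here
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        JmsTemplate template = new JmsTemplate(factory);
        // convertAndSend wraps the String payload in a TextMessage
        template.convertAndSend("TEST.QUEUE", "<order id=\"1\"><item>book</item></order>");
    }
}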
I'm not aware of a simple client. I remember looking for one a long time ago when I was researching different queue systems and trying JMS; I couldn't find one then, and I can't find one now. One thing, though - there are a ton of tutorials that get you started, and you could build a simple form to achieve that.
Sorry I can't be more helpful.
I have built a GUI tool for administering open source JMS servers (currently ActiveMQ and HornetQ). It can send and receive messages and do most of the usual stuff, as well as aggregate queues and topics into logical "groups".
It's a commercial product, but the BETA is free and fully functional.
try it out at http://www.rockeyesoftware.com/
For ActiveMQ the examples directory holds scripts. For Rubyists, look at example/ruby/stompcat.rb and catstomp.rb for subscribing and publishing.
I'm a Brazilian developer and I made a Java program for posting HTTP and JMS messages. It's available for download at: https://sites.google.com/site/felipeglino/softwares/posttool
On that page you can find English instructions.
