Load testing with Kafka and Storm - performance

Our system takes a POST request and sends the JSON body to a Kafka topic. This topic is configured as the spout for a Storm topology, and the topology writes its output messages to another Kafka topic.
How can I load test this system? I want to measure the number of messages processed per second by the system.
I am planning to use JMeter for load testing.

JMeter is usually the first tool that comes to mind for performance or load testing of web APIs, but you can also look at alternatives such as LoadRunner.
From the Kafka and Storm point of view, you need to write a Storm Kafka spout that reads messages and commits offsets, and you will probably also need a monitoring tool to check and validate how the system behaves under load, for example:
Throughput (messages/sec) as a function of message size.
Throughput (messages/sec) as a function of the number of messages.
Performance on the producer side.
Performance on the consumer side.
I've tried Yahoo's Kafka Manager, which is open source and serves the purpose, so you might want to give it a try: https://github.com/yahoo/kafka-manager. There are other monitoring tools as well; see "Kafka Monitoring tool in Production".
If you want to focus only on benchmarking Kafka's load performance, the following commands, which ship with your Kafka distribution, are very helpful:
kafka-producer-perf-test.sh
kafka-consumer-perf-test.sh
For more details: https://gist.github.com/ueokande/b96eadd798fff852551b80962862bfb3
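Whichever tool generates the load, the headline numbers come down to simple arithmetic over a timed run. The following is a minimal sketch, not taken from any of the tools above: the `LoadTestResult` type and the sample figures are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadTestResult:
    """Totals for one load-test run (hypothetical container type)."""
    messages: int        # messages observed on the output topic
    total_bytes: int     # sum of payload sizes in bytes
    elapsed_sec: float   # wall-clock duration of the run


def throughput(result: LoadTestResult) -> dict:
    """Return messages/sec and MB/sec for one run."""
    if result.elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    return {
        "msgs_per_sec": result.messages / result.elapsed_sec,
        "mb_per_sec": result.total_bytes / (1024 * 1024) / result.elapsed_sec,
    }


# Made-up run: 120,000 messages of 512 bytes each in 60 seconds.
run = LoadTestResult(messages=120_000, total_bytes=120_000 * 512, elapsed_sec=60.0)
stats = throughput(run)
print(f"{stats['msgs_per_sec']:.0f} msgs/sec, {stats['mb_per_sec']:.2f} MB/sec")
```

Repeating the run with different message sizes and message counts gives the two throughput curves listed above.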

If you plan to use JMeter, it makes sense to consider the Pepper-Box - Kafka Load Generator plugin. It comes with PepperBoxKafkaSampler, which provides a handy UI for specifying your Kafka endpoints, topics, etc.
If you also need to read messages from the Kafka topics, you can use JSR223 Test Elements and the Groovy language for this. Check out the "Apache Kafka - How to Load Test with JMeter" and "Writing a Kafka Consumer in Java" articles for more information and example code snippets.

Related

kafka streams - can I use kafka streams processing in cases where the source is not a kafka topic?

I have an application (call it smscb-router) as shown in the diagram.
It reads data from a legacy system (sms).
Based on the content (callback type), I have to put each record into the corresponding outgoing topic (such as billing-n-cdr, dr-cdr, ...).
I think the Streams API is better suited in this case, as it has the map functionality to do the content mapping check. What I am unsure about is whether I can read source data from a non-Kafka-topic source.
All the examples that I see in blogs on the internet explain streaming apps in the context of reading from a source topic and writing to other destination topics.
So, is it possible to read from a non-topic source, such as a Redis store or a message queue such as RabbitMQ?
We had a recent implementation where we had to poll an .xml file from a network-attached drive and convert it into Kafka events, i.e. publish each record to an output topic. We wouldn't even call this something developed with the Streams API; it is just a Kafka producer component.
Java File Poller Module (Quartz time based) -> XML Schema Management -> Kafka Producer Component -> Output Topic (Kafka Broker).
You get all the native features of the Kafka Producer API in terms of retries, and you can send synchronously with producer.send(record).get() or asynchronously with producer.send(record, callback).
Hope this helps. The Streams API is meant for larger, more complex processing that has to be normalized using stateful operations.
Thanks,
Christopher
Kafka Streams is only about topic-to-topic data streaming.
All external systems should be integrated by another method.
Ideally that method is Kafka Connect, for example with this connector:
https://docs.confluent.io/kafka-connect-rabbitmq-source/current/overview.html
You may also use a manual consumer for the first step, but it is always better to reuse the availability mechanisms built into Kafka Connect (no code, just some JSON config).
In your diagram, I would recommend adding one topic and one producer or one connector in front of your pink component; it can then become a fully standard Kafka Streams microservice.
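Whichever component ends up doing the work, the content-based routing from the question is just a lookup from callback type to topic name. Here is a minimal sketch of that step as a plain function. The topic names billing-n-cdr and dr-cdr come from the question; the callback type values, the `callback_type` field name and the fallback topic are assumptions made up for illustration.

```python
# Hypothetical mapping from callback type to outgoing topic.
TOPIC_BY_CALLBACK_TYPE = {
    "billing": "billing-n-cdr",
    "delivery-report": "dr-cdr",
}

# Hypothetical catch-all topic for records with an unknown callback type.
DEFAULT_TOPIC = "unrouted-callbacks"


def route(record: dict) -> str:
    """Return the outgoing topic name for one incoming record."""
    return TOPIC_BY_CALLBACK_TYPE.get(record.get("callback_type"), DEFAULT_TOPIC)


print(route({"callback_type": "billing", "msisdn": "12345"}))
print(route({"callback_type": "something-else"}))
```

A plain producer would call `route` to pick the topic before each send; a Streams application reading from an intermediate topic could apply the same lookup when choosing the destination topic.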

Get kafka broker, consumer, producer metrics using confluent-kafka-go

I cannot find any reference on how to implement getting metrics.
Can someone help with an example and references?
As the stats_example says here, you can get the stats listed in STATISTICS.md. But, as the comments in the example clearly state, you need to implement the metrics handling yourself:
Stats events are emitted as JSON (as string). Either directly forward
the JSON to your statistics collector, or convert it to a map to
extract fields of interest.
So in this case you need to implement a metrics collector in your application, using something like Prometheus.
And if you want full broker-side metrics, you can implement Kafka monitoring as the Kafka documentation explains here:
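The "convert it to a map to extract fields of interest" step can be sketched as follows. This is shown in Python rather than Go purely for illustration; the JSON handling is the same idea in any language. The sample payload is hand-written, and the field names (`txmsgs`, `txmsg_bytes`, `brokers`, `state`, `rtt.avg`) are the ones I recall from STATISTICS.md, so check them against the version of librdkafka you are using.

```python
import json

# Hand-written sample shaped like a stats payload; not real output.
SAMPLE_STATS = """
{
  "name": "rdkafka#producer-1",
  "type": "producer",
  "txmsgs": 1500,
  "txmsg_bytes": 768000,
  "brokers": {
    "localhost:9092/1": {"state": "UP", "rtt": {"avg": 1250}},
    "localhost:9093/2": {"state": "DOWN", "rtt": {"avg": 0}}
  }
}
"""


def extract_metrics(stats_json: str) -> dict:
    """Parse a stats JSON string and keep only a few fields of interest."""
    stats = json.loads(stats_json)
    brokers = stats.get("brokers", {})
    return {
        "client": stats.get("name"),
        "txmsgs": stats.get("txmsgs", 0),
        "txmsg_bytes": stats.get("txmsg_bytes", 0),
        # Count brokers whose connection state is reported as UP.
        "brokers_up": sum(1 for b in brokers.values() if b.get("state") == "UP"),
        # Average round-trip time per broker, as reported in the payload.
        "broker_rtt_avg": {
            name: b.get("rtt", {}).get("avg", 0) for name, b in brokers.items()
        },
    }


metrics = extract_metrics(SAMPLE_STATS)
print(metrics)
```

The resulting dictionary is what you would hand to your collector, for example by setting one gauge per key.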
Kafka uses Yammer Metrics for metrics reporting in the server. The
Java clients use Kafka Metrics, a built-in metrics registry that
minimizes transitive dependencies pulled into client applications.
Both expose metrics via JMX and can be configured to report stats
using pluggable stats reporters to hook up to your monitoring system.

Is it possible sending websocket messages to a kafka topic?

I am trying to find a way to consume messages that are being sent by a websocket and write them to a Kafka topic (the messages are sent by the websocket to the address 'ws://address:port/topic_name' and I want to add all of those messages to a Kafka topic).
I read about Kafka Connect and tried to find a way to do it with that, but it doesn't seem to work...
thanks in advance :)
There is no Kafka Connector to a socket in Confluent Platform.
I work in a team that uses Kafka in production and our source is a socket, so your options are to use a platform that supports this socket->Kafka producing, or to write one yourself.
About possible platforms, I think most of them will be overkill though you can utilize them for this problem, some options are:
1. NiFi or MiniFi for smaller loads, use PublishKafka Processor
2. StreamSets with Kafka Producer Destination
3. Apache Flume - not really recommended, as the project has stopped evolving.
If you wish to write your own producer, you basically have to create a listener on this port, and produce the incoming messages to Kafka; if this is a web socket, just get the payload of the requests and produce them to Kafka.
Example Kafka producer code can be copied from the tutorialspoint simple producer example.
Here are some example open-source projects:
1. https://github.com/DataReply/kafka-connect-socket-source
2. https://github.com/kafka-socket/miniature_engine
3. https://github.com/dhanuka84/kafka-connect-tcp
4. https://github.com/krux/tcp-stream-kafka-producer
The idea of Kafka Connect is that you have some sort of external integration that serves as storage. This can be SAP, Salesforce, an RDBMS, an MQ or anything else that has state. Your websocket endpoint does not hold data and you cannot poll it; someone else invokes it, and that is how the data is transferred. Now, if you know who is actually holding the data, then you can potentially build a connector using this guide: https://docs.confluent.io/current/connect/devguide.html
For your particular case, the best you can do is either to use the Kafka Producer API https://docs.confluent.io/current/clients/producer.html
and use this producer from your websocket endpoint to post a message to the topic, or, even better, if you are using Spring you can use a higher-level abstraction, KafkaTemplate: https://docs.spring.io/spring-kafka/reference/html/#sending-messages.
Full disclosure: I work for MigratoryData.
You can check out MigratoryData's solution for Kafka. MigratoryData is a scalable WebSocket server. The MigratoryData Source/Sink Connector for Kafka makes use of the Kafka Connect API and can be used to stream data in real time from Kafka to WebSocket clients and vice versa. The main advantage of the solution is that it extends Kafka messaging to WebSocket clients while preserving Kafka's key features like guaranteed delivery, message ordering, etc.

Scheduling jobs while consuming Kafka messages

I want to build a single Spring Boot application which does multiple different tasks concurrently. I did some research on the internet but could not find a way to do it. Let me go into detail.
I would like to start jobs at certain intervals, for example once a day. I can do that using Spring Quartz. I would also like to listen for messages on a dedicated internet address. The messages will come from Apache Kafka, so I would like to use the Kafka integration for the Spring framework.
Is this practical (listening for messages all the time and executing scheduled jobs on time)?
Functionally speaking, this design is fine: a single Spring Boot app can consume Kafka messages while also executing quartz jobs.
But at a higher level, you should ask why these two functions belong in a single app. Is there some inherent relationship between the Quartz jobs and the Kafka messages being consumed? Are you combining them solely to limit yourself to one app and save on compute/memory resources?
You should also consider the impacts to scalability. What if you need to increase the rate at which you consume Kafka messages? If you scale your app to get more Kafka consumers, you have to worry about multiple apps now firing your quartz jobs.
So yes, it can be done, but without any more detail it sounds like you should break this design into 2 separate applications: one for Quartz and one for Kafka consuming.

How to make Spring kafka client distributed

I have messages coming in from Kafka, so I am planning to write a listener with an "onMessage" method. I want to process each message and push it into Solr.
My question is more architectural. I have worked on web apps all my career, so in big data, how do I deploy the Spring Kafka listener so that I can process thousands of messages a second?
How do I make my Spring code use multiple nodes to distribute the load?
I am planning to write a Spring Boot application to run in a Tomcat container.
If you use the same group id for all instances, different partitions will be assigned to different consumers (instances of your application).
So, be sure that the topic you are going to consume has enough partitions: within a group, each partition is read by only one consumer, so any instances beyond the partition count sit idle.
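To see why the partition count matters, here is a minimal simulation of spreading partitions across the consumers of one group. It is a simplified round-robin illustration, not Kafka's actual assignor implementation, and the instance names are made up.

```python
def assign_round_robin(num_partitions: int, consumers: list) -> dict:
    """Spread partition numbers 0..num_partitions-1 over consumers in turn."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Made-up group: 4 partitions, 6 application instances with the same group id.
instances = [f"app-{i}" for i in range(1, 7)]
result = assign_round_robin(4, instances)
for name, partitions in result.items():
    print(name, partitions if partitions else "idle")
```

With 4 partitions and 6 instances, two instances receive no partitions, so adding nodes beyond the partition count does not increase throughput.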

Resources