Increase Kafka Streams Consumer Throughput - spark-streaming

I have a Spark Streaming application and a Kafka Streams application running side by side for benchmarking purposes. Both consume from the same input topic and write to different target databases. The input topic has 15 partitions, and both Spark Streaming and Kafka Streams have 15 consumers (a 1:1 ratio). In addition, event payloads are around 2 KB. Not sure if it's relevant, but the 90th percentile execution time for Spark Streaming is around 9 ms; for Kafka Streams, 12 ms. The commit() method is invoked in my Processor every time a message is processed.
The problem shows up during high bursts. Spark Streaming can keep up with 700 events per second, while Kafka Streams manages only around 60-70 per second. I can't go beyond that. See the graph below (green line - Spark Streaming / blue line - Kafka Streams):
As per the config below, as long as it doesn't exceed 1000 events per consumer, and taking the backpressure into account, Spark Streaming can keep up regardless of the number of bytes per partition. As for Kafka Streams, if I understood its configs correctly (and please keep me honest), based on the same config I am able to fetch at most 1000 records (max.poll.records) every 100 ms (poll.ms), as long as a fetch doesn't exceed 1 MB per partition (max.partition.fetch.bytes) and 50 MB in total (fetch.max.bytes).
I see the same results (stuck at 70 events per second) regardless of whether I use 5, 10 or 15 consumers, which leads me to think it is config related. I tried tweaking these by increasing the number of records per fetch and the max bytes per partition, but I didn't see a significant improvement.
I am aware these are different technologies used for different purposes, but I am wondering what values I should use in Kafka Streams to get better throughput.
Spark Streaming config:
spark.batch.duration=10
spark.streaming.backpressure.enabled=true
spark.streaming.backpressure.initialRate=1000
spark.streaming.kafka.maxRatePerPartition=100
Kafka Streams config (all bytes- and timing-related settings):
# Consumer Config
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
heartbeat.interval.ms = 3000
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
request.timeout.ms = 30000
enable.auto.commit = false
# StreamsConfig
poll.ms=100
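For reference, here is a minimal sketch (not the actual application code) of how these settings can be passed to a Kafka Streams application; the application id and bootstrap servers values are placeholders. Consumer-level settings are forwarded to the embedded consumer via the consumer prefix:
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "benchmark-app");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
props.put(StreamsConfig.POLL_MS_CONFIG, 100);
// plain consumer settings are prefixed so Streams passes them to its internal consumer
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 1000);
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG), 1048576);
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MAX_BYTES_CONFIG), 52428800);
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG), 500);
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MIN_BYTES_CONFIG), 1);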
Processor Code
public class KStreamsMessageProcessor extends AbstractProcessor<String, String> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String key, String payload) {
        ResponseEntity responseEntity = null;
        try {
            // Do some processing
        } catch (final MyException e) {
            // Do some exception handling
        } finally {
            context.forward(UUID.randomUUID().toString(), responseEntity);
            context.commit();
        }
    }
}
Thanks in advance!

UPDATE
The database that Kafka Streams was writing to was the big bottleneck here. After we switched it to a better cluster (better hardware, more memory, more cores, etc.), I tuned with the config below and was able to consume around 2k events per second. The commit interval config was also changed (as per Augusto's suggestion) and I switched to the G1 garbage collector.
fetch.max.bytes = 52428800
max.partition.fetch.bytes = 1048576
fetch.max.wait.ms = 1000
max.poll.records = 10000
fetch.min.bytes = 100000
enable.auto.commit = false
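For completeness, a sketch of the two other changes mentioned above; the 30000 ms commit interval is an illustrative value, not necessarily the one I ended up with, and props refers to the Streams properties shown earlier:
// Streams-level commit interval (value below is illustrative)
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30000);
// The G1 garbage collector is enabled on the JVM command line, for example:
//   java -XX:+UseG1GC -jar streams-app.jar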

if I understood its configs correctly (and please keep me honest), based on the same config I am able to fetch at most 1000 records (max.poll.records) every 100 ms (poll.ms), as long as a fetch doesn't exceed 1 MB per partition (max.partition.fetch.bytes) and 50 MB in total (fetch.max.bytes).
That is not correct. :) max.poll.records specifies how many records may be returned by a single poll() call -- if a fetch from the broker returns more records, the next poll() will be served from the consumer's internal buffer (i.e., no network request). max.poll.records is basically a knob to tune your application code, i.e., how many records do I want to process before poll() is called again. Calling poll() more frequently makes your application more reactive (for example, a rebalance only happens when poll() is called); you also need to call poll() often enough not to violate max.poll.interval.ms.
poll.ms is the maximum blocking time within poll() in case no data is available. This avoids busy waiting. However, if there is data, poll() will return immediately.
Thus, the actual "network throughput" is based on the "fetch request" settings only.
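To illustrate the mechanics, here is a bare-bones loop with a plain KafkaConsumer (topic name and props are placeholders); Kafka Streams runs an equivalent loop internally:
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("input-topic"));
    while (true) {
        // Blocks at most 100 ms if nothing is available, returns immediately otherwise,
        // and hands out at most max.poll.records records; anything more that the last
        // fetch brought back is served from the internal buffer on later poll() calls.
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        records.forEach(r -> {
            // application processing between polls
        });
    }
}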

Related

How to limit Message consumption rate of Kafka Consumer in SpringBoot? (Kafka Stream)

I want to limit my Kafka consumer's message consumption rate to 1 message per 10 seconds. I'm using Kafka Streams in Spring Boot.
Following are the properties I tried to make this work, but they didn't work out as expected (many messages were consumed at once):
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokersUrl);
config.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
//
config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,1);
config.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 10000);
Is there any way to manually ack (manual offset commits) in Kafka Streams? That would be useful to control the message consumption rate.
Please note that I'm using KStreams (Kafka Streams).
Any help is really appreciated. :)
I think you misunderstand what MAX_POLL_INTERVAL_MS_CONFIG actually does.
That is the maximum time the client is allowed to take between poll() invocations.
From docs
controls the maximum time between poll invocations before the consumer will proactively leave the group (5 minutes by default). The value of the configuration request.timeout.ms (default 30 seconds) must always be smaller than max.poll.interval.ms (default 5 minutes), since that is the maximum time that a JoinGroup request can block on the server while the consumer is rebalancing
"maximum time" not saying any "delay" between poll invocations.
Kafka Streams will constantly poll; you cannot easily pause/start it and delay record polling.
To read an event every 10 seconds without losing consumers in the group due to missed heartbeats, you should use the plain Consumer API: set max.poll.records=1, call pause(), sleep for 10 seconds, then resume() and poll() again.
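A minimal sketch of that pause/sleep/resume loop with the plain Consumer API (topic name and props are placeholders; max.poll.records=1 is assumed to be set in props):
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("input-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    records.forEach(r -> {
        // handle the single record
    });
    consumer.pause(consumer.assignment());   // stop fetching from all assigned partitions
    try {
        Thread.sleep(10_000);                // wait 10 seconds; heartbeats keep running in the background
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
    consumer.resume(consumer.assignment());  // fetch again on the next poll()
}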
Finally, I achieved the desired message consumption limit using Thread.sleep().
Since there is no way to control the message consumption rate using Kafka config properties alone, I had to use my application code to control the rate of consumption.
Example: if I want to limit the consumption rate to, say, 4 messages per 10 seconds, I just consume 4 messages (keeping a count in parallel); once 4 records have been consumed, I make the thread sleep for 10 seconds and then repeat the same process over again.
I know it's not a good solution, but there was no other way.
Thank you OneCricketeer.
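A sketch of that counter-plus-sleep workaround as it might look inside a Streams topology; the 4-per-10-seconds numbers mirror the example above, while the topic name and the rest of the wiring are illustrative:
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

AtomicInteger counter = new AtomicInteger();
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> stream = builder.stream("input-topic");   // placeholder topic
stream.foreach((key, value) -> {
    // process the record here
    if (counter.incrementAndGet() == 4) {    // after every 4th record ...
        counter.set(0);
        try {
            Thread.sleep(10_000);            // ... block the stream thread for 10 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
});
Blocking the stream thread like this is only safe as long as the pause stays well below max.poll.interval.ms.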

Nifi Group Content by Given Attributes

I am trying to run a script or a custom processor to group data by given attributes every hour. Queue size is up to 30-40k on a single run and it might go up to 200k depending on the case.
MergeContent does not fit since there is no limit on min-max counts.
RouteOnAttribute does not fit since there are too many combinations.
Solution 1: Consume all flow files, group them by attributes, create the new flow files and push them. Not ideal, but I gave it a try.
While running this I had 33k flow files waiting in the queue.
session.getQueueSize().getObjectCount()
This call returns 10k all the time, even though I increased the queue threshold numbers on the outgoing connections.
Solution 2: A better approach is to consume one flow file and then filter the flow files matching the provided attributes:
final List<FlowFile> flowFiles = session.get(file -> {
if (correlationId.equals(Arrays.stream(keys).map(file::getAttribute).collect(Collectors.joining(":"))))
return FlowFileFilter.FlowFileFilterResult.ACCEPT_AND_CONTINUE;
return FlowFileFilter.FlowFileFilterResult.REJECT_AND_CONTINUE;
});
Again, with 33k waiting in the queue, I was expecting around 200 new grouped flow files, but 320 were created. It looks like the same issue as above: the filter query does not scan all the waiting flow files.
Problems/questions:
Is there a parameter to change so that getObjectCount can report up to 300k?
Is there a way to filter all waiting flow files again, by changing a parameter or by changing the processor?
I tried setting the default queue threshold to 300k in nifi.properties, but it didn't help.
In nifi.properties there is a parameter that affects this batching behavior:
nifi.queue.swap.threshold=20000
Here is my test flow:
1. GenerateFlowFile with "batch size = 50K"
2. ExecuteGroovyScript with the script below
3. LogAttribute (disabled) - just to have a queue after the Groovy script
groovy script:
def ffList = session.get(100000) // get batch with maximum 100K files from incoming queue
if(!ffList)return
def ff = session.create() // create new empty file
ff.batch_size = ffList.size() // set attribute to real batch size
session.remove(ffList) // drop all incoming batch files
REL_SUCCESS << ff // transfer new file to success
With the parameters above, 4 files are generated in the output:
1. batch_size = 20000
2. batch_size = 10000
3. batch_size = 10000
4. batch_size = 10000
According to the documentation:
There is also the notion of "swapping" FlowFiles. This occurs when the number of FlowFiles in a connection queue exceeds the value set in the nifi.queue.swap.threshold property. The FlowFiles with the lowest priority in the connection queue are serialized and written to disk in a "swap file" in batches of 10,000.
This explains it: of the 50K incoming files, 20K are kept in memory and the rest go to swap, batched in groups of 10K.
I don't know how increasing the nifi.queue.swap.threshold property will affect your system's performance and memory consumption, but I set it to 100K on my local NiFi 1.16.3 and it looks good with multiple small files, and the first batch increased to 100K as a result.

Spring Cloud Stream Kafka batch does not consume messages every 15 minutes even after increasing 'fetch.max.wait.ms'

I want to consume messages in batch mode every 15 minutes.
For that I have set these properties:
spring.cloud.stream.kafka.binder.consumer-properties.max.poll.records=5000000
spring.cloud.stream.kafka.binder.consumer-properties.fetch.max.wait.ms=900000
spring.cloud.stream.kafka.binder.consumer-properties.fetch.min.bytes=500000000
Consuming messages works fine when I set spring.cloud.stream.kafka.binder.consumer-properties.fetch.max.wait.ms between 10000 and 30000 (10 or 30 seconds).
But if I increase fetch.max.wait.ms to 1 minute or more, it doesn't consume messages even after the waiting time is over.
I know the default value is 500 ms, but will there be an issue if I increase it?
And how can I get the desired behaviour (the consumer waits 10-15 minutes before consuming the next batch)?
Can I use max.poll.interval.ms for that?
I was able to consume messages every 15 minutes by setting this property:
spring.cloud.stream.kafka.binder.consumer-properties.max.poll.interval.ms=1000000
and
setting the idle time between polls using a container property:
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> customizer() {
    return (container, dest, group) ->
        container.getContainerProperties().setIdleBetweenPolls(idlePollTimeout);
}
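idlePollTimeout is not defined in the snippet above; setIdleBetweenPolls takes a value in milliseconds, so for a 15-minute gap between polls it could, for example, be injected from configuration like this (the property name is illustrative):
import org.springframework.beans.factory.annotation.Value;

@Value("${app.idle-poll-timeout:900000}")   // 15 minutes in milliseconds
private long idlePollTimeout;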

How to fix the slow checkpoint with small state size issue?

I have a Flink app (Flink version 1.9.2) with checkpointing enabled. When I run it on the Apache Flink platform, I keep getting the checkpoint failure message: "Checkpoint expired before completing." After checking the thread dumps of the task manager during a checkpoint, I found that a thread containing two operators that call an external service was always in the RUNNABLE state. Below are the design of these operators and the checkpoint configuration. Please advise how to resolve the issue.
Operator design:
public class OperatorA extends RichMapFunction<POJOA, POJOA> {
    private Connection connection;
    private String getCusipSourceIdPairsQuery;
    private String getCusipListQuery;
    private MapState<String, List<POJOX>> modifiedCusipState;
    private MapState<String, List<POJOX>> bwicMatchedModifiedCusipState;

    @Override
    public POJOA map(POJOA value) throws Exception {
        // create a local PreparedStatement every time this map method is invoked
        // update/clear those two MapStates
    }

    @Override
    public void open(Configuration parameters) {
        // initialize the JDBC connection and the TTL MapStates using GlobalJobParameters
    }

    @Override
    public void close() {
        // close the JDBC connection
    }
}

public class OperatorB extends RichMapFunction<POJOA, POJOA> {
    private MyServiceA serviceA;
    private MyServiceB serviceB;

    @Override
    public POJOA map(POJOA value) throws Exception {
        // call a RESTful GET API of ServiceB; the XML response has about 500 fields.
        // use serviceA to extract the XML document and then populate the value's fields.
    }

    @Override
    public void open(Configuration parameters) {
        // initialize a local JDBC connection and PreparedStatement using globalJobParameters,
        // then use the query results to initialize serviceA.
        // initialize serviceB.
    }
}
checkpoint configuration:
Checkpointing Mode Exactly Once
Interval 15m 0s
Timeout 10m 0s
Minimum Pause Between Checkpoints 5m 0s
Maximum Concurrent Checkpoints 1
Persist Checkpoints Externally Disabled
Sample checkpoint history:
ID Status Acknowledged Trigger Time Latest Acknowledgement End to End Duration State Size Buffered During Alignment
20 In Progress 3/12 (25%) 15:03:13 15:04:14 1m 1s 5.65 KB 0 B
19 Failed 3/12 14:48:13 14:50:12 10m 0s 5.65 KB 0 B
18 Failed 3/12 14:33:13 14:34:50 10m 0s 5.65 KB 0 B
17 Failed 4/12 14:18:13 14:27:04 9m 59s 2.91 MB 64.0 KB
16 Failed 3/12 14:03:13 14:05:18 10m 0s 5.65 KB 0 B
Doing any sort of blocking i/o in a Flink user function (e.g., a RichMap or ProcessFunction) is asking for trouble with checkpointing. The reason is that it is very easy to end up with significant backpressure, which then prevents the checkpoint barriers from making sufficiently rapid progress through the execution graph, leading to checkpoint timeouts.
The preferred way to improve on this would be to use async i/o rather than a RichMap. This will allow for there to be more outstanding requests at any given moment (assuming the external service is capable of handling the higher load), and won't leave the operator blocked in user code waiting for responses to synchronous requests -- thereby allowing checkpoints to progress unimpeded.
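A minimal sketch of what that could look like for the REST call in OperatorB, using Flink's async I/O API; POJOA is from the post, while the service call, timeout, and capacity values are illustrative:
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncEnrichFunction extends RichAsyncFunction<POJOA, POJOA> {

    @Override
    public void asyncInvoke(POJOA value, ResultFuture<POJOA> resultFuture) {
        // Run the blocking REST/JDBC call off the task thread (here on the common
        // ForkJoinPool; a dedicated executor is usually better), so the task thread
        // and the checkpoint barrier are never blocked waiting for the response.
        CompletableFuture
            .supplyAsync(() -> callExternalService(value))
            .thenAccept(result -> resultFuture.complete(Collections.singleton(result)));
    }

    private POJOA callExternalService(POJOA value) {
        // placeholder for the GET request / XML extraction from the original OperatorB
        return value;
    }
}

// Wiring, with a request timeout and a cap on in-flight requests
// (input is the upstream DataStream<POJOA>):
DataStream<POJOA> enriched =
    AsyncDataStream.unorderedWait(input, new AsyncEnrichFunction(), 30, TimeUnit.SECONDS, 100);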
An alternative would be to increase the parallelism of your cluster, which should reduce the backpressure, but at the expense of tying up more computing resources that won't really be doing much other than waiting.
In the worst case, where the external service simply isn't capable of keeping up with your throughput requirements, then backpressure is unavoidable. This then is going to be more difficult to manage, but unaligned checkpoints, coming in Flink 1.11, should help.
Here are some tips that I usually use when locating an expired-checkpoint problem:
1. Check the checkpoint UI to see which subtasks cause the expiration.
2. If most subtasks have already finished the checkpoint, go to tip 3; otherwise go to tip 4.
3. The most likely reason is data skew: the problem subtask receives many more records than the other subtasks. If it's not a data skew problem, take a look at the host the subtask is running on, and check whether there are CPU/memory/disk issues that may slow down that subtask's consumption.
4. This situation is relatively rare and is usually caused by user code. For example, the user accesses a database in an operator, but the connection is not stable, which slows down processing.
I recently encountered a similar problem. The suggestions provided by @David Anderson are really good! Nevertheless, I have a few things to add.
You can try to tune your checkpoints according to Apache Flink documentation.
In my case, the checkpoint interval was lower than the min pause between checkpoints, so I increased it to make it bigger: I multiplied the checkpoint interval by 2 and set that value as the min pause between checkpoints.
You can also try to increase checkpoint timeout.
Another issue may be ValueState. My pipeline was keeping state for a long period of time and it wasn't being evicted, which was causing throughput problems. I set a TTL for the ValueState (in my case, 30 minutes) and it started to work better. TTL is well described in the Apache Flink documentation. It's really simple and looks like this:
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(1))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build();
ValueStateDescriptor<String> stateDescriptor = new ValueStateDescriptor<>("text state", String.class);
stateDescriptor.enableTimeToLive(ttlConfig);
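Since the operators in the question use MapState rather than ValueState, the same ttlConfig can be attached to a MapStateDescriptor; the descriptor name and types below just mirror the fields from OperatorA and are otherwise illustrative:
import java.util.List;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;

MapStateDescriptor<String, List<POJOX>> modifiedCusipDescriptor =
    new MapStateDescriptor<>(
        "modifiedCusipState",
        TypeInformation.of(String.class),
        TypeInformation.of(new TypeHint<List<POJOX>>() {}));
modifiedCusipDescriptor.enableTimeToLive(ttlConfig);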
It's also worth noticing that this SO thread covers a similar topic: Flink Checkpoint Failure - Checkpoints time out after 10 mins, and the tips provided there may be useful.
Regards,
Piotr

Amazon MQ (ActiveMQ) bad performance on large messages

We are migrating from IBM MQ to Amazon MQ, or at least we would like to. The problem is that Amazon MQ has bad performance compared to IBM MQ when a JMS producer puts a large message on a queue.
All messages are persistent; the IBM MQ system is highly available, and Amazon MQ is deployed multi-AZ.
If we put XML files of these sizes to IBM MQ (a 2 CPU, 8 GB RAM HA instance), we see this performance:
256 KB = 15ms
4,6 MB = 125ms
9,3 MB = 141ms
18,7 MB = 218ms
37,4 MB = 628ms
74,8 MB = 1463ms
If we put the same files on Amazon MQ (mq.m5.2xlarge = 8 CPU and 32 GB RAM) or ActiveMQ we have this performance:
256 KB = 967ms
4,6 MB = 1024ms
9,3 MB = 1828ms
18,7 MB = 3550ms
37,4 MB = 8900ms
74,8 MB = 14405ms
What we also see is that IBM MQ has equal response times for sending a message to a queue and getting a message from a queue, while Amazon MQ is really fast at getting a message (it takes just 1 ms) but very slow at sending.
On Amazon MQ we use the OpenWire protocol. We use this config in Terraform style:
resource "aws_mq_broker" "default" {
broker_name = "bernardamazonmqtest"
deployment_mode = "ACTIVE_STANDBY_MULTI_AZ"
engine_type = "ActiveMQ
engine_version = "5.15.10"
host_instance_type = "mq.m5.2xlarge"
auto_minor_version_upgrade = "false"
apply_immediately = "false"
publicly_accessible = "false"
security_groups = [aws_security_group.pittensbSG-allow-mq-external.id]
subnet_ids = [aws_subnet.pittensbSN-public-1.id, aws_subnet.pittensbSN-public-3.id]
logs {
general = "true"
audit = "true"
}
We use Java 8 with JMS ActiveMQ library via POM (Maven):
<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>activemq-client</artifactId>
<version>5.15.8</version>
</dependency>
<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>activemq-pool</artifactId>
<version>5.15.8</version>
</dependency>
In JMS we have this Java code:
private ActiveMQConnectionFactory mqConnectionFactory;
private PooledConnectionFactory mqPooledConnectionFactory;
private Connection connection;
private Session session;
private MessageProducer producer;
private TextMessage textMessage;
private Queue queue;

this.mqConnectionFactory = new ActiveMQConnectionFactory();
this.mqPooledConnectionFactory = new PooledConnectionFactory();
this.mqPooledConnectionFactory.setConnectionFactory(this.mqConnectionFactory);
this.mqConnectionFactory.setBrokerURL("ssl://tag-1.mq.eu-west-1.amazonaws.com:61617");
this.mqPooledConnectionFactory.setMaxConnections(10);
this.connection = mqPooledConnectionFactory.createConnection();
this.connection.start();
this.session = this.connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
this.queue = this.session.createQueue("ExampleQueue");
this.producer = this.session.createProducer(this.queue);

long startTimeWrite = System.currentTimeMillis();
producer.send(textMessage); // here we send the XML file as a TextMessage
logger.debug("EXPORTTIJD_PUT - Put to queue takes: " + (System.currentTimeMillis() - startTimeWrite));
// close session, producer and connection after 10 cycles
We also ran the performance test against a single-instance Amazon MQ broker, with the same results.
We have also run the performance test with an mq.m5.4xlarge (16 CPU, 96 GB RAM) instance, but still no improvement of the bad performance.
Performance test configuration:
We first push the messages (XML files) listed above one by one to a queue. We do that 5 times. After 5 times we read those messages (XML files) from the queue. We call this 1 cycle.
We run 10 cycles one after another, so in total we have pushed 300 files to the queue and read 300 files from the queue.
We run 3 tests in parallel: one from the AWS London region, one from the AWS Frankfurt region in a different VPC, and one from Frankfurt in the same VPC and subnet as the Amazon MQ broker. All clients run on an EC2 instance: m4.xlarge.
If we run the test from only one client, for example only the one in the local VPC which is in the same subnet as the Amazon MQ broker, the performance improves and we get these results:
256 KB = 72ms
4,6 MB = 381ms
9,3 MB = 980ms
18,7 MB = 2117ms
37,4 MB = 3985ms
74,8 MB = 7781ms
The client and server are in the same subnet, so firewalls etc. are not a factor.
Maybe somebody can tell me what is wrong, and why we see such terrible performance with Amazon MQ or ActiveMQ?
extra info:
Response times are measured in the JMS Java app, with the start time taken just before producer.send('XML') and the end time just after producer.send('XML'). The difference is the recorded time. Times are averages over 300 calls.
IBM MQ server is located in our datacenter, and client app is running at a server in the same datacenter.
extra info test:
The JMS app starts by creating the connection factory, queues and sessions. Then it uploads the files to MQ one by one; this is a cycle. It runs this cycle 10 times in a for loop without opening or closing sessions, queues or connection factories. Then all 60 messages are read from the queue and written to files on the local drive. Then it closes the connection factory, session and producer/consumer. This is one batch.
We then run 5 batches, so between batches the connection factory, queue and session are recreated.
In response to Sam:
When I execute the test with the same file sizes you used, Sam, I get roughly the same response times; the values with persistence mode set to false are shown in parentheses:
500 KB = 30ms (6ms)
1 MB = 50ms (13ms)
2 MB = 100ms (24ms)
I removed the connection pooling and I set
concurrentStoreAndDispatchQueues="false"
The system I used was broker: mq.m5.2xlarge and client: m4.xlarge.
But if I test with bigger files, these are the response times:
256 KB = 72ms
4,6 MB = 381ms
9,3 MB = 980ms
18,7 MB = 2117ms
37,4 MB = 3985ms
74,8 MB = 7781ms
I have a very simple requirement: one system puts messages on a queue and another system gets the messages from the queue, sometimes at the same time, sometimes not; sometimes there are 20 or 30 messages on the queue before they get unloaded. That's why I need a queue, the messages must be persistent, and it must be a Java JMS implementation.
I think Amazon MQ might be a solution for small files, but for big files it is not. I think we have to use IBM MQ for this case, which has better performance. One important thing though: I tested IBM MQ only on premises in our LAN. We tried to test IBM MQ on Amazon, but we haven't succeeded yet.
I tried to reproduce the scenario you were testing. When I ran a JMS client in the same VPC as the Amazon MQ broker (mq.m5.4xlarge, with an active and a standby instance), I saw the following round-trip latencies, measured from the moment a producer sends a message to the moment the consumer receives it:
2MB - 50ms
1MB - 31ms
500KB - 15ms
My code just created a connection and a session. I did not use a PooledConnectionFactory (stating this as a matter of fact, not saying/suspecting that's the cause). Also it is better to strip down the code to bare minimum in order to establish a baseline and remove noise when doing performance testing. That way, when you introduce additional code, you can easily see if the new code introduced a performance issue. I used the default broker configuration.
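A stripped-down baseline of that kind could look roughly like this; the broker URL, credentials, queue name and payload are placeholders, and the surrounding method is assumed to declare throws JMSException:
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

ActiveMQConnectionFactory factory =
    new ActiveMQConnectionFactory("ssl://broker-1.mq.eu-west-1.amazonaws.com:61617");
Connection connection = factory.createConnection("user", "password");
connection.start();
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
Queue queue = session.createQueue("ExampleQueue");
MessageProducer producer = session.createProducer(queue);

TextMessage message = session.createTextMessage(xmlPayload);   // xmlPayload loaded elsewhere
long start = System.currentTimeMillis();
producer.send(message);
System.out.println("send took " + (System.currentTimeMillis() - start) + " ms");

producer.close();
session.close();
connection.close();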
In ActiveMQ there is a concept of fast producers and fast consumers. This means that if the consumer can process messages at the same rate as the producer, the broker transfers the message from producer to consumer via memory and then writes the message to disk. This is the default behavior and is controlled by a broker configuration setting named concurrentStoreAndDispatch, which is true by default.
If the consumer is unable to keep up with the producer, it becomes a "slow" consumer, and with the concurrentStoreAndDispatch flag set to true you take a performance hit.
ActiveMQ provides advisory topics you can subscribe to in order to detect slow consumers. If you detect that the consumer is in fact slower than the producer, it is better to set the concurrentStoreAndDispatch flag to false to get better performance.
I didn't get any other response. I think that's because there is no solution for this performance problem. Amazon MQ is a cloud service, and maybe that's the reason why performance is this bad.
IBM MQ has a different architecture, and it is on premises.
I will have to investigate the performance of ActiveMQ some more before I can tell what exactly the reason for this problem is.
