Hazelcast avg get latency is > 2 ms for 18k/s throughput - caching

Background: We are evaluating Hazelcast as an alternative to Redis.
Setup:
3 members in a cluster under a single subnet (production boxes). Each member holds ~1.4 GB of data.
Near caching is off.
Each member has 1 backup.
The code is deployed as a Spring Boot jar, with Hazelcast running as an embedded cache.
VM config: 8 cores, 31 GB RAM.
The code uses an IMap to put and get keys in the cache.
Load test: ~18K REST API calls per second to read the data.
But Hazelcast is showing an average get latency of around 3-4 ms, which I feel should be in microseconds, since we already see microsecond-level get latency with our Redis setup.
CPU Load was ~95% during this test.
The member that reported this latency had ~60% heap usage (committed: 7.85GB, used: 4.68GB). The picture is the same across all members in the cluster.
I need help understanding whether my configuration is wrong somewhere, because of which I am NOT able to achieve get latency in microseconds.
Config for starting embedded cache:
Config config = new Config();
config.addMapConfig(mapConfig()); // mapConfig() (not shown) builds the IMap configuration: 1 backup, near cache off
NetworkConfig networkConfig = config.getNetworkConfig();
JoinConfig join = networkConfig.getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().setEnabled(true).setMembers(
        Arrays.asList(
                "ip1:5701",
                "ip2:5701",
                "ip3:5701"
        )
);
return config;
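For reference, the latency figure Hazelcast reports can be read per member from the local map statistics. The snippet below is only a sketch: it assumes Hazelcast 4.x package names (in 3.x, IMap is in com.hazelcast.core and LocalMapStats in com.hazelcast.monitor) and a placeholder map name "myMap"; the counters are member-local and cumulative since the member started.

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
import com.hazelcast.map.LocalMapStats;

public class GetLatencyReport {

    // Prints the member-local average get latency for one map.
    // "myMap" is a placeholder for the real map name.
    static void printAvgGetLatency(HazelcastInstance hz) {
        IMap<String, Object> map = hz.getMap("myMap");
        LocalMapStats stats = map.getLocalMapStats();

        long gets = stats.getGetOperationCount();
        long totalGetLatencyMs = stats.getTotalGetLatency(); // cumulative latency in milliseconds

        double avgMs = gets == 0 ? 0.0 : (double) totalGetLatencyMs / gets;
        System.out.printf("gets=%d avgGetLatency=%.3f ms maxGetLatency=%d ms%n",
                gets, avgMs, stats.getMaxGetLatency());
    }
}

Comparing this number against the end-to-end REST call latency helps separate time spent inside Hazelcast from time spent in the web and serialization layers.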

Related

Infinispan clustered REPL_ASYNC cache: command indefinitely bounced between two nodes

I'm running a Spring Boot application using Infinispan 10.1.8 in a 2-node cluster. The two nodes communicate via JGroups TCP. I configured several REPL_ASYNC caches.
The problem:
At some point, one of these caches causes the two nodes to exchange the same message over and over, driving up CPU and memory usage. The only way to stop this is to shut down one of the two nodes.
More details: here is the configuration.
org.infinispan.configuration.cache.Configuration replAsyncNoExpirationConfiguration = new ConfigurationBuilder()
    .clustering()
        .cacheMode(CacheMode.REPL_ASYNC)
    .transaction()
        .lockingMode(LockingMode.OPTIMISTIC)
        .transactionMode(TransactionMode.NON_TRANSACTIONAL)
    .statistics().enabled(cacheInfo.isStatsEnabled())
    .locking()
        .concurrencyLevel(32)
        .lockAcquisitionTimeout(15, TimeUnit.SECONDS)
        .isolationLevel(IsolationLevel.READ_COMMITTED)
    .expiration()
        .lifespan(-1)       // entries do not expire
        .maxIdle(-1)        // even when they are idle for some time
        .wakeUpInterval(-1) // disable the periodic eviction process
    .build();
One of these caches (named formConfig) is causing abnormal communication between the two nodes. This is what happens:
with JMeter I generate traffic load targeting only node 1
for some time node 2 receives cache entries from node 1 via SingleRpcCommand; no anomalies, and even the formConfig cache behaves properly
after some time a new cache entry is sent to the formConfig cache
At this point the same message seems to keep bouncing between the two nodes:
node 1 sends entry mn-node1.company.acme-develop sending command to all: SingleRpcCommand{cacheName='formConfig', command=PutKeyValueCommand{key=SimpleKey [form_config,MECHANICAL,DESIGN,et,7850]
node 2 receives the entry mn-node2.company.acme-develop received command from mn-node1.company.acme-develop: SingleRpcCommand{cacheName='formConfig', command=PutKeyValueCommand{key=SimpleKey [form_config,MECHANICAL,DESIGN,et,7850]
node 2 sends the entry back to node 1 mn-node2.company.acme-develop sending command to all: SingleRpcCommand{cacheName='formConfig', command=PutKeyValueCommand{key=SimpleKey [form_config,MECHANICAL,DESIGN,et,7850]
node 1 receives the entry mn-node1.company.acme-develop received command from mn-node2.company.acme-develop: SingleRpcCommand{cacheName='formConfig', command=PutKeyValueCommand{key=SimpleKey [form_config,MECHANICAL,DESIGN,et,7850],
node 1 sends the entry to node 2 and so on and on...
Some other things:
the system is not under load; JMeter is running only a few users in parallel
Even after stopping JMeter, this loop doesn't stop
formConfig is the only cache that behaves this way. All the other REPL_ASYNC caches work properly. With only the formConfig cache deactivated, the system works correctly.
I cannot reproduce the problem with two nodes running on my machine
Here's a more complete log file including logs from both nodes.
Other infos:
OpenJDK 11 HotSpot
spring boot 2.2.7
infinispan spring boot starter 2.2.4
using JbossUserMarshaller
I'm suspecting
something related to transactional configuration
or something related to serialization/deserialization of the cached object
The only scenario where this can happen is when the SimpleKey ends up with a different hashCode().
Are there any exceptions in the log? Are you able to check if the hashCode() is the same after serialization & deserialization of the key?
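A quick way to check is to round-trip a key through serialization and compare hashCode() and equals() before and after. The sketch below uses plain Java serialization as a stand-in; since the application uses the JbossUserMarshaller, running the same comparison through Infinispan's configured marshaller would be the more faithful test, and it assumes the key class is Serializable.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class HashCodeRoundTripCheck {

    // Serializes the key, deserializes the copy, and compares hashCode()/equals().
    // Differing values would match the bouncing scenario described above.
    static void check(Object key) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(key);
        }
        Object copy;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = in.readObject();
        }
        System.out.println("hashCode before=" + key.hashCode() + " after=" + copy.hashCode()
                + " equals=" + key.equals(copy));
    }
}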

Amazon MQ (ActiveMQ) bad performance on large messages

We are migrating from IBM MQ to Amazon MQ, or at least we would like to. The problem is that Amazon MQ shows poor performance when a JMS producer puts a large message on a queue, compared to IBM MQ.
All messages are persistent; the IBM MQ system is highly available, and Amazon MQ is multi-AZ.
If we put XML files of the sizes below to IBM MQ (2 CPU and 8 GB RAM HA instance) we get this performance:
256 KB = 15ms
4,6 MB = 125ms
9,3 MB = 141ms
18,7 MB = 218ms
37,4 MB = 628ms
74,8 MB = 1463ms
If we put the same files on Amazon MQ (mq.m5.2xlarge = 8 CPU and 32 GB RAM) or ActiveMQ we get this performance:
256 KB = 967ms
4,6 MB = 1024ms
9,3 MB = 1828ms
18,7 MB = 3550ms
37,4 MB = 8900ms
74,8 MB = 14405ms
What we also see is that IBM MQ has equal response times for sending a message to a queue and getting a message from a queue, while Amazon MQ is very fast at getting a message (it takes just 1 ms, for example) but very slow at sending.
On Amazon MQ we use the OpenWire protocol. We use this config in Terraform style:
resource "aws_mq_broker" "default" {
broker_name = "bernardamazonmqtest"
deployment_mode = "ACTIVE_STANDBY_MULTI_AZ"
engine_type = "ActiveMQ
engine_version = "5.15.10"
host_instance_type = "mq.m5.2xlarge"
auto_minor_version_upgrade = "false"
apply_immediately = "false"
publicly_accessible = "false"
security_groups = [aws_security_group.pittensbSG-allow-mq-external.id]
subnet_ids = [aws_subnet.pittensbSN-public-1.id, aws_subnet.pittensbSN-public-3.id]
logs {
general = "true"
audit = "true"
}
We use Java 8 with JMS ActiveMQ library via POM (Maven):
<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>activemq-client</artifactId>
<version>5.15.8</version>
</dependency>
<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>activemq-pool</artifactId>
<version>5.15.8</version>
</dependency>
In JMS we have this Java code:
private ActiveMQConnectionFactory mqConnectionFactory;
private PooledConnectionFactory mqPooledConnectionFactory;
private Connection connection;
private Session session;
private MessageProducer producer;
private TextMessage textMessage;
private Queue queue;
this.mqConnectionFactory = new ActiveMQConnectionFactory();
this.mqPooledConnectionFactory = new PooledConnectionFactory();
this.mqPooledConnectionFactory.setConnectionFactory(this.mqConnectionFactory);
this.mqConnectionFactory.setBrokerURL("ssl://tag-1.mq.eu-west-1.amazonaws.com:61617");
this.mqPooledConnectionFactory.setMaxConnections(10);
this.connection = mqPooledConnectionFactory.createConnection());
this.connection.start();
this.session = this.connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
this.session.createQueue("ExampleQueue");
this.producer = this.session.createProducer(this.queue);
long startTimeSchrijf = 0;
startTimeWrite= System.currentTimeMillis();
producer.send("XMLFile.xml"); // here we send the files
logger.debug("EXPORTTIJD_PUT - Put to queue takes: " + (System.currentTimeMillis() - startTimeWrite));
// close session, producer and connection after 10 cycles
We have also run the performance test against a single-instance Amazon MQ broker, with the same results.
We have also run the performance test with an mq.m5.4xlarge (16 CPU, 96 GB RAM) broker, but still no improvement in the poor performance.
Performance test configuration:
We first push the messages (XML files) listed above one by one to a queue. We do that 5 times. After that we read those messages (XML files) from the queue. We call this 1 cycle.
We run 10 cycles one after another, so in total we have pushed 300 files to the queue and retrieved 300 files from the queue.
We run 3 tests in parallel: one from AWS Region London, one from AWS Region Frankfurt in a different VPC, and one from Frankfurt in the same VPC as the Amazon MQ broker and in the same subnet. All clients run on an EC2 instance: m4.xlarge.
If we run a test from only one VPC, for example only the local VPC that is in the same subnet as the Amazon MQ broker, performance improves and we get these results:
256 KB = 72ms
4,6 MB = 381ms
9,3 MB = 980ms
18,7 MB = 2117ms
37,4 MB = 3985ms
74,8 MB = 7781ms
The client and server are in the same subnet, so firewalls etc. are not a factor.
Maybe somebody can tell me what is wrong, and why we see such terrible performance with Amazon MQ or ActiveMQ?
Extra info:
Response times are measured in the JMS Java app, taking a timestamp just before producer.send('XML') and another just after it; the difference is the recorded time. Times are averages over 300 calls.
IBM MQ server is located in our datacenter, and client app is running at a server in the same datacenter.
Extra info about the test:
The JMS app starts by creating the connection factory, queues and sessions. Then it uploads the files to MQ one by one. This is a cycle; it runs this cycle 10 times in a for loop without opening or closing sessions, queues or connection factories. Then all 60 messages are read from the queue and written to files on the local drive. Then it closes the connection factory, session and producer/consumer. This is one batch.
Then we run 5 batches, so between batches the connectionFactory, queue and session are recreated.
In response to Sam:
When I execute the test with the same file sizes you used, Sam, I get roughly the same response times. I also set the persistence mode to false; those values are shown in parentheses:
500 KB = 30ms (6ms)
1 MB = 50ms (13ms)
2 MB = 100ms (24ms)
I removed the connection pooling and I set
concurrentStoreAndDispatchQueues="false"
The setup I used was broker: mq.m5.2xlarge and client: m4.xlarge.
But if I test with bigger files, these are the response times:
256 KB = 72ms
4,6 MB = 381ms
9,3 MB = 980ms
18,7 MB = 2117ms
37,4 MB = 3985ms
74,8 MB = 7781ms
My requirement is very simple. I have a system that puts messages on a queue, and the messages are taken from the queue by another system, sometimes at the same time, sometimes not; sometimes there are 20 or 30 messages on the queue before they get unloaded. That's why I need a queue, the messages must be persistent, and it must be a Java JMS implementation.
I think Amazon MQ might be a solution for small files, but for big files it is not. I think we have to use IBM MQ for this case, which has better performance. One important caveat: I tested IBM MQ only on premise in our LAN. We tried to test IBM MQ on Amazon but we haven't succeeded yet.
I tried to reproduce the scenario you were testing. When I ran a JMS client in the same VPC as the Amazon MQ broker, against an mq.m5.4xlarge broker with an active and a standby instance, I saw the following round-trip latencies, measured from the moment the producer sends a message to the moment the consumer receives it:
2MB - 50ms
1MB - 31ms
500KB - 15ms
My code just created a connection and a session. I did not use a PooledConnectionFactory (stating this as a matter of fact, not suspecting it as the cause). Also, it is better to strip the code down to the bare minimum in order to establish a baseline and remove noise when doing performance testing. That way, when you introduce additional code, you can easily see whether the new code introduced a performance issue. I used the default broker configuration.
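For comparison, a stripped-down producer along those lines might look like the sketch below. It is only a sketch: broker URL, credentials, queue name and payload are placeholders, there is no pooling, and the delivery mode is left at its default (persistent).

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class BaselineProducer {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials; use the broker's SSL endpoint from the AWS console.
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("ssl://broker-1.mq.eu-west-1.amazonaws.com:61617");
        Connection connection = factory.createConnection("user", "password");
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("ExampleQueue"));

        String payload = "<xml>...</xml>"; // load the test XML file into this String
        TextMessage message = session.createTextMessage(payload);

        long start = System.currentTimeMillis();
        producer.send(message);
        System.out.println("send took " + (System.currentTimeMillis() - start) + " ms");

        connection.close();
    }
}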
In ActiveMQ there is a concept of fast producers and fast consumers. If the consumer can process messages at the same rate as the producer, the broker transfers the message from producer to consumer via memory and then writes the message to disk. This is the default behavior and is controlled by a broker configuration setting named concurrentStoreAndDispatch, which is true by default.
If the consumer is unable to keep up with the producer, it becomes a "slow" consumer, and with the concurrentStoreAndDispatch flag set to true you take a performance hit.
ActiveMQ provides advisory topics you can subscribe to in order to detect slow consumers. If you detect that the consumer is in fact slower than the producer, it is better to set the concurrentStoreAndDispatch flag to false to get better performance.
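A rough sketch of subscribing to the slow-consumer advisory topic with the ActiveMQ client is shown below. Endpoint, credentials and queue name are placeholders, and as far as I know the broker only emits this advisory when advisoryForSlowConsumers is enabled in the destination policy, so treat this as a starting point rather than a drop-in solution.

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.advisory.AdvisorySupport;
import org.apache.activemq.command.ActiveMQQueue;
import org.apache.activemq.command.ActiveMQTopic;

public class SlowConsumerWatcher {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("ssl://broker-1.mq.eu-west-1.amazonaws.com:61617"); // placeholder
        Connection connection = factory.createConnection("user", "password"); // placeholder credentials
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Advisory topic for slow consumers of the queue under test.
        ActiveMQTopic advisoryTopic =
                AdvisorySupport.getSlowConsumerAdvisoryTopic(new ActiveMQQueue("ExampleQueue"));
        MessageConsumer consumer = session.createConsumer(advisoryTopic);
        consumer.setMessageListener((Message advisory) ->
                System.out.println("Slow consumer advisory received: " + advisory));

        Thread.sleep(60_000); // keep listening for a while, then clean up
        connection.close();
    }
}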
I didn't get any response.
I think it's because there is no solution for this performance problem. Amazon MQ is a cloud service, and maybe that's the reason why performance is this bad.
IBM MQ has a different architecture, and it is on premise.
I have to investigate the performance of ActiveMQ some more before I can tell exactly what the reason for this problem is.

AutoML: out of memory on small training file

I am attempting to run H2OAutoML on a 2.7 MB training CSV, on a system with 4 GB RAM, using the Python API, and it is running out of memory.
The error messages I am encountering are either:
h2o_ubuntu_started_from_python.out:
02-17 17:57:25.063 127.0.0.1:54321 27097 FJ-3-15 INFO: Stopping XGBoost training because of timeout
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 247463936 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ubuntu/h20.ai/h2o-3.28.0.2/hs_err_pid27097.log
or
03:37:07.509: XRT_1_AutoML_20200217_030816 [DRF XRT (Extremely Randomized Trees)] failed: java.lang.OutOfMemoryError: Java heap space
in the Python output, depending on the exact crash instance I look at.
My init is:
h2o.init(max_mem_size='3G',min_mem_size='2G',jvm_custom_args=["-Xmx3g"])
Though I have tried with:
h2o.init()
My H2OAutoML call is:
aml = H2OAutoML(nfolds=5, max_models=20, max_runtime_secs_per_model=600, seed=1, project_name=project_name)
aml.train(x=x, y=y, training_frame=train, validation_frame=test)
These are the server stats:
H2O cluster uptime: 02 secs
H2O cluster timezone: Etc/UTC
H2O data parsing timezone: UTC
H2O cluster version: 3.28.0.2
H2O cluster version age: 27 days
H2O cluster name: H2O_from_python_ubuntu_htq5aj
H2O cluster total nodes: 1
H2O cluster free memory: 3 Gb
H2O cluster total cores: 2
H2O cluster allowed cores: 2
H2O cluster status: accepting new members, healthy
H2O connection url: http://127.0.0.1:54321
H2O connection proxy: {'http': None, 'https': None}
H2O internal security: False
H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python version: 3.6.9 final
Does this sound right? Am I not able to run 20 models?
I can run this just fine with max_models=10; that takes about 60 minutes.
Are there guidelines for the amount of RAM needed for a given max_models and filesize?
Connect to the Flow interface, running at 127.0.0.1:54321.
There is a section there where you can view the remaining memory, and you can also see what models and data frames are being created. You have max_runtime_secs_per_model set to 600, and you say 10 models take about an hour, so if you check in every 5-10 minutes you can get an idea of how much memory each model is taking up.
Your h2o.init() response looks fine. The guideline is to have 3-4 times the dataset size free. If your data is only 2.7 MB, that should not be a concern. However, if you have a lot of categorical columns, especially ones with many levels, they can take up more memory than you expect.
The memory used by a model can vary quite a lot, depending on the parameters chosen. Again, it is best to look on Flow, to see what parameters AutoML is choosing for you.
If it is simply the case that 10 models will fit in memory, and 20 models won't, and you don't want to take manual control of the parameters, then you could do batches of 10 models, and save after each hour. (Choose a different seed for each run.)

Elasticsearch 7.x circuit breaker - data too large - troubleshoot

The problem:
Since upgrading from ES 5.4 to ES 7.2 I have started getting "data too large" errors when trying to send concurrent bulk requests (and/or search requests) from my multi-threaded Java application (using the elasticsearch-rest-high-level-client-7.2.0.jar client) to an ES cluster of 2-4 nodes.
My ES configuration:
Elasticsearch version: 7.2
custom configuration in elasticsearch.yml:
thread_pool.search.queue_size = 20000
thread_pool.write.queue_size = 500
I use only the default 7.x circuit-breaker values, such as:
indices.breaker.total.limit = 95%
indices.breaker.total.use_real_memory = true
network.breaker.inflight_requests.limit = 100%
network.breaker.inflight_requests.overhead = 2
The error from elasticsearch.log:
{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<http_request>] would be [3144831050/2.9gb], which is larger than the limit of [3060164198/2.8gb], real usage: [3144829848/2.9gb], new bytes reserved: [1202/1.1kb]",
        "bytes_wanted": 3144831050,
        "bytes_limit": 3060164198,
        "durability": "PERMANENT"
      }
    ],
    "type": "circuit_breaking_exception",
    "reason": "[parent] Data too large, data for [<http_request>] would be [3144831050/2.9gb], which is larger than the limit of [3060164198/2.8gb], real usage: [3144829848/2.9gb], new bytes reserved: [1202/1.1kb]",
    "bytes_wanted": 3144831050,
    "bytes_limit": 3060164198,
    "durability": "PERMANENT"
  },
  "status": 429
}
Thoughts:
I'm having a hard time pinpointing the source of the issue.
When using ES cluster nodes with <=8 GB heap (on a <=16 GB VM), the problem becomes very visible, so one obvious solution is to increase the memory of the nodes.
But I feel that increasing the memory only hides the issue.
Questions:
I would like to understand what scenarios could have led to this error,
and what action I can take in order to handle it properly
(change circuit-breaker values, change the es.yml configuration, change/limit my ES requests).
The reason is that the heap of the node is pretty full, and being caught by the circuit breaker is a good thing, because it prevents the nodes from running into OOMs, going stale and crashing.
Elasticsearch 6.2.0 introduced the circuit breaker and improved it in 7.0.0. With the version upgrade from ES 5.4 to ES 7.2, you are running straight into this improvement.
I see 3 solutions so far:
Increase heap size if possible
Reduce the size of your bulk requests if feasible (see the sketch after this list)
Scale out your cluster, as the shards are consuming a lot of heap, leaving nothing to process the large request. More nodes will help the cluster distribute the shards and requests among more nodes, which leads to a lower average heap usage on all nodes.
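On the second point, here is a hedged sketch of what bounded bulk requests could look like with the same high-level REST client the question uses. The thresholds are illustrative starting points, not values tuned for this cluster.

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public class CappedBulkIndexer {

    // Builds a BulkProcessor that flushes small, bounded bulks instead of
    // sending one large concurrent request per application thread.
    static BulkProcessor build(RestHighLevelClient client) {
        BulkProcessor.Listener listener = new BulkProcessor.Listener() {
            @Override public void beforeBulk(long id, BulkRequest request) { }
            @Override public void afterBulk(long id, BulkRequest request, BulkResponse response) { }
            @Override public void afterBulk(long id, BulkRequest request, Throwable failure) {
                failure.printStackTrace(); // breaker rejections (HTTP 429) surface here; back off and retry
            }
        };
        return BulkProcessor.builder(
                        (request, bulkListener) -> client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                        listener)
                .setBulkActions(500)                                // flush after 500 actions...
                .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or 5 MB, whichever comes first
                .setConcurrentRequests(1)                           // limit in-flight bulks per processor
                .build();
    }
}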
As an UGLY workaround (not solving the issue) you could increase the limit, after reading about and understanding the implications.
So I've spent some time researching how exactly ES implemented the new circuit breaker mechanism, and tried to understand why we were suddenly getting those errors:
the circuit breaker mechanism has existed since the very first versions.
we started experiencing issues around it when moving from version 5.4 to 7.2
in version 7.2 ES introduced a new way of calculating circuit breaks: circuit-breaking based on real memory usage (why and how: https://www.elastic.co/blog/improving-node-resiliency-with-the-real-memory-circuit-breaker, code: https://github.com/elastic/elasticsearch/pull/31767)
in our internal upgrade of ES to version 7.2, we changed the JDK from 8 to 11.
also as part of our internal upgrade we changed the jvm.options default configuration, switching the officially recommended CMS GC to the G1GC GC, which has fairly new support in Elasticsearch.
considering all the above, I found this bug, fixed in version 7.4, regarding the use of the circuit breaker together with the G1GC GC: https://github.com/elastic/elasticsearch/pull/46169
How to fix:
change the configuration back to the CMS GC.
or take the fix: the fix for the bug is just a configuration change that can easily be applied and tested in your deployment.

Spark job just hangs with large data

I am trying to query 15 days of data from S3. Querying each day separately works fine, and it also works fine for 14 days. But when I query all 15 days the job keeps running forever (hangs) and the task count does not update.
My settings :
I am using a 51-node r3.4xlarge cluster with dynamic allocation and maximize resource allocation turned on.
All I am doing is:
val startTime="2017-11-21T08:00:00Z"
val endTime="2017-12-05T08:00:00Z"
val start = DateUtils.getLocalTimeStamp( startTime )
val end = DateUtils.getLocalTimeStamp( endTime )
val days: Int = Days.daysBetween( start, end ).getDays
val files: Seq[String] = (0 to days)
.map( start.plusDays )
.map( d => s"$input_path${DateTimeFormat.forPattern( "yyyy/MM/dd" ).print( d )}/*/*" )
sqlSession.sparkContext.textFile( files.mkString( "," ) ).count
When I run the same query for 14 days, I get a count of 197337380, and running the 15th day separately gives 27676788. But when I query all 15 days the job hangs.
Update :
The job works fine with :
var df = sqlSession.createDataFrame(sc.emptyRDD[Row], schema)
for (n <- files) {
  val tempDF = sqlSession.read.schema( schema ).json(n)
  df = df.union(tempDF)
}
df.count
But can someone explain why it works now but not before?
UPDATE : After setting mapreduce.input.fileinputformat.split.minsize to 256 GB it works fine now.
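For context, one way such a setting can be applied from code is sketched below, using Spark's Java API. The property name is the one mentioned in the update above, but applying it this way, rather than via --conf or cluster configuration, is just an assumption, and the app name is a placeholder.

import org.apache.spark.sql.SparkSession;

public class SplitSizeConfig {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("s3-count").getOrCreate();

        // 256 GB expressed in bytes, mirroring the value mentioned in the update above.
        long minSplitBytes = 256L * 1024 * 1024 * 1024;
        spark.sparkContext().hadoopConfiguration()
             .set("mapreduce.input.fileinputformat.split.minsize", Long.toString(minSplitBytes));

        // Reads that go through the Hadoop input format (e.g. textFile) pick up this setting.
    }
}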
Dynamic allocation and maximize resource allocation are different settings; one is disabled when the other is active. With maximize resource allocation in EMR, one executor per node is launched, and it allocates all the cores and memory to that executor.
I would recommend taking a different route. You seem to have a pretty big cluster with 51 nodes; I'm not sure it is even required. However, follow this rule of thumb to begin with, and you will get the hang of tuning these configurations.
Cluster memory - minimum of 2X the data you are dealing with.
Now assuming 51 nodes is what you require, try below:
r3.4xlarge has 16 vCPUs - leave one for the OS and other processes and put the rest to use.
Set your number of executors to 150 - this will allocate 3 executors per node.
Set the number of cores per executor to 5 (3 executors per node).
Set your executor memory to roughly total host memory / 3 = 35G.
You need to control the parallelism (default partitions); set this to the total number of cores you have, ~800.
Adjust shuffle partitions - make this twice the number of cores - 1600.
Above configurations have been working like a charm for me. You can monitor the resource utilization on Spark UI.
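To make that concrete, here is a minimal sketch of those numbers expressed as standard Spark properties via the Java API. The app name is a placeholder, and disabling dynamic allocation here is an assumption that follows from fixing the executor count; the same values can also be passed through spark-submit --conf or EMR configuration.

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class TunedSessionFactory {

    // Mirrors the rule of thumb above: 150 executors, 5 cores and ~35 GB each,
    // ~800 default partitions and 1600 shuffle partitions.
    static SparkSession create() {
        SparkConf conf = new SparkConf()
                .set("spark.dynamicAllocation.enabled", "false") // assumed, so the fixed executor count applies
                .set("spark.executor.instances", "150")
                .set("spark.executor.cores", "5")
                .set("spark.executor.memory", "35g")
                .set("spark.default.parallelism", "800")
                .set("spark.sql.shuffle.partitions", "1600");
        return SparkSession.builder().config(conf).appName("s3-count").getOrCreate();
    }
}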
Also, in your YARN config file /etc/hadoop/conf/capacity-scheduler.xml, set yarn.scheduler.capacity.resource-calculator to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, which will allow Spark to really go full throttle with those CPUs. Restart the YARN service after the change.
You should increase the executor memory and the number of executors. If the data is huge, try increasing the driver memory.
My suggestion is to not use dynamic resource allocation: let it run and see if it still hangs or not (note that a Spark job can consume the entire cluster's resources and make other applications starve, so try this approach when no other jobs are running). If it doesn't hang, you should then play with the resource allocation: start hardcoding the resources and keep increasing them until you find the best allocation you can possibly use.
The links below can help you understand resource allocation and how to optimize it:
http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
https://community.hortonworks.com/articles/42803/spark-on-yarn-executor-resource-allocation-optimiz.html
