IBM MQ tuning for tranfer a large number of file - websphere

I have a project to transfer file using IBM MQ. There are 10000 clients and one data center. The largest file size is almost 8MB. The MQ cluster contains three MQ managers which are at different Windows server. Each MQ manager have 5 channels for client and 5 channel for data center. There are two cases for testing. Clients are evenly distributed to MQ manager in each case. Do not lose any file is the most important thing in these cases.
Case 1:
Every client send 50 files to data center at the same time. The files size are between 150KB to 5MB.
In this case, the sum of file size one client send is almost 80MB.
Case 2 :
Data center send the 10 identical files to every client at the same time. In this case, I create a topic named `myTopic` and 10000 clients subscribe this topic. Data center send 10 identical files to the topic.
MQ Mangers have a heavy load. I already set some attribute in IBM MQ:
Queue Manager:
Max handles: 100000
Maximum message length: 100MB
Max channels: 10000
Max channels: 10000
Is there any attribute that could increase the performance?
5/11 update:
First, I have modified the situation of case 2 above. I have a data center server that has a 4 core CPU and 32G RAM. I use 4 clients server to simulate 10000 clients, and each client server has 4 core CPU and 16G RAM.
In case 1, it take about 37 minutes when 1000 clients send files to the data center. There are not enough memory on data center server when data center receive files from 2000 clients. I find there are 20G memory used for buffer/cache. Here is my java code used to receive files:
try {
String filePath = ConfigReader.getInstance().getConfig("filePath");
MQMessage mqMsg = new MQMessage();
mqMsg.messageId = CMQC.MQMI_NONE;
mqMsg.correlationId = CMQC.MQCI_NONE;
mqMsg.groupId = CMQC.MQGI_NONE;
int flag = 1;
while (true) {
try {
MQQueueManager queueManager = new MQQueueManager("QMGR1");
int option = CMQC.MQTOPIC_OPEN_AS_SUBSCRIPTION | CMQC.MQSO_DURABLE;
MQTopic subscriber = queueManager.accessTopic("", "myTopic", option, null, "datacenter");
subscriber.get(mqMsg);
if (mqMsg.getDataLength() != 0) {
String fileName = filePath + "_file" + flag + ".txt";
byte[] b = new byte[mqMsg.getDataLength()];
mqMsg.readFully(b);
System.out.println("Receive " + fileName + ", complete time: " + System.currentTimeMillis());
Path path = Paths.get(fileName);
System.out.println("Write " + fileName + ", start time: " + System.currentTimeMillis());
Files.write(path, b);
System.out.println("Write " + fileName + ", complete time: " + System.currentTimeMillis());
flag++;
}
} catch (MQException e) {
// e.printStackTrace();
if (e.reasonCode != 2033) {
e.printStackTrace();
}
} finally {
mqMsg.clearMessage();
mqMsg.messageId = CMQC.MQMI_NONE;
mqMsg.correlationId = CMQC.MQCI_NONE;
mqMsg.groupId = CMQC.MQGI_NONE;
}
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I use byte array to read message and write it to disk. Is it possible that the byte array does not release memory and takes 20G memory?
In case 2, I find if I send a 5MB file to myTopic that has 1000 subscribers on MQ manager01, MQ manager01 take a lot of time to sync with cluster member. The disks on the MQ servers are very busy. There are another problem: Sometimes I get only 7 seconds to send a 5MB file, sometimes it takes 90 seconds. Here is my java code to send files:
try {
MQQueueManager queueManager = new MQQueueManager("QMGR1");
MQTopic publisher = queueManager.accessTopic("myTopic", "", CMQC.MQTOPIC_OPEN_AS_PUBLICATION,
CMQC.MQOO_OUTPUT);
System.out.println("---- start publish , time: " + System.currentTimeMillis() + " ----");
publisher.put(InMemoryDataProvider.getInstance().getMessage("my5MBFile"));
System.out.println("---- end publish , time: " + System.currentTimeMillis() + " ----");
publish.getPublisher().close();
} catch (MQException e) {
System.out.println("threadNum: " + publish.getThreadNo() + " publish error");
if (e.reasonCode != 2033) {
e.printStackTrace();
}
}

A couple of things.
MQ has FTE which transfers files for you. I think it does it using non persistent messages, so you avoid the disk overhead.
You might try checking your .ini files for parameters like ClntRcvBuffSize=0
see here.
0 says use the operating system values.
TCP used to send some data in short packets (64KB chunk), then wait till the packets have been acknowledged, and send more. If the connection is reliable, then you get higher throughput by sending bigger logical packets, a technique known as Dynamic Right Sizing. See here
it works best when the connection is long lived and sending a lot if data. For example the first few chunks may be 64KB, then increase it a bit to 128KB chunks, eventually up to 100MB ( or more) if needed.
You need to set both ends.
Depending on platform, you can use Netstat replacement ss command to display the various window sizes.
For your QM to QM channels specify a large batchsz and batchlim - though this may make your disk IO worse as the data gets to the remote end faster.

Related

how to read messages by packet with JmsTemplate of Sring

i have an outOfMemoryException while reading messages from a queue with 2 M of messages.
and i am trying to find a way to read messgages by 1000 for example .
here is my code
List<TextMessage> messages = jmsTemplate.browse(JndiQueues.BACKOUT, (session,browser) -> {
Enumeration<?> browserEnumeration = browser.getEnumeration().;
List<TextMessage> messageList = new ArrayList<TextMessage>();
while (browserEnumeration.hasMoreElements()) {
messageList.add((TextMessage) browserEnumeration.nextElement());
}
return messageList;
});
thanks
Perform the browse on a different thread and pass the results subset to the main thread via a BlockingQueue<List<TextMessage>>. e.g. a LinkedBlockingQueue with a small capacity.
The browsing thread will block when the queue is full. When the main thread removes an entry from the queue, the browser can add a new one.
Probably makes sense to have a capacity at least 2 so that the next list is available immediately.

Spark Streaming: Micro batches Parallel Execution

We are receiving data in spark streaming from Kafka. Once execution has been started in Spark Streaming, it executes only one batch and the remaining batches starting queuing up in Kafka.
Our data is independent and can be processes in Parallel.
We tried multiple configurations with multiple executor, cores, back pressure and other configurations but nothing worked so far. There are a lot messages queued and only one micro batch has been processed at a time and rest are remained in queue.
We want to achieve parallelism at maximum, so that not any micro batch is queued, as we have enough resources available. So how we can reduce time by maximum utilization of resources.
// Start reading messages from Kafka and get DStream
final JavaInputDStream<ConsumerRecord<String, byte[]>> consumerStream = KafkaUtils.createDirectStream(
getJavaStreamingContext(), LocationStrategies.PreferConsistent(),
ConsumerStrategies.<String, byte[]>Subscribe("TOPIC_NAME",
sparkServiceConf.getKafkaConsumeParams()));
ThreadContext.put(Constants.CommonLiterals.LOGGER_UID_VAR, CommonUtils.loggerUniqueId());
JavaDStream<byte[]> messagesStream = consumerStream.map(new Function<ConsumerRecord<String, byte[]>, byte[]>() {
private static final long serialVersionUID = 1L;
#Override
public byte[] call(ConsumerRecord<String, byte[]> kafkaRecord) throws Exception {
return kafkaRecord.value();
}
});
// Decode each binary message and generate JSON array
JavaDStream<String> decodedStream = messagesStream.map(new Function<byte[], String>() {
private static final long serialVersionUID = 1L;
#Override
public String call(byte[] asn1Data) throws Exception {
if(asn1Data.length > 0) {
try (InputStream inputStream = new ByteArrayInputStream(asn1Data);
Writer writer = new StringWriter(); ) {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(asn1Data);
GZIPInputStream gzipInputStream = new GZIPInputStream(byteArrayInputStream);
byte[] buffer = new byte[1024];
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int len;
while((len = gzipInputStream.read(buffer)) != -1) {
byteArrayOutputStream.write(buffer, 0, len);
}
return new String(byteArrayOutputStream.toByteArray());
} catch (Exception e) {
//
producer.flush();
throw e;
}
}
return null;
}
});
// publish generated json gzip to kafka
cache.foreachRDD(new VoidFunction<JavaRDD<String>>() {
private static final long serialVersionUID = 1L;
#Override
public void call(JavaRDD<String> jsonRdd4DF) throws Exception {
//Dataset<Row> json = sparkSession.read().json(jsonRdd4DF);
if(!jsonRdd4DF.isEmpty()) {
//JavaRDD<String> jsonRddDF = getJavaSparkContext().parallelize(jsonRdd4DF.collect());
Dataset<Row> json = sparkSession.read().json(jsonRdd4DF);
SparkAIRMainJsonProcessor airMainJsonProcessor = new SparkAIRMainJsonProcessor();
airMainJsonProcessor.processAIRData(json, sparkSession);
}
}
});
getJavaStreamingContext().start();
getJavaStreamingContext().awaitTermination();
getJavaStreamingContext().stop();
Technology that we are using:
HDFS 2.7.1.2.5
YARN + MapReduce2 2.7.1.2.5
ZooKeeper 3.4.6.2.5
Ambari Infra 0.1.0
Ambari Metrics 0.1.0
Kafka 0.10.0.2.5
Knox 0.9.0.2.5
Ranger 0.6.0.2.5
Ranger KMS 0.6.0.2.5
SmartSense 1.3.0.0-1
Spark2 2.0.x.2.5
Statistics that we got from difference experimentations:
Experiment 1
num_executors=6
executor_memory=8g
executor_cores=12
100 Files processing time 48 Minutes
Experiment 2
spark.default.parallelism=12
num_executors=6
executor_memory=8g
executor_cores=12
100 Files processing time 8 Minutes
Experiment 3
spark.default.parallelism=12
num_executors=6
executor_memory=8g
executor_cores=12
100 Files processing time 7 Minutes
Experiment 4
spark.default.parallelism=16
num_executors=6
executor_memory=8g
executor_cores=12
100 Files processing time 10 Minutes
Please advise, how we can process maximum so no queued.
I was facing same issue and I tried a few things in trying to resolve the issue and came to following findings:
First of all. Intuition says that one batch must be processed per executor but on the contrary, only one batch is processed at a time but jobs and tasks are processed in parallel.
Multiple batch processing can be achieved by using spark.streaming.concurrentjobs, but it's not documented and still needs a few fixes. One of problems is with saving Kafka offsets. Suppose we set this parameter to 4 and 4 batches are processed in parallel, what if 3rd batch finishes before 4th one, which Kafka offsets would be committed. This parameter is quite useful if batches are independent.
spark.default.parallelism because of its name is sometimes considered to make things parallel. But its true benefit is in distributed shuffle operations. Try different numbers and find an optimum number for this. You will get a considerable difference in processing time. It depends upon shuffle operations in your jobs. Setting it too high would decrease the performance. It's apparent from you experiments results too.
Another option is to use foreachPartitionAsync in place of foreach on RDD. But I think foreachPartition is better as foreachPartitionAsync would queue up the jobs whereas batches would appear to be processed but their jobs would still be in the queue or in processing. May be I didn't get its usage right. But it behaved same in my 3 services.
FAIR spark.scheduler.mode must be used for jobs with lots of tasks as round-robin assignment of tasks to jobs, gives opportunity to smaller tasks to start receiving resources while bigger tasks are processing.
Try to tune your batch duration+input size and always keep it below processing duration otherwise you're gonna see a long backlog of batches.
These are my findings and suggestions, however, there are so many configurations and methods to do streaming and often one set of operation doesn't work for others. Spark Streaming is all about learning, putting your experience and anticipation together to get to a set of optimum configuration.
Hope it helps. It would be a great relief if someone could tell specifically how we can legitimately process batches in parallel.
We want to achieve parallelism at maximum, so that not any micro batch is queued
That's the thing about stream processing: you process the data in the order it was received. If you process your data at the rate slower than it arrives it will be queued. Also, don't expect that processing of one record will suddenly be parallelized across multiple nodes.
From your screenshot, it seems your batch time is 10 seconds and your producer published 100 records over 90 seconds.
It took 36s to process 2 records and 70s to process 17 records. Clearly, there is some per-batch overhead. If this dependency is linear, it would take only 4:18 to process all 100 records in a single mini-batch thus beating your record holder.
Since your code is not complete, it's hard to tell what exactly takes so much time. Transformations in the code look fine but probably the action (or subsequent transformations) are the real bottlenecks. Also, what's with producer.flush() which wasn't mentioned anywhere in your code?
I was facing the same issue and I solved it using Scala Futures.
Here are some link that show how to use it:
https://alvinalexander.com/scala/how-use-multiple-scala-futures-in-for-comprehension-loop
https://www.beyondthelines.net/computing/scala-future-and-execution-context/
Also, this is piece of my code when I used Scala Futures:
messages.foreachRDD{ rdd =>
val f = Future {
// sleep(100)
val newRDD = rdd.map{message =>
val req_message = message.value()
(message.value())
}
println("Request messages: " + newRDD.count())
var resultrows = newRDD.collect()//.collectAsList()
processMessage(resultrows, mlFeatures: MLFeatures, conf)
println("Inside scala future")
1
}
f.onComplete {
case Success(messages) => println("yay!")
case Failure(exception) => println("On no!")
}
}
It's hard to tell without having all the details, but general advice to tackle issues like that -- start with very simple application, "Hello world" kind. Just read from input stream and print data into log file. Once this works you prove that problem was in application and you gradually add your functionality back until you find what was culprit. If even simplest app doesn't work - you know that problem in configuration or Spark cluster itself. Hope this helps.

NetMQ PUSH socket blocks indefinitely when it reaches HWM

I'm using NetMQ (Nuget 3.3.2.2) on .NET 4.5 and I have a single fast generator process with a PUSH socket, and a single slow consumer process using a PULL socket. If I send enough messages to hit the sending HWM, the sending process blocks the thread indefinitely.
Some contrived (generator) code which illustrates the problem:
using (var ctx = NetMQContext.Create())
using (var pushSocket = ctx.CreatePushSocket())
{
pushSocket.Connect("tcp://127.0.0.1:42404");
var template = GenerateMessageBody(i);
for (int i = 1; i <= 100000; i++)
{
pushSocket.SendMoreFrame("SampleMessage").SendFrame(Messages.SerializeToByteArray(template));
if (i % 1000 == 0)
Console.WriteLine("Sent " + i + " messages");
}
Console.WriteLine("All finished");
Console.ReadKey();
}
On my configuration, this will usually report it has sent about 5000 or 6000 messages, and will then simply block. If I set the send HWM set to a large value (or 0), then it sends all of the messages as expected.
It looks like it's waiting to receive another command before it tries again, here: (SocketBase.TrySend)
// Oops, we couldn't send the message. Wait for the next
// command, process it and try to send the message again.
// If timeout is reached in the meantime, return EAGAIN.
while (true)
{
ProcessCommands(timeoutMillis, false);
From what I've read in the 0MQ guide, blocking on a full PUSH sockeet is the correct behaviour (and is what I want it to do), however I would expect it to recover once the consumer has cleared its queue.
Short of using some sort of TrySend pattern and dealing with the block myself, is there some option I can set or some other facility I can use to have the PUSH socket attempt to resend blocked messages periodically?

Browse All Messages from WebSphere MQ at a single time using Java

How can we browse all the messages in a WebSphere MQ queue in one API call using java?
Here is the code which I'm using. Here I'm using this code a for loop until q depth is reached.
MQGetMessageOptions gmo=new MQGetMessageOptions();
gmo.options = MQC.MQGMO_WAIT | MQC.MQGMO_BROWSE_NEXT ;
//System.out.println("Status: "+i);
MQMessage out=new MQMessage();
out.format =MQC.MQFMT_XMIT_Q_HEADER;//MQC.MQFMT_REF_MSG_HEADER;
mqCon.getQue().get(out,gmo);
System.out.print(i);
How can I get all messages without using for loop? It's taking a long time to browse 10,000 messages.
How can I get all messages without using for loop?
Use a while loop. Sorry, could not resist a slightly snarky answer on that one. WMQ does not have an API call analogous to the SQL select statement. Messaging and databases share some traits but address fundamentally different requirements.
It's taking a long time to browse 10,000 messages.
Take a look at the Performance SupportPacs. These are published on the SupportPacs main page and have names beginning with MP. Find the one for your platform and MQ version and it will list different scenarios for putting and getting messages as well as performance tuning recommendations.
I would also ask why a normal app needs to browse 10,000 messages. The QMgr will select messages for you based on MsgID, Correlation ID or property and this is much faster than browsing all the messages in order for the application to find the ones of interest. Occasionally people need to browse all messages on a queue to archive the queue or to debug a problem, but this is the exception rather than the rule. If a Production app regularly browses all messages on a queue, then the queues may have been inappropriately used as a database.
How can I get all messages without using for loop?
MQGetMessageOptions getOptions = new MQGetMessageOptions();
getOptions.options = MQC.MQGMO_BROWSE_NEXT + MQC.MQGMO_NO_WAIT + MQC.MQGMO_FAIL_IF_QUIESCING;
MQMessage message = new MQMessage();
byte[] b = null;
while(true)
{
try
{
queue.get(message, getOptions);
b = new byte[message.getMessageLength()];
message.readFully(b);
System.out.println(new String(b));
message.clearMessage();
}
catch (IOException e)
{
System.out.println("IOException: " + e.getMessage());
break;
}
catch (MQException e)
{
if (e.completionCode == 2 && e.reasonCode == MQException.MQRC_NO_MSG_AVAILABLE)
System.out.println("All messages read.");
else
System.out.println("MQException: Completion Code = " + e.completionCode + " : Reason Code = " + e.reasonCode);
break;
}
}
I have posted many Java/MQ samples here:
http://www.capitalware.biz/mq_code_java.html
If you want just browse messeges (not retrive them), you can use javax.jms.QueueBrowser.
Is pretty fast...
import javax.jms.*
...
public ArrayList<Message> browse() {
...
QueueBrowser queueBrowser = queueSession.createBrowser((javax.jms.Queue) lookupQueue());
Enumeration enums = queueBrowser.getEnumeration();
while (enums.hasMoreElements()) {
Object objMsg = enums.nextElement();
if (objMsg instanceof TextMessage) {
TextMessage message = (TextMessage) objMsg;
Log4j.trace("Text message: " + i + ". MSG:" + message.getText() + " MSG id:"
+ message.getJMSMessageID() + " MSG dest:" + message.getJMSDestination());
} else if (objMsg instanceof ObjectMessage) {
ObjectMessage message = (ObjectMessage) objMsg;
Log4j.trace("Object Message: " + i + ". MSG" + " MSG id:" + message.getJMSMessageID()
+ " MSG dest:" + message.getJMSDestination());
}
}
}
...
Just noticed the message format (out.Format) you are using. MQFMT_XMIT_Q_HEADER is used for messages that are sent to a transmit queue. Messages in transmit queue are not generally read by applications. MQ uses a transmit queue to send messages from one queue manager to another queue manager in a MQ network. I hope you are not browsing messages in a transmission queue.
For applications, message format would typically depend on the receiving application. For example if the receiving application is CICS based, then format would be MQFMT_CICS, for IMS it would be MQFMT_IMS. If text/string type of data is expected then you could use MQFMT_STRING. For administering MQ using PCF messages, then format can be MQFMT_PCF.

Azure Storage Queue very slow from a worker role in the cloud, but not from my machine

I'm doing a very simple test with queues pointing to the real Azure Storage and, I don't know why, executing the test from my computer is quite faster than deploy the worker role into azure and execute it there. I'm not using Dev Storage when I test locally, my .cscfg is has the connection string to the real storage.
The storage account and the roles are in the same affinity group.
The test is a web role and a worker role. The page tells to the worker what test to do, the the worker do it and returns the time consumed. This specific test meassures how long takes get 1000 messages from an Azure Queue using batches of 32 messages. First, I test running debug with VS, after I deploy the app to Azure and run it from there.
The results are:
From my computer: 34805.6495 ms.
From Azure role: 7956828.2851 ms.
That could mean that is faster to access queues from outside Azure than inside, and that doesn't make sense.
I'm testing like this:
private TestResult InQueueScopeDo(String test, Guid id, Int64 itemCount)
{
CloudStorageAccount account = CloudStorageAccount.Parse(_connectionString);
CloudQueueClient client = account.CreateCloudQueueClient();
CloudQueue queue = client.GetQueueReference(Guid.NewGuid().ToString());
try
{
queue.Create();
PreTestExecute(itemCount, queue);
List<Int64> times = new List<Int64>();
Stopwatch sw = new Stopwatch();
for (Int64 i = 0; i < itemCount; i++)
{
sw.Start();
Boolean valid = ItemTest(i, itemCount, queue);
sw.Stop();
if (valid)
times.Add(sw.ElapsedTicks);
sw.Reset();
}
return new TestResult(id, test + " with " + itemCount.ToString() + " elements", TimeSpan.FromTicks(times.Min()).TotalMilliseconds,
TimeSpan.FromTicks(times.Max()).TotalMilliseconds,
TimeSpan.FromTicks((Int64)Math.Round(times.Average())).TotalMilliseconds);
}
finally
{
queue.Delete();
}
return null;
}
The PreTestExecute puts the 1000 items on the queue with 2048 bytes each.
And this is what happens in the ItemTest method for this test:
Boolean done = false;
public override bool ItemTest(long itemCurrent, long itemCount, CloudQueue queue)
{
if (done)
return false;
CloudQueueMessage[] messages = null;
while ((messages = queue.GetMessages((Int32)itemCount).ToArray()).Any())
{
foreach (var m in messages)
queue.DeleteMessage(m);
}
done = true;
return true;
}
I don't what I'm doing wrong, same code, same connection string and I got these resuts.
Any idea?
UPDATE:
The problem seems to be in the way I calculate it.
I have replaced the times.Add(sw.ElapsedTicks); for times.Add(sw.ElapsedMilliseconds); and this block:
return new TestResult(id, test + " with " + itemCount.ToString() + " elements",
TimeSpan.FromTicks(times.Min()).TotalMilliseconds,
TimeSpan.FromTicks(times.Max()).TotalMilliseconds,
TimeSpan.FromTicks((Int64)Math.Round(times.Average())).TotalMilliseconds);
for this one:
return new TestResult(id, test + " with " + itemCount.ToString() + " elements",
times.Min(),times.Max(),times.Average());
And now the results are similar, so apparently there is a difference in how the precision is handled or something. I will research this later on.
The problem apparently was a issue with different nature of the StopWatch and TimeSpan ticks, as discussed here.
Stopwatch.ElapsedTicks Property
Stopwatch ticks are different from DateTime.Ticks. Each tick in the DateTime.Ticks value represents one 100-nanosecond interval. Each tick in the ElapsedTicks value represents the time interval equal to 1 second divided by the Frequency.
How is your CPU utilization? Is this possible that your code is spiking the CPU and your workstation is much faster than your Azure node?

Resources