Route lines from file to persistent JMS queue: How to improve performance?

I need some help with performance tuning of a use case. In this use case a Camel route tails status lines in a log file and sends each line as a message to a JMS queue. I have implemented the use case like this:
package tests;

import java.io.File;
import java.net.URI;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.broker.BrokerFactory;
import org.apache.activemq.broker.BrokerService;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.sjms.SjmsComponent;
import org.apache.camel.main.Main;

public class LinesToQueue {
    public static void main(String[] args) throws Exception {
        final File file = new File("data/log.txt");
        final String uri = "tcp://127.0.0.1:61616";

        final BrokerService jmsService = BrokerFactory.createBroker(new URI("broker:" + uri));
        jmsService.start();

        final SjmsComponent jmsComponent = new SjmsComponent();
        jmsComponent.setConnectionFactory(new ActiveMQConnectionFactory(uri));

        final Main main = new Main();
        main.bind("jms", jmsComponent);
        main.addRouteBuilder(new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                fromF("stream:file?fileName=%s&scanStream=true&scanStreamDelay=0", file.getAbsolutePath())
                    .routeId("LinesToQueue")
                    .to("jms:LogLines?synchronous=false");
            }
        });
        main.enableHangupSupport();
        main.run();
    }
}
When I run this use case with a file already filled with 1,000,000 lines, the overall throughput I get in the route is about 313 lines/second. This means that it takes about 55 minutes to process the file.
As a reference I have also created another use case. In this use case a Camel route tails status lines in a log file and sends each line as a document to an Elasticsearch index. I have implemented the use case like this:
package tests;

import java.io.File;

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

public class LinesToIndex {
    public static void main(String[] args) throws Exception {
        final File file = new File("data/log.txt");
        final String uri = "local";

        final Main main = new Main();
        main.addRouteBuilder(new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                fromF("stream:file?fileName=%s&scanStream=true&scanStreamDelay=0", file.getAbsolutePath())
                    .routeId("LinesToIndex")
                    .bean(new LineConverter()) // LineConverter is the author's converter bean (not shown)
                    .toF("elasticsearch://%s?operation=INDEX&indexName=log&indexType=line", uri);
            }
        });
        main.enableHangupSupport();
        main.run();
    }
}
When I run this use case with a file already filled with 1,000,000 lines, the overall throughput I get in the route is about 8333 lines/second. This means that it takes about 2 minutes to process the file.
I understand that there is a huge difference between a JMS queue and an Elasticsearch index, but how can I make the JMS use case above perform better?
Update #1:
It seems that the persistence in the JMS service is the bottleneck in my first use case above. If I disable persistence in the JMS service, the throughput of the route is about 11111 lines/second. Which persistence store for the JMS service will give me better performance?

A couple of things to consider...
ActiveMQ producer connections are expensive; make sure you use a pooled connection factory (see the sketch below).
Consider using the VM transport for an in-process ActiveMQ instance.
Consider using an external ActiveMQ broker over TCP (so it doesn't compete for resources with your test).
Set up/tune KahaDB or LevelDB to optimize persistent storage for your use case.
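As an illustration of the first and last points, here is a minimal sketch of how the question's main() could be adapted. The activemq-pool artifact, the pool size, and disabling journal disk syncs are assumptions for illustration, not tuned recommendations:

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.pool.PooledConnectionFactory;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;
import org.apache.camel.component.sjms.SjmsComponent;

// ...inside main(), before jmsService.start():
// skipping the disk sync on every journal write trades some durability for throughput
final KahaDBPersistenceAdapter kahaDB = new KahaDBPersistenceAdapter();
kahaDB.setEnableJournalDiskSyncs(false);
jmsService.setPersistenceAdapter(kahaDB);

// wrap the connection factory in a pool so producer connections are reused
final PooledConnectionFactory pooled = new PooledConnectionFactory();
pooled.setConnectionFactory(new ActiveMQConnectionFactory(uri));
pooled.setMaxConnections(8); // illustrative value

final SjmsComponent jmsComponent = new SjmsComponent();
jmsComponent.setConnectionFactory(pooled);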

Related

Apache ActiveMQ Artemis transform TextMessage to ObjectMessage

I have a use case where I need to convert a message from one type to another (i.e. TextMessage -> ObjectMessage).
I found that when diverting between queues there is an option to transform the message. I have implemented the Transformer interface as instructed in the documentation.
import org.apache.activemq.artemis.api.core.Message;
import org.apache.activemq.artemis.core.server.transformer.Transformer;

import javax.jms.ObjectMessage;

public class TypeTransformer implements Transformer {
    @Override
    public Message transform(Message message) {
        return message;
    }
}
But I am now beginning to realize that it might be impossible to convert an org.apache.activemq.artemis.api.core.Message to a javax.jms.ObjectMessage.
Is this right? Can it not be done, or is there some other way?
It should technically be possible to convert a javax.jms.TextMessage to a javax.jms.ObjectMessage, but it may be cumbersome. Here are some important things to note:
javax.jms.TextMessage, javax.jms.ObjectMessage, and org.apache.activemq.artemis.api.core.Message are all just interfaces. The javax.jms interfaces are what you use on the client, and org.apache.activemq.artemis.api.core.Message is what is used on the broker. The data for each type of message is stored differently in the underlying message implementation.
The class for the Java object that you wish to put into the ObjectMessage will need to be on the broker's classpath, because the transformer runs on the broker. This isn't required under normal circumstances, as the broker itself never serializes or deserializes the object.
You should really try to avoid ObjectMessage whenever possible. ObjectMessage objects depend on Java serialization to marshal and unmarshal their object payload. This process is generally considered unsafe (and slow!), because a malicious payload can exploit the host system. Lots of CVEs have been created for this. For this reason, most JMS providers force users to explicitly whitelist packages that can be exchanged using ObjectMessage messages. For example, here's the related documentation for ActiveMQ Artemis. There are a number of other issues with using JMS ObjectMessage not related to security that you should read about.
Granted you understand all that, you should be able to convert the message using code something like this:
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.activemq.artemis.api.core.ICoreMessage;
import org.apache.activemq.artemis.api.core.Message;
import org.apache.activemq.artemis.api.core.SimpleString;
import org.apache.activemq.artemis.core.server.transformer.Transformer;

public class TypeTransformer implements Transformer {
    @Override
    public Message transform(Message message) {
        ICoreMessage coreMessage = message.toCore();
        try {
            // get the data from the TextMessage
            SimpleString mySimpleString = coreMessage.getBodyBuffer().readNullableSimpleString();
            if (mySimpleString == null) {
                // no text in the message so no transformation can be done
                return message;
            }
            String myString = mySimpleString.toString();

            // parse the data from the TextMessage and set it on the serializable object
            // (MySerializable is a placeholder for your own Serializable class)
            Serializable object = new MySerializable();

            // turn the serializable object into a byte array and write it to the message
            ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
            ObjectOutputStream oos = new ObjectOutputStream(baos);
            oos.writeObject(object);
            oos.flush();
            byte[] data = baos.toByteArray();
            coreMessage.getBodyBuffer().clear();
            coreMessage.getBodyBuffer().writeInt(data.length);
            coreMessage.getBodyBuffer().writeBytes(data);
            coreMessage.setType(Message.OBJECT_TYPE);
            return coreMessage;
        } catch (Exception e) {
            e.printStackTrace();
            return message;
        }
    }
}

How to distribute workload to many computers and do scatter-gather scenarios with Kafka Streams?

I am new to Kafka Streams and Alpakka Kafka.
Problem: I have been using the Java ExecutorService to run parallel jobs and, when ALL of them are done, marking the entire process done. The issues are fault tolerance, high availability, and not utilizing all compute nodes to do the work: it uses just ONE host JVM to do the work.
We have Apache Kafka as infrastructure, so I was wondering how I can use Kafka Streams to do scatter-gather, or just distribute the child tasks of a use case across workers and then gather the results or get an indication that all tasks are done.
Any pointer to sample work on scatter-gather or fork-join with Kafka Streams or Alpakka Kafka would be great.
Here is a Sample:
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

public class Main {
    private static final ExecutorService executorService = Executors.newFixedThreadPool(15);

    public static void main(String[] args) throws Exception {
        final WebClient webClient = WebClient.builder().build();
        List<CompletableFuture<String>> allTasks = new LinkedList<>();
        String[] urls = {"http://test1", "http://test2", "http://test3"};
        // Distribute the work (WebClient can do async, but I wanted to just give an example).
        for (final String url : urls) {
            CompletableFuture<String> task = CompletableFuture.supplyAsync(() -> {
                // SOME task; just for example I have put a GET call, it could be anything
                String response = webClient.get().uri(url).accept(MediaType.APPLICATION_JSON)
                        .retrieve().bodyToMono(String.class).block();
                return response;
            }, executorService);
            allTasks.add(task);
        }
        // wait for all to be done (join)
        CompletableFuture.allOf(allTasks.toArray(new CompletableFuture[]{})).join();
        for (CompletableFuture<String> task : allTasks) {
            processResponse(task.get());
        }
    }

    public static void processResponse(String response) {
        System.out.println(response);
    }
}
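For reference, the gather half of such a scatter-gather can be done with the plain Kafka consumer: workers publish one record per completed child task to a results topic keyed by a job id, and a gatherer counts them until all are in. A minimal sketch, where the "results" topic, the jobId key, and the expected-count convention are all assumptions for illustration:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Gatherer {
    // blocks until `expected` results for `jobId` have arrived on the results topic
    public static void awaitJob(String jobId, int expected) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "gatherer-" + jobId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("results"));
            int seen = 0;
            while (seen < expected) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    if (jobId.equals(record.key())) {
                        seen++; // one child task finished
                    }
                }
            }
        }
    }
}

The scatter half is symmetric: a KafkaProducer publishes one record per child task to a tasks topic, and any number of worker processes in the same consumer group share that topic's partitions, which is what gives fault tolerance and spreads the work across hosts.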

Spring Boot IBM Queue - Discover all Destinations

I am writing a small Spring Boot application that is supposed to monitor queues on an external IBM MQ installation.
I am able to connect via MQXAQueueConnectionFactory, but I have not found a way to discover all remote queues/destinations on that host programmatically. I don't want to hard-code them.
How can I get a list of all existing queues in order to add listeners? I have to mention that access via the REST API is not possible because this feature has been disabled by the administrators.
You can use the IBM MQ Programmable Command Formats (PCF). If you installed the IBM MQ samples, tools/pcf/samples/PCF_DisplayActiveLocalQueues.java gives you an idea for your use case.
Here is how I use it in my unit tests to find all the queues with messages:
import java.io.IOException;

import com.ibm.mq.MQException;
import com.ibm.mq.MQGetMessageOptions;
import com.ibm.mq.MQMessage;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;
import com.ibm.mq.constants.CMQC;
import com.ibm.mq.constants.CMQCFC;
import com.ibm.mq.constants.MQConstants;
import com.ibm.mq.headers.MQDataException;
import com.ibm.mq.headers.pcf.PCFMessage;
import com.ibm.mq.headers.pcf.PCFMessageAgent;

public class MqUtils {
    public static void queuesWithMessages(MQQueueManager qmgr) {
        try {
            PCFMessageAgent agent = new PCFMessageAgent(qmgr);
            try {
                PCFMessage request = new PCFMessage(CMQCFC.MQCMD_INQUIRE_Q);
                // NOTE: You cannot use a queue name pattern like "FOO.*" together with
                // the "addFilterParameter" method. This is a limitation of PCF messages.
                // If you want to filter on queue names, you have to do it in the
                // for loop after sending the PCF message.
                request.addParameter(CMQC.MQCA_Q_NAME, "*");
                request.addParameter(CMQC.MQIA_Q_TYPE, MQConstants.MQQT_LOCAL);
                request.addFilterParameter(CMQC.MQIA_CURRENT_Q_DEPTH, CMQCFC.MQCFOP_GREATER, 0);
                for (PCFMessage response : agent.send(request)) {
                    String queueName = (String) response.getParameterValue(CMQC.MQCA_Q_NAME);
                    if (queueName == null
                            || queueName.startsWith("SYSTEM")
                            || queueName.startsWith("AMQ")) {
                        continue;
                    }
                    Integer queueDepth = (Integer) response.getParameterValue(CMQC.MQIA_CURRENT_Q_DEPTH);
                    // Do something with this queue that has messages
                }
            } catch (MQException | IOException e) {
                throw new RuntimeException(e);
            } finally {
                agent.disconnect();
            }
        } catch (MQDataException e) {
            throw new RuntimeException(e);
        }
    }
}
And this should give you ideas on how to configure the MQQueueManager (see also the IBM docs):
import com.ibm.mq.MQEnvironment;
import com.ibm.mq.MQException;
import com.ibm.mq.MQQueueManager;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
static class MQConfig {
    @Bean(destroyMethod = "disconnect")
    public MQQueueManager mqQueueManager() throws MQException {
        MQEnvironment.hostname = "the.host.com";
        MQEnvironment.port = 1415;
        MQEnvironment.channel = "xxx.CL.FIX";
        return new MQQueueManager("xxx");
    }
}
The chapter Using with IBM MQ classes for JMS explains how you can use PCF messages in pure JMS.

AWS Lambda Java - Implement simple cache to read a file

I have a Lambda process in Java that reads a JSON file with a table every time it is triggered. I'd like to implement a kind of cache to keep that file in memory, and I wonder how to do something simple. I don't want to use ElastiCache or Redis.
I read about something similar to my approach in JavaScript, declaring a global variable with let, but I'm not sure how to do it in Java: where it should be declared and how to test it. Any idea or example you can provide? Thanks
There are global variables in Lambda which can be of help, but they have to be used wisely.
They are usually the variables declared outside of the lambda handler.
There are pros and cons of using them.
You can't rely on this behavior, but you must be aware it exists: when you call your Lambda function several times, you MIGHT get the same container, to optimise run duration and setup delay (see "Use of Global Variables" in the docs).
At the same time you should be aware of the issues and avoid wrong use of it (see "caching issues").
If you don't want to use ElastiCache/Redis then I guess you have very few options left: maybe DynamoDB or S3, that's all I can think of.
Again, the connection to DynamoDB or S3 can be cached here. It won't be as fast as ElastiCache though.
In Java it's not too hard to do. Just create your cache outside of the handler:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.LambdaLogger;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SampleHandler implements RequestStreamHandler {
    private static final Logger logger = LogManager.getLogger(SampleHandler.class);
    private static Map<String, String> theCache = null;

    public SampleHandler() {
        logger.info("filling cache...");
        theCache = new HashMap<>();
        theCache.put("key1", "value1");
        theCache.put("key2", "value2");
        theCache.put("key3", "value3");
        theCache.put("key4", "value4");
        theCache.put("key5", "value5");
    }

    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        logger.info("handlingRequest");
        LambdaLogger lambdaLogger = context.getLogger();

        ObjectMapper objectMapper = new ObjectMapper();
        JsonNode jsonNode = objectMapper.readTree(inputStream);
        String requestedKey = jsonNode.get("requestedKey").asText();

        if (theCache.containsKey(requestedKey)) {
            // read from the cache
            String result = "{\"requestedValue\": \"" + theCache.get(requestedKey) + "\"}";
            outputStream.write(result.getBytes());
        }
        logger.info("done with run, remaining time in ms is " + context.getRemainingTimeInMillis());
    }
}
(run with the AWS CLI via aws lambda invoke --function-name lambda-cache-test --payload '{"requestedKey":"key4"}' out, with the output going to the file out)
When this runs with a "cold start" you'll see the "filling cache..." message and then the "handlingRequest" in the CloudWatch log. As long as the Lambda is kept "warm" you will not see the cache message again.
Note that if you had hundreds of instances of the same Lambda running, they would all have their own independent cache. Ultimately this does what you want though - it's a lazy load of the cache during a cold start, and the cache is reused for warm calls.

How to refresh the key and value in cache after they are expired in Guava (Spring)

So, I was looking at caching methods in Java (Spring), and Guava looked like it would serve the purpose.
This is the use case:
I query for some data from a remote service. It is a kind of configuration field for my application. This field is used by every inbound request to my application, and it would be expensive to call the remote service every time, as the value is essentially constant but changes periodically.
So, on the first inbound request to my application, when I call the remote service, I cache the value. I set the expiry time of this cache to 30 mins. After 30 mins, when the cache has expired and there is a request to retrieve the key, I would like a callback or something to call the remote service, set the cache, and return the value for that key.
How can I do it in Guava cache?
Here I give an example of how to use the Guava cache. If you want the removal listener to be invoked, you need to call cleanUp. Here I run a thread which calls cleanUp every 30 minutes.
import com.google.common.cache.*;
import org.springframework.stereotype.Component;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

@Component
public class Cache {
    public static LoadingCache<String, String> REQUIRED_CACHE;

    public Cache() {
        RemovalListener<String, String> REMOVAL_LISTENER = new RemovalListener<String, String>() {
            @Override
            public void onRemoval(RemovalNotification<String, String> notification) {
                if (notification.getCause() == RemovalCause.EXPIRED) {
                    // do as per your requirement
                }
            }
        };

        CacheLoader<String, String> LOADER = new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws Exception {
                return null; // return as per your requirement if the key's value is not found
            }
        };

        REQUIRED_CACHE = CacheBuilder.newBuilder().maximumSize(100000000)
                .expireAfterWrite(30, TimeUnit.MINUTES)
                .removalListener(REMOVAL_LISTENER)
                .build(LOADER);

        Executors.newSingleThreadExecutor().submit(() -> {
            while (true) {
                REQUIRED_CACHE.cleanUp(); // need to call cleanUp for the removal listener
                TimeUnit.MINUTES.sleep(30L);
            }
        });
    }
}
put & get data:
Cache.REQUIRED_CACHE.get("key");
Cache.REQUIRED_CACHE.put("key","value");
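To get the refresh-on-expiry behaviour asked for in the question, the remote call can go inside the loader itself: Guava invokes load transparently on the first get for a key and again whenever an expired key is requested. A minimal sketch, where RemoteConfigService and its fetch method are hypothetical stand-ins for the real remote call:

import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class ConfigCache {

    // hypothetical stand-in for the remote service described in the question
    interface RemoteConfigService {
        String fetch(String key);
    }

    private final RemoteConfigService remote;

    private final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
            .expireAfterWrite(30, TimeUnit.MINUTES)
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    // called on the first get and again after the entry has expired
                    return remote.fetch(key);
                }
            });

    public ConfigCache(RemoteConfigService remote) {
        this.remote = remote;
    }

    public String get(String key) {
        return cache.getUnchecked(key); // triggers load() when absent or expired
    }
}

Guava also offers refreshAfterWrite together with an overridden CacheLoader.reload if you prefer the value to be refreshed asynchronously instead of blocking the first caller after expiry.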
