Spring Integration timeout with Aggregator in memory - spring

I have noticed (by decompiling the code) that when a timeout is set on an aggregator, the whole message group is held by a future (in memory) rather than in the message store. Under high throughput this sometimes causes "Out Of Memory" errors.
Is there a better way of handling this?
<aggregator input-channel="orderNotificationLoadBalancedExecutorChannelLATAM" output-channel="orderNotificationConverterChannelLATAM"
message-store="orderNotificationGroupStoreLATAM"
send-partial-result-on-expiry="true"
ref="firstOnlyPrimaryKeyMessageAggregator"
method="aggregate"
correlation-strategy-expression="headers['erpKeyMap']['erpKey']"
release-strategy-expression="#this[0].headers['tableName'].topLevel and #this[0].headers['operationType'].operationTypeDelete"
expire-groups-upon-completion="true"
expire-groups-upon-timeout="true"
group-timeout="5000">
</aggregator>

Oops!
Looks like a bug. I've just raised a JIRA on the matter.
The guilty code looks like:
private void scheduleGroupToForceComplete(final MessageGroup messageGroup) {
...
ScheduledFuture<?> scheduledFuture = this.getTaskScheduler()
.schedule(new Runnable() {
@Override
public void run() {
try {
forceReleaseProcessor.processMessageGroup(messageGroup);
}
catch (MessageDeliveryException e) {
if (logger.isDebugEnabled()) {
logger.debug("The MessageGroup [ " + messageGroup +
"] is rescheduled by the reason: " + e.getMessage());
}
scheduleGroupToForceComplete(messageGroup);
}
}
}, new Date(System.currentTimeMillis() + groupTimeout));
So, the ScheduledFuture holds the reference to the final MessageGroup via that inline Runnable callback.
I think we will fix it using only the groupId.
Sorry, there are no workarounds...
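
To illustrate the idea of scheduling by group id only, here is a minimal sketch in plain Java. It is not the Spring Integration code or the actual fix: SimpleGroupStore and GroupTimeoutScheduler are hypothetical names, and the store is just a map. The point it shows is that the scheduled task captures a short id string and looks the group up in the store only when the timeout fires.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a message group store (not the Spring Integration API). */
class SimpleGroupStore {
    private final Map<String, List<String>> groups = new ConcurrentHashMap<>();

    void add(String groupId, String message) {
        groups.computeIfAbsent(groupId, id -> new CopyOnWriteArrayList<>()).add(message);
    }

    /** Removes and returns the group, or null if it is no longer in the store. */
    List<String> remove(String groupId) {
        return groups.remove(groupId);
    }
}

/** Schedules a forced release while holding only the group id in the scheduled task. */
class GroupTimeoutScheduler {
    private final SimpleGroupStore store;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final List<List<String>> released = new CopyOnWriteArrayList<>();

    GroupTimeoutScheduler(SimpleGroupStore store) {
        this.store = store;
    }

    void scheduleForceComplete(String groupId, long timeoutMillis) {
        // The lambda captures the id string only; the messages stay in the store
        // and are looked up when the timeout fires.
        scheduler.schedule(() -> {
            List<String> group = store.remove(groupId);
            if (group != null) {
                released.add(group);
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops accepting new tasks, waits for pending timeouts, returns the released groups. */
    List<List<String>> shutdownAndGetReleased() {
        scheduler.shutdown(); // already-scheduled delayed tasks still run by default
        try {
            scheduler.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return released;
    }
}
```

With this shape, messages that arrive after the timeout was scheduled are still part of the group when it is released, and the scheduler never pins the messages in memory if the store keeps them elsewhere.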

You can set a message store. From the documentation:
A reference to a MessageGroupStore used to store groups of messages
under their correlation key until they are complete. Optional, by
default a volatile in-memory store.

Related

MassTransit Mediator MessageNotConsumedException

I noticed a weird issue in one of our applications: from time to time, we get MessageNotConsumedException errors on API requests which we route via MassTransit's Mediator.
As you will notice below, we have configured a custom LogFilter<T> which implements IFilter<ConsumeContext<T>>. It ensures that we log each mediator message before and after consuming, or a 'ConsumeFailure' log in case an exception is thrown in any consumer.
When the error manifests itself, in the logs we see the following sequence of events:
T 0 : PreConsume logged
T +5ms: PostConsume logged
T +6ms: R-FAULT logged (I believe this logging is made by MT's internals?)
T +9ms: API Request 500 response logged, with `MessageNotConsumedException` as internal error
In the production environment, we see these errors with various timings: they happen in requests taking as 'little' as 9ms, through several seconds, up to 30+ seconds.
I've been trying to reproduce this problem in my local development environment, and did manage to produce the same sequence of events, but only by adding a delay of 35 seconds inside the consumer (see the GetSomethingByIdHandler class below for the consumer body).
If I reduce the delay to 30s or less, the response will be fine.
Since the production errors are happening with very low handling times in the consumer, I suspect what I'm able to reproduce is not exactly the same.
However I'd still like to understand why I'm getting the MessageNotConsumedException, since while debugging I can easily step through my entire consumer (after the delay has elapsed) and happily reach the context.RespondAsync() call without any problems. Also while stepping through the consumer, the context.CancellationToken has not been cancelled.
I also came across this question, which sounds exactly like what I'm having, however I did add the HttpContext scope as documented. To be fair, I didn't try this change in production yet, but my local issue with the 35s delay remains unchanged.
I have the MassTransit mediator configured as follows:
services.AddHttpContextAccessor();
services.AddMediator(x =>
{
x.AddConsumer<GetSomethingByIdHandler>();
x.ConfigureMediator((context, cfg) =>
{
//The order of using the middleware matters, so don't change this
cfg.UseHttpContextScopeFilter(context); // Extension method & friends copy/pasted from https://masstransit-project.com/usage/mediator.html#http-context-scope
cfg.UseConsumeFilter(typeof(LogFilter<>), context);
});
});
The LogFilter which is configured is the following class:
public class LogFilter<T> : IFilter<ConsumeContext<T>> where T : class
{
private readonly ILogger<LogFilter<T>> _logger;
public LogFilter(ILogger<LogFilter<T>> logger)
{
_logger = logger;
}
public void Probe(ProbeContext context) => context.CreateScope("log-filter");
public async Task Send(ConsumeContext<T> context, IPipe<ConsumeContext<T>> next)
{
LogPreConsume(context);
try
{
await next.Send(context);
}
catch (Exception exception)
{
LogConsumeException(context, exception);
throw;
}
LogPostConsume(context);
}
private void LogPreConsume(ConsumeContext context) => _logger.LogInformation(
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with send time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}",
typeof(T).Name,
"PreConsume",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime());
private void LogPostConsume(ConsumeContext context) => _logger.LogInformation(
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with send time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}"
+ " and elapsed time {ElapsedTime}",
typeof(T).Name,
"PostConsume",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime(),
context.ReceiveContext.ElapsedTime);
private void LogConsumeException(ConsumeContext<T> context, Exception exception) => _logger.LogError(exception,
"{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
+ " with sent time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}"
+ " and elapsed time {ElapsedTime}"
+ " and message {@message}",
typeof(T).Name,
"ConsumeFailure",
context.CorrelationId,
context.ReceiveContext.InputAddress,
context.SentTime?.ToUniversalTime(),
context.ReceiveContext.ElapsedTime,
context.Message);
}
I then have a controller method which looks like this:
[Route("[controller]")]
[ApiController]
public class SomethingController : ControllerBase
{
private readonly IMediator _mediator;
public SomethingController(IMediator mediator)
{
_mediator = mediator;
}
[HttpGet("{somethingId}")]
public async Task<IActionResult> GetSomething([FromRoute] int somethingId, CancellationToken ct)
{
var query = new GetSomethingByIdQuery(somethingId);
var response = await _mediator
.CreateRequestClient<GetSomethingByIdQuery>()
.GetResponse<Something>(query, ct);
return Ok(response.Message);
}
}
The consumer which handles this request is as follows:
public record GetSomethingByIdQuery(int SomethingId);
public class GetSomethingByIdHandler : IConsumer<GetSomethingByIdQuery>
{
public async Task Consume(ConsumeContext<GetSomethingByIdQuery> context)
{
await Task.Delay(35000, context.CancellationToken);
await context.RespondAsync(new Something{Name = "Something cool"});
}
}
MessageNotConsumedException is thrown when a message is sent using mediator and that message is not consumed by a consumer. That wouldn't typically be a transient error since one would expect that the consumer remains configured/connected to the mediator for the lifetime of the application.

Pub Sub Messages still in queue but not pulled

I have a simple shell script that connects to GCP and tries to pull Pub/Sub messages from a topic.
When launched, it checks whether any message exists, performs a simple action if so, then acks the message and loops.
It looks like this:
while [ 1 ]
do
gcloud pubsub subscriptions pull...
# Do something
gcloud pubsub subscriptions ack ...
done
Randomly, it does not pull the messages: they stay in the queue and are not pulled.
So we tried adding a retry loop around the pull (something like 5 retries) to avoid those issues. It works better, but not perfectly. I also think it is a bit shabby...
This issue also happened on other projects that were migrated from a shell script to Java (for some other reasons), where we used a pull subscription, and it works perfectly on those projects now!
We are probably doing something wrong, but I don't know what...
I have read that gcloud sometimes pulls fewer messages than what is really on the Pub/Sub queue:
https://cloud.google.com/sdk/gcloud/reference/pubsub/subscriptions/pull
But it should at least pull one... In our case, randomly, no messages are pulled at all.
Is there something to improve here?
In general, relying on a shell script that uses gcloud to retrieve messages and do something with them is not going to be an efficient way to use Cloud Pub/Sub. It is worth noting that the lack of messages being returned in pull is not indicative of a lack of messages; it just means that messages could not be returned before the pull request's deadline. The gcloud subscriptions pull command sets the returnImmediately property (see info in pull documentation) to true, which basically means that if there aren't messages already quickly accessible in memory, then no messages are going to be returned. This property is deprecated and should not be set to true, so that is probably something that we need to explore changing in gcloud.
You would be better off writing a subscriber using the client libraries that sets up a stream and continuously retrieves messages. If your intention is to run this only periodically, then you could write a job that reads messages and waits some time after messages have not been received and shuts down. Again, this would not guarantee that all messages would be consumed that are available, but it would be true in most cases.
A version of this in Java would look like this:
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.util.concurrent.atomic.AtomicLong;
import org.joda.time.DateTime;
/** A basic Pub/Sub subscriber for purposes of demonstrating use of the API. */
public class Subscriber implements MessageReceiver {
private final String PROJECT_NAME = "my-project";
private final String SUBSCRIPTION_NAME = "my-subscription";
private com.google.cloud.pubsub.v1.Subscriber subscriber;
private AtomicLong lastReceivedTimestamp = new AtomicLong(0);
private Subscriber() {
ProjectSubscriptionName subscription =
ProjectSubscriptionName.of(PROJECT_NAME, SUBSCRIPTION_NAME);
com.google.cloud.pubsub.v1.Subscriber.Builder builder =
com.google.cloud.pubsub.v1.Subscriber.newBuilder(subscription, this);
try {
this.subscriber = builder.build();
} catch (Exception e) {
System.out.println("Could not create subscriber: " + e);
System.exit(1);
}
}
@Override
public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
// Process message
lastReceivedTimestamp.set(DateTime.now().getMillis());
consumer.ack();
}
private void run() {
subscriber.startAsync();
while (true) {
long now = DateTime.now().getMillis();
long currentReceived = lastReceivedTimestamp.get();
if (currentReceived > 0 && ((now - currentReceived) > 30000)) {
subscriber.stopAsync();
break;
}
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
System.out.println("Error while waiting for completion: " + e);
}
}
System.out.println("Subscriber has not received message in 30s. Stopping.");
subscriber.awaitTerminated();
}
public static void main(String[] args) {
Subscriber s = new Subscriber();
s.run();
System.exit(0);
}
}

How to call kafkaconsumer api from partition assignor' s implementation

I have implemented my own partition assignment strategy by extending RangeAssignor in my Spring Boot application.
I have overridden its subscriptionUserData method to add some user data. Whenever this data changes, I want to trigger a partition rebalance by invoking the KafkaConsumer API below:
KafkaConsumer API: enforceRebalance
I am not sure how I can get hold of the KafkaConsumer object and invoke this API.
Please suggest.
You can call the consumer.wakeup() function.
consumer.wakeup() is the only consumer method that is safe to call from a different thread. Calling wakeup will cause poll() to exit with WakeupException, or if consumer.wakeup() was called while the thread was not waiting on poll, the exception will be thrown on the next iteration when poll() is called. The WakeupException doesn’t need to be handled, but before exiting the thread, you must call consumer.close(). Closing the consumer will commit offsets if needed and will send the group coordinator a message that the consumer is leaving the group. The consumer coordinator will trigger rebalancing immediately.
Runtime.getRuntime().addShutdownHook(new Thread() {
public void run() {
System.out.println("Starting exit...");
consumer.wakeup(); // 1
try {
mainThread.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
} });
...
Duration timeout = Duration.ofMillis(100);
try {
// looping until ctrl-c, the shutdown hook will cleanup on exit
while (true) {
ConsumerRecords<String, String> records =
movingAvg.consumer.poll(timeout);
System.out.println(System.currentTimeMillis() +
"-- waiting for data...");
for (ConsumerRecord<String, String> record : records) {
System.out.printf("offset = %d, key = %s, value = %s\n",
record.offset(), record.key(), record.value());
}
for (TopicPartition tp: consumer.assignment())
System.out.println("Committing offset at position:" +
consumer.position(tp));
movingAvg.consumer.commitSync();
}
} catch (WakeupException e) {
// ignore for shutdown. // 2
} finally {
consumer.close(); // 3
System.out.println("Closed consumer and we are done");
}
1. ShutdownHook runs in a separate thread, so the only safe action we can take is to call wakeup to break out of the poll loop.
2. Another thread calling wakeup will cause poll to throw a WakeupException. You’ll want to catch the exception to make sure your application doesn’t exit unexpectedly, but there is no need to do anything with it.
3. Before exiting the consumer, make sure you close it cleanly.
full example at:
https://github.com/gwenshap/kafka-examples/blob/master/SimpleMovingAvg/src/main/java/com/shapira/examples/newconsumer/simplemovingavg/SimpleMovingAvgNewConsumer.java

Elasticsearch : AssertionError while getting index name from alias

We have been using an Elasticsearch plugin in our project. While getting the index name from an alias, we get the error below.
Error
{
"error": "AssertionError[Expected current thread[Thread[elasticsearch[Seth][http_server_worker][T#2]{New I/O worker #20},5,main]] to not be a transport thread. Reason: [Blocking operation]]", "status": 500
}
Code
String realIndex = client.admin().cluster().prepareState()
.execute().actionGet().getState().getMetaData()
.aliases().get(aliasName).iterator()
.next().key;
What causes this issue? I googled it but didn't find any help.
From the look of the error, it seems that this operation is not allowed on the transport thread, because it would block the thread until the result comes back. You need to execute it without blocking, for example with a listener:
public String getIndexName() {
final IndexNameHolder result = new IndexNameHolder(); // Holds the index name; a final instance is needed for use inside the listener.
getTransportClient().admin().cluster().prepareState().execute(new ActionListener<ClusterStateResponse>() {
@Override
public void onResponse(ClusterStateResponse response) {
result.indexName = response.getState().getMetaData().aliases().get("alias").iterator().next().key;
}
@Override
public void onFailure(Throwable e) {
// Handle failures
}
});
// The listener runs asynchronously, so indexName may not be set yet when this returns.
return result.indexName;
}
There is another overload of execute(), one which takes a listener. You need to implement your own listener. In my answer, I have used an anonymous implementation of ActionListener.
I hope it helps
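
Because the listener fires later, a method that returns the value directly can hand back an unset result. One way to avoid that, sketched below under assumptions, is to adapt the listener into a CompletableFuture and let the caller continue in a callback instead of blocking. ResultListener is a hypothetical interface that only mirrors the shape of the listener used above (onResponse / onFailure); it is not the Elasticsearch type, and ListenerBridge is an illustrative name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical callback interface with the same shape as the listener above. */
interface ResultListener<T> {
    void onResponse(T response);

    void onFailure(Throwable e);
}

class ListenerBridge {
    /**
     * Starts a listener-based asynchronous call and exposes its outcome as a future.
     * The asyncCall argument receives the listener it should notify.
     */
    static <T> CompletableFuture<T> toFuture(Consumer<ResultListener<T>> asyncCall) {
        CompletableFuture<T> future = new CompletableFuture<>();
        asyncCall.accept(new ResultListener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Throwable e) {
                future.completeExceptionally(e);
            }
        });
        return future;
    }
}
```

A caller could then chain work with thenAccept(...) or exceptionally(...) on the returned future, so nothing waits on the transport thread; calling join() or get() there would reintroduce the blocking operation the assertion complains about.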

Non-Blocking Endpoint: Returning an operation ID to the caller - Would like to get your opinion on my implementation?

Boot Pros,
I recently started programming in Spring Boot and stumbled upon a question on which I would like to get your opinion.
What I try to achieve:
I created a controller that exposes a GET endpoint named nonBlockingEndpoint. This nonBlockingEndpoint executes a fairly long, resource-heavy operation that can run between 20 and 40 seconds (in the attached code, it is mocked by a Thread.sleep()).
Whenever the nonBlockingEndpoint is called, the Spring application should register that call and immediately return an operation ID to the caller.
The caller can then use this ID to query the status of this operation on another endpoint, queryOpStatus. At the beginning the status will be 'started', and once the controller is done serving the request it will be set to a code such as SERVICE_OK. The caller then knows that their request was successfully completed on the server.
The solution that I found:
I have the following controller (note that it is explicitly not tagged with @Async)
It uses an APIOperationsManager to register that a new operation was started
I use the CompletableFuture Java construct to supply the long-running code as a new async process by using CompletableFuture.supplyAsync(() -> {...})
I immediately return a response to the caller, telling them that the operation is in progress
Once the async task has finished, I use cf.thenRun() to update the operation status via the APIOperationsManager
Here is the code:
@GetMapping(path="/nonBlockingEndpoint")
public @ResponseBody ResponseOperation nonBlocking() {
// Register a new operation
APIOperationsManager apiOpsManager = APIOperationsManager.getInstance();
final int operationID = apiOpsManager.registerNewOperation(Constants.OpStatus.PROCESSING);
ResponseOperation response = new ResponseOperation();
response.setMessage("Triggered non-blocking call, use the operation id to check status");
response.setOperationID(operationID);
response.setOpRes(Constants.OpStatus.PROCESSING);
CompletableFuture<Boolean> cf = CompletableFuture.supplyAsync(() -> {
try {
// Here we will
Thread.sleep(10000L);
} catch (InterruptedException e) {}
// whatever the return value was
return true;
});
cf.thenRun(() ->{
// We are done with the super long process, so update our Operations Manager
APIOperationsManager a = APIOperationsManager.getInstance();
boolean asyncSuccess = false;
try {asyncSuccess = cf.get();}
catch (Exception e) {}
if(true == asyncSuccess) {
a.updateOperationStatus(operationID, Constants.OpStatus.OK);
a.updateOperationMessage(operationID, "success: The long running process has finished and this is your result: SOME RESULT" );
}
else {
a.updateOperationStatus(operationID, Constants.OpStatus.INTERNAL_ERROR);
a.updateOperationMessage(operationID, "error: The long running process has failed.");
}
});
return response;
}
Here is also the APIOperationsManager.java for completeness:
public class APIOperationsManager {
private static APIOperationsManager instance = null;
private Vector<Operation> operations;
private int currentOperationId;
private static final Logger log = LoggerFactory.getLogger(Application.class);
protected APIOperationsManager() {}
public static APIOperationsManager getInstance() {
if(instance == null) {
synchronized(APIOperationsManager.class) {
if(instance == null) {
instance = new APIOperationsManager();
instance.operations = new Vector<Operation>();
instance.currentOperationId = 1;
}
}
}
return instance;
}
public synchronized int registerNewOperation(OpStatus status) {
cleanOperationsList();
currentOperationId = currentOperationId + 1;
Operation newOperation = new Operation(currentOperationId, status);
operations.add(newOperation);
log.info("Registered new Operation to watch: " + newOperation.toString());
return newOperation.getId();
}
public synchronized Operation getOperation(int id) {
for(Iterator<Operation> iterator = operations.iterator(); iterator.hasNext();) {
Operation op = iterator.next();
if(op.getId() == id) {
return op;
}
}
Operation notFound = new Operation(-1, OpStatus.INTERNAL_ERROR);
notFound.setCrated(null);
return notFound;
}
public synchronized void updateOperationStatus (int id, OpStatus newStatus) {
iteration : for(Iterator<Operation> iterator = operations.iterator(); iterator.hasNext();) {
Operation op = iterator.next();
if(op.getId() == id) {
op.setStatus(newStatus);
log.info("Updated Operation status: " + op.toString());
break iteration;
}
}
}
public synchronized void updateOperationMessage (int id, String message) {
iteration : for(Iterator<Operation> iterator = operations.iterator(); iterator.hasNext();) {
Operation op = iterator.next();
if(op.getId() == id) {
op.setMessage(message);
log.info("Updated Operation status: " + op.toString());
break iteration;
}
}
}
private synchronized void cleanOperationsList() {
Date now = new Date();
for(Iterator<Operation> iterator = operations.iterator(); iterator.hasNext();) {
Operation op = iterator.next();
if((now.getTime() - op.getCrated().getTime()) >= Constants.MIN_HOLD_DURATION_OPERATIONS ) {
log.info("Removed operation from watchlist: " + op.toString());
iterator.remove();
}
}
}
}
The questions that I have
Is this concept valid, and does it scale? What could be improved?
Will I run into concurrency issues / race conditions?
Is there a better way to achieve the same in Spring Boot that I just haven't found yet (maybe with the @Async annotation)?
I would be very happy to get your feedback.
Thank you so much,
Peter P
It is a valid pattern to submit a long running task with one request, returning an id that allows the client to ask for the result later.
But there are some things I would suggest reconsidering:
Do not use an Integer as the id, as it allows an attacker to guess ids and get the results for those ids. Use a random UUID instead.
If you need to restart your application, all ids and their results will be lost. You should persist them to a database.
Your solution will not work in a cluster with many instances of your application, as each instance would only know its 'own' ids and results. This could also be solved by persisting them to a database or a Redis store.
The way you are using CompletableFuture gives you no control over the number of threads used for the asynchronous operation. It is possible to do this with standard Java, but I would suggest using Spring to configure the thread pool.
Annotating the controller method with @Async is not an option; it will not work. Instead, put all asynchronous operations into a simple service and annotate that with @Async. This has some advantages:
You can use this service also synchronously, which makes testing a lot easier
You can configure the thread pool with Spring
The /nonBlockingEndpoint should not return just the id, but a complete link to queryOpStatus, including the id. The client can then use this link directly without any additional information.
Additionally, there are some low-level implementation issues which you may also want to change:
Do not use Vector; it synchronizes on every operation. Use another List implementation such as ArrayList instead. Iterating over a List is also much easier: you can use for-loops or streams.
If you need to look up a value, do not iterate over a Vector or List; use a Map instead.
APIOperationsManager is a singleton. That makes no sense in a Spring application. Make it a normal POJO, create a bean of it, and have it autowired into the controller. Spring beans are singletons by default.
You should avoid doing complicated operations in a controller method. Instead, move everything into a service (which may be annotated with @Async). This makes testing easier, as you can test this service without a web context.
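
Several of the points above (random UUID ids, a Map for lookups, a bounded thread pool) can be sketched in plain Java as follows. This is only an illustration under assumptions: OperationRegistry and its method names are hypothetical, it keeps state in memory (so it does not address restarts or clustering), and in a Spring application the pool would more likely be configured through Spring and the class registered as a bean.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical in-memory registry of long-running operations. */
class OperationRegistry {
    enum Status { PROCESSING, OK, INTERNAL_ERROR }

    // Map lookup by id instead of iterating over a Vector.
    private final Map<UUID, Status> operations = new ConcurrentHashMap<>();
    // Bounded pool, so the number of worker threads is under our control.
    private final ExecutorService pool;

    OperationRegistry(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Registers the task, starts it on the pool, and returns a random id at once. */
    UUID submit(Supplier<Boolean> task) {
        UUID id = UUID.randomUUID();
        operations.put(id, Status.PROCESSING);
        pool.execute(() -> {
            Status result;
            try {
                result = Boolean.TRUE.equals(task.get()) ? Status.OK : Status.INTERNAL_ERROR;
            } catch (RuntimeException e) {
                result = Status.INTERNAL_ERROR;
            }
            operations.put(id, result);
        });
        return id;
    }

    /** Returns the current status, or null for an unknown id. */
    Status status(UUID id) {
        return operations.get(id);
    }

    /** Stops accepting work and waits for running tasks; true if all finished in time. */
    boolean shutdownAndWait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A controller would call submit(...) and return the id (or a link containing it), while the status endpoint would call status(id).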
Hope this helps.
Do I need to make database access transactional?
As long as you write/update only one row, there is no need to make this transactional, as this is indeed 'atomic'.
If you write/update many rows at once, you should make it transactional to guarantee that either all rows are updated or none.
However, if two operations (maybe from two clients) update the same row, the last one will always win.
