Spring State Machine task execution not firing

I am having issues with getting a runnable to run in the manner described in the following reference:
http://docs.spring.io/autorepo/docs/spring-statemachine/1.0.0.M3/reference/htmlsingle/#statemachine-examples-tasks
TasksHandler handler = TasksHandler.builder()
        .task("1", sleepRunnable())
        .task("2", sleepRunnable())
        .task("3", sleepRunnable())
        .build();
handler.runTasks();
My implementation looks like this:
private Action<States, Events> getUnlockedAction() {
    return new Action() {
        @Override
        public void execute(StateContext sc) {
            System.out.println("in action..");
            handler = new TasksHandler.Builder()
                    .taskExecutor(taskExecutor())
                    .task("1", dp.runProcess(1))
                    .build();
            handler.addTasksListener(new MyTasksListener());
            handler.runTasks();
            System.out.println("after action..");
        }
    };
}
The initialization for the TaskExecutor looks like this:
@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor te = new ThreadPoolTaskExecutor();
    te.setMaxPoolSize(50);
    te.setThreadNamePrefix("LULExecutor-");
    te.setCorePoolSize(25);
    te.initialize();
    return te;
}
My code for dp (DataProcessor) looks like this:
@Component
@Qualifier("dataProcessor")
public class ADataProcessor {

    public Runnable runProcess(final int i) {
        return new Runnable() {
            @Async
            @Override
            public void run() {
                long delay = (long) ((Math.random() * 10) + 1) * 1000;
                System.out.println("In thread " + i + "... sleep for " + delay);
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ex) {
                    Logger.getLogger(FSMFactoryConfig.class.getName()).log(Level.SEVERE, null, ex);
                }
                System.out.println("After thread " + i + "...");
            }
        };
    }
}
When I execute my code, I see the 'in action..' and 'after action..' messages with no delay in between.
When I use the following:
taskExecutor().execute(dp.runProcess(1));
taskExecutor().execute(dp.runProcess(2));
taskExecutor().execute(dp.runProcess(3));
taskExecutor().execute(dp.runProcess(4));
taskExecutor().execute(dp.runProcess(5));
taskExecutor().execute(dp.runProcess(6));
I get what I would expect from using the TasksHandler:
state changed to UNLOCKED
In thread 2... sleep for 10000
In thread 3... sleep for 5000
In thread 4... sleep for 8000
In thread 5... sleep for 4000
In thread 6... sleep for 4000
In thread 1... sleep for 9000
Jan 13, 2016 12:32:13 PM - org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor initialize
INFO: Initializing ExecutorService
state changed to LOCKED
After thread 5...
After thread 6...
After thread 3...
After thread 4...
After thread 1...
After thread 2...
None of the messages before or after the sleep delay are displayed when using the TasksHandler. So my question: how do I actually execute my runnable? If I'm doing it correctly, what should I check?

I think you've slightly misunderstood a few things. First, you're linking to the tasks sample that contained the original idea, which was later turned into a tasks recipe. It's also worth looking at the unit tests for the tasks recipe.
You register runnables with the TasksHandler, get a state machine from it, start that machine, and then tell the handler to run the tasks.
I now realize that the docs should probably be a bit clearer about this usage.

After adding all tasks to the handler, I had to start the state machine before invoking runTasks().
handler.getStateMachine().startReactively().block();
handler.runTasks();
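Putting it together, the action from the question would then look something like this. This is only a sketch against the recipe API, reusing the handler field, dp, MyTasksListener, and taskExecutor() from the question; on the 1.x line the start call is handler.getStateMachine().start() instead of startReactively():

private Action<States, Events> getUnlockedAction() {
    return new Action() {
        @Override
        public void execute(StateContext sc) {
            handler = TasksHandler.builder()
                    .taskExecutor(taskExecutor())
                    .task("1", dp.runProcess(1))
                    .build();
            handler.addTasksListener(new MyTasksListener());
            // Start the handler's state machine first; runTasks() only sends
            // events into it, so nothing fires while the machine is stopped.
            handler.getStateMachine().startReactively().block();
            handler.runTasks();
        }
    };
}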

MassTransit Mediator MessageNotConsumedException

I noticed a weird issue in one of our applications: from time to time, we get MessageNotConsumedException errors on API requests that we route via MT's Mediator.
As you will notice below, we have configured a custom LogFilter<T>, which implements IFilter<ConsumeContext<T>> and ensures that we log each mediator message before and after consuming, or log a 'ConsumeFailed' entry in case an exception is thrown in any consumer.
When the error manifests itself, in the logs we see the following sequence of events:
T +0ms: PreConsume logged
T +5ms: PostConsume logged
T +6ms: R-FAULT logged (I believe this logging is made by MT's internals?)
T +9ms: API Request 500 response logged, with MessageNotConsumedException as internal error
In the production environment, we see these errors with various timings, it happens in requests taking as 'little' as 9ms, over several seconds up to 30+ seconds.
I've been trying to reproduce this problem in my local development environment, and did manage to produce the same sequence of events, but only by adding a delay of 35 seconds inside the consumer (see the GetSomethingByIdHandler class below for the consumer body).
If I reduce the delay to 30s or less, the response will be fine.
Since the production errors are happening with very low handling times in the consumer, I suspect what I'm able to reproduce is not exactly the same.
However I'd still like to understand why I'm getting the MessageNotConsumedException, since while debugging I can easily step through my entire consumer (after the delay has elapsed) and happily reach the context.RespondAsync() call without any problems. Also while stepping through the consumer, the context.CancellationToken has not been cancelled.
I also came across this question, which sounds exactly like what I'm having, however I did add the HttpContext scope as documented. To be fair, I didn't try this change in production yet, but my local issue with the 35s delay remains unchanged.
I have the MassTransit mediator configured as follows:
services.AddHttpContextAccessor();
services.AddMediator(x =>
{
    x.AddConsumer<GetSomethingByIdHandler>();
    x.ConfigureMediator((context, cfg) =>
    {
        // The order of using the middleware matters, so don't change this
        cfg.UseHttpContextScopeFilter(context); // Extension method & friends copy/pasted from https://masstransit-project.com/usage/mediator.html#http-context-scope
        cfg.UseConsumeFilter(typeof(LogFilter<>), context);
    });
});
The LogFilter which is configured is the following class:
public class LogFilter<T> : IFilter<ConsumeContext<T>> where T : class
{
    private readonly ILogger<LogFilter<T>> _logger;

    public LogFilter(ILogger<LogFilter<T>> logger)
    {
        _logger = logger;
    }

    public void Probe(ProbeContext context) => context.CreateScope("log-filter");

    public async Task Send(ConsumeContext<T> context, IPipe<ConsumeContext<T>> next)
    {
        LogPreConsume(context);
        try
        {
            await next.Send(context);
        }
        catch (Exception exception)
        {
            LogConsumeException(context, exception);
            throw;
        }
        LogPostConsume(context);
    }

    private void LogPreConsume(ConsumeContext context) => _logger.LogInformation(
        "{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
        + " with send time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}",
        typeof(T).Name,
        "PreConsume",
        context.CorrelationId,
        context.ReceiveContext.InputAddress,
        context.SentTime?.ToUniversalTime());

    private void LogPostConsume(ConsumeContext context) => _logger.LogInformation(
        "{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
        + " with send time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}"
        + " and elapsed time {ElapsedTime}",
        typeof(T).Name,
        "PostConsume",
        context.CorrelationId,
        context.ReceiveContext.InputAddress,
        context.SentTime?.ToUniversalTime(),
        context.ReceiveContext.ElapsedTime);

    private void LogConsumeException(ConsumeContext<T> context, Exception exception) => _logger.LogError(exception,
        "{MessageType}:{EventType} correlated by {CorrelationId} on {Address}"
        + " with sent time {SentTime:dd/MM/yyyy HH:mm:ss:ffff}"
        + " and elapsed time {ElapsedTime}"
        + " and message {@message}",
        typeof(T).Name,
        "ConsumeFailure",
        context.CorrelationId,
        context.ReceiveContext.InputAddress,
        context.SentTime?.ToUniversalTime(),
        context.ReceiveContext.ElapsedTime,
        context.Message);
}
I then have a controller method which looks like this:
[Route("[controller]")]
[ApiController]
public class SomethingController : ControllerBase
{
    private readonly IMediator _mediator;

    public SomethingController(IMediator mediator)
    {
        _mediator = mediator;
    }

    [HttpGet("{somethingId}")]
    public async Task<IActionResult> GetSomething([FromRoute] int somethingId, CancellationToken ct)
    {
        var query = new GetSomethingByIdQuery(somethingId);
        var response = await _mediator
            .CreateRequestClient<GetSomethingByIdQuery>()
            .GetResponse<Something>(query, ct);
        return Ok(response.Message);
    }
}
The consumer which handles this request is as follows:
public record GetSomethingByIdQuery(int SomethingId);

public class GetSomethingByIdHandler : IConsumer<GetSomethingByIdQuery>
{
    public async Task Consume(ConsumeContext<GetSomethingByIdQuery> context)
    {
        await Task.Delay(35000, context.CancellationToken);
        await context.RespondAsync(new Something { Name = "Something cool" });
    }
}
MessageNotConsumedException is thrown when a message is sent using mediator and that message is not consumed by a consumer. That wouldn't typically be a transient error, since one would expect that the consumer remains configured/connected to the mediator for the lifetime of the application. Note also that the request client's default timeout is 30 seconds, which lines up with your local repro failing at a 35-second delay but succeeding at 30s or less; a longer timeout can be passed when creating the client, e.g. CreateRequestClient<GetSomethingByIdQuery>(RequestTimeout.After(m: 1)).

How to call KafkaConsumer API from a partition assignor's implementation

I have implemented my own partition assignment strategy by extending RangeAssignor in my Spring Boot application.
I have overridden its subscriptionUserData method to add some user data. Whenever this data changes, I want to trigger a partition rebalance by invoking the KafkaConsumer API below:
KafkaConsumer#enforceRebalance()
I am not sure how I can get hold of the KafkaConsumer object to invoke this API.
Please suggest.
You can call the consumer.wakeup() function.
consumer.wakeup() is the only consumer method that is safe to call from a different thread. Calling wakeup will cause poll() to exit with WakeupException, or if consumer.wakeup() was called while the thread was not waiting on poll, the exception will be thrown on the next iteration when poll() is called. The WakeupException doesn't need to be handled, but before exiting the thread, you must call consumer.close(). Closing the consumer will commit offsets if needed and will send the group coordinator a message that the consumer is leaving the group. The consumer coordinator will trigger rebalancing immediately.
Runtime.getRuntime().addShutdownHook(new Thread() {
    public void run() {
        System.out.println("Starting exit...");
        consumer.wakeup(); // (1)
        try {
            mainThread.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
});

...

Duration timeout = Duration.ofMillis(100);
try {
    // looping until ctrl-c, the shutdown hook will cleanup on exit
    while (true) {
        ConsumerRecords<String, String> records = movingAvg.consumer.poll(timeout);
        System.out.println(System.currentTimeMillis() + "-- waiting for data...");
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, key = %s, value = %s\n",
                    record.offset(), record.key(), record.value());
        }
        for (TopicPartition tp : consumer.assignment()) {
            System.out.println("Committing offset at position:" + consumer.position(tp));
        }
        movingAvg.consumer.commitSync();
    }
} catch (WakeupException e) {
    // ignore for shutdown. (2)
} finally {
    consumer.close(); // (3)
    System.out.println("Closed consumer and we are done");
}
(1) The shutdown hook runs in a separate thread, so the only safe action we can take is to call wakeup to break out of the poll loop.
(2) Another thread calling wakeup will cause poll to throw a WakeupException. You'll want to catch the exception to make sure your application doesn't exit unexpectedly, but there is no need to do anything with it.
(3) Before exiting the consumer, make sure you close it cleanly.
full example at:
https://github.com/gwenshap/kafka-examples/blob/master/SimpleMovingAvg/src/main/java/com/shapira/examples/newconsumer/simplemovingavg/SimpleMovingAvgNewConsumer.java
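That said, if your client version is new enough to include the KafkaConsumer#enforceRebalance() API the question links to (added by KIP-568), you can call it directly, as long as it is invoked from the thread that owns the consumer. A minimal sketch, assuming your custom assignor can share a flag with the poll loop (the class name, topic, and wiring here are made up for illustration):

import java.time.Duration;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RebalanceOnUserDataChange {

    // Flag the custom assignor flips whenever its subscription user data
    // changes (hypothetical wiring: assignor and poll loop share this).
    static final AtomicBoolean USER_DATA_CHANGED = new AtomicBoolean(false);

    public static void runLoop(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(List.of("my-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                // ... process record ...
            }
            // KafkaConsumer is not thread-safe; unlike wakeup(),
            // enforceRebalance() must be called from this polling thread.
            if (USER_DATA_CHANGED.compareAndSet(true, false)) {
                consumer.enforceRebalance();
            }
        }
    }
}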

Apache Storm spout stops emitting messages from spout

We have been struggling with this issue for a long time now. In short, our Storm topology stops emitting messages from the spout after some time, in a random fashion. We have an automated script which re-deploys the topology at 06:00 UTC every day, after the master data refresh activity is complete.
In the last 2 weeks, our topology stopped emitting messages three times, in late UTC hours (between 22:00 and 02:00). It only comes back online when we restart it, which happens around 06:00 UTC.
I've searched for many answers & blogs but couldn't find out what's happening here. We have an un-anchored topology, which is a choice we made some 3-4 years ago. We started with 0.9.2 and now we are on 1.1.0.
I've checked all kinds of logs and I'm 100% sure that the nextTuple() method for the controller spout is not getting called, and there are no exceptions happening in the system which might cause this. I've also checked all the logs we accumulate, and there is not a single ERROR or WARN log explaining the abrupt stoppage. The INFO logs are not that helpful either. There is nothing in the worker, supervisor, or nimbus logs that can be connected to this issue.
This is how our spout class looks:
Controller.java
public class Controller implements IRichSpout {

    SpoutOutputCollector _collector;
    Calendar LAST_RUN = null;
    List<ControllerMessage> msgList;

    /**
     * It is to open the spout
     */
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        msgList = new ArrayList<ControllerMessage>();
        MongoIndexingHandler mongoIndexingHandler = new MongoIndexingHandler();
        mongoIndexingHandler.createMongoIndexes();
    }

    /**
     * It executes the next tuple
     */
    @Override
    public void nextTuple() {
        Map<String, Object> logMap = new HashMap<>();
        logMap.put("BEGIN", new Date());
        try {
            TriggerHandler thandler = new TriggerHandler();
            if (msgList.size() == 0) {
                List<ControllerMessage> mList = thandler.getControllerMessage(new Date());
                msgList = mList;
            }
            if (msgList.size() > 0) {
                ControllerMessage message = msgList.get(0);
                if (thandler.fire(message.getFireTime())) {
                    Util.log(message, "CONTROLLER_LOGS", message.getTime(), new Date());
                    msgList.remove(0);
                    _collector.emit(new Values(message));
                }
            } else {
                Utils.sleep(1000);
            }
        } catch (Exception e) {
            _collector.reportError(e);
            Util.exLog(e, "EXECUTOR_ERROR", new Date(), "nextTuple()", Controller.class);
        }
    }

    /**
     * It acknowledges the messages
     */
    @Override
    public void ack(Object id) {
    }

    /**
     * It tells failed messages
     */
    @Override
    public void fail(Object id) {
    }

    /**
     * It declares the message name
     */
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("SPOUT_MESSAGE"));
    }

    @Override
    public void activate() {
    }

    @Override
    public void close() {
    }

    @Override
    public void deactivate() {
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
and this is the topology class: DiagnosticTopology.java
public class DiagnosticTopology {
    public static void main(String[] args) throws Exception {
        int gSize = (null != args && args.length > 0) ? Integer.parseInt(args[0]) : 2;
        int sSize = (null != args && args.length > 1) ? Integer.parseInt(args[1]) : 128;
        int sMSize = (null != args && args.length > 2) ? Integer.parseInt(args[2]) : 16;
        int aGSize = (null != args && args.length > 3) ? Integer.parseInt(args[3]) : 16;
        int rSize = (null != args && args.length > 4) ? Integer.parseInt(args[4]) : 64;
        int rMSize = (null != args && args.length > 5) ? Integer.parseInt(args[5]) : 16;
        int dMSize = (null != args && args.length > 6) ? Integer.parseInt(args[6]) : 8;
        int wSize = (null != args && args.length > 7) ? Integer.parseInt(args[7]) : 16;
        String topologyName = (null != args && args.length > 8) ? args[8] : "DIAGNOSTIC";

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("controller", new Controller(), 1);
        builder.setBolt("generator", new GeneratorBolt(), gSize).shuffleGrouping("controller");
        builder.setBolt("scraping", new ScrapingBolt(), sSize).shuffleGrouping("generator");
        builder.setBolt("smongo", new MongoBolt(), sMSize).shuffleGrouping("scraping");
        builder.setBolt("aggregation", new AggregationBolt(), aGSize).shuffleGrouping("scraping");
        builder.setBolt("rule", new RuleBolt(), rSize).shuffleGrouping("smongo");
        builder.setBolt("rmongo", new RMongoBolt(), rMSize).shuffleGrouping("rule");
        builder.setBolt("dstatus", new DeviceStatusBolt(), dMSize).shuffleGrouping("rule");
        builder.setSpout("trigger", new TriggerSpout(), 1);
        builder.setBolt("job", new JobTriggerBolt(), 4).shuffleGrouping("trigger");

        Config conf = new Config();
        conf.setDebug(false);
        conf.setNumWorkers(wSize);
        StormSubmitter.submitTopologyWithProgressBar(topologyName, conf, builder.createTopology());
    }
}
We have fairly good servers (Xeon, 8 cores, 32 GB, and flash drives) in place for the production as well as the testing environment, and there are no external factors that could cause this issue, as exception handling is everywhere in the code.
When this happens, it seems like everything stopped all of a sudden, and there are no traces of why it happened.
Any help is highly appreciated!
I don't know what is causing your issue, but I'd recommend that you start by checking whether upgrading to the latest Storm version resolves it. I know of at least two issues related to worker threads dying and not coming back up: https://issues.apache.org/jira/browse/STORM-1750 and https://issues.apache.org/jira/browse/STORM-2194. STORM-1750 is fixed in 1.1.0, but STORM-2194 is not fixed until 1.1.1.
In case upgrading doesn't fix the issue for you, you might be able to debug it by doing the following.
Next time your topology is hanging, go open Storm UI and find your spout. It'll show the list of executors running that spout, along with which workers are responsible for running them. Pick one of the workers where the spout executor isn't emitting anything. Open a shell on the machine running that worker, and find the worker JVM's process id. You can do this easily with jps -m.
Example output showing the worker JVM with port 6701 on my local machine, which has pid 7592:
7592 Worker test-2-1520361882 d24dc55d-76c7-4cc6-93fa-2663fcdcb1ba-10.0.75.1 6701 f7b6f8e4-6c87-47ca-a7b7-655009b6c62a
Trigger a thread dump by doing kill -3 <pid>, or use jstack <pid> if you prefer.
In the thread dump, you should be able to find the executor thread that's hanging. For instance, when I do a thread dump for a topology with a spout called "word", where one of the spout executors has number 13, I see
edit: Stack overflow won't let me post the stack trace because the heuristic looking for unformatted code is bad. I've spent probably as long trying to post the stack trace as writing the original answer, so I can't be bothered to keep trying. Here's the trace that should have been here https://pastebin.com/2Sz5kkQ1
which shows me what executor 13 is currently doing. In this case it's sleeping during a call to nextTuple.
If you can find out what your hanging executor is doing, you should be much better equipped to solve the issue, or report a bug to Storm.
We have observed this with our application when the CPU was very busy and all other threads were waiting for their turn. When we tried to find the root cause using JVisualVM to check resource usage, we found that some functions in some bolts were causing a lot of overhead and CPU time. Please check, with any profiling tool, whether there are blocked threads in the CPU-critical path of the nextTuple() method, and whether you are receiving any data from upstream at all.

When is a windows service considered "started"

We have a process that is executed as a Windows service.
This process serves as an interface server, processing incoming messages, transforming them, and sending them out to another interface.
It is a rather heavy process: it needs to load a lot of things into memory, and that takes some time (a few minutes).
Due to its nature, when we start it using its Windows service, it remains in "starting" status for a very long time (sometimes more than 20 minutes), even when we can see that the process already works and processes messages just fine (going by its logs).
So the question is: when is a service considered "starting", and when is it considered "started"? Based on what factors?
The "starting" status finishes when OnStart completes: the service control manager reports the service as "started" as soon as OnStart returns.
You should therefore move long-running startup work out of OnStart and run it after OnStart has returned, for example from a timer as below. (If you truly need more time inside OnStart, ServiceBase.RequestAdditionalTime can extend the window, but returning quickly is the usual approach.)
public class Service1 : ServiceBase
{
    private Timer timer = new Timer(); // System.Timers.Timer

    protected override void OnStart(string[] args)
    {
        // Return from OnStart quickly; defer the heavy initialization.
        this.timer.Elapsed += new ElapsedEventHandler(OnElapsedTime);
        this.timer.Interval = 1 * 1000; // 1 second
        this.timer.Enabled = true;
    }

    private void OnElapsedTime(object source, ElapsedEventArgs e)
    {
        this.timer.Enabled = false; // OnElapsedTime runs only one time
        // Write your startup code here
    }
}

Spring Integration timeout with Aggregator in memory

I have noticed (from decompiling the code) that when a group timeout is set on an aggregator, the whole message group is held in a ScheduledFuture (in memory) and not only in the message store. This at times causes "Out Of Memory" exceptions when throughput is high.
Is there a better way of handling this?
<aggregator input-channel="orderNotificationLoadBalancedExecutorChannelLATAM"
        output-channel="orderNotificationConverterChannelLATAM"
        message-store="orderNotificationGroupStoreLATAM"
        send-partial-result-on-expiry="true"
        ref="firstOnlyPrimaryKeyMessageAggregator"
        method="aggregate"
        correlation-strategy-expression="headers['erpKeyMap']['erpKey']"
        release-strategy-expression="#this[0].headers['tableName'].topLevel and #this[0].headers['operationType'].operationTypeDelete"
        expire-groups-upon-completion="true"
        expire-groups-upon-timeout="true"
        group-timeout="5000">
</aggregator>
Oops!
Looks like a bug. I've just raised a JIRA on the matter.
The guilty code looks like this:
private void scheduleGroupToForceComplete(final MessageGroup messageGroup) {
    ...
    ScheduledFuture<?> scheduledFuture = this.getTaskScheduler()
            .schedule(new Runnable() {
                @Override
                public void run() {
                    try {
                        forceReleaseProcessor.processMessageGroup(messageGroup);
                    }
                    catch (MessageDeliveryException e) {
                        if (logger.isDebugEnabled()) {
                            logger.debug("The MessageGroup [ " + messageGroup +
                                    "] is rescheduled by the reason: " + e.getMessage());
                        }
                        scheduleGroupToForceComplete(messageGroup);
                    }
                }
            }, new Date(System.currentTimeMillis() + groupTimeout));
}
So, the ScheduledFuture holds a reference to the final MessageGroup via that inline Runnable callback.
I think we will fix it by using only the groupId.
Sorry, there are no workarounds...
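For illustration, the direction described above would look roughly like this. This is only a sketch of the idea, not the actual framework change; messageStore stands for the aggregator's configured MessageGroupStore:

private void scheduleGroupToForceComplete(final Object groupId) {
    // Capture only the group id and re-read the group from the backing
    // message store when the timeout fires, so the ScheduledFuture no
    // longer pins the whole MessageGroup in memory.
    this.getTaskScheduler().schedule(new Runnable() {
        @Override
        public void run() {
            MessageGroup messageGroup = messageStore.getMessageGroup(groupId);
            forceReleaseProcessor.processMessageGroup(messageGroup);
        }
    }, new Date(System.currentTimeMillis() + groupTimeout));
}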
You can set a message store; see the reference documentation:
A reference to a MessageGroupStore used to store groups of messages
under their correlation key until they are complete. Optional, by
default a volatile in-memory store.
