Kafka Streams Processor API batching on size and time

I am trying to batch records using the Kafka Streams Processor API. Batching is based on size and time: if the batch size reaches 10, or the last batch was processed more than 10 seconds ago (whichever comes first), the processor calls an external API to send the batch and then commits using the ProcessorContext.
A punctuator periodically checks whether the batch should be flushed and sent to the external system.
Question: can the Streams runtime invoke the processor's process method while the punctuator is executing? Since the code calls commit inside the punctuator, could context.commit() commit records that have not yet been handled by process?
In other words, can the punctuator and the process method run at the same time in different threads? If so, the code below could commit records that have not been processed yet.
    public class TestProcessor extends AbstractProcessor<String, String> {
        private ProcessorContext context;
        private List<String> batchList = new LinkedList<>();
        private AtomicLong lastProcessedTime = new AtomicLong(System.currentTimeMillis());
        private static final Logger LOG = LoggerFactory.getLogger(TestProcessor.class);

        @Override
        public void init(ProcessorContext context) {
            LOG.info("Calling init method " + context.taskId());
            this.context = context;
            context.schedule(10000, PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
                if (batchList.size() > 0 && System.currentTimeMillis() - lastProcessedTime.get() > 10000) {
                    // call external API
                    batchList.clear();
                    lastProcessedTime.set(System.currentTimeMillis());
                }
                context.commit();
            });
        }

        @Override
        public void process(String key, String value) {
            batchList.add(value);
            LOG.info("Context details " + context.taskId() + " " + context.partition() + " " +
                    "storeSize " + batchList.size());
            if (batchList.size() == 10) {
                // call external API to send the batch
                batchList.clear();
                lastProcessedTime.set(System.currentTimeMillis());
            }
            context.commit();
        }

        @Override
        public void close() {
            if (batchList.size() > 0) {
                // call external API to send the left over records
                batchList.clear();
            }
        }
    }

"Can the processor API process method be invoked by streams API when the punctuate thread is being executed?"
No, that is not possible. The Processor executes its process and punctuate methods in a single thread (the same thread is used for both), so the two never run concurrently.
"Is it possible that the punctuate thread and process method being executed at the same time in different threads?"
Again, no, for the reason given above.
Keep in mind that each topic partition will have its own instance of your TestProcessor class. Instead of the in-memory fields batchList and lastProcessedTime, I recommend using a Kafka Streams state store such as a KeyValueStore, so that your stream is fault tolerant.
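
To make the size-or-time rule easier to reason about, here is a minimal sketch with the Kafka classes stripped away. SizeOrTimeBatcher, add, onTick and the sink callback are hypothetical names that are not part of Kafka Streams; the sink stands in for the external API call. Since process and the punctuator share one thread, the sketch assumes single-threaded use and takes no locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers values and flushes on size or on elapsed time. */
public class SizeOrTimeBatcher {
    private final int maxSize;
    private final long maxAgeMs;
    private final Consumer<List<String>> sink; // stands in for the external API call
    private final List<String> batch = new ArrayList<>();
    private long lastFlushMs;

    public SizeOrTimeBatcher(int maxSize, long maxAgeMs, long nowMs, Consumer<List<String>> sink) {
        this.maxSize = maxSize;
        this.maxAgeMs = maxAgeMs;
        this.lastFlushMs = nowMs;
        this.sink = sink;
    }

    /** Would be called from process(): adds a value and flushes when the batch is full. */
    public boolean add(String value, long nowMs) {
        batch.add(value);
        if (batch.size() >= maxSize) {
            flush(nowMs);
            return true;
        }
        return false;
    }

    /** Would be called from the punctuator: flushes a non-empty batch that is too old. */
    public boolean onTick(long nowMs) {
        if (!batch.isEmpty() && nowMs - lastFlushMs > maxAgeMs) {
            flush(nowMs);
            return true;
        }
        return false;
    }

    /** Number of values currently buffered and not yet sent. */
    public int pending() {
        return batch.size();
    }

    private void flush(long nowMs) {
        sink.accept(new ArrayList<>(batch)); // hand over a copy, then reset the buffer
        batch.clear();
        lastFlushMs = nowMs;
    }
}
```

Passing the clock value in as a parameter keeps the class deterministic; in a processor it would be System.currentTimeMillis() in process and the timestamp argument in the punctuator.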

Related

Send data to Spring Batch Item Reader (or Tasklet)

I have the following requirement:
An endpoint, http://localhost:8080/myapp/jobExecution/myJobName/execute, receives a CSV file, uses univocity to apply some validations, and generates a List of POJOs.
That list is then sent to a Spring Batch job for further processing.
Multiple users could do this at the same time.
Can I achieve this with Spring Batch?
I was thinking of using a queue: put the data on it and launch a job that pulls objects from that queue. But if another person calls the endpoint while another job is already running, how can I be sure Spring Batch knows which items belong to which execution?

You can use a queue, or you can take the list of values generated by the validation step and store it in the job's execution context.
The snippets below store the list in the job execution context and read it back with an ItemReader.
The first snippet implements StepExecutionListener in a Tasklet step to store the list that was built:
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // tenantNames is a List<String> which was constructed as an output of an evaluation logic
        stepExecution.getJobExecution().getExecutionContext().put("listOfTenants", tenantNames);
        return ExitStatus.COMPLETED;
    }
The "listOfTenants" entry is then read in a step that has a reader, a processor and a writer; read() is synchronized so that only one thread reads at a time. You could also store the list in a queue and fetch it in the reader. Snippet for reference:
    public class ReaderStep implements ItemReader<String>, StepExecutionListener {
        private List<String> tenantNames;

        @Override
        public void beforeStep(StepExecution stepExecution) {
            try {
                tenantNames = (List<String>) stepExecution.getJobExecution().getExecutionContext()
                        .get("listOfTenants");
                logger.debug("Successfully fetched the tenant list from the context");
            } catch (Exception e) {
                // Exception block
            }
        }

        @Override
        public synchronized String read() throws Exception {
            String tenantName = null;
            if (tenantNames.size() > 0) {
                tenantName = tenantNames.get(0);
                tenantNames.remove(0);
                return tenantName;
            }
            logger.info("Completed reading all tenant names");
            return null;
        }
        // Rest of the overridden methods of this class..
    }
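
For the queue variant mentioned above, a minimal sketch without any Spring dependency might look like the following. QueueBackedReader is a hypothetical class, not a Spring Batch type; it only mirrors the reader convention used in the snippet above, where read() returns null once the input is exhausted. A ConcurrentLinkedQueue is used here in place of the synchronized method, which is a substitution on my part.

```java
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical reader that hands out each item exactly once, even with several threads. */
public class QueueBackedReader<T> {
    private final Queue<T> items;

    /** Copies the source so every job execution owns its own private queue. */
    public QueueBackedReader(Collection<T> source) {
        this.items = new ConcurrentLinkedQueue<>(source);
    }

    /** Returns the next item, or null when nothing is left (the end-of-input signal). */
    public T read() {
        return items.poll();
    }
}
```

Creating one reader instance per execution, from that execution's own list, is what keeps the items of concurrent requests apart.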

Yes. Spring Boot would execute these jobs in different threads, so Spring knows which items belong to which execution.
Note: you can also use a logging correlation ID, which helps you filter the logs for a particular request: https://dzone.com/articles/correlation-id-for-logging-in-microservices

Is there any way to free JVM memory in @AfterChunk in Spring Batch?

Is there any way to free JVM memory in @AfterChunk? We are getting an OutOfMemoryError after processing a couple of records.
Is there any way to free memory after a Spring Batch job completes?
    public class ABC implements ChunkListener {
        private static final Logger log = LoggerFactory.getLogger(ABC.class);
        private MessageFormat fmt = new MessageFormat("{0} items processed");
        private int loggingInterval = 100;

        @Override
        public void beforeChunk(ChunkContext context) {
            // Nothing to do here
        }

        @Override
        public void afterChunk(ChunkContext context) {
            int count = context.getStepContext().getStepExecution().getReadCount();
            // If the number of records processed so far is a multiple of the logging interval then output a log message.
            if (count > 0 && count % loggingInterval == 0) {
                log.info(fmt.format(new Object[] {count}));
            }
            //String name = context.getStepContext().getStepName();
            //context.getStepContext().registerDestructionCallback(name, callback);
        }
    }
How do I call registerDestructionCallback to clean up? What are name and callback? Is there any reference?

There is no way to force a GC in Java (System.gc() is just a hint to the JVM), and you should leave this to the garbage collector.
Items in a chunk-oriented step should be garbage collected after each chunk is processed; see "Does Spring Batch release the heap memory after processing each batch?". If you get an OutOfMemoryError, make sure that:
- your items are not held by a processor, mapper, etc.
- a chunk can fit in memory: sometimes, with the driving query pattern, the processor fetches more details about each item, and you can run out of memory very quickly, within the first few items
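
The first check is the easiest one to get wrong, so here is a minimal sketch of what "held by a processor" can look like. Both classes are hypothetical and have nothing to do with Spring Batch's own types; they only contrast a component that keeps a reference to every item with one that keeps a running total.

```java
import java.util.ArrayList;
import java.util.List;

public class ProcessorRetentionDemo {

    /** Anti-pattern: every processed item stays reachable through the 'seen' field. */
    public static class LeakyProcessor {
        private final List<byte[]> seen = new ArrayList<>();

        public int process(byte[] item) {
            seen.add(item); // this reference keeps the item alive for the life of the processor
            return item.length;
        }

        public int retained() {
            return seen.size();
        }
    }

    /** Keeps only an aggregate, so each item can be collected once the caller drops it. */
    public static class CountingProcessor {
        private long totalBytes;

        public int process(byte[] item) {
            totalBytes += item.length;
            return item.length;
        }

        public long totalBytes() {
            return totalBytes;
        }
    }

    public static void main(String[] args) {
        LeakyProcessor leaky = new LeakyProcessor();
        CountingProcessor counting = new CountingProcessor();
        for (int i = 0; i < 1000; i++) {
            leaky.process(new byte[1024]);
            counting.process(new byte[1024]);
        }
        System.out.println("items still referenced by leaky processor: " + leaky.retained());
        System.out.println("bytes counted without keeping items: " + counting.totalBytes());
    }
}
```

If a heap dump shows a long-lived collection like seen growing with the read count, the memory cannot be reclaimed, no matter which listener runs after the chunk.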

Run task in background using deferredResult in Spring without frozen browser as client

I have implemented a simple REST service to test Spring's DeferredResult. I get the output in this order:
TEST
TEST 1
TEST AFTER DEFERRED RESULT
I would like to understand why the browser (the client) has to wait the full 8 seconds. Isn't DeferredResult supposed to be non-blocking and run the task in the background? If not, how can I create a REST service that is non-blocking and runs tasks in the background without using Java 9 and reactive streams?
    @RestController("/")
    public class Controller {
        @GetMapping
        public DeferredResult<Person> test() {
            System.out.println("TEST");
            DeferredResult<Person> result = new DeferredResult<>();
            CompletableFuture.supplyAsync(this::test1)
                    .whenCompleteAsync((res, throwable) -> {
                        System.out.println("TEST AFTER DEFERRED RESULT");
                        result.setResult(res);
                    });
            System.out.println("TEST 1");
            return result;
        }

        private Person test1() {
            try {
                Thread.sleep(8000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return new Person("michal", 20);
        }
    }

    class Person implements Serializable {
        private String name;
        private int age;

        // Constructor used by test1()
        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

DeferredResult is a holder for a WebRequest that lets the serving thread be released to serve another incoming HTTP request instead of waiting for the current one's result. Once setResult or setErrorResult is invoked, Spring releases the stored WebRequest and your client receives the response.
The DeferredResult holder is a Spring Framework abstraction for non-blocking I/O threading.
The DeferredResult abstraction has nothing to do with background tasks. Using it without any threading abstraction results, as expected, in same-thread execution. Your test1 method runs in the background because CompletableFuture.supplyAsync hands its execution to the common pool.
The result is returned after 8 seconds because the callback passed to whenCompleteAsync is called only after test1 returns.
You cannot receive the result immediately when your "service call logic" takes 8 seconds, even though you are performing it in the background. If you want to release the HTTP request, just return a suitable object that is available right away (it could contain a UUID, for example, to fetch the created person later), or nothing at all, from the controller method. You can then try to GET your created user after N seconds. There is a specific HTTP response code, 202 Accepted, which means the server is still processing the request. Finally, just GET your created object.
The second approach, if you must notify the client side (though I would not recommend it if this is the only reason), is to use WebSockets to push a message to the client.
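
The "return a ticket now, fetch the result later" idea can be sketched in plain Java as follows. AsyncJobRegistry, submit and poll are hypothetical names; in a real controller, submit would sit behind the endpoint that answers 202 Accepted with the UUID, and poll behind the later GET. The sketch assumes the default common pool and never evicts finished entries, which a real service would have to do.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical registry: start work in the background, look the result up later by id. */
public class AsyncJobRegistry<T> {
    private final ConcurrentMap<UUID, CompletableFuture<T>> jobs = new ConcurrentHashMap<>();

    /** Starts the work asynchronously and returns at once with a ticket. */
    public UUID submit(Supplier<T> work) {
        UUID id = UUID.randomUUID();
        jobs.put(id, CompletableFuture.supplyAsync(work));
        return id;
    }

    /** Non-blocking lookup: empty while the work is running, failed, or the id is unknown. */
    public Optional<T> poll(UUID id) {
        CompletableFuture<T> future = jobs.get(id);
        if (future == null || !future.isDone() || future.isCompletedExceptionally()) {
            return Optional.empty();
        }
        return Optional.ofNullable(future.getNow(null));
    }
}
```

With this shape the first request returns immediately, and the 8 seconds are spent between requests rather than inside one.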

Long-running AEM EventListener working inconsistently - blacklisted?

As always, AEM has brought new challenges to my life. This time, I'm experiencing an issue where an EventListener that listens for ReplicationEvents is working sometimes, and normally just the first few times after the service is restarted. After that, it stops running entirely.
The first line of the listener is a log line. If it was running, it would be clear. Here's a simplified example of the listener:
    @Component(immediate = true, metatype = false)
    @Service(value = EventHandler.class)
    @Property(
        name = "event.topics", value = ReplicationEvent.EVENT_TOPIC
    )
    public class MyActivityReplicationListener implements EventHandler {
        @Reference
        private SlingRepository repository;
        @Reference
        private OnboardingInterface onboardingService;
        @Reference
        private QueryInterface queryInterface;

        private Logger log = LoggerFactory.getLogger(this.getClass());
        private Session session;

        @Override
        public void handleEvent(Event ev) {
            log.info(String.format("Starting %s", this.getClass()));
            // Business logic
            log.info(String.format("Finished %s", this.getClass()));
        }
    }
Now before you panic that I haven't included the business logic, see my answer below. The main point of interest is that the business logic could take a few seconds.

While crawling through the second page of Google search results for an answer, I came across this article, written in German, which explains that EventListeners taking more than 5 seconds to finish are silently quarantined by AEM with no output.
It just so happens that this task might take longer than 5 seconds, as it works on data that was originally quite small but has grown (and this is in line with the other symptoms).
I made a change that makes the listener much more like the one in that article: it uses a JobConsumer to process the ReplicationEvent asynchronously, in a pub/sub model. Here's a simplified version of the new model (for AEM 6.3):
    @Component(immediate = true, property = {
        EventConstants.EVENT_TOPIC + "=" + ReplicationEvent.EVENT_TOPIC,
        JobConsumer.PROPERTY_TOPICS + "=" + AsyncReplicationListener.JOB_TOPIC
    })
    public class AsyncReplicationListener implements EventHandler, JobConsumer {
        private static final String PROPERTY_EVENT = "event";
        static final String JOB_TOPIC = ReplicationEvent.EVENT_TOPIC;

        @Reference
        private JobManager jobManager;

        @Override
        public JobConsumer.JobResult process(Job job) {
            try {
                ReplicationEvent event = (ReplicationEvent) job.getProperty(PROPERTY_EVENT);
                // Slow business logic (>5 seconds)
            } catch (Exception e) {
                return JobResult.FAILED;
            }
            return JobResult.OK;
        }

        @Override
        public void handleEvent(Event event) {
            final Map<String, Object> payload = new HashMap<>();
            payload.put(PROPERTY_EVENT, ReplicationEvent.fromEvent(event));
            final Job addJobResult = jobManager.addJob(JOB_TOPIC, payload);
        }
    }
You can see here that the EventHandler passes the ReplicationEvent, wrapped in a Job, to the JobConsumer, which according to this magic article is not subject to the 5-second rule.
Here is some official documentation on this time limit. Once I had the "5 seconds" key, I was able to find a bit more information, here and here, which also discusses the 5-second limit. The first article uses a method similar to the one above, and the second shows a way to turn off these time limits.
The time limits can be disabled entirely (or increased) in the configMgr: in the Apache Felix Event Admin Implementation configuration, setting the Timeout property to zero disables them.
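
The underlying idea, keeping the event callback short and doing the slow work elsewhere, can also be shown outside AEM. The sketch below deliberately swaps Sling's JobManager for a plain java.util.concurrent executor so that it runs on its own; HandOffListener and slowBusinessLogic are hypothetical names, and the 50 ms sleep stands in for the multi-second work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical listener whose callback only enqueues work and returns immediately. */
public class HandOffListener implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Event callback: hands the slow part to the worker thread and returns a handle. */
    public CompletableFuture<String> handleEvent(String path) {
        return CompletableFuture.supplyAsync(() -> slowBusinessLogic(path), worker);
    }

    private String slowBusinessLogic(String path) {
        try {
            Thread.sleep(50); // placeholder for the real, much slower processing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "processed " + path;
    }

    /** Stops accepting new work; tasks already queued are still executed. */
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

One difference worth keeping in mind: an in-memory executor loses queued work when the process stops, whereas Sling jobs are persisted, which is a reason to prefer the JobManager approach inside AEM.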

JMeter WebSockets Publish/Subscribe - scripting asynchronous responses

We have built a publish/subscribe model into our application via WebSockets so users can receive "dynamic updates" when data changes. I'm now looking to load test this using JMeter.
Is there a way to configure a JMeter test to react to receipt of a WebSocket "published" message and then run further samplers i.e. make further web requests?
I have looked at plugin samples, but they appear focused on request/reply model (e.g. https://bitbucket.org/pjtr/jmeter-websocket-samplers) rather than publish/subscribe.
Edit:
I have made progress on a solution for this using the WebSocketSampler. An example JMX file can be found on Bitbucket; it uses STOMP over WebSockets and covers Connect, Subscribe, handling a published message, and starting JMeter samplers from it.

It is a misunderstanding that the https://bitbucket.org/pjtr/jmeter-websocket-samplers/overview plugin only supports request-response model conversations.
Since version 0.7, the plugin offers "single read" and "single write" samplers. Of course, it depends on your exact protocol, but the idea is that you could use a "single write" sampler to send a WebSocket message that simulates creating the subscription and then have a (standard JMeter) While loop in combination with the "single read" samplers, to read any number of messages that are being published.
If this does not satisfy your needs, let me know and I'll see what I can do for you (I'm the author of this plugin).

I had a system that used STOMP. The clients sent HTTP requests and received the current state asynchronously over WebSockets through this subscribe model. To emulate this behaviour, I wrote a class that exchanges data with the JMeter threads through a JMeterContext variable (the imports, which you can work out yourself, come from org.springframework.*):
    public class StompWebSocketLoadTestClient {
        public static JMeterContext ctx;
        public static StompSession session;

        public static void start(JMeterContext ctx, String wsURL, String SESSION) throws InterruptedException {
            WebSocketClient transport = new StandardWebSocketClient();
            WebSocketStompClient stompClient = new WebSocketStompClient(transport);
            ThreadPoolTaskScheduler threadPoolTaskScheduler = new ThreadPoolTaskScheduler();
            threadPoolTaskScheduler.initialize();
            stompClient.setTaskScheduler(threadPoolTaskScheduler);
            stompClient.setDefaultHeartbeat(new long[]{10000, 10000});
            stompClient.setMessageConverter(new ByteArrayMessageConverter());
            StompSessionHandler handler = new MySessionHandler(ctx);
            WebSocketHttpHeaders handshakeHeaders = new WebSocketHttpHeaders();
            handshakeHeaders.add("Cookie", "SESSION=" + SESSION);
            stompClient.connect(wsURL, handshakeHeaders, handler);
            Thread.sleep(1000);
        }
The messages were handled in this class:
        private static class MySessionHandler extends StompSessionHandlerAdapter implements TestStateListener {
            private String Login = "";
            private final JMeterContext ctx_;

            private MySessionHandler(JMeterContext ctx) {
                this.ctx_ = ctx;
            }

            @Override
            public void afterConnected(StompSession session, StompHeaders connectedHeaders) {
                session.setAutoReceipt(true);
                this.Login = ctx_.getVariables().get("LOGIN");
                //System.out.println("CONNECTED:" + connectedHeaders.getSession() + ":" + session.getSessionId() + ":" + Login);
                //System.out.println(session.isConnected());
                // The subscription happens here:
                session.subscribe("/user/notification", new StompFrameHandler() {
                    @Override
                    public Type getPayloadType(StompHeaders headers) {
                        //System.out.println("getPayloadType:");
                        Iterator it = headers.keySet().iterator();
                        while (it.hasNext()) {
                            String header = it.next().toString();
                            //System.out.println(header + ":" + headers.get(header));
                        }
                        //System.out.println("=================");
                        return byte[].class;
                    }

                    @Override
                    public void handleFrame(StompHeaders headers, Object payload) {
                        //System.out.println("recievedMessage");
                        NotificationList nlist = null;
                        try {
                            nlist = NotificationList.parseFrom((byte[]) payload);
                            JMeterVariables vars = ctx_.getVariables();
                            Iterator it = nlist.getNotificationList().iterator();
                            while (it.hasNext()) {
                                Notification n = (Notification) it.next();
                                String className = n.getType();
                                //System.out.println("CLASS NAME:" + className);
                                if (className.contains("response.Resource")) {
                                    // After getting some message you can work with JMeter variables:
                                    vars.putObject("var1", var1);
                                    vars.put("var2", String.valueOf(var2));
                                }
                                // "Send" the variables back to the JMeter thread context so the data can be used during the test
                                ctx_.setVariables(vars);
                                n = null;
                            }
                        } catch (InvalidProtocolBufferException ex) {
                            Logger.getLogger(StompWebSocketLoadTestClient.class.getName()).log(Level.SEVERE, null, ex);
                        }
                    }
                });
            }
            // TestStateListener methods not shown
        }
    }
In the JMeter test plan, after the login stage, I just added a BeanShell sampler with the login/password and session strings and the JMeter thread context:
import jmeterstopm.StompWebSocketLoadTestClient;
StompWebSocketLoadTestClient ssltc = new StompWebSocketLoadTestClient();
String SERVER_NAME = vars.get("SERVER_NAME");
String SESSION = vars.get("SESSION");
String ws_pref = vars.get("ws_pref");
ssltc.start(ctx, ws_pref + "://" + SERVER_NAME + "/endpoint/notification-ws/websocket", SESSION);
After that, all data arriving over WebSockets can be used through the ordinary vars variable:
Object var1 = vars.getObject("var1");

Basically, JMeter is not well suited to asynchronous interaction with the system under test.
That said, (virtually) everything is possible with scripting components (post-processors, timers, assertions and, probably most useful in your case, samplers) and JMeter Logic Controllers.
For example, you could line up your "further samplers" inside If blocks, analyze the receipt of the WebSocket "published" message, and set flag variables or other parameters for the If blocks to test.
You can even synchronize threads if you need to; check this answer.
But I'll tell you what: that looks like a lot of handwritten code.
So it makes sense to consider a fully custom, handwritten test harness too.
