Kafka Streams: batch keys for a time window and do some processing on the batch of keys together

I have a stream of incoming primary keys (PKs) that I read in my Kafka Streams app. I would like to batch them over, say, the last minute, deduplicate the batch, and query my transactional DB to get more data for those PKs. For each PK I would then like to post a message to the output topic.
I was able to code this using the Processor API as below:
Topology topology = new Topology();
topology.addSource("test-source", inputKeySerde.deserializer(), inputValueSerde.deserializer(), "input.kafka.topic")
    .addProcessor("test-processor", processorSupplier, "test-source")
    .addSink("test-sink", "output.kafka.topic", outputKeySerde.serializer(), outputValueSerde.serializer(), "test-processor");
Here the processor supplier provides a processor whose process method adds the PK to a queue, and a punctuator, scheduled to run every minute, that drains the queue, queries the transactional DB, and forwards a message for every PK.
ProcessorSupplier<Integer, ValueType> processorSupplier = new ProcessorSupplier<Integer, ValueType>() {
    public Processor<Integer, ValueType> get() {
        return new Processor<Integer, ValueType>() {
            private ProcessorContext context;
            private BlockingQueue<Integer> ids;

            public void init(ProcessorContext context) {
                this.context = context;
                this.context.schedule(Duration.ofMillis(1000), PunctuationType.WALL_CLOCK_TIME, this::punctuate);
                ids = new LinkedBlockingQueue<>();
            }

            public void process(Integer key, ValueType value) {
                // buffer the PK until the next punctuation
                ids.add(key);
            }

            public void punctuate(long timestamp) {
                // draining into a Set deduplicates the batch
                Set<Integer> idSet = new HashSet<>();
                ids.drainTo(idSet, 1000);
                List<Document> documentList = createDocuments(idSet);
                documentList.forEach(document -> context.forward(document.getId(), document));
            }

            public void close() {
            }
        };
    }
};
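The buffering step described in the question (queue the keys as they arrive, then drain and deduplicate them on a timer) can be tried out in plain Java without Kafka. This is only a rough sketch: the class name BatchBuffer and its methods are made up for illustration, and add() and drainBatch() merely stand in for process() and the punctuator.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper, not part of Kafka Streams.
public class BatchBuffer {
    private final BlockingQueue<Integer> ids = new LinkedBlockingQueue<>();

    // Called once per incoming record, like process().
    public void add(Integer key) {
        ids.add(key);
    }

    // Called on a timer, like the punctuator: moves up to maxBatch queued
    // keys into a set, which drops duplicates and keeps first-seen order.
    public Set<Integer> drainBatch(int maxBatch) {
        Set<Integer> batch = new LinkedHashSet<>();
        ids.drainTo(batch, maxBatch);
        return batch;
    }

    public static void main(String[] args) {
        BatchBuffer buffer = new BatchBuffer();
        buffer.add(7);
        buffer.add(3);
        buffer.add(7);
        System.out.println(buffer.drainBatch(1000)); // prints [7, 3]
        System.out.println(buffer.drainBatch(1000)); // prints []
    }
}
```

Note that a queue like this lives only in memory, so anything buffered is lost if the application restarts between two drains.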
Is there a simpler way to accomplish this with the DSL, using windowedBy and reduce/aggregate?
Update: code changed to use a state store:
ProcessorSupplier<Integer, ValueType> processorSupplier = new ProcessorSupplier<Integer, ValueType>() {
    public Processor<Integer, ValueType> get() {
        return new Processor<Integer, ValueType>() {
            private ProcessorContext context;
            private KeyValueStore<Integer, Integer> stateStore;

            @SuppressWarnings("unchecked")
            public void init(ProcessorContext context) {
                this.context = context;
                stateStore = (KeyValueStore<Integer, Integer>) context.getStateStore("MyStore");
                this.context.schedule(Duration.ofMillis(5000), PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
                    // collect the distinct keys buffered since the last punctuation
                    Set<Integer> ids = new HashSet<>();
                    try (KeyValueIterator<Integer, Integer> iter = this.stateStore.all()) {
                        while (iter.hasNext()) {
                            KeyValue<Integer, Integer> entry = iter.next();
                            ids.add(entry.key);
                        }
                    }
                    List<Document> documentList = createDocuments(dataRetriever, ids);
                    documentList.forEach(document -> context.forward(document.getId(), document));
                    // remove the processed keys so they are not emitted again
                    ids.forEach(id -> stateStore.delete(id));
                });
            }

            public void process(Integer key, ValueType value) {
                // the store is keyed by the PK, so repeated keys overwrite each other
                stateStore.put(key, key);
            }

            public void close() {
            }
        };
    }
};


Kafka Stream: can't get data from Kafka persistent keyValue state store

I am using Kafka Streams with persistent KeyValue stores in my application. There are two KeyValue stores and two processors. I am having an issue with the state store that is shared between the two processors: NameProcessor puts data into nameStore and EventProcessor reads data from nameStore. From debugging, it looks like NameProcessor puts the data successfully, but when EventProcessor tries to get data from nameStore, it gets nothing. Below are the code snippets for the Application class, the topology, NameProcessor and EventProcessor. I am using Spring Boot parent version 2.4.3, kafka-streams version 2.2.0 and kafka-clients version 2.2.0.
public static void main(String[] args) {
    SpringApplication.run(Application.class, args);
    Properties configs = getKafkaStreamProperties();
    Topology builder = new Topology();
    new ApplicationTopology(builder);
    KafkaStreams stream = new KafkaStreams(builder, configs);
    stream.setUncaughtExceptionHandler((Thread thread, Throwable throwable) -> {
        // here you should examine the throwable/exception and perform an appropriate action!
        logger.error("Uncaught exception in stream, MessageDetail: " + ExceptionUtils.getRootCauseMessage(throwable) + ", Stack Trace: " + throwable.fillInStackTrace());
    });
    Runtime.getRuntime().addShutdownHook(new Thread(stream::close));
}

private static Properties getKafkaStreamProperties() {
    Properties configs = new Properties();
    configs.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, getApplicationId());
    configs.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, getBootstrapServers());
    configs.setProperty(StreamsConfig.RETRIES_CONFIG, getRetries());
    configs.setProperty(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, getRetryBackOffMs());
    configs.setProperty(StreamsConfig.REPLICATION_FACTOR_CONFIG, getReplicationFactor());
    configs.setProperty(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, getMaxPollIntervalMs());
    return configs;
}
public class ApplicationTopology {
public ApplicationTopology (Topology builder) {
StoreBuilder<KeyValueStore<String, Sensor>> nameStoreBuilder = Stores.
keyValueStoreBuilder(Stores.persistentKeyValueStore("nameStore"), Serdes.String(), CustomSerdes.getNameSerde()).withCachingEnabled().withLoggingEnabled(new HashMap<>());
StoreBuilder<KeyValueStore<String, String>> stateStoreBuilder = Stores.
keyValueStoreBuilder(Stores.persistentKeyValueStore("stateStore"), Serdes.String(), Serdes.String()).withCachingEnabled().withLoggingEnabled(new HashMap<>());
builder.addSource(AutoOffsetReset.LATEST, "source", Serdes.String().deserializer(), CustomSerdes.getIncomingEventSerde().deserializer(), getInboundTopic())
.addProcessor(TRANSFORMER, () -> new EventProcessor(), "source")
.addStateStore(nameStoreBuilder, TRANSFORMER)
.addSink("sink", getOutboundTopic(), Serdes.String().serializer(), CustomSerdes.getIncomingEventSerde().serializer(), TRANSFORMER);
//reset to earliest for model config topic as some models could be already on the topic
builder.addSource(AutoOffsetReset.EARLIEST, "nameStoreSource", Serdes.String().deserializer(), CustomSerdes.getSensorSerde().deserializer(), getInboundSensorUpdateTopic())
.addProcessor("process", () -> new NameProcessor(), "nameStoreSource")
.addStateStore(nameStoreBuilder, TRANSFORMER, "process");
public ApplicationTopology() {}
} }
public class NameProcessor extends AbstractProcessor<String, Sensor> {
private static final Logger LOGGER = LoggerFactory.getLogger(NameProcessor.class);
ProcessorContext context;
private KeyValueStore<String, Name> nameStore;
private static List<String> externalDeviceIdList = new ArrayList<>();
public void init(ProcessorContext processorContext) {
this.context = processorContext;
this.nameStore = (KeyValueStore<String, Name>) context.getStateStore("nameStore");
public void process(String externalDeviceId, Name name) {
if (StringUtils.isNotBlank(externalDeviceId)) {
String[] externalDeviceIds = SensorUtils.getExternalDeviceIdsWithoutSuffix(externalDeviceId);
if (Objects.isNull(name)) {
Arrays.stream(externalDeviceIds).forEach(id -> {
} else {
addOrUpdateNameInStore(sensor, externalDeviceIds);
private void addOrUpdateNameInStore(Sensor sensor, String[] externalDeviceIds) {
Arrays.stream(externalDeviceIds).forEach(id -> {
sensorStore.put(id, sensor);
// context.commit();
public class EventProcessor extends AbstractProcessor<String, IncomingEvent> {
private static final Logger LOGGER = LoggerFactory.getLogger(EventProcessor.class);
ProcessorContext context;
private KeyValueStore<String, Name> nameStore;
private KeyValueStore<String, String> stateStore;
public void init(ProcessorContext processorContext) {
this.context = processorContext;
this.nameStore = (KeyValueStore<String, Name>) context.getStateStore("nameStore");
this.stateStore = (KeyValueStore<String, String>) context.getStateStore("stateStore");
public void process(String key, IncomingEvent value) {
String correlationId = UUID.randomUUID().toString();
try {
String externalDeviceId = value.getExternalDeviceId();
Name nameFromStore = nameStore.get(externalDeviceId);
In the nameFromStore variable I don't get any value, even after it has been stored by NameProcessor.

Best approach to use DiffUtil with LiveData + Room Database?

I am using a Room database with LiveData. My local database is updated very frequently, as our requirements demand, and each time I have to reload my RecyclerView. Instead of calling notifyDataSetChanged() on the adapter, I am trying to use DiffUtil, but it either crashes or does not reload properly, and the behavior is inconsistent.
I am following this tutorial:
Tutorials Link here
MyAdapter :
public class SwitchGridAdapter extends RecyclerView.Adapter<SwitchGridAdapter.ViewHolder> {
private List<Object> allItemsList;
private LayoutInflater mInflater;
private OnItemClickListener mClickListener;
private Context context;
private Queue<List<Object>> pendingUpdates =
new ArrayDeque<>();
// data is passed into the constructor
public SwitchGridAdapter(Context context,List<Appliance> applianceList,List<ZmoteRemote> zmoteRemoteList) {
this.mInflater = LayoutInflater.from(context);
this.context = context;
allItemsList = new ArrayList<>();
if (applianceList!=null) allItemsList.addAll(applianceList);
if (zmoteRemoteList!=null)allItemsList.addAll(zmoteRemoteList);
// inflates the cell layout from xml when needed
public ViewHolder onCreateViewHolder(ViewGroup parent, int viewType) {
View view = mInflater.inflate(R.layout.switch_grid_item, parent, false);
return new ViewHolder(view);
// binds the data to the textview in each cell
public void onBindViewHolder(ViewHolder holder, int position) {
// Doing some update with UI Elements
// total number of cells
public int getItemCount() {
return allItemsList.size();
// stores and recycles views as they are scrolled off screen
public class ViewHolder extends RecyclerView.ViewHolder implements View.OnClickListener,View.OnLongClickListener {
TextView myTextView;
ImageView imgSwitch;
ViewHolder(View itemView) {
myTextView = (TextView) itemView.findViewById(R.id.txtSwitchName);
imgSwitch = (ImageView) itemView.findViewById(R.id.imgSwitchStatus);
public void onClick(View view) {
// handling click
public boolean onLongClick(View view) {
return true;
// convenience method for getting data at click position
Object getItem(int id) {
return allItemsList.get(id);
// allows clicks events to be caught
public void setClickListener(OnItemClickListener itemClickListener) {
this.mClickListener = itemClickListener;
// parent activity will implement this method to respond to click events
public interface OnItemClickListener {
void onItemClick(View view, int position);
void onItemLongPressListner(View view, int position);
// ✅✅✅✅✅✅✅✅✅✅✅✅✅✅✅✅✅✅✅✅✅
// From This Line Reloading with Diff Util is Done .
public void setApplianceList( List<Appliance> applianceList,List<ZmoteRemote> zmoteRemoteList)
if (allItemsList == null)
allItemsList = new ArrayList<>();
List<Object> newAppliances = new ArrayList<>();
if (applianceList!=null) newAppliances.addAll(applianceList);
// when new data becomes available
public void updateItems(final List<Object> newItems) {
if (pendingUpdates.size() > 1) {
// This method does the heavy lifting of
// pushing the work to the background thread
void updateItemsInternal(final List<Object> newItems) {
final List<Object> oldItems = new ArrayList<>(this.allItemsList);
final Handler handler = new Handler();
new Thread(new Runnable() {
public void run() {
final DiffUtil.DiffResult diffResult =
DiffUtil.calculateDiff(new DiffUtilHelper(oldItems, newItems));
handler.post(new Runnable() {
public void run() {
applyDiffResult(newItems, diffResult);
// This method is called when the background work is done
protected void applyDiffResult(List<Object> newItems,
DiffUtil.DiffResult diffResult) {
dispatchUpdates(newItems, diffResult);
// This method does the work of actually updating
// the backing data and notifying the adapter
protected void dispatchUpdates(List<Object> newItems,
DiffUtil.DiffResult diffResult) {
// ❌❌❌❌❌❌ Next Line is Crashing the app ❌❌❌❌❌
dispatchUpdates(newItems, diffResult);
if (pendingUpdates.size() > 0) {
Observing LiveData
public void setUpAppliancesListLiveData()
if (applianceObserver!=null)
applianceObserver = null;
Log.e("Appliance Fetch","RoomName:"+this.roomName);
applianceObserver = new Observer<List<Appliance>>() {
public void onChanged(@Nullable List<Appliance> applianceEntities) {
// Log.e("Appliance Result","Appliance List \n\n:"+applianceEntities.toString());
new Thread(new Runnable() {
public void run() {
List<Appliance> applianceListTemp = applianceEntities;
zmoteRemoteList = new ArrayList<>(); //appDelegate.getDatabase().zmoteRemoteDao().getRemoteList(roomName);
// Sort according to name
Collections.sort(applianceListTemp, new Comparator<Appliance>() {
public int compare(Appliance item, Appliance t1) {
String s1 = item.getSwitchName();
String s2 = t1.getSwitchName();
return s1.compareToIgnoreCase(s2);
if(getActivity()!=null) {
getActivity().runOnUiThread(new Runnable() {
public void run() {
applianceList = applianceListTemp;
appDelegate.getDatabase().applianceDao().getApplinaceListByRoomName(this.roomName).observe(this, applianceObserver);

Sorting DataStream using Apache Flink

I am learning Flink and I started with a simple word count using DataStream. To enhance the processing I filtered the output to show only the results with 3 or more words found.
DataStream<Tuple2<String, Integer>> dataStream = env
.socketTextStream("localhost", 9000)
.flatMap(new Splitter())
.apply(new MyWindowFunction())
.filter(word -> word.f1 >= 3);
I would like to create a WindowFunction to sort the output by the value of words found. The WindowFunction that I am trying to implement does not compile at all. I am struggling to define the apply method and the parameters of the WindowFunction interface.
public static class MyWindowFunction implements WindowFunction<
Tuple2<String, Integer>, // input type
Tuple2<String, Integer>, // output type
Tuple2<String, Integer>, // key type
TimeWindow> {
void apply(Tuple2<String, Integer> key, TimeWindow window, Iterable<Tuple2<String, Integer>> input, Collector<Tuple2<String, Integer>> out) {
String word = ((Tuple2<String, Integer>)key).f0;
Integer count = ((Tuple2<String, Integer>)key).f1;
out.collect(new Tuple2<>(word, count));
I am updating this answer to use Flink 1.12.0. To sort the elements of a stream I had to use a KeyedProcessFunction after counting the stream with a ReduceFunction. Then I had to set the parallelism of the very last transformation to 1 so that the order of the elements sorted by the KeyedProcessFunction does not change. The sequence that I am using is socketTextStream -> flatMap -> keyBy -> reduce -> keyBy -> process -> print().setParallelism(1). Below is the example:
public class SocketWindowWordCountJava {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.socketTextStream("localhost", 9000)
                .flatMap(new SplitterFlatMap())
                .keyBy(new WordKeySelector())
                .reduce(new SumReducer())
                .keyBy(new WordKeySelector())
                .process(new SortKeyedProcessFunction(3 * 1000))
                .print().setParallelism(1);
        String executionPlan = env.getExecutionPlan();
        System.out.println("ExecutionPlan ........................ ");
        System.out.println(executionPlan);
        System.out.println("........................ ");
        env.execute("Window WordCount sorted");
    }
}
The UDF that I used to sort the stream is SortKeyedProcessFunction, which extends KeyedProcessFunction. I keep a ValueState<List<Event>> listState, where Event implements Comparable<Event>, so that the state holds a sortable list. In the processElement method I add the event to the state and register a timer with context.timerService().registerProcessingTimeTimer(timeoutTime); I then collect the events in the onTimer method. I am also using a timeout (time window) of 3 seconds here.
public class SortKeyedProcessFunction extends KeyedProcessFunction<String, Tuple2<String, Integer>, Event> {
private static final long serialVersionUID = 7289761960983988878L;
// delay after which an alert flag is thrown
private final long timeOut;
// state to remember the last timer set
private ValueState<List<Event>> listState = null;
private ValueState<Long> lastTime = null;
public SortKeyedProcessFunction(long timeOut) {
this.timeOut = timeOut;
public void open(Configuration conf) {
// setup timer and HLL state
ValueStateDescriptor<List<Event>> descriptor = new ValueStateDescriptor<>(
// state name
// type information of state
TypeInformation.of(new TypeHint<List<Event>>() {
listState = getRuntimeContext().getState(descriptor);
ValueStateDescriptor<Long> descriptorLastTime = new ValueStateDescriptor<Long>(
TypeInformation.of(new TypeHint<Long>() {
lastTime = getRuntimeContext().getState(descriptorLastTime);
public void processElement(Tuple2<String, Integer> value, Context context, Collector<Event> collector) throws Exception {
// get current time and compute timeout time
long currentTime = context.timerService().currentProcessingTime();
long timeoutTime = currentTime + timeOut;
// register timer for timeout time
List<Event> queue = listState.value();
if (queue == null) {
queue = new ArrayList<Event>();
Long current = lastTime.value();
queue.add(new Event(value.f0, value.f1));
public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
// System.out.println("onTimer: " + timestamp);
// check if this was the last timer we registered
System.out.println("timestamp: " + timestamp);
List<Event> queue = listState.value();
Long current = lastTime.value();
if (timestamp == current.longValue()) {
queue.forEach( e -> {
class Event implements Comparable<Event> {
    String value;
    Integer qtd;

    public Event(String value, Integer qtd) {
        this.value = value;
        this.qtd = qtd;
    }

    public String getValue() { return value; }
    public Integer getQtd() { return qtd; }

    public String toString() {
        return "Event{" + "value='" + value + '\'' + ", qtd=" + qtd + '}';
    }

    public int compareTo(@NotNull Event event) {
        return this.getValue().compareTo(event.getValue());
    }
}
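The buffer-then-sort idea behind the process function can be sketched in plain Java, outside Flink. This is only an illustration under assumptions: the class SortBufferSketch and its add() and flush() methods are hypothetical stand-ins for processElement and onTimer, and the nested Event here orders by the word, as the compareTo above does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java stand-in; it does not use any Flink API.
public class SortBufferSketch {
    static final class Event implements Comparable<Event> {
        final String value;
        final int qtd;

        Event(String value, int qtd) {
            this.value = value;
            this.qtd = qtd;
        }

        // Orders events by the word, not by the count.
        @Override
        public int compareTo(Event other) {
            return value.compareTo(other.value);
        }

        @Override
        public String toString() {
            return value + "=" + qtd;
        }
    }

    private final List<Event> buffer = new ArrayList<>();

    // Stands in for processElement: just remember the event.
    void add(String value, int qtd) {
        buffer.add(new Event(value, qtd));
    }

    // Stands in for onTimer: emit everything buffered so far, sorted.
    List<Event> flush() {
        List<Event> sorted = new ArrayList<>(buffer);
        Collections.sort(sorted);
        buffer.clear();
        return sorted;
    }

    public static void main(String[] args) {
        SortBufferSketch sketch = new SortBufferSketch();
        sketch.add("swim", 5);
        sketch.add("basketball", 9);
        sketch.add("soccer", 7);
        System.out.println(sketch.flush()); // prints [basketball=9, soccer=7, swim=5]
    }
}
```

To order by the number of occurrences instead, compareTo would compare qtd rather than value.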
So when I use $ nc -lk 9000 and type the words on the console I see them in order on the output
Event{value='soccer', qtd=7}
Event{value='swim', qtd=5}
Event{value='basketball', qtd=9}
Event{value='soccer', qtd=8}
Event{value='swim', qtd=6}
The other UDFs are for the other transformations of the stream program and they are here for completeness.
public class SplitterFlatMap implements FlatMapFunction<String, Tuple2<String, Integer>> {
    private static final long serialVersionUID = 3121588720675797629L;
    public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) throws Exception {
        for (String word : sentence.split(" ")) {
            out.collect(Tuple2.of(word, 1));
        }
    }
}

public class WordKeySelector implements KeySelector<Tuple2<String, Integer>, String> {
    public String getKey(Tuple2<String, Integer> value) throws Exception {
        return value.f0;
    }
}

public class SumReducer implements ReduceFunction<Tuple2<String, Integer>> {
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> event1, Tuple2<String, Integer> event2) throws Exception {
        return Tuple2.of(event1.f0, event1.f1 + event2.f1);
    }
}
The .sum(1) method will do everything you need (there is no need for apply()), as long as the Splitter class (which should be a FlatMapFunction) emits Tuple2<String, Integer> records, where the String is the word and the Integer is always 1.
.sum(1) will then do the aggregation for you. If you need something different from what sum() does, you would typically use .reduce(new MyCustomReduceFunction()), as that is the most efficient and scalable approach: it aggregates incrementally instead of buffering many records in memory.
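As a rough plain-Java analogy (not Flink code), this is what an incremental per-key sum amounts to: one running count is kept per word and updated for each incoming record, so no records need to be buffered. The class and method names are hypothetical. The sketch emits after every record, as a rolling aggregation on a keyed stream without a window would; on a windowed stream the result would instead be emitted once per window.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical analogy only; it does not use any Flink API.
public class RollingSumSketch {
    // For each incoming word, update that word's running count and
    // emit the updated (word, count) pair.
    static List<Map.Entry<String, Integer>> rollingSum(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String word : words) {
            int updated = counts.merge(word, 1, Integer::sum);
            emitted.add(new SimpleEntry<>(word, updated));
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(rollingSum(Arrays.asList("swim", "soccer", "swim")));
        // prints [swim=1, soccer=1, swim=2]
    }
}
```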

Spring MVC with Thread

My thread class is throwing a NullPointerException. Please help me resolve it.
public class AlertsToProfile extends Thread {
public final Map<Integer, List<String>> userMessages = Collections.synchronizedMap(new HashMap<Integer, List<String>>());
ProfileDAO profileDAO;
private String categoryType;
private String dataMessage;
public String getCategoryType() {
return categoryType;
public void setCategoryType(String categoryType) {
this.categoryType = categoryType;
public String getDataMessage() {
return dataMessage;
public void setDataMessage(String dataMessage) {
this.dataMessage = dataMessage;
public void run() {
String category=getCategoryType();
String data= getDataMessage();
List<Profile> all = profileDAO.findAll();
if (all != null) {
if (category == "All" || category.equalsIgnoreCase("All")) {
for (Profile profile : all) {
List<String> list = userMessages.get(profile.getId());
if (list == null ) {
ArrayList<String> strings = new ArrayList<String>();
userMessages.put(profile.getId(), strings);
} else {
My service method is as follows:
public class NoteManager
@Autowired AlertsToProfile alertsToProfile;
public void addNote(String type, String message, String category) {
String data = type + "," + message;
System.out.println("addNotes is done");
But when I call the start() method I get a NullPointerException. Please help me; I am new to using threads with Spring.
It's pretty obvious: you instantiate your thread directly, as opposed to letting Spring create AlertsToProfile and autowire your instance.
To fix this, create a Runnable around your run() method and embed that into a method, something like this:
public void startThread() {
    new Thread(new Runnable() {
        public void run() {
            // your code in here
        }
    }).start();
}
You will want to bind the Thread instance to a field in AlertsToProfile in order to avoid leaks and to stop the thread when you're done.
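A related way to avoid the null field, sketched here in plain Java under assumptions, is to hand the dependency to the Runnable through its constructor, so it does not matter who creates the task. ProfileLookup is a hypothetical stand-in for ProfileDAO, and AlertsTask and deliver are made-up names. In a Spring application the enclosing bean would receive the real DAO via @Autowired and pass it on.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class AlertsTaskDemo {
    // Hypothetical stand-in for ProfileDAO; only what the sketch needs.
    interface ProfileLookup {
        List<Integer> findAllIds();
    }

    // The collaborator arrives through the constructor, so run() never
    // sees a null field, no matter who creates the task.
    static final class AlertsTask implements Runnable {
        private final ProfileLookup lookup;
        private final String message;
        private final Map<Integer, List<String>> userMessages;

        AlertsTask(ProfileLookup lookup, String message, Map<Integer, List<String>> userMessages) {
            this.lookup = Objects.requireNonNull(lookup, "lookup");
            this.message = message;
            this.userMessages = userMessages;
        }

        @Override
        public void run() {
            for (Integer id : lookup.findAllIds()) {
                userMessages.computeIfAbsent(id, k -> new CopyOnWriteArrayList<>()).add(message);
            }
        }
    }

    // Starts the task on its own thread and waits for it to finish.
    static Map<Integer, List<String>> deliver(ProfileLookup lookup, String message) {
        Map<Integer, List<String>> userMessages = new ConcurrentHashMap<>();
        Thread worker = new Thread(new AlertsTask(lookup, message, userMessages));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return userMessages;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> result = deliver(() -> Arrays.asList(1, 2), "info,hello");
        System.out.println(result.get(1)); // prints [info,hello]
    }
}
```

Failing fast in the constructor with requireNonNull also moves the error to the place where the task is built, rather than somewhere inside run() on another thread.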

GWT retrieve list from datastore via serviceimpl

Hi, I'm trying to retrieve a LinkedHashSet from the Google datastore, but nothing seems to happen. I want to display the results in a Grid on a page using GWT. I have put System.out.println() calls in all the classes to see where it goes wrong, but only one of them prints and I don't receive any errors. I use 6 classes: 2 in the server package (ContactDAOJdo/ContactServiceImpl) and 4 in the client package (ContactService/ContactServiceAsync/ContactListDelegate/ContactListGui). I hope someone can explain why this isn't working and point me in the right direction.
public class ContactDAOJdo implements ContactDAO {
    public LinkedHashSet<Contact> listContacts() {
        PersistenceManager pm = PmfSingleton.get().getPersistenceManager();
        String query = "select from " + Contact.class.getName();
        System.out.print("ContactDAOJdo: ");
        return (LinkedHashSet<Contact>) pm.newQuery(query).execute();
    }
}
public class ContactServiceImpl extends RemoteServiceServlet implements ContactService{
private static final long serialVersionUID = 1L;
private ContactDAO contactDAO = new ContactDAOJdo() {
public LinkedHashSet<Contact> listContacts() {
LinkedHashSet<Contact> contacts = contactDAO.listContacts();
System.out.println("service imp "+contacts);
return contacts;
public interface ContactService extends RemoteService {
LinkedHashSet<Contact> listContacts();
public interface ContactServiceAsync {
void listContacts(AsyncCallback<LinkedHashSet <Contact>> callback);
public class ListContactDelegate {
private ContactServiceAsync contactService = GWT.create(ContactService.class);
ListContactGUI gui;
void listContacts(){
contactService.listContacts(new AsyncCallback<LinkedHashSet<Contact>> () {
public void onFailure(Throwable caught) {
System.out.println("delegate "+caught);
public void onSuccess(LinkedHashSet<Contact> result) {
System.out.println("delegate "+result);
public class ListContactGUI {
protected Grid contactlijst;
protected ListContactDelegate listContactService;
private Label status;
public void init() {
status = new Label();
contactlijst = new Grid();
status.setText("Contact list is being retrieved");
public void service_eventListRetrievedFromService(LinkedHashSet<Contact> result){
System.out.println("1 service eventListRetreivedFromService "+result);
status.setText("Retrieved contactlist list");
this.contactlijst.resizeRows(1 + result.size());
int row = 1;
this.contactlijst.setWidget(0, 0, new Label ("Voornaam"));
this.contactlijst.setWidget(0, 1, new Label ("Achternaam"));
for(Contact contact: result) {
this.contactlijst.setWidget(row, 0, new Label (contact.getVoornaam()));
this.contactlijst.setWidget(row, 1, new Label (contact.getVoornaam()));
System.out.println("voornaam: "+contact.getVoornaam());
System.out.println("2 service eventListRetreivedFromService "+result);
public void placeWidgets() {
System.out.println("placewidget inside listcontactgui" + contactlijst);
public void service_eventListContactenFailed(Throwable caught) {
status.setText("Unable to retrieve contact list from database.");
It could be that the query returns a lazy list, which means not all values are in the list at the moment the list is sent to the client. I used a trick of just calling size() on the list (I'm not sure how I arrived at that solution, but it seems to work):
public LinkedHashSet<Contact> listContacts() {
    final PersistenceManager pm = PmfSingleton.get().getPersistenceManager();
    try {
        final LinkedHashSet<Contact> contacts =
            (LinkedHashSet<Contact>) pm.newQuery(Contact.class).execute();
        contacts.size(); // this triggers fetching all values
        return contacts;
    } finally {
        pm.close();
    }
}
But I'm not sure if this is the best practice...
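An alternative to the size() trick, sketched here under the assumption that the query result is some kind of java.util.List (the concrete type depends on the JDO implementation, so treat that as a guess), is to copy the result into a new LinkedHashSet while the PersistenceManager is still open. Copying iterates over every element, and the copy is an ordinary collection. The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper; a plain list stands in for the query result.
public class MaterializeSketch {
    // Copies every element of a (possibly lazy) result into a plain
    // LinkedHashSet, keeping encounter order and dropping duplicates.
    static <T> LinkedHashSet<T> materialize(List<T> queryResult) {
        return new LinkedHashSet<>(queryResult);
    }

    public static void main(String[] args) {
        List<String> result = Arrays.asList("Anna", "Bob", "Anna");
        System.out.println(materialize(result)); // prints [Anna, Bob]
    }
}
```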
