Where to add StateStore in SpringBoot Kafka Stream - spring-boot

I'm trying to use the Topology Test Driver which requires the topology in its constructor.
However, while the application itself works fine it fails in my KStream unit tests with the following error:
"StateStore ... is already added"
Here's my KStream that I'd like to test (shortened):
#Bean
public KStream<...,...> kstream(StreamsBuilder builder) {
builder.addStateStore(...);
KStream<...,...> stream = builder.stream().filter(...).etc()
stream.process(()->...);
return stream;
}
My test (shortened)
def "..."() {
given:
...
StreamsBuilder builder = new StreamsBuilder();
MyStreamService myStreamService = new MyStreamService(...stubbed);
KStream mykStream = myStreamService.kStream(builder);
TopologyTestDriver driver = new TopologyTestDriver(builder.build(), ...)
...
As soon as I run builder.build() to get the Topology it throws the above error - however I don't understand why as I'm only calling addStateStore once in that very place. I removed the entire stream logic except for the .addStoreStore() method to see if any of the other methods would initialize it (map, filter, process etc.) to no avail.
I understand there are other ways to test Kafka streams but I'm specifically trying to get it working the above explained way. If this is not possible that's okay.

In the application.yml file try to add the following line:
spring:kafka:streams:state-dir: "dir-path"
kstreams will use the "dir-path" directory for state-stores.

Related

How to test ApplicationStartedEvent in Spring Boot?

I need to write an application startup event listener, here I have an #EventListener:
#EventListener
public void onApplicationEvent(ApplicationStartedEvent startedEvent)
How do I go with this? What I have done until now is, wrote many other unit tests but don't understand if I should (somehow) create an object for the ApplicationStartedEvent which doesn't sound right.
Other questions on SO like this one provide information about creating tests for custom events but this isn't a custom event and I don't want to create this object manually.
I solved it differently as I wanted to unit-test this, but seems like we can only write an Integration Test for this.
Used the following to create sample data:
private static final EasyRandomParameters EASY_RANDOM_PARAMETERS = new EasyRandomParameters()
.seed(123L)
.objectPoolSize(100)
.randomizationDepth(3)
.charset(StandardCharsets.UTF_8)
.timeRange(LocalTime.of(9, 0), LocalTime.of(17, 0))
.stringLengthRange(5, 50)
.collectionSizeRange(1, 10)
.scanClasspathForConcreteTypes(true)
.overrideDefaultInitialization(false)
.ignoreRandomizationErrors(true);
public static <T> T create(Class<T> ofType) {
EasyRandom easyRandom = new EasyRandom(EASY_RANDOM_PARAMETERS);
return easyRandom.nextObject(ofType);
}
And used it like:
ApplicationStartedEvent applicationStartedEvent = create(ApplicationStartedEvent.class);
Then called the required method. Not exactly the event listener of course but did the job.

Kafka Streams - The state store may have migrated to another instance

I'm writing a basic application to test the Interactive Queries feature of Kafka Streams. Here is the code:
public static void main(String[] args) {
StreamsBuilder builder = new StreamsBuilder();
KeyValueBytesStoreSupplier waypointsStoreSupplier = Stores.persistentKeyValueStore("test-store");
StoreBuilder waypointsStoreBuilder = Stores.keyValueStoreBuilder(waypointsStoreSupplier, Serdes.ByteArray(), Serdes.Integer());
final KStream<byte[], byte[]> waypointsStream = builder.stream("sample1");
final KStream<byte[], TruckDriverWaypoint> waypointsDeserialized = waypointsStream
.mapValues(CustomSerdes::deserializeTruckDriverWaypoint)
.filter((k,v) -> v.isPresent())
.mapValues(Optional::get);
waypointsDeserialized.groupByKey().aggregate(
() -> 1,
(aggKey, newWaypoint, aggValue) -> {
aggValue = aggValue + 1;
return aggValue;
}, Materialized.<byte[], Integer, KeyValueStore<Bytes, byte[]>>as("test-store").withKeySerde(Serdes.ByteArray()).withValueSerde(Serdes.Integer())
);
final KafkaStreams streams = new KafkaStreams(builder.build(), new StreamsConfig(createStreamsProperties()));
streams.cleanUp();
streams.start();
ReadOnlyKeyValueStore<byte[], Integer> keyValueStore = streams.store("test-store", QueryableStoreTypes.keyValueStore());
KeyValueIterator<byte[], Integer> range = keyValueStore.all();
while (range.hasNext()) {
KeyValue<byte[], Integer> next = range.next();
System.out.println(next.value);
}
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
protected static Properties createStreamsProperties() {
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "random167");
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "client-id");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
streamsConfiguration.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, Serdes.Integer().getClass().getName());
//streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10000);
return streamsConfiguration;
}
So my problem is, every time I run this I get this same error:
Exception in thread "main" org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, test-store, may have migrated to another instance.
I'm running only 1 instance of the application, and the topic I'm consuming from has only 1 partition.
Any idea what I'm doing wrong ?
Looks like you have a race condition. From the kafka streams javadoc for KafkaStreams::start() it says:
Start the KafkaStreams instance by starting all its threads. This function is expected to be called only once during the life cycle of the client.
Because threads are started in the background, this method does not block.
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html
You're calling streams.store() immediately after streams.start(), but I'd wager that you're in a state where it hasn't initialized fully yet.
Since this is code appears to be just for testing, add a Thread.sleep(5000) or something in there and give it a go. (This is not a solution for production) Depending on your input rate into the topic, that'll probably give a bit of time for the store to start filling up with events so that your KeyValueIterator actually has something to process/print.
Probably not applicable to OP but might help others:
In trying to retrieve a KTable's store, make sure the the KTable's topic exists first or you'll get this exception.
I failed to call Storebuilder before consuming the store.
Typically this happens for two reasons:
The local KafkaStreams instance is not yet ready (i.e., not yet in
runtime state RUNNING, see Run-time Status Information) and thus its
local state stores cannot be queried yet. The local KafkaStreams
instance is ready (e.g. in runtime state RUNNING), but the particular
state store was just migrated to another instance behind the scenes.
This may notably happen during the startup phase of a distributed
application or when you are adding/removing application instances.
https://docs.confluent.io/platform/current/streams/faq.html#handling-invalidstatestoreexception-the-state-store-may-have-migrated-to-another-instance
The simplest approach is to guard against InvalidStateStoreException when calling KafkaStreams#store():
// Example: Wait until the store of type T is queryable. When it is, return a reference to the store.
public static <T> T waitUntilStoreIsQueryable(final String storeName,
final QueryableStoreType<T> queryableStoreType,
final KafkaStreams streams) throws InterruptedException {
while (true) {
try {
return streams.store(storeName, queryableStoreType);
} catch (InvalidStateStoreException ignored) {
// store not yet ready for querying
Thread.sleep(100);
}
}
}

Spring Batch multiple readers for different DB's

I have an existing spring batch project which reads data from MySQL or ArangoDB(NoSql database) based on feature toggle decision during startup and does some process and again writes back to MySQL/ArangoDB.
Now the reader configuration for MySQL is something like below,
#Bean
#Primary
#StepScope
public HibernatePagingItemReader reader(
#Value("#{jobParameters[oldMetadataDefinitionId]}") Long oldMetadataDefinitionId) {
Map<String, Object> queryParameters = new HashMap<>();
queryParameters.put(Constants.OLD_METADATA_DEFINITION_ID, oldMetadataDefinitionId);
HibernatePagingItemReader<Long> reader = new HibernatePagingItemReader<>();
reader.setUseStatelessSession(false);
reader.setPageSize(250);
reader.setParameterValues(queryParameters);
reader.setSessionFactory(((HibernateEntityManagerFactory) entityManagerFactory.getObject()).getSessionFactory());
return reader;
}
and i have another arango reader like below,
#Bean
#StepScope
public ListItemReader arangoReader(
#Value("#{jobParameters[oldMetadataDefinitionId]}") Long oldMetadataDefinitionId) {
List<InstanceDTO> instanceList = new ArrayList<InstanceDTO>();
PersistenceService arangoPersistence = arangoConfiguration
.getPersistenceService());
List<Long> instanceIds = arangoPersistence.getDefinitionInstanceIds(oldMetadataDefinitionId);
instanceIds.forEach((instanceId) ->
{
InstanceDTO instanceDto = new InstanceDTO();
instanceDto.setDefinitionID(oldMetadataDefinitionId);
instanceDto.setInstanceID(instanceId);
instanceList.add(instanceDto);
});
return new ListItemReader(instanceList);
}
and my step configuration is below,
#Bean
#SuppressWarnings("unchecked")
public Step InstanceMergeStep(ListItemReader arangoReader, ItemWriter<MetadataInstanceDTO> arangoWriter,
ItemReader<Long> mysqlReader, ItemWriter<Long> mysqlWriter) {
Step step = null;
if (arangoUsage) {
step = steps.get("arangoInstanceMergeStep")
.<Long, Long>chunk(1)
.reader(arangoReader)
.writer(arangoWriter)
.faultTolerant()
.skip(Exception.class)
.skipLimit(10)
.taskExecutor(stepTaskExecutor())
.build();
((TaskletStep) step).registerChunkListener(chunkListener);
}
else {
step = steps.get("mysqlInstanceMergeStep")
.<Long, Long>chunk(1)
.reader(mysqlReader)
.writer(mysqlWriter)
.faultTolerant()
.skip(Exception.class)
.skipLimit(failedSkipLimit)
.taskExecutor(stepTaskExecutor())
.build();
((TaskletStep) step).registerChunkListener(chunkListener);
}
return step;
}
The MySQL reader has pagination support through HibernatePagingItemReader so that it will handle millions of items without any memory issue.
I want to implement the same pagination support for arango reader to fetch only 250 documents per iteration how can modify the arango reader code to acheive this?
First of all documentation of ListItemReader says that - Useful for testing so don't use it for production. Return an ItemReader instead from all your reader beans instead of actual concrete types.
Having said that, Spring Batch API or Spring Data doesn't seem to supporting Arango DB . Closest that I could find is this
( I have not worked with Arango DB before ) .
So in my opinion, you have to write your own custom arango reader that implements paging by possibly implementing abstract class - org.springframework.batch.item.database.AbstractPagingItemReader
If its not doable by extending above class, you might have to implement everything from scratch. All of pagination readers in Spring Batch API extend this abstract class including HibernatePagingItemReader.
Also, remember that arango record set should have some kind of ordering to implement pagination so we can distinguish between page - 0 & page -1 etc ( similar to ORDER BY clause , BETWEEN Operator & less than , greater than operators etc in SQL. Also FETCH FIRST XXX ROWS OR LIMIT clause kind of thing would be needed too ) .
Implementing by your own is not a very tough task as you have to calculate total possible items , order them and then divide into pages and fetch only one page at a time.
Look at API for implementations like - HibernatePagingItemReader etc to get ideas.
Hope it helps !!

PowerMockito testing for rabbitmq akka streams

I am writing some testcases using RabbitMq library provided by
io.scalac.amqp
The testcase is to test consume message scenario, the code for which is something like and runs fine.
Source<Object, NotUsed> src = Source.fromPublisher(conn.consume(props.getQueue(), 1, false)).map(msg -> {
//some transformation to `ByteString` from "`Delivery`" msg
}
conn above is io.scalac.amqp.impl.RabbitConnection passed as io.scalac.amqp.impl.Connection
Parent class io.scalac.amqp.impl.Connection returns Publisher<Delivery>
while io.scalac.amqp.impl.RabbitConnection returns QueuePublisher
I write the testcase using PowerMock and spy Source class and mock like this
Source<Delivery,NotUsed> del;//Creating this object before calling the test
PowerMockito.spy(Source.class);
PowerMockito.when(Source.fromPublisher(Mockito.isA(Publisher.class))).thenReturn(del);
//Also tried
//PowerMockito.when(Source.fromPublisher(Mockito.isA(QueuePublisher.class))).thenReturn(del);
Is there any way I can test the consuming code through mocks?

How to register a Renderer with CRaSH

After reading about the remote shell in the Spring Boot documentation I started playing around with it. I implemented a new Command that produces a Stream of one of my database entities called company.
This works fine. So I want to output my stream of companies in the console. This is done by calling toString() by default. While this seams reasonable there is also a way to get nicer results by using a Renderer.
Implementing one should be straight forward as I can delegate most of the work to one of the already existing ones. I use MapRenderer.
class CompanyRenderer extends Renderer<Company> {
private final mapRenderer = new MapRenderer()
#Override Class<Company> getType() { Company }
#Override LineRenderer renderer(Iterator<Company> stream) {
def list = []
stream.forEachRemaining({
list.add([id: it.id, name: it.name])
})
return mapRenderer.renderer(list.iterator())
}
}
As you can see I just take some fields from my entity put them into a Mapand then delegate to a instance of MapRenderer to do the real work.
TL;DR
Only problem is: How do I register my Renderer with CRaSH?
Links
Spring Boot documentation http://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-remote-shell.html
CRaSH documentation (not helping) http://www.crashub.org/1.3/reference.html#_renderers

Resources