Is there a way MySQLMaxValueIncrementer can connect to multiple datasources in Spring Batch?

I have two datasources in my project: datasource1 connects to datahost1 and datasource2 connects to datahost2. I have two jobs, one for datasource1 and one for datasource2. The MySQLMaxValueIncrementer should get the next incremental ID from datasource1 for the first job and from datasource2 for the second job. Since there is only one MySQLMaxValueIncrementer, it connects to a single datasource. Is there any way to solve this by dynamically choosing the datasource based on a condition at runtime?

The JobRepository can be configured with a single DataFieldMaxValueIncrementer (through a DataFieldMaxValueIncrementerFactory).
If you want to use the same JobRepository for both jobs, then you need to provide a custom incrementer that is able to handle two datasources.
Otherwise, you would need to create a separate JobRepository for each job.
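A minimal sketch of that second option, assuming dataSource1/dataSource2 and matching transaction manager beans already exist in a configuration class (all names are placeholders), could look like the following; the incrementer factory is the hook through which the MySQLMaxValueIncrementer instances get bound to each datasource:
@Bean
public JobRepository jobRepository1(DataSource dataSource1, PlatformTransactionManager txManager1) throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource1);
    factory.setTransactionManager(txManager1);
    // incrementers (MySQLMaxValueIncrementer for MySQL) are created against dataSource1
    factory.setIncrementerFactory(new DefaultDataFieldMaxValueIncrementerFactory(dataSource1));
    factory.setDatabaseType("MYSQL");
    factory.afterPropertiesSet();
    return factory.getObject();
}

@Bean
public JobRepository jobRepository2(DataSource dataSource2, PlatformTransactionManager txManager2) throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource2);
    factory.setTransactionManager(txManager2);
    factory.setIncrementerFactory(new DefaultDataFieldMaxValueIncrementerFactory(dataSource2));
    factory.setDatabaseType("MYSQL");
    factory.afterPropertiesSet();
    return factory.getObject();
}
Each job (and its JobLauncher) then has to be built against its own repository so that the batch metadata, and therefore the sequence values produced by the incrementer, come from the corresponding datasource.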

Related

Spring Boot Batch: Paging Doesn't Appear to be Working as Expected With DB2 Datasource

I've been working on a Spring Boot batch job to pull data from a DB2 AS400 system. The number of records to be retrieved in the end game is 1 million+, but for now I'm working with a smaller subset of 100,000 records. I configured a JdbcPagingItemReader with a pageSize of 40,000 and a fetchSize of 100,000. The step chunks the data into chunks of 20,000. When I ran with logging set to trace, I noticed that the query was modified with FETCH FIRST 40000 ROWS ONLY, which at first I didn't think was an issue until I noticed that the job was only retrieving 40,000 records and then ending. I could very well be misunderstanding how paging is supposed to work with Spring Batch, but my assumption was that Spring uses the fetchSize as the total amount to retrieve per query, which would then be split into pages of 40,000 each, and then chunked into 20,000 records per chunk. My end goal is to process all 100,000 records while maintaining a high performance bar. Any help explaining how paging actually works would be most helpful. Code examples below.
Spring Boot v2.3.3
Spring Batch Core v4.2.4
Step Bean
#Bean(name = "step-one")
public Step stepOne(#Autowired JdbcPagingItemReader<MyPojo> pagingItemReader) {
return stepBuilderFactory.get("step-one")
.<Product.ProductBuilder, Product>chunk(chunkSize)
.reader(pagingItemReader)
.processor(itemProcessor)
.writer(itemWriter)
.taskExecutor(coreTaskExecutor)
.listener(chunkListener)
.allowStartIfComplete(true)
.build();
}
JdbcPagingItemReader Bean
@StepScope
@Bean(name = "PagingItemReader")
public JdbcPagingItemReader<MyPojo> pagingItemReader(@Autowired PagingQueryProvider queryProvider) {
    return new JdbcPagingItemReaderBuilder<MyPojo>().name("paging-reader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .rowMapper(mapper)
            .pageSize(pageSize)
            .fetchSize(fetchSize)
            .saveState(false)
            .build();
}
Properties
#APPLICATION BATCH CONFIGURATION
application.batch.page-size=40000
application.batch.chunk-size=10000
application.batch.fetch-size=80000
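For reference, the reader above autowires a PagingQueryProvider bean that is not shown. A hypothetical bean along these lines (table and column names are placeholders) would typically back it against DB2; with a JdbcPagingItemReader, pageSize controls how many rows each generated query returns (hence the FETCH FIRST 40000 ROWS ONLY clause), while fetchSize is only a JDBC fetch-size hint handed to the driver:
@Bean
public PagingQueryProvider queryProvider(DataSource dataSource) throws Exception {
    // SqlPagingQueryProviderFactoryBean detects the database type (DB2 here) from the
    // DataSource metadata and builds the per-page queries used by the reader.
    SqlPagingQueryProviderFactoryBean factory = new SqlPagingQueryProviderFactoryBean();
    factory.setDataSource(dataSource);
    factory.setSelectClause("SELECT ID, NAME, PRICE");
    factory.setFromClause("FROM MY_SCHEMA.MY_TABLE");
    factory.setSortKey("ID"); // a unique, ordered sort key is required for paging
    return factory.getObject();
}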

How to run more than one instance of a KTable-KTable join Kafka Streams application on single-partition Kafka topics?

KTable<Key1, GenericRecord> primaryTable = createKTable(key1, kstream, statestore-name);
KTable<Key2, GenericRecord> childTable1 = createKTable(key1, kstream, statestore-name);
KTable<Key3, GenericRecord> childTable2 = createKTable(key1, kstream, statestore-name);
primaryTable.leftJoin(childTable1, (primary, child1) -> compositeObject)
        .leftJoin(childTable2, (compositeObject, child2) -> compositeObject, Materialized.as("compositeobject-statestore"))
        .toStream().to("composite-topic");
For my application, I am using KTable-KTable joins so that whenever data is received on the primary or a child stream, it can populate a compositeObject that has setters and getters for the data of all three tables. The three incoming streams have different keys, but while creating the KTables I map them all to the same key.
All of my topics have a single partition. When I run the application as a single instance, everything runs fine and I can see compositeObject populated with data from all three tables.
All interactive queries also run fine, passing the record ID and the local state store name.
But when I run two instances of the same application, I see compositeObject with the primary and child1 data, while child2 remains empty. Even if I try to call the state store using an interactive query, it doesn't return anything.
I am using the spring-cloud-stream-kafka-streams libraries for writing the code.
Please suggest why child2 is not being set and what the right solution would be to handle this.
Kafka Streams' scaling model is coupled to the number of input topic partitions: the partition count determines your maximum parallelism, so with single-partition input topics you cannot scale out.
Thus, you would need to create new topics with a higher partition count.
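A sketch of what that could look like with the Kafka AdminClient is below; topic names, partition count, and replication factor are illustrative placeholders, and all topics that are joined together must keep the same partition count so they remain co-partitioned.
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateScaledTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // joined topics must stay co-partitioned, so all get the same partition count
            admin.createTopics(Arrays.asList(
                    new NewTopic("primary-topic-v2", 4, (short) 3),
                    new NewTopic("child1-topic-v2", 4, (short) 3),
                    new NewTopic("child2-topic-v2", 4, (short) 3)))
                 .all().get(); // blocks until the topics are created
        }
    }
}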

How to use Spring Batch CompositeItemWriter with different data and two JdbcBatchItemWriters

I need a solution to write a composite writer with two JdbcBatchItemWriters that also handles different data sets.
You can find an example in the spring-batch-samples repository. That sample shows how to use a composite item writer with two flat-file item writers, but you can adapt it to use two JDBC batch item writers.
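Adapted to two JdbcBatchItemWriters, a rough sketch could look like the following; the item type, SQL statements, and datasource names are placeholders:
@Bean
public CompositeItemWriter<MyItem> compositeWriter(DataSource dataSource1, DataSource dataSource2) {
    // first delegate writes to the first datasource
    JdbcBatchItemWriter<MyItem> writer1 = new JdbcBatchItemWriterBuilder<MyItem>()
            .dataSource(dataSource1)
            .sql("INSERT INTO TABLE_ONE (ID, NAME) VALUES (:id, :name)")
            .beanMapped()
            .build();

    // second delegate writes to the second datasource
    JdbcBatchItemWriter<MyItem> writer2 = new JdbcBatchItemWriterBuilder<MyItem>()
            .dataSource(dataSource2)
            .sql("INSERT INTO TABLE_TWO (ID, NAME) VALUES (:id, :name)")
            .beanMapped()
            .build();

    CompositeItemWriter<MyItem> composite = new CompositeItemWriter<>();
    composite.setDelegates(Arrays.asList(writer1, writer2));
    return composite;
}
If the two writers should receive different subsets of the items rather than every item, a ClassifierCompositeItemWriter can be used instead to route each item to the appropriate delegate.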

Read data through Spring Batch and return the data outside the job

I read everywhere how to read data with a Spring Batch ItemReader and write it to a database with an ItemWriter, but I want to just read the data using Spring Batch and then somehow access this list of items outside the job. I need to perform the remaining processing after the job has finished.
The reason I want to do this is that I need to perform a lot of validations on every item. I have to check each item's variable xyz against a list (which is not available within the job). After performing a lot of processing I have to insert the information into different tables using JPA. Please help me out!
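One hedged possibility, assuming the job is launched synchronously, is to use a ListItemWriter as the step's writer and read the collected items back once the job has finished; the bean and type names below are placeholders:
@Bean
public ListItemWriter<MyItem> collectingWriter() {
    // keeps every written item in memory so it can be inspected after the job
    return new ListItemWriter<>();
}

public void runAndPostProcess(JobLauncher jobLauncher, Job job,
                              ListItemWriter<MyItem> collectingWriter) throws Exception {
    JobExecution execution = jobLauncher.run(job, new JobParameters());
    if (execution.getStatus() == BatchStatus.COMPLETED) {
        List<? extends MyItem> items = collectingWriter.getWrittenItems();
        // perform the remaining validations and JPA inserts here, outside the job
    }
}
Note that this keeps every item in memory, so it is only workable while the data set fits comfortably in the heap.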

Designing model for Storm Topology

I am using Apache Kafka & Apache Storm integration.
I need to design a model. Here are the specifications of my topology:
I have configured a topic in Kafka, let's say customer1. The Storm bolts read the data from the customer1 kafka-spout, process it, and write it into MongoDB and Cassandra. The database names are the same as the Kafka topic, customer1. The table structure and everything else stay the same.
Now suppose I get a new customer, say customer2. I need to read data from a customer2 kafka-spout and write it into MongoDB and Cassandra, where the database names will be customer2.
I can think of two ways to do it.
I could write a bolt which gets triggered whenever a new customer name is added to a Kafka topic. That bolt would contain code to create and submit the new topology to the cluster.
I could create independent jars for each customer and submit the topologies manually.
I searched a lot about this but couldn't figure out which approach is better.
What are the pros and cons of the above approaches in terms of efficiency, code maintainability, and adding new changes to the existing model?
Is there any other way to handle this ?
