periodic flushing during mybatis batch execution - spring

My batched statements in mybatis are timing out. I'd like to throttle the load I'm sending to the database by flushing the statements periodically. In iBATIS, I used a callback, something like this:
sqlMapClientTemplate.execute(new SqlMapClientCallback<Integer>() {
    @Override
    public Integer doInSqlMapClient(SqlMapExecutor executor) throws SQLException {
        executor.startBatch();
        int tally = 0;
        for (Foo foo : foos) {
            executor.insert("FooSql.insertFoo", foo.getData());
            // executes the batch when tally > MAX_TALLY
            tally = BatchHelper.updateTallyOnMod(executor, tally);
        }
        return executor.executeBatch();
    }
});
Is there a better way to do this in mybatis? Or do I need to do the same type of thing with SqlSessionCallback? This feels cumbersome. What I'd really like to do is configure the project to flush every N batched statements.

I did not get any responses, so I'll share the solution I settled on.
MyBatis provides direct access to statement flushing. I autowired the SqlSession, used Guava to partition the collection into manageable chunks, and flushed the statements after each chunk.
Iterable<List<Foo>> partitions = Iterables.partition(foos, MAX_TALLY);
for (List<Foo> partition : partitions) {
    for (Foo foo : partition) {
        mapper.insertFoo(foo);
    }
    sqlSession.flushStatements();
}
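One caveat: flushStatements() only has queued work to flush when the session uses MyBatis's batch executor. Here's a minimal wiring sketch, assuming mybatis-spring (the bean name is my own):
import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSessionFactory;
import org.mybatis.spring.SqlSessionTemplate;

@Bean
public SqlSessionTemplate batchSqlSession(SqlSessionFactory sqlSessionFactory) {
    // ExecutorType.BATCH queues statements until flushStatements() is called
    return new SqlSessionTemplate(sqlSessionFactory, ExecutorType.BATCH);
}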

Sorry for the late response; I only just stumbled onto this question. Hopefully it will help others with a similar problem.
You don't need to explicitly autowire the SqlSession; you can use the mapper interface itself. In the mapper interface, simply define a method annotated with the @Flush annotation and with a return type of List<BatchResult>. Here's an example of a method in the mapper interface:
@Flush
List<BatchResult> flushBatchedStatements();
Then simply call this method on your mapper object like so:
Iterable<List<Foo>> partitions = Iterables.partition(foos, MAX_TALLY);
for (List<Foo> partition : partitions) {
    for (Foo foo : partition) {
        mapper.insertFoo(foo);
    }
    mapper.flushBatchedStatements(); // flushes all statements batched up to this point into the table
}
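As a side note, if you want to check how many rows each flush actually wrote, the returned List<BatchResult> carries the JDBC update counts. A small sketch, continuing from the loop above:
List<BatchResult> results = mapper.flushBatchedStatements();
int rowsWritten = 0;
for (BatchResult result : results) {
    // one update count per statement in the batch
    // (some drivers report Statement.SUCCESS_NO_INFO instead of real counts)
    for (int count : result.getUpdateCounts()) {
        rowsWritten += count;
    }
}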
Note that you don't need to add anything special into your mapper XML file to support this type of statement flushing via the mapper interface. Your XML mapper may simply be something like
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace=".....">
    <insert id="bulkInsertIntoTable" parameterType="myPackage.Foo">
        insert into MyDatabaseTable(col1, col2, col3)
        values ( #{fooObj.data1}, #{fooObj.data2}, #{fooObj.data3} )
    </insert>
</mapper>
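Since the XML above refers to #{fooObj.data1}, the mapper method presumably names its parameter via @Param. Here's a sketch of what the interface might look like (FooMapper and the method names are assumptions based on the snippets above):
import java.util.List;
import org.apache.ibatis.annotations.Flush;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.executor.BatchResult;

public interface FooMapper {
    // maps to the <insert id="bulkInsertIntoTable"> statement above
    void bulkInsertIntoTable(@Param("fooObj") Foo fooObj);

    // no XML counterpart needed; MyBatis handles @Flush itself (3.3+)
    @Flush
    List<BatchResult> flushBatchedStatements();
}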
The only requirement is that you use MyBatis 3.3 or higher. Here's what the docs on the MyBatis website state:
If this annotation is used, it can be called the
SqlSession#flushStatements() via method defined at a Mapper
interface.(MyBatis 3.3 or above)
For more details please visit the MyBatis official documentation site:
http://www.mybatis.org/mybatis-3/java-api.html

Related

Spring Batch - use JpaPagingItemReader to read lists instead of individual items

Spring Batch is designed to read and process one item at a time, then write the list of all items processed in a chunk. I want my item to be a List<T> as well, to be thus read and processed, and then write a List<List<T>>. My data source is a standard Spring JpaRepository<T, ID>.
My question is whether there are some standard solutions for this "aggregated" approach. I see that there are some, but they don't read from a JpaRepository, like:
https://github.com/spring-projects/spring-batch/blob/main/spring-batch-samples/src/main/java/org/springframework/batch/sample/domain/multiline/AggregateItemReader.java
Spring Batch - Item Reader and ItemProcessor with a list
Spring Batch- how to pass list of multiple items from input to ItemReader, ItemProcessor and ItemWriter
Update:
I'm looking for a solution that would work for a rapidly changing dataset and in a multithreading environment.
I want my item to be a List<T> as well, to be thus read and processed, and then write a List<List<T>>.
Spring Batch is not (and should not be) aware of what an "item" is. It is up to you to design what an "item" is and how it is implemented (a single value, a list, a stream, etc.). In your case, you can encapsulate the List<T> in a type that can be used as an item and process the data as needed. You would need a custom item reader, though.
The solution we found is to use a custom aggregate reader as suggested here, which accumulates the read data into a list of a given size then passes it along. For our specific use case, we read data using a JpaPagingItemReader. The relevant part is:
public List<T> read() throws Exception {
    ResultHolder holder = new ResultHolder();
    // read until no more results are available or the aggregated size is reached
    while (!itemReaderExhausted && holder.getResults().size() < aggregationSize) {
        process(itemReader.read(), holder);
    }
    if (CollectionUtils.isEmpty(holder.getResults())) {
        return null;
    }
    return holder.getResults();
}

private void process(T readValue, ResultHolder resultHolder) {
    if (readValue == null) {
        itemReaderExhausted = true;
        return;
    }
    resultHolder.addResult(readValue);
}
In order to account for the volatility of the dataset, we extended the JPA reader and overrode the getPage() method to always return 0, and we controlled the dataset through the processor and writer so that the next batch of fresh data is always fetched on the first page. The hint was given here and in some other SO answers.
@Override
public int getPage() {
    return 0;
}
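Put together, the extended reader is tiny; a sketch with an illustrative class name. Note this only works because the processor/writer remove or update the processed rows, so the query's first page always contains fresh data; otherwise the reader would re-read the same rows forever.
import org.springframework.batch.item.database.JpaPagingItemReader;

public class FirstPageJpaPagingItemReader<T> extends JpaPagingItemReader<T> {
    @Override
    public int getPage() {
        return 0; // always re-read the first page of the (shrinking) result set
    }
}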

In Spring Batch, linked with an ItemReader call, I want to call a static util method to populate a string

I have a Spring Batch reader with the following configuration.
The reader reads from the database, one page of records at a time.
@Autowired
private SomeCreditRepot someCreditRepo;

public RepositoryItemReader<SomeCreditModel> reader() {
    RepositoryItemReader<SomeCreditModel> reader = new RepositoryItemReader<>();
    reader.setRepository(someCreditRepo);
    reader.setMethodName("someCreditTransfer");
    // ...
    return reader;
}
I want to call a utility method,
refValue = BatchProcessingUtil.generateSomeRefValue();
before the processor step, so that all the entities fetched by the reader get the same value in the processor.
This refValue will then be written to another table, StoreRefValue.
What is the right way to do this in Spring Batch?
Should I fire the query that writes the refValue to the StoreRefValue table in the processor?
You can let your processor implement the interface StepExecutionListener. You'll then have to implement the methods afterStep and beforeStep. The former can simply return null; in beforeStep you can call the utility method and save its return value.
Alternatively, you can use the @BeforeStep annotation. If you use the usual Java DSL, it's not required to explicitly register the processor as a listener on the step; adding it as the processor should suffice.
There are more details in the reference documentation:
https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#interceptingStepExecution
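For illustration, here's a minimal sketch of the @BeforeStep variant (setRefValue is a hypothetical setter on your model):
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ItemProcessor;

public class SomeCreditProcessor implements ItemProcessor<SomeCreditModel, SomeCreditModel> {

    private String refValue;

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        // runs once before the step starts; every item then sees the same value
        refValue = BatchProcessingUtil.generateSomeRefValue();
    }

    @Override
    public SomeCreditModel process(SomeCreditModel item) {
        item.setRefValue(refValue); // hypothetical setter
        return item;
    }
}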

Spring Batch multiple readers for different DB's

I have an existing Spring Batch project which reads data from MySQL or ArangoDB (a NoSQL database) based on a feature-toggle decision at startup, does some processing, and writes back to MySQL/ArangoDB.
Now the reader configuration for MySQL is something like below,
@Bean
@Primary
@StepScope
public HibernatePagingItemReader reader(
        @Value("#{jobParameters[oldMetadataDefinitionId]}") Long oldMetadataDefinitionId) {
    Map<String, Object> queryParameters = new HashMap<>();
    queryParameters.put(Constants.OLD_METADATA_DEFINITION_ID, oldMetadataDefinitionId);
    HibernatePagingItemReader<Long> reader = new HibernatePagingItemReader<>();
    reader.setUseStatelessSession(false);
    reader.setPageSize(250);
    reader.setParameterValues(queryParameters);
    reader.setSessionFactory(((HibernateEntityManagerFactory) entityManagerFactory.getObject()).getSessionFactory());
    return reader;
}
and I have another Arango reader like below,
@Bean
@StepScope
public ListItemReader arangoReader(
        @Value("#{jobParameters[oldMetadataDefinitionId]}") Long oldMetadataDefinitionId) {
    List<InstanceDTO> instanceList = new ArrayList<InstanceDTO>();
    PersistenceService arangoPersistence = arangoConfiguration.getPersistenceService();
    List<Long> instanceIds = arangoPersistence.getDefinitionInstanceIds(oldMetadataDefinitionId);
    instanceIds.forEach((instanceId) -> {
        InstanceDTO instanceDto = new InstanceDTO();
        instanceDto.setDefinitionID(oldMetadataDefinitionId);
        instanceDto.setInstanceID(instanceId);
        instanceList.add(instanceDto);
    });
    return new ListItemReader(instanceList);
}
and my step configuration is below,
@Bean
@SuppressWarnings("unchecked")
public Step InstanceMergeStep(ListItemReader arangoReader, ItemWriter<MetadataInstanceDTO> arangoWriter,
        ItemReader<Long> mysqlReader, ItemWriter<Long> mysqlWriter) {
    Step step = null;
    if (arangoUsage) {
        step = steps.get("arangoInstanceMergeStep")
                .<Long, Long>chunk(1)
                .reader(arangoReader)
                .writer(arangoWriter)
                .faultTolerant()
                .skip(Exception.class)
                .skipLimit(10)
                .taskExecutor(stepTaskExecutor())
                .build();
        ((TaskletStep) step).registerChunkListener(chunkListener);
    } else {
        step = steps.get("mysqlInstanceMergeStep")
                .<Long, Long>chunk(1)
                .reader(mysqlReader)
                .writer(mysqlWriter)
                .faultTolerant()
                .skip(Exception.class)
                .skipLimit(failedSkipLimit)
                .taskExecutor(stepTaskExecutor())
                .build();
        ((TaskletStep) step).registerChunkListener(chunkListener);
    }
    return step;
}
The MySQL reader supports pagination through HibernatePagingItemReader, so it can handle millions of items without memory issues.
I want the same pagination support for the Arango reader, fetching only 250 documents per iteration. How can I modify the Arango reader code to achieve this?
First of all, the documentation of ListItemReader says that it is "Useful for testing", so don't use it in production. Also, return ItemReader from all your reader beans instead of the actual concrete types.
Having said that, neither the Spring Batch API nor Spring Data seems to support ArangoDB. The closest thing I could find is this (I have not worked with ArangoDB before).
So in my opinion, you have to write your own custom Arango reader that implements paging, possibly by extending the abstract class org.springframework.batch.item.database.AbstractPagingItemReader.
If that's not doable by extending the above class, you might have to implement everything from scratch. All the paging readers in the Spring Batch API extend this abstract class, including HibernatePagingItemReader.
Also, remember that the Arango record set needs some kind of ordering to implement pagination, so that page 0, page 1, and so on can be distinguished (similar to an ORDER BY clause, a BETWEEN operator, and less-than/greater-than operators in SQL; something like a FETCH FIRST n ROWS or LIMIT clause would be needed too).
Implementing it yourself is not a very tough task: you calculate the total number of items, order them, divide them into pages, and fetch only one page at a time.
Look at API implementations such as HibernatePagingItemReader to get ideas.
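To make that concrete, here's a rough sketch of such a reader. It assumes your persistence service can return an ordered page of ids; the offset/limit overload of getDefinitionInstanceIds shown here is hypothetical:
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import org.springframework.batch.item.database.AbstractPagingItemReader;

public class ArangoPagingItemReader extends AbstractPagingItemReader<InstanceDTO> {

    private final PersistenceService arangoPersistence;
    private final Long definitionId;

    public ArangoPagingItemReader(PersistenceService arangoPersistence, Long definitionId) {
        this.arangoPersistence = arangoPersistence;
        this.definitionId = definitionId;
        setPageSize(250); // 250 documents per iteration, as required
    }

    @Override
    protected void doReadPage() {
        if (results == null) {
            results = new CopyOnWriteArrayList<>();
        } else {
            results.clear();
        }
        // hypothetical overload: ordered ids for one page (offset = page * pageSize)
        List<Long> ids = arangoPersistence.getDefinitionInstanceIds(
                definitionId, getPage() * getPageSize(), getPageSize());
        for (Long id : ids) {
            InstanceDTO dto = new InstanceDTO();
            dto.setDefinitionID(definitionId);
            dto.setInstanceID(id);
            results.add(dto);
        }
    }

    @Override
    protected void doJumpToPage(int itemIndex) {
        // no-op: doReadPage derives the offset from getPage()
    }
}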
Hope it helps!

Dynamically initialize multiple data sources in Spring

In my Spring application, I need to dynamically initialize multiple data sources based on values set up in the application configuration.
I am aware of the AbstractRoutingDataSource class provided by the Spring JDBC library, but it only helps when you need to resolve a single data source based on a single lookup key at a time.
Is it possible to extend AbstractRoutingDataSource and change its behavior to support multiple-key lookup and data source resolution? Is there any other alternative approach?
Basically, I am trying to achieve something like this with a DataSourceRouter class extending AbstractRoutingDataSource:
public class DataSourceRouter extends AbstractRoutingDataSource {

    @Value("${com.listdb.datasource.switch}")
    private short listDBSwitch;

    @Value("${com.scoringdb.datasource.switch}")
    private short scoringDbSwitch;

    @Value("${com.configmaster.datasource.switch}")
    private short configDbSwitch;

    private List<String> configuredDataSources;

    /**
     * Determine the current lookup key. This will typically be
     * implemented to check a thread-bound transaction context.
     * <p>Allows for arbitrary keys. The returned key needs
     * to match the stored lookup key type, as resolved by the
     * {@link #resolveSpecifiedLookupKey} method.
     */
    @Override
    protected Object determineCurrentLookupKey() {
        if (!ListUtil.isListNotEmpty(configuredDataSources)) { // build the key list only once
            configuredDataSources = new ArrayList<String>();
            String listDBString = (listDBSwitch == 1) ? DataSources.LIST.toString() : null;
            String configDBString = (configDbSwitch == 1) ? DataSources.CONFIGMASTER.toString() : null;
            String scoringDBString = (scoringDbSwitch == 1) ? DataSources.SCORING.toString() : null;
            // add all configured data source keys for lookup
            configuredDataSources.add(listDBString);
            configuredDataSources.add(configDBString);
            configuredDataSources.add(scoringDBString);
        }
        return configuredDataSources;
    }
}
Any help/suggestions?
This is not really possible with current Spring/Hibernate versions, even though it would be neat to have. If you need multiple data sources and use AbstractRoutingDataSource to achieve this, one possible solution is to let Spring initialize one DB (the default/configuration DB) and add e.g. an init.sql script (or Flyway/Liquibase, if you are more into that) that initializes all the others under the same AbstractRoutingDataSource.
This approach works nicely and gives you more control over your (hopefully test!) environment. Personally, I like to have more control over the DB schema than any auto-initializer can provide, but that's only a matter of taste/style.
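For reference, the standard single-key pattern looks like the sketch below. determineCurrentLookupKey must resolve exactly one key per call, which is why returning a whole list of keys, as attempted above, won't route (DataSourceContextHolder is a hypothetical thread-bound holder, and the bean wiring is simplified):
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

@Bean
public DataSource routingDataSource(DataSource listDb, DataSource scoringDb, DataSource configDb) {
    AbstractRoutingDataSource router = new AbstractRoutingDataSource() {
        @Override
        protected Object determineCurrentLookupKey() {
            // exactly one key per call, e.g. from a thread-bound context
            return DataSourceContextHolder.get(); // hypothetical holder
        }
    };
    Map<Object, Object> targets = new HashMap<>();
    targets.put(DataSources.LIST.toString(), listDb);
    targets.put(DataSources.SCORING.toString(), scoringDb);
    targets.put(DataSources.CONFIGMASTER.toString(), configDb);
    router.setTargetDataSources(targets);
    router.setDefaultTargetDataSource(configDb);
    return router;
}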

What's the best way to pass a huge collection to a Spring Batch Step?

Use case:
A one-time read of data set X (from database) into a Collection C. [Collection size could be say 5000]
Use Collection C to process/enrich items in a Spring Batch Step (say enrichStep)
If C is much greater than what can be passed via ExecutionContext, how can we make it available in the ItemProcessor of the enrichStep?
In your enrichStep, add a StepExecutionListener.beforeStep and load your huge collection into a HugeCollectionBeanHolder bean.
This way you load the collection only once (when the step starts or restarts) and without persisting it into the execution context.
In your enrich processor, wire in the HugeCollectionBeanHolder to access the huge collection.
class HugeCollectionBeanHolder {
    Collection<Item> hugeCollection;

    void setHugeCollection(Collection<Item> c) { this.hugeCollection = c; }
    Collection<Item> getHugeCollection() { return this.hugeCollection; }
}

class MyProcessor implements ItemProcessor<Input, Output> {
    HugeCollectionBeanHolder hcbh;

    void setHugeCollectionBeanHolder(HugeCollectionBeanHolder bean) { this.hcbh = bean; }
    // other methods...
}
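For completeness, a sketch of the loading side via StepExecutionListener (FooRepository is a hypothetical source of the collection; wiring omitted):
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

class CollectionLoadingListener implements StepExecutionListener {
    HugeCollectionBeanHolder holder;
    FooRepository repository; // hypothetical source of the collection

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // runs once per step (re)start; nothing goes into the execution context
        holder.setHugeCollection(repository.findAll());
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return null; // keep the step's own exit status
    }
}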
You can also look at Spring Batch: what is the best way to use, the data retrieved in one TaskletStep, in the processing of another step
