I have a requirement to use Spring Batch to read data through existing logic: the existing target object method queries the database and returns a list of objects.
My task is to read this in chunks. The list size from the existing code is around 15,000, but when implementing Spring Batch I wanted to read it in chunks of 100, and this was not happening through the ItemReaderAdapter.
The code snippets below illustrate the issue. Is this possible with Spring Batch? I looked at the Delegating Job Sample Spring example, but the service there returns an object on every chunk rather than the whole list.
Please advise.
Job.xml
<step id="firststep">
    <tasklet>
        <chunk reader="myreader" writer="mywriter" commit-interval="100" />
    </tasklet>
</step>

<job id="firstjob" incrementer="idIncrementer">
    <step id="step1" parent="firststep" />
</job>

<beans:bean id="myreader" class="org.springframework.batch.item.adapter.ItemReaderAdapter">
    <beans:property name="targetObject" ref="readerService" />
    <beans:property name="targetMethod" value="getCustomer" />
</beans:bean>

<beans:bean id="readerService" class="com.sh.java.ReaderService" />
ReaderService.java
public class ReaderService {

    public List<CustomItem> getCustomer() throws Exception {
        /*
         * code to get database instances
         */
        List<CustomItem> customList = dao.getCustomers(date);
        System.out.println("Customer List Size: " + customList.size()); // Here it is 15K
        return customList;
    }
}
First of all: reading a 15K List<> of objects into memory may hurt performance; check whether you can write a custom SQL query and use a JDBC/Hibernate cursor item reader instead (a minimal sketch follows this answer).
What you are trying to do is not possible with the ItemReaderAdapter (it wasn't designed to read chunks of objects), but you can achieve the same result by writing a custom ItemReader that extends AbstractItemCountingItemStreamItemReader to inherit the ItemStream capabilities and overrides the abstract or no-op methods; in particular:
in doOpen(), call your readerService.getCustomers() and save the List<> in a class variable;
in doRead(), read the next item from the List<> loaded in doOpen(), using the built-in index stored in the ExecutionContext.
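A minimal sketch of the cursor-based alternative, assuming a hypothetical CustomerRowMapper and your own SQL (neither is in the original post); it streams rows from the database instead of loading all 15K objects up front, so the chunk size is then controlled purely by the commit-interval:

import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CursorReaderConfig {

    // Streams rows directly from the database; no full list ever sits in memory.
    @Bean
    public JdbcCursorItemReader<Customer> customerCursorReader(DataSource dataSource) {
        JdbcCursorItemReader<Customer> reader = new JdbcCursorItemReader<>();
        reader.setDataSource(dataSource);
        reader.setSql("select * from customer"); // replace with your own filtered query
        reader.setRowMapper(new CustomerRowMapper()); // hypothetical RowMapper<Customer>
        return reader;
    }
}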
@Bellabax,
Doing it the way you suggested still reads all the database records in doOpen(); however, from the list retrieved in doOpen(), the reader then reads in chunks. Please advise.
CustomerReader.java
public class CustomerReader extends AbstractItemCountingItemStreamItemReader<Customer> {

    private List<Customer> customerList;

    public CustomerReader() {
    }

    @Override
    protected void doClose() throws Exception {
        customerList.clear();
        setMaxItemCount(0);
        setCurrentItemCount(0);
    }

    @Override
    protected void doOpen() throws Exception {
        customerList = dao.getCustomers(date);
        System.out.println("Customer List Size: " + customerList.size()); // This still prints 15K
        setMaxItemCount(customerList.size());
    }

    @Override
    protected Customer doRead() throws Exception {
        // Here reading 15K in chunks!
        return customerList.get(getCurrentItemCount() - 1);
    }
}
I am trying to do change data capture (CDC) from an Oracle DB using Spring Cloud Data Flow with Kafka as the broker. I am using a polling mechanism for this: I poll the database with a basic select query at regular intervals to capture any updated data. To make the system more failure-proof, I have persisted my last poll time in the Oracle DB and use it to fetch the data updated after the last poll.
public MessageSource<Object> jdbcMessageSource() {
    JdbcPollingChannelAdapter jdbcPollingChannelAdapter =
            new JdbcPollingChannelAdapter(this.dataSource, this.properties.getQuery());
    jdbcPollingChannelAdapter.setUpdateSql(this.properties.getUpdate());
    return jdbcPollingChannelAdapter;
}

@Bean
public IntegrationFlow pollingFlow() {
    IntegrationFlowBuilder flowBuilder = IntegrationFlows
            .from(jdbcMessageSource(), spec -> spec.poller(Pollers.fixedDelay(3000)));
    flowBuilder.channel(this.source.output());
    flowBuilder.transform(trans, "transform");
    return flowBuilder.get();
}
My queries in application properties are as below:
query: select * from kafka_test where LAST_UPDATE_TIME >(select LAST_POLL_TIME from poll_time)
update : UPDATE poll_time SET LAST_POLL_TIME = CURRENT_TIMESTAMP
This is working perfectly for me; I am able to get the CDC from the DB with this approach.
The problem I am looking at now is this: creating a table just to maintain the poll time is an overhead. I would like to maintain the last poll time in a Kafka topic and retrieve it from that topic when making the next poll.
I have modified the jdbcMessageSource method as below to try that:
public MessageSource<Object> jdbcMessageSource() {
    String query = "select * from kafka_test where LAST_UPDATE_TIME > '" + <Last poll time value read from kafka comes here> + "'";
    JdbcPollingChannelAdapter jdbcPollingChannelAdapter =
            new JdbcPollingChannelAdapter(this.dataSource, query);
    return jdbcPollingChannelAdapter;
}
But Spring Cloud Data Flow instantiates the pollingFlow() bean (please see the code above) only once, so whatever query is built first remains in effect. I want to update the query with the new poll time on every poll.
Is there a way to write a custom IntegrationFlow so that this query is updated every time I make a poll?
I have tried IntegrationFlowContext for that but wasn't successful.
Thanks in advance!
With the help of both of the answers above, I was able to figure out the approach: write a JdbcTemplate, wrap it as a bean, and use it in the integration flow.
@EnableBinding(Source.class)
@AllArgsConstructor
public class StockSource {

    private DataSource dataSource;

    @Autowired
    private JdbcTemplate jdbcTemplate;

    // You can use the normal message channel available in Spring Cloud Data Flow as well.
    private MessageChannelFactory messageChannelFactory;

    private List<String> findAll() {
        jdbcTemplate = new JdbcTemplate(dataSource);
        String time = "10/24/60"; // this means 10 seconds for the Oracle DB
        String query = <<your query here, e.g. select * from test where (last_updated_time > time)>>;
        return jdbcTemplate.query(query, new RowMapper<String>() {
            @Override
            public String mapRow(ResultSet rs, int rowNum) throws SQLException {
                // any row mapper operations that you want to do with your result after the poll
                // ...
                // Change the time here for the next poll to the DB.
                return result;
            }
        });
    }

    @Bean
    public IntegrationFlow supplyPollingFlow() {
        IntegrationFlowBuilder flowBuilder = IntegrationFlows
                .from(this::findAll, spec -> spec.poller(Pollers.fixedDelay(5000)));
        flowBuilder.channel(<<your message channel>>);
        return flowBuilder.get();
    }
}
In our use case we persisted the last poll time in a Kafka topic, to keep the application stateless. Every new poll to the DB now has a new time in the where condition.
P.S.: your messaging broker (Kafka/RabbitMQ) should be running locally, or you should connect to it if it is hosted on a different platform.
Godspeed!
See Artem's answer for the mechanism for a dynamic query in the standard adapter; an alternative, however, would be to simply wrap a JdbcTemplate in a bean and invoke it with
IntegrationFlows.from(myPojo(), "runQuery", e -> ...)
...
or even a simple lambda
.from(() -> jdbcTemplate...)
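A minimal sketch of that idea, with illustrative names (QueryRunner and readLastPollTimeFromKafka() are hypothetical placeholders for your own lookup logic, not part of the original answer):

import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class QueryRunner {

    private final JdbcTemplate jdbcTemplate;

    public QueryRunner(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Called on every poll, so the WHERE clause always sees the latest poll time.
    public List<Map<String, Object>> runQuery() {
        String lastPollTime = readLastPollTimeFromKafka(); // hypothetical helper you provide
        return jdbcTemplate.queryForList(
                "select * from kafka_test where LAST_UPDATE_TIME > ?", lastPollTime);
    }

    private String readLastPollTimeFromKafka() {
        // read the last poll time from your Kafka topic here
        return "...";
    }
}

It could then be wired into the flow as a polled message source, for example:

IntegrationFlows.from(queryRunner, "runQuery", e -> e.poller(Pollers.fixedDelay(3000)))
        .channel("output") // whatever channel your flow sends to
        .get();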
We have this test configuration (sorry, it is XML):
<inbound-channel-adapter query="select * from item where status=:status" channel="target"
data-source="dataSource" select-sql-parameter-source="parameterSource"
update="delete from item"/>
<beans:bean id="parameterSource" factory-bean="parameterSourceFactory"
factory-method="createParameterSourceNoCache">
<beans:constructor-arg value=""/>
</beans:bean>
<beans:bean id="parameterSourceFactory"
class="org.springframework.integration.jdbc.ExpressionEvaluatingSqlParameterSourceFactory">
<beans:property name="parameterExpressions">
<beans:map>
<beans:entry key="status" value="#statusBean.which()"/>
</beans:map>
</beans:property>
<beans:property name="sqlParameterTypes">
<beans:map>
<beans:entry key="status" value="#{ T(java.sql.Types).INTEGER}"/>
</beans:map>
</beans:property>
</beans:bean>
<beans:bean id="statusBean"
class="org.springframework.integration.jdbc.config.JdbcPollingChannelAdapterParserTests$Status"/>
Pay attention to the ExpressionEvaluatingSqlParameterSourceFactory and its createParameterSourceNoCache() factory method. The result can be used for the select-sql-parameter-source.
The JdbcPollingChannelAdapter has a setSelectSqlParameterSource for the same purpose.
So, you configure an ExpressionEvaluatingSqlParameterSourceFactory to resolve some query parameter as an expression for a bean method invocation that gets the desired value from Kafka. Then createParameterSourceNoCache() will help you obtain an expected SqlParameterSource.
There is some info in docs as well: https://docs.spring.io/spring-integration/docs/current/reference/html/#jdbc-inbound-channel-adapter
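A rough Java-config sketch of the same idea, to go inside a @Configuration class; the lastPollTimeBean.readFromKafka() expression is a hypothetical placeholder for however you fetch the last poll time from Kafka, while the factory and adapter calls follow the API shown above:

@Bean
public ExpressionEvaluatingSqlParameterSourceFactory parameterSourceFactory() {
    ExpressionEvaluatingSqlParameterSourceFactory factory = new ExpressionEvaluatingSqlParameterSourceFactory();
    // With the "no cache" parameter source below, this expression is re-evaluated on every poll.
    factory.setParameterExpressions(Collections.singletonMap("lastPollTime", "@lastPollTimeBean.readFromKafka()"));
    return factory;
}

@Bean
public JdbcPollingChannelAdapter jdbcMessageSource(DataSource dataSource,
        ExpressionEvaluatingSqlParameterSourceFactory parameterSourceFactory) {
    JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(dataSource,
            "select * from kafka_test where LAST_UPDATE_TIME > :lastPollTime");
    adapter.setSelectSqlParameterSource(parameterSourceFactory.createParameterSourceNoCache(""));
    return adapter;
}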
I am a newbie in Spring Batch and I have a couple of questions.
Question 1: I am using a MultiResourceItemReader to read a bunch of CSV files and a JDBC item writer to update the DB in batches. The commit interval is set to 1000. If there is a file with 10k records and I encounter a DB error at the 7th batch, is there any way I can roll back all the previously committed chunks?
Question 2: If there are two files, each having 100 records, and the commit interval is set to 1000, then the MultiResourceItemReader reads both files and sends them to the writer. Is there any way to write just one file at a time, ignoring the commit interval, essentially creating a loop in the writer alone?
Posting the solution that worked for me in case someone needs it for reference.
For Question 1, I was able to achieve it using StepListenerSupport in the writer and overriding beforeStep and afterStep. A sample snippet is below.
public class JDBCWriter extends StepListenerSupport implements ItemWriter<MyDomain> {

    private static final Logger LOG = LoggerFactory.getLogger(JDBCWriter.class);

    private boolean errorFlag;
    private Connection connection;
    private String sql = "{ CALL STORED_PROC(?, ?, ?, ?, ?) }";

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        try {
            connection = jdbcTemplate.getDataSource().getConnection();
            connection.setAutoCommit(false);
        }
        catch (SQLException ex) {
            setErrorFlag(Boolean.TRUE);
        }
    }

    @Override
    public void write(List<? extends MyDomain> items) throws Exception {
        if (!items.isEmpty()) {
            CallableStatement callableStatement = connection.prepareCall(sql);
            callableStatement.setString(1, "FirstName");        // placeholder bind values
            callableStatement.setString(2, "LastName");
            callableStatement.setString(3, "Date of Birth");
            callableStatement.setInt(4, 0);                      // placeholder for "Year"
            callableStatement.registerOutParameter(5, Types.INTEGER);
            callableStatement.execute();
            int errors = callableStatement.getInt(5);
            if (errors != 0) {
                this.setErrorFlag(Boolean.TRUE);
            }
        }
        else {
            this.setErrorFlag(Boolean.TRUE);
        }
    }

    @Override
    public void afterChunk(ChunkContext context) {
        if (errorFlag) {
            context.getStepContext().getStepExecution().setExitStatus(ExitStatus.FAILED); // Fail the step
            context.getStepContext().getStepExecution().setStatus(BatchStatus.FAILED);    // Fail the batch
        }
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        try {
            if (!errorFlag) {
                connection.commit();
            }
            else {
                connection.rollback();
                stepExecution.setExitStatus(ExitStatus.FAILED);
            }
        }
        catch (SQLException ex) {
            LOG.error("Commit Failed!" + ex);
        }
        return stepExecution.getExitStatus();
    }

    public void setErrorFlag(boolean errorFlag) {
        this.errorFlag = errorFlag;
    }
}
XML Config:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
....
http://www.springframework.org/schema/batch/spring-batch-3.0.xsd">
<job id="fileLoadJob" xmlns="http://www.springframework.org/schema/batch">
<step id="batchFileUpload" >
<tasklet>
<chunk reader="fileReader"
commit-interval="1000"
writer="JDBCWriter"
/>
</tasklet>
</step>
</job>
<bean id="fileReader" class="...com.FileReader" />
<bean id="JDBCWriter" class="...com.JDBCWriter" />
</beans>
Question 1: The only way to accomplish this is via some form of compensating logic. You can do that via a listener (ChunkListener#afterChunkError for example), but the implementation is up to you. There is nothing within Spring Batch that knows what the overall state of the output is and how to roll it back beyond the current transaction.
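For reference, a bare ChunkListener skeleton showing where such compensating logic could hook in; this is only a sketch, and the actual undo logic is application-specific rather than something Spring Batch provides:

public class CompensatingChunkListener implements ChunkListener {

    @Override
    public void afterChunkError(ChunkContext context) {
        // e.g. delete or flag the rows written by previously committed chunks
    }

    @Override
    public void beforeChunk(ChunkContext context) {
    }

    @Override
    public void afterChunk(ChunkContext context) {
    }
}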
Question 2: Assuming you're looking for one output file per input file, due to the fact that most Resource implementations are non-transactional, the writers associated with them do special work to buffer up to the commit point and then flush. The problem here is that because of that, there is no real opportunity to divide that buffer to multiple resources. To be clear, it can be done, you'll just need a custom ItemWriter to do it.
I am new to Spring Batch. I am using a flat file item reader, configured in an XML file, and a processor that processes each object created. I need to pre-process the contents of the file before passing it to the flat file item reader. The processed results/file should not be written to disk. How can I do this through XML configuration?
Is it through a tasklet, or by extending the flat file item reader? The processor should then work as before with no change; I only need to introduce a layer before the file is passed to the flat file item reader.
You can use an ItemReadListener for this. ItemReadListener has three callback methods: beforeRead, afterRead, and onReadError.
You can put your logic in beforeRead.
Sample code for a CustomItemReaderListener:
public class CustomItemReaderListener implements ItemReadListener<Domain> {

    @Override
    public void beforeRead() {
        System.out.println("ItemReadListener - beforeRead");
        // I need to pre-process the contents of the file before passing it to the file item reader:
        // add this logic here
    }

    @Override
    public void afterRead(Domain item) {
        System.out.println("ItemReadListener - afterRead");
    }

    @Override
    public void onReadError(Exception ex) {
        System.out.println("ItemReadListener - onReadError");
    }
}
Map the listener to the step in XML:
<step id="step1">
<tasklet>
<chunk reader="myReader" writer="flatFileItemWriter"
commit-interval="1" />
<listeners>
<listener ref="customItemReaderListener" />
</listeners>
</tasklet>
</step>
In my Spring Batch application I have written a CustomItemWriter that writes items to DynamoDB using DynamoDBAsyncClient; this client returns a Future object. I have an input file with millions of records. Since the CustomItemWriter returns the Future object immediately, my batch job exits within 5 seconds with status COMPLETED, but in reality it takes 3-4 minutes to write all the items to the DB. I want the batch job to finish only after all items are written to the database. How can I do that?
The job is defined as below:
<bean id="report" class="com.solution.model.Report" scope="prototype" />
<batch:job id="job" restartable="true">
<batch:step id="step1">
<batch:tasklet>
<batch:chunk reader="cvsFileItemReader" processor="filterReportProcessor" writer="customItemWriter"
commit-interval="20">
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
<bean id="customItemWriter" class="com.solution.writer.CustomeWriter"></bean>
CustomeWriter is defined as below:
public class CustomeWriter implements ItemWriter<Report> {

    public void write(List<? extends Report> items) throws Exception {
        List<Future<PutItemResult>> list = new LinkedList<>();
        AmazonDynamoDBAsyncClient client = new AmazonDynamoDBAsyncClient();
        for (Report report : items) {
            PutItemRequest req = new PutItemRequest();
            req.setTableName("MyTable");
            req.setReturnValues(ReturnValue.ALL_OLD);
            req.addItemEntry("customerId", new AttributeValue(report.getCustomeId()));
            Future<PutItemResult> res = client.putItemAsync(req);
            list.add(res);
        }
    }
}
The main class contains:
JobExecution execution = jobLauncher.run(job, new JobParameters());
System.out.println("Exit Status : " + execution.getStatus());
Since the ItemWriter returns a Future object, it does not wait for the operation to complete. And because, from the main method's point of view, all items have been submitted for writing, the batch status shows COMPLETED and the job terminates.
I want this job to terminate only after the actual writes have been performed in DynamoDB.
Is there some other step, or perhaps a listener, that can wait for this?
Here is one approach. Since ItemWriter::write doesn't return anything, you can make use of the listener feature.
@Component
@JobScope
public class YourWriteListener implements ItemWriteListener<WhatEverYourTypeIs> {

    @Value("#{jobExecution.executionContext}")
    private ExecutionContext executionContext;

    @Override
    public void afterWrite(final List<? extends WhatEverYourTypeIs> paramList) {
        Future future = (Future) this.executionContext.get("FutureKey");
        // wait until the write is done using the future object, e.g. future.get()
    }

    @Override
    public void beforeWrite(final List<? extends WhatEverYourTypeIs> paramList) {
    }

    @Override
    public void onWriteError(final Exception paramException, final List<? extends WhatEverYourTypeIs> paramList) {
    }
}
In your writer class, everything remains the same except for adding the future object to the ExecutionContext.
public class YourItemWriter implements ItemWriter<WhatEverYourTypeIs> {

    @Value("#{jobExecution.executionContext}")
    private ExecutionContext executionContext;

    @Override
    public void write(final List<? extends WhatEverYourTypeIs> youritems) throws Exception {
        // write to DynamoDB and get the Future object
        executionContext.put("FutureKey", future);
    }
}
And you can register the listener in your configuration. Here is the Java config; you would do the same in your XML:
@Bean
public Step initStep() {
    return this.stepBuilders.get("someStepName").<YourTypeX, YourTypeY>chunk(10)
            .reader(yourReader).processor(yourProcessor)
            .writer(yourWriter).listener(yourWriteListener)
            .build();
}
I have this use case.
First chain:
<int:chain input-channel="inserimentoCanaleActivate" output-channel="inserimentoCanalePreRouting">
<int:service-activator ref="inserimentoCanaleActivator" method="activate" />
</int:chain>
This is the relative code:
@Override
@Transactional(propagation = Propagation.REQUIRES_NEW)
public EventMessage<ModificaOperativitaRapporto> activate(EventMessage<InserimentoCanale> eventMessage) {
    ...
    // some database changes
    dao.save(myObject);
}
All is working great.
Then I have another chain:
<int:chain id="onlineCensimentoClienteChain" input-channel="ONLINE_CENSIMENTO_CLIENTE" output-channel="inserimentoCanaleActivate">
<int:service-activator ref="onlineCensimentoClienteActivator" method="activate" />
<int:splitter expression="payload.getPayload().getCanali()" />
</int:chain>
And the relative activator:
@Override
public EventMessage<CensimentoCliente> activate(EventMessage<CensimentoCliente> eventMessage) {
    ...
    // some database changes
    dao.save(myObject);
}
The CensimentoCliente payload, as described below, has a list of payloads of the first chain, so with a splitter I split on the list and reuse the code of the first chain.
public interface CensimentoCliente extends Serializable {

    Collection<? extends InserimentoCanale> getCanali();

    void setCanali(Collection<? extends InserimentoCanale> canali);
    ...
}
But since every activator gets its own transaction definition (the first one can live without the second), I have a use case where the transactions are separated.
The goal is to have the DB modifications of the two chains be part of the same transaction.
Any help?
Kind regards
Massimo
You can accomplish this by creating a custom channel (or other custom component, but this is the simplest approach) that wraps the message dispatch in a TransactionTemplate callback execution:
public class TransactionalChannel extends AbstractSubscribableChannel {

    private final MessageDispatcher dispatcher = new UnicastingDispatcher();

    private final TransactionTemplate transactionTemplate;

    TransactionalChannel(TransactionTemplate transactionTemplate) {
        this.transactionTemplate = transactionTemplate;
    }

    @Override
    protected boolean doSend(final Message<?> message, long timeout) {
        return transactionTemplate.execute(new TransactionCallback<Boolean>() {

            @Override
            public Boolean doInTransaction(TransactionStatus status) {
                return getDispatcher().dispatch(message);
            }
        });
    }

    @Override
    protected MessageDispatcher getDispatcher() {
        return dispatcher;
    }
}
In your XML, you can define your channel and transaction template and reference your custom channel just as you would any other channel:
<bean id="transactionalChannel" class="com.stackoverflow.TransactionalChannel">
<constructor-arg>
<bean class="org.springframework.transaction.support.TransactionTemplate">
<property name="transactionManager" ref="transactionManager"/>
<property name="propagationBehavior" value="#{T(org.springframework.transaction.TransactionDefinition).PROPAGATION_REQUIRES_NEW}"/>
</bean>
</constructor-arg>
</bean>
For your example, you could perhaps use a bridge to pass the message through the new channel:
<int:bridge input-channel="inserimentoCanaleActivate" output-channel="transactionalChannel" />
<int:chain input-channel="transactionalChannel" output-channel="inserimentoCanalePreRouting">
<int:service-activator ref="inserimentoCanaleActivator" method="activate" />
</int:chain>
When you have a <service-activator> and @Transactional on the service method, the transaction is bound only to that method invocation.
If you want a transaction for the entire message flow (or part of it), you should declare the TX advice somewhere earlier.
If your channels are direct, all service invocations will be wrapped in the same transaction.
The simplest way to accomplish what you want is to write a simple @Gateway interface with @Transactional and call it from the start of your message flow.
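A minimal sketch of such a gateway, assuming Spring Integration 4+ annotation support; the interface and method names are illustrative, while the request channel is the second chain's input channel from the question. Because direct channels keep the downstream flow on the calling thread, the transaction started around this call spans both chains:

@MessagingGateway
public interface CensimentoGateway {

    @Transactional
    @Gateway(requestChannel = "ONLINE_CENSIMENTO_CLIENTE")
    void send(EventMessage<CensimentoCliente> eventMessage);
}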
To clarify a bit regarding transactions
Understanding Transactions in Message flows
Are these modifying two separate relational databases? If so, you are looking at an XA transaction. If you are running this on a non-XA container like Tomcat, all of this must be done in a single thread that is watched by a transaction manager (you will have to piggyback on whatever actually triggers these events, such as a JMS message or a poller against some data source). The processing must also stay in a single thread so that Spring can run the entire process in one transaction.
As a final note, do not introduce thread pools or queues between service activators; this can cause the activators to run in separate threads.