Pass data from one writer to another writer after reading from DB - spring-boot

I have to create a batch job where I need to fetch data from one DB and, after processing, dump that data into another DB, where an auto-generated ID is assigned to the persisted data. I then need to send that data, along with the generated ID, to a Solace queue.
Reader(DB1) --data1--> Processor --data2--> Writer (DB2) --data3--> Writer (Solace Publisher)
I am using Spring Boot 2.2.5.RELEASE and spring-boot-starter-batch.
I have created a job with one step that reads data from DB1 and writes it to DB2 via a RepositoryItemReader and a RepositoryItemWriter respectively. This is working fine.
Now the next task is to send the persisted data, with its generated IDs, to a Solace stream (using spring-cloud-starter-stream-solace).
I have the questions below. Please assist, as I am totally new to Spring Batch.
How can I get the complete record after it is saved to DB2, based on some parameter? Do I have to write my own RepositoryItemWriter holding a StepExecution context, or can I somehow use the existing RepositoryItemWriter?
Once I have the record, I need to use the Solace stream, whose publish method expects the record to be published as an argument. I think I again need to write my own ItemWriter, and either use the record passed from the RepositoryItemWriter above via the StepExecutionContext, or query DB2 directly from there based on some parameter.
In either case I need to use the step execution context, but can I use the available RepositoryItemWriter or do I have to write my own?
Is there any other concept that would be handy here instead of the approaches above?

Passing data to future steps is a common pattern in Spring Batch. According to the documentation https://docs.spring.io/spring-batch/docs/current/reference/html/common-patterns.html#passingDataToFutureSteps you can use the step's ExecutionContext to store and retrieve your generated IDs. In your case the writers are also listeners, with before-step methods annotated with @BeforeStep. For example:
public class DB2ItemWriter implements ItemWriter<Object> {

    private StepExecution stepExecution;

    public void write(List<? extends Object> items) throws Exception {
        // ... write the items to DB2 and collect the generated ids into 'ids'
        ExecutionContext stepContext = this.stepExecution.getExecutionContext();
        stepContext.put("generatedIds", ids);
    }

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }
}
and then you retrieve the IDs in the next writer:
public class SolacePublisherItemWriter implements ItemWriter<Object> {

    // element type depends on what the previous writer stored under "generatedIds"
    private List<Long> generatedIds;

    public void write(List<? extends Object> items) throws Exception {
        // ... publish the items (enriched with the generated ids) to the Solace queue
    }

    @BeforeStep
    public void retrieveGeneratedIds(StepExecution stepExecution) {
        ExecutionContext stepExecutionContext = stepExecution.getExecutionContext();
        this.generatedIds = (List<Long>) stepExecutionContext.get("generatedIds");
    }
}
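For the IDs to survive beyond the first step, the documentation linked above also registers an ExecutionContextPromotionListener that promotes the key from the step ExecutionContext to the job ExecutionContext when the step ends. A minimal wiring sketch, assuming the two writers run in separate steps (reader, step, and bean names are illustrative):

@Bean
public Step db2WriteStep(StepBuilderFactory stepBuilderFactory,
                         ItemReader<Object> db1Reader,
                         DB2ItemWriter db2ItemWriter) {
    return stepBuilderFactory.get("db2WriteStep")
            .<Object, Object>chunk(100)
            .reader(db1Reader)
            .writer(db2ItemWriter)
            .listener(db2ItemWriter)         // registered explicitly so the @BeforeStep method is called
            .listener(promotionListener())   // promotes "generatedIds" to the job ExecutionContext
            .build();
}

@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] {"generatedIds"});
    return listener;
}

Per the linked documentation, the component in the later step then reads the promoted key from stepExecution.getJobExecution().getExecutionContext().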

I have created a job with one step that reads data from DB1 and writes it to DB2 via a RepositoryItemReader and a RepositoryItemWriter respectively. This is working fine.
I would add a second step that reads the data from the table (in which records have been persisted by step 1 and have their IDs generated) and pushes it to Solace using a custom writer.
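A minimal sketch of such a second step, assuming a hypothetical MyEntity persisted in DB2 behind a Spring Data paging repository, and a custom Solace-publishing writer like the one shown above (all names are illustrative):

@Bean
public Step publishToSolaceStep(StepBuilderFactory stepBuilderFactory,
                                PagingAndSortingRepository<MyEntity, Long> db2Repository,
                                SolacePublisherItemWriter solaceItemWriter) {
    // read the rows persisted by step 1, which now carry their generated IDs
    RepositoryItemReader<MyEntity> db2Reader = new RepositoryItemReaderBuilder<MyEntity>()
            .name("db2Reader")
            .repository(db2Repository)
            .methodName("findAll")
            .sorts(Collections.singletonMap("id", Sort.Direction.ASC))
            .pageSize(100)
            .build();
    return stepBuilderFactory.get("publishToSolaceStep")
            .<MyEntity, MyEntity>chunk(100)
            .reader(db2Reader)
            .writer(solaceItemWriter)
            .build();
}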

Related

Spring State Machine | Actions (Calling External API with data & pass data to another State)

I would like to use the Action<S,E> to call an external API. How can I add more data into this Action in order to invoke the external API? Another question: what if I want to send back the response (pass data to another state)?
What is the best way to add more data? I'm trying to find an alternative to using the context (which I know is possible but very ugly with key-value pairs).
Calling an external API is the same as executing any other code: you can wire any executable code into your action, including autowiring a service or gateway and retrieving the data you need.
Regarding the second question, in my company we use the extended state (context) to expose data. Before we release the state machine, we take the data out of it and serialize it to a response object using an object mapper.
Here is a snippet for illustration:
@Configuration
@RequiredArgsConstructor
public class YourAction implements Action<States, Events> {

    private final YourService service;

    @Override
    public void execute(final StateContext<States, Events> context) {
        // getting input data, examples
        final Long yourIdFromHeaders = context.getMessageHeaders().get(key, Long.class);
        final Long yourIdFromContext = context.getExtendedState().get(key, Long.class);
        // calling the service
        final var responseData = service.getData(yourIdFromContext);
        // storing the result in the extended state
        context.getExtendedState().getVariables().put("response", responseData);
    }
}
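A minimal sketch of the "serialize the extended state to a response" part mentioned above, assuming a stateMachine reference, the "response" key used by the action, and a ResponseDto target type (the caller and ResponseDto are illustrative):

// read the value the action stored and map it to a response object
final ObjectMapper objectMapper = new ObjectMapper();
final Object raw = stateMachine.getExtendedState().getVariables().get("response");
final ResponseDto response = objectMapper.convertValue(raw, ResponseDto.class);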

Extending RepositoryItemWriter does not delete rows, so the next step still sees them

I am attempting to use a RepositoryItemWriter to delete Products by their IDs, as recommended. I have configured it as follows:
@Repository
public interface ProductRepository extends CrudRepository<Product, Long> {
}
Next, I specify a RepositoryItemWriter bean as follows:
class ProductRepositoryItemWriter extends RepositoryItemWriter<Product> {

    private CrudRepository<Product, Long> productRepository;

    ProductRepositoryItemWriter(CrudRepository<Product, Long> productRepository) {
        super.setRepository(this.productRepository = productRepository);
    }

    @Transactional
    @Override
    protected void doWrite(List<? extends Product> products) {
        this.productRepository.deleteAll(products);
    }
}
My step looks like this:
public Step processStep(@Qualifier("jpaTransactionManager") final PlatformTransactionManager jpaTransactionManager) {
    return stepBuilderFactory.get("processStep")
            .transactionManager(jpaTransactionManager)
            .chunk(120)
            .reader(productJpaItemReader)
            .writer((ItemWriter) productRepositoryWriter)
            .build();
}
I see the deletes occurring, but the products are not actually deleted, so the next step fails to insert the products. That step follows the delete step like this: on("COMPLETED").to("uploadStep").end()
@Bean("repopulateFlow")
Flow repopulateFlow() {
    FlowBuilder<Flow> flowBuilder = new FlowBuilder<>("repopulateFlow");
    flowBuilder.start(deleteStep).on("COMPLETED")
            .to(upLoadStep)
            .end();
    return flowBuilder.build();
}
The deleteStep uses my ProductRepositoryItemWriter to delete the rows, and then the next step in the flow tries to re-insert the data, but that step finds data in the table that the deleteStep should have removed. How can I achieve what I am trying to do?
I ran the delete step alone in a job and it does not delete the rows in the table after it completes. I use a HikariCP pool, built from properties that populate a DataSourceProperties object to create the datasource. I wonder if Spring is not setting auto-commit to true, or whether I need to create the Hikari pool myself and set the auto-commit property to true.
The uploadStep fails because the rows are still there, so I get a ConstraintViolationException. I removed the @Transactional but still see the problem.
You need to remove @Transactional from your doWrite method. The writer is already executed in a transaction driven by Spring Batch.
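A minimal sketch of the same writer with that change applied (otherwise unchanged from the question):

class ProductRepositoryItemWriter extends RepositoryItemWriter<Product> {

    private final CrudRepository<Product, Long> productRepository;

    ProductRepositoryItemWriter(CrudRepository<Product, Long> productRepository) {
        super.setRepository(productRepository);
        this.productRepository = productRepository;
    }

    // no @Transactional here: doWrite already runs inside the chunk transaction managed by Spring Batch
    @Override
    protected void doWrite(List<? extends Product> products) {
        this.productRepository.deleteAll(products);
    }
}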

@InboundChannelAdapter in Spring Integration is not running continuously?

I am working in Spring Cloud Data Flow, and I have a scenario where I read from the database and send the data to a Kafka topic using @InboundChannelAdapter.
Below is the strategy I followed:
-> Created a common list and store the objects in it if the list is empty
-> If the list has data, I don't poll
-> I send the values to Kafka one by one by index, and after sending I remove that index
If I keep the @Bean, it inserts only the first object in the list into the Kafka topic:
{"id":101443442,"name":"Mobile1","price":8000}
If I remove the @Bean, it inserts only empty data into Kafka:
{}
public static List<Product> products;

@Bean
public void initList() {
    products = new ArrayList<>();
}

@Bean
@InboundChannelAdapter(channel = TbeSource.PR1)
public MessageSource<Product> addProducts() {
    if (products.size() == 0) {
        products.add(new Product(101443442, "Mobile1", 8000));
        products.add(new Product(102235434, "book111", 6000));
    }
    MessageBuilder<Product> message = MessageBuilder.withPayload(products.get(0));
    products.remove(0);
    return message::build;
}
What am I doing wrong?
I need to send the data frequently by reading from the DB.
It is really not clear what you are asking.
If you are talking about JDBC, then you may consider using the JDBC Source from the out-of-the-box applications for Data Flow.
If you are doing the logic yourself to take data from the database, you may consider using a JdbcPollingChannelAdapter from Spring Integration for the same @InboundChannelAdapter purpose.
The rest of your logic with that list is not clear. It is strange to see a @Bean on a void method. If you need to initialize that products list and access it from the MessageSource implementation, you just need to do private List<Product> products = new ArrayList<>();. Having the property as public is really a bad practice.
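A minimal sketch of the JdbcPollingChannelAdapter idea, assuming a product table with a sent flag and a configured DataSource (table, columns, and poll interval are illustrative):

@Bean
@InboundChannelAdapter(channel = TbeSource.PR1, poller = @Poller(fixedDelay = "5000"))
public MessageSource<Object> productJdbcSource(DataSource dataSource) {
    // polls unsent rows every 5 seconds and marks them as sent so they are not re-polled
    JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(
            dataSource, "SELECT id, name, price FROM product WHERE sent = false");
    adapter.setUpdateSql("UPDATE product SET sent = true WHERE id IN (:id)");
    return adapter;
}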

AsyncCassandraOperations examples

I am reading up on AsyncCassandraOperations to perform async inserts to improve performance, based on another post here. But I am unable to find much help on Google or in the Spring Data documentation.
Previously I was using a Cassandra Repository for all data extraction and inserts/updates, which I found to be super slow. As recommended, I am now using AsyncCassandraOperations for the insert operation alone, but it won't let me: I get a "required a bean of type 'org.springframework.data.cassandra.core.AsyncCassandraOperations'" error.
What would be the correct way to use AsyncCassandraOperations?
@Autowired private MyRepository repository_name;
@Autowired private AsyncCassandraOperations acops;

public void persist(List<POJO> l_POJO) {
    System.out.println("Enter Persist: " + new java.util.Date());

    List<POJO> l_POJO_stale = repository_name.findBycol1AndStale("sample", false);
    l_POJO_stale.forEach(s -> s.setStale(true));
    l_POJO_stale.forEach(s -> acops.update(s));

    try {
        acops.insert(l_POJO);
    } catch (Exception e) {
        System.out.println("Error in persisting new data");
    }
}
I don't know whether Spring Boot is used; if so, the AsyncCassandraOperations bean (AsyncCassandraTemplate is the implementation class) should be created automatically.
If the error says you need an AsyncCassandraOperations bean, the straightforward way is to create one as shown below.
@Bean
AsyncCassandraTemplate asyncCassandraTemplate(Session session) {
    return new AsyncCassandraTemplate(session);
}
Since you are using the Spring Data Repository interface, you can also use the ReactiveCrudRepository interface to update or insert entity objects into Cassandra, as shown in this Spring Data example project, as an alternative to using the AsyncCassandraTemplate class.
In the case of using ReactiveCrudRepository, and regarding what you want to do, your code needs the following changes:
change the return type of WRRepository.findByCol1AndCol2AndCol3(String, boolean, String) from List<WRpojo> to Flux<WRpojo>, in order to fully utilize the reactive functionality.
change the return type of persist(List<WRpojo>) from boolean to Mono<Void>, making the result non-blocking too.
change your persist(List<WRpojo>) to the following.
public Mono<Void> persist(List<WRpojo> l_wr) {
    Flux<WRpojo> l_old_wr = objWRRepository.findByCol1AndCol2AndCol3("1", false, "2").doOnNext(s -> s.setStale(true));
    return objWRRepository.saveAll(l_old_wr).thenMany(objWRRepository.saveAll(l_wr)).then();
}
In reactive programming we basically don't block any code, which means the returned Mono<Void> should be subscribed somewhere downstream. If you do want to block and wait for all operations to complete, you can call block() on the Mono<Void>, which is not recommended.
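A minimal usage sketch of the two options described above (the caller and the newRecords variable are illustrative):

// non-blocking: trigger the pipeline and handle errors downstream
persist(newRecords)
        .doOnError(e -> System.err.println("persist failed: " + e))
        .subscribe();

// blocking (not recommended): wait for all saves to complete
persist(newRecords).block();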

Spring Data Solr @Transactional Commits

I currently have a setup where data is inserted into a database as well as indexed into Solr. These two steps are wrapped in a Spring-managed transaction via the @Transactional annotation. What I've noticed is that spring-data-solr issues an update with the following parameters whenever the transaction is closed: params{commit=true&softCommit=false&waitSearcher=true}
@Transactional
public void save(Object toSave) {
    dbRepository.save(toSave);
    solrRepository.save(toSave);
}
The rate of commits into Solr is fairly high, so ideally I'd like to send data to the Solr index and have Solr auto-commit at regular intervals. I have autoCommit (and autoSoftCommit) set in my solrconfig.xml, but since spring-data-solr sends those commit parameters, it does a hard commit every time.
I'm aware that I can drop down to the SolrTemplate API and issue commits manually, but I would like to keep the solrRepository.save call within a Spring-managed transaction if possible. Is there a way to modify the parameters that are sent to Solr on commit?
After putting an IDE debug breakpoint in org.springframework.data.solr.repository.support.SimpleSolrRepository here:
private void commitIfTransactionSynchronisationIsInactive() {
    if (!TransactionSynchronizationManager.isSynchronizationActive()) {
        this.solrOperations.commit(solrCollectionName);
    }
}
I discovered that wrapping my code in @Transactional (plus the other details needed to actually let the framework begin/end my code as a transaction) doesn't achieve what we expect with Spring Data for Apache Solr. The stack trace shows the proxy and transaction interceptor classes for my code's transactional scope, but it also shows the framework starting its own nested transaction, with another proxy and transaction interceptor of its own. When the framework exits the CrudRepository.save() method my code calls, the commit to Solr is done by the framework's nested transaction, before my outer transaction exits. So the attempt to batch-process many saves with a single commit at the end, instead of one commit per save, is futile. It seems that, for this area of my code, I'll have to use SolrJ to save (update) my entities to Solr and then follow my transaction's exit with an explicit commit.
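A minimal sketch of that SolrJ approach, assuming a SolrClient bean and a "products" collection (both names are illustrative):

// queue the documents without committing
solrClient.addBeans("products", entities);

// ... later, once the surrounding database transaction has committed successfully:
solrClient.commit("products");   // one explicit commit for the whole batch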
If you are using Spring Data Solr, I found that the SolrTemplate bean allows you to batch updates when adding data to the Solr index. With the SolrTemplate bean you can use the saveBeans method, which adds a whole collection to the index and does not commit until the end of the transaction. In my case, I started out using solrClient.add() and it took up to 4 hours for my collection to be saved to the index, because iterating over it commits after every single save. By using solrTemplate.saveBeans(Collection<?>) it finishes in just over 1 second, as the commit covers the entire collection. Here is a code snippet:
@Resource
SolrTemplate solrTemplate;

public void doReindexing(List<Image> images) {
    if (images != null) {
        /* CMSSolrImage is a class with @SolrDocument mappings.
         * The List<Image> images is a collection pulled from my database
         * that I want indexed in Solr.
         */
        List<CMSSolrImage> sImages = new ArrayList<CMSSolrImage>();
        for (Image image : images) {
            CMSSolrImage sImage = new CMSSolrImage(image);
            sImages.add(sImage);
        }
        solrTemplate.saveBeans(sImages);
    }
}
The way I've done something similar is to create a custom repository implementation of the save methods.
Interface for the repository:
public interface FooRepository extends SolrCrudRepository<Foo, String>, FooRepositoryCustom {
}
Interface for the custom overrides:
public interface FooRepositoryCustom {
    public Foo save(Foo entity);
    public Iterable<Foo> save(Iterable<Foo> entities);
}
Implementation of the custom overrides:
public class FooRepositoryImpl implements FooRepositoryCustom {

    private SolrOperations solrOperations;

    public FooRepositoryImpl(SolrOperations fooSolrOperations) {
        this.solrOperations = fooSolrOperations;
    }

    @Override
    public Foo save(Foo entity) {
        Assert.notNull(entity, "Cannot save 'null' entity.");
        registerTransactionSynchronisationIfSynchronisationActive();
        this.solrOperations.saveBean(entity, 1000);
        commitIfTransactionSynchronisationIsInactive();
        return entity;
    }

    @Override
    public Iterable<Foo> save(Iterable<Foo> entities) {
        Assert.notNull(entities, "Cannot insert 'null' as a List.");
        if (!(entities instanceof Collection<?>)) {
            throw new InvalidDataAccessApiUsageException("Entities have to be inside a collection");
        }
        registerTransactionSynchronisationIfSynchronisationActive();
        this.solrOperations.saveBeans((Collection<? extends Foo>) entities, 1000);
        commitIfTransactionSynchronisationIsInactive();
        return entities;
    }

    private void registerTransactionSynchronisationIfSynchronisationActive() {
        if (TransactionSynchronizationManager.isSynchronizationActive()) {
            registerTransactionSynchronisationAdapter();
        }
    }

    private void registerTransactionSynchronisationAdapter() {
        TransactionSynchronizationManager.registerSynchronization(SolrTransactionSynchronizationAdapterBuilder
                .forOperations(this.solrOperations).withDefaultBehaviour());
    }

    private void commitIfTransactionSynchronisationIsInactive() {
        if (!TransactionSynchronizationManager.isSynchronizationActive()) {
            this.solrOperations.commit();
        }
    }
}
and you also need to provide a SolrOperations bean for the right solr core:
@Configuration
public class FooSolrConfig {

    @Bean
    public SolrOperations getFooSolrOperations(SolrClient solrClient) {
        return new SolrTemplate(solrClient, "foo");
    }
}
Footnote: auto commit is (to my mind) conceptually incompatible with a transaction. An auto commit is a promise from Solr that it will try to start writing the data to disk within a certain time limit. Many things might stop that from actually happening, however: an untimely power or hardware failure, errors between the document and the schema, and so on. But the client won't know that Solr failed to keep its promise, and the transaction will see a success when it actually failed.
