Read data from DB in Tasklet oriented Batch process


I have a simple question. I have to write a tasklet-oriented Spring Batch job that retrieves some data from the DB, processes it and writes it to a .json file. I'm using Spring Data JPA, but is this the correct and safest way to do it?
If not, what is the best way to code this?
Thanks a lot for your help!
Latest Tasklet reader code:
public class DataReader implements Tasklet, StepExecutionListener {

    @Autowired
    EntityRepository entityRepository;
    @Autowired
    ProductRepository productRepository;
    @Autowired
    SuscriptionRepository suscriptionRepository;
    @Autowired
    MapperUtils mapperUtils;

    private List<EntityDTO> entityDataDTO;

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        this.entityDataDTO = new ArrayList<>();
        // Loads every entity at once; the whole loop runs in the tasklet's single transaction
        List<Entidad> entities = entityRepository.findAll();
        for (Entidad entity : entities) {
            List<SuscriptionDTO> suscriptionsDTO = new ArrayList<>();
            for (Suscripcion suscription : entity.getSuscripciones()) {
                List<Suscripcion> suscriptionsByProduct = suscriptionRepository.findSuscriptionsByEntityIdAndSuscriptionId(
                        suscription.getId().getIdEntidadEurbt(), suscription.getId().getIdSuscripcion());
                List<String> suscriptionProducts = new ArrayList<>();
                for (Suscripcion suscriptionProduct : suscriptionsByProduct) {
                    // One extra query per product
                    Producto product = productRepository.findById(suscriptionProduct.getId().getIdProductoEurbt()).get();
                    suscriptionProducts.add(product.getTlDescProducto());
                }
                SuscriptionDTO suscriptionDTO = mapperUtils.mapSuscriptionDataToSuscriptionDTO(suscription, suscriptionProducts);
                if (!suscriptionsDTO.contains(suscriptionDTO)) {
                    suscriptionsDTO.add(suscriptionDTO);
                }
            }
            this.entityDataDTO.add(mapperUtils.mapEntityDataToEntityDTO(entity, suscriptionsDTO));
        }
        return RepeatStatus.FINISHED;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Hands the whole result list to the next step through the job ExecutionContext
        stepExecution.getJobExecution().getExecutionContext().put("entityDataDTO", this.entityDataDTO);
        return ExitStatus.COMPLETED;
    }
}

Using a Tasklet to read, process and write the whole dataset is not the best way, as there will be only one transaction for the whole dataset. A chunk-oriented step is more appropriate for your use case. With chunk-oriented processing, there will be one transaction per chunk.
Hope this helps.
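To make the transaction-boundary point concrete, here is a small plain-Java simulation. It does not use any Spring Batch classes: `ChunkCommitSimulation` and `run` are hypothetical names, the `committed` list stands in for the database, and an item for which `failsOn` returns true plays the role of a failing write. Each chunk is buffered and only added to `committed` once every item in it succeeds, which mirrors one commit per chunk; passing the whole list size as `chunkSize` mimics the single transaction of a Tasklet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ChunkCommitSimulation {

    /**
     * Writes items chunk by chunk. Each chunk is buffered and only "committed"
     * (copied into the returned list) if every item in it succeeds; the first
     * failing item discards its chunk and stops the run, like a failed step.
     */
    static List<Integer> run(List<Integer> items, int chunkSize, Predicate<Integer> failsOn) {
        List<Integer> committed = new ArrayList<>();       // stands in for the database
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            List<Integer> txBuffer = new ArrayList<>();    // work of the current transaction
            for (Integer item : items.subList(start, end)) {
                if (failsOn.test(item)) {
                    return committed;                      // rollback: txBuffer is discarded
                }
                txBuffer.add(item);
            }
            committed.addAll(txBuffer);                    // commit this chunk
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            items.add(i);
        }
        // Item 23 fails: chunks 1-10 and 11-20 stay committed, 21-25 are rolled back
        System.out.println(run(items, 10, item -> item == 23).size());           // 20
        // One transaction for everything (the Tasklet case): nothing survives
        System.out.println(run(items, items.size(), item -> item == 23).size()); // 0
    }
}
```

The trade-off: smaller transactions keep each unit of work (and what has to be held until commit) small, but a failure no longer undoes the chunks that were already committed.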

Related

How to get the Spring Batch job instance ID from the execute method in a TASKLET

I am using Spring Batch 3.0.
I create a Job and launch it by calling the JobLauncher run method; the job executes the TASKLET below.
To know more accurately whether the Job was executed, I want the query in the TASKLET to insert the corresponding job ID into my own tables (not the Spring Batch metadata tables).
public class SampleScheduler {

    protected final Logger log = LoggerFactory.getLogger(this.getClass());

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job sampleJob;

    public void run() {
        try {
            String dateParam = new Date().toString();
            JobParameters param = new JobParametersBuilder().addString("date", dateParam).toJobParameters();
            JobExecution execution = jobLauncher.run(sampleJob, param);
            log.debug("###################################################################");
            log.debug("Exit Status : " + execution.getStatus());
            log.debug("###################################################################");
        } catch (Exception e) {
            // e.printStackTrace();
            log.error(e.toString());
        }
    }
}
Tasklet code:
public class SampleTasklet implements Tasklet {

    @Autowired
    private SampleService sampleService;

    @Override
    public RepeatStatus execute(StepContribution contribution,
                                ChunkContext chunkContext) throws Exception {
        sampleService.query();
        return RepeatStatus.FINISHED;
    }
}
This is the code I want to add to the tasklet:
StepContext stepContext = chunkContext.getStepContext();
StepExecution stepExecution = stepContext.getStepExecution();
JobExecution jobExecution = stepExecution.getJobExecution();
long jobInstanceId = jobExecution.getJobId();
Is it right to try this in the TASKLET code above?

How to get the Spring Batch job instance ID from the execute method in a TASKLET
The org.springframework.batch.core.step.tasklet.Tasklet#execute method gives you access to the ChunkContext which in turn allows you to get the parent StepExecution and JobExecution. You can then get the job instance id from the job execution.
Is it right to try this in the TASKLET code above?
Yes, that's the way to go.

How to use Spring transaction support with Spring Batch

I am trying to use Spring Batch to read a .dat file and persist the data into a database. My requirement is to either insert all of the data or none of it into the table, i.e. atomicity. However, with Spring Batch I'm not able to achieve this: it reads the data in chunks and keeps inserting it as long as the records are fine. If at some point a record is invalid and a DB exception is thrown, I want a complete rollback, which is not happening. Say we get an error at the 2051st record: my code saves 2050 records, but I want a complete rollback, and if all the data is good then all N records should be persisted. Thanks in advance for any help or approach that may solve my issue.
NOTE: I have already put Spring's @Transactional annotation on the caller method, but it's not working. I'm reading data in chunks of 10 items.
MyConfiguration.java
@Configuration
public class MyConfiguration
{
    @Autowired
    JobBuilderFactory jobBuilderFactory;

    @Autowired
    StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("MyCompletionListener")
    JobCompletionNotificationListener jobCompletionNotificationListener;

    @StepScope
    @Bean(name="MyReader")
    public FlatFileItemReader<InputMapperDTO> reader(@Value("#{jobParameters['fileName']}") String fileName) throws IOException
    {
        FlatFileItemReader<InputMapperDTO> newBean = new FlatFileItemReader<>();
        newBean.setName("MyReader");
        newBean.setResource(new InputStreamResource(FileUtils.openInputStream(new File(fileName))));
        newBean.setLineMapper(lineMapper());
        newBean.setLinesToSkip(1); // skip the header line
        return newBean;
    }

    @Bean(name="MyLineMapper")
    public DefaultLineMapper<InputMapperDTO> lineMapper()
    {
        DefaultLineMapper<InputMapperDTO> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(lineTokenizer());
        Reader reader = new Reader();
        lineMapper.setFieldSetMapper(reader);
        return lineMapper;
    }

    @Bean(name="MyTokenizer")
    public DelimitedLineTokenizer lineTokenizer()
    {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setDelimiter("|");
        tokenizer.setNames("InvestmentAccountUniqueIdentifier", "BaseCurrencyUniqueIdentifier",
                "OperatingCurrencyUniqueIdentifier", "PricingHierarchyUniqueIdentifier", "InvestmentAccountNumber",
                "DummyAccountIndicator", "InvestmentAdvisorCompanyNumberLegacy", "HighNetWorthAccountTypeCode");
        tokenizer.setIncludedFields(0, 5, 7, 13, 29, 40, 49, 75);
        return tokenizer;
    }

    @Bean(name="MyBatchProcessor")
    public ItemProcessor<InputMapperDTO, FinalDTO> processor()
    {
        return new Processor();
    }

    @Bean(name="MyWriter")
    public ItemWriter<FinalDTO> writer()
    {
        return new Writer();
    }

    @Bean(name="MyStep")
    public Step step1() throws IOException
    {
        // Each chunk of 10 items is read, processed and written in its own transaction
        return stepBuilderFactory.get("MyStep")
                .<InputMapperDTO, FinalDTO>chunk(10)
                .reader(this.reader(null)) // null is ignored: the step-scoped proxy resolves fileName at runtime
                .processor(this.processor())
                .writer(this.writer())
                .build();
    }

    @Bean(name="MyJob")
    public Job importUserJob(@Autowired @Qualifier("MyStep") Step step1)
    {
        return jobBuilderFactory
                .get("MyJob" + new Date())
                .incrementer(new RunIdIncrementer())
                .listener(jobCompletionNotificationListener)
                .flow(step1)
                .end()
                .build();
    }
}
Writer.java
public class Writer implements ItemWriter<FinalDTO>
{
    @Autowired
    SomeRepository someRepository;

    @Override
    public void write(List<? extends FinalDTO> listOfObjects) throws Exception
    {
        // Called once per chunk, inside that chunk's transaction
        someRepository.saveAll(listOfObjects);
    }
}
JobCompletionNotificationListener.java
public class JobCompletionNotificationListener extends JobExecutionListenerSupport
{
    @Override
    public void afterJob(JobExecution jobExecution)
    {
        // Called when the job ends, whether it completed or failed
        if (jobExecution.getStatus() == BatchStatus.COMPLETED)
        {
            System.err.println("****************************************");
            System.err.println("***** Batch Job Completed ******");
            System.err.println("****************************************");
        }
        else
        {
            System.err.println("****************************************");
            System.err.println("***** Batch Job Failed ******");
            System.err.println("****************************************");
        }
    }
}
MyCallerMethod
@Transactional
public String processFile(String datFile) throws JobExecutionAlreadyRunningException, JobRestartException,
        JobInstanceAlreadyCompleteException, JobParametersInvalidException
{
    long st = System.currentTimeMillis();
    JobParametersBuilder builder = new JobParametersBuilder();
    builder.addString("fileName", datFile);
    builder.addDate("date", new Date());
    jobLauncher.run(job, builder.toJobParameters());
    System.err.println("****************************************");
    System.err.println("***** Total time consumed = " + (System.currentTimeMillis() - st) + " ******");
    System.err.println("****************************************");
    return response;
}

What I was trying to do is not supported by Spring Batch out of the box. For my requirement, I implemented a custom delete that removes the already written data from the database when any step fails.
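The answer does not show its delete, so the following is only a plain-Java sketch of the compensating-cleanup idea under stated assumptions: every inserted row is tagged with an identifier of the run that wrote it (a hypothetical `runId` column), chunks are committed as they go, and on failure every row carrying that run's identifier is deleted. `CompensatingCleanupSketch`, `Row` and `importFile` are hypothetical names, and the in-memory `table` list stands in for the database table. In a real job, this cleanup could plausibly live in a `JobExecutionListener.afterJob` callback that checks for `BatchStatus.FAILED`.

```java
import java.util.ArrayList;
import java.util.List;

class CompensatingCleanupSketch {

    /** A row in the hypothetical target table, tagged with the run that inserted it. */
    record Row(String runId, String value) {}

    /**
     * Imports lines chunk by chunk into table. A blank line simulates a record that
     * makes the database throw. Returns true on success; on failure, deletes every
     * row this run had already committed and returns false.
     */
    static boolean importFile(List<Row> table, String runId, List<String> lines, int chunkSize) {
        try {
            for (int start = 0; start < lines.size(); start += chunkSize) {
                List<Row> chunk = new ArrayList<>();
                for (String line : lines.subList(start, Math.min(start + chunkSize, lines.size()))) {
                    if (line.isBlank()) {
                        throw new IllegalArgumentException("invalid record at line " + (start + chunk.size() + 1));
                    }
                    chunk.add(new Row(runId, line));
                }
                table.addAll(chunk); // "commit" of this chunk
            }
            return true;
        } catch (IllegalArgumentException e) {
            // Compensating delete: undo every chunk this run has already committed
            table.removeIf(row -> row.runId().equals(runId));
            return false;
        }
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>(List.of(new Row("earlier-run", "x")));
        boolean ok = importFile(table, "run-2", List.of("a", "b", "c", "", "e"), 2);
        System.out.println(ok + " " + table.size()); // false 1 (only the earlier run's row is left)
    }
}
```

Note that compensation is not true atomicity: until the delete runs, other readers can see the partially imported rows.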

Preprocessing in Spring Boot Batch with multiple threads

I have a multi-threaded Spring Batch job. In my processor I want to use a global variable, say a map. The map contains values that have to be queried from a table and are then used by the processor. How can I achieve this? If I put the logic that fills the map in the processor, the query will be executed for every record fetched by the item reader, which would be millions of records. Is there a way to do this?

You can intercept the step execution; see the Spring Batch reference documentation, section 5.1.10, "Intercepting Step Execution".
For example, you can implement the StepExecutionListener interface:
@Component
@JobScope
public class Processor implements ItemProcessor<Integer,Integer>, StepExecutionListener {

    private final Map<String,String> map = new HashMap<>();

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // runs once, before the step processes any item
        map.put("KEY","VALUE");
    }

    @Override
    public Integer process(Integer item) throws Exception {
        // read the map for each item
        final String key = map.get("KEY");
        // ...
        return item; // placeholder: return the transformed item here
    }
    // ....
}
or use the @BeforeStep annotation:
@Component
@JobScope
public class Processor implements ItemProcessor<Integer,Integer> {

    private final Map<String,String> map = new HashMap<>();

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        // runs once, before the step processes any item
        map.put("KEY","VALUE");
    }

    @Override
    public Integer process(Integer item) throws Exception {
        // read the map for each item
        final String key = map.get("KEY");
        //...
        return item; // placeholder: return the transformed item here
    }
}
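Because the question is about a multi-threaded step, it helps to remember that all worker threads share the same processor instance, so the map should be fully built before processing starts and only read afterwards. The plain-Java sketch below does not use Spring: `LookupProcessorSketch`, `queryLookupTable` and `countResolved` are hypothetical names, the "query" just returns a fixed map, and a fixed thread pool stands in for the multi-threaded step. It loads the lookup once in `beforeStep`, publishes it as an immutable copy, and then calls `process` concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class LookupProcessorSketch {

    /** Counts how many times the (simulated) lookup query runs. */
    static final AtomicInteger queryCount = new AtomicInteger();

    /** Hypothetical stand-in for a query such as "SELECT code, name FROM lookup_table". */
    static Map<String, String> queryLookupTable() {
        queryCount.incrementAndGet();
        return Map.of("EUR", "Euro", "USD", "US Dollar");
    }

    // volatile field holding an immutable map: worker threads only ever see a complete map
    private volatile Map<String, String> lookup = Map.of();

    /** Runs once before any item is processed, like beforeStep / @BeforeStep. */
    void beforeStep() {
        lookup = Map.copyOf(queryLookupTable());
    }

    /** Runs for every item, possibly on several threads at once; only reads the map. */
    String process(String code) {
        return lookup.getOrDefault(code, "UNKNOWN");
    }

    /** Simulates a multi-threaded step: processes all items on a pool, counts known codes. */
    static long countResolved(LookupProcessorSketch processor, List<String> items, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (String item : items) {
                Callable<String> task = () -> processor.process(item);
                results.add(pool.submit(task));
            }
            long resolved = 0;
            for (Future<String> result : results) {
                if (!result.get().equals("UNKNOWN")) {
                    resolved++;
                }
            }
            return resolved;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        LookupProcessorSketch processor = new LookupProcessorSketch();
        processor.beforeStep();
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            items.add(i % 2 == 0 ? "EUR" : "GBP");
        }
        // Prints "5000 resolved, 1 query": 10,000 items, but the lookup ran only once
        System.out.println(countResolved(processor, items, 4) + " resolved, " + queryCount.get() + " query");
    }
}
```

Making the map immutable after loading means no thread can modify it while others read it, so no further locking is needed in `process`.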

Spring Batch: execute dynamically generated steps in a tasklet

I have a Spring Batch job that does the following:
Step 1. Creates a list of objects that need to be processed.
Step 2. Creates a list of steps, depending on how many items are in the list of objects created in step 1.
Step 3. Tries to execute the steps from the list of steps created in step 2.
Executing the x steps is done in executeDynamicStepsTasklet() below. While the code runs without any errors, it does not seem to do anything. Does what I have in that method look correct?
Thanks
@Configuration
public class ExportMasterListCsvJobConfig {

    public static final String JOB_NAME = "exportMasterListCsv";

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Value("${exportMasterListCsv.generateMasterListRows.chunkSize}")
    public int chunkSize;

    @Value("${exportMasterListCsv.generateMasterListRows.masterListSql}")
    public String masterListSql;

    @Autowired
    public DataSource onlineStagingDb;

    @Value("${out.dir}")
    public String outDir;

    @Value("${exportMasterListCsv.generatePromoStartDateEndDateGroupings.promoStartDateEndDateSql}")
    private String promoStartDateEndDateSql;

    private List<DivisionIdPromoCompStartDtEndDtGrouping> divisionIdPromoCompStartDtEndDtGrouping;

    private List<Step> dynamicSteps = Collections.synchronizedList(new ArrayList<Step>());

    @Bean
    public Job exportMasterListCsvJob(
            @Qualifier("createJobDatesStep") Step createJobDatesStep,
            @Qualifier("createDynamicStepsStep") Step createDynamicStepsStep,
            @Qualifier("executeDynamicStepsStep") Step executeDynamicStepsStep) {
        return jobBuilderFactory.get(JOB_NAME)
                .flow(createJobDatesStep)
                .next(createDynamicStepsStep)
                .next(executeDynamicStepsStep)
                .end().build();
    }

    @Bean
    public Step executeDynamicStepsStep(
            @Qualifier("executeDynamicStepsTasklet") Tasklet executeDynamicStepsTasklet) {
        return stepBuilderFactory
                .get("executeDynamicStepsStep")
                .tasklet(executeDynamicStepsTasklet)
                .build();
    }

    @Bean
    public Tasklet executeDynamicStepsTasklet() {
        return new Tasklet() {
            @Override
            public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
                FlowStep flowStep = new FlowStep(createParallelFlow());
                SimpleJobBuilder jobBuilder = jobBuilderFactory.get("myNewJob").start(flowStep);
                return RepeatStatus.FINISHED;
            }
        };
    }

    public Flow createParallelFlow() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        taskExecutor.setConcurrencyLimit(1);

        List<Flow> flows = dynamicSteps.stream()
                .map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
                .collect(Collectors.toList());

        return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
                .split(taskExecutor)
                .add(flows.toArray(new Flow[flows.size()]))
                .build();
    }

    @Bean
    public Step createDynamicStepsStep(
            @Qualifier("createDynamicStepsTasklet") Tasklet createDynamicStepsTasklet) {
        return stepBuilderFactory
                .get("createDynamicStepsStep")
                .tasklet(createDynamicStepsTasklet)
                .build();
    }

    @Bean
    @JobScope
    public Tasklet createDynamicStepsTasklet() {
        return new Tasklet() {
            @Override
            public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
                for (DivisionIdPromoCompStartDtEndDtGrouping grp : divisionIdPromoCompStartDtEndDtGrouping) {
                    System.err.println("grp: " + grp);
                    String stepName = "stp_" + grp;
                    String fileName = grp + FlatFileConstants.EXTENSION_CSV;
                    Step dynamicStep =
                            stepBuilderFactory.get(stepName)
                                    .<MasterList, MasterList>chunk(10)
                                    .reader(queryStagingDbReader(
                                            grp.getDivisionId(),
                                            grp.getRpmPromoCompDetailStartDate(),
                                            grp.getRpmPromoCompDetailEndDate()))
                                    .writer(masterListFileWriter(fileName))
                                    .build();
                    dynamicSteps.add(dynamicStep);
                }
                System.err.println("createDynamicStepsTasklet dynamicSteps: " + dynamicSteps);
                return RepeatStatus.FINISHED;
            }
        };
    }

    public FlatFileItemWriter<MasterList> masterListFileWriter(String fileName) {
        FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource(new File(outDir, fileName)));
        writer.setHeaderCallback(masterListFlatFileHeaderCallback());
        writer.setLineAggregator(masterListFormatterLineAggregator());
        return writer;
    }
So now I have a list of dynamic steps that need to be executed, and I believe they are in step scope. Can someone advise me on how to execute them?

This will not work. Your Tasklet just creates a job with a FlowStep as its first step. Using the jobBuilderFactory only creates the job; it does not launch it. The method name "start" may be misleading, since it only defines the first step. It does not launch the job.
You cannot change the structure of a job (its steps and sub-steps) once it has started. Therefore, it is not possible to configure a flow step in step 2 based on things that are calculated in step 1. (Of course, you could do some hacking deeper inside the Spring Batch internals and directly modify the beans and so on, but you don't want to do that.)
I suggest that you use a kind of "SetupBean" with an appropriate @PostConstruct method, injected into the class that configures your job. This "SetupBean" is responsible for calculating the list of objects to be processed.
@Component
public class SetUpBean {

    private List<Object> myObjects;

    @PostConstruct
    public void afterPropertiesSet() {
        // Runs once after dependency injection, so the list is ready before JobConfiguration uses it
        myObjects = ...;
    }

    public List<Object> getMyObjects() {
        return myObjects;
    }
}

@Configuration
public class JobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private SetUpBean setup;

    ...
}

Why can't I create a Neo4j relationship with Spring Data Neo4j?

I'm fairly new to Spring Data Neo4j (though I have experience with Neo4j itself). I tried following the 'official' Spring Data Neo4j guide, specifically the chapter on creating relationships.
But it seems I cannot get it to work. Spring gives me a
java.lang.IllegalStateException: This index (Index[__rel_types__,Relationship]) has been marked as deleted in this transaction
Let me stress that I am NOT removing any nodes or relationships. These are the relevant classes of my domain model:
@NodeEntity
public class User {
    @GraphId
    private Long nodeid;

    @Indexed(unique = true)
    private String uuid;
    ....
}

@NodeEntity
public class Website {
    @GraphId
    private Long nodeid;

    @Indexed(unique = true)
    private String uuid;
    ....
}

@RelationshipEntity(type = RelTypes.REL_USER_INTERESTED_IN)
public class UserInterest {
    @GraphId
    private Long nodeid;

    @StartNode
    private User user;

    @EndNode
    private Website site;
    ...
}
And this is my basic test, which I can't get to turn green.
(Note that I omitted large portions of the code; the basic setup of the Spring context etc. is working fine.)
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration
@Transactional
public class BaseTest {

    @Autowired
    protected Neo4jTemplate template;

    @Autowired
    protected GraphDatabaseService graphDatabaseService;

    protected Transaction tx;

    @Configuration
    @EnableNeo4jRepositories
    static class TestConfig extends Neo4jConfiguration {
        TestConfig() throws ClassNotFoundException {
            setBasePackage("me.bcfh.model");
        }

        @Bean
        GraphDatabaseService graphDatabaseService() {
            return new TestGraphDatabaseFactory().newImpermanentDatabase();
        }
    }

    public void before() {
        // provide implementation if necessary
    }

    public void after() {
        // provide implementation if necessary
    }

    @Before
    public void setup() throws Exception {
        Neo4jHelper.cleanDb(graphDatabaseService, false);
        before();
    }

    @After
    public void tearDown() throws Exception {
        after();
        if (tx != null) {
            tx.success();
            tx.close();
            tx = null;
        }
    }
}

public class BasicGraphTest extends BaseTest {

    User user;
    Website website;
    UserInterest interest;

    @Override
    public void before() {
        user = new User();
        website = new Website();
        website = template.save(website);
        user = template.save(user);
    }

    @Test
    @Transactional
    public void dbShouldContainData() throws Exception {
        UserInterest interest = new UserInterest();
        interest.setSite(website);
        interest.setUser(user);
        template.save(interest);
        // some assertions
        ...
    }
}
The IllegalStateException is thrown when I try to persist the UserInterest instance, which I do not understand because I am not removing anything anywhere.
The ways of creating a relationship described in the Spring guide did not work for me either; there I got the same exception.
Can anyone spot what I'm doing wrong here?
I am using Spring 4.1.4.RELEASE and Spring Data Neo4j 3.2.1.RELEASE. Neo4j is version 2.1.6.
Note: I also tried copying the domain model classes from the cineasts example into my project and borrowed a few lines of the DomainTest class, but this also gives me the IllegalStateException. Maybe there is something wrong with my setup?

I think you are getting the IllegalStateException because you are calling cleanDb in your setup method.
You may not need to clean the database at all. Since your tests are marked @Transactional, anything you do in a test is rolled back at the end of that test.
It looks like the transaction is trying to roll back and can't find the relationship it expects.
