Spring Batch & Integration Flat File Generation Pattern - spring

We have a use case where we receive data in flat files which we load into an Oracle DB using Spring Batch. Post data load in Oracle, we have to distribute the data in form of flat files to several consumers. The data selection criteria depends on some pre-decided values in some fields of the data.
We have a design in place which generates a list which contains objects that can be passed to a Spring Batch job as job parameter to generate the flat files needed to be sent to the data consumers.
Using a Splitter component, I can put the individual objects into a channel and plug a JobLaunchingGateway to launch a batch job to generate the flat file.
Need help on how I can launch multiple batch jobs in parallel using JobLaunchingGateway so that I can generate files in parallel.
A setup is already in place to FTP the files to consumers. We do not need to worry about that.

Use an ExecutorChannel with a task executor before the JobLaunchingGateway.

Related

Spring batch with two Data base

The main Moto is need to transfer the data from one data base to another date bade for that am going to write a batch program but spring batch is creating tables in both the data bases how to stop that

Multiple data collector for a job without duplicating records in streamsets

I have a directory consist of multiple files, and that is shared across multiple data collectors. I have a job to process those files and put it in the destination. Because the records are huge, I want to run the job in multiple data collector. but when I tried I got the duplicate entries in my destination. Is there a way to achieve it without duplicating the records. Thanks
You can use kafka for it. For example:
Create one pipeline which reads file names and sends them to kafka topic via kafka producer.
Create pipeline with kafka consumer as an origin and set the consumer group property to it. This pipeline will read filenames and work with files.
Now you can run multiple pipelines with kafka consumer with the same consumer group. In this case kafka will balance messages within consumer group by itself and you will not be getting duplicates.
To be sure that you won't have duplicates also set 'acks' = 'all' property to kafka producer.
With this schema you can run as many collectors as your kafka topic partition count.
Hope it will help you.
Copying my answer from Ask StreamSets:
At present there is no way to automatically partition directory contents across multiple data collectors.
You could run similar pipelines on multiple data collectors and manually partition the data in the origin using different character ranges in the File Name Pattern configurations. For example, if you had two data collectors, and your file names were distributed across the alphabet, the first instance might process [a-m]* and the second [n-z]*.
One way to do this would be by setting File Name Pattern to a runtime parameter - for example ${FileNamePattern}. You would then set the value for the pattern in the pipeline's parameters tab, or when starting the pipeline via the CLI, API, UI or Control Hub.

read data through spring batch and return data outside the job

I read everywhere how to read data in spring batch itemReader and write in database using itemWriter, but I wanted to just read data using spring batch then somehow I wanted to access this list of items outside the job. I need to perform remaining processing after job finished.
The reason I wanted to do this is because I need to perform a lot of validations on every item. I have to validate each item's variable xyz if it exists in list(which is not available within job). After performing a lot of processing I have to insert information in different tables using JPA. Please help me out!

How to levarage spring batch without using POJO?

I know BeanWrapperFieldSetMapper class depends on POJO.
But here is the thing: If I want to take advantage of Spring Batch features but do not want to create separate jobs ( does not want to write POJOs and separate reader writes or mappers) how to do this?
My requirement is to read *.csv file which will have the headers so I should be able to supply header names in a map or string[] and create my sql statement based on it, instead of writing a RowMapper.
This will help me uploading various files to different tables.
Is it possible to change BeanWrapperFieldSetMapper to make it suitable to map the values from Map or String[]?
Also Even if I do not have headers in the *.cvs file, I can construct update statement and load using chunk delimeters setting and other advantages of Spring Batch.

Spring batch without pojo or Dao [duplicate]

I know BeanWrapperFieldSetMapper class depends on POJO.
But here is the thing: If I want to take advantage of Spring Batch features but do not want to create separate jobs ( does not want to write POJOs and separate reader writes or mappers) how to do this?
My requirement is to read *.csv file which will have the headers so I should be able to supply header names in a map or string[] and create my sql statement based on it, instead of writing a RowMapper.
This will help me uploading various files to different tables.
Is it possible to change BeanWrapperFieldSetMapper to make it suitable to map the values from Map or String[]?
Also Even if I do not have headers in the *.cvs file, I can construct update statement and load using chunk delimeters setting and other advantages of Spring Batch.

Resources