Spring Batch dynamic Flow/Job construction

I'm currently using Spring Batch to run a job that processes a file, does some stuff on each line and writes the output to another file.
This was developed in a 'core' product but now (as always) we have some client-specific requirements that mandate the inclusion of some extra steps in the job.
I've been able to do a proof of concept where I use common Spring features to 'replace' the job with another one containing the extra steps, either by using distinct names for the jobs (if we define them in the same Configuration class) or by creating a completely distinct Configuration class and loading that as the Spring context.
What I'm asking, and I'm 'almost' there, is whether it's possible to easily define a base Job (with or without an initial step) and then add only the steps that make sense for a specific client.
I'm using standard class inheritance to do this, but it doesn't work properly with standard Spring facilities, since Spring won't know which implementation of the getSteps method to use (code below).
abstract class JobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    protected StepBuilderFactory stepBuilderFactory;

    @Bean
    Job job() {
        List<Step> steps = getSteps();
        final JobBuilder jobBuilder = jobBuilderFactory.get("job")
                .incrementer(new RunIdIncrementer());
        SimpleJobBuilder builder = jobBuilder.start(steps.remove(0));
        for (Step s : steps) {
            builder = builder.next(s);
        }
        return builder.build();
    }

    protected abstract List<Step> getSteps();
}
@Configuration
@Import(BaseConfig.class)
public class Client1JobConfig extends JobConfig {

    @Override
    protected List<Step> getSteps() {
        List<Step> steps = new ArrayList<>();
        steps.add(step1());
        return steps;
    }

    Step step1() {
        return stepBuilderFactory.get("step1")
                .<Integer, Integer>chunk(1)
                .reader(dummyReader())
                .processor(processor1())
                .writer(dummyWriter())
                .build();
    }
}
@Configuration
@Import(BaseConfig.class)
public class Client2JobConfig extends JobConfig {

    @Override
    protected List<Step> getSteps() {
        List<Step> steps = new ArrayList<>();
        steps.add(step1());
        steps.add(step2());
        return steps;
    }

    Step step1() {
        return stepBuilderFactory.get("step1")
                .<Integer, Integer>chunk(1)
                .reader(dummyReader())
                .processor(processor1())
                .writer(dummyWriter())
                .build();
    }

    Step step2() {
        return stepBuilderFactory.get("step2")
                .<Integer, Integer>chunk(1)
                .reader(dummyReader())
                .processor(processor2())
                .writer(dummyWriter())
                .build();
    }
}
I can make it work if I load just one Configuration class into the Spring context, but if all the Configuration classes are loaded (either by component scanning or by manually adding them to the context) it of course doesn't work, because there's no way to select either implementation over the other.
I can also make it work by having differently-named jobs like "client1" and "client2", but let's say I can't change the calling code and the job is @Autowired. How can I have the 'same' job but with different steps?
Is there a better way to accomplish this?
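For what it's worth, a minimal sketch of one possible direction, assuming Spring profiles are an option (the @Profile usage is my assumption, not part of the setup above): each client configuration keeps the same 'job' bean, and the active profile decides which configuration is loaded, so the autowired bean name never changes.

// A sketch, assuming profiles are acceptable: only one JobConfig subclass
// is active per environment, so the single "job" bean resolves unambiguously.
@Configuration
@Profile("client2")
@Import(BaseConfig.class)
public class Client2JobConfig extends JobConfig {
    // getSteps() as shown above
}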

Related

When trying to update record in after job, batch throws "No transaction is in progress"

I have three different datasources to work with in this batch, and everything runs fine. The job repository is map-based and uses a ResourcelessTransactionManager, which I configured like this:
@Configuration
@EnableBatchProcessing
public class BatchConfigurer extends DefaultBatchConfigurer {

    @Override
    public void setDataSource(DataSource dataSource) {
        // intentionally left empty so no datasource is set
    }
}
I also use a different PlatformTransactionManager than Spring Batch (issue), so I set spring.main.allow-bean-definition-overriding=true in my properties.
Now my problem is: I want to update a record's status, which is essential for starting my batch, according to the job's exit status, so I use a JobExecutionListener. But while everything works great locally, it throws an error on our remote server (a Kubernetes environment), which makes it more interesting.
The part where I try to update my record is:
@Slf4j
@Component
@Lazy
public class MyListener implements JobExecutionListener {

    @Autowired
    private MyRepo myRepo;

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // before job
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // I cannot share all the code because of my company's policy,
        // but there is no issue here
        myRepo.saveAndFlush(myUpdatedEntity);
    }
}
The error I get is:
javax.persistence.TransactionRequiredException: no transaction is in progress
As far as I know, Spring Batch is not handling this transaction; I already have a transaction manager for it. Like I said, it works in my local environment, so there shouldn't be a configuration issue. I tried adding @Transactional(transactionManager = "myTransactionManager") to make sure, but it didn't work. What do you think?
Edit: I defined my transaction manager as:
@Configuration
@EnableJpaRepositories(
        basePackages = "repo's package",
        entityManagerFactoryRef = "entityManagerFactory",
        transactionManagerRef = "transactionManager"
)
public class DatasourceConfiguration {

    @Bean(name = "transactionManager")
    public PlatformTransactionManager transactionManager(
            @Qualifier("entityManagerFactory") EntityManagerFactory entityManagerFactory) {
        // I defined these (datasource etc.)
        return new JpaTransactionManager(entityManagerFactory);
    }
}
Edit 2:
Setting hibernate.allow_update_outside_transaction to true resolved the issue, but I have some concerns about it. Could it affect the rollback of a chunk when an error occurs? I suppose not, because the chunk has its own transaction, but I need to be sure. And I couldn't fully understand why this happens.
Since you are using JPA, you need to configure the job repository as well as the step to use a JpaTransactionManager.
For the job repository, you need to override BatchConfigurer#getTransactionManager as mentioned in the documentation here: https://docs.spring.io/spring-batch/docs/4.3.7/reference/html/job.html#javaConfig.
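For illustration, a minimal sketch of that override (assuming Spring Batch 4.x; the configurer class name and the constructor injection are assumptions for this example):

@Configuration
public class MyBatchConfigurer extends DefaultBatchConfigurer {

    private final JpaTransactionManager jpaTransactionManager;

    public MyBatchConfigurer(JpaTransactionManager jpaTransactionManager) {
        this.jpaTransactionManager = jpaTransactionManager;
    }

    @Override
    public PlatformTransactionManager getTransactionManager() {
        // make the job repository use the JPA transaction manager
        return jpaTransactionManager;
    }
}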
For the step, you can set the transaction manager using the builder:
@Bean
public Step step(JpaTransactionManager transactionManager) {
    return this.stepBuilderFactory.get("step")
            // configure step type and other properties
            .transactionManager(transactionManager)
            .build();
}
EDIT: Add transaction details about JobExecutionListener
JobExecutionListener#afterJob is executed outside the transaction driven by Spring Batch for the step. So if you want to execute transactional code inside that method, you need to manage the transaction yourself. You can do that either declaratively by adding @Transactional(transactionManager = "jpaTransactionManager", propagation = Propagation.REQUIRES_NEW) on your repository method, or programmatically with a TransactionTemplate, something like:
@Override
public void afterJob(JobExecution jobExecution) {
    new TransactionTemplate(transactionManager, transactionAttribute).execute(new TransactionCallbackWithoutResult() {
        @Override
        protected void doInTransactionWithoutResult(TransactionStatus status) {
            myRepo.saveAndFlush(myUpdatedEntity);
        }
    });
}

How to dynamically initialize ItemWriter Bean with @StepScope at runtime when ItemWriter Bean is extended?

[New to Spring Batch] I have different CSVs in different formats, and more CSVs may be added in the future, so instead of defining a @Bean for each CSV format I thought of having a common FlatFileItemReader<T>: I created a base configuration class and then a concrete class for each CSV type.
Since I have defined the reader bean as @StepScope, during batch job runtime it auto-initializes the bean with the first concrete class in the package; the same kind of problem is discussed here, but the answer is not relevant to my case.
How do I pass a particular concrete class type of ItemReader to my step during a job run?
Here is my base configuration class:
public abstract class AbstractBatchItemReader<T> {

    private CsvInformation csvInformation;

    protected AbstractBatchItemReader(CsvInformation csvInformation) {
        this.csvInformation = csvInformation;
    }

    @Bean
    @StepScope
    // fileName is retrieved from jobParameters during runtime
    public FlatFileItemReader<T> getItemReader(@Value("#{jobParameters['input.file.name']}") String fileName) {
        return new FlatFileItemReaderBuilder<T>()
                .name("invoiceHeaderItemReader")
                .resource(new FileSystemResource(fileName))
                .linesToSkip(1)
                .delimited()
                .names(csvInformation.getHeaders().split(","))
                .fieldSetMapper(new BeanWrapperFieldSetMapper<T>() {{
                    setConversionService(new StringToLocalDateConversion().convert());
                    setTargetType(csvInformation.getERPClass());
                }})
                .build();
    }
}
Here is the concrete class that extends the base config:
@Configuration
public class InvoiceHeaderReader extends AbstractBatchItemReader<ERPInvoiceHeader> {

    protected InvoiceHeaderReader(InvoiceHeaderCsvInformation csvInformation) {
        super(csvInformation);
    }
}
Here is my base step config:
public abstract class AbstractBatchStep<T> {

    private final AbstractBatchItemReader<T> reader;
    private final AbstractBatchItemWriter<T> writer;
    private final StepBuilderFactory stepBuilderFactory;

    protected AbstractBatchStep(AbstractBatchItemReader<T> reader,
                                AbstractBatchItemWriter<T> writer,
                                StepBuilderFactory stepBuilderFactory) {
        this.reader = reader;
        this.writer = writer;
        this.stepBuilderFactory = stepBuilderFactory;
    }

    public Step getStep() {
        afterPropertiesSet();
        return stepBuilderFactory.get("Batch Step")
                .<T, T>chunk(BatchConfiguration.READER_CHUNK_SIZE)
                // fileName is passed during runtime
                .reader(reader.getItemReader(null))
                .writer(writer.getItemWriter())
                .build();
    }
}
Here is the concrete class that extends step config:
@Configuration("invoice_header")
public class InvoiceHeaderStep extends AbstractBatchStep<ERPInvoiceHeader> {

    protected InvoiceHeaderStep(InvoiceHeaderReader reader, InvoiceHeaderWriter writer, StepBuilderFactory stepBuilderFactory) {
        super(reader, writer, stepBuilderFactory);
    }
}
The whole job cycle runs only for the first concrete class in the package; if I try to run another type of CSV it fails with an exception. The Unexpected token required n found n exception is obviously because the reader was auto-initialized by the first class in the package, not the one that I pass to the step.
Please also suggest whether this design pattern is correct or whether there is an easier way to achieve this.
I would like to post an answer as a reference for others.
1. I created an AbstractBatchItemReader<T> class which has the base configuration.
2. Concrete classes extend the base config class: TypeOneCsvReader extends AbstractBatchItemReader<TypeOneEntity>.
3. An interface with CSV information methods, and classes implementing the interface for each CSV type.
Here is the code sample:
AbstractBatchItemReader:
public abstract class AbstractBatchItemReader<T> {

    private CsvInformation csvInformation;

    protected AbstractBatchItemReader(CsvInformation csvInformation) {
        this.csvInformation = csvInformation;
    }

    FlatFileItemReader<T> getItemReader() {
        return new FlatFileItemReaderBuilder<T>()
                .name("Batch Reader")
                .resource(resource(null))
                .linesToSkip(1)
                .delimited()
                .quoteCharacter(BatchConfiguration.READER_QUOTE_CHARACTER)
                .names(csvInformation.getHeaders().split(","))
                .fieldSetMapper(new BeanWrapperFieldSetMapper<T>() {{
                    setConversionService(StringToLocalDateConversion.convert());
                    setTargetType(csvInformation.getEntityClass());
                }})
                .build();
    }

    @Bean
    @StepScope
    public Resource resource(@Value("#{jobParameters['input.file.name']}") String fileName) {
        return new FileSystemResource(fileName);
    }
}
Concrete Class:
@Configuration
public class TypeOneCsvReader extends AbstractBatchItemReader<TypeOneEntity> {

    protected TypeOneCsvReader(TypeOneCsv csvInformation) {
        super(csvInformation);
    }
}
CsvInformation Interface:
public interface CsvInformation {
    String getHeaders();
    Class getEntityClass();
}
Each implementation of the interface has to be annotated with @Component so that the concrete reader class picks it up via dependency injection; a hypothetical implementation is sketched below.
The benefit of this approach is that it scales to as many CSV types as required, and the reader logic stays in one place.
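As an illustration, a hypothetical CsvInformation implementation might look like this (the header names and entity class are made up for the example):

@Component
public class TypeOneCsv implements CsvInformation {

    @Override
    public String getHeaders() {
        // hypothetical header row for this CSV type
        return "invoiceNumber,invoiceDate,amount";
    }

    @Override
    public Class getEntityClass() {
        return TypeOneEntity.class;
    }
}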
Thanks

How to get StepExecution in delegated ItemReader in Spring Batch

I have created a batch application which does chunk processing; I am creating chunks using a completion policy.
Following is my batch configuration (keeping the code minimal; please let me know if you need other information):
@Bean
public Job myJob() {
    ItemReader itemReader = itemReader();
    return jobBuilder.get("job")
            .start(myStep(itemReader, completionPolicyReader(itemReader), writer(), processor()))
            .build(); // .build() added so a Job is actually returned
}

@Bean
public Step myStep(ItemReader itemReader, MyCompletionPolicy completionPolicyReader,
                   ItemWriter writer, ItemProcessor processor) {
    return stepBuilder.get("step")
            .chunk(completionPolicyReader)
            .reader(completionPolicyReader)
            .processor(processor)
            .writer(writer)
            .listener(itemReader) // registered delegated itemReader with the listener
            .build();
}

@Bean
public MyCompletionPolicy completionPolicyReader(ItemReader itemReader) {
    MyCompletionPolicy obj = new MyCompletionPolicy();
    obj.setDelegate(itemReader);
    return obj;
}

@Bean
public ItemReader itemReader() {
    return abc == xyz ? new AReader() : new BReader();
}

// other config
Following is my MyCompletionPolicy, which delegates to the actual ItemReader, i.e. either AReader or BReader depending on some condition:
class MyCompletionPolicy extends CompletionPolicySupport implements ItemReader<MyModel>, StepExecutionListener {

    public void setDelegate(ItemReader<MyModel> itemReader) {
        this.itemReader = itemReader;
        this.delegate = new SingleItemPeekableItemReader<MyModel>();
        this.delegate.setDelegate(itemReader);
    }

    @Override
    public MyModel read() {
        // Here I am delegating to the actual reader (e.g. AReader),
        // where I cannot get the StepExecution
        currentReadItem = delegate.read();
        return currentReadItem;
    }

    // ... other overridden methods
}
Following is my AReader, where I am not able to get the StepExecution:
class AReader implements ItemReader<MyModel>, StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // stepExecution is NULL
    }

    // ... other overridden methods
}
How can I get the StepExecution in my delegated ItemReader, i.e. in AReader?
======EDIT======
A sub-question regarding best practices: if I want to increment a counter between chunks (i.e. between multiple calls of the ItemReader) and use the current value of the counter in the ItemReader, is it good practice to create a class field in the ItemReader class, or should I store it in the ExecutionContext?
1. Considering a single-threaded app
2. Considering a multi-threaded app
By default, Spring Batch will automatically register your reader/processor/writer as listeners if they implement StepExecutionListener. In your case, the reader is MyCompletionPolicy, which implements StepExecutionListener and will be registered as a listener automatically.
However, Spring Batch is not aware that your MyCompletionPolicy delegates to another reader, so you need to explicitly register the delegate as a listener in the step, for example:
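A minimal sketch of that explicit registration, reusing the bean names from the question (the cast is an assumption, to expose the listener interface of the delegate):

@Bean
public Step myStep(ItemReader itemReader, MyCompletionPolicy completionPolicyReader,
                   ItemWriter writer, ItemProcessor processor) {
    return stepBuilder.get("step")
            .chunk(completionPolicyReader)
            .reader(completionPolicyReader)
            .processor(processor)
            .writer(writer)
            // Spring Batch cannot see through the completion policy to the
            // delegate, so the delegate reader is registered as a listener itself
            .listener((StepExecutionListener) itemReader)
            .build();
}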

Issues with Spring Batch

Hi, I have been working with Spring Batch recently and need some help.
1) I want to run my job using multiple threads, so I have used a TaskExecutor as below:
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(4);
    return taskExecutor;
}
@Bean
public Step myStep() {
    return stepBuilderFactory.get("myStep")
            .<MyEntity, AnotherEntity>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .taskExecutor(taskExecutor())
            .throttleLimit(4)
            .build();
}
but while executing I can see the line below in the console:
o.s.b.c.l.support.SimpleJobLauncher : No TaskExecutor has been set, defaulting to synchronous executor.
What does this mean? However, while debugging I can see four SimpleAsyncTaskExecutor threads running. Can someone shed some light on this?
2) I don't want to run my batch application with the metadata tables that Spring Batch creates. I tried adding spring.batch.initialize-schema=never, but it didn't work. I also saw a way to do this using a ResourcelessTransactionManager and MapJobRepositoryFactoryBean, but I have to make some database transactions in my job, so will it be alright if I use this?
Also I was able to do this by extending DefaultBatchConfigurer and overriding:
@Override
public void setDataSource(DataSource dataSource) {
    // override to not set the datasource even if one exists;
    // initialization will use a Map-based JobRepository (instead of a database)
}
Please guide me further. Thanks.
Update:
My full configuration class is below.
@EnableBatchProcessing
@EnableScheduling
@Configuration
public class MyBatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    /* @Override
    public void setDataSource(DataSource dataSource) {
        // override to not set the datasource even if one exists;
        // initialization will use a Map-based JobRepository (instead of a database)
    } */

    @Bean
    public Step myStep() {
        return stepBuilderFactory.get("myStep")
                .<MyEntity, AnotherEntity>chunk(1)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .taskExecutor(taskExecutor())
                .throttleLimit(4)
                .build();
    }

    @Bean
    public Job myJob() {
        return jobBuilderFactory.get("myJob")
                .incrementer(new RunIdIncrementer())
                .listener(myJobListener())
                .flow(myStep())
                .end()
                .build();
    }

    @Bean
    public MyJobListener myJobListener() {
        return new MyJobListener();
    }

    @Bean
    public ItemReader<MyEntity> reader() {
        return new MyReader();
    }

    @Bean
    public ItemWriter<? super AnotherEntity> writer() {
        return new MyWriter();
    }

    @Bean
    public ItemProcessor<MyEntity, AnotherEntity> processor() {
        return new MyProcessor();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        taskExecutor.setConcurrencyLimit(4);
        return taskExecutor;
    }
}
In the future, please break this up into two independent questions. That being said, let me shed some light on both questions.
SimpleJobLauncher : No TaskExecutor has been set, defaulting to synchronous executor.
Your configuration is configuring myStep to use your TaskExecutor. What that does is cause Spring Batch to execute each chunk in its own thread (based on the parameters of the TaskExecutor). The log message you are seeing has nothing to do with that behavior. It has to do with launching your job. By default, the SimpleJobLauncher will launch the job on the same thread it is running on, thereby blocking that thread. You can inject a TaskExecutor into the SimpleJobLauncher, which will cause the job to be executed on a different thread from the JobLauncher itself (see the sketch below). These are two separate uses of multiple threads by the framework.
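For example, a minimal sketch of launching jobs asynchronously (the bean wiring is an assumption; SimpleJobLauncher is the Spring Batch 4.x launcher):

@Bean
public JobLauncher jobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    // launch jobs on a separate thread instead of blocking the calling thread
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}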
I don't want to run my Batch application with the metadata tables that spring batch creates
The short answer here is to just use an in-memory database like HSQLDB or H2 for your metadata tables. This provides a production-grade data store (so that concurrency is handled correctly) without actually persisting the data. If you use the ResourcelessTransactionManager, you are effectively turning transactions off (a bad idea if you're using a database in any capacity), because that TransactionManager doesn't actually do anything (it's a no-op implementation). A sketch of the embedded-database approach follows.
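A minimal sketch, assuming H2 is on the classpath (the metadata schema script ships inside spring-batch-core):

@Bean
public DataSource dataSource() {
    // embedded H2 database holding the batch metadata tables in memory only
    return new EmbeddedDatabaseBuilder()
            .setType(EmbeddedDatabaseType.H2)
            .addScript("/org/springframework/batch/core/schema-h2.sql")
            .build();
}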

Accessing Beans outside of the Step Scope in Spring Batch

Is it possible to access beans defined outside of the step scope? For example, if I define a strategy "strategyA" and pass it in the job parameters, I would like the @Value to resolve to the strategyA bean. Is this possible? I am currently working around the problem by getting the bean manually from the ApplicationContext.
@Bean
@StepScope
public Tasklet myTasklet(
        @Value("#{jobParameters['strategy']}") MyCustomClass myCustomStrategy) {
    MyTasklet myTasklet = new MyTasklet();
    myTasklet.setStrategy(myCustomStrategy);
    return myTasklet;
}
I would like to have the ability to add more strategies without having to modify the code.
The short answer is yes. This is a more general Spring/design-pattern issue rather than a Spring Batch one.
The tricky Spring Batch parts are the configuration and understanding the scope of bean creation.
Let's assume all your strategies implement a Strategy interface that looks like this:
interface Strategy {
    int execute(int a, int b);
}
Every strategy should implement Strategy and use the @Component annotation to allow automatic discovery of new strategies. Make sure each new strategy is placed under the correct package so that component scanning will find it.
For example:
@Component
public class StrategyA implements Strategy {

    @Override
    public int execute(int a, int b) {
        return a + b;
    }
}
The classes above are singletons and will be created on application context initialization.
This stage is too early to use @Value("#{jobParameters['strategy']}"), as the JobParameters haven't been created yet.
So I suggest a locator bean that will be used later, when myTasklet is created (step scope).
StrategyLocator class:
public class StrategyLocator {

    private Map<String, ? extends Strategy> strategyMap;

    public Strategy lookup(String strategy) {
        return strategyMap.get(strategy);
    }

    public void setStrategyMap(Map<String, ? extends Strategy> strategyMap) {
        this.strategyMap = strategyMap;
    }
}
Configuration will look like:
@Bean
@StepScope
public MyTasklet myTasklet() {
    MyTasklet myTasklet = new MyTasklet();
    // set the strategy locator
    myTasklet.setStrategyLocator(strategyLocator());
    return myTasklet;
}

@Bean
protected StrategyLocator strategyLocator() {
    return new StrategyLocator();
}
To initialize the StrategyLocator we need to make sure all strategies have already been created, so the best approach is to use an ApplicationListener on the ContextRefreshedEvent (warning: in this example strategy names start with a lower-case letter; changing this is easy...).
@Component
public class PlugableStrategyMapper implements ApplicationListener<ContextRefreshedEvent> {

    @Autowired
    private StrategyLocator strategyLocator;

    @Override
    public void onApplicationEvent(ContextRefreshedEvent contextRefreshedEvent) {
        ApplicationContext applicationContext = contextRefreshedEvent.getApplicationContext();
        Map<String, Strategy> beansOfTypeStrategy = applicationContext.getBeansOfType(Strategy.class);
        strategyLocator.setStrategyMap(beansOfTypeStrategy);
    }
}
The tasklet holds a String field that is injected with the strategy name using @Value, and the strategy is resolved through the locator in a "before step" listener.
public class MyTasklet implements Tasklet, StepExecutionListener {

    @Value("#{jobParameters['strategy']}")
    private String strategyName;

    private Strategy strategy;
    private StrategyLocator strategyLocator;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        strategy = strategyLocator.lookup(strategyName);
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return stepExecution.getExitStatus();
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        int executeStrategyResult = strategy.execute(1, 2);
        return RepeatStatus.FINISHED; // added so the tasklet signals completion
    }

    public void setStrategyLocator(StrategyLocator strategyLocator) {
        this.strategyLocator = strategyLocator;
    }
}
To attach the listener to the tasklet you need to set it in your step configuration:
@Bean
protected Step myTaskletStep() throws MalformedURLException {
    return steps.get("myTaskletStep")
            .transactionManager(transactionManager())
            .tasklet(myTasklet())
            .listener(myTasklet())
            .build();
}
jobParameters holds just a String object, not the real object (and I don't think it's good practice to store a bean definition in the parameters).
I'd move in this way:
@Component
@StepScope
class MyStrategyHolder {

    private MyCustomClass myStrategy;

    // add getter/setter

    @BeforeJob
    void beforeJob(JobExecution jobExecution) {
        // myStrategy = (bind the right strategy using the job parameter value);
    }
}
and register MyStrategyHolder as a listener.
In your tasklet, use @Value("#{@myStrategyHolder.myStrategy}") or access the MyStrategyHolder instance and call getMyStrategy().
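A sketch of that registration (assuming Spring Batch 4.1+, where the job builder accepts annotated listener objects; the job and step names are assumptions):

@Bean
public Job myJob(MyStrategyHolder strategyHolder) {
    return jobBuilderFactory.get("myJob")
            // the @BeforeJob method on MyStrategyHolder runs before the job starts
            .listener(strategyHolder)
            .start(myTaskletStep())
            .build();
}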
