I have the following requirement:
An endpoint http://localhost:8080/myapp/jobExecution/myJobName/execute which receives a CSV, uses univocity to apply some validations, and generates a List of some POJO.
Send that list to a Spring Batch Job for some processing.
Multiple users could do this.
I want to know if I can achieve this with Spring Batch.
I was thinking of using a queue: put the data on it and execute a Job that pulls objects from that queue. But how can I be sure that, if another person executes the endpoint while another Job is running, Spring Batch knows which item belongs to which execution?
You can use a queue, or you can take the list of values generated by the validation step and store it in the job execution context.
Below is a snippet that stores the list in the job execution context and reads it back using an ItemReader.
The first snippet implements StepExecutionListener in a tasklet step to put the constructed list into the context:
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
//tenantNames is a List<String> which was constructed as an output of an evaluation logic
stepExecution.getJobExecution().getExecutionContext().put("listOfTenants", tenantNames);
return ExitStatus.COMPLETED;
}
Now "listOfTenants" are read as part of a Step which has Reader (To allow one thread read at a time), Processor and Writer. You can also store it as a part of Queue and fetch it in a Reader. Snippet for reference,
public class ReaderStep implements ItemReader<String>, StepExecutionListener {

    private static final Logger logger = LoggerFactory.getLogger(ReaderStep.class);

    private List<String> tenantNames;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        try {
            tenantNames = (List<String>) stepExecution.getJobExecution().getExecutionContext()
                    .get("listOfTenants");
            logger.debug("Successfully fetched the tenant list from the context");
        } catch (Exception e) {
            // Exception block
        }
    }

    @Override
    public synchronized String read() throws Exception {
        if (!tenantNames.isEmpty()) {
            String tenantName = tenantNames.get(0);
            tenantNames.remove(0);
            return tenantName;
        }
        logger.info("Completed reading all tenant names");
        return null;
    }

    // Rest of the overridden methods of this class..
}
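For completeness, here is a rough sketch of how those two pieces could be wired into one job. The bean names (validationTasklet, tenantListStepListener, tenantWriter) are placeholders made up for illustration, not part of the original answer:
@Bean
public Job tenantJob(JobBuilderFactory jobs, StepBuilderFactory steps) {
    // Tasklet step that runs the validation logic; its listener's afterStep
    // (shown above) puts the resulting list into the job execution context.
    Step validationStep = steps.get("validationStep")
            .tasklet(validationTasklet())        // placeholder tasklet bean
            .listener(tenantListStepListener())  // listener containing the afterStep shown above
            .build();
    // Chunk step that drains the list through the synchronized ReaderStep.
    Step processingStep = steps.get("processingStep")
            .<String, String>chunk(10)
            .reader(new ReaderStep())
            .writer(tenantWriter())              // placeholder writer bean
            .build();
    return jobs.get("tenantJob")
            .start(validationStep)
            .next(processingStep)
            .build();
}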
Yes. Spring Boot would execute these jobs in different threads, so Spring Batch knows which items belong to which execution.
Note: you can also use a logging correlation id. This will help you filter the logs for a particular request. https://dzone.com/articles/correlation-id-for-logging-in-microservices
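To be explicit about how each request becomes its own execution: launch the job with unique JobParameters (for example a request id or timestamp), so every call gets a separate JobExecution and execution context. A minimal sketch, where the controller, the myJob bean and the parameter names are illustrative assumptions:
@RestController
@RequestMapping("/myapp/jobExecution")
public class JobExecutionController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job myJob; // the Spring Batch job configured elsewhere

    @PostMapping("/{jobName}/execute")
    public String execute(@PathVariable String jobName,
                          @RequestParam("file") MultipartFile file) throws Exception {
        // Unique parameters => a new JobExecution per request, so items from
        // different users never share an execution context.
        JobParameters params = new JobParametersBuilder()
                .addString("requestId", UUID.randomUUID().toString())
                .addLong("timestamp", System.currentTimeMillis())
                .toJobParameters();
        JobExecution execution = jobLauncher.run(myJob, params);
        return "Started execution " + execution.getId();
    }
}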
I have a partitioned Spring Batch job that reads several split-up CSV files, processes each in its own thread, then writes the results to a corresponding output file.
If an item fails to process though (an exception is thrown), I want to write that result to an error file. Is there a way to add a writer or listener that can handle this?
Taking this one step further, is there a way to split this up by exception type and write the different exceptions to different files?
You can achieve this by specifying a SkipPolicy. Implement this interface and add your own logic.
public class MySkipper implements SkipPolicy {

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof XYZException) {
            // doSomething
            return true;
        }
        // ... handle other exception types / skip counts here
        return false;
    }
}
You can specify this skip policy in your batch configuration:
this.stepBuilders.get("importStep").<X, Y>chunk(10)
        .reader(this.getItemReader()).faultTolerant().skipPolicy(new MySkipper())
        .processor(this.getItemProcessor())
        .writer(this.getItemWriter())
        .build();
One way that I have seen this done is through a combination of a SkipPolicy and a SkipListener.
The policy would allow you to skip over items that threw an exception, such as a FlatFileParseException (skippable exceptions can be configured).
The listener gives you access to the Throwable and the item that caused it (or just Throwable in the case of reads). The skip listener also lets you differentiate between skips in the read/processor/writer if you wanted to handle those separately.
public class ErrorWritingSkipListener<T, S> implements SkipListener<T, S> {
@Override
public void onSkipInRead(final Throwable t) {
// custom logic
}
@Override
public void onSkipInProcess(final T itemThatFailed, final Throwable t) {
// custom logic
}
@Override
public void onSkipInWrite(final S itemThatFailed, final Throwable t) {
// custom logic
}
}
I would recommend using the SkipPolicy only to identify the exceptions you want to write out to your various files, and leveraging the SkipListener to perform the actual file writing logic. That would match up nicely with their intended use as defined by their interfaces.
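As a rough sketch of that split by exception type (the file names, the FlatFileParseException check and the plain java.nio appends are illustrative assumptions, not a prescribed implementation):
public class ExceptionRoutingSkipListener<T, S> implements SkipListener<T, S> {

    // One target file per error category; real code would manage these as proper resources.
    private final Path parseErrors = Paths.get("parse-errors.txt");
    private final Path otherErrors = Paths.get("other-errors.txt");

    @Override
    public void onSkipInRead(Throwable t) {
        // Route read failures by exception type.
        Path target = (t instanceof FlatFileParseException) ? parseErrors : otherErrors;
        append(target, t.getMessage());
    }

    @Override
    public void onSkipInProcess(T item, Throwable t) {
        append(otherErrors, item + " failed in processing: " + t.getMessage());
    }

    @Override
    public void onSkipInWrite(S item, Throwable t) {
        append(otherErrors, item + " failed in writing: " + t.getMessage());
    }

    private void append(Path file, String line) {
        try {
            Files.write(file, Collections.singletonList(line),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}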
I have a requirement where I need to read values from an xls (which has a column called netCreditAmount) and save the values in a database. The requirement is to sum the netCreditAmount values of all the rows, set that sum in the database only for the first row of the xls, and insert the remaining rows with their corresponding netCreditAmounts.
How should I go ahead with the implementation in Spring Batch? A normal reader, processor and writer are working fine, but where exactly should I insert this logic?
Thanks!
You can solve this by adding an additional tasklet step.
The job flow can look like this:
@Bean
public Job myJob(JobBuilderFactory jobs) throws Exception {
    return jobs.get("myJob")
            .start(step1LoadAllData())          // this step loads all data into the database, except the sum for the first row of the xls
            .next(updateNetCreditAmountStep())  // this tasklet step updates the total sum in the first row; you can use a SQL sum for this
            .build();
}
The tasklet will be something like this:
@Component
public class UpdateNetCreditAmountTasklet implements Tasklet {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Override
    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext)
            throws Exception {
        Double sum = jdbcTemplate.queryForObject("select sum(netCreditAmount) from XYZ", Double.class);
        // now update this sum in the database for the first row
        return RepeatStatus.FINISHED;
    }
}
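The updateNetCreditAmountStep() referenced in the job definition above could then be wired roughly like this (a sketch; stepBuilderFactory and the tasklet are assumed to be autowired in the same configuration class):
@Bean
public Step updateNetCreditAmountStep() {
    // Single tasklet step that runs after the chunk-oriented load step.
    return stepBuilderFactory.get("updateNetCreditAmountStep")
            .tasklet(updateNetCreditAmountTasklet)
            .build();
}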
So what is the problem?
You need to set up your batch job step to use a reader, processor and writer.
Reader has interface:
public interface ItemReader<T> {
T read();
}
Processor:
public interface ItemProcessor<I, O> {
O process(I item);
}
So the type provided by the reader, T, must be the same type you pass into the processor as I:
stepBuilderFactory.get("myCoolStep")
.<I, O>chunk(1)
.reader(myReader)
.processor(myProcessor)
.writer(myWriter)
.build();
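For example, a small sketch where the reader provides Strings and the processor maps them to a hypothetical MyPojo (the POJO and the sample data are made up for illustration):
// Reader output type and processor input type are both String here.
ItemReader<String> myReader = new ListItemReader<>(Arrays.asList("a,1", "b,2"));

ItemProcessor<String, MyPojo> myProcessor = line -> {
    String[] parts = line.split(",");
    return new MyPojo(parts[0], Integer.parseInt(parts[1]));
};

ItemWriter<MyPojo> myWriter = items -> items.forEach(System.out::println);

Step myCoolStep = stepBuilderFactory.get("myCoolStep")
        .<String, MyPojo>chunk(1)
        .reader(myReader)
        .processor(myProcessor)
        .writer(myWriter)
        .build();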
I have a spring batch job that I'd like to do the following...
Step 1 -
Tasklet - Create a list of dates, store the list of dates in the job execution context.
Step 2 -
JDBC Item Reader - Get list of dates from job execution context.
Get element(0) of the dates list and use it as the input for the jdbc query.
Store the element(0) date in the job execution context.
Remove the element(0) date from the list of dates.
Flat File Item Writer - Get the element(0) date from the job execution context and use it for the file name.
Then, using a job listener, repeat step 2 until there are no remaining dates in the list of dates.
I've created the job and it works okay for the first execution of step 2. But step 2 is not repeating as I want it to. I know this because when I debug through my code it only breaks for the initial run of step 2.
It does however continue to give me messages like below as if it is running step 2 even when I know it is not.
2016-08-10 22:20:57.842 INFO 11784 --- [ main] o.s.batch.core.job.SimpleStepHandler : Duplicate step [readStgDbAndExportMasterListStep] detected in execution of job=[exportMasterListCsv]. If either step fails, both will be executed again on restart.
2016-08-10 22:20:57.846 INFO 11784 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [readStgDbAndExportMasterListStep]
This ends up in a never ending loop.
Could someone help me figure out, or give a suggestion as to, why my step 2 is only running once?
thanks in advance
I've added links to PasteBin for my code so as not to pollute this post.
http://pastebin.com/QhExNikm (Job Config)
http://pastebin.com/sscKKWRk (Common Job Config)
http://pastebin.com/Nn74zTpS (Step execution listener)
From your question and your code I deduce that, based on the number of dates that you retrieve (this happens before the actual job starts), you will execute a step once for each date.
I suggest a design change. Create a java class that will get you the dates as a list and based on that list you will dynamically create your steps. Something like this:
@EnableBatchProcessing
public class JobConfig {
@Autowired
private JobBuilderFactory jobBuilderFactory;
@Autowired
private StepBuilderFactory stepBuilderFactory;
@Autowired
private JobDatesCreator jobDatesCreator;
@Bean
public Job executeMyJob() {
List<Step> steps = new ArrayList<Step>();
for (String date : jobDatesCreator.getDates()) {
steps.add(createStep(date));
}
return jobBuilderFactory.get("executeMyJob")
.start(createParallelFlow(steps))
.end()
.build();
}
private Step createStep(String date){
return stepBuilderFactory.get("readStgDbAndExportMasterListStep" + date)
.chunk(your_chunksize)
.reader(your_reader)
.processor(your_processor)
.writer(your_writer)
.build();
}
private Flow createParallelFlow(List<Step> steps) {
SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
// max multithreading = -1, no multithreading = 1, smart size = steps.size()
taskExecutor.setConcurrencyLimit(1);
List<Flow> flows = steps.stream()
.map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
.collect(Collectors.toList());
return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
.split(taskExecutor)
.add(flows.toArray(new Flow[flows.size()]))
.build();
}
}
EDIT: added "jobParameter" input (slightly different approach also)
Somewhere on your classpath add the following example .properties file:
sql.statement="select * from awesome"
and add the following annotation to your JobDatesCreator class
#PropertySource("classpath:example.properties")
You can provide specific sql statements as a command line argument as well. From the Spring documentation:
you can launch with a specific command line switch (e.g. java -jar app.jar --name="Spring").
For more info on that see http://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html
The class that gets your dates (why use a tasklet for this?):
@PropertySource("classpath:example.properties")
public class JobDatesCreator {
@Value("${sql.statement}")
private String sqlStatement;
@Autowired
private CommonExportFromStagingDbJobConfig commonJobConfig;
private List<String> dates;
@PostConstruct
private void init(){
// Execute your logic here for getting the data you need.
JdbcTemplate jdbcTemplate = new JdbcTemplate(commonJobConfig.onlineStagingDb);
// access your sql statement provided in a property file or as a command line argument
System.out.println("This is the sql statement I provided in my external property: " + sqlStatement);
// for now..
dates = new ArrayList<>();
dates.add("date 1");
dates.add("date 2");
}
public List<String> getDates() {
return dates;
}
public void setDates(List<String> dates) {
this.dates = dates;
}
}
I also noticed that you have a lot of duplicate code that you can quite easily refactor. Right now, for each writer you have something like this:
@Bean
public FlatFileItemWriter<MasterList> division10MasterListFileWriter() {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource(new File(outDir, MerchHierarchyConstants.DIVISION_NO_10 )));
writer.setHeaderCallback(masterListFlatFileHeaderCallback());
writer.setLineAggregator(masterListFormatterLineAggregator());
return writer;
}
Consider using something like this instead:
public FlatFileItemWriter<MasterList> divisionMasterListFileWriter(String divisionNumber) {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource(new File(outDir, divisionNumber )));
writer.setHeaderCallback(masterListFlatFileHeaderCallback());
writer.setLineAggregator(masterListFormatterLineAggregator());
return writer;
}
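The individual writer beans then just delegate to it, for example (reusing the constant from the original snippet):
@Bean
public FlatFileItemWriter<MasterList> division10MasterListFileWriter() {
    // Same behavior as before, but the resource/header/aggregator setup lives in one place.
    return divisionMasterListFileWriter(MerchHierarchyConstants.DIVISION_NO_10);
}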
As not all code is available to correctly replicate your issue, this answer is a suggestion/indication to solve your problem.
Based on our discussion on Spring batch execute dynamically generated steps in a tasklet, I'm trying to answer the question of how to access jobParameters before the job is actually executed.
I assume that there is a REST call which will execute the batch job. In general, this will require the following steps to be taken:
1. a piece of code that receives the REST call with its parameters
2. creation of a new Spring context (there are ways to reuse an existing context and launch the job again, but there are some issues when it comes to reuse of steps, readers and writers)
3. launching the job
The simplest solution would be to store the job parameter received from the service as a system property and then access this property when you build up the job in step 3. But this could lead to a problem if more than one user starts the job at the same moment.
There are other ways to pass parameters into the Spring context when it is loaded, but that depends on the way you set up your context.
For instance, if you are using SpringBoot directly for step 2, you could write a method like:
private int startJob(Properties jobParamsAsProps) {
SpringApplication springApp = new SpringApplication(.. my config classes ..);
springApp.setDefaultProperties(jobParamsAsProps);
ConfigurableApplicationContext context = springApp.run();
ExitCodeGenerator exitCodeGen = context.getBean(ExitCodeGenerator.class);
int code = exitCodeGen.getExitCode();
context.close();
return code;
}
This way, you could access the properties as normal with the standard @Value or @ConfigurationProperties annotations.
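For instance, a configuration class loaded by that SpringApplication could pick up a parameter passed via setDefaultProperties like this (the property key input.file and the bean names are only illustrative):
@Configuration
@EnableBatchProcessing
public class DynamicJobConfig {

    // Populated from the Properties passed to springApp.setDefaultProperties(...)
    @Value("${input.file}")
    private String inputFile;

    @Bean
    public Job importJob(JobBuilderFactory jobs, Step importStep) {
        // inputFile is available here, i.e. while the job is being built, before it is launched
        return jobs.get("importJob").start(importStep).build();
    }
}
The caller would set jobParamsAsProps.setProperty("input.file", ...) before invoking startJob.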
In the olden days, we had ThreadLocal for programs to carry data along with the request path since all request processing was done on that thread and stuff like Logback used this with MDC.put("requestId", getNewRequestId());
Then Scala and functional programming came along, and with Futures came Local.scala (at least I know the Twitter Futures have this class). Future.scala knows about Local.scala and transfers the context through all the map/flatMap, etc. functionality, such that I can still do Local.set("requestId", getNewRequestId()); and then downstream, after it has travelled over many threads, I can still access it with Local.get(...).
So my question is: in Java, can I do the same thing with the new CompletableFuture, with some LocalContext or similar object (not sure of the name)? That way I could modify the Logback MDC context to store the request id in that context instead of in a ThreadLocal, so that I don't lose the request id and all my logs across thenApply, thenAccept, etc. still work fine with logging and the %X{requestId} conversion in the Logback configuration.
EDIT:
As an example: if you have a request come in and you are using Log4j or Logback, in a filter you will set MDC.put("requestId", requestId) and then in your app you will log many log statements like this:
log.info("request came in for url="+url);
log.info("request is complete");
Now, in the log output it will show this:
INFO {time}: requestId425 request came in for url=/mypath
INFO {time}: requestId425 request is complete
This uses a ThreadLocal trick to achieve it. At Twitter, we use Scala and Twitter Futures in Scala along with a Local.scala class. Local.scala and Future.scala are tied together such that we can still achieve the above scenario, which is very nice: all our log statements can log the request id, so the developer never has to remember to log it, and you can trace through a single customer's request/response cycle with that id.
I don't see this in Java :( which is very unfortunate as there are many use cases for that. Perhaps there is something I am not seeing though?
If you come across this, just poke the thread here
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-May/047867.html
to implement something like twitter Futures which transfer Locals (Much like ThreadLocal but transfers state).
See the def respond() method in here and how it calls Locals.save() and Locals.restore():
https://github.com/simonratner/twitter-util/blob/master/util-core/src/main/scala/com/twitter/util/Future.scala
If the Java authors fixed this, then the MDC in Logback would work across all 3rd-party libraries. Until then, IT WILL NOT WORK unless you can change the 3rd-party library (doubtful you can do that).
My solution theme would be the following (it works with JDK 9+, as a couple of overridable methods are exposed since that version):
Make the complete ecosystem aware of MDC
And for that, we need to address the following scenarios:
Where do we get new instances of CompletableFuture from within this class? → We need to return an MDC-aware version instead.
Where do we get new instances of CompletableFuture from outside this class? → We need to return an MDC-aware version instead.
Which executor is used, and when, inside CompletableFuture? → In all circumstances, we need to make sure that every executor is MDC-aware.
For that, let's create an MDC-aware version of CompletableFuture by extending it. My version looks like this:
import org.slf4j.MDC;
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.Function;
import java.util.function.Supplier;
public class MDCAwareCompletableFuture<T> extends CompletableFuture<T> {
public static final ExecutorService MDC_AWARE_ASYNC_POOL = new MDCAwareForkJoinPool();
@Override
public <U> CompletableFuture<U> newIncompleteFuture() {
return new MDCAwareCompletableFuture<>();
}
@Override
public Executor defaultExecutor() {
return MDC_AWARE_ASYNC_POOL;
}
public static <T> CompletionStage<T> getMDCAwareCompletionStage(CompletableFuture<T> future) {
return new MDCAwareCompletableFuture<>()
.completeAsync(() -> null)
.thenCombineAsync(future, (aVoid, value) -> value);
}
public static <T> CompletionStage<T> getMDCHandledCompletionStage(CompletableFuture<T> future,
Function<Throwable, T> throwableFunction) {
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return getMDCAwareCompletionStage(future)
.handle((value, throwable) -> {
setMDCContext(contextMap);
if (throwable != null) {
return throwableFunction.apply(throwable);
}
return value;
});
}
}
The MDCAwareForkJoinPool class would look like this (I have skipped the methods with ForkJoinTask parameters for simplicity):
public class MDCAwareForkJoinPool extends ForkJoinPool {
//Override constructors which you need
@Override
public <T> ForkJoinTask<T> submit(Callable<T> task) {
return super.submit(MDCUtility.wrapWithMdcContext(task));
}
@Override
public <T> ForkJoinTask<T> submit(Runnable task, T result) {
return super.submit(wrapWithMdcContext(task), result);
}
@Override
public ForkJoinTask<?> submit(Runnable task) {
return super.submit(wrapWithMdcContext(task));
}
@Override
public void execute(Runnable task) {
super.execute(wrapWithMdcContext(task));
}
}
The utility methods used for wrapping would be something like this:
public static <T> Callable<T> wrapWithMdcContext(Callable<T> task) {
//save the current MDC context
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
setMDCContext(contextMap);
try {
return task.call();
} finally {
// once the task is complete, clear MDC
MDC.clear();
}
};
}
public static Runnable wrapWithMdcContext(Runnable task) {
//save the current MDC context
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
setMDCContext(contextMap);
try {
task.run();
} finally {
// once the task is complete, clear MDC
MDC.clear();
}
};
}
public static void setMDCContext(Map<String, String> contextMap) {
MDC.clear();
if (contextMap != null) {
MDC.setContextMap(contextMap);
}
}
Below are some guidelines for usage:
Use the class MDCAwareCompletableFuture rather than the class CompletableFuture.
A couple of methods in the class CompletableFuture instantiate the class itself, such as new CompletableFuture.... For such methods (most of the public static methods), use an alternative way to get an instance of MDCAwareCompletableFuture. For example, rather than using CompletableFuture.supplyAsync(...), you can choose new MDCAwareCompletableFuture<>().completeAsync(...).
Convert an instance of CompletableFuture to MDCAwareCompletableFuture using the method getMDCAwareCompletionStage when you are stuck with one because, say, some external library returns you an instance of CompletableFuture. Obviously, you can't retain the context within that library, but this method will still retain the context once execution returns to your application code.
While supplying an executor as a parameter, make sure that it is MDC-aware, such as MDCAwareForkJoinPool. You could also create an MDCAwareThreadPoolExecutor by overriding its execute method to serve your use case. You get the idea!
You can find a detailed explanation of all of the above here in a post about the same.
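To illustrate how the pieces are meant to be combined, here is a small usage sketch; getNewRequestId, loadUser, enrich and externalCall are hypothetical methods used only for illustration:
// e.g. set in a web filter at the start of the request
MDC.put("requestId", getNewRequestId());

// Instead of CompletableFuture.supplyAsync(...), start from the MDC-aware version
// so dependent stages default to the MDC-aware pool.
CompletionStage<String> stage =
        new MDCAwareCompletableFuture<String>()
                .completeAsync(() -> loadUser())
                .thenApplyAsync(user -> enrich(user));

// Wrapping a CompletableFuture returned by a third-party library:
CompletionStage<String> wrapped =
        MDCAwareCompletableFuture.getMDCHandledCompletionStage(externalCall(), t -> "fallback");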
I'm developing a simple Spring MVC application to download tweets from the streaming API and show them on a webpage. Users of the application can submit a Task with the keywords of the tweets that they want to download. These tasks are shared, so anyone can start, stop, modify or cancel a task.
TwitterFetcher is the class responsible for downloading tweets. This class receives a Task and persists all downloaded tweets in a database.
@Service
public class TwitterFetcher {
@Autowired
private OAuthService oAuthService;
@Autowired
private TweetService tweetService;
private Task task;
private TwitterStream twitterStream;
public void start(Task task) {
/* Stop previous stream */
stop();
/* Get OAuth credentials */
OAuth oAuth = oAuthService.findOneEnabled();
if (oAuth == null) {
} else {
this.task = task;
Configuration oAuthConfiguration = getOAuthConfiguration(oAuth);
twitterStream = new TwitterStreamFactory(oAuthConfiguration).getInstance();
twitterStream.addListener(new TwitterListener());
String keywords = task.getBaseKeywords() + ", " + task.getExpandedKeywords();
FilterQuery filterQuery = new FilterQuery();
filterQuery.track(keywords.split(", "));
twitterStream.filter(filterQuery);
}
}
public void stop() {
if (twitterStream != null) {
twitterStream.shutdown();
}
}
private Configuration getOAuthConfiguration(OAuth oAuth) {
ConfigurationBuilder cb = new ConfigurationBuilder();
cb.setDebugEnabled(false);
cb.setJSONStoreEnabled(true);
cb.setOAuthAccessToken(oAuth.getAccessToken());
cb.setOAuthAccessTokenSecret(oAuth.getAccessTokenSecret());
cb.setOAuthConsumerKey(oAuth.getConsumerKey());
cb.setOAuthConsumerSecret(oAuth.getConsumerSecret());
return cb.build();
}
private class TwitterListener implements StatusListener {
@Override
public void onStatus(Status status) {
/* Persist new tweet */
Tweet tweet = new Tweet();
tweet.setJson(DataObjectFactory.getRawJSON(status));
tweetService.save(tweet);
}
[Omitted code]
}
}
The basic functionality would be the following:
A user starts the fetcher from the website.
The fetcher receives a new tweet and saves it in the DB.
The fetcher keeps receiving tweets until a user stops it.
The application has a dashboard to control the fetchers and the tasks, and users must be able to interact with it while the fetcher is downloading.
My question is: would the fetcher block the app, or will it be executed in a different thread? In the worst case, what do I have to change to solve this? I'm still far from a usable app so I can't test it; even so, I want to fix it now if possible.
You can use an ExecutorService to run the fetcher in a separate thread. I'd recommend using a thread pool so you don't hurt performance if too many users run the fetcher:
ExecutorService executor = Executors.newFixedThreadPool(maxThreads);
When a task is submitted through the executor, it returns a Future object from which you can check for completion:
Future f = executor.submit(myTask);
boolean isDone = f.isDone();
Please read more about Java concurrency if you're not familiar: http://docs.oracle.com/javase/tutorial/essential/concurrency/index.html
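Applied to the fetcher above, that could look roughly like this; the pool size and the FetcherRunner wrapper are assumptions, not part of the original code:
@Service
public class FetcherRunner {

    // Bounded pool so many concurrent users can't exhaust threads.
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    @Autowired
    private TwitterFetcher fetcher;

    public Future<?> run(Task task) {
        // The download work happens on a pool thread,
        // so the MVC request thread returns immediately.
        return executor.submit(() -> fetcher.start(task));
    }
}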
Annotate your start() method with @Async.
@Async
public void start(Task task)
This will make the start method asynchronous and will not block the application.
You can check out a simple example here.
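Note that @Async only takes effect when async support is enabled and the method is invoked through the Spring proxy (i.e. called from another bean). A minimal sketch of the required configuration:
@Configuration
@EnableAsync
public class AsyncConfig {
    // With this in place, calls to TwitterFetcher.start(...) from other beans are
    // dispatched to Spring's async task executor instead of running on the caller's thread.
}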