I have used Spring Batch to read data from one table and write it to another table. My requirement now is to read data from multiple tables and write it to a file. This could be achieved by defining multiple jobs, but I want to do it with a single job, meaning a single reader, a single processor and a single writer.
Please provide me some references for this scenario.
This is not possible with the classes Spring Batch provides out of the box, but there is a way around it.
Just before the chunk processing, add one step with a custom tasklet that assigns a different SQL statement to the reader and a different output file to the writer, and run these steps in a loop as long as there are SQL statements left to execute.
It might sound difficult, but I have worked on the same situation. Here is an idea of how you can do it:
<flow id="databaseReadWriteJob">
<step id="step1_setReaderWriter">
<tasklet ref="setReaderWriter" />
<next on="FAILED" to="" />
<next on="*" to="dataExtractionStep" />
</step>
<step id="dataExtractionStep">
<tasklet>
<chunk reader="dbReader" writer="flatFileWriter" commit-interval="${commit-interval}" />
</tasklet>
<next on="FAILED" to="" />
<next on="*" to="step3_removeProcessedSql" />
</step>
<step id="step3_removeProcessedSql">
<tasklet ref="removeProcessedSql" />
<next on="NOOP" to="step1_setReaderWriter" />
<next on="*" to="step4_validateNumOfSuccessfulSteps" />
</step>
</flow>
And here is the bean for setReaderWriter:
<beans:bean id="setReaderWriter" class="SetReaderWriter">
<beans:property name="reader" ref="dbReader" />
<beans:property name="flatFileWriter" ref="flatFileWriter" />
<beans:property name="fileSqlMap" ref="jobSqlFileMap" />
<beans:property name="fileNameBuilder" ref="localFileNameBuilder" />
<beans:property name="sourceFolder" value="${dataDir}" />
<beans:property name="dateDiff" value="${dateDiff}" />
</beans:bean>
In this tasklet you can set anything you need to change dynamically on the reader or the writer. The map referenced above (jobSqlFileMap) has the SQL as its key and the output file as its value.
I hope this helps.
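As an illustration of the map described above, the jobSqlFileMap bean could be declared along these lines. This is only a sketch: the two SQL statements and the file names are made-up placeholders, and it assumes the util namespace is declared on the root element next to the batch and beans namespaces used in the snippets above.

```xml
<beans:beans xmlns="http://www.springframework.org/schema/batch"
             xmlns:beans="http://www.springframework.org/schema/beans"
             xmlns:util="http://www.springframework.org/schema/util"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="
               http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
               http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
               http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util.xsd">

    <!-- Hypothetical entries: key = SQL to run, value = output file for that SQL -->
    <util:map id="jobSqlFileMap">
        <beans:entry key="SELECT id, name FROM customer" value="customer.csv" />
        <beans:entry key="SELECT id, total FROM orders" value="orders.csv" />
    </util:map>

</beans:beans>
```

With two entries like these, the flow above would presumably run the extraction step twice, once per entry, provided removeProcessedSql reports NOOP while entries remain.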
I searched for a solution to my problem but did not find one, although I may have missed it. Please redirect me if a solution page already exists.
Input to the batch: I have TRADES.csv, PORTFOLIO.csv files which are located at different paths.
How I have implemented it currently:
I wrote a class CSVReader that has two FlatFileItemReaders, defined as below.
<beans:bean id="porfolioReader"
class="org.springframework.batch.item.file.FlatFileItemReader">
<beans:property name="lineMapper" ref="lineMapperForPortfolio"></beans:property>
<beans:property name="strict" value="false"></beans:property>
<beans:property name="recordSeparatorPolicy"
ref="csvRecordSeparatorPolicy"></beans:property>
<beans:property name="linesToSkip" value="1"></beans:property>
<beans:property name="encoding" value="ISO-8859-1"></beans:property>
</beans:bean>
<beans:bean id="tradeReader"
class="org.springframework.batch.item.file.FlatFileItemReader">
<beans:property name="lineMapper" ref="lineMapperForTrades"></beans:property>
<beans:property name="strict" value="false"></beans:property>
<beans:property name="recordSeparatorPolicy"
ref="csvRecordSeparatorPolicy"></beans:property>
<beans:property name="linesToSkip" value="1"></beans:property>
<beans:property name="encoding" value="ISO-8859-1"></beans:property>
</beans:bean>
Based on the input file path, I open the corresponding reader and build the FieldSet.
Case I am looking for:
I would like to read these CSV files in a single step and build a FieldSet, so that in the processor I can split the data into lists of TRADES and PORTFOLIO objects. Is there a way to make the FlatFileItemReader detect which CSV was picked up and choose the corresponding line mapper?
Job Definition:
<batch:step id="tradeStep1" allow-start-if-complete="true">
<batch:tasklet>
<batch:chunk reader="csvReader"
processor="csvProcessor" writer="csvWriter"
commit-interval="1" />
</batch:tasklet>
<batch:next on="*" to="tradeStep2" />
<batch:fail on="FAILED" />
</batch:step>
tradeStep2 will archive processed CSV files.
I have a simple batch process with a skip limit set. When the skip limit is surpassed the job fails and it never gets to step two. I would like the process to go to step 3 if the skip limit has passed.
<job id="jobA" incrementer="runIdIncrementer" >
<step id="step1" next="step2">
<tasklet>
<chunk commit-interval="10" reader="dReader" writer="dWriter" skip-limit="100">
<skippable-exception-classes>
<include class="java.lang.Exception"/>
</skippable-exception-classes>
</chunk>
<listeners>
<listener ref="skipListener"/>
</listeners>
</tasklet>
</step>
<step id="step2" next="step3">
<tasklet>
<chunk commit-interval="10" reader="sReader" writer="sWriter"/>
</tasklet>
</step>
<step id="step3">
<tasklet ref="cleanUpStep"/>
</step>
</job>
Is there a way to do this? I have tried setting "next", but an error is thrown stating that a step can't have both a next attribute and a transition element.
Any help would be great.
You could add a StepExecutionListener to your step. The afterStep(StepExecution stepExecution) method will be executed even if the step failed. In this method, you can get the exit status of the step and change it: stepExecution.setExitStatus(ExitStatus.COMPLETED).
You might want to check whether the failure comes from the skip limit being exceeded, for example by searching stepExecution.getFailureExceptions() for a SkipLimitExceededException (or something like that). You could also get the number of skipped items and compare it with your maximum. (However, if an Error occurs on the 100th skip, maybe you should do something else.)
Note: Skipping a step after too many exceptions doesn't sound like good design, but as long as you're aware of what you're doing, it can work.
I have managed to fix it using a "next" transition element. I was adding the transition element while also having next="step2" set in the step declaration. The fix was to remove the attribute from the step declaration; the next transition element can then be added.
<step id="step1">
<tasklet>
<chunk commit-interval="10" reader="dReader" writer="dWriter" skip-limit="100">
<skippable-exception-classes>
<include class="java.lang.Exception"/>
</skippable-exception-classes>
</chunk>
<listeners>
<listener ref="skipListener"/>
</listeners>
</tasklet>
<next on="*" to="step2" />
</step>
The above code will continue to step2 even if the skip limit has been reached. <step id="step1" next="step2"> would cause the job to fail if the limit was reached.
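The original question asked to jump to step3 once the skip limit is exceeded. Building on the transition element above, a more specific transition could be added for that case. This is an untested sketch that reuses the bean names from the question; it assumes that a step which exceeds its skip limit ends with the exit status FAILED, and note that it would route any other failure of step1 to step3 as well.

```xml
<job id="jobA" incrementer="runIdIncrementer"
     xmlns="http://www.springframework.org/schema/batch">
    <step id="step1">
        <tasklet>
            <chunk commit-interval="10" reader="dReader" writer="dWriter" skip-limit="100">
                <skippable-exception-classes>
                    <include class="java.lang.Exception"/>
                </skippable-exception-classes>
            </chunk>
            <listeners>
                <listener ref="skipListener"/>
            </listeners>
        </tasklet>
        <!-- Step failed (for example, skip limit exceeded): go straight to the clean-up step -->
        <next on="FAILED" to="step3"/>
        <!-- Any other exit status: continue with the normal flow -->
        <next on="*" to="step2"/>
    </step>
    <step id="step2" next="step3">
        <tasklet>
            <chunk commit-interval="10" reader="sReader" writer="sWriter"/>
        </tasklet>
    </step>
    <step id="step3">
        <tasklet ref="cleanUpStep"/>
    </step>
</job>
```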
UPDATE:
I am adding some details because it is very important for me to solve this problem.
I made a batch which generates PDF documents from data present in some tables and saves the PDFs in a table. The batch works, but the amount of data to process is huge, so I decided to divide the input data into 8 groups and process the 8 groups independently with 8 parallel steps.
Each step has its own reader (named "readerX" for step "X") and shares the same processor and writer with the other steps.
Processing goes well, but my client says that this batch uses too much memory (he looks at the "Working Set" parameter in perfmon). In particular, the batch begins with 300MB of used memory, then the used memory reaches 7GB, then decreases to 2GB, and the batch finishes with 1/2GB of allocated memory.
I paste the code of the job here, hoping someone can help me find the problem (I guess I made some mistake in adapting the job to parallel processing).
I'm new to Spring Batch, so I apologize for the "bad look".
<job id="myJob"
xmlns="http://www.springframework.org/schema/batch">
<step id="step1" next="step2">
<tasklet ref="task1" />
</step>
<step id="step2" next="step3">
<tasklet ref="task2" />
</step>
<step id="step3" next="decider">
<tasklet ref="task3" />
</step>
<decision id="decider" decider="StepExecutionDecider">
<next on="CASE X" to="split1" />
<end on="*"/>
</decision>
<split id="split1" task-executor="taskExecutor" next="endStep">
<flow>
<step id="EXEC1">
<tasklet><chunk reader="reader1" processor="processor" writer="writer" commit-interval="100"/>
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</flow>
<flow>
<step id="EXEC2">
<tasklet><chunk reader="reader2" processor="processor" writer="writer" commit-interval="100"/>
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</flow>
<flow>
<step id="EXEC3">
<tasklet><chunk reader="reader3" processor="processor" writer="writer" commit-interval="100"/>
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</flow>
<flow>
<step id="EXEC4">
<tasklet><chunk reader="reader4" processor="processor" writer="writer" commit-interval="100"/>
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</flow>
<flow>
<step id="EXEC5">
<tasklet><chunk reader="reader5" processor="processor" writer="writer" commit-interval="100"/>
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</flow>
<flow>
<step id="EXEC6">
<tasklet><chunk reader="reader6" processor="processor" writer="writer" commit-interval="100"/>
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</flow>
<flow>
<step id="EXEC7">
<tasklet><chunk reader="reader7" processor="processor" writer="writer" commit-interval="100"/>
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</flow>
<flow>
<step id="EXEC8">
<tasklet><chunk reader="reader8" processor="processor" writer="writer" commit-interval="100"/>
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</flow>
</split>
<step id="endStep" next="decider">
<tasklet ref="task4" >
<listeners>
<listener ref="Listner" />
</listeners>
</tasklet>
</step>
</job>
<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
<bean id="reader1" class="class of the reader">
<property name="idReader" value="1"/> <!-- Different for the 8 readers -->
<property name="subSet" value="10"/> <!-- Different for the 8 readers -->
<property name="dao" ref="Dao" />
<property name="bean" ref="Bean" />
<!-- [...] Other beans -->
</bean>
Thanks
If you're getting an OOM eventually, start by looking at the heap.
Start the JVM with -XX:+HeapDumpOnOutOfMemoryError to obtain the HPROF file, which you can then inspect to see object allocation, sizes etc. When the JVM exits with an OOM, this file will be generated (this may take some time depending on its size).
If you're able to run with a larger memory footprint, such as on your client's machine, take a snapshot of the heap when it is consuming a large amount of memory, such as the 7GB you mentioned (or any other value considered high: 4, 5, 6 etc). You should be able to trigger this while the job is running via tools such as jconsole that come with the JDK.
You can then inspect the HPROF file with JDK-provided tools such as jhat or a more GUI-based tool such as the Eclipse Memory Analyzer. This should give you a good (and relatively easy) way of finding out what's holding on to what, and provide a starting point for decreasing the footprint.
Using a profiler and optimizing the code, I successfully limited memory consumption. Thanks to all!!!
The batch is ok but the data to process is huge, so I decided to divide input data in 8 groups and process independently the 8 groups with 8 parallel steps.
If you are processing in parallel on the same machine, it won't reduce the memory footprint. All the data exists in memory at the same time. If you want to decrease memory use, you have to execute the steps one after the other.
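One way to try the sequential variant without rewriting the split is to throttle the task executor that runs the flows. The sketch below is not taken from the answers above: it relies on the concurrencyLimit property of SimpleAsyncTaskExecutor, and the value 1 is chosen so that only one flow runs at a time. Whether this actually lowers the peak depends on what each step keeps in memory, so it is best checked with the heap tools mentioned earlier.

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Same bean id as in the question, so <split task-executor="taskExecutor"> needs no change -->
    <bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor">
        <!-- 1 = run the eight flows one at a time; a value such as 2 or 4 would allow limited parallelism -->
        <property name="concurrencyLimit" value="1"/>
    </bean>

</beans>
```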
I'm using Spring Batch for the first time. I tried some examples and read through the documentation, but I still have questions:
Can I skip one phase in chunk-oriented processing? For example: I fetch data from the database, process it and determine that I need more. Can I skip the write phase and execute the next step's read phase? Should I use a Tasklet instead?
How to implement a conditional flow?
Thank you very much,
Florian
You can skip items simply by throwing an exception that has been declared as a "skippable exception". You can do it as follows:
<step id="step1">
<tasklet>
<chunk reader="reader" writer="writer"
commit-interval="10" skip-limit="10">
<skippable-exception-classes>
<include class="com.myapp.batch.MyException"/>
</skippable-exception-classes>
</chunk>
</tasklet>
</step>
Conditional flow can easily be implemented by deciding on the ExitStatus of a step execution:
<job id="job">
<step id="step1" parent="s1">
<next on="*" to="stepB" />
<next on="FAILED" to="stepC" />
</step>
<step id="stepB" parent="s2" next="stepC" />
<step id="stepC" parent="s3" />
</job>
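For the "I need more data" case from the question, the same transition elements can also hang off a decision element instead of a step. The following is a hedged sketch only: myDecider is a hypothetical bean that would implement JobExecutionDecider, and MORE_DATA is a made-up status that this decider is assumed to return; neither comes from the answer above.

```xml
<job id="job" xmlns="http://www.springframework.org/schema/batch">
    <step id="step1" parent="s1" next="decision" />
    <!-- myDecider: hypothetical JobExecutionDecider bean, defined elsewhere -->
    <decision id="decision" decider="myDecider">
        <!-- MORE_DATA: hypothetical status returned when another read is needed -->
        <next on="MORE_DATA" to="stepB" />
        <!-- Anything else: finish the job -->
        <end on="*" />
    </decision>
    <step id="stepB" parent="s2" />
</job>
```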
Read the documentation to gain deeper knowledge on these topics: http://docs.spring.io/spring-batch/reference/html/configureStep.html
I have a Spring Batch job configured to run asynchronously (it is started from a web service and uses annotations to configure the methods to be asynchronous). My first step runs successfully.
My issue is that I have multiple steps configured and the flow is determined by the status of each step, i.e. on completion it moves to step 2, but on a failure it moves to a failure-handling step which sends a mail. When I remove the annotations, the flow appears to work as expected. However, when I use the annotations to run the job asynchronously, whichever step is configured to execute on completion gets executed.
Flow configuration sample:
<batch:job id="batchJob" restartable="true">
<batch:step id="step1">
<batch:tasklet ref="task1">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:next on="HAS_ERRORS" to="reportError" />
<batch:end on="*" />
<batch:next on="COMPLETED" to="step2" />
</batch:step>
<batch:step id="step2">
<batch:tasklet ref="task2">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:next on="HAS_ERRORS" to="reportError" />
<batch:end on="*" />
</batch:step>
<batch:step id="reportError">
<batch:tasklet ref="failError">
<batch:listeners>
<batch:listener ref="failureHandler" />
</batch:listeners>
</batch:tasklet>
<batch:end on="*" />
</batch:step>
</batch:job>
I have attempted to return an ExitStatus and a BatchStatus, but both were ignored.
I have implemented a step execution listener, but I have not yet implemented a messaging mechanism to communicate across steps, and I do not see anything in the step execution context that gives me an indication of the outcome of the step.
My question is whether there is a method or mechanism I may have overlooked to get the status of a step once it has completed, or whether a messaging mechanism outside of the batch process is an accepted way of proceeding.
It feels wrong that I cannot see the status of the batch step once it has completed when it runs asynchronously (I get the expected results and failures when I remove the @Async annotation). I think I may be missing something in my understanding. I have spent some time looking into it, so a pointer in the right direction would be appreciated.
I do not have access to this particular code any more.
I believe the issue is caused by the annotations overriding the XML configuration which defined the expected flow.
By overriding this we change the actual flow which we expected.