Currently I'm working on some integration tests for a Spring Batch application. Such application reads from a SQL table, writes on another table and, at the end, generates a report as a .txt file.
Initially I thought of just assuring that I had another file with the expected output and compare it with the report file and check the table content.
(For some context, I'm not very experienced on Spring).
But, after reading some articles on Baelung, I'm having doubts about my initial methodology.
Should I manipulate the table content on my code to assure that I have the expected input? Should I use the Spring Test framework tools? Without them, I'm able to run the job from my test?
The correct approach for batch job integration testing is to test the job as a black box. If the job reads data from a table and writes to another table or a file, you can proceed as follows:
Put some test data in the input table (Given)
Run your job (When)
Assert on the output table/file (Then)
You can find more details in the End-To-End Testing of Batch Jobs section of the reference documentation. Spring Batch provides some test utilities that might help in testing your jobs (like mocking batch domain objects, asserting on file content, etc). Please refer to the org.springframework.batch.test package.
Related
I believe I've got a scoping issue here.
Project explanation:
The goal is to process any incoming file (on disk), including meta data (which is stored in an SQL database). For this I have two tasklets (FileReservation and FileProcessorTask) which are the steps in the overarching "worker" jobs. They wait for an event to start their work. There are several threads dealing with jobs for concurrency. The FileReservation tasklet sends the fileId to FileProcessorTask using the job context.
A separate job (which runs indefinitely) checks for new file meta data records in the database and upon discovering new records "wakes up" the FileReservationTask tasklets using a published event.
With the current configuration the second step in a job can receive a null message when the FileReservation tasklets are awoken.
If you uncomment the code in BatchConfiguration you'll see that it works when we have separate instances of the beans.
Any pointers are greatly appreciated.
Thanks!
Polling a folder for new files is not suitable for a batch job. So using a Spring Batch job (filePollingJob) is not a good idea IMO.
Any pointers are greatly appreciated.
Polling a folder for new files and running a job for each incoming file is a common use case, which can be implemented using a java.nio.file.WatchService or a FileInboundChannelAdapter from Spring integration. See How do I kickoff a batch job when input file arrives? for more details.
If I understand it correctly normal way of spring batch testing is to basically run my application and let JobLauncherTestUtils run my normal jobs. However my application reads input from external service and writes it to my database. I don't want my tests to write to my production database and I'd like to specify test input to be read rather from the files I'd provide than from external service.
Can anyone direct me to some example how I could do it? I'd like to feed a job with a file then when job has finished check in the database that what I expect is there. I guess I could specify h2 db in application-test.properties but I have no clue about the input.
Docs from https://docs.spring.io/spring-batch/4.1.x/reference/html/testing.html#testing don't really cover it for me.
Are you reading input files from disk? If so you can edit the input file source directory only for tests to be within the src/test/resources/input_dir/your_test_file.xml for example.
If the input file directory is configured with properties, you could create properties file only for tests with something like classpath:input_dir/your_test_file.xml (which would be in your project as src/test/resources/input_dir/your_test_file.xml).
If the input file directory is configured within execution context you can provide that in the jobExecutionContext parameter of JobLauncherTestUtils.launchStep
We have talend-jobs triggered within Spring-boot application. Is there any way to configure the output of talend-jobs to the application log files?
One workaround we find is to write logs directly to an external file (filePath passed as context-param). But wanted to find if there is a better way to configure this seamlessly.
Not sure if I understood the question correctly, but I guess your concerns might be on what might have happened to the triggered Jobs.
Logging
With Respect to Logging for Talend, You could configure using Log4j,
https://help.talend.com/reader/5DC~TBhDsBie5JTXyVLW4g/QSGCZJKXo~uhKvZDq1DxUg
Monitoring
Regarding the Status of the Job Executed, you could get the execution details retrieved using REST Call(Talend Metaservlet API).
getTaskExecutionStatus
https://help.talend.com/reader/oYf9gKhmYrkWCiSua4qLeg/SLiAyHyDTjuznLR_F~MiQQ
By Modifying the Existing Talend Job,You could also design a like a feedback loop, ie Trigger a REST Call back to your application. With the details of Execution from Talend Job.
I have a keyword driven framework in UFT. List of scripts listed in an excel file. Driver script simply executes one script at a time. After the test run, I get this executive summary from UFT.
How can I customize the executive summary so I can see the more information on the summary like steps details, which steps passed, failed, etc.? Also how can I print the executive summary?
I have a keyword driven framework in UFT. List of scripts listed in an excel file.
Most likely, you are using only one Action in your test suite, you are seeing only 1 test execution report. This is a general drawback of having Excel driven keyword framework. What you need to do is to create your own custom test script start/stop markers and then generate a report (preferably by using a custom XSL).
Sample test result
Notice that in the sample result, the tests are anything which starts with ▬▬ UC_. I've further divided each tests into various parts grouped under ●●● TD_. You then have to read the UFT generated results.xml and convert it into meaningful result. For example, I converted it into an Excel file.
I read a lot about how to enable parallel processing and chunking of an individual job, using Master/Slave paradigm. Consider an already implemented Spring Batch solution that was intended to run on a standalone server. With minimal refactoring I would like to enable this to horizontally scale and be more resilient in production operation. Speed and efficiency is not a goal.
http://www.mkyong.com/spring-batch/spring-batch-hello-world-example/
In the following example a Job Repository is used that connects to an initializes a database schema for the Job Repository. Job initiation requests are fed to a message queue, that a single server, with a single Java process is listening on via Spring JMS. When encountering this it executes a new Java process that is the Spring Batch job. If the job has not been started according to the Job Repository it will begin. If the job had failed it will pick up where the job left off. If the job is in process it will ignore.
The single point of failure is the single server and single listening process for job initiation. I would like to increase resiliency by horizontally scaling identical server instances all competing for who can first grab the job initiation message when it first appears in the queue. That server instance will now attempt to run the job.
I was conceiving that all instances of the JobRepository would share the same schema, so they can all query for when the status is currently in process and decide what they will do. I am unsure though if this schema or JobRepository implementation is meant to be utilized by multiple instances.
Is there a risk in pursuing this that this approach could result in deadlocking the database? There are other constraints to where the Partition features of Spring Batch will not work for my application.
I decided to build a prototype to test if the condition that the Spring Batch Job Repository schema and SimpleJobRepository can be used in a load balanced way with multiple Spring Batch Java processes running concurrently. I was afraid that deadlock scenarios might have occurred at the database to where all running job processes get stuck.
My Test
I started with the mkyong Spring Batch HelloWorld example and made some changes to it where it could be packaged into a Jar that can be executed from the command line. I also removed the initialize database step defined in the database.config file and manually established a local MySQL server with the proper schema elements. I added a Job parameter for time to be the current time in millis so that each job instance would be unique.
Next, I wrote a separate Java main class that used Apache Commons Exec framework to create 50 sub processes with no wait between them. Each of these processes have a Thread.sleep for 1 second within their Processor objects as well so that a number of processes will all kick off at the same time and all attempt to access the database at the same time.
Results
After running this test a number of times in a row I see that all 50 Spring batch processes consistently complete successfully and update the same database schema correctly. I don't see any indication that if there were multiple Spring Batch job processes running on multiple servers connecting to the same database that they would interfere with each other on the schema nor do I see any indication that a deadlock could happen at this time.
So it sounds as if load balancing of Spring Batch jobs without the use of advanced Master/Slave and Step Partitioning approaches is a valid use case.
If anybody would like to comment on my test or suggest ways to improve it I would appreciate it.
Here is excerpt from
Spring Batch docs on how Spring Batch handles database updates for its repository:
Spring Batch employs an optimistic locking strategy when dealing with updates to the database. This means that each time a record is 'touched' (updated) the value in the version column is incremented by one. When the repository goes back to save the value, if the version number has changed it throws an OptimisticLockingFailureException, indicating there has been an error with concurrent access. This check is necessary, since, even though different batch jobs may be running in different machines, they all use the same database tables.