Accessing StepExecutionContext from Item Writer in Remote Chunk Processor - spring

I've implemented a simple Spring Batch solution that reads data from an XML file and writes it to a database. The solution works well, but due to scalability requirements I'm converting this simple implementation to a more scalable one that uses remote chunking. I've run into an issue attempting to access the JobExecutionContext from the remote ItemWriter. My ItemWriter is shown below (non-essentials removed for brevity) and works fine in the non-remote-chunking implementation.
// ItemWriter fields and methods (non-essentials removed for brevity);
// beforeStep is the StepExecutionListener callback
public Object myObject;

public void write(List<? extends Employee> items) throws Exception
{
    // do stuff
    // use myObject here
}

public void beforeStep(StepExecution stepExecution_p)
{
    myObject = stepExecution_p.getJobExecution().getExecutionContext().get("myObject");
}
However, when I run the remote chunking implementation, the beforeStep method above isn't invoked, so I don't have a handle on myObject when the write method is later invoked. I understand that beforeStep is not invoked before the step begins because the ItemReader and ItemWriter are distributed and running in separate JVMs, so there is no way for beforeStep to be invoked on the remote ItemWriter.
I need to find a way to get a handle on the stepExecutionContext in my remote ItemWriter so that I can get myObject from that context. Is there a way to get hold of the stepExecutionContext in the remote ItemWriter, or any other way of passing useful data into the remote ItemWriter? I looked at late binding into the remote ItemWriter, but apparently step scoping of remote components (the ItemWriter in this case) is not supported:
<bean id="itemWriterSlave" class="com.....ItemWriterSlave" scope="step">
<property name="myObject" value="#{stepExecutionContext[myObject]}" />
</bean>
Does anyone know how I can access data passed from a previous step (via the stepExecutionContext) in a remote ItemWriter? Any suggestions would be greatly appreciated.

Spring Batch provides two different ways to do multi-JVM scaling: remote partitioning and remote chunking. With remote partitioning, while the slave steps are not part of the job that contains the master step, they are still true steps that have all the rights and responsibilities of a Spring Batch step (access to the job repository, the slaves get their own StepExecutions, etc.).
However, remote chunking is different. The slaves are not actually true steps; they are really just remotely deployed components used by the master. Because of that, they don't have StepExecutions or the related ExecutionContexts. The best way to pass data from previous steps to the slaves in remote chunking is to persist it somewhere outside of the job.

Related

Need some guidance with Spring Integration Flow

I am new to Spring Integration and have read quite a lot of documentation and other topics here on StackOverflow, but I am still a bit overwhelmed about how to apply the newly acquired knowledge in a Spring Boot application.
This is what should happen:
1. receive a message from a Kafka topic, e.g. from "request-topic" (the payload is a custom Job POJO). InboundChannelAdapter?
2. do some preparation (checkout from a git repo)
3. process files using a batch job
4. commit & push to git, update the Job object with the commit-id
5. publish a message to Kafka with the updated Job object, e.g. to "reply-topic". OutboundChannelAdapter?
Using the DSL or plain Java configuration does not matter. My problem, after trying several variants, is that I could not achieve the desired result: for example, handlers would be called too early, or not at all, and thus the reply in step 5 would not be updated.
Also, there should only be one flow running at any given time, so I guess a queue should be involved at some point, probably at step 1(?).
Where and when should I use QueueChannels, DirectChannels (or any other), and do I need GatewayHandlers, e.g. to reply with a commit-id?
Any hints are appreciated.
Something like this:
@Bean
IntegrationFlow flow() {
    return IntegrationFlows.from(Kafka.inboundGateway(...))
            .handle(...)      // prep
            .transform(...)   // to JobLaunchRequest
            .handle(...)      // JobLaunchingGateway
            .handle(...)      // clean up and return result
            .get();
}
It will only process one request at a time (with default concurrency).
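To make that skeleton more concrete, here is a fuller sketch under a few assumptions: Spring Integration 5.x with the spring-integration-kafka and spring-batch-integration modules on the classpath, a message-driven channel adapter plus an outbound channel adapter instead of the inbound gateway above, and placeholder types MyJob (the custom payload POJO) and GitService for the git work. None of these names come from the question.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.integration.launch.JobLaunchRequest;
import org.springframework.batch.integration.launch.JobLaunchingGateway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.kafka.dsl.Kafka;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.KafkaTemplate;

@Configuration
public class JobFlowConfig {

    // hypothetical request payload and git helper, only here to make the sketch compile
    public static class MyJob { private String id; public String getId() { return id; } }
    public interface GitService { MyJob checkout(MyJob job); MyJob commitAndPush(JobExecution execution); }

    @Bean
    IntegrationFlow jobFlow(ConsumerFactory<String, MyJob> consumerFactory,
                            KafkaTemplate<String, MyJob> kafkaTemplate,
                            JobLauncher jobLauncher,
                            Job processFilesJob,
                            GitService gitService) {
        return IntegrationFlows
                // 1. receive the request from Kafka (a single consumer keeps one flow running at a time)
                .from(Kafka.messageDrivenChannelAdapter(consumerFactory, "request-topic"))
                // 2. preparation, e.g. check the repository out of git
                .handle(MyJob.class, (job, headers) -> gitService.checkout(job))
                // 3. turn the payload into a JobLaunchRequest for the batch job
                .transform(MyJob.class, job -> new JobLaunchRequest(
                        processFilesJob,
                        new JobParametersBuilder()
                                .addString("jobId", job.getId())
                                .addLong("started.at", System.currentTimeMillis())
                                .toJobParameters()))
                // 4. launch the batch job; the reply payload is the JobExecution
                .handle(new JobLaunchingGateway(jobLauncher))
                // 5. commit & push and return the updated MyJob carrying the commit id
                .handle(JobExecution.class, (execution, headers) -> gitService.commitAndPush(execution))
                // 6. publish the updated MyJob to the reply topic
                .handle(Kafka.outboundChannelAdapter(kafkaTemplate).topic("reply-topic"))
                .get();
    }
}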

Spring Batch Step Integration Testing

I'm looking for some general opinions and advice on testing a Spring batch step and step execution.
My basic step reads from an API, processes into an entity object and then writes to a DB. I have tested the happy path, where the step completes successfully. What I now want to do is test the exception handling when data is missing at the processor stage. I could test the processor class in isolation, but I'd rather test the step as a whole to ensure the processing failure is reflected correctly at step/job level.
I've read the Spring Batch testing guidelines and, if I'm honest, I'm slightly lost in them. Is it possible to use StepScopeTestUtils.doInStepScope or to update the StepExecution to test this scenario? Ideally I'd force the reader to return faulty data before the processor kicks in.
Any advice would be greatly appreciated.
The best approach depends on the scope of your test. Reading a little between the lines here, I assume you are using a Spring integration test, setting up a Spring context and using JobLauncherTestUtils to start a job or a step.
I think the easiest way is to replace one of your beans with a mock that triggers the error scenario. Using Mockito, this can be done by adding something like this to your test configuration:
@Bean
public ReaderDataRepository dataApi() {
    return mock(ReaderDataRepository.class);
}
This bean then overrides the actual implementation. In the test setup you can then configure this mock very explicitly.
@Autowired
private ReaderDataRepository mockedRepository;

@Before
public void setUp() {
    when(mockedRepository.getData()).thenReturn(faultyData());
}
This involves very little manipulation of Spring 'magic' and very explicitly defines the error from within the test.
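With that mock in place, you can launch just the step and assert that the processing failure surfaces at the step/job level. A sketch, assuming a JobLauncherTestUtils bean and a step named "myStep" (the step name and the test-context set-up are placeholders; faultyData() is the helper from the setup above):

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.when;

import org.junit.Test;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.beans.factory.annotation.Autowired;

// Spring test context annotations (e.g. @RunWith(SpringRunner.class) plus your batch test configuration) omitted
public class ReadProcessWriteStepIT {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Autowired
    private ReaderDataRepository mockedRepository;

    @Test
    public void stepFailsWhenProcessorReceivesFaultyData() {
        // the mocked repository feeds data the processor cannot handle
        when(mockedRepository.getData()).thenReturn(faultyData());

        // "myStep" is a placeholder for the real step name
        JobExecution jobExecution = jobLauncherTestUtils.launchStep("myStep");

        // the processing failure should be reflected at step/job level
        assertEquals(BatchStatus.FAILED, jobExecution.getStatus());
    }
}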

Where should I store thread-dependent data within a receiving RabbitListener component in a multithreaded environment?

I'm using the annotation-based approach of Spring AMQP in a multithreaded environment (I have multiple consumers => multiple rabbit listener containers).
@RabbitListener(queues = "tasks")
public void receiveMessage(@Payload Task task) {
    // usage of httpClient here with its own httpContext (would be fine)
    // this method gets called from different listenerContainers / threads
}
My component, which contains the annotated receiveMessage() method, needs to make some HTTP calls with the Apache HttpClient. Since I'm working with multiple consumers at the same time, this method gets called from different threads, and the Apache HttpClient documentation says I should create an HttpContext per thread to be thread-safe. Since all threads call the same component method, I can't simply put the HttpContext into the component.
Is there something like a per-listener-container context where I can put the HttpClientContext? Or does somebody have an idea how to solve this easily? I thought about a ThreadLocal or a central registry for HttpContexts, but it would be nice if this could be simpler.
There is nothing like that provided by the framework; the simplest solution is to store them in something like a LinkedBlockingQueue and check one out, use it, and put it back in the queue when you're done (creating one as necessary if the queue is empty).
ThreadLocal will work too, but I prefer to use a pool.
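For illustration, a rough sketch of that queue-based pool with Apache HttpClient 4.x (the Task payload and its getUrl() accessor are placeholders from the question, and the CloseableHttpClient is assumed to be injected):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.CloseableHttpClient;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.messaging.handler.annotation.Payload;

public class TaskListener {

    private final CloseableHttpClient httpClient;
    // pool of contexts shared by all listener threads; it grows on demand and contexts are reused
    private final BlockingQueue<HttpClientContext> contextPool = new LinkedBlockingQueue<>();

    public TaskListener(CloseableHttpClient httpClient) {
        this.httpClient = httpClient;
    }

    @RabbitListener(queues = "tasks")
    public void receiveMessage(@Payload Task task) throws Exception {
        // check a context out of the pool, creating one if the pool is empty
        HttpClientContext context = contextPool.poll();
        if (context == null) {
            context = HttpClientContext.create();
        }
        try {
            // each invocation uses its own context, so concurrent listener threads never share one
            httpClient.execute(new HttpGet(task.getUrl()), context).close();
        } finally {
            // return the context to the pool for the next message on any thread
            contextPool.offer(context);
        }
    }
}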

Using Spring Integration with Spring Batch

I have a Spring Batch application which reads from a file, does some processing and finally writes customized output. This all happens in one step. In the next step I have a tasklet which archives the input files (moves them to another folder). This application works fine. But now I have a requirement to SFTP the output files to remote servers, where they will be processed further. I found a way to SFTP using Spring Integration, where I have created an input channel which feeds an outbound channel adapter. I put my files as the payload in messages and send the messages to the channel. The only problem I see here is that every time I need to get the context I have to load the Spring config file, which seems like a hackish way to do the task. Does anyone know of a proper way to integrate SI with SB?
Let me know if you want to see my config...
Thanks in Advance !!
Code to access the same app context without loading the Spring config again:
public class AppContextProvider implements ApplicationContextAware {

    // Spring injects the context into this static field when the bean is created
    private static ApplicationContext ctx;

    public ApplicationContext getApplicationContext() {
        return ctx;
    }

    public void setApplicationContext(ApplicationContext appContext) throws BeansException {
        ctx = appContext;
    }
}
Code to push the output file to the SFTP server:
log.info("Starting transfer of outputFile : " + absoluteOutputFileName);
final File file = new File(absoluteOutputFileName);
final Message<File> message = MessageBuilder.withPayload(file).build();
AppContextProvider context = new AppContextProvider();
final MessageChannel inputChannel = context.getApplicationContext().getBean("toChannel",MessageChannel.class);
inputChannel.send(message);
log.info("transfer complete for : " + absoluteOutputFileName);
Take a look at the spring-batch-integration module within the Spring Batch project. In there, we have components for launching jobs via messages. In your situation, you'd FTP the file down then have the JobLaunchingMessageHandler launch the job.
You can also watch this video of a talk I co-presented at SpringOne a couple years ago on this topic: https://www.youtube.com/watch?v=8tiqeV07XlI
As Michael said, you'll definitely want to look at and leverage spring-batch-integration. We actually use Spring Integration as a wrapper of sorts to launch 100% of our Spring Batch jobs.
One use case we've found particularly useful is leveraging the spring-integration-file Inbound Channel Adapters to poll staging directories to indicate when a new batch file has landed. As the poller finds a new file, we then launch a new batch job using the input filename as a parameter.
This has been a real help when it comes to restartability, because we now have one job instance per file as opposed to having a job kick off at arbitrary intervals and then partition across however many files happen to be in the staging folder. Now if an exception occurs during processing, you can target a specific job for restart immediately rather than waiting for 99 of the 100 "good" files to finish first.
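A minimal sketch of that pattern with the Java DSL, under a few assumptions (spring-batch-integration and spring-integration-file on the classpath; the /staging directory, the importJob bean and the input.file.name parameter name are illustrative):

import java.io.File;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.integration.launch.JobLaunchRequest;
import org.springframework.batch.integration.launch.JobLaunchingGateway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;

@Configuration
public class FileLaunchingFlowConfig {

    @Bean
    public IntegrationFlow fileLaunchingFlow(JobLauncher jobLauncher, Job importJob) {
        return IntegrationFlows
                // poll the staging directory; every new file becomes one message
                .from(Files.inboundAdapter(new File("/staging")),
                        e -> e.poller(Pollers.fixedDelay(10000)))
                // one job instance per file: the file name becomes an identifying job parameter
                .transform(File.class, file -> new JobLaunchRequest(
                        importJob,
                        new JobParametersBuilder()
                                .addString("input.file.name", file.getAbsolutePath())
                                .toJobParameters()))
                .handle(new JobLaunchingGateway(jobLauncher))
                // discard the JobExecution reply; the job status lives in the job repository
                .channel("nullChannel")
                .get();
    }
}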

Access to the H2 web console while running JUnit tests in a Spring application

I'm building a Spring application and I need to inspect my H2 in-memory database from a web browser while I'm running my JUnit tests.
In my Spring configuration I have a bean which is responsible for creating my database schema and populating it with some data which will be used in my JUnit tests. I've also added a bean in my test context which creates a web server where I will eventually look at my data.
<bean id="org.h2.tools.Server-WebServer" class="org.h2.tools.Server"
factory-method="createWebServer" init-method="start" lazy-init="false">
<constructor-arg value="-web,-webAllowOthers,-webPort,11111" />
</bean>
Everything seems OK: the database is populated properly, since I can access its data from my JUnit tests, and the H2 server only runs during my test phase (I can tell because if I try to access my_ip:11111 before debugging my tests I cannot connect, but I can connect once my tests have started).
Anyway, if I open the H2 console from a web browser, no schema is shown in it. Any ideas??
Many thanks!!
As this is probably going to be a test-debugging feature, you can add it at runtime with your @BeforeAll:
import java.sql.SQLException;
import org.h2.tools.Server;

/* Initialization logic here */

@BeforeAll
public static void initTest() throws SQLException {
    Server.createWebServer("-web", "-webAllowOthers", "-webPort", "8082")
          .start();
}
And then connect to http://localhost:8082/
Note: unless you need this to run as part of your CI build, you'll need to remove this code when you're finished debugging.
For future reference here's another way to do it:
Start database and web servers (version can differ):
$ cd .../maven_repository/com/h2database/h2/1.4.194
$ java -cp h2-1.4.194.jar org.h2.tools.Server -tcp -web -browser
TCP server running at tcp://169.254.104.55:9092 (only local connections)
Web Console server running at http://169.254.104.55:8082 (only local connections)
Set database url for tests in code to jdbc:h2:tcp://localhost:9092/mem:mytest.
Run or debug tests.
Click Connect in browser window which opened in step 1.
Jar file for H2 can be downloaded at https://mvnrepository.com/artifact/com.h2database/h2.
The server can also be started via @Before in the test file, as in snovelli's answer, but only if the connection to the database is established afterwards, which might be a problem.
I guess the problem is that you are connecting to the H2 database directly from your application, not through the server you are launching with the bean. Because of this, your app and the H2 web interface can't share one in-memory database.
You should change the jdbcUrl in your tests to something like jdbc:h2:tcp://localhost/mem:my_DB;DB_CLOSE_DELAY=-1;MODE=Oracle, and in the browser you should connect to the same URL.
With JDBC URLs like jdbc:h2:tcp://localhost/... all connections go through the H2 server, and you can view the database state in the browser.
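For example, a test DataSource that goes through the TCP server rather than straight at the in-memory database might look like this (a sketch; the URL, credentials and Oracle compatibility mode simply mirror the example above):

import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class TestDataSourceConfig {

    @Bean
    public DataSource dataSource() {
        // connect through the H2 TCP server so the web console sees the same database
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.h2.Driver");
        dataSource.setUrl("jdbc:h2:tcp://localhost/mem:my_DB;DB_CLOSE_DELAY=-1;MODE=Oracle");
        dataSource.setUsername("sa");
        dataSource.setPassword("");
        return dataSource;
    }
}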
If you have defined the JDBC URL as something like jdbc:h2:mem:db in your properties, the database actually gets a somewhat longer name when it is created.
Add an @Autowired DataSource dataSource to your test class, set a breakpoint somewhere, and inspect that data source with dataSource.getConnection(), looking at the url property. In the case I'm running right this moment, it is
jdbc:h2:mem:43ed83d6-97a1-4515-a925-a8ba53cd322c
Plugging that into the web console shows everything I'm expecting.
It isn't the most straightforward way, but it does work.
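If you prefer not to sit at a breakpoint, the same check can be done in a throwaway test that just prints the real URL (a sketch; the Spring test context annotations are omitted):

import java.sql.Connection;
import java.sql.SQLException;

import javax.sql.DataSource;

import org.junit.Test;
import org.springframework.beans.factory.annotation.Autowired;

public class DataSourceUrlTest {

    @Autowired
    private DataSource dataSource;

    @Test
    public void printActualJdbcUrl() throws SQLException {
        try (Connection connection = dataSource.getConnection()) {
            // prints the generated URL, e.g. jdbc:h2:mem:43ed83d6-97a1-4515-a925-a8ba53cd322c
            System.out.println(connection.getMetaData().getURL());
        }
    }
}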
@snovelli's answer above is good.
To debug a particular test case in your IDE, add an infinite loop at the end of the test case, then go to the browser, launch the console, and query the data.
Something like below:
import java.sql.SQLException;
import org.h2.tools.Server;

/* Initialization logic here */

@BeforeAll
public static void initTest() throws SQLException {
    Server.createWebServer("-web", "-webAllowOthers", "-webPort", "8082")
          .start();
}

@Test
void testMyDBOperation() {
    // some db operations like save and get
    while (true) {
    }
}
Now you can go to the browser and launch the console at http://localhost:8082/.
Of course, delete the above two changes after debugging.
This is not an answer, but a debugging tip.
When you finally access the H2 console at http://127.0.0.1:8082/ you may notice that database changes are not shown.
This is because the test transactions are rolled back, so the data is never committed. Although this behaviour is good, since each test case must run in a predefined environment, it is not helpful when you want to debug and see database changes.
To achieve this, add the @Commit annotation above the test case and put a dummy line in an @AfterAll annotated method, to stop the test and let you look at the H2 console (the H2 server will stop as soon as the test finishes).
@AfterAll
public static void finalizeTest() throws Exception {
    System.out.print("Just put a break point here");
}

@Test
@Commit
void should_store_an_article() {
    // Your test here
}
