Spring Boot: scalability of a component

I am trying Spring Boot and thinking about scalability.
Let's say I have a component that does a job (e.g. checking for new mail).
It is done by a scheduled method.
e.g.
@Component
public class MailMan {

    @Scheduled(fixedRateString = "5000")
    public void run() throws Exception {
        // ...
    }
}
Now the application gets a new customer. So the job has to be done twice.
How can I scale this component to exist or run twice?

Interesting question, but why multiple components per customer? Can the scheduler not pull the data for every customer on each scheduled run and process the records for each customer? Your component scaling should not be decided based on the entities involved in your application, but on the resource utilization of the component. You can have dedicated component types for processing messages from queues, and the same for REST; scale them based on how much each of them is being utilized.
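The comment's suggestion, one scheduled run that loops over all customers instead of one component per customer, can be sketched in plain Java. The customer ids and the mail-checking step are hypothetical placeholders:

```java
import java.util.List;

// One scheduled run loops over every customer instead of scaling the component
// per customer. The per-customer mail check is a hypothetical placeholder.
public class MailCheckRun {
    static int checkAllCustomers(List<String> customerIds) {
        int checked = 0;
        for (String id : customerIds) {
            // look up this customer's mailbox and process any new mail here
            checked++;
        }
        return checked;
    }
}
```

Adding a customer then means adding a row to whatever source supplies customerIds, not deploying another component.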

Instead of using annotations to schedule a task, you can do the same thing programmatically with a ScheduledTaskRegistrar. You can register the same bean multiple times, even if it is a singleton.
@Configuration
public class SomeSchedulingConfigurer implements SchedulingConfigurer {

    private final SomeJob someJob; // a bean that implements Runnable

    public SomeSchedulingConfigurer(SomeJob someJob) {
        this.someJob = someJob;
    }

    @Override
    public void configureTasks(@NonNull ScheduledTaskRegistrar taskRegistrar) {
        int concurrency = 2;
        IntStream.range(0, concurrency).forEach(
                i -> taskRegistrar.addFixedDelayTask(someJob, 5000));
    }
}
Make sure the task executor you are using is large enough to process that number of jobs concurrently; the default executor has exactly one thread :-). Be aware that this approach has scaling limits.
I also recommend adding a delay or skew between the jobs, so that they do not all run at exactly the same moment.
See SchedulingConfigurer and ScheduledTaskRegistrar for reference.
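The single-thread warning above can be demonstrated with plain java.util.concurrent (a stand-in for Spring's scheduler pool, not Spring API): with one thread, two jobs can never overlap; with two threads they can.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Measures how many "jobs" actually run concurrently for a given pool size.
public class PoolSizing {
    static int maxObservedConcurrency(int poolSize) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        Runnable job = () -> {
            int now = running.incrementAndGet();
            max.accumulateAndGet(now, Math::max); // record peak concurrency
            try {
                Thread.sleep(100); // simulate work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            running.decrementAndGet();
        };
        pool.schedule(job, 0, TimeUnit.MILLISECONDS);
        pool.schedule(job, 0, TimeUnit.MILLISECONDS);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return max.get();
    }
}
```

In Spring the equivalent fix is to configure a scheduler pool (for example a ThreadPoolTaskScheduler) sized for the number of registered tasks.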

The job needs to run only once, even with multiple customers. The component itself doesn't need to scale at all. It is just a mechanism to "signal" that some logic needs to be run at some moment in time. I would keep the component really thin and just call the desired business logic that handles all the rest, e.g.
@Component
public class MailMan {

    @Autowired
    private NewMailCollector newMailCollector;

    @Scheduled(fixedRateString = "5000")
    public void run() throws Exception {
        // Collects emails for all customers
        newMailCollector.collect();
    }
}
If you want to check for new e-mails per customer, you might want to avoid scheduled tasks in a backend service, as they make the implementation very inflexible.
Better to expose an endpoint that clients can call to trigger that logic.


Spring Boot test: wait for a microservice's scheduled task

I'm trying to test a service that communicates with another one.
One of them generates audit records which are stored in memory until a scheduled task flushes them to a Redis node:
@Component
public class AuditFlushTask {

    private final AuditService auditService;

    public AuditFlushTask(AuditService auditService) {
        this.auditService = auditService;
    }

    @Scheduled(fixedDelayString = "${fo.audit-flush-interval}")
    public void flushAudits() {
        this.auditService.flush();
    }
}
On the other hand, this service provides an endpoint that returns those flushed audits:
public Collection<String> listAudits() {
    return this.boService.listRawAudits(deadlineTimestamp);
}
The problem is that I'm building an integration test to check whether this process works correctly, i.e. whether the audits are provided as expected.
I don't know how to wait until the audits have been flushed by the microservice.
Any ideas?
Don't test the framework: Spring almost certainly has tests that cover fixed delays.
Instead, keep all the logic within the service itself, and integration-test that in isolation from the Spring @Scheduled method.
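If the integration test does need to observe the flush, a common approach is to poll the read side until the expected audits appear or a timeout expires, rather than sleeping for a fixed time. Below is a minimal sketch of such an await helper; in a real test the condition would call the listAudits endpoint, and libraries like Awaitility package the same idea with a richer API:

```java
import java.util.function.BooleanSupplier;

// Polls a condition until it holds or the timeout expires.
public class AwaitUtil {
    public static boolean awaitTrue(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMs); // wait before the next poll
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

A test would then do something like awaitTrue(() -> fetchAudits().contains(expected), 5000, 100), where fetchAudits is a hypothetical call to the endpoint.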

Accessing Spring objects from a dynamically created persistent Quartz job

Spring Boot 1.5
Quartz 2.2
I dynamically create and schedule Quartz jobs at runtime, with Quartz configured to use a JDBC job store. These jobs need to be persistent between app executions.
During the job execution, I need access to the full Spring context (Spring-managed beans and JPA transactions).
However, if I try to autowire anything into my job, I get an error like:
"Unsatisfied dependency expressed through field myAutowiredField"
I can't figure this out. I have found tons of examples of people showing how to get autowiring to work in a Quartz job, but almost all of them use a static, hard-coded job, not a job dynamically created at runtime.
The example at the following URL comes the closest. It dynamically creates jobs, and autowiring works great in them. However, it uses a RAM job store; as soon as I switch to JDBC, I'm back to square one.
https://icecarev.com/2016/11/05/spring-boot-1-4-and-quartz-scheduling-runtime-created-job-instances-from-a-configuration-file/
I have also looked at these..
Spring + Hibernate + Quartz: Dynamic Job
inject bean reference into a Quartz job in Spring?
.... etc.
But again, their solutions all seem to be missing something: they rely on static jobs or triggers, or just plain don't work.
Anybody have any tips or links to up-to-date resources?
Thanks!
Edit 1
Something happens to the spring context when the job is fired. Here's some code to illustrate.
The first autowireBean() call (done during the Spring Boot configuration) doesn't throw an error. NOTE: at this point there's no practical use for it; I'm just showing that it does 'work' here.
The second autowireBean() call (made when the job is fired) fails. This is the 'real' call.
public class AutowiringSpringBeanJobFactory extends SpringBeanJobFactory {

    private transient AutowireCapableBeanFactory beanFactory;

    public AutowiringSpringBeanJobFactory(ApplicationContext context) {
        super();
        this.beanFactory = context.getAutowireCapableBeanFactory();
        MyJobClass job = new MyJobClass();
        beanFactory.autowireBean(job); /** no problem **/
        beanFactory.initializeBean(job, "job");
    }

    @Override
    protected Object createJobInstance(final TriggerFiredBundle bundle) throws Exception {
        final Object job = super.createJobInstance(bundle); // an instance of MyJobClass, same as above
        beanFactory.autowireBean(job); /** "Unsatisfied dependency" exception **/
        beanFactory.initializeBean(job, "job");
        return job;
    }
}
Edit 2
Well, I seem to have got it working; however, I don't know if there will be consequences.
Here's the culprit in org.springframework.scheduling.quartz.AdaptableJobFactory
protected Object createJobInstance(TriggerFiredBundle bundle) throws Exception {
    return bundle.getJobDetail().getJobClass().newInstance();
}
It seems that bundle.getJobDetail().getJobClass() returns some 'proxy' type of my job class. It 'says' it's returning my correct job class, but it's different: it has the same name, yet I can't cast it (which typically means the two classes were loaded by different class loaders). For example...

Object job = bundle.getJobDetail().getJobClass().newInstance();
MyJobClass myJob = (MyJobClass) job;

throws an error saying that com.company.MyJobClass cannot be cast to com.company.MyJobClass.
So here's my 'fix'...
@Override
protected Object createJobInstance(final TriggerFiredBundle bundle) throws Exception {
    String className = bundle.getJobDetail().getJobClass().getName();
    Object job = Class.forName(className).newInstance();
    beanFactory.autowireBean(job);
    // Without this, @Transactional in my job bean doesn't work
    job = beanFactory.applyBeanPostProcessorsAfterInitialization(job, "job");
    beanFactory.initializeBean(job, "job");
    return job;
}
Since I'm not calling super() anymore, I lose a handy feature of SpringBeanJobFactory, but I'm OK with that for the moment.
I guess bottom line is that autowireBean() needs to be called before this sucker gets wrapped in a transactional proxy.
When you mention "Not a job dynamically created at runtime.", I'm going to assume that at some point you do something similar to: MyJob job = new MyJob();
If you want to inject dependencies into MyJob using Spring's @Autowired, one approach that comes to mind is:
Annotate the MyJob class with @Configurable(dependencyCheck = true)
Run the Java process with a Java agent: java -javaagent:<path to spring-agent-${spring.version}.jar> ...

Reset state before each Spring scheduled (@Scheduled) run

I have a Spring Boot Batch application that needs to run daily. It reads a daily file, does some processing on its data, and writes the processed data to a database. Along the way, the application holds some state such as the file to be read (stored in the FlatFileItemReader and JobParameters), the current date and time of the run, some file data for comparison between read items, etc.
One option for scheduling is to use Spring's #Scheduled such as:
@Scheduled(cron = "${schedule}")
public void runJob() throws Exception {
    jobRunner.runJob(); // runs the batch job by calling jobLauncher.run(job, jobParameters);
}
The problem here is that the state is maintained between runs. So, I have to update the file to be read, the current date and time of the run, clear the cached file data, etc.
Another option is to run the application via a Unix cron job. That would obviously meet the need to clear state between runs, but I prefer to tie the job scheduling to the application instead of the OS (and prefer it to be OS-agnostic). Can the application state be reset between @Scheduled runs?
You could always move the code that performs your task (and more importantly, keeps your state) into a prototype-scoped bean. Then you can retrieve a fresh instance of that bean from the application context every time your scheduled method is run.
Example
I created a GitHub repository which contains a working example of what I'm talking about, but the gist of it is in these two classes:
ScheduledTask.java
Notice the @Scope annotation. It specifies that this component should not be a singleton. The randomNumber field represents the state that we want to reset with every invocation. "Reset" in this case means that a new random number is generated, just to show that it does change.
@Component
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
class ScheduledTask {

    private double randomNumber = Math.random();

    void execute() {
        System.out.printf(
                "Executing task from %s. Random number is %f%n",
                this,
                randomNumber);
    }
}
TaskScheduler.java
By autowiring the ApplicationContext, you can use it inside the scheduleTask method to retrieve a new instance of ScheduledTask.
@Component
public class TaskScheduler {

    @Autowired
    private ApplicationContext applicationContext;

    @Scheduled(cron = "0/5 * * * * *")
    public void scheduleTask() {
        ScheduledTask task = applicationContext.getBean(ScheduledTask.class);
        task.execute();
    }
}
Output
When running the code, here's an example of what it looks like:
Executing task from com.thomaskasene.example.schedule.reset.ScheduledTask#329c8d3d. Random number is 0.007027
Executing task from com.thomaskasene.example.schedule.reset.ScheduledTask#3c5b751e. Random number is 0.145520
Executing task from com.thomaskasene.example.schedule.reset.ScheduledTask#3864e64d. Random number is 0.268644
Thomas' approach seems to be a reasonable solution; that's why I upvoted it. What is missing is how this can be applied to a Spring Batch job, so I have adapted his example a little bit:
@Component
public class JobCreatorComponent {

    @Bean
    @Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
    public Job createJob() {
        // use the jobBuilderFactory to create your job as usual
        return jobBuilderFactory.get() ...
    }
}
Your component with the launch method:
@Component
public class ScheduledLauncher {

    @Autowired
    private ... jobRunner;

    @Autowired
    private JobCreatorComponent creator;

    @Scheduled(cron = "${schedule}")
    public void runJob() throws Exception {
        // it would probably make sense to check the applicationContext and
        // remove any existing job
        creator.createJob(); // this should create a complete new instance of the Job
        jobRunner.runJob(); // runs the batch job by calling jobLauncher.run(job, jobParameters);
    }
}
I haven't tried out the code, but this is the approach I would try.
When constructing the job, it is important to ensure that all reader, processors and writers used in this job are complete new instances as well. This means, if they are not instantiated as pure java objects (not as spring beans) or as spring beans with scope "step" you must ensure that always a new instance is used.
Edited: how to handle singleton beans
Sometimes singleton beans cannot be avoided; in these cases there must be a way to "reset" them.
A simple approach would be to define an interface ResetableBean with a reset method that is implemented by such beans. Autowiring can then be used to collect a list of all such beans.
@Component
public class ScheduledLauncher {

    @Autowired
    private List<ResetableBean> resetables;
    ...

    @Scheduled(cron = "${schedule}")
    public void runJob() throws Exception {
        // reset all the singletons
        resetables.forEach(bean -> bean.reset());
        ...
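The answer describes the ResetableBean interface but never shows it. A minimal sketch, with a hypothetical stateful singleton (CachedFileData is an invented example, not from the question):

```java
// Singletons that hold run-specific state implement reset(), and the launcher
// clears them all before each scheduled run.
interface ResetableBean {
    void reset();
}

// Hypothetical singleton caching file data between reads within one run.
class CachedFileData implements ResetableBean {
    private final java.util.Map<String, String> cache = new java.util.HashMap<>();

    void put(String key, String value) {
        cache.put(key, value);
    }

    int size() {
        return cache.size();
    }

    @Override
    public void reset() {
        cache.clear(); // drop all state from the previous run
    }
}
```

Any bean implementing the interface is picked up automatically by the autowired List<ResetableBean> in the launcher above.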

Configuring multiple tasks with Spring scheduler having configurable triggers

I need some help with the design, as well as with Spring scheduler related code. I am trying to write a few utility classes whereby all tasks (which will do some async processing) are scheduled at regular intervals.
So I create a Task interface with an execute() method, which each task needs to implement.
public interface Task {
    void execute();
}

public class TaskOne implements Task {
    @Override
    public void execute() {
        // do something
    }
}

public class TaskTwo implements Task {
    @Override
    public void execute() {
        // do something
    }
}

@Component
public class ScheduledTasks {
    @Scheduled(fixedRate = 5000)
    public void runTask() {
        Task taskOne = new TaskOne();
        taskOne.execute();
        Task taskTwo = new TaskTwo();
        taskTwo.execute();
    }
}
I need to run different tasks at different intervals, and I want those intervals to be configurable without restarting the server. By this I mean that the time specified here can be changed without a restart:
@Scheduled(fixedRate = configurable, with some initial value)
Normally, what is the best way of doing this?
One approach I can think of is:
1. Keep the trigger (periodic or cron) for each task in the db.
2. Load these triggers at startup.
3. Somehow annotate the execute() methods with the proper value. This is going to be technically difficult.
4. To change a job's interval, update the db and refresh the caches.
Also, if someone could share a code snippet or suggest a small library, that would be highly appreciated. Thanks.
For this configuration you don't need ScheduledTasks. Instead, you can annotate every task as a @Component and annotate its execute method with @Scheduled.

@Component
public class TaskOne implements Task {
    @Override
    @Scheduled(...)
    public void execute() {
        // do something
    }
}
As for updating the schedule at runtime: you'll have to figure out how to differentiate between the different tasks.
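For changing an interval without a restart, the usual Spring route is ScheduledTaskRegistrar.addTriggerTask with a Trigger that reads the current period (e.g. from the db) when computing the next execution time. The core idea, sketched here with plain java.util.concurrent rather than Spring API, is a task that re-schedules itself and reads the latest interval before each run:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// A self-rescheduling task whose delay is read fresh before every run, so
// updating intervalMs at runtime changes the schedule without a restart.
public class DynamicScheduler {
    private final ScheduledExecutorService pool =
            Executors.newSingleThreadScheduledExecutor();
    final AtomicLong intervalMs = new AtomicLong(50); // e.g. refreshed from the db

    void start(Runnable task) {
        pool.schedule(() -> {
            task.run();
            start(task); // the next run picks up the latest interval
        }, intervalMs.get(), TimeUnit.MILLISECONDS);
    }

    void stop() {
        pool.shutdownNow();
    }
}
```

Setting intervalMs from a db poll or an admin endpoint changes the period from the next run onward, which is exactly what a Trigger implementation gives you in Spring.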

Web API concurrency and scalability

We are faced with the task of converting a REST service based on custom code to Web API. The service receives a substantial number of requests and operates on data that can take some time to load, but once loaded the data can be cached and used to serve all incoming requests. The previous version of the service had one thread responsible for loading the data and getting it into the cache; to prevent IIS from running out of worker threads, clients would get a "come back later" response until the cache was ready.
My understanding of Web API is that it has asynchronous behavior built in by operating on tasks, and as a result the number of requests does not directly translate into a number of physical threads being held.
In the new implementation of the service I am planning to let the requests wait until the cache is ready and then send a valid reply. I have made a very rough sketch of the code to illustrate:
public class ContactsController : ApiController
{
    private readonly IContactRepository _contactRepository;

    public ContactsController(IContactRepository contactRepository)
    {
        if (contactRepository == null)
            throw new ArgumentNullException("contactRepository");
        _contactRepository = contactRepository;
    }

    public IEnumerable<Contact> Get()
    {
        return _contactRepository.Get();
    }
}
public class ContactRepository : IContactRepository
{
    private readonly Lazy<IEnumerable<Contact>> _contactsLazy;

    public ContactRepository()
    {
        _contactsLazy = new Lazy<IEnumerable<Contact>>(LoadFromDatabase,
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public IEnumerable<Contact> Get()
    {
        return _contactsLazy.Value;
    }

    private IEnumerable<Contact> LoadFromDatabase()
    {
        // This method could take a long time to execute.
        throw new NotImplementedException();
    }
}
Please do not put too much value in the design of the code; it is only constructed to illustrate the problem and is not how we did it in the actual solution. IContactRepository is registered in the IoC container as a singleton and is injected into the controller. The Lazy with LazyThreadSafetyMode.ExecutionAndPublication ensures that only the first thread/request runs the initialization code; the following requests are blocked until the initialization completes.
Would Web API be able to handle 1000 requests waiting for the initialization to complete, while other requests that do not hit this Lazy are being serviced, without IIS running out of worker threads?
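The ExecutionAndPublication behaviour in question (the first caller runs the loader, later callers block until the value is published and then reuse it) can be sketched as a memoizing supplier. This is a Java analogue for illustration only, not the .NET API:

```java
import java.util.function.Supplier;

// Double-checked lazy initialization: the loader runs at most once, and every
// caller after the first reuses the published value.
public class Memoized<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private final Object lock = new Object();
    private volatile T value;

    public Memoized(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (lock) { // later callers block here until publication
                if (value == null) {
                    value = loader.get(); // only the first caller loads
                }
                result = value;
            }
        }
        return result;
    }
}
```

The key property, matching the Lazy<T> mode above, is that concurrent callers during the load all block and then observe the single loaded value rather than triggering additional loads.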
Returning Task<T> from the action allows the code to run on a background (thread-pool) thread and releases the IIS thread. So in this case I would change

public IEnumerable<Contact> Get()

to

public Task<IEnumerable<Contact>> Get()

Remember to return a started task, otherwise the thread will just sit and do nothing.
The Lazy implementation, while it can be useful, has little to do with the behaviour of Web API, so I am not going to comment on that. With or without Lazy, a task-based return type is the way to go for long-running operations.
I have two blog posts on this which are probably useful to you: here and here.
