creating a pojo/ejb with spring 3 that always runs in the background - spring

I have created apps in the past that had web pages calling the persistence layer to run queries or to insert, delete, etc. against a db. However, nothing was left running in the background except for the persistence layer. Now I need to develop an app with a process that is always running in the background, waiting for messages to come through a ZeroMQ messaging system (cannot change this at this point). I am a little lost as to how to set up the object so that it is always running and yet I can control it or query results from it.
Is there any tutorial/examples that covers this configuration?
Thanks,

You could use some kind of timer to start a method every second that looks at a specific resource and processes the input taken from it.
If you use Spring, then have a look at the @Scheduled annotation.
If your input is some kind of Java method invocation, then have a look at the java.util.concurrent package and concurrent programming in general. But be aware that there are some restrictions on creating your own threads in an EJB environment.
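The timer idea can be sketched in plain Java with java.util.concurrent, independent of Spring or EJB. The class below is a made-up illustration: a BlockingQueue stands in for the ZeroMQ socket, and the poll interval is shortened from the suggested one second for the demo.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MessagePoller {
    final BlockingQueue<String> inbox = new LinkedBlockingQueue<>(); // stand-in for the ZeroMQ socket
    final CopyOnWriteArrayList<String> processed = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Check the resource periodically and process whatever arrived.
    void start() {
        scheduler.scheduleAtFixedRate(() -> {
            String msg;
            while ((msg = inbox.poll()) != null) {
                processed.add(msg); // replace with real message handling
            }
        }, 0, 200, TimeUnit.MILLISECONDS);
    }

    // Convenience for querying results without juggling InterruptedException.
    boolean awaitProcessed(String msg, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (processed.contains(msg)) return true;
            try { Thread.sleep(10); } catch (InterruptedException e) { return false; }
        }
        return false;
    }

    void stop() { scheduler.shutdown(); }
}
```

The `processed` list is what the rest of the application would query; in a Spring app the `start()` body would become an @Scheduled method.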

Related

Spring Boot BufferingApplicationStartup drained after first request

I am using the great new ApplicationStartup recording feature of Spring 5.3 / Spring Boot 2.4, with BufferingApplicationStartup (only provided by Spring Boot 2.4).
However,
on the very first access to the /startup endpoint I seem to get all startup events, but
on subsequent calls to the endpoint I only get 3 (exactly 3) new events.
Is this documented somewhere? Is it configurable? It would be great if the data was not lost after the first call. Or is it a bug?
This is the expected behavior, as we drain the buffer before sending startup events over the wire. The use of the HTTP POST method also shows that this method is not free of side effects.
This aspect could be better documented so feel free to create an issue.
The goal here is to free memory from those buffered events as soon as possible since there might be many. The Java Flight Recorder implementation is also interesting if you wish to record startup events and get more information like GC and class loading.
Once the application is available, most of the startup events should be there. If your application has lazy components, you won’t get that data until they’re called, which can happen anytime during the application runtime.
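For context, the buffered recorder is configuration wired in at launch, roughly as below (the capacity of 2048 is illustrative); everything the startup endpoint returns is drained from this single in-memory buffer, which is why the data is gone after the first call.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.metrics.buffering.BufferingApplicationStartup;

@SpringBootApplication
public class MyApp {
    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(MyApp.class);
        // Keep up to 2048 startup steps in memory; the actuator endpoint
        // drains this buffer when it is called.
        app.setApplicationStartup(new BufferingApplicationStartup(2048));
        app.run(args);
    }
}
```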

Components blocking each other

I'm a newbie in developing with SpringBoot, so maybe this question will sound silly, but I hope someone will be able to show me the error of my ways.
I have a SpringBoot application with 4 Components. One of them processes input files and writes records into database, while the other three read the records, process them according to component configuration and send the result to a Web Service.
My problem is that sometimes there's a lot of files to be processed, so the "reader" Component takes a bit longer to finish. What I've noticed is that, while it's running, none of the other Components are starting. Since the data is relatively time sensitive, it is important for me that the processing Components run periodically and asynchronously.
I have used @EnableAsync in the main Application and I've marked all Components as @Async, yet the blocking problem still occurs. I was under the impression that the scheduled Components would be executed independently of each other. There are no shared resources between Components, and even if there were, I would understand a Component starting and then blocking; but the Components (threads) are not started at all (I have a trace entry as the first line of each Component).
What should I look at? Is this how it is supposed to work? If yes, I will then start async threads from the Components or find another way, but I thought I can get by without all of that by using SpringBoot.
Any and all answers will be very much appreciated!
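The blocking described above can be reproduced outside Spring: when all scheduled work shares a single worker thread, one long task delays everything else, while a pool runs tasks independently. A plain-Java sketch of the effect (not the asker's code; the class, delays, and timeouts are made up for illustration):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerDemo {
    // Schedule a long task and a quick task on the given scheduler and report
    // whether the quick task managed to start while the long one was still busy.
    static boolean quickTaskRanConcurrently(ScheduledExecutorService scheduler) {
        CountDownLatch quickStarted = new CountDownLatch(1);
        scheduler.schedule(() -> {
            try { Thread.sleep(500); } catch (InterruptedException ignored) { }
        }, 0, TimeUnit.MILLISECONDS);
        scheduler.schedule(quickStarted::countDown, 50, TimeUnit.MILLISECONDS);
        try {
            return quickStarted.await(300, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

With a single-threaded scheduler the quick task never starts within the timeout; with a pool of workers it does.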

Spring Batch (Boot) - Using custom app data directory for application configuration - App uses previous run data and not current

I have a Spring Boot / Batch app. I want to use an "app data directory" (not the same as a properties file) versus a db based datastore (ie: SQL/Mongo).
The data stored in the app data directory is aggregated from several webservices and stored as XML. Each Step within the Job will fetch data and write locally, then the next Step in the chain will pick up the created Files and process for the next step (and so on).
The problem here is that each Step only sees data from the previous app run; for example, the data as it was at app start time, not directly after the preceding Step's execution.
I understand what is happening: Spring is checking for any resources at launch and using them as-is before the Step is actually run.
Is there a magic trick to requesting Spring to stop loading specified resources/Files at app launch?
Note: Using Java Config, not XML, and the latest Spring/Boot/Batch versions; also tried @StepScope for all readers/writers.
Repo: https://github.com/RJPalombo/salesforceobjectreplicator
Thanks in advance!
No, there is no magic :-)
Firstly, your code is very well structured and easy to understand.
The first thing that pops out at me is: why aren't you using the standard readers and writers from Spring Batch (FlatFileItemReader/Writer, StaxEventItemReader/Writer)? There is no need to implement this logic yourself.
As far as I can see, the problem is that you load all the data in the constructors of the readers.
The whole job-structure (together with step, reader, writer, and processor instances) is created when the spring context is loaded, way before the job actually is executed.
Therefore, the readers just read empty files.
The simplest fix would be to implement the ItemStream interface in all your readers and writers,
and read the data in the open method instead of the constructor. The open method is called just before the step is executed.
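The timing difference can be illustrated without Spring Batch types: load in the constructor and you capture the state at context startup; load in open() and you see what earlier steps wrote. A minimal sketch (a hypothetical reader; a List stands in for the file on disk):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LazyFileReader {
    private final List<String> file;  // stands in for the file on disk
    private Iterator<String> records;

    public LazyFileReader(List<String> file) {
        this.file = file;             // note: nothing is read here
    }

    public void open() {              // mirrors ItemStream#open
        records = new ArrayList<>(file).iterator();
    }

    public String read() {            // mirrors ItemReader#read
        return records.hasNext() ? records.next() : null;
    }
}
```

Because the snapshot is taken in open(), records written between construction (context startup) and step execution are visible to the reader.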
But that is only a quick fix, and it only helps to understand the behaviour of Spring Batch. The problem with this approach is that all the data is loaded at once, which means memory usage grows with the amount of data; the memory would blow up when reading lots of data. That is something we don't want when doing batch processing.
So I'd recommend having a look at the standard readers and writers. Look at how they work; debug into them. See when the open/close methods are called; check what happens when the read method is called and what it does.
It is not that complicated, and having looked at your code, I'm sure you will be able to understand it very quickly.

Looking for pattern/approach/suggestions for handling long-running operation tied to web app

I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
Customer comes to the site and specifies object/widget they are looking for.
We search/clean/filter for widgets matching some initial criteria. <-- long running process
Customer further configures more detail about the widget they are looking for.
When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack-- currently using PHP. It doesn't seem like a good design to have the long running process take up an apache thread in mod_php (or fastcgi process). The apache layer of our app should be focused on serving up content and not data processing IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g., do we create a daemon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that more than one copy of the process does not run at once
If several copies of the process run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of ram and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
Another option is to have a daemon permanently running, waiting for requests, which processes them and then records the results somewhere (perhaps in a database).
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process.
Write a message to a message queue, which a daemon then reads and processes.
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS.
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
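Whatever the transport, the daemon side of this shape reduces to a blocking consume loop. A minimal plain-Java sketch (names are illustrative; an in-memory BlockingQueue stands in for JMS, a named pipe, or a polled database table):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class WorkDaemon {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final CopyOnWriteArrayList<String> results = new CopyOnWriteArrayList<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public void start() {
        worker.submit(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String request = queue.take();  // blocks until work arrives
                    results.add("done:" + request); // record the result somewhere
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public void submit(String request) { queue.add(request); }

    // The web layer would poll like this to see whether a result is ready.
    public boolean awaitResult(String expected, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (results.contains(expected)) return true;
            try { Thread.sleep(10); } catch (InterruptedException e) { return false; }
        }
        return false;
    }

    public void stop() { worker.shutdownNow(); }
}
```

A single worker also gives the "only one copy runs at once" property cautioned about earlier; queued requests simply wait their turn.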
Java servlets can do background processing. You could do something similar in any web technology with threading support. I don't know about PHP, though.
Not a complete answer, but I would think about using AJAX and passing the second step to something that's faster than PHP (C, C++, C#), then having a PHP function pick the results off of some stack, most likely just a database.

performance of accessing a mono server application via remoting

This is my setting: I have written a .NET application for local client machines, which implements a feature that could also be used on a webpage. To keep this example simple, assume that the client installs a software into which he can enter some data and gets some data back.
The idea is to create a webpage that holds a form into which the user enters the same data and gets the same results back as above. Due to the company's available web servers, the first idea was to create a mono webservice, but this was dismissed for reasons unknown. The "service" is not to be run as a webservice, but should be called by a PHP script. This is currently realized by calling the mono application via shell_exec from PHP.
So now I am stuck with a mono port of my application, which works fine, but takes way too long to execute. I have already stripped out all unnecessary dlls, methods etc, but calling the application via the command line - submitting the desired data via commandline parameters - takes approximately 700ms. We expect about 10 hits per second, so this could only work when setting up a lot of servers for this task.
I assume the 700ms are related to the cost of starting the application every time, because it does not make much difference in terms of time whether I handle the request once or five hundred times (I take the original input, vary it slightly, and do 500 iterations with "new" data each time; starting from the second iteration, the processing time drops to approximately 1ms per iteration).
My next idea was to setup the mono application as a remoting server, so that it only has to be started once and can then handle incoming requests. I therefore wrote another mono application that serves as the client. Calling the client, letting the client pass the data to the server and retrieving the result now takes 344ms. This is better, but still way slower than I would expect and want it to be.
I have then implemented a new project from scratch based on this blog post and get stuck with the same performance issues.
The question is: am I missing something related to the mono-projects that could improve the speed of the client/server? Although the idea of creating a webservice for this task was dismissed, would a webservice perform better under these circumstances (as I would not need the client application to call the service), although it is said that remoting is faster than webservices?
I could have made that clearer, but implementing a webservice is currently not an option (and please don't ask why, I didn't write the requirements ;))
Meanwhile I have checked that it's indeed the startup of the client, which takes most of the time in the remoting scenario.
I could imagine accessing the server via pipes from the command line, which would be perfectly suitable in my scenario. I guess this would be done using sockets?
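The pipes/sockets idea boils down to keeping one long-lived process listening so the startup cost is paid once. Here is a language-neutral sketch in Java (the Mono version would use the equivalent System.Net.Sockets types; names and the echo-style "processing" are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoServer {
    // Start a long-lived server on an ephemeral port; it answers one request
    // per connection. Returns the port clients should connect to.
    public static int start() {
        try {
            ServerSocket server = new ServerSocket(0);
            Thread t = new Thread(() -> {
                while (true) {
                    try (Socket s = server.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(s.getInputStream()));
                         PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                        out.println("result:" + in.readLine()); // "process" the request
                    } catch (IOException e) {
                        return;
                    }
                }
            });
            t.setDaemon(true);
            t.start();
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One round trip from a short-lived caller, e.g. a script invoked per hit.
    public static String call(int port, String request) {
        try (Socket s = new Socket("localhost", port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println(request);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each call pays only the connect-and-round-trip cost, which matches the observed ~1ms per-iteration figure far better than the ~700ms process start.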
You can try to use AOT to reduce the startup time. On .NET you would use ngen for that purpose; on Mono, just run mono --aot on all assemblies used by your application.
AOT'ed code is slower than JIT'ed code, but has the advantage of reducing startup time.
You can even try to AOT framework assemblies such as mscorlib and System.
I believe that remoting is not an ideal thing to use in this scenario. However, your idea of having Mono resident on the server instead of starting it every time is indeed solid.
Did you consider using SOAP webservices over HTTP? This would also help you with your 'web page' scenario.
Even if it is a little too slow for you, in my experience a custom RESTful service implementation would be easier to work with than remoting.
