Running 5 Spring Boot applications standalone vs. deploying them all in one Tomcat

I have 5 Spring Boot applications and I have to run them on a single machine.
Which is the most efficient way to run them?
1. Run each of them separately as its own microservice
2. Deploy them all together in a Tomcat server as WARs
I don't have any special requirements. I just want to know which is more efficient in terms of memory, I/O, processing, latency, and scalability.

1. For example, if one of your apps fails, with Spring Boot you only have to restart that one app. Otherwise, you have to restart the entire Tomcat server with all the apps installed on it.
2. Other aspects are processing and scalability: if you have separate modules that are independent of each other (the apps, in your case), it is always simpler to maintain and modify them without affecting one another (see item 1).
3. In terms of memory, I/O and latency, see item 1 again: it is always easier to deal with one separate app than with a large number of them at once. :)
So if your apps are loosely coupled (by functionality, for example), standalone Spring Boot may be the right fit; otherwise you can use a Tomcat server.

Related

How do I run a single JVM with multiple spring boot applications?

Let's say I have 25 Spring Boot microservices, each of which starts with a 1GB JVM in production. At any given time not all of them are in use, and there is no instance when they use the full 25GB of memory at once. In reality many of them sit idle 90% of the time, but any of them might at some point get called and require up to 1GB of memory.
In my development environment I would like to run all of them at once but only have 8GB memory. I don't need great performance but I need them all to run at the same time for the entire app to work. I would like to try to run all the applications within a single JVM with 6GB dedicated memory. That should be enough at any given time.
This seems like it would be a common issue, as many companies are converting to cloud/microservices. 10 years ago we would have one monolithic app with a single JVM (easy to run in a dev environment). Now we have dozens of small apps which might not need a ton of memory, but each runs in its own JVM, so each has a good amount of overhead. This actually makes development more complex rather than simpler. So I'm trying to find a solution for our developers where they can run everything without killing the memory on their machines.
The Spring Boot apps need to run without modification, aside from maybe local profiles. Otherwise developers would have to make tons of changes every time they pull the code from git.
Each project needs to be able to configure a different port for Tomcat (application-local.properties setting).
Each project needs its own classpath entries (for instance, one might use version 1.0 of a jar and another might use version 2.0, and without separate classpaths one or the other would break).
I have been trying to follow this post, but it's not 100% what I want. I feel like a proper solution should respect the application.properties / application-local.properties file and use the port set inside the project, rather than having to hardcode any configuration outside the project. Essentially, the post starts a separate thread for each microservice and attaches a separate classloader to each thread, then calls SpringApplication.run, passing in the class that would normally be used to start the microservice. I think this may be ignoring the auto-configuration properties.
Any help would be greatly appreciated!
You can manage how many resources your applications consume with Docker. Each Spring Boot application should run in its own Docker container. You can change how many resources (in your case, memory) a container uses at runtime. Take a look at this article on how to change resource allocation in Docker at runtime. Also, with Kubernetes it is possible to define the minimum and maximum resources that your application needs.
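For reference, the thread-per-service idea described in the question can be sketched in plain Java roughly as follows. This is only a sketch under assumptions, not the code from the post mentioned above: the class name IsolatedLauncher and its methods are made up for illustration, each service's classpath is assumed to be a set of class directories and plain dependency jars (a repackaged Spring Boot fat jar nests its dependencies and would not load this way), and the entry-point class and its main method are assumed to be public. JVM-global state such as system properties is still shared between services, so this is not guaranteed to work for every application.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

final class IsolatedLauncher {

    // Each service gets its own loader. The parent is the platform loader, so
    // application classes and third-party jars are NOT shared between services.
    static URLClassLoader newLoader(URL[] classpath) {
        return new URLClassLoader(classpath, ClassLoader.getPlatformClassLoader());
    }

    // Runs mainClass.main(args) on a dedicated thread, using a dedicated class loader.
    static Thread launch(String name, URL[] classpath, String mainClass, String... args) {
        URLClassLoader loader = newLoader(classpath);
        Thread thread = new Thread(() -> {
            try {
                Class<?> entryPoint = Class.forName(mainClass, true, loader);
                Method main = entryPoint.getMethod("main", String[].class);
                main.invoke(null, (Object) args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not start " + name, e);
            }
        }, name);
        // Frameworks that consult the thread context class loader will then
        // resolve classes and resources from this service's classpath.
        thread.setContextClassLoader(loader);
        thread.start();
        return thread;
    }
}
```

Because every service is loaded from its own classpath, conflicting jar versions stay apart, and each service's properties files would be looked up on that classpath rather than configured in the launcher.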

ETL in Java Spring Batch vs Apache Spark Benchmarking

I have been working with Apache Spark + Scala for over 5 years now (academic and professional experience). I have always found Spark/Scala to be one of the most robust combos for building any kind of batch or streaming ETL/ELT application.
But lately, my client decided to use Java Spring Batch for 2 of our major pipelines :
Read from MongoDB --> Business Logic --> Write to JSON File (~ 2GB | 600k Rows)
Read from Cassandra --> Business Logic --> Write to JSON File (~ 4GB | 2M Rows)
I was pretty baffled by this enterprise-level decision. I agree there are greater minds than mine in the industry, but I was unable to comprehend the need to make this move.
My Questions here are:
Has anybody compared the performances between Apache Spark and Java Spring Batch?
What could be the advantages of using Spring Batch over Spark?
Is Spring Batch "truly distributed" when compared to Apache Spark? I came across methods like chunk(), partition etc. in the official docs, but I was not convinced of its true distributedness. After all, Spring Batch is running on a single JVM instance, isn't it?
I'm unable to wrap my head around these. So, I want to use this platform for an open discussion between Spring Batch and Apache Spark.
As the lead of the Spring Batch project, I’m sure you’ll understand I have a specific perspective. However, before beginning, I should call out that the frameworks we are talking about were designed for two very different use cases. Spring Batch was designed to handle traditional, enterprise batch processing on the JVM. It was designed to apply well understood patterns that are commonplace in enterprise batch processing and make them convenient in a framework for the JVM. Spark, on the other hand, was designed for big data and machine learning use cases. Those use cases have different patterns, challenges, and goals than a traditional enterprise batch system, and that is reflected in the design of the framework. That being said, here are my answers to your specific questions.
Has anybody compared the performances between Apache Spark and Java Spring Batch?
No one can really answer this question for you. Performance benchmarks are a very specific thing. Use cases matter. Hardware matters. I encourage you to do your own benchmarks and performance profiling to determine what works best for your use cases in your deployment topologies.
What could be the advantages of using Spring Batch over Spark?
Programming model similar to other enterprise workloads
Enterprises need to be aware of the resources they have on hand when making architectural decisions. Is using new technology X worth the retraining or hiring overhead of technology Y? In the case of Spark vs Spring Batch, the ramp up for an existing Spring developer on Spring Batch is minimal. I can take any developer that is comfortable with Spring and make them fully productive with Spring Batch very quickly. Spark has a steeper learning curve for the average enterprise developer, not only because of the overhead of learning the Spark framework but also because of all the related technologies needed to productionize a Spark job in that ecosystem (HDFS, Oozie, etc).
No dedicated infrastructure required
When running in a distributed environment, you need to configure a cluster using YARN, Mesos, or Spark’s own clustering installation (there is an experimental Kubernetes option available at the time of this writing, but, as noted, it is labeled as experimental). This requires dedicated infrastructure for specific use cases. Spring Batch can be deployed on any infrastructure. You can execute it via Spring Boot with executable JAR files, you can deploy it into servlet containers or application servers, and you can run Spring Batch jobs via YARN or any cloud provider. Moreover, if you use Spring Boot’s executable JAR concept, there is nothing to setup in advance, even if running a distributed application on the same cloud-based infrastructure you run your other workloads on.
More out of the box readers/writers simplify job creation
The Spark ecosystem is focused around big data use cases. Because of that, the components it provides out of the box for reading and writing are focused on those use cases. Things like different serialization options for reading files commonly used in big data use cases are handled natively. However, processing things like chunks of records within a transaction are not.
Spring Batch, on the other hand, provides a complete suite of components for declarative input and output. Reading and writing flat files, XML files, from databases, from NoSQL stores, from messaging queues, writing emails...the list goes on. Spring Batch provides all of those out of the box.
Spark was built for big data...not all use cases are big data use cases
In short, Spark’s features are specific for the domain it was built for: big data and machine learning. Things like transaction management (or transactions at all) do not exist in Spark. The idea of rolling back when an error occurs doesn’t exist (to my knowledge) without custom code. More robust error handling use cases like skip/retry are not provided at the level of the framework. State management for things like restarting is much heavier in Spark than Spring Batch (persisting the entire RDD vs storing trivial state for specific components). All of these features are native features of Spring Batch.
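To illustrate what "chunks of records" and "skip" mean in practice, here is a minimal plain-Java sketch of the pattern. It is not Spring Batch's API or implementation; the class ChunkRunner and its parameters are made up for the example, and a real framework would also wrap each chunk write in a transaction and record restart state.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class ChunkRunner {

    // Reads items one by one, processes each, and hands the results to the writer
    // in groups of chunkSize. Items whose processing throws are skipped, up to
    // skipLimit; one failure more than that aborts the run. Returns the skip count.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize, int skipLimit) {
        int skipped = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            I item = reader.next();
            try {
                chunk.add(processor.apply(item));
            } catch (RuntimeException e) {
                if (++skipped > skipLimit) {
                    throw new IllegalStateException("Skip limit exceeded", e);
                }
                continue; // bad record: skip it and keep going
            }
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // a framework would commit here
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk)); // final, possibly smaller chunk
        }
        return skipped;
    }
}
```

With a chunk size of 2 and a skip limit of 1, the input "1", "2", "x", "4", "5" parsed as integers is written as two chunks, [1, 2] and [4, 5], and one record is reported as skipped.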
Is Spring Batch “truly distributed”
One of the advantages of Spring Batch is the ability to evolve a batch process from a simple sequentially executed, single JVM process to a fully distributed, clustered solution with minimal changes. Spring Batch supports two main distributed modes:
Remote Partitioning - Here Spring Batch runs in a master/worker configuration. The masters delegate work to workers based on the mechanism of orchestration (many options here). Full restartability, error handling, etc. is all available for this approach with minimal network overhead (transmission of metadata describing each partition only) to the remote JVMs. Spring Cloud Task also provides extensions to Spring Batch that allow for cloud native mechanisms to dynamically deploy the workers.
Remote Chunking - Remote chunking delegates only the processing and writing phases of a step to a remote JVM. Still using a master/worker configuration, the master is responsible for providing the data to the workers for processing and writing. In this topology, the data travels over the wire, causing a heavier network load. It is typically used only when the processing advantages can surpass the overhead of the added network traffic.
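The "metadata describing each partition" in remote partitioning can be as small as a key range. The sketch below shows that idea in plain Java, assuming the input has a numeric id that can be split into contiguous ranges; the names RangePartitions, IdRange, and split are made up for illustration. In Spring Batch itself this logic would sit behind the Partitioner interface, whose partition(int gridSize) method returns a Map<String, ExecutionContext>.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RangePartitions {

    // The only thing sent to a worker: which slice of ids it is responsible for.
    static final class IdRange {
        final long minId;
        final long maxId;

        IdRange(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    // Splits [minId, maxId] into at most gridSize contiguous, non-overlapping ranges.
    static Map<String, IdRange> split(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("gridSize must be >= 1 and maxId >= minId");
        }
        Map<String, IdRange> partitions = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long base = total / gridSize;
        long remainder = total % gridSize;
        long start = minId;
        for (int i = 0; i < gridSize && start <= maxId; i++) {
            long size = base + (i < remainder ? 1 : 0); // spread the remainder over the first ranges
            long end = start + size - 1;
            partitions.put("partition" + i, new IdRange(start, end));
            start = end + 1;
        }
        return partitions;
    }
}
```

Each worker then reads only its own range from the data store, which is why the data itself never has to travel through the master in this mode.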
There are other Stack Overflow answers that discuss these features in further detail (as does the documentation):
Advantages of spring batch
Difference between spring batch remote chunking and remote partitioning
Spring Batch Documentation

Update for JavaEE application

Our applications are built on Spring Boot. Each app is packaged as a WAR file and run with java -Dspring.profiles.active=xxx -jar xx.war. The latest WAR package is generally served by a static web server such as nginx.
Now we want to know if we can add auto-update for the application.
I have googled, and people suggest using an application server that supports hot deployment; however, we use Spring Boot as shown above.
I have thought about starting a new thread once my application has started, which would check for updates and download the latest package. But I have to terminate the current application to start the new one, since they use the same port, and if I close the current app, the update thread will be terminated too.
So how do you handle this problem?
In my opinion that should be managed by some higher-order, DevOps-level orchestration system, not by the app or its container. The decision to replace an app should be made at the DevOps level, not at the app level.
One major advantage of spring-boot is the inversion of the traditional application-web-container to web-app model. As such the web container is usually (and best practice with Spring boot) built within the app itself. Hence it is fully self contained and crucially immutable. It therefore should not be the role of the app-web-container/web-app to replace either part-of or all-of itself.
Of course you can do whatever you like, but you might find that the solution is not easy, because doing it this way goes against convention.
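As a rough picture of what "outside the app" can mean, here is a minimal sketch of a separate launcher process that owns the running application and swaps it once a new package has been downloaded. It is an illustration only: AppSupervisor, command, and replaceWith are hypothetical names, the 30-second shutdown grace period is an arbitrary assumption, and in practice this job is more likely handled by an existing service manager or orchestration tool. Note that JVM options such as -D have to come before -jar to be seen as system properties.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class AppSupervisor {

    private Process current;

    // Builds the launch command; the profile is passed as a JVM system property.
    static List<String> command(String warPath, String profile) {
        return List.of("java", "-Dspring.profiles.active=" + profile, "-jar", warPath);
    }

    // Stops the running app (freeing its port) and then starts the new package.
    synchronized void replaceWith(String warPath, String profile)
            throws IOException, InterruptedException {
        if (current != null && current.isAlive()) {
            current.destroy(); // ask for a normal shutdown first
            if (!current.waitFor(30, TimeUnit.SECONDS)) {
                current.destroyForcibly(); // give up waiting and force termination
                current.waitFor();
            }
        }
        current = new ProcessBuilder(command(warPath, profile)).inheritIO().start();
    }
}
```

Because the update check lives in the launcher rather than in the app, stopping the app does not stop the updater, which is exactly the problem the question runs into with an in-app update thread.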

When do we need to run a Java application in a container?

Lately I started to learn Java EE and related technologies and there are some concepts which confuse me. Somewhere I read that whenever one is building a Java EE application then it is sort of mandatory to use a container.
Currently, I am learning the Spring framework and trying to build a small application with it to get hands-on experience. I am not sure whether it is mandatory for me to use a container (say Tomcat), or whether it depends on the application I am building.
If it depends on the application that one is building, then what are the factors which help to decide whether a container should be used or not?
Puuhhh, this is a very big question and there is no simple answer. But I will do my best to explain my own opinion at least:
What are containers?
Containers provide functionality to you. One such function is handling web requests and dispatching them to servlets; in this case we call them servlet containers (e.g. Tomcat or Jetty).
But containers can also provide other things, e.g. user authentication, logging, or the connection to a database. Most containers do several of those things (e.g. Tomcat does all the ones I mentioned). Some containers do more than others, e.g. JBoss can do much more than Tomcat.
Trade Off
However, there is a trade-off: if you use a simple container (like Tomcat), you need to do a lot of things on your own or by using other frameworks (like Spring). But if you use a powerful container, you must know the container very well, and the chance is high that your application will depend on this concrete container sooner or later.
The point is that using a container is not mandatory. It is a decision. Some people will argue for it, others against it. But depending on the books you read, this decision is already made (e.g. J2EE needs a J2EE container, that's how it works).
The trend (IMHO)
Years ago the trend was to use big and powerful (J2EE) containers which provide as much as possible. IMHO the trend today is to use smaller and lightweight solutions. Most developers today would prefer to use a Tomcat server instead of a JBoss server.
Frameworks without containers
While J2EE needs a container, there are other frameworks/technologies which support the development of web applications without any external container. Examples of such frameworks are Play! and Spark Java.
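To make "without any external container" concrete: the JDK itself ships a small HTTP server (com.sun.net.httpserver), so a plain main method can answer web requests with nothing deployed anywhere. The sketch below only illustrates the idea and is not how Play! or Spark Java are implemented; the class name EmbeddedHello, the /hello path, and port 8080 are made up for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class EmbeddedHello {

    // Starts an HTTP server inside this process; port 0 picks any free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

The application starts the server rather than being deployed into one, which is the inversion that frameworks without an external container rely on.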
Note
If you are not familiar with containers and Spring, take care not to get confused. Most applications you will develop with Spring are web applications which will be deployed to a servlet container. This is very common. But Spring doesn't rely on that. You can also use Spring without such a container, e.g. to develop a desktop application. But if you want to develop a web application, the Java way is to use a servlet container.
If your application is using servlets, you'll need a container to handle the requests. Tomcat is a very popular choice.
I'll anticipate your next topic to cover with this discussion of "application server" versus "container."
There are two kinds of containers. One is the web container (e.g. Tomcat, Jetty), which runs web applications; the other is the application container (a full application server), which runs enterprise applications.
Web Applications = Apps developed using HTML, XML, CSS and JSPs
Enterprise Applications = Apps developed using Java, J2EE and servlets in addition to HTML and XML.

Application Server for non-Web Spring/Hibernate Application

We are developing an open source trading platform based on the Spring Framework and Hibernate: http://code.google.com/p/algo-trader/ and http://www.algotrader.ch. The application consists of a trading framework and several strategies that can be started independently. So far, these different parts have been running in separate JVMs communicating through RMI and JMS.
To avoid unnecessary serialization and network overhead we would like to run the entire application within some sort of container (potentially an application server). We do however have the requirement, that the individual parts of the application can be deployed, started and stopped independently.
We have looked into OSGi, but a lot of the libraries that we use are not OSGi ready yet, so this is not currently an option. Also please note, there is no web-GUI in our application.
Any suggestions on this?
Thanks
Andy
If OSGi is not an option, then the functionality can be broken into smaller units and deployed as utility JARs; deployed that way, they can be managed independently.
For the application server, I feel either GlassFish or JBoss would be a good option, considering they are open source and free.
At a later point in time you could also look at WebLogic (free for development).
So in your case you would break the static data configuration (Counterparty, Currencies) and dealing (Pricing, Quoting, Booking) into two separate features.
For your choice of application server, I recommend JBoss, especially version 7.1, which is faster and more stable!
