I have a three-tier application. My question relates to the Spring Boot REST API middle tier and the MongoDB backend on AWS.
I am thinking of running the MongoDB Docker container under Elastic Beanstalk's single-container option, to scale the backend.
My REST API will run as a Docker container in a separate Elastic Beanstalk environment.
My understanding is that Elastic Beanstalk will scale the dockerized MongoDB service as needed.
High level architecture:
Frontend - Angular - S3 static website hosting
Middle tier - 3 Spring Boot REST services - 3 separate environments, each a single-container Docker deployment scaled with Elastic Beanstalk.
Backend - MongoDB - single Docker container scaled with Elastic Beanstalk.
Questions:
Qn: Will this work? Will each tier scale? Will the REST services be able to connect to the database? How much will this cost? Will there be too much latency between the middle tier and the backend?
Qn: Is this a foolhardy chase for some reason, with a hurdle I am not seeing? My research on this approach has yielded almost nothing. Would someone discourage me from even trying this? :)
Notes:
Elastic Beanstalk appears to offer convenience at a slightly higher cost. I am willing to accept that, as I am just testing. Kubernetes/Docker Swarm appear too complex and time-consuming, as I need to focus on application function in the near term.
I should be able to map a volume to a physical location in AWS, I guess Elastic Block Store or EFS. Any pros and cons, or better alternatives?
I am aware that I can use thin JARs for efficiency.
I have tested this with MongoDB deployed on EC2. I should be able to set that up to work with launch configurations and Auto Scaling groups, but I think it will be more expensive, and likely more work.
I am not sure what you mean by "single Docker container scaled with Elastic Beanstalk", but if you intend to launch more containers running MongoDB, the reality is a little more complicated than that.
While MongoDB does scale horizontally, when a new node is launched in a replica set topology, it:
must be added to the replica set configuration
needs to have the data that the other nodes already have synced onto it
There are tools that handle both of these requirements, but simply bringing up another container is not enough.
Sharded clusters are even more complex, because a node also needs to be assigned to a shard, so there are two levels of management decisions to be made.
You said, "I need to focus on application function in the near term."
You may consider MongoDB Atlas, which will handle all of this for you. There is a free tier available.
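Whichever backend you end up with (a self-managed container, EC2, or Atlas), keeping the connection string out of the image makes switching painless. A minimal sketch of that idea, assuming an environment variable named MONGODB_URI (an illustrative name, not something Elastic Beanstalk or Spring sets for you):

```java
import java.util.Map;

class MongoConfig {
    // Resolve the connection string from the environment, falling back to a
    // local default. In Spring Boot this value would typically feed
    // spring.data.mongodb.uri (set via the Beanstalk environment properties).
    static String uri(Map<String, String> env) {
        return env.getOrDefault("MONGODB_URI", "mongodb://localhost:27017/app");
    }

    public static void main(String[] args) {
        System.out.println(uri(System.getenv()));
    }
}
```

With this in place, pointing the REST services at a different database tier is a one-variable change per environment rather than a rebuild.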
Related
I have a huge Laravel application. It consists of a dashboard where users have many different complex CRUDs, all saved in a database with more than 100 tables. It also has an API for a mobile app that can reach a peak of 300 thousand requests per minute. As the app scales I'm having issues with performance, as everything is on one single AWS-hosted EC2 server; by everything I mean all app images, company logos, etc., all the resources for the dashboard, and the entire API for the mobile app. I need a solution for this problem. Should I separate it all onto different machines? If so, how?
The whole app is currently running PHP 7.2 and Laravel 5.5 on an AWS EC2 12xlarge instance.
You are asking about some basic concepts of scalability in the Cloud.
I will try to give one direction you could follow.
The current design is very bad for a couple of reasons:
As you said, it cannot scale, because everything lives on one server;
Because everything lives on one server, I hope you have automated backups in case your instance fails;
The only thing you can do in this configuration is scale vertically (a bigger instance) instead of horizontally (more instances);
Files are on the same disk, so you cannot scale storage independently.
In terms of the application (Laravel), you are running a monolith: everything in one app. I don't need to tell you that this doesn't scale well, though it can.
Let's dive into the main topic: how do you scale this big fat instance?
First of all, you should use shared storage for your images. The options are NFS via EFS (expensive), S3 (cheap), and a shared EBS volume (cheaper than EFS, but usable by only a limited number of instances at a time). I would use S3.
We can skip the part where you refactor your monolithic application into a micro-service architecture with smaller parts. It can be done if you have time and money, but I would say it is not the priority for your scaling issue.
I don't know whether the database is also on the same EBS volume or not. If it is, use RDS: it is an almost no-management managed database. You can have Multi-AZ for very high availability, or a Multi-AZ DB Cluster (newer), which also spreads the read load across two reader instances.
To go further with your application, you can also run the mobile API and the web dashboard on separate instances, so one does not impact the other.
And... that's all! Laravel has a transparent configuration mechanism for storage, so you can easily switch from one backend to another.
When I say "that's all", I mean in terms of ways to improve the scaling.
You will still have to migrate the data from the EC2 database to RDS, transfer your images from EBS to S3, create an Auto Scaling group, create an IAM instance role for your EC2 Auto Scaling group to access S3, learn when the application has peaks so you can do predictive scaling, etc.
I would recommend using IaC (infrastructure as code) for this, like CloudFormation or Terraform.
This is the tip of the iceberg, but I hope you can start building a more robust system with these tips.
I'm working on a Spring Boot application, using AbstractRoutingDataSource to provide a multitenancy feature. New tenants can be added dynamically, each with its own datasource and pool configuration, and everything is working well, since we only have around 10 tenants right now.
But I'm wondering: since the application runs in a Docker container with limited resources, as the number of tenants grows, more and more connections and threads will be allocated (considering a pool of 1 to 30 connections per tenant), and at some point (with 50 tenants, for example) the container will be killed due to the memory limit defined at container startup.
It appears to me that this multitenancy solution (using AbstractRoutingDataSource) is not suitable for an application designed to be containerized, since I can't simply scale it horizontally to deal with more tenants.
Am I missing something? Should I be worried about that?
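For reference, the routing pattern in question boils down to something like this minimal sketch (no Spring; the ThreadLocal tenant key and the JDBC URLs are illustrative). The memory concern follows directly from it: every registered tenant holds its own target, and in the real class each target is a pooled DataSource.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch of the idea behind AbstractRoutingDataSource: a per-request
// tenant key selects which target datasource a lookup resolves to.
class TenantRouter {
    private static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();
    private final Map<String, String> targets = new ConcurrentHashMap<>();

    static void setTenant(String tenant) { CURRENT_TENANT.set(tenant); }

    void register(String tenant, String jdbcUrl) { targets.put(tenant, jdbcUrl); }

    // Mirrors determineCurrentLookupKey() plus the target lookup.
    String resolve() {
        String tenant = CURRENT_TENANT.get();
        return tenant == null ? null : targets.get(tenant);
    }

    public static void main(String[] args) {
        TenantRouter router = new TenantRouter();
        router.register("acme", "jdbc:mysql://db-acme.internal:3306/app");
        setTenant("acme");
        System.out.println(router.resolve());
    }
}
```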
The point made in the post is about the system resource exhaustion that might arise with an increasing volume of requests as a result of more tenants in the system. I would like to address a few points.
The whole infrastructure can be managed efficiently using ECS and AWS Fargate, so that under heavy load new containers are automatically spun up to take it. In the case of separate servers, an ELB might help. There will be no issues spinning up new containers/servers as long as your services are stateless.
Regarding the number of active connections to the database from your application, you should profile your app and understand its data access patterns (DAP). Master data or static information should NOT always be fetched from the database (except the first time); instead, it should be cached. There are many managed cache services that can help you scale better.
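The caching point can be sketched in a few lines. This is a hedged in-process stand-in (the loader function and key names are illustrative); in production a managed cache such as ElastiCache would play this role, but the access pattern is the same:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Static/master data is loaded from the "database" once, then served from memory.
class StaticDataCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // stands in for the real DB query
    private int dbHits = 0;

    StaticDataCache(Function<String, String> loader) { this.loader = loader; }

    String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            dbHits++;                  // the database is hit only on a cache miss
            return loader.apply(k);
        });
    }

    int dbHits() { return dbHits; }

    public static void main(String[] args) {
        StaticDataCache cache = new StaticDataCache(k -> "rows-for-" + k);
        cache.get("countries");
        cache.get("countries"); // second call is served from the cache
        System.out.println(cache.dbHits()); // 1
    }
}
```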
In regards to the database, it is understood that tenants have their own databases; in the case of a very large tenant, try to scale out that tenant's database as well.
Focus on building the entire suite of features with async features in Java, or with RxJava, so that the asynchronous style helps manage threads.
Since you have not mentioned which cloud your applications will be deployed to, I have cited examples using AWS. However, most of these features are available across Azure, GCP, and AWS.
There are lots of strategies for scaling; the right understanding of the business needs, data usage patterns, etc. helps decide the right approach.
Hope this clarifies.
I have studied the concept of microservices for a good while now, and understand what they are and why they are necessary.
Quick refresher
In a nutshell, a monolithic application is decomposed into independently deployable units, each of which typically exposes its own web API and has its own database. Each service fulfills a single responsibility and does it well. These services communicate over synchronous web services such as REST or SOAP, or use asynchronous messaging such as JMS, to fulfill a request in synergy. Our monolithic application has become a distributed system. Typically all these fine-grained APIs are made available through an API gateway or proxy, which acts as a single-point-of-entry facade performing security- and monitoring-related tasks.
The main reasons to adopt microservices are high availability, zero-downtime updates, and high performance achieved via horizontal scaling of a particular service, as well as looser coupling in the system, meaning easier maintenance. Also, IDE functionality and the build and deployment process will be significantly faster, and it's easier to change framework or even language.
Microservices go hand in hand with clustering and containerization technologies such as Docker. Each microservice can be packed as a Docker container to run on any platform. The principal concepts of clustering are service discovery, replication, load balancing and fault tolerance. Docker Swarm is a clustering tool that orchestrates these containerized services, glues them together, and handles all those tasks under the hood in a declarative manner, maintaining the desired state of the cluster.
It sounds easy and simple in theory, but I still don't understand how to implement this in practice, even though I know Docker Swarm pretty well. Let's look at a concrete example.
Here is the question
I'm building a simplistic Java application with Spring Boot, backed by a MySQL database. I want to build a system where the user gets a webpage from Service A and submits a form. Service A does some manipulation of the data and sends it to Service B, which manipulates it further, writes it to the database, and returns something, and in the end some response is sent back to the user.
Now the problem is, Service A doesn't know where to find Service B, nor does Service B know where to find the database (because they could be deployed on any node in the cluster), so I don't know how to configure the Spring Boot applications. The first thing that comes to mind is DNS, but I can't find tutorials on how to set up such a system in Docker Swarm. What is the correct way to configure connection parameters in Spring for a distributed cloud deployment? I have researched the Spring Cloud project, but I don't understand whether it's the key to this dilemma.
I'm also confused about how the databases should be deployed. Should they live in the cluster, deployed alongside the services (possibly with the aid of Docker Compose), or is it better to manage them in a more traditional way with fixed IPs?
The last question is about load balancing. I'm not sure whether there should be a load balancer per service, or just a single master load balancer. Should the load balancer have a static IP mapped to a domain name, with all user requests targeting it? What if the load balancer fails? Doesn't that make all the effort to scale the services pointless? Is it even necessary to set up a load balancer with Docker Swarm, since it has its own routing mesh? Which node should the end user target then?
If you're using Docker Swarm, you don't need to worry about DNS configuration, as it's already handled by the overlay network. Let's say you have three services:
A
B
C
A is your DB, B might be the first service to collect data, and C receives that data and updates the database (A).
docker network create \
--driver overlay \
--subnet 10.9.9.0/24 \
youroverlaynetwork
docker service create --network youroverlaynetwork --name A your-db-image
docker service create --network youroverlaynetwork --name B your-service-b-image
docker service create --network youroverlaynetwork --name C your-service-c-image
Once all the services are created, they can refer to each other directly by name.
Requests are load balanced across all replicas of the container on that overlay network, so A can always get an IP for B by referencing "http://b", or just by using hostname B.
When you're dealing with load balancing in Docker, a Swarm service is already load balanced internally. Once you've defined a service to listen on port 8018, all Swarm hosts will listen on port 8018 and mesh-route the request to a container in round-robin fashion.
It is still, however, best practice to have an application load balancer sit in front of the hosts in the event of host failure.
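On the application side, the overlay network's DNS means connection URLs can be built from service names rather than IPs. A minimal sketch of that idea; the service name "a" (the database service), port 3306, and database name "app" are assumptions for illustration:

```java
class SwarmEndpoints {
    // Build a JDBC URL from a swarm service name; inside the overlay network,
    // Docker's internal DNS resolves the name to the service's virtual IP.
    static String jdbcUrl(String serviceName, int port, String database) {
        return "jdbc:mysql://" + serviceName + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // What Service C might put in spring.datasource.url:
        System.out.println(jdbcUrl("a", 3306, "app")); // jdbc:mysql://a:3306/app
    }
}
```

In Spring Boot terms, this is why a hard-coded host like `a` in `spring.datasource.url` works in a swarm: the hostname is stable even though the containers behind it move between nodes.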
We are working on setting up our elasticsearch backend for a production environment. Up until a few weeks ago, we were using Solr, but we decided to use Elasticsearch for a few reasons, but the biggest reason is for the distributed nature of the backend.
With that said, we've been looking for some documentation and best practices on deploying elasticsearch using amazon's services.
For the moment, we were considering using an extra-large box and then scaling out from there, but we aren't sure that is the best approach. For example, it may be better to have three mediums than one extra-large.
We intend to index around 100K to 150K documents per day up to around ten million docs.
The question is, can anyone provide a general environment / deployment diagram for elasticsearch or best practices in general?
There are some docs for Elasticsearch that talk about EC2 deployment. There's an autodiscovery plugin based on EC2 tags or security groups, or whatever you like. You can also choose S3 for persistence, although that may not really be necessary.
I'd advise launching it in a VPC so you can have permanent internal IPs; in regular EC2, your internal IPs will change with every reboot, even if you're using Elastic IPs.
I am building a MySQL database with a web front end for a client. The client and their staff will use this webapp on a daily basis, creating anywhere from a few thousand, to possibly a few hundred thousand records annually. I just picked up a second client who wishes to have the same product and will probably be creating the same number of records annually, possibly more.
In the future I hope to pick up a few more clients. In the next few years I could have up to 5 databases & web front ends running for 5 distinct clients, all needing tight security while creating, likely, millions of records annually (cumulatively across all the databases).
I would like to run all of this with Amazon's EC2 service but am having difficulty deciding on what type of instance to run. I am not sure if I should have several distinct Linux instances, one per client, or run one "large" instance which would manage all the clients' databases and web front ends.
I know that hardware configuration is rather specific to the task at hand. The web front ends will use jQuery to make the MySQL queries "pretty", and I will likely be doing some graphing of data (again with jQuery). The front ends will use SSL for security, which I understand can add some overhead to network speed.
I'm looking for some of your thoughts on this situation.
Thanks
Use the tools that are available. The Amazon RDS service lets you run a MySQL database in the cloud with no extra effort. You can scale it up and down as you need - start small, and then as you hit your limits, add extra capacity (at extra cost).
Next, use Elastic Load Balancing (ELB) with an SSL certificate, so you offload the overhead of SSL decryption to an Amazon service.
If you're using Java for your webapp, you could use Elastic Beanstalk to handle the whole hosting process for you.
Don't be afraid to experiment - you can always resize instances with no data loss (if they boot from an EBS volume) and you can always create and delete instances. Scaling horizontally is often better than scaling vertically, as you can spread your instances across multiple Availability Zones.
Good luck!