rsocket - how to balance the load - spring-boot

rsocket seems to be a cool idea. I have this question and I could not find a better answer.
Lets consider this initial set up. client sends multiple requests to the server-1 sequentially.
client --> server-1
server-1 is doing some compute intensive tasks. So auto-scaler created another instance of server which is server-2 after some time. Now the setup became like this.
client --> server-1
server-2
As per my understanding client-->server-1 connection is established and kept alive. We use this connection for all the client-server communication. How to make use of another server - server-2 to share some of the client requests like this.
client --> server-1
|-----> server-2
Otherwise it will be a sequential processing.
I use spring boot with rsocket.

There are two options for this:
use low-level load balancer on client side from rsocket-load-balancer
deploy RSocket broker between client and server, some options are Netifi Broker or Alibaba RSocket Broker

Related

How does AWS Application Load balancer select a target within a target group? How to load balance the websocket traffic?

I have an AWS Application load balancer to distribute the http(s) traffic.
Problem 1:
Suppose I have a target group with 2 EC2 instances: micro and xlarge. Obviously they can handle different traffic levels. Does the load balancer manage traffic proportionally to instance sizes or just round robin? If only round robin is used and no other factors taken into account, then it's not really balancing load, because at some point the micro instance will be suffering from the traffic, while xlarge will starve.
Problem 2:
Suppose I have target group with 2 EC2 instances, both are same size. But my service is not using a classic http request/response flow. It is using HTTP websockets, i.e. a client makes HTTP request just once, to establish a socket, and then keeps the socket open for longer time, sending and receiving messages (e.g. a chat service). Let's suppose my load balancer is using round robin and both EC2 instances have 1000 clients connected each. Now suppose one of the EC2 instances goes down and 1000 connected clients drop their socket connections. The instance gets back up quickly and is ready to accept websocket calls again. The 1000 clients who dropped are trying to reconnect. Now, if the load balancer would use pure round robin, I'll end up with 1500 clients connected to instance #1 and 500 clients connected to instance #2, thus not really balancing the load correctly.
Basically, I'm trying to find out if some more advanced logic is being used to select a target in a group, or is it just a naive round robin selection. If it's round robin only, then how can I really balance the websocket connections load?
Websockets start out as http or https connections, so a load balancer can dispatch them to a server. Once the server accepts the http connection, both the server and the client "upgrade" the connection to use the websocket protocol. They then leave the connection open to use for websocket traffic. As far as the load balancer can tell, the connection is simply a long-lasting http connection.
Taking a server down when it has websocket connections to clients requires your application to retry lost connections. Reconnecting on connection failure is one of the trickiest parts of websocket client programming. Your application cannot be robust without reconnect logic.
AWS's load balancer has no built-in knowledge of the capabilities of the servers behind it. You have observed that it sends requests equally to big and small servers. That can overwhelm the small ones.
I have managed this by building a /healthcheck endpoint in my servers. It's a straightforward https://example.com/heathcheck web page. You can put a little bit of content on the page announcing how many websocket connections are currently open, or anything else. Don't password protect it or require a session to hit it.
My /healthcheck endpoints, whenever hit, measure the server load. I simply use the number of current websocket connections, but you can use any metric you want. I compare the current load to a load threshold configured for each server. For example, on a micro instance I can handle 20 open websockets, and on a production instance I can handle 400.
If the server load is too high, my endpoint gives back a 503 http error status along with its content. 503 typically means "I am overloaded, please try again later." It can also mean "I will shut down when all my connections are closed. Please don't use me for any more connections."
Then I configure the load balancer to perform those health checks every couple of minutes on all the servers in the server pool (AWS calls the pool a "target group"). The health check operation detects "unhealthy" servers and temporarily takes them out of its rotation. (The health check also detects crashed servers, which is good.)
You need this loadbalancer health check for a large-scale production setup.
All that being said, you will get best results if all your server instances in your pool have roughly the same capacity as each other.

Working of websocket services in clustered deployment

Lets say I have a websocket implemented in springboot. The architecture is microservice. I have deployed the service in kubernetes cluster and I have 2 running instance of the service, the socket implementation is using stomp and redis as broker.
Now the first connection is created between a client and one of the service. Does all the data flow occur through the client and the connected service? Would the other service also have a connection? Incase the current service goes down would the other service open up a connection?
Now lets say I'am sending some data back to the client which comes through a kafka topic. One of the either service could read it. If then would either of them be able to send the data back to the client?
Can someone help me understand these scenarios?
A websocket is a permanent connection. After opening it, it will be routed through kubernetes to a fixed pod. No other pod will receive the connection.
If the pod goes down, the connection is terminated.
If a new connection is created, for example by a different user, it may be routed to a different pod.
What data is transmitted, for example with kafka as source, is not relevant in this context. It could be anything.

I need to build a Vert.x virtual host server that channels traffic to other Vert.x apps. How is this kind of inter-app communication accomplished?

As illustrated above, I need to build a Vert.x Java app that will be an HTTP server/virtual host (TLS Http traffic, Web socket traffic) that will redirect/channel specific domain traffic to other Vert.x Java apps running on the same server, each in it's own JVM.
I have been reading for days but I remain uncertain as to how to approach all aspects of the task.
What I DO know or have experience with:
Creating an HTTP server, etc
Using a Vert.x VirtualHost handler to "handle" incoming traffic for a
specific domain
What I DO NOT know:
How do I "re-direct" a domain's traffic to another Vert.x app (this
other Vert.x app would also be running on the same server, in its own
JVM).
- Naturally this "other" Vert.x app would need to respond to HTTP
requests, etc. What Vert.x mechanisms do I employ to accomplish this
aspect of the task?
Are any of the following concepts part of the solution? I'm unfamiliar with these concepts and how they may or may not form part of the solution.:
Running each Vert.x app using -cluster option?
Vert.x Streams?
Vert.x Pumps?
There are multiple ways to let your microservices communicate with each other, the fact that all your apps are running on the same server doesn't change much, but it makes number 2.) easy to configure
1.) Rest based client - server communication
Both host and apps have a webserver
When you handle the incoming requests on the host, you simply call another app with a HttpClient
Typically all services find each others address via service discovery.
Eg: each service registers his address in a central registry then other services use this central registry to find the addresses.
Note: this maybe an overkill for you and you can just configure the addresses of the other services.
2.) You start the vertx microservices in clustered mode
the eventbus is then shared among the services
For all incoming requests you send a broadcast on the eventbus
the responsible app replies to the message
For further reading you can checkout https://vertx.io/docs/vertx-hazelcast/java/#configcluster. You start your projects with -cluster option and define the clustering in an xml configuration. I think by default it finds the services via local broadcast.
3.) You use a message broker like RabbitMq etc.
All your apps connect to a central message broker
When a new request comes in to the host, it sends a message to the message broker
The responible app then listens to the relevant messages and replies
The host receives the reply from the message broker
There are already many existing vertx clients for certain message brokers like kafka, camel, zeromq:
https://github.com/vert-x3/vertx-awesome#integration

Websockets and scalability

I am a beginner with websockets.
I have a need in my application where server needs to notify clients when something changes and am planning to use websockets.
Single server instance and single client ==> How many websockets will be created and how many connections to websockets?
Single server instance and 10 clients ==> How many websockets will be created and how many connections to websockets?
Single server instance and 1000 clients ==> How many websockets will be created and how many connections to websockets?
How do you scale with websockets when your application has a 1000’s of user base?
Thanks much for your feedback.
1) Single server instance and single client ==> How many websockets will be created and how many connections to websockets?
If your client creates one webSocket connection, then that's what there will be one webSocket connection on the client and one on the server. It's the client that creates webSocket connections to the server so it is the client that determines how many there will be. If it creates 3, then there will be 3. If it creates 1, then there will be 1. Usually, the client would just create 1.
2) Single server instance and 10 clients ==> How many websockets will be created and how many connections to websockets?
As described above, it depends upon what the client does. If each client creates 1 webSocket connection and there are 10 clients connected to the server, then the server will see a total of 10 webSocket connections.
3) Single server instance and 1000 clients ==> How many websockets will be created and how many connections to websockets?
Same as point #2.
How do you scale with webscokets when your application has a 1000’s of user base?
A single server, configured appropriately can handle hundreds of thousands of simultaneous webSocket connections that are mostly idle since an idle webSocket uses pretty much no server CPU. For even larger scale deployments, one can cluster the server (run multiple server processes) and use sticky load balancing to spread the load.
There are many other articles like these on Google worth reading if you're pursuing large scale webSocket or socket.io deployments:
The Road to 2 Million Websocket Connections in Phoenix
600k concurrent websocket connections on AWS using Node.js
10 million concurrent webSockets
Ultimately, the achievable scale per a properly configured server will likely have more to do with how much activity there is per connection and how much computation is needed to deliver that.

Are WebSockets really meant to be handled by Web servers?

The WebSocket standard hasn't been ratified yet, however from the draft it appears that the technology is meant to be implemented in Web servers. pywebsocket implements a WebSocket server which can be dedicated or loaded as Apache plugin.
So what I am am wondering is: what's the ideal use of WebSockets? Does it make any sense to implement a service using as dedicated WebSocket servers or is it better to rethink it to run on top of WebSocket-enabled Web server?
The WebSocket protocol was designed with three models in mind:
A WebSocket server running completely separately from any web server.
A WebSocket server running separately from a web server, but with traffic proxied to the websocket server from the web server (allowing websocket and HTTP traffic to co-exist on the same port)
A WebSocket server running as a plugin in the web server.
The model you pick really depends on the application you are trying to build and some other constraints that may limit your choices.
For example, if your application is going to be served from a single web server and the WebSocket connection will always be back to that same server, then it probably makes sense to just run the WebSocket server as a plugin/module in the web server.
On the other hand if you have a general WebSocket service that is usable from many different web sites (for example, you could have continuous low-latency traffic updates served from a WebSocket server), then you probably want to run the WebSocket server separate from any web server.
Basically, the tighter the integration between your WebSocket service and your web service, the more likely you will want to run them together and on the same port.
There are some constraints that may force one model or another:
If you control the server(s) but not the incoming firewall rules, then you probably have no choice but to run the WebSocket server on the same port(s) as your HTTP/HTTPS server (e.g. 80 and 443). In which case you will have to use a web server plugin or proxy to the real WebSocket server.
On the other hand, if you do not have super-user permission on the server where you are running the WebSocket server, then you will probably not be able to use ports 80 and 443 (below 1024 is generally a privileged port range) and in that case it really doesn't matter whether you run the HTTP/S and WebSocket servers on the same port or not.
If you have cookie based authentication (such as OAuth) in the web server and you would like to re-use this for the WebSocket connections then you will probably want to run them together (special case of tight integration).

Resources