On AWS Elastic Beanstalk, how can I keep instances ready but not serving until the main instance crashes? - go

I'm working on an app that currently cannot work correctly if it is run as multiple instances behind a load balancer that sends traffic to more than one instance. That is because the web sockets are coordinated via goroutines rather than via an external pub/sub system. Also, the app crashes sometimes, and it takes about 30s for it to come back up, so I would like to have another instance of it ready to serve when the live instance crashes.
Is there a good way to do that within Elastic Beanstalk, minimizing downtime?

Related

AWS Route traffic to two load balancer simultaneously

I have a requirements to record all incoming and outgoing traffic to my application loadbalancer. I have a tool from F5 (install in EC2) to receive the traffic and process and perform actions (So I can setup ELB+ASG for this). However I want the traffic should go to the web server (Apache+PHP), so that my application also will work well.
I know GuardDuty and VPC flow logs are some alternate. But there are some limitation (It didn't capture all events comes to EC2 instance). Hence I need to rely some third party tools such as F5, checkpoint.
Regards
Senthil

How do I find out what port another process is running on besides the web process on Heroku?

I have a webhook URL and a normal web server (running HapiJS).
I'd like to proxy certain requests in HapiJS to the webhook server that's running on a private port but I need to know what the $PORT is on the other non web process.
Is there a way to find this port number?
There is no way to find that port number.
Heroku dynos run on different runtimes. So even if you did know it, you would also need to figure out the IP address of that server, which would change with every deployment and once every 24 hours.
This would also not be very scalable, as the strength of heroku is to allow you to boot more dynos easily. If you rely on knowing where the other dyno is, you're losing that easy scaling.
You don't necessarily need this to communicate between processes though. Using a redis queue, you could enqueue asynchronous jobs to be processed by your worker process. Both processes would communicate, and they wouldn't need to know where the other one is.

GKE + WebSocket + NodePort 30s dropped connections

I have a golang service that implements a WebSocket client using gorilla that is exposed to a Google Container Engine (GKE)/k8s cluster via a NodePort (30002 in this case).
I've got a manually created load balancer (i.e. NOT at k8s ingress/load balancer) with HTTP/HTTPS frontends (i.e. 80/443) that forward traffic to nodes in my GKE/k8s cluster on port 30002.
I can get my JavaScript WebSocket implementation in the browser (Chrome 58.0.3029.110 on OSX) to connect, upgrade and send / receive messages.
I log ping/pongs in the golang WebSocket client and all looks good until 30s in. 30s after connection my golang WebSocket client gets an EOF / close 1006 (abnormal closure) and my JavaScript code gets a close event. As far as I can tell, neither my Golang or JavaScript code is initiating the WebSocket closure.
I don't particularly care about session affinity in this case AFAIK, but I have tried both IP and cookie based affinity in the load balancer with long lived cookies.
Additionally, this exact same set of k8s deployment/pod/service specs and golang service code works great on my KOPS based k8s cluster on AWS through AWS' ELBs.
Any ideas where the 30s forced closures might be coming from? Could that be a k8s default cluster setting specific to GKE or something on the GCE load balancer?
Thanks for reading!
-- UPDATE --
There is a backend configuration timeout setting on the load balancer which is for "How long to wait for the backend service to respond before considering it a failed request".
The WebSocket is not unresponsive. It is sending ping/pong and other messages right up until getting killed which I can verify by console.log's in the browser and logs in the golang service.
That said, if I bump the load balancer backend timeout setting to 30000 seconds, things "work".
Doesn't feel like a real fix though because the load balancer will continue to feed actual unresponsive services traffic inappropriately, never mind if the WebSocket does become unresponsive.
I've isolated the high timeout setting to a specific backend setting using a path map, but hoping to come up with a real fix to the problem.
I think this may be Working as Intended. Google just updated the documentation today (about an hour ago).
LB Proxy Support docs
Backend Service Components docs
Cheers,
Matt
Check out the following example: https://github.com/kubernetes/ingress-gce/tree/master/examples/websocket

Loadbalancing web sockets - AWS Elastic Loadbalancer

I have a question about how to load balance web sockets with AWS elastic load balancer.
I have 2 EC2 instances behind AWS elastic load balancer.
When any user login, the user session will be established with one of the server, say EC2 instance1. Now, all the requests from the same user will be routed to EC2 instance1.
Now, I have a different stateless request coming from a different system. This request will have userId in it. This request might end up going to a EC2 instance2. We are supposed to send a notification to the user based on the userId in the request.
Now,
1) Assume, the user session is with the EC2 instance1, but the notification is originating from the EC2 instance2.
I am not sure how to notify the user browser in this case.
2) Is there any limitation on the websocket connection like 64K and how to overcome with multiple servers, since user is coming thru Load balancer.
Thanks
You will need something else to notify the browser's websocket's server end about the event coming from the other system. There are a couple of publish-subscribe based solution which might help, but without knowing more details it is a bit hard to figure out which solution fits the best. Redis is generally a good answer, and Elasticache supports it.
I found this regarding to AWS ELB's limits:
http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_elastic_load_balancer
But none of them seems to be related to your question.
Websocket requests start with HTTP communication before handing over to websockets. In theory if you could include a cookie in that initial HTTP request then the sticky session features of ELB would allow you to direct websockets to specific EC2 instances. However, your websocket client may not support this.
A preferred solution would be to make your EC2 instances stateless. Store the websocket session data in AWS Elasticache (Either Redis or Memcached) and then incoming connections will be able to access the session regardless of which EC2 instance is used.
The advantage of this solution is that you remove the dependency on individual EC2 instances and your application will scale and handle failures better.
If the ELB has too many incoming connections, then it should scale automatically. Although I can't find a reference for that. ELB's are relatively slow to scale - minutes rather than seconds, if you are expecting surges in traffic then AWS can "pre-warm" more ELB resource for you. This is done via support requests.
Also, factor in the ELB connection time out. By default this is 60 seconds, it can be increased via the AWS console or API. Your application needs to send at least 1 byte of traffic before the timeout or the ELB will drop the connection.
Recently had to hook up crossbar.io websockets with ALB. Basically there are two things to consider. 1) You need to set stickiness to 1 day on the target group attributes. 2) You either need something on the same port that returns static webpage if connection is not upgraded, or a separate port serving a static webpage with a custom health check specifying that port on the target group. Go for a ALB over ELB, ALB's have support for ws:// and wss://, they only lack the health check over websockets.

When would you need multiple servers to host one web application?

Is that called "clustering" of servers? When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load? Also, is one "server" that's up and running the application called an "instance"?
[...] Is that called "clustering" of servers?
Clustering is indeed using transparently multiple nodes that are seen as a unique entity: the cluster. Clustering allows you to scale: you can spread your load on all the nodes and, if you need more power, you can add more nodes (short version). Clustering allows you to be fault tolerant: if one node (physical or logical) goes down, other nodes can still process requests and your service remains available (short version).
When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load?
In general, this is the job of a dedicated component called a "load balancer" (hardware, software) that can use many algorithms to balance the request: round-robin, FIFO, LIFO, load based...
In the case of EC2, you previously had to load balance with round-robin DNS and/or HA Proxy. See Introduction to Software Load Balancing with Amazon EC2. But for some time now, Amazon has launched load balancing and auto-scaling (beta) as part of their EC2 offerings. See Elastic Load Balancing.
Also, is one "server" that's up and running the application called an "instance"?
Actually, an instance can be many things (depending of who's speaking): a machine, a virtual machine, a server (software) up and running, etc.
In the case of EC2, you might want to read Amazon EC2 Instance Types.
Here is a real example:
This specific configuration is hosted at RackSpace in their Managed Colo group.
Requests pass through a Cisco Firewall. They are then routed across a Gigabit LAN to a Cisco CSS 11501 Content Services Switch (eg Load Balancer). The Load Balancer matches the incoming content to a content rule, handles the SSL decryption if necessary, and then forwards the traffic to one of several back-end web servers.
Each 5 seconds, the load balancer requests a URL on each webserver. If the webserver fails (two times in a row, IIRC) to respond with the correct value, that server is not sent any traffic until the URL starts responding correctly.
Further behind the webservers is a MySQL master / slave configuration. Connections may be mad to the master (for transactions) or to the slaves for read only requests.
Memcached is installed on each of the webservers, with 1 GB of ram dedicated to caching. Each web application may utilize the cluster of memcache servers to cache all kinds of content.
Deployment is handled using rsync to sync specific directories on a management server out to each webserver. Apache restarts, etc.. are handled through similar scripting over ssh from the management server.
The amount of traffic that can be handled through this configuration is significant. The advantages of easy scaling and easy maintenance are great as well.
For clustering, any web request would be handled by a load balancer, which being updated as to the current loads of the server forming the cluster, sends the request to the least burdened server. As for if it's an instance.....I believe so but I'd wait for confirmation first on that.
You'd' need a very large application to be bothered with thinking about clustering and the "fun" that comes with it software and hardware wise, though. Unless you're looking to start or are already running something big, it wouldn't' be anything to worry about.
Yes, it can be required for clustering. Typically as the load goes up you might find yourself with a frontend server that does url rewriting, https if required and caching with squid say. The requests get passed on to multiple backend servers - probably using cookies to associate a session with a particular backend if necessary. You might have the database on a separate server also.
I should add that there are other reasons why you might need multiple servers, for instance there may be a requirement that the database is not on the frontend server for security reasons

Resources