What does the Amazon ELB automatic health check do and what does it expect?

What does the Amazon ELB automatic health check do and what does it expect? - amazon-ec2

Here is the thing:
We've implemented a C++ RESTful API Server, with built-in HTTP parser and no standard HTTP server like apache or anything of the kind
It has been in use for several months in Amazon structure, using both plain and SSL communications, and no problems have been identified, related to Amazon infra-structure
We are deploying our first backend using Amazon ELB
Amazon ELB has a customizable health check system but also as an automatic one, as stated here
We've found no documentation of what data is sent by the health check system
The backend simple hangs on the socket read instruction and, eventually, the connection is closed
I'm not looking for a solution for the problem since the backend is not based on a standard web server, just if someone knows what kind of message is being sent by the ELB health check system, since we've found no documentation about this, anywhere.
Help is much appreciated. Thank you.

Amazon ELB has a customizable health check system but also as an
automatic one, as stated here
With customizable you are presumably referring to the health check configurable via the AWS Management Console (see Configure Health Check Settings) or via the API (see ConfigureHealthCheck).
The requirements to pass health checks configured this way are outlined in field Target of the HealthCheck data type documentation:
Specifies the instance being checked. The protocol is either TCP,
HTTP, HTTPS, or SSL. The range of valid ports is one (1) through
65535.
Note
TCP is the default, specified as a TCP: port pair, for example
"TCP:5000". In this case a healthcheck simply attempts to open a TCP
connection to the instance on the specified port. Failure to connect
within the configured timeout is considered unhealthy.
SSL is also specified as SSL: port pair, for example, SSL:5000.
For HTTP or HTTPS protocol, the situation is different. You have to
include a ping path in the string. HTTP is specified as a
HTTP:port;/;PathToPing; grouping, for example
"HTTP:80/weather/us/wa/seattle". In this case, a HTTP GET request is
issued to the instance on the given port and path. Any answer other
than "200 OK" within the timeout period is considered unhealthy.
The total length of the HTTP ping target needs to be 1024 16-bit
Unicode characters or less.
[emphasis mine]
With automatic you are presumably referring to the health check described in paragraph Cause within Why is the health check URL different from the URL displayed in API and Console?:
In addition to the health check you configure for your load balancer,
a second health check is performed by the service to protect against
potential side-effects caused by instances being terminated without
being deregistered. To perform this check, the load balancer opens a
TCP connection on the same port that the health check is configured to
use, and then closes the connection after the health check is
completed. [emphasis mine]
The paragraph Solution clarifies the payload being zero here, i.e. it is similar to the non HTTP/HTTPS method described for the configurable health check above:
This extra health check does not affect the performance of your
application because it is not sending any data to your back-end
instances. You cannot disable or turn off this health check.
Summary / Solution
Assuming your RESTful API Server, with built-in HTTP parser is supposed to serve HTTP only indeed, you will need to handle two health checks:
The first one you configured yourself as a HTTP:port;/;PathToPing - you'll receive a HTTP GET request and must answer with 200 OK within the specified timeout period to be considered healthy.
The second one configured automatically by the service - it will open a TCP connection on the HTTP port configured above, won't send any data, and then closes the connection after the health check is completed.
In conclusion it seems that your server might be behaving perfectly fine already and you are just irritated by the 2nd health check's behavior - does ELB actually consider your server to be unhealthy?

As far as I know it's just an HTTP GET request looking for a 200 OK http response.

Related

Private gke cluster and external HTTPS loadbalancer health checks failing

I have tested a neg deployment in a private and public cluster, however I cannot get the private cluster to work correctly with external loadbalancer, even with suggested fw rules created.
deployment of private cluster fw rules below:
firewall-rules --allow tcp:30000-32767,tcp:9376 --source-ranges 130.211.0.0/22,209.85.152.0/22,209.85.204.0/22,35.191.0.0/16
anyone who has done anything similar would be great to have some advice

When you choose HTTP, HTTPS or HTTP/2 as the protocol for the health check, the probes will require an HTTP 200 (OK) response code to be successful and after some retries (also specified in the health check) your backend service will be considered as healthy, each protocol has it own success criteria and need to be taken into consideration for your specific use case:
HTTP, HTTPS, HTTP/2
TCP and SSL
gRPC
For example, if you have a service (not http) running in a specific port, typical use case are databases (mysql, postgree) and you want to ensure that the service is working, you can use a TCP health check, if the handshake is completed, the probe will be successful and request will reach your backends.
Also, you need to configure your firewall rules to allow traffic from the specific ranges, this vary according the LB type (Network Load Balancers require different ranges, this is detailed here), otherwise probes will not be able to reach your backends and will be marked unhealthy.

DNS solution for Dante SOCKS proxy

I am trying to build a SOCKS solution for forward proxy. I am using dante SOCKS proxy as I have heard that big companies like google uses it as forward proxy solution.
on the SOCKS server, I am allowing based on FQDN's like google.com:443
Now the problem is, when the client constructs the packet, it tries to resolve google.com and gets X.X.X.X and sends connect request to SOCKS server. Now when the server receives the packets, it tries to reconstruct the packet to send out to internet, the server again does DNS resolution and if the server gets response as Y.Y.Y.Y, then it doesn't allow client's request as the destination IP in the client's request is different then the server's resolved IP address.
There was a solution in dante client which tells client to put a dummy destination address 0.0.0.1 and sends request to server and server processes it properly then. However that is creating a problem with internal domains as after using that dns resolution method, every requests goes through dante server :(
Please let me know
If there is any solution through which would help me in maintaining a DNS record expiry DC wide for e.g. google.com resolves to X.X.X.X and I should be able to resolve to this same IP address on 100's of DNS client and in case if the record changes, then it should immediately change/expire on client.
Any other proxy/socks solution which should be transparent to applications for forward proxy

I went ahead with this solution in case anyone is curious to see the solution.
I used PowerDNS Auth Server with Pipe backend. The requests would land to PowerDNS server for resolution, it will pass on all the data to Pipe backend script with ABI, the script analysis the requests, sees if it is present under cached variable/memory map, if it is cache hit, it will respond using cached DNS records else it will use a DNS resolver to resolve that query like a resolver resolves normally.
PowerDNS version lower than 4.1 supports Pipe backend + resolver. This way, the request would first land to pipe backend script, if the script doesn't have any entries cached, it will not respond or will respond blank and then PowerDNS would resolve it with the mentioned resolver server in the configuration. However with version 4.1 and above, the resolver part is removed from PowerDNS Auth server hence you need to handle that behaviour via Pipe backend script.

It depends on your client. Firefox, for example, sends hostname to SOCKS proxy without resolving it. You can confirm that by Wireshark.
PS. assume you are using a SOCKS5/4a proxy. SOCKS4 does not support hostname. Ref: https://en.wikipedia.org/wiki/SOCKS#SOCKS4a

Not able to receive and forward remote request using Charles Web Proxy as a Reverse Proxy

I am trying to capture an old application that didn't honour the system's proxy setting. The only config I can change is the server IP address.
Capturing the packets with Wireshark. Without the Charles reverse proxy, I can see requests after the first three handshake requests.
With the reverse proxy, the connection stuck after the handshake requests.
I notice that when Charles received a request and connecting to somewhere but it will just stuck there:
Following is the config of the reverse proxy (Remote host removed):
Any help, solution and workarounds would be appreciated!

First of all, your app uses neither HTTP nor HTTPS. Studying screen shot of successful connection gives some details on protocol used:
the first message after handhsake is originated by server contrary to common client-server approach, where client is responsible for sending query. This fact is enough to cross out HTTP and HTTPS.
payload data isn't human-readable, so it's a binary protocol.
based on PUSH flags, protocol is much more likely to be message-based rather than stream-based
So client establishes connection, immediately gets some command from server and replies it. Then communication continues. I can't guess exact protocol. Port number might be irrelevant, but even if it's not, there are only few protocols using 4321 port by default. Anyway, it can always be custom private protocol.
I'm not familiar with Charles, but forwarding arbitrary TCP stream is probably covered by its port forwarding feature rather than reverse proxy. However, I don't really see any benefits in sending traffic through Charles in this case, capturing data on your PC should be enough to study details.
If you are looking for traffic manipulation, for arbitrary TCP stream it's not an easy task, but it must be possible. I'm not aware of suitable tools, quick googling shows lots of utils, but some of them looks applicable to text based stream only, so deeper study is required.

Reason for Failure
It may be because you are requesting a local IP address from a remote scope, which Charles proxy doesn't applies. For POS(Proof Of Statement), please refer to the below link
https://www.charlesproxy.com/documentation/faqs/localhost-traffic-doesnt-appear-in-charles/
Solution
So In order to solve the problem for the current scenario, use
http://192.168.86.22.charlesproxy.com/
Note: The url that you request will only be proxied properly by Charles not any other proxy services.

GKE + WebSocket + NodePort 30s dropped connections

I have a golang service that implements a WebSocket client using gorilla that is exposed to a Google Container Engine (GKE)/k8s cluster via a NodePort (30002 in this case).
I've got a manually created load balancer (i.e. NOT at k8s ingress/load balancer) with HTTP/HTTPS frontends (i.e. 80/443) that forward traffic to nodes in my GKE/k8s cluster on port 30002.
I can get my JavaScript WebSocket implementation in the browser (Chrome 58.0.3029.110 on OSX) to connect, upgrade and send / receive messages.
I log ping/pongs in the golang WebSocket client and all looks good until 30s in. 30s after connection my golang WebSocket client gets an EOF / close 1006 (abnormal closure) and my JavaScript code gets a close event. As far as I can tell, neither my Golang or JavaScript code is initiating the WebSocket closure.
I don't particularly care about session affinity in this case AFAIK, but I have tried both IP and cookie based affinity in the load balancer with long lived cookies.
Additionally, this exact same set of k8s deployment/pod/service specs and golang service code works great on my KOPS based k8s cluster on AWS through AWS' ELBs.
Any ideas where the 30s forced closures might be coming from? Could that be a k8s default cluster setting specific to GKE or something on the GCE load balancer?
Thanks for reading!
-- UPDATE --
There is a backend configuration timeout setting on the load balancer which is for "How long to wait for the backend service to respond before considering it a failed request".
The WebSocket is not unresponsive. It is sending ping/pong and other messages right up until getting killed which I can verify by console.log's in the browser and logs in the golang service.
That said, if I bump the load balancer backend timeout setting to 30000 seconds, things "work".
Doesn't feel like a real fix though because the load balancer will continue to feed actual unresponsive services traffic inappropriately, never mind if the WebSocket does become unresponsive.
I've isolated the high timeout setting to a specific backend setting using a path map, but hoping to come up with a real fix to the problem.

I think this may be Working as Intended. Google just updated the documentation today (about an hour ago).
LB Proxy Support docs
Backend Service Components docs
Cheers,
Matt

Check out the following example: https://github.com/kubernetes/ingress-gce/tree/master/examples/websocket

Loadbalancing web sockets - AWS Elastic Loadbalancer

I have a question about how to load balance web sockets with AWS elastic load balancer.
I have 2 EC2 instances behind AWS elastic load balancer.
When any user login, the user session will be established with one of the server, say EC2 instance1. Now, all the requests from the same user will be routed to EC2 instance1.
Now, I have a different stateless request coming from a different system. This request will have userId in it. This request might end up going to a EC2 instance2. We are supposed to send a notification to the user based on the userId in the request.
Now,
1) Assume, the user session is with the EC2 instance1, but the notification is originating from the EC2 instance2.
I am not sure how to notify the user browser in this case.
2) Is there any limitation on the websocket connection like 64K and how to overcome with multiple servers, since user is coming thru Load balancer.
Thanks

You will need something else to notify the browser's websocket's server end about the event coming from the other system. There are a couple of publish-subscribe based solution which might help, but without knowing more details it is a bit hard to figure out which solution fits the best. Redis is generally a good answer, and Elasticache supports it.
I found this regarding to AWS ELB's limits:
http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_elastic_load_balancer
But none of them seems to be related to your question.

Websocket requests start with HTTP communication before handing over to websockets. In theory if you could include a cookie in that initial HTTP request then the sticky session features of ELB would allow you to direct websockets to specific EC2 instances. However, your websocket client may not support this.
A preferred solution would be to make your EC2 instances stateless. Store the websocket session data in AWS Elasticache (Either Redis or Memcached) and then incoming connections will be able to access the session regardless of which EC2 instance is used.
The advantage of this solution is that you remove the dependency on individual EC2 instances and your application will scale and handle failures better.
If the ELB has too many incoming connections, then it should scale automatically. Although I can't find a reference for that. ELB's are relatively slow to scale - minutes rather than seconds, if you are expecting surges in traffic then AWS can "pre-warm" more ELB resource for you. This is done via support requests.
Also, factor in the ELB connection time out. By default this is 60 seconds, it can be increased via the AWS console or API. Your application needs to send at least 1 byte of traffic before the timeout or the ELB will drop the connection.

Recently had to hook up crossbar.io websockets with ALB. Basically there are two things to consider. 1) You need to set stickiness to 1 day on the target group attributes. 2) You either need something on the same port that returns static webpage if connection is not upgraded, or a separate port serving a static webpage with a custom health check specifying that port on the target group. Go for a ALB over ELB, ALB's have support for ws:// and wss://, they only lack the health check over websockets.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio