How do I enable a port on Google Kubernetes Engine to accept websocket connections? Is there a way of doing so other than using an ingress controller?
WebSockets are supported by Google's global load balancer, so you can use a k8s Service of type LoadBalancer to expose such a service beyond your cluster.
Do be aware that load balancers created and managed outside Kubernetes in this way have a default connection timeout of 30 seconds, which interferes with WebSocket operation: long-lived connections are closed every 30 seconds, making WebSockets nearly unusable out of the box.
Until this issue is resolved, you will either need to raise that timeout manually (it is the backend service timeout, adjustable in the Cloud Console or via gcloud), or (recommended) consider an in-cluster ingress controller (e.g. nginx), which affords you more control.
As per this article in the GCP documentation, there are four ways to expose a Service to external applications: with a ClusterIP, a NodePort, a (TCP/UDP) LoadBalancer, or an ExternalName.
Background
I come from an HAProxy background, and recently there has been a lot of hype around the "Service Mesh" architecture. Long story short, I began to learn Envoy and Consul.
My understanding so far is that Envoy is proxy software deployed as a sidecar to abstract inbound/outbound networking, with the xDS APIs as the data plane's source of truth (clusters, routes, filters, etc.). Consul provides service discovery, segmentation, and so on. It also abstracts the network and has a data plane, but Consul can't do complex load balancing or filter-based routing the way Envoy does.
Standalone, I can understand how each works and how to set it up, since the documentation is relatively good. But it quickly becomes a headache when I want to integrate Envoy and Consul, since the documentation for both lacks specifics on integration, use cases, and best practices.
Schematic
Consider the following simple infrastructure design:
Legend:
CS: Consul Server
CA: Consul Agent
MA: Microservice A
MB: Microservice B
MC: Microservice C
EF: Envoy Front Facing / Edge Proxy
Questions
Following are my questions:
1. In the case of multi-instance microservices, standalone Consul will randomize its round-robin responses. With the Envoy and Consul integration, how does Consul handle multi-instance microservices? Which software does the load balancing?
2. Consul has the Consul server to store its data. Envoy, however, does not seem to have an "Envoy server" to store its data, so where is its data stored and how is it distributed across multiple instances?
3. What about an Envoy cluster (a logical group of Envoy front-facing proxies, NOT a cluster of services)? How is the leader elected?
4. As I mentioned above, run separately, Consul and Envoy each have their own sidecar/agent on every machine. I have read that when integrated, Consul injects the Envoy sidecar, but there is no further information on how this works.
5. If Envoy uses the Consul server as its xDS, what if, for example, I want to add an advanced filter so that a certain URL segment must be forwarded to a certain instance?
6. If Envoy uses the Consul server as its xDS, what if I have another machine and services that (for some reason) are not managed by the Consul server? How do I configure Envoy to add filters, clusters, etc. for that machine and those services?
Thank you! I'm excited, and I hope this thread can be helpful to others too.
Apologies for the late reply. I figure it's better late than never. :-)
1. If you are only using Consul for service discovery and directly querying it via DNS, then Consul will randomize the IP addresses returned to the client. If you're querying the HTTP interface, it is up to the client to implement a load-balancing strategy based on the hosts returned in the response. When you're using Consul service mesh, load balancing is handled entirely by Envoy.
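To illustrate the HTTP-interface case, where the client itself picks an instance, here is a minimal Go sketch using the official hashicorp/consul/api client; the service name "web" and the random pick are illustrative assumptions, not anything from this thread:

```go
package main

import (
	"fmt"
	"log"
	"math/rand"

	consulapi "github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default 127.0.0.1:8500).
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Ask Consul for all healthy ("passing") instances of the service.
	entries, _, err := client.Health().Service("web", "", true, nil)
	if err != nil || len(entries) == 0 {
		log.Fatal("no healthy instances: ", err)
	}

	// The load-balancing strategy is up to the client; here, a random pick.
	svc := entries[rand.Intn(len(entries))].Service
	fmt.Printf("dialing %s:%d\n", svc.Address, svc.Port)
}
```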
2. Consul is an xDS server. The data is stored within Consul and distributed to the agents within the cluster. See the Connect Architecture docs for more information.
3. Envoy clusters are similar to backend server pools. Proxies contain a cluster for each upstream service, and within each cluster there are endpoints which represent the individual proxy instances for that upstream service.
4. Consul can inject the Envoy sidecar when it is deployed on Kubernetes. It does this through a Kubernetes mutating admission webhook. See Connect Sidecar on Kubernetes: Installation and Configuration for more information.
5. Consul supports advanced layer 7 routing features. You can configure a service-router to route requests to different destinations based on URL path, headers, query parameters, etc.
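For example, a service-router can be written as a config entry through the hashicorp/consul/api client; the service names here ("web", "web-admin") are hypothetical:

```go
package main

import (
	"log"

	consulapi "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Route /admin requests for "web" to a hypothetical "web-admin"
	// service; everything else keeps resolving to "web" as usual.
	router := &consulapi.ServiceRouterConfigEntry{
		Kind: consulapi.ServiceRouter,
		Name: "web",
		Routes: []consulapi.ServiceRoute{
			{
				Match: &consulapi.ServiceRouteMatch{
					HTTP: &consulapi.ServiceRouteHTTPMatch{PathPrefix: "/admin"},
				},
				Destination: &consulapi.ServiceRouteDestination{Service: "web-admin"},
			},
		},
	}

	if _, _, err := client.ConfigEntries().Set(router, nil); err != nil {
		log.Fatal(err)
	}
}
```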
6. Consul has an upcoming feature in version 1.8 called Terminating Gateways which may enable this use case. See the GitHub issue "Connect: Terminating (External Service) Gateways" (hashicorp/consul#6357) for more information.
I have a Go service which is pluggable and, based on the plugin, serves different purposes. At startup it registers its address and port in Consul's service discovery, along with tags like plugin1, plugin2, plugin3, etc.
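The registration looks roughly like this (a sketch using the official hashicorp/consul/api client; the ID, address, and port are placeholders):

```go
package main

import (
	"log"

	consulapi "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// Register this instance under one shared name, tagged with the
	// plugins it implements, so clients can filter by tag.
	err = client.Agent().ServiceRegister(&consulapi.AgentServiceRegistration{
		ID:      "plugin-service-1", // must be unique per instance
		Name:    "plugin-service",
		Address: "10.0.0.5", // placeholder
		Port:    9090,       // placeholder
		Tags:    []string{"plugin1", "plugin2"},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```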
On the client side I want to connect to the services based on the plugin they implement. I can get them from Consul, call grpc.Dial, and pass the connection to the proto's client implementation. When I've called the right endpoint, I just close the connection and move on.
The problem with this is that the connection from grpc.Dial is reusable, so it isn't necessary to dial every time I want to talk to a service; but the environment is too dynamic to hold a persistent connection to every service, because services can be removed and new services can appear.
What is the best way to have persistent connections to the services given these requirements/problems?
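One pattern that fits these constraints is to cache one *grpc.ClientConn per address and evict entries when Consul stops reporting the instance; a ClientConn is safe for concurrent use and reconnects on its own. A rough sketch (the eviction trigger, e.g. a periodic Consul query or blocking watch, is left out):

```go
package main

import (
	"log"
	"sync"

	"google.golang.org/grpc"
)

// connPool caches one ClientConn per target address.
type connPool struct {
	mu    sync.Mutex
	conns map[string]*grpc.ClientConn
}

func newConnPool() *connPool {
	return &connPool{conns: make(map[string]*grpc.ClientConn)}
}

// get returns a cached connection, dialing a new one if needed.
func (p *connPool) get(addr string) (*grpc.ClientConn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if cc, ok := p.conns[addr]; ok {
		return cc, nil
	}
	cc, err := grpc.Dial(addr, grpc.WithInsecure()) // plaintext for brevity
	if err != nil {
		return nil, err
	}
	p.conns[addr] = cc
	return cc, nil
}

// evict closes connections to addresses Consul no longer reports.
func (p *connPool) evict(live map[string]bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for addr, cc := range p.conns {
		if !live[addr] {
			cc.Close()
			delete(p.conns, addr)
		}
	}
}

func main() {
	pool := newConnPool()
	cc, err := pool.get("10.0.0.5:9090") // address as returned by Consul (placeholder)
	if err != nil {
		log.Fatal(err)
	}
	_ = cc // pass cc to the generated proto client, e.g. pb.NewPluginClient(cc)
}
```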
I have a golang service that implements a WebSocket client using gorilla/websocket, exposed to a Google Container Engine (GKE)/k8s cluster via a NodePort (30002 in this case).
I've got a manually created load balancer (i.e. NOT a k8s ingress/load balancer) with HTTP/HTTPS frontends (i.e. 80/443) that forwards traffic to the nodes in my GKE/k8s cluster on port 30002.
I can get my JavaScript WebSocket implementation in the browser (Chrome 58.0.3029.110 on OS X) to connect, upgrade, and send/receive messages.
I log ping/pongs in the golang WebSocket client and all looks good until 30s in. 30s after connecting, my golang WebSocket client gets an EOF / close 1006 (abnormal closure) and my JavaScript code gets a close event. As far as I can tell, neither my golang nor my JavaScript code is initiating the WebSocket closure.
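For reference, the ping/pong logging on the Go side looks roughly like this (trimmed down; the URL is a placeholder):

```go
package main

import (
	"log"
	"time"

	"github.com/gorilla/websocket"
)

func main() {
	conn, _, err := websocket.DefaultDialer.Dial("ws://example.com/ws", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Ping/pong control frames are delivered while ReadMessage blocks.
	conn.SetPingHandler(func(data string) error {
		log.Println("ping received")
		// WriteControl is safe to call concurrently with other writes.
		return conn.WriteControl(websocket.PongMessage, []byte(data), time.Now().Add(time.Second))
	})
	conn.SetPongHandler(func(string) error {
		log.Println("pong received")
		return nil
	})

	for {
		if _, _, err := conn.ReadMessage(); err != nil {
			log.Println("read error:", err) // e.g. close 1006 at ~30s
			return
		}
	}
}
```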
I don't particularly care about session affinity in this case AFAIK, but I have tried both IP- and cookie-based affinity in the load balancer with long-lived cookies.
Additionally, this exact same set of k8s deployment/pod/service specs and golang service code works great on my kops-based k8s cluster on AWS through AWS ELBs.
Any ideas where the 30s forced closures might be coming from? Could that be a k8s default cluster setting specific to GKE, or something on the GCE load balancer?
Thanks for reading!
-- UPDATE --
There is a backend configuration timeout setting on the load balancer, described as "how long to wait for the backend service to respond before considering it a failed request".
The WebSocket is not unresponsive: it is sending ping/pong and other messages right up until it gets killed, which I can verify via console.log's in the browser and logs in the golang service.
That said, if I bump the load balancer's backend timeout setting to 30000 seconds, things "work".
That doesn't feel like a real fix, though, because the load balancer will keep feeding traffic to genuinely unresponsive services inappropriately, never mind what happens if the WebSocket itself does become unresponsive.
I've isolated the high timeout setting to a specific backend using a path map, but I'm hoping to come up with a real fix to the problem.
I think this may be Working as Intended. Google just updated the documentation today (about an hour ago).
LB Proxy Support docs
Backend Service Components docs
Cheers,
Matt
Check out the following example: https://github.com/kubernetes/ingress-gce/tree/master/examples/websocket
I have a question about how to load balance WebSockets with AWS Elastic Load Balancer.
I have 2 EC2 instances behind an AWS Elastic Load Balancer.
When a user logs in, the user's session is established with one of the servers, say EC2 instance 1. From then on, all requests from that user are routed to EC2 instance 1.
Now I have a different, stateless request coming from another system. This request has a userId in it, and it might end up going to EC2 instance 2. We are supposed to send a notification to the user based on the userId in the request.
Now,
1) Assume the user's session is with EC2 instance 1, but the notification originates from EC2 instance 2.
I am not sure how to notify the user's browser in this case.
2) Is there any limit on WebSocket connections, like 64K, and how do we get past it with multiple servers, given that users come through the load balancer?
Thanks
You will need something else to notify the server end of the browser's WebSocket about the event coming from the other system. There are a couple of publish-subscribe based solutions which might help, but without knowing more details it is hard to say which fits best. Redis is generally a good answer, and ElastiCache supports it.
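For instance, with Redis pub/sub every instance subscribes to a channel, and whichever instance holds the user's WebSocket forwards the event to the browser. A minimal Go sketch using the go-redis client (the channel name, payload shape, and endpoint address are made up):

```go
package main

import (
	"context"
	"log"

	"github.com/go-redis/redis/v8"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "my-elasticache:6379"})

	// Every instance subscribes; the one holding the user's
	// WebSocket forwards the event to the browser.
	sub := rdb.Subscribe(ctx, "user-events")
	go func() {
		for msg := range sub.Channel() {
			log.Println("event received:", msg.Payload)
			// Look up the local WebSocket by userId and write to it here.
		}
	}()

	// The instance that receives the stateless request publishes.
	if err := rdb.Publish(ctx, "user-events", "{\"userId\":\"42\"}").Err(); err != nil {
		log.Fatal(err)
	}
	select {} // block forever in this sketch
}
```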
I found this regarding AWS ELB's limits:
http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_elastic_load_balancer
But none of them seems related to your question.
WebSocket requests start with HTTP communication before handing over to the WebSocket protocol. In theory, if you could include a cookie in that initial HTTP request, then the sticky-session feature of ELB would let you direct WebSockets to specific EC2 instances. However, your WebSocket client may not support this.
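If the client does allow it, setting the cookie on the upgrade request looks like this in Go with gorilla/websocket (the URL is a placeholder; AWSELB is the name of ELB's duration-based stickiness cookie, and the "..." value would come from a prior response):

```go
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

func main() {
	// Send a cookie with the initial HTTP upgrade request so ELB's
	// sticky sessions can route us to the same EC2 instance.
	header := http.Header{}
	header.Add("Cookie", "AWSELB=...") // value captured from a prior response

	conn, _, err := websocket.DefaultDialer.Dial("wss://example.com/ws", header)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```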
A preferred solution would be to make your EC2 instances stateless. Store the WebSocket session data in AWS ElastiCache (either Redis or Memcached); then incoming connections can access the session regardless of which EC2 instance serves them.
The advantage of this approach is that you remove the dependency on individual EC2 instances, and your application will scale and handle failures better.
If the ELB has too many incoming connections, it should scale automatically, although I can't find a reference for that. ELBs are relatively slow to scale (minutes rather than seconds), so if you are expecting surges in traffic, AWS can "pre-warm" additional ELB capacity for you. This is arranged via support requests.
Also factor in the ELB idle connection timeout. By default this is 60 seconds; it can be increased via the AWS console or API. Your application needs to send at least 1 byte of traffic before the timeout expires, or the ELB will drop the connection.
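A simple way to stay inside that idle timeout is to send a WebSocket ping on a timer, well within the 60-second window. For example, in Go with gorilla/websocket (the 30-second interval is an assumption):

```go
package main

import (
	"log"
	"time"

	"github.com/gorilla/websocket"
)

// keepAlive pings the peer on a timer so the ELB idle timeout (60s by
// default) never sees a silent connection. WriteControl is safe to call
// concurrently with other writes on a gorilla connection.
func keepAlive(conn *websocket.Conn, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		deadline := time.Now().Add(5 * time.Second)
		if err := conn.WriteControl(websocket.PingMessage, nil, deadline); err != nil {
			log.Println("keepalive failed:", err)
			return
		}
	}
}

func main() {
	conn, _, err := websocket.DefaultDialer.Dial("wss://example.com/ws", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	go keepAlive(conn, 30*time.Second) // well inside ELB's 60s default
	// ... normal read loop would go here ...
	select {}
}
```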
Recently I had to hook up crossbar.io WebSockets with an ALB. Basically there are two things to consider:
1) You need to set stickiness to 1 day on the target group attributes.
2) You either need something on the same port that returns a static web page when the connection is not upgraded, or a separate port serving a static web page, with a custom health check on the target group pointing at that port.
Go for an ALB over a classic ELB: ALBs support ws:// and wss://, and they only lack health checks over WebSockets.
I'm considering using G-WAN for a backend game server. Although G-WAN can handle a lot of requests, I want to make it scale automatically. G-WAN has an elastic load balancer. Are there examples of how that should be set up in code/deployment?
IMO, load balancing is a function of the cloud or data center you are working with, not of G-WAN.
In Microsoft's Azure, which is good in that it offers Linux VMs, you set up an endpoint (essentially a port, like 8080) as a load-balanced endpoint that terminates at that port on each VM.
Set up your G-WAN on port 8080.
Set up a load-balanced endpoint on port 8080.
Point the load balancer at the G-WAN port 8080.
Clients then hold sessions with either VM1 or VM2.
Auto-scaling is a function of Azure's availability sets.
I'm sure a similar process is offered on Amazon and Rackspace.