AWS Neptune connection pool settings

We're using a connection pool to communicate with AWS Neptune from an AWS Lambda. With this setup, we are experiencing various connection problems. They usually happen after a maintenance window and require a Neptune restart to fix.
For example, below is an error raised in a Python Lambda after an automatic SSL certificate rollout in AWS Neptune:
Max retries exceeded with url: /endpoint/ (Caused by SSLError(SSLCertVerificationError(1,
'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))
This behavior seems to be related to the Neptune endpoint functionality and is mentioned in the AWS docs:
A custom endpoint for a Neptune cluster represents a set of DB instances that you choose. When you connect to the endpoint, Neptune chooses one of the instances in the group to handle the connection.
When you add a DB instance to a custom endpoint or remove it from a custom endpoint, any existing connections to that DB instance remain active.
As long as a connection is still considered valid, it is not removed from the pool, even though it no longer functions.
My question: how can I configure the HTTP connection pool on the client side to address this behavior? Is there a way to check a Neptune connection before using it?

The general best practice is to assume a connection is still alive/valid and to catch the error and reconnect when you encounter these sorts of exceptions. Example Lambda function architectures (mainly for Gremlin, but other query languages would have similar patterns) are shown here: https://docs.aws.amazon.com/neptune/latest/userguide/lambda-functions-examples.html
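As an illustration, here is a minimal sketch of that catch-and-reconnect pattern for a Python Lambda using gremlinpython. The environment variable name and the broad except Exception are placeholders; the AWS examples linked above distinguish retriable errors more carefully.

import os
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

NEPTUNE_URL = f"wss://{os.environ['NEPTUNE_ENDPOINT']}:8182/gremlin"

conn = None
g = None

def connect():
    # (Re)create the remote connection and traversal source.
    global conn, g
    conn = DriverRemoteConnection(NEPTUNE_URL, "g")
    g = traversal().withRemote(conn)

def handler(event, context):
    if g is None:
        connect()
    try:
        return g.V().limit(1).count().next()
    except Exception:
        # Stale pooled connection (e.g. after a cert rollout or failover):
        # close it, reconnect once, and retry the query.
        try:
            conn.close()
        except Exception:
            pass
        connect()
        return g.V().limit(1).count().next()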

Related

Unable to establish a websocket connection for GraphQL subscription

I am trying to implement a GraphQL WebSocket-based subscription on a server (using NestJS @Subscription). The server is hosted on AWS ECS and is behind an ALB.
We currently have an AWS API GW connection via VPC-link to our ALB.
I tried to build a dedicated Websocket API GW with the same VPC link we use in the HTTP API GW.
I also tried to spin up a new NLB (Network Load Balancer) over our ECS and a new REST VPC link to be used in the dedicated Websocket API GW.
The client and server communicate over the graphql-transport-ws sub-protocol using the graphql-ws library, and the communication works fine on a localhost setup.
When running the following command on localhost, I am able to establish a websocket connection:
wscat -c ws://localhost:3000/graphql -s graphql-transport-ws
When running the same against the WebSocket API GW URL
wscat -c wss://*****.execute-api.*****.amazonaws.com/**** -s graphql-transport-ws
I’m getting this:
error: Server sent no subprotocol
The error indicates a problem with the sub-protocol, so when I remove the sub-protocol, a connection is established and I get a prompt:
Connected (press CTRL+C to quit)
>
However, there’s no indication of reaching the server and it seems like the connection is only made with the WebSocket API GW itself.
When I circumvent the gateway and connect directly to an internet-facing NLB, I'm able to establish a WebSocket connection.
I am not a super WebSocket expert, but I understand WebSocket connections are terminated by the API Gateway and cannot be used as a connection pass-through. You can forward WebSocket events using an AWS_PROXY integration to a GraphQL server backend, BUT it's not a maintained direct connection: API Gateway terminates the connection, sends events towards the backend integration, and will not return the integration response to the WebSocket, since it is event-driven and not a connection-oriented service - hence the “error: Server sent no subprotocol” you are seeing.
So to use API GW as the WebSocket layer, you would need to build out connection management functionality somewhere to manage the event-based nature of the APIGW and send data out to the APIGW connections, or adjust the integration mechanism within the GraphQL server to utilise the @connections functionality to send responses/notifications to WebSocket consumers (a sketch follows the links below).
Integrating Backend Service documentation:
https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-websocket-api-routes-integrations.html
Sending responses to a connected client:
https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-how-to-call-websocket-api-connections.html
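For illustration, a minimal sketch of that send-out step in Python with boto3's apigatewaymanagementapi client; the endpoint URL and the connection-id store are placeholders you would wire up to your own @connections management.

import json
import boto3

# The endpoint URL is your WebSocket API's connection URL (placeholder here).
client = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://<api-id>.execute-api.<region>.amazonaws.com/<stage>",
)

def notify(connection_id, payload):
    try:
        client.post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps(payload).encode("utf-8"),
        )
    except client.exceptions.GoneException:
        # The client has disconnected; drop the stale connection id
        # from wherever you store it (e.g. a DynamoDB table).
        pass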
API GW WebSockets are great for building custom solutions, but they take some effort since you will be configuring the event handling yourself.
For a GraphQL API on AWS, I would recommend taking a look at AppSync, which is an AWS-managed GraphQL service. It handles GraphQL subscriptions via WebSockets natively with zero additional code, it's highly scalable out of the box, and it would simplify the GraphQL hosting burden of an ECS-based solution.
I suspect there may be plenty of reasons for needing to build on the existing GraphQL-on-ECS setup, so I understand it's not always possible to pivot to something like AppSync. The NLB solution you tried seems fine within the existing ECS backend landscape and, as you have noted, it is connection-oriented (via the NLB), so it will achieve the outcome you are after.

Lambda opening RDS proxy connections

I am integrating a Lambda (Python) with RDS Proxy, and in some of the examples on the web I can see people initiating the connection outside the handler and never closing it.
On the other hand, I have seen examples in which people initiate the connection within the handler.
So, what is the best practice? I assume that if you go for the first approach, the idle client connection timeout (RDS Proxy) should be short, otherwise you can exceed the connection limit.
If we create the connection outside the handler, then the same connection can be reused for warm starts, which is good to have if the idle timeout is set appropriately (see the sketch below).
If the connection is created inside the handler, then it will have the overhead of creating and closing a connection for each invocation.
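A minimal sketch of the outside-the-handler pattern with a staleness check, assuming PyMySQL and placeholder environment variable names; adapt for your driver of choice.

import os
import pymysql

conn = None

def get_connection():
    # Reuse the module-level connection across warm starts,
    # reconnecting if it has gone stale.
    global conn
    if conn is not None:
        try:
            conn.ping(reconnect=True)  # PyMySQL re-opens a dead connection
            return conn
        except pymysql.MySQLError:
            conn = None
    conn = pymysql.connect(
        host=os.environ["PROXY_ENDPOINT"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
        connect_timeout=5,
    )
    return conn

def handler(event, context):
    with get_connection().cursor() as cur:
        cur.execute("SELECT 1")
        return cur.fetchone()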

Websocket connection using AWS Lambda + API Gateway

I have a React application and I would like to set up a websocket connection to my backend for some realtime updates. I was going to deploy an EC2 instance or an ECS cluster to host the websocket connections. Then I stumbled onto some articles showing how a websocket connection can be set up in a serverless manner.
One example: https://medium.com/@likhita507/real-time-chat-application-using-webscockets-in-apigateway-e3ed759c4740
However, I can't seem to figure out how this works for a few reasons.
Lambda has a max runtime of 15 min.
How does the backend establish a connection when no Lambda is running and the backend wants to push a message to the frontend?
Does this mean I have to keep a Lambda alive all the time? If so, it no longer feels like a good idea. In the above example, what I can't grasp is: when creating that chat application, can each chat room only exist for 15 minutes? And if a user disconnects from the room, how will that user be updated on new messages?
Does anyone have any experience with this kind of solution?
It's the API Gateway that keeps the websocket connection alive. The browser (or whatever your client is) is connecting to the Gateway, not the lambda function.
The gateway triggers the Lambda function. You hook this up by selecting LAMBDA_PROXY from Integration Request. You can connect each route to a separate function, or have them all dealt with by one, whichever you prefer. Unless you're doing something very complicated in the function, it should only be executing for a few ms.
Communicating from the function to the original client is done through the gateway too, with APIGatewayManagementAPI.postToConnection (or you could roll your own HTTP version using the connection URL, I guess).
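For illustration, a hedged sketch of a single Python Lambda handling the $connect, $disconnect, and $default routes, persisting connection ids in a DynamoDB table so the backend can push messages later; the table name and key are assumptions.

import boto3

# Placeholder table holding one item per live connection.
table = boto3.resource("dynamodb").Table("websocket-connections")

def handler(event, context):
    route = event["requestContext"]["routeKey"]
    connection_id = event["requestContext"]["connectionId"]

    if route == "$connect":
        table.put_item(Item={"connectionId": connection_id})
    elif route == "$disconnect":
        table.delete_item(Key={"connectionId": connection_id})
    else:
        # $default or a custom route: handle the inbound message here,
        # e.g. fan it out with post_to_connection to other connections.
        pass

    return {"statusCode": 200}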

Loadbalancing web sockets - AWS Elastic Loadbalancer

I have a question about how to load balance web sockets with AWS elastic load balancer.
I have 2 EC2 instances behind AWS elastic load balancer.
When a user logs in, the user session will be established with one of the servers, say EC2 instance1. From then on, all requests from the same user will be routed to EC2 instance1.
Now, I have a different, stateless request coming from a different system. This request will have a userId in it. This request might end up going to EC2 instance2. We are supposed to send a notification to the user based on the userId in the request.
Now,
1) Assume the user session is with EC2 instance1, but the notification originates from EC2 instance2.
I am not sure how to notify the user's browser in this case.
2) Is there any limit on the number of websocket connections, like 64K, and how do we overcome it with multiple servers, given that users come through the load balancer?
Thanks
You will need something else to notify the browser's websocket server end about the event coming from the other system. There are a couple of publish-subscribe based solutions which might help, but without knowing more details it is a bit hard to figure out which one fits best. Redis is generally a good answer, and ElastiCache supports it.
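As a rough sketch of that pub/sub idea in Python with redis-py (the ElastiCache hostname and channel naming are placeholders): instance2 publishes the notification, and whichever instance holds the user's websocket subscribes and forwards it.

import redis

# Placeholder ElastiCache Redis endpoint.
r = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)

# On EC2 instance2, where the stateless request arrives:
def publish_notification(user_id, message):
    r.publish(f"user:{user_id}", message)

# On the instance holding the user's websocket session:
def listen_for_user(user_id, websocket_send):
    pubsub = r.pubsub()
    pubsub.subscribe(f"user:{user_id}")
    for item in pubsub.listen():
        if item["type"] == "message":
            websocket_send(item["data"])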
I found this regarding AWS ELB's limits:
http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_elastic_load_balancer
But none of them seems to be related to your question.
Websocket requests start with HTTP communication before handing over to websockets. In theory, if you could include a cookie in that initial HTTP request, then the sticky-session feature of ELB would allow you to direct websockets to specific EC2 instances. However, your websocket client may not support this.
A preferred solution would be to make your EC2 instances stateless. Store the websocket session data in AWS Elasticache (Either Redis or Memcached) and then incoming connections will be able to access the session regardless of which EC2 instance is used.
The advantage of this solution is that you remove the dependency on individual EC2 instances and your application will scale and handle failures better.
If the ELB has too many incoming connections, then it should scale automatically, although I can't find a reference for that. ELBs are relatively slow to scale - minutes rather than seconds - so if you are expecting surges in traffic, AWS can "pre-warm" more ELB resources for you. This is done via support requests.
Also, factor in the ELB connection timeout. By default this is 60 seconds; it can be increased via the AWS console or API. Your application needs to send at least 1 byte of traffic before the timeout, or the ELB will drop the connection (see the keepalive sketch below).
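A small sketch of such an application-level keepalive in Python, assuming the websockets library; the 30-second interval and URL are placeholders chosen to stay under the 60-second default idle timeout.

import asyncio
import websockets

async def keepalive(ws, interval=30):
    # A ping is enough traffic to reset the ELB's idle timer.
    while True:
        await asyncio.sleep(interval)
        await ws.ping()

async def client():
    async with websockets.connect("wss://example.com/ws") as ws:
        asyncio.create_task(keepalive(ws))
        async for message in ws:
            print(message)

asyncio.run(client())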
Recently I had to hook up crossbar.io websockets with an ALB. Basically there are two things to consider:
1) You need to set stickiness to 1 day on the target group attributes.
2) You either need something on the same port that returns a static webpage if the connection is not upgraded, or a separate port serving a static webpage, with a custom health check specifying that port on the target group.
Go for an ALB over an ELB: ALBs have support for ws:// and wss://; they only lack health checks over websockets.

What does the Amazon ELB automatic health check do and what does it expect?

Here is the thing:
We've implemented a C++ RESTful API server, with a built-in HTTP parser and no standard HTTP server like Apache or anything of the kind.
It has been in use for several months on Amazon infrastructure, using both plain and SSL communications, and no problems related to the Amazon infrastructure have been identified.
We are deploying our first backend using Amazon ELB.
Amazon ELB has a customizable health check system but also an automatic one, as stated here.
We've found no documentation of what data is sent by the health check system.
The backend simply hangs on the socket read instruction and, eventually, the connection is closed.
I'm not looking for a solution to the problem, since the backend is not based on a standard web server - just whether someone knows what kind of message is being sent by the ELB health check system, since we've found no documentation about this anywhere.
Help is much appreciated. Thank you.
Amazon ELB has a customizable health check system but also an automatic one, as stated here
With customizable you are presumably referring to the health check configurable via the AWS Management Console (see Configure Health Check Settings) or via the API (see ConfigureHealthCheck).
The requirements to pass health checks configured this way are outlined in field Target of the HealthCheck data type documentation:
Specifies the instance being checked. The protocol is either TCP, HTTP, HTTPS, or SSL. The range of valid ports is one (1) through 65535.
Note
TCP is the default, specified as a TCP: port pair, for example "TCP:5000". In this case a healthcheck simply attempts to open a TCP connection to the instance on the specified port. Failure to connect within the configured timeout is considered unhealthy.
SSL is also specified as an SSL: port pair, for example, SSL:5000.
For HTTP or HTTPS protocol, the situation is different. You have to include a ping path in the string. HTTP is specified as a HTTP:port;/;PathToPing; grouping, for example "HTTP:80/weather/us/wa/seattle". In this case, a HTTP GET request is issued to the instance on the given port and path. Any answer other than "200 OK" within the timeout period is considered unhealthy.
The total length of the HTTP ping target needs to be 1024 16-bit Unicode characters or less.
[emphasis mine]
With automatic you are presumably referring to the health check described in paragraph Cause within Why is the health check URL different from the URL displayed in API and Console?:
In addition to the health check you configure for your load balancer, a second health check is performed by the service to protect against potential side-effects caused by instances being terminated without being deregistered. To perform this check, the load balancer opens a TCP connection on the same port that the health check is configured to use, and then closes the connection after the health check is completed. [emphasis mine]
The paragraph Solution clarifies that the payload is zero here, i.e. it is similar to the non-HTTP/HTTPS method described for the configurable health check above:
This extra health check does not affect the performance of your application because it is not sending any data to your back-end instances. You cannot disable or turn off this health check.
Summary / Solution
Assuming your RESTful API server with a built-in HTTP parser is indeed supposed to serve HTTP only, you will need to handle two health checks:
The first one you configured yourself as a HTTP:port;/;PathToPing target - you'll receive an HTTP GET request and must answer with 200 OK within the specified timeout period to be considered healthy.
The second one is configured automatically by the service - it opens a TCP connection on the HTTP port configured above, sends no data, and then closes the connection once the health check is completed. A sketch of handling both follows below.
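For illustration only, a toy Python accept loop that would satisfy both checks - answering the HTTP GET ping with 200 OK and tolerating the zero-byte TCP probe; the port and ping path are placeholders.

import socket

def serve(port=8080):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(16)
    while True:
        conn, _ = srv.accept()
        conn.settimeout(5)  # don't block forever on a silent peer
        try:
            data = conn.recv(4096)
            if not data:
                # Zero bytes then close: the automatic TCP health check.
                continue
            if data.startswith(b"GET /health"):
                # The configured HTTP health check: answer 200 OK.
                conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
            # ... normal request handling would go here ...
        except socket.timeout:
            pass  # peer sent nothing within the timeout
        finally:
            conn.close()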
In conclusion, it seems that your server might already be behaving perfectly fine and you are just irritated by the second health check's behavior - does ELB actually consider your server to be unhealthy?
As far as I know, it's just an HTTP GET request looking for a 200 OK HTTP response.
