How many Elasticsearch client connections should we create in the application - elasticsearch

I'm using Golang and the elastic client.
Below is my client creation logic:
if client, err := elastic.NewClient(elastic.SetURL(ElasticsearchURL)); err != nil {
    // Handle error
    logger.Error.Println(err)
    return nil
} else {
    return client
}
What's the correct approach to:
keep the client object as a singleton across the application, or
create and close a client for each request?
I am kind of confused by the conflicting answers in the links below:
where-to-close-an-elasticsearch-client-connection- suggests one connection per app
how-many-transport-clients-can-a-elasticsearch-cluster-have - suggests one connection per app
elasticsearch-how-to-query-for-number-of-connections -- kind of indicates that connections die quickly after serving a request

That depends on the application.
In 99% of the use cases you have a normal, long-running application. Then you should create just one client with elastic.NewClient. You can pass it around in your code and it will always work, even across different goroutines. This creates a long-running client, which has several benefits: for example, it runs health checks in the background that prevent the client from sending requests to unhealthy or dead nodes.
However, if you have a short-running application (something like AWS Lambda or Cloud Functions) you might need a "connection" at the request level. In that specific case you can use elastic.NewSimpleClient. It has a bit more overhead, though, as you're creating a new client every time, and it won't run health checks or other background tasks.
DO NOT create a new client with elastic.NewClient for every request, as any call to NewClient will create a set of goroutines and you'll quickly run out of resources if you do that.
Please read the documentation and the wiki for further details.
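To make the singleton approach concrete, here is a minimal sketch of one way to do it, assuming the olivere/elastic v7 client; the package name, the Client helper, and the variable names are illustrative, not part of the library's API:

package search

import (
    "sync"

    elastic "github.com/olivere/elastic/v7"
)

var (
    once    sync.Once
    client  *elastic.Client
    initErr error
)

// Client returns the shared client, creating it on the first call. sync.Once makes
// this safe to call from many goroutines; every caller gets the same *elastic.Client.
func Client(elasticsearchURL string) (*elastic.Client, error) {
    once.Do(func() {
        client, initErr = elastic.NewClient(elastic.SetURL(elasticsearchURL))
    })
    return client, initErr
}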

Related

Disallow queuing of requests in gRPC microservices

Setup:
We have gRPC pods running in a k8s cluster. The service mesh we use is Linkerd. Our gRPC microservices are written in Python (asyncio gRPC as the concurrency mechanism), with the exception of the entry point, which is written in Go (using the Gin framework). We have an AWS API GW that talks to an NLB in front of the Go service. The Go service communicates with the backend via NodePort services.
Requests on our gRPC Python microservices can take a while to complete. Average is 8s, up to 25s in the 99th %ile. In order to handle the load from clients, we've horizontally scaled, and spawned more pods to handle concurrent requests.
Problem:
When we send multiple requests to the system, even sequentially, we sometimes notice that requests go to the same pod as an ongoing request. What can happen is that this new request ends up getting "queued" on the server side (not fully "queued", since some progress is made when context switches happen). The issue with queueing like this is that:
The earlier requests can start getting starved, and eventually timeout (we have a hard 30s cap from API GW).
The newer requests may also not get handled on time, and as a result get starved.
The symptom we're noticing is 504s which are expected from our hard 30s cap.
What's strange is that we have other pods available, but for some reason the load balancer isn't routing requests to those pods smartly. It's possible that Linkerd's smarter load balancing doesn't work well for our high-latency situation (we need to look into this further; however, that will require a big overhaul of our system).
One thing I wanted to try is to stop this queuing up of requests. I want the service to immediately reject a request if one is already in progress, and have the client (meaning the Go service) retry. The retry will hopefully hit a different pod (do let me know if that won't happen). In order to do this, I set "maximum_concurrent_rpcs" to 1 on the server side (the Python server). When I sent multiple requests in parallel to the system, I didn't see any RESOURCE_EXHAUSTED exceptions (even with only one server pod). What I do notice is that the requests no longer happen in parallel on the server; they happen sequentially (I think that's a step in the right direction, since the first request no longer gets starved). That said, I'm not seeing the RESOURCE_EXHAUSTED error in the Go client, and I do see a delay between the entry time in the Go client and the entry time in the Python service. My guess is that the queuing is now happening client-side (or potentially still server-side, but it's not visible to me)?
I then saw online that it may be possible for requests to get queued up on the client side as a default behavior in HTTP/2. I tried to test this out in a custom Python client that mimics the Go one:
import grpc

channel = grpc.insecure_channel(
    "<some address>",
    options=[("grpc.max_concurrent_streams", 1)],
)
# create stub to server with channel…
However, I'm not seeing any change here either. (Note: this is a test dummy client - eventually I'll need to make this work in Go. Any help there would be appreciated as well.)
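For the eventual Go client, here is a hedged sketch of one way to get retries on RESOURCE_EXHAUSTED using gRPC's service-config retry policy; the service name "pkg.SomeService" and the dial target are placeholders, retries may need to be explicitly enabled depending on your grpc-go version, and whether a retry actually lands on a different pod still depends on how connections are load balanced:

package client

import (
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// retryServiceConfig asks grpc-go to retry RESOURCE_EXHAUSTED responses with
// exponential backoff. "pkg.SomeService" stands in for the real service name.
const retryServiceConfig = `{
  "methodConfig": [{
    "name": [{"service": "pkg.SomeService"}],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2.0,
      "retryableStatusCodes": ["RESOURCE_EXHAUSTED"]
    }
  }]
}`

// dial opens a connection with the retry policy attached. Whether retries reach a
// different pod depends on the balancer in front of (or inside) the connection.
func dial(target string) (*grpc.ClientConn, error) {
    return grpc.Dial(target,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultServiceConfig(retryServiceConfig),
    )
}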
Questions:
How can I get the desired effect here? Meaning: the server returns RESOURCE_EXHAUSTED if it is already handling a request, the Go client retries, and the retry hits a different pod?
Any other advice on how to fix this issue? I'm grasping at straws here.
Thank you!

How do you deploy a Golang server with multiple API endpoints as a single cloud function?

I have a main.go file that looks something like this:
func main() {
    connection := db.Connect()
    defer connection.Close()
    // db.ResetDb() // uncomment if you want to drop the db on go run main.go
    http.HandleFunc("/do-a", endpoints.DoA)
    http.HandleFunc("/do-b", endpoints.DoB)
    // ...
    http.HandleFunc("/do-z", endpoints.DoZ)
    http.ListenAndServe(":8081", nil)
}
A database connection is established at the beginning using db.Connect, so any function in my codebase can access the database if needed. Several endpoints are then created with http.HandleFunc. Finally, the server listens on port 8081 of my local machine.
All of the endpoint handler functions are pure functions. There is no internal state that would require the server to be constantly running, which is why I thought cloud functions might work. The only hiccup I see with cloud functions is the database connection that needs to be established before each endpoint call. I think this can be handled with GCF, as it can cache objects between invocations.
On a side note, should I be deploying my backend like this? Would it be better to just run it on a typical server that runs 24/7?
You won't be able to use ListenAndServe in Cloud Functions. The way Cloud Functions works is this: you define a function entry point for incoming connections, and the socket is handled for you. That endpoint has its own dedicated URL that you can't change, with a path derived from the name you give the function. You might want to review the documentation for complete sample code for working HTTP functions; note how a request and a response object are handed to you.
Because of the way this works, you can't run any sort of "server" - you just handle incoming requests that are managed by the system. Typically, you give each endpoint its own deployed function. If you really want to run an HTTP server, Cloud Functions is not going to be a good choice. Consider instead App Engine, Compute Engine, or Cloud Run.
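For reference, a minimal sketch of what one endpoint could look like when deployed as its own HTTP function, assuming the Go runtime and the functions-framework-go registration style (the package name, function name, and response body are illustrative). A package-level connection, initialized lazily, can be reused while an instance stays warm, which also covers the database concern from the question:

package doa

import (
    "net/http"

    "github.com/GoogleCloudPlatform/functions-framework-go/functions"
)

func init() {
    // Each deployed function gets its own managed entry point and URL;
    // there is no ListenAndServe.
    functions.HTTP("DoA", doA)
}

// doA would hold the same logic as endpoints.DoA. A package-level db handle
// (initialized lazily, e.g. with sync.Once) can be reused while the instance stays warm.
func doA(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("did A"))
}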

Front-facing REST API with an internal message queue?

I have created a REST API - in a few words, my client hits a particular URL and she gets back a JSON response.
Internally, quite a complicated process starts when the URL is hit, and there are various services involved as a microservice architecture is being used.
I was observing some performance bottlenecks and decided to switch to a message queue system. The idea is that now, once the user hits the URL, a request is published on an internal message queue, waiting to be consumed. A consumer processes it and publishes back on a queue, and this happens quite a few times until, finally, the node servicing the user receives the processed response to deliver to the user.
An asynchronous "fire-and-forget" pattern is now being used. But my question is: how can the node servicing a particular client remember who it was servicing once the processed result arrives back, without blocking (i.e. so that it can handle several other requests until the response is received)? If it makes any difference, my stack looks like this: Tomcat, Spring, Kubernetes and RabbitMQ.
In summary, how can the request node (whose job is to push items onto the queue) maintain an open connection with the client who requested a JSON response (i.e. the client is waiting for the JSON response) and receive back the data for the correct client?
There are a few different scenarios depending on how much control you have over the client.
If the client behaviour cannot be changed, you will have to keep the session open until the request has been fully processed. This can be achieved by employing a pool of workers (futures/coroutines, threads or processes) where each worker keeps the session open for a given request.
This method has a few drawbacks and I would keep it as a last resort. First, you will only be able to serve a limited number of concurrent requests, proportional to your pool size. Second, as your processing is behind a queue, your front end won't be able to estimate how long it will take for a task to complete. This means you will have to deal with long-lasting sessions, which are prone to fail (what if the user gives up?).
If the client behaviour can be changed, the most common approach is to use a fully asynchronous flow. When the client initiates a request, it is placed in the queue and a Task Identifier is returned. The client can use the given TaskId to poll for status updates. Each time the client requests updates about a task, you simply check whether it has completed and respond accordingly. A common pattern when a task is still in progress is to have the front end return the estimated amount of time before the client should try again. This allows your server to control how frequently clients poll. If your architecture supports it, you can go the extra mile and provide information about the progress as well.
Example response when task is in progress:
{"status": "in_progress",
"retry_after_seconds": 30,
"progress": "30%"}
A more complex yet elegant solution consists of using HTTP callbacks. In short, when the client requests a new task it provides a tuple (URL, method) the server can use to signal that processing is done. The client then waits for the server to send the signal to the given URL. You can see a better explanation here. In most cases this solution is overkill, but I think it's worth mentioning.
One option would be to use DeferredResult, provided by Spring, but that means you need to maintain a pool of threads in the request-serving node, and the maximum number of active threads will determine the throughput of your system. For more details on how to implement DeferredResult, refer to this link: https://www.baeldung.com/spring-deferred-result

Can I call the same RPC func in many servers at the same time?

I'm trying to find a fast mechanism for interprocess communication.
One thing I need is the ability to send one command to multiple application instances at the same time. I have spent a day trying to find out whether I can start many instances of the same app (a local RPC server app) and call an RPC from one client. I use the ncalrpc protocol for this purpose.
I just want to start several instances of the server and one instance of the client, and then call the same RPC function once on the client and have it evaluated on every running server.
Yes, you can either use multiple client threads (each making a separate server call) or modify the .acf and mark the call with the [async] attribute. If you go the latter route you can then make multiple calls on a single client thread. Note that asynchronous RPC is a fair bit more complicated than synchronous RPC due to needing to deal with call completions.
Making calls to multiple server instances (even local instances) is also made more complicated by the fact that you will have to somehow discover those endpoints, and the RPC namespace functions (RpcNs*) are no longer available as of Windows Vista.

WebSocket pushing database updates

Most of the articles on the web dealing with WebSockets are about in-memory chat.
I'm interested in a kind of less-instant chat that is persistent, like a blog post's comments.
I have a cluster of two servers handling client requests.
I wonder what the best strategy would be for pushing database updates to the corresponding clients.
As I'm using Heroku to handle this cluster (of 2 web dynos), I obviously read this tutorial, which aims to build a chat room shared between all clients.
It uses Redis to centralize incoming messages; each server listens for new messages and propagates them to web clients through WebSocket connections.
My use case differs in that I've got a Neo4j database into which each message written by any client is persisted.
My goal is to notify each client from a specific room that a new message/comment has just been persisted by a client.
With an architecture similar to the tutorial linked above, how could I filter out only the new messages to propagate to the user? Is there an easy and efficient way to tell Redis:
"(the WebSocket handler saying) When my client initiates the WebSocket connection, I take care of querying all persisted messages and sending them to the client; however, I want you (Redis) to feed me only the NEW messages that I haven't sent to the client yet, so that I can deliver them."
How do I prevent Redis from publishing the whole conversation each time a WebSocket connection is made? That would lead to duplication, since the database query already provides the existing content at that moment.
This is actually a pretty common scenario, where you have three components:
A cluster of stateless web servers that maintain open connections with all clients (load balanced across the cluster, obviously)
A persistent main data storage - Neo4j in your case
A messaging/queueing backend for broadcasting messages across channels (thus across the server cluster) - Redis
Your requirement is for new clients to receive an initial feed of the recent messages, and any subsequent messages in real time. All of this is implemented in your connection handlers.
Essentially, this is what your (pseudo-)code should look like:
class ConnectionHandler:
    def on_init(self):
        self.redis = redis.get_connection()
        self.send("hello, here are all the recent messages")
        recent_msgs = fetch_msgs_from_neo4j()
        self.send(recent_msgs)
        self.redis.add_listener(self.on_msg)
        self.send("now listening for new messages")

    def on_msg(self, msg):
        self.send("new message: ")
        self.send(msg)
The exact implementation really depends on your environment, but this is the general flow of things.
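For a more concrete (but still illustrative) version of the same flow in Go, assuming gorilla/websocket and go-redis v9 as stand-ins, with the room parameter, channel name, and Neo4j fetch as placeholders: history is sent from the database exactly once, and Redis only delivers messages published after the subscription, so the conversation is not replayed on every new connection.

package main

import (
    "net/http"

    "github.com/gorilla/websocket"
    "github.com/redis/go-redis/v9"
)

var (
    upgrader = websocket.Upgrader{}
    rdb      = redis.NewClient(&redis.Options{Addr: "localhost:6379"})
)

// fetchMsgsFromNeo4j stands in for the real query of persisted messages for a room.
func fetchMsgsFromNeo4j(room string) []string {
    return []string{"an older, persisted comment"}
}

func roomHandler(w http.ResponseWriter, r *http.Request) {
    ws, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        return
    }
    defer ws.Close()

    ctx := r.Context()
    room := r.URL.Query().Get("room")

    // 1. Initial feed: history comes from the database, never from Redis.
    for _, m := range fetchMsgsFromNeo4j(room) {
        ws.WriteMessage(websocket.TextMessage, []byte(m))
    }

    // 2. Real-time feed: subscribe to the room channel; only messages published
    //    from now on are delivered, so Redis does not resend the whole conversation.
    sub := rdb.Subscribe(ctx, "room:"+room)
    defer sub.Close()

    for {
        select {
        case msg := <-sub.Channel():
            ws.WriteMessage(websocket.TextMessage, []byte(msg.Payload))
        case <-ctx.Done():
            return
        }
    }
}

func main() {
    http.HandleFunc("/ws", roomHandler)
    http.ListenAndServe(":8080", nil)
}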
