Hystrix command key decision: Service Name + Instance IP + API Name? - high-availability

I want to implement Hystrix in a gateway (like Zuul).
The gateway will discover services A, B, or C. Assume service A has 10 instances and 10 APIs. My question is:
What is the best practice for choosing the command key? Service Name + Instance IP + API Name?
This seems to give the finest level of detail, since a failure in one API or one instance will not trip the circuit breaker for the others, but it may require a very large number of command keys.
Here is an example. Suppose I talk to service A through a load balancer; there are 5 instances of service A with the following IPs:
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5
and service A has 4 APIs:
createOrder
deleteOrder
updateOrder
getOrder
Now there are many options for choosing the command key:
service level, e.g. serviceA
instance level, e.g. 192.168.1.1
instance + API level, e.g. 192.168.1.1_getOrder
With the first option there is only one Hystrix command per service, which takes less CPU and memory, but if one API fails, the circuit breaks for all APIs.

Your HystrixCommandKey identifies a HystrixCommand, which encapsulates aService.anOperation(). Thus a HystrixCommandKey could be named using the composite key Service+Command (but not the instances running the service or their IP addresses). If you do not provide an explicit name, the class name of the HystrixCommand is used as the default HystrixCommandKey.
The Hystrix Dashboard then aggregates metrics for each HystrixCommandKey (Service+Command) from every instance running in the service cluster.
In your example, it would be serviceA_createOrder.
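As a minimal sketch (the class name, fallback, and run body are illustrative, not from the original post), a command keyed per service + operation would look like:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandKey;

// One command class per operation on service A; the command key scopes the
// circuit breaker to serviceA_createOrder only.
public class CreateOrderCommand extends HystrixCommand<String> {

    public CreateOrderCommand() {
        super(Setter
                // group key: one per downstream service (used for thread pooling)
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("serviceA"))
                // command key: one per operation, so a createOrder failure does
                // not open the circuit for getOrder, updateOrder, deleteOrder
                .andCommandKey(HystrixCommandKey.Factory.asKey("serviceA_createOrder")));
    }

    @Override
    protected String run() throws Exception {
        // call service A's createOrder API through the load balancer here
        return "order-created";
    }

    @Override
    protected String getFallback() {
        return "order-creation-unavailable"; // served while the circuit is open
    }
}
```

With one command per operation, an open circuit on serviceA_createOrder leaves serviceA_getOrder unaffected, while the dashboard still aggregates each key's metrics across all 5 instances.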

Related

Grafana/Prometheus: visualizing multiple IPs in one query

I want a graph where all recent IPs that requested my webserver are shown with their total request counts. Is something like this doable? Can I add a query and remove it afterwards via Prometheus?
Technically, yes. You will need to:
Expose a metric (probably a counter) in your server - say, requests_count, with a label; say, ip
Whenever you receive a request, increment the metric with the label set to the requester's IP (see the sketch after this list)
In Grafana, graph the metric, likely summing it by the IP address to handle the case where several horizontally scaled servers handle requests: sum(your_prometheus_namespace_requests_count) by (ip)
Set the legend of the graph in Grafana to {{ ip }} to 'name' each line after the IP address it represents
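A minimal sketch of the first two steps, assuming the Prometheus Java simpleclient (the metric and label names are illustrative):

```java
import io.prometheus.client.Counter;

public class RequestMetrics {
    // Counter partitioned by requester IP; each distinct IP creates a new series.
    static final Counter REQUESTS = Counter.build()
            .name("requests_count")
            .help("Requests received, partitioned by requester IP.")
            .labelNames("ip")
            .register();

    // Call this from the request handler with the requester's address.
    static void onRequest(String requesterIp) {
        REQUESTS.labels(requesterIp).inc();
    }
}
```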
However, every distinct label value of a metric causes a whole new time series to exist in the Prometheus database; you can think of a metric like requests_count{ip="192.168.0.1"}=1 as somewhat similar to requests_count_ip_192_168_0_1{}=1 in terms of how it consumes memory. Each series currently held in the Prometheus TSDB head takes something on the order of 3 kB to exist. That means that if you're handling millions of requests, you're going to swamp Prometheus' memory with gigabytes of data from this one metric alone. A more detailed explanation of this issue exists in this other answer: https://stackoverflow.com/a/69167162/511258
With that in mind, this approach makes sense if you know for a fact to expect a small number of IP addresses to connect (say, on an internal intranet, or a client distributed to a small number of known users); but if you are planning to deploy to the web, this would give people a very easy way to (most likely unknowingly) crash your monitoring systems.
You may want to investigate an alternative -- for example, Grafana can ingest data from some common log-aggregation platforms, so perhaps you can do some structured (e.g. JSON) logging, hold that in e.g. Elasticsearch, and then create a graph from the data held within that.

A distributed sequence of actions over services that can horizontally scale

I have a distributed sequence of actions across microservices. Service A needs to tell service B to do something, and once that is complete, A will tell service C. The sequence is important, so I'm using the saga pattern, as you can see.
My issue is that service B can scale, and each instance needs to receive the message and complete the action. The action must happen on every service B instance. Then service C should run only once all the service B instances have completed their task.
It is a cache purge that must happen on each instance. I have no control over this architecture, so the cache for service B is coupled to each instance. I would use a shared cache for the instances if I could.
I have come up with this orchestration solution, but it requires maintaining state and lots of extra code to handle edge cases, which I would like to avoid:
service A sends the same message to all service B instances it knows about
all service B instances send success to service A
on the final service B success, service A messages service C
Is there a better alternative to this?
Assuming that you can't rearchitect service B, you've captured the essential complexity of the operation: A will have to track instances of service B and will have to deal with a ton of edge cases. The process is fundamentally stateful.
If the cache purge command is idempotent (i.e. you don't care if it happens multiple times in the process) you can simplify some of the edge case handling and can get away with the state being less durable (on failure you can start from the beginning instead of needing to reconstruct where you were in the process).
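A minimal sketch of that orchestration inside service A, assuming idempotent purge messages (the instance IDs and transport calls are placeholders):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PurgeOrchestrator {
    // B instances we are still waiting on.
    private final Set<String> pendingInstances = ConcurrentHashMap.newKeySet();

    public void startPurge(Set<String> knownBInstances) {
        pendingInstances.addAll(knownBInstances);
        for (String instance : knownBInstances) {
            sendPurgeMessage(instance); // idempotent: safe to resend on retry
        }
    }

    // Called when a B instance reports success.
    public void onPurgeAck(String instanceId) {
        pendingInstances.remove(instanceId);
        if (pendingInstances.isEmpty()) {
            notifyServiceC(); // final step of the sequence
        }
    }

    private void sendPurgeMessage(String instance) { /* transport-specific */ }
    private void notifyServiceC() { /* transport-specific */ }
}
```

Because the purge is idempotent, recovery after a failure can simply re-run startPurge from the beginning rather than durably reconstructing how far the process got.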

ELB Balancing Stateful Servers

Let's say I have an HTTP/2 service that keeps a list of users and each user's hair color, in memory as well as in a database.
Now I want to scale this up to multiple nodes; however, I do not want the same user to be in two different servers' memory - each server shall handle its own specific users. This means I need to inform the load balancer where each user is being handled. In case of scaling down, I need to signal that a user is no longer pinned anywhere and can be routed to any server, or by a given rule - e.g. the server with the least memory in use.
Would anyone know if the ALB load balancer supports that? One path I was thinking of was query-string-parameter-based routing, so I could encode in the request itself something like destination_node = (int)user_id % 4 in case I had 4 nodes, for instance - and this worked well in a proof of concept, but it leads to a few issues:
The service itself would need to know how many instances there are to balance across.
I could not guarantee even balancing; it is basically luck-based balancing.
What would be the preferred approach for this, or what is a common way of solving this problem? Does AWS ELB support this out of the box? I was trying to avoid having to write my own balancer: a middleware that keeps track of which servers are handling which users, and whose responsibility would be distributing the requests among those servers.
In the AWS Application Load Balancer (ALB) it is possible to write routing rules on:
Host Header
HTTP Header
HTTP Request Method
Path Pattern
Query String
Source IP
But at the moment there is no way to route on dynamic conditions.
If it is possible to group your data, I would prefer a path pattern like
/users/blond/123
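As a minimal sketch combining the question's modulo idea with this path-pattern answer (the shard naming and the one-ALB-rule-per-shard layout are assumptions), the caller derives the path segment from the user ID, and a fixed rule like /users/shard2/* forwards each shard to its own target group:

```java
public class ShardRouter {
    private final int nodeCount; // the caller must know this -- one of the
                                 // drawbacks already noted in the question

    public ShardRouter(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // e.g. pathFor(123) with 4 nodes -> "/users/shard3/123"
    public String pathFor(long userId) {
        long shard = Math.floorMod(userId, (long) nodeCount);
        return "/users/shard" + shard + "/" + userId;
    }
}
```

Note that this inherits the question's drawbacks: changing the node count remaps users, and the distribution is only as even as the user IDs themselves.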

How to limit the rate of outgoing HTTP calls in a scaled microservice?

I have a scenario in which my microservice is scaled to 3 instances. Each instance makes HTTP calls to a third-party service. However, the third-party service has a rate limit, i.e. it cannot accept more than 1000 requests per second. Now that I have 3 instances of the same service running, it is hard to keep track of the count. Any solutions that could help me implement this?
You can use the Circuit Breaker pattern and tools like Hystrix in such a scenario.
My answer is based on the assumption that the services are independent, don't interact with each other, and can be scaled up or down.
Use a Redis data cache service and introduce a counter there. Each service will be able to refer to that counter and will update it whenever it makes an API call; write a condition so that no service is allowed to make calls once the counter reaches 1000 for that specific second.
Hence they will not be able to make more than 1000 calls per second.
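A minimal sketch of that shared counter as a fixed-window limiter, assuming the Jedis client and a Redis instance reachable by all three service instances (the key name and limit are illustrative):

```java
import redis.clients.jedis.Jedis;

public class SharedRateLimiter {
    private static final int LIMIT_PER_SECOND = 1000;
    // Single-connection sketch; use a JedisPool in a real multi-threaded service.
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Returns true if the caller may make one more third-party call right now.
    public boolean tryAcquire() {
        long second = System.currentTimeMillis() / 1000;
        String key = "third-party-calls:" + second; // one counter per wall-clock second
        long count = jedis.incr(key);               // atomic across all instances
        if (count == 1) {
            jedis.expire(key, 2);                   // old windows clean themselves up
        }
        return count <= LIMIT_PER_SECOND;
    }
}
```

Each instance calls tryAcquire() before every outgoing request and backs off (or queues) when it returns false, so the three instances together never exceed the 1000-per-second budget.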

Is there a way to reverse the bind on zmq pub/sub?

I have server code on one box that needs to listen for status coming from another box with about 10 chips running embedded Linux. The 10 chips have their own IP addresses, and each will send basic health status to the server, which could (possibly) do something with it.
I would like the server to just passively listen and not have to send a response. So this looks like a job for ZMQ's pub/sub, where each of the 10 chips has its own publication and the server subscribes to each.
However, the server would need to know the well-known address that each chip bound its publication to. But in the field, these chips can be swapped or replaced with ones that have different IP addresses.
Instead, it's safer to have the chips know the server's IP address.
What I would like is a pub/sub where the receiver is at the well-known address, or a request/response pattern where the clients (the chips) send messages (the requests) to the server, but neither the server nor the chips need to send/receive a response.
Currently, there are two servers on the separate box. So, if possible, I'd like a solution for both one server and multiple servers.
Is this possible in ZMQ? And what pattern would that be?
Thanks.
Yes, you can do this exactly the way you'd expect to do so. Just bind on your subscriber, then connect to that subscriber with your publishers. ZMQ doesn't designate which end should be the "server", or more reliable end, and which should be the "client", or more transient end, specifically for this reason, and this is an excellent reason to switch up the normal paradigm.
Edit to address the new clarification--
It should work fine with multiple servers. In general it would work like the following (the order of operations in this case is just to ensure no messages get lost, which is possible if the PUB socket starts sending messages before the SUB is ready):
Spin up server 1. Create SUB socket and bind on address:port.
Spin up server 2. Create SUB socket and bind on address:port.
Spin up a chip. That chip will create a PUB socket and connect it to [server 1] address:port and to [server 2] address:port.
Repeat step (3) for the other nine chips.
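A minimal sketch of that order of operations, assuming the JeroMQ binding (addresses, port, and message format are illustrative):

```java
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class ReversedPubSub {
    // Runs on each server box: the SUB socket is the stable, well-known end.
    static void server(int port) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket sub = ctx.createSocket(SocketType.SUB);
            sub.bind("tcp://*:" + port);   // server binds; chips connect
            sub.subscribe("".getBytes());  // receive everything
            while (!Thread.currentThread().isInterrupted()) {
                System.out.println("health: " + sub.recvStr());
            }
        }
    }

    // Runs on each chip: the PUB socket connects out to every known server.
    static void chip(String... serverEndpoints) throws InterruptedException {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket pub = ctx.createSocket(SocketType.PUB);
            for (String endpoint : serverEndpoints) {
                pub.connect(endpoint);     // e.g. "tcp://192.168.1.100:5556"
            }
            while (!Thread.currentThread().isInterrupted()) {
                pub.send("chip-7 OK");     // periodic health status
                Thread.sleep(1000);
            }
        }
    }
}
```

Swapping a chip then requires no server-side change: the replacement simply connects to the same well-known server addresses.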
Dual .SUB model
Oh yes, each .PUB-lishing entity may have numerous .SUB-s listening,
so having two <serverNode>-s meets the .PUB/.SUB-primitive Formal Communication Pattern ( one speaks - many listen )
As given above, each of your <serverNode>-s binds:
.bind( aFixServer{A|B}_ipAddress_portNumber )
so as to allow each .PUB-lishing <chipNode> to
.connect( anAprioriKnownServer{A|B}_bindingNode_ipAddress_portNumber )
And both <serverNode{A|B}>-s then .SUB-scribe to receive any messages from them.
Multi-Server model
As seen above, the {A|B} grammar is freely extensible to {A|B|C|D|...}, so the principal messaging model will stand for any reasonable multi-server extension.
Q.E.D.
