I want a graph that shows the total request count for each recent IP that hit my webserver. Is something like this doable? Can I add such a query via Prometheus and remove it again afterwards?
Technically, yes. You will need to:
Expose a metric (probably a counter) in your server, say requests_count, with a label, say ip
Whenever you receive a request, increment the metric with the label set to the requester's IP (see the sketch after these steps)
In Grafana, graph the metric, likely summing it by the IP address to handle the case where several horizontally scaled servers handle requests: sum(your_prometheus_namespace_requests_count) by (ip)
Set the legend of the graph in Grafana to {{ ip }} to 'name' each line after the IP address it represents
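For reference, a minimal sketch of the first two steps using the Go client library; the metric and label names match the ones above, while the port, handler path, and use of r.RemoteAddr are assumptions:

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// requests_count, labelled by requester IP, as described above.
var requestsCount = prometheus.NewCounterVec(
    prometheus.CounterOpts{Name: "requests_count", Help: "Requests per client IP."},
    []string{"ip"},
)

func main() {
    prometheus.MustRegister(requestsCount)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // r.RemoteAddr is host:port; a real server would strip the port
        // and honour X-Forwarded-For when sitting behind a proxy.
        requestsCount.WithLabelValues(r.RemoteAddr).Inc()
        w.Write([]byte("ok"))
    })

    // Prometheus scrapes this endpoint.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}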
However, every distinct label value a metric has causes a whole new time series to exist in the Prometheus database; you can think of a metric like requests_count{ip="192.168.0.1"}=1 as being roughly equivalent to requests_count_ip_192_168_0_1{}=1 in terms of how it consumes memory. Each series currently held in the Prometheus TSDB head takes something on the order of 3 kB to exist. That means that if you're handling requests from millions of distinct IPs, this one metric alone will swamp Prometheus' memory with gigabytes of data (one million distinct IPs times ~3 kB is roughly 3 GB of head memory). A more detailed explanation of this issue is in this other answer: https://stackoverflow.com/a/69167162/511258
With that in mind, this approach makes sense if you know for a fact that only a small set of IP addresses will connect (say, on an internal intranet, or a client you distribute to a small number of known customers), but if you plan to deploy to the public web, it gives people a very easy way to (most likely unknowingly) crash your monitoring systems.
You may want to investigate an alternative -- for example, Grafana is capable of ingesting data from some common log aggregation platforms, so perhaps you can do some structured (e.g. JSON) logging, hold that in e.g. Elasticsearch, and then create a graph from the data held within that.
I would like to operate a service that anticipates having subscribers who are interested in various kinds of products. A product is a bag of dozens of attributes:
{
"product_name": "...",
"product_category": "...",
"manufacturer_id": "...",
[...]
}
A subscriber can express an interest in any subset of these attributes. For example, this subscription:
{ [...]
"subscription": {
"manufacturer_id": 1234,
"product_category": 427
}
}
will receive events that match both product_category: 427 and manufacturer_id: 1234. Conversely, this event:
{ [...]
"event": {
"manufacturer_id": 1234,
"product_category": 427
}
}
will deliver messages to any subscribers who care about:
that manufacturer_id, or
that product_category, or
both that manufacturer_id and that product_category
It is vital that these notifications be delivered as expeditiously as possible, because the subscribers may have only a few hundred milliseconds, or a second at most, to take downstream actions. The cache lookup should therefore be fast.
Question: If one wants to cache subscriptions this way for highly efficient lookup on one or more filterable attributes, what sort of approaches or architectures would allow one to do this well?
The answer depends on some factors that you have not described in your scenario. For example, what is the extent of the data? How many products/categories/users are there, and what are the estimated data sizes for these: megabytes, gigabytes, terabytes? Also, what is the expected rate of changes to products/subscriptions, and what event throughput do you expect?
So my answer will be for a medium-sized scenario in the gigabytes range, where you can likely fit your subscription dataset into memory on a single machine.
In this case the straightforward approach would be to have your events appear on an event bus, for example implemented with Kafka or Pulsar. You would then have a service that consumes the events as they come in and queries an in-memory data store for matching subscriptions. (The in-memory store has to be built/copied on startup and potentially kept up to date from a separate event source.)
This in-memory store could be a key-value database such as MongoDB, which comes with a pure in-memory mode that gives you more predictable performance. To ensure predictable, high-performance lookups within the database you need to specify your indexes correctly: every property that is relevant to the lookup needs to be indexed. Also consider that kv-stores can use compound indexes to speed up lookups on combinations of properties. Other in-memory kv-stores you may want to consider as alternatives are Redis or Memcached. If performance is a critical requirement, I would recommend running trials with the different systems: ingest your dataset, build the indexes, try out the queries you need, and compare lookup times.
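As a rough illustration of the compound-index point, here is a sketch using the official MongoDB Go driver; the database, collection, and field names are assumptions based on the product example above:

package main

import (
    "context"
    "log"
    "time"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(ctx)

    subs := client.Database("notify").Collection("subscriptions")

    // Compound index over two attributes that are often queried together,
    // so lookups on that combination do not scan the whole collection.
    _, err = subs.Indexes().CreateOne(ctx, mongo.IndexModel{
        Keys: bson.D{{Key: "manufacturer_id", Value: 1}, {Key: "product_category", Value: 1}},
    })
    if err != nil {
        log.Fatal(err)
    }
}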
So the service can now quickly determine the set of users to notify. From here you have two choices: you could have the same service send out notifications directly, or (what I would probably do) you could separate concerns and have a second service whose responsibility is performing the actual notifications. The communication between those services could again be via a topic on the event-bus system.
This kind of setup should easily handle thousands of events per second with single service instances. If the number of events grows to massive scale, you can run multiple instances of your services to improve throughput; for that you'd have to look into organizing consumer groups correctly for multiple consumers, as sketched below.
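Here is what that consumer side could look like with the segmentio/kafka-go client (broker address, topic, and group names are assumptions). All matcher instances share one consumer group, so Kafka spreads the topic's partitions across them:

package main

import (
    "context"
    "log"

    "github.com/segmentio/kafka-go"
)

func main() {
    // Every instance of the matcher service uses the same GroupID, so each
    // event is delivered to exactly one instance.
    r := kafka.NewReader(kafka.ReaderConfig{
        Brokers: []string{"localhost:9092"},
        GroupID: "subscription-matcher", // assumed group name
        Topic:   "product-events",       // assumed topic name
    })
    defer r.Close()

    for {
        msg, err := r.ReadMessage(context.Background())
        if err != nil {
            log.Fatal(err)
        }
        // Decode the event and query the in-memory subscription store here.
        log.Printf("event: %s", msg.Value)
    }
}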
The technologies for implementing the services are probably not critical, but if I knew it had strict performance requirements I would go with a language that allows manual memory management, for example Rust or C++. Other alternatives could be languages like golang or Java, but you'd have to pay attention to how garbage collection behaves and make sure it doesn't interfere with your performance requirements.
In terms of infrastructure: for a medium- or large-size system you would typically run your services in containers on a cluster of machines, for example using Kubernetes.
If your system's scale turns out to be on the smaller side, you may not need a distributed setup and can instead deploy the described components/services on a single machine.
With such a setup the expected round-trip time from a local client should reliably be in the single-digit milliseconds from the time an event comes in to the time a notification goes out.
The way I would do it is to have a key/value table that holds an array of subscriber IDs per "attribute name = value" key, like this (where a, b, c, d, y, z are the subscriber IDs):
{ [...]
"manufacturer_id=1234": [a,b,c,d],
"product_category=427": [a,b,y,z],
[...]
}
In your example the event has "manufacturer_id" = 1234 and "product_category" = 427, so just look up the subscribers where the key is manufacturer_id=1234 or product_category=427 and you'll get arrays of all the subscribers you want. Then just "merge distinct" those arrays and you'll have every subscriber ID you need.
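A minimal in-memory sketch of this layout in Go; the type and field names are illustrative:

package main

import "fmt"

// SubscriptionIndex maps "attribute=value" keys to the IDs of subscribers
// interested in that value, mirroring the key/value table above.
type SubscriptionIndex struct {
    byAttr map[string][]string // e.g. "manufacturer_id=1234" -> [a b c d]
}

func (idx *SubscriptionIndex) Add(subscriberID string, attrs map[string]string) {
    for name, value := range attrs {
        key := name + "=" + value
        idx.byAttr[key] = append(idx.byAttr[key], subscriberID)
    }
}

// Match returns the distinct subscribers interested in any attribute of the event.
func (idx *SubscriptionIndex) Match(event map[string]string) []string {
    seen := map[string]bool{}
    var out []string
    for name, value := range event {
        for _, id := range idx.byAttr[name+"="+value] {
            if !seen[id] { // "merge distinct"
                seen[id] = true
                out = append(out, id)
            }
        }
    }
    return out
}

func main() {
    idx := &SubscriptionIndex{byAttr: map[string][]string{}}
    idx.Add("a", map[string]string{"manufacturer_id": "1234", "product_category": "427"})
    idx.Add("y", map[string]string{"product_category": "427"})
    // Prints the two distinct matching subscribers, in either order.
    fmt.Println(idx.Match(map[string]string{"manufacturer_id": "1234", "product_category": "427"}))
}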
Or, depending on how complex/smart the database you are using is, you can normalize it, like this:
{ [...]
"manufacturer_id": {
"1234": [a,b,c,d],
"5678": [e,f,g,h],
[...]
},
"product_category": {
"427": [a,b,g,h],
"555": [c],
[...]
},
[...]
}
I would propose sharding as an architecture pattern.
Every shard will listen for all events for all products from the source of the events.
For the best latency I would propose two layers of sharding. The first layer is geographical (country or city, depending on customer distribution); it is connected to the source over a low-latency connection and sits in the same data center as the second-level shards for that location. The second level shards on userId; it has to receive all product events, but handles subscriptions only for its region.
The first layer is responsible for fanning the events out to the second layer based on the geographical position of the subscriptions. This is more or less a single microservice. It could be done with general event brokers, but considering it is going to be relatively simple, we can implement it in golang or C++ and optimize for latency.
In the second layer every shard is responsible for a number of users from the location, and every shard receives all the events for all products. A shard is made up of one microservice for subscription caching and notification logic, plus one or more notification-delivery microservices.
The subscriptions microservice stores an in-memory cache of the subscriptions and checks every event for subscribed users based on maps, i.e. it stores a map from product field to subscribed userIds, for example. For this microservice latency matters most, so a custom implementation in golang/C++ should deliver the best latency. The subscriptions microservice should not have its own DB or any external cache, as network latency would just be a drag in this case.
The notification-delivery microservices depend on where you want to send the notifications, but again golang or C++ can deliver some of the lowest latencies.
The system's data is its subscriptions; the data can be sharded per location and userId the same way as the rest of the architecture, so we can have a single DB per second-level shard.
For storage of the product fields, depending on how often they change, they can live in the code (presuming they change very rarely or never) or in the DBs, with a synchronisation mechanism between the DBs if they are expected to change more often.
Let's say I have an HTTP/2 service that has a list of users and each user's hair color, both in memory and in a database.
Now I want to scale this up to multiple nodes; however, I do not want the same user to be in two different servers' memory - each server should handle its own specific users. This means I need to inform the load balancer where each user is being handled. In case of scale-in, I need to signal that the user is no longer pinned anywhere and can be routed to any server, or by a given rule, e.g. the server with the least memory in use.
Would anyone know if the ALB load balancer supports that? One path I was thinking of is query-string-parameter-based routing, so I could put something like destination_node = (int)user_id % 4 in the request itself (in case I had 4 nodes, for instance). This worked well in a proof of concept, but it leads to a few issues:
The service itself would need to know how many instances there are to balance.
I could not guarantee even balancing; it's basically luck-based balancing.
What would be the preferred approach for this, or what is a common way of solving this problem? Does AWS ELB support this out of the box? I was trying to avoid having to write my own balancer - a middleware that keeps track of which services are handling which users and whose responsibility would be distributing the requests among those servers.
In the AWS Application Load Balancer (ALB) it is possible to write routing rules on:
Host Header
HTTP Header
HTTP Request Method
Path Pattern
Query String
Source IP
But at the moment there is no way to route on dynamic conditions.
If it is possible to group your data, I would prefer a path pattern like
/users/blond/123
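One way to make that work with static ALB rules is to group users into a fixed set of buckets up front (by an attribute such as hair color, or by a hash of the user id as in the question) and create one path-pattern rule and target group per bucket. A rough Go sketch of the client-side path construction; the bucket count and path layout are assumptions, not an ALB feature:

package main

import (
    "fmt"
    "hash/fnv"
)

const buckets = 4 // must match the number of path-pattern rules / target groups

// bucketFor deterministically assigns a user to one of the buckets.
func bucketFor(userID string) int {
    h := fnv.New32a()
    h.Write([]byte(userID))
    return int(h.Sum32() % buckets)
}

// pathFor builds the request path; ALB's static path-pattern rules
// (e.g. /users/0/*, /users/1/*, ...) then route it to the right node group.
func pathFor(userID string) string {
    return fmt.Sprintf("/users/%d/%s", bucketFor(userID), userID)
}

func main() {
    fmt.Println(pathFor("123")) // prints something like /users/2/123
}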
I have buckets in 2 AWS regions. I'm able to perform puts or gets against both buckets without specifying the regional endpoint (the Ruby client defaults to us-east-1).
I haven't found much relevant info on how requests on a bucket reach the proper regional endpoint when the region is not specified. From what I've found (https://github.com/aws/aws-cli/issues/223#issuecomment-22872906), it appears that requests are routed to the bucket's proper region via DNS.
Does specifying the region have any advantages when performing puts and gets against existing buckets? I'm trying to decide whether I need to specify the appropriate region for operations against a bucket or if I can just rely on it working.
Note that the buckets are long lived so the DNS propagation delays mentioned in the linked github issue are not an issue.
SDK docs for region:
http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/Core/Configuration.html#region-instance_method
I do not think that there is any performance benefit to putting/getting data if you specify the region. All bucket names are supposed to be unique across all regions, and I don't think there's a lot of overhead in that lookup compared to the data throughput.
I welcome comments to the contrary.
What is the best-practice solution for programmatically changing the XML file where the number of instances is defined? I know that this is somehow possible with csmanage.exe for the Windows Azure API.
How can I measure which Worker Role VMs are actually working? I asked this question on the MSDN Community forums as well: http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/02ae7321-11df-45a7-95d1-bfea402c5db1
To modify the configuration, you might want to look at the PowerShell Azure Cmdlets. This really simplifies the task. For instance, here's a PowerShell snippet to increase the instance count of 'WebRole1' in Production by 1:
# Management certificate and subscription used to authenticate the call
$cert = Get-Item cert:\CurrentUser\My\<YourCertThumbprint>
$sub = "<YourAzureSubscriptionId>"
$servicename = '<YourAzureServiceName>'

# Fetch the Production deployment and bump WebRole1's instance count by one
Get-HostedService $servicename -Certificate $cert -SubscriptionId $sub |
    Get-Deployment -Slot Production |
    Set-DeploymentConfiguration {$_.RolesConfiguration["WebRole1"].InstanceCount += 1}
Now, as far as actually monitoring system load and throughput: You'll need a combination of Azure API calls and performance counter data. For instance: you can request the number of messages currently in an Azure Queue:
http://yourstorageaccount.queue.core.windows.net/myqueue?comp=metadata
You can also set up your role to capture specific performance counters. For example:
public override bool OnStart()
{
    var diagObj = DiagnosticMonitor.GetDefaultInitialConfiguration();

    // AddPerfCounter is a small helper (not shown) that adds a counter to
    // diagObj.PerformanceCounters with the given sample rate in seconds.
    AddPerfCounter(diagObj, @"\Processor(*)\% Processor Time", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Request Execution Time", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Requests Executing", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Requests/Sec", 60.0);

    // Set the service to transfer logs every minute to the storage account
    diagObj.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);

    // Start Diagnostics Monitor with the new storage account configuration
    DiagnosticMonitor.Start("DiagnosticsConnectionString", diagObj);

    return base.OnStart();
}
So this code captures a few performance counters into local storage on each role instance, then every minute those values are transferred to table storage.
The trick, now, is to retrieve those values, parse them, evaluate them, and then tweak your role instances accordingly. The Azure API will let you easily pull the perf counters from table storage. However, parsing and evaluating will take some time to build out.
Which leads me to my suggestion that you look at the Azure Dynamic Scaling Example on the MSDN code site. This is a great sample that provides:
A demo line-of-business app hosting a WCF service
A load-generation tool that pushes messages to the service at a rate you specify
A load-monitoring web UI
A scaling engine that can either be run locally or in an Azure role.
It's that last item you want to take a careful look at. It compares your performance counter data, as well as queue-length data, against a set of thresholds, and based on those comparisons it scales your instances up or down accordingly.
Even if you end up not using this engine, you can see how data is grabbed from table storage, massaged, and used for driving instance changes.
Quantifying the load is actually very application specific - particularly when thinking through the Worker Roles. For example, if you are doing a large parallel processing application, the expected/hoped for behavior would be 100% CPU utilization across the board and the 'scale decision' may be based on whether or not the work queue is growing or shrinking.
Further complicating the decision is the lag time for the various steps - increasing the Role Instance Count, joining the Load Balancer, and/or dropping from the load balancer. It is very easy to get into a situation where you are "chasing" the curve, constantly churning up and down.
As to your specific question about specific VMs: since all VMs in a Role definition are identical, measuring a single VM (unless the deployment starts with a VM count of 1) should not really tell you much - all VMs sit behind a load balancer and/or pull from the same queue, so any variance should be transitory.
My recommendation would be to pick something to monitor that is not inherently highly variable (CPU, for example, is). Generally, you want to find a trending point - for web apps it may be the response queue, for parallel apps it may be Azure queue depth, etc. - but in either case it is the trend that matters, not the absolute number. I would also suggest measuring at fairly broad intervals - minutes, not seconds. If you have a load you need to respond to in seconds, then realistically you will need to increase your running instance count ahead of time.
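A rough sketch of that "trend, not absolute number" idea in Go, with one queue-depth sample per minute; the window size and thresholds are assumptions:

package main

import "fmt"

// trend is positive if the queue has been growing over the window,
// negative if it has been shrinking.
func trend(samples []int) int {
    if len(samples) < 2 {
        return 0
    }
    return samples[len(samples)-1] - samples[0]
}

func scaleDecision(samples []int) string {
    const growThreshold = 50    // queue grew by 50+ messages over the window
    const shrinkThreshold = -50 // queue shrank by 50+ messages
    switch d := trend(samples); {
    case d >= growThreshold:
        return "scale out"
    case d <= shrinkThreshold:
        return "scale in"
    default:
        return "hold"
    }
}

func main() {
    // One sample per minute, e.g. the queue's approximate message count.
    lastTenMinutes := []int{120, 150, 180, 260, 310, 400, 480, 560, 610, 700}
    fmt.Println(scaleDecision(lastTenMinutes)) // scale out
}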
With regard to your first question, you can also use the Autoscaling Application Block to dynamically change instance counts based on a set of predefined rules.
Consider a poker game server which hosts many tables. While a player is at the lobby he has a list of all the active tables and their stats. These stats constantly change while players join, play, and leave tables. Tables can be added and closed.
Somehow, these changes must be notified to the clients.
How would you implement this functionality?
Would you use TCP/UDP for the lobby (that is, should users connect to server to observe the lobby, or would you go for a request-response mechanism)?
Would the server notify clients about each event, or should the client poll the server?
Keep in mind: maybe the most important goal of such a system is scalability. It should be easy to add more servers in order to cope with a growing audience, while all the users should see one big list that is composed from multiple servers.
This specific issue is a manifestation of a very basic question in your application design: how should clients connect to the server?
When scalability is an issue, always resort to a scalable solution using non-blocking I/O patterns, such as the Reactor design pattern. It is much preferable to use standard solutions which already have a working and tested implementation of such patterns.
Specifically in your case, which involves a fast-acting game which is constantly updating, it sounds reasonable to use a scalable server (again, non-blocking I/O), which holds a connection to each client via TCP, and updates him on information he needs to know.
Request-response cycle sounds less appropriate for your case, but this should be verified against your exact specifications for your application.
That's my basic suggestion:
The server applies updates to the list (additions, removals, and changes to existing items) through an interface that keeps a fixed-length queue of the operations that have been applied to the list. Each operation is given a timestamp. When the queue is full, the oldest operations are progressively discarded.
When the user first needs to retrieve the list, it asks the server to send it the complete list. The server sends the list along with the current timestamp.
Once every arbitrary period of time (10-30 seconds?), the client asks the server to send it all the operations that have been applied to the list since the timestamp it got.
The server then checks whether that timestamp still falls within the queue (that is, it is newer than the timestamp of the oldest retained operation), and if so sends the client the list of operations that have occurred from that time to the present, plus the current timestamp. If the timestamp is too old, the server sends the complete list again.
UDP seems to suit this approach, since it's no big deal if an "update cycle" gets lost once in a while.
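A minimal sketch of that scheme in Go; the type names, queue length, and poll handling are illustrative, and the transport is left out:

package main

import (
    "fmt"
    "time"
)

// Op is one timestamped change to the table list.
type Op struct {
    At   time.Time
    Kind string // "add", "remove", "update"
    Item string
}

type Lobby struct {
    tables       []string  // the complete, current list of tables
    ops          []Op      // recent operations, oldest first
    maxOps       int       // fixed queue length
    historySince time.Time // ops before this point have been discarded
}

func NewLobby(maxOps int) *Lobby {
    return &Lobby{maxOps: maxOps, historySince: time.Now()}
}

func (l *Lobby) Apply(op Op) {
    l.ops = append(l.ops, op)
    if len(l.ops) > l.maxOps {
        l.historySince = l.ops[0].At // history before this op is gone
        l.ops = l.ops[1:]
    }
    // ...the op would also be applied to l.tables (omitted for brevity)
}

// Sync returns the ops applied after `since`, or the full list if `since`
// predates the retained history, plus the timestamp for the next poll.
func (l *Lobby) Sync(since time.Time) (ops []Op, full []string, now time.Time) {
    now = time.Now()
    if since.Before(l.historySince) {
        return nil, l.tables, now // too old: resend the complete list
    }
    for _, op := range l.ops {
        if op.At.After(since) {
            ops = append(ops, op)
        }
    }
    return ops, nil, now
}

func main() {
    l := NewLobby(100)
    l.tables = []string{"table-1"}

    _, full, ts := l.Sync(time.Time{}) // first request: full list + timestamp
    fmt.Println("initial list:", full)

    l.Apply(Op{At: time.Now(), Kind: "add", Item: "table-2"})
    ops, _, _ := l.Sync(ts) // later poll: only the delta comes back
    fmt.Println("delta since first sync:", ops)
}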