Query for a cache hit rate graph with Prometheus - Spring Boot

I'm using Caffeine cache with a Spring Boot application. All metrics are enabled, so I have them in Prometheus and Grafana.
Based on the cache_gets_total metric I want to build a hit-rate graph.
I've tried to get the cache hits:
delta(cache_gets_total{result="hit",name="myCache"}[1m])
and all gets from cache:
sum(delta(cache_gets_total{name="myCache"}[1m]))
Both queries work correctly and return values. But when I try to compute the hit ratio, I get no data points. The query I've tried:
delta(cache_gets_total{result="hit",name="myCache"}[1m]) / sum(delta(cache_gets_total{name="myCache"}[1m]))
Why doesn't this query work, and how can I build a hit-rate graph from the information I have from Spring Boot and Caffeine?

Run both queries ("cache hits" and "all gets") individually in Prometheus and compare the label sets you get with the results.
For the "/" operation to work, both sides have to have exactly the same labels (and values). Usually some aggregation is required to "drop" unwanted dimensions/labels (for example, if each query already returns a single value, just wrap both in sum() before dividing).

First of all, it is recommended to use increase() instead of delta() for calculating the increase of a counter over the specified lookbehind window. The increase() function properly handles counter resets to zero, which may happen on service restart, while delta() would return incorrect results if the given lookbehind window covers a counter reset.
Next, Prometheus searches for pairs of time series with identical sets of labels when performing the / operation, then applies the operation individually to each pair of time series. Time series returned from increase(cache_gets_total{result="hit",name="myCache"}[1m]) have at least two labels, result="hit" and name="myCache", while the time series returned from sum(increase(cache_gets_total{name="myCache"}[1m])) has zero labels, because sum() removes all labels after aggregation.
Prometheus provides a solution to this issue: the on() and group_left() modifiers. The on() modifier limits the set of labels used when searching for pairs of time series with identical label sets, while the group_left() modifier allows matching multiple time series on the left side of / with a single time series on the right side of the / operator. See these docs. So the following query should return the cache hit rate:
increase(cache_gets_total{result="hit",name="myCache"}[1m])
/ on() group_left()
sum(increase(cache_gets_total{name="myCache"}[1m]))
Alternative solutions exist:
Remove all the labels from increase(cache_gets_total{result="hit",name="myCache"}[1m]) with the sum() function:
sum(increase(cache_gets_total{result="hit",name="myCache"}[1m]))
/
sum(increase(cache_gets_total{name="myCache"}[1m]))
Wrap the right part of the query in the scalar() function. This enables the vector op scalar matching rules described here:
increase(cache_gets_total{result="hit",name="myCache"}[1m])
/
scalar(sum(increase(cache_gets_total{name="myCache"}[1m])))
It is also possible to get the cache hit rate for all caches with a single query via the sum(...) by (name) template:
sum(increase(cache_gets_total{result="hit"}[1m])) by (name)
/
sum(increase(cache_gets_total[1m])) by (name)

Related

PromQL: joining multiple metrics

I'm using Sysdig monitoring in IBM Cloud.
I have these two metrics
first:
sum by(container_image_repo,container_image_tag) (sysdig_container_cpu_cores_used)
which returns, by repo and tag, the total CPU used (Value_A)
second:
count by(container_image_repo,container_image_tag)(sysdig_container_info)
which returns, by repo and tag, the total number of containers (Value_B)
My problem is that I would like a single request which returns the two values at the same time, by repo and tag, i.e.:
Repo Tag Value_A Value_B
Any hints?
I tried joining the two requests:
sum by(container_image_repo,container_image_tag) (sysdig_container_cpu_cores_used)
  * on(container_image_repo,container_image_tag)
count by(container_image_repo,container_image_tag) (sysdig_container_info)
but I still get a single value per series (which is the product of the two values, A*B, grouped by repo and tag. No surprise indeed...)
Thank you
This is not related to Sysdig, but to the way PromQL is designed. In PromQL, when you apply a function over a vector, the resulting vector or scalar does not contain the metric name (since it is not the same metric anymore, but a derivative of it).
In your example, the two metrics you are using denote two different things:
sysdig_container_cpu_cores_used: the number of CPU cores a particular container occupies.
sysdig_container_info: a set of additional labels for each container. In order not to attach all container information (such as agent id, container id, the id of the image in the container, image digest value, etc.) to every container metric, that information is kept separate; when you need it, you can join a container metric with sysdig_container_info to enrich it with these additional labels, as sketched below.
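A minimal sketch of such an "info metric" join, assuming sysdig_container_info follows the usual convention of carrying the value 1 and that container_id is a label shared by both metrics (both assumptions; adjust the on() and group_left() label lists to your actual label sets):
sysdig_container_cpu_cores_used
  * on(container_id) group_left(container_image_repo, container_image_tag)
sysdig_container_info
Because the info metric's value is 1, the multiplication leaves the CPU value unchanged and only copies the labels listed in group_left() from the right side onto the result.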
In my opinion, your query gives you all the relevant info you stated you need:
e.g.
Repo Tag Value (CPU used)
k8s.gcr.io/kops/kops-controller 1.22.3 0.00148
Disclosure:
I work as an engineer at Sysdig, but my answers/comments are strictly my own.

How do I filter the results for only the custom graphs in the JMeter HTML report?

I've tried using the following under the custom graph definition, but it filters the entire report:
## Custom graph definition
jmeter.reportgenerator.graph.custom_mm_hit.classname=org.apache.jmeter.report.processor.graph.impl.ResponseTimeOverTimeGraphConsumer
jmeter.reportgenerator.graph.custom_mm_hit.title=Login Response Time Comparison
jmeter.reportgenerator.graph.custom_mm_hit.property.set_Y_Axis=Response Time (ms)
jmeter.reportgenerator.graph.custom_mm_hit.property.set_X_Axis=Over Time
jmeter.reportgenerator.graph.custom_mm_hit.property.set_granularity=${jmeter.reportgenerator.overall_granularity}
jmeter.reportgenerator.graph.custom_mm_hit.property.setSampleVariableName=label
jmeter.reportgenerator.graph.custom_mm_hit.property.setContentMessage=Message for graph point label
jmeter.reportgenerator.exporter.html.series_filter=^(Run 1 Login|Run 2 Login)(-success|-failure)?$
How can I provide separate filtering for each custom graph?
For instance, there are 3 transactions being monitored, but I would like to split one out onto its own Response Time Over Time custom graph while keeping all 3 on the original Response Time Over Time graph in the Charts dropdown.
Thanks in advance!
As per JMeter Properties Reference:
jmeter.reportgenerator.exporter.html.series_filter
Regular Expression which Indicates which graph series are filtered in display.
Empty value means no filtering.
Defaults to empty value.
So a filter can only be applied to all HTML charts at once.
The only workaround I can think of is storing the particular transaction's response time into a Sample Variable and plotting it as a custom chart, as sketched below.
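A rough sketch of that workaround. The variable name login_rt is an assumption, and the CustomGraphConsumer class name is my assumption as well (the question's properties use ResponseTimeOverTimeGraphConsumer); check the Generating Dashboard documentation for your JMeter version before relying on it:
# user.properties: record the variable with every sample result
sample_variables=login_rt
# In the test plan, a JSR223 PostProcessor attached only to the Login sampler (Groovy):
#   vars.put("login_rt", String.valueOf(prev.getTime()))
## Custom graph fed by the Sample Variable
jmeter.reportgenerator.graph.custom_login_rt.classname=org.apache.jmeter.report.processor.graph.impl.CustomGraphConsumer
jmeter.reportgenerator.graph.custom_login_rt.title=Login Response Time
jmeter.reportgenerator.graph.custom_login_rt.property.set_Y_Axis=Response Time (ms)
jmeter.reportgenerator.graph.custom_login_rt.property.set_X_Axis=Over Time
jmeter.reportgenerator.graph.custom_login_rt.property.set_granularity=${jmeter.reportgenerator.overall_granularity}
jmeter.reportgenerator.graph.custom_login_rt.property.setSampleVariableName=login_rt
jmeter.reportgenerator.graph.custom_login_rt.property.setContentMessage=Login response time: 
The intent is that only the Login sampler populates the variable, so the custom graph plots just that transaction while the report-wide series_filter stays untouched.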
An alternative solution would be uploading your test results to the BM.Sense analysis service, where you can apply whatever filters you want on the Composite Timeline Analysis tab.

Is Elasticsearch Scroll API not recommended for real-time pagination?

I understand that the Elasticsearch Scroll API is not intended for real-time user requests. But would it be bad if it's used for that? I have a requirement to implement paginated results (to be displayed on a web frontend), and the from/size approach is returning duplicates across pages, presumably because I have a sharded setup (with no replicas at all). I've tried setting preference but it did not help.
The Scroll API does not seem to have this issue; I'm wondering if it's really bad to use it for my use case?
Thanks
Results from a scrolling search reflect the state of the index at the time of the initial search request; subsequent indexing or document changes only affect later search and scroll requests. This means your pagination is based on the point in time you requested the search, so you won't see new documents, and deleted documents may still appear in your results. Also, the Scroll API is no longer recommended by Elasticsearch for deep pagination (as of ES 7.x). You can find more info on the Elasticsearch documentation page: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/scroll-api.html
On the question of why you get duplicate results: I think this is caused by intermediate indexing. When doing independent search calls with pagination, each call runs independently (still using some caching). So if you ask for the first 100, you get the first 100 at that time. When you then ask x seconds later for the 'next' 100, you get results 100-199 as of x seconds later. If a new document was indexed in the meantime which logically fits in the first 100, it pushes the rest further down. This way, your result #100 (first in the second call) might have been #99 in the first call. When you then glue the pages together in the UI, you see the same result twice.
Both scroll and search_after are designed to refer Elasticsearch back to the original call, indicating that you want to continue from that point onwards.
I have not found a good explanation, though, of why search_after is better than scroll.
I assume that scroll is optimized for the use case where you will go through the entire result set anyway (so the pagination only avoids overloading the client and the pipe between Elasticsearch and the client with chunks that are too big at once), while search_after is optimized for the use case where you are likely to go only a few pages deep (human users tend to stay on the first page and go much further with quickly decreasing frequency, because scanning overwhelming amounts of information is tiring). Implementing good filters in the user interface is the much better approach.
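For reference, a minimal search_after sketch (the index name, sort fields, and sort values below are assumptions): each page repeats the same query and sort, passing the sort values of the last hit from the previous page:
GET /my-index/_search
{
  "size": 20,
  "query": { "match_all": {} },
  "sort": [ { "created_at": "asc" }, { "id": "asc" } ],
  "search_after": [1617180000000, "doc-1024"]
}
The second sort field acts as a tiebreaker so the ordering is deterministic; without it, pages could still overlap.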

What's the expected behavior of the Bing Search API v5 when deeply paginating?

I perform a Bing API search for web pages with the query cameras.
The first "page" of results (offset=0, count=50) returns 49 actual results. It also returns a totalEstimatedMatches of 114000000 -- 114 million. Neat, that's a lot of results.
The second "page" of results (offset=49, count=50) performs similarly...
...until I reach page 7 (offset=314, count=50). Suddenly totalEstimatedMatches is 544.
And the actual count of results returned per-page trails off precipitously from there. In fact, over 43 "pages" of results, I get 413 actual results, of which only 311 have unique URLs.
This appears to happen for any query after a small number of pages.
Is this expected behavior? There's no hint from the API documentation that exhaustive pagination should lead to this behavior... but there you have it.
Each time the API is called, the search API obtains a group of possible matches starting at the specified offset in the result set, and then filters out results based on different parameters (e.g. spam, duplicates, SafeSearch setting, etc.), finally leaving a final result set. If the final result set after filtering and optimization contains more than count results, then count results are returned. If count is larger than the final result set, then the whole final result set is returned, which will be fewer than count results. If the search API is called again, passing in the offset parameter to get the next set of results, then the filtering process happens again on that next set of results, which means it may also return fewer than count results.
 
You should not expect the full count parameter number of results to always be returned for each API call.  If further search results beyond the number returned are required then the query should be called again, passing in the offset parameter with a value equal to the number of results returned in the previous API call.  This also means that when making subsequent API calls, the offset parameter should never be a hard coded value and should always be calculated based on the results of previous queries. 
 
totalEstimatedMatches can also add to confusion around the Bing Search API results.  The word ‘estimated’ is important because the number is an estimation based on an initial quick result set, prior to the filtering described above.  Additionally, the totalEstimatedMatches value can change as you iterate through the result set by making subsequent API calls with increasing offset values.  The totalEstimatedMatches should only be used as a rough guide indicating the magnitude of the possible result set, and it should not be used to determine the number of results that will ultimately be returned.  To query all of the possible results you should continue making API calls, passing in offset with a value of the sum of the results returned in previous calls, until that sum is greater than totalEstimatedMatches of the most recent API call.
 
Note that you can see this same behavior by going to bing.com directly and using a query such as https://www.bing.com/search?q=bill+gates&count=50.  Notice that you will get around 34 results with a totalEstimatedMatches of ~567,000 (valid as of June 2017, future searches may change), and if you click the 'next page' arrow you will see that the next query executed will start at the offset of the 34 returned in the first query (ie. https://www.bing.com/search?q=bill+gates&count=50&first=34).  If you click ‘next’ several more times you may see the totalEstimatedMatches also change from page to page.
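To make the offset bookkeeping described above concrete, here is a rough sketch; the v5 endpoint URL, the Ocp-Apim-Subscription-Key header, and the webPages.value / webPages.totalEstimatedMatches response fields are assumptions based on the Cognitive Services documentation, so verify them against your subscription:
const ENDPOINT = 'https://api.cognitive.microsoft.com/bing/v5.0/search'; // assumed v5 endpoint
const KEY = process.env.BING_KEY; // your subscription key

async function fetchAllResults(query) {
  const results = [];
  let offset = 0;
  let estimated = Infinity;
  while (offset < estimated) {
    const url = `${ENDPOINT}?q=${encodeURIComponent(query)}&count=50&offset=${offset}`;
    const res = await fetch(url, { headers: { 'Ocp-Apim-Subscription-Key': KEY } });
    const body = await res.json();
    const page = (body.webPages && body.webPages.value) || [];
    if (page.length === 0) break;                      // nothing more to fetch
    results.push(...page);
    offset += page.length;                             // advance by what was actually returned
    estimated = body.webPages.totalEstimatedMatches;   // may shrink as you paginate deeper
  }
  return results;
}
The loop advances offset by the number of results actually returned rather than by the requested count, and treats totalEstimatedMatches only as a rough stopping condition, per the guidance above.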
This seems to be expected behavior. The Web Search API is not a crawler API; it only delivers results that the algorithms deem relevant for a human. Simply put, most humans won't skim through more than a few pages of results; furthermore, they expect to find relevant results on the first page.
If you could retrieve the results in the millions, you could simply copy their search index and Bing would be out of business.
Search indices seem to be things of political and economic power, as far as I know there are only four relevant search indices world wide: from Google, from Microsoft (Bing), from Russia, and from China.
Those who control the search, control the Spice... ;-)

How does Parse Query.each count towards execution limits

I am wondering how the each command on a Parse Query counts towards the request execution limits. I am building an app that will need to perform a function on many objects (could be more than 1000) in a parse class.
For example (in JavaScript),
var query = new Parse.Query(Parse.User);
query.equalTo('anObjectIWant',true); //there could be more than 1000 objects I want
query.each(function(object) {
  doSomething(object); // doSomething does NOT involve another Parse request
});
So, will the above code count as 1 request towards my Parse application execution limit (you get 30/second free), or will each object (each invocation of the "each" callback) use one request (so 1000 objects would be 1000 requests)?
I have evaluated the resource usage by observing the number of API requests made by query.each() for different result set sizes. The bottom line is that (at the time of writing) this function uses the default query result count limit of 100. Thus, if your query matches up to 100 results it will make 1 API request, 2 API requests for 101-200, and so forth.
This behavior cannot be changed by manually increasing the limit to the maximum using query.limit(1000); if you do, you will get an error when you call query.each() afterwards (this is also mentioned in the documentation).
Therefore, consider implementing this functionality manually (e.g., by paging with query.find()), which allows you to set the query limit to 1000 and thus, in the best case, consume only one-tenth of the API requests query.each() would consume.
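A minimal sketch of that manual paging, assuming the Parse JavaScript SDK's promise-based query.find(); the loop structure and batch size are illustrative, not a drop-in replacement:
async function processAll() {
  const batchSize = 1000;                // maximum query limit mentioned above
  let skip = 0;
  while (true) {
    const query = new Parse.Query(Parse.User);
    query.equalTo('anObjectIWant', true);
    query.limit(batchSize);
    query.skip(skip);
    const batch = await query.find();    // one API request per batch of up to 1000 objects
    batch.forEach(doSomething);
    if (batch.length < batchSize) break; // last page reached
    skip += batchSize;
  }
}
Note that skip-based paging can miss or repeat objects if the data changes between batches; ordering by a stable field (e.g. createdAt) and filtering on it instead of using skip is the more robust variant.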
This would count as 1 or 2 requests, depending on where it runs:
If it is run from a Cloud Code function, it counts as 2: 1 for the Cloud Code call + 1 for the query. Since the query gets its results all at once, it is a single call.
If it is placed within "beforeSave" functions or similar, then only the query would be counted: 1 API call.
So you should be pretty fine as long as you don't trigger another Parse API call for each result.
I would not be surprised if the .each method queried the server on each iteration.
You can actually check this using their "control panel"; just look at the number of requests being made.
We left Parse after doing some prototyping. One of the reasons was that, while using proper and suggested code from the Parse website, I managed to create 6500 requests a day while being the only one using the app.
Using our own API, we are down to no more than 100.

Resources