Confusion about Micrometer metrics - isn't the gauge supposed to calculate the value automatically just before it is submitted?

I am exploring Micrometer and AWS CloudWatch, and I think there is an understanding gap here.
I've created a gauge which is supposed to return the number of connections being used in a connection pool.
public MetricService(CloudWatchConfig config) {
    this.cloudwatchMeterRegistry = new CloudWatchMeterRegistry(config, Clock.SYSTEM, CloudWatchAsyncClient.create());
    gauge = Gauge.builder("ConnectionPoolGauge", this.connectionPool, value -> {
                Double usedConnections = 0.0;
                for (Map.Entry<String, Boolean> entry : value.entrySet()) {
                    if (entry.getValue().equals(Boolean.FALSE)) {
                        usedConnections++;
                    }
                }
                return usedConnections;
            })
            .tag("GaugeName", "Bhushan's Gauge")
            .strongReference(true)
            .baseUnit("UsedConnections")
            .description("Gauge to monitor connection pool")
            .register(Metrics.globalRegistry);
    Metrics.addRegistry(cloudwatchMeterRegistry);
}
As you can see, I am currently initializing this gauge in the constructor, passing the connectionPool instance in from outside.
The following is a controller method which consumes a connection -
@GetMapping("/hello")
public String hello() {
    // connectionPool.consumeConnection();
    // finally { connectionPool.releaseConnection(); }
}
Step interval is set to 10 seconds. My understanding is that every 10 seconds, Micrometer should automatically execute the double function passed to the gauge.
Obviously, that is not happening. I've seen some code samples here which explicitly set the gauge value (in a separate thread or scheduled logic).
I also tried with a counter which is instantiated only once, and I explicitly invoke its increment method per call to the hello method. My expectation was that this counter would keep on incrementing, but after a while it drops to 0 and starts counting again.
I am totally confused. I'd appreciate it if someone could shed light on this concept.
Edit:
Tried the following approach for creating the gauge - still no luck.
cloudwatchMeterRegistry.gauge("ConnectionPoolGauge", this.connectionPool, value -> {
    Double usedConnections = 0.0;
    System.out.println("Inside Gauge value function." + value.entrySet());
    for (Map.Entry<String, Boolean> entry : value.entrySet()) {
        if (entry.getValue().equals(Boolean.FALSE)) {
            usedConnections++;
        }
    }
    return usedConnections;
});
This doesn't return the Gauge instance, so I cannot call value() on it. Also, the gauge is not visible in AWS CloudWatch, while I can see the counter I created in the same program.

Micrometer takes the stance that gauges should be sampled and not be set, so there is no information about what might have occurred between samples. After all, any intermediate values set on a gauge are lost by the time the gauge value is reported to a metrics backend anyway, so there seems to be little value in setting those intermediate values in the first place.
If it helps, think of a Gauge as a "heisen-gauge" - a meter that only changes when it is observed. Every other meter type provided out-of-the-box accumulates intermediate counts toward the point where the data is sent to the metrics backend.
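To see this sampling behaviour in isolation, here is a minimal sketch using a SimpleMeterRegistry and a made-up in-memory pool map (not your CloudWatch setup): the lambda only runs when the gauge is actually sampled, for example via value() or a registry publish.
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GaugeSamplingDemo {
    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();

        // Hypothetical pool state: connection id -> is the connection free?
        Map<String, Boolean> pool = new ConcurrentHashMap<>();
        pool.put("c1", Boolean.TRUE);
        pool.put("c2", Boolean.TRUE);

        Gauge gauge = Gauge.builder("ConnectionPoolGauge", pool,
                        p -> p.values().stream().filter(free -> !free).count())
                .register(registry);

        System.out.println(gauge.value()); // the lambda runs now -> 0.0

        pool.put("c2", Boolean.FALSE);     // state changes, nothing is "set" on the gauge

        System.out.println(gauge.value()); // the lambda runs again -> 1.0
    }
}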
So the gauge is only updated when the metrics are published. Here are a few tips for troubleshooting this:
Put a breakpoint in the publish method of your CloudWatchMeterRegistry and see whether it is called or not.
You are using the global registry (Metrics.addRegistry) as well as keeping a reference to the CloudWatchMeterRegistry (this.cloudwatchMeterRegistry = new CloudWatchMeterRegistry). You don't need both; I would suggest not using the global registry and instead injecting the registry you have wherever you need it.
I'm not sure what you are doing with the connection pool (did you implement your own?), but there is out-of-the-box support for HikariCP, and DBCP publishes JMX counters that you can bind to Micrometer.
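For example, if the pool were HikariCP, wiring its built-in Micrometer support could look roughly like the sketch below (this assumes HikariCP's MicrometerMetricsTrackerFactory is available on your classpath; the JDBC URL and pool size are placeholders). Hikari then registers its own gauges for active, idle and pending connections against the given registry.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.metrics.micrometer.MicrometerMetricsTrackerFactory;
import io.micrometer.core.instrument.MeterRegistry;

public class PoolMetricsConfig {
    public HikariDataSource pooledDataSource(MeterRegistry meterRegistry) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder URL
        config.setMaximumPoolSize(10);                              // placeholder size
        // let Hikari report its pool metrics through the injected Micrometer registry
        config.setMetricsTrackerFactory(new MicrometerMetricsTrackerFactory(meterRegistry));
        return new HikariDataSource(config);
    }
}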

Related

Cypress: global variables in callbacks

I am trying to set up a new Cypress framework and I hit a point where I need help.
The scenario I am trying to work out: a page that calls the same endpoint after every change, and I use an interceptor to wait for the call. As we all know, you can't use the same intercept alias for the same request multiple times, so I used a trick here: a dynamically named alias, which works.
var counters = {}

function registerIntercept(method, url, name) {
    counters[name] = 1;
    cy.intercept(method, url, (req) => {
        var currentCounter = counters[name]++;
        cy.wrap(counters).as(counters)
        req.alias = name + (currentCounter);
    })
}

function waitForCall(call) {
    waitForCall_(call + (counters[call.substr(1)]));
    // HERE counters[any_previously_added_key] is always 1, even though the counters entry in registerIntercept is bigger
    // I suspect counters here is not using the same counters value as registerIntercept
}

function waitForCall_(call) {
    cy.wait(call)
    cy.wait(100)
}
It is supposed to be used as waitForCall("@callalias"), and the alias will be converted to @callalias1, @callalias2 and so on.
The problem is that the global counters object works in registerIntercept, but I can't get its values from waitForCall. It always retrieves the value 1 for a specific key, while the same key in registerIntercept is already at a bigger number.
This code works if you use it as waitForCall_("@callalias4") to wait for the 4th request, for example. Each intercept will get an alias ending with an incremented number. But I don't want to keep track of how many calls were made; I want the code to retrieve that from counters and build the wait.
Any idea why counters in waitForCall does not have the same values for its keys as it has in registerIntercept?

How to record custom performance metrics with GCP Monitoring from python3 apps

I'm working on a decorator that can be added to Python methods to send a metric to GCP Monitoring. The approach is confirmed, but the API calls to push the metrics fail if I attempt to send more than one observation. The pattern is to collect metrics and flush them after the process finishes, to keep it simple for this test. The code to capture the metric inline is here:
def append(self, value):
    now = time.time()
    seconds = int(now)
    nanos = int((now - seconds) * 10 ** 9)
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": seconds, "nanos": nanos}}
    )
    point = monitoring_v3.Point({
        "interval": interval,
        "value": {"double_value": value}
    })
    self.samples[self.name].append(point)
The code below takes the batch of data points from the PerfMetric.samples dict, which maps metric names to arrays of monitoring_v3.Point objects built by the append method above (attached via a decorator not shown here), and calls the create_time_series RPC using the MetricServiceClient class. We end up pointing at an array of arrays, so perhaps that's not right, or somehow our metadata isn't right in append?
@staticmethod
def flush():
    client = monitoring_v3.MetricServiceClient()
    for x in PerfMetric.samples:
        print('{} has {} points'.format(x, len(PerfMetric.samples[x])))
        series = monitoring_v3.TimeSeries()
        series.metric.type = 'custom.googleapis.com/perf/{}'.format(x)
        series.resource.type = "global"
        series.points = PerfMetric.samples[x]
        client.create_time_series(request={
            "name": PerfMetric.project_name,
            "time_series": [series]
        })
Thanks in advance for any suggestions!
I believe this is a documented limitation in the TimeSeries call from the Cloud Monitoring API regarding the points[] object for its data points:
When creating a time series, this field must contain exactly one point and the point's type must be the same as the value type of the associated metric.

RunnableGraph to wait for multiple responses from source

I am using Akka in a Play controller and performing ask() on an actor named publish; internally the publish actor performs ask on multiple actors and passes on the sender reference. The controller needs to wait for the responses from multiple actors and create a list of the responses.
Please find the code below, but this code only waits for one response and then terminates. Please suggest.
// Performs ask to publish actor
Source<Object, NotUsed> inAsk = Source.fromFuture(ask(publishActor, service.getOfferVerifyRequest(request).getPayloadData(), 1000));
final Sink<String, CompletionStage<String>> sink = Sink.head();
final Flow<Object, String, NotUsed> f3 = Flow.of(Object.class).map(elem -> {
    log.info("Data in Graph is " + elem.toString());
    return elem.toString();
});
RunnableGraph<CompletionStage<String>> result = RunnableGraph.fromGraph(
    GraphDSL.create(
        sink, (builder, out) -> {
            final Outlet<Object> source = builder.add(inAsk).out();
            builder
                .from(source)
                .via(builder.add(f3))
                .to(out); // to() expects a SinkShape
            return ClosedShape.getInstance();
        }
    ));
ActorMaterializer mat = ActorMaterializer.create(aSystem);
CompletionStage<String> fin = result.run(mat);
fin.toCompletableFuture().thenApply(a -> {
    log.info("Data is " + a);
    return true;
});
log.info("COMPLETED CONTROLLER ");
If you expect several responses, ask won't cut it; that is only for a single request-response where the response ends up in a Future/CompletionStage.
There are a few different strategies to wait for all answers:
One is to create an intermediate actor whose only job is to collect all answers and then, when all partial responses have arrived, respond to the original requestor; that way you could use ask to get a single aggregate response back.
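A rough sketch of such an aggregator with Akka classic actors is shown below; the Aggregator class and its PublishRequest message are made up for illustration, and it simply assumes every worker replies with a String.
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import java.util.ArrayList;
import java.util.List;

public class Aggregator extends AbstractActor {
    public static class PublishRequest {}          // placeholder request message

    private final List<ActorRef> workers;
    private final List<String> replies = new ArrayList<>();
    private ActorRef originalSender;

    public Aggregator(List<ActorRef> workers) {
        this.workers = workers;
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(PublishRequest.class, req -> {
                // remember who asked and fan the request out to all workers
                originalSender = getSender();
                workers.forEach(w -> w.tell(req, getSelf()));
            })
            .match(String.class, reply -> {
                replies.add(reply);
                if (replies.size() == workers.size()) {
                    // all partial responses arrived: answer the original asker with the list
                    originalSender.tell(new ArrayList<>(replies), getSelf());
                    getContext().stop(getSelf());
                }
            })
            .build();
    }
}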
Another option would be to use Source.actorRef to get an ActorRef that you could use as the sender together with tell (and skip using ask). Inside the stream you would then take elements until some criterion is met (time has passed or enough elements have been seen). You may have to add an operator to mimic the ask response timeout to make sure the stream fails if an actor never responds.
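A sketch of that Source.actorRef approach, assuming you know (or can bound) the number of expected responses; workers and request are placeholders, and the exact Source.actorRef and completionTimeout signatures vary a bit between Akka versions (older ones take a scala FiniteDuration).
import akka.actor.ActorRef;
import akka.japi.Pair;
import akka.stream.ActorMaterializer;
import akka.stream.OverflowStrategy;
import akka.stream.javadsl.Keep;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletionStage;

public class CollectResponses {
    public CompletionStage<List<String>> collect(List<ActorRef> workers, Object request, ActorMaterializer mat) {
        Pair<ActorRef, CompletionStage<List<String>>> pair =
            Source.<String>actorRef(16, OverflowStrategy.dropHead())
                .take(workers.size())                        // complete once every worker has replied
                .completionTimeout(Duration.ofSeconds(5))    // fail the stream if some worker never responds
                .toMat(Sink.seq(), Keep.both())
                .run(mat);

        ActorRef streamRef = pair.first();
        // use the materialized ActorRef as the sender, so replies flow straight into the stream
        workers.forEach(w -> w.tell(request, streamRef));

        return pair.second();
    }
}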
There are some other issues with the code shared. One is creating a materializer on each request: materializers have a lifecycle and will fill up your heap over time, so you should rather have a materializer injected by Play.
With the given logic there is no need whatsoever to use the GraphDSL; that is only needed for complex streams with multiple inputs and outputs or cycles. You should be able to compose operators using the Flow API alone (see for example https://doc.akka.io/docs/akka/current/stream/stream-flows-and-basics.html#defining-and-running-streams).
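For the single-response case in the posted code, a plain linear stream with no GraphDSL would be something like this sketch (Patterns.ask with a java.time.Duration is assumed to exist in your Akka version; otherwise the existing ask(..., 1000) form can be wrapped the same way):
import akka.actor.ActorRef;
import akka.pattern.Patterns;
import akka.stream.ActorMaterializer;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import java.time.Duration;
import java.util.concurrent.CompletionStage;

public class LinearStream {
    public CompletionStage<String> callPublish(ActorRef publishActor, Object payload, ActorMaterializer mat) {
        return Source.fromCompletionStage(Patterns.ask(publishActor, payload, Duration.ofSeconds(1)))
            .map(Object::toString)         // same transformation as the f3 flow above
            .runWith(Sink.head(), mat);    // materializes directly to a CompletionStage<String>
    }
}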

How can I prove fields-grouping functionality (tuples with the same field value go to the same task)?

I'm new to Storm and I'm getting started with the storm-starter project. In this project there is a topology called WordCountTopology; the key code for building the topology is:
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
and in the implementation of the WordCount bolt, the key method execute is:
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
    String word = tuple.getString(0);
    Integer count = counts.get(word);
    if (count == null)
        count = 0;
    count++;
    counts.put(word, count);
    collector.emit(new Values(word, count));
}
My question is:
The functionality of fields-grouping is that tuples with the same field word will go to the same task for further processing, where "task" means thread. How can I prove this functionality? In addition, in my opinion the logic in the execute method is a little awkward: within a single task, the parameter tuple always carries the same word, but the execute method does not reflect this; in other words, the logic does not use this convenience.
Am I clear? My point is that the code in execute does not take the fields-grouping feature into account; the same code could also be applied in a shuffle-grouping situation.
I would like to cite a few points; they might help clear your doubts.
Here "task" means thread
In Storm's terminology, tasks are NOT threads; they are responsible for executing the actual logic. Each spout or bolt that you implement in your code executes as a number of tasks across the cluster, so you can think of a task as a running instance of a component, i.e. a spout or bolt.
There is another entity called an executor, which is the thread responsible for running these tasks. An executor can run one or multiple tasks of the same component; an executor having multiple tasks means the same component is executed multiple times by that executor.
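As an illustration with made-up numbers (the spout and bolt classes come from storm-starter, and the package prefix is org.apache.storm in recent versions, backtype.storm in older ones): the parallelism hint sets the number of executors, while setNumTasks sets the number of tasks.
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountParallelism {
    public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomSentenceSpout(), 2);

        // 8 executors (threads), one task each by default
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");

        // 6 executors (threads) running 12 WordCount tasks, partitioned by the "word" field
        builder.setBolt("count", new WordCount(), 6)
               .setNumTasks(12)
               .fieldsGrouping("split", new Fields("word"));

        return builder;
    }
}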
Now coming back to your question
the code here in execute is not taking the feature of filed-grouping into account, the code here can also be applied to the situation of shuffle-grouping
Very briefly: a fields grouping lets you partition a stream by a subset of its fields. For a word count, if we partition the stream using fieldsGrouping on a field named first_name, then all tuples whose first_name field has a given value, say (Foo), are expected to go to the same task, while the same field with a different value (Bar) may go to another task.
So here the execute method can rely on receiving all the tuples for a given field value, and thus can simply update its counter without doing anything special. The whole logic is written assuming the bolt will be fed the proper data, which is why using the proper grouping is so important. If you use shuffleGrouping instead, the same code will run but produce incorrect counts.
Well Pinky (or anyone else who finds this useful), to prove it, you just have to keep track of the bolt or spout task ID:
@Override
public void prepare(Map map, TopologyContext tc, OutputCollector oc) {
    this.boltId = tc.getThisTaskId();
}
Now in the execute() of the same fieldsGrouped bolt that receives the tuples, you just print the id and the tuple:
@Override
public void execute(Tuple tuple) {
    String myWord = (String) tuple.getValue(0);
    System.out.println("word: " + myWord + " boltID: " + boltId);
}

How to configure Lucene (SOLR) internal caching - memory issue/leak?

I am using SOLR 4.4.0 and I found a (possible) issue related to the internal caching mechanism.
JVM: -Xmx=15g, but 12g of it was never free.
I created a heap dump and analyzed it using MemoryAnalyzer; I found 2 x 6GB used as cache data.
The second time I did the same with -Xmx12g and found 1 x 3.5GB.
It was always the same cache.
I checked the source code and found:
/** Expert: The cache used internally by sorting and range query classes. */
public static FieldCache DEFAULT = new FieldCacheImpl();
see http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.4.0/org/apache/lucene/search/FieldCache.java#FieldCache.0DEFAULT
This is very bad news because it is a public static field and it is used in about 160 places in the source code.
MemoryAnalyzer says:
One instance of "org.apache.lucene.search.FieldCacheImpl" loaded by
"org.apache.catalina.loader.WebappClassLoader # 0x58c3a9848" occupies
4,103,248,240 (80.37%) bytes. The memory is accumulated in one
instance of "java.util.HashMap$Entry[]" loaded by "".
Keywords java.util.HashMap$Entry[]
org.apache.catalina.loader.WebappClassLoader # 0x58c3a9848
org.apache.lucene.search.FieldCacheImpl
I do not know how to manage this kind of cache - any advice?
And finally I got an OutOfMemoryError, plus 12GB of memory is blocked.
I implemented a kind of workaround:
I created this kind of class:
public class InternalApplicationCacheManager implements InternalApplicationCacheManagerMBean {
    public synchronized int getInternalCacheSize() {
        return FieldCache.DEFAULT.getCacheEntries().length;
    }

    public synchronized void purgeInternalCaches() {
        FieldCache.DEFAULT.purgeAllCaches();
    }
}
and registered it in JMX via org.apache.lucene.search.FieldCacheImpl
...
private synchronized void init() {
    ...
    initBeans();
}

private void initBeans() {
    try {
        InternalApplicationCacheManager cacheManagerMBean = new InternalApplicationCacheManager();
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.apache.lucene.search.jmx:type=InternalApplicationCacheManager");
        mbs.registerMBean(cacheManagerMBean, name);
    } catch (InstanceAlreadyExistsException e) {
        ...
    }
}
...
This solution lets you invalidate the internal caches, which partially solves the issue.
Unfortunately there are other places (mostly caches) where some data is stored and not removed as quickly as I expect.
If you use FieldCacheRangeFilter you may want to try range filters which work without the field cache. If sorting is the issue, you may try using fewer sort fields or fields with a data type that uses less memory.
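For instance, on Lucene 4.x a numeric range can be searched with NumericRangeQuery, which walks the indexed trie terms instead of populating the FieldCache (the field name and bounds here are made up):
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.TopDocs;

public class RangeWithoutFieldCache {
    public TopDocs priceBetween(IndexSearcher searcher, long min, long max) throws Exception {
        // does not touch FieldCache; relies on the indexed numeric (trie) terms instead
        NumericRangeQuery<Long> query =
            NumericRangeQuery.newLongRange("price", min, max, true, true);
        return searcher.search(query, 100);
    }
}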
The field cache for each reader/atomic reader is thrown away when the reader is garbage collected, so a re-initialization of the reader should clear the cache, which also means that the first operation using the cache will be a lot slower.
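A sketch of that re-initialization on Lucene 4.x, assuming currentReader is the DirectoryReader your searcher was built on: once the old reader is closed and garbage collected, its FieldCache entries go with it.
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;

public class ReaderRefresher {
    private DirectoryReader currentReader;
    private IndexSearcher searcher;

    public synchronized void refresh() throws Exception {
        DirectoryReader newReader = DirectoryReader.openIfChanged(currentReader);
        if (newReader != null) {
            DirectoryReader oldReader = currentReader;
            currentReader = newReader;
            searcher = new IndexSearcher(currentReader);
            // closing the old reader lets it (and its FieldCache entries) be collected
            oldReader.close();
        }
    }
}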
The fact is: FieldCache-based range filters and sorting rely on the cache. There is no getting around that when you really need those; you can only adapt your usage to minimize memory consumption.
