Micrometer metrics in multiple instances of a Spring Boot application - spring-boot

I have a custom Micrometer metric in a Spring Boot application configured with Prometheus, which scrapes the metrics every 15s.
The custom metric queries the DB every minute. As I have 2 instances of this service running, both instances try to run the same query every minute.
package com.test;

import com.entity.Foo;
import com.repo.FooRepository;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import java.util.function.Supplier;

@Component
public class MonitoringService {

    private final MeterRegistry meterRegistry;
    private final Gauge fooCount;
    private final FooRepository<Foo> fooRepository;

    @Autowired
    public MonitoringService(final FooRepository<Foo> fooRepository,
                             final MeterRegistry meterRegistry) {
        this.fooRepository = fooRepository;
        this.meterRegistry = meterRegistry;
        fooCount = Gauge.builder("foo_count", checkFooCount())
                .description("Number of foo count")
                .register(meterRegistry);
    }

    @Scheduled(fixedDelayString = "PT1M", initialDelayString = "PT1M")
    public Supplier<Number> checkFooCount() {
        return () -> fooRepository.getTotalFooCount();
    }
}
Is there any way I can configure this metric so that the query runs on only one instance of my Spring Boot application?

Depending on your runtime environment, you could either use Quartz with a persistent job store and its clustering option, or use a single dedicated app to get this job done. For the latter you might want to use something like Kubernetes Jobs.
For Quartz, see Configure Clustering with JDBC-JobStore.
For Kubernetes Jobs, see the Kubernetes documentation for Jobs.
Quartz can schedule the job on just one instance while all instances are configured identically. The Kubernetes Job would be a separate pod next to your application that does just this task: it would start, fetch the data and end itself. (You could add some delay to allow the data to be collected by Prometheus before the pod stops.) Both job frameworks work with cron-like configuration.
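If you go the Quartz route, the job itself stays small. A minimal sketch, assuming clustering is already enabled through the JDBC-JobStore (org.quartz.jobStore.isClustered=true in quartz.properties) and that the repository somehow gets injected into the job, for example via Spring's SpringBeanJobFactory; the class and field names are made up:
import com.entity.Foo;
import com.repo.FooRepository;
import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;

import java.util.concurrent.atomic.AtomicLong;

// With a clustered JDBC-JobStore, only the node that acquires the Quartz lock fires the trigger.
@DisallowConcurrentExecution
public class FooCountJob implements Job {

    // Shared value holder; register a gauge over it once at startup, e.g.
    // Gauge.builder("foo_count", LAST_COUNT, AtomicLong::get).register(meterRegistry);
    public static final AtomicLong LAST_COUNT = new AtomicLong();

    // Assumed to be injected by SpringBeanJobFactory (hypothetical wiring).
    private FooRepository<Foo> fooRepository;

    @Override
    public void execute(JobExecutionContext context) {
        // Only one node per firing runs the counting query (assuming getTotalFooCount() returns a long).
        LAST_COUNT.set(fooRepository.getTotalFooCount());
    }
}
Keep in mind that only the instance that last ran the job will expose a fresh foo_count, so on the Prometheus side you would typically aggregate across instances, for example with max().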
But wouldn't it be a better solution to use database tools instead of a Java application for counting database rows?
With mysqld_exporter you get at least some table statistics; see info_schema_tablestats.go.
For custom queries, see Have Prometheus send a SQL Query.

Sharing the metric's query result among the instances will most likely require your services to interact with a third party that stores the latest result. That might be a dedicated table in a DB, a distributed configuration server like etcd or ZooKeeper, or a distributed cache like Apache Ignite. Obviously, any of these adds complexity to the system and requires more effort to set up and maintain than just letting all the instances run a counting query.
On the other hand, you can try to optimize the query itself if it is really slow. For example, if insertions into the queried table are relatively infrequent, you could set up an aggregation on the DB side: with each write to the table, also update the total count in a separate table. This way your services run a simple query that returns the already prepared total count as a single row.
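For illustration, a rough sketch of that second idea, assuming a hypothetical single-row table foo_count_summary(total) that is kept current on every write (by a trigger or by the writing code); the table and class names are made up:
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class FooCountGauge {

    public FooCountGauge(JdbcTemplate jdbcTemplate, MeterRegistry meterRegistry) {
        // Every instance still queries on each gauge read, but it is now a cheap
        // single-row lookup instead of a COUNT over the whole base table.
        Gauge.builder("foo_count",
                () -> jdbcTemplate.queryForObject("SELECT total FROM foo_count_summary", Long.class))
            .description("Number of foo rows, pre-aggregated on the DB side")
            .register(meterRegistry);
    }
}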

Related

Hazelcast persisting and loading data on all nodes

I have a 2-node distributed cache setup which needs persistence set up for both members.
I have MapStore and MapLoader implemented, and the same code is deployed on both nodes.
The MapStore and MapLoader work absolutely fine on a single-member setup, but after another member joins, the MapStore and MapLoader continue to work only on the first member, and all inserts or updates by the second member are persisted to disk via the first member.
My requirement is that each member should be able to persist to disk independently, so that the distributed cache is backed up on all members and not just the first member.
Is there a setting I can change to achieve this?
Here is my Hazelcast Spring configuration.
@Bean
public HazelcastInstance hazelcastInstance(H2MapStorage h2mapStore) throws IOException {
    MapStoreConfig mapStoreConfig = new MapStoreConfig();
    mapStoreConfig.setImplementation(h2mapStore);
    mapStoreConfig.setWriteDelaySeconds(0);
    YamlConfigBuilder configBuilder = null;
    if (new File(hazelcastConfiglocation).exists()) {
        configBuilder = new YamlConfigBuilder(hazelcastConfiglocation);
    } else {
        configBuilder = new YamlConfigBuilder();
    }
    Config config = configBuilder.build();
    config.setProperty("hazelcast.jmx", "true");
    MapConfig mapConfig = config.getMapConfig("requests");
    mapConfig.setMapStoreConfig(mapStoreConfig);
    return Hazelcast.newHazelcastInstance(config);
}
Here is my Hazelcast YAML config. It is placed in /opt/hazlecast.yml, which is picked up by my Spring config above.
hazelcast:
  group:
    name: tsystems
  management-center:
    enabled: false
    url: http://localhost:8080/hazelcast-mancenter
  network:
    port:
      auto-increment: true
      port-count: 100
      port: 5701
    outbound-ports:
      - 0
    join:
      multicast:
        enabled: false
        multicast-group: 224.2.2.3
        multicast-port: 54327
      tcp-ip:
        enabled: true
        member-list:
          - 192.168.1.13
The entire code is available here:
https://bitbucket.org/samrat_roy/hazelcasttest/src/master/
This might just be bad luck and low data volumes, rather than an actual error.
On each node, try running the localKeySet() method and printing the results.
This will tell you which keys are on which node in the cluster. The node that owns key "X" will invoke the map store for that key, even if the update was initiated by another node.
If you have low data volumes, it may not be a 50/50 data split. At an extreme, 2 data records in a 2-node cluster could have both data records on the same node.
If you have 1,000 data records, it's pretty unlikely that they'll all be on the same node.
So the other thing to try is to add more data and update all of it, to see whether both nodes participate.
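A quick way to check this, assuming you have a reference to the HazelcastInstance and the "requests" map from the question (imports match a Hazelcast 3.x classpath):
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class KeyOwnershipDebug {

    public static void printLocallyOwnedKeys(HazelcastInstance hz) {
        IMap<String, Object> map = hz.getMap("requests");
        // Only the keys this member owns; the owner is the member whose MapStore is invoked.
        System.out.println("Locally owned keys: " + map.localKeySet());
    }
}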
OK, after struggling a lot I noticed a teeny tiny but critical detail:
"Datastore needs to be a centralized system that is accessible from all Hazelcast members. Persistence to a local file system is not supported."
This is absolutely in line with what I was observing.
https://docs.hazelcast.org/docs/latest/manual/html-single/#loading-and-storing-persistent-data
However, don't be discouraged: I found out that I could use event listeners to do the same thing I needed to do.
@Component
public class HazelCastEntryListner
        implements EntryAddedListener<String, Object>, EntryUpdatedListener<String, Object>, EntryRemovedListener<String, Object>,
        EntryEvictedListener<String, Object>, EntryLoadedListener<String, Object>, MapEvictedListener, MapClearedListener {

    @Autowired
    @Lazy
    private RequestDao requestDao;
I created this class and hooked it into the config like so:
MapConfig mapConfig = config.getMapConfig("requests");
mapConfig.addEntryListenerConfig(new EntryListenerConfig(entryListner, false, true));
return Hazelcast.newHazelcastInstance(config);
This worked flawlessly; I am able to replicate data to the embedded databases on each node.
My use case was to cover HA failover edge cases: during HA failover, the slave node needed to know the working memory of the active node.
I am not using Hazelcast as a cache; rather, I am using it as a data syncing mechanism.
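For completeness, a minimal sketch of what one of those callbacks could look like; only entryAdded is shown and the RequestDao save method is an assumption:
import com.hazelcast.core.EntryEvent;
import com.hazelcast.map.listener.EntryAddedListener;

public class PersistingEntryListener implements EntryAddedListener<String, Object> {

    private final RequestDao requestDao; // hypothetical save(key, value) method

    public PersistingEntryListener(RequestDao requestDao) {
        this.requestDao = requestDao;
    }

    @Override
    public void entryAdded(EntryEvent<String, Object> event) {
        // The event is delivered on every member that registered the listener,
        // so each node can persist the entry to its own embedded database.
        requestDao.save(event.getKey(), event.getValue());
    }
}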

Spark streaming jobs duration in program

How do I get, in my program (which is running the Spark streaming job), the time taken for each RDD job?
For example:
val streamrdd = KafkaUtils.createDirectStream[String, String, StringDecoder,StringDecoder](ssc, kafkaParams, topicsSet)
val processrdd = streamrdd.map(some operations...).savetoxyz
In the above code, for each micro-batch RDD a job is run for the map and save operations.
I want to get the time taken by each streaming job. I can see the jobs in the port 4040 UI, but I want to get it in the Spark code itself.
Pardon if my question is not clear.
You can use a StreamingListener in your Spark app. This interface provides an onBatchCompleted method that gives you the total time taken by a batch's jobs.
context.addStreamingListener(new StatusListenerImpl());
StatusListenerImpl is the implementation class that you have to write by implementing StreamingListener.
There are more methods available on the listener; you should explore them as well.

How to have spring boot metrics export averages and absolute values to csv

I need to export counters and gauges to CSV in order to process them later on. Using Gradle, I pull in the jars for Codahale Metrics 3.1.2:
compile("io.dropwizard.metrics:metrics-core:${metricsVersion}")
compile("io.dropwizard.metrics:metrics-annotations:${metricsVersion}")
For the CSV export I created one reporter using these lines of code:
@Configuration
public class MetricsConfiguration {

    @Autowired
    MetricRegistry metricRegistry;

    @Bean
    public CsvReporter configureReporters() {
        CsvReporter reporter = CsvReporter.forRegistry(metricRegistry).build(new File("C:/temp/metrics"));
        reporter.start(5, TimeUnit.SECONDS);
        return reporter;
    }
}
I can see files are created and they contain a timestamp and value (in this example the values were set for a gauge):
1444137261,42.0
1444137266,42.0
1444137271,42.0
1444137276,1.0
1444137281,1.0
1444137286,1.0
The only problem with this is that the file repeats the last value set until I overwrite it using gaugeService.submit(). In this scenario I set a gauge value of 42.0 once, waited several minutes and then set a new value of 1.0.
This makes it difficult to parse the CSV and compute averages or build histograms on my own, because I don't know whether the 42.0 was submitted once or three times.
I had a look at these SO posts but they didn't help me solve my problem:
Exporting Spring Boot Actuator Metrics (& Dropwizard Metrics) to Statsd
how to get metrics from spring-boot-actuator programmatically?
All of this was based on a misunderstanding on my side: the metrics offered by Spring Boot only allow gauges or counters to be used, and no averages are calculated for either.
In order to get what I need, I have to gain access to the underlying Codahale metrics registry:
@Autowired
MetricRegistry metricRegistry;
Then I can start and stop a timer in my code:
Timer.Context timer = metricRegistry.timer("foo").time();
// do whatever needs to be timed
timer.stop();
These values will be exported by the CsvReporter in a file named timer.foo.csv which contains these values:
t,count,max,mean,min,stddev,p50,p75,p95,p98,p99,p999,mean_rate,m1_rate,m5_rate,m15_rate,rate_unit,duration_unit
1446036738,5,18.597084,14.172184,12.832516,2.145484,13.280135,13.445525,18.597084,18.597084,18.597084,18.597084,0.215907,0.494353,0.577132,0.592274,calls/second,milliseconds
1446036798,5,18.597084,14.172184,12.832516,2.145484,13.280135,13.445525,18.597084,18.597084,18.597084,18.597084,0.060092,0.181862,0.472516,0.554077,calls/second,milliseconds
1446036858,5,18.597084,14.172184,12.832516,2.145484,13.280135,13.445525,18.597084,18.597084,18.597084,18.597084,0.034923,0.066903,0.386863,0.518343,calls/second,milliseconds
1446036918,5,18.597084,14.172184,12.832516,2.145484,13.280135,13.445525,18.597084,18.597084,18.597084,18.597084,0.024578,0.024612,0.316737,0.484913,calls/second,milliseconds

Get Hbase region size via API

I am trying to write a balancer tool for HBase which could balance regions across RegionServers for a table by region count and/or region size (sum of storeFile sizes). I could not find any HBase API class which returns the region size or related info. I have already checked a few of the classes which could be used to get other table/region info, e.g. org.apache.hadoop.hbase.client.HTable and HBaseAdmin.
I am thinking another way this could be implemented is by using one of the Hadoop classes which return the size of directories in the file system, e.g. org.apache.hadoop.fs.FileSystem, which lists the files under a particular HDFS path.
Any suggestions?
I use this to do managed splits of regions, but you could leverage it to load-balance on your own. I also load-balance myself to spread the regions (of a given table) evenly across our nodes so that MR jobs are evenly distributed.
Perhaps the code snippet below is useful?
final HBaseAdmin admin = new HBaseAdmin(conf);
final ClusterStatus clusterStatus = admin.getClusterStatus();
for (ServerName serverName : clusterStatus.getServers()) {
    final HServerLoad serverLoad = clusterStatus.getLoad(serverName);
    for (Map.Entry<byte[], HServerLoad.RegionLoad> entry : serverLoad.getRegionsLoad().entrySet()) {
        final String region = Bytes.toString(entry.getKey());
        final HServerLoad.RegionLoad regionLoad = entry.getValue();
        long storeFileSize = regionLoad.getStorefileSizeMB();
        // other useful things are available in regionLoad if you like
    }
}
What's wrong with the default Load Balancer?
From the Wiki:
The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via hbase.balancer.period and defaults to 300000 (5 minutes).
If you really want to do it yourself, you could indeed use the Hadoop API and, more specifically, the FileStatus class. This class acts as an interface to represent the client-side information for a file.
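If you go that route, a rough sketch with the Hadoop FileSystem API could look like the following; the table path is only an example, since the directory layout differs between HBase versions:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RegionSizeOnHdfs {

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Example layout: /hbase/<table>/<region> on older releases,
        // /hbase/data/<namespace>/<table>/<region> on newer ones.
        Path tableDir = new Path("/hbase/data/default/mytable");
        for (FileStatus region : fs.listStatus(tableDir)) {
            if (!region.isDirectory()) {
                continue;
            }
            // Total bytes under the region directory (store files plus metadata).
            long bytes = fs.getContentSummary(region.getPath()).getLength();
            System.out.println(region.getPath().getName() + " -> " + bytes + " bytes");
        }
    }
}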

Run @Scheduled task only on one WebLogic cluster node?

We are running a Spring 3.0.x web application (.war) with a nightly @Scheduled job in a clustered WebLogic 10.3.4 environment. However, as the application is deployed to each node (using the deployment wizard in the AdminServer's web console), the job is started on each node every night and thus runs multiple times concurrently.
How can we prevent this from happening?
I know that libraries like Quartz allow coordinating jobs inside a clustered environment by means of a database lock table, and I could even implement something like this myself. But since this seems to be a fairly common scenario, I wonder whether Spring already comes with an option to easily circumvent this problem without adding new libraries to my project or putting in manual workarounds.
We are not able to upgrade to Spring 3.1 with configuration profiles, as mentioned here.
Please let me know if there are any open questions. I also asked this question on the Spring Community forums. Thanks a lot for your help.
We only have one task, which sends a daily summary email. To avoid extra dependencies, we simply check whether the hostname of each node matches a configured system property.
private boolean isTriggerNode() {
    String triggerHostname = System.getProperty("trigger.hostname");
    try {
        String hostName = InetAddress.getLocalHost().getHostName();
        return hostName.equals(triggerHostname);
    } catch (UnknownHostException e) {
        return false;
    }
}

public void execute() {
    if (isTriggerNode()) {
        // send email
    }
}
We are implementing our own synchronization logic using a shared lock table inside the application database. This allows all cluster nodes to check whether a job is already running before actually starting it.
Be careful: with the solution of implementing your own synchronization logic using a shared lock table, you still have the concurrency issue of two cluster nodes reading/writing the table at the same time.
It is best to perform the following steps in one DB transaction (see the sketch below):
- read the value in the shared lock table
- if no other node holds the lock, take the lock
- update the table to indicate that you have taken the lock
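A sketch of that pattern with Spring's JdbcTemplate, assuming a table JOB_LOCK(job_name VARCHAR PRIMARY KEY, locked_until TIMESTAMP) with one row per job inserted up front; the table, class and method names are made up for the example:
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

import java.sql.Timestamp;

@Component
public class JobLockDao {

    private final JdbcTemplate jdbc;

    public JobLockDao(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    /** Returns true only on the node that acquired the lock for this run. */
    @Transactional
    public boolean tryAcquire(String jobName, long holdSeconds) {
        // SELECT ... FOR UPDATE serialises the read-check-update sequence across nodes.
        Timestamp lockedUntil = jdbc.queryForObject(
                "SELECT locked_until FROM job_lock WHERE job_name = ? FOR UPDATE",
                new Object[] { jobName }, Timestamp.class);
        long now = System.currentTimeMillis();
        if (lockedUntil != null && lockedUntil.getTime() > now) {
            return false; // another node already holds the lock
        }
        jdbc.update("UPDATE job_lock SET locked_until = ? WHERE job_name = ?",
                new Timestamp(now + holdSeconds * 1000), jobName);
        return true;
    }
}
Each node's scheduled method would call tryAcquire first and simply return when it gets false.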
I solved this problem by making one of the boxes the master:
basically, set an environment variable on one of the boxes, like master=true,
and read it in your Java code through System.getenv("master").
If it is present and true, then run your code.
Basic snippet:
@Scheduled
void process() {
    boolean master = Boolean.parseBoolean(System.getenv("master"));
    if (master) {
        // your logic
    }
}
You can try using the TimerManager (Job Scheduler in a clustered environment) from WebLogic as the TaskScheduler implementation (TimerManagerTaskScheduler). It should work in a clustered environment.
Andrea
I've recently implemented a simple annotation library, dlock, to execute a scheduled task only once over multiple nodes. You can simply do something like below.
@Scheduled(cron = "59 59 8 * * *" /* Every day at 8:59:59am */)
@TryLock(name = "emailLock", owner = NODE_NAME, lockFor = TEN_MINUTE)
public void sendEmails() {
    List<Email> emails = emailDAO.getEmails();
    emails.forEach(email -> sendEmail(email));
}
See my blog post about using it.
You don't need to synchronize your job start using a DB.
In a WebLogic application you can get the name of the server instance the application is running on:
String serverName = System.getProperty("weblogic.Name");
Simply put a condition around executing the job:
if (serverName.equals(".....")) {
    // execute my job
}
If you want to bounce the job from one machine to the other, you can take the current day of the year: if it is odd, execute on one machine; if it is even, execute the job on the other one, as in the sketch below.
This way you load a different machine every day.
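A tiny sketch of that alternation, using java.util.Calendar so it also works on older JDKs; the managed server names are examples:
import java.util.Calendar;

public class AlternatingTrigger {

    public static boolean shouldRunToday(String serverName) {
        int dayOfYear = Calendar.getInstance().get(Calendar.DAY_OF_YEAR);
        // Even days run on one managed server, odd days on the other.
        String todaysWorker = (dayOfYear % 2 == 0) ? "ManagedServer1" : "ManagedServer2";
        return todaysWorker.equals(serverName);
    }
}
A @Scheduled method would then call shouldRunToday(System.getProperty("weblogic.Name")) and return immediately when it gets false.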
We can make the other machines in the cluster not run the batch job by using the following cron expression. It will not fire until 2099:
0 0 0 1 1 ? 2099
