OpenNMS REST API: Retrieving performance stats using /measurements

I am fairly new to OpenNMS. I am currently creating a simplified dashboard for all the nodes I have configured, and I plan to monitor CPU utilization and memory availability on all of them. To do this, I am using the OpenNMS REST API. For CPU utilization, I am able to get the stats using the following URL:
http://localhost:8980/opennms/rest/measurements/node%5B212%5D.nodeSnmp%5B%5D/cpuPercentBusy?start=1501158186498&maxrows=30
From https://wiki.opennms.org/wiki/ReST#Measurements_API, I was able to work out the basic URL format:
/measurements/{resourceId}/{attribute}
However, I can't seem to determine the URL for getting the available storage on the various disk drives (C:\, D:\, etc.).
How do I get the memory utilization and other performance metrics for a node?

After digging through the official OpenNMS wiki, I managed to find the appropriate REST query. In my case, for the C:\ drive, it was:
http://localhost:8980/opennms/rest/measurements/node%5B212%5D.hrStorageIndex[C]/hrStorageSize?start=1501158186498
I managed to get other stats in a similar manner.
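For anyone scripting against this endpoint, here is a minimal Python sketch of the same call; the node ID, resource, attribute, and credentials below are examples from my setup (admin/admin is the OpenNMS default), so treat them all as placeholders:

import requests

BASE = "http://localhost:8980/opennms/rest/measurements"

# Example resource/attribute from my node; yours will differ.
resource = "node[212].hrStorageIndex[C]"  # storage resource for the C:\ drive
attribute = "hrStorageSize"

resp = requests.get(
    f"{BASE}/{requests.utils.quote(resource, safe='')}/{attribute}",
    params={"start": 1501158186498},  # epoch milliseconds
    headers={"Accept": "application/json"},
    auth=("admin", "admin"),  # default OpenNMS credentials
)
resp.raise_for_status()
data = resp.json()
print(data["columns"][0]["values"])  # the series values for the attribute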
Links to the documentation I referred to:
https://wiki.opennms.org/wiki/Data_Collection_Configuration_How-To
https://wiki.opennms.org/wiki/ReST#Measurements_API

Related

Panels to have in Kibana dashboard for troubleshooting applications

What are some good panels to have in a Kibana dashboard for developers to troubleshoot issues in applications? I am trying to create a dashboard that developers could use to pinpoint where an app is having issues so that they can resolve them. These are the factors I have considered so far: CPU usage of the pod, memory usage of the pod, network in and out, and application logs. Are there any other panels I could add so that developers can get an idea of where to check when something goes wrong in the app?
For example, application slowness could be due to high CPU consumption, an app going down could be due to an OOM kill, and requests taking longer could be due to latency or cache issues. Is there anything else I should take into consideration? If yes, please suggest.
So here are a few things that we could add:
1. Number of pods, deployments, daemonsets, and statefulsets present in the cluster
2. CPU utilised by each pod (pod-wise breakdown)
3. Memory utilised by each pod (pod-wise breakdown)
4. Network in/out
5. Top memory/CPU-consuming pods and nodes (see the sketch after this list)
6. Latency
7. Persistent disk details
8. Error logs as annotations in TSVB
9. Log streams to check logs within the dashboard
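Several of these can also be checked ad hoc against Elasticsearch itself, which is handy while designing the panels. A minimal Python sketch for point 5, assuming the metrics are shipped by Metricbeat's kubernetes module; the metricbeat-* index pattern and the field names are assumptions and will differ with another shipper:

import requests

# Terms aggregation over pod names, ranked by average CPU usage
# over the last 15 minutes.
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-15m"}}},
    "aggs": {
        "top_cpu_pods": {
            "terms": {
                "field": "kubernetes.pod.name",
                "size": 10,
                "order": {"avg_cpu": "desc"},
            },
            "aggs": {
                "avg_cpu": {"avg": {"field": "kubernetes.pod.cpu.usage.nanocores"}}
            },
        }
    },
}

resp = requests.post("http://localhost:9200/metricbeat-*/_search", json=query)
for bucket in resp.json()["aggregations"]["top_cpu_pods"]["buckets"]:
    print(bucket["key"], bucket["avg_cpu"]["value"])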

How to increase resource allocation to RavenDB

I'm trying to process a document and store many documents into RavenDB, which I have running locally.
I'm getting the error:
Tried to send *ravendb.BatchCommand request via POST http://127.0.0.1:8080/databases/mydb/bulk_docs to all configured nodes in the topology, all of them seem to be down or not responding. I've tried to access the following nodes: http://127.0.0.1:8080
I was able to fetch mydb topology from http://127.0.0.1:8080.
Fetched topology: ( url: http://127.0.0.1:8080, clusterTag: A, serverRole: Member)
exit status 1
To me, it sounds like my local cluster might be running out of compute to process the large amount of data I'm trying to store.
RavenDB says I'm using 3 of 12 available cores, and I'd also like to make sure it's using a reasonable amount of the RAM I have available on the machine (I'd even be happy to give it swap).
But reading around online, I'm not finding much helpful information on making sure RavenDB is able to use what it needs. I found settings.json, so I can add configuration options which theoretically should be picked up by the server, but I'm not making much progress.
I also found some settings and changed "reassign cores" to 12, but it still says that 3/12 cores and 6/31.1 GB of memory are being used.
If an alternative solution is recommended, I'm all ears. I just need to run things locally, and storing everything as JSON files doesn't enable fast enough retrieval for my use case.
Update
I was able to install MongoDB and set up a local database. It hasn't given me any problems yet. RavenDB looks appealing, but since I don't understand it well enough yet, I guess I'll stick with the tried and true for this project.
It is highly unlikely that you managed to run out of resources on a server with 3 cores / 6 GB unless you are pushing hundreds of millions of documents and doing very heavy work.
Do you get any error on the server? There should be more details in the error or in the server log.

How to Monitor Resource Utilization?

Is there a tool which logs system resource utilization such as CPU, memory, I/O, and network over a period of time and generates graphs?
I need to monitor a system and identify the periods in which resources are being heavily utilized.
If any of you have experience with this kind of tool, kindly suggest one.
Thanks in advance.
Besides third-party tools, there is Windows Performance Monitor, which can help. It shows real-time graphs and can save the performance information into files that you can open and analyze later.
It provides multiple metrics for CPU, memory, I/O, and network utilization, and shows an instance for each processor on the machine. It can also be used to monitor remote machines.
You can also create data collector sets to have all monitored counters in a single component:
Performance Monitoring Getting Started Guide
Create a Data Collector Set to Monitor Performance Counters
I think this tool will help you:
System-Resources-Monitoring
System Monitoring
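If you would rather script it yourself, a small logger is easy to put together, and the resulting CSV can be graphed in any spreadsheet or plotting tool. A minimal Python sketch using the third-party psutil package; the 15-second interval and file name are arbitrary choices:

import csv
import time

import psutil  # third-party: pip install psutil

# Sample CPU, memory, disk I/O, and network counters every 15 seconds
# and write them to a CSV for later graphing (overwritten on each run).
with open("sysstats.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "cpu_pct", "mem_pct",
                     "disk_read", "disk_write", "net_sent", "net_recv"])
    while True:
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        writer.writerow([
            time.strftime("%Y-%m-%d %H:%M:%S"),
            psutil.cpu_percent(interval=None),
            psutil.virtual_memory().percent,
            disk.read_bytes,
            disk.write_bytes,
            net.bytes_sent,
            net.bytes_recv,
        ])
        f.flush()
        time.sleep(15)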

Best ways to diagnose elasticsearch issues?

The question is a little broad, but I feel there is no one place that helps systematically diagnose Elasticsearch issues. The broad categories could be:
1. Client
   a. Query errors
   b. Incorrect query results
   c. Unexplained behaviors
2. Server
   a. Setup issues
   b. Performance issues
   c. Critical errors
   d. Unexplained behaviors
An example for 1)a) would be: log the query string on the server (a reference on how to enable logging would be nice), install the inquisitor plugin (with a link to its GitHub repo), run the query string yourself, etc.
Your question is very broad and, to be honest, I am not sure I can fully answer it. However, I will tell you how we monitor and manage our cluster.
1 - We log query logs and slow query logs to Graylog2 (it uses ES under the hood), so we can easily see, report, and alert on all logging from our cluster. We can also view slow queries that have occurred.
2 - We send ES stats to statsd and then graph that information in Graphite. This way we can see things like cluster state, query counts, indexing counts, JVM stats, disk I/O, etc., all parsed from the ES stats API and sent to statsd (see the sketch after this list).
3 - We use Fabric scripts to deploy/upgrade the cluster and manage plugin installation.
4 - We use Jenkins and JMeter to run occasional performance tests against the cluster (are we getting slower over time? does the cluster deployment work?).
5 - We use the bigdesk and head plugins to keep an eye on the cluster and explore how it is doing.
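For reference, the stats-to-statsd step in point 2 can be quite small. A minimal Python sketch; the nodes stats endpoint is standard ES, but the statsd address and the metric names are assumptions to adapt:

import socket

import requests

STATSD_ADDR = ("localhost", 8125)  # assumed statsd host/port
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def gauge(name, value):
    # statsd plain-text gauge format: <name>:<value>|g
    sock.sendto(f"{name}:{value}|g".encode(), STATSD_ADDR)

# Pull per-node stats from the ES nodes stats API.
stats = requests.get("http://localhost:9200/_nodes/stats").json()
for node_id, node in stats["nodes"].items():
    prefix = f"es.{node['name']}"
    gauge(f"{prefix}.jvm.heap_used_pct", node["jvm"]["mem"]["heap_used_percent"])
    gauge(f"{prefix}.search.query_total", node["indices"]["search"]["query_total"])
    gauge(f"{prefix}.indexing.index_total", node["indices"]["indexing"]["index_total"])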

Measuring CPU load for specific IIS websites on same server

I have an IIS server running on Windows Server 2003. The server hosts multiple websites.
Occasionally the CPU load peaks for long durations, such that the server stops responding or responds with lag.
The problem is that we don't know which of the multiple websites is creating the high load. I have tried looking around in Performance Monitor for counters that could be used, but I don't see anything about CPU load for specific IIS instances.
This makes it quite hard to find the root of the problem.
For each application pool there is a w3wp.exe process, so try to put each web application in a different application pool; this is always a good practice anyway.
Then map each w3wp.exe process ID to its application pool (on Server 2003, running cscript %systemroot%\system32\iisapp.vbs lists each worker process PID with its pool name).
Then you can see which web application is creating the high CPU load via Task Manager or Performance Monitor.
Have you tried checking the performance counters for Gen 2 garbage collection spikes (# Gen 2 Collections)?
Periods of very high CPU load are often attributable to this.
EDIT: This SO answer might be useful: What are the best ASP.NET performance counters to monitor?
This blog post describes collecting and interpreting GC Performance Counters: http://blogs.msdn.com/maoni/archive/2004/06/03/148029.aspx
Using WMI, try the following.
To get process usage (W2K3/2K8):
"SELECT IDProcess, PercentPrivilegedTime, PercentProcessorTime, PercentUserTime FROM Win32_PerfFormattedData_PerfProc_Process WHERE Name='w3wp'"
To identify your site, use this:
"SELECT ProcessId, CommandLine, WorkingSetSize, ThreadCount, PrivatePageCount, PageFileUsage, PageFaults, HandleCount, CreationDate, Caption FROM Win32_Process where Caption='w3wp.exe'"
You can use this tool to test the WQL queries: http://code.msdn.microsoft.com/NitoWMI
Good luck.
