Filtering Graphite metrics by server

I've recently done a lot of research into Graphite with statsD instrumentation. With the help of our developer operations team we managed to get multiple servers reporting metrics to Graphite and combining all the metrics. This is partially what we are looking for; however, I want to filter metric collection by server rather than having all the metrics averaged together. The purpose is to monitor metrics on a per-server basis, since many of our stats could also be used to visualize server uptime and performance. I haven't been able to find anything in my research about how this may be achieved, other than perhaps some trickery with the aggregation rules.

You should include the server name as the first path component of the metric name being emitted. Graphite splits a metric name into path components using . as the delimiter, so you may want to use a naming schema like <data_center>_<environment>_<role>_<node_id>.gauges.cpu.idle_pct. This will cause each server to be listed as a separate category on http://graphite_hostname.com/dashboard/
If you need to perform aggregations across servers, you can do that at the Graphite layer, or you can emit the same metric under two different names: one whose first path component is the server name, and one whose first path component is a value shared by all of the servers you want the metric aggregated across.
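A minimal sketch of that double-emission approach, using the Python statsd client; the host name, prefix scheme, and metric names here are only illustrative, and the exact path you see in Graphite also depends on your statsD prefix settings:

    # Emit each metric twice: once under a per-server prefix and once under a
    # prefix shared by all servers, so Graphite keeps a per-server series and
    # a series that can be aggregated across the whole fleet.
    import socket

    import statsd

    STATSD_HOST = "statsd.example.com"  # hypothetical statsD relay host
    # <data_center>_<environment>_<role>_<node_id>; dots in the hostname would
    # create extra path components, so replace them.
    SERVER_ID = "us-east_prod_web_" + socket.gethostname().replace(".", "_")

    per_server = statsd.StatsClient(STATSD_HOST, 8125, prefix=SERVER_ID)
    shared = statsd.StatsClient(STATSD_HOST, 8125, prefix="all_web_servers")

    def report_cpu_idle(idle_pct):
        # Appears under <SERVER_ID>.gauges.cpu.idle_pct in the metric tree ...
        per_server.gauge("gauges.cpu.idle_pct", idle_pct)
        # ... and also under all_web_servers.gauges.cpu.idle_pct for fleet-wide views.
        shared.gauge("gauges.cpu.idle_pct", idle_pct)

    report_cpu_idle(87.5)

On the Graphite side you can then average, sum, or otherwise combine the shared series without losing the per-server detail.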

Related

Elastic Cloud | How and why to handle Kibana resources when creating a deployment

I am wondering what reasons or use cases would require assigning Kibana high RAM, CPU, and zone resources.
It's clear what these mean for the Elasticsearch component (I/O efficiency), but how do these variables affect Kibana performance? What use cases are there where these resources might or must be handled differently?
Thank you in advance.
Below are some of the factors you can consider while creating a Kibana instance:
the number of simultaneous connections to Kibana
the number of simultaneous requests editing dashboards
the number of accesses to shared dashboards
heavy use of the Reporting functionality (which requires CPU)
large reporting jobs / alerting jobs
the number of spaces you are creating
Basically, you need to consider which Kibana functionality you are going to use. If you have limited usage of Kibana, you can go with either the first or second option from the list.

What are the differences between different Elastic data stream types?

The Elastic docs mention that Elastic data streams support the following types: logs, metrics, and synthetics. What are the differences between these types?
I tested storing some data as the logs and metrics types separately and I don't see any difference when querying the data. Are the types interchangeable, or are they stored differently?
Those are different types of data sets collected by the new Elastic Agent and Fleet integration:
The logs type is for logs data, i.e. what Filebeat used to send to Elasticsearch.
The metrics type is for metric data, i.e. what Metricbeat used to send to Elasticsearch.
The synthetics type is for uptime and status check data, i.e. what Heartbeat used to send to Elasticsearch.
Now, with Fleet, all the Beats have been refactored into a single agent called Elastic Agent which can do all of that, so instead of having to install all the *Beats, you just need to install that agent and enable/disable/configure whatever type of data you want to gather and index into Elasticsearch. All of that through a nice, powerful and centralized Kibana UI.
Beats are now simply Elastic Agent modules that you can enable or disable, and they all write their data into indices that follow a new taxonomy and naming scheme based on those types, which are nothing more than a generic way of describing the nature of the data they contain, i.e. logs, metrics, synthetics, etc.
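For illustration, a small sketch of what that naming scheme looks like when writing directly with the Python Elasticsearch client (8.x). The dataset name "myapp", the "default" namespace, and the cluster URL are assumptions; it also assumes the built-in logs-*-* and metrics-*-* index templates are in place so the data streams are created automatically:

    # Index one document into a "logs" data stream and one into a "metrics"
    # data stream, following the <type>-<dataset>-<namespace> naming scheme
    # used by Elastic Agent / Fleet.
    from datetime import datetime, timezone

    from elasticsearch import Elasticsearch

    es = Elasticsearch("https://localhost:9200", api_key="...")  # placeholder connection details

    now = datetime.now(timezone.utc).isoformat()

    # A log line goes into the logs-myapp-default data stream ...
    es.index(index="logs-myapp-default", op_type="create", document={
        "@timestamp": now,
        "message": "user login failed",
        "log.level": "warn",
    })

    # ... while a metric sample goes into metrics-myapp-default.
    es.index(index="metrics-myapp-default", op_type="create", document={
        "@timestamp": now,
        "system.cpu.idle.pct": 0.87,
    })

Both calls look identical apart from the target name, which matches what the question observed: at query time the types behave the same way; the type is mainly a naming and routing convention, plus whatever templates and lifecycle policies are attached to those patterns.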

Should logs, metrics, and analytics all go to one data lake or be stored separately?

Background:
I am setting up my first Elastic stack, and while I will be starting simple, I want to make sure I'm starting with good architecture. I would eventually like to have a solution for the following: hosting metrics, server logs (expressjs APM), single page app monitoring (APM RUM js agent), Redis metrics, MongoDB metrics, and custom event analytics (i.e. sale, customer cancelled, etc.).
Question:
Should I store all of this on one Elasticsearch cluster and use search to filter out the different cases, or should I create a separate instance for each and keep them clearly scoped to their roles?
(I would prefer the single data lake)
For the logging use case:
you can store all the logs on a file-system share before ingesting them into any search solution, so that you can re-ingest if needed (a minimal re-ingest sketch follows this answer)
after storage, you can either ingest them into just one cluster with different indices, or into multiple clusters; it's an open choice, but it depends on the amount of data
if the size and compute of each justify a separate ES cluster, then do it; otherwise, use a single cluster with a failover cluster
For metrics:
you can directly ingest them into one cluster with different index patterns
if size and compute requirements justify it, make separate clusters
make a failover/backup cluster if needed
In both cases, you will also need to store the cluster snapshots.
I personally recommend ELK for the logging use case, and Prometheus for metrics.
Reporting/Analytics:
For some use cases, like reporting/analytics on a monthly or yearly basis, the log data will be huge, and you will need to ingest the data from the file share into Hadoop to summarize it / roll it up based on some fields, and then ingest the reduced data into ELK; this can reduce the size and compute requirements by a factor of 1000.
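Here is the re-ingest sketch mentioned in the logging list above: a minimal example using the Python Elasticsearch client and its bulk helper to reload raw logs from a file share. The share path, index name, and cluster URL are all hypothetical:

    # Walk a directory of newline-delimited JSON log files sitting on the file
    # share and bulk-index them into a single index, so the same files can be
    # re-ingested later into this or any other cluster.
    import glob
    import json

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("https://localhost:9200", api_key="...")  # placeholder connection details

    def actions(pattern="/mnt/logshare/app/*.ndjson", index="logs-app-reingest"):
        for path in glob.glob(pattern):
            with open(path) as fh:
                for line in fh:
                    if line.strip():
                        yield {"_index": index, "_source": json.loads(line)}

    # Re-ingest everything currently on the share; errors are collected rather
    # than raised so one bad line does not stop the run.
    ok, errors = helpers.bulk(es, actions(), raise_on_error=False)
    print(f"indexed {ok} docs, {len(errors)} errors")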

Are there conventions for naming/organizing Elasticsearch indexes which store log data?

I'm in the process of setting up Elasticsearch and Kibana as a centralized logging platform in our office.
We have a number of custom utilities and plug-ins whose usage I would like to track, along with any errors users are encountering. Not to mention there are servers and scheduled jobs I would like to keep track of as well.
So if I have a number of different sources of log data all going to the same Elasticsearch cluster, what are the conventions or best practices for how this is organized into indices and document types?
The default index value used by Logstash is "logstash-%{+YYYY.MM.dd}". So it seems like it's best to suffix any index names with the current date, as this makes it easy to purge old data.
However, Kibana allows for adding multiple "index patterns" that can be selected from in the UI. Yet all the tutorials I've read only mention creating a single pattern like logstash-*.
How are multiple index patterns used in practice? Would I just give names for all the sources for my data? Such as:
BackupUtility-%{+YYYY.MM.dd}
UserTracker-%{+YYYY.MM.dd}
ApacheServer-%{+YYYY.MM.dd}
I'm using nLog in a number of my tools, and it has an Elasticsearch target. The convention for nLog and other similar logging frameworks is to have a "logger" for each class in the source code. Should these loggers translate to indices in Elasticsearch?
MyCompany.CustomTool.FooClass-%{+YYYY.MM.dd}
MyCompany.CustomTool.BarClass-%{+YYYY.MM.dd}
MyCompany.OtherTool.BazClass-%{+YYYY.MM.dd}
Or is this too granular for Elasticsearch index names, and would it be better to stick to a single dated index per application?
CustomTool-%{+YYYY.MM.dd}
In my environment we're working through a similar question. We have a mix of system logs, metric alerts from Prometheus, and application logs from both client and server applications. In addition, we have some shared variables between the client and server apps that let us correlate the two (e.g., we know what server logs match some operation on the client that made requests to said server). We're experimenting with the following scheme to help Kibana answer questions for us:
logs-system-{date}
logs-iis-{date}
logs-prometheus-{date}
logs-app-{applicationName}-{date}
Where:
{applicationName} is the unique name of some application we wrote (these could be client or server side)
{date} is whatever date-based scheme you use for indexes
This way we can set up Kibana searches against logs-app-* and quickly search for logs among any of our applications. This is still new for us, but we started without this type of scheme and are already regretting it. It makes searching for correlated logs across applications much harder than it should be.
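A tiny sketch of how that naming scheme might be generated on the client side, shown in Python for illustration (the question's tools use nLog, but the idea is the same in any framework); the application name and cluster URL are hypothetical:

    # Build dated, per-source index names of the form
    # logs-{source}-{date} and logs-app-{applicationName}-{date}.
    # Index names must be lowercase, hence the .lower() call.
    from datetime import datetime, timezone

    from elasticsearch import Elasticsearch

    es = Elasticsearch("https://localhost:9200", api_key="...")  # placeholder connection details

    def index_for(source, app=None, when=None):
        when = when or datetime.now(timezone.utc)
        date = when.strftime("%Y.%m.%d")
        parts = ["logs", source] + ([app] if app else []) + [date]
        return "-".join(parts).lower()   # e.g. logs-app-backuputility-2024.01.31

    # A Kibana index pattern of logs-app-* then matches every application at once.
    es.index(index=index_for("app", "BackupUtility"), document={
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "message": "nightly backup completed",
    })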
In my company we have worked a lot on this topic. We agreed on the following convention:
Customer
-- Product
--- Application
---- Date
In any case, it is necessary to review both how the data is organized and how the data is queried inside the organization.
I am not aware of such conventions, but in my environment we used to create two different types of indices, logstash-* and logstash-shortlived-*, depending on the severity level. In my case, I create the index pattern logstash-* as it will match both kinds of indices.
As these indices are stored in Elasticsearch and Kibana reads them, it should give you the option of creating index patterns of different granularity.
Give it a try on your local machine: use something like logstash-XYZ if you want more granularity; otherwise you can always create indices with your own custom names.

AWS CloudWatch: dynamically list metrics and their information

Right, so I am trying to get a list of metric names for a particular namespace (I'd rather it be for an object, but I'm working with what I've got) using the AWS Ruby SDK, and CloudWatch has the list_metrics function, awesome!
Except that list_metrics doesn't return which units and statistics a metric supports, which is a bit of a problem as you need both to request data from a metric.
If you're trying to dynamically build a list of metrics per namespace (which I am), you won't know which units or statistics a particular metric might support without knowing about the metrics beforehand, which makes using list_metrics to dynamically get a list of metrics pointless.
How do I get around this so I can build a hash in the correct format containing the metrics for any namespace, without knowing anything about a metric beforehand except for the hash structure?
Also, why is there no query for what metrics an object (DynamoDB, ELB, etc.) has?
It seems a logical thing to have, because a metric does not exist for an object unless it has actually emitted data for that metric at least once (so I've been told); which means that even if you have a list of all the metrics a namespace supports, it doesn't mean an object within that namespace will have those metrics.
CloudWatch is a very general-purpose tool, with a generic schema for all metric data in the MetricDatum structure. But individual metrics have no schema other than the data sent in practice.
So there is no object for Dynamo, EC2, etc. that projects what metrics might be sent. There is only metric data that has already been sent with a particular namespace. Amazon CloudWatch Namespaces, Dimensions, and Metrics Reference documents the metric schema for many or all of the metrics AWS services capture. I know that's not what you wanted.
You can query any CloudWatch metric using any of the statistics tracked by CloudWatch (SampleCount, Minimum, Maximum, Average, and Sum). CloudWatch requires that incoming metric data either include all of the statistics or be raw values from which the statistics can be calculated.
I don't know of any way to get the units other than to query the data and look through what is returned.
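A sketch of that workaround in Python with boto3 (the question uses the Ruby SDK, which exposes the same list_metrics and get_metric_statistics calls): list the metrics in a namespace, request all five statistics for each, and read the unit off whatever datapoints come back. The AWS/EC2 namespace and the three-hour window are arbitrary examples:

    # For every metric CloudWatch has actually received in a namespace,
    # request all five supported statistics and collect the units seen in
    # the returned datapoints. Metrics with no recent data return nothing.
    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=3)

    paginator = cloudwatch.get_paginator("list_metrics")
    for page in paginator.paginate(Namespace="AWS/EC2"):
        for metric in page["Metrics"]:
            stats = cloudwatch.get_metric_statistics(
                Namespace=metric["Namespace"],
                MetricName=metric["MetricName"],
                Dimensions=metric["Dimensions"],
                StartTime=start,
                EndTime=end,
                Period=300,
                Statistics=["SampleCount", "Minimum", "Maximum", "Average", "Sum"],
            )
            units = {dp["Unit"] for dp in stats["Datapoints"]}
            print(metric["MetricName"], metric["Dimensions"], units or "no data yet")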
