Telegraf to Prometheus is seemingly incorrectly identifying some labels as metrics

BACKGROUND
I have some time series data that looks like this:
1671407650806101433,GROUPSTATS,group_id=0,last_rx=1671407650694827472,last_rx_local=18:54:10.694827472,time_since_last_rx=111273960,desc=00AL.R24097.-,exch=mdx,pid=12345
Where GROUPSTATS is the name of the type, group_id is either 0 or 1, last_rx, last_rx_local and time_since_last_rx are the data sets that I care about, and descriptor is the channel (many per pid).
Telegraf obtains this data, which is then scraped by Prometheus, resulting in the following metrics and labels:
metrics:
GROUPSTATS_group_id
GROUPSTATS_last_rx
GROUPSTATS_pid
GROUPSTATS_time
GROUPSTATS_time_since_last_rx
labels:
descriptor
exch
last_rx_local
type
instance (telegraf data)
I then visualise this data in Grafana (Telegraf -> Prometheus -> Grafana).
QUESTION
Why are some of those fields arbitrarily taken as labels and some as metrics? I.e. why is pid a metric but exch a label?
Is there any way to convert metrics to labels?
I want to be able to average time_since_last_rx on a host and show it by group_id, or average time_since_last_rx across a pid. At the moment I can't filter by group_id as it's a separate metric.
The Telegraf config is below; someBashScript.sh removes the keys from the CSV:
commands = [
"someBashScript.sh"
]
data_format = "csv"
csv_header_row_count = 0
csv_column_names = ["time", "type", "group_id", "last_rx", "local_last_rx", "time_since_last_rx", "descriptor", "pid", "exch"]
csv_measurement_column = "type"
csv_timestamp_column = "time"
csv_timestamp_format = "unix_ns"
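From reading the Telegraf CSV parser documentation, I suspect csv_tag_columns might be what I'm after: columns listed there should be emitted as tags (and therefore Prometheus labels) rather than fields (metrics). An untested sketch of the parser settings:
data_format = "csv"
csv_header_row_count = 0
csv_column_names = ["time", "type", "group_id", "last_rx", "local_last_rx", "time_since_last_rx", "descriptor", "pid", "exch"]
# columns listed here would become tags (labels) instead of fields (metrics)
csv_tag_columns = ["group_id", "pid"]
csv_measurement_column = "type"
csv_timestamp_column = "time"
csv_timestamp_format = "unix_ns"
If group_id were a label, I could presumably then run a PromQL query like avg by (group_id) (GROUPSTATS_time_since_last_rx), but I haven't confirmed this.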

Related

Receive Gatling results in InfluxDB v2

I have a basic Gatling script on an EC2 instance from which I want to push the results into an InfluxDB instance. I can successfully run the Gatling script, and Influx is also running.
My Gatling configuration is the following:
data {
  writers = [console, graphite] # The list of DataWriters to which Gatling write simulation data (currently supported : console, file, graphite)
  console {
    #light = false # When set to true, displays a light version without detailed request stats
    #writePeriod = 5 # Write interval, in seconds
  }
  file {
    #bufferSize = 8192 # FileDataWriter's internal data buffer size, in bytes
  }
  leak {
    #noActivityTimeout = 30 # Period, in seconds, for which Gatling may have no activity before considering a leak may be happening
  }
  graphite {
    light = false # only send the all* stats
    host = "ec2-35-181-26-79.eu-west-3.compute.amazonaws.com" # The host where the Carbon server is located
    port = 2003 # The port to which the Carbon server listens to (2003 is default for plaintext, 2004 is default for pickle)
    protocol = "tcp" # The protocol used to send data to Carbon (currently supported : "tcp", "udp")
    rootPathPrefix = "gatling" # The common prefix of all metrics sent to Graphite
    bufferSize = 8192 # Internal data buffer size, in bytes
    writePeriod = 1 # Write period, in seconds
  }
}
And for Influx, I've set up Telegraf with the following configuration:
[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ## urls exp: http://127.0.0.1:8086
  urls = ["http://ec2-35-181-26-79.eu-west-3.compute.amazonaws.com:8086"]
  ## Token for authentication.
  token = "$INFLUX_TOKEN"
  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "Test"
  ## Destination bucket to write into.
  bucket = "Test"

[[inputs.socket_listener]]
  ## URL to listen on
  service_address = "tcp://:2003"
  data_format = "graphite"
  ## Content encoding for message payloads, can be set to "gzip" to or
  ## "identity" to apply no encoding.
  # content_encoding = "identity"
  templates = [
    "gatling.*.*.*.* measurement.simulation.request.status.field",
    "gatling.*.users.*.* measurement.simulation.measurement.request.field"
  ]
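For reference, my understanding of the Graphite templates is that a plaintext line such as the made-up example below (names are illustrative, not actual Gatling output):
gatling.basicsimulation.request_1.ok.count 42 1671407650
should be parsed by the first template ("measurement.simulation.request.status.field") into measurement gatling with tags simulation=basicsimulation, request=request_1, status=ok and a field count=42.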
With both Telegraf (with this configuration) and Influx running, I don't see any data pushed into the 'Test' bucket. Moreover, I don't get any errors that could help me debug.
Any help would be much appreciated. Thanks.

How to query google analytics api using the google api ruby gem?

The documentation of the google api ruby client lacks practical examples; it only documents the classes and methods, so it's very hard to guess how we should use the gem in real life. For example, I'm trying to obtain all purchases from enhanced ecommerce to see where they came from (Acquisition Channel or Channel Grouping), but I'm only interested in transactions that took 5 sessions to convert (our unconvinced clients).
First you will need your analytics view_id; it can be obtained in the URL, at the end, after the letter p.
Then you need to export the path to the credentials. In your terminal:
export GOOGLE_APPLICATION_CREDENTIALS='folder/yourproject-a91723dsa8974.json'
For more info about credentials see the google-auth gem documentation.
After setting this, you can query the API like this:
require 'googleauth'
require 'google/apis/analyticsreporting_v4'
scopes = ['https://www.googleapis.com/auth/analytics']
# Note: 10.days.ago / 2.days.ago rely on ActiveSupport (available in Rails)
date_from = 10.days.ago
date_to = 2.days.ago
authorization = Google::Auth.get_application_default(scopes)
analytics = Google::Apis::AnalyticsreportingV4::AnalyticsReportingService.new
analytics.authorization = authorization
view_id = '189761131'
date_range = Google::Apis::AnalyticsreportingV4::DateRange.new(start_date: date_from.strftime('%Y-%m-%d'), end_date: date_to.strftime('%Y-%m-%d'))
metric = Google::Apis::AnalyticsreportingV4::Metric.new(expression: 'ga:transactions')
transaction_id_dimension = Google::Apis::AnalyticsreportingV4::Dimension.new(name: 'ga:transactionID')
adquisition_dimension = Google::Apis::AnalyticsreportingV4::Dimension.new(name: 'ga:channelGrouping')
filters = 'ga:sessionsToTransaction==5'
request = Google::Apis::AnalyticsreportingV4::GetReportsRequest.new(
  report_requests: [Google::Apis::AnalyticsreportingV4::ReportRequest.new(
    view_id: view_id,
    metrics: [metric],
    dimensions: [transaction_id_dimension, adquisition_dimension],
    date_ranges: [date_range],
    filters_expression: filters
  )]
)
response = analytics.batch_get_reports(request)
response.reports.first.data.rows.each do |row|
  dimensions = row.dimensions
  puts "TransactionID: #{dimensions[0]} - Channel: #{dimensions[1]}"
end
Note filters_expression: filters, where the filters variable is in the form of ga:medium==cpc,ga:medium==organic;ga:source==bing,ga:source==google
Commas (,) mean OR and semicolons (;) mean AND (OR takes precedence over AND).
You can check the query explorer to play around with filters.
Here is the filters documentation.
If the report brings more than 1000 rows (default max rows), a next_page_token attribute will appear.
response.reports.first.next_page_token
=> "1000"
You will have to store that number to use it in the next ReportRequest:
next_request = Google::Apis::AnalyticsreportingV4::GetReportsRequest.new(
report_requests: [Google::Apis::AnalyticsreportingV4::ReportRequest.new(
view_id: view_id,
metrics: [metric],
dimensions: [transaction_id_dimension, adquisition_dimension],
date_ranges: [date_range],
filters_expression: filters,
page_token: "1000"
)]
)
Repeat this until next_response.reports.first.next_page_token returns nil.
Alternatively, you can change the default page size of the report request by adding page_size: 10_000, for example.
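Putting the pagination together, a simple loop along the lines of the requests above (a sketch, not tested) could look like:
# Sketch: page through the report until no next_page_token is returned
all_rows = []
page_token = nil
loop do
  request = Google::Apis::AnalyticsreportingV4::GetReportsRequest.new(
    report_requests: [Google::Apis::AnalyticsreportingV4::ReportRequest.new(
      view_id: view_id,
      metrics: [metric],
      dimensions: [transaction_id_dimension, adquisition_dimension],
      date_ranges: [date_range],
      filters_expression: filters,
      page_token: page_token
    )]
  )
  report = analytics.batch_get_reports(request).reports.first
  all_rows.concat(report.data.rows || [])
  page_token = report.next_page_token
  break if page_token.nil?
end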

How do I setup ELK HTTP beat to send delta feed to logstash

I have set up Httpbeat to poll an endpoint and send the data to Logstash. I can see the events under a specific index in Kibana. The interval has been set to every 5 seconds.
However, I want the ELK stack to capture data that has changed over time (a delta feed) rather than all the data every 5 seconds. Is there any way to do it?
My httpbeat.yml looks like this:
httpbeat:
  hosts:
    # Each - Host endpoints to call. Below are the host endpoint specific configurations
    -
      # Optional cron expression, defines when to poll the host endpoint.
      # Default is every 1 minute.
      schedule: '@every 5s'
      # The URL endpoint to call by Httpbeat
      url: #sample end point
      method: get
      basic_auth:
        # Basic authentication username
        username:
        # Basic authentication password
        password:
      output_format: json

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5400"]
The logstash.conf is as follows:
input {
  # Accept input from the console.
  beats {
    port => "5400"
  }
}
filter {
  # Add filter here. This sample has a blank filter.
}
output {
  # Output to the console.
  stdout {
    codec => "json"
  }
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "test_jira2_idx"
  }
}
If I hit the endpoint through Postman, the JSON body is around 1.08 MB in size. But while monitoring the Kibana index test_jira2_idx, it has already reached 340 MB after 1 hour and is constantly increasing.
This may be due to the fact that Httpbeat is constantly polling the same data every 5 seconds. Can anyone suggest alternatives for implementing a delta feed extract in ELK?
I had a look into the HTTP_POLLER input for Logstash, but I'm unsure whether that can help.
Note: I am very new to ELK.
Just a suggestion: generate a hash of the response and, on every poll, check whether the hash has changed from the previous response; only store it if it has (and add a timestamp to the stored document).
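A rough sketch of that idea directly in logstash.conf, using the fingerprint filter and reusing the hash as the Elasticsearch document id so that an unchanged response overwrites the same document instead of being indexed again (assumes the fingerprint filter plugin is available; untested):
filter {
  # hash the raw event body; an unchanged response yields the same fingerprint
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
    key => "any-static-key"
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "test_jira2_idx"
    # same fingerprint => same _id => document is updated, not duplicated
    document_id => "%{[@metadata][fingerprint]}"
  }
}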

How to send jmeter test results to datadog?

I wanted to ask if anyone has ever saved JMeter test results (sampler names, duration, pass/fail) to Datadog? Kind of like the backend listener for InfluxDB/Graphite, but for Datadog. jmeter-plugins has no such plugin. Datadog seems to offer something called "JMX integration", but I'm not sure whether that is what I need.
I figured out how to do this using the Datadog API https://docs.datadoghq.com/api/?lang=python#post-timeseries-points. The following Python script takes in the .jtl file (JMeter results) and posts the transaction name, response time, and status (pass/fail) to Datadog.
#!/usr/bin/env python3
import sys
import pandas as pd
from datadog import initialize, api

options = {
    'api_key': '<API_KEY>',
    'app_key': '<APPLICATION_KEY>'
}

metrics = []

def get_current_metric(timestamp, label, elapsed, success):
    metric = {}
    metric.update({'metric': 'jmeter'})
    metric.update({'points': [(timestamp, elapsed)]})
    curtags = {}
    curtags.update({'testcase': label})
    curtags.update({'success': success})
    metric.update({'tags': curtags})
    return metric

initialize(**options)

jtl_file = sys.argv[1]
df = pd.read_csv(jtl_file)

for index, row in df.iterrows():
    timestamp = row['timeStamp'] / 1000
    label = row['label']
    elapsed = row['elapsed']
    success = str(row['success'])
    metric = get_current_metric(timestamp, label, elapsed, success)
    metrics.append(metric)

api.Metric.send(metrics)
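Assuming the script above is saved as, say, jmeter_to_datadog.py (the name is arbitrary) and the JMeter results are in results.jtl, it can be run with:
python3 jmeter_to_datadog.py results.jtl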

auto scaling spins up multiple compute nodes in cfncluster

When I try to run a single job, hellojob.sh, in the cfncluster, multiple compute nodes spin up. It is a very simple job. Please find my script below:
hellojob.sh
#!/bin/bash
sleep 30
echo "Hello World from $(hostname)"
Can anyone please tell me how to stop autoscaling from spinning up multiple compute nodes?
Please find my config file below:
[root@ip-00-00-0-1000 .cfncluster]# cat config
[aws]
aws_region_name = us-east-1
aws_access_key_id = ***************
aws_secret_access_key = *******************
[cluster default]
vpc_settings = testdev-dev
key_name = testdev-developers
initial_queue_size = 0
s3_read_write_resource =*
pre_install = s3://cfncluster/pre_install_script.sh
[vpc testdev-dev]
master_subnet_id = subnet-*****
vpc_id = vpc-*****
additional_sg=sg-*****
vpc_security_group_id =sg-*****
use_public_ips=false
[global]
update_check = true
sanity_check = true
cluster_template = default
[scaling]
scaling_cooldown = 2000
You should be able to solve this by setting the scaling_adjustment (and optionally scaling_adjustment2) parameter in your scaling configuration, together with scaling_threshold. scaling_threshold is the CloudWatch threshold that triggers a ScaleUp event, and scaling_adjustment is the number of instances added when that event fires. See a snippet of my config below as an example:
## Scaling settings
#[scaling custom]
# Threshold for triggering CloudWatch ScaleUp action
# (defaults to 4 for default template)
scaling_threshold = 1
# Number of instances to add when called CloudWatch ScaleUp action
# (defaults to 2 for default template)
scaling_adjustment = 1
# Threshold for triggering CloudWatch ScaleUp2 action
# (defaults to 4 for default template)
scaling_threshold2 = 10
# Number of instances to add when called CloudWatch ScaleUp2 action
# (defaults to 20 for default template)
scaling_adjustment2 = 2
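Applied to the config in the question, the [scaling] section might then look something like this (values are only an illustration):
[scaling]
scaling_cooldown = 2000
scaling_threshold = 1
scaling_adjustment = 1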
