Where does OpenTelemetry store it's collected data? - open-telemetry

Since OpenTelemetry documentation is quite young it doesn't properly elaborate where the data is stored and in which format and if we can manipulate it with a 3rd party.
To avoid diving into the source code in github and trying to figure it out by reading functions I figured I'd ask here.
Does anyone have any idea where does it store it's data and in which format?
Reading the data via Prometheus it shows to me in JSON format, but I'm not certain if it is either Prometheus who transforms it into JSON or if it is the OpenTelemetry that does it.
Thanks in advance!

OpenTelemetry doesn't store any data (traces, metrics, logs). It has concept of exporters, where data are exported to user selected data (trace, metric, log) storage (3rd party). OpenTelemetry is "middle layer", where you can switch to another storage easily.

Related

How do I instrument my code for Splunk metrics?

I'm brand new to Splunk, having worked exclusively with Prometheus before. The one obvious thing I can't see from looking at the Splunk website is how in my code, I create/expose a metric... if I must provide an HTTP endpoint for consumption, or call into some API to push values, etc. Further, I cannot see which languages Splunk provide libraries for, in order to aid instrumentation - I cannot see where all this low level stuff is documented!
Can anyone help me understand how Splunk works, particularly how it compares to Prometheus?
Usually, programs write their normal log files and Splunk ingests those files so they can be searched and data extracted.
There are other ways to get data into Splunk, though. See https://dev.splunk.com/enterprise/reference for the SDKs available in a few languages.
You could write your metrics to collectd and then send them to Splunk. See https://splunkonbigdata.com/2020/05/09/metrics-data-collection-via-collectd-part-2/
You could write your metrics directly to Splunk using their HTTP Event Collector (HEC). See https://dev.splunk.com/enterprise/docs/devtools/httpeventcollector/

Pushing metrics data to Prometheus

I am configuring Prometheus to access Spring boot metrics data. For some of the metrics, Prometheus's pull mechanism is ok, but for some custom metrics I prefer push based mechanism.
Does Prometheus allow to push metrics data?
No.
Prometheus is very opinionated, and one of it's design decisions is to dis-allow push as a mechanism into Prometheus itself.
The way around this is to push into an intermediate store and allow Prometheus to scrape data from there. This isn't fun and there are considerations on how quickly you want to drain your data and how pass data into Prometheus with time-stamps -- I've had to override the Prometheus client library for this.
https://github.com/prometheus/pushgateway
Prometheus provides its own collector above which looks like it would be what you want but it has weird semantics around when it expires pushed metrics (it never does, only overwrites their value for a new datapoint with the same labels).
They make it very clear that they don't want it used for pushed metrics.
All in all, you can hack something together to get something close to push events.
But you're much better off embracing the pull model than fighting it.
Prometheus has added support for the push model recently. It is called as remote write receiver.
Link: https://prometheus.io/docs/prometheus/latest/querying/api/#remote-write-receiver
From what I understand it only accepts POST request with protocol buffers and snappy compression.
Prometheus doesn't support push model. If you need Prometheus-like monitoring system, which supports both pull model and push model, then try VictoriaMetrics - the project I work on:
It supports scraping of Prometheus metrics - see these docs.
It supports data ingestion (aka push model) in Prometheus text exposition format - see these docs.
It supports other popular data ingestion formats such as InfluxDB line protocol, Graphite, OpenTSDB, DataDog, CSV and JSON - see these docs.
Additionally to this VictoriaMetrics provides Prometheus querying API and PromQL-like query language - MetricsQL, so it can be used as a drop-in replacement for Prometheus in most cases.

Feasibilty analysis of data transformation using any ETL tool

I don't have any experience on any ETL tool. However I want to know if it is possible to do the followings using any ETL tool or we need to write a java or any other batch job to do this:
Scenario 1:
The source system has different REST APIs. I need to get the data, transform it, then store the data in a MongoDB.
The hardest part is the transformation. There can be situation where I need to call a REST API of source, and based on its data I need to call several other REST APIs using the 1st API data. After that we need to format the entire data in different format and store it in Mongo.
Scenario 2:
The source system has a DB. I need to transform the data using my custom logic and store it in MongoDB.
Here the custom logic can include things like this:
From table1 of source I created collection1. After that I need to consult table2 and previously created collection1, process the data and then create collection2.
Is this possible using any ETL tool? If possible then which tool? If possible please mention in as short as possible, how it can be done using different terminology so that I can search internet, learn things and implement it.
Briefly speaking: yes, that is what ETL tools are exactly for. You can Extract data from REST sources, Transform using sophisticated logic and Load to target, like MongoDB.
Exact implementation depends on the tool. While I guess you will get help if you run across problems implementing the solution in any of the tools, I don't think anyone will prepare complete, detailed solutions for you.

How do I get from "Big Data" to a webpage?

I've spent a lot of time reading and watching videos of people talking about how they use tools designed for handling huge datasets and real-time processing in their architectures. And while I understand what it is that tools like Hadoop/Cassandra/Kafka etc do, no one seems to explain how the data gets from these large processing tools to rendering something on a client/webpage.
From what I understand of big data tools, is that you can't build your application the same way you would a standard web-app querying MySQL, which I can understand given the size of the data that flows through these tools, however, for all this talk of "realtime data analytics" I cannot find any explanation of how the actual analytics gets put in front of someone in terms of some chart/table/etc?
explain how the data gets from these large processing tools to rendering something on a client/webpage.
With respect to this, one way would be to process the big data using Spark or Hadoop and store the results onto a RDBMS. Then have your webapp pull data from RDBMS to render charts, table etc. I can provide you the examples that I have done myself if you need more information.
Impala supports ODBC/JDBC interfaces. So, you actually could hook up a web app to it the same way you do with MySQL.
Other stuff you might want to check out is HBase, Kudu or Solr. In some realtime architectures data ends up in one of those. And all of them have some sort of an API that you can use in your web app to access their data.
If you want a simple solution for realtime data processing and analytics, check out the new Stride API, which enables developers to collect, process, and analyze streaming data and then either visualize summary data in Stride or push processed data out to applications in realtime. This is a very easy way to build the kind of realtime reporting dashboards and monitoring / alerting systems you described above.
Take a look at the Stride API technical docs for examples and more info on how to implement this.

web analytics software which can analyze existing log archives too

I am looking for a web analytic solution which can also help me to analyze my existing log files. we are moving from sawmill to other solutions.. explored Google Urchin and it has some limitations on analyzing custom existing logs.
Currently exploring webtrends, but i am not sure if it supports custom log analysis
any ideas??
WebTrends will not support custom log formats. Maybe the best way with WebTrends would be to use the Data Collector and/or the API from the Data Collector. Or you put your specific informations within the URL as the Data Collector would do. Here is the link to the API: http://product.webtrends.com/dcapi/sw/index.html

Resources