How does StatsD store its data?

I've been going through the README at https://github.com/etsy/statsd but I can't figure out how StatsD stores the data it gets.
Does it do any permanent storage, or is it a one-off thing? I was trying to figure out what database (if any) it uses, or if it simply uses file-based storage.

Etsy's version of StatsD does not store data per se; it relies on "backends" to do something with the data it aggregates (e.g. print it out, send it to another StatsD server, or send it to Graphite), as shown in https://github.com/etsy/statsd/tree/master/backends.
If you want permanent storage, you'll need to stand up a Graphite server, use a hosted one, or use a service that supports StatsD natively (e.g. Datadog).
Disclosure: I work for Datadog.
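For example, the Graphite backend is enabled through StatsD's config file, which is just a JavaScript object. A minimal sketch (the hostname, ports, and flush interval are placeholders; see exampleConfig.js in the repo for the full option list):

```js
// config.js - a sketch of pointing StatsD at a Graphite backend
{
  port: 8125,                          // UDP port StatsD listens on
  backends: ["./backends/graphite"],   // where aggregated metrics get flushed
  graphiteHost: "graphite.example.com",
  graphitePort: 2003,
  flushInterval: 10000                 // flush aggregates every 10 seconds
}
```

StatsD itself only holds the current interval's aggregates in memory; any long-term retention comes from the backend (e.g. Graphite's Whisper files).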

Related

Pushing metrics data to Prometheus

I am configuring Prometheus to access Spring Boot metrics data. For some of the metrics, Prometheus's pull mechanism is ok, but for some custom metrics I would prefer a push-based mechanism.
Does Prometheus allow pushing metrics data?
No.
Prometheus is very opinionated, and one of its design decisions is to disallow push as a mechanism into Prometheus itself.
The way around this is to push into an intermediate store and allow Prometheus to scrape data from there. This isn't fun, and there are considerations around how quickly you want to drain your data and how to pass data into Prometheus with timestamps; I've had to override the Prometheus client library for this.
https://github.com/prometheus/pushgateway
Prometheus provides its own collector for this, the Pushgateway linked above, which looks like it would be what you want, but it has weird semantics around expiring pushed metrics (it never expires them; it only overwrites their value when a new datapoint arrives with the same labels).
The Prometheus docs make it very clear that they don't want it used as a general-purpose push mechanism; it is intended for capturing the outcome of service-level batch jobs.
All in all, you can hack something together to get something close to push events.
But you're much better off embracing the pull model than fighting it.
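For reference, pushing to the Pushgateway is just an HTTP request carrying metrics in the Prometheus text exposition format. A minimal sketch (assuming a Pushgateway on localhost:9091; the job name and metric are made up):

```js
// Push a single gauge value to the Pushgateway; Prometheus then scrapes
// the Pushgateway like any other target. (Node 18+ for the built-in fetch.)
async function pushMetric(job, name, value) {
  const body = `# TYPE ${name} gauge\n${name} ${value}\n`;
  const res = await fetch(`http://localhost:9091/metrics/job/${job}`, {
    method: "POST", // POST replaces metrics with the same name for this job's group
    headers: { "Content-Type": "text/plain" },
    body,
  });
  if (!res.ok) {
    throw new Error(`push failed: ${res.status} ${await res.text()}`);
  }
}

pushMetric("nightly_batch", "batch_duration_seconds", 42).catch(console.error);
```

As noted above, the pushed value sits there until it is overwritten or explicitly deleted, which is exactly the expiry behaviour to keep in mind.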
Prometheus has recently added support for the push model. It is called the remote write receiver.
Link: https://prometheus.io/docs/prometheus/latest/querying/api/#remote-write-receiver
From what I understand, it only accepts POST requests whose payload is protocol buffers with snappy compression.
Prometheus doesn't support the push model. If you need a Prometheus-like monitoring system that supports both the pull model and the push model, then try VictoriaMetrics, the project I work on:
It supports scraping of Prometheus metrics - see these docs.
It supports data ingestion (aka push model) in Prometheus text exposition format - see these docs.
It supports other popular data ingestion formats such as InfluxDB line protocol, Graphite, OpenTSDB, DataDog, CSV and JSON - see these docs.
In addition to this, VictoriaMetrics provides the Prometheus querying API and a PromQL-like query language, MetricsQL, so it can be used as a drop-in replacement for Prometheus in most cases.
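To illustrate the push path, ingesting a sample in the Prometheus text exposition format into a single-node VictoriaMetrics instance is a plain HTTP POST. A sketch, assuming the default port 8428 and the /api/v1/import/prometheus endpoint; the metric name and labels are made up:

```js
// Push one sample in Prometheus text exposition format to VictoriaMetrics.
async function pushSample() {
  const line = 'http_requests_total{instance="host1",job="pusher"} 123\n';
  const res = await fetch("http://localhost:8428/api/v1/import/prometheus", {
    method: "POST",
    headers: { "Content-Type": "text/plain" },
    body: line,
  });
  if (!res.ok) {
    throw new Error(`import failed: ${res.status}`);
  }
}

pushSample().catch(console.error);
```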

Caching with Redis on a Parse Server

I'm on Parse Server 2.6.3 and I need to cache the results of queries to speed things up!
I understand that Parse Server offers a Redis adapter. What exactly do I have to do, in order to start using Redis? Are there any modules I should install? Anything I should import or configure?
Also, I found this on Parse's documentation:
Those cache adapters can be cleaned at anytime internally, you should not use them to cache data and you should let parse-server manage their data lifecycle.
What do they mean by saying "you should not use them to cache data and you should let parse-server manage their data lifecycle"? Should I not use the adapter?
What the doc is saying is that Parse caches with its own in-memory structure by default, but it leaves developers the option to use Redis as a substitute. To opt for that, just (1) set up Redis as you typically would, and (2) initialize the Parse Server with a RedisCacheAdapter that's been configured with your Redis URL.
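A rough sketch of step (2), assuming the RedisCacheAdapter export and the cacheAdapter option are available in your parse-server version (the URLs and keys below are placeholders):

```js
const { ParseServer, RedisCacheAdapter } = require('parse-server');

// Point the adapter at your Redis instance (placeholder URL).
const redisCache = new RedisCacheAdapter({ url: 'redis://localhost:6379' });

const api = new ParseServer({
  databaseURI: 'mongodb://localhost:27017/dev',   // placeholder
  appId: 'myAppId',
  masterKey: 'myMasterKey',
  serverURL: 'http://localhost:1337/parse',
  cacheAdapter: redisCache,                       // Parse Server now caches through Redis
});
```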
The point you're asking about, "you should not use them to cache data ...", means that Parse will continue to decide when to cache, when to retrieve from the cache, when to clean up, etc., but it will do so through the Redis instance you configured.
I think the major advantage of this more elaborate setup is Redis's distributed capability. If you're not running on a cluster, you may find the Redis approach to be about equivalent performance-wise, and a little messier setup-wise, compared with not doing it.

How can we get high availability in the Prometheus data store?

I am new to Prometheus, so I am not sure whether high availability is part of the Prometheus data store (TSDB). I am not looking at something like having two Prometheus server instances scrape data from the same exporter, as that has a high chance of producing two TSDB data stores that are out of sync.
It really depends on your requirements.
Do you need highly available alerting on your metrics? Prometheus can do that.
Do you need a highly available monitoring system that contains the last few hours of data for operational triage? Two Prometheus instances are pretty good for that too.
Do you need long-term storage of time series data? Prometheus is not designed to accomplish this on its own. You can use the remote write functionality of Prometheus to ship data to another TSDB that supports redundant storage (InfluxDB and ClickHouse are pretty promising here), but you are on the hook for de-duplicating data. Alternatively, consider Cortex.
For a Kubernetes setup using kube-prometheus (prometheus-operator), you can configure it via its values, and including Thanos would help in this situation.
There is prometheus-postgresql-adapter, which allows you to use PostgreSQL / TimescaleDB as remote storage. The adapter enables multiple Prometheus instances (HA setup) to write to a single remote storage, so you have one source of truth. Recently, I published a blog post about it: [How to manage Prometheus high-availability with PostgreSQL + TimescaleDB](https://blog.timescale.com/blog/prometheus-ha-postgresql-8de68d19b6f5/).
Disclaimer: I am one of the engineers behind the adapter

What are the differences between Beats and the JDBC plugin?

I am a newbie in Elasticsearch's wonderful world, so please be indulgent.
I am thinking about an import and synchronisation strategy for a Microsoft SQL data source, and if I have not misunderstood, I can use either the JDBC input plugin or Beats.
But I don't see the deeper differences between them:
what are they useful for? When should I use one or the other?
What are their benefits and their drawbacks?
Thank you if you can help me.
They serve different purposes. Beats is another offering of the Elastic Stack, which is basically a platform for collecting and shipping data (logs, network packets, any kind of metrics, protocol data, etc.) from the periphery of your architecture. Even though Beats also allows you to listen on the MySQL protocol and collect all kinds of metrics from your DB, it has nothing to do with loading data from your DB into Elasticsearch. For that you can use the jdbc input plugin, whose job is mainly to run a given query at regular time intervals and send each retrieved DB record as an event through the Logstash pipeline, to be processed further and sent to a variety of different outputs.
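To make that concrete, a Logstash pipeline with a jdbc input typically looks something like the sketch below (the driver path, connection string, credentials, query, schedule, and index name are all placeholders for a Microsoft SQL Server source):

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mssql-jdbc.jar"   # placeholder path to the JDBC driver
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    # :sql_last_value is the time the query last ran, so only new/updated rows are fetched
    statement => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
    schedule => "*/5 * * * *"                          # run every 5 minutes
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "orders"
  }
}
```

Each row returned by the query becomes one Logstash event, which you can enrich with filters before it is written to Elasticsearch.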

Sending values from a CC3200 to my server using MQTT

How can I make my server accept the data sent by the CC3200 through the MQTT protocol? I made the CC3200 publish the values successfully to my server's IP address, but I don't know what I should do to make my server dump those incoming values into its database. I use XAMPP for the server functionality.
Any suggestions?
I am using the HiveMQ broker.
If your primary goal is to have some telemetry data from the CC3200 stored in a database, I would suggest that you take a look at this webinar. You can configure the Kaa server to use one of multiple existing log appenders to publish your data to Spark, Cassandra, MongoDB, HDFS, Couchbase, etc. There are several major benefits of doing data collection with Kaa:
All of the data is structured end-to-end. You define the telemetry data model in the Kaa UI, which translates into Avro-compatible schemas and generates object bindings in the Kaa SDK. Instead of writing boilerplate code for data marshalling, you just invoke SDK functions like this: kaa_logging_add_record(kaa_client_get_context(kaa_client)->log_collector, log_record); where log_record is a structure auto-generated by Kaa based on your data model. On the other end, in your analytics system, you receive structured data that you can immediately start processing and querying; there is no need for custom interpretation code, as it's auto-generated for you.
You can write to several destinations simultaneously: for example, save telemetry data into HDFS for warehousing, send to Spark for stream analytics, and push to your custom data processing/visualization service with REST. All of this is configurable by adding log appenders through the Kaa administrative UI.
Kaa takes care of the data delivery reliability and consistency. You can set up one or more reliable log appenders. It is not until all of the configured reliable appenders acknowledge a successful write that the client is instructed to remove the local data copy.
Kaa server is scalable and reliable out-of-the box. There is no single point of failure in the cluster. You can add more server capacity on the fly by spinning off more nodes. They would register against Zookeeper and the cluster would automatically rebalance the load. If there is a node failure, the clients automatically migrate to the remaining nodes.
Kaa is transport agnostic, so you can plug in pretty much any transport protocol implementation you like, including MQTT. The default protocol is similar to MQTT in the amount of overhead it introduces.
The integration instructions specifically for CC3200 are being prepared for the upcoming 0.8.0 release here.
Disclaimer: I work for a company behind Kaa open-source IoT platform.
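If you would rather keep the existing HiveMQ + XAMPP (MySQL) setup instead, another option is a small bridge process on the server that subscribes to the topic and inserts each incoming value into the database. A rough sketch, assuming Node.js with the mqtt and mysql2 npm packages; the broker address, topic name, and table schema are all placeholders:

```js
// Minimal MQTT-to-MySQL bridge (placeholder names throughout).
const mqtt = require("mqtt");
const mysql = require("mysql2/promise");

async function main() {
  // Connection details for the XAMPP MySQL instance (placeholders).
  const db = await mysql.createConnection({
    host: "localhost",
    user: "root",
    password: "",
    database: "telemetry",
  });

  const client = mqtt.connect("mqtt://localhost:1883"); // HiveMQ broker address (placeholder)

  client.on("connect", () => {
    client.subscribe("cc3200/values"); // topic the CC3200 publishes to (placeholder)
  });

  client.on("message", async (topic, payload) => {
    // Assumes the CC3200 publishes a plain numeric value; adjust parsing to your payload format.
    const value = parseFloat(payload.toString());
    if (!Number.isNaN(value)) {
      await db.execute("INSERT INTO readings (value, received_at) VALUES (?, NOW())", [value]);
    }
  });
}

main().catch(console.error);
```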
