influxdb creating a new measurement - snmp

new to Influxdb but liking it a lot
I've configured it gather metrics from snmp polled devices - primarily network nodes
I can happily graph the statistics polled using derived values but what I want to know
Is it possible to create a new measurement in influxdb from data already stored?
The use case is we poll network traffic and graph it by doing the derived difference between the current and last reading (grafana)
What I want to do is create a measurement that does that in the influxdb and stores it. This is primarily so I can setup monitoring of the new derived value using a simple query and alert if it drops below x.
I have a measurement snmp_rx / snmp_tx with host and port name with the polled ifHCInOctets and ifHCOutOctets
so can I do a process that continuously creates a new measurement for each showing the difference between current and last readings?
Thanks

Apparently influxdb feature you are looking for is called continuous queries :
A CQ is an InfluxQL query that the system runs automatically and
periodically within a database. InfluxDB stores the results of the CQ
in a specified measurement
It will allow you to automatically create and fill new octet rates measurements from raw ifHCInOctet/ifHCOutOctets counters you have using derivative function in select statement and configured group by time interval. You can also do some scaling in select expression (like bytes-to-bits, etc).

Related

Report queue size using prometheus client_ruby client

I'm trying to export the size of an internal queue. I don't want to maintain a gauge counter, using incr/decr. What I need is to retrieve the actual queue size at the scrap moment. Is that possible using the prometheus ruby client?
The short answer to your vague question would be probably yes. A look at the documentation shows:
Overview
a multi-dimensional data model with time series data identified by metric name and key/value pairs
a flexible query language to leverage this dimensionality
no reliance on distributed storage; single server nodes are autonomous
time series collection happens via a pull model over HTTP
pushing time series is supported via an intermediary gateway
targets are discovered via service discovery or static configuration
multiple modes of graphing and dashboarding support
Data Model
Prometheus fundamentally stores all data as time series: streams of
timestamped values belonging to the same metric and the same set of
labeled dimensions. Besides stored time series, Prometheus may
generate temporary derived time series as the result of queries.

Design an alarm system based on logs in Elasticsearch

How would you design a system which can generate alarms based on certain conditions on data stored on Elasticsearch?
I'm thinking of a system similar to AWS CloudWatch.
Proposed alarming system should be able to work under following conditions:
There could be thousands of users using this system to create alarms.
There could be thousands of alarms active at any given time.
Shouldn't have high impact on query performance.
Large volume of data.
Naive approach would be to apply all the alarm conditions when a new record is added to Elasticsearch or a service/lambda function executing all the alarm rules at a specified time interval but I really doubt a system like this can satisfy above all conditions.
You might be interested in learning more about the Alerts feature in X-Pack. It includes Watchers, which are essentially the query you want to monitor.
Take control of your alerts by viewing, creating, and managing all of
them from a single UI. Stay in the know with real-time updates on
which alerts are running and what actions were taken.
Documentation: https://www.elastic.co/guide/en/x-pack/current/xpack-alerting.html
Sales Page: https://www.elastic.co/products/x-pack/alerting

high volume data storage and processing

I am building a new application where I am expecting a high volume of geo location data something like a moving object sending geo coordinates every 5 seconds. This data needs to be stored in some database so that it can be used for tracking the moving object on a map anytime. So, I am expecting about 250 coordinates per moving object per route. And each object can run about 50 routes a day. and I have 900 such objects to track. SO, that brings to about 11.5 million geo coordinates to store per day. I have to store about one week of data at least in my database.
This data will be basically used for simple queries like find all the geocoordates for a particular object and a particular route. so, the query is not very complicated and this data will not be used for any analysis purpose.
SO, my question is should I just go with normal Oracle database like 12C distributed over two VMs or should I think about some big data technologies like NO SQL or hadoop?
One of the key requirement is to have high performance. Each query has to respond withing 1 second.
Since you know the volume of data (11.5 million) you can easily simulate the all your scenario in Oracle DB and test it well before.
My suggestions are you need to go for day level partitions & 2 sub partitions like objects & routs. All your business SQL has to hit right partitions always.
and also you might required to clear older days data. or Some sort of aggregation you can created with past days and delete your raw data would help.
its well doable 12C.

Riemann to InfluxDB event drop

I have a simple setup which uses filebeat and topbeat to forward data to Logstash, which further forwards it to Riemann, which in turn sends it to InfluxDB 0.9. I use Logstash to split an event into multiple events, all of which show up on Riemann logs (with the same timestamp). However, only one of these split events reaches my InfluxDB. Any help please?
In InfluxDB 0.9, a point is uniquely identified by the measurement name, full tag set, and the timestamp. If another point arrives later with identical measurement name, tag set, and timestamp, it will silently overwrite the previous point. This is intentional behavior.
Since your timestamps are identical and you're writing to the same measurement, you must ensure that your tag set differs for each point you want to record. Even something like fuzz=[1,2,3,4,5] will work to differentiate the points.

Neo4j - Using Java plugins to REST api to improve performance?

I am building an application that requires a lot of data constantly being extracted from a local MongoDB to be put into Neo4j. Seeing as I am also having many users access the Neo4j database, from both a Django webserver and other places, I decided on using the REST interface for Neo4j.
The problem I am having is that, even with batch insertion, the Neo4j server is active over 50% of the time with just trying the insert all the data from the mongoDB. As far as I can see there might be some waiting time because of the HTTP requests but I have been trying to tweak but have only gotten so far.
The question is, if I write a Java plugin (http://docs.neo4j.org/chunked/stable/server-plugins.html) that can handle inserting the mongoDB extractions directly, will I then go around the REST API? Or will the java plugin commands just convert to regular REST API requests? Furthermore, will there be a performance boost by using the plugin?
The last question is how do I optimize the speed of the REST API (So far I am performing around 1500 read/write operations which includes many "get_or_create_in_index" operations)? Is there a sweet spot where the number of queries appended to one HTTP requests will keep Neo4j busy until the next HTTP request arrives?
Update:
I am using Neo4j version 2.0
The data that I am extracting consists of bluetooth observations, where the phone that is running the app i created scans all nearby phones. This single observation is then saved as a document in MongoDB and consists of the users id, the time of the scan and a list of the phones/users that he has seen in that scan.
In Neo4j I model all the users as nodes and I also model an observation between two users as a node so that it will look like this:
(user1)-[observed]->(observation_node)-[observed]->(user2)
Furthermore I index all user nodes.
When moving the observation from mongoDB to Neo4j, I do the following for each document:
Check in the index if the user doing the scan already has a node assigned, else create one
Then for each observed user in the scan: A) Check in index if the observed user has a node else create one B) Create an observation node and relationships between the users and the observation node, if this doesn't already exist C) Make a relationship between the observation node and a timeline node (the timeline just consists of a tree of nodes so that I can quickly find observations at a certain time)
As it can be seen I am doing quite a few lookups in the user index (3), some normal read (2-3) and potentially many writes for each observation.
Each bluetooth scan average around 5-30 observations and I batch 100 scans in a single HTTP request. This means that each request usually contains 5000-10000 updates.
What version are you using?
The unmanaged extension would use the underlying Java-API so it much faster, also you can decide on the format & protocol of the data that you push to it.
It is sensible to batch writes, so that you don't incurr tx overhead per each tiny write. E.g. aggregating 10-50k updates in one operation helps a lot.
What is the concrete shape of the updates you do? Can you edit your question to reflect that?
Some resources for this:
http://maxdemarzi.com/2013/09/05/scaling-writes/
http://maxdemarzi.com/2013/12/31/the-power-of-open-source-software/

Resources