Alert on Absent Data for Combined Metric in GCP Monitoring - google-cloud-logging

I have created an alert policy in GCP Monitoring that notifies me when a certain kind of log message stops appearing (a dead man's switch). I have created a logs-based metric with a label, "client", which I use to group the metric and get a timeseries per client. I have been using "absence of data" as the trigger for the alert. This has all been working well, until...
After a recent change, the logs now also come from different resources, so there is a need to combine the metric across those resources. I can achieve this using MQL:
{ fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
| every 30m
; fetch global::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
| every 30m }
| union
Notice that I need to align the two series with the same bucket size (30m) to be able to join them, which makes sense. I notice that the value for a timeseries is "undefined" in those buckets where the metric data was absent (by downloading a CSV of the query).
To create an alert using this query, I tried something like this:
{ fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
| every 30m
; fetch global::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
| every 30m }
| union
| absent_for 1h
If I look at the CSV output for this query it doesn't reflect the absence of metric data for a timeseries, and this is presumably because a value of "undefined" doesn't qualify as absent data.
Is there a way to detect absence of data for a "unioned" (and therefore aligned) metric across multiple resources?
Update 1
I have tried this, which seems to get me some of the way there. I'd really appreciate comments on this approach.
{
fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
;
fetch global::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
}
| union
| absent_for 1h

I have settled on a solution as follows,
{
fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client]
;
fetch global::logging.googleapis.com/user/ping
| group_by [metric.client]
}
| union
| absent_for 1h
| every 30m
Notes:
group_by [metric.client] conforms the tables from the different resources, which allows the union to work.
absent_for aligns its input timeseries itself, using the default period or the one specified by a following every.
I found it really hard to debug these MQL queries, in particular to confirm that absent_for was going to trigger an alert. I realised that I could use value [active] to show a plot of the active column (which absent_for produces) and that gave me confidence that my alert was actually going to work.
{
fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client]
;
fetch global::logging.googleapis.com/user/ping
| group_by [metric.client]
}
| union
| absent_for 1h
| value [active]
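To reason about what the alert should do without waiting for real data, I found it useful to model the condition outside of MQL. The following is only a toy sketch in Python, not how Cloud Monitoring implements absent_for: the function name, the tuple layout and the sample clients are all made up for illustration. It ignores the resource type (as the group_by plus union does) and reports the clients whose newest point is older than the window.

```python
from datetime import datetime, timedelta


def absent_clients(points, now, window=timedelta(hours=1)):
    """Return clients whose newest point, across all resources, is older than `window`.

    `points` is an iterable of (resource, client, timestamp) tuples. The
    resource is deliberately ignored, mirroring group_by [metric.client]
    followed by union.
    """
    last_seen = {}
    for _resource, client, ts in points:
        if client not in last_seen or ts > last_seen[client]:
            last_seen[client] = ts
    return sorted(c for c, ts in last_seen.items() if now - ts > window)


# Hypothetical sample data: "acme" still pings from one resource, "globex" has gone quiet.
now = datetime(2024, 1, 1, 12, 0)
points = [
    ("gce_instance", "acme", now - timedelta(minutes=10)),
    ("global", "acme", now - timedelta(hours=3)),
    ("global", "globex", now - timedelta(hours=2)),
]
print(absent_clients(points, now))
```

A client only counts as absent when it is silent on every resource, which is the behaviour the unioned alert is meant to have.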

Related

Kusto\KQL - Render timechart for simple count value

I have a KQL query which calculates the number of blobs uploaded to Azure storage in the last 24 hours.
The query below returns a number as expected when run in Azure Log Analytics.
StorageBlobLogs
| where TimeGenerated > ago(1d) and OperationName has "PutBlob" and StatusText contains "success"
| distinct Uri
| summarize count()
I now want to visualise this information in a timechart to get a more detailed view. I have tried adding "render timechart" to the query chain as follows:
StorageBlobLogs
| where TimeGenerated > ago(1d) and OperationName has "PutBlob" and StatusText contains "success"
| distinct Uri
| summarize count()
| render timechart
However, when executing the query I get this error message:
Failed to create visualization
The Stacked bar chart can't be created as you are missing a column of one of the following types: int, long, decimal or real
Any tips on how this can be accomplished?
If you wish to look at the data aggregated at an hourly resolution (for example) and rendered as a timechart, you could try this:
StorageBlobLogs
| where TimeGenerated > ago(1d) and OperationName has "PutBlob" and StatusText contains "success"
| summarize dcount(Uri) by bin(TimeGenerated, 1h)
| render timechart
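The reason this works is that a timechart needs a datetime column plus a numeric column, whereas summarize count() on its own collapses everything into a single number. To make the bucketing concrete, here is a rough Python sketch of what the summarize line computes. It is an illustration only: the function name and the sample rows are invented, and it counts exactly, whereas dcount() in Kusto returns an estimate.

```python
from collections import defaultdict
from datetime import datetime


def distinct_per_hour(rows):
    """Count distinct URIs per hour; `rows` are (timestamp, uri) pairs."""
    buckets = defaultdict(set)
    for ts, uri in rows:
        # Truncate to the start of the hour, like bin(TimeGenerated, 1h).
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].add(uri)
    return {hour: len(uris) for hour, uris in sorted(buckets.items())}


# Hypothetical log rows: the same URI uploaded twice within one hour counts once.
rows = [
    (datetime(2024, 1, 1, 9, 5), "https://example.invalid/a"),
    (datetime(2024, 1, 1, 9, 40), "https://example.invalid/a"),
    (datetime(2024, 1, 1, 9, 50), "https://example.invalid/b"),
    (datetime(2024, 1, 1, 10, 15), "https://example.invalid/c"),
]
result = distinct_per_hour(rows)
for hour, n in result.items():
    print(hour, n)
```

Each output row has a timestamp and a count, which is the shape the chart needs.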

Crate.io: Facets for search?

Does https://crate.io support facets (for faceted search)?
I didn't find anything in the docs. ElasticSearch replaced facets with aggregations in 2014, but the aggregation section in the crate docs only talks about SQL aggregation functions.
My use case:
I've got a list of web sites; each record has a domain and a language field. When displaying the search results, I want to get a list of all domains that the search results appear in, as well as a list of all languages, ordered by number of occurrences, so that search results can be narrowed down. The number of results for each single facet value should also be given.
Screenshot with facets:
There is no way to get the facets I want from crate itself.
Instead we're enabling the ElasticSearch REST API in crate.yml now
es.api.enabled: true
.. and can use the ElasticSearch aggregation API.
Crate doesn't support facets or Elasticsearch aggregations directly. Like you suggested, you can always turn on the Elasticsearch API. However, there are other ways to get these aggregations.
1) Have you considered issuing multiple queries to the cluster? For example, if you load your page dynamically with JavaScript, you can first return the search results and load the facets later. This should also decrease the overall response time of the application.
2) In CrateDB 2.1.x, there will be support for subqueries, which allow you to include the facets within your query:
select q1.id, q1.domain, q1.tag, q2.d_count, q3.t_count from websites q1,
(select domain, count(*) as d_count from websites where text like '%query%' group by domain) q2,
(select tag, count(*) as t_count from websites where text like '%query%' group by tag) q3
where q1.domain = q2.domain and q1.tag = q3.tag and q1.text like '%query%'
order by q1.id
limit 5;
This gives you a result table like the following, where you have the search results alongside the domain and tag counts for the query:
+----+-------------+-------------+---------+---------+
| id | domain      | tag         | d_count | t_count |
+----+-------------+-------------+---------+---------+
|  1 | example.com | example     |       2 |       3 |
| 14 | crate.io    | software    |       1 |       4 |
| 17 | google.com  | search      |       5 |       2 |
| 29 | github.com  | open-source |       3 |       3 |
| 47 | linux.org   | software    |       2 |       4 |
+----+-------------+-------------+---------+---------+
Disclaimer: I'm new to Crate :)
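For option 1 (separate queries per facet), the idea can be sketched with plain SQL GROUP BY queries. The example below uses Python with an in-memory SQLite database purely so it can be run anywhere; it is not Crate-specific, and the table layout, the sample rows and the facet() helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE websites (id INTEGER PRIMARY KEY, domain TEXT, language TEXT, text TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO websites (domain, language, text) VALUES (?, ?, ?)",
    [
        ("example.com", "en", "a query about databases"),
        ("example.com", "de", "eine query"),
        ("crate.io", "en", "distributed query engine"),
        ("crate.io", "en", "unrelated page"),
    ],
)


def facet(conn, column, term):
    """Count matching rows per value of `column`, most frequent first."""
    # Column names cannot be bound as parameters, so restrict them to a whitelist.
    if column not in ("domain", "language"):
        raise ValueError(f"unsupported facet column: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) AS n FROM websites "
        f"WHERE text LIKE ? GROUP BY {column} ORDER BY n DESC, {column}"
    )
    return conn.execute(sql, (f"%{term}%",)).fetchall()


domains = facet(conn, "domain", "query")
languages = facet(conn, "language", "query")
print(domains)
print(languages)
```

Each facet is one extra query with the same WHERE clause as the search itself, so the facets can be loaded after the main results.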

SSRS Reporting - count number of group rows

I am trying to get the number of groups in my report. I know I could do it in SQL, but I am trying to avoid adding redundant data to my dataset if I can.
I have a MainDataSet that could have multiple entries per distinct group item. All I want is the number of groups, not the count of items within each group.
For example, take words grouped by their starting letter; let's say I have only 2 groups, A and B (NB: the number of groups can change dynamically as I filter the MainDataSet based on user parameter selection):
Group | Data
------|-----
A | Apple
A | Ant
B | Balloon
B | Book
B | Bowl
Final Result:
Group | Index | NGroups
A | 1 | 2
B | 2 | 2
I know I can get the Index using an aggregate function as follows:
RunningValue(Fields!Group.Value, CountDistinct, "TablixName")
But how do I get the NGroups value?
I guess I could also create another dataset based on the MainDataSet (make use of a sql function) and do:
SELECT 'X' AS GroupCount, COUNT(Distinct Group) AS NGroups
FROM dbo.udf_MainDataSet()
WHERE FieldX = @Parameter1
Then use a LookUp:
Lookup("X", Fields!GroupCount.Value, Fields!NGroups.Value, "NewDataSet")
But is there a simple solution that I am not seeing?
CountDistinct(Fields!Group.Value, "TablixName")

SELECT ... LIMIT 1 query results in more than one row?

I noticed that LIMIT queries will return more than the expected number of rows when they are executed against tables that contain nested or repeated data. For example, the following query run against the persons sample data set from the developer guide produces the following results:
% bq query 'SELECT fullName, children.name FROM [persons.person] LIMIT 1'
+----------+---------------+
| fullName | children_name |
+----------+---------------+
| John Doe | Jane |
| John Doe | John |
+----------+---------------+
It looks like BQL is applying the LIMIT operator before flattening the results as opposed to the other way around (which I think would make more sense).
Is this a bug in the BQL implementation or is this the expected behavior? If this is the expected behavior can someone please provide an explanation for why this makes sense?
This is expected given the way BigQuery flattens query results. When you run the query, the LIMIT 1 applies to the repeated record. Then the results get flattened in the output, and you get two rows. A workaround is to use an explicit flatten operation. For example:
SELECT fullName, children.name
FROM (FLATTEN([persons.person], children.name)) LIMIT 1
This will return only a single row.
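The difference in ordering can be illustrated outside BigQuery. The following Python sketch is only an analogy for the behaviour described above, not BigQuery's actual execution model; the second person in the sample data is made up.

```python
from itertools import islice

people = [
    {"fullName": "John Doe", "children": ["Jane", "John"]},
    {"fullName": "Mary Roe", "children": ["Ann"]},  # hypothetical extra record
]


def flatten(records):
    """Yield one (fullName, child) row per repeated child value."""
    for rec in records:
        for child in rec["children"]:
            yield rec["fullName"], child


# LIMIT 1 applied to the nested records, flattening afterwards: two rows.
limit_then_flatten = list(flatten(islice(people, 1)))
# Explicit flatten first, LIMIT 1 afterwards: one row.
flatten_then_limit = list(islice(flatten(people), 1))

print(limit_then_flatten)
print(flatten_then_limit)
```

Limiting the nested records keeps one person with both children, while limiting after the flatten keeps a single output row.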

Dynamic database design for variable length spatio-temporal data in Oracle (need a schema design)

Currently I am working on a research project where I need to store spatio-temporal data and analyze it efficiently. The exact requirements are given below.
The research concerns meteorological data, so the data attributes are temperature, humidity, pressure, wind speed, wind direction, etc. The number of attributes is not known in advance; depending on requirements we may need to add more attributes (a table with a dynamic set of attributes of different datatypes). The data is captured at various locations, at various heights, over a certain time duration and at a certain time interval.
So, what would be the best way to design a schema for this requirement? We must be able to query relationships in the data efficiently.
The purpose of the project is not only to store the data but also to manipulate it.
Sample data in table format -
location | time | height | pressure | temperature | wind-direction | ...
L1 | 2011-12-18 08:04:02 | 7 | 1009.6 | 28.3 | east | ...
L1 | 2011-12-18 08:04:02 | 15 | 1008.6 | 27.9 | east | ...
L1 | 2011-12-18 08:04:02 | 27 | 1007.4 | 27.4 | east | ...
L1 | 2011-12-18 08:04:04 | 7 | 1010.2 | 28.4 | north-east | ...
L1 | 2011-12-18 08:04:04 | 15 | 1009.4 | 28.2 | north-east | ...
L1 | 2011-12-18 08:04:04 | 27 | 1008.9 | 27.6 | north-east | ...
L2 | 2011-12-18 08:04:02 | ..... so on
Here I need to design a schema for the above sample data, where location is a spatial location that can be implemented using the Oracle MDSYS.SDO_GEOMETRY type.
Constraints are:
1) The number of attributes (table columns) is unknown during development. At runtime any new attribute (say, humidity, refractive index, etc.) can be added, so we can't design an attribute-specific table schema.
    1.1) For this constraint I thought of using a schema like:
         tbl_attributes(attr_id_pk, attr_name, attr_type);
         tbl_data(loc, time, attr_id_fk, value);
         In this design the attribute value must be of varchar type, to be cast as required (not a good idea at all).
         But finding relational data with this schema is very difficult using SQL queries only. For example, I want to find:
         1.1.1) the average pressure for location L1 when the wind direction is east and the temperature is between 27 and 28
         1.1.2) the locations where pressure is at its maximum at height 15.
    1.2) I am also thinking of editing the table schema at runtime, which again is not a good idea, I think.
2) We will use a loader application, which will take care of this dynamic insertion depending on the schema (whatever it may be).
3) We need to retrieve statistical data efficiently; some examples are given above [1.1.*].
I am not completely sure I understand what you mean when you say that
The no of attributes (table column) is unknown during development. In
runtime any new attribute(let say - humidity, refractive index etc)
can be added.
First of all, I suppose that this is not really happening at random: i.e. when you get a new batch of data from the field you know (before importing) that it has an extra dimension or two. Correct?
Also, the fact that in this new data batch you get "refractive index" will not make the older data magically acquire a proper value for this dimension.
Therefore I would go for a classical Object-to-RDBMS mapping where you have:
a header table with things that exist for every measurement: i.e. time and space, possibly the source (i.e. lab, sensor, team which provided the data) and an autogenerated key.
one or more detail table where the values are defined as proper fields.
Example:
Header
location | time | height | source |Key |
L1 | 2011-12-18 08:04:02 | 7 | team-1 | 002020013 |
L1 | 2011-12-18 08:04:02 | 15 | team-1 | 002020017 |
L1 | 2011-12-18 08:04:02 | 27 | Lab-X | 002020018 |
L1 | 2011-12-18 08:04:04 | 7 | Lab-Y | 002020021 |
L1 | 2011-12-18 08:04:04 | 15 | Lab-X | 002020112 |
Atmospheric data (basic)
Key | pressure | temp | wind-dir |
002020013 | 1009.6 | 28.3 | east |
002020017 | 1019.3 | 29.2 | east |
002020018 | 1011.6 | 26.9 | east |
Light-sensor data
Key | refractive-ind | albedo | Ultraviolet |
002020017 | 79.6 | .37865 | 7.0E-34 |
002020018 | 67.4 | .85955 | 6.5E-34 |
002020021 | 91.6 | .98494 | 8.1E-34 |
In other words: every different set of data will use one or more subtables (which you can add "dynamically" if needed), and you can still create queries by standard means. You will just have to join the subtables, using the reference keys to keep them consistent. This works only where the data allows it: if you want to analyze by wind direction AND refractive index, you can, but only for the sets of data that have both values.
I believe this is more efficient than using text fields with CSV inside, data blobs, or key-value associations.
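As a rough illustration of the header/detail layout, here is a sketch that answers query 1.1.1 from the question (average pressure at L1 when the wind is east and the temperature is between 27 and 28). It uses Python with an in-memory SQLite database only so that it can be run as-is; the table names, column names and the sample rows are my own assumptions, and an Oracle version would use its own types (for example MDSYS.SDO_GEOMETRY for the location).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (
    meas_id     INTEGER PRIMARY KEY,
    location    TEXT,
    measured_at TEXT,
    height      REAL,
    source      TEXT
);
CREATE TABLE atmospheric (
    meas_id  INTEGER PRIMARY KEY REFERENCES header(meas_id),
    pressure REAL,
    temp     REAL,
    wind_dir TEXT
);
""")
# Hypothetical sample measurements.
conn.executemany("INSERT INTO header VALUES (?, ?, ?, ?, ?)", [
    (1, "L1", "2011-12-18 08:04:02", 7, "team-1"),
    (2, "L1", "2011-12-18 08:04:02", 15, "team-1"),
    (3, "L1", "2011-12-18 08:04:02", 27, "Lab-X"),
    (4, "L2", "2011-12-18 08:04:02", 7, "Lab-Y"),
])
conn.executemany("INSERT INTO atmospheric VALUES (?, ?, ?, ?)", [
    (1, 1009.6, 28.3, "east"),
    (2, 1008.6, 27.9, "east"),
    (3, 1007.4, 27.4, "east"),
    (4, 1012.0, 27.5, "east"),
])

# Typed columns allow a plain aggregate with no casting.
avg_pressure = conn.execute("""
    SELECT AVG(a.pressure)
    FROM header h
    JOIN atmospheric a ON a.meas_id = h.meas_id
    WHERE h.location = 'L1'
      AND a.wind_dir = 'east'
      AND a.temp BETWEEN 27 AND 28
""").fetchone()[0]
print(avg_pressure)
```

Only the two L1 rows with temperatures 27.9 and 27.4 qualify, so the result is about 1008.0. A new kind of measurement would get its own detail table keyed by the same meas_id.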
I would definitely go with 1.2 (edit table schema during runtime), at least to begin with. Any sufficiently advanced configuration is indistinguishable from programming; don't think you can magically avoid making changes to your program.
Don't be scared of alter table. Yes, the upfront costs are higher - you may need a process (not just a program) to ensure your schema stays clean. And there are some potential locking problems (that have solutions). But if you do it right you only have to pay the price once for each change.
With a completely generic solution you will pay a small price with every query. Your queries will be complicated, slower, ugly, and more likely to fail. You can never write a query like select avg(value) ..., it may or may not work, depending on how the data is accessed. You can use a PL/SQL function to catch exceptions, or use inline views and hints to force a specific access pattern. Either way, your queries are more complicated and slower, and you have to make sure that everybody understands these problems before they use the data.
And with a generic solution the optimizer will suck because it knows nothing about your data. Oracle can't predict how many rows will be returned by where attr_name = 'temperature' and is_number(value) = 28.4. But it can make a very good guess for where temperature = 28.4. You may have significantly more bad plans (i.e. slow queries) with generic columns.
Thank you for the quick response and good guidance. I have taken some concepts from both answers and decided to go with a mixed model. I don't know whether I am on the right path or not, so I would like comments on the model. Below I describe the complete conceptual model with a MySQL code snippet.
Conceptual model
For dynamicity (the number of columns is not defined in advance) I have created 4 tables as follows:
geolocation(locid int, name varchar, geometry spatial_type) - stores information about a particular location, which may be defined with a spatial feature.
met_loc_event(loceventid int, locid* int, record_time timestamp, height float) - identifies a particular event at a place and a certain height.
metfeatures(featureid int, name varchar, type varchar) - stores feature (i.e. column) details with a data type; the type field helps to cast data as required.
metstore(loceventid* int, featureid* int, value varchar) - stores an atomic value for a feature at a particular time.
Up to this point the design is column-oriented, to store the dynamic nature of the table. But as you suggest, this is not a good design for querying the database (some things, like arithmetic functions, will not work). It is also not good for performance.
For efficient querying (to avoid too much joining and to avoid casting values during queries) I extended the model with some helper views, and I wrote a stored procedure to generate the views from the stored data.
First I created a view for each feature (taking the values from the feature table, so initially the number of views equals the number of features) with the help of the met_loc_event, metfeatures and metstore tables. These views hold locid, record_time, height, and the value cast according to the feature type.
Next, from these views, I created a row-oriented view named metrelview, which contains all the relational data row-wise like a normal table. I plan to run queries against this view, which I expect to improve query performance.
This view generation procedure needs to be executed whenever there is an insert, update or delete on the features table.
Below is the MySQL procedure that I developed for the view generation:
CREATE PROCEDURE `buildModel`()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE fid INTEGER;
    DECLARE fname VARCHAR(45);
    DECLARE ftype VARCHAR(45);
    DECLARE cur_fatures CURSOR FOR SELECT `featureid`, `name`, `type` FROM `metfeatures`;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    -- User variables hold the dynamic SQL, because PREPARE only accepts
    -- a string literal or a user variable.
    SET @viewAlias = 'v_';
    SET @metRelView = "metrelview";
    SET @stmtCols = "";
    SET @stmtJoin = "";
    START TRANSACTION;
    OPEN cur_fatures;
    read_loop: LOOP
        FETCH cur_fatures INTO fid, fname, ftype;
        IF done THEN
            LEAVE read_loop;
        END IF;
        IF fname IS NOT NULL THEN
            SET @featureView = CONCAT(@viewAlias, LOWER(fname));
            -- Choose the cast expression from the declared feature type.
            IF ftype = 'float' THEN
                SET @featureCastStr = "`value`+0.0";
            ELSEIF ftype = 'int' THEN
                SET @featureCastStr = "CAST(`value` AS SIGNED)";
            ELSE
                SET @featureCastStr = "`value`";
            END IF;
            -- (Re)create the per-feature view.
            SET @stmtDeleteView = CONCAT("DROP VIEW IF EXISTS `", @featureView, "`");
            SET @stmtCreateView = CONCAT("CREATE VIEW `", @featureView, "` AS SELECT le.`loceventid` AS loceventid, le.`locid`, le.`rectime`, le.`height`, ", @featureCastStr, " AS value FROM `metlocevent` le JOIN `metstore` ms ON (le.`loceventid`=ms.`loceventid`) WHERE ms.`featureid`=", fid);
            PREPARE stmt FROM @stmtDeleteView;
            EXECUTE stmt;
            PREPARE stmt FROM @stmtCreateView;
            EXECUTE stmt;
            -- Accumulate the column list and joins for the combined view.
            SET @stmtCols = CONCAT(@stmtCols, ", ", @featureView, ".`value` AS ", @featureView);
            SET @stmtJoin = CONCAT(@stmtJoin, " ", "LEFT JOIN ", @featureView, " ON (le.`loceventid`=", @featureView, ".`loceventid`)");
        END IF;
    END LOOP;
    -- (Re)create the row-oriented view that joins all feature views.
    SET @stmtDeleteView = CONCAT("DROP VIEW IF EXISTS `", @metRelView, "`");
    SET @stmtCreateView = CONCAT("CREATE VIEW `", @metRelView, "` AS SELECT le.`loceventid`, le.`locid`, le.`rectime`, le.`height`", @stmtCols, " FROM `metlocevent` le", @stmtJoin);
    PREPARE stmt FROM @stmtDeleteView;
    EXECUTE stmt;
    PREPARE stmt FROM @stmtCreateView;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    CLOSE cur_fatures;
    COMMIT;
END;
N.B. - I tried to call the procedure from an event on the features table, so that everything would be automated. But as MySQL does not support dynamic SQL in functions or triggers, I can't do it automatically.
I would also like criticism before I finalize this as the accepted model. I am not a DBA, so any help on how to improve the performance of the model would be very useful.
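For comparison, the same idea (turning the attribute/value rows into one typed column per feature) can be sketched with a single view built by conditional aggregation, instead of one view per feature joined together. The sketch below uses Python with an in-memory SQLite database rather than MySQL, only so that it is self-contained; the build_view() helper, the CASTS mapping and the sample rows are hypothetical, and the header columns (locid, record time, height) are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metfeatures (featureid INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE metstore (loceventid INTEGER, featureid INTEGER, value TEXT);
INSERT INTO metfeatures VALUES (1, 'pressure', 'float'), (2, 'wind_dir', 'varchar');
INSERT INTO metstore VALUES (10, 1, '1009.6'), (10, 2, 'east'), (11, 1, '1008.6');
""")

# Cast expression per declared feature type; anything else stays as text.
CASTS = {"float": "CAST(value AS REAL)", "int": "CAST(value AS INTEGER)"}


def build_view(conn, view_name="metrelview"):
    """Recreate a view with one typed column per row in metfeatures."""
    cols = ["loceventid"]
    features = conn.execute(
        "SELECT featureid, name, type FROM metfeatures ORDER BY featureid"
    ).fetchall()
    for fid, name, ftype in features:
        # Feature names become column names, so reject anything unsafe.
        if not name.isidentifier():
            raise ValueError(f"unsafe feature name: {name!r}")
        expr = CASTS.get(ftype, "value")
        cols.append(f"MAX(CASE WHEN featureid = {int(fid)} THEN {expr} END) AS {name}")
    conn.execute(f"DROP VIEW IF EXISTS {view_name}")
    conn.execute(
        f"CREATE VIEW {view_name} AS SELECT {', '.join(cols)} "
        "FROM metstore GROUP BY loceventid"
    )


build_view(conn)
rows = conn.execute("SELECT * FROM metrelview ORDER BY loceventid").fetchall()
print(rows)
```

As with the procedure above, build_view() has to be rerun whenever metfeatures changes, and the casting still happens at query time, so this does not remove the performance concerns raised in the earlier answers.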
This sounds like a homework assignment whose underlying subject is: use-cases for abandoning strict normal-form design principles.
The solution to this conundrum is to develop a three-stage solution. Stage 1 is runtime adaptability using the flexible AttributeType, AttributeValue approach, so that rapidly incoming data can be captured and put somewhere temporarily in a quasi-structured manner. Stage 2 involves the analysis of that runtime data to see where the model must be extended with additional columns and validation tables to accommodate any new attributes. Stage 3 is the importing of the as-yet-unimported data into the revised model, which never relaxes its strict datatyping and declarative referential integrity constraints.
As they say: Life, friends, is a trade-off.

Resources