How to select the closest data to the given time for each group

I'm using InfluxDB 1.4, and here's my task:
1) Find the closest value for each ID.
2) Do 1) for every hour.
For example:
select id, value, time from myTable where time = '2018-08-14T00:00:00Z' group by id;
select id, value, time from myTable where time = '2018-08-14T01:00:00Z' group by id;
....
select id, value, time from myTable where time = '2018-08-14T23:00:00Z' group by id;
Some ids have a value at each o'clock, but others don't. In that case, I want to get the row closest to the given time '2018-08-14T14:00:00Z', such as '2018-08-14T14:00:01Z' or '2018-08-14T13:59:59Z'.
And I don't want to query 24 times, once per hour. Can I do this task with group by time, id, or something else?

Q: I would like to select the point data closest to the hourly boundary. Is there a way I can do this without having to query 24 times for each day? Will group by time be any help on this?
A:
Will group by time be any help on this?
Unfortunately, group by time will not be much help to you, as it requires the query to have an aggregation function. What group by time does is collapse all the data falling within each interval into one record, using an aggregation function such as sum or mean to combine the rows' values.
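For illustration, a typical group by time query looks like this (a sketch reusing the uv measurement from the examples further below; mean is one such aggregation function):
-- every 1-hour bucket collapses into one aggregated row stamped with the interval start
select mean(value) from uv where time >= '2018-08-18T00:00:00Z' and time < '2018-08-19T00:00:00Z' group by time(1h)
The individual points, and their exact timestamps, are lost in the process, which is the opposite of what's being asked for here.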
Is there a way I can do this without having to query 24 times for each day?
To the best of my knowledge, InfluxDB 1.5 has no way to express this as a one-liner. Maybe there is something in 1.6; I'm not sure, as I haven't tried it.
For now, I think your best option is a query that combines a time filter with order by and limit, e.g.
select * from uv where time >= '2018-08-18T14:00:00Z' and time < '2018-08-18T15:00:00Z' order by time desc limit 1;
The query above selects all the points between 2pm and 3pm, orders them by time descending, and returns only the first row, which is what you want.
If for some reason you can only make one HTTP request to InfluxDB for a given day's hourly data, you can bundle the 24 queries into one big query using the ; separator and retrieve the data in a single transaction. E.g.
select * from uv where time >= '2018-08-18T14:00:00Z' and time < '2018-08-18T15:00:00Z' order by time desc limit 1;
select * from uv where time >= '2018-08-18T15:00:00Z' and time < '2018-08-18T16:00:00Z' order by time desc limit 1;
select * from uv where time >= '2018-08-18T16:00:00Z' and time < '2018-08-18T17:00:00Z' order by time desc limit 1;
Output:
name: uv
time                tag1  id  value
----                ----  --  -----
1534603500000000000 apple uv  2
1534607100000000000 apple uv  1
1534610700000000000 apple uv  3.1
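That said, there is one single-statement angle that may be worth a try, depending on what "closest" means for you: the first() selector is itself an aggregation, so it satisfies the group by time() requirement and picks the earliest point in each bucket. A sketch against the asker's myTable, assuming id is a tag key:
-- one statement for the whole day; assumes id is a tag key, not a field
select first(value) from myTable where time >= '2018-08-14T00:00:00Z' and time < '2018-08-15T00:00:00Z' group by time(1h), id fill(none)
Two caveats: it only considers points at or after each boundary (a 13:59:59 point falls into the previous bucket), and with group by time() InfluxDB 1.x stamps each row with the interval start rather than the point's own time.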

Related

Frequency histogram in ClickHouse with unique and non-unique data

I have an event table with created_at (DateTime), userid (String), and eventid (String) columns. Here userid can repeat, while eventid is always a unique UUID.
I am looking to build both unique and non-unique frequency histograms.
This is for both eventid and userid, on the basis of three given inputs:
start_datetime
end_datetime and
interval (1 min, 1 hr, 1 day, 7 day, 1 month).
Here, the number of buckets is decided by (end_datetime - start_datetime) / interval.
The output comes as start_datetime, end_datetime, and frequency.
For any interval where no data is available, start_datetime and end_datetime are still returned, but with a frequency of 0.
How can I build a generic query for this?
I looked into the histogram function but could not find any documentation for it. While trying it, I could not understand the relation between the input and the output.
count(distinct XXX) is deprecated in ClickHouse; uniq(XXX) or uniqExact(XXX) are more useful.
I got it to work using the following. Here, toStartOfMonth can be changed to other similar functions in ClickHouse.
select toStartOfMonth(`timestamp`) interval_data , count(distinct uid) count_data
from g94157d29.event1
where `timestamp` >= toDateTime('2018-11-01 00:00:00') and `timestamp` <= toDateTime('2018-12-31 00:00:00')
GROUP BY interval_data;
and
select toStartOfMonth(`timestamp`) interval_data , count(*) count_data
from g94157d29.event1
where `timestamp` >= toDateTime('2018-11-01 00:00:00') and `timestamp` <= toDateTime('2018-12-31 00:00:00')
GROUP BY interval_data;
But performance is very low for the >2 billion records per month in the event table, where toYYYYMM(timestamp) is the partition key and toYYYYMMDD(timestamp) is the order-by key.
The distinct-count query takes >30 GB of memory and 30 seconds, and still didn't complete,
while the plain count query takes 10-20 seconds.
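Given the deprecation note above, a hedged variant of the distinct query that may scale better: uniq computes an approximate unique count and is usually much cheaper than uniqExact or count(distinct) at this volume (an assumption to benchmark, not a guarantee):
-- uniq() is approximate; swap in uniqExact(uid) if exact counts are required
select toStartOfMonth(`timestamp`) interval_data , uniq(uid) count_data
from g94157d29.event1
where `timestamp` >= toDateTime('2018-11-01 00:00:00') and `timestamp` <= toDateTime('2018-12-31 00:00:00')
GROUP BY interval_data;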

Impala: Running sum of 1 hour

I want to count the records of each ID within 1 hour. I tried some Impala queries, but without any luck.
I have input data as follows:
And expected output would be:
I tried:
select
concat(month,'/',day,'/',year,' ',hour,':',minute) time, id,
count(1) over(partition by id order by concat(month,'/',day,'/',year,' ',hour,':',minute) range between '1 hour' PRECEDING AND CURRENT ROW) request
from rt_request
where
concat(year,month,day,hour) >= '2019020318'
group by id, concat(month,'/',day,'/',year,' ',hour,':',minute)
But I got an exception:
RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
Any suggestion/help would be appreciated.
Thank you in advance!
I think you are looking for counts for the same hour across days for a given id. You can simply use row_number to do this.
select time, id,
       row_number() over (partition by id, hour order by concat(month,'/',day,'/',year,' ',hour,':',minute)) as total
from tbl
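If what's wanted is instead a true rolling count over the preceding hour, which Impala's RANGE windows cannot express (hence the error above), a self-join is one workaround. A sketch, assuming a single TIMESTAMP column ts in place of the separate year/month/day/hour/minute columns (ts is hypothetical, not part of the original table):
-- for each request, count requests by the same id in the preceding 3600 seconds
select a.ts, a.id, count(*) as request
from rt_request a
join rt_request b
on b.id = a.id
and unix_timestamp(b.ts) between unix_timestamp(a.ts) - 3600 and unix_timestamp(a.ts)
group by a.ts, a.id;
Be aware the self-join can get expensive on large tables; restrict both sides with the same time-range predicate where possible.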

ClickHouse - how to get counts per 1 minute or 1 day

I have a table in ClickHouse for keeping statistics and metrics.
Its structure is:
datetime|metric_name|metric_value
I want to keep statistics and to limit the number of accesses per 1 minute, 1 hour, 1 day, and so on. So I need event counts for the last minute, hour, or day for every metric_name, and I want to present the statistics in a chart.
I do not know how to write the query: I need the count of each metric over exact windows of, for example, 1 minute, 1 hour, 1 day, and so on.
I used to work with InfluxDB:
SELECT SUM(value) FROM `TABLE` WHERE `metric_name`=`metric_value` AND time >= now() - 1h GROUP BY time(5m) fill(0)
In fact, I want to get the count of each metric per 5 minutes over the previous hour.
I do not know how to use aggregations for this problem.
ClickHouse has functions for generating Date/DateTime group buckets, such as toStartOfWeek, toStartOfHour, and toStartOfFiveMinute. You can also use the intDiv function to divide value ranges manually. However, the fill feature is still on the roadmap.
For example, you can rewrite the Influx SQL in ClickHouse, without the fill, like this:
SELECT SUM(value) FROM `TABLE` WHERE `metric_name` = `metric_value` AND time >= now() - toIntervalHour(1) GROUP BY toStartOfFiveMinute(time)
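And the intDiv route mentioned above does the same 5-minute bucketing by hand; a sketch, assuming time is a DateTime column:
-- toUInt32(time) gives epoch seconds; intDiv(..., 300) * 300 snaps to 5-minute slots
SELECT intDiv(toUInt32(time), 300) * 300 AS slot, SUM(value)
FROM `TABLE`
WHERE `metric_name` = `metric_value` AND time >= now() - toIntervalHour(1)
GROUP BY slot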
You can also refer to this discussion https://github.com/yandex/ClickHouse/issues/379
Update
There is a timeSlots function that can help generate the empty buckets. Here is a working example:
SELECT
    slot,
    metric_value_sum
FROM
(
    SELECT
        toStartOfFiveMinute(datetime) AS slot,
        SUM(metric_value) AS metric_value_sum
    FROM metrics
    WHERE (metric_name = 'k1') AND (datetime >= (now() - toIntervalHour(1)))
    GROUP BY slot
)
ANY RIGHT JOIN
(
    SELECT arrayJoin(timeSlots(now() - toIntervalHour(1), toUInt32(3600), 300)) AS slot
) USING (slot)
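On newer ClickHouse versions, ORDER BY ... WITH FILL (added around 19.14; treat the availability as an assumption to check against your server) can generate the empty buckets without the join:
SELECT
    toStartOfFiveMinute(datetime) AS slot,
    SUM(metric_value) AS metric_value_sum
FROM metrics
WHERE (metric_name = 'k1') AND (datetime >= (now() - toIntervalHour(1)))
GROUP BY slot
-- fills missing 5-minute (300-second) slots with zero-initialized rows
ORDER BY slot WITH FILL STEP 300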

OBIEE: using the same folder/fact twice, aggregating on both

I know the exact SQL I would need to write to retrieve the results I'm looking for from the Oracle BI tool; however, as I am new to Oracle BI, I am struggling to find a way to reproduce the same results. I realize that the ultimate answer largely depends on the BI data model, and that takes a lot more communication than a question on Stack Overflow will allow, so I'm looking for more generic how-to answers rather than a specific, definitive answer for my scenario.
Perhaps the SQL will help for starters:
select "All"."DT", ("LessThan5Mins"."Count" / "All"."Count") * 100
from
(
select to_char(m."EndDateTime", 'YYYY-MM') "DT", count(*) "Count"
from "Measurement" m,
"DwellTimeMeasurement" dtm
where dtm."MeasurementBase_id" = m."Id"
group by to_char(m."EndDateTime", 'YYYY-MM')
) "All",
(
select to_char(m."EndDateTime", 'YYYY-MM') "DT", count(*) "Count"
from "Measurement" m,
"DwellTimeMeasurement" dtm
where dtm."MeasurementBase_id" = m."Id"
and m."MeasValue" <= 300
group by to_char(m."EndDateTime", 'YYYY-MM')
) "LessThan5Mins"
where "All"."DT" = "LessThan5Mins"."DT";
The purpose of this is to return the percentage of dwell time records that were less than or equal to 5 mins (300 seconds).
I have a fact that represents the "MeasValue" field in the above query.
All attempts I've made to reproduce the dual result set nature of the above query in BI have failed.
Is the above possible in OBIEE and if so, how might I achieve this?
I'm assuming that you have imported the Measurement (M) and DwellTimeMeasurement (DTM) tables into the physical layer of the RPD, specified the join on DTM.MeasurementBase_id = M.Id, and then brought them both through to the presentation layer.
If so, then you could start building this query in Answers on the criteria tab by dragging in M.EndDateTime and any OBIEE measure column from DTM, for example DTM.Amount. Edit the formula for the DTM.Amount column:
Filter the column by clicking the filter button.
In the following dialog box, double-click M.MeasValue, select "is less than or equal to", and type 300 in the Value text box. Click OK twice, and your column formula should now look something like this:
FILTER(DTM.Amount USING (M.MeasValue <= 300))
Now wrap this with COUNT():
COUNT(FILTER(DTM.Amount USING (M.MeasValue <= 300)))
This will give the count of records with M.MeasValue <= 300. You could rename this column to be "LessThan5Mins". Click OK to save the new formula. Now drag in the DTM.Amount column again but this time only perform a COUNT():
COUNT(DTM.Amount)
This will give you the count of all dwell time records. You could rename this to "All". Finally, drag in the DTM.Amount column one last time and edit its formula again. This is where you calculate the percentage, with a formula similar to the following:
COUNT(FILTER(DTM.Amount USING (M.MeasValue <= 300))) / COUNT(DTM.Amount) * 100
So ultimately you will have four columns with the following titles and formulas:
TITLE            FORMULA
-----            -------
EndDateTime      M.EndDateTime
LessThan5Mins    COUNT(FILTER(DTM.Amount USING (M.MeasValue <= 300)))
All              COUNT(DTM.Amount)
% LessThan5Mins  COUNT(FILTER(DTM.Amount USING (M.MeasValue <= 300))) / COUNT(DTM.Amount) * 100
Note that including the EndDateTime column takes care of grouping the records. Also, to match your original query you would only need the EndDateTime and % LessThan5Mins columns (you could hide or exclude the other columns) but I wanted to demonstrate for you the process of filtering column values in OBIEE.
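For reference, the criteria built above should compile to logical SQL roughly like the following. This is a sketch: "MySubjectArea" and the column names are placeholders for whatever your presentation layer actually exposes, not anything from the original post.
SELECT
M.EndDateTime,
COUNT(FILTER(DTM.Amount USING M.MeasValue <= 300)) / COUNT(DTM.Amount) * 100
FROM "MySubjectArea"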

How can I limit the number of results being grouped in my Group By in Oracle?

I've got a table of parameters, values, and the times at which those values were recorded.
I've got a procedure which takes in a time and needs to get the average of each parameter's values in the window -15/+5 seconds around that time. On top of that, I want to make sure that I take no more than 15 records before the passed-in time and no more than 5 records after it.
For example, maybe I'm recording values of some parameters every second. If I passed in the time 21:30:30, I'd want the values between 21:30:15 and 21:30:35. But if I were recording every half second, I'd actually have more records in that time frame than I want, and that's where my need to limit my results comes in.
I've read this question and this article which seem pretty related to what I'm trying to do, but unfortunately I'm dealing with Oracle and not MySQL, so I can't use "limit".
I've currently got something that looks like this:
with std_values as
(
select
V.ParameterId,
V.NumericValue
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime + 5/86400
)
select
ParameterId,
avg(NumericValue) as NumericValue
from
std_values
group by
ParameterId
pValueSource is just something that lets me filter which value types I'm looking at, and pSummaryTime is the input time that I'm basing my time frame on. The goal here is to average the up-to-15 records before pSummaryTime that fall within the window and the up-to-5 records after it. Currently I'm not limiting the number of "before" and "after" results, though, so I'm ending up with the average of everything that falls into the time window. And without something like "limit", I'm not sure how to do this in Oracle.
Sounds like you want a moving window aggregate function. This is part of the Analytical functions feature of Oracle.
It's not my strong suit, and since you didn't include sample tables/data to build a test case, I'll just point you to the Oracle documentation, here:
http://docs.oracle.com/cd/B14117_01/server.101/b10736/analysis.htm#i1006709
You probably want something like:
AVG(NumericValue) over (partition by ParameterId order by Time RANGE BETWEEN 15/86400 PRECEDING AND 5/86400 FOLLOWING)
but, like I said, it's not my strong suit and totally untested; I hope it gets the idea across.
Hope that helps.
Thanks to Mark Bobak's answer getting me on the right track, I ended up with this solution.
with
values_before as
(
select
V.ParameterId,
V.NumericValue,
-- descending: row 1 is the record closest *before* pSummaryTime
row_number() over (partition by V.ParameterId order by V.Time desc) as RowNumber
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime
),
values_after as
(
select
V.ParameterId,
V.NumericValue,
-- ascending: row 1 is the record closest *after* pSummaryTime
row_number() over (partition by V.ParameterId order by V.Time asc) as RowNumber
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time <= pSummaryTime + 5/86400
and V.Time > pSummaryTime
),
values_all as
(
select * from values_before where RowNumber <= 15
union all
select * from values_after where RowNumber <= 5
)
select ParameterId, avg(NumericValue) from values_all group by ParameterId
No doubt there's a better way to do this, but it at least seems to give the correct result. The key was using an analytic function to assign a row number and ordering for the 15 before and the 5 after, and then filtering my results down to just those.
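For what it's worth, here is a hedged single-scan variant of the same idea that numbers the "before" and "after" sides in one pass over the window (same ValuesTable schema and parameters as above; an untested sketch, not a drop-in replacement):
with ranked as
(
select
V.ParameterId,
V.NumericValue,
case when V.Time <= pSummaryTime then 'B' else 'A' end as Side,
-- number each side by distance from pSummaryTime, nearest first
row_number() over (
partition by V.ParameterId, case when V.Time <= pSummaryTime then 'B' else 'A' end
order by abs(V.Time - pSummaryTime)
) as rn
from ValuesTable V
where V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime + 5/86400
)
select ParameterId, avg(NumericValue) as NumericValue
from ranked
where (Side = 'B' and rn <= 15) or (Side = 'A' and rn <= 5)
group by ParameterId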
