Oracle - determine and return the specific hour of data with the highest sum of the values

I think I can do this in a more roundabout way using arrays, scripting, etc... BUT is it possible to sum up (aggregate) all the values for each "hour" of data in a database for a given field? Basically, I am trying to determine which hour in a day's worth of data had the highest sum... preferably without having to loop through 24 times for each day I want to look at. For example, let's say I have a table called "table" that contains columns for times and values as follows:
Time Value
00:00 1
00:15 1
00:30 2
00:45 2
01:00 1
01:15 1
01:30 1
01:45 1
If I summed up by hand, I would get the following
Sum for 00 Hour = 6
Sum for 01 Hour = 4
So, in this example 00 Hour would be my "largest sum" hour. I'd like to end up returning simply which hour had the highest sum, and what that value was...the other hours don't matter in this case.
Can this be done all in a single Oracle query, or does it need to be done outside the query with some scripting, working with the times and values separately? If not in a single query, maybe I could just grab the sum for each hour and run multiple queries - one per hour - then push each hour into an array and take the max of that array? I know there is a SUM() function in Oracle, but how to tell it to "sum all the hours and just return the hour with the highest sum" escapes me. Hope all this makes sense. lol
Thanks for any advice to make this easier. :-)

The following query should do what you are looking for:
SELECT SUBSTR(time, 1, 2) AS HOUR,
       SUM(amount)        AS TOTAL_AMOUNT
FROM   test_data
GROUP  BY SUBSTR(time, 1, 2)
ORDER  BY TOTAL_AMOUNT DESC
FETCH  FIRST ROW WITH TIES;
The query uses the SUM function, grouping by the hour portion of your time column. It then orders the results by the summed amounts in descending order and fetches only the first row (plus any ties), i.e. the hour with the highest sum.
Here is a DBFiddle showing the query in use (LINK)
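If your time column is a real DATE or TIMESTAMP rather than a character string, the same idea works by grouping on the hour truncation instead of SUBSTR. A minimal sketch, assuming a hypothetical table hourly_data with columns event_time (DATE) and amount:
-- Group on the start of each hour and keep only the top-ranked hour(s)
SELECT TRUNC(event_time, 'HH24') AS hour_start,
       SUM(amount)               AS total_amount
FROM   hourly_data
GROUP  BY TRUNC(event_time, 'HH24')
ORDER  BY total_amount DESC
FETCH  FIRST ROW WITH TIES;
This version also keeps the same hour on different days separate, since TRUNC preserves the date part.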

Related

Google Sheets formula for calculating time duration between two groups of timestamps

Looking for a formula that will calculate groups of timestamps per day, splitting them into separate groups when there is more than an hour between them. If the gap is less than one hour, leave them as one group.
For example, there is a total of 3 hrs and 10 min of timestamps on Thursday: 33 min (Blue) + 2 hrs 36 min (Gray) = 3 hrs and 10 min total duration.
In the table above I would like the Start Time, End Time (which already have the MIN and MAX calculation), and the total count of timestamps for each group. I will have 60,000 records that I need this formula for.
Maybe not the best solution, but it gets you where you want. (I wonder if someone could do the whole task in one formula... keen to learn how.)
DIY approach: create 2 tables as helpers:
Spreadsheet demo: HERE
Table 1 (for morning timings):
=query(arrayformula({A2:A,C2:C,timevalue(C2:C)}),"Select dayofweek(Col1),count(Col1),min(Col2),max(Col2),max(Col3)-min(Col3)
where Col1 is not null and Col2 < timeofday '12:00:00' group by dayofweek(Col1)
label dayofweek(Col1) 'Mornings',count(Col1) 'Record',min(Col2) 'Start Time',max(Col2) 'End Time',max(Col3)-min(Col3) 'Total Duration'")
Output Table 1:
Table 2 (for evening timings):
=query(arrayformula({A2:A,C2:C,timevalue(C2:C)}),"Select dayofweek(Col1),count(Col1),min(Col2),max(Col2),max(Col3)-min(Col3)
where Col1 is not null and Col2 > timeofday '12:00:00' group by dayofweek(Col1)
label dayofweek(Col1) 'Evenings',count(Col1) 'Record',min(Col2) 'Start Time',max(Col2) 'End Time',max(Col3)-min(Col3) 'Total Duration'")
Output Table 2:
Then on your initial table you combine the results of both tables using vlookup:
Records:
=arrayformula(iferror(VLOOKUP(F4:F8,F14:G20,2,0)+VLOOKUP(F4:F8,F25:G31,2,0)))
Start Time:
=arrayformula(iferror(VLOOKUP(F4:F8,F14:H20,3,0)))
End Time:
=arrayformula(iferror(VLOOKUP(F4:F8,F25:I31,4,0)))
Total Duration:
=arrayformula(iferror(VLOOKUP(F4:F8,F14:J20,5,0)+VLOOKUP(F4:F8,F25:J31,5,0)))

ClickHouse - how to get a count per 1 minute or 1 day

I have a table in ClickHouse for keeping statistics and metrics. Its structure is:
datetime | metric_name | metric_value
I want to keep statistics and limit the number of accesses per 1 minute, 1 hour, 1 day and so on. So I need event counts over the last minute, hour or day for every metric_name, and I want to present these statistics in a chart.
I do not know how to write such a query: one that gets the count of metrics grouped into exact intervals of, for example, 1 minute, 1 hour, 1 day and so on.
I used to work with InfluxDB:
SELECT SUM(value) FROM `TABLE` WHERE `metric_name`=`metric_value` AND time >= now() - 1h GROUP BY time(5m) fill(0)
In fact, I want to get the number of each metric per 5 minutes in the previous 1 hour.
I do not know how to use aggregations for this problem.
ClickHouse has functions for generating Date/DateTime group buckets, such as toStartOfWeek, toStartOfHour, and toStartOfFiveMinute. You can also use the intDiv function to manually divide value ranges. However, the fill feature is still on the roadmap.
For example, you can rewrite the Influx SQL without the fill in ClickHouse like this:
SELECT SUM(value) FROM `TABLE` WHERE `metric_name`=`metric_value` AND
time >= now() - toIntervalHour(1) GROUP BY toStartOfFiveMinute(time)
You can also refer to this discussion https://github.com/yandex/ClickHouse/issues/379
update
There is a timeSlots function that can help generate empty buckets. Here is a working example:
SELECT
    slot,
    metric_value_sum
FROM
(
    SELECT
        toStartOfFiveMinute(datetime) AS slot,
        SUM(metric_value) AS metric_value_sum
    FROM metrics
    WHERE (metric_name = 'k1') AND (datetime >= (now() - toIntervalHour(1)))
    GROUP BY slot
)
ANY RIGHT JOIN
(
    SELECT arrayJoin(timeSlots(now() - toIntervalHour(1), toUInt32(3600), 300)) AS slot
) USING (slot)
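On newer ClickHouse versions, the ORDER BY ... WITH FILL clause can generate the empty buckets directly instead of joining against timeSlots. A rough sketch under that assumption, reusing the same metrics table as above:
-- Assumes a ClickHouse version that supports ORDER BY ... WITH FILL (numeric STEP is in seconds)
SELECT
    toStartOfFiveMinute(datetime) AS slot,
    SUM(metric_value) AS metric_value_sum
FROM metrics
WHERE (metric_name = 'k1') AND (datetime >= (now() - toIntervalHour(1)))
GROUP BY slot
ORDER BY slot ASC WITH FILL STEP 300
Missing five-minute slots are emitted with a zero sum, which roughly matches the fill(0) behaviour of the original InfluxDB query.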

In Sqoop, what does "size" mean when used with the --split-limit argument?

From the Sqoop docs:
Using the --split-limit parameter places a limit on the size of the split section created. If the size of the split created is larger than the size specified in this parameter, then the splits would be resized to fit within this limit, and the number of splits will change according to that.
What does "size" refer to here. Can some one explain with a little example.
I was just reading this and I think it would be interpreted like this.
Say an example table has a primary key column called ID that is an INT, and the table has 1000 rows with ID values from 1 to 1000. If you set num-mappers to 50, then you would have 50 tasks, each trying to import 20 rows. The first query would have a predicate that says WHERE ID >= 1 AND ID <= 20, the 2nd mapper would say WHERE ID >= 21 AND ID <= 40... and so on.
If you also define the split-limit then depending on the size of the splits this parameter may adjust the number of tasks used to sqoop the data.
For example, with num-mappers set to 50 and split-limit set to 10, you would now need 100 tasks to import 10 rows of data each to get all 1000 rows. Your first task would now say WHERE ID >= 1 AND ID <= 10.
In the case of a DateTime column, the value is now based on seconds. So if you have 10 years of data with 1 row for every day, you would have about 3,653 rows of data. If you set num-mappers to 10, then your tasks would each try to sqoop about 365 days of data with a predicate that looked something like MYDATETIMECOL >= '2010-01-01' AND MYDATETIMECOL <= '2010-12-31', but if you also set the split-limit to something like 2592000 (the number of seconds in 30 days), then you would need about 122 tasks to sqoop the data, and the first task would have a predicate like MYDATETIMECOL >= '2010-01-01' AND MYDATETIMECOL <= '2010-01-30'.
These two examples have both used a 1:1 ratio for column value to row count. If either of these tables had 1000 rows per value in the split-by col then ALL of those rows would be sqooped as well.
Take the DateTime example again, but where every day you have loaded 1000 rows for the last 10 years, so you now have 3,653,000 rows: the predicates and the number of tasks would be the same, but the number of rows sqooped in each of those tasks would be 1000x more.
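For reference, a hedged sketch of how these parameters could be passed to an import; the connection string, table name, and target directory below are hypothetical placeholders:
# Hypothetical import: ID split column, 50 mappers requested, capped at 10 rows per split
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --table ORDERS \
  --split-by ID \
  --num-mappers 50 \
  --split-limit 10 \
  --target-dir /data/orders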

PL/SQL weekly Aggregation Logic with dynamic time range

I need to aggregate the values at a weekly interval. My date range is dynamic, meaning I can give any start date and end date. Every Sunday should be the start of the week for every month. Say I have two columns and my start and end dates are 07/11/2016 to 13/11/2016:
column A column B
07/11/2016 23
08/11/2016 20
09/11/2016 10
10/11/2016 05
11/11/2016 10
12/11/2016 20
13/11/2016 10
My result should come out like this, taking the average of column B:
Column A Column B
13/11/2016 14.00
It means I should take the past values and aggregate them up to the Sunday of that week. Also, if my start and end dates are, say, 07/11/2016 to 10/11/2016, then I should not aggregate the values, as my week is not complete. I am able to aggregate the values, but if my week is not complete I am not able to restrict the aggregation.
Is there any way to do this in PL/SQL??
Thank you in advance.
select to_char(columnA, 'iw') as weeknumber, avg(columnB)
from table
group by to_char(columnA, 'iw');
This will aggregate by week number. If you need to show the last day of the week as a label, you can get it as max(columnA) over (partition by to_char(columnA, 'iw')).
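That query does not yet enforce the "only complete weeks" requirement. A hedged sketch of one way to add it, assuming one row per day in a table named your_table with columns columnA (DATE) and columnB (NUMBER): keep only ISO weeks that contain all 7 days.
-- Label each week by its last day (the Sunday) and drop incomplete weeks
select max(columnA) as week_ending,
       avg(columnB) as avg_value
from   your_table
group  by to_char(columnA, 'iyyy-iw')
having count(distinct trunc(columnA)) = 7;
With the sample data above this returns 13/11/2016 with an average of 14.00, and returns nothing if the range stops at 10/11/2016.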

Oracle Moving Average without Current Row

I would like to calculate a moving average in Oracle over the last 30 records, excluding the current row.
select crd_nb, flng_month, curr_0380,
avg(curr_0380) over (partition by crd_nb order by flng_month ROWS 30 PRECEDING) mavg
from us_ordered
The above SQL calculates the moving average over 30 records including the current row.
Is there any way to calculate the moving average excluding the current row?
select
    crd_nb,
    flng_month,
    curr_0380,
    avg(curr_0380) over (
        partition by crd_nb
        order by flng_month
        rows between 30 preceding and 1 preceding
    ) mavg
from us_ordered
#be_here_now's answer is clearly superior. I'm leaving mine in place nonetheless, as it's still functional, if needlessly complex.
An alternative would be to get the sum and count individually, subtract out the current row, then divide the two results. It's a little ugly, but it should work:
SELECT crd_nb,
       flng_month,
       curr_0380,
       (  SUM (curr_0380)
             OVER (PARTITION BY crd_nb
                   ORDER BY flng_month ROWS 30 PRECEDING)
        - curr_0380)
       / (  COUNT (curr_0380)
               OVER (PARTITION BY crd_nb
                     ORDER BY flng_month ROWS 30 PRECEDING)
        - 1)
          mavg
FROM us_ordered
If curr_0380 can be null, you'd have to tweak this slightly so that the current row is removed only if it's not null.
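A hedged sketch of that tweak: subtract the current value and decrement the count only when curr_0380 is not null, and guard the division against an empty window.
SELECT crd_nb,
       flng_month,
       curr_0380,
       -- running sum over the current row plus 30 preceding rows, minus the current value if present
       (  SUM (curr_0380)
             OVER (PARTITION BY crd_nb
                   ORDER BY flng_month ROWS 30 PRECEDING)
        - NVL (curr_0380, 0))
       -- non-null count over the same window, minus 1 only when the current row contributed
       / NULLIF (  COUNT (curr_0380)
                      OVER (PARTITION BY crd_nb
                            ORDER BY flng_month ROWS 30 PRECEDING)
                 - CASE WHEN curr_0380 IS NOT NULL THEN 1 ELSE 0 END, 0)
          mavg
FROM us_ordered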
