Google Sheets formula for calculating time duration between two groups of time stamps - google-sheets-formula

I am looking for a formula that, for each day, splits the time stamps into separate groups whenever there is a gap of more than one hour between them. If the gap is less than one hour, the time stamps should stay in the same group.
For example, there are a total of 3 hrs and 10 min of time stamps on Thursday: 33 min (Blue) + 2 hrs 36 min (Gray) = 3 hrs and 10 min total duration.
In the table above I would like the Start Time and End Time (which already have the MIN and MAX calculation) and the total amount of time stamps for each group. I will have 60,000 records that I need this formula for.

Maybe not the best solution, but it gets you where you want.
(I wonder if someone could do the whole task in one formula; keen to learn how.)
DIY approach: create 2 tables as helpers:
Spreadsheet demo: HERE
Table 1 (for mornings timings):
=query(arrayformula({A2:A,C2:C,timevalue(C2:C)}),"Select dayofweek(Col1),count(Col1),min(Col2),max(Col2),max(Col3)-min(Col3)
where Col1 is not null and Col2 < timeofday '12:00:00' group by dayofweek(Col1)
label dayofweek(Col1) 'Mornings',count(Col1) 'Record',min(Col2) 'Start Time',max(Col2) 'End Time',max(Col3)-min(Col3) 'Total Duration'")
Output Table 1:
Table 2 (for evenings timings):
=query(arrayformula({A2:A,C2:C,timevalue(C2:C)}),"Select dayofweek(Col1),count(Col1),min(Col2),max(Col2),max(Col3)-min(Col3)
where Col1 is not null and Col2 > timeofday '12:00:00' group by dayofweek(Col1)
label dayofweek(Col1) 'Evenings',count(Col1) 'Record',min(Col2) 'Start Time',max(Col2) 'End Time',max(Col3)-min(Col3) 'Total Duration'")
Output Table 2:
Then on your initial table you combine the results of both tables using vlookup:
Records:
=arrayformula(iferror(VLOOKUP(F4:F8,F14:G20,2,0)+VLOOKUP(F4:F8,F25:G31,2,0)))
Start Time:
=arrayformula(iferror(VLOOKUP(F4:F8,F14:H20,3,0)))
End Time:
=arrayformula(iferror(VLOOKUP(F4:F8,F25:I31,4,0)))
Total Duration:
=arrayformula(iferror(VLOOKUP(F4:F8,F14:J20,5,0)+VLOOKUP(F4:F8,F25:J31,5,0)))
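For readers who want to check the spreadsheet results outside Sheets, here is a minimal Python sketch of the rule from the question: within a day, a gap of more than one hour starts a new group, and the day's total is the sum of each group's span. It does not rely on a fixed noon cut-off like the two helper tables do. The function name daily_durations and the sample time stamps are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_durations(stamps, max_gap=timedelta(hours=1)):
    """Return {date: (first, last, count, total_duration)} for a list of datetimes.

    Within a day, consecutive stamps more than max_gap apart start a new group;
    the total is the sum of each group's (last - first) span.
    """
    by_day = defaultdict(list)
    for ts in stamps:
        by_day[ts.date()].append(ts)

    result = {}
    for day, day_stamps in by_day.items():
        day_stamps.sort()
        total = timedelta(0)
        group_start = prev = day_stamps[0]
        for ts in day_stamps[1:]:
            if ts - prev > max_gap:
                total += prev - group_start  # close the current group
                group_start = ts
            prev = ts
        total += prev - group_start  # close the final group
        result[day] = (day_stamps[0], day_stamps[-1], len(day_stamps), total)
    return result


# Hypothetical sample: a 33-minute morning group and a 2 h 36 min evening group.
sample = [
    datetime(2021, 3, 4, 8, 0), datetime(2021, 3, 4, 8, 33),
    datetime(2021, 3, 4, 17, 0), datetime(2021, 3, 4, 17, 50),
    datetime(2021, 3, 4, 18, 40), datetime(2021, 3, 4, 19, 36),
]
start, end, count, total = daily_durations(sample)[datetime(2021, 3, 4).date()]
print(start.time(), end.time(), count, total)
```

Because the split is driven by the gap between consecutive stamps, a day with three or more groups is handled the same way as a day with two.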

Related

Google sheet formula to return sum based on multiple criteria using input cells

I have rows of dates with tasks measured in hours. These tasks are assigned to different team leaders whose names are also included on each row. I would like to create a multiple-criteria Google Sheets formula that returns the sum of hours based on the date range and the name of the team leader.
These are the data input cells I would be entering to produce the sum:
Date Start:
Date End:
Team Leader Name:
Ideally if the Team Leader name was not entered, the formula would sum the hours for all of the rows selected by the date range.
Here are some sample rows:
Job Date    Hours  Team Leader
03/25/2022  8      John
04/22/2022  7      Hannah
04/23/2022  6      John
05/01/2022  6      Hannah
Thank you in advance for your help with this!
Assuming you have the following
A:C - Job Date, Hours, Team Leader
D1 - Start Date
D2 - End Date
D3 - Team Leader
=QUERY(
{A1:C},
"select Sum(Col2)
where
Col1 >= date '"&TEXT(IF(ISBLANK(D1),DATE(1970,1,1),D1),"yyyy-mm-dd")&"' and
Col1 <= date '"&TEXT(IF(ISBLANK(D2),DATE(3000,1,1),D2),"yyyy-mm-dd")&"' and
Col3 matches '"&IF(ISBLANK(D3),".*",D3)&"'
label Sum(Col2) ''")
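If the start or end date is blank, the formula falls back to an extreme date. For Col3 it uses a regular expression (matches): if D3 is blank, it falls back to the wildcard .* and returns the sum over all team leaders.
You can also use the SUMIFS() function, here with the start date in E2, the end date in F2 and the team leader name in G2: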
=SUMIFS(B2:B,A2:A,">="&E2,A2:A,"<="&F2,C2:C,IF(G2="","*",G2))
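To make the "blank criterion matches everything" logic explicit, here is a small Python sketch over the sample rows from the question. The function name sum_hours and its parameters are illustrative only and not part of either formula.

```python
from datetime import date

# Sample rows from the question: (job date, hours, team leader).
ROWS = [
    (date(2022, 3, 25), 8, "John"),
    (date(2022, 4, 22), 7, "Hannah"),
    (date(2022, 4, 23), 6, "John"),
    (date(2022, 5, 1), 6, "Hannah"),
]


def sum_hours(rows, start=None, end=None, leader=None):
    """Sum hours within [start, end]; a missing criterion matches every row."""
    return sum(
        hours
        for job_date, hours, name in rows
        if (start is None or job_date >= start)
        and (end is None or job_date <= end)
        and (not leader or name == leader)
    )


# April only, all team leaders, then April only for John.
print(sum_hours(ROWS, date(2022, 4, 1), date(2022, 4, 30)))
print(sum_hours(ROWS, date(2022, 4, 1), date(2022, 4, 30), "John"))
```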

Oracle - determine and return the specific hour of data with the highest sum of the values

I think I can do this in a more roundabout way using arrays, scripting, etc., but is it possible to sum up (aggregate) all the values for each "hour" of data in a database for a given field? Basically, I am trying to determine which hour in a day's worth of data had the highest sum, preferably without having to loop through 24 times for each day I want to look at. For example, let's say I have a table called "table" that contains columns for times and values as follows:
Time Value
00:00 1
00:15 1
00:30 2
00:45 2
01:00 1
01:15 1
01:30 1
01:45 1
If I summed up by hand, I would get the following
Sum for 00 Hour = 6
Sum for 01 Hour = 4
So, in this example 00 Hour would be my "largest sum" hour. I'd like to end up returning simply which hour had the highest sum, and what that value was...the other hours don't matter in this case.
Can this be done all in a single ORACLE query, or does it need to be done outside the query with some scripting and working with the times and values separately? If not a single, maybe even just grab the sum for each hour, and I can run multiple queries - one for each hour? Then push each hour to an array, and just use the max of that array? I know there is a SUM() function in oracle, but how to tell it to "sum all the hours and just return the hour with the highest sum" escapes me. Hope all this makes sense. lol
Thanks for any advice to make this easier. :-)
The following query should do what you are looking for:
SELECT SUBSTR(time, 1, 2) AS HOUR,
SUM(amount) AS TOTAL_AMOUNT
FROM test_data
GROUP BY SUBSTR(time, 1, 2)
ORDER BY TOTAL_AMOUNT DESC
FETCH FIRST ROW WITH TIES;
The query uses the SUM function, grouping by the hour part of your time column. It then orders the results by the summed amount in descending order and returns only the top row; WITH TIES also returns any other hours that share the same maximum.
Here is a DBFiddle showing the query in use (LINK)
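As a cross-check of what the query returns, the same group-sum-and-keep-the-maximum logic can be sketched in Python on the sample data from the question. The name busiest_hours is made up; returning every tied hour mirrors the WITH TIES clause.

```python
from collections import defaultdict

# Sample data from the question: (time, value).
ROWS = [
    ("00:00", 1), ("00:15", 1), ("00:30", 2), ("00:45", 2),
    ("01:00", 1), ("01:15", 1), ("01:30", 1), ("01:45", 1),
]


def busiest_hours(rows):
    """Return [(hour, total)] for the hour(s) with the highest summed value."""
    totals = defaultdict(int)
    for time_str, value in rows:
        totals[time_str[:2]] += value  # same idea as SUBSTR(time, 1, 2)
    if not totals:
        return []
    top = max(totals.values())
    return [(hour, total) for hour, total in sorted(totals.items()) if total == top]


print(busiest_hours(ROWS))  # [('00', 6)]
```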

How to iterate over a hive table row by row and calculate metric when a specific condition is met?

I have a requirement as below:
I am trying to convert a MS Access table macro loop to work for a hive table. The table called trip_details contains details about a specific trip taken by a truck. The truck can stop at multiple locations and the type of stop is indicated by a flag called type_of_trip. This column contains values like arrival, departure, loading etc.
The ultimate aim is to calculate the dwell time of each truck (how much time does the truck take before beginning for another trip). To calculate this we have to iterate the table row by row and check for trip type.
A typical example looks like this:
Do while end of file:
    Store the first row in a variable.
    Move to the second row.
    If the type_of_trip = Arrival:
        Move to the third row
        If the type_of_trip = End Trip:
            Store the third row
            Take the difference of timestamps to calculate dwell time
            Append the row into the output table
End
What is the best approach to tackle this problem in hive?
I tried checking if hive contains a keyword for loop but could not find one. I was thinking of doing this using a shell script but need guidance on how to approach this.
I cannot disclose the entire data but feel free to shoot any questions in the comments section.
Input
Trip ID type_of_trip timestamp location
1 Departure 28/5/2019 15:00 Warehouse
1 Arrival 28/5/2019 16:00 Store
1 Live Unload 28/5/2019 16:30 Store
1 End Trip 28/5/2019 17:00 Store
Expected Output
Trip ID Origin_location Destination_location Dwell_time
1 Warehouse Store 2 hours
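You do not need a loop for this; use the power of a SQL query.
Convert your timestamps to seconds (using the format you specified, 'dd/MM/yyyy HH:mm'), calculate the min and max per trip_id, taking the type into account, subtract the seconds, and convert the difference to 'HH:mm' or any other format you prefer. Formatting the difference with from_unixtime treats it as a point in time, so it is only a valid duration for differences under 24 hours and when no time zone offset is applied: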
with trip_details as (--use your table instead of this subquery
select stack (4,
1,'Departure' ,'28/5/2019 15:00','Warehouse',
1,'Arrival' ,'28/5/2019 16:00','Store',
1,'Live Unload' ,'28/5/2019 16:30','Store',
1,'End Trip' ,'28/5/2019 17:00','Store'
) as (trip_id, type_of_trip, `timestamp`, location)
)
select trip_id, origin_location, destination_location,
from_unixtime(destination_time-origin_time,'HH:mm') dwell_time
from
(
select trip_id,
min(case when type_of_trip='Departure' then unix_timestamp(`timestamp`,'dd/MM/yyyy HH:mm') end) origin_time,
max(case when type_of_trip='End Trip' then unix_timestamp(`timestamp`,'dd/MM/yyyy HH:mm') end) destination_time,
max(case when type_of_trip='Departure' then location end) origin_location,
max(case when type_of_trip='End Trip' then location end) destination_location
from trip_details
group by trip_id
)s;
Result:
trip_id origin_location destination_location dwell_time
1 Warehouse Store 02:00
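For comparison with the Access-style loop in the question, here is a minimal Python sketch of the same per-trip aggregation on the sample input: earliest Departure, latest End Trip, and the difference between them. The function name dwell_times is hypothetical, and the sketch takes each location from the same row as the chosen time stamp, which is a slightly stricter pairing than the independent max() calls in the query.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M"  # matches the 'dd/MM/yyyy HH:mm' format used in the query

# Sample input from the question: (trip_id, type_of_trip, timestamp, location).
ROWS = [
    (1, "Departure", "28/5/2019 15:00", "Warehouse"),
    (1, "Arrival", "28/5/2019 16:00", "Store"),
    (1, "Live Unload", "28/5/2019 16:30", "Store"),
    (1, "End Trip", "28/5/2019 17:00", "Store"),
]


def dwell_times(rows):
    """Return [(trip_id, origin, destination, dwell)] with dwell as a timedelta."""
    trips = {}
    for trip_id, kind, stamp, location in rows:
        when = datetime.strptime(stamp, FMT)
        trip = trips.setdefault(trip_id, {})
        if kind == "Departure":
            # Keep the earliest departure per trip.
            if "origin_time" not in trip or when < trip["origin_time"]:
                trip["origin_time"], trip["origin"] = when, location
        elif kind == "End Trip":
            # Keep the latest end-of-trip per trip.
            if "end_time" not in trip or when > trip["end_time"]:
                trip["end_time"], trip["destination"] = when, location
    return [
        (trip_id, t["origin"], t["destination"], t["end_time"] - t["origin_time"])
        for trip_id, t in sorted(trips.items())
        if "origin_time" in t and "end_time" in t  # skip incomplete trips
    ]


for row in dwell_times(ROWS):
    print(row)
```

Keeping the dwell time as a timedelta avoids the formatting limits of a clock-style 'HH:mm' string for trips longer than a day.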

ClickHouse - how to get counts per 1 minute or 1 day

I have a table in ClickHouse for keeping statistics and metrics.
Its structure is:
datetime|metric_name|metric_value
I want to keep statistics and limit the number of accesses in 1 minute, 1 hour, 1 day and so on. So I need event counts in the last minute, hour or day for every metric_name, and I want to prepare statistics for a chart.
I do not know how to write a query that returns the counts for an exact interval, for example 1 minute, 1 hour, 1 day and so on.
I used to work with InfluxDB:
SELECT SUM(value) FROM `TABLE` WHERE `metric_name`=`metric_value` AND time >= now() - 1h GROUP BY time(5m) fill(0)
In fact, I want to get the number of each metric per 5 minutes over the previous hour.
I do not know how to use aggregations for this problem.
ClickHouse has functions for generating Date/DateTime group buckets such as toStartOfWeek, toStartOfHour, toStartOfFiveMinute. You can also use intDiv function to manually divide value ranges. However the fill feature is still in the roadmap.
For example, you can rewrite the InfluxDB query, without the fill, in ClickHouse like this:
SELECT SUM(value) FROM `TABLE` WHERE `metric_name`=`metric_value` AND
time >= now() - toIntervalHour(1) GROUP BY toStartOfFiveMinute(time)
You can also refer to this discussion https://github.com/yandex/ClickHouse/issues/379
Update:
There is a timeSlots function that can help generate empty buckets. Here is a working example:
SELECT
slot,
metric_value_sum
FROM
(
SELECT
toStartOfFiveMinute(datetime) AS slot,
SUM(metric_value) AS metric_value_sum
FROM metrics
WHERE (metric_name = 'k1') AND (datetime >= (now() - toIntervalHour(1)))
GROUP BY slot
)
ANY RIGHT JOIN
(
SELECT arrayJoin(timeSlots(now() - toIntervalHour(1), toUInt32(3600), 300)) AS slot
) USING (slot)
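The idea behind the join above (generate every 5-minute slot first, then attach the sums so that empty slots still show up) can be sketched in Python as follows. The function name bucket_sums and the sample events are made up for illustration, and the sketch assumes naive datetimes.

```python
from datetime import datetime, timedelta


def bucket_sums(events, now, window=timedelta(hours=1), step=timedelta(minutes=5)):
    """Sum (datetime, value) events per step-sized slot over the last window.

    Slots with no events are returned with a sum of 0.
    """
    def floor(ts):
        # Round down to the start of the slot, like toStartOfFiveMinute.
        return ts - (ts - datetime.min) % step

    start = now - window
    sums = {}
    slot = floor(start)
    while slot <= now:  # pre-create every slot so empty ones are kept
        sums[slot] = 0
        slot += step
    for ts, value in events:
        if start <= ts <= now:
            sums[floor(ts)] += value
    return sorted(sums.items())


# Hypothetical events for a single metric.
now = datetime(2024, 1, 1, 12, 2)
events = [
    (datetime(2024, 1, 1, 11, 7), 2),
    (datetime(2024, 1, 1, 11, 9), 3),
    (datetime(2024, 1, 1, 11, 58), 1),
    (datetime(2024, 1, 1, 10, 0), 5),  # older than one hour, ignored
]
for slot, total in bucket_sums(events, now):
    print(slot, total)
```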

PL/SQL weekly Aggregation Logic with dynamic time range

I need to aggregate the values at weekly intervals. My date range is dynamic, meaning I can give any start date and end date. Every Sunday should be the starting week of every month. Say I have two columns and my start and end dates are 07/11/2016 and 13/11/2016:
column A column B
07/11/2016 23
08/11/2016 20
09/11/2016 10
10/11/2016 05
11/11/2016 10
12/11/2016 20
13/11/2016 10
My result should be the average of column B:
Column A Column B
13/11/2016 14.00
This means I should take the past values and aggregate them to the Sunday of that week. Also, if my start and end dates are, say, 07/11/2016 to 10/11/2016, then I should not aggregate the values, because the week is not complete. I am able to aggregate the values, but I am not able to restrict the aggregation when the week is not complete.
Is there any way to do this in PL/SQL?
Thank you in advance.
select to_char(columnA, 'iw') as weeknumber, avg(columnB)
from table
group by to_char(columnA, 'iw');
This aggregates by ISO week number. If you need to show the last day of the week as a label, you can get it as max(columnA) over (partition by to_char(columnA, 'iw')).
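The query above does not yet skip incomplete weeks. A minimal Python sketch of that extra rule, using the sample data from the question, could look like this: group by ISO year and week (Monday to Sunday), and only report a week when all seven days are present. The function name weekly_averages is made up. Including the ISO year in the key is an addition of this sketch so that the same week number from different years is not merged.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question: (column A, column B).
ROWS = [
    (date(2016, 11, 7), 23), (date(2016, 11, 8), 20), (date(2016, 11, 9), 10),
    (date(2016, 11, 10), 5), (date(2016, 11, 11), 10), (date(2016, 11, 12), 20),
    (date(2016, 11, 13), 10),
]


def weekly_averages(rows):
    """Return [(sunday, average)] for complete Monday-to-Sunday weeks only."""
    weeks = defaultdict(dict)
    for day, value in rows:
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)][day] = value

    result = []
    for key in sorted(weeks):
        days = weeks[key]
        if len(days) == 7:  # skip weeks that are not fully covered
            sunday = max(days)  # last day of a complete ISO week
            result.append((sunday, round(sum(days.values()) / 7, 2)))
    return result


print(weekly_averages(ROWS))      # full week 07/11/2016 to 13/11/2016
print(weekly_averages(ROWS[:4]))  # 07/11/2016 to 10/11/2016, incomplete
```

In SQL the same restriction would typically be a HAVING clause on the number of distinct days per week, assuming one row per day as in the sample.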
