I have a service that can be started or stopped. Each operation generates a record with a timestamp and an operation type, so ultimately I end up with a series of timestamped operation records. Now I want to calculate the up-time of the service during a day. The idea is simple: for each pair of start/stop records, compute the timespan and sum up. But I don't know how to implement it in Hive, if that's possible at all. It's OK if I need to create tables to store intermediate results. This is the main blocking issue, and there are some minor issues as well. For example, some start/stop pairs may span a day boundary. Any idea how to deal with this minor issue would be appreciated too.
Sample Data:
Timestamp Operation
... ...
2017-09-03 23:59:00 Start
2017-09-04 00:01:00 Stop
2017-09-04 06:50:00 Start
2017-09-04 07:00:00 Stop
2017-09-05 08:00:00 Start
... ...
The service up-time for 2017-09-04 should then be 1 + 10 = 11 mins. Note that the first time interval spans across 09-03 and 09-04, and only the part that falls within 09-04 is counted.
select  to_date(from_ts) as dt
       -- sum the clipped interval lengths per day, in minutes
       ,sum(to_unix_timestamp(to_ts) - to_unix_timestamp(from_ts)) / 60 as up_time_minutes
from   (-- clip each piece of the interval to the boundaries of day i
        select  case when pe.i = 0                       then from_ts
                     else cast(date_add(to_date(from_ts),i)   as timestamp) end as from_ts
               ,case when pe.i = datediff(to_ts,from_ts) then to_ts
                     else cast(date_add(to_date(from_ts),i+1) as timestamp) end as to_ts
        from   (-- pair each record with the one that follows it
                select  `operation`
                       ,`Timestamp` as from_ts
                       ,lead(`Timestamp`) over (order by `Timestamp`) as to_ts
                from    t
               ) t
               -- emit one row per calendar day the interval touches:
               -- datediff days => datediff+1 rows, with i = 0..datediff
               lateral view posexplode(split(space(datediff(to_ts,from_ts)),' ')) pe as i,x
        where   `operation` = 'Start'
          and   to_ts is not null
       ) t
group by to_date(from_ts)
;
+------------+-----------------+
| dt | up_time_minutes |
+------------+-----------------+
| 2017-09-03 | 1.0 |
| 2017-09-04 | 11.0 |
+------------+-----------------+
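For readers puzzled by the posexplode trick: it just splits each Start/Stop interval into one piece per calendar day it touches and clips the first and last pieces at midnight. A minimal sketch of that same splitting logic in Python (illustrative only, not part of the Hive solution):

from collections import defaultdict
from datetime import datetime, timedelta

intervals = [  # (start, stop) pairs from the sample data
    (datetime(2017, 9, 3, 23, 59), datetime(2017, 9, 4, 0, 1)),
    (datetime(2017, 9, 4, 6, 50), datetime(2017, 9, 4, 7, 0)),
]

up_minutes = defaultdict(float)
for start, stop in intervals:
    cur = start
    while cur < stop:
        # clip the current piece at the next midnight
        next_midnight = datetime.combine(cur.date() + timedelta(days=1),
                                         datetime.min.time())
        piece_end = min(stop, next_midnight)
        up_minutes[cur.date()] += (piece_end - cur).total_seconds() / 60
        cur = piece_end

print(dict(up_minutes))  # 2017-09-03: 1.0, 2017-09-04: 11.0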
Related
I have a SQL Server table Energy_CC with two columns: time [int] (epoch time) and E_CC14 [float]. Every 30 minutes, the cumulative energy total (kWh) is appended to the table - something like this:
time       | E_CC14
1670990400 | 5469.00223
1670992200 | 5469.02791
1670994000 | 5469.056295
1670995800 | 5469.082706
1670997600 | 5469.10558
1670999400 | 5469.128534
I tried this SQL statement:
SELECT
MONTH(DATEADD(SS, i.time, '1970-01-01')),
i.E_CC14 AS mwh,
iprev.E_CC14 AS prevmwh,
(i.E_CC14 - iprev.E_CC14) AS diff
FROM
Energy_CC i
LEFT OUTER JOIN
Energy_CC iprev ON MONTH(iprev.time) = MONTH(i.time) - MONTH(DATEADD(month, -1, i.time))
AND DAY(iprev.time) = 1
WHERE
DAY(i.time) = 1
GROUP BY
MONTH(i.time), i.E_CC14, iprev.E_CC14;
I expect the monthly result to look like this:
time        | E_CC14
DECEMBER-22 | 10223
Any help would be greatly appreciated.
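Since the readings are a cumulative counter, the monthly figure is just the last reading of the month minus the first. A minimal sketch of that idea in Python (illustrative only, using the sample rows; not the asker's SQL Server setup):

from datetime import datetime, timezone

readings = [
    (1670990400, 5469.00223),
    (1670992200, 5469.02791),
    (1670999400, 5469.128534),
]

monthly = {}
for epoch, kwh in readings:
    month = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%B-%y").upper()
    lo, hi = monthly.get(month, (kwh, kwh))
    # cumulative counter: min is the month's first reading, max its last
    monthly[month] = (min(lo, kwh), max(hi, kwh))

for month, (lo, hi) in monthly.items():
    print(month, round(hi - lo, 6))  # DECEMBER-22 0.126304

In T-SQL the same shape could be had by grouping on the month of DATEADD(SS, time, '1970-01-01') and taking MAX(E_CC14) - MIN(E_CC14) per group, assuming the counter never resets.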
I'm using a transaction to see how long a device is in RFM mode, and the duration field increases with each table row. The way I think it should work is that while the field is 'yes' it would calculate the duration across all events equal to 'yes', but I have a lot of superfluous data that shouldn't be there, in my opinion.
I only want to keep the largest-duration event, so I want to compare the current event's duration to the next event's duration and, if the next is smaller than the current, keep the current event.
index=crowdstrike sourcetype=crowdstrike:device:json
| transaction falcon_device.hostname startswith="falcon_device.reduced_functionality_mode=yes" endswith="falcon_device.reduced_functionality_mode=no"
| table _time duration
_time               | duration
2022-10-28 06:07:45 | 888198
2022-10-28 05:33:44 | 892400
2022-10-28 04:57:44 | 896360
2022-08-22 18:25:53 | 3862
2022-08-22 18:01:53 | 7703
2022-08-22 17:35:53 | 11543
In the data above the duration drops from 896360 to 3862; that can happen on any date, and the duration runs in cycles like that, increasing until it starts over. So in the comparison I would keep the event at the 10-28 inflection point, and likewise at every other inflection point throughout the dataset.
How would I construct that multi-event comparison?
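Outside Splunk, the comparison being described is a one-pass scan that keeps a row whenever the following row's duration is smaller (or there is no following row), i.e. the per-cycle maxima. A minimal sketch in Python over the sample rows:

rows = [  # (_time, duration), newest first, as in the table above
    ("2022-10-28 06:07:45", 888198),
    ("2022-10-28 05:33:44", 892400),
    ("2022-10-28 04:57:44", 896360),
    ("2022-08-22 18:25:53", 3862),
    ("2022-08-22 18:01:53", 7703),
    ("2022-08-22 17:35:53", 11543),
]

# keep a row when the next row's duration is smaller (an inflection point)
keep = [cur for cur, nxt in zip(rows, rows[1:] + [(None, float("-inf"))])
        if nxt[1] < cur[1]]
print(keep)  # the 896360 and 11543 rows

In SPL, the adjacent-event comparison itself is the kind of thing streamstats provides, which is what the solution further down relies on.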
By definition, the transaction command bundles together all events with the same hostname value, starting with the first "yes" and ending with the first "no". There is no option to include events by size, but there are options that govern the maximum time span of a transaction (maxspan), how many events can be in a transaction (maxevents), and how long the time between events can be (maxpause). That the duration value you want to keep (896360 seconds, about 10 days) belongs to a transaction that began only 36 minutes after the previous one makes me wonder about the logic being used in this query. Consider using some of the options available to better define a "transaction".
What problem are you trying to solve with this query? It's possible there's another solution that doesn't use transaction (which is very non-performant).
Sans sample data, something like the following will probably work:
index=crowdstrike sourcetype=crowdstrike:device:json falcon_device.hostname=* falcon_device.reduced_functionality_mode=yes
| stats max(_time) as yestime by falcon_device.hostname
| append
[| search index=crowdstrike sourcetype=crowdstrike:device:json falcon_device.hostname=* falcon_device.reduced_functionality_mode=no
| stats max(_time) as notime by falcon_device.hostname ]
| stats values(*) as * by falcon_device.hostname
| eval elapsed_seconds=yestime-notime
Thanks for your answers, but it wasn't working out. I ended up talking to some professional Splunkers and got the solution below.
index=crowdstrike sourcetype=crowdstrike:device:json
| addinfo ```adds info_max_time```
| fields + _time, falcon_device.reduced_functionality_mode falcon_device.hostname info_max_time
| rename falcon_device.reduced_functionality_mode AS mode, falcon_device.hostname AS Hostname
| sort 0 + Hostname, -_time ``` events are not always returned in descending order per hostname, which would break streamstats```
| streamstats current=f last(mode) as new_mode last(_time) as time_change by Hostname ```compute potential time of state change```
| eval new_mode=coalesce(new_mode,mode."+++"), time_change=coalesce(time_change,info_max_time) ```take care of boundaries of search```
| table _time, Hostname, mode, new_mode, time_change
| where mode!=new_mode ```keep only state change events```
| streamstats current=f last(time_change) AS change_end by Hostname ```add start time of the next state as change_end time for the current state```
| fieldformat time_change=strftime(time_change, "%Y-%m-%d %T")
| fieldformat change_end=strftime(change_end, "%Y-%m-%d %T")
``` uncomment the following to sort by duration```
```| search change_end=* AND new_mode="yes"
| eval duration = round( (change_end-time_change)/(3600),1)
| table time_change, Hostname, new_mode, duration
| sort -duration```
My company has a number of shops across all its locations. The shops raise requests for items to be delivered so they can sell them. We want to understand how much time the company takes to deliver an item, in minutes. However, we don't want to include time in our elapsed time when the shop is closed, i.e.
let's say the shop opening and closing times are
now the elapsed time
When I subtract the complain time from the resolution time I get the calculable elapsed time in minutes, but I need the required elapsed time in minutes: in the first case, out of 2090 minutes, the minutes when the shop was closed must be deducted. I need to write an Oracle query to calculate the required elapsed time in minutes, which is shown in green.
Please help with what query we can write.
One formula to get the net time is as follows:
For every day involved, add up the daily opening durations. Your first example covers two days, 2021-01-11 and 2021-01-12, with 13 opening hours each (09:00 - 22:00). That makes 26 hours.
If the first day starts after the store opens, subtract the difference: 10:12 - 09:00 = 1:12 = 72 minutes.
If the last day ends before the store closes, subtract the difference: 22:00 - 21:02 = 0:58 = 58 minutes.
That gives 26 * 60 - 72 - 58 = 1430 net minutes for the first example.
Oracle doesn't have a TIME datatype, so I assume you are using Oracle's datetime datatype, DATE, to store the opening and closing times, and we must ignore the date part. You are probably using DATE for the complain_time and the resolution_time, too.
In the query below I convert the time parts to minutes right away, so the calculations get a tad more readable later.
with s as
(
  select
    shop,
    -- time of day in minutes since midnight (cast to timestamp,
    -- because EXTRACT(HOUR ...) is not allowed on a DATE)
    extract(hour from cast(opening_time as timestamp)) * 60
      + extract(minute from cast(opening_time as timestamp)) as opening_minute,
    extract(hour from cast(closing_time as timestamp)) * 60
      + extract(minute from cast(closing_time as timestamp)) as closing_minute
  from shops
)
, r as
(
  select
    request, shop, complain_time, resolution_time,
    trunc(complain_time)   as complain_day,
    trunc(resolution_time) as resolution_day,
    extract(hour from cast(complain_time as timestamp)) * 60
      + extract(minute from cast(complain_time as timestamp)) as complain_minute,
    extract(hour from cast(resolution_time as timestamp)) * 60
      + extract(minute from cast(resolution_time as timestamp)) as resolution_minute
  from requests
)
select
  r.request, r.shop, r.complain_time, r.resolution_time,
  -- full opening minutes for every day involved ...
  (r.resolution_day - r.complain_day + 1) * (s.closing_minute - s.opening_minute)
  -- ... minus the part of the first day before the complaint ...
  - case when r.complain_minute > s.opening_minute then r.complain_minute - s.opening_minute else 0 end
  -- ... minus the part of the last day after the resolution
  - case when r.resolution_minute < s.closing_minute then s.closing_minute - r.resolution_minute else 0 end
    as net_duration_in_minutes
from r
join s on s.shop = r.shop
order by r.request;
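A quick cross-check of the same formula in Python (hypothetical data, shop open 09:00 - 22:00 daily; not part of the original answer):

from datetime import datetime

OPEN_MIN, CLOSE_MIN = 9 * 60, 22 * 60  # minutes since midnight

def net_minutes(complain: datetime, resolution: datetime) -> int:
    days = (resolution.date() - complain.date()).days + 1
    total = days * (CLOSE_MIN - OPEN_MIN)
    start = complain.hour * 60 + complain.minute
    end = resolution.hour * 60 + resolution.minute
    if start > OPEN_MIN:
        total -= start - OPEN_MIN    # shop opened before the complaint
    if end < CLOSE_MIN:
        total -= CLOSE_MIN - end     # shop still open after the resolution
    return total

print(net_minutes(datetime(2021, 1, 11, 10, 12),
                  datetime(2021, 1, 12, 21, 2)))  # -> 1430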
Game_ID | BeginTime | EndTime
1 | 1235000140| 1235002457
2 | 1235000377| 1235003300
3 | 1235000414| 1235056128
1 | 1235000414| 1235056128
2 | 1235000377| 1235003300
Here I would like to get the difference between the two epoch time fields, BeginTime and EndTime, and then calculate the average time for each game.
games = load 'games.txt' using PigStorage('|') as (gameid: int, begin_time: long, end_time:long);
dump games;
(1,1235000140,1235002457)
(2,1235000377,1235003300)
(3,1235000414,1235056128)
(1,1235000414,1235056128)
(2,1235000377,1235003300)
Step 1: Calculate the time difference
difference = foreach games generate gameid, end_time - begin_time as time_lapse;
dump difference;
(1,2317)
(2,2923)
(3,55714)
(1,55714)
(2,2923)
Step 2: Group the data on Game_ID
game_group = group difference by gameid;
dump game_group;
(1,{(1,55714),(1,2317)})
(2,{(2,2923),(2,2923)})
(3,{(3,55714)})
Step 3: Then the Average
average = foreach game_group generate group, AVG(difference.time_lapse);
dump average;
(1,29015.5)
(2,2923.0)
(3,55714.0)
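For comparison, the same three steps (difference, group by game id, average) in Python, cross-checking the Pig output above:

from collections import defaultdict

games = [(1, 1235000140, 1235002457), (2, 1235000377, 1235003300),
         (3, 1235000414, 1235056128), (1, 1235000414, 1235056128),
         (2, 1235000377, 1235003300)]

lapses = defaultdict(list)
for gameid, begin_time, end_time in games:
    lapses[gameid].append(end_time - begin_time)  # step 1: difference

for gameid, values in sorted(lapses.items()):     # step 2: group
    print(gameid, sum(values) / len(values))      # step 3: average
# 1 29015.5
# 2 2923.0
# 3 55714.0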
Algorithm Challenge:
Problem statement:
How would you design a logging system for something like Google, such that you can query the number of times a URL was opened between two timestamps?
i/p: start_time, end_time, URL1
o/p: number of times URL1 was opened between start and end time.
Some specs:
A database is not an optimal solution.
A URL might have been opened multiple times at a given timestamp.
A URL might have been opened a large number of times between two timestamps.
start_time and end_time can be a month apart.
Time can be granular to a second.
One solution:
A hash of hashes: the outer key is the URL; the inner hash maps each timestamp T to the cumulative frequency of opens up to and including T.
Eg:
Amazon --> T         CumFreq
           11:00 am  3   (opened 3 times at 11:00 am)
           11:15 am  4   (opened 1 time at 11:15 am; cumfreq is 3+1=4)
           11:30 am  11  (opened 7 times at 11:30 am; cumfreq is 4+7=11)
i/p: 11:10 am, 11:37 am, Amazon
The output is obtained by subtraction: take the cumulative frequency at the last timestamp not after 11:37 am (which is 11:30 am, cumfreq 11) and subtract the cumulative frequency at the last timestamp before 11:10 am (which is 11:00 am, cumfreq 3). Hence the result is
11 - 3 = 8
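A minimal sketch of this cumulative-frequency lookup in Python, using sorted timestamps plus binary search (the names and the integer time encoding are illustrative):

import bisect

# per-URL parallel lists: sorted timestamps and cumulative open counts
times = {"Amazon": [1100, 1115, 1130]}  # stand-ins for 11:00/11:15/11:30 am
cumfreq = {"Amazon": [3, 4, 11]}

def count_opens(url, start, end):
    ts, cf = times[url], cumfreq[url]
    hi = bisect.bisect_right(ts, end) - 1   # last timestamp <= end
    lo = bisect.bisect_left(ts, start) - 1  # last timestamp <  start
    return (cf[hi] if hi >= 0 else 0) - (cf[lo] if lo >= 0 else 0)

print(count_opens("Amazon", 1110, 1137))  # 11 - 3 = 8

Each query is then O(log n) in the number of distinct timestamps stored for that URL.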
Can we do better?