I am sure this is an easy problem to solve; however, I cannot see how, no matter which way I look at it.
I am using an Oracle database and I am trying to report on processing times. The system runs a process that picks up incoming files and processes them into the database. As it performs this task, it logs when each of the different sections happens and from which source. I've filtered the data to give me just the start and end times, as that is all I am currently interested in.
Example data:
5 15/MAY/15 00:37:01 Started
5 15/MAY/15 00:50:45 Finished
5 16/MAY/15 02:07:41 Started
5 16/MAY/15 02:19:16 Finished
5 16/MAY/15 23:20:25 Started
5 16/MAY/15 23:28:53 Finished
5 17/MAY/15 23:16:36 Started
5 17/MAY/15 23:27:51 Finished
5 18/MAY/15 23:31:28 Started
5 18/MAY/15 23:47:41 Finished
5 19/MAY/15 23:44:12 Started
5 20/MAY/15 00:06:17 Finished
5 20/MAY/15 23:33:42 Started
5 20/MAY/15 23:58:16 Finished
What I am trying to get is the duration of each of the "sets" (start time to end time). When there is only a single occurrence on a day it's easy; however, for days such as the 16th, where there are two sets in a single day, or where a set starts on the 19th and ends on the 20th, I am struggling. I know this would be possible in a programming language, but I am sure it must also be possible in Oracle.
The output I'd expect from the above is:
Source StartTime EndTime
5 15/MAY/15 00:37:01 15/MAY/15 00:50:45
5 16/MAY/15 02:07:41 16/MAY/15 02:19:16
5 16/MAY/15 23:20:25 16/MAY/15 23:28:53
5 17/MAY/15 23:16:36 17/MAY/15 23:27:51
5 18/MAY/15 23:31:28 18/MAY/15 23:47:41
5 19/MAY/15 23:44:12 20/MAY/15 00:06:17
5 20/MAY/15 23:33:42 20/MAY/15 23:58:16
Thanks,
As Patrick Bacon mentioned in a comment, you can use the lead and lag analytic functions to peek ahead and behind each row. If the row you are looking at is 'Started' then you need to peek at the next row (chronologically, for the same source) to get the matching 'Finished' row, using lead. Conversely, if the row you are looking at is 'Finished' then you need to peek at the previous row to get the matching 'Started' row, using lag:
select distinct source,
  case when action = 'Started' then time
    else lag(time) over (partition by source order by time) end as starttime,
  case when action = 'Finished' then time
    else lead(time) over (partition by source order by time) end as endtime
from t
order by source, starttime;
SOURCE STARTTIME ENDTIME
---------- ------------------- -------------------
5 2015-05-15 00:37:01 2015-05-15 00:50:45
5 2015-05-16 02:07:41 2015-05-16 02:19:16
5 2015-05-16 23:20:25 2015-05-16 23:28:53
5 2015-05-17 23:16:36 2015-05-17 23:27:51
5 2015-05-18 23:31:28 2015-05-18 23:47:41
5 2015-05-19 23:44:12 2015-05-20 00:06:17
5 2015-05-20 23:33:42 2015-05-20 23:58:16
Because you're looking ahead and behind, you end up with duplicate pairs of data; here I've used distinct to squash those duplicates, but you could also use this as a subquery and filter the results.
SQL Fiddle with a CTE to provide your sample data.
Solution with function lead():
select tsource, starttime, endtime
  from (
    select tsource, ttime starttime, status,
           lead(ttime) over (partition by tsource order by ttime) endtime
      from test)
  where status = 'Started'
SQLFiddle
Edit:
If it can happen that two rows with status Started follow each other with no Finished row in between,
then you need some protection against that situation. For instance, this version displays null as the end time in that case:
select tsource, starttime, endtime
  from (
    select tsource, ttime starttime, status,
           case when lead(status) over (partition by tsource order by ttime) = 'Finished'
                then lead(ttime) over (partition by tsource order by ttime)
                else null
           end endtime
      from test)
  where status = 'Started'
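Since the question mentions that this would also be possible in a programming language, here is a minimal Java sketch of the same pairing, for comparison only. The Event and Interval records and the pair method are hypothetical names made up for this illustration; it assumes Java 16+ and that the events belong to one source and are already sorted by time. Like the protected query above, it leaves the end time null when a Started row is not directly followed by a Finished row.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; they are not part of the question's schema.
class IntervalPairing {
    record Event(int source, LocalDateTime time, String action) {}

    record Interval(int source, LocalDateTime start, LocalDateTime end) {
        // Whole minutes between start and end; only valid for completed intervals.
        long minutes() {
            if (end == null) {
                throw new IllegalStateException("no matching Finished row");
            }
            return Duration.between(start, end).toMinutes();
        }
    }

    // Expects the events of a single source, sorted by time (assumption).
    static List<Interval> pair(List<Event> events) {
        List<Interval> result = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            Event current = events.get(i);
            if (!current.action().equals("Started")) {
                continue; // Finished rows are consumed via the look-ahead below
            }
            // Look ahead one row, like lead() in the query.
            Event next = i + 1 < events.size() ? events.get(i + 1) : null;
            LocalDateTime end =
                (next != null && next.action().equals("Finished")) ? next.time() : null;
            result.add(new Interval(current.source(), current.time(), end));
        }
        return result;
    }
}
```

Calling pair on the two rows for the 19th/20th gives one interval that crosses midnight, and minutes() on it returns the duration in whole minutes.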
Related
My company has a number of shops across many locations. A shop raises a request for an item to be delivered so that it can sell it. We want to know how long the company takes to deliver the item, in minutes. However, we don't want to count the time when the shop is closed, i.e.
Let's consider that the shop opening and closing times are
Now the elapsed time
When I subtract the complain time from the resolution time, I get the calculated elapsed time in minutes, but I need the required elapsed time in minutes. So in the first case, out of 2090 minutes, the minutes when the shop was closed are deducted. I need to write an Oracle query to calculate the required elapsed time in minutes, which is shown in green.
Please help with what query we can write.
One formula to get the net time is as follows:
For every day involved add up the opening times. For your first example this is two days 2021-01-11 and 2021-01-12 with 13 daily opening hours (09:00 - 22:00). That makes 26 hours.
If the first day starts after the store opens, subtract the difference. 10:12 - 09:00 = 1:12 = 72 minutes.
If the last day ends before the store closes, subtract the difference. 22:00 - 21:02 = 0:58 = 58 minutes.
Oracle doesn't have a TIME datatype, so I assume you are using Oracle's datetime data type they call DATE to store the opening and closing time and we must ignore the date part. And you are probably using the DATE type for the complain_time and the resolution_time, too.
In the query below I convert the time parts to minutes right away, so the calculations are a tad more readable later.
with s as
(
  select
    shop,
    -- extract(hour ...) does not accept a DATE in Oracle, hence the cast to TIMESTAMP
    extract(hour from cast(opening_time as timestamp)) * 60 + extract(minute from cast(opening_time as timestamp)) as opening_minute,
    extract(hour from cast(closing_time as timestamp)) * 60 + extract(minute from cast(closing_time as timestamp)) as closing_minute
  from shops
)
, r as
(
  select
    request, shop, complain_time, resolution_time,
    trunc(complain_time) as complain_day,
    trunc(resolution_time) as resolution_day,
    extract(hour from cast(complain_time as timestamp)) * 60 + extract(minute from cast(complain_time as timestamp)) as complain_minute,
    extract(hour from cast(resolution_time as timestamp)) * 60 + extract(minute from cast(resolution_time as timestamp)) as resolution_minute
  from requests
)
select
  r.request, r.shop, r.complain_time, r.resolution_time,
  -- opening minutes of every day involved
  (r.resolution_day - r.complain_day + 1) * (s.closing_minute - s.opening_minute)
  -- minus the part of the first day before the request was raised
  - case when r.complain_minute > s.opening_minute then r.complain_minute - s.opening_minute else 0 end
  -- minus the part of the last day after the request was resolved
  - case when r.resolution_minute < s.closing_minute then s.closing_minute - r.resolution_minute else 0 end
  as net_duration_in_minutes
from r
join s on s.shop = r.shop
order by r.request;
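To check the arithmetic of the formula outside the database, here is a small Java sketch of the same three steps. The class and method names are hypothetical, and it assumes (as the formula does) that the shop is open every day with the same hours and that the request is neither raised after closing time nor resolved before opening time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper; mirrors the three steps of the formula, nothing more.
class NetOpeningTime {
    static long netMinutes(LocalDateTime complain, LocalDateTime resolution,
                           LocalTime opening, LocalTime closing) {
        // Step 1: opening minutes of every calendar day involved.
        long days = ChronoUnit.DAYS.between(complain.toLocalDate(), resolution.toLocalDate()) + 1;
        long total = days * Duration.between(opening, closing).toMinutes();

        // Step 2: drop the part of the first day before the request was raised.
        LocalTime complainTime = complain.toLocalTime();
        if (complainTime.isAfter(opening)) {
            total -= Duration.between(opening, complainTime).toMinutes();
        }

        // Step 3: drop the part of the last day after the request was resolved.
        LocalTime resolutionTime = resolution.toLocalTime();
        if (resolutionTime.isBefore(closing)) {
            total -= Duration.between(resolutionTime, closing).toMinutes();
        }
        return total;
    }
}
```

For the first example (raised 2021-01-11 10:12, resolved 2021-01-12 21:02, open 09:00 to 22:00) this gives 2 * 780 - 72 - 58 = 1430 minutes, compared with 2090 minutes of plain elapsed time.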
Hello fellow Stack Overflowers,
I have a situation where I need some help choosing the best way to make an algorithm work. The objective is to manage the occupation of a resource (let's call it resource A) that has multiple tasks, where each task takes a specified amount of time to complete. At this first stage I don't want to involve multiple variables, so let's keep it simple and assume the resource only has a schedule of working days.
For example:
1 - We have 1 resource, resource A
2 - Resource A works from 8 am to 4 pm, Monday to Friday. To keep it simple for now, he doesn't take a lunch break, so that is 8 hours of work a day.
3 - Resource A has 5 tasks to complete. To avoid complexity at this level, let's suppose each one will take exactly 10 hours to complete.
4 - Resource A will start working on these tasks on 2018-05-16 at exactly 2 pm.
Problem:
Now, all I need to know is the correct finish date for all the 5 tasks, but considering all the previous limitations.
In this case, the 50 hours of work amount to 6 full working days plus 2 hours of a 7th day.
The expected result that I want would be: 2018-05-24 (at 4 pm).
Implementation:
I thought about 2 options, and would like to have feedback on these options, or on other options that I might not be considering.
Algorithm 1
1 - Create a list of "slots", where each "slot" would represent 1 hour, for x days.
2 - Cross this list of slots with the hour schedule of the resource, to remove all the slots where the resource isn't available. This would return a list of the slots in which he can actually work.
3 - Occupy the remaining slots with the tasks that I have for him.
4 - Finally, check the date/hour of the last occupied slot.
Disadvantage: I think this might be an overkill solution, considering that I don't want to consider his occupation for the future, all I want is to know when will the tasks be completed.
Algorithm 2
1 - Add the task hours (50 hours) to the starting date, getting the expectedFinishDate. (Would get expectedFinishDate = 2018-05-18 (at 4 pm))
2 - Cross the hours, between starting date and expectedFinishDate with the schedule, to get the quantity of hours that he won't work. (would basically get the unavailable hours, 16 hours a day, would result in remainingHoursForCalc = 32 hours).
3 - calculate new expectedFinishDate with the unavailable hours, would add this 32 hours to the previous 2018-05-18 (at 4 pm).
4 - Repeat points 2 and 3 with the new expectedFinishDate until remainingHoursForCalc = 0.
Disadvantage: This would result in a recursive method or in a very weird while loop, again, I think this might be overkill for calculation of a simple date.
What would you suggest? Is there any other option that I might not be considering that would make this simpler? Or do you think there is a way to improve either of these 2 algorithms to make it work?
Improved version:
import java.util.Calendar;
import java.util.Date;
public class Main {
    public static void main(String args[]) throws Exception
    {
        Date d=new Date();
        System.out.println(d);
        d.setMinutes(0);
        d.setSeconds(0);
        d.setHours(13);
        Calendar c=Calendar.getInstance();
        c.setTime(d);
        c.set(Calendar.YEAR, 2018);
        c.set(Calendar.MONTH, Calendar.MAY);
        c.set(Calendar.DAY_OF_MONTH, 17);
        //c.add(Calendar.HOUR, -24-5);
        d=c.getTime();
        //int workHours=11;
        int hoursArray[] = {1,2,3,4,5, 10,11,12, 19,20, 40};
        for(int workHours : hoursArray)
        {
            try
            {
                Date end=getEndOfTask(d, workHours);
                System.out.println("a task starting at "+d+" and lasting "+workHours
                    + " hours will end at " +end);
            }
            catch(Exception e)
            {
                System.out.println(e.getMessage());
            }
        }
    }
    public static Date getEndOfTask(Date startOfTask, int workingHours) throws Exception
    {
        int totalHours=0;//including non-working hours
        //startOfTask +totalHours =endOfTask
        int startHour=startOfTask.getHours();
        if(startHour<8 || startHour>16)
            throw new Exception("a task cannot start outside the working hours interval");
        System.out.println("startHour="+startHour);
        int startDayOfWeek=startOfTask.getDay();//start date's day of week; Wednesday=3
        System.out.println("startDayOfWeek="+startDayOfWeek);
        if(startDayOfWeek==6 || startDayOfWeek==0)
            throw new Exception("a task cannot start on Saturdays or Sundays");
        int remainingHoursUntilDayEnd=16-startHour;
        System.out.println("remainingHoursUntilDayEnd="+remainingHoursUntilDayEnd);
        /*some discussion here: if task starts at 12:30, we have 3h30min
         * until the end of the program; however, getHours() will return 12, which
         * subtracted from 16 will give 4h. It will work fine if task starts at 12:00,
         * or, generally, at the beginning of the hour; let's assume a task will start at HH:00*/
        int remainingDaysUntilWeekEnd=5-startDayOfWeek;
        System.out.println("remainingDaysUntilWeekEnd="+remainingDaysUntilWeekEnd);
        int completeWorkDays = (workingHours-remainingHoursUntilDayEnd)/8;
        System.out.println("completeWorkDays="+completeWorkDays);
        //excluding both the start day, and the end day, if they are not fully occupied by the task
        int workingHoursLastDay=(workingHours-remainingHoursUntilDayEnd)%8;
        System.out.println("workingHoursLastDay="+workingHoursLastDay);
        /* workingHours=remainingHoursUntilDayEnd+(8*completeWorkDays)+workingHoursLastDay */
        int numberOfWeekends=(int)Math.ceil( (completeWorkDays-remainingDaysUntilWeekEnd)/5.0 );
        if((completeWorkDays-remainingDaysUntilWeekEnd)%5==0)
        {
            if(workingHoursLastDay>0)
            {
                numberOfWeekends++;
            }
        }
        System.out.println("numberOfWeekends="+numberOfWeekends);
        totalHours+=(int)Math.min(remainingHoursUntilDayEnd, workingHours);//covers the case
        //when task lasts 1 or 2 hours, and we have maybe 4h until end of day; that's why i use Math.min
        if(completeWorkDays>0 || workingHoursLastDay>0)
        {
            totalHours+=8;//the hours of the current day between 16:00 and 24:00
            //it might be the case that completeWorkDays is 0, yet the task spans up to tomorrow
            //so we still have to add these 8h
        }
        if(completeWorkDays>0)//redundant if, because 24*0=0
        {
            totalHours+=24*completeWorkDays;//for every 8 working h, we have a total of 24 h that have
            //to be added to the date
        }
        if(workingHoursLastDay>0)
        {
            totalHours+=8;//the hours between 00.00 AM and 8 AM
            totalHours+=workingHoursLastDay;
        }
        if(numberOfWeekends>0)
        {
            totalHours+=48*numberOfWeekends;//every weekend between start and end dates means two days
        }
        System.out.println("totalHours="+totalHours);
        Calendar calendar=Calendar.getInstance();
        calendar.setTime(startOfTask);
        calendar.add(Calendar.HOUR, totalHours);
        return calendar.getTime();
    }
}
You may adjust the hoursArray[], or d.setHours along with c.set(Calendar.DAY_OF_MONTH, to test various start dates along with various task lengths.
There is still a bug, caused by the addition of the 8 hours between 16:00 and 24:00:
a task starting at Thu May 17 13:00:00 EEST 2018 and lasting 11 hours will end at Sat May 19 00:00:00 EEST 2018.
I've kept a lot of print statements, they are useful for debugging purposes.
Here is the terminology explained:
I agree that algorithm 1 is overkill.
I think I would make sure I had the conditions right: hours per day (8), working days (Mon, Tue, Wed, Thu, Fri). Would then divide the hours required (5 * 10 = 50) by the hours per day so I know a minimum of how many working days are needed (50 / 8 = 6). Slightly more advanced, divide by hours per week first (50 / 40 = 1 week). Count working days from the start date to get a first shot at the end date. There was probably a remainder from the division, so use this to determine whether the tasks can end on this day or run into the next working day.
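As a rough illustration, here is a minimal java.time sketch that counts working days forward from the start date. Note that it walks one working day at a time instead of doing the division first, which is a simpler variant of the idea rather than exactly the calculation described above. The class and method names are hypothetical, and the 8:00 to 16:00, Monday to Friday schedule is hard-coded from the question.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper; the schedule constants come from the question.
class WorkSchedule {
    static final LocalTime DAY_START = LocalTime.of(8, 0);
    static final LocalTime DAY_END = LocalTime.of(16, 0);

    static boolean isWorkingDay(LocalDateTime t) {
        DayOfWeek d = t.getDayOfWeek();
        return d != DayOfWeek.SATURDAY && d != DayOfWeek.SUNDAY;
    }

    // Walks forward one day at a time, consuming working minutes until none remain.
    static LocalDateTime finishTime(LocalDateTime start, long hoursOfWork) {
        long remainingMinutes = hoursOfWork * 60;
        LocalDateTime cursor = start;
        while (true) {
            LocalDateTime dayStart = cursor.toLocalDate().atTime(DAY_START);
            LocalDateTime dayEnd = cursor.toLocalDate().atTime(DAY_END);
            if (isWorkingDay(cursor) && cursor.isBefore(dayEnd)) {
                if (cursor.isBefore(dayStart)) {
                    cursor = dayStart; // started before opening: wait until 8:00
                }
                long available = Duration.between(cursor, dayEnd).toMinutes();
                if (remainingMinutes <= available) {
                    return cursor.plusMinutes(remainingMinutes);
                }
                remainingMinutes -= available;
            }
            // Nothing (more) to do today; continue at the start of the next day.
            cursor = cursor.toLocalDate().plusDays(1).atTime(DAY_START);
        }
    }
}
```

With the numbers from the question, finishTime(LocalDateTime.of(2018, 5, 16, 14, 0), 50) returns 2018-05-24 at 16:00, which matches the expected result. The loop runs once per calendar day, so for very long tasks the division by hours per week would be the faster first step.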
I have a service that can be started or stopped. Each operation generates a record with a timestamp and an operation type, so I end up with a series of timestamped operation records. Now I want to calculate the up-time of the service during a day. The idea is simple: for each pair of start/stop records, compute the timespan and sum them up. But I don't know how to implement it with Hive, if it is possible at all. It's OK to create tables to store intermediate results. This is the main blocking issue, and there are some other minor issues as well. For example, some start/stop pairs may span more than a single day. Any idea how to deal with this minor issue would be appreciated too.
Sample Data:
Timestamp Operation
... ...
2017-09-03 23:59:00 Start
2017-09-04 00:01:00 Stop
2017-09-04 06:50:00 Start
2017-09-04 07:00:00 Stop
2017-09-05 08:00:00 Start
... ...
The service up-time for 2017-09-04 should then be 1 + 10 = 11 mins. Note that the first time interval spans across 09-03 and 09-04, and only the part that falls within 09-04 is counted.
select  to_date(from_ts) as dt
       ,sum (to_unix_timestamp(to_ts) - to_unix_timestamp(from_ts)) / 60 as up_time_minutes
from   (select  case when pe.i = 0 then from_ts else cast(date_add(to_date(from_ts),i) as timestamp) end as from_ts
               ,case when pe.i = datediff(to_ts,from_ts) then to_ts else cast(date_add(to_date(from_ts),i+1) as timestamp) end as to_ts
        from   (select  `operation`
                       ,`Timestamp` as from_ts
                       ,lead(`Timestamp`) over (order by `Timestamp`) as to_ts
                from    t
                ) t
                lateral view posexplode(split(space(datediff(to_ts,from_ts)),' ')) pe as i,x
        where   `operation` = 'Start'
            and to_ts is not null
        ) t
group by to_date(from_ts)
;
+------------+-----------------+
| dt | up_time_minutes |
+------------+-----------------+
| 2017-09-03 | 1.0 |
| 2017-09-04 | 11.0 |
+------------+-----------------+
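The key step in the query is cutting each start/stop interval at midnight, so that every piece belongs to exactly one day. Here is a small Java sketch of just that step, as an illustration of the logic and not a replacement for the Hive query. The class and method names are hypothetical, and it assumes the start/stop pairs have already been matched.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Map;

// Hypothetical helper showing the "split at midnight" step on its own.
class DailyUptime {
    // Adds the minutes of one start/stop interval to every calendar day it touches.
    static void addInterval(Map<LocalDate, Long> minutesPerDay,
                            LocalDateTime start, LocalDateTime stop) {
        LocalDateTime cursor = start;
        while (cursor.isBefore(stop)) {
            // The current piece ends at the next midnight or at the stop time,
            // whichever comes first.
            LocalDateTime nextMidnight = cursor.toLocalDate().plusDays(1).atStartOfDay();
            LocalDateTime pieceEnd = nextMidnight.isBefore(stop) ? nextMidnight : stop;
            long minutes = Duration.between(cursor, pieceEnd).toMinutes();
            minutesPerDay.merge(cursor.toLocalDate(), minutes, Long::sum);
            cursor = pieceEnd;
        }
    }
}
```

Feeding it the two complete intervals from the sample data (23:59 to 00:01 and 06:50 to 07:00) into a TreeMap gives 1 minute for 2017-09-03 and 11 minutes for 2017-09-04, the same as the query output above.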
I have the following Redis keys:
REDIS.del "weekly:activity"
REDIS.del "2013-02-27:activity"
REDIS.del "2013-02-28:activity"
REDIS.sadd "2013-02-27:activity", 1
REDIS.sadd "2013-02-27:activity", 2
REDIS.sadd "2013-02-27:activity", 3
REDIS.sadd "2013-02-28:activity", 4
REDIS.sadd "2013-02-28:activity", 1
REDIS.sadd "2013-02-28:activity", 1
REDIS.sadd "2013-02-28:activity", 6
REDIS.sunionstore "weekly:activity", "2013-02-27:activity", "2013-02-28:activity"
REDIS.scard "weekly:activity"
What would be the best way to find the first day of the current week and sum the stats for the current week?
Can I do that using Redis?
Or should I do it in Ruby?
Instead of summing regularly, I suggest you collect more data and create keys for weeks and months.
So every time a user logs in, you:
REDIS.incr "{today}:activity"
REDIS.incr "{this week}:weekly:activity"
REDIS.incr "{this month}:monthly:activity"
This way, there is no reporting to do. I'll leave it up to you to compute this week and this month.
Algorithm Challenge :
Problem statement :
How would you design a logging system for something like Google? You should be able to query for the number of times a URL was opened between two timestamps.
i/p : start_time , end_time , URL1
o/p : number of times URL1 was opened between start and end time.
Some specs :
Database is not an optimal solution
A URL might have been opened multiple times for given time stamp.
A URL might have been opened a large number of times within two time stamps.
start_time and end_time can be a month apart.
time could be granular to a second.
One solution :
Hash of a hash
Key Value
URL Hash----> T1 CumFrequency
Eg :
Amazon Hash--> T CumFreq
11 00 am 3 ( opened 3 times at 11:00 am )
11 15 am 4 ( opened 1 time at 11:15 am , cumfreq is 3+1=4)
11 30 am 11 ( opened 7 times at 11:30 am , cumfreq is 3+1+7=11)
i/p : 11 : 10 am , 11 : 37 am , Amazon
The o/p can be obtained by subtracting the cumulative frequency at the last timestamp before 11:10 am, which is 11:00 am, from the cumulative frequency at the last active timestamp before 11:37 am, which is 11:30 am. Hence the result is
11-3 = 8 ....
Can we do better ?
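For reference, here is a minimal Java sketch of the cumulative-frequency idea with a sorted map per URL, so that the "last timestamp at or before t" lookup is a binary search instead of a scan. The class and method names are hypothetical. It uses LocalTime only to mirror the example (a real log would need full timestamps), assumes opens are recorded in time order, and counts opens in the range (start, end], i.e. an open exactly at start is not counted.

```java
import java.time.LocalTime;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the "hash of a hash" with cumulative counts.
class UrlHitLog {
    // Per URL: timestamp -> number of opens up to and including that timestamp.
    private final Map<String, TreeMap<LocalTime, Long>> cumulative = new HashMap<>();

    // Records `count` opens at `time`; assumes times arrive in non-decreasing order per URL.
    void record(String url, LocalTime time, long count) {
        TreeMap<LocalTime, Long> perUrl = cumulative.computeIfAbsent(url, k -> new TreeMap<>());
        long previous = perUrl.isEmpty() ? 0 : perUrl.lastEntry().getValue();
        perUrl.put(time, previous + count);
    }

    // Cumulative count at the last recorded timestamp at or before `time`.
    private static long cumulativeAt(TreeMap<LocalTime, Long> perUrl, LocalTime time) {
        Map.Entry<LocalTime, Long> entry = perUrl.floorEntry(time);
        return entry == null ? 0 : entry.getValue();
    }

    // Number of opens in the half-open range (start, end].
    long count(String url, LocalTime start, LocalTime end) {
        TreeMap<LocalTime, Long> perUrl = cumulative.get(url);
        if (perUrl == null) {
            return 0;
        }
        return cumulativeAt(perUrl, end) - cumulativeAt(perUrl, start);
    }
}
```

With cumulative totals of 3, 4 and 11 at 11:00, 11:15 and 11:30, count("Amazon", 11:10, 11:37) returns 11 - 3 = 8. Each query costs two O(log n) lookups; memory still grows with the number of distinct timestamps per URL, so this does not by itself answer the storage side of the challenge.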