How to sum up Redis keys from entire week? - ruby

I have the following Redis keys:
REDIS.del "weekly:activity"
REDIS.del "2013-02-27:activity"
REDIS.del "2013-02-28:activity"
REDIS.sadd "2013-02-27:activity", 1
REDIS.sadd "2013-02-27:activity", 2
REDIS.sadd "2013-02-27:activity", 3
REDIS.sadd "2013-02-28:activity", 4
REDIS.sadd "2013-02-28:activity", 1
REDIS.sadd "2013-02-28:activity", 1
REDIS.sadd "2013-02-28:activity", 6
REDIS.sunionstore "weekly:activity", "2013-02-27:activity", "2013-02-28:activity"
REDIS.scard "weekly:activity"
What is the best way to determine the first day of the current week and sum the stats for that week?
Can I do that using Redis?
Or should I do it in Ruby?

Instead of summing regularly, I suggest you collect more data and create keys for weeks and months as well.
So every time a user logs in, you:
REDIS.incr "{today}:activity"
REDIS.incr "{this week}:weekly:activity"
REDIS.incr "{this month}:monthly:activity"
This way, there is no reporting to do. I'll leave it up to you to compute the keys for this week and this month.
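For the part left open above, here is a minimal Ruby sketch of how the day, week and month keys could be derived. The helper names (week_start, activity_keys, week_to_date_keys) and the "YYYY-Www" / "YYYY-MM" key prefixes are assumptions made for illustration, and the week is taken to start on Monday (ISO week). No Redis connection is needed to run it; it only builds key names.

```ruby
require 'date'

# Monday of the ISO week containing `date` (cwday: Monday = 1 .. Sunday = 7).
def week_start(date)
  date - (date.cwday - 1)
end

# Key names for the day, ISO week and month of `date`.
# The week and month prefixes are an assumed naming scheme, not from the question.
def activity_keys(date)
  {
    day:   "#{date.strftime('%Y-%m-%d')}:activity",
    week:  format('%04d-W%02d:weekly:activity', date.cwyear, date.cweek),
    month: "#{date.strftime('%Y-%m')}:monthly:activity"
  }
end

# Daily keys from Monday up to and including `date`.
def week_to_date_keys(date)
  (week_start(date)..date).map { |d| "#{d.strftime('%Y-%m-%d')}:activity" }
end

today = Date.new(2013, 2, 28)
p activity_keys(today)
p week_to_date_keys(today)
# With the daily sets from the question, the weekly union could then be built as:
#   REDIS.sunionstore "weekly:activity", *week_to_date_keys(today)
```

If you keep the daily sets from the question rather than counters, week_to_date_keys gives the list of keys to pass to sunionstore, so the first day of the week is worked out in Ruby and the summing stays in Redis.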

Related

Algorithm to calculate a date for complex occupation management

Hello fellow Stack Overflowers,
I have a situation where I need some help choosing the best way to make an algorithm work. The objective is to manage the occupation of a resource (let's call it resource A) that has multiple tasks, where each task takes a specified amount of time to complete. At this first stage I don't want to involve multiple variables, so let's keep it simple and consider only a schedule of working days.
For example:
1 - We have 1 resource, resource A
2 - Resource A works from 8 am to 4 pm, Monday to Friday. To keep it simple for now, he has no lunch break, so that is 8 hours of work a day.
3 - Resource A has 5 tasks to complete. To avoid complexity at this level, let's suppose each one takes exactly 10 hours to complete.
4 - Resource A will start working on these tasks on 2018-05-16 at exactly 2 pm.
Problem:
Now, all I need to know is the correct finish date for all 5 tasks, considering all the previous limitations.
In this case, he needs 6 full working days plus 2 hours of a 7th day.
The expected result that I want would be: 2018-05-24 (at 4 pm).
Implementation:
I thought about 2 options and would like feedback on them, or on other options that I might not be considering.
Algorithm 1
1 - Create a list of "slots", where each "slot" would represent 1 hour, for x days.
2 - Cross this list of slots with the hour schedule of the resource, to remove all the slots where the resource isn't here. This would return a list with the slots that he can actually work.
3 - Occupy the remaining slots with the tasks that I have for him.
4 - Finally, check the date/hour of the last occupied slot.
Disadvantage: I think this might be overkill, since I don't need to track his future occupation; all I want to know is when the tasks will be completed.
Algorithm 2
1 - Add the task hours (50 hours) to the starting date to get the expectedFinishDate. (This gives expectedFinishDate = 2018-05-18 at 4 pm.)
2 - Cross the hours between the starting date and expectedFinishDate with the schedule to get the number of hours he won't work. (This basically gives the unavailable hours, 16 hours a day, resulting in remainingHoursForCalc = 32 hours.)
3 - Calculate a new expectedFinishDate from the unavailable hours, adding these 32 hours to the previous 2018-05-18 (at 4 pm).
4 - Repeat points 2 and 3 with the new expectedFinishDate until remainingHoursForCalc = 0.
Disadvantage: This would result in a recursive method or a very weird while loop; again, I think this might be overkill for calculating a simple date.
What would you suggest? Is there any other option I might not be considering that would make this simpler? Or do you think there is a way to improve either of these 2 algorithms to make it work?
Improved version:
import java.util.Calendar;
import java.util.Date;
public class Main {
    public static void main(String args[]) throws Exception
    {
        Date d=new Date();
        System.out.println(d);
        d.setMinutes(0);
        d.setSeconds(0);
        d.setHours(13);
        Calendar c=Calendar.getInstance();
        c.setTime(d);
        c.set(Calendar.YEAR, 2018);
        c.set(Calendar.MONTH, Calendar.MAY);
        c.set(Calendar.DAY_OF_MONTH, 17);
        //c.add(Calendar.HOUR, -24-5);
        d=c.getTime();
        //int workHours=11;
        int hoursArray[] = {1,2,3,4,5, 10,11,12, 19,20, 40};
        for(int workHours : hoursArray)
        {
            try
            {
                Date end=getEndOfTask(d, workHours);
                System.out.println("a task starting at "+d+" and lasting "+workHours
                    + " hours will end at " +end);
            }
            catch(Exception e)
            {
                System.out.println(e.getMessage());
            }
        }
    }
    public static Date getEndOfTask(Date startOfTask, int workingHours) throws Exception
    {
        int totalHours=0;//including non-working hours
        //startOfTask +totalHours =endOfTask
        int startHour=startOfTask.getHours();
        if(startHour<8 || startHour>16)
            throw new Exception("a task cannot start outside the working hours interval");
        System.out.println("startHour="+startHour);
        int startDayOfWeek=startOfTask.getDay();//start date's day of week; Wednesday=3
        System.out.println("startDayOfWeek="+startDayOfWeek);
        if(startDayOfWeek==6 || startDayOfWeek==0)
            throw new Exception("a task cannot start on Saturdays or Sundays");
        int remainingHoursUntilDayEnd=16-startHour;
        System.out.println("remainingHoursUntilDayEnd="+remainingHoursUntilDayEnd);
        /*some discussion here: if task starts at 12:30, we have 3h30min
         * until the end of the program; however, getHours() will return 12, which
         * subtracted from 16 will give 4h. It will work fine if task starts at 12:00,
         * or, generally, at the beginning of the hour; let's assume a task will start at HH:00*/
        int remainingDaysUntilWeekEnd=5-startDayOfWeek;
        System.out.println("remainingDaysUntilWeekEnd="+remainingDaysUntilWeekEnd);
        int completeWorkDays = (workingHours-remainingHoursUntilDayEnd)/8;
        System.out.println("completeWorkDays="+completeWorkDays);
        //excluding both the start day, and the end day, if they are not fully occupied by the task
        int workingHoursLastDay=(workingHours-remainingHoursUntilDayEnd)%8;
        System.out.println("workingHoursLastDay="+workingHoursLastDay);
        /* workingHours=remainingHoursUntilDayEnd+(8*completeWorkDays)+workingHoursLastDay */
        int numberOfWeekends=(int)Math.ceil( (completeWorkDays-remainingDaysUntilWeekEnd)/5.0 );
        if((completeWorkDays-remainingDaysUntilWeekEnd)%5==0)
        {
            if(workingHoursLastDay>0)
            {
                numberOfWeekends++;
            }
        }
        System.out.println("numberOfWeekends="+numberOfWeekends);
        totalHours+=(int)Math.min(remainingHoursUntilDayEnd, workingHours);//covers the case
        //when task lasts 1 or 2 hours, and we have maybe 4h until end of day; that's why i use Math.min
        if(completeWorkDays>0 || workingHoursLastDay>0)
        {
            totalHours+=8;//the hours of the current day between 16:00 and 24:00
            //it might be the case that completeWorkDays is 0, yet the task spans up to tomorrow
            //so we still have to add these 8h
        }
        if(completeWorkDays>0)//redundant if, because 24*0=0
        {
            totalHours+=24*completeWorkDays;//for every 8 working h, we have a total of 24 h that have
            //to be added to the date
        }
        if(workingHoursLastDay>0)
        {
            totalHours+=8;//the hours between 00.00 AM and 8 AM
            totalHours+=workingHoursLastDay;
        }
        if(numberOfWeekends>0)
        {
            totalHours+=48*numberOfWeekends;//every weekend between start and end dates means two days
        }
        System.out.println("totalHours="+totalHours);
        Calendar calendar=Calendar.getInstance();
        calendar.setTime(startOfTask);
        calendar.add(Calendar.HOUR, totalHours);
        return calendar.getTime();
    }
}
You may adjust hoursArray[], or d.setHours along with c.set(Calendar.DAY_OF_MONTH, ...), to test various start dates and task lengths.
There is still a bug, due to the addition of the 8 hours between 16:00 and 24:00:
a task starting at Thu May 17 13:00:00 EEST 2018 and lasting 11 hours will end at Sat May 19 00:00:00 EEST 2018 (the correct end would be Fri May 18 at 16:00).
I've kept a lot of print statements, they are useful for debugging purposes.
Here is the terminology explained:
I agree that algorithm 1 is overkill.
I would first make sure I had the conditions right: hours per day (8) and working days (Mon, Tue, Wed, Thu, Fri). I would then divide the hours required (5 * 10 = 50) by the hours per day to get the minimum number of working days needed (50 / 8 = 6 full days, remainder 2 hours). Slightly more advanced: divide by hours per week first (50 / 40 = 1 week, remainder 10 hours). Count working days from the start date to get a first estimate of the end date. If the division left a remainder, use it to determine whether the tasks can end on that day or run into the next working day.
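As a rough illustration of counting working days forward from the start date, here is a small Ruby sketch (Ruby rather than the Java used above). It is a plain day-by-day loop, not either of the two algorithms from the question; the name finish_time and the constants are made up for this example, and it assumes an 8:00 to 16:00 schedule, Monday to Friday, with all times in UTC so that daylight saving changes do not interfere.

```ruby
WORK_START_HOUR = 8   # assumed start of the working day
WORK_END_HOUR = 16    # assumed end of the working day
SECONDS_PER_HOUR = 3600

# Returns the Time at which `task_hours` of work, begun at `start`, is finished.
def finish_time(start, task_hours)
  remaining = task_hours * SECONDS_PER_HOUR
  current = start
  loop do
    day_start = Time.utc(current.year, current.month, current.day, WORK_START_HOUR)
    day_end   = Time.utc(current.year, current.month, current.day, WORK_END_HOUR)
    # Only Monday (1) to Friday (5) count, and only before closing time.
    if (1..5).cover?(current.wday) && current < day_end
      current = day_start if current < day_start
      available = (day_end - current).round
      return current + remaining if remaining <= available
      remaining -= available
    end
    # Nothing (more) can be done today: jump to 8:00 on the next calendar day.
    current = day_start + 24 * SECONDS_PER_HOUR
  end
end

puts finish_time(Time.utc(2018, 5, 16, 14), 50)  # => 2018-05-24 16:00:00 UTC
puts finish_time(Time.utc(2018, 5, 17, 13), 11)  # => 2018-05-18 16:00:00 UTC
```

The loop runs once per calendar day, so for a 50-hour workload it iterates only nine times. The second call is the case that trips up the closed-form calculation above.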

Stata: Deleting duplicates based on dates

My dataset consists of a number of variables:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(v1 v2) str11 Date float(v4 v5 v6 v7 v8)
1 2 "15-aug-2016" 1 1 1 1 1
1 2 "07-may-2015" 1 1 1 1 50
1 2 "07-may-2015" 1 1 1 1 88
1 2 "15-aug-2016" 1 1 1 1 29
end
The variable date is a date and time and is formatted as a datetime
generate double date = date(Date,"DMY")
My duplicates are the same for v1-v2-v4-v5-v6-v7 (as in the example), while v8 is different.
I need to delete duplicates based on v1-v2-v4-v5-v6-v7 and keep the one with the smallest date (here 07-may-2015).
I have tried without success:
1.
gsort -date
bysort v1 v2 v4 v5 v6 v7: generate dublet=_n
order dublet date
keep if dublet==1
drop dublet
--> Works for the first 25 rows or so, then keeps the wrong one a couple of times and then the right one again. (It seems to me that the bysort command undoes the sort done by gsort. Does anyone know if that's correct?)
2.
bysort v1 v2 v4 v5 v6 v7 (date) : keep if _n == _N
--> Obviously keeps the wrong one, since the sort is ascending on date, not descending (-date).
However, -date is not an option; Stata reports: invalid name
You could change your second attempt to bysort v1 v2 v4 v5 v6 v7 (date) : keep if _n == 1 and that should give you what you're looking for: (date) sorts ascending within each group, so the first observation has the smallest date.
Since your data example contains duplicate dates (2 observations are 7 May 2015), you will get an arbitrary one of the observations with the minimum date.
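To spell out the logic of "group on the identifying variables, keep the row with the smallest date" outside Stata, here is a small Ruby sketch using the four example observations. It only illustrates the idea and does not replace the bysort command; the hash layout and the KEY_VARS constant are assumptions made for the example.

```ruby
require 'date'

# The four example observations, with the date already parsed.
rows = [
  { v1: 1, v2: 2, date: Date.new(2016, 8, 15), v4: 1, v5: 1, v6: 1, v7: 1, v8: 1 },
  { v1: 1, v2: 2, date: Date.new(2015, 5, 7),  v4: 1, v5: 1, v6: 1, v7: 1, v8: 50 },
  { v1: 1, v2: 2, date: Date.new(2015, 5, 7),  v4: 1, v5: 1, v6: 1, v7: 1, v8: 88 },
  { v1: 1, v2: 2, date: Date.new(2016, 8, 15), v4: 1, v5: 1, v6: 1, v7: 1, v8: 29 }
]

# Variables that define a duplicate group (v8 is deliberately left out).
KEY_VARS = %i[v1 v2 v4 v5 v6 v7].freeze

# One row per group: the one with the earliest date.
kept = rows.group_by { |row| row.values_at(*KEY_VARS) }
           .map { |_key, group| group.min_by { |row| row[:date] } }

kept.each { |row| puts row }
```

As in the Stata version, two rows tie on the minimum date here, so which v8 survives depends on how ties are broken; add v8 (or another variable) to the ordering if you need a deterministic choice.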

Oracle Time Difference over a repeating set

I am sure this is an easy problem to solve, but I cannot see how, no matter which way I look at it.
I am using an Oracle database and I am trying to report on processing times. The system runs a process that picks up incoming files and processes them into the database. As it performs this task it logs when each of the different sections happens and from which source. I've filtered the data to give me just the start and end times, as that is all I am currently interested in. Example data:
5 15/MAY/15 00:37:01 Started
5 15/MAY/15 00:50:45 Finished
5 16/MAY/15 02:07:41 Started
5 16/MAY/15 02:19:16 Finished
5 16/MAY/15 23:20:25 Started
5 16/MAY/15 23:28:53 Finished
5 17/MAY/15 23:16:36 Started
5 17/MAY/15 23:27:51 Finished
5 18/MAY/15 23:31:28 Started
5 18/MAY/15 23:47:41 Finished
5 19/MAY/15 23:44:12 Started
5 20/MAY/15 00:06:17 Finished
5 20/MAY/15 23:33:42 Started
5 20/MAY/15 23:58:16 Finished
What I am trying to get is the duration for each of the "sets" (start time to end time). When there is only a single occurrence on a day it's easy; however, I am struggling with days such as the 16th, where there are two sets in a single day, or where a set starts on the 19th and ends on the 20th. I know this would be possible in a programming language, but I am sure it must also be possible in Oracle.
The output I'd expect from the above is:
Source StartTime EndTime
5 15/MAY/15 00:37:01 15/MAY/15 00:50:45
5 16/MAY/15 02:07:41 16/MAY/15 02:19:16
5 16/MAY/15 23:20:25 16/MAY/15 23:28:53
5 17/MAY/15 23:16:36 17/MAY/15 23:27:51
5 18/MAY/15 23:31:28 18/MAY/15 23:47:41
5 19/MAY/15 23:44:12 20/MAY/15 00:06:17
5 20/MAY/15 23:33:42 20/MAY/15 23:58:16
Thanks,
As Patrick Bacon mentioned in a comment, you can use the lead and lag analytic functions to peek ahead of and behind each row. If the row you are looking at is 'Started' then you need to peek at the next row (chronologically, for the same source) to get the matching 'Finished' row, using lead. Conversely, if the row you are looking at is 'Finished' then you need to peek at the previous row to get the matching 'Started' row, using lag:
select distinct source,
  case when action = 'Started' then time
    else lag(time) over (partition by source order by time) end as starttime,
  case when action = 'Finished' then time
    else lead(time) over (partition by source order by time) end as endtime
from t
order by source, starttime;
SOURCE STARTTIME ENDTIME
---------- ------------------- -------------------
5 2015-05-15 00:37:01 2015-05-15 00:50:45
5 2015-05-16 02:07:41 2015-05-16 02:19:16
5 2015-05-16 23:20:25 2015-05-16 23:28:53
5 2015-05-17 23:16:36 2015-05-17 23:27:51
5 2015-05-18 23:31:28 2015-05-18 23:47:41
5 2015-05-19 23:44:12 2015-05-20 00:06:17
5 2015-05-20 23:33:42 2015-05-20 23:58:16
Because you're looking ahead and behind, you end up with duplicate pairs of data; here I've used distinct to squash those duplicates, but you could also use this as a subquery and filter the results.
SQL Fiddle with a CTE to provide your sample data.
Solution with function lead():
select tsource, starttime, endtime
from (
  select tsource, ttime starttime, status,
    lead(ttime) over (partition by tsource order by ttime) endtime
  from test)
where status = 'Started'
SQLFiddle
Edit:
If it can happen that two rows with status Started occur consecutively without a Finished between them,
then you need some protection against that situation. For instance, this version displays null:
select tsource, starttime, endtime
from (
  select tsource, ttime starttime, status,
    case when lead(status) over (partition by tsource order by ttime) = 'Finished'
      then lead(ttime) over (partition by tsource order by ttime)
      else null
    end endtime
  from test)
where status = 'Started'
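Since the question mentions that this would also be possible in a programming language, here is a hedged Ruby sketch of the same "look at the next row" idea, including the guard for a Started row that has no matching Finished row. The Event struct and the pair_runs name are invented for the example; it also computes the duration in seconds, which the queries above leave to the caller.

```ruby
# One log row: source id, timestamp and action ('Started' or 'Finished').
Event = Struct.new(:source, :time, :action)

# Pairs each 'Started' row with the row that follows it for the same source,
# mirroring lead(); the finish time is nil when the next row is not 'Finished'.
def pair_runs(events)
  events.group_by(&:source).flat_map do |source, rows|
    sorted = rows.sort_by(&:time)
    starts = sorted.each_with_index.select { |row, _index| row.action == 'Started' }
    starts.map do |row, index|
      following = sorted[index + 1]
      finish = following.time if following && following.action == 'Finished'
      { source: source, start: row.time, finish: finish,
        seconds: finish && (finish - row.time).round }
    end
  end
end

sample = [
  Event.new(5, Time.utc(2015, 5, 19, 23, 44, 12), 'Started'),
  Event.new(5, Time.utc(2015, 5, 15, 0, 37, 1), 'Started'),
  Event.new(5, Time.utc(2015, 5, 20, 0, 6, 17), 'Finished'),
  Event.new(5, Time.utc(2015, 5, 15, 0, 50, 45), 'Finished')
]
pair_runs(sample).each { |run| puts run }
```

Because the rows are ordered by timestamp rather than grouped by calendar day, a run that starts on the 19th and finishes on the 20th is handled the same way as any other.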

Determining if a bi-weekly schedule matches a given date

I'm creating multiple Schedule objects, which have a started_at datetime which begins on Mondays.
I have Location objects which have a visit_frequency. Some of these are set to :bi_weekly, in which case I only need to see them every other week.
However, things don't always go according to plan and sometimes Locations are visited more or less often than they need to be.
Right now I'm doing
Location.all.each do |location|
  ...
  elsif location.frequency.rate == 'biweekly'
    if (@schedule.start_date - location.last_visit_date) > 7
      schedule_for_week location
    end
The problem is that if I make a Schedule more than 7 days from now, the difference from the Location's last_visit_date will ALWAYS be > 7 days. I need to calculate whether it falls into a bi-weekly rate.
Example:
Location 1 visit_frequency set to :bi_weekly
Location 1 is visited on Week 1
Week 2 Schedule Generated -- Location 1 is left out because the last visit is within 7 days
Week 3 Schedule Generated -- Location 1 is included because the last visit is more than 7 days ago
Week 4 Schedule Generated -- Location 1 is included because the last visit is still more than 7 days ago
The last line should not have happened. Location 1 should not be included because it was visited on Week 1 and scheduled for Week 3.
How can I succinctly calculate whether a week falls within a bi-weekly frequency? I'm guessing I need to use beginning_of_week?
As I understand your question, I believe this would do it:
require 'date'
def schedule?(sched_start_date, last_visit_date)
  (sched_start_date - last_visit_date) % 14 > 7
end
sched_start_date = Date.parse("2014-12-29")
#=> #<Date: 2014-12-29 ((2457021j,0s,0n),+0s,2299161j)> a Monday
schedule?(sched_start_date, Date.parse("2014-12-04")) #=> true
schedule?(sched_start_date, Date.parse("2014-12-14")) #=> false
schedule?(sched_start_date, Date.parse("2014-12-20")) #=> true
schedule?(sched_start_date, Date.parse("2014-12-23")) #=> false
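Following the question's guess about beginning_of_week, a variant (a different technique from the modulo check above, offered only as a sketch) is to snap both dates to the Monday of their week and count whole weeks between them. The names week_start and biweekly_due? are made up here, weeks are assumed to start on Monday, and plain Ruby Date is used instead of ActiveSupport's beginning_of_week.

```ruby
require 'date'

# Monday of the week containing `date` (cwday: Monday = 1 .. Sunday = 7).
def week_start(date)
  date - (date.cwday - 1)
end

# True when the schedule's week is an even, non-zero number of weeks
# after the week of the last visit.
def biweekly_due?(sched_start_date, last_visit_date)
  weeks_apart = (week_start(sched_start_date) - week_start(last_visit_date)).to_i / 7
  weeks_apart.positive? && weeks_apart.even?
end

sched_start_date = Date.new(2014, 12, 29) # a Monday
p biweekly_due?(sched_start_date, Date.new(2014, 12, 4))  #=> true
p biweekly_due?(sched_start_date, Date.new(2014, 12, 14)) #=> false
p biweekly_due?(sched_start_date, Date.new(2014, 12, 20)) #=> true
p biweekly_due?(sched_start_date, Date.new(2014, 12, 23)) #=> false
```

For these four dates it agrees with schedule? above; the difference is that the result depends only on which week the visit fell in, not on the weekday of the visit.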

Combining data from multiple tuples in one bag in Pig

I am trying to parse a bunch of log data using pig. Unfortunately the data for one command is spread across multiple lines (an audit log). I know that there is an id that correlates all of the log messages and that there are different types that contain pieces of the whole, but I am unsure how to gather them all into one message.
I split the messages based on type and then joined them on the id, but since there is a one-to-many relationship between SYSCALL and PATH, this doesn't gather all of the information on one line. I can group by id, but then I want to pull out the same field (name) from every PATH tuple, and I don't know of any way to do that.
Should I just write my own UDF? A FOREACH doesn't keep track of state such that I can concatenate the name field from each tuple.
Edited to add example:
{"message":"Jan 6 15:30:11 r01sv06 auditd: node=r01sv06 type=SYSCALL
msg=audit(1389047402.069:4455727): arch=c000003e syscall=59
success=yes exit=0 a0=7fff8ef30600 a1=7fff8ef30630 a2=270f950
a3=fffffffffffffff0 items=2 ppid=1493 pid=1685 auid=0 uid=0 gid=0
euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=8917
comm=\"ip\" exe=\"/sbin/ip\"
key=\"command\"","@timestamp":"2014-01-06T22:30:14.642Z","@version":"1","type":"audit","host":"r01sv09a","path":"/data/logs/audit.log","syslog_timestamp":"Jan
6 15:30:11","syslog_program":"auditd","received_at":"2014-01-06
22:30:14 UTC", "received_from":"r01sv06" ,"syslog_severity_code":5
,"syslog_facility_code":1
,"syslog_facility":"user-level","syslog_severity":"notice","@source_host":"r01sv06"}
{"message":"Jan 6 15:30:11 r01sv06 auditd: node=r01sv06 type=EXECVE
msg=audit(1389047402.069:4455727): argc=4 a0=\"/sbin/ip\" a1=\"link\"
a2=\"show\"
a3=\"lo\"","@timestamp":"2014-01-06T22:30:14.643Z","@version":"1","type":"audit","host":"r01sv09a","path":"/data/logs/audit.log","syslog_timestamp":"Jan
6 15:30:11","syslog_program":"auditd","received_at":"2014-01-06
22:30:14 UTC", "received_from":"r01sv06", "syslog_severity_code":5,
"syslog_facility_code":1,"syslog_facility":"user-level",
"syslog_severity":"notice","@source_host":"r01sv06"}
{"message":"Jan 6 15:30:11 r01sv06 auditd: node=r01sv06 type=CWD
msg=audit(1389047402.069:4455727):
cwd=\"/root\"","@timestamp":"2014-01-06T22:30:14.644Z","@version":"1","type":"audit","host":"r01sv09a","path":"/data/logs/audit.log","syslog_timestamp":"Jan
6 15:30:11","syslog_program":"auditd","received_at":"2014-01-06
22:30:14 UTC","received_from":"r01sv06", "syslog_severity_code":5,
"syslog_facility_code":1, "syslog_facility":"user-level",
"syslog_severity":"notice", "@source_host":"r01sv06"}
{"message":"Jan 6 15:30:11 r01sv06 auditd: node=r01sv06 type=PATH
msg=audit(1389047402.069:4455727): item=0 name=\"/sbin/ip\"
inode=1703996 dev=08:02 mode=0100755 ouid=0 ogid=0
rdev=00:00","@timestamp":"2014-01-06T22:30:14.645Z","@version":"1","type":"audit","host":"r01sv09a","path":"/data/logs/audit.log","syslog_timestamp":"Jan
6 15:30:11","syslog_program":"auditd","received_at":"2014-01-06
22:30:14 UTC", "received_from":"r01sv06", "syslog_severity_code":5,
"syslog_facility_code":1,"syslog_facility":"user-level",
"syslog_severity":"notice", "@source_host":"r01sv06",}
