MDX query: count login occurrences per time interval

I'm puzzled as to how to build my fact and dimension tables to produce the following result:
I want to count the number of people logged in during each time interval,
in this case every 30 minutes. It would look like this:
Example: Person1 login at 10:05:00 and logout at 12:10:00
Person2 login at 10:45:00 and logout at 11:25:00
Person3 login at 11:05:00 and logout at 14:01:00
TimeStart TimeEnd People logged
00:00:00 00:30:00 0
00:30:00 01:00:00 0
...
10:00:00 10:30:00 1
10:30:00 11:00:00 2
11:00:00 11:30:00 3
11:30:00 12:00:00 2
12:00:00 12:30:00 2
12:30:00 13:00:00 1
13:00:00 13:30:00 1
13:30:00 14:00:00 1
14:00:00 14:30:00 0
...
23:30:00 00:00:00 0
So I have DimTime and DimDate tables that contain hour, half hour and quarter hour,
and I have a FactTimestamp table that has the following:
DateLoginID that points to DimDate dateID
DateLogoutID that points to DimDate dateID
TimeLoginID that points to DimTime timeID
TimeLogoutID that points to DimTime timeID
I'd like to know what kind of cube design I would need to achieve that.
I've done it in SQL, if that helps:
--Create tmp table for time interval
CREATE TABLE #tmp(
StartRange time(0),
EndRange time(0),
);
--Interval set to 30 minutes
DECLARE @Interval int = 30
-- Example with @Date = 2017-07-27: Set starttime at 2017-07-27 00:00:00
DECLARE @StartTime datetime = DATEADD(HOUR,0, @Date)
--Set endtime at 2017-07-27 23:59:59
DECLARE @EndTime datetime = DATEADD(SECOND,59,DATEADD(MINUTE,59,DATEADD(HOUR,23, @Date)))
--Populate tmp table with the time interval. from midnight to 23:59:59
;WITH cSequence AS
(
SELECT
@StartTime AS StartRange,
DATEADD(MINUTE, @Interval, @StartTime) AS EndRange
UNION ALL
SELECT
EndRange,
DATEADD(MINUTE, @Interval, EndRange)
FROM cSequence
WHERE DATEADD(MINUTE, @Interval, EndRange) <= @EndTime
)
INSERT INTO #tmp SELECT cast(StartRange as time(0)),cast(EndRange as time(0)) FROM cSequence OPTION (MAXRECURSION 0);
--Insert last record 23:30:00 to 23:59:59
INSERT INTO #tmp (StartRange, EndRange) values ('23:30:00','23:59:59');
SELECT tmp.StartRange as [Interval], COUNT(ts.TimeIn) as [Operators]
FROM #tmp tmp
JOIN Timestamp ts ON
--If timeIn is earlier than StartRange OR within the start/end range
(CAST(ts.TimeIn as time(0)) <= tmp.StartRange OR CAST(ts.TimeIn as time(0)) BETWEEN tmp.StartRange AND tmp.EndRange)
AND
--AND If timeOut is later than EndRange OR within the start/end range
(CAST(ts.[TimeOut] as time(0)) >= tmp.EndRange OR CAST(ts.[TimeOut] as time(0)) BETWEEN tmp.StartRange AND tmp.EndRange)
GROUP BY tmp.StartRange, tmp.EndRange
Really any kind of hint as to how to achieve it in mdx would be greatly appreciated.

Honestly, I wouldn't do it in MDX against that table structure. Even if you succeed in writing an MDX query that returns that value, and it surely can be done, it will most likely be complex, hard to maintain and debug, and will probably require multiple passes over the fact table to get the numbers, hurting performance.
I think this is a clear-cut case for a periodic snapshot table. Pick your granularity, but even at 1-minute snapshots you get 1440 data points per day for each combination of all the other dimensions. If your login/logout table is large, you may need a coarser granularity to keep the snapshot table's size manageable. In the end, you get a table with time_id, count_of_logins, and whatever keys you need to other dimensions. The query you need is then just a filter on the time periods you want (give me all hours of the day, but only minutes 00 and 30 of each hour), and the count of logged-in users is trivial.
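To make the snapshot idea concrete, here is a minimal sketch in Python rather than SQL or MDX. It builds a 1-minute snapshot from the three sessions in the question and then filters it to minutes 00 and 30. The calendar date, the names `build_snapshot` and `half_hourly`, and the rule that a user counts as logged in from login (inclusive) to logout (exclusive) are assumptions made for illustration. Note that a snapshot is a point-in-time count, so its numbers differ from a count of every session that overlaps an interval.

```python
from datetime import datetime, timedelta

# Sessions from the question's example; the calendar date is an arbitrary assumption.
DAY = datetime(2017, 7, 27)
SESSIONS = [
    (DAY.replace(hour=10, minute=5), DAY.replace(hour=12, minute=10)),   # Person1
    (DAY.replace(hour=10, minute=45), DAY.replace(hour=11, minute=25)),  # Person2
    (DAY.replace(hour=11, minute=5), DAY.replace(hour=14, minute=1)),    # Person3
]


def build_snapshot(sessions, day, step_minutes=1):
    """Map each snapshot time of `day` to the number of users logged in at that instant."""
    snapshot = {}
    t = day
    end = day + timedelta(days=1)
    while t < end:
        # Assumption: logged in from login (inclusive) to logout (exclusive).
        snapshot[t] = sum(1 for login, logout in sessions if login <= t < logout)
        t += timedelta(minutes=step_minutes)
    return snapshot


snapshot = build_snapshot(SESSIONS, DAY)  # one row per minute of the day

# The "query": keep only the snapshots taken at minute 00 and 30 of each hour.
half_hourly = {t: n for t, n in snapshot.items() if t.minute in (0, 30)}

for t, n in half_hourly.items():
    if n:
        print(t.strftime("%H:%M"), n)
```

In a warehouse the snapshot would be materialized once by the ETL process, so the reporting query never has to touch the login/logout fact table.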


How Oracle internally deduces the difference between dates

select (current_date - TO_DATE('20210817124015','YYYYMMDDHH24MISS')) from dual;
Outputs:
0.1229282407407407407407407407407407407407
I want to know how Oracle internally arrives at this value.
PS: current_date and the hard-coded date fall on the same day; only the time differs.
CURRENT_DATE returns the current date and time in the user's session time zone.
TO_DATE('20210817124015','YYYYMMDDHH24MISS') returns the date 2021-08-17T12:40:15.
Note: A DATE data type always has year, month, day, hour, minute and second components. However, the user interface you are using may choose not to show all the components.
Subtracting one date from another returns the number of days between the two values.
0.1229282407407407407407407407407407407407 days is:
2.950277778 hours; or
177.016666667 minutes; or
10621 seconds; or
2 hours 57 minutes and 1 second.
So your current date was 2021-08-17T12:40:15 + 10621 seconds or 2021-08-17T15:37:16.
For example:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD"T"HH24:MI:SS';
ALTER SESSION SET TIME_ZONE = 'Asia/Samarkand';
SELECT CURRENT_DATE,
TO_DATE('20210817124015','YYYYMMDDHH24MISS') As other_date,
CURRENT_DATE - TO_DATE('20210817124015','YYYYMMDDHH24MISS') as difference,
(CURRENT_DATE - TO_DATE('20210817124015','YYYYMMDDHH24MISS')) DAY TO SECOND
as interval_difference
FROM DUAL;
Outputs:
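CURRENT_DATE         OTHER_DATE           DIFFERENCE                                INTERVAL_DIFFERENCE
-------------------  -------------------  ----------------------------------------  -------------------
2021-08-17T15:40:01  2021-08-17T12:40:15  .124837962962962962962962962962962962963  +00 02:59:46.000000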
Subtracting two dates returns a difference in days.
0.1229282407407407407407407407407407407407 days is
2.9502777777768 hours
177.016666666608 minutes
10621 seconds
Or, put another way, current_date is returning a date value that is 2 hours 57 minutes and 1 second after the hard-coded date. Since the hard-coded date has a time of 12:40:15, that means that current_date has a time of 15:37:16.
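The conversion from fractional days to hours, minutes and seconds can be double-checked outside the database. The sketch below is plain Python arithmetic, not a description of Oracle's internal implementation; it assumes only that the result of the subtraction is a number of days, as both answers state.

```python
from datetime import datetime, timedelta
from decimal import Decimal

SECONDS_PER_DAY = 24 * 60 * 60  # 86400

# The value returned by the query in the question, interpreted as days.
days = Decimal("0.1229282407407407407407407407407407407407")

# The printed fraction is truncated, so round to the nearest whole second.
seconds = round(days * SECONDS_PER_DAY)
hours, rest = divmod(seconds, 3600)
minutes, secs = divmod(rest, 60)

other_date = datetime(2021, 8, 17, 12, 40, 15)
current_date = other_date + timedelta(seconds=seconds)

print(seconds, "seconds =", hours, "h", minutes, "min", secs, "s")
print("current_date was", current_date.isoformat())
```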

Oracle Archive and Purge Options

I am trying to figure out what are the best options to perform archive and purge given our situation.
We have roughly 50 million records in, say, Table A. We want to archive data into a target table and then purge that data from the source table. We would like to retain data based on several criteria that overlap with each other. For example, we want to retain the data from the past 5 months and also keep all records with, say, Indicator='True'. Indicator='True' will likely return records older than 5 months, which means I have to use an OR condition to capture the data. Based on these conditions, we would need to retain 10 million records and archive/purge 40 million records. I would need to create a process that runs every 6 months to do this.
My question is, what are the most efficient options for me to get this done for both archiving and purging? Would a PROC/bulk delete/insert be my best option?
Partition seems to be out of the question since there are several conditions that overlap with each other.
Use composite partitioning, e.g. range (for your time dimension) and list (to distinguish between the rows that should be kept long-term and those kept for a limited time).
Example
The rows with KEEP_ID='N' should be eliminated after 5 months.
CREATE TABLE tab
( id NUMBER(38,0),
trans_dt DATE,
keep_id VARCHAR2(1)
)
PARTITION BY RANGE (trans_dt) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
SUBPARTITION BY LIST (keep_id)
SUBPARTITION TEMPLATE
( SUBPARTITION p_catalog VALUES ('Y'),
SUBPARTITION p_internet VALUES ('N')
)
(PARTITION p_init VALUES LESS THAN (TO_DATE('01-JAN-2019','dd-MON-yyyy'))
);
Populate with sample data for 6 months
insert into tab (id, trans_dt, keep_id)
select rownum, add_months(date'2019-08-01', trunc((rownum-1) / 2)), decode(mod(rownum,2),0,'Y','N')
from dual connect by level <= 12;
select * from tab
order by trans_dt, keep_id;
ID TRANS_DT KEEP_ID
---------- ------------------- -------
1 01.08.2019 00:00:00 N --- this subpartition should be deleted
2 01.08.2019 00:00:00 Y
3 01.09.2019 00:00:00 N
4 01.09.2019 00:00:00 Y
5 01.10.2019 00:00:00 N
6 01.10.2019 00:00:00 Y
7 01.11.2019 00:00:00 N
8 01.11.2019 00:00:00 Y
9 01.12.2019 00:00:00 N
10 01.12.2019 00:00:00 Y
11 01.01.2020 00:00:00 N
12 01.01.2020 00:00:00 Y
Now use partition extended names to reference the subpartition that should be dropped.
Drop subpartition older than 5 months, but only for KEEP_ID = 'N'
alter table tab drop subpartition for (DATE'2019-08-01','N');
New data
ID TRANS_DT KEEP_ID
---------- ------------------- -------
2 01.08.2019 00:00:00 Y
3 01.09.2019 00:00:00 N
4 01.09.2019 00:00:00 Y
.....
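Since the purge has to run on a schedule, the drop statements could be generated rather than typed by hand. The following is a minimal sketch in Python that only builds the statement text in the form used above; it does not connect to a database. The helper names `month_start_minus` and `drop_statements`, the `keep_months` parameter, and the idea of passing in the oldest month explicitly are assumptions for illustration. The cutoff is rounded down to the start of a month, so a partition is dropped only when every row in it is older than the retention period.

```python
from datetime import date


def month_start_minus(d, months):
    """First day of the month that lies `months` months before d's month."""
    idx = d.year * 12 + (d.month - 1) - months
    return date(idx // 12, idx % 12 + 1, 1)


def drop_statements(oldest, today, keep_months=5, table="tab"):
    """Build one DROP SUBPARTITION statement per expired month for KEEP_ID = 'N'."""
    cutoff = month_start_minus(today, keep_months)
    statements = []
    current = date(oldest.year, oldest.month, 1)
    while current < cutoff:
        statements.append(
            f"alter table {table} drop subpartition for (DATE'{current.isoformat()}','N');"
        )
        current = month_start_minus(current, -1)  # step forward one month
    return statements


# Hypothetical run date of 2020-02-01 against the sample data above.
for stmt in drop_statements(date(2019, 8, 1), date(2020, 2, 1)):
    print(stmt)
```

If the rows must be archived first, the copy into the target table would need to happen before each generated statement is executed.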

SYSDATE between date in two fields for 6 month period Q

I am trying to filter on a date range covering the past 6 months for two different fields, with the data grouped by month. How do I write such a BETWEEN clause?
SELECT TO_CHAR(mopend, 'MM-yyyy') AS month, MOPSTATUS, COUNT(*) MTS_COMPLETE_CNT
FROM MOPUSER.MOPACTIVITY
WHERE UPPER(MOPSTATUS) = 'COMPLETE'
AND TO_CHAR(MOPACTIVITY.MOPSTART, 'yyyy-mm-dd hh24:mi') BETWEEN TO_CHAR(sysdate,'YYYY-MM-DD')||' 06:02:00' AND TO_CHAR(sysdate,'YYYY-MM-DD')||' 22:59:59'
OR TO_CHAR(MOPACTIVITY.MOPEND, 'yyyy-mm-dd hh24:mi') BETWEEN TO_CHAR(SYSDATE,'YYYY-MM-DD')||' 06:02:00' AND TO_CHAR(SYSDATE,'YYYY-MM-DD')||' 22:59:59'
GROUP BY TO_CHAR(mopend, 'MM-yyyy'), MOPSTATUS
ORDER BY TO_CHAR(mopend, 'MM-yyyy'), MOPSTATUS
I will answer one part of your question first, and then based on your comments, I can give you the full query.
The following query returns the end points between which you want to search. T1 is 06:02 in the morning on the date that is six months back in time. T2 is the last second of today.
select sysdate
,add_months( trunc(sysdate) + interval '06:02' hour to minute, -6) as t1
, trunc(sysdate) + interval '23:59:59' hour to second as t2
from dual;
The above query returns the following (using yyyy-mm-dd hh24:mi:ss):
sysdate: 2014-04-11 13:54:28
t1: 2013-10-11 06:02:00
t2: 2014-04-11 23:59:59
If I interpret you correctly, this is the time period you want to search?
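As a cross-check of those two end points, here is a minimal sketch of the same calculation in Python. The `add_months` helper is an assumption written for this example: it clamps the day to the length of the target month, which is only a rough stand-in for Oracle's ADD_MONTHS (Oracle also maps the last day of a month to the last day of the resulting month).

```python
import calendar
from datetime import datetime, time


def add_months(dt, months):
    """Shift dt by whole months, clamping the day to the target month's length."""
    idx = dt.year * 12 + (dt.month - 1) + months
    year, month = idx // 12, idx % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def search_window(now):
    """Return (t1, t2): 06:02 six months ago and the last second of today."""
    today = now.date()
    t1 = add_months(datetime.combine(today, time(6, 2)), -6)
    t2 = datetime.combine(today, time(23, 59, 59))
    return t1, t2


# The sysdate value from the sample output above.
t1, t2 = search_window(datetime(2014, 4, 11, 13, 54, 28))
print("t1:", t1)
print("t2:", t2)
```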
For the second part of the answer, I'd need to know the following:
1. Can either MOPSTART or MOPEND be null? If so, how do you want to treat those rows?
2. Do you want to include the end points, i.e. rows where MOPSTART >= t1? Or only where MOPSTART > t1?
3. Same as (2) but for MOPEND.
4. Which month do you want to group by (see below)?
For example, for row (a), do you want to count it once for each month, or only in JAN (started), or only in JUN (ended)?
JAN FEB MAR APR MAY JUN
a: |-------------------|
b: |---|---|
c: |---|
d: |-----------|
e: |--------|

Oracle - Date subtraction in where clause

I'm trying to figure out how to compare the result of a date subtraction in a where clause.
Clients subscribe to a service and are therefore linked to a subscription that has an end date. I want to display the list of subscriptions that will come to an end within the next 2 weeks.
I did not design the database, but I noticed that the End_Date column type is a varchar and not a date. I can't change that.
My problem is the following:
If I try to select the result of the subtraction, for example with this query:
SELECT(TO_DATE(s.end_date,'YYYY-MM-DD') - TRUNC(SYSDATE)) , s.name
from SUBSCRIPTION s WHERE s.id_acces = 15
This will work and give me the number of days between the end of the subscription and the current date.
BUT now, if I put the exact same expression in a WHERE clause for comparison:
SELECT s.name
from SUBSCRIPTION S
WHERE (TO_DATE(s.end_date,'YYYY-MM-DD') - TRUNC(SYSDATE)) between 0 and 16
I will get an error: "ORA-01839 : date not valid for month specified".
Any help would be appreciated..
Somewhere in the table you have your date formatted in a different way from YYYY-MM-DD. In your first query you check a certain row (or a set of rows, s.id_acces = 15), which is probably ok, but in the second you scan through all the table.
Try finding these rows with something like:
select end_date from subscription
where not regexp_like(end_date, '^[0-9]{4}-[0-9]{2}-[0-9]{2}$')
Check your DD value (ie: day of the month). This value must be between 1 and the number of days in the month.
January - 1 to 31
February - 1 to 28 (1 to 29, if a leap year)
March - 1 to 31
April - 1 to 30
May - 1 to 31
June - 1 to 30
July - 1 to 31
August - 1 to 31
September - 1 to 30
October - 1 to 31
November - 1 to 30
December - 1 to 31
" the End_Date column type is a varchar and not a date.. I can't
change that."
If you can't change the data type, you'll have to change the data. You can identify the rogue values with this function:
create or replace function check_date_format (p_str in varchar2) return varchar2
is
d date;
begin
d := to_date(p_str,'YYYY-MM-DD');
return 'VALID';
exception
when others then
return 'INVALID';
end;
You can use this function in a query:
select sid, end_date
from SUBSCRIPTION
where check_date_format(end_date) != 'VALID';
Your choices are:
fix the data so all the dates are in the same format
fix the data and apply a check constraint to enforce future validity
write a bespoke MY_TO_DATE() function which takes a string and applies lots of different date format masks to it in the hope of getting a successful conversion.
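The third option can be sketched as follows. This is a Python illustration of the idea, not PL/SQL, and the list of candidate formats is purely hypothetical; in practice it would be derived from the formats actually found in the End_Date column. The function tries each format in order and returns None when none of them yields a valid calendar date, which also catches values such as 30 February.

```python
from datetime import datetime

# Hypothetical format masks; replace with the ones present in the real data.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def my_to_date(text, formats=CANDIDATE_FORMATS):
    """Return the first successful conversion of `text` to a date, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # wrong format or impossible date; try the next mask
    return None


for value in ("2020-02-29", "31/12/2020", "2021-02-30", "not a date"):
    print(repr(value), "->", my_to_date(value))
```

The order of the masks matters when a string is ambiguous (for example day/month versus month/day), which is one reason the answer describes this approach as working "in the hope of" a successful conversion.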

Grouping data by date ranges

I wonder how do I select a range of data depending on the date range?
I have these data in my payment table in format dd/mm/yyyy
Id Date Amount
1 4/1/2011 300
2 10/1/2011 200
3 27/1/2011 100
4 4/2/2011 300
5 22/2/2011 400
6 1/3/2011 500
7 1/1/2012 600
The closing date is the 27th of every month, so I would like to group all the data from the 27th to the 26th of the next month together.
Meaning to say I would like the output as this.
Group 1
1 4/1/2011 300
2 10/1/2011 200
Group 2
1 27/1/2011 100
2 4/2/2011 300
3 22/2/2011 400
Group 3
1 1/3/2011 500
Group 4
1 1/1/2012 600
It's not clear what the context of your question is. Are you querying a database?
If so, you are asking about dates, but it seems you have a column in string format.
First of all, convert your data to a datetime data type (or some equivalent; what DB engine are you using?), and then use a grouping criterion like this:
GROUP BY datepart(month, dateadd(day, -26, [datefield])), DATEPART(year, dateadd(day, -26, [datefield]))
EDIT:
So, you are using LINQ?
Different language, same logic:
.GroupBy(x => DateTime
.ParseExact(x.Date, "d/M/yyyy", CultureInfo.InvariantCulture) // assuming your date field is a string; "mm" would mean minutes
.AddDays(-26)
.ToString("yyyyMM"));
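To see why shifting each date back by 26 days works, here is a minimal sketch in Python that applies the same logic to the sample data from the question. The names `PAYMENTS` and `period_key` are made up for this example. Because the 27th of a month minus 26 days is the 1st of that month, every date from the 27th to the 26th of the next month lands in the same calendar month after the shift. Each key is therefore the month in which the period starts.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (Id, Date as a d/m/yyyy string, Amount).
PAYMENTS = [
    (1, "4/1/2011", 300),
    (2, "10/1/2011", 200),
    (3, "27/1/2011", 100),
    (4, "4/2/2011", 300),
    (5, "22/2/2011", 400),
    (6, "1/3/2011", 500),
    (7, "1/1/2012", 600),
]


def period_key(date_text):
    """Shift the date back 26 days and label it with the resulting year and month."""
    shifted = datetime.strptime(date_text, "%d/%m/%Y") - timedelta(days=26)
    return shifted.strftime("%Y%m")


groups = defaultdict(list)
for payment_id, date_text, amount in PAYMENTS:
    groups[period_key(date_text)].append(payment_id)

for key in sorted(groups):
    print(key, groups[key])
```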
If you are going to do this frequently, it would be worth investing in a table that assigns a unique identifier to each month and the start and end dates:
CREATE TABLE MonthEndings
(
MonthID INTEGER NOT NULL PRIMARY KEY,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL
);
INSERT INTO MonthEndings VALUES(201101, '27/12/2010', '26/01/2011');
INSERT INTO MonthEndings VALUES(201102, '27/01/2011', '26/02/2011');
INSERT INTO MonthEndings VALUES(201103, '27/02/2011', '26/03/2011');
INSERT INTO MonthEndings VALUES(201201, '27/12/2011', '26/01/2012');
You can then group accurately using:
SELECT M.MonthID, P.Id, P.Date, P.Amount
FROM Payments AS P
JOIN MonthEndings AS M ON P.Date BETWEEN M.StartDate and M.EndDate
ORDER BY M.MonthID, P.Date;
Any group headings etc are best handled out of the DBMS - the SQL gets you the data in the correct sequence, and the software retrieving the data presents it to the user.
If you can't translate SQL to LINQ, that makes two of us. Sorry, I have never used LINQ, so I've no idea what is involved.
SELECT *, CASE WHEN datepart(day,date)<27 THEN datepart(month,date)
ELSE datepart(month,date) % 12 + 1 END as group_name
FROM payment
