Why is my total session count (aggregated using EXTRACT MONTH) less than the total when I break it down by date?

I'm trying to get my total session count by month, and I've tried two different approaches.
In the first, I group by the date field.
In the second, I group by a month field extracted from the date field using EXTRACT(MONTH FROM date) AS month.
For the first one I used the code below:
with
session1 as (
  select date,
         session_id
  from table
  where date >= '2019-05-20' AND date <= '2019-05-21'
)
SELECT date, COUNT(DISTINCT session_id) AS sessions
FROM session1
GROUP BY 1
For the 2nd one I tried using this code:
with
session1 as (
  select date,
         session_id
  from table
  where date >= '2019-05-20' AND date <= '2019-05-21'
)
SELECT EXTRACT(MONTH FROM date) AS month, COUNT(DISTINCT session_id) AS sessions
FROM session1
GROUP BY 1
I got the following results:
20 May: 1,548 Sessions; 21 May: 1,471 Sessions; Total: 3,019
May: 2,905
So there's a discrepancy of 114 sessions, and I'd like to know why.
Thank you in advance.

For simplicity's sake, suppose there is only one session and it spans two consecutive days. If you count by day and then sum the results, you get 2 sessions, whereas if you count distinct sessions over the whole two days, you get just 1.
That is the reason for the discrepancy: some sessions are being counted twice, once on each day, probably because they run past the end of one day and into the start of the next.
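To make the double counting concrete, here is a minimal Python sketch with made-up rows (the session ids and dates are hypothetical, not the asker's data). It compares the sum of per-day distinct counts with a single distinct count over the whole period.

```python
from collections import defaultdict

# Hypothetical (date, session_id) rows; session "s2" spans midnight.
rows = [
    ("2019-05-20", "s1"),
    ("2019-05-20", "s2"),
    ("2019-05-21", "s2"),
    ("2019-05-21", "s3"),
]

# Distinct sessions per day, like COUNT(DISTINCT session_id) grouped by date.
sessions_per_day = defaultdict(set)
# Dates on which each session appears, used to find the multi-day sessions.
dates_per_session = defaultdict(set)
for day, session_id in rows:
    sessions_per_day[day].add(session_id)
    dates_per_session[session_id].add(day)

sum_of_daily = sum(len(ids) for ids in sessions_per_day.values())

# Distinct sessions over the whole period, like the query grouped by month.
monthly_distinct = len(dates_per_session)

# Sessions seen on more than one date account for the gap.
multi_day = {s for s, days in dates_per_session.items() if len(days) > 1}

print(sum_of_daily, monthly_distinct, multi_day)
```

Here the daily counts sum to 4 while the period-wide distinct count is 3, and the gap of 1 is exactly the one session that appears on both dates.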

The following query should show you which session_ids occur on both dates.
select session_id, count(distinct date) as num_dates
from table
where date >= '2019-05-20' AND date <= '2019-05-21'
group by 1
having num_dates > 1
This is either a data processing issue, or your session definition is allowed to span multiple days. Google Analytics, for example, traditionally ends a session and begins a new session at midnight. Other sessionization schemes might not impose this restriction.

Related

Oracle historical reporting - what was the row at a point in time

I have been asked to run a report of the state of our assets at a fixed point in time (1st Jan 2019).
The way this database has been written is that the asset has its own table with current info, and for various bits of data there is also a history of that info changing; each bit is stored in its own "history" table with a start and end date. So for example one of the bits of info is the asset class: the asset table will have a field that contains the current asset class, and if that class has changed in the past then there will be rows in the asset_history table with start and end dates. Something like...
AssetID  AssetClass  StartDate  EndDate
-------  ----------  ---------  --------
1        1           12-12-87   23-04-90
1        5           23-04-90   01-02-00
1        2           01-02-00   27-01-19
1        1           27-01-19
So this asset has changed classes a few times, but I need to write something that works out, for each asset, which class was the active class as at 1st Jan. For this example that would be the second-from-last row, as it changed to class 2 back in 2000 and only became class 1 after 1st Jan 2019.
And to make it more complicated I will need this for several bits of data but if I can get the notion of how to do it right then I'm happy to translate this to the other data.
Any pointers would be much appreciated!
I usually write this like
select assetClass
from history_table h
where :point_in_time >= startDate
and (:point_in_time < endDate
or endDate is null)
(assuming that those columns are actually date type and not varchar2)
between always seems tempting, but it includes both endpoints, so you'd have to write something like :point_in_time between startDate and (endDate - interval '1' second)
EDIT: If you try to run this query with a point_in_time before your first startDate, you won't get any results. That seems normal to me, but maybe instead you want to pick "the first result which hasn't expired yet", like this:
select assetClass
from history_table h
where (:point_in_time < endDate
or endDate is null)
order by startDate asc
fetch first 1 row only
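As a rough illustration of the half-open interval test used in the first query of this answer (startDate inclusive, endDate exclusive, empty endDate meaning "still current"), here is a small Python sketch over the sample rows from the question. The function name `class_at` is made up, and reading the two-digit dates as dd-mm-yy with 87 and 90 in the 1900s is an assumption.

```python
from datetime import date
from typing import Optional

# Sample history rows from the question as (asset_class, start, end).
# None stands for the empty EndDate; the dd-mm-yy reading is an assumption.
history = [
    (1, date(1987, 12, 12), date(1990, 4, 23)),
    (5, date(1990, 4, 23), date(2000, 2, 1)),
    (2, date(2000, 2, 1), date(2019, 1, 27)),
    (1, date(2019, 1, 27), None),
]

def class_at(point: date) -> Optional[int]:
    """Return the class active at `point`, or None if no row covers it."""
    for asset_class, start, end in history:
        # Start is inclusive, end is exclusive, so a change date
        # belongs to the new row only and never matches two rows.
        if start <= point and (end is None or point < end):
            return asset_class
    return None

print(class_at(date(2019, 1, 1)))
```

Because the end is exclusive, a date that is both one row's EndDate and the next row's StartDate resolves to the later row, which is the overlap BETWEEN would otherwise create.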

Oracle - Show 0 if no data for the month

I'm trying to show some averages over the past 12 months, but there is no data for June/July, so I want those months to still appear, with just 0s in the 3 columns.
Currently it only shows August to May, which is 10 rows, so it's throwing off formulas, charts, etc.
select to_char(Months.Period,'YYYY/MM') As Period, coalesce(avg(ec.hours_reset),0) as AvgOfHOURSReset, coalesce(AVG(ec.cycles_reset),0) as AvgofCycles_Reset, Coalesce(AVG(ec.days_reset),0) as AvgofDAYS_Reset
from (select distinct reset_date as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months
left outer join engineering_compliance ec on ec.reset_date = months.Period
WHERE EC.EO = 'AT CHECK'
group by to_char(Months.Period,'YYYY/MM')
order by to_char(Months.Period,'YYYY/MM')
;
(select distinct to_char(reset_date,'YYYY/MM') as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months;
That query is pretty good, it's not far from working.
You would need to replace the Months table part. You want exactly one row per month, regardless of whether there's any data in the ec table.
You could maybe synthesize some data without going to any actual table in your own schema.
For example:
SELECT
extract(month from add_months(sysdate,level-1)) Row_Month,
extract(year from add_months(sysdate,level-1)) Row_Year,
to_char(add_months(sysdate,level-1),'YYYY/MM') Formatted_Date,
trunc(add_months(sysdate,level-1),'mon') Join_Date
FROM dual
CONNECT BY level <= 12;
gives:
ROW_MONTH,ROW_YEAR,FORMATTED_DATE,JOIN_DATE
6,2016,'2016/06',1/06/2016
7,2016,'2016/07',1/07/2016
8,2016,'2016/08',1/08/2016
9,2016,'2016/09',1/09/2016
10,2016,'2016/10',1/10/2016
11,2016,'2016/11',1/11/2016
12,2016,'2016/12',1/12/2016
1,2017,'2017/01',1/01/2017
2,2017,'2017/02',1/02/2017
3,2017,'2017/03',1/03/2017
4,2017,'2017/04',1/04/2017
5,2017,'2017/05',1/05/2017
Option 1: Write that subselect inline into your query, replacing sysdate with the start month; the figure 12 on the last line can be altered to the number of months you want in the series.
Option 2 (can be reused more conveniently in a variety of situations and queries): Write a view with a long series of months (for example, Jan 1970 to Dec 2199) using my SQL above. You can then join to that view on join_date with whatever start and end months you want. It will give you one row per month and you can pick up the formatted date from its column.
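The idea behind both options is the same: build the list of months first, then attach the data to it, so a month with no data still gets a row. A minimal Python sketch of that idea follows; `month_series` and the sample averages are hypothetical, not taken from the asker's table.

```python
from datetime import date
from typing import List

def month_series(start: date, count: int) -> List[str]:
    """Return `count` consecutive 'YYYY/MM' labels starting at start's month."""
    labels = []
    year, month = start.year, start.month
    for _ in range(count):
        labels.append(f"{year:04d}/{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels

# Hypothetical monthly averages; June and July 2015 have no data at all.
averages = {"2015/08": 12.5, "2015/09": 9.0}

# One row per month in the series, defaulting to 0 where data is missing.
report = [(label, averages.get(label, 0))
          for label in month_series(date(2015, 6, 1), 12)]

for period, avg_hours in report:
    print(period, avg_hours)
```

The report always has 12 rows because the month list, not the data, drives the row count; this plays the role of the left outer join from the month series in the SQL.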

Oracle SQL To compare 1 or 2 or more dates to be within a given period

I have a scenario where I need to compare 2 or more dates against a given period.
I can do this for 1 date using BETWEEN, but when I have 2 dates to compare in parallel I get a single-row subquery error.
select A
from ORDER
where Date1 between sysdate and (sysdate-10)
The above query works fine for a single date. Please help me find a solution for when I have Date1 and Date2 and need to compare both against the same period (sysdate and (sysdate-10)); I may have more than 2 dates as well.
Thanks
Shankar
Not having a proper description of your tables or the data they contain, it is difficult to know what you want.
Perhaps something like:
SELECT A
FROM ORDER
GROUP BY A
HAVING COUNT( CASE WHEN datecolumn BETWEEN SYSDATE - 10 AND SYSDATE THEN 1 ELSE NULL END ) > 0
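Under one reading of the question (each order carries several date values, and it qualifies if any of them falls in the last 10 days), the check can be sketched in Python as below. The order names, dates, and the helper `any_in_window` are all hypothetical. Note the lower bound comes first in the range test, as in the BETWEEN SYSDATE - 10 AND SYSDATE form above, since BETWEEN a AND b means a <= x AND x <= b.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

def any_in_window(dates: Iterable[Optional[date]], today: date, days: int = 10) -> bool:
    """True if at least one non-null date lies in [today - days, today]."""
    window_start = today - timedelta(days=days)
    return any(d is not None and window_start <= d <= today for d in dates)

today = date(2024, 3, 15)  # fixed stand-in for SYSDATE so the result is repeatable

# Hypothetical orders, each with two date values (None models a NULL date).
orders = {
    "A1": [date(2024, 3, 10), date(2024, 1, 2)],
    "A2": [date(2023, 12, 1), None],
    "A3": [date(2024, 3, 5), date(2024, 3, 15)],
}

matching = [name for name, dates in orders.items() if any_in_window(dates, today)]
print(matching)
```

Adding a third or fourth date is just a longer list per order, which is the part that gets awkward when each date needs its own subquery.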

Sysdate minus 2 different days

I am running this query currently for one scenario but I have two scenarios: If SYSDATE = Monday, then run "SYSDATE - 2", otherwise run "SYSDATE - 1". I'm connecting to the database via an OLE connection from Excel so I'm not sure I can use a stored procedure. Is there a way to write the query to accomplish both scenarios? Thanks for all help.
SELECT
DISTINCT VERSION_NAME VERSION, MIN(RECONCILE_START_DT) DATES
FROM
SDE.GDBM_RECONCILE_HISTORY
WHERE
RECONCILE_RESULT = 'Conflicts'
AND
RECONCILE_START_DT > SYSDATE -1
GROUP BY VERSION_NAME
ORDER BY 2 ASC NULLS LAST
You may use a CASE expression in your WHERE condition to subtract 2 on Mondays and 1 on the other days of the week. TO_CHAR(date, 'D') returns the day of the week as a number; when the week is numbered from Sunday = 1 (this depends on the session's NLS_TERRITORY setting), Monday is 2.
Try this:
SELECT
VERSION_NAME AS VERSION,
MIN(RECONCILE_START_DT) AS DATES
FROM
SDE.GDBM_RECONCILE_HISTORY
WHERE
RECONCILE_RESULT = 'Conflicts'
AND
RECONCILE_START_DT > SYSDATE -
CASE TO_CHAR(SYSDATE, 'D')
WHEN '2' THEN 2
ELSE 1 END
GROUP BY VERSION_NAME
ORDER BY 2 ASC NULLS LAST
Also, you don't need the DISTINCT keyword, as you're already using GROUP BY.
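The same branching rule, written out in Python purely to illustrate the logic (the function name `lookback_start` is made up, and the actual report would still run as SQL over the OLE connection):

```python
from datetime import datetime, timedelta

def lookback_start(now: datetime) -> datetime:
    """Return the cutoff: 2 days back on Mondays, 1 day back otherwise."""
    # datetime.weekday() numbers Monday as 0, regardless of locale.
    days_back = 2 if now.weekday() == 0 else 1
    return now - timedelta(days=days_back)

monday = datetime(2024, 1, 1, 9, 0)   # 1 Jan 2024 was a Monday
tuesday = datetime(2024, 1, 2, 9, 0)

print(lookback_start(monday))
print(lookback_start(tuesday))
```

Unlike the 'D' format element, Python's weekday() does not vary with locale settings, so the Monday test here needs no territory assumption.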

Oracle - Counting timestamps where difference between timestamps greater than 1 hour

I have a worklog table that contains the following fields:
worklog_id,
agent_name,
ticket_number,
timestamp,
worklog_notes.
I would like to count the number of worklog entries made, where entries with the same agent_name, ticket_number and timestamp (date) are only counted if the time between the two entries is greater than 1 hour.
Example: John Smith makes three worklog entries on ticket 12345. The first timestamp is "10/11/2012 9:11:44 AM", the second timestamp is "10/11/2012 9:36:16 AM" and the third timestamp is "10/11/2012 11:18:20 AM". In this example I would only want to give the agent credit for two worklog entries as the first two were less than an hour apart.
I've tried getting the logic to work using a "where" sub-query, but cannot get it working. Would anyone have any example they could provide? Thanks! :)
Does this get what you want? The first entry by a given agent on a ticket should always be counted, and entries after that should only be counted if at least an hour has elapsed since the prior entry.
select agent_name, ticket_number, count(*)
from (
  select agent_name, ticket_number, timestamp,
         lag(timestamp) over
           (partition by agent_name, ticket_number order by timestamp) prev_timestamp
  from worklog
)
where prev_timestamp is null
   or (timestamp - prev_timestamp) >= interval '1' hour
group by agent_name, ticket_number
I'm not sure this is exactly what you want -- if an agent keeps adding entries within an hour of the prior entry, none of them will be counted except the first. So someone who adds a lot of updates gets penalized.
Maybe what you really want is to count the number of distinct hours in which an update was made:
select agent_name, ticket_number, count(distinct to_char(timestamp,'DD-MON-YYYY HH24'))
from worklog
group by agent_name, ticket_number
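To compare the two counting rules side by side, here is a small Python sketch for a single agent and ticket. The function names are made up, and reading "10/11/2012" as 11 October is an assumption (it does not affect the result, since all three entries share the same date).

```python
from datetime import datetime, timedelta

def count_with_gap(timestamps, gap=timedelta(hours=1)):
    """Count the first entry, then each entry at least `gap` after the prior one."""
    count = 0
    prev = None
    for ts in sorted(timestamps):
        if prev is None or ts - prev >= gap:
            count += 1
        prev = ts  # always compare against the immediately prior entry, like LAG
    return count

def count_distinct_hours(timestamps):
    """Count the distinct clock hours in which at least one entry was made."""
    return len({ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps})

# The three entries from the question's example.
example = [
    datetime(2012, 10, 11, 9, 11, 44),
    datetime(2012, 10, 11, 9, 36, 16),
    datetime(2012, 10, 11, 11, 18, 20),
]

# A hypothetical agent who updates every 30 minutes.
frequent = [datetime(2012, 10, 11, 9, 0) + timedelta(minutes=30 * i) for i in range(4)]

print(count_with_gap(example), count_distinct_hours(example))
print(count_with_gap(frequent), count_distinct_hours(frequent))
```

Both rules give 2 for the question's example, but for the half-hourly updater the gap rule credits only the first entry while the distinct-hour rule credits 2, which is the penalty described above.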
