Oracle historical reporting - what was the row at a point in time

I have been asked to run a report of the state of our assets at a fixed point in time (1st Jan 2019).
The way this database has been written, each asset has a row in its own table holding the current info, and for various bits of data there is also a history of that info changing. Each bit is stored in its own "history" table with a start and end date. For example, one of the bits of info is the asset class: the asset table has a field that contains the current asset class, and if that class has changed in the past there will be rows in the asset_history table with start and end dates. Something like...
AssetID  AssetClass  StartDate  EndDate
-------  ----------  ---------  --------
1        1           12-12-87   23-04-90
1        5           23-04-90   01-02-00
1        2           01-02-00   27-01-19
1        1           27-01-19
So this asset has changed classes a few times, but I need to write something that checks, for each asset, which class was the active class as at 1st Jan 2019. For this example that would be the second-to-last row, as it changed to class 2 back in 2000 and only became a class 1 after 1st Jan 2019.
And to make it more complicated I will need this for several bits of data but if I can get the notion of how to do it right then I'm happy to translate this to the other data.
Any pointers would be much appreciated!

I usually write this like
select assetClass
from history_table h
where :point_in_time >= startDate
  and (:point_in_time < endDate
       or endDate is null)
(assuming that those columns are actually date type and not varchar2)
between always seems tempting, but it includes both endpoints, so you'd have to write something like :point_in_time between startDate and (endDate - interval '1' second)
EDIT: If you try to run this query with a point_in_time before your first start_date, you won't get any results. That seems normal to me, but maybe instead you want to pick "the first result which hasn't expired yet", like this:
select assetClass
from history_table h
where (:point_in_time < endDate
       or endDate is null)
order by startDate asc
fetch first 1 row only
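To see how the half-open comparison behaves at the boundaries, here is a minimal sketch in Python with an in-memory SQLite database rather than Oracle. The table and column names (asset_history, asset_id, asset_class, start_date, end_date) are assumptions for illustration, and the dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE asset_history ("
    "asset_id INTEGER, asset_class INTEGER, start_date TEXT, end_date TEXT)"
)
# Sample rows from the question, rewritten as ISO dates (assumed schema).
conn.executemany(
    "INSERT INTO asset_history VALUES (?, ?, ?, ?)",
    [
        (1, 1, "1987-12-12", "1990-04-23"),
        (1, 5, "1990-04-23", "2000-02-01"),
        (1, 2, "2000-02-01", "2019-01-27"),
        (1, 1, "2019-01-27", None),  # NULL end date: still the current row
    ],
)


def class_as_of(point_in_time):
    """Return (asset_id, asset_class) pairs active at the given ISO date."""
    # Half-open interval: start_date <= t < end_date, or an open-ended row.
    return conn.execute(
        "SELECT asset_id, asset_class FROM asset_history "
        "WHERE :t >= start_date AND (:t < end_date OR end_date IS NULL) "
        "ORDER BY asset_id",
        {"t": point_in_time},
    ).fetchall()


print(class_as_of("2019-01-01"))  # the class 2 row
print(class_as_of("1990-04-23"))  # a changeover day matches only the new row
print(class_as_of("1980-01-01"))  # before the first start date: no rows
```

Because the end date is excluded, a changeover day such as 23-04-90 matches exactly one row, which is the reason for not using a plain between here.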

Related

Oracle SQL exclusion constraint

I am looking for an Oracle constraint that will exclude all rows in a table that have already been updated for a given year. For example, a given Id field of "123456789_2019_blah" includes the year 2019 as part of the Id. Each day a query will check the table to see if a given year is missing in the Id, such as the year 2020 in "123456789_2020_blah". If 2020 does not exist, a second row will be inserted with the value "123456789_2020_blah".
-----------------------
Id
-----------------------
1: 123456789_2019_blah
2: 123456789_2020_blah
Going forward, any other time the query runs it should never return rows for Id "123456789_2019_blah" or "123456789_2020_blah". The following year will repeat with 2021, etc. (Assume that field Id is the only field available for the constraint)
I tried using REGEXP_INSTR to check its length but this still returns the 2019 row because there will always be one true result. I also tried group by having with the same result.
where (REGEXP_INSTR(Id,'*_2019_*') > 0 and REGEXP_INSTR(Id,'*_2020_*') = 0)
The solution was to create a function that returns 0 if the latest year does not exist.
Return 0 if the latest year does not exist
REGEXP_INSTR(name,'*_' || i_year || '_*') > 0

Why is my total session count (aggregated using EXTRACT MONTH) less than the total when I break it down by date?

I'm trying to generate my total sessions by month. I've tried two different ways:
Using the date field for the first column
Using a month field extracted from the date field with EXTRACT(MONTH FROM date) AS month
For the first one I used the code below:
with
session1 as (
    select date as date_key,
           session_id
    from table
    where date >= '2019-05-20' AND date <= '2019-05-21')
SELECT date_key, COUNT(DISTINCT session_id) AS sessions
from session1
GROUP BY 1
For the 2nd one I tried using this code:
with
session1 as (
    select date as date_key,
           session_id
    from table
    where date >= '2019-05-20' AND date <= '2019-05-21')
SELECT EXTRACT(MONTH FROM date_key) AS month, COUNT(DISTINCT session_id) AS sessions
from session1
GROUP BY 1
For the result, I got the output as per below:
20 May: 1,548 Sessions; 21 May: 1,471 Sessions; Total: 3,019
May: 2,905
So there's a discrepancy of 114 sessions, and I'd like to know why.
Thank you in advance.
For simplicity's sake, let's say there is only one session and it spans two consecutive days. If you count by day and then sum the results, you get 2 sessions, while if you count distinct sessions over the whole two days, you get just 1 session.
Hope this shows you the reason why: you are counting some sessions twice, once on each day, perhaps because they run over the end of one day and into the start of the next.
The following query should show you which session_ids occur on both dates.
select session_id, count(distinct date) as num_dates
from table
where date >= '2019-05-20' AND date <= '2019-05-21'
group by 1
having num_dates > 1
This is either a data processing issue, or your session definition is allowed to span multiple days. Google Analytics, for example, traditionally ends a session and begins a new session at midnight. Other sessionization schemes might not impose this restriction.
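A small sketch of the effect, in Python with an in-memory SQLite table instead of the original warehouse. The table name hits and the sample rows are made up for illustration. With exactly two days in the range, the gap between the summed daily counts and the overall distinct count equals the number of sessions that appear on both days, which would put the 114 in the question down to sessions present on both 20 and 21 May.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (date_key TEXT, session_id TEXT)")
# Hypothetical data: session "b" has rows on both days.
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [
        ("2019-05-20", "a"),
        ("2019-05-20", "b"),
        ("2019-05-21", "b"),
        ("2019-05-21", "c"),
    ],
)

# Distinct sessions per day, then summed outside the database.
per_day = conn.execute(
    "SELECT date_key, COUNT(DISTINCT session_id) "
    "FROM hits GROUP BY date_key ORDER BY date_key"
).fetchall()
sum_of_days = sum(count for _, count in per_day)

# Distinct sessions over the whole range in one go.
overall = conn.execute("SELECT COUNT(DISTINCT session_id) FROM hits").fetchone()[0]

# Sessions that occur on more than one date.
spanning = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT session_id FROM hits"
    "  GROUP BY session_id HAVING COUNT(DISTINCT date_key) > 1"
    ")"
).fetchone()[0]

print(per_day, sum_of_days, overall, spanning)
```

Here the daily counts sum to 4, the overall distinct count is 3, and the single spanning session accounts for the difference.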

How to iterate over a hive table row by row and calculate metric when a specific condition is met?

I have a requirement as below:
I am trying to convert a MS Access table macro loop to work for a hive table. The table called trip_details contains details about a specific trip taken by a truck. The truck can stop at multiple locations and the type of stop is indicated by a flag called type_of_trip. This column contains values like arrival, departure, loading etc.
The ultimate aim is to calculate the dwell time of each truck (how much time does the truck take before beginning for another trip). To calculate this we have to iterate the table row by row and check for trip type.
A typical example looks like this:
Do while end of file:
    Store the first row in a variable.
    Move to the second row.
    If the type_of_trip = Arrival:
        Move to the third row
        If the type_of_trip = End Trip:
            Store the third row
            Take the difference of timestamps to calculate dwell time
            Append the row into the output table
End
What is the best approach to tackle this problem in hive?
I tried checking if hive contains a keyword for loop but could not find one. I was thinking of doing this using a shell script but need guidance on how to approach this.
I cannot disclose the entire data but feel free to shoot any questions in the comments section.
Input
Trip ID  type_of_trip  timestamp        location
1        Departure     28/5/2019 15:00  Warehouse
1        Arrival       28/5/2019 16:00  Store
1        Live Unload   28/5/2019 16:30  Store
1        End Trip      28/5/2019 17:00  Store
Expected Output
Trip ID  Origin_location  Destination_location  Dwell_time
1        Warehouse        Store                 2 hours
You do not need a loop for this; use the power of a SQL query.
Convert your timestamps to seconds (using your format specified 'dd/MM/yyyy HH:mm'), calculate min and max per trip_id, taking into account type, subtract seconds, convert seconds difference to 'HH:mm' format or any other format you prefer:
with trip_details as (--use your table instead of this subquery
    select stack (4,
        1,'Departure'   ,'28/5/2019 15:00','Warehouse',
        1,'Arrival'     ,'28/5/2019 16:00','Store',
        1,'Live Unload' ,'28/5/2019 16:30','Store',
        1,'End Trip'    ,'28/5/2019 17:00','Store'
    ) as (trip_id, type_of_trip, `timestamp`, location)
)
select trip_id, origin_location, destination_location,
       from_unixtime(destination_time-origin_time,'HH:mm') dwell_time
from
(
    select trip_id,
           min(case when type_of_trip='Departure' then unix_timestamp(`timestamp`,'dd/MM/yyyy HH:mm') end) origin_time,
           max(case when type_of_trip='End Trip' then unix_timestamp(`timestamp`,'dd/MM/yyyy HH:mm') end) destination_time,
           max(case when type_of_trip='Departure' then location end) origin_location,
           max(case when type_of_trip='End Trip' then location end) destination_location
    from trip_details
    group by trip_id
)s;
Result:
trip_id  origin_location  destination_location  dwell_time
1        Warehouse        Store                 02:00
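The same conditional-aggregation idea can be tried outside Hive. Below is a minimal sketch in Python with an in-memory SQLite table, not Hive itself. The column name ts, the ISO timestamp format, the second trip and the helper format_hh_mm are assumptions for illustration. It formats the difference with plain integer arithmetic, because from_unixtime renders an epoch value as a wall-clock time in the session time zone, so as far as I know the 02:00 above relies on a UTC session and on trips shorter than 24 hours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trip_details ("
    "trip_id INTEGER, type_of_trip TEXT, ts TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO trip_details VALUES (?, ?, ?, ?)",
    [
        (1, "Departure", "2019-05-28 15:00", "Warehouse"),
        (1, "Arrival", "2019-05-28 16:00", "Store"),
        (1, "Live Unload", "2019-05-28 16:30", "Store"),
        (1, "End Trip", "2019-05-28 17:00", "Store"),
        # Hypothetical second trip that runs past 24 hours.
        (2, "Departure", "2019-05-28 15:00", "Warehouse"),
        (2, "End Trip", "2019-05-29 17:30", "Depot"),
    ],
)

# One output row per trip: pick values from specific row types with CASE
# inside the aggregates instead of walking the table row by row.
rows = conn.execute(
    """
    SELECT trip_id,
           MAX(CASE WHEN type_of_trip = 'Departure' THEN location END),
           MAX(CASE WHEN type_of_trip = 'End Trip' THEN location END),
           MAX(CASE WHEN type_of_trip = 'End Trip'
                    THEN CAST(strftime('%s', ts) AS INTEGER) END)
         - MIN(CASE WHEN type_of_trip = 'Departure'
                    THEN CAST(strftime('%s', ts) AS INTEGER) END)
    FROM trip_details
    GROUP BY trip_id
    ORDER BY trip_id
    """
).fetchall()


def format_hh_mm(seconds):
    """Format a duration in seconds as HH:MM without wrapping at 24 hours."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours:02d}:{remainder // 60:02d}"


result = [(trip, origin, dest, format_hh_mm(secs)) for trip, origin, dest, secs in rows]
print(result)
```

In Hive the equivalent would be to divide the seconds difference yourself rather than passing it to from_unixtime.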

Oracle - Show 0 if no data for the month

I'm trying to show some averages over the past 12 months, but there is no data for June/July, so I want the titles for those months to display with just 0's in the 3 columns.
Currently it's only showing August - May, which is 10 rows, so it's throwing off formulas, charts etc.
select to_char(Months.Period,'YYYY/MM') As Period,
       coalesce(avg(ec.hours_reset),0) as AvgOfHOURSReset,
       coalesce(AVG(ec.cycles_reset),0) as AvgofCycles_Reset,
       Coalesce(AVG(ec.days_reset),0) as AvgofDAYS_Reset
from (select distinct reset_date as Period from engineering_compliance
      where reset_date between '01/JUN/15' and '31/MAY/16') Months
left outer join engineering_compliance ec on ec.reset_date = months.Period
WHERE EC.EO = 'AT CHECK'
group by to_char(Months.Period,'YYYY/MM')
order by to_char(Months.Period,'YYYY/MM')
;
(select distinct to_char(reset_date,'YYYY/MM') as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months;
That query is pretty good, it's not far from working.
You would need to replace the Months table part. You want exactly one row per month, regardless of whether there's any data in the ec table.
You could maybe synthesize some data without going to any actual table in your own schema.
For example:
SELECT
    extract(month from add_months(sysdate,level-1)) Row_Month,
    extract(year from add_months(sysdate,level-1)) Row_Year,
    to_char(add_months(sysdate,level-1),'YYYY/MM') Formatted_Date,
    trunc(add_months(sysdate,level-1),'mon') Join_Date
FROM dual
CONNECT BY level <= 12;
gives:
ROW_MONTH,ROW_YEAR,FORMATTED_DATE,JOIN_DATE
6,2016,'2016/06',1/06/2016
7,2016,'2016/07',1/07/2016
8,2016,'2016/08',1/08/2016
9,2016,'2016/09',1/09/2016
10,2016,'2016/10',1/10/2016
11,2016,'2016/11',1/11/2016
12,2016,'2016/12',1/12/2016
1,2017,'2017/01',1/01/2017
2,2017,'2017/02',1/02/2017
3,2017,'2017/03',1/03/2017
4,2017,'2017/04',1/04/2017
5,2017,'2017/05',1/05/2017
Option 1: Write that subselect inline into your query, replacing sysdate with the start month. The figure 12 on the last line can be altered to the number of months you want in the series.
Option 2 (can be reused more conveniently in a variety of situations and queries): Write a view with a long series of months (for example, Jan 1970 to Dec 2199) using my SQL above. You can then join to that view on join_date with whatever start and end months you want. It will give you one row per month and you can pick up the formatted date from its column.
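Either option boils down to "generate one row per month, then left join the data onto it". Here is a rough sketch of that shape in Python with an in-memory SQLite database, where a recursive CTE stands in for Oracle's CONNECT BY level. The sample rows, the reduced column list and the five-month range are assumptions for illustration. One extra point it shows: the EO = 'AT CHECK' filter sits in the ON clause, because a WHERE filter on the outer-joined table discards the empty months again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE engineering_compliance ("
    "reset_date TEXT, eo TEXT, hours_reset REAL)"
)
# Hypothetical rows: nothing in June/July, and September only has another EO.
conn.executemany(
    "INSERT INTO engineering_compliance VALUES (?, ?, ?)",
    [
        ("2015-08-10", "AT CHECK", 10.0),
        ("2015-08-20", "AT CHECK", 20.0),
        ("2015-09-05", "OTHER", 99.0),
        ("2015-10-01", "AT CHECK", 7.0),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE months(n, month_start) AS (
        SELECT 1, date(:start)
        UNION ALL
        SELECT n + 1, date(month_start, '+1 month') FROM months WHERE n < :count
    )
    SELECT strftime('%Y/%m', m.month_start) AS period,
           COALESCE(AVG(ec.hours_reset), 0) AS avg_hours_reset
    FROM months m
    LEFT JOIN engineering_compliance ec
           ON strftime('%Y/%m', ec.reset_date) = strftime('%Y/%m', m.month_start)
          AND ec.eo = 'AT CHECK'  -- filter in ON so empty months survive
    GROUP BY period
    ORDER BY period
    """,
    {"start": "2015-06-01", "count": 5},
).fetchall()

for period, avg_hours in rows:
    print(period, avg_hours)
```

Every month in the generated series comes back, with 0 where no matching 'AT CHECK' rows exist.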

Oracle SQL To compare 1 or 2 or more dates to be within a given period

I have a scenario where I need to compare 2 or more dates for given period.
I'm able to succeed when comparing 1 date to a period using between function. But challenge is when I have 2 dates to compare in parallel, getting single row sub query error
select A
from ORDER
where Date1 between sysdate and (sysdate-10)
Above query works fine for single date, please help to get a solution when I have Date 1 and Date 2 and need to compare against the same period (sysdate and (sysdate-10)) and I may have more than 2 dates as well.
Thanks
Shankar
Not having a proper description of your tables or the data they contain, it is difficult to know what you want.
Perhaps something like:
SELECT A
FROM ORDER
GROUP BY A
HAVING COUNT( CASE WHEN datecolumn BETWEEN SYSDATE - 10 AND SYSDATE THEN 1 ELSE NULL END ) > 0
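If the dates are separate columns on the same row, another reading of the question is simply to repeat the range test per column and combine the tests with AND or OR. A minimal sketch of that reading follows, in Python with an in-memory SQLite table rather than Oracle. The table name orders, the columns date1 and date2, and the fixed "today" value are assumptions for illustration. It also shows that BETWEEN needs the lower bound first: with the bounds swapped, as in the question's between sysdate and (sysdate-10), nothing matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (a TEXT, date1 TEXT, date2 TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("x", "2019-06-05", "2019-06-08"),  # both dates in the window
        ("y", "2019-06-05", "2019-05-01"),  # only date1 in the window
        ("z", "2019-01-01", "2019-01-02"),  # neither date in the window
    ],
)

params = {"today": "2019-06-10"}  # stand-in for sysdate, fixed for repeatability
window = "BETWEEN date(:today, '-10 days') AND :today"


def select_a(where_clause):
    """Run a query against orders and return the matching values of a."""
    sql = f"SELECT a FROM orders WHERE {where_clause} ORDER BY a"
    return [row[0] for row in conn.execute(sql, params)]


both_in_window = select_a(f"date1 {window} AND date2 {window}")
either_in_window = select_a(f"date1 {window} OR date2 {window}")
# Bounds swapped (upper first): BETWEEN can never be true.
swapped_bounds = select_a("date1 BETWEEN :today AND date(:today, '-10 days')")

print(both_in_window, either_in_window, swapped_bounds)
```

More date columns just mean more copies of the same test, so no subquery is needed for this variant.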
