Oracle Moving Average without Current Row

I would like to calculate a moving average in Oracle over the last 30 records, excluding the current row.
select crd_nb, flng_month, curr_0380,
avg(curr_0380) over (partition by crd_nb order by flng_month ROWS 30 PRECEDING) mavg
from us_ordered
The SQL above calculates the moving average over the 30 preceding records plus the current row (31 rows in total when the frame is full). Is there any way to calculate the moving average excluding the current row?

select
    crd_nb,
    flng_month,
    curr_0380,
    avg(curr_0380) over (
        partition by crd_nb
        order by flng_month
        rows between 30 preceding and 1 preceding
    ) mavg
from us_ordered
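The frame ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING ends one row before the current row, so exactly the 30 prior rows are averaged and the current row's value never enters the result. (For the first row of each partition the window is empty, so mavg is NULL.)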

@be_here_now's answer is clearly superior. I'm leaving mine in place nonetheless, as it's still functional, if needlessly complex.
One answer would be to get the sum and count individually, subtract out the current row, then divide the two results. It's a little ugly, but it should work:
SELECT crd_nb,
       flng_month,
       curr_0380,
       ( SUM(curr_0380)
           OVER (PARTITION BY crd_nb
                 ORDER BY flng_month ROWS 30 PRECEDING)
         - curr_0380 )
       / ( COUNT(curr_0380)
             OVER (PARTITION BY crd_nb
                   ORDER BY flng_month ROWS 30 PRECEDING)
           - 1 )
         mavg
FROM us_ordered
If curr_0380 can be null, you'd have to tweak this slightly so that the current row is removed only if it's not null.
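A sketch of that tweak, assuming the same table and columns: subtract the current value only when it is not NULL, and guard the divisor with NULLIF in case the window holds no other non-NULL values:

SELECT crd_nb,
       flng_month,
       curr_0380,
       ( SUM(curr_0380)
           OVER (PARTITION BY crd_nb
                 ORDER BY flng_month ROWS 30 PRECEDING)
         - NVL(curr_0380, 0) )                          -- remove the current value only if present
       / NULLIF( COUNT(curr_0380)                       -- COUNT already skips NULLs
                   OVER (PARTITION BY crd_nb
                         ORDER BY flng_month ROWS 30 PRECEDING)
                 - CASE WHEN curr_0380 IS NOT NULL THEN 1 ELSE 0 END
               , 0 )                                    -- avoid division by zero
         mavg
FROM us_ordered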

Related

Oracle - determine and return the specific hour of data with the highest sum of the values

I think I can do this in a more roundabout way using arrays, scripting, etc., BUT is it possible to sum up (aggregate) all the values for each "hour" of data in a database for a given field? Basically, I am trying to determine which hour in a day's worth of data had the highest sum, preferably without having to loop through 24 times for each day I want to look at. For example, let's say I have a table called "table" that contains columns for times and values as follows:
Time Value
00:00 1
00:15 1
00:30 2
00:45 2
01:00 1
01:15 1
01:30 1
01:45 1
If I summed up by hand, I would get the following
Sum for 00 Hour = 6
Sum for 01 Hour = 4
So, in this example 00 Hour would be my "largest sum" hour. I'd like to end up returning simply which hour had the highest sum, and what that value was...the other hours don't matter in this case.
Can this be done all in a single Oracle query, or does it need to be done outside the query with some scripting, working with the times and values separately? If not in a single query, maybe just grab the sum for each hour, and I can run multiple queries, one for each hour? Then push each hour to an array and just use the max of that array? I know there is a SUM() function in Oracle, but how to tell it to "sum all the hours and just return the hour with the highest sum" escapes me. Hope all this makes sense. lol
Thanks for any advice to make this easier. :-)
The following query should do what you are looking for:
SELECT SUBSTR(time, 1, 2) AS HOUR,
SUM(amount) AS TOTAL_AMOUNT
FROM test_data
GROUP BY SUBSTR(time, 1, 2)
ORDER BY TOTAL_AMOUNT DESC
FETCH FIRST ROW WITH TIES;
The query uses the SUM function, grouping by the hour part of your time column. It then orders the results by the summed amounts, descending, and returns only the top row, keeping any ties.
Here is a DBFiddle showing the query in use (LINK)
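Against the sample data in the question (with the time and amount columns standing in for Time and Value), the query should return:

HOUR  TOTAL_AMOUNT
----  ------------
00    6

Note that FETCH FIRST requires Oracle 12c or later. On 11g, a sketch of an equivalent would rank the sums in a subquery instead:

SELECT hour, total_amount
FROM ( SELECT SUBSTR(time, 1, 2) AS hour,
              SUM(amount) AS total_amount,
              RANK() OVER (ORDER BY SUM(amount) DESC) AS rnk
       FROM test_data
       GROUP BY SUBSTR(time, 1, 2) )
WHERE rnk = 1;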

How to calculate longest period between two specific dates in SQL?

I have a problem with the following task: I have a table Warehouse containing a list of items that a company has in stock. This table contains the columns ItemID, ItemTypeID, InTime and OutTime, where InTime (OutTime) specifies the point in time at which a respective item entered (left) the warehouse. I have to calculate the longest period that the company has gone without an item entering or leaving the warehouse. I am trying to solve it this way:
select MAX(OutTime-InTime) from Warehouse where OutTime is not null
Is my understanding correct? Because I believe that it is not ;)
You want the greatest gap between any two consecutive actions (item entering or leaving the warehouse). One method is to unpivot the in and out times to rows, then use lag() to get the date of the "previous" action. The final step is aggregation:
select max(x_time - lag_x_time) as max_time_diff
from (
    select x_time,
           lag(x_time) over (order by x_time) as lag_x_time   -- previous action across all rows
    from (
        select in_time as x_time from warehouse
        union all
        select out_time from warehouse                        -- unpivot in/out times into one column
    )
    where x_time is not null                                  -- ignore items still in the warehouse
)
You can perform date arithmetic directly in Oracle; the result is calculated in days. If you want it in hours, multiply the result by 24.
To calculate the duration in days, and check all the information in the table:
SELECT ROUND(OutTime - InTime) AS periodDay, Warehouse.*
FROM Warehouse
WHERE OutTime IS NOT NULL
ORDER BY periodDay DESC
To calculate the duration in hours:
SELECT ROUND((OutTime - InTime) * 24) AS periodHour, Warehouse.*
FROM Warehouse
WHERE OutTime IS NOT NULL
ORDER BY periodHour DESC
ROUND() is used to remove the fractional digits; note that it rounds to the nearest whole number, so use TRUNC() if you want to truncate instead.
To select only the record with the maximum period:
SELECT *
FROM Warehouse
WHERE (OutTime - InTime) =
( SELECT MAX(OutTime - InTime) FROM Warehouse)
To select only the record with the maximum period, with the period included:
SELECT (OutTime - InTime) AS period, Warehouse.*
FROM Warehouse
WHERE (OutTime - InTime) =
( SELECT MAX(OutTime - InTime) FROM Warehouse)
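On Oracle 12c and later you could also skip the subquery and use the row-limiting clause. A sketch, pushing NULL periods (items still in the warehouse) out of the top spot:

SELECT (OutTime - InTime) AS period, Warehouse.*
FROM Warehouse
ORDER BY period DESC NULLS LAST
FETCH FIRST ROW WITH TIES;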
When finding the longest period this way, the WHERE OutTime IS NOT NULL condition is not needed: a NULL OutTime yields a NULL period, which MAX ignores and which never equals the subquery's result.
SQL Server has DATEDIFF; in Oracle you can simply subtract one date from the other.
The code looks OK. Oracle has a live SQL tool where you can test out queries in your browser, which should help you:
https://livesql.oracle.com/

Impala: Running sum of 1 hour

I want to count the records for each ID within a 1-hour window. I tried out some Impala queries, but without any luck.
My input data and the expected output were given as screenshots.
I tried :
select
concat(month,'/',day,'/',year,' ',hour,':',minute) time, id,
count(1) over(partition by id order by concat(month,'/',day,'/',year,' ',hour,':',minute) range between '1 hour' PRECEDING AND CURRENT ROW) request
from rt_request
where
concat(year,month,day,hour) >= '2019020318'
group by id, concat(month,'/',day,'/',year,' ',hour,':',minute)
But I got exception.
RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
Any suggestion/help would be appreciated.
Thank you in advance!
I think you are looking for counts for the same hour across days for a given id. You can simply use row_number to do this:
select concat(month,'/',day,'/',year,' ',hour,':',minute) as time,
       id,
       row_number() over (partition by id, hour
                          order by concat(month,'/',day,'/',year,' ',hour,':',minute)) as total
from rt_request
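If you really do need a sliding one-hour window (the RANGE frame that the error message rules out), one workaround in Impala is a self-join. A sketch, assuming a proper TIMESTAMP column ts exists or is derived from the year/month/day/hour/minute parts:

select a.id,
       a.ts,
       count(*) as requests_last_hour        -- rows for the same id in the preceding hour
from rt_request a
join rt_request b
  on b.id = a.id
 and b.ts between a.ts - interval 1 hours and a.ts
group by a.id, a.ts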

Add indicator to top and bottom 10%

I'm trying to capture the average of FIRST_CONTACT_CAL_DAYS, but I would like to create an indicator for the top and bottom 10% of values so I can exclude those (outliers) from my average calculation.
Not sure how to go about doing this; any thoughts?
SELECT DISTINCT
TO_CHAR(A.FIRST_ASSGN_DT,'DAY') AS DAY_NUMBER,
A.FIRST_ASSGN_DT,
A.FIRST_CONTACT_DT,
TO_CHAR(A.FIRST_CONTACT_DT,'DAY') AS DAY_NUMBER2,
A.FIRST_CONTACT_DT AS FIRST_PHONE_CONTACT,
A.ID,
ABS(TO_DATE(A.FIRST_CONTACT_DT, 'DD/MM/YYYY') - TO_DATE(A.FIRST_ASSGN_DT, 'DD/MM/YYYY')) AS FIRST_CONTACT_CAL_DAYS
FROM HIST A
LEFT JOIN CONTACTS D ON A.ID = D.ID
WHERE 1=1
You may be looking for something like this. Please adapt to your situation.
I assume you may have more than one "group" or "partition" and you need to compute the average for each group separately, after throwing out the outliers in each partition. (An alternative, which can be easily accommodated by adapting the query below, is to throw out the outliers at the global level, and only then to group and take the average for each group.)
If you don't have any groups, and everything is one big pile of data, it's even easier - you don't need GROUP BY and PARTITION BY.
Then: the NTILE function assigns a bucket number, in this example between 1 and 10, to each row, based on where it falls (first decile, i.e. first 10%, next decile, and so on, all the way to the last decile). I do this in a subquery. Then in the outer query, just filter out the first and last bucket before you group by and compute the average.
For testing purposes I create three groups with 10,000 random numbers each in a WITH clause - no need to spend any time on that portion of the code, since it is not part of the solution (the SQL code to solve your problem) - it's just a dirty trick to create test data on the fly.
with
inputs ( grp, val ) as (
select ceil(level/10000), dbms_random.value(0, 150)
from dual
connect by level <= 30000
)
select grp, avg(val) as avg_val
from (
select grp, val, ntile(10) over (partition by grp order by val) as bkt
from inputs
)
where bkt between 2 and 9
group by grp
;
GRP  AVG_VAL
---  ------------------------
  1  75.021614866547043734458
  2  74.286117923344418598032
  3  75.437412573353736953791
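An equivalent formulation uses PERCENT_RANK, which exposes each row's relative position directly (0 for the lowest value in the partition, 1 for the highest). A sketch against the same inputs test data:

select grp, avg(val) as avg_val
from (
    select grp, val,
           percent_rank() over (partition by grp order by val) as pr
    from inputs
)
where pr > 0.1 and pr < 0.9      -- keep roughly the middle 80%
group by grp;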

Trying to figure out the top 5 land areas of the 50 states in the U.S.

I have a table with one column named states and another column called land area. I am using Oracle 11g. I have looked at various questions on here and cannot find a solution. Here is what I have tried so far:
SELECT LandAreas, State
FROM ( SELECT LandAreas, State, DENSE_RANK() OVER (ORDER BY State DESC) sal_dense_rank
FROM Map )
WHERE sal_dense_rank >= 5;
This does not provide the top 5 land areas numerically.
I have also tried this one but no go either:
select * from (SELECT * FROM Map order by State desc)
where rownum < 5;
Anyone have any suggestions to get me on the right track??
Here is a sample of the table:
states      land areas
michigan    15000
florida     25000
tennessee   10000
alabama     80000
new york    150000
california  20000
oregon      5000
texas       6000
utah        3000
nebraska    1000
Desired output from query:
States      land area
new york    150000
alabama     80000
florida     25000
california  20000
Try:
Select * from
(SELECT State, LandAreas FROM Map ORDER BY LandAreas DESC)
where rownum < 6
Link to Fiddle
Use a HAVING clause and count the number of states that are larger:
SELECT m.state, m.landArea
FROM Map m
LEFT JOIN Map m2 on m2.landArea > m.landArea
GROUP BY m.state, m.landArea
HAVING count(*) < 5
ORDER BY m.landArea DESC
See SQLFiddle
This joins each state to every state whose area is greater, then uses a HAVING clause to return only those states where the number of larger states was less than 5.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
The left join is needed for the case of the largest state, which has no other larger state to join to.
The ORDER BY is optional.
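With the sample data, for example, alabama joins only to new york, so count(*) = 1 and it passes the filter; tennessee has five larger states, so count(*) = 5 and it is filtered out.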
Try something like this:
select m.states, m.landarea
from map m
where (select count('x') from map m2 where m2.landarea > m.landarea) < 5
order by m.landarea desc
There are two bloomers in your posted code.
You need to use landarea in the DENSE_RANK() call. At the moment you're ordering the states in reverse alphabetical order.
Your filter in the outer query is the wrong way around: you're excluding the top four results.
Here is what you need ...
SELECT LandAreas, State
FROM ( SELECT LandAreas
            , State
            , DENSE_RANK() OVER (ORDER BY LandAreas DESC) AS area_dr
       FROM Map )
WHERE area_dr <= 5
ORDER BY area_dr;
... and here is the SQL Fiddle to prove it. (I'm going with the statement in the question that you want the top 5 biggest states and ignoring the fact that your desired result set has only four rows. But adjust the outer filter as you will).
There are three different functions for deriving top-N result sets: DENSE_RANK, RANK and ROW_NUMBER.
Using ROW_NUMBER will always guarantee you 5 rows in the result set, but you may get the wrong result if there are several states with the same land area (unlikely in this case, but other data sets will produce such clashes). So: 1,2,3,4,5
The difference between RANK and DENSE_RANK is how they handle ties. DENSE_RANK always produces a series of consecutive numbers, regardless of how many rows there are in each rank. So: 1,2,2,3,3,3,4,5
RANK on the other hand will produce a sparse series if a given rank has more than one hit. So: 1,2,2,4,4,4.
Note that each of the example result sets has a different number of rows. Which one is correct? It depends on the precise question you want to ask.
Using a sorted sub-query with the ROWNUM pseudo-column will work like the ROW_NUMBER function, but I prefer using ROW_NUMBER because it is more powerful and more error-proof.
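To see the three functions side by side, you could run them in one query against the Map table from the question. A sketch:

SELECT State,
       LandAreas,
       ROW_NUMBER() OVER (ORDER BY LandAreas DESC) AS rn,    -- always 1,2,3,...
       RANK()       OVER (ORDER BY LandAreas DESC) AS rnk,   -- leaves gaps after ties
       DENSE_RANK() OVER (ORDER BY LandAreas DESC) AS drnk   -- consecutive despite ties
FROM Map;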
