Alternative to the INTERSECT set operator in Oracle

I have a table emp1 in which I am only interested in the employees who joined with a salary below 2000 and whose salary is now above 2000. Only one person, Ward, matches, as shown below. I wrote the query with intersect, but I would like to know whether there is a more efficient way of doing it. Any pointers would be a great help.
(select empno,deptno
from emp1
where sal<2000
group by empno,hiredate,deptno
)
intersect
(select empno,deptno
from emp1
where sal>2000
group by empno,hiredate,deptno
)
Thanks

First, here's how you can get the specific employees who satisfy your conditions (as modified in a comment): Earliest salary < 2000, current (most recent) salary > 2500. Note that in my sample data employee 1008 started at 1300 and had salary > 2500 at some point, but his current salary is < 2500 so he is not selected.
The query is as efficient as possible: it performs a standard aggregation and nothing else. The conditions are in the having clause. The first/last aggregate function, even though it is exceptionally useful, is ignored by a vast majority of programmers - for no good reason.
with
sal_hist (empno, sal_date, sal) as (
select 1003, date '2000-01-01', 2300 from dual union all
select 1003, date '2008-01-01', 2600 from dual union all
select 1008, date '2002-03-20', 1300 from dual union all
select 1008, date '2005-01-31', 2600 from dual union all
select 1008, date '2013-11-01', 2400 from dual union all
select 2025, date '2008-03-01', 1900 from dual union all
select 2025, date '2015-04-01', 2550 from dual
)
select empno
from sal_hist
group by empno
having min(sal) keep (dense_rank first order by sal_date) < 2000
and min(sal) keep (dense_rank last order by sal_date) > 2500
;
EMPNO
----------
2025
To get the count of such employees, wrap the above query within an outer query, with select count(*) as my_count from ( <above query> ).
For extra credit, try to understand why the following query also works. It's more compact (and possibly faster, even though not by much), but a bit harder to understand - and especially, to understand why I need min(empno) rather than simply empno or * within the count() call.
select count(min(empno)) as my_count
from sal_hist
group by empno
having min(sal) keep (dense_rank first order by sal_date) < 2000
and min(sal) keep (dense_rank last order by sal_date) > 2500
;
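For readers without an Oracle instance at hand, the first/last logic can be approximated elsewhere. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an SQLite build with window functions (3.25 or later). SQLite has no keep (dense_rank first/last) clause, so row_number() stands in for it here; the function name qualifying_employees is made up for this illustration, and the rows are the sample data from the answer above.

```python
import sqlite3

# Sample salary history from the answer above (empno, sal_date, sal).
ROWS = [
    (1003, "2000-01-01", 2300),
    (1003, "2008-01-01", 2600),
    (1008, "2002-03-20", 1300),
    (1008, "2005-01-31", 2600),
    (1008, "2013-11-01", 2400),
    (2025, "2008-03-01", 1900),
    (2025, "2015-04-01", 2550),
]


def qualifying_employees(rows):
    """Employees whose earliest salary is < 2000 and latest salary is > 2500."""
    con = sqlite3.connect(":memory:")
    con.execute("create table sal_hist (empno integer, sal_date text, sal integer)")
    con.executemany("insert into sal_hist values (?, ?, ?)", rows)
    query = """
        with ranked as (
            select empno, sal,
                   row_number() over (partition by empno order by sal_date)      as rn_first,
                   row_number() over (partition by empno order by sal_date desc) as rn_last
            from sal_hist
        )
        select empno
        from ranked
        group by empno
        -- the case expressions keep only the earliest / latest salary per employee
        having min(case when rn_first = 1 then sal end) < 2000
           and min(case when rn_last  = 1 then sal end) > 2500
        order by empno
    """
    result = [row[0] for row in con.execute(query)]
    con.close()
    return result


print(qualifying_employees(ROWS))
```

One difference worth keeping in mind: if an employee had two rows with the same sal_date, row_number() would pick one of them arbitrarily, whereas the Oracle keep clause aggregates over all tied rows.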

Related

Problem with MINUS and sub queries with ORDER BYs

select salary
from (
(select salary
from employees
where rownum<=10
order by salary desc)
minus
(select salary
from employees
where rownum<=4
order by salary desc)
);
You cannot use ORDER BY there.
Try this instead:
select salary from (
select salary, row_number() over ( order by salary desc ) rn
from employees )
where rn between 5 and 10;
On Oracle 12c or later, you can also do this:
select salary from employees
order by salary desc
offset 4 rows fetch next 6 rows only;
You've got several issues in what you've written. The immediate problem is that you'll get an error from having an order by in the first branch of your minus, but just removing that won't help you much.
You're making a (fairly common) mistake with ordering and rownum; looking just at the first subquery you have:
select salary
from employees
where rownum<=10
order by salary desc
The rownum filter will be applied before the order-by, so what this will actually produce is 10 indeterminate rows from the table, which are then ordered. If I run that I get:
SALARY
----------
24000
13000
12000
10000
8300
6500
6000
4400
2600
2600
but you'll see different values, even from the same sample schema. If you look at the whole table you'll see higher values than those; and even running the second query will show something isn't as you expect - for me that gets:
SALARY
----------
13000
4400
2600
2600
which are not the first four rows from the previous query. (Again, you'll see different results, but hopefully the same effect; if not, look at the whole table ordered by salary.)
You need to order the whole table - in a subquery - and then filter:
select salary
from (
select salary
from employees
order by salary desc
)
where rownum<=10
which gives a much more sensible - and consistent - result. You can then minus the two queries:
select salary
from (
select salary
from employees
order by salary desc
)
where rownum<=10
minus
select salary
from (
select salary
from employees
order by salary desc
)
where rownum<=4
order by salary desc;
SALARY
----------
13500
13000
12000
11500
You may be expecting to see six values there, but there are three employees with a salary of 12000, and minus eliminates duplicates so that is only reported once. @Matthew's approach (or @Jeff's!) will give you all six, including duplicates, if that is what you want. It also stops you having to hit the table multiple times.
A further problem is with ties - if the 4th highest was the same as the 5th highest, what would you expect to happen? Using minus would exclude that value; @Matthew's approach would preserve it.
You need to define what you actually want to get - the 5th to 10th highest salary values? The salaries of the 5th to 10th highest-paid people (a subtle but important difference)? Do you really only want the numbers, or who those employees are - in which case how you deal with ties is even more important? Etc. Once you know what you actually need to find you can decide the best way to get that result.
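The difference between the two approaches can be reproduced outside Oracle. This is a rough sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later for window functions); the salary list is invented for the illustration and is not the real sample schema, except stands in for minus, and limit stands in for the rownum filter on an ordered subquery. The helper names are hypothetical.

```python
import sqlite3

# Hypothetical salaries, chosen so that 12000 appears three times in the top 10.
SALARIES = [24000, 17000, 17000, 14000, 13500, 13000,
            12000, 12000, 12000, 11500, 11000]


def make_connection(salaries):
    """Create an in-memory employees table holding just a salary column."""
    con = sqlite3.connect(":memory:")
    con.execute("create table employees (salary integer)")
    con.executemany("insert into employees values (?)", [(s,) for s in salaries])
    return con


def minus_approach(con):
    """Top 10 minus top 4; the set operator removes duplicate salaries."""
    query = """
        select salary from (select salary from employees order by salary desc limit 10)
        except
        select salary from (select salary from employees order by salary desc limit 4)
        order by salary desc
    """
    return [row[0] for row in con.execute(query)]


def row_number_approach(con):
    """Rows ranked 5 to 10 by salary; duplicates are preserved."""
    query = """
        select salary
        from (select salary,
                     row_number() over (order by salary desc) as rn
              from employees)
        where rn between 5 and 10
        order by rn
    """
    return [row[0] for row in con.execute(query)]


con = make_connection(SALARIES)
print(minus_approach(con))
print(row_number_approach(con))
```

With this made-up data the set-operator version reports 12000 only once, while the row_number() version returns all six rows.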
It doesn't make sense to order rows in two sets that are subsequently operated upon because sets don't have order. If you need a solution that can execute on older versions and you want to return the bottom 6 ranked out of the top 10 ranked, then this will work. If you can use newer features, then you may want to because it's possible they'll require fewer machine instruction executions.
After making the obvious changes that escaped me in my haste...
select salary
from (
select rownum rn, salary
from (
select salary
from employees
order by salary desc
)
)
where rn between 5 and 10

How to convert this code from oracle to redshift?

I am trying to implement the same logic in Redshift and am finding it a little difficult. Since Redshift is built on top of the PostgreSQL engine, a PostgreSQL solution would be really helpful too. Basically, the code gets the count for each of the previous two months as columns. If there is no count for the exact previous month, it gives 0.
This is my code:
with abc(dateval,cnt) as(
select 201908, 100 from dual union
select 201907, 200 from dual union
select 201906, 300 from dual union
select 201904, 600 from dual)
select dateval, cnt,
last_value(cnt) over (order by dateval
range between interval '1' month preceding
and interval '1' month preceding ) m1,
last_value(cnt) over (order by dateval
range between interval '2' month preceding
and interval '2' month preceding ) m2
from (select to_date(dateval, 'yyyymm') dateval, cnt from abc)
I get an error in the over clause. I tried cast('1 month' as interval), but it still fails. Can someone please help me with this window function?
expected output:
Regards
This is how I would do it. In Redshift there's no easy way to generate sequences, so I select row_number() from an arbitrary table to create a sequence:
with abc(dateval,cnt) as(
select 201908, 100 union
select 201907, 200 union
select 201906, 300 union
select 201904, 600),
cal(date) as (
select
add_months(
'20190101'::date,
row_number() over () - 1
) as date
from <an arbitrary table to generate a sequence of rows> limit 10
),
with_lag as (
select
dateval,
cnt,
lag(cnt, 1) over (order by date) as m1,
lag(cnt, 2) over (order by date) as m2
from abc right join cal on to_date(dateval, 'YYYYMM') = date
)
select * from with_lag
where dateval is not null
order by dateval
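The calendar-plus-lag idea can be tried locally before running it on Redshift. Below is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later); the calendar is generated in Python instead of from an arbitrary table, months are kept as yyyymm integers, and coalesce is added so that a missing month yields 0 as the question asks. The helper names month_keys and previous_month_counts are hypothetical.

```python
import sqlite3

# Counts from the question, keyed by yyyymm.
COUNTS = [(201908, 100), (201907, 200), (201906, 300), (201904, 600)]


def month_keys(start_year, start_month, count):
    """Return `count` consecutive yyyymm integers starting at the given month."""
    keys = []
    year, month = start_year, start_month
    for _ in range(count):
        keys.append(year * 100 + month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def previous_month_counts(counts, calendar):
    """For each month with data, return (month, cnt, previous month, two months back)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table abc (dateval integer, cnt integer)")
    con.execute("create table cal (month integer)")
    con.executemany("insert into abc values (?, ?)", counts)
    con.executemany("insert into cal values (?)", [(m,) for m in calendar])
    query = """
        with with_lag as (
            select cal.month as month,
                   abc.cnt   as cnt,
                   -- every calendar month is present, so lag 1 is the previous month
                   lag(coalesce(abc.cnt, 0), 1, 0) over (order by cal.month) as m1,
                   lag(coalesce(abc.cnt, 0), 2, 0) over (order by cal.month) as m2
            from cal left join abc on abc.dateval = cal.month
        )
        select month, cnt, m1, m2
        from with_lag
        where cnt is not null
        order by month
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in previous_month_counts(COUNTS, month_keys(2019, 1, 10)):
    print(row)
```

The design point is the same as in the answer: because the calendar has one row per month, a plain row offset (lag) becomes equivalent to "one month preceding".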

How to get records based on most occurring value?

I have the following table and I want to get the records whose promotion occurs most often.
For example, if I have two events
FREQ_VISITOR, value= 250
HIGH_SHOPPER, value= 320
then Promo 1 and Promo 2 should be in the result, since these two promos occur most often across the triggers and their given values.
Here's one option, based on what I understood:
with test (event_name, value, promotion) as
(select 'freq_visitor', 250, 'promo1' from dual union all
select 'high_shopper', 320, 'promo2' from dual union all
select 'freq_visitor', 250, 'promo3' from dual union all
select 'high_shopper', 320, 'promo1' from dual union all
select 'freq_visitor', 250, 'promo2' from dual
),
cnt_promo as
(select promotion, count(*) cnt
from test
group by promotion
),
most_promos as
(select max(cnt) max_cnt
from cnt_promo
)
select c.promotion
from cnt_promo c join most_promos m on c.cnt = m.max_cnt;
PROMOT
------
promo1
promo2
This is a good candidate for analytic functions.
The below code is a bit longer than the self-join approach, and flows from the inside-out instead of a more traditional top-to-bottom direction. But this approach will likely be faster, since it only reads from the table once. And this approach is easier to debug than common table expressions, since you can highlight and run different inline views and watch the result set be built.
--Promotions with the highest counts.
select promotion
from
(
--RANK the promotion counts.
select promotion, promotion_count,
rank() over (order by promotion_count desc) promotion_rank
from
(
--Count of rows per promotion.
select promotion, count(*) promotion_count
from
(
--Test data
select 'freq_visitor' event_name, 250 value, 'promo1' promotion from dual union all
select 'high_shopper' event_name, 320 value, 'promo2' promotion from dual union all
select 'freq_visitor' event_name, 250 value, 'promo3' promotion from dual union all
select 'high_shopper' event_name, 320 value, 'promo1' promotion from dual union all
select 'freq_visitor' event_name, 250 value, 'promo2' promotion from dual
) test_data
group by promotion
) add_promo_count
) add_promo_rank
where promotion_rank = 1
order by promotion;

use RANK or DENSE_RANK along with aggregate function

I have a table with the following data:
SCORE ROW_ID NAME
0.4 1011 ABC
0.95 1011 DEF
0.4 501 GHI
0.95 501 XYZ
At any point in time, I only need the single row with the maximum score; if there is more than one such record, take the one with the minimum row_id.
Is it possible to achieve this using the RANK or DENSE_RANK function? What about partition by?
MAX(score) keep(dense_rank first order by row_id)
You are looking for max score, one row, so use row_number():
select score, row_id, name
from (select t.*, row_number() over (order by score desc, row_id) rn from t)
where rn = 1
You can use rank and dense_rank in your example, but they can return more than one row, for instance when you add row (0.95, 501, 'PQR') to your data.
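That difference is easy to check. Here is a small sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) in place of Oracle; the table t holds the question's rows plus the extra (0.95, 501, 'PQR') row, and the function name top_names is made up for the illustration.

```python
import sqlite3

ROWS = [
    (0.4, 1011, "ABC"),
    (0.95, 1011, "DEF"),
    (0.4, 501, "GHI"),
    (0.95, 501, "XYZ"),
    (0.95, 501, "PQR"),  # the extra row mentioned above
]

ALLOWED = ("rank", "dense_rank", "row_number")


def top_names(function_name):
    """Names of the rows numbered 1 by the given ranking function."""
    if function_name not in ALLOWED:
        raise ValueError("unsupported ranking function: " + function_name)
    con = sqlite3.connect(":memory:")
    con.execute("create table t (score real, row_id integer, name text)")
    con.executemany("insert into t values (?, ?, ?)", ROWS)
    query = f"""
        select name
        from (select t.*,
                     {function_name}() over (order by score desc, row_id) as rn
              from t)
        where rn = 1
    """
    names = [row[0] for row in con.execute(query)]
    con.close()
    return names


for fn in ALLOWED:
    print(fn, sorted(top_names(fn)))
```

rank() and dense_rank() number both tied rows 1, so two names come back; row_number() returns exactly one of them, and which one is not determined unless a further tie-breaker is added to the order by.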
keep dense_rank is typically used when the value you want is different from the column you order by, for instance when looking for the salary of the employee who has worked the longest:
max(salary) keep (dense_rank first order by sysdate - hiredate desc)
max in this case means that if two or more employees have worked the longest, for exactly the same number of days, then we take the highest salary among them.
max(salary)
keep (dense_rank first order by sysdate - hiredate desc)
over (partition by deptno)
This is the same as above, but the salary of the longest-working employee is shown for each department separately. You can even use an empty over() to show the salary of the longest-working employee in a separate column alongside other data such as name, salary, and hire_date.
You don't need to use dense_rank. This would help:
SELECT * FROM (
SELECT
SCORE,
ROW_ID,
NAME
FROM T
-- highest SCORE first; ties go to the lowest ROW_ID, as the question asks
ORDER BY SCORE DESC, ROW_ID ASC
)
WHERE ROWNUM = 1;

select minimum date after maximum gap oracle

I have data for a member like below
EFF_DT-Term_dt
1/1/13-7/31/14
1/1/15-3/31/15
5/1/15-5/31/15
6/1/15-12/31/15
1/1/16-12/31/16
Here there are 2 gaps - after 7/31/14 and 3/31/15. I want to select the row 5/1/15-5/31/15 as it is the minimum date after maximum gap. I tried using
select ( FIRST_VALUE(EFF_DT) OVER (PARTITION BY MemberID ORDER BY FLAG DESC) AS CUR_EFF_DT)
from
(
select EFF_DT,
CASE WHEN LAG(TERM_DT, 1) OVER (PARTITION BY MemberID ORDER BY TERM_DT) = EFF_DT - 1 THEN 0
ELSE sequence.nextval
END AS FLAG
from effective_dates_table).
This gives the correct result, but I don't want to use a sequence. Is there an easier way to do this?
Here's one way... compute the differences using the analytic LAG() function, then group by member_id and use the aggregate LAST() function.
NOTE: there may be more than one pair of rows with the same, greatest gap between term_dt and the following eff_dt. You must clarify which row should be selected if that happens. The solution below picks the earliest occurrence (if this happens). If you want the latest occurrence, change MIN to MAX. If you want something else, just say what the requirement is.
with
inputs ( member_id, eff_dt, term_dt ) as (
select 101, to_date('1/1/13', 'mm/dd/yy'), to_date('7/31/14' , 'mm/dd/yy') from dual union all
select 101, to_date('1/1/15', 'mm/dd/yy'), to_date('3/31/15' , 'mm/dd/yy') from dual union all
select 101, to_date('5/1/15', 'mm/dd/yy'), to_date('5/31/15' , 'mm/dd/yy') from dual union all
select 101, to_date('6/1/15', 'mm/dd/yy'), to_date('12/31/15', 'mm/dd/yy') from dual union all
select 101, to_date('1/1/16', 'mm/dd/yy'), to_date('12/31/16', 'mm/dd/yy') from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- Use your actual table and column names in the SQL query below.
select member_id,
min(eff_dt) keep (dense_rank last order by diff nulls first) as eff_dt,
min(term_dt) keep (dense_rank last order by diff nulls first) as term_dt
from (
select member_id, eff_dt, term_dt,
eff_dt - lag(term_dt) over (partition by member_id order by eff_dt) as diff
from inputs
)
group by member_id
;
MEMBER_ID EFF_DT TERM_DT
--------- ------------------- -------------------
101 2015-01-01 00:00:00 2015-03-31 00:00:00
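The same "row after the largest gap" logic can be sketched without the keep clause. This is an illustration only, assuming Python's built-in sqlite3 module (SQLite 3.25 or later): dates are stored as ISO text, julianday() supplies the day difference, and row_number() ordered by the gap replaces the LAST() aggregate. The function name row_after_largest_gap is hypothetical.

```python
import sqlite3

# The member's coverage periods from the question, as ISO dates.
PERIODS = [
    (101, "2013-01-01", "2014-07-31"),
    (101, "2015-01-01", "2015-03-31"),
    (101, "2015-05-01", "2015-05-31"),
    (101, "2015-06-01", "2015-12-31"),
    (101, "2016-01-01", "2016-12-31"),
]


def row_after_largest_gap(periods):
    """Per member, the period that starts right after the largest gap."""
    con = sqlite3.connect(":memory:")
    con.execute("create table inputs (member_id integer, eff_dt text, term_dt text)")
    con.executemany("insert into inputs values (?, ?, ?)", periods)
    query = """
        with gaps as (
            select member_id, eff_dt, term_dt,
                   -- days between this start date and the previous end date
                   julianday(eff_dt) - julianday(
                       lag(term_dt) over (partition by member_id order by eff_dt)
                   ) as diff
            from inputs
        ),
        ranked as (
            select member_id, eff_dt, term_dt,
                   -- largest gap first; NULL gaps sort last in SQLite's desc order,
                   -- and ties go to the earliest eff_dt
                   row_number() over (
                       partition by member_id
                       order by diff desc, eff_dt
                   ) as rn
            from gaps
        )
        select member_id, eff_dt, term_dt
        from ranked
        where rn = 1
        order by member_id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(row_after_largest_gap(PERIODS))
```

As in the Oracle query, ties on the largest gap resolve to the earliest occurrence; changing eff_dt to eff_dt desc in the inner order by would select the latest one instead.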
