Oracle select random rows matching a join condition - oracle

My objective is simple, I have to create a temporary table with some random values from a employee table whenever the department is in some particular department (say 2). For the rest of departments I don't care the value, it can be NULL.
Currently I have the following :
create table test
as
select s.DEPTNAME,
cast (
(case when s.DEPTID in (2) then
(SELECT a.ENAME FROM
(SELECT b.ENAME, b.DEPTID FROM EMPLOYEE b
WHERE b.DEPTID IS NOT NULL
ORDER BY DBMS_RANDOM.VALUE) a
WHERE a.DEPTID = s.DEPTID AND ROWNUM = 1
)
END)
AS VARCHAR2(30)) "ENAME" from DEPARTMENT s;
But the main issue here is related to performance. For every department value in 2 we do a sort of EMPLOYEE table to get a single random ENAME.
Is there a better way to do this ? I know sample might work but I want to achieve more randomness.

First idea - join randomly numbered enames:
with
e as (select ename, deptid, row_number() over (order by dbms_random.value) rn
from employee where deptid = 2),
c as (select count(1) cnt from e),
d as (select deptname, deptid, round(dbms_random.value(1, c.cnt)) rn from department, c)
select d.deptname, e.ename from d left join e using (rn, deptid)
SQLFiddle demo
Second possible solution, which worked for me, is to create function returning random ename from table employee
and use it in your query, but it would be probably slower.
Edit - according to comment:
If, for some reason, the first part of your statement is "fixed", then you could use this syntax:
create table test as
select deptname, ename from (
with
e as (select ename, deptid, row_number() over (order by dbms_random.value) rn
from employee where deptid = 2),
c as (select count(1) cnt from e),
d as (select deptname, deptid, round(dbms_random.value(1, c.cnt)) rn
from department cross join c)
select d.deptname, e.ename from d left join e using (rn, deptid));

Related

Oracle How to make SELECT INSIDE A SELECT work?

Just wondering why the following select isn't working:
SELECT
A.FIELD1
, (SELECT PCN FROM (select B.PRIORITY, B.PCN
from
TABLE2 B
WHERE B.CUST= A.CUST
ORDER BY B.PRIORITY)
WHERE ROWNUM = 1) AS PCN
FROM TABLE1 A;
ERROR at line 2: ORA-00904: "A"."CUST": invalid identifier
Important to mention:
TABLE1 has as fields FIELD1, CUST.
TABLE2 has as fields PCN, PRIORITY, CUST.
Thanks in advance.
Your query shouldn't give you that error message, on when you remove the outer qiery this would happen
CREATE tABLE TABLE1 (FIELD1 int, CUST int)
INSERT INTO TABLE1 VALUES(1,1)
1 rows affected
CREATE TABLE TABLE2 (PCN int, PRIORITY int, CUST int)
INSERT INTO TABLE2 VALUES (1,1,1)
1 rows affected
SELECT
A.FIELD1
, (SELECT PCN FROM (select B.PRIORITY, B.PCN
from
TABLE2 B
WHERE B.CUST= A.CUST
ORDER BY B.PRIORITY)
WHERE ROWNUM = 1) AS PCN
FROM TABLE1 A;
FIELD1
PCN
1
1
fiddle
You can't nest inline selects (more than one level) without losing the ability of the inner nested selects being able to reference the parent block. So your query on TABLE2 cannot see the columns from TABLE1 because of this nesting.
Try this:
SELECT a.field1,
pcn.pcn
FROM table1 a,
(SELECT b.cust,
b.priority,
b.pcn,
ROW_NUMBER() OVER (PARTITION BY b.cust ORDER BY b.priority DESC) seq
FROM table2 b) pcn
WHERE a.cust = pcn.cust(+)
AND pcn.seq(+) = 1
That will work well for report queries. If you end up adding a filter on a specific customer, then you would be better off using OUTER APPLY if you have a recent-enough version of Oracle that supports that.
You could try this:
SELECT
A.FIELD1
, (SELECT B.PCN
from
TABLE2 B
WHERE B.CUST= A.CUST
ORDER BY B.PRIORITY
FETCH FIRST 1 ROWS ONLY) AS PCN
FROM TABLE1 A;
FETCH FIRST 1 ROWS ONLY gets you the first ordered record. Works on 12c and up and supports nesting, and no 2nd subquery needed.
Yet another option might be a CTE.
Sample data:
SQL> with
2 table1 (field1, cust) as
3 (select 1, 100 from dual union all
4 select 2, 200 from dual
5 ),
6 table2 (pcn, priority, cust) as
7 (select 10, 1, 100 from dual union all
8 select 20, 2, 100 from dual union all
9 select 30, 1, 200 from dual
10 ),
Query begins here. Rank rows by priority, and then fetch the ones that rank as the highest (line #20):
11 temp as
12 (select a.field1,
13 b.pcn,
14 rank() over (partition by a.field1 order by b.priority desc) rnk
15 from table1 a join table2 b on a.cust = b.cust
16 )
17 select field1,
18 pcn
19 from temp
20 where rnk = 1;
FIELD1 PCN
---------- ----------
1 20
2 30
SQL>
You may use first aggregate function to achieve the same (assuming that you have completely deterministic order by) functionality without nested subquery:
select
a.field1
, (
select max(b.pcn) keep(dense_rank first order by b.priority)
from table2 b
where b.cust = a.cust
) as pcn
from table1 a
which for this sample data
insert into table1 values(1,1);
insert into table1 values(2,2);
insert into table2 values(1,1,1);
insert into table2 values(2,2,1)
returns
FIELD1
PCN
1
1
2
(null)
SQL fiddle

How can I count the amount of values in different columns in oracle plsql

For example, I have a table with these values:
ID
Date
Col1
Col2
Col3
Col4
1
01/11/2021
A
A
B
2
01/11/2021
B
B
The A and B values are dynamic, they can be other characters as well.
Now I need somehow to get to the result that id 1 has 2 occurences of A and one of B. Id 2 has 0 occurences of A and 2 occurences of B.
I'm using dynamic SQL to do this:
for v_record in table_cursor
loop
for i in 1 .. 4
loop
v_query := 'select col'||i||' from table where id = '||v_record.id;
execute immediate v_query into v_char;
if v_char = "any letter I'm checking" then
amount := amount + 1;
end if;
end loop;
-- do somehting with the amount
end loop;
But there has to be a better much more efficient way to do this.
I don't have that much knowledge of plsql and I really don't know how to formulate this question in google. I've looked into pivot, but I don't think that will help me out in this case.
I'd appreciate it if someone could help me out.
Assuming the number of columns would be fixed at four, you could use a union aggregation approach here:
WITH cte AS (
SELECT ID, Col1 AS val FROM yourTable UNION ALL
SELECT ID, Col2 FROM yourTable UNION ALL
SELECT ID, Col3 FROM yourTable UNION ALL
SELECT ID, Col4 FROM yourTable
)
SELECT
t1.ID,
t2.val,
COUNT(c.ID) AS cnt
FROM (SELECT DISTINCT ID FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT val FROM cte) t2
LEFT JOIN cte c
ON c.ID = t1.ID AND
c.val = t2.val
WHERE
t2.val IS NOT NULL
GROUP BY
t1.ID,
t2.val;
This produces:
Demo

Report Job difference in HR schema

I'm new to Oracle and try to practice with HR schema. For example I want to report of those whose job is different from the previous job.
Employee name in employees table and job history in job_history table.
I think the following query will help. (I am considering that current JOB_ID is present in the EMPLOYEES table and you want to compare it with the latest previous JOB_ID from JOB_HISTORY table for the employee)
SELECT E.*, JH.LATEST_PREV_JOB_ID
FROM EMPLOYEES E
JOIN (SELECT FIRST_VALUE(JH.JOB_ID) OVER (PARTITION BY JH.EMPLOYEE_ID
ORDER BY JH.START_DATE DESC NULLS LAST) AS LATEST_PREV_JOB_ID,
JH.EMPLOYEE_ID
FROM JOB_HISTORY JH) JH
ON E.EMPLOYEE_ID = JH.EMPLOYEE_ID
WHERE E.JOB_ID <> JH.LATEST_PREV_JOB_ID
----- Update
You want the query without partition by clause (i.e. without WINDOWS function), We can use the NOT EXISTS as follows:
SELECT E.*, JH.LATEST_PREV_JOB_ID
FROM EMPLOYEES E
JOIN (SELECT JH.JOB_ID AS LATEST_PREV_JOB_ID,
JH.EMPLOYEE_ID
FROM JOB_HISTORY JH
WHERE NOT EXISTS (SELECT 1 FROM JOB_HISTORY JHIN
WHERE JHIN.EMPLOYEE_ID = JH.EMPLOYEE_ID
AND JHIN.START_DATE > JH.START_DATE)) JH
ON E.EMPLOYEE_ID = JH.EMPLOYEE_ID
WHERE E.JOB_ID <> JH.LATEST_PREV_JOB_ID
If I understood right your question, then the answer maybe something like this:
select
e.first_name,
e.last_name,
e.job_id as prev_job,
jh.job_id as last_lob
from
employees e,
job_history jh,
(select
employee_id,
max(end_date) as max_end_date
from
job_history
group by
employee_id
) t
where
(jh.employee_id = e.employee_id) and
(jh.job_id <> e.job_id) and
(jh.end_date = t.max_end_date) and
(t.employee_id = jh.employee_id)

how to get a row number for each group in report?

i am making a report that getting employees for each group and i need to get a row number for each employee in each department group
select rownum,e.empno, e.ename, e.sal, e.comm, e.deptno, d.dname
from emp e
left join dept d on emp.deptno = dept.deptno
order by deptno;
We can try using ROW_NUMBER here:
SELECT
ROW_NUMBER() OVER (PARTITION BY emp.deptno ORDER BY emp.ename) rn,
emp.empno,
emp.ENAME,
emp.SAL,
emp.COMM,
emp.DEPTNO,
dept.dname
FROM emp
LEFT JOIN dept
ON emp.deptno = dept.deptno
ORDER BY
emp.deptno;
Note that you never provided logic for what should decide the ordering of each employee within a department. In the absence of this, I have used the employee name. If you want some other ordering, then just modify the ORDER BY clause in the call to ROW_NUMBER.

How to optimize this SELECT with sub query Oracle

Here is my query,
SELECT ID As Col1,
(
SELECT VID FROM TABLE2 t
WHERE (a.ID=t.ID or a.ID=t.ID2)
AND t.STARTDTE =
(
SELECT MAX(tt.STARTDTE)
FROM TABLE2 tt
WHERE (a.ID=tt.ID or a.ID=tt.ID2) AND tt.STARTDTE < SYSDATE
)
) As Col2
FROM TABLE1 a
Table1 has 48850 records and Table2 has 15944098 records.
I have separate indexes in TABLE2 on ID,ID & STARTDTE, STARTDTE, ID, ID2 & STARTDTE.
The query is still too slow. How can this be improved? Please help.
I'm guessing that the OR in inner queries is messing up with the optimizer's ability to use indexes. Also I wouldn't recommend a solution that would scan all of TABLE2 given its size.
This is why in this case I would suggest using a function that will efficiently retrieve the information you are looking for (2 index scan per call):
CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE)
RETURN table2.vid%TYPE IS
l_result table2.vid%TYPE;
BEGIN
SELECT vid
INTO l_result
FROM (SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
UNION ALL
SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id2 = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
ORDER BY startdte DESC)
WHERE rownum = 1;
RETURN l_result;
END;
Your SQL would become:
SELECT ID As Col1,
getvid(a.id) vid
FROM TABLE1 a
Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The order of the index is very important.
Possibly try the following, though untested.
WITH max_times AS
(SELECT a.ID, MAX(t.STARTDTE) AS Startdte
FROM TABLE1 a, TABLE2 t
WHERE (a.ID=t.ID OR a.ID=t.ID2)
AND t.STARTDTE < SYSDATE
GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
LEFT OUTER JOIN max_times mt
ON (b.ID = mt.ID)
LEFT OUTER JOIN TABLE2 tt
ON ((mt.ID=tt.ID OR mt.ID=tt.ID2)
AND mt.startdte = tt.startdte)
You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:
SELECT id AS col1, vid
FROM (
SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
NULLS LAST) AS rn
FROM table1 t1
JOIN table2 t2 ON t2.id IN (t1.ID, t1.ID2)
)
WHERE rn = 1;
The inner select gets the id and vid values from the two tables with a simple join on id or id2. The rank function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by you wanting to filter on that date, so I've used a case to effectively ignore any dates today or later by changing the evaluated value to null, and in this instance that means the order by in the over clause needs nulls last so they're ignored.
I'd suggest you run the inner select on its own first - maybe with just a couple of id values for brevity - to see what its doing, and what ranks are being allocated.
The outer query is then just picking the top-ranked result for each id.
You may still get duplicates though; if table2 has more than one row for an id with the same startdte they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the order by to break ties in a way that makes sens to you.
But this is largely speculation without being able to see where your existing query is actually slow.

Resources