Oracle Self-Join on multiple possible column matches - CONNECT BY?

I have a query requirement from ----. Trying to solve it with CONNECT BY, but can't seem to get the results I need.
Table (simplified):
create table CSS.USER_DESC (
USER_ID VARCHAR2(30) not null,
NEW_USER_ID VARCHAR2(30),
GLOBAL_HR_ID CHAR(8)
)
-- USER_ID is the primary key
-- NEW_USER_ID is a self-referencing key
-- GLOBAL_HR_ID is an ID field from another system
There are two sources of user data (datafeeds)... I have to watch for mistakes in either of them when updating information.
Scenarios:
A user is given a new User ID... The old record is set accordingly and deactivated (typically a rename for contractors who become fulltime)
A user leaves and returns sometime later. HR fails to send us the old user ID so we can connect the accounts.
The system screwed up and didn't set the new User ID on the old record.
The data can be bad in a hundred other ways
I need to know the following are the same user, and I can't rely on name or other fields... they differ among matching records:
ROOTUSER NUMROOTS NODELEVEL ISLEAF USER_ID  NEW_USER_ID GLOBAL_HR_ID USERTYPE   LAST_NAME       FIRST_NAME
-------- -------- --------- ------ -------- ----------- ------------ ---------- --------------- ----------
EX0T1100        2         1      0 EX0T1100 EX000005                 CONTRACTOR VON DER HAAVEN  VERONICA
EX0T1100        2         2      1 EX000005             00126121     EMPLOYEE   HAAVEN, VON DER VERONICA
GL110456        1         1      1 GL110456             00126121     EMPLOYEE   VONDERHAAVEN    VERONICA
EX0T1100 and EX000005 are connected properly by the NEW_USER_ID field. The rename occurred before there were global HR IDs, so EX0T1100 doesn't have one. EX000005 was given a new user ID, 'GL110456', and the two are only connected by having the same global HR ID.
Cleaning up the data isn't an option.
The query so far:
select connect_by_root cud.user_id RootUser,
count(connect_by_root cud.user_id) over (partition by connect_by_root cud.user_id) NumRoots,
level NodeLevel, connect_by_isleaf IsLeaf, --connect_by_iscycle IsCycle,
cud.user_id, cud.new_user_id, cud.global_hr_id,
cud.user_type_code UserType, cud.last_name, cud.first_name
from css.user_desc cud
where cud.user_id in ('EX000005','EX0T1100','GL110456')
-- Using this so I don't get sub-users in my list of root users...
-- It complicates the matches with GLOBAL_HR_ID, however
start with cud.user_id not in (select cudsub.new_user_id
from css.user_desc cudsub
where cudsub.new_user_id is not null)
connect by nocycle (prior new_user_id = user_id);
I've tried various CONNECT BY clauses, but none of them are quite right:
-- As a multiple CONNECT BY
connect by nocycle (prior global_hr_id = global_hr_id)
connect by nocycle (prior new_user_id = user_id)
-- As a compound CONNECT BY
connect by nocycle ((prior new_user_id = user_id)
or (prior global_hr_id = global_hr_id
and user_id != prior user_Id))
UNIONing two CONNECT BY queries doesn't work... I don't get the leveling.
Here is what I would like to see... I'm okay with a resultset that I have to distinct and use as a subquery. I'm also okay with any of the three user IDs in the ROOTUSER column... I just need to know they're the same users.
ROOTUSER NUMROOTS NODELEVEL ISLEAF USER_ID  NEW_USER_ID GLOBAL_HR_ID USERTYPE   LAST_NAME       FIRST_NAME
-------- -------- --------- ------ -------- ----------- ------------ ---------- --------------- ----------
EX0T1100        3         1      0 EX0T1100 EX000005                 CONTRACTOR VON DER HAAVEN  VERONICA
EX0T1100        3         2      1 EX000005             00126121     EMPLOYEE   HAAVEN, VON DER VERONICA
EX0T1100        3  (2 or 3)      1 GL110456             00126121     EMPLOYEE   VONDERHAAVEN    VERONICA
Ideas?
Update
Nicholas, your code looks very much like the right track... at the moment, the lead(user_id) over (partition by global_hr_id) gets false hits when the global_hr_id is null. For example:
USER_ID NEW_USER_ID CHAINNEWUSER GLOBAL_HR_ID LAST_NAME FIRST_NAME
FP004468 FP004469 AARON TIMOTHY
FP004469 FOONG KOK WAH
I've often wanted to treat nulls as separate records in a partition, but I've never found a way to make ignore nulls work. This did what I wanted:
decode(global_hr_id,null,null,lead(cud.user_id ignore nulls) over (partition by global_hr_id order by user_id))
... but there's got to be a better way. I haven't been able to get the query to finish yet on the full-blown user data (about 40,000 users). Both global_hr_id and new_user_id are indexed.
Update
The query returns after about 750 seconds... long, but manageable. It returns 93k records, because I don't have a good way of filtering level 2 hits out of the roots. You have start with global_hr_id is null, but unfortunately the root records don't always have a null global_hr_id. I'll have to think some more about how to filter those out.
I've tried adding more complex start with clauses before, but I find that separately, they run < 1 second... together, they take 90 minutes >.<
Thanks again for your help... plodding away at this.

You have provided sample data for only one user; it would be better to have a little more. Anyway, let's look at something like this.
with user_desc(USER_ID, NEW_USER_ID, GLOBAL_HR_ID) as (
    -- the HR ID is an unquoted numeric literal here, so the output shows it without leading zeros
    select 'EX0T1100', 'EX000005', null from dual union all
    select 'EX000005', null, 00126121 from dual union all
    select 'GL110456', null, 00126121 from dual
)
select connect_by_root(user_id) rootuser
     , count(connect_by_root(user_id)) over(partition by connect_by_root(user_id)) numroot
     , level nodlevel
     , connect_by_isleaf
     , user_id
     , new_user_id
     , global_hr_id
  from (select user_id
             , coalesce(new_user_id, usr) new_user_id1
             , new_user_id
             , global_hr_id
          from (select user_id
                     , new_user_id
                     , global_hr_id
                     , decode(global_hr_id, null, null, lead(user_id) over (partition by global_hr_id order by user_id)) usr
                  from user_desc
               )
       )
 start with global_hr_id is null
connect by prior new_user_id1 = user_id
;
Result:
ROOTUSER    NUMROOT   NODLEVEL CONNECT_BY_ISLEAF USER_ID  NEW_USER_ID GLOBAL_HR_ID
-------- ---------- ---------- ----------------- -------- ----------- ------------
EX0T1100          3          1                 0 EX0T1100 EX000005
EX0T1100          3          2                 0 EX000005                   126121
EX0T1100          3          3                 1 GL110456                   126121
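Stepping back from CONNECT BY, the requirement amounts to grouping records that are linked by either NEW_USER_ID or a shared non-null GLOBAL_HR_ID. The following is only a minimal sketch of that grouping logic in Python (union-find), not a replacement for the query above. The function name group_users and the two FP rows (modelled as unlinked records with null HR IDs) are assumptions made for illustration.

```python
# Rows are (user_id, new_user_id, global_hr_id); the FP rows are
# illustrative records whose HR ID is null and must NOT be linked.
rows = [
    ("EX0T1100", "EX000005", None),
    ("EX000005", None, "00126121"),
    ("GL110456", None, "00126121"),
    ("FP004468", None, None),
    ("FP004469", None, None),
]


def group_users(rows):
    """Group user IDs linked by new_user_id or by a shared non-null HR ID."""
    parent = {user_id: user_id for user_id, _, _ in rows}

    def find(x):
        # Follow parent pointers to the representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as representative so results are stable.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    first_by_hr = {}
    for user_id, new_user_id, hr_id in rows:
        if new_user_id is not None and new_user_id in parent:
            union(user_id, new_user_id)
        if hr_id is not None:  # null HR IDs never link records
            if hr_id in first_by_hr:
                union(user_id, first_by_hr[hr_id])
            else:
                first_by_hr[hr_id] = user_id

    groups = {}
    for user_id in parent:
        groups.setdefault(find(user_id), []).append(user_id)
    return {root: sorted(members) for root, members in groups.items()}


groups = group_users(rows)
```

Here the three IDs from the question end up in one group, while the two records with null HR IDs stay separate, which is the behaviour the decode(...) guard is meant to achieve in SQL.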

Related

Extract a sub-tree from a hierarchy tree based on a leaf in Oracle

I have a table users representing a hierarchical tree like this:
Column     Type          Comment
---------  ------------  --------------------------------------------------------
user_id    integer       sequence
user_type  integer       1 for group of users, 2 for normal user
group_id   integer       Reference to a user in the same table with user_type = 1
user_name  varchar(xxx)
The group_id column references another user_id so that groups and users are stored in the same table.
The master group_id is 0.
Like this:
user_id  user_type  group_id  user_name
-------  ---------  --------  -----------
0        1          null      'All users'
5        2          0         'USER1'
6        2          0         'USER2'
11       1          0         'SUBGROUP1'
12       1          11        'SUBGROUP2'
13       2          12        'USER3'
20       1          0         'SUBGROUP3'
21       2          20        'USER4'
Notice that:
There can be gaps in user_id.
A group can contain nothing or any number of groups or users.
I have already managed to retrieve the full tree, properly indented and sorted, by using the connect by oracle statement.
This is not my question here.
My question is:
Given a user_id as input to a query, how can I walk up the tree to the master group 'All Users'
and output the full path from the leaf to the master group?
Example 1: I run the query for USER1, and I want the following output:
All Users
- USER1
Example 2: I run the same query for USER3, and I want the following output:
All Users
- SUBGROUP1
-- SUBGROUP2
--- USER3
I hope someone can help me with this.
For information, here is the query that retrieves the full tree, so you can see the use of connect by and start with.
I'm sure this query is close to the one I want, but my attempts never produce the result I want.
select
lpad('-', (level - 1) * 2, ' ') || u.user_name as padded_name,
u.user_id,
u.group_id,
u.user_type,
level
from users u
connect by prior u.user_id = u.group_id
start with u.user_id = 0
order siblings by upper(u.user_name);
You could use connect by to walk in the opposite direction. Then the level will of course be opposite too. So to get the results in the right order and indentation, chain another query based on these results that will use row_number() to determine the indentation:
with base as (
select
u.user_name,
u.user_id,
u.group_id,
u.user_type,
level as lvl
from users u
connect by prior u.group_id = u.user_id
start with u.user_id = 13
)
select
lpad('-', (row_number() over (order by lvl desc) - 1) * 2, ' ') || base.user_name
as padded_name,
user_id,
group_id,
user_type
from base
order by lvl desc;
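To make the idea concrete outside of SQL, here is a small Python sketch of the same leaf-to-root walk over the question's sample rows. It is only an illustration of the logic (follow group_id upwards, then reverse and indent); the function name path_to_root is an assumption, and the output format follows the examples in the question.

```python
# (user_id, user_type, group_id, user_name), copied from the question's sample data
users = [
    (0, 1, None, "All users"),
    (5, 2, 0, "USER1"),
    (6, 2, 0, "USER2"),
    (11, 1, 0, "SUBGROUP1"),
    (12, 1, 11, "SUBGROUP2"),
    (13, 2, 12, "USER3"),
    (20, 1, 0, "SUBGROUP3"),
    (21, 2, 20, "USER4"),
]


def path_to_root(users, leaf_id):
    """Walk from leaf_id up through group_id, then print root first."""
    by_id = {row[0]: row for row in users}
    chain = []
    seen = set()  # guards against an accidental loop in the data
    current = leaf_id
    while current is not None and current in by_id and current not in seen:
        seen.add(current)
        chain.append(by_id[current])
        current = by_id[current][2]  # move to the parent group
    chain.reverse()  # root first, leaf last
    return [
        ("-" * depth + " " if depth else "") + row[3]
        for depth, row in enumerate(chain)
    ]


print("\n".join(path_to_root(users, 13)))
```

Reversing the chain plays the same role as ordering by lvl desc in the query: the walk is collected leaf first, but displayed root first.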

How to write correct left Join of two tables?

I want to join two tables. The first table's primary key data type is NUMBER, and the second table's primary key data type is VARCHAR2(30 BYTE). How do I join the two tables?
I tried this code, but all the values from the second table are null. Why is that?
SELECT a.act_phone_no,a.act_actdevice,a.bi_account_id, a.packag_start_date, c.identification_number
FROM ACTIVATIONS_POP a
left JOIN customer c
on TO_CHAR(a.act_phone_no) = c.msisdn_voice
first table
act_phone_no bi_account_id
23434 45345
34245 43556
Second table
msisdn_voice identification_number
23434 321113
34245 6547657
It seems that you didn't tell us everything. Query works, if correctly written, on such a sample data:
with
-- Sample data
activations_pop (act_phone_no, bi_account_id) as
  (select 23434, 45345 from dual union all
   select 34245, 43556 from dual
  ),
customer (msisdn_voice, identification_number) as
  (select '23434', 321113 from dual union all
   select '34245', 6547657 from dual
  )
-- query works OK
select a.act_phone_no,
       a.bi_account_id,
       c.identification_number
from activations_pop a join customer c on to_char(a.act_phone_no) = c.msisdn_voice;
ACT_PHONE_NO BI_ACCOUNT_ID IDENTIFICATION_NUMBER
------------ ------------- ---------------------
23434 45345 321113
34245 43556 6547657
What could be wrong? Who knows. If you got some result but columns from the CUSTOMER table are empty (NULL?), then they really might be NULL, or you didn't manage to join rows on those columns (left/right padding with spaces?). Does joining on e.g.
on to_char(a.act_phone_no) = trim(c.msisdn_voice)
or
on a.act_phone_no = to_number(c.msisdn_voice)
help?
Consider posting a proper test case (CREATE TABLE and INSERT INTO statements).
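The padding hypothesis can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, so CAST(... AS TEXT) replaces TO_CHAR and the padded values are made up for the demonstration. It only shows the mechanism: a LEFT JOIN returns NULLs for the second table when the key strings differ by whitespace, and TRIM fixes that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activations_pop (act_phone_no INTEGER, bi_account_id INTEGER);
    CREATE TABLE customer (msisdn_voice TEXT, identification_number INTEGER);
    INSERT INTO activations_pop VALUES (23434, 45345), (34245, 43556);
    -- hypothetical dirty data: trailing and leading spaces in the key
    INSERT INTO customer VALUES ('23434 ', 321113), (' 34245', 6547657);
""")


def joined(on_clause):
    """Run the LEFT JOIN with the given ON condition and return all rows."""
    sql = (
        "SELECT a.act_phone_no, c.identification_number "
        "FROM activations_pop a LEFT JOIN customer c ON " + on_clause +
        " ORDER BY a.act_phone_no"
    )
    return conn.execute(sql).fetchall()


# Plain comparison: the padded strings never match, so c.* comes back as NULL.
plain = joined("CAST(a.act_phone_no AS TEXT) = c.msisdn_voice")
# Trimmed comparison: the rows pair up as expected.
trimmed = joined("CAST(a.act_phone_no AS TEXT) = TRIM(c.msisdn_voice)")
print(plain)
print(trimmed)
```

If the real data behaves like the first query, checking the length of msisdn_voice against the length of its trimmed value is a quick way to confirm padding.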
Are you using Oracle?
Please check the demo below.
SELECT a.act_phone_no, a.bi_account_id, c.identification_number
FROM ACTIVATIONS_POP a
left JOIN customer c
on TO_CHAR(a.act_phone_no) = c.msisdn_voice;
SQLFiddle

Oracle Select Query on Same Table (self join)

It seems too simple, but I'm not getting the desired results.
I have a table with this data:
Team_id, Player_id, Player_name, Game_cd
1 100 abc 24
1 1000 xyz 24
1 588 ert 24
1 500 you 24
2 600 ops 24
2 700 dps 24
2 900 lmv 24
2 200 hmv 24
I have to write a query to get a result like this
Home_team home_plr_id home_player away_team away_plr_id away_player
1 100 abc 2 600 ops
1 1000 xyz 2 900 lmv
The query I wrote
select f1.Team_id as home_team,
f1.player_id as home_plr_id,
f1.player_Name as home_player,
f2.Team_id as away_team,
f2.player_id as away_plr_id,
f2.player_Name as away_player
from game f1, game f2
where
f1.team_id<> f2.team_id and
f1.game_cd = f2.game_cd
An alternative to @Radagast81's self-join is pivot, available in your Oracle version:
select home_plr_id, home_plr_name, away_plr_id, away_plr_name
from (select game.*,
row_number() over (partition by team_id order by player_id) rn
from game)
pivot (max(player_id) plr_id, max(player_name) plr_name
for team_id in (1 home, 2 away))
SQL Fiddle
Players have to be numbered somehow (here by ID); the numbering could also be done by name, by null, or even randomly. It is needed only to put the players in the same rows. Pivot also works if the numbers of players in the teams differ.
It is not clear how you want to pair a home player with an away player. But provided that you don't care about that, the following might be what you are looking for:
WITH game_p AS (SELECT team_id, player_id, player_name, game_cd
, ROW_NUMBER() over (PARTITION BY team_id, game_cd ORDER BY player_id) pos
, dense_rank() over (PARTITION BY game_cd ORDER BY team_id) team_pos
FROM game)
SELECT NVL(f1.game_cd, f2.game_cd) AS game_cd
, f1.Team_id as home_team
, f1.player_id as home_plr_id
, f1.player_Name as home_player
, f2.Team_id as away_team
, f2.player_id as away_plr_id
, f2.player_Name as away_player
FROM (SELECT * FROM game_p WHERE team_pos = 1) f1
FULL JOIN (SELECT * FROM game_p WHERE team_pos = 2) f2
ON f1.game_cd = f2.game_cd
AND f1.pos = f2.pos
The new column POS gives each player of each team a position, used to pair them with a player from the other team.
The new column TEAM_POS maps the team_id to the values 1 and 2, as the team_ids can differ per game.
Finally, do a FULL JOIN to get the final list. If the number of players is always the same for both teams, you can do a normal join instead.
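The pairing rule behind both answers (number the players within each team, then match equal positions) can be sketched in Python as follows. This is an illustration of the logic only, assuming a single game_cd and ordering by player_id; the function name pair_players is hypothetical. zip_longest plays the role of the FULL JOIN, leaving None where one team has fewer players.

```python
from itertools import zip_longest

# (team_id, player_id, player_name, game_cd), copied from the question
game = [
    (1, 100, "abc", 24), (1, 1000, "xyz", 24), (1, 588, "ert", 24), (1, 500, "you", 24),
    (2, 600, "ops", 24), (2, 700, "dps", 24), (2, 900, "lmv", 24), (2, 200, "hmv", 24),
]


def pair_players(rows, home_team, away_team):
    """Pair the n-th home player with the n-th away player (by player_id)."""
    def ordered(team_id):
        # Equivalent of ROW_NUMBER() OVER (PARTITION BY team_id ORDER BY player_id)
        return sorted((r for r in rows if r[0] == team_id), key=lambda r: r[1])

    pairs = []
    for home, away in zip_longest(ordered(home_team), ordered(away_team)):
        pairs.append((home[1] if home else None, away[1] if away else None))
    return pairs


pairs = pair_players(game, 1, 2)
print(pairs)
```

Note that ordering by player_id does not reproduce the exact pairs shown in the question (100 with 600, 1000 with 900), since the question does not say which rule produces them.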

can I do insert in update of merge(Implementation SCD type 2)

I have a source table and a target table. I want to do a merge such that there is always an insert into the target table. For each current record, a flag should be set to 'Y'; when something in that record changes, the flag on the existing row should be changed to 'N' and a new row for that record should be inserted into the target, so that the updated information is reflected. Basically, I want to implement SCD type 2. My input data is-
student_id name city state mobile
1 suraj bhopal m.p. 9874561230
2 ravi pune mh 9874563210
3 amit patna bihar 9632587410
4 rao banglore kr 9236547890
5 neel chennai tn 8301456987
and when my input changes-
student_id name city state mobile
1 suraj indore m.p. 9874561230
And my output should be like-
surr_key  student_id  name   city    state  mobile      insert_Date  end_date    Flag
1         1           suraj  bhopal  m.p.   9874561230  31/06/2015   1/09/2015   N
2         1           suraj  indore  m.p.   9874561230  2/09/2015    31/12/9999  Y
Can anyone help me how can I do that?
You can do this with a trigger: create a BEFORE INSERT trigger on your target table that updates the flag column of your source table.
Or you can have an AFTER UPDATE trigger on the source table that inserts a record into your target table.
So this should be the outline of your procedure steps. I used different columns in source and target for simplification.
Source (tu_student) - STUDENT_ID, NAME, CITY
Target (tu_student_tgt)- SKEY, STUDENT_ID, NAME, CITY, INSERT_DATE, END_DATE, IS_ACTIVE
The basic idea here is
1. Find the new records from source which are missing in target and insert them. Set start_date as sysdate, end_date as 9999 and IsActive to 1.
2. Find the records which are updated (like your Bhopal -> Indore case). We have to do 2 operations in target for them:
   a. Update the record in target and set end date as sysdate and IsActive to 0.
   b. Insert this record in target with the new values. Set start_date as sysdate, end_date as 9999 and IsActive = 1.
-- Create a new Oracle sequence (test_utsav_seq in this example)
-- Step 1 - Find new inserts (records present in source but not in target)
insert into tu_student_tgt
select
    test_utsav_seq.nextval as skey,
    s.student_id as student_id,
    s.name as name,
    s.city as city,
    sysdate as insert_date,
    date '9999-12-31' as end_date, -- date literal instead of an implicit string conversion
    1 as is_active
from tu_student s
left outer join tu_student_tgt t
    on s.student_id = t.student_id
where t.student_id is null;
-- Step 2 - Find the skeys that need to be updated because the data differs between source and target. Get the active records from target and compare them with the source data. If a mismatch is found, we need to
-- a. update this record in target and mark it as inactive.
-- b. insert a new record for the same student_id with the new data and mark it active.
-- Part 2a - find updates.
-- These records need an update. Save these skeys and use them one by one while updating.
select t.skey
from tu_student s
inner join tu_student_tgt t
    on s.student_id = t.student_id
where t.is_active = 1
  and (s.name != t.name or s.city != t.city);
-- Part 2b - find the student_ids that need to be inserted because they changed in source compared to target.
-- Run this query before the update marks those records inactive, since it filters on the active flag.
select s.student_id
from tu_student s
inner join tu_student_tgt t
    on s.student_id = t.student_id
where t.is_active = 1
  and (s.name != t.name or s.city != t.city);
-- 2a - Implement update
-- Use the skeys from part 2a in a loop and run an update like the one below for each key.
-- Only end_date and is_active change; the old attribute values stay in place as history.
update tu_student_tgt t
set t.end_date = sysdate
  , t.is_active = 0
where t.skey = :skey; -- placeholder bind variable: skey found in part 2a
-- 2b - Implement insert, using the student_ids found in part 2b
-- Insert these student ids like in step 1
insert into tu_student_tgt
select
    test_utsav_seq.nextval as skey,
    s.student_id as student_id,
    s.name as name,
    s.city as city,
    sysdate as insert_date,
    date '9999-12-31' as end_date,
    1 as is_active
from tu_student s
where s.student_id = :student_id; -- placeholder bind variable: ID from part 2b, repeat for the other ids
I cannot give you a simple example of SCD-2. If you understand SCD-2, you should understand this implementation.
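For readers who want to see the whole flow in one place, here is a minimal in-memory sketch of the same steps in Python. It is an illustration of the logic only, not Oracle code: the function name apply_scd2, the dict-based rows and the HIGH_DATE constant are all assumptions made for the example, and the same date is used both to close the old row and to open the new one.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # open-ended end date for the active version


def apply_scd2(target, source, today):
    """Apply one SCD type 2 load.

    target: list of dict rows (skey, student_id, attributes, dates, is_active)
    source: dict mapping student_id -> dict of current attribute values
    """
    next_key = max((row["skey"] for row in target), default=0) + 1
    active = {row["student_id"]: row for row in target if row["is_active"] == 1}

    for student_id, attrs in source.items():
        current = active.get(student_id)
        if current is not None:
            if all(current[k] == v for k, v in attrs.items()):
                continue  # unchanged: nothing to do
            # Step 2a: close the current version, keeping its old values.
            current["end_date"] = today
            current["is_active"] = 0
        # Step 1 (new record) or step 2b (new version of a changed record).
        target.append({
            "skey": next_key,
            "student_id": student_id,
            **attrs,
            "insert_date": today,
            "end_date": HIGH_DATE,
            "is_active": 1,
        })
        next_key += 1
    return target


target = []
apply_scd2(target, {1: {"name": "suraj", "city": "bhopal"}}, date(2015, 6, 30))
apply_scd2(target, {1: {"name": "suraj", "city": "indore"}}, date(2015, 9, 2))
```

After the second load, the Bhopal row is closed and inactive and the Indore row is the only active version, which mirrors the output requested in the question.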

oracle hierarchical query nocycle and connect by root

Can somebody explain the use of nocycle and connect_by_root in hierarchical queries in Oracle? Also, when we don't use 'start with', in what order do we get the rows? I mean, when we don't use 'start with' we get a lot more rows. Can anybody explain nocycle and connect_by_root (how is it different from start with?) using a simple emp table? Thanks for the help.
If your data has a loop in it (A -> B -> A -> B ...), Oracle will throw an exception, ORA-01436: CONNECT BY loop in user data if you do a hierarchical query. NOCYCLE instructs Oracle to return rows even if such a loop exists.
CONNECT_BY_ROOT gives you access to the root element, even several layers down in the query. Using the HR schema:
select level, employee_id, last_name, manager_id ,
connect_by_root employee_id as root_id
from employees
connect by prior employee_id = manager_id
start with employee_id = 100
     LEVEL EMPLOYEE_ID LAST_NAME                 MANAGER_ID    ROOT_ID
---------- ----------- ------------------------- ---------- ----------
         1         100 King                                        100
         2         101 Kochhar                          100        100
         3         108 Greenberg                        101        100
         4         109 Faviet                           108        100
...
Here, you see I started with employee 100 and started finding his employees. The CONNECT_BY_ROOT operator gives me access to King's employee_id even four levels down. I was very confused at first by this operator, thinking it meant "connect by the root element" or something. Think of it more like "the root of the CONNECT BY clause."
Here is how nocycle is used in a query.
Suppose we have a simple table with columns r1 and r2 and two rows:
first row r1=a, r2=b
and second row r1=b, r2=a
Now a refers to b and b refers back to a, so there is a loop. If we write a hierarchical query as
select r1 from table_name
start with r1='a'
connect by prior r2=r1;
we get the CONNECT BY loop error (ORA-01436).
Use nocycle to let Oracle return results even if a loop exists.
Hence the query
select r1 from table_name
start with r1='a'
connect by nocycle prior r2=r1;
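To see why the loop matters, the traversal can be mimicked in a few lines of Python. This is only a rough model of what a hierarchical query does, not Oracle's implementation: the function connect_by and the exception class are made up for the illustration, with the exception standing in for ORA-01436.

```python
class ConnectByLoopError(Exception):
    """Stands in for ORA-01436 in this sketch."""


rows = [("a", "b"), ("b", "a")]  # (r1, r2): a refers to b, b refers back to a


def connect_by(rows, start_r1, nocycle=False):
    """Mimic: start with r1 = start_r1 connect by [nocycle] prior r2 = r1."""
    result = []

    def visit(row, path):
        result.append(row[0])
        for child in rows:
            if child[0] != row[1]:  # "prior r2 = r1" does not hold
                continue
            if child in path:  # this row is already an ancestor: a loop
                if nocycle:
                    continue  # skip the repeated row and carry on
                raise ConnectByLoopError("CONNECT BY loop in user data")
            visit(child, path + [child])

    for row in rows:
        if row[0] == start_r1:
            visit(row, [row])
    return result


print(connect_by(rows, "a", nocycle=True))
```

With nocycle=True the walk stops where it would revisit a row already on the current path; without it, the same data raises the loop error.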
