Find sum of economy by region - Oracle

DDL: the table name is world.

country   Economy
--------  -------
USA          2000
CHINA        1500
INDIA        1600
DUBAI        1000
Nepal         500
Pakistan      700

Show a query in Oracle so that from this table we retrieve this output:

Region  sum(economy)
------  ------------
USA             2000
Asia            5300

You're missing a table which says which country belongs to which region:
create table region
(id_region number constraint pk_reg primary key,
name varchar2(20)
);
create table country
(id_country number constraint pk_cou primary key,
id_region number constraint fk_coureg references region (id_region),
name varchar2(20),
economy number
);
Then you'd join the two tables and aggregate:
select r.name as region,
       sum(c.economy) as sum_economy
from region r join country c on c.id_region = r.id_region
group by r.name;
If you insist on doing it wrong and hardcoding regions, here you are:
select case when country = 'USA' then 'USA'
else 'Asia'
end as region,
sum(economy) as sum_economy
from your_table
group by
case when country = 'USA' then 'USA'
else 'Asia'
end;
Note that this "solution" is simply wrong and I suggest you do it properly, as previously described.
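For completeness, here is a sketch of how the proper two-table version fits together. The region rows and all id values below are assumptions made up to mirror the question's data:

-- Hypothetical sample data mirroring the question (region ids are made up)
insert into region (id_region, name) values (1, 'USA');
insert into region (id_region, name) values (2, 'Asia');

insert into country (id_country, id_region, name, economy) values (1, 1, 'USA', 2000);
insert into country (id_country, id_region, name, economy) values (2, 2, 'CHINA', 1500);
insert into country (id_country, id_region, name, economy) values (3, 2, 'INDIA', 1600);
insert into country (id_country, id_region, name, economy) values (4, 2, 'DUBAI', 1000);
insert into country (id_country, id_region, name, economy) values (5, 2, 'Nepal', 500);
insert into country (id_country, id_region, name, economy) values (6, 2, 'Pakistan', 700);

-- The join/group by query above then returns:
-- Region  sum(economy)
-- USA             2000
-- Asia            5300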

Related

Oracle merge into with text similarity

I have 2 tables, from_country and to_country. I want to bring new records into to_country and update existing records there.
Definition and data
--
CREATE TABLE from_country
(
country_code varchar2(255) not null
);
--
CREATE TABLE to_country
(
country_code varchar2(255) not null
);
-- Meaning match
INSERT INTO from_country
(country_code)
VALUES
('United States of America');
-- Match 100%
INSERT INTO from_country
(country_code)
VALUES
('UGANDA');
-- Meaning match, but with domain knowledge
INSERT INTO from_country
(country_code)
VALUES
('CON CORRECT');
-- Brand new country
INSERT INTO from_country
(country_code)
VALUES
('NEW');
--
INSERT INTO to_country
(country_code)
VALUES
('USA');
-- Match 100%
INSERT INTO to_country
(country_code)
VALUES
('UGANDA');
-- Meaning match, but with domain knowledge
INSERT INTO to_country
(country_code)
VALUES
('CON');
I need to run a merge into so that I bring data from from_country to to_country.
Here is my first attempt, but it only does an equality match, which is not good enough. I need some smartness so that it is able to match on meaning.
If anyone knows how to do it, please provide your solution.
merge into
to_country to_t
using
from_country from_t
on
(to_t.country_code = from_t.country_code)
when not matched then insert (
country_code
)
values (
from_t.country_code
);
So in a nutshell, here is what I want
from_table:
United States of America
UGANDA
CON CORRECT
NEW
to_table:
USA
UGANDA
CON
After the Oracle merge into, the new to_country table:
United States of America
UGANDA
CON CORRECT
NEW
sql fiddle: http://sqlfiddle.com/#!4/f512d
Please note that this is a simplified example; I have a larger data set.
Since the match is not guaranteed to be unique, you have to write a query that returns only one match, using some tie-breaking decision.
Here is a simplified case which uses a naive match and then just picks one value when there is more than one match:
merge into to_country t
using (
select * from (
select t.rowid as trowid
,f.country_code as fcode
,t.country_code as tcode
,case when t.country_code is null then 1 else
row_number()
over (partition by t.country_code
order by f.country_code)
end as match_no
from from_country f
left join to_country t
on f.country_code like t.country_code || '%'
) where match_no = 1
) s
on (s.trowid = t.rowid)
when matched then update set country_code = s.fcode
when not matched then insert (country_code) values (s.fcode);
Result in to_country:
USA
UGANDA
CON CORRECT
United States of America
Now that that's taken care of, you just need to make the match algorithm smarter. This is where you need to look at the whole dataset to see what sort of errors there are - e.g. typos, etc.
You could try some of the functions in Oracle's supplied UTL_MATCH package for this purpose: https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm - such as EDIT_DISTANCE or JARO_WINKLER.
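For a quick feel for these, you can call the two *_SIMILARITY variants directly on a pair of values; this standalone query is just an illustration (the comparison table further down shows the scores for every pair in the sample data):

select utl_match.edit_distance_similarity('United States of America', 'USA') as ed,
       utl_match.jaro_winkler_similarity('United States of America', 'USA') as jw
from dual;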
Here is an example using the Jaro Winkler algorithm:
merge into to_country t
using (
select * from (
select t.rowid as trowid
,f.country_code as fcode
,t.country_code as tcode
,case when t.country_code is null then 1
else row_number() over (
partition by t.country_code
order by utl_match.jaro_winkler_similarity(f.country_code,t.country_code) desc)
end as match_no
from from_country f
left join to_country t
on utl_match.jaro_winkler_similarity(f.country_code,t.country_code) > 70
) where match_no = 1
) s
on (s.trowid = t.rowid)
when matched then update set country_code = s.fcode
when not matched then insert (country_code) values (s.fcode);
SQL Fiddle: http://sqlfiddle.com/#!4/f512d/23
Note that I've picked an arbitrary cutoff of >70%. This is because UGANDA vs. USA has a Jaro Winkler similarity of 70.
This results in the following:
United States of America
USA
UGANDA
CON NEW
To see how these algorithms fare, run something like this:
select f.country_code as fcode
,t.country_code as tcode
,utl_match.edit_distance_similarity(f.country_code,t.country_code) as ed
,utl_match.jaro_winkler_similarity(f.country_code,t.country_code) as jw
from from_country f
cross join to_country t
order by 2, 4 desc;
FCODE                     TCODE    ED   JW
========================  ======  ===  ===
CON NEW                   CON      43   86
CON CORRECT               CON      28   83
UGANDA                    CON      17   50
United States of America  CON       0    0
UGANDA                    UGANDA  100  100
United States of America  UGANDA    9   46
CON NEW                   UGANDA   15   43
CON CORRECT               UGANDA    0   41
UGANDA                    USA      34   70
United States of America  USA      13   62
CON CORRECT               USA       0    0
CON NEW                   USA       0    0
SQL Fiddle: http://sqlfiddle.com/#!4/f512d/22

ORA-00001: unique constraint (MYUSER.ADI_PK) violated

I have two tables, adv_institution and institution. institution has 5000+ rows, while adv_institution has 1400+.
I want to use Oracle MERGE to back-fill records into adv_institution from institution. These two tables have about four fields in common which I can use for the back-fill.
Here is my entire MERGE statement
merge into
adv_institution to_t
using (
select
uni.*,
adv_c.country_cd as con_code_text
from
(
select
institution_cd,
name,
institution_status,
country_cd
from
institution uni
where
uni.institution_status = 'ACTIVE' and
uni.country_cd is not null
group by
institution_cd,
name,
institution_status,
country_cd
order by
name
) uni,
country_cd c_cd,
adv_country adv_c
where
uni.country_cd = c_cd.country_cd and
c_cd.description = adv_c.country_cd
) from_t
on
(
to_t.VENDOR_INSTITUTION_CD = from_t.INSTITUTION_CD or
to_t.INSTITUTION_CD = from_t.NAME
)
WHEN NOT MATCHED THEN INSERT (
to_t.INSTITUTION_CD,
to_t.INSTITUTION_NAME,
to_t.SHORT_NAME,
to_t.COUNTRY_CD,
to_t.NOTE,
to_t.UNIT_TERMINOLOGY,
to_t.COURSE_TERMINOLOGY,
to_t.CLOSED_IND,
to_t.UPDATE_WHO,
to_t.UPDATE_ON,
to_t.CALLISTA_INSTITUTION_CD
)
VALUES (
from_t.NAME,
from_t.NAME,
'',
from_t.con_code_text,
'',
'UNIT',
'COURSE',
'N',
'MYUSER',
SYSDATE,
from_t.institution_cd
);
The error I got is
Error report -
ORA-00001: unique constraint (MYUSER.ADI_PK) violated
ADI_PK means adv_institution.institution_cd is a primary key and it must be unique.
That is because of the insert in WHEN NOT MATCHED THEN INSERT: I insert from_t.NAME into to_t.INSTITUTION_CD.
It looks like from_t.NAME has the same value at least twice when inserting into to_t.INSTITUTION_CD.
But I did a GROUP BY to make sure from_t.NAME is unique:
(
select
institution_cd,
name,
institution_status,
country_cd
from
institution uni
where
uni.institution_status = 'ACTIVE' and
uni.country_cd is not null
group by
institution_cd,
name,
institution_status,
country_cd
order by
name
) uni
I am not sure I understand the issue correctly. I have tried everything I can, but still no luck.
I think your main issue is with GROUP BY.
Please consider the example below:
desc temp_inventory;

Name           Type
-------------  -----------
WAREHOUSE_NO   NUMBER(2)
ITEM_NO        NUMBER(10)
ITEM_QUANTITY  NUMBER(10)

WAREHOUSE_NO  ITEM_NO  ITEM_QUANTITY
1             1000     100
1             2000     200
1             2000     300
If I write a query where I want warehouse_no to be unique:
select warehouse_no, item_quantity
from temp_inventory
group by warehouse_no, item_quantity
it's going to return the same 3 rows. Instead I want to group by the key alone and aggregate the quantity:
select warehouse_no,sum(item_quantity)
from temp_inventory
group by warehouse_no
which will make warehouse_no unique in this situation!
Also, in cases where you have VARCHAR2 columns, you can use MAX or MIN on them as aggregate functions along with GROUP BY to make the key unique in the query.
Example:
Select object_type, min(object_name)
from user_objects group by object_type;
which will make object_type unique and return only one corresponding object name for it.
So note that if there are duplicates, in the end some records will be eliminated based on the aggregate function.
"But I did a group statement to make sure from_t.NAME is unique:"
But your query does not do that. It produces a set of distinct combinations of (institution_cd,name,institution_status,country_cd). Clearly such a set could contain multiple recurrences of name, one for each different value of country_cd. As you have four elements in your key you are virtually guaranteeing that your set will have multiple occurrences of name.
You compound this with the OR in the ON conditions, which means you trigger the NOT MATCHED logic if to_t.VENDOR_INSTITUTION_CD = from_t.INSTITUTION_CD even though there is already a record in the target table where to_t.INSTITUTION_CD = from_t.NAME.
The problem is that the MERGE statement is atomic. The set of records coming from the USING subquery must contain unique keys. When Oracle finds a second occurrence of the same name in the result set it doesn't say "I've already merged one of those, let's skip it". It has to hurl ORA-00001 because there is no way for Oracle to know which record to apply, which combination of (institution_cd, name, institution_status, country_cd) is the correct one.
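Before rewriting it, a quick diagnostic sketch (not part of the original statement) shows which names that USING subquery repeats; any name it returns will trigger ORA-00001:

select name, count(*) as occurrences
from (
    select institution_cd, name, institution_status, country_cd
    from institution
    where institution_status = 'ACTIVE'
      and country_cd is not null
    group by institution_cd, name, institution_status, country_cd
)
group by name
having count(*) > 1
order by occurrences desc;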
To solve this you need to change the USING query to produce a result set with unique keys. It's your data model, you understand its business rules, so you're in the position to rewrite it properly. But maybe something like this:
select
    name,
    max(institution_cd) as institution_cd,
    institution_status,
    max(country_cd) as country_cd
from institution uni
where
    uni.institution_status = 'ACTIVE' and
    uni.country_cd is not null
group by
    name,
    institution_status
order by
    name
Then you can simplify the MERGE ON clause to:
on
(
to_t.INSTITUTION_CD = from_t.NAME
)
The use of MAX() in the subquery is an inelegant kludge. I hope you can apply better business rules.
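If keeping exactly one source row per name is acceptable, an alternative sketch (my variation, not part of the answer above) uses ROW_NUMBER so that institution_cd and country_cd at least come from the same row instead of being mixed by independent MAX() calls; the ORDER BY used as a tie-break is an assumption:

select institution_cd, name, institution_status, country_cd
from (
    select uni.*,
           row_number() over (partition by name order by institution_cd) as rn
    from institution uni
    where uni.institution_status = 'ACTIVE'
      and uni.country_cd is not null
)
where rn = 1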

Can I do an insert in the update of a merge (implementing SCD type 2)?

I have a source table and a target table, and I want to do a merge such that there is always an insert into the target table. Each current record in the target should have a flag set to 'Y'; when something in that record changes, the flag on the old row should be changed to 'N' and a new row should be inserted into the target reflecting the updated information. Basically I want to implement SCD type 2. My input data is:
student_id  name   city      state  mobile
1           suraj  bhopal    m.p.   9874561230
2           ravi   pune      mh     9874563210
3           amit   patna     bihar  9632587410
4           rao    banglore  kr     9236547890
5           neel   chennai   tn     8301456987
and when my input changes:

student_id  name   city    state  mobile
1           suraj  indore  m.p.   9874561230
And my output should be like:

surr_key  student_id  name   city    state  mobile      insert_date  end_date    flag
1         1           suraj  bhopal  m.p.   9874561230  31/06/2015   1/09/2015   N
2         1           suraj  indore  m.p.   9874561230  2/09/2015    31/12/9999  Y
Can anyone help me with how to do that?
You can do this with triggers: you can create a BEFORE INSERT trigger on your target table which will update the flag column of your source table, or you can have an AFTER UPDATE trigger on the source table which will insert the record into your target table.
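Below is a minimal sketch of the second suggestion (an AFTER UPDATE trigger on the source table). The names are assumptions: the source table is called student, the target is student_hist with the columns from the question, and a sequence student_hist_seq supplies the surrogate key.

create or replace trigger trg_student_scd2
after update on student
for each row
begin
  -- close off the currently active history row for this student
  update student_hist h
     set h.end_date = sysdate,
         h.flag     = 'N'
   where h.student_id = :old.student_id
     and h.flag = 'Y';

  -- insert the new version as the active row
  insert into student_hist
    (surr_key, student_id, name, city, state, mobile, insert_date, end_date, flag)
  values
    (student_hist_seq.nextval, :new.student_id, :new.name, :new.city, :new.state,
     :new.mobile, sysdate, date '9999-12-31', 'Y');
end;
/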
Hope this helps
So this should be the outline of your procedure steps. I used different columns in source and target for simplification.
Source (tu_student) - STUDENT_ID, NAME, CITY
Target (tu_student_tgt)- SKEY, STUDENT_ID, NAME, CITY, INSERT_DATE, END_DATE, IS_ACTIVE
The basic idea here is:
1. Find the new records from the source which are missing in the target and insert them. Set start_date to sysdate, end_date to 9999 and IsActive to 1.
2. Find the records which were updated (like your Bhopal -> Indore case). We have to do two operations in the target for these:
   a. Update the existing record in the target, setting end_date to sysdate and IsActive to 0.
   b. Insert a record with the new values into the target. Set start_date to sysdate, end_date to 9999 and IsActive to 1.
-- Create a new Oracle sequence (test_utsav_seq in this example)

-- Step 1 - Find new inserts (records present in source but not in target)
insert into tu_student_tgt
select
    test_utsav_seq.nextval as skey,
    s.student_id as student_id,
    s.name as name,
    s.city as city,
    sysdate as insert_date,
    date '9999-12-31' as end_date,
    1 as is_active
from tu_student s
left outer join tu_student_tgt t
    on s.student_id = t.student_id
where t.student_id is null;
-- Step 2 - Find the skeys which need updating because the data changed between source and target.
-- Get the active records from the target and compare them with the source data. If a mismatch is found, we need to:
--   a) update this record in the target and mark it inactive;
--   b) insert a new record for the same student_id with the new data and mark it active.

-- Part 2a - find updates.
-- These records need an update. Save these skeys and use them one by one while updating.
select t.skey
from tu_student s
inner join tu_student_tgt t
    on s.student_id = t.student_id
where t.is_active = 1
  and (s.name != t.name or
       s.city != t.city);
-- Part 2b - Find the student_ids which need to be inserted because they changed in the source
-- (the corresponding target records get marked inactive in step 2a below).
select s.student_id
from tu_student s
inner join tu_student_tgt t
    on s.student_id = t.student_id
where t.is_active = 1
  and (s.name != t.name or
       s.city != t.city);
-- Part 2a - implement the update.
-- Use each skey from 2a in a loop and run an update like the one below
-- (:skey_from_2a is a placeholder for one of the keys collected in 2a).
-- Per step 2a we only close off the old row; its historical data stays untouched.
update tu_student_tgt t
set t.end_date = sysdate,
    t.is_active = 0
where t.skey = :skey_from_2a;
-- Part 2b - implement the insert, using the student_ids found in 2b.
-- Insert these student_ids just like step 1
-- (:student_id_from_2b is a placeholder for one of the ids collected in 2b; repeat for the other ids).
insert into tu_student_tgt
select
    test_utsav_seq.nextval as skey,
    s.student_id as student_id,
    s.name as name,
    s.city as city,
    sysdate as insert_date,
    date '9999-12-31' as end_date,
    1 as is_active
from tu_student s
where s.student_id = :student_id_from_2b;
I cannot give you a simple example of SCD-2. If you understand SCD-2, you should understand this implementation.
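As a side note, if the per-key loop is undesirable, steps 2a and 2b can also be written set-based. This is just a sketch using the same table and column names as above, not part of the original outline:

-- 2a, set-based: close off active target rows whose source data has changed
update tu_student_tgt t
   set t.end_date  = sysdate,
       t.is_active = 0
 where t.is_active = 1
   and exists (select 1
                 from tu_student s
                where s.student_id = t.student_id
                  and (s.name != t.name or s.city != t.city));

-- 2b, set-based: insert a fresh active row for every source student without one
insert into tu_student_tgt
select test_utsav_seq.nextval,
       s.student_id,
       s.name,
       s.city,
       sysdate,
       date '9999-12-31',
       1
  from tu_student s
 where not exists (select 1
                     from tu_student_tgt t
                    where t.student_id = s.student_id
                      and t.is_active = 1);

Run in that order, the second statement also covers step 1, because brand-new students have no active row either.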

Oracle Self-Join on multiple possible column matches - CONNECT BY?

I have a query requirement from ----. I'm trying to solve it with CONNECT BY, but I can't seem to get the results I need.
Table (simplified):
create table CSS.USER_DESC (
USER_ID VARCHAR2(30) not null,
NEW_USER_ID VARCHAR2(30),
GLOBAL_HR_ID CHAR(8)
)
-- USER_ID is the primary key
-- NEW_USER_ID is a self-referencing key
-- GLOBAL_HR_ID is an ID field from another system
There are two sources of user data (datafeeds)... I have to watch for mistakes in either of them when updating information.
Scenarios:
A user is given a new User ID... The old record is set accordingly and deactivated (typically a rename for contractors who become fulltime)
A user leaves and returns sometime later. HR fails to send us the old user ID so we can connect the accounts.
The system screwed up and didn't set the new User ID on the old record.
The data can be bad in a hundred other ways
I need to know the following are the same user, and I can't rely on name or other fields... they differ among matching records:
ROOTUSER  NUMROOTS  NODELEVEL  ISLEAF  USER_ID   NEW_USER_ID  GLOBAL_HR_ID  USERTYPE    LAST_NAME        FIRST_NAME
--------  --------  ---------  ------  --------  -----------  ------------  ----------  ---------------  ----------
EX0T1100  2         1          0       EX0T1100  EX000005                   CONTRACTOR  VON DER HAAVEN   VERONICA
EX0T1100  2         2          1       EX000005               00126121      EMPLOYEE    HAAVEN, VON DER  VERONICA
GL110456  1         1          1       GL110456               00126121      EMPLOYEE    VONDERHAAVEN     VERONICA
EX0T1100 and EX000005 are connected properly by the NEW_USER_ID field. The rename occurred before there were global HR IDs, so EX0T1100 doesn't have one. EX000005 was given a new user ID, 'GL110456', and the two are only connected by having the same global HR ID.
Cleaning up the data isn't an option.
The query so far:
select connect_by_root cud.user_id RootUser,
count(connect_by_root cud.user_id) over (partition by connect_by_root cud.user_id) NumRoots,
level NodeLevel, connect_by_isleaf IsLeaf, --connect_by_iscycle IsCycle,
cud.user_id, cud.new_user_id, cud.global_hr_id,
cud.user_type_code UserType, cud.last_name, cud.first_name
from css.user_desc cud
where cud.user_id in ('EX000005','EX0T1100','GL110456')
-- Using this so I don't get sub-users in my list of root users...
-- It complicates the matches with GLOBAL_HR_ID, however
start with cud.user_id not in (select cudsub.new_user_id
from css.user_desc cudsub
where cudsub.new_user_id is not null)
connect by nocycle (prior new_user_id = user_id);
I've tried various CONNECT BY clauses, but none of them are quite right:
-- As a multiple CONNECT BY
connect by nocycle (prior global_hr_id = global_hr_id)
connect by nocycle (prior new_user_id = user_id)
-- As a compound CONNECT BY
connect by nocycle ((prior new_user_id = user_id)
or (prior global_hr_id = global_hr_id
and user_id != prior user_Id))
UNIONing two CONNECT BY queries doesn't work... I don't get the leveling.
Here is what I would like to see... I'm okay with a resultset that I have to distinct and use as a subquery. I'm also okay with any of the three user IDs in the ROOTUSER column... I just need to know they're the same users.
ROOTUSER  NUMROOTS  NODELEVEL  ISLEAF  USER_ID   NEW_USER_ID  GLOBAL_HR_ID  USERTYPE    LAST_NAME        FIRST_NAME
--------  --------  ---------  ------  --------  -----------  ------------  ----------  ---------------  ----------
EX0T1100  3         1          0       EX0T1100  EX000005                   CONTRACTOR  VON DER HAAVEN   VERONICA
EX0T1100  3         2          1       EX000005               00126121      EMPLOYEE    HAAVEN, VON DER  VERONICA
EX0T1100  3         (2 or 3)   1       GL110456               00126121      EMPLOYEE    VONDERHAAVEN     VERONICA
Ideas?
Update
Nicholas, your code looks very much like the right track... at the moment, the lead(user_id) over (partition by global_hr_id) gets false hits when the global_hr_id is null. For example:
USER_ID   NEW_USER_ID  CHAINNEWUSER  GLOBAL_HR_ID  LAST_NAME  FIRST_NAME
FP004468               FP004469                    AARON      TIMOTHY
FP004469                                           FOONG      KOK WAH
I've often wanted to treat nulls as separate records in a partition, but I've never found a way to make ignore nulls work. This did what I wanted:
decode(global_hr_id, null, null, lead(cud.user_id ignore nulls) over (partition by global_hr_id order by user_id))
... but there's got to be a better way. I haven't been able to get the query to finish yet on the full-blown user data (about 40,000 users). Both global_hr_id and new_user_id are indexed.
Update
The query returns after about 750 seconds... long, but manageable. It returns 93k records, because I don't have a good way of filtering level-2 hits out of the roots - your query has start with global_hr_id is null, but unfortunately that isn't always the case here. I'll have to think some more about how to filter those out.
I've tried adding more complex start with clauses before, but I find that separately they run in < 1 second... together, they take 90 minutes >.<
Thanks again for your help... plodding away at this.
You have provided sample data for only one user; it would be better to have a little bit more. Anyway, let's look at something like this:
with user_desc(USER_ID, NEW_USER_ID, GLOBAL_HR_ID) as (
  select 'EX0T1100', 'EX000005', null from dual union all
  select 'EX000005', null, 00126121 from dual union all
  select 'GL110456', null, 00126121 from dual
)
select connect_by_root(user_id) rootuser
     , count(connect_by_root(user_id)) over (partition by connect_by_root(user_id)) numroot
     , level nodlevel
     , connect_by_isleaf
     , user_id
     , new_user_id
     , global_hr_id
  from (select user_id
             , coalesce(new_user_id, usr) new_user_id1
             , new_user_id
             , global_hr_id
          from (select user_id
                     , new_user_id
                     , global_hr_id
                     , decode(global_hr_id, null, null,
                              lead(user_id) over (partition by global_hr_id order by user_id)) usr
                  from user_desc
               )
       )
 start with global_hr_id is null
connect by prior new_user_id1 = user_id;
Result:
ROOTUSER   NUMROOT   NODLEVEL  CONNECT_BY_ISLEAF  USER_ID   NEW_USER_ID  GLOBAL_HR_ID
--------  --------  ---------  -----------------  --------  -----------  ------------
EX0T1100         3          1                  0  EX0T1100  EX000005
EX0T1100         3          2                  0  EX000005                     126121
EX0T1100         3          3                  1  GL110456                     126121

How to find rows where a column value (some x value) is repeated more than once? Need to return those rows.

There is a table called contacts with columns id, name, address, ph_no, etc.
I need to find rows with the same name: if the count of rows for a name is more than 1, show those rows.
For example:
Table: contacts
id   name    address   ph_no
---  ------  --------  ---------
111  apple   U.K       99*******
112  banana  U.S       99*******
123  grape   INDIA     99*******
143  orange  S.AFRICA  99*******
152  grape   KENYA     99*******
For the above table I need to get the rows whose name appears more than once, like below:

id   name   address  ph_no
---  -----  -------  ---------
123  grape  INDIA    99*******
152  grape  KENYA    99*******
I need to get the rows based on the name I give as an argument, something like this example pseudo-syntax:
select * from contacts where name='grape' and its count(*) > 1, returning those rows.
How can I achieve this?
As #vc74 suggests, analytic functions would work a lot better here, especially if your data has any volume.
select id, name, address, ph_no ...
from ( select c.*, count(name) over ( partition by name ) as name_ct
from contacts c )
where name_ct > 1
;
EDIT
If you are restricting on specific names, the table contacts should really have an index on name, and the query would look like this:
select id, name, address, ph_no ...
from ( select c.*, count(name) over ( partition by name ) as name_ct
from contacts c
where name = 'grape' )
where name_ct > 1
;
select id, name, address, ph_no
from contacts
where name in
(
select name from contacts
group by name
having count(*) > 1
)
If you have access to Oracle's analytical functions there might be a more straightforward way
select *
from contacts c
where c.name in ( select cc.name
from contacts cc
group by cc.name
having count(1) > 1 );
