Slowly changing dimensions - SCD1 and SCD2 implementation in Hive [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I am looking for an SCD1 and SCD2 implementation in Hive (1.2.1). I am aware of the workaround for loading SCD1 and SCD2 tables prior to Hive 0.14. Here is the link describing that workaround approach: http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
Now that Hive supports ACID operations, I just want to know if there is a better or more direct way of loading them.

As HDFS is immutable storage, it could be argued that versioning data and keeping history (SCD2) should be the default behaviour for loading dimensions. You can create a view in your Hadoop SQL query engine (Hive, Impala, Drill, etc.) that retrieves the current state/latest value using windowing functions. You can find out more about dimensional models on Hadoop in my blog post, e.g. how to handle a large dimension and a fact table.
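For illustration, a minimal sketch of such a view in HiveQL, assuming a hypothetical SCD2 dimension customer_dim with a business key customer_id and a load timestamp loaded_at (swap in your own key, columns and ordering column):
-- hypothetical table: customer_dim(customer_id, name, address, loaded_at)
CREATE VIEW customer_dim_current AS
SELECT customer_id, name, address, loaded_at
FROM (
  SELECT customer_id, name, address, loaded_at,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY loaded_at DESC) AS rn
  FROM customer_dim
) t
WHERE rn = 1;  -- keep only the latest generation per key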

Well, I work around it using two temp tables:
drop table if exists administrator_tmp1;
drop table if exists administrator_tmp2;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
--review_administrator
CREATE TABLE if not exists review_administrator(
admin_id bigint ,
admin_name string,
create_time string,
email string ,
password string,
status_description string,
token string ,
expire_time string ,
granter_user_id bigint ,
admin_time string ,
effect_start_date string ,
effect_end_date string
)
partitioned by (current_row_indicator string comment 'current, expired')
stored as parquet;
--tmp1 is used for saving origin data
CREATE TABLE if not exists administrator_tmp1(
admin_id bigint ,
admin_name string,
create_time string,
email string ,
password string ,
status_description string ,
token string ,
expire_time string ,
granter_user_id bigint ,
admin_time string ,
effect_start_date string ,
effect_end_date string
)
partitioned by (current_row_indicator string comment 'current, expired')
stored as parquet;
--tmp2 saving the scd data
CREATE TABLE if not exists administrator_tmp2(
admin_id bigint ,
admin_name string,
create_time string,
email string ,
password string ,
status_description string ,
token string ,
expire_time string ,
granter_user_id bigint ,
admin_time string ,
effect_start_date string ,
effect_end_date string
)
partitioned by (current_row_indicator string comment 'current, expired')
stored as parquet;
--insert origin data into tmp1
INSERT OVERWRITE TABLE administrator_tmp1 PARTITION(current_row_indicator)
SELECT
user_id as admin_id,
name as admin_name,
time as create_time,
email as email,
password as password,
status as status_description,
token as token,
expire_time as expire_time,
admin_id as granter_user_id,
admin_time as admin_time,
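-- note: '{{ ds }}' and '{{ yesterday_ds }}' (used later) appear to be Airflow template macros for the run date and the day before; substitute your own load-date variables if you're not on Airflow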
'{{ ds }}' as effect_start_date,
'9999-12-31' as effect_end_date,
'current' as current_row_indicator
FROM
ks_db_origin.gifshow_administrator_origin
;
--insert scd data into tmp2
--for the data unchanged
INSERT INTO TABLE administrator_tmp2 PARTITION(current_row_indicator)
SELECT
t2.admin_id,
t2.admin_name,
t2.create_time,
t2.email,
t2.password,
t2.status_description,
t2.token,
t2.expire_time,
t2.granter_user_id,
t2.admin_time,
t2.effect_start_date,
t2.effect_end_date as effect_end_date,
t2.current_row_indicator
FROM
administrator_tmp1 t1
INNER JOIN
(
SELECT * FROM review_administrator
WHERE current_row_indicator = 'current'
) t2
ON
t1.admin_id = t2.admin_id
AND t1.admin_name = t2.admin_name
AND t1.create_time = t2.create_time
AND t1.email = t2.email
AND t1.password = t2.password
AND t1.status_description = t2.status_description
AND t1.token = t2.token
AND t1.expire_time = t2.expire_time
AND t1.granter_user_id = t2.granter_user_id
AND t1.admin_time = t2.admin_time
;
--for the data changed , update the effect_end_date
INSERT INTO TABLE administrator_tmp2 PARTITION(current_row_indicator)
SELECT
t2.admin_id,
t2.admin_name,
t2.create_time,
t2.email,
t2.password,
t2.status_description,
t2.token,
t2.expire_time,
t2.granter_user_id,
t2.admin_time,
t2.effect_start_date as effect_start_date,
'{{ yesterday_ds }}' as effect_end_date,
'expired' as current_row_indicator
FROM
administrator_tmp1 t1
INNER JOIN
(
SELECT * FROM review_administrator
WHERE current_row_indicator = 'current'
) t2
ON
t1.admin_id = t2.admin_id
WHERE NOT
(
t1.admin_name = t2.admin_name
AND t1.create_time = t2.create_time
AND t1.email = t2.email
AND t1.password = t2.password
AND t1.status_description = t2.status_description
AND t1.token = t2.token
AND t1.expire_time = t2.expire_time
AND t1.granter_user_id = t2.granter_user_id
AND t1.admin_time = t2.admin_time
)
;
--for the changed data and the new data
INSERT INTO TABLE administrator_tmp2 PARTITION(current_row_indicator)
SELECT
t1.admin_id,
t1.admin_name,
t1.create_time,
t1.email,
t1.password,
t1.status_description,
t1.token,
t1.expire_time,
t1.granter_user_id,
t1.admin_time,
t1.effect_start_date,
t1.effect_end_date,
t1.current_row_indicator
FROM
administrator_tmp1 t1
LEFT OUTER JOIN
(
SELECT * FROM review_administrator
WHERE current_row_indicator = 'current'
) t2
ON
t1.admin_id = t2.admin_id
AND t1.admin_name = t2.admin_name
AND t1.create_time = t2.create_time
AND t1.email = t2.email
AND t1.password = t2.password
AND t1.status_description = t2.status_description
AND t1.token = t2.token
AND t1.expire_time = t2.expire_time
AND t1.granter_user_id = t2.granter_user_id
AND t1.admin_time = t2.admin_time
WHERE t2.admin_id IS NULL
;
--for the data already marked by 'expired'
INSERT INTO TABLE administrator_tmp2 PARTITION(current_row_indicator)
SELECT
t1.admin_id,
t1.admin_name,
t1.create_time,
t1.email,
t1.password,
t1.status_description,
t1.token,
t1.expire_time,
t1.granter_user_id,
t1.admin_time,
t1.effect_start_date,
t1.effect_end_date,
t1.current_row_indicator
FROM
review_administrator t1
WHERE t1.current_row_indicator = 'expired'
;
--populate the dim table
INSERT OVERWRITE TABLE review_administrator PARTITION(current_row_indicator)
SELECT
t1.admin_id,
t1.admin_name,
t1.create_time,
t1.email,
t1.password,
t1.status_description,
t1.token,
t1.expire_time,
t1.granter_user_id,
t1.admin_time,
t1.effect_start_date,
t1.effect_end_date,
t1.current_row_indicator
FROM
administrator_tmp2 t1
;
--drop the two temp table
drop table administrator_tmp1;
drop table administrator_tmp2;
-- --example data
-- --2017-01-01
-- insert into table review_administrator PARTITION(current_row_indicator)
-- SELECT '1','a','2016-12-31','a#ks.com','password','open','token1','2017-12-31',
-- 0,'2017-12-31','2017-01-01','9999-12-31','current'
-- FROM default.sample_07 limit 1;
-- --2017-01-02
-- insert into table administrator_tmp1 PARTITION(current_row_indicator)
-- SELECT '1','a','2016-12-31','a01#ks.com','password','open','token1','2017-12-31',
-- 0,'2017-12-31','2017-01-02','9999-12-31','current'
-- FROM default.sample_07 limit 1;
-- insert into table administrator_tmp1 PARTITION(current_row_indicator)
-- SELECT '2','b','2016-12-31','a#ks.com','password','open','token1','2017-12-31',
-- 0,'2017-12-31','2017-01-02','9999-12-31','current'
-- FROM default.sample_07 limit 1;
-- --2017-01-03
-- --id 1 is changed
-- insert into table administrator_tmp1 PARTITION(current_row_indicator)
-- SELECT '1','a','2016-12-31','a03#ks.com','password','open','token1','2017-12-31',
-- 0,'2017-12-31','2017-01-03','9999-12-31','current'
-- FROM default.sample_07 limit 1;
-- --id 2 is not changed at all
-- insert into table administrator_tmp1 PARTITION(current_row_indicator)
-- SELECT '2','b','2016-12-31','a#ks.com','password','open','token1','2017-12-31',
-- 0,'2017-12-31','2017-01-03','9999-12-31','current'
-- FROM default.sample_07 limit 1;
-- --id 3 is a new record
-- insert into table administrator_tmp1 PARTITION(current_row_indicator)
-- SELECT '3','c','2016-12-31','c#ks.com','password','open','token1','2017-12-31',
-- 0,'2017-12-31','2017-01-03','9999-12-31','current'
-- FROM default.sample_07 limit 1;
-- --now dim table will show you the right SCD.

Here's a detailed implementation of slowly changing dimension type 2 in Hive using the exclusive-join approach.
It assumes that the source sends a complete data file, i.e. old, updated and new records.
Steps:
Load the recent file data into the STG table.
Select all the expired records from the HIST table:
select * from HIST_TAB where exp_dt != '2099-12-31'
Select all the records which are unchanged between STG and HIST, using an inner join and a filter on HIST.column = STG.column, as below:
select hist.* from HIST_TAB hist
inner join STG_TAB stg
on hist.key = stg.key
where hist.column = stg.column
Select all the new and updated records which have changed, from STG_TAB, using an exclusive left join with HIST_TAB, and set the effective and expiry dates as below:
select stg.*, current_date as eff_dt, '2099-12-31' as exp_dt
from STG_TAB stg
left join
(select * from HIST_TAB where exp_dt = '2099-12-31') hist
on hist.key = stg.key
where hist.key is null
or hist.column != stg.column
Select all the updated old records from the HIST table, using an exclusive left join with the STG table, and close their expiry date as shown below (note the null check is on stg.key, so rows that vanished from the full feed are expired too):
select hist.key, hist.column, hist.eff_dt, current_date as exp_dt from
(select * from HIST_TAB where exp_dt = '2099-12-31') hist
left join STG_TAB stg
on hist.key = stg.key
where stg.key is null
or hist.column != stg.column
UNION ALL the queries from steps 2-5 and INSERT OVERWRITE the result into the HIST table (a sketch follows below).
A more detailed implementation of SCD type 2 can be found here:
https://github.com/sahilbhange/slowly-changing-dimension
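A hedged sketch of that final step, assuming hypothetical tables HIST_TAB(key, col, eff_dt, exp_dt) and STG_TAB(key, col), where col stands in for the full list of compared columns:
INSERT OVERWRITE TABLE HIST_TAB
SELECT key, col, eff_dt, exp_dt FROM (
  -- step 2: rows already expired in history
  SELECT key, col, eff_dt, exp_dt FROM HIST_TAB WHERE exp_dt != '2099-12-31'
  UNION ALL
  -- step 3: current rows that are unchanged
  SELECT hist.key, hist.col, hist.eff_dt, hist.exp_dt
  FROM HIST_TAB hist
  INNER JOIN STG_TAB stg ON hist.key = stg.key
  WHERE hist.exp_dt = '2099-12-31' AND hist.col = stg.col
  UNION ALL
  -- step 4: new and changed rows, opened as current
  SELECT stg.key, stg.col, cast(current_date AS string) AS eff_dt, '2099-12-31' AS exp_dt
  FROM STG_TAB stg
  LEFT JOIN (SELECT * FROM HIST_TAB WHERE exp_dt = '2099-12-31') hist
    ON hist.key = stg.key
  WHERE hist.key IS NULL OR hist.col != stg.col
  UNION ALL
  -- step 5: changed (or vanished) current rows, closed as of today
  SELECT hist.key, hist.col, hist.eff_dt, cast(current_date AS string) AS exp_dt
  FROM (SELECT * FROM HIST_TAB WHERE exp_dt = '2099-12-31') hist
  LEFT JOIN STG_TAB stg ON hist.key = stg.key
  WHERE stg.key IS NULL OR hist.col != stg.col
) u;
Note the comparisons are not null-safe: if col can be NULL, use Hive's null-safe operator (<=>) or coalesce both sides.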

drop table if exists harsha.emp;
drop table if exists harsha.emp_tmp1;
drop table if exists harsha.emp_tmp2;
drop table if exists harsha.init_load;
show databases;
use harsha;
show tables;
create table harsha.emp (eid int,ename string,sal int,loc string,dept int,start_date timestamp,end_date timestamp,current_status string)
comment "emp scd implementation"
row format delimited
fields terminated by ','
lines terminated by '\n'
;
create table harsha.emp_tmp1 (eid int,ename string,sal int,loc string,dept int,start_date timestamp,end_date timestamp,current_status string)
comment "emp scd implementation"
row format delimited
fields terminated by ','
lines terminated by '\n'
;
create table harsha.emp_tmp2 (eid int,ename string,sal int,loc string,dept int,start_date timestamp,end_date timestamp,current_status string)
comment "emp scd implementation"
row format delimited
fields terminated by ','
lines terminated by '\n'
;
create table harsha.init_load (eid int,ename string,sal int,loc string,dept int)
row format delimited
fields terminated by ','
lines terminated by '\n'
;
show tables;
insert into table harsha.emp select 101 as eid,'aaaa' as ename,3400 as sal,'chicago' as loc,10 as did,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status from (select '123')x;
insert into table harsha.emp select 102 as eid,'abaa' as ename,6400 as sal,'ny' as loc,10 as did,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status from (select '123')x;
insert into table harsha.emp select 103 as eid,'abca' as ename,2300 as sal,'sfo' as loc,20 as did,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status from (select '123')x;
insert into table harsha.emp select 104 as eid,'afga' as ename,3000 as sal,'seattle' as loc,10 as did,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status from (select '123')x;
insert into table harsha.emp select 105 as eid,'ikaa' as ename,1400 as sal,'LA' as loc,30 as did,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status from (select '123')x;
insert into table harsha.emp select 106 as eid,'cccc' as ename,3499 as sal,'spokane' as loc,20 as did,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status from (select '123')x;
insert into table harsha.emp select 107 as eid,'toiz' as ename,4000 as sal,'WA.DC' as loc,40 as did,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status from (select '123')x;
load data local inpath 'Documents/hadoop_scripts/t3.txt' into table harsha.emp;
load data local inpath 'Documents/hadoop_scripts/t4.txt' into table harsha.init_load;
insert into table harsha.emp_tmp1 select eid,ename,sal,loc,dept,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status
from harsha.init_load;
insert into table harsha.emp_tmp2
select a.eid,a.ename,a.sal,a.loc,a.dept,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'updated' as current_status from emp_tmp1 a
left outer join emp b on
a.eid=b.eid and
a.ename=b.ename and
a.sal=b.sal and
a.loc = b.loc and
a.dept = b.dept
where b.eid is null
union all
select a.eid,a.ename,a.sal,a.loc,a.dept,from_unixtime(unix_timestamp()) as start_date,from_unixtime(unix_timestamp('9999-12-31 23:59:59','yyyy-MM-dd HH:mm:ss')) as end_date,'current' as current_status from emp_tmp1 a
left outer join emp b on
a.eid = b.eid and
a.ename=b.ename and
a.sal=b.sal and
a.loc=b.loc and
a.dept=b.dept
where b.eid is not null
union all
select b.eid,b.ename,b.sal,b.loc,b.dept,b.start_date as start_date,from_unixtime(unix_timestamp()) as end_date,'expired' as current_status from emp b
inner join emp_tmp1 a on
a.eid=b.eid
where
a.ename <> b.ename or
a.sal <> b.sal or
a.loc <> b.loc or
a.dept <> b.dept
;
insert into table harsha.emp select eid,ename,sal,loc,dept,start_date,end_date,current_status from emp_tmp2;
Records including expired:
select * from harsha.emp order by eid;
Latest records:
select a.* from emp a inner join (select eid ,max(start_date) as start_date from emp where current_status <> 'expired' group by eid) b on a.eid=b.eid and a.start_date=b.start_date;

I used another approach when it comes to managing data with SCDs:
Never update data that already exists inside your historical file or table.
Make sure that new rows are compared to the most recent generation. For instance, the load logic adds control columns (loaded_on, checksum and, if needed, a sequence column used when multiple loads occur on the same day); comparing new data to the most recent generation then uses both the control columns and a key column that exists inside your data, like a customer or product key.
Now the magic takes place by computing the checksum of all the columns involved except the control columns, creating a unique fingerprint for each row. The fingerprint (checksum) column is then used to determine whether any columns have changed compared to the most recent generation (the most recent generation being the latest state of the data based on the key, loaded_on and sequence).
So you know whether a row coming from your daily update is new (there is no previous generation), whether it requires creating a new row (a new generation) inside your historical file or table, or whether nothing needs to be created because there is no difference compared to the previous generation.
This type of logic can be built with Apache Spark: in a single statement you can ask Spark to concatenate any number of columns of any datatype and then compute a hash value to fingerprint the row.
All together, you can develop a Spark-based utility that accepts any data source and outputs a well-organized, clean, slowly-changing-dimension-aware historical file or table. Last of all: never update, append only!
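As a hedged illustration of that fingerprinting step in Spark SQL (the table and business columns are hypothetical; concat_ws and sha2 are built-ins in both Spark SQL and recent Hive):
SELECT customer_key,
       current_date AS loaded_on,
       sha2(concat_ws('||',
                      coalesce(cast(name AS string), ''),
                      coalesce(cast(address AS string), ''),
                      coalesce(cast(status AS string), '')),
            256) AS checksum   -- one fingerprint per row, control columns excluded
FROM daily_update;
The cast/coalesce pair normalizes any datatype and keeps NULLs from collapsing the concatenation; rows whose checksum matches the most recent generation for the same key are skipped, and the rest become a new generation.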

Related

Solving Procedure with no parameters

Hey guys, just here to see if you can help me solve this procedure problem I am running into. Long story short, I made a new table called:
Create Table ClientHistoricalPurchases(
ClientID varchar2(6) constraint clientidhistorical references Clients,
DistinctProducts number (9),
TotalProducts number(9),
TotalCost number (9,2),
Primary Key (ClientID));
And I want to populate/update this table by running a procedure that reads primarily from the following table:
create table OrderDetails(
OrderID varchar2(6) CONSTRAINT orddetpk PRIMARY KEY,
ProductID varchar2(6) CONSTRAINT prdfk REFERENCES Products ,
UnitPrice number(10,2),
Quantity number(4),
Discount number(3),
ShippingDate date);
I do a couple of joins with two more tables called Orders and Clients, but those are trivial joins using the primary keys/FKs.
The goal of this procedure is that when I run it, it loops through order details and calculates the distinct number of products bought by a client, the total products and the total purchase amount. I want to update an existing record with the new values if it is already in the new ClientHistoricalPurchases table; if not, I want to add a new record. This is what I wrote, but it's giving me errors:
Create or Replace Procedure Update_ClientHistPurch as
Cursor C1 is
Select orderid, orders.clientid, productid, unitprice, quantity, discount
from orderdetails
Inner join orders on orderdetails.orderid = orders.clientid
for update of TotalCost;
PurchaseRow c1%RowType;
DistinctProducts orderdetails.quantity%type;
TotalProducts orderdetails.quantity%type;
ProposedNewBalance orderdetails.unitprice%type;
Begin
Begin
Begin
Begin
Open C1;
Fetch c1 into PurchaseRow;
While c1% Found Loop
Select count(distinct productid)
into DistinctProducts
from orderdetails
Inner join orders on orderdetails.orderid = orders.orderid
Inner join clients on orders.clientid = clients.clientid
where clients.clientid = purchaserow.clientid;
end;
Select count(ProductID)
into TotalProducts
from orderdetails
Inner join orders on orderdetails.orderid = orders.orderid
Inner join clients on orders.clientid = clients.clientid
where clients.clientid = purchaserow.clientid;
end;
Select sum((unitprice * quantity) - discount)
into ProposedNewBalance
from orderdetails
Inner join orders on orderdetails.orderid = orders.orderid
Inner join clients on orders.clientid = clients.clientid
where clients.clientid = purchaserow.clientid;
end;
If purchaserow.clientid not in ClientHistoricalpurchases.clientid then
insert into ClientHistoricalPurchases values (purchaserow.clientid,DistinctProducts, TotalProducts, ProposedNewBalance);
End if;
If purchaserow.clientid in ClientHistoricalPurchases.clientid then
Update Clienthistoricalpurchases
set clienthistoricalpurchases.distinctproducts = distinctproducts, clienthistoricalpurchases.totalproducts = totalproducts, clienthistoricalpurchases.totalcost = ProposedNewBalance
where purchaserow.clientid = clienthistoricalpurchases.clientid;
end if;
end loop;
end;
Errors are the following:
Error(27,4): PLS-00103: Encountered the symbol ";" when expecting one
of the following: loop The symbol "loop" was substituted for ";"
to continue.
Error(33,7): PLS-00103: Encountered the symbol "JOIN"
when expecting one of the following: , ; for group having
intersect minus order start union where connect
Any help is appreciated guys. Thanks!
In addition to the comments and answers you've already been given, I believe you have massively overcomplicated your procedure. You're doing things very procedurally, rather than thinking in sets as you should be. You are also getting the aggregated columns in three queries that are essentially identical (e.g. same tables, join conditions and predicates) - you could combine them all to get the three results in a single query.
It looks like you're trying to insert into the clienthistoricalpurchases table if a row doesn't already exist for that client, otherwise you update the row. That immediately screams "MERGE statement" to me.
Combining all that, I think your current procedure should contain just a single merge statement:
MERGE INTO clienthistoricalpurchases tgt
USING (SELECT clients.clientid,
              COUNT(DISTINCT od.productid) distinct_products,
              COUNT(od.productid) total_products,
              SUM((od.unitprice * od.quantity) - od.discount) proposed_new_balance
       FROM orderdetails od
       INNER JOIN orders
       ON od.orderid = orders.orderid
       INNER JOIN clients
       ON orders.clientid = clients.clientid
       GROUP BY clients.clientid) src
ON (tgt.clientid = src.clientid)
WHEN NOT MATCHED THEN
  INSERT (tgt.clientid,
          tgt.distinctproducts,
          tgt.totalproducts,
          tgt.totalcost)
  VALUES (src.clientid,
          src.distinct_products,
          src.total_products,
          src.proposed_new_balance)
WHEN MATCHED THEN
  UPDATE SET tgt.distinctproducts = src.distinct_products,
             tgt.totalproducts = src.total_products,
             tgt.totalcost = src.proposed_new_balance;
However, I have some concerns over your current logic and/or data model.
It seems like you're expecting at most one row per clientid to appear in clienthistoricalpurchases. What if a clientid has two or more different orders? Currently you would overwrite any existing row.
Also, do you really want to apply this logic across all orders every single time it gets run?
Line 28 of your code, the first END that follows WHILE, should be END LOOP
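For reference, a minimal sketch of the cursor-loop skeleton those corrections point at - declarations before a single BEGIN, a FETCH inside the loop so it can terminate, and END LOOP closing the WHILE (the aggregate logic is elided):
Create or Replace Procedure Update_ClientHistPurch as
  Cursor c1 is
    Select orders.clientid
    From orderdetails
    Inner join orders on orderdetails.orderid = orders.orderid;
  PurchaseRow c1%RowType;
Begin
  Open c1;
  Fetch c1 into PurchaseRow;
  While c1%Found Loop
    -- aggregate lookups and the insert/update (or MERGE) go here
    Fetch c1 into PurchaseRow; -- advance the cursor or the loop never ends
  End Loop;
  Close c1;
End;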

SELECT within UPDATE gives an error

I am getting SQL Error: ORA-01779: cannot modify a column which maps to a non key-preserved table on this statement:
UPDATE
(
SELECT CELLS.NUM, UND.CLIENT_PARAMS
FROM CELLS
LEFT OUTER JOIN UND
ON CELLS.UND_ID = UND.ID
WHERE CELLS.SASE = 1
) t
SET t.CLIENT_PARAMS = 'test';
I would like to update the CLIENT_PARAMS field for all rows which the select returns.
The most straightforward (though possibly not the most efficient) way to update rows in one table which correspond directly to rows in another table via an identity column would be to use WHERE table1.column IN (SELECT id FROM table2 WHERE ...).
In this case:
UPDATE UND
SET client_params = 'test'
WHERE id IN
(SELECT und_id
FROM CELLS
WHERE SASE=1)
Try this
UPDATE und u
SET client_params = 'test'
WHERE EXISTS
(SELECT 1
FROM cells c
WHERE C.SASE = 1
AND c.und_id = u.id)

Copy records from one table to another with pl-sql

I want to copy records from one table to another.
The only records from table 1 that will be copied to table 2 are the ones that don't exist in table 2 yet.
If duplicate records exist in table 1, then only the record with the longer name should be copied to table 2.
I could already implement a query that almost does what I want.
The problem I have is when there are names with the same maximum number of characters.
In these cases, my query returns more than one record and I just want to insert one new record into table 2.
Does anyone know how I can fix this?
Here is my code:
For x in (Select distinct xdd.id_t, xdd.name_t
From table1 xdd
Where xdd.id_t not in (Select distinct det.id_t2
From table2 det)
And LENGTH(xdd.name_t) in (Select Max(LENGTH(xdd2.name_t))
From table1 xdd2
Where xdd2.id_t = xdd.id_t)
) Loop
Insert into table2 (id_t2, name_t2)
Values (x.id_t, x.name_t);
End loop;
Can you give me an example to solve this?
Sure. If I understood the requirements correctly, the merge statement will look similar to this one.
We use the row_number() analytic function to choose the duplicate record with the longer name_t:
merge into table_two t2
using(
select id_t
, name_t
from (select id_t
, name_t
, row_number() over(partition by id_t
order by length(name_t) desc) as rn
from table_one) q
where q.rn = 1
) t1
on (t2.id_t = t1.id_t)
when not matched then
insert(id_t, name_t)
values(t1.id_t, t1.name_t)
This is a merge statement that should "upsert" data from table 1 into table 2. Matching keys are updated only when the name field in table 1 is longer than that in table 2, and inserts occur when keys from table 1 are not matched in table 2.
MERGE INTO table2 D
USING (SELECT table1.id_t, table1.name_t FROM table1) S
ON (D.id_t2 = S.id_t)
WHEN MATCHED THEN UPDATE SET D.name_t2 = S.name_t
WHERE (LENGTH(S.name_t) > LENGTH(D.name_t2))
WHEN NOT MATCHED THEN INSERT (D.id_t2, D.name_t2)
VALUES (S.id_t, S.name_t);

How to merge data from multiple rows into single column in new table?

How do I merge data from multiple rows in one table to a single column in a new table?
create table new_paragraphs
(
id NUMBER,
paragraph CLOB
);
create table old_paragraphs
(
id NUMBER,
paragraph CLOB
);
merge into new_paragraphs a
using (select * from old_paragraphs) b
on (id = id)
when matched then
update set a.paragraph = a.paragraph || b.paragraph;
-- Results in error: unable to get a stable set of rows in the source tables
The above throws an exception.
It would work if id were a primary key in at least old_paragraphs (or if it were unique for each id found in new_paragraphs).
Other than that, you want to use aliases in on (id = id) so that it reads on (a.id = b.id):
merge into new_paragraphs a
using (select * from old_paragraphs) b
on (a.id = b.id)
when matched then
update set a.paragraph = a.paragraph || b.paragraph;
Why are you doing a MERGE here? Why not a simple UPDATE (assuming ID is the primary key of both tables)?
UPDATE new_paragraphs a
SET paragraph = (select a.paragraph || b.paragraph
from old_paragraphs b
where a.id = b.id)
WHERE EXISTS (SELECT 1
FROM old_paragraphs b
WHERE a.id = b.id)

MERGE output cursor of SP into table?

I have a stored procedure which returns output as a ref cursor. I would like to store the output in another table using the MERGE statement. I'm having problems, however, mixing all the statements together (WITH, USING, MERGE etc.).
Can somebody assist? Thanks!
This is the table I want the output in (STOP_TIME is left out on purpose):
TABLE: USER_ALLOCATION
START_TIME date NOT NULL Primary Key
USER_ID number NOT NULL Primary Key
TASK_ID number NULL
This is the SP:
create or replace
PROCEDURE REPORT_PLAN_AV_USER
(
from_dt IN date,
to_dt IN date,
sysur_key IN number,
v_reservations OUT INTRANET_PKG.CURSOR_TYPE
)
IS
BEGIN
OPEN v_reservations FOR
with
MONTHS as (select FROM_DT + ((ROWNUM-1) / (24*2)) as DT from DUAL connect by ROWNUM <= ((TO_DT - FROM_DT) * 24*2) + 1),
TIMES as (select DT as START_TIME,(DT + 1/48) as STOP_TIME from MONTHS where TO_NUMBER(TO_CHAR(DT,'HH24')) between 8 and 15 and TO_NUMBER(TO_CHAR(DT,'D')) not in (1,7))
select
TIMES.START_TIME,
TIMES.STOP_TIME,
T.TASK_ID,
sysur_key USER_ID
from
TIMES
left outer join (ALLOCATED_USER u INNER JOIN REQUIRED_RESOURCE r ON u.AU_ID = r.RR_ID INNER JOIN TASK t ON r.TASK_ID = t.TASK_ID)
ON u.USER_ID = sysur_key AND t.PLAN_TYPE = 3 AND TIMES.start_time >= TRUNC30(t.START_DATE) AND TIMES.start_time < TRUNC30(t.FINISH_DATE)
where u.USER_ID is null OR u.USER_ID = sysur_key
order by START_TIME ASC;
END;
I don't think you can reuse the cursor unless you write a lot of procedural code.
Can't you write a single merge statement and drop procedure REPORT_PLAN_AV_USER?
If you still need procedure REPORT_PLAN_AV_USER, you can create a view that is used both in the procedure and in your merge statement (to prevent code duplication), as sketched below.
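A hedged sketch of that suggestion, assuming a hypothetical view PLAN_AV_USER_V that wraps the procedure's SELECT, with the user and date-range filters passed as bind variables:
MERGE INTO user_allocation ua
USING (SELECT start_time, user_id, task_id
       FROM plan_av_user_v            -- assumed view sharing the SP's query logic
       WHERE user_id = :sysur_key
       AND start_time BETWEEN :from_dt AND :to_dt) src
ON (ua.start_time = src.start_time AND ua.user_id = src.user_id)
WHEN MATCHED THEN
  UPDATE SET ua.task_id = src.task_id
WHEN NOT MATCHED THEN
  INSERT (ua.start_time, ua.user_id, ua.task_id)
  VALUES (src.start_time, src.user_id, src.task_id);
The procedure can then OPEN its ref cursor against the same view, so both consumers share one definition.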
