Please suppose you have an Oracle table, TABLE A, as follows:
In this table the main fields are FIELD1 and FIELD2. You can see that:
a) For the couple (AAA, 1) we have two values: 200.03 and 100.02;
b) For the couple (BBB, 3) we have two values: 300.04 and 400.05.
We would like to make a sum aggregation as follows, updating the following table:
In field3 of table B, we would like to store the sum of 200.03 and 100.02 with reference to the couple (AAA, 1), and we would like to store the sum of 300.04 and 400.05 with reference to the couple (BBB, 3).
Please imagine that we could have many different couples:
(ZZZ, 77)
(YYY, 12)
... and so on.
Please suppose that the record referred to a single couple may be more than two, in which case we should sum the values of all the records regarding the same couple.
In our simple case, the result will be the following:
The real case has a table A with about 20 million of records, so I would like to write the software in PL/SQL using BULK COLLECT, UPDATE and FORALL.
What would be the best approach? Please provide PL/SQL code in order to explain how to solve the problem.
Thank you very much for considering my request.
Frankly I wouldn't use BULK COLLECT and FORALL here - I'd use a MERGE statement. Try something like
MERGE INTO TABLE_B b
USING (SELECT FIELD1, FIELD2, SUM(FIELD3) AS TOTAL_FIELD3
FROM TABLE_A
GROUP BY FIELD1, FIELD2) a
ON (b.FIELD1 = a.FIELD1 AND
b.FIELD2 = a.FIELD2)
WHEN NOT MATCHED THEN
INSERT (FIELD1, FIELD2, FIELD3)
VALUES (a.FIELD1, a.FIELD2, a.TOTAL_FIELD3)
WHEN MATCHED THEN
UPDATE
SET FIELD3 = a.TOTAL_FIELD3;
Best of luck.
Related
In my power Bi I would like to count rows for all my tables and having this output:
Table Name
Row count
Table1
126
Table2
985
Table3
998
...
...
As long as I have few tables I can do
NEWTABLE = UNION(
ROW("TableName","Table1", "Rowcount",ROWSCOUNT(Table1)),
ROW("TableName","Table2", "Rowcount",ROWSCOUNT(Table2)),
...
)
But this starts to be complicated when I have many tables.
Is there a way I can do it? Like a loop or something?
Thank you
If you only need a metrics then you can use DaxStudio -> ViewMetrics
where cardinality is your "rowCounts"
If you need something more, then you can get all table name from DMV
select * from $SYSTEM.TMSCHEMA_TABLES
populate this as another table in your model, and use M language to loop through.
here useful example:
https://community.powerbi.com/t5/Power-Query/Power-query-Counting-rows-from-all-table-in-query-editor-but-not/td-p/1198489
I have a table with 3 columns:
table1: ID, CODE, RESULT, RESULT2, RESULT3
I have this SAS code:
data table1
set table1;
BY ID, CODE;
IF FIRST.CODE and RESULT='A' THEN OUTPUT;
ELSE IF LAST.CODE and RESULT NE 'A' THEN OUTPUT;
RUN;
So we are grouping the data by ID and CODE, and then writing to the dataset if certain conditions are met. I want to write a hive query to replicate this. This is what I have:
proc sql;
create table temp as
select *, row_number() over (partition by ID, CODE) as rowNum
from table1;
create table temp2 as
select a.ID, a.CODE, a.RESULT, a.RESULT2, a.RESULT3
from temp a
inner join (select ID, CODE, max(rowNum) as maxRowNum
from temp
group by ID, CODE) b
on a.ID=b.ID and a.CODE=b.CODE
where (a.rowNum=1 and a.RESULT='A') or (a.rowNum=b.maxRowNum and a.RESULT NE 'A');
quit;
There are two issues I see with this.
1) The row that is first or last in each BY group is entirely dependant on the order of rows in table1 in SAS, we aren't ordering by anything. I don't think row order is preserved when translating to a hive query.
2) The SAS code is taking the first row in each BY GROUP or the last, not both. I think that my HIVE query is taking both, resulting in more rows than I want.
Any suggestions or insight on how to improve my query is appreciated. Is it even possible to replicate this SAS code in HIVE?
The SAS code has a by statement (BY ID CODE;), which tells SAS that the set dataset is sorted at those levels. So, not a random selection for first. and last..
That said, we can replicate this in HIVE by using the first_value and last_value window functions.
FIRST.CODE should replicate to
first_value(code) over (partition by Id order by code)fcode
Similarly, LAST.CODE would be
last_value(code) over (partition by Id order by code)lcode
Once you have the fcode and lcode columns, use case when statements for the result column criteria. Like,
case when (code=fcode and result='A') or (code=lcode and result<>'A')
then 1 else 0 end as op_flag
Then the fetch the table with where op_flag = 1
SAMPLE
select id, code, result from (
select *,
first_value(code) over (partition by id order by code)fcode,
last_value(code) over (partition by id order by code)lcode
from footab) f
where (code=fcode and result='A') or (code=lcode and result<>'A')
Regarding point 1) the BY group processing requires the input data to be sorted or indexed on BY variables, so though the code contains no ordering, the source data is processed in order. If the input data was not indexed/sorted, SAS will throw error.
Regarding this, possible differences are on rows with same values of BY variables, especially if the RESULT is different.
In SAS, I would pre-sort data by ID, CODE, RESULT, then use BY ID CODE in order to not be influenced by order of rows.
Regarding 2) FIRST and LAST can be both true in SAS. Since your condition for first and last on RESULT is different, I guess this is not a source of differences.
I guess you could add another field as
row_number() over (partition by ID, CODE desc) as rowNumDesc
to detect last row with rowNumDesc = 1 (so that you skip the join).
EDIT:
I think the two programs above both include random selection of rows for groups with same values of ID and CODE variables, especially with same values of RESULT. But you should get same number of rows from both. If not, just debug it.
However the random aspect in SAS code/storage is based on physical order of rows, while the ROW_NUMBERs randomness within a group will be influenced by the implementation of the function in the engine.
I have two hive tables (t1 and t2) that I would like to compare. The second table has 5 additional columns that are not in the first table. Other than the five disjoint fields, the two tables should be identical. I am trying to write a query to check this. Here is what I have so far:
SELECT * FROM t1
UNION ALL
select * from t2
GROUP BY some_value
HAVING count(*) == 2
If the tables are identical, this should return 0 records. However, since the second table contains 5 extra fields, I need to change the second select statement to reflect this. There are almost 60 column names so I would really hate to write it like this:
SELECT * FROM t1
UNION ALL
select field1, field2, field3,...,fieldn from t2
GROUP BY some_value
HAVING count(*) == 2
I have looked around and I know there is no select * EXCEPT syntax, but is there a way to do this query without having to explicity name each column that I want included in the final result?
You should have used UNION DISTINCT for the logic you are applying.
However, the number and names of columns returned by each select_statement have to be the same otherwise a schema error is thrown.
You could have a look at this Python program that handles such comparisons of Hive tables (comparing all the rows and all the columns), and would show you in a webpage the differences that might appear: https://github.com/bolcom/hive_compared_bq
To skip the 5 extra fields, you could use the "--ignore-columns" option.
I have a simple INSERT query where I need to use UPDATE instead when the primary key is a duplicate. In MySQL this seems easier, in Oracle it seems I need to use MERGE.
All examples I could find of MERGE had some sort of "source" and "target" tables, in my case, the source and target is the same table. I was not able to make sense of the examples to create my own query.
Is MERGE the only way or maybe there's a better solution?
INSERT INTO movie_ratings
VALUES (1, 3, 5)
It's basically this and the primary key is the first 2 values, so an update would be like this:
UPDATE movie_ratings
SET rating = 8
WHERE mid = 1 AND aid = 3
I thought of using a trigger that would automatically execute the UPDATE statement when the INSERT was called but only if the primary key is a duplicate. Is there any problem doing it this way? I need some help with triggers though as I'm having some difficulty trying to understand them and doing my own.
MERGE is the 'do INSERT or UPDATE as appropriate' statement in Standard SQL, and probably therefore in Oracle SQL too.
Yes, you need a 'table' to merge from, but you can almost certainly create that table on the fly:
MERGE INTO Movie_Ratings M
USING (SELECT 1 AS mid, 3 AS aid, 8 AS rating FROM dual) N
ON (M.mid = N.mid AND M.aid = N.aid)
WHEN MATCHED THEN UPDATE SET M.rating = N.rating
WHEN NOT MATCHED THEN INSERT( mid, aid, rating)
VALUES(N.mid, N.aid, N.rating);
(Syntax not verified.)
A typical way of doing this is
performing the INSERT and catch a DUP_VAL_ON_INDEX and then perform an UPDATE instead
performing the UPDATE first and if SQL%Rows = 0 perform an INSERT
You can't write a trigger on a table that does another operation on the same table. That's causing an Oracle error (mutating tables).
I'm a T-SQL guy but a trigger in this case is not a good solution. Most triggers are not good solutions. In T-SQL, I would simply perform an IF EXISTS (SELECT * FROM dbo.Table WHERE ...) but in Oracle, you have to select the count...
DECLARE
cnt NUMBER;
BEGIN
SELECT COUNT(*)
INTO cnt
FROM mytable
WHERE id = 12345;
IF( cnt = 0 )
THEN
...
ELSE
...
END IF;
END;
It would appear that MERGE is what you need in this case:
MERGE INTO movie_ratings mr
USING (
SELECT rating, mid, aid
WHERE mid = 1 AND aid = 3) mri
ON (mr.movie_ratings_id = mri.movie_ratings_id)
WHEN MATCHED THEN
UPDATE SET mr.rating = 8 WHERE mr.mid = 1 AND mr.aid = 3
WHEN NOT MATCHED THEN
INSERT (mr.rating, mr.mid, mr.aid)
VALUES (1, 3, 8)
Like I said, I'm a T-SQL guy but the basic idea here is to "join" the movie_rating table against itself. If there's no performance hit on using the "if exists" example, I'd use it for readability.
Here's my original question:
merging two data sets
Unfortunately I omitted some intircacies, that I'd like to elaborate here.
So I have two tables events_source_1 and events_source_2 tables. I have to produce the data set from those tables into resultant dataset (that I'd be able to insert into third table, but that's irrelevant).
events_source_1 contain historic event data and I have to do get the most recent event (for such I'm doing the following:
select event_type,b,c,max(event_date),null next_event_date
from events_source_1
group by event_type,b,c,event_date,null
events_source_2 contain the future event data and I have to do the following:
select event_type,b,c,null event_date, next_event_date
from events_source_2
where b>sysdate;
How to put outer join statement to fill the void (i.e. when same event_type,b,c found from event_source_2 then next_event_date will be filled with the first date found
GREATLY APPRECIATE FOR YOUR HELP IN ADVANCE.
Hope I got your question right. This should return the latest event_date of events_source_1 per event_type, b, c and add the lowest event_date of event_source_2.
Select es1.event_type, es1.b, es1.c,
Max(es1.event_date),
Min(es2.event_date) As next_event_date
From events_source_1 es1
Left Join events_source_2 es2 On ( es2.event_type = es1.event_type
And es2.b = es1.b
And es2.c = es1.c
)
Group By c1.event_type, c1.b, c1.c
You could just make the table where you need to select a max using a group by into a virtual table, and then do the full outer join as I provided in the answer to the prior question.
Add something like this to the top of the query:
with past_source as (
select event_type, b, c, max(event_date)
from event_source_1
group by event_type, b, c, event_date
)
Then you can use past_source as if it were an actual table, and continue your select right after the closing parens on the with clause shown.
I end up doing two step process: 1st step populates the data from event table 1, 2nd step MERGES the data between target (the dataset from 1st step) and another source. Please forgive me, but I had to obfuscate table name and omit some columns in the code below for legal reasons. Here's the SQL:
INSERT INTO EVENTS_TARGET (VEHICLE_ID,EVENT_TYPE_ID,CLIENT_ID,EVENT_DATE,CREATED_DATE)
select VEHICLE_ID, EVENT_TYPE_ID, DEALER_ID,
max(EVENT_INITIATED_DATE) EVENT_DATE, sysdate CREATED_DATE
FROM events_source_1
GROUP BY VEHICLE_ID, EVENT_TYPE_ID, DEALER_ID, sysdate;
Here's the second step:
MERGE INTO EVENTS_TARGET tgt
USING (
SELECT ee.VEHICLE_ID VEHICLE_ID, ee.POTENTIAL_EVENT_TYPE_ID POTENTIAL_EVENT_TYPE_ID, ee.CLIENT_ID CLIENT_ID,ee.POTENTIAL_EVENT_DATE POTENTIAL_EVENT_DATE FROM EVENTS_SOURCE_2 ee WHERE ee.POTENTIAL_EVENT_DATE>SYSDATE) src
ON (tgt.vehicle_id = src.VEHICLE_ID AND tgt.client_id=src.client_id AND tgt.EVENT_TYPE_ID=src.POTENTIAL_EVENT_TYPE_ID)
WHEN MATCHED THEN
UPDATE SET tgt.NEXT_EVENT_DATE=src.POTENTIAL_EVENT_DATE
WHEN NOT MATCHED THEN
insert (tgt.VEHICLE_ID,tgt.EVENT_TYPE_ID,tgt.CLIENT_ID,tgt.NEXT_EVENT_DATE,tgt.CREATED_DATE) VALUES (src.VEHICLE_ID, src.POTENTIAL_EVENT_TYPE_ID, src.CLIENT_ID, src.POTENTIAL_EVENT_DATE, SYSDATE)
;