We're building a data warehouse on BigQuery that includes data from an old Oracle 9 transactional database (still active) whose tables have no indexes or timestamp columns.
Using Standard SQL, I would like to analyse changes in some tables imported from this database.
Simplifying the situation, imagine we have two versions of the same table, before and after, as follows:
with before as (
  select 'U123' as user, 'Gum' as product, '3' as quantity union all
  select 'U456', 'Tissue', '20' union all
  select 'U123', 'Cream', '1'
),
after as (
  select 'U123' as user, 'Gum' as product, '3' as quantity union all
  select 'U456', 'Tissue', '20' union all
  select 'U123', 'Cream', '3' union all
  select 'U456', 'Tomato', '5'
)
So row 4 (Tomato) was added and row 3 (Cream) was modified.
What is the correct approach to compare the data and locate the changes, given that there are no indexes or timestamps?
So the comparative method should output:
user | product | quantity
U123 | Cream | 3
U456 | Tomato | 5
I don't even know where to start.
Below is for BigQuery Standard SQL
#standardSQL
SELECT user, product, IFNULL(a.quantity, 0) - IFNULL(b.quantity, 0) AS quantity
FROM after a
FULL OUTER JOIN before b
USING(user, product)
WHERE IFNULL(a.quantity, 0) != IFNULL(b.quantity, 0)
When applied to the sample data from your question, as in the example below,
#standardSQL
WITH before AS (
SELECT 'U123' AS user, 'Gum' AS product, 3 AS quantity UNION ALL
SELECT 'U456', 'Tissue', 20 UNION ALL
SELECT 'U123', 'Cream', 1
), after AS (
SELECT 'U123' AS user, 'Gum' AS product, 3 AS quantity UNION ALL
SELECT 'U456', 'Tissue', 20 UNION ALL
SELECT 'U123', 'Cream', 3 UNION ALL
SELECT 'U456', 'Tomato', 5
)
SELECT user, product, IFNULL(a.quantity, 0) - IFNULL(b.quantity, 0) AS quantity
FROM after a
FULL OUTER JOIN before b
USING(user, product)
WHERE IFNULL(a.quantity, 0) != IFNULL(b.quantity, 0)
the output is
Row  user  product  quantity
1    U123  Cream    2
2    U456  Tomato   5
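If you also want to label each difference as an added, removed, or modified row, a possible extension of the same query is below (a sketch only: it assumes quantity is never NULL in the source rows, and change_type is just an illustrative column name):
#standardSQL
SELECT user, product,
  IFNULL(a.quantity, 0) - IFNULL(b.quantity, 0) AS quantity,
  CASE
    WHEN b.quantity IS NULL THEN 'added'      -- no matching row in `before`
    WHEN a.quantity IS NULL THEN 'removed'    -- no matching row in `after`
    ELSE 'modified'                           -- present in both, quantity differs
  END AS change_type
FROM after a
FULL OUTER JOIN before b
USING(user, product)
WHERE IFNULL(a.quantity, 0) != IFNULL(b.quantity, 0)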
Oracle keeps track of data changes at the row level with the help of the SCN (System Change Number), so any change performed through DML (INSERT/UPDATE) is internally captured and can be converted to a timestamp. (Note: the ORA_ROWSCN pseudocolumn and the SCN_TO_TIMESTAMP function are documented from Oracle 10g onwards, so check that they are available on your 9i instance.)
How does it work?
Create the table with the ROWDEPENDENCIES option, so the SCN is tracked per row rather than per block
Use the SCN_TO_TIMESTAMP(ORA_ROWSCN) function to get the TIMESTAMP of each row's last change
Example:
-- Create the table with row-level dependency tracking
-- (USER is a reserved word in Oracle, so the column is named USR here)
CREATE TABLE SCNTEST(USR NUMBER, PRODUCT NUMBER, QUANTITY NUMBER) ROWDEPENDENCIES;
-- Insert Data
INSERT ...
-- Query Data, including the timestamp of each row's last change
SELECT USR, PRODUCT, QUANTITY, SCN_TO_TIMESTAMP(ORA_ROWSCN) FROM SCNTEST;
You can then group or filter the data on the SCN_TO_TIMESTAMP(ORA_ROWSCN) value to separate before and after records.
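For example, a rough sketch that pulls only rows changed since the previous extract (assuming the SCNTEST table above; the :cutoff bind variable holding the timestamp of the last extract is hypothetical):
-- Rough sketch: return only rows whose last change happened after a recent cutoff.
-- Filtering on ORA_ROWSCN avoids converting very old SCNs, which can no longer be mapped to timestamps.
SELECT USR, PRODUCT, QUANTITY, SCN_TO_TIMESTAMP(ORA_ROWSCN) AS CHANGED_AT
FROM SCNTEST
WHERE ORA_ROWSCN > TIMESTAMP_TO_SCN(:cutoff)
ORDER BY CHANGED_AT;
Keep in mind that the SCN-to-timestamp mapping is only retained for a limited period, so this is suited to picking up recent changes rather than reconstructing a full history.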
Related
I have a scenario where I have to display a row 'n' number of times depending on the value in its quantity column.
Item  Qty
abc   2
cde   1

Item  Qty
abc   1
abc   1
cde   1
I am looking to convert the first table to the second.
I came across a site suggesting that I should be using a recursive WITH query.
My anchor member returns the original table.
SELECT ITEM, QTY
FROM lines
WHERE
JOB = TO_NUMBER ('1')
AND ITEM IN
(SELECT PART
FROM PICK
WHERE DELIVERY = '2')
My recursive member is as follows.
SELECT CTE.ITEM, (CTE.QTY - 1) QTY
FROM CTE
INNER JOIN
(SELECT ITEM, QTY
FROM LINES
WHERE JOB_ID = TO_NUMBER ('1')
AND ITEM IN
(SELECT PART
FROM PICK
WHERE DELIVERY = '2'
)) T
ON CTE.ITEM = T.ITEM
WHERE CTE.QTY > 1
My goal is to get all the parts and quantities first, and then, for all parts with qty > 1, have the recursive step generate new rows to be added to the original result set; the qty displayed in each new row would be (the original qty for that part - 1). The recursion would go on until qty becomes 1 for all the parts.
So this is what I had in the end.
WITH CTE (ITEM, QTY)
AS (
SELECT ITEM, QTY
FROM lines
WHERE
JOB = TO_NUMBER ('1')
AND ITEM IN
(SELECT PART
FROM PICK
WHERE DELIVERY = '2')
UNION ALL
SELECT CTE.ITEM, (CTE.QTY - 1) QTY
FROM CTE
INNER JOIN
(SELECT ITEM, QTY
FROM LINES
WHERE JOB_ID = TO_NUMBER ('1')
AND ITEM IN
(SELECT PART
FROM PICK
WHERE DELIVERY = '2'
)) T
ON CTE.ITEM = T.ITEM
WHERE CTE.QTY > 1)
SELECT ITEM, QTY
FROM CTE
ORDER BY 1, 2 DESC
I get the following error when I try the above
"ORA-32044: cycle detected while executing recursive WITH query"
How is it getting into a cycle? What did I miss in its working?
Also, upon reading another website, I found that if I used a CYCLE clause I was able to stop the cycle.
The clause I used was:
CYCLE QUANTITY SET END TO '1' DEFAULT '0'
If I use this before the SELECT statement, I get the desired output, but I don't feel this is the right way of going about it. What exactly is the clause doing? What is the right way of using it?
Oracle Setup:
CREATE TABLE lines ( Item, Qty ) AS
SELECT 'abc', 2 FROM DUAL UNION ALL
SELECT 'cde', 1 FROM DUAL;
CREATE TABLE pick ( part, delivery ) AS
SELECT 'abc', 2 FROM DUAL UNION ALL
SELECT 'cde', 2 FROM DUAL;
Query 1: Using a hierarchical query:
SELECT Item,
COLUMN_VALUE AS qty
FROM lines l
CROSS JOIN
TABLE(
CAST(
MULTISET(
SELECT 1
FROM DUAL
CONNECT BY LEVEL <= l.Qty
)
AS SYS.ODCINUMBERLIST
)
) t
WHERE item IN ( SELECT part FROM pick WHERE delivery = 2 )
Query 2: Using a recursive sub-query factoring clause:
WITH rsqfc ( item, qty ) AS (
SELECT item, qty
FROM lines l
WHERE item IN ( SELECT part FROM pick WHERE delivery = 2 )
UNION ALL
SELECT item, qty - 1
FROM rsqfc
WHERE qty > 1
)
SELECT item, 1 AS qty
FROM rsqfc;
Output:
ITEM | QTY
:--- | --:
abc | 1
abc | 1
cde | 1
db<>fiddle here
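Regarding the CYCLE clause mentioned in the question: it does not fix the recursion logic, it only marks rows whose cycle columns repeat along a branch so that execution can continue instead of raising ORA-32044. A sketch of where it goes syntactically, attached to the query above (is_cycle is just an arbitrary marker column name):
WITH rsqfc ( item, qty ) AS (
  SELECT item, qty
  FROM   lines
  WHERE  item IN ( SELECT part FROM pick WHERE delivery = 2 )
  UNION ALL
  SELECT item, qty - 1
  FROM   rsqfc
  WHERE  qty > 1
)
CYCLE qty SET is_cycle TO '1' DEFAULT '0'  -- flag a row when qty repeats along its ancestry
SELECT item, qty, is_cycle
FROM   rsqfc;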
I am trying to create a month/year temp table that I can join to in calculations; however, I am having some issues. I am unable to create global temporary tables due to restrictions and have to rely on the following kind of query.
WITH Months AS
(
SELECT LEVEL -1 AS ID
FROM DUAL
CONNECT BY LEVEL <=264
)
SELECT
ROWNUM AS MO_SYS_ID,
TO_CHAR(ADD_MONTHS(TO_DATE('01/01/1999', 'DD/MM/YY'), ID), 'YYYY'||'MM') AS MO_NM,
TO_CHAR(ADD_MONTHS(TO_DATE('01/01/1999', 'DD/MM/YY'), ID), 'MON') AS MO_ABBR_NM,
TO_CHAR(ADD_MONTHS(TO_DATE('01/01/1999', 'DD/MM/YY'), ID), 'MONTH') AS MO_FULL_NM,
TO_CHAR(ADD_MONTHS(TO_DATE('01/01/1999', 'DD/MM/YY'), ID), 'MM')AS MO_NBR,
TO_CHAR(ADD_MONTHS(TO_DATE('01/01/1999', 'DD/MM/YY'), ID), 'YYYY') AS YR_NBR
from Months;
What I really need is to have this inserted into a temp table that I can reuse later. Unfortunately, I do not have any fields in other tables that I can use either. I need it to show 264 months starting from January 1999.
Thank you
You can calculate a date column within the table expression, like this:
WITH Months AS (
SELECT LEVEL -1 AS ID, ADD_MONTHS(TO_DATE('01/01/1999', 'DD/MM/YY'), LEVEL -1) as dt
FROM DUAL
CONNECT BY LEVEL <=264
)
SELECT *
from Months
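If you still want the formatted columns from your original query, here is a sketch that derives them all from the computed dt column (the column names are taken from your query):
WITH Months AS (
SELECT LEVEL -1 AS ID, ADD_MONTHS(DATE '1999-01-01', LEVEL -1) AS dt
FROM DUAL
CONNECT BY LEVEL <= 264
)
SELECT ID + 1 AS MO_SYS_ID,
TO_CHAR(dt, 'YYYYMM') AS MO_NM,
TO_CHAR(dt, 'MON') AS MO_ABBR_NM,
TO_CHAR(dt, 'MONTH') AS MO_FULL_NM,
TO_CHAR(dt, 'MM') AS MO_NBR,
TO_CHAR(dt, 'YYYY') AS YR_NBR
FROM Months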
If you are attempting to create date ranges, you could do this:
WITH Months AS (
SELECT LEVEL -1 AS ID
, ADD_MONTHS(TO_DATE('01/01/1999', 'DD/MM/YY'), LEVEL -1) as start_dt
, ADD_MONTHS(TO_DATE('01/01/1999', 'DD/MM/YY'), LEVEL ) as end_dt
FROM DUAL
CONNECT BY LEVEL <=264
)
SELECT *
from yourtable t
inner join Months m on t.somecol >= m.start_dt and t.somecol < m.end_dt
I have 2 tables and I need to do a table compare:
TABLE A (LABEL, VALUE)
TABLE B (LABEL, VALUE)
Basically I want:
Records in both tables where the values are not equal on matching labels
Records in TABLE A that are not in TABLE B
Records in TABLE B that are not in TABLE A
With that information, I can record the proper historical data I need. It will show me where a value has changed, and where a label was added or deleted. You can say TABLE A is the "new" set of data and TABLE B is the "old" set of data, so I can see what was added, what was deleted, and what was changed.
Been trying with UNION & MINUS, but no luck yet.
Something like:
A LABEL A VALUE B LABEL B VALUE
---------------------------------------
XXX 5 XXX 3
YYY 2
ZZZ 4
WWW 7 WWW 8
If the labels and values are the same, I do not need them in the result set.
Here is one way (and possibly the most efficient way) to solve this problem. The main part is the subquery that does a UNION ALL and GROUP BY on the result, keeping only groups consisting of a single row. (The groups with two rows are those where the same row exists in both tables.) This method was invented by Marco Stefanetti - first discussed on the AskTom discussion board. The benefit of this approach - over the more common "symmetric difference" approach - is that each base table is read just once, not twice.
Then, to put the result in the desired format, I use a PIVOT operation (available since Oracle 11.1); in earlier versions of Oracle, the same can be done with a standard aggregate outer query.
Note that I modified the inputs to show the handling of NULL in the VALUE column also.
Important: This solution assumes LABEL is primary key in both tables; if not, it's not clear how the required output would even make sense.
with
table_a ( label, value ) as (
select 'AAA', 3 from dual
union all select 'CCC', null from dual
union all select 'XXX', 5 from dual
union all select 'WWW', 7 from dual
union all select 'YYY', 2 from dual
union all select 'HHH', null from dual
),
table_b ( label, value ) as (
select 'ZZZ', 4 from dual
union all select 'AAA', 3 from dual
union all select 'HHH', null from dual
union all select 'WWW', 8 from dual
union all select 'XXX', 3 from dual
union all select 'CCC', 1 from dual
)
-- End of test data (NOT PART OF THE SOLUTION!) SQL query begins below this line.
select a_label, a_value, b_label, b_value
from (
select max(source) as source, label as lbl, label, value
from (
select 'A' as source, label, value
from table_a
union all
select 'B' as source, label, value
from table_b
)
group by label, value
having count(*) = 1
)
pivot ( max(label) as label, max(value) as value for source in ('A' as a, 'B' as b) )
;
Output:
A_LABEL  A_VALUE  B_LABEL  B_VALUE
-------  -------  -------  -------
YYY            2
CCC               CCC            1
WWW            7  WWW            8
                  ZZZ            4
XXX            5  XXX            3
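For reference, the "standard aggregate outer query" mentioned above for pre-11.1 versions could look something like this (a sketch only, reusing the same table_a / table_b test data and the same single-row-group subquery):
select max(case when source = 'A' then label end) as a_label,
       max(case when source = 'A' then value end) as a_value,
       max(case when source = 'B' then label end) as b_label,
       max(case when source = 'B' then value end) as b_value
from (
       select max(source) as source, label, value
       from (
              select 'A' as source, label, value from table_a
              union all
              select 'B' as source, label, value from table_b
            )
       group by label, value
       having count(*) = 1
     )
group by label
;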
In my database I have a table with a column that indicates the code of each record (aside from the ID column). This field is unique and each time the user tries to insert a record into the table, the first unused code should be assigned to the record. Currently the code column contains the following values:
+------+
| code |
+------+
|    1 |
|    2 |
|    3 |
|    5 |
+------+
I want a query to return 4 as the result.
Note that this query will run very frequently in my system, so the query with the minimum execution time would be appreciated.
Is using a self-join acceptable? If so:
-- your test data:
WITH data AS (SELECT 1 AS code FROM DUAL
UNION SELECT 2 FROM DUAL
UNION SELECT 3 FROM DUAL
UNION SELECT 5 FROM DUAL)
-- request:
SELECT COALESCE(MIN(d1.code+1),1)
FROM data d1 LEFT JOIN data d2 ON d1.code+1 = d2.code
WHERE d2.code IS NULL;
This builds the list of data.code values that have no successor, and using MIN(... + 1) you get the first empty slot. I used COALESCE(...) in order to handle the specific case where there isn't any entry in the data table yet.
An alternate form using a sequence generator (CONNECT BY LEVEL) might lead to better performance, as it does not require the whole table to be traversed in order to evaluate the aggregate function MIN():
-- your test data:
WITH data AS (SELECT 1 AS code FROM DUAL
UNION SELECT 5 FROM DUAL
UNION SELECT 2 FROM DUAL
UNION SELECT 3 FROM DUAL)
-- request:
SELECT T.code FROM (SELECT d1.code
FROM (SELECT LEVEL code FROM DUAL CONNECT BY LEVEL < 9999) d1 LEFT JOIN data d2
ON d1.code = d2.code
WHERE d2.code IS NULL
ORDER BY d1.code ASC
) T WHERE ROWNUM < 2
The drawback is that you now have a hard-coded upper limit. It could be dynamically inferred from the data table though, so it is not really blocking. I will let you compare timings yourself.
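For example, here is a sketch of that dynamic bound: it drives CONNECT BY from MAX(code) + 1 instead of the hard-coded 9999 (upper_bound is just an illustrative alias):
-- your test data:
WITH data AS (SELECT 1 AS code FROM DUAL
      UNION SELECT 5 FROM DUAL
      UNION SELECT 2 FROM DUAL
      UNION SELECT 3 FROM DUAL)
-- request:
SELECT T.code FROM (SELECT d1.code
FROM (SELECT LEVEL code
      FROM (SELECT MAX(code) + 1 AS upper_bound FROM data)
      CONNECT BY LEVEL <= upper_bound) d1 LEFT JOIN data d2
ON d1.code = d2.code
WHERE d2.code IS NULL
ORDER BY d1.code ASC
) T WHERE ROWNUM < 2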
this field is unique and each time the user tries to insert a record into the table, the first unused code should be assigned to the record
Please note, however, that this will lead to a race condition if two concurrent sessions try to insert a row at the same time. Given your example, they will both try to insert a row with code = 4 -- and obviously they cannot both succeed, as your column is unique.
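One hedged way to cope with that race (a sketch; your_table and code are placeholders, and it assumes a unique constraint on code) is to compute the gap and insert in a single statement, then simply retry when a concurrent session wins and the unique constraint raises ORA-00001:
-- Sketch only: the loser of the race gets ORA-00001 from the unique constraint and retries
INSERT INTO your_table (code)
SELECT COALESCE(MIN(t1.code + 1), 1)
FROM your_table t1 LEFT JOIN your_table t2 ON t1.code + 1 = t2.code
WHERE t2.code IS NULL;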
I recently used the code below:
SELECT t1.id+1
FROM table t1
LEFT OUTER JOIN table t2 ON (t1.id + 1 = t2.id)
WHERE t2.id IS NULL
/* and rownum = 1 Need to use a sub select if you want this to work */
ORDER BY t1.id;
I run it every time that I need to insert a new row and use the minimum unused id.
I hope it works for your purposes.
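For completeness, here is a sketch of the sub-select wrapper that the comment in the snippet refers to (table is still a placeholder for your real table name):
SELECT id
FROM (SELECT t1.id + 1 AS id
      FROM table t1
      LEFT OUTER JOIN table t2 ON (t1.id + 1 = t2.id)
      WHERE t2.id IS NULL
      ORDER BY t1.id)
WHERE ROWNUM = 1;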
select min(unusedval) from (
select level unusedval from dual connect by level < 10
minus
select tno from t2
);
You can change the LEVEL condition depending on the maximum value in your table (here t2 and tno stand for your table and its code column).
This is the query I currently have:
select category, to_char(trans_date, 'YYYY') year, sum(trans_value)
from transaction
group by category, to_char(trans_date, 'YYYY')
order by 1, 2;
My imaginary results would look like:
Category | Year | sum
-------- | ---- | ----
A        | 2008 |  200
A        | 2009 |    0
B        | 2008 |  100
B        | 2009 |    5
...      | ...  |  ...
i.e. the sum of the transactions per year and per category.
There are cases where a category does not have any transactions in a given year; in those cases the second line of the results will not appear. How do I have to re-write the above query in order to include both 2008 and 2009 for every category?
With a partitioned outer join, you don't need a categories table.
I used the same transactions table as "dcp" used:
SQL> create table transactions
2 ( category varchar(1)
3 , trans_date date
4 , trans_value number(25,8)
5 );
Table created.
SQL> insert into transactions values ('A',to_date('2008-01-01','yyyy-mm-dd'),100.0);
1 row created.
SQL> insert into transactions values ('A',to_date('2008-02-01','yyyy-mm-dd'),100.0);
1 row created.
SQL> insert into transactions values ('B',to_date('2008-01-01','yyyy-mm-dd'),50.0);
1 row created.
SQL> insert into transactions values ('B',to_date('2008-02-01','yyyy-mm-dd'),50.0);
1 row created.
SQL> insert into transactions values ('B',to_date('2009-08-01','yyyy-mm-dd'),5.0);
1 row created.
For the partitioned outer join you only need a set of years to partition outer join against. In the query below I used 2 years (2008 and 2009), but you can easily adjust that set.
SQL> with the_years as
2 ( select 2007 + level year
3 , trunc(to_date(2007 + level,'yyyy'),'yy') start_of_year
4 , trunc(to_date(2007 + level + 1,'yyyy'),'yy') - interval '1' second end_of_year
5 from dual
6 connect by level <= 2
7 )
8 select t.category "Category"
9 , y.year "Year"
10 , nvl(sum(t.trans_value),0) "sum"
11 from the_years y
12 left outer join transactions t
13 partition by (t.category)
14 on (t.trans_date between y.start_of_year and y.end_of_year)
15 group by t.category
16 , y.year
17 order by t.category
18 , y.year
19 /
Category Year sum
-------- ---------- ----------
A 2008 200
A 2009 0
B 2008 100
B 2009 5
4 rows selected.
Also note that I used start_of_year and end_of_year, so if you want to filter on trans_date and you have an index on that column, it could be used. Another option is to simply use trunc(t.trans_date, 'yy') = y.start_of_year as the on-condition.
Hope this helps.
Regards,
Rob.
You ideally need a table of categories and a table of years:
select c.category, y.year, nvl(sum(t.trans_value),0)
from categories c
cross join years y
left outer join transaction t
on to_char(t.trans_date, 'YYYY') = y.year
and t.category = c.category
group by c.category, y.year
order by 1, 2;
Hopefully you do have a table of categories, but you may well not have a table of years, in which case you can "fake" one like this:
with years as
( select 2007+rownum year
from dual
connect by rownum < 10) -- returns 2008, 2009, ..., 2016
select c.category, y.year, nvl(sum(t.trans_value),0)
from categories c
cross join years y
left outer join transaction t
on to_char(t.trans_date, 'YYYY') = y.year
and t.category = c.category
group by c.category, y.year
order by 1, 2;
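If you do not have a categories table either, here is a sketch that derives the categories from the transaction table itself (it can, of course, only show categories that have at least one transaction in some year):
with categories as
( select distinct category from transaction ),
years as
( select 2007+rownum year
  from dual
  connect by rownum < 10) -- returns 2008, 2009, ..., 2016
select c.category, y.year, nvl(sum(t.trans_value),0)
from categories c
cross join years y
left outer join transaction t
on to_char(t.trans_date, 'YYYY') = y.year
and t.category = c.category
group by c.category, y.year
order by 1, 2;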
Here's a complete, working example:
CREATE TABLE transactions (CATEGORY VARCHAR(1), trans_date DATE, trans_value NUMBER(25,8));
CREATE TABLE YEAR (YEAR NUMBER(4));
CREATE TABLE categories (CATEGORY VARCHAR(1));
INSERT INTO categories VALUES ('A');
INSERT INTO categories VALUES ('B');
INSERT INTO transactions VALUES ('A',to_date('2008-01-01','YYYY-MM-DD'),100.0);
INSERT INTO transactions VALUES ('A',to_date('2008-02-01','YYYY-MM-DD'),100.0);
INSERT INTO transactions VALUES ('B',to_date('2008-01-01','YYYY-MM-DD'),50.0);
INSERT INTO transactions VALUES ('B',to_date('2008-02-01','YYYY-MM-DD'),50.0);
INSERT INTO transactions VALUES ('B',to_date('2009-08-01','YYYY-MM-DD'),5.0);
INSERT INTO YEAR VALUES (2008);
INSERT INTO YEAR VALUES (2009);
SELECT b.category
, b.year
, SUM(nvl(a.trans_value,0))
FROM (SELECT to_char(a.trans_date,'YYYY') YEAR
, CATEGORY
, SUM(NVL(trans_value,0)) trans_value
FROM transactions a
GROUP BY to_char(a.trans_date,'YYYY')
, a.category ) a
, (SELECT
DISTINCT a.category
, b.year
FROM categories a
, YEAR b ) b
WHERE b.year = to_char(a.year(+))
AND b.category = a.category(+)
GROUP BY
b.category
, b.year
ORDER BY 1
,2;
Output:
   CATEGORY   YEAR   SUM(NVL(A.TRANS_VALUE,0))
1  A          2008                         200
2  A          2009                           0
3  B          2008                         100
4  B          2009                           5