Data Migration - Verify Data loaded where Primary Key can change - oracle

I am currently trying to write SQL to verify the counts of the data that has been migrated from one application to another.
One of the main tables that is being migrated sometimes contains a primary key that already exists in the target application so it needs to be changed. This results in my counts not matching up.
I have a reference table for these changed primary keys but I'm not sure how to incorporate this reference table into my left join.
I really don't know how to include the condition where the key from Table A could be either the key on Table B or the new key stored in the reference table.
select count(*)
from table_b b
left join table_a a on
b.key = a.key
where a.key is null;
The reference table is really simple: two columns, old_number and new_number. It will only contain entries where the key in Table A needed to be changed before being loaded into Table B.
old_number, new_number
12345678, 13345678
23456781, 24456781
How can I include this scenario?
select count(*)
from table_b b
left join table_a a on
b.key = (a.key or new_number if it exists)
where a.key is null;
So, if the query can include the new_numbers in the reference table then the migration count should match the count in Table A.

This should work. The two counts come back as separate rows, so add them together to get the total:
select count(*) from table_b b, table_a a where b.key = a.key
UNION ALL
select count(*) from table_b b, reference_table re where b.key = re.new_number;
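An alternative that keeps the left-join shape from the question is to left join the reference table first and fall back to the original key with COALESCE. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for Oracle, the table name reference_table is taken from the answer above, and the sample rows (including the unmatched key 99999999) are made up.

```python
import sqlite3

def count_unmatched(conn):
    """Count table_b rows with no source row in table_a, honoring remapped keys."""
    sql = """
        SELECT COUNT(*)
        FROM table_b b
        LEFT JOIN reference_table r ON r.new_number = b.key
        -- If the key was remapped, compare against the old key; otherwise use b.key.
        LEFT JOIN table_a a ON a.key = COALESCE(r.old_number, b.key)
        WHERE a.key IS NULL
    """
    return conn.execute(sql).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (key INTEGER PRIMARY KEY);
    CREATE TABLE table_b (key INTEGER PRIMARY KEY);
    CREATE TABLE reference_table (old_number INTEGER, new_number INTEGER);
    -- Hypothetical sample data: two keys were remapped, one row in B has no source.
    INSERT INTO table_a VALUES (11111111), (12345678), (23456781);
    INSERT INTO table_b VALUES (11111111), (13345678), (24456781), (99999999);
    INSERT INTO reference_table VALUES (12345678, 13345678), (23456781, 24456781);
""")
unmatched = count_unmatched(conn)
print(unmatched)
```

The same SELECT should carry over to Oracle largely unchanged, since LEFT JOIN and COALESCE are standard SQL, but it has not been run against Oracle here.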

Related

Oracle SQL: Single Index with two Columns vs index on one Column

I'm using Oracle12c
I have a table with a primary key and separate column.
create table tableB(
ID number(10)
,data number(10)
);
ID is my primary key.
I have to join 3 tables in my query, and the performance problem is the join on B.data, which has no index.
B.data contains null values and many duplicate values.
select A.examp from tabled D
join tableb B on D.data = B.data
join tablec C on B.ID = C.ID
join tablea A on C.val = A.val
where D.ID = :value;
So my question is: what is the difference between an index that contains only one column, like the data column
create index ind_tableb on tableb (data);
and an index that contains multiple columns like
create index ind_tableb on tableb (data, id);
Can I get an improvement by including the id in the index along with data, given the way I join the columns?
Thanks for any advice and help.
For this particular query, you want the two column index version:
create index ind_tableb on tableb (data, id);
The above index, if used, would let Oracle rapidly look up tableb.data values that match a tabled.data value. If a match is found, the same index also contains the tableb.ID value needed for the next join to tablec. With the single-column index on tableb.data alone, Oracle would have to go back to the tableb table to find the ID values. This could hurt performance and might even cause the index not to be used.
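To see the "index already contains the ID" effect in isolation, here is a minimal sketch that uses Python's sqlite3 module rather than Oracle, so the plan text is SQLite's and not what Oracle's EXPLAIN PLAN would print. The helper plan_for is hypothetical, and id is deliberately left as an ordinary column (no primary key) so that SQLite does not store it in every index implicitly.

```python
import sqlite3

def plan_for(index_columns):
    """Return SQLite's query plan text for a lookup on data that also needs id."""
    conn = sqlite3.connect(":memory:")
    # id is a plain column here so only the explicit index can supply it.
    conn.execute("CREATE TABLE tableb (id NUMERIC, data NUMERIC)")
    conn.execute(f"CREATE INDEX ind_tableb ON tableb ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM tableb WHERE data = ?", (1,)
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable step description.
    return " | ".join(row[-1] for row in rows)

single_plan = plan_for("data")         # index on (data) only
composite_plan = plan_for("data, id")  # index on (data, id)
print(single_plan)
print(composite_plan)
```

With the two-column index the plan should report a covering index, meaning the table itself is never visited; with the single-column index the engine has to go back to the table for id.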

Using "contains" as a way to join tables?

The primary key in table 1 is reused in table 2, but it is modified like so:
Primary key Column in table 1: 123abc
Column in table 2: 123abc_1
I.e. the key is used but then _1 is added to create a unique value in the column of Table 2.
Is there any way that I can join the two tables? The data in the 2 columns is not identical, but it is very similar. Could I do something like:
SELECT *
FROM TABLE1 INNER JOIN
TABLE2
ON TABLE1.COLUMN1 contains TABLE2.COLUMN2;
I.e. checking that the value in Table 1 is within the value in Table 2?
You can check only the first part of column2; for example
SELECT *
FROM TABLE1 INNER JOIN TABLE2
ON INSTR(COLUMN2, COLUMN1) = 1
or
ON COLUMN2 LIKE COLUMN1 || '%'
However, storing a foreign key this way can be really dangerous, not to mention the performance cost on large databases.
You would be better off adding a separate column to Table2 that stores the key of Table 1, ideally with a foreign key constraint.
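As a quick check that the two join conditions above pick the same rows, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle (both support INSTR, LIKE and the || concatenation operator). The sample rows other than 123abc and 123abc_1 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT PRIMARY KEY);
    CREATE TABLE table2 (column2 TEXT);
    -- Hypothetical sample data following the 123abc / 123abc_1 pattern.
    INSERT INTO table1 VALUES ('123abc'), ('999xyz');
    INSERT INTO table2 VALUES ('123abc_1'), ('123abc_2'), ('555def_1');
""")

# Prefix match with LIKE: column2 must start with column1.
like_rows = conn.execute("""
    SELECT column1, column2
    FROM table1 INNER JOIN table2
    ON column2 LIKE column1 || '%'
    ORDER BY column2
""").fetchall()

# Same idea with INSTR: column1 must be found at position 1 of column2.
instr_rows = conn.execute("""
    SELECT column1, column2
    FROM table1 INNER JOIN table2
    ON INSTR(column2, column1) = 1
    ORDER BY column2
""").fetchall()

print(like_rows)
print(instr_rows)
```

One caveat with the LIKE form: if the key itself can contain % or _, those characters act as wildcards, so the INSTR form is the safer of the two.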

Multiple Table references in HIVE update statement

I have the UPDATE statement below, which works perfectly in an RDBMS but not in Hive. Currently, Hive does not let an UPDATE reference multiple tables (as in the query below, where TableA and TableB are both referenced).
UPDATE A
FROM TableA A, TableB B
SET DepartmentId = B.DepartmentId
WHERE A.CustomerId = B.CustomerId ;
How can I achieve the same result in Hive? Is there a possible alternative?
Since Hive doesn't support row-level inserts and updates, there are a few workarounds. The answer mentioned above is one such.
Another way would be to run the same join and INSERT OVERWRITE the result back into the same table.
INSERT OVERWRITE TABLE A
SELECT A.c1,A.c2, ... , B.DepartmentId , ..
FROM TableA A, TableB B
WHERE A.CustomerId = B.CustomerId ;
This will be like updating the same table.
Hive tables are immutable, so an update is not possible. You can always rewrite the entire table:
CREATE TABLE TableA_new
AS
SELECT A.c1,A.c2, ... , B.DepartmentId , ..
FROM TableA A, TableB B
WHERE A.CustomerId = B.CustomerId ;
This answer is only partially true: UPDATE is available from Hive 0.14 onward, on tables that support ACID transactions ;) GL

Oracle query to get the results of a particular table using the result obtained from another table

I am new to Oracle, so kindly bear with me if the question sounds really naive.
So, I have two tables TableA and TableB which have say just two columns id, name for simplicity.
I now want to get the id value for a particular value of name in TableA. If this were the only requirement, this query would suffice -
SELECT id from TableA WHERE name = 'some_name';
Now, what I want to do is take this id and delete all the rows in TableB that match this id-
DELETE FROM TableB WHERE id = <id obtained from the above query>;
What is the composite query in oracle that would perform this function?
Thanks!
If you know that only a single id value is going to be returned for a particular name value, you'd just do
DELETE FROM tableB b
WHERE b.id = (SELECT a.id
FROM tableA a
WHERE a.name = 'some_name')
Note that the aliases are optional. However, adding aliases generally makes things clearer so no one has to guess which id or which name you're referring to at any point.
If there might be multiple id values in tableA for a given name, you'd just use an IN rather than an =
DELETE FROM tableB b
WHERE b.id IN (SELECT a.id
FROM tableA a
WHERE a.name = 'some_name')
This would also work if you knew that the query against tableA was only going to return one row. I'd prefer the equality query if you're sure that only one row would be returned, though. I'd generally rather get an error if my expectations were violated rather than potentially having unexpected rows get deleted.
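Here is a minimal sketch of the IN variant, run through Python's sqlite3 module as a stand-in for Oracle, with invented sample rows. Two things differ from the Oracle statements above: SQLite's DELETE is written without the table alias, and the equality variant is not demonstrated because SQLite does not raise an error when a scalar subquery returns several rows, whereas Oracle fails with ORA-01427 (single-row subquery returns more than one row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, name TEXT);
    CREATE TABLE tableB (id INTEGER, name TEXT);
    -- Hypothetical data: 'some_name' maps to two ids in tableA.
    INSERT INTO tableA VALUES (1, 'some_name'), (2, 'some_name'), (3, 'other_name');
    INSERT INTO tableB VALUES (1, 'x'), (2, 'y'), (3, 'z');
""")

# Delete every tableB row whose id is returned by the subquery on tableA.
cur = conn.execute(
    "DELETE FROM tableB WHERE id IN (SELECT a.id FROM tableA a WHERE a.name = ?)",
    ("some_name",),
)
deleted = cur.rowcount
remaining = conn.execute("SELECT id FROM tableB ORDER BY id").fetchall()
print(deleted, remaining)
```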

oracle find difference between 2 tables

I have 2 tables that have the same structure. One is a temp one and the other is a prod one. The entire data set gets loaded each time, and sometimes this dataset will be missing records that were in prior datasets because they have been deleted. I load the dataset into the temp table first, and if any records were deleted I want to delete them from the prod table also.
So how can I find the records that exist in prod but not in temp? I tried outer join but it doesn't seem to be working. It's returning all the records from the table in the left or right depending on doing left or right outer join.
I then also want to delete those records in the prod table.
One way would be to use the MINUS operator
SELECT * FROM table1
MINUS
SELECT * FROM table2
will show all the rows in table1 that do not have an exact match in table2 (you can obviously specify a smaller column list if you are only interested in determining whether a particular key exists in both tables).
Another would be to use a NOT EXISTS
SELECT *
FROM table1 t1
WHERE NOT EXISTS( SELECT 1
FROM table2 t2
WHERE t1.some_key = t2.some_key )
How about something like:
SELECT * FROM ProdTable WHERE ID NOT IN
(select ID from TempTable);
It'd work the same as a DELETE statement as well:
DELETE FROM ProdTable WHERE ID NOT IN
(select ID from TempTable);
MINUS can work here
The following statement combines results with the MINUS operator, which returns only rows returned by the first query but not by the second:
SELECT * FROM prod
MINUS
SELECT * FROM temp;
MINUS will only work if the table structure is the same.
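Putting the find-then-delete steps together, here is a small sketch using NOT EXISTS, run through Python's sqlite3 module as a stand-in for Oracle (SQLite has no MINUS keyword and spells it EXCEPT, so NOT EXISTS is used as the portable form). The table names ProdTable and TempTable follow the earlier answer, and the rows and the val column are made up. NOT EXISTS also sidesteps a known pitfall of NOT IN, which matches nothing if the subquery returns a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProdTable (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE TempTable (id INTEGER PRIMARY KEY, val TEXT);
    -- Hypothetical data: id 2 is in prod but missing from the latest load.
    INSERT INTO ProdTable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO TempTable VALUES (1, 'a'), (3, 'c');
""")

# Step 1: find rows that exist in prod but not in temp.
missing = conn.execute("""
    SELECT p.id FROM ProdTable p
    WHERE NOT EXISTS (SELECT 1 FROM TempTable t WHERE t.id = p.id)
""").fetchall()

# Step 2: delete those same rows from prod.
conn.execute("""
    DELETE FROM ProdTable
    WHERE NOT EXISTS (SELECT 1 FROM TempTable t WHERE t.id = ProdTable.id)
""")
remaining = conn.execute("SELECT id FROM ProdTable ORDER BY id").fetchall()
print(missing, remaining)
```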
