Synchronizing two tables using a stored procedure, only updating and adding rows where values do not match - performance

The scenario
I've got two tables with identical structure.
TABLE [INFORMATION], [SYNC_INFORMATION]
[ITEM] [nvarchar](255) NOT NULL
[DESCRIPTION] [nvarchar](255) NULL
[EXTRA] [nvarchar](255) NULL
[UNIT] [nvarchar](2) NULL
[COST] [float] NULL
[STOCK] [nvarchar](1) NULL
[CURRENCY] [nvarchar](255) NULL
[LASTUPDATE] [nvarchar](50) NULL
[IN] [nvarchar](4) NULL
[CLIENT] [nvarchar](255) NULL
I'm trying to create a synchronize procedure that will be triggered by a scheduled event at a given time every day.
CREATE PROCEDURE [dbo].[usp_SynchronizeInformation]
AS
BEGIN
SET NOCOUNT ON;
--Update all rows
UPDATE TARGET_TABLE
SET TARGET_TABLE.[DESCRIPTION] = SOURCE_TABLE.[DESCRIPTION],
TARGET_TABLE.[EXTRA] = SOURCE_TABLE.[EXTRA],
TARGET_TABLE.[UNIT] = SOURCE_TABLE.[UNIT],
TARGET_TABLE.[COST] = SOURCE_TABLE.[COST],
TARGET_TABLE.[STOCK] = SOURCE_TABLE.[STOCK],
TARGET_TABLE.[CURRENCY] = SOURCE_TABLE.[CURRENCY],
TARGET_TABLE.[LASTUPDATE] = SOURCE_TABLE.[LASTUPDATE],
TARGET_TABLE.[IN] = SOURCE_TABLE.[IN],
TARGET_TABLE.[CLIENT] = SOURCE_TABLE.[CLIENT]
FROM SYNC_INFORMATION TARGET_TABLE
JOIN LSERVER.dbo.INFORMATION SOURCE_TABLE ON TARGET_TABLE.ITEMNO = SOURCE_TABLE.ITEMNO
WHERE TARGET_TABLE.ITEMNO = SOURCE_TABLE.ITEMNO
--Add new rows
INSERT INTO SYNC_INFORMATION (ITEMNO, DESCRIPTION, EXTRA, UNIT, STANDARDCOST, STOCKTYPE, CURRENCY_ID, LASTSTANDARDUPDATE, IN_ID, CLIENTCODE)
SELECT
src.ITEM,
src.DESCRIPTION,
src.EXTRA,
src.UNIT,
src.COST,
src.STOCKTYPE,
src.CURRENCY_ID,
src.LASTUPDATE,
src.IN,
src.CLIENT
FROM LSERVER.dbo.INFORMATION src
LEFT JOIN SYNC_INFORMATION targ ON src.ITEMNO = targ.ITEMNO
WHERE
targ.ITEMNO IS NULL
END
Currently, this procedure (including some others that are also executed at the same time) takes about 15 seconds to execute.
I'm planning on adding a "Synchronize" button in my work interface so that users can manually synchronize when, for instance, a new item is added and needs to be used the same day.
But in order for me to do that, I need to trim those 15 seconds as much as possible.
Instead of updating every single row, like in my procedure, is it possible to only update rows that have values that does not match?
This would greatly increase the execution speed, since it doesn't have to update all the 4000 rows when maybe only 20 actually needs it.
Can this be done in a better way, or optimized?
Does it need improvements, if yes, where?
How would you solve this?
Would also appreciate some time differences between the solutions so I can compare them.
UPDATE
Using marc_s's CHECKSUM is really brilliant. The problem is that in some instances the information creates the same checksum. Here's an example, due to the classified content, I can only show you 2 columns, but I can say that all columns have identical information except these 2. To clarify: this screenshot is of all the rows that had duplicate CHECKSUMs. These are also the only rows with a hyphen in the ITEM column, I've looked.
The query was simply
SELECT *, CHECKSUM(*) FROM SYNC_INFORMATION

If you can change the table structure ever so slightly - you could add a computed CHECKSUM column to your two tables, and in the case the ITEM is identical, you could then check that checksum column to see if there are any differences at all in the columns of the table.
If you can do this - try something like this here:
ALTER TABLE dbo.[INFORMATION]
ADD CheckSumColumn AS CHECKSUM([DESCRIPTION], [EXTRA], [UNIT],
[COST], [STOCK], [CURRENCY],
[LASTUPDATE], [IN], [CLIENT]) PERSISTED
Of course: only include those columns that should be considered when making sure whether a source and a target row are identical ! (this depends on your needs and requirements)
This persists a new column to your table, which is calculated as the checksum over the columns specified in the list of arguments to the CHECKSUM function.
This value is persisted, i.e. it could be indexed, too! :-O
Now, you could simplify your UPDATE to
UPDATE TARGET_TABLE
SET ......
FROM SYNC_INFORMATION TARGET_TABLE
JOIN LSERVER.dbo.INFORMATION SOURCE_TABLE ON TARGET_TABLE.ITEMNO = SOURCE_TABLE.ITEMNO
WHERE
TARGET_TABLE.ITEMNO = SOURCE_TABLE.ITEMNO
AND TARGET_TABLE.CheckSumColumn <> SOURCE_TABLE.CheckSumColumn
Read more about the CHECKSUM T-SQL function on MSDN!

Related

Complex procedure to adjust data continously

I solved this in SQL Server with a trigger. Now I face it on Oracle.
I have a big set of data that periodically increases with new items.
The item has these fundamental columns:
ID string identifier (not null)
DATETIME (not null)
optional (eventually null, always null for type 1) DATETIME_EMIS emission datetame equal to the DATETIME of the corresponding emission item
type (0 or 1)
value (only if type 1)
It is basically a logbook.
For example: An item with ID='FIREALARM' and datetime='2023-02-12 12:02' has closing like this:
ID='FIREALARM' in datetime='2023-02-12 15:11', emission datetime='2023-02-12 12:02' (equal to the emission item).
What I need is to obtain a final item in the destination table like this:
ID='FIREALARM' in DATETIME_BEGIN ='2023-02-12 12:02', DATETIME_END ='2023-02-12 15:11'
Not all the items have the closing datetime (the ones of Type=1 instead 0), in this case the next item should be use to close the previous one (with the problem of finding it). For example:
Item1:
ID='DEVICESTATUS', datetime='2023-02-12 22:11', Value='Broken' ;
Item2:
ID='DEVICESTATUS', datetime='2023-02-12 22:14', Value='Running'
Should result in
ID='DEVICESTATUS', DATETIME_BEGIN ='2023-02-12 22:11',DATETIME_END ='2023-02-12 22:14', Value='Broken'
The final data should be extracted by a select query as faster as possible.
The process of the elaboration should be independent from the order of inserting.
In SQL Server, I created a trigger with several operations which involve a temporary table, some queries on the inserted set and the entire destination table, so a complex procedure that is not worth to be shown to understand the problem.
Now I discovered that Oracle has some limitations and is not easy to port the trigger on it. For example is not easy to use a temporary table in the same way, and the operation are for each row.
I am asking what could be a good strategy in Oracle to elaborate the data in the final form considering that the set increase continuously and the open and the closure items must be reduce to a single item. I am not asking for a solution of the problem, I am trying to understand what could be the instrument in Oracle useful to achieve a complex elaboration like this. Thanks.
From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern matching:
SELECT *
FROM destination
MATCH_RECOGNIZE(
PARTITION BY id
ORDER BY datetime
MEASURES
FIRST(datetime) AS datetime_begin,
LAST(datetime) AS datetime_end,
FIRST(value) AS value
PATTERN ( ^ any_row+ $ )
DEFINE
any_row AS 1 = 1
)
Which, for the sample data:
CREATE TABLE destination (id, datetime, value) AS
SELECT 'DEVICESTATUS', DATE '2023-02-12' + INTERVAL '22:11' HOUR TO MINUTE, 'Broken' FROM DUAL UNION ALL
SELECT 'DEVICESTATUS', DATE '2023-02-12' + INTERVAL '22:14' HOUR TO MINUTE, 'Running' FROM DUAL;
Outputs:
ID
DATETIME_BEGIN
DATETIME_END
VALUE
DEVICESTATUS
2023-02-12 22:11:00
2023-02-12 22:14:00
Broken
fiddle

Update query is working on all rows even after selecting the particular values in oracle

I have written a query where I want to just update some of the LINK_ID. But it is updating all the rows in that table.
Here is my query
UPDATE APP_FIBERINV.TBL_FIBER_INV_CMPAPPROVED_INFO
SET NE_LENGTH =
(select MAINT_ZONE_NE_SPAN_LENGTH from APP_FIBERINV.TBL_FIBER_INV_JOBS WHERE LINK_ID IN ('MORV_1020','ANND_1017','BBSR_1047','DLHI_5417','MYSR_0104'));
I still doubt that the update statement you have posted updates all rows in the table. It must throw an error
ORA-01427: single-row subquery returns more than one row
instead, because your subquery returns five rows where it must be one, as you must find one value for each row you want to update.
This means your subquery is wrong. It selects five rows, where it must select one. You don't want to find the five values for 'MORV_1020', 'ANND_1017', but the one value for the link ID of the row you are updating.
You also want to update certain rows (those with the five link IDs), so you must add a WHERE clause at the end of your update statement.
UPDATE app_fiberinv.tbl_fiber_inv_cmpapproved_info i
SET ne_length =
(
SELECT j.maint_zone_ne_span_length
FROM app_fiberinv.tbl_fiber_inv_jobs j
WHERE j.link_id = i.span_link_id
)
WHERE span_link_id IN ('MORV_1020', 'ANND_1017', 'BBSR_1047', 'DLHI_5417', 'MYSR_0104');
Assuming both tables share the LINK_ID as primary and foreign key, you could just use a MERGE:
MERGE INTO APP_FIBERINV.TBL_FIBER_INV_CMPAPPROVED_INFO APPR_NFO
USING (
SELECT LINK_ID, MAINT_ZONE_NE_SPAN_LENGTH
FROM APP_FIBERINV.TBL_FIBER_INV_JOBS
WHERE LINK_ID IN ('MORV_1020','ANND_1017','BBSR_1047','DLHI_5417','MYSR_0104')
) INV_JOBS
ON ( APPR_NFO.SPAN_LINK_ID = INV_JOBS.LINK_ID)
WHEN MATCHED THEN UPDATE SET APPR_NFO.NE_LENGTH = INV_JOBS.MAINT_ZONE_NE_SPAN_LENGTH;

Conditional DELETE/INSERT/UPDATE in MERGE

I came across two example regarding MERGE with conditional DML
First example,
MERGE INTO bonuses D
USING (SELECT employee_id, salary, department_id FROM employees
WHERE department_id = 80) S
ON (D.employee_id = S.employee_id)
WHEN MATCHED THEN UPDATE SET D.bonus = D.bonus + S.salary*.01
DELETE WHERE (S.salary > 8000)
WHEN NOT MATCHED THEN INSERT (D.employee_id, D.bonus)
VALUES (S.employee_id, S.salary*.01)
WHERE (S.salary <= 8000);
I tend to understand that in MERGE, only the target table (D here) is modified. When we put a DML in WHEN, it is to act on the target table D. So in this case what do the conditions have to do with S, as in the DELETE and UPDATE clause. When do the WHERE come into action ? After the matching ? On the source/target before ON ?
Another related example with one more question
MERGE INTO destination d
USING source s
ON (s.id = d.id)
WHEN MATCHED THEN
UPDATE SET d.description = 'Updated',
d.status = 10
DELETE WHERE s.status = 10;
and
MERGE INTO destination d
USING source s
ON (s.id = d.id)
WHEN MATCHED THEN
UPDATE SET d.description = 'Updated',
d.status = 10
DELETE WHERE d.status = 10;
I don't get the difference between 2 scenarios : source versus target table in the WHERE clause.
Thanks in advance.
There are two parts to the MERGE operation: WHAT action to take (update of some sort, including inserts and deletions) - this is always ONLY on the target table; and WHEN to take the action - what condition must be met to initiate the update. The condition must refer to something in the target table, but it also refers to the source table.
In your first example: the target table only has employee id's and bounses. You want to increase each bonus by 1% of base salary, for each employee - and to add a bounus (when there was no row for that employee) for employees who weren't assigned a bonus at all. So you can't only look at the target table, you must also look somewhere else, where salaries are stored. In this case, "WHEN MATCHED" makes sure you look at the same employee id in both tables. Then you increase bonus by 1% of base salary; base salary is read from the source table. Then you delete the bonus altogether (there will be no row for the employee id in the BONUS table) if the employee has a base salary greater than 80,000 - that must be a business decision reflected in the database. So you see how you need to refer to data in places other than the target table, even though the changes themselves only affect the target.
In your second example, the effect will be the same.
In the first example,
1. Employees belonging to department 80 are identified. These employees may or may not have a bonus record against their employeeID in bonuses table.
2. If a bonus already exists in bonuses, increment bonus in bonuses for this employee by 1 percent of their salary. After that, if employee's salary is more than 8000 he must not have bonus, so remove his bonus record from bonuses.
3. if no bonus already exists and if employees salary is not more than 8000 add a new bonus record.
The sequence in this case for understanding purposes would be ON, WHEN MATCHED, THEN UPDATE, WHERE, DELETE, WHEN NOT MATCHED, WHERE, INSERT
In the second example,
query 1: If source record exists in destination,
a. update destination description and status.
b. Then if the source status is 10, then delete the record with same id from destination.
query 2: If source record exists in destination,
a. update destination description and status.
b. Then delete that record from destination.
In query2 update is redundant unless there are any triggers updating other tables.
The sequence in this case for understanding purposes would be ON, WHEN MATCHED, THEN UPDATE, WHERE, DELETE
Hope this helps.

Update using select query with multiple Rows in Oracle

Can any one please help me to solve this issue
Table Name:RW_LN
LN_ID RE_LN_ID RE_PR_ID
LN001 RN001 RN002
LN002 RN002 RN003
LN003 RN003 RN001
LN004 RN001 RN002
MY Update Query is:
update table RW_LN set RE_LN_ID=(
select LN_ID
from RW_LN as n1,RW_LN as n2
where n1.RE_LN_ID = n2.RE_PR_ID)
MY Expected Result is:
LN_ID RE_LN_ID
LN001 LN003
LN002 LN004
LN003 LN002
LN004 LN003
This above query shows error as SUB QUERY RETURNS MULTIPLE ROWS.Can any one provide the solution for this, I am Beginner in Oracle 9i.So Stuck in the logic
you can try to solve this with a distinct
update table RW_LN set RE_LN_ID=(
select distinct LN_ID
from RW_LN as n1,RW_LN as n2
where n1.RE_LN_ID = n2.RE_PR_ID)
if that still returns multiple rows, it means you are missing a join somewhere along the way or potentially have a bad schema that needs to use primary keys.
If you want to take the "biggest" corresponding LN_ID, you could do
update RW_LN r1
set r1.RE_LN_ID = (select MAX(LN_ID)
FROM RW_LN r2
where r1.RE_LN_ID = r2.RE_PR_ID);
see SqlFiddle
But you should explain why you choose (as new RE_LN_ID) LN004 instead of LN001 for LN_ID LN002 (cause you could choose both)
Just guessing, but possibly this is what you want.
update
RW_LN n1
set
RE_LN_ID=(
select n2.LN_ID
from RW_LN n2
where n1.RE_LN_ID = n2.RE_PR_ID)
where exists (
select null
from RW_LN n2
where n1.RE_LN_ID = n2.RE_PR_ID and
n2.ln_id is not null)
At the moment there is no correlation between the rows you are updating and the value being returned in the subquery.
The query reads as follows:
For every row in RW_LN change the value of RE_LN_ID to be:
the value of LN_ID in a row in RW_LN for which:
the RE_PR_ID equals the original tables value of RE_LN_ID
IF there exists at least one row in RW_LN for which:
RE_PR_ID is the same as RE_LN_ID in the original table AND
LN_ID is not null

How to? Correct sql syntax for finding the next available identifier

I think I could use some help here from more experienced users...
I have an integer field name in a table, let's call it SO_ID in a table SO, and to each new row I need to calculate a new SO_ID based on the following rules
1) SO_ID consists of 6 letters where first 3 are an area code, and the last three is the sequenced number within this area.
309001
309002
309003
2) so the next new row will have a SO_ID of value
309004
3) if someone deletes the row with SO_ID value = 309002, then the next new row must recycle this value, so the next new row has got to have the SO_ID of value
309002
can anyone please provide me with either a SQL function or PL/SQL (perhaps a trigger straightaway?) function that would return the next available SO_ID I need to use ?
I reckon I could get use of keyword rownum in my sql, but the follwoing just doens't work properly
select max(so_id),max(rownum) from(
select (so_id),rownum,cast(substr(cast(so_id as varchar(6)),4,3) as int) from SO
where length(so_id)=6
and substr(cast(so_id as varchar(6)),1,3)='309'
and cast(substr(cast(so_id as varchar(6)),4,3) as int)=rownum
order by so_id
);
thank you for all your help!
This kind of logic is fraught with peril. What if two sessions calculate the same "next" value, or both try to reuse the same "deleted" value? Since your column is an integer, you'd probably be better off querying "between 309001 and 309999", but that begs the question of what happens when you hit the thousandth item in area 309?
Is it possible to make SO_ID a foreign key to another table as well as a unique key? You could pre-populate the parent table with all valid IDs (or use a function to generate them as needed), and then it would be a simple matter to select the lowest one where a child record doesn't exist.
well, we came up with this... sort of works.. concurrency is 'solved' via unique constraint
select min(lastnumber)
from
(
select so_id,so_id-LAG(so_id, 1, so_id) OVER (ORDER BY so_id) AS diff,LAG(so_id, 1, so_id) OVER (ORDER BY so_id)as lastnumber
from so_miso
where substr(cast(so_id as varchar(6)),1,3)='309'
and length(so_id)=6
order by so_id
)a
where diff>1;
Do you really need to compute & store this value at the time a row is inserted? You would normally be better off storing the area code and a date in a table and computing the SO_ID in a view, i.e.
SELECT area_code ||
LPAD( DENSE_RANK() OVER( PARTITION BY area_code
ORDER BY date_column ),
3,
'0' ) AS so_id,
<<other columns>>
FROM your_table
or having a process that runs periodically (nightly, for example) to assign the SO_ID using similar logic.
If your application is not pure sql, you could do this in application code (ie: Java code). This would be more straightforward.
If you are recycling numbers when rows are deleted, your base table must be consulted when generating the next number. "Legacy" pre-relational schemes that attempt to encode information in numbers are a pain to make airtight when numbers must be recycled after deletes, as you say yours must.
If you want to avoid having to scan your table looking for gaps, an after-delete routine must write the deleted number to a separate table in a "ReuseMe" column. The insert routine does this:
begins trans
selects next-number table for update
uses a reuseme number if available else uses the next number
clears the reuseme number if applicable or increments the next-number in the next-number table
commits trans
Ignoring the issues about concurrency, the following should give a decent start.
If 'traffic' on the table is low enough, go with locking the table in exclusive mode for the duration of the transaction.
create table blah (soc_id number(6));
insert into blah select 309000 + rownum from user_tables;
delete from blah where soc_id = 309003;
commit;
create or replace function get_next (i_soc in number) return number is
v_min number := i_soc* 1000;
v_max number := v_min + 999;
begin
lock table blah in exclusive mode;
select min(rn) into v_min
from
(select rownum rn from dual connect by level <= 999
minus
select to_number(substr(soc_id,4))
from blah
where soc_id between v_min and v_max);
return v_min;
end;

Resources