Approaches to make the MERGE operation in SQL work faster - performance

Could you recommend approaches that would make the MERGE operation in SQL work faster?
I believe the question is about knowledge and experience and should not be considered opinion based, since anything that makes the operation faster is appropriate for the question, and the faster the operation becomes, the better the answer.
In my particular case I have approximately 1.7 million records, which I fetch in a recurring job and use to update the existing records. In order to lock the real table ([LegalContractors]) as little as possible, I load all the records from C# code into a temporary table ([LegalContractorTemps]) and then run the MERGE.
Here is what I am trying:
DELETE FROM [dbo].[LegalContractorTemps] WHERE [Code] IS NULL;
DELETE FROM [dbo].[LegalContractorTemps]
WHERE [Id] IN (
    SELECT [Id]
    FROM [dbo].[LegalContractorTemps] [Temp]
    JOIN (
        SELECT [Code], [Status], MAX([Id]) AS [MaxId]
        FROM [dbo].[LegalContractorTemps]
        GROUP BY [Code], [Status]
        HAVING COUNT([Id]) > 1
    ) [TempGroup]
        ON ([Temp].[Code] = [TempGroup].[Code] AND [Temp].[Status] = [TempGroup].[Status] AND [MaxId] != [Id])
);
CREATE UNIQUE INDEX [CodeStatus]
ON [dbo].[LegalContractorTemps] ([Code], [Status]);
SELECT GETDATE() AS [beginTime];
MERGE [dbo].[LegalContractors] AS TblTarget
USING [dbo].[LegalContractorTemps] AS TblSource
    ON (TblSource.[Code] = TblTarget.[Code] AND TblSource.[Status] = TblTarget.[Status])
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Code], [ShortName], [Name], [LegalAddress], [Status], [LastModified])
    VALUES (TblSource.[Code], TblSource.[ShortName], TblSource.[Name], TblSource.[LegalAddress], TblSource.[Status], GETDATE())
WHEN MATCHED AND
    (TblTarget.[ShortName] != TblSource.[ShortName] OR
     TblTarget.[Name] != TblSource.[Name] OR
     TblTarget.[LegalAddress] != TblSource.[LegalAddress]) THEN
    UPDATE SET
        TblTarget.[ShortName] = TblSource.[ShortName],
        TblTarget.[Name] = TblSource.[Name],
        TblTarget.[LegalAddress] = TblSource.[LegalAddress],
        TblTarget.[LastModified] = GETDATE()
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
SELECT GETDATE() AS [endTime];
DROP INDEX [CodeStatus] ON [dbo].[LegalContractorTemps];
Right now the code shown above takes approximately 2 minutes to run.
I found this answer, but I was not able to apply it to my case, because I need the WHEN NOT MATCHED clause and I will have to perform a full scan anyway (whether or not I use the MERGE).

I would consider doing a modified flush and fill, rather than doing a MERGE at all.
The method I've had the most success with uses partition switching. You build three identical tables: the main table that your users pull from, a staging table that you use for applying CRUD operations, and a holding table that you'll only use during the transition period after your updates.
This will require a little re-tooling to shift your LastModified logic right into the CRUD operations you're performing during your updates.
Then, after the staging table is ready for prime time, truncate yesterday's copy of the holding table. Next, switch the data from the main table to the now-empty holding table. Switch the data from staging to main. Probably wrap all of that in an explicit transaction.
Boom. Your table is up-to-date. And you have a back up copy of yesterday's data in the holding table, just in case.
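A minimal sketch of that switch sequence in T-SQL, assuming the three tables are dbo.LegalContractors (main) plus hypothetical dbo.LegalContractorsStaging and dbo.LegalContractorsHolding with identical structure:
BEGIN TRANSACTION;

    -- Empty yesterday's copy so it can receive the current main data.
    TRUNCATE TABLE dbo.LegalContractorsHolding;

    -- Move the current main data into the (now empty) holding table.
    ALTER TABLE dbo.LegalContractors SWITCH TO dbo.LegalContractorsHolding;

    -- Move the freshly prepared staging data into the (now empty) main table.
    ALTER TABLE dbo.LegalContractorsStaging SWITCH TO dbo.LegalContractors;

COMMIT TRANSACTION;
The switches are metadata-only operations, so the main table is effectively unavailable only for the duration of this short transaction.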
Tons of additional detail in these articles:
Comparison: Switching Tables vs. sp_rename
Why You Should Switch in Staging Tables Instead of Renaming Them

Related

SQL Azure - query with row_number() executes slow if columns with nvarchar of big size are included

I have the following query (generated by Entity Framework with standard paging; this is the inner query, and I added the TOP 438 part):
SELECT TOP 438 [Extent1].[Id] AS [Id],
[Extent1].[MemberType] AS [MemberType],
[Extent1].[FullName] AS [FullName],
[Extent1].[Image] AS [Image],
row_number() OVER (ORDER BY [Extent1].[FullName] ASC) AS [row_number]
FROM [dbo].[ShowMembers] AS [Extent1]
WHERE 3 = CAST( [Extent1].[MemberType] AS int)
The ShowMembers table has about 11K rows, but only 438 with MemberType == 3. The 'Image' column is of type nvarchar(2000) and holds the URL of the image on a CDN. If I include this column in the query (only in the SELECT part), the query chokes up somehow and takes anywhere between 2 and 30 seconds (it differs between runs). If I comment out that column, the query runs fast, as expected. If I include the 'Image' column but comment out the row_number column, the query also runs fast, as expected.
Obviously, I've been too liberal with the size of the URL, so I started playing around with it. I found out that if I set the Image column to nvarchar(884), the query runs fast as expected. If I set it to 885, it's slow again.
This is not tied to one column, but to the total size of the columns in the SELECT statement. If I increase the size by just one, the performance difference is obvious.
I am not a DB expert, so any advice is welcomed.
PS In local SQL Server 2012 Express there are no performance issues.
PPS Running the query with OFFSET 0 ROWS FETCH NEXT 438 ROWS ONLY (without the row_number column, of course) is also slow.
ROW_NUMBER has to sort all the rows to give you things in the order you want. Adding a larger column into the result set means that all of it gets sorted, and thus the query is much slower and does more IO. You can see this, by the way, if you enable "set statistics io on" and "set statistics time on" in SSMS when debugging problems like this. They will give you some insight into the number of IOs and other operations happening at runtime in the query:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017
In terms of what you can do to make this query go faster, I encourage you to think about some things that may change the design of your database schema a bit.
First, consider whether you actually need the rows sorted in a specific order at all. If you don't, it is measurably cheaper to iterate over them without the row_number. If you just want to conceptually iterate over each entry once, you can order by something more static that is still monotonic, such as the identity column.
Second, if you do need things in sorted order, consider whether the order changes frequently or infrequently. If it changes infrequently, it may be possible to compute and persist a column in each row that holds the relative order you want (and update it each time you modify the table). In that model you can index the new column and request things in that order in the top-level ORDER BY of the query, with no row_number needed.
If you do need the order computed dynamically, as you are doing, and you need an exact order all the time, your final option is to move the URL to a second table and join with it after the row_number. This avoids the sort being "wide" in the computation of row_number.
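As a rough illustration of that last option, assuming the URL is moved to a hypothetical dbo.ShowMemberImages table keyed by the same Id, the sort stays narrow and the wide column is joined in afterwards:
SELECT TOP 438
       Ranked.[Id],
       Ranked.[MemberType],
       Ranked.[FullName],
       Img.[Image],          -- wide nvarchar joined in after the sort
       Ranked.[row_number]
FROM (
    SELECT [Extent1].[Id],
           [Extent1].[MemberType],
           [Extent1].[FullName],
           ROW_NUMBER() OVER (ORDER BY [Extent1].[FullName] ASC) AS [row_number]
    FROM [dbo].[ShowMembers] AS [Extent1]
    WHERE 3 = CAST([Extent1].[MemberType] AS int)
) AS Ranked
JOIN [dbo].[ShowMemberImages] AS Img   -- hypothetical table holding only (Id, Image)
    ON Img.[Id] = Ranked.[Id]
ORDER BY Ranked.[row_number];
The sort now runs over the narrow columns only; the long URLs are fetched just for the 438 rows that survive the TOP.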
Best of luck to you

Best SQL DB design for temp storage of millions of records

I have a database table that collects records at a rate of about 4 records per second per device. This table gets pretty big pretty fast. When a device completes its task, another process loops through all the records, performs some operations, combines them into 5-minute chunks, compresses them and stores them for later use. Then it deletes all the records in that table for that device.
Right now there are nearly 1 million records for several devices. I can loop through them just fine to perform the processing, it appears, but when I try to delete them I time out. Is there a way to delete these records more quickly? Perhaps by turning off object tracking temporarily? Using some lock hint? Would the design be better if I simply created a separate table for each device when it begins its task and then dropped it once processing of the data is complete? The timeout is set to 10 minutes; I would really like that process to complete within the 10-minute period if possible.
CREATE TABLE [dbo].[case_waveform_data] (
[case_id] INT NOT NULL,
[channel_index] INT NOT NULL,
[seconds_between_points] REAL NOT NULL,
[last_time_stamp] DATETIME NOT NULL,
[value_array] VARBINARY (8000) NULL,
[first_time_stamp] DATETIME NULL
);
CREATE CLUSTERED INDEX [ClusteredIndex-caseis-channelindex] ON [dbo].[case_waveform_data]
(
[case_id] ASC,
[channel_index] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [NonClusteredIndex-all-fields] ON [dbo].[case_waveform_data]
(
[case_id] ASC,
[channel_index] ASC,
[last_time_stamp] ASC
)
INCLUDE ( [seconds_between_points],
[value_array]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
SQL Server 2008+ standard is the DB platform
UPDATE 3/31/2014:
I have started going down a path that seems to be problematic. Is this really all that bad?
I am creating a stored proc that takes a table-valued parameter containing the data I want to append, and a varchar parameter that contains a unique table name for the device. The stored proc checks for the existence of the table and, if it does not exist, creates it with a specific structure; then it inserts the data from the TVP. The problem I see is that I have to use dynamic SQL in the SP, as there seems to be no way to pass a table name as a variable to either a CREATE or an INSERT. Plus, every article I read on how to do this says not to...
Unfortunately, with a single table receiving all the inserts at a frequency of 4/sec/device, just doing a count on the table for a specific case_id takes 17 minutes, even with a clustered index on case_id and channel_index. So trying to delete the rows takes around 25-30 minutes. It also causes locking, so the inserts start taking longer and longer and the service gets way behind. This happens even when no deleting is going on.
The described stored proc is designed to reduce the inserts from 4/sec/device to 1/sec/device, as well as making it possible to just drop the table when done rather than deleting each record individually. Thoughts?
UPDATE 2 3/31/2014
I am not using cursors or any looping in the way you are thinking. Here is the code I use to loop through the records. This runs at an acceptable speed however:
using (SqlConnection liveconn = new SqlConnection(LiveORDataManager.ConnectionString))
{
    using (SqlCommand command = liveconn.CreateCommand())
    {
        command.CommandText = channelQueryString;
        command.Parameters.AddWithValue("channelIndex", channel);
        command.CommandTimeout = 600;
        liveconn.Open();
        SqlDataReader reader = command.ExecuteReader();
        // Call Read before accessing data.
        while (reader.Read())
        {
            var item = new
            {
                //case_id = reader.GetInt32(0),
                channel_index = reader.GetInt32(0),
                last_time_stamp = reader.GetDateTime(1),
                seconds_between_points = reader.GetFloat(2),
                value_array = (byte[])reader.GetSqlBinary(3)
            };
            // Perform processing on item
        }
    }
}
The SQL I use to delete is trivial:
DELETE FROM case_waveform_data WHERE case_id = @CaseId
This line takes 25+ minutes to delete 1 million rows
Sample data (value_array is truncated):
case_id channel_index seconds_between_points last_time_stamp value_array first_time_stamp
7823 0 0.002 2014-03-31 15:00:40.660 0x1F8B0800000000000400636060 NULL
7823 0 0.002 2014-03-31 15:00:41.673 0x1F8B08000000000004006360646060F80F04201A04F8418C3F4082DBFBA2F29E5 NULL
7823 0 0.002 2014-03-31 15:00:42.690 0x1F8B08000000000004006360646060F80F04201A04F8418C3F4082DBFB NULL
When you delete a large amount of data from a table, SQL Server marks the rows for deletion and, as a background job, actually removes them from the pages when it gets some idle time. Also, unlike TRUNCATE, DELETE is fully logged.
If you had Enterprise Edition, partitioning would be a possible approach, as other developers have suggested, but you are on Standard Edition.
Option 1: "Longer, Tedious, Subjective for 100% Performance"
Let's say you keep the single-table approach. You can add a new column, IsProcessed, to indicate which records have already been processed. Newly inserted data will have the default value 0, so other processes consuming this data will now have to filter their queries on this column as well. After processing, you will need an additional update on the table to mark those rows as IsProcessed = 1. Now you can create a SQL Server Agent job that deletes the TOP N rows where IsProcessed = 1 and schedule it as frequently as you can in an idle time slot (a sketch of the job body follows at the end of this option). "TOP N" because you have to find out by trial and error what the best number is for your environment; it may be 100, 1,000 or 10,000. In my experience a smaller number works best; increase the frequency of job execution instead. Let's say "DELETE TOP 1000 FROM Table" takes 2 minutes and you have a 4-hour clean window overnight when this table is not being used. You can schedule the job to run every 5 minutes (the extra 3 minutes is just buffer), so with 12 executions per hour and 1,000 rows per execution, you will delete 48K rows from the table in those 4 hours. Over the weekend you have a larger window in which to catch up with the remaining rows.
You can see that this approach involves a lot of back and forth and many small details, and it is still not certain that it will keep up with your needs in the future: if the input volume of data suddenly doubles, all your calculations fall apart. Another downside is that the consumer queries of the data now have to rely on the IsProcessed column value. In your specific case the consumer always reads all the data for a device, so indexing the table doesn't help you; it only hurts insert performance.
I have personally used this solution, and it lasted for two years in one of our client environments.
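A minimal sketch of the job body for that batched delete, using the table from the question; IsProcessed is the proposed new column and 1000 is just a starting batch size to tune:
DECLARE @BatchSize INT = 1000;

WHILE 1 = 1
BEGIN
    -- Delete one small batch of already-processed rows; small batches keep locks and log usage short.
    DELETE TOP (@BatchSize)
    FROM dbo.case_waveform_data
    WHERE IsProcessed = 1;

    -- Stop when nothing is left to delete (or drop the loop and let the Agent schedule handle repetition).
    IF @@ROWCOUNT = 0 BREAK;
END;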
Option 2: "Quick, Efficient, Makes Sense to Me, May Work for You"
Create one table per device and, as you mentioned, use a stored procedure to create the table on the fly if it does not exist. This is my most recent experience, where we have a metadata-driven ETL and all the ETL target objects and APIs are created at run time based on user configuration. Yes, it is dynamic SQL, but used wisely, and once tested for performance, it is not bad. The downside of this approach is debugging during the initial phase if something is not working, but in your case you know the table structure and it is fixed; you are not dealing with daily changes to the table structure. That is why I think this is more suitable for your situation. Another thing: you will now have to make sure tempdb is configured properly, because using TVPs and temp tables increases tempdb usage drastically, so the initial and increment sizes assigned to tempdb and the disk on which tempdb is located are the two main things to look at. As I said for Option 1, since the consumer processes always use ALL the data, I do not think you need any extra indexing in place; in fact, I would test the performance without any index as well. It is like processing all staging data.
Look at the sample code for this approach below. If you feel positive about it, or have doubts about the indexing or any other aspect of this approach, let us know.
Prepare the Schema objects
IF OBJECT_ID('pr_DeviceSpecificInsert','P') IS NOT NULL
    DROP PROCEDURE pr_DeviceSpecificInsert
GO
IF EXISTS (
        SELECT TOP 1 *
        FROM sys.table_types
        WHERE name = N'udt_DeviceSpecificData'
        )
    DROP TYPE dbo.udt_DeviceSpecificData
GO
CREATE TYPE dbo.udt_DeviceSpecificData
AS TABLE
(
    testDeviceData sysname NULL
)
GO
CREATE PROCEDURE pr_DeviceSpecificInsert
(
    @DeviceData dbo.udt_DeviceSpecificData READONLY
    ,@DeviceName NVARCHAR(200)
)
AS
BEGIN
    SET NOCOUNT ON

    BEGIN TRY
        BEGIN TRAN

        DECLARE @SQL NVARCHAR(MAX)=N''
                ,@ParaDef NVARCHAR(1000)=N''
                ,@TableName NVARCHAR(200)=ISNULL(@DeviceName,N'')

        --get the UDT data into temp table
        --because we can not use UDT/Table Variable in dynamic SQL
        SELECT * INTO #Temp_DeviceData FROM @DeviceData

        --Drop and Recreate the Table for Device.
        BEGIN
            SET @SQL ='
            if object_id('''+@TableName+''',''u'') IS NOT NULL
                drop table dbo.'+@TableName+'
            CREATE TABLE dbo.'+@TableName+'
            (
                RowID INT IDENTITY NOT NULL
                ,testDeviceData sysname NULL
            )
            '
            PRINT @SQL
            EXECUTE sp_executesql @SQL
        END

        --Insert the UDT data in to actual table
        SET @SQL ='
        Insert INTO '+@TableName+N' (testDeviceData)
        Select testDeviceData From #Temp_DeviceData
        '
        PRINT @SQL
        EXECUTE sp_executesql @SQL

        COMMIT TRAN
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN
        SELECT ERROR_MESSAGE()
    END CATCH

    SET NOCOUNT OFF
END
Execute the sample code
DECLARE @DeviceData dbo.udt_DeviceSpecificData

INSERT @DeviceData (testDeviceData)
SELECT 'abc'
UNION ALL SELECT 'xyz'

EXECUTE dbo.pr_DeviceSpecificInsert
    @DeviceData = @DeviceData, -- udt_DeviceSpecificData
    @DeviceName = N'tbl2' -- nvarchar(200)

Performance Issue with Oracle Merge Statements with more than 2 Million records

I am executing the MERGE statement below for an insert/update operation.
It works fine for 1 to 2 million records, but for more than 4 to 5 billion records it takes 6 to 7 hours to complete.
Can anyone suggest an alternative or some performance tips for the MERGE statement?
merge into employee_payment ep
using (
select
p.pay_id vista_payroll_id,
p.pay_date pay_dte,
c.client_id client_id,
c.company_id company_id,
case p.uni_ni when 0 then null else u.unit_id end unit_id,
p.pad_seq pay_dist_seq_nbr,
ph.payroll_header_id payroll_header_id,
p.pad_id vista_paydist_id,
p.pad_beg_payperiod pay_prd_beg_dt,
p.pad_end_payperiod pay_prd_end_d
from
stg_paydist p
inner join company c on c.vista_company_id = p.emp_ni
inner join payroll_header ph on ph.vista_payroll_id = p.pay_id
left outer join unit u on u.vista_unit_id = p.uni_ni
where ph.deleted = '0'
) ps
on (ps.vista_paydist_id = ep.vista_paydist_id)
when matched then
update
set ep.vista_payroll_id = ps.vista_payroll_id,
ep.pay_dte = ps.pay_dte,
ep.client_id = ps.client_id,
ep.company_id = ps.company_id,
ep.unit_id = ps.unit_id,
ep.pay_dist_seq_nbr = ps.pay_dist_seq_nbr,
ep.payroll_header_id = ps.payroll_header_id
when not matched then
insert (
ep.employee_payment_id,
ep.vista_payroll_id,
ep.pay_dte,
ep.client_id,
ep.company_id,
ep.unit_id,
ep.pay_dist_seq_nbr,
ep.payroll_header_id,
ep.vista_paydist_id
) values (
seq_employee_payments.nextval,
ps.vista_payroll_id,
ps.pay_dte,
ps.client_id,
ps.company_id,
ps.unit_id,
ps.pay_dist_seq_nbr,
ps.payroll_header_id,
ps.vista_paydist_id
) log errors into errorlog (v_batch || 'EMPLOYEE_PAYMENT') reject limit unlimited;
Try using the Oracle hints:
MERGE /*+ append leading(PS) use_nl(PS EP) parallel (12) */
Also try using hints to optimize the inner USING query.
Processing lots of data takes lots of time...
Here are some things that may help you (assuming there is not a problem with a bad execution plan):
Add a WHERE clause to the UPDATE part so that records are only updated when the values are actually different (see the sketch after this list). If you are merging the same data over and over again and only a small subset of it actually changes, this will improve performance.
If you are indeed processing the same data over and over again, investigate whether you can add a modification flag/date so that you only process records that are new since the last run.
Depending on the kind of environment and on when and by whom your source tables are updated, investigate whether a truncate-and-insert approach is beneficial. Remember to set the indexes unusable beforehand.
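As a sketch of the first suggestion: Oracle's WHEN MATCHED clause accepts a WHERE filter, so unchanged rows can be skipped. Only a couple of columns are shown here; the comment placeholders stand for the rest of the statement from the question:
merge into employee_payment ep
using ( /* same inner query as in the question */ ) ps
on (ps.vista_paydist_id = ep.vista_paydist_id)
when matched then
  update
     set ep.vista_payroll_id = ps.vista_payroll_id,
         ep.pay_dte          = ps.pay_dte
         /* ... remaining columns ... */
   -- Only touch rows where at least one value really changed.
   -- DECODE returns 1 when both sides are equal (it also treats two NULLs as equal).
   where decode(ep.vista_payroll_id, ps.vista_payroll_id, 1, 0) = 0
      or decode(ep.pay_dte,          ps.pay_dte,          1, 0) = 0
      /* ... one comparison per updated column ... */
when not matched then
  insert ( /* same column list as in the question */ )
  values ( /* same value list as in the question */ );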
I think your best bet here is to exploit the patterns in your data. This is something Oracle does not know about, so you may have to get creative.
I was working on a similar problem, and a good solution I found was to break the query up.
The primary reason big-table merges are a bad idea is the in-memory storage of the result of the USING query. The PGA fills up pretty quickly, so the database starts using the temporary tablespace for sort operations and joins, and the temp tablespace, being on disk, is excruciatingly slow. This excessive temp tablespace use can easily be avoided by splitting the query into two queries.
So the below query
merge into emp e
using (
select a,b,c,d from (/* big query here */)
) ec
on /*conditions*/
when matched then
/* rest of merge logic */
can become
create table temp_big_query as select a,b,c,d from (/* big query here */);
merge into emp e
using (
select a,b,c,d from temp_big_query
) ec
on /*conditions*/
when matched then
/* rest of merge logic */
If the USING query also has CTEs and subqueries, try breaking that query up to use more temp tables like the one shown above. Also avoid parallel hints: they mostly tend to slow the query down unless the query itself has something that can actually be done in parallel. Try using indexes instead as much as possible; parallelism should be the last option for optimization.
I know some references are missing; please feel free to comment and add references or point out mistakes in my answer.

ORACLE db performance tuning

We are running into a performance issue where I need some suggestions (we are on Oracle 10g R2).
The situation is something like this:
1) It is a legacy system.
2) Some of the tables hold data for the last 10 years (meaning data was never deleted since the first version was rolled out). Most of the OLTP tables now have around 30,000,000 - 40,000,000 rows.
3) Search operations on these tables take a flat 5-6 minutes. (A simple query like select count(0) from xxxxx where isActive='Y' takes around 6 minutes.) When we looked at the explain plan, we found that an index scan is happening on the isActive column.
4) We have suggested archiving and purging the old data which is not needed, and the team is working towards it. Even if we delete 5 years of data, we are still left with around 15,000,000 - 20,000,000 rows in the tables, which is itself very large, so we thought of partitioning these tables. But we found that the user can search on most of the columns of these tables from the UI, which would defeat the very purpose of table partitioning.
So what are the steps that need to be taken to improve this situation?
First of all: question why you are issuing the query select count(0) from xxxxx where isactive = 'Y' in the first place. Nine times out of ten it is a lazy way to check for the existence of a record. If that's the case for you, just replace it with a query that selects one row (rownum = 1 and a first_rows hint).
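A minimal sketch of that kind of existence check, reusing the column and table names from the question:
select /*+ first_rows(1) */ 1
  from xxxxx
 where isactive = 'Y'
   and rownum = 1;   -- stops after the first matching row instead of counting them all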
The number of rows you mention is nothing to be worried about. If your application doesn't perform well when the number of rows grows, then your system is not designed to scale. I'd investigate all queries that take too long using SQL*Trace or ASH and fix them.
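For that kind of investigation on 10g, one option is DBMS_MONITOR to enable SQL trace for a session; the sid and serial# below are placeholders you would look up in V$SESSION first:
-- Find the session (values below are placeholders):
-- select sid, serial# from v$session where username = 'APPUSER';

-- Turn on extended SQL trace, including wait events, for that session:
exec dbms_monitor.session_trace_enable(session_id => 123, serial_num => 4567, waits => TRUE, binds => FALSE);

-- ... reproduce the slow queries, then turn tracing off:
exec dbms_monitor.session_trace_disable(session_id => 123, serial_num => 4567);

-- Format the resulting trace file with tkprof.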
By the way: nothing you mentioned justifies the term legacy, IMHO.
Regards,
Rob.
Just a few observations:
I'm guessing that the "isActive" column can have two values - 'Y' and 'N' (or perhaps 'Y', 'N', and NULL - although why in the name of Fred there wouldn't be a NOT NULL constraint on such a column escapes me). If this is the case an index on this column would have very poor selectivity and you might be better off without it. Try dropping the index and re-running your query.
@RobVanWijk's comment about the use of SELECT COUNT(*) is excellent. Only ask for a row count if you really need the count; if you don't, I've found it's faster to do a direct probe (SELECT whatever FROM wherever WHERE somefield = somevalue) with an appropriate exception handler than to do a SELECT COUNT(*). In the case you cited, I think it would be better to do something like
BEGIN
  SELECT IS_ACTIVE
    INTO strIsActive
    FROM MY_TABLE
   WHERE IS_ACTIVE = 'Y';
  bActive_records_found := TRUE;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    bActive_records_found := FALSE;
  WHEN TOO_MANY_ROWS THEN
    bActive_records_found := TRUE;
END;
As to partitioning: partitioning can be effective at reducing query times IF the column on which the table is partitioned is used in all queries. For example, if a table is partitioned on TRANSACTION_DATE, then for the partitioning to make a difference all queries against this table would have to have a TRANSACTION_DATE test in the WHERE clause. Otherwise the database will have to search each partition to satisfy the query, so I doubt any improvement would be noticed.
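A small sketch to illustrate the pruning point; the table, columns and partition bounds are made up:
create table transactions (
    transaction_date  date          not null,
    customer_id       number        not null,
    amount            number(12,2)
)
partition by range (transaction_date) (
    partition p2011 values less than (to_date('2012-01-01','YYYY-MM-DD')),
    partition p2012 values less than (to_date('2013-01-01','YYYY-MM-DD')),
    partition pmax  values less than (maxvalue)
);

-- Prunes to a single partition: the partition key appears in the WHERE clause.
select sum(amount)
  from transactions
 where transaction_date >= to_date('2012-01-01','YYYY-MM-DD')
   and transaction_date <  to_date('2013-01-01','YYYY-MM-DD');

-- No predicate on transaction_date: every partition has to be scanned.
select sum(amount)
  from transactions
 where customer_id = 42;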
Share and enjoy.

ORA-30926: unable to get a stable set of rows in the source tables

I am getting
ORA-30926: unable to get a stable set of rows in the source tables
in the following query:
MERGE INTO table_1 a
USING
(SELECT a.ROWID row_id, 'Y'
FROM table_1 a ,table_2 b ,table_3 c
WHERE a.mbr = c.mbr
AND b.head = c.head
AND b.type_of_action <> '6') src
ON ( a.ROWID = src.row_id )
WHEN MATCHED THEN UPDATE SET in_correct = 'Y';
I've run a query against table_1 and it has data, and I've also run the inner query (src), which also returns data.
Why would this error come and how can it be resolved?
This is usually caused by duplicates in the query specified in USING clause. This probably means that TABLE_A is a parent table and the same ROWID is returned several times.
You could quickly solve the problem by using a DISTINCT in your query (in fact, if 'Y' is a constant value you don't even need to put it in the query).
Assuming your query is correct (don't know your tables) you could do something like this:
MERGE INTO table_1 a
USING
(SELECT distinct ta.ROWID row_id
FROM table_1 a ,table_2 b ,table_3 c
WHERE a.mbr = c.mbr
AND b.head = c.head
AND b.type_of_action <> '6') src
ON ( a.ROWID = src.row_id )
WHEN MATCHED THEN UPDATE SET in_correct = 'Y';
You're probably trying to update the same row of the target table multiple times. I just encountered the very same problem in a merge statement I developed. Make sure your update does not touch the same record more than once in the execution of the merge.
A further clarification to the use of DISTINCT to resolve error ORA-30926 in the general case:
You need to ensure that the set of data specified by the USING() clause has no duplicate values of the join columns, i.e. the columns in the ON() clause.
In OP's example where the USING clause only selects a key, it was sufficient to add DISTINCT to the USING clause. However, in the general case the USING clause may select a combination of key columns to match on and attribute columns to be used in the UPDATE ... SET clause. Therefore in the general case, adding DISTINCT to the USING clause will still allow different update rows for the same keys, in which case you will still get the ORA-30926 error.
This is an elaboration of DCookie's answer and point 3.1 in Tagar's answer, which from my experience may not be immediately obvious.
How to Troubleshoot ORA-30926 Errors? (Doc ID 471956.1)
1) Identify the failing statement
alter session set events '30926 trace name errorstack level 3';
or
alter system set events '30926 trace name errorstack off';
and watch for .trc files in UDUMP when it occurs.
2) Having found the SQL statement, check if it is correct (perhaps using explain plan or tkprof to check the query execution plan) and analyze or compute statistics on the tables concerned if this has not recently been done. Rebuilding (or dropping/recreating) indexes may help too.
3.1) Is the SQL statement a MERGE?
evaluate the data returned by the USING clause to ensure that there are no duplicate values in the join. Modify the merge statement to include a deterministic where clause
3.2) Is this an UPDATE statement via a view?
If so, try populating the view result into a table and try updating the table directly.
3.3) Is there a trigger on the table? Try disabling it to see if it still fails.
3.4) Does the statement contain a non-mergeable view in an 'IN-Subquery'? This can result in duplicate rows being returned if the query has a "FOR UPDATE" clause. See Bug 2681037
3.5) Does the table have unused columns? Dropping these may prevent the error.
4) If modifying the SQL does not cure the error, the issue may be with the table, especially if there are chained rows.
4.1) Run the 'ANALYZE TABLE VALIDATE STRUCTURE CASCADE' statement on all tables used in the SQL to see if there are any corruptions in the table or its indexes.
4.2) Check for, and eliminate, any CHAINED or migrated ROWS on the table. There are ways to minimize this, such as the correct setting of PCTFREE.
Use Note 122020.1 - Row Chaining and Migration
4.3) If the table is additionally Index Organized, see:
Note 102932.1 - Monitoring Chained Rows on IOTs
Had the error today on a 12c and none of the existing answers fit (no duplicates, no non-deterministic expressions in the WHERE clause). My case was related to that other possible cause of the error, according to Oracle's message text (emphasis below):
ORA-30926: unable to get a stable set of rows in the source tables
Cause: A stable set of rows could not be got because of large dml activity or a non-deterministic where clause.
The merge was part of a larger batch, and was executed on a live database with many concurrent users. There was no need to change the statement. I just committed the transaction before the merge, then ran the merge separately, and committed again. So the solution was found in the suggested action of the message:
Action: Remove any non-deterministic where clauses and reissue the dml.
SQL Error: ORA-30926: unable to get a stable set of rows in the source tables
30926. 00000 - "unable to get a stable set of rows in the source tables"
*Cause: A stable set of rows could not be got because of large dml
activity or a non-deterministic where clause.
*Action: Remove any non-deterministic where clauses and reissue the dml.
This error occurred for me because of duplicate records (16K of them).
When I tried the merge with unique rows, it worked, but when I tried it again without unique rows, the same problem occurred.
The second time it was due to the commit: if no commit is done after the merge, the same error is shown.
Without unique rows, the query will work if a commit is issued after each merge operation.
I was not able to resolve this after several hours. Eventually I just did a select with the two tables joined, created an extract, and created individual SQL update statements for the 500 rows in the table. Ugly, but it beats spending hours trying to get a query to work.
As someone explained earlier, your MERGE statement probably tries to update the same row more than once, and that does not work (it could cause ambiguity).
Here is one simple example: a MERGE that tries to mark some products as found when they match the given search patterns:
CREATE TABLE patterns(search_pattern VARCHAR2(20));
INSERT INTO patterns(search_pattern) VALUES('Basic%');
INSERT INTO patterns(search_pattern) VALUES('%thing');
CREATE TABLE products (id NUMBER,name VARCHAR2(20),found NUMBER);
INSERT INTO products(id,name,found) VALUES(1,'Basic instinct',0);
INSERT INTO products(id,name,found) VALUES(2,'Basic thing',0);
INSERT INTO products(id,name,found) VALUES(3,'Super thing',0);
INSERT INTO products(id,name,found) VALUES(4,'Hyper instinct',0);
MERGE INTO products p USING
(
SELECT search_pattern FROM patterns
) o
ON (p.name LIKE o.search_pattern)
WHEN MATCHED THEN UPDATE SET p.found=1;
SELECT * FROM products;
If the patterns table contains the Basic% and Super% patterns, then the MERGE works and the first three products will be updated (found). But if the patterns table contains the Basic% and %thing search patterns, then the MERGE does NOT work, because it would try to update the second product twice, and this causes the problem. MERGE does not work if some records would be updated more than once. You might ask: why not just update it twice!?
In the first scenario both updates would set found to 1, the same value, but only by accident. Now look at this scenario:
CREATE TABLE patterns(code CHAR(1),search_pattern VARCHAR2(20));
INSERT INTO patterns(code,search_pattern) VALUES('B','Basic%');
INSERT INTO patterns(code,search_pattern) VALUES('T','%thing');
CREATE TABLE products (id NUMBER,name VARCHAR2(20),found CHAR(1));
INSERT INTO products(id,name,found) VALUES(1,'Basic instinct',NULL);
INSERT INTO products(id,name,found) VALUES(2,'Basic thing',NULL);
INSERT INTO products(id,name,found) VALUES(3,'Super thing',NULL);
INSERT INTO products(id,name,found) VALUES(4,'Hyper instinct',NULL);
MERGE INTO products p USING
(
SELECT code,search_pattern FROM patterns
) s
ON (p.name LIKE s.search_pattern)
WHEN MATCHED THEN UPDATE SET p.found=s.code;
SELECT * FROM products;
Now the first product name matches the Basic% pattern and would be updated with code B, but the second product matches both patterns and cannot be updated with both codes B and T at the same time (ambiguity)!
That's why the DB engine complains. Don't blame it! It knows what it is doing! ;-)
