Why FINAL modifier doesn't trigger ClickHouse's merging procedure

There's a ReplacingMergeTree table in ClickHouse. Executing select count(1) from tbl yields 71961920. I then executed select count(1) from tbl FINAL, trying to trigger the merge procedure as noted in the official documentation:
but it turns out that after running the above SQL with FINAL, the SQL without FINAL still yields the "wrong" result, as if the table had not been merged at all. Could anyone help explain? Thanks~

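The FINAL modifier only merges rows in the result of the SELECT that uses it; it does not rewrite the data parts stored on disk, so a later query without FINAL still sees the unmerged rows.
To force ClickHouse to merge the table's parts, use the OPTIMIZE statement:
https://clickhouse.com/docs/en/sql-reference/statements/optimize/
But keep this in mind: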
Although you can run an unscheduled merge using the OPTIMIZE query, do not count on using it, because the OPTIMIZE query will read and write a large amount of data. - ClickHouse ReplacingMergeTree
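As a minimal sketch of the difference, using the table name tbl from the question (the counts you see will depend on your data):

```sql
-- Counts every stored row, including duplicates not yet merged.
SELECT count() FROM tbl;

-- Deduplicates at query time only; the stored parts are left untouched.
SELECT count() FROM tbl FINAL;

-- Forces an unscheduled merge of the table's parts. This reads and
-- rewrites a large amount of data, so use it sparingly.
OPTIMIZE TABLE tbl FINAL;

-- After the merge has completed, the plain count should match the FINAL count.
SELECT count() FROM tbl;
```

Note that merges happen within a partition, so rows with the same sorting key that live in different partitions are not collapsed by OPTIMIZE.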

Related

Oracle 19: Why IN gets converted to Exist in explain plan and any suggestions around it

Please see image below:
According to it, the IN query was converted to EXISTS in the explain plan. Is there a reason for that? Does it mean Oracle automatically converts IN to EXISTS?
Also any suggestion to reduce the cost? this statement is a part of a SP and it receives ~ separated string ('123') for example (63278~63282~63285~63288~63291~63296~63299~63302~63305~63308~63311~63314~63319~63322~63325~63329~63332~63253~63256~63260~63264~63267~63272~63275~63279~63283~63286~63289~63292~63297~63300~63303~63306~63309~63312~63315~63320~63323~63326~63330~63333~63269~63258~63277~63294~63317~63262~63270~63281~63295~63318~63328~63254~63257~63261~63265~63268~63273~63276~63280~63284~63287~63290~63293~63298~63301~63304~63307~63310~63313~63316~63321~63324~63327~63331~63334) in query. It takes around 10 to 15 mins to execute.
How can we generate explain plan for entire stored proc? We are using Oracle 19.
Thank you in advance.
An IN clause retrieves all records that match the given set of values; it acts like multiple OR conditions. With a subquery, IN is evaluated against the rows fetched from the inner query.
EXISTS, by contrast, is a Boolean operator that returns either true or false and is used in combination with a subquery: if the subquery returns any row, it returns true, otherwise false. If the result set inside the IN clause is large, IN is often not recommended, and EXISTS is usually suggested instead for better performance. That is why the optimizer in Oracle (and in PostgreSQL) may rewrite your IN as an equivalent EXISTS.
Since you are doing the job in a PL/SQL procedure, you could create (outside the procedure) a GLOBAL TEMPORARY TABLE with ON COMMIT DELETE ROWS. In the procedure, you INSERT into this table the result of the subselect with the CONNECT BY, and then you replace the SELECT ... CONNECT BY with a SELECT from the temporary table. The temporary table is emptied when the transaction commits, and this method is session-safe. You also get the benefit of an index and probably a better plan. You could also compare the single UPDATE against two UPDATEs, splitting the OR condition across two statements.
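The original statement is only available as an image, so the following is just a rough sketch of the temporary-table approach. The names tmp_ids, update_by_ids, p_ids, target_table and status are all hypothetical placeholders, not taken from the question:

```sql
-- Created once, outside the procedure. Rows are private to each session
-- and are removed automatically at commit.
CREATE GLOBAL TEMPORARY TABLE tmp_ids (
  id NUMBER PRIMARY KEY
) ON COMMIT DELETE ROWS;

CREATE OR REPLACE PROCEDURE update_by_ids (p_ids IN VARCHAR2) AS
BEGIN
  -- Split the '~'-separated string once and store the ids.
  -- DISTINCT guards against repeated ids violating the primary key.
  INSERT INTO tmp_ids (id)
  SELECT DISTINCT TO_NUMBER(REGEXP_SUBSTR(p_ids, '[^~]+', 1, LEVEL))
  FROM   dual
  CONNECT BY LEVEL <= REGEXP_COUNT(p_ids, '[^~]+');

  -- Hypothetical statement standing in for the real UPDATE:
  -- it reads the ids from the indexed temporary table instead of
  -- re-evaluating the CONNECT BY subquery.
  UPDATE target_table t
  SET    t.status = 'X'
  WHERE  t.id IN (SELECT id FROM tmp_ids);

  COMMIT;  -- also empties tmp_ids
END;
/
```

This assumes the id string fits within the SQL VARCHAR2 limit (4000 bytes by default), which holds for the example list in the question.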

Slow Query performance due to not exists in oracle

This is my query. It takes a long time to execute; can anyone make it faster?
I think the NOT EXISTS is what makes it slow, but I don't know how to convert it to a LEFT OUTER JOIN with the extra conditions. I have rewritten it many times, but each time the result changed.
Thanks in advance.
As a basic tuning principle, use EXISTS or NOT EXISTS if the subquery inside it returns a large amount of data; if it does not, use IN or NOT IN instead.
Also, remove the DISTINCT from SELECT DISTINCT t.tax_payer_no, taxestab.estab_no, move the query into a CTE, and see how long it takes:
with data as (
SELECT t.tax_payer_no tax_payer_no,taxestab.estab_no estab_no.. rest of your query)
select count(1),tax_payer_no,estab_no from data
group by tax_payer_no,estab_no

Stored procedure for Select and Inner select query in Oracle

I have a query like this:
select PROMOTER_DSMID,
PROMOTER_NAME,
PROMOTER_MSISDN,
RETAILER_DSMID,
RETAILER_MSISDN,
RETAILER_NAME ,
ATTENDANCE_FLAG,
ATTENDANCE_DATE
from PROMO_ATTENDANCE_DETAILS
where PROMOTER_DSMID not in
(SELECT PROMOTER_DSMID
FROM PROMO_ATTENDANCE_DETAILS
WHERE PROMOTERS_ASM_DSMID='ASM123'
AND ATTENDANCE_FLAG='TRUE'
AND TRUNC(ATTENDANCE_DATE) ='16-07-17')
and PROMOTERS_ASM_DSMID='ASM123'
AND ATTENDANCE_FLAG='FALSE'
AND TRUNC(ATTENDANCE_DATE) ='16-07-17';
This query is taking too much time when I run this in PROD database because of large number of records.
I need to write a procedure for this but am not able to get the correct approach of how to write a procedure. Somebody please guide me.
"was thinking to write a proc in which inner select statement can put the data in some temporary table and then from that temporary table I can run the outer select statement"
No need for that. Use a WITH clause to select the data once and use it twice.
with cte as (
select PROMOTER_DSMID,
PROMOTER_NAME,
PROMOTER_MSISDN,
RETAILER_DSMID,
RETAILER_MSISDN,
RETAILER_NAME ,
ATTENDANCE_FLAG,
ATTENDANCE_DATE
from PROMO_ATTENDANCE_DETAILS
where PROMOTERS_ASM_DSMID='ASM123'
AND TRUNC(ATTENDANCE_DATE) ='16-07-17'
)
select *
from cte
where ATTENDANCE_FLAG='FALSE'
AND PROMOTER_DSMID not in
(SELECT PROMOTER_DSMID
FROM cte
where ATTENDANCE_FLAG='TRUE')
;
This will perform better than a temporary table, which involves a lot of disk I/O.
There are other possible performance improvements, depending on the usual tuning considerations: data volume and skew, indexes, etc.
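As one illustration of the index point: wrapping ATTENDANCE_DATE in TRUNC() generally keeps a plain index on that column from being used, and comparing it with the string '16-07-17' relies on the session's NLS_DATE_FORMAT. A possible rewrite of the CTE filter is sketched below. It assumes '16-07-17' means 16 July 2017, which is a guess; adjust the date literals if your format differs:

```sql
with cte as (
  select PROMOTER_DSMID,
         PROMOTER_NAME,
         PROMOTER_MSISDN,
         RETAILER_DSMID,
         RETAILER_MSISDN,
         RETAILER_NAME,
         ATTENDANCE_FLAG,
         ATTENDANCE_DATE
  from   PROMO_ATTENDANCE_DETAILS
  where  PROMOTERS_ASM_DSMID = 'ASM123'
  -- Half-open range covering the whole day (assumed to be 16 July 2017),
  -- written with ANSI date literals so no implicit conversion is needed.
  and    ATTENDANCE_DATE >= DATE '2017-07-16'
  and    ATTENDANCE_DATE <  DATE '2017-07-17'
)
select *
from   cte
where  ATTENDANCE_FLAG = 'FALSE'
and    PROMOTER_DSMID not in
       (select PROMOTER_DSMID
        from   cte
        where  ATTENDANCE_FLAG = 'TRUE');
```

Whether this helps depends on which indexes exist on PROMO_ATTENDANCE_DETAILS, so compare the execution plans before and after.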

How to insert while avoiding unique constraints with oracle

We have a process that aggregates some data and inserts the results into another table that we use for efficient querying. The problem we're facing is that we now have multiple aggregators running at roughly the same time.
We use the original record's id as the primary key in this new table - a unique constraint. However, if two aggregation processes are running at the same time, one of them will error with a unique constraint violation.
Is there a way to specify some kind of locking mechanism which will make the second writer wait until the first is finished? Alternatively, is there a way to tell oracle to ignore that specific row and continue with the rest?
Unfortunately it's not practical to reduce the aggregation to a single process, as the following procedures rely on an up to date version of the data being available and those procedures do need to scale out.
Edit:
The following is my [redacted] query:
INSERT INTO
agg_table
SELECT
h.id, h.col, h.col2
FROM history h
JOIN call c
ON c.callid = h.callid
WHERE
h.id > (SELECT coalesce(max(id),0) FROM agg_table)
It is possible to run an INSERT statement with an error logging clause. The example from the Oracle docs is as follows:
INSERT INTO dw_empl
SELECT employee_id, first_name, last_name, hire_date, salary, department_id
FROM employees
WHERE hire_date > sysdate - 7
LOG ERRORS INTO err_empl ('daily_load') REJECT LIMIT 25
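Applied to the query from the question, this might look roughly as follows. The error table name err_agg_table and the tag 'agg_run' are hypothetical:

```sql
-- One-time setup: create the error logging table for agg_table.
BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG(
    dml_table_name     => 'AGG_TABLE',
    err_log_table_name => 'ERR_AGG_TABLE');
END;
/

-- Rows that violate the primary key are written to err_agg_table
-- instead of failing the whole statement.
INSERT INTO agg_table
SELECT h.id, h.col, h.col2
FROM   history h
JOIN   call c
ON     c.callid = h.callid
WHERE  h.id > (SELECT coalesce(max(id), 0) FROM agg_table)
LOG ERRORS INTO err_agg_table ('agg_run') REJECT LIMIT UNLIMITED;
```

Two caveats: a session that hits a key another session has inserted but not yet committed will wait for that session to finish before the row is logged, and unique constraint violations are not logged for direct-path inserts, so this applies to conventional inserts like the one above.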
Alternatively, you could try using a MERGE statement. You would be merging into the summary table with a select from the detail table. If a match is not found, you INSERT, and if it is found, you UPDATE. I believe this solution will handle your concurrency issues, but you would need to test it.
Have a look at the FOR UPDATE clause. If you correctly write a SELECT statement with a FOR UPDATE clause within a transaction before your update/insert statements, you will be able to "lock" the required records.
Serialising the inserts is probably the best way, as there's no method that will get you round the problem of the multiple inserts being unable to see what each one is doing.
DBMS_Lock is probably the appropriate serialisation mechanism.
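A minimal sketch of serialising the load with DBMS_LOCK, assuming the session has EXECUTE privilege on the package. The lock name AGG_TABLE_LOAD is a hypothetical choice:

```sql
DECLARE
  l_handle VARCHAR2(128);
  l_status INTEGER;
BEGIN
  -- Map the (hypothetical) lock name to a handle.
  -- Note that ALLOCATE_UNIQUE itself performs a commit.
  DBMS_LOCK.ALLOCATE_UNIQUE(
    lockname   => 'AGG_TABLE_LOAD',
    lockhandle => l_handle);

  -- Wait for an exclusive lock; it is released automatically at commit.
  l_status := DBMS_LOCK.REQUEST(
                lockhandle        => l_handle,
                lockmode          => DBMS_LOCK.X_MODE,
                timeout           => DBMS_LOCK.MAXWAIT,
                release_on_commit => TRUE);

  -- 0 = success, 4 = this session already owns the lock.
  IF l_status NOT IN (0, 4) THEN
    RAISE_APPLICATION_ERROR(-20001,
      'Could not acquire lock, status ' || l_status);
  END IF;

  -- Only one aggregator at a time gets this far, so max(id) is stable.
  INSERT INTO agg_table
  SELECT h.id, h.col, h.col2
  FROM   history h
  JOIN   call c
  ON     c.callid = h.callid
  WHERE  h.id > (SELECT coalesce(max(id), 0) FROM agg_table);

  COMMIT;  -- commits the insert and releases the lock
END;
/
```

The second aggregator blocks in REQUEST until the first commits, then computes max(id) against the committed data and skips the rows already loaded.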

ORA-30926: unable to get a stable set of rows in the source tables

I am getting
ORA-30926: unable to get a stable set of rows in the source tables
in the following query:
MERGE INTO table_1 a
USING
(SELECT a.ROWID row_id, 'Y'
FROM table_1 a ,table_2 b ,table_3 c
WHERE a.mbr = c.mbr
AND b.head = c.head
AND b.type_of_action <> '6') src
ON ( a.ROWID = src.row_id )
WHEN MATCHED THEN UPDATE SET in_correct = 'Y';
I've run a query on table_1 and it has data, and I've also run the inner query (src), which also returns data.
Why would this error come and how can it be resolved?
This is usually caused by duplicates in the query specified in the USING clause. This probably means that table_1 (alias a) is a parent table and the same ROWID is returned several times.
You could quickly solve the problem by using a DISTINCT in your query (in fact, if 'Y' is a constant value you don't even need to put it in the query).
Assuming your query is correct (don't know your tables) you could do something like this:
MERGE INTO table_1 a
USING
(SELECT DISTINCT a.ROWID row_id
FROM table_1 a ,table_2 b ,table_3 c
WHERE a.mbr = c.mbr
AND b.head = c.head
AND b.type_of_action <> '6') src
ON ( a.ROWID = src.row_id )
WHEN MATCHED THEN UPDATE SET in_correct = 'Y';
You're probably trying to update the same row of the target table multiple times. I just encountered the very same problem in a merge statement I developed. Make sure your update does not touch the same record more than once in the execution of the merge.
A further clarification to the use of DISTINCT to resolve error ORA-30926 in the general case:
You need to ensure that the set of data specified by the USING() clause has no duplicate values of the join columns, i.e. the columns in the ON() clause.
In OP's example where the USING clause only selects a key, it was sufficient to add DISTINCT to the USING clause. However, in the general case the USING clause may select a combination of key columns to match on and attribute columns to be used in the UPDATE ... SET clause. Therefore in the general case, adding DISTINCT to the USING clause will still allow different update rows for the same keys, in which case you will still get the ORA-30926 error.
This is an elaboration of DCookie's answer and point 3.1 in Tagar's answer, which from my experience may not be immediately obvious.
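One way to check for this is to group the USING query by its join columns and look for keys that occur more than once. For the statement in the question, a sketch would be:

```sql
-- Any row returned here is a target row the MERGE would try to update
-- more than once, which is what raises ORA-30926.
SELECT row_id, COUNT(*) AS matches
FROM  (SELECT a.ROWID row_id
       FROM   table_1 a, table_2 b, table_3 c
       WHERE  a.mbr = c.mbr
       AND    b.head = c.head
       AND    b.type_of_action <> '6')
GROUP  BY row_id
HAVING COUNT(*) > 1;
```

If the USING clause also selects attribute columns, group only by the columns used in the ON clause, not by the attributes.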
How to Troubleshoot ORA-30926 Errors? (Doc ID 471956.1)
1) Identify the failing statement
alter session set events '30926 trace name errorstack level 3';
or
alter system set events '30926 trace name errorstack off';
and watch for .trc files in UDUMP when it occurs.
2) Having found the SQL statement, check if it is correct (perhaps using explain plan or tkprof to check the query execution plan) and analyze or compute statistics on the tables concerned if this has not recently been done. Rebuilding (or dropping/recreating) indexes may help too.
3.1) Is the SQL statement a MERGE?
Evaluate the data returned by the USING clause to ensure that there are no duplicate values in the join. Modify the merge statement to include a deterministic where clause.
3.2) Is this an UPDATE statement via a view?
If so, try populating the view result into a table and try updating the table directly.
3.3) Is there a trigger on the table? Try disabling it to see if it still fails.
3.4) Does the statement contain a non-mergeable view in an 'IN-Subquery'? This can result in duplicate rows being returned if the query has a "FOR UPDATE" clause. See Bug 2681037
3.5) Does the table have unused columns? Dropping these may prevent the error.
4) If modifying the SQL does not cure the error, the issue may be with the table, especially if there are chained rows.
4.1) Run the 'ANALYZE TABLE VALIDATE STRUCTURE CASCADE' statement on all tables used in the SQL to see if there are any corruptions in the table or its indexes.
4.2) Check for, and eliminate, any CHAINED or migrated ROWS on the table. There are ways to minimize this, such as the correct setting of PCTFREE.
Use Note 122020.1 - Row Chaining and Migration
4.3) If the table is additionally Index Organized, see:
Note 102932.1 - Monitoring Chained Rows on IOTs
Had the error today on a 12c and none of the existing answers fit (no duplicates, no non-deterministic expressions in the WHERE clause). My case was related to that other possible cause of the error, according to Oracle's message text (emphasis below):
ORA-30926: unable to get a stable set of rows in the source tables
Cause: A stable set of rows could not be got because of large dml activity or a non-deterministic where clause.
The merge was part of a larger batch, and was executed on a live database with many concurrent users. There was no need to change the statement. I just committed the transaction before the merge, then ran the merge separately, and committed again. So the solution was found in the suggested action of the message:
Action: Remove any non-deterministic where clauses and reissue the dml.
SQL Error: ORA-30926: unable to get a stable set of rows in the source tables
30926. 00000 - "unable to get a stable set of rows in the source tables"
*Cause: A stable set of rows could not be got because of large dml
activity or a non-deterministic where clause.
*Action: Remove any non-deterministic where clauses and reissue the dml.
This error occurred for me because of duplicate records (16K).
I tried with unique and it worked,
but when I tried the merge again without unique, the same problem occurred.
The second time it was due to a missing commit:
if a commit is not done after the merge, the same error is shown.
Without unique, the query works if a commit is issued after each merge operation.
I was not able to resolve this after several hours. Eventually I just did a select with the two tables joined, created an extract and created individual SQL update statements for the 500 rows in the table. Ugly but beats spending hours trying to get a query to work.
As someone explained earlier, probably your MERGE statement tries to update the same row more than once and that does not work (could cause ambiguity).
Here is one simple example. MERGE that tries to mark some products as found when matching the given search patterns:
CREATE TABLE patterns(search_pattern VARCHAR2(20));
INSERT INTO patterns(search_pattern) VALUES('Basic%');
INSERT INTO patterns(search_pattern) VALUES('%thing');
CREATE TABLE products (id NUMBER,name VARCHAR2(20),found NUMBER);
INSERT INTO products(id,name,found) VALUES(1,'Basic instinct',0);
INSERT INTO products(id,name,found) VALUES(2,'Basic thing',0);
INSERT INTO products(id,name,found) VALUES(3,'Super thing',0);
INSERT INTO products(id,name,found) VALUES(4,'Hyper instinct',0);
MERGE INTO products p USING
(
SELECT search_pattern FROM patterns
) o
ON (p.name LIKE o.search_pattern)
WHEN MATCHED THEN UPDATE SET p.found=1;
SELECT * FROM products;
If the patterns table contains the Basic% and Super% patterns, then the MERGE works and the first three products are updated (found). But if the patterns table contains the Basic% and %thing search patterns, then the MERGE does NOT work, because it tries to update the second product twice, and this causes the problem. MERGE does not work if some records would have to be updated more than once. You may ask: why not just update twice?
Here the first update (1) and the second update (1) set the same value, but only by accident. Now look at this scenario:
CREATE TABLE patterns(code CHAR(1),search_pattern VARCHAR2(20));
INSERT INTO patterns(code,search_pattern) VALUES('B','Basic%');
INSERT INTO patterns(code,search_pattern) VALUES('T','%thing');
CREATE TABLE products (id NUMBER,name VARCHAR2(20),found CHAR(1));
INSERT INTO products(id,name,found) VALUES(1,'Basic instinct',NULL);
INSERT INTO products(id,name,found) VALUES(2,'Basic thing',NULL);
INSERT INTO products(id,name,found) VALUES(3,'Super thing',NULL);
INSERT INTO products(id,name,found) VALUES(4,'Hyper instinct',NULL);
MERGE INTO products p USING
(
SELECT code,search_pattern FROM patterns
) s
ON (p.name LIKE s.search_pattern)
WHEN MATCHED THEN UPDATE SET p.found=s.code;
SELECT * FROM products;
Now the first product name matches the Basic% pattern and is updated with code B, but the second product matches both patterns and cannot be updated with both codes B and T at the same time (ambiguity)!
That's why the DB engine complains. Don't blame it! It knows what it is doing! ;-)
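One possible way to remove the ambiguity is to decide explicitly which code wins, so that the source returns at most one row per product. The sketch below picks the smallest code with MIN, which is an arbitrary rule chosen only for illustration, and it assumes products.id is unique (the example table declares no key):

```sql
MERGE INTO products p USING
(
  -- Collapse all matching patterns to a single code per product.
  SELECT pr.id, MIN(s.code) AS code
  FROM   products pr
  JOIN   patterns s
  ON     pr.name LIKE s.search_pattern
  GROUP  BY pr.id
) src
ON (p.id = src.id)
WHEN MATCHED THEN UPDATE SET p.found = src.code;
```

With the sample data, the second product matches both patterns and would get code B under this rule, since B sorts before T.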
