H2 MERGE: Column not found

I've got the following table:
create table companies (id identity, version int not null, last_modified timestamp not null);
insert into companies (version, last_modified) values (0, NOW());
I then create a PreparedStatement and supply a value for index 1:
merge into companies (id, version, last_modified) values(?, version + 1, NOW())
H2 fails with this error:
Column "VERSION" not found
I understand that H2 doesn't like version + 1 on the right-hand side, but it's not clear how to return 0 for new rows and version + 1 for existing rows. Is there an easier way than using a select statement with a union?

You could use:
merge into companies (id, version, last_modified)
values(?, coalesce((select version + 1 from companies where id = ?), 0), NOW())
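The coalesce pattern can be tried end to end with a small sketch. This one uses Python's built-in sqlite3 module rather than H2, so INSERT OR REPLACE stands in for MERGE and datetime('now') for NOW(); both substitutions, and the helper names upsert_company and version_of, are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table companies ("
    " id integer primary key,"
    " version int not null,"
    " last_modified text not null)"
)


def upsert_company(conn, company_id):
    # INSERT OR REPLACE is SQLite's nearest equivalent of H2's MERGE (assumption).
    # The scalar subquery yields version + 1 for an existing row and NULL for a
    # missing one; coalesce turns that NULL into 0.
    conn.execute(
        "insert or replace into companies (id, version, last_modified) "
        "values (?, coalesce((select version + 1 from companies where id = ?), 0),"
        " datetime('now'))",
        (company_id, company_id),
    )


def version_of(conn, company_id):
    # Hypothetical helper: read back the stored version for one id.
    row = conn.execute(
        "select version from companies where id = ?", (company_id,)
    ).fetchone()
    return row[0]


upsert_company(conn, 1)  # new row: version starts at 0
upsert_company(conn, 1)  # existing row: version is incremented
upsert_company(conn, 2)  # another new row
```

Note that the id has to be bound twice, once for the key and once for the subquery.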

Thomas's answer of
merge into companies (id, version, last_modified)
values(?, coalesce((select version + 1 from companies where id = ?), 0), NOW())
becomes rather cumbersome if you (as I do) want to conditionally insert or update several fields: you need a coalesce((select ...), default) for each one!
It would appear that a more general answer needs to be two statements:
MERGE INTO companies (id) key (id) VALUES (?)
UPDATE companies SET version=1+IFNULL(version,0), otherfields... WHERE id=?
In other words: don't use MERGE for multiple conditional changes (where you need an expression and not just a value) in a record.
I'd love to be proven wrong on this...
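A minimal sketch of the two-statement idea, again using Python's sqlite3 module in place of H2. INSERT OR IGNORE plays the role of the key-only MERGE, the name column is a hypothetical stand-in for "otherfields", and seeding new rows with version -1 (so that they end up at 0 after the update) is an assumption of this sketch, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table companies (id integer primary key, version int not null, name text)"
)


def save_company(conn, company_id, name):
    # "with conn" commits both statements together and rolls back on error.
    with conn:
        # Statement 1: make sure the row exists (no-op if it already does).
        conn.execute(
            "insert or ignore into companies (id, version) values (?, -1)",
            (company_id,),
        )
        # Statement 2: one UPDATE can now use expressions on any number of columns.
        conn.execute(
            "update companies set version = version + 1, name = ? where id = ?",
            (name, company_id),
        )


save_company(conn, 7, "Acme")      # inserts, then bumps -1 to 0
save_company(conn, 7, "Acme Ltd")  # row exists, bumps 0 to 1
result = conn.execute("select version, name from companies where id = 7").fetchone()
```

The trade-off is a second round trip per save, in exchange for not repeating a subquery for every column.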

Related

Oracle: select rows from a query which do not exist in another query

Let me explain the question.
I have two tables, which have 3 columns with the same data types. The 3 columns form a key/ID if you like, but the names of the columns are different in the two tables.
Now I am creating queries with these 3 columns for both tables. I've managed to get these results independently.
For example:
SELECT ID, FirstColumn, sum(SecondColumn)
FROM (SELECT ABC||DEF||GHI AS ID, FirstTable.*
FROM FirstTable
WHERE ThirdColumn = *1st condition*)
GROUP BY ID, FirstColumn
;
SELECT ID, SomeColumn, sum(AnotherColumn)
FROM (SELECT JKM||OPQ||RST AS ID, SecondTable.*
FROM SecondTable
WHERE AlsoSomeColumn = *2nd condition*)
GROUP BY ID, SomeColumn
;
So I run very similar queries against two different tables. I know the results share a certain number of rows with the same ID attribute, the one I've just created in the queries. I need to check which rows in one result are not in the other query's result, and vice versa.
Do I have to make temporary tables or views from the queries? Maybe join the two tables in a specific way and only run one query on them?
As a beginner I don't have any experience with using results as input for the next query. I'm interested in the cleanest, most elegant way to do this.
No, you most probably don't need any "temporary" tables. The WITH (subquery factoring) clause would help.
Here's an example:
with
first_query as
(select id, first_column, ...
from (select ABC||DEF||GHI as id, ...)
),
second_query as
(select id, some_column, ...
from (select JKM||OPQ||RST as id, ...)
)
select id from first_query
minus
select id from second_query;
For another result you'd just switch the tables, e.g.
with ... <the same as above>
select id from second_query
minus
select id from first_query
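The same shape can be run as a small self-contained sketch. It uses Python's sqlite3 module instead of Oracle, so EXCEPT takes the place of MINUS; the table contents and the helper only_in are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table first_table  (abc text, def text, ghi text, amount int);
    create table second_table (jkm text, opq text, rst text, amount int);
    insert into first_table  values ('a','1','x',10), ('b','2','y',20), ('c','3','z',30);
    insert into second_table values ('b','2','y',5),  ('c','3','z',6),  ('d','4','w',7);
""")

# Both named subqueries are declared once and reused for either direction.
CTES = """
    with
      first_query  as (select abc || def || ghi as id from first_table),
      second_query as (select jkm || opq || rst as id from second_table)
"""


def only_in(conn, left, right):
    # EXCEPT is SQLite's spelling of Oracle's MINUS.
    sql = CTES + f"select id from {left} except select id from {right} order by id"
    return [row[0] for row in conn.execute(sql)]


only_first = only_in(conn, "first_query", "second_query")
only_second = only_in(conn, "second_query", "first_query")
```

Here only_first holds the IDs missing from the second result and only_second the reverse.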

Merge using update insert new rows

I have the merge query below, where I want to update the date, and for performance reasons I am using ROWID logic.
But I would like to know: does it insert new rows in any way? I just want to update the table TEST_GRP and don't want any new rows inserted.
As I am using ROWID logic for the first time, I am really not sure whether it inserts new rows or just updates the table.
MERGE INTO TEST_GRP tgt
USING (SELECT ID,
ROWID r_id,
row_number() over (partition by ID ORDER BY DT_DATE) rn
FROM TEST_GRP) src
ON (tgt.rowid = src.r_id AND src.rn = 1)
WHEN MATCHED THEN
UPDATE SET DT_DATE = to_date('01.01.2017', ''dd.mm.yyyy'')
WHERE DT_DATE != to_date('01.01.2016', ''dd.mm.yyyy'')
and DB_NAME = 'ARD';
It will update the rows with no problem and does not insert new rows.
In your UPDATE clause, the doubled quotes around the format mask will probably cause a problem: SET DT_DATE = to_date('01.01.2017', ''dd.mm.yyyy'') and DT_DATE != to_date('01.01.2016', ''dd.mm.yyyy''). The mask should be an ordinary string literal, 'dd.mm.yyyy'.
You don't have to add an insert clause to a MERGE statement, as stated in the docs:
merge_update_clause ... You can specify this clause by itself or with the
merge_insert_clause
Your code does not contain a merge_insert_clause, so no inserts will happen.

A simple condition breaks the query optimizer and its performance

I have a simple query:
select top 10 *
FROM Revision2UploadLocations r2l
inner join Revisions r on r2l.RevisionId = r.Id
INNER JOIN [Databases] [D] on [R].[DatabaseId] = [D].[Id]
INNER JOIN [SqlServers] [S] on [D].[InstanceId] = [S].[Id]
where --r.ValidationStatus in (2, 3) and
r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2
This query usually executes in 0.5s.
But the same query with the condition uncommented takes 5s (!!!) and has a very strange execution plan: Revisions and SqlServers are joined although they have no linked columns, and the most selective condition, "r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2", is applied at the end of query processing.
ValidationStatus is an ordinary NOT NULL integer column.
The columns Revision2UploadLocations.RevisionId, Revisions.DatabaseId and Databases.InstanceId are indexed.
Here is the description of the tables:
CREATE TABLE [SqlServers]
(
[Id] int identity(1,1) NOT NULL CONSTRAINT PK_SqlServers PRIMARY KEY,
...
)
CREATE TABLE [Databases](
[Id] int identity(1,1) NOT NULL CONSTRAINT PK_Databases PRIMARY KEY,
[InstanceId] int NOT NULL,
[Name] nvarchar(128) NOT NULL,
...
CONSTRAINT FK_Databases_SqlServers FOREIGN KEY ([InstanceId]) REFERENCES [SqlServers]([Id])
)
CREATE INDEX [IX_Databases_DatabaseId] ON [Databases] ([InstanceId] ASC)
CREATE TABLE [Revisions]
(
[Id] int identity(1, 1) NOT NULL,
[DatabaseId] int NOT NULL,
[BackupStatus] tinyint NOT NULL,
[ValidationStatus] tinyint NOT NULL,
...
CONSTRAINT PK_Revisions PRIMARY KEY([Id]),
CONSTRAINT FK_Revisions_Databases FOREIGN KEY ([DatabaseId]) REFERENCES [Databases]([Id])
)
CREATE INDEX [IX_Revisions_DatabaseId] ON [Revisions] ([DatabaseId] ASC)
CREATE TABLE [Revision2UploadLocations]
(
[Id] int NOT NULL IDENTITY (1, 1) CONSTRAINT PK_Revision2UploadLocations PRIMARY KEY,
[Status] int NOT NULL,
RevisionId int NOT NULL,
[ChecksumWasSent] bit NOT NULL,
CONSTRAINT FK_r2l_Revisions FOREIGN KEY ([RevisionId]) REFERENCES [Revisions]([Id])
)
CREATE INDEX [IX_Revision2UploadLocations_RevisionId] ON [Revision2UploadLocations] ([RevisionId] ASC)
How can I improve the performance of this query?
EDIT: Now I have some more details.
Some tables (SqlServers and Databases) have 1-10 records, while Revisions and Revision2UploadLocations have 500K+ records, so the query optimizer decides to use a full scan instead of an index seek for the small tables and processes them first.
Query Performance Tuning (SQL Server Compact):
A small table is one whose contents fit in one or just a few data pages. Avoid indexing very small tables because it is typically more efficient to do a table scan.
As a temporary solution I tried the FORCE ORDER query hint (see Query Hint (SQL Server Compact)),
and the response time decreased from 5s to 0.5s.
But I don't think that it's a good solution.
Geoffrey's solution doesn't give you the expected result.
The first statement selects 10 rows without any guarantee that their r.ValidationStatus is 2 or 3. So in the end you can get fewer than 10 rows (or even no rows at all).
I think you can rewrite your query like this:
SELECT top 10 *
FROM Revisions r
INNER JOIN Revision2UploadLocations r2l
ON r2l.RevisionId = r.Id
AND r2l.[ChecksumWasSent] = 0
AND r2l.Status = 2
INNER JOIN [Databases] [D] on [D].[Id] = [R].[DatabaseId]
INNER JOIN [SqlServers] [S] on [S].[Id] = [D].[InstanceId]
WHERE r.ValidationStatus in (2, 3)
And since r2l.[ChecksumWasSent] is a bit (boolean) column, the best index depends on how its values are distributed:
if there are more 0s than 1s, you can create an index on RevisionId + Status;
if there are far more 1s than 0s, you can create an index on RevisionId + ChecksumWasSent + Status.
I have found in the past that if I first insert the result of the first part of the query into a temp table, including the field you want to filter on further ("ValidationStatus"), and then query the temp table, the performance is much better.
So the initial query would be this:
select *
into #tmp
FROM Revision2UploadLocations r2l
inner join Revisions r on r2l.RevisionId = r.Id
INNER JOIN [Databases] [D] on [R].[DatabaseId] = [D].[Id]
INNER JOIN [SqlServers] [S] on [D].[InstanceId] = [S].[Id]
where --r.ValidationStatus in (2, 3) and
r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2
then the final select would be:
select * from #tmp
where ValidationStatus in (2,3)
No need for indexes. I know it's weird that the optimizer doesn't always get this right, but this approach has been useful to me several times in the past.

SQL Query Performance with count

I have 2 tables, COMPANY and EMPLOYEE.
COMPANY_ID is the primary key of the COMPANY table and a foreign key in the EMPLOYEE table. The COMPANY_ID is a 10-digit number. We generate 3-digit combinations and query the database with them.
The select statement uses a regex to bulk load companies based on COMPANY_ID. The query is executed multiple times with different patterns,
i.e.
regexp_like(COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)') .
Existing query looks something like this -
select *
from COMPANY company
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
The new requirement is to retrieve the company information along with the employee count. For example if a company has 10 employees, then the query should return all the columns of the COMPANY table, along with employee count i.e. 10
This is the select statement that I came up with -
select
nvl(count_table.cont_count, 0), company.*
from
COMPANY company,
(select company.COMPANY_ID, count(company.COMPANY_ID) as cont_count
from COMPANY company, EMPLOYEE employee
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and company.CONTACT_ID = employee.CONTACT_ID
group by (company.COMPANY_ID)) count_table
where
regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and count_table.COMPANY_ID(+)= company.COMPANY_ID
Above query works, but it takes double the time compared to the previous statement. Is there a better way to retrieve the employee count?
Note: Oracle database is in use.
You don't need to execute that expensive REGEXP_LIKE twice:
select nvl(count_table.cont_count,0),company.*
from COMPANY company
,( select employee.COMPANY_ID, count(employee.COMPANY_ID) as cont_count
from EMPLOYEE employee
group by (employee.COMPANY_ID)
) count_table
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and count_table.COMPANY_ID(+)= company.COMPANY_ID
Or you could use a scalar subquery:
select c.*
, (select count(*)
from employee e
where e.company_id = c.company_id
)
from COMPANY c
where regexp_like(c.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
And personally I would ditch the slow REGEXP_LIKE for something like:
where substr(c.company_id,1,3) between '000' and '009'
The derived table does not add value, so I would get rid of it and use a scalar subquery (because I do not know all of the columns in your company table, I cannot write a proper group by):
select c.*,
nvl(
(select count(1)
from employee emp
where emp.company_id = c.company_id
),0) employee_count
from company c
where regexp_like(c.company_id, '^(000|001|002|003|004|005|006|007|008|009)')
Also, if performance is still an issue, I would consider modifying your where statement to not use a regexp.
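As a rough, runnable illustration of the scalar-subquery count combined with a prefix range instead of a regexp, here is a sketch using Python's sqlite3 module in place of Oracle. It assumes COMPANY_ID is stored as a 10-character string (so leading zeros survive), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table company  (company_id text primary key, name text);
    create table employee (employee_id integer primary key,
                           company_id text references company(company_id));
    insert into company values
        ('0001234567', 'A'), ('0099999999', 'B'), ('0101234567', 'C');
    insert into employee (company_id) values
        ('0001234567'), ('0001234567'), ('0101234567');
""")

rows = conn.execute("""
    select c.company_id,
           -- the scalar subquery runs per company; count(*) is 0, never NULL,
           -- when a company has no employees
           (select count(*) from employee e
             where e.company_id = c.company_id) as employee_count
    from company c
    -- prefix range check replacing the '^(000|...|009)' regexp
    where substr(c.company_id, 1, 3) between '000' and '009'
    order by c.company_id
""").fetchall()
```

Company B has no employees and still appears, with a count of 0; company C is filtered out by the prefix range.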
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Addendum
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I see that the question explicitly identifies that the employee table has company_id as a foreign key. Since this is clarified, I am removing this statement:
The data model for these tables is not intuitive (would you not have
company_id as a foreign key in the employees table?).

Rownum in the join condition

Recently I fixed a bug: there was a rownum in the join condition.
Something like this: left join t1 on t1.id=t2.id and rownum<2. So it was supposed to return only one row regardless of the “left join”.
When I looked further into this, I realized that I don’t understand how Oracle evaluates rownum in the "left join" condition.
Let’s create two sample tables: master and detail.
create table MASTER
(
ID NUMBER not null,
NAME VARCHAR2(100)
)
;
alter table MASTER
add constraint PK_MASTER primary key (ID);
prompt Creating DETAIL...
create table DETAIL
(
ID NUMBER not null,
REF_MASTER_ID NUMBER,
NAME VARCHAR2(100)
)
;
alter table DETAIL
add constraint PK_DETAIL primary key (ID);
alter table DETAIL
add constraint FK_DETAIL_MASTER foreign key (REF_MASTER_ID)
references MASTER (ID);
prompt Disabling foreign key constraints for DETAIL...
alter table DETAIL disable constraint FK_DETAIL_MASTER;
prompt Loading MASTER...
insert into MASTER (ID, NAME)
values (1, 'First');
insert into MASTER (ID, NAME)
values (2, 'Second');
commit;
prompt 2 records loaded
prompt Loading DETAIL...
insert into DETAIL (ID, REF_MASTER_ID, NAME)
values (1, 1, 'REF_FIRST1');
insert into DETAIL (ID, REF_MASTER_ID, NAME)
values (2, 1, 'REF_FIRST2');
insert into DETAIL (ID, REF_MASTER_ID, NAME)
values (3, 1, 'REF_FIRST3');
commit;
prompt 3 records loaded
prompt Enabling foreign key constraints for DETAIL...
alter table DETAIL enable constraint FK_DETAIL_MASTER;
set feedback on
set define on
prompt Done.
Then we have this query :
select * from master t
left join detail d on d.ref_master_id=t.id
The result set is predictable: we have all the rows from the master table and 3 rows from the detail table that matched this condition d.ref_master_id=t.id.
Result Set
Then I added “rownum=1” to the join condition and the result was the same
select * from master t
left join detail d on d.ref_master_id=t.id and rownum=1
The most interesting thing is that I set “rownum<-666” and got the same result again!
select * from master t
left join detail d on d.ref_master_id=t.id and rownum<-666.
Judging by the result set, this condition was evaluated as “True” for 3 rows in the detail table. But if I use “inner join”, everything works as expected.
select * from master t
join detail d on d.ref_master_id=t.id and rownum<-666.
This query doesn’t return any rows, because I can't imagine rownum being less than -666 :-)
Moreover, if I use oracle syntax for outer join, using “(+)” everything goes well too.
select * from master m ,detail t
where m.id=t.ref_master_id(+) and rownum<-666.
This query doesn’t return any rows either.
Can anyone tell me what I am misunderstanding about outer join and rownum?
ROWNUM is a pseudo-attribute of result sets, not of base tables. ROWNUM is defined after rows are selected, but before they're sorted by an ORDER BY clause.
edit: I was mistaken in my previous writeup of ROWNUM, so here's new information:
You can use ROWNUM in a limited way in the WHERE clause, for testing if it's less than a positive integer only. See ROWNUM Pseudocolumn for more details.
SELECT ... WHERE ROWNUM < 10
It's not clear what value ROWNUM has in the context of a JOIN clause, so the results may be undefined. There seems to be some special-case handling of expressions with ROWNUM, for instance WHERE ROWNUM > 10 always returns false. I don't know how ROWNUM<-666 works in your JOIN clause, but it's not meaningful so I would not recommend using it.
In any case, this doesn't help you to fetch the first detail row for each given master row.
To solve this you can use analytic functions and PARTITION, and combine it with Common Table Expressions so you can access the row-number column in a further WHERE condition.
WITH numbered_cte AS (
SELECT t.id, t.name, d.id AS detail_id, d.name AS detail_name,
ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY d.id) AS rn -- order by whichever detail column defines "first"
FROM master t LEFT OUTER JOIN detail d ON d.ref_master_id = t.id
)
SELECT *
FROM numbered_cte
WHERE rn = 1;
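To see what the numbered-rows approach returns for the sample data from the question, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or later). Ordering by d.id to decide which detail row counts as "first" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table master (id integer primary key, name text);
    create table detail (id integer primary key, ref_master_id integer, name text);
    insert into master values (1, 'First'), (2, 'Second');
    insert into detail values
        (1, 1, 'REF_FIRST1'), (2, 1, 'REF_FIRST2'), (3, 1, 'REF_FIRST3');
""")

rows = conn.execute("""
    with numbered as (
        select t.id as master_id, t.name as master_name,
               d.id as detail_id, d.name as detail_name,
               -- number the detail rows separately within each master row
               row_number() over (partition by t.id order by d.id) as rn
        from master t
        left join detail d on d.ref_master_id = t.id
    )
    select master_id, master_name, detail_id, detail_name
    from numbered
    where rn = 1
    order by master_id
""").fetchall()
```

Each master row appears exactly once: master 1 with its first detail row, and master 2 with NULLs because the left join found no detail rows for it.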
If you want only the first rows of the join result, wrap the join in a subquery and filter on rownum outside it (rownum<3 returns the first two rows):
select *
from (select *
from master t left join detail d on d.ref_master_id=t.id)
where rownum<3;
You will get the required output. Take care to give the columns unambiguous names when using *.
Let me give a complete answer which you can run directly without making any changes to the code.
select *
from (select t.id, t.name, d.id as detail_id, d.ref_master_id, d.name as detail_name
from master t left join detail d on d.ref_master_id=t.id)
where rownum<3;
A ROWNUM filter doesn't make any sense in a join, but it isn't being rejected as invalid.
The explain plan will either include the ROWNUM filter or exclude it. If it includes it, it will apply the filter to the detail table after applying the other join condition(s). So if you put in ROWNUM=100 (which will never be satisfied) all the detail rows are excluded and then the outer join kicks in.
If you put in ROWNUM=1 it seems to drop the filter.
And if you query
with
a as (select rownum a_val from dual connect by level < 10),
b as (select rownum*2 b_val from dual connect by level < 10)
select * from a left join b on a_val < b_val and rownum in (1,3);
you get something totally weird.
It probably should be rejected as an error, so expect nonsensical things to happen.

Resources