In my database I have a table with a column that holds a code for each record (aside from the ID column). This field is unique, and each time the user inserts a record into the table, the first unused code should be assigned to it. The table currently contains the following codes:
+------+
| code |
+------+
|    1 |
|    2 |
|    3 |
|    5 |
+------+
I want a query to return 4 as the result.
Note that this query will run very frequently in my system, so the query with the minimum execution time would be appreciated.
Is using a self-join acceptable? If so:
-- your test data:
WITH data AS (SELECT 1 AS code FROM DUAL
UNION SELECT 2 FROM DUAL
UNION SELECT 3 FROM DUAL
UNION SELECT 5 FROM DUAL)
-- request:
SELECT COALESCE(MIN(d1.code+1),1)
FROM data d1 LEFT JOIN data d2 ON d1.code+1 = d2.code
WHERE d2.code IS NULL;
This builds the list of data.code values that have no successor; MIN(... + 1) then gives you the first empty slot. I used COALESCE(...) to handle the specific case where the data table is empty.
An alternate form using a sequence generator might lead to better performance, as it does not require the whole table to be traversed in order to evaluate the aggregate function MIN():
-- your test data:
WITH data AS (SELECT 1 AS code FROM DUAL
UNION SELECT 5 FROM DUAL
UNION SELECT 2 FROM DUAL
UNION SELECT 3 FROM DUAL)
-- request:
SELECT T.code FROM (SELECT d1.code
FROM (SELECT LEVEL code FROM DUAL CONNECT BY LEVEL < 9999) d1 LEFT JOIN data d2
ON d1.code = d2.code
WHERE d2.code IS NULL
ORDER BY d1.code ASC
) T WHERE ROWNUM < 2
The drawback is that you now have a hard-coded upper limit. It could be inferred dynamically from the data table though, so it is not really a blocker. I'll let you compare the timings yourself.
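For instance, here is a sketch (not tested against your schema) of how the upper bound could be derived from the data table itself, using MAX(code) + 1 so that at least one free slot is always generated; the bound is computed in an inline view to avoid putting a subquery in the CONNECT BY clause:
-- your test data:
WITH data AS (SELECT 1 AS code FROM DUAL
UNION SELECT 2 FROM DUAL
UNION SELECT 3 FROM DUAL
UNION SELECT 5 FROM DUAL)
-- request, with a dynamic upper bound:
SELECT T.code FROM (SELECT d1.code
    FROM (SELECT LEVEL code
          FROM (SELECT NVL(MAX(code), 0) + 1 AS upper_bound FROM data)
          CONNECT BY LEVEL <= upper_bound) d1
    LEFT JOIN data d2 ON d1.code = d2.code
    WHERE d2.code IS NULL
    ORDER BY d1.code ASC
) T WHERE ROWNUM < 2;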
this field is unique and each time the user tries to insert a record into the table, the first unused code should be assigned to the record
Please note, however, that this will lead to a race condition if two concurrent sessions try to insert a row at the same time. Given your example, both will try to insert a row with code = 4 -- obviously they cannot both succeed, as your column is unique...
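One possible way to cope with that race, sketched here in PL/SQL (my_table and its code column are placeholder names, not from your schema), is to recompute the code and retry whenever the unique constraint fires:
DECLARE
  l_code  my_table.code%TYPE;
BEGIN
  LOOP
    BEGIN
      -- recompute the first free code (same self-join as above)
      SELECT COALESCE(MIN(t1.code + 1), 1)
        INTO l_code
        FROM my_table t1
        LEFT JOIN my_table t2 ON t1.code + 1 = t2.code
       WHERE t2.code IS NULL;

      INSERT INTO my_table (code) VALUES (l_code);
      EXIT;  -- insert succeeded
    EXCEPTION
      WHEN DUP_VAL_ON_INDEX THEN
        NULL;  -- another session took this code first; loop and retry
    END;
  END LOOP;
END;
/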
I recently used the code below:
SELECT t1.id + 1
FROM your_table t1
LEFT OUTER JOIN your_table t2 ON (t1.id + 1 = t2.id)
WHERE t2.id IS NULL
/* and rownum = 1 -- you need a sub-select if you want this to work */
ORDER BY t1.id;
I run it every time that I need to insert a new row and use the minimum unused id.
I hope it works for your purposes.
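For reference, here is a sketch of the sub-select form hinted at in the commented-out line above, so that ROWNUM is applied only after the ordering (same placeholder table name):
SELECT min_unused_id
FROM (SELECT t1.id + 1 AS min_unused_id
      FROM your_table t1
      LEFT OUTER JOIN your_table t2 ON (t1.id + 1 = t2.id)
      WHERE t2.id IS NULL
      ORDER BY t1.id)
WHERE ROWNUM = 1;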
select min(unusedval) from (
select level unusedval from dual connect by level < 10
minus
select tno from t2);
You can change the LEVEL condition depending on the maximum value in your table.
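For example, here is a sketch of the same query with the bound derived from t2 itself (computed in an inline view to avoid a subquery inside the CONNECT BY clause):
select min(unusedval)
from (
  select level unusedval
  from (select nvl(max(tno), 0) + 1 as upper_bound from t2)
  connect by level <= upper_bound
  minus
  select tno from t2
);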
Imagine I have a table with some fields, one of which is an array of dates, as below:
col1  col2  alldate                        Max_date
1     2     ["2021-02-12","2021-02-13"]    "2021-02-13"
2     3     ["2021-01-12","2021-02-13"]    "2021-02-13"
4     4     ["2021-01-12"]                 "2021-01-12"
5     3     ["2021-01-11","2021-02-12"]    "2021-02-12"
6     7     ["2021-02-13"]                 "2021-02-13"
I need to write a query that selects only the rows which have the max date in their array. There is also a column which holds the max date for each row.
The select statement should show:
col1  col2  alldate                        Max_date
1     2     ["2021-02-12","2021-02-13"]    "2021-02-13"
2     3     ["2021-01-12","2021-02-13"]    "2021-02-13"
6     7     ["2021-02-13"]                 "2021-02-13"
The table is huge, so an optimized query is needed.
Till now I was thinking of:
select col1, col2, max_date
from t1
where array_contains((select max(max_date) from t1)::variant, alldate);
But to me it seems that running a subselect like this is a bad idea.
Any suggestions?
If you want pure speed, using LATERAL FLATTEN is 10% faster than the ARRAY_CONTAINS approach over 500,000,000 records on an XS warehouse. You can copy-paste the code below straight into Snowflake to test it for yourself.
Why is the lateral flatten approach faster?
If you look at the query plans, the optimiser filters at the first step (immediately culling records), whereas the ARRAY_CONTAINS version waits until the fourth step before doing the same. The filter is the QUALIFY on max(max_date) ...
Create Random Dataset:
create or replace table stack_overflow_68132958 as
SELECT
seq4() col_1,
UNIFORM (1, 500, random()) col_2,
DATEADD(day, UNIFORM (-40, -0, random()), current_date()) random_date_1,
DATEADD(day, UNIFORM (-40, -0, random()), current_date()) random_date_2,
DATEADD(day, UNIFORM (-40, -0, random()), current_date()) random_date_3,
ARRAY_CONSTRUCT(random_date_1, random_date_2, random_date_3) date_array,
greatest(random_date_1, random_date_2, random_date_3) max_date,
to_array(greatest(random_date_1, random_date_2, random_date_3)) max_date_array
FROM
TABLE (GENERATOR (ROWCOUNT => 500000000)) ;
Test Felipe/Mike approach -> 51secs
select
distinct
col_1
,col_2
from
stack_overflow_68132958
qualify
array_contains(max(max_date) over () :: variant, date_array);
Test Adrian approach -> 47 secs
select
distinct
col_1
, col_2
from
stack_overflow_68132958
, lateral flatten(input => date_array) g
qualify
max(max_date) over () = g.value;
I would likely use a CTE for this, like:
WITH x AS (
SELECT max(max_date) as max_max_date
FROM t1
)
select col1, col2, maxdate
from t1
cross join x
where array_contains(x.max_max_date::variant,alldate);
I have not tested the syntax exactly and the data types might vary things a bit, but the concept here is that the CTE will be VERY fast and return a single record with a single value. A MAX() function leverages metadata in Snowflake, so it won't even use a warehouse to get it.
That said, the Snowflake profiler is pretty smart, so your query might actually create the exact same query profile as this statement. Test them both and see what the Profile looks like to see if it truly makes a difference.
To build on Mike's answer, we can do everything in the QUALIFY, without the need for a CTE:
with t1 as (
select 'a' col1, 'b' col2, '2020-01-01'::date maxdate, array_construct('2020-01-01'::date, '2018-01-01', '2017-01-01') alldate
)
select col1, col2, alldate, maxdate
from t1
qualify array_contains((max(maxdate) over())::variant, alldate)
;
Note that you should be careful with types. Both of these are true:
select array_contains('2020-01-01'::date::string::variant, array_construct('2020-01-01', '2019-01-01'));
select array_contains('2020-01-01'::date::variant, array_construct('2020-01-01'::date, '2019-01-01'));
But this is false:
select array_contains('2020-01-01'::date::variant, array_construct('2020-01-01', '2019-01-01'));
You have some great answers already, which I only saw after I wrote mine up.
If your data types match, you should be good to go; copy-paste this directly into Snowflake and it should work.
create or replace schema abc;
use schema abc;
create or replace table myarraytable(col1 number, col2 number, alldates variant, max_date timestamp_ltz);
insert into myarraytable
select 1,2,array_construct('2021-02-12'::timestamp_ltz,'2021-02-13'::timestamp_ltz), '2021-02-13'
union
select 2,3,array_construct('2021-01-12'::timestamp_ltz,'2021-02-13'::timestamp_ltz),'2021-02-13'
union
select 4,4,array_construct('2021-01-12'::timestamp_ltz) , '2021-01-12'
union
select 5,3,array_construct('2021-01-11'::timestamp_ltz,'2021-02-12'::timestamp_ltz) , '2021-02-12'
union
select 6,7,array_construct('2021-02-13'::timestamp_ltz) , '2021-02-13';
select * from myarraytable
order by 1 ;
WITH cte_max AS (
SELECT max(max_date) as max_date
FROM myarraytable
)
select myarraytable.*
from myarraytable, cte_max
where array_contains(cte_max.max_date::variant, alldates)
order by 1 ;
I have a table in a MySQL database with this data:
id  date        close  previous_close
1   07-10-2020  200    300
2   06-10-2020  300    1000
3   05-10-2020  0      1000
4   04-10-2020  1000   15
I've had a look at using the lag() function but can't get my head round it. How can I craft a query such that the calculated column previous_close obtains the most recently available value in the close column where it's not zero?
Here is one way:
WITH cte AS (
SELECT *,
MAX(CASE WHEN close > 0 THEN date END) OVER
(ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS last_date
FROM yourTable
)
SELECT
t1.id,
t1.date,
t1.close,
t2.close AS previous_close
FROM cte t1
LEFT JOIN yourTable t2
ON t2.date = t1.last_date
ORDER BY t1.date DESC;
The strategy here is that the CTE finds the latest date corresponding to a non-zero close occurring strictly before each given row, as sorted by date. Then, all we have to do is join this CTE back to your original table to find the matching latest previous close value.
I have a table student_info with a column "status"; status can be P (present), A (absent), S (ill), T (transfer), or L (left).
I am looking for the expected output below:
status  count(*)
P       12
S       1
A       2
T       0
L       0
But the output comes out as below:
Status  Count(*)
P       12
S       1
A       2
We need rows for statuses T and L as well, with a count of zero, even though no records exist for them in the DB.
#mkuligowski's approach is close, but you need an outer join from the CTE providing all of the possible status values to your real table, and then you need to count only the entries that actually match:
-- CTE to generate all possible status values
with stored_statuses (status) as (
select 'A' from dual
union all select 'L' from dual
union all select 'P' from dual
union all select 'S' from dual
union all select 'T' from dual
)
select ss.status, count(si.status)
from stored_statuses ss
left join student_info si on si.status = ss.status
group by ss.status;
STATUS COUNT(SI.STATUS)
------ ----------------
P 12
A 2
T 0
S 1
L 0
The CTE acts as a dummy table holding the five statuses you want to count. That is then outer joined to your real table - the outer join means the rows from the CTE are still included even if there is no match - and then the rows that are matched in your table are counted. That allows the zero counts to be included.
You could also do this with a collection:
select ss.status, count(si.status)
from (
select column_value as status from table(sys.odcivarchar2list('A','L','P','S','T'))
) ss
left join student_info si on si.status = ss.status
group by ss.status;
It would be preferable to have a physical table which holds those values (and their descriptions); you could also then have a primary/foreign key relationship to enforce the allowed values in your existing table.
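As a sketch of what that could look like (the table, column, and constraint names here are just placeholders):
-- lookup table holding the allowed status codes and their descriptions
create table statuses (
  status      varchar2(1) primary key,
  description varchar2(30) not null
);

insert into statuses values ('P', 'present');
insert into statuses values ('A', 'absent');
insert into statuses values ('S', 'ill');
insert into statuses values ('T', 'transfer');
insert into statuses values ('L', 'left');

-- enforce the allowed values on the existing table
alter table student_info
  add constraint student_info_status_fk
  foreign key (status) references statuses (status);

-- the zero-count query can then join from the lookup table
select s.status, count(si.status)
from statuses s
left join student_info si on si.status = s.status
group by s.status;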
If all the status values actually appear in your table, but you have a filter which happens to exclude all rows for some of them, then you could get the list of all (used) values from the table itself instead of hard-coding it.
If your initial query was something like this, with a completely made-up filter:
select si.status, count(*)
from student_info si
where si.some_condition = 'true'
group by si.status;
then you could use a subquery to get all the distinct values from the unfiltered table, outer join from that to the same table, and apply the filter as part of the outer join condition:
select ss.status, count(si.status)
from (
select distinct status from student_info
) ss
left join student_info si on si.status = ss.status
and si.some_condition = 'true'
group by ss.status;
It can't stay as a where clause (at least here, where it's applying to the right-hand-side of the outer join) because that would override the outer join and effectively turn it back into an inner join.
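To illustrate that point, here is a sketch of the WHERE-clause placement the paragraph warns against: the NULL-extended rows from the outer join fail the comparison and are discarded, so the unmatched statuses (and their zero counts) vanish.
-- anti-pattern: filtering the outer-joined table in the WHERE clause
select ss.status, count(si.status)
from (
  select distinct status from student_info
) ss
left join student_info si on si.status = ss.status
where si.some_condition = 'true'
group by ss.status;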
You should store your statuses somewhere (perhaps in another table). Otherwise, you can list them using a subquery:
with stored_statuses as (
select 'P' code, 'present' description from dual
union all
select 'A' code, 'absent' description from dual
union all
select 'S' code, 'ill' description from dual
union all
select 'T' code, 'transfer' description from dual
union all
select 'L' code, 'left' description from dual
)
select ss.code, count(*) from student_info si
left join stored_statuses ss on ss.code = si.status
group by ss.code
I have to fetch the first and last rows of a table in Toad.
I have used the following queries:
select * from grade_master where rownum=(select max(rownum) from grade_master)
select * from grade_master where rownum=1
The second query works to fetch the first row, but the first one does not work. Can anyone please help me?
Thanks in advance.
Such a request only makes sense if you specify a sort order for the results - there is no such thing in a database as "first" and "last" rows if a sort order is not specified.
SQL> with t as (
2 select 'X' a, 1 b from dual union all
3 select 'C' , 2 from dual union all
4 select 'A' a, 3 b from dual
5 )
6 select a, b, decode(rn, 1, 'First','Last')
7 from (
8 select a, b, row_number() over(order by a) rn,
9 count(*) over() cn
10 from t
11 )
12 where rn in (1, cn)
13 order by rn
14 /
A B DECOD
- ---------- -----
A 3 First
X 1 Last
In Oracle, the data is not ordered until you specify the order in your SQL statement.
So when you do:
select * from grade_master
Oracle will give the rows back in any order it wants.
OTOH if you do
select * from grade_master order by id desc
then Oracle will give the rows back ordered by id descending.
So to get the last row you could do this:
select *
from (select * from grade_master order by id desc)
where rownum = 1
ROWNUM is assigned BEFORE the "order by" clause is evaluated, so what this query does is order the rows descending (in the inner query) and then hand that ordered set to the outer query. The outer query takes the first row of the set and returns it.
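For contrast, here is a sketch of the naive single-level form that does not work: ROWNUM = 1 keeps whichever row Oracle happens to fetch first, and only that single row is then sorted.
select *
from grade_master
where rownum = 1
order by id desc;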
Problem: I need to write stored procedure(s) that will return a result set containing a single page of rows, plus the total number of rows.
Solution A: I create two stored procedures, one that returns a result set for a single page and another that returns a scalar -- the total number of rows. The Explain Plan says the first sproc has a cost of 9 and the second a cost of 3.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) AS RowNum, ...
) AS PageResult
WHERE RowNum >= #from
AND RowNum < #to
ORDER BY RowNum
SELECT COUNT(*)
FROM ...
Solution B: I put everything into a single sproc by adding the same TotalRows number to every row in the result set. This solution feels hackish, but it has a cost of 9 and needs only one sproc, so I'm inclined to use it.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) RowNum, COUNT(*) OVER () TotalRows, ...
     ) AS PageResult
WHERE RowNum >= #from
  AND RowNum < #to
ORDER BY RowNum;
Is there a best-practice for pagination in Oracle? Which of the aforementioned solutions is most used in practice? Is any of them considered just plain wrong? Note that my DB is and will stay relatively small (less than 10GB).
I'm using Oracle 11g and the latest ODP.NET with VS2010 SP1 and Entity Framework 4.4. I need the final solution to work within the EF 4.4. I'm sure there are probably better methods out there for pagination in general, but I need them working with EF.
If you're already using analytics (ROW_NUMBER() OVER ...) then adding another analytic function on the same partitioning will add a negligible cost to the query.
On the other hand, there are many other ways to do pagination, one of them using rownum:
SELECT *
FROM (SELECT A.*, rownum rn
FROM (SELECT *
FROM your_table
ORDER BY col) A
WHERE rownum <= :Y)
WHERE rn >= :X
This method will be superior if you have an appropriate index on the ordering column. In this case, it might be more efficient to use two queries (one for the total number of rows, one for the result).
Both methods are appropriate, but in general, if you want both the number of rows and a page of results, using analytics is more efficient because you only query the rows once.
In Oracle 12c you can use the OFFSET / FETCH row-limiting clause for pagination.
Example -
Suppose you have a table tab from which data needs to be fetched on the basis of a DATE column dt, in descending order, using pagination.
page_size:=5
select * from tab
order by dt desc
OFFSET nvl(page_no-1,1)*page_size ROWS FETCH NEXT page_size ROWS ONLY;
Explanation:
page_no=1
page_size=5
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY - Fetch 1st 5 rows only
page_no=2
page_size=5
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY - Fetch next 5 rows
and so on.
Reference pages -
https://dba-presents.com/index.php/databases/oracle/31-new-pagination-method-in-oracle-12c-offset-fetch
https://oracle-base.com/articles/12c/row-limiting-clause-for-top-n-queries-12cr1#paging
This may help:
SELECT * FROM
( SELECT deptno, ename, sal, ROW_NUMBER() OVER (ORDER BY ename) Row_Num FROM emp)
WHERE Row_Num BETWEEN 5 and 10;
A clean way to organize your SQL code could be through the WITH statement.
The reduced version further down also returns the total number of results and the total page count.
For example
WITH SELECTION AS (
SELECT FIELDA, FIELDB, FIELDC FROM TABLE),
NUMBERED AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
SELECTION.*
FROM SELECTION)
SELECT
(SELECT COUNT(*) FROM NUMBERED) TOTAL_ROWS,
NUMBERED.*
FROM NUMBERED
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
This code gives you a paged result set with two more fields:
TOTAL_ROWS - the total number of rows in your full SELECTION
RN - the row number of the record
It requires two parameters, :page_size and :page_number, to slice your SELECTION.
Reduced Version
The selection already includes the ROW_NUMBER() field:
WITH SELECTION AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
FIELDA,
FIELDB,
FIELDC
FROM TABLE)
SELECT
:page_number PAGE_NUMBER,
CEIL((SELECT COUNT(*) FROM SELECTION ) / :page_size) TOTAL_PAGES,
:page_size PAGE_SIZE,
(SELECT COUNT(*) FROM SELECTION ) TOTAL_ROWS,
SELECTION.*
FROM SELECTION
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
Try this:
select * from ( select * from "table" order by "column" desc ) where ROWNUM > 0 and ROWNUM <= 5;
I also faced a similar issue. I tried all the above solutions and none gave me better performance. I have a table with millions of records and I need to display them on screen in pages of 20. I did the following to solve the issue:
Add a new column ROW_NUMBER to the table.
Make the column the primary key, or add a unique index on it.
Use the population program (in my case, Informatica) to populate the column with the row number.
Fetch records from the table using a BETWEEN predicate (SELECT * FROM TABLE WHERE ROW_NUMBER BETWEEN LOWER_RANGE AND UPPER_RANGE), as sketched below.
This method is effective if we need to do an unconditional pagination fetch on a huge table.
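Here is a sketch of that fetch with the range computed from a page number (the bind names and the table name pagination_table are hypothetical, and the ROW_NUMBER column from the list above is called row_num here):
-- :page_no is 1-based, :page_size is the number of rows per page (20 in my case)
SELECT *
FROM   pagination_table
WHERE  row_num BETWEEN (:page_no - 1) * :page_size + 1
                   AND  :page_no * :page_size
ORDER  BY row_num;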
Sorry, this one works with sorting:
SELECT * FROM (SELECT ROWNUM rnum,a.* FROM (SELECT * FROM "tabla" order by "column" asc) a) WHERE rnum BETWEEN "firstrange" AND "lastrange";