ClickHouse correlated subquery - clickhouse

I've some table. Each date contains data snapshots for 15 days ago and 50 days ahead. If I'll go to my table and run SELECT WHERE date = '2022-01-27' I will get rows with block_date from 2022-01-13 to 2022-03-18. I need to build reports per month, so at date = '2022-01-27' I need to see rows with block_date from 2022-01-01 to 2022-01-31. I can find this missing information if I go back a day.
I tried to run query below to achieve this, but I got an error and according to these docs: https://clickhouse.com/docs/en/sql-reference/operators/exists/ I can't use EXISTS here. How can I modify my query? I don't have any ideas.
SELECT
`date`,
block_date,
total_plan_volume_grp20
FROM schema.table
UNION ALL
SELECT
addDays(`date`, 1),
block_date,
total_plan_volume_grp20
FROM schema.table t1
where exists (
select true
from schema.table t2
where t2.`date` = addDays(t1.`date`, 1))
UNION ALL
SELECT
addDays(`date`, 2),
block_date,
total_plan_volume_grp20
FROM schema.table t1
where exists (
select true
from schema.table t2
where t2.`date` = addDays(t1.`date`, 2)

Related

Oracle SQL Developer get table rows older than n months

In Oracle SQL Developer, I have a table called t1 who have two columns col1 defined as NUMBER(19,0) and col2 defined as TIMESTAMP(3).
I have these rows
col1 col2
1 03/01/22 12:00:00,000000000
2 03/01/22 13:00:00,000000000
3 26/11/21 10:27:11,750000000
4 26/11/21 10:27:59,606000000
5 16/12/21 11:47:04,105000000
6 16/12/21 12:29:27,101000000
My sysdate looks like this:
select sysdate from dual;
SYSDATE
03/03/22
I want to create a stored procedure (SP) which will delete rows older than 2 months and displayed message n rows are deleted
But when i execute this statement
select * from t1 where to_date(TRUNC(col2), 'DD/MM/YY') < add_months(sysdate, -2);
I don't get the first 2 rows of my t1 table. I get more than 2 rows
1 03/01/22 12:00:00,000000000
2 03/01/22 13:00:00,000000000
How can i get these rows and deleted it please ?
In Oracle, a DATE data type is a binary data type consisting of 7 bytes (century, year-of-century, month, day, hour, minute and second). It ALWAYS has all of those components and it is NEVER stored with a particular formatting (such as DD/MM/RR).
Your client application (i.e. SQL Developer) may choose to DISPLAY the binary DATE value in a human readable manner by formatting it as DD/MM/RR but that is a function of the client application you are using and not the database.
When you show the entire value:
SELECT TO_CHAR(ADD_MONTHS(sysdate, -2), 'YYYY-MM-DD HH24:MI:SS') AS dt FROM DUAL;
Then it outputs (depending on time zone):
DT
2022-01-03 10:11:28
If you compare that to your values then you can see that 2022-01-03 12:00:00 is not "more than 2 months ago" so it will not be matched.
What you appear to want is not "more than 2 months ago" but "equal to or more than 2 months, ignoring the time component, ago"; which you can get using:
SELECT *
FROM t1
WHERE col2 < add_months(TRUNC(sysdate), -2) + INTERVAL '1' DAY;
or
SELECT *
FROM t1
WHERE TRUNC(col2) <= add_months(TRUNC(sysdate), -2);
(Note: the first query would use an index on col2 but the second query would not; it would require a function-based index on TRUNC(col2) instead.)
Also, don't use TO_DATE on a column that is already a DATE or TIMESTAMP data type. TO_DATE takes a string as the first argument and not a DATE or TIMESTAMP so Oracle will perform an implicit conversion using TO_CHAR and if the format models do not match then you will introduce errors (and since any user can set their own date format in their session parameters at any time then you may get errors for one user that are not present for other users and is very hard to debug).
db<>fiddle here
Perhaps just:
select *
from t1
where col2 < add_months(sysdate, -2);

Oracle query to find records took more than 24hrs to process

I am having a situation were I have to find out such records from the tables who takes more than 24 hrs two load in DW.
so for this I am having two tables
Table 1 :- Which contains the stats about each and every load
Table 2 :- Which contains the stats about when we received the each file to load
Now I want only those records which took more than 24 hrs to load.
The date on which I have received a file is in table 2 whereas when its load is finished in in table 1, so table2 may have more than 1 entries for each file.
I have developed a below query but it's taking more time
SELECT
rcd.file_date,
rcd.recived_on as "Date received On",
rcd.loaded_On "Date Processed On",
to_char(rcd.recived_on,'DY') as "Day",
round((rcd.loaded_On - rcd.recived_on)*24,2) as "time required"
FROM (
SELECT
tbl1.file_date,
(SELECT tbl2.recived_on
FROM ( SELECT recived_on
FROM table2
Where fileName = tbl1.feedName
order by recived_on) tbl2
WHERE rownum = 1) recived_on,
tbl1.loaded_On,
to_char(tbl2.recived_on,'DY'),
round((tbl1.loaded_On - tbl2.recived_on)*24,2)
FROM Table1 tbl1 ,
Table1 tbl2
WHERE
tbl1.id=tbl2.id
AND tbl1.FileState = 'Success'
AND trunc(loaded_On) between '25-Feb-2020' AND '03-Mar-2020'
) rcd
WHERE (rcd.loaded_On - rcd.recived_on)*24 > 24;
I think a lot of your problem most likely stems from the use of the subquery in your column list of your inner query. Maybe try using an analytic function instead. Something like this:
SELECT rcd.file_date,
rcd.recived_on AS "Date received On",
rcd.loaded_On "Date Processed On",
to_char(rcd.recived_on, 'DY') AS "Day",
round((rcd.loaded_On - rcd.recived_on) * 24, 2) AS "time required"
FROM (SELECT tbl1.file_date,
MIN(tbl2.recived_on) OVER (PARTITION BY tbl2.filename) AS recived_on,
tbl1.loaded_On
FROM Table1 tbl1
INNER JOIN Table1 tbl2 ON tbl1.id = tbl2.id
WHERE tbl1.FileState = 'Success'
AND trunc(loaded_On) BETWEEN '25-Feb-2020' AND '03-Mar-2020') rcd
WHERE (rcd.loaded_On - rcd.recived_on) * 24 > 24;
Also, you were selecting some columns in the inner query and not using them, so I removed them.

Add next unique value to SQL column

I have two tables which I am trying to join based on two criteria. One of the criteria is that a date from t1 is between a date in t2 and the next date in t2. The other is that the name from t1 matches the name from t2.
I.e. if t2 looks like this:
Record Name Date
1 A1234 2016-01-03 04:58:00
2 A1234 2015-12-15 08:34:00
3 A5678 2016-01-04 03:14:00
4 A1234 2016-01-05 21:06:00
Then:
Any records from t1 for Name A1234 with dates between 2016-01-03 04:58:00 and 2016-01-05 21:06:00 would be joined to record 1.
Any records from t1 for Name A1234 with dates between 2015-12-15 08:34:00 and 2016-01-03 04:58:00 would be joined to record 2
Any records from t1 for A1234 after the date of record 4 would be joined to record 4
Any records from t1 for A5678 would be joined to record 3 because there's only one date.
My initial approach is to use a correlated subquery to find the next date. However, due to a large number of records, I determined this would take over a year to execute because it searches all of t2 for the next later date during each iteration. Original SQLite:
CREATE TABLE outputtable AS SELECT * FROM t1, t2 d
WHERE t1.Name = d.Name AND t1.Date BETWEEN d.Date AND (
SELECT * FROM (
SELECT Date from t2
WHERE t2.Name = d.Name
ORDER BY Date ASC )
WHERE Date > d.Date
LIMIT 1 )
Now, I would like to find the next date only once for all records in t2 and create a new column in t2 that contains the next date. This way, I only search for the next date about 400,000 times instead of 56 billion times, significantly improving my performance.
Thus the output of the query I'm looking for would make t2 look like this:
Record Name Date Next_Date
1 A1234 2016-01-03 04:58:00 2016-01-05 21:06:00
2 A1234 2015-12-15 08:34:00 2016-01-03 04:58:00
3 A5678 2016-01-04 03:14:00 2999-12-31 23:59:59
4 A1234 2016-01-05 21:06:00 2999-12-31 23:59:59
Then I would be able to simply query whether t1.Date is between t2.Date and t2.Next_Date.
How can I build a query that will add the next date to a new column in t2?
Rather than add the new column, you should just be able to use a query like the one below to join the tables:
SELECT
T1.*,
T2_1.*
FROM
T1
INNER JOIN T2 T2_1 ON
T2_1.Name = T1.Name AND
T2_1.some_date < T1.some_date
LEFT OUTER JOIN T2 T2_2 ON
T2_2.Name = T1.Name AND
T2_2.some_date > T2_1.some_date
LEFT OUTER JOIN T2 T2_3 ON
T2_3.Name = T1.Name AND
T2_3.some_date > T2_1.some_date AND
T2_3.some_date < T2_2.some_date
WHERE
T2_3.Name IS NULL
You can do the same with NOT EXISTS, but this method often has better performance.
You can speed up (sub)queries by using proper indexes.
To check which indexes are actually used, use EXPLAIN QUERY PLAN.
Your original query, without any indexes, would be executed by SQLite 3.10.0 like this:
0|0|0|SCAN TABLE t1
0|1|1|SEARCH TABLE t2 AS d USING AUTOMATIC COVERING INDEX (name=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SCAN TABLE t2
1|0|0|USE TEMP B-TREE FOR ORDER BY
(The "automatic" index is created temporarily just for this query; the optimizer has estimated that this would still be faster than not using any index.)
In this case, you get the most optimal query plan by indexing all columns used for lookups:
create index i1nd on t1(name, date);
create index i2nd on t2(name, date);
0|0|1|SCAN TABLE t2 AS d
0|1|0|SEARCH TABLE t1 USING INDEX i1nd (name=? AND date>? AND date<?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE t2 USING COVERING INDEX i2nd (name=? AND date>?)
I've used this method on tables with around 1 mil rows with success. Obviously, creating an index that will cover this query will help performance.
This approach uses RANK to create a value to join against. After creating the RANK in a CTE (I use this for readability reasons, please correct for style or personal preference), use a sub-query to join rnk to rnk + 1; aka the next date.
Here's an example of what the code looks like using your sample values.
IF OBJECT_ID('tempdb..#T2') IS NOT NULL
DROP TABLE #T2
CREATE TABLE #T2
(
Record INT NOT NULL PRIMARY KEY,
Name VARCHAR(10),
[DATE] DATETIME,
)
INSERT INTO #T2
VALUES (1, 'A1234', '2016-01-03 04:58:00'),
(2, 'A1234', '2015-12-15 08:34:00'),
(3, 'A5678', '2016-01-04 03:14:00'),
(4, 'A1234', '2016-01-05 21:06:00');
WITH Rank_Dates
AS (Select *
,rank() OVER(PARTITION BY #t2.name ORDER BY #t2.date DESC) AS rnk
FROM #T2)
select RD1.Record,
RD1.Name,
RD1.DATE,
COALESCE (RD2.DATE, '2999-12-31 23:59:59') AS NEXT_DATE
FROM Rank_Dates RD1
LEFT JOIN Rank_Dates RD2
ON RD1.rnk = RD2.rnk + 1
AND RD1.Name = RD2.Name
ORDER BY RD1.Record -- ORDER BY is optional
;
EDIT: added code output below.
The code above produces the following output.
Record Name DATE NEXT_DATE
1 A1234 2016-01-03 04:58:00.000 2016-01-05 21:06:00.000
2 A1234 2015-12-15 08:34:00.000 2016-01-03 04:58:00.000
3 A5678 2016-01-04 03:14:00.000 2999-12-31 23:59:59.000
4 A1234 2016-01-05 21:06:00.000 2999-12-31 23:59:59.000
On a random note. Would using the CURRENT_TIMESTAMP in place of hard coding '2999-12-31 23:59:59.000' produce a similar result?

Aggregate only new rows from source table

I got one Source table with a timestamp column (YYYY.MM.DD HH24:MI:SS) and a target table with aggregated rows on daily basis (Date column: YYYY.MM.DD).
My Problem is: How do I bring new data from source to target and aggregate it?
I tried:
select
a.Sales,
trunc(a.timestamp,'DD') as TIMESTAMP,
count(1) as COUNT,
from
tbl_Source a
where trunc(a.timestamp,'DD') > nvl((select MAX(b.TIME_TO_DAY)from tbl_target b), to_date('01.01.1975 00:00:00','dd.mm.yyyy hh24:mi:ss'))
group by a.sales,
trunc(a.Timestamp,'DD')
The problem with that is: when I have a row with timestamp '2013.11.15 00:01:32' and the max day from target is the 14th of november, it will only aggregate the 15th. Would I use >= instead of > some rows would get loaded twice.
It looks like you are looking for a merge statement: If the day is already present in tbl_target then update the count else insert the record.
merge into tbl_target dest
using
(
select sales, trunc(timestamp) as theday , count(*) as sales_count
from tbl_Source
where trunc(timestamp) >= ( select nvl(max(time_to_day),to_date('01.01.1975','dd.mm.yyyy')) from tbl_target )
group by sales, trunc(timestamp)
) src
on (src.theday = dest.time_to_day)
when matched then update set
dest.sales_count = src.sales_count
when not matched then
insert (time_to_day, sales_count)
values (src.theday, src.sales_count)
;
As far as I can understand your question: you need to get everything since the last reload to target table.
The problem here: you need this date, but it is truncated during the update.
If my guesses are correct you cannot do anything except to store the date of reload as an additional column because there is no way to get it back from the data presented here.
about your query:
count(*) and count(1) are the same in performance (proved many times, at least in 10-11 versions) - do not make this count(1), looks really ugly
do not use nvl, use coalesce instead of it - it is much faster
I would write your query like that:
with t as (select max(b.time_to_day) mx from tbl_target b)
select a.sales,trunc(a.timestamp,'dd') as timestamp,count(*) as count
from tbl_source a,t
where trunc(a.timestamp,'dd') > t.mx or t.mx is null
group by a.sales,trunc(a.timestamp,'dd')
Does this fit your needs:
WHERE trunc(a.timestamp,'DD') > nvl((select MAX(b.TIME_TO_DAY) + 1 - 1/(24*60*60) from tbl_target b), to_date('01.01.1975 00:00:00','dd.mm.yyyy hh24:mi:ss'))
i.e. instead of 2013-11-15 00:00:00 compare to 2013-11-16 23:59:59
Update:
This one?
WHERE trunc(a.timestamp,'DD') BETWEEN nvl((select MAX(b.TIME_TO_DAY) from ...) AND nvl((select MAX(b.TIME_TO_DAY) + 1 - 1/(24*60*60) from ...)

"BETWEEN" SQL Keyword for Oracle Dates -- Getting an error in Oracle

I have dates in this format in my database "01-APR-12" and the column is a DATE type.
My SQL statement looks like this:
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '31-APR-12');
When I try to do it that way, I get this error -- ORA-01839: date not valid for month specified.
Can I even use the BETWEEN keyword with how the date is setup in the database?
If not, is there another way I can get the output of data that is in that date range without having to fix the data in the database?
Thanks!
April has 30 days not 31.
Change
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '31-APR-12');
to
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '30-APR-12');
and you should be good to go.
In case the dates you are checking for range from 1st day of a month to the last day of a month then you may modify the query to avoid the case where you have to explicitly check the LAST day of the month
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno
AND s.salestype = 1 AND (s.salesdate BETWEEN '01-APR-12' AND LAST_DAY(TO_DATE('APR-12', 'MON-YY'));
The LAST_DAY function will provide the last day of the month.
The other answers are missing out on something important and will not return the correct results. Dates have date and time components. If your salesdate column is in fact a date that includes time, you will miss out on any sales that happened on April 30 unless they occurred exactly at midnight.
Here's an example:
create table date_temp (temp date);
insert into date_temp values(to_date('01-APR-2014 15:12:00', 'DD-MON-YYYY HH24:MI:SS'));
insert into date_temp values(to_date('30-APR-2014 15:12:00', 'DD-MON-YYYY HH24:MI:SS'));
table DATE_TEMP created.
1 rows inserted.
1 rows inserted.
select * from date_temp where temp between '01-APR-2014' and '30-APR-2014';
Query Result: 01-APR-14
If you want to get all records from April that includes those with time-components in the date fields, you should use the first day of the next month as the second side of the between clause:
select * from date_temp where temp between '01-APR-2014' and '01-MAY-2014';
01-APR-14
30-APR-14

Resources