MonetDB SELECT query slow when we don't add single quote in where clause - monetdb

I have a table around 1.5TB, with 23 billion records. When I try to get the number of rows for a specific date, it takes around 30 mins if the date is not enclosed in single quotes, but takes 2 mins if it is. The date field is of type DECIMAL(8)
This query takes 2 mins:
select count(*) from rwow where date = '20160222';
But this one takes 30 mins:
select count(*) from rwow where date = 20160222;
They both return the same result. Could someone help me understand why this is?

Related

Oracle SQL Developer get table rows older than n months

In Oracle SQL Developer, I have a table called t1 who have two columns col1 defined as NUMBER(19,0) and col2 defined as TIMESTAMP(3).
I have these rows
col1 col2
1 03/01/22 12:00:00,000000000
2 03/01/22 13:00:00,000000000
3 26/11/21 10:27:11,750000000
4 26/11/21 10:27:59,606000000
5 16/12/21 11:47:04,105000000
6 16/12/21 12:29:27,101000000
My sysdate looks like this:
select sysdate from dual;
SYSDATE
03/03/22
I want to create a stored procedure (SP) which will delete rows older than 2 months and displayed message n rows are deleted
But when i execute this statement
select * from t1 where to_date(TRUNC(col2), 'DD/MM/YY') < add_months(sysdate, -2);
I don't get the first 2 rows of my t1 table. I get more than 2 rows
1 03/01/22 12:00:00,000000000
2 03/01/22 13:00:00,000000000
How can i get these rows and deleted it please ?
In Oracle, a DATE data type is a binary data type consisting of 7 bytes (century, year-of-century, month, day, hour, minute and second). It ALWAYS has all of those components and it is NEVER stored with a particular formatting (such as DD/MM/RR).
Your client application (i.e. SQL Developer) may choose to DISPLAY the binary DATE value in a human readable manner by formatting it as DD/MM/RR but that is a function of the client application you are using and not the database.
When you show the entire value:
SELECT TO_CHAR(ADD_MONTHS(sysdate, -2), 'YYYY-MM-DD HH24:MI:SS') AS dt FROM DUAL;
Then it outputs (depending on time zone):
DT
2022-01-03 10:11:28
If you compare that to your values then you can see that 2022-01-03 12:00:00 is not "more than 2 months ago" so it will not be matched.
What you appear to want is not "more than 2 months ago" but "equal to or more than 2 months, ignoring the time component, ago"; which you can get using:
SELECT *
FROM t1
WHERE col2 < add_months(TRUNC(sysdate), -2) + INTERVAL '1' DAY;
or
SELECT *
FROM t1
WHERE TRUNC(col2) <= add_months(TRUNC(sysdate), -2);
(Note: the first query would use an index on col2 but the second query would not; it would require a function-based index on TRUNC(col2) instead.)
Also, don't use TO_DATE on a column that is already a DATE or TIMESTAMP data type. TO_DATE takes a string as the first argument and not a DATE or TIMESTAMP so Oracle will perform an implicit conversion using TO_CHAR and if the format models do not match then you will introduce errors (and since any user can set their own date format in their session parameters at any time then you may get errors for one user that are not present for other users and is very hard to debug).
db<>fiddle here
Perhaps just:
select *
from t1
where col2 < add_months(sysdate, -2);

Oracle - Show 0 if no data for the month

i'm trying to show some averages over the past 12 months but there is no data for June/July so i want the titles for the months to display but just 0's in the 3 columns
currently it's only showing August - May which is 10 rows so it's throwing off formulas and charts etc.
select to_char(Months.Period,'YYYY/MM') As Period, coalesce(avg(ec.hours_reset),0) as AvgOfHOURSReset, coalesce(AVG(ec.cycles_reset),0) as AvgofCycles_Reset, Coalesce(AVG(ec.days_reset),0) as AvgofDAYS_Reset
from (select distinct reset_date as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months
left outer join engineering_compliance ec on ec.reset_date = months.Period
WHERE EC.EO = 'AT CHECK'
group by to_char(Months.Period,'YYYY/MM')
order by to_char(Months.Period,'YYYY/MM')
;
(select distinct to_char(reset_date,'YYYY/MM') as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months;
That query is pretty good, it's not far from working.
You would need to replace the Months table part. You want exactly one row per month, regardless of whether there's any data in the ec table.
You could maybe synthesize some data without going to any actual table in your own schema.
For example:
SELECT
extract(month from add_months(sysdate,level-1)) Row_Month,
extract(year from add_months(sysdate,level-1)) Row_Year,
to_char(add_months(sysdate,level-1),'YYYY/MM') Formatted_Date,
trunc(add_months(sysdate,level-1),'mon') Join_Date
FROM dual
CONNECT BY level <= 12;
gives:
ROW_MONTH,ROW_YEAR,FORMATTED_DATE,JOIN_DATE
6,2016,'2016/06',1/06/2016
7,2016,'2016/07',1/07/2016
8,2016,'2016/08',1/08/2016
9,2016,'2016/09',1/09/2016
10,2016,'2016/10',1/10/2016
11,2016,'2016/11',1/11/2016
12,2016,'2016/12',1/12/2016
1,2017,'2017/01',1/01/2017
2,2017,'2017/02',1/02/2017
3,2017,'2017/03',1/03/2017
4,2017,'2017/04',1/04/2017
5,2017,'2017/05',1/05/2017
Option 1: Write that subselect inline into your query, replacing sysdate with the start month and the figure 12 on the last line can be altered for the number of months you want in the series.
Option 2 (can be reused more conveniently in a variety of situations and queries): Write a view with a long series of months (for example, Jan 1970 to Dec 2199) using my SQL above. You can then join to that view on join_date with whatever start and end months you want. It will give you one row per month and you can pick up the formatted date from its column.

95 percentile hourly data per day in HP Vertica

I was attempting to find the 95 percentile of all the values per hour and display them at daily level. Here is snippet of the code I am working on:
select distinct columnA
,date(COLLECTDATETIME) as date_stamp
,hour(COLLECTDATETIME) as hour_stamp
,PERCENTILE_DISC(0.95) WITHIN GROUP(order by PARAMETER_VALUE)
over (PARTITION BY hour(COLLECTDATETIME)) as max_per_day
from TableA
where
columnA = 'abc'
and PARAMETER_NAME = 'XYZ';
Right now the result set gives me the same value per hour each day, but it doesn't the 95 percentile value for a given hour per day.
Just a thought, but have you tried converting PARAMETER_VALUE into one of the data types that are accepted by the ORDER BY expression (INTEGER, FLOAT, INTERVAL, or NUMERIC)?
For example, you could try WITHIN GROUP(order by PARAMETER_VALUE::FLOAT).
You need to add an aggregate query on the top of the subquery (the percentile). Either max/min (because in each scope the percentiles are the same) percentile_disc is an analytics function but not aggregate function
SELECT dateid,
hour,
MAX(max_per_day) as max_per_day
FROM (
SELECT date(COLLECTDATETIME) AS dateid,
hour(COLLECTDATETIME) AS hour,
percentile_disc(0.95) WITHIN GROUP(order by PARAMETER_VALUE) OVER (PARTITION BY date(COLLECTDATETIME), hour(COLLECTDATETIME)) as max_per_day
WHERE ......
)
GROUP BY dateid, hour

Oracle SQL To compare 1 or 2 or more dates to be within a given period

I have a scenario where I need to compare 2 or more dates for given period.
I'm able to succeed when comparing 1 date to a period using between function. But challenge is when I have 2 dates to compare in parallel, getting single row sub query error
select A
from ORDER
where Date1 between sysdate and (sysdate-10)
Above query works fine for single date, please help to get a solution when I have Date 1 and Date 2 and need to compare against the same period (sysdate and (sysdate-10)) and I may have more than 2 dates as well.
Thanks
Shankar
Not having a proper description of your tables or the data they contain, it is difficult to know what you want.
Perhaps something like:
SELECT A
FROM ORDER
GROUP BY A
HAVING COUNT( CASE WHEN datecolumn BETWEEN SYSDATE - 10 AND SYSDATE THEN 1 ELSE NULL END ) > 0

Optimised Hive query with JOIN , having million records

I have 2 tables-
bpm_agent_data - 40 Million records , 5 Columns
bpm_loan_data - 20 Million records, 5 Columns
Now I ran a query in Hive-
select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data where bpm_loan_data.id = bpm_agent_data.id;
which is taking long long time to complete.
What should be the ideal way to write the query in HIVE so that Reducer must not take so much time.
Found the solution for the above query,
replaced where with ON
select count(bpm_agent_data.AgentID), count(bpm_loan_data.LoanNumber) from bpm_agent_data JOIN bpm_loan_data ON( bpm_loan_data.id = bpm_agent_data.id);

Resources