I have two queries in Oracle SQL that are equivalent.
SELECT ... FROM TABLE WHERE timestamp = TO_DATE('2017-07-01', 'YYYY-MM-DD')
and
SELECT ... FROM TABLE WHERE
timestamp >= TO_TIMESTAMP('2017-07-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AND
timestamp < TO_TIMESTAMP('2017-07-02 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
Generally, I need to run this every day (automated), so the first query will suffice for the application. However, for the first few runs I need some custom date-time boundaries, so I might manually intervene and use the second query instead.
What I observed is that the first one runs faster. Under the hood, is this really the case? Is the performance difference significant? Can someone explain?
The devil is in the details.
1) How many records in the table?
2) How many records satisfy
timestamp = TO_DATE('2017-07-01', 'YYYY-MM-DD')
3) How many records satisfy
timestamp >= TO_TIMESTAMP('2017-07-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AND
timestamp < TO_TIMESTAMP('2017-07-02 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
4) Does the table have statistics collected? Does the timestamp column have histogram statistics?
5) Do you have an index on the timestamp column? Or might it be (sub)partitioned by timestamp?
It might be easier to just post the DDL for both the table and the index; that would be really helpful.
Assuming the timestamp column is indexed: in the first query you are looking up a single value, while in the second it is a range of values. Depending on statistics and many other factors, some of which are mentioned above, Oracle may choose to switch to a full table scan, for example, if it estimates that the range predicate returns so many rows that reading the table directly is less expensive.
I know it might be more questions than answers, but Oracle Database is very flexible, and with flexibility comes complexity. I hope some of the above information is helpful.
Also, a simple explain plan, SQL*Plus AUTOTRACE, or, best of all, a 10053 or 10046 trace can give a more definitive answer about what is going on.
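For instance, a minimal sketch of comparing the two plans (my_table and ts are hypothetical stand-ins for your real table and column names):

-- plan for the equality predicate
EXPLAIN PLAN FOR
SELECT * FROM my_table
WHERE ts = TO_DATE('2017-07-01', 'YYYY-MM-DD');
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- plan for the range predicate
EXPLAIN PLAN FOR
SELECT * FROM my_table
WHERE ts >= TO_TIMESTAMP('2017-07-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND ts < TO_TIMESTAMP('2017-07-02 00:00:00', 'YYYY-MM-DD HH24:MI:SS');
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

If one plan shows INDEX RANGE SCAN and the other TABLE ACCESS FULL, that alone explains the timing difference.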
There are records in the table for a particular date, but when I query with that value, I am unable to filter the records.
select * from TBL_IPCOLO_BILLING_MST
where LAST_UPDATED_DATE = '03-09-21';
The dates are in dd-mm-yy format.
To the answer by Valeriia Sharak, I would just add a few things since your question is tagged Oracle. I was going to add this as a comment to her answer, but it's too long.
First, it is bad practice to compare dates to strings. Your query, for example, would not even execute for me -- it would end with ORA-01843: not a valid month. That is because Oracle must do an implicit type conversion to convert your string "03-09-21" to a date and it uses the current NLS_DATE_FORMAT setting to do that (which in my system happens to be DD-MON-YYYY).
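You can check what your session would use for that implicit conversion with a quick query against the standard nls_session_parameters view:

SELECT value
FROM nls_session_parameters
WHERE parameter = 'NLS_DATE_FORMAT';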
Second, as was pointed out, your comparison is probably not matching rows due to LAST_UPDATED_DATE having hours, minutes, and seconds. But a more performant solution for that might be:
...
WHERE last_updated_date >= TO_DATE('03-09-21','DD-MM-YY')
AND last_updated_date < TO_DATE('04-09-21','DD-MM-YY')
This makes the comparison without wrapping last_updated_date in a TRUNC() function. This could perform better in either of the following circumstances (an index sketch follows this list):
If there is an index on last_updated_date that would be useful in your query
If the table with last_updated_date is large and is being joined to other tables (because it makes it easier for Oracle to estimate the number of rows from your table that are inputs to the join)
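For example, a plain index like this one (index name is hypothetical) can service the range predicate directly:

CREATE INDEX ix_billing_last_upd
ON tbl_ipcolo_billing_mst (last_updated_date);

With the range form of the WHERE clause, Oracle can walk just the matching slice of this index; wrapping the column in TRUNC() would prevent that.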
Your column might contain hours, minutes, and seconds, but they can be hidden.
So when you filter on just the date, Oracle implicitly adds a midnight time to it; you are effectively filtering on '03-09-21 00:00:00'.
Try truncating your column (and compare against an explicit date rather than a string, per the answer above):
select * from TBL_IPCOLO_BILLING_MST
where trunc(LAST_UPDATED_DATE) = to_date('03-09-21', 'DD-MM-YY');
Hope I understood your question correctly.
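Note that TRUNC(LAST_UPDATED_DATE) defeats a plain index on the column. If you want to keep this style of filter, a function-based index is one way to keep it indexable (index name is hypothetical):

CREATE INDEX ix_billing_upd_trunc
ON tbl_ipcolo_billing_mst (TRUNC(last_updated_date));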
Oracle docs
TL;DR: Is there any way to tell knex.js not to use parameter binding, and instead inject the value into the raw query?
We are currently using knex.js in a Node.js environment with Oracle as our database. We have run into a case with poor query performance and have narrowed it down to the parameter binding on a partitioned table.
Our table is partitioned on a CREATE_DATE column and knex.js is generating a query that looks something like this:
select col1, col2 from my_table where create_date >= ? and create_date < ?
If I understand the Oracle documentation correctly, and based on some testing, Oracle is using dynamic pruning in this case, which is causing some pretty poor performance for us. If I manually re-run the query like this, it is very fast:
select col1, col2 from my_table where create_date >= to_date('2020-05-20', 'YYYY-MM-DD') and create_date < to_date('2020-05-21', 'YYYY-MM-DD')
Running an explain plan on both of those queries gives vastly different performance results. The first one has a much higher cost than the second.
Is there any way to tell knex.js to use a literal value rather than use parameter binding?
As Knex is a query builder, it has a way to pass a raw query, or part of one.
knex('my_table')
.columns('col1', 'col2')
.whereRaw("create_date >= to_date('2020-05-20', 'YYYY-MM-DD')")
.whereRaw("create_date < to_date('2020-05-21', 'YYYY-MM-DD')");
I have a query that selects records from a table that are older than 72 hours.
SELECT id FROM TABLE_NAME WHERE TIMESTAMP <= SYSDATE - INTERVAL '72' HOUR;
The performance of this query is horrible, so I have added an index to the TIMESTAMP column.
This works fine with thousands of records, but when the record count is 10 million (even more, sometimes), I hardly see any performance improvement with the index.
My guess is that the arithmetic operation is killing the performance of the query.
Please tell me if there are any other approaches to speeding up this query.
Assuming that the timestamp column is of type TIMESTAMP, the problem is that the implicit conversion from DATE (which is what SYSDATE returns) to TIMESTAMP kills the index.
You could add a function-based index, or you could change SYSDATE to SYSTIMESTAMP.
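A minimal sketch of the SYSTIMESTAMP variant, keeping the placeholder names from the question, so that both sides of the comparison are timestamps:

-- both operands are now timestamps; no implicit DATE conversion on the column
SELECT id FROM TABLE_NAME WHERE TIMESTAMP <= SYSTIMESTAMP - INTERVAL '72' HOUR;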
I have tried searches on Google and various other sites with no luck, even making the search terms as vague as possible.
I have a query with data across multiple days but I only want to select data that has the time between 21:45 and 22:45.
The temp table built from the whole data set has a date column that was converted with TO_CHAR, so converting it back with TO_DATE or TO_TIMESTAMP is necessary, I think.
The problem is I have tried both of those and get invalid-month errors. For example, to_date(complete_date, 'hh24:mi:ss') gives me the error.
I'm not sure how to filter for a timestamp interval without giving a hard coded date.
Many thanks in advance. I am using Oracle Sql and unfortunately I don't have the query at the moment. It's on the computer at work. If a reply comes and I am at work I can reply back with more information.
with details as (
  select to_char(complete_date, 'yyyy-mm-dd hh24:mi:ss') as complete_date,
         to_char(complete_date, 'hh24:mi:ss') as ts
  from table
  where complete_date between trunc(sysdate) - 30 and trunc(sysdate)
)
select * from details where ts between '21:45:00' and '22:45:00'
I was able to filter by timestamp by using:
round(extract(hour from to_timestamp(complete_date, 'yyyy-mm-dd hh24:mi:ss'))
      + extract(minute from to_timestamp(complete_date, 'yyyy-mm-dd hh24:mi:ss')) / 60, 2) count_ts
Then filter for count_ts between 21.75 and 22.75 (21:45 and 22:45 expressed as fractional hours). This allows me to get any data between those times no matter the day.
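Put together, a rough sketch of the whole filter (my_table stands in for the real temp table whose complete_date is already a 'yyyy-mm-dd hh24:mi:ss' string):

select *
from (
  select t.*,
         round(extract(hour from to_timestamp(complete_date, 'yyyy-mm-dd hh24:mi:ss'))
               + extract(minute from to_timestamp(complete_date, 'yyyy-mm-dd hh24:mi:ss')) / 60, 2) as count_ts
  from my_table t
)
where count_ts between 21.75 and 22.75;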
Problem solved.
The workitem_routing_stats table has around 1,000,000 records. All records are accessed, which is why we are using a full-scan hint. The query takes around 25 seconds to execute. Is there any way to tune it?
SELECT /*+ full(wrs) */
wrs.NODE_ID,
wrs.bb_id,
SUM(CASE WHEN WRS.START_TS >= (SYSTIMESTAMP-NUMTODSINTERVAL(7,'day'))
AND wrs.END_TS <= SYSTIMESTAMP THEN (wrs.WORKITEM_COUNT) END) outliers_last_sevend,
SUM(CASE WHEN WRS.START_TS >= (SYSTIMESTAMP-NUMTODSINTERVAL(30,'day'))
AND wrs.END_TS <= SYSTIMESTAMP THEN (wrs.WORKITEM_COUNT) END)
outliers_last_thirtyd ,
SUM(CASE WHEN WRS.START_TS >= (SYSTIMESTAMP-NUMTODSINTERVAL(90,'day'))
AND wrs.END_TS <= SYSTIMESTAMP THEN (wrs.WORKITEM_COUNT) END)
outliers_last_ninetyd ,
SUM(wrs.WORKITEM_COUNT) outliers_year
FROM workitem_routing_stats wrs
WHERE wrs.START_TS BETWEEN (SYSTIMESTAMP-numtodsinterval(365,'day')) AND SYSTIMESTAMP
AND wrs.END_TS BETWEEN (SYSTIMESTAMP-numtodsinterval(365,'day')) AND SYSTIMESTAMP
GROUP BY wrs.NODE_ID,wrs.bb_id ;
You could range-partition the table by month on the START_TS column (then only the year you are interested in is scanned).
Secondly (not a very intelligent solution), you could add a parallel(wrs 4) hint if your storage is powerful.
You can combine these two things.
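A rough sketch of the monthly partitioning idea, if the table were (re)created partitioned; column types are assumed, since the DDL wasn't posted. INTERVAL partitioning lets Oracle create each month's partition automatically:

CREATE TABLE workitem_routing_stats (
  node_id        NUMBER,
  bb_id          NUMBER,
  start_ts       TIMESTAMP,
  end_ts         TIMESTAMP,
  workitem_count NUMBER
)
PARTITION BY RANGE (start_ts)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(PARTITION p_hist VALUES LESS THAN (TIMESTAMP '2014-01-01 00:00:00'));

With this in place, the year-long predicate on START_TS touches only the last 12-13 partitions, and the parallel hint can still be applied on top if needed.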
A full scan is going to be painful in any case...
However, you may avoid some computation if you simply put in the proper numbers instead of calling the conversion functions:
(SYSTIMESTAMP-numtodsinterval(365,'day'))
should just be the same as
(SYSTIMESTAMP-365)
This should remove the overhead of calling the function and parsing the parameter string ('day'). (Strictly speaking, subtracting a plain number from a TIMESTAMP yields a DATE rather than a TIMESTAMP, but for a boundary like this that is usually fine.)
One other possibility: it seems this data keeps gaining new timestamps as of today, while the rest is just history...
If that is the case, you could add a summary table to hold the summarized historic information, query the current table only for the recent rows, and UNION to the summary table for the older data.
You will then need to think through the job or other scheduled process that keeps the summaries populated, but it would save you a ton of this query's run time.
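A rough sketch of that idea, reusing the names from the query above (the summary table name is made up):

-- one-off build; a scheduled job would then append each newly closed month
CREATE TABLE wrs_monthly_summary AS
SELECT node_id,
       bb_id,
       TRUNC(start_ts, 'MM') AS month_start,
       SUM(workitem_count)   AS workitem_count
FROM workitem_routing_stats
WHERE start_ts < TRUNC(SYSDATE, 'MM')
GROUP BY node_id, bb_id, TRUNC(start_ts, 'MM');

The live query then aggregates only the current month from workitem_routing_stats and UNION ALLs in the pre-aggregated history.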