Knex.js parameter binding with Oracle partitions

TL;DR: Is there any way to tell knex.js not to use parameter binding and instead inject the value directly into the raw query?
We are currently using knex.js in a Node.js environment with Oracle as our database. We have run into a case with poor query performance and have narrowed it down to the parameter binding on a partitioned table.
Our table is partitioned on a CREATE_DATE column and knex.js is generating a query that looks something like this:
select col1, col2 from my_table where create_date >= ? and create_date < ?
If I understand the Oracle documentation correctly, and based on some testing, Oracle is using dynamic partition pruning in this case, which is causing some pretty poor performance for us. If I manually re-run the query like this, it is very fast:
select col1, col2 from my_table where create_date >= to_date('2020-05-20', 'YYYY-MM-DD') and create_date < to_date('2020-05-21', 'YYYY-MM-DD')
Running an explain plan on both of those queries gives vastly different results: the first one has a much higher cost than the second.
Is there any way to tell knex.js to use a literal value rather than use parameter binding?

Since Knex is a query builder, it lets you pass a raw query, or a raw part of one, so the literal values are written into the SQL rather than bound:
knex('my_table')
.columns('col1', 'col2')
.whereRaw("create_date >= to_date('2020-05-20', 'YYYY-MM-DD')")
.whereRaw("create_date < to_date('2020-05-21', 'YYYY-MM-DD')");
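If the dates come from application code rather than being typed in by hand, inlining them as text reopens the door to SQL injection, so it seems worth building the literal only from validated date parts. A minimal sketch, assuming the values arrive as JavaScript Date objects interpreted in UTC; oracleDateLiteral and createDateRange are hypothetical helper names, not part of Knex:

```javascript
// Hypothetical helper (not a Knex API): turns a Date into an Oracle
// to_date(...) literal. The text is built only from numeric date parts,
// so no caller-supplied string ever reaches the SQL.
function oracleDateLiteral(date) {
  if (!(date instanceof Date) || Number.isNaN(date.getTime())) {
    throw new TypeError('expected a valid Date');
  }
  const year = date.getUTCFullYear();
  if (year < 1 || year > 9999) {
    throw new RangeError('year must be between 1 and 9999');
  }
  const yyyy = String(year).padStart(4, '0');
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(date.getUTCDate()).padStart(2, '0');
  return `to_date('${yyyy}-${mm}-${dd}', 'YYYY-MM-DD')`;
}

// Builds the two raw predicates for the half-open range [from, to).
function createDateRange(from, to) {
  return [
    `create_date >= ${oracleDateLiteral(from)}`,
    `create_date < ${oracleDateLiteral(to)}`,
  ];
}

const [lower, upper] = createDateRange(
  new Date(Date.UTC(2020, 4, 20)), // months are zero-based: 4 is May
  new Date(Date.UTC(2020, 4, 21))
);
console.log(lower);
console.log(upper);

// With a configured Knex instance (assumed here to be named `knex`):
// knex('my_table').columns('col1', 'col2').whereRaw(lower).whereRaw(upper);
```

The two strings can be passed to whereRaw exactly like the hand-written ones above. Whether literals actually give the better plan is still something to confirm with an explain plan on your own table.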

Related

Is there a difference between using between vs '> & <' when querying a hive table partitioned on date string?

I have a use case that selects data from a large Hive table partitioned on date (format: yyyyMMdd). The Hive query needs to fetch a few fields from 6 months of data (about 180 date partitions in total). Currently the query looks like:
SELECT field_1, field_2
FROM table
WHERE `date` BETWEEN '20181125' and '20190525'
I wanted to know whether changing the query to use >= and <= makes any difference in terms of performance.
SELECT field_1, field_2
FROM table
WHERE `date`>='20181125' AND `date`<='20190525'
I cannot think of any significant change in performance from using >= and <= instead of the BETWEEN keyword.
However, using the IN keyword and listing all the dates in the range may have a slight advantage over the other two scenarios.
SELECT field_1, field_2 FROM table WHERE `date` IN ('20181125','20181126',...,'20190524','20190525');
>=, <= and BETWEEN should generate the same execution plans, though it may be different in your Hive version.
Use EXPLAIN; it shows the query execution plan. Only the plan can answer this question for sure. Also check EXPLAIN DEPENDENCY: it prints the input_partitions to be scanned, so you will see whether partition pruning works in each case.
If the plans are the same for >=, <=, BETWEEN and IN, then they work the same way and the performance should be the same.
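To compare the IN variant under EXPLAIN, the list of partition values does not have to be typed by hand. A minimal sketch, assuming the partition values are plain yyyyMMdd strings with one partition per calendar day; partitionDates is a hypothetical helper name:

```javascript
// Hypothetical helper: lists every yyyyMMdd partition value from `start`
// to `end`, inclusive. Both arguments are yyyyMMdd strings.
function partitionDates(start, end) {
  const parse = (s) => {
    if (!/^\d{8}$/.test(s)) throw new Error(`not a yyyyMMdd value: ${s}`);
    return Date.UTC(Number(s.slice(0, 4)), Number(s.slice(4, 6)) - 1, Number(s.slice(6, 8)));
  };
  // toISOString() is always UTC, so stepping by whole days is not
  // affected by daylight saving changes.
  const format = (ms) => new Date(ms).toISOString().slice(0, 10).replace(/-/g, '');
  const dayMs = 24 * 60 * 60 * 1000;
  const last = parse(end);
  const dates = [];
  for (let t = parse(start); t <= last; t += dayMs) {
    dates.push(format(t));
  }
  return dates;
}

const dates = partitionDates('20181125', '20190525');
const inList = dates.map((d) => `'${d}'`).join(',');
const query = 'SELECT field_1, field_2 FROM table WHERE `date` IN (' + inList + ')';
console.log(`${dates.length} partitions listed`);
console.log(query.slice(0, 80) + '...');
```

Prefixing the generated query with EXPLAIN DEPENDENCY should then show whether the same partitions are read as with BETWEEN.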

Oracle SQL TO_DATE and TO_TIMESTAMP

I have two queries in Oracle SQL that are equivalent.
SELECT ... FROM TABLE WHERE timestamp = TO_DATE('2017-07-01', 'YYYY-MM-DD')
and
SELECT ... FROM TABLE WHERE
timestamp >= TO_TIMESTAMP('2017-07-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AND
timestamp < TO_TIMESTAMP('2017-07-02 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
Generally, I need to run this every day (automated), so the first query will suffice for the application. However, for the first few runs I need some custom date-time boundaries, so I might manually intervene and use the second query instead.
What I observed is that the first one runs faster. Under the hood, is this really the case? Is the performance difference significant enough? Can someone explain?
The devil is in the details.
1) How many records in the table?
2) How many records satisfy
timestamp = TO_DATE('2017-07-01', 'YYYY-MM-DD')
3) How many records satisfy
timestamp >= TO_TIMESTAMP('2017-07-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AND
timestamp < TO_TIMESTAMP('2017-07-02 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
4) Does the table have stats collected? Does the timestamp column have histogram statistics?
5) Do you have an index on the timestamp column? Or is the table (sub)partitioned by timestamp?
It might be easier to just post the DDL for both the table and the index; that would be really helpful.
Assuming the timestamp column is indexed, the first query looks up a single value, while the second looks up a range of values. Depending on the statistics and many other factors, some of which are mentioned above, Oracle may choose, for example, to switch to a full table scan if it estimates that the second predicate returns so many more rows that reading the table directly is less expensive.
I know this might be more questions than answers, but Oracle Database is very flexible, and with flexibility comes complexity. I hope some of the above information is helpful.
Also, a simple explain plan, SQL*Plus autotrace or, best case, a 10053 or 10046 trace can give a more definitive answer about what is going on there.
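One detail behind questions 2) and 3): the two predicates only select the same rows if every stored value falls exactly on midnight. The equality matches a single point in time, while the range matches the whole day. A small sketch of that difference, using made-up timestamps and plain JavaScript filters standing in for the two WHERE clauses:

```javascript
// Illustrative timestamps (made up for this sketch), all in UTC.
const rows = [
  '2017-06-30T23:59:59Z',
  '2017-07-01T00:00:00Z',
  '2017-07-01T09:30:00Z',
  '2017-07-01T23:59:59Z',
  '2017-07-02T00:00:00Z',
].map((s) => new Date(s));

const dayStart = new Date('2017-07-01T00:00:00Z');
const nextDay = new Date('2017-07-02T00:00:00Z');

// Stands in for: timestamp = TO_DATE('2017-07-01', 'YYYY-MM-DD')
const equalityMatches = rows.filter((t) => t.getTime() === dayStart.getTime());

// Stands in for: timestamp >= <day start> AND timestamp < <next day start>
const rangeMatches = rows.filter((t) => t >= dayStart && t < nextDay);

console.log('equality matches:', equalityMatches.length);
console.log('range matches:', rangeMatches.length);
```

If the real column carries a time of day, the first query returns fewer rows, which by itself could account for some of the difference in run time.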

How to use Oracle indexed field in a query

I have a large Oracle table with an indexed date_time field: "DISCONNECT_DATE"
When I use the following where clause my query runs quickly:
DISCONNECT_DATE > TO_DATE('01-DEC-2016', 'DD-MON-YYYY') AND
DISCONNECT_DATE < TO_DATE('01-JAN-2017', 'DD-MON-YYYY')
When I use the following where clause my query runs (very) slowly:
extract(month from disconnect_date) = '12' and
extract(year from disconnect_date) = '2016'
They are both more or less equivalent in their intentions. Why does the former work and the latter not? (I don't think I have this problem in SQL Server.)
(I am using PL/SQL Developer to write the query.)
The issue is the use of indexes. In the first, all the functions are on the "constant" side, not on the "column" side. So, Oracle can readily see that an index can be applied.
The logic that does indexing, though, doesn't understand extract(), so the index doesn't get used. If you want to use that construct, you can create an index on function calls:
create index idx_t_ddyear_ddmonth on t(extract(month from disconnect_date), extract(year from disconnect_date));
Note: extract() returns a number not a string, so you should get rid of the single quotes. Mixing data types can also confuse the optimizer.
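When the year and month arrive as parameters, the range form can be generated instead of falling back to extract(). A minimal sketch, assuming the predicate is assembled in application code; monthRange and disconnectDatePredicate are hypothetical helper names. It uses >= for the lower bound, because the > in the question's fast query skips rows stamped exactly at midnight on the first of the month, which extract() would include:

```javascript
// Hypothetical helper: returns the half-open range [from, to) covering one
// calendar month, as YYYY-MM-DD strings.
function monthRange(year, month) {
  if (!Number.isInteger(year) || year < 1000 || year > 9999) {
    throw new RangeError('expected a four-digit integer year');
  }
  if (!Number.isInteger(month) || month < 1 || month > 12) {
    throw new RangeError('expected a month from 1 to 12');
  }
  const iso = (ms) => new Date(ms).toISOString().slice(0, 10);
  return {
    from: iso(Date.UTC(year, month - 1, 1)),
    // Date.UTC month indexes are zero-based, so `month` is the following
    // month; index 12 rolls over to January of the next year.
    to: iso(Date.UTC(year, month, 1)),
  };
}

// Keeps every function call on the constant side so an index on
// DISCONNECT_DATE remains usable.
function disconnectDatePredicate(year, month) {
  const { from, to } = monthRange(year, month);
  return (
    `DISCONNECT_DATE >= TO_DATE('${from}', 'YYYY-MM-DD') AND ` +
    `DISCONNECT_DATE < TO_DATE('${to}', 'YYYY-MM-DD')`
  );
}

console.log(disconnectDatePredicate(2016, 12));
```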

Using Date in Where Clause

I am trying to place a date in a where clause. I want to update all rows in which the date column is before or after a certain date. How do I specify that I only want to update these rows? Here is the code that I have so far (not including specific column names):
update table1
set column1 = value
where (select date from table2) < date;
Am I on the right track?
Also, could someone please explain the difference between SQL and PL/SQL? I am taking a class in PL/SQL at the moment. Whenever I post a question on this forum I say that I have a question in PL/SQL, but the people who answer my question say that a certain function (update/if/case/etc.) is a SQL statement and not a PL/SQL statement. What is the difference?
-Neil
Your update statement
update table1
set column1 = value
where (select date from table2) < date;
is correct and it will work but only if the inner query (select date from table2) returns a single row. If you are trying to compare to specific date you don't need the inner query, for example:
update table1
set column1 = value
where to_date('01/02/2012', 'DD/MM/YYYY') < date;
You can adjust date format mask to whatever format of data you prefer. to_date will convert from char to date type, and to_char will do the opposite.
SQL is a standardized query language that is supported by all compliant relational databases (with some proprietary extensions sometimes). SQL is not a programming language. PL/SQL is a procedural programming language that is supported on Oracle only (Postgres has similar syntax). PL/SQL is SQL + regular programming language features like conditional statements (if/else), loops (for), functions and procedures and such. PL/SQL is used whenever it's too difficult or impossible to get some data using SQL solely.
As Aleksey mentioned, your query is correct, but you need to either [1] add conditions to the subquery so that it returns only ONE record or [2] make sure table2 contains only ONE record when the statement runs.
If you have to refer to data from another table in your WHERE clause, consider an explicit join instead (example in SQL Server syntax):
update t1
set t1.column1 = value -- <-- some arbitrary value here, I assume?
from table1 t1
inner join table2 t2
on (t2.key = t1.key) -- you need to specify the primary keys of the tables here
where t2.date < t1.date
That way you are not assuming table2 has only one record. It can have many records as long as they relate to table1 via their keys/indexes and the WHERE clause simply makes sure you only UPDATE based on data from table2 that has a date LESS THAN the date in table1.

hibernate oracle rownum issue

SELECT * FROM (
select *
from tableA
where ColumnA = 'randomText'
ORDER BY columnL ASC
) WHERE ROWNUM <= 25
Due to some Oracle optimization, this query takes about 14 minutes to execute. If I remove the where clause, the query executes in seconds. Most of the columns of the table have indexes on them, including the ones mentioned above. I do not have much flexibility in the structure of the query, as I use Hibernate.
This query returns results instantly too, with the correct result:
SELECT *
FROM (
select *
from tableA,
dual
where ColumnA = 'randomText'
ORDER BY columnL ASC
) WHERE ROWNUM <= 25
Is there something I can do, using Hibernate?
UPDATE: I use EntityManager.createQuery(), and I use setMaxResults(25) and setFirstResult() too. The query above is what Hibernate's generated query looks like, judging from the logs.
I can't match the explain plans exactly to your queries, but it seems Oracle is using a different index for each of the two queries.
Can you create an index containing columnA and columnL?
If you have an index only containing columnA, you MIGHT be able to drop that without a large effect on performance of other queries.
An alternative would be to add a hint to use the index used in the faster query. But this would require you to use native sql.
Does this mean you are using Hibernate/JPA? If so, I guess you are using EntityManager.createNativeQuery() to create the query? Try removing your where restriction and using .setMaxResults(25) on the Query instead.
Anyway, why do you need the outer select? Wouldn't
select *
from tableA
where ColumnA = 'randomText'
AND ROWNUM <= 25
ORDER BY columnL ASC
produce the desired results?
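A note on that last question: in Oracle, ROWNUM is assigned as rows pass the WHERE clause, before the ORDER BY of the same query block is applied. The single-level query therefore keeps an arbitrary 25 matching rows and sorts only those, whereas the nested form sorts first and then keeps the top 25. A small sketch of the two evaluation orders, with made-up rows and a limit of 2 standing in for 25:

```javascript
// Illustrative rows (made up); array order stands in for the order in which
// the database happens to produce the matching rows.
const rows = [
  { id: 1, columnL: 40 },
  { id: 2, columnL: 10 },
  { id: 3, columnL: 30 },
  { id: 4, columnL: 20 },
];
const limit = 2;
const byColumnL = (a, b) => a.columnL - b.columnL;

// ROWNUM in the same query block as ORDER BY: cut first, then sort.
const limitThenSort = rows.slice(0, limit).sort(byColumnL);

// ORDER BY inside a subquery, ROWNUM outside: sort first, then cut.
const sortThenLimit = [...rows].sort(byColumnL).slice(0, limit);

console.log(limitThenSort.map((r) => r.columnL));
console.log(sortThenLimit.map((r) => r.columnL));
```

Only the second order returns the rows with the smallest columnL values, which is why the outer select in the original query is needed.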
