How Does SELECT TOP N work in Sybase ASE - performance

I was wondering how this query is executed:
SELECT TOP 10 * FROM aSybaseTable
WHERE aCondition
The fact is that this query is taking too much time to return results.
So I was wondering if the query is smart enough to stop once the result reaches 10 rows, or if it computes all the possible results and then returns only the first 10 rows.
Thanks in advance for your replies!

When using select top N the query is still executed fully; only the data page reads stop once the specified number of rows has been returned. All the index page reads and any sorts still have to occur, so depending on the complexity of the where condition or subqueries, it can definitely still take time to execute. select top N is functionally similar to using set rowcount.
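For illustration, a minimal sketch of the two functionally similar forms in Sybase ASE (aSybaseTable and aCondition are just the placeholders from the question):

SELECT TOP 10 * FROM aSybaseTable WHERE aCondition

-- roughly the same effect as:
SET ROWCOUNT 10
SELECT * FROM aSybaseTable WHERE aCondition
SET ROWCOUNT 0  -- reset so later statements are not limited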

Michael is right, but there is one special case here that really needs to be mentioned.
The query WILL execute faster, and only partially, if no order by or group by clauses are used.
But this case is rarely useful, since you will then get an arbitrary N rows that fulfill the condition, in the order they are physically located in the table/index.
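A quick sketch of the difference (someColumn is a hypothetical column, not taken from the question):

-- Can stop early: no ORDER BY/GROUP BY, rows come back in whatever order they are scanned
SELECT TOP 10 * FROM aSybaseTable WHERE aCondition

-- Cannot stop early: every qualifying row must be sorted before the first 10 are known
SELECT TOP 10 * FROM aSybaseTable WHERE aCondition ORDER BY someColumn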

Related

Pagination with multiple join with large table size

In Oracle 12c, while trying to fetch data with multiple joins and filters and then paginating with rownum, the tables first get joined and filtered, the result set is then ordered with order by, and only then is the data fetched. So in the case of a large number of rows, all the join operations are very costly. Even though we need just 100 rows out of a 1 million row result set, all the data still has to be prepared. Any ideas on how to improve this process?
If you only need a subset of the data, you could hint the query with FIRST_ROWS(n).
Improved Response Time with FIRST_ROWS(n) Hint for ORDER BY Queries
You use the FIRST_ROWS(n) hint in cases where you want the first number (n) of rows in the shortest possible time.
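A hedged sketch of what that might look like in a 12c pagination query (the table and column names here are invented, not from the question):

SELECT /*+ FIRST_ROWS(100) */ o.*
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id
WHERE  c.country = 'US'
ORDER  BY o.order_date DESC
FETCH  FIRST 100 ROWS ONLY  -- 12c row-limiting clause instead of ROWNUM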

Bind variables results in full table scan in Oracle

Checking the query cost on a table with 1 million records shows a full table scan when bind variables are used, while the same query in Oracle with actual values results in a significantly lower cost.
Is this expected behaviour from Oracle ?
Is there a way to tell Oracle not to scan the full table ?
The query is scanning the full table when bind variables are used:
The query cost reduces significantly with actual variables:
This is a pagination query. You want to retrieve a handful of records from the table, filtering on their position in the filtered set. Your projection includes all the columns of the table, so you need to query the table to get the whole row. The question is, why do the two query variants have different plans?
Let's consider the second query. You are passing hard values for the offsets, so the optimizer knows that you want the eleven most recent rows in the sorted set. The set is sorted by an indexed column. The most important element is that the optimizer knows you want 11 rows. 11 is a very small sliver of one million, so using an indexed read to get the required rows is an efficient way of doing things. The path starts at the far end of the index, reads the last eleven entries and retrieves the rows.
Now, your first query has bind variables for the starting and finishing offsets and also for the number of rows to be returned. This is crucial: the optimizer doesn't know whether you want to return eleven rows or eleven thousand rows. So it opts for a very high cardinality. The reason for this is that index reads perform very badly for retrieving large numbers of rows. Full table scans are the best way of handling big slices of our tables.
Is this expected behaviour from Oracle ?
Now that you understand this, you can see that the answer to this question is yes. The optimizer makes the best decision it can with the information we give it. When we provide hard values it can be very clever. When we provide vague data it has to guess; sometimes its guesses aren't the ones we expected.
Bind variables are very useful for running the same query with different values when the expected result set is similar. But using bind variables to specify ranges means the result sets can potentially vary tremendously in size.
Is there a way to tell Oracle not to scan the full table ?
If you can fix the pagesize, thus removing the :a2 parameter, that would allow the optimizer to produce a much more accurate plan. Alternatively, if you need to vary the pagesize within a small range (say 10 - 100) then you could try a /*+ cardinality (100) */ hint in the query; provided the cardinality value is within the right order of magnitude it doesn't have to be the precise value.
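As a sketch only (the table, column and bind names below are invented, and the CARDINALITY hint is an undocumented hint whose exact placement may need tuning), a pagination query hinted this way might look like:

SELECT /*+ cardinality(100) */ *
FROM  (SELECT t.*, ROW_NUMBER() OVER (ORDER BY created_date DESC) rn
       FROM   big_table t)
WHERE rn BETWEEN :a1 + 1 AND :a1 + :a2  -- offset :a1, page size :a2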
As with all performance questions, the devil is in the specifics. So you need to benchmark various performance changes and choose the best fit for your particular use case(s).

Order of multiple conditions in where clause in oracle [duplicate]

Let's say I have a table called PEOPLE having three columns, ID, LastName, and FirstName. None of these columns are indexed.
LastName is more unique, and FirstName is less unique.
If I do two searches:
select * from PEOPLE where FirstName = 'F' and LastName = 'L'
select * from PEOPLE where LastName = 'L' and FirstName = 'F'
My belief is the second one is faster because the more unique criterion (LastName) comes first in the where clause, and records will get eliminated more efficiently. I don't think the optimizer is smart enough to optimize the first SQL query.
Is my understanding correct?
No, that order doesn't matter (or at least: shouldn't matter).
Any decent query optimizer will look at all the parts of the WHERE clause and figure out the most efficient way to satisfy that query.
I know the SQL Server query optimizer will pick a suitable index - no matter which order you have your two conditions in. I assume other RDBMS will have similar strategies.
What does matter is whether or not you have a suitable index for this!
In the case of SQL Server, it will likely use an index if you have:
an index on (LastName, FirstName)
an index on (FirstName, LastName)
an index on just (LastName), or just (FirstName) (or both)
On the other hand - again for SQL Server - if you use SELECT * to grab all columns from a table, and the table is rather small, then there's a good chance the query optimizer will just do a table (or clustered index) scan instead of using an index (because the lookup into the full data page to get all other columns just gets too expensive very quickly).
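For example, a sketch of the kind of index in question (the index names are arbitrary):

CREATE INDEX IX_People_LastFirst ON PEOPLE (LastName, FirstName)
-- or, equally usable for this query:
CREATE INDEX IX_People_FirstLast ON PEOPLE (FirstName, LastName)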
The order of WHERE clauses should not make a difference in a database that conforms to the SQL standard. The order of evaluation is not guaranteed in most databases.
Do not think that SQL cares about the order. The following generates an error in SQL Server:
select *
from INFORMATION_SCHEMA.TABLES
where ISNUMERIC(table_name) = 1 and CAST(table_name as int) <> 0
If the first part of this clause were executed first, then only numeric table names would be cast as integers. However, it fails, providing a clear example that SQL Server (like other databases) does not necessarily evaluate conditions in the order they appear in the WHERE clause.
ANSI SQL Draft 2003 5WD-01-Framework-2003-09.pdf
6.3.3.3 Rule evaluation order
...
Where the precedence is not determined by the Formats or by parentheses, effective evaluation of expressions is generally performed from left to right. However, it is implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might cause conditions to be raised or if the results of the expressions can be determined without completely evaluating all parts of the expression.
No, all the RDBMSs first analyse the query and optimize it, reordering your where clause as needed.
Depending on which RDBMS you are using, you can display the result of this analysis (search for "explain plan" in Oracle, for instance).
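For instance, in Oracle the chosen plan can be displayed like this (using the PEOPLE query from the question):

EXPLAIN PLAN FOR
  SELECT * FROM PEOPLE WHERE LastName = 'L' AND FirstName = 'F';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);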
It's true as far as it goes, assuming the names aren't indexed.
Different data would make it wrong, though. In order to find out which way to do it (which could differ every time), the DBMS would have to run a distinct count query for each column and compare the numbers; that would cost more than just shrugging and getting on with it.
Original OP statement
My belief is the second one is faster because the more unique criterion (LastName) comes first in the where clause, and records will get eliminated more efficiently. I don't think the optimizer is smart enough to optimize the first SQL.
I guess you are confusing this with choosing the order of columns when creating an index, where you have to put the most selective column first, then the second most selective, and so on.
BTW, for the above two queries the SQL Server optimizer will not do any optimization but will use a trivial plan, as long as the total cost of the plan is less than the cost threshold for parallelism.

Join Performance Not Good enough - Oracle SQL Query Optimisation

I am investigating a problem where our application takes too much time to get data from an Oracle database. In my investigation, I found that the slowness of the query traces back to the join between tables and to the aggregate function SUM.
This may look simple, but I am not good with SQL query optimization.
The query is below
SELECT T1.TONNES, SUM(R.TONNES) AS TOTAL_TONNES
FROM RECLAIMED R,
     (SELECT DELIVERY_OUT_ID, SUM(TONNES) AS TONNES
      FROM RECLAIMED
      WHERE DELIVERY_IN_ID = 53773
      GROUP BY DELIVERY_OUT_ID) T1
WHERE R.DELIVERY_OUT_ID = T1.DELIVERY_OUT_ID
GROUP BY T1.TONNES
SUM(R.TONNES) is the total tonnes per delivery out.
SUM(TONNES) is the total tonnes per delivery in.
My table looks like
I have 16 million entries in this table, and trying multiple delivery_in_id values I am getting, on average, about 6 seconds for the query to come back.
I have a similar database (a complete copy, but with only 4 million entries) and when the same query is applied I am getting less than 1 second.
They both have the same indexes, so I am confident that indexing is not the problem.
I am certain that it is just the data; the first database (16 million rows) is simply heavier. I have a feeling that once this query is optimized the problem will be solved.
Open for suggestions : )
Are the two DBs on the same server? If not, first compare the machine configuration, settings and running applications.
If there are no differences, you can check whether you have NULL values in the column you want to SUM. Use the NVL function to improve your query if there are some.
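A sketch using the question's own inner query, assuming TONNES can be NULL:

SELECT DELIVERY_OUT_ID, SUM(NVL(TONNES, 0)) AS TONNES
FROM   RECLAIMED
WHERE  DELIVERY_IN_ID = 53773
GROUP  BY DELIVERY_OUT_ID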
Also, you may analyze the index (or rebuild it). It cleans up the index, and it's quite fast and safe for your data.
If that does not help, check whether the TABLESPACE of your table is full. It might have some impact... but I am not sure.
;-)
I've solved the performance problem by updating the stored procedure. It is optimized by filtering the first table before joining the second table. Below is the resulting query:
SELECT R.DELIVERY_IN_ID, R.DELIVERY_OUT_ID, SUM(R.TONNES),
       (SELECT SUM(TONNES) AS TONNES
        FROM RECLAIMED
        WHERE DELIVERY_OUT_ID = R.DELIVERY_OUT_ID) AS TOTAL_TONNES
FROM CTSBT_RECLAIMED R
WHERE DELIVERY_IN_ID = 53733
GROUP BY DELIVERY_IN_ID, R.DELIVERY_OUT_ID
The improvement in timing/performance is huge in my case, since I am joining a huge table (16M rows). This query now runs in less than a second. The slowness, I suspect, is due to one table having no index (see T1), and even though in my case it only holds about 20 items, that is enough to slow the query down because it gets compared against 16 million entries.
The optimized query filters the 16 million rows first and merges with T1 afterwards.
Is there a better way to optimize this? Probably. But I am happy with this result and have solved what I intended to solve. Now moving on.
Thanks for those who commented.

ORA-01795 - why is the maximum number of expressions limited to 1000

SO is full of work-arounds, but I'm wondering about the historical reasons behind the 1000 limit for "maximum number of expressions" in an IN clause?
It might be because there is potential for abuse with tons of values, and every value in the list will be transformed into an equivalent OR condition.
For example NAME IN ('JOHN', 'CHARLES'..) would be transformed into NAME = 'JOHN' OR NAME = 'CHARLES'
So it might impact performance.
But note Oracle still supports
SELECT ID FROM EMP WHERE NAME IN (SELECT NAME FROM ATTENDEES)
In this case, the optimizer doesn't convert it into multiple OR conditions, but performs a JOIN instead.
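One common way to stay under the limit is to load the values into a table and use exactly that subquery form; a sketch (WANTED_NAMES is a hypothetical table, not from the question):

CREATE GLOBAL TEMPORARY TABLE WANTED_NAMES (NAME VARCHAR2(100)) ON COMMIT DELETE ROWS;

INSERT INTO WANTED_NAMES (NAME) VALUES ('JOHN');
INSERT INTO WANTED_NAMES (NAME) VALUES ('CHARLES');
-- ... as many rows as needed; no 1000-expression limit applies here

SELECT ID FROM EMP WHERE NAME IN (SELECT NAME FROM WANTED_NAMES);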
This restriction is not only for IN lists, but for any expression list. The documentation says:
A comma-delimited list of expressions can contain no more than 1000 expressions.
Your question is WHY the limit is 1000. Why not 100 or 10000 or a million? I guess it relates to the limit on the number of columns in a table, which is also 1000. Perhaps this relationship exists internally in Oracle so that an expression list can match the columns of a DML statement.
But for a good design, the 1000 limit itself is generous; in practice you won't reach it.
And, a quote from the famous AskTom site on similar topic,
We'll spend more time parsing queries than actually executing them!
Update: My own thoughts
I think Oracle's DB technology is quite old; these limits were set once back then and they never had to think about them again. All expression lists have the 1000 limit, and with a robust design users never need to ask Oracle for an explanation. Tom's answer about parsing always makes me think that the purpose of this limit back in the 70s or 80s was more of a computational issue: the C-based algorithms might have needed some limit, and Oracle came up with 1000.
Update 2: From the application and framework point of view
As a DBA, I have seen many developers approach me with performance issues that are actually issues with the application framework generating the queries that fetch data from the database. The application lets users add filters, which eventually form the AND/OR logic within the IN list of the query. Internally, Oracle expands it during the optimization stage as a query rewrite into OR logic, and the query becomes huge, increasing the time to PARSE it. Most of the time it also suppresses index usage. So this is one of the cases where a query with a huge IN list is generated via an application framework.
