SCROLL CURSOR to first search-result using an INDEX - performance

Problem:
I want to provide my users with an Excel-like grid client application. The client accesses the PostgreSQL server over a network connection.
The client offers a "Find" function. Instead of filtering and showing only the matching results, "Find" just jumps to the first matching row in the grid (like the "Find" function in Excel).
To reduce bandwidth usage and avoid inefficient LIMIT/OFFSET queries, I am using PostgreSQL server-side cursors to allow scrolling over the sorted table:
BEGIN WORK;
DECLARE mCursor SCROLL CURSOR FOR
SELECT *
FROM table
ORDER BY xyz;
Scrolling and retrieval of the result-data is handled by calling Move/Fetch each time the client scrolls within the grid:
MOVE FORWARD/BACKWARD <offset> IN mCursor; FETCH 40 FROM mCursor;
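As a minimal sketch (it only builds the command strings), the client could map the grid's requested top row to cursor commands. It assumes 1-based row numbers and uses PostgreSQL's MOVE ABSOLUTE instead of relative FORWARD/BACKWARD, so the client does not have to track the cursor's current position. The helper name `cursor_commands` is hypothetical and the page size of 40 matches the FETCH above.

```python
from typing import List


def cursor_commands(top_row: int, page_size: int = 40, cursor: str = "mCursor") -> List[str]:
    """Return the statements that show rows top_row .. top_row + page_size - 1.

    MOVE ABSOLUTE n leaves the cursor positioned ON row n, and the next FETCH
    starts with the following row, so we move to top_row - 1.
    MOVE ABSOLUTE 0 positions the cursor before the first row.
    """
    if top_row < 1:
        raise ValueError("top_row is 1-based")
    return [
        f"MOVE ABSOLUTE {top_row - 1} IN {cursor};",
        f"FETCH {page_size} FROM {cursor};",
    ]


# Jump to row 1234, e.g. a rowNo returned by a "Find" query:
for stmt in cursor_commands(1234):
    print(stmt)
```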
Now I want to add the "Find" function, which should use an index to find the offset of the first matching row. The only way I know to do this is to open a new connection, run the following query, and then move the cursor to the returned rowNo:
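SELECT t.rowNo
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY ColumnName ASC) AS rowNo,
ColumnName -- must be selected so the outer WHERE can filter on it
FROM table
) t
WHERE t.ColumnName LIKE 'xyz%'
ORDER BY t.rowNo -- without this, LIMIT 1 may return any matching row
LIMIT 1;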
Problem: this query is extremely slow because it can't use an index (2-3 seconds for ~300k rows).
Is there another way to do this more efficiently?
Maybe by reading the offset directly from the index data? Or by running a query within the cursor? Or is there a database system that supports this?

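An index can be used for LIKE 'xyz%'; it would be unusable only if the pattern started with %. (In PostgreSQL a B-tree index supports such a LIKE prefix search only if it uses the C collation or a pattern operator class such as text_pattern_ops.)
I suspect the problem is not that it can't use an index, but that it has to scan the whole index to number all the table rows. Please post the EXPLAIN output.
This query limits the index scan to the rows up to the searched pattern: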
SELECT min(t.rowNo)
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY ColumnName ASC) AS rowNo,
ColumnName
FROM table
where ColumnName <= 'xyz' || repeat('z', 100) -- Get all possible like 'xyz%'
) t
WHERE t.ColumnName LIKE 'xyz%'
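To sanity-check that bounding the window does not change the row numbers, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite has no repeat(), so the upper bound is built in Python; the table `items` and column `name` are hypothetical. The sample data is lowercase because SQLite's LIKE is case-insensitive for ASCII, and the `'xyz' || 'zzz...'` bound assumes no matching value contains characters that sort after 'z'.

```python
import sqlite3

PREFIX = "xyz"
# Upper bound for every value LIKE 'xyz%' (assumes only characters <= 'z' follow the prefix).
UPPER = PREFIX + "z" * 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.execute("CREATE INDEX items_name ON items (name)")
names = ["abc", "mno", "xya", "xyz", "xyzab", "xyzb", "zzz", "yyy"]
conn.executemany("INSERT INTO items (name) VALUES (?)", [(n,) for n in names])

# Unbounded: numbers every row in the table, then filters.
FULL = """
SELECT min(rowNo) FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY name) AS rowNo, name FROM items
) t
WHERE t.name LIKE ? || '%'
"""

# Bounded: only rows up to the last possible match are numbered.
# Rows before the first match are unaffected, so their numbers stay the same.
BOUNDED = """
SELECT min(rowNo) FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY name) AS rowNo, name FROM items
    WHERE name <= ?
) t
WHERE t.name LIKE ? || '%'
"""

full = conn.execute(FULL, (PREFIX,)).fetchone()[0]
bounded = conn.execute(BOUNDED, (UPPER, PREFIX)).fetchone()[0]
print(full, bounded)  # 4 4
```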

Related

Oracle11g select query with pagination

I am facing a big performance problem when trying to get a paginated list of objects from an Oracle 11g database.
As far as I know, and from what I have checked online, the only way to achieve pagination in Oracle 11g is the following:
Example: [page=1, size=100]
SELECT * FROM
(
SELECT pagination.*, rownum r__ FROM
(
select * from "TABLE_NAME" t
inner join X on X.id = t.id
inner join .....
where ......
order by ...
) pagination
WHERE rownum <= 200
)
WHERE r__ > 100
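The two ROWNUM bounds in this pattern follow directly from the page number and size. A small sketch, assuming a 0-based page number as in the example ([page=1, size=100] returns rows 101 to 200); the function name is illustrative:

```python
from typing import Tuple


def rownum_bounds(page: int, size: int) -> Tuple[int, int]:
    """Return (upper, lower) for: WHERE rownum <= upper ... WHERE r__ > lower.

    page is 0-based, so page=1 with size=100 selects rows 101..200.
    """
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size > 0")
    lower = page * size
    upper = lower + size
    return upper, lower


print(rownum_bounds(1, 100))  # (200, 100)
```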
The problem with this query is that the innermost query, which fetches data from the table "TABLE_NAME", returns a huge amount of data and makes the overall query take 8 seconds (around 2 million records remain after applying the where clause, and it contains 9 or 10 join clauses).
This is because the innermost query fetches all the data that satisfies the where clause, then the second query takes the first 200 rows, and the third excludes the first 100 to get the second page's data that we need.
Isn't there a way to fetch the second page's data in one query, without all these steps and the performance problems they cause?
Thank you!!
It depends on your sorting options (order by ...): because of your order by clause, the database needs to sort the whole dataset before applying the outer where rownum <= 200.
It will fetch only 200 rows if you remove the order by clause. In some cases Oracle can avoid the sort (for example, if it can use an index to read the requested data in the required order). By the way, Oracle uses an optimized sort for rownum <= N predicates: it doesn't sort the full dataset, it just keeps the top N records.
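The principle behind that top-N optimization can be illustrated outside the database: to get the first N rows in sort order you only need to keep N candidates while scanning, instead of sorting everything. A Python sketch using heapq.nsmallest; it illustrates the idea only and is not Oracle's actual implementation:

```python
import heapq
import random

random.seed(42)
rows = [random.randint(0, 10_000_000) for _ in range(200_000)]
N = 200

# Full sort: orders the whole dataset, then keeps the first N.
top_full_sort = sorted(rows)[:N]

# Top-N: keeps only N candidates in a heap while scanning the data once.
top_n = heapq.nsmallest(N, rows)

print(top_full_sort == top_n)  # True
```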
You can investigate sort operations in more depth with the sort trace event:
alter session set events '10032 trace name context forever, level 10';
Furthermore, it is sometimes better to use analytic functions, like:
select *
from (
select
t1.*
,t2.*
,row_number()over([partition by ...] order by ...) rn
from t1
,t2
where ...
)
where rn <= 200
and rn > 100
because in some specific cases Oracle can transform your query to push sorting and sort filter predicates to the earliest possible steps.
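A runnable sketch of the analytic-function variant, using Python's sqlite3 as a stand-in for Oracle (the table t1 and its columns are hypothetical, and SQLite's optimizer does not perform Oracle's transformations). With the lower bound written as rn > 100, the second page contains exactly rows 101 to 200:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t1 (id, val) VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 1001)])

PAGE_SQL = """
SELECT id, val FROM (
    SELECT t1.*, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM t1
)
WHERE rn > ? AND rn <= ?   -- exclusive lower bound, inclusive upper bound
ORDER BY rn
"""

page = conn.execute(PAGE_SQL, (100, 200)).fetchall()
print(len(page), page[0][0], page[-1][0])  # 100 101 200
```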

Return max by two columns within dataset?

The problem I am facing is that I am trying to query SAP HANA to bring back a list of unique codes that refer to one instance of a change being made to a database. For a bit of background to the image below, each change has a relevant Site ID and Product No. that I am using together as variables in order to find the TS Number for the most recent date.
However, when I use the SELECT MAX(DATAB) function, it forces me to use a GROUP BY clause. But, because I cannot omit the TS Number from the GROUP BY clause, it returns all three.
Is there a way to get the max date, for any given combination of Product No. and Site ID, and only return the TS Number for that date? In this example, it would be fine to use TOP 1 but this is just a scaled-down example from a query that will look at many combinations of Product No. and Site ID (with the desired outcome being a list of all of the TS Numbers that relate to the most recent change for that product/store combination, that I will use for a join to another query).
Any help would be appreciated. If full table design etc. is required so that people can attempt to replicate the problem I will happily provide this but am hoping there's a simple solution I have not thought of...
Many thanks
As in any other SQL database that supports window functions, you can use the row_number() or rank() function to get the desired result. Which one to use depends on how you want to handle ties.
If you want exactly one TS-Number even when there is more than one TS-Number for the same MAXDATE, use the following SQL:
select dat, ts_nr, pr_nr, site
from
(select *, row_number() over ( partition by pr_nr, site order by dat desc ) rownum
from mytab
)
where rownum = 1;
Be aware that the result is non-deterministic when there are ties. However, you can (and in most cases should!) make it deterministic by adding ts_nr to the window's order by clause. Then you get either the highest or the lowest TS-Number for the same MAXDATE, depending on the sort order.
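A small sketch of the deterministic variant, with Python's sqlite3 standing in for SAP HANA; the table and column names follow the answer, and the sample rows are invented. Adding ts_nr DESC to the window's ORDER BY picks the highest TS-Number when two changes share the latest date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab (dat TEXT, ts_nr INTEGER, pr_nr INTEGER, site TEXT)")
conn.executemany("INSERT INTO mytab VALUES (?, ?, ?, ?)", [
    ("2017-01-01", 101, 1, "A"),
    ("2017-03-01", 102, 1, "A"),
    ("2017-03-01", 103, 1, "A"),   # ties with 102 on the latest date
    ("2017-02-01", 201, 2, "B"),
])

LATEST = """
SELECT dat, ts_nr, pr_nr, site FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY pr_nr, site
        ORDER BY dat DESC, ts_nr DESC   -- ts_nr makes the pick deterministic
    ) AS rn
    FROM mytab
)
WHERE rn = 1
ORDER BY pr_nr, site
"""

rows = conn.execute(LATEST).fetchall()
for row in rows:
    print(row)
# ('2017-03-01', 103, 1, 'A')
# ('2017-02-01', 201, 2, 'B')
```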
If you want all TS-Numbers in case there are several TS-Numbers for the same MAXDATE, use rank() instead of row_number(), like this:
select dat, ts_nr, pr_nr, site
from
(select *, rank() over ( partition by pr_nr, site order by dat desc ) ranknum
from mytab
)
where ranknum = 1;
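The same invented sample data with rank() shows the difference: both TS-Numbers that share the latest date for product 1 at site A are returned (sqlite3 again stands in for HANA):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab (dat TEXT, ts_nr INTEGER, pr_nr INTEGER, site TEXT)")
conn.executemany("INSERT INTO mytab VALUES (?, ?, ?, ?)", [
    ("2017-01-01", 101, 1, "A"),
    ("2017-03-01", 102, 1, "A"),
    ("2017-03-01", 103, 1, "A"),   # same latest date as 102
    ("2017-02-01", 201, 2, "B"),
])

ALL_LATEST = """
SELECT dat, ts_nr, pr_nr, site FROM (
    SELECT *, RANK() OVER (PARTITION BY pr_nr, site ORDER BY dat DESC) AS ranknum
    FROM mytab
)
WHERE ranknum = 1              -- every row tied for the latest date gets rank 1
ORDER BY pr_nr, site, ts_nr
"""

rows = conn.execute(ALL_LATEST).fetchall()
for row in rows:
    print(row)
```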

A fast query that selects the number of rows in each table

I want a query that selects the number of rows in each table,
but the statistics are NOT up to date, so a query like this will not be accurate:
select table_name, num_rows from user_tables
I want to query several schemas, each with at least 500 tables, some of which contain a lot of columns. Updating the statistics for all of them would take days.
On the Ask Tom site, Tom suggests a function that includes this query:
'select count(*)
from ' || p_tname INTO l_columnValue;
A count(*) query like this is really slow, so it will not give me fast results.
Is there a query that gives me the number of rows in a table quickly?
You said in a comment that you want to delete (drop?) empty tables. If you don't want an exact count but only want to know whether a table is empty, you can do a shortcut count:
select count(*) from table_name where rownum < 2;
The optimiser will stop when it reaches the first row - the execution plan shows a 'count stopkey' operation - so it will be fast. It will return zero for an empty table, and one for a table with any data - you have no idea how much data, but you don't seem to care.
You still have a slight race condition between the count and the drop, of course.
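SQLite has no ROWNUM, but the same short-circuit can be sketched with LIMIT 1, which also stops after the first row. A Python sketch with hypothetical tables; the helper name `has_rows` is illustrative:

```python
import sqlite3


def has_rows(conn: sqlite3.Connection, table: str) -> int:
    """Return 1 if the table has at least one row, else 0 (like count(*) ... rownum < 2)."""
    # Table names cannot be bound as parameters; only pass trusted identifiers here.
    sql = f'SELECT count(*) FROM (SELECT 1 FROM "{table}" LIMIT 1)'
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empty_t (x INTEGER)")
conn.execute("CREATE TABLE full_t (x INTEGER)")
conn.executemany("INSERT INTO full_t VALUES (?)", [(i,) for i in range(10_000)])
print(has_rows(conn, "empty_t"), has_rows(conn, "full_t"))  # 0 1
```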
This seems like a very odd thing to want to do: either your application uses the table, in which case dropping it will break something even if it's empty; or it doesn't, in which case it shouldn't matter whether it has (presumably redundant) data, and it can be dropped regardless. If you think there might be confusion, that sounds like your source control (including DDL) needs some work, maybe?
To check whether either of the two tables (same name, two schemas) has a row, just count from both of them, either with a union:
select max(c) from (
select count(*) as c from schema1.table_name where rownum < 2
union all
select count(*) as c from schema2.table_name where rownum < 2
);
... or with greatest and two sub-selects, e.g.:
select greatest(
(select count(*) from schema1.table_name where rownum < 2),
(select count(*) from schema2.table_name where rownum < 2)
) from dual;
Either would return one if either table has any rows, and would return zero only if they were both empty.
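Sketching the union all variant with sqlite3, where ATTACH DATABASE plays the role of the two schemas (schema1, schema2 and table_name are the placeholders from the answer; the data is invented, and LIMIT 1 replaces rownum < 2):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS schema1")
conn.execute("ATTACH DATABASE ':memory:' AS schema2")
conn.execute("CREATE TABLE schema1.table_name (x INTEGER)")
conn.execute("CREATE TABLE schema2.table_name (x INTEGER)")

EITHER_HAS_ROWS = """
SELECT max(c) FROM (
    SELECT count(*) AS c FROM (SELECT 1 FROM schema1.table_name LIMIT 1)
    UNION ALL
    SELECT count(*) AS c FROM (SELECT 1 FROM schema2.table_name LIMIT 1)
)
"""

before = conn.execute(EITHER_HAS_ROWS).fetchone()[0]  # both empty -> 0
conn.execute("INSERT INTO schema2.table_name VALUES (1)")
after = conn.execute(EITHER_HAS_ROWS).fetchone()[0]   # one non-empty -> 1
print(before, after)  # 0 1
```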
Full Disclosure: I had originally suggested a query that specifically counts a column that's (a) indexed and (b) not null. @AlexPoole and @JustinCave pointed out (please see their comments below) that Oracle will optimize a COUNT(*) to do this anyway. As such, this answer has been altered significantly.
There's a good explanation here for why User_Tables shouldn't be used for accurate row counts, even when statistics are up to date.
If your tables have indexes which can be used to speed up the count by doing an index scan rather than a table scan, Oracle will use them. This will make the counts faster, though not by any means instantaneous. That said, this is the only way I know to get an accurate count.
To check for empty (zero row) tables, please use the answer posted by Alex Poole.
You could make a table to hold the row count of each table. Then, create a trigger on INSERT for each of the tables you're counting that updates the counts table.
You'd also need a trigger for DELETE.
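A minimal sketch of the trigger idea, using sqlite3 (the row_counts and orders tables are hypothetical; Oracle's trigger syntax differs, and a single counter row that every insert and delete updates can become a contention point under concurrent writes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
INSERT INTO row_counts VALUES ('orders', 0);

-- Keep the count in step with every inserted row...
CREATE TRIGGER orders_ins AFTER INSERT ON orders
BEGIN
    UPDATE row_counts SET n = n + 1 WHERE table_name = 'orders';
END;

-- ...and with every deleted row.
CREATE TRIGGER orders_del AFTER DELETE ON orders
BEGIN
    UPDATE row_counts SET n = n - 1 WHERE table_name = 'orders';
END;
""")

conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(i * 1.5,) for i in range(50)])
conn.execute("DELETE FROM orders WHERE id <= 10")

fast = conn.execute("SELECT n FROM row_counts WHERE table_name = 'orders'").fetchone()[0]
exact = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
print(fast, exact)  # 40 40
```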

Oracle, slow performance when using sub select

I have a view that is very slow if you fetch all rows. But if I select a subset (providing an ID in the where clause) the performance is very good. I cannot hardcode the ID so I create a sub select to get the ID from another table. The sub select only returns one ID. Now the performance is very slow and it seems like Oracle is evaluating the whole view before using the where clause. Can I somehow help Oracle so SQL 2 and 3 have the same performance? I’m using Oracle 10g
1 slow
select * from ci.my_slow_view
2 fast
select * from ci.my_slow_view where id = 1;
3 slow
select * from ci.my_slow_view where id in (select id from active_ids)
How about
select * from ci.my_slow_view where id = (select id from active_ids)
Replacing the "in" with an "=" will tell Oracle that you expect the "select id from active_ids" to return only a single row.
This is the expected behavior...
3 is slow because Oracle will perform a "full table scan", which means that your indexes are not helping there (your where clause does not contain any constant or range and is unbounded, which implies that whatever index you use, all the rows are potential candidates for the join condition).
Possible improvement:
First, check that the indexes are OK on your join/pk columns (id in my_slow_view and active_ids). This is necessary for the second step:
Second, gather statistics for your tables (including the tables behind the view), so that the Oracle optimizer knows how small active_ids is and can choose a better plan.
(This should work on the assumption that your active_ids table is small enough to fit entirely in memory.)
Second approach:
Write a stored procedure in PL/SQL that takes the id as an IN parameter, and rewrite your SQL so that it uses the id as a bind parameter.
That should give you the flexibility you need (no hard coded ids), and the speed of the fastest query.
I cannot hardcode the ID so I create a sub select to get the ID from another table. The sub select only returns one ID.
Most likely, gathering statistics on the small table (while it contains a single row) will help, since that should help Oracle realize that it is small and encourage it to use the index on ID.
However, it sounds like this is really not the right solution to your original problem. Generally, when one wants to perform a query repeatedly with a different lookup value, the best way is to use a bind variable. The basic method of doing this in SQL*Plus would be:
SQL> variable id number
SQL> exec :id := 1;
SQL> select * from ci.my_slow_view where id = :id ;
SQL> exec :id := 2;
SQL> select * from ci.my_slow_view where id = :id ;
The details of implementing this depend on the environment you are developing in.
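In application code the same idea is a parameterized statement. A sketch with Python's sqlite3, which uses ? placeholders (Oracle drivers such as python-oracledb use :name placeholders instead); the my_slow_view table here is a hypothetical stand-in for the view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_slow_view (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO my_slow_view VALUES (?, ?)",
                 [(i, f"data {i}") for i in range(1, 101)])

# One statement text reused with different values: the parsed statement can be
# reused, and the id is never concatenated into the SQL string.
QUERY = "SELECT * FROM my_slow_view WHERE id = ?"

for active_id in (1, 2):
    rows = conn.execute(QUERY, (active_id,)).fetchall()
    print(rows)
# [(1, 'data 1')]
# [(2, 'data 2')]
```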
Or:
select * from ci.my_slow_view, active_ids
where my_slow_view.id = active_ids.id;
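A quick check with sqlite3 and invented data that the join form returns the same view rows as the IN form (selecting only my_slow_view.* so the column lists match), plus one caveat: if active_ids ever contained the same id twice, the join would return duplicate rows while IN would not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_slow_view (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE active_ids (id INTEGER);
""")
conn.executemany("INSERT INTO my_slow_view VALUES (?, ?)",
                 [(i, f"data {i}") for i in range(1, 11)])
conn.execute("INSERT INTO active_ids VALUES (3)")

IN_FORM = "SELECT v.* FROM my_slow_view v WHERE v.id IN (SELECT id FROM active_ids)"
JOIN_FORM = ("SELECT my_slow_view.* FROM my_slow_view, active_ids "
             "WHERE my_slow_view.id = active_ids.id")

same = conn.execute(IN_FORM).fetchall() == conn.execute(JOIN_FORM).fetchall()
print(same)  # True

conn.execute("INSERT INTO active_ids VALUES (3)")  # duplicate id
n_in = len(conn.execute(IN_FORM).fetchall())
n_join = len(conn.execute(JOIN_FORM).fetchall())
print(n_in, n_join)  # 1 2
```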

Oracle command hangs when using view for "WHERE x IN..." subquery

I'm working on a web service that fetches data from an oracle data source in chunks and passes it back to an indexing/search tool in XML format. I'm the C#/.NET guy, and am kind of fuzzy on parts of Oracle.
Our Oracle team gave us the following script to run, and it works well:
SELECT ROWID, [columns]
FROM [table]
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT ROWID
FROM [table]
WHERE ROWID > '[previous_batch_last_rowid]'
ORDER BY ROWID
)
WHERE ROWNUM <= 10000
)
ORDER BY ROWID
10,000 rows is an arbitrary but reasonable chunk size and ROWID is sufficiently unique for our purposes to use as a UID since each indexing run hits only one table at a time. Bracketed values are filled in programmatically by the web service.
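The Oracle team's script is keyset pagination: each batch starts after the last key of the previous batch, so the only state carried between requests is that last key. A sketch of the calling loop with SQLite's rowid standing in for Oracle's ROWID; the docs table and the batch size of 3 are illustrative, and SQLite's LIMIT replaces the ROWNUM subquery:

```python
import sqlite3
from typing import List, Optional, Tuple


def fetch_batch(conn: sqlite3.Connection, last_rowid: Optional[int], size: int) -> List[Tuple]:
    """Return the next `size` rows whose rowid is greater than last_rowid."""
    # SQLite assigns rowids starting at 1 here, so 0 means "from the beginning".
    start = last_rowid if last_rowid is not None else 0
    return conn.execute(
        "SELECT rowid, name FROM docs WHERE rowid > ? ORDER BY rowid LIMIT ?",
        (start, size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (name TEXT)")
conn.executemany("INSERT INTO docs (name) VALUES (?)", [(f"doc {i}",) for i in range(10)])

seen = []
last = None
while True:
    batch = fetch_batch(conn, last, size=3)
    if not batch:
        break
    seen.extend(row[0] for row in batch)
    last = batch[-1][0]  # the only state handed back to the caller
print(len(seen), seen[:4])  # 10 [1, 2, 3, 4]
```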
Now we're going to start adding views to the indexing, each of which will union a few separate tables. Since ROWID would no longer function as a unique identifier, they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union.
But this script does not work, even though it follows the same form as the previous one:
SELECT VIEW_UNIQUE_ID, [columns]
FROM [view]
WHERE VIEW_UNIQUE_ID IN (
SELECT VIEW_UNIQUE_ID
FROM (
SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID
)
WHERE ROWNUM <= 10000
)
ORDER BY VIEW_UNIQUE_ID
It hangs indefinitely with no response from the Oracle server. I've waited 20+ minutes and the SQLTools dialog box indicating a running query remains the same, with no progress or updates.
I've tested each subquery independently and each works fine and takes a very short amount of time (<= 1 second), so the view itself is sound. But as soon as the inner two SELECT queries are added with "WHERE VIEW_UNIQUE_ID IN...", it hangs.
Why doesn't this query work for views? In what important way are they not interchangeable here?
Updated: the architecture of the solution stipulates that it is to be stateless, so I shouldn't try to make the web service preserve any index state information between requests from consumers.
they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union.
God, that is the most obscene idea I've seen in a long time.
Let's say the view is a simple one like
SELECT C.CUST_ID, C.CUST_NAME, O.ORDER_ID, C.ROWID||':'||O.ROWID VIEW_UNIQUE_ID
FROM CUSTOMER C JOIN ORDERS O ON C.CUST_ID = O.CUST_ID -- ORDER is a reserved word, hence ORDERS
Every time you run the
SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID
It has to build that entire result set, apply the filter, and order it. For anything other than trivially sized tables, that will be a nightmare.
Stop using the database to paginate/chunk the data here and do it in the client. Open the database connection, execute the query, fetch the first ten thousand rows, index them, then fetch the next ten thousand. Don't close and reopen the query each time; close it only after you've processed every row. You'll be able to forget about ordering.
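With Python's DB-API the suggested approach looks roughly like this: one query, one open cursor, and fetchmany() pulling fixed-size chunks until it returns an empty list. sqlite3 stands in for the Oracle driver, the chunk size is reduced to 10 for the demo, and index_batch is a hypothetical placeholder for the indexing call:

```python
import sqlite3
from typing import List, Tuple


def index_batch(rows: List[Tuple]) -> int:
    """Hypothetical stand-in for handing one chunk to the search indexer."""
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v (view_unique_id TEXT, body TEXT)")
conn.executemany("INSERT INTO v VALUES (?, ?)", [(f"id{i}", f"body {i}") for i in range(25)])

cur = conn.cursor()
cur.execute("SELECT view_unique_id, body FROM v")  # executed once, no ORDER BY needed
indexed = 0
while True:
    chunk = cur.fetchmany(10)  # next 10 rows from the same open result set
    if not chunk:
        break
    indexed += index_batch(chunk)
cur.close()
print(indexed)  # 25
```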
For stateless, you need to re-architect. The whole thing with concatenated ROWIDs will not fly.
Start by putting the records to be processed into a fresh table, then you can flag them/process them/delete them in chunks.
INSERT INTO pending_table
SELECT 'N' state_flag, v.* FROM view v;
<start looping here>
UPDATE pending_table
SET state_flag = 'P'
WHERE ROWNUM < 10000;
COMMIT;
SELECT * FROM pending_table
WHERE state_flag = 'P';
<client processing>
DELETE FROM pending_table
WHERE state_flag = 'P';
<go back to start of loop, and keep going until pending_table is empty>
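A runnable sketch of the pending-table loop with sqlite3. SQLite has no ROWNUM, so each chunk is picked with a rowid subquery and LIMIT; the doc_id column, the chunk size of 4, and the in-memory "processing" are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pending_table (state_flag TEXT, doc_id INTEGER)")
conn.executemany("INSERT INTO pending_table VALUES ('N', ?)", [(i,) for i in range(10)])
conn.commit()

processed = []
CHUNK = 4
while True:
    # Flag the next chunk as in progress ('P'); stands in for ... WHERE ROWNUM < n.
    conn.execute("""
        UPDATE pending_table SET state_flag = 'P'
        WHERE rowid IN (SELECT rowid FROM pending_table
                        WHERE state_flag = 'N' LIMIT ?)""", (CHUNK,))
    conn.commit()
    batch = conn.execute(
        "SELECT doc_id FROM pending_table WHERE state_flag = 'P'").fetchall()
    if not batch:
        break  # pending_table is empty
    processed.extend(doc_id for (doc_id,) in batch)  # <client processing>
    conn.execute("DELETE FROM pending_table WHERE state_flag = 'P'")
    conn.commit()

print(sorted(processed) == list(range(10)))  # True
```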
