I have a simple Oracle query that I need to rewrite to run on PostgreSQL with the same output as below:
Select X,Y FROM table_name order by Y
This is for the case where the table contains only the data below.
Here is the difference between PG and Oracle in how they order the data.
Do you have any idea why this difference occurs?
Different Default ordering
There is no such thing as "default ordering" - neither in Oracle nor in Postgres (or in any other relational database). Tables in a relational database represent un-ordered sets.
You are sorting on a column that contains the same value for both (all) rows. This is essentially the same as not sorting at all, because you have not defined any sort criteria to break those ties. Without an additional sort column the database is free to return the rows with the same sort value in any order it likes.
If you want the rows sorted by column x, you need to include that column in the order by:
select X,Y
FROM table_name
order by x,y;
or maybe you want order by y,x - it's not clear from your question (and the hardly readable screen shots)
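To make the tie-breaking point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for Oracle or Postgres (that substitution is an assumption, and the rows are made up because the original data was only in the screenshots). Every row has the same y, so only the extra sort column pins down the order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (x TEXT, y INTEGER)")

# Hypothetical rows: all share the same y, so ORDER BY y alone cannot rank them.
conn.executemany(
    "INSERT INTO table_name (x, y) VALUES (?, ?)",
    [("b", 1), ("c", 1), ("a", 1)],
)

# Ties on y are broken by x, so the result order is fully determined.
rows = conn.execute("SELECT x, y FROM table_name ORDER BY y, x").fetchall()
print(rows)
```

With only ORDER BY y, any permutation of these three rows would be a valid answer from either database.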
Related
With an Oracle 11g database, in a select query with an order by on one column, what is the ordering behavior for records having the same value?
I have found no clear information and it seems Oracle did not define any default behavior.
There is no defined behavior for rows with equal values in the column(s) you're ordering by. The database is free to return them in any internal order, much like there is no defined behavior for the order of rows returned by a query without an order by clause. Note that this means that subsequent executions of the same query may return the results in a different order.
My sample oracle query structure is this:
SELECT <LIST_OF_COLUMNS> FROM <TABLE_NAME>
WHERE COLUMN_01 = <SOMETHING> AND COLUMN_02 = <SOMETHING> AND COLUMN_03 = <SOMETHING>
The table has over 1 million records. I have indexed COLUMN_01, COLUMN_02 and COLUMN_03 separately. The above query works fine and provides the expected results.
If I make COLUMN_01, COLUMN_02 and COLUMN_03 (all columns in WHERE clause) as composite index without changing existing indexes, will it improve performance? If so, is there an order for composite index columns?
If I use OR instead of AND like this query, will it improve performance?
SELECT <LIST_OF_COLUMNS> FROM <TABLE_NAME>
WHERE COLUMN_01 = <SOMETHING> OR COLUMN_02 = <SOMETHING> OR COLUMN_03 = <SOMETHING>
If you create a composite index based on the three columns AND
you never query the table using only one column in the WHERE
clause, you won't need the single column indexes. Otherwise, it
depends on what columns participate in a query. And this case should
be well analyzed and tested.
Column order in a composite index does matter. Columns should be ordered by uniqueness, with the least distinct column first. That helps trim down the number of rows matching the query predicate and thus speeds up the query.
It should also be noted that Oracle can use a composite index for queries that do not contain all the index columns in their predicates.
For example:
create index idx1 on table_name (col1, col2, col3);
/*In this query Oracle can use index idx1 as a standard one-column index,
because col1 is the first column in the index*/
select * from table_name where col1 = 'some_value';
/*Here Oracle can still use the composite index,
but in this case it will use INDEX SKIP SCAN (assuming col1 equals ANY value),
which is slower than an ordinary index access*/
select * from table_name where col2 = 'some_value1' and col3 = 'some_value2';
OR or AND operators do not really matter here. What is more important is the number of rows which match the given predicate.
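The leading-column behaviour can be checked with a small experiment. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, which is an assumption made for portability: SQLite's planner is not Oracle's, so the second plan is only indicative and will not show an INDEX SKIP SCAN. The payload column is hypothetical and is there so that the index does not cover the whole row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (col1 TEXT, col2 TEXT, col3 TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx1 ON table_name (col1, col2, col3)")


def plan(sql):
    """Return the planner's description lines for a query (SQLite-specific)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


# The predicate is on the leading column, so idx1 can be used directly.
leading_plan = plan("SELECT * FROM table_name WHERE col1 = 'some_value'")

# The leading column is missing from the predicate.
non_leading_plan = plan(
    "SELECT * FROM table_name "
    "WHERE col2 = 'some_value1' AND col3 = 'some_value2'"
)

print(leading_plan)
print(non_leading_plan)
```

On Oracle itself the equivalent check would be to look at the execution plan of each query.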
If I make COLUMN_01, COLUMN_02 and COLUMN_03 (all columns in WHERE clause) as composite index without changing existing indexes, will it improve performance?
Probably. One index which satisfies all WHERE criteria serves as a complete access path and hence is more effective than a single column index access path. The optimizer chooses one index, so it will index read all the rows matching (say) COLUMN_02 criterion and filter those rows using the other columns' criteria.
The price you pay for this improvement in performance is the overhead of maintaining an additional index. So you should consider whether you need all three single column indexes (for other queries).
is there an order for composite index columns?
Yes. Put them in ascending order of distinct values. The leading index column should be the least discriminating column. Having a unique key as the leading column is probably a disaster, although there are edge cases, so be sure to benchmark.
If I use OR instead of AND like this query, will it improve performance?
You're going to be returning more rows, which in itself is more work. It is also hard to use indexes in such a situation, so most likely you're facing a Full Table Scan. But why not try it and see what happens?
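The "more rows" point is easy to see on a toy table. This is a minimal sketch in Python with sqlite3 standing in for Oracle (an assumption); the table t and its 27 generated rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (column_01 INTEGER, column_02 INTEGER, column_03 INTEGER)"
)

# Hypothetical data: every combination of the values 0..2 in three columns.
rows = [(a, b, c) for a in range(3) for b in range(3) for c in range(3)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# AND keeps only rows where all three predicates hold.
and_count = conn.execute(
    "SELECT COUNT(*) FROM t "
    "WHERE column_01 = 1 AND column_02 = 1 AND column_03 = 1"
).fetchone()[0]

# OR keeps every row where at least one predicate holds.
or_count = conn.execute(
    "SELECT COUNT(*) FROM t "
    "WHERE column_01 = 1 OR column_02 = 1 OR column_03 = 1"
).fetchone()[0]

print(and_count, or_count)
```

The two queries answer different questions, so comparing their speed only makes sense if either result set is acceptable.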
If I make COLUMN_01, COLUMN_02 and COLUMN_03 (all columns in WHERE clause) as composite index without changing existing indexes, will it improve performance?
For this query: likely. For INSERT/UPDATE/DELETE: the performance will deteriorate.
So you'll need to measure and see whether improvement in some queries justifies the deterioration in others.
If so, is there an order for composite index columns?
Not for this query. If you prefix-compress the index, you may choose the order that compresses the best, otherwise it shouldn't matter much.
However, there may be other queries that use only some of the indexed columns, in which case you'd want to make sure the columns that are actually used are at the leading edge of the index.
If I use OR instead of AND like this query, will it improve performance?
No. Separate indexes (that you already have) are what is needed in this case.
I have a few tables with about 17M rows that all have a date column I would like to be able to utilize frequently for searches. I am considering either just throwing an index on the column and see how things go or sorting the items by date as a one time operation and then inserting everything into a new table so that the primary key ascends as the date ascends.
Since these are both pretty time consuming I thought it might be worth it to ask here first for input.
The end goal is for me to load sql queries into pandas for some analysis if that is relevant here.
The index on a date column makes sense when you are going to search the table for a given date(s), e.g.:
select * from test
where the_date = '2016-01-01';
-- or
select * from test
where the_date between '2016-01-01' and '2016-01-31';
-- etc
For these queries it does not matter whether the sort order of the primary key matches that of the date column. Hence rewriting the data into a new table would be useless. Just create an index.
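A quick way to convince yourself is to create the index and look at the plan for the range query. The sketch below uses Python's sqlite3 module rather than Postgres (an assumption, chosen so it runs without a server); the index name idx_the_date, the payload column and the sample rows are hypothetical, and dates are stored as ISO-8601 text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, the_date TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_the_date ON test (the_date)")

# Hypothetical rows, deliberately not inserted in date order.
conn.executemany(
    "INSERT INTO test (the_date, payload) VALUES (?, ?)",
    [
        ("2016-02-01", "d"),
        ("2016-01-15", "c"),
        ("2015-12-31", "a"),
        ("2016-01-01", "b"),
    ],
)

query = (
    "SELECT id, the_date, payload FROM test "
    "WHERE the_date BETWEEN '2016-01-01' AND '2016-01-31'"
)

# Ask the planner how it would run the range query.
plan = " | ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)
)
matches = conn.execute(query + " ORDER BY the_date").fetchall()

print(plan)
print(matches)
```

The physical insert order plays no role in which rows are found; the index alone locates the range.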
However, if you are going to use the index only in ORDER BY:
select * from test
order by the_date;
then a primary key integer index may be significantly (2-4 times) faster than an index on a date column.
Postgres supports clustered indexes to some extent, which is what you are suggesting by removing and reinserting the data.
In fact, removing and reinserting the data in the order you want will not change the time the query takes. Postgres does not know the order of the data.
If you know that the table's data does not change, then cluster the data based on the index you create.
This operation reorders the table based on the order in the index. It is very effective until you update the table. The syntax is:
CLUSTER tableName USING IndexName;
See the manual for details.
I also recommend you use
explain <query>;
to compare two queries, before and after an index. Or before and after clustering.
I have a massive table that I can't partition or sub-partition any further, and I am not allowed to alter it. I want to query its records in batches, and thought a good way would be to use the last two digits of the account numbers (no other field would split the records as evenly).
I guess I'd need to at least index that somehow (remember I can't alter table to add a virtual column either).
Is there any kind of index to be used in such situation?
I am using Oracle 11gR2
You can use a function-based index:
create index two_digits_idx on table_name (substr(account_number, -2));
This index will only be used by queries like this:
select ...
from table_name t ...
where substr(account_number, -2) = '25' -- or any other two digits
For the index to be used, the query must contain exactly the same expression as the index definition.
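The same idea can be tried out without an Oracle instance. The sketch below uses Python's sqlite3 module, which also supports indexes on expressions (using SQLite instead of Oracle 11gR2 is an assumption); the balance column and the account numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (account_number TEXT, balance INTEGER)")

# Index the last two characters of the account number, not the column itself.
conn.execute(
    "CREATE INDEX two_digits_idx ON table_name (substr(account_number, -2))"
)

conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("1000125", 10), ("1000226", 20), ("1000325", 30)],
)

# The WHERE clause repeats the indexed expression verbatim.
query = (
    "SELECT account_number FROM table_name "
    "WHERE substr(account_number, -2) = '25'"
)

plan = " | ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)
)
batch = sorted(row[0] for row in conn.execute(query))

print(plan)
print(batch)
```

Looping over the suffixes '00' to '99' with the same query shape would then give one batch per suffix.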
How can I determine if an Oracle index is clustered or unclustered?
I've done
select FIELD from TABLE where rownum <100
where FIELD is the field on which is built the index. I have ordered tuples, but the result is wrong because the index is unclustered.
By default all indexes in Oracle are unclustered. The only clustered indexes in Oracle are the Index-Organized tables (IOT) primary key indexes.
You can determine if a table is an IOT by looking at the IOT_TYPE column in the ALL_TABLES view (its primary key could be determined by querying the ALL_CONSTRAINTS and ALL_CONS_COLUMNS views).
Here are some reasons why your query might return ordered rows:
1. Your table is index-organized and FIELD is the leading part of its primary key.
2. Your table is heap-organized but the rows happen to be ordered by FIELD; this sometimes happens with an incrementing identity column.
Case 2 will return sorted rows only by chance. The order of the inserts is not guaranteed. Furthermore, Oracle is free to reuse old blocks if some happen to have available space in the future, disrupting the fragile ordering.
Case 1 will return ordered rows most of the time. However, you shouldn't rely on it, since the order of the rows returned depends on the algorithm of the access path, which may change in the future (or if you change a DB parameter, especially parallelism).
In both cases, if you want ordered rows you should supply an ORDER BY clause:
SELECT field
FROM (SELECT field
FROM TABLE
ORDER BY field)
WHERE rownum <= 100;
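The principle is to sort first and only then cut off the result. As a rough illustration, here is a sketch in Python with sqlite3, where LIMIT plays the role that ROWNUM plays in the Oracle query above (SQLite has no ROWNUM, so this is an analogy under that assumption); the table t and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field INTEGER)")

# Hypothetical values, inserted out of order on purpose.
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in (5, 1, 4, 2, 3)])

# Sort first, then keep only the first three rows of the sorted result.
top_three = [
    row[0]
    for row in conn.execute("SELECT field FROM t ORDER BY field LIMIT 3")
]
print(top_three)
```

Without the ORDER BY, the three rows returned would be whichever ones the engine happened to read first.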
There is no concept of a "clustered index" in Oracle as in SQL Server and Sybase. There is an Index-Organized Table, which is similar but not the same.
"Clustered" indices, as implemented in Sybase, MS SQL Server and possibly others, where rows are physically stored in the order of the indexed column(s) don't exist as such in Oracle. "Cluster" has a different meaning in Oracle, relating, I believe, to the way blocks and tables are organized.
Oracle does have "Index Organized Tables", which are physically equivalent, but they're used much less frequently because the query optimizer works differently.
The closest I can get to an answer to the identification question is to try something like this:
SELECT IOT_TYPE FROM user_tables
WHERE table_name = '<your table name>'
My 10g instance reports IOT or null accordingly.
Index Organized Tables have to be organized on the primary key. Where the primary key is a sequence generated value this is often useless or even counter-productive (because simultaneous inserts get into conflict for the same block).
Single table clusters can be used to group data with the same column value in the same database block(s). But they are not ordered.