Will the index be used when UPPER() is applied to the variable first? - Oracle

I may have encountered a full table scan in an Oracle database. I can't execute the EXPLAIN PLAN command in the database; simply put, I don't have the permission.
And I'm trying to figure out the following question.
Suppose I have an index on NAME in a table, and I run this query:
select OID
from table
where NAME=UPPER(v1)
and TYPE=v2
and PID=v3
and OID<>v4
and PID=v5
(v1 is a variable)
Will Oracle use the index on NAME to select OID?
I have read material saying that the NAME index won't be used when a function appears in the WHERE condition. But UPPER() seems like a special function, so I'm not quite sure whether that material applies here.
And here is a second question, following the answer from #mathguy:
If I create an index using create index INDEX_NAME on table(upper(NAME));
will the query:
select OID,PID
from table
where PID=v1
and NAME=UPPER(v2)
use the index INDEX_NAME?
Or will the index be used after all, and the query is simply inefficient, which is why it takes so long to execute?

If you have an index on NAME, then the optimizer MAY use the index in the example you gave. It may choose not to use it (for example, if it estimates that a relatively large fraction of rows will be returned anyway); but if, say, only 0.1% of rows would be returned, the index will certainly be used. (If that still doesn't happen, make sure statistics are up to date.)
What will prevent the use of an index is wrapping NAME itself inside UPPER(). What happens on the right-hand side - whether you have v1 or UPPER(v1) or even a much more complicated expression - is irrelevant, as long as NAME doesn't also appear in that expression on the right-hand side.
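To make the distinction concrete, here is a minimal sketch; the table and index names are hypothetical stand-ins for the ones in the question:
-- Plain index on NAME; UPPER() only touches the bind variable.
create index name_idx on my_table (name);

select oid
from my_table
where name = UPPER(:v1)   -- the index on NAME is still a candidate
and pid = :v3;

-- Wrapping the column itself disables the plain index:
-- where UPPER(name) = UPPER(:v1)   -- needs a function-based index instead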

Perhaps this will help...
In Oracle, you can create an index on a function (a function-based index), so if you created your index on the expression UPPER(NAME) instead of just NAME, Oracle may be more likely to use the index (although it still might choose not to, depending on other factors).
Here's a link that describes function indexes
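As a hedged sketch (names hypothetical), the function-based index pairs with a query whose predicate matches the indexed expression exactly:
create index upper_name_idx on my_table (UPPER(name));

select oid, pid
from my_table
where UPPER(name) = UPPER(:v2)   -- matches the indexed expression
and pid = :v1;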

Related

Indexing for the columns in ORACLE

I have the query below. Because of the huge amount of data in the MATTER table, the LIKE statement takes a very long time to execute, so I was thinking of using a CONTEXT index and CONTAINS.
Should I index only MATTER_TITLE, or some other columns as well, based on the select query below?
Inputs highly appreciated
SELECT DISTINCT dm.MATTER_SEQ
FROM MATTER dm
,MATTER_TYPE dmt
,MATTER_SUBTYPE dms
,STATUS ds
,FILING df
WHERE dm.MATTER_TYPE_SEQ=dmt.MATTER_TYPE_SEQ
AND dm.MATTER_SUBTYPE_SEQ=dms.MATTER_SUBTYPE_SEQ
AND dm.STATUS_CODE NOT IN ('abc','jkl','xyz')
AND dm.STATUS_CODE = DS.STATUS_CODE
AND dm.IS_EXTERNAL='1'
AND dm.IS_DELETED='0'
AND dm.MATTER_SEQ = df.MATTER_SEQ
AND trunc(dm.CREATED_DATE) between '01-NOV-95' AND '02-OCT-18'
AND upper(dm.MATTER_TITLE) like(upper (q'{%jdasuidhajsndjahs%}'))
It sounds like you're already aware that LIKE with a leading wildcard ('%ABC') is notoriously inefficient since it typically can't use indexes and does a full table scan.
If the other optimizing suggestions don't help much, you probably would see better performance with a Context index. Be sure to set the SUBSTRING_INDEX preference so it'll specifically prepare the index for infix searches like yours. See this Ask Tom for more details. (If you will also have wildcards in the middle of strings ('ABC%DEF'), you might also want to set the PREFIX options.)
begin
  ctx_ddl.create_preference('SUBSTRING_PREF', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('SUBSTRING_PREF', 'SUBSTRING_INDEX', 'TRUE');
end;
/
create index matter_title_idx on MATTER(MATTER_TITLE)
  indextype is ctxsys.context
  parameters ('wordlist SUBSTRING_PREF');
Also note that Context indexes are case-insensitive by default, so you don't need to do UPPER(). I haven't tried using q'' literals with contains, so I'm not sure how this'll work.
AND CONTAINS(dm.MATTER_TITLE, q'{%jdasuidhajsndjahs%}') > 0
Try creating function-based indexes on upper(dm.MATTER_TITLE) and on trunc(dm.CREATED_DATE).
I am also assuming that the columns in the join conditions already have indexes. If not, index them.
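A minimal sketch of those two function-based indexes (the index names are made up):
create index matter_title_upper_idx on MATTER (upper(MATTER_TITLE));
create index matter_created_trunc_idx on MATTER (trunc(CREATED_DATE));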

Adding Index To A Column Having Flag Values

I am a novice at tuning Oracle queries, so I need some help.
If I have a SQL query like:
select a.ID,a.name.....
from a,b,c
where a.id=b.id
and ....
and b.flag='Y';
then will adding an index on the FLAG column of table b help tune the query in any way? The FLAG column has only two values, Y and N.
With a standard b-tree index, the SQL engine can quickly find the row or rows matching the specified value thanks to the index's sorted tree structure, then use the physical address (the rowid) stored in the index to access the desired row in a second hop. It's like looking in the index of a book to find the page number. So that is:
Go to index with the key value you want to look up.
The index tells you the physical address in the table.
Go straight to that physical address.
That is nice and quick for something like a unique customer ID. It's still OK for something nonunique, like a customer ID in a table of orders, although the database has to go through the index entries and for each one go to the indicated address. That can still be faster than slogging through the entire table from top to bottom.
But for a column with only two distinct values, you can see that it is going to be more work going through all of the index entries for 'Y' for example, and for each one going to the indicated location in the table, than it would be to just forget the index and scan the whole table in one shot.
That's unless the values are unevenly distributed. If there are a million Y rows and ten N rows, then an index will help you find those N rows fast, but it will be no use for Y.
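If the distribution really is that skewed, the optimizer needs column statistics to see it. A hedged sketch using the standard DBMS_STATS package (the table name comes from the question):
-- Gather a histogram on the skewed FLAG column so the optimizer
-- can see how uneven the Y/N distribution is.
begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'B',
    method_opt => 'for columns flag size 2'
  );
end;
/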
Adding an index to a column with only 2 values normally isn't very useful, because Oracle might just as well do a full table scan.
From your query it looks like it would be more useful to have an index on id, because that would help with the join a.id=b.id.
If you really want to get into tuning, then learn to use "explain plan", as it will give you some indication of how much work Oracle needs to do for a query. Add (or remove) an index, then rerun the explain plan.
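For example, a minimal sketch of that workflow, using a simplified two-table version of the query in the question:
explain plan for
select a.id, a.name
from a, b
where a.id = b.id
and b.flag = 'Y';

select * from table(dbms_xplan.display);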

PostgreSQL: Create index on length of all table fields

I have a table called profile, and I want to order the rows by which ones are the most filled out. Each of the columns is either a JSONB column or a TEXT column. I don't need this to a great degree of accuracy, so typically I've ordered as follows:
SELECT * FROM profile ORDER BY LENGTH(CONCAT(profile.*)) DESC;
However, this is slow, so I want to create an index. But this does not work:
CREATE INDEX index_name ON profile (LENGTH(CONCAT(*)));
Nor does
CREATE INDEX index_name ON profile (LENGTH(CONCAT(CAST(* AS TEXT))))
Can't say I'm surprised. What is the right way to declare this index?
To measure the size of the row in text representation you can just cast the whole row to text, which is much faster than concatenating individual columns:
SELECT length(profile::text) FROM profile;
But there are 3 (or 4) issues with this expression in an index:
The syntax shorthand profile::text is not accepted in CREATE INDEX; you need to add extra parentheses or fall back to the standard syntax cast(profile AS text).
Still the same problem that #jjanes already covered: only IMMUTABLE functions are allowed in index expressions, and casting a row type to text does not meet this requirement. You could build a fake IMMUTABLE wrapper function, like Jeff outlined.
There is an inherent ambiguity (which applies to Jeff's answer as well!): if you have a column with the same name as the table (a common case), you cannot reference the row type in CREATE INDEX, since the identifier resolves to the column name first.
A minor difference from your original: this adds column separators, row decorators and possibly escape characters to the text representation. It shouldn't matter much for your use case.
However, I would suggest a more radical alternative as a crude indicator of row size: pg_column_size(). It is even shorter and faster and avoids issues 1, 3 and 4:
SELECT pg_column_size(profile) FROM profile;
Issue 2 remains, though: pg_column_size() is also only STABLE. You can create a simple and cheap SQL wrapper function:
CREATE OR REPLACE FUNCTION pg_column_size(profile)
RETURNS int LANGUAGE sql IMMUTABLE AS
'SELECT pg_catalog.pg_column_size($1)';
and then proceed like #jjanes outlined. More details:
Does PostgreSQL support "accent insensitive" collations?
Note that I created the function with the row type profile as parameter. Postgres allows function overloading, which is why we can use the same function name. Now, when we feed the matching row type to pg_column_size() our custom function matches more closely according to function type resolution rules and is picked instead of the polymorphic system function. Alternatively, use a separate name and possibly make the function polymorphic as well ...
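A hedged sketch of how that could look end to end, assuming no column shares the table's name (the index name is made up; whether the planner actually uses the index for the ORDER BY depends on the usual costing):
CREATE INDEX profile_size_idx ON profile (pg_column_size(profile));

SELECT * FROM profile ORDER BY pg_column_size(profile) DESC;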
Related:
Is there a way to disable function overloading in Postgres
You can declare a function which is falsely marked "immutable" and build an index on that.
CREATE OR REPLACE FUNCTION len_immut(record)
RETURNS int
LANGUAGE plperl
IMMUTABLE
AS $function$
## This function lies about its immutability.
## Use it with care. It is useful for indexing
## entire table rows.
return length(join ",", values %{$_[0]});
$function$;
and then
create index on profile (len_immut(profile));
SELECT * FROM profile ORDER BY len_immut(profile) DESC;
Since the function is falsely labelled as immutable, the index may become out of date if you do things like add or drop columns on the table, or change the types of columns.

Oracle optimizer is not accepting index hint

When I run the MERGE query below, the index is not used and the query runs very slowly. Please advise.
Index on stage_dim_accounts(rbc_code)
Index on map_rbc_etl(free_code_9)
MERGE INTO stage_dim_accounts t
USING map_rbc_etl s ON (t.rbc_code = s.free_code_9)
WHEN MATCHED THEN UPDATE
SET t.indx_no= s.indx_no
WHERE s.annexure= 'AXN-I'
AND (s.free_code_9 <> 'NA' AND s.free_code_9 <> '0')
AND t.rbc_code <> 'NA'
Thanks in advance
The optimizer is smart enough to know that your indexes are useless.
An index on free_code_9 might be useful if most of the values in that column were either '0' or 'NA'. As you haven't provided any information regarding data volumes or distribution, we can't tell. But you have other restriction criteria on map_rbc_etl, so the database needs to go to the table anyway. My guess is that the optimizer has chosen a full table scan on map_rbc_etl because that's quicker than a huge number of indexed reads.
This is because an indexed read is two operations - read the index, then read the row. So it only pays dividends if the percentage of rows read is tiny. Otherwise it is more efficient to read all the rows and winnow them in memory.
Here is the great "secret" of tuning: indexed reads are not always faster; full table scans are not always bad.
Similar logic applies to reading stage_dim_accounts. The indexed column is unlikely to be selective. Unless ... unless the number of rows in map_rbc_etl is very small and matches only a small subset of rows in stage_dim_accounts. My previous comment about data metrics applies again.
Indexes to use:
create index map_rbc_etl_idx on map_rbc_etl (free_code_9, annexure);
create index stage_dim_accounts_idx on stage_dim_accounts (rbc_code);
Now, these may not be used, for the reasons given in the previous answer.
Additional reasons an index may not be used are:
1. The optimizer decides it would be more efficient not to use the index.
2. The column is referenced through a view and has a function call applied to it. To handle this, use a function-based index.
3. You perform a mathematical operation on the column in the query. Note that you can look at the explain plan and create an index to match how the rows are being accessed.
4. You concatenate columns together in the WHERE clause. Use a function-based index to overcome this.
5. You do not include the first column of a composite index in the WHERE clause of your statement. Note that Oracle 9i and later can do index skip scans and may still use the index.
6. You use an OR clause. In this case it is best to create one index for everything except the OR clause and one for each of the OR values; then the optimizer can use all the indexes appropriately.
If you don't know how to use function-based indexes, here is an example for an UPPER() in the WHERE clause:
create index indexName on tableName (upper(colname));
Any Oracle SQL function (built-in, or user-defined as long as it is declared DETERMINISTIC) can be used in the index.
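As a hedged sketch of the user-defined case (all names hypothetical), the function has to be declared DETERMINISTIC or the CREATE INDEX will fail:
create or replace function strip_spaces(p varchar2)
  return varchar2 deterministic
is
begin
  -- two-argument REPLACE removes all spaces
  return replace(p, ' ');
end;
/
create index tab_strip_spaces_idx on tableName (strip_spaces(colname));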

Full-text search in Postgres or CouchDB?

I took geonames.org data and imported all German cities with all their districts.
If I enter "Hamburg", it lists "Hamburg Center", "Hamburg Airport" and so on. The application is on a closed network with no access to the internet, so I can't use the geonames.org web services and had to import the data. :(
The city with all of its districts works as an autocomplete, so each keystroke results in an XHR request, and so on.
Now my customer has asked whether it is possible to include all the data of the world: about 5,000,000 rows with 45,000,000 alternative names, etc.
Postgres needs about 3 seconds per query, which makes the autocomplete unusable.
Now I'm thinking of CouchDB, which I have already worked with. My question:
I would like to post "Ham" and I want CouchDB to get all documents starting with "Ham". If I enter "Hamburg" I want it to return Hamburg and so forth.
Is CouchDB the right database for this? Which other DBs can you recommend that respond with low latency (maybe in-memory) over millions of records? The dataset doesn't change regularly; it's rather static!
If I understand your problem correctly, probably all you need is already built into CouchDB.
To get a range of documents with names beginning with, e.g., "Ham", you can use a request with a key range: startkey="Ham"&endkey="Ham\ufff0"
If you need a more comprehensive search, you can create a view that emits the names of other places as keys, so you can again query ranges using the technique above.
Here is a view function to make this:
function(doc) {
  for (var name in doc.places) {
    emit(name, doc._id);
  }
}
Also see the CouchOne blog post about CouchDB typeahead and autocomplete search and this discussion on the mailing list about CouchDB autocomplete.
Optimized search with PostgreSQL
Your search is anchored at the start and no fuzzy search logic is required. This is not the typical use case for full text search.
If it gets more fuzzy or your search is not anchored at the start, look here for more:
Similar UTF-8 strings for autocomplete field
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
In PostgreSQL you can make use of advanced index features that should make the query very fast. In particular look at operator classes and indexes on expressions.
1) text_pattern_ops
Assuming your column is of type text, you would use a special index for text pattern operators like this:
CREATE INDEX name_text_pattern_ops_idx
ON tbl (name text_pattern_ops);
SELECT name
FROM tbl
WHERE name ~~ ('Hambu' || '%');
This is assuming that you operate with a database locale other than C - most likely de_DE.UTF-8 in your case. You could also set up a database with locale 'C'. I quote the manual here:
If you do use the C locale, you do not need the xxx_pattern_ops
operator classes, because an index with the default operator class is
usable for pattern-matching queries in the C locale.
2) Index on expression
I'd imagine you would also want to make that search case-insensitive, so let's take another step and make it an index on an expression:
CREATE INDEX lower_name_text_pattern_ops_idx
ON tbl (lower(name) text_pattern_ops);
SELECT name
FROM tbl
WHERE lower(name) ~~ (lower('Hambu') || '%');
To make use of the index, the WHERE clause has to match the index expression.
3) Optimize index size and speed
Finally, you might also want to impose a limit on the number of leading characters to minimize the size of your index and speed things up even further:
CREATE INDEX lower_left_name_text_pattern_ops_idx
ON tbl (lower(left(name,10)) text_pattern_ops);
SELECT name
FROM tbl
WHERE lower(left(name,10)) ~~ (lower('Hambu') || '%');
left() was introduced with Postgres 9.1. Use substring(name, 1,10) in older versions.
4) Cover all possible requests
What about strings with more than 10 characters?
SELECT name
FROM tbl
WHERE lower(left(name,10)) ~~ (lower(left('Hambu678910',10)) || '%')
AND lower(name) ~~ (lower('Hambu678910') || '%');
This looks redundant, but you need to spell it out this way to actually use the index. Index search will narrow it down to a few entries, the additional clause filters the rest. Experiment to find the sweet spot. Depends on data distribution and typical use cases. 10 characters seem like a good starting point. For more than 10 characters, left() effectively turns into a very fast and simple hashing algorithm that's good enough for many (but not all) use cases.
5) Optimize disc representation with CLUSTER
So, the predominant access pattern will be to retrieve a bunch of adjacent rows according to our index lower_left_name_text_pattern_ops_idx. And you mostly read and hardly ever write. This is a textbook case for CLUSTER. The manual:
When a table is clustered, it is physically reordered based on the index information.
With a huge table like yours, this can dramatically improve response time because all rows to be fetched are in the same or adjacent blocks on disk.
First call:
CLUSTER tbl USING lower_left_name_text_pattern_ops_idx;
The information about which index to use is saved, so successive calls will re-cluster the table:
CLUSTER tbl;
CLUSTER; -- cluster all tables in the db that have previously been clustered.
If you don't want to repeat it:
ALTER TABLE tbl SET WITHOUT CLUSTER;
However, CLUSTER takes an exclusive lock on the table. If that's a problem, look into pg_repack or pg_squeeze, which can do the same without exclusive lock on the table.
6) Prevent too many rows in the result
Demand a minimum of, say, 3 or 4 characters for the search string. I add this for completeness, you probably do it anyway.
And LIMIT the number of rows returned:
SELECT name
FROM tbl
WHERE lower(left(name,10)) ~~ (lower('Hambu') || '%')
LIMIT 501;
If your query returns more than 500 rows, tell the user to narrow down his search.
7) Optimize filter method (operators)
If you absolutely must squeeze out every last microsecond, you can utilize operators of the text_pattern_ops family. Like this:
SELECT name
FROM tbl
WHERE lower(left(name, 10)) ~>=~ lower('Hambu')
AND lower(left(name, 10)) ~<=~ (lower('Hambu') || chr(2097151));
You gain very little with this last stunt. Normally, standard operators are the better choice.
If you do all that, search time will be reduced to a matter of milliseconds.
I think a better approach is to keep your data in your database (Postgres or CouchDB) and index it with a full-text search engine like Lucene, Solr or Elasticsearch.
Having said that, there's a project integrating CouchDB with Lucene.
