Indexing for the columns in ORACLE - oracle

I have below query, because of the huge data in the MATTER Table, it is taking huge time for LIKE statement to execute, so I was thinking of using the CONTEXT Index and using CONTAIN.
Shall I do indexing only on Matter_title or some other column as well,. Based on the below select query
Inputs highly appreciated
SELECT DISTINCT dm.MATTER_SEQ
FROM MATTER dm
,MATTER_TYPE dmt
,MATTER_SUBTYPE dms
,STATUS ds
,FILING df
WHERE dm.MATTER_TYPE_SEQ=dmt.MATTER_TYPE_SEQ
AND dm.MATTER_SUBTYPE_SEQ=dms.MATTER_SUBTYPE_SEQ
AND dm.STATUS_CODE NOT IN ('abc','jkl','xyz')
AND dm.STATUS_CODE = DS.STATUS_CODE
AND dm.IS_EXTERNAL='1'
AND dm.IS_DELETED='0'
AND dm.MATTER_SEQ = df.MATTER_SEQ
AND trunc(dm.CREATED_DATE) between '01-NOV-95' AND '02-OCT-18'
AND upper(dm.MATTER_TITLE) like(upper (q'{%jdasuidhajsndjahs%}'))

It sounds like you're already aware that LIKE with a leading wildcard ('%ABC') is notoriously inefficient since it typically can't use indexes and does a full table scan.
If the other optimizing suggestions don't help much, you probably would see better performance with a Context index. Be sure to set the SUBSTRING_INDEX preference so it'll specifically prepare the index for infix searches like yours. See this Ask Tom for more details. (If you will also have wildcards in the middle of strings ('ABC%DEF'), you might also want to set the PREFIX options.)
begin
ctx_ddl.create_preference('SUBSTRING_PREF','BASIC_WORDLIST');
ctx_ddl.set_attribute('SUBSTRING_PREF','SUBSTRING_INDEX','TRUE');
end;
create index matter_title_idx on MATTER(MATTER_TITLE)
indextype is ctxsys.context
parameters ('wordlist SUBSTRING_PREF');
Also note that Context indexes are case-insensitive by default, so you don't need to do UPPER(). I haven't tried using q'' literals with contains, so I'm not sure how this'll work.
AND CONTAINS(dm.MATTER_TITLE, q'{%jdasuidhajsndjahs%}') > 0

Try creating function Indexes upper(dm.MATTER_TITLE) and second trunc(dm.CREATED_DATE).
Also I am considering that the columns in the Join conditions already have indexes. If not have them indexed.

Related

oracle - can I use contain and near with a clob? Need to speed up query

We have a query that takes 48 minutes to run a search on a clob. The query is written as if it is not a clob column and uses contains and near. This search for 3 words within a certain word distance from each other is important. I'm needing to speed this up and want to do an index on the clob, but don't know if that would work and don't fully understand how to do it. I found this from Tom Burleson
http://www.dba-oracle.com/t_clob_search_query.htm OR https://asktom.oracle.com/pls/apex/asktom.search?tag=oracle-text-contains-search-with-near-is-very-slow
, but can't figure out how to do it with contains and near to enable the search of 3 words withing a certain distance from each other.
current script:
SELECT clob_field
FROM clob_table
WHERE contains(clob_field,'NEAR (((QUICK),(FOX),(LAZY)),5)') > 0;
Want to use something like this if it will act like indexing:
SELECT clob_field
FROM clob_table
WHERE contains(dbms_lob.substr(clob_field,'near(((QUICK),(FOX),(LAZY)),5)')) > 0;
If not, I need to do indexing, but I don't quite understand how to use CTXCAT and CONTEXT (https://docs.oracle.com/cd/A91202_01/901_doc/text.901/a90122/ind4.htm). I also don't like what I read here that says that if one uses CTXCAT for indexing a clob you have to use CONTEXT, or something like that. It can't affect the other queries that are done on this field.
Thanks in advance!
Contains won't work unless it is globally indexed, so I had to index the field and then could get the original query working.

will index be used when UPPER() the variable first?

I may have encountered a full table scan in Oracle database. I can't excute the explain command in the database, simply put, I don't have the permission.
And I'm trying to figure out the following question.
If I have an index on NAME in table
With this query:
select OID
from table
where NAME=UPPER(v1)
and TYPE=v2
and PID=v3
and OID<>v4
and PID =v5`
(v1 is a variable)
Will the oracle use index on name to select OID?
I have read some material, and it says with a function in where condition the NAME index won't be used. But the upper() is a special function, so I'm not quiet sure about the material I saw before.
And here is the second question after the answer of #mathguy:
If I create an index using create index INDEX_NAME on table(upper(NAME));
will the query:
select OID,PID
from table
where PID=v1
and NAME=UPPER(v2)
use the index INDEX_NAME?
OR the index will be used in the above question, and the query is just not efficient so they take much time to execute?
If you have an index on name, then the optimizer MAY use the index in the example you gave. It may choose not to use it (for example if it estimates that a relatively large fraction of rows will be returned anyway); but if say only 0.1% of rows would be returned, by all means the index will be used. (If that still doesn't happen, make sure statistics are up-to-date.)
What will prevent the use of an index is if you wrapped name within upper(). What happens on the right-hand side - whether you have v1 or upper(v1) or even a much more complicated expression - is irrelevant as long as name doesn't also appear in that complicated expression on the right-hand side.
Perhaps this will help...
In Oracle, you can create an index on a function (a function index), so if you created your index on the function UPPER(NAME) instead of just NAME, Oracle may be more likely to use the index (although it still might choose not to depending on other factors.)
Here's a link that describes function indexes

Oracle optimizer is not accepting index hint

When I run the merge query then index cannot read and query is running very slow please advise me.
Index in stage_dim_accounts(rbc_code)
Index in map_rbc_etl(free_code_9)
MERGE INTO stage_dim_accounts t
USING map_rbc_etl s ON (t.rbc_code = s.free_code_9)
WHEN MATCHED THEN UPDATE
SET t.indx_no= s.indx_no
WHERE s.annexure= 'AXN-I'
AND (.free_code_9 <> 'NA' AND s.free_code_9 <> '0')
AND t.rbc_code <> 'NA'
Thanks in advance
The optimizer is smart enough to know that your indexes are useless.
An index on free_code might be useful if most of the values in that column were either '0' or 'NA'. As you haven't provided any information regarding data volumes or distribution we can't tell. But you have other restriction criteria on map_rbc_etl, so the database needs to go to the table anyway. My guess is that optimizer has chosen to use a full table scan on map_rbc_etl because that's quicker than a huge number of indexed reads.
This is because an indexed read is two operations - read the index, read the row. So it only pays dividends if the percentage of rows read is tiny. Otherwise it is just more efficient to read all the rows and winnow them in memory.
Here is the great "secret" of tuning: indexed reads are not always faster; full table scans are not always bad.
Similar logic applies to reading the stage_dim_accounts. The indexed column is unlikely to be selective. Unless ... unless the number of rows in map_rbc_etl is very small and only matches a small selection of rows in stage_dim_accounts. My previous comment on data metrics applies again.
indexes to use
on map_rbc_etl( free_code_9, annexure)
and on stage_dim_accounts(rbc_code);
now these may not be used for reasons in previous answer.
Additional reasons an index may not be used are:
1. The optimizer decides it would be more efficient not to use index.
2. if column is on view and has function call on column. To use this use function based indexes.
3. you perform mathematical operation in query. Note you can look at explain plan and create index to match how it is loading the rows.
4. you concat columns together in where clause. Use function based index for overcoming this.
5. You do not include first column in concatenated index in where clause of your statement. Note that Oracle 9i or greater do skip scanning and can use the index.
6. You use or clause. In this case it is best to create one index for all but the or clause and one for each of the or values then it will use all indexes appropriately.
if you don't know how to use function based indexes an example for a to_upper() in where clause you would use the following
create indexName on tableName(to_upper(colname));
any oracle sql function (built in or user created) can be in the index.

Full-text search in Postgres or CouchDB?

I took geonames.org and imported all their data of German cities with all districts.
If I enter "Hamburg", it lists "Hamburg Center, Hamburg Airport" and so on. The application is in a closed network with no access to the internet, so I can't access the geonames.org web services and have to import the data. :(
The city with all of its districts works as an auto complete. So each key hit results in an XHR request and so on.
Now my customer asked whether it is possible to have all data of the world in it. Finally, about 5.000.000 rows with 45.000.000 alternative names etc.
Postgres needs about 3 seconds per query which makes the auto complete unusable.
Now I thought of CouchDb, have already worked with it. My question:
I would like to post "Ham" and I want CouchDB to get all documents starting with "Ham". If I enter "Hamburg" I want it to return Hamburg and so forth.
Is CouchDB the right database for it? Which other DBs can you recommend that respond with low latency (may be in-memory) and millions of datasets? The dataset doesn't change regularly, it's rather static!
If I understand your problem right, probably all you need is already built in the CouchDB.
To get a range of documents with names beginning with e.g. "Ham". You may use a request with a string range: startkey="Ham"&endkey="Ham\ufff0"
If you need a more comprehensive search, you may create a view containing names of other places as keys. So you again can query ranges using the technique above.
Here is a view function to make this:
function(doc) {
for (var name in doc.places) {
emit(name, doc._id);
}
}
Also see the CouchOne blog post about CouchDB typeahead and autocomplete search and this discussion on the mailing list about CouchDB autocomplete.
Optimized search with PostgreSQL
Your search is anchored at the start and no fuzzy search logic is required. This is not the typical use case for full text search.
If it gets more fuzzy or your search is not anchored at the start, look here for more:
Similar UTF-8 strings for autocomplete field
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
In PostgreSQL you can make use of advanced index features that should make the query very fast. In particular look at operator classes and indexes on expressions.
1) text_pattern_ops
Assuming your column is of type text, you would use a special index for text pattern operators like this:
CREATE INDEX name_text_pattern_ops_idx
ON tbl (name text_pattern_ops);
SELECT name
FROM tbl
WHERE name ~~ ('Hambu' || '%');
This is assuming that you operate with a database locale other than C - most likely de_DE.UTF-8 in your case. You could also set up a database with locale 'C'. I quote the manual here:
If you do use the C locale, you do not need the xxx_pattern_ops
operator classes, because an index with the default operator class is
usable for pattern-matching queries in the C locale.
2) Index on expression
I'd imagine you would also want to make that search case insensitive. so let's take another step and make that an index on an expression:
CREATE INDEX lower_name_text_pattern_ops_idx
ON tbl (lower(name) text_pattern_ops);
SELECT name
FROM tbl
WHERE lower(name) ~~ (lower('Hambu') || '%');
To make use of the index, the WHERE clause has to match the the index expression.
3) Optimize index size and speed
Finally, you might also want to impose a limit on the number of leading characters to minimize the size of your index and speed things up even further:
CREATE INDEX lower_left_name_text_pattern_ops_idx
ON tbl (lower(left(name,10)) text_pattern_ops);
SELECT name
FROM tbl
WHERE lower(left(name,10)) ~~ (lower('Hambu') || '%');
left() was introduced with Postgres 9.1. Use substring(name, 1,10) in older versions.
4) Cover all possible requests
What about strings with more than 10 characters?
SELECT name
FROM tbl
WHERE lower(left(name,10)) ~ (lower(left('Hambu678910',10)) || '%');
AND lower(name) ~~ (lower('Hambu678910') || '%');
This looks redundant, but you need to spell it out this way to actually use the index. Index search will narrow it down to a few entries, the additional clause filters the rest. Experiment to find the sweet spot. Depends on data distribution and typical use cases. 10 characters seem like a good starting point. For more than 10 characters, left() effectively turns into a very fast and simple hashing algorithm that's good enough for many (but not all) use cases.
5) Optimize disc representation with CLUSTER
So, the predominant access pattern will be to retrieve a bunch of adjacent rows according to our index lower_left_name_text_pattern_ops_idx. And you mostly read and hardly ever write. This is a textbook case for CLUSTER. The manual:
When a table is clustered, it is physically reordered based on the index information.
With a huge table like yours, this can dramatically improve response time because all rows to be fetched are in the same or adjacent blocks on disk.
First call:
CLUSTER tbl USING lower_left_name_text_pattern_ops_idx;
Information which index to use will be saved and successive calls will re-cluster the table:
CLUSTER tbl;
CLUSTER; -- cluster all tables in the db that have previously been clustered.
If you don't want to repeat it:
ALTER TABLE tbl SET WITHOUT CLUSTER;
However, CLUSTER takes an exclusive lock on the table. If that's a problem, look into pg_repack or pg_squeeze, which can do the same without exclusive lock on the table.
6) Prevent too many rows in the result
Demand a minimum of, say, 3 or 4 characters for the search string. I add this for completeness, you probably do it anyway.
And LIMIT the number of rows returned:
SELECT name
FROM tbl
WHERE lower(left(name,10)) ~~ (lower('Hambu') || '%')
LIMIT 501;
If your query returns more than 500 rows, tell the user to narrow down his search.
7) Optimize filter method (operators)
If you absolutely must squeeze out every last microsecond, you can utilize operators of the text_pattern_ops family. Like this:
SELECT name
FROM tbl
WHERE lower(left(name, 10)) ~>=~ lower('Hambu')
AND lower(left(name, 10)) ~<=~ (lower('Hambu') || chr(2097151));
You gain very little with this last stunt. Normally, standard operators are the better choice.
If you do all that, search time will be reduced to a matter of milliseconds.
I think a better approach is keep your data on your database (Postgres or CouchDB) and index it with a full-text search engine, like Lucene, Solr or ElasticSearch.
Having said that, there's a project integrating CouchDB with Lucene.

Improve SQL Server 2005 Query Performance

I have a course search engine and when I try to do a search, it takes too long to show search results. You can try to do a search here
http://76.12.87.164/cpd/testperformance.cfm
At that page you can also see the database tables and indexes, if any.
I'm not using Stored Procedures - the queries are inline using Coldfusion.
I think I need to create some indexes but I'm not sure what kind (clustered, non-clustered) and on what columns.
Thanks
You need to create indexes on columns that appear in your WHERE clauses. There are a few exceptions to that rule:
If the column only has one or two unique values (the canonical example of this is "gender" - with only "Male" and "Female" the possible values, there is no point to an index here). Generally, you want an index that will be able to restrict the rows that need to be processed by a significant number (for example, an index that only reduces the search space by 50% is not worth it, but one that reduces it by 99% is).
If you are search for x LIKE '%something' then there is no point for an index. If you think of an index as specifying a particular order for rows, then sorting by x if you're searching for "%something" is useless: you're going to have to scan all rows anyway.
So let's take a look at the case where you're searching for "keyword 'accounting'". According to your result page, the SQL that this generates is:
SELECT
*
FROM (
SELECT TOP 10
ROW_NUMBER() OVER (ORDER BY sq.name) AS Row,
sq.*
FROM (
SELECT
c.*,
p.providername,
p.school,
p.website,
p.type
FROM
cpd_COURSES c, cpd_PROVIDERS p
WHERE
c.providerid = p.providerid AND
c.activatedYN = 'Y' AND
(
c.name like '%accounting%' OR
c.title like '%accounting%' OR
c.keywords like '%accounting%'
)
) sq
) AS temp
WHERE
Row >= 1 AND Row <= 10
In this case, I will assume that cpd_COURSES.providerid is a foreign key to cpd_PROVIDERS.providerid in which case you don't need an index, because it'll already have one.
Additionally, the activatedYN column is a T/F column and (according to my rule above about restricting the possible values by only 50%) a T/F column should not be indexed, either.
Finally, because searching with a x LIKE '%accounting%' query, you don't need an index on name, title or keywords either - because it would never be used.
So the main thing you need to do in this case is make sure that cpd_COURSES.providerid actually is a foreign key for cpd_PROVIDERS.providerid.
SQL Server Specific
Because you're using SQL Server, the Management Studio has a number of tools to help you decide where you need to put indexes. If you use the "Index Tuning Wizard" it is actually usually pretty good at tell you what will give you the good performance improvements. You just cut'n'paste your query into it, and it'll come back with recommendations for indexes to add.
You still need to be a little bit careful with the indexes that you add, because the more indexes you have, the slower INSERTs and UPDATEs will be. So sometimes you'll need to consolidate indexes, or just ignore them altogether if they don't give enough of a performance benefit. Some judgement is required.
Is this the real live database data? 52,000 records is a very small table, relatively speaking, for what SQL 2005 can deal with.
I wonder how much RAM is allocated to the SQL server, or what sort of disk the database is on. An IDE or even SATA hard disk can't give the same performance as a 15K RPM SAS disk, and it would be nice if there was sufficient RAM to cache the bulk of the frequently accessed data.
Having said all that, I feel the " (c.name like '%accounting%' OR c.title like '%accounting%' OR c.keywords like '%accounting%') " clause is problematic.
Could you create a separate Course_Keywords table, with two columns "courseid" and "keyword" (varchar(24) should be sufficient for the longest keyword?), with a composite clustered index on courseid+keyword
Then, to make the UI even more friendly, use AJAX to apply keyword validation & auto-completion when people type words into the keywords input field. This gives you the behind-the-scenes benefit of having an exact keyword to search for, removing the need for pattern-matching with the LIKE operator...
Using CF9? Try using Solr full text search instead of %xxx%?
You'll want to create indexes on the fields you search by. An index is a secondary list of your records presorted by the indexed fields.
Think of an old-fashioned printed yellow pages - if you want to look up a person by their last name, the phonebook is already sorted in that way - Last Name is the clustered index field. If you wanted to find phone numbers for people named Jennifer or the person with the phone number 867-5309, you'd have to search through every entry and it would take a long time. If there were an index in the back with all the phone numbers or first names listed in order along with the page in the phonebook that the person is listed, it would be a lot faster. These would be the unclustered indexes.
I would try changing your IN statements to an EXISTS query to see if you get better performance on the Zip code lookup. My experience is that IN statements work great for small lists but the larger they get, you get better performance out of EXISTS as the query engine will stop searching for a specific value the first instance it runs into.
<CFIF zipcodes is not "">
EXISTS (
SELECT zipcode
FROM cpd_CODES_ZIPCODES
WHERE zipcode = p.zipcode
AND 3963 * (ACOS((SIN(#getzipcodeinfo.latitude#/57.2958) * SIN(latitude/57.2958)) +
(COS(#getzipcodeinfo.latitude#/57.2958) * COS(latitude/57.2958) *
COS(longitude/57.2958 - #getzipcodeinfo.longitude#/57.2958)))) <= #radius#
)
</CFIF>

Resources