Adding Index To A Column Having Flag Values - oracle

I am a novice in tuning oracle queries thus need help.
If I have a sql query like:
select a.ID,a.name.....
from a,b,c
where a.id=b.id
and ....
and b.flag='Y';
then will adding index to the FLAG column of table b help to tune the query by any means? The FLAG column has only 2 values Y and N

With a standard btree index, the SQL engine can find the row or rows in the index for the specified value quickly due to its binary structure, then use the physical address (the rowid) stored in the index to access the desired row in a second hop. It's like looking in the index of a book to find the page number. So that is:
Go to index with the key value you want to look up.
The index tells you the physical address in the table.
Go straight to that physical address.
That is nice and quick for something like a unique customer ID. It's still OK for something nonunique, like a customer ID in a table of orders, although the database has to go through the index entries and for each one go to the indicated address. That can still be faster than slogging through the entire table from top to bottom.
But for a column with only two distinct values, you can see that it is going to be more work going through all of the index entries for 'Y' for example, and for each one going to the indicated location in the table, than it would be to just forget the index and scan the whole table in one shot.
That's unless the values are unevenly distributed. If there are a million Y rows and ten N rows then an index will help you find those N rows fast but be no use for Y.

Adding an index to a column with only 2 values normally isn't very useful, because Oracle might just as well do a full table scan.
From your query it looks like it would be more useful to have an index on id, because that would help with the join a.id=b.id.
If you really want to get into tuning then learn to use "explain plan", as that will give you some indication of how much work Oracle needs to do for a query. Add (or remove) an index, then rerun the explain plan.

Related

Oracle compressed/b-tree index how and when to use

I would like to add a compressed index to the Oracle Applications workflow table hr.pqh_ss_transaction_history in order to access specific types of workflows (process_name) and workflows for specific people (selected_person_id).
There are lots of repeating values in process_name although the data is skewed. I would however want to access the TFG_HR_NEW_HIRE_PLACE_JSP_PRC and TFG_HR_TERMINATION_JSP_PRC process types.
"PROCESS_NAME","CNT"
"HR_GENERIC_APPROVAL_PRC",40347
"HR_PERSONAL_INFO_JSP_PRC",39284
"TFG_HR_NEW_HIRE_PLACE_JSP_PRC",18117
"TFG_HREMPSTS_TERMS_CHG_JSP_PRC",14076
"TFG_HR_TERMINATION_JSP_PRC",8764
"HR_ADV_INDIVIDUAL_COMP_PRC",4907
"TFG_HR_SIT_NOAPP",3979
"TFG_YE_TAX_PROV",2663
"HR_TERMINATION_JSP_PRC",1310
"HR_CHANGE_PAY_JSP_PRC",953
"TFG_HR_SIT_EXIT_JSP_PRC",797
"HR_SIT_JSP_PRC",630
"HR_QUALIFICATION_JSP_PRC",282
"HR_CAED_JSP_PRC",250
"TFG_HR_EMP_TERM_JSP_PRC",211
"PER_DOR_JSP_PRC",174
"HR_AWARD_JSP_PRC",101
"TFG_HR_SIT_REP_MOT",32
"TFG_HR_SIT_NEWPOS_NIB_JSP_PRC",30
"TFG_HR_SIT_NEWPOS_INBU_JSP_PRC",28
"HR_NEW_HIRE_PLACE_JSP_PRC",22
"HR_NEWHIRE_JSP_PRC",6
selected_person_id would obviously be more selective. Unfortunately there are 3774 nulls for this column and the highest count after that is 73 for one person. A lot of people would only have 1 row. The total row count is 136963.
My query would be in this format:
select psth.item_key,
psth.creation_date,
psth.last_update_date
from hr.pqh_ss_transaction_history psth
where nvl(psth.selected_person_id, :p_person_id) = :p_person_id
and psth.process_name = 'HR_TERMINATION_JSP_PRC'
order by psth.last_update_date
I am on Oracle 12c release 1.
I assume it would be a good idea to put a non-compressed b-tree index on selected_person_id since the values returned would fall in the less than 5% of the total rows scenario, but how do you handle the nulls in the column which would not go into the index when you select using nvl(psth.selected_person_id, :p_person_id) = :p_person_id? Is there a more efficient way to write the sql and how should you create this index?
For process_name I would like to use a compressed b-tree index. I am assuming that the statement is
CREATE INDEX idxname ON pqh_ss_transaction_history(process_name) COMPRESS
where there would be an implicit second column for rowid. Is it safe for it to use rowid here, since normally it is not advised to use rowid? Is the skewed data an issue (most of the time I would be selecting on the high volume side)? I don't understand how compressed indexes would be efficient. For b-tree indexes you would normally want to return 5% of the data, otherwise a full table scan is actually more efficient. How does the compressed index return so many rowids and then do lookup into the full table using those rowids, faster than a full table scan?
Or since the optimizer will only be able to use one of the two indexes should I rather create an uncompressed function based index with selected_person_id and process_name concatenated?
Perhaps you could create this index:
CREATE INDEX idxname ON pqh_ss_transaction_history
(process_name, NVL(selected_person_id,-1)) COMPRESS 1
Then change your query to:
select psth.item_key,
psth.creation_date,
psth.last_update_date
from hr.pqh_ss_transaction_history psth
where nvl(psth.selected_person_id, -1) in (:p_person_id,-1)
and psth.process_name = 'HR_TERMINATION_JSP_PRC'
order by psth.last_update_date

will index be used when UPPER() the variable first?

I may have encountered a full table scan in Oracle database. I can't excute the explain command in the database, simply put, I don't have the permission.
And I'm trying to figure out the following question.
If I have an index on NAME in table
With this query:
select OID
from table
where NAME=UPPER(v1)
and TYPE=v2
and PID=v3
and OID<>v4
and PID =v5`
(v1 is a variable)
Will the oracle use index on name to select OID?
I have read some material, and it says with a function in where condition the NAME index won't be used. But the upper() is a special function, so I'm not quiet sure about the material I saw before.
And here is the second question after the answer of #mathguy:
If I create an index using create index INDEX_NAME on table(upper(NAME));
will the query:
select OID,PID
from table
where PID=v1
and NAME=UPPER(v2)
use the index INDEX_NAME?
OR the index will be used in the above question, and the query is just not efficient so they take much time to execute?
If you have an index on name, then the optimizer MAY use the index in the example you gave. It may choose not to use it (for example if it estimates that a relatively large fraction of rows will be returned anyway); but if say only 0.1% of rows would be returned, by all means the index will be used. (If that still doesn't happen, make sure statistics are up-to-date.)
What will prevent the use of an index is if you wrapped name within upper(). What happens on the right-hand side - whether you have v1 or upper(v1) or even a much more complicated expression - is irrelevant as long as name doesn't also appear in that complicated expression on the right-hand side.
Perhaps this will help...
In Oracle, you can create an index on a function (a function index), so if you created your index on the function UPPER(NAME) instead of just NAME, Oracle may be more likely to use the index (although it still might choose not to depending on other factors.)
Here's a link that describes function indexes

Oracle optimizer is not accepting index hint

When I run the merge query then index cannot read and query is running very slow please advise me.
Index in stage_dim_accounts(rbc_code)
Index in map_rbc_etl(free_code_9)
MERGE INTO stage_dim_accounts t
USING map_rbc_etl s ON (t.rbc_code = s.free_code_9)
WHEN MATCHED THEN UPDATE
SET t.indx_no= s.indx_no
WHERE s.annexure= 'AXN-I'
AND (.free_code_9 <> 'NA' AND s.free_code_9 <> '0')
AND t.rbc_code <> 'NA'
Thanks in advance
The optimizer is smart enough to know that your indexes are useless.
An index on free_code might be useful if most of the values in that column were either '0' or 'NA'. As you haven't provided any information regarding data volumes or distribution we can't tell. But you have other restriction criteria on map_rbc_etl, so the database needs to go to the table anyway. My guess is that optimizer has chosen to use a full table scan on map_rbc_etl because that's quicker than a huge number of indexed reads.
This is because an indexed read is two operations - read the index, read the row. So it only pays dividends if the percentage of rows read is tiny. Otherwise it is just more efficient to read all the rows and winnow them in memory.
Here is the great "secret" of tuning: indexed reads are not always faster; full table scans are not always bad.
Similar logic applies to reading the stage_dim_accounts. The indexed column is unlikely to be selective. Unless ... unless the number of rows in map_rbc_etl is very small and only matches a small selection of rows in stage_dim_accounts. My previous comment on data metrics applies again.
indexes to use
on map_rbc_etl( free_code_9, annexure)
and on stage_dim_accounts(rbc_code);
now these may not be used for reasons in previous answer.
Additional reasons an index may not be used are:
1. The optimizer decides it would be more efficient not to use index.
2. if column is on view and has function call on column. To use this use function based indexes.
3. you perform mathematical operation in query. Note you can look at explain plan and create index to match how it is loading the rows.
4. you concat columns together in where clause. Use function based index for overcoming this.
5. You do not include first column in concatenated index in where clause of your statement. Note that Oracle 9i or greater do skip scanning and can use the index.
6. You use or clause. In this case it is best to create one index for all but the or clause and one for each of the or values then it will use all indexes appropriately.
if you don't know how to use function based indexes an example for a to_upper() in where clause you would use the following
create indexName on tableName(to_upper(colname));
any oracle sql function (built in or user created) can be in the index.

Speeding up a postgres query (which works on 2 tables)

I am doing, in postgresql, something like this:
select A.first,
count(B.second) as count,
array_agg(A.second) as second,
array_agg(A.third) as third,
array_agg(B.kids) as kids
from A join B on A.first=B.second
group by A.first;
And it's taking forever (also because the tables are pretty big). Limiting the output to 10 row and looking with explain analyze told me there's a nested loop which is huge and takes most of the time.
Is there any way in which I can write this query (which I'll then use in CREATE TABLE AS to create a new table) to speed it up, while conserving the same output, which is what I want?
Thanks!
Ensure the column bring used as a foreign key is indexed:
create index b_second on b(second);
Without such an index, every row of a would cause a table scan of b, which would make your query crawl.

Improve SQL Server 2005 Query Performance

I have a course search engine and when I try to do a search, it takes too long to show search results. You can try to do a search here
http://76.12.87.164/cpd/testperformance.cfm
At that page you can also see the database tables and indexes, if any.
I'm not using Stored Procedures - the queries are inline using Coldfusion.
I think I need to create some indexes but I'm not sure what kind (clustered, non-clustered) and on what columns.
Thanks
You need to create indexes on columns that appear in your WHERE clauses. There are a few exceptions to that rule:
If the column only has one or two unique values (the canonical example of this is "gender" - with only "Male" and "Female" the possible values, there is no point to an index here). Generally, you want an index that will be able to restrict the rows that need to be processed by a significant number (for example, an index that only reduces the search space by 50% is not worth it, but one that reduces it by 99% is).
If you are search for x LIKE '%something' then there is no point for an index. If you think of an index as specifying a particular order for rows, then sorting by x if you're searching for "%something" is useless: you're going to have to scan all rows anyway.
So let's take a look at the case where you're searching for "keyword 'accounting'". According to your result page, the SQL that this generates is:
SELECT
*
FROM (
SELECT TOP 10
ROW_NUMBER() OVER (ORDER BY sq.name) AS Row,
sq.*
FROM (
SELECT
c.*,
p.providername,
p.school,
p.website,
p.type
FROM
cpd_COURSES c, cpd_PROVIDERS p
WHERE
c.providerid = p.providerid AND
c.activatedYN = 'Y' AND
(
c.name like '%accounting%' OR
c.title like '%accounting%' OR
c.keywords like '%accounting%'
)
) sq
) AS temp
WHERE
Row >= 1 AND Row <= 10
In this case, I will assume that cpd_COURSES.providerid is a foreign key to cpd_PROVIDERS.providerid in which case you don't need an index, because it'll already have one.
Additionally, the activatedYN column is a T/F column and (according to my rule above about restricting the possible values by only 50%) a T/F column should not be indexed, either.
Finally, because searching with a x LIKE '%accounting%' query, you don't need an index on name, title or keywords either - because it would never be used.
So the main thing you need to do in this case is make sure that cpd_COURSES.providerid actually is a foreign key for cpd_PROVIDERS.providerid.
SQL Server Specific
Because you're using SQL Server, the Management Studio has a number of tools to help you decide where you need to put indexes. If you use the "Index Tuning Wizard" it is actually usually pretty good at tell you what will give you the good performance improvements. You just cut'n'paste your query into it, and it'll come back with recommendations for indexes to add.
You still need to be a little bit careful with the indexes that you add, because the more indexes you have, the slower INSERTs and UPDATEs will be. So sometimes you'll need to consolidate indexes, or just ignore them altogether if they don't give enough of a performance benefit. Some judgement is required.
Is this the real live database data? 52,000 records is a very small table, relatively speaking, for what SQL 2005 can deal with.
I wonder how much RAM is allocated to the SQL server, or what sort of disk the database is on. An IDE or even SATA hard disk can't give the same performance as a 15K RPM SAS disk, and it would be nice if there was sufficient RAM to cache the bulk of the frequently accessed data.
Having said all that, I feel the " (c.name like '%accounting%' OR c.title like '%accounting%' OR c.keywords like '%accounting%') " clause is problematic.
Could you create a separate Course_Keywords table, with two columns "courseid" and "keyword" (varchar(24) should be sufficient for the longest keyword?), with a composite clustered index on courseid+keyword
Then, to make the UI even more friendly, use AJAX to apply keyword validation & auto-completion when people type words into the keywords input field. This gives you the behind-the-scenes benefit of having an exact keyword to search for, removing the need for pattern-matching with the LIKE operator...
Using CF9? Try using Solr full text search instead of %xxx%?
You'll want to create indexes on the fields you search by. An index is a secondary list of your records presorted by the indexed fields.
Think of an old-fashioned printed yellow pages - if you want to look up a person by their last name, the phonebook is already sorted in that way - Last Name is the clustered index field. If you wanted to find phone numbers for people named Jennifer or the person with the phone number 867-5309, you'd have to search through every entry and it would take a long time. If there were an index in the back with all the phone numbers or first names listed in order along with the page in the phonebook that the person is listed, it would be a lot faster. These would be the unclustered indexes.
I would try changing your IN statements to an EXISTS query to see if you get better performance on the Zip code lookup. My experience is that IN statements work great for small lists but the larger they get, you get better performance out of EXISTS as the query engine will stop searching for a specific value the first instance it runs into.
<CFIF zipcodes is not "">
EXISTS (
SELECT zipcode
FROM cpd_CODES_ZIPCODES
WHERE zipcode = p.zipcode
AND 3963 * (ACOS((SIN(#getzipcodeinfo.latitude#/57.2958) * SIN(latitude/57.2958)) +
(COS(#getzipcodeinfo.latitude#/57.2958) * COS(latitude/57.2958) *
COS(longitude/57.2958 - #getzipcodeinfo.longitude#/57.2958)))) <= #radius#
)
</CFIF>

Resources