How do I build efficient SQL filters? - performance

After taking an advanced T-SQL performance/query tuning class, I thought I remembered hearing that you can speed some queries up a little by putting your date(time) filters first.
Ex:
WHERE
RunDate = '12/1/2015' AND
OtherFilters = etc...
But does this really only count if I have indexes in place on these columns I filter on for this table?
To add to this a little: should I be building my filters around the indexes on the tables referenced in the query, so that the first filters in the query match my indexes?
Ex:
WHERE
ID > 1000 AND
RunDate <= '1/1/2016' AND
OtherFilters = etc...
Where ID and RunDate are part of my indexes/primary key.

The order of filters in the WHERE clause does not matter. As long as you have an index on the fields, SQL Server knows how to use your filters.
Assume you have an index on (ID, RunDt) and you have both ID and RunDt in your WHERE clause. SQL Server first filters the data on ID and then, from that subset of rows, filters on RunDt.
This scenario may change if you have other indexes, depending on the selectivity of your data.
Also, if you have a clustered index on RunDt, SQL Server will first filter on RunDt and then on ID.
You don't need to worry about the order of your filters in the WHERE clause, as long as you have the right order of columns in your index definition.
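As a minimal sketch (table, column, and index names here are hypothetical, not taken from the post), both predicate orders below compile to the same plan once a composite index covers them:

CREATE TABLE dbo.RunLog (
    ID      int      NOT NULL,
    RunDate datetime NOT NULL,
    Status  char(1)  NOT NULL
);

CREATE CLUSTERED INDEX IX_RunLog_ID_RunDate ON dbo.RunLog (ID, RunDate);

-- Same execution plan either way: the optimizer matches predicates
-- to the index definition, not to their order in the WHERE clause.
SELECT ID, RunDate FROM dbo.RunLog WHERE ID > 1000 AND RunDate <= '20160101';
SELECT ID, RunDate FROM dbo.RunLog WHERE RunDate <= '20160101' AND ID > 1000;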

TSQL is just a logical representation.
The query optimizer will set the actual execution order that is most efficient.
It messes up sometimes, but for the most part it is spot on.
If you have a clustered PK on ID then this will typically be done first.
It appears even the OP is confused about the question, so I can only answer the stated question:
"But does this really only count if I have indexes in place on these columns I filter on for this table?"
The order in the WHERE does not matter for columns with indexes.
The order in the WHERE does not matter for columns without indexes.
The order in the WHERE does not matter.
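If you want to verify this yourself, one (hypothetical) way is to compare the estimated plans for both predicate orderings, reusing the dbo.RunLog sketch above:

SET SHOWPLAN_TEXT ON;
GO
-- Compare the output of this with the reversed predicate order;
-- the plans come back identical.
SELECT ID, RunDate FROM dbo.RunLog WHERE RunDate <= '20160101' AND ID > 1000;
GO
SET SHOWPLAN_TEXT OFF;
GO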

Related

Adding Index To A Column Having Flag Values

I am a novice at tuning Oracle queries, so I need help.
If I have a sql query like:
select a.ID,a.name.....
from a,b,c
where a.id=b.id
and ....
and b.flag='Y';
then will adding an index to the FLAG column of table b help to tune the query in any way? The FLAG column has only 2 values, Y and N.
With a standard btree index, the SQL engine can find the row or rows in the index for the specified value quickly due to its binary structure, then use the physical address (the rowid) stored in the index to access the desired row in a second hop. It's like looking in the index of a book to find the page number. So that is:
Go to index with the key value you want to look up.
The index tells you the physical address in the table.
Go straight to that physical address.
That is nice and quick for something like a unique customer ID. It's still OK for something nonunique, like a customer ID in a table of orders, although the database has to go through the index entries and for each one go to the indicated address. That can still be faster than slogging through the entire table from top to bottom.
But for a column with only two distinct values, you can see that it is going to be more work going through all of the index entries for 'Y' for example, and for each one going to the indicated location in the table, than it would be to just forget the index and scan the whole table in one shot.
That's unless the values are unevenly distributed. If there are a million Y rows and ten N rows then an index will help you find those N rows fast but be no use for Y.
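If your data is skewed like that, you can both create the index and let the optimizer see the skew; a sketch, assuming table b from the question (the index name is illustrative):

CREATE INDEX b_flag_idx ON b (flag);

-- A 2-bucket histogram on FLAG exposes the skew, so the optimizer
-- can use the index for the rare 'N' rows and still pick a full
-- scan for the common 'Y' rows.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'B',
    method_opt => 'FOR COLUMNS FLAG SIZE 2');
END;
/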
Adding an index to a column with only 2 values normally isn't very useful, because Oracle might just as well do a full table scan.
From your query it looks like it would be more useful to have an index on id, because that would help with the join a.id=b.id.
If you really want to get into tuning then learn to use "explain plan", as that will give you some indication of how much work Oracle needs to do for a query. Add (or remove) an index, then rerun the explain plan.
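For example, with the (abbreviated) query from the question:

EXPLAIN PLAN FOR
  SELECT a.ID, a.name
  FROM a, b
  WHERE a.id = b.id
    AND b.flag = 'Y';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);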

Scan on DynamoDB table or Query on global secondary index or a local index (What's the best solution?)

I have an AWS DynamoDB table called "Users", whose hash key/primary key is "UserID", which consists of emails. It has two attributes, "Daily Points" and "TimeSpendInTheApp". Now I need to run a query or scan on the table that will give me the top 50 users with the highest points and the top 50 users who have spent the most time in the app. This query will be executed only once a day by a cron AWS Lambda. I am trying to find the best solution for this query or scan. For me, cost is more important than speed or efficiency. Maintaining a global secondary index or a local index on points can be a costly operation, as I have to assign read and write units for those indexes, which I want to avoid. The "Users" table will have a maximum of 100,000 to 150,000 records, and on average it will have 50,000 records. What are my best options? Please suggest.
I am thinking my first option is to scan the whole table with a filter expression for records above a certain number of points (5,000 for example). If this scan finds 50 or more records, simply sort the values and take the top 50. If it returns few or no results, reduce the filter expression value (to 3,000 for example) and scan again. If a filter expression value (2,500 for example) returns too many records, like 5,000 or more, raise the value again. Is this even possible? I guess it would also need to handle pagination. Is it advisable to scan a table which has 50,000 records?
Any advice or suggestion will be helpful. Thanks in advance.
Firstly, creating indexes for the above use case doesn't simplify the process, as indexes offer no solution for the aggregation or sorting.
I would export the data to Hive and run the queries there, rather than writing code to determine the result, especially as it is a batch executed only once per day.
Something like below:-
Create Hive table:-
CREATE EXTERNAL TABLE hive_users(userId string, dailyPoints bigint, timeSpendInTheApp bigint)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Users",
"dynamodb.column.mapping" = "userId:UserID,dailyPoints:Daily_Points,timeSpendInTheApp:TimeSpendInTheApp");
Queries:-
SELECT dailyPoints, userId FROM hive_users ORDER BY dailyPoints DESC LIMIT 50;
SELECT timeSpendInTheApp, userId FROM hive_users ORDER BY timeSpendInTheApp DESC LIMIT 50;
(ORDER BY with LIMIT 50 gives the global top 50; Hive's SORT BY only orders rows within each reducer, so it would not guarantee a correct result here.)
Hive Reference

Oracle optimizer is not accepting index hint

When I run the merge query, the index is not read and the query runs very slowly. Please advise.
Index on stage_dim_accounts(rbc_code)
Index on map_rbc_etl(free_code_9)
MERGE INTO stage_dim_accounts t
USING map_rbc_etl s
ON (t.rbc_code = s.free_code_9)
WHEN MATCHED THEN UPDATE
  SET t.indx_no = s.indx_no
  WHERE s.annexure = 'AXN-I'
    AND (s.free_code_9 <> 'NA' AND s.free_code_9 <> '0')
    AND t.rbc_code <> 'NA'
Thanks in advance
The optimizer is smart enough to know that your indexes are useless.
An index on free_code_9 might be useful if most of the values in that column were either '0' or 'NA'. As you haven't provided any information regarding data volumes or distribution, we can't tell. But you have other restriction criteria on map_rbc_etl, so the database needs to go to the table anyway. My guess is that the optimizer has chosen a full table scan on map_rbc_etl because that's quicker than a huge number of indexed reads.
This is because an indexed read is two operations - read the index, read the row. So it only pays dividends if the percentage of rows read is tiny. Otherwise it is just more efficient to read all the rows and winnow them in memory.
Here is the great "secret" of tuning: indexed reads are not always faster; full table scans are not always bad.
Similar logic applies to reading the stage_dim_accounts. The indexed column is unlikely to be selective. Unless ... unless the number of rows in map_rbc_etl is very small and only matches a small selection of rows in stage_dim_accounts. My previous comment on data metrics applies again.
Indexes to use (names are illustrative):
CREATE INDEX map_rbc_etl_ix ON map_rbc_etl (free_code_9, annexure);
CREATE INDEX stage_dim_accounts_ix ON stage_dim_accounts (rbc_code);
Now these may not be used, for the reasons in the previous answer.
Additional reasons an index may not be used are:
1. The optimizer decides it would be more efficient not to use the index.
2. The column is referenced through a function call (for example via a view). Use a function-based index to get around this.
3. You perform a mathematical operation on the column in the query. Note you can look at the explain plan and create an index to match how it is loading the rows.
4. You concatenate columns together in the WHERE clause. A function-based index overcomes this as well.
5. You do not include the first column of a concatenated index in the WHERE clause of your statement. Note that Oracle 9i and later can do skip scanning and may still use the index.
6. You use an OR clause. In this case it is best to create one index for all the columns outside the OR and one for each of the OR values; then the optimizer can use all the indexes appropriately.
If you don't know how to use function-based indexes, here is an example for an UPPER() call in the WHERE clause:
CREATE INDEX index_name ON table_name (UPPER(colname));
Any Oracle SQL function (built-in, or user-created if declared DETERMINISTIC) can be used in the index.

order of records in result set

Can the order of rows in an unordered query (like select * from smth) differ between executions (in the same session or a different one) if there are no updates to the table?
The order of rows returned from a query should never be relied upon unless you have included a specific ORDER BY clause in your query.
You may find that even without the ORDER BY the results appear in the same order, but you cannot guarantee this will be the case, and relying on it would be foolish, especially when an ORDER BY clause will fulfill your requirements.
See this question: Default row ordering for select query in oracle
It has an excellent quote from Tom Kyte about record ordering.
So to answer your question: yes, the order of rows in an unordered query may differ between queries and sessions, as it depends on multiple factors over which you may have no control (if you are not a DBA, etc.).
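For example (the column name here is made up):

SELECT * FROM smth ORDER BY id;   -- order is guaranteed
SELECT * FROM smth;               -- order may change between executions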
Hope this helps...

Improve SQL Server 2005 Query Performance

I have a course search engine and when I try to do a search, it takes too long to show search results. You can try to do a search here
http://76.12.87.164/cpd/testperformance.cfm
At that page you can also see the database tables and indexes, if any.
I'm not using Stored Procedures - the queries are inline using Coldfusion.
I think I need to create some indexes but I'm not sure what kind (clustered, non-clustered) and on what columns.
Thanks
You need to create indexes on columns that appear in your WHERE clauses. There are a few exceptions to that rule:
If the column only has one or two unique values (the canonical example of this is "gender" - with only "Male" and "Female" the possible values, there is no point to an index here). Generally, you want an index that will be able to restrict the rows that need to be processed by a significant number (for example, an index that only reduces the search space by 50% is not worth it, but one that reduces it by 99% is).
If you are searching for x LIKE '%something' then there is no point in an index. If you think of an index as specifying a particular order for rows, then sorting by x when you're searching for "%something" is useless: you're going to have to scan all rows anyway.
So let's take a look at the case where you're searching for "keyword 'accounting'". According to your result page, the SQL that this generates is:
SELECT *
FROM (
    SELECT TOP 10
        ROW_NUMBER() OVER (ORDER BY sq.name) AS Row,
        sq.*
    FROM (
        SELECT
            c.*,
            p.providername,
            p.school,
            p.website,
            p.type
        FROM
            cpd_COURSES c, cpd_PROVIDERS p
        WHERE
            c.providerid = p.providerid AND
            c.activatedYN = 'Y' AND
            (
                c.name like '%accounting%' OR
                c.title like '%accounting%' OR
                c.keywords like '%accounting%'
            )
    ) sq
) AS temp
WHERE
    Row >= 1 AND Row <= 10
In this case, I will assume that cpd_COURSES.providerid is a foreign key to cpd_PROVIDERS.providerid. Note that SQL Server does not automatically create an index on the referencing column when you declare a foreign key (only the referenced primary key gets one), so an explicit index on cpd_COURSES.providerid is still worth having for the join.
Additionally, the activatedYN column is a T/F column and (according to my rule above about restricting the possible values by only 50%) a T/F column should not be indexed, either.
Finally, because you are searching with an x LIKE '%accounting%' query, you don't need an index on name, title or keywords either, because it would never be used.
So the main thing you need to do in this case is make sure that cpd_COURSES.providerid actually is a foreign key to cpd_PROVIDERS.providerid, and that it carries an index.
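A sketch of both steps (constraint and index names are illustrative):

ALTER TABLE cpd_COURSES
  ADD CONSTRAINT FK_cpd_COURSES_providerid
  FOREIGN KEY (providerid) REFERENCES cpd_PROVIDERS (providerid);

-- SQL Server does not create this automatically for the FK column:
CREATE NONCLUSTERED INDEX IX_cpd_COURSES_providerid
  ON cpd_COURSES (providerid);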
SQL Server Specific
Because you're using SQL Server, Management Studio has a number of tools to help you decide where you need to put indexes. The "Index Tuning Wizard" (called the Database Engine Tuning Advisor in SQL Server 2005) is actually usually pretty good at telling you what will give you good performance improvements. You just cut'n'paste your query into it, and it'll come back with recommendations for indexes to add.
You still need to be a little bit careful with the indexes that you add, because the more indexes you have, the slower INSERTs and UPDATEs will be. So sometimes you'll need to consolidate indexes, or just ignore them altogether if they don't give enough of a performance benefit. Some judgement is required.
Is this the real live database data? 52,000 records is a very small table, relatively speaking, for what SQL 2005 can deal with.
I wonder how much RAM is allocated to the SQL server, or what sort of disk the database is on. An IDE or even SATA hard disk can't give the same performance as a 15K RPM SAS disk, and it would be nice if there was sufficient RAM to cache the bulk of the frequently accessed data.
Having said all that, I feel the " (c.name like '%accounting%' OR c.title like '%accounting%' OR c.keywords like '%accounting%') " clause is problematic.
Could you create a separate Course_Keywords table, with two columns, "courseid" and "keyword" (varchar(24) should be sufficient for the longest keyword?), and a composite clustered index on courseid + keyword?
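A sketch of that table (column types are assumptions):

CREATE TABLE Course_Keywords (
    courseid int         NOT NULL,
    keyword  varchar(24) NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX IX_Course_Keywords
    ON Course_Keywords (courseid, keyword);

-- Hypothetical usage inside the course query: an exact-match seek
-- instead of the unindexable LIKE '%accounting%' scan.
-- WHERE EXISTS (SELECT 1 FROM Course_Keywords k
--               WHERE k.courseid = c.courseid AND k.keyword = 'accounting')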
Then, to make the UI even more friendly, use AJAX to apply keyword validation & auto-completion when people type words into the keywords input field. This gives you the behind-the-scenes benefit of having an exact keyword to search for, removing the need for pattern-matching with the LIKE operator...
Using CF9? Try using Solr full text search instead of %xxx%?
You'll want to create indexes on the fields you search by. An index is a secondary list of your records presorted by the indexed fields.
Think of an old-fashioned printed yellow pages: if you want to look up a person by their last name, the phone book is already sorted that way, so Last Name is the clustered index field. If you wanted to find phone numbers for people named Jennifer, or the person with the phone number 867-5309, you'd have to search through every entry, and it would take a long time. If there were an index in the back with all the phone numbers or first names listed in order, along with the page in the phone book where each person is listed, it would be a lot faster. Those would be the nonclustered indexes.
I would try changing your IN statements to an EXISTS query to see if you get better performance on the ZIP code lookup. My experience is that IN works great for small lists, but the larger the list gets, the better performance you get out of EXISTS, as the query engine will stop searching for a specific value at the first instance it runs into.
<CFIF zipcodes is not "">
EXISTS (
SELECT zipcode
FROM cpd_CODES_ZIPCODES
WHERE zipcode = p.zipcode
AND 3963 * (ACOS((SIN(#getzipcodeinfo.latitude#/57.2958) * SIN(latitude/57.2958)) +
(COS(#getzipcodeinfo.latitude#/57.2958) * COS(latitude/57.2958) *
COS(longitude/57.2958 - #getzipcodeinfo.longitude#/57.2958)))) <= #radius#
)
</CFIF>