Is "SELECT COUNT(column)" faster/slower than "SELECT COUNT(*)"? [duplicate] - performance

I'm running a query like this in MSSQL2008:
select count(*)
from t1
inner join t2 on t1.id = t2.t1_id
inner join t3 on t1.id = t3.t1_id
Assume t1.id has a NOT NULL constraint. Since they're inner joins and t1.id can never be null, using count(t1.id) instead of count(*) should produce the exact same end result. My question is: Would the performance be the same?
I'm also wondering whether the joins could affect this. I realize that adding or removing a join will affect both performance and the length of the result set. Suppose that without changing the join pattern, you set count to target only one table. Would it make any difference? In other words, is there a difference between these two queries:
select count(*) from t1 inner join t2 on t1.id = t2.t1_id
select count(t1.*) from t1 inner join t2 on t1.id = t2.t1_id
COUNT(id) vs. COUNT(*) in MySQL answers this question for MySQL, but I couldn't find answers for MS-SQL specifically, and I can't find anything at all that takes the join factor into account.
NOTE: I tried to find this information on both Google and SO, but it was difficult to figure out how to word my search.

I tried a few SELECT COUNT(*) FROM MyTable vs. SELECT COUNT(SomeColumn) FROM MyTable comparisons with tables of various sizes, where SomeColumn was in one case the clustering key column, in another covered by a non-clustered index, and in another not indexed at all.
In all cases, with all table sizes (from 300,000 rows to 170 million rows), I never saw any difference in either speed or execution plan - in every case the COUNT is handled by a clustered index scan, i.e. by scanning the whole table, basically. If a non-clustered index is involved, then the scan is on that index - even when doing a SELECT COUNT(*)!
There doesn't seem to be any difference in speed or in how the rows are counted - to count them all, SQL Server just needs to scan the whole table - period.
Tests were done on SQL Server 2008 R2 Developer Edition
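For anyone who wants to repeat the comparison, a minimal sketch of such a test (MyTable and SomeColumn are just the placeholder names from above; note that COUNT(SomeColumn) skips NULLs, so the two counts only match when the column is NOT NULL):
-- show elapsed time and logical reads for each statement
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.MyTable;          -- counts every row
SELECT COUNT(SomeColumn) FROM dbo.MyTable; -- counts rows where SomeColumn IS NOT NULL

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
Enabling the actual execution plan in Management Studio for the same batch is what shows the clustered or non-clustered index scans described above.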

select count(*) will be slower as it attempts to fetch everything. Specifying a column (the PK or any other indexed column) will speed things up, as the query engine knows ahead of time what it is looking for. It will also use an index as opposed to going against the table.

Related

Queries on user_cons_columns much slower in Oracle 12c

We're in the process of upgrading from Oracle 11g to 12c, and have noticed that queries on user_cons_columns seem to be quite a bit slower.
For example this is about 4 times as slow, even on a smaller dataset:
select uc.search_condition
from user_constraints uc inner join user_cons_columns ucc on ucc.CONSTRAINT_NAME = uc.CONSTRAINT_NAME
where ucc.table_name = :upper_table_name
and ucc.column_name = :upper_column
Could it just be a matter of gathering statistics?
In my experience, selects from user_constraints, user_cons_columns and other data dictionary views have been slow for several major Oracle versions, not just 12c. Running dbms_stats.gather_dictionary_stats sped up the first query below by about 10-20%.
But what was really helpful was to rewrite the query to select from WITH-clause subqueries using the /*+ materialize */ hint, instead of selecting directly from the user_ views.
This query is very slow on my setup, about 150 seconds (it returns all foreign keys for a list of tables, including the table and column names at both ends of each foreign key):
select
cc.table_name, cc.position, cc.constraint_name, cc.column_name,
cr.table_name r_table_name, ccr.constraint_name r_constraint_name, ccr.column_name r_column_name
from user_constraints c
join user_cons_columns cc on cc.constraint_name=c.constraint_name and
cc.owner=c.owner and
cc.table_name=c.table_name
join user_constraints cr on cr.owner=c.r_owner and
cr.constraint_name=c.r_constraint_name and
cr.constraint_type in ('P','U')
join user_cons_columns ccr on ccr.constraint_name=cr.constraint_name and
ccr.owner=cr.owner and
ccr.table_name=cr.table_name and
ccr.position=cc.position
where c.constraint_type='R'
and c.table_name in ('TABLE_A', 'TABLE_B', ........a list of about 157 table names.......)
order by cc.table_name, cc.position, constraint_name, column_name, cc.position;
After rewriting it to this, the query takes just 1-8 seconds:
with
uc as (select /*+materialize*/ owner,table_name,constraint_name,constraint_type,r_owner,r_constraint_name from user_constraints),
ucc as (select /*+materialize*/ owner,table_name,constraint_name,position,column_name from user_cons_columns)
select
cc.table_name, cc.position, cc.constraint_name, cc.column_name,
cr.table_name r_table_name, ccr.constraint_name r_constraint_name, ccr.column_name r_column_name
from uc c
join ucc cc on cc.constraint_name=c.constraint_name and cc.owner=c.owner and cc.table_name=c.table_name
join uc cr on cr.owner=c.r_owner and cr.constraint_name=c.r_constraint_name and cr.constraint_type in ('P','U')
join ucc ccr on ccr.constraint_name=cr.constraint_name and ccr.owner=cr.owner and ccr.table_name=cr.table_name and ccr.position=cc.position
where c.constraint_type='R'
and c.table_name in ('TABLE_A', 'TABLE_B', ........a list of about 157 table names.......)
order by cc.table_name, cc.position, constraint_name, column_name, cc.position;
I also tried * instead of listing just the needed columns in the WITH-clause subqueries, but that didn't help. I'm guessing Oracle ignores /*+ materialize */ hints if too much data would have to be cached.
1. Gather dictionary stats.
begin
dbms_stats.gather_dictionary_stats;
end;
/
2. Gather fixed object stats.
begin
dbms_stats.gather_fixed_objects_stats;
end;
/
There are also a few rare data dictionary objects that are never analyzed unless you specifically call them with dbms_stats.gather_table_stats.
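For example (purely illustrative - SYS.OBJ$ stands in here for whichever dictionary table turns out to be stale):
begin
dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'OBJ$');
end;
/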
3. Look for broken data dictionary objects. In some rare cases character set problems can cause data dictionary performance problems. Run an EXPLAIN PLAN on the SELECT and look for anything "weird", like NLSSORT in the predicates that would prevent an index access.
4. Check My Oracle Support. I've seen bugs before for data dictionary views that degrade with new versions. Sometimes there's an alternate version of the data dictionary view that fixes the problem. I searched on My Oracle Support and "Data Dictionary Select Taking A Very Long Time in 12c (Doc ID 2251730.1)" may be relevant here. I can't post the contents of that article here so go to support.oracle.com and check out the workaround in that bug report.
5. Consider yourself lucky. If you only have one performance problem, and it's only four times slower, I'd consider that a successful upgrade.
I'm a bit late to this party, but as suggested by Burleson, use the /*+ RULE */ hint with your queries on the Oracle data dictionary. This effectively turns off the cost-based optimizer.
Many have said not to use hints and that the RULE hint has been deprecated, but it makes a huge difference in my case. One of my DBA_IND_COLUMNS queries which took 18 MINUTES to run now takes less than a second (Oracle 12cR1). I'm at a loss to say why this works...
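For illustration only, such a hinted dictionary query might look roughly like this (the owner and table filters are placeholders, not the poster's actual query):
select /*+ RULE */ index_name, column_name, column_position
from dba_ind_columns
where table_owner = 'SOME_SCHEMA'
and table_name = 'SOME_TABLE'
order by index_name, column_position;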

order by rownum — is it correct or not?

I have a canonical top-N query against an Oracle database suggested by all FAQs and HowTos:
select ... from (
select ... from ... order by ...
) where rownum <= N
It works perfectly on Oracle 11, i.e. it returns the top-N records in the order specified by the inner select.
However, it breaks on Oracle 12. It still returns the same top-N records, but they may come back shuffled; the final order of these records is non-deterministic.
I googled but haven't found any related discussions. It looks like everyone else always gets the correct record order from such a select.
One finding was interesting though. I saw that some people use (without an explanation, unfortunately) an additional order by rownum clause in the outer select:
select ... from (
select ... from ... order by ...
) where rownum <= N
order by rownum
(both rownum's here refer to the Oracle pseudocolumn; it's not something returned by the inner select)
It appears to work. But with Oracle optimizer you can never be sure if it's just luck or a really correct solution.
The question is: does order by rownum guarantee correct ordering in this case or not, and why? My colleagues and I could not come to an agreement about it.
P.S. I'm aware of other ways to select top-N records, e.g. using row_number analytic function and fetch first clause introduced in Oracle 12. I'm also aware that I can repeat the same order by ... on the outer select. The question is about order by rownum only — is it correct or not.
The inner query and the outer query may or may not return rows in the same order, and hence the rownum values may not line up. Since rownum is already assigned in the inner query, if you want the top N records the best thing to do is to create an alias for rownum in the inner query and use that alias in the outer query:
select ... from (
select rownum rn ... from ...
) where rn <= N
order by rn
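For comparison, the alternatives the asker mentions would look roughly like this (the table t and sort column created_at are hypothetical):
-- pre-12c: row_number() makes the ordering explicit and deterministic
select *
from (
select t.*, row_number() over (order by created_at desc) rn
from t
)
where rn <= :n
order by rn;
-- Oracle 12c and later: the row-limiting clause
select *
from t
order by created_at desc
fetch first :n rows only;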

Different results in Parallel Execution - Oracle

In my company's application there is a query in Oracle that uses parallel execution (configured for 4 servers). It wasn't me who built it; the developer set it up that way for performance.
The query joins views and tables, and the weirdest thing is: sometimes it returns 11k results (incorrect), sometimes 27k results (correct).
After much research I found out that if I remove the parallel hint, it always returns the correct number: 27k. And if I increase the number of servers to 6 or 7, it always returns the incorrect number: 11k.
The layout of the query is like this:
SELECT /*+ PARALLEL(NAME, 4) */ * FROM(
SELECT DISTINCT COLUMNS
FROM VIEW
JOIN TABLE1 ON (....)
JOIN TABLE2 ON (....)
JOIN TABLE3 ON (....)
ORDER BY 3
) NAME
Does anyone have any idea why? I don't know much about this subject.

Why the Select * FROM Table where ID NOT IN ( list of int ids) query is slow in sql server ce?

This problem seems to be general in SQL Server CE.
I have indexes on all the fields.
Also, the same query but with ID IN (list of int ids) is pretty fast.
I tried changing the query to an OUTER JOIN, but that just made it worse.
So, any hints on why this happens and how to fix it?
That's because the index is not really helpful for that kind of query, so the database has to do a full table scan. If the query is (for some reason) slower than a simple "SELECT * FROM TABLE", do that instead and filter the unwanted IDs in the program.
EDIT: from your comment, I see that you use a subquery instead of a list. Because of that, there are three possible ways to do the same thing (hopefully one of them is faster):
Original statement:
select * from mytable where id not in (select id from othertable);
Alternative 1:
select * from mytable where not exists
(select 1 from othertable where mytable.id=othertable.id);
Alternative 2:
select * from mytable
minus
select mytable.* from mytable inner join othertable on mytable.id=othertable.id;
Alternative 3: (ugly and hard to understand, but if everything else fails...)
select * from mytable
left outer join othertable on (mytable.id=othertable.id)
where othertable.id is null;
This is not specific to SQL Server CE; it applies to databases in general.
The IN operation is sargable and NOT IN is non-sargable.
What does this mean?
Sargable stands for Search ARGument ABLE: the DBMS engine can take advantage of an index. With a non-sargable predicate, the index can't be used.
The solution might be to use a filter to remove those IDs instead.
More in SQL Performance Tuning by Peter Gulutzan.
ammoQ is right: an index does not help much with your query. Depending on the distribution of values in your ID column, you could optimise the query by specifying which IDs to select rather than which ones not to select. If you end up requesting, say, more than ~25% of the table, the index will not be used anyway, because with non-clustered indexes (the only type of index SQL CE supports, if memory serves) it would be cheaper to scan the table. Otherwise (if the query is actually selective) you could rewrite the query with ID ranges to select ('union all' may work better than 'or' to combine ranges, if SQL CE supports 'union all' - not sure).
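To make that last suggestion concrete, a rough sketch of the range rewrite (the table name follows the earlier examples and the excluded IDs 100 and 200 are made up):
-- instead of: select * from mytable where id not in (100, 200)
select * from mytable where id < 100
union all
select * from mytable where id > 100 and id < 200
union all
select * from mytable where id > 200;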

Table Join Efficiency Question

When joining across tables (as in the examples below), is there an efficiency difference between joining on the tables or joining subqueries containing only the needed columns?
In other words, is there a difference in efficiency between these two queries?
SELECT result
FROM result_tbl
JOIN test_tbl USING (test_id)
JOIN sample_tbl USING (sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
vs
SELECT result
FROM (SELECT result, test_id FROM result_tbl)
JOIN (SELECT test_id, sample_id FROM test_tbl) USING(test_id)
JOIN (SELECT sample_id FROM sample_tbl) USING(sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
The only way to find out for sure is to run both with tracing turned on and then look at the trace file. But in all probability they will be treated the same: the optimizer will merge all the inline views into the main statement and come up with the same query plan.
It doesn't matter. It may actually be WORSE since you are taking control away from the optimizer which generally knows best.
However, remember that if you are doing a JOIN and only including columns from one of the tables, it is QUITE OFTEN better to rewrite it as a series of EXISTS clauses -- because that's what you really mean. JOINs (with some exceptions) will join every matching row, which is a lot more work for the optimizer to do.
e.g.
SELECT t1.id1
FROM table1 t1
INNER JOIN table2 ON something = something
should almost always be
SELECT id1
FROM table1 t1
WHERE EXISTS( SELECT *
FROM table2
WHERE something = something )
For simple queries the optimizer may reduce the query plans into identical ones. Check it out on your DBMS.
Also this is a code smell and probably should be changed:
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A')
to
SELECT result
FROM request
WHERE EXISTS(...)
AND request_status = 'A'
No difference.
You can tell by running EXPLAIN PLAN on both those statements - Oracle knows that all you want is the "result" column, so it only does the minimum necessary to get the data it needs - you should find that the plans will be identical.
The Oracle optimiser does, sometimes, "materialize" a subquery (i.e. run the subquery and keep the results in memory for later reuse), but this is rare and only occurs when the optimiser believes this will result in a performance improvement; in any case, Oracle will do this "materialization" whether you specified the columns in the subqueries or not.
Obviously if the only place the "results" column is stored is in the blocks (along with the rest of the data), Oracle has to visit those blocks - but it will only keep the relevant info (the "result" column and other relevant columns, e.g. "test_id") in memory when processing the query.
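As a concrete illustration of the EXPLAIN PLAN comparison, something along these lines should work (DBMS_XPLAN is the standard way to display the plan; repeat with the second form and compare the output):
explain plan for
select result
from result_tbl
join test_tbl using (test_id)
join sample_tbl using (sample_id)
join (select request_id
from request_tbl
where request_status = 'A') using (request_id);
select * from table(dbms_xplan.display);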
