I'm currently doing some tests and I noticed the following:
select field1 from table1
Will result in an index fast full scan when field1 is the primary key, thus with a low cost (in my case it is 4690), whereas
select field2 from table1
Will result in a table access full (there's no constraint or index on field2, yet even with a regular index the result is the same), with a cost of 117591.
I'm aware of the gain when indexes/constraints are involved in JOIN/WHERE clauses, but in my case nothing is filtered: I don't understand why the PK should be faster, since I am retrieving all the rows anyway...
Is it because of the uniqueness? Tom says that a unique index is structurally the same as a conventional index, which really makes me wonder why selecting the PK would cost less than selecting any other column.
Thanks for your enlightenments :-)
rgds.
A single-column B-tree index does not store entries for rows where the indexed column is NULL. So if you have an index on field2 but field2 allows NULLs, Oracle can't answer the query from the index without potentially returning incorrect data (it would miss the rows where field2 is NULL). A full table scan is, therefore, the only valid way for Oracle to retrieve the field2 value for every row in table1. If you add a NOT NULL constraint to field2, Oracle should be able to at least consider doing a full scan of the index.
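For example, a minimal sketch using the names from the question (the index on field2 may or may not already exist, and the plan noted in the comment is what you would hope to see with accurate statistics, not a guarantee):
alter table table1 modify (field2 not null);

create index table1_field2_ix on table1 (field2);

explain plan for select field2 from table1;
select * from table(dbms_xplan.display);
-- With the NOT NULL constraint in place you would hope to see an
-- INDEX FAST FULL SCAN on table1_field2_ix instead of TABLE ACCESS FULL.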
Of course, whether or not the optimizer chooses to use the index (and the cost it ultimately assigns to using it) will depend on the statistics that you've gathered on both the index and the table. If your statistics are inaccurate, the optimizer's cost estimates are going to be inaccurate, and so the plan that is generated is likely to be inefficient. That's one of the reasons that people are usually advised to be cautious about putting too much credence into Oracle's estimate of the cost of a plan: if you're looking at a plan, it's likely because you suspect it is inefficient, which should imply that you can't rely on the cost. You're generally much better served looking at the cardinality estimates for each step and determining whether those make sense given your distribution of data.
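If you can run the statement, a hedged way to compare estimated and actual row counts per step (this assumes you have access to the views used by DBMS_XPLAN.DISPLAY_CURSOR):
select /*+ gather_plan_statistics */ field2 from table1;

select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));
-- Compare E-Rows (the optimizer's estimate) with A-Rows (what actually happened)
-- for each step; large discrepancies usually point at stale or missing statistics.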
I need some help on how to perform auto-partitioning on an integer column, similar to what we do on a date column, like PARTITION BY RANGE (DIM_DT_ID) INTERVAL (NUMTODSINTERVAL(1,'DAY')).
I have 90 million rows and performance sucks; our SLA on the query is 2 seconds, so I would like to partition the table. What is the best approach, and how do I enable auto-partitioning on an integer column?
Our queries will always filter on these columns, like:
select * from <tbname>
where ObjectID =1346785
and patentnumber=23456.
"i'm just making an example here, as i cant paste the original query for legality sake"
Fair enough, but the advice we give you will only be as good as the information you give us. So far, nothing you have posted suggests you need Partitioning.
The pasted query would perform well with a compound index, and would probably benefit from compression of the leading column:
create index your_table_lookup_index
on your_table(ObjectID, patentnumber) compress 1;
If that's a unique combination then make the index unique.
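For example, a sketch of the unique variant (the index name is made up):
create unique index your_table_lookup_uk
on your_table (ObjectID, patentnumber);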
how do I enable auto-partitioning on an integer column?
However, if you think you do have a genuine use case for Partitioning then we can use Interval Partitioning with integers as well as dates. This statement will create a table partitioned on objectid with a partition for every ten values.
create table your_table (
objectid number,
patentnumber number,
created_date date
)
partition by range (objectid)
interval (10)
(
partition p_00010 values less than (10)
);
On your posted figures that would be about 400 partitions with around 225000 rows per partition. Is that a good choice? Who can tell? You know your data and your use cases, we don't: perhaps a partition per objectid (i.e. with interval (1)) would be better.
You already have a table, so you need to split it into Partitions. The standard way of doing this (sketched below) would be:
create a new table with your partitioning strategy (like above) but with the default partition ranged for values less than (MAXVALUE)
use partition exchange to move the existing table data into the new structure
drop the old table and rename the new table to the old name; resolve foreign keys and other dependencies
iteratively split the MAXVALUE partition into the required ranges
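A rough sketch of those steps with invented object names (the interim table is plain range partitioned because, as far as I know, a MAXVALUE partition cannot be combined with INTERVAL; the column list must match the existing table exactly):
-- 1. New table with a single catch-all partition
create table your_table_new (
objectid number,
patentnumber number,
created_date date
)
partition by range (objectid)
(
partition p_maxvalue values less than (maxvalue)
);

-- 2. Swap the existing table's data segment into that partition (no rows are copied)
alter table your_table_new
exchange partition p_maxvalue
with table your_table
without validation;

-- 3. Retire the old table and take over its name; then re-point foreign keys,
--    grants and synonyms, and rebuild any indexes you need
drop table your_table;
rename your_table_new to your_table;

-- 4. Split the catch-all partition into the required ranges, one split at a time
alter table your_table
split partition p_maxvalue at (10)
into (partition p_00010, partition p_maxvalue);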
This is a fairly time-consuming process. You have tagged your question [oracle12c]; if you're using Oracle 12c R2 you should definitely look at its online conversion mechanism, which is a single command; see the documentation for details.
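For reference, the 12.2 single-statement conversion looks roughly like this (a sketch of ALTER TABLE ... MODIFY; check the documentation for the full syntax and index handling):
alter table your_table
modify partition by range (objectid) interval (10)
(
partition p_00010 values less than (10)
)
online
update indexes;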
Remember that Partitioning for performance is a tricky game. While it can improve queries which return a large number of rows aligned with the Partition key, it can make no difference to other queries, or even impair their performance. In particular, any query which does not include the partition key (objectid in your case) will likely perform worse after partitioning the table.
Final aside: as you know but for the benefit of future Seekers, Partitioning is a chargeable extra to the Enterprise Edition license. We're not allowed to use it unless we've paid for it.
For Oracle, and relative to application tuning, when might it make sense not to have an index on a table, and why?
There is a cost associated to having an index:
it takes up disk space
it slows down updates (index needs to be updated as well)
it makes query planning more complex (slightly slower, but more importantly increased potential for bad decisions)
These costs are supposed to be offset by the benefit of more efficient query processing (faster execution, fewer I/Os).
If the index is not used enough to justify that cost, then having the index is a net negative.
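One hedged way to check whether an existing index is actually being used (the index name is a placeholder; V$OBJECT_USAGE only shows indexes owned by the current user):
alter index my_big_index monitoring usage;

-- ...let the application run for a representative period, then:
select index_name, monitoring, used
from v$object_usage
where index_name = 'MY_BIG_INDEX';

alter index my_big_index nomonitoring usage;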
Selectivity matters too: if your data distribution is low (think flags like 'Y' and 'N'), indexes won't help much, because when the number of distinct values in a column is low the optimizer will probably choose not to use the index. An interesting aside: if many rows have NULL in the indexed column, queries that filter on actual values can be much faster, because NULLs aren't stored in a single-column B-tree index, so the index contains only the non-NULL values and most of the rows in the table never have to be evaluated. In the IS NULL case, however, such an index will never be used: if you have a query with a WHERE clause like "where mytable.mycolumn is null", abandon all indexes ye who enter here.
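A quick way to see the NULL behaviour for yourself (table, column and index names are invented; the expected plan assumes nothing unusual about your settings):
create table demo_t (id number primary key, flag varchar2(1));
create index demo_t_flag_ix on demo_t (flag);

-- Rows where flag is NULL have no entry in demo_t_flag_ix, so this predicate
-- cannot be answered from that single-column index alone:
explain plan for select id from demo_t where flag is null;
select * from table(dbms_xplan.display);
-- expect TABLE ACCESS FULL on demo_t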
If a table has very little data (a small number of rows), an index doesn't serve you either, since a full scan of a tiny table is already cheap. An index makes it quick to search on a specific attribute, and if the application you are working with doesn't need fast lookups, an index does very little for you.
The columns in the WHERE clause are not selective. They are all in one single table. In addition, the expressions used are NOT EQUAL, OR, IS NULL, and IS NOT NULL. The primary key is on the customer ID. I am not sure how to work around this kind of data. Are there different indexing methods that can be created on the table, or other ways to solve the problem? I guess partitioning won't be helpful either, since it would break the table into one major section holding most of the data. Any thoughts or workarounds would be useful.
I'm putting the data below for reference, along with sample queries for ease of understanding.
sample query
colA = 'Marketable' OR colA is null
NORMAL index: gets ignored due to the OR and NULL operators. Moreover, the queried data covers more than 95% of the data in the table.
BITMAP index: gets ignored due to more than 96% data coverage.
sample query
colB = '7' OR colB = '6' OR colB = '5'
NORMAL or BITMAP: neither is useful due to the large data selection. The optimizer goes with a full table scan using the primary key cust_id.
sample query
colC <> 'SPECIAL SEGMENT' OR colC is null (since the values can change, no specific value is passed)
combination sample query
NOT (colB = '6' OR colB = '3') AND
(colC <> 'SPECIAL SEGMENT' OR colC is null)
Full table scans are not evil. Index access is not always more efficient.
If you want to return the majority of the data in a table, you want to use a full table scan, since that's the most efficient way of accessing large fractions of the data in the table. Indexes are great when you want to access relatively small fractions of the data in the table. But if you want most of the data, doing millions of index accesses is not going to be more efficient. In your first example, you want to return 9.2 million rows from a 9.3 million row table. A full table scan is the plan you want: that's the most efficient way to retrieve 99% of the rows in the table. Anything else is going to be less efficient. You could, I suppose, potentially partition the table on A, leading to full partition scans of the two large partitions. That's only going to cut, say, 1% of the work your query needs to do, though, and may have negative impacts on other queries on that table.
Now, I'm always a bit suspicious about queries that want to return 99% of the rows in a table in the first place. It would make no sense to have such a query in an OLTP system, for example, because no human is going to page through 9.2 million rows of data. It wouldn't make sense to have that sort of query if the goal is to replicate data because it would almost certainly be more efficient to just replicate incremental changes rather than the entire data set every time. It might make sense to read almost all the rows if the goal is to perform some aggregations. But if this is something that happens enough to care about optimizing the analysis, you'd be better off looking at ways of pre-aggregating the data using materialized views and dimensions so that you can read and aggregate the data once and then just read your pre-aggregated values at runtime.
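A hedged sketch of that pre-aggregation idea, assuming the goal is simply counting rows per value of colA (the materialized view name and the aggregation are made up for illustration):
create materialized view your_table_a_counts
build immediate
refresh complete on demand
enable query rewrite
as
select colA, count(*) as row_count
from your_table
group by colA;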
If you do really need to read all that data, you may also want to look into parallel query. If there are relatively few readers, it is more efficient to let Oracle do the full scan in parallel so that your session can utilize more of the available hardware. Of course, that means that you can have fewer simultaneous sessions since more hardware for you means less for others, so that's a trade-off you need to understand. If you're building an ETL process where there will only be a couple sessions loading data at any point, parallel query can provide substantial performance improvements.
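As a sketch, asking for parallelism on a single statement might look like this (the degree of 8 is an arbitrary example; the table and column names come from the earlier samples):
select /*+ parallel(t, 8) */ colA, count(*)
from your_table t
where colA = 'Marketable' or colA is null
group by colA;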
I have a table in Sybase which has around 1 million rows. This table currently does not have any indexes, and I would like to create one now. My questions are:
What precautions should I take before creating an index?
Does this process require more tablespace to be allocated?
Any other performance considerations I should take care of?
Cheers
Ranjith
From the manual:
When to index
Use the following general guidelines:
If you plan to do manual insertions into the IDENTITY column, create a unique index to ensure that the inserts do not assign a value that has already been used.
A column that is often accessed in sorted order, that is, specified in the order by clause, probably should be indexed so that Adaptive Server can take advantage of the indexed order.
Columns that are regularly used in joins should always be indexed, since the system can perform the join faster if the columns are in sorted order.
The column that stores the primary key of the table often has a clustered index, especially if it is frequently joined to columns in other tables. Remember, there can be only one clustered index per table.
A column that is often searched for ranges of values might be a good choice for a clustered index. Once the row with the first value in the range is found, rows with subsequent values are guaranteed to be physically adjacent. A clustered index does not offer as much of an advantage for searches on single values.
When not to index
In some cases, indexes are not useful:
Columns that are seldom or never referenced in queries do not benefit from indexes, since the system seldom has to search for rows on the basis of values in these columns.
Columns that can have only two or three values, for example, "male" and "female" or "yes" and "no", get no real advantage from indexes.
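For what it's worth, index creation in ASE looks roughly like this (a sketch; the table and column names are invented):
-- clustered index on the primary key column
create unique clustered index orders_id_ci
on orders (order_id)
go

-- non-clustered index on a column that is frequently joined or filtered on
create nonclustered index orders_customer_nci
on orders (customer_id)
go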
Try
sp_spaceused tablename, 1
Here is a link to the documentation.
Yes - updating statistics about the indexes.
Here is a link to the documentation.
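As a sketch (the table name is a placeholder), refreshing statistics after creating the index might look like:
-- statistics on all columns of all indexes of the table
update index statistics your_table
go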
To be more specific (since the general answer to the subject is likely "yes"):
We have a table with lots of data in Sybase.
One of the columns is "date of insertion" (DATE, datetime type).
The clustered index on the table starts with the "DATE".
Question: For another, non-clustered index, does the order of columns (more specifically, whether "DATE" is the first or a second index column) affect the insert query performance?
Assume everything else is equal, e.g. that the column order of the non-clustered index does not affect select query performance (even if it does, I don't care for the purposes of this question).
What locking scheme is in place on the table, all-pages or data-pages? (You can find out by selecting lockscheme('table_name').) The (application-observed) performance of index maintenance is much better with the data-pages locking scheme.
An index is ordered. The insert time depends on the cost of maintaining that order. If you insert rows with monotonically increasing values for the columns which are indexed then the index will grow 'at the end' and performance will be great (modulo any concurrency issues due to multiple concurrent updaters). The index tree will have to be re-balanced from time to time but I believe that is a rapid operation.
If the order of inserts is not in the same order as an index, then that index will have to have entries inserted 'into the middle', and this is likely to cause page splits (modulo there being sufficient space left unallocated by setting a fill factor, as mentioned above) and also 'index fragmentation'.
Anyway, the answer, as always, is to conduct some experiments and measure elapsed time and I/O activity. You might also want to look at the optdiag output.
pjjH
I believe it is largely determined by the fill factor of the indexes and, to a much lesser extent, by how selective the columns are.
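As a hedged sketch, specifying the fill factor when (re)creating the non-clustered index might look like this (object names are invented; 80 is just an example value that leaves roughly 20% free space on index pages to absorb out-of-order inserts):
create nonclustered index big_table_lookup_nci
on big_table (other_col, insert_date)
with fillfactor = 80
go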