Sybase: Does the column order in a non-clustered index affect insert performance? - performance

To be more specific (since the general answer to the subject is likely "yes"):
We have a table with lots of data in Sybase.
One of the columns is "date of insertion" (DATE, datetime type).
The clustered index on the table starts with the "DATE".
Question: For another, non-clustered index, does the order of columns (more specifically, whether "DATE" is the first or a second index column) affect the insert query performance?
Assume everything else is equal, e.g. the order of the second non-clustered index does not affect select query performance (even if it does, I don't care for the purposes of this question).

What locking scheme is in place on the table: allpages or datapages? (You can find out with select lockscheme('table_name').) The application-observed performance of index maintenance is much better with the datapages locking scheme.
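For example, to check the scheme and (if appropriate) convert the table — the table name here is hypothetical:

```sql
-- Check the current locking scheme of the table (Sybase ASE)
select lockscheme('my_table')

-- If it is allpages, converting to datapages can reduce index-maintenance contention
alter table my_table lock datapages
```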
An index is ordered. The insert time depends on the cost of maintaining that order. If you insert rows with monotonically increasing values in the indexed columns, then the index will grow 'at the end' and performance will be great (modulo any concurrency issues due to multiple concurrent updaters). The index tree will have to be re-balanced from time to time, but I believe that is a rapid operation.
If the order of inserts does not match the order of an index, then entries will have to be inserted 'into the middle' of that index. This is likely to cause page splits (unless sufficient free space has been left unallocated by setting a fill factor) and also index fragmentation.
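A sketch of reserving that free space with a fill factor at index-creation time (Sybase ASE syntax; the table, column, and index names are invented for illustration):

```sql
-- Leave ~20% of each index page free to absorb out-of-order inserts
create nonclustered index ix_mytable_date_acct
    on my_table (insert_date, account_id)
    with fillfactor = 80
```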
Anyway, the answer -- as always -- is to conduct some experiments and measure elapsed time and IO activity. You also might want to look at the optdiag output.
pjjH

I believe it is largely determined by the Fill factor in the indexes and to a much lesser extent how selective the columns are.

Related

When may it make sense to not have an index on a table and why?

For Oracle, and relative to application tuning: when may it make sense not to have an index on a table, and why?
There is a cost associated to having an index:
it takes up disk space
it slows down updates (index needs to be updated as well)
it makes query planning more complex (slightly slower, but more importantly increased potential for bad decisions)
These costs are supposed to be offset by the benefit of more efficient query processing (faster, fewer I/O).
If the index is not used enough to justify the cost, then having the index will be negative.
In particular, if your data has low cardinality (think flags like 'Y' and 'N'), indexes won't help much: when the number of distinct values in an index is low, the optimizer will probably choose not to use it.
An interesting aside: if the indexed column contains many NULLs, queries whose criteria match actual values can be much faster, because NULLs are not stored in a single-column index. Only the actual (non-null) values are in that particular index, so most of the rows in the table are never evaluated. The flip side is that an "is null" predicate will never use such an index: if you have a query with a WHERE clause like "where mytable.mycolumn is null", abandon all indexes ye who enter here.
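A minimal sketch of that NULL behaviour (Oracle syntax; all object names are hypothetical):

```sql
create table mytable (
    id        number primary key,
    mycolumn  varchar2(10)      -- nullable, mostly NULL in practice
);
create index ix_mycolumn on mytable (mycolumn);

-- Can use IX_MYCOLUMN: NULL keys are not stored, so the index
-- contains only the (few) non-NULL values
select * from mytable where mycolumn = 'RARE_VALUE';

-- Cannot use IX_MYCOLUMN: a single-column b-tree index does not index NULLs
select * from mytable where mycolumn is null;
```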
If a table has very little data (small number of rows) then it doesn't serve you to use an index. An index makes it quick to search on a specific attribute and if the application you are working with doesn't need a fast lookup then using an index does very little for you.

Oracle: Possible to replace a standard non-unique single column index with an unique combined index?

I'm currently working on optimizing my database schema with regard to index structures. As I'd like to increase my DDL performance, I'm searching for potential drop candidates on my Oracle 12c system. Here's the scenario in which I don't know what the consequences for the query performance might be if I drop the index.
Given two indexes on the same table:
- non-unique, single column index IX_A (indexes column A)
- unique, combined index UQ_AB (indexes column A, then B)
Using index monitoring I found that the query optimizer didn't choose UQ_AB, but only IX_A (probably because it's smaller and thus faster to read). As UQ_AB contains column A and additionally column B, I'd like to drop IX_A, though I'm not sure if I'll get any performance penalties if I do so. Does the higher selectivity of the combined unique index have any influence on the execution plans?
It could, though the effect is usually minor. Of course it depends on various things, for example how large the values in column B are.
You can look at various columns in USER_INDEXES to compare the two indexes, such as:
- BLEVEL: tells you the "height" of the index tree (well, the height is BLEVEL+1)
- LEAF_BLOCKS: how many data blocks are occupied by the index values
- DISTINCT_KEYS: how "selective" the index is
(You need to have analyzed the table first for these to be accurate). That will give you an idea of how much work Oracle needs to do to find a row using the index.
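For instance (the index names come from the question; the table name in the gather call is a placeholder):

```sql
-- Refresh statistics first so the dictionary numbers are accurate
exec dbms_stats.gather_table_stats(user, 'MY_TABLE', cascade => true);

-- Compare the two candidate indexes
select index_name, blevel, leaf_blocks, distinct_keys
from   user_indexes
where  index_name in ('IX_A', 'UQ_AB');
```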
Of course the only way to really be sure is to benchmark and compare timings or even trace output.

Sybase - Performance considerations while indexing existing table

I have a table in Sybase which has around 1 million rows. This table currently does not have any index and I would like to create one now. My questions are:
What precautions should I take before creating an index?
Does this process require more tablespace to be allocated?
Any other performance considerations I should take care of?
Cheers
Ranjith
From the manual:
When to index
Use the following general guidelines:
- If you plan to do manual insertions into the IDENTITY column, create a unique index to ensure that the inserts do not assign a value that has already been used.
- A column that is often accessed in sorted order, that is, specified in the order by clause, probably should be indexed so that Adaptive Server can take advantage of the indexed order.
- Columns that are regularly used in joins should always be indexed, since the system can perform the join faster if the columns are in sorted order.
- The column that stores the primary key of the table often has a clustered index, especially if it is frequently joined to columns in other tables. Remember, there can be only one clustered index per table.
- A column that is often searched for ranges of values might be a good choice for a clustered index. Once the row with the first value in the range is found, rows with subsequent values are guaranteed to be physically adjacent. A clustered index does not offer as much of an advantage for searches on single values.
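A sketch of the range-search case described above (Sybase syntax; table and column names are invented):

```sql
create clustered index cix_orders_date on orders (order_date)

-- Rows in the range are physically adjacent, so this scan reads few pages
select * from orders
where  order_date between '2024-01-01' and '2024-01-31'
```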
When not to index
In some cases, indexes are not useful:
- Columns that are seldom or never referenced in queries do not benefit from indexes, since the system seldom has to search for rows on the basis of values in these columns.
- Columns that can have only two or three values, for example, "male" and "female" or "yes" and "no", get no real advantage from indexes.
Try
sp_spaceused tablename, 1
Here is a link to the documentation.
Yes: updating statistics on the indexes.
Here is a link to the documentation.
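A sketch of the statistics step (Sybase ASE; the table and index names are placeholders):

```sql
-- Refresh statistics for all indexes on the table...
update index statistics my_table

-- ...or for one specific index
update statistics my_table my_new_index
```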

why is selecting a pk column faster than a non-indexed column?

I'm currently doing some tests and I noticed the following:
select field1 from table1
will result in an index fast full scan when field1 is the primary key, and thus a low cost (in my case, 4690), whereas
select field2 from table1
will result in a TABLE ACCESS FULL (there's no constraint nor index on field2; even with a regular index the result is the same), with a cost of 117591.
I'm aware of the gain when the indexes/constraints are involved in JOIN/WHERE clauses, but in my case there's nothing filtered: I don't understand why the PK should be faster because, anyway, I am retrieving all the rows...
Is it because of the uniqueness? Tom says that a unique index is the same as a conventional index, structurally, which really makes me wonder why selecting the PK would cost less than any other column.
Thanks for your enlightenments :-)
rgds.
A single-column b-tree index does not store entries for rows in which the indexed column is NULL. So if you have an index on field2 but field2 allows NULL, Oracle can't scan the index without risking returning incorrect data. A full table scan is, therefore, the only valid way for Oracle to retrieve the data for the field2 column for every row in table1. If you add a NOT NULL constraint to field2, Oracle should be able to at least consider doing a full scan of the index.
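To illustrate, assuming the data actually contains no NULLs (names taken from the question):

```sql
-- With field2 nullable, only a full table scan is guaranteed correct.
-- Once NULLs are ruled out, the index covers every row:
alter table table1 modify (field2 not null);

-- Now the optimizer can consider an INDEX FAST FULL SCAN
-- of the index on field2 instead of TABLE ACCESS FULL
select field2 from table1;
```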
Of course, whether or not the optimizer chooses to use the index (and the cost it ultimately assigns to using the index) will depend on the statistics that you've gathered both on the index and on the table. If your statistics are inaccurate, the optimizer's cost estimates are going to be inaccurate and so the plan that is generated is likely to be inefficient. That's one of the reasons that people are usually advised to be cautious about putting too much credence into Oracle's estimate of the cost of a plan-- if you're looking at a plan, it's likely because you suspect it is inefficient which should imply that you can't rely on the cost. You're generally much better served looking at the cardinality estimates for each step and determining whether those make sense given your distribution of data.

TSql, building indexes before or after data input

A performance question about indexing large amounts of data. I have a large table (~30 million rows), with 4 of the columns indexed to allow for fast searching. Currently I set up the indexes, then import my data. This takes roughly 4 hours, depending on the speed of the db server. Would it be quicker/more efficient to import the data first, and then build the indexes?
I'd temper af's answer by saying that "index first, insert after" would probably be slower than "insert first, index after" where you are inserting records into a table with a clustered index, but not inserting them in the natural order of that index. The reason is that for each insert, the data rows themselves would have to be reordered on disk.
As an example, consider a table with a clustered primary key on a uniqueidentifier field. The (nearly) random nature of a guid would mean that it is possible for one row to be added at the top of the data, causing all data in the current page to be shuffled along (and maybe data in lower pages too), but the next row added at the bottom. If the clustering was on, say, a datetime column, and you happened to be adding rows in date order, then the records would naturally be inserted in the correct order on disk and expensive data sorting/shuffling operations would not be needed.
I'd back up Winston Smith's answer of "it depends", but suggest that your clustered index may be a significant factor in determining which strategy is faster for your current circumstances. You could even try not having a clustered index at all, and see what happens. Let me know?
Inserting data while indices are in place causes the DBMS to update them after every row. Because of this, it's usually faster to insert the data first and create the indices afterwards, especially with that much data.
(However, it's always possible there are special circumstances which may cause different performance characteristics. Trying it is the only way to know for sure.)
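A sketch of the "insert first, index after" sequence in T-SQL (the file path, table, and index names are all hypothetical):

```sql
-- 1. Bulk load into a heap (no indexes yet)
BULK INSERT dbo.BigTable
FROM 'D:\load\bigtable.dat'
WITH (TABLOCK);

-- 2. Build the clustered index first (it physically reorders the table)...
CREATE CLUSTERED INDEX CIX_BigTable ON dbo.BigTable (ImportDate);

-- 3. ...then the nonclustered indexes
CREATE NONCLUSTERED INDEX IX_BigTable_Col2 ON dbo.BigTable (Col2);
```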
It will depend entirely on your particular data and indexing strategy. Any answer you get here is really a guess.
The only way to know for sure, is to try both and take appropriate measurements, which won't be difficult to do.