Oracle composite primary key vs index

I'm designing a table that has multiple foreign keys. What I did was create an extra column as the primary key, which works more as a sequential row number, but I could also make the foreign keys a composite primary key.
So my question is about performance: is it better (at least in Oracle) to have a composite primary key than a surrogate key plus an index? Which is better in my case?
Thanks!

As Sylvain_Leroux points out, the term "better" is ambiguous here because it depends on your goals; there are tradeoffs to both approaches.
Ensure Composite Key is Actually Unique
First of all, if you want to use a composite primary key out of the foreign keys, then you must be sure that the combination of the foreign keys will be truly unique for each record. Otherwise, of course, you won't be able to use them as a primary key. If instead you are describing using a composite key made up of the foreign keys plus a surrogate key, that's kind of the worst of both worlds and is generally frowned upon.
ETL Back Room Considerations
The choice you are considering is a common one in OLAP, where a designer must choose whether or not to use a surrogate key for the fact table or a composite key comprised of the keys of the dimension tables. This advice from page 487 of Ralph Kimball's The Data Warehouse Toolkit Third Edition would therefore apply to your situation (you can consider your table as being analogous to what he describes as a fact table, and the foreign keys are for tables that he refers to as dimensions):
Fact table surrogate keys have a number of uses in the ETL back room. First, as previously described, they can be used as the basis for backing out or resuming an interrupted load. Second, they provide immediate and unambiguous identification of a single fact row without needing to constrain multiple dimensions to fetch a unique row. Third, updates to fact table rows can be replaced by inserts plus deletes because the fact table surrogate key is now the actual key for the fact table. Thus, a row containing updated columns can now be inserted into the fact table without overwriting the row it is to replace. When all such insertions are complete, then the underlying old rows can be deleted in a single step. Fourth, the fact table surrogate key is an ideal parent key to be used in a parent/child design. The fact table surrogate key appears as a foreign key in the child, along with the parent's dimension foreign key.
Performance Considerations
From a performance perspective, if the composite key is the primary key of an index-organized table (Oracle's term; other DBMSs call this a clustered index), the records are stored physically on disk in key order. (An ordinary Oracle heap table does not store rows in key order.) That makes reads faster for queries that look up by the foreign key (or keys), but it can also mean that writes are slower if they insert records at points other than the end, because the DBMS has to physically move records to make room. (This is slightly oversimplified, because the DBMS employs some schemes to combat this, but they are overwhelmed if the inserts are numerous enough.)
If you were to use a surrogate key, the insert problem wouldn't be an issue, but of course in situations where you are looking up by foreign keys, you wouldn't get the advantage of having your data in order physically on the disk. Assuming you would put an index on each foreign key, then that would add some overhead to insert tasks because the DBMS has to update multiple indices.
All of this is only noticeable with large amounts of data and will not make much of a difference for a relatively small amount of data.
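A minimal sketch of the two designs in Oracle SQL (all table and column names here are hypothetical, and the referenced parent tables are assumed to exist):

```sql
-- Option 1: surrogate primary key, with the foreign keys indexed separately
CREATE TABLE order_line_s (
    id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- 12c+; use a sequence on older versions
    order_id  NUMBER NOT NULL REFERENCES orders (order_id),
    item_id   NUMBER NOT NULL REFERENCES items (item_id)
);
CREATE INDEX order_line_s_order_ix ON order_line_s (order_id);
CREATE INDEX order_line_s_item_ix  ON order_line_s (item_id);

-- Option 2: composite primary key built from the foreign keys
-- (only valid if each (order_id, item_id) pair occurs at most once)
CREATE TABLE order_line_c (
    order_id  NUMBER NOT NULL REFERENCES orders (order_id),
    item_id   NUMBER NOT NULL REFERENCES items (item_id),
    CONSTRAINT order_line_c_pk PRIMARY KEY (order_id, item_id)
);
```

In option 2 the primary key constraint creates a composite index on (order_id, item_id) automatically, so lookups leading with order_id are covered; a separate index on item_id would still be needed for lookups by item alone.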

Related

What happens to surrogate keys of a transactional system when converting it to a dimensional schema?

Our OLTP systems use several surrogate keys. Now we want to create a dimensional model of our system for analysis. Should we keep the OLTP system's surrogate keys and natural keys, and also create one more datamart surrogate key? Or should we ignore the OLTP surrogate key and just keep the natural key from OLTP plus the datamart surrogate key?
The dimensional model's surrogate keys are specific to the dimensional model, and independent of any source keys you might have. You should definitely keep the natural keys and create a datamart surrogate key, but whether it is useful to also bring in the OLTP system's surrogate key as a back reference depends on whether it helps identify rows back in the OLTP system - i.e. how important is that OLTP surrogate key? Normally I'd stick with just the new surrogate in the dimension and the natural key, but sometimes the surrogate key serves as the natural key too.

Problems with a primary key sequence

When adding new data through a form, my primary key sequence increases by 1.
However, if I delete a row and add new data, the sequence carries on.
So, for example, my primary keys go 1, 2, 3, 4, 5, 6, 10 because of previously deleted rows.
I hope that makes sense.
SEQUENCE values in Oracle are guaranteed to be unique, but you cannot expect the values to form a contiguous sequence without any gaps.
Even if you would never delete any rows from the table, you're likely to see gaps at some point, because sequence values are cached (pre-reserved) between different transactions.
It is a SEQUENCE of numbers; it doesn't care whether you have actually used the "current value" or not.
Unlike in MySQL, a sequence in Oracle is not tied to a column; it is a separate object that you ask for a value (through your_sequence.NEXTVAL). To guarantee uniqueness, it never takes values back and offers them again.
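For example (the table name here is hypothetical):

```sql
CREATE SEQUENCE thread_seq;   -- caches 20 values by default

INSERT INTO threads (id, title)
VALUES (thread_seq.NEXTVAL, 'First post');

-- A rollback, a deleted row, or an instance restart that discards
-- cached values all leave gaps; NEXTVAL never hands out a value twice.
```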
If you always want a dense sequence of IDs, even after deletions, you would have to either
rearrange the IDs (read: change the IDs of every row newer than the deleted one), or -
without knowing your exact task - use the DENSE_RANK analytic function when querying your dataset, separating the real (in-table) IDs from the ranking of the rows.
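A sketch of the DENSE_RANK approach, assuming a hypothetical threads table: the stored IDs keep their gaps, while a gap-free position is computed at query time.

```sql
SELECT id,
       title,
       DENSE_RANK() OVER (ORDER BY id) AS position  -- 1, 2, 3, ... with no gaps
FROM   threads;
```

Since id is unique here, ROW_NUMBER() would produce the same result; DENSE_RANK also tolerates ordering by a non-unique column.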

How to sort by counter in Cassandra?

Let's assume I have forum software, and I would like to sort the threads by the number of views they have. The views would be stored in a counter.
Having experience with relational databases, I thought this would be simple to solve; it turns out it's not. I have thought about creating one massive row with the columns being counters (thus being sorted), but since a single row can only be stored on a single node, this does not seem feasible (it defeats the point of using Cassandra).
How can I sort by counter column in Cassandra?
You can't sort big data. That's one of the fundamental assumptions.
The only things that you can sort by in Cassandra are the things Cassandra uses to store its data: the row key and the column key.
Moving to NoSQL from traditional SQL, you have to drop the notion of being able to freely sort or join data. It's just (generally) not possible in big data implementations.
To update on this question:
Korya is correct that you cannot assume that all NoSQL/big-data stores are unable to sort (MongoDB can sort, and it is NoSQL).
Regarding Cassandra itself: you can sort on any elements of your primary key after the partition key, inside a composite key:
Example:
Primary Key ((A),B,C,D);
A is your partition Key.
B, C, D are the clustering columns of your composite key, and can be sorted ASC (the default) or DESC. If you want the latest rows first (i.e. by time), you would specify DESC for that column in your schema:
WITH CLUSTERING ORDER BY (media_type_id ASC, media_id DESC);
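In context, such a clause appears on a table definition. A hypothetical CQL sketch (note that the clustering order may only name clustering columns, not the partition key, and that a time-like column gets DESC for latest-first):

```sql
CREATE TABLE media (
    account_id    int,
    media_type_id int,
    media_id      int,
    title         text,
    PRIMARY KEY ((account_id), media_type_id, media_id)
) WITH CLUSTERING ORDER BY (media_type_id ASC, media_id DESC);
```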
As far as the question about counters goes:
You cannot sort by a counter inside Cassandra, because the counter would need to be part of the KEY, and the key is unique.
As pointed out by Martin, the solution referenced in an eBay whitepaper is to use two tables to keep track.
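A sketch of that two-table pattern in CQL (all names here are hypothetical): one table holds the live counters, and a second table, rebuilt periodically by a batch job, stores a snapshot of the counts in its clustering order so it can be read back "most viewed first".

```sql
-- Live counters, updated on every view
CREATE TABLE thread_views (
    thread_id bigint PRIMARY KEY,
    views     counter
);

-- Rebuilt periodically; clustering order yields "most viewed first"
CREATE TABLE threads_by_views (
    bucket    int,     -- single bucket, or a shard key for large data sets
    views     bigint,  -- snapshot of the counter, not a counter column
    thread_id bigint,
    PRIMARY KEY ((bucket), views, thread_id)
) WITH CLUSTERING ORDER BY (views DESC, thread_id ASC);
```

The price of this design is that the ranking is only as fresh as the last rebuild.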

Are joins on an FK faster than joins without an FK?

Say I have two tables, a and b:
a {
pk as int
fk as int
...
}
b {
pk as int
...
}
I want to join a and b in a query like so:
FROM a
JOIN b on a.fk = b.pk
Which of the following scenarios will be faster?
a.fk is set up to be a foreign key on b.pk - b.pk is indexed
a.fk is set up to be a foreign key on b.pk - b.pk is not indexed
there is no relationship between the tables - b.pk is indexed
there is no relationship between the tables - b.pk is not indexed
Bonus question - how much faster/slower will each of these scenarios be?
If you could back up your answer with a reference then that'd be awesome. Thank you!
Best practice
Foreign Keys are a relational integrity tool, not a performance tool. You should always create indexes on FK columns to reduce lookups. SQL Server does not do this automatically.
As stated in the linked article "Foreign keys boost performance",
this logically gives the following ranking, performance-wise:
a.fk is set up to be a foreign key on b.pk - b.pk is indexed
there is no relationship between the tables - b.pk is indexed
a.fk is set up to be a foreign key on b.pk - b.pk is not indexed
there is no relationship between the tables - b.pk is not indexed
The performance differences would be greatest between the indexed and non-indexed versions; however, whether each scenario is faster or slower also depends on whether the workload is a select or an insert. Indexes and foreign key constraints slow down inserts, but an index speeds up selects and an FK makes the data more reliable. Since most inserts are not noticeably slowed (unless you are doing large bulk inserts), it is usually in your best interests to have both the FK and the index.
I'll ditto Lieven's answer. Just to reply to your bonus question of how much of a performance boost you get from creating an index, the answer is, "That depends".
If one or both tables are small and they are the only two tables in the query, the performance gain might be small to zero. When the number of records is small, it is sometimes faster to just read all the records rather than use the index. The database engine should be smart enough to figure this out; that's what "query optimization" is all about.
Likewise, if you have other tables involved and other selection criteria, the DB engine may decide not to use this index, and that some other way of finding the records is faster.
At the other extreme, if you have two very large tables, creating an index on the field used to join them can cut run time by 99% or more.
That's why it's a good idea to learn to read the explain plans on your DB engine. If a query takes a long time, run the explain plan and see what it's doing. Often, creating a good index can dramatically improve a query.
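A sketch of the best-performing scenario from the list above, using the question's table and column names:

```sql
-- b.pk is the primary key, so it is indexed automatically
CREATE TABLE b (
    pk INT PRIMARY KEY
);

-- The FK constraint enforces integrity; the explicit index on a.fk
-- is what speeds up the join, and it also avoids full scans of the
-- child table when rows in b are updated or deleted
CREATE TABLE a (
    pk INT PRIMARY KEY,
    fk INT NOT NULL REFERENCES b (pk)
);
CREATE INDEX a_fk_ix ON a (fk);
```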

How to choose and optimize oracle indexes? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I would like to know if there are general rules for creating an index or not.
How do I choose which fields I should include in this index or when not to include them?
I know its always depends on the environment and the amount of data, but I was wondering if we could make some globally accepted rules about making indexes in Oracle.
The Oracle documentation has an excellent set of considerations for indexing choices: http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/data_acc.htm#PFGRF004
Update for 19c: https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/designing-and-developing-for-performance.html#GUID-99A7FD1B-CEFD-4E91-9486-2CBBFC2B7A1D
Quoting:
Consider indexing keys that are used frequently in WHERE clauses.
Consider indexing keys that are used frequently to join tables in SQL statements. For more information on optimizing joins, see the section "Using Hash Clusters for Performance".
Choose index keys that have high selectivity. The selectivity of an index is the percentage of rows in a table having the same value for the indexed key. An index's selectivity is optimal if few rows have the same value.
Note: Oracle automatically creates indexes, or uses existing indexes, on the keys and expressions of unique and primary keys that you define with integrity constraints.
Indexing low selectivity columns can be helpful if the data distribution is skewed so that one or two values occur much less often than other values.
Do not use standard B-tree indexes on keys or expressions with few distinct values. Such keys or expressions usually have poor selectivity and therefore do not optimize performance unless the frequently selected key values appear less frequently than the other key values. You can use bitmap indexes effectively in such cases, unless the index is modified frequently, as in a high concurrency OLTP application.
Do not index columns that are modified frequently. UPDATE statements that modify indexed columns and INSERT and DELETE statements that modify indexed tables take longer than if there were no index. Such SQL statements must modify data in indexes as well as data in tables. They also generate additional undo and redo.
Do not index keys that appear only in WHERE clauses with functions or operators. A WHERE clause that uses a function, other than MIN or MAX, or an operator with an indexed key does not make available the access path that uses the index except with function-based indexes.
Consider indexing foreign keys of referential integrity constraints in cases in which a large number of concurrent INSERT, UPDATE, and DELETE statements access the parent and child tables. Such an index allows UPDATEs and DELETEs on the parent table without share locking the child table.
When choosing to index a key, consider whether the performance gain for queries is worth the performance loss for INSERTs, UPDATEs, and DELETEs and the use of the space required to store the index. You might want to experiment by comparing the processing times of the SQL statements with and without indexes. You can measure processing time with the SQL trace facility.
There are some things you should always index:
Primary Keys - these are given an index automatically (unless you specify a suitable existing index for Oracle to use)
Unique Keys - these are given an index automatically (ditto)
Foreign Keys - these are not automatically indexed, but you should add one to avoid performance issues when the constraints are checked
After that, look for other columns that are frequently used to filter queries: a typical example is people's surnames.
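For instance, given a hypothetical people table:

```sql
-- Surname is a classic filtering column: queried often, reasonably selective
CREATE INDEX people_surname_ix ON people (surname);

-- The index can now serve queries such as:
SELECT * FROM people WHERE surname = 'Smith';
```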
From the 10g Oracle Database Application Developers Guide - Fundamentals, Chapter 5:
In general, you should create an index on a column in any of the following situations:
The column is queried frequently.
A referential integrity constraint exists on the column.
A UNIQUE key integrity constraint exists on the column.
Use the following guidelines for determining when to create an index:
Create an index if you frequently want to retrieve less than about 15% of the rows in a large table. This threshold percentage varies greatly, however, according to the relative speed of a table scan and how clustered the row data is about the index key. The faster the table scan, the lower the percentage; the more clustered the row data, the higher the percentage.
Index columns that are used for joins to improve join performance.
Primary and unique keys automatically have indexes, but you might want to create an index on a foreign key; see Chapter 6, "Maintaining Data Integrity in Application Development" for more information.
Small tables do not require indexes; if a query is taking too long, then the table might have grown from small to large.
Some columns are strong candidates for indexing. Columns with one or more of the following characteristics are good candidates for indexing:
Values are unique in the column, or there are few duplicates.
There is a wide range of values (good for regular indexes).
There is a small range of values (good for bitmap indexes).
The column contains many nulls, but queries often select all rows having a value. In this case, a comparison that matches all the non-null values, such as:
WHERE COL_X >= -9.99 *power(10,125)
is preferable to
WHERE COL_X IS NOT NULL
This is because the first uses an index on COL_X (assuming that COL_X is a numeric column).
Columns with the following characteristics are less suitable for indexing:
There are many nulls in the column and you do not search on the non-null values.
Wow, that's just such a huge topic, it's hard to answer in this format. I strongly recommend this book.
Relational Database Index Design and the Optimizers
by Tapio Lahdenmaki
You don't just use indexes to make table access faster, sometimes you make indexes to avoid table access altogether. Something not mentioned yet but vital.
There's a whole science to this if you really want to make your database perform maximally.
Ah, one Oracle-specific optimization is building reverse key indexes. If you have a PK index on a monotonically increasing value, like a sequence, and you have highly concurrent inserts and don't plan to range-scan that column, then make it a reverse key index.
See how specific these optimizations can be?
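A sketch of that optimization (the table is hypothetical):

```sql
-- With a sequence-fed key, ordinary B-tree inserts all hit the
-- rightmost leaf block, causing contention under concurrent load.
-- Reversing the key bytes spreads inserts across the index, at the
-- cost of making range scans on the key unusable.
CREATE UNIQUE INDEX orders_pk_rix ON orders (order_id) REVERSE;

ALTER TABLE orders
  ADD CONSTRAINT orders_pk PRIMARY KEY (order_id)
  USING INDEX orders_pk_rix;
```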
Look into Database Normalization - you'll find a lot of good, industry standard rules about what keys should exist, how databases should be related, and hints on indexes.
-Adam
Usually one puts the ID columns up front, and those usually identify the rows uniquely; a combination of columns can also do the same thing. As an example using cars: tags (license plates) are unique and qualify for an index, and the tags column can qualify for the primary key. The owner's name can qualify for an index if you are going to search on name. The make of the car really shouldn't get an index at first, as it doesn't vary much, and indexes don't help when the data in the column doesn't vary much.
Take a look at the SQL: what are the WHERE clauses looking at? Those columns may need an index.
Measure. What is the issue: pages or queries taking too long? What's being used by those queries? Create an index on those columns.
Caveats: indexes cost time on updates and take up space.
And sometimes full table scans are quicker than using an index: small tables can be scanned faster than going through the index and then hitting the table. Look at your joins.
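The "measure" step above might look like this in Oracle (the query and table are hypothetical):

```sql
EXPLAIN PLAN FOR
SELECT * FROM orders WHERE customer_id = 42;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- A "TABLE ACCESS FULL" line against a large table suggests a missing
-- index; an "INDEX RANGE SCAN" shows an index being used.
```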