My colleague mentioned that our client's DBA proposed removing all foreign key constraints from our project's Oracle DB schema. Initially I did not agree with the decision. I am a developer, not a DBA, so I later realized that there could be reasons behind it. I am trying to get the pros and cons of this decision.
Project info:
Spring application with Hibernate persistence.
Oracle 10g DB
There are batch jobs that use only SQL*Loader or plain JDBC.
Here is my list of pros and cons (Please correct me if I am wrong)
Pros:
Since application persistence is managed by Hibernate, foreign key cascading in the database is not necessary; Hibernate handles it with the appropriate cascade option.
A Hibernate DELETE (including the delete cascade option) removes the child records that hold the foreign key before removing the parent record they reference, which avoids referential integrity violations. This behavior is the same with no foreign key, with a foreign key, and with a foreign key that cascades. But adding a foreign key will unnecessarily slow down Oracle delete operations.
Cons:
Hibernate provides a mechanism for managing associations between objects and cascading operations within an association. But it does not provide the complete referential integrity solution that the DB does.
Referential integrity is required for the batch jobs that use only SQL*Loader or plain JDBC.
Guys, I need your advice on this. If any of you is a DBA, please provide the DBA-side reasons.
Thank you.
I have never heard such a proposal from a DBA before! From an application developer, yes, but never from a Database Administrator. It beggars belief.
Tom Kyte has said many times (for example here): applications come and go, but data is forever.
In my own experience, I have worked on Oracle databases that are 20+ years old. They started out in Oracle 6 and got migrated up to 10g or 11g over the years - the same data. But the applications that sat on top? First they were Forms 3.0, then in some cases they got migrated to C++, in some got rebuilt in Forms 6i, in some rebuilt in Application Express. ADF is another possibility of course; or perhaps a SOA architecture...
What's so special about the current application development tool that it suddenly takes over Oracle's job as the DBMS?
I've worked on databases in projects that decided to drop referential integrity constraints.
We had to write a "QC script" to detect orphaned rows with respect to every table relationship (orphaned rows would have been prevented by a foreign key constraint).
Then when (not if) they occurred, we had to have policies for how to resolve the orphans. Choices included the following:
Delete orphaned rows.
Archive orphaned rows.
Update any orphaned foreign key values to NULL.
Update any orphaned foreign key values to some existing value in the parent table.
Live with the anomalies. Write more code to exclude orphans from reports. Maybe a set of VIEWs over all the tables?
You might want to schedule a recurring weekly meeting with the stakeholders of this database to review the QC script report, and decide what to do with each of the orphaned rows.
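A QC script of the kind described above might boil down to one anti-join per relationship. The sketch below assumes a hypothetical parent table orders and child table order_lines related through order_id; those names are illustrative only and would need to be repeated for every relationship in the schema.

```sql
-- Hypothetical tables: orders (parent), order_lines (child).
-- Report child rows whose order_id has no matching parent row.
SELECT lin.order_id,
       COUNT(*) AS orphan_rows
FROM   order_lines lin
WHERE  lin.order_id IS NOT NULL
AND    NOT EXISTS (SELECT 1
                   FROM   orders ord
                   WHERE  ord.order_id = lin.order_id)
GROUP  BY lin.order_id
ORDER  BY orphan_rows DESC;

-- One of the resolution policies listed above: set orphaned keys to NULL.
-- This assumes order_lines.order_id is nullable.
UPDATE order_lines lin
SET    lin.order_id = NULL
WHERE  lin.order_id IS NOT NULL
AND    NOT EXISTS (SELECT 1
                   FROM   orders ord
                   WHERE  ord.order_id = lin.order_id);
```

Note that the report is only a snapshot: without the constraint, new orphans can appear between the check and the cleanup.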
No framework can enforce referential integrity as reliably as constraints that run in the database. Only the database can provide truly atomic changes and ensure consistency.
Since database constraints are guaranteed they can, in some circumstances, allow additional optimizations.
For example, say you have a view
CREATE VIEW orders_vw AS
SELECT ord.order_id, ord.customer_id, lin.product_id
FROM orders ord JOIN order_lines lin on ord.order_id = lin.order_id
Then you have a query that does a SELECT product_id FROM orders_vw WHERE order_id = :val
With the integrity enforced, the database knows that any order_id in order_lines has exactly one row in the parent table and, since no values from the orders table are actually selected, it can save work by not visiting the orders table.
Without the constraint, the database can't be sure that an entry in order_lines has a parent, so it has to do the extra work of visiting the orders table to check it.
Depending on your query patterns, you may find removing constraints actually increases the workload on the DB.
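To see whether the optimizer really skips the parent table, one could declare the constraint and look at the plan. This is only a sketch: the constraint name is made up, orders.order_id is assumed to be the primary key already, and whether the join is eliminated depends on the Oracle version and optimizer settings, so it is worth checking the plan rather than assuming.

```sql
-- Assumes orders.order_id is already the primary key of orders.
ALTER TABLE order_lines
  ADD CONSTRAINT order_lines_order_fk   -- hypothetical constraint name
  FOREIGN KEY (order_id) REFERENCES orders (order_id);

EXPLAIN PLAN FOR
  SELECT product_id FROM orders_vw WHERE order_id = :val;

-- If the join was eliminated, neither ORDERS nor its PK index
-- should appear in the plan output.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Running the same EXPLAIN PLAN before adding the constraint gives the baseline to compare against.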
Foreign key removal is often where database performance optimization starts. It's a trade-off: you give up guaranteed integrity at the DBMS level and have to manage it yourself (which is fairly easy with Hibernate but requires great care in plain SQL), and in return you avoid the foreign key checks on inserts, updates and deletes, which can be quite expensive.
I have inherited a database with tables that lack primary keys. It's an OLTP database. One of the tables in question has ~300k records and has no primary key implemented, even though examining the rest of the schema tells me one column is used as a primary key, i.e. it is replicated in another table with an identical name, etc. In other words, this is not an 'end of line' table.
This database also does not implement FKs.
My question is - is there ANY valid reason for a table (in Oracle for that matter) NOT to have a primary key?
I think a PK is mandatory in almost all cases. There are lots of reasons, but I'll cover a few of them:
It prevents duplicate rows from being inserted.
Rows will be referenced from elsewhere, so the table needs a key for that.
I have seen very few cases of tables created without a PK (e.g. a table for logs).
Not specific to Oracle, but I recall reading about one such use case where MySQL was highly customized for a dam (electricity generation) project, I think. The input data from sensors arrived on the order of 100-1000 records per second or so. They were using timestamps for each record, so they didn't need a primary key (as with the logging tables mentioned in another answer here).
So good reasons would be:
Overhead, in the case of high frequency transactions
Necessity or Un-necessity in that case
"Uniqueness" maintained or inferred by application, not by db
In a normalized table, if every record needs to be unique and every field is referenced in other tables, then a separate PK adds index overhead, particularly if the PK would never actually be used in any SQL query (IMHO I disagree with this, but it's possible). The table should still have a unique index encompassing all the fields.
Bad reasons are infinite :-)
The most frequent bad reason which is actually responsible for the lack of a primary key is when DBs are designed by application/code-developers with little or no DB experience, who want to (or think they should) handle all data constraints in the application.
Any valid reason? I'd say "No"--I'm a database guy--but there are places that insist on using the database as a dumb data store. They usually implement all integrity "constraints" in application code.
Putting integrity constraints into application code isn't usually done to improve performance. In fact, if you built one database that enforces all the known constraints, and you built another with functionally identical constraints only in application code, the first one would almost certainly run rings around the second one.
Instead, application-level constraints usually hope to increase flexibility. (And, in the process, some of the known constraints are usually dropped, which appears to improve performance.) If it becomes inconvenient to enforce certain constraints in order to bulk load some scruffy data, an application programmer can just side-step the application-level constraints for a little while, then clean up the data when it's more convenient.
I'm not a DB expert, but I remember a conversation with a friend who worked in the Oracle apps dept. who told me that this was done to handle emergencies. If there was a problem in some report being generated which you could fix by putting in a row, DB-level constraints often stand in your way. They generally implemented things like unique primary keys in the application rather than the database. It was inefficient but good enough for them, and much more manageable in a disaster recovery scenario.
You need a primary key to enforce uniqueness for a subset of its columns (useful if you need to refer to individual rows). It also speeds up certain queries because of the index associated to it.
If you do not need that index, or that uniqueness constraint, then you may not need a primary key (the index does not come free).
An example that comes to mind is logging tables, which just record some data (that is never updated or queried for individual records).
There is a small overhead when inserting into a table with an index, and you need an index if you have a primary key. The downside of going without one, of course, is that finding a row is very costly.
We have an application that generates some temporary tables and then processes the data. I don't really have control over the way the application creates these tables or over the subsequent queries involved. What we have noticed is that Oracle uses a full table scan instead of using the index behind the tables' primary key. If it used the primary key index, the process would run a whole lot faster.
Since I do not have control over the select queries generated by the application I cannot use hints and force Oracle to use primary key index. Is there any other setting I could change somewhere that could force Oracle to use primary key index for the temporary tables?
The two most common reasons for a query not using indexes are:
It's quicker to do a full table scan.
Poor statistics.
If your queries are selecting all of the table or doing joins without mentioning a primary key in the where clause etc., chances are it's quicker to do a full scan. Without the query and indexes, and preferably an explain plan as well it's impossible to tell for certain.
I would, however, recommend that you ask your DBA to re-gather - I hope, if not gather for the first time - statistics on the table. Use dbms_stats.gather_table_stats, with an estimate percentage of 25%+.
If the tables are re-created each time the application is run, then try to gather statistics after creation and primary key generation. If they are truncated and re-filled each time, then ask your DBA to rebuild them and the PK and then gather statistics, as this could significantly improve query performance.
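As a rough illustration of the statistics step, the call might look like the following. TEMP_WORK is a placeholder for one of the application's tables, and the block assumes it runs as the schema that owns the table.

```sql
-- TEMP_WORK is a placeholder table name, not one from the application.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,          -- schema that owns the table
    tabname          => 'TEMP_WORK',
    estimate_percent => 25,            -- sample size suggested above
    cascade          => TRUE);         -- also gather stats on the PK index
END;
/

-- Confirm that statistics were recorded.
SELECT table_name, num_rows, last_analyzed
FROM   user_tables
WHERE  table_name = 'TEMP_WORK';
```

For tables that are rebuilt on every run, this would have to be executed after the load and before the application's queries start.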
With no control over anything I don't see how you can improve the query time any other way.
You can use hints without changing SQL by leveraging SQL Profiles. Wrap your hint(s) into a SQL Profile that takes effect for that particular SQL ID.
I understand you don't have control over the SQL; I have many apps where I encounter the same restriction. Once you have checked the query structure and statistics as in Ben's post, and have proved that hinting to use the index will improve performance, why not try a manually created SQL Profile?
Christian Antognini has a great paper here about SQL Profiles and creating them manually. The paper mentions creating SQL Profiles manually is undocumented. I would agree undocumented, but that doesn't necessarily mean unsupported. I would say there is little documentation out there, but if you want proof that Oracle allows manual creation, check the API or look at the coe_xfr_sql_profile.sql file in the SQLT utility directory.
I also posted a cheatsheet on how to quickly manually create a SQL Profile here.
Not sure if the subject entirely conveys what I'm trying to achieve, but let me explain:
We are building an application that uses Oracle as its storage backend. Each year, last year's dataset will be "archived", and a new instance created and populated from scratch.
What are the options to do this within the same schema?
Keep version information on a record level (we presume this will be too slow for our use-case).
Keep version information on a table level, so for each new version, we will re-create all the tables but with a new version prefix. (We like this solution, since we can do it all in code).
?
Is there not something like partitions/personalities/namespaces available that will allow us to achieve this in Oracle?
My Oracle experience is rather limited; any assistance will be greatly appreciated!
The RDBMS conceptual model is not very good at maintaining temporal versions of data. So it is not just Oracle which is lacking in this regard.
I am unclear why you think keeping version information at the record level will be too slow. Too slow in creating a new version? Or too slow where it comes to data retrieval during regular operations?
Here is how you could do it. Given a table CUSTOMERS with a business key of CUSTOMER_REF I might normally build it like this (I am using abbreviated syntax rather than best practice for reasons of space):
create table customers
( id number not null primary key
, customer_ref number not null unique
, name varchar2(30) not null )
/
The versioned equivalent would look like this:
create table customers
( id number not null primary key
, customer_ref number not null
, version_number number
, name varchar2(30) not null
, constraint whatever unique (customer_ref, version_number) )
/
This works by keeping VERSION_NUMBER null for the current version, and only populating it at archival time. Any lookup is going to have to include "and version_number is null". This will be a bit of a pain, and you may need to include the column in any additional indexes you build.
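Against the versioned CUSTOMERS table above, the day-to-day lookup and the archival step could be sketched as follows. The bind variable name and the version value 2010 are illustrative assumptions.

```sql
-- Current row for a business key: version_number is NULL by convention.
SELECT id, name
FROM   customers
WHERE  customer_ref = :ref
AND    version_number IS NULL;

-- Archival step, sketched: stamp the current rows with the closing version.
-- 2010 is an illustrative value.
UPDATE customers
SET    version_number = 2010
WHERE  version_number IS NULL;
```

After the update, the new year's rows would be inserted with fresh IDs and a null VERSION_NUMBER.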
Obviously maintaining all versions of the records in the same table will increase the size of your tables, which might have an effect on performance. Oracle's Partitioning option can definitely help here. It also would give you a neat way of creating next year's set of data. However, it is a chargeable extra on top of the Enterprise License, so it is an expensive option. Find out more..
The most time consuming aspect of this will be managing foreign key relationships in the new version of the table. Presuming you choose to use synthetic primary keys, the archival process will have to generate new IDs and then painstakingly cascade them to their dependent records in the new versions of referencing foreign keys.
Thinking about this makes discrete tables for each version seem very attractive. For ease of use I would keep the current version un-prefixed, so that archiving becomes a process simply of
create table customers_n as select * from customers;
You might want to avoid downtime while creating the versioned tables. In that case you could use materialized views to capture the tables' state during the run-up to the archival switchover. When the clock strikes twelve you can switch off the refresh. (caveat: this is thinking on the fly, I have never done anything like this so try before you buy.)
One pertinent advantage of multiple tables (and Partitioning) is that you can move the archived records to a READ ONLY tablespace. This not only preserves them from unwanted change, it also means you can exclude them from subsequent backups.
edit
I notice you have commented that the archived data can occasionally be amended. In that case moving it to READ ONLY tablespaces is not a go-er.
The only thing I will add to what APC said is regarding your asking for "namespaces".
A namespace in Oracle is a schema, whereby you can have the same object name(s) in each schema.
Of course this all depends on how your app must access multiple versions, but I would lean towards a different schema for each year before I would use some sort of naming convention to maintain versions of tables in the same schema. The reason is, eventually you will have nightmares. At least with different schemas, all DDL can be the same, all references to objects will be the same, and tools like ER modellers and query tools will work within the context of that schema. Data models change, so at some point you may need to run some compare tools, and if all your tables are named funky with some sort of version postfix, that won't work well.
Also, a schema can be copied / moved quickly with export or Data Pump using the fromuser/touser or remap_schema options, so you won't need much code, except to do any cleanup of last year's data out of the new version.
I find schemas are very useful as "containers" and most apps I host only have schema level privileges, so I'm guaranteed the app can be easily and quickly moved from instance to instance, or multiple copies of the app can be hosted side-by-side on the same instance.
Might the schema change between years? For example, in 2010 you have fifteen columns but in 2011 you add a sixteenth.
If so, will the same application work on both 2010 and 2011 data?
If the schema is static, I'd go for table with a 'YEAR' column and use VPD/RLS/FGAC to apply a YEAR = '2010' predicate.
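A minimal sketch of the VPD approach, using the DBMS_RLS package, might look like this. The function name, policy name and the data_year column (standing in for the 'YEAR' column mentioned above) are assumptions, and a real policy function would normally derive the year from the session rather than hard-code it.

```sql
-- Policy function: returns the predicate that is appended to queries
-- on the protected table. The hard-coded year is for illustration only.
CREATE OR REPLACE FUNCTION year_predicate (
  p_schema IN VARCHAR2,
  p_object IN VARCHAR2
) RETURN VARCHAR2
IS
BEGIN
  RETURN 'data_year = ''2010''';
END;
/

-- Attach the function to the (hypothetical) CUSTOMERS table for SELECTs.
BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => USER,
    object_name     => 'CUSTOMERS',
    policy_name     => 'CUSTOMERS_YEAR_POLICY',
    function_schema => USER,
    policy_function => 'YEAR_PREDICATE',
    statement_types => 'SELECT');
END;
/
```

With the policy in place, a plain SELECT on the table only sees rows for the year the function returns, without any change to the application's SQL.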
I'd only worry about partitioning if performance was a problem.
1) Interval partition it by year and some date field in the row.
2) Add it at the end of each table and populate it with a sequence and trigger.
3) Then partition by interval year on this col.
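Interval partitioning by year could be sketched as below. It assumes Oracle 11g or later with the Partitioning option, and the table and column names (customers_by_year, created_on) are made up for the example.

```sql
-- Hypothetical table; created_on is the date column driving the partitions.
CREATE TABLE customers_by_year (
  id           NUMBER       NOT NULL,
  customer_ref NUMBER       NOT NULL,
  name         VARCHAR2(30) NOT NULL,
  created_on   DATE         NOT NULL
)
PARTITION BY RANGE (created_on)
INTERVAL (NUMTOYMINTERVAL(1, 'YEAR'))
(
  -- One starting partition is required; later yearly partitions
  -- are created automatically as rows arrive.
  PARTITION p_before_2010 VALUES LESS THAN (DATE '2010-01-01')
);
```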
We have an Oracle database here that's been around for about 10 years. It's passed through a lot of hands. In the course of those years, it's grown quite large, and there are some interesting anomalies in its design that have me perplexed.
Now, I'm historically a SQL Server developer. I used to steam and fume about the differences between The Microsoft Way(tm) and The Oracle Way(R). Now, I realize, they're just different. I also used to yank my hair out and slam my head against the desk thinking that the people who came before me were blind, deaf mutes jacked up on Jolt and Red Bull, who wrote code in Tourette's.NET.
(Yes, I'm going somewhere.)
As time passed, I realized that neither database platform was inherently better than the other. They're just different. Further, I also realized that the developers who came before me often had compelling reasons for designing and writing things the way they did. Just because I wasn't privy to it didn't make it untrue. Sure, the documentation could have been better, but still.
So here's where all this leads me:
We have a few tables in the database that have two separate owners. Both owners define identical primary key constraints on the table. This has me perplexed. Why would a table have multiple owners? And why would each owner define separate yet identical primary keys?
These guys designed a pretty well-laid-out database with lots of primary keys. But they didn't make a lot of use of indexes. When they did use indexes, they tended to make one large index instead of many distinct indexes. Is there some compelling performance gain to be had from that?
We also avoided foreign key constraints like the plague. Not sure why we would have done that. Is there a reason to avoid them in Oracle? I can see a lot of reasons to use them to enforce data integrity between tables, and we're just not using them. I'm assuming that there's a compelling reason, and I'm just not privy to it.
Finally, is there a compelling reason to avoid the use of triggers (aside from the obvious pitfall that lies in performance hits)? We don't seem to be using those much either.
For the record, we're still using Oracle 9i.
Again, thanks for your patience, everyone. I'm an old Microsoft hand, so bending my brain around the Oracle Way is challenging at times. It's a big beast, with tons to learn, and sometimes, finding that information on the Web is a chore.
Thank His Noodliness for StackOverflow.
Salient Post-Post Points
Historically, we haven't used sequences, except in very rare cases.
Historically, we haven't used stored procedures or functions, except in very rare cases.
There are some references in very old documents to ERWIN. (Thanks to the poster below for bringing it to my memory.) Chances are, the bulk of the design was the product of an ORM, and the natural design flowed from that.
The vast majority of the SQL appears hard-coded in the application, and there's a lot of it.
I'm doing everything in my power to move us away from hard-coded SQL, and to get the SQL into the database where it belongs. But I'm trying to do that in a way that makes sense, is practical, and doesn't break the business in the process. (Read: On new software only.)
We have a few tables in the database that have two separate owners. Both owners define identical primary key constraints on the table. This has me perplexed. Why would a table have multiple owners? And why would each owner define separate yet identical primary keys?
You cannot define two PRIMARY KEY's on one table in Oracle. You can define one PRIMARY KEY and one UNIQUE key on the same column set. I can see no point in such a design.
These guys designed a pretty well-laid-out database with lots of primary keys. But they didn't make a lot of use of indexes. When they did use indexes, they tended to make one large index instead of many distinct indexes. Is there some compelling performance gain to be had from that?
In Oracle, an index cannot be used for RANGE SCANS on something that doesn't constitute a leftmost prefix of this index.
A composite index on (col1, col2, col3) cannot be used to do a plain RANGE SCAN on col2 alone or col3 alone.
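To make the leftmost-prefix rule concrete, here is a small sketch on a hypothetical table t with the three columns named above. The index names and bind variables are illustrative.

```sql
-- Hypothetical table t (col1, col2, col3) with one composite index.
CREATE INDEX t_c1_c2_c3_ix ON t (col1, col2, col3);

-- Leading column present: eligible for an index range scan.
SELECT * FROM t WHERE col1 = :a;
SELECT * FROM t WHERE col1 = :a AND col2 BETWEEN :lo AND :hi;

-- Leading column absent: no plain range scan on this index, so the
-- optimizer has to pick another access path (for example a full scan).
SELECT * FROM t WHERE col2 = :b;

-- If queries on col2 alone matter, a second index is one option.
CREATE INDEX t_c2_ix ON t (col2);
```

So one large index serves queries that filter on its leading columns, but it does not replace separate indexes for queries that filter only on the trailing ones.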
We also avoided foreign key constraints like the plague. Not sure why we would have done that. Is there a reason to avoid them in Oracle? I can see a lot of reasons to use them to enforce data integrity between tables, and we're just not using them. I'm assuming that there's a compelling reason, and I'm just not privy to it.
If all interaction with the database goes through a set of well-defined procedures, a MERGE statement can yield far better performance than a FOREIGN KEY with ON DELETE CASCADE. You should, though, be very careful and get used to this programming paradigm.
Finally, is there a compelling reason to avoid the use of triggers (aside from the obvious pitfall that lies in performance hits)? We don't seem to be using those much either.
I personally don't use triggers at all. Not every business rule can be expressed in terms of cascading inserts or updates, and any two-pass DML operation will lead to mutating tables. If all interaction with the database is done via stored procedures (or packages), triggers become useless.
Using triggers means in fact using SQL statements inside CURSOR loops, which every SQL cheechako knows to be a bad thing.
You don't want to be seen using cursors instead of set-based operations, do you?
FOREIGN KEY's are not as bad as triggers (as long as you don't define CASCADE operations on them), since they just don't let you do wrong things at the expense of some performance loss.
But when your database grows large, you will notice that the rules for integrity checking are far more complex than just verifying that the values being inserted into one table exist in another one.
You will have to check newly inserted values against aggregates, complex joins, etc., and all these checks will imply having a corresponding value in another table. Failing these checks compromises your database integrity just as much as violating the FOREIGN KEYs.
So it will turn out that these FOREIGN KEY's are double and triple checked anyway, and there is no point to keep data integrity rules scattered all around the database rather than having them in one place (a stored procedure that is always used for updating the data).
How can the same table belong to two schemas? It doesn't make any sense.
That said, nothing in the practices you have asked about is inherently bad.
I develop a large .net application with Oracle database and we have an excellent Oracle DBA in our team. We have used Foreign key constraints wherever possible for data integrity. Triggers are used only to get a new value from sequence or for auditing purpose and not for any business logic. We have used multicolumn unique indexes for data integrity and single column non-unique indexes.
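The "trigger only to get a new value from a sequence" pattern mentioned above typically looks something like this. The customers table, its id column, and the object names are hypothetical.

```sql
-- Hypothetical sequence feeding the surrogate key of customers.
CREATE SEQUENCE customers_seq;

CREATE OR REPLACE TRIGGER customers_bi
  BEFORE INSERT ON customers
  FOR EACH ROW
BEGIN
  -- Only fill in the key when the caller did not supply one.
  IF :NEW.id IS NULL THEN
    SELECT customers_seq.NEXTVAL INTO :NEW.id FROM dual;
  END IF;
END;
/
```

The trigger holds no business logic; it only assigns the key, which keeps it easy to reason about.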
"In Oracle, an index cannot be used for RANGE SCANS on something that doesn't constitute a leftmost prefix."
I believe this is not true anymore since Oracle 10g.
"When they did use indexes, they tended to make one large index instead of many distinct indexes. Is there some compelling performance gain to be had from that?"
You create indexes to speed up queries. If you query on "surname = 'Smith' and given_name = 'john'", then it is better to have a single index on (surname, given_name) than two separate indexes.
If no-one is complaining about performance, you probably don't need to worry about indexes.
Lots of primary keys.
We also avoided foreign key constraints.
Avoid the use of triggers.
Sounds like they used an ORM to fetch objects out of the database. That means fewer ultra-complex joins and SELECT statements and more simple SELECTS. It means constraints in the code, not the database. Similarly, "trigger"-like behavior is in the code.
Doesn't sound Oracle-specific. Sounds like the application has an ORM.
A lot of people, including me, don't like triggers because they make it a lot harder to troubleshoot.
This pretty much sums up my opinion
I did Oracle database design for a large organization, and we used triggers as much as we could because we had business rules that had to be enforced when data was coming from several directions (the application's GUI, and SQL scripts used for data migration). The business rules we enforced were pretty simple (date checking, checking for existence of rows in another table, etc...). If we tried to make them too complex, we got the dreaded "mutating table" error, which basically means you're trying to inspect the table that is currently changing. So triggers can be useful in some situations, but can cause headaches.
As far as indexes go, in my opinion it is -very- important to have indexes on the columns that are used for joining tables together. That's an easy way to increase performance.
About the foreign keys: since the database changed hands so much, I wonder if the foreign keys could have been dropped accidentally, somewhere along the line. I used PL/SQL Developer and some seemingly innocent operations (like adding/removing a column I think, but I'm not sure) caused the foreign keys to all be deleted.
They may have avoided using foreign key constraints for performance. I'm told they can be very slow. They also make it difficult to bulk load data which may be inaccurate when loaded but will be corrected programmatically.
"We have a few tables in the database that have two separate owners. Both owners define identical primary key constraints on the table. This has me perplexed. Why would a table have multiple owners? And why would each owner define separate yet identical primary keys?"
A SQL Server database corresponds more to an Oracle user/schema. So you can have multiple tables in the same Oracle database belonging to different schemas/users. These are DIFFERENT tables (ie with different data inside, and potentially different columns/indexes...).
Sometimes bits of a business want a snapshot of the data (e.g. at month or year end). Sometimes, before a datafix, a DBA will create a copy of a table (possibly with a different name or in a different schema) just in case the datafix goes horribly wrong.
Either way, where you have copies of a table, one is probably out of date (intentionally).
Assuming that you are not in a data warehousing situation here -
Foreign keys ensure referential integrity and are absolutely vital. I can't think of a situation when you would not want them.
Indexes again are very important tools to ensure query performance.
Not sure why they would define PKs without Indexes - PKs are usually implemented via a unique index.
By large indexes, I assume you mean composite indexes that span multiple columns.
Using ERWIN-engineered Oracle database need not result in such a design - so what you have is not an ERWIN artifact.
If I had to hazard a guess - I am thinking the designer was overly, un-necessarily trying to design for performance - he avoided indexes for update performance, he also avoided FK constraints for a similar 'imagined' performance.
Unless the database is being used for a unique kind of application in a very special way, there really are no grounds for omitting FKs and indexes.
Regarding triggers, other posters have already weighed in - triggers will be useful for capturing business rules in one central-place (same for Stored Procedures - good for encapsulating Business Logic).
I'm working with a database and I want to start using LINQ To SQL with it. The database doesn't have any FKs inside of it right now for performance reasons. We are inserting millions of rows at a time to the DB which is why there aren't any FKs.
So I'm thinking I'm going to add nonenforced FKs to the database to describe the relationships between the tables for my LINQ To SQL but I don't want there to be a performance hit by adding nonenforced foreign keys.
Does anyone know what the effect of this might be?
Update: I'm using LINQ-To-SQL for the non-performance-intensive stuff. 80% of the data access is through stored procs on production. But for writing unit tests and other non-performance-critical tasks, LINQ-To-SQL makes data access really easy.
Update: Here is how you add a nonenforced FK
ALTER TABLE [dbo].[ACI] WITH NOCHECK ADD CONSTRAINT [FK_ACI_CustomerInformation] FOREIGN KEY([ACIOI])
REFERENCES [dbo].[CustomerInformation] ([ACI_OI])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[ACI] NOCHECK CONSTRAINT [FK_ACI_CustomerInformation]
GO
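For anyone following along, the state of such a constraint can be inspected, and the constraint later enforced, roughly as follows. This is a sketch in T-SQL that reuses the table, column and constraint names from the statements above; nothing else about the schema is assumed.

```sql
-- List foreign keys on dbo.ACI and whether they are disabled or untrusted.
SELECT name, is_disabled, is_not_trusted
FROM   sys.foreign_keys
WHERE  parent_object_id = OBJECT_ID(N'dbo.ACI');

-- Find rows that would violate the constraint before enabling it.
SELECT a.ACIOI
FROM   dbo.ACI AS a
WHERE  a.ACIOI IS NOT NULL
AND    NOT EXISTS (SELECT 1
                   FROM   dbo.CustomerInformation AS c
                   WHERE  c.ACI_OI = a.ACIOI);

-- Once the data is clean, re-enable the constraint and validate existing rows.
-- Note: a constraint declared NOT FOR REPLICATION may still be reported
-- as not trusted afterwards.
ALTER TABLE [dbo].[ACI] WITH CHECK CHECK CONSTRAINT [FK_ACI_CustomerInformation];
```

While the constraint is disabled it documents the relationship but is not checked on insert or update.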
The answer can be different for different environments (data/logs on same drive, tempdb on same drive, lots of cache vs little, etc) so the best way to find this out is to benchmark. Create two identical databases, one with fk's and one without. Do your normal million-row-load into each database, and measure your transactions per second. That way you'll know for sure in your own environment.
Foreign keys will create non-clustered indexes in your table, which will improve performance of joins on foreign keys.
Extra indexes will decrease the performance of your insert/update/delete/merge statements and will increase table sizes.
http://msdn.microsoft.com/en-us/library/ms191195.aspx
Even when created with NOT FOR REPLICATION the indexes are still present and SQL Server will need to maintain them.
In your case I would either:
- use foreign keys and take performance hit
or
- not use foreign keys in production (goodbye data integrity) and run my tests against a copy of production database for which I would create foreign keys.
It may have some impact, especially at those volumes.
However, I would test this on a similar system first, so you can measure the impact, if any.
To be honest though, I would probably use hand written stored procedures for this, so you can optimize them as required, instead of using LINQ to SQL.
I realize this is an old question, but I want to comment on how bad a practice it is to create a FK that is not enforced on existing data. If in fact there is a need for a foreign key, you need to fix any bad data before adding the foreign key (which should have been added at design time) not try to ignore it. All you are doing is masking your very serious data integrity problem by refusing to notice it and do something about it. There is the occasional need to do this due to changed requirements, but it should not be considered as a first choice of techniques when adding a foreign key to a table that has data. Finding and fixing the bad data should be.
Data that has no relationship to the PK is useless. If I had an order table with a customer id that no longer existed in the customer table, how would I know who ordered the product? Of course this is why the FKs should have been enforced from the beginning, whether you did million-row inserts or not. I do multi-million-row inserts through SSIS on a daily basis to many, many tables that have foreign keys; to use this as a reason for not setting them up in the first place indicates a lack of understanding of database design. Sacrificing your data integrity to speed is ALWAYS a poor idea. Without data integrity, your database is unreliable and therefore useless.