I have a table with 300+ columns and hundreds of thousands of records. I need to rename one of the existing columns.
Is there anything that I need to be worried about? Will this operation have any effect on the explain plans, etc.?
Notes:
I am working on a live production database on Oracle 11g.
This column is not being used currently. It's not populated for any of the rows and I am 100% sure none of the existing queries refer to this column.
If "working on a live production database" means that you are going to try to do this without testing in lower environments while people are working, I would strongly caution against that plan.
Existing query plans that involve the table you're doing DDL on will be invalidated, so those queries will need to be hard parsed again. That can easily be an expensive operation if there are large numbers of such queries. It is certainly possible that some query plans will change in the process because something else has changed in the meantime (e.g., statistics, optimizer settings, or bind variables). They won't change because of the column name change itself, but the reparse the rename forces may surface a changed plan.
Any queries that you're executing will, obviously, need to use the new name as soon as you rename the column. That generally means that you need to do a coordinated release where you modify the code (including stored procedures) as well as the column name. That, in turn, generally implies that you're doing this as part of a build that includes at least a bit of downtime. You probably could, if you have the enterprise edition, do edition-based redefinition without downtime but that adds complexity to the process and is something that you would absolutely need to test thoroughly before implementing it in prod.
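For reference, the rename itself is a single DDL statement. A minimal sketch, with hypothetical table/column names standing in for yours:

    ALTER TABLE big_table RENAME COLUMN old_col TO new_col;

    -- Dependent objects (views, PL/SQL) that referenced the old name are
    -- marked INVALID; you can list them afterwards with something like:
    SELECT object_name, object_type, status
    FROM   user_objects
    WHERE  status = 'INVALID';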
If I have a VIEW with a bunch of INNER JOINs but I query against that VIEW SELECTing only columns that come from the main table, will SQL Server ignore the unnecessary joins in the VIEW while executing or do those joins still need to happen for some reason?
If it makes a difference, this is on SQL Server 2008 R2. I know in either case that this is already not a great solution, but I'm attempting to find the lesser of two evils.
It might ignore the joins if they don't actually change the semantics. One example of this might be if you have a trusted foreign key constraint between the tables and you are only selecting columns from the referencing table (See example 9 in this article).
You would need to check the execution plan to be sure for your specific case.
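If you want to see join elimination in action, here is a minimal hypothetical sketch (dbo.customers, dbo.orders and their columns are invented for illustration); with a trusted FK on a NOT NULL column and only referencing-table columns selected, the plan should not touch customers at all:

    CREATE TABLE dbo.customers (
        customer_id INT NOT NULL PRIMARY KEY,
        name        NVARCHAR(100) NOT NULL
    );

    CREATE TABLE dbo.orders (
        order_id    INT NOT NULL PRIMARY KEY,
        customer_id INT NOT NULL
            REFERENCES dbo.customers (customer_id),  -- FK must be trusted
        amount      DECIMAL(10, 2) NOT NULL
    );
    GO

    CREATE VIEW dbo.orders_with_customers AS
    SELECT o.order_id, o.customer_id, o.amount, c.name
    FROM   dbo.orders o
           INNER JOIN dbo.customers c ON c.customer_id = o.customer_id;
    GO

    -- Only columns from the referencing table: the FK guarantees exactly
    -- one matching customer row, so the optimizer can drop the join.
    SELECT order_id, amount
    FROM   dbo.orders_with_customers;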
If you don't pull fields from those tables, it may be faster to use an EXISTS clause; this will also prevent duplicates in the JOINed table from causing duplicate rows in your results.
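A hypothetical sketch of that pattern (the tables and columns are invented for illustration):

    -- Filter on the joined table without pulling its columns, and without
    -- multiplying rows when an order has several matching items:
    SELECT o.order_id, o.amount
    FROM   dbo.orders o
    WHERE  EXISTS (SELECT 1
                   FROM   dbo.order_items i
                   WHERE  i.order_id = o.order_id);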
Even if the optimizer ignores unnecessary joins, you should still create another view to handle your particular case. Use and abuse of views (as in this case) can get out of hand and lead to obfuscation, confusion, and very significant performance issues.
You might even consider refactoring the view that you're planning on using by having it join a set of "smaller" views to deliver the same data set that it does now... if it makes sense to do that of course.
Looking for a bit of advice on how to optimise one of our projects. We have an ASP.NET/C# system that retrieves data from a SQL Server 2008 database and presents it on a DevExpress ASPxGridView. The data that's retrieved can come from one of a number of databases, all of which are slightly different and are being added and removed regularly. The user is presented with a list of live "companies", and the data is retrieved from the corresponding database.
At the moment, data is being retrieved using a standard SqlDataSource and a dynamically-created SQL SELECT statement. There are a few JOINs in the statement, as well as optional WHERE constraints, again dynamically-created depending on the database and the user's permission level.
All of this works great (honest!), apart from performance. When it comes to some databases, there are several hundreds of thousands of rows, and retrieving and paging through the data is quite slow (the databases are already properly indexed). I've therefore been looking at ways of speeding the system up, and it seems to boil down to two choices: XPO or LINQ.
LINQ seems to be the popular choice, but I'm not sure how easy it will be to implement with a system that is so dynamic in nature - would I need to create "definitions" for each database that LINQ could access? I'm also a bit unsure about creating the LINQ queries dynamically too, although looking at a few examples that part at least seems doable.
XPO, on the other hand, seems to allow me to create an XPO Data Source on the fly. However, I can't find much information on how to JOIN to other tables.
Can anyone offer any advice on which method - if any - is the best to try and retro-fit into this project? Or is the dynamic SQL model currently used fundamentally different from LINQ and XPO and best left alone?
Before you go and change the whole way that your app talks to the database, have you had a look at the following:
Run your code through a performance profiler (such as Redgate's performance profiler), the results are often surprising.
If you are constructing the SQL string on the fly, are you following .NET string-handling best practices, such as building the statement with a StringBuilder rather than repeated "str1" + "str2" concatenation in a loop? Remember, multiple small gains add up to big gains.
Have you thought about having a summary table or database that is periodically updated (say, every 15 minutes; you might need to run a service to update this data automatically) so that you are only hitting one database? New connections to databases are quite expensive.
Have you looked at the query plans for the SQL that you are running? Today I moved a dynamically created SQL string to a sproc (only one parameter changed) and shaved 5-10 seconds off the running time (it was being called 100-10,000 times depending on some conditions). There's a sketch of that kind of refactoring below this list.
Just a warning if you do use LINQ: I have seen some developers who decided to use LINQ write more inefficient code because they did not know what they were doing (pulling 36,000 records when they needed to check for one, for example). These things are very easily overlooked.
Just something to get you started on and hopefully there is something there that you haven't thought of.
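And here, roughly, is what I mean by moving a one-parameter dynamic string into a sproc; the names and columns are hypothetical stand-ins, not your actual schema:

    CREATE PROCEDURE dbo.get_company_rows
        @company_id INT
    AS
    BEGIN
        SET NOCOUNT ON;
        -- The former dynamically built SELECT, with the single varying
        -- value turned into a parameter so the plan can be reused:
        SELECT order_id, amount
        FROM   dbo.orders
        WHERE  company_id = @company_id;
    END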
Cheers,
Stu
As far as I understand, you are talking about so-called server mode, when all data manipulations are done on the DB server instead of transferring the records to the web server and processing them there. In this mode the grid works very fast with data sources that contain hundreds of thousands of records. If you want to use this mode, you should create either the corresponding LINQ classes or XPO classes. If you decide to use LINQ-based server mode, the LINQServerModeDataSource provides the Selecting event, which can be used to set a custom IQueryable and KeyExpression. I would suggest that you use LINQ in your application. I hope this information will be helpful to you.
I guess there are two points where performance might be tweaked in this case. I'll assume that you're accessing the database directly rather than through some kind of secondary layer.
First, you don't say how you're displaying the data itself. If you're loading thousands of records into a grid, that will take time no matter how fast everything else is. Obviously the trick here is to show a subset of the data and allow the user to page, etc. If you're not doing this then that might be a good place to start.
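On SQL Server 2008 the usual way to fetch one page at a time is ROW_NUMBER(), since there is no OFFSET/FETCH yet. A minimal sketch with hypothetical table/columns:

    -- Rows 101-150 (page 3 at 50 rows per page), ordered by order_id:
    WITH numbered AS (
        SELECT order_id, amount,
               ROW_NUMBER() OVER (ORDER BY order_id) AS rn
        FROM   dbo.orders
    )
    SELECT order_id, amount
    FROM   numbered
    WHERE  rn BETWEEN 101 AND 150;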
Second, you say that the tables are properly indexed. If this is the case, and assuming that you're not loading 1,000 records into the page at once and are retrieving only subsets at a time, then you should be OK.
But if you're only doing an ExecuteQuery() against a SQL connection to get a dataset back, I don't see how LINQ or anything else will help you. I'd say that the problem is obviously on the DB side.
So to solve the problem with the database, you need to profile the different SELECT statements you're running against it, examine the query plans, and identify the places where things are slowing down. You might want to start with the SQL Server Profiler, but if you have a good DBA, often just looking at the query plan (which you can get from Management Studio) is enough.
We have an Oracle database here that's been around for about 10 years. It's passed through a lot of hands. In the course of those years, it's grown quite large, and there are some interesting anomalies in its design that have me perplexed.
Now, I'm historically a SQL Server developer. I used to steam and fume about the differences between The Microsoft Way(tm) and The Oracle Way(R). Now, I realize, they're just different. I also used to yank my hair out and slam my head against the desk thinking that the people who came before me were blind, deaf mutes jacked up on Jolt and Red Bull, who wrote code in Tourette's.NET.
(Yes, I'm going somewhere.)
As time passed, I realized that neither database platform was inherently better than the other. They're just different. Further, I also realized that the developers who came before me often had compelling reasons for designing and writing things the way they did. Just because I wasn't privy to it didn't make it untrue. Sure, the documentation could have been better, but still.
So here's where all this leads me:
We have a few tables in the database that have two separate owners. Both owners define identical primary key constraints on the table. This has me perplexed. Why would a table have multiple owners? And why would each owner define separate yet identical primary keys?
These guys designed a pretty well-laid-out database with lots of primary keys. But they didn't make a lot of use of indexes. When they did use indexes, they tended to make one large index instead of many distinct indexes. Is there some compelling performance gain to be had from that?
We also avoided foreign key constraints like the plague. Not sure why we would have done that. Is there a reason to avoid them in Oracle? I can see a lot of reasons to use them to enforce data integrity between tables, and we're just not using them. I'm assuming that there's a compelling reason, and I'm just not privy to it.
Finally, is there a compelling reason to avoid the use of triggers (aside from the obvious pitfall of performance hits)? We don't seem to be using those much either.
For the record, we're still using Oracle 9i.
Again, thanks for your patience, everyone. I'm an old Microsoft hand, so bending my brain around the Oracle Way is challenging at times. It's a big beast, with tons to learn, and sometimes, finding that information on the Web is a chore.
Thank His Noodliness for StackOverflow.
Salient Post-Post Points
Historically, we haven't used sequences, except in very rare cases.
Historically, we haven't used stored procedures or functions, except in very rare cases.
There are some references in very old documents to ERwin. (Thanks to the poster below for bringing it to my memory.) Chances are, the bulk of the design was the product of a data-modeling tool, and the natural design flowed from that.
The vast majority of the SQL appears hard-coded in the application, and there's a lot of it.
I'm doing everything in my power to move us away from hard-coded SQL, and to get the SQL into the database where it belongs. But I'm trying to do that in a way that makes sense, is practical, and doesn't break the business in the process. (Read: On new software only.)
We have a few tables in the database that have two separate owners. Both owners define identical primary key constraints on the table. This has me perplexed. Why would a table have multiple owners? And why would each owner define separate yet identical primary keys?
You cannot define two PRIMARY KEYs on one table in Oracle. You can define one PRIMARY KEY and one UNIQUE key on the same column set. I can see no point in such a design.
These guys designed a pretty well-layed out database with lots of primary keys. But they didn't make a lot of use of indexes. When they did use indexes, they tended to make one large index instead of many distinct indexes. Is there some compelling performance gain to be had from that?
In Oracle, an index cannot be used for RANGE SCANS on something that doesn't constitute a leftmost prefix of this index.
A composite index on (col1, col2, col3) cannot be used to do a plain RANGE SCAN on col2 alone or col3 alone.
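A quick hypothetical illustration (table t and its columns are made up):

    CREATE INDEX idx_t_c1_c2_c3 ON t (col1, col2, col3);

    -- Leftmost prefix present: the index can be range-scanned.
    SELECT * FROM t WHERE col1 = :a;
    SELECT * FROM t WHERE col1 = :a AND col2 = :b;

    -- No leading column in the predicate: no plain range scan; at best
    -- the optimizer falls back to a skip scan or a full scan.
    SELECT * FROM t WHERE col2 = :b;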
We also avoided foreign key constraints like the plague. Not sure why we would have done that. Is there a reason to avoid them in Oracle? I can see a lot of reasons to use them to enforce data integrity between tables, and we're just not using them. I'm assuming that there's a compelling reason, and I'm just not privy to it.
If you make all interaction with the database through a set of well-defined procedures, a MERGE statement can yield far better performance than a FOREIGN KEY with ON DELETE CASCADE. You should, though, be very, very careful and get used to this programming paradigm.
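To make that concrete, here is a minimal sketch of an Oracle MERGE (an upsert) with hypothetical table, column and bind names, the kind of statement a well-defined update procedure would contain:

    MERGE INTO child_table c
    USING (SELECT :id AS id, :val AS val FROM dual) src
    ON (c.id = src.id)
    WHEN MATCHED THEN
        UPDATE SET c.val = src.val
    WHEN NOT MATCHED THEN
        INSERT (id, val) VALUES (src.id, src.val);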
Finally, is there a compelling reason to avoid the use of triggers (aside from the obvious pitfall that lies in performance hits)? We don't seem to be using those much either.
I personally don't use triggers at all. Not every business rule can be expressed in terms of cascading inserts or updates, and any two-pass DML operation will lead to mutating tables. If all interaction with the database is done via stored procedures (or packages), triggers become useless.
Using triggers means in fact using SQL statements inside CURSOR loops, which every SQL cheechako knows to be a bad thing.
You don't want to be seen using cursors instead of set-based operations, do you?
FOREIGN KEYs are not as bad as triggers (as long as you don't define CASCADE operations on them), since they simply stop you from doing wrong things, at the expense of some performance loss.
But when your database grows large, you will notice that the rules for integrity checking are far more complex than just verifying that the values being inserted into one table exist in another one.
You will have to check newly inserted values against aggregates, complex joins, etc. All of these checks imply having a corresponding value in another table, and failing them compromises your database integrity just as badly as violating the FOREIGN KEYs.
So it will turn out that these FOREIGN KEYs are double- and triple-checked anyway, and there is no point in keeping data integrity rules scattered all around the database rather than having them in one place (a stored procedure that is always used for updating the data).
How can the same table belong to two schemas? It doesn't make any sense.
That said, there is nothing inherently bad about the practices you have asked about.
I develop a large .NET application with an Oracle database, and we have an excellent Oracle DBA on our team. We have used foreign key constraints wherever possible for data integrity. Triggers are used only to get a new value from a sequence or for auditing purposes, not for any business logic. We have used multi-column unique indexes for data integrity and single-column non-unique indexes.
"In Oracle, an index cannot be used for RANGE SCANS on something that doesn't constitute a leftmost prefix."
I believe this has not been strictly true since index skip scans were introduced (Oracle 9i): the optimizer can sometimes use a composite index even when the predicate doesn't include the leading column.
"When they did use indexes, they tended to make one large index instead of many distinct indexes. Is there some compelling performance gain to be had from that?"
You create indexes to speed up queries. If you query on "surname = 'Smith' and given_name = 'john'", then it is better to have a single index on (surname, given_name) than two separate indexes.
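For instance (hypothetical table people):

    CREATE INDEX idx_people_name ON people (surname, given_name);

    -- Both predicates are answered by the one composite index:
    SELECT *
    FROM   people
    WHERE  surname = 'Smith'
    AND    given_name = 'john';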
If no-one is complaining about performance, you probably don't need to worry about indexes.
Lots of primary keys.
We also avoided foreign key constraints.
Avoid the use of triggers.
Sounds like they used an ORM to fetch objects out of the database. That means fewer ultra-complex joins and SELECT statements and more simple SELECTS. It means constraints in the code, not the database. Similarly, "trigger"-like behavior is in the code.
Doesn't sound Oracle-specific. Sounds like the application has an ORM.
A lot of people, including me, don't like triggers because it makes it a lot harder to troubleshoot.
This pretty much sums up my opinion
I did Oracle database design for a large organization, and we used triggers as much as we could because we had business rules that had to be enforced when data was coming from several directions (the application's GUI, and SQL scripts used for data migration). The business rules we enforced were pretty simple (date checking, checking for the existence of rows in another table, etc.). If we tried to make them too complex, we got the dreaded "mutating table" error, which basically means you're trying to inspect the table that is currently changing. So triggers can be useful in some situations, but they can cause headaches.
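To show what trips that error, here is a hypothetical row-level trigger that queries its own table; any UPDATE against the table should then raise ORA-04091:

    CREATE OR REPLACE TRIGGER orders_check_trg
    BEFORE UPDATE ON orders
    FOR EACH ROW
    DECLARE
        v_count NUMBER;
    BEGIN
        -- Selecting from the table the trigger is firing on: the classic
        -- recipe for ORA-04091 ("table ... is mutating").
        SELECT COUNT(*)
        INTO   v_count
        FROM   orders
        WHERE  customer_id = :NEW.customer_id;

        IF v_count >= 10 THEN
            RAISE_APPLICATION_ERROR(-20001, 'Too many orders for customer');
        END IF;
    END;
    /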
As far as indexes go, in my opinion it is -very- important to have indexes on the columns that are used for joining tables together. That's an easy way to increase performance.
About the foreign keys: since the database changed hands so much, I wonder if the foreign keys could have been dropped accidentally somewhere along the line. I used PL/SQL Developer, and some seemingly innocent operations (like adding/removing a column, I think, but I'm not sure) caused the foreign keys to all be deleted.
They may have avoided using foreign key constraints for performance; I'm told they can be very slow. They also make it difficult to bulk-load data that may be inaccurate when loaded but will be corrected programmatically.
"We have a few tables in the database that have two separate owners. Both owners define identical primary key constraints on the table. This has me perplexed. Why would a table have multiple owners? And why would each owner define separate yet identical primary keys?"
A SQL Server database corresponds more closely to an Oracle user/schema. So you can have multiple tables in the same Oracle database belonging to different schemas/users. These are DIFFERENT tables (i.e., with different data inside, and potentially different columns/indexes...).
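A hypothetical illustration with invented names:

    -- Each schema owns its own, completely independent table:
    CREATE TABLE schema_a.customers (customer_id NUMBER PRIMARY KEY,
                                     name        VARCHAR2(100));
    CREATE TABLE schema_b.customers (customer_id NUMBER PRIMARY KEY,
                                     name        VARCHAR2(100));

    -- Both copies (and their separate constraints) show up in the dictionary:
    SELECT owner, table_name
    FROM   all_tables
    WHERE  table_name = 'CUSTOMERS';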
Sometimes bits of a business want a snapshot of the data (e.g., at month or year end). Sometimes, before a datafix, a DBA will create a copy of a table (possibly with a different name or in a different schema) just in case the datafix goes horribly wrong.
Either way, where you have copies of a table, one is probably out of date (intentionally).
Assuming that you are not in a data warehousing situation here -
Foreign keys ensure referential integrity and are absolutely vital. I can't think of a situation when you would not want them.
Indexes again are very important tools to ensure query performance.
Not sure why they would define PKs without Indexes - PKs are usually implemented via a unique index.
By "large indexes", I assume you mean composite indexes that span multiple columns.
An ERwin-engineered Oracle database need not result in such a design, so what you have is not an ERwin artifact.
If I had to hazard a guess, I'd say the designer was overzealously and unnecessarily trying to design for performance: he avoided indexes in the name of update performance, and he avoided FK constraints for similar 'imagined' performance gains.
Unless the database is being used for a unique kind of application in a very special way, there really are no grounds for omitting FKs and indexes.
Regarding triggers, other posters have already weighed in: triggers can be useful for capturing business rules in one central place (the same goes for stored procedures, which are good for encapsulating business logic).
I'm thinking of using SqlMetal to auto-generate LinqToSql code for a simple and small database. The database will have only tables with some primary and foreign keys (i.e., no views, stored procedures, functions, etc.). I'd like to do all joins, grouping, sorting, and advanced data manipulation using Linq.
I have experience with LinqToObjects and LinqToXml, but I've never used a proper ORM or LinqToSql.
Some questions:
How steep is the learning curve for SqlMetal/LinqToSql, given my prior experience?
Is SqlMetal reliable for simple databases?
What kind of surprises, if any, might I encounter?
How would I automate my project so that the LinqToSql code gets regenerated every time I build or, better yet, rebuild my project? (I will be using Visual Studio 2008.)
Can you recommend a good tutorial for getting me up to speed quickly on using SqlMetal and LinqToSql?
The answer to your title question is yes, it is a fine solution for a database with tables only (and up to a point it can grow when your database grows to include other constructs like views and stored procedures).
Pretty shallow. The main difference you'll find is that certain methods that work on Linq2Objects don't work for Linq2SQL, so you have to be careful when mixing objects and Linq2SQL together. You'll know you've hit this when you get an exception saying something to the effect of "The only thing we support is Contains".
Quite.
You'll probably find that in certain situations Linq2SQL doesn't generate the most efficient SQL.
Sometimes things can get tricky with transactions.
There is no UPDATE statement. Every time you want to update a set of records, you have to select them, update each one, and then call SubmitChanges() (see the T-SQL contrast after this list).
Unless you're only doing selects you need to have a primary key on your table. Without a primary key you can't delete or update.
Linq2SQL doesn't directly support many-to-many relationships.
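For contrast with the no-UPDATE point above, this is the single set-based statement you'd write by hand (table and columns are hypothetical); Linq2SQL instead selects the rows, mutates each object, and emits one UPDATE per row on SubmitChanges():

    UPDATE dbo.orders
    SET    status = 'shipped'
    WHERE  ship_date IS NOT NULL;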
A simple solution would be to set up a prebuild task that regenerates your Linq2SQL DataContexts every time you build.
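For example, a prebuild step along these lines (server, database and namespace are placeholders; check sqlmetal /? for the exact options in your SDK version):

    sqlmetal /server:MYSERVER /database:MyDb /code:MyDb.designer.cs /namespace:MyApp.Data /context:MyDataContext /pluralize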
As to your goal of using Linq2SQL for everything, be ready for some nasty LINQ queries (and nasty debugging cycles). The join syntax is verbose and depending on the complexity of your queries can quickly become harder to maintain than the SQL equivalent. Grouping isn't as easy as GROUP BY.
If you're serious about LINQ, pick up a copy of LINQ in Action. It goes over all aspects of LINQ and SqlMetal in easy-to-digest chapters.
Also, if you're interested, you could use SubSonic 3, which uses T4 templates to generate your models; this is very automated in VS2008. You then use normal LINQ queries to access data. This is probably the easiest way to go.