Oracle sequence values can be cached on the database side through use of the 'Cache' option when creating sequences. For example
CREATE SEQUENCE sequence_name
CACHE 1000;
will cache up to 1000 values for performance.
My question is whether these values can be cached in the the oracle drivers.
In my application I want to pull back a range of sequence values but don't want to have to go back to the database for each new value. I know Hibernate has similar functionality but I've been unable to find out exactly how it's accomplished.
Any advice would be greatly appreciated.
Thanks,
Matt
No, you can not reserve one batch of numbers in one session (if I understood correctly). Setting correct cache value would very likely make this acceptable from performance perspective.
If you still insist you can can create similar functionality yourself - to be able to reserve at once one range of numbers
As mentioned by igr it seems that the oracle drivers cannot cache sequence values in the java layer.
However, I have got round this by setting a large increment on the sequence and generating key values myself. The increment for a sequence can be set as follows:
CREATE SEQUENCE sequence_name
INCREMENT BY $increment;
In my application, every time sequence.nextval is executed I assume that the previous $increment values are reserved, and can be used as unique key values. This means that the database is hit once for every $increment key values that are generated.
So lets say for example that $increment=5000, and we have a starting sequence value of 1. When sequence.nextval is run for the first time the sequence value is incremented to 5001. I then assume that the values 2..5001 are reserved. Then the 5000 values are used in the application (In my use case, they are used for table primary keys), as soon as they are all used up sequence.nextval is run again to reserve another 5000 values, and the process repeated.
The only real downside I can see to this approach is that there is a tiny risk from someone running ddl to modify the $increment between the time the nextval is run and the $increment is used to generate values. Given this is very unlikely, and in my case the ddl will not be updated at runtime, the solution was acceptable.
I realise this doesn't directly answer the question I posed but hopefully it'll be useful to someone else.
Thanks,
Matt.
I have a table with 300+ columns and hundreds of thousands of records. I need to re-name one of the existing columns.
Is there anything that I need to be worried about? Will this operation have any effect on the explain plans etc ?
Notes:
I am working on a live production database on Oracle 11g.
This column is not being used currently. It's not populated for any of the rows and I am 100% sure none of the existing queries refer to this column.
If "working on a live production database" means that you are going to try to do this without testing in lower environments while people are working, I would strongly caution against that plan.
Existing query plans that involve the table you're doing DDL on will be invalidated so those queries will need to be hard parsed again. That can easily be an expensive operation if there are large numbers of such queries. It is certainly possible that some query plans will change because something else has changed (i.e. statistics are different, settings are different, bind variables are different, etc.) They won't change because of the column name change but the column name change may result in changed plans.
Any queries that you're executing will, obviously, need to use the new name as soon as you rename the column. That generally means that you need to do a coordinated release where you modify the code (including stored procedures) as well as the column name. That, in turn, generally implies that you're doing this as part of a build that includes at least a bit of downtime. You probably could, if you have the enterprise edition, do edition-based redefinition without downtime but that adds complexity to the process and is something that you would absolutely need to test thoroughly before implementing it in prod.
I have a table with many rows.
For testing purpose my colleagues are also using same table. The problem is that some time he is deleting the row which I was testing and some time I.
So is there any way in oracle so I can make some specific rows to be read only so other should not delete and edit that?
Thanks.
There are a number of differnt ways to tackle this problem.
As Sun Tzu said, the best thing would be if you and your colleagues use data sets which do not collide.
For instance perhaps you could each have your own database instance, on local PCs; whether this will suit depends on a number of factors, not the least of which is your licensing arrangements with Oracle. Alternatively, you could have separate schemas in a shared database; depending on your application you may need to you synonyms or special connectioms.
Another approach: everybody builds their own data sets, known as test fixtures. This is a good policy, because testing is only truly valid when it runs against a known state; if we make assumptions regarding the presence or absence of data how valid are our test results? The point is, the tests should clean up after themselves, removing any data created in fixtures and by the running of tests. With this tactic you need to agree ranges of IDs for each team member: they must only use records within their ranges for testing or development work.
I prefer these sorts of approach because they don't really change the way the application works (arguably except using different schemas and synonyms). More draconian methods are available.
If you have Enterprise Edition you can use Row Level Security to protect your records. This is a extension of the last point: you will need a mechanism for identifying your records, and some infrastructure to identify ownership within the session. But in addition to preventing other users rom deleting your data you can also prevent them inserting, updating or even viewing records which are with your range of IDs. Find out more.
A lighter solution is use a trigger as A B Cade suggests. You will still need to identifying your records and who is connected (because presumably from time-to-time you will still want to delete your records.
One last strategy: take your ball home. Get the table in the state you want it and make a data pump export. For extra vindictiveness you can truncate the table at this point. Then any time you want to use the table you run a data pump import. This will reset the table's state, wiping out any existing data. This is just an extreme version of test scripts creating their own data.
You can create a trigger that prevents deleting some specific rows.
CREATE OR REPLACE TRIGGER trg_dont_delete
BEFORE DELETE
ON <your_table_name>
FOR EACH ROW
BEGIN
IF :OLD.ID in (<IDs of rows you dont want to be deleted>) THEN
raise_application_error (-20001, 'Do not delete my records!!!');
END IF;
END;
Of course you can make it smarter - make the if statement rely on user, or get the records IDs from another table and so on
Oracle supports row level locking. you can prevent the others to delete the row, which one you are using. for knowing better check this link.
I have a table with data relating to several moments in time that I have to keep updated. To save space and time, however, each row in my table refers to a given day and hourly and quarter-hourly data for that day are scattered throughout the several columns in that same row. When updating the data for a particular moment in time I, therefore, must choose the column that has to be be updated through some programming logic in my PL/SQL procedures and functions.
Is there a way to dynamically choose the column or columns involved in an update/merge operation without having to assemble the query string anew every time? Performance is a concern and the throughput must be high, so I can't do anything that would perform poorly.
Edit: I am aware of normalization issues. However I still would like to know a good way for choosing the columns to be updated/merged dynamically and programatically.
The only way to dynamically choose what column or columns to use for a DML statement is to use dynamic SQL. And the only way to use dynamic SQL is to generate a SQL statement that can then be prepared and executed. Of course, you can assemble the string in a more or less efficient manner, you can potentially parse the statement once and execute it multiple times, etc. in order to minimize the expense of using dynamic SQL. But using dynamic SQL that performs close to what you'd get with static SQL requires quite a bit more work.
I'd echo Ben's point-- it doesn't appear that you are saving time by structuring your table this way. You'll likely get much better performance by normalizing the table properly. I'm not sure what space you believe you are saving but I would tend to doubt that denormalizing your table structure is going to save you much if anything in terms of space.
One way to do what is required is to create a package with all possible updates (which aren't that many, as I'll only update one field at a given time) and then choosing which query to use depending on my internal logic. This would, however, lead to a big if/else or switch/case-like statement. Is there a way to achieve similar results with better performance?
I'm about to have to rewrite some rather old code using SQL Server's BULK INSERT command because the schema has changed, and it occurred to me that maybe I should think about switching to a stored procedure with a TVP instead, but I'm wondering what effect it might have on performance.
Some background information that might help explain why I'm asking this question:
The data actually comes in via a web service. The web service writes a text file to a shared folder on the database server which in turn performs a BULK INSERT. This process was originally implemented on SQL Server 2000, and at the time there was really no alternative other than chucking a few hundred INSERT statements at the server, which actually was the original process and was a performance disaster.
The data is bulk inserted into a permanent staging table and then merged into a much larger table (after which it is deleted from the staging table).
The amount of data to insert is "large", but not "huge" - usually a few hundred rows, maybe 5-10k rows tops in rare instances. Therefore my gut feeling is that BULK INSERT being a non-logged operation won't make that big a difference (but of course I'm not sure, hence the question).
The insertion is actually part of a much larger pipelined batch process and needs to happen many times in succession; therefore performance is critical.
The reasons I would like to replace the BULK INSERT with a TVP are:
Writing the text file over NetBIOS is probably already costing some time, and it's pretty gruesome from an architectural perspective.
I believe that the staging table can (and should) be eliminated. The main reason it's there is that the inserted data needs to be used for a couple of other updates at the same time of insertion, and it's far costlier to attempt the update from the massive production table than it is to use an almost-empty staging table. With a TVP, the parameter basically is the staging table, I can do anything I want with it before/after the main insert.
I could pretty much do away with dupe-checking, cleanup code, and all of the overhead associated with bulk inserts.
No need to worry about lock contention on the staging table or tempdb if the server gets a few of these transactions at once (we try to avoid it, but it happens).
I'm obviously going to profile this before putting anything into production, but I thought it might be a good idea to ask around first before I spend all that time, see if anybody has any stern warnings to issue about using TVPs for this purpose.
So - for anyone who's cozy enough with SQL Server 2008 to have tried or at least investigated this, what's the verdict? For inserts of, let's say, a few hundred to a few thousand rows, happening on a fairly frequent basis, do TVPs cut the mustard? Is there a significant difference in performance compared to bulk inserts?
Update: Now with 92% fewer question marks!
(AKA: Test Results)
The end result is now in production after what feels like a 36-stage deployment process. Both solutions were extensively tested:
Ripping out the shared-folder code and using the SqlBulkCopy class directly;
Switching to a Stored Procedure with TVPs.
Just so readers can get an idea of what exactly was tested, to allay any doubts as to the reliability of this data, here is a more detailed explanation of what this import process actually does:
Start with a temporal data sequence that is ordinarily about 20-50 data points (although it can sometimes be up a few hundred);
Do a whole bunch of crazy processing on it that's mostly independent of the database. This process is parallelized, so about 8-10 of the sequences in (1) are being processed at the same time. Each parallel process generates 3 additional sequences.
Take all 3 sequences and the original sequence and combine them into a batch.
Combine the batches from all 8-10 now-finished processing tasks into one big super-batch.
Import it using either the BULK INSERT strategy (see next step), or TVP strategy (skip to step 8).
Use the SqlBulkCopy class to dump the entire super-batch into 4 permanent staging tables.
Run a Stored Procedure that (a) performs a bunch of aggregation steps on 2 of the tables, including several JOIN conditions, and then (b) performs a MERGE on 6 production tables using both the aggregated and non-aggregated data. (Finished)
OR
Generate 4 DataTable objects containing the data to be merged; 3 of them contain CLR types which unfortunately aren't properly supported by ADO.NET TVPs, so they have to be shoved in as string representations, which hurts performance a bit.
Feed the TVPs to a Stored Procedure, which does essentially the same processing as (7), but directly with the received tables. (Finished)
The results were reasonably close, but the TVP approach ultimately performed better on average, even when the data exceeded 1000 rows by a small amount.
Note that this import process is run many thousands of times in succession, so it was very easy to get an average time simply by counting how many hours (yes, hours) it took to finish all of the merges.
Originally, an average merge took almost exactly 8 seconds to complete (under normal load). Removing the NetBIOS kludge and switching to SqlBulkCopy reduced the time to almost exactly 7 seconds. Switching to TVPs further reduced the time to 5.2 seconds per batch. That's a 35% improvement in throughput for a process whose running time is measured in hours - so not bad at all. It's also a ~25% improvement over SqlBulkCopy.
I am actually fairly confident that the true improvement was significantly more than this. During testing it became apparent that the final merge was no longer the critical path; instead, the Web Service that was doing all of the data processing was starting to buckle under the number of requests coming in. Neither the CPU nor the database I/O were really maxed out, and there was no significant locking activity. In some cases we were seeing a gap of a few idle seconds between successive merges. There was a slight gap, but much smaller (half a second or so) when using SqlBulkCopy. But I suppose that will become a tale for another day.
Conclusion: Table-Valued Parameters really do perform better than BULK INSERT operations for complex import+transform processes operating on mid-sized data sets.
I'd like to add one other point, just to assuage any apprehension on part of the folks who are pro-staging-tables. In a way, this entire service is one giant staging process. Every step of the process is heavily audited, so we don't need a staging table to determine why some particular merge failed (although in practice it almost never happens). All we have to do is set a debug flag in the service and it will break to the debugger or dump its data to a file instead of the database.
In other words, we already have more than enough insight into the process and don't need the safety of a staging table; the only reason we had the staging table in the first place was to avoid thrashing on all of the INSERT and UPDATE statements that we would have had to use otherwise. In the original process, the staging data only lived in the staging table for fractions of a second anyway, so it added no value in maintenance/maintainability terms.
Also note that we have not replaced every single BULK INSERT operation with TVPs. Several operations that deal with larger amounts of data and/or don't need to do anything special with the data other than throw it at the DB still use SqlBulkCopy. I am not suggesting that TVPs are a performance panacea, only that they succeeded over SqlBulkCopy in this specific instance involving several transforms between the initial staging and the final merge.
So there you have it. Point goes to TToni for finding the most relevant link, but I appreciate the other responses as well. Thanks again!
I don't really have experience with TVP yet, however there is an nice performance comparison chart vs. BULK INSERT in MSDN here.
They say that BULK INSERT has higher startup cost, but is faster thereafter. In a remote client scenario they draw the line at around 1000 rows (for "simple" server logic). Judging from their description I would say you should be fine with using TVP's. The performance hit - if any - is probably negligible and the architectural benefits seem very good.
Edit: On a side note you can avoid the server-local file and still use bulk copy by using the SqlBulkCopy object. Just populate a DataTable, and feed it into the "WriteToServer"-Method of an SqlBulkCopy instance. Easy to use, and very fast.
The chart mentioned with regards to the link provided in #TToni's answer needs to be taken in context. I am not sure how much actual research went into those recommendations (also note that the chart seems to only be available in the 2008 and 2008 R2 versions of that documentation).
On the other hand there is this whitepaper from the SQL Server Customer Advisory Team: Maximizing Throughput with TVP
I have been using TVPs since 2009 and have found, at least in my experience, that for anything other than simple insert into a destination table with no additional logic needs (which is rarely ever the case), then TVPs are typically the better option.
I tend to avoid staging tables as data validation should be done at the app layer. By using TVPs, that is easily accommodated and the TVP Table Variable in the stored procedure is, by its very nature, a localized staging table (hence no conflict with other processes running at the same time like you get when using a real table for staging).
Regarding the testing done in the Question, I think it could be shown to be even faster than what was originally found:
You should not be using a DataTable, unless your application has use for it outside of sending the values to the TVP. Using the IEnumerable<SqlDataRecord> interface is faster and uses less memory as you are not duplicating the collection in memory only to send it to the DB. I have this documented in the following places:
How can I insert 10 million records in the shortest time possible? (lots of extra info and links here as well)
Pass Dictionary<string,int> to Stored Procedure T-SQL
Streaming Data Into SQL Server 2008 From an Application (on SQLServerCentral.com ; free registration required)
TVPs are Table Variables and as such do not maintain statistics. Meaning, they report only having 1 row to the Query Optimizer. So, in your proc, either:
Use statement-level recompile on any queries using the TVP for anything other than a simple SELECT: OPTION (RECOMPILE)
Create a local temporary table (i.e. single #) and copy the contents of the TVP into the temp table
I think I'd still stick with a bulk insert approach. You may find that tempdb still gets hit using a TVP with a reasonable number of rows. This is my gut feeling, I can't say I've tested the performance of using TVP (I am interested in hearing others input too though)
You don't mention if you use .NET, but the approach that I've taken to optimise previous solutions was to do a bulk load of data using the SqlBulkCopy class - you don't need to write the data to a file first before loading, just give the SqlBulkCopy class (e.g.) a DataTable - that's the fastest way to insert data into the DB. 5-10K rows isn't much, I've used this for up to 750K rows. I suspect that in general, with a few hundred rows it wouldn't make a vast difference using a TVP. But scaling up would be limited IMHO.
Perhaps the new MERGE functionality in SQL 2008 would benefit you?
Also, if your existing staging table is a single table that is used for each instance of this process and you're worried about contention etc, have you considered creating a new "temporary" but physical staging table each time, then dropping it when it's finished with?
Note you can optimize the loading into this staging table, by populating it without any indexes. Then once populated, add any required indexes on at that point (FILLFACTOR=100 for optimal read performance, as at this point it will not be updated).
Staging tables are good! Really I wouldn't want to do it any other way. Why? Because data imports can change unexpectedly (And often in ways you can't foresee, like the time the columns were still called first name and last name but had the first name data in the last name column, for instance, to pick an example not at random.) Easy to research the problem with a staging table so you can see exactly what data was in the columns the import handled. Harder to find I think when you use an in memory table. I know a lot of people who do imports for a living as I do and all of them recommend using staging tables. I suspect there is a reason for this.
Further fixing a small schema change to a working process is easier and less time consuming than redesigning the process. If it is working and no one is willing to pay for hours to change it, then only fix what needs to be fixed due to the schema change. By changing the whole process, you introduce far more potential new bugs than by making a small change to an existing, tested working process.
And just how are you going to do away with all the data cleanup tasks? You may be doing them differently, but they still need to be done. Again, changing the process the way you describe is very risky.
Personally it sounds to me like you are just offended by using older techniques rather than getting the chance to play with new toys. You seem to have no real basis for wanting to change other than bulk insert is so 2000.