Is there any performance overhead to using Oracle Change Notification on a fairly large database table on which 5k-8k operations are performed daily?
After running it continuously for two days I have found a few 'java.lang.IndexOutOfBoundsException' errors.
8k operations per day is trivial. My 10-year-old laptop can do that. I suspect your Java error is not performance/overhead related.
I have one table with 7,515,966 rows, and this table depends on other tables. We created a view over them for generating SSRS reports.
Now the view has grown, so the reports have a performance issue.
We are starting to archive data from the large table, but I can't work out which methodology to use. Please guide us.
Thank you...
Table partitioning in SQL Server 2012 is only available in Enterprise Edition. See https://msdn.microsoft.com/en-us/library/cc645993(v=sql.110).aspx for details on what's available for each edition.
7 million rows is not a lot of rows for SQL Server; we routinely deal with billions of rows. However, as your rows get into the tens of millions, you'll probably expose various performance gaps in your system. E.g. are your queries efficiently written so they only touch the rows they need, do you have the right indexes, are statistics up to date, is tempdb optimized, etc.?
One common weak link in 9 out of 10 databases (regardless of make) I've worked with is the storage subsystem. Is yours able to keep up with the large data set you need to work with? Storage for databases should be designed and configured based on throughput, concurrency, and latency requirements first. Space is generally the last thing to worry about once the other requirements, including HA/DR, are met.
If you have deficiencies in your current system, you can pay for the expensive Enterprise Edition and implement table partitioning, but you will likely still suffer performance problems soon after, if not immediately.
I'll need to migrate our main database to a new server and storage subsystem in a couple of weeks. Oracle 11 is currently running on Windows, and we will install a brand new SuSE for it. There will be no other major changes. Memory will be the same, and the server is just a bit newer.
My main concern is the time it will take to create indexes. Our last experience recreating some indexes took a very long time, and since then I have been researching how to optimize it.
The current server has 128GB of memory and we're using Oracle ASMM (51GB for the SGA and 44GB for the PGA), and Spotlight on Oracle reports no physical read activity on the datafiles. Everything is cached in memory, and performance is great. Spotlight also reports that PGA consumption is only 500 MB.
I know my biggest table has 100 million rows and occupies 15GB. Its indexes, however, occupy 53GB. When I recreate one of these indexes, I can see a lot of write activity in the TEMP tablespace.
So the question is: how can I use all the available memory in order to minimize TEMP activity?
I'm not really sure whether this is relevant, but we see an average of 300-350 user connections, and I have raised the sessions initialization parameter to allow up to 700.
Best regards,
You should consider setting WORKAREA_SIZE_POLICY to MANUAL for the session that will be doing the index rebuilds, and then setting SORT_AREA_SIZE to a sufficiently large number. (Max is O/S dependent, but 2GB would be a good starting point.)
Also, though you didn't make any mention of it, you should also consider using NOLOGGING to improve performance.
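A minimal sketch of what that could look like for the rebuild session (the index name is a placeholder, SORT_AREA_SIZE is in bytes, and the values are only illustrative starting points):

```sql
-- Run in the session that performs the rebuilds (values are illustrative).
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET sort_area_size = 2147483647;  -- ~2GB starting point; the maximum is O/S dependent

-- NOLOGGING reduces the redo generated by the rebuild (take a backup afterwards).
ALTER INDEX my_big_index REBUILD NOLOGGING;
```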
Hope that helps.
I'm running a simple BigQuery query over my dataset, which is about 84GB of log data.
The query takes approximately 110 seconds to complete. Is this normal for a data set of this size?
After further investigation, it looks like your table was heavily fragmented. We usually have a coalesce process running to prevent this situation, but it had been off for a couple of weeks while we were verifying a bug fix. I've restarted the coalescer and run it against your table. Please let me know if you continue to see poor performance.
As a best practice, you may be better off importing somewhat less frequently in larger chunks, or splitting your data into time-based tables. BigQuery isn't really designed to handle high-volume small imports to the same table.
I need to load over 1 billion keys into Berkeley DB and therefore want to tune it in advance to get better performance. With the standard configuration it currently takes me about 15 minutes to load 1,000,000 keys, which is too slow.
Is there a proper way to tune, for example, the B+Tree of Berkeley DB (node size, etc.)?
(As a comparison, after tuning Tokyo Cabinet, it loads 1 billion keys in 25 minutes.)
P.S.
I'm looking for tuning tips in the form of code, not parameters to set for a running system (like JVM size, etc.).
I'm curious: when TokyoCabinet loads 1B keys in 25 minutes, what are the sizes of the keys/values being stored? What I/O system and storage system are you using? Are you using the term "load" to mean 1B transactional commits to permanent stable storage? That would be ~666,666 inserts/second, which is physically impossible given any I/O system I'm aware of. Multiply that number by the key and value size and you're hopelessly beyond physical limits.
Please take a look at Gustavo Duarte's blog, read a bit about I/O systems and how things work in hardware, and then review your statement. I'm very interested in finding out what exactly TokyoCabinet is doing and what it isn't doing. If I had to guess, I'd say it's committing to the file-system cache in the operating system but not flushing (fdsync()-ing) those buffers to disk.
Full Disclosure: I'm a product manager at Oracle for Oracle Berkeley DB (a direct competitor of TokyoCabinet), I've been playing with these databases and the best hardware around for them for about ten years so I'm both biased and skeptical.
Berkeley DB has flags you can set on the transaction handle which mimic this and other similar methods of trading off durability (the "D" in ACID) for speed.
As for how to make Berkeley DB Java Edition (BDB-JE) faster, you can try the following:
Deferred writes: this delays writing to the transaction log for as long as possible (when buffers are full, it flushes the data).
Sort your keys in advance: most B-Trees (ours included) do much better with in-order insertions for fast load times.
Increase the size of the log files from the default of 10MiB to something larger, like 100MiB; this reduces I/O cost.
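As an illustration only, here is a minimal sketch of how the first and third points could be expressed with the BDB-JE API; the environment directory, database name and cache size are assumptions, not a recommendation:

```java
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import java.io.File;

public class BulkLoadSetup {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        // Larger log files than the 10 MiB default, as suggested above (value in bytes).
        envConfig.setConfigParam("je.log.fileMax", String.valueOf(100L * 1024 * 1024));
        // Give the JE cache room to hold the upper B-Tree levels (size this to your heap).
        envConfig.setCacheSize(2L * 1024 * 1024 * 1024);
        Environment env = new Environment(new File("/tmp/bdb-env"), envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        // Deferred write: updates stay in the cache and are written to the log
        // lazily; call sync() to force them to stable storage.
        dbConfig.setDeferredWrite(true);
        Database db = env.openDatabase(null, "bulk", dbConfig);

        // ... insert the (pre-sorted) keys here ...

        db.sync();   // flush deferred writes before closing
        db.close();
        env.close();
    }
}
```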
It's very important to be clear about claims of performance with databases. They seem simple, but it turns out to be very, very tricky to get them right so that they never corrupt data or lose committed transactions.
I hope this helps you a bit.
Bulk inserts on BDB-JE are an order of magnitude faster if you group them into a single transaction. The reason is that each individual commit causes (by default) a synchronous write to disk, whereas a transaction is synced to disk only once, at commit. In my application, writing 100,000 small keys as individual commits took more than a minute, while inside a single transaction it took just a few seconds.
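A rough sketch of that pattern (the environment path, database name and key format are placeholders, and the durability remark in the comment only gestures at the transaction flags mentioned earlier):

```java
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.Transaction;
import java.io.File;
import java.nio.charset.StandardCharsets;

public class GroupedInserts {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        envConfig.setTransactional(true);
        Environment env = new Environment(new File("/tmp/bdb-env"), envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        dbConfig.setTransactional(true);
        Database db = env.openDatabase(null, "keys", dbConfig);

        // One transaction around many puts: the log is forced to disk once at
        // commit instead of once per record. (Durability can be relaxed further
        // via a TransactionConfig, trading the "D" in ACID for speed.)
        Transaction txn = env.beginTransaction(null, null);
        try {
            for (int i = 0; i < 100_000; i++) {
                // Zero-padded keys so insertion order matches key order
                // (in-order insertions load faster, as noted above).
                DatabaseEntry key = new DatabaseEntry(
                        String.format("key-%08d", i).getBytes(StandardCharsets.UTF_8));
                DatabaseEntry value = new DatabaseEntry(
                        ("value-" + i).getBytes(StandardCharsets.UTF_8));
                db.put(txn, key, value);
            }
            txn.commit();
        } catch (Exception e) {
            txn.abort();
            throw e;
        }

        db.close();
        env.close();
    }
}
```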
I'm running a fairly substantial SSIS package against SQL 2008 - and I'm getting the same results both in my dev environment (Win7-x64 + SQL-x64-Developer) and the production environment (Server 2008 x64 + SQL Std x64).
The symptom is that initial data loading screams along at between 50K and 500K records per second, but after a few minutes the speed drops off dramatically and eventually crawls embarrassingly slowly. The database is in the Simple recovery model, the target tables are empty, and all of the prerequisites for minimally logged bulk inserts are being met. The data flow is a simple load from a RAW input file to a schema-matched table (i.e. no complex transforms of the data, no sorting, no lookups, no SCDs, etc.).
The problem has the following qualities and resiliences:
Problem persists no matter what the target table is.
RAM usage is lowish (45%) - there's plenty of spare RAM available for SSIS buffers or SQL Server to use.
Perfmon shows buffers are not spooling, disk response times are normal, disk availability is high.
CPU usage is low (hovers around 25% shared between sqlserver.exe and DtsDebugHost.exe)
Disk activity primarily on TempDB.mdf, but I/O is very low (< 600 Kb/s)
OLE DB destination and SQL Server Destination both exhibit this problem.
To sum it up, I would expect either disk, CPU or RAM to be exhausted before the package slows down, but instead it's as if the SSIS package is taking an afternoon nap. SQL Server remains responsive to other queries, and I can't find any performance counters or logged events that betray the cause of the problem.
I'll gratefully reward any reasonable answers / suggestions.
We finally found a solution... the problem lay in the fact that my client was using VMware ESX, and despite the VM reporting plenty of free CPU and RAM, the VMware gurus had to pre-allocate (i.e. guarantee) the CPU for the SSIS guest VM before it really started to fly. Without this, SSIS would be running but VMware would scale back the resources - an odd quirk, because other processes and software kept the VM happily awake. I'm not sure why SSIS was different, but as I said, the VMware gurus fixed this problem by reserving RAM and CPU.
I have some other feedback by way of a checklist of things to do for great performance in SSIS:
1. Ensure the SQL login has the BULK DATA permission, or the data load will be very slow. Also check that the target database uses the Simple or Bulk Logged recovery model.
2. Avoid sort and merge components on large data - once they start swapping to disk, the performance gutters.
3. Use source input data that is sorted according to the target table's primary key, disable non-clustered indexes on the target table, and set MaximumInsertCommitSize to 0 on the destination component (a sketch follows this list). This bypasses TempDB and the log altogether.
4. If you cannot meet the requirements for 3, simply set MaximumInsertCommitSize to the same size as the data flow's DefaultMaxBufferRows property.
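As a rough sketch of point 3 (the table and index names are hypothetical), the non-clustered indexes can be disabled before the load and rebuilt afterwards:

```sql
-- Before the data flow runs: disable each non-clustered index on the target.
ALTER INDEX IX_TargetTable_Foo ON dbo.TargetTable DISABLE;
ALTER INDEX IX_TargetTable_Bar ON dbo.TargetTable DISABLE;

-- ... run the SSIS data flow (sorted input, MaximumInsertCommitSize = 0) ...

-- After the load: rebuild the disabled indexes.
ALTER INDEX IX_TargetTable_Foo ON dbo.TargetTable REBUILD;
ALTER INDEX IX_TargetTable_Bar ON dbo.TargetTable REBUILD;
```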
The best way to diagnose performance issues with SSIS Data Flows is with decomposition.
Step 1 - measure your current package performance. You need a baseline.
Step 2 - Back up your package, then edit it. Remove the Destination and replace it with a Row Count (or other end-of-flow-friendly transform). Run the package again to measure performance. Now you know the performance penalty incurred by your Destination.
Step 3 - Edit the package again, removing the next transform "up" from the bottom in the data flow. Run and measure. Now you know the performance penalty of that transform.
Step 4...n - Rinse and repeat.
You probably won't have to climb all the way up your flow to get an idea as to what your limiting factor is. When you do find it, then you can ask a more targeted performance question, like "the X transform/destination in my data flow is slow, here's how it's configured, this is my data volume and hardware, what options do I have?" At the very least, you'll know exactly where your problem is, which stops a lot of wild goose chases.
Are you issuing any COMMITs? I've seen this kind of thing slow down when the working set gets too large (a relative measure, to be sure). A periodic COMMIT should keep that from happening.
First thoughts:
Are the database files growing (without instant file initialization for MDFs)?
Is the upload batched/transactioned (i.e. is it one big transaction)?