Partition index to reduce buffer busy waits? - oracle

From time to time our Oracle response times degrade significantly for a minute or two, without any extra load.
We were able to identify an INSERT statement that produces a lot of buffer busy waits.
From the ADDM report, we got the following hint:
Consider partitioning the INDEX "IDX1" with object
ID 4711 in a manner that will evenly distribute concurrent DML across
multiple partitions.
To be honest: I am not sure what that means. I don't know what a partitioned index is. I can only imagine that it means creating a partitioned table with a local index.
Can you help me out here?
There is a very high frequency of reading and writing to that table. No updates or deletes are used.
Thanks,
E.

I am not sure what that means.
Oracle is telling you that there is a lot of concurrent ("at the same time") activity on a very small part of your index. This happens a lot.
Consider an index column TAB1_PK on table TAB1 whose values are inserted from a sequence TAB1_S. Suppose you have 5 database sessions all inserting into TAB1 at the same time.
Because TAB1_PK is indexed, and because the sequence is generating values in numeric order, what happens is that all those sessions have to read and update the same blocks of the index at the same time.
This can cause a lot of contention -- way more than you would expect, due to the way indexes work with multi-version read consistency. I mean, in some rare situations (depending on how the transaction logic is written and the number of concurrent sessions), it can really be crippling.
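As a hedged sketch of that setup (these object names come from the example above, not from your actual schema):
CREATE SEQUENCE tab1_s;
CREATE TABLE tab1 (tab1_pk NUMBER NOT NULL, payload VARCHAR2(100));
CREATE INDEX idx1 ON tab1 (tab1_pk);
-- five sessions running this concurrently all insert ever-increasing key values,
-- so they all compete for the same right-hand leaf block of IDX1
INSERT INTO tab1 (tab1_pk, payload) VALUES (tab1_s.NEXTVAL, 'some data');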
The (really) old way to avoid this problem was to use a reverse key index. That way, the sequential column values did not all go to the same index blocks.
However, that is a two-edged sword. On the one hand, you get less contention because you're inserting all over the index (good). On the other hand, your index entries are scattered all over the index, meaning you cannot cache the hot blocks effectively. You've just turned a big logical I/O problem into a physical I/O problem!
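For reference, that old-style fix is just a matter of rebuilding the index with the REVERSE keyword (a sketch using the hypothetical names above):
DROP INDEX idx1;
-- key bytes are stored reversed, so sequential values scatter across leaf blocks
-- instead of piling onto the right-hand edge; range scans can no longer use the index
CREATE INDEX idx1 ON tab1 (tab1_pk) REVERSE;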
Nowadays, we have a better solution -- a GLOBAL HASH PARTITION on the index.
With a GHP, you can specify the number of hash buckets and use that to trade-off between how much contention you need to handle vs how compact you want the index updates (for better buffer caching). The more index hash partitions you use, the better your concurrency but the worse your index block buffer caching will be.
I find a number (of global hash partitions) around 16 is pretty good.
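A sketch of what the ADDM suggestion could look like for that hypothetical index (the real IDX1 may have a different column list):
DROP INDEX idx1;
-- 16 hash partitions: concurrent inserts are spread across 16 separate right-hand
-- edges instead of one, at the cost of slightly worse buffer caching
CREATE INDEX idx1 ON tab1 (tab1_pk)
  GLOBAL PARTITION BY HASH (tab1_pk) PARTITIONS 16;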

Related

Can we reduce High water mark by using Gather stats in oracle?

I am very new to Oracle and I have the below query.
I have one table which has almost 6 lakh (600,000) records.
In a daily batch I need to delete almost 5.7 lakh (570,000) records and insert them again from another table. Note that I cannot use TRUNCATE TABLE because 30,000 records are constant ones which I should not delete.
The issue here is that if I delete 5.67 lakh records daily, it may cause a high water mark issue.
So my question is: can gather stats help to reduce the HWM?
I can run Oracle gather stats daily.
You can use the SHRINK command to recover the space and reset the high water mark (note that the table needs row movement enabled first):
alter table your_table enable row movement;
alter table your_table shrink space;
However, you should only do this if you need to. In your case it seems likely you need only do this if you are inserting your 567,000 records using the /*+ APPEND */ hint; this hint tells the optimiser to insert records above the HWM, which in your scenario would cause the table to grow, with vast amounts of empty space. Shrinking would definitely be useful here.
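A minimal sketch of such a direct-path load (table names are placeholders):
-- direct-path insert: rows are written above the HWM, so the space freed by the
-- earlier DELETE is not reused and the segment keeps growing
INSERT /*+ APPEND */ INTO your_table
SELECT * FROM other_table;
COMMIT;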
If you're just inserting records without the hint then they will mainly reuse the empty space vacated by the prior deletion, so you don't need to concern yourself with the HWM.
Incidentally, deleting and re-inserting 5.67 lakh (567,000) records every day sounds like rather poor practice. There are probably better solutions (such as MERGE), depending on the underlying business rules you're trying to satisfy.
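For example, a MERGE along these lines (table and column names are hypothetical) refreshes the changing rows in place instead of deleting and re-inserting them:
MERGE INTO your_table t
USING source_table s
  ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.col1 = s.col1
WHEN NOT MATCHED THEN
  INSERT (id, col1) VALUES (s.id, s.col1);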
If you reinsert the same amount of data, the table should be roughly the same size. Therefore, I would not worry about the high water mark too much.
Having said that, it is good practice to gather statistics if the data has changed substantially, so I would recommend that.
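A minimal sketch of a daily stats gather (the schema and table names are placeholders):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'YOUR_SCHEMA', tabname => 'YOUR_TABLE');
END;
/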

Smart chunking from a huge table

I have a huge table in a data warehouse (Vertica). I am accessing this table in chunks for optimization purposes. The way I decide my chunks is pretty straightforward. I have a primary key column, say A, and I take MAX(A). With a chunk size of 20000 I end up with (MAX(A)/20000)+1 chunks. I then build a query for each chunk and retrieve the data.
The problems with this approach are as follows:
My number of chunks depends on MAX(A), and since MAX(A) grows very fast, the number of chunks grows with it.
I chose 20000 because that is what gives me optimal performance. But the distribution of primary keys within the 20000-wide ranges is very uneven. For example, the range 0-20000 might contain only 3 rows and the range 20000-40000 might contain 500 rows, and no range comes close to 20000.
I am trying to figure out whether there are any good approximation algorithms for this problem that minimize the number of chunks and put close to 20000 primary keys in each chunk.
Any pointers towards a solution are appreciated.
I'm not sure what 'optimization purposes' means, but I think the best approach would be to create a timestamp column, or use an eligible existing timestamp column, to partition on. You could then partition on a larger frame of reference so there isn't a wide range between the partitions.
If the table is partitioned, it will be able to benefit from partition pruning. This means that Vertica can eliminate the storage containers during query execution which do not match on the timestamp predicate.
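A sketch of what that could look like, assuming the table gains (or already has) a timestamp column called created_at; the exact clause depends on your Vertica version:
-- hypothetical: partition by year+month derived from created_at, so predicates
-- on created_at let Vertica skip whole storage containers
ALTER TABLE big_table
  PARTITION BY EXTRACT(YEAR FROM created_at) * 100 + EXTRACT(MONTH FROM created_at)
  REORGANIZE;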
Otherwise, you can look at the segmentation clause and use the max/min from the storage containers. This could be slightly more complicated.

Time for updating a sqlite3 index fluctuates too much

I have a large-ish sqlite3 (3.6.22) database (about 1 GB, 5 million rows) with a single table indexed on one column. The problem is that the time to do a typical INSERT transaction fluctuates widely. I insert about 10000 rows at a time (wrapped in a transaction of course). Often it takes about 1.5 seconds, but about every fifth transaction it suddenly takes several minutes for the very same transaction to complete. I've done a lot of experimentation, and I've discovered that the phenomenon only occurs if there is an index, which makes me think it is the index update that takes a lot of time.
I need more consistent performance. A somewhat higher average insertion time would be OK, if I could only avoid some transactions suddenly taking 200x as long as the previous one... What should I do?
Here's the schema. The strings in blocks.md5 are always exactly 32 bytes long and likely unique. The rolling.value column will contain very large 64-bit integers.
CREATE TABLE blocks (blob    char(32) NOT NULL,
                     offset  long NOT NULL,
                     md5     char(32) NOT NULL,
                     row_md5 char(32));
CREATE TABLE rolling (value INT NOT NULL);
CREATE INDEX index_md5 ON blocks (md5);
CREATE UNIQUE INDEX index_rolling ON rolling (value);
I don't know exactly how sqlite indexes are implemented, but I'd expect the behavior you describe if they were storing the index on disk or reordering the data.
Imagine a scenario where, when blocks are allocated for the index, each page starts with N slots for data. When a page fills up, another has to be allocated and the data split between the two.
When you're inserting your data, the ordering of the MD5 will be as random as it gets, so every page will fill up independently. There isn't any reasonable way for the indexing strategy to know that.
Other databases will even recommend using different indexing strategies than normal for strings, especially in the case of something like random MD5s.
Trying to do this in an all-in-memory database would tell you whether the problem is algorithmic or disk access.
I've only really tried to avoid this in an offline system where I could sort the data before inserting. After it was all inserted I would build the index, and that was as fast as anything I could find. If you're inserting 10k rows at a time, that might fit your use case, though I don't know.
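A sketch of that "load first, index afterwards" approach, using the index name from the question's schema:
-- drop the expensive index, do all the batched inserts, then build it once
DROP INDEX IF EXISTS index_md5;
-- ... run the bulk INSERTs into blocks here ...
CREATE INDEX index_md5 ON blocks (md5);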

Are there any benefits from partitioning an Oracle table on the same HDD?

I'm looking for a solution to improve insert times for concurrent inserts. Will I get any benefit from Oracle partitioning without providing dedicated hardware for every partition?
What is the bottleneck in your current insert process? I'm guessing from the "high concurrency" in your question that you're talking about an OLTP app where there are a large number of single-row inserts rather than a small number of many-row inserts that would be common in a data warehouse.
In an OLTP scenario, it is relatively unlikely that partitioning will decrease the time required to do a single-row insert. Assuming that you've already eliminated the obvious time wasters like triggers on the table, most of the insert overhead is likely to be index maintenance with a bit of I/O for the writes to the redo logs. Partitioning likely wouldn't reduce any of these because in an OLTP environment you generally can't load into a staging table and do a partition exchange which would reduce the index maintenance costs.
Well, like everything else, it depends.
Partitioning can reduce contention and eliminate hot blocks. For example, imagine if you will, a transaction system. If you partitioned by hash across some surrogate customer ID value, each index would be significantly smaller, and potentially less subject to contention and index root splits.
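A sketch of that layout (all names hypothetical): hash-partition the table on the customer ID and make the index LOCAL, so each partition carries its own, much smaller index tree:
CREATE TABLE txn (
  txn_id      NUMBER NOT NULL,
  customer_id NUMBER NOT NULL,
  amount      NUMBER
)
PARTITION BY HASH (customer_id) PARTITIONS 8;
-- one index segment per table partition instead of one big shared b-tree
CREATE INDEX txn_customer_idx ON txn (customer_id) LOCAL;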
Another solution, if you have concurrency problems, is to use reverse-key indexes to counter "one-legged" indexes - where an indexed, sequence-populated column forces continual block splits on the right-hand edge. However, using reverse-key indexes prevents range scans from using the index, so beware.
It really depends on what Oracle wait events are part of your critical transaction path. What you're waiting on will generally dictate what solution is appropriate.
So it could help. It could also make the situation worse. Without more information about what's adding wait time - if anything - the internet can't help solve the problem.

DB Index speed vs caching

We have about 10K rows in a table. We want to have a form where we have a select drop down that contains distinct values of a given column in this table. We have an index on the column in question.
To increase performance I created a little cache table that contains the distinct values so we didn't need to do a select distinct field from table against 10K rows. Surprisingly it seems doing select * from cachetable (10 rows) is no faster than doing the select distinct against 10K rows. Why is this? Is the index doing all the work? At what number of rows in our main table will there be a performance improvement by querying the cache table?
For a DB, 10K rows is nothing. You're not seeing much difference because the actual calculation time is minimal, with most of it consumed by other, constant, overhead.
It's difficult to predict when you'd start noticing a difference, but it would probably be at around a million rows.
If you've already set up caching and it's not detrimental, you may as well leave it in.
10k rows is not much... start caring when you reach 500k ~ 1 million rows.
Indexes do a great job, especially if you just have 10 different values for that index.
This depends on numerous factors - the amount of memory your DB has, the size of the rows in the table, use of a parameterised query and so forth, but generally 10K is not a lot of rows and particularly if the table is well indexed then it's not going to cause any modern RDBMS any sweat at all.
As a rule of thumb I would generally only start paying close attention to performance issues on a table when it passes the 100K rows mark, and 500K doesn't usually cause much of a problem if indexed correctly and accessed by such. Performance usually tends to fall off catastrophically on large tables - you may be fine on 500K rows but crawling on 600K - but you have a long way to go before you are at all likely to hit such problems.
Is the index doing all the work?
You can tell how the query is being executed by viewing the execution plan.
For example, try this:
explain plan for select distinct field from table;
select * from table(dbms_xplan.display);
I notice that you didn't include an ORDER BY on that. If you do not include ORDER BY then the order of the result set may be random, particularly if Oracle uses the HASH algorithm for building the distinct list. You ought to check that.
So I'd look at the execution plans for the original query that you think is using an index, and at the one based on the cache table. Maybe post them and we can comment on what's really going on.
Incidentally, the cache table would usually be implemented as a materialised view, particularly if the master table is generally pretty static.
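A sketch of that (names are hypothetical, and the refresh strategy depends on how the master table changes):
CREATE MATERIALIZED VIEW distinct_field_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT DISTINCT field FROM your_table;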
Serious premature optimization. Just let the database do its job, maybe with some tweaking to the configuration (especially if it's MySQL, which has several cache types and settings).
Your query over 10K rows most probably uses a HASH UNIQUE (or SORT UNIQUE) operation.
As 10K rows most probably fit into the buffer cache and hash_area_size, all operations are performed in memory, and you won't notice any difference.
But if the query will be used as a part of a more complex query, or will be swapped out by other data, you may need disk I/O to access the data, which will slow your query down.
Run your query in a loop in several sessions (as many sessions as there will be users connected), and see how it performs in that case.
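A rough sketch of that kind of test (placeholder names again); start this block in several sessions at once to simulate the expected concurrency:
BEGIN
  FOR i IN 1 .. 1000 LOOP
    FOR r IN (SELECT DISTINCT field FROM your_table) LOOP
      NULL;  -- just fetch the rows; we only care about the timing
    END LOOP;
  END LOOP;
END;
/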
For future plans and for scalability, you may want to look into an indexing service that uses pure memory or something faster than the TCP DB round-trip. A lot of people (including myself) use Lucene to achieve this by normalizing the data into flat files.
Lucene has a built-in RAMDirectory, which can hold the index entirely in memory - removing the dependency on the file system and greatly increasing speed.
Lately, I've architected systems that have a single RAM-based index wrapped by a web service. Then, I have my Ajax-style dropdowns query that web service for high availability and high speed - no DB layer, no file system, just pure memory and, if remote, TCP packet speed.
If you have an index on the column, then all the values are in the index and the DBMS never has to look at the table. It just looks in the index, which has only 10 entries. If this is mostly read-only data, then cache it in memory. Caching helps scalability a lot by relieving the database of work. A query that is quick on a database with no users might perform poorly if 30 queries are going on at the same time.
