ROWDEPENDENCIES Overhead in Oracle - oracle

I'm experimenting with a change capture architecture for ETL processing that is based on ora_rowscn, and have rebuilt the source tables with ROWDEPENDENCIES to isolate the SCN's to only those rows modified (as opposed to block-level tagging). I'm aware of the extra 6 bytes/row of space overhead, but it is not obvious to me what other impact this would have.
My question: What would be the extra work the RDBMS engine would do with rowdependencies enabled for commits and rollbacks? For my source tables with 100 to 500 rows/block I realize I must be writing 100-500x the number of SCN's (for our typical commits), but are there other side effects I'm missing?

Oracle introduced ROWDEPENDENCIES as part of a set of changes to optimize replication. It seems unlikely that they would have gone ahead if it did impact on performance. Certainly I haven't read of anything.
The inestimable Tom Kyte discusses using ROWDEPENDENCIES in one of his books, without any warnings or caveats (beyond mentioning the six bytes). If there are other gotachas I'm sure he would have said so.

Related

Will Shrinking and lowering the high water mark cause issues in OLTP systems

newb here, We have an old Oracle 10g instance that they have to keep alive until it is replaced. The nightly jobs have been very slow causing some issues. Every other Week there is a large process that does large amounts of DML (deletes, inserts, updates). Some of these tables have 2+ million rows. I noticed that some of the tables the HWM is higher than expected and in Toad I ran a database advisor check that recommended shrinking some tables, but I am concerned that the tables may need the space for DML operations or will shrinking them make the process faster or slower?
We cannot add cpu due to licensing costs
If you are accessing the tables with full scans and have a lot of empty space below the HWM, then yes, definitely reorg those (alter table move). There is no downside, only benefit. But if your slow jobs are using indexes, then the benefit will be minimal.
Don't assume that your slow jobs are due to space fragmentation. Use ASH (v$active_session_history) and SQL monitor (v$sql_plan_monitor) data or a graphical tool that utilizes this data to explore exactly what your queries are doing. Understand how to read execution plans and determine whether the correct plan is being used for your data. Tuning is unfortunately not a simple thing that can be addressed with a question on this forum.
In general, shrinking tables or rebuilding indexes should speed up reads of the table, or anything that does full table scans. It should not affect other DML operations.
When selecting or searching data, all of the empty blocks in the table and any indexes used by the query must still be read, so rebuilding them to reduce empty space and lower the high water mark will generally improve performance. This is especially true in indexes, where space lost to deleted rows is not recovered for reuse.

Use Vertica Database for OLTP data?

Can Vertica Database be used for OLTP data?
And if so what are the pros and cons on doing this?
Looking for a Vertica vs Oracle fight :)Since Oracle license is so costly, would Vertica do it job for a better price ?
thx all
Using Vertica as a transactional database is a bad idea. It's designed to be a data warehousing tool. Essentially, it reads and writes data in an optimized fashion. Lots of transactions? That's not what it is designed to do.
I would recommend that you look into VoltDB. Michael Stonebreaker who is the force behind Vertica founded that company as well. His basic philosophy is that Oracle, SQL Server, et al do not do well for high performance since they are designed to do everything. The future is having databases designed for specific tasks.
So he had some concepts for a data warehousing which became Vertica. For transactional databases, there's VoltDB. Not owned by HP, for the record.
For the record, I haven't used VoltDB. From what I know, it isn't as mature as Vertica is as a solution but it looks like it has a ton of promise.
HP Vertica is a column store database. The nature of the way that data is organised within a column store does not lend itself to rapid writes.
HP Vertica gets around this by having a WOS (Write Optimised Store) and ROS (Read Optimised Store which is file based).
Data is moved out of the WOS into the ROS fairly rapidly and the ROS itself has a "merge up" process that takes small ROS files and merges them together to form larger and therefore more easily scanned files.
If you tried to use Vertica for OLTP then what would happen would be that you'd get loads of ROS containers and possibly hit the default limit of 1024 ROS containers very quickly.
If you fronted the store with some form a queuing mechanism to pass through records in larger batches then this would result in fewer and larger ROS files. It would work but if you wanted to take your OLTP system to be reading very close to its writing activity it would not fit the use case.
The WOS/ROS mechanism is a neat work around for the fundamental performance penalty of writes in a column store DB but fundamentally Vertica is not an OLTP DB but rather a data mart technology that can ingest data in near real time
I think there are different ways to read into this question.
Can you use Vertica as an OLTP database?
First I'll define this question a bit. An OLTP database means the database itself is responsible for the transaction processing, not simply receiving somewhat normalized data.
My answer here is absolutely not, unless perhaps it is a single user database. There is practically no RI, no RI locking, table locks on DELETE/UPDATE, and you're likely to accumulate a delete vector in normal OLTP type usage.
You can work around some of these with some extensive middleware programming (distributed locks, heavy avoidance of DELETE/UPDATE, etc). But why? There are tons of options out there that are not Oracle, don't carry a huge price tag but give you everything you need for OLTP.
Can you use Vertica to ingest and query OLTP data?
Yes, definitely. Best to use Vertica towards its strengths, though. Queries in Vertica tend to have a fair amount of overhead, and you can plow through large amounts of data with ease, even normalized. I would not be using Vertica to primary run point queries, grabbing a few rows here and there. It isn't that you can't, but you can't with the same concurrency as other databases that are meant for this purpose.
TL;DR Use the right tool for the right job. I really love using Vertica, but just because I like to swing a hammer doesn't mean that every problem is a nail.
This question is a little old now but i'll share my experience.
I would not suggest vertica as OLTP unless you very carefully consider your workload.
As mentioned in other answers, Vertica has 2 types of storage. ROS is the Read Optimized Storage and WOS is the Write Optimized Storage. WOS is purely in memory so it performs better for inserts but queries slower as all the small updates need to be queried and unioned. Vertica can handle small loads in theory but in practice it didn't work out very well for us performance wise. Also there are drawbacks to WOS namely being that when the database fails WOS is not necessarily preserved when it rolls back to last good epoch. (ROS isn't either but in practice you lose a lot less from ROS).
ROS is a lot more reliable and gives better read performance but you will never be able to handle more than a certain number of queries without a careful design. Although vertica is horizontally scalable, in practice large tables get segmented across all nodes and therefore queries must run on all nodes. So adding more nodes doesn't mean handling more concurrent queries it just means less work per query. If your tables are small enough to be unsegmented then this might not be an issue for you.
Also worth noting is the OLTP typically implies lots concurrent transactions so you'll need to plan resource pools very carefully. By default vertica has a planned concurrency for the general resource pool of the minimum of number of cores per server or RAM/2GB. Essentially what this value does is determine the default memory allocation PER NODE for a segmented query. Therefore by default vertica will not let you run more queries than cores. You can adjust this value but once you hit a cap on memory theres no much you can do because the memory is allocated per node so adding more nodes doesn't even help. If you hit any errors at all for resource pool memory allocations that is the first config your should look at.
Additionally, Vertica is bad with deletes and updates (which resolve to a delete and an insert in the background) so if these are a regular part of your workload then Vertica is probably a bad choice. Personally we use MySQL for our dimension tables that require deletes/updates and then sync that data periodically into vertica to use for joins.
Personally I use Vertica as an OLTP-ish realtime-ish database. We batch our loads into 5 minute intervals which makes vertica happy in terms of how many/large the inserts are. These batches are inserted using COPY DIRECT so that they avoid WOS entirely (only do this if they are large batches as this forces ROS container creation and can be bad if you do it too often). As many projections as we can have are unsegmented to allow better scale out since this makes queries hit only 1 node and allocate memory on only 1 node. It has worked well for us so far and we load about 5 billion rows a day with realtime querying from our UI.
Up_one - considering the telecom use-case - are you doing CDR or something else?
To answer your original question yes Vertica may be a great fit but it depends on how you are loading the data, how you are doing updates, what your data size is and what your SLA is. I am really familiar in this space because I implemented Vertica at a telecom that I worked for at the time.

Setting workarea_size_policy to manual vs automatic

I am working on a data warehouse system which was upgraded about a year ago to Oracle 10g (now 10.2.0.5).
The database is set up with workarea_size_policy=auto and pga_aggregate_target=1G. Most of the ETL process is written in PL/SQL and this code generally sets workarea_size_policy=manual and sets the SORT_AREA_SIZE and HASH_AREA_SIZE for particular sessions when building specific parts of the warehouse.
The values chosen for the SORT_AREA_SIZE and HASH_AREA_SIZE are different for different parts of the build. These sizes are probably based on the expected amount of data that will be processed in each area.
The problem I am having is that this code is starting to cause a number of ORA-600 errors to occur. It is making me wonder if we should even be overriding the automatic settings at all.
The code that sets the manual settings was written many years ago by a developer who is no longer here. It was probably originally written for Oracle 8 with an amendment for Oracle 9 to set the workarea_size_policy to manual. No one really knows how the values used for HASH_AREA_SIZE and SORT_AREA_SIZE were found. They could be completely inappropriate for all I know.
After that long preamble, I've got a few questions.
How do I know when (if ever) I should be overriding the manual settings with workarea_size_policy=manual?
How do I find appropriate values for HASH_AREA_SIZE, SORT_AREA_SIZE, etc?
How do I benchmark that particular settings are actually providing any sort of benefit?
I'm aware that this is a pretty broad question but help would be appreciated.
I suggest you comment out the manual settings and do a test run only with automatic (dynamic) settings, like PGA_AGGREGATE_TARGET.
Management of Sort and Hash memory areas has improved a lot since Oracle 8!
It's hard to predetermine the memory requirements of your procedures, so the best is to test them with representative volumes of data and see how it goes.
You can then create an AWR report covering the timeframe of the execution of the procedures. There's a section in the report named PGA Memory Advisory. That will tell you if you need more memory assigned to PGA_AGGREGATE_TARGET, based on your current data volumes.
See sample here:
In this case you can clearly see that there's no need to go over the current 103 MB assigned, and you could actually stay at 52 MB without impacting the application.
Depending on the volumes we're talking about, if you can't assign more memory, some Sort or Hash operations might spill to a TEMPORARY tablespace, so make sure you have a properly sized one and possibly spread across as many disks / volumes as possible (see SAME configuration, also here).

What makes Oracle more scalable?

Oracle seems to have a reputation for being more scalable than other RDBMSes. After working with it a bit, I can say that it's more complex than other RDBMSes, but I haven't really seen anything that makes it more scalable than other RDBMSes. But then again, I haven't really worked on it in a whole lot of depth.
What features does Oracle have that are more scalable?
Oracle's RAC architecture is what makes it scalable where it can load balance across nodes and parallel queries can be split up and pushed to other nodes for processing.
Some of the tricks like loading blocks from another node's buffer cache instead of going to disc make performance a lot more scalable.
Also, the maintainability of RAC with rolling upgrades help make the operation of a large system more sane.
There is also a different aspect of scalability - storage scalability. ASM makes increasing the storage capacity very straightforward. A well designed ASM based solution, should scale past the 100s of terabyte size without needing to do anything very special.
Whether these make Oracle more scalable than other RDBMSs, I don't know. But I think I would feel less happy about trying to scale up a non-Oracle database.
Cursor sharing is (or was) a big advantage over the competition.
Basically, the same query plan is used for matching queries. An application will have a standard set of queries it issue (eg get the orders for this customer id). The simple way is to treat every query individually, so if you see 'SELECT * FROM ORDERS WHERE CUSTOMER_ID = :b1', you look at whether table ORDERS has an index on CUSTOMER_ID etc. As a result, you can spend as much work looking up meta data to get a query plan as actually retrieving the data. With simple keyed lookups, a query plan is easy. Complex queries with multiple tables joined on skewed columns are harder.
Oracle has a cache of query plans, and older/less used plans are aged out as new ones are required.
If you don't cache query plans, there's a limit to how smart you can make your optimizer as the more smarts you code into it, the bigger impact you have on each query processed. Caching queries means you only incur that overhead the first time you see the query.
The 'downside' is that for cursor sharing to be effective you need to use bind variables. Some programmers don't realise that and write code that doesn't get shared and then complain that Oracle isn't as fast as mySQL.
Another advantage of Oracle is the UNDO log. As a change is done, the 'old version' of the data is written to an undo log. Other database keep old versions of the record in the same place as the record. This requires VACUUM style cleanup operations or you bump into space and organisation issues. This is most relevant in databases with high update or delete activity.
Also Oracle doesn't have a central lock registry. A lock bit is stored on each individual data record. SELECT doesn't take a lock. In databases where SELECT locks, you could have multiple users reading data and locking each other or preventing updates, introducing scalability limits. Other databases would lock a record when a SELECT was done to ensure that no-one else could change that data item (so it would be consistent if the same query or transaction looked at the table again). Oracle uses UNDO for its read consistency model (ie looking up the data as it appeared at a specific point in time).
Tom Kyte's "Expert Oracle Database Architecture" from Apress does a good job of describing Oracle's architecture, with some comparisons with other rDBMSs. Worth reading.

How to minimize transaction overhead in Oracle?

I have to simultaneously load data into a table and run queries on it. Because of data nature, I can trade integrity for performance. How can I minimize the overhead of transactions?
Unfortunately, alternatives like MySQL cannot be used (due to non-technical reasons).
Other than the general optimization practices that apply to all databases such as eliminating full table scans, removing unused or inefficient indexes, etc., etc., here are a few things you can do.
Run in No Archive Log mode. This sacrifices recoverability for speed.
For inserts use the /*+ APPEND */ hint. This puts data into the table above the high water mark which does not create UNDO. The disadvantage is that existing free space is not used.
On the hardware side, RAID 0 over a larger number of smaller disks will give you the best insert performance, but depending on your usage RAID 10 with its better read performance may provide a better fit.
This said, I don't think you will gain much from any of these changes.
Perhaps I'm missing something, but since in Oracle readers don't block writers and writers don't block readers, what exactly is the problem you are trying to solve?
From the perspective of the sessions that are reading the data, sessions that are doing inserts aren't really adding any overhead (updates might add a bit of overhead as the reader would have to look at data in the UNDO tablespace in order to reconstruct a read-consistent view of the data). From the perspective of the sessions that are inserting the data, sessions that are doing reads aren't really adding any overhead. Of course, your system as a whole might have a bottleneck that causes the various sessions to contend for resources (i.e. if your inserts are using up 100% of the available I/O bandwidth, that is going to slow down queries that have to do physical I/O), but that isn't directly related to the type of operations that the different sessions are doing-- you can flood an I/O subsystem with a bunch of reporting users just as easily as with a bunch of insert sessions.
You want transaction isolation read uncommitted. I don't recommend it but that's what you asked for :)
This will allow you to breach transaction isolation and read uncommitted inserted data.
Please read this Ask Tom article: http://www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html.
UPDATE: I was actually mistaking, Oracle doesn't really support read uncommitted isolation level, they just mention it :).
How about you try disabling all constraints in your table, then inserting all the data, then enabling them back again?
i.e. alter session set constraints=deffered;
However, if you had not set the constraints in your table to defferable during table creation, there might arise a slight problem.
What kind of performance volumes are you looking at? Are inserts batched or numerous small ones?
Before banging your head against the wall trying to think of clever ways to have good performance, did you create any simple prototypes which would give you a better picture of the out-of-the-box performance? It could easily turn out that you don't need to do anything special to meet the goals.

Resources