Efficiency of storing an Oracle User Defined Type in an index organized table - oracle

Are there known issues with storing user defined types within index organized tables in Oracle 10G?
CREATE OR REPLACE TYPE MyList AS VARRAY(256) OF NUMBER(8,0);
CREATE TABLE myTable (
id NUMBER(10,0) NOT NULL,
my_list MyList NOT NULL)
CONSTRAINT pk_myTable_id PRIMARY KEY(id))
ORGANIZATION INDEX NOLOGGING;
With this type and table setup, I loaded via insert append ~2.4M records and it took 20G of space at which point I ran out of disk space. Looking at the size of the data types, this seemed to be taking up a lot of space for what was being stored. I then changed the table to be a regular table (not IOT) and I stored 6M+ records in ~7G of storage, added the PK index which took an additional 512M.
I've used IOT many times in the past, but not with a user defined type.
Why is it that the storage requirements when using a UDT and IOT are so high?

AFAICR, Oracle always stores VARRAY's out of row in the IOT's.
I'll now try to find the references in the docs to confirm this.

Related

Fast data migration on the same database

I'm trying to find a way to perform a migration from two tables on the same database. This migration should be as fast as possible in order to minimize the downtime.
To put it on an example lets say I have a person table like so:
person_table -> (id, name, address)
So a person as an Id, a name and an address. My system will contain millions of person registries and it was decided that the person table should be partitioned. To do so, I've created a new table:
partitioned_person_table->(id,name,address,partition_time)
Now this table will contain an extra column called partition_time. This is the partition key for this table since this is a range partition (one partition every hour).
Finally, I need to find a way to move all the information from the person_table to the partitioned_person_table with the best performance.
The first thing I could try is to simply create a statement like:
INSERT INTO partitioned_person_table (id, name, address, partition_time)
SELECT id, name, address, CURRENT_TIMESTAMP FROM person_table;
The problem is that when it comes to millions of registries this might become very slow (also the temporary tablespace might not be able to handle all this information)
My second approach was to use the EXCHANGE PARTITION method. Unfortunetly, I cannot do this because the tables contain diffrent column numbers.
Is there any other way that I can perfom this with the best performance (less downtime) ?
Thank you.
If you can live with the state, that all the current records would be located in one partition (and your INSERT approach suggest that), you may only
1) add a new column partition_time either as NULL or possible with metadata default only - required 12c
2) switch the table to a partitioned table either with online redefinition (if you have no maintainace window, where the table is offline) or with exchange partition otherwise.

Update Index Organized Tables using multiple UPDATE queries (temporary duplicates)

I need to update the primary key of a large Index Organized Table (20 million rows) on Oracle 11g.
Is it possible to do this using multiple UPDATE queries? i.e. Many smaller UPDATEs of say 100,000 rows at a time. The problem is that one of these UPDATE batches could temporarily produce a duplicate primary key value (there would be no duplicates after all the UPDATEs have completed.)
So, I guess I'm asking is it somehow possible to temporarily disable the primary key constraint (but which is required for an IOT!) or alter the table temporarily some other way. I can have exclusive and offline access to this table.
The only solution I can see is to create a new table and when complete, drop the original table and rename the new table to the original table name.
Am I missing another possibility?
You can't disable / drop the primary key constraint from an IOT, since it is a unique index by definition.
When I need to change an IOT like this, I either do a CTAS (create table as) for a new plain heap table, do my maintenance, and then CTAS a new IOT.
Something like:
create table t_temp as select * from t_iot;
-- do maintenance
create table t_new_iot as select * from t_temp;
If, however, you need to simply add or join a new field to the existing key, you can do this in one step by creating the new IOT structure, then populating directly from the old IOT with a query.
Unfortunately, this is one of the downsides to IOTs.
I would recommend following method:
Create new IOT table partitioned by system with single partition
with exactly same structure as current one.
Lock current IOT table to prevent any DML.
insert into new table as select from current table changing PK values in select. This step
could be repeated several times if needed. In this case it's better
to do it in another session to keep lock on original table.
Exchange partition of new table with original table.

Cassandra data model

I am a cassandra newbie trying to see how I can model our current sql data in cassandra. The database stores document metadata that includes document_id, last_modified_time, size_in_bytes among a host of other data, and the number of documents can be arbitrarily large and hence we are looking for a scalable solution for storage and query.
There is a requirement of 2 range queries
select all docs where last_modified_time >=x and last_modified_time
select all docs where size >= x and size <= y
And also a set of queries where docs needs to be grouped by specific metadata e.g.
select all docs where user in (x,y,z)
What is the best practice of designing the data model based on these queries?
My initial thought is to have a table (in Cassandra 2.0, CQL 3.0) with the last_mod_time as the secondary index as follows
create table t_document (
document_id bigint,
last_mod_time bigint ,
size bigint,
user text,
....
primary key (document_id, last_mod_time)
}
This should take care of query 1.
Do I need to create another table with the primary key as (document_id, size) for the query 2? Or can I just add the size as the third item in the primary key of the same table e.g. (document_id, last_mod_time, size). But in this case will the second query work without using the last_mod_time in the where clause?
For the query 3, which is all docs for one or more users, is it the best practice to create a t_user_doc table where the primary key is (user, doc_id)? Or a better approach is to create a secondary index on the user on the same t_document table?
Thanks for any help.
When it comes to inequalities, you don't have many choices in Cassandra. They must be leading clustering columns (or secondary indexes). So a data model might look like this:
CREATE TABLE docs_by_time (
dummy int,
last_modified_time timestamp,
document_id bigint,
size_in_bytes bigint,
PRIMARY KEY ((dummy),last_modified_time,document_id));
The "dummy" column is always set to the same value, and is sued as a placeholder partition key, with all data stored in a single partition.
The drawback to such a data model is that, indeed, all data is stored in a single partition. There is the maximum of 2 billion cells per partition, but more importantly, a single partition never spans nodes. So this approach doesn't scale.
You could create secondary indexes on a table:
CREATE TABLE docs (
document_id bigint,
last_modified_time timestamp,
size_in_bytes bigint,
PRIMARY KEY ((dummy),last_modified_time,document_id));
CREATE INDEX docs_last_modified on docs(last_modified);
However secondary indexes have important drawbacks (http://www.slideshare.net/edanuff/indexing-in-cassandra), and aren't recommended for data with high cardinality. You could mitigate the cardinality issue somewhat by reducing precision on last_modified_time by, say, only storing the day component.

Why a primary key automatically creates a clustered index

When i create a primary key in oracle table, why does it create a 'clustered' index by default. What is the reason for this automatic creation of a clustered index on creation of a primary key? is it just the Oracle designer's preference that he designed oracle in this way?
Oracle will create an index to police an unique constraint where no pre-existing index is suitable. Without the index, Oracle would need to serialize operations (such as a table lock) whenever someone tries to insert or delete a row (or update the PK).
Contrarily to MS-SQL Server, this index is not clustered on heap tables (default table organization), i.e. this index won't change the underlying table structure and natural order. The rows won't be reordered when Oracle creates the index. The index will be a B-tree index and will exist as a separate entity where each entry points to a row in the main table.
Oracle doesn't have clustered index as MS SQL, however indexed-organized tables share some properties with cluster-indexed tables. The PK is an integral part of such tables and has to be specified during creation.
(Oracle also has table clusters, but they are a completely different concept).
Creating Index is basic functionality of Primary key, it is also in SQL Server and MySQL, Clustered Index makes your searches faster.
The Database Engine automatically creates a unique index to enforce the uniqueness of the PRIMARY KEY constraint. If a clustered index does not already exist on the table or a nonclustered index is not explicitly specified, a unique, clustered index is created to enforce the PRIMARY KEY constraint.
Read this:
http://www.sqlskills.com/blogs/kimberly/the-clustered-index-debate-continues/

Oracle Table structure

I have a table in Oracle Database which has 60 columns. Following is the table structure.
ID NAME TIMESTAMP PROERTY1 ...... PROPERTY60
This table will have many rows. the size of the table will be in GBs. But the problem with the table structure is that in future if I have to add a new property, I have to change the schema. To avoid that I want to change the table structure to following.
ID NAME TIMESTAMP PROPERTYNAME PROPERTYVALUE
A sample row will be.
1 xyz 40560 PROPERTY1 34500
In this way I will be able to solve the issue but the size of the table will grow bigger. Will it have any impact on performance in terms on fetching data. I am new to Oracle. I need your suggestion on this.
if I have to add a new property, I have to change the schema
Is that actually a problem? Adding a column has gotten cheaper and more convenient in newer versions of Oracle.
But if you still need to make your system dynamic, in a sense that you don't have to execute DDL for new properties, the following simple EAV implementation would probably be a good start:
CREATE TABLE FOO (
FOO_ID INT PRIMARY KEY
-- Other fields...
);
CREATE TABLE FOO_PROPERTY (
FOO_ID INT REFERENCES FOO (FOO_ID),
NAME VARCHAR(50),
VALUE VARCHAR(50) NOT NULL,
CONSTRAINT FOO_PROPERTY_PK PRIMARY KEY (FOO_ID, NAME)
) ORGANIZATION INDEX;
Note ORGANIZATION INDEX: the whole table is just one big B-Tree, there is no table heap at all. Properties that belong to the same FOO_ID are stored physically close together, so retrieving all properties of the known FOO_ID will be cheap (but not as cheap as when all the properties were in the same row).
You might also want to consider whether it would be appropriate to:
Add more indexes in FOO_PROPERTY (e.g. for searching on property name or value). Just beware of the extra cost of secondary indexes in index-organized tables.
Switch the order of columns in the FOO_PROPERTY PK - if you predominantly search on property names and rarely retrieve all the properties of the given FOO_ID. This would also make the index compression feasible, since the leading edge of the index is now relatively wide string (as opposed to narrow integer).
Use a different type for VALUE (e.g. RAW, or even in-line BLOB/CLOB, which can have performance implications, but might also provide additional flexibility). Alternatively, you might even have a separate table for each possible value type, instead of stuffing everything in a string.
Separate property "declaration" to its own table. This table would have two keys: beside string NAME it would also have integer PROPERTY_ID which can then be used as a FK in FOO_PROPERTY instead of the NAME (saving some storage, at the price of more JOIN-ing).

Resources