Oracle table and index partitioning - risks and disadvantages

I have large unpartitioned tables in the database (100 GB+), and to improve performance I am thinking about partitioning them, or maybe just their indexes. Data comes in on a regular basis and is selected by date, so I think range partitioning by month of creation date would be a good option.
I am reading about Oracle table and index partitioning, and it looks quite promising.
But I have two questions for which I cannot find answers (I think my Google skills are going down).
First:
What are the risks and disadvantages of creating partitioned tables and indexes in Oracle, in particular on such large, live tables? Is there something that I should know about?
Second:
How do I partition an existing unpartitioned table or index?

Besides the outage (see below) needed to partition your data, the main risk I see is that if you partition your table and use local indexes, queries that do not filter on the partition key (the date) will not perform well, because they have to probe the index in every partition. You can use global indexes in that case and get back to performance similar to the unpartitioned table.
By far the simplest way to create a partitioned table from an unpartitioned one is to use CREATE TABLE ... AS SELECT with a new name and all the partition storage details, drop the unpartitioned table, and rename the new table to the old name. Obviously, this requires careful preparation and an outage that can last a few minutes :)
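As a rough sketch of the create-table-as-select approach, assuming a hypothetical table `orders` with a DATE column `created_at` and a `customer_id` column (none of these names come from the question), the steps might look like this. It renames the old table instead of dropping it straight away, which is a slightly more cautious variation.

```sql
-- Assumed names: orders, orders_part, created_at, customer_id (illustrative only).
-- 1. Copy the data into a new table that is range partitioned by month.
CREATE TABLE orders_part
PARTITION BY RANGE (created_at)
(
  PARTITION p2024_01 VALUES LESS THAN (DATE '2024-02-01'),
  PARTITION p2024_02 VALUES LESS THAN (DATE '2024-03-01'),
  PARTITION p_max    VALUES LESS THAN (MAXVALUE)
)
AS SELECT * FROM orders;

-- 2. CTAS does not copy indexes, grants, triggers, or most constraints,
--    so recreate them. LOCAL builds one index partition per table partition.
CREATE INDEX orders_part_created_ix ON orders_part (created_at) LOCAL;

--    Without LOCAL the index is global (not partitioned), which suits
--    queries that do not filter on the partition key.
CREATE INDEX orders_part_customer_ix ON orders_part (customer_id);

-- 3. Swap the names during the outage window.
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_part RENAME TO orders;

-- 4. Drop the old copy only after the new table has been verified.
-- DROP TABLE orders_old;
```

Any rows written to `orders` between step 1 and step 3 would be missing from the new table, which is why writes need to be stopped for the duration.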


Oracle partitioning recommendations

Due to being locked down by Corona, I don't have easy access to my more knowledgeable colleagues, so I'm hoping for a few possible recommendations here.
We do quarterly and yearly "freezes" of a number of statistical entities with a large number (1-200) of columns. Everyone then uses these "frozen" versions as a common basis for all statistical releases in Denmark. Currently, we simply create a new table for each version.
There's a demand to test if we can consolidate these several hundred tables to 26 entity-based tables to make programming against them easier, while not harming performance too much.
A "freeze" is approximately 1 million rows and consists of: Year + Period + Type + Version.
For example:
2018_21_P_V1 = Preliminary Data for 2018 first quarter version 1
2019_41_F_V2 = Final Data for 2019 yearly version 2
I am simply not very experienced in the world of partitions. My initial thought was to partition on Year + Period and subpartition on Type + Version, but I am no longer sure this is the right approach, nor do I have a clear picture of which partitioning type would solve the problem best.
I am hoping someone can recommend an approach as it would help me tremendously and save me a lot of time "brute force" testing a lot of different combinations.
Based on the situation you describe, I strongly recommend using partitioning. No doubt.
It's highly effective and easy to use. You can read the Oracle documentation about partitioning or search the web to understand how to start.
In general, when you partition a table, Oracle stores each partition as a separate segment and can skip the partitions that a query's predicates rule out (partition pruning), so don't worry about the speed of fetching data.
The most important step is to choose the best column(s) to base your partitions on. For daily partitions I used the date stored as a number (for example 20190506), and for monthly partitions a value such as 201907. You should design and test this for your own data.
The next step is to decide about subpartitions. In some cases, you don't really need any. It depends on your data structure and your expectations of the data. What do you want to do with the data? Which columns are most important (used in the WHERE clause, and so on)?
Then create indexes on the partitioned table, typically local indexes so that each partition gets its own index segment. This is very important.
Another important point is that partitioning may change the way you code in PL/SQL. An ordinary query with a predicate on the partition key can read any number of partitions, and Oracle prunes the rest. Only the explicit PARTITION (name) clause restricts a table reference to a single named partition, so if you use that syntax you address the partitions one at a time.
And don't worry about 1 million rows. I have used partitioning for tables far larger than this and it works fine.
Good luck.
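To make the discussion more concrete, here is one possible layout among several, not a recommendation. It is a simplified sketch that range partitions on the year and subpartitions on the type; every table and column name is an assumption made for illustration, and the period and version are left as ordinary columns.

```sql
-- Assumed names throughout: entity_freeze and all its columns are illustrative.
CREATE TABLE entity_freeze (
  freeze_year    NUMBER(4)   NOT NULL,   -- e.g. 2018
  freeze_period  NUMBER(2)   NOT NULL,   -- e.g. 21 or 41
  freeze_type    VARCHAR2(1) NOT NULL,   -- 'P' = preliminary, 'F' = final
  freeze_version NUMBER(2)   NOT NULL,   -- e.g. 1 for V1
  some_measure   NUMBER
)
PARTITION BY RANGE (freeze_year)
SUBPARTITION BY LIST (freeze_type)
SUBPARTITION TEMPLATE (
  SUBPARTITION sp_prelim VALUES ('P'),
  SUBPARTITION sp_final  VALUES ('F')
)
(
  PARTITION y2018 VALUES LESS THAN (2019),
  PARTITION y2019 VALUES LESS THAN (2020),
  PARTITION y_max VALUES LESS THAN (MAXVALUE)
);

-- Predicates on the partitioning columns let Oracle prune to the matching
-- subpartition; period and version are filtered within it.
SELECT *
FROM entity_freeze
WHERE freeze_year = 2019
  AND freeze_type = 'F'
  AND freeze_period = 41
  AND freeze_version = 2;
```

Whether this performs better than partitioning on a single combined freeze identifier would still need testing against the real query patterns.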

How does oracle manage a hash partition

I understand the concept of range partitioning. If I have a date column and I partition on that column by month, then a query whose WHERE clause filters on a single month can hit just that partition and get my data without scanning the full table.
In the Oracle docs I read that if a logical partitioning key like "month" is not available (e.g. you partition on a column called customer ID), you should use hash partitioning. So how will this work? Will Oracle randomly divide the data, assign it to different partitions, and assign a hash code to each partition?
But in this situation, when new data comes in, how does Oracle know which partition to put the new data in? And when I query data, it seems there is no way to avoid hitting multiple partitions?
"how does oracle know in which partition to put the new data?"
From the documentation
Oracle Database uses a linear hashing algorithm and to prevent data
from clustering within specific partitions, you should define the
number of partitions by a power of two (for example, 2, 4, 8).
As for your other question ...
"when i query data, it seems there is no way to avoid hitting multiple
partitions?"
If you're searching for a single Customer ID then no. Oracle's hashing algorithm is consistent, so records with the same partition key end up in the same partition (obviously). But if you are searching for, say, all the new customers from the last month then yes. Oracle's hashing algorithm will strive to distribute records evenly so the latest records will be spread across the whole table.
So the real question is, why do we choose to partition a table? Performance is often the least compelling reason to partition. Better reasons include:
Availability: each partition can reside in a different tablespace. Hence a problem with a tablespace will take out a slice of the table's data instead of the whole thing.
Management: partitioning provides a mechanism for splitting whole-table jobs into clear batches. Partition exchange can make it easier to bulk load data.
As for performance, physical co-location of records can speed up some queries: those which search for records by a defined range of keys. However, any queries which don't match the grain of the partitioning won't perform faster (and may even perform slower) than on a non-partitioned table.
Hash partitioning is unlikely to provide performance benefits for range queries, precisely because it shuffles the keys across the whole table. It will provide the availability and manageability benefits of partitioning (but is obviously not particularly amenable to partition exchange).
A hash is not random; it divides the data in a repeatable (but perhaps difficult-to-predict) fashion, so that the same ID will always map to the same partition.
Oracle uses a hash algorithm that should usually spread the data evenly between partitions.
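A minimal sketch of what this looks like in practice, assuming a hypothetical `customers` table (the table and column names are not from the question):

```sql
-- Assumed names: customers, customer_id, name, created_at (illustrative only).
-- Eight partitions: a power of two, as the documentation quoted above advises.
CREATE TABLE customers (
  customer_id NUMBER        NOT NULL,
  name        VARCHAR2(100),
  created_at  DATE
)
PARTITION BY HASH (customer_id)
PARTITIONS 8;

-- Equality on the partition key: the same ID always hashes to the same
-- partition, so only that one partition needs to be read.
SELECT name
FROM customers
WHERE customer_id = 42;

-- No predicate on the partition key: the matching rows are spread over
-- all eight partitions, so every partition is visited.
SELECT name
FROM customers
WHERE created_at >= ADD_MONTHS(SYSDATE, -1);

-- After gathering statistics, the row count per partition shows how
-- evenly the hash function distributed the data.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CUSTOMERS');
END;
/

SELECT partition_name, num_rows
FROM user_tab_partitions
WHERE table_name = 'CUSTOMERS'
ORDER BY partition_position;
```

Because the partition count is given without names, Oracle generates the partition names itself, which is why the last query reads them from the data dictionary.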

HBase Inner join and coprocessors

I am planning a project to implement all aggregation operations in HBase, but I don't know how difficult it is. I have only 6 months to complete the project. Should I go forward with it? I am planning to do it in Java. I know that there are already some aggregation functions, but there are no INNER JOIN-like queries yet, and I am planning to implement that type of query. I don't know whether this would be a blunder.
I think technically we should distinguish two types of joins:
a) One small table + one big table. By small table I mean a table which can be cached in the memory of each node without seriously affecting cluster operation. In this case a join using a coprocessor should be possible: put the small table in a hash map, iterate over the node-local part of the big table's data, and produce the join results that way. In Hive's terms this is called a "map" join: http://www.facebook.com/note.php?note_id=470667928919.
b) Two big tables. I do not think it is viable to get this to production quality in a short time frame. I would say that such functionality is the realm of MPP databases and a serious part of their IP.
It is definitely harder in HBase than in an RDBMS or in a different Hadoop technology like Pig or Hive.

SQL query to search faster or using hash table

If I am looking for a record in the database, is it faster to write a SQL query that searches the database directly, or to read all the data from the database into a hash table and then search it in O(1) time?
This question is for experienced programmers who have faced such issues in the past.
If you know the primary key of the row, or the column you are searching on is indexed, then doing the retrieval using SQL will be much faster, especially if your table does not fit into memory.
Making a direct SQL query to the database will be much faster than first reading all the records into a hash table and then searching it. First, it saves the time spent loading all the records into the hash table. Second, it saves the large amount of memory that the hash table would consume.
I have experienced this kind of situations. Hope this helps you!
A SQL Server database is faster and better suited to this than a hash table.
There is one important reason.
To build the hash table, all the data must first be read from secondary storage and loaded into memory.
It is easy to see what happens then.
With a large amount of data held in memory, the system slows down, and it becomes difficult to manipulate and retrieve the records.
A DBMS, by contrast, is a much more convenient environment than a hash table. If you only need results from a few thousand records, you may not even need to create an index; it depends on your needs. It is also much easier to get answers from a remote machine in a three-tier application, because the DBMS takes care of things like row count and I/O speed.
If the SQL table is not indexed, you'd have to benchmark to find your answer. Since there are lots of factors, such as the row count, I/O speed, and network speed (if the database is on a remote machine), it is hard to give a single answer to the question.
On the other hand, indexing the table is the better choice. Just leave the DBMS's job to the DBMS.
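For illustration, a minimal sketch of the indexed approach, using a hypothetical `records` table (all names here are assumptions, and the syntax is kept generic so it is not specific to one DBMS):

```sql
-- Assumed names: records, id, email, payload (illustrative only).
CREATE TABLE records (
  id      INTEGER PRIMARY KEY,      -- lookups by primary key are indexed
  email   VARCHAR(100) NOT NULL,
  payload VARCHAR(200)
);

-- Secondary index so that lookups by email avoid a full table scan.
CREATE INDEX records_email_ix ON records (email);

-- Only the matching row travels to the application; nothing has to be
-- loaded into an in-memory hash table first.
SELECT id, payload
FROM records
WHERE email = 'someone@example.com';
```

If the same small set of rows is looked up very frequently, caching just those rows in the application is a separate question from loading the whole table.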

Oracle Hierarchical Query Performance

We're looking at using Oracle hierarchical queries to model potentially very large tree structures (potentially infinitely wide, with a depth of 30+). My understanding is that hierarchical queries provide a way to write recursively joining SQL, but that they do not provide any real performance improvement over manually writing an equivalent query. Is this the case? What experience have people had, performance-wise, with Oracle hierarchical queries?
Well, the short answer is that without the hierarchical extension (CONNECT BY) you couldn't write a recursive query in a single statement. (Oracle 11g Release 2 and later also accept a recursive WITH clause.) You could programmatically issue many queries that are recursively linked.
The rule of thumb with databases in general, and Oracle in particular, is that if you can get your result in a single query, it will almost always be faster than doing it programmatically.
My experience has been with much smaller sets, so I can't speak to how well hierarchical queries perform on large sets.
When doing these tree retrievals, you typically have these options
Query everything and assemble the tree on the client side.
Perform one query for each level of the tree, building on what you know that you need from the previous query results
Use the built-in syntax Oracle provides (START WITH, CONNECT BY PRIOR).
Doing it all in the database will reduce unnecessary round trips or wasteful queries that pull too much data.
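As a sketch of the third option, assuming a hypothetical `nodes` table in which each row stores its parent's ID (the table and column names are assumptions), a single hierarchical query can return the whole tree with its depth. The second statement is a recursive WITH form of the same traversal, which Oracle accepts from 11g Release 2 onward.

```sql
-- Assumed names: nodes, id, parent_id, name (illustrative only).
CREATE TABLE nodes (
  id        NUMBER PRIMARY KEY,
  parent_id NUMBER REFERENCES nodes (id),   -- NULL for root rows
  name      VARCHAR2(100)
);

-- An index on the parent column lets each level be found without a scan.
CREATE INDEX nodes_parent_ix ON nodes (parent_id);

-- One round trip: start at the roots and walk down to the children.
SELECT LEVEL AS depth, id, parent_id, name
FROM nodes
START WITH parent_id IS NULL
CONNECT BY PRIOR id = parent_id
ORDER SIBLINGS BY name;

-- The same traversal written with recursive subquery factoring.
WITH tree (id, parent_id, name, depth) AS (
  SELECT id, parent_id, name, 1
  FROM nodes
  WHERE parent_id IS NULL
  UNION ALL
  SELECT n.id, n.parent_id, n.name, t.depth + 1
  FROM nodes n
  JOIN tree t ON n.parent_id = t.id
)
SELECT depth, id, parent_id, name
FROM tree;
```

Which of the two forms performs better on a very wide or deep tree is something to measure on real data, not to assume.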
Try partitioning the data within your hierarchical table and then limiting the partitions included in the query.
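CREATE TABLE loopy (
  key      NUMBER,
  key_hier NUMBER,          -- assumed to hold the key of the parent row
  info     VARCHAR2(100),   -- VARCHAR2 needs a length; 100 is an arbitrary choice
  part     NUMBER
)
PARTITION BY RANGE (part) (
  PARTITION low  VALUES LESS THAN (1000),
  PARTITION mid  VALUES LESS THAN (10000),
  PARTITION high VALUES LESS THAN (MAXVALUE)
);

-- Walk the hierarchy using only the rows stored in partition "mid".
-- PRIOR marks the parent side of the join; without it the query does not recurse.
SELECT info
FROM loopy PARTITION (mid)
START WITH key = <some value>
CONNECT BY PRIOR key = key_hier;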
The interesting problem now becomes your partitioning strategy. Oracle provides several options.
I've seen that using CONNECT BY can be slow, but compared to what? There isn't really another option except building a result set using recursive PL/SQL calls (slower) or doing it on the client side.
You could try separating your data into a mapping table (the hierarchy definition) and lookup tables (the display data) and then joining them back together. I wouldn't expect much of a gain, assuming you are getting the hierarchy data from indexed columns, but it's worth a try.
Have you tried it using the connect by yet? I'm a big fan of trying different variations.
