oracle partitioning on columns frequently used in joins and where conditions - oracle

The customer table contains 9.5 million records. The customer_id column is the primary key. The database is Oracle.
Questions:
1) Should the table contain main partitions or sub-partitions? How do I decide?
Also, I don't think indexing columnA or columnB will help here because of the type of data.
TableA.columnA (varchar) has more than 80% of the records for columnA values 5,6,7. The columnA has values from 1 to 7 only.
TableA.columnB (varchar) has 90% of the records for columnB value = 102. The columnB has values from 1 to 999.
Moreover, the typical queries are (in no particular order):
Query1: where tableA.columnA = values
Query2: where tableA.columnB = values
Query3: where tableA.columnA = values AND/OR tableA.columnB = values
2) When we create sub-partitions, what happens if the query only contains a where clause for sub-partition column? Does the query execution go directly to sub-partition or through main partition?
3) the join contains tableA.partitioned_column = tableB.indexed_column
(eg. customer_Table.branch_code = branch_table.branch_code)
Does partitioning help in the case of JOIN? Will it improve performance?

1) It's very difficult to answer not knowing table structure, the way it's usually used etc. But generally for big tables partitioning is very often necessity.
2) If you will not specify partition then Oracle will have to browse through all partitions to find where the subpartition is (which is not very slow). And then use partition pruning on subpartition. It will be still significantly faster then not having subpartitions at all. But the best situation is to refer in WHERE to partition and subpartition.
3) For 99% I think it will help, because Oracle can use partition pruning to get at once needed rows from tableA. You will be 100% sure if you check query plan. But the best situation is when both column are partition keys.

If 80-90% of these columns have the same values and they are the most often queried values, then partitioning will help some. You would be pruning 10-20% of the data during these queries but you probably want to find another way for Oracle to hone in on the data your query needs (dates, perhaps?)
The value distribution in your two columns also brings up the point of statistics and making sure they are being gathered properly (with histograms to describe the skew in these columns).
As #psur points out, without knowing the details of your system it's hard give concrete suggestions.

Related

Oracle Auto Partition strategy on Integer column

I need some help on how to perform auto partition on integer column, similar to how we do on date column like PARTITION BY RANGE (DIM_DT_ID) INTERVAL (NUMTODSINTERVAL(1,'DAY')).
I have 90 million rows and it sucks in performance and our SLA on query is 2 seconds, i would like to perform partition. What is the best approach and how do i enable auto partition on a Integer column
Our query will always filter by these columns like
select * from <tbname>
where ObjectID =1346785
and patentnumber=23456.
"i'm just making an example here, as i cant paste the original query for legality sake"
Fair enough, but the advice we give you will only be as good as the information you give us. So far, nothing you have posted suggests you need Partitioning.
The pasted query would perform well with a compound index, and would probably benefit from compression of the leading column:
create index your_table_lookup_index
on your_table(ObjectID, patentnumber) compress 1;
If that's a unique combination then make the index unique.
how do i enable auto partition on a Integer column
However, if you think you do have a genuine use case for Partitioning then we can use Interval Partitioning with integers as well as dates. This statement will create a table partitioned on objectid with a partition for every ten values.
create table your_table (
objectid number,
patentnumber number,
created_date date
)
partition by range (objectid)
interval (10)
(
partition p_00010 values less than (10)
);
On your posted figures that would be about 400 partitions with around 225000 rows per partition. Is that a good choice? Who can tell? You know your data and your use cases, we don't: perhaps a partition per objectid (i.e. with interval (1)) would be better.
You already have a table so you need to split it into Partitions. The standard of way of doing this would be
create a new table with your partitioning strategy (like above) but with the default partition ranged for values less than (MAXVALUE)
use partition exchange to move the existing table data into the new
structure
drop the old table and rename the table to the old table; resolve
foreign keys and other dependencies.
iteratively split the partition into the required range
This is a fairly time-consuming process. You have tagged your question [oracle12c]; if you're using Oracle 12c R2 you should definitely look at its online conversion mechanism, which is a single command. Find out more.
Remember that Partitioning for performance is a tricky game. While it can improve queries which return a large number of rows aligned with the Partition key it can make no difference to other queries, or even impair their performance. In particular, any query which does not include the partition key (objectid in your case) will likely perform worse after partitioning the table .
Final aside: as you know but for the benefit of future Seekers, Partitioning is a chargeable extra to the Enterprise Edition license. We're not allowed to use it unless we've paid for it.

what is skewed column in Oracle

I found some bottleneck of my query which select data from only single table then require time and i used non unique key index on two column and with column used in where clause.
select name ,isComplete from Student where year='2015' and isComplete='F'
Now i found some concept from internet like skewed column so what is it?
have an idea then plz help me?
and how to resolve problem of skewed column?
and how skewed column affect performance of the Query?
Skewed columns are columns in which the data is not evenly distributed among the rows.
For example, suppose:
You have a table order_lines with 100,000,000 rows
The table has a column named customer_id
You have 1,000,000 distinct customers
Some (very large) customers can have hundreds of thousands or millions of order lines.
In the above example, the data in order_lines.customer_id is skewed. On average, you'd expect each distinct customer_id to have 100 order lines (100 million rows divided by 1 million distinct customers). But some large customers have many, many more than 100 order lines.
This hurts performance because Oracle bases its execution plan on statistics. So, statistically speaking, Oracle thinks it can access order_lines based on a non-unique index on customer_id and get only 100 records back, which it might then join to another table or whatever using a NESTED LOOP operation.
But, then when it actually gets 1,000,000 order lines for a particular customer, the index access and nested loop join are hideously slow. It would have been far better for Oracle to do a full table scan and hash join to the other table.
So, when there is skewed data, the optimal access plan depends on which particular customer you are selecting!
Oracle lets you avoid this problem by optionally gathering "histograms" on columns, so Oracle knows which values have lots of rows and which have only a few. That gives the Oracle optimizer the information it needs to generate the best plan in most cases.
Full table scan and Index Scan both are depend on the Skewed column.
and Skewed column is nothing but your spread like gender column contain 60 male and 40 female.

performance for sum oracle

I have to sum a huge number of data with aggregation and where clause, using this query
what I am doing is like this : I have three tables one contains terms the second contains user terms , and the third contains correlation factor between term and user term.
I want to calculate the similarity between the sentence that that user inserted with an already existing sentences, and take the results greater than .5 by summing the correlation factor between sentences' terms
The problem is that this query takes more than 15 min. because I have huge tables
any suggestions to improve performance please?
insert into PLAG_SENTENCE_SIMILARITY
SELECT plag_TERMS.SENTENCE_ID ,plag_User_TERMS.SENTENCE_ID,
least( sum( plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_terms.sentence_length,
sum (plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_user_terms.sentence_length),
plag_TERMs.isn,
plag_user_terms.isn
FROM plag_TERM_CORRELATIONS3,
plag_TERMS,
Plag_User_TERMS
WHERE ( Plag_TERMS.TERM_ROOT = Plag_TERM_CORRELATIONS3.TERM1
AND Plag_User_TERMS.TERM_ROOT = Plag_TERM_CORRELATIONS3.TERM2
AND Plag_User_Terms.ISN=123)
having
least( sum( plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_terms.sentence_length,
sum (plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_user_terms.sentence_length) >0.5
group by (plag_User_TERMS.SENTENCE_ID,plag_TERMS.SENTENCE_ID , plag_TERMs.isn, plag_terms.sentence_length,plag_user_terms.sentence_length, plag_user_terms.isn);
plag_terms contains more than 50 million records and plag_correlations3 contains 500000
If you have a sufficient amount of free disk space, then create a materialized view
over the join of the three tables
fast-refreshable on commit (don't use the ANSI join syntax here, even if tempted to do so, or the mview won't be fast-refreshable ... a strange bug in Oracle)
with query rewrite enabled
properly physically organized for quick calculations
The query rewrite is optional. If you can modify the above insert-select, then you can just select from the materialized view instead of selecting from the join of the three tables.
As for the physical organization, consider
hash partitioning by Plag_User_Terms.ISN (with a sufficiently high number of partitions; don't hesitate to partition your table with e.g. 1024 partitions, if it seems reasonable) if you want to do a bulk calculation over all values of ISN
single-table hash clustering by Plag_User_Terms.ISN if you want to retain your calculation over a single ISN
If you don't have a spare disk space, then just hint your query to
either use nested loops joins, since the number of rows processed seems to be quite low (assumed by the estimations in the execution plan)
or full-scan the plag_correlations3 table in parallel
Bottom line: Constrain your tables with foreign keys, check constraints, not-null constraints, unique constraints, everything! Because Oracle optimizer is capable of using most of these informations to its advantage, as are the people who tune SQL queries.

Create Oracle partitioning table without PK

We have a huge table which are 144 million rows available right now and also increasing 1 million rows each day.
I would like to create a partitioning table on Oracle 11G server but I am not aware of the techniques. So I have two question :
Is it possible to create a partitioning table from a table that don't have PK?
What is your suggestion to create a partitioning table like huge records?
Yes, but keep in mind that the partition key must be a part of PK
Avoid global indexes
Chose right partitioning key - have it prepared for some kind of future maintenance ( cutting off oldest or unnecessary partitions, placing them in separate tablespaces... etc)
There are too many things to consider.
"There are several non-unique index on the table. But, the performance
is realy terrible! Just simple count function was return result after
5 minutes."
Partitioning is not necessarily a performance enhancer. The partition key will allow certain queries to benefit from partition pruning i.e. those queries which drive off the partition key in the WHERE clause. Other queries may perform worse, if there WHERE clause runs against the grain of the partition key,
It is difficult to give specific advice because the details you've posted are so vague. But here are some other possible ways of speeding up queries on big tables:
index compression
parallel query
better, probably compound, indexes.

Oracle 'pseduo-fact' view

Assumptions:
I have a number of tables comprised of facts and foreign keys ('dimensional' and 'key-value' type). For example, ENCOUNTER:
ID - primary key
dimensions
LOCATION_ID
PATIENT_ID
key-value
TYPE_ID
STATUS_ID
PATIENT_CLASS_ID
DISPOSITION_ID
...
facts
ADMISSION_DATE
DISCHARGE_DATE
...
I don't have the option to create a data warehouse
I would like to simplify the data structure for reporting
My approach is to create a number of pseudo-dimensional views ('D_LOCATION' based on the DEPARTMENT and LOCATION tables) and pseudo-fact views ('F_ENCOUNTER' based on ENCOUNTER table). In the pseudo-fact view, I would JOIN the key-value tables (e.g. STATUS, PATIENT_CLASS) to the fact table to include the name fields (e.g. STATUS.NAME, PATIENT_CLASS.NAME).
Questions:
If a query selects a subset of all of the fields from F_ENCOUNTER (i.e. not all of the key-value.name fields), is the Oracle 10g optimizer smart enough to exclude some of the key-value table joins (i.e. the ones that aren't included in the query)?
Is there anything that I can do to optimize this architecture (other than indices)
Is there another approach?
** edit **
Goals (in order of importance):
reduce query complexity; increase query consistency; decrease report-development time
optimize query-processing
minimize administrator burden
decrease storage
One optimization suggestion is not to use key-value pair tables. The concept of a Dimension table is that each record should contain all information about that concept without needing to join to normalized tables - i.e. turning a star schema into a snowflake schema.
While values might be repeated across dimension table records, it has the advantage of fewer joins in your reporting queries. Denormalizing tables in this way might seem counter intuitive but where performance is paramount it is usually the best solution.
I don't believe Oracle would exclude any joins done in the view, because the joins can impact the number of rows returned. (As when an inner join fails to match any rows, making the whole result set empty.)
What are the goals of your optimization? Query speed? query simplicity? storage efficiency? If you can sacrifice storage efficiency for better query performance, then replace the key-value references with the values themselves (TYPE_NAME instead of TYPE_ID, PATIENT_CLASS_NAME instead of PATIENT_CLASS_ID, etc.).
[Edit:] If the original architecture cannot be modified, consider using a materialized view. It would essentially pre-compute the joins and store the result set, giving you speedy query time at the cost of extra storage space and possibly-not-fresh data. You can control the latter by specifying an appropriate refresh policy. See http://en.wikipedia.org/wiki/Materialized_view and http://download.oracle.com/docs/cd/B10500_01/server.920/a96520/mv.htm for further details.

Resources