Oracle always uses HASH JOIN even when both tables are huge?

Oracle always uses HASH JOIN even when both tables are huge? - oracle

my understanding is that HASH JOIN only makes sense when one of the 2 tables is small enough to fit into memory as a hash table.
but when I gave a query to oracle, with both tables having several hundred million rows, oracle still came up with a hash join explain plan. even when I tricked it with OPT_ESTIMATE(rows = ....) hints, it always decides to use HASH JOIN instead of merge sort join.
so I wonder how is HASH JOIN possible in the case of both tables being very large?
thanks
Yang

Hash joins obviously work best when everything can fit in memory. But that does not mean they are not still the best join method when the table can't fit in memory. I think the only other realistic join method is a merge sort join.
If the hash table can't fit in memory, than sorting the table for the merge sort join can't fit in memory either. And the merge join needs to sort both tables. In my experience, hashing is always faster than sorting, for joining and for grouping.
But there are some exceptions. From the Oracle® Database Performance Tuning Guide, The Query Optimizer:
Hash joins generally perform better than sort merge joins. However,
sort merge joins can perform better than hash joins if both of the
following conditions exist:
The row sources are sorted already.
A sort operation does not have to be done.
Test
Instead of creating hundreds of millions of rows, it's easier to force Oracle to only use a very small amount of memory.
This chart shows that hash joins outperform merge joins, even when the tables are too large to fit in (artificially limited) memory:
Notes
For performance tuning it's usually better to use bytes than number of rows. But the "real" size of the table is a difficult thing to measure, which is why the chart displays rows. The sizes go approximately from 0.375 MB up to 14 MB. To double-check that these queries are really writing to disk you can run them with /*+ gather_plan_statistics */ and then query v$sql_plan_statistics_all.
I only tested hash joins vs merge sort joins. I didn't fully test nested loops because that join method is always incredibly slow with large amounts of data. As a sanity check, I did compare it once with the last data size, and it took at least several minutes before I killed it.
I also tested with different _area_sizes, ordered and unordered data, and different distinctness of the join column (more matches is more CPU-bound, less matches is more IO bound), and got relatively similar results.
However, the results were different when the amount of memory was ridiculously small. With only 32K sort|hash_area_size, merge sort join was significantly faster. But if you have so little memory you probably have more significant problems to worry about.
There are still many other variables to consider, such as parallelism, hardware, bloom filters, etc. People have probably written books on this subject, I haven't tested even a small fraction of the possibilities. But hopefully this is enough to confirm the general consensus that hash joins are best for large data.
Code
Below are the scripts I used:
--Drop objects if they already exist
drop table test_10k_rows purge;
drop table test1 purge;
drop table test2 purge;
--Create a small table to hold rows to be added.
--("connect by" would run out of memory later when _area_sizes are small.)
--VARIABLE: More or less distinct values can change results. Changing
--"level" to something like "mod(level,100)" will result in more joins, which
--seems to favor hash joins even more.
create table test_10k_rows(a number, b number, c number, d number, e number);
insert /*+ append */ into test_10k_rows
select level a, 12345 b, 12345 c, 12345 d, 12345 e
from dual connect by level <= 10000;
commit;
--Restrict memory size to simulate running out of memory.
alter session set workarea_size_policy=manual;
--1 MB for hashing and sorting
--VARIABLE: Changing this may change the results. Setting it very low,
--such as 32K, will make merge sort joins faster.
alter session set hash_area_size = 1048576;
alter session set sort_area_size = 1048576;
--Tables to be joined
create table test1(a number, b number, c number, d number, e number);
create table test2(a number, b number, c number, d number, e number);
--Type to hold results
create or replace type number_table is table of number;
set serveroutput on;
--
--Compare hash and merge joins for different data sizes.
--
declare
v_hash_seconds number_table := number_table();
v_average_hash_seconds number;
v_merge_seconds number_table := number_table();
v_average_merge_seconds number;
v_size_in_mb number;
v_rows number;
v_begin_time number;
v_throwaway number;
--Increase the size of the table this many times
c_number_of_steps number := 40;
--Join the tables this many times
c_number_of_tests number := 5;
begin
--Clear existing data
execute immediate 'truncate table test1';
execute immediate 'truncate table test2';
--Print headings. Use tabs for easy import into spreadsheet.
dbms_output.put_line('Rows'||chr(9)||'Size in MB'
||chr(9)||'Hash'||chr(9)||'Merge');
--Run the test for many different steps
for i in 1 .. c_number_of_steps loop
v_hash_seconds.delete;
v_merge_seconds.delete;
--Add about 0.375 MB of data (roughly - depends on lots of factors)
--The order by will store the data randomly.
insert /*+ append */ into test1
select * from test_10k_rows order by dbms_random.value;
insert /*+ append */ into test2
select * from test_10k_rows order by dbms_random.value;
commit;
--Get the new size
--(Sizes may not increment uniformly)
select bytes/1024/1024 into v_size_in_mb
from user_segments where segment_name = 'TEST1';
--Get the rows. (select from both tables so they are equally cached)
select count(*) into v_rows from test1;
select count(*) into v_rows from test2;
--Perform the joins several times
for i in 1 .. c_number_of_tests loop
--Hash join
v_begin_time := dbms_utility.get_time;
select /*+ use_hash(test1 test2) */ count(*) into v_throwaway
from test1 join test2 on test1.a = test2.a;
v_hash_seconds.extend;
v_hash_seconds(i) := (dbms_utility.get_time - v_begin_time) / 100;
--Merge join
v_begin_time := dbms_utility.get_time;
select /*+ use_merge(test1 test2) */ count(*) into v_throwaway
from test1 join test2 on test1.a = test2.a;
v_merge_seconds.extend;
v_merge_seconds(i) := (dbms_utility.get_time - v_begin_time) / 100;
end loop;
--Get average times. Throw out first and last result.
select ( sum(column_value) - max(column_value) - min(column_value) )
/ (count(*) - 2)
into v_average_hash_seconds
from table(v_hash_seconds);
select ( sum(column_value) - max(column_value) - min(column_value) )
/ (count(*) - 2)
into v_average_merge_seconds
from table(v_merge_seconds);
--Display size and times
dbms_output.put_line(v_rows||chr(9)||v_size_in_mb||chr(9)
||v_average_hash_seconds||chr(9)||v_average_merge_seconds);
end loop;
end;
/

So I wonder how is HASH JOIN possible in the case of both tables being very large?
It would be done in multiple passes: the driven table is read and hashed in chunks, the leading table is scanned several times.
This means that with limited memory hash join scales at O(N^2) while merge joins scales at O(N) (with no sorting needed of course), and on really large tables merge outperforms hash joins. However, the tables should be really large so that benefits of single read would outweight drawbacks of non-sequential access, and you would need all data from them (usually aggregated).
Given the RAM sized on modern servers, we are talking about really large reports on really large databases which take hours to build, not something you would really see in everyday live.
MERGE JOIN may also be useful when the output recordset is limited with rownum < N. But this means that the joined inputs should be already sorted which means they both be indexed which means NESTED LOOPS is available too, and that's what is usually chosen by the optimizer, since this is more efficient when the join condition is selective.
With their current implementations, MERGE JOIN always scans and NESTED LOOPS always seeks, while a more smart combination of both methods (backed up by statistics) would be preferred.
You may want to read this article in my blog:
Things SQL needs: MERGE JOIN that would seek

A hash join does not have to fit the whole table into memory, but only the rows which match the where conditions of that table (or even only a hash + the rowid - I'm not sure about that).
So when Oracle decides that the selectivity of the part of the where conditions affecting one of the tables is good enough (i.e. few rows will have to be hashed), it might prefer a hash join even for very large tables.

Related

Force partition pruning on Oracle

I have a query similar to this
select *
from small_table A
inner join huge_table B on A.DATE =B.DATE
The huge_table is partitioned by DATE, and the PK is DATE, some_id and some_other_id (so the join not is done by pk index).
small_table just contains a few dates.
The total cost of the SQL is 48 minutes
By some reason the explain plan give me a "PARTITION RANGE (ALL)" with a high numbers on cardinality. Looks like access to the full table, not just the partitions indicated by small_table.DATE
If I put the SQL inside a loop and do
for o in (select date from small_table)
loop
select *
from small_table A
inner join huge_table B on A.DATE =B.DATE
where B.DATE=O.DATE
end loop;
Only takes 2 minutes 40 seconds (the full loop).
There is any way to force the partition pruning on Oracle 12c?
Additional info:
small_table has 37 records for 13 different dates. huge_table has 8,000 million of records with 179 dates/partitions. The SQL needs one field from small_table, but I can tweak the SQL to not use it
Update:
With the use_nl hint, now the cardinality show in the execution plan is more accurate and the execution time downs from 48 minutes to 4 minutes.
select /* use_nl(B) */*
from small_table A
inner join huge_table B on A.DATE =B.DATE

This seems like the problem:
"small_table have 37 registries for 13 different dates. huge_table has 8.000 millions of registries with 179 dates/partitions....
The SQL need one field from small_table, but I can tweak the SQL to not use it "
According to the SQL you posted you're joining the two tables on just their DATE columns with no additional conditions. If that's really the case you are generating a cross join in which each partition of huge_table is joined to small_table 2-3 times. So your result set may be much large than you're expecting, which means more database effort, which means more time.
The other thing to notice is that the cardinality of small_table to huge_table partitions is about 1:4; the optimizer doesn't know that there are really only thirteen distinct huge_table partitions in play.
Optimization ought to be a science and this is more guesswork than anything but try this:
select B.*
from ( select /*+ cardinality(t 13) */
distinct t.date
from small_table t ) A
inner join huge_table B
on A.DATE =B.DATE
This should communicate to the optimizer that only a small percentage of the huge_table partitions are required, which may make it choose partition pruning. Also it removes that Cartesian product, which should improve performance too. Obviously you will need to apply that tweak you mentioned, to remove the need to query anything else from small_table.

Optimize hive query to avoid JOIN

Question is similar to this except I want to know if I can do it in one query. This is what I have working but as we all know joins are expensive. Any better hql to do this?
select a.tbl1,b.tbl2
from
(
select count(*) as tbl1 from tbl1
) a
join
(
select count(*) as tbl2 from tbl2
) b ON 1=1

Yes, Joins are expensive
When it is said that joins are expensive, this typically refers to the situation where you have many records in multiple tables that need to be matched with eachother.
According to that description your join is not expensive, as you only join 2 sets with 1 record each.
But, you must be looking at overhead
Perhaps you notice that the individual counts take significantly shorter than the command which you use to count and combine the result. This would be because map and reduce operations have significant overhead (can be 30 seconds per stage).
You can play around a bit to see whether you hit a plan that does not incur much overhead, but it could well be that you are out of luck as hive does not scale down that well.

If it is not critical for you to keep them as a separate columns you can use UNION ALL operation to work with row format:
select 'tbl1', count(*) from tbl1
UNION ALL
select 'tbl2', count(*) from tbl2;
This would allow you to avoid extra MAPJOIN operator in your former query. Technically you can have one less mapper in your end execution plan.
Update
In up-to-date distributions of Hadoop you will not get much differences from performance perspective of going either UNION or MAP JOIN approach as these operations would be optimized within former jobs. But keep in mind that on older versions of the cluster or basing on some configuration properties MAPJOIN could be converted into a separate job.

Can a query use multiple indexes from the same table?

My question is similar to this one, but with a small difference. I have a query running on a single table with multiple WHERE conditions.
Assuming my table has multiple columns (col1 - col9) and I have a query like:
SELECT
col1
, col5
FROM table1
WHERE col1 = 'a'
AND col2 = 'b'
AND col3 = 100
AND col4 = '10a'
AND col5 = 1
And my indexes are:
col1 - unique / non-partitioned
col2, col3 - non-unique / partitioned
col4, col5 - non-unique / partitioned
My question is, if I'm using columns in my WHERE clause that cover multiple indexes, will (should?) the query pick the unique index first to generate a result set and then on that result set use the other two indexes for further filtering, sequentially reducing the result set?
Or will each index go over the entire data in the table and each condition will use an index and later merge all of the result sets?
(I don't have access to a table/data, this is more theoretical than practical).
Thank you in advance for any help

The Oracle optimiser (in more recent versions of Oracle, and unless you force it to behave otherwise) is cost based rather than rule based. When the query is first executed it will consider many different paths to obtain the answer, and choose the one with the lowest cost.
So it's generally impossible to say, ahead of time, how the database will choose to answer a particular query. The answer is always - it depends. It depends on
The statistics for the table, and the number of distinct values on each column
The version of the database you are using
System and session parameters
Statistics for the index
In general, what it will do in most cases is to choose whatever is the most selective index. So if you only had one or two rows where col1='a', it would probably go in on that index, and then scan the rows within it.
As the other answer mentions, the database can combine B-Tree indexes by going through a bitmap conversion stage. This is relatively expensive, and not available in all Oracle versions, but it can happen .
So in summary, the database can do either of the approaches you mention. The only way to know what it will do in your circumstance is to use explain plan or the equivalent tools to watch what it does

performance for sum oracle

I have to sum a huge number of data with aggregation and where clause, using this query
what I am doing is like this : I have three tables one contains terms the second contains user terms , and the third contains correlation factor between term and user term.
I want to calculate the similarity between the sentence that that user inserted with an already existing sentences, and take the results greater than .5 by summing the correlation factor between sentences' terms
The problem is that this query takes more than 15 min. because I have huge tables
any suggestions to improve performance please?
insert into PLAG_SENTENCE_SIMILARITY
SELECT plag_TERMS.SENTENCE_ID ,plag_User_TERMS.SENTENCE_ID,
least( sum( plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_terms.sentence_length,
sum (plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_user_terms.sentence_length),
plag_TERMs.isn,
plag_user_terms.isn
FROM plag_TERM_CORRELATIONS3,
plag_TERMS,
Plag_User_TERMS
WHERE ( Plag_TERMS.TERM_ROOT = Plag_TERM_CORRELATIONS3.TERM1
AND Plag_User_TERMS.TERM_ROOT = Plag_TERM_CORRELATIONS3.TERM2
AND Plag_User_Terms.ISN=123)
having
least( sum( plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_terms.sentence_length,
sum (plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_user_terms.sentence_length) >0.5
group by (plag_User_TERMS.SENTENCE_ID,plag_TERMS.SENTENCE_ID , plag_TERMs.isn, plag_terms.sentence_length,plag_user_terms.sentence_length, plag_user_terms.isn);
plag_terms contains more than 50 million records and plag_correlations3 contains 500000

If you have a sufficient amount of free disk space, then create a materialized view
over the join of the three tables
fast-refreshable on commit (don't use the ANSI join syntax here, even if tempted to do so, or the mview won't be fast-refreshable ... a strange bug in Oracle)
with query rewrite enabled
properly physically organized for quick calculations
The query rewrite is optional. If you can modify the above insert-select, then you can just select from the materialized view instead of selecting from the join of the three tables.
As for the physical organization, consider
hash partitioning by Plag_User_Terms.ISN (with a sufficiently high number of partitions; don't hesitate to partition your table with e.g. 1024 partitions, if it seems reasonable) if you want to do a bulk calculation over all values of ISN
single-table hash clustering by Plag_User_Terms.ISN if you want to retain your calculation over a single ISN
If you don't have a spare disk space, then just hint your query to
either use nested loops joins, since the number of rows processed seems to be quite low (assumed by the estimations in the execution plan)
or full-scan the plag_correlations3 table in parallel
Bottom line: Constrain your tables with foreign keys, check constraints, not-null constraints, unique constraints, everything! Because Oracle optimizer is capable of using most of these informations to its advantage, as are the people who tune SQL queries.

oracle partitioning on columns frequently used in joins and where conditions

The customer table contains 9.5 million records. The customer_id column is the primary key. The database is Oracle.
Questions:
1) Should the table contain main partitions or sub-partitions? How do I decide?
Also, I don't think indexing columnA or columnB will help here because of the type of data.
TableA.columnA (varchar) has more than 80% of the records for columnA values 5,6,7. The columnA has values from 1 to 7 only.
TableA.columnB (varchar) has 90% of the records for columnB value = 102. The columnB has values from 1 to 999.
Moreover, the typical queries are (in no particular order):
Query1: where tableA.columnA = values
Query2: where tableA.columnB = values
Query3: where tableA.columnA = values AND/OR tableA.columnB = values
2) When we create sub-partitions, what happens if the query only contains a where clause for sub-partition column? Does the query execution go directly to sub-partition or through main partition?
3) the join contains tableA.partitioned_column = tableB.indexed_column
(eg. customer_Table.branch_code = branch_table.branch_code)
Does partitioning help in the case of JOIN? Will it improve performance?

1) It's very difficult to answer not knowing table structure, the way it's usually used etc. But generally for big tables partitioning is very often necessity.
2) If you will not specify partition then Oracle will have to browse through all partitions to find where the subpartition is (which is not very slow). And then use partition pruning on subpartition. It will be still significantly faster then not having subpartitions at all. But the best situation is to refer in WHERE to partition and subpartition.
3) For 99% I think it will help, because Oracle can use partition pruning to get at once needed rows from tableA. You will be 100% sure if you check query plan. But the best situation is when both column are partition keys.

If 80-90% of these columns have the same values and they are the most often queried values, then partitioning will help some. You would be pruning 10-20% of the data during these queries but you probably want to find another way for Oracle to hone in on the data your query needs (dates, perhaps?)
The value distribution in your two columns also brings up the point of statistics and making sure they are being gathered properly (with histograms to describe the skew in these columns).
As #psur points out, without knowing the details of your system it's hard give concrete suggestions.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Oracle always uses HASH JOIN even when both tables are huge? - oracle

Related

Force partition pruning on Oracle

Optimize hive query to avoid JOIN

Can a query use multiple indexes from the same table?

performance for sum oracle

oracle partitioning on columns frequently used in joins and where conditions

Categories

Resources