I have a query similar to this
select *
from small_table A
inner join huge_table B on A.DATE =B.DATE
huge_table is partitioned by DATE, and its PK is DATE, some_id and some_other_id (so the join is not done via the PK index).
small_table just contains a few dates.
The SQL takes 48 minutes in total.
For some reason the explain plan gives me a "PARTITION RANGE (ALL)" with high cardinality numbers. It looks like it is accessing the full table, not just the partitions indicated by small_table.DATE.
If I put the SQL inside a loop and do
for o in (select date from small_table)
loop
select *
from small_table A
inner join huge_table B on A.DATE =B.DATE
where B.DATE = o.DATE;
end loop;
The full loop takes only 2 minutes 40 seconds.
Is there any way to force partition pruning on Oracle 12c?
Additional info:
small_table has 37 records covering 13 different dates. huge_table has 8,000 million records across 179 dates/partitions. The SQL needs one field from small_table, but I can tweak the SQL to not use it.
Update:
With the use_nl hint, the cardinality shown in the execution plan is now more accurate and the execution time drops from 48 minutes to 4 minutes.
select /*+ use_nl(B) */ *
from small_table A
inner join huge_table B on A.DATE =B.DATE
This seems like the problem:
"small_table have 37 registries for 13 different dates. huge_table has 8.000 millions of registries with 179 dates/partitions....
The SQL need one field from small_table, but I can tweak the SQL to not use it "
According to the SQL you posted, you're joining the two tables on just their DATE columns with no additional conditions. If that's really the case, you are generating a cross join in which each partition of huge_table is joined to small_table 2-3 times (37 rows over 13 dates is roughly 2.8 small_table rows per date). So your result set may be much larger than you're expecting, which means more database effort, which means more time.
The other thing to notice is that the ratio of small_table rows to huge_table partitions is roughly 1:5 (37 rows to 179 partitions); the optimizer doesn't know that there are really only thirteen distinct huge_table partitions in play.
Optimization ought to be a science, and this is more guesswork than anything, but try this:
select B.*
from ( select /*+ cardinality(t 13) */ distinct t.date
       from small_table t ) A
inner join huge_table B
        on A.DATE = B.DATE
This should communicate to the optimizer that only a small percentage of the huge_table partitions are required, which may make it choose partition pruning. Also it removes that Cartesian product, which should improve performance too. Obviously you will need to apply that tweak you mentioned, to remove the need to query anything else from small_table.
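If rewriting the join is an option, another sketch worth trying (same tables and columns as in the question, untested against your data) is to feed the dates in through an IN subquery, which gives the optimizer another chance at subquery-based partition pruning:
select B.*
from huge_table B
where B.DATE in (select distinct A.DATE from small_table A);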
Related
In my case, I have some Hive tables, and the partition column (dt) is the only column that every table contains.
I execute the SQL below in Hive:
SELECT * FROM (
SELECT row_number() over(ORDER BY T.dt) as row_num,T.* FROM
(select * from ods.test_table where dt='2021-09-06') as T) TT
WHERE TT.row_num BETWEEN 1 AND 10
I get the same result every time.
But when I execute the SQL in Presto, the result is not the same from run to run. I think the root cause is that my table lacks a unique key.
Is it possible to get a stable result for a query like this without a unique key in Presto?
You are calculating row_number
row_number() over(ORDER BY T.dt)
and the ORDER BY column is always the same (dt='2021-09-06'). In this case row_number behaves non-deterministically and can assign the same numbers to different rows from run to run.
The fact that you always get the same results in Hive is a coincidence: you are probably always running with exactly the same number of splits, or even a single mapper, which runs single-threaded and produces results that merely look deterministic. Presto may have different parallelism, and that affects which rows are passed to row_number first.
You can try changing the split configuration to force more mappers, or increase the data size, and you will be able to reproduce the non-deterministic behavior: many mappers running in parallel on a heavily loaded cluster will execute at different speeds, and different rows will be passed to row_number first.
To get deterministic results, you can add columns to the ORDER BY that determine the order of rows. If you have no such columns, it means you can have any number of full duplicates.
Even if you do not have a unique key, row_number will produce deterministic results if ALL columns are in the ORDER BY.
Consider this dataset:
Col1 Col2 Col3
1 1 2
1 1 2
1 1 3
1 1 3
row_number() over(ORDER BY col1) as rn can produce all 4 rows ordered differently each run (suppose the dataset is a very big one and many mappers are running concurrently; some mappers can finish faster, some can fail and restart). Of course, with such a small dataset always processed in a single process, single-threaded, the result will be the same, but in general this is not how databases work.
The same applies to row_number() over(ORDER BY col1, col2).
But with row_number() over(ORDER BY col1, col2, col3) you will always get the same result, guaranteed.
So, the solution is to use as many ORDER BY columns as needed to determine the order of rows. In the worst case, if you have full duplicates, all columns should be added to the ORDER BY; duplicates will then be ordered together and the result will be deterministic.
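Applied to the query from the question, a minimal sketch would look like this (col_a and col_b are placeholders for whatever other columns your table actually has):
SELECT * FROM (
SELECT row_number() over(ORDER BY T.dt, T.col_a, T.col_b) as row_num, T.* FROM
(select * from ods.test_table where dt='2021-09-06') as T) TT
WHERE TT.row_num BETWEEN 1 AND 10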
My question is similar to this one, except I want to know if I can do it in one query. This is what I have working, but as we all know joins are expensive. Is there any better HQL to do this?
select a.tbl1,b.tbl2
from
(
select count(*) as tbl1 from tbl1
) a
join
(
select count(*) as tbl2 from tbl2
) b ON 1=1
Yes, joins are expensive
When it is said that joins are expensive, this typically refers to situations where many records in multiple tables need to be matched with each other.
According to that description your join is not expensive, as you only join 2 sets with 1 record each.
But, you must be looking at overhead
Perhaps you noticed that the individual counts take significantly less time than the command you use to count and combine the results. That would be because map and reduce operations have significant overhead (it can be 30 seconds per stage).
You can play around a bit to see whether you hit a plan that does not incur much overhead, but it could well be that you are out of luck, as Hive does not scale down that well.
If it is not critical for you to keep them as separate columns, you can use a UNION ALL operation and work with rows instead:
select 'tbl1', count(*) from tbl1
UNION ALL
select 'tbl2', count(*) from tbl2;
This allows you to avoid the extra MAPJOIN operator from your original query. Technically you can have one fewer mapper in your final execution plan.
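If you do still need the two counts as separate columns, one possible sketch (not tested on your cluster) is to pivot the UNION ALL result with conditional aggregation, which keeps everything in a single query:
select max(case when tag = 'tbl1' then cnt end) as tbl1,
       max(case when tag = 'tbl2' then cnt end) as tbl2
from (
    select 'tbl1' as tag, count(*) as cnt from tbl1
    union all
    select 'tbl2' as tag, count(*) as cnt from tbl2
) t;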
Update
In up-to-date Hadoop distributions you will not see much performance difference between the UNION and MAPJOIN approaches, as these operations are optimized within the parent jobs. But keep in mind that on older cluster versions, or depending on certain configuration properties, the MAPJOIN could be converted into a separate job.
I had an Oracle query as below that took 10 minutes or longer to run:
select
r.range_text as duration_range,
nvl(count(c.call_duration),0) as calls,
nvl(SUM(call_duration),0) as total_duration
from
call_duration_ranges r
left join
big_table c
on c.call_duration BETWEEN r.range_lbound AND r.range_ubound
and c.aaep_src = 'MAIN_SOURCE'
and c.calltimestamp_local >= to_date('01-02-2014 00:00:00' ,'dd-MM-yyyy HH24:mi:ss')
AND c.calltimestamp_local <= to_date('28-02-2014 23:59:59','dd-MM-yyyy HH24:mi:ss')
and c.destinationnumber LIKE substr( 'abc:1301#company.com:5060;user=phone',1,8) || '%'
group by
r.range_text
order by
r.range_text
If I changed the date part of the query to:
(c.calltimestamp_local+0) >= to_date('01-02-2014 00:00:00' ,'dd-MM-yyyy HH24:mi:ss')
AND (c.calltimestamp_local+0) <= to_date('28-02-2014 23:59:59','dd-MM-yyyy HH24:mi:ss')
It runs in 2 seconds. I did this based on another post, to avoid using the date index. It seems counter-intuitive though--the index slowing things down so much.
I ran the explain plan and it seems almost identical between the original and the updated query. The only difference is that the MERGE JOIN operation is 16,269 bytes in the old query and 1,218 bytes in the new query. Cardinality is actually higher in the old query as well. And I don't see an "INDEX" operation in either explain plan except for the index on the destinationnumber field.
So why is the index slowing down the query so much? What can I do to the index--don't think using the "+0" is the best solution going forward...
Querying for two days of data, suppressing use of destinationnumber index:
0 SELECT STATEMENT ALL_ROWS 329382 1218 14
1 SORT GROUP BY 329382 1218 14
2 MERGE JOIN OUTER 329381 1218 14
3 SORT JOIN 4 308 14
4 TABLE ACCESS FULL CALL_DURATION_RANGES ANALYZED 3 308 14
5 FILTER
6 SORT JOIN 329377 65 1
7 TABLE ACCESS BY GLOBAL INDEX ROWID BIG_TABLE ANALYZED 329376 65 1
8 INDEX RANGE SCAN IDX_CDR_CALLTIMESTAMP_LOCAL ANALYZED 1104 342104
Querying for 2 days using destinationnumber index:
0 SELECT STATEMENT ALL_ROWS 11 1218 14
1 SORT GROUP BY 11 1218 14
2 MERGE JOIN OUTER 10 1218 14
3 SORT JOIN 4 308 14
4 TABLE ACCESS FULL CALL_DURATION_RANGES ANALYZED 3 308 14
5 FILTER
6 SORT JOIN 6 65 1
7 TABLE ACCESS BY GLOBAL INDEX ROWID BIG_TABLE ANALYZED 5 65 1
8 INDEX RANGE SCAN IDX_DESTINATIONNUMBER_PART ANALYZED 4 4
Querying for one month, suppressing destinationnumber index--full scan:
0 SELECT STATEMENT ALL_ROWS 824174 1218 14
1 SORT GROUP BY 824174 1218 14
2 MERGE JOIN OUTER 824173 1218 14
3 SORT JOIN 4 308 14
4 TABLE ACCESS FULL CALL_DURATION_RANGES ANALYZED 3 308 14
5 FILTER
6 SORT JOIN 824169 65 1
7 PARTITION RANGE ALL 824168 65 1
8 TABLE ACCESS FULL BIG_TABLE ANALYZED 824168 65 1
Seems counter intuitive though--the index slowing things down so much.
Counter-intuitive only if you don't understand how indexes work.
Indexes are good for retrieving individual rows. They are not suited to retrieving large numbers of records. You haven't bothered to provide any metrics, but it seems likely your query is touching a large number of rows, in which case a full table scan or other set-based operation will be much more efficient.
Tuning date range queries is tricky, because it's very hard for the database to know how many records lie between the two bounds, no matter how up-to-date our statistics are. (Even more tricky to tune when the date bounds can vary - one day is a different matter from one month or one year.) So often we need to help the optimizer by using our knowledge of our data.
don't think using the "+0" is the best solution going forward...
Why not? People have been using that technique to avoid using an index in a specific query for literally decades.
However, there are more modern solutions. The undocumented cardinality hint is one:
select /*+ cardinality(big_table,10000) */
... should be enough to dissuade the optimizer from using an index - provided you have accurate statistics gathered for all the tables in the query.
Alternatively you can force the optimizer to do a full table scan with ...
select /*+ full(big_table) */
Anyway, there's nothing you can do to the index to change the way databases work. You could make things faster with partitioning, but I would guess if your organisation had bought the Partitioning option you'd be using it already.
These are the reasons using an index slows down a query:
A full tablescan would be faster. This happens if a substantial fraction of rows has to be retrieved. The concrete numbers depend on various factors, but as a rule of thumb in common situations using an index is slower if you retrieve more than 10-20% of your rows.
Using another index would be even better, because fewer rows are left after the first stage. Using a certain index on a table usually means that other indexes cannot be used.
Now it is the optimizer's job to decide which variant is best. To perform this task, it has to estimate (among other things) how many rows are left after applying certain filter clauses. This estimate is based on the table's statistics and is usually quite good. It even takes skewed data into account, but it might be off if your statistics are outdated or you have a rather uncommon distribution of data. For example, if you computed your statistics before the February data was inserted in your example, the optimizer might wrongly conclude that only a few (if any) rows are left after applying the date range filter.
Using combined indexes on several columns might also be an option, depending on your data.
Another note on the "skewed data" issue: there are cases where the optimizer detects skewed data in column A if you have an index on column A, but not if you only have a combined index on columns A and B, because the combination might make the distribution more even. This is one of the few cases where an index on (A,B) does not make an index on A redundant.
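For illustration, a combined index on the columns filtered in the query above might look like the sketch below; the column order here is only a guess, and whether it helps at all depends on your data and the selectivity of each predicate:
create index idx_bigtable_src_time_dest
    on big_table (aaep_src, calltimestamp_local, destinationnumber);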
APC's answer shows how to use hints to point the optimizer in the right direction if it still produces wrong plans even with correct statistics.
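And as a concrete illustration of the statistics point above, re-gathering statistics after a large load would look roughly like this sketch (adjust the owner and table name for your schema; SIZE AUTO lets Oracle decide where histograms are needed):
begin
    dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'BIG_TABLE',
        method_opt => 'for all columns size auto', -- let Oracle decide on histograms
        cascade    => true);                       -- refresh index statistics too
end;
/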
I am trying to pull a random sample of a population from a PeopleSoft database. My searches online have led me to think that the SAMPLE clause of the SELECT statement may be a viable option for us to use; however, I am having trouble understanding how the SAMPLE clause determines the number of samples returned. I have looked at the Oracle documentation found here:
http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10002.htm#i2065953
But the above reference only talks about the syntax used to create the sample. The reason for my question is that I need to understand how the sample percent determines the sample size returned. It seems like it applies a random number to the percent you ask for and then uses a seed number to pick every "n"th record. Our requirement is that we pull an exact number of samples (100, for example), that they are randomly selected, and that they are representative of the entire table (or at least of the group of data we choose with filters).
In a population of 10,200 items, if I need a sample of approximately 100 items, I could use this statement:
SELECT * FROM PS_LEDGER SAMPLE(1) --1 % of my total population
WHERE DEPTID = '700064'
However, we need to pull an exact number of samples (in this case 100), so I could pick a sample size that almost always returns more than the number I need and then trim it down, i.e.:
SELECT * FROM PS_LEDGER SAMPLE(2.5) --this percent must always give > 100 items
WHERE DEPTID = '700064' and rownum < 101
My concern with doing that is that my sample would not uniformly represent the entire population. For example, if the sample function just pulls every Nth record after it creates its own randomly generated seed, then the rownum < 101 filter will cut off all of the records chosen from the bottom of the table. What I am looking for is a way to pull exactly 100 records from the table that are randomly selected and fairly representative of the entire table. Please help!!
Borrowing jonearles' example table, I see exactly the same thing (in 11gR2 on an OEL developer image), usually getting values of A heavily skewed towards 1; with small sample sizes I can sometimes see no 2s at all. With the extra randomisation/restriction step I mentioned in a comment:
select a, count(*) from (
select * from test1 sample (1)
order by dbms_random.value
)
where rownum < 101
group by a;
... with three runs I got:
A COUNT(*)
---------- ----------
1 71
2 29
A COUNT(*)
---------- ----------
1 100
A COUNT(*)
---------- ----------
1 64
2 36
Yes, 100% really came back as 1 on the second run. The skewing itself seems to be rather random. I tried with the block modifier which seemed to make little difference, perhaps surprisingly - I might have thought it would get worse in this situation.
This is likely to be slower, certainly for small sample sizes, as it has to hit the entire table; but does give me pretty even splits fairly consistently:
select a, count(*) from (
select a, b from (
select a, b, row_number() over (order by dbms_random.value) as rn
from test1
)
where rn < 101
)
group by a;
With three runs I got:
A COUNT(*)
---------- ----------
1 48
2 52
A COUNT(*)
---------- ----------
1 57
2 43
A COUNT(*)
---------- ----------
1 49
2 51
... which looks a bit healthier. YMMV of course.
This Oracle article covers some sampling techniques, and you might want to evaluate the ora_hash approach as well, and the stratified version if your data spread and your requirements for 'representativeness' demand it.
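For reference, a minimal sketch of an ora_hash-based approach (not necessarily exactly what the article does) hashes the rowid into 100 buckets and keeps one of them, which yields roughly 1% of the rows:
SELECT *
FROM PS_LEDGER
WHERE DEPTID = '700064'
AND ora_hash(rowid, 99) = 0;  --buckets 0..99, so keeping one bucket is ~1% of rows
Note that this is deterministic for a given set of rows, which may or may not be what you want; ora_hash also accepts an optional seed argument if you need a different sample each time.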
You can't trust SAMPLE to return a truly random set of rows from a table. The algorithm appears to be based on the physical properties of the table.
create table test1(a number, b char(2000));
--Insert 10K fat records. A is always 1.
insert into test1 select 1, level from dual connect by level <= 10000;
--Insert 10K skinny records. A is always 2.
insert into test1 select 2, null from dual connect by level <= 10000;
--Select about 10 rows.
select * from test1 sample (0.1) order by a;
Run the last query multiple times and you will almost never see any 2s. This may be an accurate sample if you measure by bytes, but not by rows.
This is an extreme example of skewed data, but I think it's enough to show that SAMPLE doesn't work the way the manual implies it should. As others have suggested, you'll probably want to ORDER BY DBMS_RANDOM.VALUE.
I've been fiddling about with a similar question. First I set up what the sample size will be for the different strata. In your case there is only one ('700064'). So in a WITH clause or a temp table I did this:
Select DEPTID, Count(*) SAMPLE_ONE
FROM PS_LEDGER Sample(1)
WHERE DEPTID = '700064'
Group By DEPTID
This tells you how many records to expect in a 1% sample. Let's call that TABLE_1.
Then I did this:
Select
    Ceil( Rank() over (Partition by DEPTID Order by DBMS_RANDOM.VALUE)
          / (Select SAMPLE_ONE From TABLE_1) ) STRATUM_GROUP
    , A.*
FROM PS_LEDGER A
Make that another table. What you get then are random sample sets of approximately 1% in size.
So if your original table held 1,000 records, you would get 100 random sample sets with 10 items in each set.
You can then select one of these sets at random to test.
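To pick one of those sets at random, a sketch like this should do it, assuming the second query above was stored as a table called SAMPLE_SETS:
select *
from sample_sets
where stratum_group = (select ceil(dbms_random.value(0, max(stratum_group)))
                       from sample_sets);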
Not sure if I've explained this very well, but it worked for me: I had 168 strata set up on a table with over 10 million records and it worked quite well.
If you want more explanation or can improve this, please don't hesitate.
Regards
My understanding is that a HASH JOIN only makes sense when one of the two tables is small enough to fit into memory as a hash table.
But when I gave Oracle a query with both tables having several hundred million rows, Oracle still came up with a hash join explain plan. Even when I tricked it with OPT_ESTIMATE(rows = ....) hints, it always decides to use a HASH JOIN instead of a merge sort join.
So I wonder, how is a HASH JOIN possible when both tables are very large?
Thanks,
Yang
Hash joins obviously work best when everything can fit in memory. But that does not mean they are not still the best join method when the table can't fit in memory. I think the only other realistic join method is a merge sort join.
If the hash table can't fit in memory, then the sort for the merge sort join can't fit in memory either. And the merge join needs to sort both tables. In my experience, hashing is always faster than sorting, for joining and for grouping.
But there are some exceptions. From the Oracle® Database Performance Tuning Guide, The Query Optimizer:
Hash joins generally perform better than sort merge joins. However,
sort merge joins can perform better than hash joins if both of the
following conditions exist:
The row sources are sorted already.
A sort operation does not have to be done.
Test
Instead of creating hundreds of millions of rows, it's easier to force Oracle to only use a very small amount of memory.
This chart shows that hash joins outperform merge joins, even when the tables are too large to fit in (artificially limited) memory:
Notes
For performance tuning it's usually better to use bytes than number of rows. But the "real" size of the table is a difficult thing to measure, which is why the chart displays rows. The sizes go approximately from 0.375 MB up to 14 MB. To double-check that these queries are really writing to disk you can run them with /*+ gather_plan_statistics */ and then query v$sql_plan_statistics_all.
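For example, something along these lines (a sketch; plug in the sql_id of the test query, which you can look up in v$sql) shows the work area and temp segment usage per plan step:
select id, operation, options, object_name,
       last_memory_used, last_tempseg_size, last_execution
from v$sql_plan_statistics_all
where sql_id = '&sql_id'
order by id;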
I only tested hash joins vs merge sort joins. I didn't fully test nested loops because that join method is always incredibly slow with large amounts of data. As a sanity check, I did compare it once with the last data size, and it took at least several minutes before I killed it.
I also tested with different _area_sizes, ordered and unordered data, and different distinctness of the join column (more matches is more CPU-bound, fewer matches is more I/O-bound), and got relatively similar results.
However, the results were different when the amount of memory was ridiculously small. With only 32K sort|hash_area_size, merge sort join was significantly faster. But if you have so little memory you probably have more significant problems to worry about.
There are still many other variables to consider, such as parallelism, hardware, bloom filters, etc. People have probably written books on this subject, I haven't tested even a small fraction of the possibilities. But hopefully this is enough to confirm the general consensus that hash joins are best for large data.
Code
Below are the scripts I used:
--Drop objects if they already exist
drop table test_10k_rows purge;
drop table test1 purge;
drop table test2 purge;
--Create a small table to hold rows to be added.
--("connect by" would run out of memory later when _area_sizes are small.)
--VARIABLE: More or less distinct values can change results. Changing
--"level" to something like "mod(level,100)" will result in more joins, which
--seems to favor hash joins even more.
create table test_10k_rows(a number, b number, c number, d number, e number);
insert /*+ append */ into test_10k_rows
select level a, 12345 b, 12345 c, 12345 d, 12345 e
from dual connect by level <= 10000;
commit;
--Restrict memory size to simulate running out of memory.
alter session set workarea_size_policy=manual;
--1 MB for hashing and sorting
--VARIABLE: Changing this may change the results. Setting it very low,
--such as 32K, will make merge sort joins faster.
alter session set hash_area_size = 1048576;
alter session set sort_area_size = 1048576;
--Tables to be joined
create table test1(a number, b number, c number, d number, e number);
create table test2(a number, b number, c number, d number, e number);
--Type to hold results
create or replace type number_table is table of number;
set serveroutput on;
--
--Compare hash and merge joins for different data sizes.
--
declare
v_hash_seconds number_table := number_table();
v_average_hash_seconds number;
v_merge_seconds number_table := number_table();
v_average_merge_seconds number;
v_size_in_mb number;
v_rows number;
v_begin_time number;
v_throwaway number;
--Increase the size of the table this many times
c_number_of_steps number := 40;
--Join the tables this many times
c_number_of_tests number := 5;
begin
--Clear existing data
execute immediate 'truncate table test1';
execute immediate 'truncate table test2';
--Print headings. Use tabs for easy import into spreadsheet.
dbms_output.put_line('Rows'||chr(9)||'Size in MB'
||chr(9)||'Hash'||chr(9)||'Merge');
--Run the test for many different steps
for i in 1 .. c_number_of_steps loop
v_hash_seconds.delete;
v_merge_seconds.delete;
--Add about 0.375 MB of data (roughly - depends on lots of factors)
--The order by will store the data randomly.
insert /*+ append */ into test1
select * from test_10k_rows order by dbms_random.value;
insert /*+ append */ into test2
select * from test_10k_rows order by dbms_random.value;
commit;
--Get the new size
--(Sizes may not increment uniformly)
select bytes/1024/1024 into v_size_in_mb
from user_segments where segment_name = 'TEST1';
--Get the rows. (select from both tables so they are equally cached)
select count(*) into v_rows from test1;
select count(*) into v_rows from test2;
--Perform the joins several times
for i in 1 .. c_number_of_tests loop
--Hash join
v_begin_time := dbms_utility.get_time;
select /*+ use_hash(test1 test2) */ count(*) into v_throwaway
from test1 join test2 on test1.a = test2.a;
v_hash_seconds.extend;
v_hash_seconds(i) := (dbms_utility.get_time - v_begin_time) / 100;
--Merge join
v_begin_time := dbms_utility.get_time;
select /*+ use_merge(test1 test2) */ count(*) into v_throwaway
from test1 join test2 on test1.a = test2.a;
v_merge_seconds.extend;
v_merge_seconds(i) := (dbms_utility.get_time - v_begin_time) / 100;
end loop;
--Get average times. Throw out the best and worst result.
select ( sum(column_value) - max(column_value) - min(column_value) )
/ (count(*) - 2)
into v_average_hash_seconds
from table(v_hash_seconds);
select ( sum(column_value) - max(column_value) - min(column_value) )
/ (count(*) - 2)
into v_average_merge_seconds
from table(v_merge_seconds);
--Display size and times
dbms_output.put_line(v_rows||chr(9)||v_size_in_mb||chr(9)
||v_average_hash_seconds||chr(9)||v_average_merge_seconds);
end loop;
end;
/
So I wonder how is HASH JOIN possible in the case of both tables being very large?
It would be done in multiple passes: the driven table is read and hashed in chunks, the leading table is scanned several times.
This means that with limited memory a hash join scales as O(N^2) while a merge join scales as O(N) (with no sorting needed, of course), and on really large tables a merge join outperforms a hash join. However, the tables need to be really large for the benefit of a single read to outweigh the drawbacks of non-sequential access, and you would need all the data from them (usually aggregated).
Given the RAM sizes of modern servers, we are talking about really large reports on really large databases which take hours to build, not something you would really see in everyday life.
MERGE JOIN may also be useful when the output recordset is limited with rownum < N. But this means that the joined inputs should already be sorted, which means they should both be indexed, which means NESTED LOOPS is available too, and that is what the optimizer usually chooses, since it is more efficient when the join condition is selective.
With their current implementations, MERGE JOIN always scans and NESTED LOOPS always seeks, while a smarter combination of both methods (backed by statistics) would be preferable.
You may want to read this article in my blog:
Things SQL needs: MERGE JOIN that would seek
A hash join does not have to fit the whole table into memory, only the rows which match the WHERE conditions on that table (or perhaps even just a hash value plus the rowid - I'm not sure about that).
So when Oracle decides that the selectivity of the part of the where conditions affecting one of the tables is good enough (i.e. few rows will have to be hashed), it might prefer a hash join even for very large tables.
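A small illustration of that point (the table and column names here are made up): even if both tables are huge, only the rows of the build side that survive its own filter have to be hashed, so the hash table can still fit in memory:
select /*+ use_hash(c o) */ count(*)
from customers c
join orders o on o.customer_id = c.customer_id
where c.signup_date >= date '2024-01-01'; --only these customer rows go into the hash table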