I have a table with four columns
|-----|-----|-----|-----------|
| a | b | c | d |
| int | int | int | timestamp |
|-----|-----|-----|-----------|
This table contains more than 100 000 000 records.
I have indices on all four columns and one compound index on (a,b,c).
If I run the following query, it works fine (few milliseconds):
SELECT
count(*) FROM my_table
WHERE
a = X AND b = Y AND c = Z
It basically returns about 3 thousand elements.
However if I want to add a condition on column d (which is a timestamp):
SELECT
count(*) FROM my_table
WHERE
a = X AND b = Y AND c = Z AND d < '2018-01-01T00:00:00'
Then the query response time jumps to minutes.
What am I missing here ?
Since you have a compound index on (a,b,c), the first query only needs to use the index (see concept of covering indexes), therefore the results can be served very quickly. The server does not even have to open the table itself.
When you add the criterion on column d, MariaDB can no longer use the compound index as a covering index. The index will still be used to find the records matching the first 3 criteria, but then MariaDB has to go to the big table and filter on column d, without using any index, to get the records matching the 4th criterion. Depending on how selective your compound index is, this can still take a lot of time.
You can experiment with creating an index on all 4 columns, but the overall price may be greater than the gain.
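The covering-index effect can be reproduced on a small scale. The sketch below uses SQLite (through Python's sqlite3 module) purely as a stand-in, since it reports "COVERING INDEX" in its query plan; MariaDB's optimizer and EXPLAIN output differ, so treat this as an illustration of the concept rather than a prediction for the original table. The index name idx and the constant values are made up for the example.

```python
import sqlite3

def plan_detail(index_columns, where_clause):
    """Return SQLite's query-plan description for a count(*) query.

    A fresh in-memory table is created each time so that only one
    index (on index_columns) is available to the planner.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (a INT, b INT, c INT, d TEXT)")
    con.execute(f"CREATE INDEX idx ON my_table ({index_columns})")
    rows = con.execute(
        "EXPLAIN QUERY PLAN SELECT count(*) FROM my_table WHERE " + where_clause
    ).fetchall()
    con.close()
    # The last column of each plan row is the human-readable description.
    return " | ".join(row[-1] for row in rows)

three_criteria = "a = 1 AND b = 2 AND c = 3"
four_criteria = three_criteria + " AND d < '2018-01-01T00:00:00'"

# Index on (a, b, c), query on a, b, c: answered from the index alone.
plan_abc_only = plan_detail("a, b, c", three_criteria)
# Same index, extra condition on d: the table rows must be visited.
plan_abc_with_d = plan_detail("a, b, c", four_criteria)
# Index on all four columns: covering again.
plan_abcd_with_d = plan_detail("a, b, c, d", four_criteria)

for label, plan in [
    ("(a,b,c) index, 3 criteria", plan_abc_only),
    ("(a,b,c) index, 4 criteria", plan_abc_with_d),
    ("(a,b,c,d) index, 4 criteria", plan_abcd_with_d),
]:
    print(label, "->", plan)
```

On the real table, comparing EXPLAIN output before and after adding a four-column index is the equivalent check.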
Related
I use AWS-EMR to run my Hive queries and I have a performance issue while running hive version 0.13.1.
The newer version of hive took around 5 minutes for running 10 rows of data. But the same script for 230804 rows is taking 2 days and is still running. What should I do to analyze and fix the problem?
Sample Data:
Table 1:
hive> describe foo;
OK
orderno string
Time taken: 0.101 seconds, Fetched: 1 row(s)
Sample data for table1:
hive>select * from foo;
OK
1826203307
1826207803
1826179498
1826179657
Table 2:
hive> describe de_geo_ip_logs;
OK
id bigint
startorderno bigint
endorderno bigint
itemcode int
Time taken: 0.047 seconds, Fetched: 4 row(s)
Sample data for Table 2:
hive> select * from bar;
127698025 417880320 417880575 306
127698025 3038626048 3038626303 584
127698025 3038626304 3038626431 269
127698025 3038626560 3038626815 163
My Query:
SELECT b.itemcode
FROM foo a, bar b
WHERE a.orderno BETWEEN b.startorderno AND b.endorderno;
In the very top of your Hive log output, it states "Warning: Shuffle Join JOIN[4][Tables a, b] in Stage 'Stage-1 Mapred' is a cross product."
EDIT:
A 'cross product' or Cartesian product is a join without a usable join condition: it pairs every row in the 'b' table with every row in the 'a' table. So if 'a' has 5 rows and 'b' has 10 rows, you get the product, 5 multiplied by 10 = 50 rows, and only then is the WHERE clause applied to discard the pairs that do not match.
Now, if you have a table 'a' of 20,000 rows and join it to another table 'b' of 500,000 rows, you are asking the SQL engine to build a data set 'a, b' of 10,000,000,000 rows, and then perform the BETWEEN operation on all 10 billion rows.
So, if you can cut down the number of 'b' rows, the execution time will drop accordingly. In your example, I am guessing that table 2 (the ip_logs table) has more rows than your order number table, so filtering it is where I would start.
END EDIT
You're forcing the execution engine to work through a Cartesian product by not specifying a condition for the join. It's having to scan all of table a over and over. With 10 rows, you will not have a problem. With 20k, you are running into dozens of map/reduce waves.
Try this query:
SELECT b.itemcode
FROM foo a JOIN bar b on <SomeKey>
WHERE a.orderno BETWEEN b.startorderno AND b.endorderno;
But I'm having trouble figuring out what column your model will allow joining on. Maybe the data model for this expression could be improved? It may just be me not reading the sample clearly.
Either way, you need to reduce the number of comparisons BEFORE the WHERE clause. Another way I have done this in Hive is to make a view with a smaller set of data, and join/match the view instead of the original table.
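To make the cost difference concrete, here is a small Python sketch (not Hive) of the same range lookup done without a cross product. It assumes the ranges in table 2 never overlap, which the sample rows suggest but the question does not state; the order numbers used here are invented so that some of them fall inside the sample ranges.

```python
from bisect import bisect_right

# Rows shaped like table 2: (startorderno, endorderno, itemcode).
ranges = [
    (417880320, 417880575, 306),
    (3038626048, 3038626303, 584),
    (3038626304, 3038626431, 269),
    (3038626560, 3038626815, 163),
]
# Hypothetical order numbers shaped like table 1 (stored as strings).
orders = ["417880400", "3038626100", "3038626500", "1826203307"]

# Sort once by range start so each lookup is a binary search.
ranges.sort()
starts = [r[0] for r in ranges]

def itemcode_for(orderno):
    """Return the itemcode whose range contains orderno, or None.

    Assumes ranges do not overlap, so only the last range starting
    at or before orderno can possibly contain it.
    """
    pos = bisect_right(starts, orderno) - 1
    if pos < 0:
        return None
    start, end, itemcode = ranges[pos]
    return itemcode if start <= orderno <= end else None

matched = []
for orderno in orders:
    code = itemcode_for(int(orderno))
    if code is not None:
        matched.append((orderno, code))
print(matched)

# Pair count a cross product would evaluate for the sizes discussed above.
naive_comparisons = 20_000 * 500_000
print(naive_comparisons)
```

Each lookup here costs a binary search instead of a scan of every range, which is the kind of reduction a proper join key or a pre-filtered view is meant to achieve in Hive.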
I want to identify all rows whose content in a clob column is not unique.
The query I use is:
select
id,
clobtext
from
table t
where
(select count(*) from table innerT where dbms_lob.compare(innerT.clobtext, t.clobtext) = 0)>1
However, this query is very slow. Any suggestions to speed it up? I already tried using the dbms_lob.getlength function to eliminate more elements in the subquery, but it didn't really improve the performance (feels the same).
To make it more clear an example:
table
ID | clobtext
1 | a
2 | b
3 | c
4 | d
5 | a
6 | d
After running the query. I'd like to get (order doesn't matter):
1 | a
4 | d
5 | a
6 | d
In the past I've generated checksums (in my C# code) for each clob.
Whilst this will incur a one-off increase in I/O (to generate the checksum),
subsequent scans will be quicker, and you can index the value too.
TK has a good PL/SQL example here:
Ask Tom
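As a rough illustration of the checksum idea, the Python sketch below groups the example rows from the question by a SHA-256 digest of their text and reports the ids that share a digest. The choice of SHA-256 is an assumption made for the example; the answer does not say which checksum was used. In the database the same effect would come from storing the digest in an indexed column and grouping on it.

```python
import hashlib
from collections import defaultdict

# The example rows from the question: (ID, clobtext).
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "a"), (6, "d")]

def checksum(text):
    """Fixed-size digest that stands in for the full CLOB content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# One pass over the data: bucket row ids by checksum.
ids_by_checksum = defaultdict(list)
for row_id, clobtext in rows:
    ids_by_checksum[checksum(clobtext)].append(row_id)

# Any bucket with more than one id holds non-unique content.
duplicate_ids = sorted(
    row_id
    for ids in ids_by_checksum.values()
    if len(ids) > 1
    for row_id in ids
)
print(duplicate_ids)
```

Two different texts could in principle share a digest, so a cautious implementation would confirm matches within each bucket with a full comparison such as dbms_lob.compare.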
I have 600,000 records and want to fetch 10 of them at a time, because I display only 10 records in the grid. My stored procedure works fine when I fetch records from the first 10,000 rows (e.g. rows 500-510), but execution time increases as the row number increases; for example, fetching rows 100,000-100,010 takes noticeably longer.
Can anyone please help? I use ROW_NUMBER() to get the row number and BETWEEN to retrieve the data.
Please suggest an optimized way to get the records.
The stored procedure creates a SQL query as given below:
SELECT FuelClaimId from
( SELECT fc.FuelClaimId,ROW_NUMBER() OVER ( order by fc.FuelClaimId ) AS RowNum
from FuelClaims fc
INNER JOIN Vehicles v on fc.VehicleId =v.VehicleId
INNER JOIN Drivers d on d.DriverId =v.OfficialID
INNER JOIN Departments de on de.DepartmentId =d.DepartmentId
INNER JOIN Provinces p on de.ProvinceId =p.ProvinceId
INNER JOIN FuelRates f on f.FuelRateId =fc.FuelRateId
INNER JOIN FuelClaimStatuses fs on fs.FuelClaimStatusId= fc.statusid
INNER JOIN LogsheetMonths l on l.LogsheetMonthId =f.LogsheetMonthId
Where fc.IsDeleted = 0) AS MyDerivedTable WHERE MyDerivedTable.RowNum BETWEEN
600000 And 600010
Try this instead:
SELECT TOP 10 fc.FuelClaimId
FROM FuelClaims fc
INNER JOIN Vehicles v ON fc.VehicleId = v.VehicleId
INNER JOIN Drivers d ON d.DriverId = v.OfficialID
INNER JOIN Departments de ON de.DepartmentId = d.DepartmentId
INNER JOIN Provinces p ON de.ProvinceId = p.ProvinceId
INNER JOIN FuelRates f ON f.FuelRateId = fc.FuelRateId
INNER JOIN FuelClaimStatuses fs ON fs.FuelClaimStatusId = fc.statusid
INNER JOIN LogsheetMonths l ON l.LogsheetMonthId = f.LogsheetMonthId
WHERE fc.IsDeleted = 0 AND fc.FuelClaimId BETWEEN 600001 AND 600010
ORDER BY fc.FuelClaimId
Also BETWEEN is inclusive so BETWEEN 10 and 20 actually returns 10,11,12,13,14,15,16,17,18,19 and 20 so 11 rows not 10. As identity values usually start at 1 you really want BETWEEN 11 AND 20 (hence 600001 in the above)
The above query should fix your issue where your performance degrades as you query the larger range of items.
While it won't always return 10 records, the fix for that is:
WHERE fc.IsDeleted = 0 AND fc.FuelClaimId > @LastMaxFuelClaimId
where @LastMaxFuelClaimId is the MAX FuelClaimId returned by the previous query execution.
Edit: The reason it keeps getting slower is that it has to read more and more of the table to reach the next chunk. It doesn't skip the first 600,000 records; it reads them all and then returns only the next 10, so each query re-reads all the previous records. The query above does not suffer from the same problem.
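The "remember the last id" approach can be sketched as follows. This uses SQLite through Python so that it runs anywhere, which means LIMIT takes the place of SQL Server's TOP; the table is reduced to two columns and filled with made-up data, with every 7th row marked deleted so that the ids of live rows have gaps.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE FuelClaims (FuelClaimId INTEGER PRIMARY KEY, IsDeleted INTEGER)"
)
# Made-up data: ids 1..1000, every 7th one flagged as deleted.
con.executemany(
    "INSERT INTO FuelClaims VALUES (?, ?)",
    [(i, 1 if i % 7 == 0 else 0) for i in range(1, 1001)],
)

def next_page(last_max_id, page_size=10):
    """Fetch the page that follows last_max_id.

    The primary key lets the engine seek straight to the first
    qualifying id instead of numbering every earlier row.
    """
    rows = con.execute(
        "SELECT FuelClaimId FROM FuelClaims "
        "WHERE IsDeleted = 0 AND FuelClaimId > ? "
        "ORDER BY FuelClaimId LIMIT ?",
        (last_max_id, page_size),
    ).fetchall()
    return [r[0] for r in rows]

page1 = next_page(0)
page2 = next_page(page1[-1])  # pass the largest id seen so far
print(page1)
print(page2)
```

Each page is full even though deleted rows leave gaps in the id sequence, which a fixed BETWEEN range on the id cannot guarantee.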
You should post an execution plan but a probable cause of performance problems would be inadequate or lack of indexing.
Make sure you have
an index on all your foreign key relations
a covering index on the fields you retrieve and select from
Covering Index
CREATE INDEX IX_FUELCLAIMS_FUELCLAIMID_ISDELETED
ON dbo.FuelClaims (FuelClaimId, VehicleID, IsDeleted)
Have the following tables (Oracle 10g):
catalog (
id NUMBER PRIMARY KEY,
name VARCHAR2(255),
owner NUMBER,
root NUMBER REFERENCES catalog(id)
...
)
university (
id NUMBER PRIMARY KEY,
...
)
securitygroup (
id NUMBER PRIMARY KEY
...
)
catalog_securitygroup (
catalog REFERENCES catalog(id),
securitygroup REFERENCES securitygroup(id)
)
catalog_university (
catalog REFERENCES catalog(id),
university REFERENCES university(id)
)
Catalog: 500 000 rows, catalog_university: 500 000, catalog_securitygroup: 1 500 000.
I need to select any 50 rows from catalog with specified root ordered by name for current university and current securitygroup. There is a query:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Here 100 is some catalog, 200 is some securitygroup, and 300 is some university. This query returns 50 rows out of ~170,000 in 3 minutes.
But the next query returns its rows in 2 seconds:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
I built the following indexes: (catalog.id, catalog.name, catalog.owner), (catalog_securitygroup.catalog, catalog_securitygroup.index), (catalog_university.catalog, catalog_university.university).
Plan for first query (using PLSQL Developer):
http://habreffect.ru/66c/f25faa5f8/plan2.jpg
Plan for second query:
http://habreffect.ru/f91/86e780cc7/plan1.jpg
What are the ways to optimize the query I have?
The indexes that can be useful and should be considered deal with
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
So the following fields can be interesting for indexes
c: id, root
cs: catalog, securitygroup
cu: catalog, university
So, try creating
(catalog_securitygroup.catalog, catalog_securitygroup.securitygroup)
and
(catalog_university.catalog, catalog_university.university)
EDIT:
I missed the ORDER BY - these fields should also be considered, so
(catalog.name, catalog.id)
might be beneficial (or some other composite index that could be used for sorting and the conditions - possibly (catalog.root, catalog.name, catalog.id))
EDIT2
Although another answer is accepted, I'll provide some more food for thought.
I have created some test data and run some benchmarks.
The test cases are minimal in terms of record width (in catalog_securitygroup and catalog_university the primary keys are (catalog, securitygroup) and (catalog, university)). Here is the number of records per table:
test=# SELECT (SELECT COUNT(*) FROM catalog), (SELECT COUNT(*) FROM catalog_securitygroup), (SELECT COUNT(*) FROM catalog_university);
?column? | ?column? | ?column?
----------+----------+----------
500000 | 1497501 | 500000
(1 row)
The database is PostgreSQL 8.4 (default Ubuntu install) on an i5 with 4 GB RAM.
First I rewrote the query to
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root < 50
AND cs.catalog = c.id
AND cu.catalog = c.id
AND cs.securitygroup < 200
AND cu.university < 200
ORDER BY c.name
LIMIT 50 OFFSET 100
Note: the conditions are turned into less-than comparisons to maintain a comparable number of intermediate rows (the above query would return 198,801 rows without the LIMIT clause).
If run as above, without any extra indexes (save for PKs and foreign keys), it runs in 556 ms on a cold database. (This is actually an indication that I oversimplified the sample data somehow; I would be happier if I had 2-4s here without resorting to less-than operators.)
This brings me to my point: any straight query that only joins and filters (a certain number of tables) and returns only a certain number of records should run in under 1s on any decent database, without the need to use cursors or to denormalize data (one of these days I'll have to write a post on that).
Furthermore, if a query is returning only 50 rows and does simple equality joins and restrictive equality conditions it should run even much faster.
Now let's see if I add some indexes, the biggest potential in queries like this is usually the sort order, so let me try that:
CREATE INDEX test1 ON catalog (name, id);
This makes execution time on the query - 22ms on a cold database.
And that's the point: if you are trying to get only a page of data, you should only get a page of data, and queries such as this on normalized data with proper indexes should take less than 100ms on decent hardware.
I hope I didn't oversimplify the case to the point of no comparison (as I stated before some simplification is present as I don't know the cardinality of relationships between catalog and the many-to-many tables).
So, the conclusion is
if I were you I would not stop tweaking indexes (and the SQL) until the query runs in under 200ms, as a rule of thumb.
only if I found an objective explanation why it can't go below that value would I resort to denormalisation and/or cursors, etc.
First I assume that your University and SecurityGroup tables are rather small. You posted the size of the large tables but it's really the other sizes that are part of the problem
Your problem is from the fact that you can't join the smallest tables first. Your join order should be from small to large. But because your mapping tables don't include a securitygroup-to-university table, you can't join the smallest ones first. So you wind up starting with one or the other, to a big table, to another big table and then with that large intermediate result you have to go to a small table.
If you always have current_univ and current_secgrp and root as inputs you want to use them to filter as soon as possible. The only way to do that is to change your schema some. In fact, you can leave the existing tables in place if you have to but you'll be adding to the space with this suggestion.
You've normalized the data very well. That's great for speed of update... not so great for querying. We denormalize to speed querying (that's the whole reason for datawarehouses (ok that and history)). Build a single mapping table with the following columns.
Univ_id, SecGrp_ID, Root, catalog_id. Make it an index organized table of the first 3 columns as pk.
Now when you query that index with all three PK values, you'll finish that index scan with a complete list of allowable catalog IDs. Then it's just a single join to the catalog table to get the item details and you're off and running.
The Oracle cost-based optimizer makes use of all the information that it has to decide what the best access paths are for the data and what the least costly methods are for getting that data. So below are some random points related to your question.
The first three tables that you've listed all have primary keys. Do the other tables (catalog_university and catalog_securitygroup) also have primary keys on them? A primary key defines a column or set of columns that are non-null and unique, and is very important in a relational database.
Oracle generally enforces a primary key by generating a unique index on the given columns. The Oracle optimizer is more likely to make use of a unique index if it is available, as it is likely to be more selective.
If possible an index that contains unique values should be defined as unique (CREATE UNIQUE INDEX...) and this will provide the optimizer with more information.
The additional indexes that you have provided are no more selective than the existing indexes. For example, the index on (catalog.id, catalog.name, catalog.owner) is unique but is less useful than the existing primary key index on (catalog.id). If a query is written to select on the catalog.name column, it is possible to do an index skip scan, but this starts being costly (and might not even be possible in this case).
Since you are trying to select based on the catalog.root column, it might be worth adding an index on that column. This would mean that it could quickly find the relevant rows from the catalog table. The timing for the second query could be a bit misleading. It might be taking 2 seconds to find 50 matching rows from catalog, but these could easily be the first 50 rows from the catalog table. Finding 50 that match all your conditions might take longer, and not just because you need to join to other tables to get them. I would always use create table as select without restricting on rownum when trying to performance tune. With a complex query I would generally care about how long it takes to get all the rows back, and a simple select with rownum can be misleading.
Everything about Oracle performance tuning is about providing the optimizer enough information and the right tools (indexes, constraints, etc) to do its job properly. For this reason it's important to get optimizer statistics using something like DBMS_STATS.GATHER_TABLE_STATS(). Indexes should have stats gathered automatically in Oracle 10g or later.
Somehow this grew into quite a long answer about the Oracle optimizer. Hopefully some of it answers your question. Here is a summary of what is said above:
Give the optimizer as much information as possible, e.g if index is unique then declare it as such.
Add indexes on your access paths
Find the correct times for queries without limiting by rownum. It will always be quicker to find the first 50 M&Ms in a jar than to find the first 50 red M&Ms.
Gather optimizer stats
Add unique/primary keys on all tables where they exist.
The use of rownum is wrong and causes all the rows to be processed. It will process all the rows, assign them all a row number, and then find those between 0 and 50. What you want to look for in the explain plan is COUNT STOPKEY rather than just COUNT.
The query below should be an improvement as it will only get the first 50 rows... but there is still the issue of the joins to look at too:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
where rownum <= 50
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Also, assuming this for a web page or something similar, maybe there is a better way to handle this than just running the query again to get the data for the next page.
Try declaring a cursor. I don't know Oracle, but in SQL Server it would look like this:
declare @result
table (
id numeric,
name varchar(255)
);
declare __dyn_select_cursor cursor LOCAL SCROLL DYNAMIC for
--Select
select distinct
c.id, c.name
From [catalog] c
-- the catalog/university columns live in the catalog_university mapping table
inner join catalog_university u
on u.catalog = c.id
and u.university = 300
inner join catalog_securitygroup s
on s.catalog = c.id
and s.securitygroup = 200
Where
c.root = 100
Order by name
--Cursor
declare @id numeric;
declare @name varchar(255);
declare @maxrowscount int;
set @maxrowscount = 50;
open __dyn_select_cursor;
fetch relative 1 from __dyn_select_cursor into @id, @name;
while (@@fetch_status = 0 and @maxrowscount <> 0)
begin
insert into @result values (@id, @name);
set @maxrowscount = @maxrowscount - 1;
fetch next from __dyn_select_cursor into @id, @name;
end
close __dyn_select_cursor;
deallocate __dyn_select_cursor;
--Select temp, final result
select
id,
name
from @result;
I have an Oracle 10.2.0.3 database, and a query like this:
select count(a.id)
from LARGE_PARTITIONED_TABLE a
join SMALL_NONPARTITIONED_TABLE b on a.key1 = b.key1 and a.key2 = b.key2
where b.id = 1000
Table LARGE_PARTITIONED_TABLE (a) has about 5 million rows, and is partitioned by a column not present in the query. Table SMALL_NONPARTITIONED_TABLE (b) is not partitioned, and holds about 10000 rows.
Statistics are up-to-date, and there are height balanced histograms in columns key1 and key2 of table a.
Table a has a primary key and a global, nonpartitioned unique index on columns key1, key2, key3, key4, and key5.
Explain plan for the query displays the following results:
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 31 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 31 | | |
| 2 | NESTED LOOPS | | 406 | 12586 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| INDEX_ON_TABLE_B | 1 | 19 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| PRIMARY_KEY_INDEX_OF_TABLE_A | 406 | 4872 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("b"."id"=1000)
4 - access("a"."key1"="b"."key1" and
"a"."key2"="b"."key2")
Thus the rows (cardinality) estimated for step 4 is 406.
Now, a tkprof trace reveals the following:
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=51 pr=9 pw=0 time=74674 us)
7366 NESTED LOOPS (cr=51 pr=9 pw=0 time=824941 us)
1 INDEX RANGE SCAN INDEX_ON_TABLE_B (cr=2 pr=0 pw=0 time=36 us)(object id 111111)
7366 INDEX RANGE SCAN PRIMARY_KEY_INDEX_OF_TABLE_A (cr=49 pr=9 pw=0 time=810173 us)(object id 222222)
So the cardinality in reality was 7366, not 406!
My question is this: From where does Oracle get the estimated cardinality of 406 in this case, and how can I improve its accuracy, so that the estimate is more in line of what really happens during query execution?
Update: Here is a snippet of a 10053 trace I ran on the query.
NL Join
Outer table: Card: 1.00 Cost: 2.00 Resp: 2.00 Degree: 1 Bytes: 19
Inner table: LARGE_PARTITIONED_TABLE Alias: a
...
Access Path: index (IndexOnly)
Index: PRIMARY_KEY_INDEX_OF_TABLE_A
resc_io: 2.00 resc_cpu: 27093
ix_sel: 1.3263e-005 ix_sel_with_filters: 1.3263e-005
NL Join (ordered): Cost: 4.00 Resp: 4.00 Degree: 1
Cost_io: 4.00 Cost_cpu: 41536
Resp_io: 4.00 Resp_cpu: 41536
****** trying bitmap/domain indexes ******
Best NL cost: 4.00
resc: 4.00 resc_io: 4.00 resc_cpu: 41536
resp: 4.00 resp_io: 4.00 resp_cpu: 41536
Using concatenated index cardinality for table SMALL_NONPARTITIONED_TABLE
Revised join sel: 8.2891-e005 = 8.4475e-005 * (1/12064.00) * (1/8.4475e-005)
Join Card: 405.95 = outer (1.00) * inner (4897354.00) * sel (8.2891-e005)
Join Card - Rounded: 406 Computed: 405.95
So that is where the value 406 is coming from. Like Adam answered, join cardinality is join selectivity * filter cardinality (a) * filter cardinality (b), as can be seen on the second to last line of above trace quote.
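The arithmetic on the last lines of the trace can be replayed in a few lines of Python. The variable names are labels chosen for this sketch, not Oracle terminology; the numbers are copied from the trace, with the selectivity written in standard e-notation.

```python
# Values copied from the 10053 trace excerpt.
outer_card = 1.00            # rows expected from SMALL_NONPARTITIONED_TABLE
inner_card = 4897354.00      # rows in LARGE_PARTITIONED_TABLE
rows_in_small_table = 12064.00
other_factor = 8.4475e-005   # appears twice on the "Revised join sel" line

# Revised join sel = 8.4475e-005 * (1/12064.00) * (1/8.4475e-005)
revised_join_sel = other_factor * (1 / rows_in_small_table) * (1 / other_factor)

# Join Card = outer * inner * sel
join_card = outer_card * inner_card * revised_join_sel

print(f"revised join sel: {revised_join_sel:.4e}")
print(f"join card:        {join_card:.2f}")
print(f"rounded:          {round(join_card)}")
```

Because the 8.4475e-005 factor is multiplied in and divided out again, the revised selectivity reduces to 1/12064, and the estimate is simply 4897354 / 12064.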
What I do not understand is the Revised join sel line. 1/12064 is the selectivity of the index used to find the row from table b (12064 rows on table, and select based on unique id). But so what?
Cardinality appears to be calculated by multiplying the filter cardinality of table a (4897354) with the selectivity of table b (1/12064). Why? What does the selectivity on table b have to do with how many rows are expected to be found from table a, when the a --> b join is not based on b.id?
Where does the number 8.4475e-005 come from (it does not appear anywhere else in the whole trace)? Not that it affects the output, but I'd still like to know.
I understand that the optimizer has likely chosen the correct path here. But still the cardinality is miscalculated - and that can have a major effect on the execution path that is chosen from that point onwards (as in the case I'm having IRL - this example is a simplification of that).
Generating a 10053 trace file will help show exactly what choices the optimizer is making regarding its estimation of cardinality and selectivity. Jonathan Lewis' Cost Based Oracle Fundamentals is an excellent resource for understanding how the optimizer works, and the printing I have spans 8i to 10.1.
From that work:
Join Selectivity = ((num_rows(t1) - num_nulls(t1.c1)) / num_rows(t1))
* ((num_rows(t2) - num_nulls(t2.c2)) / num_rows(t2))
/ greater (num_distinct(t1.c1), num_distinct(t2.c2))
Join Cardinality = Join Selectivity
* filtered_cardinality (t1)
* filtered_cardinality (t2)
However, because we have a multi-column join, Join Selectivity isn't at the table level, it's the product (intersection) of the join selectivities on each column. Assuming there's no nulls in play:
Join Selectivity = Join Selectivity (key1) * Join Selectivity (key2)
Join Selectivity (key1) = ((5,000,000 - 0) / 5,000,000)
* ((10,000 - 0) / 10,000)
/ max (116, ?) -- distinct key1 values in B
= 1 / max(116, distinct_key1_values_in_B)
Join Selectivity (key2) = ((5,000,000 - 0) / 5,000,000)
* ((10,000 - 0) / 10,000)
/ max (650, ?) -- distinct key2 values in B
= 1 / max(650, distinct_key2_values_in_B)
Join Cardinality = JS(key1) * JS(key2)
* Filter_cardinality(a) * Filter_cardinality(b)
We know that there are no filters on A, so that table's filter cardinality is its number of rows. We're selecting a single key value from B, so that table's filter cardinality is 1.
So the best case for the estimated join cardinality is now
Join Cardinality = 1/116 * 1/650 * 5,000,000 * 1
=~ 66
It might be easier to work backward. Your estimated cardinality of 406, given what we know, leads to a join selectivity of 406/5,000,000, or approximately 1/12315. That happens to be close to 1 / (116^2), which is a sanity check within the optimizer to prevent it from finding too aggressive a cardinality on multi-column joins.
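The formula quoted from the book can be written out as a small Python function to check the numbers. The function name and parameters are made up for this sketch, and it assumes, as the answer does, that there are no nulls and that B has no more distinct key values than A (so 116 and 650 win the max()); a distinct count of 1 is passed for B only as a placeholder that satisfies that assumption.

```python
def join_selectivity(rows_1, nulls_1, distinct_1, rows_2, nulls_2, distinct_2):
    """Single-column join selectivity, following the formula quoted above."""
    non_null_fraction_1 = (rows_1 - nulls_1) / rows_1
    non_null_fraction_2 = (rows_2 - nulls_2) / rows_2
    return non_null_fraction_1 * non_null_fraction_2 / max(distinct_1, distinct_2)

# Round figures used in the answer: 5,000,000 rows in A, 10,000 rows in B.
js_key1 = join_selectivity(5_000_000, 0, 116, 10_000, 0, 1)
js_key2 = join_selectivity(5_000_000, 0, 650, 10_000, 0, 1)

# Join cardinality = JS(key1) * JS(key2) * filter_card(A) * filter_card(B)
best_case_cardinality = js_key1 * js_key2 * 5_000_000 * 1
print(f"best-case join cardinality: {best_case_cardinality:.2f}")

# Working backward from the optimizer's estimate of 406 rows.
implied_inverse_selectivity = 5_000_000 / 406
print(f"implied selectivity: 1/{implied_inverse_selectivity:.0f}")
print(f"116 squared: {116 ** 2}")
```

Printing the implied selectivity next to 116 squared makes it easy to judge how close the two values actually are.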
For the TL;DR crowd:
Get Jonathan Lewis' Cost Based Oracle Fundamentals.
Get a 10053 trace of the query whose behavior you can't understand.
The cardinality estimate would be based on the product of the selectivity of a.key1 and a.key2, which (at least in 10g) would each be based on the number of distinct values for those two columns as recorded in the column statistics.
For a table of 5M rows, a cardinality estimate of 406 is not significantly different to 7366. The question you have to ask yourself is, is the "inaccurate" estimate here causing a problem?
You can check what plan Oracle would choose if it were able to generate a perfectly accurate estimate by getting an explain plan for this:
select /*+CARDINALITY(a 7366)*/ count(a.id)
from LARGE_PARTITIONED_TABLE a
join SMALL_NONPARTITIONED_TABLE b on a.key1 = b.key1 and a.key2 = b.key2
where b.id = 1000;
If this comes up with the same plan, then the estimate that Oracle is calculating is already adequate.
You might be interested to read this excellent paper by Wolfgang Breitling which has a lot of info on CBO calculations: http://www.centrexcc.com/A%20Look%20under%20the%20Hood%20of%20CBO%20-%20the%2010053%20Event.pdf.
As explained there, because you have histograms, the filter-factor calculation for these columns does not use number of distinct values (NDV) but density, which is derived from the histogram in some way.
What are the DENSITY values in USER_TAB_COLUMNS for a.key1 and a.key2?
Generally, the problem in cases like this is that Oracle does not gather statistics on pairs of columns, and assumes that their combined filter factor will be the product of their individual factors. This will produce low estimates if there is any correlation between values of the two columns.
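A toy example shows how the independence assumption underestimates when columns are correlated. The data below is invented, and perfectly correlated (key2 always equals key1), which is an extreme case chosen to make the effect obvious; it is not a model of the tables in the question.

```python
# Invented data: 1000 rows in which key2 always equals key1.
rows = [(i % 10, i % 10) for i in range(1000)]
total = len(rows)

# Per-column selectivities for key1 = 3 and key2 = 3, measured separately.
sel_key1 = sum(1 for key1, _ in rows if key1 == 3) / total
sel_key2 = sum(1 for _, key2 in rows if key2 == 3) / total

# Independence assumption: combined selectivity is the product.
estimated_rows = sel_key1 * sel_key2 * total

# What the data actually contains.
actual_rows = sum(1 for key1, key2 in rows if key1 == 3 and key2 == 3)

print(f"estimated: {estimated_rows:.0f}, actual: {actual_rows}")
```

With uncorrelated columns the product would be about right; here the second predicate removes nothing that the first had not already removed, so the estimate comes out ten times too low.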
If this is causing a serious performance issue, I suppose you could create a function-based index on a function of those columns, and use that to do the lookup. Then Oracle would gather statistics on that index and probably produce better estimates.