Q: how to unnest bags from complicated data structure in PIG

Q: how to unnest bags from complicated data structure in PIG - hadoop

Originally I have structure like this:
+-------+-------+----+----+----+-----+
| time | type | s1 | s2 | id | p1 |
+-------+-------+----+----+----+-----+
| 10:30 | send | a | b | 1 | 110 |
| 10:35 | send | c | d | 1 | 120 |
| 10:31 | reply | e | f | 3 | 221 |
| 10:33 | reply | a | c | 1 | 210 |
| 10:34 | send | a | a | 3 | 113 |
| 10:32 | reply | c | d | 3 | 157 |
+-------+-------+----+----+----+-----+
I want to normalize the table:
group the entries by id,
inside each group, find out the oldest send type entry,
replace s1, s2 of other entries with the values from that oldest send type entry
```
+-------+-------+----+----+----+-----+
| time | type | s1 | s2 | id | p1 |
+-------+-------+----+----+----+-----+
| 10:30 | send | a | b | 1 | 110 |
| 10:35 | send | a | b | 1 | 120 |
| 10:33 | reply | a | b | 1 | 210 |
| 10:31 | reply | a | a | 3 | 221 |
| 10:34 | send | a | a | 3 | 113 |
| 10:32 | reply | a | a | 3 | 157 |
+-------+-------+----+----+----+-----+
this is how I tried to tackle the problem:
events_groupby_id = GROUP events BY id;
events_normalized = FOREACH events_groupby_id {
f_reqs = FILTER events BY type matches 'send';
o_reqs = ORDER events BY time ASC;
req = LIMIT o_reqs 1;
GENERATE req, events;
};
I am stuck at here. Because I found that events_normalized became a complicated structure with nested bags and I don't know how to flatten correctly.
events_normalized | req:bag{:tuple()} | events:bag{:tuple()}
From here, what should I do to achieve the data structure that I want? I would really appreciate it if anyone can help me out. Thank you.

You can unnest the bags in events_normalized using FLATTEN:
events_flattened = FOREACH events_normalized GENERATE
FLATTEN(req),
FLATTEN(events);
This creates a crossproduct between req and events, but since there is only one tuple in req, you end up with only one record for each of your original entries. The schema for events_flattened is:
req::time | req::type | req::s1 | req::s2 | req::id | req::p1 | events::time | events::type | events::s1 | events::s2 | events::id | events::p1
So now you can refer to the fields you wish to keep, using events for the original entries and req for the replacements from the oldest send type entry:
final = FOREACH events_flattened GENERATE
events::time AS time,
events::type AS type,
req::s1 AS s1,
req::s2 AS s2,
events::id AS id,
events::p1 AS p1;

Related

Another Combination

Very similar to my last question, now I want only the, "full combination," for a group in order of priority. So, from this source table:
+-------+-------+----------+
| GROUP | State | Priority |
+-------+-------+----------+
| 1 | MI | 1 |
| 1 | IA | 2 |
| 1 | CA | 3 |
| 1 | ND | 4 |
| 1 | AZ | 5 |
| 2 | IA | 2 |
| 2 | NJ | 1 |
| 2 | NH | 3 |
And so on...
I need a query that returns:
+-------+---------------------+
| GROUP | COMBINATION |
+-------+---------------------+
| 1 | MI, IA, CA, ND, AZ |
| 2 | NJ, IA, NH |
+-------+---------------------+
Thanks for the help, again!

Use listagg() ordering by priority within the group.
SELECT "GROUP",
listagg("STATE", ', ') WITHIN GROUP (ORDER BY "PRIORITY")
FROM "ELBAT"
GROUP BY "GROUP";
db<>fiddle

Pivot Table in Hive and Create Multiple Columns for Unique Combinations

I want to pivot the following table
| ID | Code | date | qty |
| 1 | A | 1/1/19 | 11 |
| 1 | A | 2/1/19 | 12 |
| 2 | B | 1/1/19 | 13 |
| 2 | B | 2/1/19 | 14 |
| 3 | C | 1/1/19 | 15 |
| 3 | C | 3/1/19 | 16 |
into
| ID | Code | mth_1(1/1/19) | mth_2(2/1/19) | mth_3(3/1/19) |
| 1 | A | 11 | 12 | 0 |
| 2 | B | 13 | 14 | 0 |
| 3 | C | 15 | 0 | 16 |
I am new to hive, i am not sure how to implement it.
NOTE: I don't want to do mapping because my month values change over time.

Determinate unique values from oracle join?

I need a way to avoid duplicate values from oracle join, I have this scenario.
The first table contain general information about a person.
+-----------+-------+-------------+
| ID | Name | Birtday_date|
+-----------+-------+-------------+
| 1 | Byron | 12/10/1998 |
| 2 | Peter | 01/11/1973 |
| 4 | Jose | 05/02/2008 |
+-----------+-------+-------------+
The second table contain information about a telephone of the people in the first table.
+-------+----------+----------+----------+
| ID |ID_Person |CELL_TYPE | NUMBER |
+-------+- --------+----------+----------+
| 1221 | 1 | 3 | 099141021|
| 2221 | 1 | 2 | 099091925|
| 3222 | 1 | 1 | 098041013|
| 4321 | 2 | 1 | 088043153|
| 4561 | 2 | 2 | 090044313|
| 5678 | 4 | 1 | 092049013|
| 8990 | 4 | 2 | 098090233|
+----- -+----------+----------+----------+
The Third table contain information about a email of the people in the first table.
+------+----------+----------+---------------+
| ID |ID_Person |EMAIL_TYPE| Email |
+------+- --------+----------+---------------+
| 221 | 1 | 1 |jdoe#aol.com |
| 222 | 1 | 2 |jdoe1#aol.com |
| 421 | 2 | 1 |xx12#yahoo.com |
| 451 | 2 | 2 |dsdsa#gmail.com|
| 578 | 4 | 1 |sasaw1#sdas.com|
| 899 | 4 | 2 |cvcvsd#wew.es |
| 899 | 4 | 2 |cvsd#www.es |
+------+----------+----------+---------------+
I was able to produce a result like this, you can check in this link http://sqlfiddle.com/#!4/8e326/1
+-----+-------+-------------+----------+----------+----------+----------------+
| ID | Name | Birtday_date| CELL_TYPE| NUMBER |EMAIL_TYPE|EMAIL|
+-----+-------+-------------+----------+----------+----------+----------------+
| 1 | Byron | 12/10/1998 | 3 | 099141021|1 |jdoe#aol.com |
| 1 | Byron | 12/10/1998 | 2 | 099091925|2 |jdoe1#aol.com |
| 1 | Byron | 12/10/1998 | 1 | 099091925| | |
| 2 | Peter | 01/11/1973 | 1 | 088043153|1 |xx12#yahoo.com |
| 2 | Peter | 01/11/1973 | 2 | 090044313|2 |dsdsa#gmail.com |
| 4 | Jose | 05/02/2008 | 1 | 092049013|1 |sasaw1#sdas.com |
| 4 | Jose | 05/02/2008 | 2 | 098090233|2 |cvcvsd#wew.es |
+-----+-------+-------------+----------+----------+----------+----------------+
If you check the data in table Email for user with ID_Person = 4 only present two of the three emails that have, the problem for this case is the person have more emails that cellphone numbers and only will present the same number of the cellphone numbers.
The result i expected is something like this.
+-----+-------+-------------+----------+----------+----------+----------------+
| ID | Name | Birtday_date| CELL_TYPE| NUMBER |EMAIL_TYPE|EMAIL|
+-----+-------+-------------+----------+----------+----------+----------------+
| 1 | Byron | 12/10/1998 | 3 | 099141021|1 |jdoe#aol.com |
| 1 | Byron | 12/10/1998 | 2 | 099091925|2 |jdoe1#aol.com |
| 1 | Byron | 12/10/1998 | 1 | 099091925| | |
| 2 | Peter | 01/11/1973 | 1 | 088043153|1 |xx12#yahoo.com |
| 2 | Peter | 01/11/1973 | 2 | 090044313|2 |dsdsa#gmail.com |
| 4 | Jose | 05/02/2008 | 1 | 092049013|1 |sasaw1#sdas.com |
| 4 | Jose | 05/02/2008 | 2 | 098090233|2 |cvcvsd#wew.es |
| 4 | Jose | 05/02/2008 | | |2 |cvsd#www.es |
+-----+-------+-------------+----------+----------+----------+----------------+
This is the way that i need to present the data.

I could not understand why your query was so complex, thus, added the simple full outer join and it seems to be working:
select distinct p.id, p.name,
case when Lag(CELL) over(partition by p.id order by p.id,pe.id) = CELL then null else cell_type end as cell_type,
case when Lag(CELL) over(partition by p.id order by p.id,pe.id) = CELL then null else CELL end as CELL,
EMAIL_TYPE as EMAIL_TYPE, EMAIL as EMAIL
from person p full outer join phones pe on p.id = pe.id
full outer join emails e
on p.id = e.id and pe.cell_type = e.email_type;

Slow aggregation on big neo4j graph

Configuration:
Windows 8.1
neo4j-enterprise-2.2.0-M03
cache type: hpc
8Gb RAM
6Gb for JVM Heap (wrapper.java.initmemory=6144 wrapper.java.maxmemory=6144)
5Gb out of 6Gb of JVM Heap for mapped memory (dbms.pagecache.memory=5G)
Model:
Model represents how users navigate through website.
27 522 896 nodes (394Mb)
111 294 796 relationships (3609Mb)
33 906 363 properties (1326Mb)
293 (:Page) nodes
27522603 (:PageView) nodes
0 (:User) nodes (not load yet)
each (:PageView) node connected with (:Page) node
each (:PageView) node connected with next (:PageView) node
each (:PageView) node connected with (:User) node (not yet)
Query
match (:Page {Name:'#########.aspx'})<-[:At]-(:PageView)-[:Next]->(:PageView)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
Profile info:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
195363 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(All)(0)
|
+Filter(1)
|
+Expand(All)(1)
|
+Filter(2)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID105, FRESHID110, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID105, FRESHID110 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID105, FRESHID110 | |
| Projection(1) | 776404 | 35941815 | 71883630 | FRESHID105, p | |
| Filter(0) | 776404 | 35941815 | 35941815 | p | (NOT(anon[38] == anon[78]) AND hasLabel(p:Page)) |
| Expand(All)(0) | 776404 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Filter(1) | 384001 | 13345621 | 13345621 | | hasLabel(anon[67]:PageView) |
| Expand(All)(1) | 384001 | 13345621 | 19478500 | | ()-[:Next]->() |
| Filter(2) | 189923 | 6132879 | 6132879 | | hasLabel(anon[46]:PageView) |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
Total database accesses: 202202762
Query without unnecessary labels
match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
Profile info:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
166751 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID82, FRESHID87, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID82, FRESHID87 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID82, FRESHID87 | |
| Projection(1) | 776388 | 35941815 | 71883630 | FRESHID82, p | |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
Total database accesses: 146782447
Message.log
Question
How can I perform this query much faster? (more RAM, refactor query, distributed cache, use another language/shell/method, ...)
UPD:
Profile info for last query in answer
neo4j-sh (?)$ profile match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count;
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "OutgoingDocumentsList.aspx" | 5172680 |
| "DocumentPreview.aspx" | 3846455 |
| "Dashboard.aspx" | 3579022 |
| "ActualTasks.aspx" | 3051043 |
| "DocumentFillMissingRequisites.aspx" | 1713004 |
| "EditDocument.aspx" | 1373928 |
| "PaymentsList.aspx" | 1338063 |
| "Login.aspx" | 1285447 |
| "ReportingRequisites.aspx" | 884077 |
| "ContractorInfo.aspx" | 759665 |
+------------------------------------------------+
10 rows
151328 ms
Compiler CYPHER 2.2
Planner COST
Projection
|
+Top
|
+EagerAggregation
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Projection | 881 | 10 | 20 | count, p, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | count, p | { AUTOINT1}; count |
| EagerAggregation | 881 | 173 | 0 | count, p | p |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+------------------+---------------------------+
Total database accesses: 74898837

As I mentioned before, in your other question, if you can write a Java based server extension you can do it pretty easily.
// initialize counters
Map<Node,AtomicInteger> pageCounts = new HashMap<>(300);
for (Node page : graphDb.findNode(Page)) pageCounts.put(page,new AtomicInteger());
// find start page
Label Page = DynamicLabel.label("Page");
Node page = graphDB.findNode(Page,"Name",pageName).iterator().next();
// follow page-view relationships
for (Relationship at : page.getRelationships(At, INCOMING)) {
// follow singular next relationship
Relationship at2 = at.getStartNode().getSingleRelationship(Next,OUTGOING);
if (at2==null) continue;
// follow singular page-view relationship to end-page
Node page2 = at2.getSingleRelationship(At,OUTGOING).getEndNode();
// increment counter
pageCounts.get(page2).incrementAndGet();
}
// sort pages by count descending
List pages = new ArrayList(pageCounts.entrySet())
Collections.sort(pages,new Comparator<Map.Entry<Node,Integer>>() {
public int compare(Map.Entry<Node,Integer> e1, Map.Entry<Node,Integer> e2) {
return - Integer.compare(e1.getValue(),e2.getValue());
}
});
// return top 10
return pages.subList(0,10);
For Cypher I would try something like this:
match (:Page {Name:'#########.aspx'})<-[:At]-(pv:PageView)
WITH distinct pv
MATCH (pv)-[:Next]->(pv2:PageView)
with distinct pv2
match (pv2)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
Update
I wrote a test for it and ran it on my bigger linux machine, the results there are much more sensible: between 1.6s in Java and 5s max in Cypher.
Here is the code and the results: https://gist.github.com/jexp/94f75ddb849f8c41c97c
In Cypher:
-------------------
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
+-------------------+
| p.Name | count |
+-------------------+
| "Page169" | 975 |
| "Page125" | 959 |
| "Page106" | 955 |
| "Page274" | 951 |
| "Page176" | 947 |
| "Page241" | 944 |
| "Page30" | 942 |
| "Page44" | 938 |
| "Page1" | 938 |
| "Page118" | 938 |
+-------------------+
10 rows
in 3212 ms
[Compiler CYPHER 2.2
Planner COST
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Top | 488 | 10 | 0 | FRESHID71, FRESHID76 | { AUTOINT1}; |
| EagerAggregation | 488 | 300 | 0 | FRESHID71, FRESHID76 | |
| Projection | 238460 | 264828 | 529656 | FRESHID71, p | |
| Filter | 238460 | 264828 | 0 | p | NOT(anon[29] == anon[51]) |
| Expand(All)(0) | 238460 | 264828 | 529656 | p | ()-[:At]->(p) |
| Expand(All)(1) | 238460 | 264828 | 778522 | | ()-[:Next]->() |
| Expand(All)(2) | 476922 | 513694 | 513695 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
Total database accesses: 2351530]
And in Java:
-------------------
Java took 1618 ms
Node[169]=975
Node[125]=959
Node[106]=955
Node[274]=951
Node[176]=947
Node[241]=944
Node[30]=942
Node[1]=938
Node[44]=938
Node[118]=938
Something you can also do to speed up your Cypher query, is to only aggregate on the nodes, and only return the page.Name property for the last 10 rows, much faster.
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count

Oracle "Total" plan cost is really less than some of it's elements

I cannot figure out why sometimes, the total cost of a plan can be a very small number whereas looking inside the plan we can find huge costs. (indeed the query is very slow).
Can somebody explain me that?
Here is an example.
Apparently the costful part comes from a field in the main select that does a listagg on a subview and the join condition with this subview contains a complex condition (we can join on one field or another).
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 875 | 20 |
| 1 | SORT GROUP BY | | 1 | 544 | |
| 2 | VIEW | | 1 | 544 | 3 |
| 3 | SORT UNIQUE | | 1 | 481 | 3 |
| 4 | NESTED LOOPS | | | | |
| 5 | NESTED LOOPS | | 3 | 1443 | 2 |
| 6 | TABLE ACCESS BY INDEX ROWID | | 7 | 140 | 1 |
| 7 | INDEX RANGE SCAN | | 7 | | 1 |
| 8 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 9 | TABLE ACCESS BY INDEX ROWID | | 1 | 461 | 1 |
| 10 | SORT GROUP BY | | 1 | 182 | |
| 11 | NESTED LOOPS | | | | |
| 12 | NESTED LOOPS | | 8 | 1456 | 3 |
| 13 | NESTED LOOPS | | 8 | 304 | 2 |
| 14 | TABLE ACCESS BY INDEX ROWID | | 7 | 154 | 1 |
| 15 | INDEX RANGE SCAN | | 7 | | 1 |
| 16 | INDEX RANGE SCAN | | 1 | 16 | 1 |
| 17 | INDEX RANGE SCAN | | 1 | | 1 |
| 18 | TABLE ACCESS BY INDEX ROWID | | 1 | 144 | 1 |
| 19 | SORT GROUP BY | | 1 | 268 | |
| 20 | VIEW | | 1 | 268 | 9 |
| 21 | SORT UNIQUE | | 1 | 108 | 9 |
| 22 | CONCATENATION | | | | |
| 23 | NESTED LOOPS | | | | |
| 24 | NESTED LOOPS | | 1 | 108 | 4 |
| 25 | NESTED LOOPS | | 1 | 79 | 3 |
| 26 | NESTED LOOPS | | 1 | 59 | 2 |
| 27 | TABLE ACCESS BY INDEX ROWID | | 1 | 16 | 1 |
| 28 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 29 | TABLE ACCESS BY INDEX ROWID | | 1 | 43 | 1 |
| 30 | INDEX RANGE SCAN | | 1 | | 1 |
| 31 | TABLE ACCESS BY INDEX ROWID | | 1 | 20 | 1 |
| 32 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 33 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 34 | TABLE ACCESS BY INDEX ROWID | | 1 | 29 | 1 |
| 35 | NESTED LOOPS | | | | |
| 36 | NESTED LOOPS | | 1 | 108 | 4 |
| 37 | NESTED LOOPS | | 1 | 79 | 3 |
| 38 | NESTED LOOPS | | 1 | 59 | 2 |
| 39 | TABLE ACCESS BY INDEX ROWID | | 4 | 64 | 1 |
| 40 | INDEX RANGE SCAN | | 2 | | 1 |
| 41 | TABLE ACCESS BY INDEX ROWID | | 1 | 43 | 1 |
| 42 | INDEX RANGE SCAN | | 1 | | 1 |
| 43 | TABLE ACCESS BY INDEX ROWID | | 1 | 20 | 1 |
| 44 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 45 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 46 | TABLE ACCESS BY INDEX ROWID | | 1 | 29 | 1 |
| 47 | SORT GROUP BY | | 1 | 330 | |
| 48 | VIEW | | 1 | 330 | 26695 |
| 49 | SORT UNIQUE | | 1 | 130 | 26695 |
| 50 | CONCATENATION | | | | |
| 51 | HASH JOIN ANTI | | 1 | 130 | 13347 |
| 52 | NESTED LOOPS | | | | |
| 53 | NESTED LOOPS | | 1 | 110 | 4 |
| 54 | NESTED LOOPS | | 1 | 81 | 3 |
| 55 | NESTED LOOPS | | 1 | 61 | 2 |
| 56 | TABLE ACCESS BY INDEX ROWID | | 1 | 16 | 1 |
| 57 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 58 | TABLE ACCESS BY INDEX ROWID | | 1 | 45 | 1 |
| 59 | INDEX RANGE SCAN | | 1 | | 1 |
| 60 | TABLE ACCESS BY INDEX ROWID | | 1 | 20 | 1 |
| 61 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 62 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 63 | TABLE ACCESS BY INDEX ROWID | | 1 | 29 | 1 |
| 64 | VIEW | | 164K| 3220K| 13341 |
| 65 | NESTED LOOPS | | | | |
| 66 | NESTED LOOPS | | 164K| 11M| 13341 |
| 67 | NESTED LOOPS | | 164K| 8535K| 10041 |
| 68 | TABLE ACCESS BY INDEX ROWID | | 164K| 6924K| 8391 |
| 69 | INDEX SKIP SCAN | | 2131K| | 163 |
| 70 | INDEX UNIQUE SCAN | | 1 | 10 | 1 |
| 71 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 72 | TABLE ACCESS BY INDEX ROWID | | 1 | 20 | 1 |
| 73 | HASH JOIN ANTI | | 2 | 260 | 13347 |
| 74 | NESTED LOOPS | | | | |
| 75 | NESTED LOOPS | | 2 | 220 | 4 |
| 76 | NESTED LOOPS | | 2 | 162 | 3 |
| 77 | NESTED LOOPS | | 2 | 122 | 2 |
| 78 | TABLE ACCESS BY INDEX ROWID | | 4 | 64 | 1 |
| 79 | INDEX RANGE SCAN | | 2 | | 1 |
| 80 | TABLE ACCESS BY INDEX ROWID | | 1 | 45 | 1 |
| 81 | INDEX RANGE SCAN | | 1 | | 1 |
| 82 | TABLE ACCESS BY INDEX ROWID | | 1 | 20 | 1 |
| 83 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 84 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 85 | TABLE ACCESS BY INDEX ROWID | | 1 | 29 | 1 |
| 86 | VIEW | | 164K| 3220K| 13341 |
| 87 | NESTED LOOPS | | | | |
| 88 | NESTED LOOPS | | 164K| 11M| 13341 |
| 89 | NESTED LOOPS | | 164K| 8535K| 10041 |
| 90 | TABLE ACCESS BY INDEX ROWID | | 164K| 6924K| 8391 |
| 91 | INDEX SKIP SCAN | | 2131K| | 163 |
| 92 | INDEX UNIQUE SCAN | | 1 | 10 | 1 |
| 93 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 94 | TABLE ACCESS BY INDEX ROWID | | 1 | 20 | 1 |
| 95 | NESTED LOOPS OUTER | | 1 | 875 | 20 |
| 96 | NESTED LOOPS OUTER | | 1 | 846 | 19 |
| 97 | NESTED LOOPS OUTER | | 1 | 800 | 18 |
| 98 | NESTED LOOPS OUTER | | 1 | 776 | 17 |
| 99 | NESTED LOOPS OUTER | | 1 | 752 | 16 |
| 100 | NESTED LOOPS OUTER | | 1 | 641 | 15 |
| 101 | NESTED LOOPS OUTER | | 1 | 576 | 14 |
| 102 | NESTED LOOPS OUTER | | 1 | 554 | 13 |
| 103 | NESTED LOOPS OUTER | | 1 | 487 | 12 |
| 104 | NESTED LOOPS OUTER | | 1 | 434 | 11 |
| 105 | NESTED LOOPS | | 1 | 368 | 10 |
| 106 | NESTED LOOPS | | 1 | 102 | 9 |
| 107 | NESTED LOOPS OUTER | | 1 | 85 | 8 |
| 108 | NESTED LOOPS | | 1 | 68 | 7 |
| 109 | NESTED LOOPS | | 50 | 2700 | 6 |
| 110 | HASH JOIN | | 53 | 1696 | 5 |
| 111 | INLIST ITERATOR | | | | |
| 112 | TABLE ACCESS BY INDEX ROWID| | 520 | 10400 | 3 |
| 113 | INDEX RANGE SCAN | | 520 | | 1 |
| 114 | INLIST ITERATOR | | | | |
| 115 | TABLE ACCESS BY INDEX ROWID| | 91457 | 1071K| 1 |
| 116 | INDEX UNIQUE SCAN | | 2 | | 1 |
| 117 | TABLE ACCESS BY INDEX ROWID | | 1 | 22 | 1 |
| 118 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 119 | TABLE ACCESS BY INDEX ROWID | | 1 | 14 | 1 |
| 120 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 121 | TABLE ACCESS BY INDEX ROWID | | 1 | 17 | 1 |
| 122 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 123 | TABLE ACCESS BY INDEX ROWID | | 1 | 17 | 1 |
| 124 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 125 | TABLE ACCESS BY INDEX ROWID | | 1 | 266 | 1 |
| 126 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 127 | TABLE ACCESS BY INDEX ROWID | | 1 | 66 | 1 |
| 128 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 129 | TABLE ACCESS BY INDEX ROWID | | 1 | 53 | 1 |
| 130 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 131 | TABLE ACCESS BY INDEX ROWID | | 1 | 67 | 1 |
| 132 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 133 | INDEX RANGE SCAN | | 1 | 22 | 1 |
| 134 | TABLE ACCESS BY INDEX ROWID | | 1 | 65 | 1 |
| 135 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 136 | TABLE ACCESS BY INDEX ROWID | | 1 | 111 | 1 |
| 137 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 138 | TABLE ACCESS BY INDEX ROWID | | 1 | 24 | 1 |
| 139 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 140 | TABLE ACCESS BY INDEX ROWID | | 1 | 24 | 1 |
| 141 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 142 | TABLE ACCESS BY INDEX ROWID | | 1 | 46 | 1 |
| 143 | INDEX UNIQUE SCAN | | 1 | | 1 |
| 144 | TABLE ACCESS BY INDEX ROWID | | 1 | 29 | 1 |
| 145 | INDEX UNIQUE SCAN | | 1 | | 1 |
----------------------------------------------------------------------------------------------------------

The total cost of a statement is usually equal to or greater than the cost of any of its child operations. There are at least 4 exceptions to this rule.
Your plan looks like #3 but we can't be sure without looking at code.
1. FILTER
Execution plans may depend on conditions at run-time. These conditions cause FILTER operations that will dynamically decide which query block to execute. The example below uses a static condition but still demonstrates the concept. Part of the subquery is very expensive but the condition negates the whole thing.
explain plan for select * from dba_objects cross join dba_objects where 1 = 2;
select * from table(dbms_xplan.display(format => 'basic +cost'));
Plan hash value: 3258663795
--------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 0 (0)|
| 1 | FILTER | | |
| 2 | MERGE JOIN CARTESIAN | | 11M (3)|
...
2. COUNT STOPKEY
Execution plans sum child operations up until the final cost. But child operations will not always finish. In the example below it may be correct to say that part of the plan costs 214. But because of the condition where rownum <= 1 only part of that child operation may run.
explain plan for
select /*+ no_query_transformation */ *
from (select * from dba_objects join dba_objects using (owner))
where rownum <= 1;
select * from table(dbms_xplan.display(format => 'basic +cost'));
Plan hash value: 2132093199
-------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 (0)|
| 1 | COUNT STOPKEY | | |
| 2 | VIEW | | 4 (0)|
| 3 | VIEW | | 4 (0)|
| 4 | NESTED LOOPS | | 4 (0)|
| 5 | VIEW | DBA_OBJECTS | 2 (0)|
| 6 | UNION-ALL | | |
| 7 | HASH JOIN | | 3 (34)|
| 8 | INDEX FULL SCAN | I_USER2 | 1 (0)|
| 9 | VIEW | _CURRENT_EDITION_OBJ | 1 (0)|
| 10 | FILTER | | |
| 11 | HASH JOIN | | 214 (3)|
...
3. Subqueries in the SELECT column list
Cost aggregation does not include subqueries in the SELECT column list. A query like select ([expensive query]) from dual; will have a very small total cost. I don't understand the reason for this; Oracle estimates the subquery and he number of rows in the FROM, surely it could multiply them together for a total cost.
explain plan for
select dummy,(select count(*) from dba_objects cross join dba_objects) from dual;
select * from table(dbms_xplan.display(format => 'basic +cost'));
Plan hash value: 3705842531
---------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 (0)|
| 1 | SORT AGGREGATE | | |
| 2 | MERGE JOIN CARTESIAN | | 11M (3)|
...
4. Other - rounding? bugs?
About 0.01% of plans still have unexplainable cost issues. I can't find any pattern among them. Perhaps it's just a rounding issue or some rare optimizer bugs. There will always be some weird cases with a any model as complicated as the optimizer.
Check for more exceptions
This query can find other exceptions, it returns all plans where the first cost is less than the maximum cost.
select *
from
(
--First and Max cost per plan.
select
sql_id, plan_hash_value, id, cost
,max(cost) keep (dense_rank first order by id)
over (partition by sql_id, plan_hash_value) first_cost
,max(cost)
over (partition by sql_id, plan_hash_value) max_cost
,max(case when operation = 'COUNT' and options = 'STOPKEY' then 1 else 0 end)
over (partition by sql_id, plan_hash_value) has_count_stopkey
,max(case when operation = 'FILTER' and options is null then 1 else 0 end)
over (partition by sql_id, plan_hash_value) has_filter
,count(distinct(plan_hash_value))
over () total_plans
from v$sql_plan
--where sql_id = '61a161nm1ttjj'
order by 1,2,3
)
where first_cost < max_cost
--It's easy to exclude FILTER and COUNT STOPKEY.
and has_filter = 0
and has_count_stopkey = 0
order by 1,2,3;

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Q: how to unnest bags from complicated data structure in PIG - hadoop

Related

Another Combination

Pivot Table in Hive and Create Multiple Columns for Unique Combinations

Determinate unique values from oracle join?

Slow aggregation on big neo4j graph

Oracle "Total" plan cost is really less than some of it's elements

Categories

Resources