Update queries have never been my strong point, so I am hoping someone can help me write a more efficient query.
I am trying to update a table with the total sales for a given product, for a given customer.
The column I am looking to update is the Sales column of the 'Estimate' table below:
ID Customer Product Estimate Sales
--------------------------------------------
1 A 303 100 20
2 A 425 20 30
3 C 1145 500 250
4 F 801 25 0
The figure I am using to update is taken from the 'Sales' view:
Product Customer Actual
------------------------------
303 A 30
500 C 2
425 A 88
1145 C 700
The query I have written is:
UPDATE estimate e
SET e.sales = (SELECT s.actual FROM sales s
WHERE e.customer = s.customer and e.product = s.product)
WHERE EXISTS (SELECT 1 FROM sales s
WHERE e.customer = s.customer and e.product = s.product)
An added complication is that each estimate covers a range of dates and needs to be updated with sales from that period only. My 'sales' view above takes care of that, but I have left it out of the example for simplicity's sake.
I initially ran the query using test data of only around 20 records and it ran in around 3-4 seconds. My actual data is 7,000+ records, and when I run the query against that, my browser times out before I get any results.
I suspect that the query is updating the whole table for every record in the view or vice versa?
Any help much appreciated.
Cheers
Andrew
Try a merge instead:
merge into estimate tgt
using sales src
on (tgt.customer = src.customer and tgt.product = src.product)
when matched then
update set tgt.sales = src.actual;
By doing a merge instead of an update, you avoid having to repeat the subquery in both the SET clause and the WHERE clause, which ought to speed things up a bit.
Another thing to check is how many indexes on the estimate table include the sales column. If you have several, it might be worth reducing the number of indexes. Each index that has to be maintained is an overhead when you update the row(s) in the table.
Also, do you have triggers on the estimate table? Those might be slowing things down as well.
Or maybe you're missing an index on the sales table - an index on (customer, product, actual) ought to help, as the query could then avoid going to the table at all, since all the data it needs would be in the index.
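As a rough sketch - since 'sales' is described as a view, the index would have to go on whatever base table it reads from (called sales_t here purely for illustration):
-- sales_t and the index name are placeholders; adjust to your schema
create index sales_t_cust_prod_ix on sales_t (customer, product, actual);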
Another option is to not do the update at all. If the information is available in the sales data, why bother updating the estimate table? You could get it with a join when querying the estimate table. Of course, it depends on how often you'd query the estimate vs. actual sales information versus how often they'd be updated. If you update frequently and read infrequently, then I'd skip the update and just query the two objects directly.
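For example, something along these lines would report actual sales next to each estimate without ever updating it (a sketch based on the columns shown above):
select e.id,
       e.customer,
       e.product,
       e.estimate,
       s.actual
from   estimate e
       left join sales s
              on s.customer = e.customer
             and s.product  = e.product;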
This is a recurring problem for me. I have statements that work well for a while, and then the optimizer decides to choose another execution plan. This even happens when I query for exactly one (composite) primary key.
When I look up the execution plan in dba_hist_sql_plan, it shows me costs of 20 for the query using the primary key index and costs of 270 for the query doing a full table scan.
plan_hash_value Id Operation         Options               Cost Search_Columns
2550672280       0 SELECT STATEMENT                           20
2550672280       1 PARTITION HASH    SINGLE                   20
2550672280       2 TABLE ACCESS      BY LOCAL INDEX ROWID     20
2550672280       3 INDEX             RANGE SCAN               19   1
3908080950       0 SELECT STATEMENT                          270
3908080950       1 PARTITION HASH    SINGLE                  270
3908080950       2 TABLE ACCESS      FULL                    270
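For context, rows like the ones above can be pulled with a query along these lines (the sql_id value is a placeholder you would substitute):
select plan_hash_value, id, operation, options, cost, search_columns
from   dba_hist_sql_plan
where  sql_id = '&sql_id'
order  by plan_hash_value, id;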
I already noticed that the optimizer only uses the first column of the primary key index and then does a range scan. But my real question is: why does the optimizer choose the higher-cost execution plan? It's not that both execution plans are used at the same time; I notice a switch within one snapshot and then it stays like that for several hours/days. So it can't be an issue of bind peeking.
Our current solution is that I call our DBA and he flushes the Statement Cache. But this is not really sustainable.
EDIT:
The SQL looks something like this: select * from X where X.id1 = ? and X.id2 = ? and X.id3 = ?
with (id1,id2,id3) being the composite primary key (with a unique index) on the table.
Maybe it's related to a bug in Oracle 11g:
Bug 18377553 : POOR CARDINALITY ESTIMATE WITH HISTOGRAMS AND VALUES > 32 BYTES
When your data looks like:
AAAAAAAAAAAAAAAAAAAAmyvalue
AAAAAAAAAAAAAAAAAAAAsomeohtervalue
AAAAAAAAAAAAAAAAAAAAandsoon
B1234
Histograms do not work well.
The solution is to disable histograms on the primary key columns, after which everything should start working smoothly.
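A sketch of one way to do that - re-gather statistics with SIZE 1 (meaning 'no histogram') on the key columns; the table name X and columns id1, id2, id3 are taken from the question, and the exact method_opt may need adjusting for your schema:
begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'X',
    method_opt => 'FOR COLUMNS SIZE 1 ID1, ID2, ID3');
end;
/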
Most likely the clustering factor and blevel of the index are very high. Check the blevel by querying dba_indexes; if it is greater than 3, try rebuilding the index.
Also check whether the index created for the primary key is unique or not. As per the plan it is using a range scan instead of a unique scan, so most likely the index is not unique.
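Something like this shows both figures, plus whether the index is unique (table name X as in the question):
select index_name, uniqueness, blevel, clustering_factor
from   dba_indexes
where  table_name = 'X';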
Apparently the optimizer doesn't correctly display costs relating to type conversions. The root cause of this problem was an incorrect type mapping for a date value. While the column in the database is of type DATE, the JDBC type was incorrectly java.sql.Timestamp. To compare a DATE column with a Timestamp search parameter, all values in the table have to be converted to Timestamp first, which is additional cost and renders the index unusable.
I have a table with a huge amount of data. It is partitioned by week. This table contains a column named group. Each group can have records for multiple weeks. For example:
gr week data
1 1 10
1 2 13
1 3 5
. . 6
2 2 14
2 3 55
. . .
I want to create a table based on one group. The creation currently takes ~23 minutes on Oracle 11g. This is a long time, since I have to repeat the process for each group and I have many groups. What is the fastest way to create the tables?
Create all tables then use INSERT ALL WHEN
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_9014.htm#i2145081
The data will be read only once.
insert all
when gr=1 then
into tab1 values (gr, week, data)
when gr=2 then
into tab2 values (gr, week, data)
when gr=3 then
into tab3 values (gr, week, data)
select *
from big_table
You would get the biggest speed-up by not copying the data out on a per-group basis at all and processing it week by week instead, but you don't say what you are trying to achieve, so it is not possible to comment (this approach may of course be difficult or impracticable, but you should at least consider it).
So, some hints on how to extract the group data:
remove all indexes, as they will only take up space - all you need to do is one large FULL TABLE SCAN
check the available space and size of each group; maybe you can process several groups in one pass
deploy parallel query
create table tmp as
select /*+ parallel(4) */ * from BIG_TABLE
where group_id in (..list of groupIds..);
Please note that parallel mode must be enabled in the database; ask your DBA if you are unsure. The point is that the large FULL TABLE SCAN is performed by several sub-processes (here 4), which may (depending on your mileage) cut the elapsed time.
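As a quick sanity check (assuming you have the privileges to read the parameter and to force parallel query in your session):
-- how many parallel execution servers the instance allows at all
select name, value from v$parameter where name = 'parallel_max_servers';
-- optionally force parallel query for your session before running the CTAS
alter session force parallel query parallel 4;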
I have a quite simple query:
SELECT
contract.ctrId,
contract.ctrNr
FROM
changeLog,
contract
where
changelog.entid in (select contract.ctrid from contract where contract.ctrnr LIKE '1000002%');
This query takes 800 ms.
If I replace the inner select with its result (which is a single number):
SELECT
contract.ctrId,
contract.ctrNr
FROM
changeLog,
contract
where
changelog.entid in (100000001611624);
This query only takes 16 ms.
The inner select executed alone takes 4 ms.
Changelog.entid has an index. Contract.ctrid is a primary key. The contract table has just 2 rows; the changelog table has about 40 thousand.
Still, I really cannot get my head around this. What can be the problem with the inner select?
Sorry for providing not enough details, I will be more precise and follow the tag descriptions next time.
The join of changelog and contract did not have much effect on performance.
The problem here was that changelog is a VIEW. It is a union of the changelogActive and changelogPendig tables, so Postgres needed to combine the two tables in the view on every select.
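For illustration, the view was shaped roughly like this (column list abbreviated; the table names are as described above):
create view changelog as
  select entid /* , ... */ from changelogActive
  union
  select entid /* , ... */ from changelogPendig;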
Thank you guys all for the hints, you helped a lot!
There is an index on table invt_item_d on the (item_id, branch_id, co_id) columns.
The plan for the first query shows TABLE ACCESS FULL with a cost of 528;
the plan for the second query shows INDEX FAST FULL SCAN (my index) with a cost of 27.
The only difference, as you can see, is that the selected column is part of the index in the second query.
Is there something wrong with this? And can you tell me what I should do to fix this at the DB administration level?
select d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
select d.item_id
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
EDIT:
I made a new query, and this query's cost is 529, with TABLE ACCESS FULL:
select qty from invt_item_d
So it doesn't matter whether I use an index or not. Some say this is normal - is this really normal behaviour?
In the first case, the table must be accessed, since the "qty" column is only stored in the table.
In the second case, all the columns used in the query can be read from the index, skipping the table read altogether.
You can add another index on columns (item_id, branch_id, co_id, qty) and it will most probably be used in the first query.
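A sketch of what that could look like (the index name is made up):
create index invt_item_d_cov_ix
  on invt_item_d (item_id, branch_id, co_id, qty);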
From the Oracle documentation: http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm
A fast full index scan is a full index scan in which the database accesses the data in the index itself without accessing the table, and the database reads the index blocks in no particular order.
Fast full index scans are an alternative to a full table scan when both of the following conditions are met:
The index must contain all columns needed for the query.
A row containing all nulls must not appear in the query result set. For this result to be guaranteed, at least one column in the index must have either:
A NOT NULL constraint
A predicate applied to it that prevents nulls from being considered in the query result set
This is exactly the main purpose of using an index - to make searching faster.
Querying columns that are covered by indexes is faster than querying columns without indexes.
It's basic Oracle knowledge.
I am adding another answer because it seems to be more convenient.
First:
" i doesn't hit the index because there are 34000 rows, not millions". This is COMPLETELY WRONG and a dangerous understanding.
What I meant was, if there are a few thousand rows, and the index is not hit(oracle engine does a full table scan(TABLE ACCESS FULL) then), its not a big deal. Oracle is fast enough to read few thousand rows in a matter of a second(even without indexes) , and hence you wont feel the difference.The query is still slower(than the occasion when there is an index) , but its is so minimally slower that you wont feel the difference.
But, if there are millions of rows, the execution of the query will be much, much slower without index ( as this time it will scan millions of rows in a full table scan)and your performance will be hit.
Second: why on earth do you have to loop over a table with 34,000 rows, and do it 4,000 times?
That's a terrible approach. Avoid loops as much as possible. There has to be a better approach!
Third:
You can force the Oracle optimiser to hit the index by using an index hint. You will need to know the name of the index for that.
select /*+ index(invt_item_d <index_name>) */
d.qty
from invt_item_d d
where d.item_id = 999
and d.branch_id = 888
and d.co_id = 777
Here is the link to a Stack Overflow question on index hints.
I have a table in Oracle 10g with around 51 columns and 25 million records. When I execute a simple select query on the table to extract 3 columns, the cost is very high (around 182k), so I need to reduce it. Is there any possible way to do so?
Query:
select a,b,c
from X
a - char
b - varchar2
c - varchar2
TIA
In cases like this it's difficult to give good advice without knowing why you would need to query 25 million records. As #Ryan says, normally you'd have a WHERE clause; or, perhaps you're extracting the results into another table or something?
A covering index (i.e. over a,b,c) would probably be the only way to make any difference to the performance - the query could then do a fast full index scan, and would get many more records per block retrieved.
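For instance (the index name is made up; the table and columns are the ones from the question):
create index x_abc_cov_ix on X (a, b, c);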
Well...if you know you only need a subset of those values, throwing a WHERE clause on there would obviously help out quite a bit. If you truly need all 25 million records, and the table is properly indexed, then I'd say there's really not much you can do.
Yes, it would be better to tell us the purpose of the select, as Jeffrey Kemp said.
For a normal select, you just need to index the fields you query on most, gather table and index statistics (DBMS_STATS.GATHER_TABLE_STATS), and check the statistics of each field to be sure your index is right (read: http://bit.ly/qR12Ul).
If you need to load the data into another table, use a cursor, limit the number of records fetched per execution, and load into the target table via a bulk insert (the FORALL technique).
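A minimal sketch of that pattern, assuming a hypothetical target table X_EXTRACT with the same three columns (tune the LIMIT to your memory):
declare
  cursor c is
    select a, b, c from X;
  type t_rows is table of c%rowtype;
  l_rows t_rows;
begin
  open c;
  loop
    fetch c bulk collect into l_rows limit 1000;  -- fetch a batch at a time
    exit when l_rows.count = 0;
    forall i in 1 .. l_rows.count                 -- bulk-insert the batch
      insert into x_extract values l_rows(i);     -- x_extract is hypothetical
    commit;
  end loop;
  close c;
end;
/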