Oracle 11g XE slow query execution

I am facing a problem with a SELECT query I have written. The query is the following:
SELECT --184.791
C.MSISDN
FROM
CONTACTS_HISTORY C
INNER JOIN WAVECONTACTS_HISTORY WC
ON C.CONTACTSID = WC.CONTACTSID
WHERE
C.CAMPAIGNSID = 472;
The C.CAMPAIGNSID, C.CONTACTSID and WC.CONTACTSID columns are indexed, and WC.CONTACTSID is a foreign key referencing C.CONTACTSID.
The CONTACTS_HISTORY table has 3,000,000 records and the WAVECONTACTS_HISTORY table has 2,000,000 records.
When I include the join in the query, execution is very slow.
The execution plan from SQL Developer has a total cost of 3.
I cannot understand why the execution is so slow. Is this because of the limitations of the XE edition?
The Oracle DB is installed on my laptop (Intel Core i3, 8 GB RAM), but I am aware that this edition is limited to 1 CPU and 1 GB RAM.
OPERATION         OBJECT_NAME                  OPTIONS         COST
SELECT STATEMENT                                                  3
NESTED LOOPS
NESTED LOOPS                                                      3
TABLE ACCESS      WAVECONTACTS_HISTORY         FULL               2
INDEX             IX_CONTACTS_HISTORY_CMPSID   RANGE SCAN         1
  Access Predicates
TABLE ACCESS      CONTACTS_HISTORY             BY INDEX ROWID     1
  Filter Predicates
    WC.CONTACTSID=C.CONTACTSID

All of the costs are out of whack, especially this one:
TABLE ACCESS WAVECONTACTS_HISTORY FULL 2
That's a FULL TABLE SCAN of a two-million-row table.
Most likely your statistics are stale. Gather fresh stats on the tables and indexes, and you should see a much smarter and more efficient execution plan.
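For example, a minimal sketch (owner defaulted to the current schema; CASCADE => TRUE refreshes the index statistics as well):
BEGIN
  -- Refresh optimizer statistics on both tables; CASCADE covers their indexes too.
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CONTACTS_HISTORY',     cascade => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'WAVECONTACTS_HISTORY', cascade => TRUE);
END;
/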
That may not be the whole solution. Tuning is dependent on a number of factors, such as skew and distribution. For instance, if you have relatively few Campaigns in your Contact History and they're spread across the table the index on C.CAMPAIGNSID won't work any magic for you. If this is a query you're going to run a lot you should consider a compound index on (CAMPAIGNSID, CONTACTSID), in that order.
Alternatively, as you don't actually use any columns from WAVECONTACTS_HISTORY you could replace the join with an IN or EXISTS sub-query.
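A sketch of both ideas (the index name here is invented; check the column order and your existing indexes before creating anything):
-- Hypothetical compound index: the filter column and the join column covered by one range scan.
CREATE INDEX IX_CONTACTS_HIST_CMP_CONT ON CONTACTS_HISTORY (CAMPAIGNSID, CONTACTSID);

-- Semi-join rewrite: no columns from WAVECONTACTS_HISTORY are selected, so EXISTS is enough.
-- (Unlike the inner join, it returns each contact row once even if CONTACTSID repeats in WC.)
SELECT C.MSISDN
FROM   CONTACTS_HISTORY C
WHERE  C.CAMPAIGNSID = 472
AND    EXISTS (SELECT 1
               FROM   WAVECONTACTS_HISTORY WC
               WHERE  WC.CONTACTSID = C.CONTACTSID);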

Related

Sybase Query Plan too large lots of "child tables"

I have a simple query:
DELETE FROM TABLE1 WHERE ID=1
This table is referenced by at least 90 (X) other tables:
ALTER TABLE TABLEX
ADD CONSTRAINT FK_TABLEX_TABLE1
FOREIGN KEY (ID)
REFERENCES "db"."TABLE1"(ID)
Looking at the plan for the given query, it's really large.
We see a "DIRECT RI FILTER Operator (VA = 103) has 98 children." entry, and then a bunch of "SCAN Operator (VA = X) FROM TABLE TABLEX ..." entries.
The problem is that when we use batch deletion (JDBC) we immediately hit the procedure cache limit (we did try to increase it a lot, but that is not an acceptable solution because the number of child tables is expected to grow).
However, my DELETE queries should only need to look at 5 tables for each batch (I know that the IDs XXX, YYY and ZZZ are only referenced by TABLEX, TABLEX+1, ... TABLEX+4). Is there a way to force the query plan to limit the scan to only those tables?
Modifying the data model is not really an option.
Sybase 15.0.3
JConnect3d 6.0.5
I did reduce the number of queries in the batch; going from 1000 to 100 no longer crashes. It is a workaround, but I would like to shrink the query plan, not the batch.
In the end we settled on a batch size computed from the number of child tables:
the more child tables, the fewer statements in the same batch.

Microstrategy / Oracle - slow performance

We have a Microstrategy / Oracle setup which has a fact table with 50+ billion rows (that is 50,000,000,000+ rows).
The system performance is very unstable; sometimes it runs OK, but at other times it is very slow, e.g. simple reports will take 20 minutes to run!
The weirdest part: if we add more constraints to a report (i.e. more WHERE clauses) that result in LESS data coming back, the report actually slows down further.
We are able to pick up the SQL from Microstrategy, and we find that the SQL itself runs quite slowly as well. However, since the SQL is generated by Microstrategy, we do not have much control over the SQL.
Any thoughts as to where we should look?
Look at the SQL and see if you can add any more useful indexes. Check that the query is using the indexes you think it should be.
Check that every column that is filtered has an index (the dictionary query sketched after this list is one way to verify that).
Remember to update the statistics for all the tables involved: with tables this big it is very important.
Look at the query plan and check that there are no table scans on large tables (they are acceptable on small lookup tables).
Set EnableDescribeParam=1 in the ODBC driver.
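For the "every filtered column has an index" check, one option (a sketch; MY_FACT_TABLE is a placeholder for your actual table name) is to ask the data dictionary:
-- List which columns of the fact table are already covered by indexes.
SELECT index_name, column_name, column_position
FROM   user_ind_columns
WHERE  table_name = 'MY_FACT_TABLE'
ORDER  BY index_name, column_position;
Use ALL_IND_COLUMNS (with an OWNER filter) if the table lives in another schema.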
If your environment is like mine, then what I describe may help with your request; if not, it may help others. We too have a table like that, and after weeks of trying to add this index or that index, the ultimate solution was setting parallelism on the table and at the index level.
report runtime 25 mins
alter table TABLE_NAME parallel(degree 4 instances 4);
alter index INDEX_NAME parallel(degree 4 instances 4);
report runtime 6 secs.
There are criteria for a table to have parallelism set up on it, such as being larger than 1 GB, but play with the number of parallel threads to get the most optimal time.

Oracle: Having a join or a simple from/where clause has no effect on performance?

My manager just told me that having joins or a WHERE clause in an Oracle query doesn't affect performance, even when you have millions of records in each table. I am just not satisfied with this and want to confirm it.
Which of the following queries performs better in Oracle, and also in PostgreSQL?
1- select a.name,b.salary,c.address
from a,b,c
where a.id=b.id and a.id=c.id;
2- select a.name,b.salary,c.address
from a
JOIN b on a.id=b.id
JOIN C on a.id=c.id;
I have tried EXPLAIN in PostgreSQL on a small data set and the query time was the same (maybe because I have just a few rows), and right now I have no access to Oracle and the actual database to analyze the EXPLAIN output in a real environment.
Using JOINS makes the code easier to read, since it's self-explanatory.
In speed there is no difference (I have just tested it) and the execution plan is the same.
If the query optimizer is doing its job right, there should be no difference between those queries.
They are just two ways to specify the same desired result.
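You can check this yourself by comparing the plans for the two forms (a sketch, assuming the tables a, b and c from the question exist):
-- Oracle: explain one form, display the plan, then repeat for the other form and compare.
EXPLAIN PLAN FOR
SELECT a.name, b.salary, c.address
FROM   a
JOIN   b ON a.id = b.id
JOIN   c ON a.id = c.id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- PostgreSQL: EXPLAIN (or EXPLAIN ANALYZE) works the same way for either form.
EXPLAIN
SELECT a.name, b.salary, c.address
FROM   a, b, c
WHERE  a.id = b.id
AND    a.id = c.id;
Both optimizers normalize the two syntaxes into the same join tree, so the plans should come out identical; only the readability differs.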

Improving DELETE and INSERT times on a large table that has an index structure

Our application manages a table containing a per-user set of rows that is the
result of a computationally-intensive query. Storing this result in a table
seems a good way of speeding up further calculations.
The structure of that table is basically the following:
CREATE TABLE per_user_result_set
( user_login VARCHAR2(N)
, result_set_item_id VARCHAR2(M)
, CONSTRAINT result_set_pk PRIMARY KEY(user_login, result_set_item_id)
)
;
A typical user of our application will have this result set computed 30 times a
day, with a result set consisting of between a single item and 500,000 items.
A typical customer will declare about 500 users in the production database.
So this table will typically consist of 5 million rows.
The typical query that we use to update this table is:
BEGIN
DELETE FROM per_user_result_set WHERE user_login = :x;
INSERT INTO per_user_result_set(...) SELECT :x, ... FROM ...;
END;
/
After running into performance issues (the DELETE part would take a long time),
we decided to use a GLOBAL TEMPORARY TABLE (ON COMMIT DELETE ROWS) to hold a
“delta” of rows to remove from the table and rows to insert into it:
BEGIN
INSERT INTO _tmp
SELECT ... FROM ...
MINUS SELECT result_set_item_id
FROM per_user_result_set
WHERE user_login = :x;
DELETE FROM per_user_result_set
WHERE user_login = :x
AND result_set_item_id NOT IN (SELECT result_set_item_id
FROM _tmp
);
INSERT INTO per_user_result_set
SELECT :x, result_set_item_id
FROM _tmp;
COMMIT;
END;
/
This has improved performance a bit, but it is still not satisfactory. So
we are exploring ways to speed up that process; here are the issues that
we are experiencing:
We would have loved to use table partitioning (partitioning by user_login).
But partitioning is not always available (on our test databases we hit
ORA-00439). Our customers cannot all afford Oracle Enterprise Edition with
paid additional features.
We could make the per_user_result_set table GLOBAL TEMPORARY, so that it
is isolated and we can TRUNCATE it for example… but our application
sometimes loses connection to Oracle due to network problems, and will
automatically reconnect. By that time we lose the contents of our
computation.
We could split that table into a certain number of buckets, make a view that
does a UNION ALL of all those buckets, add INSTEAD OF UPDATE and DELETE triggers
on that view, and distribute rows according to ORA_HASH(user_login) % num_buckets.
But we are afraid this could make SELECT operations much slower.
This would result in a constant number of tables, with smaller indexes
affected by DELETE or INSERT operations. In short, “poor man's table
partitioning”.
We've tried to ALTER TABLE per_user_result_set NOLOGGING. This does not
improve things much.
We've tried to CREATE TABLE ... ORGANIZATION INDEX COMPRESS 1. This speeds
things up by a ratio of 1:5.
We've tried to have one table per user_login. That's exactly what we could
have by partitioning using a number of partitions equal to the number of
distinct user_logins and a well-chosen hash function. Performance factor is
1:10. But I would really like to avoid this solution: have to maintain a
huge number of indexes, tables, views, on a per-user basis. This would be
an interesting performance gain for the users, but not for us maintainers of
the systems.
Since the users work at the same time, there is no way for us to create a new
table and swap it with the old one.
What could you suggest to complement these approaches?
Note: our customers run Oracle Database versions from 9i to 11g, and editions
from XE to Enterprise Edition. That's a wide variety of versions that we need
to be compatible with.
Thanks.
We've tried to have one table per user_login. That's exactly what we
could have by partitioning using a number of partitions equal to the
number of distinct user_logins and a well-chosen hash function.
Performance factor is 1:10. But I would really like to avoid this
solution: have to maintain a huge number of indexes, tables, views, on
a per-user basis. This would be an interesting performance gain for
the users, but not for us maintainers of the systems.
Can you then make a stored procedure to generate these tables on a per-user basis? Or, better yet, have this stored procedure do the most appropriate thing depending on the Oracle edition and options the customer is licensed for?
If the Partitioning option is available
then
  create or truncate the user-specific list partition
else
  drop the user-specific result table
  create the user-specific result table
    as select from the template result table
  create indexes
  create constraints
  perform grants
end if
Perform the insert
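A rough PL/SQL sketch of that idea, checking V$OPTION for the Partitioning option at runtime; every object name here (the procedure, the template table, the per-user table and partition naming schemes) is an assumption, not something taken from your schema:
CREATE OR REPLACE PROCEDURE prepare_user_result_area(p_user IN VARCHAR2) AS
  v_partitioning VARCHAR2(64);
  v_table_name   VARCHAR2(30) := 'RESULT_SET_' || UPPER(p_user);
BEGIN
  -- Is the Partitioning option installed/licensed in this database?
  SELECT value INTO v_partitioning
  FROM   v$option
  WHERE  parameter = 'Partitioning';

  IF v_partitioning = 'TRUE' THEN
    -- Partitioning available: just empty the user's list partition of the shared table.
    EXECUTE IMMEDIATE 'ALTER TABLE per_user_result_set TRUNCATE PARTITION P_' || UPPER(p_user);
  ELSE
    -- No partitioning: drop and recreate a per-user copy of a template table.
    BEGIN
      EXECUTE IMMEDIATE 'DROP TABLE ' || v_table_name;
    EXCEPTION
      WHEN OTHERS THEN
        IF SQLCODE != -942 THEN RAISE; END IF;  -- ignore "table or view does not exist"
    END;
    EXECUTE IMMEDIATE 'CREATE TABLE ' || v_table_name ||
                      ' AS SELECT * FROM result_set_template WHERE 1 = 0';
    -- indexes, constraints and grants would be (re)created here as well
  END IF;
END prepare_user_result_area;
/
The refresh job can then call this procedure before loading, and the same code path works on XE and on Enterprise Edition with Partitioning.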
If all your users were on 11g Enterprise Edition I would recommend you to use Oracle's built-in result-set caching rather than trying to roll your own. But that is not the case, so let's move on.
Another attractive option might be to use PL/SQL collections rather than tables. Being in memory these are faster to retrieve and require less maintenance. They are also supported in all the versions you need. However, they are session variables, so if you have lots of users with big result sets that would put stress on your PGA allocations. Also their data would be lost when the network connection drops. So that's probably not the solution you're looking for.
The core of your problem is this statement:
DELETE FROM per_user_result_set WHERE user_login = :x;
It's not a problem in itself but you have extreme variations in data distribution. Bluntly, the deletion of a single row is going to have a very different performance profile from the deletion of half a million rows. And because your users are constantly refreshing their data there is no way you can handle that, except by giving your users their own tables.
You say you don't want to have a table per user because
"[it] would be an interesting performance gain for the users, but not
for us maintainers of the systems,"
Systems exist for the benefit of our users. Convenience for us is great as long as it helps us to provide better service to them. But their need for a good working experience trumps ours: they pay the bills.
But I question whether having individual tables for each user really increases the work load. I presume each user has their own account, and hence schema.
I suggest you stick with index-organized tables. You only need columns which are in the primary key and maintaining a separate index is unnecessary overhead (for both inserting and deleting). The big advantage of having a table per user is that you can use TRUNCATE TABLE in the refresh process, which is a lot faster than deletion.
So your refresh procedure will look like this:
BEGIN
EXECUTE IMMEDIATE 'TRUNCATE TABLE per_user_result_set REUSE STORAGE'; -- TRUNCATE is DDL, so inside PL/SQL it needs dynamic SQL
INSERT INTO per_user_result_set(...)
SELECT ... FROM ...;
DBMS_STATS.GATHER_TABLE_STATS(user
, 'PER_USER_RESULT_SET'
, estimate_percent=>10);
COMMIT;
END;
/
Note that you don't need to include the user_login column any more, so your table will just have the single column result_set_item_id (another indication of the suitability of an IOT).
Gathering the table stats isn't mandatory but it is advisable. You have a wide variability in the size of result sets, and you don't want to be using an execution plan devised for 500000 rows when the table has only one row, or vice versa.
The only overhead is the need to create the table in the user's schema. But presumably you already have some set-up for a new user - creating the account, granting privileges, etc - so this shouldn't be a big hardship.
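For what it's worth, the per-user table could then be as small as this (a sketch; VARCHAR2(100) merely stands in for whatever length you use today):
-- Single-column index-organized table: the table is its own index, so a refresh is just
-- TRUNCATE + INSERT with no separate index to maintain.
CREATE TABLE per_user_result_set
( result_set_item_id VARCHAR2(100)
, CONSTRAINT per_user_result_set_pk PRIMARY KEY (result_set_item_id)
)
ORGANIZATION INDEX;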

How can I speed up a diff between tables?

I am working on doing a diff between tables in PostgreSQL. It takes a long time, as each table is ~13 GB...
My current query is:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB;
and
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
When I do a diff on the two (unindexed) tables it takes 1:40 hours (1 hour and 40 minutes). In order to get both the new and removed rows I need to run the query twice, bringing the total time to 3:30 hours.
I ran the Postgresql EXPLAIN query on it to see what it was doing. It looks like it is sorting the first table, then the second, then comparing them. Well that made me think that if I indexed the tables they would be presorted and the diff query would be much faster.
Indexing each table took 45 minutes. Once Indexed, each Diff took 1:35 hours.
Why do the indexes shave only 5 minutes off the total diff time? I would have assumed it would be more than half, since in the unindexed queries I am sorting each table twice (I have to run the query twice).
Since one of these tables will not be changing much, it will only need to be indexed once; the other will be updated daily. So the total runtime for the indexed method is 45 minutes for the indexing, plus 2x 1:35 for the diffs, giving a total of 3:55 hours, almost 4 hours.
What am I doing wrong here? I can't see why my net diff time with the index is larger than without it.
This is in slight reference to my other question here: Postgresql UNION takes 10 times as long as running the individual queries
EDIT:
Here is the schema for the two tables; they are identical except for the table name.
CREATE TABLE bulk.blue
(
"partA" text NOT NULL,
"type" text NOT NULL,
"partB" text NOT NULL
)
WITH (
OIDS=FALSE
);
In the statements above you are not using the indexes.
You could do something like:
SELECT * FROM tableA a
FULL OUTER JOIN tableB b ON a.someID = b.someID
You could then use the same statement to show which tables had missing values
SELECT * FROM tableA a
FULL OUTER JOIN tableB b ON a.someID = b.someID
WHERE a.someID IS NULL OR b.someID IS NULL
This should give you the rows that were missing in table A OR table B
Confirm your indexes are being used (they likely are not with such a generic EXCEPT statement). Since you are not joining against a specified column (or columns), the lack of an explicit join will likely not make for an optimized query:
http://www.postgresql.org/docs/9.0/static/indexes-examine.html
This will help you view the explain analyze more clearly:
http://explain.depesz.com
Also, make sure you run ANALYZE on the table after you create the index if you want it to perform well right away.
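For example (a sketch; someID stands in for whichever column, or combination of columns, uniquely identifies a row in your tables):
-- Hypothetical indexes on the identifying column, then refresh the planner statistics.
CREATE INDEX idx_tablea_someid ON tableA (someID);
CREATE INDEX idx_tableb_someid ON tableB (someID);
ANALYZE tableA;
ANALYZE tableB;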
The queries as specified require a comparison of every column of the tables.
For example if tableA and tableB each have five columns then the query is having to compare tableA.col1 to tableB.col1, tableA.col2 to tableB.col2, . . . tableA.col5 to tableB.col5
If just a few columns uniquely identify a record, rather than all the columns in the table, then joining the tables on those specific columns will improve your performance.
The above assumes that a primary key has not been created. If a primary key has been defined to indicate which columns uniquely identify a record, then I believe the EXCEPT statement would take that into consideration.
What kind of index did you apply? Indexes are only useful to improve WHERE conditions. If you're doing a select *, you're grabbing all the fields and the index is probably not doing anything, but taking up space, and adding a little more processing behind the scenes for the db-engine to compare the query to the index cache.
Instead of SELECT *, you can try selecting only your unique fields and creating an index on those unique fields.
You can also use an OUTER JOIN to show results from both tables that did not match on the unique fields.
You may also want to consider clustering your tables.
What version of Postgres are you running?
When was the last time you vacuumed?
Other than the above, 13GB is pretty large, so you'll want to check your config settings. It shouldn't take hours to run that, unless you don't have enough memory on your system.
