I have a custom form with the following Datasource setup;
SalesTable
SalesLine (SalesTable - Inner Join)
InventTable (SalesLine - Inner Join)
InventDim (SalesLine - Inner Join)
...which works without any performance issue.
When I add the following;
InventHazardousGroup (InventTable - Outer Join)
...I see no performance issues in our development environment, however in the production environment the query is terribly slow, which means the form takes a long time to load.
SQL Statement trace log produced the following output in both environments;
(I have ended the field list with "etc" because it is long);
SELECT A.SALESID,A.SALESNAME,A.RESERVATION,A.CUSTACCOUNT,A.INVOICEACCOUNT,A.DELIVERYDATE,A.DELIVERYADDRESS,A.URL,A.PURCHORDERFORMNUM,A.SALESTAKER,A.SALESGROUP,A.FREIGHTSLIPTYPE,A.DOCUMENTSTATUS,A.INTERCOMPANYORIGINALSALESID,etc
FROM {OJ INVENTTABLE C LEFT OUTER JOIN INVENTHAZARDOUSGROUP E ON ((E.DATAAREAID=?)
AND (C.HAZARDOUSGROUPID=E.HAZARDOUSGROUPID))},SALESTABLE A,SALESLINE B,INVENTDIM D
WHERE ((A.DATAAREAID=?)
AND (A.SALESTYPE=?))
AND ((B.DATAAREAID=?)
AND (A.SALESID=B.SALESID))
AND ((C.DATAAREAID=?)
AND (B.ITEMID=C.ITEMID))
AND ((D.DATAAREAID=?)
AND (B.INVENTDIMID=D.INVENTDIMID))
ORDER BY A.DATAAREAID,A.SALESID OPTION(FAST 1)
Is there any reason why this should be so slow in one environment but not in another? The data I have tested on in the development environment is quite recent, around 1 month old. I have the same performance problem in the production environment, in a different company.
I have had this issue crop up before, and I do not think it has anything to do with the outer join. It is most likely caused by the number of queries that the form generates in production vs. development. SQL Server caches the plan for a query when it is used, and AX likes to pass values into SQL as variables. Most likely, you have a bad cached plan in production that then gets used by all the users. I suggest using Force Literals. I have used it sparingly in a few places, and it has had a major impact on performance.
Check the indexes in AX exist in SQL Server.
Check the execution plan of the query.
The easiest way is to log to the infolog (long queries setup on the SQL tab in User settings), then double-click the offending query.
Otherwise try an index rebuild of the tables and a create statistics of the tables.
What version of SQL Server are you running in development, and what version in production? Run DBCC TRACESTATUS(-1); to determine which trace flags are on in dev vs. prod, and make sure they do not differ. I have seen cases where a difference makes a performance issue show up in one environment but not the other.
Does the query ALWAYS run slow in production, or does it only SOMETIMES run slow?
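To make the dev vs. prod comparison concrete, the flag numbers reported by DBCC TRACESTATUS(-1) on each server can be diffed. This is only a minimal Python sketch, assuming the flag numbers have been copied out of each result set by hand. The sample values and the helper name diff_trace_flags are hypothetical.

```python
def diff_trace_flags(dev_flags, prod_flags):
    """Return trace flags enabled in only one of the two environments."""
    dev, prod = set(dev_flags), set(prod_flags)
    return {
        "only_in_dev": sorted(dev - prod),
        "only_in_prod": sorted(prod - dev),
    }


# Hypothetical sample input: the TraceFlag column from DBCC TRACESTATUS(-1).
dev_flags = [1117, 1118, 4199]
prod_flags = [1117, 1118]

result = diff_trace_flags(dev_flags, prod_flags)
print(result)
```

Any flag that shows up in only one list is a candidate explanation worth ruling out before looking at the query itself.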
I am writing an application in C# that will execute queries against a local and linked databases (all Oracle 10g and newer), and I want to make sure I understand who is doing what when a linked database is being queried.
For example, for a simple query such as
SELECT * FROM FOO@DB_LINK
What is the local database server responsible for? I assume that this will use the CPU, disk, and memory on the database server that hosts DB_LINK, but what impact does this query have on the local database server resources?
What if the query is a little more complex, such as
SELECT * FROM FOO@DB_LINK F INNER JOIN BAR@DB_LINK B ON F.FOOBAR = B.FOOBAR
Is the entire query executed on the server that hosts DB_LINK, or is the INNER JOIN performed on the local server? If the INNER JOIN is performed by the local database, is it able to utilize the indexes that are on the linked tables (I wouldn't think so)? Is there a way to tell Oracle to execute the entire query on the linked database?
In my application, my queries will always be completely against either the local database, or a selected linked database. In other words, I will never have a query where I am getting data from both the local and a linked database at the same time like
SELECT * FROM FOO F INNER JOIN BAR@DB_LINK B ON F.FOOBAR = B.FOOBAR
To summarize,
I am only dealing with Oracle 10g or newer databases.
What is the local database server responsible for when a query (however complex) is strictly against linked tables?
What are the ways (if any) to optimize or give Oracle hints about how to best execute these kinds of queries? (examples in C# would be great)
Like most things related to the optimizer, it depends.
If you generate a query plan for a particular query, the query plan will tell you what if anything the local database is doing and which operations are being done on the remote database. Most likely, if statistics on the objects are reasonably accurate and the query references only objects in a single remote database, the optimizer will be smart enough to push the entire query to the remote server to execute.
Alas, the optimizer is not always going to be smart enough to do the right thing. If that happens, you can most likely resolve it by adding an appropriate driving_site hint to the query.
SELECT /*+ driving_site(F) */ *
FROM FOO@DB_LINK F
INNER JOIN BAR@DB_LINK B
ON F.FOOBAR = B.FOOBAR
Depending on how complex the queries are, how difficult it is to add hints to your queries, and how much difficulty you have in your environment getting the optimizer to behave, creating views in the remote database can be another way to force queries to run on the remote database. If you create a view on db_link that joins the two tables together and query that view over the database link, that will (in my experience) always force the execution to happen on the remote database where the view is defined. I wouldn't expect this option to be needed given the fact that you aren't mixing local and remote objects but I include it for completeness.
A 100% remote query will get optimized by the remote instance. The local instance will still need to allocate some memory and use CPU in order to fetch results from the remote server but the main work (things like hash joins and looping) will all be done by the remote instance.
When this happens, you will get a note in your local execution plan
Note
-----
- fully remote statement
As soon as something has to be done on the local server as part of the statement (e.g. an insert, or a join to a local table, including local dual), the query becomes distributed. Only one server can be considered the driving site, and it will typically be the local one (I can't come up with a demo where this chooses the remote site, even when that is cheaper, so maybe it's not cost-based). Typically this ends with you hitting some badness somewhere, perhaps a nested loop join against remote tables computed on the local side.
One thing to keep in mind with distributed queries - the optimizing instance will not look at histogram information from the other instance.
I have an Oracle bind query that is extremely slow (about 2 minutes) when it executes in my C# program but runs very quickly in SQL Developer. It has two parameters that hit the table's index:
select t.Field1, t.Field2
from theTable t
where t.key1=:key1
and t.key2=:key2
Also, if I remove the bind variables and create dynamic sql, it runs just like it does in SQL Developer.
Any suggestion?
BTW, I'm using ODP.
If you are replacing the bind variables with static values in SQL Developer, then you're not really running the same test. Make sure you use the bind variables, and if it's also slow, you're just getting bitten by a bad cached execution plan. Updating the stats on that table should resolve it.
However, if you are actually using bind variables in SQL Developer, then keep reading. The TL;DR version is that the parameters ODP.NET runs under sometimes cause a slightly more pessimistic approach. Start with updating the stats, but have your DBA capture the execution plan under both scenarios and compare to confirm.
I'm reposting my answer from here: https://stackoverflow.com/a/14712992/852208
I considered flagging yours as a duplicate but your title is a little more concise since it identifies the query does run fast in sql developer. I'll welcome advice on handling in another manner.
Adding the following to your config will send odp.net tracing info to a log file:
This will probably only be helpful if you can find a large gap in time. Chances are rows are actually coming in, just at a slower pace.
Try adding "enlist=false" to your connection string. I don't consider this a solution since it effectively disables distributed transactions, but it should help you isolate the issue. You can get a little more information from an Oracle forums post:
From an ODP perspective, all we can really point out is that the
behavior occurs when OCI_ATR_EXTERNAL_NAME and OCI_ATR_INTERNAL_NAME
are set on the underlying OCI connection (which is what happens when
distrib tx support is enabled).
I'd guess that what you're not seeing is that the execution plan is actually different between the ODP.NET call and the SQL Developer call (meaning the performance hit is actually occurring on the server). Have your DBA trace the connection and obtain execution plans from both the ODP.NET call and the call straight from SQL Developer (or with the enlist=false parameter).
If you confirm different execution plans, or if you want to take a preemptive shot in the dark, update the statistics on the related tables. In my case this corrected the issue, indicating that execution plan generation doesn't really follow different rules for the different types of connections, but that the cost analysis is just slightly more pessimistic when a distributed transaction might be involved. Query hints to force an execution plan are also an option, but only as a last resort.
Finally, it could be a network issue. If your ODP.NET install is using a fresh Oracle home (which I would expect unless you did some post-install configuring), then the tnsnames.ora could be different. Host names in tnsnames.ora might not be fully qualified, creating more delays resolving the server. I'd only expect the first attempt (and not subsequent attempts) to be slow in this case, so I don't think it's the issue, but I thought it should be mentioned.
Are the parameters bound to the correct data type in C#? Are the columns key1 and key2 numbers, but the parameters :key1 and :key2 are strings? If so, the query may return the correct results but will require implicit conversion. That implicit conversion is like using a function to_char(key1), which prevents an index from being used.
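The effect of a conversion applied to an indexed column can be seen in a query plan. The following is a minimal Python sketch that uses SQLite (from the standard library) as a stand-in, so it only illustrates the general principle and says nothing about what Oracle's optimizer does for a particular ODP.NET call. The index name theTable_keys and the column types are assumptions, and the explicit CAST plays the role of the implicit conversion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theTable (key1 INTEGER, key2 INTEGER, Field1 TEXT, Field2 TEXT)"
)
conn.execute("CREATE INDEX theTable_keys ON theTable (key1, key2)")


def plan(sql, params):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Columns compared directly: the index on (key1, key2) can be used.
direct = plan(
    "SELECT Field1, Field2 FROM theTable WHERE key1 = ? AND key2 = ?",
    (1, 2),
)

# Columns wrapped in a conversion, as an implicit conversion would do:
# the index on the raw column values no longer applies.
converted = plan(
    "SELECT Field1, Field2 FROM theTable "
    "WHERE CAST(key1 AS TEXT) = ? AND CAST(key2 AS TEXT) = ?",
    ("1", "2"),
)

print(direct)
print(converted)
```

In the C# program, the equivalent check is to confirm that each parameter's declared type matches the column type, and then compare the plans actually used on the server.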
Please also check the number of rows returned by the query. If the number is big, then possibly C# is fetching all rows while the other tool fetches only the first batch. Fetching all rows may require many more disk reads in that case, which is slower. To check this, try running the following in SQL Developer:
SELECT COUNT(*) FROM (
select t.Field1, t.Field2
from theTable t
where t.key1=:key1
and t.key2=:key2
)
The above query should fetch the maximum number of database blocks.
A nice tool in such cases is the tkprof utility, which shows the SQL execution plan; the plan may differ between the cases above (although it should not).
It is also possible that you have accidentally connected to different databases. In such cases it is nice to compare results of queries.
Since you are raising "Bind is slow", I assume you have checked the SQL without binds and it was fast. In 99% of cases, using binds makes things better. Please check whether the query with constants runs fast. If it does, then the problem may be implicit conversion of the key1 or key2 column (e.g. t.key1 is a number and :key1 is a string).
I have an Oracle 9 database from which my Delphi 2006 application reads data into a TSimpleDataSet using a SQL statement like this one (in reality it is more complex, of course):
select * from myschema.mytable where ID in (1, 2, 4)
My application starts up and executes this query quite often during the course of the day, each time with different values in the in clause.
My DBAs have notified me that this is creating excessive load on the database server, as the query is re-parsed on every run. They suggested using bind variables instead of building the SQL statement on the client.
I am familiar with using parameterized queries in Delphi, but from the article linked to above I get the feeling that is not exactly what bind variables are. Also, I would need these prepared statements to work across different runs of the application.
Is there a way to prepare a statement containing an in clause once in the database and then have it executed with different parameters passed in from a TSimpleDataSet so it won't need to be reparsed every time my application is run?
My answer is not directly related to Delphi, but this problem in general. Your problem is that of the variable-sized in-list. Tom Kyte of Oracle has some recommendations which you can use. Essentially, you are creating too many unique queries, causing the database to do a bunch of hard-parsing. This will spike the CPU consumption (and DBA blood pressures) unnecessarily.
By making your query static, it can get by with a soft-parse or perhaps no parse at all! The DB can then cache the execution plan, the DBAs can deal with a more "stable" SQL, and overall performance should be improved.
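One way to keep the statement text static for a variable-sized in-list is to pad the list to a small set of fixed sizes. This is not the recommendation referred to above, just a commonly used alternative, sketched in Python with SQLite as a stand-in database. The bucket sizes and the helper names bucket_size and in_list_query are assumptions, and an Oracle client would use its own placeholder syntax.

```python
import sqlite3


def bucket_size(n, sizes=(4, 16, 64, 256)):
    """Smallest allowed placeholder count that fits n values."""
    for size in sizes:
        if n <= size:
            return size
    raise ValueError("too many values for one IN list")


def in_list_query(ids):
    """Build a statement whose text depends only on the bucket size.

    The list is padded by repeating its last value, which does not
    change the result of IN.
    """
    if not ids:
        raise ValueError("ids must not be empty")
    size = bucket_size(len(ids))
    padded = list(ids) + [ids[-1]] * (size - len(ids))
    placeholders = ", ".join("?" * size)
    sql = f"SELECT id FROM mytable WHERE id IN ({placeholders}) ORDER BY id"
    return sql, padded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(1, 11)])

sql_a, params_a = in_list_query([1, 2, 4])
sql_b, params_b = in_list_query([3, 7])
rows_a = [r[0] for r in conn.execute(sql_a, params_a)]
rows_b = [r[0] for r in conn.execute(sql_b, params_b)]
print(sql_a == sql_b, rows_a, rows_b)
```

With four bucket sizes, the database sees at most four distinct statement texts for this query, however many different ID lists the application sends.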
If I have a VIEW with a bunch of INNER JOINs but I query against that VIEW SELECTing only columns that come from the main table, will SQL Server ignore the unnecessary joins in the VIEW while executing or do those joins still need to happen for some reason?
If it makes a difference, this is on SQL Server 2008 R2. I know in either case that this is already not a great solution, but I'm attempting to find the lesser of 2 evils.
It might ignore the joins if they don't actually change the semantics. One example of this might be if you have a trusted foreign key constraint between the tables and you are only selecting columns from the referencing table (See example 9 in this article).
You would need to check the execution plan to be sure for your specific case.
If you don't pull fields from those tables, it may be faster to use an EXISTS clause - this will also prevent duplicates in the JOINed table from causing dupes in your results.
Even if the optimizer ignores unnecessary joins you should just create another view to handle your particular case. Use and abuse of views (such as this case) can get out of hand and lead to obfuscation, confusion and very significant performance issues.
You might even consider refactoring the view that you're planning on using by having it join a set of "smaller" views to deliver the same data set that it does now... if it makes sense to do that of course.
Are there general ABAP-specific tips related to performance of big SELECT queries?
In particular, is it possible to close once and for all the question of FOR ALL ENTRIES IN vs JOIN?
A few (more or less) ABAP-specific hints:
Avoid SELECT * where it's not needed, try to select only the fields that are required. Reason: Every value might be mapped several times during the process (DB Disk --> DB Memory --> Network --> DB Driver --> ABAP internal). It's easy to save the CPU cycles if you don't need the fields anyway. Be very careful if you SELECT * a table that contains BLOB fields like STRING, this can totally kill your DB performance because the blob contents are usually stored on different pages.
Don't SELECT ... ENDSELECT for small to medium result sets, use SELECT ... INTO TABLE instead.
Reason: SELECT ... INTO TABLE performs a single fetch and doesn't keep the cursor open while SELECT ... ENDSELECT will typically fetch a single row for every loop iteration.
Correction: the advice above was a kind of urban myth - there is no performance degradation for using SELECT as a loop statement. However, this will keep an open cursor during the loop, which can lead to unwanted (but not strictly performance-related) effects.
For large result sets, use a cursor and an internal table.
Reason: Same as above, and you'll avoid eating up too much heap space.
Don't ORDER BY, use SORT instead.
Reason: Better scalability of the application server.
Be careful with nested SELECT statements.
While they can be very handy for small 'inner result sets', they are a huge performance hog if the nested query returns a large result set.
Measure, Measure, Measure
Never assume anything if you're worried about performance. Create a representative set of test data and run tests for different implementations. Learn how to use ST05 and SAT.
There won't be a way to close your second question "once and for all". First of all, FOR ALL ENTRIES IN 'joins' a database table and an internal (memory) table while JOIN only operates on database tables. Since the database knows nothing about the internal ABAP memory, the FOR ALL ENTRIES IN statement will be transformed to a set of WHERE statements - just try and use the ST05 to trace this. Second, you can't add values from the second table when using FOR ALL ENTRIES IN. Third, be aware that FOR ALL ENTRIES IN always implies DISTINCT. There are a few other pitfalls - be sure to consult the on-line ABAP reference, they are all listed there.
If the number of records in the second table is small, both statements should be more or less equal in performance - the database optimizer should just preselect all values from the second table and use a smart joining algorithm to filter through the first table. My recommendation: Use whatever feels good, don't try to tweak your code to illegibility.
If the number of records in the second table exceeds a certain value, Bad Things [TM] happen with FOR ALL ENTRIES IN - the contents of the table are split into multiple sets, then the query is transformed (see above) and re-run for each set.
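The splitting behaviour described above can be pictured with a toy model. This is only a Python sketch of the idea, not how the ABAP database interface is implemented. The function name for_all_entries and the block size of 5 are hypothetical, and the real block size depends on the system.

```python
def for_all_entries(db_rows, driver_keys, key, block_size=5):
    """Toy model of FOR ALL ENTRIES IN.

    db_rows: list of dicts standing in for the database table.
    driver_keys: values from the internal (memory) table.
    block_size: hypothetical number of driver values per generated query.
    Returns (result_rows, number_of_queries).
    """
    if not driver_keys:
        # Empty driver table: nothing to look up in this toy model.
        return [], 0

    unique_keys = list(dict.fromkeys(driver_keys))  # drop duplicate driver rows
    seen, result, queries = set(), [], 0
    for start in range(0, len(unique_keys), block_size):
        block = set(unique_keys[start:start + block_size])
        queries += 1  # one generated WHERE list, one round trip per block
        for row in db_rows:
            if row[key] in block:
                fingerprint = tuple(sorted(row.items()))
                if fingerprint not in seen:  # implied DISTINCT
                    seen.add(fingerprint)
                    result.append(row)
    return result, queries


table = [{"id": i, "grp": i % 3} for i in range(20)]
rows, queries = for_all_entries(table, list(range(12)) + [0, 1], key="id")
print(len(rows), queries)
```

The query count grows with the size of the driver table, which is why a large internal table turns one logical select into many database round trips.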
Another note: The "Avoid SELECT *" statement is true in general, but I can tell you where it is false.
When you are going to take most of the fields anyway, and where you have several queries (in the same program, or different programs that are likely to be run around the same time) which take most of the fields, especially if they are different fields that are missing.
This is because the App Server Data buffers are based on the select query signature. If you make sure to use the same query, then you can ensure that the buffer can be used instead of hitting the database again. In this case, SELECT * is better than selecting 90% of the fields, because you make it much more likely that the buffer will be used.
Also note that as of the last version I tested, the ABAP DB layer wasn't smart enough to recognize SELECT A, B as being the same as SELECT B, A, which means you should always put the fields you take in the same order (preferably the table order) to make sure, again, that the data buffer on the application server is being well used.
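The "same order every time" rule can be enforced mechanically wherever field lists are generated. The following is a language-neutral illustration written in Python rather than ABAP. TABLE_ORDER and select_statement are hypothetical names, and the sketch only shows that equal field sets end up with identical statement text.

```python
TABLE_ORDER = ["A", "B", "C", "D"]  # hypothetical column order of the table


def select_statement(table, fields):
    """Render a field list in table order so equal field sets give equal text."""
    wanted = set(fields)
    unknown = wanted - set(TABLE_ORDER)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    ordered = [f for f in TABLE_ORDER if f in wanted]
    return f"SELECT {', '.join(ordered)} FROM {table}"


first = select_statement("T", ["B", "A"])
second = select_statement("T", ["A", "B"])
print(first, first == second)
```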
I usually follow the rules stated in this pdf from SAP: "Efficient Database Programming with ABAP"
It shows a lot of tips in optimizing queries.
This question will never be completely answered.
An ABAP statement that accesses the database is interpreted several times by different components of the whole system (SAP and DB). The behavior of each component depends on the component itself, its version, and its settings. The main part of the interpretation is done in the DB adapter on the SAP side.
The only viable approach to reaching maximum performance is measurement on the particular system (SAP version, DB vendor, and DB version).
There are also quite extensive hints and tips in transaction SE30. It even allows you (depending on authorisations) to write code snippets of your own & measure it.
Unfortunately, we can't close the "for all entries" vs join debate, as it is very dependent on how your landscape is set up, which database server you are using, the efficiency of your table indexes, etc.
The simplistic answer is to let the DB server do as much as possible. For the "for all entries" vs join question, this means join. Except every experienced ABAP programmer knows that it's never that simple. You have to try different scenarios and measure, like vwegert said. Also remember to measure in your live system as well, as sometimes the hardware configuration or dataset is different enough to give entirely different results in your live system than in test.
I usually follow the following conventions:
Never do a SELECT *; select only the required fields.
Never use 'into corresponding table of'; instead, create local structures which have all the required fields.
In the where clause, try to use as many primary key fields as possible.
If a select is made to fetch a single record and all primary key fields are included in the where clause, use SELECT SINGLE; otherwise use SELECT ... UP TO 1 ROWS ... ENDSELECT.
Try to use JOIN statements to connect tables instead of using FOR ALL ENTRIES.
If FOR ALL ENTRIES cannot be avoided, ensure that the internal table is not empty and delete the duplicate entries to increase performance.
Two more points in addition to the other answers:
Usually you use JOIN for two or more tables in the database, and you use FOR ALL ENTRIES IN to join database tables with a table you have in memory. If you can, JOIN.
Usually the IN operator is more convenient than FOR ALL ENTRIES IN. But the kernel translates IN into a long select statement. The length of such a statement is limited, and you get a dump when it gets too long. In this case you are forced to use FOR ALL ENTRIES IN despite the performance implications.
With in-memory database technologies, it's best if you can finish all data and calculations on the database side with JOINs and database aggregation functions like SUM.
But if you can't, at least try to avoid accessing database in LOOPs. Also avoid reading the database without using indexes, of course.