Performance on joins in LINQ

Hi,
I am going to rewrite a stored procedure in LINQ.
What this SP does is join 12 tables, get the data, and insert it into another table.
It has 7 left outer joins and 4 inner joins, and returns one row of data.
Now the questions:
1) What is the best way to achieve these joins in LINQ?
2) Do you think this will affect performance (it only retrieves one row of data at a given point in time)?
Please advise.
Thanks,
SNA

You might want to check this question for the multiple joins. I usually prefer lambda syntax, but YMMV.
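Roughly, a sketch in query syntax could look like the following (the entity and column names are hypothetical; in LINQ to SQL a left outer join is expressed as join ... into followed by DefaultIfEmpty()):

// db is the LINQ to SQL DataContext; Orders, Customers and Shippers are hypothetical entities.
var row = (from o in db.Orders
           join c in db.Customers on o.CustomerId equals c.Id          // inner join
           join s in db.Shippers on o.ShipperId equals s.Id into sj
           from s in sj.DefaultIfEmpty()                               // left outer join
           where o.Id == orderId
           select new
           {
               o.Id,
               CustomerName = c.Name,
               ShipperName = s != null ? s.Name : null
           }).SingleOrDefault();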
As for performance: I doubt the query performance itself will be affected, but there may be some overhead in figuring out the execution plan, since it's such a complicated query. The biggest performance hit will likely be the extra database round trip you will need compared to the stored procedure. If I understand you correctly, your current SP does the SELECT and the INSERT all at once; using LINQ to SQL or LINQ to Entities, you will need to fetch the data first before you can write it to the other table.
So whether rewriting is warranted depends on your usage. Alternatively, you can add the stored procedure to your data model; it will be exposed as a method on your data context.
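For example (a minimal sketch, assuming a LINQ to SQL DataContext with the stored procedure added in the designer; the context, method, and parameter names here are hypothetical):

// The designer generates one method per mapped stored procedure on the DataContext.
using (var db = new MyDataContext())
{
    var result = db.GetCombinedRow(someId);   // hypothetical sproc that does the joins and the insert
}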

Related

Create an Oracle materialized view with fast refresh on an aggregated join

I've got this really nasty view that I'm trying to make faster by performing some joins ahead of time via materialized views. My problem is that the most expensive joins, and therefore the ones most worthwhile to pre-execute, don't play nicely with materialized views.
The goal of the application is to provide the freshest data possible, so if I create materialized views, they need to fast refresh on commit (maybe there are other approaches I'm unaware of). Fast refresh has limitations; specifically, you must have the ROWID. See this thread; but my problem is a little different, as the nature of my join requires me to aggregate it to get the right record.
Here's what I want to "pre-execute" (or optimize another genius way):
CREATE MATERIALIZED VIEW testing
NOLOGGING
CACHE
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT br.id, br.rowid, max(mr.id) as modifier_id --somehow fit mr.rowid in here
FROM tableA br --base record
LEFT OUTER JOIN tableA mr --modifier record
ON br.external_key = mr.external_key
AND mr.record_type_code in ('SOME','TYPE')
AND mr.status_code in ('SOME','STATUS')
GROUP BY br.id, br.rowid;
Basically, it's a self-join, because 0-*n* modifications get made to the entity, all of which are stored as subsequent rows in the same table. I'm selecting the most recent one of a given type. (I do this additional times for other types.) To get the above working, I'd have to include the rowid of both br and mr, and I can't work out a way to do that. I've considered rank() and ROWNUM instead of aggregating with MAX(), but can't get the logic right.
EDIT:
I'm not sure a fast-refresh MV is in the cards for me: even if I make the refresh on demand and remove the aggregation entirely (assuming there is exactly one row), Oracle tells me the query is too complex for a fast refresh. So now I'm in need of other ideas...
It might not be applicable in your situation, but possibly you could denormalize your table.
For example, if you have multiple language-dependent names, you could just have a named column for each language.
Or, if your access is index-based, consider VARRAYs or nested tables.
Another idea is to use triggers: On insert/update/delete, update another table (or tables), and use that table for the query. Possibly you can pre-calculate aggregates this way as well.
I would look into using a materialised view to do the aggregation only, so you're just storing EXTERNAL_KEY and MAX(ID).
If you have deletes occurring on the master table then include count(*) as well.
That should give you fast refresh capability.
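A rough sketch of that idea, using the table and columns from the question (the materialized view log definition is an assumption about your setup, and whether Oracle actually accepts the MV for fast refresh on commit still depends on its restrictions for MAX aggregates; DBMS_MVIEW.EXPLAIN_MVIEW will tell you):

-- Materialized view log needed for fast refresh of an aggregate MV
CREATE MATERIALIZED VIEW LOG ON tableA
  WITH SEQUENCE, ROWID (external_key, id, record_type_code, status_code)
  INCLUDING NEW VALUES;

-- Aggregation only: the key, the latest id, and a row count to cope with deletes
CREATE MATERIALIZED VIEW mv_latest_modifier
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT
AS
SELECT external_key,
       MAX(id)  AS modifier_id,
       COUNT(*) AS cnt
FROM tableA
WHERE record_type_code in ('SOME','TYPE')
  AND status_code in ('SOME','STATUS')
GROUP BY external_key;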

Oracle PL/SQL: choosing the update/merge column dynamically

I have a table with data relating to several moments in time that I have to keep updated. To save space and time, however, each row in my table refers to a given day, and the hourly and quarter-hourly data for that day are scattered throughout the several columns in that same row. When updating the data for a particular moment in time, I therefore must choose the column that has to be updated through some programming logic in my PL/SQL procedures and functions.
Is there a way to dynamically choose the column or columns involved in an update/merge operation without having to assemble the query string anew every time? Performance is a concern and the throughput must be high, so I can't do anything that would perform poorly.
Edit: I am aware of the normalization issues. However, I would still like to know a good way of choosing the columns to be updated/merged dynamically and programmatically.
The only way to dynamically choose what column or columns to use for a DML statement is to use dynamic SQL. And the only way to use dynamic SQL is to generate a SQL statement that can then be prepared and executed. Of course, you can assemble the string in a more or less efficient manner, you can potentially parse the statement once and execute it multiple times, etc. in order to minimize the expense of using dynamic SQL. But using dynamic SQL that performs close to what you'd get with static SQL requires quite a bit more work.
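A minimal sketch of that approach, with hypothetical table and column names (the column name is derived from validated input and concatenated into the statement, while the values are bound, so the statement text only varies per column and can be reused from the cursor cache):

CREATE OR REPLACE PROCEDURE update_reading (
    p_day     IN DATE,
    p_quarter IN PLS_INTEGER,   -- 1..96, selects the quarter-hour column
    p_value   IN NUMBER
) AS
    l_col  VARCHAR2(30);
    l_stmt VARCHAR2(200);
BEGIN
    -- Only derive the column name from trusted, validated input (no injection risk).
    IF p_quarter NOT BETWEEN 1 AND 96 THEN
        RAISE_APPLICATION_ERROR(-20001, 'Invalid quarter-hour index');
    END IF;
    l_col := 'Q' || TO_CHAR(p_quarter, 'FM000');

    l_stmt := 'UPDATE readings SET ' || l_col || ' = :val WHERE reading_date = :dy';
    EXECUTE IMMEDIATE l_stmt USING p_value, p_day;
END update_reading;
/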
I'd echo Ben's point: it doesn't appear that you are saving time by structuring your table this way. You'll likely get much better performance by normalizing the table properly. I'm not sure what space you believe you are saving, but I would tend to doubt that denormalizing your table structure saves you much, if anything, in terms of space.
One way to do what is required is to create a package with all possible updates (there aren't that many, as I'll only update one field at a given time) and then choose which query to use depending on my internal logic. This would, however, lead to a big if/else or switch/case-like statement. Is there a way to achieve similar results with better performance?

ADO Search Performance

Because I am not familiar with ADO under the hood, I was wondering which of these two methods of finding a record generally yields quicker results in VB6:
1) Use a SELECT statement with a WHERE clause as a qualifier. If the recordset count is zero, the record was not found.
2) Select all records and iterate through them with a client-side cursor until the record is found, or not at all.
The recordset is in the range of 10,000 records and will grow. Also, I am open to anything that will yield shorter search times other than what was mentioned.
SELECT count(*) FROM foo WHERE some_column='some value'
If the result is greater than 0, a record satisfying your condition was found in the database. It is unlikely you would get any faster than this. Proper indexes on the columns used in the WHERE clause can considerably improve performance.
In every case I can think of, selecting using the WHERE clause is faster.
Even in situations where the client code will iterate through the whole database (file-based databases like Access, for example), the selection is done by optimized code written in C or C++ (in the database driver), which is always faster than VB6.
For database engines (SQL Server, MySQL, etc.), the performance gain can be even more pronounced: by using the WHERE clause you limit the amount of data that must be transmitted over the network, vastly improving the response time.
Some additional performance tips:
Select only the fields you want.
Build indexes on frequently used fields.
Watch what kind of recordset you are returning. Use forward-only cursors if you are just reading data from the database.
Lastly, I was shocked by VB.NET's database performance, it being several times faster than the fastest VB6 code.
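For what it's worth, here is a minimal VB6/ADO sketch of the WHERE-clause approach (the table, column, and connection string are placeholders):

Dim cn As ADODB.Connection
Dim cmd As ADODB.Command
Dim rs As ADODB.Recordset

Set cn = New ADODB.Connection
cn.Open "your connection string here"

' Parameterised query: only the matching rows travel over the wire.
Set cmd = New ADODB.Command
Set cmd.ActiveConnection = cn
cmd.CommandText = "SELECT Id, Name FROM Customers WHERE Name = ?"
cmd.Parameters.Append cmd.CreateParameter("Name", adVarChar, adParamInput, 50, "some value")

Set rs = cmd.Execute    ' forward-only, read-only recordset by default
If rs.EOF Then
    ' record not found
Else
    ' record found; read rs!Id and rs!Name here
End If

rs.Close
cn.Close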

How to reduce the load on the database

I have a database (SQL Server 2005). There are now about 100,000 records in a table called users, and when I query it using LINQ to SQL it gets slower and slower. What can I do to improve the speed?
Analysing your query and adding some indexes to your table may help.
To get a more specific answer, post more specific information (table structure, the indexes you have, the SQL code L2S generates, ...).
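For example, if you frequently filter the users table by user name (the column names here are hypothetical), a covering index along these lines may help:

-- Index the columns you filter/sort on; INCLUDE the ones you only select.
CREATE NONCLUSTERED INDEX IX_users_UserName
    ON dbo.users (UserName)
    INCLUDE (FirstName, LastName);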
You could (in order of preference):
1) Save your query as a stored procedure.
2) Add indexes to your users table for the columns you are querying and sorting on.
3) Analyze your query (if it is complicated) and see if there's a less resource-intensive way of doing it. There are graphical query analyzers to help you.
4) As a last resort, don't use LINQ to SQL but the ADO.NET Entity Framework instead; it's significantly faster. But you'll only see performance improvements for crazy stuff, and only if you've already done all of the above.
Use stored procedures and then use LINQ to SQL to get the desired rows; this will improve performance.
The best tools at your disposal for analyzing your database access and seeing what needs to be optimized are:
SQL Server Profiler
Graphical Execution Plans
The first one will allow you to see the exact queries being sent to your database from your application, which is especially useful if it turns out that your application is chattier than you think. The second one will allow you to take those queries and see exactly what SQL Server is doing with them.
In the graphical execution plan, look for steps which use a lot of CPU and paths which transfer a lot of records. Those are what you'll want to optimize. It's possible that you're doing a table scan somewhere, which is slow, or maybe joining on many more records than you need somewhere, which is slow, etc.

ABAP SELECT performance hints?

Are there general ABAP-specific tips related to performance of big SELECT queries?
In particular, is it possible to close once and for all the question of FOR ALL ENTRIES IN vs JOIN?
A few (more or less) ABAP-specific hints:
Avoid SELECT * where it's not needed; try to select only the fields that are required. Reason: every value might be mapped several times during the process (DB disk --> DB memory --> network --> DB driver --> ABAP internal). It's easy to save those CPU cycles if you don't need the fields anyway. Be very careful if you SELECT * from a table that contains BLOB fields like STRING; this can totally kill your DB performance because the blob contents are usually stored on different pages.
Don't SELECT ... ENDSELECT for small to medium result sets, use SELECT ... INTO TABLE instead.
Reason: SELECT ... INTO TABLE performs a single fetch and doesn't keep the cursor open while SELECT ... ENDSELECT will typically fetch a single row for every loop iteration.
This was a kind of urban myth - there is no performance degradation for using SELECT as a loop statement. However, this will keep an open cursor during the loop which can lead to unwanted (but not strictly performance-related) effects.
For large result sets, use a cursor and an internal table.
Reason: Same as above, and you'll avoid eating up too much heap space.
Don't ORDER BY, use SORT instead.
Reason: Better scalability of the application server.
Be careful with nested SELECT statements.
While they can be very handy for small 'inner result sets', they are a huge performance hog if the nested query returns a large result set.
Measure, Measure, Measure
Never assume anything if you're worried about performance. Create a representative set of test data and run tests for different implementations. Learn how to use ST05 and SAT.
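A small sketch of the first two tips, using the SAP demo table SFLIGHT (adjust the table and fields to your own case):

* Select only the needed fields, in a single fetch, into an internal table
TYPES: BEGIN OF ty_flight,
         carrid TYPE sflight-carrid,
         connid TYPE sflight-connid,
         fldate TYPE sflight-fldate,
       END OF ty_flight.

DATA: lt_flights TYPE STANDARD TABLE OF ty_flight.

SELECT carrid connid fldate
  FROM sflight
  INTO TABLE lt_flights
  WHERE carrid = 'LH'.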
There won't be a way to close your second question "once and for all". First of all, FOR ALL ENTRIES IN 'joins' a database table and an internal (memory) table, while JOIN only operates on database tables. Since the database knows nothing about the internal ABAP memory, the FOR ALL ENTRIES IN statement will be transformed into a set of WHERE conditions - just try it and use ST05 to trace this. Second, you can't add values from the second table when using FOR ALL ENTRIES IN. Third, be aware that FOR ALL ENTRIES IN always implies DISTINCT. There are a few other pitfalls - be sure to consult the online ABAP reference, they are all listed there.
If the number of records in the second table is small, both statements should be more or less equal in performance - the database optimizer should just preselect all values from the second table and use a smart joining algorithm to filter through the first table. My recommendation: Use whatever feels good, don't try to tweak your code to illegibility.
If the number of records in the second table exceeds a certain value, Bad Things [TM] happen with FOR ALL ENTRIES IN - the contents of the table are split into multiple sets, then the query is transformed (see above) and re-run for each set.
Another note: the "Avoid SELECT *" rule is true in general, but I can tell you where it is false.
It is when you are going to take most of the fields anyway, and you have several queries (in the same program, or in different programs that are likely to run around the same time) which each take most of the fields, especially if the fields that are missing differ between them.
This is because the application server's data buffers are keyed on the SELECT statement's signature. If you make sure to use the same query, you can ensure that the buffer is used instead of hitting the database again. In this case, SELECT * is better than selecting 90% of the fields, because you make it much more likely that the buffer will be used.
Also note that, as of the last version I tested, the ABAP DB layer wasn't smart enough to recognize SELECT A, B as being the same as SELECT B, A, which means you should always list the fields you take in the same order (preferably the table order) to make sure the data buffer on the application server is used well.
I usually follow the rules stated in this pdf from SAP: "Efficient Database Programming with ABAP"
It shows a lot of tips on optimizing queries.
This question will never be completely answered.
An ABAP statement accessing the database is interpreted several times by different components of the whole system (SAP and DB). The behavior of each component depends on the component itself, its version, and its settings. The main part of the interpretation is done in the DB adapter on the SAP side.
The only viable approach for reaching maximum performance is measurement on the particular system (SAP version, DB vendor and version).
There are also quite extensive hints and tips in transaction SE30. It even allows you (depending on your authorisations) to write code snippets of your own and measure them.
Unfortunately we can't close the "for all entries" vs. join debate, as it depends heavily on how your landscape is set up, which database server you are using, the efficiency of your table indexes, etc.
The simplistic answer is to let the DB server do as much as possible. For the "for all entries" vs. join question this means join. Except every experienced ABAP programmer knows that it's never that simple. You have to try different scenarios and measure, like vwegert said. Also remember to measure in your live system as well, as the hardware configuration or dataset is sometimes different enough to produce entirely different results there than in your test system.
I usually follow the following conventions:
Never do a SELECT *; select only the required fields.
Never use INTO CORRESPONDING FIELDS OF TABLE; instead, create local structures that contain exactly the required fields.
In the WHERE clause, try to use as many primary key fields as possible.
If the SELECT is meant to fetch a single record and all primary key fields are in the WHERE clause, use SELECT SINGLE; otherwise use SELECT ... UP TO 1 ROWS ... ENDSELECT.
Try to use JOIN statements to connect tables instead of FOR ALL ENTRIES.
If FOR ALL ENTRIES cannot be avoided, ensure that the internal table is not empty and delete any duplicate entries to improve performance (see the sketch below).
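A minimal sketch of that last point, again with the SFLIGHT demo table (the driver table lt_keys is hypothetical):

TYPES: BEGIN OF ty_key,
         carrid TYPE sflight-carrid,
       END OF ty_key,
       BEGIN OF ty_flight,
         carrid TYPE sflight-carrid,
         connid TYPE sflight-connid,
         fldate TYPE sflight-fldate,
       END OF ty_flight.

DATA: lt_keys    TYPE STANDARD TABLE OF ty_key,
      lt_flights TYPE STANDARD TABLE OF ty_flight.

* With an empty FOR ALL ENTRIES table the WHERE condition is dropped and the
* whole database table is read, so guard against that explicitly.
IF lt_keys IS NOT INITIAL.
  SORT lt_keys BY carrid.
  DELETE ADJACENT DUPLICATES FROM lt_keys COMPARING carrid.

  SELECT carrid connid fldate
    FROM sflight
    INTO TABLE lt_flights
    FOR ALL ENTRIES IN lt_keys
    WHERE carrid = lt_keys-carrid.
ENDIF.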
Two more points in addition to the other answers:
Usually you use JOIN for two or more tables in the database, and you use FOR ALL ENTRIES IN to join database tables with a table you have in memory. If you can, JOIN.
Usually the IN operator is more convenient than FOR ALL ENTRIES IN. But the kernel translates IN into a long SELECT statement; the length of such a statement is limited, and you get a dump when it gets too long. In that case you are forced to use FOR ALL ENTRIES IN despite the performance implications.
With in-memory database technologies, it's best if you can do all data selection and calculation on the database side, with JOINs and database aggregation functions like SUM.
But if you can't, at least try to avoid accessing the database inside LOOPs. Also avoid reading the database without using indexes, of course.
