Below is the DATA step I am trying to work on. I am a beginner in SAS. I want to compare the performance of a SAS DATA step and PROC SQL for concatenation. I tried to do it, but I am not getting the same output from the DATA step and from PROC SQL, so I cannot compare their performance and tell which one is better.
For example, I tried concatenating the Ridings and Results datasets, but the output was different.
Can somebody help me with this?
In SQL, data stacking (or what you might call concatenation) is accomplished with a UNION statement. However, UNION removes duplicate rows, which is one common reason a PROC SQL result differs from a DATA step result. Try UNION ALL to maintain any duplicates that occur in the stacked data.
In SAS, the highest-performing operation for data stacking might be PROC APPEND, since it adds the incoming rows without re-reading the base dataset.
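For concreteness, here is a sketch of the three variants; the dataset names are taken from the question, and the column layouts are assumed to match. Note that plain UNION/UNION ALL in PROC SQL matches columns by position, while a DATA step SET matches them by name, which is another common reason the two outputs differ; OUTER UNION CORR matches by name and keeps duplicates.

/* DATA step: stack results after ridings, matching columns by name */
data stacked;
    set ridings results;
run;

/* PROC SQL: OUTER UNION CORR keeps duplicates and matches columns
   by name, which is what the DATA step does */
proc sql;
    create table stacked as
    select * from ridings
    outer union corr
    select * from results;
quit;

/* PROC APPEND: adds results to ridings without re-reading the base data */
proc append base=ridings data=results;
run;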
For querying an SQLite table based on a list of IDs (i.e. distinct primary keys), I am using the following statement (example based on the Chinook Database):
SELECT * FROM Customer WHERE CustomerId IN (1,2,3,8,20,35)
However, my actual list of IDs might become rather large (>1000). Thus, I was wondering if this approach using the IN statement is the most efficient, or if there is a better/optimized way to query an SQLite table based on a list of primary keys.
If the number of elements in the IN list is large enough, SQLite constructs a temporary index for them. This is likely to be more efficient than creating a temporary table manually.
The length of the IN list is limited only by the maximum length of an SQL statement, and by memory.
Because the statement you wrote does not include any instructions to SQLite about how to find the rows you want, the concept of "optimizing" doesn't really exist: there's nothing to optimize. The job of planning the best algorithm to retrieve the data belongs to the SQLite query optimizer.
Some databases do have idiosyncrasies in their query optimizers which can lead to performance issues, but I wouldn't expect SQLite to have any trouble finding the correct algorithm for this simple query, even with lots of values in the IN list. I would only worry about trying to guide the query optimizer to another execution plan if and when you find that there's a performance problem.
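If you do want to see what the optimizer decided, SQLite will show you the plan. A minimal sketch against the example table (the commented output is typical; the exact wording varies between SQLite versions):

EXPLAIN QUERY PLAN
SELECT * FROM Customer WHERE CustomerId IN (1,2,3,8,20,35);
-- SEARCH TABLE Customer USING INTEGER PRIMARY KEY (rowid=?)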
SQLite Optimizer Overview
IN (expression-list) does use an index if available.
Beyond that, I can't glean any guarantees from it, so the following is subject to a performance measurement.
Axis 1: How to pass the expression-list
Hardcode as a string. Overhead for int-to-string conversion and string-to-int parsing.
Bind parameters (i.e. the statement is ... WHERE CustomerID IN (?,?,?,?,?,?,?,?,?,?....), which is easier to build from a predefined string than hardcoded values; see the sketch after this list). Prevents the int → string → int conversion, but the default limit for the number of parameters is 999. This can be raised via SQLITE_LIMIT_VARIABLE_NUMBER, but might lead to excessive allocations.
Temporary table. Possibly less efficient than either of the above methods once the statement is prepared, but that doesn't help if most of the time is spent preparing the statement.
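As an illustration of the bind-parameter variant, here is a sketch using Python's sqlite3 module; the database path and ID list are made up:

import sqlite3

ids = [1, 2, 3, 8, 20, 35]             # hypothetical ID list
conn = sqlite3.connect("chinook.db")   # path is an assumption

# Build "?,?,?,..." once; the integers are bound directly,
# never converted to SQL text and parsed back.
placeholders = ",".join("?" * len(ids))
sql = "SELECT * FROM Customer WHERE CustomerId IN (%s)" % placeholders
rows = conn.execute(sql, ids).fetchall()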
Axis 2: Statement optimization
If the same expression-list is used in multiple queries against changing CustomerIDs, one of the following may help:
reusing a prepared statement with hardcoded values (i.e. don't pass 1001 parameters)
create a temporary table for the CustomerIDs with index (so the index is created once, not on the fly for every query)
If the expression-list is different with every query, it is probably best to let SQLite do its job. The following might be an improvement (see the sketch after this list):
create a temp table for the expression-list
bulk-insert the expression-list elements using UNION ALL
use a subquery
(from my experience with SQLite, I'd expect it to be on par or slightly worse)
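A sketch of the temp-table variant (the temp table name is invented):

CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY);  -- the index comes for free

-- bulk-insert the expression-list elements using UNION ALL
INSERT INTO wanted_ids
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 8 UNION ALL SELECT 20 UNION ALL SELECT 35;

-- use a subquery
SELECT * FROM Customer
WHERE CustomerId IN (SELECT id FROM wanted_ids);

DELETE FROM wanted_ids;  -- refill for the next query instead of recreating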
Axis 3: Ask Richard
The sqlite mailing list (yeah, I know, that technology is even older than rotary phones!) is pretty active, with often excellent advice, including from the author of SQLite. There's a 90% chance someone will dismiss you with "Measure before asking such a question!", and a 10% chance someone gives you detailed insight.
I am new to Perl and getting to grips with it.
This is a two-part question.
Question 1:
The Perl DBI has several methods to fetch data from the database tables.
I am returning one column of data from an Oracle DB into an array, such that each row of the result set is put into its own element; we are talking about 30,000 rows of data.
Here is my code:
@arr_oracle_rs = @{$dbh->selectcol_arrayref($oracle_select)};
I was wondering if this is the fastest way to do this or should I use another DBI method?
It's hard to tell, because of network latency, current DB load, etc., which is the quickest and most efficient method to get the data and place it into an array.
I am just wondering if people with the knowledge can tell me if I am using the correct method for such a task.
https://metacpan.org/pod/DBI#Database-Handle-Methods
selectrow_array()
selectrow_arrayref()
selectrow_hashref()
selectall_arrayref()
selectall_hashref()
selectcol_arrayref()
Question 2:
What is the best method to determine if the select query above ran successfully?
I'm assuming you are using DBD::Oracle.
Don't try to second-guess what is happening under the hood. For a start, by default DBD::Oracle fetches multiple rows in one go (see https://metacpan.org/pod/DBD::Oracle#RowCacheSize).
Secondly, in your example you already have an array and you copy it to another array, which is a waste of time and memory: selectcol_arrayref returns a reference to an array, and you dereference it and then copy it into @arr_oracle_rs. Just use the array ref returned.
Thirdly, we cannot say what is quickest, since you've not told us how you are going to work with the returned array. Depending on what you are doing with it, it may actually be quicker to bind the column, repeatedly call fetch and do whatever you need per row (requires less memory and no repeated creation of scalars), or it may be quicker to get all the rows in one go (requires more memory and lots of scalar creation).
I haven't actually looked at how selectcol_arrayref works, but as it has to pick the first column from each row, it might be just as well to use selectall_arrayref if you end up using a selectall method.
As in all these things you'll have to benchmark your solutions yourself.
As mpapec said, RaiseError is your friend.
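Putting those points together, a minimal sketch; the connection details and the query are placeholders:

use strict;
use warnings;
use DBI;

# RaiseError makes DBI die on any failed connect/prepare/execute/fetch,
# so reaching the print below means the select succeeded.
my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'pass',
                       { RaiseError => 1 });

my $oracle_select = 'SELECT id FROM some_table';   # hypothetical query

# Keep the reference; don't copy 30,000 rows into a second array.
my $ids = $dbh->selectcol_arrayref($oracle_select);

print scalar(@$ids), " rows fetched\n";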
I have a table with data relating to several moments in time that I have to keep updated. To save space and time, however, each row in my table refers to a given day, and the hourly and quarter-hourly data for that day are scattered throughout the several columns of that same row. When updating the data for a particular moment in time, I therefore must choose the column that has to be updated through some programming logic in my PL/SQL procedures and functions.
Is there a way to dynamically choose the column or columns involved in an update/merge operation without having to assemble the query string anew every time? Performance is a concern and the throughput must be high, so I can't do anything that would perform poorly.
Edit: I am aware of normalization issues. However, I would still like to know a good way of choosing the columns to be updated/merged dynamically and programmatically.
The only way to dynamically choose what column or columns to use for a DML statement is to use dynamic SQL. And the only way to use dynamic SQL is to generate a SQL statement that can then be prepared and executed. Of course, you can assemble the string in a more or less efficient manner, you can potentially parse the statement once and execute it multiple times, etc. in order to minimize the expense of using dynamic SQL. But using dynamic SQL that performs close to what you'd get with static SQL requires quite a bit more work.
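For illustration, a sketch of that approach using native dynamic SQL; the table, column, and parameter names are invented. Since only the column name varies, Oracle ends up caching at most one cursor per column, so the parse overhead is bounded:

CREATE OR REPLACE PROCEDURE update_slot (
   p_day   IN DATE,
   p_col   IN VARCHAR2,  -- validate against a whitelist of real column
                         -- names before use, to prevent SQL injection
   p_value IN NUMBER
) AS
BEGIN
   EXECUTE IMMEDIATE
      'UPDATE day_readings SET ' || p_col || ' = :v WHERE reading_day = :d'
      USING p_value, p_day;
END update_slot;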
I'd echo Ben's point-- it doesn't appear that you are saving time by structuring your table this way. You'll likely get much better performance by normalizing the table properly. I'm not sure what space you believe you are saving but I would tend to doubt that denormalizing your table structure is going to save you much if anything in terms of space.
One way to do what is required is to create a package with all possible updates (which aren't that many, as I'll only update one field at a given time) and then choose which query to use depending on my internal logic. This would, however, lead to a big if/else or switch/case-like statement. Is there a way to achieve similar results with better performance?
Because I am not familiar with ADO under the hood, I was wondering which of the two methods of finding a record generally yields quicker results using VB6.
Use a 'select' statement using 'where' as a qualifier. If the recordset count yields zero, the record was not found.
Select all records iterating through records with a client-side cursor until record is found, or not at all.
The recordset is in the range of 10,000 records and will grow. Also, I am open to anything that will yield shorter search times other than what was mentioned.
SELECT count(*) FROM foo WHERE some_column='some value'
If the result is greater than 0, the record satisfying your condition was found in the database. It is unlikely you would get any faster than this. Proper indexes on the columns you are using in the WHERE clause could considerably improve performance.
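For example, using the names from the query above:

-- make the WHERE lookup an index seek instead of a full scan
CREATE INDEX idx_foo_some_column ON foo (some_column)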
In every case I can think of, selecting using the where clause is faster.
Even in situations where the client code will iterate through the whole database (file-based databases like Access, for example), you will have optimized code written in C or C++ doing the selection (in the database driver). This is always faster than VB6.
For database engines (SQL Server, MySQL, etc.), the performance increase can be even more profound. By using the WHERE clause, you limit the amount of data that must be transmitted over the network, vastly improving the response.
Some additional performance tips:
Select only the fields you want.
Build indexes on frequently used fields.
Watch what kind of recordset you are returning. Use forward-only cursors if you are just returning data from a database (see the sketch below).
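Here is a sketch of such a lookup in VB6/ADO; the connection object and table/column names are assumed:

' Forward-only, read-only recordset: the lightest cursor for a lookup.
Dim rs As ADODB.Recordset
Set rs = New ADODB.Recordset
rs.Open "SELECT 1 FROM foo WHERE some_column = 'some value'", _
        conn, adOpenForwardOnly, adLockReadOnly
If Not rs.EOF Then
    ' the record exists
End If
rs.Close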
Lastly, I was shocked by VB.NET's database performance, it being several times faster than the fastest VB6 code.
Hi,
I am going to rewrite a stored procedure in LINQ.
What this SP does is join 12 tables, get the data, and insert it into another table.
It has 7 left outer joins and 4 inner joins, and returns one row of data.
Now the questions:
1) What is the best way to achieve these joins in LINQ?
2) Do you think this affects performance? (It's only retrieving one row of data at a given point in time.)
Please advise.
Thanks
SNA.
You might want to check this question for the multiple joins. I usually prefer lambda syntax, but YMMV.
As for performance: I doubt the query performance itself will be affected, but there may be some overhead in figuring out the execution plan, since it's such a complicated query. The biggest performance hit will likely be the extra database round trip you will need compared to the stored procedure. If I understand you correctly, your current SP does the SELECT and the INSERT all at once. Using LINQ to SQL or LINQ to Entities, you will need to fetch the data first before you can actually write it to the other table.
So, it depends on your usage whether rewriting is warranted. Alternatively, you can add the stored procedure to your data model; it will be exposed as a method on your data context.
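For the joins themselves, here is a sketch of how one inner and one left outer join look in LINQ query syntax; the entity and property names are invented, and the remaining joins follow the same two patterns:

var row = (from o in db.Orders
           join c in db.Customers on o.CustomerId equals c.Id     // inner join
           join s in db.Shippers on o.ShipperId equals s.Id into sj
           from s in sj.DefaultIfEmpty()                          // left outer join
           where o.Id == orderId
           select new { o.Id, Customer = c.Name, Shipper = s == null ? null : s.Name })
          .SingleOrDefault();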