SQL Server 2008 R2 Express DataReader performance

I have a database that contains 250,000 records. I am using a DataReader to loop through the records and export them to a file. Just looping through the records with a DataReader and no WHERE conditions takes approximately 22 minutes. I am only selecting two columns (the id and an nvarchar(max) column with about 1000 characters in it).
Does 22 minutes sound correct for SQL Server Express? Would the 1 GB RAM or 1 CPU limits have an impact on this?

22 minutes sounds way too long for a single basic (non-aggregating) SELECT against 250K records (even 22 seconds sounds awfully long for that to me).
To say why, it would help if you could post some code and your schema definition. Do you have any triggers configured?
With 1K characters in each record (2KB), 250K records (500MB) should fit within SQL Express' 1GB limit, so memory shouldn't be an issue for that query alone.
Possible causes of the performance problems you're seeing include:
Contention from other applications
Having rows that are much wider than just the two columns you mentioned
Excessive on-disk fragmentation of either the table or the DB MDF file
A slow network connection between your app and the DB
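To rule out the fragmentation point above, a quick check is something like the following (a minimal sketch, assuming your table is named dbo.myTable; sys.dm_db_index_physical_stats reports logical fragmentation per index):
SELECT i.name AS index_name,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.myTable'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id;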
Update: I did a quick test. On my machine, reading 250K 2KB rows with a SqlDataReader takes under 1 second.
First, create test table with 256K rows (this only took about 30 seconds):
CREATE TABLE dbo.data (num int PRIMARY KEY, val nvarchar(max))
GO
DECLARE @txt nvarchar(max)
SET @txt = N'put 1000 characters here....'
INSERT dbo.data VALUES (1, @txt);
GO
INSERT dbo.data
SELECT num + (SELECT COUNT(*) FROM dbo.data), val FROM dbo.data
GO 18
Test web page to read data and display the statistics:
using System;
using System.Collections;
using System.Data.SqlClient;
using System.Text;
using System.Web.UI;

public partial class pages_default : Page
{
    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        using (SqlConnection conn = new SqlConnection(DAL.ConnectionString))
        {
            using (SqlCommand cmd = new SqlCommand("SELECT num, val FROM dbo.data", conn))
            {
                conn.Open();
                conn.StatisticsEnabled = true;
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    // Drain the reader without touching the column values
                    while (reader.Read())
                    {
                    }
                }
                // Dump the connection statistics gathered during the read
                StringBuilder result = new StringBuilder();
                IDictionary stats = conn.RetrieveStatistics();
                foreach (string key in stats.Keys)
                {
                    result.Append(key);
                    result.Append(" = ");
                    result.Append(stats[key]);
                    result.Append("<br/>");
                }
                this.info.Text = result.ToString();
            }
        }
    }
}
Results (ExecutionTime in milliseconds):
IduRows = 0
Prepares = 0
PreparedExecs = 0
ConnectionTime = 930
SelectCount = 1
Transactions = 0
BytesSent = 88
NetworkServerTime = 0
SumResultSets = 1
BuffersReceived = 66324
BytesReceived = 530586745
UnpreparedExecs = 1
ServerRoundtrips = 1
IduCount = 0
BuffersSent = 1
ExecutionTime = 893
SelectRows = 262144
CursorOpens = 0
I repeated the test with SQL Enterprise and SQL Express, with similar results.
Capturing the "val" element from each row increased ExecutionTime to 4093 ms (string val = (string)reader["val"];). Using DataTable.Load(reader) took about 4600 ms.
Running the same query in SSMS took about 8 seconds to capture all 256K rows.

Your results from running exec sp_spaceused myTable provide a potential hint:
rows = 255,000
reserved = 1994320 KB
data = 1911088 KB
index_size = 82752 KB
unused = 480 KB
The important thing to note here is reserved = 1994320 KB, meaning your table occupies roughly 1.9 GB on disk (about 1,866 MB of it is actual data). When reading fields that are not indexed (and NVARCHAR(MAX) cannot be indexed), SQL Server must read the entire row into memory before restricting the columns, so you're easily running past the 1 GB RAM limit.
As a simple test, delete the last (or first) 150k rows, try the query again, and see what performance you get (a non-destructive way to run roughly the same experiment is sketched below).
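If deleting rows isn't an option, one rough alternative (my suggestion, not part of the original answer, using the id/comments column names assumed later in this answer) is to copy part of the table into a temporary table and read from that instead:
SELECT TOP (100000) id, comments
INTO #subset
FROM myTable
ORDER BY id;

SELECT id, comments FROM #subset;

DROP TABLE #subset;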
A few questions:
Does your table have a clustered index on the primary key (is it the id field or something else)?
Are you sorting on a column that is not indexed, such as the nvarchar(max) field?
In the best-case scenario your PK is id, it is also the clustered index, and you either have no ORDER BY or you ORDER BY id.
Assuming your nvarchar(max) field is named comments:
SELECT id, comments
FROM myTable
ORDER BY id
This will work fine, but it will require SQL Server to read all the rows into memory (it will only make one pass over the table). Since comments is NVARCHAR(MAX) and cannot be indexed, and the table is about 2 GB, SQL Server will have to load the table into memory in parts.
Likely what is happening is you have something like this:
SELECT id, comments
FROM myTable
ORDER BY comment_date
Where comment_date is an additional field that is not indexed. The behaviour in this case is that SQL Server cannot sort all the rows in memory, so it ends up paging the table in and out of memory several times, which is likely causing the problem you are seeing.
A simple solution in this case is to add an index to comment_date.
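Something along these lines should do it (a sketch only; the table and column names follow the hypothetical example above, and the index name is made up):
CREATE INDEX IX_myTable_comment_date ON myTable (comment_date);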
But suppose that is not possible because you only have read access to the database. Another option is to pull the data you want into a local table variable and sort that instead:
DECLARE @T TABLE
(
id BIGINT,
comments NVARCHAR(MAX),
comment_date date
)
INSERT INTO @T SELECT id, comments, comment_date FROM myTable
SELECT id, comments
FROM @T
ORDER BY comment_date
If this doesn't help, additional information is required: can you please post your actual query along with your entire table definition and its indexes?
Beyond all of this, run the following after you restore backups to rebuild indexes and statistics; you could just be suffering from corrupted statistics (which happens when you back up a fragmented database and then restore it to a new instance):
EXEC [sp_MSforeachtable] @command1="RAISERROR('UPDATE STATISTICS(''?'') ...',10,1) WITH NOWAIT UPDATE STATISTICS ? "
EXEC [sp_MSforeachtable] @command1="RAISERROR('DBCC DBREINDEX(''?'') ...',10,1) WITH NOWAIT DBCC DBREINDEX('?')"
EXEC [sp_MSforeachtable] @command1="RAISERROR('UPDATE STATISTICS(''?'') ...',10,1) WITH NOWAIT UPDATE STATISTICS ? "

Related

Code running slow in SAS, any idea what I can do to run it quicker?

I am a beginner with SAS and I am trying to create a table with the code below. The code has now been running for 3 hours. The dataset is quite large (150,000 rows), yet when I insert a different date it runs in 45 minutes. The date I have inserted is valid under date_key. Any suggestions on why this may be or what I can do? Thanks in advance.
proc sql;
create table xyz as
select monotonic() as rownum ,*
from x.facility_yz
where (Fac_Name = 'xyz' and (Ratingx = 'xyz' or Ratingx is null) )
and Date_key = '20000101'
;
quit;
I tried running it again, but I get the same problem.
Is your dataset coming from an external database? A SAS dataset of this size should not take nearly this long to query - it should be almost instant. If it is external, you may be able to take advantage of indexing. Try to find out what the database is indexed on and use that as a first pass. You might also consider using a data step rather than SQL with the monotonic() function.
For example, assume it is indexed by date:
data xyz1;
set x.facility_xyz;
where date_key = '20000101';
run;
Then you can filter this final dataset within SAS itself. 150,000 rows is nothing for a SAS dataset, assuming there aren't hundreds of variables making it large. A SAS dataset this size should run lightning fast when querying.
data xyz2;
set xyz1;
where fac_name = 'xyz' AND (Ratingx = 'xyz' or Ratingx = ' ');
rownum = _N_;
run;
Or, you could try it all in one pass while still taking advantage of the index:
data xyz;
set x.facility_xyz;
where date_key = '20000101';
if(fac_name = 'xyz' AND (Ratingx = 'xyz' or Ratingx = ' ') );
rownum+1;
run;
You could also try rearranging your where statement to see if you can take advantage of compound indexing:
data xyz;
set x.facility_xyz;
where date_key = '20000101'
AND fac_name = 'xyz'
AND (Ratingx = 'xyz' or Ratingx = ' ')
;
rownum = _N_;
run;
More importantly, only keep variables that are necessary. If you need all of them then that is okay, but consider using the keep= or drop= dataset options to only pull what you need. This is especially important when talking with an external database.
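For example, a sketch of the keep= option applied to the data step above (the variable list here is just the columns from this question; add whatever else you actually need downstream):
data xyz;
/* keep= limits which variables are read from the source table */
set x.facility_xyz(keep=date_key fac_name Ratingx);
where date_key = '20000101'
AND fac_name = 'xyz'
AND (Ratingx = 'xyz' or Ratingx = ' ')
;
rownum = _N_;
run;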
What kind of libname do you use?
If you are running implicit pass-through via SAS functions, that would explain why it takes so long.
If you are using a SAS/ACCESS Interface to xxx engine, first add this option to understand what is going on: options sastrace=',,,d' sastraceloc=saslog;
You should probably use explicit pass-through: write the query in the RDBMS's native language to avoid automatic translation of your code.
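A sketch of explicit pass-through (illustration only: it assumes an Oracle source, placeholder connection options, and that the native table is called facility_xyz; adjust all of these for your actual DBMS):
proc sql;
connect to oracle (user=myuser password=mypass path=mydb);
create table xyz as
select * from connection to oracle
( select *
from facility_xyz
where date_key = '20000101'
and fac_name = 'xyz'
and (ratingx = 'xyz' or ratingx is null)
);
disconnect from oracle;
quit;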

Anyway to iterate quickly over a collection of UDTs?

Let's say I have a collection of UDTs. I populate it as below:
Public Type udtEmp
Id As Long
Name As String
End Type
Dim col As New Collection
Dim empRec As udtEmp, empDummy As udtEmp
For n = 1 To 100000
empRec = empDummy ' reset record
empRec.Id = n
empRec.Name = "Name " & n
col.Add empRec, CStr(empRec.Id)
Next
Now I want to loop through it. I am using a Long data type as the index to .Item()
Dim n As Long
For n = 1 To 100000
empRec = col.Item(n)
Next
The code above works, but it's really slow - it takes 10,000 milliseconds to iterate. If I access the collection via a key, it's much faster - 78 milliseconds.
For n = 1 To 100000
empRec = col.Item(CStr(n))
Next
The problem is that when I iterate over the collection, I don't have the keys. If I had a collection of objects instead of UDTs, I could do For Each obj In col, but with UDTs, it won't let me iterate in that manner.
One of my thoughts was to have a secondary collection of indexes and keys to point to the main collection, but I am trying not to complicate the code unless I absolutely have to.
So what are my options?
The elegance of the code versus its performance is a serious decision you have to make, and the choice should be based on the impact of the results. For Each is elegant but slow, and it goes with objects and classes; if speed is what matters, use UDTs and arrays.
In your case, I think an array of UDTs is best suited to your situation. To gain even more speed, try to access the arrays using SAFE_ARRAY (which you can google); the results are impressive.
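A minimal sketch of that array-of-UDT approach, reusing the udtEmp type from the question (plain array indexing, no Collection lookups involved):
Dim emps(1 To 100000) As udtEmp
Dim n As Long
' Loading
For n = 1 To 100000
emps(n).Id = n
emps(n).Name = "Name " & n
Next
' Iterating by index
For n = LBound(emps) To UBound(emps)
' use emps(n).Id and emps(n).Name here
Next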
You can use a user typed class collection. It'll provide the for-each iteration ability with great performance.
Easiest way to make that happen is through the Class Builder Utility (https://msdn.microsoft.com/en-us/library/aa442930(v=vs.60).aspx). You might need to first run the Add-in Manager and load the Class Builder Utility. (I think there were install options regarding these features when you installed VB6/VS6, so if you don't see the Class Builder Utility in the Add-in Manager, it could be due to that.)
To match your udt sample, using the Class Builder Utility, first add a class (eg: Employee), with two properties (eg: EmpId and EmpName, long and string types respectively). Then add a collection (eg: Employees) based on the Employee class. Save it to the project (that will create two new class modules) and close the Utility.
Now you can create the new Employees collection, load it up, and iterate through it via index, key or for-each. (note: don't use a pure number for the key - requesting an item by a key that is a pure number, even as a string, will be interpreted as an index request, it'll be slow and you probably won't get the desired item)
Also - once the new classes have been created, you can add customized properties and methods to them to handle whatever kinds of fancy stuff you may have requirements for.
Dim i As Long
Dim Emp As Employee
Dim colEmp As New Employees
Dim name As String
' Loading
For i = 1 To 100000
colEmp.Add i, "name" & CStr(i), "key" & CStr(i)
Next i
' iterate with index
For i = 1 To 100000
Set Emp = colEmp(i)
name = Emp.EmpName
Next i
' iterate with key
For i = 1 To 100000
Set Emp = colEmp("key" & i)
name = Emp.EmpName
Next i
'iterate with for-each
For Each Emp In colEmp
name = Emp.EmpName
Next Emp
Timings
On my system for the above code:
Loading time: 1 second
Index time: 20 seconds
Key time: 0.29 seconds
For-each time: 0.031 seconds

Cassandra slow get_indexed_slices speed

We are using Cassandra for log collecting.
About 150,000 - 250,000 new records per hour.
Our column family has several columns like 'host', 'errorlevel', 'message', etc., and a special indexed column, 'indexTimestamp'.
This column contains the time rounded to hours.
So, when we want to get some records, we use get_indexed_slices() with a first IndexExpression on indexTimestamp (with the EQ operator) and then some other IndexExpressions - by host, errorlevel, etc.
When getting records just by indexTimestamp everything works fine.
But when getting records by indexTimestamp and, for example, host, Cassandra works for a long time (more than 15-20 seconds) and throws a timeout exception.
As I understand it, when getting records by an indexed column and a non-indexed column, Cassandra first gets all records by the indexed column and then filters them by the non-indexed columns.
So why is Cassandra so slow at this? For a given indexTimestamp there are no more than 250,000 records. Isn't it possible to filter them within 10 seconds?
Our Cassandra cluster is running on one machine (Windows 7) with 4 CPUs and 4 GB of memory.
You have to bear in mind that Cassandra is very bad at this kind of query. Secondary-index queries are not meant for big tables. If you want to search your data with this type of query, you have to tailor your data model around it.
In fact, Cassandra is not a DB you can query arbitrarily; it is a key-value storage system. To understand that, have a quick look here: http://howfuckedismydatabase.com/
The most basic pattern to help you is bucketed rows combined with range-slice queries.
Let's say you have the object
user : {
name : "XXXXX"
country : "UK"
city : "London"
postal_code :"N1 2AC"
age : "24"
}
and of course you want to query by city OR by age (combining AND and OR is yet another data model).
Then you would have to save your data like this, assuming the name is a unique id:
write(row = "UK", column_name = "city_XXXX", value = {...})
AND
write(row = "bucket_20_to_25", column_name = "24_XXXX", value = {...})
Note that I bucketed by country for the city search and by age bracket for age search.
the range query for age EQ 24 would be
get_range_slice(row= "bucket_20_to_25", from = "24-", to = "24=")
as a note "minus" == "under_score" - 1 and "equals" == "under_score" + 1, giving you effectively all the columns that start with "24_"
This also allow you to query for age between 21 and 24 for example.
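In the same pseudo-notation as the calls above (a sketch only, not literal client code), that between-21-and-24 query would look something like:
get_range_slice(row = "bucket_20_to_25", from = "21-", to = "24=")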
Hope it was useful.

Hbase quickly count number of rows

Right now I implement row count over ResultScanner like this
for (Result rs = scanner.next(); rs != null; rs = scanner.next()) {
number++;
}
If the data reaches millions of rows, the computation takes a long time, and I want the count in real time; I don't want to use MapReduce.
How can I quickly count the number of rows?
Use RowCounter in HBase
RowCounter is a mapreduce job to count all the rows of a table. This is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency. It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to exploit.
$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>
Usage: RowCounter [options]
<tablename> [
--starttime=[start]
--endtime=[end]
[--range=[startKey],[endKey]]
[<column1> <column2>...]
]
You can use the count method in the HBase shell to count the number of rows. But yes, counting rows of a large table can be slow.
count 'tablename' [interval]
The return value is the number of rows.
This operation may take a LONG time (Run ‘$HADOOP_HOME/bin/hadoop jar
hbase.jar rowcount’ to run a counting mapreduce job). Current count is shown
every 1000 rows by default. Count interval may be optionally specified. Scan
caching is enabled on count scans by default. Default cache size is 10 rows.
If your rows are small in size, you may want to increase this
parameter.
Examples:
hbase> count 't1'
hbase> count 't1', INTERVAL => 100000
hbase> count 't1', CACHE => 1000
hbase> count 't1', INTERVAL => 10, CACHE => 1000
The same commands also can be run on a table reference. Suppose you had a reference to table 't1', the corresponding commands would be:
hbase> t.count
hbase> t.count INTERVAL => 100000
hbase> t.count CACHE => 1000
hbase> t.count INTERVAL => 10, CACHE => 1000
If you cannot use RowCounter for whatever reason, then a combination of these two filters should be an optimal way to get a count:
FirstKeyOnlyFilter() AND KeyOnlyFilter()
The FirstKeyOnlyFilter will result in the scanner only returning the first column qualifier it finds, as opposed to the scanner returning all of the column qualifiers in the table, which will minimize the network bandwidth. What about simply picking one column qualifier to return? This would work if you could guarantee that the column qualifier exists for every row, but if that is not true then you would get an inaccurate count.
The KeyOnlyFilter will result in the scanner only returning the column family, and will not return any value for the column qualifier. This further reduces the network bandwidth, which in the general case wouldn't account for much of a reduction, but there can be an edge case where the first column picked by the previous filter just happens to be an extremely large value.
I tried playing around with scan.setCaching, but the results were all over the place. Perhaps it could help.
I had 16 million rows between a start and a stop key, on which I did the following pseudo-empirical testing:
With FirstKeyOnlyFilter and KeyOnlyFilter activated:
With caching not set (i.e., the default value), it took 188 seconds.
With caching set to 1, it took 188 seconds
With caching set to 10, it took 200 seconds
With caching set to 100, it took 187 seconds
With caching set to 1000, it took 183 seconds.
With caching set to 10000, it took 199 seconds.
With caching set to 100000, it took 199 seconds.
With FirstKeyOnlyFilter and KeyOnlyFilter disabled:
With caching not set, (i.e., the default value), it took 309 seconds
I didn't bother to do proper testing on this, but it seems clear that the FirstKeyOnlyFilter and KeyOnlyFilter are good.
Moreover, the cells in this particular table are very small - so I think the filters would have been even better on a different table.
Here is a Java code sample:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.RegexStringComparator;

public class HBaseCount {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, "my_table");

        Scan scan = new Scan(
            Bytes.toBytes("foo"), Bytes.toBytes("foo~")
        );
        if (args.length == 1) {
            scan.setCaching(Integer.valueOf(args[0]));
        }
        System.out.println("scan's caching is " + scan.getCaching());

        FilterList allFilters = new FilterList();
        allFilters.addFilter(new FirstKeyOnlyFilter());
        allFilters.addFilter(new KeyOnlyFilter());
        scan.setFilter(allFilters);

        ResultScanner scanner = table.getScanner(scan);
        int count = 0;
        long start = System.currentTimeMillis();
        try {
            for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
                count += 1;
                if (count % 100000 == 0) System.out.println(count);
            }
        } finally {
            scanner.close();
        }
        long end = System.currentTimeMillis();
        long elapsedTime = end - start;
        System.out.println("Elapsed time was " + (elapsedTime/1000F));
    }
}
Here is a pychbase code sample:
from pychbase import Connection
c = Connection()
t = c.table('my_table')
# Under the hood this applies the FirstKeyOnlyFilter and KeyOnlyFilter
# similar to the happybase example below
print t.count(row_prefix="foo")
Here is a Happybase code sample:
from happybase import Connection
c = Connection(...)
t = c.table('my_table')
count = 0
for _ in t.scan(filter='FirstKeyOnlyFilter() AND KeyOnlyFilter()'):
count += 1
print count
Thanks to @Tuckr and @KennyCason for the tip.
Use the HBase rowcount map/reduce job that's included with HBase
A simple, effective and efficient way to count rows in HBase:
Whenever you insert a row, trigger this API, which will increment that particular cell:
Htable.incrementColumnValue(Bytes.toBytes("count"), Bytes.toBytes("details"), Bytes.toBytes("count"), 1);
To check the number of rows present in that table, just use the "Get" or "Scan" API for that particular row 'count' (a short read-back sketch follows below).
By using this method you can get the row count in less than a millisecond.
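For completeness, a minimal sketch of reading that counter back with a Get (this assumes the same Htable instance and the hypothetical row key "count", column family "details" and qualifier "count" used in the increment call above, with the older HTable-style client API used elsewhere in this thread):
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Read the running total back from the 'count' row
Get get = new Get(Bytes.toBytes("count"));
get.addColumn(Bytes.toBytes("details"), Bytes.toBytes("count"));
Result result = Htable.get(get);
// incrementColumnValue stores an 8-byte long, so decode it with Bytes.toLong
long rowCount = Bytes.toLong(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("count")));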
To count the HBase table record count on a proper YARN cluster, you have to set the MapReduce job queue name as well:
hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dmapreduce.job.queuename=<your queue name with SUBMIT access> <TABLE_NAME>
You can use the coprocessor mechanism, which has been available since HBase 0.92. See Coprocessor and AggregateProtocol and the example.
Two ways have worked for me to get the row count from an HBase table, with speed.
Scenario #1
If the HBase table size is small, then log in to the HBase shell with a valid user and execute:
>count '<tablename>'
Example
>count 'employee'
6 row(s) in 0.1110 seconds
Scenario #2
If the HBase table size is large, then execute the inbuilt RowCounter MapReduce job:
Log in to the Hadoop machine with a valid user and execute:
/$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter '<tablename>'
Example:
/$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'employee'
....
....
....
Virtual memory (bytes) snapshot=22594633728
Total committed heap usage (bytes)=5093457920
org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
ROWS=6
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
If you're using a scanner, try to have it return the smallest number of qualifiers possible. In fact, the qualifier(s) you do return should be the smallest (in byte size) that you have available. This will speed up your scan tremendously.
Unfortunately this will only scale so far (millions, billions?). To take it further, you can do this in real time, but you will first need to run a MapReduce job to count all the rows.
Store the Mapreduce output in a cell in HBase. Every time you add a row, increment the counter by 1. Every time you delete a row, decrement the counter.
When you need to access the number of rows in real time, you read that field in HBase.
There is no fast way to count the rows otherwise in a way that scales. You can only count so fast.
You can find a sample example here:
/**
 * Used to get the number of rows of the table
 * @param tableName
 * @param familyNames
 * @return the number of rows
 * @throws IOException
 */
public long countRows(String tableName, String... familyNames) throws IOException {
    long rowCount = 0;
    Configuration configuration = connection.getConfiguration();
    // Increase RPC timeout, in case of a slow computation
    configuration.setLong("hbase.rpc.timeout", 600000);
    // Default is 1, set to a higher value for faster scanner.next(..)
    configuration.setLong("hbase.client.scanner.caching", 1000);
    AggregationClient aggregationClient = new AggregationClient(configuration);
    try {
        Scan scan = new Scan();
        if (familyNames != null && familyNames.length > 0) {
            for (String familyName : familyNames) {
                scan.addFamily(Bytes.toBytes(familyName));
            }
        }
        rowCount = aggregationClient.rowCount(TableName.valueOf(tableName), new LongColumnInterpreter(), scan);
    } catch (Throwable e) {
        throw new IOException(e);
    }
    return rowCount;
}
Go to the HBase home directory and run this command:
./bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'namespace:tablename'
This will launch a mapreduce job and the output will show the number of records existing in the hbase table.
You could try hbase api methods!
org.apache.hadoop.hbase.client.coprocessor.AggregationClient

Picking query based on parameter in Oracle PL/SQL

Ok, say I have a query:
SELECT * FROM TABLE_AWESOME WHERE YEAR = :AMAZINGYEAR;
Which works very nicely. But say I want to be able to return either just those results or all results based on a drop down. (e.g., the drop down would have 2008, 2009, ALL YEARS)
I decided to tackle said problem with PL/SQL with the following format:
DECLARE
the_year VARCHAR(20) := &AMAZINGYEAR;
BEGIN
IF the_year = 'ALL' THEN
SELECT * FROM TABLE_AWESOME;
ELSE
SELECT * FROM TABLE_AWESOME WHERE YEAR = the_year;
END IF;
END;
Unfortunately, this fails. I get errors like "an INTO clause is expected in this SELECT statement".
I'm completely new to PL/SQL, so I think I'm just expecting too much of it. I have looked over the documentation but haven't found any reason why this wouldn't work the way I have it. The query I'm actually using is much, much more complicated than this, but I want to keep this simple so I'll get an answer quickly.
Thanks in advance :)
There is a real danger in the queries offered by Jim and Alex.
Assume you have 20 years of data in there, so a query on YEAR = <a single year> returns 5% of the blocks. I say blocks and not rows because I assume the data is being added on that date, so the clustering factor is high.
If you want 1 year, you want the optimizer to use an index on year to find those 5% of rows.
If you want all years, you want the optimizer to use a full table scan to get every row.
Are we good so far?
Once you put this into production, the first time Oracle loads the query it peeks at the bind variable and formulates a plan based on that.
So let's say the first load is 'ALL'.
Great, the plan is a full table scan (FTS), that plan is cached, and you get all the rows back in 5 minutes. No big deal.
The next run you say 1999. But the plan is cached, so it uses an FTS to get just 5% of the rows, and it takes 5 minutes. "Hmm," the user says, "that was many fewer rows and the same time." But that's fine... it's just a 5-minute report... life is a little slow when it doesn't have to be, but no one is yelling.
That night the batch jobs blow that query out of the cache, and in the morning the first user asks for 2001. Oracle checks the cache, it's not there, peeks at the variable, 2001. Ah, the best plan for that is an index scan, and THAT plan is cached. The results come back in 10 seconds and blow the user away. The next person, who is normally first, does the morning "ALL" report and the query never returns.
WHY?
Because it's getting every single row by looking through the index.... horrible nested loops. The 5 minute report is now at 30 and counting.
Your original post has the best answer: two queries. That way both will ALWAYS get the best plan, and bind variable peeking won't kill you.
The problem you're having is just a fundamental Oracle issue. You run a query from a tool and get the results back INTO the tool. If you put a select statement into a pl/sql block you have to do something with it. You have to load it into a cursor, or array, or variable. It's nothing to do with you being wrong and them being right... it's just a lack of pl/sql skills.
You could do it with one query, something like:
SELECT * FROM TABLE_AWESOME WHERE (? = 'ALL' OR YEAR = ?)
and pass it the argument twice.
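A rough C# sketch of the pass-it-twice idea (illustration only: the connString and theYear variables, the named bind placeholders, and the use of the Oracle ADO.NET provider classes mentioned elsewhere in this thread are all assumptions; the point is simply that the same year value is bound to both placeholders):
using (var connection = new OracleConnection(connString))
using (var command = connection.CreateCommand())
{
    connection.Open();
    // Same value bound twice: once for the 'ALL' check, once for the year filter
    command.CommandText =
        "SELECT * FROM TABLE_AWESOME WHERE :year_choice = 'ALL' OR YEAR = :year_value";
    var pChoice = command.CreateParameter();
    pChoice.ParameterName = "year_choice";
    pChoice.Value = theYear;   // e.g. "ALL" or "2009"
    command.Parameters.Add(pChoice);
    var pValue = command.CreateParameter();
    pValue.ParameterName = "year_value";
    pValue.Value = theYear;
    command.Parameters.Add(pValue);
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // read the columns you need here
        }
    }
}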
In PL/SQL you have to SELECT ... INTO something, which you need to be able to return to the client; that could be a ref cursor as tanging demonstrates. This can complicate the client.
You can do this in SQL instead with something like:
SELECT * FROM TABLE_AWESOME WHERE :AMAZING_YEAR = 'ALL' OR YEAR = :AMAZINGYEAR;
... although you may need to take care about indexes; I'd look at the execution plan with both argument types to check it isn't doing something unexpected.
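For instance, one quick way to check that from SQL*Plus or SQL Developer is EXPLAIN PLAN (a sketch; it just reuses the bind names from the query above):
EXPLAIN PLAN FOR
SELECT * FROM TABLE_AWESOME WHERE :AMAZING_YEAR = 'ALL' OR YEAR = :AMAZINGYEAR;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);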
Not sure about using a SqlDataSource, but you can definitely do this via System.Data.OracleClient or the Oracle clients.
You would do this via an anonymous block in ASP.NET:
VAR SYS1 REFCURSOR;
VAR SYS2 REFCURSOR;

DECLARE
    FUNCTION CURSORCHOICE(ITEM IN VARCHAR2) RETURN SYS_REFCURSOR IS
        L_REFCUR  SYS_REFCURSOR;
        returnNum VARCHAR2(50);
    BEGIN
        IF upper(item) = 'ALL' THEN
            OPEN L_REFCUR FOR
                SELECT level FROM DUAL
                CONNECT BY LEVEL < 15;
        ELSE
            OPEN L_REFCUR FOR
                SELECT 'NONE' FROM DUAL;
        END IF;
        RETURN L_REFCUR;
    END;
BEGIN
    :SYS1 := CURSORCHOICE('ALL');
    :SYS2 := CURSORCHOICE('NOT ALL');
END;
/
PRINT :SYS1;
PRINT :SYS2;
Whereas in your application you would simply create an output parameter (of type RefCursor) instead of the VAR SYS# ref cursors, and pretty much just amend the above code.
I answered a similar question about getting an anonymous block ref cursor here:
How to return a RefCursor from Oracle function?
This kind of parameter should be handled within your code, so that your OracleCommand object only ever executes one of the two queries.
using (var connection = new OracleConnection(connString)) {
    connection.Open();
    string sql = "select * from table_awesome";
    sql = string.Concat(sql, theYear.Equals("ALL") ? string.Empty : " where year = :pYear");
    using (var command = connection.CreateCommand()) {
        command.CommandText = sql;
        command.CommandType = CommandType.Text;
        if (!theYear.Equals("ALL")) {
            // Only bind the parameter when the WHERE clause is actually present
            var parameter = command.CreateParameter();
            parameter.ParameterName = "pYear";
            parameter.Direction = ParameterDirection.Input;
            parameter.Value = theYear;
            command.Parameters.Add(parameter);
        }
        using (var reader = command.ExecuteReader()) {
            if (!reader.HasRows) return;
            while (reader.Read()) {
                // Extract your data from the OracleDataReader instance here.
            }
        }
    }
}
