Include variable as field name in linq statement - linq

Hi, I have a LINQ statement against a database table I did not create... The data structure is:
Tbl_BankHols
BHDate .... Datetime
E ......... Bit
S ......... Bit
W ......... Bit
I ......... Bit
Basically it has a list of dates and then a value of 0 or 1 in E, S, W, I which indicate if that date is a bank holiday in England, Scotland, Wales and/or Ireland.
If I want to find out if a date is a bank holiday in any of the countries, my LINQ statement is:
Dim BHQ = From d in db.Tbl_BankHols _
Where d.BHDate = chkDate _
Select d.BHDate
Where chkDate is the date I am checking. If a result is returned then the date is a bank holiday in one of the countries.
I now need to find out if chkDate is a bank holiday in a particular country; how do I introduce that into the Where clause?
I'm asking if this is possible before I think about changing the structure of the database. I was thinking of just having a single country field as a string which will contain values like E, EW, EWS, EWSI, I and other similar combinations, and then I just use WHERE BCountry LIKE '%X%' (where X is the country I'm interested in). Or is there a better way?

Erm stop,
Your suggestion for extra denormalisation and using LIKE is a really "wrong" idea.
You need one table for countries, let's call it Country, and another table for holidays, let's call it Holiday. The Country table should contain a row for each country in your system/model. The Holiday table should have two columns: one for the Date and a foreign key to Country, let's call it CountryId.
Then your linq could look something like,
db.Holiday.Any(Function(h) h.Country.Name = "SomeCountry" AndAlso h.Date = someDate)
The reasons why you shouldn't use LIKE for this are manifold, but two major objections are:
LIKE doesn't perform well; it's hard for an index to support it. And,
let's imagine a situation where you need to store holidays for these countries:
Ecuador
El Salvador
Estonia
Ethiopia
England
Now, you have already assigned the code "E" to England; what code will you give to the others? No problem you say, "EL", "ET"... but already your LIKE '%E%' condition is broken.
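To make that concrete, here is a minimal sketch (using the proposed BCountry column and a hypothetical 'EC' code for Ecuador) of how the substring match silently matches the wrong rows:
-- Hypothetical single-string encoding: 'E' = England, 'EC' = Ecuador
-- Intended: find England's holidays
SELECT BHDate
FROM Tbl_BankHols
WHERE BCountry LIKE '%E%';
-- This also matches 'EC', 'EW', 'EWSI', etc., so Ecuador's holidays
-- come back as England's; and a leading '%' rules out an index seek.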
Here are the scripts for the schema I would go with.
CREATE TABLE [Country](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_Country] PRIMARY KEY CLUSTERED
(
[Id] ASC
));
CREATE TABLE [Holiday](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Date] [date] NOT NULL,
[CountryId] [int] NOT NULL REFERENCES [Country]([Id]),
CONSTRAINT [PK_Holiday] PRIMARY KEY CLUSTERED
(
[Id] ASC
));
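For reference, the LINQ above corresponds roughly to this query against the new schema (a sketch; 'SomeCountry' and @someDate are placeholders):
-- Is the given date a holiday in the given country?
SELECT CASE WHEN EXISTS (
    SELECT 1
    FROM [Holiday] h
    INNER JOIN [Country] c ON c.[Id] = h.[CountryId]
    WHERE c.[Name] = 'SomeCountry'
      AND h.[Date] = @someDate
) THEN 1 ELSE 0 END AS IsHoliday;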

Instead of changing the structure of your table the way you proposed, you could introduce a new Region (Code, Description) table and add a foreign key to your table pointing to the Region table. Your bank holidays table will then contain one record per (date, region) combination.
And your linq statement:
Dim BHQ = From d in db.Tbl_BankHols _
Where d.BHDate = chkDate And d.Region = "England" _
Select d.BHDate
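A hedged sketch of what that restructuring could look like in T-SQL (names are illustrative, and existing rows would need migrating):
CREATE TABLE [Region](
    [Code] [nvarchar](4) NOT NULL PRIMARY KEY,
    [Description] [nvarchar](100) NOT NULL
);
-- One row per (date, region) combination in the holidays table
ALTER TABLE [Tbl_BankHols] ADD [Region] [nvarchar](4)
    REFERENCES [Region]([Code]);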

"before I think about changing the structure of the database"
You can do it by composing a query with OR predicates by using LINQKit.
Dim BHQ As IQueryable(Of BankHoliday) = ... ' your query
Dim pred = PredicateBuilder.False(Of BankHoliday)()
If countryString.Contains("E") Then
    pred = pred.Or(Function(h) h.E)
End If
If countryString.Contains("S") Then
    pred = pred.Or(Function(h) h.S)
End If
...
Return BHQ.Where(pred.Expand())
(I'm not fluent in VB so there may be some error in there)
See this answer for a similar example.

While the DB structure is not optimal, if you are working with legacy code and there's no effort given for a full refactor, you're gonna have to make do (I feel for you).
The simplest option I think you have is to select not just the date, but also the attributes.
Dim BHQ = From d in db.Tbl_BankHols _
Where d.BHDate = chkDate _
Select d
This code will give you d.S, d.E, d.W, and d.I, which you can use programmatically to determine if the holiday applies to whatever country you are currently working on. This means outside the query, you will have a separate if statement which would qualify if the holiday applies to the country you are processing.
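In SQL terms, the query above amounts to something like the following, with the per-country check done on the returned bits (a sketch):
SELECT BHDate, E, S, W, I
FROM Tbl_BankHols
WHERE BHDate = @chkDate;
-- then in code: if the E bit is set, the date is a bank holiday in England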

Related

can't display table in multiple textbox

I must go through the records of a table and display them in multiple textboxes
I am using the table with four different aliases to have four work areas on the same table and four record pointers.
USE Customers ALIAS customers1
USE customers AGAIN ALIAS customers2
USE customers AGAIN ALIAS customers3
USE customers AGAIN ALIAS customers4
Thisform.TxtNomCli.ControlSource = "customers.name"
Thisform.TxtIdent.ControlSource = "customers.identify"
Thisform.TxtAddress.ControlSource = "customers.address"
Thisform.TxtTele.ControlSource = "customers.phone"
Thisform.TxtNomCli2.ControlSource = "customers2.name"
Thisform.TxtIdent2.ControlSource = "customers2.identify"
Thisform.TxtDirec2.ControlSource = "customers2.address"
Thisform.TxtTele2.ControlSource = "customers2.phone"
Thisform.TxtNomCli3.ControlSource = "customers3.name"
Thisform.TxtIdent3.ControlSource = "customers3.identify"
Thisform.TxtDirec3.ControlSource = "customers3.address"
Thisform.TxtTele3.ControlSource = "customers3.phone"
Thisform.TxtNomCli4.ControlSource = "customers4.name"
Thisform.TxtIdent4.ControlSource = "customers4.identify"
Thisform.TxtDirec4.ControlSource = "customers4.address"
Thisform.TxtTele4.ControlSource = "customers4.phone"
How do I go through the records of the table so that customers is on the first record, customers2 on the second, customers3 on the third and customers4 on the fourth record of the table?
How do I make each row of textboxes show the corresponding row of the table?
I would SQL Select id + whatever other fields you need into four cursors:
select id, identifica, nombre, direccion, telefono from customers ;
into cursor customers1 nofilter readwrite
select id, identifica, nombre, direccion, telefono from customers;
into cursor customers2 nofilter readwrite
* repeat for 3 and 4
Then set your ControlSources() to the cursors, not the base table. If you need to update records you can use the id of the modified record in the cursor to update the correct record in the base table.
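As a hedged sketch of that write-back (assuming an integer id column and name/address fields matching the ControlSource examples, with the user having just edited the current row of the customers1 cursor):
* Copy the edited row's fields into m. variables, then update by key
SELECT customers1
SCATTER MEMVAR
UPDATE customers ;
   SET name = m.name, address = m.address ;
   WHERE id = m.id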
You could simply use SET RELATION to achieve what you want. However, in your current code you are not really using 4 aliases. You are reopening the same table with a different alias in the same work area, and you would end up with a single table open with the alias Customers4. To do it correctly, you need to add an "IN 0" clause to your USE commands, i.e.:
USE customers ALIAS customers1
USE customers IN 0 AGAIN ALIAS customers2
USE customers IN 0 AGAIN ALIAS customers3
USE customers IN 0 AGAIN ALIAS customers4
SELECT customers1
SET RELATION TO ;
RECNO()+1 INTO Customers2, ;
RECNO()+2 INTO Customers3, ;
RECNO()+3 INTO Customers4 IN Customers1
With this setup, as you move the pointer in Customers1 it would move in all other 3 aliases accordingly (note that there is no order set).
Having said that, you should now think about why you need to do this. Maybe another control like a grid is the way to go? Or an array might be a better way to handle this? I.e., with an array:
USE (_samples+'data\customer') ALIAS customers
LOCAL ARRAY laCustomers[4]
LOCAL ix
FOR ix=1 TO 4
GO m.ix
SCATTER NAME laCustomers[m.ix]
ENDFOR
? laCustomers[1].Cust_id, laCustomers[2].Cust_id, laCustomers[3].Cust_id, laCustomers[4].Cust_id
With this approach, you could set your ControlSources to laCustomers[1].Identify, laCustomers[1].name and so on. When saving back to the table, you would go to the related record and do a GATHER. That would be all.
First you need to think about what you really want to do.

Sum of only Distinct values in a Column in DAX

I have a table [Table1] with three columns (OrganizationName, FieldName, Acres) and data as follows:
OrganizationName | FieldName | Acres
ABC              | F1        | 0.96
ABC              | F1        | 0.96
ABC              | F1        | 0.64
I want to calculate the sum of the distinct values of Acres (e.g. 0.96 + 0.64) in DAX.
One of the problems with doing what you want is that many measures rely on filters and not actual table expressions. So, getting a distinct list of values and then filtering the table by those values just gives you the whole table back.
The iterator functions are handy and operate on table expressions, so try SUMX
TotalDistinctAcreage = SUMX(DISTINCT(Table1[Acres]),[Acres])
This will generate a table that is one column containing only the distinct values for Acres, and then add them up. Note that this is only looking at the Acres column, so if different fields and organizations had the same acreage -- then that acreage would still only be counted once in this sum.
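(If it helps to anchor the semantics: over the whole table this is the same aggregation as SQL's SUM(DISTINCT ...), e.g.
SELECT SUM(DISTINCT Acres) AS TotalDistinctAcreage
FROM Table1;  -- 0.96 + 0.64 = 1.60 for the sample data above
with the same caveat that equal Acres values from different rows collapse to one.)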
If instead you want to add up the acreage simply on distinct rows, then just make a small change:
TotalAcreageOnDistinctRows = SUMX(DISTINCT(Table1),[Acres])
Hope it helps.
Ok, you added these requirements:
Thank You. :) However, I want to add Distinct values of Acres for a
Particular Fieldname. Is this possible? – Pooja 3 hours ago
The easiest way really is just to go ahead and slice or filter the original measure that I gave you. But if you have to apply the filter context in DAX, you can do it like this:
Measure =
SUMX(
    FILTER(
        SUMMARIZE( Table1, Table1[FieldName], Table1[Acres] ),
        Table1[FieldName] = "<put the name of your specific field here>"
    ),
    Table1[Acres]
)

Need a query that will satisfy two conditions from two tables

Table a and table b: table a has two fields (field1 and field2), and table b has two fields (field3 and field4).
where
tablea.field1 >= 4 and tableb.field3 = 'male'
Is something like the above query possible? I've tried something like this in my database; although there are no errors and I get results, it checks whether each condition is true separately.
I'm going to try to be a bit clearer, though I can't give out the query outright as much as I would like to (university reasons). So I'll explain: table 1 has several columns of information, one of which is the number of kids, and table two has more information on said kids, like gender.
So I'm having trouble creating a query that first checks that a parent has 2 kids, but specifically two male kids, thus creating a relationship between the parent table and the kids table.
CREATE TABLE parent
(pID NUMBER,
numberkids INTEGER)
CREATE TABLE kids
(kID NUMBER,
father NUMBER,
mother NUMBER,
gender VARCHAR(7))
select
p.pid
from
kids k
inner join parent pm on pm.pid = k.mother
inner join parent pf on pf.pid = k.father,
parent p
where
p.numberkids >= 2 and k.gender = 'male'
/
This query checks that the parent has 2 kids or more and that a kid's gender is male, but I need it to check whether the parent has 2 kids and, of those kids, whether 2 or more are male (in short, whether the parent has 2 or more male kids).
Sorry for the long-winded explanation; I modified the tables and the query from the ones I'm actually going to use (so some mistakes might be there, but the original query works, just not how I want, as explained above). Any help would be greatly appreciated.
The best thing to do would be to take the numberKids column out of the parent table ... you'll find it very difficult to maintain.
Anyway, something like this might do the trick:
SELECT p.pID
FROM parent p INNER JOIN kids k
ON p.pID IN (k.father, k.mother)
WHERE k.gender = 'male'
GROUP BY p.pID
HAVING COUNT(*) >= 2;
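On the first point, the numberkids value can always be derived instead of stored, which is one reason the column is hard to maintain; a sketch:
-- Count each parent's kids on the fly rather than storing numberkids
SELECT p.pID, COUNT(k.kID) AS numberkids
FROM parent p
LEFT JOIN kids k ON p.pID IN (k.father, k.mother)
GROUP BY p.pID;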

Performance tuning on reading huge table

I have a huge table with more than one hundred million rows, and I have to query this table to return a set of data in as little time as possible.
So I have created a test environment with this table definition:
CREATE TABLE [dbo].[Test](
[Dim1ID] [nvarchar](20) NOT NULL,
[Dim2ID] [nvarchar](20) NOT NULL,
[Dim3ID] [nvarchar](4) NOT NULL,
[Dim4ID] [smalldatetime] NOT NULL,
[Dim5ID] [nvarchar](20) NOT NULL,
[Dim6ID] [nvarchar](4) NOT NULL,
[Dim7ID] [nvarchar](4) NOT NULL,
[Dim8ID] [nvarchar](4) NOT NULL,
[Dim9ID] [nvarchar](4) NOT NULL,
[Dim10ID] [nvarchar](4) NOT NULL,
[Dim11ID] [nvarchar](20) NOT NULL,
[Value] [decimal](21, 6) NOT NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[Dim1ID] ASC,
[Dim2ID] ASC,
[Dim3ID] ASC,
[Dim4ID] ASC,
[Dim5ID] ASC,
[Dim6ID] ASC,
[Dim7ID] ASC,
[Dim8ID] ASC,
[Dim9ID] ASC,
[Dim10ID] ASC,
[Dim11ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
This table is the fact table of Star schema architecture (fact/dimensions). As you can see I have a clustered index on all the columns except for the “Value” column.
I have filled this table with approx. 10,000,000 rows for testing purposes. The fragmentation is currently at 0.01%.
I would like to improve the performance when reading a set of rows from this table using this query:
DECLARE @Dim1ID nvarchar(20) = 'C1'
DECLARE @Dim9ID nvarchar(4) = 'VRT1'
DECLARE @Dim10ID nvarchar(4) = 'S1'
DECLARE @Dim6ID nvarchar(4) = 'FRA'
DECLARE @Dim7ID nvarchar(4) = '' -- empty = all
DECLARE @Dim8ID nvarchar(4) = '' -- empty = all
DECLARE @Dim2 TABLE ( Dim2ID nvarchar(20) NOT NULL )
INSERT INTO @Dim2 VALUES ('A1'), ('A2'), ('A3'), ('A4');
DECLARE @Dim3 TABLE ( Dim3ID nvarchar(4) NOT NULL )
INSERT INTO @Dim3 VALUES ('P1');
DECLARE @Dim4ID TABLE ( Dim4ID smalldatetime NOT NULL )
INSERT INTO @Dim4ID VALUES ('2009-01-01'), ('2009-01-02'), ('2009-01-03');
DECLARE @Dim11 TABLE ( Dim11ID nvarchar(20) NOT NULL )
INSERT INTO @Dim11 VALUES ('Var0001'), ('Var0040'), ('Var0060'), ('Var0099')
SELECT RD.Dim2ID,
RD.Dim3ID,
RD.Dim4ID,
RD.Dim5ID,
RD.Dim6ID,
RD.Dim7ID,
RD.Dim8ID,
RD.Dim9ID,
RD.Dim10ID,
RD.Dim11ID,
RD.Value
FROM dbo.Test RD
INNER JOIN @Dim2 R
ON RD.Dim2ID = R.Dim2ID
INNER JOIN @Dim3 C
ON RD.Dim3ID = C.Dim3ID
INNER JOIN @Dim4ID P
ON RD.Dim4ID = P.Dim4ID
INNER JOIN @Dim11 V
ON RD.Dim11ID = V.Dim11ID
WHERE RD.Dim1ID = @Dim1ID
AND RD.Dim9ID = @Dim9ID
AND ((@Dim6ID <> '' AND RD.Dim6ID = @Dim6ID) OR @Dim6ID = '')
AND ((@Dim7ID <> '' AND RD.Dim7ID = @Dim7ID) OR @Dim7ID = '')
AND ((@Dim8ID <> '' AND RD.Dim8ID = @Dim8ID) OR @Dim8ID = '')
I have tested this query and it returned 180 rows with these times:
1st execution: 1 min 32 sec; 2nd execution: 1 min.
I would like to return the data in a few seconds if possible.
I think I can add non-clustered indexes, but I am not sure of the best way to set them up!
Would keeping the data in this table in sorted order improve performance?
Or are there other solutions besides indexes?
Thanks.
Consider your datatypes as one problem. Do you need nvarchar? It's measurably slower.
Second problem: the PK is wrong for your query. It should be Dim1ID, Dim9ID first (or vice versa, based on selectivity), or some flavour with the JOIN columns in.
Third problem: use of OR. This construct usually works, despite what nay-sayers who don't try it will post:
RD.Dim7ID = ISNULL(@Dim7ID, RD.Dim7ID)
This assumes @Dim7ID is NULL (rather than an empty string) though. The optimiser will short-circuit it in most cases.
I'm with gbn on this. Typically in star schema data warehouses, the dimension IDs are int, which is 4 bytes. Not only are all your dimensions larger than that, the nvarchar are both varying and using wide characters.
As far as indexing goes, just one clustered index may be fine, since in the case of your fact table you really don't have many facts. As gbn says, with your particular example, your index needs to lead with the columns you are going to be providing, so that the index can actually be used.
In a real-world case of a fact table with a number of facts, your clustered index is simply for data organization - you'll probably be expecting some non-clustered indexes for specific usages.
But I'm worried that your query specifies an ID parameter. Typically in a DW environment you don't know the IDs; for selective queries, you select based on the dimensions, and the IDs are meaningless surrogates:
SELECT *
FROM fact
INNER JOIN dim1
ON fact.dim1id = dim1.id
WHERE dim1.attribute = ''
Have you looked at Kimball's books on dimensional modeling? I think if you are going to a star schema, you should probably be familiar with his design techniques, as well as the various pitfalls he discusses with the too many and too few dimensions.
See this: Dynamic Search Conditions in T-SQL, Version for SQL 2008 (SP1 CU5 and later).
The quick answer, if you are on the right service pack of SQL Server 2008, is to try adding this to the end of the query:
OPTION(RECOMPILE)
On the proper service pack of SQL Server 2008, OPTION(RECOMPILE) will build the execution plan based on the runtime values of the local variables.
For people still using SQL Server 2008 without the proper service packs, or still on 2005, see: Dynamic Search Conditions in T-SQL, Version for SQL 2005 and Earlier.
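Applied to the query in the question (shortened here to a sketch), the hint simply goes at the end:
SELECT RD.Value
FROM dbo.Test RD
WHERE RD.Dim1ID = @Dim1ID
  AND RD.Dim9ID = @Dim9ID
  AND ((@Dim6ID <> '' AND RD.Dim6ID = @Dim6ID) OR @Dim6ID = '')
OPTION (RECOMPILE);  -- plan is compiled with the runtime variable values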
I'd be a little concerned about having all the non-value columns in your clustered index. That will make for a large index in the non-leaf levels. And, that key will be used in the nonclustered indexes. And, it will only provide any benefit when [Dim1ID] is included in the query. So, even if you're only optimizing this query, you're probably getting a full scan.
I would consider a clustered index on the most-commonly used key, and if you have a lot of date-related queries (e.g., date between a and b), go with the date key. Then, create non clustered indexes on the other key values.
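As a hedged sketch of that shape (hypothetical index name; assumes the query in the question is the one being optimized):
-- Cluster on the date key for range queries, then cover the equality
-- lookups with a narrower nonclustered index
CREATE NONCLUSTERED INDEX IX_Test_Dim1_Dim9
    ON dbo.Test (Dim1ID, Dim9ID, Dim4ID)
    INCLUDE (Dim2ID, Dim3ID, Dim5ID, Dim6ID, Dim7ID, Dim8ID, Dim10ID, Dim11ID, Value);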

How to optimize select from several tables with millions of rows

I have the following tables (Oracle 10g):
catalog (
id NUMBER PRIMARY KEY,
name VARCHAR2(255),
owner NUMBER,
root NUMBER REFERENCES catalog(id)
...
)
university (
id NUMBER PRIMARY KEY,
...
)
securitygroup (
id NUMBER PRIMARY KEY
...
)
catalog_securitygroup (
catalog REFERENCES catalog(id),
securitygroup REFERENCES securitygroup(id)
)
catalog_university (
catalog REFERENCES catalog(id),
university REFERENCES university(id)
)
Catalog: 500,000 rows; catalog_university: 500,000; catalog_securitygroup: 1,500,000.
I need to select any 50 rows from catalog with a specified root, ordered by name, for the current university and current securitygroup. Here is the query:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Where 100 is some catalog, 200 some securitygroup, and 300 some university. This query returns 50 rows out of ~170,000 in 3 minutes.
But the next query returns those rows in 2 seconds:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
I built these indexes: (catalog.id, catalog.name, catalog.owner), (catalog_securitygroup.catalog, catalog_securitygroup.index), (catalog_university.catalog, catalog_university.university).
Plan for first query (using PLSQL Developer):
http://habreffect.ru/66c/f25faa5f8/plan2.jpg
Plan for second query:
http://habreffect.ru/f91/86e780cc7/plan1.jpg
What are the ways to optimize the query I have?
The indexes that can be useful and should be considered deal with
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
So the following fields can be interesting for indexes
c: id, root
cs: catalog, securitygroup
cu: catalog, university
So, try creating
(catalog_securitygroup.catalog, catalog_securitygroup.securitygroup)
and
(catalog_university.catalog, catalog_university.university)
EDIT:
I missed the ORDER BY - these fields should also be considered, so
(catalog.name, catalog.id)
might be beneficial (or some other composite index that could be used for sorting and the conditions - possibly (catalog.root, catalog.name, catalog.id))
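In DDL terms (hypothetical index names):
CREATE INDEX ix_cs_cat_sg ON catalog_securitygroup (catalog, securitygroup);
CREATE INDEX ix_cu_cat_uni ON catalog_university (catalog, university);
-- and, to help both the filter and the sort:
CREATE INDEX ix_cat_root_name ON catalog (root, name, id);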
EDIT2
Although another answer has been accepted, I'll provide some more food for thought.
I have created some test data and run some benchmarks.
The test cases are minimal in terms of record width (in catalog_securitygroup and catalog_university the primary keys are (catalog, securitygroup) and (catalog, university)). Here is the number of records per table:
test=# SELECT (SELECT COUNT(*) FROM catalog), (SELECT COUNT(*) FROM catalog_securitygroup), (SELECT COUNT(*) FROM catalog_university);
?column? | ?column? | ?column?
----------+----------+----------
500000 | 1497501 | 500000
(1 row)
Database is Postgres 8.4, default Ubuntu install; hardware: i5, 4 GB RAM.
First I rewrote the query to
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root < 50
AND cs.catalog = c.id
AND cu.catalog = c.id
AND cs.securitygroup < 200
AND cu.university < 200
ORDER BY c.name
LIMIT 50 OFFSET 100
Note: the conditions are turned into less-than comparisons to maintain a comparable number of intermediate rows (the above query would return 198,801 rows without the LIMIT clause).
If run as above, without any extra indexes (save for PKs and foreign keys), it runs in 556 ms on a cold database (this is actually an indication that I oversimplified the sample data somehow - I would be happier if I had 2-4 s here without resorting to less-than operators).
This brings me to my point - any straight query that only joins and filters (a certain number of tables) and returns only a certain number of records should run under 1 s on any decent database, without needing to use cursors or to denormalize data (one of these days I'll have to write a post on that).
Furthermore, if a query returns only 50 rows and does simple equality joins and restrictive equality conditions, it should run much faster still.
Now let's see what happens if I add some indexes. The biggest potential in queries like this is usually the sort order, so let me try that:
CREATE INDEX test1 ON catalog (name, id);
This brings execution time on the query down to 22 ms on a cold database.
And that's the point - if you are trying to get only a page of data, you should only get a page of data, and execution times of queries such as this on normalized data with proper indexes should be less than 100 ms on decent hardware.
I hope I didn't oversimplify the case to the point of no comparison (as I stated before, some simplification is present, since I don't know the cardinality of the relationships between catalog and the many-to-many tables).
So, the conclusion is
if I were you I would not stop tweaking indexes (and the SQL) until I got the performance of the query below 200 ms, as a rule of thumb.
Only if I found an objective explanation of why it can't go below such a value would I resort to denormalisation and/or cursors, etc...
First, I assume that your University and SecurityGroup tables are rather small. You posted the sizes of the large tables, but it's really the other sizes that are part of the problem.
Your problem comes from the fact that you can't join the smallest tables first. Your join order should be from small to large. But because your mapping tables don't include a securitygroup-to-university table, you can't join the smallest ones first. So you wind up starting with one or the other, joining to a big table, then to another big table, and then with that large intermediate result you have to go to a small table.
If you always have current_univ and current_secgrp and root as inputs, you want to use them to filter as soon as possible. The only way to do that is to change your schema some. In fact, you can leave the existing tables in place if you have to, but you'll be adding to the space with this suggestion.
You've normalized the data very well. That's great for speed of update... not so great for querying. We denormalize to speed up querying (that's the whole reason for data warehouses (ok, that and history)). Build a single mapping table with the following columns:
Univ_id, SecGrp_ID, Root, catalog_id. Make it an index-organized table with the first 3 columns as the PK.
Now when you query that index with all three leading PK values, you'll finish that index scan with a complete list of allowable catalog ids; then it's just a single join to the catalog table to get the item details, and you're off and running.
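A hedged sketch of that mapping table in Oracle DDL (names are illustrative; catalog_id is included in the PK so that rows stay unique):
-- Index-organized: the table is stored as its primary key index, so a
-- query supplying univ_id, secgrp_id and root scans straight to the
-- matching catalog_id values
CREATE TABLE catalog_lookup (
    univ_id    NUMBER NOT NULL,
    secgrp_id  NUMBER NOT NULL,
    root       NUMBER NOT NULL,
    catalog_id NUMBER NOT NULL,
    CONSTRAINT pk_catalog_lookup
        PRIMARY KEY (univ_id, secgrp_id, root, catalog_id)
) ORGANIZATION INDEX;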
The Oracle cost-based optimizer makes use of all the information that it has to decide what the best access paths are for the data and what the least costly methods are for getting that data. So below are some random points related to your question.
The first three tables that you've listed all have primary keys. Do the other tables (catalog_university and catalog_securitygroup) also have primary keys on them? A primary key defines a column or set of columns that are non-null and unique, and these are very important in a relational database.
Oracle generally enforces a primary key by generating a unique index on the given columns. The Oracle optimizer is more likely to make use of a unique index if one is available, as it is likely to be more selective.
If possible an index that contains unique values should be defined as unique (CREATE UNIQUE INDEX...) and this will provide the optimizer with more information.
The additional indexes that you have provided are no more selective than the existing indexes. For example, the index on (catalog.id, catalog.name, catalog.owner) is unique but is less useful than the existing primary key index on (catalog.id). If a query is written to select on the catalog.name column, it is possible to do an index skip scan, but this starts being costly (and may not even be possible in this case).
Since you are trying to select based on the catalog.root column, it might be worth adding an index on that column. This would mean that it could quickly find the relevant rows from the catalog table. The timing for the second query could be a bit misleading: it might be taking 2 seconds to find 50 matching rows from catalog, but these could easily be the first 50 rows from the catalog table... finding 50 that match all your conditions might take longer, and not just because you need to join to other tables to get them. I would always use create table as select, without restricting on rownum, when trying to performance tune. With a complex query I would generally care about how long it takes to get all the rows back... and a simple select with rownum can be misleading.
Everything about Oracle performance tuning is about providing the optimizer enough information and the right tools (indexes, constraints, etc) to do its job properly. For this reason it's important to get optimizer statistics using something like DBMS_STATS.GATHER_TABLE_STATS(). Indexes should have stats gathered automatically in Oracle 10g or later.
Somehow this grew into quite a long answer about the Oracle optimizer. Hopefully some of it answers your question. Here is a summary of what is said above:
Give the optimizer as much information as possible, e.g. if an index is unique then declare it as such.
Add indexes on your access paths
Find the correct times for queries without limiting by rownum. It will always be quicker to find the first 50 M&Ms in a jar than to find the first 50 red M&Ms.
Gather optimizer stats
Add unique/primary keys on all tables where they exist.
The use of rownum here is wrong and causes all the rows to be processed. It will process all the rows, assign them all a row number, and then find those between 0 and 50. What you want to look for in the explain plan is COUNT STOPKEY rather than just COUNT.
The query below should be an improvement as it will only get the first 50 rows... but there is still the issue of the joins to look at too:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
where rownum <= 50
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Also, assuming this is for a web page or something similar, maybe there is a better way to handle this than just running the query again to get the data for the next page.
Try declaring a cursor. I don't know Oracle, but in SQL Server it would look like this:
declare @result table (
    id numeric,
    name varchar(255)
);
declare __dyn_select_cursor cursor LOCAL SCROLL DYNAMIC for
--Select
select distinct
    c.id, c.name
from [catalog] c
inner join catalog_university u
    on u.catalog = c.id
    and u.university = 300
inner join catalog_securitygroup s
    on s.catalog = c.id
    and s.securitygroup = 200
where
    c.root = 100
order by c.name
--Cursor
declare @id numeric;
declare @name varchar(255);
open __dyn_select_cursor;
fetch relative 1 from __dyn_select_cursor into @id, @name;
declare @maxrowscount int
set @maxrowscount = 50
while (@@fetch_status = 0 and @maxrowscount <> 0)
begin
    insert into @result values (@id, @name);
    set @maxrowscount = @maxrowscount - 1;
    fetch next from __dyn_select_cursor into @id, @name;
end
close __dyn_select_cursor;
deallocate __dyn_select_cursor;
--Select temp, final result
select
    id,
    name
from @result;
