Would using partitions be a good idea in such a situation? - oracle

Context: Oracle 10 database.
In a rather large table (several million records) we recently started to see some performance troubles. The table has some special behaviours / conditions.
its mostly write once and then never gets changed again
during the first day or so the records are classified from 0..N (lets call that column class). records might get reclassified several times during that first day
new entries are added with class 0 meaning "not yet classified"
every hour or so a process classifies the new reocrds and gives them a new class from 1..N
all the readers are only interested in class 1
all records older than a day hardly change their class, > 1 is getting cleaned up a after a few day
Now, as most access is done to class 1, that column is often involved in queries (class = 1), together with other conditions. We have a index on the class column, and then again for certain other columns.
To my question: We are now thinking to partition that table by class. As far as I have understood this would make indexing/working with the data faster, as the class = 1 is already separated from the rest of the data and therefore access to it is implicitly more efficient. Is this correct?
If you agree that this is a good idea I will further read into the topic!
Thanks
Cheers
Update 2010.11.30
Thank you very much for the input. I wasn't aware that its a extra option :) thanks for pointing that out (before I invest too much time into it). But beside the license issue, it appears to me as partition aren't necessarily a good solution in this context.

What operations are experiencing slowness and have you been able to identify why those operations are slow?
If you partition by class, you will be slowing down the process of updating the class for a row. Since that would force a row to move from one partition to another, you'd be turning an update into a delete from the first partition and an insert into the second partition. If your hourly process is slow and it is slow because it takes time to find all the new records, the performance trade-off here may be quite reasonable. If your hourly process is slow because it takes time to compute what the new class should be and to update all the rows, on the other hand, that trade-off is probably a very poor idea.
Because partitioning is an extra cost option on top of the enterprise edition license, I would suggest making sure that you can't use some function-based indexes to get most of the performance improvements you're targeting at relatively little cost. If, for example, you had two function-based indexes
CREATE INDEX idx_new_entries
ON your_table( (CASE WHEN class = 0 THEN primary_key ELSE null END) );
CREATE INDEX idx_class1_entries
ON your_table( (CASE WHEN class = 1 THEN primary_key ELSE null END) );
along with a couple of views
CREATE VIEW vw_new_entries
AS
SELECT (CASE WHEN class = 0 THEN primary_key ELSE null END) primary_key,
<<list of columns>>
FROM your_table
WHERE class = 0
CREATE VIEW vw_class1_entries
AS
SELECT (CASE WHEN class = 1 THEN primary_key ELSE null END) primary_key,
<<list of columns>>
FROM your_table
WHERE class = 1
then any queries against the new views that filtered on the PRIMARY_KEY would use the function-based indexes which in turn would only index the appropriate rows in the underlying table. That may allow you to improve lookup performance without needing to resort to partitioning.

How big is the table in MB? What is tghe growth rate? Are you purging data or do you plan to purge data? What indexes are on the table now? Can you give us the sample table definition? Partitioning is an extra license option. Have you verified that someone is going to actually pay for it?
and most importantly, please provide sample queries
What you have provided is not enough information to base a decision on.

Yepp, sounds like a good idea.
There are better alternatives to this though, but an easy fix is a partition.

Related

Filter settings in Oracle Views

I have 2 views like this (simplified):
CREATE VIEW BASE_VIEW AS
(
-- Simplified version, view actually does a lot more.
SELECT * FROM MYTABLE;
);
CREATE VIEW OUTER_VIEW AS
(
-- The where clause here makes this view return half the rows as the above BASE_VIEW
SELECT * FROM BASE_VIEW where SomeField = 'something'
);
My question is, shouldn't OUTER_VIEW execute in roughly half the time as the BASE_VIEW? I don't see this behavior. It almost takes the full time as it takes to execute BASE_VIEW.
Since Oracle compiles the referenced views into your outer view, I thought it would be intelligent enough to optimize the query based on the outer view's where clase. Should it not?
EDIT: In fact the "base view" query with the where clause from the "outer view" is taking half the time.
Imagine you have an entire room full of different kinds of fruit scattered all over the place. The view BASE_VIEW is like saying, "Retrieve all fruit in this room."
The OUTER_VIEW is like saying, "Retrieve all oranges from this room." If the fruit is not ordered in some way, you will spend roughly the same amount of time as the BASE_VIEW to search for all the oranges.
Now imagine you have all the fruit separated in baskets. Grabbing all the oranges becomes easy because you know exactly where they are.
Adding an index on SomeField is what orders the data so that it doesn't have to search through the entire room of fruit. There are some restrictions on when an index could help or not. Based on the amount of rows in your table, the oracle optimizer might decide that it is faster to just grab all the rows anyway. I suggest you read through this guide to determine the proper way to index tables: Use The Index, Luke. It helped me a ton when starting out indexing.
If SomeField does not directly come from MYTABLE and is just a created field in BASE_VIEW, then it might help to index the field that creates SomeField. It is hard for us to know without seeing the data.

Oracle database help optimizing LIKE searches

I am on Oracle 11g and we have these 3 core tables:
Customer - CUSTOMERID|DOB
CustomerName - CUSTOMERNAMEID|CustomerID|FNAME|LNAME
Address - ADDRESSID|CUSTOMERID|STREET|CITY|STATE|POSTALCODE
I have about 60 million rows on each of the tables and the data is a mix of US and Canadian population.
I have a front-end application that calls a web service and they do a last name and partial zip search. So my query basically has
where CUSTOMERNAME.LNAME = ? and ADDRESS.POSTALCODE LIKE '?%'
They typically provide the first 3 digits of the zip.
The address table has an index on all street/city/state/zip and another one on state and zip.
I did try adding an index exclusively for the zip and forced oracle to use that index on my query but that didn't make any difference.
For returning about 100 rows (I have pagination to only return 100 at a time) it takes about 30 seconds which isn't ideal. What can I do to make this better?
The problem is that the filters you are applying are not very selective and they apply to different tables. This is bad for an old-fashioned btree index. If the content is very static you could try bitmap indexes. More precisely a function based bitmap join index on the first three letter of the last name and a bitmap join index on the postal code column. This assumes that very few people with the whose last name starts with certain letters live in an are with a certain postal code.
CREATE BITMAP INDEX ix_customer_custname ON customer(SUBSTR(cn.lname,1,3))
FROM customer c, customername cn
WHERE c.customerid = cn.customerid;
CREATE BITMAP INDEX ix_customer_postalcode ON customer(SUBSTR(a.postalcode,1,3))
FROM customer c, address a
WHERE c.customerid = a.customerid;
If you are successful you should see the two bitmap indexes becoming AND connected. The execution time should drop to a couple of seconds. It will not be as fast as a btree index.
Remarks:
You may have to play around a bit whether it is more efficient to make one or two indexes and whether the function are helpful useful.
If you decide to do it function based you should include the exact same function calls in the where clause of your query. Otherwise the index will not be used.
DML operations will be considerably slower. This is only useful for tables with static data. Note that DML operations will block whole row "ranges". Concurrent DML operations will run into problems.
Response time will probably still be seconds not instanteously like a BTREE index.
AFAIK this will work only on the enterprise edition. The syntax is untested because I do not have an enterprise db available at the moment.
If this is still not fast enough you can create a materialized view with customerid, last name and postal code and but a btree index on it. But that is kind of expensive, too.

Having more than 50 column in a SQL table

I have designed my database in such a way that One of my table contains 52 columns. All the attributes are tightly associated with the primary key attribute, So there is no scope of further Normalization.
Please let me know if same kind of situation arises and you don't want to keep so many columns in a single table, what is the other option to do that.
It is not odd in any way to have 50 columns. ERP systems often have 100+ columns in some tables.
One thing you could look into is to ensure most columns got valid default values (null, today etc). That will simplify inserts.
Also ensure your code always specifies the columns (i.e no "select *"). Any kind of future optimization will include indexes with a subset of the columns.
One approach we used once, is that you split your table into two tables. Both of these tables get the primary key of the original table. In the first table, you put your most frequently used columns and in the second table you put the lesser used columns. Generally the first one should be smaller. You now can speed up things in the first table with various indices. In our design, we even had the first table running on memory engine (RAM), since we only had reading queries. If you need to get the combination of columns from table1 and table2 you need to join both tables with the primary key.
A table with fifty-two columns is not necessarily wrong. As others have pointed out many databases have such beasts. However I would not consider ERP systems as exemplars of good data design: in my experience they tend to be rather the opposite.
Anyway, moving on!
You say this:
"All the attributes are tightly associated with the primary key
attribute"
Which means that your table is in third normal form (or perhaps BCNF). That being the case it's not true that no further normalisation is possible. Perhaps you can go to fifth normal form?
Fifth normal form is about removing join dependencies. All your columns are dependent on the primary key but there may also be dependencies between columns: e.g, there are multiple values of COL42 associated with each value of COL23. Join dependencies means that when we add a new value of COL23 we end up inserting several records, one for each value of COL42. The Wikipedia article on 5NF has a good worked example.
I admit not many people go as far as 5NF. And it might well be that even with fifty-two columns you table is already in 5NF. But it's worth checking. Because if you can break out one or two subsidiary tables you'll have improved your data model and made your main table easier to work with.
Another option is the "item-result pair" (IRP) design over the "multi-column table" MCT design, especially if you'll be adding more columns from time to time.
MCT_TABLE
---------
KEY_col(s)
Col1
Col2
Col3
...
IRP_TABLE
---------
KEY_col(s)
ITEM
VALUE
select * from IRP_TABLE;
KEY_COL ITEM VALUE
------- ---- -----
1 NAME Joe
1 AGE 44
1 WGT 202
...
IRP is a bit harder to use, but much more flexible.
I've built very large systems using the IRP design and it can perform well even for massive data. In fact it kind of behaves like a column organized DB as you only pull in the rows you need (i.e. less I/O) rather that an entire wide row when you only need a few columns (i.e. more I/O).

Improve SQL Server 2005 Query Performance

I have a course search engine and when I try to do a search, it takes too long to show search results. You can try to do a search here
http://76.12.87.164/cpd/testperformance.cfm
At that page you can also see the database tables and indexes, if any.
I'm not using Stored Procedures - the queries are inline using Coldfusion.
I think I need to create some indexes but I'm not sure what kind (clustered, non-clustered) and on what columns.
Thanks
You need to create indexes on columns that appear in your WHERE clauses. There are a few exceptions to that rule:
If the column only has one or two unique values (the canonical example of this is "gender" - with only "Male" and "Female" the possible values, there is no point to an index here). Generally, you want an index that will be able to restrict the rows that need to be processed by a significant number (for example, an index that only reduces the search space by 50% is not worth it, but one that reduces it by 99% is).
If you are search for x LIKE '%something' then there is no point for an index. If you think of an index as specifying a particular order for rows, then sorting by x if you're searching for "%something" is useless: you're going to have to scan all rows anyway.
So let's take a look at the case where you're searching for "keyword 'accounting'". According to your result page, the SQL that this generates is:
SELECT
*
FROM (
SELECT TOP 10
ROW_NUMBER() OVER (ORDER BY sq.name) AS Row,
sq.*
FROM (
SELECT
c.*,
p.providername,
p.school,
p.website,
p.type
FROM
cpd_COURSES c, cpd_PROVIDERS p
WHERE
c.providerid = p.providerid AND
c.activatedYN = 'Y' AND
(
c.name like '%accounting%' OR
c.title like '%accounting%' OR
c.keywords like '%accounting%'
)
) sq
) AS temp
WHERE
Row >= 1 AND Row <= 10
In this case, I will assume that cpd_COURSES.providerid is a foreign key to cpd_PROVIDERS.providerid in which case you don't need an index, because it'll already have one.
Additionally, the activatedYN column is a T/F column and (according to my rule above about restricting the possible values by only 50%) a T/F column should not be indexed, either.
Finally, because searching with a x LIKE '%accounting%' query, you don't need an index on name, title or keywords either - because it would never be used.
So the main thing you need to do in this case is make sure that cpd_COURSES.providerid actually is a foreign key for cpd_PROVIDERS.providerid.
SQL Server Specific
Because you're using SQL Server, the Management Studio has a number of tools to help you decide where you need to put indexes. If you use the "Index Tuning Wizard" it is actually usually pretty good at tell you what will give you the good performance improvements. You just cut'n'paste your query into it, and it'll come back with recommendations for indexes to add.
You still need to be a little bit careful with the indexes that you add, because the more indexes you have, the slower INSERTs and UPDATEs will be. So sometimes you'll need to consolidate indexes, or just ignore them altogether if they don't give enough of a performance benefit. Some judgement is required.
Is this the real live database data? 52,000 records is a very small table, relatively speaking, for what SQL 2005 can deal with.
I wonder how much RAM is allocated to the SQL server, or what sort of disk the database is on. An IDE or even SATA hard disk can't give the same performance as a 15K RPM SAS disk, and it would be nice if there was sufficient RAM to cache the bulk of the frequently accessed data.
Having said all that, I feel the " (c.name like '%accounting%' OR c.title like '%accounting%' OR c.keywords like '%accounting%') " clause is problematic.
Could you create a separate Course_Keywords table, with two columns "courseid" and "keyword" (varchar(24) should be sufficient for the longest keyword?), with a composite clustered index on courseid+keyword
Then, to make the UI even more friendly, use AJAX to apply keyword validation & auto-completion when people type words into the keywords input field. This gives you the behind-the-scenes benefit of having an exact keyword to search for, removing the need for pattern-matching with the LIKE operator...
Using CF9? Try using Solr full text search instead of %xxx%?
You'll want to create indexes on the fields you search by. An index is a secondary list of your records presorted by the indexed fields.
Think of an old-fashioned printed yellow pages - if you want to look up a person by their last name, the phonebook is already sorted in that way - Last Name is the clustered index field. If you wanted to find phone numbers for people named Jennifer or the person with the phone number 867-5309, you'd have to search through every entry and it would take a long time. If there were an index in the back with all the phone numbers or first names listed in order along with the page in the phonebook that the person is listed, it would be a lot faster. These would be the unclustered indexes.
I would try changing your IN statements to an EXISTS query to see if you get better performance on the Zip code lookup. My experience is that IN statements work great for small lists but the larger they get, you get better performance out of EXISTS as the query engine will stop searching for a specific value the first instance it runs into.
<CFIF zipcodes is not "">
EXISTS (
SELECT zipcode
FROM cpd_CODES_ZIPCODES
WHERE zipcode = p.zipcode
AND 3963 * (ACOS((SIN(#getzipcodeinfo.latitude#/57.2958) * SIN(latitude/57.2958)) +
(COS(#getzipcodeinfo.latitude#/57.2958) * COS(latitude/57.2958) *
COS(longitude/57.2958 - #getzipcodeinfo.longitude#/57.2958)))) <= #radius#
)
</CFIF>

Oracle (PL/SQL): Is UPDATE RETURNING concurrent?

I'm using table with a counter to ensure unique id's on a child element.
I know it is usually better to use a sequence, but I can't use it because I have a lot of counters (a customer can create a couple of buckets and each of them needs to have their own counter, they have to start with 1 (it's a requirement, my customer needs "human readable" keys).
I'm creating records (let's call them items) that have a prikey (bucket_id, num = counter).
I need to guarantee that the bucket_id / num combination is unique (so using a sequence as prikey won't fix my problem).
The creation of rows doesn't happen in pl/sql, so I need to claim the number (btw: it's not against the requirements to have gaps).
My solution was:
UPDATE bucket
SET counter = counter + 1
WHERE id = param_id
RETURNING counter INTO num_forprikey;
PL/SQL returns var_num_forprikey so the item record can be created.
Question:
Will I always get unique num_forprikey even if the user concurrently asks for new items in a bucket?
Will I always get unique num_forprikey
even if the user concurrently asks for
new items in a bucket?
Yes, at least up to a point. The first user to issue that update gets a lock on the row. So no other user can successfully issue that same statement until user numero uno commits (or rolls back). So uniqueness is guaranteed.
Obviously, the cavil is regarding concurrency. Your access to the row is serialized, so there is no way for two users to get a new PRIKEY simultaneously. This is not necessarily a problem. It depends on how many users you have creating new Items, and how often they do it. One user peeling off numbers in the same session won't notice a thing.
I seem to recall this problem from many years back working on of all things an INGRES database. There were no sequences in those days so a lot of effort was put into finding the best scaling solution for this problem by the top INGRES minds of the day. I was fortunate enough to be working along side them so that even though my mind is pitifully smaller than any of theirs, proxmity = residual affect and I retained something. This was one of the things. Let me see if I can remember.
1) for each counter you need row in a work table.
2) each time you need a number
a) lock the row
b) update it
c) get its new value (you use returning for this which I avoid like the plague)
d) commit the update to release your lock on the row
The reason for the commit is for trying to get some kind of scalability. There will always be a limit but you do not serialize on getting a number for any period of time.
In the oracle world we would improve the situation by using a function defined as an AUTONOMOUS_TRANSACTION in order to acquire the next number. IF you think about it, this solution requires that gaps be allowed which you said is OK. By commiting the number update independently of the main transaction, you gain scalability but you introduce gapping.
You will have to accept the fact that your scalability will drop dramatically in this scenario. This is due to at least two reasons:
1) the update/select/commit sequence does its best to reduce the time during which the KEY row is locked, but it is still not zero. Under heavy load, you will serialize and eventually be limited.
2) you are commiting on every key get. A commit is an expensive operation requiring many memory and file management actions on the part of the database. This will limit you also.
In the end you are likely looking at three or more orders of magnitude drop in concurrent transaction load because you are not using sequences. I base this on my experience of the past.
But if you customer requires it, what can you do right?
Good luck. I have not tested the code for syntax errors, I leave that to you.
create or replace function get_next_key (key_name_p in varchar2) return number is
pragma autonomous_transaction;
kev_v number;
begin
update key_table set key = key + 1 where key_name = key_name_p;
select key_name into key_name_v from key_name where key_name = key_name_p;
commit;
return (key_v);
end;
/
show errors
You can still use sequences, just use the row_number() analytic function to please your users. I described it here in more detail: http://rwijk.blogspot.com/2008/01/sequence-within-parent.html
Regards,
Rob.
I'd figure out how to make sequences work. It's the only guarantee, though an exception clause could be coded
http://www.orafaq.com/forum/t/83382/0/ The benefit to sequences (and they could be dynamically created, is you can specify nocache and guarantee order)

Resources