I have the following Oracle table:
GAMES
id_game int
id_user int
dt_game DATE
Before creating a new record I must check that the same user has not inserted a game more than N times a day.
I'm currently selecting the number of games played today this way:
select count(1) as num from GAMES
where id_user=ID_USER and
to_char(dt_game,'DD/MM/YYYY')=to_char(sysdate,'DD/MM/YYYY')
I don't like it very much. Is there a better way to do it?
Thanks in advance.
The date conversions are a bit pointless and prevent any index on that column being used; you could simplify that bit with dt_game >= trunc(sysdate) (note >=, so a game logged exactly at midnight is still counted).
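For example, a minimal sketch of the rewritten count query, using a bind variable for the user:
select count(*) as num
from games
where id_user = :id_user
and dt_game >= trunc(sysdate)      -- midnight today; a plain comparison, so an index on (id_user, dt_game) can be used
and dt_game < trunc(sysdate) + 1;  -- guards against any future-dated rows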
There is a problem (related to concurrency) with counting the rows before inserting a new one.
Let's say the user has currently played 9 games and you want to limit them to 10.
If two sessions run the check at the same time, both of them will see the committed count of 9 games, so both inserts are allowed. By the time both transactions commit, the user has played 11 games, which violates the rule you are trying to enforce.
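One way to close that gap (a sketch only; the USERS parent table and the GAMES_SEQ sequence are assumptions, not from the question) is to serialize the check per user with SELECT ... FOR UPDATE:
declare
  l_lock  number;
  l_count pls_integer;
begin
  -- lock this user's parent row; a second session blocks here until we commit
  select id_user into l_lock
  from users
  where id_user = :id_user
  for update;

  select count(*) into l_count
  from games
  where id_user = :id_user
  and dt_game >= trunc(sysdate);

  if l_count < :max_games_per_day then
    insert into games (id_game, id_user, dt_game)
    values (games_seq.nextval, :id_user, sysdate);
  end if;

  commit;  -- releases the lock
end;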
You can use a unique constraint to ensure that a fixed maximum number of records may be inserted for a particular day. For example, if N=5:
CREATE TABLE GAMES (
id_game NUMBER NOT NULL,
id_user NUMBER NOT NULL,
dt_game DATE NOT NULL,
user_game_count NUMBER NOT NULL,
CONSTRAINT max_games_per_day
CHECK (user_game_count BETWEEN 1 AND 5),
CONSTRAINT dt_game_notime
CHECK (dt_game = TRUNC(dt_game)),
CONSTRAINT games_unique
UNIQUE (id_user, dt_game, user_game_count)
);
The max_games_per_day constraint means that user_game_count can only take one of 5 different values. The dt_game_notime constraint means that we won't get varying time components. The unique constraint, finally, ensures that for any particular date and user, they can only insert up to 5 rows.
In Oracle you can create a function-based index on (id_user, to_char(dt_game,'DD/MM/YYYY')) to improve lookup performance.
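A sketch of that index (the index name is my own); note that a query has to use exactly the same expression for the index to be eligible:
create index games_user_day_idx
on games (id_user, to_char(dt_game, 'DD/MM/YYYY'));

-- matches the predicate in the original query:
-- where id_user = :id_user
--   and to_char(dt_game, 'DD/MM/YYYY') = to_char(sysdate, 'DD/MM/YYYY')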
I want to get the row ID or record ID of the last inserted record in a table in Trafodion.
Example:
1 | John
2 | Michael
When executing an INSERT statement, I want it to return the created ID, i.e. 3.
Could anyone tell me how to do that in Trafodion, or is it not possible?
Are you using a sequence generator to generate unique ids for this table? Something like this:
create table idcol (a largeint generated always as identity not null,
b int,
primary key(a desc));
Either way, with or without a sequence generator, you could get the highest key with this statement:
select max(a) from idcol;
The problem is that this statement could be very inefficient. Trafodion has a built-in optimization to read the min of a key column, but it doesn't use the same optimization for the max value, because HBase didn't have a reverse scan until recently. We should make use of the reverse scan; please feel free to file a JIRA. To make this more efficient with the current code, I added a DESC to the primary key declaration. With a descending key, getting the max key will be very fast:
explain select max(a) from idcol;
However, having the data grow from higher to lower values might cause issues in HBase; I'm not sure whether this is a problem or not.
Here is yet another solution: Use the Trafodion feature that allows you to select the inserted data, showing you the inserted values right away:
select * from (insert into idcol(b) values (11),(12),(13)) t(a,b);
A B
-------------------- -----------
1 11
2 12
3 13
--- 3 row(s) selected.
I have a table with 7 columns.
It's going to contain lots and lots of data - something like more than 1.7 million records will be added every month.
Of those 7 columns 5 are the ones that I'll be using in the WHERE clause of my queries against this table in different combinations.
Is it OK to create different indexes for those possible combinations?
I'm asking this question because if I do that, there'll be more than 10 indexes on this table and I'm not sure if this is a good idea.
On the other hand, I'm afraid of querying a table with this big amount of data without indexes.
Here's the table:
CREATE TABLE AG_PAYMENTS_TO_BE
(
PAYMENTID NUMBER(15, 0) NOT NULL
, DEPARTID NUMBER(3,0)
, PENSIONERID NUMBER(11, 0) NOT NULL
, AMOUNT NUMBER(6, 2)
, PERIOD CHAR(6 CHAR)
, PAYMENTTYPE NUMBER(1,0)
, ST NUMBER(1, 0) DEFAULT 0
, CONSTRAINT AG_PAYMENTS_TO_BE_PK PRIMARY KEY
(
PAYMENTID
)
ENABLE
);
Possible queries:
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND DEPARTID=112 AND PERIOD='201207';
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND PENSIONERID=123456 AND PERIOD='201207';
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND PENSIONERID=123456 AND PERIOD='201207' AND PAYMENTTYPE=1;
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND DEPARTID=112;
SELECT AMOUNT FROM AG_PAYMENTS_TO_BE WHERE ST=0 AND PENSIONERID=123456;
and so on.
Ignoring index skip scans* for the moment, in order for a query to use an index:
The leading index columns must be listed in the query
They must be compared using exact matches (i.e. using =, not <, > or LIKE)
For example, a table with a composite index on (a, b) could use the index in the following queries:
a = :b1 and b >= :b2
a = :b1
but not:
b = :b2
because column b is listed second in the index.
* In some cases it's possible for the index to be used here via an index skip scan, where the leading column in the index is skipped. There need to be relatively few distinct values for the first column, however, which doesn't happen often (in my experience).
Note that a "larger" index can be used by queries which only use some of the leading columns from it. So in the example above, an index on just a is redundant because the queries shown can use the index on a, b. An index on just b may be useful however.
The more indexes you add, the slower your inserts/updates/deletes will be, because the indexes have to be maintained at the same time as the table. Therefore you should aim to keep the number of indexes down, unless there's significant query benefits to adding a new one. This is something you'll have to measure in your environment to determine the exact cost/benefit.
Note that having multiple indexes with similar columns can lead to the wrong index being selected. So there is potential downside for selects when you have many similar indexes. There is also a slight overhead in parse times, as Oracle has more options to consider when selecting the execution plan.
Looking at your queries I believe you only need indexes on:
st, departid, period
st, pensionerid, period
You may wish to add amount at the end of these as well, so your queries can be fully answered from the index, saving you a table lookup. You may also need further indexes if these columns are foreign keys to other tables, to prevent locking issues.
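A sketch of those two indexes with AMOUNT appended, so most of the sample queries can be answered from the index alone (the index names are made up):
CREATE INDEX AG_PAYMENTS_TO_BE_IX1
  ON AG_PAYMENTS_TO_BE (ST, DEPARTID, PERIOD, AMOUNT);

CREATE INDEX AG_PAYMENTS_TO_BE_IX2
  ON AG_PAYMENTS_TO_BE (ST, PENSIONERID, PERIOD, AMOUNT);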
This decision would greatly depend on expected number of distinct values in each column, and thus selectivity of each possible index.
Things I would consider while making decisions:
Obviously, the PAYMENTTYPE & ST fields hold up to 19 distinct values each (NUMBER(1,0) allows -9 through 9), which is pretty unselective if we keep in mind your expected volume of data (~400M rows), so they won't help you much.
However, they probably could become good candidates for list partitioning instead.
I would also think of switching PERIOD CHAR(6 CHAR) to DATE and making a composite range-list partition on period+st/paymenttype (see the sketch after this list).
DEPARTID - If you have hundreds of departments, then it's probably an indexing candidate, but if only dozens - then probably a full scan would perform way faster.
PENSIONERID seems to be a high-selectivity field, so I would consider creating a separate index on it, and including it in a composite index on PERIOD+PENSIONERID (in that field order).
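A rough sketch of that range-list idea, assuming PERIOD has been converted to DATE; the partition names and boundaries here are purely illustrative:
CREATE TABLE AG_PAYMENTS_TO_BE
(
  PAYMENTID   NUMBER(15,0) NOT NULL,
  DEPARTID    NUMBER(3,0),
  PENSIONERID NUMBER(11,0) NOT NULL,
  AMOUNT      NUMBER(6,2),
  PERIOD      DATE,                 -- was CHAR(6 CHAR), e.g. '201207' -> DATE '2012-07-01'
  PAYMENTTYPE NUMBER(1,0),
  ST          NUMBER(1,0) DEFAULT 0,
  CONSTRAINT AG_PAYMENTS_TO_BE_PK PRIMARY KEY (PAYMENTID)
)
PARTITION BY RANGE (PERIOD)
SUBPARTITION BY LIST (ST)
SUBPARTITION TEMPLATE (
  SUBPARTITION SP_ST0 VALUES (0),
  SUBPARTITION SP_ST1 VALUES (1)
)
(
  PARTITION P201207 VALUES LESS THAN (DATE '2012-08-01'),
  PARTITION P201208 VALUES LESS THAN (DATE '2012-09-01'),
  PARTITION PMAX    VALUES LESS THAN (MAXVALUE)
);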
I think you should create a few combined indexes (like (ST, PERIOD) and (ST, PENSIONERID)). That will speed up most of your sample queries...
We are running into a performance issue where I need some suggestions (we are on Oracle 10g R2).
The situation is something like this:
1) It is a legacy system.
2) Some of the tables hold data for the last 10 years (meaning data was never deleted since the first version was rolled out). Most of the OLTP tables now have around 30,000,000 - 40,000,000 rows.
3) Search operations on these tables are taking a flat 5-6 minutes. (A simple query like select count(0) from xxxxx where isActive='Y' takes around 6 minutes.) When we looked at the explain plan we found that an index scan is happening on the isActive column.
4) We have suggested archiving and purging the old data which is not needed, and the team is working towards it. But even if we delete 5 years of data we are left with around 15,000,000 - 20,000,000 rows, which is still very large, so we thought of table partitioning on these tables. However, we found that users can search on most of the columns of these tables from the UI, which would defeat the very purpose of table partitioning.
So what steps should be taken to improve this situation?
First of all: question why you are issuing the query select count(0) from xxxxx where isactive = 'Y' in the first place. Nine times out of ten it is a lazy way to check for the existence of a record. If that's the case for you, just replace it with a query that selects one row (rownum = 1 and a first_rows hint).
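A sketch of such an existence check, using the same table and column names as the question:
select /*+ first_rows(1) */ 1
from xxxxx
where isactive = 'Y'
and rownum = 1;
-- returns at most one row: it stops at the first match
-- instead of visiting every matching row just to count them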
The number of rows you mention is nothing to be worried about. If your application doesn't perform well when the number of rows grows, then your system is not designed to scale. I'd investigate all queries that take too long using SQL*Trace or ASH and fix them.
By the way: nothing you mentioned justifies the term legacy, IMHO.
Regards,
Rob.
Just a few observations:
I'm guessing that the "isActive" column can have two values - 'Y' and 'N' (or perhaps 'Y', 'N', and NULL - although why in the name of Fred there wouldn't be a NOT NULL constraint on such a column escapes me). If this is the case an index on this column would have very poor selectivity and you might be better off without it. Try dropping the index and re-running your query.
@RobVanWijk's comment about the use of SELECT COUNT(*) is excellent. ONLY ask for a row count if you really need the count; if you don't, I've found it's faster to do a direct probe (SELECT whatever FROM wherever WHERE somefield = somevalue) with an appropriate exception handler than to do a SELECT COUNT(*). In the case you cited, I think it would be better to do something like
DECLARE
  strIsActive           MY_TABLE.IS_ACTIVE%TYPE;
  bActive_records_found BOOLEAN;
BEGIN
  -- direct probe: we only care whether at least one row exists
  SELECT IS_ACTIVE
    INTO strIsActive
    FROM MY_TABLE
   WHERE IS_ACTIVE = 'Y';
  bActive_records_found := TRUE;    -- exactly one active row found
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    bActive_records_found := FALSE; -- no active rows at all
  WHEN TOO_MANY_ROWS THEN
    bActive_records_found := TRUE;  -- more than one active row: still "found"
END;
As to partitioning - partitioning can be effective at reducing query times IF the field on which the table is partitioned is used in all queries. For example, if a table is partitioned on a TRANSACTION_DATE column, then for the partitioning to make a difference all queries against this table would have to have a TRANSACTION_DATE test in the WHERE clause. Otherwise the database will have to search each partition to satisfy the query, so I doubt any improvement would be noted.
Share and enjoy.
I am looking for a real time solution...
Below are my DB columns. I am using Oracle 10g. Please help me in defining table types / indexes and tuned PL/SQL / queries (both) for the updates and insertions.
The insert and update queries are simple, but we need to take care of performance here, because my system will execute such statements 200 times per second.
Let me know... should I use procedures or simple queries? It is requested to write tuned PL/SQL and queries with proper DB table types / indexes.
I would really like to see the performance of my system under a continuous 200 updates per second.
DB table (columns) (I can change the structure if required so please let me know...)
Play ID - ID
Type - Song or Message
Count - Summation of total play
Retries - Summation of total play, if failed.
Duration - Total Duration
Last Updated - Last Updated Date Time
Thanks in advance ... let me know in case of any confusion...
You've not really given a lot of detail about WHAT you are updating etc.
As a basis for writing your update statements, don't use PL/SQL unless you cannot achieve what you want to do in SQL, as the context switching alone will hurt your performance before you even get round to processing any records.
If you are able to create indexes specifically for the update then index the columns that will appear in your update statement's WHERE clause so the records can be found quickly before being updated.
As for inserting, look up the benefits of the /*+ append */ hint for inserting records to see if it will benefit your particular case.
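A sketch of the hint in use (the table and column names are made up; APPEND does a direct-path INSERT ... SELECT, and the table can't be queried in the same session until you commit):
INSERT /*+ APPEND */ INTO play_stats
  (play_id, play_type, play_count, retries, duration, last_updated)
SELECT play_id, play_type, play_count, retries, duration, SYSDATE
FROM play_stats_staging;

COMMIT;  -- required before this session can read play_stats again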
Finally, the table structure you will use will depend on many factors that you haven't even begun to touch on with the details you've supplied. I suggest you either do some research on DB structure or ask your DBAs for a 101 class in it.
Best of luck...
EDIT:
In response to:
Play ID - ID (here the id would be a song name like abc.wav, so maybe VARCHAR2; not decided yet. What's your opinion - is it fine if the primary key is of type VARCHAR2? Any suggestions are most welcome...)
Type - Song or Message (VARCHAR2)
Count - Summation of total plays (Integer)
Retries - Summation of total plays, if failed (Integer)
Duration - Total Duration (Integer)
Last Updated - Last Updated Date Time (DateTime)
There is nothing wrong with having a PRIMARY KEY as a VARCHAR2 data type (though there is often debate about the value of having a non-specific PK, i.e. a sequence). You must, however, ensure your PK is unique; if you can't guarantee this then it would be worth having a sequence as your PK over having to introduce another column to maintain uniqueness.
As for declaring your table columns as INTEGER, they will eventually be resolved to NUMBER anyway, so I'd just create the columns as NUMBER (unless you have a very specific reason for creating them as INTEGER).
Finally, for the DATETIME column, you only need to declare it as a DATE datatype unless you need real precision in the time portion, in which case declare it as a TIMESTAMP datatype.
As for helping you with the structure of the table itself (i.e. which columns you want etc.) then that is not something I can help you with as I know nothing of your reporting requirements, application requirements or audit requirements, company best practice, naming conventions etc. I'm afraid that is something for you to decide for yourself.
For performance though, keep indexes to a minimum (i.e. only index columns that will aid your UPDATE's WHERE-clause search), only update the minimum data possible and, as suggested before, research the APPEND hint for inserts; it may help in your case but you will have to test it for yourself.
I have a table with a VARCHAR2 primary key.
It has about 1,000,000 transactions per day.
My app wakes up every 5 minutes to generate a text file by querying only the new records.
It remembers the last point it processed and handles only new records.
Do you have an idea how to query this with good performance?
I am able to add a new column if necessary.
What do you think this process should be done with?
PL/SQL?
Java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) CHECK (processed IN ('Y','N')) column is right, but missing how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't create index entries when all of the indexed values are NULL, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update rows to PROCESSED_FLAG = 'Y', they "fall out" of the index.
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a hidden column (ORA_ROWSCN - System Change Number) that can pinpoint when each row was modified. Unfortunately, by default it gets the information from the block instead of storing it with each row, and changing that behavior will require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right there, but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a bitmap index. For low-cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have that, the rest is pretty easy. But you've got to be careful not to select the new rows, process them, and mark them as processed in one step. Otherwise, rows could be inserted while you are processing that would get marked processed even though they have not been.
The easiest way would be to use PL/SQL to open a cursor that selects unprocessed rows, processes them, and then updates each row as processed. If you have an aversion to walking cursors, you could collect the PKs or rowids into a nested table, process them, and then update using the nested table.
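A minimal sketch of the cursor version (my_table and needs_processed are assumed names, following the column suggested above):
DECLARE
  CURSOR c_unprocessed IS
    SELECT *
    FROM my_table
    WHERE needs_processed = 'Y'
    FOR UPDATE;  -- locks the selected rows so they can't change mid-run
BEGIN
  FOR r IN c_unprocessed LOOP
    -- ... write r's columns to the text file here ...
    UPDATE my_table
    SET needs_processed = 'N'
    WHERE CURRENT OF c_unprocessed;
  END LOOP;
  COMMIT;  -- releases the locks
END;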
In the MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in plsql for performance.
Mark
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..
You can also consider creating an INDEX on PROCESSED (a bitmap index may be of some advantage, as the possible values can only be 'Y' and 'N', but test it out) so that your query will use the INDEX.
Also, since you query only every 5 minutes, check whether you can add another column of TIMESTAMP type and partition the table by it (not sure; check it out).
I would also think about writing a job or something similar that writes the file using UTL_FILE and shows it on the front end, if that is possible.
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log without affecting the performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns: the ID column and a processed-flag column. Have an insert trigger on the original table place its ID in this new table. Your logging process can then select records from this new table and mark them as processed. Finally, delete the processed records from this table.
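A sketch of that side table and trigger (all names here are made up; main_table stands in for the original table with its VARCHAR2 key):
CREATE TABLE new_row_log (
  id        VARCHAR2(100) NOT NULL,   -- same type as main_table's primary key
  processed CHAR(1) DEFAULT 'N' NOT NULL
);

CREATE OR REPLACE TRIGGER trg_log_new_rows
AFTER INSERT ON main_table
FOR EACH ROW
BEGIN
  INSERT INTO new_row_log (id) VALUES (:NEW.id);
END;
/
-- the 5-minute job reads WHERE processed = 'N', marks those rows 'Y',
-- and later deletes them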
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
But I don't have enough data on the relative costs of setting a sysdate default vs. setting a value of Y, updating the function-based index vs. the date index, and doing a range select on the date vs. a select on a single value of Y. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on the date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000, including the 20,000 that arrived while your query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
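A sketch of the second option (my_table and flag are assumed names; 'Y' = new, 'Q' = claimed, 'N' = processed):
-- step 1: claim the current batch
UPDATE my_table SET flag = 'Q' WHERE flag = 'Y';
COMMIT;

-- step 2: process exactly the claimed rows; rows inserted meanwhile
-- stay at 'Y' and are left alone
SELECT * FROM my_table WHERE flag = 'Q';

-- step 3: mark the claimed batch as done
UPDATE my_table SET flag = 'N' WHERE flag = 'Q';
COMMIT;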
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
1 01/01/4000 I N FE
2 01/01/4000 I N FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
3 01/01/4000 I N FE
4 01/01/4000 I N FE
I think this solution should work.
You need to do the following steps.
For the first run, you will have to copy all records. At the end of that run, execute the following query:
insert into new_table (max_rowid) select max(rowid) from yourtable;
The next time you want to get only the newly inserted values, you can do it by executing the following command:
Select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done processing the above query, simply truncate new_table and insert the new max(rowid) from yourtable.
I think this should work and would be the fastest solution.