Declarative integrity constraint between rows without pivot - Oracle

I have a situation like the following join table:
A_ID B_ID
1 27
1 314
1 5
I need to put a constraint on the table that will prevent a duplicate group from being entered. In other words:
A_ID B_ID
2 27
2 314
2 5
should fail, but
A_ID B_ID
3 27
3 314
should succeed, because it's a distinct group.
The 2 ways I've thought of are:
Pivot the table in a materialized view based upon the order and put a unique key on the pivot fields. I don't like this because in Oracle I have to limit the number of rows in a group, due to both the pivoting rules and the 32-column index limitation (I've thought of a way around the second problem, but still).
Create some unique hash value on the combination of the B_IDs and make that unique. Maybe I'm not enough of a mathematician, but I can't think of a way to do this that doesn't limit the number of values that I can use for B_ID.
I feel like there's something obvious I'm missing here, like I could just add some sort of an ordering column and set a different unique key, but I've done quite a bit of reading and haven't come up with anything. It might also be that the data model I inherited is flawed, but I can't think of anything that would give me similar flexibility.

Firstly, a regular constraint can't work.
If the set with A_ID of 1 exists, and then session 1 inserts a record with A_ID 2 and B_ID of 27, session 2 inserts (2,314) and session 3 inserts (2,5), then none of those would see a conflict to cause a constraint violation. Triggers won't work either. Equally, if a set existed of (6,99), then it would be difficult for another session to create a new set of (6,99,300).
The MV with 'refresh on commit' could work, preventing the last session from successfully committing. I'd look more at the hashing option, summing up the hashed B_IDs for each A_ID:
select table_name, sum(ora_hash(column_id)), count(*)
from user_tab_columns
group by table_name
While hash collisions are possible, they are very unlikely.
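Here is a minimal sketch of how the hashing idea could be wired into the 'refresh on commit' MV (the join table name AB is an assumption, and whether an expression like ORA_HASH is fast-refresh-eligible should be checked on your version):
create materialized view log on ab
with rowid, sequence (a_id, b_id)
including new values;
create materialized view ab_group_hash
refresh fast on commit
as
select a_id,
       sum(ora_hash(b_id)) as group_hash,   -- order-independent group fingerprint
       count(ora_hash(b_id)) as hash_cnt,   -- required for fast refresh of the SUM
       count(*) as cnt
from ab
group by a_id;
-- a duplicate group now fails at commit time; adding CNT guards against some
-- collisions, though identical (hash, count) collisions remain possible
create unique index ab_group_hash_uk on ab_group_hash (group_hash, cnt);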
If you are on 11g, check out LISTAGG too.
select table_name, listagg(column_id||':') within group (order by column_id)
from user_tab_columns
group by table_name


Cannot improve bulk delete

I am using Java with mybatis.
I have a query like this, and I need to execute it for 2000 values of key_b. That means I need to run the SQL 2000 times, which is rather slow.
DELETE FROM my_table
WHERE key_a = xxx
AND key_b = yyy
Now I came up with another solution: this time I am sending 1000 values in an IN clause for key_b, which means I execute only two queries. I expected this one to be faster, at least, but it seems to be even slower than the first approach. Here is the SQL:
DELETE FROM my_table
WHERE key_a = xxxx
AND key_b IN (y1, y2, ... y1000)
For more information: key_b is the primary key, and key_a is a foreign key with an index.
Another thing: I've tried committing only once, after all the SQL statements are executed, but it didn't improve things much.
You can use a temp table for this.
I mean, if you have a table which has an id column,
you can insert your values into that table like this:
insert into temp_table
select 1 from dual -- your ids
union all
select 2 from dual
union all
select 3 from dual
union all
......
After you fill your temp_table, you can run just this:
DELETE FROM my_table
WHERE key_a = xxxx
AND key_b IN
(
select id from temp_table
);
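In Oracle, the temp table above could be a genuine global temporary table, so the staged ids stay private to your session and clean themselves up automatically. A minimal sketch, keeping the table name from the example:
create global temporary table temp_table
(
  id number
)
on commit preserve rows;  -- or ON COMMIT DELETE ROWS if the inserts and the
                          -- delete happen in a single transaction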
I recommend sticking with the first approach: a prepared DELETE statement called in a Java loop over the id collection. Of course, use ExecutorType REUSE or BATCH, so that the statement is prepared once and run for every record.
Furthermore, I discourage trying to bind thousands of parameters.
Anyway, I fear this is the best you can do, since a DELETE operation will check integrity constraints, and probably update an index, for every record. That is not "bulked".

Get the last inserted row ID in Trafodion

I want to get the row ID or record ID of the last inserted record in a table in Trafodion.
Example:
1 | John
2 | Michael
When executing an INSERT statement, I want it to return the created ID, meaning 3.
Could anyone tell me how to do that using Trafodion, or is it not possible?
Are you using a sequence generator to generate unique ids for this table? Something like this:
create table idcol (a largeint generated always as identity not null,
b int,
primary key(a desc));
Either way, with or without sequence generator, you could get the highest key with this statement:
select max(a) from idcol;
The problem is that this statement could be very inefficient. Trafodion has a built-in optimization to read the min of a key column, but it doesn't use the same optimization for the max value, because HBase didn't have a reverse scan until recently. We should make use of the reverse scan; please feel free to file a JIRA. To make this more efficient with the current code, I added a DESC to the primary key declaration. With a descending key, getting the max key will be very fast:
explain select max(a) from idcol;
However, having the data grow from higher to lower values might cause issues in HBase, I'm not sure whether this is a problem or not.
Here is yet another solution: Use the Trafodion feature that allows you to select the inserted data, showing you the inserted values right away:
select * from (insert into idcol(b) values (11),(12),(13)) t(a,b);
A                    B
-------------------- -----------
                   1          11
                   2          12
                   3          13
--- 3 row(s) selected.

How to make a trigger like a primary key constraint?

I need to define a trigger which I want to apply to a column of a table. The trigger should prevent the user from inserting duplicate and null values. Or you could say, I need to know the logic of a primary key.
Just because you seem intent on seeing this fail, and not to take anything away from APC's points, this appears to work at first glance as long as it's a before trigger:
create table t42 (id number);
create trigger trig42
before insert or update on t42
for each row
declare
c number;
begin
if :new.id is null then
raise_application_error(-20001, 'ID is null');
end if;
select count(*) into c from t42 where id = :new.id;
if c > 0 then
raise_application_error(-20002, 'ID is not unique');
end if;
end;
/
It compiles and if you insert data you get the behaviour you seem to want:
insert into t42 values (1);
1 rows inserted.
insert into t42 values (1);
Error starting at line 20 in command:
insert into t42 values (1)
Error report:
SQL Error: ORA-20002: ID is not unique
ORA-06512: at "STACKOVERFLOW.TRIG42", line 9
ORA-04088: error during execution of trigger 'STACKOVERFLOW.TRIG42'
insert into t42 values (null);
Error starting at line 22 in command:
insert into t42 values (null)
Error report:
SQL Error: ORA-20001: ID is null
ORA-06512: at "STACKOVERFLOW.TRIG42", line 5
ORA-04088: error during execution of trigger 'STACKOVERFLOW.TRIG42'
select * from t42;
ID
----------
1
Which seems to do what you want. But not if you have more than one session. I haven't committed in this session; in another session I can do:
insert into t42 values (1);
1 row created.
select * from t42;
ID
----------
1
1 row selected.
Hmm, that's strange. Well, maybe it's deferred... let's commit them both:
commit;
select * from t42;
ID
----------
1
1
2 rows selected.
Oops. One session can't see another session's uncommitted data, so this will never work.
Also, the mutating table problem exhibits itself when we insert multiple rows in a single statement:
SQL> insert into t42 select level+1 from dual connect by level <= 5;
insert into t42 select level+1 from dual connect by level <= 5
*
ERROR at line 1:
ORA-04091: table STACKOVERFLOW.T42 is mutating, trigger/function may not see it
ORA-06512: at "STACKOVERFLOW.TRIG42", line 7
ORA-04088: error during execution of trigger 'STACKOVERFLOW.TRIG42'
SQL>
Double oops.
Even with an after trigger and a package to work around the mutating table issue, you'd still have this problem (I think), unless you lock the whole table for every insert or update. As APC said the constraint is implemented deep in the bowels of the database, not at this level.
is it not possible to define a trigger, which checks the value before
insertion that it should not be null and unique as well?
Not when you have more than one session, no. And even within one session, unless you have an index on the column the performance won't scale as the count(*) will get progressively slower. And if you do have an index, well, why not make it a unique index in the first place?
Finally, from the trigger design guidelines:
Do not create triggers that duplicate database features.
For example, do not create a trigger to reject invalid data if you can
do the same with constraints (see "How Triggers and Constraints
Differ").
" i want to learn, how primary key is made(it is a trigger of course)"
There is no "of course" about it. A constraint is not a trigger. It is an internal process which uses an index and a lot of low level activity to enforce relational constraints in a reliable and efficient manner.
If you want to learn, the rules are quite straightforward: not null, uniqueness, serialization. So just try to implement a primary key in triggers. You'll find you can't (spoiler alert!) because of the "mutating table" problem. And if you don't understand what that means, well, there's a good topic to read about.
there is a question "is it not possible to define a trigger, which
checks the value before insertion that it should not be null and
unique as well? "
The answer to that question is, No. Well, you could code a trigger-based implementation but like other "mutating table" workarounds it would require a package and AFTER statement triggers (so technically not before insertion).
But seriously, what would be the point? You won't learn anything about how primary keys actually work. And mutating tables almost always point to a poor data model, and that would certainly be the case here.
Primary key is not a trigger. It is a key, because it identifies the whole row, that's why it should be unique (and implicitly not null). It is "primary", because it is the candidate key that is most appropriate - by your decision - to be the main reference key for your table. You can add it as ALTER TABLE your_table_name ADD CONSTRAINT PK_your_table_name PRIMARY KEY (your_key_column).
If you do not want to add a primary key like that (which is a bad idea), but want to add a unique index to that table: CREATE UNIQUE INDEX UQ_IX_your_table_your_column ON your_table_name (unique_column_name).
The NOT NULL constraint should be put on the column.
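Applied to the t42 example from the first answer, the declarative version is a single statement, and it makes the trigger redundant:
alter table t42 add constraint t42_pk primary key (id);
-- enforces both NOT NULL and uniqueness across all sessions,
-- backed by a unique index that Oracle maintains itself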

where rownum=1 query taking time in Oracle

I am trying to execute a query like
select * from tableName where rownum=1
This query is basically to fetch the column names of the table. There are more than a million records in the table. With the above condition it takes a long time to fetch the first row. Is there an alternative way to get the first row?
This question has already been answered; I will just provide an explanation as to why a filter like ROWNUM=1 or ROWNUM <= 1 may sometimes result in a long response time.
When encountering a ROWNUM filter (on a single table), the optimizer will produce a FULL SCAN with COUNT STOPKEY. This means that Oracle will start to read rows until it encounters the first N rows (here N=1). A full scan reads blocks from the first extent to the high water mark. Oracle has no way to determine which blocks contain rows and which don't beforehand, all blocks will therefore be read until N rows are found. If the first blocks are empty, it could result in many reads.
Consider the following:
SQL> /* rows will take a lot of space because of the CHAR column */
SQL> create table example (id number, fill char(2000));
Table created
SQL> insert into example
2 select rownum, 'x' from all_objects where rownum <= 100000;
100000 rows inserted
SQL> commit;
Commit complete
SQL> delete from example where id <= 99000;
99000 rows deleted
SQL> set timing on
SQL> set autotrace traceonly
SQL> select * from example where rownum = 1;
Elapsed: 00:00:05.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=7 Card=1 Bytes=2015)
1 0 COUNT (STOPKEY)
2 1 TABLE ACCESS (FULL) OF 'EXAMPLE' (TABLE) (Cost=7 Card=1588 [..])
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
33211 consistent gets
25901 physical reads
0 redo size
2237 bytes sent via SQL*Net to client
278 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
As you can see the number of consistent gets is extremely high (for a single row). This situation could be encountered in some cases where for example, you insert rows with the /*+APPEND*/ hint (thus above high water mark), and you also delete the oldest rows periodically, resulting in a lot of empty space at the beginning of the segment.
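If you do hit this pattern in practice, one possible remedy (a sketch, not from the original answer; it assumes 10g+ and an ASSM tablespace) is to reclaim the empty space and lower the high water mark:
alter table example enable row movement;
alter table example shrink space;  -- compacts the segment and resets the high water mark
-- or: alter table example move;   -- rebuilds the segment, but invalidates
--                                    indexes, which then need rebuilding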
Try this:
select * from tableName where rownum<=1
There are some weird ROWNUM bugs; sometimes changing the query very slightly will fix it. I've seen this happen before, but I can't reproduce it.
Here are some discussions of similar issues: http://jonathanlewis.wordpress.com/2008/03/09/cursor_sharing/ and http://forums.oracle.com/forums/thread.jspa?threadID=946740&tstart=1
Surely Oracle has meta-data tables that you can use to get column names, like the sysibm.syscolumns table in DB2?
And, after a quick web search, that appears to be the case: see ALL_TAB_COLUMNS.
I'd use those rather than go to the actual table, something like (untested):
SELECT COLUMN_NAME
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'MYTABLE'
ORDER BY COLUMN_NAME;
If you are hell-bent on finding out why your query is slow, you should revert to the standard method: asking your DBMS to explain the execution plan of the query for you. For Oracle, see section 9 of this document.
There's a conversation over at Ask Tom - Oracle that seems to suggest the row numbers are created after the select phase, which may mean the query is retrieving all rows anyway. The explain will probably help establish that. If it contains FULL without COUNT STOPKEY, then that may explain the performance.
Beyond that, my knowledge of Oracle specifics diminishes and you will have to analyse the explain further.
Your query is doing a full table scan and then returning the first row.
Try
SELECT * FROM table WHERE primary_key = primary_key_value;
The first row, particularly as it pertains to ROWNUM, is arbitrarily decided by Oracle. It may not be the same from query to query, unless you provide an ORDER BY clause.
So, picking a primary key value to filter by is as good a method as any to get a single row.
I think you're slightly missing the concept of ROWNUM - according to Oracle docs: "ROWNUM is a pseudo-column that returns a row's position in a result set. ROWNUM is evaluated AFTER records are selected from the database and BEFORE the execution of ORDER BY clause."
So it returns ANY row that it considers #1 in the result set, which in your case will contain 1M rows.
You may want to check out a ROWID pseudo-column: http://psoug.org/reference/pseudocols.html
I've recently had the same problem you're describing: I want one row from the very large table as a quick, dirty, simple introspection, and "where rownum=1" alone behaves very poorly. Below is a remedy which worked for me.
Select the max() of the first term of some index, and then use it to choose some small fraction of all rows with "rownum=1". Suppose my table has some index on numerical "group-id", and compare this:
select * from my_table where rownum = 1;
-- Elapsed: 00:00:23.69
with this:
select * from my_table where rownum = 1
and group_id = (select max(group_id) from my_table);
-- Elapsed: 00:00:00.01

Select only new rows in Oracle

I have a table with a VARCHAR2 primary key.
It has about 1,000,000 transactions per day.
My app wakes up every 5 minutes to generate a text file by querying only new records.
It will remember the last point and process only the new records.
Do you have any ideas on how to query this with good performance?
I am able to add a new column if necessary.
What do you think this process should be done in?
PL/SQL?
Java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) CHECK IN ('Y','N') column is right, but missing how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a hidden column (ORA_ROWSCN - System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row and changing that behavior will require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right there, but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a bitmap index. For low-cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have that, the rest is pretty easy. But you've got to be careful: if you select the new rows, process them, and then mark them as processed in separate steps, rows could be inserted while you are processing that will get marked processed even though they have not been.
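In DDL, that suggestion could look like this (a sketch; my_table and the index name are assumptions, and note the warning in the first answer about bitmap indexes under concurrent DML):
alter table my_table add (needs_processed char(1) default 'Y' not null);
create bitmap index my_table_needs_processed_bix on my_table (needs_processed);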
The easiest way would be to use pl/sql to open a cursor that selects unprocessed rows, processes them and then updates the row as processed. If you have an aversion to walking cursors, you could collect the pk's or rowids into a nested table, process them and then update using the nested table.
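A minimal PL/SQL sketch of that cursor approach (my_table, needs_processed, and the id column are assumptions; the actual file writing is left as a comment):
begin
  for r in (select t.rowid as rid, t.id
            from my_table t
            where t.needs_processed = 'Y'
            for update)  -- lock the batch so no other session processes it
  loop
    -- write r.id (or the full row) to the text file here, e.g. with UTL_FILE
    update my_table
       set needs_processed = 'N'
     where rowid = r.rid;
  end loop;
  commit;  -- commit once, after the whole batch (not inside the cursor loop)
end;
/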
In the MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in PL/SQL for performance.
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..
You can also consider creating an INDEX on PROCESSED (a bitmap index may be of some advantage, as the possible values can only be 'Y' and 'N', but test it out) so that your query will use the index.
Also, if you are sure you query only every 5 minutes, check whether you can add another column of TIMESTAMP type and partition the table by it (not sure, check it out).
I would also think about writing a job or something similar that writes the output using UTL_FILE and shows it in the front end, if that's feasible.
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log without affecting performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns, the ID column and a processed-flag column? Have an insert trigger on the original table place its ID in this new table. Your logging process can then select records from this new table and mark them as processed. Finally, delete the processed records from this table.
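A sketch of that queue-table idea (every name here, and the VARCHAR2 length, is illustrative):
create table new_rows_queue
(
  id        varchar2(100) not null,
  processed char(1) default 'N' not null
);
create or replace trigger trg_queue_new_rows
after insert on my_table
for each row
begin
  insert into new_rows_queue (id) values (:new.id);
end;
/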
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
But I don't have enough data on the relative costs of setting a sysdate default vs. setting a value of 'Y', of maintaining the function-based index vs. the date index, and of doing a range select on the date vs. a select on the single value 'Y'. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and you'll definitely want to use a hint on the date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = 'Y', then do your update on 120,000, including the 20,000 that arrived while your query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
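A sketch of that second variant, the Y -> Q -> N handshake (my_table and needs_processed are assumed names):
update my_table set needs_processed = 'Q' where needs_processed = 'Y';  -- claim the batch
commit;
-- ... process exactly the rows WHERE needs_processed = 'Q' ...
update my_table set needs_processed = 'N' where needs_processed = 'Q';  -- mark done
commit;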
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
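And a sketch of the date-column variant (again, all names are assumptions):
alter table my_table add (create_date date default sysdate not null);
create index my_table_create_date_ix on my_table (create_date);
-- each run: record the window start first, then select
select * from my_table where create_date >= :last_run_time;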
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
   ID_TEST SNAPTIME$$  DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
         1 01/01/4000  I         N         FE
         2 01/01/4000  I         N         FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
   ID_TEST SNAPTIME$$  DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
         3 01/01/4000  I         N         FE
         4 01/01/4000  I         N         FE
I think this solution should work.
You need to do the following steps.
For the first run, you will have to copy all records. On that first run, you also need to execute the following query:
insert into new_table (max_rowid) select max(rowid) from yourtable;
The next time you want to get only the newly inserted values, you can do it by executing the following command:
Select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done processing the rows from the above query, simply truncate new_table and insert the new max(rowid) from yourtable.
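In SQL, that wrap-up step would look like this, using the same table names as above:
truncate table new_table;
insert into new_table (max_rowid) select max(rowid) from yourtable;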
I think this should work and would be the fastest solution.
