Delete unused records from oracle table - oracle

DB version : 11.2.0.4
OS Solaris 5.10
REQUIREMENT: Delete unused records from the table, but only after determining whether each record is being accessed.
We have a table, employee, with 100,000 records. If anyone selects a particular record (or records) from the employee table, the STATUS column of each selected record should be updated to 'ACTIVE'.
This is required for auditing purposes. 60 days later we will delete all records from the employee table whose STATUS column is NULL. How can this be achieved?
My understanding so far, correct me if I am wrong:
1) A trigger can't be used, as there is no SELECT event; only UPDATE, INSERT, and DELETE events are available.
2) Oracle FGA (fine-grained auditing) may not serve the purpose, or maybe I am just not aware of how. Is it doable with FGA?
Table:
CREATE TABLE EMPLOYEE
(
EMPID NUMBER,
NAME VARCHAR2(20 BYTE),
SALARY NUMBER,
DEPART VARCHAR2(100 BYTE),
STATUS VARCHAR2(100 BYTE)
)
sample records:
EMPID NAME SALARY DEPART STATUS
---------- --------------- ---------- -------------------- ----------
101 ALFA 1000 IT
102 BETA 2000 CLERK
103 PETER 3000 FINANCE
104 JOHN 4000 IT
105 MESSI 5000 TECH
106 ROMEO 5000 TECH
107 TERI 5000 TECH
108 ROBERT 5000 TECH
Example:
If anyone issues the statements below:
query 1: SELECT * from EMPLOYEE where name='MESSI';
the auditing should update the STATUS='ACTIVE' of empid=105
EMPID NAME SALARY DEPART STATUS
---------- --------------- ---------- -------------------- ----------
101 ALFA 1000 IT
102 BETA 2000 CLERK
103 PETER 3000 FINANCE
104 JOHN 4000 IT
105 MESSI 5000 TECH 'ACTIVE'
106 ROMEO 5000 TECH
107 TERI 5000 TECH
108 ROBERT 5000 TECH
query 2: SELECT * from EMPLOYEE where DEPART='TECH';
The auditing should update the STATUS='ACTIVE' for empid=105,106,107,108
EMPID NAME SALARY DEPART STATUS
---------- --------------- ---------- -------------------- ----------
101 ALFA 1000 IT
102 BETA 2000 CLERK
103 PETER 3000 FINANCE
104 JOHN 4000 IT
105 MESSI 5000 TECH 'ACTIVE'
106 ROMEO 5000 TECH 'ACTIVE'
107 TERI 5000 TECH 'ACTIVE'
108 ROBERT 5000 TECH 'ACTIVE'

The solution suggested by Oracle to achieve what you want is known as ILM, or Information Lifecycle Management.
It allows you to archive, delete, move, or perform other specific actions depending on some criteria (such as last access).
But be aware, it requires an additional license.
I have a suggestion:
Enable SELECT auditing on the target table.
Create an AFTER INSERT trigger that reads the row inserted into DBA_AUDIT_TRAIL and extracts SQL_TEXT if OBJ_NAME is your table (plus other checks).
Inside this trigger, do some string operations to get the WHERE condition and use it for the update.
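As a rough sketch of the first step in that suggestion, standard object auditing can be switched on for the table and the captured statements read back from DBA_AUDIT_TRAIL. The HR schema name is an assumption (the question does not give one), and SQL_TEXT is only populated when the AUDIT_TRAIL initialization parameter is set to DB,EXTENDED:

```sql
-- Assumption: the table lives in the HR schema.
AUDIT SELECT ON hr.employee BY ACCESS;

-- Read back the audited SELECT statements on the table.
SELECT username, timestamp, sql_text
FROM dba_audit_trail
WHERE owner = 'HR'
AND obj_name = 'EMPLOYEE'
AND action_name = 'SELECT'
ORDER BY timestamp;
```

Note that the audit trail records the statement text, not the rows it returned, so the WHERE clause would still have to be re-applied to work out which rows were touched.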

Sounds like an ambiguous requirement to me. What happens if someone does select * or select count(*)? Under the specification, that would mark all employee records as used.
The view v$sql contains "all" SQL statements issued (at least those still in the shared pool), so you could always look in there, going through v$sql_plan to find the statements that touch the object. Then use those to reverse engineer whether a record was touched.
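A minimal sketch of that lookup, assuming the table is owned by HR (not stated in the question) and that the plan view meant is V$SQL_PLAN. It only finds statements still cached in the shared pool, and a statement answered entirely from an index would show the index name rather than EMPLOYEE in the plan:

```sql
-- Assumption: owner is HR; only cursors still in the shared pool are visible.
SELECT s.sql_id, s.last_active_time, s.sql_text
FROM v$sql s
WHERE s.command_type = 3 -- 3 = SELECT
AND EXISTS (SELECT 1
FROM v$sql_plan p
WHERE p.sql_id = s.sql_id
AND p.child_number = s.child_number
AND p.object_owner = 'HR'
AND p.object_name = 'EMPLOYEE')
ORDER BY s.last_active_time;
```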

You may try FGA (fine-grained auditing) using the DBMS_FGA package. Try running the code block below in your schema. If it executes successfully, check whether audit records are logged in DBA_FGA_AUDIT_TRAIL or DBA_COMMON_AUDIT_TRAIL (check the documentation) when a select query runs. Schedule a program that runs periodically to check the audit table and update the status column accordingly. I have never tried anything like this, however.
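BEGIN
-- audit_condition is a row filter (NULL = every row), not a statement type;
-- the statement type is passed through statement_types instead.
DBMS_FGA.ADD_POLICY(
object_schema => 'HR',
object_name => 'EMPLOYEE',
policy_name => 'STATUS_UPDATE',
audit_condition => NULL,
audit_column => 'NAME,DEPART',
handler_schema => NULL,
handler_module => NULL,
enable => TRUE,
statement_types => 'SELECT');
END;
/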
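As a sketch of what the periodic job could read, the audited statements for this policy should be visible in DBA_FGA_AUDIT_TRAIL. The HR schema and the STATUS_UPDATE policy name are taken from the block above; how the job turns each statement back into a set of EMPID values is left open, since FGA records the statement and its binds rather than the rows returned:

```sql
-- Statements captured by the STATUS_UPDATE policy, oldest first.
SELECT timestamp, db_user, sql_text, sql_bind
FROM dba_fga_audit_trail
WHERE object_schema = 'HR'
AND object_name = 'EMPLOYEE'
AND policy_name = 'STATUS_UPDATE'
ORDER BY timestamp;
```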


How to make a PL/SQL program to update Student Fees when status is active?

I am a newbie in PL/SQL and I have a table called STUDENT, which contains the following columns: REGNO, NAME, FNAME, DOMICILE, FEES, STATUS.
What I want to do is this: when a new record is created, if the student's domicile is, for example, DOMICILE = 'TEXAS' and STATUS = 'ACTIVE', then I want to give a 50% discount on FEES.
Here is my code:
CREATE OR REPLACE TRIGGER MYTRIGGER
BEFORE INSERT ON STUDENT
FOR EACH ROW
BEGIN
IF :NEW.DOMICILE = 'TEXAS' AND :NEW.STATUS = 'ACTIVE' THEN
UPDATE STUDENT SET FEES = FEES - 0.50 * FEES;
END IF;
END MYTRIGGER;
/
The trigger gets created, but it does not work properly.
example:
SQL> INSERT INTO STUDENT VALUES(1,'MARK','SMITH','TEXAS',5000,'ACTIVE');
1 row created.
SQL> SELECT * FROM STUDENT;
REGNO NAME FNAME DOMICILE FEES STATUS
---------- ------------------------------ ------------------------------ ------------------------------ ---------- --------------------
1 MARK SMITH TEXAS 5000 ACTIVE
SQL> INSERT INTO STUDENT VALUES(2,'JAMES','FORD','TEXAS',5000,'ACTIVE');
1 row created.
SQL> INSERT INTO STUDENT VALUES(3,'SAM','MILLER','NEW JERSEY',5000,'ACTIVE');
1 row created.
SQL> SELECT * FROM STUDENT;
REGNO NAME FNAME DOMICILE FEES STATUS
---------- ------------------------------ ------------------------------ ------------------------------ ---------- --------------------
1 MARK SMITH TEXAS 2500 ACTIVE
2 JAMES FORD TEXAS 5000 ACTIVE
3 SAM MILLER NEW JERSEY 5000 ACTIVE
SQL>
any suggestions?
You don't need an UPDATE statement there; all you need is to assign the new value to :NEW.FEES:
CREATE OR REPLACE TRIGGER MYTRIGGER
BEFORE INSERT ON STUDENT
FOR EACH ROW
BEGIN
IF :NEW.DOMICILE = 'TEXAS' AND :NEW.STATUS = 'ACTIVE' THEN
:NEW.FEES := 0.50 * :NEW.FEES;
END IF;
END MYTRIGGER;
/
Your trigger is working just fine as written, just not as you wanted or expected. What you want is to halve the fees of the incoming row. What your UPDATE statement actually does is halve the fees of every existing row in the table, because it has no WHERE clause. Since the incoming row does not exist yet, it does NOT participate in the update. That is why it appears to work on some but not all inserts. (Actually I was surprised, as I expected an "ORA-04091: table is mutating..." exception.) See here a fiddle that shows what is happening by displaying the table after each insert to see just what that statement actually did. This is often a useful technique when results are not as expected. @SayanMalakshinov is correct: do not update; instead, just do an assignment.
:new.fees := :new.fees * .5;

simple random sampling while pulling data from warehouse(oracle engine) using proc sql in sas

I need to pull a huge amount of data, say 600-700 variables from different tables in a data warehouse. The dataset in its raw form will easily reach 150 gigs (79 MM rows), and for my analysis I need only a million rows. How can I pull the data with proc sql directly from the warehouse while doing simple random sampling on the rows?
The code below won't work, as ranuni is not supported by Oracle:
proc sql outobs =1000000;
select * from connection to oracle(
select * from tbl1 order by ranuni(12345);
quit;
How do you propose I do it?
Use the DBMS_RANDOM Package to Sort Records and Then Use A Row Limiting Clause to Restrict to the Desired Sample Size
The dbms_random.value function obtains a random number between 0 and 1 for all rows in the table and we sort in ascending order of the random value.
Here is how to produce the sample set you identified:
SELECT
*
FROM
(
SELECT
*
FROM
tbl1
ORDER BY dbms_random.value
)
FETCH FIRST 1000000 ROWS ONLY;
To demonstrate with the sample schema table, emp, we sample 4 records:
SCOTT@DEV> SELECT
empno,
rnd_val
FROM
(
SELECT
empno,
dbms_random.value rnd_val
FROM
emp
ORDER BY rnd_val
)
FETCH FIRST 4 ROWS ONLY;
EMPNO RND_VAL
7698 0.06857749035643605682648168347885993709
7934 0.07529612360785920635181751566833986766
7902 0.13618520865865754766175030040204331697
7654 0.14056380246495282237607922497308953768
SCOTT@DEV> SELECT
empno,
rnd_val
FROM
(
SELECT
empno,
dbms_random.value rnd_val
FROM
emp
ORDER BY rnd_val
)
FETCH FIRST 4 ROWS ONLY;
EMPNO RND_VAL
7839 0.00430658806761508024693197916281775492
7499 0.02188116061148367312927392115186317884
7782 0.10606515700372416131060633064729870016
7788 0.27865276349549877512032787966777990909
With the example above, notice that the set of empno values returned differs between the two executions of the same query in SQL*Plus.
The performance might be an issue with the row counts you are describing.
EDIT:
With table sizes in the order of 150 gigs - 79 MM, any sorting would be painful.
If the table had a surrogate key based on a sequence incremented by 1, we could take the approach of selecting every nth record based on the key.
e.g.
--scenario n = 3000
SELECT
*
FROM
tbl1
WHERE
mod(table_id, 3000) = 0;
This approach would not use an index (unless a function based index is created), but at least we are not performing a sort on a data set of this size.
I performed an explain plan with a table that has close to 80 million records and it does perform a full table scan (the condition forces this without a function based index) but this looks tenable.
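As a rough sketch of the every-nth-record idea sized for this question: with about 79 million rows and a target of one million, n works out to roughly 79. The table_id column is the hypothetical dense, sequence-generated key assumed above. Oracle's SAMPLE clause is a different technique, shown here only as an alternative; it takes a percentage of rows (1,000,000 / 79,000,000 is about 1.27%) and returns an approximate, not exact, row count:

```sql
-- Every 79th key value: roughly 79,000,000 / 79 = 1,000,000 rows.
-- Assumption: table_id is a gap-free surrogate key.
SELECT *
FROM tbl1
WHERE MOD(table_id, 79) = 0;

-- Alternative: row sampling at about 1.27 percent of the table.
SELECT *
FROM tbl1 SAMPLE (1.27);
```

Neither form sorts the table, which is the main cost in the ORDER BY dbms_random.value approach.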
None of the posted answers or comments worked for my case; they might have, but we have 87 MM rows.
I wanted an answer that uses SAS. Here is what I did, and it works. Thanks all!
libname dwh path username pwd;
proc sql;
create table sample as
select
<all the variables>, ranuni(any arbitrary seed)
from dwh.<all the tables>
<bunch of where conditions goes here>;
quit;

update rows from multiple tables

I have two tables, affiliation and customer, with data like this:
aff_id From_cus_id
------ -----------
1 10
2 20
3 30
4 40
5 50
cust_id cust_aff_id
------- -------
10
20
30
40
50
I need to update the cust_aff_id column with the matching aff_id from the affiliation table, like below:
cust_id cust_aff_id
------- -------
10 1
20 2
30 3
40 4
50 5
Could you please reply if anyone knows how?
Oracle doesn't have an UPDATE with join syntax, but you can use a subquery instead:
UPDATE customer
SET customer.cust_aff_id =
(SELECT aff_id FROM affiliation WHERE From_cus_id = customer.cust_id)
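One caveat with the correlated-subquery form: without a WHERE clause it touches every customer row, and any customer with no matching affiliation row has cust_aff_id set to NULL. A minimal sketch of a guarded variant, using the same tables and columns as the question and assuming at most one affiliation row per customer:

```sql
-- Only update customers that actually have a matching affiliation row.
UPDATE customer c
SET c.cust_aff_id =
(SELECT a.aff_id FROM affiliation a WHERE a.from_cus_id = c.cust_id)
WHERE EXISTS
(SELECT 1 FROM affiliation a WHERE a.from_cus_id = c.cust_id);
```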
merge into customer t2
using affiliation t1 on (t1.From_cus_id =t2.cust_id )
WHEN MATCHED THEN
update set t2.cust_aff_id = t1.aff_id
;
Here is an update with join syntax. This, quite reasonably, works only if from_cus_id is declared as the primary key (or has a unique constraint) in the first table, so that each customer row joins to at most one affiliation row; otherwise Oracle raises ORA-01779. Without this condition, the requirement doesn't make much sense in the first place anyway... but Oracle requires that the constraint be stated explicitly on the table. This is also reasonable on Oracle's part IMO.
update
( select t1.aff_id, t2.cust_aff_id
from affiliation t1 join customer t2 on t2.cust_id = t1.from_cus_id) j
set j.cust_aff_id = j.aff_id;

Oracle - Insert x amount of rows with random data

I am currently doing some testing and am in the need for a large amount of data (around 1 million rows)
I am using the following table:
CREATE TABLE OrderTable(
OrderID INTEGER NOT NULL,
StaffID INTEGER,
TotalOrderValue DECIMAL (8,2),
CustomerID INTEGER);
ALTER TABLE OrderTable ADD CONSTRAINT OrderID_PK PRIMARY KEY (OrderID);
CREATE SEQUENCE seq_OrderTable
MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 10000;
I want to insert 1000000 rows of random data into it with the following rules:
OrderID needs to be sequential (1, 2, 3 etc...)
StaffID needs to be a random number between 1 and 1000
CustomerID needs to be a random number between 1 and 10000
TotalOrderValue needs to be a random decimal value between 0.00 and 9999.99
Is this even possible to do? I think I could generate each of these values with an update statement like the one below, but I am not sure how to generate a million rows in one go.
Thanks for any help on this matter.
This is how I would randomly generate the number on update:
UPDATE StaffTable SET DepartmentID = DBMS_RANDOM.value(low => 1, high => 5);
For testing purposes I created the table and populated it in one shot, with this query:
CREATE TABLE OrderTable(OrderID, StaffID, CustomerID, TotalOrderValue)
as (select level, ceil(dbms_random.value(0, 1000)),
ceil(dbms_random.value(0,10000)),
round(dbms_random.value(0,10000),2)
from dual
connect by level <= 1000000)
/
A few notes: it is better to use NUMBER as the data type; NUMBER(8,2) is the equivalent for a decimal. To populate this kind of table it is much more efficient to use the "hierarchical query without PRIOR" trick (the "connect by level <= ..." trick) to get the order IDs.
If your table is created already, insert into OrderTable (select level...) (same subquery as in my code) should work just as well. You may be better off adding the PK constraint only after you load the data though, so as not to slow things down.
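For the already-created table, the insert form mentioned above could look like the following sketch. It uses the column names from the question and takes OrderID from level rather than from the seq_OrderTable sequence; the TRUNC and ROUND bounds are my own choice to keep the values inside the ranges the question asks for:

```sql
-- One statement generates all 1,000,000 rows.
INSERT INTO OrderTable (OrderID, StaffID, TotalOrderValue, CustomerID)
SELECT level,
TRUNC(DBMS_RANDOM.VALUE(1, 1001)), -- whole number 1..1000
ROUND(DBMS_RANDOM.VALUE(0, 9999.99), 2), -- 0.00..9999.99
TRUNC(DBMS_RANDOM.VALUE(1, 10001)) -- whole number 1..10000
FROM dual
CONNECT BY level <= 1000000;
COMMIT;
```

DBMS_RANDOM.VALUE(low, high) returns a value greater than or equal to low and less than high, which is why the upper bounds for the whole-number columns are one past the wanted maximum.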
A small sample from the table created (total time to create the table on my cheap laptop - 1,000,000 rows - was 7.6 seconds):
SQL> select * from OrderTable where orderid between 500020 and 500030;
ORDERID STAFFID CUSTOMERID TOTALORDERVALUE
---------- ---------- ---------- ---------------
500020 666 879 6068.63
500021 189 6444 1323.82
500022 533 2609 1847.21
500023 409 895 207.88
500024 80 2125 1314.13
500025 247 3772 5081.62
500026 922 9523 1160.38
500027 818 5197 5009.02
500028 393 6870 5067.81
500029 358 4063 858.44
500030 316 8134 3479.47

Oracle trouble in getting data from a partitioned table

On a new job I have to figure out how some database reporting scripts are working.
There is one table that is giving me some trouble. I see in existing scripts that it is a partitioned table.
My problem is that whatever query I run on this table returns me "no rows selected".
Here are some details about my investigation in this table:
Table size estimate
SQL> select sum(bytes)/1024/1024 Megabytes from dba_segments where segment_name = 'PPREC';
MEGABYTES
----------
45.625
Partitions
There are a total of 730 partitions on date range.
SQL> select min(PARTITION_NAME),max(PARTITION_NAME) from dba_segments where segment_name = 'PPREC';
MIN(PARTITION_NAME) MAX(PARTITION_NAME)
------------------------------ ------------------------------
PART20110201 PART20130130
There are several tablespaces and partitions are allocated in them
SQL> select tablespace_name, count(partition_name) from dba_segments where segment_name = 'PPREC' group by tablespace_name;
TABLESPACE_NAME COUNT(PARTITION_NAME)
------------------------------ ---------------------
REC_DATA_01 281
REC_DATA_02 48
REC_DATA_03 70
REC_DATA_04 26
REC_DATA_05 44
REC_DATA_06 51
REC_DATA_07 13
REC_DATA_08 48
REC_DATA_09 32
REC_DATA_10 52
REC_DATA_11 35
REC_DATA_12 30
Additional query:
SQL> select * from dba_segments where segment_name='PPREC' and partition_name='PART20120912';
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME HEADER_FILE HEADER_BLOCK BYTES BLOCKS EXTENTS
----- ------------ -------------- --------------- --------------- ----------- ------------ ----- ------ -------
HIST PPREC PART20120912 TABLE PARTITION REC_DATA_01 13 475315 65536 8 1
INITIAL_EXTENT NEXT_EXTENT MIN_EXTENTS MAX_EXTENTS PCT_INCREASE FREELISTS FREELIST_GROUPS RELATIVE_FNO BUFFER_POOL
-------------- ----------- ----------- ----------- ------------ --------- --------------- ------------ -----------
65536 1 2147483645 13 DEFAULT
Tablespace usage
Here is a space summary (composite of dba_tablespaces, dba_data_files, dba_segments, dba_free_space)
TABLESPACE_NAME TOTAL_MEGABYTES USED_MEGABYTES FREE_MEGABYTES
------------------------------ --------------- -------------- --------------
REC_01_INDX 30,700 250 30,449
REC_02_INDX 7,745 7 7,737
REC_03_INDX 22,692 15 22,677
REC_04_INDX 15,768 10 15,758
REC_05_INDX 25,884 16 25,868
REC_06_INDX 27,992 16 27,975
REC_07_INDX 17,600 10 17,590
REC_08_INDX 18,864 11 18,853
REC_09_INDX 19,700 12 19,687
REC_10_INDX 28,716 16 28,699
REC_DATA_01 102,718 561 102,156
REC_DATA_02 24,544 3,140 21,403
REC_DATA_03 72,710 4 72,704
REC_DATA_04 29,191 2 29,188
REC_DATA_05 42,696 3 42,692
REC_DATA_06 52,780 323 52,456
REC_DATA_07 16,536 1 16,534
REC_DATA_08 49,247 3 49,243
REC_DATA_09 30,848 2 30,845
REC_DATA_10 49,620 3 49,616
REC_DATA_11 40,616 2 40,613
REC_DATA_12 184,922 123,435 61,486
The tablespace usage seems to confirm that this table is not empty, in fact its last tablespace (REC_DATA_12) seems pretty busy.
Existing scripts
What I find puzzling is that there are some PL/SQL stored procedures that seem to work on that table and get data out of it.
An example of such a stored procedure is as follows:
procedure FIRST_REC as
vpartition varchar2(12);
begin
select 'PART'||To_char(sysdate,'YYYYMMDD') INTO vpartition FROM DUAL;
execute immediate
'MERGE INTO FIRST_REC_temp a
USING (SELECT bno, min(trdate) mintr,max(trdate) maxtr
FROM PPREC PARTITION ('||vpartition||') WHERE route_id IS NOT NULL AND trunc(trdate) <= trunc(sysdate-1)
GROUP BY bno) b
ON (a.bno=b.bno)
when matched then
update set a.last_tr = b.maxtr
when not matched then
insert (a.bno,a.last_tr,a.first_tr)
values (b.bno,b.maxtr,b.mintr)';
commit;
However if I try using the same syntax manually on the table, here is what I get:
SQL> select count(*) from PPREC PARTITION (PART20120912);
COUNT(*)
----------
0
I have tried a few random partitions and I always get the same 0 count.
Summary
- I see a table that seems to contain data (space used, tablespaces, data files)
- The table is partitioned (one partition per day over a period of 730 days ending end of January 2013)
- Scripts are extracting data from that table somehow
Question
- My queries using PARTITION are all returning me "no rows selected". What am I doing wrong? How could I find out how to extract data from this table?
I suppose it's possible that some other process might be deleting the data, but without visiting your site there's no way for anyone here to tell if that might be so.
I don't see in your post that you mentioned the name of the partitioning DATE column, but based on the SQL you posted I'll assume it's TRDATE - if this is not correct, change TRDATE in the statement below to be the partitioning column.
That said, give this a try:
SELECT COUNT(*)
FROM PPREC
WHERE TRDATE >= TO_DATE('01-SEP-2012 00:00:00', 'DD-MON-YYYY HH24:MI:SS');
This assumes you should have data in this table from September. If you find data, great. If you don't - well, Back In The Day (when men were men, women were women, and computers were water-cooled :-) we had a little saying about memory on IBM mainframes:
1. If you can see it, and it's there, it's Real.
2. If you can't see it, but it's there, it's Protected.
3. If you can see it, but it's not there, it's Virtual.
4. If you can't see it, and it's not there, it's GONE!
:-)
Use of the PARTITION clause should be reserved for situations where you are experiencing a performance problem and the usual fixes (adding indexes, deleting unnecessary data, human sacrifice, etc.) haven't worked. Note: guessing about what is or is not going to be a performance problem is not allowed. Until you've got a performance problem you don't have a performance problem. Over the years I've found that software spends a lot of execution time in the darndest places :-)
Basically, write your queries normally and trust the database to get it right. In the general case, always write the simplest code, and do the simplest thing, that could possibly work. 99+ percent of the time it will be fine. That allows you to spend your optimization time on the less-than-one-percent cases where simple isn't good enough, and most of the software you write or design will be simple and easy to understand.
Share and enjoy.
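One further check follows from the figures in the question: 730 partitions at 65,536 bytes each comes to exactly 45.625 MB, the total reported for PPREC, which would be consistent with every partition holding only its initial extent. A minimal sketch to look for partitions that have grown beyond that, using only DBA_SEGMENTS columns already shown in the question:

```sql
-- Partitions of PPREC that have allocated more than their first extent.
SELECT partition_name, tablespace_name, bytes, extents
FROM dba_segments
WHERE segment_name = 'PPREC'
AND segment_type = 'TABLE PARTITION'
AND extents > 1
ORDER BY bytes DESC;
```

If this returns no rows, the space used in REC_DATA_12 presumably belongs to other segments, though that is an inference from the numbers rather than something the question confirms.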
