I have a table with 150 million records
I want to update the sequence column from a sequence.
What is the fastest way to update?
I have used parallel, but it has been running for hours without finishing.
UPDATE /*+ parallel (c, 50) */ rpm_future_retail_tmp c
SET future_retail_id = rpm_future_retail_seq.NEXTVAL;
What is a faster way?
Creating a new table is usually faster than applying DML to an existing one, especially faster than an UPDATE statement. As an alternative you can use:
CREATE TABLE rpm_future_retail_tmp_ PARALLEL 8 NOLOGGING AS
SELECT rpm_future_retail_seq.NEXTVAL AS future_retail_id,
<and the comma-separated columns other than future_retail_id>
FROM rpm_future_retail_tmp;
DROP TABLE rpm_future_retail_tmp;
ALTER TABLE rpm_future_retail_tmp_ RENAME TO rpm_future_retail_tmp;
where:
- the degree of parallelism might vary depending on your system's resources
- the statements needed to recreate the privileges (grants) and indexes of the table (rpm_future_retail_tmp) should be saved somewhere before dropping it, as in the DBMS_METADATA sketch below
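For capturing that DDL, here is a sketch (not part of the original answer) using the standard DBMS_METADATA package; run it as the table owner, and note that GET_DEPENDENT_DDL raises an error if no dependent objects of that type exist:
SET LONG 100000 PAGESIZE 0
-- Save the DDL for all indexes on the table before dropping it.
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('INDEX', 'RPM_FUTURE_RETAIL_TMP') FROM dual;
-- Save the object grants on the table as well.
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('OBJECT_GRANT', 'RPM_FUTURE_RETAIL_TMP') FROM dual;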
If the sequence starts from 1, you might also consider updating the column with ROWNUM instead.
This is my 21c XE database, running on MS Windows 10, Intel i5, 8GB RAM. I have a table with ~1 million rows (I didn't feel like creating one 150 times larger):
SQL> select count(*) from rpm;
COUNT(*)
----------
1033616
Elapsed: 00:00:00.02
Updating it with a sequence takes ~12 seconds:
SQL> update rpm set id = seq.nextval;
1033616 rows updated.
Elapsed: 00:00:12.98
SQL> update rpm set id = seq.nextval;
1033616 rows updated.
Elapsed: 00:00:12.39
SQL> update rpm set id = seq.nextval;
1033616 rows updated.
Elapsed: 00:00:10.56
Let's try rownum; it takes less time (average of 3 runs is ~4 seconds):
SQL> update rpm set id = rownum;
1033616 rows updated.
Elapsed: 00:00:07.51
SQL> update rpm set id = rownum;
1033616 rows updated.
Elapsed: 00:00:02.89
SQL> update rpm set id = rownum;
1033616 rows updated.
Elapsed: 00:00:02.87
SQL>
I understand that your system is different from mine and timings depend on various things, but I guess it can't hurt to try another approach.
For future inserts into the ID column (via database trigger?), just (re)create the sequence:
SQL> select max(id) from rpm;
MAX(ID)
----------
1033616
SQL> drop sequence seq;
Sequence dropped.
SQL> create sequence seq start with 1033617;
Sequence created.
SQL>
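If you do go the trigger route, here is a minimal sketch (assuming the same rpm table and seq sequence as above; direct assignment of NEXTVAL works in 11g and later):
CREATE OR REPLACE TRIGGER rpm_bir
  BEFORE INSERT ON rpm
  FOR EACH ROW
BEGIN
  -- Populate ID from the sequence on every insert.
  :new.id := seq.NEXTVAL;
END;
/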
Related
I have an EMP table with columns
emp_id (NUMBER(10)), ename (VARCHAR2(25)) and dob (DATE).
The record count is 1 billion.
The emp_id column is totally null and I have to fill it with unique values.
What are the 3 easy steps to complete the task?
Help me with Oracle PL/SQL code to finish this task.
Only 2 steps:
ALTER TABLE emp DROP COLUMN emp_id;
ALTER TABLE emp ADD (emp_id NUMBER GENERATED ALWAYS AS IDENTITY);
db<>fiddle here
Again, 2 steps:
CREATE SEQUENCE emp__emp_id__seq;
UPDATE emp
SET emp_id = emp__emp_id__seq.NEXTVAL;
db<>fiddle here
One step:
If you have overwritten the column data then either ROLLBACK the last transaction or restore the data from backups.
The emp_id column is totally null and I have to fill it with unique values.
If you want to do it one-time-only, then just one step would do:
update emp set emp_id = rownum;
and that column will have unique values. You don't need PL/SQL (but be patient as 1 billion rows is quite a lot, it'll take time).
If you want to automatically populate it in the future, then it depends on the database version you use. Before 12c, you'll have to use a sequence and a database trigger. In later versions, you can still use the same (sequence + trigger) or - as MT0 showed - an identity column.
I am trying to use the following statement for the delete process. It has to delete around 23,566,424 rows, but Oracle takes almost 3 hours to complete, even though we have already created an index on SCHEDULE_DATE_KEY. The process is still very slow. Can someone advise on how to make deletes faster in Oracle?
DELETE
FROM
EDWSOURCE.SCHEDULE_DAY_F
WHERE
SCHEDULE_DATE_KEY >
(
SELECT
LAST_PAYROLL_DATE_KEY
FROM
EDWSOURCE.LAST_PAYROLL_DATE
WHERE
CURRENT_FLAG = 'Y'
);
I don't think any index will help here; Oracle will probably decide the best approach is a full table scan to delete 20M rows out of 300M. It is deleting at a rate of over 2000 rows per second, which isn't bad. In fact, any additional indexes will slow it down, as it has to delete the row entries from the indexes as well.
A quicker approach could be to create a new table of the rows you want to keep, something like:
create table EDWSOURCE.SCHEDULE_DAY_F_KEEP
as
select * from EDWSOURCE.SCHEDULE_DAY_F
where SCHEDULE_DATE_KEY <=
(
SELECT
LAST_PAYROLL_DATE_KEY
FROM
EDWSOURCE.LAST_PAYROLL_DATE
WHERE
CURRENT_FLAG = 'Y'
);
Then recreate any constraints and indexes to use the new table.
Finally drop the old table and rename the new one.
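The final drop/rename steps might look like this (the constraint and index DDL is omitted; adjust to your own objects):
-- After recreating constraints and indexes on the new table:
DROP TABLE EDWSOURCE.SCHEDULE_DAY_F;
ALTER TABLE EDWSOURCE.SCHEDULE_DAY_F_KEEP RENAME TO SCHEDULE_DAY_F;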
You can try testing a filtered table move. It has an ONLINE clause, so you can do this while the application is still running.
Note: in 12.2 and later the indexes will remain valid. In earlier versions you will need to rebuild the indexes, as they will become invalid. Good luck!
Move a Table
Create and populate a new test table.
DROP TABLE t1 PURGE;
CREATE TABLE t1 AS
SELECT level AS id,
'Description for ' || level AS description
FROM dual
CONNECT BY level <= 100;
COMMIT;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
100 1 100
SQL>
Move the table, filtering out rows with an ID value greater than 50.
ALTER TABLE t1 MOVE ONLINE
INCLUDING ROWS WHERE id <= 50;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
50 1 50
SQL>
The rows with an ID value between 51 and 100 have been removed.
As mentioned above, it may be best to PARTITION the table and drop a PARTITION every N days as part of a daily task, as sketched below.
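A rough sketch of that approach, using a hypothetical daily range-partitioned table (the names and dates are made up for illustration):
-- One partition per day.
CREATE TABLE schedule_day_f_part (
  schedule_date_key DATE NOT NULL,
  payload           VARCHAR2(100)
)
PARTITION BY RANGE (schedule_date_key) (
  PARTITION p20230101 VALUES LESS THAN (DATE '2023-01-02'),
  PARTITION p20230102 VALUES LESS THAN (DATE '2023-01-03'),
  PARTITION p20230103 VALUES LESS THAN (DATE '2023-01-04')
);
-- Daily task: drop the oldest partition instead of deleting rows.
-- UPDATE GLOBAL INDEXES keeps any global indexes usable.
ALTER TABLE schedule_day_f_part DROP PARTITION p20230101 UPDATE GLOBAL INDEXES;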
I'm doing some experimentation with query plans in Oracle, and I have the following table:
--create a table to use
create table SKEWED_DATA(
EMP_ID int,
DEPT int,
COL2 int,
CONSTRAINT SKEWED_DATA_PK PRIMARY KEY (EMP_ID)
);
--add an index on dept
create index SKEWED_DATA_INDEX1 on SKEWED_DATA(DEPT);
I then insert 1 million rows of data where 999,999 rows have dept id 1, and 1 row has dept id 99.
Before calculating statistics on the table, Oracle Autotrace shows that when running the following queries, it is using an index scan for both:
select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;
It's my understanding that it would be more efficient in this case to use a full table scan for dept id 1, and an index scan for dept id 99.
I then run the following command to generate statistics for the table:
execute DBMS_STATS.GATHER_TABLE_STATS ('HARRY','SKEWED_DATA');
And querying the dba_tab_statistics and user_tab_col_statistics confirms that stats and histograms have been gathered.
Running an autotrace on the following queries now shows full table scan for both!
select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;
My question is: why is Oracle using a full table scan for dept id 99 when there is only 1 row with this value?
UPDATE
I tried running the query for dept 99 with a hint to force Oracle to use the index. Whilst Autotrace believes it to be less efficient, it takes 0.001 seconds, compared to 0.03 seconds for the full table scan, thus proving (I think?) my theory that Oracle should be using the index in this instance.
select /*+ INDEX(D SKEWED_DATA_INDEX1) */ AVG(COL2) from SKEWED_DATA D where DEPT = 99;
OK, I think I might have solved it. When I had 999,999 rows with dept 1 and 1 row with dept 99, I inspected the number of histogram buckets by running the following query:
select COLUMN_NAME, HISTOGRAM, NUM_BUCKETS, NUM_DISTINCT from USER_TAB_COL_STATISTICS where TABLE_NAME = 'SKEWED_DATA';
This showed that there are 2 distinct values but only 1 bucket. If I change the stats gathering to this:
execute DBMS_STATS.GATHER_TABLE_STATS('HARRY','SKEWED_DATA',estimate_percent=>100);
It then correctly comes up with 2 buckets, and the autotrace shows the 'correct' execution plans. So, I guess it's because of the extreme 'skewness' of my data that Oracle cannot generate the correct stats for it unless the estimate_percent is massive.
Interestingly if I have slightly less skewed data (say about 2-3% of all records with a dept id of 99) Oracle does treat it correctly even when I leave the estimate_percent as default.
So, the moral of the story seems to be: if you have ridiculously skewed data like this and Oracle is not using the correct execution plan, try playing around with the estimate_percent parameter.
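As a side note (not from the original post), you can also ask DBMS_STATS for a histogram on the skewed column explicitly via method_opt, instead of raising estimate_percent; this uses the standard DBMS_STATS API with the table from this example:
BEGIN
  -- Request a histogram with up to 254 buckets on the skewed DEPT column.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'HARRY',
    tabname    => 'SKEWED_DATA',
    method_opt => 'FOR COLUMNS DEPT SIZE 254');
END;
/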
I have 5 development schemas, and each of them has partitioned tables. We also have scripts to dynamically create partition tables (monthly/yearly). We have to go to the DBA every time to gather details about the partitioned tables. Our real problem is a partitioned table with 9 partitions: every day it gets a delta load (updates/deletes via PL/SQL) as well as an APPEND load using SQL*Loader, and this happens while the database is at peak load. We have some performance issues with SELECT queries over this table.
When we report this to the DBA, they say the table statistics are stale, and after they gather stats, the query magically runs faster. I searched about this and found some information about dynamic performance views.
So now I have the following questions:
1) Can a developer generate the list of all partitioned tables, partition names and number of records available, without going to the DBA?
2) Can we identify the last-analyzed date of every partition?
3) Can we also see the status of a partition's index, i.e. whether it is usable or unusable?
Normally there is no need to identify objects that need statistics gathered. Oracle automatically gathers statistics for stale objects, unless the task has been manually disabled. This is usually good enough for OLTP systems. Use this query to find the status of the task:
select status
from dba_autotask_client
where client_name = 'auto optimizer stats collection';
STATUS
------
ENABLED
For data warehouse systems there is also not much need to query the data dictionary for stale stats. In a data warehouse, statistics need to be considered after almost every operation. Developers need to get in the habit of always thinking about statistics after a truncate, insert, swap, etc. Eventually they will "just know" when to gather statistics.
But if you still want to see how Oracle determines if statistics are stale, look at DBA_TAB_STATISTICS and DBA_TAB_MODIFICATIONS.
Here is an example of an initial load with statistics gathering. The table and partitions are not stale.
create table test1(a number, b number) partition by list(a)
(
partition p1 values (1),
partition p2 values (2)
);
insert into test1 select 1, level from dual connect by level <= 50000;
begin
dbms_stats.gather_table_stats(user, 'test1');
dbms_stats.flush_database_monitoring_info;
end;
/
select table_name, partition_name, num_rows, last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'TEST1'
order by 1, 2;
TABLE_NAME PARTITION_NAME NUM_ROWS LAST_ANALYZED STALE_STATS
---------- -------------- -------- ------------- -----------
TEST1 P1 50000 2014-01-22 NO
TEST1 P2 0 2014-01-22 NO
TEST1 50000 2014-01-22 NO
Now add a large number of rows and the statistics are stale.
begin
insert into test1 select 2, level from dual connect by level <= 25000;
commit;
dbms_stats.flush_database_monitoring_info;
end;
/
select table_name, partition_name, num_rows, last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'TEST1'
order by 1, 2;
TABLE_NAME PARTITION_NAME NUM_ROWS LAST_ANALYZED STALE_STATS
---------- -------------- -------- ------------- -----------
TEST1 P1 50000 2014-01-22 NO
TEST1 P2 0 2014-01-22 YES
TEST1 50000 2014-01-22 YES
USER_TAB_MODIFICATIONS gives more specific information on table staleness.
--Stale statistics.
select user_tables.table_name, user_tab_modifications.partition_name
,inserts+updates+deletes modified_rows, num_rows, last_analyzed
,case when num_rows = 0 then null
else (inserts+updates+deletes) / num_rows * 100 end percent_modified
from user_tab_modifications
join user_tables
on user_tab_modifications.table_name = user_tables.table_name
where user_tables.table_name = 'TEST1';
TABLE_NAME PARTITION_NAME MODIFIED_ROWS NUM_ROWS LAST_ANALYZED PERCENT_MODIFIED
---------- -------------- ------------- -------- ------------- ----------------
TEST1 P2 25000 50000 2014-01-22 50
TEST1 25000 50000 2014-01-22 50
Yes, you can generate a list of partitioned tables, and a lot of related data which you would like to see, by using ALL_PART_TABLES or USER_PART_TABLES (provided you have access).
ALL_TAB_PARTITIONS can be used to get the number of rows per partition, along with other details.
Check other views Oracle has for gathering details about partitioned tables.
I would suggest that you analyze the tables, and possibly rebuild the indexes, every day after your data load. If your data load affects a lot of records in the table and its existing indexes, it's a good idea to proactively update the statistics for the table and indexes.
You can use one of the system views to get this information (check http://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin005.htm).
I had a somewhat similar problem, and I solved it by gathering stats on stale partitions only, using the new 11g INCREMENTAL option.
It's the reverse approach to your problem, but it might be worth investigating (specifically, how Oracle determines what a "stale" partition is).
dbms_stats.set_table_prefs('DWH','FACT_TABLE','INCREMENTAL','TRUE')
I always prefer the proactive approach - meaning, gather stats on stale partitions as the last step of my ETL, rather than giving the developer stronger privileges.
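A minimal sketch of that last ETL step, assuming the same DWH.FACT_TABLE as above: with the INCREMENTAL preference set to TRUE and GRANULARITY left at AUTO, the gather only re-reads partitions that have changed and derives the global statistics from per-partition synopses.
BEGIN
  -- Only changed partitions are re-gathered; global stats come from synopses.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'DWH',
    tabname     => 'FACT_TABLE',
    granularity => 'AUTO');
END;
/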
I used to query the ALL_* views mentioned below.
The statistics and histogram details you mention are updated automatically by Oracle on a regular schedule. But when the database is busy with many loads, I have seen that these operations need to be triggered manually. We faced a similar situation, so we used to force the analyze operation after our load for critical tables. The ID you use to load the table needs the privilege to do this.
ANALYZE TABLE table_name PARTITION (partition_name) COMPUTE STATISTICS;
EDIT: ANALYZE no longer gathers CBO stats, as mentioned here.
So, DBMS_STATS package has to be used.
DBMS_STATS.GATHER_TABLE_STATS (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT to_estimate_percent_type
(get_param('ESTIMATE_PERCENT')),
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT get_param('METHOD_OPT'),
degree NUMBER DEFAULT to_degree_type(get_param('DEGREE')),
granularity VARCHAR2 DEFAULT GET_PARAM('GRANULARITY'),
cascade BOOLEAN DEFAULT to_cascade_type(get_param('CASCADE')),
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT to_no_invalidate_type (
get_param('NO_INVALIDATE')),
force BOOLEAN DEFAULT FALSE);
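For example, a typical call for a single partition might look like this (the schema, table and partition names are placeholders):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname  => 'SCHEMA1',      -- placeholder schema
    tabname  => 'MY_TABLE',     -- placeholder table
    partname => 'P_2014_01',    -- placeholder partition
    cascade  => TRUE);          -- also gather stats for the indexes
END;
/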
Until the analyze is complete, the views below may not produce accurate results (especially the LAST_ANALYZED and NUM_ROWS columns).
Note: if you have access, you can also try replacing ALL_ with DBA_ in the view names.
You can also try to get SELECT_CATALOG_ROLE for the development ID you use, so that you can SELECT from the data dictionary views; this reduces the dependency on the DBA for such queries. (Still, DBAs are the right people for a few issues!)
Query to identify the partitioned table, partition name, number of rows and last-analyzed date:
select
all_part.owner as schema_name,
all_part.table_name,
NVL(all_tab.partition_name,'N/A'),
all_tab.num_rows,
all_tab.last_analyzed
from
all_part_tables all_part,
all_tab_partitions all_tab
where all_part.table_name = all_tab.table_name and
all_part.owner = all_tab.table_owner and
all_part.owner in ('SCHEMA1','SCHEMA2','SCHEMA3')
order by all_part.table_name,all_tab.partition_name;
The query below returns the index/table names that are UNUSABLE:
SELECT INDEX_NAME,
TABLE_NAME,
STATUS
FROM ALL_INDEXES
WHERE status NOT IN ('VALID','N/A');
The query below returns the index/table (partition) names that are UNUSABLE:
SELECT INDEX_NAME,
PARTITION_NAME,
STATUS ,
GLOBAL_STATS
FROM ALL_IND_PARTITIONS
WHERE status != 'USABLE';
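Once an unusable index (or index partition) is identified, it can be rebuilt; the names below are placeholders:
-- Rebuild a whole unusable index:
ALTER INDEX my_index REBUILD;
-- Or rebuild just one unusable partition of a local index:
ALTER INDEX my_index REBUILD PARTITION p_2014_01;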
I have a sequence named WCOMP_SEQ in Oracle to generate an auto-increment column on the WCOMP table. When I insert a row into the WCOMP table in SQL*Plus, the row is inserted and I can get the auto-increment value using
SELECT WCOMP_SEQ.currval FROM dual
But when I insert a row using the Database class in CodeIgniter, the row is inserted; however, when I run the query above to get the auto-increment value I get an exception:
Exception: Undefined Index currval in E:...
How to fix this?
There is a way to get the value automatically assigned to a column: it is the RETURNING clause.
So, here is my sequence:
SQL> select emp_seq.currval from dual
2 /
CURRVAL
----------
8140
SQL>
I'm going to use it in an INSERT statement:
SQL> var seqval number
SQL> insert into emp
2 (empno, ename, deptno, sal, job)
3 values
4 (emp_seq.nextval, 'JELLEMA', 50, 4575, 'PAINTER')
5 returning empno into :seqval
6 /
1 row created.
SQL>
I returned the EMPNO into a SQL*Plus variable which I can print, and it has the same value as CURRVAL:
SQL> print :seqval
SEQVAL
----------
8141
SQL> select emp_seq.currval from dual
2 /
CURRVAL
----------
8141
SQL>
Your next question is, "does CodeIgniter support the RETURNING syntax?" I have no idea, but I suspect it does not. Most non-Oracle frameworks don't.
There is always the option to wrap the INSERT statement in a stored procedure (see the sketch below), but that's an architectural decision which many people dislike.
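A minimal sketch of such a wrapper, assuming the same EMP table and EMP_SEQ sequence as above; the caller receives the generated key through the OUT parameter:
CREATE OR REPLACE PROCEDURE insert_emp (
  p_ename  IN  emp.ename%TYPE,
  p_deptno IN  emp.deptno%TYPE,
  p_sal    IN  emp.sal%TYPE,
  p_job    IN  emp.job%TYPE,
  p_empno  OUT emp.empno%TYPE)
AS
BEGIN
  INSERT INTO emp (empno, ename, deptno, sal, job)
  VALUES (emp_seq.NEXTVAL, p_ename, p_deptno, p_sal, p_job)
  RETURNING empno INTO p_empno;   -- hand the generated key back to the caller
END insert_emp;
/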
You cannot fetch a sequence's current value in a session that has not yet issued NEXTVAL (see here). So, if you do not want to increment the sequence value (by using NEXTVAL), you could instead query USER_SEQUENCES.
Something like this:
select Sequence_Name
, Last_Number
from user_sequences
where sequence_name = 'WCOMP_SEQ'
/
SEQUENCE_NAME LAST_NUMBER
------------- -----------
WCOMP_SEQ 20
Hope this helps.
In order to get CURRVAL of a sequence, you need at least one reference to the corresponding NEXTVAL of that sequence in the current user session; that is what sets the CURRVAL value, which belongs to the session.
Using it outside such a session would defeat the purpose: which value could it return if other sessions were active?
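A quick illustration of the session rule (the error text is quoted from memory, so treat it as approximate):
-- In a brand-new session, CURRVAL is not yet defined:
SELECT wcomp_seq.CURRVAL FROM dual;
-- ORA-08002: sequence WCOMP_SEQ.CURRVAL is not yet defined in this session
-- After one NEXTVAL in this session, CURRVAL works:
SELECT wcomp_seq.NEXTVAL FROM dual;
SELECT wcomp_seq.CURRVAL FROM dual;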