I have a table (granule) with about 4 million unique geometry objects that currently have SRID = 8307.
I am trying to create a SECOND table, with the same data, but using a cartesian coordinate system.
I created the table,
create table granule_cartesian (
granule varchar(64) not null,
SHAPE sdo_geometry NOT NULL );
and inserted the proper geom_metadata:
insert into user_sdo_geom_metadata (table_name, column_name, diminfo, srid)
values ( 'GRANULE_CARTESIAN', 'SHAPE',
mdsys.sdo_dim_array(
mdsys.sdo_dim_element('longitude', -180, 180, .5),
mdsys.sdo_dim_element('latitude', -90, 90, .5)),
null);
And now I want to copy the geometry contents of granule into granule_cartesian.
Obviously, the straight copy won't work because of SRID mismatch.
I can copy a few at a time by converting to wkt and back to geometry, stripping SRID:
insert into granule_cartesian
select granule,
SDO_GEOMETRY(SDO_UTIL.TO_WKTGEOMETRY(shape), null) as shape
from granule
where platform = 'ZZ'; -- granule has a few other columns...
This works if I select a subset of the granule table that is less than ~10K rows (it takes roughly 10 minutes). Any more than 10K and the run goes on for hours, sometimes ungracefully disconnecting me.
It seems like there should be a way to do this WITHOUT doing <10K chunks. Besides taking FOREVER to actually migrate, this would pose a serious logistical nightmare on our active and dynamic production DB. I've tried using SDO_CS.TRANSFORM like this:
SDO_CS.TRANSFORM(geom => shape, to_srid => null )
... But oracle will not accept a NULL SRID here:
12:57:49 [SELECT - 0 row(s), 0.000 secs] [Error Code: 1405, SQL State: 22002] ORA-01405: fetched column value is NULL
ORA-06512: at "MDSYS.SDO_CS", line 114
ORA-06512: at "MDSYS.SDO_CS", line 152
ORA-06512: at "MDSYS.SDO_CS", line 5588
ORA-06512: at "MDSYS.SDO_CS", line 3064
SDO_CS.TRANSFORM_LAYER will refuse to accept a NULL SRID.
After extensive searching, I cannot find any method to do a streamlined geodetic -> cartesian (SRID=NULL) conversion. Does anyone have any ideas besides brute-force small batching?
EDITS
1) For clarity, I understand that I could probably break it up using PL/SQL and do 450 blocks of 10K rows. But at ~470 seconds per block, that is still 2.5 DAYS of execution. And that is a BEST case scenario. Changing projections/coordinate systems using update granule set shape.srid = 8307 is FAST and EASY. Changing coordinate system from cartesian to geodetic using insert into granule select SDO_CS.TRANSFORM(geom => shape, to_srid => 8307) ... is FAST and EASY. What I'm looking for is an equally simple/fast solution to go from geodetic to cartesian.
2) I tried to insert 300K rows as a test. It ran for approximately 10 hours and died like this:
20:06:59 [INSERT - 0 row(s), 0.000 secs] [Error Code: 4030, SQL State: 61000] ORA-04030: out of process memory when trying to allocate 8080 bytes (joxcx callheap,f:CDUnscanned)
ORA-04030: out of process memory when trying to allocate 8080 bytes (joxcx callheap,f:CDUnscanned)
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,kgmtlbdl)
ORA-06512: at "MDSYS.SDO_UTIL", line 2484
ORA-06512: at "MDSYS.SDO_UTIL", line 2511
This is a beefy enterprise-level server with nothing but Oracle on it. We recently had an Oracle consultant (from Oracle) analyze all our DB systems (including this one). It was given a clean bill of health.
Something is wrong with the database. I have geom tables with 64 million rows (every mapped road in North America - yes, Canada, US, and Mexico) in them, and I routinely perform sdo_anyinteract / sdo_contains queries and get 200-square-mile responses in less than 5 seconds.
To do this, first drop any and all indexes and turn off logging on the target table or tablespace. If you don't have the permissions, ask your DBA; the commands are:
alter table [table] nologging ; or alter tablespace [tablespace] nologging ;
That should keep you from running out of redo space, although if you are running out of redo space your DBA should fix that by adding redo log groups.
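If that is the case, a hedged example of adding a redo log group (the group number, file path, and size are placeholders, not values from this system):
-- Placeholder group number, file name, and size; adjust for your environment.
ALTER DATABASE ADD LOGFILE GROUP 4 ('/u01/oradata/ORCL/redo04.log') SIZE 1G;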
Use a cursor, because the SRID has to be set on the SDO object after you bring the geometry back in from WKT.
declare
  newGeom sdo_geometry ;
begin
  for rec in ( select statement ) loop
    -- round-trip through WKT, then set the SRID on the resulting SDO object
    newGeom := sdo_util.from_wktgeometry(sdo_util.to_wktgeometry(rec.geom));
    newGeom.sdo_srid := [srid that matches the target] ;
    insert into [table] (geom column, ... ) values ( newGeom, ... );
  end loop;
  commit ;
end ;
With 4 million rows this should happen in just a few minutes; if not, your DB is seriously out of whack.
MAKE SURE YOU WORK WITH YOUR DBA
When the process finishes, rebuild the domain indexes. That might take a couple of hours; last time I did it on 64 million rows it took 3 days. You have to understand that R-Trees are essentially indexes within indexes and use minimum bounding rectangles to get the speed, and they take a long time to build since each insert represents a traversal from the root of the index.
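A hedged sketch of that rebuild step, assuming the metadata row from the question is already in place (the index name is illustrative):
-- Recreate the domain (R-Tree) index after the bulk load.
CREATE INDEX granule_cartesian_sidx ON granule_cartesian (shape)
INDEXTYPE IS MDSYS.SPATIAL_INDEX;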
You can use things like BULK COLLECT, but that is too complex to cover here. I suggest that if you don't already have one, get an Oracle account (they are free) and ask questions like this in the Oracle Forums under Database -> Spatial.
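For the curious, a minimal BULK COLLECT / FORALL sketch of the same load, assuming all you need is to null out the SRID (the batch size is illustrative):
declare
  type t_granule is table of granule.granule%type;
  type t_shape   is table of granule.shape%type;
  v_granule t_granule;
  v_shape   t_shape;
  cursor c is select granule, shape from granule where platform = 'ZZ';
begin
  open c;
  loop
    fetch c bulk collect into v_granule, v_shape limit 10000;  -- illustrative batch size
    exit when v_granule.count = 0;
    for i in 1 .. v_shape.count loop
      v_shape(i).sdo_srid := null;  -- strip the SRID in PL/SQL, no WKT round-trip needed
    end loop;
    forall i in 1 .. v_granule.count
      insert into granule_cartesian (granule, shape) values (v_granule(i), v_shape(i));
    commit;
  end loop;
  close c;
end;
/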
BrianB,
Sorry, I just can't understand what you are trying to do with the SDO_GEOMETRY(SDO_UTIL.TO_WKTGEOMETRY(shape), null) conversion. If I get it right, the resulting geometry will have the same geometry type, points, segments and ordinates as the source shape.
So, if this is true, you can use one of these:
create table granule_cartesian (
granule varchar(64) not null,
SHAPE sdo_geometry NOT NULL );
insert into granule_cartesian
select granule, shape
from granule
where platform = 'ZZ'; -- granule has a few other columns...
update granule_cartesian t
set t.shape.sdo_srid = null;
insert into user_sdo_geom_metadata (table_name, column_name, diminfo, srid)
values ( 'GRANULE_CARTESIAN', 'SHAPE',
mdsys.sdo_dim_array(
mdsys.sdo_dim_element('longitude', -180, 180, .5),
mdsys.sdo_dim_element('latitude', -90, 90, .5)),
null); -- add metadata after all rows are updated to null srid
or, if for some reason you hate to insert and then update, there is another way:
insert into granule_cartesian
select granule, mdsys.sdo_geometry (t.shape.SDO_GTYPE, null, t.shape.SDO_POINT, t.shape.SDO_ELEM_INFO, t.shape.SDO_ORDINATES)
from granule t
where platform = 'ZZ'; -- granule has a few other columns...
In that case you can have a row in the user_sdo_geom_metadata table and even a spatial index in place before you insert rows into granule_cartesian.
hth. good luck.
Related
I have the following table
Create table my_source(
ID number(15) not null,
Col_1 Varchar2(3000),
Col_2 Varchar2(3000),
Col_3 Varchar2(3000),
Col_4 Varchar2(3000),
Col_5 Varchar2(3000),
...
Col_90 Varchar2(3000)
);
This table has 6,926,220 rows.
Now I am going to create two tables based on this table.
Target1:
Create table el_temp as
select
id,
Col_1,
Col_2,
Col_3,
Col_4,
Col_5,
Col_6,
Col_7,
Col_8,
Col_9,
Col_10,
Col_11,
Col_12
from
my_source;
Target2:
Create table el_temp2 as
select DISTINCT
id,
Col_1,
Col_2,
Col_3,
Col_4,
Col_5,
Col_6,
Col_7,
Col_8,
Col_9,
Col_10,
Col_11,
Col_12
from
my_source;
select count(*) from el_temp; -- 6926220
select count(*) from el_temp2; --6880832
The only difference between el_temp and el_temp2 is the "distinct" operator.
Now I got the following result from SQL Developer:
It is a surprising result to me that EL_TEMP, the one with more rows, has a smaller size, while EL_TEMP2 has fewer rows but a bigger size.
Could anyone explain the reason and how to avoid this?
Thanks in advance!
The most likely cause is that the table has undergone some updates to existing rows over its lifetime.
By default, when you create a table, we reserve 10% of the space in each block for rows to grow (due to updates). As updates occur, that space is used up, so your blocks might be (on average) around 95% full.
When you do a "create table as select" from that table into another, we will take those blocks and pad them out again to 10% free space, thus making the new table slightly larger.
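If the copy is read-mostly and you just want to avoid that padding, one hedged option is to set PCTFREE to 0 on the new table (only appropriate if its rows will not grow through later updates):
-- PCTFREE 0 leaves no per-block free space for row growth; use only for read-mostly copies.
create table el_temp2 pctfree 0 as
select distinct id, col_1, col_2, col_3, col_4, col_5, col_6,
                col_7, col_8, col_9, col_10, col_11, col_12
from my_source;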
If PCTFREE etc is unfamiliar to you, I've also got a tutorial video to get you started here
https://youtu.be/aOZMp5mncqA
I have a requirement to insert a huge amount of data (50 GB of random data) into my database, so that I can use a backup application to check the de-duplication ratio. I have written a small procedure like the one below.
This is taking more than 1 hour. I don't know how to improve the performance so that I get good throughput for the insert statements. I have set SGA as 16GB.
I am a newbie to Oracle. I do not know how to set up parallelism to optimize my procedure and get good throughput. Please help.
alter session force parallel query parallel 4;
create table table_1(
col1 varchar2(400),
-- 50 columns like this
col50 varchar2(400));
create table table_2(
col1 varchar2(400),
-- 50 columns like this
col50 varchar2(400));
create table table_3(
col1 varchar2(400),
-- 50 columns like this
col50 varchar2(400));
create table table_4(
col1 varchar2(400),
-- 50 columns like this
col50 varchar2(400));
My insert script:
Declare
rows_inserted number := 0;
Begin
Loop
Begin
INSERT INTO table_1(COL1, ..COL50)
VALUES(dbms_random.string('L', 400),..for all 50 values);
INSERT INTO table_2(COL1, ..COL50)
VALUES(dbms_random.string('L', 400),..for all 50 values);
INSERT INTO table_3(COL1, ..COL50)
VALUES(dbms_random.string('L', 400),..for all 50 values);
INSERT INTO table_4(COL1, ..COL50)
VALUES(dbms_random.string('L', 400),..for all 50 values);
--Only increment counter when no duplicate exception
rows_inserted := rows_inserted + 1;
--Exception When DUP_VAL_ON_INDEX Then Null;
End;
exit when rows_inserted = 10000;
End loop;
commit;
End;
/
I have tried this procedure on Oracle12c, which is installed on rhel 7 VM. The Vm has 32 GB memory and 20GB swap memory and 16 vcpus.
It's taking more than 1 hour and it's still running. How do I implement parallelism and optimize the above procedure to get a good throughput rate?
You're doing single-row inserts inside a loop: that's a very slow way of doing things. SQL is a set-based language, and set operations are the most performant way of doing bulk operations. Also, you're relying on random data to provide duplicates. Be in control of it and guarantee the ratios. Besides, how can you get DUP_VAL_ON_INDEX when your tables have no unique keys? (And if they did, you wouldn't be able to insert the duplicates you want for your experiment.)
A better approach would be to use bulk sql:
INSERT INTO table_1(COL1, COL50)
select dbms_random.string('L', 400), dbms_random.string('L', 400)
from dual
connect by level <= 10000
/
INSERT INTO table_1(COL1, COL50)
select *
from table_1
where rownum <= 1000
/
This will give you 11000 rows in table_1, 1000 of which are duplicates. Repeat the second insertion to increase the number of duplicates.
There should be no need for parallelism.
All I want now is good throughput that can insert 50 GB of data within 30 minutes, with or without parallelism.
However, this new piece of information changes my assessment. The simplest way to run this in parallel is to build separate routines for each table and run each in a separate session.
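One hedged way to do that without opening four separate client sessions is to submit one scheduler job per table. The job names and the per-table load_table_N procedures below are assumptions; each such procedure would wrap the set-based INSERT for its own table:
BEGIN
  FOR i IN 1 .. 4 LOOP
    DBMS_SCHEDULER.create_job(
      job_name   => 'LOAD_TABLE_' || i,                    -- illustrative job name
      job_type   => 'PLSQL_BLOCK',
      job_action => 'BEGIN load_table_' || i || '; END;',  -- assumed per-table procedure
      enabled    => TRUE);                                  -- each job runs in its own session
  END LOOP;
END;
/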
I'm working on an Oracle database and have to alter the table 'RETURNS' and add the columns RENTAL_SALES and INBOUND_SALES.
ALTER TABLE
RETURNS
ADD(
RENTAL_SALES NUMBER (14,2) NULL,
INBOUND_SALES NUMBER (14,2) NULL
);
How do I set the Histogram to "Yes"?
Run the stats gathering using method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 {column name on which you want to enable the histogram}'.
Check whether it is enabled or not:
select column_name, histogram
from user_tab_col_statistics where table_name = 'TABLENAME';
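For the RETURNS table from the question, a minimal sketch might look like this (gathering in the current schema, targeting the RENTAL_SALES column):
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'RETURNS',
    method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 RENTAL_SALES');
END;
/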
Why do you need to use histograms? Are you facing bad query plans?
There are several types of histograms, and the type is assigned depending on the number of distinct values:
frequency (and top-frequency) histograms, height-balanced histograms and hybrid histograms.
The database will assign a histogram if you gather statistics automatically, then query the tables (column usage is recorded in SYS.COL_USAGE$), and then gather statistics again:
BEGIN
dbms_stats.Gather_table_stats('SCHEMA_NAME', 'TABLE',
method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/
select * from TABLE where ....
BEGIN
dbms_stats.Gather_table_stats('SCHEMA_NAME', 'TABLE',
method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/
Note: if you already created an index before, or you already updated statistics and have been querying the table, updating the statistics again will create the histogram.
Another note: method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 column_name' forces a height-balanced histogram on that column, but maybe the column needs a frequency histogram instead. So if you don't know the NDV and how the data is distributed, it is better to let the database choose, or you might end up with a bad query plan. The remaining columns will not get histograms, because FOR ALL COLUMNS SIZE 1 collects only base column statistics.
Table myfirst3 has 4 columns and 1.2 million records.
Table mtl_object_genealogy has over 10 million records.
Running the code below takes a very long time. How can I tune this code that uses a WITH clause?
WITH level1 as (
SELECT mln_parent.lot_number,
mln_parent.inventory_item_id,
gen.lot_num ,--fg_lot,
gen.segment1,
gen.rcv_date
FROM mtl_lot_numbers mln_parent,
(SELECT MOG1.parent_object_id,
p.segment1,
p.lot_num,
p.rcv_date
FROM mtl_object_genealogy MOG1 ,
myfirst3 p
START WITH MOG1.object_id = p.gen_object_id
AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
CONNECT BY nocycle PRIOR MOG1.parent_object_id = MOG1.object_id
AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
UNION all
SELECT p1.gen_object_id,
p1.segment1,
p1.lot_num,
p1.rcv_date
FROM myfirst3 p1 ) gen
WHERE mln_parent.gen_object_id = gen.parent_object_id )
select /*+ NO_CPU_COSTING */ *
from level1;
execution plan
CREATE TABLE APPS.MYFIRST3
(
TO_ORGANIZATION_ID NUMBER,
LOT_NUM VARCHAR2(80 BYTE),
ITEM_ID NUMBER,
FROM_ORGANIZATION_ID NUMBER,
GEN_OBJECT_ID NUMBER,
SEGMENT1 VARCHAR2(40 BYTE),
RCV_DATE DATE
);
CREATE TABLE INV.MTL_OBJECT_GENEALOGY
(
OBJECT_ID NUMBER NOT NULL,
OBJECT_TYPE NUMBER NOT NULL,
PARENT_OBJECT_ID NUMBER NOT NULL,
START_DATE_ACTIVE DATE NOT NULL,
END_DATE_ACTIVE DATE,
GENEALOGY_ORIGIN NUMBER,
ORIGIN_TXN_ID NUMBER,
GENEALOGY_TYPE NUMBER
);
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N1 ON INV.MTL_OBJECT_GENEALOGY(OBJECT_ID);
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N2 ON INV.MTL_OBJECT_GENEALOGY(PARENT_OBJECT_ID);
Your explain plan shows some very big numbers. The optimizer reckons the final result set will be about 3,227,000,000,000 rows. Just returning that many rows will take some time.
All table accesses are full table scans. As you have big tables, that will eat time too.
As for improvements, it's pretty hard for us to understand the logic of your query. This is your data model, your business rules, your data. You haven't explained anything, so all we can do is guess.
Why are you using the WITH clause? You only use the level1 result set once, so just have a regular FROM clause.
Why are you using UNION ALL? That operation just duplicates the records retrieved from myfirst3 (all those values are already included as rows where MOG1.object_id = p.gen_object_id).
The MERGE JOIN CARTESIAN operation is interesting. Oracle uses it to implement the transitive closure. It is an expensive operation, but that's because tree-walking a hierarchy is an expensive thing to do. It is unfortunate for you that you are generating all the parent-child relationships for a table with 27 million records. That's bad.
The full table scans aren't the problem. There are no filters on myfirst3, so obviously the database has to get all the records. If there were one parent for each myfirst3 record, that would be about 10% of the contents of mtl_object_genealogy, so a full table scan would be efficient; but you're rolling up the entire hierarchy, so it's like you're looking at a much greater chunk of the table.
Your indexes are irrelevant in the face of such numbers. What might help is a composite index on mtl_object_genealogy(OBJECT_ID, PARENT_OBJECT_ID, END_DATE_ACTIVE).
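A hedged sketch of that index (the index name is illustrative, following the naming style of the existing ones):
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N3 ON INV.MTL_OBJECT_GENEALOGY(OBJECT_ID, PARENT_OBJECT_ID, END_DATE_ACTIVE);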
You want all the levels of PARENT_OBJECT_ID for the records in myfirst3. If you run this query often and mtl_object_genealogy is a slowly changing table you should consider materializing the transitive closure into a table which just has records for all the permutations of leaf records and parents.
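A hedged sketch of materializing that closure (the table name mog_closure is illustrative; rebuild it whenever mtl_object_genealogy changes):
-- For every leaf object referenced by myfirst3, record each of its ancestors.
CREATE TABLE mog_closure AS
SELECT CONNECT_BY_ROOT object_id AS leaf_object_id,
       parent_object_id
FROM   mtl_object_genealogy
START WITH object_id IN (SELECT gen_object_id FROM myfirst3)
       AND (end_date_active IS NULL OR end_date_active > SYSDATE)
CONNECT BY NOCYCLE PRIOR parent_object_id = object_id
       AND (end_date_active IS NULL OR end_date_active > SYSDATE);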
To sum up:
Ditch the WITH clause
Drop the UNION ALL
Tune the tree-walk with a composite index (or materializing it)
I have the following table
CREATE TABLE "METRIC_VALUE_RAW"
(
"SUBELEMENT_ID" INTEGER NOT NULL ,
"METRIC_METADATA_ID" INTEGER NOT NULL ,
"METRIC_VALUE_INT" INTEGER,
"METRIC_VALUE_FLOAT" FLOAT(126),
"TIME_STAMP" TIMESTAMP NOT NULL
) ;
Every hour data will be loaded into the table using sql loader.
I want to create partitions so that each day's data goes into its own partition.
I want to store 30 days of data in the table, so when it crosses 30 days the oldest partition should get deleted.
Can you share your ideas on how I can design the partitions?
Here is an example of how to do it on Oracle 11g, and it works very well. I haven't tried it on Oracle 10g; you can try it.
This is how to create a table with daily partitions:
CREATE TABLE XXX (
partition_date DATE,
...,
...,
)
PARTITION BY RANGE (partition_date)
INTERVAL (NUMTODSINTERVAL(1, 'day'))
(
PARTITION part_01 values LESS THAN (TO_DATE('2000-01-01','YYYY-MM-DD'))
)
TABLESPACE MY_TABLESPACE
NOLOGGING;
As you can see above, Oracle will automatically create a separate partition for each distinct partition_date after 1st January 2000. Records whose partition_date is older than this date will be stored in the partition called 'part_01'.
You can monitor your table partitions using this statement:
SELECT * FROM user_tab_partitions WHERE table_name = 'XXX';
Afterwards, when you would like to delete some partitions, use following command:
ALTER TABLE XXX DROP PARTITION AAAAAA UPDATE GLOBAL INDEXES
where 'AAAAAA' is the partition name.
I hope it will help you!
As I said, there are big differences in partition automation between 10g and 11g.
In 10g you will have to manually manage the partitions during your ETL process (I'm sure every 10g DBA has a utility package he wrote to manage partitions...).
For steps 1 & 2, you have several options:
load data directly into the daily partition.
load data into a new partition and merge it into the daily one.
load data into a new partition every hour, and during a maintenance window merge all hourly partitions into a daily partition.
The right way for you depends on your needs. Is the newly added data queried immediately? In what manner? Would you query for data across several hours (or loads)? Are you showing aggregations? Are you performing DML operations on the data (DDL operations on partitions cause massive locking)?
About step 3: again, manually drop old partitions (a sketch follows below).
In 11g, you have the new interval partitioning feature, which automates some of the tasks mentioned above.
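For the 30-day cleanup, a hedged sketch against the interval-partitioned XXX table from the earlier answer (HIGH_VALUE is a LONG holding a date expression, hence the EXECUTE IMMEDIATE trick to evaluate it):
DECLARE
  v_high_value DATE;
BEGIN
  FOR p IN (SELECT partition_name, high_value
            FROM   user_tab_partitions
            WHERE  table_name = 'XXX'
            AND    interval = 'YES')   -- skip the anchor partition part_01
  LOOP
    -- Evaluate the partition's upper bound so it can be compared with SYSDATE.
    EXECUTE IMMEDIATE 'SELECT ' || p.high_value || ' FROM dual' INTO v_high_value;
    IF v_high_value < TRUNC(SYSDATE) - 30 THEN
      EXECUTE IMMEDIATE 'ALTER TABLE XXX DROP PARTITION ' || p.partition_name ||
                        ' UPDATE GLOBAL INDEXES';
    END IF;
  END LOOP;
END;
/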
Following is a sample CREATE TABLE statement to partition the data:
CREATE TABLE quarterly_report_status (
report_id NUMBER NOT NULL,
report_status VARCHAR2(20) NOT NULL,
report_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
)
PARTITION BY RANGE ( report_updated ) (
PARTITION p0 VALUES LESS THAN ( TIMESTAMP '2008-01-01 00:00:00' ),
PARTITION p1 VALUES LESS THAN ( TIMESTAMP '2008-01-02 00:00:00' ),
PARTITION p2 VALUES LESS THAN ( TIMESTAMP '2008-01-03 00:00:00' ),
PARTITION p3 VALUES LESS THAN ( TIMESTAMP '2008-01-04 00:00:00' ),
PARTITION p4 VALUES LESS THAN ( TIMESTAMP '2008-01-05 00:00:00' ),
PARTITION p5 VALUES LESS THAN ( TIMESTAMP '2008-01-06 00:00:00' ),
PARTITION p6 VALUES LESS THAN ( TIMESTAMP '2008-01-07 00:00:00' ),
PARTITION p7 VALUES LESS THAN ( TIMESTAMP '2008-01-08 00:00:00' ),
PARTITION p8 VALUES LESS THAN ( TIMESTAMP '2008-01-09 00:00:00' ),
PARTITION p9 VALUES LESS THAN ( MAXVALUE )
);
Partitions will be created by the DBA and the rest will be taken care of by Oracle.
If you want to delete partitions, you will have to write separate jobs for it.