Insert huge data into Oracle DB - oracle

My setup - Oracle DB 12.1C, Spring application with Hibernate.
table:
create table war
(
id int generated by default as identity not null constraint wars_pkey primary key,
t1_id int references t1 (id) on delete cascade not null,
t2_id int references t2 (id) on delete cascade not null,
day timestamp not null,
diff int not null
);
I would like to insert 10 000 records into table. With repository.saveAll(<data>) it takes 70s, with using JpaTemplate.batchUpdate(<insert statements>) it takes 68s. When I create new temporary table without constraints it takes 65s.
What is the best/fastest way, how to insert this amount of records into Oracle DB? Unfortunately CSV is not an option.

My solution was to redesign our model - we used int_array for storing diff -> this was almost 10 times faster than first solution.

Related

How to implement bidirectional relationship in Spring Spanner?

In code, I tried with #Interleaved in 1-many relationship at non-owning side to get child list. Could anyone help with below questions:
How to implement bidirectional relationship e.g. get parent from child for 1-1, 1-many relationship
Regarding many-many relationship, what are best practices to implement it and how to implement bidirectional relationship for it.
Thank you very much.
Cloud Spanner currently doesn't offer a way to enforce foreign-key constraints between non-interleaved tables. You will have to enforce such constraints in your application logic. You could use DML statements in Cloud Spanner(that come with the ability to read-your-writes in a Cloud Spanner transaction) to enforce these constraints at insert time by inserting into your tables as follows:
INSERT INTO Referenced(key1,value1) VALUES ('Referenced','Value1');
INSERT INTO Referencing(key2, value2, key1)
SELECT 'Referencing', 'Value2', key1 FROM Referenced WHERE
key1 = 'Referenced';
Running the two statements in a read-write transaction will ensure that the PK-FK relationship between the Referenced and Referencing table is always maintained at insert time. You may have to similarly modify update requests/SQL update statements in your application logic to enforce the PK-FK constraint for updates.
For a 1-many relationship, when using interleaved tables, then the child row's primary key already contains the primary key of its parent, so it is trivial to get the parent row.
CREATE TABLE parent (
parent_key INT64 NOT NULL,
...
) PRIMARY KEY (parent_key);
CREATE TABLE child (
parent_key INT64 NOT NULL,
child_key INT64 NOT NULL,
...
) PRIMARY KEY (parent_key, child_key),
INTERLEAVE IN PARENT parent ON DELETE CASCADE;
If for some reason you do not have the key of the parent, and only the key of the child, then for efficiency you would need to create an index for the reverse lookup:
CREATE INDEX child_to_parent_index
ON child (
child_key
);
and force use of that index when performing the query for the parent:
SELECT
p.*
FROM
parent as p
JOIN
child#{FORCE_INDEX=child_by_id_index} AS c ON p.parent_key = c.parent_key
WHERE
c.child_key = #CHILD_KEY_VALUE;
Many-many relationships would have to be implemented using a 'mapping' table linking table1-key to table2-key.
You will also need a top-level index to get efficient reverse-lookups, and use the FORCE_INDEX directive as above in your queries.
And as #adi mentioned, foreign key constraints would have to be enforced by the application.
CREATE TABLE table1 (
table1_key INT64 NOT NULL,
...
) PRIMARY KEY (table1_key);
CREATE TABLE table2 (
table2_key INT64 NOT NULL,
...
) PRIMARY KEY (table2_key);
CREATE TABLE table1_table2_map (
table1_key INT64 NOT NULL,
table2_key INT64 NOT NULL,
) PRIMARY KEY (table1_key, table2_key);
CREATE INDEX table2_table1_map_index
ON table1_table2_map (
table2_key
) STORING (
table1_key
);
Your application would be responsible for keeping the referential integrity of the mapping table - deleting the mapping rows when rows in table1 or table2 are deleted
If you want to use interleaved tables, then if your application needs to perform bi-directional lookups, you may have to create 2 mapping tables - as a child of each parent, so that finding the mappings from both directions are equally efficient.
CREATE TABLE table1 (
table1_key INT64 NOT NULL,
...
) PRIMARY KEY (table1_key);
CREATE TABLE table2 (
table2_key INT64 NOT NULL,
...
) PRIMARY KEY (table2_key);
CREATE TABLE table1_table2_map (
table1_key INT64 NOT NULL,
table2_key INT64 NOT NULL,
) PRIMARY KEY (table1_key, table2_key),
INTERLEAVE IN PARENT table1 ON DELETE CASCADE;
CREATE TABLE table2_table1_map (
table2_key INT64 NOT NULL,
table1_key INT64 NOT NULL,
) PRIMARY KEY (table2_key, table1_key),
INTERLEAVE IN PARENT table2 ON DELETE CASCADE;
Note that the application needs to keep both of these mapping tables up to date -- ie when deleting a row from table1, the application has to get the referenced table2_key values and delete the mappings from the table2_table1_map (and vice versa).

Partitioning a table refenceing other

I have a target table T1 which doesn't have a Date field. The current size is increasing rapidly. Hence I need to add a Date field and also perform table partitioning on this target table.
T1 has PRIMARY KEY (DOCID, LABID)
and has CONSTRAINT FOREIGN KEY (DOCID) REFERENCES T2
Table T2 is also complex table and has many rules in it.
T2 has PRIMARY KEY (DOCID)
My question is, as I need to partition T1. Is it possible NOT TO perform any step for T2 before T1 is partition? DBA told me that I need to partition T2 first before I touch T1??
There is no need to partition T2 before partitioning T1. Foreign key constraints do not care in the slightest bit about partitioning.
Best of luck.
You have – as proposed by others - two options. The first one is to add a redundant column DATE to the table T1 (child) and introduce the range partitioning on this column.
The second option is to use reference partitioning. Below is the simplified DDL for those options.
Range partitioning on child
create table T2_P2 /* parent */
(docid number not null,
trans_date date not null,
pad varchar2(100),
CONSTRAINT t2_p2_pk PRIMARY KEY(docid)
);
create table T1_P2 /* child */
(docid number not null,
labid number not null,
trans_date date not null, /** redundant column **/
pad varchar2(100),
CONSTRAINT t1_p2_pk PRIMARY KEY(docid, labid),
CONSTRAINT t1_p2_fk
FOREIGN KEY(docid) REFERENCES T2_P2(docid)
)
PARTITION BY RANGE(trans_date)
( PARTITION Q1_2016 VALUES LESS THAN (TO_DATE('01-APR-2016','DD-MON-YYYY'))
);
Reference partition
create table T2_RP /* parent */
(docid number not null,
trans_date date not null,
pad varchar2(100),
CONSTRAINT t2_rp_pk PRIMARY KEY(docid)
)
PARTITION BY RANGE(trans_date)
( PARTITION Q1_2016 VALUES LESS THAN (TO_DATE('01-APR-2016','DD-MON-YYYY'))
);
create table T1_RP /* child */
(docid number not null,
labid number not null,
pad varchar2(100),
CONSTRAINT t1_rp_pk PRIMARY KEY(docid, labid),
CONSTRAINT t1_rp_fk
FOREIGN KEY(docid) REFERENCES T2_RP(docid)
)
PARTITION BY REFERENCE(t1_rp_fk);
Your question is basically if the first option is possible, so the answer is YES.
To decide if the first option is preferable I’d suggest checking three criteria:
Migration
The first option requires a new DATE column in the child table that must be initialized during the migration (and of course correct maintained by the application).
Lifecycle
It could be that the lifecycle of both tables is the same (e.g. both parent and child records are kept for 7 year). In this case is preferable if both tables are partitioned (on the same key).
Access
For queries such as below you will profit from the reference partitioning (both tables are pruned - i.e. only the partitions with the accessed date are queried).
select * from T2_RP T2 join T1_RP T1 on t2.docid = t1.docid
where t2.trans_date = to_date('01012016','ddmmyyyy');
In the first option you will (probalbly) end with FTS on T2 and to get pruning on T1 you need to add predicate T2.trans_date = t1.trans_date
Having said that, I do not claim that the reference partition is superior. But I think it's worth to examine both options in your context and see which one is better.

Materialized view data doesn't update

I want to create a materialized view with fast refresh. The view aggregates values from a single table:
CREATE TABLE N_INSP_DTSEDIF_PLANTAS (
IMPORTACION_ID NUMBER(*,0) NOT NULL,
ID NUMBER(10) NOT NULL,
INSPECCION_ID NUMBER(10) NOT NULL,
NOMBRE_PLANTA VARCHAR2(255 CHAR),
NUM_VIVIENDAS NUMBER(10),
SUP_CONSTRUIDA_VIVIENDAS DECIMAL(10,4),
-- Plus some other columns I don't need
CONSTRAINT N_INSP_DTSEDIF_PLANTAS_P PRIMARY KEY (
IMPORTACION_ID,
ID
) ENABLE,
CONSTRAINT N_INSP_DTSEDIF_PLANTAS_F FOREIGN KEY (IMPORTACION_ID)
REFERENCES IMPORTACION (IMPORTACION_ID)
ON DELETE CASCADE
ENABLE
);
CREATE INDEX N_INSP_DTSEDIF_PLANTAS_X ON N_INSP_DTSEDIF_PLANTAS (IMPORTACION_ID);
CREATE SEQUENCE N_INSP_DTSEDIF_PLANTAS_S
INCREMENT BY 1
START WITH 1
MINVALUE 1
CACHE 20;
CREATE OR REPLACE TRIGGER N_INSP_DTSEDIF_PLANTAS_T
BEFORE INSERT
ON N_INSP_DTSEDIF_PLANTAS
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
BEGIN
IF :NEW.ID IS NULL THEN
SELECT N_INSP_DTSEDIF_PLANTAS_S.NEXTVAL INTO :NEW.ID FROM DUAL;
END IF;
END N_INSP_DTSEDIF_PLANTAS_T;
/
ALTER TRIGGER N_INSP_DTSEDIF_PLANTAS_T ENABLE;
I've composed this through trial and error:
CREATE MATERIALIZED VIEW LOG ON N_INSP_DTSEDIF_PLANTAS
WITH ROWID, SEQUENCE (IMPORTACION_ID, INSPECCION_ID, NUM_VIVIENDAS, SUP_CONSTRUIDA_VIVIENDAS)
INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW V_PLANTAS
REFRESH FAST
AS
SELECT IMPORTACION_ID, INSPECCION_ID,
SUM(NUM_VIVIENDAS) AS NUM_VIVIENDAS, SUM(SUP_CONSTRUIDA_VIVIENDAS) AS SUP_CONSTRUIDA_VIVIENDAS
FROM N_INSP_DTSEDIF_PLANTAS
GROUP BY IMPORTACION_ID, INSPECCION_ID;
Objects get created without errors and SELECT * FROM V_PLANTAS returns data. However, the view is stalled. New rows added to N_INSP_DTSEDIF_PLANTAS don't show up at V_PLANTAS.
What did I misunderstand from the documentation?
In the mess of random changes that follow panic and despair I inadvertently dropped the ON COMMIT clause:
CREATE MATERIALIZED VIEW V_PLANTAS
REFRESH FAST ON COMMIT
AS
-- ...
The log itself is also invalid for fast refresh because I also omitted the PRIMARY KEY clause. It should be like:
CREATE MATERIALIZED VIEW LOG ON N_INSP_DTSEDIF_PLANTAS
WITH ROWID, PRIMARY KEY, SEQUENCE (INSPECCION_ID, NUM_VIVIENDAS, SUP_CONSTRUIDA_VIVIENDAS)
INCLUDING NEW VALUES;
(Said that, it's worth noting that materialized tables are not just a simple results cache but a fairly large and complex feature that requires careful planning and maintenance. In many situations is easier to just optimize the underlying query.)

Trigger for incrementing a date and inserting into another table

I want to create a trigger, in Oracle. When the dateOrdReceived in my order table is updated or inserted the trigger takes this date whatever it may be and updates it by 14 days into another table productList ordDateDelivery so that it equals to
dateOrdReceived + 14 days = new ordDateDelivery
I did have a couple of attempts and guessed I'd need a query which would join my two tables. I also learned that maybe using DATEADD would allow me add 14 days but altogether I can't quite get it right.
My trigger attempt
`CREATE OR REPLACE TRIGGER "PRODUCTLIST_DATE_DELIVERY"
BEFORE
insert or update on "PRODUCTLIST"
for each row
begin
select p.dateOrdRecieved, o.ordDateDelivery
from productList p JOIN orders o
ON p.ordID = o.ordID;
new.OrdDateDelivery := DATEADD(day,14,new.p.dateOrdRecieved)
end;
/
ALTER TRIGGER "PRODUCTLIST_DELIVERY_DATE" ENABLE
and my tables for this trigger are as follows
PRODUCTLIST TABLE
CREATE TABLE "PRODUCTLIST"
( "ORDID" NUMBER(3,0) NOT NULL ENABLE,
"PRODUCTID" NUMBER(3,0) NOT NULL ENABLE,
"QUANTITY" NUMBER(4,2) NOT NULL ENABLE,
"ORDDATEDELIVERY" DATE,
"DISCOUNT" NUMBER(3,0),
"TOTALCOST" NUMBER(4,2),
CONSTRAINT "PK_PRODUCTLIST" PRIMARY KEY ("ORDID", "PRODUCTID") ENABLE
)
/
ALTER TABLE "PRODUCTLIST" ADD CONSTRAINT "FK_ORDERS" FOREIGN KEY ("ORDID")
REFERENCES "ORDERS" ("ORDID") ENABLE
/
ALTER TABLE "PRODUCTLIST" ADD CONSTRAINT "FK_PRODUCTS" FOREIGN KEY ("PRODUCTID")
REFERENCES "PRODUCT" ("PRODUCTID") ENABLE
/
ORDERS TABLE
CREATE TABLE "ORDERS"
( "ORDID" NUMBER(3,0) NOT NULL ENABLE,
"DATEORDRECIEVED" DATE,
"CUSID" NUMBER(3,0) NOT NULL ENABLE,
PRIMARY KEY ("ORDID") ENABLE
)
/
ALTER TABLE "ORDERS" ADD CONSTRAINT "FK_CUSTOMER" FOREIGN KEY ("CUSID")
REFERENCES "CUSTOMER" ("CUSID") ENABLE
/
DATEADD() is not an Oracle function... Oracle's datetime arithmetic is based around the day. If you add 1 to a date it increments the date by one day, adding 1.5 by 36 hours etc.
Now, your trigger.
You can't automatically update or insert a record into another table. The trigger is "on" one table, which means you need to create the DML in order to add or update it into that table.
update productlist
set dateOrdRecieved = :new.OrdDateDelivery + 14
where ordid = :new.ordid
The :new. here references the new data of the table on which the trigger is on. It's a specific "variable" that you can access rather than a general concept of what you're trying to achieve. You can't use it to assign data to other tables directly, though you can use it as a means of doing so.
Next you need to consider where your trigger is. You're looking to update PRODUCTLIST whenever ORDERS is changed, this means that the trigger needs to be on the table ORDERS.
create or replace trigger productlist_date_delivery
before insert or update on orders
for each row
begin
update productlist
set OrdDateDelivery = :new.dateOrdRecieved + 14
where ordid = :new.ordid;
end;
/
Notice a few extra differences to your own:
I use :new. instead of new.
I'm not selecting from the table; there's no need to do this as the data is already available. It's also impossible as you're selecting data that Oracle's trying to update, it forbids this to ensure integrity.
I haven't used cased identifiers. There's no need to do this; Oracle upper-cases everything by default. It's also really painful if everything's not upper case as you have to remember
Every statement ends in a semi-colon.
If you're having problems I recommend Tech on the Net, it has a good basic guide. As always though, there's the documentation on the CREATE TRIGGER statement.

ON DELETE CACADE is very slow

I am using Postgres 8.4. My system configuration is window 7 32 bit 4 gb ram and 2.5ghz.
I have a database in Postgres with 10 tables t1, t2, t3, t4, t5.....t10.
t1 has a primary key a sequence id which is a foreign key reference to all other tables.
The data is inserted in database (i.e. in all tables) apart from t1 all other tables have nearly 50,000 rows of data but t1 has one 1 row whose primary key is referenced from all other tables. Then I insert the 2nd row of data in t1 and again 50,000 rows with this new reference in other tables.
The issue is when I want to delete all the data entries that are present in other tables:
delete from t1 where column1='1'
This query takes nearly 10 min to execute.
I created indexes also and tried but the performance is not at all improving.
what can be done?
I have mentioned a sample schema below
CREATE TABLE t1
(
c1 numeric(9,0) NOT NULL,
c2 character varying(256) NOT NULL,
c3ver numeric(4,0) NOT NULL,
dmlastupdatedate timestamp with time zone NOT NULL,
CONSTRAINT t1_pkey PRIMARY KEY (c1),
CONSTRAINT t1_c1_c2_key UNIQUE (c2)
);
CREATE TABLE t2
(
c1 character varying(100),
c2 character varying(100),
c3 numeric(9,0) NOT NULL,
c4 numeric(9,0) NOT NULL,
tver numeric(4,0) NOT NULL,
dmlastupdatedate timestamp with time zone NOT NULL,
CONSTRAINT t2_pkey PRIMARY KEY (c3),
CONSTRAINT t2_fk FOREIGN KEY (c4)
REFERENCES t1 (c1) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT t2_c3_c4_key UNIQUE (c3, c4)
);
CREATE INDEX t2_index ON t2 USING btree (c4);
Let me know if there is anything wrong with the schema.
With bigger tables and more than just two or three values, you need an index on the referenced column (t1.c1) as well as the referencing columns (t2.c4, ...).
But if your description is accurate, that can not be the cause of the performance problem in your scenario. Since you have only 2 distinct values in t1, there is just no use for an index. A sequential scan will be faster.
Anyway, I re-enacted what you describe in Postgres 9.1.9
CREATE TABLE t1
( c1 numeric(9,0) PRIMARY KEY,
c2 character varying(256) NOT NULL,
c3ver numeric(4,0) NOT NULL,
dmlastupdatedate timestamptz NOT NULL,
CONSTRAINT t1_uni_key UNIQUE (c2)
);
CREATE temp TABLE t2
( c1 character varying(100),
c2 character varying(100),
c3 numeric(9,0) PRIMARY KEY,
c4 numeric(9,0) NOT NULL,
tver numeric(4,0) NOT NULL,
dmlastupdatedate timestamptz NOT NULL,
CONSTRAINT t2_uni_key UNIQUE (c3, c4),
CONSTRAINT t2_c4_fk FOREIGN KEY (c4)
REFERENCES t1(c1) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
);
INSERT INTO t1 VALUES
(1,'OZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf', 234, now())
,(2,'agdsOZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf', 4564, now());
INSERT INTO t2
SELECT'shOahaZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf'
,'shOahaZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf'
, g, 2, 456, now()
from generate_series (1,50000) g
INSERT INTO t2
SELECT'shOahaZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf'
,'shOahaZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf'
, g, 2, 789, now()
from generate_series (50001, 100000) g
ANALYZE t1;
ANALYZE t2;
EXPLAIN ANALYZE DELETE FROM t1 WHERE c1 = 1;
Total runtime: 53.745 ms
DELETE FROM t1 WHERE c1 = 1;
58 ms execution time.
Ergo, there is nothing fundamentally wrong with your schema layout.
Minor enhancements:
You have a couple of columns defined numeric(9,0) or numeric(4,0). Unless you have a good reason to do that, you are probably a lot better off using just integer. They are smaller and faster overall. You can always add a check constraint if you really need to enforce a maximum.
I also would use text instead of varchar(n)
And reorder columns (at table creation time). As a rule of thumb, place fixed length NOT NULL columns first. Put timestamp and integer first and numeric or text last. More here..

Resources