Manage insert/update scheduled tasks in Spring

I'm not used to working with scheduled tasks and need some advice on whether my approach is sound.
I'm designing a function that runs every 20 minutes. This function retrieves data from a JSON file (which I have no control over) and inserts the data into the database.
When I designed this I did not consider that it would create a unique-ID problem in the database, given that the same data arrives again on each run.
So I thought of splitting it into two functions:
1: the initial insertions (INSERT)
2: updating the data according to the ID (UPDATE)
@Component
public class LoadSportsCompetition {

    @PostConstruct
    public void insert() {
        // 1 : get the JSON data
        // 2 : INSERT it into the DB
    }

    @Scheduled(cron = "0 0/20 * * * ?")
    public void update() {
        // 1 : get the JSON data
        // 2 : UPDATE the rows by ID
    }
}

The (most probably) best way to handle this in PostgreSQL 9.5 and later is to use INSERT ... ON CONFLICT ... DO UPDATE.
Let's assume this is your original table (very simple, for the sake of this example):
CREATE TABLE tbl
(
    tbl_id  INTEGER,
    payload JSONB,
    CONSTRAINT tbl_pk
        PRIMARY KEY (tbl_id)
) ;
We fill it with the starting data:
INSERT INTO tbl
    (tbl_id, payload)
VALUES
    (1, '{"a":12}'),
    (2, '{"a":13, "b": 25}'),
    (3, '{"a":15, "b": [12,13,14]}'),
    (4, '{"a":12, "c": "something"}'),
    (5, '{"a":13, "x": 1234.567}'),
    (6, '{"a":12, "x": 1234.789}') ;
Now we perform a non-conflicting insert (i.e. the ON CONFLICT ... DO UPDATE won't be executed):
-- A normal insert, no conflict
INSERT INTO tbl
    (tbl_id, payload)
VALUES
    (7, '{"x": 1234.56, "y": 3456.78}')
ON CONFLICT ON CONSTRAINT tbl_pk DO
UPDATE
    SET payload = excluded.payload ; -- Note: the excluded pseudo-table holds the rows proposed for insertion
And now we perform an INSERT that would generate a PRIMARY KEY conflict; it is handled by the ON CONFLICT clause, which performs an UPDATE instead:
-- A conflicting insert
INSERT INTO tbl
    (tbl_id, payload)
VALUES
    (3, '{"a": 16, "b": "I don''t know"}')
ON CONFLICT ON CONSTRAINT tbl_pk DO
UPDATE
    SET payload = excluded.payload ;
And now, a two-row insert that conflicts on one row and inserts the other:
-- Now one of each: one conflicting row, one new row
INSERT INTO tbl
    (tbl_id, payload)
VALUES
    (4, '{"a": 18, "b": "I will be updated"}'),
    (9, '{"a": 17, "b": "I am number 9"}')
ON CONFLICT ON CONSTRAINT tbl_pk DO UPDATE
    SET payload = excluded.payload ;
We check the table now:
SELECT * FROM tbl ORDER BY tbl_id ;

tbl_id | payload
-------+-------------------------------------
     1 | {"a": 12}
     2 | {"a": 13, "b": 25}
     3 | {"a": 16, "b": "I don't know"}
     4 | {"a": 18, "b": "I will be updated"}
     5 | {"a": 13, "x": 1234.567}
     6 | {"a": 12, "x": 1234.789}
     7 | {"x": 1234.56, "y": 3456.78}
     9 | {"a": 17, "b": "I am number 9"}
Your code should loop through the incoming data and perform the INSERT/UPDATE (sometimes called MERGE or UPSERT) either one row at a time or in batches, using multi-row VALUES lists.
You can get all the code at dbfiddle here.
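From your Spring job's point of view, this means the separate insert() and update() methods can collapse into one scheduled method that always executes the same statement. A minimal sketch of the per-row form, assuming JDBC-style ? placeholders:
INSERT INTO tbl
    (tbl_id, payload)
VALUES
    (?, ? :: jsonb)
ON CONFLICT ON CONSTRAINT tbl_pk DO UPDATE
    SET payload = excluded.payload ;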
There is also an alternative, better suited if you work in batches: use a WITH statement that performs an UPDATE first, followed by an INSERT:
-- Avoiding (most) concurrency issues.
BEGIN TRANSACTION ;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE ;

WITH data_to_load (tbl_id, payload) AS
(
    VALUES
        (3, '{"a": 16, "b": "I don''t know"}' :: jsonb),
        (4, '{"a": 18, "b": "I will be updated"}'),
        (7, '{"x": 1234.56, "y": 3456.78}'),
        (9, '{"a": 17, "b": "I am number 9"}')
),
update_existing AS
(
    -- Update the rows that already exist
    UPDATE
        tbl
    SET
        payload = data_to_load.payload
    FROM
        data_to_load
    WHERE
        tbl.tbl_id = data_to_load.tbl_id
)
-- Insert the non-existing ones
INSERT INTO
    tbl
    (tbl_id, payload)
SELECT
    tbl_id, payload
FROM
    data_to_load
WHERE
    data_to_load.tbl_id NOT IN (SELECT tbl_id FROM tbl) ;

COMMIT TRANSACTION ;
You'll get the same results, as you can see at dbfiddle here.
In both cases, be ready for error handling, and be prepared to retry your transactions if they fail because of concurrent activity modifying the same rows. Your transactions can be explicit (as in the second case) or implicit, if some kind of auto-commit wraps every single INSERT.
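If you are stuck on a version before 9.5 and cannot use ON CONFLICT, the classic alternative is the retry-loop function shown in the PostgreSQL documentation; a sketch adapted to the tbl table above:
CREATE FUNCTION upsert_tbl(k INTEGER, p JSONB) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- Try the UPDATE first; if a row was touched, we are done
        UPDATE tbl SET payload = p WHERE tbl_id = k ;
        IF FOUND THEN
            RETURN ;
        END IF ;
        -- No row yet: try to INSERT; if another session inserted the
        -- same key concurrently, catch the error and loop back to UPDATE
        BEGIN
            INSERT INTO tbl (tbl_id, payload) VALUES (k, p) ;
            RETURN ;
        EXCEPTION WHEN unique_violation THEN
            NULL ;
        END ;
    END LOOP ;
END ;
$$ LANGUAGE plpgsql ;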

Related

Modify object attribute/property — without creating custom function

I have an Oracle 18c SDO_GEOMETRY object that has attributes (aka properties):
with cte as (
    select
        sdo_util.from_wktgeometry('MULTILINESTRING ((0 5 0, 10 10 10, 30 0 33.54),(50 10 33.54, 60 10 -10000))') shape
    from dual)
select
    a.shape.sdo_gtype as old_gtype,
    a.shape.sdo_gtype + 300 as new_gtype,
    a.shape
from
    cte a;

OLD_GTYPE NEW_GTYPE SHAPE
--------- --------- -----
     3006      3306 SDO_GEOMETRY(3006, NULL, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1, 10, 2, 1), SDO_ORDINATE_ARRAY(0, 5, 0, 10, 10, 10, 30, 0, 33.54, 50, 10, 33.54, 60, 10, -10000))
I want to modify the GTYPE attribute of the SDO_GEOMETRY object:
Old GTYPE: 3006
New GTYPE: 3306
It's possible to modify the GTYPE attribute using a custom function (or an inline function):
See @AlbertGodfrind's answer in Convert M-enabled SDE.ST_GEOMETRY to SDO_GEOMETRY using SQL
However, as an experiment, I want to modify the GTYPE attribute right in the SELECT clause in a query -- without using a custom function.
For example, I wonder if there might be OOTB functionality like this:
modify_object_property(object, property_name, new_val) returns sdo_geometry
Is there a way to modify an SDO_GEOMETRY GTYPE attribute/property — without creating a custom function?
Related: Replace value in SDO_ELEM_INFO_ARRAY varray
You can use the sdo_geometry constructor as follows:
with cte as (
    select
        sdo_util.from_wktgeometry('MULTILINESTRING ((0 5 0, 10 10 10, 30 0 33.54),(50 10 33.54, 60 10 -10000))') shape
    from dual)
select sdo_geometry(a.shape.sdo_gtype + 300,
                    a.shape.sdo_srid,
                    a.shape.sdo_point,
                    a.shape.sdo_elem_info,
                    a.shape.sdo_ordinates) as shape
from cte a;
You can modify the contents of an object in SQL. You just need to make sure you use an alias on the table. For example:
update my_table t
set t.geom.sdo_gtype = t.geom.sdo_gtype + 300
where ... ;
But that is not what you are looking for. Modifying an object that way in the select list is not possible. Hence the approach via a custom function.

How to compare two records of 2 separate tables in the same schema and show only the difference in Oracle Database?

I have a source table and a target table in the same schema. Based on the primary key value, I want to compare the records of the source and destination tables and show only the columns that have different values.
Could you please help me find a solution for this?
Note: the DB version I am on is Oracle Database 19c Enterprise Edition.
In Oracle you can use the concatenation operator || to combine multiple texts into one (Oracle's CONCAT function takes exactly two arguments, so it would have to be nested). Since you have multiple columns, you can view this problem as the concatenation of many expressions, one per column. The expression for a column could look like:
CASE WHEN ((r1.c1 IS NULL) AND (r2.c1 IS NULL)) OR (r1.c1 = r2.c1)
THEN ''
ELSE 'Table1: ' || r1.c1 || ', Table2: ' || r2.c1 || ';'
END
Now that we know what the expression looks like for one column, let's assume there are three:
SELECT CASE WHEN ((r1.c1 IS NULL) AND (r2.c1 IS NULL)) OR (r1.c1 = r2.c1)
            THEN ''
            ELSE 'Table1: ' || r1.c1 || ', Table2: ' || r2.c1 || ';'
       END
    || CASE WHEN ((r1.c2 IS NULL) AND (r2.c2 IS NULL)) OR (r1.c2 = r2.c2)
            THEN ''
            ELSE 'Table1: ' || r1.c2 || ', Table2: ' || r2.c2 || ';'
       END
    || CASE WHEN ((r1.c3 IS NULL) AND (r2.c3 IS NULL)) OR (r1.c3 = r2.c3)
            THEN ''
            ELSE 'Table1: ' || r1.c3 || ', Table2: ' || r2.c3 || ';'
       END AS diff
FROM table1 r1
JOIN table2 r2
    ON r1.id = r2.id;
This is the basic idea, but you will encounter some problems, such as the types of the columns. If they are not textual, you will need to convert them to text. Also, if you need to reuse this diff tool, you cannot assume you know the number of columns to compare or even their names, so you will need to read the column metadata and generate the query from their names and types.
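For instance, a DATE column compares fine as-is, but printing it is clearer with an explicit format; a sketch for a hypothetical date column c1:
CASE WHEN ((r1.c1 IS NULL) AND (r2.c1 IS NULL)) OR (r1.c1 = r2.c1)
THEN ''
ELSE 'Table1: ' || TO_CHAR(r1.c1, 'YYYY-MM-DD') || ', Table2: ' || TO_CHAR(r2.c1, 'YYYY-MM-DD') || ';'
END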
I have a similar job which compares two large tables (26 million rows each). I don't need to know which column is different, but if I rephrase the query to do that, it would be something along the lines of:
CREATE TABLE t1 (
    id NUMBER PRIMARY KEY,
    a  NUMBER NOT NULL,
    b  NUMBER NOT NULL,
    c  NUMBER NULL
);

CREATE TABLE t2 (
    id NUMBER PRIMARY KEY,
    x  NUMBER NOT NULL,
    y  NUMBER NOT NULL,
    z  NUMBER NULL
);
INSERT INTO t1 VALUES (1, 10, 20, 30);
INSERT INTO t2 VALUES (1, 10, 20, 30);
INSERT INTO t1 VALUES (2, 10, 20, 30);
INSERT INTO t2 VALUES (2, 11, 20, 30);
INSERT INTO t1 VALUES (3, 10, 21, 30);
INSERT INTO t2 VALUES (3, 10, 20, 30);
INSERT INTO t1 VALUES (4, 10, 20, 31);
INSERT INTO t2 VALUES (4, 10, 20, 30);
INSERT INTO t1 VALUES (5, 10, 20, null);
INSERT INTO t2 VALUES (5, 10, 20, 30);
SELECT id,
    CASE WHEN a <> x THEN 1 ELSE 0 END a_x,
    CASE WHEN b <> y THEN 1 ELSE 0 END b_y,
    CASE WHEN c <> z
        OR (c IS NULL AND z IS NOT NULL)
        OR (c IS NOT NULL AND z IS NULL) THEN 1 ELSE 0 END c_z
FROM t1 JOIN t2 USING (id)
WHERE a <> x
    OR b <> y
    OR c <> z OR (c IS NULL AND z IS NOT NULL)
    OR (c IS NOT NULL AND z IS NULL);

ID  A_X  B_Y  C_Z
 2    1    0    0
 3    0    1    0
 4    0    0    1
 5    0    0    1
The code is lengthy, but I have a script that looks at the data dictionary and writes the query.
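A minimal sketch of that idea, generating one comparison expression per column pair from the data dictionary (the positional pairing of the columns and the exclusion of ID are assumptions):
-- Pair the columns of T1 and T2 by position and emit one CASE per pair;
-- paste the generated lines into the comparison query.
SELECT 'CASE WHEN t1.' || c1.column_name || ' <> t2.' || c2.column_name
    || ' OR (t1.' || c1.column_name || ' IS NULL AND t2.' || c2.column_name || ' IS NOT NULL)'
    || ' OR (t1.' || c1.column_name || ' IS NOT NULL AND t2.' || c2.column_name || ' IS NULL)'
    || ' THEN 1 ELSE 0 END ' || c1.column_name || '_' || c2.column_name || ',' AS generated_expr
FROM user_tab_columns c1
JOIN user_tab_columns c2
    ON c1.column_id = c2.column_id
WHERE c1.table_name = 'T1'
    AND c2.table_name = 'T2'
    AND c1.column_name <> 'ID'
ORDER BY c1.column_id;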
However, I'm not totally happy with the comparison of the nullable columns. There is an undocumented system function that makes that easier, but I never got it working properly...
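For what it's worth, there is a documented alternative for the null-safe comparison: DECODE considers two NULLs equal, so the c/z comparison above collapses to:
-- DECODE treats two NULLs as equivalent, so this flags a difference
-- for NULL vs non-NULL as well as for differing non-NULL values:
SELECT id,
    CASE WHEN DECODE(c, z, 1, 0) = 0 THEN 1 ELSE 0 END c_z
FROM t1 JOIN t2 USING (id)
WHERE DECODE(c, z, 1, 0) = 0;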

Oracle: is it possible to trim a string, count the number of occurrences, and insert into a new table?

My source table looks like this:
id|value|count
Value is a string of values separated by semicolons (;). For example, it may look like this:
A;B;C;D;
Some records may not have values at certain positions, like this:
A;;;D;
First, I've selectively moved records to a new table (targettable) based on which positions have values, using a regexp. I used [^;]+; for positions that must have a value between the semicolons, and [^;]*; for positions I don't care about. For example, if I wanted the 1st and 4th places to have values, I could combine the regexp with insert into like this:
insert into
targettable tt (id, value, count)
SELECT some_seq.nextval, value, count
FROM source
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
so now my new table has a list of records that have values at the 1st and 4th positions. It may look like this:
1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3
Next there are two things I want to do: 1. get rid of the values other than the 1st and 4th; 2. combine identical values and add up their counts. For example, records 1 and 3 are the same, so I want to trim them so they become A;D;, and then add their counts, 2+3=5. Now my new table looks like this:
1|A;D;|5
2|B;E;|1
As long as I can somehow get from the source table to the final table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm just not sure whether I can go any further with Oracle. If not, I'll have to move the records and process them with Java. Bear in mind I have millions of records, so I would prefer the Oracle method if it is possible.
You should be able to skip the intermediate table; just extract the 1st and 4th elements using the regexp_substr() function, while checking that they are not null:
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;
VALUE                   COUNT
------------------ ----------
A;D;                        2
B;E;                        1
A;D;                        3
and then aggregate those results:
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value;
VALUE                   COUNT
------------------ ----------
A;D;                        5
B;E;                        1
Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:
insert into target (id, value, count)
select some_seq.nextval, value, count
from (
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value
);
If you're creating a new sequence for this, so that the IDs start from 1, you can use rownum or row_number() instead.
Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count)!?); those might not be your real names, though.
I would use regexp_replace to remove the 2nd and 3rd parts of the string, keeping the 1st and 4th via capture groups, combined with an aggregate query to get the total count, like:
SELECT
    regexp_replace(value, '^([^;]+;)[^;]*;[^;]*;([^;]+;)', '\1\2'),
    SUM(count)
FROM source
WHERE
    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
GROUP BY
    regexp_replace(value, '^([^;]+;)[^;]*;[^;]*;([^;]+;)', '\1\2')

Oracle. How to exp/imp data from one DB to another which already has data?

I have one Oracle DB with ~40 tables. Some of them have IDs = 1, 2, 3, 4, 5... and constraints.
Now I want to "copy" this data from all tables to another Oracle DB which already has the same tables.
The problem is that another DB also has records (can be the same IDs = 1, 2, 3, 77, 88...) and I don't want to lose them.
Is there some automated way to copy data from one table to another with ID shifting, while respecting the constraints?
existing IDs: 1, 2, 3, 77, 88
incoming IDs: 1, 2, 3, 4, 5
result:       1, 2, 3, 77, 88, 89, 90, 91, 92, 93
(the incoming 1..5 become 89..93)
Or do I need to do it myself?
insert into new.table
select new.sequence_id.nextval, t.* from old.table t
and save the new.id to old.id mapping, and so on, for all 40 tables?
That's a bit of a dirty solution, but if all the IDs are numeric you can first update the old IDs to negative numbers, ID = -1 * ID (or just do it in the select statement on the fly), and then do the insert. That way all your IDs stay consistent, the constraints remain valid, and the old rows can live together with the new data.
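A sketch of the on-the-fly variant (the schema, table, and column names are placeholders):
-- Flip the incoming IDs to negative so they cannot collide with the
-- existing positive IDs in the target; any child tables referencing
-- these IDs must be flipped the same way to keep the constraints valid:
INSERT INTO new_schema.t (id, col1, col2)
SELECT -1 * s.id, s.col1, s.col2
FROM old_schema.t s;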
First, you need to export with expdp; second, you need to remap the schema to the new schema name in impdp (the REMAP_SCHEMA parameter).

Versioning normalized Data in Oracle

I'm trying to find a good way to check whether data I'm given through a bulk load (SQL*Loader) already exists in my data set, so I don't load it again.
Currently we have a setup like this:
TableA
col1, col2, bulkLoadName
This table would contain data like:
col1, col2, bulkLoadName
"Joe", 35, "Load1"
"Tim", 65, "Load1"
"Ray", 95, "Load1"
"Joe", 35, "Load2"
And I'd like to change it to:
TableA
PK, col1, col2
TableAtoBulkLoadName
PK, TABLEA_PK, BulkLoadName_PK
BulkLoadName
PK, bulkLoadName
Where the data would look like:
PK, col1, col2
1, "Joe", 35
2, "Tim", 65
3, "Ray", 95
PK, TABLEA_PK, BulkLoadName_PK
1, 1, 1
2, 2, 1
3, 3, 1
4, 1, 2
PK, bulkLoadName
1, "Load1"
2, "Load2"
This normalizes the data so I can easily check for a specific load without a string search and, most importantly, prevents me from loading duplicate data into the database just because something is defined again in a later load.
I'm having trouble deciding how I should implement the duplicate checks. I'm not well versed in SQL and need a solution that works in Oracle 11g. I've looked around and come up with two possible solutions:
Solution 1:
Use a temp table to store the bulk load, and run a stored procedure once it is loaded to check for existing records.
Solution 2:
Use a MERGE statement on TableA that adds new records to TableA, or creates a new intersection record in TableAtoBulkLoadName if the record already exists.
My question, now that all of the background info is out there: what are the pros and cons of these approaches? Is this kind of normalization normal? Are there standard ways of doing this sort of thing?
Thanks!
Strictly from a performance standpoint, if you can do everything in one statement, that's usually better.
But as soon as you start to transform the data in various ways, I personally find that using a staging table makes the resulting code a lot easier to read and modify.
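For what it's worth, a sketch of how Solution 2 could look; the staging table stg_load (filled by SQL*Loader), the sequences, and the :load_pk bind variable are assumptions:
-- 1) Insert only the rows that do not exist yet in TableA:
MERGE INTO TableA a
USING stg_load s
    ON (a.col1 = s.col1 AND a.col2 = s.col2)
WHEN NOT MATCHED THEN
    INSERT (pk, col1, col2)
    VALUES (tablea_seq.NEXTVAL, s.col1, s.col2);

-- 2) Link every staged row (new or pre-existing) to this load:
INSERT INTO TableAtoBulkLoadName (pk, tablea_pk, bulkloadname_pk)
SELECT link_seq.NEXTVAL, a.pk, :load_pk
FROM TableA a
JOIN stg_load s
    ON (a.col1 = s.col1 AND a.col2 = s.col2);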
