Quickest way to copy over table data in Oracle/PostgreSQL - Spring

I'm looking at a Spring Boot application that is used to copy data from a temp table to a permanent table based on a last-updated date. It copies a record only if its last-updated date is greater than the desired date, so not all records are copied over. The table currently has around 300K+ records, and the process with Spring JPA takes over 2 hours for the full set, which is not at all feasible. The goal is to bring it down to under 15 minutes. I'm trying to see how much difference using JdbcTemplate would make. Would a PL/SQL script be a better option? I also wanted to see if there are better options out there. Appreciate your time.
Using an Oracle database at the moment, but a PostgreSQL migration is on the cards.
Thanks!

You can do this operation with a straight SQL query (which will work on both Oracle and PostgreSQL). Assuming your temp_table has the same columns as the permanent table, the last-updated-date column is called last_updated, and you want to copy all records updated since 2020-05-03, you could write a query like:
INSERT INTO perm_table
SELECT *
FROM temp_table
WHERE last_updated > TO_DATE('2020-05-03', 'YYYY-MM-DD')
In your app you would pass '2020-05-03' via a bind placeholder, either directly or via JdbcTemplate.
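For instance, the same statement with the literal replaced by a bind placeholder (a minimal sketch; with JdbcTemplate you would pass the cutoff date as an argument, e.g. jdbcTemplate.update(sql, cutoffDate)):

INSERT INTO perm_table
SELECT *
FROM temp_table
WHERE last_updated > ?

A single set-based INSERT ... SELECT like this runs entirely in the database and avoids the per-row round trips that make the JPA loop slow.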

Related

Does Oracle store the transaction date of data?

I'm currently using Oracle 11g. I extracted data from the schema once, at a specific date, to do some cleansing. Suppose I now want to extract again, but only the new/updated data since the last date I extracted; is there any way I could get that? Unfortunately, this data does not have any column that stores a last-edited date.
I was wondering if Oracle automatically stores that type of info somewhere I could check, perhaps some transaction log?
Thanks,
A Physal
One way would be to enable flashback and then you can do:
SELECT * FROM table1
MINUS
SELECT * FROM table1 AS OF TIMESTAMP TIMESTAMP '2018-01-01 00:00:00.000';
This returns all the rows inserted or changed since 2018-01-01. (Rows deleted since then would be found by reversing the MINUS, as sketched below.)
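For completeness, the reverse query, which surfaces rows that existed in the snapshot but have since been deleted:

SELECT * FROM table1 AS OF TIMESTAMP TIMESTAMP '2018-01-01 00:00:00.000'
MINUS
SELECT * FROM table1;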

PL/SQL: daily record of changes on a table, then select from a given day

Oracle PL/SQL question: one table should be archived day by day. The table holds about 50,000 records, but only a few records change during a given day. A second table (the destination/history table) has one additional field, import_date. Archiving the whole table means two days = 100,000 records; it should instead be 50,000 + a few records carrying information about the changes made during the day.
I need a simple solution that copies data from the source table to the destination like a "log": only changes are copied/registered, but I should still be able to reconstruct the source table's dataset as of a given day.
Is there a mechanism for this, like MERGE or something similar?
Normally you'd have a day_table and a master_table. All records are loaded from the day_table into master, and only master is manipulated, with the day table used to store the raw data.
You could add a new column to master, such as date_modified, and have the app update this field when a record changes, or use a flag to indicate that it has changed.
Another way to do this is to have an active/latest flag. Instead of changing a record in place, it is duplicated, with the flag set to mark which version is current. This can make comparisons easier,
e.g. select * from master_table where record = 'abcd'
This would show two rows: the original loaded at 1pm and the modified, active one changed at 2pm.
There's no need to have another table; you could then base a view on this flag,
e.g. CHANGED_RECORDS_VIEW = select * from master_table where flag = 'Y'
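A minimal sketch of that flag-based pattern (the table layout and column names here are assumptions for illustration):

-- One logical record may have many physical rows; the flag marks the
-- changed/current version.
CREATE TABLE master_table (
  record        VARCHAR2(20),
  payload       VARCHAR2(200),
  date_modified DATE    DEFAULT SYSDATE,
  flag          CHAR(1) DEFAULT 'N'
);

-- "Updating" record 'abcd': keep the 1pm row, add the 2pm version flagged.
INSERT INTO master_table (record, payload, flag)
VALUES ('abcd', 'new value', 'Y');

-- The view that replaces a separate changes table:
CREATE OR REPLACE VIEW changed_records_view AS
SELECT * FROM master_table WHERE flag = 'Y';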
I once faced a similar issue; please find the solution below.
Tables we had:
A master table, which always has records and keeps growing.
A backup table, to store all the master records on a daily basis.
Solution:
From morning to evening, records are inserted into and updated in the master table. The key to finding the new records was a timestamp: whenever a record is inserted or updated, a corresponding timestamp is stored alongside it.
At night, a scheduled job (see CREATE_JOB in the Oracle documentation for further learning) runs a procedure at exactly 10:00 pm to bulk collect all the records in the master table carrying today's date and insert them into the backup table.
This scenario should cover your case; please check out the concept of job scheduling, which will help you. Thank you.
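A hedged sketch of such a job via DBMS_SCHEDULER (copy_todays_rows is an assumed stored procedure holding the bulk-collect-and-insert logic):

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'nightly_backup_job',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'copy_todays_rows',  -- assumed procedure name
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=22; BYMINUTE=0',  -- 10:00 pm
    enabled         => TRUE);
END;
/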

How to update/insert a table without creating a new table (temporary or otherwise)

Background: My team has an ETL job that updates an aggregate table. Each row contains data for a particular date, but a row can and will get updated after the row's date (which means any row can contain data from multiple job runs). This ETL job missed some data for one day last week, and now I need to backfill it.
Problem: I have the missing data, and what I was planning on doing was dumping it into a temporary table and then merging it with the agg table. That way I can deal with whether the agg table already contains a row for that data (update) or whether a new row needs to be added (insert). But I don't have sufficient permissions to create a temp table, and I'd prefer not to involve the DBA.
Question: Can I get insert/update behavior without creating a temporary table? (This is Oracle SQL, by the way.)
Edit: The data is coming from a TSV file.
Why do you want to avoid involving the DBA? The DBA should have full knowledge of what's going on in the database, as they are ultimately responsible for the condition of the data within it. So you shouldn't be playing sneaky commando with them.
As you have a file of missing data, the easiest way to present it to the database is with an external table. This requires the creation of the table and probably a directory object as well. You will need the DBA's help with this task.
The only way to avoid creating database objects is to convert your TSV file into a series of DML statements. An IDE which supports regex and/or records macros will prove invaluable here. I like TextPad; other editors are available.
The DML statement for doing upserts in Oracle is the MERGE statement. The one thing you need to watch for is recency. Your missing data comes from last week; if a row already exists, it may have been added or amended in the intervening period, and you must write your MERGE statement so it does not overwrite more recent data with the older stuff. Hopefully your table has useful metadata columns such as DATE_CREATED and LAST_UPDATED.
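A minimal sketch of such a guarded MERGE (agg_table and its columns are assumptions; the USING subquery stands in for the rows generated from the TSV):

MERGE INTO agg_table t
USING (
  SELECT DATE '2020-04-27' AS row_date,
         'abc'             AS key_col,
         42                AS metric,
         DATE '2020-04-27' AS last_updated
  FROM dual
) s
ON (t.row_date = s.row_date AND t.key_col = s.key_col)
WHEN MATCHED THEN UPDATE
  SET t.metric = s.metric, t.last_updated = s.last_updated
  WHERE t.last_updated < s.last_updated  -- never clobber newer data
WHEN NOT MATCHED THEN INSERT (row_date, key_col, metric, last_updated)
  VALUES (s.row_date, s.key_col, s.metric, s.last_updated);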

Best way to bulk insert data into Oracle database

I am going to create a lot of data scripts such as INSERT INTO and UPDATE statements.
There will be 100,000-plus records, if not 1,000,000.
What is the best way to get this data into Oracle quickly? I have already found that SQL*Loader is not good for this, as it does not update individual rows.
Thanks
UPDATE: I will be writing an application to do this in C#
Load the records into a stage table via SQL*Loader, then use set-based operations:
INSERT INTO ... SELECT (for example "Bulk Insert into Oracle database")
a mass UPDATE ("Oracle - Update statement with inner join")
or a single MERGE statement
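For instance, a hedged sketch of the first two operations against an assumed stage_table/target_table pair:

-- Insert rows that are not in the target yet:
INSERT INTO target_table (id, col_a)
SELECT s.id, s.col_a
FROM stage_table s
WHERE NOT EXISTS (SELECT 1 FROM target_table t WHERE t.id = s.id);

-- Mass-update the rows that already exist:
UPDATE target_table t
SET t.col_a = (SELECT s.col_a FROM stage_table s WHERE s.id = t.id)
WHERE EXISTS (SELECT 1 FROM stage_table s WHERE s.id = t.id);

A single MERGE can combine both passes into one statement.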
To keep it as fast as possible I would keep it all in the database:
use external tables (to allow Oracle to read the file contents),
and create a stored procedure to do the processing.
The update could be slow. If possible, it may be a good idea to consider creating a new table based on all the records in the old one (with the updates applied), then switching the new and old tables around.
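A minimal sketch of an external table over a comma-separated file (the load_dir directory object, file name, and columns are all assumptions):

CREATE TABLE ext_load_data (
  id    NUMBER,
  col_a VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('data.csv')
)
REJECT LIMIT UNLIMITED;

The stored procedure can then read ext_load_data like any other table and drive set-based inserts and updates from it.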
How about using a spreadsheet program like MS Excel or LibreOffice Calc? This is how I perform bulk inserts.
Prepare your data in a tabular format.
Let's say you have three columns: A (text), B (number) and C (date). In column D, enter the following formula, adjusting as needed.
="INSERT INTO YOUR_TABLE (COL_A, COL_B, COL_C) VALUES ('"&A1&"', "&B1&", to_date ('"&C1&"', 'mm/dd/yy'));"

Oracle and creating history

I am working on a system to track a project's history. There are 3 main tables (projects, tasks, and clients) plus 3 corresponding history tables. I have the following trigger on the projects table.
CREATE OR REPLACE TRIGGER mySchema.trg_projectHistory
BEFORE UPDATE OR DELETE
ON mySchema.projects REFERENCING NEW AS New OLD AS Old
FOR EACH ROW
DECLARE
  tmpVersion NUMBER;
BEGIN
  SELECT myPackage.GETPROJECTVERSION( :OLD.project_ID )
  INTO tmpVersion
  FROM dual;

  INSERT INTO mySchema.projectHistory
    ( project_ID, ..., version )
  VALUES
    ( :OLD.project_ID,
      ...,
      tmpVersion );
EXCEPTION
  WHEN OTHERS THEN
    -- Consider logging the error and then re-raise
    RAISE;
END;
/
I got three triggers for each of my tables (projects, tasks, clients).
Here is the challenge: not everything changes at the same time. For example, somebody could update just a certain task's cost. In that case only one trigger fires and I get one insert. I'd like to insert one record into each of the 3 history tables at once, even if nothing changed in the projects and clients tables.
Also, what if somebody changes a project's end_date and the cost, and also picks another client? Now I have three triggers firing at the same time. Only in this case will I get one record inserted into each of my three history tables (which is what I want).
If I modify the triggers to insert into all 3 tables for the first example, then I will get 9 inserts when the second example happens.
Not quite sure how to tackle this. Any help?
To me it sounds as if you want a transaction-level snapshot of the three tables created whenever you make a change to any of those tables.
Have a row level trigger on each of the three tables that calls a single packaged procedure with the project id and optionally client / task id.
The packaged procedure inserts into all three history tables the relevant project, client and task rows where there isn't already a history record for that key and transaction (i.e. you don't want duplicates). You have a couple of choices for the latter: with a unique constraint in place, you can use a bulk select-and-insert with FORALL ... SAVE EXCEPTIONS, DML error logging (LOG ERRORS INTO), or an INSERT ... SELECT ... WHERE NOT EXISTS.
You do need to keep track of your transactions. I'm guessing this is what you were doing with myPackage.GETPROJECTVERSION. The trick here is to only increment versions when you have a new transaction. If, when you get a new version number, you hold it in a package-level variable, you can easily tell whether your session has already got a version number or not.
If your session is going to run multiple transactions, you'll need to clear out the session-level version number when it belonged to a previous transaction. If you get DBMS_TRANSACTION.LOCAL_TRANSACTION_ID and store that at the package/session level as well, you can determine whether you are in a new transaction or still part of the same one.
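A hedged sketch of just that transaction-tracking piece (history_pkg and current_version are illustrative names; the package spec is omitted):

CREATE OR REPLACE PACKAGE BODY history_pkg AS

  g_txn_id  VARCHAR2(4000);  -- transaction id last seen by this session
  g_version NUMBER;          -- version number fetched for that transaction

  FUNCTION current_version (p_project_id IN NUMBER) RETURN NUMBER IS
    l_txn_id VARCHAR2(4000) := DBMS_TRANSACTION.LOCAL_TRANSACTION_ID;
  BEGIN
    -- First call, or a new transaction: fetch a fresh version number.
    IF g_txn_id IS NULL OR g_txn_id <> l_txn_id THEN
      g_txn_id  := l_txn_id;
      g_version := myPackage.GETPROJECTVERSION(p_project_id);
    END IF;
    RETURN g_version;
  END current_version;

END history_pkg;
/

All three triggers would call this function, so every history row written in one transaction shares a version number, and the NOT EXISTS check on (key, version) keeps out duplicates.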
From your description, it looks like you would be capturing the effective and end dates for each of the history rows once any of the original rows change.
E.g. the project_hist table would have eff_date and exp_date columns holding the start and end dates for a given project version, while the project table would just have an effective date (as it holds the active project).
I don't see why you would want to insert rows into all three history tables when only one table's values were updated. You can pretty much get the details you need (as of a given date) using your current logic, inserting the old row into the history table only for the table that was actually updated.
Alternative answer.
Have a look at Total Recall / Flashback Archive
You can set the retention to 10 years, and use a simple AS OF TIMESTAMP to get the data as of any particular timestamp.
I'm not sure about performance, though. It may be easier to use a daily or weekly retention and a separate scheduled job that picks out the older versions using the VERSIONS BETWEEN syntax and stores them in your history table.
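A minimal sketch of the flashback-archive route (fba_ts, the archive name, and the sample key are assumptions):

-- One-time setup:
CREATE FLASHBACK ARCHIVE projects_fba
  TABLESPACE fba_ts
  RETENTION 10 YEAR;

ALTER TABLE mySchema.projects FLASHBACK ARCHIVE projects_fba;

-- The table as of a point in time:
SELECT *
FROM mySchema.projects
AS OF TIMESTAMP TIMESTAMP '2018-01-01 00:00:00';

-- All versions of one row over the last week, e.g. for a scheduled job
-- that copies older versions into a history table:
SELECT project_id, versions_starttime, versions_operation
FROM mySchema.projects
     VERSIONS BETWEEN TIMESTAMP SYSTIMESTAMP - INTERVAL '7' DAY
                  AND SYSTIMESTAMP
WHERE project_id = 42;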
