How to update in PostgreSQL without affecting the row number? - spring

I am writing an API using Spring JPA and PostgreSQL.
After an UPDATE statement, the row number of the record is affected.
Record 2, which was updated, moves to the last position.
I don't want the row number to change.
How can I update without changing the row number?

A database table is an unordered set of tuples (also known as a relation), so you cannot rely on the order of rows returned from a SELECT * FROM tablename.
You need to enforce an ordering with an ORDER BY clause if you need it.
The internal reason why the location of a row changes after an update is that PostgreSQL actually writes a new version of the row, which in this case is appended at the end. But you cannot rely on that either: if there is free space in the middle of the table, the new row version can be added there.
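A minimal sketch of the ORDER BY advice. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere; the table name `record` and its columns are hypothetical, not taken from the question.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO record (id, name) VALUES (?, ?)",
    [(1, "one"), (2, "two"), (3, "three")],
)

# Update the middle row, as in the question.
conn.execute("UPDATE record SET name = 'two (updated)' WHERE id = 2")

# ORDER BY makes the result order explicit, so it no longer depends on
# where the database physically stored the new row version.
rows = conn.execute("SELECT id, name FROM record ORDER BY id").fetchall()
ids = [row[0] for row in rows]
print(ids)
```

In Spring Data JPA the same effect is usually obtained by asking the repository for a sorted result, for example findAll(Sort.by("id")), assuming the entity has an id field.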

Related

Deduplication in Oracle

Situation:-
Table 'A' receives data from an Oracle GoldenGate feed. Each incoming record is flagged as New, Updated, or Duplicate (N/U/D) and either creates a new record or rewrites an existing one based on that flag. Every entry in the table has an UpdatedTimeStamp column containing the insertion timestamp.
Scope:-
To write a stored procedure in Oracle that pulls the data for a time period based on the UpdatedTimeStamp column and publishes XML using DBMS_XMLGEN.
How can I ensure that a duplicate entered in the table is not processed again?
FYI: I am currently filtering via a new table that I created, named 'A-stg', into which old data is inserted incrementally.
As far as I understood the question, there are a few ways to avoid duplicates.
The most obvious is to use DISTINCT, e.g.
select distinct data_column from your_table
Another one is to use the timestamp column and keep only the last (or the first?) value per data value, e.g.
select data_column, max(timestamp_column)
from your_table
group by data_column
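To see what the two queries above return, here is a small sketch. It runs them against SQLite through Python's sqlite3 module purely for illustration; the sample rows are made up, and the ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

# SQLite stands in for Oracle; the sample data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (data_column TEXT, timestamp_column TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("a", "2020-01-01 10:00:00"),
        ("a", "2020-01-02 10:00:00"),  # duplicate of 'a' with a later timestamp
        ("b", "2020-01-01 12:00:00"),
    ],
)

# Option 1: DISTINCT collapses repeated values.
distinct_values = conn.execute(
    "SELECT DISTINCT data_column FROM your_table ORDER BY data_column"
).fetchall()

# Option 2: keep one row per value, carrying its latest timestamp.
latest = conn.execute(
    "SELECT data_column, MAX(timestamp_column) "
    "FROM your_table GROUP BY data_column ORDER BY data_column"
).fetchall()

print(distinct_values)
print(latest)
```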

Oracle 12c - refreshing the data in my tables based on the data from warehouse tables

I need to update some tables in my application from warehouse tables that are refreshed weekly or biweekly. My tables are referenced by foreign keys in other tables, so I cannot simply truncate them and reinsert all the data every time. Instead, I have to take the delta and update accordingly, matching on a few primary key columns that don't change. I need some input on how to implement this approach.
My approach:
Check the last updated time of those tables and views.
If it is recent, compare each row in my table with the warehouse table, matching on the primary key.
Update each column that differs.
Do nothing if no column has changed.
Insert the row if it is a new record.
My Question:
How do I implement this? Is writing PL/SQL code a good and efficient way, given that the expected number of records is around 800K?
Please provide any sample code or links.
I would go for PL/SQL with the BULK COLLECT and FORALL approach. You can use MINUS in your cursor to compute the difference and reduce the amount of data to process.
You can check this site for more information about bulk collect, forall and engines: http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html
There are many parts to your question above and I will answer as best I can:
While it is possible to disable the referencing foreign keys, truncate the table, repopulate it with the updated data, and then re-enable the foreign keys, given the requirements described above I don't believe truncating the table each time is optimal.
Yes, in principle PL/SQL is a good way to achieve what you want, as this is too complex to deal with in native SQL and PL/SQL is an efficient alternative.
Conceptually, the approach I would take is as follows:
Initial setup:
1. Create a sequence called activity_seq.
2. Add an "activity_id" column of type NUMBER to your source tables, with a unique constraint.
3. Add a trigger to the source table(s) setting activity_id = activity_seq.nextval for each insert or update of a table row.
4. Create some kind of master table to hold the "last processed activity id" value.
Then bi/weekly:
1. Retrieve the value of "last processed activity id" from the master table.
2. Select all rows in the source table(s) having an activity_id value > the "last processed activity id" value.
3. Iterate through the selected source rows and update the target if a match is found based on whatever your match criterion is; if no match is found, insert a new row into the target (I assume there is no delete, as you do not mention it).
4. On completion, update the master table "last processed activity id" to the greatest value of activity_id for the source rows processed in step 3 above.
(please note that, depending on your environment and the number of rows processed, the above process may need to be split and repeated over a number of transactions)
I hope this proves helpful
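The bi/weekly loop described above can be sketched as follows. This is an illustration only: it uses SQLite through Python's sqlite3 module instead of Oracle PL/SQL, the table names `source`, `target`, and `sync_master` are hypothetical, and activity_id is assigned by hand where the Oracle setup would use the trigger and activity_seq.

```python
import sqlite3

# SQLite stands in for Oracle; all table names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (pk INTEGER PRIMARY KEY, val TEXT, activity_id INTEGER UNIQUE);
    CREATE TABLE target (pk INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE sync_master (last_activity_id INTEGER NOT NULL);
    INSERT INTO sync_master VALUES (0);
""")

def sync(conn):
    """Apply source rows changed since the stored high-water mark to target."""
    (last_id,) = conn.execute(
        "SELECT last_activity_id FROM sync_master").fetchone()
    changed = conn.execute(
        "SELECT pk, val, activity_id FROM source "
        "WHERE activity_id > ? ORDER BY activity_id", (last_id,)).fetchall()
    for pk, val, activity_id in changed:
        # Update the target on a key match; insert when nothing matched.
        cur = conn.execute("UPDATE target SET val = ? WHERE pk = ?", (val, pk))
        if cur.rowcount == 0:
            conn.execute("INSERT INTO target (pk, val) VALUES (?, ?)", (pk, val))
        last_id = activity_id
    # Remember the greatest activity_id processed in this run.
    conn.execute("UPDATE sync_master SET last_activity_id = ?", (last_id,))
    conn.commit()
    return len(changed)

# Initial load: two source rows, stamped 1 and 2.
conn.executemany("INSERT INTO source VALUES (?, ?, ?)", [(1, "a", 1), (2, "b", 2)])
first_run = sync(conn)

# Later: one changed row and one new row, stamped by hand here
# (the trigger would assign activity_seq.nextval in the Oracle design).
conn.execute("UPDATE source SET val = 'b2', activity_id = 3 WHERE pk = 2")
conn.execute("INSERT INTO source VALUES (3, 'c', 4)")
second_run = sync(conn)

target_rows = conn.execute("SELECT pk, val FROM target ORDER BY pk").fetchall()
print(first_run, second_run, target_rows)
```

In Oracle, the update-or-insert step inside the loop is commonly written as a single MERGE statement; the row-by-row form is used here only to mirror the steps as described.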

Can this update cause a deadlock in oracle 10g

I came across this update statement and was wondering how it works internally. It updates a column that is also used in the WHERE clause of the update.
Should this ideally be done in two steps, or does Oracle take care of it automatically?
UPDATE TBL1 SET DATE1=DATE2 WHERE DATE2> DATE1
Oracle takes care of it automatically. Effectively, when it runs the update, Oracle performs the following steps:
1. Queries the table, i.e. evaluates the WHERE clause predicate for each row in the table.
2. For each row returned by step 1, updates it as per the SET clause. The column values used on the right-hand side are those that were fetched in step 1.
For this reason, it is perfectly possible to run an update like this which swaps the values of columns:
UPDATE TBL1 SET DATE1=DATE2, DATE2=DATE1 WHERE DATE2 > DATE1;
The update might be blocked if another session is updating or deleting one of the same rows and has not yet committed. Deadlocks are possible, but Oracle detects them automatically: it rolls back the statement of one of the sessions involved and raises an exception (ORA-00060) in that session.
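The column-swapping update can be tried out with a short sketch. It runs against SQLite via Python's sqlite3 module rather than Oracle; SQLite likewise evaluates the right-hand sides of SET from the row's pre-update values, so the behaviour shown matches the description above. The sample dates are made up.

```python
import sqlite3

# SQLite stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (id INTEGER PRIMARY KEY, date1 TEXT, date2 TEXT)")
conn.executemany(
    "INSERT INTO tbl1 VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-06-01"),  # date2 > date1: will be swapped
        (2, "2020-09-01", "2020-03-01"),  # date2 < date1: left untouched
    ],
)

# Both assignments read the values the row had before the update started.
conn.execute("UPDATE tbl1 SET date1 = date2, date2 = date1 WHERE date2 > date1")

swapped = conn.execute("SELECT id, date1, date2 FROM tbl1 ORDER BY id").fetchall()
print(swapped)
```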

Oracle sequence generator within interval

I'm using Oracle 11gR2. When a new product is inserted into the product table, I need to assign an auto-increment ID from 1 to 65535. Products can later be deleted.
When I reach 65535, I need to scan the table to find a free slot for the new ID.
Because of this requirement an Oracle sequence cannot be used, so I am using a function (I also tried a trigger on insert) to generate a free ID...
The problem is that I cannot handle batch inserts, for example, and I have concurrency problems...
How could I solve this? By using some sort of external ID generator?
Sounds like an arbitrary design. Is there a good reason for having a 16-bit max product id or for reusing IDs? Both constraints are bad practice.
I doubt any external generator is going to provide anything that Oracle doesn't already provide. I recommend using sequences for batch inserts. The problem you have is how to recycle the IDs. Plain Oracle sequences don't track which primary key values are in use, so you need a way to find recycled keys first and then fall back to the sequence.
Product ID Recycling
Batch Inserts - Use a sequence for keys the first time you load them. For this small range, set NOCACHE on the sequence to reduce gaps (rolled-back inserts can still leave some).
Deletes - When a product is deleted, instead of actually deleting the row, set a DELETED = 'Y' flag on the row.
Inserts - Select the minimum ID from the product table where DELETED = 'Y'. If one exists, update that record with the new product info (keeping the same ID) and set DELETED = 'N'; otherwise, take the next value from the sequence.
This ensures you always recycle before you insert new sequence IDs
If you want to implement the logic in the database, you can create a view (VIEW$PRODUCTS) where DELETED = 'N' and an INSTEAD OF INSERT trigger to do the insert.
In any scenario, when you run out of sequences (or sequence wraps), you are out of luck for batch inserts. I'd reconsider that part of the design if I were you.
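A rough sketch of the recycling logic described above, under several assumptions: it uses SQLite through Python's sqlite3 module instead of Oracle, the helper names `add_product` and `delete_product` are hypothetical, and MAX(id) + 1 stands in for the sequence's NEXTVAL. It is single-session only; in Oracle, concurrent sessions would additionally need to lock the row chosen for reuse (for example with SELECT ... FOR UPDATE).

```python
import sqlite3

MAX_ID = 65535  # upper bound from the question

# SQLite stands in for Oracle; helper names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, name TEXT, deleted TEXT NOT NULL DEFAULT 'N')"
)

def add_product(conn, name):
    """Reuse the lowest soft-deleted id if one exists, else take the next new id."""
    (free_id,) = conn.execute(
        "SELECT MIN(id) FROM product WHERE deleted = 'Y'").fetchone()
    if free_id is not None:
        conn.execute(
            "UPDATE product SET name = ?, deleted = 'N' WHERE id = ?",
            (name, free_id))
        return free_id
    # Stand-in for sequence.NEXTVAL in the Oracle design.
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM product").fetchone()
    new_id = max_id + 1
    if new_id > MAX_ID:
        raise RuntimeError("no free product id left")
    conn.execute("INSERT INTO product (id, name) VALUES (?, ?)", (new_id, name))
    return new_id

def delete_product(conn, product_id):
    """Soft delete: flag the row instead of removing it."""
    conn.execute("UPDATE product SET deleted = 'Y' WHERE id = ?", (product_id,))

id_a = add_product(conn, "a")
id_b = add_product(conn, "b")
id_c = add_product(conn, "c")
delete_product(conn, id_b)
id_d = add_product(conn, "d")  # recycles the id freed by deleting "b"
id_e = add_product(conn, "e")  # no free id left, so a new one is issued
print(id_a, id_b, id_c, id_d, id_e)
```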

create index before adding columns vs. create index after adding columns - does it matter?

In Oracle 10g, does it matter what order CREATE INDEX and ALTER TABLE come in?
Say I have a query Q with a WHERE clause on column C in table T. Now I perform one of the following scenarios:
1. Create index I(C), then add columns X, Y, Z.
2. Add columns X, Y, Z, then create index I(C).
Q is 'select * from T where C = whatever'
Between 1 and 2 will there be a significant difference in performance of Q on table T when T contains a very large number of rows?
I personally make it a practice to do #2 but others seem to have a different opinion.
thanks
It makes no difference if you add columns to a table before or after creating an index. The optimizer should pick the same plan for the query and the execution time should be unchanged.
Depending on the physical storage parameters of the table, it is possible that adding the additional columns and populating them with data may force quite a bit of row migration to take place. That row migration will generate changes to the indexes on the table. If the index exists when you are populating the three new columns with data, it is possible that populating the data in X, Y, and Z will take a bit longer because of the additional index maintenance.
If you add columns without populating them, then it is pretty quick as it is just a metadata change. Adding an index does require the table to be read (or potentially another index) so that can be very time consuming and of much greater impact than the simple metadata change of recording the new index details.
If the new columns are going to be populated as part of the ALTER TABLE, it is a different matter.
The database may undergo an unplanned shutdown while that data is being added to every row of the table.
The server memory may not have room to record every row changed in that table.
Those row changes may therefore be written to the datafiles before commit, and so are written as dirty blocks.
The next read of those blocks after the ALTER TABLE has successfully completed will do a delayed block cleanout (i.e., record the fact that the change has been committed).
If you add the columns (with data) first, then the create index will (probably) read the table and do the added work of the delayed block cleanout.
If you create the index first then add the columns, the create index may be faster but the delayed block cleanout won't happen and that housekeeping will be picked up by the application later (potentially by the select * from T where C = whatever)

Resources