I'm using Spring Data's saveAll to save 3,500 records in an Oracle database, but it executes very slowly. Is there a way to do a bulk insert, or any other faster way?
noteRepository.saveAll(noteEntityList);//<- this one is slow for 3000 records
Thanks in advance.
By default, saveAll does not batch the inserts; batch processing needs to be enabled.
You need to set the properties below to enable it:
spring.jpa.properties.hibernate.jdbc.batch_size=100
spring.jpa.properties.hibernate.order_inserts=true (if inserts)
OR
spring.jpa.properties.hibernate.order_updates=true (if updates)
The first property tells Hibernate to send statements to the database in batches of the given size; the second makes it order the statements by entity, so that statements for the same table can actually be grouped into a batch.
Check this thread for more details:
How to do bulk (multi row) inserts with JpaRepository?
Also, if you want to do batch inserts, make sure that if your table has an auto-generated column (say as a PK), it is backed by a sequence (not an identity), and that the allocationSize (Java) and INCREMENT BY value (DB sequence) are set to the batch size you are trying to persist. Don't set those values to 1, or inserts will still be slow, because JPA will keep going back to the DB to get the next value from the sequence.
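For example, the ID mapping could look roughly like this (a sketch; NoteEntity, note_seq and the value 100 are illustrative and should be replaced by your own entity, sequence name and batch size):

import javax.persistence.*;

@Entity
public class NoteEntity {

    // allocationSize should match both the sequence's INCREMENT BY and
    // hibernate.jdbc.batch_size, so Hibernate fetches ids once per batch
    // instead of once per row
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "note_seq_gen")
    @SequenceGenerator(name = "note_seq_gen", sequenceName = "note_seq", allocationSize = 100)
    private Long id;

    // other columns, getters and setters omitted
}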
Related
I am working on a Spring Boot project with an Oracle database. My task is to import data from a CSV file. The file contains 10,000 rows; each row corresponds to a row in a table and has three attributes, among them sCode and isDeleted (isDeleted=1 -> delete, isDeleted=0 -> update or insert). I am using a batch size of 1,000 for the inserts, updates and deletes. For each batch I do the following steps (a rough sketch follows the list):
Use JPA findAllBySCodeInAndDepartmentId to find all rows from the batch that already exist in the table (departmentId is the current user's department; (sCode, departmentId) is unique)
Put the result of the first step into a Map keyed by sCode
Take the rows in the batch that have isDeleted = 1 and exist in the Map, and delete them (I use JPA deleteAll)
Take the rows that have isDeleted = 0 and exist in the Map, and update them (I use JPA saveAll)
The remaining rows in the batch, which were neither updated nor deleted (not in the Map), are inserted into the table (I use JPA saveAll)
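In code, one batch is handled roughly like this (a simplified sketch; ScodeRow, ScodeEntity and scodeRepository are placeholder names, not the real ones from the project):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import org.springframework.transaction.annotation.Transactional;

@Transactional
public void processBatch(List<ScodeRow> batch, Long departmentId) {
    List<String> codes = batch.stream().map(ScodeRow::getSCode).collect(Collectors.toList());

    // steps 1-2: load the rows that already exist and index them by sCode
    Map<String, ScodeEntity> existing = scodeRepository
            .findAllBySCodeInAndDepartmentId(codes, departmentId).stream()
            .collect(Collectors.toMap(ScodeEntity::getSCode, Function.identity()));

    List<ScodeEntity> toDelete = new ArrayList<>();
    List<ScodeEntity> toSave = new ArrayList<>();

    for (ScodeRow row : batch) {
        ScodeEntity found = existing.get(row.getSCode());
        if (row.getIsDeleted() == 1) {
            // step 3: delete rows flagged for deletion that exist in the table
            if (found != null) {
                toDelete.add(found);
            }
        } else if (found != null) {
            // step 4: update rows that already exist
            found.updateFrom(row);
            toSave.add(found);
        } else {
            // step 5: insert the remaining rows
            toSave.add(new ScodeEntity(row.getSCode(), departmentId));
        }
    }

    scodeRepository.deleteAll(toDelete);
    scodeRepository.saveAll(toSave);
}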
It takes around 5 minutes to finish importing the 10,000 rows.
How can I shrink that to around 1 minute?
I'm looking at a Spring Boot application that is used to copy data from a temp table to a permanent table based on the last updated date. It copies a record only if its last updated date is greater than the desired date, so not all records are copied over. Currently the table has around 300K+ records, and the process with Spring JPA takes over 2 hours (for all of them), which is not at all feasible. The goal is to bring it down to 15 minutes at most. I'm trying to see how much difference using JdbcTemplate would make. Would a PL/SQL script be a better option? I also wanted to see if there are better options out there. Appreciate your time.
Using an Oracle database at the moment, but a PostgreSQL migration is on the cards.
Thanks!
You can do this operation with a straight SQL query (which will work on both Oracle and PostgreSQL). Assuming your temp_table has the same columns as the permanent table, the last-updated column is called last_updated, and you want to copy all records updated since 2020-05-03, you could write a query like:
INSERT INTO perm_table
SELECT *
FROM temp_table
WHERE last_updated > TO_DATE('2020-05-03', 'YYYY-MM-DD')
In your app you would pass '2020-05-03' as a bind parameter, either directly over JDBC or via JdbcTemplate.
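With JdbcTemplate, for example, that could look roughly like this (a sketch; jdbcTemplate is assumed to be an injected JdbcTemplate and the date is just the example value from above):

import java.sql.Date;
import java.time.LocalDate;

// single INSERT ... SELECT round trip; the cutoff date is passed as a bind parameter
int copied = jdbcTemplate.update(
        "INSERT INTO perm_table SELECT * FROM temp_table WHERE last_updated > ?",
        Date.valueOf(LocalDate.of(2020, 5, 3)));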
I have an app where I want to save some entities that have a few fields. I used Liquibase to insert some of them when the application starts. The problem is that when I try to save a new one, Hibernate tries to give it the id 1, but that id already exists in the database. How can I make Hibernate aware of the Liquibase inserts?
It depends on your ID generation strategy. If you are using a sequence, simply set the initial value of the sequence to 10,000. This will leave room for up to 10,000 records inserted by Liquibase.
For example, if you are using PostgreSQL you can do:
ALTER SEQUENCE sequence_name
MINVALUE 10000
START 10000
RESTART 10000;
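On the JPA side the entity then just has to use that sequence (a sketch; the entity and generator names are illustrative, and allocationSize = 1 assumes the sequence increments by 1):

import javax.persistence.*;

@Entity
public class MyEntity {

    // ids now come from sequence_name, whose values start above the rows
    // inserted by Liquibase
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "my_entity_gen")
    @SequenceGenerator(name = "my_entity_gen", sequenceName = "sequence_name", allocationSize = 1)
    private Long id;
}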
I am using CrudRepository with methods annotated with @Lock. This results in shared/exclusive locks on individual row(s). The underlying DB is PostgreSQL 9.4. But is there a way to lock the whole table for the transaction?
UPDATE:
Why I want to lock the whole table:
Consider a table whose rows each hold a single integer value. Every transaction has to compute the sum of the values in the table and insert that sum as a new row. The next transaction has to compute the sum again, this time including the newly inserted value.
There may be a better solution; other ideas are welcome.
You can try this when you open the transaction:
entityManager.createNativeQuery("LOCK TABLE public.table_name IN EXCLUSIVE MODE")
        .executeUpdate();
Of course this is PostgreSQL-specific, but then you also have a rather specific problem.
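Put together with the sum-and-insert use case above, that could look roughly like this (a sketch; the table and column names are placeholders):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class SumService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void insertSum() {
        // block other writers until this transaction commits
        entityManager.createNativeQuery(
                "LOCK TABLE public.values_table IN EXCLUSIVE MODE").executeUpdate();

        // compute the sum while holding the lock ...
        Number sum = (Number) entityManager
                .createNativeQuery("SELECT COALESCE(SUM(value), 0) FROM public.values_table")
                .getSingleResult();

        // ... and insert it as a new row before the lock is released at commit
        entityManager.createNativeQuery("INSERT INTO public.values_table (value) VALUES (:v)")
                .setParameter("v", sum.longValue())
                .executeUpdate();
    }
}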
I have a database with about 125,000 rows; each row has a primary key, a couple of int columns and a couple of varchars.
I've added an int column and I'm trying to populate it before adding a NOT NULL constraint.
The db is persisted in a script file. I've read somewhere that all the affected rows get loaded into memory before the actual update, which means there won't be a disk write for every row. The whole db is about 20 MB, so loading it and doing the update should be reasonably fast, right?
So, no joins, no nested queries, just a basic update.
I've tried multiple db managers, including the one bundled with the HSQLDB jar.
update tbl1 set col1 = 1
The query never finishes executing.
It is probably running out of memory.
The easier way to do this is to define the column with DEFAULT 1, which does not use much memory regardless of the size of the table. You can even add the NOT NULL constraint at the same time:
ALTER TABLE T ADD COLUMN C INT DEFAULT 1 NOT NULL