Data migration using a Spring Boot application is taking too much time

I have created a command-line application in Java using Spring Boot which migrates data from an Oracle database to a MySQL database.
I am doing the following in my service class:
@Service
public class MyService {

    @Autowired
    public OracleUserRepository oracleUserRepository;
    @Autowired
    public OracleUserAddressRepository oracleUserAddressRepository;
    @Autowired
    public OracleUserDetailsRepository oracleUserDetailsRepository;
    @Autowired
    public MysqlUserRepository mysqlUserRepository;
    @Autowired
    public MysqlUserAddressRepository mysqlUserAddressRepository;
    @Autowired
    public MysqlUserDetailsRepository mysqlUserDetailsRepository;

    public void migrateData() {
        List<OracleUserEntity> oracleUserEntities = oracleUserRepository.findAll();
        for (OracleUserEntity oracleUserEntity : oracleUserEntities) {
            migrateEntity(oracleUserEntity);
        }
    }

    @Transactional("mysqlTransactionManager")
    public void migrateEntity(OracleUserEntity oracleUserEntity) {
        OracleUserAddressEntity oracleUserAddressEntity = getAddressEntity(oracleUserEntity);
        OracleUserDetailsEntity oracleUserDetailsEntity = getDetailsEntity(oracleUserEntity);

        MysqlUserEntity mysqlUserEntity = convertToMysqlUserEntity(oracleUserEntity);
        mysqlUserRepository.save(mysqlUserEntity);

        MysqlUserAddressEntity mysqlUserAddressEntity = convertToMysqlAddressEntity(oracleUserAddressEntity);
        mysqlUserAddressRepository.save(mysqlUserAddressEntity);

        MysqlUserDetailsEntity mysqlUserDetailsEntity = convertToMysqlUserDetailsEntity(oracleUserDetailsEntity);
        mysqlUserDetailsRepository.save(mysqlUserDetailsEntity);
    }
}
I am saving each user inside a transaction because I want to roll back if saving either the user address entity or the user details entity fails.
I have around 70K entries in the Oracle database. The oracleUserRepository.findAll() call alone took around 40 minutes to load all entities, and saving the entities to the MySQL DB is taking even more time.
Is this the correct way to do it? Is there any way to improve the performance?

JPA does have batch functionality to improve performance.
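For example, here is a rough sketch of chunked writes on the MySQL side (hibernate.jdbc.batch_size is a standard Hibernate setting; the chunk size, the saveUsersInChunks name and the injected mysqlEntityManager are assumptions, and note that JDBC insert batching is silently skipped for IDENTITY id generation):

// Sketch only: assumes something like spring.jpa.properties.hibernate.jdbc.batch_size=50
// is set so Hibernate can group the INSERTs into JDBC batches.
@Transactional("mysqlTransactionManager")
public void saveUsersInChunks(List<MysqlUserEntity> users) {
    final int chunkSize = 50; // align with hibernate.jdbc.batch_size
    for (int start = 0; start < users.size(); start += chunkSize) {
        List<MysqlUserEntity> chunk =
                users.subList(start, Math.min(start + chunkSize, users.size()));
        mysqlUserRepository.saveAll(chunk); // save(Iterable) on older Spring Data versions
        // Flushing and clearing pushes each chunk out as one batch and keeps the
        // persistence context small; mysqlEntityManager is assumed to be the
        // EntityManager bound to the MySQL persistence unit.
        mysqlEntityManager.flush();
        mysqlEntityManager.clear();
    }
}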
However, this use case seems ideal for a scripting solution so that one has full control of the SQL statements and transactions.

70K rows shouldn't be a problem. Try reading the rows in chunks and inserting them into MySQL as you go, instead of loading everything into memory with findAll(). With that approach you can also drop the per-entity @Transactional, which adds a little overhead. When a row fails, log the exception and retry, or, if it is a data issue, fix the data and rerun.
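A sketch of that chunked read-and-write loop, assuming OracleUserRepository is a paging repository (so findAll(Pageable) is available), that the entity exposes getId(), and that log is whatever logger the class uses:

int pageSize = 500;
Page<OracleUserEntity> page = oracleUserRepository.findAll(PageRequest.of(0, pageSize));
while (true) {
    for (OracleUserEntity oracleUserEntity : page.getContent()) {
        try {
            migrateEntity(oracleUserEntity); // one user (plus address/details) per call
        } catch (Exception e) {
            // log and keep going; failed rows can be fixed and re-run later
            log.error("Failed to migrate user {}", oracleUserEntity.getId(), e);
        }
    }
    if (!page.hasNext()) {
        break;
    }
    page = oracleUserRepository.findAll(page.nextPageable());
}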

This isn't going to be very efficient, but it should not take 40 minutes for a findAll() call to load 70K records... assuming, of course, that your findAll() simply returns every row and isn't overloaded with a limiting selection on a non-indexed column of a huge table. Are you certain you are not resource-bound on your Java process, or on the Oracle server? Is your JVM heap big enough?
If you connect to the database with SQL*Plus and run the equivalent SELECT against your user table, how long does that take? If that is slow as well, you have some sort of resource/contention/locking problem at the database level... If it responds quickly but your program still takes 40 minutes to get the result, you need to look at your JVM: is it resource-bound? etc.

Related

Getting an OOM (Java out of memory) issue while doing a bulk delete operation from Java via a MongoTemplate method

I created a scheduler to delete older records from MongoDB, which runs once a day, but I am getting an OOM issue while deleting the records from the DB.
There are more than 50K records to be deleted.
The method looks like this:
@Override
public void purgeLteTimeBlock(Long timeBlock) {
    Query query = new Query();
    query.addCriteria(Criteria.where(Constants.TIME_BLOCK).lte(timeBlock));
    mongoTemplate.findAllAndRemove(query, abcEntity.class);
}
From our observation we have found that MongoTemplate provides three findAllAndRemove methods, and each of them returns the list of objects that are being deleted.
So we think this might be the reason for the OOM (out of memory) issue, because it hands back more than 50K records to the code.
Is there any solution for handling this kind of delete operation in MongoDB?
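If you do not need the deleted documents back, one option (not shown in the question) is mongoTemplate.remove(Query, Class), which issues the delete without materializing the matched documents; a minimal sketch:

@Override
public void purgeLteTimeBlock(Long timeBlock) {
    Query query = new Query();
    query.addCriteria(Criteria.where(Constants.TIME_BLOCK).lte(timeBlock));
    // remove(...) deletes the matching documents without loading and returning them,
    // unlike findAllAndRemove(...), so the 50K+ documents never reach the heap
    mongoTemplate.remove(query, abcEntity.class);
}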

RowCallbackHandler loads rows into memory

I need to query a big dataset from the DB. I plan to use pagination parameters (limit and offset) to avoid loading the large dataset into the heap. For that purpose I'm trying to fetch rows with the RowCallbackHandler interface, because the docs say "An interface used by JdbcTemplate for processing rows of a ResultSet on a per-row basis", and I've also read advice to use that interface to deal with rows one by one.
But something goes wrong every time I try to fetch data. Here is my code, together with a VisualVM screenshot of the heap space graph which indicates that all rows were loaded into memory. The query I'm trying to execute returns around 1.5M rows.
// here just the SQL query, a map with parameters for the query, and a pretty simple RowCallbackHandler
jdbcTemplate.query(queryForExecute, params, new RowCallbackHandler() {
    @Override
    public void processRow(ResultSet rs) throws SQLException {
        while (rs.next()) {
            System.out.println("test");
        }
    }
});
Heap usage via VisualVM (screenshot not included).
Update: I made a mistake by calling rs.next() myself, but removing that line didn't change the situation; the rows are still all loaded into memory.
The main problem was with my understanding of the documentation. The doc says:
An interface used by JdbcTemplate for processing rows of a ResultSet on a per-row basis.
My code actually behaves as designed: it gives me a ResultSet that contains all rows (because no limit is defined). I wasn't confident that adding LIMIT to an arbitrary SQL query would work well, so I tried to implement the limiting via the RowCallbackHandler instead, which was a bad idea: LIMIT works fine with all kinds of SQL queries, complex and simple.
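A related knob, not part of the resolution above, is the JDBC fetch size: it, rather than the RowCallbackHandler, decides whether rows are streamed or buffered. A sketch on a plain JdbcTemplate (the table name is a placeholder; Integer.MIN_VALUE is the MySQL Connector/J convention for streaming, other drivers want a positive value, PostgreSQL additionally needs autocommit off, and Spring versions before 4.3 ignore non-positive fetch sizes):

JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
// Ask the driver to stream rows instead of buffering the whole result set.
jdbcTemplate.setFetchSize(Integer.MIN_VALUE); // MySQL Connector/J streaming hint

jdbcTemplate.query("SELECT * FROM big_table", rs -> {
    // processRow is invoked once per row; there is no need to call rs.next() here
    System.out.println(rs.getLong(1));
});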

How to make a Spring Boot POST API idempotent

I have created a simple CRUD API using Spring Data JPA in my Spring Boot application. My POST method in the controller looks like this:
@RequestMapping(value = "/article", method = RequestMethod.POST, produces = "application/json")
public Article createArticle(@RequestBody Article article) {
    return service.createArticle(article);
}
The service method is as follows:
@Override
public Article createArticle(Article articleModel) {
    return repository.save(articleModel);
}
My JSON payload looks like this:
{
    "article_nm": "A1",
    "article_identifier": "unique identifier"
}
Now I want to make my POST request idempotent, so that even if I receive a JSON payload with the same article_identifier again, it will not create another record in the DB.
I can't make any schema/constraint change in the database, and the article_identifier field is not the primary key of the table either.
I understand that I could first check the database and return the already-saved record in the response if it exists, but if multiple requests (the original and a duplicate) arrive at the same time, both will check the database, find no record with that identifier, and create two records (one each). Also, as it's a distributed application, how can I maintain consistency across multiple database transactions?
How can I use some locking mechanism so that there will never be two records with the same article_identifier? Can somebody please suggest some references on how to implement this in Spring Boot?
Idempotency in this case is needed to handle the post-back (or double POST request). The simplest way would be to check at the service level whether an article with the given information already exists (as you pointed out). You can use the repository.exists() variations for that.
I understand that I could first check the database and return the already-saved record in the response if it exists
As for
if multiple requests (the original and a duplicate) arrive at the same time, both will check the database, find no record with that identifier, and create two records (one each)
You need to isolate the transactions from each other if it is a single database (I know you said it is not, but I'm trying to explain my reasoning, so bear with me). For that, Spring has the following annotation: @Transactional(isolation = Isolation.SERIALIZABLE). Although in this case @Transactional(isolation = Isolation.REPEATABLE_READ) would be enough.
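A minimal sketch of the check-then-save service method under that isolation level (existsByArticleIdentifier and findByArticleIdentifier are hypothetical derived query methods you would add to the repository, and getArticleIdentifier is the assumed getter):

@Override
@Transactional(isolation = Isolation.SERIALIZABLE)
public Article createArticle(Article articleModel) {
    // Serializable isolation forces two concurrent check-then-insert transactions
    // to behave as if they ran one after the other, so both cannot pass the
    // exists() check with the same identifier.
    if (repository.existsByArticleIdentifier(articleModel.getArticleIdentifier())) {
        return repository.findByArticleIdentifier(articleModel.getArticleIdentifier());
    }
    return repository.save(articleModel);
}

Be aware that under SERIALIZABLE one of two colliding transactions may be rolled back with a serialization error, so the caller should be prepared to retry.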
Also, as it's a distributed application, how can I maintain consistency across multiple database transactions?
How is it distributed? You first need to think about the database. Is it a master-slave MySQL / Postgres / MongoDB setup? Is it some exotic globally distributed system? Assuming it is the traditional master-slave setup, the write transaction will be handled by the master (to my knowledge, all the SELECTs belonging to the transaction will go there as well), so there should be no problem. However, the answer can only really be given if more details are provided.

Hibernate Envers with QueryDSL Update

Hibernate, Hibernate Envers and QueryDSL are configured and working correctly in a Spring Boot 1.4.1.RELEASE application.
The problem is that when using UpdateClause<JPAUpdateClause> updateQueryBuilder = queryFactory.update(collectionTransaction); to build and execute an update query, Hibernate Envers does not pick up and audit those changes.
The following is the Spring Data JPA repository implementation that uses QueryDSL:
public class CollectionTransactionRepositoryImpl extends QueryDslRepositorySupport implements CollectionTransactionRepositoryCustom {

    @Autowired
    private JPAQueryFactory queryFactory;

    public CollectionTransactionRepositoryImpl() {
        super(CollectionTransaction.class);
    }

    @Override
    public Collection<CollectionTransaction> updateCollectionTransaction(UpdateCollectionTransaction updateCollectionTransaction) {
        QCollectionTransaction collectionTransaction = QCollectionTransaction.collectionTransaction;
        UpdateClause<JPAUpdateClause> updateQueryBuilder = queryFactory.update(collectionTransaction);
        // ... code omitted for brevity
        long updated = updateQueryBuilder.execute();
        // ...
        return ...
    }
}
Is it possible for Hibernate Envers to pick up the changes in this situation?
This is a known concern outlined in JIRA HHH-10318.
Envers works based on Hibernate's event subsystem where Hibernate effectively notifies various callbacks that state for an entity has been modified in some way, and provides both the prior and new entity state. This state is precisely what Envers uses to determine what changed and insert audit change rows.
Let's take a trivial example:
UPDATE MyEntity e SET e.status = :status
Hibernate will perform the following tasks:
Flush any pending modifications in the persistence context.
Invalidate any cached instances of MyEntity.
Execute the bulk update operation.
Nowhere in any of these steps does Hibernate load any existing state. It simply guarantees that current changes are flushed prior to the bulk update and that any subsequent operations will fetch from the datastore rather than from a cache, because of the bulk update.
Therefore, from Envers' perspective, it gets no callbacks and thus isn't aware that any operation took place, because Hibernate ORM cannot provide any entity state for such an operation; that state simply does not exist.
The big question here is how (if possible) to model and handle a change unit for such an operation.
It's difficult because Envers would effectively need some type of PreBulkOpEvent, so that it can cache the state that is about to change, and a PostBulkOpEvent to re-query and merge the two results to generate change-log entries. The concern with such a concept really centers around how to do this efficiently and avoid:
Running out of memory due to large result-set manipulation.
Long execution times loading state from the datastore for large result-set manipulation.
Anyway, you're welcome to read over the JIRA issue and provide any feedback or ideas. But presently, it's just something that falls outside the scope of what we can capture.
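Until something along those lines exists, a common workaround (not Envers functionality, just a sketch of the idea) is to give Envers entity-level events to listen to: load the affected rows and update them through the persistence context instead of issuing the bulk UPDATE, at the cost of one UPDATE per row. The filter and setters below are hypothetical:

@Transactional
public long updateWithAudit(UpdateCollectionTransaction update) {
    QCollectionTransaction ct = QCollectionTransaction.collectionTransaction;
    // Load the rows the bulk update would have touched...
    List<CollectionTransaction> affected = queryFactory
            .selectFrom(ct)
            .where(ct.status.eq(update.getOldStatus())) // hypothetical filter
            .fetch();
    // ...and modify them as managed entities, so Hibernate fires its usual update
    // events and Envers records an audit revision for each one.
    for (CollectionTransaction tx : affected) {
        tx.setStatus(update.getNewStatus()); // hypothetical setter
    }
    return affected.size();
}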

In-memory database with Hibernate, periodically persisting to an actual DB

I would like to use an in-memory DB with Hibernate so that my queries are super quick.
Moreover, I would like to periodically persist that in-memory state to a real MySQL DB.
Of course, the in-memory database should load its initial content on startup from that MySQL DB.
Are there any good frameworks/practices for that purpose? (I'm using Spring.) Any tutorials or pointers will help.
I'll be honest with you: most decent databases can be considered in-memory to an extent, given that they cache data and avoid hitting the disk as much as they can. In my experience the best in-memory databases are either caches or amalgamations of other data sources that are already persisted in some other form, updated live for time-critical information or refreshed periodically for non-time-critical information.
Loading data into memory from a cold start is potentially a lengthy process, but subsequent queries will be super quick.
If you are trying to cache what's already persisted you can look at memcache, but in essence in-memory databases always rely on a more persistent source, be it MySQL, SQL Server, Cassandra, MongoDB, you name it.
So it's a little unclear what you're trying to achieve; suffice it to say that it is possible to bring data in from persistent databases and keep a massive in-memory cache, but you need to design around how stale certain data can get and how often you need to hit the real source for up-to-the-second results.
Actually, the simplest approach would be to use some core Hibernate features for that: use the Hibernate Session itself and combine it with the second-level cache.
Declare the entities you want to cache as @Cacheable:
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.NON_STRICT_READ_WRITE)
public class SomeReferenceData { ... }
Then implement the periodic flushing like this, supposing you are using JPA:
open an EntityManager
load the entities you want to cache using that entity manager and no other
keep the entity manager open until the next periodic flush; Hibernate keeps track of which instances of SomeReferenceData were modified in memory via its dirty-checking mechanism, but no modification queries are issued
reads against the database are avoided via the second-level cache
when the moment comes to flush the session, just begin a transaction and commit immediately
Hibernate will update the modified entities in the database, update the second-level cache and resume execution
eventually close the entity manager and replace it with a new one if you want to reload everything from the database
otherwise keep the same entity manager open
Code example, to see the overall idea:
public class PeriodicDBSynchronizeTest {

    @Test
    public void testSynch() {
        // create the entity manager, and keep it
        EntityManagerFactory factory = Persistence.createEntityManagerFactory("testModel");
        EntityManager em = factory.createEntityManager();

        // kept in memory due to @Cacheable
        SomeReferenceData ref1 = em.find(SomeReferenceData.class, 1L);
        SomeReferenceData ref2 = em.find(SomeReferenceData.class, 2L);
        SomeReferenceData ref3 = em.find(SomeReferenceData.class, 3L);
        ....

        // modifications are tracked but not committed
        ref1.setCode("005");

        // these two lines will flush the modifications into the database
        em.getTransaction().begin();
        em.getTransaction().commit();

        // continue using the reference data, tracking modifications until the next flush
        ...
    }
}
