Spring Batch/Data JPA application not persisting data to db - spring-boot

I'm having a really weird issue. I have to say that my code works perfectly fine locally, but some data is not persisted in our pod (Kubernetes environment).
I have different datasources to work with in this batch. Everything runs fine. The job repository is map-based and uses a ResourcelessTransactionManager, which I configured like this:
@Configuration
@EnableBatchProcessing
public class BatchConfigurer extends DefaultBatchConfigurer {

    @Override
    public void setDataSource(DataSource dataSource) {
    }
}
I also use a different PlatformTransactionManager than Spring Batch does (issue), so I set spring.main.allow-bean-definition-overriding=true in my properties. The platform transaction manager bound in my configurer is the right one; I debugged it.
I have a custom writer for one of my steps. It updates records in multiple tables that live in multiple dbs (different datasources, in brief):
public class MyWriter implements ItemWriter<MyDTO> {

    @Autowired
    private MyFirstRepo myfirstRepo; // table in first datasource

    @Autowired
    private MySecondRepo mySecondRepo; // table in second datasource

    @Override
    public void write(List<? extends MyDTO> myDtoList) throws Exception {
        // some logic
        mySecondRepo.delete(deletableEntity);
        // some logic
        mySecondRepo.saveAll(updatableEntities);
        // some logic
        myfirstRepo.saveAll(updatableEntities);
    }
}
Since I have multiple datasources, I defined multiple transaction managers, and to give my step a transaction manager I defined a ChainedTransactionManager that includes those managers:
@Bean
public Step myStep(@Qualifier("chainedTransactionManager") ChainedTransactionManager chainedTransactionManager) {
    return getCommonStepBuilder("myStep")
            .transactionManager(chainedTransactionManager)
            .<MyDTO, MyDTO>chunk(200)
            .reader(myPaginingReader())
            .writer(myWriter())
            .taskExecutor(myTaskExecutor())
            .throttleLimit(15)
            .build();
}
Chained transaction manager config (both of these transaction managers are JpaTransactionManagers):
@Configuration
public class TransactionManagerConfig {

    @Primary
    @Bean(name = "chainedTransactionManager")
    public ChainedTransactionManager transactionManager(
            @Qualifier("firstTransactionManager") PlatformTransactionManager firstTransactionManager,
            @Qualifier("secondTransactionManager") PlatformTransactionManager secondTransactionManager) {
        return new ChainedTransactionManager(firstTransactionManager, secondTransactionManager);
    }
}
So my first two JPA operations in the writer work just fine (the operations made through MySecondRepo), but the last operation does not persist data to the db. It doesn't throw any errors and the job completes successfully, but it doesn't update my records in the table.
I must mention a second time that it does update them in my local environment. It just doesn't update them in our app that lives in the k8s environment (dockerized microservice), which makes it so confusing. Any idea why this is happening?
Edit: I created another writer bean for myfirstRepo.saveAll(updatableEntities) (as a JdbcBatchItemWriter executing the same logic) and added the two writers to a composite one. Now it works as expected. But I have a lot of concerns, since I don't know what caused this. Any idea?
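For reference, a minimal sketch of that composite wiring (the delegate bean names are illustrative, not my actual ones):

@Bean
public CompositeItemWriter<MyDTO> compositeWriter() {
    CompositeItemWriter<MyDTO> composite = new CompositeItemWriter<>();
    // Both delegates run in the same chunk transaction, in list order.
    composite.setDelegates(Arrays.asList(mySecondRepoWriter(), myFirstRepoJdbcWriter()));
    return composite;
}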
Edit 2: I came across this thread. I was using a JdbcPagingItemReader; are entities fetched with this component in a managed state? The entities in mySecondRepo.delete(deletableEntity) and mySecondRepo.saveAll(updatableEntities) are fetched inside the writer using Hibernate, but the entities in myfirstRepo.saveAll(updatableEntities) are the ones that came from the reader.
It would all make sense if that is the case, but even then, why did it work fine locally?

The entities in mySecondRepo.saveAll(updatableEntities) are fetched inside the writer using Hibernate, but the entities in myfirstRepo.saveAll(updatableEntities) are the ones that came from the reader.
Fetching items in the item writer is the cause of your issue. This is incorrect: an item writer is not expected to read items. That's why the items coming from the reader are saved, but not the ones fetched in the writer.
What you should know is that all writers in a composite run in the scope of a single transaction, driven by the transaction manager of the step. So if you are writing data to multiple datasources, you need to make sure the transaction manager coordinates the transaction across all datasources. ChainedTransactionManager is deprecated; you can use a JtaTransactionManager for your case.
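A minimal sketch of what that could look like, assuming both datasources are XA-capable and a JtaTransactionManager bean (backed by a JTA provider such as Atomikos or Narayana) is already configured; getCommonStepBuilder and the reader/writer are the ones from the question:

@Bean
public Step myStep(JtaTransactionManager jtaTransactionManager) {
    return getCommonStepBuilder("myStep")
            .transactionManager(jtaTransactionManager) // one coordinated transaction across both datasources
            .<MyDTO, MyDTO>chunk(200)
            .reader(myPaginingReader())
            .writer(myWriter())
            .build();
}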

Related

Spring multithreading with hibernate

Some short version info: Spring Boot 2.1 with Hibernate 5 and Java 8.
We are trying to do a multithreaded processing step in which we use Spring services to work with Hibernate entities. Basically it looks like the following snippet:
ExecutorService executorService = Executors.newFixedThreadPool(4);
List<Callable<String>> executions = new ArrayList<>();
for (String partition : partitions) {
    Callable<String> partitionExecution = () -> {
        step.execute(partition);
        return partition;
    };
    executions.add(partitionExecution);
}
executorService.invokeAll(executions);
The problem is that the Hibernate session is somehow not available in the created threads. We get the following exception:
org.hibernate.LazyInitializationException:
failed to lazily initialize a collection of role: ..., could not initialize proxy - no Session
If I remove the multithreading part (i.e. remove the executor service), everything works fine.
We already tried the following:
Use a Spring-managed ThreadPoolTaskExecutor
Put @Transactional at the top of the method/class (which is wired from another class and invoked there, so it should basically work)
Any hints/suggestions appreciated :)
I came up with a working version. Basically, I let Spring spawn the threads itself using the @Async annotation. This way the created threads get the wanted Hibernate session attached.
I created a Spring service that delegates through an async method:
@Component
public class AsyncDelegate {

    @Async
    @Transactional
    public Future<String> delegate(Step step, String partition) {
        step.execute(partition);
        return new AsyncResult<>(partition);
    }
}
And adapted the initial code like this:
@Autowired
AsyncDelegate asyncDelegate;

List<Future<String>> executions = new ArrayList<>();
for (String partition : partitions) {
    executions.add(asyncDelegate.delegate(step, partition));
}
In short, I'd strongly advise against passing Hibernate-managed objects across multiple threads, as this can cause:
Lazy initialization problems (as you encountered)
Missing updates
Locking exceptions
For more details, this blog post explains why: https://xebia.com/blog/hibernate-and-multi-threading/
For a solution, I'd find a way to split the work so that each thread gets a list of entity ids to work with and does its independent work inside the thread (fetch the entities from the database, do the actual work, etc.), as sketched below.
This is briefly mentioned in the above blog post.
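A minimal sketch of that id-based split; MyEntityRepository and MyEntity are hypothetical stand-ins for your own Spring Data JPA types. Each thread fetches its own entities inside its own transaction, so no Hibernate-managed object crosses a thread boundary:

@Component
public class PartitionWorker {

    @Autowired
    private MyEntityRepository repository; // hypothetical Spring Data JPA repository

    @Async
    @Transactional
    public Future<Integer> process(List<Long> ids) {
        // Entities are fetched in this thread's own session, not passed in from outside.
        List<MyEntity> entities = repository.findAllById(ids);
        entities.forEach(entity -> { /* do the actual work */ });
        return new AsyncResult<>(entities.size());
    }
}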

Understanding Redis inside Spring Boot

I have a Spring Boot application, where I need to get data from a table when the app initializes.
I have a repository with the following code:
@Repository
public interface BookRepository extends JpaRepository<Book, Integer> {

    Book findByName(String name);

    @Cacheable("books")
    List<Book> findAll();
}
Then from my service:
@Service
public class ServiceBooks {

    @Autowired
    private BookRepository booksRepo;

    public void findAll() {
        booksRepo.findAll();
    }

    public void findByName(String name) {
        booksRepo.findByName(name);
    }
}
And then I have a class that implements CommandLineRunner:
@Component
public class AppRunner implements CommandLineRunner {

    private final BookRepository bookRepository;

    public AppRunner(BookRepository bookRepository) {
        this.bookRepository = bookRepository;
    }

    @Override
    public void run(String... args) throws Exception {
        bookRepository.findAll();
    }
}
So here, when the application initializes, it queries the Books table and caches the result. Inside the application, each time I call findAll(), the cache works and I get the data from my cache.
So here are my 2 questions:
About Redis: I am not using Redis, and I am doing database caching without any problem. So where does Redis fit into this approach? I don't understand why everybody uses Redis when the cache works without needing other libraries.
When I call findByName(name), is there any chance the query executes over the data I already have cached? I know I can put a cache on that method, but that cache will save data each time a particular name is searched; if a name is searched for the first time, it will go to the database for that value. I don't want that. I would like Spring to perform the query over the data from the first cache, where I have all the Books.
The answers to your questions:
Redis avoids the DB call, as it stores your response in memory. You can use @Cacheable even in a controller or a service. If you use @Cacheable in a controller, your request will not even execute the controller method if the result is already cached.
For findByName, Redis provides a nice way to store data based on keys.
Refer to the link Cache Keys.
Once you request by name, the data is fetched from the DB; the next time you request the same name, it is served from the cache, based on the key.
Coming back to your question: no, you should not do a search over your cached data. Caches are highly volatile, so you cannot trust the data from the cache; searching through cached data might also affect performance, and you might need to write unneeded additional code.
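A minimal sketch of that key-based caching on the repository method (the cache name "booksByName" is illustrative):

@Cacheable(value = "booksByName", key = "#name") // one cache entry per distinct name
Book findByName(String name);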
Spring Boot manages the cache per application or per service. When you run multiple instances of a service or app, you will certainly want to manage the cache centrally, because a per-service cache is not usable in that case: what one app caches in its own Spring Boot instance is not accessible by the other apps.
So here Redis comes into the picture. If you use Redis, each instance of the service connects to the same Redis cache and gets the same result.
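A minimal sketch of pointing the same @Cacheable annotations at a central Redis-backed cache, assuming spring-boot-starter-data-redis is on the classpath:

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        // Every instance connecting to the same Redis sees the same "books" cache.
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(RedisCacheConfiguration.defaultCacheConfig()
                        .entryTtl(Duration.ofMinutes(10))) // java.time.Duration
                .build();
    }
}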

Can Spring Data MongoDB be configured to support a different database for each repository?

I've been struggling for the past week to successfully integrate Spring Data MongoDB into our application. We use the fairly common practice of having separate databases for each collection that we rely on. For instance, TenantConfiguration database contains only the TenantConfigurations collection.
I've read through the documentation several times and trawled through the code for a solution but have turned up nothing. Surely such a widely adopted project has some solution for this issue? My current attempt looks like this:
@Configuration
@EnableMongoRepositories(basePackages = "com.whatever.service.repository",
        basePackageClasses = TenantConfigurationRepository.class,
        mongoTemplateRef = "tenantConfigurationTemplate")
public class TenantConfigurationRepositoryConfig {

    @Value("${mongo.hosts}")
    private List<String> mongoHosts;

    @Bean
    public MongoTemplate tenantConfigurationTemplate() throws Exception {
        final List<ServerAddress> serverAddresses = new ArrayList<>();
        for (String host : mongoHosts) {
            serverAddresses.add(new ServerAddress(host, 27017));
        }
        final MongoClientOptions clientOptions = new MongoClientOptions.Builder()
                .connectTimeout(25000)
                .readPreference(ReadPreference.primaryPreferred())
                .build();
        final MongoClient client = new MongoClient(serverAddresses, clientOptions);
        return new MongoTemplate(client, "TenantConfiguration");
    }
}
Here is one of the other individual repository configurations:
@Configuration
@EnableMongoRepositories(basePackages = "com.whatever.service.repository",
        basePackageClasses = RegisteredCardRepository.class,
        mongoTemplateRef = "registeredCardTemplate")
public class RegisteredCardRepositoryConfig {

    @Value("${mongo.hosts}")
    private List<String> mongoHosts;

    @Bean
    public MongoTemplate registeredCardTemplate() throws Exception {
        final List<ServerAddress> serverAddresses = new ArrayList<>();
        for (String host : mongoHosts) {
            serverAddresses.add(new ServerAddress(host, 27017));
        }
        final MongoClientOptions clientOptions = new MongoClientOptions.Builder()
                .connectTimeout(25000)
                .readPreference(ReadPreference.primaryPreferred())
                .build();
        final MongoClient client = new MongoClient(serverAddresses, clientOptions);
        return new MongoTemplate(client, "RegisteredCard");
    }
}
Now here is the actual repository definition for the RegisteredCard repository:
@Repository
public interface RegisteredCardRepository extends MongoRepository<RegisteredCard, Guid>,
        QueryDslPredicateExecutor<RegisteredCard> { }
This all makes perfect sense to me: the individual configurations uniquely identify the specific repository interfaces they configure and, via the mongoTemplateRef parameter of the annotation, the specific template bean to use with each repository. At least, this is how the documentation seems to imply it should work.
In reality, when I start up the application, the RegisteredCard repository resolves to a MongoDB repository instance with an associated MongoDbFactory that is bound to the TenantConfiguration database. In fact, every single repository receives the same, incorrect MongoOperations object. Despite each repository having its own unique configuration, whatever database is accessed first remains the target database for every repository.
Are there any solutions available to this problem?
It's taken me almost a week, but I've actually found a passable solution to this issue. Here's a quick run-down of facts I've picked up while researching it:
@EnableMongoRepositories(basePackageClasses = Whatever.class) simply uses a qualified class name to indicate which package it should scan for all of your defined data models. It is entirely equivalent to @EnableMongoRepositories(basePackages = "com.mypackage.whatevers") if Whatever.class resides in that package.
@EnableMongoRepositories is not repeatable but can be used to annotate several classes. This has been covered in other SO conversations but bears repeating here. You will need to define several repository configuration classes, one for each database you intend to interact with.
Each of your individual repository configurations must specify its own MongoTemplate instance in the @EnableMongoRepositories annotation. You can get away with providing only a single Mongo bean, but the MongoTemplate relies on a specific MongoMappingContext.
The @EnableMongoRepositories annotation helps define your mapping context, which understands the structure of your data models and how to serialize them. It also understands the @Document and @Field annotations and does the heavy lifting of persisting your objects. The MongoTemplate instances are where you specify which database you want to interact with. So by providing the @EnableMongoRepositories annotation with both a basePackages attribute and a mongoTemplateRef attribute, you can tell Spring Data Mongo to "take these models and persist them in this specific database".
The unfortunate requirement of this solution is that you must organize your data models into separate packages depending on which database they belong in. If, like me, you are using a Mongo database structure that allocates a single collection to each database (fairly common for heavily accessed collections), each of your data models must reside in its own package. Each of these packages must be pointed to by an @EnableMongoRepositories annotation that also carries a mongoTemplateRef attribute referring to a unique MongoTemplate bean, as sketched below.
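A minimal sketch of that per-database layout (one configuration class per database, in separate files; the package names are illustrative):

@Configuration
@EnableMongoRepositories(
        basePackages = "com.whatever.service.repository.registeredcard", // models + repositories for this db only
        mongoTemplateRef = "registeredCardTemplate")
public class RegisteredCardRepositoryConfig { /* registeredCardTemplate bean as shown earlier */ }

@Configuration
@EnableMongoRepositories(
        basePackages = "com.whatever.service.repository.tenantconfiguration",
        mongoTemplateRef = "tenantConfigurationTemplate")
public class TenantConfigurationRepositoryConfig { /* tenantConfigurationTemplate bean as shown earlier */ }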
I hope this helps someone avoid the trouble I've gone through trying to accomplish what should be a fairly run-of-the-mill Mongo integration.
PS: Abandon all hope, those who seek to combine auditing with this configuration.
I know this is old, but for those who are looking for a short solution like me:
@Autowired
@Qualifier("registeredCardTemplate")
private MongoTemplate template;

The qualifier name is whatever you set as mongoTemplateRef in the corresponding @EnableMongoRepositories configuration.
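Usage sketch: operations on the qualified template go to the database that template was created with:

List<RegisteredCard> cards = template.findAll(RegisteredCard.class); // reads from the "RegisteredCard" database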

Accessing Spring @Transactional service from multiple threads

I would like to know if the following is considered safe.
A usual Spring service class that accesses a bunch of DAOs / Hibernate entities:
@Transactional
public class MyService {
    ...
    public SomeObject readStuffFromDB(String key) {
        ...
        // return some records from the DB via hibernate entities etc.
    }
}
A class in the application that has the service wired in:
public class ServiceHolder {

    private MyService myService;

    private SomeOtherObject multiThreadedMethod() {
        ...
        // calls myService.readStuffFromDB() and uses the results
        // to return something useful
    }
}
multiThreadedMethod will be called from multiple thread pool threads. I would like to know if multiThreadedMethod is safe in its calls to myService.
It is NOT making any modifications to the DB; it is only reading.
What happens if two threads call myService.readStuffFromDB() at exactly the same time? Will a concurrent modification exception be thrown from somewhere?
I've been running it with no issues but I'm not 100% sure it will always work.
Yes, you will call the same object at the same time, as long as your service bean is defined as a singleton (which is the default and the proper scope), but you should not keep state in your services. The methods should be written in such a way that they can work independently (you don't need mutual exclusion here). If you call the db and try to do any operations, nothing bad happens, because every thread receives its own entity manager. If you modified the db at the same time and any kind of db exception were thrown, you would get a rollback exception, which is perfectly fine.
entityManager.persist() will do more or less entityManager.getEntityManagerAssignedToCurrentThread().persist()
It is a proxy, not a real object. So you are safe :)
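A minimal sketch of what that means in practice (the injected EntityManager is a thread-bound proxy, so a stateless service is safe to call concurrently):

@Service
public class MyService {

    @PersistenceContext
    private EntityManager entityManager; // shared proxy; resolves to the current thread's session

    @Transactional(readOnly = true)
    public SomeObject readStuffFromDB(String key) {
        // Each calling thread gets its own persistence context here.
        return entityManager.find(SomeObject.class, key);
    }
}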

In Spring 3.2, should we use @Transactional annotation for db activities?

I use Spring 3.2 and Hibernate 4 in my project. When I query a table I get a "No Session found for current thread" message. I tried the @Transactional annotation (and it succeeded), but I don't want to put @Transactional on every service implementation.
Is there another way?
In other words, how can I do a simple insert operation without using @Transactional?
Thx...
You should not have @Transactional on your DAO methods; in fact, you should never access your DAO methods directly, you should be using an @Service. A service will use zero or more DAO classes to perform operations; only after all operations are completed will the transaction be committed.
@Repository
public class CustomerDao {
    // dao methods here; they are not transactional but will run within a service transaction
}

@Service
@Transactional
public class CustomerService {

    private final CustomerDao customerDao;

    @Autowired
    public CustomerService(CustomerDao customerDao) {
        this.customerDao = customerDao;
    }

    // service methods here (they are all transactional because we have annotated the class)
}
@Transactional is used to run a Java call in a transaction so that, if any exception occurs during the process, all database changes are rolled back. In an ideal scenario, every service that you think should be independent should have the @Transactional annotation. Hibernate also wants each database call to run in a transaction; that's why it is implemented in such a way that a transaction is required for each database query to succeed. I am not sure why you would want your services to be outside of transactions while they still fire database calls.
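A minimal sketch of that rollback behavior inside the @Transactional CustomerService above (the save(..) method on CustomerDao is assumed for illustration):

public void updateBoth(Customer a, Customer b) {
    customerDao.save(a);
    customerDao.save(b); // a RuntimeException here rolls back the save of 'a' as well
}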
