How to turn off JPA for SpringBatch under SpringBoot - spring-boot

We have a Spring Boot application that uses Spring Integration and Spring Batch. We drop a file in the poller and it is processed. The process inserts records into a database, reads them back out, does some processing, and writes a file. Say there are 10 records. The first time, 10 records are read and 10 are written. Without stopping the server, we delete all the records through a SQL client on the database and run the same file again; this time 10 records are read but 20 are written. I believe there is some JPA or caching going on with the datasource. We've tried turning off several auto-configuration options for JPA and caching, but we haven't found the option that turns caching off.
Adding a bit more detail to the question.
Basically, we have a cron scheduler that has a FileHandler. In its handleFile method we have the following.
public File handleFile(File file) throws Throwable {
JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
Job job = (Job) appContext.getBean("processInitialFileJob");
JobExecution jb = jobLauncher.run(job, jobParametersBuilder.toJobParameters());
....
}
What can we do to the code above to ensure that it gets a new JPA session, or does not use a JPA session at all? This job needs to read from the database each time, not from a cached representation of it.

Are you using Hibernate? Hibernate's first-level cache may be causing the problem. The first-level cache is local to the Session: entities you load or save within a session are tracked there. Changes made to the table outside Hibernate are not reflected in that cache, so the session can keep returning stale entities until it is cleared or closed.
To rule this out, try creating a new Session (or EntityManager, in the case of JPA) inside your poller logic and closing it after every read/process/write cycle.
Also make sure hibernate.current_session_context_class is not set to thread. The poller can reuse threads, so the same Hibernate Session may be handed out again.

This ended up not being an issue with Hibernate or JPA, but with a StringBuilder holding on to data from previous runs. I believe it will need to be set up as @JobScope so that it is not reused across different executions of the job.
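The effect described above can be reproduced without Spring at all. The following is a minimal sketch in plain Java, assuming a buffer object that lives as long as the application (much like a singleton-scoped bean would). The names ReportBuffer, StaleBufferDemo and runJob are hypothetical and are not taken from the original application.

```java
import java.util.List;

// Hypothetical stand-in for a long-lived bean that buffers output in a field.
class ReportBuffer {
    private final StringBuilder buffer = new StringBuilder();

    void append(String record) {
        buffer.append(record).append('\n');
    }

    // Number of lines currently held in the buffer.
    int lineCount() {
        return (int) buffer.chars().filter(c -> c == '\n').count();
    }
}

class StaleBufferDemo {
    // Simulates one job run: every record read is appended to the buffer,
    // and the number of lines that would be written is returned.
    static int runJob(ReportBuffer buffer, List<String> records) {
        for (String record : records) {
            buffer.append(record);
        }
        return buffer.lineCount();
    }

    public static void main(String[] args) {
        List<String> records = List.of("r1", "r2", "r3");

        // Shared instance: the second run still contains the first run's lines.
        ReportBuffer shared = new ReportBuffer();
        System.out.println(runJob(shared, records)); // 3 read, 3 written
        System.out.println(runJob(shared, records)); // 3 read, 6 written

        // Fresh instance per run: each run only sees its own records.
        System.out.println(runJob(new ReportBuffer(), records)); // 3
        System.out.println(runJob(new ReportBuffer(), records)); // 3
    }
}
```

In Spring Batch, declaring the bean that owns the StringBuilder with @JobScope should have the same effect as the "fresh instance per run" case, since a new instance is created for each job execution.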

Related

WebSphere insert/update statements with SQL-SERVER hangs with REQUIRES_NEW propagation

We are facing an issue in our spring batch application when we are deploying the application on WebSphere.
Example: one class contains a parent() method and a second class contains a child() method, where the child method requires a new transaction. When the transaction is committed after the methods execute, the commit routine hangs and nothing further happens.
@Transactional // uses the current transaction
public void parent() {
    child();
}
@Transactional(propagation = Propagation.REQUIRES_NEW) // creates a new transaction
public void child() {
    // Database save statements including updates, inserts and deletes
}
This issue only occurs on WebSphere; the code works fine on our local machine, where we use Tomcat as the web container.
WebSphere logs/stacktrace shows that the WebSphere prepared statement keeps on waiting for the response from the database. At the same time update and inserts are locked out on the affected tables i.e. if we run an insert or update query manually on the affected table the query doesn't execute.
We are using Spring JPA for data persistence and Spring’s JpaTransactionManager for transaction management and MSSQLServer database.
Is it that WebSphere does not support creating a new transaction from within an existing transaction?
Yes, the pattern you are describing is supported by WebSphere Application Server. Given that this involved locked entries within the database, you might be running into a difference between the application servers in which transaction isolation level is used by default. In WebSphere Application Server, you get a default of java.sql.Connection.TRANSACTION_REPEATABLE_READ for SQL Server, whereas I think in most other cases you end up with a default of java.sql.Connection.TRANSACTION_READ_COMMITTED (less locking). If the default value is the problem, you can change it on the data source configuration.
If you are using WebSphere Application Server Liberty, then the default isolation level can be configured in server.xml as a property of the dataSource element, like this,
<dataSource isolationLevel="TRANSACTION_READ_COMMITTED" jndiName=...
If you are using WebSphere Application Server traditional, then the default isolation level can be configured as the webSphereDefaultIsolationLevel custom property, which can be set to the numeric value of the isolation level constant on java.sql.Connection (value for TRANSACTION_READ_COMMITTED is 2).
See this linked article for the steps of doing so via the admin console.
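For reference, the numeric values mentioned above are the int constants defined on java.sql.Connection. The small sketch below only prints them and maps a number back to its constant name; the class name IsolationLevels and the method nameOf are made up for illustration and are not part of any WebSphere API.

```java
import java.sql.Connection;

class IsolationLevels {
    // Maps a JDBC isolation level value back to the name of its constant.
    static String nameOf(int level) {
        switch (level) {
            case Connection.TRANSACTION_NONE:
                return "TRANSACTION_NONE";
            case Connection.TRANSACTION_READ_UNCOMMITTED:
                return "TRANSACTION_READ_UNCOMMITTED";
            case Connection.TRANSACTION_READ_COMMITTED:
                return "TRANSACTION_READ_COMMITTED";
            case Connection.TRANSACTION_REPEATABLE_READ:
                return "TRANSACTION_REPEATABLE_READ";
            case Connection.TRANSACTION_SERIALIZABLE:
                return "TRANSACTION_SERIALIZABLE";
            default:
                throw new IllegalArgumentException("Unknown isolation level: " + level);
        }
    }

    public static void main(String[] args) {
        // The value to put in webSphereDefaultIsolationLevel for READ_COMMITTED.
        System.out.println(Connection.TRANSACTION_READ_COMMITTED);  // 2
        // The default described above for SQL Server on WebSphere.
        System.out.println(Connection.TRANSACTION_REPEATABLE_READ); // 4
        System.out.println(nameOf(2)); // TRANSACTION_READ_COMMITTED
    }
}
```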

Retrieve DataStax Session from CassandraOperations

Spring Boot Data Cassandra has removed the ability to retrieve a com.datastax.driver.core.Session from org.springframework.data.cassandra.core.CassandraOperations. I'm trying to fix old code that relies on this. Is there a simple way to retrieve the Cassandra session? I'm looking for a way to create a prepared statement from an Insert with access only to an instance of CassandraOperations, e.g.
cassandraOperations.getSession().prepare(insert);
We've removed getSession() from CassandraOperations because of two reasons:
Interface split into CassandraOperations and CqlOperations. CassandraTemplate (which implements CassandraOperations) now uses CqlOperations as lower-level API.
We introduced SessionFactory to be able to route CQL calls into various Cassandra Sessions. CQL execution obtains a session from the configured SessionFactory. A session is considered valid during the execute call as the next command could be executed on a different session.
You can still obtain a Session. Either call:
CqlTemplate cqlTemplate = (CqlTemplate) cassandraTemplate.getCqlOperations();
cqlTemplate.getSession();
or obtain Session through Spring's context (autowiring, lookup via context.getBean(Session.class), …).

The Hibernate session (EntityManager) scope in Spring Batch?

As I’m new to Spring and Spring Batch, I have a general question about Spring Batch and JPA using Hibernate as provider.
I want to know when the Hibernate session (wrapped by the EntityManager) is flushed. Is it between the reader, processor and writer, or at each commit interval? Can we control it or not?
When is the Hibernate session (wrapped by the EntityManager) flushed? Between the reader, processor and writer, or at each commit interval?
The session is flushed after writing a chunk of items, at each commit interval. For more details, take a look at:
HibernateItemWriter: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/database/HibernateItemWriter.java#L95
JpaItemWriter: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/database/JpaItemWriter.java#L84
Can we control it or not?
If you use the HibernateItemWriter, you can set the clearSession flag to clear the session after each chunk.
To the best of my knowledge, the session is flushed when the Spring transaction is committed, which happens after each chunk.
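To make "once per chunk" concrete, here is a simplified model of chunk-oriented processing in plain Java. It is not Spring Batch code and does not use Hibernate; ChunkModel and run are hypothetical names, and the list it returns simply records the size of each chunk at the point where the writer would write, flush and commit.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkModel {
    // Reads and processes items one at a time, and "writes" them in chunks.
    // Each entry in the returned list marks one write/flush/commit and holds
    // the number of items in that chunk.
    static List<Integer> run(List<String> items, int commitInterval) {
        List<Integer> flushes = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        for (String item : items) {
            chunk.add(item.toUpperCase()); // read + process a single item
            if (chunk.size() == commitInterval) {
                flushes.add(chunk.size()); // write the chunk, flush, commit
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            flushes.add(chunk.size()); // final, partial chunk
        }
        return flushes;
    }

    public static void main(String[] args) {
        // Five items with a commit interval of 2 give three flushes.
        System.out.println(run(List.of("a", "b", "c", "d", "e"), 2)); // [2, 2, 1]
    }
}
```

In this model there is no flush between reading, processing and writing a single item; the flush only happens at the chunk boundary, which matches the behaviour described in the answers above.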

Multiple Embedded HSQLDB databases in jUnit errors during build

I'm working on a new Spring Batch (3.0.3.RELEASE) application where there will be multiple databases accessed during the jobs. For testing we are using HSQLDB (2.3.2) as the embedded database.
In my Application context I have the following.
<jdbc:embedded-database id="dataSource">
</jdbc:embedded-database>
<jdbc:embedded-database id="proDataSource">
<jdbc:script location="classpath:script-tables.sql" />
<jdbc:script location="classpath:script-constraints.sql" />
</jdbc:embedded-database>
<jdbc:embedded-database id="altDataSource">
<jdbc:script location="classpath:script-alt-tables.sql" />
</jdbc:embedded-database>
When I run a single test in Eclipse, things are fine. When I build from the command line, after the first test, I get errors
Failed to execute SQL script statement at line 3 of resource class path resource [script-promrkt-promo.sql]
object name already exists: PROMRKT
It appears to me that the population process in EmbeddedDatabaseFactory is receiving an already populated database. From what I can tell, no SHUTDOWN is executed after each test, so HSQLDB leaves the already populated database in memory.
I have re-read the documentation, and one Spring doc does show an explicit shutdown command. But if Spring starts up the embedded database when my test starts, why doesn't it shut it down when the test completes?
Is it expected the embedded databases will remain after each unit test for the same application context?
What is the order that spring starts up an embedded database and when is the transactional context initialized?
Do I need to use a database cleaner ?
Can the populator be changed to populate only when the database is first started, and to roll back to the original script configuration when my test is complete (kind of like how AbstractTransactionalSpringContextTests worked)?
Do I need some transactional markers? Spring Batch's JobRepo is properly populated and destroyed between tests. Why are my custom dataSources not?
The script the log message is complaining about isn't in your configuration. I presume it's being executed somewhere else? If that's the case, you'll probably need to add @DirtiesContext to your tests so that Spring doesn't cache the context (I'm assuming you're using the SpringJUnit4ClassRunner with @ContextConfiguration, but I can't be sure since your actual test isn't in the question).
If my assumption is correct: Spring caches the context to improve performance across a unit test suite. If your test modifies the context in a way that can affect other tests (like running scripts in one test that need to be run again in others), mark the test with @DirtiesContext and Spring won't reuse the cached context. You can use the annotation at either the method or the class level. You can read more about the annotation here: http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/test/annotation/DirtiesContext.html
I spent a lot of time looking at this, reading the Spring Framework documentation (gasp!) and tracing through code. There are some interesting changes in Spring Framework 4.1, especially in the testing support.
I found out that ApplicationContexts are now cached at the JVM level. If a second test asks for a context, the TestContext first looks in its cache to see whether some other test has already asked for the identical configuration.
I have profiles for some of my tests. A test with a different profile but the same @ContextConfiguration causes the context to be reloaded with the profile applied. When the bean loader gets to creating the embedded databases, the EmbeddedDatabaseFactory does not take into account that the embedded database (in-memory HSQLDB) may already have been created by a previous test and does not need to be re-initialized.
Therefore I added some logic to EmbeddedDatabaseFactory.initDatabase() that checks whether the database already exists before re-initializing it and running the DatabasePopulator.
List existingDataBases = org.hsqldb.DatabaseManager.getDatabaseURIs();
boolean isExisting = false;
String localDBName = StringUtils.lowerCase(this.databaseName);
for (Object object : existingDataBases) {
if (object.toString().contains(localDBName)) {
isExisting = true;
break;
}
}
// Now populate the database
if (!isExisting && this.databasePopulator != null) {
( of course this isn't quite kosher for what spring would need but it gets the point across )
In my opinion it looks like an issue partially with the EmbeddedDatabaseFactory and the TestContext caching mechanism. My "jdbc:embedded-database" definitions do not have any profiles associated with them. Why does the cache need to re-create them and not load them out of the existing cached beans?
You can try to force the creation of a new embedded database by giving it a unique name with generateUniqueName(true) each time a new object is created.
Here is an example:
embeddedDatabase = new EmbeddedDatabaseBuilder()
.setType(EmbeddedDatabaseType.H2)
.generateUniqueName(true)
.addScripts("db/sql/create-db.sql", "db/sql/insert-data.sql")
.build();

XA transactions and message bus

In our new project we would like to have transactions that involve JPA (MySQL) and a message bus (RabbitMQ).
We started building our infrastructure with Spring Data using MySQL and RabbitMQ (via the Spring AMQP module). Since RabbitMQ is not XA-transactional, we configured the neo4j ChainedTransactionManager as our main transactionManager. This manager takes the JPA txManager and the rabbitTransactionManager as arguments.
Now I can annotate a service with @Transactional and use both JPA and RabbitMQ inside it. If I throw an exception within the service, none of the actions actually occur.
Here are my questions:
Does this configuration really give me an atomic transaction?
I've heard that the chained transaction manager does not use a two-phase commit but a "best effort" approach. Is best effort less reliable? If so, how?
What the ChainedTransactionManager does is basically start the transactions in the order the managers are given and commit them in reverse order. So suppose you have a JpaTransactionManager and a RabbitTransactionManager configured like so.
@Bean
public PlatformTransactionManager transactionManager() {
return new ChainedTransactionManager(rabbitTransactionManager(), jpaTransactionManager());
}
Now if the JPA commit succeeds but the commit to RabbitMQ fails, your database changes will still be persisted, because they have already been committed.
To answer your first question: it doesn't give you a truly atomic transaction. Everything that was committed before the exception occurred (during commit) remains committed.
See http://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/transaction/ChainedTransactionManager.html
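The failure mode can be illustrated with a simplified model in plain Java. This is not the ChainedTransactionManager implementation; LocalResource and ChainedCommitModel are hypothetical names, and the model only captures the idea of committing in reverse order and rolling back whatever has not been committed yet once a commit fails.

```java
import java.util.List;

// Hypothetical resource with its own local transaction.
class LocalResource {
    final String name;
    final boolean failOnCommit;
    boolean committed;
    boolean rolledBack;

    LocalResource(String name, boolean failOnCommit) {
        this.name = name;
        this.failOnCommit = failOnCommit;
    }

    void commit() {
        if (failOnCommit) {
            throw new IllegalStateException(name + " commit failed");
        }
        committed = true;
    }

    void rollback() {
        rolledBack = true;
    }
}

class ChainedCommitModel {
    // Commits the resources in reverse order of the list. After the first
    // failed commit, the remaining resources are rolled back instead, but
    // resources that were already committed stay committed.
    static void commitAll(List<LocalResource> resources) {
        RuntimeException failure = null;
        for (int i = resources.size() - 1; i >= 0; i--) {
            LocalResource resource = resources.get(i);
            if (failure == null) {
                try {
                    resource.commit();
                } catch (RuntimeException e) {
                    failure = e;
                }
            } else {
                resource.rollback();
            }
        }
        if (failure != null) {
            throw failure;
        }
    }

    public static void main(String[] args) {
        // Same order as in the answer: rabbit first, jpa second.
        LocalResource rabbit = new LocalResource("rabbit", true);
        LocalResource jpa = new LocalResource("jpa", false);
        try {
            commitAll(List.of(rabbit, jpa));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // rabbit commit failed
        }
        System.out.println(jpa.committed);    // true: database changes persist
        System.out.println(rabbit.committed); // false: the message was not sent
    }
}
```

A two-phase commit would first ask every resource to prepare and only then commit; the model above has no prepare step, which is why a late failure cannot undo an earlier commit.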

Resources