How to change a Jackrabbit persistence manager on the fly? - derby

I used the internal persistence manager based on Derby DB, with a filesystem repository.
There are now around 1.5 million files and 3 TB in the repository, and around 6 million records in the Derby DB.
I think this is too much for that DB, because performance has been degrading badly lately.
So I want to change the persistence manager to something like MySQL or Oracle.
What is the best way to export data from an Apache Jackrabbit Derby DB and import it into MySQL?
How can I do this in the easiest and fastest way?

How to migrate to a new version of Jackrabbit or to a new persistence manager is described on the Backup and Migration page.
In my experience, MySQL and Oracle are not actually faster, as Derby is embedded (in-process) while MySQL and Oracle are remote, so each request incurs a network roundtrip.
Instead, what you could do is use a higher bundle cache size and/or a higher database cache size.
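For the migration itself, Jackrabbit ships a RepositoryCopier utility that reads everything from a source repository and writes it into a freshly configured target one. A minimal sketch (the directory paths are hypothetical; each directory is a repository home whose repository.xml defines the persistence manager):

```java
import java.io.File;

import org.apache.jackrabbit.core.RepositoryCopier;

public class MigratePersistenceManager {

    public static void main(String[] args) throws Exception {
        // Source: the existing repository home using the Derby persistence manager.
        // Target: a new, empty repository home whose repository.xml is configured
        // with the MySQL (or Oracle) persistence manager you want to migrate to.
        RepositoryCopier.copy(
                new File("/repos/old-derby-repo"),
                new File("/repos/new-mysql-repo"));
    }
}
```

Note that the copy reads and rewrites every node, so with ~1.5 million files expect it to run for a while; run it offline.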

Related

Is Spring Batch H2 in-memory database production stable as a Job Repository?

I have written a Spring Batch solution which currently uses the embedded H2 in-memory database.
The read and write operations use SOLR Cloud API calls.
Ideally, we don't want to introduce a proper relational database as the job repository database for the read-write batch operation.
I read that H2 in-memory databases are best used for dev and test in Spring Batch.
Does anyone have experience using H2 as the Spring Batch job repository in a proper live environment, dealing with millions of records per run, where the batch job runs at most once a day?
If H2 is not stable enough for production, I might have to ditch Spring Batch, or are there any other alternatives?
Open to any ideas or references.
Thanks in advance.
H2 is a lightweight Java database and, as you mentioned yourself, it is ideal for dev and test!
In production you would be missing out on a lot of features which RDBMS and NoSQL databases provide,
e.g. replication, memory and performance optimizations, etc.
If frequent reads and writes are a concern and you don't want an RDBMS, you could choose MongoDB or Couchbase to manipulate records; they are fast too!
So, considering millions of records, I don't think H2 would be a good choice for a production database.
A similar question might shed some light and help you decide:
Are there any reasons why h2 database shouldn't be used in production?
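If you do decide to move the job repository off H2, the switch is mostly a matter of giving Spring Batch a persistent DataSource; the framework stores its metadata in whatever database that points at. A minimal sketch assuming Spring Boot and MySQL (driver, URL and credentials are hypothetical):

```java
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class JobRepositoryConfig {

    // Spring Batch creates its BATCH_* metadata tables in this database
    // instead of the in-memory H2 instance.
    @Bean
    public DataSource dataSource() {
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("com.mysql.cj.jdbc.Driver");
        ds.setUrl("jdbc:mysql://localhost:3306/batch_repo");
        ds.setUsername("batch");
        ds.setPassword("secret");
        return ds;
    }
}
```

In Spring Boot the metadata schema can be applied automatically (spring.batch.initialize-schema=always, or spring.batch.jdbc.initialize-schema on newer Boot versions); otherwise run the schema-mysql.sql script that ships with Spring Batch.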

In memory or actual stored database

In Spring, I was switching over from MySQL to MongoDB.
With MySQL, I can have an in-memory database (H2) and an actual locally stored MySQL database. Is this possible with MongoDB too? If so, how? Is Spring Data MongoDB in-memory or locally stored?
Yes, it's possible; try this embedded one: https://github.com/flapdoodle-oss/de.flapdoodle.embed.mongo
An example of its usage: https://www.baeldung.com/spring-boot-embedded-mongodb
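For a rough idea of what that looks like, here is a sketch based on the flapdoodle 1.x API described in that Baeldung article (the version and port are assumptions):

```java
import de.flapdoodle.embed.mongo.MongodExecutable;
import de.flapdoodle.embed.mongo.MongodProcess;
import de.flapdoodle.embed.mongo.MongodStarter;
import de.flapdoodle.embed.mongo.config.IMongodConfig;
import de.flapdoodle.embed.mongo.config.MongodConfigBuilder;
import de.flapdoodle.embed.mongo.config.Net;
import de.flapdoodle.embed.mongo.distribution.Version;
import de.flapdoodle.embed.process.runtime.Network;

public class EmbeddedMongoExample {

    public static void main(String[] args) throws Exception {
        MongodStarter starter = MongodStarter.getDefaultInstance();

        // Download (on first run) and configure an in-process mongod.
        IMongodConfig config = new MongodConfigBuilder()
                .version(Version.Main.PRODUCTION)
                .net(new Net("localhost", 27017, Network.localhostIsIPv6()))
                .build();

        MongodExecutable executable = starter.prepare(config);
        MongodProcess mongod = executable.start();
        try {
            // Point a MongoClient (or Spring Data MongoDB) at localhost:27017 here.
        } finally {
            mongod.stop();
            executable.stop();
        }
    }
}
```

With Spring Boot it is even simpler: adding the de.flapdoodle.embed.mongo dependency in test scope lets Boot auto-configure an embedded MongoDB for tests.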
I used Fongo a few years ago:
https://github.com/fakemongo/fongo

Is it possible to change the Apache NiFi H2 database to some other DB (like MySQL / Postgres)?

Is it possible to replace the Apache NiFi H2 database with some other DB (like Postgres or MySQL)?
I looked at the Apache NiFi documentation and configuration but couldn't find anything.
It is not possible. The database is meant to be something you don't really need to know much about; it is just another data store on disk like all the other repositories (flow file, content, provenance), and it just so happens to be backed by an embedded DB.

Migrating from Apache Cassandra 2.2 to Oracle Coherence / Oracle 12

I am looking for a migration path for a Java-based project which uses Apache Cassandra 2.2 to Oracle Coherence 12 – and Oracle 12 backend.
The existing application uses CQL to interact with a 3 node Cassandra cluster.
Elsewhere we specifically do not use any ORM (e.g. Hibernate/JPA) but use JDBC to interact with the database directly.
Yes, Cassandra is free while the Oracle solution is quite expensive but this is outside the scope of this question.
Any technical suggestions are welcomed.
You have a couple of options depending on your use case.
If you are using CQL to interact with Cassandra for standard request/response interactions and need to migrate to Oracle DB, the approach that requires the least code change while still being standard is to use an Object Relational Mapping (ORM) tool like Hibernate/JPA and use Coherence as the L2 cache (personally I like MyBatis, since you have complete control over the SQL code; you may be able to use the Coherence integration with MyBatis).
If you have other applications/ops users updating the database directly and need those changes to be available to your application, then you will need to implement a CacheStore (use your favorite ORM here if you like) to save updates to the database, and use the Oracle GoldenGate HotCache feature to push updates made to the database outside your application into Coherence. Your application will need to be changed to interact with Coherence directly, using either its Map interface or the Coherence Query Language (CohQL), which is "SQL like". This approach has the additional advantage of supporting any asynchronous use cases you may have, as the Coherence API supports listening to cache changes (using MapListeners), similar to Cassandra's executeAsync.
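As a rough illustration of the CacheStore piece, here is a sketch of a JDBC-backed write-through store (the table, columns and class names are hypothetical; AbstractCacheStore is the Coherence base class that leaves load/store/erase to you):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import javax.sql.DataSource;

import com.tangosol.net.cache.AbstractCacheStore;

// Coherence invokes load() on cache misses (read-through) and store()/erase()
// on cache updates (write-through), keeping Oracle in sync with the cache.
public class OrderCacheStore extends AbstractCacheStore {

    private final DataSource dataSource;

    public OrderCacheStore(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Object load(Object key) {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT payload FROM orders WHERE id = ?")) {
            ps.setObject(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void store(Object key, Object value) {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "MERGE INTO orders o USING dual ON (o.id = ?) "
                   + "WHEN MATCHED THEN UPDATE SET o.payload = ? "
                   + "WHEN NOT MATCHED THEN INSERT (id, payload) VALUES (?, ?)")) {
            ps.setObject(1, key);
            ps.setString(2, (String) value);
            ps.setObject(3, key);
            ps.setString(4, (String) value);
            ps.executeUpdate();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void erase(Object key) {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "DELETE FROM orders WHERE id = ?")) {
            ps.setObject(1, key);
            ps.executeUpdate();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The store would then be wired to a cache scheme in the Coherence cache configuration.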
I hope this helps.

Scaling and Clustering JPA

I am putting together a regular Java EE application on JBoss 7 that will use JPA in the data tier. I would like to make this application scale up with load. While it is pretty clear how to scale up the web tier (create more machines and throw them behind a load balancer), scaling up the data tier is less so.
I can probably cluster my database (MySQL). Still, that leaves the JPA layer unclustered. Ideally, JPA would scale up by using (clustered) in-memory caching backed by MySQL.
When I look around, all information about JPA scaling seems to be 3-4 years old. People talk about Ehcache, Memcached and Infinispan. I am not sure if this is still current.
Can someone tell me the state of the art in Java EE clustering and scaling, especially in the data tier?
Various caching strategies are still the way to scale JPA/Hibernate (you basically named the most popular options in your question). Nothing extraordinary has happened in this field in the last 4-5 years, as far as I know. One more option you haven't mentioned is JBoss Cache. So the second-level cache for JPA/Hibernate still rules in this area.
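For example, with Hibernate as the JPA provider, putting an entity in the second-level cache is mostly a matter of annotation plus a cache provider on the classpath. A minimal sketch (the entity is hypothetical, and the provider still has to be enabled in persistence.xml):

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// With hibernate.cache.use_second_level_cache=true and a provider such as
// Ehcache or Infinispan configured, instances of this entity are served
// from the cache instead of hitting MySQL on every lookup.
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Product {

    @Id
    private Long id;

    private String name;

    // getters and setters omitted
}
```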
Why no progress here? My wild guess is that, first of all, people who need scalable applications tend to avoid JPA and Hibernate in areas where high performance is needed. Usually people go with plain SQL dressed in Spring Framework JdbcTemplate helpers and transaction management. Scalability is then a matter of the database's capabilities in this area.
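A sketch of that plain-SQL style, for contrast (the DAO and table are hypothetical):

```java
import java.util.List;

import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;

public class ProductDao {

    private final JdbcTemplate jdbcTemplate;

    public ProductDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Plain SQL, no ORM: scaling is then bounded only by what the
    // database itself can handle.
    public List<String> findNames(long minStock) {
        return jdbcTemplate.queryForList(
                "SELECT name FROM product WHERE stock >= ?",
                String.class, minStock);
    }
}
```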
The other trend is to use NoSQL databases. There are plenty of solutions: MongoDB, CouchDB, Cassandra, Redis, to name a few. These are usually Google BigTable-like key-value stores (this is an oversimplification, but it is more or less the idea behind the approach) and they scale like hell, if you accept their limitations (relations are no longer managed easily, etc.).
There are many solutions; the two main categories are:
scaling the database
using a clustered cache to reduce database load
EclipseLink supports data partitioning for sharding data across a set of database instances,
see:
http://java-persistence-performance.blogspot.com/2011/05/data-partitioning-scaling-database.html
You can also use MySQL Cluster,
see:
http://www.mysql.com/products/cluster/
Oracle TopLink Grid provides EclipseLink JPA support for integration with Oracle Coherence as a distributed cache,
see:
http://www.oracle.com/technetwork/middleware/ias/tl-grid-097210.html
EclipseLink's cache supports clustering through cache coordination,
see:
http://wiki.eclipse.org/EclipseLink/Examples/JPA/CacheCoordination
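As a rough illustration of that last option, cache coordination is switched on with EclipseLink persistence-unit properties; a hedged sketch using the JMS protocol (the persistence unit and JNDI names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class CoordinatedFactory {

    public static EntityManagerFactory create() {
        Map<String, String> props = new HashMap<>();
        // Broadcast cache changes to the other cluster nodes over a JMS topic
        // so each node's L2 cache stays consistent.
        props.put("eclipselink.cache.coordination.protocol", "jms");
        props.put("eclipselink.cache.coordination.jms.topic",
                "jms/EclipseLinkTopic");
        props.put("eclipselink.cache.coordination.jms.factory",
                "jms/EclipseLinkTopicConnectionFactory");
        return Persistence.createEntityManagerFactory("my-unit", props);
    }
}
```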
