Infinispan JPA Cache loader? - caching

How do I implement an Infinispan JPA cache loader? Is there any pattern or way to implement it in the Infinispan API?

Most existing CacheLoader implementations in Infinispan assume the data just needs storage and treat it blindly as an array of bytes. The integration API in Infinispan doesn't expose much of a context other than "store(Key,Value)" or "load(Key)". I'm oversimplifying a bit, but that's the core.
There is one exception, which is the LuceneCacheLoader. This was designed to work exclusively in combination with the Lucene Directory for Infinispan, as it takes advantage of two facts:
It knows which types to expect.
It knows the needs of the Directory (such as its access pattern).
Have a look at the sources to get inspired; note I only implemented loading (it's a CacheLoader).
If you control both the application using Infinispan and the CacheLoader, you could take advantage of these details as well.
Tricky aspects:
While writing multiple keys, even in the same transaction, you'll have access to one entry at a time in the scope of the CacheLoader logic -> hard to map relations: you have to deal with one entity at a time and "restore connections" afterwards.
With write-behind you might receive entries out of order -> not sure how to deal with referential integrity.
With write-behind you're not going to have the same transactional context -> might be acceptable?
Taking these into account, I'm sure you could write one. How easy? That depends on your app.
I'm not sure if a general purpose solution could work. If you find out it can, please contribute it as it would be a great addition to the project.
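To make the "one entity at a time" point concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `EntryStore` is a hypothetical stand-in for the store(Key,Value) / load(Key) contract mentioned above (it is not Infinispan's real interface), and the two in-memory maps stand in for JPA-managed tables. Relations are written as ids and the connections are restored on load.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the store(Key,Value) / load(Key) contract;
// NOT Infinispan's actual CacheLoader/CacheStore interface.
interface EntryStore<K, V> {
    void store(K key, V value);
    V load(K key);
}

final class Customer {
    final long id;
    final String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

final class Order {
    final long id;
    final Customer customer;
    Order(long id, Customer customer) { this.id = id; this.customer = customer; }
}

// Flattened form of an Order: the relation is kept as an id, not an object.
final class OrderRow {
    final long id;
    final long customerId;
    OrderRow(long id, long customerId) { this.id = id; this.customerId = customerId; }
}

final class OrderStore implements EntryStore<Long, Order> {
    // These maps stand in for database tables (assumption for the sketch).
    private final Map<Long, OrderRow> orderTable = new HashMap<>();
    private final Map<Long, Customer> customerTable = new HashMap<>();

    @Override
    public void store(Long key, Order order) {
        // Only this one entry is visible here, so the related entity is
        // written alongside it and referenced by id.
        customerTable.put(order.customer.id, order.customer);
        orderTable.put(key, new OrderRow(order.id, order.customer.id));
    }

    @Override
    public Order load(Long key) {
        OrderRow row = orderTable.get(key);
        if (row == null) {
            return null; // unknown key
        }
        // "Restore connections": resolve the stored id back to the entity.
        return new Order(row.id, customerTable.get(row.customerId));
    }
}
```

With write-behind, an order could in principle reach the store before a newer version of its customer does, which is why the sketch writes the customer together with each order instead of assuming it already exists.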

Related

Handling dictionary values stored in DB - Spring

I am developing an SPA with a backend written in Java (Spring Boot). In the relational DB that the backend connects to, there is a table with some dictionary values. The values can be edited by users of the app, but that happens really, really rarely (almost never).
Those dictionary values are used in a lot of pages on UI and because of that I would like to "cache" them in a way. What I want to achieve is that I want to load dictionary values on startup to avoid asking DB for values during every request between UI and Backend.
Firstly, I thought about just loading it on the UI part of the app, when user enters the page for the first time. Then I ruled it out, since when one of the users changes the values, it should be reloaded.
What I think might work is just loading them on startup of the backend into some collection (one that can be safely used in a concurrent environment, probably a ConcurrentMap) and then, during GET requests, asking that collection for the values (instead of the DB). When the values are changed, that request just updates the DB table and reloads them into the collection.
Then I thought that the collection solution won't be enough when my backend is scaled up to more than one instance. In that case, only one of the instances will be updated and the other one will serve outdated data. We can avoid that by forcing refreshes, e.g. every 15 minutes (instead of on demand during a values update).
But what I think is the best solution is to start a Redis service on the side, load the dictionary values into it, and after every DB update of the values just update the Redis instance with the new ones. Every instance of the backend would use the same instance of Redis, which seems quicker than executing a query (select * from _ where _ = _) on the DB.
What do you think? Is my thought process correct? Do you have any ideas that can help solve my issue?
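For reference, the in-process variant described in the question (load on startup, reload after an update, optional periodic refresh) could be sketched roughly as below. The class name `DictionaryCache` and the `loader` supplier are hypothetical; the supplier is assumed to run the actual DB query. Instead of mutating a ConcurrentMap in place, this sketch swaps in a fresh immutable snapshot, so readers never observe a half-reloaded dictionary.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class DictionaryCache {
    // Hypothetical loader: assumed to query the dictionary table.
    private final Supplier<Map<String, String>> loader;
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    DictionaryCache(Supplier<Map<String, String>> loader) {
        this.loader = loader;
        reload(); // initial load on startup
    }

    // Call after the DB table has been updated, or from a scheduler.
    void reload() {
        Map<String, String> copy = new HashMap<>(loader.get());
        snapshot.set(Collections.unmodifiableMap(copy));
    }

    // Served from memory; no DB access on the request path.
    String get(String key) {
        return snapshot.get().get(key);
    }

    // Optional safety net for multi-instance deployments: refresh
    // periodically so other instances pick up changes eventually.
    void startPeriodicRefresh(ScheduledExecutorService scheduler, long minutes) {
        scheduler.scheduleAtFixedRate(this::reload, minutes, minutes, TimeUnit.MINUTES);
    }
}
```

With several backend instances, each instance holds its own snapshot, so an update on one instance is only visible on the others after their next refresh. That is exactly the staleness window the question is worried about.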
If you are using Spring you could check out Spring Cache Abstraction. That way your cache will be up-to-date whenever some change occurs.
A few implementations are supported by Spring out of the box:
Spring provides a few implementations of that abstraction: JDK java.util.concurrent.ConcurrentMap based caches, Ehcache 2.x, Gemfire cache, Caffeine, and JSR-107 compliant caches (such as Ehcache 3.x). See Plugging-in Different Back-end Caches for more information on plugging in other cache stores and providers.
If you decide to use a Memcached implementation, you can check out this library (it uses Xmemcached under the hood).
You could also check a small demo app of how to use Spring Cache Abstraction in your project (link).
I think you're on the right path with your approach in terms of 'caching'. I suggest you also check out Memcached for its simplicity. Redis is a good choice, but it still depends on your requirements and whether you need that many features. Just my 2 cents.
https://aws.amazon.com/elasticache/redis-vs-memcached/
https://devcenter.heroku.com/articles/spring-boot-memcache#add-caching-to-spring-boot
Thanks,

Infinispan: How many DefaultCacheManager Instances?

In my web application project I have to build two kinds of caching mechanisms.
The first one is strictly related to the session, so I have implemented a pattern by which I can clean the Infinispan cache when the user session ends.
The distributed session cache uses a single DefaultCacheManager stored in my application server's JNDI. Every time it needs to read from or write to the cache, it looks the manager up and performs the CRUD operations.
The second one is a normal Infinispan distributed cache with its own expiration policy, and I'm about to implement it.
My question is:
Is it correct to use the same DefaultCacheManager bound in JNDI, or is it better to create a new one?
In the Infinispan guide I read that it's a really heavy object and that it's suggested to create just one.
Thanks.
Yes, I agree with @Jakub. The only reason why you might want to have separate CacheManagers is when you need them to operate on separate clusters, which is not your case.

Using org.springframework.cache.support.SimpleCacheManager in the cloud

I noticed that Spring reference application (Sagan) uses the SimpleCacheManager implementation. See here for source code of Sagan.
I was surprised by this choice because I thought that all but small applications running on a single node would use something like a Redis cache manager and not the simple cache manager.
How can a large application like Sagan (which I assume runs on Cloud Foundry) use this simple implementation?
Any comment welcome.
Well, the SimpleCacheManager choice was made because it was the simplest solution that could possibly work. Note that Sagan is, at least for now, not storing a lot of data in that cache and merely uses it to respect the rate limits of various APIs and to get better performance in some parts of the application.
Yes, Sagan is running on CloudFoundry (see this presentation) and is using CF marketplace services.
Even if cache consistency between instances is not a constraint for now, we could definitely add another marketplace service, here a Redis Cloud instance, and use this as a central cache repository.
Now that we're considering using that cache for more features, it even makes sense to at least consider that use case, since it could lower our monthly bill (pay a small fee for a Redis service and use less memory for our CF instances).
In any case, thanks a lot balteo for this insightful question; we've created a GitHub issue for that.

Using interception to implement caching - how to define keys?

TL;DR Can someone point me to a thorough implementation of a caching system that is added to the solution through interception?
I'm refactoring one of my solutions so that cross-cutting concerns are implemented through Unity Intercept. I've read the guides from MSFT, and now I think I can very easily implement the interception behaviors.
However, I was wondering about caching; I want to consistently use the cache regions and keys throughout the solution. Furthermore, I have key-specific configurations for expiration in my caching system.
In one example in Unity's Developer Guide, it checks the method name -- this is a bad approach, since it would mean altering the implementation every time a new class/method must use the cache (obviously).
I'm having this (mad) idea of implementing a configurable Interceptor that learns how to compose the region and key from the given parameters, and is configurable for each class(type)/method. However this would push a lot of responsibility to configuration; I don't like the feeling that I'm programming in the *.config file.
As you can see, I'm a tad lost on how to go about this. I don't like singletons, and right now the caching system is a singleton, accessed everywhere by the solution. Can someone link me to good documentation on how I should proceed? Is it possible to add the cache through interception and still have proper keys/regions defined on it?
A quick search on a similar matter led me to the "Attribute Based Cache using Unity Interception" project on CodePlex. The entire project looks to have been abandoned at some alpha stage; however, it should provide you with a baseline to start from.
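The attribute-based idea can be illustrated independently of Unity. The sketch below is written in Java, using a JDK dynamic proxy as a stand-in for a Unity interception behavior, so it is not Unity code. The `@Cached` annotation, `CachingHandler`, and `PriceService` are all hypothetical names. The region comes from the annotation on the method, and the key is composed from the method name and its arguments, so nothing has to be hard-coded per method inside the interceptor.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical marker: declares that a method is cacheable and in which region.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Cached {
    String region();
}

// Example service contract; the annotation lives on the interface method.
interface PriceService {
    @Cached(region = "prices")
    Integer priceOf(String sku);
}

final class CachingHandler implements InvocationHandler {
    private final Object target;
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    CachingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Cached cached = method.getAnnotation(Cached.class);
        if (cached == null) {
            return call(method, args); // not cacheable: pass straight through
        }
        // Key = region + method name + argument values.
        String key = cached.region() + ":" + method.getName() + Arrays.deepToString(args);
        Object hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        Object value = call(method, args);
        if (value != null) {
            cache.put(key, value); // null results are not cached in this sketch
        }
        return value;
    }

    private Object call(Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the target's own exception
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new CachingHandler(target));
    }
}
```

A caller would obtain the service through `CachingHandler.wrap(PriceService.class, realService)` and use it like the original. Per-key expiration settings could be added as further annotation attributes, which keeps that configuration next to the method rather than in a config file.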

hazelcast vs ehcache

The question is clear from the title: I would appreciate hearing your thoughts on the advantages, disadvantages, and differences between them.
UPDATE:
I have decided to use Hazelcast because of the advantages like distributed caching/locking mechanism as well as the extremely easy configuration while adapting it to your application.
We tried both of them for one of the largest online classifieds and e-commerce platforms. We started with Ehcache/Terracotta (server array) because it's well known, backed by Terracotta, and has bigger community support than Hazelcast. When we got it into a production environment (distributed, beyond a one-node cluster), things changed: our backend architecture became really expensive, so we decided to give Hazelcast a chance.
Hazelcast is dead simple, it does what it says and performs really well without any configuration overhead.
Our caching layer has been running on top of Hazelcast for more than a year, and we are quite pleased with it.
Even though Ehcache has been popular among Java systems, I find it less flexible than other caching solutions. I played around with Hazelcast and yes, it did the job; it was easy to get running, and it is newer than Ehcache. I can say that Ehcache has many more features than Hazelcast, is more mature, and has big support behind it.
There are several other good cache solutions as well, all with different properties, such as good old Memcache, Membase (now CouchBase), Redis, AppFabric, and even several NoSQL solutions which provide key-value stores with or without persistence. They all have different characteristics in terms of how they relate to the CAP theorem or the BASE model, along with transactions.
You should care more about which one has the functionality you want in your application; again, you should consider the CAP theorem or the BASE model for your application.
This test was done very recently with Cassandra on the cloud by Netflix. They reached a million writes per second with about 300 instances. Cassandra is not a memory cache, but its data model is like a cache, consisting of key-value pairs. You can use Cassandra as a distributed memory cache as well.
Hazelcast has been a nightmare to scale and stability is still a major issue.
The dedicated client-to-grid component choices are:
The messy version that can't survive node loss anywhere, negating the point of backups (superclient), or
An incredibly slow native client option that does not allow for any type of load balancing to processing nodes in the grid.
If any host could request records from this data grid it would be a sweet design, but you are stuck with those two lackluster options to get anything out of it.
There are also multiple issues with database thread pools locking up on individual members and not writing anything to the databases. The resulting permanent loss of records is a frequent issue, and we often have to take the whole thing down for hours to refresh any of the JVMs. Split brain is also still an issue, although in 1.9.6 it seems to have calmed down a little.
Rallying to move to Ehcache and improving the database layer instead of using this as a band-aid.
Hazelcast serializes everything whenever there is a node (standard one), so the data you save to Hazelcast must be serializable.
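One way to catch non-serializable values early is to round-trip them through plain Java serialization in a unit test before they ever reach the grid. The sketch below uses only JDK classes and calls no Hazelcast API; `UserProfile` and `SerializationCheck` are hypothetical names, and Hazelcast's own serialization options may behave differently from `java.io.Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical value type that is meant to be stored in a distributed map.
final class UserProfile implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int visits;

    UserProfile(String name, int visits) {
        this.name = name;
        this.visits = visits;
    }
}

final class SerializationCheck {
    // Serializes a value to bytes, as happens when it is sent to another node.
    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            // Includes NotSerializableException for fields that cannot be written.
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds the value from bytes, as the receiving side would.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The size of the byte array returned by `toBytes` also gives a rough idea of what crosses the wire per entry, which matters when large lists are stored as a single value.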
http://open.bekk.no/efficient-java-serialization/
Hazelcast has been a nightmare for me. I was able to get it "working" in a clustered WebSphere environment. I use the term "working" loosely. First, all of Hazelcast's documentation is out of date and only shows examples using deprecated method calls. Trying to use the new code without comments in the Javadocs and no examples in the documentation is very hard. Also, the J2EE container code simply does not work at this point because it does not support XA transactions in WebSphere. An error is thrown when calling code that follows their only J2EE example explicitly (it does look like Milestone 3.0 is addressing this). I had to forget about joining Hazelcast to a J2EE transaction. It does seem Hazelcast is definitely geared to a non-EJB/non-J2EE container environment. Making calls to Hazelcast.getAllInstances() fails to retain any information about Hazelcast's state when switching from one Enterprise JavaBean to another. That forces me to create a new Hazelcast instance just to run calls that give me access to my data. That causes many Hazelcast instances to start up on the same JVM. Also, retrieving data from Hazelcast is not fast. I tried retrieving data using both the native client and directly as a member of the cluster. I stored 51 lists, each containing only 625 objects, in Hazelcast. I could not perform a query directly on a list and did not want to store a map just to get access to that feature (SQL operations can be performed on a map). It took about half a second to retrieve each list of 625 objects because Hazelcast serializes the entire list and sends it over the wire rather than just giving me the delta (what has changed). Another thing: I had to switch to a TCP/IP configuration and explicitly list the IP addresses of the servers I wanted to be in the cluster. The default multicast configuration did not work and, judging from the Google group discussions, other people are experiencing that difficulty as well.
To sum up: I did eventually get 8 machines communicating in a cluster through many hours of torturous programmatic configuration and trial and error (the documentation will be of little help), but when I did, I still had no control over the number of instances and partitions being created on each JVM due to the half-finished nature of Hazelcast for EJB/J2EE, and it was VERY SLOW. I implemented a real use case in the unemployment insurance application I work on, and the code was much faster making direct calls to the database. It would have been cool if Hazelcast worked as advertised, because I really did not want to use a separate service to implement what I am trying to do. I have used MongoDB extensively, so I may skip the whole in-memory cache and just serialize my objects as documents in a separate repository.
One advantage of Ehcache is that it is backed by a company (Terracotta) that does extensive performance, failover, and platform testing in a large performance lab. Terracotta provides support, indemnity, etc. For many companies, that sort of thing is important.
I have not used Hazelcast but I've heard that it is easy to use and that it works. I haven't heard anything with respect to scalability or performance of Hazelcast vs Terracotta/Ehcache but given the amount of scalability and failover testing that Terracotta does, it's hard for me to imagine that Hazelcast would be competitive in a production deployment. But I presume it would work fine for smaller uses.
[Bias: I'm a former employee of Terracotta.]
Developers describe Ehcache as "Java's Most Widely-Used Cache". Ehcache is an open-source, standards-based cache for boosting performance, offloading your database, and simplifying scalability. It's the most widely-used Java-based cache because it's robust, proven, and full-featured. Ehcache scales from in-process, with one or more nodes, all the way to mixed in-process/out-of-process configurations with terabyte-sized caches. On the other hand, Hazelcast is detailed as "Clustering and highly scalable data distribution platform for Java". With its various distributed data structures, distributed caching capabilities, elastic nature, memcache support, integration with Spring and Hibernate, and, more importantly, so many happy users, Hazelcast is a feature-rich, enterprise-ready, and developer-friendly in-memory data grid solution.
Ehcache and Hazelcast are primarily classified as "Cache" and "In-Memory Databases" tools respectively.

Resources