Configuration of Level 1 and Level 2 cache in JPA - caching

I have read the following pages and I have several doubts.
About the persistence context type for Level 1 cache
What is the difference between Transaction-scoped Persistence context and Extended Persistence context?
About the Level 2 cache
http://www.objectdb.com/java/jpa/persistence/cache
Now, my questions are:
In a normal situation, what is the best PersistenceContextType to
choose for the L1 cache, TRANSACTION or EXTENDED? I suppose the answer
is TRANSACTION as it is the default. However, I would like to know when
I should use EXTENDED.
In a normal situation, what are the best values to choose for the
following properties of the L2 cache?:
javax.persistence.sharedCache.mode (I suppose the answer is ALL as it is the default and caches all the entities)
javax.persistence.cache.retrieveMode (I suppose the answer is USE as it is the default and uses the cache on retrieval)
javax.persistence.cache.storeMode (I suppose the answer is USE as it is the default; however, I still don't understand the difference from REFRESH, which seems better to me)
Can someone explain how to set these L1 and L2 properties correctly, and when to use some values rather than others?

NOTE: this answer is not yet complete, I will update with details WRT cache modes
When working with Java EE, the default persistence context (PC) setting is TRANSACTION. This is also the optimal mode for almost all tasks. Because of its relatively short lifespan, it has the benefit of being low- or zero-maintenance.
I can think of a few main reasons to prefer an extended EM over a transactional one:
communication with external systems or the UI. You can manipulate managed entities and save them with the least possible moving parts - no merging and even no explicit saving is necessary. See this example by Adam Bien.
mimicking a conversation scope - using a single transaction spanning multiple HTTP requests is not practical, so an extended PC can be used instead. Examples here and here
an application where data is rarely written, but read very frequently. If you have reason to believe that the data is not going to change, you can have the benefits of caching the entities for frequent reads instead of fetching them from DB each time.
There are some downsides to using an extended EM
if a transaction is rolled back, all managed entities are detached. Restoring the PC to a consistent usable state may be quite hard to accomplish.
when used without caution, an extended PC can get cluttered with entities you no longer need. A long-living cache can contain large amounts of stale data.
You may need a strategy for refreshing/refetching the managed entities and a strategy for evicting entities or classes, or for clearing the cache altogether. Failure to design appropriate strategies can result in bugs that are hard to spot and harder to reproduce. Proper cache invalidation is not trivial.
So if using an extended EM, use it for a single purpose, so you can reason about the contents of the cache more easily.
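As a sketch of how an extended PC is declared, the usual place is a stateful session bean (the bean and entity names below are invented for illustration; this fragment assumes a Java EE container and is not runnable on its own):

```java
// Sketch: an extended persistence context tied to a stateful bean.
// Entities loaded here stay managed across method calls, so no merge()
// is needed; changes are flushed when the bean is removed.
@Stateful
public class OrderEditor {                      // hypothetical bean name

    @PersistenceContext(type = PersistenceContextType.EXTENDED)
    private EntityManager em;

    private Order order;                        // stays managed between calls

    public void begin(long id) { order = em.find(Order.class, id); }

    public void changeQuantity(int qty) {
        order.setQuantity(qty);                 // tracked, no explicit save
    }

    @Remove                                     // PC flushes and closes here
    public void save() { }
}
```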
I am not sure about the appropriate storeMode and retrieveMode settings yet. As for storeMode, I still have some doubts about its exact function.
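For reference, the syntax for these cache modes (not a recommendation of values) looks roughly like this under JPA 2.0; `em` is an existing EntityManager and the Employee entity is invented for illustration:

```java
// Per-EntityManager defaults
em.setProperty("javax.persistence.cache.retrieveMode", CacheRetrieveMode.USE);
em.setProperty("javax.persistence.cache.storeMode", CacheStoreMode.USE);

// Per-operation override: skip the L2 cache on read, but refresh it
// with whatever comes back from the database
Map<String, Object> props = new HashMap<>();
props.put("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);
props.put("javax.persistence.cache.storeMode", CacheStoreMode.REFRESH);
Employee e = em.find(Employee.class, 1L, props);
```

The difference hinted at above: USE stores entities read from the database only when they are not already cached, while REFRESH also overwrites entries that are already in the cache.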

Related

What's the advantage of Read-through, write-behind over cache-aside pattern in AppFabric?

In the cache-aside as well as the read-through pattern, we need to write code to write to the database. So what's the real advantage of the read-through/write-behind approach? Please clarify my doubt.
Yes, you need to write code in both of these patterns, but there are a number of benefits to the read-through/write-behind approach.
E.g. in the cache-aside pattern, your application is responsible for reading from and writing to the database, and also for keeping the cache synchronized with the database. This makes your application's code more complex and may also cause code duplication if multiple applications deal with the same data. Read-through/write-behind, on the other hand, simplifies the application's logic.
Furthermore, read-through may reduce database calls by blocking parallel calls for the same object, as explained in this article by NCache:
There are many situations where a cache-item expires and multiple parallel user threads end up hitting the database. Multiplying this with millions of cached-items and thousands of parallel user requests, the load on the database becomes noticeably higher.
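The blocking behaviour described above can be sketched in plain Java: `ConcurrentHashMap.computeIfAbsent` locks per key, so concurrent misses on the same key trigger exactly one load from the backing store. The class and names are illustrative, not from any real cache product:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal read-through cache sketch: on a miss, the loader (standing in
// for the database call) runs at most once per key, even under
// concurrent access, because computeIfAbsent locks that key's bucket.
class ReadThroughCache<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    int loads = 0;  // counts backing-store hits (demo only, not atomic)

    ReadThroughCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        return store.computeIfAbsent(key, k -> {
            loads++;                 // only reached on a cache miss
            return loader.apply(k);
        });
    }
}
```

Two reads of the same key cost one load; cache-aside code would need explicit locking to get the same guarantee.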
Similarly, write-behind (asynchronous writes) can improve your application's performance by speeding up the write operation:
In cache-aside, the application updates the database directly and synchronously, whereas write-behind lets your application quickly update the cache and return, and then lets the cache update the database in the background.
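That split between the fast synchronous path and the deferred database write can be sketched as follows; the second map stands in for the real database, and a real write-behind cache would also batch and coalesce pending writes:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal write-behind sketch: put() updates the in-memory map and
// returns immediately; a single background thread applies the write to
// the "database" afterwards.
class WriteBehindCache {
    final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final ConcurrentHashMap<String, String> database = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    void put(String key, String value) {
        cache.put(key, value);                          // fast, synchronous
        writer.submit(() -> database.put(key, value));  // slow path, deferred
    }

    // Drain pending writes; only needed so the demo can observe them.
    void close() {
        writer.shutdown();
        try { writer.awaitTermination(5, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```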
See this article for further details on advantages of using read-through/write-behind over cache-aside. I hope this will help :)

Hibernate - how to return objects not tracked by session?

In Entity Framework there is an option called AutoDetectChangesEnabled which, when disabled, significantly improves performance when performing bulk operations.
Is there any equivalent in Hibernate, which could improve performance while selecting/inserting many records to the database?
Or maybe the question should be, is such really needed?
There are many options:
Session.setDefaultReadOnly() - looks like a direct equivalent of AutoDetectChangesEnabled. However, it only disables detection of changes, but keeps session cache enabled, because it's needed for other features. So, it only affects performance, but not memory consumption.
StatelessSession - has no session cache (it doesn't keep references to entities at all) and lacks many features of a regular Session because of that.
Another common approach to this problem is to clear() the session periodically (say, after every 100 entities) during processing (or to evict() individual entities manually). This approach combines the advantages of the previous options, because it keeps the normal semantics of Session while discarding entities when they are no longer needed.
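The periodic-clear pattern might look like the following sketch (it assumes an open Hibernate `Session` and transaction, and an `items` list; the batch size of 100 is arbitrary), so it is not runnable on its own:

```java
for (int i = 0; i < items.size(); i++) {
    session.persist(items.get(i));
    if (i % 100 == 0) {
        session.flush();   // push pending inserts to the database
        session.clear();   // detach everything, emptying the session cache
    }
}
```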

Are there any good reasons to not have your application deal with any transactions?

Are there any good reasons why one would not have transaction management in their code?
The question came up when talking with a DBA who gets very nervous when I bring up Spring/Hibernate. I mention that Spring can handle transactions, in use with Hibernate mapping tables to objects etc., and the issue comes up that the database (Oracle 10g) already handles transaction management, so we should just use that. He even offered up the idea that we create a bunch of DB procedures to do inserts/updates so the database can handle things more efficiently, returning a 0/1 on whether the insert/update worked.
Are there any good reasons to not have your application deal with any transactions? Is my dba clueless? I'm thinking he is, but I'm not a great speaker when I'm unsure of the answer... which is why I'm out looking for the answer.
I think there is some misunderstanding here.
The point is that the database doesn't manage transactions in the same sense as Spring/Hibernate does.
Database "manages transactions" by providing transactional behaviour, and your application "manages transactions" by using that behaviour and defining transaction boundaries (in particular, with the help of Spring or Hibernate).
Since boundaries of transactions are defined by business logic, implementing an application without transaction management would require you to move all your business logic to the database side. Even if you implement simple insert/update operations as stored procedures, it won't be enough to free application from transaction management as long as application needs to define that several inserts/updates should be executed inside the same transaction.
I am not entirely sure if you mean that there will be a bunch of crud stored procedures (that do single inserts or updates), or if there will be stored procedures encompassing business logic (transaction scripts). If you mean the crud stored procedures, that is an entirely bad idea. (Actually even if you start with the crud approach you will end up with transaction scripts as business logic accretes, so it amounts to the same thing.) If you mean transaction scripts, that's an approach some places take. It is painful and there is no reuse, and you end up with a bunch of very complex stored procedures that are terribly hard to test. But DBAs like it because they know what's going on.
There is also an argument (applying to transaction scripts) that it's faster because there are fewer round trips: you have one call to the stored procedure that goes and does everything and returns a result, as opposed to your usual Spring/Hibernate application where you have multiple queries or updates and each statement goes over the network to the database (although Hibernate caches and reorders to try to minimize this). Minimizing network round trips is probably the most valid reason for this approach; you have to weigh whether it is worth sacrificing flexibility for the reduced network traffic, or whether it is a premature optimization.
Another argument made in favor of transaction scripts is that less competence is required to implement the system correctly. In particular Hibernate expertise is not required. You can hire a horde of code monkeys and have them bang out the code. All the hard stuff is removed from them and placed under the DBA's control.
So, to recap, here are the arguments for transaction scripts:
Less network traffic
Cheap developers
total DBA control (from your point of view, he will be a total bottleneck)
As mentioned above, there's no way to "use transactions" from the database standpoint without making your application aware of it at some level. Although, if you're using Spring, you can make this fairly painless by using <tx:annotation-driven> and applying the @Transactional annotation to the relevant methods in the service implementation classes.
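A sketch of what that boundary looks like in a service class (the DAO and method names are invented; this fragment assumes a configured Spring context and is not runnable by itself):

```java
@Transactional
public void transferFunds(long fromId, long toId, BigDecimal amount) {
    // Spring opens the transaction before this method runs and commits
    // (or rolls back on exception) after it returns, so both updates
    // succeed or fail together.
    accountDao.debit(fromId, amount);
    accountDao.credit(toId, amount);
}
```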
That said, there are times when you should bypass transactions and write directly to the database: specifically, any time speed is more important than guaranteed data integrity.

Is ORM (Linq, Hibernate...) really that useful?

I have been playing with the LINQ ORM (LINQ directly to SQL) and I have to admit I like its expressive power. For small utility-like apps it also works quite fast: drop a SQL Server table onto a design surface and you're set to LINQ away.
For larger apps however, the DAL never was that big of an issue to me to setup, nor maintain, and more often than not, once it was set, all the programming was not happening there anyway...
My honest question - I am an ORM newbie - is: what is the big advantage of ORM over writing a decent DAL by hand?
(seems like a double, couldn't find it though)
UPDATE: OK, it's a duplicate :-) I found it myself eventually:
ORM vs Handcoded Data Access Layer
Strong-typing
No need to write the DAL yourself => time savings
No need to write SQL code yourself => less error-prone
I've used Hibernate in the past to dynamically create quite complex queries. The logic involved to create the appropriate SQL would have been very time-consuming to implement, compared with the logic to build the appropriate Criteria. Additionally, Hibernate knew how to work with various different databases, so I didn't need to put any of that logic in our code. We had to test against different databases of course, and I needed to write an extension to handle "like" queries appropriately, but then it ran against SQL Server, Oracle and HSqldb (for testing) with no issues.
There's also the fact that it's more code you don't have to write, which is always a nice thing :) I can't say I've used LINQ to SQL in anything big, but where I've used it for a "quick and dirty" web-site (very small, rarely updated, little benefit from full layer abstraction) it was lovely.
I used JPA in a project, and at first I was extremely impressed. Gosh it saved me all that time writing SQL! Gradually, however, I became a bit disenchanted.
Difficulty defining tables without surrogate keys. Sometimes we need tables that don't have surrogate keys. Sometimes we want a multicolumn primary key. TopLink had difficulties with that.
Forced datastructure relationships. JPA uses annotations to describe the relationship between a field and the container or referencing class. While this may seem great at first sight, what do you do when you reference the objects differently in the application? Say for example, you need just specific objects that reference specific records based on some specific criteria (and it needs to be high-performance with no unnecessary object allocation or record retrieval). The effort to modify Entity classes will almost always exceed the effort that would have existed had you never used JPA in the first place (assuming you are at all successful getting JPA to do what you want).
Caching. JPA defines the notion of caches for your objects. It must be remembered that the database has its own cache, typically optimized around minimizing disk reads. Now you're caching your data twice (ignoring the uncollected GC heap). How this can be an advantage is beyond me.
Data != Objects. For high-performance applications, the retrieval of data from the DB must be done very efficiently. Forcing object creation is not always a good thing. For example, sometimes you may want arrays of primitives. This is about 30 minutes of work for an experienced programmer working with straight JDBC.
Performance, debugging.
It is much more difficult to gauge the performance of an application with complex things going on in the (sub-optimal, autogenerated) caching subsystem, further straining project resources and budgets.
Most developers don't really understand the impedance mismatch problem that has always existed when mapping objects to tables. This fact ensures that JPA and friends will probably enjoy considerable (cough cough) success for the foreseeable future.
Well, for me it is a lot about not having to reinvent/recreate the wheel each time I need to implement a new domain model. It is simply a lot more efficient to use for instance nHibernate (my ORM of choice) for creating, using and maintaining the data access layer.
You don't specify exactly how you build your DAL, but for me I used to spend quite some time doing the same stuff over and over again. I used to start with the database model and work my way up from there, creating stored procedures etc. Even if I sometimes used little tools to generate parts of the setup, it was a lot of repetitive coding.
Nowadays I start with the domain. I model it in UML, and for most of the time I'm able to generate everything from that model, including the database schema. It needs a few tweaks here and there, but with my current setup I get 95% of the data access job done in no time at all. The time I save I can use to fine-tune the parts that need tuning. I seldom need to write any SQL statements.
That's my two cents. :-)
Portability between different db vendors.
My honest question - I am an ORM newbie - is: what is the big advantage of ORM over writing a decent DAL by hand?
Not all programmers are willing or even capable of writing "a decent DAL". Those who can't or get scared from the mere thought of it, find LINQ or any other ORM a blessing.
I personally use LINQ to manipulate collections in the code because of its expressiveness. It offers a very compact and transparent way to perform some common tasks on collections directly in code.
LINQ will stop being useful to you when you want to create very specific and optimized queries by hand. Then you are likely to get a mixture of LINQ queries intermingled with custom stored procedures wired into it. Because of these considerations, I decided against LINQ to SQL in my current project (since I have a decent (imho) DAL layer). But I'm sure LINQ will do just fine for simple sites like maybe your blog (or SO, for that matter).
With LINQ/an ORM there may also be a performance consideration for high-traffic sites (since each incoming query has to be compiled all over again), though I have to admit I do not see any performance issues on SO.
You can also consider waiting for the Entity Framework v2. It should be more powerful than LINQ (and hopefully not that bad as v1 (according to some people)).
Transparent persistence - changes get saved (and cascaded) without you having to call Save(). At first glance this seems like a nightmare, but once you get used to working with it rather than against it, your domain code can be freed of persistence concerns almost completely. I don't know of any ORM other than Hibernate / NHibernate that does this, though there might be some...
The best way to answer the question is to understand exactly what libraries like Hibernate are actually accomplishing on your behalf. Most of the time abstractions exist for a reason, often to make certain problems less complex; Hibernate, in this case, is almost a DSL for expressing certain persistence concepts in a simple, terse manner.
One can easily change the fetch strategy for collections by changing an annotation rather than writing up lots of code.
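For example (illustrative entity and field names, not from the original post), switching a collection between lazy and eager loading is a one-word change on the mapping annotation:

```java
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)  // or FetchType.EAGER
private List<OrderLine> lines;
```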
Hibernate and LINQ are proven and tested by many; there is little chance you can achieve this quality without a lot of work.
Hibernate addresses many features that would take you months and years to code.
Also, while the JPA documentation says that composite keys are supported, it can get very (very) tricky quickly. You can easily spend hours (days?) trying to get something quite simple working. If JPA really makes things simpler then developers should be freed from thinking too much about these details. It doesn't, and we are left with having to understand two levels of abstraction, the ORM (JPA) and JDBC. For my current project I'm using a very simple implementation that uses a package protected static get "constructor" that takes a ResultSet and returns an Object. This is about 4 lines of code per class, plus one line of code for each field. It's simple, high-performance, and quite effective, and I retain total control. If I need to access objects differently I can add another method that reads different fields (leaving the others null, for example). I don't require a spec that tells me "how ORMs must (!) be done". If I require caching for that class, I can implement it precisely as required.
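The "static get constructor" idea described above might look like the following sketch; the Employee class and column names are invented, and wrapping SQLException is a choice made here so callers don't deal with a checked exception:

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hand-rolled row-to-object mapping: a few lines per class, one line
// per mapped field, and total control over what gets read.
final class Employee {
    long id;
    String name;

    static Employee get(ResultSet rs) {
        try {
            Employee e = new Employee();
            e.id = rs.getLong("id");        // one line per field
            e.name = rs.getString("name");
            return e;
        } catch (SQLException ex) {
            throw new RuntimeException(ex);
        }
    }
}
```

A variant that reads fewer columns (leaving the others null) is just another small static method on the same class.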
I have used LINQ and found it very useful. It saves a lot of time writing data-access code. But for large applications you need more than a DAL; for those, you can easily extend the classes it creates. Believe me, it really improves your productivity.

Caching and clearing in a proxy with relations between proxy calls

As part of a system I am working on we have put a layer of caching in a proxy which calls another system. The key to this cache is built up of the key value pairs which are used in the proxy call. So if the proxy is called with the same values the item will be retrieved from the cache rather than from the other service. This works and is fairly simple.
It gets more complicated when it comes to clearing the cache, as it is not obvious which items to clear when an item is changed. If object A is contained in nodeset B and object A is changed, how do we know that nodeset B is stale?
We have got round the problem by having the service that we call return the nodesets to clear when objects are changed. However, this breaks encapsulation and adds a layer of complexity, in that we have to look in the responses to see what needs clearing.
Is there a better/standard way to deal with such situations?
Isn't this the sort of thing that could be (and should be) handled with the Observer pattern? Namely, B should listen to events that affect its liveness, in this case the state of A.
A Map is a pretty natural abstraction for a cache and this is how Oracle Coherence and Terracotta do it. Coherence, with which I'm far more familiar, has mechanisms to listen to cache events either in general or for specific nodes. That's probably what you should emulate.
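A minimal sketch of that listener mechanism in plain Java (not Coherence's actual API; class and key names are invented): a Map-backed cache fires an event on every put, so a dependent entry like nodeset B can evict itself when object A changes, without the remote service having to say so.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Observable cache sketch: every put notifies registered listeners with
// the changed key, so dependent entries can invalidate themselves.
class ObservableCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final List<Consumer<K>> listeners = new ArrayList<>();

    void addListener(Consumer<K> listener) { listeners.add(listener); }

    void put(K key, V value) {
        entries.put(key, value);
        for (Consumer<K> l : listeners) l.accept(key);  // notify observers
    }

    V get(K key) { return entries.get(key); }

    V remove(K key) { return entries.remove(key); }
}
```

A listener can then encode "nodeset B depends on object A" locally, keeping the invalidation rule out of the called service entirely.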
You also might want to look at the documentation for either of those, even if it's just as a guide or a source of ideas.
You don't say what platform you're running in but perhaps we can suggest some alternatives to rolling your own, which is always going to be fraught with problems, particularly with something as complicated as a cache (make no mistake: caches are complicated).
