JPA2 Entities Caching

As it stands I am using a JSF request scoped bean to do all my CRUD operations. As you most likely know, Tomcat doesn't provide container managed persistence, so in my CRUD request bean I am using an EntityManagerFactory to get hold of an EntityManager. Now, the validity of my choice to use a request scoped bean for this task is probably open for discussion (again), but I've been trying to put it in the context of what I've read in the articles you gave me links to, specifically the first and second one. From what I gather, EclipseLink uses a Level 2 cache by default, which stores cached entities. The EclipseLink Examples - JPA Caching page says that:
The shared cache exists for the duration of the persistence unit (EntityManagerFactory, or server)
Now doesn't that make my cached entities live only for a fraction of time, for the duration of the call to the CRUD request bean? The moment the bean is destroyed, and with it the EntityManagerFactory, then so is the cache. Also, the last part of the sentence above, "EntityManagerFactory, or server", confuses me: what precisely is meant by "or server" in this context, and how does one control it? If I use the @Cache annotation and set an appropriate expiry attribute, will that do the job and keep the entities stored in the server's L2 cache, regardless of whether my EntityManagerFactory has been destroyed?
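Something like the following is what I have in mind (just a sketch; the cache type, size and expiry values are illustrative, and Property stands for one of my entities):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.Cache;
import org.eclipse.persistence.annotations.CacheType;

// A sketch only: expiry is in milliseconds and the values here are
// illustrative, not recommendations.
@Entity
@Cache(type = CacheType.SOFT,  // soft references: entries can be reclaimed by GC
       size = 1000,            // maximum number of cached instances
       expiry = 600000)        // invalidate entries after 10 minutes
public class Property {

    @Id
    private Long id;

    // ...
}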
I understand there is a lot to consider and each application has specific requirements. From my point of view, configuring the L2 cache is probably the most desirable (if not the only, on Tomcat) option to get things optimized. Quoting from your first link:
The advantages of L2 caching are:
avoids database access for already loaded entities
faster for reading frequently accessed unmodified entities
The disadvantages of L2 caching are:
memory consumption for large amount of objects
stale data for updated objects
concurrency for write (optimistic lock exception, or pessimistic lock)
bad scalability for frequent or concurrently updated entities
You should configure L2 caching for entities that are:
read often
modified infrequently
not critical if stale
Almost all of the above points apply to my app. At the heart of it, amongst other things, is constant and relentless reading of entities and displaying them on the website (the app will serve as a portal for listing properties). There's also a small shopping cart being built into the application, but the products sold are not tangible items that come as stock but services. In this case stale entities are no problem, and neither, I think, is concurrency, as the products (here, services) will never be written to. So the entities will be read often, they will be modified infrequently (and those modified are not part of the cart anyway, and even those are modified rarely), and it is therefore not critical if they are stale. Finally, the first two points seem to be exactly what I need, namely avoidance of database access for already loaded entities and fast reading of frequently accessed unmodified entities. But there is one point in the disadvantages which still concerns me a bit: memory consumption for a large amount of objects. Isn't that similar to my original problem?
My current understanding is that there are two options, only one of which applies to my situation:
To be able to delegate the job of longer term caching to the persistence layer, I need to have access to the PersistenceContext, create a session scoped bean and set PersistenceContextType.EXTENDED. (This option doesn't apply to me: no access to a PersistenceContext.)
Configure the L2 cache via the @Cache annotation on entities, or, like in option 1 above, create a session scoped bean that will handle long term caching. But isn't this just going back to my original problem?
I'd really like to hear your opinion and see what you think could be a reasonable way to approach this, or perhaps how you have approached it in your previous projects. Oh, and one more thing, just to confirm: when annotating an entity with @Cache, will all linked entities be cached along with it, so I don't have to annotate all of them?
Again all the comments and pointers much appreciated.
Thanks for your answer. When you say
"In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server."
Does it mean I could, for example, declare and initialize a static EntityManagerFactory field in an application scoped bean, to be later used by all the beans throughout the life of the application?

EclipseLink uses a shared cache by default. This is shared for all EntityManagers accessed from an EntityManagerFactory. You do not need to do anything to enable caching.
In general, you do not want to be creating a new EntityManagerFactory per request, only a new EntityManager. Creating a new EntityManagerFactory is quite expensive, so not a good idea, even ignoring caching (it has its own connection pool, must initialize the meta-data, etc.).
In Tomcat you would be best to have some static manager that holds onto the EntityManagerFactory for the duration of the server. Either never close it, or close it when a Servlet is destroyed.
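For example, a minimal sketch of such a manager (the class name and persistence unit name are made up; adapt them to your persistence.xml):

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

// Sketch of a static holder: one EntityManagerFactory for the whole
// server lifetime, one cheap EntityManager per request.
public final class PersistenceManager {

    private static final EntityManagerFactory EMF =
            Persistence.createEntityManagerFactory("myPersistenceUnit");

    private PersistenceManager() {
    }

    public static EntityManager createEntityManager() {
        // Call per request; close the returned EntityManager when done.
        return EMF.createEntityManager();
    }

    public static void close() {
        // Call from ServletContextListener.contextDestroyed, for example.
        EMF.close();
    }
}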

Related

Spring caching implementation

I am exploring the Spring caching facility, and I have a few queries regarding it.
First, should it be applied at the service method level or the DAO method level, where the service method calls the DAO method?
Second, how should I avoid cached data getting stale?
IMO, the answer to both questions is "it depends".
Technically, Spring-wise, cache annotations applied at both the service and the DAO level will work; I don't think there is any difference, so it boils down to the concrete use case.
For example, if you "logically" plan to provide a cacheable abstraction of what should be calculated as the result of some computational process done on the server, you'd better go with caching at the service level.
If, on the other hand, you have a DAO method that looks like Something getSomethingById(id) and you would like to avoid relatively expensive calls to the underlying database, you can provide a cache at the level of the DAO, as shown in the sketch below. Having said that, it probably won't be useful to apply caching if you have methods like List<Something> fetchAll() or List<Something> fetchAllByFilter(). If you're working with JPA (implemented with Hibernate), it has its own cache abstraction, but that's somewhat beyond the scope of the question; just something you should be aware of...
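A minimal sketch of DAO-level caching with Spring's annotations (the cache name and the Something placeholder are made up, and a configured CacheManager is assumed):

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Repository;

// Sketch only: the result is cached by the id argument, so the method
// body (the expensive database call) runs only on a cache miss.
@Repository
public class SomethingDao {

    public static class Something {
        public final long id;

        public Something(long id) {
            this.id = id;
        }
    }

    @Cacheable("somethings")
    public Something getSomethingById(long id) {
        // ... the relatively expensive database lookup would go here ...
        return new Something(id);
    }
}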
There are plenty of tutorials available on the internet; some illustrate the service based approach, some go for annotations on DAO methods, but again, these are only simple examples. In the real world you'll have to make a decision.
Now, regarding the second question: in general, caching makes sense if your data doesn't change much, so first off, if it changes often, caching is probably not appropriate/relevant for the use case.
Other than that, there are many techniques:
Cache data eviction (usually time based); see this tutorial, and the sketch after this list
Some kind of messaging system that sends a message about a cache entry change. This one is especially useful if you have a distributed application and keep the cache in-memory only. When getting a message you might opt for "cache replication" or for totally wiping out the cache, so that it will eventually be "filled" with new data
Using distributed cache technologies like Hazelcast or Redis as opposed to in-memory caching, so that the consistency of the cached data is guaranteed by the caching provider
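As a sketch of the first technique, time-based eviction can be as simple as a scheduled wipe of a cache region (the cache name and interval are illustrative; @EnableScheduling and @EnableCaching are assumed to be configured):

import org.springframework.cache.annotation.CacheEvict;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Sketch: clears the "somethings" cache every ten minutes, so stale
// entries have a bounded lifetime. The annotations do all the work.
@Component
public class SomethingsCacheEvictor {

    @Scheduled(fixedDelay = 600000)
    @CacheEvict(value = "somethings", allEntries = true)
    public void evictAll() {
        // intentionally empty
    }
}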
I would also like to recommend this tutorial to you - the speaker talks about different aspects of caching implementation and I think it's really relevant to your question.

Spring bean scoping

I've been googling and I just realized something really odd (at least to me): apparently it's recommended to set up Spring objects as singletons. I have several questions:
Logically, it would affect performance, right? If 100 users (total exaggeration) are using the same @Service objects, wouldn't that mean the other 99 must queue in line to get that @Service? (Thus, a performance issue.)
What if, by some diabolical design, those Spring objects have state (a synchronization problem)? This usually happens with ORM, since a BaseDAOImpl usually has a protected injected SessionFactory.
Speaking of the injected SessionFactory, I thought annotations are not inherited? Could somebody give an explanation about this?
thanks
Apparently it's recommended to set up Spring objects as singletons.
It's not necessarily recommended, but it is the default.
Logically, it would affect performance, right? If 100 users (total exaggeration) are using the same @Service objects, wouldn't that mean the other 99 must queue in line to get that @Service? (Thus, a performance issue.)
First of all, forget about users. Think about objects, threads, and object monitors. A thread will block and wait if it tries to acquire an object monitor that is owned by another thread. This is the basis of Java's synchronization concept. You use synchronization to achieve mutual exclusion from a critical section of code.
If your bean is stateless, which a @Service bean probably should be (just like @Controller beans), then there is no critical section. No object is shared between threads using the same @Service bean. As such, there are no synchronized blocks involved and no thread waits on any other thread. There wouldn't be any performance degradation.
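To illustrate, a sketch of such a stateless bean (the names are made up; the point is that all data lives in parameters and local variables, which are per-thread by definition):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// Sketch: no fields hold per-request data, so any number of threads can
// run totalPriceFor concurrently without synchronization.
@Service
public class PriceService {

    private final PriceDao priceDao;  // stateless collaborator, safe to share

    @Autowired
    public PriceService(PriceDao priceDao) {
        this.priceDao = priceDao;
    }

    public long totalPriceFor(long productId, int quantity) {
        long unitPrice = priceDao.unitPriceOf(productId);  // local: thread-confined
        return unitPrice * quantity;
    }

    public interface PriceDao {
        long unitPriceOf(long productId);
    }
}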
What if, by some diabolical design, those Spring objects have state (a synchronization problem)? This usually happens with ORM, since a BaseDAOImpl usually has a protected injected SessionFactory.
A typical design would have you use SessionFactory#getCurrentSession(), which returns a Session bound to the current thread using a ThreadLocal. So again, these libraries are well written. There's almost always a way in which you can avoid any concurrency issue, either through a ThreadLocal design as above or by playing with bean scopes.
If you can't, you should write your code so that the bottleneck is as small as possible, i.e. just the critical section.
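A sketch of that typical design (the class name mirrors the question; the key point is the thread-bound Session):

import org.hibernate.Session;
import org.hibernate.SessionFactory;

// Sketch: the injected SessionFactory is an immutable, thread safe
// singleton; each thread obtains its own Session from it.
public class BaseDAOImpl {

    protected final SessionFactory sessionFactory;

    public BaseDAOImpl(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    protected Session currentSession() {
        // Bound to the current thread/transaction via ThreadLocal, so
        // the singleton DAO shares no mutable Session state across threads.
        return sessionFactory.getCurrentSession();
    }
}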
Speaking of the injected SessionFactory, I thought annotations are not inherited? Could somebody give an explanation about this?
I'm not sure what you mean by this. You are correct that annotations are not inherited (but methods are inherited with their annotations). That might not apply to the situation you are asking about, though, so please clarify.
A service object being a singleton doesn't mean that access to it is synchronized. Multiple users can invoke it simultaneously, just like a single servlet instance is used by many concurrent users in a web application. You only need to ensure that there is no state in your singleton object.
A SessionFactory is a thread safe object as it's immutable; a Session is not thread safe. And since a SessionFactory is a heavy object, it's recommended to share one SessionFactory per application but to use multiple Sessions.
I'm not clear about your third point; can you please elaborate a little?
Hope it helps

Caching (Ehcache) - Hibernate 2nd level cache and Spring

In my web application (Spring 3.1, Hibernate 4), I am using Ehcache for the Hibernate 2nd level cache and Spring @Cache. I would like to know where to use the Hibernate cache and where the Spring cache.
For example, I have a few domain classes (views in the database) which I am using as lookup values on screen. I can cache them using the Hibernate 2nd level cache as well as Spring @Cache.
So, if in my service layer I cache these domain objects using Spring @Cache, I would receive these objects without hitting the persistence layer at all (Hibernate HQL query), once cached. Is this the right approach?
Depends on your layer architecture.
Assume you have three services (or three methods within the same service) that all return a collection of Customer entities, i.e. domain objects. If you cache at the service layer, there's a fair chance the same representation of a single database record will live in the cache multiple times. They are multiple objects holding essentially the same information. Why? Because the results of Service.getWhateverCustomers(String) and Service.getWhateverCustomers(String, Integer) are stored under two different cache keys, as the sketch below illustrates.
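A sketch of that situation (the service, cache names and placeholder bodies are made up):

import java.util.Collections;
import java.util.List;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

// Sketch of the duplication problem: both methods may return rows for
// the same Customer record, but the results are stored under different
// cache keys, so one record can live in the cache more than once.
@Service
public class CustomerService {

    public static class Customer { }  // placeholder for the mapped entity

    @Cacheable("customersByName")
    public List<Customer> getWhateverCustomers(String name) {
        return Collections.emptyList();  // placeholder for the real query
    }

    @Cacheable("customersByNameAndAge")
    public List<Customer> getWhateverCustomers(String name, Integer age) {
        return Collections.emptyList();  // placeholder for the real query
    }
}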
If you cache at entity level using the JPA @Cacheable annotation, on the other hand, your Customer entity is cached no matter from which service or service method you call the code that actually retrieves the entity (see the sketch below). Of course, the rules about when the JPA provider can/does cache an entity apply; read up on them if you're not familiar with them.
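The entity-level counterpart looks roughly like this (a sketch; READ_ONLY suits lookup data that never changes, so adjust the strategy to your case):

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Sketch: @Cacheable is the JPA standard switch; the Hibernate-specific
// @Cache chooses the concurrency strategy, backed here by Ehcache.
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class Customer {

    @Id
    private Long id;

    private String name;

    protected Customer() {
        // no-arg constructor required by JPA
    }
}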
Hope this gives you an idea which path to follow. Post follow-up comments if you have more questions and I'll edit this answer.
The right approach is:
Ask yourself if you even need to mess with the complexity of caching. Is your app failing to perform up to requirements?
Only if the answer to the previous question is "yes", profile your app to find out where the performance problem(s) is/are.
Determine the appropriate way to solve a performance problem identified in step 2. It may or may not involve caching to prevent costly operations. If it does involve caching, then where to cache and which cache to use should be abundantly clear because you'll know exactly what you're trying to prevent from happening.
The moral of the story is that you don't cache because it's cool. You cache for performance. And you only optimize code when it's proven necessary.

ASP.NET MVC - Repository pattern with Entity Framework

When you develop an ASP.NET application using the repository pattern, do each of your methods create a new entity container instance (context) with a using block for each method, or do you create a class-level/private instance of the container for use by any of the repository methods until the repository itself is disposed? Other than what I note below, what are the advantages/disadvantages? Is there a way to combine the benefits of each of these that I'm just not seeing? Does your repository implement IDisposable, allowing you to create using blocks for instances of your repo?
Multiple containers (vs. single)
Advantages:
Preventing connections from being auto-closed/disposed (will be closed at the end of the using block).
Helps force you to only pull into memory what you need for a particular view/viewmodel, and in fewer round-trips (you will get a connection error for anything you attempt to lazy load).
Disadvantages:
Access to child entities within the Controller/View is limited to what you called with Include()
For pages like a dashboard index that shows information gathered from many tables (many different repository method calls), we will add the overhead of creating and disposing many entity containers.
If you are instantiating your context in your repository, then you should always do it locally, and wrap it in a using statement.
If you're using Dependency Injection to inject the context, then let your DI container handle calling dispose on the context when the request is done.
Don't instantiate your context directly as a class member, since this will not dispose of the context's resources until garbage collection occurs. If you do, then you will need to implement IDisposable to dispose the context, and make sure that whatever is using your repository properly disposes of your repository.
I, personally, put my context at the class level in my repository. My primary reason for doing so is because a distinct advantage of the repository pattern is that I can easily swap repositories and take advantage of a different backend. Remember - the purpose of the repository pattern is that you provide an interface that provides back data to some client. If you ever switch your data source, or just want to provide a new data source on the fly (via dependency injection), you've created a much more difficult problem if you do this on a per-method level.
Microsoft's MSDN site has good information on the repository pattern. Hopefully this helps clarify some things.
I disagree with all four points:
Preventing connections from being auto-closed/disposed (will be closed at the end of the using block).
In my opinion it doesn't matter whether you dispose the context at method level, repository instance level or request level. You do, of course, have to dispose the context at the end of a single request - either by wrapping the repository method in a using statement, by implementing IDisposable on the repository class (as you proposed) and wrapping the repository instance in a using statement in the controller action, by instantiating the repository in the controller constructor and disposing it in the Dispose override of the controller class, or by instantiating the context when the request begins and disposing it when the request ends (some Dependency Injection containers will help with this work). Why should the context be "auto-disposed"? In a desktop application it is possible and common to have a context per window/view which might be open for hours.
Helps force you to only pull into memory what you need for a particular view/viewmodel, and in fewer round-trips (you will get a connection error for anything you attempt to lazy load).
Honestly I would enforce this by disabling lazy loading altogether. I don't see any benefit of lazy loading in a web application where the client is disconnected from the server anyway. In your controller actions you always know what you need to load and can use eager or explicit loading. To avoid memory overhead and improve performance, you can always disable change tracking for GET requests because EF can't track changes on a client's web page anyway.
Access to child entities within the Controller/View is limited to what you called with Include()
Which is rather an advantage than a disadvantage, because you don't get the unwanted surprises of lazy loading. If you need to populate child entities later in the controller actions, depending on some condition, you could load them through additional repository methods (LoadNavigationProperty or something) with the same or even a new context.
For pages like a dashboard index that shows information gathered from many tables (many different repository method calls), we will add the overhead of creating and disposing many entity containers.
Creating contexts - and I don't think we are talking about hundreds or thousands of instances - is a cheap operation. I would call this a very theoretical overhead which doesn't play a role in practice.
I've used both of the approaches you mentioned in web applications, and also a third option, namely creating a single context per request and injecting this same context into every repository/service I need in a controller action. All three worked for me.
Of course, if you use multiple contexts you have to be careful to do all the work in the same unit of work, to avoid attaching entities to multiple contexts, which would lead to well-known exceptions. It's usually not a problem to avoid these situations, but it requires a bit more attention, especially when processing POST requests.
Lately I use a context per request, because it is easier and I just don't see the benefit of having very narrow contexts, and I see no reason to use more than one single unit of work for the whole request processing. If I needed multiple contexts - for whatever reason - I could always create specialized methods which act on their own context instead of the "default context" of the request.

Performance in JavaEE 6 Applications (Glassfish v3) - Logging, DI, Database-Operations, EJBs, Managed Beans

The important technologies I use are: GlassFish v3, JSF 2.0, JPA 2.0, EclipseLink 2.0.2, log4j 1.2.16, commons-logging 1.1.1.
My problem is that some parts of the application are pretty slow. I analysed this with the NetBeans 6.8 profiling capabilities.
I. Logging - I use log4j and Apache commons-logging to generate logs in a log file and on the console. The logs also appear in GlassFish's server log. I use loggers as follows:
private static Log logger = LogFactory.getLog(X.class);
...
if (logger.isDebugEnabled()) {
    ...
    logger.debug("Log...");
}
The problem is that sometimes such short statements take a lot of time (about 800 ms). When I switch to java.util.logging it's not as bad, but still very slow (in the 200 ms range). What's the problem?
I need some logging... UPDATE - The problem with the slow logging was solved after switching from NetBeans 6.8 to NetBeans 6.9.1. Is NetBeans 6.8 possibly very slow when logs are printed to its console?! So it had nothing to do with Log4j or commons-logging.
II. DB Operation:
The first time I call the find method of the following EJB it takes 2.4 s! Additional calls last only a few milliseconds. So why does the first operation take that long? Is this (only) because of the connection establishment, or does it have something to do with the dependency injections of the XFacade, and when are these injections performed?
@Stateless
@PermitAll
public class XFacade {

    @PersistenceContext(unitName = "de.x.persistenceUnit")
    private EntityManager em;

    // Other DI's
    ...

    public List<News> find(int maxResults) {
        return em.createQuery(
                "SELECT n FROM News n ORDER BY n.published DESC", News.class)
                .setMaxResults(maxResults)
                .getResultList();
    }
}
III. Dependency Injection, JNDI Lookup: Is there a difference between DI (like @EJB ...) and InitialContext lookups concerning performance? Is there a difference (performance-wise) between injecting local, remote and no-interface EJBs?
IV. Managed Beans - I use many session scoped beans, because the view scope seems to be very buggy and request scope is not always practical. Is there an alternative? These beans are not slow, but the server side memory is stressed during a whole session, and when a user logs out it takes some time!
V. EJBs - I don't use MDBs, only session beans and singleton beans. Often they inject other beans with the @EJB annotation. One singleton bean uses @Schedule annotations to start repeated operations. An interesting thing I found is that since EJB 3.1 you can use the @Asynchronous annotation to make session bean methods asynchronous, as sketched below. What should I generally consider when implementing EJBs concerning performance?
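For reference, this is roughly what I mean (a sketch; the bean and method names are made up):

import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

// Sketch: the container runs this method on a separate thread and the
// caller immediately receives a Future for the result.
@Stateless
public class ReportBean {

    @Asynchronous
    public Future<String> generateReport() {
        String report = "...";  // long-running work would go here
        return new AsyncResult<String>(report);
    }
}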
Maybe someone could give me some general and/or specific tips to increase the performance of Java EE applications, especially concerning the above issues. Thanks!
To start with, you should benchmark your application in a real environment using load testing tools; you can't really draw valid conclusions from the behavior you observe in your IDE. On top of that, don't forget that profiling actually alters performance.
I. Logging (...) The problem with the slow logging was solved after switching from NetBeans 6.8 to NetBeans 6.9.1
That's the first proof that you can't trust the behavior inside your IDE.
II. DB Operation: The first time I call the find method of the following EJB it takes 2.4 s! Additional calls last only a few milliseconds. So why does the first operation take that long?
Maybe because some GlassFish services are loaded lazily, maybe because the stateless session beans (SLSB) have to be instantiated, maybe because the EntityManagerFactory has to be created. What does the profiling say? What do you see when activating app server logging? And what's the problem, since subsequent calls are OK?
III. Dependency Injection, JNDI Lookup: Is there a difference between DI (like @EJB ...) and InitialContext lookups concerning performance?
JNDI lookups are expensive, and it was a de facto practice to use some caching in the good old service locator. I thus don't expect performance to be worse when using DI (I actually expect the container to be good at it). And honestly, it has never been a concern for me.
When you work on performance optimization, the typical workflow is: 1) detect a slow operation, 2) find the bottleneck, 3) work on it, 4) if the operation is still not fast enough, go back to 2). In my experience, the bottleneck is 90% of the time in the DAL. If your bottleneck is DI, you don't have a performance problem IMO. In other words, I think you're worrying too much and you're very close to "premature optimization".
IV. Managed Beans - I use many session scoped beans, because the view scope seems to be very buggy and request scope is not always practical. Is there an alternative? These beans are not slow, but the server side memory is stressed during a whole session, and when a user logs out it takes some time!
I don't see any question :) So I don't have anything to say. Update (answering a comment): using a conversation scope might indeed be less expensive. But as always, measure.
V. EJBs - I don't use MDBs, only session beans and singleton beans. Often they inject other beans with the @EJB annotation. One singleton bean uses @Schedule annotations to start repeated operations. An interesting thing I found is that since EJB 3.1 you can use the @Asynchronous annotation to make session bean methods asynchronous. What should I generally consider when implementing EJBs concerning performance?
SLSBs (and MDBs) perform very well in general. But here are some points to keep in mind when using SLSBs:
Prefer Local over Remote interfaces if your clients and EJBs are collocated to avoid the overhead of remote calls.
Use Stateful Session Beans (SFSB) only if necessary (they are harder to use, managing state has a performance cost and they don't scale very well by nature).
Avoid chaining EJBs too much (especially if they are remote); prefer coarse grained methods over multiple fine-grained method invocations.
Tune the SLSB pool (so that you have just enough beans to serve your concurrent clients, but not too many, to avoid wasting resources).
Use transactions appropriately (e.g. use NOT_SUPPORTED for read only methods); a sketch follows this list.
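For the last point, a sketch (the bean and method names are made up):

import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;

// Sketch: suspending the transaction for a read-only method avoids the
// container's transaction bookkeeping for calls that don't need it.
@Stateless
public class CatalogBean {

    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public long countNews() {
        // read-only work: no transaction required
        return 0L;  // placeholder
    }
}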
I would also suggest not using a feature unless you really need it and there is a real problem to solve (e.g. @Asynchronous).
Maybe someone could give me some general and/or specific tips to increase the performance of Java EE applications, especially concerning the above issues. Thanks!
Focus on your data access code, I'm pretty sure this represents 80% of the execution time of most business flows.
And as I already hinted: are you sure that you actually have a performance problem? I'm not convinced. But if you really do, measure performance outside your IDE.
I agree with Pascal Thivent that "premature optimization" is bad. Design your code well, and then worry about performance.
Also, you can optimize for performance or memory-conservation. Pick which one is causing you headaches in the morning, and optimize for that.
I actually really like NetBeans' profiler; it's one of the best free ones, IMHO.
I would recommend:
Profiling for performance, but limiting the scope to either just your own packages (e.g., de.x) or excluding Java core classes. This is so your application isn't bogged down with profiling, which would give you misleading performance numbers. Basically, try to get the "Overhead" as low as possible, as indicated by the bar at the bottom.
Browse around your web app for a while and see what is taking up most of your CPU time. Also take note of the fact that some objects are lazily loaded, so the first time you access pieces of code it may be slow. You're also using a JIT compiler, so code may be slow the first (or second...) time you run it, but it will get faster as you use that piece of code more often. In short, use the web app for a while. You can "record" a web session using Selenium and "play" it back; this will allow you to get better performance metrics and see where your application really is slow, instead of measuring "warm-up time".
If there is a specific piece of code or class that is causing you trouble, use profiling points to see exactly what is causing that piece of code to be slow.
If you have a specific piece of code that's causing you problems, feel free to post about that.
