I am working on a basic Struts-based application that is experiencing major spikes in memory. Our monitoring tool shows a single request per user adding 3 MB to the JVM heap. Are there any tips to encourage earlier garbage collection, free up memory, or improve performance?
The application is a basic Struts application, but there are a lot of rows in the JSP report, so a lot of objects may be created. It isn't anything you haven't seen before:
1. Perform a set of database queries.
2. Create a serializable POJO bean for each row.
3. Add each row bean to an ArrayList.
4. Set the ArrayList on the form object when the action is invoked.
5. The JSP logic iterates through the list from the ActionForm and displays the data to the user.
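For reference, a minimal sketch of a row bean and the mapping step; the class, field, and column names are illustrative (the real bean has around 20 fields):

```java
import java.io.Serializable;
import java.math.BigDecimal;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Illustrative row bean: one instance per report row (~20 String/BigDecimal fields in practice)
public class ReportRowBean implements Serializable {

    private String accountName;
    private BigDecimal amount;

    public String getAccountName() { return accountName; }
    public void setAccountName(String accountName) { this.accountName = accountName; }
    public BigDecimal getAmount() { return amount; }
    public void setAmount(BigDecimal amount) { this.amount = amount; }

    // Maps the whole ResultSet into the list that is then set on the (session-scoped)
    // ActionForm and iterated by the JSP; column names are made up.
    public static List<ReportRowBean> fromResultSet(ResultSet rs) throws SQLException {
        List<ReportRowBean> rows = new ArrayList<ReportRowBean>();
        while (rs.next()) {
            ReportRowBean row = new ReportRowBean();
            row.setAccountName(rs.getString("ACCOUNT_NAME"));
            row.setAmount(rs.getBigDecimal("AMOUNT"));
            rows.add(row);
        }
        return rows;
    }
}
```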
Notes:
1. The form is in session scope, and possibly that ArrayList of data as well (maybe this is an issue).
2. The POJO bean contains 20 or so fields, a mix of String and BigDecimal data.
The report can have 300 to 1200 or so rows. So there are at least that many objects created.
Given the information you've provided, I'd estimate that you're typically loading 1 to 2 MB of data per result: 750 rows * 20 fields * 100 bytes per field ≈ 1.4 MB. Now consider all of the temporary objects needed between the database and the final markup, and 3 MB isn't surprising.
I'd only be concerned if that memory seems to have leaked; i.e., the next garbage collection of the young generation space doesn't collect all of those objects.
When designing reports to be rendered in a web application, consider the number of records fetched from the database.
If the number of records is high and the overall result set takes a lot of memory, consider paginating the report.
As far as possible, do not invoke the garbage collector explicitly, for two reasons:
1. Garbage collection is a costly process, as it scans the whole of the memory.
2. Most production servers are tuned at the JVM level to avoid explicit garbage collection.
I believe the problem is the ArrayList in the ActionForm, which needs to allocate a huge chunk of memory. I would write the query results directly to the response: read a row from the ResultSet, write it to the response, read the next row, write it, and so on. Maybe it's not MVC, but it would be better for your heap :-)
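A rough sketch of that streaming approach; the query, column names, and markup are placeholders:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.servlet.http.HttpServletResponse;

public class ReportStreamer {

    // Writes each row to the response as soon as it is read, so only one row
    // is held in memory at a time instead of a 300-1200 element ArrayList.
    public void streamReport(Connection con, HttpServletResponse response)
            throws SQLException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        PreparedStatement ps = con.prepareStatement("SELECT account_name, amount FROM report_view");
        ResultSet rs = ps.executeQuery();
        try {
            out.println("<table>");
            while (rs.next()) {
                out.print("<tr><td>");
                out.print(rs.getString("account_name"));
                out.print("</td><td>");
                out.print(rs.getBigDecimal("amount"));
                out.println("</td></tr>");
            }
            out.println("</table>");
        } finally {
            rs.close();
            ps.close();
        }
    }
}
```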
ActionForms are fine for CRUD operations, but for reports ... I don't think so.
Note: if the ActionForm has scope=session, the instance (along with the huge ArrayList) stays alive until the session expires. If scope=request, the instance becomes eligible for GC once the request completes.
I have a Spring Batch application with a JpaPagingItemReader (I modified it a bit) and 4 JPA repositories to enrich the Model that comes from the JpaPagingItemReader.
My flow is:
Select Models (page size = 8192), then collect the List<Model> into a Map<String, List<Model>> (grouped by id, because the models are not unique and I need to enrich them by id), then enrich it with 4 custom JpaRepositories using native queries with IN clauses, and merge the results with Java 8 Streams (a rough sketch of the grouping step follows below).
Convert the data to XML objects and write them with StAX through a MultiFileItemWriter to files, split into no more than 20,000 records per file.
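A minimal sketch of that grouping step, assuming the Model class exposes a getId() accessor (the nested Model here is only a placeholder for the real entity):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ModelGrouping {

    // Placeholder for the real entity read by JpaPagingItemReader
    public static class Model {
        private final String id;
        public Model(String id) { this.id = id; }
        public String getId() { return id; }
    }

    // Groups one page of Models by id so each group can be enriched with a single
    // IN-clause query per repository instead of one query per row.
    public static Map<String, List<Model>> groupById(List<Model> page) {
        return page.stream().collect(Collectors.groupingBy(Model::getId));
    }
}
```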
Everything works great, but today I tried to run the flow with a large amount of data from the database. I generated 20 files (2.2 GB), but sometimes I got an OutOfMemoryError (Java heap). I had 1 GB for -Xms/-Xss; after raising it to 2 GB everything works, but in Instana I see that the old-gen Java memory always stays around 900 MB in use after GC, and overall about 1.3-1.7 GB is in use. So I started thinking about how to optimize GC of the Spring Data JPA objects; I think they stay in memory for too long.
When I select Models with the JpaPagingItemReader I detach every Model (with entityManager.detach), but when I enrich the Models with the custom Spring Data JPA queries I am not detaching the results. Maybe that is the problem and I should detach them too?
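If detaching the enrichment results does help, a small helper like this sketch could be called right after each repository query (the repository method name in the usage example, e.g. detachAll(entityManager, enrichmentRepository.findAllByModelIdIn(ids)), is made up):

```java
import java.util.List;
import javax.persistence.EntityManager;

public class DetachHelper {

    // Detaches every entity in a freshly loaded list so the persistence context
    // (first-level cache) does not keep them managed, and dirty-checked, for the
    // rest of the chunk.
    public static <T> List<T> detachAll(EntityManager entityManager, List<T> entities) {
        for (T entity : entities) {
            entityManager.detach(entity);
        }
        return entities;
    }
}
```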
I do not need to insert data into the database, I only need to read it. Or should I make the page size smaller and select about 4,000 records per request?
I need to process 370,000 records from the database and enrich them.
Solved. I added flags to my run configuration and doubled -Xms and -Xmx.
We have a parent object with a collection of 500,000 child objects. We are using Hibernate for mapping, with Ehcache as the cache provider. Using the second-level cache for entities and collections works fine, as we can avoid requests to the database.
But loading 500,000 objects through the second-level cache still produces a lot of CPU load and memory garbage, and results in a response time of a few seconds. As the child objects are not immutable, we can't enable the hibernate.cache.use_reference_entries property.
By using an application-layer cache of DAO objects on top of the Hibernate second-level cache, there is no CPU overhead and no memory garbage; the response time is a few milliseconds instead of seconds.
But the big disadvantage of this solution is that we have to manage this cache ourselves, including invalidation and synchronization in a clustered, multithreaded system.
My question is whether there is a better solution that keeps the advantages of low CPU usage and little garbage. Does anyone have experience handling large collections?
Do you really need that 500k at once?
You could remove the collection from the Parent and query the Child objects by parent: SELECT c FROM Child c WHERE c.parent = :parent, and add pagination or filtering when you don't need the 500,000 at once.
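A sketch of that paginated child query; the entity mappings are minimal stand-ins for the real Parent/Child classes, and the page size is up to you:

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.TypedQuery;

@Entity
class Parent {
    @Id
    Long id;
}

@Entity
class Child {
    @Id
    Long id;
    String name;
    @ManyToOne
    Parent parent;
}

public class ChildQueries {

    // Loads one page of children for a parent instead of initializing a
    // 500,000-element collection mapped on the parent.
    public static List<Child> findChildren(EntityManager em, Parent parent, int page, int pageSize) {
        TypedQuery<Child> query = em.createQuery(
                "SELECT c FROM Child c WHERE c.parent = :parent", Child.class);
        query.setParameter("parent", parent);
        query.setFirstResult(page * pageSize);   // skip the previous pages
        query.setMaxResults(pageSize);           // cap the page size
        return query.getResultList();
    }
}
```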
You could also load the Child entities as DTOs, which would improve memory behaviour because Hibernate would not consider these DTOs for dirty checking. I guess this would cut the memory footprint roughly in half, although I never benchmarked it. A DTO also lets you omit attributes you don't need in this particular use case, saving memory and CPU.
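And a sketch of the DTO variant, using a JPQL constructor expression so Hibernate returns plain objects that never become managed entities; the package, class, and attribute names are illustrative:

```java
package com.example;

import java.util.List;
import javax.persistence.EntityManager;

// Plain DTO: not an entity, so it is never managed or dirty-checked by Hibernate,
// and it only carries the attributes this use case actually needs.
public class ChildDto {

    private final Long id;
    private final String name;

    public ChildDto(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    public Long getId() { return id; }
    public String getName() { return name; }

    // Constructor expression: the fully qualified DTO name must match the package above
    public static List<ChildDto> findForParent(EntityManager em, Long parentId) {
        return em.createQuery(
                "SELECT new com.example.ChildDto(c.id, c.name) "
                        + "FROM Child c WHERE c.parent.id = :parentId", ChildDto.class)
                .setParameter("parentId", parentId)
                .getResultList();
    }
}
```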
You could also take a look at enableDirtyTracking in Hibernate 5.
I need to display 40,000 records, but I get a System.OutOfMemoryException in MVC 5. Sometimes 70,000 records load correctly and sometimes not even 40,000 records load. I need to display all the records and export them to MS Excel.
I used the Kendo grid to display the records.
I saw somewhere that the Kendo grid doesn't handle a huge number of records.
From the Telerik forum:
When OpenAccess executes a query the actual retrieval of results is split into chunks. There is a fetch size that determines the number of records that are read from the database in a single pass. With a query that returns a lot of records this means that the fetch size is not exceeded and not all 40 000 records will be retrieved at one time in memory. Iterating over the result data you will get several reads from the database until the iteration is over. However, when you iterate over the result set subsequent reads are accumulated when you keep references to the objects that are iterated.
An out of memory exception may be caused when you operate with all the records from the grid. The way to avoid such an error would be to work with the data in chunks. For example, a paging for the grid and an option that exports data sequentially from all pages will achieve this. The goal is to try to reduce the objects kept in-memory at a time and let the garbage collection free unneeded memory. A LINQ query with Skip() and Take() is ideal in such cases where having all the data in-memory is costly.
and from http://docs.telerik.com/devtools/aspnet-ajax/controls/grid/functionality/exporting/overview
We strongly recommend not to export large amounts of data since there is a chance to encounter an exception(Timeout or OutOfMemory) if more than one user tries to export the same data simultaneously. RadGrid is not suitable for such scenarios and therefore we suggest that you limit the number of columns and rows. Also it is important to note that the hierarchy and the nested controls have a considerable effect on the performance in this scenario.
What the above is basically saying is to reduce your result set via paging and/or reduce the number of columns fetched from the db, so that you only show what is actually needed.
Not really sure what else you could do. You have too much data, and you're running out of memory. Gotta reduce the data to reduce the memory used.
Please go for paging, and try to export all 40,000 records without loading them onto the page. Loading all the data at once takes time and leads to the out-of-memory exception.
I am having performance problems where an aggregate has a bag which has a large number of entities (1000+). Usually it only contains at most 50 entities but sometimes a lot more.
Using NHibernate Profiler I see that fetching the 1123 records of this bag from the database takes 18 ms, but NHibernate needs 1079 ms to process them. The problem is that each of those 1123 records has one or two additional records. I fetch these using fetch="subselect"; fetching these additional records takes 16 ms from the database and 2527 ms of processing by NHibernate. So this action alone takes 3.5 seconds, which is way too expensive.
I read that this is because updating the first-level cache gets slow when loading a lot of entities. But what is a lot? NHibernate Profiler says that I have 1145 entities loaded by 31 queries (which is, in my case, the absolute minimum). That number of loaded entities does not seem like a lot to me.
In the current project we are using NHibernate v3.1.0.4000
I agree, 1000 entities aren't too many. Are you sure that the time isn't spent in one of the constructors or property setters? You could pause the debugger during the load to take a random sample of where the time goes.
Also make sure that you use the reflection optimizer (I think it's turned on by default).
I assume that you measure the time of the query itself. If you measure the whole transaction, it most certainly spends the time in flushing the session. Avoid flushing by setting the FlushMode to Never (only if there aren't any changes in the session to be stored) or by using a StatelessSession.
A wild guess: Removing the batch-size setting may even make it faster because it doesn't need to assign the entities to the corresponding collections.
My application has internationalization for all tables, so every table has a companion table for different language support, keyed by language code such as 'en-us'. Hitting the database every time a page is shown makes the application slow, so we implemented a custom message source by extending the AbstractMessageSource class. I referred to the link http://forum.springsource.org/showthread.php?t=15223, but with that approach all the messages are stored in memory. If the table size or the number of tables grows, this message hash grows too, and then memory problems arise. So we plan to keep it on disk using Ehcache. Please provide me a sample, and let me know whether this is a valid option for storing the objects.
Change the Map entries in DataSourceMessageSource to:
```java
/** Cache holding already generated MessageFormats per message code and Locale
 *  Map
 */

/** All messages (for all basenames) per locale
 *  Map
 */
```
That will get you going. You also need an ehcache.xml with cache entries for each of the above; you should specify overflowToDisk=true.
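For illustration, a resolveCode lookup backed by Ehcache might look roughly like this; the cache name, key format, and the database-loading method are assumptions, not the original DataSourceMessageSource code:

```java
import java.text.MessageFormat;
import java.util.Locale;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import org.springframework.context.support.AbstractMessageSource;

public class EhcacheDataSourceMessageSource extends AbstractMessageSource {

    // Cache configured in ehcache.xml (e.g. with overflowToDisk="true") so that
    // MessageFormats spill to disk instead of growing the heap with the tables.
    private final Cache messageFormatCache =
            CacheManager.getInstance().getCache("messageFormatCache");

    @Override
    protected MessageFormat resolveCode(String code, Locale locale) {
        String key = code + "_" + locale;              // assumed key format
        Element element = messageFormatCache.get(key);
        if (element != null) {
            return (MessageFormat) element.getObjectValue();
        }
        MessageFormat format = loadMessageFormatFromDatabase(code, locale);
        if (format != null) {
            messageFormatCache.put(new Element(key, format));
        }
        return format;
    }

    // Placeholder for the existing database lookup against the i18n tables
    private MessageFormat loadMessageFormatFromDatabase(String code, Locale locale) {
        return null; // query the language table here
    }
}
```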
Note that you will incur a deserialization cost. If you see a high CPU cost from that, it might be worth restructuring the code to return specifically what you want rather than a map.
Greg Luck