We have a JVM heap issue on a production server. When we took a heap dump and analyzed the report, we found that more than 80% of the heap is consumed by Ehcache objects, and the server becomes slow. We suspect either a memory leak or that cache entries are accumulating without the old ones ever being cleared. The site handles very high traffic, so the number of cached entries has likely grown.
I want to review the meaning of these configuration settings, and I would appreciate suggestions on whether they have something to do with the issue:
<ehcache>
    <diskStore path="java.io.tmpdir" />
    <defaultCache maxElementsInMemory="10" eternal="false"
        timeToIdleSeconds="120" timeToLiveSeconds="120" overflowToDisk="true" />
    <cache name="name1" maxElementsInMemory="50" eternal="false" overflowToDisk="true" timeToIdleSeconds="0" timeToLiveSeconds="86400" />
    <cache name="name2" maxElementsInMemory="50" eternal="false" overflowToDisk="true" timeToIdleSeconds="0" timeToLiveSeconds="86400" />
    <cache name="name3" maxElementsInMemory="50000" eternal="true" overflowToDisk="true" />
    <cache name="name4" maxElementsInMemory="500" eternal="true" overflowToDisk="true" />
    <cache name="name5" maxElementsInMemory="500" eternal="true" overflowToDisk="true" />
    <cache name="name6" maxElementsInMemory="50000" eternal="true" overflowToDisk="true" />
    <cache name="name7" maxElementsInMemory="50000" eternal="true" overflowToDisk="true" />
</ehcache>
In the Java class I have the following implementation:
@Cacheable("name6")
public ServiceResponse getQualifiedProducts(
        ValueObject parameterObject, List<Input> list) {
    return cachedService.getQualifiedProducts(parameterObject, list);
}
Any suggestions would be appreciated.
Update: What is the significance of overflowToDisk="true"? I don't have any disk store configured. How will it impact performance?
The example you gave of cache content is a ServiceResponse object.
Do you know how big that object is? Does it contain references to types that are linked to the frameworks you use? Or is it pure business data?
This is the kind of investigation you need to perform in order to understand what ends up referenced by the cache and how this impacts your memory usage.
As general advice, you want to cache objects that are yours, so that you can control their size precisely.
As for overflowToDisk="true", it will be applied, as you do have a disk store configured in the XML snippet you shared.
But while the attribute says overflow for backwards compatibility reasons, it no longer behaves that way since at least Ehcache 2.6.0. The current Ehcache model is that all mappings exist in the slower store (disk here) to give you more predictable latency, and when a mapping is accessed, it is copied to the heap for faster retrieval on subsequent hits.
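Given that model, the eternal="true" caches with maxElementsInMemory="50000" are the most likely suspects for unbounded growth. One option, sketched here with placeholder numbers that would need tuning against real traffic, is to drop eternal="true" on the large caches and cap the disk tier as well:

```xml
<!-- Illustrative sketch only: sizes and TTLs are placeholders.
     Entries now expire, and both tiers are bounded. -->
<cache name="name6"
       maxElementsInMemory="5000"
       maxElementsOnDisk="50000"
       eternal="false"
       timeToLiveSeconds="3600"
       timeToIdleSeconds="1800"
       overflowToDisk="true" />
```

With eternal="false" and a TTL, stale entries are eventually evicted instead of accumulating on the heap indefinitely.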
I've implemented a web application using GeoServer to provide tile maps. To apply a caching strategy, I've enabled the embedded GeoWebCache and set up the tile page store in a PostgreSQL database. The disk quota is set to 5 MB with the LFU policy, to test the truncation behavior when the quota is exceeded. The problem appears when the cached volume exceeds 5 MB: GeoWebCache deletes all tiles without regard to the "frequency_of_use" of each tile. Is this the expected behavior? I think it should remove the least-used tiles first.
<!-- geowebcache-diskquota.xml -->
<gwcQuotaConfiguration>
<enabled>true</enabled>
<cacheCleanUpFrequency>10</cacheCleanUpFrequency>
<cacheCleanUpUnits>SECONDS</cacheCleanUpUnits>
<maxConcurrentCleanUps>2</maxConcurrentCleanUps>
<globalExpirationPolicyName>LFU</globalExpirationPolicyName>
<globalQuota>
<value>5</value>
<units>MiB</units>
</globalQuota>
<quotaStore>JDBC</quotaStore>
</gwcQuotaConfiguration>
and the geowebcache-diskquota-jdbc.xml file:
<gwcJdbcConfiguration>
<dialect>PostgreSQL</dialect>
<JNDISource>java:comp/env/jdbc/gwc</JNDISource>
<connectionPool>
<driver>org.postgresql.Driver</driver>
<url>jdbc:postgresql://localhost:5432/gwc</url>
<username>postgres</username>
<password></password>
<minConnections>1</minConnections>
<maxConnections>10</maxConnections>
<connectionTimeout>10000</connectionTimeout>
<maxOpenPreparedStatements>50</maxOpenPreparedStatements>
</connectionPool>
</gwcJdbcConfiguration>
The disk quota mechanism does not track each single tile, but "tile pages", groups of tiles whose statistics are tracked as a unit, in order to reduce the accounting database size.
I don't know the details of the implementation to the point of telling you how big a tile page is, but for a tile cache that is potentially hundreds of gigabytes, I would not be surprised if the minimum tracking unit is more than 5MB. If that's the case, then a delete of all the tiles available in a 5MB quota would be very likely.
We have a ColdFusion application that runs a large query (up to 100k rows) and then displays it in HTML. The UI then offers an Export button that triggers writing the report to an Excel spreadsheet in .xlsx format using the cfspreadsheet tag and spreadsheet functions: spreadsheetSetCellValue for building out row and column values, and spreadsheetFormatRow and spreadsheetFormatCell for formatting. The ssObj is then written to a file using:
<cfheader name="Content-Disposition" value="attachment; filename=OES_#sel_rtype#_#DateFormat(now(),'MMM-DD-YYYY')#.xlsx">
<cfcontent type="application/vnd.ms-excel" variable="#ssObj#" reset="true">
where ssObj is the spreadsheet object. We are seeing file sizes of about 5-10 MB.
However... the memory usage for creating this report and writing the file jumps up by about 1 GB. The compounding problem is that the memory is not released by the Java GC right away after the export completes. When we have multiple users running and exporting this type of report, the memory keeps climbing until it reaches the allocated heap size and kills the server's performance, to the point that it brings down the server. A reboot is usually necessary to clear it out.
Is this normal/expected behavior or how should we be dealing with this issue? Is it possible to easily release the memory usage of this operation on demand after the export has completed, so that others running the report readily get access to the freed up space for their reports? Is this type of memory usage for a 5-10Mb file common with cfspreadsheet functions and writing the object out?
We have tried temporarily removing the expensive formatting functions, and still the memory usage is large for the creation and writing of the .xlsx file. We have also tried the spreadsheetAddRows approach and the cfspreadsheet action="write" query="queryname" tag, passing in a query object, but this too took up a lot of memory.
Why are these functions so memory hoggish? What is the optimal way to generate Excel SS files without this out of memory issue?
I should add the server is running in Apache/Tomcat container on Windows and we are using CF2016.
How much memory do you have allocated to your CF instance?
How many instances are you running?
Why are you allowing anyone to view 100k records in HTML?
Why are you allowing anyone to export that much data on the fly?
We had issues of this sort (CF and memory) at my last job. Large file uploads consumed memory, large excel exports consumed memory, it's just going to happen. As your application's user base grows, you'll hit a point where these memory hogging requests kill the site for other users.
Start with your memory settings. You might get a boost across the board by doubling or tripling what the app is allotted. Also, make sure you're on the latest version of the supported JDK for your version of CF. That can make a huge difference too.
Large file uploads would impact the performance of the instance making the request. This meant that others on the same instance doing normal requests were waiting for those resources needlessly. We dedicated a pool of instances to only handle file uploads. Specific URLs were routed to these instances via a load balancer and the application was much happier for it.
That app also handled an insane amount of data, and users constantly wanted "all of it". We had to limit search results and certain data sets to reduce the amount shown on screen. The DB was quite happy with that decision. Data exports were moved to a queue so those large Excel files could be crafted outside of normal page requests. Maybe users got their data immediately, maybe they waited a while for a notification. Either way, the application performed better across the board.
Presumably a bit late for the OP, but since I ended up here others might too. Whilst there is plenty of general memory-related sound advice in the other answer+comments here, I suspect the OP was actually hitting a genuine memory leak bug that has been reported in the CF spreadsheet functions from CF11 through to CF2018.
When generating a spreadsheet object and serving it up with cfheader+cfcontent without writing it to disk, even with careful variable scoping, the memory never gets garbage collected. So if your app runs enough Excel exports using this method then it eventually maxes out memory and then maxes out CPU indefinitely, requiring a CF restart.
See https://tracker.adobe.com/#/view/CF-4199829 - I don't know if he's on SO but credit to Trevor Cotton for the bug report and this workaround:
1. Write the spreadsheet to a temporary file,
2. read the spreadsheet from the temporary file back into memory,
3. delete the temporary file,
4. stream the spreadsheet from memory to the user's browser.
So given a spreadsheet object that was created in memory with spreadsheetNew() and never written to disk, then this causes a memory leak:
<cfheader name="Content-disposition" value="attachment;filename=#arguments.fileName#" />
<cfcontent type="application/vnd.ms-excel" variable = "#SpreadsheetReadBinary(arguments.theSheet)#" />
...but this does not:
<cfset local.tempFilePath = getTempDirectory()&CreateUUID()&arguments.filename />
<cfset spreadsheetWrite(arguments.theSheet, local.tempFilePath, "", true) />
<cfset local.theSheet = spreadsheetRead(local.tempFilePath) />
<cffile action="delete" file="#local.tempFilePath#" />
<cfheader name="Content-disposition" value="attachment;filename=#arguments.fileName#" />
<cfcontent type="application/vnd.ms-excel" variable = "#SpreadsheetReadBinary(local.theSheet)#" />
It shouldn't be necessary, but Adobe don't appear to be in a hurry to fix this, and I've verified that this works for me in CF2016.
Using NLog, I know I can change the minLevel in NLog.config to exclude certain log messages. I think this is generally great when software is running in production: if a problem happens, I can lower the minLevel and see more detail. This makes sense.
What I have problems with is during debugging the "Debug" level quite honestly seems a bit inadequate. This is mostly because "Debug" seems to be the catch all for everything a developer may care about and no one else.
For backend systems that do a lot I have seen this fill up a 25 MB log file in a few seconds. Sorting through this and trying to tie pieces together is a bit difficult.
Is it possible to have multiple levels of "Debug" so that I can limit the amount of information to actually make using the log file easier?
Not sure if this solves your problem, but it's common in NLog to use the following pattern:
use different loggers for each class or process (by using LogManager.GetCurrentClassLogger() or LogManager.GetLogger("loggernameForFlow1"))
always write all the log messages to the logger (e.g. logger.Trace(...), logger.Debug(...) etc.)
filter the logs in the config by level, but also by logger name. Because LogManager.GetCurrentClassLogger() creates a logger named after the current class including its namespace, you can easily filter per class. e.g.
filter on namespace:
<logger name="myNamespace.*" minLevel=... writeTo=... />
filter on 1 class
<logger name="myNamespace.MyClass.*" minLevel=... writeTo=... />
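A sketch of what such per-logger routing could look like in NLog.config (target names, file paths, and the class name are all illustrative, and the usual xmlns declarations on the root element are omitted for brevity):

```xml
<nlog>
  <targets>
    <target name="flowFile" xsi:type="File" fileName="flow1.log" />
    <target name="mainFile" xsi:type="File" fileName="app.log" />
  </targets>
  <rules>
    <!-- final="true" stops later rules from also logging these messages -->
    <logger name="MyApp.Billing.InvoiceBuilder" minlevel="Trace" writeTo="flowFile" final="true" />
    <logger name="*" minlevel="Info" writeTo="mainFile" />
  </rules>
</nlog>
```

This keeps the chatty class's Trace output in its own file while the main log stays at Info, which addresses the "multiple levels of Debug" need without new log levels.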
I have a PERSISTENT cache configured like this:
<region name="stock-hist" refid="PARTITION_PERSISTENT">
    <region-attributes disk-store-name="myOverflowStore" disk-synchronous="false">
        <partition-attributes local-max-memory="1024" />
        <eviction-attributes>
            <!-- Overflow to disk when 100 megabytes of data reside in the region -->
            <lru-memory-size maximum="100" action="overflow-to-disk" />
        </eviction-attributes>
    </region-attributes>
</region>
The problem is that when I store, say, 8 GB of data, the cache crashes due to too much memory use. I do not want that to happen. I need the data to overflow to disk when it exceeds 100 MB, but to be brought back into the cache when I access it. I also want a persistent cache.
Also, in the case of a write-behind to a database, how can I evict data after some time?
How does this work?
This is a use case for which an in-memory data grid is not intended. Based on the problem that you are describing, you should consider using a relational DB, or you should increase memory to use an in-memory data grid. Overflow features are intended as a safety valve, not for "normal" use.
I do not understand when you say that "it" crashes due to "too much" memory, since it obviously does not have "enough" memory. I suspect that there is not sufficient disk space defined. If you think there is, check your explicit and implicit disk allocations.
As for time-based eviction/ expiration, please see "PARTITION_HEAP_LRU" at: http://gemfire.docs.pivotal.io/docs-gemfire/latest/reference/topics/region_shortcuts_reference.html
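For the time-based eviction question specifically, GemFire regions also support entry expiration configured directly in cache.xml. A sketch under stated assumptions (the timeout value is illustrative, and note that statistics must be enabled on the region for expiration to take effect):

```xml
<!-- Illustrative: destroy entries 30 minutes after last access. -->
<region name="stock-hist" refid="PARTITION_PERSISTENT">
    <region-attributes statistics-enabled="true">
        <entry-idle-time>
            <expiration-attributes timeout="1800" action="destroy" />
        </entry-idle-time>
    </region-attributes>
</region>
```

With a write-behind queue in place, you would want the expiration timeout to be comfortably longer than the queue's flush interval, so entries are not destroyed before they are persisted.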
I'd like to use as much RAM as possible with Ehcache, but Ehcache still uses some space on the hard drive. This is my config:
name="MainCacheManager"
overflowToDisk="false"
diskPersistent="false"
updateCheck="true"
monitoring="autodetect"
dynamicConfig="true"
maxBytesLocalHeap="2G"
maxBytesLocalDisk="1M"
Is it possible to disable swapping to disk entirely?
Also, this value doesn't work:
maxBytesLocalDisk="1M"
The Ehcache swap takes much more space than 1 MB.
Ehcache uses a tiering model in recent versions (2.6+). This means that the lower tier, also called authority, will always contain all the entries from the cache.
So you should never configure a disk store to be smaller than the onheap store, as it will limit the cache capacity.
If you do not want a disk store, drop the maxBytesLocalDisk="1M" config line.
Also, the disk store of ehcache should not be compared to a swap file.
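A minimal sketch of what a heap-only setup could look like, assuming Ehcache 2.x (the manager name and TTL are illustrative): omit maxBytesLocalDisk and all other disk-related settings, so no tier exists below the heap and nothing is written to disk.

```xml
<ehcache name="MainCacheManager"
         maxBytesLocalHeap="2G"
         updateCheck="true">
    <!-- No diskStore element and no disk attributes:
         the heap is the only (and therefore the authority) tier. -->
    <defaultCache eternal="false" timeToLiveSeconds="600" />
</ehcache>
```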