Hazelcast map memory growing while entry numbers stay stable

We are using a standalone single-node installation of Hazelcast 3.3.5 to store users' session info, for both logged-in and not-logged-in users.
The map in which we store the not-logged-in sessions is configured like this:
Max size: 10,000
Backup count: 0
Async Backup Count: 0
Max Idle: 0
TTL: 86400
Eviction Policy: LRU
Eviction Percentage: 25
Read Backup Data: False
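
For reference, here is roughly the same configuration expressed programmatically, as a sketch against the Hazelcast 3.x API (the map name "notLoggedSessions" is a placeholder, and some class names moved between 3.x releases, so treat this as approximate):

import com.hazelcast.config.Config;
import com.hazelcast.config.EvictionPolicy; // nested as MapConfig.EvictionPolicy in some 3.x releases
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MaxSizeConfig;

MapConfig mapConfig = new MapConfig("notLoggedSessions") // placeholder map name
        .setMaxSizeConfig(new MaxSizeConfig(10000, MaxSizeConfig.MaxSizePolicy.PER_NODE))
        .setBackupCount(0)
        .setAsyncBackupCount(0)
        .setMaxIdleSeconds(0)
        .setTimeToLiveSeconds(86400)
        .setEvictionPolicy(EvictionPolicy.LRU)
        .setEvictionPercentage(25)
        .setReadBackupData(false);
Config config = new Config().addMapConfig(mapConfig);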
Monitoring with Mancenter, we see that the number of entries is stable at around 9,500, but "Entry Memory" grows progressively.
When Hazelcast is restarted, both "Entries" and "Entry Memory" are 0; then, when "Entries" reaches its ceiling of 9,500, "Entry Memory" is 37 MB. So far so good. 24 hours later, the number of "Entries" is still the same, so the eviction policy is working fine, but "Entry Memory" is 160 MB and keeps growing until Hazelcast throws an OutOfMemory exception.
Is there something wrong in our configuration? It seems like GC is unable to free the memory of the removed entries.
Any ideas?
Thanks in advance

I naively tried to use Hazelcast as my primary data store, replacing a database. Everything was fine until I put 40K 1K-sized records into one of the maps. The first thing I noticed while monitoring in ManCenter was how much memory it uses for queries. The usage goes down after the query, but I suspect there is either a memory leak or data kept cached to avoid deserialization on future queries. So I made sure I explicitly disabled all near-caches and similar internal caches inside Hazelcast. Explicit serialization, e.g. Portable, helps to reduce the memory footprint; look at the internal storage format too. 37 MB for 10K records is pretty high anyway, but that's a separate architectural issue of keeping the session/user context small (under 15K) by storing only the relevant info.
Again, I use it as a database, not a cache, so I don't have any eviction. Maybe there is indeed a bug (a memory leak) there. I'd try to test it without eviction/TTL.
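
To illustrate the serialization suggestion, here's a minimal sketch of a session value implemented with Hazelcast's Portable interface (the SessionInfo class, its fields, and the IDs are made up for this example; registering the PortableFactory is omitted):

import com.hazelcast.nio.serialization.Portable;
import com.hazelcast.nio.serialization.PortableReader;
import com.hazelcast.nio.serialization.PortableWriter;
import java.io.IOException;

// Hypothetical compact session value; Portable avoids the overhead of
// plain Java serialization and keeps the stored form small.
public class SessionInfo implements Portable {
    public static final int FACTORY_ID = 1; // made-up IDs for the sketch
    public static final int CLASS_ID = 1;

    private String userId;
    private long lastSeen;

    @Override public int getFactoryId() { return FACTORY_ID; }
    @Override public int getClassId() { return CLASS_ID; }

    @Override
    public void writePortable(PortableWriter writer) throws IOException {
        writer.writeUTF("userId", userId);
        writer.writeLong("lastSeen", lastSeen);
    }

    @Override
    public void readPortable(PortableReader reader) throws IOException {
        userId = reader.readUTF("userId");
        lastSeen = reader.readLong("lastSeen");
    }
}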

Related

Does the EX seconds value impact performance in Redis?

I tried googling something similar, but wasn't able to find anything on the topic.
I'm just curious: does the size of the number of seconds set on a key impact performance in Redis?
For example:
set mykey "foobarValue" EX 100
vs.
set mykey "foobarValue" EX 2592000
To answer this question, we need to see how Redis works.
Redis maintains a table of key/value pairs with an expiry time, so each entry can be translated to
<Key: <Value, Expiry> >
There can be other metadata associated with this as well. During GET, SET, DEL, EXPIRE, etc. operations, Redis calculates the hash of the given key(s) and tries to perform the operation. Since it's a hash table, it needs to probe during any operation, and while probing it may encounter expired keys. If you have subscribed to "keyspace notifications", a notification is sent and the given entry is removed/updated based on the operation being performed. Redis also performs rehashing, during which it might find expired keys as well. In addition, Redis runs background tasks to clean up expired keys; this means that if the TTL is too small, more keys will expire, and since this cleanup samples keys at random, more events will be generated.
https://github.com/antirez/redis/blob/a92921da135e38eedd89138e15fe9fd1ffdd9b48/src/expire.c#L98
There is a small performance cost when the TTL is small, since Redis needs to free the memory and fix some pointers. On the other hand, expired keys that have not yet been cleaned up are still present in the database, so you can run out of memory. Similarly, if you use a higher expiry time, the given key will be present in the system for a longer time, which can create memory issues.
A smaller TTL also means more cache misses for the client application, so the client will have performance issues as well.
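
To make the comparison concrete, a minimal sketch using the Jedis client (an assumption; the question does not say which client is used). The write path is the same for both TTLs; only the stored expiry timestamp differs:

import redis.clients.jedis.Jedis;

public class ExpiryDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Equivalent to: SET mykey "foobarValue" EX 100
            jedis.setex("mykey", 100, "foobarValue");
            // Equivalent to: SET mykey "foobarValue" EX 2592000
            jedis.setex("mykey", 2592000, "foobarValue");
            // TTL returns the remaining seconds for the key
            System.out.println(jedis.ttl("mykey")); // ~2592000
        }
    }
}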

Redis Cache throws OOM Error with volatile-lru

For debugging we have set Redis to volatile-lru and a maxmemory of 10 MB.
We are using Redis for HTTP caching in an e-commerce shop. When there are parallel requests on a page, the error
OOM command not allowed when used memory > 'maxmemory'
appears. Shouldn't this be avoided by setting the maxmemory-policy to volatile-lru? Is Redis not fast enough to free memory and store the new entry (each request is about 200-600 KB)?
From the docs:
volatile-lru: evict keys by trying to remove the less recently used (LRU) keys first, but only among keys that have an expire set, in order to make space for the new data added.
It seems like your keys might not have an expiration. If that's the case, you might want to consider using allkeys-lru as your eviction policy.
You can also use INFO stats to see if evicted_keys has a value greater than zero.
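To verify, a small sketch with the Jedis client (assumed here; any client can run the same commands):

import redis.clients.jedis.Jedis;

public class EvictionCheck {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // INFO stats includes evicted_keys; if it stays at 0 while you hit
            // maxmemory, nothing qualifies for eviction under volatile-lru
            System.out.println(jedis.info("stats"));
            // Let keys without an expire be evicted too
            jedis.configSet("maxmemory-policy", "allkeys-lru");
        }
    }
}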

Redis cache lru start softlimit

I know Redis can be used as an LRU cache, but is there a soft-limit flag where we can state that after specific criteria are reached, "Redis will start cleaning LRU items"?
Actually, I'm getting OOM errors on Redis; I've set Redis up as an LRU cache, but it hits the OOM limit and the application stops.
I know of the "maxmemory" flag, but is there a soft limit where, with some 10% of space left, eviction of some items can start so that the application doesn't stop?
Did you set a specific eviction policy?
See: Eviction policies http://redis.io/topics/lru-cache
I would then check to make sure that you are not inadvertently calling PERSIST on your Redis keys. PERSIST removes a key's TTL, and keys without a TTL, I believe, cannot be LRU'd out under a volatile-* policy.
You can use TTL (http://redis.io/commands/ttl) to find out the time limit on your keys, and KEYS (http://redis.io/commands/keys) to get a list of keys (this is dangerous on a production server, as the list could be very long and blocking); a safer SCAN-based sketch follows below.
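
Here is that sketch: using SCAN instead of KEYS to find keys with no TTL, again assuming the Jedis client:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public class FindKeysWithoutTtl {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String cursor = ScanParams.SCAN_POINTER_START; // "0"
            do {
                ScanResult<String> page = jedis.scan(cursor, new ScanParams().count(100));
                for (String key : page.getResult()) {
                    // TTL of -1 means the key exists but has no expiry,
                    // so a volatile-* policy can never evict it
                    if (jedis.ttl(key) == -1) {
                        System.out.println("no TTL: " + key);
                    }
                }
                cursor = page.getCursor();
            } while (!ScanParams.SCAN_POINTER_START.equals(cursor));
        }
    }
}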
-daniel

Low loading into cache speed

I'm using Infinispan 6.0.0 in a 3-node setup (distributed caching with 2 replicas for each entry, no writes to a persistent store), and I'm simply reading a file line by line and storing the lines' contents in the cache. The speed seems a bit low to me: I can achieve more writes onto the SSD (persistent storage) than into RAM with Infinispan, yet there isn't any obvious bottleneck in the test code (I'm using buffered input streams, and their limits certainly aren't reached). As of now, I'm able to write 100K entries every ~45 seconds, and that doesn't satisfy me. Assume this simplified code snippet:
while ((s = reader.readLine()) != null) {
    cache.put(s.substring(0, 2), s.substring(2, 5));
}
And CacheManager is created as follows:
return new DefaultCacheManager(
    GlobalConfigurationBuilder.defaultClusteredBuilder()
        .transport().addProperty("configurationFile", "jgroups.xml")
        .build(),
    new ConfigurationBuilder()
        .clustering().cacheMode(CacheMode.DIST_ASYNC).hash().numOwners(2)
        .transaction().transactionMode(TransactionMode.TRANSACTIONAL)
        .lockingMode(LockingMode.OPTIMISTIC)
        .build());
What could I be possibly doing wrong?
I am not fully aware of all the asynchronous-mode specifics, but I'm afraid that something in the two-phase commit (Prepare and Commit) might force a blocking RPC => waiting for network latency => slow-down.
Do you need transactional behaviour? If not, switch it off. If you really need it, you may disable just the autocommit feature and load the cluster via non-transactional operations. Or you may try one-phase commits.
Another option could be mass loading via putAll (with tens or hundreds of entries per call, depending on your entry size), but the routing of this message is not really smart; in transactional mode it could behave a bit better, I guess. See the sketch after this list.
The last option, if you just want to load the cluster fast and then operate on it, could be transferring the bulk data to each node without Infinispan (using your own JGroups channel, or just sockets) and loading all nodes with the CACHE_MODE_LOCAL flag.
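
A sketch of the putAll option, assuming the same cache and reader as in the question (the batch size of 200 is a guess to tune for your entry size):

import java.util.HashMap;
import java.util.Map;

Map<String, String> batch = new HashMap<>();
String s;
while ((s = reader.readLine()) != null) {
    batch.put(s.substring(0, 2), s.substring(2, 5));
    if (batch.size() >= 200) { // flush in chunks instead of one put per line
        cache.putAll(batch);
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    cache.putAll(batch); // flush the remainder
}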
By default Infinispan follows the Map.put() contract of returning the previous value, so even though you are using the DIST_ASYNC cache mode you're still implicitly performing a synchronous cache.get() for every put.
You can avoid this in two ways:
configurationBuilder.unsafe().unreliableReturnValues(true) will suppress the remote lookup for all the operations on the cache.
cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES).put(k, v) will suppress the remote lookup for a single operation.
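
Applied to the loading loop from the question, the second option would look like this (same cache and reader assumed):

import org.infinispan.AdvancedCache;
import org.infinispan.context.Flag;

AdvancedCache<String, String> fastCache =
        cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);
String s;
while ((s = reader.readLine()) != null) {
    fastCache.put(s.substring(0, 2), s.substring(2, 5)); // no implicit remote get
}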

APC User Cache Entries not being removed after timeout

I'm running APC mainly to cache objects and query data as user cache entries. Each item is set up with a specific TTL relevant to the amount of time it's required in the cache; some items are 48 hours, but most are 2-5 minutes.
It's my understanding that when the timeout is reached and the current time passes the created-at time plus the TTL, the item should be automatically removed from the user cache entries?
This doesn't seem to be happening, though, and the items are staying in memory instead. I thought the garbage collector would remove these items, but it doesn't seem to have done so, even though it's running once an hour at the moment.
The only other thing I can think of is that the default apc.user_ttl = 0 overrides the individual timeout values and sets entries to never be removed, even after their individual timeouts?
In general, a cache manager SHOULD keep your entries for as long as possible, and MAY delete them if/when necessary.
The Time-To-Live (TTL) mechanism exists to flag entries as "expired", but expired entries are not automatically deleted, nor should they be, because APC is configured with a fixed memory size (via the apc.shm_size configuration item) and there is no advantage in deleting an entry when you don't have to. There is a blurb in the APC documentation:
If APC is working, the Cache full count number (on the left) will display the number of times the cache has reached maximum capacity and has had to forcefully clean any entries that haven't been accessed in the last apc.ttl seconds.
I take this to mean that if the cache never "reached maximum capacity", no garbage collection will take place at all, and it is the right thing to do.
More specifically, I'm assuming you are using the apc_add/apc_store functions to add your entries; their TTL parameter has a similar effect to apc.user_ttl, which the documentation explains as:
The number of seconds a cache entry is allowed to idle in a slot in case this cache entry slot is needed by another entry
Note the "in case" statement. Again I take this to mean that the cache manager does not guarantee a precise time to delete your entry, but instead try to guarantee that your entries stays valid before it is expired. In other words, the cache manager puts more effort on KEEPING the entries instead of DELETING them.
apc.ttl doesn't do anything unless there is insufficient allocated memory to store newly incoming variables; if there is sufficient memory, the cache will never expire. So you have to specify a TTL for every variable you store, e.g. apc_store($key, $value, $ttl) or apc_add(), to force APC to regenerate the entry after the specified TTL has passed. If you use opcode caching, it will also never expire unless the page is modified (when stat=1) or there is no memory left. So apc.user_ttl and apc.ttl actually do very little on their own.