Does Cache Coherence always prevent reading a stale value? Do invalidation queues allow it?

In the MESI protocol, a core writes to a cache line only while holding it in the Exclusive or Modified state. To acquire the Exclusive state, it sends an Invalidate request to all the other cores holding the same cache line.
But is there a microarchitecture where some core responds with an acknowledgment before actually invalidating the cache line? If so, isn't that a violation of cache coherence?
The reason I'm asking is that I'm confused by this answer to "Memory barriers force cache coherency?". It says:
Placing an entry into the invalidate queue is essentially a promise by the CPU to process that entry before transmitting any MESI protocol messages regarding that cache line. So invalidation queues are the reason why we may not see the latest value even when doing a simple read of a single variable.
But how can we read a "stale" value if there is no new value yet? I mean, the writing core will not write a new value until it has received invalidation acknowledgments from all the other cores.
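The key to the quoted answer is that the acknowledgment is sent when the invalidation is queued, not when it is applied. So the writer legitimately receives all its acks and writes the new value while the reader still holds its old copy as valid. The toy model below is a sketch of that sequence under stated assumptions: the class and method names (`Core`, `receiveInvalidate`, `drainInvalidations`) are hypothetical, a single map stands in for "wherever a cache miss would fetch the current value from", and it does not describe any specific CPU.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of one cache line under an invalidate-queue design (not a real CPU). */
public class InvalidationQueueDemo {
    static final class Core {
        final Map<String, Integer> cache = new HashMap<>();
        final Queue<String> invalidationQueue = new ArrayDeque<>();

        /** Queue the invalidation and acknowledge at once, before the line is dropped. */
        boolean receiveInvalidate(String line) {
            invalidationQueue.add(line);
            return true;
        }

        /** A hit is served from the local cache; pending invalidations are not consulted. */
        int read(String line, Map<String, Integer> owner) {
            return cache.computeIfAbsent(line, owner::get);
        }

        /** Roughly what a read barrier forces: apply all queued invalidations first. */
        void drainInvalidations() {
            String line;
            while ((line = invalidationQueue.poll()) != null) {
                cache.remove(line);
            }
        }
    }

    /** The writer sends invalidates, waits for every ack, then stores the new value. */
    static void write(Map<String, Integer> owner, List<Core> sharers, String line, int value) {
        for (Core c : sharers) {
            if (!c.receiveInvalidate(line)) {
                throw new IllegalStateException("missing ack");
            }
        }
        owner.put(line, value); // all acks are in, so the write may proceed
    }

    public static void main(String[] args) {
        // The current value that a cache miss would fetch.
        Map<String, Integer> owner = new HashMap<>(Map.of("x", 0));
        Core reader = new Core();
        reader.read("x", owner);               // reader now holds x = 0 (Shared)
        write(owner, List.of(reader), "x", 1); // acked while the invalidation is still queued
        System.out.println("before drain: " + reader.read("x", owner)); // stale 0
        reader.drainInvalidations();
        System.out.println("after drain: " + reader.read("x", owner));  // 1
    }
}
```

In this model the new value does exist before the stale read happens; the reader simply has not processed its queued invalidation yet, which is why a barrier that drains the queue makes the fresh value visible.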


Ehcache and CacheWriter (write-behind) relation

Suppose we have a Cache configured with a write-behind CacheWriter. Let's assume we put some object in the cache and later on the object is removed because of an eviction policy.
What is guaranteed regarding writing? More precisely, is the write() event guaranteed to happen for that object, even though it was removed before it "had a chance" to be written?
No, write() is not guaranteed to happen. In the write-behind case, all writes are stored in a queue while some background threads read from that queue to update the underlying SoR (System of Record, i.e. your database). That queue can be read or modified by other threads concurrently reading or modifying the same cache.
For instance, if a put() happens on a certain key, a write command is enqueued. If a remove() happens on that same key before one of the background threads has had the chance to consume the write command, the write command can be removed from the queue (note the 'can' here). Other similar optimizations can take place ('can' again); they can change, and new ones can be added, in any minor version, since this is all considered an implementation detail, as long as the data served by Ehcache follows its general visibility guarantees.
This means that write-behind, and more generally all CacheWriters, must not be used for any form of accounting, if that's the use case you had in mind.
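To make the "can be removed from the queue" point concrete, here is a minimal sketch of a write-behind queue that coalesces commands per key. It is not Ehcache's implementation: `WriteBehindSketch`, `flush()` and `sorCalls` are hypothetical names, `sorCalls` merely records what a CacheWriter would have been asked to do, and eviction is not modeled at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy write-behind queue with per-key coalescing; not Ehcache's implementation. */
public class WriteBehindSketch {
    // Pending command per key: a value means "write", empty means "delete".
    private final Map<String, Optional<String>> pending = new LinkedHashMap<>();
    // Stands in for the CacheWriter / SoR: records the calls that actually happen.
    final List<String> sorCalls = new ArrayList<>();

    public synchronized void put(String key, String value) {
        pending.put(key, Optional.of(value)); // enqueue, or replace a queued command for this key
    }

    public synchronized void remove(String key) {
        // A still-queued write for this key is discarded and replaced by a delete,
        // so the writer never sees the earlier put.
        pending.put(key, Optional.empty());
    }

    /** What a background writer thread would do when it drains the queue. */
    public synchronized void flush() {
        for (Map.Entry<String, Optional<String>> e : pending.entrySet()) {
            sorCalls.add(e.getValue()
                    .map(v -> "write " + e.getKey() + "=" + v)
                    .orElse("delete " + e.getKey()));
        }
        pending.clear();
    }
}
```

With `put("a", "1")`, then `remove("a")`, then `flush()`, the recorded calls contain only `delete a`: the write for `a` never reaches the SoR, which is exactly why counting write() calls is unreliable.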

Guava cache 'expireAfterWrite' does not seem to always work

private Cache<Long, Response> responseCache = CacheBuilder.newBuilder()
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .build();
I expect that Response objects that are not sent to the client within 10 minutes are expired and removed from the cache automatically, but I notice that Response objects are not always expired, even after 10, 15 or 20 minutes. They do expire while the cache is being populated in large numbers, but when the system turns idle, the remaining objects (something like the last 500 Response objects) are never removed.
Can someone help me understand this behavior? Thank you.
This is specified in the docs:
If expireAfterWrite or expireAfterAccess is requested entries may be evicted on each cache modification, on occasional cache accesses, or on calls to Cache.cleanUp(). Expired entries may be counted by Cache.size(), but will never be visible to read or write operations.
And there's more detail on the wiki:
Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort. Instead, it performs small amounts of maintenance during write operations, or during occasional read operations if writes are rare.
The reason for this is as follows: if we wanted to perform Cache maintenance continuously, we would need to create a thread, and its operations would be competing with user operations for shared locks. Additionally, some environments restrict the creation of threads, which would make CacheBuilder unusable in that environment.
Instead, we put the choice in your hands. If your cache is high-throughput, then you don't have to worry about performing cache maintenance to clean up expired entries and the like. If your cache does writes only rarely and you don't want cleanup to block cache reads, you may wish to create your own maintenance thread that calls Cache.cleanUp() at regular intervals.
If you want to schedule regular cache maintenance for a cache which only rarely has writes, just schedule the maintenance using
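The behaviour described in the docs can be reproduced with a small stand-in. The sketch below is a toy model, not Guava's implementation: `LazyExpiringCache`, `scheduleCleanUp` and the injectable nanosecond clock are hypothetical. It shows expired entries being hidden from reads yet still counted by `size()` until a write or an explicit `cleanUp()` removes them, plus a periodic cleanup scheduled with the JDK's `ScheduledExecutorService`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Toy model of lazy expireAfterWrite; not Guava's implementation. */
public class LazyExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long writtenAt;

        Entry(T value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long ttlNanos;
    private final LongSupplier nanoClock; // injectable so expiry can be tested deterministically

    public LazyExpiringCache(long ttl, TimeUnit unit, LongSupplier nanoClock) {
        this.ttlNanos = unit.toNanos(ttl);
        this.nanoClock = nanoClock;
    }

    private boolean isExpired(Entry<V> e, long now) {
        return now - e.writtenAt >= ttlNanos;
    }

    public synchronized void put(K key, V value) {
        cleanUp(); // maintenance piggybacks on writes
        map.put(key, new Entry<>(value, nanoClock.getAsLong()));
    }

    public synchronized V getIfPresent(K key) {
        Entry<V> e = map.get(key);
        // Expired entries are hidden from readers, but this toy never removes them on a read.
        return (e == null || isExpired(e, nanoClock.getAsLong())) ? null : e.value;
    }

    /** May still count expired entries, as the docs warn for Cache.size(). */
    public synchronized int size() {
        return map.size();
    }

    public synchronized void cleanUp() {
        long now = nanoClock.getAsLong();
        map.values().removeIf(e -> isExpired(e, now));
    }

    /** Periodic maintenance for a cache that is rarely written. */
    public static ScheduledFuture<?> scheduleCleanUp(ScheduledExecutorService scheduler,
                                                     Runnable cleanUp, long period, TimeUnit unit) {
        return scheduler.scheduleWithFixedDelay(cleanUp, period, period, unit);
    }
}
```

With a real Guava cache, the same scheduling call would take `responseCache::cleanUp` as the `Runnable`, since `Cache.cleanUp()` takes no arguments; the period (for example one minute) is an assumption to tune for your workload.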

Slow loading speed into the cache

I'm using Infinispan 6.0.0 in a 3-node setup (distributed caching with 2 replicas for each entry, no writes to a persistent store), and I'm simply reading a file line by line and storing each line's contents in the cache. The speed seems rather low to me (I can achieve more writes to the SSD, i.e. persistent storage, than into RAM with Infinispan), but there isn't any obvious bottleneck in the test code (I'm using buffered input streams, and their limits certainly aren't reached). At the moment I can write 100K entries every ~45 seconds, which isn't good enough. Here is a simplified code snippet:
while ((s = reader.readLine()) != null) {
    cache.put(s.substring(0, 2), s.substring(2, 5));
}
And the CacheManager is created as follows:
return new DefaultCacheManager(
.transport().addProperty("configurationFile", "jgroups.xml").build(),
new ConfigurationBuilder()
What could I possibly be doing wrong?
I'm not fully familiar with all the specifics of asynchronous mode, but I'm afraid that something in the two-phase commit (Prepare and Commit) might force a blocking RPC, i.e. waiting for network latency, which slows things down.
Do you need transactional behaviour? If not, switch transactions off. If you really need it, you may disable just the autocommit feature and load the cluster via non-transactional operations. Alternatively, you may try one-phase commits.
Another option could be mass loading via putAll (with tens or hundreds of entries per call, depending on your entry size), but the routing of this message is not very smart. In transactional mode it might behave a bit better, I guess.
A final option, if you just want to load the cluster quickly and then operate on it, could be transferring the bulk data to each node without Infinispan (using your own JGroups channel, or plain sockets) and loading all nodes with the CACHE_MODE_LOCAL flag.
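A batching loop for the putAll suggestion might look like the sketch below. It is written against `java.util.Map`, which an Infinispan `Cache` also implements (through `ConcurrentMap`), so the same method could be given the cache; `BatchLoader`, `loadInBatches` and the batch size are hypothetical choices, and the key/value extraction simply mirrors the question's loop. Whether it is actually faster depends on the message routing mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class BatchLoader {
    /**
     * Reads lines and hands them to target.putAll in batches of batchSize.
     * Key/value extraction mirrors the question's substring(0, 2) / substring(2, 5),
     * so lines shorter than 5 characters throw, just as in the original loop.
     * Returns the number of putAll calls made.
     */
    static int loadInBatches(BufferedReader reader, Map<String, String> target, int batchSize)
            throws IOException {
        Map<String, String> batch = new HashMap<>();
        int calls = 0;
        String s;
        while ((s = reader.readLine()) != null) {
            batch.put(s.substring(0, 2), s.substring(2, 5));
            if (batch.size() >= batchSize) {
                target.putAll(batch);  // one bulk operation instead of batchSize single puts
                batch = new HashMap<>(); // fresh map in case the target keeps a reference
                calls++;
            }
        }
        if (!batch.isEmpty()) {
            target.putAll(batch); // flush the final, partial batch
            calls++;
        }
        return calls;
    }
}
```

Because duplicate keys overwrite each other inside a batch, a batch only counts distinct keys toward `batchSize`.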
By default Infinispan follows the Map.put() contract of returning the previous value, so even though you are using the DIST_ASYNC cache mode you're still implicitly performing a synchronous cache.get() for every put.
You can avoid this in two ways:
configurationBuilder.unsafe().unreliableReturnValues(true) will suppress the remote lookup for all the operations on the cache.
cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES).put(k, v) will suppress the remote lookup for a single operation.

How does ehcache write-behind handle: shutdown, eviction b/c cache is full, eviction b/c TTL expired?

We're switching to EhCache's write-behind feature, but I can't tell from the documentation how these three cases are handled.
If I've put something in the cache via putWithWriter() and my cacheWriter hasn't been called yet, what happens if the element is evicted (due to space or due to TTL)? Is my cacheWriter automatically called with this item prior to eviction?
A similar question applies at program exit: if I call getCacheManager.shutdown(), are all of the unwritten items sent to my cache writer?

Best practice for buffer management in a socket application

I have a Windows IOCP app.
I understand that for an async I/O operation on the network, the buffer must remain valid for the duration of the send/read operation.
So for each connection I have one buffer for reading.
For sending, I use buffers to which I copy the data to be sent. When the send operation completes, I release the buffer so it can be reused.
So far this works fine and isn't a big issue.
What remains unclear to me is: how do you handle this?
Another thing is that even with this setup (multiple buffers), the receiving side can get flooded with data (speaking from experience).
Even setting SO_RCVBUF to 25 MB didn't help in my tests.
So what should I do? Have a to-be-sent queue?
I reference-count the per-connection (socket) and per-operation (buffer) structures. This works very well and deals with the lifetime issues perfectly. Each time an overlapped operation is posted, the reference count of the per-connection structure is incremented and a new buffer is allocated from the pool. When the operation completes, I process the results and release the references on the socket and the buffer. If this is the last reference, the structure is cleaned up (buffers go back to the pool, etc.).
You can see all of this in action in my free IOCP client/server framework which is available for download from here.
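The reference-counting scheme described in the answer can be modeled in a few lines. This is a language-neutral sketch in Java rather than Win32 code: `RefCountedIo`, `BufferPool`, `Connection`, `Operation` and the `closed` flag are hypothetical, and a real IOCP server would do the same bookkeeping on its per-socket and per-overlapped structures in C or C++.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedIo {
    /** Pool of fixed-size buffers; released buffers are reused. */
    static final class BufferPool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();
        private final int size;

        BufferPool(int size) { this.size = size; }

        synchronized ByteBuffer acquire() {
            ByteBuffer b = free.poll();
            return b != null ? b : ByteBuffer.allocate(size);
        }

        synchronized void release(ByteBuffer b) {
            b.clear();
            free.push(b);
        }

        synchronized int freeCount() { return free.size(); }
    }

    /** Per-connection state; cleaned up when the last reference is released. */
    static final class Connection {
        private final AtomicInteger refs = new AtomicInteger(1); // the owner's own reference
        volatile boolean closed;

        void addRef() { refs.incrementAndGet(); }

        void release() {
            if (refs.decrementAndGet() == 0) {
                closed = true; // a real server would close the socket and free its data here
            }
        }
    }

    /** Per-operation state: keeps its connection and buffer alive until completion. */
    static final class Operation {
        final Connection conn;
        final ByteBuffer buffer;
        private final BufferPool pool;

        Operation(Connection conn, BufferPool pool) {
            conn.addRef(); // posting an operation pins the connection
            this.conn = conn;
            this.pool = pool;
            this.buffer = pool.acquire();
        }

        /** Called from the completion handler once the result has been processed. */
        void complete() {
            pool.release(buffer); // buffer goes back to the pool
            conn.release();       // may be the last reference
        }
    }
}
```

The point of the design is that nobody has to reason about which completion arrives last: even if the owner drops its reference while a read and a send are still in flight, the connection is only torn down when the final operation completes.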
