Here is how I use Ehcache for multi-threaded reading and writing.
Writing code:
try {
targetCache.acquireWriteLockOnKey(key);
targetCache.putIfAbsent(new Element(key, value));
}
finally {
targetCache.releaseWriteLockOnKey(key);
}
Reading code:
try{
cache.acquireReadLockOnKey(key);
cacheCarId = (String)ele.getObjectValue();
}
finally {
cache.releaseReadLockOnKey(key);
}
The key and value are both Strings.
My config is as follows:
CacheConfiguration config = new CacheConfiguration();
config.name("carCache");
config.maxBytesLocalHeap(128, MemoryUnit.parseUnit("M"));
config.eternal(false);
config.timeToLiveSeconds(60);
config.setTimeToIdleSeconds(60);
SizeOfPolicyConfiguration sizeOfPolicyConfiguration = new SizeOfPolicyConfiguration();
sizeOfPolicyConfiguration.maxDepth(10000);
sizeOfPolicyConfiguration.maxDepthExceededBehavior("abort");
config.addSizeOfPolicy(sizeOfPolicyConfiguration);
Cache memoryOnlyCache = new Cache(config);
CacheManager.getInstance().addCache(memoryOnlyCache);
Values are evicted within 60 seconds and are written by multiple threads. The total number of keys is less than 25,000.
Reading and writing worked fine at the beginning, but after a couple of hours I get inconsistencies between reads and writes...
Could anybody help me with this problem? Thanks a lot.
A Cache is already a thread-safe data structure, so you should not need explicit locking as in your code.
Also the method Cache.putIfAbsent is already an atomic operation that guarantees that only one thread will succeed with the put.
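The atomicity of putIfAbsent can be illustrated without any explicit locks. The following is a minimal sketch that uses the JDK's ConcurrentHashMap as a stand-in for the Ehcache Cache (an assumption, made so that the example runs without Ehcache on the classpath). The names PutIfAbsentDemo and countWinners are hypothetical. Like Ehcache's Cache.putIfAbsent, the map's putIfAbsent returns null only to the caller whose put actually went through.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PutIfAbsentDemo {
    // Starts `threads` writers racing to insert the same key and returns how many
    // of them saw putIfAbsent return null, i.e. actually performed the insert.
    static int countWinners(int threads) {
        // Stand-in for the cache (assumption: not the real Ehcache API).
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final String value = "car-" + i;
            pool.submit(() -> {
                try {
                    start.await(); // line all writers up so they race
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // No explicit lock: the put itself is atomic.
                if (cache.putIfAbsent("key", value) == null) {
                    winners.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("winning writers: " + countWinners(8));
    }
}
```

However many writers race, only one of them should win the put; the others get the existing value back and can simply use it.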
Note that eviction and expiry are two different things. With your configuration, eviction happens when the cache size grows beyond 128MB, and expiry indeed happens after 60 seconds. However, Ehcache does expiry in-line, so it is triggered when you read or write the mapping.
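To make the idea of in-line expiry concrete, here is a minimal sketch of a map that checks the time-to-live only when an entry is touched. It is not Ehcache's implementation; the class LazyExpiryMap, its methods, and the injected clock are all hypothetical names chosen for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class LazyExpiryMap {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so the behavior can be tested without sleeping

    LazyExpiryMap(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized void put(String key, String value) {
        entries.put(key, new Entry(value, clock.getAsLong() + ttlMillis));
    }

    // Expiry is checked here, in-line with the read: no background thread removes entries.
    synchronized String get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key); // the expired mapping is only dropped now
            return null;
        }
        return entry.value;
    }

    // Number of entries physically held, including expired ones nobody has touched yet.
    synchronized int physicalSize() {
        return entries.size();
    }
}
```

In this sketch an expired entry keeps occupying memory until someone reads it, which is the practical consequence of in-line expiry: an expired mapping is not necessarily a removed mapping.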
As for your remark on inconsistency, you will need to describe in more detail what you mean by that.
I deployed an Apache Beam pipeline to GCP Dataflow in a DEV environment and everything worked well. Then I deployed it to production in a European environment (specifically, job region: europe-west1, worker location: europe-west1-d), where we get high data velocity, and things started to get complicated.
I am using a session window to group events into sessions. The session key is tenantId/visitorId and the gap is 30 minutes. I am also using a trigger that fires every 30 seconds so that events are released before the end of the session (they are written to BigQuery).
The problem appears to happen in EventToSession/GroupPairsByKey. In this step there are thousands of events under the droppedDueToLateness counter, and dataFreshness keeps increasing (it has been increasing since I deployed). All steps before this one operate well; all steps after it are affected by it but don't seem to have any other problems.
I looked into some metrics and saw that the EventToSession/GroupPairsByKey step is processing between 100K and 200K keys per second (depending on the time of day), which seems like quite a lot to me. CPU utilization doesn't go over 70% and I am using Streaming Engine. The number of workers is 2 most of the time. Max worker memory capacity is 32GB, while max worker memory usage currently stands at 23GB. I am using the e2-standard-8 machine type.
I don't have any hot keys, since each session contains at most a few dozen events.
My biggest suspicion is the huge number of keys being processed in the EventToSession/GroupPairsByKey step. On the other hand, a session usually relates to a single customer, so Google should expect to handle this many keys per second, no?
I would like suggestions on how to solve the dataFreshness and droppedDueToLateness issues.
Adding the piece of code that generates the sessions:
input = input.apply("SetEventTimestamp", WithTimestamps.of(event -> Instant.parse(getEventTimestamp(event)))
        .withAllowedTimestampSkew(new Duration(Long.MAX_VALUE)))
    .apply("SetKeyForRow", WithKeys.of(event -> getSessionKey(event))).setCoder(KvCoder.of(StringUtf8Coder.of(), input.getCoder()))
    .apply("CreatingWindow", Window.<KV<String, TableRow>>into(Sessions.withGapDuration(Duration.standardMinutes(30)))
        .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(30))))
        .discardingFiredPanes()
        .withAllowedLateness(Duration.standardDays(30)))
    .apply("GroupPairsByKey", GroupByKey.create())
    .apply("CreateCollectionOfValuesOnly", Values.create())
    .apply("FlattenTheValues", Flatten.iterables());
After doing some research I found the following:
Regarding the constantly increasing data freshness: as long as late data is allowed to arrive in a session window, that window persists in memory. Allowing 30 days of lateness therefore keeps every session in memory for at least 30 days, which can obviously overload the system. Moreover, I found that we had some everlasting sessions created by bots visiting and acting on the websites we monitor. These bots can hold sessions open forever, which can also overload the system. The solution was to decrease the allowed lateness to 2 days and use bounded sessions (look for "bounded sessions").
Regarding events dropped due to lateness: these are events that, at their time of arrival, belong to an expired window, that is, a window whose end the watermark has already passed (see the documentation for droppedDueToLateness). These events are dropped in the first GroupByKey after the session window function and can't be processed later. We didn't want to drop any late data, so the solution was to check each event's timestamp before it reaches the session part and to stream into the session part only events that won't be dropped, i.e. events that meet this condition: event_timestamp >= event_arrival_time - (gap_duration + allowed_lateness). The rest are written to BigQuery without the session data. (Apparently Apache Beam drops an event if its timestamp is before event_arrival_time - (gap_duration + allowed_lateness), even if there is a live session the event belongs to...)
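The routing condition above can be sketched as a small predicate. This is only an illustration under stated assumptions: it uses java.time rather than the Joda-Time types Beam works with, and the class LatenessFilter and method fitsSessionWindow are hypothetical names, not part of the pipeline.

```java
import java.time.Duration;
import java.time.Instant;

class LatenessFilter {
    // True when event_timestamp >= event_arrival_time - (gap_duration + allowed_lateness),
    // i.e. the event may be sent to the session part without being dropped.
    static boolean fitsSessionWindow(Instant eventTimestamp,
                                     Instant arrivalTime,
                                     Duration gapDuration,
                                     Duration allowedLateness) {
        Instant oldestAccepted = arrivalTime.minus(gapDuration.plus(allowedLateness));
        return !eventTimestamp.isBefore(oldestAccepted);
    }

    public static void main(String[] args) {
        Instant arrival = Instant.parse("2021-01-10T00:00:00Z");
        Duration gap = Duration.ofMinutes(30);
        Duration lateness = Duration.ofDays(2);

        // An event two days old is still within gap + allowed lateness.
        System.out.println(fitsSessionWindow(Instant.parse("2021-01-08T00:00:00Z"), arrival, gap, lateness));
        // An event older than two days and thirty minutes would go to the fallback output.
        System.out.println(fitsSessionWindow(Instant.parse("2021-01-07T23:00:00Z"), arrival, gap, lateness));
    }
}
```

Events for which the predicate is false would be written to BigQuery directly, without session data, as described above.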
P.S. In the bounded sessions part, where the author demonstrates how to implement a time-bounded session, I believe there is a bug that allows a session to grow beyond the provided max size. Once a session has exceeded the max size, one can send late data that intersects the session and precedes it, which moves the session's start time earlier and thereby expands the session. Furthermore, once a session has exceeded the max size, events that belong to it but do not extend it can no longer be added.
To fix that, I swapped the order of the current window span and the if-statement, and edited the if-statement (the one checking for the session max size) in the mergeWindows function in the window spanning part, so that a session can't exceed the max size and only accepts data that doesn't extend it beyond the max size. This is my implementation:
public void mergeWindows(MergeContext c) throws Exception {
List<IntervalWindow> sortedWindows = new ArrayList<>();
for (IntervalWindow window : c.windows()) {
sortedWindows.add(window);
}
Collections.sort(sortedWindows);
List<MergeCandidate> merges = new ArrayList<>();
MergeCandidate current = new MergeCandidate();
for (IntervalWindow window : sortedWindows) {
MergeCandidate next = new MergeCandidate(window);
if (current.intersects(window)) {
if ((current.union == null || new Duration(current.union.start(), window.end()).getMillis() <= maxSize.plus(gapDuration).getMillis())) {
current.add(window);
continue;
}
}
merges.add(current);
current = next;
}
merges.add(current);
for (MergeCandidate merge : merges) {
merge.apply(c);
}
}
Can anyone help me find the correct API to improve write performance?
We use the MultipleOutputs<ImmutableBytesWritable, Result> class to write data that we read from a table, and we use the newly created file as a backup. We face a performance issue when writing with MultipleOutputs: it takes nearly 5 seconds for every 10,000 records we write.
This is the code we use:
Result[] results = // result from another table
MultipleOutputs<ImmutableBytesWritable, Result> mos = new MultipleOutputs<ImmutableBytesWritable, Result>();
for (Result res : results) {
    mos.write(new ImmutableBytesWritable(res.getRow()), res, baseoutputpath);
}
We get a batch of 10,000 rows and write them in a loop, with baseoutputpath changing depending on the Result content.
We see a performance dip when writing to MultipleOutputs, and we suspect it might be due to writing in a loop.
Is there any other API in MapR-DB or HBase that pushes data to the database using fewer RPC calls by buffering up to a certain limit?
We write data as records, so no file system write class would work for us.
Please note that we use a MapReduce job to do all of the above.
I am trying to implement session management, where we store the JWT token in Redis. Now I want to remove the key if the object's idle time is more than 8 hours. Please help.
No good reason comes to mind for using OBJECT IDLETIME instead of the much simpler pattern of issuing a GET followed by an EXPIRE, apart from the very trivial memory requirements of key expiry.
Recommended way: GET and EXPIRE
1. GET the key you want.
2. Issue an EXPIRE <key> 28800.
Way using OBJECT IDLETIME, DEL and some application logic:
1. GET the key you want.
2. Call OBJECT IDLETIME <key>.
3. Check in your application code whether the idle time is greater than 8 hours.
4. If the condition in step 3 is met, issue a DEL command.
The second way is more cumbersome and adds network latency, since you need three round trips to your Redis server, while the first solution needs one round trip if you use a pipeline, or two round trips at worst, without any app server time.
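The recommended GET-then-EXPIRE pattern amounts to a sliding expiration: every read pushes the deadline another 8 hours out. Below is a minimal sketch of that behavior using an in-memory stand-in instead of a real Redis client, so that it runs on its own. The class SlidingSessionStore and its methods are hypothetical; only the GET and EXPIRE semantics they imitate come from Redis.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class SlidingSessionStore {
    static final long IDLE_LIMIT_SECONDS = 8 * 60 * 60; // 28800, as in EXPIRE <key> 28800

    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> deadlines = new HashMap<>();
    private final LongSupplier nowSeconds; // injected clock, so no real waiting is needed

    SlidingSessionStore(LongSupplier nowSeconds) {
        this.nowSeconds = nowSeconds;
    }

    // Imitates SET: stores the value and clears any previous timeout.
    void set(String key, String value) {
        values.put(key, value);
        deadlines.remove(key);
    }

    // Imitates GET: returns null when the key is missing or its timeout has passed.
    String get(String key) {
        Long deadline = deadlines.get(key);
        if (deadline != null && nowSeconds.getAsLong() >= deadline) {
            values.remove(key);
            deadlines.remove(key);
        }
        return values.get(key);
    }

    // Imitates EXPIRE: returns true when the key exists and the timeout was set.
    boolean expire(String key, long seconds) {
        if (get(key) == null) {
            return false;
        }
        deadlines.put(key, nowSeconds.getAsLong() + seconds);
        return true;
    }

    // The pattern itself: read the token, then restart its 8-hour countdown.
    String getAndRefresh(String key) {
        String token = get(key);
        if (token != null) {
            expire(key, IDLE_LIMIT_SECONDS);
        }
        return token;
    }
}
```

With a real client the two calls would be sent to the server, ideally in one pipeline; the application never has to compute idle time itself, because a key that nobody reads for 8 hours simply expires.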
This is what I did using Jedis. I am fetching 1000 records at a time. You can add a loop to fetch all records in batches.
Jedis jedis = new Jedis("addURLHere");
ScanParams scanParams = new ScanParams().count(1000);
ScanResult<String> scanResult = jedis.scan(ScanParams.SCAN_POINTER_START, scanParams);
List<String> result = scanResult.getResult();
result.stream().forEach((key) -> {
    if (jedis.objectIdletime(key) > 8 * 60 * 60) { // idle for more than 8 hours
        // your functionality here
    }
});
I have the following issues in our production environment (a web farm of 4 nodes behind a load balancer):
1) Timeout performing HGET key, inst: 3, queue: 29, qu=0, qs=29, qc=0, wr=0/0
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in ConnectionMultiplexer.cs:line 1699
This happens 3-10 times a minute.
2) No connection is available to service this operation: HGET key at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in ConnectionMultiplexer.cs:line 1666
I tried to implement it as Marc suggested (maybe I interpreted it incorrectly): it is better to have fewer connections to Redis than many.
I made the following implementation:
public class SeRedisConnection
{
private static ConnectionMultiplexer _redis;
private static readonly object SyncLock = new object();
public static IDatabase GetDatabase()
{
if (_redis == null || !_redis.IsConnected || !_redis.GetDatabase().IsConnected(default(RedisKey)))
{
lock (SyncLock)
{
try
{
var configurationOptions = new ConfigurationOptions
{
AbortOnConnectFail = false
};
configurationOptions.EndPoints.Add(new DnsEndPoint(ConfigurationHelper.CacheServerHost,
ConfigurationHelper.CacheServerHostPort));
_redis = ConnectionMultiplexer.Connect(configurationOptions);
}
catch (Exception ex)
{
IoC.Container.Resolve<IErrorLog>().Error(ex);
return null;
}
}
}
return _redis.GetDatabase();
}
public static void Dispose()
{
_redis.Dispose();
}
}
Dispose is actually not being used right now. Some specifics of my implementation could also be causing this behavior (I'm only using hashes):
1. Add, Remove hashes - async
2. Get - sync
Could somebody help me avoid this behavior?
Thanks a lot in advance!
SOLVED - Increasing the client connection timeout after evaluating network capabilities.
UPDATE 2: Actually, it didn't solve the problem. Once the cache volume started to grow, e.g. beyond 2GB, I saw the same pattern again: the timeouts happened about every 5 minutes.
Our sites froze for some time every 5 minutes, until the fork operation finished.
Then I found out that there is an option to fork (save to disk) every x seconds:
save 900 1
save 300 10
save 60 10000
In my case it was "save 300 10": save every 5 minutes if at least 10 updates have happened. I also found out that "fork" can be very expensive. Commenting out the "save" section resolved the problem completely. We can comment out the "save" section because we use Redis only as an in-memory cache; we don't need any persistence.
Here is the configuration of our cache servers: the "Redis 2.4.6" Windows port: https://github.com/rgl/redis/downloads
Maybe this has been solved in recent versions of the Redis Windows port from MSOpenTech: http://msopentech.com/blog/2013/04/22/redis-on-windows-stable-and-reliable/
but I haven't tested it yet.
Anyway, StackExchange.Redis has nothing to do with this issue and it works very stably in our production environment, thanks to Marc Gravell.
FINAL UPDATE:
Redis is a single-threaded solution. It is extremely fast, but when it comes to releasing memory (removing items that are stale or expired), problems emerge because one thread has to reclaim the memory (which is not a fast operation, whatever algorithm is used) and the same thread has to handle GET and SET operations. Of course, this happens in a medium-loaded production environment. Even if you use a cluster with slaves, you will see the same behavior once the memory limit is reached.
It looks like in most cases this exception is a client issue. Previous versions of StackExchange.Redis used Win32 sockets directly, which sometimes has a negative impact. ASP.NET internal routing is probably somehow related to it.
The good news is that StackExchange.Redis's network infrastructure was completely rewritten recently. The latest version is 2.0.513. Try it; there is a good chance that your problem will go away.
I have a Lucene (4.1) index of about 500M documents. I am trying to build a search interface on it, but I am running into some performance issues.
Initially, I show all the hits (paginated) by using a MatchAllDocsQuery. This search takes a long time (about 10 seconds). I think this is because of the collector I use, a TotalHitCountCollector, which tries to find the total number of hits.
I would like to be able to time-limit the query, so I found the TimeLimitingCollector. Unfortunately, the API docs are a bit sketchy. It uses a Counter that is not well documented.
Does anyone have experience using the TimeLimitingCollector in Lucene 4.x? And if so, are there approaches to get a guesstimate of the total number of hits?
I read https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/core/org/apache/lucene/search/TimeLimitingCollector.html and the example, but it is not clear how to set the Counter and how to use it in combination with numTicks.
Counter can either be thread safe or not - just use the static Counter.newCounter(boolean threadSafe) method to instantiate one that fits you.
Then, let's say we allow 10 ticks and we update ticks in a separate thread. Code should look like this:
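final Counter clock = Counter.newCounter(true);
// c is the collector being wrapped, e.g. a TotalHitCountCollector
TimeLimitingCollector collector = new TimeLimitingCollector(c, clock, 10);
collector.setBaseline(0);
Thread ticker = new Thread() {
    public void run() {
        try {
            while (true) {
                Thread.sleep(1000);
                clock.addAndGet(1); // one tick per second; will kill the indexSearcher.search(...) after 10 ticks (10 seconds)
            }
        } catch (InterruptedException e) {
            // interrupted: stop ticking
        }
    }
};
ticker.setDaemon(true); // do not keep the JVM alive just for the ticker
ticker.start();
indexSearcher.search(query, collector);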
I, however, find the above a bit cumbersome. Guava's TimeLimiter.callWithTimeout(...) looks much cleaner even though not native to Lucene.
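For comparison, the same "give up after a deadline" idea can be sketched with nothing but the JDK's ExecutorService, in the spirit of a time limiter. This is not Guava's or Lucene's API; the class TimeBoundedCall and the method callWithTimeout are hypothetical names. Note the assumption that the task reacts to interruption: cancelling the Future only interrupts the worker thread, so a long search that ignores interrupts may keep running in the background.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeBoundedCall {
    // Runs the task on a worker thread and returns its result, or `fallback`
    // when the task has not finished within timeoutMillis.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the task must honor the interrupt
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A search could be wrapped as a Callable that returns the hit count, with a fallback value such as -1 meaning "no count available in time". Unlike the TimeLimitingCollector, this variant does not return partial results gathered before the deadline.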