Ehcache & multi-threading: how to lock when inserting to the cache?

Let's suppose I have a multi-threaded application with 4 threads that share one (Eh)cache; the cache stores UserProfile objects to avoid fetching them from the database every time.
Now, let's say all 4 threads request the same UserProfile with ID=123 at the same moment, and it hasn't been cached yet. What has to be done is to query the database and insert the obtained UserProfile object into the cache so it can be reused later.
However, what I want to achieve is that only one of these threads (the first one) queries the database and updates the cache, while the other 3 wait (queue) for it to finish... and then get the UserProfile object with ID=123 directly from the cache.
How do you usually implement such a scenario? Using Ehcache's locking/transactions? Or rather through something like this? (pseudo-code)
public UserProfile getUserProfile(int id) {
    UserProfile result = ehcache.get(id);
    if (result == null) { // not cached yet
        synchronized (this) { // queue the threads on a shared monitor
            result = ehcache.get(id); // re-check after acquiring the lock
            if (result == null) { // is the current thread the first one?
                result = database.fetchUserProfile(id);
                ehcache.put(id, result);
            }
        }
    }
    return result;
}

This is called a thundering herd problem (also known as a cache stampede).
Locking works, but it's not really efficient, because the lock is broader than it needs to be: you could lock on a single ID instead.
You can do 2 things. One is to use a CacheLoaderWriter. It will load the missing entry and take the lock at the right granularity. This is the easiest solution, even though you have to implement a loader-writer.
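A minimal sketch of the loader-writer approach, assuming a recent Ehcache 3.x (where the bulk methods of CacheLoaderWriter have default implementations) and the same database object as in the question; the Database type name is hypothetical:

import org.ehcache.spi.loaderwriter.CacheLoaderWriter;

public class UserProfileLoaderWriter implements CacheLoaderWriter<Integer, UserProfile> {

    private final Database database; // hypothetical DAO, standing in for the question's database object

    public UserProfileLoaderWriter(Database database) {
        this.database = database;
    }

    @Override
    public UserProfile load(Integer id) {
        // Invoked by Ehcache on a cache miss; Ehcache locks the key during the
        // load, so only one thread queries the database for a given id.
        return database.fetchUserProfile(id);
    }

    @Override
    public void write(Integer id, UserProfile profile) {
        // No-op: profiles are only read through this cache.
    }

    @Override
    public void delete(Integer id) {
        // No-op for a read-only cache.
    }
}

Once the loader-writer is registered on the cache configuration (withLoaderWriter(...)), a plain ehcache.get(id) replaces the whole get-check-fetch-put dance.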
The alternative is more involved. You need some kind of row-locking algorithm. For example, you could do something like this:
private final ReentrantLock[] locks = new ReentrantLock[1024];
{
    for (int i = 0; i < locks.length; i++) {
        locks[i] = new ReentrantLock();
    }
}
public UserProfile getUserProfile(int id) {
    UserProfile result = ehcache.get(id);
    if (result == null) { // not cached yet
        ReentrantLock lock = locks[id % locks.length];
        lock.lock();
        try {
            result = ehcache.get(id); // re-check under the per-stripe lock
            if (result == null) { // is the current thread the first one?
                result = database.fetchUserProfile(id);
                ehcache.put(id, result);
            }
        } finally {
            lock.unlock();
        }
    }
    return result;
}
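As an aside, this hand-rolled striping is essentially what Guava's Striped utility gives you out of the box; a sketch, assuming Guava is on the classpath:

import java.util.concurrent.locks.Lock;
import com.google.common.util.concurrent.Striped;

private final Striped<Lock> stripedLocks = Striped.lock(1024); // 1024 stripes

public UserProfile getUserProfile(int id) {
    UserProfile result = ehcache.get(id);
    if (result == null) {
        Lock lock = stripedLocks.get(id); // equal ids always map to the same stripe
        lock.lock();
        try {
            result = ehcache.get(id); // re-check under the lock
            if (result == null) {
                result = database.fetchUserProfile(id);
                ehcache.put(id, result);
            }
        } finally {
            lock.unlock();
        }
    }
    return result;
}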

Use a plain Java object lock:
private static final Object LOCK = new Object();

synchronized (LOCK) {
    result = ehcache.get(id);
    if (result == null || ehcache.isExpired()) {
        // cache entry is expired or missing, so go to the DB
        result = database.fetchUserProfile(id);
        ehcache.put(id, result);
    }
}

Related

Hibernate saveAndFlush() takes a long time for 10K By-Row Inserts

I am a Hibernate novice. I have the following code which persists a large number (say 10K) of rows from a List<String>:
@Override
@Transactional(readOnly = false)
public void createParticipantsAccounts(long studyId, List<String> subjectIds) throws Exception {
StudyT study = studyDAO.getStudyByStudyId(studyId);
Authentication auth = SecurityContextHolder.getContext().getAuthentication();
for(String subjectId: subjectIds) { // LOOP with saveAndFlush() for each
// ...
user.setRoleTypeId(4);
user.setActiveFlag("Y");
user.setCreatedBy(auth.getPrincipal().toString().toLowerCase());
user.setCreatedDate(new Date());
List<StudyParticipantsT> participants = new ArrayList<StudyParticipantsT>();
StudyParticipantsT sp = new StudyParticipantsT();
sp.setStudyT(study);
sp.setUsersT(user);
sp.setSubjectId(subjectId);
sp.setLocked("N");
sp.setCreatedBy(auth.getPrincipal().toString().toLowerCase());
sp.setCreatedDate(new Date());
participants.add(sp);
user.setStudyParticipantsTs(participants);
userDAO.saveAndFlush(user);
}
}
But this operation takes too long: about 5-10 minutes for 10K rows. What is the proper way to improve this? Do I really need to rewrite the whole thing as a batch insert, or is there something simple I can tweak?
NOTE: I also tried userDAO.save() without the flush, with a single userDAO.flush() after the for-loop. It didn't help; the performance was just as bad.
We solved it. Batch inserts are done with saveAll(): define a batch size (say 1,000), collect entities into a list, call saveAll() whenever the list is full, then clear it; an edge condition at the end of the loop handles the final partial batch. This dramatically sped up all the inserts.
int batchSize = 1000;
// List for batch inserts
List<UsersT> batchInsertUsers = new ArrayList<UsersT>();
for (int i = 0; i < subjectIds.size(); i++) {
    String subjectId = subjectIds.get(i);
    UsersT user = new UsersT();
    // Fill out the object here...
    // ...
    // Add to the batch list; when the list reaches the batch size, or we're
    // at the last subjectId, batch-insert with saveAll() and clear the list
    batchInsertUsers.add(user);
    if (batchInsertUsers.size() == batchSize || i == subjectIds.size() - 1) {
        userDAO.saveAll(batchInsertUsers);
        // Reset the list
        batchInsertUsers.clear();
    }
}
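One caveat worth adding: saveAll() cuts down on flush and transaction overhead, but Hibernate still issues one INSERT per entity unless JDBC batching is also switched on. A sketch of the relevant settings, assuming plain Hibernate configuration (with Spring Boot the same keys go under spring.jpa.properties.*):

import java.util.Properties;

// Enable JDBC-level batching so Hibernate groups the INSERTs into real batches
Properties props = new Properties();
props.put("hibernate.jdbc.batch_size", "1000"); // align with the application-side batch size
props.put("hibernate.order_inserts", "true");   // group inserts by entity type

Without hibernate.jdbc.batch_size, the batched saveAll() mainly saves on per-call flush overhead rather than on round-trips to the database.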

Running async function in parallel using LINQ's AsParallel()

I have a Document DB repository class that has one get method like below:
private static DocumentClient client;
public async Task<TEntity> Get(string id, string partitionKey = null)
{
try
{
RequestOptions requestOptions = null;
if (partitionKey != null)
{
requestOptions = new RequestOptions { PartitionKey = new PartitionKey(partitionKey) };
}
var result = await client.ReadDocumentAsync(
UriFactory.CreateDocumentUri(DatabaseId, CollectionId, id),
requestOptions);
return (TEntity)(dynamic)result.Resource;
}
catch (DocumentClientException e)
{
// Have logic for different exceptions actually
throw;
}
}
I have two collections - Collection1 and Collection2. Collection1 is non-partitioned whereas Collection2 is partitioned.
On the client side, I create two repository objects, one for each collection.
private static DocumentDBRepository<Collection1Item> collection1Repository = new DocumentDBRepository<Collection1Item>("Collection1");
private static DocumentDBRepository<Collection2Item> collection2Repository = new DocumentDBRepository<Collection2Item>("Collection2");
List<Collection1Item> collection1Items = await collection1Repository.GetItemsFromCollection1(); // Selects first forty documents based on time
List<UIItem> uiItems = new List<UIItem>();
foreach (var item in collection1Items)
{
var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId); // TargetId is my partition key for Collection2
uiItems.Add(new UIItem
{
ItemId = item.ItemId,
Collection1Reference = item.Id,
TargetId = item.TargetId,
Collection2Reference = item.Collection2Reference,
Value = collection2Item.Value
});
}
This works fine. But since it is happening sequentially with foreach, I wanted to do those Get calls in parallel. When I do it in parallel as below:
ConcurrentBag<UIItem> uiItems = new ConcurrentBag<UIItem>();
collection1Items.AsParallel().ForAll(async item => {
var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId); // TargetId is my partition key for Collection2
uiItems.Add(new UIItem
{
ItemId = item.ItemId,
Collection1Reference = item.Id,
TargetId = item.TargetId,
Collection2Reference = item.Collection2Reference,
Value = collection2Item.Value
});
}
);
It doesn't work and uiItems is always empty.
You don't need PLINQ (or Parallel.For) to run async operations concurrently. If they are truly asynchronous, they already run concurrently.
You could collect the task returned from each operation and simply call await Task.WhenAll() on all the tasks. If you modify your lambda to create and return a UIItem, the result of await Task.WhenAll() will be a collection of UIItems. No need to modify global state from inside the concurrent operations.
For example:
var itemTasks = collection1Items.Select(async item =>
{
    var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId);
    return new UIItem
    {
        ItemId = item.ItemId,
        Collection1Reference = item.Id,
        TargetId = item.TargetId,
        Collection2Reference = item.Collection2Reference,
        Value = collection2Item.Value
    };
});
var results = await Task.WhenAll(itemTasks);
A word of caution though - this will fire all Get operations concurrently. That may not be what you want, especially when calling a service with rate limiting.
Try simply starting all the tasks and awaiting them together at the end; that gives you concurrent execution.
var tasks = collection1Items.Select(async item =>
{
//var collection2Item = await storageRepository.Get...
return new UIItem
{
//...
};
});
var uiItems = await Task.WhenAll(tasks);
PLINQ is useful when working with in-memory constructs, where it uses as many threads as possible; but if you mix it with async-await (which is about releasing threads while waiting on external resources), you can end up with strange results.
I would like to share a solution for an issue I saw in some comments.
If you're worried about rate limits and want to throttle the concurrency yourself, you can do something like this using SemaphoreSlim.
var nbCores = Environment.ProcessorCount;
var semaphore = new SemaphoreSlim(nbCores, nbCores);
var processTasks = items.Select(async x =>
{
await semaphore.WaitAsync();
try
{
await ProcessAsync();
}
finally
{
semaphore.Release();
}
});
await Task.WhenAll(processTasks);
In this example, ProcessAsync is called concurrently, but limited to {processor count} concurrent operations.
Hope this helps someone.
NB: You can of course set the nbCores variable to whatever value satisfies your constraints.
NB 2: This example fits some use cases, not all of them; with a big load of tasks, I would highly suggest looking into TPL programming.

Spring Data Neo4j Ridiculously Slow Over Rest

public List<Errand> interestFeed(Person person, int skip, int limit)
throws ControllerException {
person = validatePerson(person);
String query = String
.format("START n=node:ErrandLocation('withinDistance:[%.2f, %.2f, %.2f]') RETURN n ORDER BY n.added DESC SKIP %s LIMIT %S",
person.getLongitude(), person.getLatitude(),
person.getWidth(), skip, limit);
String queryFast = String
.format("START n=node:ErrandLocation('withinDistance:[%.2f, %.2f, %.2f]') RETURN n SKIP %s LIMIT %S",
person.getLongitude(), person.getLatitude(),
person.getWidth(), skip, limit);
Set<Errand> errands = new TreeSet<Errand>();
System.out.println(queryFast);
Result<Map<String, Object>> results = template.query(queryFast, null);
Iterator<Errand> objects = results.to(Errand.class).iterator();
return copyIterator (objects);
}
public List<Errand> copyIterator(Iterator<Errand> iter) {
Long start = System.currentTimeMillis();
Double startD = start.doubleValue();
List<Errand> copy = new ArrayList<Errand>();
while (iter.hasNext()) {
Errand e = iter.next();
copy.add(e);
System.out.println(e.getType());
}
Long end = System.currentTimeMillis();
Double endD = end.doubleValue();
p ((endD - startD)/1000);
return copy;
}
When I profile the copyIterator function, it takes about 6 seconds to fetch just 10 results. I use Spring Data Neo4j REST to connect to a Neo4j server running on my local machine. I even put a print statement in to see how fast the iterator is converted to a list, and it does appear slow. Does each iterator.next() make a new HTTP call?
If Errand is a node entity then yes, Spring Data Neo4j will make an HTTP call for each entity to fetch all of its labels (a limitation of Neo4j, which doesn't return labels when you return a whole node from Cypher).
You can enable debug-level logging on org.springframework.data.neo4j.rest.SpringRestCypherQueryEngine to log all Cypher statements going to Neo4j.
To avoid these calls, use @QueryResult: http://docs.spring.io/spring-data/data-neo4j/docs/current/reference/html/#reference_programming-model_mapresult
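A minimal sketch of the @QueryResult approach, assuming Spring Data Neo4j 3.x; the projection interface, column aliases, and repository method below are illustrative, not from the original post:

import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.annotation.QueryResult;
import org.springframework.data.neo4j.annotation.ResultColumn;
import org.springframework.data.neo4j.repository.GraphRepository;

@QueryResult
public interface ErrandView {
    @ResultColumn("type")  // alias produced by the RETURN clause below
    String getType();

    @ResultColumn("added")
    Long getAdded();
}

// Returning scalar columns instead of whole nodes avoids the per-entity
// label lookup that makes iteration so slow over REST.
public interface ErrandRepository extends GraphRepository<Errand> {
    @Query("START n=node:ErrandLocation({0}) RETURN n.type AS type, n.added AS added SKIP {1} LIMIT {2}")
    Iterable<ErrandView> findNearby(String withinDistance, int skip, int limit);
}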

hbase InternalScanner and filter in coprocessor

All:
Recently, I wrote a coprocessor in HBase (0.94.17): a class extending BaseEndpointCoprocessor with a rowCount method to count one table's rows.
And I ran into a problem.
If I do not set a filter on the scan, my code works fine for two tables. One table has 1,000,000 rows, the other 160,000,000 rows; it takes about 2 minutes to count the bigger table.
However, if I set a filter on the scan, it only works on the small table. On the bigger table it throws an exception:
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1#2c88652b, java.io.IOException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
Trust me, I have checked my code over and over again.
So, to count the table with a filter, I had to write the following clumsy code: first, I do not set the filter on the scan; then, after getting each row record, I call my own method to filter it.
And it works on both tables.
But I do not know why.
I tried to read the scanner source code in HRegion.java, but I did not get it.
So, if you know the answer, please help me. Thank you.
@Override
public long rowCount(Configuration conf) throws IOException {
// TODO Auto-generated method stub
Scan scan = new Scan();
parseConfiguration(conf);
Filter filter = null;
if (this.mFilterString != null && !mFilterString.equals("")) {
ParseFilter parse = new ParseFilter();
filter = parse.parseFilterString(mFilterString);
// scan.setFilter(filter);
}
scan.setCaching(this.mScanCaching);
InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment()).getRegion().getScanner(scan);
long sum = 0;
try {
List<KeyValue> curVals = new ArrayList<KeyValue>();
boolean hasMore = false;
do {
curVals.clear();
hasMore = scanner.next(curVals);
if (filter != null) {
filter.reset();
if (HbaseUtil.filterOneResult(curVals, filter)) {
continue;
}
}
sum++;
} while (hasMore);
} finally {
scanner.close();
}
return sum;
}
The following is my HBase util code:
public static boolean filterOneResult(List<KeyValue> kvList, Filter filter) {
if (kvList.size() == 0)
return true;
KeyValue kv = kvList.get(0);
if (filter.filterRowKey(kv.getBuffer(), kv.getRowOffset(), kv.getRowLength())) {
return true;
}
for (KeyValue kv2 : kvList) {
if (filter.filterKeyValue(kv2) == Filter.ReturnCode.NEXT_ROW) {
return true;
}
}
filter.filterRow(kvList);
return filter.filterRow();
}
OK, it was my mistake. After I used jdb to debug my code, I got the following exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
It is obvious: my result list was empty.
hasMore = scanner.next(curVals);
This means that when I use a filter in the scan, the curVals list can come back empty while hasMore is still true.
I had thought that if a record was filtered out, the scanner would jump to the next row and the list would never be empty. I was wrong.
And my client did not print any remote error message on the console; it just caught this remote exception and retried.
After retrying 10 times, it printed another exception, which was meaningless.
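For completeness, a minimal sketch of the counting loop with an explicit guard for empty batches, assuming the filter is set on the scan itself via scan.setFilter(filter); the guard is the only change from the loop in the question:

do {
    curVals.clear();
    hasMore = scanner.next(curVals);
    if (curVals.isEmpty()) {
        // The server-side filter consumed this row: the batch is empty
        // even though hasMore is still true, so just skip it.
        continue;
    }
    sum++;
} while (hasMore);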

X++ Coming Out Of QueryRun In Fetch Method

I can't seem to find the resolution for this. I have modified the fetch method in a report so that if the queryRun has changed and a new ID is fetched, the while loop starts over, a new page appears, and two elements are executed. That part works fine; the next part does not. Within each ID there are several records, which I process with element.execute() and element.send(). What happens is: the first ID is selected, the body element of the report is executed, and the element is sent as expected, but the while loop never moves on to the next ID.
Here is the code:
public boolean fetch()
{
APMPriorityId oldVanId, newVanId;
LogisticsControlTable lLogisticsControlTable;
int64 cnt, counter;
;
queryRun = new QueryRun(this);
if (!queryRun.prompt() || !element.prompt())
{
return false;
}
while (queryRun.next())
{
if (queryRun.changed(tableNum(LogisticsControlTable)))
{
lLogisticsControlTable = queryRun.get(tableNum(LogisticsControlTable));
if (lLogisticsControlTable)
{
info(lLogisticsControlTable.APMPriorityId);
cnt = 0;
oldVanId = newVanId;
newVanId = lLogisticsControlTable.APMPriorityId;
if(newVanId)
{
element.newPage();
element.execute(1);
element.execute(2);
}
}
if (lLogisticsControlTable.APMPriorityId)
select count(recId) from lLogisticsControlTable where lLogisticsControlTable.APMPriorityId == newVanId;
counter = lLogisticsControlTable.RecId;
while select lLogisticsControlTable where lLogisticsControlTable.APMPriorityId == newVanId
{
cnt++;
if(lLogisticsControlTable.APMPriorityId == newVanId && cnt <= counter)
{
element.execute(3);
element.send(lLogisticsControlTable);
}
}
}
}
return true;
}
You are using lLogisticsControlTable as the target of both queryRun.get() and a while select. These two uses interfere: there are two SQL cursors to control, but only one record buffer.
Use two different record variables, e.g. a second LogisticsControlTable buffer for the select count and the while select.
