I need to delete a pool from the Redis cache. However, this pool might have millions of keys. I am using the following code to delete the keys from the cache:
String pattern = "*." + poolname + ".*"; // KEYS takes a glob pattern, not a regex
Set<String> rkeys = jedis.keys(pattern);
for (String key : rkeys) {
    LOGGER.info("key ===> " + key);
    jedis.del(key);
}
I am afraid that the Redis server might crash in case there are millions of keys.
Is there any way I can tell Redis to select only 100 keys at a time and delete them? Something like:
while (true) {
    // sleep for 1 minute
    // get 100 keys from cache
    if (keys.isEmpty()) {
        break;
    }
    jedis.del(keys);
}
Redis shouldn't ever crash, and I would test the scenario before making my code more complicated on a hunch. I just created a million keys and deleted them. It took 2 minutes, and the bottleneck was the Ruby client, not Redis.
That said, you may want to check out https://redis.io/commands/unlink, which is a newer, non-blocking version of DEL.
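If you do want to delete incrementally anyway, SCAN walks the keyspace in small batches without blocking the server the way KEYS does. A minimal sketch with Jedis (the Jedis 3.x cursor API and the COUNT of 100 are assumptions to tune; UNLINK needs Redis 4.0+):

import java.util.List;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public final class PoolCleaner {

    // Incrementally delete all keys matching the pool pattern:
    // SCAN one batch, UNLINK it, repeat until the cursor wraps back to "0".
    public static void deletePool(Jedis jedis, String poolname) {
        String pattern = "*." + poolname + ".*";
        ScanParams params = new ScanParams().match(pattern).count(100);
        String cursor = ScanParams.SCAN_POINTER_START; // "0"
        do {
            ScanResult<String> page = jedis.scan(cursor, params);
            List<String> keys = page.getResult();
            if (!keys.isEmpty()) {
                jedis.unlink(keys.toArray(new String[0])); // use del(...) on Redis < 4.0
            }
            cursor = page.getCursor();
        } while (!ScanParams.SCAN_POINTER_START.equals(cursor));
    }
}

SCAN may hand you the same key twice while the keyspace is changing, but deleting a key twice is harmless here.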
I have a Spring Boot 2.x project with a big table in my Cassandra database. In my Liquibase migration class, I need to replace a value in one column in all rows.
For me it's a big performance hit when I try to solve this with:
SELECT * FROM BOOKING
forEach Row
Update Row
because of the total number of rows, even when I select only one column.
Is it possible to do something like a "partwise/pagination" loop?
Pseudocode:
Take first 1000 rows
do Update
Take next 1000 rows
do Update
loop.
I'm also happy about any other solution approaches you have.
Must know:
Make sure there is a way to group the updates by partition. If you try a batch update on 1000 rows that are not in the same partition, the coordinator of the request will suffer: you are moving the load from your client to the coordinator, when you want to parallelize the writes instead. A batch update in Cassandra has nothing to do with batch updates in relational databases.
For fine-grained operations like this you want to go back to the drivers, with CassandraOperations and CqlSession, for maximum control.
There is a way to paginate with Spring Data Cassandra using Slice, but you do not have control over how the operations are implemented.
Spring Data Cassandra core
Slice<MyEntity> slice = myEntityRepo.findAll(CassandraPageRequest.first(size));
while (slice.hasNext() && currpage < page) {
    slice = myEntityRepo.findAll(slice.nextPageable());
    currpage++;
}
slice.getContent();
Drivers:
// Prepare statements to speed up the queries
PreparedStatement selectPS = session.prepare(QueryBuilder
        .selectFrom("myEntity").all()
        .build()
        .setPageSize(1000)                    // 1000 rows per page
        .setTimeout(Duration.ofSeconds(10))); // 10 s timeout
PreparedStatement updatePS = session.prepare(QueryBuilder
        .update("mytable")
        .setColumn("myColumn", QueryBuilder.bindMarker())
        .whereColumn("myPK").isEqualTo(QueryBuilder.bindMarker())
        .build()
        .setConsistencyLevel(ConsistencyLevel.ONE)); // fast writes

// Paginate: execute a bound statement, then re-bind with the paging state for the next page
ResultSet page1 = session.execute(selectPS.bind());
Iterator<Row> page1Iter = page1.iterator();
while (0 < page1.getAvailableWithoutFetching()) { // stop at the page boundary
    Row row = page1Iter.next();
    session.executeAsync(updatePS.bind(...));
}
ByteBuffer pagingStateAsBytes = page1.getExecutionInfo().getPagingState();
ResultSet page2 = session.execute(selectPS.bind().setPagingState(pagingStateAsBytes));
You could of course include this pagination in a loop and track progress.
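A sketch of that outer loop, assuming the driver 4.x statements prepared above (newValue, the bind order of updatePS and the type of myPK are illustrative assumptions):

// Walk every page, carrying the paging state forward (driver 4.x)
ByteBuffer pagingState = null;
do {
    BoundStatement bound = selectPS.bind();
    if (pagingState != null) {
        bound = bound.setPagingState(pagingState); // resume after the last page
    }
    ResultSet rs = session.execute(bound);
    int remaining = rs.getAvailableWithoutFetching();
    for (Row row : rs) {
        session.executeAsync(updatePS.bind(newValue, row.getString("myPK")));
        if (--remaining == 0) {
            break; // stop at the page boundary instead of auto-fetching the next page
        }
    }
    pagingState = rs.getExecutionInfo().getPagingState(); // null once the last page is reached
} while (pagingState != null);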
I have created an API using .NET Core 2.0. This API is connected to an Oracle database to retrieve the needed data. One of the functions takes too much time, so I decided to use caching in order to retrieve the data faster.
Function description: Get ranking
Caching period: Data should be renewed in cache memory each Monday
I am using IMemoryCache, but the problem is that the data is not being cached for multiple days; it lasts only for one hour. After that, the data is retrieved from the database again, which takes too much time (10 s). Below is my code:
var dateNow = DateTime.Now;
int diff = 7; // if today is Monday, add 7 days to get next Monday's date
if (dateNow.DayOfWeek != DayOfWeek.Monday)
{
    var daysToStartWeek = dateNow.DayOfWeek - DayOfWeek.Monday;
    diff = (7 - daysToStartWeek) % 7;
}
var nextMonday = dateNow.AddDays(diff).Date;

if (_cache.TryGetValue("GetRanking", out IEnumerable<GetRankingStruct> objRanking))
{
    return Ok(objRanking);
}
var dp = new DataProvider(Configuration);
var response = dp.GetRanking(userName, asAtDate);
// cache until next Monday exactly, rather than a whole number of days from now
_cache.Set("GetRanking", response, nextMonday - dateNow);
return Ok(response);
Could it be related to the token lifetime, since it's only 1 hour?
Firstly, have you tried checking whether your worker process is being restarted? You don't specify how you are hosting your application but, obviously, if the application (worker process) is restarted, your memory cache will be empty.
If your worker process is restarting, then you could load the cache on start-up.
Secondly, I believe that the implementation may choose to empty the cache due to inactivity or memory constraints. You can set the priority to never remove: https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.caching.memory.cacheitempriority?view=dotnet-plat-ext-3.1
I believe you can set this by passing a MemoryCacheOptions object to the constructor of the memory cache: https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.caching.memory.memorycache.-ctor?view=dotnet-plat-ext-3.1#Microsoft_Extensions_Caching_Memory_MemoryCache__ctor_Microsoft_Extensions_Options_IOptions_Microsoft_Extensions_Caching_Memory_MemoryCacheOptions__
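Per entry, that might look like this (a sketch against the question's code; nextMonday comes from the snippet above, and MemoryCacheEntryOptions is the per-entry counterpart of MemoryCacheOptions):

// Sketch: pin the entry against memory-pressure eviction and expire it
// exactly at the coming Monday
var options = new MemoryCacheEntryOptions()
    .SetPriority(CacheItemPriority.NeverRemove)
    .SetAbsoluteExpiration(new DateTimeOffset(nextMonday));

_cache.Set("GetRanking", response, options);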
Finally, I assume you've made your _cache object static so it is shared by all instances of your class (or made the controller, if that's what it is, a singleton).
These are my suggestions.
Good luck.
To my surprise I have discovered that indexing documents into H2's full-text search engine is comparatively slow, and I would like to speed it up.
I'm using the in-memory version of H2, which makes this case especially surprising.
Some benchmarks using 100k small documents (only title and some tags):
Using org.h2.fulltext.FullTextLucene.init it takes ~15s to index.
Using org.h2.fulltext.FullText.init makes no change.
The SQL inserting alone (i.e. full text indexing disabled) only takes 1s.
With Elasticsearch (using bulk indexing), I would expect the same amount to be processed and searchable within 3 s, even though there the index is also stored on disk.
Some additional info which might help:
Connection is reused.
No stop words are used (but that wouldn't make much difference in terms of document size).
EDIT_2: I added a big list of stop words (>100). This made it less than 10% faster (~15 s to ~14 s).
The SQL inserting alone (i.e. with full-text indexing disabled) only takes 1 s, so the problem should be with the full-text indexing itself.
The official tutorial and page about performance don't seem to offer a solution.
There doesn't seem to be a possibility for bulk indexing like in Elasticsearch.
EDIT_1: I also tried creating the SQL table and running the inserts FIRST (which takes 1 s) and only AFTER THAT creating the full-text index and running FullTextLucene.reindex(). But that makes the process even a bit slower.
If it's of any help, here's the code for how the index is created and the inserts are made:
Create index:
private void createTablesAndLuceneIndex() {
    try {
        final Statement statement = this.conn.createStatement();
        // FullTextLucene registers its helper functions under the FTL_ prefix
        // (the FT_ prefix belongs to the native FullText variant)
        statement.execute("CREATE ALIAS IF NOT EXISTS FTL_INIT FOR \"org.h2.fulltext.FullTextLucene.init\"");
        statement.execute("CALL FTL_INIT()");
        // FullTextLucene.setIgnoreList(this.conn, "to,this"); // Do we need stop words?
        FullTextLucene.setWhitespaceChars(this.conn, " ,.-");
        // Set up SQL table & Lucene index
        statement.execute("CREATE TABLE " + PNS_VIDEOS + "(ID INT PRIMARY KEY, TITLE VARCHAR, TAGS VARCHAR, ACTORS VARCHAR)");
        statement.execute("CALL FTL_CREATE_INDEX('PUBLIC', '" + PNS_VIDEOS + "', NULL)");
        // Close statement
        statement.close();
    } catch (final SQLException e) {
        throw new SqlTableCreationException(e); // todo logging?!
    }
}
Index document:
public void index(final PnsVideo pnsVideo) {
    try (PreparedStatement statement =
             this.conn.prepareStatement("INSERT INTO " + PNS_VIDEOS + " VALUES(?, ?, ?, ?)")) {
        statement.setInt(1, this.autoKey.getAndIncrement());
        statement.setString(2, pnsVideo.getTitle());
        statement.setString(3, Joiner.on(",").join(pnsVideo.getTags()));
        statement.setString(4, Joiner.on(",").join(pnsVideo.getActors()));
        statement.execute(); // the full-text trigger indexes the row as part of this insert
    } catch (final SQLException e) {
        throw new FTSearchIndexException(e); // todo logging?!
    }
}
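For completeness, here is the batched variant I could try (a sketch assuming a videos collection; JDBC batching with a single commit is standard, but whether it amortizes the per-row cost of the Lucene trigger is exactly what I would have to measure):

public void indexAll(final List<PnsVideo> videos) throws SQLException {
    this.conn.setAutoCommit(false); // one commit for the whole batch
    try (PreparedStatement statement =
             this.conn.prepareStatement("INSERT INTO " + PNS_VIDEOS + " VALUES(?, ?, ?, ?)")) {
        for (final PnsVideo pnsVideo : videos) {
            statement.setInt(1, this.autoKey.getAndIncrement());
            statement.setString(2, pnsVideo.getTitle());
            statement.setString(3, Joiner.on(",").join(pnsVideo.getTags()));
            statement.setString(4, Joiner.on(",").join(pnsVideo.getActors()));
            statement.addBatch();
        }
        statement.executeBatch();
        this.conn.commit();
    } finally {
        this.conn.setAutoCommit(true);
    }
}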
Thanks for any suggestions!
GORM works fine out of the box as long as there is no batch with more than 10,000 objects. Without optimisation you will face OutOfMemory problems.
The common solution is to flush() and clear() the session every n (e.g. n = 500) objects:
Session session = sessionFactory.currentSession
Transaction tx = session.beginTransaction()
def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

Date yesterday = new Date() - 1
Criteria c = session.createCriteria(Foo.class)
c.add(Restrictions.lt('lastUpdated', yesterday))
ScrollableResults rawObjects = c.scroll(ScrollMode.FORWARD_ONLY)

int count = 0
int batchSize = 500
while (rawObjects.next()) {
    def rawObject = rawObjects.get(0)
    fooService.doSomething()
    if (++count % batchSize == 0) {
        // flush a batch of updates and release memory:
        try {
            session.flush()
        } catch (Exception e) {
            log.error(session)
            log.error(" error: " + e.message)
            throw e
        }
        session.clear()
        propertyInstanceMap.get().clear()
    }
}
session.flush()
session.clear()
tx.commit()
But there are some problems I can't solve:
If I use currentSession, then the controller fails because the session is empty.
If I use sessionFactory.openSession(), then the currentSession is still used inside FooService. Of course I can use the session.save(object) notation, but this means that I have to modify fooService.doSomething() and duplicate code for the single operation (the common Grails notation, fooObject.save()) and the batch operation (the session.save(fooObject) notation).
If I use Foo.withSession{session->} or Foo.withNewSession{session->}, then the objects of the Foo class are cleared by session.clear() as expected. All the other objects are not cleared, which leads to a memory leak.
Of course I can use evict(object) to manually clear the session, but it is nearly impossible to get all relevant objects, due to auto-fetching of associations.
So I have no idea how to solve my problems without making FooService.doSomething() more complex. I'm looking for something like withSession{} for all domains, or a way to save the session at the beginning (Session tmp = currentSession) and do something like sessionFactory.setCurrentSession(tmp). Neither exists!
Any idea is welcome!
I would recommend using a StatelessSession for this kind of batch processing. See this post: Using StatelessSession for Batch processing.
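A minimal sketch of what that could look like here (assuming Hibernate's StatelessSession and the Foo/lastUpdated names from the question; a stateless session has no first-level cache, so there is nothing to flush() or clear()):

StatelessSession session = sessionFactory.openStatelessSession()
Transaction tx = session.beginTransaction()
try {
    ScrollableResults rawObjects = session.createQuery(
            'from Foo where lastUpdated < :yesterday')
        .setParameter('yesterday', new Date() - 1)
        .scroll(ScrollMode.FORWARD_ONLY)
    while (rawObjects.next()) {
        Foo foo = (Foo) rawObjects.get(0)
        // per-object work goes here; writes go straight to the database:
        session.update(foo)
    }
    tx.commit()
} finally {
    session.close()
}

Bear in mind that a StatelessSession bypasses cascades, interceptors and the caches, so this only works if fooService.doSomething() does not depend on them.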
A modified approach to what you are doing would be:
Loop over your entire collection (rawObjects) and save a list of all the ids for those objects.
Loop over the list of ids. At each iteration, look up just that single object, by its id.
Then use the same periodic clearing of the session cache that you are using now; a sketch follows this list.
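In Grails that could look roughly like this (the id projection and the doSomething(foo) signature are assumptions):

// Pass 1: collect only the ids, which are cheap to keep in memory
List ids = Foo.withCriteria {
    lt('lastUpdated', new Date() - 1)
    projections { property('id') }
}
// Pass 2: load, process and periodically clear, one object at a time
ids.eachWithIndex { id, i ->
    def foo = Foo.get(id)
    fooService.doSomething(foo) // assuming it can take the loaded object
    if ((i + 1) % 500 == 0) {
        sessionFactory.currentSession.flush()
        sessionFactory.currentSession.clear()
    }
}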
By the way, someone else has suggested an approach similar to yours. But note that the code in this link is incorrect; the lines that clear the session should be inside the if statement, just like you have in your solution.
I am interested in getting statistics on the Ehcache I have running.
I would like to see the number of hits/misses for a given key over a period of time, perhaps in the form of a map. For example, for the past hour (or however long it has been running):
Key A had 30 hits and 2 misses
Key B had 400 hits and 100 misses
Key C had 2 hits and 1 miss
Key D had 150 hits and 10 misses
I have looked through the documentation (SampledCacheStatistics, SampledCacheStatisticsImpl, SampledCacheStatisticsWrapper, etc) and I am having a terrible time figuring this out.
Has anyone else had experience implementing this?
Any help or ideas on this would be MUCH appreciated!
The EhCache Monitor gives you that type of information... http://ehcache.org/documentation/monitor.html
Programmatic access is available as follows:
CacheManager cacheManager = CacheManager.getInstance();
String[] cacheNames = cacheManager.getCacheNames();
for (int i = 0; i < cacheNames.length; i++) {
    String cacheName = cacheNames[i];
    System.out.println(cacheName + " - " + cacheManager.getCache(cacheName).getStatistics().toString());
}
You can't track misses on a per-key basis, because the statistics are stored on the objects IN the cache, and if there was a miss there would be no element in the cache to track it. But if you want a hit count for all the keys in a cache, you'd need to do something like:
public Map<Object, Long> getKeyHits(Ehcache cache)
{
    Map<Object, Long> hitMap = new HashMap<Object, Long>();
    Map<Object, Element> allElements = cache.getAll(cache.getKeys());
    for (Object key : allElements.keySet())
    {
        Element element = allElements.get(key);
        if (element != null) // a key can expire between getKeys() and getAll()
        {
            hitMap.put(key, element.getHitCount());
        }
    }
    return hitMap;
}
If you'd rather see statistics aggregated over an entire cache (or you want to track misses), you can call getStatistics() on the cache. See http://ehcache.org/apidocs/net/sf/ehcache/Ehcache.html.
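A small sketch of that aggregate route, assuming the classic Ehcache 2.x Statistics API:

import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Statistics;

public final class CacheStatsPrinter {

    // Cache-wide counters: hits and misses for the whole cache, not per key
    public static void print(Ehcache cache) {
        Statistics stats = cache.getStatistics();
        System.out.println(cache.getName()
                + ": hits=" + stats.getCacheHits()
                + ", misses=" + stats.getCacheMisses());
    }
}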