I am trying to implement session management, where we store jwt token to redis. Now I want remove the key if the object idle time is more than 8 hours. Pls help
There is no good reason that comes to my mind for using IDLETIME instead of using the much simpler pattern of issuing a GET followed by an EXPIRE apart from very trivial memory requirements for key expiry.
Recommended Way: GET and EXPIRE
GET the key you want.
Issue an EXPIRE <key> 28800.
Way using OBJECT IDLETIME, DEL and some application logic:
GET the key you want.
Call OBJECT IDLETIME <key>.
Check in your application code if the idletime > 8h.
If condition 3 is met, then issue a DEL command.
The second way is more cumbersome and introduces network latency since you need three round trips to your redis server while the first solution just does it in one round trip if you use a pipeline or two round trips without any app server time at worst.
This is what I did using Jedis. I am fetching 1000 records at a time. You can add a loop to fetch all records in a batch.
Jedis jedis = new Jedis("addURLHere");
ScanParams scanParams = new ScanParams().count(1000);
ScanResult<String> scanResult = jedis.scan(ScanParams.SCAN_POINTER_START, scanParams);
List<String> result = scanResult.getResult();
result.stream().forEach((key) -> {
if (jedis.objectIdletime(key) > 8 * 60 * 60) { // more than 5 days
//your functionality here
}
});`
Related
I deployed an apache beam pipeline to GCP dataflow in a DEV environment and everything worked well. Then I deployed it to production in Europe environment (to be specific - job region:europe-west1, worker location:europe-west1-d) where we get high data velocity and things started to get complicated.
I am using a session window to group events into sessions. The session key is the tenantId/visitorId and its gap is 30 minutes. I am also using a trigger to emit events every 30 seconds to release events sooner than the end of session (writing them to BigQuery).
The problem appears to happen in the EventToSession/GroupPairsByKey. In this step there are thousands of events under the droppedDueToLateness counter and the dataFreshness keeps increasing (increasing since when I deployed it). All steps before this one operates good and all steps after are affected by it, but doesn't seem to have any other problems.
I looked into some metrics and see that the EventToSession/GroupPairsByKey step is processing between 100K keys to 200K keys per second (depends on time of day), which seems quite a lot to me. The cpu utilization doesn't go over the 70% and I am using streaming engine. Number of workers most of the time is 2. Max worker memory capacity is 32GB while the max worker memory usage currently stands on 23GB. I am using e2-standard-8 machine type.
I don't have any hot keys since each session contains at most a few dozen events.
My biggest suspicious is the huge amount of keys being processed in the EventToSession/GroupPairsByKey step. But on the other, session is usually related to a single customer so google should expect handle this amount of keys to handle per second, no?
Would like to get suggestions how to solve the dataFreshness and events droppedDueToLateness issues.
Adding the piece of code that generates the sessions:
input = input.apply("SetEventTimestamp", WithTimestamps.of(event -> Instant.parse(getEventTimestamp(event))
.withAllowedTimestampSkew(new Duration(Long.MAX_VALUE)))
.apply("SetKeyForRow", WithKeys.of(event -> getSessionKey(event))).setCoder(KvCoder.of(StringUtf8Coder.of(), input.getCoder()))
.apply("CreatingWindow", Window.<KV<String, TableRow>>into(Sessions.withGapDuration(Duration.standardMinutes(30)))
.triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(30))))
.discardingFiredPanes()
.withAllowedLateness(Duration.standardDays(30)))
.apply("GroupPairsByKey", GroupByKey.create())
.apply("CreateCollectionOfValuesOnly", Values.create())
.apply("FlattenTheValues", Flatten.iterables());
After doing some research I found the following:
regarding constantly increasing data freshness: as long as allowing late data to arrive a session window, that specific window will persist in memory. This means that allowing 30 days late data will keep every session for at least 30 days in memory, which obviously can over load the system. Moreover, I found we had some ever-lasting sessions by bots visiting and taking actions in websites we are monitoring. These bots can hold sessions forever which also can over load the system. The solution was decreasing allowed lateness to 2 days and use bounded sessions (look for "bounded sessions").
regarding events dropped due to lateness: these are events that on time of arrival they belong to an expired window, such window that the watermark has passed it's end (See documentation for the droppedDueToLateness here). These events are being dropped in the first GroupByKey after the session window function and can't be processed later. We didn't want to drop any late data so the solution was to check each event's timestamp before it is going to the sessions part and stream to the session part only events that won't be dropped - events that meet this condition: event_timestamp >= event_arrival_time - (gap_duration + allowed_lateness). The rest will be written to BigQuery without the session data (Apparently apache beam drops an event if the event's timestamp is before event_arrival_time - (gap_duration + allowed_lateness) even if there is a live session this event belongs to...)
p.s - in the bounded sessions part where he demonstrates how to implement a time bounded session I believe he has a bug allowing a session to grow beyond the provided max size. Once a session exceeded the max size, one can send late data that intersects this session and is prior to the session, to make the start time of the session earlier and by that expanding the session. Furthermore, once a session exceeded max size it can't be added events that belong to it but don't extend it.
In order to fix that I switched the order of the current window span and if-statement and edited the if-statement (the one checking for session max size) in the mergeWindows function in the window spanning part, so a session can't pass the max size and can only be added data that doesn't extend it beyond the max size. This is my implementation:
public void mergeWindows(MergeContext c) throws Exception {
List<IntervalWindow> sortedWindows = new ArrayList<>();
for (IntervalWindow window : c.windows()) {
sortedWindows.add(window);
}
Collections.sort(sortedWindows);
List<MergeCandidate> merges = new ArrayList<>();
MergeCandidate current = new MergeCandidate();
for (IntervalWindow window : sortedWindows) {
MergeCandidate next = new MergeCandidate(window);
if (current.intersects(window)) {
if ((current.union == null || new Duration(current.union.start(), window.end()).getMillis() <= maxSize.plus(gapDuration).getMillis())) {
current.add(window);
continue;
}
}
merges.add(current);
current = next;
}
merges.add(current);
for (MergeCandidate merge : merges) {
merge.apply(c);
}
}
I Am going to build a system for flash sale which will share the same Redis instance and will run on 15 servers at a time.
So the algorithm of Flash sale will be.
Set Max inventory for any product id in Redis
using redisTemplate.opsForValue().set(key, 400L);
for every request :
get current inventory using Long val = redisTemplate.opsForValue().get(key);
check if it is non zero
if (val == null || val == 0) {
System.out.println("not taking order....");
}
else{
put order in kafka
and decrement using redisTemplate.opsForValue().decrement(key)
}
But the problem here is concurrency :
If I set inventory 400 and test it with 500 request thread,
Inventory becomes negative,
If I make function synchronized I cannot manage it in distributed servers.
So what will be the best approach to it?
Note: I can not go for RDBMS and set isolation level because of high request count.
Redis is monothreaded, so running a Lua Script on it is always atomic.
You can define then a Lua script on your Redis instance and running it from your Spring instances.
Your Lua script would just be a sequence of operations to execute against your redis instance (the only one to have the correct value of your stock) and returns the new value for instance or an error if the value is negative.
Your Lua script is basically a Redis transaction, there are other methods to achieve Redis transaction but IMHO Lua is the simplest above all (maybe the least performant, but I have found that in most cases it is fast enough).
Recently, I'm using Redis to cache token for OpenStack Keystone. The function is fine, but some expired cache data still in Redis.
my Keystone config:
[cache]
enabled=true
backend=dogpile.cache.redis
backend_argument=url:redis://127.0.0.1:6379
[token]
provider = uuid
caching=true
cache_time= 3600
driver = kvs
expiration = 3600
but some expired data in Redis:
Data was over expiration time, but still in here, because the TTL is -1.
My question:
How can I change settings to stop this rubbish data created?
Is some gracefully way to clean it up?
I was trying to use command 'keystone-manage token_flush', but after reading code, I realized this command just clean up the expired tokens in Mysql
I hope this question still relevant.
I'm trying to do the same thing as you are, and for now the only option I found working is the argument on dogpile.cache.redis: redis_expiration_time.
Checkout the backend dogpile.redis API or source code.
http://dogpilecache.readthedocs.io/en/latest/api.html#dogpile.cache.backends.redis.RedisBackend.params.redis_expiration_time
The only problem with this argument is that it does not let you choose a different TTL for different categories, for example you want tokens for 10 minutes and catalog for 24 hours or so. The other parameters on keystone.conf just don't work from my experience (expiration_time and cache_time on each category)... Anyway this problem isn't relevant if you are using redis to store only keystone tokens.
[cache]
enabled=true
backend=dogpile.cache.redis
backend_argument=url:redis://127.0.0.1:6379
// Add this line
backend_argument=redis_expiration_time:[TTL]
Just replace the [TTL] with your wanted ttl and you'll start noticing keys with ttl in redis and after a while you will see that they are no more.
about the second question:
This is maybe not the best answer you'll see, but you can use OBJECT idletime [key] command on redis-cli to see how much time the specific key wasn't used (even GET reset idletime). You can delete the keys that have bigger idletime than your token revocation using a simple script.
Remember that the data on Redis isn't persistent data, meaning you can always use FLUSHALL and your OpenStack and keystone will work as usual, but ofc the first authentications will take longer.
I have a webservice that stores an authenticated users token in the HttpRuntime.Cache to be used on all subsequent requests. The cached item has a sliding expiration on it of 24 hours.
Secondly I have a vb.net app that is pinging this webservice every 15 seconds. It gets authenticated once, then uses the cached token for all subsequent requests. My problem is that the application appears to lose authentication at random intervals of time less than the 24 hr sliding expiration. However with it getting pinged every 15 sec the authentication should never expire.
I am looking for a way to view the HttpRuntime.cache to try and determine if the problem is in the webservice security methods or within the vb.net app. Can I view the HttpRuntime.cache somehow?
The webservice is part of a web forms site that was built with asp.net 2.0 on a Windows Server 2008.
The name of my key's were unknown as they were system generated guid values with a username as the value. So in order to view a cache collection that was unknown I used a simple loop as follows.
Dim CacheEnum As IDictionaryEnumerator = Cache.GetEnumerator()
While CacheEnum.MoveNext()
Dim cacheItem As String = Server.HtmlEncode(CacheEnum.Entry.Key.ToString())
Dim cacheItem2 As String = Server.HtmlEncode(CacheEnum.Entry.Value.ToString())
Response.Write(cacheItem & ":" & cacheItem2 & "<br />")
End While
Hope this helps others.
First off, HttpRuntime.Cache would not be the best place to store user authentication information. You should instead use HttpContext.Current.Session to store such data. Technically the cache is allowed to "pop" things in it at its own will (whenever it decides to).
If you actually need to use the cache, you can check if your item is in the cache by simply doing:
HttpRuntime.Cache["Key"] == null
We are trying to track down the cause of a performance problem.
We have a table with a single row that contains a primary key and a counter. Within a transaction we read the value of the counter, increment the value by one and save the new value.
The read and update is done using Entity Framework, we use a serializable transaction scope, we need to ensure that a counter value is read once only.
Most of the time this takes 0.1 seconds, then sometimes it takes over 1 second. We have not been able to find any pattern as to why this happen.
Has anyone else experienced variable performance when using transaction scope? Would it help to drop using transaction scope and set the transaction directly on the connection?
I remember commenting on this question a long time ago, but recently some developers in my shop have started using TransactionScope, and have also run into performance issues. While trying to search for some information, this came up pretty high in the Google Search results.
The issue we ran into was that, apparently chaining commands (INSERTs, etc.) with BeginChain does not work when using a TransactionScope (at least on the version we are running, Client v9.7.4.4 connecting to DB2 z/OS v 10).
I thought that I would leave a workaround for the issue we ran into (slow performance when running lots [1k+] of INSERTs under TransactionScope, but ran fine when the scope was removed and chaining was allowed). I'm not really sure if it would help the original question directly, but there are some options if you look in the IBM.Data.DB2.dll classes that allow you to update rows using a DB2DataAdapter and an underlying DataTable.
Here's some example code in VB.NET:
Private Sub InsertByBulk(tableName As String, insertCollection As List(Of Object))
Dim curTimestamp = Date.Now
Using scope = New TransactionScope
'Something that opens a connection to DB2, may vary
Using conn = GetDB2Connection()
'Dumb query to get DataTable from the ResultSet
Dim sql = String.Format("SELECT * FROM {0} WHERE 1=0", tableName)
Using adapter = New DB2DataAdapter(sql, conn)
Using table As New DataTable
adapter.FillSchema(table, SchemaType.Source)
For Each item In insertCollection
Dim row = table.NewRow()
row("ID") = item.Id
row("CHAR_FIELD") = item.CharField
row("QUANTITY") = item.Quantity
row("UPDATE_TIMESTAMP") = curTimestamp
table.Rows.Add(row)
Next
Using bc = New DB2BulkCopy(conn)
bc.DestinationTableName = tableName
bc.WriteToServer(table)
End Using 'BulkCopy
End Using 'DataTable
End Using 'DataAdapter
End Using 'Connection
scope.Complete()
End Using
End Sub
We have now solved this problem.
The root of the problem was that the DB2 provider does not support transaction promotion. This results in Transaction Scope using MSDTC distributed transactions for everything.
We replaced the use of Transaction Scope with transactions set on the database connection.
Composite services that included the code in the question were then reduced from 3 seconds to 0.3 seconds.