Redis session with concurrent user licensing

I want to use Redis for sessions. Users will be stored in Redis with an expiry that is refreshed on each request. On top of this I would like to implement concurrent licensing.
How can I count the number of currently stored keys?
I found out that there is the KEYS command, but it should not be used in production. I also thought about reacting to key-expiration events, but again that's not something I should rely on.
How can I implement concurrent user licensing with Redis?

This is not a great use of EXPIRE or of top-level Redis keys. If you ever want to store anything else in Redis, it will break your counting logic. Also, although you can count the total number of keys with a command like DBSIZE, the result may be inaccurate because Redis expires items lazily. My impression is that Redis is built so that the exact number of top-level keys should not matter.
For cases where an exact count is important, Redis has some great data structures you can make use of. In your case, I'd recommend a sorted set where each member is a user_id and the score is that user's expiration time in Unix time. This would look something like:
ZADD users_set 1453771862 "user1"
ZADD users_set 1453771563 "user2"
ZADD users_set 1453779999 "user3"
Then, any time you need to know how many current users there are, you can just do a ZCOUNT for all expiration times greater than the current time:
ZCOUNT users_set 1453771850 +inf
>>> 2
ZADD updates the score of an existing member, so you can also easily add users or refresh their expiration times:
ZADD users_set 1453779999 "user2"
ZCOUNT users_set 1453771850 +inf
>>> 3
This way you get an exact count of relevant users every time you do a ZCOUNT, and every operation you're doing is a relatively cheap O(log(n)).
Finally, if actually removing expired users from the sorted set is important to you, you can do this as a pretty cheap batch job with ZREMRANGEBYSCORE (removing every member whose score is below the current time) on whatever interval you like.
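A minimal sketch of this approach with the Python redis-py client; the key name license:active and the 900-second activity window are assumptions made for illustration, not part of the answer above:

import time
import redis

r = redis.Redis(decode_responses=True)
SESSION_TTL = 900                       # assumed 15-minute activity window
LICENSE_KEY = "license:active"          # assumed sorted-set key name

def touch_user(user_id):
    # Record activity: (re)set this user's expiration to now + TTL.
    r.zadd(LICENSE_KEY, {user_id: time.time() + SESSION_TTL})

def active_users():
    # Count members whose expiration lies in the future.
    return r.zcount(LICENSE_KEY, time.time(), "+inf")

def prune_expired():
    # Optional batch cleanup of members that have already expired.
    return r.zremrangebyscore(LICENSE_KEY, 0, time.time())

Calling touch_user() on every request and rejecting new sessions once active_users() reaches the licensed seat count gives you the concurrent-license check.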

Related

Is it better to use txt file to get the current counter value instead of database?

I am working on a website in Laravel, where I load the current counter value from the database. The user then clicks a button to increase the score.
But as the website has around 4000 concurrent users at any given time, the database connections are taking their toll on the server and resulting in timeouts.
If I load the current score from a txt file and then write it back to the same file, will that be better?
Or should I use an application variable to store the score?
I have tried using the cache, but it doesn't pull the latest value. Database optimization is also not working due to the number of users.
I am looking for the best way to show and increment the counter without database usage.
A database would still do a better job here; a NoSQL database is a good fit for your use case. You can use Redis: it keeps the data in memory (RAM), which means read and write operations will be much faster than in a database that operates on secondary storage (hard disk).
Redis has built-in support for incrementing values with the INCR command. INCR increments the number stored at the key by one; if the key does not exist, it is set to 0 before the operation is performed.
For example, say the key that holds the value is my_counter. You can play around with redis-cli like so:
redis> SET my_counter "10"
"OK"
redis> INCR my_counter
(integer) 11
redis> GET my_counter
"11"
Fortunately, there is a Redis client for Laravel. You can have a read here:
https://laravel.com/docs/5.8/redis
Good luck :)
Edit 1:
If a high number of users is causing the server to slow down, you also have server and architectural options that can be applied alongside a new database, such as horizontal and vertical scaling.
References:
https://github.com/phpredis/phpredis

Proper strategy for Redis caching relational data

We have the following use case example:
We have users, stores, friends (relationships between users) and likes. We store these tables in MySQL and as key-value structures in Redis, in order to read from the Redis cache and not hit the database. Writes go to both data stores.
Our app is therefore VERY fast and scalable, since we rarely hit the database for reads. We are using AWS for scalable Redis.
However, we have a problem when a user is logged in and we have to show a list of stores AND which of his friends like each store. This is a join, and Redis does not support joins directly. We'd like to know the best way to store and show this data. For example, should it be stored in Redis under a key like "store/user_who_likes" and maintained on every write, or should an hourly cron job construct it? Then we could read already-stored data; or should we construct this join on demand?
We notice that not even Facebook updates this info in real time; rather, it takes several minutes for a friend to see which of my friends like a page we have in common.
Thanks in advance for any responses.
Depends how important it is to you. Why not store each person's friends as a set and each store's likes as a set, and then, when you need the friends who like a given store, just take the SINTER (set intersection) of the two? It should be fast, and storing friends and store likes as sets gives you a lot of similarly useful operations as well. I'm not sure how you're currently using the Redis cache, but you could use these sets as a likely cheaper (memory-wise) replacement for getting a user's friends, a store's likes, and so on.
As for cron, I'm not sure how that would help. Redis is more than fast enough to handle these kinds of writes; memory will be your bottleneck first.
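A minimal sketch of that set layout with the Python redis-py client; the key patterns friends:<user_id> and likes:<store_id> are assumptions made for illustration:

import redis

r = redis.Redis(decode_responses=True)

# Maintain the sets on every write, alongside MySQL.
r.sadd("friends:alice", "bob", "carol", "dave")
r.sadd("likes:store42", "carol", "erin", "bob")

def friends_who_like(user_id, store_id):
    # Server-side set intersection: friends of user_id who like store_id.
    return r.sinter("friends:" + user_id, "likes:" + store_id)

print(friends_who_like("alice", "store42"))   # {'bob', 'carol'}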

Count redis keys without fetching them in Ruby

I'm keeping a list of online users in Redis, with one key per user. Keys are set to time out in 15 minutes, so to see roughly how many users have been active in the past 15 minutes, all I have to do is:
redisCli.keys('user:*').count
The problem is that as the number of keys grows, the time it takes to fetch all the keys before counting them increases noticeably. Is there a way to count the keys without actually having to fetch all of them first?
There is an alternative to directly indexing keys in a set or sorted set, which is to use the newer SCAN command. Which approach is best depends on the use case, the memory/speed trade-off, and the required precision of the count.
Another alternative is to use Redis HyperLogLogs; see PFADD and PFCOUNT.
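A rough sketch of those two alternatives, shown with the Python redis-py client for brevity (the equivalent calls exist in the Ruby client); the key names are made up for illustration:

import redis

r = redis.Redis(decode_responses=True)

# SCAN walks the keyspace in small batches, so it does not block the
# server the way KEYS does, but it still visits every matching key.
def count_user_keys():
    return sum(1 for _ in r.scan_iter(match="user:*", count=1000))

# A HyperLogLog gives an approximate (~0.8% error) count of distinct
# users added to it, using only a few kilobytes of memory. Rotate the
# key per time window (e.g. one per 15 minutes) to approximate recency.
def mark_active(user_id):
    r.pfadd("active_users_hll", user_id)

def approx_active_users():
    return r.pfcount("active_users_hll")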
Redis does not have an API for counting only the keys that match a specific pattern, so it is not available in the Ruby client either.
What I can suggest is to keep another data structure and read the number of users from it.
For instance, you can use Redis's sorted set, keeping each user with the timestamp of its last TTL refresh as the score; then you can call zcount to get the current number of active users:
redisCli.zcount('active_users', 15.minutes.ago.to_i, Time.now.to_i)
From time to time you will need to clean up the old values with:
redisCli.zremrangebyscore 'active_users', 0, 15.minutes.ago.to_i

Username uniqueness validation - Design Approach

This is a general design problem: I want to validate a username field for uniqueness when the user enters the value and tabs out. I do an Ajax validation and get a response from the server. This is all very standard. Now, what if I have a HUGE user database? How do I handle this situation? How do I find whether a username like "foozbarz" is present among 150 million usernames?
Database queries are out of the question [EDIT] - read the username database once and populate the cache/hash for faster lookup (to clarify Emil Vikström's point)
In-memory databases won't help either
Keep an in-memory hash (or cache/memcache) to store all usernames - usernames can be hashed easily and lookups will be very fast. But there are some problems with this:
a. Size of the hash - can we optimize it so that the hash stays small?
b. Hash/cache refresh frequency (users might get added while we are validating)
Shard the username table based on some criterion (e.g. A-B in table username_1, and so on) - thanks piotrek for this suggestion
Or is there any other, better approach?
Why don't you simply partition the data? If you have, or plan to have, 150M+ users, I assume you have (or will have) the budget for this. If you are just starting out (with 2k users), do it the traditional way with a simple indexed search in the database. When you have so many users that you observe performance issues, and you have measured that the database is the cause (and not, say, the web server), then you simply add another database: on the first one you keep users with names from a to m, and the rest on the other one. You may choose another criterion, such as a hash, to keep the data balanced. When you need more, you add more databases. But if you don't have that many users right now, I'd advise against any premature optimization; there are many other things that may become a bottleneck with this amount of data.
You are most likely right about doing some kind of hashing where you store the taken names; obviously, a name not present in the hash is free.
What you shouldn't do is rely on that validation alone. A lot of time can pass between the user checking whether a name is free and the user actually pressing Register.
To be fair, you only have one issue here, and that's whether you REALLY need to worry about reaching 150 million users. Scalability is often an issue, but unless this happens overnight, you can probably swap in a better solution before it does.
Secondly, there is your worry about two users both getting a THIS NAME IS FREE response and then one of them taking it. First of all, the chances of that happening are pretty low. Secondly, the only ways I can think of 'solving' this so that a user never clicks OK with a validated name and then gets USERNAME TAKEN are to either
a) Remember which name each user validated last, store that, and if someone else registers it in the meantime, use Ajax to mark the name field as taken and notify the user. Don't do this: it wastes a lot of cycles and is really too much effort to implement.
b) Lock usernames for a short period of time as a user validates them. This results in a lot of free usernames showing up as taken when they actually aren't. You probably don't want this either.
The easiest solution is simply to insert the name into the table when the user actually clicks OK, but before doing that, check again whether the name exists. If it does, just send the user back with USERNAME TAKEN. The chances of someone racing someone else for a name are really, really slim, and I doubt anyone will make a big fuss over how your validator (which did its job; the name was free at the point of checking) 'lied' to the user.
Basically, your only issue is how you want to store the usernames.
Your criterion #1 is flawed, because this is exactly what you have a database system for: storing and managing data. Why do you even have a table with usernames if you're not going to read it?
The first thing to do is improve the database by adding an index, preferably a HASH index if your database system supports it. You will have a hard time writing anything yourself that comes near its performance.
If that is not enough, you must start scaling your database, for example by building a clustered database or by partitioning the table into multiple sub-tables.
What I think is a fair thing to do is implement caching in front of the database, but for single names. Not all usernames will have a collision attempt, so you may cache the small subset where collisions typically happen. A simple algorithm for checking the collision status of USER:
1. Check if USER exists in your cache. If it does:
   - Set a "last checked" timestamp for USER inside the cache.
   - You are done, and USER is a collision.
2. Check the database for USER. If it does exist:
   - Add USER to the cache.
   - If the cache is full (all X slots are used), remove the least recently used username from the cache (or the Y least recently used usernames, if you want to minimize cache pruning).
   - You are done, and USER is a collision.
3. If USER matched neither the cache nor the database, you are done, and USER is NOT a collision.
You will of course still need a UNIQUE constraint in your database to avoid race conditions.
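A rough Python sketch of the algorithm above, using an in-process LRU cache; the cache size and the username_exists_in_db() helper are hypothetical stand-ins for whatever database lookup you use:

from collections import OrderedDict

CACHE_SIZE = 10_000                      # assumed number of cache slots ("X")
cache = OrderedDict()                    # username -> True, kept in LRU order

def username_exists_in_db(name):
    # Hypothetical lookup, e.g. SELECT 1 FROM users WHERE username = %s
    raise NotImplementedError

def is_collision(name):
    # 1. Cache hit: refresh recency and report a collision.
    if name in cache:
        cache.move_to_end(name)
        return True
    # 2. Database hit: remember it, evicting the least recently used entry if full.
    if username_exists_in_db(name):
        cache[name] = True
        if len(cache) > CACHE_SIZE:
            cache.popitem(last=False)
        return True
    # 3. Neither the cache nor the database knows the name: it is available.
    return False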
If you're going the traditional route you could use an appropriate index to improve the database lookup.
You could also try using something like ElasticSearch which has very low latency lookups on large data sets.
If you have 150M+ users, you will have to have in place some function that:
Checks that the user exists, and signals if not found
Verifies the password is correct, and signals if it is not
Retrieves the user's data
You will have this problem and will have to solve it, in all likelihood with something akin to a user query. Even if you rely heavily on sessions, you will still have the problem of "finding session X among many in a 150M+ pool", which is structurally identical to "finding user X among many in a 150M+ pool".
Once you solve the bigger problem, the problem you now have is just its step #1.
So I'd check out a scalable database solution (possibly a NoSQL one), and implement the "availability check" using that.
You might end up with a
retrieveUserData(user, password = None)
function which returns the user info if user and password are valid and correct. For the availability check, you would send no password and expect a UserNotFound exception if the username is available.

Is Redis just a cache?

I have been reading some Redis docs and trying the tutorial at http://try.redis-db.com/. So far, I can't see any difference between Redis and caching technologies like Velocity or the Enterprise Library Caching Framework.
You're effectively just adding objects to an in-memory data store using a unique key. There don't seem to be any relational semantics...
What am I missing?
No, Redis is much more than a cache.
Like a cache, Redis stores key=value pairs. But unlike a cache, Redis lets you operate on the values. There are five data types in Redis: Strings, Sets, Hashes, Lists and Sorted Sets. Each data type exposes various operations.
The best way to understand Redis is to model an application without thinking about how you are going to store it in a database.
Let's say we want to build StackOverflow.com. To keep it simple, we need Questions, Answers, Tags and Users.
Modeling Questions, Users and Answers
Each object can be modeled as a map. For example, a Question is a map with the fields {id, title, date_asked, votes, asked_by, status}. Similarly, an Answer is a map with the fields {id, question_id, answer_text, answered_by, votes, status}. We can model a User object in the same way.
Each of these objects can be stored directly in Redis as a Hash. To generate unique ids, you can use the atomic increment command, something like this:
$ HINCRBY unique_ids question 1
(integer) 1
$ HMSET question:1 title "Is Redis just a cache?" asked_by 12 votes 0
OK
$ HINCRBY unique_ids answer 1
(integer) 1
$ HMSET answer:1 question_id 1 answer_text "No, its a lot more" answered_by 15 votes 1
OK
Handling Up Votes
Now, every time someone upvotes a question or an answer, you just need to do this:
$ HINCRBY question:1 votes 1
(integer) 1
$ HINCRBY question:1 votes 1
(integer) 2
List of Questions for Homepage
Next, we want to store the most recent questions to display on the home page. If you were writing a .NET or Java program, you would store the questions in a List. Turns out, that is the best way to store this in Redis as well.
Every time someone asks a question, we add its id to the list.
$ lpush questions question:1
(integer) 1
$ lpush questions question:2
(integer) 2
Now, when you want to render your homepage, you ask Redis for the most recent 25 questions.
$ lrange questions 0 24
1) "question:100"
2) "question:99"
3) "question:98"
4) "question:97"
5) "question:96"
...
25) "question:76"
Now that you have the ids, retrieve items from Redis using pipelining and show them to the user.
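A rough sketch of that pipelined fetch with the Python redis-py client; the key names follow the example above, and reading each hash with HGETALL is an assumption about how the question objects are stored:

import redis

r = redis.Redis(decode_responses=True)

# Fetch the 25 most recent question ids, then read all of their hashes
# in a single round trip using a pipeline.
question_ids = r.lrange("questions", 0, 24)    # e.g. ["question:100", "question:99", ...]

pipe = r.pipeline()
for qid in question_ids:
    pipe.hgetall(qid)
questions = pipe.execute()                     # one reply per queued HGETALL

for qid, data in zip(question_ids, questions):
    print(qid, data.get("title"))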
Questions by Tags, Sorted by Votes
Next, we want to retrieve questions for each tag. But SO allows you to see top voted questions, new questions or unanswered questions under each tag.
To model this, we use Redis' Sorted Set feature. A Sorted Set allows you to associate a score with each element. You can then retrieve elements based on their scores.
Let's go ahead and do this for the redis tag:
$ zadd questions_by_votes_tagged:redis 2 question:1
(integer) 1
$ zadd questions_by_votes_tagged:redis 10 question:2
(integer) 1
$ zadd questions_by_votes_tagged:redis 5 question:613
(integer) 1
$ zrange questions_by_votes_tagged:redis 0 5
1) "question:1"
2) "question:613"
3) "question:2"
$ zrevrange questions_by_votes_tagged:redis 0 5
1) "question:2"
2) "question:613"
3) "question:1"
What did we do here? We added questions to a sorted set and associated a score (the number of votes) with each question. Each time a question gets upvoted, we increment its score. And when a user clicks "Questions tagged Redis, sorted by votes", we just do a ZREVRANGE and get back the top questions.
Realtime Questions without refreshing page
And finally, a bonus feature. If you keep the questions page open, SO will notify you when a new question is added. How can Redis help here?
Redis has a pub-sub model. You can create channels, for example "channel_questions_tagged_redis", and subscribe users to a particular channel. When a new question is added, you publish a message to that channel, and all subscribed users receive it. You will have to use a web technology like WebSockets or Comet to actually deliver the message to the browser, but Redis handles all the plumbing on the server side.
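A minimal publish/subscribe sketch with the Python redis-py client; the channel name follows the example above, and in a real application the subscriber loop would feed a WebSocket layer, which is outside this snippet:

import redis

r = redis.Redis(decode_responses=True)
CHANNEL = "channel_questions_tagged_redis"

# Publisher side: called whenever a new question is added.
def announce_question(question_id):
    r.publish(CHANNEL, question_id)

# Subscriber side: typically runs in the process that pushes to browsers.
def listen_for_questions():
    pubsub = r.pubsub()
    pubsub.subscribe(CHANNEL)
    for message in pubsub.listen():            # blocks, yielding messages
        if message["type"] == "message":
            print("new question:", message["data"])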
Persistence, Reliability etc.
Unlike a cache, Redis persists data to the hard disk. You can have a master-slave setup for better reliability. To learn more, go through the Persistence and Replication topics here: http://redis.io/documentation
Not just a cache.
In-memory key-value storage
Supports multiple data types (strings, hashes, lists, sets, sorted sets, bitmaps, and HyperLogLogs)
Provides the ability to persist cached data to physical storage (if needed)
Supports the pub-sub model
Provides replication for high availability (master/slave)
Supports ultra-fast Lua scripts, whose execution time equals that of native commands; this also brings atomicity to the sophisticated data manipulation required for advanced objects like locks and semaphores
There is a Redis-based in-memory data grid called Redisson which makes it easy to build distributed applications on Java, thanks to distributed Lock, Semaphore, ReadWriteLock, CountDownLatch, ConcurrentMap objects and many others
Works well in the cloud and supports AWS ElastiCache, AWS ElastiCache Cluster and Azure Redis Cache
Actually, there is no dependency between the relational representation of data (or any type of data representation) and a database's role (cache, permanent persistence, etc.).
Redis is good as a cache, it's true, but it's much more than just a cache. It's a high-speed, fully in-memory database that also persists data to disk. It's not relational; it's a key-value store.
We use it in production. Redis helps us build software that handles thousands of requests per second and keeps customers' business data through its whole natural lifecycle.
Redis is a cache that is best suited to distributed environments and microservice architectures.
It is fast and reliable, provides atomicity and consistency, and offers a range of data types such as sets, hashes, lists, etc.
I have been using it for the last year, and it really comes to the rescue when you need to deliver a production-ready solution very quickly, as well as for any performance-related issues, since you can always use it to cache data.
Redis supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
Implementation with Python:
https://beyondexperiment.com/vijayravichandran06/redis-data-structure-with-python/
Usages of Redis:
Cache with multiple data structures, like string, set, zset, list, hash and bitmap (which can be used in many aggregation use cases)
KV DB. Data in Redis memory can be stored on disk: RDB snapshots and the AOF edit log.
Message queue, although one message can only be consumed by one consumer
Pub/sub
Distributed lock. This relies on the SETNX command, where only the first client to execute it successfully holds the lock (see the sketch after this list). https://redis.io/commands/setnx
It is not just a key-value cache; it is a key-data-structure cache.
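A simplified locking sketch in Python with redis-py. Modern clients usually issue SET with the NX and EX options (SETNX plus an expiry in one step) so a crashed holder cannot keep the lock forever, and a random token guards against releasing someone else's lock. This is an illustration of the idea, not a full Redlock implementation:

import uuid
import redis

r = redis.Redis(decode_responses=True)

def acquire_lock(name, ttl_seconds=10):
    # Try to take the lock; returns a token on success, None if already held.
    token = str(uuid.uuid4())
    # SET key value NX EX ttl only succeeds if the key does not exist yet.
    if r.set("lock:" + name, token, nx=True, ex=ttl_seconds):
        return token
    return None

def release_lock(name, token):
    # Release only if we still own the lock (best-effort; a Lua script
    # would make the check-and-delete atomic).
    key = "lock:" + name
    if r.get(key) == token:
        r.delete(key)
        return True
    return False

token = acquire_lock("report-job")
if token:
    try:
        pass  # do the exclusive work here
    finally:
        release_lock("report-job", token)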
Redis is not only a cache but also a data store: whatever is written to the cache can also be written to disk. That allows us to take backups, which in turn lets us restart our cache nodes; when we restart them, the cache nodes are repopulated from the backup, and we can even restart the entire cluster. In Memcached, by contrast, when a node fails or restarts, all keys stored on that node are lost.
Redis is also used as a message queue.
In addition, Redis has capabilities beyond caching. Based on the latest Redis documentation (https://redis.io/docs/modules/), Redis has external modules that support different kinds of tasks, such as:
Redis Search, full-text search capability
Redis Graph, a graph database on top of Redis
Redis Time Series, a module that adds a time series data structure to Redis
Redis AI
Neural Network for Redis, a neural network module for Redis
etc.
Personally, I have used Redis as a message queue, by utilizing Celery for a Django REST Framework application, alongside caching in production.
It's a key-value datastore, mainly deployed in a private subnet in conjunction with cloud databases to provide microsecond latency. It can do that with either a lazy-loading or a write-through strategy, depending on the specific use case.
It is far more complex than Memcached and operates in cluster-enabled or cluster-disabled mode.
It supports shards and multi-AZ deployment, which makes data highly available.
It supports encryption of data at rest and in transit,
and is extremely useful for use cases such as streaming applications, messaging, real-time analytics, and applications where the value of data depreciates very quickly with time.
Hence it's not just a cache; it brings a lot more features with it, which makes it all the more useful.
Besides being a cache server, Redis is specifically a data structure server.
Being a cache in the form of a data structure server means a lot, because data structures are the fundamentals of programs and applications. Imagine you are using an SQL database as your storage technology and need to construct a list, a hash map, a ranking set or things like that: it's a pain in the neck. Redis provides these capabilities directly and in a very simple way, which greatly simplifies development.
On the other hand, a data structure server does not have to take the form of a cache. There are projects that are compatible with Redis but have persistent storage engines.
In addition to the answers made so far, and to summarize:
Redis is a very fast non-relational database that stores a mapping of keys to values of several different types (strings, hashes, lists, sets, sorted sets, bitmaps, and HyperLogLogs). This is explained in detail in Sripathi Krishnan's answer.
Redis supports in-memory storage with persistence to disk
Replication to scale read performance
Client-side sharding to scale write performance
If you want more detailed and in-depth information about Redis, you can look at the books Redis in Action and Redis Essentials.
