Let's say I have some user data in a Redis cache. There are hash keys with the format user:{userId}, each with the fields username and city. E.g.:
HSET user:123 username "Mark" city "Chicago"
HSET user:456 username "Chris" city "New York"
HSET user:789 username "Anna" city "Las Vegas"
I periodically receive the full set of data (unfortunately not just the delta), and sometimes some users' data changes. For instance:
data for user 123 doesn't change
city for user 456 has changed
user 789 has been deleted.
And now I need to:
update data for updated user 456
delete key for deleted user 789
And the most important part: I need to know which users have been updated so I can inform the rest of the system about the changes. Important: I don't need to know about the deleted ones. I'm using a Redis client for Java, so I can do part of the computation within the app.
How can I do that in the most efficient way, with the fewest operations, making the most of what Redis offers?
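Roughly, I imagine something like the following Jedis sketch (the key names users:known and users:incoming are names I made up, and it assumes a non-empty snapshot): compare the incoming fields against a pipelined HGETALL of the current state to find new/changed users, record the incoming id set, and let SDIFF compute the deletions server-side. Is this the right direction, or is there something better?

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class UserSync {
    // Applies one full snapshot and returns the ids of new or changed users.
    static Set<String> sync(Jedis jedis, Map<String, Map<String, String>> snapshot) {
        // 1. Read the current state of every incoming user in one round trip.
        Pipeline read = jedis.pipelined();
        Map<String, Response<Map<String, String>>> current = new HashMap<>();
        for (String id : snapshot.keySet()) {
            current.put(id, read.hgetAll("user:" + id));
        }
        read.sync();

        // 2. Write only the users whose fields differ (a new user's current
        //    hash is empty, so it counts as changed too), and record the
        //    incoming id set so deletions can be computed with SDIFF.
        Set<String> changed = new HashSet<>();
        Pipeline write = jedis.pipelined();
        for (Map.Entry<String, Map<String, String>> e : snapshot.entrySet()) {
            String id = e.getKey();
            if (!e.getValue().equals(current.get(id).get())) {
                changed.add(id);
                write.hset("user:" + id, e.getValue());
            }
            write.sadd("users:incoming", id);
        }
        write.sync();

        // 3. Users known before but absent now: delete without notifying.
        for (String id : jedis.sdiff("users:known", "users:incoming")) {
            jedis.del("user:" + id);
        }
        jedis.rename("users:incoming", "users:known"); // baseline for the next cycle
        return changed;
    }
}

This is two pipelined round trips plus the diff, regardless of how many users changed.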
I'm building a Spring Boot application using Spring Data JPA. I'll give a simplified description of the application that illustrates my problem:
I have a table of Students that has a student_id primary key and various personal information (name, etc.). This personal data is loaded from an external source and may only be retrieved once the student gives permission by logging into the application. I thus cannot create a list of all the users that might log in ahead of time. They are created and inserted into the database when the student logs in for the first time.
I also load data sets like historical grades into the database ahead of time. These are of the form student_id (foreign key), course_id (foreign key), grades, year (and some other fields). The point is that once a student logs in, their historical grades will be visible. However, the database (initialized as empty by Spring Data JPA) will not let me insert the historical data as it complains that e.g. student_id 1234 (foreign key in the grades table) cannot be found as a primary key in the Students table. Which is true, because I will only create that user 1234 when and if he/she logs in.
I see the following options and don't really like any of them:
disable all constraints on foreign keys for the relevant classes (in which case: how do I tell Spring Data JPA to do that? I googled but couldn't find it) -> disabling integrity checks sounds like a bad idea though.
Create 'dummy' students, i.e. simply go through the historical data, list all the student_ids, and pre-fill my Student table with entries like student_id = 1234, name = "", address = "", etc. This data would be filled in if/when the student logs in (see the sketch after this question) -> this also feels like a 'dirty' solution.
Keep the historical data in .csv files, or another manually created table, and have the application load it into the 'real' database only after the student logs in for the first time -> this just sounds like a terrible mess.
Conceptually I'm inclined towards option 1, because I do in fact want to create/save a piece of historical data about a student, despite having no other information about that student. I'm afraid, however, that if I e.g. retrieve all grades for a course_id, those Grade entities will contain links to Student entities that do not in fact exist and this will just result in more errors.
How do you handle such a situation?
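To make option 2 concrete, here is roughly what I imagine it would look like (a hypothetical sketch: the placeholder flag and GradeImporter are invented names, accessors and the Grade side are elided, and on Spring Boot 3+ the imports would be jakarta.persistence instead of javax.persistence). Students are stubbed just-in-time, so the foreign key is always satisfied, and the real data overwrites the stub at first login.

import javax.persistence.Entity;
import javax.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.transaction.annotation.Transactional;

@Entity
class Student {
    @Id Long studentId;            // assigned from the external source, not generated
    String name;
    boolean placeholder = true;    // stub row until the student actually logs in
}

interface StudentRepository extends JpaRepository<Student, Long> {}

class GradeImporter {
    private final StudentRepository students;

    GradeImporter(StudentRepository students) {
        this.students = students;
    }

    // Ensure the FK target exists before saving a historical grade row.
    @Transactional
    Student ensureStudent(Long studentId) {
        return students.findById(studentId).orElseGet(() -> {
            Student stub = new Student();
            stub.studentId = studentId;   // placeholder stays true until login
            return students.save(stub);
        });
    }
}

Queries that should only see 'real' students could then filter on placeholder = false, which would address the concern about Grade entities linking to students that do not 'exist' yet.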
Hoping to get some sage advice from those who have been in the trenches and can help me better understand when to keep data clustered together in Redis, and when to split it up.
I am working on a multi-tenant platform that has GPS data coming in every 3-5 seconds for each asset I track. As the data is processed, we store additional information associated with the asset (e.g. is it on time, is it late, etc.).
Each asset belongs to one or more tenants. For example, when tracking a family car, that car may exist for both the husband and the wife. Each needs to know where it is relative to their needs. For example, the car may be in use by the teenager and on time for the husband to use it at 3:00 pm, but late for the wife to use it at 2:30 pm.
As an additional requirement, a single tenant may want read access to other tenants, e.g. the Dad wants to see the family car and any teenager's cars. So the hierarchy can start to look something like this:
Super-Tenant
  Super-Tenant (Family)
    Tenant (Dad)
      Vehicle 1
        Gps: 123.456,15.123
        Status: OnTime
      Vehicle 2
        Gps: 123.872,15.563
        Status: Unused
    Tenant (Mom)
      Vehicle 1
        Gps: 123.456,15.123
        Status: Late
      Vehicle 2
        Gps: 123.872,15.563
        Status: Unused
    Tenant (Teenager)
      Vehicle 1
        Gps: 123.456,15.123
        Status: Unused
      Vehicle 2
        Gps: 123.872,15.563
        Status: Unused
My question has to do with the best way to store this in Redis.
I can store by tenant - i.e. I can use a key for Dad, then have a collection of all of the vehicles he has access to. Each time a new GPS location comes in (whether for Vehicle 1 or Vehicle 2), I would update the contents of the collection. My concern is that if there are dozens of vehicles, we would be updating his collection way too often.
Or
I can store by tenant, then by vehicle. This means that when Vehicle 1's GPS location comes in, I will be updating information across 3 different tenants. Not too bad.
What gives me pause is that I am working on a website for Dad to see all his vehicles. That call is going to come in and ask for all vehicles under the tenant Dad. If I split out the data so it is stored by tenant/vehicle, then I have to store the fact that Dad has 2 vehicles and ask Redis for everything under (key1, key2, etc.).
If I make it so that everything is stored in a collection under the Dad tenant, then my request to Redis would be much simpler and will be asking for everything under the key Dad.
In reality, each tenant will have between 5 and 100 vehicles, and we have hundreds of tenants.
Based on your experience, what would be your preferred approach (feel free to offer one not listed here)?
From your question it appears you're hoping to store everything you need under one key. Redis doesn't support nested hashes as-is; there are a few ways to work around that, outlined below.
Based on the update cadence of the GPS data, it's best to minimize the total number of writes required. This may increase the number of operations needed to construct a response on read; however, adding read-only replica instances should allow you to scale reads. You can tune your writes with some pipelining of updates.
From what you have described, it sounds like the updates are limited to the GPS position and status of a vehicle for a user, and a read needs to produce a single user's view: the position and status of their set of vehicles.
I would start with a Tenant stored as a hash with the user's name and fields referencing the vehicles and sessions associated with the user. This is not strictly necessary if you use consistent naming conventions, but it is shown as an option in case additional user data needs to be cached.
- Tenant user-1 (Hash)
-- name: Dad (String)
-- vehicles: user-1:vehicles (String) reference to key for set of vehicles assigned.
-- sessions: user-1:sessions (String) reference to key for set of user-vehicle sessions.
The lookup of the vehicles can be done with key formatting alone if none of the other tenant data is needed. The members are references to the vehicles' keys.
- UserVehicles user-1:vehicles (Set)
-- member: vehicle-1 (String)
This allows lookup of the details for a vehicle; vehicles hold their position. You could include a field referencing a vehicle-centric schedule of sessions, similar to the user sessions below. You could also add a vehicle name or other data if it is required for the response.
- Vehicle: vehicle-1 (Hash)
-- gps: "123.456,15.123" (String)
Store the sessions specific to a user in a sorted set. The members would be references to the keys storing session information. The score would be set to a timestamp value allowing range lookups for recent and upcoming sessions for that user.
- Schedule user-1:sessions (Sorted Set)
-- member: user-1:vehicle-1:session-1 (String)
-- score: 1638216000 (Float)
For tenant sessions on a vehicle, you could simply store the status in a string. The alternative shown here allows storing additional state (scheduled and available times) if you need to support vehicle-centric views of a schedule. Combining this with a sorted set of vehicle sessions would round this out.
- Session user-1:vehicle-1:session-1 (Hash)
-- status: OnTime (String)
-- scheduled_start: 1638216000 (String) [optional]
-- scheduled_end: 1638216600 (String) [optional]
-- earliest_available: 1638215000 (String) [optional]
If you're not tracking state elsewhere, you could use a hash to store counters for the cache objects, for when it comes time to issue a new id. Read and increment these when adding new cache objects.
- Globals: global (Hash)
-- user: 0
-- vehicle: 0
-- session: 0
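For example, with a Java client such as Jedis (used here as an assumption about your stack), the "read and increment" is a single atomic HINCRBY:

import redis.clients.jedis.Jedis;

public class IdAllocator {
    // HINCRBY returns the incremented value, so no separate read is needed
    // and concurrent writers cannot hand out the same id.
    static String nextVehicleKey(Jedis jedis) {
        long id = jedis.hincrBy("global", "vehicle", 1);
        return "vehicle-" + id;
    }
}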
For updates you would have roughly 200k write operations per update cycle:
100k tenant-vehicle pairs (1,000 tenants × 100 vehicles/tenant), each with:
1 HSET vehicle
1 HSET session
Pipelining and tuning the number of requests in the pipeline can improve write performance, but I would anticipate you should be able to complete all writes in <2s.
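A rough sketch of what such a pipelined update cycle could look like in Java with Jedis (the batch size, the Update holder, and the key layout are illustrative assumptions, not a prescribed implementation):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import java.util.Map;

public class GpsUpdateWriter {
    private static final int BATCH = 1_000; // tune: commands per pipeline flush

    static final class Update {
        final String gps, sessionKey, status;
        Update(String gps, String sessionKey, String status) {
            this.gps = gps; this.sessionKey = sessionKey; this.status = status;
        }
    }

    // One update cycle: for each vehicle key, write the new position and the
    // matching session status, flushing the pipeline in batches.
    static void applyCycle(Jedis jedis, Map<String, Update> updates) {
        Pipeline p = jedis.pipelined();
        int n = 0;
        for (Map.Entry<String, Update> e : updates.entrySet()) {
            Update u = e.getValue();
            p.hset(e.getKey(), "gps", u.gps);           // HSET vehicle
            p.hset(u.sessionKey, "status", u.status);   // HSET session
            if (++n % BATCH == 0) {
                p.sync();                // flush this batch
                p = jedis.pipelined();   // start a fresh pipeline
            }
        }
        p.sync();
    }
}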
For a read you would have something like: ~300 operations per user per request.
1 HGETALL user
1 ZRANGEBYSCORE user-1:sessions <from> <to> LIMIT 0 200 (find up to 200 sessions in a timeframe for the user)
200 HGETALL session
1 SMEMBERS user-1:vehicles (find all vehicles for the user)
100 HGET vehicle gps (get vehicle location for all vehicles)
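Sketched with Jedis, that read path could look something like this (key names follow the schema above; from and to bound the session window):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TenantView {
    // Builds one user's view: the tenant hash, up to 200 sessions in the
    // window, and the position of every vehicle the user can see.
    static void load(Jedis jedis, String userKey, double from, double to) {
        Map<String, String> user = jedis.hgetAll(userKey);           // HGETALL user

        Pipeline p = jedis.pipelined();
        List<Response<Map<String, String>>> sessions = new ArrayList<>();
        for (String s : jedis.zrangeByScore(userKey + ":sessions", from, to, 0, 200)) {
            sessions.add(p.hgetAll(s));                              // HGETALL session
        }
        Map<String, Response<String>> positions = new HashMap<>();
        for (String v : jedis.smembers(userKey + ":vehicles")) {     // SMEMBERS
            positions.put(v, p.hget(v, "gps"));                      // HGET vehicle gps
        }
        p.sync(); // one round trip for all the hash reads
        // user, sessions, and positions now hold everything needed for the page.
    }
}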
Considerations:
A process to periodically remove sessions and their references after they pass would keep memory from growing unbounded and keep performance consistent (see the sketch below).
Adding some scripts to allow for easier updates to the cache when a new user or vehicle is added, and for returning the state you described needing for display to a user, would round this out.
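For the periodic cleanup mentioned above, a minimal Jedis sketch (users:all is an assumed registry of tenant keys, not part of the schema above):

import redis.clients.jedis.Jedis;
import java.time.Duration;
import java.time.Instant;

public class SessionCleanup {
    // Drops session references that ended before the cutoff from each user's
    // schedule, then unlinks the session hashes themselves.
    static void sweep(Jedis jedis) {
        double cutoff = Instant.now().minus(Duration.ofDays(1)).getEpochSecond();
        for (String user : jedis.smembers("users:all")) {
            String schedule = user + ":sessions";
            for (String session : jedis.zrangeByScore(schedule, 0, cutoff)) {
                jedis.unlink(session);                   // remove the session hash
            }
            jedis.zremrangeByScore(schedule, 0, cutoff); // drop the references
        }
    }
}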
I have an object Company and multiple methods that can be used to get this object. Ex. GetById, GetByEmail, GetByName.
What I'd like is to cache those method calls with a possibility to invalidate all cache entries related to one object at once.
For example, a company is cached. There are 3 entries in cache with following keys:
Company:GetById:123
Company:GetByEmail:foo@bar.com
Company:GetByName:Acme
All three keys are related to one company.
Now let's assume that company has changed. Then I would like to invalidate all keys related to this company. I didn't find any built-in solution for that purpose.
Tagging cache entries with some common id (companyId for example) and then removing all entries by it would be great, but this feature doesn't seem to exist.
So to answer your question directly: you'd probably want to maintain all the keys related to your company in a list, scan through that list, and delete all the associated keys with a DEL command.
So something like:
LPUSH companies-keys:Acme Company:GetById:123 Company:GetByEmail:foo@bar.com Company:GetByName:Acme
Then
RPOP companies-keys:Acme
and for each entry you get out of the list:
UNLINK keyname
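In Java with Jedis, that drain-and-unlink loop might look like this sketch (key naming follows the commands above):

import redis.clients.jedis.Jedis;

public class CompanyCacheInvalidator {
    // Pops every key name tracked for the company and unlinks it.
    static void invalidate(Jedis jedis, String companyName) {
        String tagList = "companies-keys:" + companyName; // e.g. companies-keys:Acme
        String key;
        while ((key = jedis.rpop(tagList)) != null) {
            jedis.unlink(key); // non-blocking delete of the cached entry
        }
    }
}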
To answer it not so directly: you may want to consider using a hash rather than separate keys; that way you can just modify one of the fields in the hash rather than having to invalidate all the keys associated with it.
So you could create it with:
HSET companies:123 id 123 email foo@bar.com name acme
Then you could update a particular entry in the company record with HMSET:
HMSET companies:123 email bar@foo.com
Since it sounds like being able to look up a given record by different fields is really important to your use case, you may also want to consider adding RediSearch and indexing the fields you want to search on. For the set of fields listed above, an index like:
FT.CREATE companies-idx ON HASH PREFIX 1 companies: SCHEMA id TAG email TEXT name TEXT
might be appropriate - then you could look up a company with a given email like:
FT.SEARCH companies-idx "@email:foo"
I use the Laravel session with the database driver. I would like to know how to read back the information stored for ALL the sessions, that is, across all the rows in the session table.
I am developing a booking room system. I use the session in this way:
First step:
A user searches for all the available rooms for a certain date. A list appears with all the available rooms.
Second step:
After the user selects the room, they are redirected to the payment form.
In the first step when the user selects the room, the room id is stored in the session.
The thing I would like to do is this:
The session is used to store all the rooms chosen by users, in case two or more users are looking for the same room in the same period, so that a room is not available in other users' searches until the first user pays.
Laravel has a payload column where it stores all the key and value but it is only a sequence of letter and number.
When you call \Session::put('foo', 'bar'), the value is added to an associative array that holds all the session data. If you are using the database driver, Laravel serializes that array to store it as a string; that is why you only see a long sequence of letters and numbers. Working with the session for this will be complicated because you have to serialize/unserialize it over and over again.
How to block the room? Well, there are many ways to do that; it all depends on your architecture. For example, you can store selected rooms in another table for 5 minutes (enough time to pay). Let's say you work with two tables:
selected_rooms
------------------
number | expire_at
and...
rooms
------------------
number | beds | busy
The user searches for a room. The system must ignore all the rooms that have a reference in the selected_rooms table that has not yet expired.
When the user selects a room, you store it in the selected_rooms table with an expire_at value of now + 5 minutes.
When the user pays for the room, you can remove it from selected_rooms. If the user just logs out or leaves the PC, it does not matter, because after 5 minutes the room is available again.
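To show the mechanism independent of Laravel specifics, here is a minimal JDBC sketch of the same two-table idea (in Laravel you would express these queries with the query builder or Eloquent; the busy = already-paid semantics is an assumption about the rooms table above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

public class RoomReservation {
    // Rooms are available only if no unexpired hold exists in selected_rooms.
    static final String AVAILABLE_ROOMS =
        "SELECT r.number FROM rooms r " +
        "WHERE r.busy = FALSE AND NOT EXISTS (" +
        "  SELECT 1 FROM selected_rooms s " +
        "  WHERE s.number = r.number AND s.expire_at > ?)";

    // Hold a room for 5 minutes while the user pays.
    static void holdRoom(Connection conn, int roomNumber) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO selected_rooms (number, expire_at) VALUES (?, ?)")) {
            ps.setInt(1, roomNumber);
            ps.setTimestamp(2, Timestamp.from(Instant.now().plusSeconds(300)));
            ps.executeUpdate();
        }
    }
}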
Here is my case. I have user fields like name, phone, email, city, country, and other attributes. If I search by name, phone, or email, I need to get the details of all users satisfying those conditions in a very short time. As far as I know, to implement this in a key-value cache, I need to store the same data under different keys.
My current implementation plan is as follows. All of the following info is stored in my cache:
User level complete data
unique_user_id(XXX) : {name: abc, phone: 12345, email: abc@gmail.com, etc.}
unique_user_id(YYY) : {name: def, phone: 67891, email: def@gmail.com, etc.}
Email level info
abc@gmail.com : XXX
def@gmail.com : YYY
Name level info
abc : XXX
def : YYY
Phone level info
12345 : XXX
67891 : YYY
If I search by email, I query the 'Email level info' to get the unique user id, then query the 'User level info' for the user details. Here I have duplicated data for each type of possible key. I do not want to duplicate data; I want a single key-value entry for each user.
Is there a better way than a key-value cache, where the data stored is minimal and the response time is very low? I am okay with using a database along with hash-indexing kinds of techniques. Please suggest appropriate software for this.
I have never used caches before, so please bear with my understanding and language.
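For reference, the plan described above is the standard secondary-index pattern, and the duplication is smaller than it looks: the index keys hold only the user id (a few bytes each), while the full record lives in exactly one place (the user hash). A minimal Jedis sketch, with illustrative key prefixes (user:, email:, name:, phone: are made up for the example):

import redis.clients.jedis.Jedis;
import java.util.Map;

public class UserLookup {
    // Two-step lookup: the index key resolves to the id, the hash holds
    // the single copy of the full record.
    static Map<String, String> findByEmail(Jedis jedis, String email) {
        String userId = jedis.get("email:" + email);  // e.g. email:abc@gmail.com -> XXX
        if (userId == null) return null;              // no user with this email
        return jedis.hgetAll("user:" + userId);
    }

    // Keep the index entries in sync whenever a user is written.
    static void saveUser(Jedis jedis, String userId, Map<String, String> fields) {
        jedis.hset("user:" + userId, fields);         // full record, stored once
        jedis.set("email:" + fields.get("email"), userId);
        jedis.set("name:" + fields.get("name"), userId);
        jedis.set("phone:" + fields.get("phone"), userId);
    }
}

If even the id-sized index keys are unwanted, a store with real secondary indexes (e.g. RediSearch over hashes, or a relational database with indexes on name, phone, and email) would be the usual alternative.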