Hoping to get some sage advice from those who have been in the trenches and can help me better understand when to keep data clustered together in Redis, and when to split it up.
I am working on a multi-tenant platform that has GPS data coming in every 3-5 seconds for each asset I track. As the data is processed, we store additional information associated with the asset (e.g. is it on time, is it late, etc.).
Each asset belongs to one or more tenants. For example, when tracking a family car, that car may exist for both the husband and the wife, and each needs to know where it is relative to their needs. The car may be in use by the teenager and on time for the husband to use it at 3:00 pm, but late for the wife to use it at 2:30 pm.
As an additional requirement, a single tenant may want read access to other tenants, e.g. the Dad wants to see the family car and any teenager's cars. So the hierarchy can start to look something like this:
Super-Tenant (Family)
--Tenant (Dad)
----Vehicle 1
------Gps:123.456,15.123
------Status:OnTime
----Vehicle 2
------Gps:123.872,15.563
------Status:Unused
--Tenant (Mom)
----Vehicle 1
------Gps:123.456,15.123
------Status:Late
----Vehicle 2
------Gps:123.872,15.563
------Status:Unused
--Tenant (Teenager)
----Vehicle 1
------Gps:123.456,15.123
------Status:Unused
----Vehicle 2
------Gps:123.872,15.563
------Status:Unused
My question has to do with the best way to store this in Redis.
I can store by tenant, i.e. use a key for Dad, then have a collection of all of the vehicles he has access to. Each time a new GPS location comes in (whether for Vehicle 1 or Vehicle 2), I would update the contents of the collection. My concern is that if there are dozens of vehicles, we would be updating his collection way too often.
Or
I can store by tenant, then by vehicle. This means that when Vehicle 1's GPS location comes in, I will be updating information across 3 different tenants. Not too bad.
What gives me pause is that I am working on a website for Dad to see all his vehicles. That call is going to come in and ask for all Vehicles under the Tenant of Dad. If I split out the data so it is stored by tenant/vehicle, then I will have to store the fact that Dad has 2 vehicles, then ask Redis for everything in (key1,key2,etc).
If I make it so that everything is stored in a collection under the Dad tenant, then my request to Redis would be much simpler: just ask for everything under the key Dad.
In reality, each tenant will have between 5 and 100 vehicles, and we have hundreds of tenants.
Based on your experience, what would be your preferred approach (please feel free to offer any approach not mentioned here)?
From your question it appears you're hoping to store everything you need under one key. Redis doesn't support nested hashes as-is; this answer offers a few recommendations for working around that.
Given the update cadence of the GPS data, it's best to minimize the total number of writes required. This may increase the number of operations needed to construct a response on read; however, adding read-only replica instances should allow you to scale reads, and you can tune your writes with some pipelining of updates.
From what you have described, the updates are limited to the GPS position and status of a vehicle for a user, and the data requested on read is a single user's view of their set of vehicle positions and statuses.
I would start with a Tenant stored as a hash with the user's name and fields referencing the vehicles and sessions associated with the user. This is not strictly necessary if you adopt consistent naming conventions, but it is shown as an option in case additional user data needs to be cached.
- Tenant user-1 (Hash)
-- name: Dad (String)
-- vehicles: user-1:vehicles (String) reference to key for set of vehicles assigned.
-- sessions: user-1:sessions (String) reference to key for set of user-vehicle sessions.
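As a rough sketch in Python with redis-py (key names taken from the schema above; the client setup is assumed):

    import redis

    r = redis.Redis(decode_responses=True)

    # Cache the tenant hash, with references to its vehicle set and session index.
    r.hset("user-1", mapping={
        "name": "Dad",
        "vehicles": "user-1:vehicles",
        "sessions": "user-1:sessions",
    })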
The lookup of the vehicles could be done with key formatting alone if none of the other tenant data is needed. The members would be references to the keys of the vehicles.
- UserVehicles user-1:vehicles (Set)
-- member: vehicle-1 (String)
This would allow lookup of the details for each vehicle. Vehicles would have their position; you could also include a field referencing a vehicle-centric schedule of sessions similar to the user sessions below, and a vehicle name or other data if that is required for the response.
- Vehicle: vehicle-1 (Hash)
-- gps: "123.456,15.123" (String)
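Continuing the sketch, populating the vehicle set and the vehicle hashes might look like this (GPS values taken from the question's example data):

    # Register the vehicles Dad has access to, then store each vehicle's position.
    r.sadd("user-1:vehicles", "vehicle-1", "vehicle-2")
    r.hset("vehicle-1", "gps", "123.456,15.123")
    r.hset("vehicle-2", "gps", "123.872,15.563")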
Store the sessions specific to a user in a sorted set. The members would be references to the keys storing session information. The score would be set to a timestamp value allowing range lookups for recent and upcoming sessions for that user.
- Schedule user-1:sessions (Sorted Set)
-- member: user-1:vehicle-1:session-1 (String)
-- score: 1638216000 (Float)
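A sketch of writing and querying the schedule, assuming Unix-timestamp scores as above:

    import time

    # Index the session key by its scheduled start time.
    r.zadd("user-1:sessions", {"user-1:vehicle-1:session-1": 1638216000})

    # Range lookup: this user's sessions over the next 24 hours.
    now = int(time.time())
    upcoming = r.zrangebyscore("user-1:sessions", now, now + 86400)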
For a tenant's sessions on a vehicle, you could simply store the status as a string. An alternative is shown here that also allows storing scheduled and available times, in case you need to support vehicle-centric views of a schedule; combining this with a sorted set of vehicle sessions would round that out.
- Session user-1:vehicle-1:session-1 (Hash)
-- status: OnTime (String)
-- scheduled_start: 1638216000 (String) [optional]
-- scheduled_end: 1638216600 (String) [optional]
-- earliest_available: 1638215000 (String) [optional]
If you're not tracking state elsewhere, you could use a hash to store ID counters for the cache objects, for when it comes time to issue a new one. Read and increment these atomically when adding new cache objects.
- Globals: global (Hash)
-- user: 0
-- vehicle: 0
-- session: 0
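HINCRBY performs the read-and-increment in one atomic step, so issuing a new ID could look like this (a sketch; key names follow the hash above):

    # Atomically claim the next session ID; no separate read is needed.
    next_session = r.hincrby("global", "session", 1)
    session_key = f"user-1:vehicle-1:session-{next_session}"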
For updates you would have something like 200k write operations per update cycle:
- 100k tenant-vehicle pairs (1,000 tenants * 100 vehicles/tenant), each with:
-- 1 HSET vehicle
-- 1 HSET session
Pipelining and tuning the number of requests in the pipeline can improve write performance, but I would anticipate you should be able to complete all writes in <2s.
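A minimal sketch of one pipelined write batch, assuming each incoming GPS fix carries the vehicle key and the affected tenant-session keys (the tuple shape is illustrative):

    def apply_updates(r, updates, batch_size=1000):
        """updates: iterable of (vehicle_key, gps, session_key, status) tuples."""
        pipe = r.pipeline(transaction=False)
        pending = 0
        for vehicle_key, gps, session_key, status in updates:
            pipe.hset(vehicle_key, "gps", gps)
            pipe.hset(session_key, "status", status)
            pending += 2
            if pending >= batch_size:
                pipe.execute()  # flush; tune batch_size for throughput
                pending = 0
        if pending:
            pipe.execute()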
For a read you would have something like ~300 operations per user per request:
- 1 HGETALL user
- 1 ZRANGESTORE tempUSessions user-sessions LIMIT 200 (find up to 200 sessions in a timeframe for the user)
- 200 HGETALL session
- 1 SMEMBERS user-vehicles (find all vehicles for the user)
- 100 HGET vehicle gps (get vehicle locations for all vehicles)
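The read side, sketched with the same client (ZRANGEBYSCORE is used here instead of ZRANGESTORE just to keep the sketch self-contained):

    def read_user_view(r, user_key, now, horizon=86400, limit=200):
        user = r.hgetall(user_key)                         # 1 HGETALL
        session_keys = r.zrangebyscore(                    # sessions in the timeframe
            user["sessions"], now, now + horizon, start=0, num=limit)
        vehicle_keys = list(r.smembers(user["vehicles"]))  # 1 SMEMBERS

        # Pipeline the per-session and per-vehicle reads into one round trip.
        pipe = r.pipeline(transaction=False)
        for key in session_keys:
            pipe.hgetall(key)                              # up to 200 HGETALL
        for key in vehicle_keys:
            pipe.hget(key, "gps")                          # up to 100 HGET
        results = pipe.execute()

        sessions = results[:len(session_keys)]
        positions = dict(zip(vehicle_keys, results[len(session_keys):]))
        return user, sessions, positions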
Considerations:
A process to periodically remove sessions and their references after they pass would keep memory from growing unbounded and keep performance consistent.
Adding some scripts to make cache updates easier when a new user or vehicle is added, and to return the state you described needing for display to a user, would round this out.
Related
I am building the inventory service; all tables keep track of the owner of each record in a createdBy column, which stores the user ID.
The problem is that this service does not hold the user info, so it cannot map the ID to a username, which the FE requires to display the data.
Calling the user service to map user IDs to usernames on each request does not make sense in terms of decoupling and performance, because one request can ask for up to 100 records. And if I store the username instead of the ID, there will be a problem when a user changes their username.
Is there a better way or pattern to solve this problem?
I'd extend the stored info with the data needed from the user service.
A username is a slowly changing dimension, so most of the time the data is correct (i.e. "safe to cache").
Now we get to what to do when the user info changes; this is, of course, a business decision. In some places it makes sense to keep the original info (for example, when the user is deleted, do we still want to keep the original username, and whatever other info, of whoever created the item?). If this is not the case, you can use several strategies, depending on the requirement for freshness: you can run a daily (or whatever period) job that refreshes the user info from the user service for all users referenced in the inventory, you can publish a daily summary of changes from the user service and have the inventory subscribe to that, or you can publish changes as they happen and subscribe to those. The technology to use depends on the strategy.
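A sketch of the periodic-refresh strategy in Python, assuming a hypothetical bulk-lookup endpoint on the user service and an inventory table with createdBy/creatorName columns (the endpoint, column names, and db handle are all illustrative, not from any real API):

    import requests

    def refresh_usernames(db, user_service_url):
        # Collect the distinct user IDs referenced by the inventory.
        ids = [row[0] for row in db.execute("SELECT DISTINCT createdBy FROM inventory")]

        # Hypothetical bulk endpoint: {"ids": [...]} -> {"users": [{"id": ..., "username": ...}]}
        resp = requests.post(f"{user_service_url}/users/bulk", json={"ids": ids})
        resp.raise_for_status()

        # Overwrite the cached usernames with the fresh values.
        for user in resp.json()["users"]:
            db.execute("UPDATE inventory SET creatorName = ? WHERE createdBy = ?",
                       (user["username"], user["id"]))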
In my opinion, what you have done so far is correct. Inventory-related data should be the Inventory Service's responsibility, just like user-related data should be the User Service's.
It is the FE's responsibility to fetch the relevant user details from the User Service that are required to populate the UI (remember, calling the backend for each user is not acceptable at all; a bulk search is more suitable).
What you can do is, when you fetch inventory data from the Inventory Service, publish a message to the User Service to notify it that "inventory-related data was fetched for these users, so user-related data for them is likely to be fetched next; you had better cache it."
PS: I'm not an expert in microservices architecture. Please add any counter-arguments if you have any.
We are planning to store price data in Memcache. Prices vary by car variant and location (city). This is how it is stored in the database:

variant | city | price
21      | 48   | 40000
Now the question is how we should store this data in Memcache.
Possibility 1: We store each price in a separate cache object and do a multiget when the prices of all variants belonging to a model need to be displayed on a single page.
Possibility 2: We store prices at the model + city level. The prices of all variants of a model are stored in a single object. This object will be slightly heavier, but a multiget wouldn't be required.
Need your help in making the right decision.
TLDR: It all depends on how you want to expose the feature to your end users, and what the query pattern looks like.
For example:
If your flow is that a user can see all the variant prices on a detail page for a city, then you could use <city_id>_<car_model_id> as the key, and store all data for variants against that key (Possibility 2).
If the flow is that a user can see prices of all variants across cities on a single page, then you would need the key as <car_model_id> and store all data as JSON against this key.
If the flow is that a user can see prices of one variant at a time only for every city, then you would use the key <city_id>_<car_variant_id> and store prices.
One thing to definitely keep in mind is the frequency with which you may have to refresh the cache / perform upserts, which in the case of cars should be infrequent (who changes the price of a car every day or second?). So I would have gone with the first flow above (Possibility 2 as described by you).
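A sketch of Possibility 2 in Python with pymemcache, storing all variant prices for one model and city under a single key as JSON (the key format follows the first flow above; model ID 7 and the second variant's price are made-up example values):

    import json
    from pymemcache.client.base import Client

    client = Client(("localhost", 11211))

    # Write: one object per (city, model), holding every variant's price.
    prices = {"21": 40000, "22": 42500}  # variant_id -> price
    client.set("48_7", json.dumps(prices).encode())  # key: <city_id>_<car_model_id>

    # Read on the detail page: a single get instead of a multiget.
    cached = client.get("48_7")
    if cached is not None:
        prices = json.loads(cached)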
I use the Laravel session with the database driver. I would like to know how to retrieve the information stored across ALL the sessions, meaning across all the rows in the session table.
I am developing a booking room system. I use the session in this way:
First step:
A user searches for all the available rooms for a certain date. A list appears with all the available rooms.
Second step:
After the user selects the room, they are redirected to the payment form.
In the first step, when the user selects the room, the room ID is stored in the session.
The thing I would like to do is this:
I want the session to store all the rooms chosen by users, in case two or more users are looking for the same room in the same period, so that a room is not available in other users' searches until the first user pays.
Laravel has a payload column where it stores all the keys and values, but it is only a sequence of letters and numbers.
When you call \Session::put('foo', 'bar'), the value is added to an associative array that holds all the session data. If you are using the database driver, Laravel serializes that array to store it as a string; that is why you only see a lot of text. So working across sessions this way will be complicated, because you would have to serialize/unserialize over and over.
How to block the room? Well, there are many ways to do that; it all depends on your architecture. For example, you can store selected rooms in another table for 5 minutes (enough time to pay). Let's say you work with two tables:
selected_rooms
------------------
number | expire_at
and...
rooms
------------------
number | beds | busy
The user searches for a cool room. The system must ignore all the rooms that have a non-expired reference in the selected_rooms table.
When the user selects a room, you store it in the selected_rooms table with an expire_at value of now + 5 minutes.
When the user pays for the room, you remove it from selected_rooms. If the user just logs out or leaves the PC, it does not matter, because after 5 minutes the room is available again.
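The availability query is the key piece: filter out rooms with a non-expired hold. A sketch in Python with sqlite3 (table names from above; storing expire_at as a Unix timestamp is an assumption):

    import sqlite3, time

    db = sqlite3.connect("booking.db")

    def available_rooms():
        now = int(time.time())
        # Rooms with no hold, or whose hold has already expired.
        return [row[0] for row in db.execute("""
            SELECT r.number FROM rooms r
            LEFT JOIN selected_rooms s
                   ON s.number = r.number AND s.expire_at > ?
            WHERE s.number IS NULL
        """, (now,))]

    def hold_room(number):
        # Place a 5-minute hold when the user selects a room.
        db.execute("INSERT INTO selected_rooms (number, expire_at) VALUES (?, ?)",
                   (number, int(time.time()) + 300))
        db.commit()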
I am developing a social networking site like Facebook. I am confused about how to structure the notification table. Should it be separate for each user, or one huge table for all users, where records are added and deleted frequently?
I have the same problem as you and found this upon researching, where the table structure given is:
id
user_id (int)
activity_type (tinyint)
source_id (int)
parent_id (int)
parent_type (tinyint)
time (datetime but a smaller type like int would be better)
where:
activity_type tells me the type of activity, source_id tells me the record that the activity is related to. So if the activity type means "added favorite" then I know that the source_id refers to the ID of a favorite record.
The parent_id/parent_type are useful for my app - they tell me what the activity is related to. If a book was favorited, then parent_id/parent_type would tell me that the activity relates to a book (type) with a given primary key (id)
I index on (user_id, time) and query for activities that are user_id IN (...friends...) AND time > some-cutoff-point. Ditching the id and choosing a different clustered index might be a good idea - I haven't experimented with that.
Pretty basic stuff, but it works, it's simple, and it is easy to work with as your needs change. Also, if you aren't using MySQL you might be able to do better index-wise.
It was also suggested there to use Redis for faster access to the most recent activities.
With Redis in the mix, it might work like this:
Create your MySQL activity record
For each friend of the user who created the activity, push the ID onto their activity list in Redis.
Trim each list to the last X items
Redis is fast and offers a way to pipeline commands across one connection - so pushing an activity out to 1000 friends takes milliseconds.
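A sketch of that fan-out in Python with redis-py, pushing onto each friend's list and trimming it, all in one pipeline (the activity:<user_id> key name is an assumption):

    import redis

    r = redis.Redis()

    def fan_out(activity_id, friend_ids, max_items=1000):
        # One round trip: LPUSH + LTRIM per friend, pipelined.
        pipe = r.pipeline(transaction=False)
        for fid in friend_ids:
            key = f"activity:{fid}"
            pipe.lpush(key, activity_id)
            pipe.ltrim(key, 0, max_items - 1)  # keep only the last X items
        pipe.execute()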
For a more detailed explanation of what I am talking about, see Redis' Twitter example: http://code.google.com/p/redis/wiki/TwitterAlikeExample
I hope this might help you as well.
What might be a good way to introduce BIGINT into the ASP.NET Membership functionality to reference users uniquely, and to use that BIGINT field as a tenant_id? It would be ideal to keep the existing functionality generating UserIds in the form of GUIDs and not to implement a membership provider from scratch. Since the application will be running on multiple servers, the BIGINT tenant_id must be unique, and it should not depend on some central authority generating these IDs. It would be easy to use these tenant_ids with a SPLIT AT command down the road, which would allow bucketing users into new federation members. Any thoughts on this?
Thanks
You can use bigint, but you may have to modify all stored procedures that rely on the user ID. Making the ID globally unique is usually not a problem: as long as the ID is the primary key, the database will force it to be unique. Otherwise you will get errors when inserting new data (in that case, you can modify the ID and retry).
So the most important difference is that you may need to modify stored procedures. You have a choice here: if you use GUIDs, you don't need to do anything, but it may be difficult to predict how to split the federation to balance queries. As pointed out in another thread (http://stackoverflow.com/questions/10885768/sql-azure-split-on-uniqueidentifier-guid/10890552#comment14211028_10890552), you can split existing data at the mid point, but you don't know which federation future data will be inserted into. There's a potential risk that the federations will become unbalanced, and you may need to merge and split them at regular intervals to keep them in shape.
By using bigint, you have better control over the key. For example, suppose you have two federations: the first has IDs from 1 to 10000, and the second has IDs from 10001 to 20000. When creating a new user, you first check how many records are in each federation. Suppose federation 1 has 500 records and federation 2 has 1000 records; to balance the load, you choose to insert into federation 1, so you pick an ID between 1 and 10000. But with bigint, you may need to do more work to modify the stored procedures.
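A sketch of that balancing logic (the in-memory bookkeeping is purely illustrative; with SQL Azure Federations you would query each federation member for its row count):

    # Each federation owns a fixed ID range and tracks its record count
    # and the next unused ID within its range.
    federations = [
        {"low": 1, "high": 10000, "count": 500, "next_id": 501},
        {"low": 10001, "high": 20000, "count": 1000, "next_id": 11001},
    ]

    def allocate_user_id():
        # Insert into the least-loaded federation, picking an ID from its range.
        fed = min(federations, key=lambda f: f["count"])
        if fed["next_id"] > fed["high"]:
            raise RuntimeError("federation range exhausted; time to SPLIT AT")
        new_id = fed["next_id"]
        fed["next_id"] += 1
        fed["count"] += 1
        return new_id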