Approach or strategy for storing object collections in Cache - caching

I am trying to store collections of objects in cache.
{
"name": "Dep1",
"employees": [{
"id": 1,
"name": "emp1",
"profilePic": "http://test.com/img1.png"
}, {
"id": 2,
"name": "emp2",
"profilePic": "http://test.com/img2.png"
}, {
"id": 3,
"name": "emp3",
"profilePic": "http://test.com/img3.png"
}, {
"id": 4,
"name": "emp4",
"profilePic": "http://test.com/img4.png"
}]
}
In this case, if employee 1 changes their profile picture, I need to invalidate the whole cached object to keep the data consistent.
This undermines the point of caching, because any update to a single employee forces me to clear the complete object.
Is there a better approach or design that can be followed to optimize this?
Thanks

Is this supposed to be a Redis question? I'll assume it's general purpose.
Retrieve only IDs, then request each entry by its ID.
Store the list of entries per request as merely the list of IDs.
You can now cache the list as well as each entry.
Since you are only retrieving IDs, you can build very simple, efficient indexes on the data (the result can be served directly from the index, since it is just the ID). An update may not need to invalidate the cached list at all; invalidating the cached entry is often sufficient. Even if you do invalidate the list, performance should still be fairly good, since the entry caches are unlikely to be invalidated very often.
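To make the idea concrete, here is a minimal in-memory sketch. It assumes two loader callables, `load_ids` and `load_employee`, which are hypothetical stand-ins for whatever your data store provides; the class and method names are also made up for illustration, and a real setup would likely use Redis or similar instead of plain dicts.

```python
from typing import Callable, Dict, List


class DepartmentCache:
    """Caches a department as a list of employee IDs plus one entry per employee."""

    def __init__(self, load_ids: Callable[[str], List[int]],
                 load_employee: Callable[[int], dict]) -> None:
        self._load_ids = load_ids            # hypothetical loader: department -> IDs
        self._load_employee = load_employee  # hypothetical loader: ID -> employee
        self._id_lists: Dict[str, List[int]] = {}
        self._employees: Dict[int, dict] = {}

    def get_department(self, name: str) -> dict:
        # The list cache holds only IDs, never the employee objects themselves.
        if name not in self._id_lists:
            self._id_lists[name] = self._load_ids(name)
        employees = []
        for emp_id in self._id_lists[name]:
            if emp_id not in self._employees:
                self._employees[emp_id] = self._load_employee(emp_id)
            employees.append(self._employees[emp_id])
        return {"name": name, "employees": employees}

    def invalidate_employee(self, emp_id: int) -> None:
        # A profile change drops only this one entry; the ID list stays cached.
        self._employees.pop(emp_id, None)

    def invalidate_department(self, name: str) -> None:
        # Only needed when membership changes (employee added or removed).
        self._id_lists.pop(name, None)
```

With this layout, a profile picture change calls `invalidate_employee(1)` and the next read reloads exactly one employee instead of the whole department.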

Related

Count how many requests there were to a specific document

I'm using an Elasticsearch index as a cache table for a search API.
I am currently using the following mapping:
{
"mappings": {
"dynamic": false,
"properties": {
"query_str": {"type": "text"},
"search_results": {
"type": "object",
"enabled": false
},
"query_embedding": {
"type": "dense_vector",
"dims": 768
}
}
}
}
The cache lookup is performed via embedding vector similarity: if the embedding of a new query is close enough to a cached one, it counts as a cache hit, and the search_results field is returned to the user.
I want to evict cached search results that are unpopular with users (i.e. have a low cache hit rate). To do that, I need to count how many cache hits (i.e. request hits) each document received over a certain period (the last month, for example).
I understand that I can explicitly add a hit_rate field and update it every time a new query hits a cached query, but is there a more elegant way to do this (perhaps via some built-in Elasticsearch statistic)?
That's not possible with a built-in statistic. For comparison, the App Search product has an analytics feature that records document clicks, and it uses a separate index to do so (it also stores the search query).
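If you go with the explicit counter from the question, a rough sketch of the request bodies could look like the following. The field names `hit_count` and `last_hit_at` are hypothetical, and so are the helper names; the first body is meant for the Update API with a Painless script, the second for a delete-by-query. With "dynamic": false, the new fields would need to be added to the mapping before they can be queried.

```python
from datetime import datetime, timezone
from typing import Optional


def build_hit_update(now: Optional[datetime] = None) -> dict:
    """Body for a scripted update that bumps a per-document hit counter."""
    now = now or datetime.now(timezone.utc)
    return {
        "script": {
            "lang": "painless",
            # hit_count and last_hit_at are assumed field names, not built-ins.
            "source": (
                "if (ctx._source.hit_count == null) { ctx._source.hit_count = 1; } "
                "else { ctx._source.hit_count += 1; } "
                "ctx._source.last_hit_at = params.now;"
            ),
            "params": {"now": now.isoformat()},
        }
    }


def build_stale_query(cutoff: datetime) -> dict:
    """Query matching cached entries whose last recorded hit is older than `cutoff`."""
    return {"query": {"range": {"last_hit_at": {"lt": cutoff.isoformat()}}}}
```

Entries that were never hit have no `last_hit_at` and would not match the range query, so you may want to set an initial value when the cache entry is created.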

How to prevent unnecessary G Suite API data consumption?

I am currently consuming data from the G Suite API.
An inconvenience I have found is that for some of the APIs the number of resources available might be quite large.
For instance, when I consume the Users:list API (https://www.googleapis.com/admin/directory/v1/users), given the number of resources and the maximum number of results per query I need to perform a significant number of queries. Find below an example JSON response:
{
"kind": "admin#directory#users",
"etag": "\"WqpSTs-zelqnIvn63V............................/v3ENarMfXkTh9ijs3OVkQRoUSVU\"",
"users": [
{
"kind": "admin#directory#user",
"id": "7720745322191632224007",
"etag": "\"WqpSTs-zelqnIvn63V........................PfcSmik3zEJwHAl1UbgSk\"",
"primaryEmail": ...,
...
},
{
"kind": "admin#directory#user",
"id": "227945583287518253104",
"etag": "\"WqpSTs-zelqnIvn63V..........-zY30eInIGOmLI\"",
"primaryEmail": ...,
...
},
...
N-users
...
]
}
I am running this query several times a day.
Ideally I would only retrieve the resources that have changed and the new ones, excluding from the response the ones that have not changed.
Is it possible to do that? If so, how?
Thank you in advance for your answers.
You could create custom attributes for your users, and then filter your requests using the query parameter according to your custom attribute.
Alternatively, define exactly what you mean by "changed" or "not changed": user properties change on every login, because the last login attribute is updated.
Update:
You can watch for changes on the list of users in your domain by supplying an address to receive notifications in a POST request to the watch endpoint:
https://www.googleapis.com/admin/directory/v1/users/watch
References:
Users.watch
Custom User Fields
Query string for User fields
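As a rough illustration of the watch call, the sketch below only builds the URL and JSON body and leaves out authentication and sending. The query parameters (`domain`, `event`) and body fields (`id`, `type`, `address`) are how I remember the push notification channel format, so treat them as assumptions and check them against the Users.watch reference; `build_watch_request` and the callback URL are hypothetical.

```python
import uuid
from urllib.parse import urlencode

WATCH_URL = "https://www.googleapis.com/admin/directory/v1/users/watch"


def build_watch_request(domain: str, callback_url: str, event: str = "update"):
    """Return (url, json_body) for a users.watch call; sending it is left out."""
    url = WATCH_URL + "?" + urlencode({"domain": domain, "event": event})
    body = {
        "id": str(uuid.uuid4()),  # channel ID chosen by the caller
        "type": "web_hook",
        "address": callback_url,  # hypothetical HTTPS endpoint you control
    }
    return url, body
```

You would then POST the body to the URL with an authorized client and handle the notifications that arrive at your callback address, fetching only the users they refer to.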

Using elastic search to build flow/funnel results based on unique identifiers

I want to be able to return a set of counts of individual documents from a single index based on a previous set of results, and am wondering if there is a way to do it without running a separate query for each.
So, given a data set like this (simplified version of my ES documents):
{
"name": "visit",
"sessionId": "session1"
},
{
"name": "visit",
"sessionId": "session2"
},
{
"name": "visit",
"sessionId": "session3"
},
{
"name": "click",
"sessionId": "session1"
},
{
"name": "click",
"sessionId": "session3"
}
What I would like to do is search for name: visit and get a count of all matching documents. That part is easy. But I would also like to count the name: click documents whose sessionId appears in the name: visit result set, and return that count alongside the name: visit count.
Is there an easy way to do this? I have looked at the aggregation APIs, but none of them seems to quite fit my needs. There is also the parent/child relationship, but it doesn't apply to my situation, since both kinds of document I want to count are of the same type.
Expected result would be something like this:
{
"count": {
// total number of visit events since this is my start point
"visit": 3,
// the amount of click results that have sessionId
// matching my previous search's sessionId values
"click": 2
}
}
At first glance, you need to do this in two queries:
the first aggregation query to retrieve the sessionIds and
a second aggregation query filtered with those sessionIds to find the count of clicks.
I don't think it's a big deal to run those two queries, but that depends on how much data you have and how many sessionIds you want to retrieve at once.
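The two queries could be sketched roughly like this. The snippet assumes `name` and `sessionId` are mapped as keyword fields, and the function names and the `max_sessions` default are made up for illustration. `funnel_counts` is a plain in-memory version of the same two steps, included only so the logic can be checked against the sample documents.

```python
from typing import Iterable, List


def visit_sessions_query(max_sessions: int = 10000) -> dict:
    """First query: match visits and collect their sessionIds in a terms aggregation."""
    return {
        "size": 0,
        "query": {"term": {"name": "visit"}},
        "aggs": {"sessions": {"terms": {"field": "sessionId", "size": max_sessions}}},
    }


def click_count_query(session_ids: Iterable[str]) -> dict:
    """Second query: count clicks restricted to the sessionIds from the first query."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"name": "click"}},
                    {"terms": {"sessionId": list(session_ids)}},
                ]
            }
        }
    }


def funnel_counts(docs: List[dict]) -> dict:
    """In-memory equivalent of the two steps, for checking the logic."""
    visits = [d for d in docs if d["name"] == "visit"]
    visit_sessions = {d["sessionId"] for d in visits}
    clicks = [d for d in docs
              if d["name"] == "click" and d["sessionId"] in visit_sessions]
    return {"count": {"visit": len(visits), "click": len(clicks)}}
```

The visit count comes from the total hits of the first query, and the bucket keys of the `sessions` aggregation feed the second one. Keep in mind that a terms query accepts a limited number of values (65,536 by default, as far as I know), so very large session sets would need to be processed in batches.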

Rethinkdb: Including a subdocument for nested doc

I am performing an operation, and it works, but I want to know if there is a better or more efficient way to do what I want.
I have an object in my db that looks like this:
{
"id": "testId",
"name": "testName",
"products": [
{
"name": "product1",
"info": "sampleInfo",
"templateIds": [
"asdf-1",
"asdf-2"
]
},
{
"name": "product2",
"info": "sampleInfo",
"templateIds": [
"asdf-1",
"asdf-2"
]
}
]
}
As you can see, each "product" in the "products" array has a sub-array of templateIds. These match templates stored in another table. What I want to do is create a query that merges those templates onto each product object before I send it all back.
Currently I am doing this with sub-merges:
r.table('suites').get('testId').merge(function(suite){
return {
products: suite('products').merge(function(product){
return {
templates: r.expr(product('templateIds')).map(function(id) {
return r.table('templates').get(id)
})
}
})
}
})
My question is: is there a more efficient way to do this? Or is there a completely different way of thinking I should employ to do this?
Thanks guys!
That looks right to me. The only thing I can think of is that r.table('templates').getAll(r.args(product('templateIds'))) (get_all in the Python and Ruby drivers) is shorter than product('templateIds').map(function(id){ return r.table('templates').get(id);}) and might well be faster.
EDIT: If you have a small number of templates, another thing that would make this run faster would be to do the substitution in the client instead and cache the retrieved templates by ID. RethinkDB will have to do a separate read for each template ID, even if it sees the same one over and over again, because it doesn't know enough to know whether or not caching those values is safe.
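The client-side substitution could look something like this sketch. `fetch_template` is a hypothetical callable standing in for the actual driver call (for example a get by ID on the templates table), and `attach_templates` and the `cache` dict are names invented for illustration.

```python
from typing import Callable, Dict


def attach_templates(suite: dict, fetch_template: Callable[[str], dict],
                     cache: Dict[str, dict]) -> dict:
    """Return a copy of `suite` with a `templates` list added to each product."""
    products = []
    for product in suite["products"]:
        templates = []
        for template_id in product["templateIds"]:
            # Each distinct template ID is fetched at most once per cache.
            if template_id not in cache:
                cache[template_id] = fetch_template(template_id)
            templates.append(cache[template_id])
        products.append({**product, "templates": templates})
    return {**suite, "products": products}
```

For the sample suite above, both products reference asdf-1 and asdf-2, so this does two template reads instead of four. Whether reusing the cache across requests is safe depends on how often templates change, which only the application can know.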

RestKit 2.0 - Mapping JSON array to an entity relationship loses array sequence

I have a problem mapping JSON to Core Data and reading it out again. I map from JSON to an Activity entity with a relationship to last-participant entities. last_participants is an array of the most recent participants, ordered most recent first by the API.
{
"id": 50,
"type": "Initiative",
"last_participants": [
{
"id": 15,
"first_name": "Chris"
},
{
"id": 3,
"first_name": "Mary"
},
{
"id": 213,
"first_name": "Dany"
}
]
}
I have RestKit logging on and can see that the mapping reads the array elements one by one and keeps the order. However, Core Data saves them as an NSSet of entities, and the order gets lost. When I read the data back out, it is mixed up. What options do I have to keep the order in which the array was mapped? Any help would be great.
Two options:
Use an ordered set in Core Data (tick the Ordered checkbox on the to-many relationship in the data model inspector).
Use the @metadata provided by RestKit to access the collection order during mapping (for example, the @metadata.mapping.collectionIndex key path).
