Redis data structure design for inbox

My website allows users to communicate using conversations.
On the conversation-inbox page a user can see all the users who have contacted him, including a preview of the latest message from each user. The page is ordered by the date of the previewed message.
It looks roughly like this:
UserA "Some message.." 2016-3-3
UserB "Other message.." 2016-3-2
UserC "..." 2016-2-15
etc..
I was wondering what the correct combination of Redis data structures is to model this efficiently.
At first I thought about having a sorted set of the users (i.e. UserA, UserB, UserC), but that would mean looping over each user to fetch his latest message.
Is there a better way, avoiding the loop?
Thanks!

You'll need two data structures for each user's inbox: a Hash and a Sorted Set.
The Sorted Set's scores can all be set to 0, as we'll be using lexicographical ordering anyway (but there's no harm in setting them to the actual timestamp of the message, at least in the context of this answer). The members of the Sorted Set should be constructed in the following manner:
<date in YYYYMMDD>:<from user>:<message>
This will let you easily pull that view and page through it with ZREVRANGE.
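For example, assuming userX's inbox lives at a key like inbox:userX (a hypothetical naming scheme), adding messages and pulling the first page of ten could look like:
ZADD inbox:userX 0 "20160302:userB:Other message.."
ZADD inbox:userX 0 "20160303:userA:Some message.."
ZREVRANGE inbox:userX 0 9
Because the members start with a YYYYMMDD date and all scores are equal, ZREVRANGE returns them newest first; the next page is ZREVRANGE inbox:userX 10 19.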
But that's only half of the story - when userX is sent a new message from userA, you'll need some way of finding and removing userA's previous message from userX's inbox - that's why you need the Hash.
The Hash is used for looking up the latest message from a given user to userX. For each user who has messaged userX, keep a Hash field named after the sending user's ID (e.g. userA) whose value is the inbox Sorted Set member representing that user's latest message (same "syntax" as above). When a new message arrives, first fetch the previous message from the Hash, remove it from the Sorted Set, and then add the new message to the Sorted Set and update the Hash's field.
To make sure that Hash and Sorted Set are consistent, I recommend that you look into wrapping them together in a transaction. You can use a MULTI/EXEC block, but my preference is a Lua script.
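A minimal Lua sketch of that update, assuming a hypothetical key layout where KEYS[1] is the inbox Sorted Set, KEYS[2] is the lookup Hash, ARGV[1] is the sender's ID and ARGV[2] is the new member string:
-- drop the sender's previous message from the inbox, if there is one
local prev = redis.call('HGET', KEYS[2], ARGV[1])
if prev then
  redis.call('ZREM', KEYS[1], prev)
end
-- add the new message and point the Hash at it
redis.call('ZADD', KEYS[1], 0, ARGV[2])
redis.call('HSET', KEYS[2], ARGV[1], ARGV[2])
You'd load this once with SCRIPT LOAD and invoke it with EVALSHA, passing the two keys and two arguments; since Redis runs scripts atomically, the Hash and Sorted Set can never drift apart.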

Related

Map multiple values to a unique column in Elasticsearch

I want to work with Elasticsearch to process some WhatsApp chats. So I am initially planning the data load.
The problem is that the data exported from WhatsApp doesn't contain a real unique id per user; it only contains the name of the user taken from the contact directory of the device where the chat is exported (i.e. a user can change the number or have two numbers in the same group).
Because of that, I need to create a custom explicit mapping table between the user names and a self-generated unique id, which gets populated in an additional column.
Then, my question is: "How can I implement such kind of explicit mapping in Elasticsearch to generate an additional unique column?". Alternatively, a valid answer could be a totally different approach to the problem.
PS. As I write, I think the solution could be in the ingestion process, like in a python script, but I still want to post the question to understand if this is something that Elasticsearch can do by itself.
Yes, do it during the indexing process.
If you store the data that maps names to ids in a separate index, you can do this with an enrich processor: when you index the chat data, a pipeline adds whichever value you want to each document.
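A minimal sketch, assuming a hypothetical lookup index user-ids with fields user_name and user_id:
PUT /_enrich/policy/user-id-policy
{
  "match": {
    "indices": "user-ids",
    "match_field": "user_name",
    "enrich_fields": ["user_id"]
  }
}
POST /_enrich/policy/user-id-policy/_execute
PUT /_ingest/pipeline/add-user-id
{
  "processors": [
    { "enrich": { "policy_name": "user-id-policy", "field": "user_name", "target_field": "user" } }
  ]
}
Indexing the chats with ?pipeline=add-user-id then copies the matching user_id into each document under the user field.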
Also, note that Elasticsearch doesn't have columns, only fields.

Sending unnecessary data in one query response or making multiple queries?

I have been working on a project, and I have always followed this idea: don't send all the data in one call.
Here is an example.
Suppose there is an API that returns the list of students who can be added to a test they need to finish.
On the UI side, every student has an "add" button, which should show a pop-up if the student is already assigned to take the test, or a pop-up saying he has already finished the test.
I could join many tables and send all the data in one API call while fetching students, or
I could send just the students and then, on "add", call another API to make sure the above-mentioned conditions are met.
Which approach is better?
If I send all the data in one API call, only a few students might actually end up being assigned the test.
Checking if a student is already assigned or not should happen in the backend, not frontend, and also atomically so as to prevent duplicates - either using a database transaction or a unique constraint.
When the Add button is clicked then in any case a backend call will need to be made (to perform the actual Add). If the add failed, the backend can interpret the "unique constraint violation" database error and return a "student is already assigned" message.
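A minimal SQL sketch of that approach, using a hypothetical test_assignments table:
CREATE TABLE test_assignments (
    student_id INT NOT NULL,
    test_id    INT NOT NULL,
    PRIMARY KEY (student_id, test_id)  -- duplicate assignments are rejected atomically
);
INSERT INTO test_assignments (student_id, test_id) VALUES (42, 7);
-- a second identical INSERT fails with a unique/primary-key violation,
-- which the backend can translate into a "student is already assigned" response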
For the rest of the question, the rule is simply: don't fetch more data than is required by the UI.
If the Add button is always shown regardless of whether or not the student is already added, there is no need to retrieve this information beforehand.
But it might be useful to give a visual indication of which students are already added; in that case there's obviously no choice but to retrieve and return this information to the UI.
Fortunately GraphQL is precisely the tool for this job - it makes it possible for the UI to request exactly what information is needed for a given page, without having to code each and every possible query by hand.
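For instance, a hypothetical query for this page could request only what the list actually renders (all field names here are made up):
query StudentsForTest($testId: ID!) {
  students(testId: $testId) {
    id
    name
    isAssigned   # fetched only because the UI marks already-assigned students
  }
}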

Can I somehow tag data in Redis?

I have an object Company and multiple methods that can be used to get this object. Ex. GetById, GetByEmail, GetByName.
What I'd like is to cache those method calls with a possibility to invalidate all cache entries related to one object at once.
For example, a company is cached. There are 3 entries in cache with following keys:
Company:GetById:123
Company:GetByEmail:foo#bar.com
Company:GetByName:Acme
All three keys are related to one company.
Now let's assume that company has changed. Then I would like to invalidate all keys related to this company. I didn't find any built-in solution for that purpose.
Tagging cache entries with some common id (companyId for example) and then removing all entries by it would be great, but this feature doesn't seem to exist.
So to answer your question directly, you'd probably want to maintain all the keys related to a company in a list, scan through that list, and delete all the associated keys with DEL (or its non-blocking cousin UNLINK).
So something like:
LPUSH companies-keys:Acme Company:GetById:123 Company:GetByEmail:foo#bar.com Company:GetByName:Acme
Then
RPOP companies-keys:Acme
and for each entry you get out of the list:
UNLINK keyname
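A minimal Lua sketch of that loop, run atomically as a single script with the key list passed as KEYS[1]:
-- pop every cached key name off the list and unlink it
local key = redis.call('RPOP', KEYS[1])
while key do
  redis.call('UNLINK', key)
  key = redis.call('RPOP', KEYS[1])
end
For example: EVAL "<script>" 1 companies-keys:Acme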
To answer it not so directly, you may want to consider using a Hash rather than just keys; that way you can just modify one of the fields in the hash rather than having to invalidate all the keys associated with it.
So you could create it with:
HSET companies:123 id 123 email foo#bar.com name acme
Then you could update a particular entry in the company record with HMSET (or plain HSET, which supersedes the deprecated HMSET as of Redis 4.0):
HMSET companies:123 email bar#foo.com
Since it sounds like being able to look up a given record by different fields is really important to your use case, you may also want to consider adding RediSearch and indexing the fields you want to search on. For the set of fields listed above, an index like:
FT.CREATE companies-idx ON HASH PREFIX 1 companies: SCHEMA id TAG email TEXT name TEXT
might be appropriate - then you could look up the company with a given email like:
FT.SEARCH companies-idx "@email:foo"

Square Connect v1 Item ID Changing Across Locations

I have been writing a Square Connect integration that rests on the assumption that an item has one and only one ID, even when it is present in multiple locations. After testing with a subset of products on a separate Square account/App, things were working smoothly. I have now pointed the integration at the "real" Square account/App, using that account's credentials, which contains the same subset of products in addition to many others, and the integration is failing. It seems I have many items that now have a unique ID for each location. This means that a single item has multiple IDs. The item only displays once in the Square dashboard, but there are two unique IDs associated with it. In fact, I have one item that has two IDs, yet those IDs share a single variation ID.
I have also noticed two different formats of IDs, which from my research sounds like a difference between items created before and after a certain date.
Format 1: XXxXxXXX-xxXX-XxXx-XXX-XXXxxxxxxxXX
Format 2: XXXXXXXXXXXXXXXXXXXXXXXX
I suppose the first question is, is this normal behavior? And if not, any thoughts on what might be causing it and is there a way out of it?
There are some nuances to items with older accounts. Items were originally scoped to a location, which doesn't quite make sense for larger multi-location businesses. Internally we are migrating to a location-independent item catalog, which should be invisible to you as an end user (save for the change in formats, like you mentioned); depending on the age of your account, it might have a mix of "old" and "new" item ids. It seems like you basically have a "new" location and an older one.
Basically in our current model you are only guaranteed that items will have unique ids within a location. We are working on new APIs that will allow you to manipulate items across locations more easily.

Data Structure - Inserting/Updating into ordered list

I'm using the Facebook API to retrieve a list of a user's photos. I'm using this to work out the user's close friends by seeing who has been tagged the most in the user's photos. So what I have is the list of tagged users (there will be duplicates). What I want to do is go through each tag and insert the user into a data structure. If the user is already there, I want to increase that user's count by one. At the end I want the list ordered so I can 'rank' the friends. What data structure would be best for this?
Step 1:
Use an associative container that maps from UserId to the user's count. Keep adding new users and updating their counts as you process more data.
Step 2:
Copy all the users to another associative container; now the key should be a pair (user's count, UserId).
You can now iterate over the 2nd container and have your items in order.
If you're using C++, you can use std::map for step 1 and std::set for step 2.
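A minimal C++ sketch of both steps, with a made-up tag list standing in for the Facebook data:
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

int main() {
    // hypothetical tag list extracted from the user's photos
    std::vector<std::string> tags = {"alice", "bob", "alice", "carol", "alice", "bob"};

    // Step 1: map from UserId to the user's count
    std::map<std::string, int> counts;
    for (const auto& user : tags) ++counts[user];

    // Step 2: re-key by (count, UserId); std::set keeps the pairs ordered
    std::set<std::pair<int, std::string>> ranked;
    for (const auto& [user, count] : counts) ranked.insert({count, user});

    // iterate in reverse to print the most-tagged friends first
    for (auto it = ranked.rbegin(); it != ranked.rend(); ++it)
        std::cout << it->second << ": " << it->first << '\n';
}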
