Coupling in Elastic Search - sorting

We are thinking of coupling items stored in Elasticsearch. At index time we store each item's coupling information in the item document. Is there a way to query Elasticsearch so that coupled items appear next to each other in the results?
For example:
item1 = {
  ...
  coupled_item: item2
  ...
}
item2 = {
  ...
  coupled_item: item1
  ...
}
query_result = [item3, item6, item1, item2, item4, item5]
Approach 1
One approach we considered was to add a score key to the item document, give coupled products the same score, and sort by that score at query time.
Cons
We already use this technique for our existing sort order and do not want to disturb it. We only want to move the coupled item from its own position to directly below the item it is coupled with.
Approach 2
The other approach we considered was to fetch all the items from ES and handle the coupling in application code.
Cons
This is not optimal, and we would also have to handle pagination ourselves.
Is there a feature in Elasticsearch that handles coupling internally? If not, is there any other way we can handle this?

Coupled (dependent) documents can be duplicated inside each other.
item1 = {
  id: item1
  coupled_item: {
    id: item2
  }
  ...
}
item2 = {
  id: item2
  coupled_item: {
    id: item1
  }
  ...
}
es_query_result = [item1 {item2}, item3 {item4}, item5 {item6}]
application_flattened_result = [item1, item2, item3, item4, item5, item6]
This adds some complexity to document writes, because two documents now have to be updated with two requests, but it can be done. Pagination of the search query can get tricky as well.
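The application-side flattening shown above could look roughly like this. It is a minimal Python sketch, not an Elasticsearch feature: the function name flatten_coupled and the sample hits are made up for illustration, and it assumes every hit carries an id and, optionally, an embedded copy of its coupled item.

```python
def flatten_coupled(hits):
    """Expand each hit into [item, coupled item], skipping ids already emitted."""
    seen = set()
    flat = []
    for hit in hits:
        coupled = hit.get("coupled_item")
        for doc in (hit, coupled):
            if doc is None or doc["id"] in seen:
                continue
            seen.add(doc["id"])
            # Drop the embedded copy so the flattened list holds plain items.
            flat.append({k: v for k, v in doc.items() if k != "coupled_item"})
    return flat


# Hypothetical hits, in the order the search returned them.
es_query_result = [
    {"id": "item1", "coupled_item": {"id": "item2"}},
    {"id": "item2", "coupled_item": {"id": "item1"}},
    {"id": "item3", "coupled_item": {"id": "item4"}},
    {"id": "item5"},
]
flat_ids = [doc["id"] for doc in flatten_coupled(es_query_result)]
print(flat_ids)
```

Because both halves of a pair can match the same query, the flattened page may end up shorter or longer than the requested page size, which is one reason pagination gets tricky.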
The only built-in feature I know of that comes close is the nested field type: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

Related

ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns. These columns have different types in mapping: keyword, text, boolean, integer.
As records in the table might have the same values, I use sorting as an array of 2 attributes:
First attribute is what the client wants to sort by. It can be a simple sorting object or a sort query with a nested filter.
Second attribute is a default sort by id, needed to order documents that have identical values in the column the customer wants to sort by.
I checked multiple topics/issues on GitHub and here on the Elastic forum to understand how to implement the search_after mechanism for backward sorting, but it does not work for all the cases I need.
Please have a look at the image:
Imagine the limit is 3, the customer is currently on the 3rd page of the table, and all the data is sorted by name asc, _id asc.
The names are A, B, C, D, E in the image.
The ids are the numeric parts of the Doc labels.
When the customer wants to go back to the previous page, which is page #2 in my picture, I pass the following to Elastic:
sort: [
  {
    name: 'desc'
  },
  {
    _id: 'desc'
  }
],
search_after: [null, Doc7._id]
As a result, I get only one document, which is Doc6: null in my image. That seems logical, because I ask Elastic to search in desc order after null and id 7, and only one doc matches that: Doc6. But it's not what I need.
I can't work out a solution that returns the data I need.
Could anyone help, please?

FaunaDB search document and get its ranking based on a score

I have the following Collection of documents with structure:
type Streak struct {
    UserID    string    `fauna:"user_id"`
    Username  string    `fauna:"username"`
    Count     int       `fauna:"count"`
    UpdatedAt time.Time `fauna:"updated_at"`
    CreatedAt time.Time `fauna:"created_at"`
}
This looks like the following in FaunaDB Collections:
{
  "ref": Ref(Collection("streaks"), "288597420809388544"),
  "ts": 1611486798180000,
  "data": {
    "count": 1,
    "updated_at": Time("2021-01-24T11:13:17.859483176Z"),
    "user_id": "276989300",
    "username": "yodanparry"
  }
}
Basically I need a lambda or a function that takes in a user_id and spits out its rank within the collection. rank is simply sorted by the count field. For example, let's say I have the following documents (I ignored other fields for simplicity):
user_id  count
abc      12
xyz      10
fgh      999
If I throw in fgh as an input for this lambda function, I want it to spit out 1 (or 0 if you start counting from 0).
I already have an index on user_id, so I can look up and match a document reference from it. I also have an index sorted_count that sorts documents by the count field in ascending order.
My current solution is to query all documents through the sorted_count index and then get the rank by iterating through the array. I think there should be a better solution for this; I'm just not seeing it.
Please help. Thank you!
Counting things in Fauna isn't as easy as one might expect. But you might still be able to do something more efficient than you describe.
Assuming you have:
CreateIndex(
  {
    name: "sorted_count",
    source: Collection("streaks"),
    values: [
      { field: ["data", "count"] }
    ]
  }
)
Then you can query this index like so:
Count(
  Paginate(
    Match(Index("sorted_count")),
    { after: 10, size: 100000 }
  )
)
Which will return an object like this one:
{
  before: [10],
  data: [123]
}
Which tells you that there are 123 documents with count >= 10, which I think is what you want.
This means that, in order to get a user's rank based on their user_id, you'll need to implement this two-step process:
Determine the count of the user in question using your index on user_id.
Query sorted_count using the user's count as described above.
Note that, in case your collection has more than 100,000 documents, you'll need your Go code to iterate through all the pages based on the returned object's after field. 100,000 is Fauna's maximum allowed page size. See the Fauna docs on pagination for details.
Also note that this might not reflect whatever your desired logic is for resolving ties.
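To make the two-step logic and the tie caveat concrete, here is a small in-memory Python sketch. It does not call Fauna at all: the dict stands in for the user_id index, the sorted list stands in for sorted_count, and the name rank_of is made up for illustration.

```python
from bisect import bisect_left

# user_id -> count, using the sample data from the question.
streaks = {"abc": 12, "xyz": 10, "fgh": 999}


def rank_of(user_id, streaks):
    # Step 1: look up the user's count (stands in for the user_id index).
    count = streaks[user_id]
    # Step 2: count the entries whose count is >= the user's count
    # (stands in for counting the sorted_count index from that value onward).
    ascending = sorted(streaks.values())
    return len(ascending) - bisect_left(ascending, count)


ranks = {uid: rank_of(uid, streaks) for uid in streaks}
print(ranks)
```

With this ">=" logic, users with the same count all get the lowest position of their tied group. Counting only strictly greater values and adding 1 would give them the highest position instead, so pick whichever matches your tie rule.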

MongoDB dynamic ranking

I use MongoDB and have a collection with about 100000 entries.
The entries contain data like that:
{"page": "page1", "user_count": 1400}
{"page": "page2", "user_count": 1100}
{"page": "page3", "user_count": 900}
...
I want to output a ranking of the entries according to the user_count like:
#1 - page1
#2 - page2
#3 - page3
...
...so far so good. I can simply use a loop counter if I just output a sorted list.
But I also have to support various search queries. So for example I get 20 results and want to show on which rank the results are. Like:
#432 - page1232
#32 - page223
#345 - page332
...
What's the best way to do that? I don't really want to store the ranking in the collection since the collection constantly changes. I tried to solve it with a lookup dictionary I have built on the fly but it was really slow. Does MongoDB have any special functionality for such cases that could help?
There's no single command that you can use to do this, but you can do it with count:
var doc = db.pages.findOne(); // Or however you get your document
var n = db.pages.find({user_count : {$gt : doc.user_count}}).count(); // This is the number of documents with a higher user_count
var ranking = n+1; // Your doc is next in a ranking
A separate question is whether you should do this. Consider the following:
You'll need an index on user_count. You may already have this.
You'll need to perform a count query for each record you are displaying. There's no way to batch these up.
Given this, depending on the CRUD profile of your application, you may hurt performance more than if you stored the ranking in the collection. It's up to you to decide which option is best.
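For a page of search results, the per-record count could be sketched in Python as below. The rank_of helper only relies on count_documents, which pymongo collections provide. FakePages is a hypothetical in-memory stand-in so the sketch runs without a server, and it only understands the one $gt filter used here.

```python
class FakePages:
    """Hypothetical stand-in for a pymongo collection (supports only $gt on user_count)."""

    def __init__(self, docs):
        self.docs = docs

    def count_documents(self, query):
        threshold = query["user_count"]["$gt"]
        return sum(1 for d in self.docs if d["user_count"] > threshold)


def rank_of(pages, doc):
    # Number of documents with a strictly higher user_count, plus one.
    higher = pages.count_documents({"user_count": {"$gt": doc["user_count"]}})
    return higher + 1


pages = FakePages([
    {"page": "page1", "user_count": 1400},
    {"page": "page2", "user_count": 1100},
    {"page": "page3", "user_count": 900},
])
# Pretend these two documents came back from some search query.
search_results = [pages.docs[2], pages.docs[0]]
ranked = [(rank_of(pages, d), d["page"]) for d in search_results]
print(ranked)
```

This issues one count per displayed record, which is exactly the cost described in the points above.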
There's no simple approach to solve this problem with MongoDB.
If possible, I would advise you to look at Redis and its Sorted Sets. As the documentation says:
With Sorted Sets you can: Take a leader board in a massive online game, where every time a new score is submitted you update it using ZADD. You can easily take the top users using ZRANGE, you can also, given an user name, return its rank in the listing using ZRANK. Using ZRANK and ZRANGE together you can show users with a score similar to a given user. All very quickly.
You can easily fetch the ranks for arbitrary pages by using a MULTI/EXEC block. So I think it's the best approach for your task, and it will be much faster than using MapReduce or re-ranking with MongoDB.
Starting in Mongo 5, it's a perfect use case for the new $setWindowFields aggregation operator:
// { page: "page1", user_count: 1400 }
// { page: "page2", user_count: 1100 }
// { page: "page3", user_count: 900 }
db.test.aggregate([
  { $setWindowFields: {
    sortBy: { user_count: -1 },
    output: { rank: { $rank: {} } }
  }},
  // { page: "page1", user_count: 1400, rank: 1 }
  // { page: "page2", user_count: 1100, rank: 2 }
  // { page: "page3", user_count: 900, rank: 3 }
  { $match: { page: "page2" } }
])
// { page: "page2", user_count: 1100, rank: 2 }
The $setWindowFields stage adds the global rank by:
sorting documents by decreasing order of user_count: sortBy: { user_count: -1 }
and adding the rank field in each document (output: { rank: { $rank: {} } })
which is the rank of the document amongst all documents based on the sorting field user_count: rank: { $rank: {} }.
The $match stage is there to simulate your filtering requirement.

Indexes for mongodb

I have a mongo db collection for restaurants.
e.g.
{
  _id: uniquemongoid,
  rank: 3,
  city: 'Berlin'
}
Restaurants are listed by city and ordered by rank (an integer). Should I create separate indexes on city and rank, or a city/rank compound index? (I query by city and sort by rank.)
Furthermore, there are several boolean fields, e.g. { hasParking:true, familyFriendly:true }. Should I create indexes to speed up queries on these filters? Compound indexes? It's not clear to me whether I should create compound indexes, as a query can have one boolean set or several.
The best way to figure out whether you need indexes is to benchmark it with "explain()".
As for your suggested indexes:
You will need the city/rank compound index. Indexes in MongoDB can only be used left-to-right (at the moment), so an equality search on "city" followed by a sort on "rank" is best served by the { city: 1, rank: -1 } index.
Indexes on boolean fields are often not very useful, because on average MongoDB will still need to access half of your documents. After selecting by city (and hopefully applying a limit!), an extra filter on hasParking etc. will not make MongoDB use both the city/rank index and the hasParking index. MongoDB can only use one index per query.
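To see why one compound index covers both the city filter and the rank sort, here is a rough Python model of it. This is only an illustration of the ordering idea, not how MongoDB stores indexes; the restaurant names and the helper find_by_city are made up.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data.
restaurants = [
    {"name": "Alpha", "city": "Berlin", "rank": 3},
    {"name": "Delta", "city": "Paris", "rank": 5},
    {"name": "Beta", "city": "Berlin", "rank": 1},
    {"name": "Omega", "city": "Hamburg", "rank": 4},
    {"name": "Gamma", "city": "Berlin", "rank": 2},
]

# Entries ordered like a { city: 1, rank: -1 } index: city ascending, rank descending.
index = sorted((r["city"], -r["rank"], r["name"]) for r in restaurants)
cities = [entry[0] for entry in index]


def find_by_city(city):
    # The equality match on the first key selects one contiguous slice...
    lo = bisect_left(cities, city)
    hi = bisect_right(cities, city)
    # ...and that slice is already ordered by the second key, so no extra sort is needed.
    return [name for _, _, name in index[lo:hi]]


berlin_by_rank = find_by_city("Berlin")
print(berlin_by_rank)
```

A filter on a field that is not part of this ordering, such as hasParking, still has to be checked entry by entry within the slice.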
1) Create the index { city: 1, rank: 1 }, which will serve your purpose.
This way you avoid needing two indexes.
2) Create the documents in the following format, and you can query on any number of fields you want.
{
  info: [{hasParking:true}, {familyFriendly:true}],
  _id:
  rank:
  city:
}
db.restaurants.ensureIndex({info : 1});
db.restaurants.find({ info :{ hasParking:true}})
Note that MongoDB doesn't use two indexes for the same query (except for $or queries). So if you want to add the filter from (2) on top of the query in (1), option (2) won't work. I am not sure about your requirement in (2), so I am posting this solution anyway.

ElasticSearch, how to search for a document containing a specific array element

I am having a little problem with elasticsearch and wonder if someone can help me solve it.
I have a document containing an array of tuples (publications).
Something like :
{
  ....
  publications: [
    {
      item1: 385294,
      item2: 11
    },
    {
      item1: 395078,
      item2: 1
    }
  ]
  ....
}
The problem I have is retrieving documents that contain a specific tuple, for example (item1 = 395078 AND item2 = 1).
Whatever I try, it always seems to treat item1 and item2 separately. I fail to tell Elasticsearch that item1 and item2 must have specific values inside the same tuple, not across the whole array...
Is there something I'm missing here?
Thanks
This is not possible directly.
Elasticsearch flattens the array before checking the condition.
This means Elasticsearch matches a=x AND b=y1 against [{a=x,b=y},{a=x1,b=y1}], which would not happen with conventional array checking.
What you can do here is one of the following:
Use the nested type - https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html (but an extra document is created for each element in the array)
Store the array as
publications: [
  {
    385294:11
  },
  {
    395078:1
  }
]
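The difference between the flattened behaviour and nested matching can be simulated in plain Python. This is only a model of the semantics described above, not Elasticsearch code; the function names are made up. The nested_query dict at the end is a sketch of a nested query body and assumes publications is mapped with type nested.

```python
doc = {
    "publications": [
        {"item1": 385294, "item2": 11},
        {"item1": 395078, "item2": 1},
    ]
}


def flattened_match(doc, item1, item2):
    # Object arrays are flattened into one value list per field,
    # so the pairing between item1 and item2 is lost.
    item1_values = {p["item1"] for p in doc["publications"]}
    item2_values = {p["item2"] for p in doc["publications"]}
    return item1 in item1_values and item2 in item2_values


def nested_match(doc, item1, item2):
    # Nested semantics: both conditions must hold inside the same element.
    return any(
        p["item1"] == item1 and p["item2"] == item2 for p in doc["publications"]
    )


false_positive = flattened_match(doc, 385294, 1)  # values come from different tuples
strict_result = nested_match(doc, 385294, 1)
wanted_result = nested_match(doc, 395078, 1)
print(false_positive, strict_result, wanted_result)

# Sketch of the query body for the question's example
# (assumes a nested mapping on publications).
nested_query = {
    "query": {
        "nested": {
            "path": "publications",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"publications.item1": 395078}},
                        {"term": {"publications.item2": 1}},
                    ]
                }
            },
        }
    }
}
```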

Resources