Generic way to get prev/next search results by id in Elasticsearch - elasticsearch

Say I have a million (many) documents in my index. I execute a search query sorting the items by some key X.
Now I have a very long list of results: [..., id1, id2, id3, ...]
Question: how do I get id1 and id3 if I know id2 but don't want to execute the whole search/don't want to get all ids?
I'm looking for a generic solution that works for any search query: given an id that is certain to exist in the results of a query, how do I get prev/next by that id? The query should NOT require prior knowledge of anything other than the id whose prev/next are wanted. (In other words, if the results are ordered by title and I search for prev/next of id X, the title of X is not known at query time, only X's id.)
It is of course possible to execute multiple search queries and achieve the same end result by getting id2 and then playing with ordering to get ids 1 and 3.
EDIT:
I think Luc E's answer isn't what I'm looking for. In that scenario, knowledge of the original object's title is required to query for prev/next. I'm looking for a solution where only the id is known at query time.
Example data looks like this:
[...
{id: 32, title: 'AAA'},
{id: 12, title: 'BBB'},
{id: 99, title: 'CCC'},
{id: 3, title: 'DDD'},
{id: 1001, title: 'EEE'},
...]
What I know: id 99. What I don't know: the title of id 99.
What I want: ids of the prev/next items sorted by title field (=3 and 12).
To put it yet another way: I have id 99 but not the title in my hand. I want a query that gives me ids 3 and 12 (they are prev/next sorted by title).

What you want is a form of deep pagination (deep scrolling), and there are two ways to do it:
scroll
search_after
The easiest is search_after, but you will need two requests:
one request for id3
another one for id1
In this example I am looking for the neighbours of id2 = 128. The documents are sorted by the title field, and I have fetched the title of id2 beforehand; here it is title_of_128.
To use search_after, I have to add _id as a secondary sort condition, so that it acts as a tie-breaker.
Here is my query:
POST test/_search
{
  "size": 2,
  "search_after": ["title_of_128", "128"],
  "sort": [
    { "title": { "order": "asc" } },
    { "_id": { "order": "asc" } }
  ]
}
search_after is exclusive, so the hits start right after id2: the first hit of this query is id3.
Now I reverse the sort direction in order to retrieve id1:
POST test/_search
{
  "size": 2,
  "search_after": ["title_of_128", "128"],
  "sort": [
    { "title": { "order": "desc" } },
    { "_id": { "order": "desc" } }
  ]
}
With the order reversed, the first hit of this query is id1.
Note that sorting on _id is discouraged; the best practice is to copy the _id into another field (with doc values enabled) and use that field as the tie-breaker for search_after.
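The two requests above differ only in sort direction, so they can be generated from one helper. The following is a minimal sketch, not a definitive implementation: the function name neighbour_queries and the tiebreaker parameter are hypothetical, and it assumes the title of the known document has already been fetched (for example with a GET by id) and that title is mapped as a sortable field. It uses size 1 because search_after skips the document whose sort values are passed in.

```python
from typing import Any, Dict


def neighbour_queries(title: str, doc_id: str, tiebreaker: str = "_id") -> Dict[str, Dict[str, Any]]:
    """Build the 'next' and 'prev' search bodies for one known document.

    `tiebreaker` is the field used as secondary sort key; it defaults to
    "_id" but could be a copy of the id stored in a regular field.
    """

    def body(order: str) -> Dict[str, Any]:
        return {
            "size": 1,  # only the immediate neighbour is needed
            "search_after": [title, doc_id],
            "sort": [
                {"title": {"order": order}},
                {tiebreaker: {"order": order}},
            ],
        }

    # Ascending order yields the next document, descending the previous one.
    return {"next": body("asc"), "prev": body("desc")}


queries = neighbour_queries("title_of_128", "128")
```

Each of the two bodies would then be sent to the index's _search endpoint, with the original query and filters added unchanged so that the neighbours come from the same result set.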

Related

FaunaDB search document and get its ranking based on a score

I have the following Collection of documents with structure:
type Streak struct {
    UserID    string    `fauna:"user_id"`
    Username  string    `fauna:"username"`
    Count     int       `fauna:"count"`
    UpdatedAt time.Time `fauna:"updated_at"`
    CreatedAt time.Time `fauna:"created_at"`
}
This looks like the following in FaunaDB Collections:
{
  "ref": Ref(Collection("streaks"), "288597420809388544"),
  "ts": 1611486798180000,
  "data": {
    "count": 1,
    "updated_at": Time("2021-01-24T11:13:17.859483176Z"),
    "user_id": "276989300",
    "username": "yodanparry"
  }
}
Basically I need a lambda or a function that takes in a user_id and returns its rank within the collection. The rank is determined simply by sorting on the count field, highest first. For example, let's say I have the following documents (I ignored other fields for simplicity):
user_id   count
abc       12
xyz       10
fgh       999
If I throw in fgh as an input for this lambda function, I want it to spit out 1 (or 0 if you start counting from 0).
I already have an index for user_id so I can query and match a document reference from this index. I also have an index sorted_count that sorts documents by the count field in ascending order.
My current solution is to query all documents through the sorted_count index and then get the rank by iterating through the array. I think there should be a better solution for this; I'm just not seeing it.
Please help. Thank you!
Counting things in Fauna isn't as easy as one might expect. But you might still be able to do something more efficient than you describe.
Assuming you have:
CreateIndex(
  {
    name: "sorted_count",
    source: Collection("streaks"),
    values: [
      { field: ["data", "count"] }
    ]
  }
)
Then you can query this index like so:
Count(
  Paginate(
    Match(Index("sorted_count")),
    { after: 10, size: 100000 }
  )
)
Which will return an object like this one:
{
  before: [10],
  data: [123]
}
Which tells you that there are 123 documents with count >= 10, which I think is what you want.
This means that, in order to get a user's rank based on their user_id, you'll need to implement this two-step process:
Determine the count of the user in question using your index on user_id.
Query sorted_count using the user's count as described above.
Note that, in case your collection has more than 100,000 documents, you'll need your Go code to iterate through all the pages based on the returned object's after field. 100,000 is Fauna's maximum allowed page size. See the Fauna docs on pagination for details.
Also note that this might not reflect whatever your desired logic is for resolving ties.
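To make the tie caveat concrete, here is a small in-memory sketch of the counting logic. It does not call Fauna at all: the ascending list stands in for the sorted_count index, and the function names rank_inclusive and rank_competition are hypothetical. The first function mirrors the "number of documents with count >= the user's count" idea described above; the second is one possible alternative in which tied users share the better rank.

```python
from bisect import bisect_left, bisect_right
from typing import List


def rank_inclusive(sorted_counts: List[int], user_count: int) -> int:
    """Number of entries >= user_count in an ascending list."""
    return len(sorted_counts) - bisect_left(sorted_counts, user_count)


def rank_competition(sorted_counts: List[int], user_count: int) -> int:
    """1 + number of entries strictly greater than user_count."""
    return len(sorted_counts) - bisect_right(sorted_counts, user_count) + 1


counts = sorted([12, 10, 999])   # abc, xyz, fgh from the question
tied = sorted([10, 10, 999])     # two users tied at 10

print(rank_inclusive(counts, 999))   # 1: fgh is first
print(rank_inclusive(tied, 10))      # 3: both tied users get rank 3
print(rank_competition(tied, 10))    # 2: both tied users get rank 2
```

Which of the two behaviours is right depends on how ties should be shown to users; the counting query described above corresponds to the first one.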

elasticsearch - query between document types

I have a production_order document type, e.g.
{
  part_number: "abc123",
  start_date: "2018-01-20"
},
{
  part_number: "1234",
  start_date: "2018-04-16"
}
I want to create a commodity document type, e.g.
{
  part_number: "abc123",
  commodity: "1 meter machining"
},
{
  part_number: "1234",
  commodity: "small flat & form"
}
Production orders are datawarehoused every week and are immutable.
Commodities, on the other hand, can change over time, e.g. abc123 could change from 1 meter machining to 5 meter machining, so I don't want to store this data with the production_order records.
If a user searches for "small flat & form" in the commodity document type, I want to pull all matching records from the production_order document type, matching on part number.
Obviously I can do this in a relational database with a join. Is it possible to do the same in elasticsearch?
If it helps, we have about 500k part numbers that will be commoditized and our production order data warehouse currently holds 20 million records.
I have found that you can indeed now query between indices in Elasticsearch; however, you have to ensure your data is stored correctly. Here is an example from the 6.3 Elasticsearch docs:
Terms lookup twitter example
At first we index the information for user with id 2, specifically, its followers, then index a tweet from user with id 1. Finally we search on all the tweets that match the followers of user 2.
PUT /users/user/2
{
  "followers" : ["1", "3"]
}
PUT /tweets/tweet/1
{
  "user" : "1"
}
GET /tweets/_search
{
  "query" : {
    "terms" : {
      "user" : {
        "index" : "users",
        "type" : "user",
        "id" : "2",
        "path" : "followers"
      }
    }
  }
}
Here is the link to the original page
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-terms-query.html
In my case above I need to set up my storage so that the commodity is a field and its values are an array of part numbers, e.g.
{
  "1 meter machining": ["abc1234", "1234"]
}
I can then look up the 1 meter machining part numbers against my production_order documents.
I have tested this and it works.
Joins are not supported in Elasticsearch.
You can query twice: first get all the part numbers for "small flat & form", then use those part numbers to query the other index.
Otherwise, try to find a way to merge these into a single index, which would be better. Updating the commodities should not cause you any problems once the two are combined.
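The "query twice" approach can be sketched as three small steps: build the commodity search, collect the part numbers from its response, and build a terms query for the production orders. This is only a sketch under assumptions: the helper names are hypothetical, part_number is assumed to be mapped as a keyword field in both indices, and the sample response below is a hand-written stand-in for what the first search would return. No request is sent here.

```python
from typing import Any, Dict, List


def commodity_query(commodity: str) -> Dict[str, Any]:
    """Step 1: find commodity documents and return only their part numbers."""
    return {
        "_source": ["part_number"],
        "query": {"match_phrase": {"commodity": commodity}},
    }


def part_numbers_from(response: Dict[str, Any]) -> List[str]:
    """Step 2: collect the distinct part numbers from a search response."""
    hits = response["hits"]["hits"]
    return sorted({hit["_source"]["part_number"] for hit in hits})


def production_order_query(part_numbers: List[str]) -> Dict[str, Any]:
    """Step 3: match production orders on any of the part numbers."""
    return {"query": {"terms": {"part_number": part_numbers}}}


# Hand-written stand-in for the response to the first search.
sample_response = {
    "hits": {"hits": [
        {"_source": {"part_number": "1234"}},
        {"_source": {"part_number": "1234"}},
    ]}
}

second_query = production_order_query(part_numbers_from(sample_response))
print(second_query)
```

With very long part-number lists the second query may need to be sent in batches, and the first search may need pagination to collect every part number.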

Elasticsearch: count number of hits in each document

I would like to know how many times a search term appears in a given field of each document. For example, given the following document
{
  "id": 123,
  "my_sentence": "There are many trees in the park. The trees are large and small"
}
and the search term of trees, what kind of query would produce a result like
{
  "id": 123,
  "hits": 2
}
I have seen answers that are old and use a script, such as
"script_fields": {
  "tf": {
    "script": "_index['field_to_search']['search_term'].tf()"
  }
}
However, this only seems to work when the search term is a single word, and the particular field being searched is not stemmed.

Elasticsearch sort by inner field

I have documents in which one of the fields looks like this:
"ingredients": [{
  "unit": "MG",
  "value": 123,
  "key": "abc"
}]
I would like to sort the records by the ascending value of a specific ingredient. That is, if I have 2 records that both use the ingredient with key "abc", one with value 1 and one with value 2, the one with ingredient value 1 should appear first.
Each of those records may have more than one ingredient.
Thank you in advance!
The search query to sort will be:
{
  "sort": {
    "ingredients.value": {
      "order": "asc"
    }
  }
}
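The query above sorts on ingredients.value across all ingredients of a record. To sort only by the value of one specific ingredient, a nested sort with a filter on the key is one option. The sketch below only builds the request body; it assumes that ingredients is mapped as a nested field, that ingredients.key is a keyword field, and that the cluster is recent enough to accept the "nested" sort option (older versions use nested_path and nested_filter instead). The function name is hypothetical.

```python
from typing import Any, Dict


def sort_by_ingredient(key: str, order: str = "asc") -> Dict[str, Any]:
    """Build a search body that sorts by the value of one ingredient key."""
    return {
        "sort": [
            {
                "ingredients.value": {
                    "order": order,
                    "nested": {
                        "path": "ingredients",
                        # Only ingredients with this key contribute a sort value.
                        "filter": {"term": {"ingredients.key": key}},
                    },
                }
            }
        ]
    }


body = sort_by_ingredient("abc")
print(body)
```

Records that do not contain the ingredient at all have no sort value; where they end up can be controlled with the "missing" sort option.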

Elasticsearch: how to know which field the results are sorted by?

In Elasticsearch, is there any way to check which field the results are sorted by? I want something like inner-hits for sort clause.
Imagine that your documents have this kind of form:
{"numerals" : [ // nested
  {"key": "point", "value": 30},
  {"key": "points", "value": 200},
  {"key": "score", "value": 20},
  {"key": "scores", "value": 40}
]}
and you sort the results by:
{"numerals.value": {
  "nested_path": "numerals",
  "nested_filter": {
    "match": {"numerals.key": "score"}
  }
}}
Now I have no idea how to tell which field the results were actually sorted by: it is probably scores in this document, but perhaps score in others. There are two problems: 1. You cannot use inner-hits or highlighting for the nested fields. 2. Even if you could, it would not solve the issue when there are multiple matching candidates.
The question is about sorting by fields that are inside nested objects.
So this is what the documentation
https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-sorting.html
and
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_nested_sorting_example
says:
Elasticsearch first restricts the nested documents with the "nested_filter" query and then sorts in the same way as for multi-valued fields: exactly as if only the filtered nested documents existed as inner objects, that is, as if the root document had a multi-valued field containing exactly the values that belong to the filtered nested objects (in your example only one value would remain: 20).
If you want to be sure about the sort order, add a "mode" parameter: "min", "max", "sum", "avg" or "median".
If you do not specify the "mode" parameter, then according to the corresponding issue the minimum value is picked for "asc" order and the maximum value for "desc" order:
By default when sorting on a multi-valued field the lowest or highest
value will be picked from the field values depending on the sort
order.
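The selection rule described above can be imitated in a few lines to see which value ends up as the sort key. This is an in-memory illustration only, not Elasticsearch code: the function names are hypothetical, and the filter uses exact key equality, whereas a real match query on an analyzed field may also match other keys.

```python
from statistics import median
from typing import Dict, List, Optional, Union

Number = Union[int, float]


def filtered_values(numerals: List[Dict], key: str) -> List[Number]:
    """Keep only the values of nested objects whose key equals `key`."""
    return [n["value"] for n in numerals if n["key"] == key]


def sort_value(values: List[Number], order: str = "asc", mode: Optional[str] = None) -> Number:
    """Pick the sort key from a multi-valued field for a given mode."""
    if mode is None:
        # Default described above: min for ascending, max for descending.
        mode = "min" if order == "asc" else "max"
    pickers = {
        "min": min,
        "max": max,
        "sum": sum,
        "avg": lambda v: sum(v) / len(v),
        "median": median,
    }
    return pickers[mode](values)


numerals = [
    {"key": "point", "value": 30},
    {"key": "points", "value": 200},
    {"key": "score", "value": 20},
    {"key": "scores", "value": 40},
]

print(sort_value(filtered_values(numerals, "score")))   # 20
print(sort_value([20, 40], order="desc"))               # 40
print(sort_value([20, 40], mode="avg"))                 # 30.0
```

If the filter lets several nested objects through, as in the [20, 40] case, the mode alone decides the sort key, so setting it explicitly makes the result predictable.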
