Meilisearch: what are the differences and relationships among searchableAttributes, filterableAttributes, and faceting?

I've read the documentation on searchableAttributes, filterableAttributes, and faceting, but I'm still a bit confused.
Please give some insight into:
Their differences.
Their relationships.

searchableAttributes are the attributes in which Meilisearch looks for matching query words.
filterableAttributes is the list of attributes that can be used as filters to refine the search results. Given a movies dataset, if you want to filter movies by release_date, you first need to add release_date to the filterableAttributes list.
Both searchableAttributes and filterableAttributes are part of the Meilisearch settings. An attribute doesn't have to be searchable to be filterable, so the two settings are independent of each other.
facets is a search parameter, not a setting, so it must be added to the search request. It returns the number of matching documents per attribute value. If you want to know how many movies there are per genre for a given query, you can pass "facets": ["genres"] as a parameter in the search request, like so:
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{
"q": "Batman",
"facets": ["genres"]
}'
The response should include a facetDistribution object with the information:
{
"hits": [
…
],
…
"facetDistribution": {
"genres": {
"action": 273,
"animation": 118,
"adventure": 132,
"fantasy": 67,
"comedy": 475,
"mystery": 70,
"thriller": 217
}
}
}
To get facet information for an attribute, that attribute must first be present in the filterableAttributes list.
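To make the shape of facetDistribution concrete, here is a minimal Python sketch of the counting it represents. It is not Meilisearch's implementation; the facet_distribution helper and the three sample movies are hypothetical and only illustrate how each attribute value maps to a document count.

```python
from collections import Counter


def facet_distribution(hits, attribute):
    """Count how many documents carry each value of `attribute`."""
    counts = Counter()
    for doc in hits:
        values = doc.get(attribute, [])
        # Treat a scalar value like a one-element list.
        if not isinstance(values, list):
            values = [values]
        counts.update(values)
    return dict(counts)


# Hypothetical documents matching the query "Batman".
movies = [
    {"title": "Batman Begins", "genres": ["action", "thriller"]},
    {"title": "The Lego Batman Movie", "genres": ["animation", "comedy", "action"]},
    {"title": "Batman: Mask of the Phantasm", "genres": ["animation", "mystery"]},
]

distribution = facet_distribution(movies, "genres")
print(distribution)
```

A document with several genres is counted once under each of them, which is why the counts can add up to more than the number of hits.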

Related

Elasticsearch Document search related

I have an Elasticsearch index containing one document, say with doc ID 01. I then indexed an updated version of it under a new doc ID, say 02, so the index now holds two documents.
I want the search query (index/_search) to return only the latest document (doc ID 02).
What query should I use for this scenario?
If you want the document with the maximum doc_id (assuming, as in your example, that you create doc IDs in increasing numerical order), you can use this query:
curl "https://{es_endpoint}/sample_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
"sort" : [
{ "_id" : {"order" : "desc"}}
],
"size": 1
}'
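One caveat worth checking: the query above relies on the ordering of _id values. If those values are compared as strings rather than as numbers (an assumption here, worth verifying on your version), unpadded ids stop sorting in numeric order once they reach a new digit count. The Python sketch below only illustrates that difference with hypothetical ids; it does not call Elasticsearch.

```python
# Hypothetical document ids stored as strings.
ids = ["1", "2", "9", "10"]

# String comparison is lexicographic: "9" sorts after "10".
latest_as_string = max(ids)

# Numeric comparison gives the id that was actually created last.
latest_as_number = max(ids, key=int)

# Zero-padding to a fixed width makes both orderings agree.
padded = [doc_id.zfill(4) for doc_id in ids]
latest_padded = max(padded)

print(latest_as_string, latest_as_number, latest_padded)
```

If this applies to your data, fixed-width ids or a separate numeric or timestamp field to sort on avoids the problem.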

Elasticsearch multi search API

I'm trying to perform multiple concurrent search requests using Elasticsearch (version 6). Here is my queries file:
{"index" : "web"}
{"max_concurrent_searches": 64, "query": {"match": {"content": "school"}}}
{"index" : "web"}
{"max_concurrent_searches": 64, "query": {"match": {"content": "car"}}}
{"index" : "web"}
{"max_concurrent_searches": 64, "query": {"match": {"content": "cat"}}}
Here is the command I use to issue the bulk request:
curl -H "Content-Type: application/x-ndjson" -XGET "$url/_msearch" \
--data-binary "@$file"; echo
However, I get the following error, which indicates that I am using the max_concurrent_searches parameter incorrectly:
{"error":{"root_cause":[{"type":"parsing_exception","reason":"Unknown key for a VALUE_NUMBER in [max_concurrent_searches].","line":1,"col":29}],"type":"parsing_exception","reason":"Unknown key for a VALUE_NUMBER in [max_concurrent_searches].","line":1,"col":29},"status":400}
If I remove "max_concurrent_searches": 64, from the queries file above, everything works perfectly.
I want to know how to set the max_concurrent_searches parameter. I couldn't find any useful information about it in the Elasticsearch documentation, except the following:
The msearch’s max_concurrent_searches request parameter can be used to
control the maximum number of concurrent searches the multi search api
will execute. This default is based on the number of data nodes and
the default search thread pool size.
You should add it to the request URL itself, as a query-string parameter, and leave it out of the body lines in the queries file:
Sample request: GET indexName/type/_msearch?max_concurrent_searches=1100
(where indexName and type are optional)
For you it should look like:
curl -H "Content-Type: application/x-ndjson" -XGET "$url/_msearch?max_concurrent_searches=1100" \
--data-binary "@$file"; echo
You can also execute the request above using Postman. Just set the Content-Type to application/x-ndjson and don't forget to end the body with a newline character. Sending your original body this way reproduces the same error, so you can try different combinations until it is fixed. MultiSearch is an important feature.
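The split between URL and body can be sketched in Python as follows. This only builds the request pieces and sends nothing; the build_msearch_request helper and the http://localhost:9200 base URL are assumptions made for illustration, while the index name and search terms come from the question.

```python
import json
from urllib.parse import urlencode


def build_msearch_request(base_url, index, terms, max_concurrent_searches):
    """Return (url, body) for a multi search request.

    max_concurrent_searches goes into the query string, never into the body.
    """
    query_string = urlencode({"max_concurrent_searches": max_concurrent_searches})
    url = f"{base_url}/_msearch?{query_string}"

    lines = []
    for term in terms:
        # Each search is a header line followed by a body line.
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({"query": {"match": {"content": term}}}))

    # The NDJSON body must end with a newline character.
    body = "\n".join(lines) + "\n"
    return url, body


url, body = build_msearch_request(
    "http://localhost:9200", "web", ["school", "car", "cat"], 64
)
print(url)
print(body, end="")
```

The resulting body could be written to the queries file and sent with the curl command above, or posted with any HTTP client using the application/x-ndjson content type.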

Penalising - but not eliminating duplicates - in ElasticSearch

I have some data with duplicate fields. I don't want duplicates to appear together on top of search results, but I don't want to eliminate them altogether. I just want to get a better variety, so the 2nd, 3rd ... nth occurrence of the same field-value would be demoted. Is that possible with ElasticSearch?
For example:
curl -XPOST 'http://localhost:9200/employeeid/info/1' -d '{
"name": "John",
"organisation": "Apple",
"importance": 1000
}'
curl -XPOST 'http://localhost:9200/employeeid/info/2' -d '{
"name":"John",
"organisation":"Apple",
"importance": 2000
}'
curl -XPOST 'http://localhost:9200/employeeid/info/3' -d '{
"name": "Sam",
"organisation": "Apple",
"importance": 0
}'
(based on this)
If we assume search is boosted by importance, the natural result for an "Apple" search would be John, John, Sam. What I am looking for is a way to make the result John, Sam, John, i.e. to penalise the second John because another John has already appeared.
You could adjust the importance field at index time by finding all duplicates and choosing one document in each group of duplicates to be 'more important', for example the one with the highest importance. In your example, I would add 5000 to the existing importance of each chosen document (a document without duplicates, such as Sam, counts as chosen).
The results would now rank as follows.
John/Apple-7000, Sam/Apple-5000, John/Apple-1000
But this means you would need to re-index if you decided to change the 5000 to, say, 10000, because the right size of the step depends on the magnitude of importance.
Alternatively, you could add another field called 'authority', set it to 1 for the duplicate with the highest importance, and use a scoring function to apply the step at query time:
"script_score": {
"script": "(_score * 5000) + doc['importance'].value + (doc['authority'].value * 5000)"
}
Note that the multiplier for _score depends on the original ranking algorithm; this one assumes that _score ranges from 0.0 to 1.0.
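The effect of the authority step can be checked offline with a small Python sketch. It assumes that duplicates are documents sharing the same name, that every document has the same _score (so that term is left out), and that the step is 5000; the rank helper is hypothetical and only mirrors the arithmetic of the script above.

```python
def rank(docs, step=5000):
    """Order docs by importance plus a step for one 'authority' doc per name."""
    # Pick the document with the highest importance in each group of duplicates.
    best = {}
    for doc in docs:
        key = doc["name"]
        if key not in best or doc["importance"] > best[key]["importance"]:
            best[key] = doc

    def score(doc):
        authority = 1 if best[doc["name"]] is doc else 0
        return doc["importance"] + authority * step

    return sorted(docs, key=score, reverse=True)


docs = [
    {"name": "John", "organisation": "Apple", "importance": 1000},
    {"name": "John", "organisation": "Apple", "importance": 2000},
    {"name": "Sam", "organisation": "Apple", "importance": 0},
]

ranked = rank(docs)
print([(d["name"], d["importance"]) for d in ranked])
```

With these numbers the scores are 7000, 5000 and 1000, so the order is John, Sam, John. Changing step only requires changing the query, not the indexed documents.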

Elastic Search - Sort by multiple fields with the missing parameter

I am trying to apply a sort to an Elastic Search query by two different fields:
price_sold and price_list
I would like to first sort on price_sold, but if that value is null, I would like to then sort by price_list
Would the query be correct if I just set the sorts to:
"sort": [
{ "price_sold": { "order": "desc"}},
{ "price_list": { "order": "desc"}}
]
I have executed the query, and I do not get any errors, and it seems like the results are correct, however I am curious if I have overlooked something.
I have been reading about the missing filter, along with possibly using a custom value. This may not be required, but I am not quite sure.
Would there be a way to define a second field to sort on if the first field is missing, or is that not necessary? Something like:
"sort": [{"price_sold": {"order": "desc", "missing": "doc['field_name']"}}]
Would simply adding these two sorts give me the desired result?
Thanks.
I think I understand what you're asking. In SQL terms, you'd like to ORDER BY COALESCE(price_sold, price_list) DESC.
The first sort you listed is a little different. It's similar to ORDER BY price_sold DESC, price_list DESC - in other words, primary sort is by price_sold, and for entries where price_sold is equal, secondary sort is by price_list.
Your second sort attempt would be great if "missing" worked that way. Unfortunately, missing's "custom" option appears to allow you to specify a constant value only.
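The difference between the two orderings is easy to see on a small in-memory example. The Python sketch below uses three hypothetical items and does not call Elasticsearch; it assumes that documents missing price_sold are placed last by the two-field sort.

```python
# Hypothetical items; None stands for a missing price_sold.
items = [
    {"id": "a", "price_sold": None, "price_list": 300},
    {"id": "b", "price_sold": 200, "price_list": 500},
    {"id": "c", "price_sold": 250, "price_list": 100},
]


def coalesce_key(item):
    """ORDER BY COALESCE(price_sold, price_list): fall back when missing."""
    sold = item["price_sold"]
    return sold if sold is not None else item["price_list"]


def two_field_key(item):
    """ORDER BY price_sold, price_list: price_list only breaks ties."""
    sold = item["price_sold"]
    # The leading flag keeps items without price_sold at the end when reversed.
    return (sold is not None, sold if sold is not None else 0, item["price_list"])


coalesced = [i["id"] for i in sorted(items, key=coalesce_key, reverse=True)]
two_field = [i["id"] for i in sorted(items, key=two_field_key, reverse=True)]

print(coalesced)
print(two_field)
```

Item a has no price_sold, so the COALESCE ordering ranks it by its price_list of 300 and puts it first, while the two-field ordering pushes it to the end.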
If you don't need to limit your search using from and size, you should be able to use sort's _script option to write some logic that works for you. I ended up here because I do use from and size to retrieve batches, and when I sort by _script, the items I'm getting don't make sense - the items are sorted correctly, but I'm not getting the right set of items. So, I added a new analyzer and expanded my fields to use the new analyzer, and I was hoping to be able to sort using the new field or, if the new field doesn't exist (for previously-indexed items), use the old field's value instead. But that doesn't seem to be possible. I think I'm going to have to reindex my items so my new field is populated.
In case someone is still looking, I ended up creating a script similar to this:
curl -XGET 'localhost:9200/_search?pretty&size=10&from=0' -H 'Content-Type: application/json' -d'
{
"sort" : {
"_script" : {
"type" : "number",
"script" : {
"lang": "painless",
"inline": "doc[\u0027price_sold\u0027].size() == 0 ? doc[\u0027price_list\u0027].value : doc[\u0027price_sold\u0027].value"
},
"order" : "desc"
}
}
}
'
For sorting dates, the type still has to remain number but you replace .value with .date.getMillisOfDay() as discussed here.
The from and size worked fine in my version of ElasticSearch (5.1.1).
To make sure your algorithm is working fine check the generated value in the response, e.g.: "sort" : [ 5.0622E7 ].

can terms lookup mechanism query be nested

I want to know whether I can nest one terms lookup query inside another terms lookup query.
For instance:
curl -XPUT localhost:9200/users/user/2 -d '{
"tweets" : ["1", "3"]
}'
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"uuid" : "1",
"comments":["1","2","3"]
}'
curl -XPUT localhost:9200/comments/comment/1 -d '{
"uuid" : "1"
}'
As you know, we can use a terms lookup mechanism query to get tweets which belong to the user:
curl -XGET localhost:9200/tweets/tweet/_search -d'{
"query" : {
"terms" : {
"uuid" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "tweets"
}
}
}
}'
But if I want to get the comments, I must run another query.
However, I have so many documents that this is not a good approach.
So I would like to nest terms lookup queries in order to get the comments by user id in a single query. Is that possible?
I would really appreciate any help. Thank you! :)
At the moment, this is not possible as far as I know, because you expect data from three different indices to be returned in one query, which would equate to a JOIN. The terms lookup query sort of implements JOINs between two indices "only" (which is already quite cool considering the fact that ES does not want to support JOINs in the first place).
One way out of this would be to refactor your data model to get rid of the comments index and use either parent/child or nested relationships within the tweet mapping type. Since a comment can only belong to a single tweet and there aren't usually hundreds of comments on a tweet (I'm pretty comfortable with the idea that 99% of the time there are fewer than half a dozen comments per tweet, if any at all), you could add comments either as child documents or as nested documents (my preference), instead of just storing their ids in the comments array. That way you'd get your comments right away with your existing query, without the need for a second query.
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"uuid" : "1",
"comments":[{
"id": 1,
"content": "Nice tweet!"
},{
"id": 2,
"content": "Way to go!"
},{
"id": 3,
"content": "Sucks!"
}]
}'
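The effect of embedding the comments can be simulated in memory. The Python sketch below is not an Elasticsearch client; plain dictionaries stand in for the users and tweets indices, tweet 3 is a hypothetical second tweet, and comments_for_user mimics the existing terms lookup followed by reading the embedded comments.

```python
# Plain dictionaries standing in for the users and tweets indices.
users = {"2": {"tweets": ["1", "3"]}}

tweets = {
    "1": {
        "uuid": "1",
        "comments": [
            {"id": 1, "content": "Nice tweet!"},
            {"id": 2, "content": "Way to go!"},
            {"id": 3, "content": "Sucks!"},
        ],
    },
    # Hypothetical second tweet without comments.
    "3": {"uuid": "3", "comments": []},
}


def comments_for_user(user_id):
    """One lookup from user to tweets; comments come along with each tweet."""
    tweet_ids = users[user_id]["tweets"]  # what the terms lookup resolves
    matching = [t for t in tweets.values() if t["uuid"] in tweet_ids]
    return [comment for tweet in matching for comment in tweet["comments"]]


comments = comments_for_user("2")
print([c["content"] for c in comments])
```

Because each tweet carries its own comments, no second lookup against a separate comments index is needed.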
Or you can wait for this pull request (#3278) (Terms Lookup by Query/Filter (aka. Join Filter)) to be merged, which would effectively allow you to do what you're asking for, but that PR was created more than 2 years ago and there are still conflicts to be resolved.
