Elastic Search - Sort By Doc Type - elasticsearch

I have an Elasticsearch index with 2 different doc types: 'a' and 'b'. I would like to sort my results by type and give preference to type='b' (even if it has a low score). I had been consuming the results of the search below at the client end and sorting them there, but I've realized that this approach does not work well, since I am only inspecting the first 10 results, which often do not contain any b's. Increasing the number of returned results is not ideal. I'd like Elasticsearch to do the work.
http://<server>:9200/my_index/_search?q=london

You would need to play with function_score and, depending on how you already score your documents, test some weight values, boost_modes and score_modes for each type. For example:
GET /some_index/a,b/_search
{
"query": {
"function_score": {
"query": {
# your query here
},
"functions": [
{
"filter": {
"type": {
"value": "b"
}
},
"weight": 3
},
{
"filter": {
"type": {
"value": "a"
}
},
"weight": 1
}
],
"score_mode": "first",
"boost_mode": "multiply"
}
}
}
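To see why the weight values need testing, here is a minimal Python sketch of the arithmetic. It assumes that score_mode "first" applies the weight of the first matching function and that boost_mode "multiply" multiplies that weight into the query score. The final_score helper and the query scores are made up for illustration; they are not part of Elasticsearch.

```python
# Weights taken from the function_score example above.
WEIGHTS = {"b": 3.0, "a": 1.0}

def final_score(query_score, doc_type, weights=WEIGHTS):
    # Hypothetical helper. score_mode "first": only the first matching
    # function's weight applies, and each document has exactly one type.
    # boost_mode "multiply": the weight is multiplied into the query score.
    return query_score * weights[doc_type]

# Made-up (doc id, type, query score) triples, purely illustrative.
hits = [("doc1", "a", 2.0), ("doc2", "b", 0.5), ("doc3", "b", 0.9)]

ranked = sorted(hits, key=lambda h: final_score(h[2], h[1]), reverse=True)
for doc_id, doc_type, score in ranked:
    print(doc_id, doc_type, final_score(score, doc_type))
```

With these numbers doc2 (type b) still ranks below doc1 (type a), so a weight of 3 only puts every b first when no a document scores more than three times higher than the weakest b document. A larger weight, or a different boost_mode, may be needed for your data.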

This works for me. Run the command below at the command prompt:
curl -XGET 'localhost:9200/index_v1,index_v2/_search?pretty' -H 'Content-Type: application/json' -d @boost.json
boost.json:
{
"indices_boost" : {
"index_v2" : 1.4,
"index_v1" : 1.3
}
}

Related

Searching in elasticsearch with proximity(slop) zero and one

I have created the following index
PUT /proximity_example_2
{
"mappings":{
"properties":{
"doc_id": {
"type": "text"
},
"test_name":{
"type": "text"
}
}
}
}
Then indexed a document
POST proximity_example_2/_doc
{
"doc_id": "id1",
"test_name": "test proximity here"
}
Then I queried with proximity 0, as follows:
GET proximity_example_2/_search
{
"query": {
"match_phrase": {
"test_name": {
"query": "proximity test",
"slop": 0.0
}
}
}
}
But I didn't get any results. Then I searched with proximity 1, and this time I didn't get any documents either.
But when I searched with a proximity greater than 1, I got results.
GET proximity_example_2/_search
{
"query": {
"match_phrase": {
"test_name": {
"query": "proximity test",
"slop": 2.0
}
}
}
}
GET proximity_example_2/_search
{
"query": {
"match_phrase": {
"test_name": {
"query": "proximity test",
"slop": 3.0
}
}
}
}
So does that mean that in Elasticsearch, when we search with proximity 0 or 1, the order of the search terms matters?
Thank you...
A slop of 0 is equivalent to a normal phrase search: it is very restrictive and requires the search terms to appear in the document in exactly the same order. Swapping two adjacent terms counts as two position moves, which is why "proximity test" matches "test proximity here" only with a slop of 2 or more. As you increase the slop, this restrictiveness is reduced and you get more search results, but beware that increasing it to too high a number defeats the purpose of phrase search and returns irrelevant results.
You can read this and this detailed blog post that explains how it works internally
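The behavior in the question can be reproduced with a small Python sketch. It is a simplified model of sloppy phrase matching, not Lucene's actual implementation: it assumes whitespace tokenization, lowercasing, and that every query term occurs exactly once in the document. The required_slop function is a hypothetical helper written for this illustration.

```python
def required_slop(query, text):
    """Smallest slop at which `query` phrase-matches `text` (simplified model)."""
    q_terms = query.lower().split()
    d_terms = text.lower().split()
    # Assumption: each query term occurs exactly once in the document;
    # list.index raises ValueError if a term is missing.
    offsets = [d_terms.index(term) - q_pos for q_pos, term in enumerate(q_terms)]
    # Terms in exact phrase order all share one offset, so the spread of the
    # offsets is the number of position moves needed to line them up.
    return max(offsets) - min(offsets)

print(required_slop("test proximity", "test proximity here"))
print(required_slop("proximity test", "test proximity here"))
```

In this model the in-order phrase needs a slop of 0 and the reversed phrase needs a slop of 2, which matches what the question observed: no hits at 0 or 1, hits at 2 and 3.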

Elastic search query feasibility - Can we control execution of one filter, based on the output of another filter in elasticsearch?

I have multiple filters inside the functions array, as shown in the sample query below. I do not want to execute the second filter if the first filter returns more than 0 results. Can this kind of control be achieved at the query level? My query looks like this:
"functions": [
{
"filter": {
"term": {
"display_text.keyword": "trai"
}
},
"weight": 10
},
{
"filter": {
"match": {
"display_text": {
"query": "trai",
"fuzziness": "AUTO:3,5"
}
}
},
"weight": 1
}
]

Elasticsearch filter based on field similarity

For reference, I'm using Elasticsearch 6.4.0
I have an Elasticsearch query that returns a certain number of hits, and I'm trying to remove hits with text field values that are too similar. My query is:
{
"size": 10,
"collapse": {
"field": "author_id"
},
"query": {
"function_score": {
"boost_mode": "replace",
"score_mode": "avg",
"functions": [
{
//my custom query function
}
],
"query": {
"bool": {
"must_not": [
{
"term": {
"author_id": MY_ID
}
}
]
}
}
}
},
"aggs": {
"book_name_sample": {
"sampler": {
"shard_size": 10
},
"aggs": {
"frequent_words": {
"significant_text": {
"field": "book_name",
"filter_duplicate_text": true
}
}
}
}
}
}
This query uses a custom function score combined with a filter to return books a person might like (that they haven't authored). The thing is, for some people it returns books with very similar names (e.g. The Life of George Washington, Good Times with George Washington, Who was George Washington), and I'd like the hits to have a more diverse set of names.
I'm using a sampler aggregation with significant_text to group the hits based on text similarity, and the query gives me something like:
...,
"aggregations": {
"book_name_sample": {
"doc_count": 10,
"frequent_words": {
"doc_count": 10,
"bg_count": 482626,
"buckets": [
{
"key": "George",
"doc_count": 3,
"score": 17.278715785140975,
"bg_count": 9718
},
{
"key": "Washington",
"doc_count": 3,
"score": 15.312204414323656,
"bg_count": 10919
}
]
}
}
}
Is it possible to filter the returned documents based on this aggregation result within Elasticsearch, i.e. remove hits with a book_name_sample doc_count less than X? I know I can do this in PHP or whatever language consumes the hits, but I'd like to keep it within ES. I've tried using a bucket_selector aggregation like so:
"book_name_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"freqWords": "frequent_words"
},
"script": "params.freqWords < 3"
}
}
But then I get an error: org.elasticsearch.search.aggregations.bucket.sampler.InternalSampler cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
Also, if that filter removes enough documents so that the hit count is less than the requested size, is it possible to tell ES to go fetch the next top scoring hits so that hits count is filled out?
Why not use a top_hits aggregation inside the aggregation to get the relevant documents that match each bucket? You can specify how many top hits you want per bucket in the top_hits aggregation, so this will give you a certain number of documents for each bucket.
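As a rough sketch of that suggestion, the Python snippet below builds an aggs body that nests a top_hits aggregation under the frequent_words buckets from the question. The sub-aggregation name top_books, the helper name, and the default sizes are assumptions made for illustration, and the body has not been run against a cluster.

```python
import json

def diverse_names_aggs(field="book_name", shard_size=10, hits_per_bucket=1):
    # Hypothetical helper: same sampler + significant_text aggregation as in
    # the question, with a top_hits sub-aggregation added to each bucket.
    return {
        "book_name_sample": {
            "sampler": {"shard_size": shard_size},
            "aggs": {
                "frequent_words": {
                    "significant_text": {
                        "field": field,
                        "filter_duplicate_text": True,
                    },
                    "aggs": {
                        # "top_books" is an illustrative name, not from the question.
                        "top_books": {"top_hits": {"size": hits_per_bucket}}
                    },
                }
            },
        }
    }

print(json.dumps(diverse_names_aggs(), indent=2))
```

Each frequent_words bucket in the response would then carry up to hits_per_bucket documents, which the client can use to pick one representative per frequent word.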

Union results of two completely separate searches

My client wants to be able to perform a search, see the results, and then perform another completely unrelated search, and have the new results appended to the previous results.
I'm trying to find a way to do this within ElasticSearch, so that I can still use the built-in pagination.
The complication here is that each search may have multiple query parts that will be combined independently of other searches. So for example, I may do one search that looks for any ACTIVE properties with the keyword "123 Anywhere St." and a price range of 100000 to 150000. That search will look like this:
{
"from": 0,
"size": 25,
"query": {
"bool": {
"filter": {
"terms": {
"statusId": [
1,
2
]
}
},
"must": [
{
"multi_match": {
"query": "123 Anywhere St.",
"fuzziness": 0,
"prefix_length": 0,
"fields": [
"searchable_name^10",
"searchable_mapAddress",
"searchable_streetName2"
]
}
},
{
"range": {
"price": {
"gte": 100000
}
}
},
{
"range": {
"price": {
"lte": 150000
}
}
}
]
}
}
}
And then, I may do another completely different search that uses the keyword "234 Elsewhere St." and search on a size range instead of price, and looks for a different status.
I want all of the results from the first search to show up, and then all of the results from the second search to show up, in a single paginated result set.
Can this be done in ElasticSearch?
You can do it using the Multi Search API. All you need to do is send the search requests to the _msearch endpoint, each one as a header line followed by a body line:
GET index_name/_msearch
{}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{}
{"query" : {"match_all" : {}}}
Hope it helps!
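Note that _msearch returns one result set per search, so appending them and paging over the combined list has to happen on the client. A minimal Python sketch of that idea follows. It assumes the responses fit in memory; build_msearch_body, append_hits and page are hypothetical helper names, and no HTTP call is made here.

```python
import json

def build_msearch_body(searches):
    # _msearch expects newline-delimited JSON: a header line, then a body
    # line, for every search, and the payload must end with a newline.
    lines = []
    for body in searches:
        lines.append(json.dumps({}))  # empty header: use the index from the URL
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"

def append_hits(msearch_response):
    # Concatenate the hits of each search in request order.
    combined = []
    for response in msearch_response["responses"]:
        combined.extend(response["hits"]["hits"])
    return combined

def page(hits, from_, size):
    # Client-side pagination over the combined list.
    return hits[from_:from_ + size]

body = build_msearch_body([
    {"query": {"match_all": {}}, "from": 0, "size": 10},
    {"query": {"match_all": {}}},
])
print(body)
```

The trade-off is that each search must fetch enough hits up front to fill the pages you intend to show, since Elasticsearch's own from and size apply to each search separately.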

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
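If you would rather not rely on the default behavior of should, the inner bool query can state minimum_should_match explicitly. The Python sketch below builds such a body; ids_with_color_boost is a hypothetical helper name, and the field names are the ones from the question.

```python
import json

def ids_with_color_boost(ids, color):
    # Hypothetical helper: require at least one of the ids, boost by color.
    return {
        "query": {
            "bool": {
                "must": {
                    "bool": {
                        "should": [{"match": {"fruit.id": i}} for i in ids],
                        # Explicit: at least one id clause has to match.
                        "minimum_should_match": 1,
                    }
                },
                # Optional clause: only raises the score when it matches.
                "should": {"match": {"fruit.color": color}},
            }
        }
    }

print(json.dumps(ids_with_color_boost(["1", "2"], "yellow"), indent=2))
```

Setting the value explicitly keeps the OR semantics stable even if the inner bool query later gains a must or filter clause.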
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
A term or terms query is the right way to fetch an exact text or id; a match query analyzes the input and searches inside the id or text.
Ex:
id = '4'
id = '44'
A search using a match query with id = 4 can return both 4 and 44, depending on how the field is analyzed, since 4 is matched in both. This is where the terms query comes into play.
The same search using a terms query will return 4 only.
So the accepted answer is wrong. Use @Rahul's answer. Just one more thing you need to do: instead of text, you need to index the field as a keyword.
Example of indexing a field as both text and keyword (the mapping is for a flat field; for nested fields change it accordingly):
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using @Rahul's answer as-is may not work, because your field might be analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So the query would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer will return false results. Please use @Rahul's answer with the corresponding mapping.
