Union results of two completely separate searches - elasticsearch

My client wants to be able to perform a search, see the results, and then perform another completely unrelated search, and have the new results appended to the previous results.
I'm trying to find a way to do this within ElasticSearch, so that I can still use the built-in pagination.
The complication here is that each search may have multiple query parts that will be combined independently of other searches. So for example, I may do one search that looks for any ACTIVE properties with the keyword "123 Anywhere St." and a price range of 100000 to 150000. That search will look like this:
{
"from": 0,
"size": 25,
"query": {
"bool": {
"filter": {
"terms": {
"statusId": [
1,
2
]
}
},
"must": [
{
"multi_match": {
"query": "123 Anywhere St.",
"fuzziness": 0,
"prefix_length": 0,
"fields": [
"searchable_name^10",
"searchable_mapAddress",
"searchable_streetName2"
]
}
},
{
"range": {
"price": {
"gte": 100000
}
}
},
{
"range": {
"price": {
"lte": 150000
}
}
}
]
}
}
}
And then, I may do another completely different search that uses the keyword "234 Elsewhere St." and search on a size range instead of price, and looks for a different status.
I want all of the results from the first search to show up, and then all of the results from the second search to show up, in a single paginated result set.
Can this be done in ElasticSearch?

You can do this with the Multi Search API: send both search requests to the _msearch endpoint in a single call.
GET index_name/_msearch
{}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{}
{"query" : {"match_all" : {}}}
Hope it helps !
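One thing to keep in mind is that _msearch returns one response per search, so appending the second result set to the first (and paginating the combined list) is left to the caller. Here is a minimal Python sketch of building the newline-delimited body and concatenating the hits. The helper names build_msearch_body and concat_hits are made up for illustration, and the two match queries are simplified stand-ins for the real searches:

```python
import json


def build_msearch_body(searches):
    """Serialize (header, body) pairs into the newline-delimited JSON _msearch expects."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The request body must end with a newline.
    return "\n".join(lines) + "\n"


def concat_hits(msearch_response):
    """Append the hits of each sub-response in the order the searches were sent."""
    combined = []
    for response in msearch_response["responses"]:
        combined.extend(response["hits"]["hits"])
    return combined


# Simplified stand-ins for the two unrelated searches (assumption, not the full queries).
first = {"query": {"match": {"searchable_name": "123 Anywhere St."}}, "from": 0, "size": 25}
second = {"query": {"match": {"searchable_name": "234 Elsewhere St."}}, "from": 0, "size": 25}
body = build_msearch_body([({}, first), ({}, second)])
```

The resulting body string is what you would POST to index_name/_msearch; concat_hits then flattens the "responses" array that comes back.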

Related

Custom ordering on elastic search

I'm executing a simple query which returns items matched by companyId.
In addition to only showing clients matching a specific company, I also want records matching a certain location to appear at the top. So if I somehow pass in a pseudo sort such as sort:"location=Johannesburg", it would return the data below, with items that match the specific location on top, followed by items with other locations.
Data:
{
"clientId" : 1,
"clientName" : "Name1",
"companyId" : 8,
"location" : "Cape Town"
},
{
"clientId" : 2,
"clientName" : "Name2",
"companyId" : 8,
"location" : "Johannesburg"
}
Query:
{
"query": {
"match": {
"companyId": "8"
}
},
"size": 10,
"_source": {
"includes": [
"firstName",
"companyId",
"location"
]
}
}
Is something like this possible in Elasticsearch, and if so, what is the name of this concept? (I'm not sure what to even Google for to solve this problem.)
It can be done in different ways.
The simplest, if plain text matching is enough, is to use a bool query with a should clause.
From the Elasticsearch documentation: the bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
Example:
{
"query": {
"bool": {
"must": [
{
"match": {
"companyId": "8"
}
}
],
"should": [
{
"match": {
"location": "Johannesburg"
}
}
]
}
}
}
A more complex solution is to store geo points in location and use, for example, the Distance feature query.
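If you build the request from code, the must/should pattern can be sketched like this in Python. The function name boost_location_query is hypothetical; the field names come from the question:

```python
def boost_location_query(company_id, preferred_location, size=10):
    """Build a search body: companyId is required, a matching location only adds score."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Every returned document has to match the company.
                "must": [{"match": {"companyId": company_id}}],
                # Documents that also match the location get a higher _score,
                # so they sort first under the default relevance ordering.
                "should": [{"match": {"location": preferred_location}}],
            }
        },
    }


query = boost_location_query("8", "Johannesburg")
```

Because the should clause is optional next to a must clause, clients in other locations still come back, just lower in the list.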

Elasticsearch filter based on field similarity

For reference, I'm using Elasticsearch 6.4.0
I have an Elasticsearch query that returns a certain number of hits, and I'm trying to remove hits with text field values that are too similar. My query is:
{
"size": 10,
"collapse": {
"field": "author_id"
},
"query": {
"function_score": {
"boost_mode": "replace",
"score_mode": "avg",
"functions": [
{
//my custom query function
}
],
"query": {
"bool": {
"must_not": [
{
"term": {
"author_id": MY_ID
}
}
]
}
}
}
},
"aggs": {
"book_name_sample": {
"sampler": {
"shard_size": 10
},
"aggs": {
"frequent_words": {
"significant_text": {
"field": "book_name",
"filter_duplicate_text": true
}
}
}
}
}
}
This query uses a custom function score combined with a filter to return books a person might like (that they haven't authored). Thing is, for some people, it returns books with names that are very similar (e.g. The Life of George Washington, Good Times with George Washington, Who was George Washington), and I'd like the hits to have a more diverse set of names.
I'm using a sampler aggregation with significant_text to group the hits based on text similarity, and the query gives me something like:
...,
"aggregations": {
"book_name_sample": {
"doc_count": 10,
"frequent_words": {
"doc_count": 10,
"bg_count": 482626,
"buckets": [
{
"key": "George",
"doc_count": 3,
"score": 17.278715785140975,
"bg_count": 9718
},
{
"key": "Washington",
"doc_count": 3,
"score": 15.312204414323656,
"bg_count": 10919
}
]
}
}
}
Is it possible to filter the returned documents based on this aggregation result within Elasticsearch? That is, remove hits with book_name_sample doc_count less than X? I know I can do this in PHP or whatever language consumes the hits, but I'd like to keep it within ES. I've tried using a bucket_selector aggregation like so:
"book_name_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"freqWords": "frequent_words"
},
"script": "params.freqWords < 3"
}
}
But then I get an error: org.elasticsearch.search.aggregations.bucket.sampler.InternalSampler cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
Also, if that filter removes enough documents so that the hit count is less than the requested size, is it possible to tell ES to go fetch the next top scoring hits so that hits count is filled out?
Why not use a top_hits aggregation inside the bucket aggregation to get the most relevant documents for each bucket? You can set how many top hits you want per bucket with the size parameter of top_hits, so this gives you a fixed number of documents for each bucket.
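If doing it inside Elasticsearch turns out to be impractical, the client-side fallback mentioned in the question can be sketched in a few lines of Python. This is only an illustration: the diversify helper and its max_per_word parameter are made up, and it assumes each hit carries book_name in _source and that the frequent words are taken from the aggregation buckets:

```python
def diversify(hits, frequent_words, max_per_word=1, field="book_name"):
    """Walk hits in their original (score) order and skip a hit once any of
    its frequent words has already been used max_per_word times."""
    seen = {word.lower(): 0 for word in frequent_words}
    kept = []
    for hit in hits:
        # Frequent words that occur in this hit's title.
        words = {w for w in hit["_source"][field].lower().split() if w in seen}
        if any(seen[w] >= max_per_word for w in words):
            continue  # too similar to a hit that was already kept
        for w in words:
            seen[w] += 1
        kept.append(hit)
    return kept


hits = [
    {"_source": {"book_name": "The Life of George Washington"}},
    {"_source": {"book_name": "Good Times with George Washington"}},
    {"_source": {"book_name": "Who was George Washington"}},
    {"_source": {"book_name": "Moby Dick"}},
]
diverse = diversify(hits, ["George", "Washington"])
```

To still end up with a full page after filtering, you would request more hits than you need (a larger size) and cut the filtered list down afterwards.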

Must match multiple values

I have a query that works fine when I need the property of a document to match just one value.
However, I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow, I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
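To make the OR explicit no matter which context the inner bool ends up in, you can set minimum_should_match yourself. A rough Python sketch, where the helper name any_of_ids_query is made up and the fields are the ones from the question:

```python
def any_of_ids_query(ids, color=None):
    """Require at least one of the ids to match; optionally boost by color."""
    id_clauses = [{"match": {"fruit.id": str(i)}} for i in ids]
    bool_query = {
        "must": [
            {
                "bool": {
                    "should": id_clauses,
                    # At least one id clause has to match (OR semantics).
                    "minimum_should_match": 1,
                }
            }
        ]
    }
    if color is not None:
        # Optional clause: only affects scoring, not which documents match.
        bool_query["should"] = [{"match": {"fruit.color": color}}]
    return {"query": {"bool": bool_query}}


query = any_of_ids_query([1, 2], color="yellow")
```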
Instead of a match query, you could simply use a terms query to OR between multiple values.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
A term or terms query is the right way to fetch an exact text or id. A match query analyzes the input and searches within the analyzed id or text.
Ex:
id = '4'
id = '44'
Depending on how the field is analyzed, a match query for id = 4 can return both 4 and 44, since 4 matches in both. This is where the terms query comes into play.
The same search using a terms query will return 4 only.
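To illustrate the difference, here is a toy model in Python. It is not Elasticsearch's analyzer. It simply assumes the id field is indexed with edge n-grams, which is one setup where a match on 4 also hits 44, and compares that with an exact comparison:

```python
def edge_ngrams(value):
    """All prefixes of value, e.g. '44' -> {'4', '44'} (toy analyzer, an assumption)."""
    return {value[:i] for i in range(1, len(value) + 1)}


def match_search(query, ids):
    """Analyzed lookup: a document hits if the query equals any of its indexed tokens."""
    return [doc_id for doc_id in ids if query in edge_ngrams(doc_id)]


def term_search(query, ids):
    """Exact lookup: a document hits only if the whole value equals the query."""
    return [doc_id for doc_id in ids if doc_id == query]


ids = ["4", "44", "5"]
analyzed_hits = match_search("4", ids)  # ['4', '44']
exact_hits = term_search("4", ids)      # ['4']
```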
So the accepted answer is wrong for exact matching. Use @Rahul's answer, with one addition: the field needs to be indexed as a keyword instead of only as text.
Here is an example of indexing a field both as text and as keyword (the mapping is for a flat field; adjust it accordingly for nested fields).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using @Rahul's answer as-is may not work, because your field might be analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
With that mapping, the query would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer will return incorrect results. Please use @Rahul's answer with the corresponding mapping.

How to limit ElasticSearch results by a field value?

We've got a system that indexes resume documents in ElasticSearch using the mapper attachment plugin. Alongside the indexed document, I store some basic info, like if it's tied to an applicant or employee, their name, and the ID they're assigned in the system. A query that runs might look something like this when it hits ES:
{
"size" : 100,
"query" : {
"query_string" : {
"query" : "software AND (developer OR engineer)",
"default_field" : "fileData"
}
},
"_source" : {
"includes" : [ "applicant.*", "employee.*" ]
}
}
And gets me results like:
"hits": [100]
0: {
"_index": "careers"
"_type": "resume"
"_id": "AVEW8FJcqKzY6y-HB4tr"
"_score": 0.4530588
"_source": {
"applicant": {
"name": "John Doe"
"id": 338338
}
}
}...
What I'm trying to do is limit the results, so that if John Doe with id 338338 has three different resumes in the system that all match the query, I only get back one match, preferably the highest scoring one (though that's not as important, as long as I can find the person). I've been trying different options with filters and aggregates, but I haven't stumbled across a way to do this.
There are various approaches I can take in the app that calls ES to tackle this after I get results back, but if I can do it on the ES side, that would be preferable. Since I'm limiting the query to say, 100 results, I'd like to get back 100 individual people, rather than getting back 100 results and then finding out that 25% of them are docs tied to the same person.
What you want to do is an aggregation to get the top 100 unique records, and then a sub aggregation asking for the "top_hits". Here is an example from my system. In my example I'm:
setting the result size to 0 because I only care about the aggregations
setting the size of the aggregation to 100
for each aggregation, get the top 1 result
GET index1/type1/_search
{
"size": 0,
"aggs": {
"a1": {
"terms": {
"field": "input.user.name",
"size": 100
},
"aggs": {
"topHits": {
"top_hits": {
"size": 1
}
}
}
}
}
}
There's a simpler way to accomplish what @ckasek is looking to do by making use of Elasticsearch's collapse functionality.
Field Collapsing, as described in the Elasticsearch docs:
Allows to collapse search results based on field values. The collapsing is done by selecting only the top sorted document per collapse key.
Based on the original query example above, you would modify it like so:
{
"size" : 100,
"query" : {
"query_string" : {
"query" : "software AND (developer OR engineer)",
"default_field" : "fileData"
}
},
"collapse": {
"field": "id"
},
"_source" : {
"includes" : [ "applicant.*", "employee.*" ]
}
}
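To make it clearer what collapsing does, here is a small Python sketch that mimics the effect on a list of hits that has already been fetched. This is an illustration of the idea only, not how Elasticsearch implements it, and the collapse_hits helper is made up:

```python
def collapse_hits(hits, key):
    """Keep only the highest-scoring hit per key value, best score first."""
    best = {}
    for hit in hits:
        k = key(hit)
        if k not in best or hit["_score"] > best[k]["_score"]:
            best[k] = hit
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)


hits = [
    {"_id": "r1", "_score": 0.45, "_source": {"applicant": {"id": 338338}}},
    {"_id": "r2", "_score": 0.30, "_source": {"applicant": {"id": 338338}}},
    {"_id": "r3", "_score": 0.40, "_source": {"applicant": {"id": 112233}}},
]
people = collapse_hits(hits, key=lambda h: h["_source"]["applicant"]["id"])
```

The advantage of doing it server-side with collapse is that size then counts collapsed results, so asking for 100 gives you up to 100 distinct people.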
Using the answer above and the link from IanGabes, I was able to restructure my search like so:
{
"size": 0,
"query": {
"query_string": {
"query": "software AND (developer OR engineer)",
"default_field": "fileData"
}
},
"aggregations": {
"employee": {
"terms": {
"field": "employee.id",
"size": 100
},
"aggregations": {
"score": {
"max": {
"script": "scores"
}
}
}
},
"applicant": {
"terms": {
"field": "applicant.id",
"size": 100
},
"aggregations": {
"score": {
"max": {
"script": "scores"
}
}
}
}
}
}
This gets me back two buckets, one containing all the applicant Ids and the highest score from the matched docs, as well as the same for employees. The script is nothing more than a groovy script on the shard that contains '_score' as the content.
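For reference, reading those buckets on the client side can be sketched like this in Python. The best_scores helper and the sample response are made up for illustration; the shape follows a terms aggregation with a max sub-aggregation named score:

```python
def best_scores(response, agg_name):
    """Return (id, max score) pairs from a terms aggregation, best score first."""
    buckets = response["aggregations"][agg_name]["buckets"]
    # Each bucket carries the sub-aggregation result under its name ("score").
    pairs = [(bucket["key"], bucket["score"]["value"]) for bucket in buckets]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-written sample response, not real output.
sample = {
    "aggregations": {
        "applicant": {
            "buckets": [
                {"key": 112233, "doc_count": 1, "score": {"value": 0.31}},
                {"key": 338338, "doc_count": 3, "score": {"value": 0.45}},
            ]
        },
        "employee": {"buckets": []},
    }
}
ranked = best_scores(sample, "applicant")
```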

Elastic Search - Sort By Doc Type

I have an Elasticsearch index with 2 different doc types: 'a' and 'b'. I would like to sort my results by type and give preference to type='b' (even if it has a low score). I had been consuming the results of the search below at the client end and sorting them, but I've realized that this approach does not work well, since I am only inspecting the first 10 results, which often do not contain any b's. Increasing the number of returned results is not ideal. I'd like to get Elasticsearch to do the work.
http://<server>:9200/my_index/_search?q=london
You would need to play with function_score and, depending on how you already score your documents, test some weight values, boost_modes and score_modes for each type. For example:
GET /some_index/a,b/_search
{
"query": {
"function_score": {
"query": {
# your query here
},
"functions": [
{
"filter": {
"type": {
"value": "b"
}
},
"weight": 3
},
{
"filter": {
"type": {
"value": "a"
}
},
"weight": 1
}
],
"score_mode": "first",
"boost_mode": "multiply"
}
}
}
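If you want to experiment with different weights, generating the body from code makes that easier. A rough Python sketch follows; the helper name type_weighted_query is made up, and the type filter assumes an older Elasticsearch version that still has mapping types, as in the question:

```python
def type_weighted_query(inner_query, type_weights):
    """Wrap inner_query in a function_score that multiplies the score by a per-type weight."""
    functions = [
        {"filter": {"type": {"value": doc_type}}, "weight": weight}
        for doc_type, weight in type_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "functions": functions,
                # "first": only the first function whose filter matches is applied.
                "score_mode": "first",
                # "multiply": query score times function score.
                "boost_mode": "multiply",
            }
        }
    }


body = type_weighted_query({"match": {"_all": "london"}}, {"b": 3, "a": 1})
```

Note that with boost_mode set to multiply, a very low-scoring b can still end up below a high-scoring a, so the weight for b has to be large enough for your score range.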
This works for me. Run the command below at a command prompt:
curl -XGET 'localhost:9200/index_v1,index_v2/_search?pretty' -d @boost.json
boost.json
{
"indices_boost" : {
"index_v2" : 1.4,
"index_v1" : 1.3
}
}
