Custom ordering on elastic search - elasticsearch

I'm executing a simple query which returns items matched by companyId.
In addition to only showing clients matching a specific company, I also want records matching a certain location to appear at the top. So if I somehow pass through a pseudo sort:"location=Johannesburg", it would return the data below, with items matching that location on top, followed by items with other locations.
Data:
{
"clientId" : 1,
"clientName" : "Name1",
"companyId" : 8,
"location" : "Cape Town"
},
{
"clientId" : 2,
"clientName" : "Name2",
"companyId" : 8,
"location" : "Johannesburg"
}
Query:
{
"query": {
"match": {
"companyId": "8"
}
},
"size": 10,
"_source": {
"includes": [
"firstName",
"companyId",
"location"
]
}
}
Is something like this possible in Elasticsearch, and if so, what is the name of this concept? (I'm not sure what to even Google for to solve this problem.)

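It can be done in different ways.
The simplest (if you only need text matching) is to use a bool query with a should clause.
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document. (Quoted from the Elasticsearch documentation.)
Since results are sorted by _score by default, documents that also match the should clause come first.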
Example:
{
"query": {
"bool": {
"must": [
{
"match": {
"companyId": "8"
}
}
],
"should": [
{
"match": {
"location": "Johannesburg"
}
}
]
}
}
}
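To illustrate why the should clause pushes Johannesburg records to the top, here is a minimal Python sketch. It builds the same request body as the example above as a plain dict and mimics the additive scoring on the two sample documents from the question. The `toy_score` function is a made-up simplification (one point per matching clause), not Elasticsearch's actual relevance formula.

```python
def build_query(company_id, preferred_location):
    """Build a request body: require the company, prefer one location."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"companyId": company_id}}],
                "should": [{"match": {"location": preferred_location}}],
            }
        },
        "size": 10,
    }


def toy_score(doc, body):
    """Hypothetical scoring: None if a must clause fails, else 1 point per matching clause."""
    bool_query = body["query"]["bool"]
    score = 0
    for clause in bool_query["must"]:
        field, value = next(iter(clause["match"].items()))
        if str(doc.get(field)) != str(value):
            return None  # a failed must clause excludes the document
        score += 1
    for clause in bool_query["should"]:
        field, value = next(iter(clause["match"].items()))
        if str(doc.get(field)) == str(value):
            score += 1  # a matching should clause only adds to the score
    return score


# The two sample documents from the question.
docs = [
    {"clientId": 1, "clientName": "Name1", "companyId": 8, "location": "Cape Town"},
    {"clientId": 2, "clientName": "Name2", "companyId": 8, "location": "Johannesburg"},
]

body = build_query("8", "Johannesburg")
matching = [d for d in docs if toy_score(d, body) is not None]
ranked = sorted(matching, key=lambda d: toy_score(d, body), reverse=True)
print([d["location"] for d in ranked])
```

Both documents still match because should is optional when a must clause is present; the location only affects the order.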
A more complex solution is to store geo points in location and use, for example, the Distance feature query.
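A rough sketch of the geo variant, assuming location is remapped as a geo_point field and that your cluster is recent enough to support the distance_feature query. The helper name, the Johannesburg coordinates and the pivot value are illustrative assumptions, not part of the original answer.

```python
def build_geo_query(company_id, origin_lat, origin_lon, pivot="50km"):
    """Require the company, then boost documents by closeness to the origin."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"companyId": company_id}}],
                "should": [
                    {
                        "distance_feature": {
                            "field": "location",  # assumed to be mapped as geo_point
                            "origin": {"lat": origin_lat, "lon": origin_lon},
                            "pivot": pivot,  # distance at which the boost is halved
                        }
                    }
                ],
            }
        }
    }


# Approximate coordinates for Johannesburg (assumption for illustration).
geo_body = build_geo_query("8", -26.2041, 28.0473)
```

Compared with text matching, this ranks nearby places higher as well, instead of treating every other location the same.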

Related

How to get the best matching document in Elasticsearch?

I have an index where I store all the places used in my documents. I want to use this index to see if the user mentioned one of the places in the text query I receive.
Unfortunately, I have two documents whose name is similar enough to trick Elasticsearch scoring: Stockholm and Stockholm-Arlanda.
My test phrase is intyg stockholm and this is the query I use to get the best matching document.
{
"size": 1,
"query": {
"bool": {
"should": [
{
"match": {
"name": "intyg stockholm"
}
}
],
"must": [
{
"term": {
"type": {
"value": "4"
}
}
},
{
"terms": {
"name": [
"intyg",
"stockholm"
]
}
},
{
"exists": {
"field": "data.coordinates"
}
}
]
}
}
}
As you can see, I use a terms query to find the interesting documents and I use a match query in the should part of the root bool query to use scoring to get the document I want (Stockholm) on top.
This code worked locally (where I run ES in a container), but it broke when I started testing on a cluster hosted in AWS (where I have the exact same dataset). I found an explanation of what happens, and adding the search type argument actually fixes the issue.
Since the workaround is best not used in production, I'm looking for other ways to get the expected result.
Here are the two documents:
// Stockholm
{
"type" : 4,
"name" : "Stockholm",
"id" : "42",
"searchableNames" : [
"Stockholm"
],
"uniqueId" : "Place:42",
"data" : {
"coordinates" : "59.32932349999999,18.0685808"
}
}
// Stockholm-Arlanda
{
"type" : 4,
"name" : "Stockholm-Arlanda",
"id" : "1832",
"searchableNames" : [
"Stockholm-Arlanda"
],
"uniqueId" : "Place:1832",
"data" : {
"coordinates" : "59.6497622,17.9237807"
}
}

ElasticSearch return non analyzed version of analyzed aggregate

I am having a problem implementing an autocomplete feature using the data in Elasticsearch. My documents currently have this kind of structure:
PUT mainindex/books/1
{
"title": "The unread book",
"author": "Mario smith",
"tags": [ "Comedy", "Romantic" , "Romantic Comedy","México"]
}
All the fields are indexed, and the mapping for the tags uses a lowercase, asciifolding filter.
The required functionality is that if the user types mario smith rom..., I need to suggest tags starting with rom, but only for books by Mario Smith. This requires breaking the text into components, and I already have that part. The current query is something like this:
{
"query": {
"query_string": {
"query": "mario smith",
"default_operator": "AND"
}
},
"size": 0,
"aggs": {
"autocomplete": {
"terms": {
"field": "suggest",
"order": {
"_term": "asc"
},
"include": {
"pattern": "rom.*"
}
}
}
}
}
This returns the expected result: a list of words that the user could type next, based on the query and on the prefix of the word they are starting to type.
{
"aggregations" : {
"autocomplete" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "romantic comedy",
"doc_count" : 4
},
{
"key" : "romantic",
"doc_count" : 2
}
]
}
}
}
Now the problem is that I can't present these words to the user, because they are lowercased and stripped of accents. Words like México get indexed as mexico, and in my language that makes some words look weird. If I remove the filters from the tag field, the values are saved correctly into the index, but then the pattern rom.* will not match, because the user may type in a different case and may not use the correct accents.
In general terms, what I need is to take a filtered set of documents, aggregate their tags and return them in their natural format, but filter out the ones that don't have the same prefix, filtering them in a case- and accent-insensitive way.
PS: I saw some suggestions about having two versions of the field, one analyzed and one raw, but I can't seem to filter by one and return the other.
Does anyone have an idea how to perform this query or implement this functionality?
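One possible direction, sketched under the assumption that the aggregation runs on an unanalyzed copy of the tags (so bucket keys keep their original case and accents) and that the prefix check is done in application code instead of through include. The `fold` helper only approximates a lowercase plus asciifolding filter chain, and both function names are made up for illustration.

```python
import unicodedata


def fold(text):
    """Lowercase and strip accents, roughly like lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def suggest(raw_tags, typed_prefix):
    """Return raw tags whose folded form starts with the folded prefix."""
    prefix = fold(typed_prefix)
    matches = {tag for tag in raw_tags if fold(tag).startswith(prefix)}
    # Sort on the folded form so the order ignores case and accents.
    return sorted(matches, key=fold)


# Bucket keys as they would come back from a raw (unanalyzed) field.
raw_keys = ["Comedy", "Romantic", "Romantic Comedy", "México"]
print(suggest(raw_keys, "ROM"))
print(suggest(raw_keys, "mex"))
```

The trade-off is that every bucket for the matching books has to be fetched before filtering, which is fine for a modest number of distinct tags but not for very large vocabularies.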

Return distinct values in Elasticsearch

I am trying to solve an issue where I have to get distinct result in the search.
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "GEORGE",
"favorite_cars" : [ "honda","Hyundae" ]
}
When I perform a term query on favorite_cars for "ferrari", I get two results whose name is ABC. I want only one result to be returned in this case. So my requirement is to apply a distinct on the name field so that I receive only one result.
Thanks
One way to achieve what you want is to use a terms aggregation on the name field and then a top_hits sub-aggregation with size 1, like this:
{
"size": 0,
"query": {
"term": {
"favorite_cars": "ferrari"
}
},
"aggs": {
"names": {
"terms": {
"field": "name"
},
"aggs": {
"single_result": {
"top_hits": {
"size": 1
}
}
}
}
}
}
That way, you'll get a single term ABC and, nested inside it, a single matching document.
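On the client side, the distinct documents can then be pulled out of the aggregation buckets. The sketch below assumes the standard response layout for a terms aggregation with a top_hits sub-aggregation; `sample_response` is a hand-written stand-in, not real cluster output.

```python
def first_hit_per_name(response):
    """Map each bucket key to the _source of its single top hit."""
    buckets = response["aggregations"]["names"]["buckets"]
    return {
        bucket["key"]: bucket["single_result"]["hits"]["hits"][0]["_source"]
        for bucket in buckets
    }


# Hypothetical, trimmed-down response for the query above.
sample_response = {
    "aggregations": {
        "names": {
            "buckets": [
                {
                    "key": "ABC",
                    "doc_count": 2,
                    "single_result": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "name": "ABC",
                                        "favorite_cars": ["ferrari", "toyota"],
                                    }
                                }
                            ]
                        }
                    },
                }
            ]
        }
    }
}

distinct = first_hit_per_name(sample_response)
print(list(distinct))
```

The doc_count of 2 shows that two documents matched, while only one of them is returned per name.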

Elasticsearch 5.1: applying additional filters to the "more like this" query

I'm building a search engine on top of emails. MLT is great at finding emails with similar bodies or subjects, but sometimes I want to do something like: show me the emails with similar content to this one, but only from joe#yahoo.com and only during this date range. This seems to have been possible with ES 2.x, but it seems that 5.x doesn't allow filtering on fields other than the one being considered for similarity. Am I missing something?
I still can't figure out how to do what I described. Imagine I have an index of emails with two fields, for the sake of simplicity: body and sender. I know how to find messages that are restricted to a sender; the query would be something like:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"sender": "mike#foo.com"
}
}
]
}
}
}
}
}
Similarly, if I wish to find messages that are similar to a single hero message using the contents of the body, I can issue a query like:
{
"query": {
"more_like_this": {
"fields" : ["body"],
"like" : [{
"_index" : "foo",
"_type" : "email",
"_id" : "a1af33b9c3dd436dabc1b7f66746cc8f"
}],
"min_doc_freq" : 2,
"min_word_length" : 2,
"max_query_terms" : 12,
"include" : "true"
}
}
}
Both of these queries specify the results by adding clauses inside the query clause of the root object. However, any way I try to put these together gives me parse exceptions. I can't find any examples or documentation that would say: give me emails that are similar to this hero, but only from mike#foo.com.
You're almost there. You can combine both using a bool/filter query like this, i.e. make an array out of your filter and put both constraints in there:
{
"query": {
"bool": {
"filter": [
{
"term": {
"sender": "mike#foo.com"
}
},
{
"more_like_this": {
"fields": [
"body"
],
"like": [
{
"_index": "foo",
"_type": "email",
"_id": "a1af33b9c3dd436dabc1b7f66746cc8f"
}
],
"min_doc_freq": 2,
"min_word_length": 2,
"max_query_terms": 12,
"include": "true"
}
}
]
}
}
}
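A hedged variant of the combined query, written as a small Python helper. It differs from the answer above in two ways that are assumptions on my part: more_like_this sits in must, so the similarity score still orders the hits (clauses in filter context do not contribute to the score), and an optional range filter covers the date restriction mentioned in the question. The field name `date` and the helper name are hypothetical.

```python
def similar_from_sender(index, doc_id, sender, date_from=None, date_to=None,
                        date_field="date"):
    """Emails similar to one document, restricted by sender and optional date range."""
    mlt = {
        "more_like_this": {
            "fields": ["body"],
            "like": [{"_index": index, "_type": "email", "_id": doc_id}],
            "min_doc_freq": 2,
            "min_word_length": 2,
            "max_query_terms": 12,
            "include": True,
        }
    }
    filters = [{"term": {"sender": sender}}]

    # Only add the range clause when at least one bound is given.
    date_range = {}
    if date_from is not None:
        date_range["gte"] = date_from
    if date_to is not None:
        date_range["lte"] = date_to
    if date_range:
        filters.append({"range": {date_field: date_range}})

    return {"query": {"bool": {"must": [mlt], "filter": filters}}}


mlt_body = similar_from_sender(
    "foo", "a1af33b9c3dd436dabc1b7f66746cc8f", "mike#foo.com",
    date_from="2017-01-01", date_to="2017-01-31",
)
```

If ranking by similarity does not matter, moving the more_like_this clause into the filter list gives the same shape as the answer above.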

Elasticsearch Query Help - Multiple Nested AND/OR

I am struggling with elasticsearch filters. I have a company_office type that looks like this:
{
"company_office_id": 1,
"is_headquarters": true,
"company": {
"name": "Some Company Inc"
},
"attribute_values": [
{
"attribute_id": 1,
"attribute_value": "attribute 1 value"
},
{
"attribute_id": 2,
"attribute_value": "ABC"
},
{
"attribute_id": 3,
"attribute_value": "DEF"
},
{
"attribute_id": 3,
"attribute_value": "HIJ",
}
]
}
Let's assume that attribute_value is not_analyzed - so I can match on it exactly.
Now I want to filter on a combination of multiple attribute_id and value fields. Something like this in SQL:
SELECT *
FROM CompanyOffice c
JOIN Attributes a --omitting the ON here, just assume the join is valid
WHERE
c.is_headquarters = true AND
(
(a.attribute_id=2 AND a.attribute_value IN ('ABC')) OR
(a.attribute_id=3 AND a.attribute_value IN ('DEF','HIJ'))
)
So I need to filter on specific fields + multiple combinations of id/value.
Here is the query I tried:
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : [
{ "term": {"is_headquarters": true } },
{"bool": {
"must":[
{"term": {"attribute_values.attribute_id": 1}},
{"bool": { "should": [{"term": {"attribute_values.attribute_value": "HIJ"}}]}}
]
}}
]
}
}
}
}
}
This query returns results even though the company_office does not have any id/value pairing of 1/'HIJ'. My thinking here is that because this bool filter sits inside the parent must section, all of its items must be true:
{"bool": {
"must":[
{"term": {"attribute_values.attribute_id": 1}},
{"bool": { "should": [{"term": {"attribute_values.attribute_value": "HIJ"}}]}}
]
}}
Why would this query return results given the data sample provided at the beginning of the question? Is there a different way to write the filter and accomplish what I am trying to do?
Thanks so much for any help!
If you want to query deeper objects without flattening their structure, you need to set
"type": "nested"
on the "attribute_values" property.
Then see the documentation on how to write nested queries, and you should correctly retrieve the whole document. Use inner hits to retrieve the matched attribute_values.
By default, Elasticsearch does not nest properties when indexing. All subfields get squashed into separate flat fields, without the ability to query them by their actual structure. You will not see this effect, because the original document is returned.
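The flattening described above can be mimicked in a few lines of Python. This is only a simulation of the idea (one flat list of values per subfield), not how Lucene stores data, and the function names are made up for illustration. It shows why the 1/'HIJ' pair appears to match.

```python
def flatten(doc, array_field):
    """Mimic default object-array indexing: one flat value list per subfield."""
    flat = {}
    for item in doc[array_field]:
        for key, value in item.items():
            flat.setdefault(f"{array_field}.{key}", []).append(value)
    return flat


def flat_match(flat, attribute_id, attribute_value):
    """Each term is checked independently, so the pairing is lost."""
    return (attribute_id in flat["attribute_values.attribute_id"]
            and attribute_value in flat["attribute_values.attribute_value"])


def nested_match(doc, attribute_id, attribute_value):
    """Both conditions must hold within the same array element."""
    return any(
        item["attribute_id"] == attribute_id
        and item["attribute_value"] == attribute_value
        for item in doc["attribute_values"]
    )


# The sample document from the question (attribute_values only).
office = {
    "attribute_values": [
        {"attribute_id": 1, "attribute_value": "attribute 1 value"},
        {"attribute_id": 2, "attribute_value": "ABC"},
        {"attribute_id": 3, "attribute_value": "DEF"},
        {"attribute_id": 3, "attribute_value": "HIJ"},
    ]
}

flat = flatten(office, "attribute_values")
print(flat_match(flat, 1, "HIJ"), nested_match(office, 1, "HIJ"))
```

In the flattened view, id 1 and value HIJ both exist somewhere in the document, so both term filters succeed even though they come from different array elements.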
Apart from that, your queries are a bit off. In the last "should" statement you have only one term filter, so it's effectively a "must" part. In any case, the queries will have to be rewritten in the nested format.
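A sketch of what the nested rewrite might look like for the SQL in the question, built as Python dicts. It assumes attribute_values has been remapped with "type": "nested" and the data reindexed. It also uses bool/filter instead of the filtered query, which was deprecated in 2.0 and removed in 5.0; the helper names are illustrative.

```python
# Mapping fragment for the property (assumes the index is recreated with it).
mapping_fragment = {"properties": {"attribute_values": {"type": "nested"}}}


def attribute_clause(attribute_id, values):
    """One (attribute_id AND attribute_value IN values) pair, matched per element."""
    return {
        "nested": {
            "path": "attribute_values",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"attribute_values.attribute_id": attribute_id}},
                        {"terms": {"attribute_values.attribute_value": values}},
                    ]
                }
            },
        }
    }


def headquarters_query(attribute_filters):
    """is_headquarters AND (pair_1 OR pair_2 OR ...)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"is_headquarters": True}},
                    {
                        "bool": {
                            "should": [
                                attribute_clause(attr_id, values)
                                for attr_id, values in attribute_filters
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        }
    }


nested_body = headquarters_query([(2, ["ABC"]), (3, ["DEF", "HIJ"])])
```

Each id/value pair gets its own nested clause so that both conditions are evaluated against the same array element, and the outer should with minimum_should_match set to 1 expresses the OR between pairs.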

Resources