Elasticsearch Query Help - Multiple Nested AND/OR

I am struggling with elasticsearch filters. I have a company_office type that looks like this:
{
"company_office_id": 1,
"is_headquarters": true,
"company": {
"name": "Some Company Inc"
},
"attribute_values": [
{
"attribute_id": 1,
"attribute_value": "attribute 1 value"
},
{
"attribute_id": 2,
"attribute_value": "ABC"
},
{
"attribute_id": 3,
"attribute_value": "DEF"
},
{
"attribute_id": 3,
"attribute_value": "HIJ",
}
]
}
Let's assume that attribute_value is not_analyzed - so I can match on it exactly.
Now I want to filter on a combination of multiple attribute_id and value fields. Something like this in SQL:
SELECT *
FROM CompanyOffice c
JOIN Attributes a --omitting the ON here, just assume the join is valid
WHERE
c.is_headquarters = true AND
(
(a.attribute_id=2 AND a.attribute_value IN ('ABC')) OR
(a.attribute_id=3 AND a.attribute_value IN ('DEF','HIJ'))
)
So I need to filter on specific fields + multiple combinations of id/value.
Here is the query I tried:
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : [
{ "term": {"is_headquarters": true } },
{"bool": {
"must":[
{"term": {"attribute_values.attribute_id": 1}},
{"bool": { "should": [{"term": {"attribute_values.attribute_value": "HIJ"}}]}}
]
}}
]
}
}
}
}
}
This query returns results even though the company_office does not have any id/value pairing of 1/'HIJ'. My thinking is that because this bool filter sits inside the parent must section, all of its items must be true:
{"bool": {
"must":[
{"term": {"attribute_values.attribute_id": 1}},
{"bool": { "should": [{"term": {"attribute_values.attribute_value": "HIJ"}}]}}
]
}}
Why would this query return results given the data sample provided at the beginning of the question? Is there a different way to write the filter and accomplish what I am trying to do?
Thanks so much for any help!

If you want to query deeper objects without flattening their structure, you need to set
"type": "nested"
on the "attribute_values" property.
Then write your filter as a nested query (see the nested query documentation), and you should correctly retrieve the whole document. Use inner hits to retrieve only the matching attribute_values entries.
By default, Elasticsearch does not nest properties when indexing. All subfields get flattened into separate multi-valued fields, so they cannot be queried by their original structure. You will not notice this, because the original document is what gets returned.
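To make the flattening concrete, here is a rough Python simulation of it. This is not Elasticsearch code: the flatten helper is hypothetical and only approximates how an object array is indexed when the field is not mapped as nested. It shows why the 1/'HIJ' filter matches the sample document.

```python
def flatten(doc, prefix=""):
    """Roughly mimic non-nested indexing: one multi-valued field per leaf path."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Merge the leaf values of every array element under the same path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


office = {
    "company_office_id": 1,
    "is_headquarters": True,
    "attribute_values": [
        {"attribute_id": 1, "attribute_value": "attribute 1 value"},
        {"attribute_id": 2, "attribute_value": "ABC"},
        {"attribute_id": 3, "attribute_value": "DEF"},
        {"attribute_id": 3, "attribute_value": "HIJ"},
    ],
}

flat = flatten(office)
# Each term filter is checked against the whole value list, so the
# pairing between an id and its value is lost.
id_matches = 1 in flat["attribute_values.attribute_id"]
value_matches = "HIJ" in flat["attribute_values.attribute_value"]
print(id_matches and value_matches)  # True, although no single entry is 1/'HIJ'
```

Both conditions are satisfied by different array entries, which is exactly what a nested mapping prevents.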
Apart from that, your queries are a bit off. The last "should" clause contains only one term filter, so it is effectively a "must". In any case, the filters will have to be rewritten in nested form.
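As a sketch of what the nested form could look like, here is one way to build the request body in Python for the SQL in the question. The helper names are made up for illustration. It assumes attribute_values is mapped as nested and an Elasticsearch version where nested is a query and bool accepts a filter clause; a version that still uses the filtered query would need the corresponding nested filter form instead.

```python
import json


def attribute_clause(attribute_id, values):
    """One (attribute_id AND attribute_value IN values) pair, evaluated per nested entry."""
    return {
        "nested": {
            "path": "attribute_values",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"attribute_values.attribute_id": attribute_id}},
                        {"terms": {"attribute_values.attribute_value": values}},
                    ]
                }
            },
        }
    }


def build_office_query():
    """is_headquarters AND (pair 2/ABC OR pair 3/DEF,HIJ), as in the SQL above."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"is_headquarters": True}},
                    {
                        "bool": {
                            "should": [
                                attribute_clause(2, ["ABC"]),
                                attribute_clause(3, ["DEF", "HIJ"]),
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        }
    }


body = build_office_query()
print(json.dumps(body, indent=2))
```

Each nested clause is matched against one attribute_values entry at a time, so an id can only pair with a value from the same entry.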

Related

Search multiple fields and output summed score with Elasticsearch

I have multiple fields, eg. f1, f2, f3, that I want to search a single term against each and return the aggregated score where any field matches. I do not want to search each field by the same terms, only search a field by its own term, eg. f1:t1, f2:t2, f3:t3.
Originally, I was using a must bool query with multi_match and the fields all concatenated as t1 t2 t3 and all fields searched, but the results aren't great. Using a dis_max query gets better results where I'm able to search the individual fields by their own term, but if for example t1 is found in f1 AND t2 in f2 the results from dis_max give back the highest resulting score. So if I have 3 documents with { "f1": "foo", "f2": "foo" }, { "f1": "foo", "f2": "bar" }, { "f1": "foo", "f2": "baz" } and I search for f1:foo and f2:ba I can still get back the first record with f2 of foo in the case where it was created most recently. What I'm trying to do is say that f1 matched foo so there's a score related to that, and f2 matched bar so the resultant score should be f1.score + f2.score always bringing it up to the top because it matches both.
I'm finding that I could programmatically build a query that uses query_string, eg. (limiting to two fields for brevity)
GET /_search
{
"query": {
"query_string": {
"query": "(f1:foo OR f1.autocomplete:foo) OR (f2:ba OR f2.autocomplete:ba)"
}
}
}
but I need to add a boost to the fields and this doesn't allow for that. I could also use a dis_max with a set of queries, but I'm really not sure how to aggregate score in that case.
Put another way: if I have people data and want to search by first name and last name, without searching first names by the last-name term or last names by the first-name term, a result that matches both first and last name should rank higher than one that matches only one of them.
Is there a good or proper way to achieve this? I feel like I've been over a lot of the query API and haven't found anything that fits.
You can use a simple bool query with should clauses:
{ "query": { "bool": {
"minimum_should_match": 1,
"should" : [
{ "term" : { "f1" : "foo" } },
{ "term" : { "f2" : "ba" } }
]
} } }
The more clauses a document matches, the higher its score will be.
I was unable to edit the answer above, so I am posting the solution I derived from it here.
GET _search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"f1": {
"query": "foo",
"boost": 1.5
}
}
},
{
"match": {
"f1.autocomplete": {
"query": "foo",
"boost": 1.5
}
}
},
{
"match": {
"f2": {
"query": "ba",
"boost": 1
}
}
},
{
"match": {
"f2.autocomplete": {
"query": "ba",
"boost": 1
}
}
}
]
}
}
}
This gets me results that meet all of my criteria.
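Since the question mentions building the query programmatically, here is a minimal sketch of generating the same body from a per-field mapping. The function name and the shape of its argument are my own invention; the autocomplete subfield name is taken from the query above.

```python
def build_should_query(field_terms, subfield="autocomplete"):
    """Build a bool/should query with one boosted match per field and per subfield.

    field_terms maps a field name to a (term, boost) tuple.
    """
    should = []
    for field, (term, boost) in field_terms.items():
        for name in (field, f"{field}.{subfield}"):
            should.append({"match": {name: {"query": term, "boost": boost}}})
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


body = build_should_query({"f1": ("foo", 1.5), "f2": ("ba", 1)})
clauses = body["query"]["bool"]["should"]
print(len(clauses))  # 4
```

Each field is only ever searched with its own term, and every matching clause adds to the document's score.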

Custom ordering on elastic search

I'm executing a simple query which returns items matched by companyId.
In addition to only showing clients that match a specific company, I also want records matching a certain location to appear at the top. So if I somehow pass a pseudo sort like "location=Johannesburg", it would return the data below with the items that match that location on top, followed by items with other locations.
Data:
{
"clientId" : 1,
"clientName" : "Name1",
"companyId" : 8,
"location" : "Cape Town"
},
{
"clientId" : 2,
"clientName" : "Name2",
"companyId" : 8,
"location" : "Johannesburg"
}
Query:
{
"query": {
"match": {
"companyId": "8"
}
},
"size": 10,
"_source": {
"includes": [
"firstName",
"companyId",
"location"
]
}
}
Is something like this possible in Elasticsearch, and if so, what is this concept called? (I'm not even sure what to Google for to solve this problem.)
It can be done in different ways.
The simplest way (if you only need text matching) is to use a bool query with a should clause.
From the documentation: the bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
Example:
{
"query": {
"bool": {
"must": [
{ "match": { "companyId": "8" } }
],
"should": [
{ "match": { "location": "Johannesburg" } }
]
}
}
}
A more complex solution is to store geo points in location and use, for example, the distance feature query.
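To illustrate the must/should idea, here is a toy ranking in Python. It is only a simulation under a loud assumption: every matching clause contributes exactly 1 point, whereas real Elasticsearch scores depend on relevance. The toy_score function and the third client are hypothetical.

```python
def toy_score(doc, must, should):
    """Return None if a must clause fails, else 1 point per matching clause."""
    if any(doc.get(field) != value for field, value in must.items()):
        return None
    bonus = sum(1 for field, value in should.items() if doc.get(field) == value)
    return len(must) + bonus


clients = [
    {"clientId": 1, "companyId": 8, "location": "Cape Town"},
    {"clientId": 2, "companyId": 8, "location": "Johannesburg"},
    {"clientId": 3, "companyId": 9, "location": "Johannesburg"},  # hypothetical extra row
]

scored = [
    (toy_score(c, {"companyId": 8}, {"location": "Johannesburg"}), c["clientId"])
    for c in clients
]
# Drop documents that failed the must clause, then sort by score descending.
ranked = [cid for score, cid in sorted((s for s in scored if s[0] is not None), reverse=True)]
print(ranked)  # [2, 1]
```

Client 3 is excluded by the must clause, and client 2 rises above client 1 because it also matches the should clause.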

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the results by the value of a field that is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted by the status field in ascending alphabetical order. Status contains a few values such as [active, canceled, old, ...], and I need something like a boost for each possible value in the query, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do this? Thanks.
You would need a custom sort using a script to achieve what you want.
I've used a generic match_all query here; you can replace it with your own query logic. The part you are looking for is the sort section of the query below.
Make sure that status is a keyword type.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
In the above query, add your values to the scores section. For example, if you have a status new and want to give it a value of, say, 6, then your scores would look like this:
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":6
}
}
So the documents are sorted by _score first, and the script sort then orders the documents that share the same _score.
Note that the script sort uses desc order, since I understand that you want active documents at the top, followed by the other values. Feel free to play around with it.
Hope this helps!
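For anyone who wants to check the ordering logic outside Elasticsearch, here is a small Python mirror of the Painless script and the two sort keys. The sample hits are made up, and this only imitates the sort; it does not call Elasticsearch.

```python
STATUS_SCORES = {"active": 5, "old": 4, "cancelled": 3}
FALLBACK = 100000  # same default the script returns for statuses not in the map


def status_rank(status):
    """Mirror of the script: mapped value if present, otherwise the fallback."""
    return STATUS_SCORES.get(status, FALLBACK)


# Hypothetical hits, only to show the resulting order.
hits = [
    {"name": "a", "_score": 1.0, "status": "old"},
    {"name": "b", "_score": 1.0, "status": "active"},
    {"name": "c", "_score": 2.0, "status": "cancelled"},
    {"name": "d", "_score": 1.0, "status": "unknown"},
]

# Sort by _score desc, then by the script value desc, like the sort array above.
ordered = sorted(hits, key=lambda h: (h["_score"], status_rank(h["status"])), reverse=True)
print([h["name"] for h in ordered])  # ['c', 'd', 'b', 'a']
```

One thing this makes visible: with desc order, a status missing from the map gets 100000 and therefore sorts ahead of active among documents with the same _score. If unlisted statuses should come last instead, a low fallback such as 0 would do that.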

How to make use of `gt` and `fields` in the same query in Elasticsearch

In my previous question, I was introduced to the fields in a query_string query and how it can help me to search nested fields of a document.
{
"query": {
"query_string": {
"fields": ["*.id","id"],
"query": "2"
}
}
}
But it only works for matching, what if I want to do some comparison? After some reading and testing, it seems queries like range do not support fields. Is there any way I can perform a range query, e.g. on a date, over a field that can be scattered anywhere in the document hierarchy?
i.e. considering the following document:
{
"id" : 1,
"Comment" : "Comment 1",
"date" : "2016-08-16T15:22:36.967489",
"Reply" : [ {
"id" : 2,
"Comment" : "Inner comment",
"date" : "2016-08-16T16:22:36.967489"
} ]
}
Is there a query searching over the date field (like date > '2016-08-16T16:00:00.000000') which matches the given document, because of the nested field, without explicitly giving the address to Reply.date? Something like this (I know the following query is incorrect):
{
"query": {
"range" : {
"date" : {
"gte" : "2016-08-16T16:00:00.000000",
},
"fields": ["date", "*.date"]
}
}
}
The range query itself doesn't support it, however, you can leverage the query_string query (again) and the fact that you can wildcard fields and that it supports range queries in order to achieve what you need:
{
"query": {
"query_string": {
"query": "\\*date:[2016-08-16T16:00:00.000Z TO *]"
}
}
}
The above query will return your document because Reply.date matches *date.
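If you build this request from code, a small helper along these lines may save some escaping trouble. It is only a sketch and the function name is hypothetical. It escapes the wildcard in the field name with a backslash, which json.dumps then serializes as a double backslash in the request body.

```python
import json


def wildcard_range_query(field_pattern, lower_bound):
    """Build a query_string range query over every field matching field_pattern."""
    # Escape the field wildcard for the query_string syntax.
    escaped = field_pattern.replace("*", "\\*")
    return {"query": {"query_string": {"query": f"{escaped}:[{lower_bound} TO *]"}}}


body = wildcard_range_query("*date", "2016-08-16T16:00:00.000Z")
print(json.dumps(body))
```

Letting the JSON encoder handle the backslash avoids hand-writing an invalid escape sequence in the request body.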

ElasticSearch: Labelling documents with matching search term

I'm using elasticsearch 1.7 and am in need of a way to label documents with what part of a query_string query they match.
I've been experimenting with highlighting, but found that it gets a bit messy with some cases. I'd love to have the document tagged with matching search terms.
Here is the query that I'm using (note that this is a Ruby hash that later gets encoded to JSON):
{
query: {
query_string: {
fields: ["title^10", "keywords^4", "content"],
query: query_string,
use_dis_max: false
}
},
size: 20,
from: 0,
sort: [
{ pub_date: { order: :desc }},
{ _score: { order: :desc }}
]
}
The query_string variable is based off user followed topics and might look something like this: "(the AND walking AND dead) OR (iphone) OR (video AND games)"
Is there any option I can use so that the returned documents have a property indicating the matching search term, like the walking dead or (the AND walking AND dead)?
If you're ready to switch to bool/should queries, you can split the match per field and use named queries; in the results you'll then get the names of the queries that matched.
It basically goes like this: in a bool/should query, you add one query_string query per field and name each query so that it identifies that field (e.g. title_query for the title field, etc.):
{
"query": {
"bool": {
"should": [
{
"query_string": {
"fields": [
"title^10"
],
"query": "query_string",
"use_dis_max": false,
"_name": "title_query"
}
},
{
"query_string": {
"fields": [
"keywords^4"
],
"query": "query_string",
"use_dis_max": false,
"_name": "keywords_query"
}
},
{
"query_string": {
"fields": [
"content"
],
"query": "query_string",
"use_dis_max": false,
"_name": "content_query"
}
}
]
}
}
}
In the results, you'll then get below the _source another array called matched_queries which contains the name of the query that matched the returned document.
"_source": {
...
},
"matched_queries": [
"title_query"
],
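On the client side, turning matched_queries into labels is then just a matter of reading it off each hit. A minimal sketch in Python, using a hand-written response in place of a real one (the ids and titles are invented for the example):

```python
def label_hits(response):
    """Map each hit's _id to the names of the named queries it matched."""
    labels = {}
    for hit in response["hits"]["hits"]:
        # matched_queries is absent when no named query matched the hit.
        labels[hit["_id"]] = hit.get("matched_queries", [])
    return labels


# Hypothetical response fragment, shaped like a search response.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"title": "The Walking Dead"}, "matched_queries": ["title_query"]},
            {"_id": "2", "_source": {"title": "New iPhone"}, "matched_queries": ["keywords_query", "content_query"]},
            {"_id": "3", "_source": {"title": "Other"}},
        ]
    }
}

labels = label_hits(sample_response)
print(labels["2"])  # ['keywords_query', 'content_query']
```

The same approach would presumably work with one named clause per followed topic instead of per field, if the label you need is the topic rather than the field.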
