Elasticsearch: Intersection of array - elasticsearch

Suppose my documents are like this
{
"salary": {
"max": 1572,
"min": 682
},
"skills": [
"Modula-3",
"Max/MSP",
"Arduino",
"SPARK",
"PL/SQL",
"Processing",
"Go",
"Mathematica",
"Modula-2",
"IDL",
"Heron",
"Scheme"
],
"company": "Merck",
"experience": 0,
"role": "Airport Security Screener",
"cities": [
"Ahmedabad",
"Mangaluru",
"Malegaon",
"Bokaro Steel City",
"Vadodara",
"Kollam"
]
}
I want to run a query in which I provide a set of cities and get the documents back ordered by the cardinality of the intersection. For example, if my set of cities is ["Ahmedabad", "Mangaluru"], then the cardinality of the intersection of this query with the above document is 2. What should my query be?
Sample Response
{"_score": 4.0202227, "cities": ["Ahmedabad","Mangaluru","Visakhapatnam", "Vijayawada"]}
{"_score": 2.27, "cities": ["Ahmedabad","Visakhapatnam", "Vijayawada"]}
{"_score": 1.79, "cities": ["Mangalauru","Vijayawada", "delhi", "bombay"]}
I am using elasticsearch 5.2.2

Maybe something like this will help you?
{
"query": {
"function_score": {
"query": {
"match": {
"cities": "Ahmedabad Mangaluru"
}
},
"functions": [
{
"filter": {
"match": {
"cities": "Ahmedabad"
}
},
"weight": 1
},
{
"filter": {
"match": {
"cities": "Mangaluru"
}
},
"weight": 1
}
],
"score_mode": "sum"
}
}
}
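A minimal sketch, in Python, of how a query like the one above could be generated for an arbitrary list of cities. The helper name build_city_overlap_query is hypothetical. The "boost_mode": "replace" entry is an assumption added here and is not part of the answer above: by default function_score multiplies the combined function score with the relevance score of the inner query, so without it the final _score is not the bare intersection count.

```python
import json


def build_city_overlap_query(cities):
    """Build a function_score body with one weight-1 function per city.

    Hypothetical helper: with score_mode "sum" the function part of the
    score equals the number of requested cities found in the document.
    """
    return {
        "query": {
            "function_score": {
                # Only consider documents that contain at least one city.
                "query": {"match": {"cities": " ".join(cities)}},
                "functions": [
                    {"filter": {"match": {"cities": city}}, "weight": 1}
                    for city in cities
                ],
                "score_mode": "sum",
                # Assumption: drop the relevance score so that _score is
                # driven only by the summed weights.
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    body = build_city_overlap_query(["Ahmedabad", "Mangaluru"])
    print(json.dumps(body, indent=2))
```

Because match analyzes its input, a multi-word city such as "Bokaro Steel City" would probably match on any one of its words. If that matters, a match_phrase query or a not-analyzed (keyword) version of the field may be a better fit for the filters.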

Related

Cannot seem to use must and must_not together in an elastic search query

If I run the following query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "boxing",
"fuzziness": 2,
"minimum_should_match": 2
}
}
],
"must_not": [
{
"terms_set": {
"allowedCountries": {
"terms": ["gb", "mx"],
"minimum_should_match_script": {
"source": "2"
}
}
}
}
],
"filter": [
{
"range": {
"expireTime": {
"gt": 1674061907954
}
}
},
{
"term": {
"region": {
"value": "row"
}
}
},
{
"term": {
"sourceType": {
"value": "article"
}
}
}
]
}
}
}
against an index with articles that look like:
{
"_index": "content-items-v10",
"_type": "_doc",
"_id": "e7hm75ui4dma1mm4j8q5v7914",
"_score": 4.3724976,
"_source": {
"allowedCountries": ["gb", "ie"],
"body": "Both Joshua Buatsi and Craig Richards join The DAZN Boxing Show ahead of their clash at London's O2 Arena. Matchroom's Eddie Hearn also gives his take on the night, as well as Chantelle Cameron previewing her contest with Victoria Noelia Bustos.",
"competitions": [
{
"id": "8lo6205qyio0fksjx9glqbdhj",
"name": "Buatsi v Richards"
}
],
"contestants": [
{
"id": "7rq59j3eiamxlm12vhxcsgujj",
"name": "Joshua Buatsi"
},
{
"id": "boby9oqe23g6qyuwphrxh8su5",
"name": "Craig Richards"
}
],
"countries": [
{
"id": "7yasa43laq1nb2e6f8bfuvxed",
"name": "World"
},
{
"id": "258l9t5sm55592i08mdpqzr3t",
"name": "United Kingdom"
}
],
"dotsLastUpdateTime": 1673979749396,
"expireTime": 4800000000000,
"fixtureDate": {},
"headline": "Buatsi vs. Richards: Preview",
"id": "e7hm75ui4dma1mm4j8q5v7914",
"importance": 0,
"languageKeys": ["en"],
"languages": ["en"],
"lastUpdateTime": {
"ts": 1653088281000,
"iso8601": "2022-05-20T23:11:21.000Z"
},
"promoImageUrl": null,
"publication": {
"typeId": "1plcw0iyhx9vn1fcanbm2ja3rf",
"typeName": "Shoulder"
},
"publishedTime": {
"ts": 1653088281000,
"iso8601": "2022-05-20T23:11:21.000Z"
},
"region": "row",
"shortHeadline": null,
"sourceType": "article",
"sports": [
{
"id": "2x2oqzx60orpoeugkd754ga17",
"name": "Boxing"
}
],
"teaser": "",
"thumbnailImageUrl": "https://images.daznservices.com/di/library/babcock_canada/45/3e/the-dazn-boxing-show-20052022_xc4jbfqi022l1shq9lu641h9e.png?t=-477976832",
"translations": {}
}
}
I get the following validation error from elasticsearch:
{
"ok": false,
"errors": {
"validation": [
{
"message": "\"query.bool.must_not\" is not allowed",
"path": [
"query",
"bool",
"must_not"
],
"type": "object.unknown",
"context": {
"child": "must_not",
"label": "query.bool.must_not",
"value": [
{
"terms_set": {
"allowedCountries": {
"terms": [
"gb",
"mx"
],
"minimum_should_match_script": {
"source": "2"
}
}
}
}
],
"key": "must_not"
}
}
]
},
"correlationId": "d29e9275-9ab3-4ff8-944d-852b98d4b503"
}
And I cannot figure out what the issue might be! From the elastic docs it should be OK.
I'm using ElasticSearch 7.9.3 running in a local docker container.
I'm hoping someone out there will give me a clue!
Cheers!
I would expect this to just work.
I'm trying to filter out articles that have both of the country codes gb and mx in the field allowedCountries.
I can include them easily enough in the results when I add the terms_set query to the bool.must section of the query.
It works well, you just need to enclose your query in the query section
{
"query": { <--- add this
"bool": { <--- your query starts here
"must": [
...
Thank you for responding!
I was helping with a system I did not have full context on - it turns out there is a proxy in the mix with validation that was blocking the must_not query. So, with the proxy fixed, it now works.
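For reference, the intent of the must_not clause in the question can be modelled in plain Python. This is only a sketch of the terms_set semantics as I understand them (a document matches when at least minimum_should_match of the listed terms occur in the field); the function names and the second sample document are hypothetical.

```python
def terms_set_matches(field_values, terms, minimum_should_match):
    """Return True when at least `minimum_should_match` terms occur in the field."""
    matched = sum(1 for term in set(terms) if term in field_values)
    return matched >= minimum_should_match


def excluded_by_must_not(doc):
    # Mirrors the must_not clause: terms ["gb", "mx"], script source "2".
    return terms_set_matches(doc["allowedCountries"], ["gb", "mx"], 2)


if __name__ == "__main__":
    sample = {"allowedCountries": ["gb", "ie"]}      # the article shown above
    both = {"allowedCountries": ["gb", "mx", "us"]}  # hypothetical article
    print(excluded_by_must_not(sample))  # False: only "gb" matches
    print(excluded_by_must_not(both))    # True: both codes match
```

So the sample article stays in the results, and only articles that carry both gb and mx are filtered out.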

how to match multiple fields inside filter keyword in elastic search query?

I want to add one more field to the match inside the functions block of my query, but when I add it I get this error: "reason" : "[match] query doesn't support multiple fields, found [gender] and [id]"
How do I do it?
GET exp/_search
{
"_source": ["score","answer","gender","id"]
, "query": {
"function_score": {
"query": {
"match": {
"score": 10
}
},
"functions": [
{
"filter": {
"match":{
"gender":"male",
"id":1
}
},
"weight": 2
}
]
}
}
}
You can put a bool query inside the filter, which will resolve your issue. A match query does not support two different fields and values, but a bool query can combine one match clause per field.
{
"_source": [
"score",
"answer",
"gender",
"id"
],
"query": {
"function_score": {
"query": {
"match": {
"score": 10
}
},
"functions": [
{
"filter": {
"bool": {
"must": [
{
"match": {
"gender": "male"
}
},
{
"match": {
"id": 1
}
}
]
}
},
"weight": 2
}
]
}
}
}
Also, if you want to apply two different boost values for gender and id, you can give two functions, each with its own filter clause, as shown below:
{
"_source": [
"score",
"answer",
"gender",
"id"
],
"query": {
"function_score": {
"query": {
"match": {
"score": 10
}
},
"functions": [
{
"filter": {
"match": {
"gender": "male"
}
},
"weight": 2
},
{
"filter": {
"match": {
"id": 1
}
},
"weight": 1
}
]
}
}
}
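If the request body is built in client code, the two variants above can come from one small helper. This is a sketch in Python; bool_must_filter and weighted_function are hypothetical names, not part of any Elasticsearch client.

```python
def bool_must_filter(conditions):
    """Turn {"field": value, ...} into a bool filter with one match per field."""
    return {
        "bool": {
            "must": [{"match": {field: value}} for field, value in conditions.items()]
        }
    }


def weighted_function(conditions, weight):
    """Build one entry for the "functions" array of a function_score query."""
    if len(conditions) == 1:
        # A single field fits in a plain match query.
        field, value = next(iter(conditions.items()))
        return {"filter": {"match": {field: value}}, "weight": weight}
    # Several fields must be wrapped in a bool query, one match clause each.
    return {"filter": bool_must_filter(conditions), "weight": weight}


if __name__ == "__main__":
    # One combined boost, applied only when both fields match.
    print(weighted_function({"gender": "male", "id": 1}, 2))
    # Two independent boosts, one per field.
    print([weighted_function({"gender": "male"}, 2), weighted_function({"id": 1}, 1)])
```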

Bool AND search in properties in ElasticSearch

I've got a very small dataset of documents indexed in ES:
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I've got a wrong result.
ES gives
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took the property code=red from the first record and the property position=P from the second record.
How can I specify that the search must match both terms in the same record (whether or not it is inside a list of nested records)?
In fact, the only correct answer is document 1, John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index a document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly creates an inner object for your field (team in this example) and flattens it to a structure like
{
'team.code': ['red', 'green'],
'team.position': ['S', 'P']
}
So you lose the association between each code and its position. To avoid this, you need to explicitly define a nested mapping, index your documents as usual, and query them with a nested query.
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
POST so/nest/
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will return no hits, because no single team entry in this document has both code red and position P.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}
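The difference between the default object mapping and the nested mapping described above can be modelled in a few lines of Python. This is only an illustration of the matching logic, not of how Lucene stores the data; the function names are hypothetical and text analysis (for example lowercasing) is ignored.

```python
def as_list(value):
    """A single object and a list of objects are treated the same way."""
    return value if isinstance(value, list) else [value]


def flatten(doc, field):
    """Mimic the default object mapping: one multi-valued list per sub-field."""
    flat = {}
    for item in as_list(doc[field]):
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(doc, field, wanted):
    """Each condition is checked against the whole document."""
    flat = flatten(doc, field)
    return all(value in flat.get(f"{field}.{key}", []) for key, value in wanted.items())


def matches_nested(doc, field, wanted):
    """All conditions must hold inside the same sub-document."""
    return any(
        all(item.get(key) == value for key, value in wanted.items())
        for item in as_list(doc[field])
    )


if __name__ == "__main__":
    john = {"name": "John", "team": {"code": "red", "position": "P"}}
    sydney = {"name": "Sydney", "team": [{"code": "red", "position": "S"},
                                         {"code": "green", "position": "P"}]}
    wanted = {"code": "red", "position": "P"}
    for doc in (john, sydney):
        print(doc["name"], matches_flattened(doc, "team", wanted),
              matches_nested(doc, "team", wanted))
    # John True True
    # Sydney True False
```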

Nested filtering in elasticsearch with more than one term of the same nested type

I'm new to elasticsearch, so maybe my approach is plain wrong, but I want to make an index of recipes and allow the user to filter it down with the aggregated ingredients that are still found in the subset.
Maybe I'm using the wrong language to explain so maybe this example will clarify. I would like to search for recipes with the term salt; which results in three recipes:
1. with ingredients: salt, flour, water
2. with ingredients: salt, pepper, egg
3. with ingredients: water, flour, egg, salt
The aggregate on the results' ingredients returns salt, flour, water, pepper, egg. When I filter with flour I only want recipes 1 and 3 to appear in the search results (and the aggregate on ingredients should only return salt, flour, water and egg). When I add another filter, egg, I want only recipe 3 to appear (and the aggregate should only return water, flour, egg, salt).
I can't get the latter to work: one filter next to the default query does narrow down the results as desired, but when I add the other term (egg) to the terms filter, the results start to include recipe 2 as well, as if it were an OR filter. Adding AND as the filter execution, however, results in NO results... what am I doing wrong?
My mapping:
{
"recipe": {
"properties": {
"title": {
"analyzer": "dutch",
"type": "string"
},
"ingredients": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"analyzer": "dutch",
"include_in_parent": true,
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
My query:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"match": {
"_all": "salt"
}
}
]
}
},
"filter": {
"nested": {
"path": "ingredients",
"filter": {
"terms": {
"ingredients.name": [
"flour",
"egg"
],
"execution": "and"
}
}
}
}
}
},
"size": 50,
"aggregations": {
"ingredients": {
"nested": {
"path": "ingredients"
},
"aggregations": {
"count": {
"terms": {
"field": "ingredients.name.raw"
}
}
}
}
}
}
Why are you using a nested mapping here? Its main purpose is to keep relations between the sub-object attributes, but your ingredients field has just one attribute and can be modeled simply as a string field.
So, if you update your mapping like this:
POST recipes
{
"mappings": {
"recipe": {
"properties": {
"title": {
"type": "string"
},
"ingredients": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
You can still index your recipes like this:
{
"title":"recipe b",
"ingredients":["salt","pepper","egg"]
}
And this query gives you the result you are looking for:
POST recipes/recipe/_search
{
"query": {
"filtered": {
"query": {
"match": {
"_all": "salt"
}
},
"filter": {
"terms": {
"ingredients": [
"flour",
"egg"
],
"execution": "and"
}
}
}
},
"size": 50,
"aggregations": {
"ingredients": {
"terms": {
"field": "ingredients"
}
}
}
}
which is:
{
...
"hits": {
"total": 1,
"max_score": 0.22295055,
"hits": [
{
"_index": "recipes",
"_type": "recipe",
"_id": "PP195TTsSOy-5OweArNsvA",
"_score": 0.22295055,
"_source": {
"title": "recipe c",
"ingredients": [
"salt",
"flour",
"egg",
"water"
]
}
}
]
},
"aggregations": {
"ingredients": {
"buckets": [
{
"key": "egg",
"doc_count": 1
},
{
"key": "flour",
"doc_count": 1
},
{
"key": "salt",
"doc_count": 1
},
{
"key": "water",
"doc_count": 1
}
]
}
}
}
Hope this helps.
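The behaviour expected from the AND filter and the aggregation can be checked against a small in-memory model. This is a Python sketch, not Elasticsearch code: the titles "recipe b" and "recipe c" come from the answer above, "recipe a" is assumed for the first recipe in the question, and the function names are hypothetical.

```python
from collections import Counter

RECIPES = [
    {"title": "recipe a", "ingredients": ["salt", "flour", "water"]},
    {"title": "recipe b", "ingredients": ["salt", "pepper", "egg"]},
    {"title": "recipe c", "ingredients": ["water", "flour", "egg", "salt"]},
]


def filter_and(recipes, required):
    """Keep recipes that contain every required ingredient (terms filter, AND)."""
    return [r for r in recipes if set(required) <= set(r["ingredients"])]


def ingredient_buckets(recipes):
    """Count, per ingredient, how many of the given recipes contain it."""
    return dict(Counter(i for r in recipes for i in set(r["ingredients"])))


if __name__ == "__main__":
    for required in (["flour"], ["flour", "egg"]):
        hits = filter_and(RECIPES, required)
        print(required, [r["title"] for r in hits], ingredient_buckets(hits))
```

With ["flour"] the model keeps recipes a and c, and with ["flour", "egg"] only recipe c, which matches the single hit and the four buckets with doc_count 1 in the response above.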

Decay filter function for a no-limit value with ElasticSearch

I have the following documents (at least 1 000 000) in an ElasticSearch index:
{"title":"toto", "views":132, "likes":23, "date" : "2014-09-01..." ...}
The title is indexed with a language analyzer, the views and likes fields are integers ranging from 0 with no upper bound, and date is a date field.
I want to search by title and boost documents that are recent and have high views and likes counts.
I am using a decay function for the date (with today as the origin) and it works as expected, but I don't know how to boost on the views and likes fields, since they have no maximum value to use as an origin.
Here is my search query:
POST /threads/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "air france",
"type": "phrase",
"fields": [
"title^4",
"desc"
]
}
},
"functions": [
{
"exp": {
"date": {
"origin": "2014/09/29 13:00:00",
"scale": "12h",
"offset":"6h",
"decay":0.5
}
}
}
]
}
}
}
You could try a "field_value_factor" function, as per this section in the documentation. You would need to test and assess the results, adjust the "factor" and the boost you give to "title", then test again to see whether the ranking is getting closer to what you need. You can also add ?explain to the search request to see how ES computes the _score. Something like this:
POST /threads/_search?explain
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "air france",
"type": "phrase",
"fields": [
"title^8",
"desc"
]
}
},
"functions": [
{
"exp": {
"date": {
"origin": "2014/09/29 13:00:00",
"scale": "12h",
"offset":"6h",
"decay":0.5
}
}
},
{
"field_value_factor": {
"field": "views",
"modifier": "log2p",
"factor": 0.1
}
},
{
"field_value_factor": {
"field": "likes",
"modifier": "log2p",
"factor": 0.1
}
}
]
}
}
}
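To get a feel for the numbers before tuning "factor", the two kinds of functions used above can be reproduced outside Elasticsearch. This is a rough Python sketch based on the formulas in the function_score documentation as I understand them: exp decay is exp(lambda * max(0, distance - offset)) with lambda = ln(decay) / scale, and log2p is log10(factor * value + 2). The helper names are hypothetical and distances are expressed in hours.

```python
import math


def exp_decay(distance, scale, offset=0.0, decay=0.5):
    """Exponential decay: 1.0 inside the offset, `decay` at offset + scale."""
    lam = math.log(decay) / scale
    return math.exp(lam * max(0.0, distance - offset))


def log2p_factor(value, factor=1.0):
    """field_value_factor with modifier log2p: log10(factor * value + 2)."""
    return math.log10(value * factor + 2)


if __name__ == "__main__":
    # Same parameters as the query: scale 12h, offset 6h, decay 0.5.
    for hours in (0, 6, 18, 30):
        print(hours, round(exp_decay(hours, scale=12, offset=6), 3))
    # Same parameters as the query: modifier log2p, factor 0.1.
    for views in (0, 132, 10_000, 1_000_000):
        print(views, round(log2p_factor(views, factor=0.1), 3))
```

Since the logarithm grows slowly, a document with a million views gets a factor only a few times larger than one with a hundred views, which is why an unbounded field needs no maximum value here. How the individual factors are combined depends on score_mode and boost_mode, both of which default to multiply.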
