In ElasticSearch, how do I filter the nested documents in my result?

Suppose, in ElasticSearch 5, I have data with nesting like:
{"number":1234, "names": [
{"firstName": "John", "lastName": "Smith"},
{"firstName": "Al", "lastName": "Jones"}
]},
...
And I want to query for hits with number 1234 but return only the names that match "lastName": "Jones", so that my result omits names that don't match. In other words, I want to get back only part of the matching document, based on a term query or similar.
A simple nested query won't do, since it only filters top-level results. Any ideas?
{ "query" : { "bool": { "filter":[
{ "term": { "number":1234} },
???? something with "lastName": "Jones" ????
] } } }
I want back:
hits: [
{"number":1234, "names": [
{"firstName": "Al", "lastName": "Jones"}
]},
...
]

The hits section returns _source, which is exactly the document you indexed.
You are right that a nested query filters top-level results, but with inner_hits it also shows which nested objects caused each top-level document to be returned, and this is exactly what you need.
The names field can be excluded from the top-level hits using the _source parameter.
{
"_source": {
"excludes": ["names"]
},
"query":{
"bool":{
"must":[
{
"term":{
"number":{
"value":"1234"
}
}
},
{
"nested":{
"path":"names",
"query":{
"term":{
"names.lastName":"Jones"
}
},
"inner_hits":{
}
}
}
]
}
}
}
So now the top-level documents are returned without the names field, and you have an additional inner_hits section with the names that match.
You should treat nested objects as part of a top-level document.
If you really need them to be separate, consider parent/child relations.
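As a rough sketch of how a client could stitch such a result back into the shape asked for in the question, the Python below sets each hit's names to the objects found under inner_hits. The sample_response dict is hand-written for illustration (it is not real Elasticsearch output), attach_inner_hits is a hypothetical helper name, and the code assumes the default inner_hits name (the nested path) and that each inner hit's _source holds just the nested object, which is worth verifying on your version.

```python
def attach_inner_hits(response, path):
    """Return each top-level _source with `path` set to the matching nested objects."""
    docs = []
    for hit in response["hits"]["hits"]:
        doc = dict(hit.get("_source", {}))  # copy so the response is not mutated
        inner = hit.get("inner_hits", {}).get(path, {})
        matched = inner.get("hits", {}).get("hits", [])
        doc[path] = [m["_source"] for m in matched]
        docs.append(doc)
    return docs


# Hand-written sample shaped like a search response (illustrative only).
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {"number": 1234},
                "inner_hits": {
                    "names": {
                        "hits": {
                            "hits": [
                                {"_source": {"firstName": "Al", "lastName": "Jones"}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(attach_inner_hits(sample_response, "names"))
```

The helper copies each _source before adding the names key, so the original response stays untouched.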

Try something like this
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{ "term": { "number":1234} },
{
"nested": {
"path": "names",
"query": {
"term": {
"names.lastName": "Jones"
}
},
"inner_hits" : {}
}
}
]
}
}
}
}
}
I used this reference.
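Note that the filtered query was removed in Elasticsearch 5.0, which is the version the question targets, so the same idea has to be expressed there with bool and a filter clause. Below is a minimal Python sketch that builds such a request body as a dict. build_nested_filter_query is a hypothetical helper name, the field names come from the question, and the term clause assumes names.lastName is mapped as keyword (with an analyzed text field, a term query for "Jones" would not match the lowercased token).

```python
import json


def build_nested_filter_query(number, last_name):
    """Build a bool/filter request body with a nested clause that reports inner hits."""
    return {
        "_source": {"excludes": ["names"]},
        "query": {
            "bool": {
                "filter": [
                    {"term": {"number": number}},
                    {
                        "nested": {
                            "path": "names",
                            "query": {"term": {"names.lastName": last_name}},
                            "inner_hits": {},
                        }
                    },
                ]
            }
        },
    }


body = build_nested_filter_query(1234, "Jones")
print(json.dumps(body, indent=2))
```

Putting both clauses under filter skips scoring, which fits the question since it only asks for exact matches.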

A similar but slightly different approach: put the nested query in should and then look at inner_hits for the names. This returns the top-level document whether or not any name matches, and inner_hits contains whichever names did match.
{
"_source": {
"excludes": ["names"]
},
"query":{
"bool":{
"must":[
{
"term":{
"number":{
"value":"1234"
}
}
}
],
"should": [
{
"nested":{
"path":"names",
"query":{
"term":{
"names.lastName":"Jones"
}
},
"inner_hits":{
}
}
}
]
}
}
}

Related

how to combine and with or in an elastic search query

I'm working on a product search function using elasticsearch and am having trouble figuring out how to represent the following logic in a nested query:
(A or B) && (C or D)
I want this to work like a traditional programming language, where a product must match at least one condition from each set of or conditions to be a match (i.e., I don't want the or conditions to just boost the score; I want products that don't match at least one condition in each set to be excluded).
In my particular case A,B,C,D are all tests against a nested property (a list of category records).
Here are two sample index records to illustrate:
{
"ProductId":1111,
"Name":"First Product",
"AllCategories":[
{"CatId":15,"CatName":"Some Tag Name", "ParentCatId":99, "ParentCatName":"Tags"},
{"CatId":352,"CatName":"Some child menu", "ParentCatId":88, "ParentCatName":"Some parent menu"}
]
},
{
"ProductId":2222,
"Name":"Second Product",
"AllCategories":[
{"CatId":20,"CatName":"Some Tag Name2", "ParentCatId":99, "ParentCatName":"Tags"},
{"CatId":352,"CatName":"Some child menu", "ParentCatId":88, "ParentCatName":"Some parent menu"}
]
}
I've tried lots of different variants of my query but haven't been able to find one that works the way I want. This ticket is asking the same basic question, but the only provided answer isn't working for me (my code below is modeled after the answer from this ticket: Elastic search combine two must with OR).
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"AllCategories",
"query":{
"bool": {
"must": [
{
"bool":{
"minimum_should_match": 1,
"should":[
{"term":{"AllCategories.CatId":{"value":352} } },
{"term":{"AllCategories.ParentCatId":{"value":352} } }
]
}
},
{
"bool":{
"minimum_should_match": 1,
"should":[
{"term":{"AllCategories.CatId":{"value":15} } },
{"term":{"AllCategories.CatId":{"value":8 } } }
]
}
}
]
}
}
}
}]
}
}
}
UPDATE:
Based on the posted answer I reformatted the query as follows but it's still not working for me. It's the second bool inside the nested query that's causing the trouble. I wonder if it might be an issue with testing the same field in the nested subquery in both conditions (AllCategories.CatId):
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"AllCategories",
"query":{
"bool": {
"minimum_should_match": 2,
"should": [
{
"bool":{
"minimum_should_match": 1,
"should":[
{"term":{"AllCategories.CatId":{"value":352} } },
{"term":{"AllCategories.ParentCatId":{"value":352} } }
]
}
},
{
"bool":{
"minimum_should_match": 1,
"should":[
{"term":{"AllCategories.CatId":{"value":15} } },
{"term":{"AllCategories.CatId":{"value":8 } } }
]
}
}
]
}
}
}
}]
}
}
}
This is the mapping for the index in question
{
"mappings": {
"properties": {
"ProductId": { "type": "integer" },
"Name": { "type": "text" },
"AllCategories": {
"type": "nested",
"properties": {
"CatId": { "type": "integer" },
"ParentCatId": { "type": "integer" },
"CatName": { "type": "text" },
"ParentCatName": { "type": "text" }
}
},
"SalesRank": { "type": "integer" }
}
}
}
Using the sample products, I want the search to return product 1111 but not product 2222. Product 1111 contains at least one of cat 15 and cat 8; product 2222 contains neither. Both products satisfy the first boolean condition, as they are both linked to cat 352. In my current testing, the second bool/should condition causes the search to return no results. If I remove it, I get matches.
This is a simplified version of what I'm running for one of my APIs.
The outer bool contains a should and a minimum_should_match of 2 (this is your AND clause).
Inside that should are two bool statements, each containing one of the OR clauses.
Each OR clause is a should with a minimum_should_match of 1.
Note: fieldA and fieldB are two distinct fields, and values a to d are the values you want to test.
//GET /index/type/_search
{
"from": 0,
"size": 1000,
"query": {
"bool": {
"minimum_should_match": 2,
"should": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"fieldA": "value a"
}
},
{
"term": {
"fieldA": "value b"
}
}
]
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"fieldB": "value c"
}
},
{
"term": {
"fieldB": "value d"
}
}
]
}
}
]
}
}
}
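The answer above tests flat fields. With a nested field such as AllCategories in the question, a nested query is evaluated against one nested object at a time, so putting both OR groups inside a single nested query asks for one category record that has CatId or ParentCatId 352 and also CatId 15 or 8, which no record in the samples satisfies. One way around this, sketched below in Python, is to give each OR group its own nested clause inside the outer bool must. The names or_group, matches and matches_single_nested are hypothetical, and the two matches functions are only small in-memory imitations of nested matching, not Elasticsearch itself.

```python
def or_group(path, alternatives):
    """One nested clause: any (field, value) pair may hit a single nested object."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "should": [
                        {"term": {f"{path}.{field}": value}}
                        for field, value in alternatives
                    ],
                    "minimum_should_match": 1,
                }
            },
        }
    }


groups = [
    [("CatId", 352), ("ParentCatId", 352)],  # A or B
    [("CatId", 15), ("CatId", 8)],           # C or D
]
query = {"query": {"bool": {"must": [or_group("AllCategories", g) for g in groups]}}}


def matches(product, groups):
    """Separate nested clauses: each group needs some nested object that satisfies it."""
    return all(
        any(cat.get(field) == value
            for cat in product["AllCategories"] for field, value in group)
        for group in groups
    )


def matches_single_nested(product, groups):
    """Single nested clause: one nested object has to satisfy every group."""
    return any(
        all(any(cat.get(f) == v for f, v in group) for group in groups)
        for cat in product["AllCategories"]
    )


products = [
    {"ProductId": 1111, "AllCategories": [
        {"CatId": 15, "ParentCatId": 99}, {"CatId": 352, "ParentCatId": 88}]},
    {"ProductId": 2222, "AllCategories": [
        {"CatId": 20, "ParentCatId": 99}, {"CatId": 352, "ParentCatId": 88}]},
]
print([p["ProductId"] for p in products if matches(p, groups)])
print([p["ProductId"] for p in products if matches_single_nested(p, groups)])
```

In this imitation the separate-clause form selects only product 1111, while the single-clause form selects nothing, which mirrors the empty result described in the question.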

Elastic - Multiple filter query syntax

Hello I have the following query that I am running:
{
"_source": [
"source1",
"source2",
"source3",
"source4",
],
"query": {
"bool": {
"minimum_should_match": 1,
"must": {
"filter": [
{
"term": {
"_type": {
"value": "someval1"
}
}
},
{
"term": {
"_type": {
"value": "someval2"
}
}
}
],
"query_string": {
"analyze_wildcard": "true",
"query": "tesla*",
"rewrite": "scoring_boolean"
}
}
}
},
"size": 50,
"sort": [
"_score"
]
}
That is currently returning:
'"reason":"[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]","line":1,"col":343},"status":400}'
Any idea how to use multiple filters on a query? I was able to do it just fine on elastic 2.4 but since OR is now deprecated as well as filtered, I am a bit lost.
Thanks!
The syntax of the query is wrong. filter should not be wrapped inside the must statement; it should be at the same level as must. Also, must should contain query clauses (typically as an array), not an object that mixes filter and query_string keys. So your query should look like this:
{
"_source":[
"source1",
"source2",
"source3",
"source4"
],
"query":{
"bool":{
"minimum_should_match":1,
"must":[
{
"query_string":{
"analyze_wildcard":"true",
"query":"tesla*",
"rewrite":"scoring_boolean"
}
}
],
"filter":{
"bool":{
"should":[
{
"term":{
"_type":{
"value":"someval1"
}
}
},
{
"term":{
"_type":{
"value":"someval2"
}
}
}
]
}
}
}
},
"size":50,
"sort":[
"_score"
]
}
I assumed your filters are meant to be combined with OR, which is why I wrapped them inside should.
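Since both filter clauses test the same field, the OR could also be written as a single terms query, which matches documents whose field equals any of the listed values. A small Python sketch of that variant follows; any_of is a hypothetical helper name and the values are the placeholders from the question.

```python
def any_of(field, values):
    """Build a terms query: matches when `field` equals any value in `values`."""
    return {"terms": {field: list(values)}}


body = {
    "query": {
        "bool": {
            "must": [{"query_string": {"query": "tesla*", "analyze_wildcard": True}}],
            "filter": [any_of("_type", ["someval1", "someval2"])],
        }
    }
}
print(body["query"]["bool"]["filter"])
```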

Elasticsearch get all parents with no children

Originally I've been trying to get a list of parents and a single most recent child for each one of them. I've figured out how to do that with the following query:
{"query":
{"has_child":
{"inner_hits":
{"name": "latest", "size": 1, "sort":
[{"started_at": {"order": "desc"}}]
},
"type": "child_type",
"query": {"match_all": {}}
}
}
}
But the problem is that the results do not include parents with no children. Adding min_children: 0 doesn't help either. So I thought I could make a query for all parents with no children and combine the two in a single OR query. But I'm having trouble building such a query. I would appreciate any suggestions.
Here is your query:
{
"query":{
"bool":{
"should":[
{
"bool":{
"must_not":[
{
"has_child":{
"type":"child_type",
"query":{
"match_all":{}
}
}
}
]
}
},
{
"has_child":{
"inner_hits":{
"name":"latest",
"size":1, "sort":[{"started_at": {"order": "desc"}}]
},
"type":"child_type",
"query":{
"match_all":{}
}
}
}
]
}
}
}
Another point: using must_not with has_child alone returns not only parents without children but also all the child documents, because children have no children of their own.
So another restriction should be added to the bool query:
{
"query":{
"bool": {
"must_not": [
{
"has_child": {
"type": "<child-type>",
"query": {
"match_all": {}
}
}
}
],
"should": [
{
"term": {
"<the join field>": {
"value": "<parent-type>"
}
}
}
]
}
}
}
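To see why the extra restriction matters, here is a tiny in-memory imitation in Python. It is only an illustration of the logic, not of how Elasticsearch executes has_child, and the field names relation and parent are made up for the example.

```python
# Three hypothetical documents: two parents, one child belonging to p1.
docs = [
    {"id": "p1", "relation": "parent"},
    {"id": "p2", "relation": "parent"},
    {"id": "c1", "relation": "child", "parent": "p1"},
]


def has_child(doc, docs):
    """True if any document names `doc` as its parent."""
    return any(d.get("parent") == doc["id"] for d in docs)


# must_not has_child on its own: children slip through, since they have no children.
without_children = [d["id"] for d in docs if not has_child(d, docs)]

# Adding the restriction on the relation field keeps only parent documents.
parents_without_children = [
    d["id"] for d in docs
    if not has_child(d, docs) and d["relation"] == "parent"
]

print(without_children)
print(parents_without_children)
```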

Elastic search filtered query, query part being ignored?

I'm building up the following search in code. The idea is that it filters down the set of matches and then queries that set, so I can add score based on certain fields. For some reason the filter part works, but whatever I put in the query (e.g. below I query sdfsdfsdf, which does not exist in my index) it still returns everything matching the filter.
Is the syntax wrong?
{
"query":{
"filtered":{
"query":{
"bool":{
"must":{
"match":{
"sdfsdfsdf":{
"query":"4",
"boost":2.0
}
}
}
},
"filter":{
"bool":{
"must":[
{
"terms":{
"_id":[
"55f93ead5df34f1900abc20b",
"55f8ab0226ec4bb216d7c938",
"55dc4e949dcf833308c63d6b"
]
}
},
{
"range":{
"published_date":{
"lte":"now"
}
}
}
],
"must_not":{
"terms":{
"_id":[
"55f0a799acccc28204a5058c"
]
}
}
}
}
}
}
}
}
Your filter is not at the right level. It should not be inside query but at the same level as query like this:
{
"query": {
"filtered": {
"query": { <--- query and filter at the same level
"bool": {
"must": {
"match": {
"sdfsdfsdf": {
"query": "4",
"boost": 2
}
}
}
}
},
"filter": { <--- query and filter at the same level
"bool": {
"must": [
{
"terms": {
"_id": [
"55f93ead5df34f1900abc20b",
"55f8ab0226ec4bb216d7c938",
"55dc4e949dcf833308c63d6b"
]
}
},
{
"range": {
"published_date": {
"lte": "now"
}
}
}
],
"must_not": {
"terms": {
"_id": [
"55f0a799acccc28204a5058c"
]
}
}
}
}
}
}
}
You need to replace sdfsdfsdf with an existing field name in your type, e.g. title; otherwise I think it will fall back to a match_all query.
"match":{
"title":{
"query": "some text here",
"boost":2.0
}
}

multiple search conditions in one query in es and distinguish the items according to the conditions

For one case I need to put multiple search conditions in one query to reduce the number of queries we need.
However, I need to distinguish the returning items based on the conditions.
Currently I achieved this goal by using function score query, specifically: each condition is assigned with a score, and I can differentiate the results based on those scores.
However, the performance is not that good. Plus now we need to get the doc count of each condition.
So is there any way to do it? I'm thinking using aggregation, but not sure if I can do it.
Thanks!
update:
curl -X GET 'localhost:9200/locations/_search?fields=_id&from=0&size=1000&pretty' -d '{
"query":{
"bool":{
"should":[
{
"filtered":{
"filter":{
"bool":{
"must":[{"term":{"city":"new york"}},{"term":{"state":"ny"}}]
}
}
}
},
{
"filtered":{
"filter":{
"bool":{
"must":[{"term":{"city":"los angeles"}},{"term":{"state":"ca"}}]
}
}
}
}
]
}
}}'
Well, to answer the first part of your question, named queries are the best fit.
For example:
{
"query": {
"bool": {
"should": [
{
"match": {
"field1": {
"query": "qbox",
"_name": "firstQuery"
}
}
},
{
"match": {
"field2": {
"query": "hosted Elasticsearch",
"_name": "secondQuery"
}
}
}
]
}
}
}
This will return an additional field called matched_queries for each hit, listing the queries that matched that document.
You can find more info on named queries here
But this information can't be used for aggregation.
So you need to handle the second part of your question separately.
A filter aggregation for each query type would be the ideal solution here.
For example:
{
"query": {
"bool": {
"should": [
{
"match": {
"text": {
"query": "qbox",
"_name": "firstQuery"
}
}
},
{
"match": {
"source": {
"query": "elasticsearch",
"_name": "secondQuery"
}
}
}
]
}
},
"aggs": {
"firstQuery": {
"filter": {
"term": {
"text": "qbox"
}
}
},
"secondQuery": {
"filter": {
"term": {
"source": "elasticsearch"
}
}
}
}
}
You can find more on filter aggregation here
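A minimal Python sketch of how the two pieces could be wired together on the client side: one helper builds the named should clauses and a filter aggregation per condition, and another reads matched_queries and doc_count back out of a response. build_request and summarize are hypothetical helper names, the sample_response dict is hand-written for illustration rather than real output, and reusing the match query inside the filter aggregation assumes a version where that aggregation accepts any query (2.0 and later).

```python
def build_request(conditions):
    """conditions maps a name to (field, text); each gets a named clause and a filter agg."""
    should, aggs = [], {}
    for name, (field, text) in conditions.items():
        should.append({"match": {field: {"query": text, "_name": name}}})
        # Reuse the same match query (without _name) so the count covers the same condition.
        aggs[name] = {"filter": {"match": {field: text}}}
    return {"query": {"bool": {"should": should}}, "aggs": aggs}


def summarize(response):
    """Map hit ids to matched query names, and aggregation names to doc counts."""
    per_hit = {h["_id"]: h.get("matched_queries", []) for h in response["hits"]["hits"]}
    counts = {
        name: agg["doc_count"]
        for name, agg in response.get("aggregations", {}).items()
    }
    return per_hit, counts


request = build_request(
    {"firstQuery": ("text", "qbox"), "secondQuery": ("source", "elasticsearch")}
)

# Hand-written sample shaped like a search response (illustrative only).
sample_response = {
    "hits": {"hits": [
        {"_id": "1", "matched_queries": ["firstQuery"]},
        {"_id": "2", "matched_queries": ["firstQuery", "secondQuery"]},
    ]},
    "aggregations": {"firstQuery": {"doc_count": 2}, "secondQuery": {"doc_count": 1}},
}
print(summarize(sample_response))
```

Building the aggregation filter from the same clause as the named query keeps the per-condition counts aligned with what matched_queries reports, which a separately written term filter may not do on analyzed fields.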

Resources