Elasticsearch nested object mapping and query for search

I am trying to use Elasticsearch and I am stuck trying to query a nested object.
Basically, my object has the following format:
{
"name" : "Some Name",
"field2": [
{
"prop1": "val1",
"prop2": "val2"
},
{
"prop1": "val3",
"prop2": "val4"
}
]
}
The mapping I used for the nested field is the following:
PUT /someval/posts/_mapping
{
"posts": {
"properties": {
"field2": {
"type": "nested"
}
}
}
}
Say I now insert documents at /someval/posts/1, /someval/posts/2, and so on. I have K values for field2.prop1, and I want a query that returns the posts sorted by how many of those K values match field2.prop1. What would be the appropriate query for that?
I also tried a simple filter, but even that doesn't seem to work right.
GET /someval/posts/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
},
"filter" : {
"nested" : {
"path" : "field2",
"filter" : {
"bool" : {
"must" : [
{
"term" : {"field2.prop1" : "val1"}
}
]
}
},
"_cache" : true
}
}
}
}
The above query should match at least the first post, but it returns no match. Can anyone help clarify what's wrong here?

There was a problem in your JSON structure: you used a filtered query, but the filter object was at a different level than the query. Both have to sit inside filtered.
Compare with the corrected version:
POST /someval/posts/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "field2",
"filter": {
"bool": {
"must": [
{
"term": {
"field2.prop1": "val1"
}
}
]
}
},
"_cache": true
}
}
}
}
}
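The ranking part of the question (sort posts by how many of the K values match field2.prop1) is not covered by the query above. A minimal sketch of one way to approach it, assuming Elasticsearch 5.0 or later where the filtered query no longer exists: build one nested clause per candidate value inside a bool should, and wrap each term in constant_score so every matching clause contributes the same score. The helper name ranked_nested_query and the value "val9" are made up for illustration, and since term queries are not analyzed, the values may need to match the indexed tokens exactly.

```python
import json


def ranked_nested_query(path, field, values):
    """Build a bool/should body with one nested clause per candidate value."""
    clauses = []
    for value in values:
        clauses.append({
            "nested": {
                "path": path,
                "query": {
                    # constant_score gives each matching clause the same score,
                    # so a post matching more values gets a higher total score.
                    "constant_score": {
                        "filter": {"term": {field: value}}
                    }
                },
            }
        })
    # minimum_should_match=1 drops posts that match none of the values.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


# Hypothetical candidate values; "val9" does not occur in the sample document.
body = ranked_nested_query("field2", "field2.prop1", ["val1", "val3", "val9"])
print(json.dumps(body, indent=2))
```

The resulting body can be sent to the _search endpoint as is. Results come back in the default order, by descending score.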

Related

Elasticsearch: "must" query on nested fields

How to do a "must" "match" query on multiple fields under the same nesting? Here's a reproducible ES index where the "user" field is defined as "nested" type.
PUT my_index
{
"mappings": {
"properties": {
"user": {
"type": "nested",
"properties": {
"firstname": {"type": "text"}
}
}
}
}
}
And here are 2 documents:
PUT my_index/_doc/1
{
"user" : [
{
"firstname" : "John"
},
{
"firstname" : "Alice"
}
]
}
PUT my_index/_doc/2
{
"user" : [
{
"firstname" : "Alice"
}
]
}
For this index, how can I query for documents where "John" AND "Alice" both exist? With the index defined above, I expect to get Document 1 but not Document 2. So far, I've tried the following code, but it's returning no hits:
GET my_index/_search
{
"query": {
"nested": {
"path": "user",
"query": {
"bool": {
"must": [
{"match": {"user.firstname": "John"}},
{"match": {"user.firstname": "Alice"}}
]
}
}
}
}
}
The query below is what is required.
POST my_index/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "user",
"query": {
"match": {
"user.firstname": "alice"
}
}
}
},
{
"nested": {
"path": "user",
"query": {
"match": {
"user.firstname": "john"
}
}
}
}
]
}
}
}
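Notice that I've used two nested queries in a single must clause. That is because, in your documents, the alice and john entries are separate nested objects, and each nested object is treated as its own document. A single nested query containing both match clauses therefore requires one nested object to match both names.
The query you have would work if your document structure were something like the following: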
POST my_index/_doc/3
{
"user" : [
{
"firstname" : ["Alice", "John"]
}
]
}
The documentation for the nested datatype and for the nested query explains in more detail how they work. The nested query documentation includes this note:
The nested query searches nested field objects as if they were indexed
as separate documents.
Hope that helps!
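For more than two names, the same pattern can be generated instead of written by hand. This is only a sketch: the helper name all_names_query is invented here, and it does nothing beyond producing one nested clause per name inside a single must.

```python
import json


def all_names_query(path, field, names):
    """Require every name to match in some nested object of the same parent."""
    must = [
        # One nested query per name: each name may match a different nested object.
        {"nested": {"path": path, "query": {"match": {field: name}}}}
        for name in names
    ]
    return {"query": {"bool": {"must": must}}}


body = all_names_query("user", "user.firstname", ["John", "Alice"])
print(json.dumps(body, indent=2))
```

With the two sample documents, this body is equivalent to the hand-written query above, so it should return document 1 only.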

Finding docs that do not contain a given user in all nested fields

I have the following mapping for a nested field called ratings:
"ratings" : {
"type" : "nested",
"properties" : {
"rating" : {
"type" : "double"
},
"user_id" : {
"type" : "long"
}
}
}
I'm attempting to find all records where a given user_id does not exist in any of the nested documents.
Here's what I have, but it fails when there are multiple nested docs: the record matches as soon as any one of them is not user_id 1.
{
"nested": {
"path": "ratings",
"query": {
"bool": { "must_not": [
{ "term": { "ratings.user_id": 1}}
]}}}}
If I'm understanding you correctly, and what you are trying to do is find documents for which NONE of the nested documents have a specific user_id, then this query seems to do what you want (assuming you want docs that have not been rated by user 2):
POST /test_index/_search
{
"query": {
"constant_score": {
"filter": {
"not": {
"filter": {
"nested": {
"path": "ratings",
"filter": {
"term": {
"ratings.user_id": 2
}
}
}
}
}
}
}
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/afd319e64403a7f995cbf1e9f40e5c5948729193
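On newer Elasticsearch versions the not filter is no longer available, and the same idea is usually written as a bool must_not wrapped around the nested query. The sketch below builds that body and also models the two semantics in plain Python, to show why putting must_not inside the nested query gives the wrong answer. The function names and the sample ratings are assumptions for illustration.

```python
def none_rated_by_query(user_id, path="ratings", field="ratings.user_id"):
    """Match parents for which no nested object has the given user_id."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"nested": {"path": path, "query": {"term": {field: user_id}}}}
                ]
            }
        }
    }


def must_not_inside_nested(ratings, user_id):
    # Models the question's query: true if ANY nested object is not by user_id.
    return any(r["user_id"] != user_id for r in ratings)


def must_not_around_nested(ratings, user_id):
    # Models the intended meaning: true only if NO nested object is by user_id.
    return not any(r["user_id"] == user_id for r in ratings)


# Hypothetical parent document rated by users 1 and 7.
ratings = [{"user_id": 1, "rating": 4.0}, {"user_id": 7, "rating": 2.5}]
print(must_not_inside_nested(ratings, 1))  # True: the second rating is not by user 1
print(must_not_around_nested(ratings, 1))  # False: user 1 did rate this document
```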

How to add properties from a root object in a nested object for sorting?

A simplified example of the kind of document in our index:
{
"organisation" : {
"code" : "01310"
},
"publications" : [
{
"dateEnd" : 1393801200000,
"dateStart" : 1391986800000,
"code" : "PUB.02"
},
{
"dateEnd" : 1401055200000,
"dateStart" : 1397512800000,
"code" : "PUB.06"
}
]
}
Note that publications are mapped as nested objects because we need to filter based on a combination of the dateEnd, dateStart and publicationStatus properties.
The PUB.02 status code is special. It states: 'this publication period is valid if the current user is a member of the organisation'.
I have a problem when I want to sort on 'most recent':
{
"sort": {
"publications.dateStart" : {
"mode" : "min",
"order" : "desc",
"nested_filter" : {
"or" : [
{
"and" : [
{ "term" : { "organisation.code" : "01310" } },
{ "term" : { "publications.code" : "PUB.02" } }
]
},
{ "term" : { "publications.code" : "PUB.06" } }
]
}
}
}
}
No error is given, but the PUB.02 entry is ignored. I tried to use copy_to in my mapping to copy the value of organisation.code to the nested object, but that did not help.
Is there a way to reach for the parent document inside a nested sort?
Alternatively, is there a way to copy data from parent to the nested document?
I am currently using version 1.7 of Elasticsearch without the ability to use scripts. Upgrading to a newer version could be done if that would help the situation.
This gist shows that the sort is performed on the PUB.06 publications: https://gist.github.com/EECOLOR/2db9a1ec9d6d5c791ea6
Although the documentation does not explicitly say so, it does look like we cannot access a parent field in a nested filter context.
I also wasn't able to use copy_to to add data from a root/parent field to the nested document. I would suggest asking on the Elasticsearch discuss forum; you may have more luck finding out the reasons there.
Before some trigger-happy bloke downvotes this answer, I would like to add that the results the OP wanted from the sort can be achieved with a function_score workaround.
One implementation to achieve this is as follows:
1) Start off with a bool should query.
2) In the first should clause:
a) use a filtered query to select documents with `organisation.code : 01310`
b) then score these documents by the max value of the reciprocal of the nested document's **dateStart**, for nested documents with code **PUB.02** or **PUB.06**
3) In the second should clause:
a) use a filtered query to select documents whose `organisation.code` is not equal to `01310`
b) as before, score these documents by the max value of the reciprocal of the nested document's **dateStart**, for nested documents with code **PUB.06** only
Example query:
POST /testindex/testtype/_search
{
"query": {
"bool": {
"should": [
{
"filtered": {
"filter": {
"term": {
"organisation.code": "01310"
}
},
"query": {
"nested": {
"path": "publications",
"query": {
"filtered": {
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "publications.dateStart",
"modifier": "reciprocal"
}
}
],
"boost_mode": "replace",
"score_mode": "max"
}
},
"filter": {
"terms": {
"publications.code": [
"PUB.02",
"PUB.06"
]
}
}
}
},
"score_mode": "max"
}
}
}
},
{
"filtered": {
"filter": {
"not": {
"term": {
"organisation.code": "01310"
}
}
},
"query": {
"nested": {
"path": "publications",
"query": {
"filtered": {
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "publications.dateStart",
"modifier": "reciprocal"
}
}
],
"boost_mode": "replace",
"score_mode": "max"
}
},
"filter": {
"terms": {
"publications.code": [
"PUB.06"
]
}
}
}
},
"score_mode": "max"
}
}
}
}
]
}
}
}
I'm the first to admit that it is not the most readable query, and if there were a way to copy_to into the nested documents, that would be much better.
Failing that, simulating copy_to by having the client inject the data into the source before indexing would be simpler and more flexible.
But the above is an example of how it could be done using function scores.
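The client-side alternative mentioned above could look roughly like this. It is a sketch only: the target field name organisationCode is an assumption, and the mapping for publications would need that field as well. Once each publication carries the code, the nested_filter can refer to publications.organisationCode instead of the parent field.

```python
import copy


def inject_organisation_code(doc, target_field="organisationCode"):
    """Return a copy of doc with organisation.code copied into each publication."""
    enriched = copy.deepcopy(doc)  # leave the caller's document untouched
    code = enriched.get("organisation", {}).get("code")
    for publication in enriched.get("publications", []):
        # Simulates copy_to from the root object into the nested object.
        publication[target_field] = code
    return enriched


doc = {
    "organisation": {"code": "01310"},
    "publications": [
        {"dateEnd": 1393801200000, "dateStart": 1391986800000, "code": "PUB.02"},
        {"dateEnd": 1401055200000, "dateStart": 1397512800000, "code": "PUB.06"},
    ],
}
enriched = inject_organisation_code(doc)
print(enriched["publications"][0])
```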

Elasticsearch match list against field

I have a list (or array, or whatever your language calls it), e.g. names : ["John","Bas","Peter"], and I want to query the name field to check whether it matches one of those names.
One way is with an or filter, e.g.:
{
"filtered" : {
"query" : {
"match_all": {}
},
"filter" : {
"or" : [
{
"term" : { "name" : "John" }
},
{
"term" : { "name" : "Bas" }
},
{
"term" : { "name" : "Peter" }
}
]
}
}
}
Any fancier way? Better if it's a query than a filter.
{
"query": {
"filtered" : {
"filter" : {
"terms": {
"name": ["John","Bas","Peter"]
}
}
}
}
}
Elasticsearch rewrites this as if you had used the following:
{
"query": {
"filtered" : {
"filter" : {
"bool": {
"should": [
{
"term": {
"name": "John"
}
},
{
"term": {
"name": "Bas"
}
},
{
"term": {
"name": "Peter"
}
}
]
}
}
}
}
}
When combining filters with boolean logic, it is usually better to use the bool filter than the and or or filters. The reason is explained on the Elasticsearch blog: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
When I tried the filtered query, I got "no [query] registered for [filtered]". Based on another answer, it seems the filtered query was deprecated and then removed in ES 5.0. So I suggest using:
{
"query": {
"bool": {
"filter": {
"terms": {
"name": ["John","Bas","Peter"]
}
}
}
}
}
Example query: filter by a keyword and a list of values.
{
"query": {
"bool": {
"must": [
{
"term": {
"fguid": "9bbfe844-44ad-4626-a6a5-ea4bad3a7bfb.pdf"
}
}
],
"filter": {
"terms": {
"page": [
"1",
"2",
"3"
]
}
}
}
}
}

Elasticsearch multi term filter

I'm quite new to Elasticsearch, so here's my question.
I want to run a search query with Elasticsearch and filter on multiple terms.
If I search for a user 'tom', I would like to get all the matches where 'isActive = 1', 'isPrivate = 0' and 'isOwner = 1'.
Here's my search query:
"query":{
"filtered": {
"query": {
"query_string": {
"query":"*tom*",
"default_operator": "OR",
"fields": ["username"]
}
},
"filter": {
"term": {
"isActive": "1",
"isPrivate": "0",
"isOwner": "1"
}
}
}
}
When I use 2 terms, it works like a charm, but when I use 3 terms it doesn't.
Thanks for the help!!
A term filter matches on a single field, so you should use a bool filter to AND all your terms:
"query":{
"filtered": {
"query": {
"query_string": {
"query":"*tom*",
"default_operator": "OR",
"fields": ["username"]
}
},
"filter": {
"bool" : {
"must" : [
{"term" : { "isActive" : "1" } },
{"term" : { "isPrivate" : "0" } },
{"term" : { "isOwner" : "1" } }
]
}
}
}
}
For version 2.x+ you can use a bool query instead of the filtered query, with some simple replacements: https://www.elastic.co/guide/en/elasticsearch/reference/7.4/query-dsl-filtered-query.html
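As a rough illustration of that replacement, the query part of a filtered query moves to must and the filter part moves to filter in a bool query. The helper filtered_to_bool below is a made-up name and only handles these two keys.

```python
import json


def filtered_to_bool(filtered):
    """Convert the body of a filtered query into the equivalent bool query."""
    bool_body = {}
    if "query" in filtered:
        bool_body["must"] = filtered["query"]      # scored part
    if "filter" in filtered:
        bool_body["filter"] = filtered["filter"]   # non-scored part
    return {"bool": bool_body}


# Body of the filtered query from the answer above.
filtered = {
    "query": {
        "query_string": {
            "query": "*tom*",
            "default_operator": "OR",
            "fields": ["username"],
        }
    },
    "filter": {
        "bool": {
            "must": [
                {"term": {"isActive": "1"}},
                {"term": {"isPrivate": "0"}},
                {"term": {"isOwner": "1"}},
            ]
        }
    },
}
converted = {"query": filtered_to_bool(filtered)}
print(json.dumps(converted, indent=2))
```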
As one of the comments says, the syntax has changed in recent ES versions. If you are using Elasticsearch 6.+, and you want to use a wildcard and a sequence of terms in your query (such as in the question), you can use something like this:
GET your_index/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"your_field_name_1": {
"value": "tom*"
}
}
},
{
"term": {
"your_field_name_2": {
"value": "US"
}
}
},
{
"term": {
"your_field_name_3": {
"value": "Michigan"
}
}
},
{
"term": {
"your_field_name_4": {
"value": "0"
}
}
}
]
}
}
}
Also, from the documentation about wildcard queries:
Note that this query can be slow, as it needs to iterate over many
terms. In order to prevent extremely slow wildcard queries, a wildcard
term should not start with one of the wildcards * or ?.
I hope this helps.

Resources