Elasticsearch query to return documents matching all elements in an array

I have a structure similar to this:
Document 1:
nestedobject: {
uniqueid: "12345",
field: [ {id: 1,
color: blue,
fruit:banana},
{id: 2,
color: red,
fruit:apple},
]
}
Document 2: (in same index)
nestedobject: {
uniqueid:23456,
field: [ {id: 3,
color: blue,
fruit:banana},
{id: 4,
color: blue,
fruit:banana},
]
}
The field mappings are:
{"mappings":{
"nestedobject":{
"properties":{
"uniqueid":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"field":{
"type":"nested",
"properties":{
"id":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"color":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"fruit":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
}
}
}
}
}
}
}
I query this index, which holds these 2 documents, and I want only the document in which all the elements of the field array have color blue and fruit banana, not just at least one.
Right now, with the query below, I get both documents back, because the first element of the first document matches.
How can I make this possible?
{
"query": {
"nested" : {
"path" : "nestedobject.field",
"query" : {
"bool" : {
"must" : [
{ "match" : {"nestedobject.field.color" : "blue"} },
{ "match" : {"nestedobject.field.fruit" : "banana"}}
]
}
}
}
}
}

Change your query to the below:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "field",
"query": {
"match":{
"field.color": "blue"
}
}
}
},
{
"nested": {
"path": "field",
"query": {
"match":{
"field.fruit": "banana"
}
}
}
}
]
}
}
}
Note that there are two Nested Queries inside a must clause.
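As a rough way to reason about what each form matches, the sketch below models the matching rules in plain Python over the two example documents. It is only an illustration that ignores scoring and text analysis; it is not Elasticsearch. Under this model, two nested queries inside a must still match document 1, because one of its elements is blue and one of its elements is banana. "Every element matches" amounts to "at least one element exists, and no element fails the condition".

```python
# Plain-Python model of nested-query matching semantics (no scoring, no analysis).
# The documents mirror the two examples from the question.
DOCS = {
    "12345": [
        {"id": 1, "color": "blue", "fruit": "banana"},
        {"id": 2, "color": "red", "fruit": "apple"},
    ],
    "23456": [
        {"id": 3, "color": "blue", "fruit": "banana"},
        {"id": 4, "color": "blue", "fruit": "banana"},
    ],
}


def is_blue_banana(element):
    return element["color"] == "blue" and element["fruit"] == "banana"


def one_nested_query(elements):
    # One nested query with bool/must: SOME single element satisfies both clauses.
    return any(is_blue_banana(e) for e in elements)


def two_nested_queries(elements):
    # Two nested queries in a must: some element is blue AND some (maybe other) element is banana.
    return any(e["color"] == "blue" for e in elements) and any(
        e["fruit"] == "banana" for e in elements
    )


def every_element(elements):
    # What the question asks for: at least one element, and no element that fails.
    return bool(elements) and not any(not is_blue_banana(e) for e in elements)


def matching_ids(predicate):
    return sorted(uid for uid, elements in DOCS.items() if predicate(elements))


print(matching_ids(one_nested_query))    # ['12345', '23456']
print(matching_ids(two_nested_queries))  # ['12345', '23456']
print(matching_ids(every_element))       # ['23456']
```

In query terms, "no element fails" is usually expressed as a must_not wrapped around a nested query whose inner clauses are negated.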
Also note that for exact matching you should use term queries on the keyword sub-fields, as shown below:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "field",
"query": {
"term": {
"field.color.keyword": "blue"
}
}
}
},
{
"nested": {
"path": "field",
"query": {
"term": {
"field.fruit.keyword": "banana"
}
}
}
}
]
}
}
}
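For the stricter "every element must match" requirement from the question, one common pattern is to exclude documents that contain any nested element failing the condition, and separately require that at least one nested element exists. The sketch below builds such a request body as a Python dict. The helper name all_elements_match_query is hypothetical, the field names follow the answer above (path field, .keyword sub-fields), and the result should be verified against your own index and Elasticsearch version.

```python
import json


def all_elements_match_query(path, required):
    """Hypothetical helper: body matching docs whose EVERY nested element under
    `path` has each field equal to the given keyword value."""
    # An element "fails" if any one of the terms does not match: should + must_not.
    fails = {
        "bool": {
            "should": [
                {"bool": {"must_not": [{"term": {f"{path}.{field}.keyword": value}}]}}
                for field, value in required.items()
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "query": {
            "bool": {
                # At least one nested element must exist ...
                "filter": [{"nested": {"path": path, "query": {"match_all": {}}}}],
                # ... and no nested element may fail the condition.
                "must_not": [{"nested": {"path": path, "query": fails}}],
            }
        }
    }


body = all_elements_match_query("field", {"color": "blue", "fruit": "banana"})
print(json.dumps(body, indent=2))
```

A nested element that lacks one of the fields entirely also counts as failing here, because the negated term query matches it.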

Related

Nested query on ElasticSearch for Long type (ES 5.0.4)

This is my first question on Stack Overflow, so please excuse any mistakes; I will improve in the future.
I am new to Elasticsearch too. I am trying to do an exact match in Elasticsearch (5.0.4), but instead of an exact match, the request returns all the documents present.
I'm not sure why this happens.
Here is the mapping:
{
"properties":{
"debug_urls":{
"properties":{
"characteristics":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"url_id":{
"type":"long"
}
},
"type":"nested"
},
"scanId":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
}
}
}
This is my request.
{
"query": {
"nested": {
"path": "debug_urls",
"query": {
"match": {
"debug_urls.url_id": 1
}
}
}
}
}
The response I received:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":1,
"hits":[
{
"_index":"cust_cca39c0c6c8141008e9411032bbf4d21",
"_type":"debug-urls",
"_id":"AW70h0l72s9qXitMsWgC",
"_score":1,
"_source":{
"scan_id":"n_a0a523fb5c81435fb79c34c624c7fbd6",
"debug_urls":[
{
"url_id":1,
"characteristics":[
"FORM",
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":2,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":3,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":4,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":5,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":6,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":7,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
}
]
}
}
]
}
}
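A nested query always returns the whole parent document, and its _source contains every nested object, not only the ones that matched. If you only want to see the nested documents that match the criteria, you can leverage nested inner_hits. Compared with your request, add the "_source" line (which trims the parent's _source to scan_id) and the "inner_hits" line:
{
"_source":["scan_id"],
"query": {
"nested": {
"path": "debug_urls",
"query": {
"match": {
"debug_urls.url_id": 1
}
},
"inner_hits": {}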
}
}
}
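With inner_hits in the request, each hit carries an inner_hits section keyed by name; for a nested query the name defaults to the nested path (here debug_urls). The sketch below pulls the matching nested objects out of an already-parsed response. The helper name is hypothetical, and sample_response is a hypothetical, trimmed illustration of the response shape, not real output.

```python
def matching_nested_objects(response, name="debug_urls"):
    """Return (parent _id, nested _source) pairs from nested inner_hits.

    `name` is the inner_hits key; for a nested query it defaults to the path.
    """
    pairs = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(name, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            pairs.append((hit["_id"], inner_hit["_source"]))
    return pairs


# Hypothetical, trimmed response illustrating the shape only.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "AW70h0l72s9qXitMsWgC",
                "_source": {"scan_id": "n_a0a523fb5c81435fb79c34c624c7fbd6"},
                "inner_hits": {
                    "debug_urls": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "debug_urls", "offset": 0},
                                    "_source": {
                                        "url_id": 1,
                                        "characteristics": ["FORM", "EXTERNAL_SCRIPT", "INLINE_SCRIPT"],
                                    },
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

for parent_id, nested in matching_nested_objects(sample_response):
    print(parent_id, nested["url_id"])  # AW70h0l72s9qXitMsWgC 1
```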

Elasticsearch [match] unknown token [START_OBJECT] after [created_utc]

I am learning how to use elasticsearch using the 2006 dataset of reddit comments from pushshift.io.
created_utc is the field with the time a comment was created.
I am trying to get all the posts within a certain time range. I googled a bit and found out that I need to use the "range" keyword.
This is my query right now:
{
"query": {
"match" : {
"range": {
"created_utc": {
"gte": "1/1/2006",
"lte": "31/1/2006",
"format": "dd/MM/yyyy"
}
}
}
}
}
I then tried using a bool query so that I could combine the time range with a must_not condition on edited = False (edited being the boolean field that tells me whether a post has been edited):
{
"query": {
"bool" : {
"must" : {
"range" : {
"created_utc": {
"gte" : "01/12/2006", "lte": "31/12/2006", "format": "dd/MM/yyyy"
}
}
},
"must_not": {
"edited": False
}
}
}
}
However, this gave me another error that I can't figure out:
[edited] query malformed, no start_object after query name
I'd appreciate it if anyone could help me out with this, thanks!
Here is my mapping for the comment if it helps:
{
"comment":{
"properties":{
"author":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"body":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"controversiality":{
"type":"long"
},
"created_utc":{
"type":"date"
},
"edited":{
"type":"boolean"
},
"gilded":{
"type":"long"
},
"id":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"link_id":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"parent_id":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"score":{
"type":"long"
},
"subreddit":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
}
}
}
}
If you want to get all the posts within a time range, you need a range query. The problem with your first query is that you put range inside a match query, which Elasticsearch does not allow (hence the unknown token [START_OBJECT] error). Your query should look like:
{
"query": {
"range": {
"created_utc": {
"gte": 1136074029,
"lte": 1136076410
}
}
}
}
Given that the created_utc field is stored as an epoch timestamp, you must query it with epoch values as well.
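If you think in calendar dates, you can compute the epoch bounds before sending the query. A minimal sketch, assuming created_utc holds UTC epoch seconds (as the values above suggest) and that both days should be included in full; the helper name epoch_day_bounds is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def epoch_day_bounds(first_day, last_day, fmt="%d/%m/%Y"):
    """Return (gte, lte) epoch seconds covering first_day..last_day inclusive, in UTC."""
    start = datetime.strptime(first_day, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(last_day, fmt).replace(tzinfo=timezone.utc)
    # lte is the last second of last_day.
    end = end + timedelta(days=1) - timedelta(seconds=1)
    return int(start.timestamp()), int(end.timestamp())


gte, lte = epoch_day_bounds("01/01/2006", "31/01/2006")
query = {"query": {"range": {"created_utc": {"gte": gte, "lte": lte}}}}
print(query)  # {'query': {'range': {'created_utc': {'gte': 1136073600, 'lte': 1138751999}}}}
```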
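Your second query fails because must_not expects a query clause (such as match or term), not a bare field/value pair, so Elasticsearch tries to parse edited as a query name; note also that JSON booleans are written in lowercase (false). To find the posts within a range where edited is not false: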
{
"query": {
"bool": {
"must": [
{
"range": {
"created_utc": {
"gte": 1136074029,
"lte": 1136076410
}
}
}
],
"must_not": [
{
"match": {
"edited": false
}
}
]
}
}
}
Note: If your created_utc were stored in dd/MM/yyyy format, you should query it with the same zero-padded format, i.e. 01/01/2006 instead of 1/1/2006.
Hope this helps!

Multiple query_strings (nested and not nested)

I have got the following index:
{
"thread":{
"properties":{
"members":{
"type":"nested",
"properties":{
"memberId":{
"type":"keyword"
},
"firstName":{
"type":"keyword",
"copy_to":[
"members.fullName"
]
},
"fullName":{
"type":"text"
},
"lastName":{
"type":"keyword",
"copy_to":[
"members.fullName"
]
}
}
},
"name":{
"type":"text"
}
}
}
}
I want to implement a search that finds all threads that match either a member's name or the thread name, as long as the user ID matches.
My current query looks like this:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "members",
"score_mode": "none",
"query": {
"bool": {
"filter": [
{ "match": { "members.id": "123456789" } }
]
}
}
}
},
{
"nested": {
"path": "members",
"query": {
"bool": {
"must": {
"simple_query_string": {
"query": "Rhymen",
"fields": ["members.fullName"]
}
}
}
}
}
}
]
}
}
}
Can I filter on the member names and the thread name in one query, or do I have to merge two separate queries? I tried adding a "should" with "minimum_should_match": 1 so I could add a second, non-nested "query_string", but that didn't work as expected (the scores were badly skewed).
Yes, I think this should work in a single query.
You have to keep the memberId filter in both places: the nested clause that matches the name also needs it, so that it matches the member with that memberId and name.
{
"query": {
"bool": {
"must": [{
"nested": {
"path": "members",
"query": {
"term": {
"members.memberId": {
"value": 1
}
}
}
}
},
{
"bool": {
"should": [{
"term": {
"name": {
"value": "thread_name"
}
}
},
{
"nested": {
"path": "members",
"query": {
"bool": {
"must": [{
"term": {
"members.fullName": {
"value": "trump"
}
}
},
{
"term": {
"members.memberId": {
"value": 1
}
}
}
]
}
}
}
}
]
}
}
]
}
}
}
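The same idea can be parameterized. The sketch below is a hypothetical helper (thread_search_query) that builds the body as a Python dict, with field names taken from the mapping in the question. It puts the membership check in a non-scoring filter so it does not skew the scores, and requires at least one of the two should clauses. It swaps the term query on members.fullName for a match query, since fullName is a text field and match analyzes the search text the same way as the indexed value.

```python
import json


def thread_search_query(member_id, text):
    """Hypothetical helper: threads containing member_id whose name, or that
    member's full name, matches text."""
    member_filter = {"term": {"members.memberId": member_id}}
    return {
        "query": {
            "bool": {
                # The thread must contain the member; filter context does not affect the score.
                "filter": [{"nested": {"path": "members", "query": member_filter}}],
                # ...and either the thread name or that member's full name must match.
                "should": [
                    {"match": {"name": text}},
                    {
                        "nested": {
                            "path": "members",
                            "query": {
                                "bool": {
                                    "must": [{"match": {"members.fullName": text}}],
                                    "filter": [member_filter],
                                }
                            },
                        }
                    },
                ],
                # Needed because a filter clause is present (should would otherwise be optional).
                "minimum_should_match": 1,
            }
        }
    }


print(json.dumps(thread_search_query("123456789", "Rhymen"), indent=2))
```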

Multi-level nesting in elastic search

I have the below structure (small part of a very large elastic-search document)
sample: {
{
"md5sum":"4002cbda13066720513d1c9d55dba809",
"id":1,
"sha256sum":"1c6e77ec49413bf7043af2058f147fb147c4ee741fb478872f072d063f2338c5",
"sha1sum":"ba1e6e9a849fb4e13e92b33d023d40a0f105f908",
"created_at":"2016-02-02T14:25:19+00:00",
"updated_at":"2016-02-11T20:43:22+00:00",
"file_size":188416,
"type":{
"name":"EXE"
},
"tags":[
],
"sampleSources":[
{
"filename":"4002cbda13066720513d1c9d55dba809",
"source":{
"name":"default"
}
},
{
"filename":"4002cbda13066720332513d1c9d55dba809",
"source":{
"name":"default"
}
}
]
}
}
The filter I would like to use is to find by the 'name' contained within sample.sampleSources.source using elastic search.
I tried the below queries
curl -XGET "http://localhost:9200/app/sample/_search?pretty" -d {query}
where, {query} is
{
"query":{
"nested":{
"path":"sample.sampleSources",
"query":{
"nested":{
"path":"sample.sampleSources.source",
"query":{
"match":{
"sample.sampleSources.source.name":"default"
}
}
}
}
}
}
}
However, it is not returning any results. In some cases the nesting in my documents is deeper than this. Can someone please guide me on how I should formulate this query so that it works for all cases?
EDIT 1
Mappings:
{
"app":{
"mappings":{
"sample":{
"properties":{
"sampleSources":{
"type":"nested",
"properties":{
"filename":{
"type":"string"
},
"source":{
"type":"nested",
"properties":{
"name":{
"type":"string"
}
}
}
}
}
}
}
}
}
}
EDIT 2
The solution posted by Waldemar Neto below works for a match query, but not for a wildcard or a regexp query.
Can you please guide me? I need the wildcard and regexp queries to work for this.
I tried your examples here and they work fine.
Take a look at my data.
mapping:
PUT /app
{
"mappings": {
"sample": {
"properties": {
"sampleSources": {
"type": "nested",
"properties": {
"source": {
"type": "nested"
}
}
}
}
}
}
}
indexed data
POST /app/sample
{
"md5sum": "4002cbda13066720513d1c9d55dba809",
"id": 1,
"sha256sum": "1c6e77ec49413bf7043af2058f147fb147c4ee741fb478872f072d063f2338c5",
"sha1sum": "ba1e6e9a849fb4e13e92b33d023d40a0f105f908",
"created_at": "2016-02-02T14:25:19+00:00",
"updated_at": "2016-02-11T20:43:22+00:00",
"file_size": 188416,
"type": {
"name": "EXE"
},
"tags": [],
"sampleSources": [
{
"filename": "4002cbda13066720513d1c9d55dba809",
"source": {
"name": "default"
}
},
{
"filename": "4002cbda13066720332513d1c9d55dba809",
"source": {
"name": "default"
}
}
]
}
Search query
GET /app/sample/_search
{
"query": {
"nested": {
"path": "sampleSources.source",
"query": {
"match": {
"sampleSources.source.name": "default"
}
}
}
}
}
Example using wildcard
GET /app/sample/_search
{
"query": {
"nested": {
"path": "sampleSources.source",
"query": {
"wildcard": {
"sampleSources.source.name": {
"value": "*aul*"
}
}
}
}
}
}
The only difference I saw was in the path: you don't need to include sample (the type) in the nested path, only the inner objects.
Test it and give me feedback.
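For deeper nesting, one option is to generate the nested clauses from a dotted path instead of writing them by hand. The sketch below is a hypothetical helper (nested_chain) that wraps any leaf query, such as match, wildcard or regexp, in one nested clause per level, mirroring the question's original nested-in-nested attempt but without the "sample." type prefix. Keep in mind that wildcard and regexp run against the indexed terms, so with a lowercasing analyzer the pattern should be lowercase too.

```python
def nested_chain(path, leaf_query):
    """Hypothetical helper: wrap leaf_query in one nested clause per path level.

    nested_chain("a.b", q) -> nested(path "a", nested(path "a.b", q)).
    Paths exclude the mapping type name (no "sample." prefix).
    """
    parts = path.split(".")
    query = leaf_query
    # Build from the innermost level outwards.
    for depth in range(len(parts), 0, -1):
        query = {"nested": {"path": ".".join(parts[:depth]), "query": query}}
    return query


regexp = {"regexp": {"sampleSources.source.name": "def.*"}}
print(nested_chain("sampleSources.source", {"query": None} and regexp))
```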

Nested ElasticSearch query results in too many items

The nested ElasticSearch query below returns some results it should not hit. A lot of results do not contain the requested order number but are listed nevertheless. I'm not getting all documents though so the query is definitely reducing the result set on some level.
{
"query": {
"nested": {
"path": "orders",
"query": {
"match": {
"orderNumber": "242347"
}
}
}
}
}
The query result (truncated):
{
"took":0,
"timed_out":false,
"_shards": {
"total":1,
"successful":1,
"failed":0
},
"hits": {
"total":60,
"max_score":9.656103,
"hits":[
{
"_index": "index1",
"_type":"documenttype1",
"_id":"mUmudQrVSC6rn68ujDJ8iA",
"_score":9.656103,
"_source" : {
"documentId": 12093894,
"orders": [
{
"customerId": 129048669,
"orderNumber": "242347", // <-- CORRECT HIT ON ORDER
},
{
"customerId": 229405848,
"orderNumber": "431962"
}
]
}
},
{
"_index":"index1",
"_type":"documenttype1",
"_id":"9iO5QBCpT_6kmH3CoBTdWw",
"_score":9.656103,
"_source" : {
"documentId": 43390283,
// <-- ORDER ISN'T HERE BUT THE DOCUMENT IS HIT NEVERTHELESS!
"orders": [
{
"customerId": 229405848,
"orderNumber": "431962"
},
{
"customerId": 129408979,
"orderNumber": "142701"
}
]
}
}
// Left out 58 more results most of which do not contain
// the requested order number.
]
}
}
As you can see, there is a hit (actually, there are quite a few of them) that shouldn't be there because none of the orders contain the requested order number.
This is the mapping for documenttype1:
{
"index1":{
"properties":{
"documentId":{
"type":"integer"
},
"orders":{
"type":"nested",
"properties":{
"customerId":{
"type":"integer"
},
"orderNumber":{
"type":"string",
"analyzer":"custom_internal_code"
}
}
}
}
}
}
Finally, here are the settings to clarify the custom_internal_code analyzer as referred to in the mapping shown above:
{
"index1":{
"settings":{
"index.analysis.analyzer.custom_internal_code.filter.1":"asciifolding",
"index.analysis.analyzer.custom_internal_code.type":"custom",
"index.analysis.analyzer.custom_internal_code.filter.0":"lowercase",
"index.analysis.analyzer.custom_internal_code.tokenizer":"keyword"
}
}
}
For an exact search, use a term query [1] and make orderNumber not_analyzed [2].
[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#query-dsl-term-query
[2]
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/mapping-intro.html#_literal_index_literal
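It seems that you should use a bool query instead of match.
However, if you just want to filter your records, you should use a nested filter instead of a query. It is faster because no scores have to be calculated. (The filtered query used below is from older Elasticsearch versions: it was deprecated in 2.0 and removed in 5.0, where a bool query with a filter clause replaces it.)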
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-filter.html
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "orders",
"filter": {
"bool": {
"must": [
{
"term": {
"orderNumber": "242347"
}
}
]
}
},
"_cache": true
}
}
}
}
}
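On Elasticsearch 5.0 and later, roughly the same non-scoring filter can be written with the filter clause of a bool query. A sketch that builds the body as a Python dict, assuming the field is referenced by its full path inside the nested object (orders.orderNumber) and that orderNumber is indexed for exact matching (not_analyzed, or keyword on newer versions); the helper name is hypothetical.

```python
import json


def order_number_filter(order_number):
    """Hypothetical helper: non-scoring filter on documents having an order
    whose orderNumber matches exactly."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "nested": {
                            "path": "orders",
                            # Full path of the field inside the nested object.
                            "query": {"term": {"orders.orderNumber": order_number}},
                        }
                    }
                ]
            }
        }
    }


print(json.dumps(order_number_filter("242347")))
```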
