Nested query on Elasticsearch for long type (ES 5.0.4)

This is my first question on Stack Overflow, so please excuse any mistakes; I will improve on them in the future.
I am new to Elasticsearch too. I am trying to do an exact match in Elasticsearch (5.0.4). Instead of returning only the exact match, the request returns all the nested documents present.
I am not sure why it behaves this way.
Here is the mapping:
{
"properties":{
"debug_urls":{
"properties":{
"characteristics":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"url_id":{
"type":"long"
}
},
"type":"nested"
},
"scanId":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
}
}
}
This is my request:
{
"query": {
"nested": {
"path": "debug_urls",
"query": {
"match": {
"debug_urls.url_id": 1
}
}
}
}
}
This is the response I received:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":1,
"hits":[
{
"_index":"cust_cca39c0c6c8141008e9411032bbf4d21",
"_type":"debug-urls",
"_id":"AW70h0l72s9qXitMsWgC",
"_score":1,
"_source":{
"scan_id":"n_a0a523fb5c81435fb79c34c624c7fbd6",
"debug_urls":[
{
"url_id":1,
"characteristics":[
"FORM",
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":2,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":3,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":4,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":5,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":6,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
},
{
"url_id":7,
"characteristics":[
"EXTERNAL_SCRIPT",
"INLINE_SCRIPT"
]
}
]
}
}
]
}
}

If you only want to see the nested documents that match the criteria, you can leverage nested inner_hits:
{
"_source":["scan_id"], <--- add this line
"query": {
"nested": {
"path": "debug_urls",
"query": {
"match": {
"debug_urls.url_id": 1
}
},
"inner_hits": {} <--- add this line
}
}
}
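To make the behavior concrete, here is a minimal in-memory sketch in Python. It is not Elasticsearch client code; the function name `nested_search` is hypothetical, and the sample data is a shortened copy of the document from the question. It only illustrates the idea that a nested query selects whole root documents, while inner_hits reports which nested objects caused the match.

```python
# In-memory illustration only; this is not Elasticsearch client code.
# It mimics how a nested query selects root documents and what inner_hits adds.

doc = {
    "scan_id": "n_a0a523fb5c81435fb79c34c624c7fbd6",
    "debug_urls": [
        {"url_id": 1, "characteristics": ["FORM", "EXTERNAL_SCRIPT", "INLINE_SCRIPT"]},
        {"url_id": 2, "characteristics": ["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]},
        {"url_id": 3, "characteristics": ["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]},
    ],
}


def nested_search(docs, path, field, value):
    """Return (root_document, matching_nested_objects) pairs.

    A root document is a hit as soon as one nested object matches; the
    second element of each pair plays the role of inner_hits.
    """
    results = []
    for d in docs:
        matching = [obj for obj in d.get(path, []) if obj.get(field) == value]
        if matching:
            results.append((d, matching))
    return results


hits = nested_search([doc], "debug_urls", "url_id", 1)
root, inner = hits[0]
print(len(root["debug_urls"]))  # the hit still carries every nested object
print(inner)                    # only the nested object(s) that matched
```

This is why the original request appears to "return everything": the single root document is the hit, and its _source contains the full debug_urls array.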

Related

Elastic search query to return documents matching all elements in an array

I have a structure similar to this:
Document 1:
nestedobject: {
uniqueid: "12345",
field: [ {id: 1,
color: blue,
fruit:banana},
{id: 2,
color: red,
fruit:apple},
]
}
Document 2: (in same index)
nestedobject: {
uniqueid:23456,
field: [ {id: 3,
color: blue,
fruit:banana},
{id: 4,
color: blue,
fruit:banana},
]
}
The field mappings are:
{"mappings":{
"nestedobject":{
"properties":{
"uniqueid":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"field":{
"type":"nested",
"properties":{
"id":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"color":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"fruit":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
}
}
}
}
}
}
}
Now I query this index, which holds these 2 documents, and I want only the document in which all the elements of the field array have color blue and fruit banana, not just at least one.
Right now, the query below returns both documents, because the first document matches on its first element.
How can I make this possible?
{
"query": {
"nested" : {
"path" : "nestedobject.field",
"query" : {
"bool" : {
"must" : [
{ "match" : {"nestedobject.field.color" : "blue"} },
{ "match" : {"nestedobject.field.fruit" : "banana"}}
]
}
}
}
}
}
Change your query to the below:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "field",
"query": {
"match":{
"field.color": "blue"
}
}
}
},
{
"nested": {
"path": "field",
"query": {
"match":{
"field.fruit": "banana"
}
}
}
}
]
}
}
}
Note that there are two nested queries inside a must clause.
Also note that, for an exact match, you should use term queries on the keyword sub-fields, as shown below:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "field",
"query": {
"term": {
"field.color.keyword": "blue"
}
}
}
},
{
"nested": {
"path": "field",
"query": {
"term": {
"field.fruit.keyword": "banana"
}
}
}
}
]
}
}
}
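A nested clause matches a root document as soon as one nested element satisfies it, so for the "every element must match" requirement a commonly used pattern is a double negation: exclude any document that has a nested element failing the condition. The Python sketch below is an in-memory illustration, not Elasticsearch client code. The helper names (`matches_some`, `matches_all`, `all_elements_query`) are hypothetical, and the path "field" is an assumption taken from the queries above. It builds such a request body and checks the intended semantics against the two sample documents.

```python
# In-memory illustration only; not Elasticsearch client code.
docs = {
    "Document 1": [
        {"id": 1, "color": "blue", "fruit": "banana"},
        {"id": 2, "color": "red", "fruit": "apple"},
    ],
    "Document 2": [
        {"id": 3, "color": "blue", "fruit": "banana"},
        {"id": 4, "color": "blue", "fruit": "banana"},
    ],
}


def is_blue_banana(element):
    return element["color"] == "blue" and element["fruit"] == "banana"


def matches_some(elements):
    # Semantics of a plain nested query: one matching element is enough.
    return any(is_blue_banana(e) for e in elements)


def matches_all(elements):
    # Double negation: no element exists that fails the condition.
    # Note: this is also true for a document with no elements at all.
    return not any(not is_blue_banana(e) for e in elements)


def all_elements_query(path, conditions):
    """Build a request body shaped as must_not(nested(must_not(all conditions)))."""
    return {
        "query": {
            "bool": {
                "must_not": [{
                    "nested": {
                        "path": path,
                        "query": {"bool": {"must_not": [{"bool": {"must": conditions}}]}},
                    }
                }]
            }
        }
    }


body = all_elements_query("field", [
    {"match": {"field.color": "blue"}},
    {"match": {"field.fruit": "banana"}},
])

some = [name for name, els in docs.items() if matches_some(els)]
every = [name for name, els in docs.items() if matches_all(els)]
print(some)   # documents with at least one blue banana element
print(every)  # documents whose elements are all blue bananas
```

Because the double negation also accepts documents without any nested elements, a real query may need an extra clause requiring that at least one element exists.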

How to query elasticsearch nested object using fuzziness and wildcard query

{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"skipped":0,
"failed":0
},
"hits":{
"total":1,
"max_score":1,
"hits":[
{
"_index":"event_11",
"_type":"_doc",
"_id":"1",
"_score":1,
"_source":{
"title":"Event One",
"comments":{
"author":"Alvin",
"author_id":1
}
},
"inner_hits":{
"comments":{
"hits":{
"total":1,
"max_score":1,
"hits":[
{
"_index":"event_11",
"_type":"_doc",
"_id":"1",
"_nested":{
"field":"comments",
"offset":0
},
"_score":1,
"_source":{
"author":"Alvin",
"author_id":1
}
}
]
}
}
}
}
]
}
}
I am trying to query the above data with the below wildcard query:
GET event_11/_search
{
"query": {
"nested": {
"path": "comments",
"query": {
"wildcard": {
"comments.author": "Al*"
}
}
}
}
}
The above query returns an empty result set. Can someone help me fix the search query so that it works with wildcard and fuzziness? I am using Elasticsearch 6 and Kibana to create my queries, and the PHP SDK to issue them from a PHP application.
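You can try combining a wildcard query and a fuzzy match query as should clauses inside the nested query (replace the sample search terms with your own):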
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"should": [
{
"wildcard": {
"comments.author": "real*"
}
},
{
"match": {
"comments.author": {
"query": "reaa",
"fuzziness": "AUTO"
}
}
}
]
}
}
}
}
}
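One likely reason the original "Al*" pattern finds nothing: wildcard is a term-level query, so the pattern is compared with the indexed tokens without being analyzed. The Python sketch below assumes that comments.author is a text field using the default standard analyzer, which lowercases tokens; the mapping is not shown in the question, so this is an assumption. The functions `analyze` and `wildcard_matches` are hypothetical stand-ins, not Elasticsearch APIs.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def wildcard_matches(pattern, indexed_tokens):
    """Term-level wildcard: the pattern is compared to tokens as-is (case-sensitive)."""
    return any(fnmatchcase(token, pattern) for token in indexed_tokens)


tokens = analyze("Alvin")
print(tokens)                           # what would be stored in the index
print(wildcard_matches("Al*", tokens))  # uppercase pattern against lowercase token
print(wildcard_matches("al*", tokens))  # lowercase pattern
```

Under that assumption, lowercasing the pattern on the client side (for example "al*") before sending the wildcard query would be the simplest fix.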

Can I filter a subarray in Elasticsearch?

I have orders, and the order products of each order are stored as a subarray in Elasticsearch. When I aggregate prices, I need to be able to filter the order products inside my order documents.
Example of my document in Elasticsearch:
{
"OrderID":4567488,
"projectId":"4",
"Project":"direkt",
"legacy_id":null,
"supporterId":null,
"Origin":"FR",
"orderProducts":[
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"30",
"Price":"26.95"
},
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"15",
"Price":"15.22"
},
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"123",
"Price":"24.55"
}
]
}
How I am filtering right now:
{
"index":"order_index",
"from":0,
"size":100,
"body":{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"term":{
"orderProducts.brandNo":"30"
}
}
]
}
}
}
}
}
}
What I'm expecting:
{
"OrderID":4567488,
"projectId":"4",
"Project":"direkt",
"legacy_id":null,
"supporterId":null,
"Origin":"FR",
"orderProducts":[
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"30",
"Price":"26.95"
}
]
}
What I'm actually getting is the whole document.
Is it possible to filter subarray data?
Update: here are my schema mappings:
"mappings":{
"order":{
"dynamic_templates":[
{
"strings":{
"mapping":{
"type":"string",
"fields":{
"raw":{
"index":"not_analyzed",
"type":"string"
}
}
},
"match_mapping_type":"string"
}
}
],
"properties":{
"orderProducts":{
"include_in_parent":true,
"properties":{
"OrderProductID":{
"type":"long"
},
"OrderID":{
"type":"long"
},
"brandNo":{
"type":"long"
},
"Price":{
"type":"double"
}
},
"type":"nested"
},
"OrderID":{
"type":"long"
}
}
}
},
All right, after some experiments I discovered that the aggregation can be done like this:
{
"aggs":{
"sales":{
"nested":{
"path":"orderProducts"
},
"aggs":{
"filtered_nestedobjects":{
"filter":{
"bool":{
"must":[
{
"terms":{
"orderProducts.brandNo":[
"30"
]
}
}
]
}
},
"aggs":{
"Quantity":{
"sum":{
"field":"orderProducts.Quantity"
}
}
}
}
}
}
}
}
And the answer to the main question, whether we can filter a subarray in Elasticsearch, is yes: I did it with inner_hits.
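The nested, filter, sum chain of the aggregation above can be pictured with a small in-memory Python sketch. It is not Elasticsearch client code, and `nested_filtered_sum` is a hypothetical helper. The sample document in the question carries Price rather than Quantity, so the sketch sums Price; the values are written as numbers, following the long and double types in the mapping.

```python
# In-memory illustration only; not Elasticsearch client code.
orders = [
    {
        "OrderID": 4567488,
        "orderProducts": [
            {"brandNo": 30, "Price": 26.95},
            {"brandNo": 15, "Price": 15.22},
            {"brandNo": 123, "Price": 24.55},
        ],
    }
]


def nested_filtered_sum(docs, path, filter_field, allowed, sum_field):
    """Mimic nested -> filter(terms) -> sum over in-memory documents."""
    # Step 1 (nested): work on nested objects instead of root documents.
    nested_objects = [obj for d in docs for obj in d.get(path, [])]
    # Step 2 (filter): keep nested objects whose field is one of the allowed terms.
    kept = [obj for obj in nested_objects if obj.get(filter_field) in allowed]
    # Step 3 (sum): add up the chosen numeric field; doc_count is the number kept.
    return {"doc_count": len(kept), "sum": sum(obj[sum_field] for obj in kept)}


result = nested_filtered_sum(orders, "orderProducts", "brandNo", {30}, "Price")
print(result)
```

The filter is applied to the nested objects, not to the orders, which is why only the products of brand 30 contribute to the sum.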

Elasticsearch term aggregation document count issue

This is the request I'm sending to Elasticsearch:
{
"aggregations":{
"followUpActivity.metainfo.metainfos.string1":{
"terms":{
"field":"metainfos.string1",
"missing":"null",
"order":{
"_count":"asc"
}
}
}
}
}
I'm asking for buckets on field metainfos.string1 and ordering them by _count. This is the response:
{
"took":7,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"failed":0
},
"hits":{
"total":3,
"max_score":1.0,
"hits":[
{
"_index":"living_v1",
"_type":"fuas",
"_id":"be9b29f3-37a5-11e6-a66a-30b5c2122322",
"_score":1.0,
"_routing":"living_team",
"_source":{
"user":"living_team",
"timestamp":"2016-06-22T11:27:25.531Z",
"metainfos":{
"string1":[
"s1", <<<<<<<<<<<<<--------------
"s2" <<<<<<<<<<<<<--------------
]
}
}
},
{
"_index":"living_v1",
"_type":"fuas",
"_id":"c3af0f64-37a5-11e6-a66a-30b5c2122322",
"_score":1.0,
"_routing":"living_team",
"_source":{
"user":"living_team",
"timestamp":"2016-06-22T12:30:01.625Z",
"metainfos":{
"string1":[
"s1", <<<<<<<<<<<<<--------------
"s2" <<<<<<<<<<<<<--------------
]
}
}
},
{
"_index":"living_v1",
"_type":"fuas",
"_id":"ee790469-48f3-11e6-9f47-30b5c2122322",
"_score":1.0,
"_routing":"living_team",
"_source":{
"user":"living_team",
"timestamp":"2016-07-13T13:33:41.231Z",
"metainfos":{
"string1":[
"s2" <<<<<<<<<<<<<--------------
]
}
}
}
]
},
"aggregations":{
"followUpActivity.metainfo.metainfos.string1":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"s2",
"doc_count":2 <<<<<<<<<<<<<--------------
},
{
"key":"s1",
"doc_count":3 <<<<<<<<<<<<<--------------
}
]
}
}
}
As you can see, there are two buckets: s1 and s2. However, s1 is present in only two documents, yet ES reports doc_count = 3. Moreover, s2 is present in three documents, yet ES reports doc_count = 2.
I'm running this on a single node.
Any ideas?
Mapping:
{
"living_v1":{
"mappings":{
"fuas":{
"properties":{
"metainfos":{
"properties":{
"string1":{
"type":"string"
}
}
},
"timestamp":{
"type":"date",
"format":"strict_date_optional_time||epoch_millis"
},
"user":{
"type":"string",
"index":"not_analyzed"
}
}
}
}
}
}
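The counts the question expects can be checked by hand from the three hits in the response. The Python sketch below is only that check, using the metainfos.string1 values copied from the hits; it assumes that a terms aggregation counts a document once per distinct term it contains, and it does not attempt to explain the reported numbers.

```python
from collections import Counter

# metainfos.string1 values of the three hits shown in the response above.
docs = [
    ["s1", "s2"],
    ["s1", "s2"],
    ["s2"],
]

# Count each document once per distinct term it contains.
doc_count = Counter(term for values in docs for term in set(values))
print(sorted(doc_count.items()))
```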

Elasticsearch - Cardinality over Full Field Value

I have a document that looks like this:
{
"_id":"some_id_value",
"_source":{
"client":{
"name":"x"
},
"project":{
"name":"x November 2016"
}
}
}
I am attempting to perform a query that will fetch me the count of unique project names for each client. For this, I am using a query with cardinality over the project.name. I am sure that there are only 4 unique project names for this particular client. However, when I run my query, I get a count of 5, which I know is wrong.
The project names all contain the name of the client. For instance, if a client is "X", project names will be "X Testing November 2016", or "X Jan 2016", etc. I don't know if that is a consideration.
This is the mapping for the document type:
{
"mappings":{
"vma_docs":{
"properties":{
"client":{
"properties":{
"contact":{
"type":"string"
},
"name":{
"type":"string"
}
}
},
"project":{
"properties":{
"end_date":{
"format":"yyyy-MM-dd",
"type":"date"
},
"project_type":{
"type":"string"
},
"name":{
"type":"string"
},
"project_manager":{
"index":"not_analyzed",
"type":"string"
},
"start_date":{
"format":"yyyy-MM-dd",
"type":"date"
}
}
}
}
}
}
}
This is my search query:
{
"fields":[
"client.name",
"project.name"
],
"query":{
"bool":{
"must":{
"match":{
"client.name":{
"operator":"and",
"query":"ABC systems"
}
}
}
}
},
"aggs":{
"num_projects":{
"cardinality":{
"field":"project.name"
}
}
},
"size":5
}
These are the results I get (I have posted only 2 results for the sake of brevity). Note that the num_projects aggregation returns 5, but it should return 4, which is the total number of projects.
{
"hits":{
"hits":[
{
"_score":5.8553367,
"_type":"vma_docs",
"_id":"AVTMIM9IBwwoAW3mzgKz",
"fields":{
"project.name":[
"ABC"
],
"client.name":[
"ABC systems Pvt Ltd"
]
},
"_index":"vma"
},
{
"_score":5.8553367,
"_type":"vma_docs",
"_id":"AVTMIM9YBwwoAW3mzgK2",
"fields":{
"project.name":[
"ABC"
],
"client.name":[
"ABC systems Pvt Ltd"
]
},
"_index":"vma"
}
],
"total":18,
"max_score":5.8553367
},
"_shards":{
"successful":5,
"failed":0,
"total":5
},
"took":4,
"aggregations":{
"num_projects":{
"value":5
}
},
"timed_out":false
}
FYI: The project names are ABC, ABC Nov 2016, ABC retest November, ABC Mobile App
You need the following mapping for your project.name field:
{
"mappings": {
"vma_docs": {
"properties": {
"client": {
"properties": {
"contact": {
"type": "string"
},
"name": {
"type": "string"
}
}
},
"project": {
"properties": {
"end_date": {
"format": "yyyy-MM-dd",
"type": "date"
},
"project_type": {
"type": "string"
},
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"project_manager": {
"index": "not_analyzed",
"type": "string"
},
"start_date": {
"format": "yyyy-MM-dd",
"type": "date"
}
}
}
}
}
}
}
It adds a sub-field called raw: the value put in project.name is also indexed in project.name.raw, but untouched (not tokenized or analyzed). The query you then need to use is:
{
"fields": [
"client.name",
"project.name"
],
"query": {
"bool": {
"must": {
"match": {
"client.name": {
"operator": "and",
"query": "ABC systems"
}
}
}
}
},
"aggs": {
"num_projects": {
"cardinality": {
"field": "project.name.raw"
}
}
},
"size": 5
}
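The difference between the two fields can be illustrated in Python with the four project names listed in the question. This is an in-memory sketch, not Elasticsearch code: `analyze` is a hypothetical, rough stand-in for the default analyzer (split into words, lowercase). The token count it produces is not meant to reproduce the value 5 reported above, since the indexed data may differ from these four names; it only shows that counting distinct tokens and counting distinct whole values are different things.

```python
import re

project_names = ["ABC", "ABC Nov 2016", "ABC retest November", "ABC Mobile App"]


def analyze(text):
    """Rough stand-in for an analyzed string field: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Analyzed field (project.name): cardinality sees the individual tokens.
analyzed_terms = {token for name in project_names for token in analyze(name)}

# Not-analyzed sub-field (project.name.raw): cardinality sees whole values.
raw_terms = set(project_names)

print(len(analyzed_terms), len(raw_terms))
```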
