How do I get just enough data in a list in Elasticsearch

Say I have a document in an Elasticsearch index like the one below:
{
"data": [
{
"color": "RED",
"qty": 3
},
{
"color": "BLACK",
"qty": 1
}, {
"color": "BLUE",
"qty": 0
}
]
}
I only need the entry with color BLACK.
Is there any way to get back just that data, like below?
{
"data": [
{
"color": "BLACK",
"qty": 1
}
]
}

You can use a script field to generate a new field that contains only the matching values from the array. Below is a sample query:
{
"_source": {
"excludes": "data"
},
"query": {
"match_all": {}
},
"script_fields": {
"address": {
"script": {
"lang": "painless",
"source": """
List li = new ArrayList();
if(params['_source']['data'] != null)
{
for(p in params['_source']['data'])
{
if( p.color == 'BLACK')
li.add(p);
}
}
return li;
"""
}
}
}
}
Response:
"hits" : [
{
"_index" : "sample1",
"_type" : "_doc",
"_id" : "tUc6338BMCbs63yKTqj_",
"_score" : 1.0,
"_source" : { },
"fields" : {
"address" : [
{
"color" : "BLACK",
"qty" : 1
}
]
}
}
]

Elasticsearch returns the whole document if any field matches. If you want to find the matched nested documents, you can map the data array as nested in your index mapping, write a nested query that filters data.color by its value (BLACK in your case), and use inner_hits to retrieve the nested documents that matched.
You can also use source filtering to retrieve only the fields you want, but it filters by field name only, not by field value.
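As a rough sketch of the nested approach described above, the Python snippet below builds the mapping and the search body as plain dictionaries. The keyword/integer subfield types and the helper name build_color_search are assumptions made for illustration, and nothing here talks to a cluster.

```python
import json

# Assumed mapping: "data" is nested, so each element is indexed as its own hidden document.
NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "data": {
                "type": "nested",
                "properties": {
                    "color": {"type": "keyword"},
                    "qty": {"type": "integer"},
                },
            }
        }
    }
}


def build_color_search(color):
    """Return a search body that matches on data.color and reports the matching elements."""
    return {
        "_source": False,  # skip the full document; the matches arrive under inner_hits
        "query": {
            "nested": {
                "path": "data",
                "query": {"term": {"data.color": color}},
                "inner_hits": {},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_color_search("BLACK"), indent=2))
```

The printed body would be sent to the index's _search endpoint; with an unnamed inner_hits, the matching elements should appear under inner_hits.data in each hit.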

Related

Is it possible to sort by substring of value in Elasticsearch (Opensearch)?

Let's say I have these documents:
[
{
"_index" : "index_name",
"_type" : "_doc",
"_id" : "some_obj.{RANDOM UUID}:{REAL ID}",
"_source" : {
"id" : "some_obj.{RANDOM UUID}:{REAL ID}",
"parent_id": "{SOME ID}",
"name" : "{some name}"
}
},
{ ... }
]
I am trying to get these objects using this query:
GET index_name/_search
{
"sort" : [
{ "id": {"order": "asc"}}
],
"query": {
"terms": {
"parent_id": [ "1" ]
}
}
}
Because of the {RANDOM UUID} in the middle of the id field, I can't order by {REAL ID}.
Is it possible to remove the some_obj.{RANDOM UUID}: part for sorting, or to split the value on : and sort by the second part, or is there some other way to sort by {REAL ID}?
The following query should do it. I assumed the id field in your mapping is a keyword and that {REAL ID} is numeric. If it is a string, change type in the query to string and remove Long.parseLong from the return statement.
{
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": """
return Long.parseLong(doc['id'].value.splitOnToken(':')[1]);
"""
},
"order": "asc"
}
}
],
"query": {
"term": {
"parent_id": {
"value": "1"
}
}
}
}
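To sanity-check what the sort script computes, the same key can be reproduced outside Elasticsearch. This is only an illustration in Python; the two ids are made-up values in the some_obj.{RANDOM UUID}:{REAL ID} shape, and real_id is a hypothetical helper name.

```python
def real_id(doc_id):
    """Mirror the sort script: take the part after ':' and parse it as an integer."""
    return int(doc_id.split(":")[1])


# Hypothetical ids in the "some_obj.{RANDOM UUID}:{REAL ID}" shape.
ids = ["some_obj.3f2a:10", "some_obj.9b1c:2"]

by_string = sorted(ids)                # ordering by the raw keyword value
by_real_id = sorted(ids, key=real_id)  # ordering by the numeric part after ':'

print(by_string)
print(by_real_id)
```

Sorting on the raw value is dominated by the random part, while the key function orders by the numeric suffix, which is what the script sort is meant to do.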

Elasticsearch nested sort based on minimum values of child of child arrays

I've two orders and these orders have multiple shipments and shipments have multiple products.
How can I sort the orders based on the minimum product.quantity in a shipment?
For example, when ordering ascending, orderNo = 2 should be listed first because it has a shipment that contains a product with quantity = 1. This is the minimum value among all product.quantity values (productName doesn't matter).
{
"orders": [
{
"orderNo": "1",
"shipments": [
{
"products": [
{
"productName": "AAA",
"quantity": "2"
},
{
"productName": "AAA",
"quantity": "2"
}
]
},
{
"products": [
{
"productName": "AAA",
"quantity": "3"
},
{
"productName": "AAA",
"quantity": "6"
}
]
}
]
},
{
"orderNo": "2",
"shipments": [
{
"products": [
{
"productName": "AAA",
"quantity": "1"
},
{
"productName": "AAA",
"quantity": "6"
}
]
},
{
"products": [
{
"productName": "AAA",
"quantity": "4"
},
{
"productName": "AAA",
"quantity": "5"
}
]
}
]
}
]
}
Assuming that each order is a separate document, you could create an order-focused index where both shipments and products are nested fields to prevent array flattening.
The minimal index mapping could then look like:
PUT orders
{
"mappings": {
"properties": {
"shipments": {
"type": "nested",
"properties": {
"products": {
"type": "nested"
}
}
}
}
}
}
The next step is to ensure the quantity is always numeric -- not a string. When that's done, insert said docs:
POST orders/_doc
{"orderNo":"1","shipments":[{"products":[{"productName":"AAA","quantity":2},{"productName":"AAA","quantity":2}]},{"products":[{"productName":"AAA","quantity":3},{"productName":"AAA","quantity":6}]}]}
POST orders/_doc
{"orderNo":"2","shipments":[{"products":[{"productName":"AAA","quantity":1},{"productName":"AAA","quantity":6}]},{"products":[{"productName":"AAA","quantity":4},{"productName":"AAA","quantity":5}]}]}
Finally, you can use nested sorting:
POST orders/_search
{
"sort": [
{
"shipments.products.quantity": {
"nested": {
"path": "shipments.products"
},
"order": "asc"
}
}
]
}
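For intuition, an ascending nested sort uses the smallest nested value of each document as its sort key (min is the default mode for ascending order). The Python sketch below reproduces that ordering locally on the two sample orders; the helper name min_quantity is an assumption and product names are omitted for brevity.

```python
# The two sample orders, with numeric quantities as recommended above.
orders = [
    {"orderNo": "1", "shipments": [
        {"products": [{"quantity": 2}, {"quantity": 2}]},
        {"products": [{"quantity": 3}, {"quantity": 6}]},
    ]},
    {"orderNo": "2", "shipments": [
        {"products": [{"quantity": 1}, {"quantity": 6}]},
        {"products": [{"quantity": 4}, {"quantity": 5}]},
    ]},
]


def min_quantity(order):
    """Smallest product quantity across all shipments of one order."""
    return min(p["quantity"] for s in order["shipments"] for p in s["products"])


sorted_order_nos = [o["orderNo"] for o in sorted(orders, key=min_quantity)]
print(sorted_order_nos)  # ['2', '1']
```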
Tip: To make the query even more useful, you could introduce sorted inner_hits to not only sort the top-level orders but also the individual products enclosed in a given order. These inner hits need a nested query so you could simply add a non-negative condition on shipments.products.quantity.
When you combine this query with the above sort and restrict the response to only relevant attributes with filter_path:
POST orders/_search?filter_path=hits.hits._id,hits.hits._source.orderNo,hits.hits.inner_hits.*.hits.hits._source
{
"_source": ["orderNo", "non_negative_quantities"],
"query": {
"nested": {
"path": "shipments.products",
"inner_hits": {
"name": "non_negative_quantities",
"sort": {
"shipments.products.quantity": "asc"
}
},
"query": {
"range": {
"shipments.products.quantity": {
"gte": 0
}
}
}
}
},
"sort": [
{
"shipments.products.quantity": {
"nested": {
"path": "shipments.products"
},
"order": "asc"
}
}
]
}
you'll end up with both sorted orders AND sorted products:
{
"hits" : {
"hits" : [
{
"_id" : "gVc0BHgBly0XYOUcZ4vd",
"_source" : {
"orderNo" : "2" <---
},
"inner_hits" : {
"non_negative_quantities" : {
"hits" : {
"hits" : [
{
"_source" : {
"quantity" : 1, <---
"productName" : "AAA"
}
},
{
"_source" : {
"quantity" : 4, <---
"productName" : "AAA"
}
},
{
"_source" : {
"quantity" : 5, <---
"productName" : "AAA"
}
}
]
}
}
}
},
{
"_id" : "gFc0BHgBly0XYOUcYosz",
"_source" : {
"orderNo" : "1"
},
"inner_hits" : {
"non_negative_quantities" : {
"hits" : {
"hits" : [
{
"_source" : {
"quantity" : 2,
"productName" : "AAA"
}
},
{
"_source" : {
"quantity" : 2,
"productName" : "AAA"
}
},
{
"_source" : {
"quantity" : 3,
"productName" : "AAA"
}
}
]
}
}
}
}
]
}
}

search first element of a multivalue text field in elasticsearch

I want to search on the first element of an array field in my Elasticsearch documents, but I can't figure out how.
As a test, I created a new index with fielddata=true, but I still didn't get the response I wanted.
Mapping
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
Values
name : ["John", "Doe"]
My request
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "doc['name'][0]=params.param1",
"params" : {
"param1" : "john"
}
}
}
}
}
}
}
Incoming Response
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
You can use the following script in a search request to return a scripted field:
{
"script_fields": {
"firstElement": {
"script": {
"lang": "painless",
"source": "params._source.name[0]"
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64391432",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"firstElement": [
"John" <-- note this
]
}
}
]
You can use a Painless script to create a script field that returns a customized value for each document in the results of a query.
In the script, you need to use the equality operator '==' to compare two values; the resulting boolean is true if the two values are equal and false otherwise.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings":{
"properties":{
"name":{
"type":"text",
"fielddata":true
}
}
}
}
Index data:
{
"name": [
"John",
"Doe"
]
}
Search Query:
{
"script_fields": {
"my_field": {
"script": {
"lang": "painless",
"source": "params['_source']['name'][0] == params.params1",
"params": {
"params1": "John"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"my_field": [
true <-- note this
]
}
}
]
Arrays of objects do not work as you would expect: you cannot query
each object independently of the other objects in the array. If you
need to be able to do this then you should use the nested data type
instead of the object data type.
You can use the script shown in my other answer if you just want to compare the value of the first element of the array to some other value. But based on your comments, it looks like your use case is quite different.
If you want to search the first element of the array, you need to convert your data into nested form. With arrays of objects, you can't refer to “the first element” or “the last element” at search time.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"name": {
"type": "nested"
}
}
}
}
Index Data:
{
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
{
"booking_id": 1,
"name": [
{
"first": "Adam Simith",
"second": "John Doe"
}
]
}
{
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
Search Query:
{
"query": {
"nested": {
"path": "name",
"query": {
"bool": {
"must": [
{
"match_phrase": {
"name.first": "John Doe"
}
}
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9400072,
"_source": {
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0.9400072,
"_source": {
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
}
]

Elasticsearch filter by multiple fields in an object which is in an array field

The goal is to filter products with multiple prices.
The data looks like this:
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
I would like to filter by membershipLevel and price. For example, if I am a silver member and query price range 0-10, the product should not appear, but if I am a gold member, the product "a" should appear. Is this kind of query supported by Elasticsearch?
You need to map price with the nested datatype and use a nested query for your use case.
Please see the mapping, sample document, query, and response below:
Mapping:
PUT my_price_index
{
"mappings": {
"properties": {
"name":{
"type":"text"
},
"price":{
"type":"nested",
"properties": {
"membershipLevel":{
"type":"keyword"
},
"price":{
"type":"double"
}
}
}
}
}
}
Sample Document:
POST my_price_index/_doc/1
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
Query:
POST my_price_index/_search
{
"query": {
"nested": {
"path": "price",
"query": {
"bool": {
"must": [
{
"term": {
"price.membershipLevel": "Gold"
}
},
{
"range": {
"price.price": {
"gte": 0,
"lte": 10
}
}
}
]
}
},
"inner_hits": {} <---- Do note this.
}
}
}
The above query returns all documents that have a nested price entry with price.price between 0 and 10 and price.membershipLevel equal to Gold.
Notice that I've made use of inner_hits. Even though price is nested, the response contains the entire document rather than only the nested entry that the query clause matched.
To find the exact nested document that matched, you need to use inner_hits.
Below is what the response looks like.
Response:
{
"took" : 128,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9808291,
"_source" : {
"name" : "a",
"price" : [
{
"membershipLevel" : "Gold",
"price" : "5"
},
{
"membershipLevel" : "Silver",
"price" : "50"
},
{
"membershipLevel" : "Bronze",
"price" : "100"
}
]
},
"inner_hits" : {
"price" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "price",
"offset" : 0
},
"_score" : 1.9808291,
"_source" : {
"membershipLevel" : "Gold",
"price" : "5"
}
}
]
}
}
}
}
]
}
}
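To see why the nested type matters for this question, the Python sketch below imitates the two matching behaviours on the sample product. It is a simplified model, not Elasticsearch itself: flattened_match treats each field of the array as an independent list of values (how a plain object array behaves), while nested_match checks both conditions against the same entry. Both function names are hypothetical, and prices are written as numbers for simplicity.

```python
product = {
    "name": "a",
    "price": [
        {"membershipLevel": "Gold", "price": 5},
        {"membershipLevel": "Silver", "price": 50},
        {"membershipLevel": "Bronze", "price": 100},
    ],
}


def flattened_match(doc, level, low, high):
    """Object arrays: levels and prices lose their pairing, so each condition is tested alone."""
    levels = [entry["membershipLevel"] for entry in doc["price"]]
    prices = [entry["price"] for entry in doc["price"]]
    return level in levels and any(low <= p <= high for p in prices)


def nested_match(doc, level, low, high):
    """Nested type: both conditions must hold within the same array entry."""
    return any(
        entry["membershipLevel"] == level and low <= entry["price"] <= high
        for entry in doc["price"]
    )


print(flattened_match(product, "Silver", 0, 10))  # True: Silver exists and some price is in range
print(nested_match(product, "Silver", 0, 10))     # False: the Silver entry costs 50
print(nested_match(product, "Gold", 0, 10))       # True: the Gold entry costs 5
```

The first result is the false positive the question wants to avoid; the nested variant keeps each level paired with its own price.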
Hope this helps!
Let me show you how to do it using nested fields together with query and filter context. I will use your example to show how to define the index mapping, index sample documents, and write the search query.
It's important to note the include_in_parent parameter in the Elasticsearch mapping, which lets us query these nested fields without using a nested query.
Please refer to the Elasticsearch documentation about it:
If true, all fields in the nested object are also added to the parent
document as standard (flat) fields. Defaults to false.
Index Def
{
"mappings": {
"properties": {
"product": {
"type": "nested",
"include_in_parent": true
}
}
}
}
Index sample docs
{
"product": {
"price" : 5,
"membershipLevel" : "Gold"
}
}
{
"product": {
"price" : 50,
"membershipLevel" : "Silver"
}
}
{
"product": {
"price" : 100,
"membershipLevel" : "Bronze"
}
}
Search query to show Gold with price range 0-10
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Gold"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "so-60620921-nested",
"_type": "_doc",
"_id": "1",
"_score": 1.0296195,
"_source": {
"product": {
"price": 5,
"membershipLevel": "Gold"
}
}
}
]
Search query to exclude Silver, with same price range
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Silver"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
The above query returns no results because no document matches.
P.S :- this SO answer might help you to understand nested fields and query on them in detail.
You have to use nested fields and a nested query to achieve this: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
Define your price property with type nested, and then you will be able to filter by every property of the nested object.

How can I find all documents in elasticsearch that contain a number in a certain field?

I have a keyword type'd field that can contain either a number or a string. If the field does not contain any letters, I would like to hit on that document. How can I do this?
My index mapping looks like:
{
"mappings": {
"Entry": {
"properties": {
"testField": {
"type": "keyword"
}
}
}
}
}
My documents look like this:
{
"testField":"123abc"
}
or
{
"testField": "456789"
}
I've tried the query:
{
"query": {
"range": {
"testField": {
"gte": 0,
"lte": 2000000
}
}
}
}
but it still hits on 123abc. How can I design this so that I only hit on the documents with a number in that particular field?
There is another, more efficient option for achieving exactly what you want. You can use an ingest pipeline with a script processor to create another numeric field at indexing time, which you can then query more efficiently at search time.
The ingest pipeline below contains a single script processor that creates another field called numField, which contains only the digits of testField.
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"script": {
"source": """
ctx.numField = /\D/.matcher(ctx.testField).replaceAll("");
"""
}
}
]
},
"docs": [
{
"_source": {
"testField": "123"
}
},
{
"_source": {
"testField": "abc123"
}
},
{
"_source": {
"testField": "123abc"
}
},
{
"_source": {
"testField": "abc"
}
}
]
}
Simulating this pipeline with 4 different documents having a mix of alphanumeric content, will yield this:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"numField" : "123",
"testField" : "123"
},
"_ingest" : {
"timestamp" : "2019-05-09T04:14:51.448Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"numField" : "123",
"testField" : "abc123"
},
"_ingest" : {
"timestamp" : "2019-05-09T04:14:51.448Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"numField" : "123",
"testField" : "123abc"
},
"_ingest" : {
"timestamp" : "2019-05-09T04:14:51.448Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"numField" : "",
"testField" : "abc"
},
"_ingest" : {
"timestamp" : "2019-05-09T04:14:51.448Z"
}
}
}
]
}
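The effect of the script processor can be checked outside Elasticsearch as well. The Python sketch below applies the same "remove every non-digit" rule to the four simulated values; digits_only is a hypothetical helper, and re.ASCII is used on the assumption that the Painless pattern /\D/ treats only 0-9 as digits.

```python
import re


def digits_only(value):
    """Strip every character that is not an ASCII digit, like the pipeline script does."""
    return re.sub(r"\D", "", value, flags=re.ASCII)


samples = ["123", "abc123", "123abc", "abc"]
num_fields = {value: digits_only(value) for value in samples}

for test_field, num_field in num_fields.items():
    print(f"testField={test_field!r} -> numField={num_field!r}")
```

As in the simulated output above, mixed values such as 123abc still produce a non-empty numField, so a strict "digits only" requirement would likely need an additional condition.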
After indexing your documents using this pipeline, you can run your range query on numField instead of testField. Compared to the other solution (sorry @Kamal), this shifts the scripting burden so that the script runs only once per document at indexing time, instead of every time on every document at search time.
{
"query": {
"range": {
"numField": {
"gte": 0,
"lte": 2000000
}
}
}
}
Afaik, Elasticsearch does not have a direct solution for this.
Instead you would need to write a Script Query. Below is what you are looking for:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"lang": "painless",
"source": """
try {
// parseInt throws NumberFormatException when the value is not an integer
Integer.parseInt(doc['testField'].value);
return true;
} catch (NumberFormatException e) {
return false;
}
"""
}
}
}
]
}
}
}
Hope it helps!
