Elasticsearch query fields in disabled object

I have an Elasticsearch 6.8.7 cluster.
I have a field with this mapping:
"event_object": { "enabled": false, "type": "object" }
I want to search for records that match certain other criteria and also have a particular value for a particular field in this object.
So far, I have tried variations of a normal search for the indexed fields combined with a filter script for the unindexed ones:
GET /my_index/_search
{
"query":{
"bool":{
"must":{
"query_string": {
"query": "foo:bar"
}
},
"filter": {
"script": {
"script": {
"source": "doc[\"event_object\"][\"state\"].value == \"R\""
}
}
}
}
},
"terminate_after":1000,
"from":0,
"size":1000
}
This is a hodgepodge pieced together by trial and error from Google searches, but I can't get it to even compile, let alone run and filter.

It is not possible to search the contents of JSON objects mapped with enabled: false. From the official documentation:
Elasticsearch skips parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable or stored in any other way
So a script filter that reads doc values will not help here, because nothing is indexed for that field.
However, there is one way to reach this disabled data from a script: a terms aggregation that reads from _source (using the include parameter and a top_hits sub-aggregation):
POST test/_search
{
"query": {
"match_all": {}
},
"aggs": {
"state": {
"terms": {
"script": "params._source.event_object.state",
"size": 100,
"include": "R"
},
"aggs": {
"hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
And you'd get a response like this one:
"aggregations" : {
"state" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "R",
"doc_count" : 1,
"hits" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"event_object" : {
"state" : "R"
},
"test" : "hello"
}
}
]
}
}
}
]
}
}
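Because the disabled object still comes back in _source, another option is to run the indexed part of the search as usual and filter the hits on the client. The following is a minimal Python sketch of that idea, not something taken from the answer above: the function name and the sample response are hypothetical, and it assumes the response has the usual hits.hits layout of a _search result.

```python
from typing import Any, Dict, List


def filter_hits_by_state(response: Dict[str, Any], state: str) -> List[Dict[str, Any]]:
    """Keep only hits whose _source.event_object.state equals `state`."""
    hits = response.get("hits", {}).get("hits", [])
    matching = []
    for hit in hits:
        # event_object is not indexed, but it is still present in _source.
        event_object = hit.get("_source", {}).get("event_object") or {}
        if isinstance(event_object, dict) and event_object.get("state") == state:
            matching.append(hit)
    return matching


# Hypothetical response, shaped like a standard _search result.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"foo": "bar", "event_object": {"state": "R"}}},
            {"_id": "2", "_source": {"foo": "bar", "event_object": {"state": "X"}}},
            {"_id": "3", "_source": {"foo": "bar"}},
        ]
    }
}

print([hit["_id"] for hit in filter_hits_by_state(sample_response, "R")])  # prints ['1']
```

This only sees the hits that were actually returned, so with size or terminate_after limits the client-side filter may miss matching documents further down the result set.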

Related

multisearch API via curl

I've been looking at the documentation for multisearch API with the objective of exporting specific values of a field in Elasticsearch for a given time period.
I still haven't figured out a way of getting all the results of fieldA for the past 24h while applying a filter of filter: KEY
Is this possible to do via curl request to the Elasticsearch endpoint? Running 7.7.0.
You can use a term query to filter on a value and a range query to restrict results to documents newer than a given date.
A terms aggregation returns the most frequent values of a field (up to size) with their counts. If you just need the documents, you can skip that part.
In the query below, the term clause filters on the value, the range clause keeps documents from the last 24 hours, and the terms aggregation lists the values of fieldA:
Query:
{
"query": {
"bool": {
"filter": [
{
"term": {
"fieldA.keyword": "A"
}
},
{
"range": {
"timestamp": {
"gte": "now-24h/h"
}
}
}
]
}
},
"aggs": {
"fieldA": {
"terms": {
"field": "fieldA.keyword",
"size": 10
}
}
}
}
Result:
"hits" : [
{
"_index" : "index57",
"_type" : "_doc",
"_id" : "Uf4aOnIBRc7WtBUiRs6e",
"_score" : 0.0,
"_source" : {
"timestamp" : "2020-05-21",
"fieldA" : "A"
}
}
]
},
"aggregations" : {
"fieldA" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "A",
"doc_count" : 1
}
]
}
}
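Since the question asks about curl, here is a small Python sketch that builds the same request body and prints a curl command for it. The helper names, the host localhost:9200 and the default timestamp field name are assumptions made for illustration; index57 is simply the index name seen in the result above.

```python
import json
import shlex
from typing import Any, Dict


def build_last_24h_query(field: str, value: str, timestamp_field: str = "timestamp") -> Dict[str, Any]:
    """Build a body that filters on field == value within the last 24 hours."""
    keyword_field = f"{field}.keyword"
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {keyword_field: value}},
                    {"range": {timestamp_field: {"gte": "now-24h/h"}}},
                ]
            }
        },
        # Optional: list the values of the field with their document counts.
        "aggs": {field: {"terms": {"field": keyword_field, "size": 10}}},
    }


def to_curl(index: str, body: Dict[str, Any], host: str = "localhost:9200") -> str:
    """Render a curl command line for the given search body (host is an assumption)."""
    payload = json.dumps(body)
    url = shlex.quote(f"http://{host}/{index}/_search")
    return f"curl -s -X GET {url} -H 'Content-Type: application/json' -d {shlex.quote(payload)}"


body = build_last_24h_query("fieldA", "A")
print(to_curl("index57", body))
```

The Content-Type header is included because recent Elasticsearch versions reject request bodies without it.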

Elasticsearch filter by multiple fields in an object which is in an array field

The goal is to filter products with multiple prices.
The data looks like this:
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
I would like to filter by membershipLevel and price. For example, if I am a silver member and query price range 0-10, the product should not appear, but if I am a gold member, the product "a" should appear. Is this kind of query supported by Elasticsearch?
You need to make use of nested datatype for price and make use of nested query for your use case.
Please see the below mapping, sample document, query and response:
Mapping:
PUT my_price_index
{
"mappings": {
"properties": {
"name":{
"type":"text"
},
"price":{
"type":"nested",
"properties": {
"membershipLevel":{
"type":"keyword"
},
"price":{
"type":"double"
}
}
}
}
}
}
Sample Document:
POST my_price_index/_doc/1
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
Query:
POST my_price_index/_search
{
"query": {
"nested": {
"path": "price",
"query": {
"bool": {
"must": [
{
"term": {
"price.membershipLevel": "Gold"
}
},
{
"range": {
"price.price": {
"gte": 0,
"lte": 10
}
}
}
]
}
},
"inner_hits": {}
}
}
}
The above query returns all documents that have a single price entry whose price.price is between 0 and 10 and whose price.membershipLevel is Gold.
Notice that I've made use of inner_hits. The reason is that, even for a nested field, Elasticsearch returns the entire parent document in the response rather than only the nested objects that matched the query clause.
To find the exact nested object that matched, you need to use inner_hits.
Below is how the response would return.
Response:
{
"took" : 128,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9808291,
"_source" : {
"name" : "a",
"price" : [
{
"membershipLevel" : "Gold",
"price" : "5"
},
{
"membershipLevel" : "Silver",
"price" : "50"
},
{
"membershipLevel" : "Bronze",
"price" : "100"
}
]
},
"inner_hits" : {
"price" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "price",
"offset" : 0
},
"_score" : 1.9808291,
"_source" : {
"membershipLevel" : "Gold",
"price" : "5"
}
}
]
}
}
}
}
]
}
}
Hope this helps!
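To make the role of inner_hits concrete, the sketch below pulls the matched nested objects out of a response shaped like the one above. It is illustrative Python with a hypothetical helper name, and the sample response is a trimmed copy of the response shown earlier.

```python
from typing import Any, Dict, List


def matched_prices(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return, per parent document, only the nested price objects that matched."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get("price", {}).get("hits", {}).get("hits", [])
        results.append({
            "name": hit.get("_source", {}).get("name"),
            "matched": [entry["_source"] for entry in inner],
        })
    return results


# Trimmed copy of the response above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "name": "a",
                    "price": [
                        {"membershipLevel": "Gold", "price": "5"},
                        {"membershipLevel": "Silver", "price": "50"},
                        {"membershipLevel": "Bronze", "price": "100"},
                    ],
                },
                "inner_hits": {
                    "price": {
                        "hits": {
                            "hits": [
                                {"_nested": {"field": "price", "offset": 0},
                                 "_source": {"membershipLevel": "Gold", "price": "5"}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(matched_prices(sample_response))
```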
Let me show you how to do it using nested fields with query and filter context. I will use your example to show how to define the index mapping, index sample documents, and write the search query.
It's important to note the include_in_parent param in the Elasticsearch mapping, which allows us to query these nested fields without using a nested query. Keep in mind that such flat queries on the parent do not preserve which values belonged to the same nested object, so this only works as shown when each document holds a single product object.
Please refer to the Elasticsearch documentation about it:
If true, all fields in the nested object are also added to the parent
document as standard (flat) fields. Defaults to false.
Index Def
{
"mappings": {
"properties": {
"product": {
"type": "nested",
"include_in_parent": true
}
}
}
}
Index sample docs
{
"product": {
"price" : 5,
"membershipLevel" : "Gold"
}
}
{
"product": {
"price" : 50,
"membershipLevel" : "Silver"
}
}
{
"product": {
"price" : 100,
"membershipLevel" : "Bronze"
}
}
Search query to show Gold with price range 0-10
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Gold"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "so-60620921-nested",
"_type": "_doc",
"_id": "1",
"_score": 1.0296195,
"_source": {
"product": {
"price": 5,
"membershipLevel": "Gold"
}
}
}
]
Search query to exclude Silver, with same price range
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Silver"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
The above query doesn't return any results, as no document matches both conditions.
P.S.: this SO answer might help you to understand nested fields and querying them in detail.
You have to use nested fields and a nested query to achieve this: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
Define your price property with type "nested" and then you will be able to filter by every property of the nested object.
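To see why the nested type matters for an array of price objects, the following Python sketch imitates the two matching behaviours in memory. It is a simplified model rather than Elasticsearch code: flat_match mimics how a plain object array is flattened into independent value lists, while nested_match requires both conditions to hold within the same array element. The function names are hypothetical.

```python
from typing import Any, Dict


def flat_match(doc: Dict[str, Any], level: str, low: float, high: float) -> bool:
    """Mimic non-nested matching: each field is flattened into its own value list."""
    levels = [entry["membershipLevel"] for entry in doc["price"]]
    prices = [float(entry["price"]) for entry in doc["price"]]
    return level in levels and any(low <= price <= high for price in prices)


def nested_match(doc: Dict[str, Any], level: str, low: float, high: float) -> bool:
    """Mimic nested matching: both conditions must hold for the same array element."""
    return any(
        entry["membershipLevel"] == level and low <= float(entry["price"]) <= high
        for entry in doc["price"]
    )


product = {
    "name": "a",
    "price": [
        {"membershipLevel": "Gold", "price": "5"},
        {"membershipLevel": "Silver", "price": "50"},
        {"membershipLevel": "Bronze", "price": "100"},
    ],
}

print(flat_match(product, "Silver", 0, 10))    # True: Silver and 5 come from different entries
print(nested_match(product, "Silver", 0, 10))  # False
print(nested_match(product, "Gold", 0, 10))    # True
```

The first call shows the false positive the question wants to avoid: a Silver member would see product "a" in the 0-10 range even though the Silver price is 50.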

combine output of a first filter as input of a second filter

We have an elasticsearch instance with entries with two tagged fields.
sessionid
message
In a first filter, I find all entries where the message contains a certain substring. Each of those entries contains a sessionid.
In a second filter, I want to find all messages where the sessionid matches one of the sessionids returned by the first filter. This filter should go through all entries a second time.
Example, in the log below (sessionid;message)
1234;miss 1
2456;miss 2
1234;match
When filtering for the string "match" in the message part, I would get as output of the combined query:
1234;miss 1
1234;match
We are using KQL.
Background: We want an easy way to follow complete flows with an error-string in a message, in a multithreaded environment.
I understand why you'd want to do that in one go, but it's not possible in Elasticsearch. You cannot "revisit" documents that have already been ruled out by a different query: searching for match would disqualify all the misses.
It's unfortunate that the log message is combined with the ID, but you can try this:
Find all that match match (pun intended). I'm assuming you do have a keyword field available:
GET your_index/_search
{
"query": {
"regexp": {
"separated_msg.keyword": ".*\\;match.*"
}
}
}
Post-process the hits and extract the session IDs
Run session ID matching:
GET your_index/_search
{
"query": {
"regexp": {
"separated_msg.keyword": "1234;.*"
}
}
}
or on multiple IDs using a bool should:
GET your_index/_search
{
"query": {
"bool": {
"should": [
{
"regexp": {
"separated_msg.keyword": "1234;.*"
}
},
{
"regexp": {
"separated_msg.keyword": "4567;.*"
}
}
]
}
}
}
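The post-processing step between the two searches could look roughly like the Python sketch below. It assumes each hit carries the combined "sessionid;message" string in a _source field named separated_msg (matching the field used in the queries above) and that session IDs are plain digits, so they need no regexp escaping. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def extract_session_ids(response: Dict[str, Any], field: str = "separated_msg") -> List[str]:
    """Collect the distinct session IDs (text before the first ';') from the hits."""
    session_ids: List[str] = []
    for hit in response.get("hits", {}).get("hits", []):
        value = hit.get("_source", {}).get(field, "")
        session_id, separator, _ = value.partition(";")
        if separator and session_id and session_id not in session_ids:
            session_ids.append(session_id)
    return session_ids


def build_session_query(session_ids: List[str], field: str = "separated_msg.keyword") -> Dict[str, Any]:
    """Build the second search: one regexp clause per session ID inside a bool should."""
    clauses = [{"regexp": {field: f"{session_id};.*"}} for session_id in session_ids]
    return {"query": {"bool": {"should": clauses}}}


# Hypothetical result of the first search (entries containing "match").
first_response = {
    "hits": {
        "hits": [
            {"_source": {"separated_msg": "1234;match"}},
            {"_source": {"separated_msg": "1234;match again"}},
        ]
    }
}

ids = extract_session_ids(first_response)
print(ids)  # prints ['1234']
print(build_session_query(ids))
```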
If a unique numeric value can be assigned to each message (e.g. 1 for "miss 1", 2 for "match"), then bucket_selector and top_hits can be used. The outer terms aggregation collects the unique sessionids, and the inner terms aggregation collects the unique messageids per session. The matching_messageid selector keeps only the messageid bucket whose key is 2, and my_bucket keeps a sessionid only if that inner bucket list is not empty.
{
"size": 0,
"aggs": {
"sessionid": {
"terms": {
"field": "sessionid",
"size": 10
},
"aggs": {
"documents":{
"top_hits": {
"size": 10
}
},
"messageid": {
"terms": {
"field": "messageid",
"size": 10
},
"aggs": {
"matching_messageid": {
"bucket_selector": {
"buckets_path": {
"key": "_key"
},
"script": "params.key==2"
}
}
}
},
"my_bucket": {
"bucket_selector": {
"buckets_path": {
"hits": "messageid._bucket_count"
},
"script": "params.hits>0"
}
}
}
}
}
}
Result
"aggregations" : {
"sessionid" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1234,
"doc_count" : 2,
"documents" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index31",
"_type" : "_doc",
"_id" : "MTAYpnABheSAx2q_eNEF",
"_score" : 1.0,
"_source" : {
"sessionid" : 1234,
"message" : "miss 1",
"messageid" : 1
}
},
{
"_index" : "index31",
"_type" : "_doc",
"_id" : "MjAYpnABheSAx2q_n9FW",
"_score" : 1.0,
"_source" : {
"sessionid" : 1234,
"message" : "match",
"messageid" : 2
}
}
]
}
},
"messageid" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 2,
"doc_count" : 1
}
]
}
}
]
}
}
If a given message has timestamp(max/min) then max_path can be used to select buckets with given messages.
The best approach to the above problem would be to use nested documents:
{
"sessionid":1234,
"messages":[
{
"message":"match"
},
{
"message":"miss 1"
}
]
}
With this structure, the problem can be solved with a nested query. If Logstash is used, the above structure can be generated at indexing time.
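As a rough illustration of that restructuring, the Python sketch below groups flat "sessionid;message" log lines into one document per session. It only shows the shape of the transformation: the function name is hypothetical, session IDs are assumed to be numeric, and in practice this step would more likely live in the ingestion pipeline.

```python
from typing import Any, Dict, List


def group_by_session(lines: List[str]) -> List[Dict[str, Any]]:
    """Turn 'sessionid;message' lines into one document per session."""
    sessions: Dict[int, List[Dict[str, str]]] = {}
    for line in lines:
        session_id, separator, message = line.partition(";")
        if not separator:
            continue  # skip lines that do not follow the expected format
        sessions.setdefault(int(session_id), []).append({"message": message})
    return [{"sessionid": sid, "messages": messages} for sid, messages in sessions.items()]


sample_log = ["1234;miss 1", "2456;miss 2", "1234;match"]
for document in group_by_session(sample_log):
    print(document)
```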

access query value from function_score to compute new score

I need to customize ES score. The score function I need to implement is:
score = len(document_term) - len(query_term)
For instance, one of the documents in my ES index is:
{
"name": "foobar"
}
And the search query
{
"query": {
"function_score": {
"query": {
"match": {
"name": {
"query": "foo"
}
}
},
"functions": [
{
"script_score": {
"script": {
"source": "doc['name'].value.length() - ?LEN(query_tem)?"
}
}
}
],
"boost_mode": "replace"
}
}
}
The above search should produce a score of 6 - 3 = 3, but I haven't found a way to access the value of the query term.
Is it possible to access the value of the query term in a function_score context?
There is no direct way to do this; however, you can achieve it as shown below, where the query term has to be supplied in two different parts of the query.
Before that, one important note: you cannot use doc['myfield'].value if the field is of type text. Instead, you need a sibling field of type keyword and must refer to that in the script, as shown below:
Mapping:
PUT myindex
{
"mappings" : {
"properties" : {
"myfield" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
Sample Document:
POST myindex/_doc/1
{
"myfield": "I've become comfortably numb"
}
Query:
POST myindex/_search
{
"query": {
"function_score": {
"query": {
"match": {
"myfield": "numb"
}
},
"functions": [
{
"script_score": {
"script": {
"source": "return doc['myfield.keyword'].value.length() - params.myquery.length()",
"params": {
"myquery": "numb"
}
}
}
}
],
"boost_mode": "replace"
}
}
}
Response:
{
"took" : 558,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 24.0,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 24.0,
"_source" : {
"myfield" : "I've become comfortably numb"
}
}
]
}
}
Hope this helps!
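Because the query term has to appear twice, it can be convenient to generate the request body from a single variable. The Python sketch below does that and also recomputes the expected score for the sample document; the helper names are hypothetical, and the .keyword sub-field is assumed to exist as in the mapping above.

```python
from typing import Any, Dict


def build_length_diff_query(field: str, term: str) -> Dict[str, Any]:
    """Build a function_score query that scores by len(field value) - len(term)."""
    source = f"return doc['{field}.keyword'].value.length() - params.myquery.length()"
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: term}},
                "functions": [
                    {
                        "script_score": {
                            # The same term is passed to the script via params.
                            "script": {"source": source, "params": {"myquery": term}}
                        }
                    }
                ],
                "boost_mode": "replace",
            }
        }
    }


def expected_score(document_value: str, term: str) -> int:
    """Score the script should produce for a given stored value (ASCII text assumed)."""
    return len(document_value) - len(term)


query = build_length_diff_query("myfield", "numb")
print(query["query"]["function_score"]["functions"][0]["script_score"]["script"]["params"])
print(expected_score("I've become comfortably numb", "numb"))  # prints 24
```

The value 24 agrees with the _score in the response above: the stored string has 28 characters and the term has 4.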

How can I get element at a particular index in elasticsearch?

I have stored three JSON objects in Elasticsearch; each object has a name and a projects array.
{"name": "haris","projects": [{"title": "Splunk"},{"title": "QRadar"},{"title": "LogAnalysis"}]}
{"name": "khalid","projects": [{"title": "MS"},{"title": "Google"},{"title": "Apple"}]}
{"name": "Hamid","projects": [{"title": "Toyota"},{"title": "Honda"},{"title": "Kia"}]}
I have written a query to extract a particular object by _id and its specific property projects
curl -XGET 'localhost:9200/jsontest/_search?pretty' -d '{"query" : { "match" : {"_id":"AV1kzzZqAzHWQ2S7B8f1"} }, "_source": ["projects"]}'
As expected it returns projects object
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "jsontest",
"_type" : "json",
"_id" : "AV1kzzZqAzHWQ2S7B8f1",
"_score" : 1.0,
"_source" : {
"projects" : [{"title" : "Splunk"},{"title" : "QRadar"},{"title" : "LogAnalysis"}
]
}
}
]
}
}
Question: is there a way to retrieve the value at a particular index of projects? This is dummy data; in my real scenario projects can have a large number of elements, and each element is itself a JSON object with a lot of properties. I only need to retrieve the value at a certain index of projects.
Here is what I would do.
First, the mapping:
PUT test/my_objects/_mapping
{
"properties": {
"name":{
"type": "string",
"index": "not_analyzed"
},
"projects": {
"type": "nested"
}
}
}
Second, the projects are indexed:
PUT test/my_objects/1111
{
"name": "haris",
"projects": [
{"title": "Splunk"},
{"title": "QRadar"},
{"title": "LogAnalysis"}
]
}
Finally, the aggregation query:
GET test/my_objects/_search
{
"aggs": {
"by_name": {
"terms": {
"field": "name"
},
"aggs": {
"by_project": {
"nested": {
"path": "projects"
},
"aggs": {
"by_title": {
"terms": {
"field": "projects.title"
}
}
}
}
}
}
}
}
It's not tested and a bit tedious because of the nested aggs, but it should work if you adapt it further to your requirements.
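A different angle on the original question is to let a script field pick one element out of _source, or to pick it on the client. The Python sketch below builds such a request body and also shows the client-side fallback. This is an untested assumption rather than part of the answer above: it relies on script fields being able to read params._source, the helper names and the field name "project" are hypothetical, and older releases may expect the script under a different key than source.

```python
from typing import Any, Dict, Optional


def build_project_at_query(doc_id: str, position: int) -> Dict[str, Any]:
    """Build a search body that returns only projects[position] via a script field."""
    return {
        "query": {"ids": {"values": [doc_id]}},
        "_source": False,  # do not return the full document
        "script_fields": {
            "project": {
                "script": {
                    "source": "params._source.projects[params.position]",
                    "params": {"position": position},
                }
            }
        },
    }


def project_at(source: Dict[str, Any], position: int) -> Optional[Dict[str, Any]]:
    """Client-side fallback: pick projects[position] from a returned _source."""
    projects = source.get("projects", [])
    return projects[position] if 0 <= position < len(projects) else None


sample_source = {"projects": [{"title": "Splunk"}, {"title": "QRadar"}, {"title": "LogAnalysis"}]}
print(build_project_at_query("AV1kzzZqAzHWQ2S7B8f1", 1))
print(project_at(sample_source, 1))  # prints {'title': 'QRadar'}
```

The script-field variant still loads _source on the server, so it mainly saves network transfer rather than server-side work.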

Resources