Logstash: nested object mapping with Ruby script

I have a problem with nested object mapping using the Ruby filter plugin.
My object should have a field cmds, which is an array of objects like this:
"cmds": [
{
"number": 91,
"errors": [],
"errors_count": 0
},
{
"number": 92,
"errors": ["ERROR_1"],
"errors_count": 1
}]
In Elasticsearch I need to find objects where number = 91 and errors_count > 0, so the object above shouldn't be a correct result. But my query (below) matches it:
GET /logs/default/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "cmds.number": 91
          }
        },
        {
          "range": {
            "cmds.errors_count": {
              "gt": 0
            }
          }
        }
      ]
    }
  }
}
I know this happens because the JSON document is flattened into a simple key-value format, so values from different array elements can satisfy different clauses of the same query. I should map the cmds field as type nested instead of type object.
The problem is I have no idea how to do it in my Logstash Ruby script with event.set.
I have the following code:
i = 0  # counter must be initialized before the loop
for t in commandTexts do
  commandv = Command.new(t)
  # Plain string keys here: the bracket [field] syntax belongs in the
  # event.set/event.get field reference, not in hash keys.
  cmd = {
    "number"       => commandv.hexnumber,
    "command_text" => commandv.command_text,
    "errors"       => commandv.errors,
    "has_error"    => commandv.has_error,
    "errors_count" => commandv.errors_count
  }
  if i == 0
    event.set("[cmds]", [cmd])
  else
    event.set("[cmds]", event.get("[cmds]") + [cmd])
  end
  i += 1
end
I'm new to Ruby and my code is not perfect, but the "cmds" field looks fine in Elasticsearch. The only problem is that it is not nested. Please help.

OK, I did it. I'm still new to ELK, and sometimes I'm confused about where (Logstash/Kibana/Ruby scripts) I should do what is needed.
My code is okay. Using Kibana, I deleted my index and made a new one with the correct mapping:
PUT /logs?pretty
{
  "mappings": {
    "default": {
      "properties": {
        "cmds": {
          "type": "nested",
          "properties": {
            "command_text": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "errors": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "errors_count": {
              "type": "long"
            },
            "has_error": {
              "type": "boolean"
            },
            "number": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}
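Note that once cmds is mapped as nested, the query has to change as well: a plain bool query still matches across different array elements, so both conditions must be wrapped in a nested query to be evaluated against the same cmds object. A sketch of what the original search could look like (same index and field names as above):
GET /logs/default/_search
{
  "query": {
    "nested": {
      "path": "cmds",
      "query": {
        "bool": {
          "must": [
            { "match": { "cmds.number": 91 } },
            { "range": { "cmds.errors_count": { "gt": 0 } } }
          ]
        }
      }
    }
  }
}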
Earlier I had tried to create the new index just by setting "type" to "nested":
PUT /logs?pretty
{
  "mappings": {
    "default": {
      "properties": {
        "cmds": {
          "type": "nested"
        }
      }
    }
  }
}
But it wasn't working correctly (the "cmds" field was not added to Elasticsearch), so I did the full mapping (all properties).
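For what it's worth, you can verify that the mapping actually took effect before reindexing any data:
GET /logs/_mapping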

Related

Elasticsearch: How to define Contexts property of nested completion field?

I've got the following mapping for an ES index (I'm not including config for analyzer and other things):
{
  "mappings": {
    "properties": {
      "topCustomer": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "topCustomer_suggest": {
        "type": "completion",
        "contexts": [
          {
            "name": "index_name",
            "type": "category"
          }
        ]
      },
      "customer": {
        "type": "nested",
        "include_in_root": "true",
        "properties": {
          "customer_name": {
            "type": "text",
            "analyzer": "autocomplete",
            "search_analyzer": "autocomplete_search",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "customer_name_suggest": {
                "type": "completion",
                "contexts": [
                  {
                    "name": "index_name",
                    "type": "category"
                  }
                ]
              }
            }
          },
          "customer_level": {
            "type": "integer"
          }
        }
      }
    }
  }
}
Also, I have the following Logstash configuration file:
input {
  jdbc {
    # input config
  }
}
filter {
  mutate {
    remove_field => ["@version"]
  }
  ruby {
    code => "
      input = event.get('topCustomer').strip.gsub(/[\(\)]+/, '').split(/[\s\/\-,]+/);
      event.set('[topCustomer_suggest][input]', input);
      contexts = { 'index_name' => [event.get('type')] };
      event.set('[topCustomer_suggest][contexts]', contexts);
      input = event.get('[customer][customer_name]').strip.gsub(/[\(\)]+/, '').split(/[\s\/\-,]+/);
      event.set('[customer][customer_name][fields][customer_name_suggest][input]', input);
      contexts = { 'index_name' => [event.get('type')] };
      event.set('[customer][customer_name][fields][customer_name_suggest][contexts]', contexts);
    "
  }
}
output {
  elasticsearch {
    index => "%{type}"
    manage_template => false
    hosts => ["localhost:9200"]
  }
}
Now, when I try to refresh my index to apply the changes that I made to one of these files, I get the following error:
Could not index event to Elasticsearch ...
:response=>{"index"=>{"index"=>"customers", "_type"=>"_doc", "_id"=>"...", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Contexts are mandatory in context enabled completion field [customer.customer_name.customer_name_suggest]"}}}}
I tried to modify my config file so that the event.set calls (in the ruby filter section) match the field path that the error displays; I also tried many other combinations to see if this was causing the error.
As you can see, I defined another completion field in the mapping. This field works as expected. The difference is that this is not a nested field.
Notice that the customer_name_suggest is a sub-field and not an 'independent' field like the topCustomer_suggest field. Is this the correct way of doing it or should I not make customer_name_suggest a sub field? I really don't understand why I'm getting the error as I'm defining the contexts property in the mapping.
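A likely explanation, judging by how multi-fields behave in Elasticsearch: a sub-field under fields is populated only from its parent field's value at index time, so Logstash cannot write a separate input/contexts payload into it, and a context-enabled completion sub-field therefore never receives its mandatory contexts (the [fields] path set in the Ruby filter exists only in the mapping, not in documents). A sketch of an alternative mapping that promotes the suggester to an independent field inside the nested object (all names taken from the question):
"customer": {
  "type": "nested",
  "include_in_root": "true",
  "properties": {
    "customer_name": {
      "type": "text",
      "analyzer": "autocomplete",
      "search_analyzer": "autocomplete_search"
    },
    "customer_name_suggest": {
      "type": "completion",
      "contexts": [
        {
          "name": "index_name",
          "type": "category"
        }
      ]
    }
  }
}
The Ruby filter would then set [customer][customer_name_suggest][input] and [customer][customer_name_suggest][contexts] directly, mirroring what already works for topCustomer_suggest.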

Elasticsearch remove a field from an object of an array in a dynamically generated index

I'm trying to delete fields from an object of an array in Elasticsearch. The index has been dynamically generated.
This is the mapping:
{
  "mapping": {
    "_doc": {
      "properties": {
        "age": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "result": {
          "properties": {
            "resultid": {
              "type": "long"
            },
            "resultname": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "timestamp": {
          "type": "date"
        }
      }
    }
  }
}
This is a document:
{
  "result": [
    {
      "resultid": 69,
      "resultname": "SFO"
    },
    {
      "resultid": 151,
      "resultname": "NYC"
    }
  ],
  "age": 54,
  "name": "Jorge",
  "timestamp": "2020-04-02T16:07:47.292000"
}
My goal is to remove all the resultid fields in result in all the documents of the index. After the update, the document should look like this:
{
  "result": [
    {
      "resultname": "SFO"
    },
    {
      "resultname": "NYC"
    }
  ],
  "age": 54,
  "name": "Jorge",
  "timestamp": "2020-04-02T16:07:47.292000"
}
I tried the following Stack Overflow articles, but with no luck:
Remove elements/objects From Array in ElasticSearch Followed by Matching Query
remove objects from array that satisfying the condition in elastic search with javascript api
Delete nested array in elasticsearch
Removing objects from nested fields in ElasticSearch
Hopefully someone can help me find a solution.
You should reindex your index into a new one with the _reindex API and use a script to remove your fields:
POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-reindex"
  },
  "script": {
    "source": """
      for (int i = 0; i < ctx._source.result.length; i++) {
        ctx._source.result[i].remove("resultid")
      }
    """
  }
}
Afterwards you can delete your first index:
DELETE my-index
And reindex it:
POST _reindex
{
  "source": {
    "index": "my-index-reindex"
  },
  "dest": {
    "index": "my-index"
  }
}
I combined the answer from Luc E with some of my own knowledge in order to reach a solution without reindexing.
POST INDEXNAME/TYPE/_update_by_query?wait_for_completion=false&conflicts=proceed
{
  "script": {
    "source": "for (int i = 0; i < ctx._source.result.length; i++) { ctx._source.result[i].remove(\"resultid\") }"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "result.resultid"
          }
        }
      ]
    }
  }
}
Thanks again Luc!
If your array has more than one copy of the element you want to remove, use this:
ctx._source.some_array.removeIf(tag -> tag == params['c'])
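For context, removeIf takes a Painless lambda, and params is how values are passed into the script. A minimal sketch of a complete request using it (some_array and the parameter c are placeholders carried over from the snippet above):
POST INDEXNAME/_update_by_query
{
  "script": {
    "source": "ctx._source.some_array.removeIf(tag -> tag == params['c'])",
    "params": {
      "c": "value-to-remove"
    }
  }
}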

How to make Elasticsearch aggregation only create 1 bucket?

I have an Elasticsearch index which contains a field called "host". I'm trying to send a query to Elasticsearch to get a list of all the unique values of host in the index. This is currently as close as I can get:
{
  "size": 0,
  "aggs": {
    "hosts": {
      "terms": { "field": "host" }
    }
  }
}
Which returns:
"buckets": [
{
"key": "04",
"doc_count": 201
},
{
"key": "cyn",
"doc_count": 201
},
{
"key": "pc",
"doc_count": 201
}
]
However, the actual name of the host is 04-cyn-pc. My understanding is that it is splitting the value up into tokens, so I tried something like this:
{
  "properties": {
    "host": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "text",
          "analyzer": "keyword",
          "fielddata": true
        }
      }
    }
  }
}
But it returns an illegal_argument_exception with "reason": "Mapper for [host.raw] conflicts with existing mapping in other types:\n[mapper [host.raw] has different [index] values, mapper [host.raw] has different [analyzer]]".
As you can probably tell, I'm very new to Elasticsearch, and any help or direction would be awesome. Thanks!
Try this instead:
{
  "properties": {
    "host": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}
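With this mapping the aggregation should target the raw sub-field instead, e.g.:
{
  "size": 0,
  "aggs": {
    "hosts": {
      "terms": { "field": "host.raw" }
    }
  }
}
Keep in mind that documents indexed before the mapping change will not have host.raw populated until they are reindexed.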
Elasticsearch automatically indexes string fields as both text and keyword types if you do not specify a mapping. In your example, if you do not want your field to be analyzed for full-text search, you should just define that field's type as keyword, so you can get rid of the burden of an analyzed text field. With the mapping below you can easily solve your problem without changing your agg query:
"properties": {
"host": {
"type": "keyword"
}
}

Elasticsearch: query child list containing specific value

I'm writing a query to return the products that have a specific promotionCode. In my index, a product has the following property indexed:
"offers": [
{
"promotionCode": "MV"
},
{
"promotionCode": "LI"
},
.....
]
My initial thought was that the following would be the answer:
GET alias-live-dev/_search
{
  "query": {
    "match": {
      "offers.promotionCode": "MV"
    }
  }
}
However, this always returns 0 hits. I am guessing it fails because offers is a list. Could anyone please advise what the right query would be for this scenario? Thanks in advance.
In the mapping:
"productId": {
"type": "keyword"
},
"offers": {
"type": "nested",
"properties": {
......
"promotionCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
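Since offers is mapped as type nested, a plain match query cannot reach it: nested objects are indexed as separate hidden documents and must be searched with a nested query. A sketch, assuming the mapping above:
GET alias-live-dev/_search
{
  "query": {
    "nested": {
      "path": "offers",
      "query": {
        "match": {
          "offers.promotionCode": "MV"
        }
      }
    }
  }
}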

Aggregating over _field_names in Elasticsearch 5

I'm trying to aggregate over field names in ES 5 as described in Elasticsearch aggregation on distinct keys, but the solution described there is not working anymore.
My goal is to get the keys across all the documents. The mapping is the default one.
Data:
PUT products/product/1
{
  "param": {
    "field1": "data",
    "field2": "data2"
  }
}
Query:
GET _search
{
  "aggs": {
    "params": {
      "terms": {
        "field": "_field_names",
        "include": "param.*",
        "size": 0
      }
    }
  }
}
I get the following error: Fielddata is not supported on field [_field_names] of type [_field_names]
After looking around, it seems the only way in ES 5.x and later to get the unique field names is through the mappings endpoint; since you cannot aggregate on _field_names, you may need to slightly change your data format, because the mapping endpoint will return every field regardless of nesting.
My personal problem was getting unique keys for various child/parent documents.
I found that if you prefix your field names in the format prefix.field, the mapping endpoint will automatically nest the information for you:
PUT products/product/1
{
  "param.field1": "data",
  "param.field2": "data2",
  "other.field3": "data3"
}
GET products/product/_mapping
{
  "products": {
    "mappings": {
      "product": {
        "properties": {
          "other": {
            "properties": {
              "field3": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "param": {
            "properties": {
              "field1": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "field2": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Then you can grab the unique fields based on the prefix.
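For example, a small Ruby sketch (matching the language used elsewhere on this page) that pulls the unique field names under a prefix out of the _mapping response; the host, index, and type names are the ones from the example above:
require 'net/http'
require 'json'

# Fetch the mapping for the example index from a local node (URL assumed).
body = Net::HTTP.get(URI('http://localhost:9200/products/product/_mapping'))
mapping = JSON.parse(body)

# Walk down to the properties registered under the "param" prefix
# and print their unique field names.
props = mapping.dig('products', 'mappings', 'product', 'properties', 'param', 'properties')
puts props.keys  # => field1, field2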
This is probably because setting size: 0 is not allowed anymore in ES 5. You have to set a specific size now.
POST _search
{
  "aggs": {
    "params": {
      "terms": {
        "field": "_field_names",
        "include": "param.*",
        "size": 100     <--- change this
      }
    }
  }
}
