How to copy a value from one field to another if a field exists using an ingest node pipeline - elasticsearch

I want to create a new field called kubernetes.pod.name if a field called prometheus.labels.pod exists in the logs. I found that the set processor can copy the value present in prometheus.labels.pod to a new field kubernetes.pod.name, but I need to do this conditionally because the pod name keeps changing.
How do I set a condition so that the new field kubernetes.pod.name is added only if the field prometheus.labels.pod exists (both having the same value)? I know a condition like
ctx.prometheus?.labels?.namespace == "name_of_namespace"
works for matching a specific value; similarly, could I use
ctx.prometheus?.labels?.pod == "*"
to check whether this field exists or not?

If the field is a string and you need a condition that checks whether it exists, the best way is to use the following condition in the set processor:
ctx.prometheus?.labels?.namespace != null
This is how I implemented the above scenario using an ingest node pipeline:
"set": {
"field": "kubernetes.pod.name",
"copy_from": "prometheus.labels.pod",
"if": "ctx.prometheus?.labels?.pod!=null",
"ignore_failure": true
}
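For reference, a minimal sketch of the full pipeline definition plus a test call (the pipeline name k8s-pod-name is made up here; copy_from requires a reasonably recent Elasticsearch, 7.11 or later if memory serves):
PUT _ingest/pipeline/k8s-pod-name
{
  "description": "Copy prometheus.labels.pod into kubernetes.pod.name when present",
  "processors": [
    {
      "set": {
        "field": "kubernetes.pod.name",
        "copy_from": "prometheus.labels.pod",
        "if": "ctx.prometheus?.labels?.pod != null",
        "ignore_failure": true
      }
    }
  ]
}

POST _ingest/pipeline/k8s-pod-name/_simulate
{
  "docs": [
    { "_source": { "prometheus": { "labels": { "pod": "my-pod-1" } } } },
    { "_source": { "prometheus": { "labels": { "namespace": "ns1" } } } }
  ]
}
In the simulate output, the first document gains kubernetes.pod.name while the second passes through unchanged.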

Related

Elasticsearch query subfield directly without prefix

If I have an object like this in Elasticsearch, where a is an object with some fields (dynamically mapped):
{
  "a": {
    "b": "b_value",
    "c": "c_value"
  }
}
How can I use the query 'b:b_value' to get matching documents without having to specify 'a.b:b_value'?
I tried searching online but none of the suggestions work. Is this possible?
You can use a field alias.
An alias mapping defines an alternate name for a field in the index. The alias can be used in place of the target field in search requests, and selected other APIs like field capabilities.
https://www.elastic.co/blog/introducing-field-aliases-in-elasticsearch
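A minimal sketch of what that looks like (my-index is a placeholder; note that an alias must point at a concrete field in the mapping, so dynamically mapped fields need to be known up front):
PUT my-index
{
  "mappings": {
    "properties": {
      "a": {
        "properties": {
          "b": { "type": "keyword" }
        }
      },
      "b": {
        "type": "alias",
        "path": "a.b"
      }
    }
  }
}

GET my-index/_search
{
  "query": {
    "query_string": { "query": "b:b_value" }
  }
}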

Apache NiFi: Add column to csv using mapped values

A CSV is brought into the NiFi workflow using a GetFile processor. I have a column consisting of an "id". Each id means a certain string. There are around 3 ids. For example, if my CSV consists of
name,age,id
John,10,Y
Jake,55,N
Finn,23,C
I am aware that Y means York, N means Old, and C means Cat. I want a new column with a header named "nick" holding the corresponding nick for each id:
name,age,id,nick
John,10,Y,York
Jake,55,N,Old
Finn,23,C,Cat
Finally I want a CSV with the extra column and the appropriate data for each record. How is this possible using Apache NiFi? Please advise me on the processors that must be used and the configurations that must be changed to accomplish this task.
Flow:
1. add a new nick column
2. copy over the id to the nick column
3. look at each line and match the id with its corresponding value
4. set this value into the current line's nick column
You can achieve this using either ReplaceText or ReplaceTextWithMapping. I do it with ReplaceText:
UpdateRecord will parse the CSV file, add the new column, and copy the id value:
Create a CSVReader and keep the default properties. Create a CSVRecordSetWriter and set Schema Access Strategy to Schema Text. Set the Schema Text property to:
{
  "type": "record",
  "name": "foobar",
  "namespace": "my.example",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "age",  "type": "int" },
    { "name": "id",   "type": "string" },
    { "name": "nick", "type": "string" }
  ]
}
Notice that the schema includes the new column. Finally, replace the original values with the mapping:
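The original answer showed the processor settings as screenshots; here is a rough equivalent sketch (the exact property values are assumptions based on the described flow):
UpdateRecord (copies the id into the new nick column):
  Record Reader:               CSVReader (default properties)
  Record Writer:               CSVRecordSetWriter (schema above)
  Replacement Value Strategy:  Record Path Value
  /nick:                       /id

ReplaceText (maps the copied id to its nick), with Evaluation Mode set to
Line-by-Line and Replacement Strategy set to Regex Replace, one processor
per mapping, chained:
  Search Value: ,Y$    Replacement Value: ,York
  Search Value: ,N$    Replacement Value: ,Old
  Search Value: ,C$    Replacement Value: ,Cat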
PS: I noticed you are new to SO, welcome! You have not accepted a single answer in any of your previous questions. Accept them if they solve your problem, as it will help others find solutions.

Logstash "add_field" saves "%{...}" as value when key value pair missing in JSON

add_field => {"ExampleFieldName" => "%{[example][jsonNested1][jsonNested2]}"}
My Logstash receives JSON from Filebeat, containing the object example, which itself contains the object jsonNested1, which contains a key-value pair (with the key being jsonNested2).
If jsonNested1 exists and jsonNested2 exists and contains a value, then this value will be saved correctly in ExampleFieldName in Elasticsearch.
{
  "example": {
    "jsonNested1": {
      "jsonNested2": "exampleValue"
    }
  }
}
In this case ExampleFieldName would contain exampleValue.
{
  "example": {
    "jsonNested1": {}
  }
}
In this case I would like ExampleFieldName to contain an empty string or no value at all (or not to be created in the first place).
But instead, ExampleFieldName contains the literal string %{[example][jsonNested1][jsonNested2]}.
I already found a solution for this by first checking whether the nested key-value pair exists before performing the add_field:
if [example][jsonNested1][jsonNested2] {
  mutate {
    add_field => { "ExampleFieldName" => "%{[example][jsonNested1][jsonNested2]}" }
  }
}
This solution works, but I can't believe it's the best way to do it. I find it very strange that Logstash even saves %{[example][jsonNested1][jsonNested2]} as a string when the key-value pair doesn't exist; I would expect it to recognize this and simply not save any value.
The if statement is an acceptable solution if I have to check one field, but currently I'm working on a Logstash config with around 50 fields. Should I create 50 if statements?
You may be able to fix this using a prune filter, whose blacklist_names option defaults to removing unresolved field references.
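A minimal sketch of that filter, relying on the documented defaults (the commented blacklist_values line is an assumption about how you would target values rather than names; verify against your Logstash version):
filter {
  prune {
    # With no options set, blacklist_names defaults to ["%{[^}]+}"], which
    # drops fields containing an unresolved %{...} reference. If the
    # unresolved text ends up as a field value rather than a field name,
    # blacklist_values is the analogous option, e.g.:
    # blacklist_values => [ "ExampleFieldName", "%\{[^}]+\}" ]
  }
}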

Delete empty attributes in NiFi

Because this issue is still unresolved, I have an EvaluateJsonPath processor that sometimes outputs attributes with empty strings.
Is there a straightforward way to delete attributes from a flowfile?
I tried using the UpdateAttribute processor, but it is only able to delete based on matching an attribute's name (I need to match on the attribute's value).
You can use the ExecuteGroovyScript processor (available since NiFi 1.5.0) with the following code:
def ff = session.get()
if (!ff) return
// collect the names of all attributes whose value is null or an empty string
def emptyKeys = ff.getAttributes().findAll { it.value == null || it.value == '' }.collect { it.key }
// drop those attributes and transfer the flowfile to the success relationship
ff.removeAllAttributes(emptyKeys)
REL_SUCCESS << ff
After the EvaluateJsonPath processor, use a RouteOnAttribute processor and check for attributes with empty values using Expression Language.
RouteOnAttribute configs:
Add a new property, emptyattribute:
${anyAttribute("id","age"):isEmpty()}
or, by using the or function:
${id:isEmpty():or(${age:isEmpty()})}
In the above Expression Language we check whether either the id or age attribute has an empty value, and route such flowfiles to the emptyattribute relationship.
${allAttributes("id","age"):isEmpty()}
or, by using the and function:
${id:isEmpty():and(${age:isEmpty()})}
These expressions route only when both the id and age attributes are empty.
Connect the emptyattribute relationship to an UpdateAttribute processor and delete the attributes you want removed.
UpdateAttribute configs:
In the Delete Attributes Expression, list the id and age attributes that need to be deleted.
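A sketch of that property (it takes a regular expression over attribute names; id|age follows the example attributes used above):
Delete Attributes Expression: id|age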
By using RouteOnAttribute after the EvaluateJsonPath processor we can check whether the required attributes have values, and then use UpdateAttribute to delete the attributes that have empty values.
You can use a JOLT transform, but I can only get it to work for fields at the top level of the JSON. Any nested fields are lost, although perhaps a real JOLT expert can improve on the solution to stop that happening.
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "": "TRASH",
        "*": {
          "$": "&2"
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "TRASH": ""
    }
  }
]
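Per the answer's description, content like the first JSON below would come out as the second, with the empty-valued top-level field dropped (this example is illustrative, not from the original answer):
{ "id": "123", "nick": "", "name": "John" }
becomes
{ "id": "123", "name": "John" }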
Once you have validated which attribute values are empty strings, make use of UpdateAttribute's Advanced Usage to check those attributes and change their values to null. For the advanced usage of UpdateAttribute, refer to community.hortonworks.com/questions/141774/… Add Rule: idnull; Conditions: ${id:isEmpty():or(${id:isNull()})}; Actions: set the id attribute to the value null.
Note that this approach does not remove the attribute; it just sets the attribute value to null.

Differentiating fields that didn't exist before a schema change with Elasticsearch

I'm trying to add a field to an Elasticsearch schema. I already have about a million records in the index which don't have the field, and I need to be able to differentiate those from the ones added after the field is introduced. Using the modified date is the absolute last resort because I don't know when this will be turned on in production.
What I considered was having the old records return something like
{
  "myField": null
}
and the new ones would return
{
  "myField": { }
}
But I can't find a way to set the field on insert.
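For what it's worth, a sketch of two pieces under the question's own naming (the pipeline name stamp-myfield and index name my-index are placeholders): an ingest pipeline set processor can stamp the field on insert, and an exists query isolates the documents that never received it.
PUT _ingest/pipeline/stamp-myfield
{
  "processors": [
    { "set": { "field": "myField", "value": {} } }
  ]
}

GET my-index/_search
{
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "myField" } }
    }
  }
}
Note that a document with an explicit myField: null also fails the exists query, so explicit nulls and missing fields land in the same "old" bucket, consistent with the scheme sketched above.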
