How do I split a comma-separated field and concatenate field1 and field2 in a NiFi processor? - apache-nifi

I want to split a comma-separated field into field1 and field2, and then concatenate field1 and field2 again.
For example, the input
2022-09-05T00:00:10,677 abc.1 ,
should become, after the split and concatenation,
2022-09-05T00:00:10:677,abc.1,

You can use UpdateRecord and add a user-defined property something like /field3 set to concat( /field1, /field2 ). You can change /field3 to be whatever you want the output field name to be, and if you want to remove the other fields you can specify a schema in your Record Writer that only has the field(s) you want, such as:
{
  "type": "record",
  "name": "nifiRecord",
  "namespace": "org.apache.nifi",
  "fields": [{
    "name": "field3",
    "type": ["string", "null"]
  }]
}
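As a concrete sketch (field names are illustrative), UpdateRecord with /field3 set to concat( /field1, /field2 ) transforms a record like this:

```
input record:                           {"field1": "a", "field2": "b"}
after UpdateRecord:                     {"field1": "a", "field2": "b", "field3": "ab"}
written with the reduced schema above:  {"field3": "ab"}
```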

Related

Search across a searchable field in Elasticsearch

I'm looking for a way to search across a tokenized field in Elasticsearch: instead of returning the documents that matched my search, I want a unique set of the field values that matched best.
{
  "id": 1,
  "brand": [
    "word1",
    "another"
  ]
},
{
  "id": 2,
  "brand": [
    "word2",
    "word3",
    "yet_another"
  ]
}
So, searching for wo, I would receive a list of the words word1, word2 and word3, scored, of course.
Should I create a new index for that with these values?
Is there a way I can do that work by reusing the tokenization of my index?

Create a keyword field concatenated from other fields

I've got an index with a mapping of 3 fields. Let's say f1, f2 and f3.
I want a new keyword field with the concatenation of the values of f1, f2 and f3, so that I can aggregate by it and avoid lots of nested loops when checking the search results.
I've seen that this could be achieved with source transformation, but that feature was removed in Elasticsearch 5.
ElasticSearch version used: 6.5
Q: How can I achieve the concatenation in Elasticsearch 6.5?
There was indeed source transformation prior to ES 5, but as of ES 5 there is a more powerful feature, ingest pipelines (run on ingest nodes), that will let you easily achieve what you need.
First, define an ingest pipeline using a set processor that will help you concatenate three fields into one:
PUT _ingest/pipeline/concat
{
  "processors": [
    {
      "set": {
        "field": "field4",
        "value": "{{field1}} {{field2}} {{field3}}"
      }
    }
  ]
}
You can then index a document using that pipeline:
PUT index/doc/1?pipeline=concat
{
  "field1": "1",
  "field2": "2",
  "field3": "3"
}
And the indexed document will look like:
{
  "field1": "1",
  "field2": "2",
  "field3": "3",
  "field4": "1 2 3"
}
Just make sure to create the index with the appropriate mapping for field4 prior to indexing the first document.
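If you want to sanity-check the pipeline before indexing anything, you can run the simulate API against the pipeline defined above (the sample document here is illustrative):

```
POST _ingest/pipeline/concat/_simulate
{
  "docs": [
    {
      "_source": {
        "field1": "1",
        "field2": "2",
        "field3": "3"
      }
    }
  ]
}
```

The response shows each document as it would be indexed, including the concatenated field4, without writing anything to the index.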

Nifi: Can I do mathematical operations on a json file values?

I have a JSON file as an input to a processor. Something like this:
{"x" : 10, "y" : 5}
Can I do mathematical operations on these values instead of writing a custom processor? I need to do something like
( x / y ) * 3
^ Just an example.
I need to save the result to an output file.
UPDATE:
This is my text in generateFlowFile processor:
X|Y
1|123
2|111
And this is my AVRO schema:
{
  "name": "myschema",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    {"name": "X", "type": "int"},
    {"name": "Y", "type": "int"}
  ]
}
When I change the above types to string, it works fine but I cannot perform math operations on a string.
FYI, I have selected 'Use Schema Name Property' in Schema Access Strategy
Use the QueryRecord processor:
Configure/enable the Record Reader/Writer controller services.
Define an Avro schema to read the incoming JSON.
Define an Avro schema to write the results of the query in the desired format.
Add a new property in the QueryRecord processor with the query:
select ( x / y ) * 3 as div from FLOWFILE
The output flowfile from the query record processor will be in the configured Record Writer format.
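One caveat (an assumption based on standard SQL semantics, worth verifying in your NiFi version): with both X and Y typed as int, the division may be performed as integer division, truncating the result to 0 for the sample rows. Casting in the query, e.g. select ( cast(X as double) / Y ) * 3 as div from FLOWFILE, avoids that. The effect is the same as in this Python sketch:

```python
# Integer vs. floating-point division for the sample rows X|Y = 1|123 and 2|111.
rows = [(1, 123), (2, 111)]

# Integer division truncates, so (x / y) * 3 is 0 for both sample rows.
int_results = [(x // y) * 3 for x, y in rows]

# Dividing as floats keeps the fractional part.
float_results = [(x / y) * 3 for x, y in rows]

print(int_results)  # [0, 0]
print(float_results)
```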

UpdateRecord processor returns empty string: converting a string date to long in NiFi

I'm converting a string date in the format shown below to a number (long), but the output I get is an empty string.
I'm using a JSON reader and writer: in the input JSON the field is a string, and in the output JSON it is of type long.
I also tried keeping the output JSON type as a string and evaluating the following expression, but that also produced an empty string:
${DATE1.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber():toString()}
Sample data trying to convert: {"DATE1" : "2018-01-17 00:00:00"}
Tried to follow the solution on this link but still getting empty string.
Method 1: Referring to the contents of the flowfile
If you want to change the DATE1 value based on the field value from the content, you need to refer to it as field.value:
Replacement Value Strategy: Literal Value
/DATE1: ${field.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()}
This refers to the DATE1 value from the content and applies the expression language to it.
Avro Schema Registry:-
{
  "namespace": "nifi",
  "name": "balances",
  "type": "record",
  "fields": [
    { "name": "DATE1", "type": "string" }
  ]
}
Read DATE1 field value as String from the content.
JsonRecordSetWriter:-
{
  "namespace": "nifi",
  "name": "balances",
  "type": "record",
  "fields": [
    { "name": "DATE1", "type": "long" }
  ]
}
In the JsonRecordSetWriter, configure DATE1 as type long.
Input:-
{"DATE1":"2018-01-17 00:00:00"}
Output:-
[{"DATE1":1516165200000}]
(or)
Method 2: Referring to an attribute of the flowfile
If you have DATE1 as an attribute of the flowfile with the value 2018-01-17 00:00:00, you can use the DATE1 attribute instead of field.value (which refers to the contents of the flowfile).
The UpdateRecord configs would then be:
Replacement Value Strategy: Literal Value
/DATE1: ${DATE1:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()}
In this expression we use the DATE1 attribute to update the contents of the flowfile.
Both methods produce the same output.
You can also guard against empty values before converting to a date, using isEmpty() and ifElse():
${field.value:isEmpty():ifElse('', ${field.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()})}
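The expected long is epoch milliseconds. Note that NiFi's toDate() parses using the JVM's local time zone, which is why the output above (1516165200000) corresponds to 2018-01-17 00:00:00 at UTC-5 rather than UTC. A Python sketch of the same conversion (the America/New_York time zone here is an assumption):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Parse the sample value and attach the assumed local time zone.
dt = datetime.strptime("2018-01-17 00:00:00", "%Y-%m-%d %H:%M:%S")
dt = dt.replace(tzinfo=ZoneInfo("America/New_York"))

# Epoch milliseconds, matching the long produced by toDate():toNumber().
millis = int(dt.timestamp() * 1000)
print(millis)  # 1516165200000
```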

Elasticsearch: sort by inner field

I have documents in which one of the fields looks like the following:
"ingredients": [{
  "unit": "MG",
  "value": 123,
  "key": "abc"
}]
I would like to sort the records by the ascending value of a specific ingredient. That is, if I have 2 records that both use the ingredient with key "abc", one with value 1 and one with value 2, the one with value 1 should appear first.
Each record may have more than one ingredient.
Thank you in advance!
The search query to sort will be:
{
  "sort": {
    "ingredients.value": {
      "order": "asc"
    }
  }
}
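Note that the flat sort above sorts on values across all ingredients in the array. If the ingredients field is mapped as nested and you want to sort by the value of one specific ingredient key (e.g. "abc"), the usual approach is a nested sort with a filter (a sketch, assuming a nested mapping):

```
{
  "sort": [
    {
      "ingredients.value": {
        "order": "asc",
        "nested": {
          "path": "ingredients",
          "filter": {
            "term": { "ingredients.key": "abc" }
          }
        }
      }
    }
  ]
}
```

The filter restricts which inner ingredient objects contribute the sort value for each document.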
