UpdateRecord Processor Returns Empty String: Converting a String Date to Long in NiFi

I am converting a string date in the format yyyy-MM-dd HH:mm:ss to a number (long), but the output I get is an empty string.
I am using a JSON reader and writer;
in the input JSON the field is a string, and in the output JSON it is of type long.
I also tried keeping the output JSON type as string and evaluating the following expression, but that also produced an empty string:
${DATE1.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber():toString()}
Sample data I am trying to convert: {"DATE1" : "2018-01-17 00:00:00"}
I tried following the solution at this link, but I am still getting an empty string.

Method 1: Referring to the contents of the flowfile:-
If you want to change the DATE1 value based on the field value in the content, you need to refer to it as field.value.
Replacement Value Strategy = Literal Value
//DATE1 = ${field.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()}
This refers to the DATE1 value from the content and then applies Expression Language to it.
Avro Schema Registry:-
{ "namespace": "nifi", "name": "balances", "type": "record",
"fields": [
{ "name": "DATE1", "type": "string"} ] }
The reader reads the DATE1 field value from the content as a string.
JsonRecordSetWriter:-
{ "namespace": "nifi", "name": "balances", "type": "record",
"fields": [
{ "name": "DATE1", "type":"long"} ] }
In the record set writer, configure DATE1 as type long.
Input:-
{"DATE1":"2018-01-17 00:00:00"}
Output:-
[{"DATE1":1516165200000}]
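The numeric result depends on the time zone used to interpret the date: 1516165200000 is 2018-01-17 00:00:00 at UTC-5 (for example US Eastern in January), while the same wall-clock time in UTC is 1516147200000. So the sample output suggests the NiFi instance was running in a UTC-5 zone. The Java sketch below is not NiFi's implementation; it only reproduces the arithmetic with java.time so you can check what a given zone should produce.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class DateToEpochMillis {
    // Same pattern string as the toDate() call in the UpdateRecord property.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parses a wall-clock date-time string and returns epoch milliseconds,
    // interpreting the wall-clock time in the supplied zone.
    public static long toEpochMillis(String value, ZoneId zone) {
        LocalDateTime local = LocalDateTime.parse(value, FORMAT);
        return local.atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        String input = "2018-01-17 00:00:00";
        // America/New_York is UTC-5 in January, which reproduces the sample output.
        System.out.println(toEpochMillis(input, ZoneId.of("America/New_York"))); // 1516165200000
        System.out.println(toEpochMillis(input, ZoneId.of("UTC")));              // 1516147200000
    }
}
```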
(or)
Method 2: Referring to an attribute of the flowfile:-
If you have DATE1 as an attribute of the flowfile with the value 2018-01-17 00:00:00, use the DATE1 attribute instead of field.value (which refers to the contents of the flowfile).
Then the UpdateRecord configuration would be:
Replacement Value Strategy = Literal Value
//DATE1 = ${DATE1:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()}
In this expression, the DATE1 attribute is used to update the contents of the flowfile.
Both methods produce the same output.

Check the value before converting it to a date by using isEmpty() and ifElse():
${field.value:isEmpty():ifElse('', ${field.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()})}
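The guard above can be modelled in plain Java as follows. This is only a sketch of the logic, not NiFi code: following NiFi's documentation for isEmpty(), it treats null and whitespace-only values as empty and returns "" for them; anything else is parsed and converted to epoch milliseconds. UTC is used purely as an example zone.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class GuardedDateConversion {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Mirrors ${field.value:isEmpty():ifElse('', ...)}: empty values pass
    // through as "", so the date parser never sees them.
    public static String convertOrEmpty(String value, ZoneId zone) {
        if (value == null || value.trim().isEmpty()) {
            return "";
        }
        long millis = LocalDateTime.parse(value, FORMAT)
                .atZone(zone).toInstant().toEpochMilli();
        return Long.toString(millis);
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        System.out.println("[" + convertOrEmpty("", utc) + "]");        // []
        System.out.println(convertOrEmpty("2018-01-17 00:00:00", utc)); // 1516147200000
    }
}
```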

Related

How do I split a comma-separated field and concatenate field1 and field2 [3 word first] in a NiFi processor?

I want to split a comma-separated field into field 1 and field 2, and then concatenate field1 and field2 [3 word first].
Example:
2022-09-05T00:00:10,677 abc.1 ,
After splitting and concatenating:
2022-09-05T00:00:10:677,abc.1,
You can use UpdateRecord with Replacement Value Strategy set to Record Path Value and add a user-defined property such as /field3 set to concat( /field1, /field2 ). You can change /field3 to whatever you want the output field name to be, and if you want to remove the other fields, you can specify a schema in your Record Writer that has only the field(s) you want, such as:
{
"type": "record",
"name": "nifiRecord",
"namespace": "org.apache.nifi",
"fields": [{
"name": "field3",
"type": ["string", "null"]
}]
}
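As a rough illustration of the record-level effect, the Java sketch below models one record as a map (a hypothetical stand-in for a NiFi record): it adds field3 as the concatenation of field1 and field2, then keeps only field3, which is what writing through a schema that lists only field3 amounts to. Treating a missing field as an empty string is an assumption of the sketch, not documented RecordPath behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConcatFields {
    // Adds field3 = field1 + field2 (like concat(/field1, /field2)) and then
    // projects the record down to field3 only, mimicking a writer schema
    // that declares just that field.
    public static Map<String, String> concatAndProject(Map<String, String> record) {
        // Assumption for this sketch: a missing field contributes "".
        String field1 = record.getOrDefault("field1", "");
        String field2 = record.getOrDefault("field2", "");

        Map<String, String> projected = new LinkedHashMap<>();
        projected.put("field3", field1 + field2);
        return projected;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("field1", "abc");     // hypothetical sample values
        record.put("field2", "123");
        record.put("other", "dropped");  // not in the writer schema
        System.out.println(concatAndProject(record)); // {field3=abc123}
    }
}
```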

Transform date format inside CSV using Apache Nifi

I need to modify a CSV file in an Apache NiFi environment.
My CSV file looks like this:
Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name
10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle
40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz
...
I want to transform dates with AM/PM values into a simple date format, from 1/29/2020 3:00:00 AM to 2020-01-29, for each row. I read about the UpdateRecord processor, but there is a problem: as you can see, the CSV headers contain spaces, and I can't parse these fields with either Replacement Value Strategy (Literal or Record Path).
Any ideas for solving this problem? Should I somehow change the headers from Advertiser ID to advertiser_id, etc.?
You don't need to make the transformation yourself; you can let your Readers and Writers handle it for you. To get the CSV Reader to recognize dates, though, you will need to define a schema for your rows. Your schema would look something like this (I've removed the spaces from the column names because they are not allowed in Avro field names):
{
"type": "record",
"name": "ExampleCSV",
"namespace": "Stackoverflow",
"fields": [
{"name": "AdvertiserID", "type": "string"},
{"name": "CampaignStartDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
{"name": "CampaignEndDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
{"name": "CampaignName", "type": "string"}
]
}
To configure the reader, set the following properties:
Schema Access Strategy = Use 'Schema Text' property
Schema Text = (Above codeblock)
Treat First Line as Header = True
Timestamp Format = "MM/dd/yyyy hh:mm:ss a"
Additionally, if you don't want to or are unable to change the upstream system to remove the spaces, you can set this property to ignore the header of the CSV:
Ignore CSV Header Column Names = True
Then in your CSVRecordSetWriter service you can specify the following:
Schema Access Strategy = Inherit Record Schema
Timestamp Format = "yyyy-MM-dd"
You can use UpdateRecord or ConvertRecord (or any other processor that lets you specify both a reader and a writer), and it will do the conversion for you. The difference between UpdateRecord and ConvertRecord is that UpdateRecord requires you to specify a user-defined property, so if this is the only change you are making, just use ConvertRecord. If you have other transformations, use UpdateRecord and make those changes at the same time.
Caveat: This will rewrite the file using the new column names (in my example, ones without spaces) so keep that in mind for downstream usage.
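To sanity-check the date patterns outside NiFi, the Java sketch below parses the sample values and prints them as yyyy-MM-dd. Note that it uses M/d/yyyy h:mm:ss a rather than the answer's MM/dd/yyyy hh:mm:ss a: java.time's DateTimeFormatter expects exactly two digits for MM and hh, whereas java.text.SimpleDateFormat ignores the letter count when parsing numbers. Which behavior your CSV Reader shows depends on the date parser your NiFi version uses, so treat the exact pattern as something to verify.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class CampaignDateFormat {
    // Single-letter M, d and h accept one or two digits, so "1/29/2020 3:00:00 AM"
    // parses; Locale.US supplies the "AM"/"PM" text for the "a" field.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("M/d/yyyy h:mm:ss a", Locale.US);
    // Target format from the question.
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    public static String reformat(String value) {
        return LocalDateTime.parse(value, INPUT).format(OUTPUT);
    }

    public static void main(String[] args) {
        String[] samples = {"1/29/2020 3:00:00 AM", "2/20/2020 3:00:00 AM", "4/1/2020 3:00:00 AM"};
        for (String s : samples) {
            System.out.println(s + " -> " + reformat(s));
        }
    }
}
```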

NiFi - attributes to JSON - not able to generate the required JSON from an attribute

The flowfile content is
{
"resourceType": "Patient",
"myArray": [1, 2, 3, 4]
}
I use the EvaluateJsonPath processor to load "myArray" into an attribute named myArray.
Then I use the AttributesToJSON processor to create JSON from myArray.
But in the flowfile content, what I get is
{"myArray":"[1,2,3,4]"}
I expected the flowfile to have the following content.
{"myArray":[1,2,3,4]}
Here are the flowfile attributes
How can I get "myArray" as an array again in the content?
Use record-oriented processors like ConvertRecord instead of the EvaluateJsonPath and AttributesToJSON processors.
Set the Record Reader to JsonPathReader.
JsonPathReader Configs:
AvroSchemaRegistry:
{
"namespace": "nifi",
"name": "person",
"type": "record",
"fields": [
{ "name": "myArray", "type": {
"type": "array",
"items": "int"
}}
]
}
JsonRecordSetWriter:
Use the same AvroSchemaRegistry controller service to access the schema.
To access the Avro schema, you need to set the schema.name attribute on the flowfile.
Output flowfile content would be
[{"myArray":[1,2,3,4]}]
Please refer to this link for how to configure the ConvertRecord processor.
(or)
If your desired output is {"myArray":[1,2,3,4]} without the enclosing [] (array), then use the
ReplaceText processor instead of the AttributesToJSON processor.
ReplaceText Configs:
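The difference between the two outputs comes down to quoting: flowfile attribute values are plain strings, so emitting one as a JSON string value wraps it in quotes, while splicing the raw attribute text into a JSON template keeps [1,2,3,4] as an array. The Java sketch below illustrates only that idea; it is not how AttributesToJSON or ReplaceText are implemented, and the helper names are hypothetical.

```java
public class AttributeToJson {
    // Emits the attribute as a JSON string value. Only backslashes and
    // double quotes are escaped, which is enough for this sample.
    public static String asJsonString(String name, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"" + name + "\":\"" + escaped + "\"}";
    }

    // Splices the raw attribute text into a JSON template, so text that is
    // already valid JSON (such as an array) keeps its type. Nothing is
    // validated: a non-JSON value would produce invalid output.
    public static String asRawJson(String name, String value) {
        return "{\"" + name + "\":" + value + "}";
    }

    public static void main(String[] args) {
        String attribute = "[1,2,3,4]"; // value of the myArray attribute
        System.out.println(asJsonString("myArray", attribute)); // {"myArray":"[1,2,3,4]"}
        System.out.println(asRawJson("myArray", attribute));    // {"myArray":[1,2,3,4]}
    }
}
```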
Not all the credit goes to me, but I was pointed to a better, simpler way to achieve this. There are two ways.
Solution 1 - the simplest and most elegant
Use the NiFi JoltTransformJSON processor. The processor can use NiFi Expression Language and attributes on both the left- and right-hand sides of the specification syntax. This allows you to quickly use the Jolt default spec to add new fields (from flow-file attributes) to a new or existing JSON.
Ex:
{"customer_id": 1234567, "vckey_list": ["test value"]}
Both of those field values are stored in flow-file attributes as the result of an EvaluateJsonPath operation; assume they are "customer_id_attr" and "vckey_list_attr". We can simply generate a new JSON from those flow-file attributes with the "default" Jolt spec and the right-hand syntax. You can even add additional Expression Language functions to the processing:
[
{
"operation": "default",
"spec": {
"customer_id": ${customer_id_attr},
"vckey_list": ${vckey_list_attr:toLower()}
}
}
]
This worked for me even when storing the entire JSON, path of "$", in a flow-file attribute.
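The default operation is non-destructive: it fills in keys that are missing and leaves existing values alone. A rough Java model of that merge for a flat object is sketched below. It is an illustration only (real Jolt specs can be nested and use wildcards), it does not model how Jolt treats explicit nulls, and the default values stand in for whatever Expression Language substitutes from the attributes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DefaultMerge {
    // Copies the input, then adds each default only where the key is absent;
    // keys already present in the input keep their original values.
    public static Map<String, Object> applyDefaults(Map<String, Object> input,
                                                    Map<String, Object> defaults) {
        Map<String, Object> result = new LinkedHashMap<>(input);
        for (Map.Entry<String, Object> e : defaults.entrySet()) {
            if (!result.containsKey(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Values the attributes would supply after EL substitution.
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("customer_id", 1234567);
        defaults.put("vckey_list", List.of("test value"));

        // Starting from an empty object builds a brand-new JSON.
        System.out.println(applyDefaults(new LinkedHashMap<>(), defaults));
        // {customer_id=1234567, vckey_list=[test value]}
    }
}
```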
Solution 2 - complicated and uglier
Use a sequence of NiFi ReplaceText processors. First, use a ReplaceText processor to append the desired flow-file attribute to the file content.
replace_text_processor_1
If you are generating a totally new JSON, this would do it. If you are trying to modify an existing one, you would need to first append the desired keys, then use ReplaceText again to format them properly as new keys in the existing JSON, from
{"original_json_key": original_json_obj}{"customer_id": 1234567, "vckey_list": ["test value"]}
to
{"original_json_key": original_json_obj, "customer_id": 1234567, "vckey_list": ["test value"]}
using
replace_text_processor_2
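The ReplaceText settings for this step were in a screenshot that is not reproduced here, so the following is only a hypothetical equivalent in Java: a regular expression that replaces the first "}{" boundary (optionally separated by whitespace) with ", ". In valid JSON that pattern cannot occur between values of a single object, so the main assumption is that it does not appear earlier inside a string value.

```java
public class MergeAppendedJson {
    // Joins two concatenated JSON objects "{...}{...}" into one object by
    // replacing the first "}" + optional whitespace + "{" with ", ".
    // Naive: a "}{" inside an earlier string value would be matched instead.
    public static String merge(String content) {
        return content.replaceFirst("\\}\\s*\\{", ", ");
    }

    public static void main(String[] args) {
        String appended = "{\"original_json_key\": original_json_obj}"
                + "{\"customer_id\": 1234567, \"vckey_list\": [\"test value\"]}";
        System.out.println(merge(appended));
        // {"original_json_key": original_json_obj, "customer_id": 1234567, "vckey_list": ["test value"]}
    }
}
```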
Then use Jolt for any further processing (which is why Solution 1 makes more sense).
Hope this helps. I spent about half a day figuring out Solution 2 and was pointed to Solution 1 by someone with more experience in NiFi.

Aggregating nested fields of varying datatypes in Elasticsearch

I have an index based on Products, and one of the fields declared in the mapping is Attributes. This field is a nested type because it will contain two values: key and value. The problem I have is that, depending on the context of the attribute, the datatype of value can vary between an integer and a string.
For example:
{"attributes":[{"key":"StrEx","value":"Red"},{"key":"IntEx","value":2}]}
It seems the datatype for every instance of 'value' within all future nested documents within Attributes is decided based on the first data entered. I need to be able to store it as an integer/long datatype so I can perform range queries.
Any help or alternative ideas would be greatly appreciated.
You need a mapping like this one, for the value field:
"value": {
"type": "string",
"fields": {
"as_number": {
"type": "integer",
"ignore_malformed": true
}
}
}
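Basically, your field is a string, but using fields you can also attempt to index it as a numeric field; with ignore_malformed set to true, values that cannot be parsed as numbers are skipped for that sub-field instead of causing the document to be rejected.
When you want to use range queries, use value.as_number; for anything else, use value.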
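To make the multi-field idea concrete, the Java sketch below simulates it in memory: each value stays a string, and a parallel as_number view holds an integer only when the value parses, which is roughly what ignore_malformed gives you. A range check then runs only against that numeric view. This is a hypothetical model, not Elasticsearch code; in particular, Elasticsearch's default coercion accepts inputs such as "2.5" for integer fields (truncating the fraction), which Integer.parseInt rejects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

public class MultiFieldSketch {
    // Models the as_number sub-field: the integer if the string parses,
    // otherwise nothing (like ignore_malformed skipping the value rather
    // than rejecting the whole document).
    public static OptionalInt asNumber(String value) {
        try {
            return OptionalInt.of(Integer.parseInt(value));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // A range filter on the numeric view: values without a number never match.
    public static List<String> rangeOnAsNumber(List<String> values, int gte, int lte) {
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            OptionalInt n = asNumber(v);
            if (n.isPresent() && n.getAsInt() >= gte && n.getAsInt() <= lte) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Values as they would arrive in "value" (numbers shown as strings).
        List<String> values = List.of("Red", "2", "10");
        System.out.println(rangeOnAsNumber(values, 1, 5)); // [2]
    }
}
```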

Logstash inserting dates as strings instead of dateOptionalTime

I have an Elasticsearch index with the following mapping:
"pickup_datetime": {
"type": "date",
"format": "dateOptionalTime"
}
Here is an example of a date contained in the file that is being read in
"pickup_datetime": "2013-01-07 06:08:51"
I am using Logstash to read and insert data into ES with the following lines to attempt to convert the date string into the date type.
date {
match => [ "pickup_datetime", "yyyy-MM-dd HH:mm:ss" ]
target => "pickup_datetime"
}
But the match never seems to occur.
What am I doing wrong?
It turns out the date filter came before the csv filter, which is where the columns get named, so the date filter could not find the pickup_datetime column because it had not been named yet.
It might be a good idea for the documentation to state clearly that filters are applied sequentially, to help others avoid similar problems in the future.
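The ordering problem can be reproduced with a small Java model of a filter pipeline in which each stage sees only what earlier stages produced. The csv and date methods below are hypothetical stand-ins for the Logstash filters, and the column names and sample line are made up for the illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FilterOrder {
    // Hypothetical column names for the csv stand-in.
    private static final String[] COLUMNS = {"vendor_id", "pickup_datetime"};
    private static final DateTimeFormatter IN = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Stand-in for the csv filter: splits "message" and names the columns.
    public static Map<String, String> csv(Map<String, String> event) {
        Map<String, String> out = new LinkedHashMap<>(event);
        String[] parts = event.get("message").split(",");
        for (int i = 0; i < parts.length && i < COLUMNS.length; i++) {
            out.put(COLUMNS[i], parts[i]);
        }
        return out;
    }

    // Stand-in for the date filter: rewrites pickup_datetime only if it exists.
    public static Map<String, String> date(Map<String, String> event) {
        String raw = event.get("pickup_datetime");
        if (raw == null) {
            return event; // column not named yet, so there is nothing to match
        }
        Map<String, String> out = new LinkedHashMap<>(event);
        out.put("pickup_datetime", LocalDateTime.parse(raw, IN).toString());
        return out;
    }

    // Applies filters strictly in the given order.
    public static Map<String, String> run(Map<String, String> event,
                                          List<UnaryOperator<Map<String, String>>> filters) {
        Map<String, String> current = event;
        for (UnaryOperator<Map<String, String>> f : filters) {
            current = f.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> event = Map.of("message", "CMT,2013-01-07 06:08:51");
        List<UnaryOperator<Map<String, String>>> dateFirst = List.of(FilterOrder::date, FilterOrder::csv);
        List<UnaryOperator<Map<String, String>>> csvFirst = List.of(FilterOrder::csv, FilterOrder::date);
        System.out.println(run(event, dateFirst).get("pickup_datetime")); // 2013-01-07 06:08:51
        System.out.println(run(event, csvFirst).get("pickup_datetime"));  // 2013-01-07T06:08:51
    }
}
```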
