Is there a way to loop in NiFi expression language to add padding to the data? - apache-nifi

There is a date field in the record, in the format "YYYY-MM-DD HH:MM:SS.sss" (the date value is handled as a string). In some records the trailing zeros of the milliseconds are dropped at the source, for example:
2018-05-15 15:30:20.123
2018-05-15 15:30:20.12
2018-05-15 15:30:20.3
Is there a way in NiFi to pad the missing zeros in examples 2 and 3, like below?
2018-05-15 15:30:20.120
2018-05-15 15:30:20.300
Is there a way to loop in NiFi expression language?
PS: Right now I am using three different processors to do this: I keep the date as an attribute, check its length as a condition, and decide whether to append a '0'. I also tried an ExecuteScript processor. But is there a better solution to this?

Assume you have an attribute date = 2018-05-15 15:30:20.3.
You can use UpdateAttribute with an expression like this:
${date:append('000'):replaceAll('(\\.\\d{3})(.*)$','$1')}
Append extra zeros, then remove the surplus ones with the regex replace.
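The same append-then-trim logic can be sketched in plain Python (illustrative only; the helper name is made up, and it assumes the value always contains a '.' followed by at least one digit, as in the question):

```python
import re

def pad_millis(date_str):
    """Pad the milliseconds of 'YYYY-MM-DD HH:MM:SS.s' to three digits,
    mirroring the append('000') + replaceAll expression above."""
    padded = date_str + "000"                        # append extra zeros
    return re.sub(r"(\.\d{3})(.*)$", r"\1", padded)  # keep only 3 ms digits

print(pad_millis("2018-05-15 15:30:20.3"))    # 2018-05-15 15:30:20.300
print(pad_millis("2018-05-15 15:30:20.123"))  # 2018-05-15 15:30:20.123
```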

Related

Convert azure.timestamp to NiFi date data type in NiFi expression language

I am using the NiFi ListAzureBlobStorage processor to get the available blob objects. The processor creates a flowfile for each object, with attributes containing the object metadata. I want to filter on the azure.timestamp attribute, but I do not know what the numeric value represents or how it relates to NiFi's expression language date data type. I want to compare it with a known date, so I need to convert it to a NiFi date-time value first. How do I do this?
Thanks
According to the code, it is already in "NiFi format", which means a Unix timestamp.
Since it represents the number of milliseconds passed since 1/1/1970, you can compare this and the other timestamp using regular number comparison operators.
Example: ${azure.timestamp:ge(${now()})} will return true if azure.timestamp is later than (or equal to) the current timestamp (now).
If you'd like to compare it to another attribute you can do this:
${azure.timestamp:ge(${attribute.name})}.
If you'd like to convert a different date into a Unix timestamp, you can use toDate and then toNumber; to go the other way around, just use format.
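A minimal Python sketch of the same numeric comparison, assuming (as the answer states) that azure.timestamp is milliseconds since 1/1/1970; the sample value is hypothetical:

```python
from datetime import datetime, timezone

azure_timestamp = 1526396420123  # hypothetical epoch-ms value (mid-May 2018)

# Compare against "now", the way :ge(${now()}) does in the EL example.
now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
is_later_or_equal = azure_timestamp >= now_ms

# Converting a known date to epoch ms (the toDate/toNumber direction):
known = datetime(2018, 5, 15, tzinfo=timezone.utc)
known_ms = int(known.timestamp() * 1000)
print(azure_timestamp >= known_ms)  # True: the sample value is later that day
```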

SORT in JCL based on Current Date

Requirement: I need to sort an input file based on Date.
The date is in YYYYMMDD format starting at 56th Position in the flat file.
Now I am trying to write a sort card that writes all the records whose date (YYYYMMDD) falls within the past 7 days.
Example: if my job runs on 20181007, it should fetch all the records with dates between 20181001 and 20181007.
Thanks in advance.
In terms of DFSORT, you can use the following filter to compare against the current date as a relative value. For instance:
OUTFIL INCLUDE=(56,8,CH,GE,DATE1-7)
There are several definitions for dates in various formats. I assume that since you are referring to a flat file, the date is in character format and not zoned decimal or another representation.
For DFSORT, here is a reference to the INCLUDE statement.
Similar constructs exist for other sort products. Without specifics about the product you're using, this is unfortunately a generic answer.
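Outside of a sort product, the selection that INCLUDE=(56,8,CH,GE,DATE1-7) performs can be sketched in Python. The 1-based position 56 and the 8-byte YYYYMMDD layout come from the question; the helper name is illustrative:

```python
from datetime import date, timedelta

def keep_record(record, today=None):
    """Keep records whose 8-byte YYYYMMDD field at 1-based position 56
    is on or after (today - 7 days), using a character compare (CH,GE)."""
    today = today or date.today()
    field = record[55:63]                               # bytes 56-63, 1-based
    cutoff = (today - timedelta(days=7)).strftime("%Y%m%d")
    return field >= cutoff

rec = " " * 55 + "20181005" + " trailing data"
print(keep_record(rec, today=date(2018, 10, 7)))  # True: within the window
```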

NiFi decimal number format

Given the CSV input file below:
name,amount
Abc,"1,234.56"
Def,"2,222,222.222222"
The amount field contains decimal numbers with comma separators. How do I parse it into a number in NiFi? I don't want to keep it as a string.
I thought of using the UpdateRecord processor, Expression Language, and Java's NumberFormat to parse it, but it seems that NumberFormat is inaccessible from Expression Language. Alternatively, I want to use ScriptedRecordSetWriter to parse, but couldn't find any working example out there.
Appreciate any help especially with a working example.
When reading the incoming data we still need to use the String type (as the data is enclosed in "); when writing the data out from the UpdateRecord processor we can use int/decimal types for the output flowfile records.
1. Using Record Path Value:
You can read the incoming data as a String datatype, define the field as an integer type in the output schema, and use the UpdateRecord processor to replace ',' with ''.
Add new property in UpdateRecord processor as
/amount
substringBefore(replace(/amount,',',''),'.')
Now the output flowfile will have integer datatype for the amount field.
2. Using Literal Value:
If we are using a literal value, we can use NiFi expression language functions on field.value; with the replace and toNumber functions we can get an int value for the amount field.
Either way, we get the output flowfile in JSON format as
[{"name":"Abc","amount":1234},{"name":"Def","amount":2222222}]
In the same way, if you want the output flowfile to carry a decimal type, define the Avro schema with a decimal type and omit the substringBefore and toNumber functions.
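The replace/substringBefore/toNumber idea can be sketched in plain Python (illustrative only, not NiFi itself; the helper names are made up):

```python
from decimal import Decimal

def to_int(amount):
    """Strip grouping commas, drop the fraction (substringBefore '.'),
    and convert to an integer, as in the record-path expression above."""
    return int(amount.replace(",", "").split(".")[0])

def to_decimal(amount):
    """Strip grouping commas and keep the full decimal value."""
    return Decimal(amount.replace(",", ""))

print(to_int("1,234.56"))              # 1234
print(to_decimal("2,222,222.222222"))  # 2222222.222222
```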

select rows with hour and not half hour

How do I extract only the rows on the hour, and not the half hour, in LibreOffice?
2014/06/15 19:30:00
2014/06/15 20:00:00
2014/06/15 20:30:00
=>
2014/06/15 20:00:00
It turned out that I could:
extract the time part using =RIGHT(D1;8)
do =IF(MINUTE(D1)<>0;"";1)
sort by the 1's and remove the empty cells.
But there must be a better and neater way.
Nicolai
As moggi pointed out, it depends on the way the time data is presented:
If your values are valid date/time values for LO Calc, you can just apply the MINUTE() function to the complete value to extract the minutes, and compare the result with 0. "Valid date/time" means that, e.g., the date/time value 2014/06/15 19:30:00 is internally represented as the double value 41805.8125. Displaying this as a date/time is a matter of formatting. You can check this by manually entering the double value into an unformatted cell and changing the formatting to a date value.
If your values are text values, you may use a regex to match full-hour time values with something like .*[[:digit:]]{2}:00:00$. You can use this regex in Menu Data -> Filters -> Standard Filter or Advanced Filter; don't forget to enable Regular Expressions under the filter's Options.
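Both approaches can be sketched in Python (illustrative only; in Calc you would use MINUTE() or the filter regex directly):

```python
from datetime import datetime
import re

rows = ["2014/06/15 19:30:00", "2014/06/15 20:00:00", "2014/06/15 20:30:00"]

# Approach 1: parse the value and test the minutes, like MINUTE(D1) = 0.
full_hours = [r for r in rows
              if datetime.strptime(r, "%Y/%m/%d %H:%M:%S").minute == 0]

# Approach 2: treat the values as text and match a full-hour regex,
# like the standard-filter regex above.
full_hours_re = [r for r in rows if re.search(r"\d{2}:00:00$", r)]

print(full_hours)  # ['2014/06/15 20:00:00']
```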

Time value as output

For a few columns from the source (a .csv file), we have values like 1:52:00 and 14:45:00.
I am supposed to load them into an Oracle table.
Which data type should I choose in the target as well as the source?
Should I be doing anything in the Expression transformation?
Use SQL*Loader (SQLLDR) to load the data into the database with the format described in the link, i.e. 'HH24:MI:SS':
http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm
Oracle does not support time-only values, it supports dates (with a time component).
You have a few options:
1. Store the value as a string, perhaps providing a leading zero for the hour.
2. Store the value as the number of seconds (or minutes) past midnight.
3. Store the value as the time component of some arbitrarily defined date, for example 0001-JAN-01 01:52:00 and 0001-JAN-01 14:45:00, and tell your report writers to ignore the date portion of the value.
Your source datatype will be string(8). Use LPAD to add leading zeroes.
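The seconds-past-midnight option (2 above) can be sketched in Python; the helper names are illustrative:

```python
def time_to_seconds(hhmmss):
    """Convert 'H:MM:SS' or 'HH:MM:SS' to seconds past midnight."""
    h, m, s = (int(p) for p in hhmmss.split(":"))
    return h * 3600 + m * 60 + s

def seconds_to_time(total):
    """Convert seconds past midnight back to 'HH:MM:SS',
    with LPAD-style leading zeros on the hour."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

print(time_to_seconds("1:52:00"))  # 6720
print(seconds_to_time(6720))       # 01:52:00
```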
