We are connecting to an Oracle database. When we query a date with the ExecuteSQL processor, the date column's time zone is changed to UTC. How can we avoid this?
I even added oracle.jdbc.timezoneAsRegion=false to the controller service properties, but it still did not work.
We don't want to edit bootstrap.conf to change the time zone. Is there a way to change the time zone in NiFi?
Further, if my date column value in the DB is 2021-06-20 01:00:00, the value read in NiFi changes to 2021-06-19 19:30:00, a difference of 5:30 hours.
If I run the same query through Java code, I get the exact date value as in the DB.
The time zone of the system where NiFi is running is Asia/Kolkata (IST, +0530).
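One common workaround (a sketch, not a verified fix for this exact setup) is to sidestep client-side time zone conversion entirely by formatting the date as a string inside the query, so ExecuteSQL passes the text through untouched; the table and column names below are hypothetical:

    -- the value arrives exactly as stored, since no java.sql.Timestamp
    -- (and hence no JVM time zone) is involved
    SELECT TO_CHAR(created_at, 'YYYY-MM-DD HH24:MI:SS') AS created_at
    FROM orders;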
Related
My use case is simple, but I have not found the right solution so far.
I run a query that tags the data with the current timestamp in one of the columns at the moment the ExecuteSQLRecord processor hits the database. I want the resulting flowfile to carry that same timestamp in its name, but I don't know how to capture it as an attribute, something like ${now():format("yyyyMMddHHmmss")}, so I can use it later to rename the flowfile.
Basically, I want to store the timestamp from the moment I hit the database. I cannot use an UpdateAttribute processor just before ExecuteSQL to capture it, because if a prior execution is still in progress in ExecuteSQL, flowfiles will pass through UpdateAttribute, get stamped, and then sit in the queue until ExecuteSQL finishes its current thread, so the stamp no longer matches the query time.
Note: I am running NiFi in standalone mode, so I cannot run ExecuteSQL in multiple threads.
Any help is highly appreciated. Thanks in advance.
ExecuteSQLRecord writes an attribute called executesql.query.duration, which contains the duration of the query plus fetch, in milliseconds.
So we can put an UpdateAttribute processor AFTER the ExecuteSQLRecord that uses ${now():toNumber():minus(${executesql.query.duration})} to take the current time as epoch milliseconds and subtract the total query duration, giving the time at which the query started.
You can then use :format('yyyyMMddHHmmss') to bring it back to the timestamp format you want.
It might be a few milliseconds off from the exact time (the time taken to reach the UpdateAttribute processor).
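Putting the pieces together, a single UpdateAttribute property can do the rename in one step, for example setting filename to ${now():toNumber():minus(${executesql.query.duration}):format('yyyyMMddHHmmss')} (filename is the core attribute NiFi uses for the flowfile's name).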
See docs for ExecuteSQLRecord
I have a NiFi flow that takes a log from MQ, converts it into JSON, and puts the data in a database. (A picture of the workflow is attached.)
Although I use Jolt to reshape it, no matter how I parse the date-time component, NiFi raises an error saying the time component cannot be converted to a timestamp.
How can I overcome this problem?
Thank you very much!
An extra space is added before the milliseconds while the timestamp field is being ingested: for example, 05-OCT-17 03.39.02.689000000 AM is ingested as 2017-10-5 3:39:2. 689000000. The source is Oracle, and the data is stored in HDFS in Parquet format.
Any suggestions on how this can be avoided?
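One way to sidestep this, offered as an assumption rather than a verified fix, is to format the timestamp on the Oracle side so the ingested field is already a clean string; the column and table names below are hypothetical:

    -- FF9 keeps all nine fractional-second digits attached to the seconds,
    -- leaving no type-conversion step in which a stray space can appear
    SELECT TO_CHAR(event_ts, 'YYYY-MM-DD HH24:MI:SS.FF9') AS event_ts
    FROM source_table;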
I am using the QueryDatabaseTable processor in NiFi to incrementally pull data from a DB2 database. QueryDatabaseTable is scheduled to run every 5 minutes, and Maximum-value Columns is set to "rep" (which corresponds to a date in the DB2 database).
I have a separate MySQL database that I want to update with the "rep" value QueryDatabaseTable uses to query the DB2 database. How can I get this value?
In the log files I found that the attributes of the FlowFiles do not contain this value.
QueryDatabaseTable doesn't currently accept incoming flow files or allow the use of Expression Language to define the table name; I've written up an improvement Jira to handle this:
https://issues.apache.org/jira/browse/NIFI-2340
I would like to store a table in HBase using Hive (Hive-HBase integration).
My table contains a field of type TIMESTAMP (like DATE).
I've done some research and discovered that TIMESTAMP is not supported by HBase, so what should I do?
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating dat
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
... 9 more
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:185)
at org.apache.hadoop.hive.serde2.lazy.LazyTimestamp.init(LazyTimestamp.java:74)
at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:219)
at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:192)
at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:188)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
The easiest thing to do would be to convert the TIMESTAMP into a STRING, INT, or FLOAT. This has the unfortunate side effect of giving up Hive's built-in TIMESTAMP support, so you will lose:
Read-time checks to make sure your column contains a valid TIMESTAMP
The ability to transparently use TIMESTAMPs of different formats
The use of Hive UDFs that operate on TIMESTAMPs.
The first two losses are mitigated if you choose a single format for your timestamps and stick to it. The last is not a huge loss, because only two Hive date functions actually operate on TIMESTAMPs; most of them operate on STRINGs. If you absolutely need from_utc_timestamp and to_utc_timestamp, you can write your own UDFs.
If you go with STRING and only need the date, use the yyyy-mm-dd format. If you need the time as well, go with yyyy-mm-dd hh:mm:ss, or yyyy-mm-dd hh:mm:ss[.fffffffff] if you need fractional-second timestamps. This format is also consistent with how Hive expects TIMESTAMPs and is the form required by most Hive date functions.
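As a concrete sketch of the STRING route (the table, column, and HBase mapping names below are made up for illustration):

    CREATE TABLE events_hbase (key STRING, event_time STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:event_time');

    -- CAST(... AS STRING) renders the timestamp as yyyy-mm-dd hh:mm:ss[.fffffffff]
    INSERT OVERWRITE TABLE events_hbase
    SELECT id, CAST(event_time AS STRING) FROM events_source;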
If you go with INT, you again have a couple of options. If only the date is important, YYYYMMDD fits the "basic" format of ISO 8601 (a form I've personally used and found convenient when I didn't need to perform any date operations on the column). If the time is also important, go with YYYYMMDDhhmmss, which is an acceptable variant of the basic ISO 8601 form for date and time. If you need fractional-second timing, then use a FLOAT with the form YYYYMMDDhhmmss.fffffffff. Note that neither of these forms is consistent with how Hive expects integer or floating-point TIMESTAMPs.
If the concept of calendar dates and time of day isn't important at all, then using an INT as a Unix timestamp is probably the easiest, or a FLOAT if you also need fractional seconds. This form is consistent with how Hive expects TIMESTAMPs.
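And a sketch of the Unix-timestamp route, using the same hypothetical names:

    -- unix_timestamp(...) returns seconds since the epoch as a BIGINT;
    -- Hive interprets such integers as TIMESTAMPs, matching the note above
    SELECT id, unix_timestamp(event_time) AS event_time_unix
    FROM events_source;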