Convert timestamp to UTC in StreamSets

I am ingesting logs from different time zones into Hadoop through StreamSets. I want to convert the various timestamps to a single UTC timestamp. How can I do that in StreamSets?
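The conversion itself is ordinary zone-aware parsing. Here is a minimal Java sketch, assuming you know each log's source time zone; in StreamSets the equivalent logic would live in an Expression Evaluator or a scripting (Groovy/Jython) processor, and every name below is illustrative rather than StreamSets API:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class ToUtc {
  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  // Interpret a zone-less timestamp in its source zone, then normalize to UTC.
  public static String toUtc(String localTimestamp, String sourceZone) {
    ZonedDateTime zoned =
        LocalDateTime.parse(localTimestamp, FMT).atZone(ZoneId.of(sourceZone));
    return zoned.withZoneSameInstant(ZoneId.of("UTC")).format(FMT);
  }

  public static void main(String[] args) {
    // 12:00 in New York (EST, -05:00) becomes 17:00 UTC.
    System.out.println(toUtc("2024-03-05 12:00:00", "America/New_York"));
  }
}
```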

Related

How to change timezone in NiFi?

We are connecting to an Oracle database. When querying a date via the ExecuteSQL processor, the date column's timezone gets changed to UTC. How can we avoid this?
I added oracle.jdbc.timezoneAsRegion=false to the controller service properties, but it still did not work.
We don't want to edit bootstrap.conf to change the timezone. Is there a way to change the timezone in NiFi?
Furthermore, if my date column value in the DB is 2021-06-20 01:00:00, the value read in NiFi changes to 2021-06-19 19:30:00, a difference of 5:30 hours.
If I run the same query through Java code, I get the exact date value as in the DB.
The timezone on the system where NiFi is running is Asia/Kolkata (IST, +0530).
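Note that the 5:30 difference is exactly the Asia/Kolkata offset: 2021-06-20 01:00:00 IST and 2021-06-19 19:30:00 UTC are the same instant, so something in the path is re-rendering the stored wall-clock value in UTC. For comparison, plain JDBC lets you make that interpretation explicit by passing a Calendar when reading the column; this is a hedged sketch of the idea, not NiFi configuration, and the method name is illustrative:

```java
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Calendar;
import java.util.TimeZone;

public class ReadDateExplicitly {
  public static Timestamp readAsUtc(ResultSet rs, String column) throws Exception {
    // Tell the driver which zone to interpret the DATE value in,
    // instead of defaulting to the JVM zone (Asia/Kolkata here).
    Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    return rs.getTimestamp(column, utc);
  }
}
```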

Issue with timestamp field while using Sqoop

An extra space is added before the milliseconds while the timestamp field is being ingested. For example, 05-OCT-17 03.39.02.689000000 AM is ingested as 2017-10-5 3:39:2. 689000000. Oracle is the source, and the data is stored in HDFS in Parquet format.
Any suggestions on how this can be avoided?
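If the extra space can't be avoided at ingest time, one workaround is to normalize the string before (or after) it lands in Parquet. A minimal Java sketch, assuming the only whitespace after a dot is the stray one before the fractional seconds; the class and method names are illustrative:

```java
public class FixTimestampSpace {
  // Remove whitespace between the seconds and the fractional part.
  public static String stripSpaceBeforeMillis(String raw) {
    // "2017-10-5 3:39:2. 689000000" -> "2017-10-5 3:39:2.689000000"
    return raw.replaceAll("\\.\\s+", ".");
  }

  public static void main(String[] args) {
    System.out.println(stripSpaceBeforeMillis("2017-10-5 3:39:2. 689000000"));
  }
}
```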

logstash-elasticsearch: sort data by timestamp

I centralize logfiles into one logfile using logstash, and each event keeps its original timestamp.
Now my last challenge is to get this data sorted by timestamp (in real time, if possible).
My timestamp format is: yyyy-MM-dd HH:mm:ss
I can make any change to the format or file format to make this work, as long as everything stays on our servers.
What's the best way to sort my data?
Any ideas?
Thanks in advance!
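Whatever tool does the sorting, the core operation is making the timestamp comparable. A minimal Java sketch of sorting events in that format (in Elasticsearch you would instead map the field as a date and sort at query time; the sample events here are invented):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByTimestamp {
  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  public static void main(String[] args) {
    List<String> events = Arrays.asList(
        "2014-07-02 10:15:00 event-b",
        "2014-07-01 09:00:00 event-a");
    // Each line starts with a 19-character timestamp; parse it and compare.
    events.sort(Comparator.comparing(
        (String line) -> LocalDateTime.parse(line.substring(0, 19), FMT)));
    events.forEach(System.out::println);
  }
}
```

A convenient property of yyyy-MM-dd HH:mm:ss is that it is zero-padded, so lexicographic order on the raw string already matches chronological order; parsing it as a date, as above, is simply the more robust choice.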

Hive HBase integration timestamp

I would like to store a table in HBase using Hive (Hive HBase integration).
My table contains a field of type TIMESTAMP (like DATE).
I've done some research and discovered that TIMESTAMP is not supported by HBase, so what should I do?
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating dat
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
... 9 more
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:185)
at org.apache.hadoop.hive.serde2.lazy.LazyTimestamp.init(LazyTimestamp.java:74)
at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:219)
at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:192)
at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:188)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
The easiest thing to do would be to convert the TIMESTAMP into a STRING, INT, or FLOAT. This has the unfortunate side effect of giving up Hive's built-in TIMESTAMP support, which means you lose:
- read-time checks to make sure your column contains a valid TIMESTAMP
- the ability to transparently use TIMESTAMPs of different formats
- the use of Hive UDFs which operate on TIMESTAMPs
The first two losses are mitigated if you choose a single format for your timestamps and stick to it. The last is not a huge loss, because only two Hive date functions actually operate on TIMESTAMPs; most of them operate on STRINGs. If you absolutely need from_utc_timestamp and to_utc_timestamp, you can write your own UDF, as sketched below.
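For illustration, here is what such a UDF might look like using Hive's classic UDF API (org.apache.hadoop.hive.ql.exec.UDF with a reflectively invoked evaluate method). The class name, the STRING-in/STRING-out contract, and the fixed pattern are assumptions for the sketch, not existing Hive code:

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;
import org.apache.hadoop.hive.ql.exec.UDF;

public class FromUtcString extends UDF {
  // Convert a UTC "yyyy-MM-dd HH:mm:ss" string into the same instant
  // rendered in the given target zone.
  public String evaluate(String utcTimestamp, String targetZone) throws Exception {
    if (utcTimestamp == null || targetZone == null) return null;
    SimpleDateFormat parse = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    parse.setTimeZone(TimeZone.getTimeZone("UTC"));
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    format.setTimeZone(TimeZone.getTimeZone(targetZone));
    return format.format(parse.parse(utcTimestamp));
  }
}
```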
If you go with STRING and only need the date, I would go with the yyyy-mm-dd format. If you need the time as well, go with yyyy-mm-dd hh:mm:ss, or yyyy-mm-dd hh:mm:ss[.fffffffff] if you need fractional-second timestamps. This format is also consistent with how Hive expects TIMESTAMPs and is the form required by most Hive date functions.
If you go with INT, you again have a couple of options. If only the date is important, YYYYMMDD fits the "basic" format of ISO 8601 (this is a form I've personally used and found convenient when I didn't need to perform any date operations on the column). If the time is also important, go with YYYYMMDDhhmmss; this is an acceptable variant of the basic ISO 8601 form for date and time. If you need fractional-second timing, then use a FLOAT and the form YYYYMMDDhhmmss.fffffffff. Note that neither of these forms is consistent with how Hive expects integer or floating-point TIMESTAMPs.
If the concept of calendar dates and time of day isn't important at all, then using an INT as a Unix timestamp is probably the easiest, or a FLOAT if you also need fractional seconds. This form is consistent with how Hive expects TIMESTAMPs.
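To make these encodings concrete, here is a minimal java.time sketch producing the STRING, integer, and Unix-timestamp forms from a single value; all class and variable names are illustrative:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampEncodings {
  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.of(2021, 6, 20, 1, 0, 0);

    // STRING in the form Hive expects for TIMESTAMPs: yyyy-MM-dd HH:mm:ss
    String asString = ts.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));

    // ISO 8601 "basic" integer forms: YYYYMMDD and YYYYMMDDhhmmss
    long asDateInt = Long.parseLong(ts.format(DateTimeFormatter.ofPattern("yyyyMMdd")));
    long asDateTimeInt = Long.parseLong(ts.format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss")));

    // Unix timestamp (seconds since epoch, assuming the value is UTC)
    long asEpochSeconds = ts.toEpochSecond(ZoneOffset.UTC);

    System.out.printf("%s %d %d %d%n", asString, asDateInt, asDateTimeInt, asEpochSeconds);
  }
}
```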

Hadoop Pig ISO Date to Unix Timestamp

I have a list of items in Pig consisting of ISO 8601 (YYYY-MM-DD) formatted date strings:
(2011-12-01)
(2011-12-01)
(2011-12-02)
Is there any way to transform these items into UNIX timestamps apart from implementing my own functions in Java?
You need a UDF to do that. The good news is that it has already been done.
Pig also comes with "piggybank" UDFs contributed by the community, including date conversion.
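For reference, the conversion such a UDF performs is small. A minimal java.time sketch of the logic (the class name is illustrative):

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class IsoDateToUnix {
  // "2011-12-01" -> seconds since epoch at midnight UTC
  public static long toUnixTimestamp(String isoDate) {
    return LocalDate.parse(isoDate).atStartOfDay(ZoneOffset.UTC).toEpochSecond();
  }

  public static void main(String[] args) {
    System.out.println(toUnixTimestamp("2011-12-01")); // 1322697600
  }
}
```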
