Funky timestamp format - hadoop

We are getting data from a vendor and loading it into Hive. I am unable to cast a date-time field as a timestamp (it's all stored as strings). After bashing my head against it for a while, I've finally noticed that there is a hyphen between the day portion of the date and the hour portion of the time:
yyyy-mm-dd-hh.mm.ss.SSSSSS
2016-05-18-21.05.21.177152
I've been trying to work out a way to handle this with from_unixtime, but no luck so far. I'm pretty sure that's not a valid pattern for a SimpleDateFormat.
Is there any way to handle this that doesn't involve splitting it apart into two strings, and concatenating them back into a valid pattern?

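Split the string using substr, replace the '.' separators in the time portion with ':', concatenate the pieces with a space between date and time, and cast the result to a timestamp. (from_unixtime expects a number of seconds since the epoch, not a string, so it is not the right function here.)
cast(concat(substr(sdate,1,10), ' ', regexp_replace(substr(sdate,12,8),'[.]',':'), substr(sdate,20)) as timestamp)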
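To sanity-check the split-and-rejoin idea outside Hive, here is a minimal Python sketch. It is not Hive code: the helper name normalize_vendor_timestamp is made up for illustration, and it assumes every value has exactly the fixed-width yyyy-mm-dd-hh.mm.ss.SSSSSS layout shown in the question.

```python
from datetime import datetime


def normalize_vendor_timestamp(raw: str) -> str:
    """Rewrite 'yyyy-mm-dd-hh.mm.ss.SSSSSS' as 'yyyy-mm-dd hh:mm:ss.SSSSSS'.

    Hypothetical helper: assumes the fixed-width layout from the question.
    """
    date_part = raw[:10]                      # '2016-05-18'
    time_part = raw[11:19].replace(".", ":")  # '21.05.21' -> '21:05:21'
    fraction = raw[19:]                       # '.177152', leading dot kept
    return f"{date_part} {time_part}{fraction}"


fixed = normalize_vendor_timestamp("2016-05-18-21.05.21.177152")
# Parsing the result confirms it is now a conventional timestamp string.
parsed = datetime.strptime(fixed, "%Y-%m-%d %H:%M:%S.%f")
print(fixed)
```

The slicing skips the character at index 10 (the stray hyphen) and only touches the two dots inside the time, so the dot before the fractional seconds survives.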

Oracle: add large string to CLOB column

I'd like to add a large string (above 76k characters) to a CLOB column in an Oracle database.
I need to run the script from the Liquibase framework.
How can I achieve this?
Simple insert
INSERT INTO table_clob (clob_column) VALUES (to_Clob('string above 72000 chars...'));
with or without the to_clob() function, raises an exception like:
ORA-01704: string literal too long
I cannot load the data from a file with the procedure described here: https://oracle-base.com/articles/8i/import-clob
as I don't have privileges on any directory.
I searched Google but didn't find any solution for my requirement.
Any advice?
UPDATE:
After hours of searching I finally found a workaround here: https://stackoverflow.com/a/49817056/1622703
It is not ideal, as I need to cut the text into 3 chunks manually (of around 30k chars each), but it works.
Now I just need to figure out how to do it dynamically, in case the string has varying lengths (above 10k chars, for example).
A hard-coded string enclosed in single quotes is known as a string literal. An example is 'Hello world'. Another example is the very long string you are trying to insert in the table. By contrast, 'abc' || 'def' is a string expression but it is not a string literal. Similarly, to_char(sysdate, 'yyyy-mm-dd') is a string expression, but not a literal. "Literal" means constant, hard-coded text.
The issue you are facing has nothing to do with insert, or to_clob(), or the data type of columns in your table, etc. It only has to do with the string literal itself.
In Oracle, a string literal can be at most 4000 bytes long (or 32767 bytes if the database is set up with extended MAX_STRING_SIZE). PERIOD! There is no way around it.
So, the question is, how can you ever get a string as long as the one you have into a table with a CLOB column. The answer depends on how you are receiving the string in the first place. The best option would be if it came in chunked already - as a collection of strings, with a tag (an id) to keep track of which fragment belongs to which CLOB and an ordinal number (to show if it's the first chunk, the second, etc.) Then you could re-assemble them using TO_CLOB() on the first chunk, plus the concatenation operator.
If your process is to type 72000 characters at the keyboard, you will have to type 4000 of them at a time, enclose in single quotes, and use the concatenation operator (essentially doing by hand what I described above). You would also have to use TO_CLOB() on the first fragment (otherwise the concatenation will fail).
Another option is for the string to come as a value, from some application that supports long strings (something compatible with Oracle's CLOB) and that can hand over such values to the Oracle database without the need to write out the hard-coded string in full.
So, the ball is in your court. The first question is, Where is the long string coming from in the first place?
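If the string is available to a script before it reaches the database, the manual chunking described above can be automated. The following is a minimal Python sketch, not an Oracle or Liquibase feature: the function name build_clob_insert and the default chunk size of 1000 characters are assumptions chosen for illustration. The chunk size is kept well below 4000 so that multi-byte characters and doubled quotes do not push a literal over the byte limit.

```python
def build_clob_insert(table: str, column: str, text: str, chunk_size: int = 1000) -> str:
    """Build an INSERT whose value is TO_CLOB('...') || '...' || '...'.

    Hypothetical helper. table and column are pasted in as-is, so they must
    come from trusted input. chunk_size counts characters, not bytes.
    """
    # Cut the raw text first, then escape each piece, so a doubled quote
    # is never split across two literals.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)] or [""]
    literals = ["'" + chunk.replace("'", "''") + "'" for chunk in chunks]
    # Only the first fragment needs TO_CLOB(); it makes the whole
    # concatenation a CLOB expression.
    literals[0] = "TO_CLOB(" + literals[0] + ")"
    return f"INSERT INTO {table} ({column}) VALUES (" + " || ".join(literals) + ")"


# Table and column names are taken from the question; the payload is made up.
statement = build_clob_insert("table_clob", "clob_column", "x" * 72000)
print(statement[:80])
```

Because the split happens on the raw text and quotes are escaped afterwards, the string may be of any length; the number of fragments simply follows from the chunk size.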

Oracle : Want to convert Substring to a useable, sortable date

1st Post go easy on me.
I'm using this Substring to pull part of a field. The date is, I assume, probably non-standard (ddmmmyyyy). How can I enhance this command so that I can use it as a sortable date field? I'm guessing Cast, but I have no idea of the syntax.
SELECT SUBSTR(Host_Name,-9) as Decom_Date
Output
DECOM_DATE
31Oct2018
31May2018
31May2018
31Mar2017
31Jul2018
TIA
This is exactly what the TO_DATE function is designed for:
SELECT TO_DATE(SUBSTR(Host_Name,-9), 'DDMonYYYY') as Decom_Date
It doesn't affect you here, but bear in mind that Oracle dates only store down to one-second precision. Also, if you have any rogue data in the table that can't be parsed as a date, you'll get an error such as "not a valid..." or "a non-numeric character was found where a numeric was expected".
Be mindful that your strings here are in English, but parsing MON (the 3-letter month name) can be regionally contextual, so this code might not work on a server with a different NLS setting. For example, consider passing 'NLS_DATE_LANGUAGE = American' as the third argument to TO_DATE if you know your strings will always contain English month names.
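The same parse-then-sort idea can be tried out away from the database. This is a minimal Python sketch rather than Oracle code: the function name parse_decom_date and the sample host names are made up, and the month table is spelled out in English so the result does not depend on the machine's locale, which is the same concern that NLS_DATE_LANGUAGE addresses.

```python
from datetime import date

# English month abbreviations written out explicitly, so parsing does not
# depend on the locale of the machine running the script.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def parse_decom_date(host_name: str) -> date:
    """Parse the trailing 9 characters of host_name as DDMonYYYY."""
    suffix = host_name[-9:]
    day, month_abbr, year = suffix[:2], suffix[2:5], suffix[5:]
    # Raises KeyError or ValueError on rogue data, much as TO_DATE would fail.
    return date(int(year), MONTHS[month_abbr.lower()], int(day))


# Hypothetical host names; only the trailing date matters here.
hosts = ["srv-a-31Oct2018", "srv-b-31May2018", "srv-c-31Mar2017"]
ordered = sorted(hosts, key=parse_decom_date)
print(ordered)
```

Sorting on the parsed value gives chronological order, whereas sorting the raw strings would order them alphabetically by day and month name.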

How can you add FLOAT measures in Tableau formatted as a time stamp (hh:mm:ss)?

The fields look as described above. They are time fields from SQL imported as a varchar. I had to format them as dates in Tableau. There can be NULL values, so I am having a tough time getting around that. The only Tableau statement I have is ([time spent])+([time waited])+([time solved]).
Thank you!
If you only want to use the result for a graphical visualization of what took the longest, you can split each value, convert the parts to seconds, add them up, and use the total in your view.
In this case the HH:MM:SS fields are Strings for Tableau.
The formula used to sum the three fields is:
//transforms everything into seconds for each variable
zn((INT(SPLIT([Time Spent],':',1))*3600))
+
zn((INT(SPLIT([Time Spent],":",2))*60))
+
zn((INT(SPLIT([Time Spent],":",3))))
+
zn((INT(SPLIT([Time Waited],':',1))*3600))
+
zn((INT(SPLIT([Time Waited],":",2))*60))
+
zn((INT(SPLIT([Time Waited],":",3))))
+
zn((INT(SPLIT([Time Solved],':',1))*3600))
+
zn((INT(SPLIT([Time Solved],":",2))*60))
+
zn((INT(SPLIT([Time Solved],":",3))))
Quick explanation of the formula:
I SPLIT every field three times, one for the hours, minutes and seconds, adding all the values.
There is an INT formula that will convert the strings into integers.
There is also a ZN for every field - this will make Null fields become Zeros.
You can also use the value as integer if you want, e.g. the Case A has a Total Time of 5310 seconds.
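To check the arithmetic of the formula outside Tableau, here is a minimal Python sketch of the same steps: split on ':', convert to integers, treat a missing value as zero, and add everything up in seconds. The function names and the sample values are made up for illustration, and format_hhmmss is an extra step for turning the total back into hh:mm:ss.

```python
from typing import Optional


def to_seconds(hhmmss: Optional[str]) -> int:
    """Convert 'HH:MM:SS' to seconds; None counts as 0, like ZN() on a Null."""
    if hhmmss is None:
        return 0
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def total_seconds(*fields: Optional[str]) -> int:
    """Add up any number of 'HH:MM:SS' fields, in seconds."""
    return sum(to_seconds(field) for field in fields)


def format_hhmmss(total: int) -> str:
    """Render a number of seconds as HH:MM:SS (hours may exceed 24)."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Made-up sample row: time spent, time waited (missing), time solved.
total = total_seconds("00:30:15", None, "01:05:00")
print(total, format_hhmmss(total))
```

Keeping the total as an integer number of seconds makes it easy to sort and chart; formatting back to hh:mm:ss is only needed for display.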
The best approach is usually to store dates in the database in a date field instead of in a string. That might mean a data prep/cleanup step before you get to Tableau, but it will help with efficiency, simplicity and robustness ever after.
You can present dates in many formats, including hh:mm, when the underlying representation is a date datatype. See the custom date options on the format pane in Tableau for example. But storing dates as formatted strings and converting them to something else for calculations is really doing things the hard way.
If you have no choice but to read in strings and convert them to dates, then you should look at the DateParse function.
Either way, decide what a null date means and make sure your calculations behave well in that case -- unless you can enforce that the date field not contain nulls in the database.
One example would be a field called Completed_Date in a table of Work_Orders. You could determine that a null Completed_Date meant the work order had not been fulfilled yet, and thus allow nulls for that field. But you could also have the database enforce that another field, say Submitted_Date, could never be null.

Split a Value in a Column with Right Function in SSIS

I urgently need some help. I have a column which holds the full name of a user, and I want to split it into first and last name.
The format of the full name is "World, hello": the first name here is hello and the last name is world.
I am using a Derived Column (SSIS), with the RIGHT function for the first name and the SUBSTRING function for the last name, but the results come out blank, and this is where I am blank too. :)
It's working for me. In general, you should provide more detail in your questions on places such as this to help others recreate and troubleshoot your issue. You did not specify whether we needed to address NULLs in this field nor do I know how you'd want to interpret it so there is room for improvement on this answer.
I started with a simple OLE DB Source and hard coded a query of "SELECT 'World, Hello' AS Name".
I created 2 Derived Column Tasks. The first one adds a column to the Data Flow called FirstCommaPosition. The formula I used is FINDSTRING(Name,",", 1). If Name is NULLable, then we will need to test for nullability prior to calling the FINDSTRING function. You'll then need to determine how you want to store the split data in the case of NULLs. I would assume both first and last should be NULLed, but I don't know that.
There are two reasons for doing this in separate steps. The first is performance. As counter-intuitive as it sounds, doing less in a derived column results in better performance because the SSIS engine can better parallelize the operations. The other is more simple - I will need to use this value to make the first and last name split so it will be easier and less maintenance to reference a column than to copy paste a formula.
The second Derived Column is going to actually perform the split.
My FirstNameUnicode column uses this formula (FirstCommaPosition > 0) ? RTRIM(LTRIM(RIGHT(Name,LEN(Name) - FirstCommaPosition))) : "" That says "If we found a comma in the preceding step, then slice out everything after the comma's position to the end of the string and apply trim operations. If we didn't find a comma, then just return a blank string." Note that RIGHT takes a character count rather than a start position, hence the subtraction from LEN(Name). The default string type for expressions will be Unicode (DT_WSTR), so if that is not what you need, you will have to cast the result into the correct string codepage (DT_STR).
My LastNameUnicode column uses this formula (FirstCommaPosition > 0) ? SUBSTRING(Name,1,FirstCommaPosition -1) : "" Similar logic as above, except now I use the SUBSTRING operation instead of RIGHT. Users of the 2012 release of SSIS and beyond, rejoice, for you can use the LEFT function instead of SUBSTRING. Also note that you will need to back off 1 position to remove the comma.
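The two-step logic (find the comma first, then slice on its position) can be tried out with a minimal Python sketch. This is not SSIS expression syntax: the function name split_full_name is made up, and returning two empty strings for a NULL value or a value without a comma is an assumption, since the right handling depends on your data.

```python
from typing import Optional, Tuple


def split_full_name(name: Optional[str]) -> Tuple[str, str]:
    """Return (first_name, last_name) from a value shaped like 'Last, First'.

    Hypothetical helper: None or a value without a comma yields two empty
    strings, which is an assumption rather than a rule.
    """
    if name is None:
        return "", ""
    comma_index = name.find(",")  # step 1: locate the first comma (-1 if none)
    if comma_index < 0:
        return "", ""
    # Step 2: slice on the comma, dropping the comma itself, then trim.
    last_name = name[:comma_index].strip()
    first_name = name[comma_index + 1:].strip()
    return first_name, last_name


print(split_full_name("World, Hello"))
```

It is worth testing with a name whose two halves differ in length, such as "Smith, Jo", because "World, Hello" is symmetric around the comma and can hide off-by-position mistakes.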

Time value as output

For a few columns from the source, i.e. a .csv file, we have values like 1:52:00 and 14:45:00.
I am supposed to load them into an Oracle table.
Which data type should I choose in the target as well as the source?
Should I be doing anything in the expression transformation?
Use SQLLDR to load the data into the database, using the format described at this link:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm
i.e. 'HH24:MI:SS'
Oracle does not support time-only values, it supports dates (with a time component).
You have a few options:
Store the value as a string, perhaps providing a leading zero for the hour.
Store the value as the number of seconds (or minutes) past midnight.
Store the value as the time component of some arbitrarily defined date, for example 0001-JAN-01 01:52:00 and 0001-JAN-01 14:45:00. Tell your report writers to ignore the date portion of the value.
Your source datatype will be string(8). Use LPAD to add leading zeroes.
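The first two options can be illustrated with a minimal Python sketch. It is not an Informatica expression or Oracle SQL: the function names are made up, and the sample values are the two from the question.

```python
def pad_time(value: str) -> str:
    """'1:52:00' -> '01:52:00', i.e. left-pad with zeros to 8 characters."""
    return value.strip().rjust(8, "0")


def seconds_past_midnight(value: str) -> int:
    """'1:52:00' -> 6720, the number of seconds since 00:00:00."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


for raw in ["1:52:00", "14:45:00"]:
    print(pad_time(raw), seconds_past_midnight(raw))
```

The padded string sorts correctly as text, while the seconds value is easier to use in arithmetic; which one fits better depends on how the column will be queried.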
