Importing a date from Oracle to Hive using Spark HiveContext: the format in Hive should be YYYYMMDD (dt_skey)

I have to import a table from Oracle to Hive using Spark and Scala. The date column in Oracle looks like this: [image: Oracle column date]. I have to cast it to the dt_skey format (YYYYMMDD) in Hive. The table format in Hive is Parquet. How can I do that? I googled a lot but didn't find any solution.
Thanks in advance

Assuming your input data is supposed to mean yy-mm-dd (so 16-09-15 means year 2016, month 09, day 15) you probably need a transformation like this:
select to_char( to_date (dt, 'yy-mm-dd'), 'yyyymmdd') from ...
Example:
with my_table ( dt ) as ( select '16-09-15' from dual)
-- this creates a test table my_table with column dt and value as shown
select dt,
to_char( to_date (dt, 'yy-mm-dd'), 'yyyymmdd') as dt_skey
from my_table
;
DT DT_SKEY
-------- --------
16-09-15 20160915
You could also manipulate the input string directly but I would strongly recommend against that. Translating to date and back will catch invalid "dates" in your data before you try to push them to an application. Also, the string manipulation would become complicated if the input strings are inconsistent (for example if something like 16-9-15 is allowed along with 16-09-15).
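The point about letting the date conversion catch bad data can be illustrated outside Oracle. The sketch below is in Python (an assumption; it is not part of the original answers) and uses `datetime.strptime` with `%y-%m-%d` as a stand-in for `to_date(dt, 'yy-mm-dd')`: impossible dates raise an error instead of being silently reformatted, while a single-digit month such as `16-9-15` is still accepted. The helper name `parse_yy_mm_dd` is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_yy_mm_dd(raw: str) -> Optional[date]:
    """Parse a 'yy-mm-dd' string, returning None when it is not a real date.

    Stand-in for Oracle's to_date(dt, 'yy-mm-dd'): Python raises ValueError
    for impossible values such as 30 February or month 13.
    """
    try:
        return datetime.strptime(raw, "%y-%m-%d").date()
    except ValueError:
        return None


if __name__ == "__main__":
    for raw in ["16-09-15", "16-9-15", "16-02-30", "16-13-01"]:
        print(raw, "->", parse_yy_mm_dd(raw))
```

The first two inputs both come back as 2016-09-15; the last two come back as None instead of producing a bogus key.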
EDIT: In a comment on his original question, the OP stated that dt is already a DATE in Oracle. In that case, it should NOT be wrapped in to_date(): to_date() expects a string, so Oracle would first convert the DATE to text using the session's NLS_DATE_FORMAT, which leads to errors or wrong dates. Rather, the solution is MUCH simpler; all that is needed is:
select to_char(dt, 'yyyymmdd') from ...
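Both cases above (a string that must be parsed first, versus a value that is already a DATE) come down to "parse if necessary, then format". Below is a minimal Python sketch of that logic; the function name `to_dt_skey` is hypothetical. If the conversion were done on the Spark side instead of in the Oracle query, Spark SQL's `date_format(column, "yyyyMMdd")` would play the role of `to_char` (note the Java-style pattern, where `MM` is month and `mm` would be minutes).

```python
from datetime import date, datetime
from typing import Union


def to_dt_skey(value: Union[str, date]) -> str:
    """Return the YYYYMMDD key for a date value (hypothetical helper).

    - A date/datetime is formatted directly, like to_char(dt, 'yyyymmdd').
    - A 'yy-mm-dd' string is parsed first, like
      to_char(to_date(dt, 'yy-mm-dd'), 'yyyymmdd').
    """
    if isinstance(value, str):
        value = datetime.strptime(value, "%y-%m-%d")
    return value.strftime("%Y%m%d")


print(to_dt_skey("16-09-15"))          # 20160915
print(to_dt_skey(date(2016, 9, 15)))   # 20160915
```

One difference to keep in mind: Python's `%y` maps 00-68 to 2000-2068 and 69-99 to 1969-1999, whereas Oracle's `YY` takes the century from the current date, so the two can disagree for high two-digit years.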

Related

Varchar to Timestamp when the varchar data is in yyyy-mm-dd-hh:mi:ss:ff format

My source is Oracle, and col1 is VARCHAR2(26), but the values look like YYYY-MM-DD-hh:mi:ss:ff (sample record: 2014-08-15-02.03.34.979946).
I have to extract only the last 6 months of records based on COL1. Since there is a hyphen between the date part and the time part, I could not treat it as a timestamp. Is there any way to convert this to a timestamp so I can look up only 6 months of data?
If it is possible at all, fix the data first. Storing timestamps in string data type is terrible. How do you know you don't have a time like 25:30:00 in the strings? Or a date like February 30? Besides, you can't really use an index on that column (so queries will be very slow), you will have to write a lot of code whenever referencing that column, etc.
Anyway - to deal with the immediate problem, use TO_TIMESTAMP(), exactly with the format model you show in your post - including the dash between the date part and the time part. Something like this:
select case when to_timestamp('2014-08-15-02.03.34.979946', 'YYYY-MM-DD-HH24:MI:SS.FF')
>= systimestamp - interval '6' month
then 'TRUE' else 'FALSE' end
as result
from dual;
RESULT
------
FALSE
EDIT: As Alex Poole points out (correctly, as always) in a comment below this answer, interval arithmetic won't work correctly in all cases: subtracting INTERVAL '6' MONTH keeps the day number, so it raises an error when that day does not exist in the target month. It is better, then, to use something like
cast(to_timestamp(...., format-model) as date) >= add_months(sysdate, -6).
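The month-end case is where the two approaches differ: 31 August minus INTERVAL '6' MONTH would be 31 February, which Oracle rejects (ORA-01839), while ADD_MONTHS moves to the last day of the target month. The Python sketch below mimics both behaviours; `add_months` and `minus_interval_months` are hypothetical helpers written from Oracle's documented ADD_MONTHS rule, not Oracle code.

```python
import calendar
from datetime import datetime


def _shift(year: int, month: int, months: int):
    """Return (year, month) moved by `months` calendar months."""
    index = year * 12 + (month - 1) + months
    new_year, new_month0 = divmod(index, 12)
    return new_year, new_month0 + 1


def add_months(d: datetime, months: int) -> datetime:
    """Mimic Oracle ADD_MONTHS: a month-end date stays month-end, and a day
    that does not exist in the target month is clamped to its last day."""
    year, month = _shift(d.year, d.month, months)
    target_last = calendar.monthrange(year, month)[1]
    source_last = calendar.monthrange(d.year, d.month)[1]
    if d.day == source_last or d.day > target_last:
        day = target_last
    else:
        day = d.day
    return d.replace(year=year, month=month, day=day)


def minus_interval_months(d: datetime, months: int) -> datetime:
    """Mimic `d - INTERVAL 'n' MONTH`: keep the day number, fail if invalid."""
    year, month = _shift(d.year, d.month, -months)
    return d.replace(year=year, month=month)  # ValueError for e.g. 31 February


start = datetime(2016, 8, 31, 12, 0)
print(add_months(start, -6))  # 2016-02-29 12:00:00
try:
    minus_interval_months(start, 6)
except ValueError as exc:
    print("interval arithmetic failed:", exc)
```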
Maybe something like this will do:
select *
from your_table
where to_date(substr(col1,1,19),'yyyy-mm-dd-HH24.MI.SS') between add_months(sysdate,-6) and sysdate;
This assumes the data in col1 is always in the same format.
Also note that I used HH24 for the hour part; that may not match your data.
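The substr(col1, 1, 19) approach simply drops the fractional seconds before parsing. The Python sketch below (an illustration, not Oracle code) does the same with a slice and compares it with parsing the full value. The literal `-` between date and time is part of both format strings, just as in the Oracle format models; unlike Oracle's default matching, `strptime` is strict about punctuation, so the dots in the time part must appear in the format too.

```python
from datetime import datetime

raw = "2014-08-15-02.03.34.979946"  # sample value from the question

# Like to_date(substr(col1, 1, 19), 'yyyy-mm-dd-HH24.MI.SS'):
# keep the first 19 characters, i.e. everything up to whole seconds.
truncated = datetime.strptime(raw[:19], "%Y-%m-%d-%H.%M.%S")

# Parsing the whole value keeps the microseconds (like TO_TIMESTAMP with FF).
full = datetime.strptime(raw, "%Y-%m-%d-%H.%M.%S.%f")

print(truncated)  # 2014-08-15 02:03:34
print(full)       # 2014-08-15 02:03:34.979946
```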
You can include the dash in your format model, as @mathguy showed, to convert your string to a timestamp:
select to_timestamp('2014-08-15-02.03.34.979946', 'YYYY-MM-DD-HH24:MI:SS.FF') from dual;
TO_TIMESTAMP('2014-08-15-02.
----------------------------
15-AUG-14 02.03.34.979946000
Although, unless you explicitly tell it not to via the FX modifier, Oracle is flexible enough to allow a dash even where the model has a space (see the text below the datetime format elements table in the documentation):
select to_timestamp('2014-08-15-02.03.34.979946', 'YYYY-MM-DD HH24:MI:SS.FF') from dual;
TO_TIMESTAMP('2014-08-15-02.
----------------------------
15-AUG-14 02.03.34.979946000
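However, converting all of the values in your col1 column and then comparing them may be a lot of work, and will prevent any index on that string column from being used. Because the values are fixed-width and run from the most significant field (year) down to the least significant, string order matches chronological order, provided the bounds use exactly the same separators as the stored data. So you can convert your date range to strings instead and use string comparison; e.g. to find everything in the six months up to midnight this morning: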
select col1 -- or whichever columns you need
from your_table
where col1 >= to_char(cast(add_months(trunc(sysdate), -6) as timestamp), 'YYYY-MM-DD-HH24.MI.SS.FF6')
and col1 < to_char(cast(trunc(sysdate) as timestamp), 'YYYY-MM-DD-HH24.MI.SS.FF6');
Or, since the time part is fixed at midnight for that example, you can use a character literal instead of casting:
select col1 -- or whichever columns you need
from your_table
where col1 >= to_char(add_months(sysdate, -6), 'YYYY-MM-DD"-00.00.00.000000"')
and col1 < to_char(sysdate, 'YYYY-MM-DD"-00.00.00.000000"');
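Why the string comparison works, and where it can go wrong: for fixed-width, year-first values, character-by-character ordering matches chronological ordering, but only if the bounds use the same separators as the stored data. The sample value uses dots in the time part, and `.` sorts before `:`, so a bound built with colons can misplace rows from the 00 hour of the boundary day. The Python sketch below demonstrates both points; the row values and the bound are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d-%H.%M.%S.%f"  # matches the sample data, e.g. 2014-08-15-02.03.34.979946

rows = [  # hypothetical stored strings
    "2014-02-14-23.59.59.999999",
    "2014-02-15-00.00.00.500000",
    "2014-08-15-02.03.34.979946",
]

# 1) For fixed-width, year-first strings, text order equals chronological order.
assert sorted(rows) == sorted(rows, key=lambda s: datetime.strptime(s, FMT))

lower = datetime(2014, 2, 15)  # hypothetical lower bound (midnight)

# 2) A bound rendered with the same separators as the data compares correctly...
good_bound = lower.strftime(FMT)  # '2014-02-15-00.00.00.000000'
print([r for r in rows if r >= good_bound])

# ...but with colons, '.' < ':' at the hour boundary, so the row at
# 00:00:00.5 on the boundary day is wrongly excluded.
colon_bound = lower.strftime("%Y-%m-%d-%H:%M:%S.%f")  # '2014-02-15-00:00:00.000000'
print([r for r in rows if r >= colon_bound])
```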
Of course, storing data in the correct native data type would be a much better solution. Any other solution only works at all if your string data actually contains what you think, and the data is all sane (or as sane as it can be in the wrong data type).

How to load files using SQLLDR with date format as yyyymmddhhmmss?

I need to load a table with a .csv file which contains date "20140825145416".
I have tried using (DT date "yyyymmdd hh24:mm:ss") in my control file.
It throws the error ORA-01821: date format not recognized.
I require the data in the table as "MM/DD/YYYY HH:MM:SS".
Sample data : 20140825145416
thanks in advance.
Well, I would be remiss if I did not point out that the correct answer is to never store dates as VARCHAR2 data, but make it a proper DATE column and load it like this:
DT DATE "YYYYMMDDHH24MISS"
Formatting is done when selecting. It will make your life so much easier if you ever need to use that date in a calculation.
That out of the way, if you have no control over the database and have to store it as a VARCHAR2, first convert to a date, then use to_char to format it before inserting:
DT CHAR "to_char(to_date(:DT, 'YYYYMMDDHH24MISS'), 'MM/DD/YYYY HH24:MI:SS')"
Note that 'MI' is used for minutes. You had a typo where you used 'MM' (months) again for minutes.
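The two-step conversion in that control-file expression (parse with one mask, format with another) can be checked with a quick Python sketch; `strptime`/`strftime` codes stand in for the Oracle masks (`%Y%m%d%H%M%S` for YYYYMMDDHH24MISS, `%m/%d/%Y %H:%M:%S` for MM/DD/YYYY HH24:MI:SS). Python has the same trap as the question: `%m` is month and `%M` is minute. The function name is hypothetical.

```python
from datetime import datetime


def reformat_loader_value(raw: str) -> str:
    """Hypothetical check of
    to_char(to_date(:DT, 'YYYYMMDDHH24MISS'), 'MM/DD/YYYY HH24:MI:SS')."""
    parsed = datetime.strptime(raw, "%Y%m%d%H%M%S")  # %M = minutes, %m = month
    return parsed.strftime("%m/%d/%Y %H:%M:%S")


print(reformat_loader_value("20140825145416"))  # 08/25/2014 14:54:16
```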
I know it's already been said in the previous answer, but it's so important, it's worth repeating. Do not store dates as varchars !!
If your DT column is a TIMESTAMP, then this might work (note HH24, since the sample hour is 14):
DT CHAR(25) date_format TIMESTAMP mask "yyyymmddhh24miss"
I used something like this in external tables. Maybe these links will help:
https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm
and
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8128892010789

Toad SQL Issue for Timestamp without Seconds

I am executing the following SQL in Toad; the RDBMS is Oracle.
I need the date with the time (HH24:MI), but I get only the date, as shown below:
alter session set nls_date_format = 'DD/MM/YYYY HH24:MI';
SELECT to_date('22/07/1980 00:00','dd/mm/yyyy hh24:mi') dt FROM dual
22/07/1980
Required Output
22/07/1980 00:00
You are looking for to_char(): you want to return the date as a string, not a date. A DATE value always carries a time component, but how it is displayed is up to the client tool, and I don't think the NLS setting changes that in Toad.
So:
SELECT to_char(to_date('22/07/1980 00:00', 'dd/mm/yyyy hh24:mi'), 'DD/MM/YYYY HH24:MI') AS dt
FROM dual;
I ended up using SQL*Plus and spooling to a file instead of Toad, with the NLS date format conversion, but since that is command-line based, I wanted to use the GUI-based Toad.

Oracle Date Format Conversion Issue

I'm at the end of my tether, so I'm hoping someone can help me! I'm really new to Oracle, but I do have a SQL background, which is why I'm finding this so frustrating!
We have a system that runs Oracle at the back end. I've got very limited access to the system and can only write select queries.
I've written a query that gets the data I want, but the date format is coming out as mm dd yyyy; what I need is dd/mm/yyyy.
I ran SELECT sysdate FROM dual and that came back as:
SYSDATE
03 11 2015
So my select statement reads (action_date is the column in question)
Select username, action_date from adminview
I've tried everything I can think of to change the date format including:
to_date(action_date,'dd/mm/yyyy')
to_date(action_date,'dd/mm/yyyy','nls_language=English')
to_date(to_date(action_date,'mm dd yyyy'),'dd/mm/yyyy')
I've also tried to_char along the same lines.
If you want to format a DATE value, use TO_CHAR():
SELECT username, TO_CHAR(action_date, 'DD/MM/YYYY') AS action_date
FROM adminview;
If it's not a DATE value, then you'll want to convert it to a DATE (based on what it currently looks like), then use TO_CHAR() to format it, e.g. TO_CHAR(TO_DATE(action_date, 'MM DD YYYY'), 'DD/MM/YYYY').

How to populate a timestamp field with current timestamp using Oracle Sql Loader

I'm reading a pipe delimited file with SQL Loader and want to populate a LAST_UPDATED field in the table I am populating. My Control File looks like this:
LOAD DATA
INFILE SampleFile.dat
REPLACE
INTO TABLE contact
FIELDS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
(
ID,
FIRST_NAME,
LAST_NAME,
EMAIL,
DEPARTMENT_ID,
LAST_UPDATED SYSTIMESTAMP
)
For the LAST_UPDATED field I've tried SYSTIMESTAMP and CURRENT_TIMESTAMP and neither work. SYSDATE however works fine but doesn't give me the time of day.
I am brand new to SQL Loader so I really know very little about what it is or isn't capable of. Thanks.
Have you tried the following:
CURRENT_TIMESTAMP [ (precision) ]
select current_timestamp(3) from dual;
CURRENT_TIMESTAMP(3)
-----------------------------
10-JUL-04 19.11.12.686 +01:00
To do this in SQLLDR, you will need to use EXPRESSION in the CTL file so that SQLLDR knows to treat the call as SQL.
Replace:
LAST_UPDATED SYSTIMESTAMP
with:
LAST_UPDATED EXPRESSION "current_timestamp(3)"
I accepted RC's answer because ultimately he answered what I was asking but my unfamiliarity with some of Oracle's tools led me to make this more difficult than it needed to be.
I was trying to get SQL*Loader to record a timestamp instead of just a date. When I used SYSDATE and then did a select on the table, it was only listing the date (05-AUG-09).
Then, I tried RC's method (in the comments) and it worked. However, still, when I did a select on the table I got the same date format. Then it occurred to me it could just be truncating the remainder for display purposes. So then I did a:
select TO_CHAR(LAST_UPDATED,'MMDDYYYY:HH24:MI:SS') from contact;
And it then displayed everything. Then I went back to the control file and changed it back to SYSDATE and ran the same query and sure enough, the HH:MI:SS was there and accurate.
This is all being done in SQL Developer. I don't know why it defaults to this behavior. What also threw me off were the following two statements in SQL Developer:
SELECT CURRENT_TIMESTAMP FROM DUAL; -- returns a full date and time
SELECT SYSDATE FROM DUAL; -- returns only a date
If you want to use the table-defined default, you can use:
ROWDATE EXPRESSION "DEFAULT"
In SQL Developer, run:
ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';
and then check it with:
SELECT SYSDATE FROM DUAL;
