Trying to convert a string to array<string> type in Hive using collect_set

I have two columns in my table (date,users) which are strings.
date users
2019-01-01 '"U10000","U20000"'
I am trying to convert the users column to array<string>, but I am getting \ in the values. I didn't find any spaces in the string values, so why am I getting '\' in the new array column?
This is my query and the result it produces:
SELECT date, collect_set(users) AS user_arr FROM mytable GROUP BY date;
date user_arr
2019-01-01 ["\"U10000\",\"U20000\""]
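A likely explanation, sketched in Python rather than HiveQL so it can be run anywhere: collect_set gathers whole column values, so the single string '"U10000","U20000"' becomes one array element, and the double quotes inside it are escaped as \" when the array is printed. On that reading the backslashes are display escaping, not stored characters. The use of json.dumps to imitate the client's output is an assumption made for illustration. In Hive itself, built-in functions such as split and regexp_replace could play the role of the split/strip step shown here.

```python
import json

# One row's `users` value as stored: a single string that happens to
# contain double quotes and a comma.
users = '"U10000","U20000"'

# collect_set-like behaviour: the whole string becomes ONE array element.
collected = [users]
print(json.dumps(collected))  # embedded quotes are shown escaped

# Splitting on the comma and stripping the quotes yields two elements.
user_arr = [u.strip('"') for u in users.split(",")]
print(json.dumps(user_arr))
```

The first print reproduces the shape of the result in the question (one element with escaped quotes); the second shows the two-element array that was presumably wanted.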

Related

Count date strings between a range of dates

I have a Hive table (table_1). One of its columns is called 'date'. Values in that column are of 'string' type in the format 'yyyyMMdd' (ex: 20210102). I am trying to get the count(*) of records for a range of dates in that column.
Ex: select count(*) from table_1 where date BETWEEN 20210101 AND 20210301. This does not work as is, since the column is of 'string' type. I need some help querying that column as a DATE.
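One property that may help here, sketched in Python as a model of the comparison rather than as Hive code: fixed-width 'yyyyMMdd' strings sort in the same order as the dates they represent, so a range check with string bounds selects the same rows as a range check on parsed dates. The sample values and the helper name count_in_range are made up for illustration, and the sketch assumes every value is a well-formed 8-digit string.

```python
from datetime import datetime

# Hypothetical column values, all in yyyyMMdd form.
dates = ["20201231", "20210101", "20210215", "20210301", "20210302"]

def count_in_range(values, start, end):
    """Count values with start <= value <= end, compared as strings."""
    return sum(1 for v in values if start <= v <= end)

def count_in_range_parsed(values, start, end):
    """Same count, but comparing real dates parsed from the strings."""
    parse = lambda v: datetime.strptime(v, "%Y%m%d")
    lo, hi = parse(start), parse(end)
    return sum(1 for v in values if lo <= parse(v) <= hi)

print(count_in_range(dates, "20210101", "20210301"))         # 3
print(count_in_range_parsed(dates, "20210101", "20210301"))  # 3
```

Both counts agree, which suggests that quoting the bounds as strings in the BETWEEN clause is one option worth trying before converting the column.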

How to store date value in hive timestamp?

I am trying to store date and timestamp values in a timestamp column using Hive. The source file contains date values and sometimes timestamps.
Is there a way to read both date and timestamp values using the timestamp data type in Hive?
Input:
2015-01-01
2015-10-10 12:00:00.232
2016-02-01
Output which I am getting:
null
2015-10-10 12:00:00.232
null
Is it possible to read both values using the timestamp data type?
DDL:
create external table mytime(id string ,t timestamp) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://xxx/data/dev/ind/'
I was able to think of a workaround and tried it with a small set of data:
Load the data with inconsistent date values into a Hive table, say table1, declaring the column as a string.
Then create another table, table2, with timestamp as the datatype for that column, and load the data from table1 into table2 using the transformation INSERT OVERWRITE TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
This should load the data in the required format.
Code as below:
create table table1
(
id int,
tsstr string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table1.tb';
Data:
1,2015-04-15 00:00:00
2,2015-04-16 00:00:00
3,2015-04-17
LOAD DATA LOCAL INPATH '/home/cloudera/data/tsstr' INTO TABLE table1;
create table table2
(
id int,
mytimestamp timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table2.tb';
INSERT INTO TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
Result shows up as expected:
Hive is similar to any other database in terms of datatype mapping, so it requires uniform values in a column for them to be stored under a single datatype. The second column in your file is not uniform: some values are in date format while others are in timestamp format.
To avoid losing the dates, as suggested by @Kishore, make sure the file uses a uniform format, with date-only values written as timestamps such as 2016-01-01 00:00:00.
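The padding rule used in the workaround above can be modelled in a few lines of Python. This is only a sketch of the logic behind if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')), not Hive code. The function names are hypothetical, and the input rows are the three sample values from the question.

```python
from datetime import datetime

def to_timestamp_string(value: str) -> str:
    """Append midnight to date-only values (10 characters or fewer);
    leave longer values, which already carry a time part, unchanged."""
    return value if len(value) > 10 else value + " 00:00:00"

def parse_timestamp(value: str) -> datetime:
    """Parse 'yyyy-MM-dd HH:mm:ss' with optional fractional seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in value else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(value, fmt)

rows = ["2015-01-01", "2015-10-10 12:00:00.232", "2016-02-01"]
normalized = [to_timestamp_string(r) for r in rows]

for n in normalized:
    print(n, "->", parse_timestamp(n))
```

After padding, every value has the same shape and parses as a timestamp, which is the uniformity the second answer asks for.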

How can I convert a date to another format at run time in Oracle?

I have a date string coming from user input in the format of DD/MM/YYYY and I need to match it against a date column in our database in the format of DD-MON-YY.
Example input is 01/01/2015 and example date column in our database:
SELECT MAX(creation_date) FROM orders;
MAX(creation_date)
------------------
06-AUG-15
I need to query in the format:
SELECT * FROM orders WHERE creation_date = 01/01/2015
and somehow have that converted to 01-JAN-15.
Is it possible with some built-in Oracle function?
Use to_date, if the column in the table is in date format
http://www.techonthenet.com/oracle/functions/to_date.php
to_char allows you to specify different formats in a SQL statement.
Example: to_char(sysdate,'DD-MON-YYYY') will display 06-AUG-2015 for today's date.
TO_CHAR
Use to_date to compare your date column to a date string, but be careful in doing so since your date column may include a time component that isn't showing when selecting from your table.
If there is no index on your date column, you can truncate it during the comparison:
SELECT * FROM orders WHERE TRUNC(creation_date) = TO_DATE('01/01/2015','DD/MM/YYYY');
If there is an index on your date column and you still want to use it then use a ranged comparison:
SELECT * FROM orders
WHERE creation_date >= TO_DATE('01/01/2015','DD/MM/YYYY')
AND creation_date < TO_DATE('01/01/2015','DD/MM/YYYY')+1;
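The ranged comparison can be illustrated outside the database. The Python sketch below parses the user's DD/MM/YYYY input and builds a half-open interval [start, start + 1 day), which matches every time of day on that date without applying a function to the column value. The order timestamps and the name day_range are made up for illustration.

```python
from datetime import datetime, timedelta

def day_range(user_input: str):
    """Return (start, end) covering the whole day given as DD/MM/YYYY;
    end is exclusive."""
    start = datetime.strptime(user_input, "%d/%m/%Y")
    return start, start + timedelta(days=1)

# Hypothetical creation_date values, some with a time component.
orders = [
    datetime(2015, 1, 1, 0, 0),
    datetime(2015, 1, 1, 17, 45),
    datetime(2015, 1, 2, 0, 0),
]

start, end = day_range("01/01/2015")
matches = [d for d in orders if start <= d < end]
print(matches)  # the two rows dated 1 January 2015
```

The format mask matters once the day exceeds 12: an input such as 13/01/2015 parses with a day-first mask but is rejected by a month-first one.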

Excluding the partition field from select queries in Hive

Suppose I have a table definition as follows in Hive (the actual table has around 65 columns):
CREATE EXTERNAL TABLE S.TEST (
COL1 STRING,
COL2 STRING
)
PARTITIONED BY (extract_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\007'
LOCATION 'xxx';
Once the table is created, when I run hive -e "describe s.test", I see extract_date listed as one of the columns of the table. Doing a select * from s.test also returns extract_date column values. Is it possible to exclude this virtual(?) column when running select queries in Hive?
Set this property
set hive.support.quoted.identifiers=none;
and run the query as
SELECT `(extract_date)?+.+` FROM <table_name>;
I tested it and it works fine.
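To see why this pattern drops only extract_date, here is a small Python model. It assumes Hive tests the regex against each full column name, and it uses a negative-lookahead pattern that should accept and reject the same names as (extract_date)?+.+ (the possessive form is not available in every Python version). The column list is taken from the table definition in the question.

```python
import re

# Lookahead form assumed equivalent to (extract_date)?+.+ under a
# full-match test: any non-empty name except exactly "extract_date".
pattern = re.compile(r"(?!extract_date$).+", re.IGNORECASE)

columns = ["col1", "col2", "extract_date"]
selected = [c for c in columns if pattern.fullmatch(c)]
print(selected)

# A column that merely starts with the partition name is still kept.
print(bool(pattern.fullmatch("extract_date_utc")))
```

The name extract_date alone is rejected because nothing is left for .+ to match once the optional group has consumed it. Every other name passes.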

How to do order by to_date(ADD_DATE_CREATED,'DD-MM-YY') desc in Hibernate

I want to add a criterion in Hibernate to order by date.
The date column is defined in the database as
vrhCreatedDate varchar2(20)
so the date is stored with a varchar datatype.
In SQL I order the rows using
order by to_date(ADD_DATE_CREATED,'DD-MM-YY') desc
How can I add a criterion like the one below to order by the date itself?
criteria.addOrder(Order.desc("to_date({alias}.vrhCreatedDate, 'DD-MM-YY')")); // the varchar column needs to be parsed as a date in the order by
The order-by column has to be parsed from a string into a date.
You can use the following statement to order by your date field
criteria.addOrder(Order.desc("your Field name"));
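One caveat, shown in Python with made-up values: ordering by the raw varchar column compares the strings character by character, so the day part dominates and the result is not chronological. The to_date conversion in the question is what makes the ordering follow the calendar. The sample dates below are hypothetical and use the DD-MM-YY layout from the question.

```python
from datetime import datetime

# Hypothetical vrhCreatedDate values stored as DD-MM-YY strings.
created = ["05-03-21", "17-01-22", "28-12-20"]

# Descending order on the raw strings: day-of-month decides.
by_string = sorted(created, reverse=True)

# Descending order on the parsed dates: newest first.
by_date = sorted(
    created,
    key=lambda s: datetime.strptime(s, "%d-%m-%y"),
    reverse=True,
)

print(by_string)
print(by_date)
```

The two lists differ, so ordering by the plain field name is only equivalent when the stored strings already sort like dates.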
