How to define a default NULL value to Hive table - hadoop

Suppose one table contains 3 columns and I render a query like this:
SELECT col1, col2, col3, null as col4 from table;
In this case null values are not properlly getting reflected. I got error in UNION operations stated "Runtime NULL exception". Please help on this

What distribution/version of Hadoop you are using this . I tried this on IDH and it just worked fine. Can you also paste complete error message
hive (default)> describe table4;
OK
fld1 string
fld2 string
hive (default)> select fld1, fld2, null as fld3 from table4;
A 1 NULL
B 1 NULL
C 1 NULL
Adding below as per questioner's comment,
I tried this cast , it describe now as string. you can cast as per your need
create table table6 as select fld1, fld2, cast(null as string) fld3 from table4;
hive (default)> describe table6;
OK
fld1 string
fld2 string
fld3 string

I tried this on some sample tables and it worked for me without any errors. Example query:
select * from
(select col1, NULL as col2 from table1 LIMIT 10
UNION ALL
select col1, col2 from table2LIMIT 10) q1

Related

Copy from VARCHAR field to NUMBER field such that VARCHAR value becomes null after being copied to NUMBER field

I have two tables Table1 and Table2 both with the same columns TestResult and Testcounts. Table1 has testresult as varchar and Table2 has testresult as number.
I have a string .for eg "Oracle" as value for testresult of varchar type for Table1 which needs to be inserted to testresult of number type of Table2 as null.How can i do this? Any suggestions will be highly appreciated :)
EDIT
I have table1 with columns as TestResult varchar2(50) and Testcount number with values as "0.5","0.6","0.8","Oracle" for TestResult and 1,2,3,4 for Testcount.
Now i have another table Table2 as TestResult number and Testcount number with no values, in other words its empty.. I would like to insert all data from table1 to table2 with "Oracle" being inserted as "null"
The following will do what you've asked for:
INSERT INTO TABLE2 (TESTRESULT, TESTCOUNTS)
SELECT CASE
WHEN LENGTH(REGEXP_SUBSTR(TESTRESULT, '[0-9.]*')) = LENGTH(TESTRESULT) THEN TESTRESULT
ELSE NULL
END,
TESTCOUNTS
FROM TABLE1
SQLFiddle here
If you only have a single string value that you can't convert to a number, and you want to set that to null, you can just use a case expression to supply the null:
insert into table2 (testresult, testcounts)
select case when testresult = 'Oracle' then null else to_number(testresult) end,
testcounts
from table1;
Demo:
create table table1 (testresult varchar2(10), testcounts number);
insert into table1
select '0.5', 1 from dual
union all select '0.6', 2 from dual
union all select '0.8', 3 from dual
union all select 'Oracle', 4 from dual;
create table table2 (testresult number, testcounts number);
insert into table2 (testresult, testcounts)
select case when testresult = 'Oracle' then null else to_number(testresult) end,
testcounts
from table1;
select * from table2;
TESTRESULT TESTCOUNTS
---------- ----------
.5 1
.6 2
.8 3
4
db<>fiddle
If you are using Oracle 12c Release 2 (or above) you could also just try to convert the string to a number and use the default ... on conversion error clause to substitute null for that, or any other, non-numeric value:
insert into table2 (testresult, testcounts)
select to_number(testresult default null on conversion error), testcounts
from table1;
select * from table2;
TESTRESULT TESTCOUNTS
---------- ----------
.5 1
.6 2
.8 3
4
In earlier versions you could do the same thing with a user-defined function that wraps the real to_number() call and returns null on error. Or a regex/translate check similar to what #BobJarvis has shown.
Having multiple rows with null would make the data hard to interpret though, so hopefully you only have this one fixed value...

Hive Getting only max occurrence of a value

I have hive table which has two cloumns,I want to get the value which occured max number of times
For example in my below table a value occured twice and c only once , here a value is dominat so I want only a value as shown in output
col1 col2
a a_value1
a a_value2
a c_value3
b b_value1
OUTPUT:
col1 col2
a a_value1
b b_value1
You are looking for what statisticians call the mode. A pretty simple method is to use aggregation with a window function:
select col1, col2
from (select col1, col2, count(*) as cnt,
row_number() over (partition by col1 order by count(*) desc) as seqnum
from t
) t
where seqnum = 1;
The above query will return one value for each col1, even if there are ties. If you want all the values in the event of ties, then use rank() or dense_rank().

Insert data listing columns with partitioning field in Hive

First of all let's setup a test environment:
CREATE TABLE IF NOT EXISTS source_table (
`col1` TIMESTAMP,
`col2` STRING
);
CREATE TABLE IF NOT EXISTS dest_table (
`col1` TIMESTAMP,
`col2` STRING,
`col3` STRING
)
PARTITIONED BY (day STRING)
STORED AS AVRO;
INSERT INTO TABLE source_table VALUES ('2018-03-21 17:08:04.401', 'test1'), ('2018-03-22 12:02:04.222', 'test2'), ('2018-03-22 07:21:04.111', 'test3');
How could I list the column names during insertion and put the partition value dynamically? The following command doesn't work:
INSERT INTO TABLE dest_table(col1, col2) PARTITION(day) SELECT col1, col2, date_format(col1, 'yyyy-MM-dd') FROM source_table;
By the way, without listing the columns of dest_table inside the INSERT INTO command, having two tables with the same columns number, everything works fine. What if my dest_table has more fields than the source_table?
Thank you for helping me.
P.S.
Ok, if I hardcode NULL this works. I leave the question opened because there might be better ways to achieve that.
INSERT INTO TABLE dest_table PARTITION(day) SELECT col1, col2, NULL, date_format(col1, 'yyyy-MM-dd') FROM source_table;
Anyway, this method is strictly bounded with columns order? In a real-life scenario, how could I handle lots of columns specifying a mapping, to avoid mistakes?
The syntax for inserting into a partitioned table when you want to list the specific columns is shown below. You don't need to put null on col3 since Hive will put a default value NULL since it is not in the column list during insert.
INSERT INTO TABLE dest_table PARTITION (day)(col1, col2, day)
SELECT col1, col2, date_format(col1, 'yyyy-MM-dd') FROM source_table;
Result:
col1 col2 col3 day
2018-03-22 12:02:04.222 test2 NULL 2018-03-22
2018-03-22 07:21:04.111 test3 NULL 2018-03-22
2018-03-21 17:08:04.401 test1 NULL 2018-03-21

How to use like operator outside SELECT

I have the following query which works fine.
SELECT A.*
FROM
(
SELECT 'ABC','DEF' FROM DUMMY
UNION ALL
SELECT col1,col2 FROM DUMMY
) A;
I want to apply the like key word on the first column returned by the nested query. I tried the following:
SELECT col1, col2
FROM
(
SELECT 'ABC','DEF' FROM DUMMY
UNION ALL
SELECT col1,col2 FROM DUMMY
) A
WHERE COL1 LIKE 'A%';
But the above query did not work. I'm getting the error: ORA-00904: "COL1": invalid identifier.
The fiddle link for the same is as follows: http://sqlfiddle.com/#!4/1c54b/7
Please could anybody guide me on how I can achieve the same?
Edit: I'm also trying to get the columns from the existing table as well.
Thanks in advance.
When using union, column names or aliases have to be matched on both tables.
You can call already defined column names and aliases.
Do not forget define column alias('ABC' as col1)
SELECT col1, col2
FROM
(
SELECT 'ABC' as col1,'DEF' as col2 FROM DUMMY
UNION ALL
SELECT 'GHI' as col1,'JKL' as col2 FROM DUMMY
) A
WHERE COL1 LIKE 'A%';
I think you are trying something like;
SELECT col1, col2
FROM
(
SELECT col1,col2 FROM DUMMY
UNION ALL
SELECT col1,col2 FROM DUMMY2
) A
WHERE col1 LIKE 'A%';
Have a look at another sample from me on here

Getting Error in query

update tablename set (col1,col2,col3) = (select col1,col2,col3 from tableName2 order by tablenmae2.col4) return error
Missing ). The query works fine if I remove the order by clause
ORDER BY is not allowed in a subquery within an UPDATE. So you get the error "Missing )" because the parser expects the subquery to end at the point that you have ORDER BY.
What is the ORDER BY intended to do?
What you probably have in mind is something like:
UPDATE TableName
SET (Col1, Col2, Col3) = (SELECT T2.Col1, T2.Col2, T2.Col3
FROM TableName2 AS T2
WHERE TableName.Col4 = T2.Col4
)
WHERE EXISTS(SELECT * FROM TableName2 AS T2 WHERE TableName.Col4 = T2.Col4);
This clumsy looking operation:
Grabs rows from TableName2 that match TableName on the value in Col4 and updates TableName with the values from the corresponding columns.
Ensures that only rows in TableName with a corresponding row in TableName2 are altered; if you drop the WHERE clause from the UPDATE, you replace the values in Col1, Col2, and Col3 with nulls if there are rows in TableName without a matching entry in TableName2.
Some DBMS also support an update-join notation to reduce the ghastliness of this notation.

Resources