Hive collect_list() does not collect NULL values

Hive collect_list() does not collect NULL values - hadoop

I am trying to collect a column with NULLs along with some values in that column...But collect_list ignores the NULLs and collects only the ones with values in it. Is there a way to retrieve the NULLs along with other values ?
SELECT col1, col2, collect_list(col3) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2;
Actual col3 values
0.9
NULL
NULL
0.7
0.6
Resulting col3 values
[0.9, 0.7, 0.6]
I was hoping that there is a hive solution that looks like this [0.9, NULL, NULL, 0.7, 0.6] after applying the collect_list.

This function works like this, but I've found the following workaround.
Add a case when statement to your query to check and keep NULLs.
SELECT col1,
col2,
collect_list(CASE WHEN col3 IS NULL THEN 'NULL' ELSE col3 END) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2
Now, because you had a string element ('NULL') the whole result set is an array of strings.
At the end just convert the array of strings to an array of double values.

Note: If your column is STRING it won't be having a NULL value even though your external file does not have any data for that column
you can a where condition with validation check like "col3 is NULL and
col3 is not NULL"

Related

Oracle Select unique on multiple column

How can I achieve this to Select to one row only dynamically since
the objective is to get the uniqueness even on multiple columns

select distinct
coalesce(least(ColA, ColB),cola,colb) A1, greatest(ColA, ColB) B1
from T

The best solution is to use UNION
select colA from your_table
union
select colB from your_table;
Update:
If you want to find the duplicate then use the EXISTS as follows:
SELECT COLA, COLB FROM YOUR_TABLE T1
WHERE EXISTS (SELECT 1 FROM YOUR_tABLE T2
WHERE T2.COLA = T1.COLB OR T2.COLB = T1.COLA)

If I correctly understand words: objective is to get the uniqueness even on multiple columns, number of columns may vary, table can contain 2, 3 or more columns.
In this case you have several options, for example you can unpivot values, sort, pivot and take unique values. The exact code depends on Oracle version.
Second option is listagg(), but it has limited length and you should use separators not appearing in values.
Another option is to compare data as collections. Here I used dbms_debug_vc2coll which is simple table of varchars. Multiset except does main job:
with t as (select rownum rn, col1, col2, col3,
sys.dbms_debug_vc2coll(col1, col2, col3) as coll
from test )
select col1, col2, col3 from t a
where not exists (
select 1 from t b where b.rn < a.rn and a.coll multiset except b.coll is empty )
dbfiddle with 3-column table, nulls and different test cases

Fetch name based on comma-separated ids

I have two tables, customers and products.
products:
productid name
1 pro1
2 pro2
3 pro3
customers:
id name productid
1 cust1 1,2
2 cust2 1,3
3 cust3
i want following result in select statement,
id name productid
1 cust1 pro1,pro2
2 cust2 pro1,pro3
3 cust3
i have 300+ records in both tables, i am beginner to back end coding, any help?

Definitely a poor database design but the bad thing is that you have to live with that. Here is a solution which I created using recursive query. I don't see the use of product table though since your requirement has nothing to do with product table.
with
--Expanding each row seperated by comma
tab(col1,col2,col3) as (
Select distinct c.id,c.prdname,regexp_substr(c.productid,'[^,]',1,level)
from customers c
connect by regexp_substr(c.productid,'[^,]',1,level) is not null
order by 1),
--Appending `Pro` to each value
tab_final as ( Select col1,col2, case when col3 is not null
then 'pro'||col3
else col3
end col3
from tab )
--Displaying result as expected
SELECT
col1,
col2,
LISTAGG(col3,',') WITHIN GROUP( ORDER BY col1,col2 ) col3
FROM
tab_final
GROUP BY
col1,
col2
Demo:
--Preparing dataset
With
customers(id,prdname,productid) as ( Select 1, 'cust1', '1,2' from dual
UNION ALL
Select 2, 'cust2','1,3' from dual
UNION ALL
Select 3, 'cust3','' from dual),
--Expanding each row seperated by comma
tab(col1,col2,col3) as (
Select distinct c.id,c.prdname,regexp_substr(c.productid,'[^,]',1,level)
from customers c
connect by regexp_substr(c.productid,'[^,]',1,level) is not null
order by 1),
--Appending `Pro` to each value
tab_final as ( Select col1,col2, case when col3 is not null
then 'pro'||col3
else col3
end col3
from tab )
--Displaying result as expected
SELECT
col1,
col2,
LISTAGG(col3,',') WITHIN GROUP( ORDER BY col1,col2 ) col3
FROM
tab_final
GROUP BY
col1,
col2
PS: While using don't forget to put your actual table columns as in my example it may vary.

Hive Getting only max occurrence of a value

I have hive table which has two cloumns,I want to get the value which occured max number of times
For example in my below table a value occured twice and c only once , here a value is dominat so I want only a value as shown in output
col1 col2
a a_value1
a a_value2
a c_value3
b b_value1
OUTPUT:
col1 col2
a a_value1
b b_value1

You are looking for what statisticians call the mode. A pretty simple method is to use aggregation with a window function:
select col1, col2
from (select col1, col2, count(*) as cnt,
row_number() over (partition by col1 order by count(*) desc) as seqnum
from t
) t
where seqnum = 1;
The above query will return one value for each col1, even if there are ties. If you want all the values in the event of ties, then use rank() or dense_rank().

Distinct in XMLAGG function in oracle sql

problem in avoiding duplicates using XMLAGG function
A table which is having multiple records. where each record has one column contains repetitive date.
Using XMLAGG function in the following sql
select col1, col2, XMLAGG(XMLELEMENT(E, colname || ',')).EXTRACT('//text()')
from table
group by col1, col2
i get the following output
col1 col2 col3
hareesh apartment residential, commercial, residential, residential
But i need the following output as
col3 : residential, commercial.
Anyone help me

Try using a subquery to remove duplicates:
SELECT col1, col2, XMLAGG(XMLELEMENT(E, colname || ',')).EXTRACT('//text()')
FROM (SELECT DISTINCT col1, col2, colname FROM table)
GROUP BY col1, col2

Insert data into multiple table from one table using a procedure [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
I have one parent table which consists of around 150 columns. I need to get the records from the parent table and insert them into 11 different child tables, which have the column names and data types.

Oracle has a very convenient INSERT ALL command that can help in this case.
The syntax of unconditional version is:
INSERT ALL
INTO child1( col1, col2, col3 ) VALUES( col1, col2, col3 )
INTO child2 VALUES( col1, col2, col3 )
INTO child3 ( col1, col2, col3 )
INTO child4
SELECT col1, col2, col3
FROM parent
-- WHERE some conditions;
A link to a demo: --> http://sqlfiddle.com/#!4/3eb62/1
The above command retrieves all rows from the parent table using SELECT ... FROM ... (at the bottom), then, for each retrieved record, it executes all INSERT ... statements.
If the SELECT clause has also a WHERE conditions clause, then only rows that meet these conditions will be inserted.
The INSERT part of the query in the example could have various forms:
A full form with explicitely definied columns of source and destination tables:
INTO dest_table( destcol1, ... destcolN ) VALUES (sourcecol1, ..., sourcecolN)
A shortened form where only columns from source table are given
INTO dest_table VALUES (sourcecol1, ..., sourcecolN)
Another shortened form where only columns from destination table are given
INTO dest_table( destcol1, ... destcolN )
or the simplest:
INTO dest_table
INSERT ALL has also a conditional version:
INSERT ALL
WHEN 1=1 THEN INTO child1( col1, col2, col3 ) VALUES( col1, col2, col3 )
WHEN col1 <> 2 THEN INTO child2 VALUES( col1, col2, col3 )
WHEN col3 < 3 THEN INTO child3 ( col1, col2, col3 )
WHEN col2 = 'rec 3' THEN INTO child4
SELECT col1, col2, col3
FROM parent;
A link to a demo: ---> http://sqlfiddle.com/#!4/e7da3/1
This version inserts rows only when a condition specified after WHEN clause is meet.For each selected rows always all conditions are evaluated.
There is also another conditional form: INSERT FIRST
INSERT FIRST
WHEN col1 >= 4 THEN INTO child1( col1, col2, col3 ) VALUES( col1, col2, col3 )
WHEN col1 >= 3 THEN INTO child2 VALUES( col1, col2, col3 )
WHEN col1 >= 2 THEN INTO child3 ( col1, col2, col3 )
WHEN col1 >= 1 THEN INTO child4
SELECT col1, col2, col3
FROM parent;
A link to a demo: http://sqlfiddle.com/#!4/a421e/1
Here, for each source row, Oracle evaluates conditions from top to bottom, and when some condition is true, then executes only this one INSERT statement, and skips remaining inserts.
------- EDIT -------
An example how to do it in a procedural way:
CREATE OR REPLACE PROCEDURE name
AS
BEGIN
INSERT ALL
INTO child1( col1, col2, col3 ) VALUES( col1, col2, col3 )
INTO child2 VALUES( col1, col2, col3 )
INTO child3 ( col1, col2, col3 )
INTO child4
SELECT col1, col2, col3
FROM parent ;
-- if commit is required, place it here
-- COMMIT;
END;
/

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Hive collect_list() does not collect NULL values - hadoop

Note: If your column is STRING it won't be having a NULL value even though your external file does not have any data for that column you can a where condition with validation check like "col3 is NULL and col3 is not NULL"

Related

Oracle Select unique on multiple column

Fetch name based on comma-separated ids

Hive Getting only max occurrence of a value

Distinct in XMLAGG function in oracle sql

Insert data into multiple table from one table using a procedure [closed]

Categories

Resources