Reduce Rows to a map struct in hsql - hadoop

I'm new to Hadoop/Hive and need to reduce a set of rows down into a map datatype, as follows.
From:
Col1 | Col2
-------------------------
Jeff | Smith
Steve | Brown
To:
Col1 | Col2
-------------------------
1 | {"Jeff":"Smith"}, {"Steve":"Brown"}

Does this work for you?
with myTable as (
select 'Jeff' as Col1, 'Smith' as Col2 union
select 'Steve' as Col1, 'Brown' as Col2
) -- test data
select str_to_map(concat_ws(",",(collect_list(concat_ws(":",Col1, Col2)))),",",":") as Col2
from myTable
;
+-----------------------------------+--+
| col2 |
+-----------------------------------+--+
| {"Jeff":"Smith","Steve":"Brown"} |
+-----------------------------------+--+
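To make the three nested steps in that query easier to follow, here is a minimal plain-Python model of them. The helper functions concat_ws and str_to_map below are hypothetical stand-ins written for this illustration; they only mimic the Hive built-ins of the same name for non-NULL string input.

```python
# Plain-Python model of the Hive expression
#   str_to_map(concat_ws(",", collect_list(concat_ws(":", Col1, Col2))), ",", ":")
# The helpers are illustrative stand-ins, not Hive APIs.

def concat_ws(sep, *parts):
    # Join the given strings with a separator (NULL handling is not modelled).
    return sep.join(parts)

def str_to_map(text, pair_delim, kv_delim):
    # Split the text into pairs, then split each pair into key and value.
    result = {}
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        result[key] = value
    return result

rows = [("Jeff", "Smith"), ("Steve", "Brown")]

# Step 1: one "key:value" string per row, gathered into a list (collect_list).
pairs = [concat_ws(":", col1, col2) for col1, col2 in rows]
# Step 2: the list is flattened into a single comma-separated string.
joined = concat_ws(",", *pairs)
# Step 3: the string is parsed back into a map.
col2 = str_to_map(joined, ",", ":")
print(col2)
```

Because the rows pass through a delimited string on the way to the map, a name that itself contains "," or ":" would be split in the wrong place, so the delimiters should be chosen with the data in mind.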

Related

Oracle: split function result into multiple columns

I have a package in Oracle. In the package I have a procedure that performs an INSERT INTO ... SELECT statement,
which looks like this:
insert into some_table(col1 , col2 , col3, col4)
select col1 , col2, my_func(col3) as new_col3 , col4
from some_other_table
my_func(col3) applies some logic to return a value.
Now I need to return two values instead of one, using the same logic.
I could simply write another function that applies the same logic and returns the second value, but that would be expensive because the function selects from a large history table.
I can't join to the history table because the function doesn't perform a simple select.
Is there a way to get two columns by calling this function only once?
Create an OBJECT type with two attributes and return that from your function. Something like:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TYPE my_func_type IS OBJECT(
  value1 NUMBER,
  value2 VARCHAR2(4000)
);
/
CREATE FUNCTION my_func
  RETURN my_func_type
IS
  value my_func_type;
BEGIN
  value := my_func_type( 42, 'The Meaning of Life, The Universe and Everything' );
  RETURN value;
END;
/
CREATE TABLE table1 (col1, col2, col5 ) AS
SELECT 1, 2, 5 FROM DUAL
/
Query 1:
SELECT col1,
       col2,
       t.my_func_value.value1 AS col3,
       t.my_func_value.value2 AS col4,
       col5
FROM (
  SELECT col1,
         col2,
         my_func() AS my_func_value,
         col5
  FROM table1
) t
Results:
| COL1 | COL2 | COL3 | COL4 | COL5 |
|------|------|------|--------------------------------------------------|------|
| 1 | 2 | 42 | The Meaning of Life, The Universe and Everything | 5 |

delete data from Table1 that doesn't exist in Table2

I have two different tables and I want to delete the records from Table1 that do not exist in Table2.
Table1:
select col1 from Table1
Table2:
select
concat('A_',col1)
from
Table2
where
Col2 = '748'
and Col3 = 'D'
and Col4 = 'Account'
Now I want to delete the difference from Table1...
This can be done using the minus operation, and an insert into statement.
insert into table3(col) (
select col1 from Table1
minus
select
concat('A_',col1)
from
Table2
where
Col2 = '748'
and Col3 = 'D'
and Col4 = 'Account'
)
Records can then be deleted from table1 using a delete statement like:
delete from table1
where col1 in (
select col1 from Table1
minus
select
concat('A_',col1)
from
Table2
where
Col2 = '748'
and Col3 = 'D'
and Col4 = 'Account'
)
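As a rough check of the set-difference delete above, here is a small sketch that runs the same shape of statement against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here: it spells MINUS as EXCEPT and concat() as ||, and the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (col1 TEXT);
    CREATE TABLE Table2 (col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 TEXT);
    -- Invented sample data: only 'A_1' has a matching, qualifying row in Table2.
    INSERT INTO Table1 VALUES ('A_1'), ('A_2'), ('B_9');
    INSERT INTO Table2 VALUES ('1', '748', 'D', 'Account'),
                              ('2', '999', 'D', 'Account');
""")

# Delete every Table1 row that is left over after subtracting the
# prefixed Table2 keys (EXCEPT plays the role of Oracle's MINUS).
conn.execute("""
    DELETE FROM Table1
    WHERE col1 IN (
        SELECT col1 FROM Table1
        EXCEPT
        SELECT 'A_' || col1
        FROM Table2
        WHERE Col2 = '748' AND Col3 = 'D' AND Col4 = 'Account'
    )
""")

remaining = [row[0] for row in conn.execute("SELECT col1 FROM Table1 ORDER BY col1")]
print(remaining)
```

With this data only the row that has a qualifying counterpart in Table2 survives the delete.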
delete from table1 t1
where not exists ( select * from table2 where col2 || col3 || col4 = t1.col1 );
This will work except in the following situation; you need to explain what you want in that case, and the DELETE statement can be modified to accommodate it.
If t1.col1 is NULL, the row will be deleted even if there are rows in table2 where col2, col3 and col4 are all NULL. Is that situation possible (where t1.col1 and col2, col3, col4 in table2 are all NULL)? In that case, should the row in t1 be kept rather than deleted?

Oracle query by column1 where column2 is the same

I have a table like this in Oracle 9i DB:
+------+------+
| Col1 | Col2 |
+------+------+
| 1 | a |
| 2 | a |
| 3 | a |
| 4 | b |
| 5 | b |
+------+------+
Col1 is the primary key, Col2 is indexed.
I supply a col1 value as the condition for my query, and I want to get every col1 whose col2 is the same as the col2 of the row I supplied.
For example, if I query for 1, the result should be 1, 2, 3.
I know I can use a self join for this; I would like to know if there is a better way to do it.
I'd call this a semi-join: does it satisfy your 'no self joins' requirement?:
SELECT *
FROM YourTable
WHERE Col2 IN ( SELECT t2.Col2
FROM YourTable t2
WHERE t2.Col1 = 1 );
I'd be inclined to avoid the t2 range variable like this:
WITH YourTableSearched
AS ( SELECT Col2
FROM YourTable
WHERE Col1 = 1 )
SELECT *
FROM YourTable
WHERE Col2 IN ( SELECT Col2
FROM YourTableSearched );
but TBH I would probably do this:
WITH YourTableSearched
AS ( SELECT Col2
FROM YourTable
WHERE Col1 = 1 )
SELECT *
FROM YourTable
NATURAL JOIN YourTableSearched;
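The semi-join and the NATURAL JOIN variant can be compared side by side on the question's sample data. The sketch below uses an in-memory SQLite database via Python's sqlite3 module as a stand-in for Oracle, so it says nothing about Oracle's execution plans; it only checks that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (Col1 INTEGER PRIMARY KEY, Col2 TEXT);
    INSERT INTO YourTable VALUES (1, 'a'), (2, 'a'), (3, 'a'), (4, 'b'), (5, 'b');
""")

# Semi-join: keep rows whose Col2 appears in the subquery result.
semi_join = conn.execute("""
    SELECT Col1
    FROM YourTable
    WHERE Col2 IN (SELECT t2.Col2 FROM YourTable t2 WHERE t2.Col1 = ?)
    ORDER BY Col1
""", (1,)).fetchall()

# NATURAL JOIN: the CTE exposes only Col2, so the join is on Col2 alone.
natural_join = conn.execute("""
    WITH YourTableSearched AS (SELECT Col2 FROM YourTable WHERE Col1 = ?)
    SELECT Col1
    FROM YourTable NATURAL JOIN YourTableSearched
    ORDER BY Col1
""", (1,)).fetchall()

print(semi_join, natural_join)
```

The NATURAL JOIN form depends on the CTE exposing exactly the column to match on; adding another shared column to the CTE would silently change the join condition.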
It's possible. Whether it's better (i.e. more performant) than using a self-join, particularly if there is an index on col1, col2, is anyone's guess.
Assuming col1 is unique, you could do:
SELECT col1
FROM (SELECT col1,
col2,
MAX(CASE WHEN col1 = :p_col1_value THEN col2 END) OVER () col2_comparison
FROM your_table)
WHERE col2 = col2_comparison;
And with :p_col1_value = 1:
COL1
----------
1
2
3
And with :p_col1_value = 5:
COL1
----------
4
5

Select rows with same id without nulls and one row with Null if multiple nulls are present

For each value of a username column, I want to get only the rows that have a value other than NULL.
If all rows for a particular username are NULL, the output should show NULL only once. If a username has both NULL rows and rows with some other value, only the non-NULL values should be displayed.
Below is a sample table and the expected output. How can this be done with a SQL query?
Table:
Col1 | Col2
-------------------------
a | abc
a | bc
b | null
b | null
c | der
c | null
Output:
Col1 | Col2
-------------------------
a | abc
a | bc
b | null
c | der
Outlining the idea; there might be some syntax errors, as I don't have access to Oracle.
SELECT * FROM
( SELECT DISTINCT USERNAME FROM <TABLE> ) USERS
LEFT OUTER JOIN
( SELECT USERNAME, COL2 FROM <TABLE> WHERE COL2 IS NOT NULL) USERS_COL2
ON
USERS.USERNAME = USERS_COL2.USERNAME
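The outer-join idea outlined above can be tried on the question's sample rows. This is a minimal sketch against an in-memory SQLite database via Python's sqlite3 module, used as a stand-in for Oracle; the table name sample and the column name username are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (username TEXT, col2 TEXT);
    INSERT INTO sample VALUES ('a', 'abc'), ('a', 'bc'),
                              ('b', NULL),  ('b', NULL),
                              ('c', 'der'), ('c', NULL);
""")

# Every distinct username appears once on the left side; the right side
# holds only non-NULL values, so a username with nothing but NULLs still
# yields exactly one row, with col2 padded to NULL by the outer join.
rows = conn.execute("""
    SELECT users.username, users_col2.col2
    FROM (SELECT DISTINCT username FROM sample) users
    LEFT OUTER JOIN
         (SELECT username, col2 FROM sample WHERE col2 IS NOT NULL) users_col2
      ON users.username = users_col2.username
    ORDER BY users.username, users_col2.col2
""").fetchall()

for row in rows:
    print(row)
```

Selecting the two columns explicitly, rather than SELECT *, avoids returning the username twice.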
You can use listagg() or stragg():
drop table test;
create table test (
col1 varchar2(10),
col2 varchar2(10)
);
insert into test values ( 'a','abc');
insert into test values ( 'a','abc');
insert into test values ( 'b',null);
insert into test values ( 'b',null);
insert into test values ( 'c','der');
insert into test values ( 'c',null);
commit;
select col1,
listagg (col2,',') within group (order by col1) col2
from test
group by col1;
COL1 COL2
---------- -----------
a abc,abc
b
c der
select col1, stragg (col2)
from test
group by col1;
select col1, col2, count(*)
from omc.test
group by col1,col2;
You can remove count(*) if you don't need it.

Need to transform the rows into columns for the similar ID's in oracle

I need to transform the rows into columns for the similar ID's in oracle
e.g.
The following is the result I will get if i query my database
Col1 Col2 Col3
---- ---- ----
1 ABC Yes
1 XYZ NO
2 ABC NO
I need to transform this into
Col1 Col2 Col3 Col4 Col5
---- ---- ---- ---- ----
1 ABC Yes XYZ No
2 ABC NO NULL NULL
Someone please help me in solving this issue
Thanks,
Siv
Based on AskTom:
select Col1,
max( decode( rn, 1, Col2 ) ) Col_1,
max( decode( rn, 1, Col3 ) ) Col_2,
max( decode( rn, 2, Col2 ) ) Col_3,
max( decode( rn, 2, Col3 ) ) Col_4
from (
select Col1,
Col2,
Col3,
row_number() over (partition by Col1 order by Col2 desc nulls last) rn
from MyTable
)
group by Col1;
I don't have access to an Oracle DB to test it, but I think that will work. If there could be more than two records per ID, you could just add more rows to the select clause with the corresponding row number.
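Since that query was posted untested, here is a rough sketch of the same numbering-then-aggregating technique run against an in-memory SQLite database via Python's sqlite3 module. Several things differ from the Oracle original and are assumptions of this sketch: SQLite has no decode(), so CASE expressions are used instead; window functions need SQLite 3.25 or later; and the rows are numbered by Col2 ascending so that the result matches the layout requested in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (Col1 INTEGER, Col2 TEXT, Col3 TEXT);
    INSERT INTO MyTable VALUES (1, 'ABC', 'Yes'), (1, 'XYZ', 'NO'), (2, 'ABC', 'NO');
""")

# Number the rows within each Col1 group, then pick out row 1 and row 2
# with conditional aggregation. MAX ignores the NULLs produced by CASE.
rows = conn.execute("""
    SELECT Col1,
           MAX(CASE WHEN rn = 1 THEN Col2 END) AS Col_1,
           MAX(CASE WHEN rn = 1 THEN Col3 END) AS Col_2,
           MAX(CASE WHEN rn = 2 THEN Col2 END) AS Col_3,
           MAX(CASE WHEN rn = 2 THEN Col3 END) AS Col_4
    FROM (
        SELECT Col1, Col2, Col3,
               ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2) AS rn
        FROM MyTable
    )
    GROUP BY Col1
    ORDER BY Col1
""").fetchall()

for row in rows:
    print(row)
```

Groups with fewer than two rows get NULL in the trailing columns, because no row in the group carries rn = 2.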
One solution is to use the 10g MODEL clause:
select col1
     , col2
     , col3
     , col4
     , col5
from t23
model
  return updated rows
  partition by ( col1 )
  dimension by ( row_number() over ( partition by col1
                                     order by col2 desc nulls last) rnk
               )
  measures (col2, col3, lpad(' ',4) col4, lpad(' ',4) col5)
  rules upsert
  (
    col2 [0] = col2 [1]
  , col3 [0] = col3 [1]
  , col4 [0] = col2 [2]
  , col5 [0] = col3 [2]
  )
/
COL1 COL2 COL3 COL4 COL5
---------- ---- ---- ---- ----
1 ABC Yes ABC NO
2 XYZ NO
It is an unfortunate truth about such solutions that we need to specify the number of columns in the query. That is, in regular SQL there is no mechanism for determining that the table contains three rows where COL1 = 1 so we need seven columns, which is not unreasonable. For situations in which the number of pivot values is unknown at the time of coding, there is always dynamic SQL.
