how to use Posexplode function in hive - bash

I am using posexplode to split single to multiple records in hive.
Along with multiple records as output i need to generate sequence number for each row.
col1, col2, col3 and col4 are defined as string because rarely we get alpha data as well.
col1 | col2| col3 | col4
---------------------------
7 | 9 | A | 3
5 | 6 | 9
Seq | Col
----------
1 | 7
2 | 9
3 | A
4 | 3
1 | 5
2 | 6
3 | 9
I am using below mentioned query but I am getting error
-bash: syntax error near unexpected token (
My query is :
SELECT
seq, col
FROM
(SELECT array( col1, col2 , col3,col4) as arr_r FROM srctable ) arrayrec
LATERAL VIEW posexplode(arrayrec) EXPLODED_rec as seq, col
How can this be resolved
I am able to run successfully this query :
SELECT col FROM
(SELECT array( col1, col2 , col3,col4)
as arr_r FROM srctable ) arrayrec
LATERAL VIEW explode(arrayrec) EXPLODED_rec as col
Which produces below output
Col
-----
7
9
A
3
5
6
9
I have checked the link : How to get first n elements in an array in Hive

Try
SELECT Seq, col FROM
(SELECT array( col1, col2 , col3,col4)
as arr_r FROM srctable ) arrayrec
LATERAL VIEW posexplode(arrayrec.arr_r) EXPLODED_rec as Seq, col;
Also check your hive version. posexplode() is available as of Hive 0.13.0.

Related

Retrieving and displaying data from oracle database

First of all, I would like to THANK YOU for stopping by and spending your precious time for looking at my problem.
I have 2 different tables within an Oracle Database.
The first table holds metadata about the columns present in the other table. Think of the first (COL_TAB) table as a custom version of the ALL_TAB_COLS which comes by default with Oracle.
COL_TAB
----------------------------------------------
| TABLE_NAME | COL_NAME | COL_DESC |
----------------------------------------------
| TABLE1 | TAB1_COL_2 | TABLE 1 COLUMN 2 |
| TABLE1 | TAB1_COL_4 | TABLE 1 COLUMN 4 |
| TABLE1 | TAB1_COL_3 | TABLE 1 COLUMN 3 |
| TABLE1 | TAB1_COL_5 | |
| TABLE1 | TAB1_COL_1 | TABLE 1 COLUMN 1 |
----------------------------------------------
TABLE1
--------------------------------------------------------------------
| TAB1_COL_3 | TAB1_COL_1 | TAB1_COL_5 | TAB1_COL_2 |
--------------------------------------------------------------------
| TAB1_COL3_DATA1 | TAB1_COL1_DAT | TAB1_COL5_DAT2 | TAB1_COL2_DAT |
| TAB1_COL3_DATA2 | TAB1_COL1_DAT | TAB1_COL5_DAT1 | TAB1_COL2_DAT |
| TAB1_COL3_DATA3 | TAB1_COL1_DAT | TAB1_COL5_DAT3 | TAB1_COL2_DAT |
--------------------------------------------------------------------
I want to display the data as 2 different outputs:
FIRST OUTPUT:
------------------------------------------------------------------------------------------------
| TABLE 1 COLUMN 3 | TABLE 1 COLUMN 1 | TAB1_COL_5 | TABLE 1 COLUMN 2 | TABLE 1 COLUMN 4 |
------------------------------------------------------------------------------------------------
-> In case, if the COL_DESC is blank or null, then the COL_NAME needs to be displayed in the output.
-> "TABLE 1 COLUMN 3" AND "TABLE 1 COLUMN 1" always need to be displayed as 1st and 2nd column followed by the rest of the columns.
-> In case, if any column defined within the COL_TAB table isn't being used in TABLE1, then such a column needs to be displayed at the last column in the output,
for example, the TAB1_COL_4 isn't being used in TABLE1, so it is being displayed in the last.
SECOND OUTPUT:
------------------------------------------------------------------------------------------------
| TAB1_COL3_DATA1 | TAB1_COL1_DAT | TAB1_COL5_DAT2 | TAB1_COL2_DAT | |
| TAB1_COL3_DATA2 | TAB1_COL1_DAT | TAB1_COL5_DAT1 | TAB1_COL2_DAT | |
| TAB1_COL3_DATA3 | TAB1_COL1_DAT | TAB1_COL5_DAT3 | TAB1_COL2_DAT | |
------------------------------------------------------------------------------------------------
-> The order of the COLUMNS in the SECOND OUTPUT needs to be in sync with the order of columns as displayed within in the FIRST OUTPUT.
I did try the below query for displaying the FIRST OUTPUT, but it isn't working (I'm sure it's not correct):
SELECT NVL(COL_DESC, COL_NAME) AS COL_TEXT
FROM COL_TAB
WHERE TABLE_NAME = 'TABLE1'
PIVOT(MIN(COL_TEXT)
FOR COL_TEXT IN (SELECT COL_NAME FROM COL_TAB WHERE TABLE_NAME = 'TABLE1'));
In case, if anything isn't clear please do let me know. I would try my best to explain it again. Thanks again for your help in advance.
You can get the column description/names - in a deterministic order - with something like:
select coalesce(ct.col_desc, ct.col_name)
from col_tab ct
left join user_tab_columns utc
on utc.table_name = ct.table_name and utc.column_name = ct.col_name
where ct.table_name = 'TABLE1'
order by utc.column_id, ct.col_name;
COALESCE(CT.COL_
----------------
TABLE 1 COLUMN 3
TABLE 1 COLUMN 1
TAB1_COL_5
TABLE 1 COLUMN 2
TABLE 1 COLUMN 4
Pivoting those rows to columns would need to be done dynamically.
You can also generate a dynamic query to get the data in the same order in a similar way.
This uses SQL*Plus (or SQLcl, or SQL Developer) bind variable ref cursors to get the two outputs, and uses a table name defined within the block; but could easily be adapted to be a procedure that is passed the table name and have out parameters for the ref cursors:
var rc1 refcursor;
var rc2 refcursor;
declare
l_table_name varchar2(30) := 'TABLE1';
l_stmt varchar2(4000);
begin
select 'select '
|| listagg('''' || coalesce(ct.col_desc, ct.col_name) || '''', ',')
within group (order by utc.column_id, ct.col_name)
|| ' from dual'
into l_stmt
from col_tab ct
left join user_tab_columns utc
on utc.table_name = ct.table_name and utc.column_name = ct.col_name
where ct.table_name = l_table_name;
dbms_output.put_line(l_stmt);
open :rc1 for l_stmt;
select 'select '
|| listagg(coalesce(utc.column_name, 'null') || ' as ' || ct.col_name, ',')
within group (order by utc.column_id, ct.col_name)
|| ' from ' || l_table_name
into l_stmt
from col_tab ct
left join user_tab_columns utc
on utc.table_name = ct.table_name and utc.column_name = ct.col_name
where ct.table_name = l_table_name;
dbms_output.put_line(l_stmt);
open :rc2 for l_stmt;
end;
/
Running the block gets dbms_output of the statements just for debugging, but might be of interest:
select 'TABLE 1 COLUMN 3','TABLE 1 COLUMN 1','TAB1_COL_5','TABLE 1 COLUMN 2','TABLE 1 COLUMN 4' from dual
select TAB1_COL_3 as TAB1_COL_3,TAB1_COL_1 as TAB1_COL_1,TAB1_COL_5 as TAB1_COL_5,TAB1_COL_2 as TAB1_COL_2,null as TAB1_COL_4 from TABLE1
and then you can print the ref cursors (again, client-specific behaviour):
print rc1
'TABLE1COLUMN3' 'TABLE1COLUMN1' 'TAB1_COL_ 'TABLE1COLUMN2' 'TABLE1COLUMN4'
---------------- ---------------- ---------- ---------------- ----------------
TABLE 1 COLUMN 3 TABLE 1 COLUMN 1 TAB1_COL_5 TABLE 1 COLUMN 2 TABLE 1 COLUMN 4
print rc2
TAB1_COL_3 TAB1_COL_1 TAB1_COL_5 TAB1_COL_2 TAB1_COL_4
--------------- ------------- -------------- ------------- ----------
TAB1_COL3_DATA1 TAB1_COL1_DAT TAB1_COL5_DAT2 TAB1_COL2_DAT
TAB1_COL3_DATA2 TAB1_COL1_DAT TAB1_COL5_DAT1 TAB1_COL2_DAT
TAB1_COL3_DATA3 TAB1_COL1_DAT TAB1_COL5_DAT3 TAB1_COL2_DAT
Those 2 columns are common across all the tables.
In that case you can use a case expression to extend the ordering logic:
within group (order by case ct.col_name
when 'TAB1_COL_3' then 1
when 'TAB1_COL_1' then 2
else 3 end,
utc.column_id, ct.col_name)
which then gets:
'TABLE1COLUMN3' 'TABLE1COLUMN1' 'TAB1_COL_ 'TABLE1COLUMN2' 'TABLE1COLUMN4'
---------------- ---------------- ---------- ---------------- ----------------
TABLE 1 COLUMN 3 TABLE 1 COLUMN 1 TAB1_COL_5 TABLE 1 COLUMN 2 TABLE 1 COLUMN 4
TAB1_COL_3 TAB1_COL_1 TAB1_COL_5 TAB1_COL_2 TAB1_COL_4
--------------- ------------- -------------- ------------- ----------
TAB1_COL3_DATA1 TAB1_COL1_DAT TAB1_COL5_DAT2 TAB1_COL2_DAT
TAB1_COL3_DATA2 TAB1_COL1_DAT TAB1_COL5_DAT1 TAB1_COL2_DAT
TAB1_COL3_DATA3 TAB1_COL1_DAT TAB1_COL5_DAT3 TAB1_COL2_DAT
or possibly using the description instead of the name, depending on whether it's the name or description that stays the same (hard to guess from the example).
It would be really great if you could show up how pivoting can be done dynamically.
it isn't really needed here in the end, and is more complicated than the listagg I used above; but you could do something like;
select '
select * from (
select row_number()
over (order by case ct.col_name
when ''TAB1_COL_3'' then 1
when ''TAB1_COL_1'' then 2
else 3
end,
utc.column_id, ct.col_name) as pos,
coalesce(ct.col_desc, ct.col_name) as name
from col_tab ct
left join user_tab_columns utc
on utc.table_name = ct.table_name and utc.column_name = ct.col_name
where ct.table_name = :tab
)
pivot (max(name) as col for (pos) in ('
|| listagg(level, ',') within group (order by level)
|| '))'
into l_stmt
from dual
connect by level <= (select count(*) from col_tab where table_name = l_table_name);
dbms_output.put_line(l_stmt);
open :rc1 for l_stmt using l_table_name;
which gets output showing the generated dynamic query as:
select * from (
select row_number()
over (order by case ct.col_name
when 'TAB1_COL_3' then 1
when 'TAB1_COL_1' then 2
else 3
end,
utc.column_id, ct.col_name) as pos,
coalesce(ct.col_desc, ct.col_name) as name
from col_tab ct
left join user_tab_columns utc
on utc.table_name = ct.table_name and utc.column_name = ct.col_name
where ct.table_name = :tab
)
pivot (max(name) as col for (pos) in (1,2,3,4,5))
and result set as:
1_COL 2_COL 3_COL 4_COL 5_COL
---------------- ---------------- ---------------- ---------------- ----------------
TABLE 1 COLUMN 3 TABLE 1 COLUMN 1 TAB1_COL_5 TABLE 1 COLUMN 2 TABLE 1 COLUMN 4
You could use the column names for the pivot instead of the pos, it would just make it even harder to read I think, as you'd need to include quotes around them.

Oracle query by column1 where column2 is the same

I have a table like this in Oracle 9i DB:
+------+------+
| Col1 | Col2 |
+------+------+
| 1 | a |
| 2 | a |
| 3 | a |
| 4 | b |
| 5 | b |
+------+------+
Col1 is the primary key, Col2 is indexed.
I input col1 as condition for my query and I want to get col1 where col2 is the same as my input.
For example I query for 1 and the result should be 1,2,3.
I know I can use self join for this, I would like to know if there is a better way to do this.
I'd call this a semi-join: does it satisfy your 'no self joins' requirement?:
SELECT *
FROM YourTable
WHERE Col2 IN ( SELECT t2.Col2
FROM YourTable t2
WHERE t2.Col1 = 1 );
I'd be inclined to avoid the t2 range variable like this:
WITH YourTableSearched
AS ( SELECT Col2
FROM YourTable
WHERE Col1 = 1 )
SELECT *
FROM YourTable
WHERE Col2 IN ( SELECT Col2
FROM YourTableSearched );
but TNH I would probably do this:
WITH YourTableSearched
AS ( SELECT Col2
FROM YourTable
WHERE Col1 = 1 )
SELECT *
FROM YourTable
NATURAL JOIN YourTableSearched;
It's possible. Whether it's better (i.e. more performant) than using a self-join, particularly if there is an index on col1, col2, is anyone's guess.
Assuming col1 is unique, you could do:
SELECT col1
FROM (SELECT col1,
col2,
MAX(CASE WHEN col1 = :p_col1_value THEN col2 END) OVER () col2_comparison
FROM your_table)
WHERE col2 = col2_comparison;
And with :p_col1_value = 1:
COL1
----------
1
2
3
And with :p_col1_value = 5:
COL1
----------
4
5

Hive - Split delimited columns over multiple rows, select based on position

I'm Looking for a way to split the column based on comma delimited data. Below is my dataset
id col1 col2
1 5,6 7,8
I want to get the result
id col1 col2
1 5 7
1 6 8
The position of the index should match because I need to fetch results accordingly.
I tried the below query but it returns the cartesian product.
Query:
SELECT col3, col4
FROM test ext
lateral VIEW explode(split(col1,'\002')) col1 AS col3
lateral VIEW explode(split(col2,'\002')) col2 AS col4
Result:
id col1 col2
1 5 7
1 5 8
1 6 7
1 6 8
You can use posexplode() to create position index columns for your split arrays. Then, select only those rows where the position indices are equal.
SELECT id, col3, col4
FROM test
lateral VIEW posexplode(split(col1,'\002')) col1 AS pos3, col3
lateral VIEW posexplode(split(col2,'\002')) col2 AS pos4, col4
WHERE pos3 = pos4;
Output:
id col3 col4
1 5 7
1 6 8
Reference: Hive language manual - posexplode()

How to split two columns data in to two rows apart from using union? [duplicate]

I have 50 column in a table and it returns only one row and I want that one row with 50 column to be displayed in 50 rows and one column.
Can any one suggest me the Oracle query for it?
You can use UNPIVOT for one row like this to get only column with values
SELECT colvalue
FROM
(
SELECT *
FROM Table1
UNPIVOT INCLUDE NULLS
(
colvalue FOR cols IN (col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, ... col50)
)
);
Sample output:
| COLVALUE |
------------
| 1 |
| 2 |
| (null) |
|..........|
If you need column with column names from your pivoted table just ditch the outer select
SELECT *
FROM Table1
UNPIVOT INCLUDE NULLS
(
colvalue FOR cols IN (col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, ... col50)
);
Sample output:
| COLS | COLVALUE |
--------------------
| COL1 | 1 |
| COL2 | 2 |
| COL3 | (null) |
| ..... |......... |
Here is SQLFiddle demo
Be prepared for a lot of typing :) Oracle has UNPIVOT functionality but it wants at least two columns in the result, so it won't work for your situation.
First off, you'll need a counter from 1 to 50. You can query one like this:
SELECT LEVEL as Counter FROM DUAL CONNECT BY LEVEL <= 50
If you execute this query you'll get the numbers 1-50 as your result. With that as a basis, here's the full(ish) query:
WITH Cols AS (
SELECT LEVEL as Counter
FROM DUAL
CONNECT BY LEVEL <= 50
)
SELECT
CASE Cols.Counter
WHEN 1 THEN Col1
WHEN 2 THEN Col2
WHEN 3 THEN Col3
. . .
WHEN 50 THEN Col50
END AS myColumn
FROM myTable
CROSS JOIN Cols
ORDER BY Cols.Counter
Note that all of the columns must be the same data type, so if you have a mixture of character, number and date you'll need to convert them all to character.
Note that this query assumes one row in the table, as mentioned in the question. If there's more than one row you should end with ORDER BY a-column-that-identifies-the-row, Cols.Counter.

Oracle query to convert multiple column into one column

I have 50 column in a table and it returns only one row and I want that one row with 50 column to be displayed in 50 rows and one column.
Can any one suggest me the Oracle query for it?
You can use UNPIVOT for one row like this to get only column with values
SELECT colvalue
FROM
(
SELECT *
FROM Table1
UNPIVOT INCLUDE NULLS
(
colvalue FOR cols IN (col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, ... col50)
)
);
Sample output:
| COLVALUE |
------------
| 1 |
| 2 |
| (null) |
|..........|
If you need column with column names from your pivoted table just ditch the outer select
SELECT *
FROM Table1
UNPIVOT INCLUDE NULLS
(
colvalue FOR cols IN (col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, ... col50)
);
Sample output:
| COLS | COLVALUE |
--------------------
| COL1 | 1 |
| COL2 | 2 |
| COL3 | (null) |
| ..... |......... |
Here is SQLFiddle demo
Be prepared for a lot of typing :) Oracle has UNPIVOT functionality but it wants at least two columns in the result, so it won't work for your situation.
First off, you'll need a counter from 1 to 50. You can query one like this:
SELECT LEVEL as Counter FROM DUAL CONNECT BY LEVEL <= 50
If you execute this query you'll get the numbers 1-50 as your result. With that as a basis, here's the full(ish) query:
WITH Cols AS (
SELECT LEVEL as Counter
FROM DUAL
CONNECT BY LEVEL <= 50
)
SELECT
CASE Cols.Counter
WHEN 1 THEN Col1
WHEN 2 THEN Col2
WHEN 3 THEN Col3
. . .
WHEN 50 THEN Col50
END AS myColumn
FROM myTable
CROSS JOIN Cols
ORDER BY Cols.Counter
Note that all of the columns must be the same data type, so if you have a mixture of character, number and date you'll need to convert them all to character.
Note that this query assumes one row in the table, as mentioned in the question. If there's more than one row you should end with ORDER BY a-column-that-identifies-the-row, Cols.Counter.

Resources