Explode Hive Map data object into long format - hadoop

I have a map data type in a table with a fairly large number of key/value pairs (10-30). When I explode the keys and values, I get the output below:
SELECT id, key, value
FROM tbl1
LATERAL VIEW explode(map_field) feature_cols AS key, value
Results:
id, key1, value1
id, key2, value2
id, key3, value3
However, I would like to see:
id, key1, key2, key3
1, value1, value2, value3
Is there any command that either produces my desired format directly, or converts the exploded output into the wide format shown above?

You need to pivot the exploded rows back into columns. You can write a query like the one below. Note that the conditional values have to be aggregated (here with max) and grouped by id, so that the several rows per id collapse into one:
SELECT
  id,
  max(CASE WHEN key = 'key1' THEN value END) AS key1,
  max(CASE WHEN key = 'key2' THEN value END) AS key2,
  max(CASE WHEN key = 'key3' THEN value END) AS key3
FROM
  (SELECT id, key, value
   FROM tbl1
   LATERAL VIEW explode(map_field) feature_cols AS key, value) temp
GROUP BY id;
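Since map_field is a map, a simpler alternative is to skip the explode entirely and index the map by key. A minimal sketch, assuming the key names (key1, key2, key3) are known up front:
SELECT id,
       map_field['key1'] AS key1,
       map_field['key2'] AS key2,
       map_field['key3'] AS key3
FROM tbl1;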


Oracle: Split data from one column into two columns based on data

I have a strange situation.
a) I have to create a Table B from Table A plus some custom columns. For example,
Table B should have a few columns of Table A and some additional columns (static values like NULL, a static string, and the system timestamp).
b) One column in Table A needs to be split into two columns in Table B.
Example: data in column X is [A1234, B5678, 0000, 1111].
Table B should have two columns: AlphaColumn [A1234, B5678] and NumberishColumn [0000, 1111].
The difference: the first character of the data can be a letter. That's the only distinguishing criterion.
How can I do this in one query?
You can use a CASE expression and simple string functions:
INSERT INTO table_b (firstname, lastname, alphacolumn, numberishcolumn)
SELECT firstname,
       lastname,
       CASE
           WHEN SUBSTR(employeeid, 1, 1) BETWEEN '0' AND '9'
           THEN NULL
           ELSE employeeid
       END,
       CASE
           WHEN SUBSTR(employeeid, 1, 1) BETWEEN '0' AND '9'
           THEN employeeid
           ELSE NULL
       END
FROM table_a;
Or, you could create table_b as a VIEW instead of another table.
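A minimal sketch of that view alternative (assuming the same source columns as above):
CREATE OR REPLACE VIEW table_b AS
SELECT firstname,
       lastname,
       CASE
           WHEN SUBSTR(employeeid, 1, 1) BETWEEN '0' AND '9' THEN NULL
           ELSE employeeid
       END AS alphacolumn,
       CASE
           WHEN SUBSTR(employeeid, 1, 1) BETWEEN '0' AND '9' THEN employeeid
           ELSE NULL
       END AS numberishcolumn
FROM table_a;
A view avoids copying the data and stays in sync with table_a automatically.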

Hive parse and edit array to struct field

I have a requirement involving Hive complex data structures, which I'm new to. I've tried a few things that didn't work out. I'd like to know whether there is a solution or whether I'm looking at a dead end.
Requirement :
Table1 and Table2 have the same CREATE syntax. I want to select all columns from table1 and insert them into table2, with a few column values modified along the way. For a plain struct field I can make it work using named_struct.
But if table1 has an array<struct<...>> type, then I'm not sure how to make it work.
eg.,
CREATE TABLE IF NOT EXISTS table1 (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#' ;
CREATE TABLE IF NOT EXISTS table2 (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#' ;
hive> select * from table1 ;
OK
1 [{"x":1,"y":2,"z":3},{"x":4,"y":5,"z":6},{"x":7,"y":8,"z":9}]
2 [{"x":4,"y":5,"z":6},{"x":7,"y":8,"z":9}]
How can I update a struct field inside the array while inserting? Let's say if the struct field y is 5, then I want it inserted as 0.
For the complex struct type you can use the Brickhouse UDFs. Download the jar and add it in your script:
add jar hdfs://path_where_jars_are_downloaded/brickhouse-0.6.0.jar
Create a collect function:
create temporary function collect_arrayofstructs as 'brickhouse.udf.collect.CollectUDAF';
Query: replace the y value with 0. The array has to be exploded first, then the structs are rebuilt and re-collected per ID:
SELECT ID,
       collect_arrayofstructs(
         named_struct(
           'x', p.x,
           'y', 0.0,  -- y replaced with 0.0 (a double, to match the struct field type)
           'z', p.z
         )) AS XYZ
FROM table1
LATERAL VIEW explode(XYZ) t AS p
GROUP BY ID;
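On recent Hive versions (0.13 and later) the built-in collect_list UDAF accepts non-primitive arguments, so it can often stand in for the Brickhouse function. A sketch under that assumption:
SELECT ID,
       collect_list(named_struct('x', p.x, 'y', 0.0, 'z', p.z)) AS XYZ
FROM table1
LATERAL VIEW explode(XYZ) t AS p
GROUP BY ID;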

drop down list formula in Quickbase

I have a simple question: is it possible to fill a drop-down list with two values from a table?
I have a table with field X and field Y, and I want the drop-down in my form to show:
Value1 YField - Value1 XField
Value2 YField - Value2 XField
Value3 YField - Value3 XField
...
Or do I have no choice but to add another drop-down to select my value from, and put a Text (formula) field under it that uses its value to build what I want?
I would like to avoid overloading the form if possible.
Thank you!
So I have solved my problem:
What I did is add a field to my table holding the concatenated value
YField - XField
Then in my drop-down list I link a reference to that field.
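For reference, the concatenated field can be built with a Text (formula) field along these lines (a sketch; the field names are placeholders for your own):
[YField] & " - " & [XField]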

Output a tuple of one record and all other records Hadoop

I need to emit a tuple from the Reducer that pairs one of the records with each of the other records. For example, if the Reducer gets:
key1, value1
key1, value2
key1, value3
key1, value4
key2, value5
key2, value6
key2, value7
I need to select one of the values for each key somehow and emit it with all the other values. Concretely, let's say that value2 is selected for key1 and value6 is selected for key2. I need to emit the following from the reducer:
key1, value1, value2
key1, value2, value2
key1, value3, value2
key1, value4, value2
key2, value5, value6
key2, value6, value6
key2, value7, value6
As you can see, I need to go through the values twice: once to find the selected value (value2, for example) and once to emit it with all the other records.
As far as I am aware, you cannot iterate through the values twice, and my assumption is that the dataset is too large to fit in memory, so I cannot buffer the values.
My other idea is to emit each record twice under two different keys, for example key1a and key1b, so that I could iterate through the values once to make the selection, and a second time to emit the records. Is there any other way to do it?
I don't want to split this into multiple jobs, since this is only one part of a much larger process.

Reading CSV with Column header and loading it in hive tables

I have a CSV file with the column header inside the file.
e.g.
Column1 Column2 Column3
value1 value2 value3
value1 value2 value3
value1 value2 value3
value1 value2 value3
Now I want to create a Hive table using this header and then load the file, minus the header line, into the table.
Can anyone please suggest what approach should be followed in this case?
You can specify
tblproperties ("skip.header.line.count"="1");
see this SO question (Hive External table-CSV File- Header row)
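A minimal sketch of how that property fits into a table definition (the table name, columns, delimiter, and location are assumptions):
CREATE EXTERNAL TABLE my_csv_table (
  column1 STRING,
  column2 STRING,
  column3 STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/path/to/csv/dir'
TBLPROPERTIES ("skip.header.line.count"="1");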
Otherwise, you would have to remove the header line before loading the data into HDFS.
