Complex Data Type issue in Hive - hadoop

I am trying to create a table in Hive using complex data types.
One of my column is an array of strings and the other is an array of maps.
After I have loaded the data into the table, when I try to query the data, I don't get the desired result in the third column which is an array of maps.
The following is my Hive query:
Step 1:
create table transactiondb2(order_id int,billtype array<string>,paymenttype array<map<string,int>>)ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '#';
Step 2:
load data local inpath '/home/xyz/data.txt' overwrite into table transactiondb2;
Step 3:
select * from transactiondb2;
And my output is as follows:
OK
1 ["A","B"] [{"credit":null,"10":null},{"cash":null,"25":null},{"emi":null,"30":null}]
2 ["C","D"] [{"credit":null,"157":null},{"cash":null,"45":null},{"emi":null,"35":null}]
3 ["X","Y"] [{"credit":null,"25":null},{"cash":null,"38":null},{"emi":null,"50":null}]
4 ["E","F"] [{"credit":null,"89":null},{"cash":null,"105":null},{"emi":null,"85":null}]
5 ["Z","A"] [{"credit":null,"7":null},{"cash":null,"79":null},{"emi":null,"105":null}]
6 ["D","Y"] [{"credit":null,"30":null},{"cash":null,"100":null},{"emi":null,"101":null}]
7 ["A","Z"] [{"credit":null,"50":null},{"cash":null,"9":null},{"emi":null,"85":null}]
8 ["B","Z"] [{"credit":null,"70":null},{"cash":null,"38":null},{"emi":null,"90":null}]
And my input file data is as follows:
1 A|B credit#10|cash#25|emi#30
2 C|D credit#157|cash#45|emi#35
3 X|Y credit#25|cash#38|emi#50
4 E|F credit#89|cash#105|emi#85
5 Z|A credit#7|cash#79|emi#105
6 D|Y credit#30|cash#100|emi#101
7 A|Z credit#50|cash#9|emi#85
8 B|Z credit#70|cash#38|emi#90

I solved it myself.
We need not mention an array of maps explicitly by default it takes values from one map after the other

Create the table as shown below and load the data, then you will get the desired output.
create table complex(id int,bill array<string>,paytype map<string,int>)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '#';

Related

Parsing and replacing columns in a file

INPUT:
I have an input file in which the first 10 characters of each line represent 2 fields - first 4 characters(field A) and the next 6 characters(field B). The file contains about 400K records.
I have a Mapping table which contains about 25M rows and looks like
Field A Field B SomeStringA SomeStringB
1628 836791 1234 783901
afgd ahutwe 1278 ashjkl
--------------------------------
--------------------------------
and so on.
Field A and Field B combined is the Primary Key for the table.
PROBLEM STATEMENT:
Replace:
Field A by SomeStringA
Field B by SomeStringB
in the input file. SomeStringA and SomeStringB are exactly the same width as Field A and B respectively.
Here's what I'm trying:
Approach 1:
Sort and Dump the mapping table into a file
spool dump_file
select * from mapping order by fieldA, fieldB;
spool off
exit;
Strip the input file and get the first 10 chars
cut -c1-10 input_file > input_file_stripped
Do something to find the lines that begin with the same string and then when they do - replace in the input_file with field 10-20 in the spooled file. - here's where I'm stuck.
Approach 2:
Take the input file and get the first 10 chars
cut -c1-10 input_file >input_file_stripped
Use sqlldr and load into a temp_table.
Select matching records from the mapping table and spool
spool matching_records
select m.* from mapping m, temp t where m.fieldA=t.fieldA and m.fieldB=t.fieldB;
spool off
exit;
Now how do I replace these in the original file ?
Given the high number of records to process, how can this be done and done fast ?
Notes:
Not a one time activity, has to be done daily so scale is important
The mapping table is unlikely to change
I've Python, shell script and Oracle database available. Any combination of these is fine.

Extracting positions 1 to 11 and position 13 from a 15-digit number/string in Oracle

I have a 15-digit number that needs to be stored in an Oracle table either as a number or as a text.
Will I be able to select records from the table based on the field ("Positions 1 thru 11" + "Position 13")?
Example: If the data is 123456789012345, I need to select rows from the table to extract all rows that contain value "123456789013" in that field.
Can an index be created in Oracle to ensure the above query performs as good as a normal select query on the entire data field.
If you are storing the column in text then something like this should solve your problem. Use the first if you need to query separately otherwise if you want to query on first eleven and thirteenth use the last example.
create index ix_firsteleven on TABLE (substr(COL, 1, 11))
create index ix_thrirteenth on TABLE (substr(COL, 13, 1))
or
create index ix_concatstr on TABLE (substr(COL, 1, 11) || substr(col_name, 13, 1))

How to import data from a hbase table to hive table?

I've created a Hbase table like this,
create 'student','personal'
and I've put some data into it like this.
ROW COLUMN+CELL
1 column=personal:age, timestamp=1456224023454, value=20
1 column=personal:name, timestamp=1456224008188, value=pesronA
2 column=personal:age, timestamp=1456224891317, value=13
2 column=personal:name, timestamp=1456224868967, value=pesronB
3 column=personal:age, timestamp=1456224935178, value=21
3 column=personal:name, timestamp=1456224921246, value=personC
4 column=personal:age, timestamp=1456224951789, value=20
4 column=personal:name, timestamp=1456224961845, value=personD
5 column=personal:age, timestamp=1456224983240, value=20
5 column=personal:name, timestamp=1456224972816, value=personE
-
I want to import this data to a hive table. I wrote a hive query for that like this
CREATE TABLE hbaseStudent(key INT,name STRING,age INT) STORED BY'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal:age,personal:name") TBLPROPERTIES("hbase.table.name" = "student")
But when I execute the query error comes out like this.
Driver returned: 1. Errors: OK
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hbase/HBaseConfiguration
what should i do?
I tried this thing and it worked try replacing all the double quotes (") with single quotes ('). It will work & also try to add terminator ; in last line.

How to load data into couple of tables from a single files with different records structure in Hive?

I have a single file with a structure like:
A 1 2 3
A 4 5 6
A 5 8 12
B abc cde
B and fae
B bsd oio
C 1
C 2
C 3
and would like to load the data in 3 simple tables (A (int int int), B(string string) C(int)).
Is it possible and how?
It's also fine for me, if A(string int int int) etc. with the first column of the file to be included in the table.
I'd go with option 1 as Praveen suggests. I'd create an external table of only a string, and use the FROM ( ... ) syntax to insert into multiple tables at once. I think something like the following would work
create external table source_table( line string )
stored as textfile
location '/myfile';
from ( select split( line , " ") as col_array from source_table ) cols
insert overwrite table A select col_array[1], col_array[2], col_array[3] where col_array[0] = 'A'
insert overwrite table B select col_array[1], col_array[2] where col_array[0] = 'B'
insert overwrite table C select col_array[1] where col_array[0] = 'C';
Option 1) Map the entire data to a Hive table and then use the insert overwrite table .... option to map the appropriate data to the target tables.
Option 2) Develop a MR program to split the file into multiple files and then do the mapping of the files to the target tables in Hive.

concatenate multiple fields in sqlldr

I am working on sqlldr(sql loader) in oracle 11g.
I am trying to concatenate 3 fields into a single field. Has anyone done this?
ex:
TABLE - "CELLINFO" where the fields are (mobile_no,service,longitude).
The data given is (+9198449844,idea,110,25,50) i.e. (mobile_no,service,grad,min,sec).
But while loading data into the table i need to concatenate the last 3 fields (grad,min,sec) into the longitude field of the table.
Here i cant edit manually because i have 1000's of data to be loaded.
I also tried using ||,+ and concat().... but I am not able to.
ctl may be:
load data
append
into table cellinfo
fields terminated by ","
(
mobile_no,
service,
grad BOUNDFILLER,
min BOUNDFILLER,
sec BOUNDFILLER,
latitude ":grad || :min|| :sec"
)
suposing cellinfo(mobile_no, service, latitude).
Some nice info here on orafaq
Alternatively, you can modify your input:
awk -F"," '{print $1","$2","$3":"$4":"$5}' inputfile > outputfile

Resources