Insert overwrite directory using Presto like Hive - hadoop

In Hive, the statement below will output foo^Bbar^Abaz
insert overwrite directory 's3://bucket-name/foobarbaz'
row format delimited
fields terminated by '\001'
select split('foo,bar', ','), 'baz';
In Presto, I ran this statement:
insert overwrite directory 's3://bucket-name/foobarbaz'
select split('foo,bar', ','), 'baz';
With this result: ["foo","bar"]^Abaz
What is the equivalent Presto clause for insert overwrite directory that works for arrays and structs?
It seems like Presto converted my array type into a json string, but I want this formatted to Hadoop spec with collection item and map key delimiter support.

Try specifying COLLECTION ITEMS TERMINATED BY in the CREATE TABLE DDL of the target table:
row_format DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] ...
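As a rough sketch of that suggestion: define an external text table over the target directory in Hive with explicit delimiters, then write to it with a plain INSERT. The table name `foobarbaz_out`, its column names, and the `hive.default` catalog and schema are assumptions for illustration, not taken from the question. Whether Presto honours these delimiters when writing may depend on the connector version, so it is worth inspecting the output files.

```sql
-- Run in Hive: external text table over the target directory.
-- Table and column names are hypothetical.
CREATE EXTERNAL TABLE foobarbaz_out (
  items ARRAY<STRING>,
  label STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE
LOCATION 's3://bucket-name/foobarbaz';

-- Run in Presto: write through the Hive connector.
-- The catalog/schema "hive.default" is an assumption.
INSERT INTO hive.default.foobarbaz_out
SELECT split('foo,bar', ','), 'baz';
```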

Related

How to make hive table match data using column names and not using ordinal positions

If I have a csv like -
colName1,colName2
col1Value,col2Value
and a hive ddl like -
CREATE EXTERNAL TABLE tableName (
col2 STRING,
col1 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs://location/to/testcsv/directory'
tblproperties ("skip.header.line.count"="1");
//select col2 from tableName; gives col1Value
This is because, for text files, Hive matches table columns to data fields by ordinal position. If the underlying file is Parquet, the match is done by column name.
I was wondering if there is a Hive SerDe someone has written, or maybe some SerDe property I am missing, that tells Hive to map data field names to Hive table column names, so that the example above would return "col2Value" when col2 is queried, even though the ordinal position of col2 in the Hive table and in the data file do not match.
Thanks in advance!
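One possible workaround, shown only as a sketch and not as a name-based SerDe: declare the columns in the same order as the file, then expose whatever column order is wanted through a view. The names `tableName_raw` and `tableName_by_name` are hypothetical.

```sql
-- Columns declared in file order, so ordinal matching gives the right values.
CREATE EXTERNAL TABLE tableName_raw (
  col1 STRING,
  col2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs://location/to/testcsv/directory'
TBLPROPERTIES ("skip.header.line.count"="1");

-- The view presents the columns in the desired order.
CREATE VIEW tableName_by_name AS
SELECT col2, col1
FROM tableName_raw;
```

This still relies on knowing the file's column order up front; it does not read the header line.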

write json data from hive to local file

This is my query in hive
select colname from uber_test;
OK
{"data":"{\"age\":42, \"gender\":\"male\"}"}
This is how I export data:
CREATE TABLE export_test_1( name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
INSERT OVERWRITE LOCAL DIRECTORY 'export_test_1' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select colname from uber_test;
This is what my data looks like after I export:
more export_test_2/000000_0
data{"age":42, "gender":"male"}
I need to preserve the JSON. The output needs to be:
{"data":"{\"age\":42, \"gender\":\"male\"}"}

remove surrounding quotes from fields while loading data into hive

I want to load a table with input data into hive. I have data in the following format.
"153662";"0002241447";"0"
"153662";"000647036X";"0"
"153662";"0020434901";"0"
"153662";"0020973403";"0"
"153662";"0028604202";"0"
"153662";"0030437512";"0"
I want to load this data into a table with two varchar columns and one int column, but the surrounding double quotes are causing trouble. I have created the following table.
CREATE EXTERNAL TABLE Table(A varchar(50),B varchar(50),C varchar(50))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\;'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
But the quotes around each field also become part of the field, as shown below.
"276725" "034545104X" "0"
"276726" "0155061224" "5"
I want to ignore them. I also want the third field to be read as INT. Currently it becomes NULL when I declare the third field as INT in the table definition.
You will have to use the CSV SerDe for this.
CREATE TABLE Table(A varchar(50),B varchar(50),C varchar(50))
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES
(
"separatorChar" = ";",
"quoteChar" = "\""
)
STORED AS TEXTFILE;
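Note that OpenCSVSerde reads every column as a string, whatever type is declared. A minimal sketch of getting the third field as INT is to cast it in a view. The names `quoted_input` and `quoted_input_typed` and the LOCATION path are assumptions for illustration.

```sql
-- Raw table: OpenCSVSerde strips the quotes but yields strings only.
CREATE EXTERNAL TABLE quoted_input (a STRING, b STRING, c STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ";",
  "quoteChar" = "\""
)
STORED AS TEXTFILE
LOCATION '/path/to/input';  -- hypothetical path

-- Typed view: cast the third column to INT on read.
CREATE VIEW quoted_input_typed AS
SELECT CAST(a AS VARCHAR(50)) AS a,
       CAST(b AS VARCHAR(50)) AS b,
       CAST(c AS INT)         AS c
FROM quoted_input;
```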
Multiple ways to achieve this:
Use CSV serde
Use the regex SerDe with the regex "\"(.*)\"\;\"(.*)\"\;\"(.*)\""
Load data to external table then remove double quotes:
CREATE EXTERNAL TABLE source(
a string,
b String,
c String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LOCATION 'xyz';
CREATE TABLE destination AS
SELECT REGEXP_REPLACE(a,'"','') AS a,
REGEXP_REPLACE(b,'"','') AS b,
CAST(REGEXP_REPLACE(c,'"','') AS BIGINT) AS c
FROM source;
Hive query to remove double quotes around the string.
Example:
col2 value: "my name is, abc"
select col1, (regexp_replace(col2,'"','')) as col2 from table;
Output: my name is, abc

Hive insert overwrite into a dynamic-partition external table from a raw external table fails with a NullPointerException

I have a raw external table with four columns-
Table 1 :
create external table external_partitioned_rawtable (
age_bucket String,
country_destination String,
gender string,
population_in_thousandsyear int)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
location '/user/HadoopUser/hive'
I want an external table partitioned by country_destination and gender.
Table 2 :
create external table external_partitioned (
age_bucket String,
population_in_thousandsyear int)
partitioned by (country_destination String, gender String)
row format delimited
fields terminated by '\t'
lines terminated by '\n';
The insert overwrite fails with a NullPointerException:
insert overwrite table external_partitioned partition(country_destination,gender)
select (age_bucket,population_in_thousandsyear,country_destination,gender)
from external_partitioned_rawtable;
FAILED: NullPointerException null
For dynamic partition insertion, you have to set two Hive properties before executing the INSERT statement:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Then execute the insert statement (which I have modified to remove the parentheses around the select list):
insert overwrite table external_partitioned partition(country_destination,gender)
select age_bucket,population_in_thousandsyear,country_destination,gender
from external_partitioned_rawtable;
I hope this helps!
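To confirm that the insert created the expected partitions, a quick check along these lines may help. The partition values 'US' and 'female' are made up for illustration and are not taken from the question's data.

```sql
-- List the partitions the dynamic insert produced.
SHOW PARTITIONS external_partitioned;

-- Spot-check one partition (values are hypothetical).
SELECT age_bucket, population_in_thousandsyear
FROM external_partitioned
WHERE country_destination = 'US'
  AND gender = 'female'
LIMIT 10;
```

In a dynamic-partition insert, the partition columns must come last in the select list, in the same order as in the PARTITION clause, which is how the modified statement above is written.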

Hive: How to delimit rows using a string literal

I need help with a Hive problem.
I have a text file with a single long line, e.g.:
JASON 29\SASHA 24\CHRISTINE 15\ROBERT 20\
I need to create a table in Hive whose rows are delimited by "\" (backslash). For example, if I insert the data from the line above ("JASON 29\SASHA 24...."), I want 4 rows to be inserted into my table.
In other words, I want my custom character to be the row delimiter instead of the default "\n".
I wrote this DDL:
CREATE TABLE newline_tab
(
name STRING,
age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\\'
STORED AS TEXTFILE;
But I am unable to create the table, and I get the following error:
FAILED: SemanticException 9:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\''
Any help would be appreciated :)
CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
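Because Hive accepts only '\n' for LINES TERMINATED BY, one possible workaround is to load the whole line into a single string column and split it at query time. This is a sketch under assumptions: the staging table `raw_line` and the file path are hypothetical, and name and age are assumed to be separated by whitespace as in the sample line.

```sql
-- Staging table: each physical line of the file lands in one column.
CREATE TABLE raw_line (line STRING) STORED AS TEXTFILE;

-- Hypothetical path to the single-line file.
LOAD DATA LOCAL INPATH '/path/to/people.txt' INTO TABLE raw_line;

-- split() takes a regex; '\\\\' in a Hive string literal is the regex \\,
-- which matches one literal backslash. explode() emits one row per record.
INSERT OVERWRITE TABLE newline_tab
SELECT split(rec, '\\s+')[0]              AS name,
       CAST(split(rec, '\\s+')[1] AS INT) AS age
FROM raw_line
LATERAL VIEW explode(split(line, '\\\\')) t AS rec
WHERE rec <> '';
```

This assumes `newline_tab` was created with LINES TERMINATED BY '\n' (or with that clause omitted), since the DDL in the question is rejected.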
