Append records to existing hive table does not write all records - hadoop

I am trying to append some records to an existing Hive external table created from a CSV file stored in HDFS. The problem is that only a portion of the dataset is written. Can anyone help me?
Here is my code with 2 alternatives:
if fs.exists(jvm.org.apache.hadoop.fs.Path(analysis_result_path_formatted)):
    # Alternative 1: saveAsTable in append mode
    (analysisFinal.write.format("hive")
        .mode("append")
        .saveAsTable(output_hive_db + "." + hive_table + "_analyzersresult"))
    # Alternative 2: insertInto without overwriting
    (analysisFinal.write
        .mode("append")
        .insertInto(output_hive_db + "." + hive_table + "_analyzersresult", overwrite=False))
I tried both solutions, but they write the same portion of the total records.
Also, here is the code that writes the original CSV file and creates the Hive table:
# Write the analysis result to HDFS
(analysisFinal.coalesce(1).write.option("header", "true")
    .option("delimiter", "\t")
    .csv(hdfs_url + analysis_result_path_formatted))
# The {} placeholders stand for the database, the table name prefix and the HDFS location
# (the format arguments were not shown in the post)
create_analyzer_query_formatted = """CREATE EXTERNAL TABLE IF NOT EXISTS {}.{}_AnalyzersResult (name STRING, entity STRING, value STRING, instance STRING, provenienza_tabella STRING, date STRING, table_name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '{}' TBLPROPERTIES ("skip.header.line.count"="1")"""
# Create the external table based on the previously written file
spark.sql(create_analyzer_query_formatted)

Related

HIVE - create external tables where string itself contains commas

I am new to Hive and am creating external tables on a CSV file. One of the issues I am coming across is values that contain multiple commas within the string itself. For example, the CSV file contains the following:
[image: CSV File]
When I create an external table in Hive, because there are commas within the "name" column, it shifts the first name to the right, adding another column. This throws all of the data off when you view the table in Hive.
[image: External Table result in Hive]
Is there anything I can add to my script to keep the commas but also keep first and last name in the same column when the external table is created? Thank you all in advance. I am very new to Hive.
CREATE EXTERNAL TABLE database.table_name (
ID INT,
Name String,
City String,
State String
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/xyz/xyz/database/directory/'
TBLPROPERTIES ("skip.header.line.count"="1");
Check this solution. You need to add this line: ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
https://community.cloudera.com/t5/Support-Questions/comma-in-between-data-of-csv-mapped-to-external-table-in/td-p/220193
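To see why a quote-aware parser matters here, the following Python sketch contrasts a naive split on commas with Python's csv module, which treats quoted fields in a way similar in spirit to OpenCSVSerde with separatorChar "," and quoteChar "\"". The sample row is hypothetical, since the CSV screenshot from the question is not available, and the sketch only illustrates the parsing idea, not Hive's implementation.

```python
import csv
import io

# Hypothetical row: the Name value contains a comma and is wrapped in double quotes.
line = '1,"Smith, John",Dallas,TX'

# A naive split on ',' behaves like FIELDS TERMINATED BY ',' and breaks the name in two.
naive_fields = line.split(",")

# A quote-aware parser keeps the quoted value together as a single field.
quoted_fields = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

print(naive_fields)
print(quoted_fields)
```

Keep in mind that OpenCSVSerde reads every column as a string, so a column declared as INT (such as ID) would need a cast in queries.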
Complete DDL example:
create table hcc(field1 string,
field2 string,
field3 string,
field4 string,
field5 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\"");

Inserting into Hive Table error

I want to encode columns of a table in Hive.
I tried:
hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE;
Say I have a CSV file with the following row:
100,'navis','010-0000-0000','Seoul Seocho'
Then I tried to use:
LOAD DATA LOCAL INPATH
'/home/path/to/csv/test.csv'
INTO TABLE encode_test;
But when I run select * from encode_test, all columns are NULL,
whereas the result should have been:
100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw==
I also want to specify FIELDS TERMINATED BY ',' in the create table encode_test query,
but I get an error: EOF error near Fields.
I also tried creating another table, sample:
create table sample(id int, name STRING, phone STRING, address STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Then I imported the CSV file into the sample table, and the import succeeded.
Then I tried:
insert into encode_test select * from sample;
But I am getting this new error:
Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:279)
I'm new to Hadoop.
Please refer to this link, which is where I found this approach.
In Hive DDL, ROW FORMAT SERDE and FIELDS TERMINATED BY cannot be used together. Instead, you can use the field.delim SerDe property.
create table encode_test(id int, name STRING, phone STRING, address STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'column.encode.columns'='phone,address',
'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
STORED AS TEXTFILE;
As for the Permission denied exception, run the Hive queries as either the hdfs or the hive user, since the root user does not have WRITE access to the /user directory in HDFS.
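As a quick sanity check on the expected output quoted in the question, the encoded values can be reproduced in Python. This is a minimal sketch that assumes the Base64WriteOnly class applies standard Base64 to the raw column text; the helper name encode is hypothetical.

```python
import base64

def encode(value: str) -> str:
    """Base64-encode a UTF-8 string and return the result as ASCII text."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")

phone = encode("010-0000-0000")
address = encode("Seoul, Seocho")

print(phone)    # MDEwLTAwMDAtMDAwMA==
print(address)  # U2VvdWwsIFNlb2Nobw==
```

Note that the expected address value decodes to "Seoul, Seocho" (with a comma), not to the 'Seoul Seocho' shown in the sample row, so the stored value will differ if the input has no comma or keeps its single quotes.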

Hive External table retrieve query (New to Hive )

I created the external table below:
create external table if not exists sensor.building1 (BuildingID int,BuildingMgr string , BuildingAge string, HVACproduct string , Country string) row format delimited fields terminated by ',';
I loaded the table using the query below:
load data inpath '/user/cloudera/sensor/SensorFiles/building.csv' into table sensor.building1;
When I try to retrieve the BuildingID column using the query below, I get null values:
select a.BuildingID
from sensor.building1 as a
limit 10;
Please tell me what I am doing wrong.
You are trying to load a CSV file into a Hive table, but Hive's default field delimiter is '\001' (Ctrl-A).
So when you load data from a CSV file (I am assuming it is ',' separated), the fields are not split correctly.
You can create the table like this:
create external table test1(country string, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
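The effect of a delimiter mismatch can be illustrated with a small Python sketch. It is only a rough analogue of how a delimited text table behaves, not Hive's actual code: the helper parse_row and the sample row are hypothetical, and None stands in for NULL.

```python
def parse_row(line, delimiter, types):
    """Split a text line on `delimiter` and cast each field.

    Missing or uncastable fields become None, loosely mirroring NULL in Hive.
    """
    fields = line.split(delimiter)
    row = []
    for i, cast in enumerate(types):
        try:
            row.append(cast(fields[i]))
        except (IndexError, ValueError):
            row.append(None)
    return row

line = "1,M1,25,AC1000,USA"          # hypothetical comma-separated row
types = [int, str, str, str, str]    # BuildingID is an int, the rest are strings

default_row = parse_row(line, "\x01", types)  # delimiter does not match the data
comma_row = parse_row(line, ",", types)       # delimiter matches the data

print(default_row)
print(comma_row)
```

With the wrong delimiter the whole line lands in the first field, which cannot be cast to an integer, and the remaining columns have nothing to read.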

How do I Insert data from text table (using MultiDelimitSerDe) to Avro Table?

I noticed that I can use an insert into statement from text table to avro table when not using the MultiDelimitSerDe. It also works with ROW FORMAT DELIMITED FIELDS TERMINATED BY "," i.e. a single character.
I create 2 tables - 1 text table and 1 avro table:
CREATE TABLE example1 ( example STRING, example2 STRING, example3 STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="**")
STORED AS TEXTFILE;
CREATE TABLE example2 ( example STRING, example2 STRING, example3 STRING )
STORED AS AVRO;
I then load data into the example1 table (the file is delimited by "**"), i.e.
LOAD DATA INPATH 'HDFS-path' INTO TABLE example1;
example1 now has data inside it. I want to insert the data from example1 to example2.
INSERT INTO TABLE example2 SELECT * from example1;
This however, gives a "return code 2" error. I have no idea why I am unable to insert the data using the MultiDelimitSerDe but I am able to do this with "ROW FORMAT DELIMITED FIELDS TERMINATED BY". But, I need to use a multi-delimiter.
Could anyone help me please?
Have you added the required JAR file?
'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' is part of the Hive contrib library, so make sure the hive-contrib JAR is available to the session (for example via ADD JAR) before running the insert.
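For context on why a multi-character delimiter needs dedicated handling, this short Python sketch compares splitting on the full "**" sequence with splitting on a single "*" character. The sample line is hypothetical and the code only mimics the splitting idea, not the SerDe itself.

```python
line = "a**b**c"  # hypothetical row delimited by "**"

# Splitting on the whole two-character delimiter yields the three intended fields.
multi_fields = line.split("**")

# Splitting on a single '*' leaves empty fields between the real values.
single_fields = line.split("*")

print(multi_fields)
print(single_fields)
```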

Why doesn't Hive have FIELDS ENCLOSED BY like MySQL?

Here is my case.
Input lines:
"vijay" <\t> "a-b-c","a-c-d","a-d-c"
"kumar" <\t> "a-b-c","b-c-d"
I created the table like this:
hive> create table user_infos(name string, path ARRAY<String>) -- I need an array only
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS
TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
Output received:
hive> select * from user_infos;
"vijay" ["\"a-b-c\"","\"a-c-d\"","\"a-d-c\""]
"kumar" ["\"a-b-c\"","\"b-c-d\""]
The problem is that I don't want the double quotes, i.e. \"
Required output :
vijay ["a-b-c","a-c-d","a-d-c"]
kumar ["a-b-c","b-c-d"]
Is there any way to achieve this without using a custom SerDe? Is there anything like ENCLOSED BY in MySQL?
I was stuck on the same issue: my fields are enclosed in double quotes and separated by semicolons (;). My table name is employee1.
After searching, I found a solution that works for this.
@ramisetty.vijay: Yes, we have to use a SerDe for this. Please download the SerDe JAR from this link: https://github.com/downloads/IllyaYalovyy/csv-serde/csv-serde-0.9.1.jar
Then follow the steps below at the Hive prompt:
add jar path/to/csv-serde.jar;
create table employee1(id string, name string, addr string)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\;",
"quoteChar" = "\"")
stored as textfile
;
and then load data from your given path using below query:
load data local inpath 'path/xyz.csv' into table employee1;
and then run :
select * from employee1;
Thanks.
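For the array case in the question, the quote stripping that an ENCLOSED BY option would perform can be sketched in Python, for example as a preprocessing step before loading the file. This is an illustration under assumptions, not Hive functionality: the helper parse_user_line is hypothetical, and it assumes one tab between the name and the list of paths.

```python
import csv
import io

def parse_user_line(line):
    """Return (name, paths) with the enclosing double quotes removed."""
    # Split the name from the collection on the first tab only.
    name_raw, path_raw = line.split("\t", 1)
    name = name_raw.strip().strip('"')
    # Parse the comma-separated, double-quoted collection items.
    paths = next(csv.reader(io.StringIO(path_raw.strip()), delimiter=",", quotechar='"'))
    return name, paths

name, paths = parse_user_line('"vijay"\t"a-b-c","a-c-d","a-d-c"')
print(name, paths)
```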

Resources