Hive insert query failing with error return code -101 - hadoop

I am trying to run a simple insert statement as below:
insert into table `bwc_test` partition(call_date)
select * from
`bwc_master`;
Then it fails with the below error:
INFO : Loading data to table dtc.bwc_test partition (call_date=null) from /apps/hive/warehouse/dtc.db/bwc_test/.hive-staging_hive_2018-11-13_19-10-37_084_8697431764330812894-1/-ext-10000
Error: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MoveTask. HIVE_LOAD_DYNAMIC_PARTITIONS_THREAD_COUNT (state=08S01,code=-101)
Table definition for bwc_master:
CREATE TABLE `bwc_master`(
unique_id bigint,
customer_id string,
direction string,
call_date_time timestamp,
duration int,
billed_duration int,
retail_rate decimal(9,7),
retail_cost decimal(19,7),
billed_tier smallint,
call_type tinyint,
record_status tinyint,
aggregate_id bigint,
originating_ipaddress string,
originating_number string,
destination_number string,
lrn string,
ocn string,
destination_rate_center string,
destination_lata int,
billed_prefix string,
rate_id string,
wholesale_rate decimal(9,7),
wholesale_cost decimal(19,7),
cnam_dipped boolean,
billed_number_type tinyint,
source_lata int,
source_ocn string,
location_id string,
sippeer_id int,
rate_attempts tinyint,
source_state string,
source_rc string,
destination_country string,
destination_state string,
destination_ip string,
carrier_id string,
rated_date_time timestamp,
partition_id smallint,
encryption_rate decimal(9,7),
encryption_cost decimal(19,7),
trans_coding_rate decimal(9,7),
trans_coding_cost decimal(19,7),
file_name string,
call_id string,
from_tag string,
to_tag string,
unique_record_id string)
PARTITIONED BY (
`call_date` date)
CLUSTERED BY (
customer_id)
INTO 10 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://*****/apps/hive/warehouse/dtc.db/bwc_master'
Can someone help me debug this? I didn't find anything in the logs.

You are missing the "table" keyword before bwc_test:
insert into table `bwc_test` partition(call_date)
select * from
`bwc_master`;

Related

Non-string values showing as NULL in Hive

I'm new to Hive and creating my first table.
For some reason, all non-string values show up as NULL (including INT, BOOLEAN, etc.).
My data looks like this sample row:
58;"management";"married";"tertiary";"no";2143;"yes";"no";"unknown";5;"may";261;1;-1;0;"unknown";"no"
I used this to create the table:
create external table bank_dataset(
age TINYINT,
job string,
education string,
default BOOLEAN,
balance INT,
housing BOOLEAN,
loan BOOLEAN,
contact STRING,
day STRING,
month STRING,
duration INT,
campaign INT,
pdays INT,
previous INT,
poutcome STRING,
y BOOLEAN)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u003B'
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");
Thanks for the comments, it worked! But I have one issue: every row returns all the data correctly, followed by extra columns of NULL values. My code is below:
create external table bank_dataset(age TINYINT, job string, education string, default BOOLEAN, balance INT, housing BOOLEAN, loan BOOLEAN, contact STRING,day INT, month STRING, duration INT,campaign INT, pdays INT, previous INT, poutcome STRING,y BOOLEAN)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\u003B",
"quoteChar" = '"'
)
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");
Any suggestions?
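One thing worth checking against the sample row: it contains 17 semicolon-separated fields, while both table definitions declare 16 columns. The third field ("married") has no column of its own, so every later value lands one column to the left of where it was intended. In addition, the Hive documentation for OpenCSVSerde states that it treats every column as STRING, whatever type is declared. A minimal sketch under those two observations follows; the column name marital and the names bank_dataset_raw and bank_dataset_typed are assumptions, not part of the original post:

```sql
-- Sketch only. `marital`, bank_dataset_raw and bank_dataset_typed are assumed names.
-- Every column is STRING because OpenCSVSerde returns strings regardless of declared type.
CREATE EXTERNAL TABLE bank_dataset_raw (
  age STRING, job STRING, marital STRING, education STRING,
  `default` STRING, balance STRING, housing STRING, loan STRING,
  contact STRING, `day` STRING, `month` STRING, duration STRING,
  campaign STRING, pdays STRING, previous STRING, poutcome STRING, y STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\u003B",
  "quoteChar" = '"'
)
STORED AS TEXTFILE
LOCATION '/user/marchenrisaad_gmail/Bank_Project'
TBLPROPERTIES ("skip.header.line.count"="1");

-- Cast to the intended types in a view; yes/no flags are compared explicitly
-- because the strings "yes" and "no" are not boolean literals.
CREATE VIEW bank_dataset_typed AS
SELECT
  CAST(age AS TINYINT)   AS age,
  job,
  marital,
  education,
  (`default` = 'yes')    AS `default`,
  CAST(balance AS INT)   AS balance,
  (housing = 'yes')      AS housing,
  (loan = 'yes')         AS loan,
  contact,
  CAST(`day` AS INT)     AS `day`,
  `month`,
  CAST(duration AS INT)  AS duration,
  CAST(campaign AS INT)  AS campaign,
  CAST(pdays AS INT)     AS pdays,
  CAST(previous AS INT)  AS previous,
  poutcome,
  (y = 'yes')            AS y
FROM bank_dataset_raw;
```

Querying bank_dataset_typed should then return typed values, provided the file really has the 17-field layout of the sample row.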

AWS Athena creates indentation and moves values into wrong columns after partitions loads

I encountered the following problem:
1. I created a Hive table in an EMR cluster in HDFS without partitions and loaded data into it.
2. I created another Hive table based on the table from step 1, but with partitions derived from the datetime column: PARTITIONED BY (year STRING,month STRING,day STRING).
3. I loaded data from the non-partitioned table into the partitioned table and got a valid result.
4. I created an Athena database and table with the same structure as the Hive table.
5. I copied the partitioned files from HDFS to local disk and transferred them with aws s3 sync into an empty S3 bucket. All files were transferred without errors and in the same layout as in the Hive directory in HDFS.
6. I loaded the partitions with MSCK REPAIR TABLE and got no errors in the output.
After that, I found that many values were shifted into the wrong columns; for example, a value that belongs in the "IP" column ended up in the "Operating_sys" column, and so on.
My scripts are:
-- Hive tables
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs_page_part
(
log_DATE STRING,
user_id STRING,
page_path STRING,
referer STRING,
tracking_referer STRING,
medium STRING,
campaign STRING,
source STRING,
visitor_id STRING,
ip STRING,
session_id STRING,
operating_sys STRING,
ad_id STRING,
keyword STRING,
user_agent STRING
)
PARTITIONED BY
(
`year` STRING,
`month` STRING,
`day` STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/admin/events_partitioned';
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs_event_part
(
log_DATE STRING,
user_id STRING,
category STRING,
action STRING,
label STRING,
value STRING,
visitor_id STRING,
ip STRING,
session_id STRING,
operating_sys STRING,
extra_data_json STRING
)
PARTITIONED BY
(
`year` STRING,
`month` STRING,
`day` STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/admin/pages_partitioned';
INSERT INTO TABLE cloudfront_logs_page_part
PARTITION
(
`year`,
`month`,
`day`
)
SELECT
log_DATE,
user_id,
page_path,
referer,
tracking_referer,
medium,
campaign,
source,
visitor_id,
ip,
session_id,
operating_sys,
ad_id,
keyword,
user_agent,
year(log_DATE) as `year`,
month(log_DATE) as `month`,
day(log_DATE) as `day`
FROM
cloudfront_logs_page;
INSERT INTO TABLE cloudfront_logs_event_part
PARTITION
(
`year`,
`month`,
`day`
)
SELECT
log_DATE,
user_id,
category,
action,
label,
value,
visitor_id,
ip,
session_id,
operating_sys,
extra_data_json,
year(log_DATE) as `year`,
month(log_DATE) as `month`,
day(log_DATE) as `day`
FROM
cloudfront_logs_event;
-- Athena tables
CREATE DATABASE IF NOT EXISTS test
LOCATION 's3://...';
DROP TABLE IF EXISTS test.cloudfront_logs_page_ath;
CREATE EXTERNAL TABLE IF NOT EXISTS powtoon_hive.cloudfront_logs_page_ath (
log_DATE STRING,
user_id STRING,
page_path STRING,
referer STRING,
tracking_referer STRING,
medium STRING,
campaign STRING,
source STRING,
visitor_id STRING,
ip STRING,
session_id STRING,
operating_sys STRING,
ad_id STRING,
keyword STRING,
user_agent STRING
)
PARTITIONED BY (`year` STRING,`month` STRING, `day` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://.../';
DROP TABLE IF EXISTS test.cloudfront_logs_event_ath;
CREATE EXTERNAL TABLE IF NOT EXISTS test.cloudfront_logs_event_ath
(
log_DATE STRING,
user_id STRING,
category STRING,
action STRING,
label STRING,
value STRING,
visitor_id STRING,
ip STRING,
session_id STRING,
operating_sys STRING,
extra_data_json STRING
)
PARTITIONED BY (`year` STRING,`month` STRING, `day` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://.../';
What can be wrong? Table structure? Athena metadata?
The easiest method would be to convert your raw files directly into a partitioned Parquet columnar format. This has the benefit of partitioning, columnar storage, predicate push-down and all those other fancy words.
See: Converting to Columnar Formats - Amazon Athena
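A minimal HiveQL sketch of that conversion for the event table, assuming the unpartitioned source table cloudfront_logs_event from the question. The target table name cloudfront_logs_event_parquet is hypothetical, and the S3 location stays elided as in the question, so it has to be filled in before this runs:

```sql
-- Sketch only: cloudfront_logs_event_parquet is a hypothetical table name.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs_event_parquet
(
  log_DATE STRING,
  user_id STRING,
  category STRING,
  action STRING,
  label STRING,
  value STRING,
  visitor_id STRING,
  ip STRING,
  session_id STRING,
  operating_sys STRING,
  extra_data_json STRING
)
PARTITIONED BY (`year` STRING, `month` STRING, `day` STRING)
STORED AS PARQUET
LOCATION 's3://.../';  -- path elided in the question; use your own bucket path

-- Partition columns go last in the select list, in PARTITIONED BY order
INSERT OVERWRITE TABLE cloudfront_logs_event_parquet
PARTITION (`year`, `month`, `day`)
SELECT
  log_DATE, user_id, category, action, label, value,
  visitor_id, ip, session_id, operating_sys, extra_data_json,
  year(log_DATE)  AS `year`,
  month(log_DATE) AS `month`,
  day(log_DATE)   AS `day`
FROM cloudfront_logs_event;
```

The Athena table would then be declared with the same columns and STORED AS PARQUET, followed by MSCK REPAIR TABLE. Parquet stores each column separately, so a comma inside a value such as extra_data_json or user_agent cannot be read as a field separator, which is one possible cause of shifted columns in comma-delimited text. It is also worth noting that in the Hive scripts above the page table's LOCATION ends in events_partitioned and the event table's in pages_partitioned; if the directories were synced to S3 by name, that swap alone could place event rows under the page columns.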

Partition a hive table with a column in middle from another external table

I have created an external table as below:
create external table if not exists complaints (date_received string, product string, sub_product string, issue string, sub_issue string, consumer_complaint_narrative string, state string, company_public_response string, company varchar(50), zipcode int, tags string, consumer_consent_provided string, submitted_via string, date_sent_company string, company_response string, timely_response string, consumer_disputed string, complaint_id int) row format delimited fields terminated by ',' stored as textfile location 'hdfs://hostname:8020/complaints/';
Now I want to create another table, complaints_new, partitioned by state and containing all the data from the table above. How can this be achieved?
I tried the below:
create external table if not exists complaints_new (date_received string, product string, sub_product string, issue string, sub_issue string, consumer_complaint_narrative string, company_public_response string, company varchar(50), zipcode int, tags string, consumer_consent_provided string, submitted_via string, date_sent_company string, company_response string, timely_response string, consumer_disputed string, complaint_id int) partitioned by (state varchar(20)) row format delimited fields terminated by ',' stored as textfile location 'hdfs://hostname:8020/complaints/';
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.mapred.mode = nonstrict;
insert into table complaints_new partition(state) select * from complaints;
The query is failing.
You have a few problems here. First, both tables point to the same location, which means you will be reading from and writing to the same directory. Second, Hive expects the partition column to be the last column in the select list, so you cannot use select *; instead, select the fields one by one and put state at the end of your select statement.
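Both fixes can be sketched as follows. The complaints_new directory is an assumed path, chosen only so that it differs from /complaints/; everything else is taken from the question:

```sql
-- Dropping an EXTERNAL table removes only the metadata, not the files.
DROP TABLE IF EXISTS complaints_new;

-- Sketch only: the LOCATION below is an assumed path, distinct from the source table's.
CREATE EXTERNAL TABLE IF NOT EXISTS complaints_new (
  date_received string, product string, sub_product string, issue string,
  sub_issue string, consumer_complaint_narrative string,
  company_public_response string, company varchar(50), zipcode int,
  tags string, consumer_consent_provided string, submitted_via string,
  date_sent_company string, company_response string, timely_response string,
  consumer_disputed string, complaint_id int)
PARTITIONED BY (state varchar(20))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://hostname:8020/complaints_new/';

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Explicit column list in the target table's order, with the partition column last
INSERT INTO TABLE complaints_new PARTITION (state)
SELECT
  date_received, product, sub_product, issue, sub_issue,
  consumer_complaint_narrative, company_public_response, company, zipcode,
  tags, consumer_consent_provided, submitted_via, date_sent_company,
  company_response, timely_response, consumer_disputed, complaint_id,
  state
FROM complaints;
```

Hive matches the select list to the target columns by position, not by name, which is why select * (with state in the seventh position) does not line up with the partitioned table.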

Query on Bucketized Table

I created a bucketized table as follows:
drop table if exists bi_st.st_usr_member_active_day_test;
CREATE TABLE `bi_st.st_usr_member_active_day_test`(
`cal_dt_from` string,
`cal_dt_to` string,
`memberid` string,
`vipcode` string,
`vipleavel` string,
`cityid` string,
`cityname` string,
`groupid` int,
`groupname` string,
`storeid` int,
`storename` string,
`sectionid` int,
`sectionname` string,
`promotionid` string,
`promotionname` string,
`moduleid` string,
`modulename` string,
`activeness_today` string,
`new_vip_class` string
)
clustered by (storeid) into 2 buckets
row format delimited fields terminated by '\t'
stored as orc TBLPROPERTIES('transactional'='true');
I then inserted some data into it and ran:
select * from bi_st.st_usr_member_active_day_test where storeid = 193;
It failed with an array index out of bounds error. Can anybody explain this? Thanks.

Insert data of 2 Hive external tables in new External table with additional column

I have two external Hive tables as follows. I populated them with data from Oracle using Sqoop.
create external table transaction_usa
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_usa';
create external table transaction_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_canada';
Now I want to merge the data from these two tables, as is, into one external Hive table with the same fields plus one extra column, source_table, that identifies which table each row came from. The new external table is as follows:
create external table transaction_usa_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int,
source_table string
)
row format delimited
stored as textfile
location '/user/gds/bank_ds/tran_usa_canada';
How can I do it?
Run a SELECT on each table, combine the results with UNION ALL, and insert the combined result into your third table.
Below is the final Hive query:
INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_usa' AS source_table FROM transaction_usa
UNION ALL
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_canada' AS source_table FROM transaction_canada;
Hope this helps!
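A quick way to sanity-check the merge afterwards is to count rows per source. This is only a verification sketch using the table and column names from the question; each count should match a plain COUNT(*) on the corresponding source table:

```sql
-- Expect one row per source table
SELECT source_table, COUNT(*) AS row_count
FROM transaction_usa_canada
GROUP BY source_table;

-- Compare against the source tables
SELECT COUNT(*) FROM transaction_usa;
SELECT COUNT(*) FROM transaction_canada;
```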
You can also do it with manual partitioning.
CREATE TABLE transaction_new_table (
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
PARTITIONED BY (sourcetablename String);
Then run the command below:
load data inpath 'hdfspath' into table transaction_new_table partition(sourcetablename='1');
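Note that LOAD DATA INPATH moves the files out of their original HDFS directory. If the two source directories should stay where they are, one alternative is to declare the partitioned table as external and attach each existing directory as a partition. A sketch under that assumption; the partition values 'usa' and 'canada' are illustrative choices, and the table is created as EXTERNAL here rather than as in the answer above:

```sql
-- Sketch only: EXTERNAL and the partition values 'usa'/'canada' are assumptions.
CREATE EXTERNAL TABLE transaction_new_table (
  tran_id int,
  acct_id int,
  tran_date string,
  amount double,
  description string,
  branch_code string,
  tran_state string,
  tran_city string,
  speendby string,
  tran_zip int
)
PARTITIONED BY (sourcetablename STRING)
ROW FORMAT DELIMITED
STORED AS TEXTFILE;

-- Point each partition at the directory the source table already uses
ALTER TABLE transaction_new_table ADD PARTITION (sourcetablename='usa')
  LOCATION '/user/stg/bank_stg/tran_usa';
ALTER TABLE transaction_new_table ADD PARTITION (sourcetablename='canada')
  LOCATION '/user/stg/bank_stg/tran_canada';

-- The partition column then plays the role of source_table
SELECT sourcetablename, COUNT(*) AS row_count
FROM transaction_new_table
GROUP BY sourcetablename;
```

No data is copied with this layout, so both the original tables and the combined table read the same files.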
You could also use two separate INSERT INTO statements:
INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, ...'transaction_usa' FROM transaction_usa;
INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, ...'transaction_canada' FROM transaction_canada;
