Parse Exception EOF Hive - hadoop

Query:
hive> CREATE TABLE GREENTAXI(VendorID INT, pick_up_date DATE,drop_date DATE,Flag CHAR(1),rate_code INT, pick_up_long STRING,pick_up_lat STRING,drop_off_long STRING,drop_off_lat STRING,passenger_count INT,trip_distance DECIMAL,fare_amount DECIMAL,Extra DECIMAL,Tax DECIMAL,Tip DECIMAL,Tolls INT,Fee INT,Surcharge DECIMAL,total_amount DECIMAL,payment_type INT,trip_type INT)COMMENT 'Data about Green NYC Taxi for the year 2016-Jan’ ROW FORMAT DELIMITED FIELDS TERMINATED BY ','STORED AS TEXTFILE;
When I run this I get a ParseException (unexpected EOF). Please advise.

This looks like a character-encoding problem: the COMMENT string is closed with a curly quote (’) instead of a straight single quote ('), so Hive never sees the end of the literal and hits end-of-file while parsing. Retype the statement in a plain-text editor. I tried this and it worked:
CREATE TABLE greentaxi
(
vendorid INT,
pick_up_date DATE,
drop_date DATE,
flag CHAR(1),
rate_code INT,
pick_up_long STRING,
pick_up_lat STRING,
drop_off_long STRING,
drop_off_lat STRING,
passenger_count INT,
trip_distance DECIMAL,
fare_amount DECIMAL,
extra DECIMAL,
tax DECIMAL,
tip DECIMAL,
tolls INT,
fee INT,
surcharge DECIMAL,
total_amount DECIMAL,
payment_type INT,
trip_type INT
)
comment 'Data about Green NYC Taxi for the year 2016-Jan'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
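
Once the table exists, loading the source file is the usual next step; a minimal sketch, assuming the CSV sits at a local path like the one below (the path is hypothetical):

-- Load the January 2016 Green Taxi CSV into the new table
-- (the local path is hypothetical; adjust to your environment).
LOAD DATA LOCAL INPATH '/home/user/green_tripdata_2016-01.csv' INTO TABLE greentaxi;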

Related

Non-string values showing as NULL in Hive

I'm new to Hive and am creating my first table!
For some reason all non-string values show up as NULL (including INT, BOOLEAN, etc.).
My data looks like this sample row:
58;"management";"married";"tertiary";"no";2143;"yes";"no";"unknown";5;"may";261;1;-1;0;"unknown";"no"
I used this to create the table:
create external table bank_dataset(
age TINYINT,
job string,
education string,
default BOOLEAN,
balance INT,
housing BOOLEAN,
loan BOOLEAN,
contact STRING,
day STRING,
month STRING,
duration INT,
campaign INT,
pdays INT,
previous INT,
poutcome STRING,
y BOOLEAN)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u003B'
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");
Thanks for the comments, it worked! But I have one issue: for every row I get all the data correctly, and then I get extra columns of NULL values. Here is my code:
create external table bank_dataset(
age TINYINT,
job string,
education string,
default BOOLEAN,
balance INT,
housing BOOLEAN,
loan BOOLEAN,
contact STRING,
day INT,
month STRING,
duration INT,
campaign INT,
pdays INT,
previous INT,
poutcome STRING,
y BOOLEAN)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\u003B",
"quoteChar" = '"'
)
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");
Any suggestions?

Hive insert query failing with error return code -101

I am trying to run a simple insert statement as below:
insert into table `bwc_test` partition(call_date)
select * from
`bwc_master`;
Then it fails with the below error:
INFO : Loading data to table dtc.bwc_test partition (call_date=null) from /apps/hive/warehouse/dtc.db/bwc_test/.hive-staging_hive_2018-11-13_19-10-37_084_8697431764330812894-1/-ext-10000
Error: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MoveTask. HIVE_LOAD_DYNAMIC_PARTITIONS_THREAD_COUNT (state=08S01,code=-101)
Table definition for bwc_master:
CREATE TABLE `bwc_master`(
unique_id bigint,
customer_id string,
direction string,
call_date_time timestamp,
duration int,
billed_duration int,
retail_rate decimal(9,7),
retail_cost decimal(19,7),
billed_tier smallint,
call_type tinyint,
record_status tinyint,
aggregate_id bigint,
originating_ipaddress string,
originating_number string,
destination_number string,
lrn string,
ocn string,
destination_rate_center string,
destination_lata int,
billed_prefix string,
rate_id string,
wholesale_rate decimal(9,7),
wholesale_cost decimal(19,7),
cnam_dipped boolean,
billed_number_type tinyint,
source_lata int,
source_ocn string,
location_id string,
sippeer_id int,
rate_attempts tinyint,
source_state string,
source_rc string,
destination_country string,
destination_state string,
destination_ip string,
carrier_id string,
rated_date_time timestamp,
partition_id smallint,
encryption_rate decimal(9,7),
encryption_cost decimal(19,7),
trans_coding_rate decimal(9,7),
trans_coding_cost decimal(19,7),
file_name string,
call_id string,
from_tag string,
to_tag string,
unique_record_id string)
PARTITIONED BY (
`call_date` date)
CLUSTERED BY (
customer_id)
INTO 10 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://*****/apps/hive/warehouse/dtc.db/bwc_master'
Can someone help me debug this? I didn't find anything in the logs.
You're missing the "table" keyword before bwc_test; it should be:
insert into table `bwc_test` partition(call_date)
select * from
`bwc_master`;
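
As a side note (this may or may not be related to the -101 error), a dynamic-partition insert like this one generally needs dynamic partitioning enabled for the session; a minimal sketch, assuming your cluster does not already set these:

-- Enable dynamic partitioning for the session before the INSERT
-- (assumption: not already configured cluster-wide).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE `bwc_test` PARTITION (call_date)
SELECT * FROM `bwc_master`;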

Unable to load text data into Hive table as ORC through temporary Hive table

I want to load a .csv file into a Hive table stored as ORC. I came across one post that suggested a workaround, so I executed the queries below:
1) Creating and loading data as a text file into a temporary table:
CREATE TABLE IF NOT EXISTS CrimesData( ID int, Case_Number int, CrimeDate string, Block string , IUCR string,Primary_Type string, Description string, Location_Description string, Arrest string, Domestic string, Beat int, District int, Ward int, Community_Area int, FBI_Code string, X_Coordinate int, Y_Coordinate int, Year int, Updated_On string, Latitude decimal(10,10), Longitude decimal(10,10), CrimeLocation string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '"' LINES TERMINATED BY '\n'
tblproperties("skip.header.line.count"="1")
LOAD DATA LOCAL INPATH '/home/cloudera/Documents/CrimesData.csv' INTO TABLE CrimesData
2) Creating a new table stored as ORC:
CREATE TABLE IF NOT EXISTS CrimesDataORC( ID int, Case_Number int, CrimeDate string, Block string , IUCR string,Primary_Type string, Description string, Location_Description string, Arrest string, Domestic string, Beat int, District int, Ward int, Community_Area int, FBI_Code string, X_Coordinate int, Y_Coordinate int, Year int, Updated_On string, Latitude decimal(10,10), Longitude decimal(10,10), CrimeLocation string)
STORED AS ORC;
3) Insert data into the new table from temporary table:
INSERT INTO TABLE CrimesDataORC SELECT * FROM CrimesData;
The first two steps execute without any error, but step 3 throws the following error:
Error while processing statement: FAILED: Execution Error, return code
2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I am running the above queries on Cloudera Manager Quickstart VM 5.8.
I'm not sure where I'm going wrong, as similar steps for another table in the same database work as expected.
It might be data that doesn't comply with the table structure. Rather than inserting all the data at once, try adding WHERE conditions to the SELECT so you insert slices and can narrow down which rows are the problem.
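
For example, a minimal sketch of that approach (the Year cutoff below is hypothetical; pick slices that suit your data):

-- Insert the data in slices; the slice that fails contains the bad rows.
INSERT INTO TABLE CrimesDataORC SELECT * FROM CrimesData WHERE Year < 2010;
INSERT INTO TABLE CrimesDataORC SELECT * FROM CrimesData WHERE Year >= 2010;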

Query on Bucketed Table

I created a bucketed table as follows:
drop table if exists bi_st.st_usr_member_active_day_test;
CREATE TABLE `bi_st.st_usr_member_active_day_test`(
`cal_dt_from` string,
`cal_dt_to` string,
`memberid` string,
`vipcode` string,
`vipleavel` string,
`cityid` string,
`cityname` string,
`groupid` int,
`groupname` string,
`storeid` int,
`storename` string,
`sectionid` int,
`sectionname` string,
`promotionid` string,
`promotionname` string,
`moduleid` string,
`modulename` string,
`activeness_today` string,
`new_vip_class` string
)
clustered by (storeid) into 2 buckets
row format delimited fields terminated by '\t'
stored as orc TBLPROPERTIES('transactional'='true');
I then inserted some data into it and ran:
select * from bi_st.st_usr_member_active_day_test where storeid = 193;
It failed with an ArrayIndexOutOfBoundsException. Can anybody explain this? Thanks

Insert data of 2 Hive external tables into a new external table with an additional column

I have two external Hive tables, shown below. I populated them with data from Oracle using Sqoop.
create external table transaction_usa
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_usa';
create external table transaction_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_canada';
Now I want to merge the data of the two tables above, as-is, into one external Hive table that has all the same fields plus one extra column, source_table, identifying which table each row came from. The new external table is as follows.
create external table transaction_usa_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int,
source_table string
)
row format delimited
stored as textfile
location '/user/gds/bank_ds/tran_usa_canada';
How can I do this?
You SELECT from each table, combine the results with UNION ALL, and insert the combined result into your third table. Below is the final Hive query:
INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_usa' AS source_table FROM transaction_usa
UNION ALL
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_canada' AS source_table FROM transaction_canada;
Hope this helps!
You can also do this with manual partitioning.
CREATE TABLE transaction_new_table (
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
PARTITIONED BY (sourcetablename String);
Then run the command below once per source table:
load data inpath 'hdfspath' into table transaction_new_table partition(sourcetablename='1');
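
Concretely, a sketch using the paths from the question (the partition values here are a suggestion; also note that LOAD DATA INPATH moves the files, so the original external tables would no longer see them):

-- One LOAD per source directory, each into its own partition.
-- Caution: LOAD DATA INPATH moves files out of the source location.
load data inpath '/user/stg/bank_stg/tran_usa' into table transaction_new_table partition(sourcetablename='transaction_usa');
load data inpath '/user/stg/bank_stg/tran_canada' into table transaction_new_table partition(sourcetablename='transaction_canada');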
You could use the INSERT INTO clause of Hive (note it is INSERT INTO TABLE <name>, with "table" written only once):
INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, ..., 'transaction_usa' FROM transaction_usa;
INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, ..., 'transaction_canada' FROM transaction_canada;
