I downloaded the elasticsearch-hadoop 2.1.2 JAR and followed the guide to configure it in Hadoop (v5.4.4). Everything looks OK, but I am getting a ClassCastException while reading from the Elasticsearch source. Below is the error message:
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.elasticsearch.hadoop.mr.WritableArrayWritable cannot be cast to org.apache.hadoop.io.Text
Below is the table created in Hive:
CREATE EXTERNAL TABLE Log_Event_ICS_ES(
product_version string,
agent_host string,
product_name string,
temp_time_stamp bigint,
log_message string,
org_id string,
log_datetime timestamp,
message string,
log_source_provider string,
log_source_name string,
log_message_for_trending string,
index_only_message string,
log_level string,
code_source string,
log_type string,
full_message string,
session_log_operation string,
source_received_time timestamp
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'log_event_2015-05-11/log_event',
'es.nodes' = '',
'es.port' = ''
)
The select query: select * from log_event_ics_es
Any idea?
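For reference, this cast error usually means that at least one field in the Elasticsearch index holds an array while the matching Hive column is declared as a scalar string. A hedged sketch of how such a field would be declared, assuming message were the multi-valued field (the table name log_event_ics_es_v2 and the choice of message are illustrative assumptions, not taken from the question):
CREATE EXTERNAL TABLE log_event_ics_es_v2(
org_id string,
message array<string> -- array type instead of string for the multi-valued field
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'log_event_2015-05-11/log_event',
'es.nodes' = '',
'es.port' = ''
);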
I'm new to Hive and creating my first table!
For some reason all non-string values are showing as NULL (including INT, BOOLEAN, etc.).
My data looks like this sample row:
58;"management";"married";"tertiary";"no";2143;"yes";"no";"unknown";5;"may";261;1;-1;0;"unknown";"no"
I used this to create the table:
create external table bank_dataset(
age TINYINT,
job string,
education string,
default BOOLEAN,
balance INT,
housing BOOLEAN,
loan BOOLEAN,
contact STRING,
day STRING,
month STRING,
duration INT,
campaign INT,
pdays INT,
previous INT,
poutcome STRING,
y BOOLEAN)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u003B'
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");
Thanks for the comments, it worked! But I have one issue: for every row I get all the data correctly, and then I get extra columns of NULL values. Find my code below:
create external table bank_dataset(
age TINYINT,
job string,
education string,
default BOOLEAN,
balance INT,
housing BOOLEAN,
loan BOOLEAN,
contact STRING,
day INT,
month STRING,
duration INT,
campaign INT,
pdays INT,
previous INT,
poutcome STRING,
y BOOLEAN)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\u003B",
"quoteChar" = '"'
)
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");
Any suggestions?
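For what it's worth, the sample row above has 17 fields while both DDLs declare only 16 columns, so every field after job is shifted by one position. If the missing field is the marital column of the classic bank-marketing dataset (an assumption, not confirmed by the question), a sketch of the corrected DDL would be:
create external table bank_dataset(
age TINYINT,
job string,
marital string, -- assumed missing column; the sample's "married" value would land here
education string,
`default` BOOLEAN, -- backticked defensively in case default is treated as a keyword
balance INT,
housing BOOLEAN,
loan BOOLEAN,
contact STRING,
day INT,
month STRING,
duration INT,
campaign INT,
pdays INT,
previous INT,
poutcome STRING,
y BOOLEAN)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\u003B",
"quoteChar" = '"'
)
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");
Note that OpenCSVSerde reads every column as a string regardless of the declared types, so the non-string types above are effectively documentation; cast them in queries if you need real numerics or booleans.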
Failed!!
Create the table for the schema below:
schema = {
  "type": "record",
  "name": "topLevelRecord",
  "fields": [
    {"name": "MESSAGE_ID", "type": ["string", "null"]},
    {"name": "MSGNAME", "type": ["string", "null"]},
    {"name": "SOURCE", "type": ["string", "null"]},
    {"name": "EVENT_DATETIME", "type": ["string", "null"]},
    {"name": "CUSTOMER_ORDER_ID", "type": ["string", "null"]},
    {"name": "SP_ORGANISATION_NAME", "type": ["string", "null"]},
    {"name": "CUSTOMER_ACCOUNT_ID", "type": ["string", "null"]},
    {"name": "ORDER_TYPE_NAME", "type": ["string", "null"]},
    {"name": "ORDER_SUBTYPE_NAME", "type": ["string", "null"]},
    {"name": "ORDER_REASON_NAME", "type": ["string", "null"]},
    {"name": "ORDER_CREATED_DATE", "type": ["string", "null"]},
    {"name": "ORDER_CREATED_CHANNEL_NAME", "type": ["string", "null"]},
    {"name": "ORDER_CREATED_RETAILER_ID", "type": ["string", "null"]},
    {"name": "ORDER_CREATED_DEALER_ID", "type": ["string", "null"]},
    {"name": "ORDER_CREATED_AFFILIATE_ID", "type": ["string", "null"]},
    {"name": "ORDER_CREATED_EMPLOYEE_ID", "type": ["string", "null"]},
    {"name": "ORDER_CREATED_CONTACT_CENTRE_AGENT_ID", "type": ["string", "null"]},
    {"name": "ORDER_SUBMITTED_DATE", "type": ["string", "null"]},
    {"name": "ORDER_SUBMITTED_CHANNEL_NAME", "type": ["string", "null"]},
    {"name": "ORDER_DUE_DATE", "type": ["string", "null"]},
    {"name": "ONE_TIME_CHARGE_AMT", "type": ["string", "null"]},
    {"name": "RECURRING_CHARGE_AMT", "type": ["string", "null"]},
    {"name": "ORDER_STATUS_NAME", "type": ["string", "null"]},
    {"name": "ORDER_STATUS_CHANGE_REASON_NAME", "type": ["string", "null"]},
    {"name": "CREATE_JOB_RUN_ID", "type": "int"},
    {"name": "CREATE_DATE_TIME", "type": "string"},
    {"name": "SYSTEM_ID", "type": "int"},
    {"name": "SRC_FILE_NAME", "type": "string"}
  ]
}
I am new to Hive; I just tried things out by looking around and came up with the query below:
CREATE EXTERNAL TABLE governed_data.customer_order(
message_id string,
msgname string,
source string,
event_datetime string,
customer_order_id string,
sp_organisation_name string,
customer_account_id string,
order_type_name string,
order_subtype_name string,
order_reason_name string,
order_created_date string,
order_created_channel_name string,
order_created_retailer_id string,
order_created_dealer_id string,
order_created_affiliate_id string,
order_created_employee_id string,
order_created_contact_centre_agent_id string,
order_submitted_date string,
order_submitted_channel_name string,
order_due_date string,
one_time_charge_amt string,
recurring_charge_amt string,
order_status_name string,
order_status_change_reason_name string,
create_job_run_id int,
create_date_time string,
system_id int,
src_file_name string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS AVRO
location 'adl://rbsitbinsighstdlt001.azuredatalakestore.net/insights/governed_data/';
Then I want to insert data into the Hive database.
You specified STORED AS AVRO, but the SerDe is JsonSerDe; these properties conflict.
If you need Avro, then specify the SerDe as org.apache.hadoop.hive.serde2.avro.AvroSerDe, the input format as org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat, and the output format as org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat. Also provide a location from which the AvroSerde will pull the most current schema for the table.
See example here: Creating Avro-backed Hive tables
Or simply specify STORED AS AVRO without any SerDe, input format, or output format: remove the ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' line from your DDL, as in the sketch below.
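A minimal sketch of that simpler form, with the same columns and location as the question's DDL (column list elided here for brevity):
CREATE EXTERNAL TABLE governed_data.customer_order(
message_id string,
msgname string,
-- ... remaining columns exactly as in the original DDL ...
src_file_name string)
STORED AS AVRO
LOCATION 'adl://rbsitbinsighstdlt001.azuredatalakestore.net/insights/governed_data/';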
And if you want JsonSerDe to parse attributes then create table like this:
CREATE EXTERNAL TABLE governed_data.customer_order(message_id string,
msgname string,
source string,
event_datetime string,
customer_order_id string,
sp_organisation_name string,
customer_account_id string,
order_type_name string,
order_subtype_name string,
order_reason_name string,
order_created_date string,
order_created_channel_name string,
order_created_retailer_id string,
order_created_dealer_id string,
order_created_affiliate_id string,
order_created_employee_id string,
order_created_contact_centre_agent_id string,
order_submitted_date string,
order_submitted_channel_name string,
order_due_date string,
one_time_charge_amt string,
recurring_charge_amt string,
order_status_name string,
order_status_change_reason_name string,
create_job_run_id int,
create_date_time string,
system_id int,
src_file_name string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'adl://rbsitbinsighstdlt001.azuredatalakestore.net/insights/governed_data/'
;
Also read the docs about JsonSerDe.
I am trying to run a simple insert statement as below:
insert into table `bwc_test` partition(call_date)
select * from
`bwc_master`;
Then it fails with the below error:
INFO : Loading data to table dtc.bwc_test partition (call_date=null) from /apps/hive/warehouse/dtc.db/bwc_test/.hive-staging_hive_2018-11-13_19-10-37_084_8697431764330812894-1/-ext-10000
Error: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MoveTask. HIVE_LOAD_DYNAMIC_PARTITIONS_THREAD_COUNT (state=08S01,code=-101)
Table definition for bwc_master:
CREATE TABLE `bwc_master`(
unique_id bigint,
customer_id string,
direction string,
call_date_time timestamp,
duration int,
billed_duration int,
retail_rate decimal(9,7),
retail_cost decimal(19,7),
billed_tier smallint,
call_type tinyint,
record_status tinyint,
aggregate_id bigint,
originating_ipaddress string,
originating_number string,
destination_number string,
lrn string,
ocn string,
destination_rate_center string,
destination_lata int,
billed_prefix string,
rate_id string,
wholesale_rate decimal(9,7),
wholesale_cost decimal(19,7),
cnam_dipped boolean,
billed_number_type tinyint,
source_lata int,
source_ocn string,
location_id string,
sippeer_id int,
rate_attempts tinyint,
source_state string,
source_rc string,
destination_country string,
destination_state string,
destination_ip string,
carrier_id string,
rated_date_time timestamp,
partition_id smallint,
encryption_rate decimal(9,7),
encryption_cost decimal(19,7),
trans_coding_rate decimal(9,7),
trans_coding_cost decimal(19,7),
file_name string,
call_id string,
from_tag string,
to_tag string,
unique_record_id string)
PARTITIONED BY (
`call_date` date)
CLUSTERED BY (
customer_id)
INTO 10 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://*****/apps/hive/warehouse/dtc.db/bwc_master'
Can someone help me debug this? I didn't find anything in the logs.
You are missing the "table" keyword before bwc_test:
insert into table `bwc_test` partition(call_date)
select * from
`bwc_master`;
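Independently of that, dynamic-partition inserts need the usual session settings. A hedged sketch with stock Hive properties; whether these resolve this particular HIVE_LOAD_DYNAMIC_PARTITIONS_THREAD_COUNT error is not confirmed:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
-- the error name points at this property; lowering it to 1 disables parallel
-- partition loading (a hedged suggestion, not a confirmed fix)
SET hive.load.dynamic.partitions.thread = 1;

insert into table `bwc_test` partition(call_date)
select * from `bwc_master`;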
I have created an external table as below:
create external table if not exists complaints (
date_received string,
product string,
sub_product string,
issue string,
sub_issue string,
consumer_complaint_narrative string,
state string,
company_public_response string,
company varchar(50),
zipcode int,
tags string,
consumer_consent_provided string,
submitted_via string,
date_sent_company string,
company_response string,
timely_response string,
consumer_disputed string,
complaint_id int)
row format delimited fields terminated by ','
stored as textfile
location 'hdfs://hostname:8020/complaints/';
Now I want to create another table, complaints_new, partitioned by state, with all the data from the table above. How can this be achieved?
I tried the below:
create external table if not exists complaints_new (
date_received string,
product string,
sub_product string,
issue string,
sub_issue string,
consumer_complaint_narrative string,
company_public_response string,
company varchar(50),
zipcode int,
tags string,
consumer_consent_provided string,
submitted_via string,
date_sent_company string,
company_response string,
timely_response string,
consumer_disputed string,
complaint_id int)
partitioned by (state varchar(20))
row format delimited fields terminated by ','
stored as textfile
location 'hdfs://hostname:8020/complaints/';
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.mapred.mode = nonstrict;
insert into table complaints_new partition(state) select * from complaints;
The query is failing.
You have a few problems here. First, you are pointing complaints_new at the same location as complaints, which means you would be reading from and overwriting the same directory. Second, Hive expects the partition column to be the last element of the select list, so you cannot do select *; instead you have to select field by field and put state at the end of your select statement, as in the sketch below.
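A sketch of that insert, with the column list taken from the complaints DDL above and state moved to the end (complaints_new should also be given its own location, separate from complaints):
insert into table complaints_new partition(state)
select
date_received,
product,
sub_product,
issue,
sub_issue,
consumer_complaint_narrative,
company_public_response,
company,
zipcode,
tags,
consumer_consent_provided,
submitted_via,
date_sent_company,
company_response,
timely_response,
consumer_disputed,
complaint_id,
state
from complaints;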
I created a bucketed table as follows:
drop table if exists bi_st.st_usr_member_active_day_test;
CREATE TABLE `bi_st.st_usr_member_active_day_test`(
`cal_dt_from` string,
`cal_dt_to` string,
`memberid` string,
`vipcode` string,
`vipleavel` string,
`cityid` string,
`cityname` string,
`groupid` int,
`groupname` string,
`storeid` int,
`storename` string,
`sectionid` int,
`sectionname` string,
`promotionid` string,
`promotionname` string,
`moduleid` string,
`modulename` string,
`activeness_today` string,
`new_vip_class` string
)
clustered by (storeid) into 2 buckets
row format delimited fields terminated by '\t'
stored as orc TBLPROPERTIES('transactional'='true');
And then I inserted some data into it and ran:
select * from bi_st.st_usr_member_active_day_test where storeid = 193;
It failed with an array-index-out-of-bounds error. Can anybody explain this? Thanks.
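For reference, transactional (ACID) tables like this one are normally read and written with the transaction manager enabled in the session. A hedged sketch of the stock Hive properties; whether they resolve this particular out-of-bounds error is not confirmed:
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

select * from bi_st.st_usr_member_active_day_test where storeid = 193;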