I created a Hive external table to query the existing data in Elasticsearch, like below:
CREATE EXTERNAL TABLE ods_es_data_inc (
  `agent_id` STRING,
  `dt_server_time` TIMESTAMP
) COMMENT 'bb_i_app'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource'='data*',
  'es.nodes'='ip',
  'es.port'='port',
  'es.net.http.auth.user'='user',
  'es.net.http.auth.pass'='pass'
);
When I query the date field in the Hive external table, I get the error below:
Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritableV2 (state=,code=0)
My situation is very similar to this problem, but I already used the TIMESTAMP type when I created the external table.
My component versions:
Hive:3.1.0
ES-Hadoop:6.8.7
Elasticsearch:6.7.1
I switched Hive's execution engine from MR to Spark, and the error did not change. Having ruled out the execution engine, I don't know whether this is a version mismatch or a problem with the table definition.
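One workaround sometimes used when a storage handler predates Hive 3's TimestampWritableV2 is to map the ES date field as STRING and convert at query time. This is a sketch, not a confirmed fix for ES-Hadoop 6.8.7; the table name ods_es_data_inc_str and the sample value format are assumptions:
-- read the ES date as plain text to avoid the Writable cast entirely
CREATE EXTERNAL TABLE ods_es_data_inc_str (
  `agent_id` STRING,
  `dt_server_time` STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource'='data*',
  'es.nodes'='ip',
  'es.port'='port',
  'es.net.http.auth.user'='user',
  'es.net.http.auth.pass'='pass'
);

-- convert in the query instead of in the schema; assumes ISO 8601 values
-- such as 2021-01-01T12:00:00.000Z, truncated to seconds before the cast
SELECT agent_id,
       CAST(regexp_replace(substr(dt_server_time, 1, 19), 'T', ' ') AS TIMESTAMP) AS dt_server_time
FROM ods_es_data_inc_str;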
Related
I have crawled some data via Nutch 2.3.1. The data is stored in an HBase 0.98 table. I created an external table that imports the data from the HBase table. Now I have to index this data into Solr 4.10.3. For that I followed this well-known tutorial. I created the Hive table like this:
create external table if not exists solr_items (
id STRING,
content STRING,
url STRING,
title STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
stored by "com.chimpler.hive.solr.SolrStorageHandler"
with serdeproperties ("solr.column.mapping"="id,content,url,title")
tblproperties ("solr.url" = "http://localhost:8983/solr/collection1") ;
There was some problem when I tried to copy data from HBase, posted here. So I decided to first index some dummy data. For that, I decided to load data from a file:
LOAD DATA LOCAL INPATH 'data.csv3' OVERWRITE INTO TABLE solr_items;
But it gave the following error:
FAILED: SemanticException [Error 10101]: A non-native table cannot be used as target for LOAD
Where is the problem?
My Hadoop version is 1.2.1.
You can't use LOAD DATA with non-native tables (tables created with STORED BY), which is exactly what the error says. From the Hive LanguageManual DML:
Hive does not do any transformation while loading data into tables.
Load operations are currently pure copy/move operations that move
datafiles into locations corresponding to Hive tables.
Hive obviously can't just copy files in the case of a Solr-backed table, because Solr uses its own internal data representation.
You can insert though:
insert into table solr_items select * from tempTable;
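For the dummy-data test, that means staging through a native table first. A minimal sketch (the name tempTable and its layout, mirroring solr_items, are assumptions):
-- native (non-STORED BY) staging table, which LOAD DATA can target
create table tempTable (
  id STRING,
  content STRING,
  url STRING,
  title STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';

LOAD DATA LOCAL INPATH 'data.csv3' OVERWRITE INTO TABLE tempTable;

-- now push the rows through the Solr storage handler
insert into table solr_items select * from tempTable;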
I created an external table in Hive, which was created successfully:
create external table load_tweets(id BIGINT,text STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/cloudera/data/tweets_raw';
But, when I did:
hive> select * from load_tweets;
I got the below error:
Failed with exception java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.ByteArrayInputStream@5dfb0646; line: 1, column: 2]
Please suggest how to fix this. Is the Twitter output file that was created using Flume corrupted, or is it something else?
You'll need to do two additional things.
1) Put data into the file (perhaps using INSERT). Or maybe it's already there. In either case, you'll then need to
2) from Hive, msck repair table load_tweets;
For Hive tables, the schema and other meta-information about the data is stored in what's called the Hive Metastore; it's actually a relational database under the covers. When you perform operations on Hive tables created without the LOCATION keyword (that is, internal, not external tables), Hive will automatically update the metastore.
In most Hive use cases, however, data is appended to files by other processes, so external tables are common. If new partitions are created externally, you need to force the metastore to sync with the current state of the data before you can query them, using msck repair table <tablename>;.
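For example, with a partitioned external table (the table name, partition column, and paths here are hypothetical):
create external table tweets_part (id BIGINT, text STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/cloudera/data/tweets_part';

-- after another process writes files under .../tweets_part/dt=2016-01-01/
msck repair table tweets_part;
select * from tweets_part where dt = '2016-01-01';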
I'm totally stuck fetching data from a Hive external table. Here is what I have done so far.
I had a managed table with a date field whose value is 2014-10-23.
I created an external table to store the data in Elasticsearch, like below:
create external table ext3 (
run_date date)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'dfs/ext3', 'es.field.read.empty.as.null' = 'true','es.nodes'=);
Then I inserted one row into the external table to create the Elasticsearch index and mapping.
Problem 1:
My Elasticsearch field was created as a string.
Later I changed the mapping in Elasticsearch to date:
"run_date":{"type":"date", "format": "yyyy-MM-ddZ", "index": "not_analyzed"}
Then I re-inserted the data into the external table. When I query Elasticsearch it looks fine; the value is displayed as '2014-10-23+08:00'.
Problem 2:
When I query the external table, e.g. select count(*) from ext3, I get the error below.
2015-04-17 18:45:34,254 FATAL [main] org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritable
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDateObjectInspector.getPrimitiveWritableObject(WritableDateObjectInspector.java:38)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:259)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:349)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:193)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:179)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:545)
Please help me with this; I've lost a whole day to it. I have another external table with more data, and I need to join these two tables and create a view so my consolidated data is ready for analysis.
I think the error gives a clue to your problem:
Error getting row data with exception java.lang.ClassCastException:
org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to
org.apache.hadoop.hive.serde2.io.DateWritable
You have a date field in your Hive table, but the data you have inserted is of type timestamp.
Re-create your table (or create a new one if you don't want to replace it) with a timestamp column:
CREATE EXTERNAL TABLE ext3 ( run_date timestamp )
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'dfs/ext3', 'es.field.read.empty.as.null' = 'true','es.nodes'=);
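Then reload it from your managed table. A minimal sketch, assuming the source table is called managed_source and its run_date column is a DATE (the cast keeps the types aligned):
INSERT OVERWRITE TABLE ext3
SELECT CAST(run_date AS TIMESTAMP) FROM managed_source;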
We are very new to Hadoop and Hive. We created a normal Hive table and loaded data as well, but we are facing an issue when creating a table in Hive with JSON format. I have added the SerDe jar too. We get the following error:
create table airline_tables(Airline string, Airlineid string, Sourceairport string, Sourceairportid string, Destinationairport string, Destinationairportid string, Codeshare string, Stop string, Equipment string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
location '/home/hduser/part-000';
FAILED: Error in metadata: java.lang.NullPointerException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
The location is in HDFS.
I am using Hive 0.9.0 and Hadoop 1.0.1.
As far as I can see, you are creating a native (managed) Hive table, and in that case you need to load the data into the table. If you don't want to load the data and instead just want to point the table at an existing location, you need the keyword EXTERNAL, which I think you missed. Re-create the table as create external table airline_tables(blah blah.....), as in the sketch below.
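A sketch of the corrected DDL, keeping the original SerDe (note that LOCATION should point at a directory of data files; if part-000 is a single file, use its parent directory instead):
create external table airline_tables(
  Airline string,
  Airlineid string,
  Sourceairport string,
  Sourceairportid string,
  Destinationairport string,
  Destinationairportid string,
  Codeshare string,
  Stop string,
  Equipment string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
location '/home/hduser/part-000'; -- should be a directory containing the JSON files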
External table creation via Hive JDBC isn't reflected in the Hive data warehouse, whereas normal table creation inside the Hive data warehouse happens without any issue.
After creating the table via Hive JDBC,
stmt.executeQuery("create external table trial (TOPIC STRING) row format delimited fields terminated by '' STORED as TEXTFILE LOCATION '/user/ranjitha/trial'");
No error was returned.
But when I try retrieving from the table trial, nothing is returned.
In this link, https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/cdh-user/YTekdFtbelE, it says external table creation is not possible using Hive JDBC.
It would be really helpful if someone could guide me on the above. Is this not possible with JDBC, or is there an alternative?
Thanks