Unable to query date field in Hive external table - elasticsearch

Totally stuck fetching data from a Hive external table. Here is what I have done so far.
I had a managed table with a date field whose value is 2014-10-23.
I created an external table to store the data in Elasticsearch, like below:
create external table ext3 (
run_date date)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'dfs/ext3', 'es.field.read.empty.as.null' = 'true','es.nodes'=);
I inserted one row into the external table to create the Elasticsearch index and mapping.
Problem 1:
My Elasticsearch field was created as string.
I later changed the mapping in Elasticsearch to date:
"run_date":{"type":"date", "format": "yyyy-MM-ddZ", "index": "not_analyzed"}
Then I re-inserted the data into the external table. When I query Elasticsearch it looks fine; the value is displayed as '2014-10-23+08:00'.
Problem 2:
When I query the external table, e.g. select count(*) from ext3, I get the error below.
2015-04-17 18:45:34,254 FATAL [main] org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritable
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDateObjectInspector.getPrimitiveWritableObject(WritableDateObjectInspector.java:38)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:259)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:349)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:193)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:179)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:545)
Please help me with this; a whole day has been wasted. I have another external table with more data; I need to join these two tables and create a view so that my consolidated data is ready for analysis.

I think the error gives a clue to your problem:
Error getting row data with exception java.lang.ClassCastException:
org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to
org.apache.hadoop.hive.serde2.io.DateWritable
You have a date field in your Hive table, but the data you have inserted is of type timestamp.
Re-create your table (or create a new one if you don't want to replace it):
CREATE EXTERNAL TABLE ext3 ( run_date timestamp )
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'dfs/ext3', 'es.field.read.empty.as.null' = 'true','es.nodes'=);
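If you still want a date value on the Hive side, a minimal sketch (assuming the table has been re-created with run_date as timestamp, as above) is to cast at query time:
-- read the timestamp column and cast it back to a date in the query
SELECT CAST(run_date AS DATE) AS run_date, COUNT(*) AS cnt
FROM ext3
GROUP BY CAST(run_date AS DATE);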

Related

An error occurred when using Hive to query ES

I created a Hive external table to query the existing data in ES, like below:
CREATE EXTERNAL TABLE ods_es_data_inc
(`agent_id` STRING,
`dt_server_time` TIMESTAMP
) COMMENT 'bb_i_app'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource'='data*',
'es.nodes'='ip',
'es.port'='port',
'es.net.http.auth.user'='user',
'es.net.http.auth.pass'='pass'
)
When I query a date field in the Hive external table, I get the error below:
Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritableV2 (state=,code=0)
My situation is very similar to the problem above, but I already used the timestamp type when I created the external table.
My component versions:
Hive: 3.1.0
ES-Hadoop: 6.8.7
Elasticsearch: 6.7.1
I switched Hive's execution engine from MR to Spark and the error did not change. Having ruled out the execution engine, I don't know whether this is a version mismatch or a problem with the table definition.
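One workaround that is sometimes suggested for this kind of class-cast error (not confirmed for this exact Hive/ES-Hadoop version combination) is to avoid declaring the column as TIMESTAMP on the Hive side, so the old TimestampWritable produced by the connector never has to be coerced into Hive 3's TimestampWritableV2. A minimal sketch, assuming the same es.* properties as above and a hypothetical table name:
-- hypothetical workaround table: read the ES date field as a string
CREATE EXTERNAL TABLE ods_es_data_inc_str (
`agent_id` STRING,
`dt_server_time` STRING
) COMMENT 'bb_i_app'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource'='data*',
'es.nodes'='ip',
'es.port'='port',
'es.net.http.auth.user'='user',
'es.net.http.auth.pass'='pass'
);
-- cast (or parse with a date function) at query time; whether the cast
-- succeeds depends on the date format stored in the ES index
SELECT agent_id, CAST(dt_server_time AS TIMESTAMP) AS dt_server_time
FROM ods_es_data_inc_str;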

Cannot read timestamp data from an S3 table with Parquet data through Hive, LongWritable cannot be cast to TimestampWritableV2

So I'm trying to read a table that points to an S3 bucket with Parquet files. The table DDL has the input format org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
and the output format org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.
I'm getting this error when doing a simple select * from table:
org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritableV2
I'm able to query the data in Athena, and I've searched endlessly on Google. One solution I've seen is to recreate the table but change the data type to string, then convert to timestamp when querying the table.
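A sketch of the recreate-and-cast workaround mentioned above; all table, column, and bucket names are placeholders. Since the error mentions LongWritable, the Parquet files most likely store the value as an epoch number, so declaring the column as BIGINT (rather than STRING) and converting at query time may be the variant that actually reads:
-- hypothetical recreation of the table with the problem column widened
CREATE EXTERNAL TABLE my_table_fixed (
id STRING,
event_time BIGINT  -- or STRING, per the workaround described above
)
STORED AS PARQUET
LOCATION 's3://my-bucket/my-prefix/';
-- convert to a timestamp at query time (assumes epoch seconds;
-- divide by 1000 first if the values are epoch milliseconds)
SELECT id, CAST(from_unixtime(event_time) AS TIMESTAMP) AS event_time
FROM my_table_fixed;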

How to change the data type of a column on a partitioned external Hive table (Parquet) without deleting data?

I have a partitioned external Hive table with data loaded from Parquet files. A few columns in that table require a data type change (TIMESTAMP -> STRING). Currently, querying these columns returns NULL values because of the wrong data type.
I ran ALTER TABLE table_name CHANGE col_1 col_1 STRING; which successfully changed the data type for that column to STRING, but when I query the table again, the data is still showing NULL. Is there a way to update the data without dropping the partitions and re-loading the data from scratch?
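For a partitioned table, the table-level ALTER alone is often not enough, because each existing partition keeps its own column metadata. A hedged sketch of the usual fix (CASCADE requires Hive 1.1.0 or later; the partition spec below is a placeholder):
-- propagate the type change to existing partitions as well
ALTER TABLE table_name CHANGE COLUMN col_1 col_1 STRING CASCADE;
-- or update a single partition explicitly (placeholder partition spec)
ALTER TABLE table_name PARTITION (dt='2020-01-01') CHANGE COLUMN col_1 col_1 STRING;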

Index Hbase data to solr via Hive external table

I have crawled some data via Nutch 2.3.1. The data is stored in an HBase 0.98 table. I created an external table that imports data from the HBase table. Now I have to index this data into Solr 4.10.3. For that I followed this well-known tutorial. I created a Hive table like:
create external table if not exists solr_items (
id STRING,
content STRING,
url STRING,
title STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
stored by "com.chimpler.hive.solr.SolrStorageHandler"
with serdeproperties ("solr.column.mapping"="id,content,url,title")
tblproperties ("solr.url" = "http://localhost:8983/solr/collection1") ;
There was some problem when I tried to copy data from HBase, posted here. So I decided to first index some dummy data. For that I loaded data from a file like
LOAD DATA LOCAL INPATH 'data.csv3' OVERWRITE INTO TABLE solr_items;
But it gave the following error:
FAILED: SemanticException [Error 10101]: A non-native table cannot be used as target for LOAD
Where is the problem?
Hadoop version is 1.2.1.
You can't use LOAD DATA for non-native tables (tables backed by a storage handler). From the Hive LanguageManual DML:
Hive does not do any transformation while loading data into tables.
Load operations are currently pure copy/move operations that move
datafiles into locations corresponding to Hive tables.
Hive obviously can't just copy the data files in the case of a Solr-backed external table, because Solr uses its own internal data representation.
You can insert though:
insert into table solr_items select * from tempTable;
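A sketch of the full path this implies: stage the CSV in a native Hive table first, then insert into the Solr-backed table. The staging table name and delimiter are assumptions:
-- native staging table that LOAD DATA can target
CREATE TABLE solr_items_stage (
id STRING,
content STRING,
url STRING,
title STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
-- a plain file copy into the native table works
LOAD DATA LOCAL INPATH 'data.csv3' OVERWRITE INTO TABLE solr_items_stage;
-- push the rows through the Solr storage handler
INSERT INTO TABLE solr_items SELECT * FROM solr_items_stage;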

How to query sorted/indexed columns in Impala

I have to build a POC with Hadoop for a database using interactive queries (a log database of roughly 300 TB). I'm trying Impala, but I didn't find any way to use sorted or indexed data. I'm a newbie, so I don't even know if it is possible.
How do I query sorted/indexed columns in Impala?
By the way, here is my table's DDL (simplified).
I would like fast access on the column_to_sort column below.
CREATE TABLE IF NOT EXISTS myTable (
unique_id STRING,
column_to_sort INT,
content STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
STORED AS textfile;
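Impala has no secondary indexes, so the usual substitute is to partition on a coarsened version of the filter column and store the data as Parquet, so Impala can prune partitions and skip row groups. A minimal sketch under those assumptions; the table name and bucketing expression are illustrative only:
-- partition by a coarse bucket of the sort column; store as Parquet
CREATE TABLE IF NOT EXISTS myTable_part (
unique_id STRING,
column_to_sort INT,
content STRING
)
PARTITIONED BY (sort_bucket INT)
STORED AS PARQUET;
-- load from the text table, deriving the partition key (dynamic partitioning)
INSERT INTO myTable_part PARTITION (sort_bucket)
SELECT unique_id, column_to_sort, content,
CAST(column_to_sort / 1000 AS INT) AS sort_bucket
FROM myTable;
-- queries that filter on column_to_sort can also restrict sort_bucket,
-- so Impala only scans the matching partitions
SELECT * FROM myTable_part
WHERE sort_bucket = 42 AND column_to_sort BETWEEN 42000 AND 42999;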
