We have developed a custom input format to process EDI files. After the recent upgrade to CDH 5.8, select * from the table no longer returns any rows.
Hive script:
create external table CustomInputTest (
  all_cols String
)
STORED AS INPUTFORMAT 'parser.mapred.X12InputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/data/EDI'
TBLPROPERTIES ('edi.schema.hdfs.path' = '/user/data/layout/edi.xsl');
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
select * from CustomInputTest;
The same script returns expected output on Hive 1.1.0-cdh5.4.9.
On CDH 5.8, if the Hive fetch task is disabled to force the query to run as a MapReduce job, the select query works fine:
set hive.fetch.task.conversion=none;
I've checked the Hive server logs and I don't see any errors.
How can I fix this so that the Hive fetch task works in the new version (5.8)?
I created a partitioned Hive table using the following query
`cid` string COMMENT '',
`member` string COMMENT '',
`account` string COMMENT '')
PARTITIONED BY (update_period string)
I'm writing to the partition location using a MapReduce program. When I read the output files using avro-tools, they show the correct data in JSON format. But when I query the table in Hive, nothing is displayed. If I don't use a partition field when creating the table, the values are displayed in Hive. What could be the reason for this? I specify the output location for the MapReduce program as "/user/customer/update_period=201811".
Do I need to add anything to the MapReduce program configuration to resolve this?
You need to run MSCK REPAIR TABLE each time you load a new partition into the table's HDFS location.
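As a minimal sketch for the layout described in the question: the table name customer below is hypothetical (the original CREATE TABLE line is not shown), while the partition column and path are taken from the question. Either option should make the partition visible, assuming the table's location is /user/customer.

```sql
-- Hypothetical table name: customer (not given in the question).
-- Option 1: scan the table location and register any partition
-- directories that the metastore does not know about yet.
MSCK REPAIR TABLE customer;

-- Option 2: register only the partition written by the MapReduce job.
ALTER TABLE customer ADD IF NOT EXISTS
  PARTITION (update_period = '201811')
  LOCATION '/user/customer/update_period=201811';

-- Check that the partition is now listed.
SHOW PARTITIONS customer;
```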
Why do we need to run the MSCK REPAIR TABLE statement after each ingestion?
Hive stores a list of partitions for each table in its metastore. However, when new partitions are added directly to HDFS, the metastore (and hence Hive) will not be aware of them unless the user registers the newly added partitions in one of the following ways.
1. Add each partition to the table
hive> alter table <db_name>.<table_name> add partition(`date`='<date_value>')
location '<hdfs_location_of the specific partition>';
2. Run the metastore check with the repair table option
hive> msck repair table <db_name>.<table_name>;
This adds metadata to the Hive metastore for partitions that do not already have it. In other words, it adds any partitions that exist on HDFS but not in the metastore.
I have a Hive database in which I created a table compatible with the Parquet file format.
CREATE EXTERNAL TABLE `default.table`(
`date` date,
`udid` string,
`message_token` string)
PARTITIONED BY (
`dt` date)
I added partitions to this table, but I can't query the data.
In Hive: I can see the partitions when using "Show partitions from default.table", and I get the number of rows when using "Select count(*) from default.table".
In Presto: I can see the partitions when using "Show partitions from default.table", but when I try to query the data itself, there appears to be no data: "select *" returns nothing, and "select count(*)" returns 0.
Hive cluster is AWS EMR, version: emr-5.9.0, Applications: Hive 2.3.0, Presto 0.184, instance type: r3.2xlarge.
Does someone know why I get these differences between Hive and Presto?
I am running Spark SQL on Hive. I need to add the auto.purge table property when creating a new Hive table. I tried the code below to pass the option when calling the saveAsTable method:
inputDF.write.option("auto.purge", "true").saveAsTable(hiveTableName)
The line above added the property under WITH SERDEPROPERTIES of the table.
I need this property under the TBLPROPERTIES section of the Hive DDL.
I finally found a solution, though I am not sure it is the best one.
Unfortunately, the Spark 1.5 SQL saveAsTable method doesn't support table properties as input. It creates a new tableProperties map before creating the Hive table.
Check out the code below:
To add table properties to an existing Hive table, use the ALTER TABLE command.
ALTER TABLE table_name SET TBLPROPERTIES ('auto.purge'='true');
The command above adds the table property to the Hive metastore.
To drop an existing table inside an encryption zone, run the command above before the DROP command.
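Put together, the sequence might look like the sketch below. table_name is the same placeholder used in the ALTER TABLE command; the SHOW TBLPROPERTIES check is an optional extra step, added here only to confirm that the property landed in TBLPROPERTIES rather than SERDEPROPERTIES.

```sql
-- table_name is a placeholder for the existing Hive table.
ALTER TABLE table_name SET TBLPROPERTIES ('auto.purge'='true');

-- Optional check: list the table properties; auto.purge should appear.
SHOW TBLPROPERTIES table_name;

-- With auto.purge set on a managed table, the dropped data is deleted
-- directly instead of being moved to the trash directory.
DROP TABLE table_name;
```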
I have crawled some data with Nutch 2.3.1. The data is stored in an HBase 0.98 table. I have created an external table that imports data from the HBase table. Now I have to index this data into Solr 4.10.3. For that I followed this well-known tutorial. I created a Hive table like this:
create external table if not exists solr_items (
content STRING,
title STRING
)
stored by "com.chimpler.hive.solr.SolrStorageHandler"
with serdeproperties ("solr.column.mapping"="id,content,url,title")
tblproperties ("solr.url" = "http://localhost:8983/solr/collection1") ;
I ran into a problem when I tried to copy data from HBase (posted here), so I decided to first index some dummy data by loading it from a file like this:
But it gave the following error:
FAILED: SemanticException [Error 10101]: A non-native table cannot be used as target for LOAD
Where is the problem?
The Hadoop version is 1.2.1.
You can't use LOAD DATA for non-native tables, that is, tables backed by a storage handler. From the Hive LanguageManual DML:
Hive does not do any transformation while loading data into tables.
Load operations are currently pure copy/move operations that move
datafiles into locations corresponding to Hive tables.
Hive obviously can't just copy data files in the case of a Solr-backed table, because Solr uses its own internal data representation.
You can insert though:
insert into table solr_items select * from tempTable;
I am new to HBase. Can someone provide a detailed example of how bulk loading can be done into an HBase table?
Say, for example, I have a customer file with 10 columns and 100K rows. I want to load the file into an HBase table.
I have created an HBase table that is managed by Hive and tried to load it using the LOAD command, but it failed.
It looks like I have to insert into the table from HBase only.
hive (Koushik)> CREATE TABLE hive_hbase_emp_sample(eid int, ename string, esal double)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cfstr:enm,cfsal:esl")
> TBLPROPERTIES ("hbase.table.name" = "hive_hbase_emp_sample");
Time taken: 6.404 seconds
hive (Koushik)> load data local inpath '/home/hduser/sample_emp_file' into table hive_hbase_emp_sample;
FAILED: SemanticException [Error 10101]: A non-native table cannot be used as target for LOAD
You cannot directly use LOAD to target a non-native table backed by the HBase storage handler. Instead, load the data into a staging table and then insert into your HBase table using select * from the staging table.
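A minimal sketch of that staging approach for the table in the question. The staging table name emp_staging and the comma delimiter are assumptions (the layout of sample_emp_file is not shown). INSERT OVERWRITE is used here because that is the form shown in Hive's HBase integration documentation.

```sql
-- Hypothetical native staging table; the comma delimiter is an
-- assumption about the layout of sample_emp_file.
CREATE TABLE emp_staging (eid INT, ename STRING, esal DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD works here because emp_staging is a native Hive table.
LOAD DATA LOCAL INPATH '/home/hduser/sample_emp_file' INTO TABLE emp_staging;

-- Copy the rows into the HBase-backed table through the storage handler.
INSERT OVERWRITE TABLE hive_hbase_emp_sample
SELECT eid, ename, esal FROM emp_staging;
```

This writes rows through the storage handler rather than generating HFiles, so it is a plain insert path, not an HBase-native bulk load.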