Hive Index Creation failed - hadoop

I am using Hive version 3.1.0 in my project. I created an external table using the command below.
CREATE EXTERNAL TABLE IF NOT EXISTS testing(ID int,DEPT int,NAME string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
I am trying to create an index on the same external table using the command below.
CREATE INDEX index_test ON TABLE testing(ID)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD ;
But I am getting below error.
Error: Error while compiling statement: FAILED: ParseException line 1:7 cannot recognize input near 'create' 'index' 'user_id_user' in ddl statement (state=42000,code=40000)

According to the Hive documentation, indexing was removed in version 3.0:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Indexing#LanguageManualIndexing-IndexingIsRemovedsince3.0
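Since CREATE INDEX is gone in Hive 3.x, the documentation points to materialized views or to columnar formats with built-in indexes as replacements. A minimal sketch of the ORC route (the table name testing_orc and the bloom-filter property are illustrative, not from the original post):
-- Store a copy of the data as ORC; ORC keeps min/max statistics per stripe
-- and can additionally maintain a bloom filter on ID
CREATE TABLE testing_orc (ID int, DEPT int, NAME string)
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns'='ID');
INSERT INTO testing_orc SELECT ID, DEPT, NAME FROM testing;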

Related

Alter table in Hive not working for SerDe 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' in Apache Hive (version 2.1.1-cdh6.3.4)

Environment:
Apache Hive (version 1.1.0-cdh5.14.2)
I created a table with the DDL below.
create external table test1 (v_src_code string, d_extraction_date date)
partitioned by (d_mis_date date)
row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
with serdeproperties ("field.delim"="~|")
stored as textfile
location '/hdfs_path/test1'
tblproperties ("serialization.null.format"="");
Then I altered the table by adding one extra column, as below.
alter table test1 add columns(n_limit_id bigint);
This works perfectly fine.
But recently our cluster was upgraded. The new environment is:
Apache Hive (version 2.1.1-cdh6.3.4)
The same table was created in this new environment. When I run the same ALTER TABLE, I get the error below.
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error: type expected at the position 0 of '<derived from deserializer>:bigint' but '<' is found. (state=08S01,code=1)
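The post does not include a resolution. One workaround that is commonly suggested for this class of error (an assumption here, not verified against this cluster) is to relax the metastore's column-type compatibility check before altering, since with a custom SerDe the stored column type is the literal string '<derived from deserializer>' and the check cannot parse it:
-- Unverified workaround: disable the incompatible-column-type check for this session,
-- then retry the ALTER. Whether the setting takes effect can depend on the metastore setup.
SET hive.metastore.disallow.incompatible.col.type.changes=false;
alter table test1 add columns(n_limit_id bigint);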

How to delete an external table in Hive when the hdfs path has been deleted?

I removed my HDFS path /user/abc with an rm -R command; some Hive tables were stored under /user/abc/data/abc.db.
My regular tables were deleted correctly with Hive SQL, but my external tables would not drop, failing with the following error:
[Code: 1, SQL State: 08S01] Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to load storage handler: Error in loading storage handler.org.apache.phoenix.hive.PhoenixStorageHandler)
How can I safely delete the tables?
I tried using:
delete from TBL_COL_PRIVS where TBL_ID=[myexternaltableID];
delete from TBL_PRIVS where TBL_ID=[myexternaltableID];
delete from TBLS where TBL_ID=[myexternaltableID];
But it failed with the following error message:
[Code: 10297, SQL State: 42000] Error while compiling statement: FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table sys.TBLS that is not transactional
Thank you,
NB: I know a schema is supposed to be dropped more safely with HiveQL, but in this particular case that was not done.
The solution is to delete the tables' rows directly in the Hive Metastore database (PostgreSQL), rather than through Hive's sys schema:
delete from "TABLE_PARAMS" where "TBL_ID"='[myexternaltableID]';
delete from "TBL_COL_PRIVS" where "TBL_ID"='[myexternaltableID]';
delete from "TBL_PRIVS" where "TBL_ID"='[myexternaltableID]';
delete from "TBLS" where "TBL_ID"='[myexternaltableID]';
NB: Order is important.
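If the numeric ID is not known, it can be looked up first; a sketch assuming the standard metastore schema, where TBLS holds the table name and DBS the owning database (the table name below is a placeholder):
-- Find the TBL_ID of the orphaned table before running the deletes above
SELECT t."TBL_ID", t."TBL_NAME", d."NAME" AS db_name
FROM "TBLS" t JOIN "DBS" d ON t."DB_ID" = d."DB_ID"
WHERE t."TBL_NAME" = 'my_external_table';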

Unable to run SerDe

We have one EBCDIC sample file.
It is stored in /user/hive/warehouse/ebcdic_test_file.txt
The COBOL layout of the file is stored in /user/hive/Warehouse/CobolSerde.cob.
We are running the query in the Hue query editor; we also tried the CLI, but the same error occurs.
We added CobolSerde.jar via:
ADD JAR /home/cloudera/Desktop/CobolSerde.jar;
It was added successfully (verified with LIST JARS).
Query
CREATE EXTERNAL TABLE cobol2Hve
ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde2.cobol.CobolSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/ebcdic_test_file.txt'
TBLPROPERTIES ('cobol.layout.url'='/user/hive/warehouse/CobolSerDe.cob','fb.length'='159');
Error while processing statement:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
Cannot validate serde: com.savy3.hadoop.hive.serde2.cobol.CobolSerDe
Why does this error occur?
And what is fb.length?

SerDe jars don't work

I'm using the CDH5 QuickStart VM and would like to run this script:
CREATE EXTERNAL TABLE serd(
user_id string,
type string,
title string,
year string,
publisher string,
authors struct<name:string>,
source string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/user/hdfs/data/book-seded-workings-reduced.json/' INTO TABLE serd;
But I got this error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class org.openx.data.jsonserde.objectinspector.JsonObjectInspectorFactory
Following my previous question (Loading JSON file with serde in Cloudera), I tried building each SerDe proposed here: https://github.com/rcongiu/Hive-JSON-Serde
But I always get the same error.
In the end, only the Twitter SerDe worked in my CDH5 VM.
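For reference, a minimal sketch of what that working setup might look like, assuming the "Twitter SerDe" refers to the JSON SerDe from Cloudera's cdh-twitter-example (class com.cloudera.hive.serde.JSONSerDe; the jar path and reduced column list are illustrative):
-- Register the SerDe jar (path is an assumption) and create the table with it
ADD JAR /home/cloudera/hive-serdes-1.0-SNAPSHOT.jar;
CREATE EXTERNAL TABLE serd_json (
user_id string,
type string,
title string)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
STORED AS TEXTFILE;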

How to create an ORC file in Hive CDH?

I can easily create a table in the ORC file format in Apache Hadoop or on Hortonworks' HDP:
CREATE TABLE ... STORED AS ORC
However, this doesn't work in Cloudera's CDH 4.5. (Surprise!) I get:
FAILED: SemanticException Unrecognized file format in STORED AS clause: ORC
So as an alternative, I tried to download and install the Hive jar that contains the ORC classes:
hive> add jar /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hive/lib/hive-exec-0.11.0.jar;
Then create my ORC Table:
hive> CREATE TABLE test (name STRING)
> row format serde
> 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> stored as inputformat
> 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> outputformat
> 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
OK
But upon inserting into this table from some CSV data, I get an error:
hive> INSERT OVERWRITE TABLE test
> SELECT name FROM textdata;
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
How should I create an ORC table in Hive in CDH?
CDH 4.5 contains Hive 0.10 (see CDH Version 4.5.0 Packaging and Tarballs). ORC support was added in Hive 0.11 (see the release notes and HIVE-3874: Create a new Optimized Row Columnar file format for Hive).
CDH 5 is in beta now, but it does contain Hive 0.11 (see CDH Version 5.0.0 Beta 1).
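Once you are on Hive 0.11 or later, the straightforward form works without adding any jars (table names here are illustrative):
-- On Hive 0.11+ the ORC format is recognized directly in the STORED AS clause
CREATE TABLE test_orc (name STRING) STORED AS ORC;
INSERT OVERWRITE TABLE test_orc SELECT name FROM textdata;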
