Unable to run SerDe - hadoop

We have an EBCDIC sample file stored at /user/hive/warehouse/ebcdic_test_file.txt.
The COBOL layout of the file is stored at /user/hive/warehouse/CobolSerDe.cob.
We are running the query in the Hue query editor and have also tried the Hive CLI, but the same error comes up in both.
We added CobolSerde.jar via
ADD JAR /home/cloudera/Desktop/CobolSerde.jar;
and it was added successfully (verified with LIST JARS).
Query:
CREATE EXTERNAL TABLE cobol2Hve
ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde2.cobol.CobolSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/ebcdic_test_file.txt'
TBLPROPERTIES ('cobol.layout.url'='/user/hive/warehouse/CobolSerDe.cob','fb.length'='159');
Error while processing statement:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
Cannot validate serde: com.savy3.hadoop.hive.serde2.cobol.CobolSerDe
Why is this error occurring?
What is fb.length?

Related

Alter table in Hive is not working for SerDe 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' in Apache Hive (version 2.1.1-cdh6.3.4)

Environment:
Apache Hive (version 1.1.0-cdh5.14.2)
I tried creating a table with the DDL below.
create external table test1 (v_src_code string, d_extraction_date date)
partitioned by (d_mis_date date)
row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
with serdeproperties ("field.delim"="~|")
stored as textfile
location '/hdfs_path/test1'
tblproperties ("serialization.null.format"="");
Then I alter this table by adding one extra column, as below:
alter table test1 add columns(n_limit_id bigint);
This works perfectly fine.
But recently our cluster was upgraded. The new environment is
Apache Hive (version 2.1.1-cdh6.3.4)
The same table was created in this new environment, but when I run the ALTER TABLE I get the error below.
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error: type expected at the position 0 of '<derived from deserializer>:bigint' but '<' is found. (state=08S01,code=1)

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster in Amazon EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and run queries on the cluster. I was following an example provided in the Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and created the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either "hive.sql.table" or "hive.sql.query" to tell Hive how to get data from the JDBC database. But if I replace hive.sql.table with hive.sql.query I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried looking on the web for a solution, and it doesn't look like anyone has experienced the same issue I am having. Do I need to modify a config file, or am I missing something critical in my code?
I think you are using a version of the jar which doesn't support POSTGRES.
Download the latest jar from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put the downloaded jar into an HDFS location.
Run hive normally.
Run the command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
Run your create table command.
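For illustration, the steps above might look like the following session. The HDFS directory and local jar location are assumptions, not something from the question; after the ADD JAR, the CREATE TABLE from the question (with hive.sql.table) should be rerun unchanged.
-- In a shell, copy the downloaded handler jar to HDFS (paths are hypothetical):
--   hdfs dfs -mkdir -p /user/hive/jars
--   hdfs dfs -put hive-jdbc-handler-3.1.2.jar /user/hive/jars/
-- Then, in the Hive session, register the jar and rerun the CREATE TABLE from above:
add jar hdfs:///user/hive/jars/hive-jdbc-handler-3.1.2.jar;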

EXTERNAL TABLE to a file in Hive?

Is it possible to use a file in LOCATION for an external table in Hive?
CREATE EXTERNAL TABLE table1
(
line string
)
LOCATION '/hdp_in/fd/file.txt.gz';
because I get an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.FileAlreadyExistsException Parent path is not a directory: /hdp_in/fd/file.txt.gz file.txt.gz
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1957)
(...)
Do I have to use only directories? I haven't found that info in the manual...
Regards
Pawel
Yes, you will have to put this file in a directory and then create an external table on top of it. As per the documentation: an EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.
Even if you create an internal table, Hive by default creates a directory for it inside hive.metastore.warehouse.dir, and the same behavior is expected when creating an external table, except that the default directory is not used.
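A minimal sketch of that workaround; the directory name /hdp_in/fd/table1 is an assumption, and the table is the one from the question:
-- In a shell, give the file its own directory (hypothetical path) and move it there:
--   hdfs dfs -mkdir /hdp_in/fd/table1
--   hdfs dfs -mv /hdp_in/fd/file.txt.gz /hdp_in/fd/table1/
CREATE EXTERNAL TABLE table1
(
line string
)
LOCATION '/hdp_in/fd/table1';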

serdes jar don't work

I'm using the CDH5 quickstart VM... I would like to run this script:
CREATE EXTERNAL TABLE serd(
user_id string,
type string,
title string,
year string,
publisher string,
authors struct<name:string>,
source string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/user/hdfs/data/book-seded-workings-reduced.json/' INTO TABLE serd;
But I got this error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class org.openx.data.jsonserde.objectinspector.JsonObjectInspectorFactory
Following my previous question (Loading JSON file with serde in Cloudera), I've tried to build each SerDe proposed here: https://github.com/rcongiu/Hive-JSON-Serde
But I always get the same error.
In the end, only the Twitter SerDe worked in my CDH5 VM.
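For reference, that "Could not initialize class" error usually points at a SerDe jar built for a different Hive version or missing its dependencies, and the usual pattern with the rcongiu SerDe is to register the jar-with-dependencies built for your CDH release before running the DDL. A sketch; the path and version below are assumptions:
-- Register the SerDe jar built from the Hive-JSON-Serde project (hypothetical path and version)
ADD JAR /home/cloudera/json-serde-1.3.8-jar-with-dependencies.jar;
-- then run the CREATE EXTERNAL TABLE and LOAD DATA statements from the script above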

How to create an ORC file in Hive CDH?

I can easily create an ORC file format in Apache Hadoop or Hortonworks' HDP:
CREATE TABLE ... STORED AS ORC
However this doesn't work in Cloudera's CDH 4.5. (Surprise!) I get:
FAILED: SemanticException Unrecognized file format in STORED AS clause: ORC
So as an alternative, I tried to download and install the Hive jar that contains the ORC classes:
hive> add jar /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hive/lib/hive-exec-0.11.0.jar;
Then create my ORC Table:
hive> CREATE TABLE test (name STRING)
> row format serde
> 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> stored as inputformat
> 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> outputformat
> 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
OK
But upon inserting into this table from some CSV data, I get an error:
hive> INSERT OVERWRITE TABLE test
> SELECT name FROM textdata;
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
How should I create an ORC table in Hive in CDH?
CDH 4.5 contains Hive 0.10, see CDH Version 4.5.0 Packaging and Tarballs. ORC was added in Hive 0.11, see release notes and HIVE-3874: Create a new Optimized Row Columnar file format for Hive.
CDH 5 is in Beta now but it does contain Hive 0.11, see CDH Version 5.0.0 Beta 1.
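In other words, adding a newer hive-exec jar on top of Hive 0.10 (as attempted above) is unlikely to give working ORC support. Once the cluster is on Hive 0.11 or later (e.g. CDH 5), the plain form from the question should be all that is needed; a sketch, assuming the same textdata source table:
-- On Hive 0.11+ ORC support is built in, so no extra jars are required
CREATE TABLE test (name STRING) STORED AS ORC;
INSERT OVERWRITE TABLE test SELECT name FROM textdata;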
