How to add SerDe jar - hadoop

I use Hive to create a table stored as SequenceFile. The row format is the SerDe class myserde.TestDeserializer in hiveserde-1.0.jar.
On the command line I use this command to add the jar file:
hive> ADD JAR hiveserde-1.0.jar;
Then I create the table and the file loads successfully.
But now I want to run it and create the table from a client using JDBC.
The error is:
SerDe: myserde.TestDeserializer does not exist.
How do I run it? Thanks.

So, there are a few options. In all of them, the jar needs to be present on the cluster where Hive is installed. The JDBC client code, of course, can be run from anywhere, within or outside the cluster.
Option 1: You issue an HQL statement before you run any of your other HQL commands:
ADD JAR hiveserde-1.0.jar
Option 2: You can update your hive-site.xml to set the hive.aux.jars.path property to the complete path of your jar, hiveserde-1.0.jar.
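For example, a sketch of the hive-site.xml entry (the path is a placeholder for wherever the jar actually lives on the Hive machine; use a comma-separated list for multiple jars):
<property>
  <name>hive.aux.jars.path</name>
  <value>/path/to/hiveserde-1.0.jar</value>
</property>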

Go to your hive-env.sh and append to the bottom of the file:
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH:/<path-to-jar>
You can then source this file. Not ideal, but it works.
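For example, assuming the default config location (adjust the path to your install):
source /etc/hive/conf/hive-env.sh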

Are you saying that you'd like to create the table over JDBC rather than in the CLI? In that case, you should add the jar to your classpath when you run your JDBC code.
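In practice, the session-level ADD JAR from Option 1 above can also be issued through the JDBC connection itself. A minimal sketch against the HiveServer2 JDBC driver (the URL, credentials, and jar path are placeholders; note that the ADD JAR path is resolved on the Hive server, not on the client):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateSerdeTable {
    public static void main(String[] args) throws Exception {
        // Older hive-jdbc versions need the driver registered explicitly
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 JDBC URL; host, port, and database are placeholders
        String url = "jdbc:hive2://hive-server:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement()) {
            // Register the SerDe jar for this session; the path must be
            // valid on the machine running HiveServer2
            stmt.execute("ADD JAR /path/on/server/hiveserde-1.0.jar");
            // The SerDe class can now be resolved when the table is created
            stmt.execute("CREATE TABLE t (line STRING) "
                       + "ROW FORMAT SERDE 'myserde.TestDeserializer' "
                       + "STORED AS SEQUENCEFILE");
        }
    }
}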

Yes, this can be a little confusing; it seems half the time Hive reads from the cluster and the other half from the local file system (the machine the Hive server is installed on).
To overcome this, simply copy the .jar file to the Hive server machine; you can then reference it in your Hive query, for example:
add jar /tmp/json-serde.jar;
create table tweets (
  name string,
  address1 string,
  address2 string,
  address3 string,
  postcode string
)
...
And then onto the next problem ;)

Related

Add path with aux jars for Hive client

I have HDP 2.6.1.0-129.
I have an external jar, example.jar, for serialized Flume data files.
I added a new parameter in the Custom hive-site section:
name = hive.aux.jars.path
value = hdfs:///user/libs/
I saved the new configuration and restarted the Hadoop components, and later restarted the whole Hadoop cluster.
Afterwards, in the Hive client, I tried to run a select:
select * from example_serealized_table
and Hive returned the error:
FAILED: RuntimeException MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.ClassNotFoundException: Class com.my.bigtable.example.model.gen.TSerializedRecord not found)
How do I solve this problem?
p.s.
I also tried adding it in the current session:
add jar hdfs:///user/libs/example-spark-SerializedRecord.jar;
and tried putting the *.jar in a local folder. The problem is the same.
I did not mention that the library was written by my colleague. It turned out that it redefines variables that affect the logging level. After excluding the overridden variables from the library, the problem stopped reproducing.
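For anyone else debugging a ClassNotFoundException like this, it is worth first confirming the class really is packaged in the jar and that the jar is where Hive expects it, for example (jar, class, and HDFS path taken from the question):
jar tf example-spark-SerializedRecord.jar | grep TSerializedRecord
hadoop fs -ls hdfs:///user/libs/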

Error while creating Hive table

Before creating the twitter table I added this:
ADD JAR hdfs:///user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar;
I got the following error when creating the twitter table in Hive:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: com.cloudera.hive.serde.JSONSerDe
Move the jar from HDFS to the local file system.
Then add the jar in the Hive terminal.
Then try the query on the twitter table.
Ideally speaking, you can add jars from both the local file system and HDFS, so the problem here could be something else. I would recommend following this sequence of steps:
List the file on HDFS to make sure it exists:
hadoop fs -ls hdfs://namenode_hostname:8020/user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar
Add the jar in Hive using the full path, as above, and verify the addition using the list jars command in the Hive CLI:
hive> list jars;
Use the SerDe in your create table statement with the proper syntax, as shown in the Row Formats & SerDe section of the Hive DDL manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
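For example, a sketch of that sequence end to end (the column list and table location are illustrative, not the real twitter schema):
ADD JAR hdfs://namenode_hostname:8020/user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar;
list jars;
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  text STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/flume/tweets';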

How to add jar files for Hue in Cloudera?

I'm running an SQL query on a JSON serde table. It's working in the Hive CLI, but it's failing in Hue with the error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I guess it's due to the missing jar file; any idea how to add the jar file hive-hcatalog-core-1.2.1.jar for Hue?
Place your jar in HDFS and add that same path by using:
ADD JAR hdfs:///user/hive/lib/hive-hcatalog-core-1.2.1.jar;
Run the ADD JAR in Hue before your query; it will remain available for as long as your current session persists.
For the benefit of others who might face the same issue, either with this particular jar (hive-hcatalog-core-1.2.1.jar) or with any UDF jar:
In the Hue Query Editor, run the following command:
add jar hdfs:/hive-hcatalog-core-1.2.1.jar;
Please note that single quotes are not required, as is the case with the Hive CLI.
The exact command Cloudera gives is ADD JAR {{lib_dir}}/hive/lib/hive-contrib.jar;
1) I am unable to find the hive/lib directory on CDH 5.
The {{lib_dir}} on CDH-installed environments for Hive is either /usr/lib/hive/ or /opt/cloudera/parcels/CDH/lib/hive/ (depending on whether packages or parcels are in use).
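For example, to locate the jar under a parcel install (the grep pattern just narrows the listing):
ls /opt/cloudera/parcels/CDH/lib/hive/lib/ | grep contrib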
This is the way to add a jar in Cloudera. For this you have to switch to the superuser using this command:
sudo su
It will switch you to the superuser.

Export data into Hive from a node without Hadoop(HDFS) installed

Is it possible to export data to a Hive server from a node that does not have Hadoop (HDFS) or Sqoop installed?
I would read the data from a source, which could be MySQL or just files in some directory, and then use the Hadoop core classes or something like Sqoop to export the data into my Hadoop cluster.
I am programming in Java.
Since your final destination is a Hive table, I would suggest the following:
Create the final Hive table.
Use the following command to load data from the other node:
LOAD DATA LOCAL INPATH '<full local path>/kv1.txt' OVERWRITE INTO TABLE table_name;
From Java, you could use the JSch library to invoke these shell commands over SSH.
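A rough sketch of that with JSch (host, credentials, and paths are placeholders; the data file must already be on the Hive node, e.g. copied there first with scp or JSch's SFTP channel):

import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class RemoteHiveLoad {
    public static void main(String[] args) throws Exception {
        // Connect to the node where the Hive client is installed
        Session session = new JSch().getSession("user", "hive-gateway-host", 22);
        session.setPassword("password");
        session.setConfig("StrictHostKeyChecking", "no");
        session.connect();

        // Run the LOAD DATA statement through the hive CLI on that node
        ChannelExec channel = (ChannelExec) session.openChannel("exec");
        channel.setCommand("hive -e \"LOAD DATA LOCAL INPATH '/tmp/kv1.txt' "
                         + "OVERWRITE INTO TABLE table_name;\"");
        channel.connect();
        while (!channel.isClosed()) {
            Thread.sleep(100); // wait for the remote command to finish
        }
        channel.disconnect();
        session.disconnect();
    }
}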
Hope this helps.

how to access hadoop hdfs with greenplum external table

Our data warehouse is based on Hive; now we need to move data from Hive to Greenplum, and we want to use an external table with gphdfs, but something seems to be going wrong.
The table creation script is:
CREATE EXTERNAL TABLE flow.http_flow_data(like flow.zb_d_gsdwal21001)
LOCATION ('gphdfs://mdw:8081/user/hive/warehouse/flow.db/d_gsdwal21001/prov_id=018/day_id=22/month_id=201202/data.txt')
FORMAT 'TEXT' (DELIMITER ' ');
When we run:
bitest=# select * from flow.http_flow_data limit 1;
ERROR: external table http_flow_data command ended with error. sh: java: command not found (seg12 slice1 sdw3:40000 pid=17778)
DETAIL: Command: gphdfs://mdw:8081/user/hive/warehouse/flow.db/d_gsdwal21001/prov_id=018/day_id=22/month_id=201202/data.txt
Our Hadoop is 1.0 and Greenplum is 4.1.2.1.
I want to know whether we need to configure something to let Greenplum access Hadoop.
Have you opened the port (8081) to listen for the month_id=201202 directory?
I would double-check the admin guide; I think you can use gphdfs, but not until Greenplum 4.2.
Have you checked that Java is installed on your Greenplum system? It is required for gphdfs to work, and the "sh: java: command not found" in your error suggests a segment host cannot find it.
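For example, to check every host at once (assuming the standard gpssh utility and a file listing your segment hosts):
gpssh -f segment_hosts -e 'which java'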
