Hi, I am planning to integrate HBase and Hive for one of my projects.
I am confused about which jars I need and where to add them.
I am using Hadoop 2.6.0-cdh5.7.0.
I have downloaded these jars:
guava-r09.jar
hbase-0.92.0.jar
hive-hbase-handler-0.9.0.jar
zookeeper-3.3.4.jar
I ran this command to create the table:
CREATE TABLE hbase_table_emp(id int, name string, role string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
TBLPROPERTIES ("hbase.table.name" = "emp");
Now where should I copy all these jars?
Do I have to copy them to the /usr/lib/hive location and then run the add jar command?
Will all these jar versions work with my Hadoop version?
For now I have copied the jars into a directory, and when I point Hive at that directory and run the add jar command, it throws this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: Not a host:port pair: PBUF
quickstart.cloudera���ʼ��+��
If you place the jars in Hive's lib directory ($HIVE_HOME/lib, e.g. /usr/lib/hive/lib on CDH), they are automatically available on the Hive classpath and you do not need to add them explicitly with the add jar command.
The error you are getting is because the add jar command expects the fully qualified path of a jar file, not a directory:
add jar <fully qualified path of jar>;
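For example, assuming the four jars from the question were copied to /usr/lib/hive/lib (adjust the paths to wherever the jars actually live):
add jar /usr/lib/hive/lib/hive-hbase-handler-0.9.0.jar;
add jar /usr/lib/hive/lib/hbase-0.92.0.jar;
add jar /usr/lib/hive/lib/zookeeper-3.3.4.jar;
add jar /usr/lib/hive/lib/guava-r09.jar;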
Read the Hive HBase integration (hive-hbase-handler) documentation for more details.
I'm following this tutorial: https://learn.microsoft.com/en-us/azure/hdinsight/storm/apache-storm-develop-java-topology
What I've done so far is:
1. Maven setup
2. vi *.java files (in the src/main/java/com/microsoft/example directory):
RandomSentenceSpout.java
SplitSentence.java
WordCount.java
WordCountTopology.java
3. mvn compile
4. jar cf storm.jar *.class (in the target/classes/com/microsoft/example directory):
RandomSentenceSpout.class SplitSentence.class WordCount.class WordCountTopology.class
The above 4 class files were used to make the storm.jar file.
Then, I tried
storm jar ./storm.jar com.microsoft.example.WordCountTopology WordCountTopology
and
storm jar ./storm.jar WordCountTopology
but both of these failed, saying:
Error: Could not find or load main class com.microsoft.example.WordCountTopology
or
Error: Could not find or load main class WordCountTopology
According to the documentation, it says:
Syntax: storm jar topology-jar-path class ...
Runs the main method of class with the specified arguments. The storm
jars and configs in ~/.storm are put on the classpath. The process is
configured so that StormSubmitter will upload the jar at
topology-jar-path when the topology is submitted.
I cannot figure out where the problem is.
How can I resolve this?
I think your jar file does not contain the class WordCountTopology. You can check it with jar tf storm.jar | grep WordCountTopology.
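If the package structure was preserved when the jar was built, that check should print an entry like com/microsoft/example/WordCountTopology.class. If it prints nothing, or shows WordCountTopology.class at the root of the jar, then the classes were packaged without their package directories and storm cannot find com.microsoft.example.WordCountTopology.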
It looks like your jar does not contain a manifest file, which holds the information about the main class.
Try including a manifest, or run the jar command below to build the jar with an entry point set.
Hope this works!
jar cvfe storm.jar mainClassNameWithoutDotClassExtn *.class
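For the layout described in the question, a sketch of rebuilding the jar so that both the manifest entry point and the package directories end up inside it (run from the project root; the -C flag tells jar to add files relative to target/classes, and the class name is taken from the question):
jar cvfe storm.jar com.microsoft.example.WordCountTopology -C target/classes .
storm jar ./storm.jar com.microsoft.example.WordCountTopology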
I am new to Hadoop. While exploring the Hadoop data join package, I ran the command below:
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin
/user/biadmin/Datajoin/customers.txt
/user/biadmin/Datajoin/orders.txt
/user/biadmin/Datajoin/outpu1
I am getting the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.hadoop.contrib.utils.join.DataJoinMapperBase
at java.lang.ClassLoader.defineClassImpl(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:364)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:154)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:777)
at java.net.URLClassLoader.access$400(URLClassLoader.java:96)
You need to add the hadoop-datajoin jar to the classpath when running the job. Use the -libjars option to add the extra jar to the classpath. Your command will look like this (provide the correct path to the jar, or download it if you do not have it):
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin
-libjars <path>/hadoop-datajoin.jar
/user/biadmin/Datajoin/customers.txt
/user/biadmin/Datajoin/orders.txt
/user/biadmin/Datajoin/outpu1
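Note that -libjars is handled by Hadoop's generic option parsing, so it only takes effect if the driver class parses generic options (for example via ToolRunner). Since the NoClassDefFoundError above is thrown while the driver class itself is being loaded, the jar may also need to be on the client-side classpath. A minimal sketch, assuming <path> points at the directory that actually contains the datajoin jar:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:<path>/hadoop-datajoin.jar
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin -libjars <path>/hadoop-datajoin.jar /user/biadmin/Datajoin/customers.txt /user/biadmin/Datajoin/orders.txt /user/biadmin/Datajoin/outpu1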
I am using Hive 0.12. I successfully created a Parquet table using the query below.
hive> create table ptest1 (a INT, b DOUBLE)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
stored as INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
OK
Time taken: 0.124 seconds
But when I use STRING as a column data type, it fails:
hive> create table ptest1 (a INT, b STRING)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
stored as INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
Could not initialize class org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetPrimitiveInspectorFactory
Please suggest what might be wrong here.
Thank you.
I solved this problem by adding the jars below in Hive:
add jar parquet-avro-1.2.5.jar;
add jar parquet-cascading-1.2.5.jar;
add jar parquet-column-1.2.5.jar;
add jar parquet-common-1.2.5.jar;
add jar parquet-encoding-1.2.5.jar;
add jar parquet-generator-1.2.5.jar;
add jar parquet-hadoop-1.2.5.jar;
add jar parquet-hive-1.2.5.jar;
add jar parquet-pig-1.2.5.jar;
add jar parquet-scrooge-1.2.5.jar;
add jar parquet-test-hadoop2-1.2.5.jar;
add jar parquet-thrift-1.2.5.jar;
add jar parquet-format-1.0.0.jar;
Now it is working fine. Thank you.
I have been trying out Spark using spark-shell. All my data is in SQL.
I used to include external jars using the --jars flag, like: /bin/spark-shell --jars /path/to/mysql-connector-java-5.1.23-bin.jar --master spark://sparkmaster.com:7077
I have also included it in the classpath by changing the bin/compute-classpath.sh file.
I was running successfully with this configuration.
Now when I run a standalone job through jobserver, I get the following error message:
result: {
"message" : "com.mysql.jdbc.Driver"
"errorClass" : "java.lang.classNotFoundException"
"stack" :[.......]
}
I have included the jar file in my local.conf file as below.
context-settings{
.....
dependent-jar-uris = ["file:///absolute/path/to/the/jarfile"]
......
}
All of your dependencies should be included in your spark-jobserver application JAR (e.g. create an "uber-jar"), or be included on the classpath of the Spark executors. I recommend configuring the classpath, as it's faster and requires less disk-space since the third-party library dependencies don't need to be copied to each worker whenever your application runs.
Here are the steps to configure the worker (executor) classpath on Spark 1.3.1:
Copy the third-party JAR(s) to each of your Spark workers and the Spark master
Place the JAR(s) in the same directory on each host (e.g. /home/ec2-user/lib)
Add the following line to the Spark /root/spark/conf/spark-defaults.conf file on the Spark master:
spark.executor.extraClassPath /root/ephemeral-hdfs/conf:/home/ec2-user/lib/name-of-your-jar-file.jar
Here's an example of my own modifications to use the Stanford NLP library:
spark.executor.extraClassPath /root/ephemeral-hdfs/conf:/home/ec2-user/lib/stanford-corenlp-3.4.1.jar:/home/ec2-user/lib/stanford-corenlp-3.4.1-models.jar
You might not have /path/to/mysql-connector-java-5.1.23-bin.jar on your workers.
You can either copy the required dependency to all Spark workers, or
bundle the submitted jar with the required dependencies.
I use Maven for building the jar; the scope of the dependencies must be runtime.
To upload the job jar:
curl --data-binary @/PATH/jobs_jar_2.10-1.0.jar 192.168.0.115:8090/jars/job_to_be_registered
For posting the dependency jar (by creating a context with dependent-jar-uris):
curl -d "" 'http://192.168.0.115:8090/contexts/new_context?dependent-jar-uris=file:///path/dependent.jar'
This works for jobserver 1.6.1
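A quick way to confirm that the MySQL driver actually ended up inside the assembled job jar (a sketch, reusing the jar name from the curl command above):
jar tf /PATH/jobs_jar_2.10-1.0.jar | grep -i 'mysql/jdbc/Driver'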
I am adding a .jar file to the classpath using DistributedCache:
DistributedCache.addFileToClassPath(new Path("binary/tools.jar"), job.getConfiguration());
I am not sure whether addFileToClassPath() is the correct API for adding .jar files to the classpath. When I try to retrieve the classpath from the mapper, I cannot see the added jar. The classpath contains the working directory for the job (the jobcache dir), but it does not include the jar distributed through the DistributedCache.
Properties prop = System.getProperties();
System.out.println("The classpath is: " + prop.getProperty("java.class.path", null));
I tried addArchiveToClassPath() too; it did not work.
Am I missing something?
Thanks,
The problem was with the path. addFileToClassPath() and addArchiveToClassPath() take only an absolute path as input. binary/tools.jar is relative and hence did not work. I need to specify the path as /user/<username>/binary/tools.jar. Now it works fine. Even hdfs://<hostname>:port/user/.. fails.
Thank you all.
Is the jar you are adding to the classpath on the local file system, or in HDFS?
DistributedCache expects the path you name to be in HDFS.
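Putting the two answers together, a minimal sketch (the HDFS paths are illustrative; <username> is whichever user submits the job): first make sure the jar exists in HDFS at an absolute path, then pass that same absolute path to addFileToClassPath().
hadoop fs -mkdir -p /user/<username>/binary
hadoop fs -put tools.jar /user/<username>/binary/tools.jar
hadoop fs -ls /user/<username>/binary/tools.jar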