Error when trying to execute kylin.sh start in HDP Sandbox 2.6 - hadoop

I installed Apache Kylin in the HDP 2.6 sandbox, following the official installation guide: http://kylin.apache.org/docs/install/index.html
When I run the script $KYLIN_HOME/bin/kylin.sh start, I get the error below:
What can I do to fix this error?
Thanks in advance

Check whether the Hive service is up in Ambari; when the Hive service is down, Kylin cannot find it and reports this error. Check your .bash_profile as well. Once those two issues are addressed, Kylin should be able to find the location of the Hive dependency.

Kylin uses the find-hive-dependency.sh script to set up the CLASSPATH. This script uses a Hive CLI command (I tested it with beeline) to query Hive environment variables and extract the CLASSPATH from them.
beeline connects to Hive using the properties in kylin_hive_conf.xml, but for some reason (probably due to the Hive version included in HDP 2.6) some of the loaded Hive properties cannot be set when the connection is established.
The Hive properties that cause the issue can be discarded when connecting to Hive to query the CLASSPATH, so, to fix this issue:
Edit $KYLIN_HOME/conf/kylin.properties and set kylin.source.hive.client=beeline
Open the find-hive-dependency.sh script, go to approximately line 34, and modify the line
hive_env=${beeline_shell} ${hive_conf_properties} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'
Just remove ${hive_conf_properties}
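After removing it, the line should look roughly like this (a sketch based on the line above; the variables are defined earlier in the script):
hive_env=${beeline_shell} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'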
Check that the Hive dependencies have been configured by running the find-hive-dependency.sh command.
Now $KYLIN_HOME/bin/kylin.sh start should work.

Related

How to add jar files for Hue in Cloudera?

I'm running a SQL query on a JSON SerDe table. It works in the Hive CLI, but it fails in Hue with the error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I guess it's due to the missing jar file; any idea how to add the jar file hive-hcatalog-core-1.2.1.jar for Hue?
Place your jar in HDFS and add it from that path by using ADD JAR hdfs:///user/hive/lib/hive-hcatalog-core-1.2.1.jar;
Run ADD JAR hive-hcatalog-core-1.2.1.jar in Hue before your query; the jar will be present only for as long as your current session persists.
For the benefit of others who might face the same issue, either with this particular jar "hive-hcatalog-core-1.2.1.jar" or with any UDF jar:
In the Hue Query Editor, run the following command:
add jar hdfs:/hive-hcatalog-core-1.2.1.jar;
Please note that single quotes are not required, unlike in the Hive CLI.
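For example, a minimal sequence in the Hue Hive editor might look like this (the table name json_events is hypothetical; substitute your own):
ADD JAR hdfs:///user/hive/lib/hive-hcatalog-core-1.2.1.jar;
SELECT * FROM json_events LIMIT 1;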
The exact command Cloudera gives is ADD JAR {{lib_dir}}/hive/lib/hive-contrib.jar;
1) I am unable to find the hive/lib directory on CDH 5.
The {{lib_dir}} for Hive on CDH-installed environments would be either /usr/lib/hive/ or /opt/cloudera/parcels/CDH/lib/hive/ (depending on whether packages or parcels are in use).
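To confirm which of the two exists on your node, something like this should work (paths taken from the answer above):
ls /usr/lib/hive/lib/ | grep hive-contrib
ls /opt/cloudera/parcels/CDH/lib/hive/lib/ | grep hive-contrib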
This is the way to add a jar in Cloudera.
For this you have to switch to the superuser by using this command:
sudo su
It will switch you to the superuser.

Cannot start standalone instance of HBase

I was unable to configure a standalone HBase instance. These are the steps I followed:
Downloaded hbase-0.98.9-hadoop2 and extracted it.
Set my JAVA_HOME in the environment variables.
Edited conf/hbase-site.xml and changed the configuration as mentioned in the Apache HBase quick start guide.
Ran the bin/start-hbase.sh and this error came up.
Can anyone tell me what I'm missing or doing wrong? Thanks
Here are the steps:
http://hbase.apache.org/cygwin.html
On Windows, HBase cannot be installed without the Cygwin tooling.
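For reference, the hbase-site.xml change from the quick start guide (mentioned in the question) looks roughly like this; the file:// paths are the guide's example values, so adjust them for your machine:
cat > conf/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
</configuration>
EOF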

Hadoop issue with Sqoop installation

I have Hadoop (pseudo-distributed mode), Hive, Sqoop, and MySQL installed on my local machine.
But when I try to run Sqoop, it gives me the following error:
Error: /usr/lib/hadoop does not exist!
Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.
Then I filled in the sqoop-env-template.sh file with all the information. Below is a snapshot of the sqoop-env-template.sh file.
Even after providing the Hadoop and Hive paths, I face the same error.
I've installed
hadoop in /home/hduser/hadoop version 1.0.3
hive in /home/hduser/hive version 0.11.0
sqoop in /home/hduser/sqoop version 1.4.4
and the MySQL connector jar (mysql-connector-java-5.1.29)
Could anybody please shed some light on what is going wrong?
sqoop-env-template.sh is a template, meaning it does not get sourced by the configurator on its own. If you want a custom configuration to be loaded, make a copy of it as $SQOOP_HOME/conf/sqoop-env.sh.
Note: here is the relevant excerpt from bin/configure-sqoop for version 1.4.4:
SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}
if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then
  . "${SQOOP_CONF_DIR}/sqoop-env.sh"
fi
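A minimal sketch of that fix, using the install paths from the question (adjust them if yours differ):
cp $SQOOP_HOME/conf/sqoop-env-template.sh $SQOOP_HOME/conf/sqoop-env.sh
# then, inside sqoop-env.sh, point the homes at your installs, for example:
export HADOOP_COMMON_HOME=/home/hduser/hadoop
export HADOOP_MAPRED_HOME=/home/hduser/hadoop
export HIVE_HOME=/home/hduser/hive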

How to know Hive and Hadoop versions from command prompt?

How can I find which Hive version I am using from the command prompt? Below are the details:
I am using PuTTY to connect to the Hive tables and access the records in them. What I did is: I opened PuTTY, typed leo-ingesting.vip.name.com as the host name, and clicked Open. Then I entered my username and password, followed by a few commands to get to the Hive SQL prompt. Below is the list of what I did:
$ bash
bash-3.00$ hive
Hive history file=/tmp/rkost/hive_job_log_rkost_201207010451_1212680168.txt
hive> set mapred.job.queue.name=hdmi-technology;
hive> select * from table LIMIT 1;
So is there any way, from the command prompt, to find which Hive version I am using, and the Hadoop version too?
$ hive --version
Hive version 0.8.1.3
EDIT: added another '-' before 'version', since the single-dash form doesn't work for newer versions. Hope it works for all now.
Known to work in the following distributions:
Hortonworks distribution: $ hive --version returns Hive 0.14.0.2.2.0.0-2041
CDH 5.3
It does not work in:
CDH 4.3
HDinsight (Azure)
$ hadoop version
Hadoop 0.20.2-cdh3u4
Not sure you can get the Hive version from the command line, though. Maybe you could use something like the hive.hwi.war.file property or pull it out of the classpath.
You cannot get the Hive version from the command line.
You can check the Hadoop version as mentioned by Dave.
Also, if you are using the Cloudera distribution, look directly at the libs:
ls /usr/lib/hive/lib/ and check for the Hive library:
hive-hwi-0.7.1-cdh3u3.jar
You can also check the compatible versions here:
http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-1-x/CDH-Version-and-Packaging-Information/CDH-Version-and-Packaging-Information.html
You CAN get the version from the command line (hive or beeline).
hive> select version();
OK
1.1.0-cdh5.12.0 rUnknown
Time taken: 2.815 seconds, Fetched: 1 row(s)
hive>
The commands below work on Hadoop 2.7.2:
hive --version
hadoop version
pig --version
sqoop version
oozie version
This should certainly work:
hive --version
hive -e "set hive.hwi.war.file;" | cut -d'-' -f3
Use the version flag from the CLI
[hadoop@usernode ~]$ hadoop version
Hadoop 2.7.3-amzn-1
Subversion git@aws157git.com:/pkg/Aws157BigTop -r d94115f47e58e29d8113a887a1f5c9960c61ab83
Compiled by ec2-user on 2017-01-31T19:18Z
Compiled with protoc 2.5.0
From source with checksum 1833aada17b94cfb94ad40ccd02d3df8
This command was run using /usr/lib/hadoop/hadoop-common-2.7.3-amzn-1.jar
[hadoop@usernode ~]$ hive --version
Hive 1.0.0-amzn-8
Subversion git://ip-20-69-181-31/workspace/workspace/bigtop.release-rpm-4.8.4/build/hive/rpm/BUILD/apache-hive-1.0.0-amzn-8-src -r d94115f47e58e29d8113a887a1f5c9960c61ab83
Compiled by ec2-user on Tue Jan 31 19:51:34 UTC 2017
From source with checksum 298304aab1c4240a868146213f9ce15f
From the Hive shell, issue 'set system:sun.java.command;'.
The hive-cli jar version is the Hive version.
hive> set system:sun.java.command;
system:sun.java.command=org.apache.hadoop.util.RunJar /opt/cloudera/parcels/CDH-4.2.2-1.cdh4.2.2.p0.10/bin/../lib/hive/lib/hive-cli-0.10.0-cdh4.2.2.jar org.apache.hadoop.hive.cli.CliDriver
hive>
The above example shows Hive version 0.10.0 for CDH version 4.2.2
hive --version
hadoop version
We can find the Hive version with:
on the Linux shell: "hive --version"
on the Hive shell: "! hive --version;"
The above commands work on Hive 0.13 and above.
set system:sun.java.command;
gives the Hive version from the Hue Hive editor: it returns the jar name, which includes the version.
To identify the Hive version on an EC2 instance, use
hive --version
The command below works; I tried this and got the current version with
/usr/bin/hive --version
If you are using beeline to connect to Hive, then !dbinfo will give all the underlying database details, and in the output getDatabaseProductVersion will contain the Hive database version.
Sample output:
getDatabaseProductVersion 1.2.1000.2.4.3.0-227
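For reference, the beeline session leading to that output looks roughly like this (the JDBC URL is an assumption; use your own HiveServer2 host and port):
beeline -u jdbc:hive2://localhost:10000
# then, at the beeline prompt:
!dbinfo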
If you are using the Hortonworks distro, then from the CLI you can get the version with the command:
hive --version
We can also get the version by looking at the version of the hive-metastore jar file.
For example:
$ ls /usr/lib/hive/lib/ | grep metastore
hive-metastore-0.13.1.jar
You can get the Hive version with
hive --version
If you want to know the Hive version and its related package versions:
rpm -qa | grep hive
The output will be like the below.
libarchive2-2.5.5-5.19
hive-0.13.0.2.1.2.2-516
perl-Archive-Zip-1.24-2.7
hive-jdbc-0.13.0.2.1.2.2-516
webhcat-tar-hive-0.13.0.2.1.2.2_516-2
hive-webhcat-0.13.0.2.1.2.2-516
hive-hcatalog-0.13.0.2.1.2.2-516
The latter gives a better understanding of Hive and its dependents. Nevertheless, rpm needs to be present.
Use the below command to get the Hive version:
hive --service version
From your SSH connection to the edge node, you can simply type
hive --version
Hive 1.2.1000.x.x.x.x-xx
This returns the Hive version for your distribution of Hadoop. Another approach: if you enter beeline, you can find the version straight away.
beeline
Beeline version 1.2.1000.x.x.x.x-xx by Apache Hive
Another way, if you have WebHCat (part of the Hive project) installed, is to make a REST call:
curl -i http://172.22.123.63:50111/templeton/v1/version/hive?user.name=foo
which will come back with JSON like
{"module":"hive","version":"1.2.1.2.3.0.0-2458"}
The WebHCat docs have some details.
Yes, you can get the version of your Hive by using the hive command:
hive --service version
You can get a list of available service names by using the following hive command:
hive --service help
You can look for the jar file as soon as you log in to Hive:
jar:file:/opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1401-140130.jar!/hive-log4j.properties
/usr/bin/hive --version worked for me.
[qa@ip-10-241-1-222 ~]$ /usr/bin/hive --version
Hive 0.13.1-cdh5.3.1
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hive-0.13.1-cdh5.3.1 -r Unknown
Compiled by jenkins on Tue Jan 27 16:38:55 PST 2015
From source with checksum 1bb86e4899928ce29cbcaec8cf43c9b6
[qa@ip-10-241-1-222 ~]$
On HDInsight I tried hive --version, but it did not recognize the option or mention it in the help.
D:\Users\admin1>%hive_home%/bin/hive --version
Unrecognized option: --version
usage: hive
-d,--define <key=value> Variable subsitution to apply to hive
commands. e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
-h <hostname> connecting to Hive Server on remote host
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable subsitution to apply to hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-p <port> connecting to Hive Server on port number
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the
console)
However, when you log in to the head node and start the Hive console, it prints out some helpful configuration information from which the version can be read:
D:\Users\admin1>%hive_home%/bin/hive
Logging initialized using configuration in file:/C:/apps/dist/hive-0.13.0.2.1.11.0-2316/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.4.0.2.1.11.0-2316/share/hadoop/common/lib/slf4j-log4j12-1.7.5.j
ar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.0.2.1.11.0-2316-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4
j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> quit;
From this I would say I have Hive version 0.13 deployed, which is consistent with this list of versions https://hive.apache.org/downloads.html
I was able to get the version of the installed Hadoop 3.0.3 with the following command
$HADOOP_HOME/bin$ ./hadoop version
which gave me the following output
Hadoop 3.0.3
Source code repository https://yjzhangal@git-wip-us.apache.org/repos/asf/hadoop.git -r 37fd7d752db73d984dc31e0cdfd590d252f5e075
Compiled by yzhang on 2018-05-31T17:12Z
Compiled with protoc 2.5.0
From source with checksum 736cdcefa911261ad56d2d120bf1fa
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.0.3.jar

Hue Hive -- Beeswax Server Can't Find JDBC Driver for MySQL

We're using Cloudera 3.7.5 and having a tough time configuring the Beeswax server so that Hue can access the Hive databases. I followed all the instructions from the Cloudera documentation to set up MySQL to serve as Hive's metastore, but when I restart the Hue services and check the Beeswax server's stderr logs, I still see the painful "javax.jdo.JDOFatalInternalException: Error creating transactional connection factory", which is caused by
org.datanucleus.exceptions.NucleusException: Attempt to invoke the "DBCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
This is bizarre to me, because the logs also indicate that the environment variable HIVE_HOME is equal to "/usr/lib/hive", and sure enough I have copied "mysql-connector-java-5.1.15-bin.jar" into the /usr/lib/hive/lib directory, as the documents dictate.
I have also tried the instructions in the blog post http://hadoopchallenges.blogspot.com/2011/03/hue-120-upgrade-and-beeswax.html, which involved copying the mysql-connector jar into "/usr/share/hue/apps/beeswax/hive/lib/". Unfortunately I did not have a hive/lib subdirectory in the beeswax folder, so I attempted to make one. This also did not work.
Any advice on how I can get the MySQL JDBC library onto Beeswax's classpath?
We finally decided to just bite the bullet and upgrade to CDH4. Placing the JDBC jar in /usr/share/hive/lib allowed the Beeswax server to function perfectly without issue.
If anyone else is experiencing this issue, I recommend upgrading from CDH3 to CDH4; the UI is much cleaner and smoother, and we had far fewer installation and maintenance bugs with CDH4.
You have to place your MySQL connector in HUE_HOME/apps/beeswax/hive/lib.
If this path doesn't exist, create hive/lib and then place the MySQL connector there. I hope your problem will be solved.
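A minimal sketch of that, assuming HUE_HOME points at your Hue installation and using the connector jar named in the question:
mkdir -p $HUE_HOME/apps/beeswax/hive/lib
cp mysql-connector-java-5.1.15-bin.jar $HUE_HOME/apps/beeswax/hive/lib/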
When you start using Cloudera 4.5, everything moves into parcels, so this exact problem on my Hive metastore server was fixed by the command below. Essentially you're just re-adding modules. I'm sure you can modify the extra classpath in the Hive config file to make this oblivious to parcel updates.
cp /usr/lib/hive/lib/mysql-connector-java-5.1.17-bin.jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/.
So a real fix might be something like this:
cp `locate mysql-connector | grep jar | head -n 1` /opt/cloudera/parcels/*/lib/hive/lib/.
which would copy the jar into every parcel.
