Installing Hive (Hadoop) on Windows (Cygwin)

I've just installed Hadoop on Windows using Cygwin, which works fine, and now I am installing Hive. I am running it as:
bin/hive -hiveconf java.io.tmpdir=/cygdrive/c/cygwin/tmp
OR
bin/hive -hiveconf java.io.tmpdir=/tmp
(Both give the same problem.) I run it this way because I found out there is a bug with the Windows naming convention (https://issues.apache.org/jira/browse/HIVE-2388...)
When I run the above command, Hive seems to load fine, but when I enter "show tables;" I get no response. The same is true for all queries: CREATE TABLE etc. get no response either.
It's the same problem as this guy's:
http://mail-archives.apache.org/mod_mbox...
Any ideas?

I resolved a similar issue and successfully ran Hive after starting all the Hadoop daemons:
NameNode
DataNode
JobTracker
TaskTracker
and then running queries from files with hive -f <filename> instead of typing them directly at the Hive command prompt. You can also pass a query on the command line with bin/hive -e 'SHOW TABLES'.
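Before retrying the queries, it can help to confirm that all four daemons are really up. Below is a minimal sketch, assuming a Hadoop 1.x setup and that the JDK's jps tool is on the PATH (it prints one "<pid> <MainClass>" line per Java process); the function name missing_hadoop_daemons is hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch: list which Hadoop 1.x daemons are missing from `jps` output.
# Input on stdin: jps-style lines of the form "<pid> <MainClass>".
# Returns 0 if all four daemons are present, 1 otherwise.
missing_hadoop_daemons() {
  local running d missing=0
  running=$(awk '{ print $2 }')   # keep only the main-class column
  for d in NameNode DataNode JobTracker TaskTracker; do
    # -x: whole-line match, so SecondaryNameNode does not count as NameNode
    if ! grep -qx "$d" <<< "$running"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: jps | missing_hadoop_daemons && bin/hive -e 'SHOW TABLES'
```

With the usage line above, the query only runs when no daemon is reported missing.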

How is Hive running without a hive-site.xml file?

I am trying to set up Hive on my local machine. I started all the Hadoop processes and added {hive}/bin to my path. At the command prompt I can run Hive commands and create and read tables. My questions are:
1) Is hive-site.xml an optional file?
2) In the absence of hive-site.xml, how does Hive get information regarding the metastore and other configuration?
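If you're running Hive queries on your local machine, which has Hadoop installed, hive-site.xml is not needed: you are calling hive/bin in the Hive installation directory directly, so you don't need to tell Hive where to find Hive. Any property that hive-site.xml does not override keeps Hive's built-in default; for the metastore, the default is an embedded Derby database created in a metastore_db directory under the current working directory.
If you wanted to run Hive commands from another machine while interacting with Hive on your local machine, you would need hive-site.xml.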
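To see which case applies on a given machine, a small check can look for the file where the launcher normally reads it. This is a sketch under an assumption: that the config directory is $HIVE_CONF_DIR when it is set and $HIVE_HOME/conf otherwise, which is the usual launcher convention. The function name find_hive_site is hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch: report which hive-site.xml the Hive launcher would most likely
# pick up. Assumes the usual convention: $HIVE_CONF_DIR if set, else $HIVE_HOME/conf.
find_hive_site() {
  local conf_dir="${HIVE_CONF_DIR:-${HIVE_HOME:-}/conf}"
  if [ -f "$conf_dir/hive-site.xml" ]; then
    echo "$conf_dir/hive-site.xml"
  else
    # No override file: built-in defaults apply, including the embedded
    # Derby metastore (metastore_db under the current working directory).
    echo "none: using built-in defaults"
    return 1
  fi
}
```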

Error when trying to execute kylin.sh start in HDP Sandbox 2.6

I installed Apache Kylin in HDP Sandbox 2.6, following the official installation guide http://kylin.apache.org/docs/install/index.html.
When I run the script $KYLIN_HOME/bin/kylin.sh start, I get the error below:
What can I do to fix this error?
Thanks in advance
Check whether the Hive service is up in Ambari; when the Hive service is down, Kylin cannot find it and gives this error. Check your .bash_profile as well. Once those two issues are addressed, Kylin should be able to find the location of the Hive dependency.
Kylin uses the find-hive-dependency.sh script to set up the CLASSPATH. This script uses a Hive CLI command (I tested it with beeline) to query the Hive environment variables and extract the CLASSPATH from them.
Beeline connects to Hive using the properties in kylin_hive_conf.xml, but for some reason (probably the Hive version included in HDP 2.6) some of the loaded Hive properties cannot be set when the connection is established.
The Hive properties that cause the issue are not needed just to query the CLASSPATH, so they can be left out of that connection. To fix the issue:
Edit $KYLIN_HOME/conf/kylin.properties and set kylin.source.hive.client=beeline
Open the find-hive-dependency.sh script, go to approximately line 34, and modify the line
hive_env=`${beeline_shell} ${hive_conf_properties} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'`
Just remove ${hive_conf_properties}.
Check that the Hive dependencies are now configured by running find-hive-dependency.sh.
Now $KYLIN_HOME/bin/kylin.sh start should work.
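If you have to apply the fix on several machines, the two edits can be scripted. A minimal sketch, assuming sed with -i support (GNU or BSD), that find-hive-dependency.sh lives in $KYLIN_HOME/bin, and that the hive_env line calls ${beeline_shell} as quoted above; the function name apply_kylin_beeline_fix is hypothetical, and each edited file keeps a .bak copy:

```shell
#!/usr/bin/env bash
# Minimal sketch of the two manual edits above, applied with sed.
# Assumes KYLIN_HOME is set and find-hive-dependency.sh lives in $KYLIN_HOME/bin.
# Each edited file keeps a .bak copy of its previous contents.
apply_kylin_beeline_fix() {
  [ -n "${KYLIN_HOME:-}" ] || { echo "KYLIN_HOME is not set" >&2; return 1; }
  local props="$KYLIN_HOME/conf/kylin.properties"
  local script="$KYLIN_HOME/bin/find-hive-dependency.sh"

  # Step 1: make beeline the Hive client (replace the key, or append it if absent).
  if grep -q '^kylin\.source\.hive\.client=' "$props"; then
    sed -i.bak 's/^kylin\.source\.hive\.client=.*/kylin.source.hive.client=beeline/' "$props"
  else
    cp "$props" "$props.bak"
    echo 'kylin.source.hive.client=beeline' >> "$props"
  fi

  # Step 2: on the hive_env line that calls beeline, drop " ${hive_conf_properties}".
  sed -i.bak '/hive_env=.*beeline_shell/s/ \${hive_conf_properties}//' "$script"
}
```

Afterwards, run find-hive-dependency.sh as described above to confirm the CLASSPATH is found.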

Can Sqoop run without Hadoop?

Just wondering: can Sqoop run without a Hadoop cluster, sort of in a standalone mode? Has anyone tried to run Sqoop on Spark? Please share your experiences.
To run Sqoop commands (both Sqoop 1 and Sqoop 2), Hadoop is a mandatory prerequisite. You cannot run Sqoop commands without the Hadoop libraries.
Sqoop also works in local mode, so the Hadoop daemons do not need to be running. To run Sqoop in local mode:
sqoop [tool-name] -fs local -jt local [tool-arguments]
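The same local-mode call can be wrapped in a tiny helper. A sketch, assuming Sqoop 1's rule that generic Hadoop options such as -fs and -jt go right after the tool name and before the tool-specific arguments; sqoop_local and the SQOOP_BIN override are hypothetical names:

```shell
#!/usr/bin/env bash
# Minimal sketch: run a Sqoop 1 tool in Hadoop local mode (no daemons needed).
# Generic Hadoop options (-fs, -jt) are placed right after the tool name,
# before any tool-specific arguments. SQOOP_BIN is a hypothetical override.
sqoop_local() {
  local tool="$1"
  shift
  "${SQOOP_BIN:-sqoop}" "$tool" -fs local -jt local "$@"
}

# Usage (hypothetical connection string, table and target directory):
#   sqoop_local import --connect jdbc:mysql://localhost/shop --table orders --target-dir /tmp/orders
```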
Sqoop on Spark is still in progress; see SQOOP-1532.

How to know Hive and Hadoop versions from command prompt?

How can I find which Hive version I am using from the command prompt? Below are the details:
I am using PuTTY to connect to Hive and access records in the tables. I opened PuTTY, typed leo-ingesting.vip.name.com as the host name, and clicked Open. Then I entered my username and password and a few commands to get to Hive SQL. Below is what I did:
$ bash
bash-3.00$ hive
Hive history file=/tmp/rkost/hive_job_log_rkost_201207010451_1212680168.txt
hive> set mapred.job.queue.name=hdmi-technology;
hive> select * from table LIMIT 1;
Is there any way to find, from the command prompt, which Hive version I am using, and the Hadoop version too?
$ hive --version
Hive version 0.8.1.3
EDIT: added another '-' before "version". It didn't work for newer versions. Hope it works for all now.
Known to work on the following distributions:
Hortonworks: $ hive --version prints Hive 0.14.0.2.2.0.0-2041
CDH 5.3
Does not work on:
CDH 4.3
HDInsight (Azure)
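For scripts, the version can be cut out of that output: it is the last field of the first line that starts with the product name. A minimal sketch (the function name version_from_banner is hypothetical) that handles both the "Hive version 0.8.1.3" and "Hive 0.14.0.2.2.0.0-2041" forms shown here, and also the first line of hadoop version:

```shell
#!/usr/bin/env bash
# Minimal sketch: extract the version from `hive --version` or `hadoop version`
# output on stdin. $1 is the product name printed first on the banner line.
version_from_banner() {
  # Print the last field of the first line whose first field is the product
  # name; exit status 1 if no such line exists.
  awk -v product="$1" '$1 == product { print $NF; found = 1; exit } END { exit !found }'
}

# Usage: hive --version | version_from_banner Hive
#        hadoop version | version_from_banner Hadoop
```

On a CLI that rejects --version, such as the HDInsight one, the output should contain no "Hive <version>" line, so the function prints nothing and returns a nonzero status.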
$ hadoop version
Hadoop 0.20.2-cdh3u4
Not sure you can get the Hive version from the command line, though. Maybe you could use something like the hive.hwi.war.file property, or pull it out of the classpath.
You cannot get the Hive version from the command line.
You can check the Hadoop version as mentioned by Dave.
Also, if you are using the Cloudera distribution, look directly at the libs:
ls /usr/lib/hive/lib/ and check for the Hive library, for example:
hive-hwi-0.7.1-cdh3u3.jar
You can also check the compatible versions here:
http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-1-x/CDH-Version-and-Packaging-Information/CDH-Version-and-Packaging-Information.html
You CAN get the version from the command line (hive or beeline):
hive> select version();
OK
1.1.0-cdh5.12.0 rUnknown
Time taken: 2.815 seconds, Fetched: 1 row(s)
hive>
The following works on Hadoop 2.7.2:
hive --version
hadoop version
pig --version
sqoop version
oozie version
This should certainly work:
hive --version
hive -e "set hive.hwi.war.file;" | cut -d'-' -f3
Use the version flag from the CLI
[hadoop@usernode ~]$ hadoop version
Hadoop 2.7.3-amzn-1
Subversion git@aws157git.com:/pkg/Aws157BigTop -r d94115f47e58e29d8113a887a1f5c9960c61ab83
Compiled by ec2-user on 2017-01-31T19:18Z
Compiled with protoc 2.5.0
From source with checksum 1833aada17b94cfb94ad40ccd02d3df8
This command was run using /usr/lib/hadoop/hadoop-common-2.7.3-amzn-1.jar
[hadoop@usernode ~]$ hive --version
Hive 1.0.0-amzn-8
Subversion git://ip-20-69-181-31/workspace/workspace/bigtop.release-rpm-4.8.4/build/hive/rpm/BUILD/apache-hive-1.0.0-amzn-8-src -r d94115f47e58e29d8113a887a1f5c9960c61ab83
Compiled by ec2-user on Tue Jan 31 19:51:34 UTC 2017
From source with checksum 298304aab1c4240a868146213f9ce15f
From the Hive shell, issue 'set system:sun.java.command;'.
The version in the hive-cli jar name is the Hive version.
hive> set system:sun.java.command;
system:sun.java.command=org.apache.hadoop.util.RunJar /opt/cloudera/parcels/CDH-4.2.2-1.cdh4.2.2.p0.10/bin/../lib/hive/lib/hive-cli-0.10.0-cdh4.2.2.jar org.apache.hadoop.hive.cli.CliDriver
hive>
The above example shows Hive version 0.10.0 for CDH version 4.2.2
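To get just the version string, the jar name can be extracted from that line. A minimal sketch using sed; it assumes the jar is named hive-cli-<version>.jar as in the example, and the function name hive_version_from_java_command is hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch: extract the Hive version from the hive-cli jar name in the
# output of `set system:sun.java.command;` (read on stdin).
hive_version_from_java_command() {
  # Keep whatever sits between "hive-cli-" and ".jar" in the jar path.
  sed -n 's/.*hive-cli-\([^/ ]*\)\.jar.*/\1/p' | head -n 1
}

# Usage (should work non-interactively with the CLI's -S and -e options):
#   hive -S -e 'set system:sun.java.command;' | hive_version_from_java_command
```

For the CDH example, this yields the combined string 0.10.0-cdh4.2.2.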
hive --version
hadoop version
We can find the Hive version:
in the Linux shell: "hive --version"
in the Hive shell: "! hive --version;"
The above commands work on Hive 0.13 and above.
set system:sun.java.command;
gives the Hive version: in the Hue Hive editor, it returns the jar name, which includes the version.
To identify the Hive version on an EC2 instance, use:
hive --version
The command below works; I tried it and got the current version:
/usr/bin/hive --version
If you are using Beeline to connect to Hive, !dbinfo prints all the underlying database details; in its output, getDatabaseProductVersion holds the Hive version.
Sample output:
getDatabaseProductVersion 1.2.1000.2.4.3.0-227
If you are using the Hortonworks distribution, you can get the version from the CLI with:
hive --version
We can also get the version by looking at the version of the hive-metastore jar file.
For example:
$ ls /usr/lib/hive/lib/ | grep metastore
hive-metastore-0.13.1.jar
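The same check can be scripted without parsing ls output, by letting a shell glob find the jar. A sketch assuming the hive-metastore-<version>.jar naming shown above; the function name hive_version_from_lib is hypothetical, and /usr/lib/hive/lib is only the default path from this example:

```shell
#!/usr/bin/env bash
# Minimal sketch: infer the Hive version from the hive-metastore jar name in a
# lib directory, using a glob instead of parsing `ls` output.
# $1 defaults to /usr/lib/hive/lib, the path used in the example above.
hive_version_from_lib() {
  local lib="${1:-/usr/lib/hive/lib}" jar name
  for jar in "$lib"/hive-metastore-*.jar; do
    [ -e "$jar" ] || return 1        # the glob matched nothing
    name="${jar##*/}"                # drop the directory part
    name="${name#hive-metastore-}"   # drop the jar prefix
    echo "${name%.jar}"              # drop the extension
    return 0                         # report the first match only
  done
  return 1                           # reached only if the loop never ran
}
```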
You can get the Hive version with:
hive --version
If you want to know the Hive version and the versions of its related packages, use:
rpm -qa|grep hive
The output will look like this:
libarchive2-2.5.5-5.19
hive-0.13.0.2.1.2.2-516
perl-Archive-Zip-1.24-2.7
hive-jdbc-0.13.0.2.1.2.2-516
webhcat-tar-hive-0.13.0.2.1.2.2_516-2
hive-webhcat-0.13.0.2.1.2.2-516
hive-hcatalog-0.13.0.2.1.2.2-516
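The latter gives a better picture of Hive and its dependent packages, but it requires rpm to be present. Note that grep hive also matches unrelated packages such as libarchive2 and perl-Archive-Zip, because "archive" contains "hive".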
Use the command below to get the Hive version:
hive --service version
From your SSH connection to the edge node, you can simply type:
hive --version
Hive 1.2.1000.x.x.x.x-xx
This returns the Hive version for your distribution of Hadoop. Another approach: when you start Beeline, it prints the version straight away.
beeline
Beeline version 1.2.1000.x.x.x.x-xx by Apache Hive
Another way, if you have WebHCat (part of the Hive project) installed, is to make a REST call:
curl -i http://172.22.123.63:50111/templeton/v1/version/hive?user.name=foo
which returns JSON like:
{"module":"hive","version":"1.2.1.2.3.0.0-2458"}
The WebHCat docs have more details.
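The REST call and the JSON parsing can be wrapped together. A minimal sketch, assuming curl is installed and WebHCat listens on its usual port 50111; the base URL and user name are placeholders, the helper names are hypothetical, and the version is pulled out with sed rather than a JSON parser, which is enough for a flat reply like the one above:

```shell
#!/usr/bin/env bash
# Minimal sketch: query WebHCat (Templeton) for the Hive version.
# Assumes curl is installed; base URL and user name are placeholders.
webhcat_hive_version() {
  if [ -z "${1:-}" ]; then
    echo "usage: webhcat_hive_version http://HOST:50111 [USER]" >&2
    return 2
  fi
  local base="$1" user="${2:-${USER:-}}"
  # -s: no progress meter; -f: fail on HTTP errors instead of printing the error body
  curl -sf "$base/templeton/v1/version/hive?user.name=$user" | extract_json_version
}

# Print the value of the first "version":"..." pair found on stdin.
# A sed shortcut for flat replies like {"module":"hive","version":"..."}, not a JSON parser.
extract_json_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: webhcat_hive_version http://172.22.123.63:50111 foo
```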
Yes, you can get the version of your Hive installation with the hive command:
hive --service version
You can get a list of the available service names with:
hive --service help
You can look for the jar file name that appears as soon as you log in to Hive:
jar:file:/opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1401-140130.jar!/hive-log4j.properties
/usr/bin/hive --version worked for me.
[qa@ip-10-241-1-222 ~]$ /usr/bin/hive --version
Hive 0.13.1-cdh5.3.1
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hive-0.13.1-cdh5.3.1 -r Unknown
Compiled by jenkins on Tue Jan 27 16:38:55 PST 2015
From source with checksum 1bb86e4899928ce29cbcaec8cf43c9b6
[qa@ip-10-241-1-222 ~]$
On HDInsight I tried hive --version, but it did not recognize the option, and the help output does not mention it.
D:\Users\admin1>%hive_home%/bin/hive --version
Unrecognized option: --version
usage: hive
-d,--define <key=value> Variable subsitution to apply to hive
commands. e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
-h <hostname> connecting to Hive Server on remote host
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable subsitution to apply to hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-p <port> connecting to Hive Server on port number
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the
console)
However, when you log in to the head node and start the Hive console, it prints some helpful configuration information from which the version can be read:
D:\Users\admin1>%hive_home%/bin/hive
Logging initialized using configuration in file:/C:/apps/dist/hive-0.13.0.2.1.11.0-2316/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.4.0.2.1.11.0-2316/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.0.2.1.11.0-2316-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> quit;
From this I would say I have Hive version 0.13 deployed, which is consistent with this list of versions https://hive.apache.org/downloads.html
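If you need that value in a script (for example under Cygwin or on a Linux head node with bash), the version can be taken from the hive-<version> directory in the "Logging initialized" line. A sketch, assuming the install directory follows the hive-<version>/conf layout shown above; the function name hive_version_from_banner_path is hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch: read the Hive version from the startup line
# "Logging initialized using configuration in file:.../hive-<version>/conf/..."
# (stdin). Assumes the install directory is named hive-<version>.
hive_version_from_banner_path() {
  sed -n 's|.*Logging initialized using configuration in .*/hive-\([^/]*\)/conf/.*|\1|p' |
    head -n 1
}

# Usage (assuming the console reads commands from stdin; 2>&1 catches the
# banner whichever stream it is written to):
#   echo 'quit;' | hive 2>&1 | hive_version_from_banner_path
```

For the HDInsight banner above, this yields 0.13.0.2.1.11.0-2316, i.e. Hive 0.13 plus the distribution build suffix.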
I was able to get the version of my Hadoop 3.0.3 installation with the following command:
$HADOOP_HOME/bin$ ./hadoop version
which gave me the following output:
Hadoop 3.0.3
Source code repository https://yjzhangal@git-wip-us.apache.org/repos/asf/hadoop.git -r 37fd7d752db73d984dc31e0cdfd590d252f5e075
Compiled by yzhang on 2018-05-31T17:12Z
Compiled with protoc 2.5.0
From source with checksum 736cdcefa911261ad56d2d120bf1fa
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.0.3.jar

Hive doesn't respond when I try to make a query

I have a setup on an EC2 instance that uses Whirr to spin up new Hadoop instances. I have been trying to get Hive to work with this setup. Hive should be configured to use MySQL as the local metastore. The issue is that every time I try to run a query like CREATE TABLE testers (foo INT, bark STRING); via the Hive interface, it just hangs and doesn't seem to be doing anything.
Any help would be appreciated.
I would first get the debug output from the hive command line to see where it is hanging. Run the hive shell with this parameter, and then paste the output of your command.
hive -hiveconf hive.root.logger=DEBUG,console
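If the shell hangs with no output, a bounded, logged run can make the debug output easier to collect and paste. A minimal sketch, assuming GNU coreutils timeout is available; the function name hive_debug_query, the HIVE_BIN override, and the 120-second default are hypothetical choices, not Hive settings:

```shell
#!/usr/bin/env bash
# Minimal sketch: run one query with DEBUG logging sent to the console, keep the
# full output in a log file, and stop after a time limit so the tail of the log
# shows where Hive got stuck. HIVE_BIN, the 120 s default and the log name are
# hypothetical choices; `timeout` comes from GNU coreutils.
hive_debug_query() {
  local query="$1" limit="${2:-120}" log="${3:-hive-debug.log}" rc
  timeout "$limit" "${HIVE_BIN:-hive}" -hiveconf hive.root.logger=DEBUG,console \
    -e "$query" > "$log" 2>&1
  rc=$?
  if [ "$rc" -eq 124 ]; then   # 124 = timeout fired
    echo "no result after ${limit}s; last log lines:" >&2
    tail -n 20 "$log" >&2
  fi
  return "$rc"
}

# Usage: hive_debug_query 'CREATE TABLE testers (foo INT, bark STRING);'
```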
