Install Hive on windows: 'hive' is not recognized as an internal or external command, operable program or batch file - hadoop

I have installed Hadoop 2.7.3 on Windows and I am able to start the cluster. Now I would like to install Hive, and I went through the steps below:
1. Downloaded db-derby-10.12.1.1-bin.zip, unpacked it, and started the Derby network server with startNetworkServer -h 0.0.0.0.
2. Downloaded apache-hive-1.1.1-bin.tar.gz from a mirror site and unpacked it. Created hive-site.xml with the following properties (a sketch of the file follows this list):
javax.jdo.option.ConnectionURL
javax.jdo.option.ConnectionDriverName
hive.server2.enable.impersonation
hive.server2.authentication
datanucleus.autoCreateTables
hive.metastore.schema.verification
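For reference, the resulting hive-site.xml looked roughly like the sketch below. The connection URL, driver class, and true/false values are assumptions for a Derby network server on the default port, so adjust them to your own setup:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- assumes the Derby network server from step 1 listens on the default port 1527 -->
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>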
I have also set up HIVE_HOME and updated PATH, and set HIVE_LIB and HIVE_BIN_PATH as well.
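For completeness, the variables were set roughly as follows from a cmd prompt; the install path here is a placeholder for wherever Hive was unpacked:
REM sketch only: replace C:\hive\apache-hive-1.1.1-bin with the actual unpack location
set HIVE_HOME=C:\hive\apache-hive-1.1.1-bin
set HIVE_LIB=%HIVE_HOME%\lib
set HIVE_BIN_PATH=%HIVE_HOME%\bin
set PATH=%PATH%;%HIVE_HOME%\bin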
When I run hive from bin I get
'hive' is not recognized as an internal or external command,
operable program or batch file.
In Explorer, bin/hive shows up with file type "File" (it has no Windows extension).
Please advise. I am not sure whether this Hive version is the correct one.
Thank you.

If someone is still running into this problem, here's what I did to get Hive installed on Windows.
My configuration is as below (latest as of this writing):
I am using Windows 10
Hadoop 2.9.1
Derby 10.14
Hive 2.3.4 (my Hive version does not contain bin/hive.cmd, the file necessary to run Hive on Windows)
@wheeler above mentioned that Hive is built for Linux. Here's the hack to make it work on Windows.
My Hive distribution did not come with Windows executable files, hence the hack!
STEP 1
There are 3 files which you need to download from https://svn.apache.org/repos/:
https://svn.apache.org/repos/asf/hive/trunk/bin/hive.cmd
save it in your %HIVE_HOME%/bin/ as hive.cmd
https://svn.apache.org/repos/asf/hive/trunk/bin/ext/cli.cmd
save it in your %HIVE_HOME%/bin/ext/ as cli.cmd
https://svn.apache.org/repos/asf/hive/trunk/bin/ext/util/execHiveCmd.cmd
save it in your %HIVE_HOME%/bin/ext/util/ as execHiveCmd.cmd
where %HIVE_HOME% is where Hive is installed.
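If you prefer to script it, something like the following should fetch and place all three files. This is only a sketch: it assumes curl is available on your box (it ships with recent Windows 10 builds) and that %HIVE_HOME% is already set.
REM create the target folders and download the three Windows launcher scripts
mkdir "%HIVE_HOME%\bin\ext\util"
curl -o "%HIVE_HOME%\bin\hive.cmd" https://svn.apache.org/repos/asf/hive/trunk/bin/hive.cmd
curl -o "%HIVE_HOME%\bin\ext\cli.cmd" https://svn.apache.org/repos/asf/hive/trunk/bin/ext/cli.cmd
curl -o "%HIVE_HOME%\bin\ext\util\execHiveCmd.cmd" https://svn.apache.org/repos/asf/hive/trunk/bin/ext/util/execHiveCmd.cmd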
STEP 2
Create a tmp directory under your HIVE_HOME (on the local machine, not on HDFS).
Give 777 permissions to this tmp directory (see the sketch below for a Windows equivalent).
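There is no literal chmod 777 on Windows; granting Everyone full control over the folder is the rough equivalent. A sketch, using the same C:\BigData\hive path that appears in STEP 3:
REM create the local tmp dir under HIVE_HOME and open up its permissions
REM ("Everyone" is the English group name; it differs on localized Windows)
mkdir C:\BigData\hive\tmp
icacls C:\BigData\hive\tmp /grant Everyone:(OI)(CI)F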
STEP 3
Open your conf/hive-default.xml.template and save it as conf/hive-site.xml.
Then, in this hive-site.xml, paste the properties below at the top, right under the opening <configuration> tag:
<property>
  <name>system:java.io.tmpdir</name>
  <value>{PUT YOUR HIVE HOME DIR PATH HERE}/tmp</value>
  <!-- MY PATH WAS C:/BigData/hive/tmp -->
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
(check the indents)
STEP 4
- Run the Hadoop services:
start-dfs
start-yarn
- Run Derby:
startNetworkServer -h 0.0.0.0
Make sure you have all of the above services running.
- Go to a cmd prompt in HIVE_HOME/bin and run the hive command:
hive

Version 1.1.1 of Apache Hive does not contain a build that can be executed on Windows (only Linux binaries).
However, version 2.1.1 does have Windows support (.cmd scripts in bin).
So even if you had your PATH set correctly, cmd wouldn't be able to find an executable it could run, since one doesn't exist in 1.1.1.
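A quick way to check whether the release you downloaded ships the Windows launcher at all (assuming %HIVE_HOME% points at your unpacked Hive):
REM if this lists no hive.cmd, your Hive release has no Windows launcher
dir "%HIVE_HOME%\bin\*.cmd"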

I also ran into this problem. To get the files necessary to run Hive on Windows I downloaded hive-2.3.9 and hive-3.1.2, but neither of them has these files. So we have two options:
Option 1: install hive-2.1.0 and set it up, as I have tried with:
Hadoop 2.8.0
derby 10.12.1.1
hive 2.1.0
Option 2: download the whole bin directory and replace your Hive bin directory with it. To download the bin we need the wget utility for Windows.
After that, run this command (to understand how it works: -r recurses, -np skips the parent directory, -nH drops the host directory, --cut-dirs=3 strips the first three remote directory levels, and -R index.html rejects the generated index pages):
wget -r -np -nH --cut-dirs=3 -R index.html https://svn.apache.org/repos/asf/hive/trunk/bin/
Your downloaded bin will now contain the Windows .cmd scripts.
After replacing your bin with it you are ready to go, so now my configuration is as below:
Hadoop 3.3.1
derby 10.13.1.1
hive 2.3.9

Related

xmx1000m is not recognized as an internal or external command: pig on windows

I am trying to set up Pig on Windows 7. I already have a Hadoop 2.7 single-node cluster running on Windows 7.
To set up Pig, I have taken the following steps so far:
Downloaded the tar: http://mirror.metrocast.net/apache/pig/
Extracted tar to: C:\Users\zeba\Desktop\pig
Set the following (user) environment variables:
PIG_HOME = C:\Users\zeba\Desktop\pig
PATH = C:\Users\zeba\Desktop\pig\bin
PIG_CLASSPATH = C:\Users\zeba\Desktop\hadoop\conf
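For reference, those variables can be set for the current cmd session roughly like this (the paths are the ones from my layout above; use setx or the System Properties dialog if you want them to persist):
REM current-session sketch of the variables listed above
set PIG_HOME=C:\Users\zeba\Desktop\pig
set PIG_CLASSPATH=C:\Users\zeba\Desktop\hadoop\conf
set PATH=%PATH%;%PIG_HOME%\bin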
Also changed HADOOP_BIN_PATH in pig.cmd to %HADOOP_HOME%\libexec, as suggested by (Apache pig on windows gives "hadoop-config.cmd' is not recognized as an internal or external command" error when running "pig -x local"), since I was getting the same error.
When I enter pig, I encounter the following error:
xmx1000m is not recognized as an internal or external command
Please help!
The error went away after installing pig-0.17.0. I was previously working with pig-0.16.0.
Finally I got it. I changed HADOOP_BIN_PATH in pig.cmd to "%HADOOP_HOME%\hadoop-2.9.2\libexec"; as you can see, "hadoop-2.9.2" is the subfolder where "libexec" is located in my Hadoop setup.
Fix your HADOOP_HOME accordingly: don't point it at the bin directory, point it at the Hadoop installation directory itself.
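Concretely, the relevant line inside pig.cmd ends up looking something like the sketch below; the hadoop-2.9.2 subfolder is specific to that layout, and with a standard Hadoop layout it is simply %HADOOP_HOME%\libexec:
REM inside pig.cmd (sketch): point HADOOP_BIN_PATH at the folder containing hadoop-config.cmd
set HADOOP_BIN_PATH=%HADOOP_HOME%\libexec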

Spark installed but no command 'hdfs' or 'hadoop' found

I am a new pyspark user.
I just downloaded and installed a spark cluster ("spark-2.0.2-bin-hadoop2.7.tgz")
After installation I wanted to access the file system (upload local files to the cluster). But when I type hadoop or hdfs at the command line it says "no command found".
Do I need to install Hadoop/HDFS separately (I thought it was built into Spark)?
Thanks in advance.
You have to install hadoop first to access HDFS.
Follow this http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Choose the latest version of hadoop from the apache site.
Once you are done with the Hadoop setup, go to Spark: download http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz, extract the files, and set JAVA_HOME and HADOOP_HOME in spark-env.sh.
You don't have hdfs or hadoop on your PATH, which is why you are getting the "no command found" message.
If you run \yourpath\hadoop-2.7.1\bin\hdfs dfs -ls / it should work and show the root content.
But you can add your hadoop/bin commands (hdfs, hadoop, ...) to your PATH with something like this:
export PATH=$PATH:$HADOOP_HOME/bin
where HADOOP_HOME is your environment variable holding the path to the Hadoop installation folder (downloading and installing Hadoop is still required).
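Putting that together, a minimal sketch of what you might add to ~/.bashrc; the installation path is an assumption, so use wherever you actually unpacked Hadoop:
# point HADOOP_HOME at your Hadoop install and expose its commands
export HADOOP_HOME=/opt/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# then, for example, list the HDFS root
hdfs dfs -ls /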

Spark: Run Spark shell from a different directory than where Spark is installed on slaves and master

I have a small cluster (4 machines) set up with 3 slaves and a master node, all installed to /home/spark/spark. (I.e., $SPARK_HOME is /home/spark/spark.)
When I use the Spark shell: /home/spark/spark/bin/pyspark --master spark://192.168.0.11:7077 everything works fine. However, I'd like my colleagues to be able to connect to the cluster from a local instance of Spark on their machines, installed in whatever directory they wish.
Currently, if somebody has Spark installed in, say, /home/user12/spark and runs /home/user12/spark/bin/pyspark --master spark://192.168.0.11:7077, the Spark shell connects to the master without problems but fails with an error when I try to run code:
class java.io.IOException: Cannot run program
"/home/user12/bin/compute-classpath.sh"
(in directory "."): error=2, No such file or directory)
The problem here is that Spark is looking for the Spark installation in /home/user12/spark/, whereas I'd like to just tell Spark to look in /home/spark/spark/ instead.
How do I do this?
You need to edit three files, spark-submit, spark-class and pyspark (all in the bin folder).
Find the line
export SPARK_HOME=[...]
Then change it to
SPARK_HOME=[...]
Finally make sure you set SPARK_HOME to the directory where spark is installed on the cluster.
This works for me.
Here you can find a detailed explanation.
http://apache-spark-user-list.1001560.n3.nabble.com/executor-failed-cannot-find-compute-classpath-sh-td859.html
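A hedged sketch of applying that edit to all three scripts at once; it assumes the line has the exact form export SPARK_HOME=... with no spaces around the equals sign, so check the scripts first and keep the .bak backups:
# run from the local Spark installation (the colleague's copy)
cd /home/user12/spark
sed -i.bak 's/^export SPARK_HOME=/SPARK_HOME=/' bin/spark-submit bin/spark-class bin/pyspark
# then point SPARK_HOME at the cluster-side install location before launching
export SPARK_HOME=/home/spark/spark
bin/pyspark --master spark://192.168.0.11:7077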

hbase installation on single node

I have installed a single-node Hadoop on Ubuntu 12.04. Now I am trying to install HBase on top of it (version 0.94.18). But I get the following errors (even though I have extracted it to /usr/local/hbase):
Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
starting master, logging to /usr/lib/hbase/hbase-0.94.8/logs/hbase-hduser-master-ubuntu.out
nice: /usr/lib/hbase/hbase-0.94.8/bin/hbase: No such file or directory
cat: /usr/lib/hbase/hbase-0.94.8/conf/regionservers: No such file or directory
To resolve this error:
Download the binary version of HBase
Edit the conf files hbase-env.sh and hbase-site.xml
Set up the HBase home directory (HBASE_HOME)
Start HBase with start-hbase.sh
(a rough sketch of these steps follows below)
Explanation of the above error:
"Could not find or load main class" means your downloaded version does not have the required jars.
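A rough sketch of those steps on Ubuntu for a single node; the paths, version, and JDK location are assumptions, so adjust them to your machine:
# assumes the *binary* HBase build is already unpacked at /usr/local/hbase-0.94.18
export HBASE_HOME=/usr/local/hbase-0.94.18
export PATH=$PATH:$HBASE_HOME/bin
# point hbase-env.sh at your JDK (path is an assumption; use your real JAVA_HOME)
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> "$HBASE_HOME/conf/hbase-env.sh"
# then start the daemons
start-hbase.sh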
Hi, can you tell us at what point this error comes up?
I think your environment is set up wrong.
You should enter the command below:
export HBASE_HOME="/usr/lib/hbase/hbase-0.94.18"
Then try hbase; it will work.
If you want a shell script, you can download it from this link: https://github.com/tonyreddy/Apache-Hadoop1.2.1-SingleNode-installation-shellscript
It has Hadoop, Hive, HBase, and Pig.
Thanks,
Tony.
It is not recommended to run HBase from the source distribution directly; instead, download the binary distribution as mentioned on the official site, follow the same instructions, and you will get it up and running.
You could try installing version 0.94.27.
Download it from: HBase 0.94.27 download
This one worked for me.
Follow the instructions specified in:
HBase installation guide
sed "s/<\/configuration>/<property>\n<name>hbase.rootdir<\/name>\n<value>hdfs:\/\/'$c':54310\/hbase<\/value>\n<\/property>\n<property>\n<name>hbase.cluster.distributed<\/name>\n<value>true<\/value>\n<\/property>\n<property>\n<name>hbase.zookeeper.property.clientPort<\/name>\n<value>2181<\/value>\n<\/property>\n<property>\n<name>hbase.zookeeper.quorum<\/name>\n<value>'$c'<\/value>\n<\/property>\n<\/configuration>/g" -i.bak hbase/conf/hbase-site.xml
sed 's/localhost/'$c'/g' hbase/conf/regionservers -i
sed 's/#\ export\ HBASE_MANAGES_ZK=true/export\ HBASE_MANAGES_ZK=true/g' hbase/conf/hbase-env.sh -i
Just type these three commands; you need to replace $c with your hostname (or set the shell variable c to your hostname first).
Then try it; it will work.

HADOOP_HOME and hadoop streaming

Hi, I am trying to run Hadoop on a server that has Hadoop installed, but I have no idea which directory Hadoop resides in. The server was configured by the server admin.
In order to load Hadoop I use the use command from the dotkit package.
There may be several solutions, but I wanted to know where the Hadoop package was installed, how to set up the $HADOOP_HOME variable, and how to properly run a Hadoop streaming job, such as $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/mapred/contrib/streaming/hadoop-streaming.jar, i.e., http://wiki.apache.org/hadoop/HadoopStreaming.
Thanks! Any help would be greatly appreciated!
If you're using a Cloudera distribution then it's most probably in /usr/lib/hadoop; otherwise it could be anywhere (at the discretion of your system admin).
There are some tricks you can use to try and locate it:
locate hadoop-env.sh (assuming that locate has been installed and updatedb has been run recently)
If the machine you're running this on is running a hadoop service (such as data node, job tracker, task tracker, name node), then you can perform a process list and grep for the hadoop command: ps axww | grep hadoop
Failing the above two, look for the hadoop root directory in some common locations such as: /usr/lib, /usr/local, /opt
Failing all this, and assuming your current user has the permissions: find / -name hadoop-env.sh
If you installed with rpm then it's most probably in /etc/hadoop.
Why don't you try:
echo $HADOOP_HOME
Obviously the above environment variable has to be set before you can invoke the hadoop executables from anywhere on the box.
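Once the tricks above have told you where Hadoop lives, a hedged sketch of setting HADOOP_HOME and running a streaming job could look like this; the install path, the streaming jar location, and the HDFS paths are all assumptions, and the jar's exact location varies between Hadoop versions and distributions:
# assume hadoop was found under /usr/lib/hadoop
export HADOOP_HOME=/usr/lib/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
# classic streaming example: cat as the mapper, wc as the reducer
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -input /user/me/input \
  -output /user/me/output \
  -mapper /bin/cat \
  -reducer /usr/bin/wc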
