Apache Pig in Cygwin - Windows

Are there any resources available for running Apache Pig in Cygwin? With the latest Hadoop version I was able to set up a Hadoop cluster on a Windows machine successfully, but I can't make Pig run in a Cygwin terminal. The following error is returned when attempting to invoke the Pig grunt shell.
$ pig -x local
cygwin warning:
MS-DOS style path detected: c:\pig/conf/pig.properties
Preferred POSIX equivalent is: /cygdrive/c/pig/conf/pig.properties
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
cygpath: cannot create short name of C:\pig\logs
Cannot locate pig-withouthadoop.jar. do 'ant jar-withouthadoop', and try again.
Any help would be appreciated.
Thanks

To resolve the above error, I rebuilt Pig for Hadoop 2.2.0 as described in the link below and was able to get rid of the exception.
http://javatute.com/javatute/faces/post/hadoop/2014/installing-pig-11-for-hadoop-2-on-ubuntu-12-lts.xhtml
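For reference, a minimal sketch of that rebuild, run from the Pig source tree at c:\pig (the path from the error output above); the -Dhadoopversion=23 flag selects the Hadoop 2.x build profile in Pig 0.11/0.12 builds, and Ant is assumed to be installed and on the PATH:
cd /cygdrive/c/pig
ant clean jar-withouthadoop -Dhadoopversion=23
# then retry
pig -x local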

Related

Hive on Windows 10 using Cygwin: Path error

I got this error when installing Hive on Windows 10 using Cygwin.
I had added this junction:
C:\WINDOWS\system32>mklink /J D:\cygdrive\d\ d:\
But the path comes out like the one below. I think Cygwin doesn't understand cygdrive\d\hadoop\hive3.1.2....
It must be cygdrive/d/hadoop/hive3.1.2/....
Error
org.apache.hadoop.hive.metastore.HiveMetaException: File /cygdrive/d/hadoop/hive3.1.2\scripts\metastore\upgrade\mssql\upgrade.order.mssql not found
Underlying cause: java.io.FileNotFoundException : \cygdrive\d\hadoop\hive3.1.2\scripts\metastore\upgrade\mssql\upgrade.order.mssql (The system cannot find the path specified)
How can I fix this?
Update: solved.
I fixed it by using WSL instead of Cygwin. In WSL, install OpenJDK 8, set JAVA_HOME to /usr/lib/jvm/openJdk8..., then rerun the Hive command. Done.
It seems Cygwin calls the Windows version of Java, when it should be using the Linux version.
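A minimal sketch of that WSL setup, assuming an Ubuntu-based distribution (the package name, the JVM path, and the exact Hive command are assumptions; the schema error above suggests schematool was the failing step):
# inside WSL: install OpenJDK 8 and point JAVA_HOME at the Linux JVM
sudo apt-get update && sudo apt-get install -y openjdk-8-jdk
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
# rerun the failing Hive command, e.g. the metastore schema init
schematool -dbType mssql -initSchema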

How to resolve the Invalid HADOOP_YARN_HOME error

I'm trying to install Hadoop on CentOS 7, following the steps here - https://www.vultr.com/docs/how-to-install-hadoop-in-stand-alone-mode-on-centos-7 (the only difference is that the Hadoop version is 3.2.1 instead of the article's 2.7.3).
I followed everything precisely until step 4: when I type "hadoop" in the terminal it gives me an error - ERROR: Invalid HADOOP_YARN_HOME
Is there any setup related to YARN that's needed? I read the Apache docs and other links on the web, but they all mention that only the JAVA_HOME path is needed, which I did set as per the link above.
Any help appreciated.
Thanks!
Open ~/.bashrc and add:
export HADOOP_HOME=path_to_your_hadoop_package
export HADOOP_YARN_HOME=$HADOOP_HOME
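A fuller sketch with an absolute path; the /usr/local/hadoop-3.2.1 location is an assumption, so substitute wherever you extracted the Hadoop tarball:
# in ~/.bashrc -- adjust HADOOP_HOME to your actual install path
export HADOOP_HOME=/usr/local/hadoop-3.2.1
export HADOOP_YARN_HOME=$HADOOP_HOME
Then reload the file and re-check:
source ~/.bashrc
hadoop version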

Error while installing Kylo-specific services for NiFi

I am trying to install Kylo 0.8.4.
There is a step to install Kylo-specific components after installing NiFi, using the command:
sudo ./install-kylo-components.sh /opt /opt/kylo kylo kylo
but I am getting the following error:
Creating symlinks for NiFi version 1.4.0.jar compatible nars
ERROR: spark-submit not on path. Has spark been installed?
I have Spark installed.
Need help.
The script calls which spark-submit to check whether Spark is available. If it is, the script uses spark-submit --version to determine the installed version of Spark.
The error indicates that spark-submit is not available on the system path. Please execute which spark-submit on the command line and check the result.
If spark-submit is not on the system path, you can fix it by updating the PATH variable in your .bash_profile file with the location of your Spark installation.
As a next step, you can verify the installed version of Spark by running spark-submit --version.
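A minimal sketch of that PATH fix; /opt/spark is an assumed install location, so adjust it to wherever Spark is installed on your machine:
# in ~/.bash_profile -- /opt/spark is an assumed Spark location
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
Then reload and verify:
source ~/.bash_profile
which spark-submit
spark-submit --version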

bin/hadoop: line 133: C:Java/jdk1.7.0_45/bin/java: No such file or directory

Can someone help with this? I am trying to run Hadoop version 2.2.0 and got this error message:
$ bin/hadoop version
bin/hadoop: line 133: C:Java/jdk1.7.0_45/bin/java: No such file or directory
bin/hadoop: line 133: exec: C:Java/jdk1.7.0_45/bin/java: cannot execute: No such file or directory
I am trying to install single-instance Hadoop on Windows 7 64-bit.
I installed Cygwin64 and Hadoop under "c/+1/Hadoop/hadoop-2.2.0".
JAVA_HOME is
$ echo $JAVA_HOME
c:Java/jdk1.7.0_45
Any idea will be more than welcome, so feel free to fire away!
"C:Java/jdk1.7.0_45/bin/java" is neither a valid Windows path nor a valid Cygwin path, so your JAVA_HOME is set incorrectly. Set it to the directory where you installed the JDK. Maybe you mean "/cygdrive/c/Java/jdk1.7.0_45/bin/java". Using "where java" or "which java" might help a bit.
(Opinion follows...)
In my experience, trying to set up Hadoop on Windows using Cygwin is a tough battle, and usually not worth the effort. When I have to develop on Windows machines, I usually set up a virtual machine running Linux, and everything tends to go much smoother.
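If you do stick with Cygwin, here is a minimal sketch of the JAVA_HOME fix in Cygwin's ~/.bashrc, assuming the JDK really is installed at C:\Java\jdk1.7.0_45:
# in ~/.bashrc inside Cygwin -- quotes guard against spaces in Windows paths
export JAVA_HOME="/cygdrive/c/Java/jdk1.7.0_45"
export PATH="$JAVA_HOME/bin:$PATH"
# verify the shell now resolves the right binary
which java
java -version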

Still getting "Unable to load realm info from SCDynamicStore" after bug fix

I installed Hadoop and Pig using brew install hadoop and brew install pig.
I read here that you will get an Unable to load realm info from SCDynamicStore error message unless you add:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
to your hadoop-env.sh file, which I have.
However, when I run hadoop namenode -format, I still see:
java[1548:1703] Unable to load realm info from SCDynamicStore
amongst the outputs.
Anyone know why I'm still getting it?
As dturnanski suggests, you need to use an older JDK. You can set this in the hadoop-env.sh file by changing the JAVA_HOME setting to:
export JAVA_HOME=`/usr/libexec/java_home -v 1.6`
(Note the backticks here.) This fixed the problem for me.
I had the same issue with Java 7; it works with Java 6.
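To confirm that a Java 6 JDK is actually present before pointing JAVA_HOME at it, you can run the same lookup by hand (this assumes an Apple-provided JDK 1.6 is installed; the command prints an error otherwise):
# prints the 1.6 JDK home if one is installed
/usr/libexec/java_home -v 1.6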
