Error while installing Kylo-specific services for NiFi - installation

I am trying to install Kylo 0.8.4.
After installing NiFi, there is a step to install the Kylo-specific components using the command
sudo ./install-kylo-components.sh /opt /opt/kylo kylo kylo
but I am getting the following error:
Creating symlinks for NiFi version 1.4.0.jar compatible nars
ERROR: spark-submit not on path. Has spark been installed?
I do have Spark installed. Any help would be appreciated.

The script calls which spark-submit to check whether Spark is available. If it is, it runs spark-submit --version to determine which version of Spark is installed.
The error indicates that spark-submit is not on the system PATH. Please execute which spark-submit on the command line and check the result; on the Kylo sandbox it prints the location of the spark-submit binary.
If spark-submit is not on the system PATH, you can fix this by updating the PATH variable in your .bash_profile to include the location of your Spark installation.
As a next step, you can verify the installed version of Spark by running spark-submit --version.
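A minimal sketch of that fix, assuming Spark lives under /opt/spark (that path is only an assumption; substitute your actual Spark installation directory):
which spark-submit                                          # empty output means spark-submit is not on the PATH
echo 'export SPARK_HOME=/opt/spark' >> ~/.bash_profile      # /opt/spark is an assumed location
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bash_profile
source ~/.bash_profile
spark-submit --version                                      # should now print the installed Spark version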

Related

How to resolve Invalid Hadoop Yarn Home error

I'm trying to install Hadoop on CentOS 7, following the steps here - https://www.vultr.com/docs/how-to-install-hadoop-in-stand-alone-mode-on-centos-7 (the only difference is that the Hadoop version is 3.2.1 instead of the 2.7.3 used in the article).
I followed everything precisely until step 4, where typing "hadoop" in the terminal gives me an error - ERROR: Invalid HADOOP_YARN_HOME
Is there any YARN-related setup that's needed? I read the Apache docs and other links on the web, but they all mention that only the JAVA_HOME path is needed, which I did set as per the link above.
Any help appreciated.
Thanks!
Open ~/.bashrc and add:
export HADOOP_HOME=path_to_your_hadoop_package
export HADOOP_YARN_HOME=$HADOOP_HOME
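Here, path_to_your_hadoop_package is the directory where the Hadoop 3.2.1 tarball was extracted. After saving the file, reload it and re-run the check - a minimal sketch:
source ~/.bashrc
hadoop version   # should no longer report ERROR: Invalid HADOOP_YARN_HOME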

Building Oozie 4.2.0 with Spark on YARN support

What I am trying to achieve is to build and install Oozie 4.2.0 in a way that lets me submit Spark jobs to a YARN cluster.
I built the distro by executing oozie-4.2.0/bin/mkdistro.sh -Puber -Phadoop-2 -DskipTests. That created the oozie-4.2.0-distro.tar.gz package, and inside it I can find oozie-4.2.0-sharelib.tar.gz. However, many tutorials online state that I should use oozie-4.2.0-sharelib-yarn.tar.gz in order to use YARN. No such file is contained in the distro package. How can I make the build process output the YARN version of the sharelib?
I tried to continue with the non-YARN version, but when submitting the example Spark job (after adjusting the HDFS and YARN addresses in job.properties and changing the master property from local[*] to yarn) I got an error:
Error: Could not load YARN classes. This copy of Spark may not have
been compiled with YARN support.
Oozie 4.2 does not include OOZIE-2271, which added the spark_yarn dependency to the sharelib when compiling against the hadoop-2 profile.
Try building the distro with Oozie 4.3. Alternatively, you can backport OOZIE-2271 and build Oozie yourself.
See spark-yarn_2.10 in this commit:
https://github.com/apache/oozie/commit/e6b5c95efb492a70087377db45524e06f803459e
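For example, the same invocation run against an Oozie 4.3.x source tree (a sketch; the flags simply mirror the ones used above, and the 4.3.0 directory name is an assumption):
cd oozie-4.3.0
bin/mkdistro.sh -Puber -Phadoop-2 -DskipTests
# the sharelib produced by this build should include the spark-yarn jars, per OOZIE-2271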

Apache pig in Cygwin

Are there any resources available for running Apache Pig in Cygwin? With the latest Hadoop version I was able to set up a Hadoop cluster on a Windows machine successfully, but I can't make Pig run in a Cygwin terminal. The following error is returned when attempting to invoke the Pig grunt shell.
$ pig -x local
cygwin warning:
MS-DOS style path detected: c:\pig/conf/pig.properties
Preferred POSIX equivalent is: /cygdrive/c/pig/conf/pig.properties
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
cygpath: cannot create short name of C:\pig\logs
Cannot locate pig-withouthadoop.jar. do 'ant jar-withouthadoop', and try again.
Any help would be appreciated.
Thanks
To resolve the above error, I rebuilt Pig for Hadoop 2.2.0 as described at the link below and was able to get rid of the exception.
http://javatute.com/javatute/faces/post/hadoop/2014/installing-pig-11-for-hadoop-2-on-ubuntu-12-lts.xhtml
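If you prefer to rebuild from the Pig sources yourself, the ant target named in the error message is the one to run - roughly as follows (the -Dhadoopversion value is an assumption and varies by Pig release):
cd /cygdrive/c/pig
ant clean jar-withouthadoop -Dhadoopversion=23   # 23 selects the Hadoop 2.x build in older Pig releases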

Cloudera installation error - can Cloudera Manager be used for a Hadoop single-node cluster on Ubuntu?

I am using Ubuntu 12.04 64-bit, and I installed and ran sample Hadoop programs on a single node successfully.
I am getting the following error while installing Cloudera Manager on my Ubuntu machine:
Refreshing repository metadata failed. See
/var/log/cloudera-manager-installer/2.refresh-repo.log for details.
Click OK to revert this installation.
I want to know whether Cloudera Manager can be installed for Hadoop's single-node cluster on Ubuntu. Please let me know whether it is possible to install Cloudera Manager on a single node, or whether I need to create multiple nodes to use Cloudera with my Hadoop.
Yes, Cloudera Manager can run on a single node.
This error occurs because CM cannot use apt-get install to fetch the packages. Which tutorial are you following?
However, you can manually add the Cloudera repo. See this thread.
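To see exactly which repository fetch failed, inspect the log file named in the error message and check whether apt itself can reach the configured repositories - for example:
cat /var/log/cloudera-manager-installer/2.refresh-repo.log   # log path taken from the installer's error message
sudo apt-get update                                          # shows whether the Cloudera repository entries are reachable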

Hadoop environment variables

I'm trying to debug some issues with a single node Hadoop cluster on my Mac. In all the setup docs it says to add:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
to remove this error:
Unable to load realm info from SCDynamicStore
This works, but it only seems to work for STDOUT. When I check my Hadoop logs directory, under "job_###/attempt_###/stderr" the error is still there:
2013-02-08 09:58:23.662 java[2772:1903] Unable to load realm info from SCDynamicStore
I'm having great difficulty loading RVM Rubies into the Hadoop environment to execute Ruby code with Hadoop streaming. STDOUT is printing that RVM is loaded and using the right Ruby/gemset but my STDERR logs:
env: ruby_noexec_wrapper: No such file or directory
Is there some way to find out what path Hadoop is actually using to execute the jobs, or if it's invoking some other environment here?
Further background:
I'm using Hadoop 1.1.1 installed via Homebrew. It's set up in a manner very similar to "INSTALLING HADOOP ON MAC OSX LION", and I'm debugging an implementation of wukong 3.0.0 as the wrapper for executing Hadoop jobs.
To answer my own question so others can find it:
I appeared to be loading RVM in my hadoop-env, but I must not have restarted the cluster after adding it. To make sure your Rubies and gemsets are loaded, add the standard RVM clause to hadoop-env.sh. Something like:
[[ -s "/Users/ScotterC/.rvm/scripts/rvm" ]] && source "/Users/ScotterC/.rvm/scripts/rvm"
And make sure to restart the cluster so it picks the change up. Oddly enough, without restarting, my logs would show that RVM was being loaded, but it clearly wasn't executing that Ruby and its respective gemfiles. After restarting, it worked.
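Concretely, the sequence looks something like this on a Hadoop 1.x install (a sketch; the conf/hadoop-env.sh location depends on how Homebrew laid out the install, and the RVM path is the one from above):
# append the RVM loader to hadoop-env.sh so the task JVMs inherit the same Ruby and gemset
echo '[[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"' >> $HADOOP_HOME/conf/hadoop-env.sh
# restart the single-node cluster so the daemons re-read hadoop-env.sh
stop-all.sh
start-all.sh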
