Hadoop issue with Sqoop installation

I have Hadoop (pseudo-distributed mode), Hive, Sqoop, and MySQL installed on my local machine.
But when I try to run Sqoop, it gives me the following error:
Error: /usr/lib/hadoop does not exist!
Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.
I then filled in the sqoop-env-template.sh file with all the information. Below is a snapshot of the sqoop-env-template.sh file.
Even after providing the Hadoop and Hive paths, I face the same error.
I've installed:
hadoop in /home/hduser/hadoop (version 1.0.3)
hive in /home/hduser/hive (version 0.11.0)
sqoop in /home/hduser/sqoop (version 1.4.4)
and the MySQL connector JAR, version 5.1.29.
Could anybody please throw some light on what is going wrong?

sqoop-env-template.sh is only a template, meaning it doesn't get sourced by the configuration script by itself. If you want a custom configuration that actually gets loaded, copy it to $SQOOP_HOME/conf/sqoop-env.sh.
Note: here is the relevant excerpt from bin/configure-sqoop for version 1.4.4:
SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}
if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then
. "${SQOOP_CONF_DIR}/sqoop-env.sh"
fi
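For example, a minimal sketch of the fix using the installation paths from the question (adjust the paths and your SQOOP_HOME location; the variable names follow those already present in the template):

# Copy the template so that configure-sqoop actually sources it.
cp $SQOOP_HOME/conf/sqoop-env-template.sh $SQOOP_HOME/conf/sqoop-env.sh

# Point Sqoop at the local Hadoop and Hive installations (paths taken from the question).
echo 'export HADOOP_COMMON_HOME=/home/hduser/hadoop' >> $SQOOP_HOME/conf/sqoop-env.sh
echo 'export HADOOP_MAPRED_HOME=/home/hduser/hadoop' >> $SQOOP_HOME/conf/sqoop-env.sh
echo 'export HIVE_HOME=/home/hduser/hive' >> $SQOOP_HOME/conf/sqoop-env.sh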

Related

Error: Could not find or load main class backup

I have set up HBase on top of Hadoop on my Linux system. I created a sample table in the HBase shell and it works fine. However, when I try to run the backup command I get the following error in the terminal:
> hbase backup create full hdfs://localhost:8020/data/backup
> Error: Could not find or load main class backup
OR
> hbase backup help
> Error: Could not find or load main class backup
I have installed Apache Hadoop 2.7.3 and HBase 2.1.4. This is Apache HBase, not a Cloudera or Hortonworks distribution.
I see in the docs (http://hbase.apache.org/book.html#_backup_and_restore_commands) that the hbase command can be used. Please help here.

Error when trying to execute kylin.sh start in HDP Sandbox 2.6

I installed Apache Kylin, following the official installation guide http://kylin.apache.org/docs/install/index.html, in HDP sandbox 2.6
When I run the script $KYLIN_HOME/bin/kylin.sh start, I get the error below:
What can I do to fix this error?
Thanks in advance
Check whether the Hive service is up in Ambari; when the Hive service is down, Kylin cannot find it and gives this error. Check your .bash_profile as well. Once those two issues are addressed, Kylin should be able to find the location of the Hive dependency.
Kylin uses the find-hive-dependency.sh script to set up the CLASSPATH. This script uses a Hive CLI command (I tested it with beeline) to query Hive environment variables and extract the CLASSPATH from them.
beeline connects to Hive using the properties in kylin_hive_conf.xml, but for some reason (probably the Hive version included in HDP 2.6) some of the loaded Hive properties cannot be set when the connection is established.
The Hive properties that cause the issue can be discarded when connecting to Hive to query the CLASSPATH, so, to fix this issue:
Edit $KYLIN_HOME/conf/kylin.properties and set kylin.source.hive.client=beeline
Open the find-hive-dependency.sh script, go to (approximately) line 34 and modify the line
hive_env=${beeline_shell} ${hive_conf_properties} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'
by removing ${hive_conf_properties}.
Check that the Hive dependencies have been configured by running find-hive-dependency.sh.
Now $KYLIN_HOME/bin/kylin.sh start should work.
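A consolidated sketch of those two edits from the shell, assuming the default Kylin layout in the sandbox and that kylin.source.hive.client is not already set (appending usually works because the last value of a property wins; otherwise edit the files by hand):

# 1. Use beeline as the Hive client for Kylin.
echo 'kylin.source.hive.client=beeline' >> $KYLIN_HOME/conf/kylin.properties

# 2. Drop ${hive_conf_properties} from the beeline invocation in find-hive-dependency.sh.
sed -i 's/ ${hive_conf_properties}//' $KYLIN_HOME/bin/find-hive-dependency.sh

# 3. Verify that the Hive dependencies resolve, then start Kylin again.
$KYLIN_HOME/bin/find-hive-dependency.sh
$KYLIN_HOME/bin/kylin.sh start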

Spark installed but no command 'hdfs' or 'hadoop' found

I am a new PySpark user.
I just downloaded and installed a Spark cluster ("spark-2.0.2-bin-hadoop2.7.tgz").
After installation I wanted to access the file system (upload local files to the cluster). But when I try to type hadoop or hdfs at the command line it says "command not found".
Do I need to install Hadoop/HDFS separately (I thought it was built into Spark)?
Thanks in advance.
You have to install Hadoop first to access HDFS.
Follow this: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Choose the latest version of Hadoop from the Apache site.
Once you are done with the Hadoop setup, download Spark from http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz and extract the files. Set JAVA_HOME and HADOOP_HOME in spark-env.sh.
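For instance, a minimal spark-env.sh sketch; the JDK and Hadoop paths below are only examples, so substitute your own locations:

# $SPARK_HOME/conf/spark-env.sh (copy spark-env.sh.template to spark-env.sh if it does not exist)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64     # path to your JDK
export HADOOP_HOME=/usr/local/hadoop                   # path to your Hadoop installation
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop         # lets Spark find the HDFS/YARN configuration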
You don't have hdfs or hadoop on your PATH, which is the reason you are getting the message "command not found".
If you run \yourpath\hadoop-2.7.1\bin\hdfs dfs -ls / it should work and show the root content.
But you can add your hadoop/bin (hdfs, hadoop, ...) commands to your PATH with something like this:
export PATH=$PATH:$HADOOP_HOME/bin
where HADOOP_HOME is your environment variable holding the path to the Hadoop installation folder (downloading and installing Hadoop is still required).
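Putting it together, a sketch for a shell profile, assuming Hadoop was unpacked to /usr/local/hadoop (adjust HADOOP_HOME to your installation):

# Make the Hadoop CLI tools visible on PATH.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Verify that the commands are now found and HDFS is reachable.
hadoop version
hdfs dfs -ls /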

Does HCatalog require installation before being used?

Can anyone please tell me whether HCatalog requires installation before it can be used? Or can it be used just as a jar file?
I have Cloudera running on a VM, and I can use HCatalog for my MR job, Pig, and Hive with no problem. I thought the same MR code would work on another Hadoop platform, but obviously that's not the case: an exception is thrown at HCatInputFormat.setInput(). When I use pig -useHCatalog, I was told the usage was wrong, meaning it didn't recognize -useHCatalog as a parameter.
I hadn't thought about this before, since I had only been using HCatalog on Cloudera...
Yes, you need to install and start the HCatalog server. HCatalog should come with the latest Hive tar package.
Check the Apache Hive documentation for details.
Basically you need to:
Set up a MySQL database for HCatalog
Run the server install script:
share/hcatalog/scripts/hcat_server_install.sh -r root -d dbroot -h hadoop_home -p portnum
Start the HCatalog server
export HIVE_HOME=hive_home
$HIVE_HOME/sbin/hcat_server.sh start
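For illustration only, here is the same sequence with hypothetical values filled in; the install root, dbroot, Hadoop path, and port below are placeholders, so substitute the values that apply to your cluster (see the Hive/HCatalog install documentation for what each flag expects):

# Run from the root of the extracted HCatalog/Hive package (paths are hypothetical examples).
share/hcatalog/scripts/hcat_server_install.sh -r /usr/local/hcatalog -d /usr/share/java -h /home/hduser/hadoop -p 9083

# Point HIVE_HOME at the install root used above, then start the server.
export HIVE_HOME=/usr/local/hcatalog
$HIVE_HOME/sbin/hcat_server.sh start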
As pointed out, you do not need to install HCatalog separately if you are working with Hive 0.12 or a later version.

run pig 0.7.0 error : ERROR 2998: Unhandled internal error

I have to connect Pig to a Hadoop that is slightly modified from Hadoop 0.20.0. I chose Pig 0.7.0, and set PIG_CLASSPATH with
export PIG_CLASSPATH=$HADOOP_HOME/conf
When I run pig, an error is reported like this:
ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
So I copied hadoop-core.jar from $HADOOP_HOME to overwrite hadoop20.jar in $PIG_HOME/lib, then ran "ant". Now I can run pig, but when I use dump or store, I get another error:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
at org.apache.pig.builtin.BinStorage.setStoreLocation(BinStorage.java:369)
...
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:357)
================================================================================
Has anyone encountered this error, or is my way of compiling wrong?
Thanks.
There is a section about this issue in the Pig FAQ which should give you a good idea of what's wrong. Here is the outline taken from that page:
This usually happens when you are connecting to a Hadoop cluster other than the standard Apache Hadoop 20.2 release. Pig bundles the standard Hadoop 20.2 jars in its release. If you want to connect to another version of Hadoop cluster, you need to replace the bundled Hadoop 20.2 jars with compatible jars. You can try the following (a consolidated sketch follows the list):
do "ant"
copy hadoop jars from your hadoop installation to overwrite ivy/lib/Pig/hadoop-core-0.20.2.jar and ivy/lib/Pig/hadoop-test-0.20.2.jar
do "ant" again
cp pig.jar to overwrite pig-*-core.jar
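A consolidated sketch of those steps, assuming the question's Pig 0.7.0 tree; the Hadoop jar wildcards and the pig-0.7.0-core.jar target below are illustrative and may differ on your side:

cd $PIG_HOME
ant                                        # initial build
# Overwrite the bundled Hadoop jars with the ones from your cluster.
cp $HADOOP_HOME/hadoop-core-*.jar ivy/lib/Pig/hadoop-core-0.20.2.jar
cp $HADOOP_HOME/hadoop-test-*.jar ivy/lib/Pig/hadoop-test-0.20.2.jar
ant                                        # rebuild against your Hadoop jars
cp pig.jar pig-0.7.0-core.jar              # overwrite the shipped core jar with the rebuilt one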
Some other tricks are also possible. You can use "bin/pig -secretDebugCmd" to inspect the command line of Pig and make sure you are using the right version of hadoop.
As pointed out in this FAQ section, if nothing works I would advise just upgrading to a recent version of Pig after 0.9.1; Pig 0.7 is a bit old.
The Pig (core) jar has a bundled Hadoop dependency, which may differ from the version you want to use. If you have an old Pig version (< 0.9) you have the option to build a jar without Hadoop:
cd $PIG_HOME
ant jar-withouthadoop
cp $PIG_HOME/build/pig-x.x.x-dev-withouthadoop.jar $PIG_HOME
Then start Pig:
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf:$PIG_HOME/pig-x.x.x-dev-withouthadoop.jar; ./pig
Newer Pig versions contain a prebuilt withouthadoop version (see this ticket), so you can skip the building process. Furthermore, when you run Pig it will pick up the withouthadoop jar from PIG_HOME rather than the bundled version, so you don't need to add withouthadoop.jar to the PIG_CLASSPATH either (provided that you run Pig from $PIG_HOME/bin).
Back to your question:
Hadoop 0.20 and its modified variant (0.20-append?) can work even with the latest Pig distribution (0.11.1):
You just need to do the following:
unpack Pig 0.11.1
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf; ./pig
If you still get "Failed to create DataStorage", it's worth starting Pig with -secretDebugCmd as Charles Menguy suggested, so that you can see whether Pig picks up the right Hadoop version, etc.
Did you remember to run start-all.sh from /usr/local/bin? I ran into the same problem and I basically retraced my steps in configuring Hadoop itself. I am able to use Pig now.
