Sqoop import is showing error - Jobtracker is not yet running - hadoop

I am trying to do a Sqoop import, and it fails with the error "JobTracker is not running".
But when I try sqoop eval selecting a few rows, it works; it is only the import that gives the error. I have included snapshots of both the eval and import commands I tried.
Hadoop commands (hadoop fs -ls, hadoop fs -put) work.
I started start-all.sh, and afterwards jps shows all the daemons running. After a few minutes, though, all the daemons stop.

The sqoop eval tool only runs the query against the RDBMS and returns the result set; Hadoop does not come into the picture. The sqoop import tool, on the other hand, reads the data from the RDBMS and loads it into HDFS, so it needs a working cluster, and here Sqoop is not able to connect to HDFS. Restart Hadoop and check the JobTracker and NameNode logs. Also check whether the NameNode and DataNode storage directories mentioned in hdfs-site.xml are available; otherwise, point them to a new directory.
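As a sketch of those checks (log file names and paths are assumptions for a classic Hadoop 1.x layout; adjust them for your installation):

```shell
# Restart Hadoop and confirm the daemons stay up
stop-all.sh
start-all.sh
jps    # should keep listing NameNode, DataNode, JobTracker, TaskTracker

# If a daemon dies after a few minutes, its log usually says why
tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
tail -n 50 $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log

# Verify the storage directories configured in hdfs-site.xml exist
grep -A1 'dfs.name.dir\|dfs.data.dir' $HADOOP_HOME/conf/hdfs-site.xml
```

A common cause of daemons dying shortly after startup is a storage directory that was deleted (e.g. under /tmp after a reboot); the log tail will usually show it.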

Related

Sqoop command not found when running through Oozie

When I run the Sqoop script from the CLI, it runs fine without any issue. But when I run it using Oozie, it fails with "Sqoop command not found". It seems Sqoop is not installed on the other data nodes. So, to run a Sqoop script using Oozie, must Sqoop be installed on all data nodes, or is there an alternative? Currently we have one master and 2 data nodes.

Sqoop failing when importing as avro in AWS EMR

I'm trying to perform a Sqoop import in Amazon EMR (Hadoop 2.8.5, Sqoop 1.4.7). The import goes fine when no Avro option (--as-avrodatafile) is specified, but once it is set the job fails with:
19/10/29 21:31:35 INFO mapreduce.Job: Task Id : attempt_1572305702067_0017_m_000000_1, Status : FAILED
Error: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V
Using the option -D mapreduce.job.user.classpath.first=true doesn't work.
Running locally (on my machine), I found that copying the avro-1.8.1.jar shipped with Sqoop into the Hadoop lib folder works, but in the EMR cluster I only have access to the master node, so doing the same there doesn't help, because it isn't the master node that runs the jobs.
Did anyone face this problem?
The solution I found was to connect to every node in the cluster (I thought I only had access to the master node, but I was wrong; in EMR we have access to all nodes) and replace the Avro jar included with Hadoop with the Avro jar that comes with Sqoop. It's not an elegant solution, but it works.
[UPDATE]
It turned out that the option -D mapreduce.job.user.classpath.first=true wasn't working because I was using s3a as the target dir, while Amazon says we should use s3. As soon as I started using s3, Sqoop performed the import correctly, so there is no need to replace any files on the nodes. Using s3a can lead to strange errors under EMR due to Amazon's own configuration, so don't use it. Even in terms of performance, s3 is better than s3a on EMR, since the s3 implementation is Amazon's own.
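For reference, an import along the lines of what ended up working might look like the following sketch (the JDBC connection details, table, and bucket names are hypothetical; the point is combining --as-avrodatafile with an s3:// target):

```shell
sqoop import \
  -D mapreduce.job.user.classpath.first=true \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table mytable \
  --as-avrodatafile \
  --target-dir s3://my-bucket/mytable/   # s3://, not s3a://, on EMR
```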

How to connect Sqoop to multiple hadoop clusters

Is there any way to have Sqoop connect to different Hadoop clusters, so that multiple Sqoop jobs can be created to export data to multiple Hadoop clusters?
to export data to multiple hadoop clusters
If data is going into Hadoop, that's technically a Sqoop import.
It's not clear how you currently manage different clusters from one machine, but you would need the conf folder of every environment available for Sqoop to read.
The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop. If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_HOME environment variable.
For example:
$ HADOOP_HOME=/path/to/some/hadoop sqoop import --arguments...
or:
$ export HADOOP_HOME=/some/path/to/hadoop
$ sqoop import --arguments...
If $HADOOP_HOME is not set, Sqoop will use the default installation location for Cloudera’s Distribution for Hadoop, /usr/lib/hadoop.
The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set.
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_controlling_the_hadoop_installation
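Building on the quoted docs, one way to target two clusters from a single Sqoop installation is to switch the configuration directory per job. The conf paths and connection details below are hypothetical:

```shell
# Job against cluster A
HADOOP_CONF_DIR=/etc/hadoop/conf.cluster-a \
  sqoop import --connect jdbc:mysql://dbhost/mydb --table t1 --username u -P

# Same job against cluster B
HADOOP_CONF_DIR=/etc/hadoop/conf.cluster-b \
  sqoop import --connect jdbc:mysql://dbhost/mydb --table t1 --username u -P
```

Each conf directory would hold that cluster's core-site.xml, hdfs-site.xml, and related files.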
Depending on how you set up Hadoop: Hortonworks ships only Sqoop 1, while Cloudera (and maybe MapR) ship Sqoop 2, and those instructions are probably different since the Sqoop 2 architecture is different.

Can sqoop run without hadoop?

Just wondering, can Sqoop run without a Hadoop cluster, sort of in a standalone mode? Also, has anyone tried to run Sqoop on Spark? Please share your experiences.
To run Sqoop commands (both Sqoop 1 and Sqoop 2), Hadoop is a mandatory prerequisite. You cannot run Sqoop commands without the Hadoop libraries.
Sqoop works in local mode too, so it is not a requirement that the Hadoop daemons be running. To run Sqoop in local mode:
sqoop [tool-name] -fs local -jt local [tool-arguments]
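For example, a hypothetical local-mode import (no daemons running, though the Hadoop libraries must still be on the classpath; the database details are placeholders) might look like:

```shell
sqoop import -fs local -jt local \
  --connect jdbc:mysql://localhost:3306/testdb \
  --username sqoopuser -P \
  --table employees \
  --target-dir /tmp/employees_local
```

With -fs local the target dir is a path on the local filesystem rather than HDFS.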
Sqoop on Spark is still In-Progress. See SQOOP-1532

FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed

I am using Ubuntu 12.04, hadoop-0.23.5, hive-0.9.0.
I specified my metastore_db separately, in some other place ($HIVE_HOME/my_db/metastore_db), in hive-site.xml.
Hadoop runs fine; jps shows ResourceManager, NameNode, DataNode, NodeManager, and SecondaryNameNode.
Hive starts perfectly too: metastore_db and derby.log are created, and all Hive commands run successfully; I can create databases, tables, etc. But a few days later, when I run show databases or show tables, I get the error below:
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I had this problem too, and the accepted answer did not help me, so I will add my solution here for others:
My problem was that I had a single machine with a pseudo-distributed setup and Hive installed. It was working fine with localhost as the hostname. However, when we decided to add more machines to the cluster, we also decided to give the machines proper names ("machine01", "machine02", etc.).
I changed all the Hadoop conf/*-site.xml files and the hive-site.xml file too, but still had the error. After exhaustive research I realized that Hive was picking up the URIs not from the *-site.xml files but from the metastore tables in MySQL. All the Hive table metadata is saved in two tables, SDS and DBS. Upon changing the DB_LOCATION_URI column in DBS and the LOCATION column in SDS to point to the latest NameNode URI, I was back in business.
Hope this helps others.
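A sketch of that metastore fix, assuming a MySQL-backed metastore database named hive (back up the metastore before touching it; the old and new URIs below are placeholders for your actual hostnames):

```shell
mysql hive <<'SQL'
-- Point database locations at the new NameNode
UPDATE DBS SET DB_LOCATION_URI =
  REPLACE(DB_LOCATION_URI, 'hdfs://localhost:54310', 'hdfs://machine01:54310');
-- Point table/partition storage descriptors at the new NameNode
UPDATE SDS SET LOCATION =
  REPLACE(LOCATION, 'hdfs://localhost:54310', 'hdfs://machine01:54310');
SQL
```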
Possible reasons for this:
If you changed your Hadoop/Hive version, you may be pointing at a previous Hadoop version (which has fs.default.name=hdfs://localhost:54310 in its core-site.xml) in your hive-0.9.0/conf/hive-env.sh file.
$HADOOP_HOME may point to some other location.
The specified version of Hadoop is not working.
Your NameNode may be in safe mode; run bin/hdfs dfsadmin -safemode leave or bin/hadoop dfsadmin -safemode leave.
In case of a fresh installation, the above problem can be the effect of a NameNode issue. Try formatting the NameNode using the command:
hadoop namenode -format
1. Take your NameNode out of safe mode. Try the command below:
hadoop dfsadmin -safemode leave
2. Restart your Hadoop daemons:
sudo service hadoop-master stop
sudo service hadoop-master start