Sqoop eval throwing error when I tried to check the connection due to java.io.IOException: Could not load jar into JVM - hadoop

I have tried to run the Sqoop eval script through AWS EMR CLI for Teradata connection but found the error
Error loading ManagerFactory information from file /usr/lib/sqoop/conf/managers.d/td_connector.txt: java.io.IOException: Could not load jar $SQOOP_HOME/lib/teradata-connector-1.6.5.jar into JVM. (Could not find class org.apache.sqoop.teradata.TeradataConnManager.)
Steps I have followed:
login to EMR version emr-6.2.0 with the configuration of hadoop 3 and sqoop 1.4.7 through SSH
Downloaded the Teradata Hadoop connector 3.x from teradata downloads
moved the teradata hadoop connector to $SQOOP_HOME/lib and installed.
created the text file td_connect at /usr/lib/sqoop/conf/managers.d/ and included the text org.apache.sqoop.teradata.TeradataConnManager=$SQOOP_HOME/lib/teradata-connector-1.6.5.jar
ran the script
sqoop eval --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://host/database= --username username --password password --query 'select top 5 * from table'
Could you please help to identify the issue

Related

Connecting HiveServer2 from pyspark

I am stuck at point as , how to use pyspark to fetch data from hive server using jdbc.
I am Trying to connect to HiveServer2 running on my local machine from pyspark using jdbc. All components HDFS,pyspark,HiveServer2 are on same machine.
Following is the code i am using to connect :
connProps={ "username" : 'hive',"password" : '',"driver" : "org.apache.hive.jdbc.HiveDriver"}
sqlContext.read.jdbc(url='jdbc:hive2://127.0.0.1:10000/default',table='pokes',properties=connProps)
dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:hive://localhost:10000/default").option("driver", "org.apache.hive.jdbc.HiveDriver").option("dbtable", "pokes").option("user", "hive").option("password", "").load()
both methods used above are giving me same error as below:
org.apache.spark.sql.AnalysisException: java.lang.RuntimeException:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection
to the given database. JDBC url =
jdbc:derby:;databaseName=metastore_db;create=true, username = APP.
Terminating connection pool (set lazyInit to true if you expect to
start your database after your app).
ERROR XSDB6: Another instance of Derby may have already booted the database /home///jupyter-notebooks/metastore_db
metastore_db is located at same directory where my jupyter notebooks are created. but hive-site.xml is having different metastore location.
I have already checked reffering to other questions about same error saying other spark-shell or such process is running,but its not. Even if i try following command when HiveServer2 and HDFS are down i am getting same error
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
I am able to connect to hives using java program using jdbc. Am I missing something here? Please help.Thanks in advance.
Spark should not use JDBC to connect to Hive.
It reads from the metastore, and skips HiveServer2
However, Another instance of Derby may have already booted the database means that you're running Spark from another session, such as another Jupyter kernel that's still running. Try setting a different metastore location, or work on setting up a remote Hive metastore using a local Mysql or Postgres database and edit $SPARK_HOME/conf/hive-site.xml with that information.
From SparkSQL - Hive tables
spark = SparkSession \
.builder \
.appName("Python Spark SQL Hive integration example") \
.config("spark.sql.warehouse.dir", warehouse_location) \
.enableHiveSupport() \
.getOrCreate()
# spark is an existing SparkSession
spark.sql("CREATE TABLE...")

Sqoop create hive table ERROR - Encountered IOException running create table job

I am running sqoop on a Centos7 Machine that has hadoop/map reduce and hive already installed. I read from a tutorial that when importing data from a RDBMS (SQL Server in my case) to HDFS I need to run the next commands :
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA
Everything works perfectly with this step. The next step is creating a hive table that has the same structure as the RDBMS (SQL Server in my case) and using a sqoop command :
sqoop create-hive-table --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA --hivetable hivetablename --fields-terminated-by ','
However, whenever I run the above command I get the next error :
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang
/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
18/04/01 19:37:52 ERROR ql.Driver: FAILED: Execution Error, return code 1
from org.apache.hadoop.hive.ql.exec.DDLTask.
com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang
/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
18/04/01 19:37:52 INFO ql.Driver: Completed executing
command(queryId=hadoop_20180401193745_1f3cf07d-ca16-40dd-
8f8d-1e426ecd5860); Time taken: 0.212 seconds
18/04/01 19:37:52 INFO conf.HiveConf: Using the default value passed in
for log id: 0813b5c9-f374-4920-b8c6-b8541449a6eb
18/04/01 19:37:52 INFO session.SessionState: Resetting thread name to
main
18/04/01 19:37:52 INFO conf.HiveConf: Using the default value passed in
for log id: 0813b5c9-f374-4920-b8c6-b8541449a6eb
18/04/01 19:37:52 INFO session.SessionState: Deleted directory: /tmp/hive
/hadoop/0813b5c9-f374-4920-b8c6-b8541449a6eb on fs with scheme hdfs
18/04/01 19:37:52 INFO session.SessionState: Deleted directory: /tmp/hive
/java/hadoop/0813b5c9-f374-4920-b8c6-b8541449a6eb on fs with scheme file
18/04/01 19:37:52 ERROR tool.CreateHiveTableTool: Encountered IOException
running create table job: java.io.IOException: Hive CliDriver exited with
status=1
I am not a java expert but I would like to know if you have any idea of this result?
I've faced the same issue. It seems that there are some compatibility issues between my versions of sqoop (1.4.7) and hive (2.3.4).
The problem raises from the version of the jackson-* jar files within $SQOOP_HOME/lib: some of them are too old for hive because we need versions older than 2.6.
The solution that I found was to replace the following files in $SQOOP_HOME/lib by their counterpart in $HIVE_HOME/lib:
jackson-core-*.jar
jackson-databind-*.jar
jackson-annotations-*.jar
They are all from versions 2.6+ and this seems to work. Not sure it's good practice though.
I was facing the same issue and I have downgraded my hive to 1.2.2 and it works. That will solve the issue.
But not really sure if you want to use Sqoop with only hive2.
Instead of writing two different statements, you can put the whole thing in one statement, which will fetch the data from sql server and then create a HIVE table too.
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA --hive-import --hive-overwrite --hive-table hivetablename --fields-terminated-by ',' --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'
For this please check the jackson-core, jackson-databind and jackson-annotation jar. The jar should be of the latest version. Usually it comes due to the older version. Place these jar inside the hive lib and sqoop lib. Along with please check the libthrift jar, both in hive and hbase it should be same and copy the same in sqoop lib

Sqoop job through oozie

I have created a sqoop job called TeamMemsImportJob which basically pulls data from sql server into hive.
I can execute the sqoop job through the unix command line by running the following command:
sqoop job –exec TeamMemsImportJob
If I create an oozie job with the actual scoop import command in it, it runs through fine.
However if I create the oozie job and run the sqoop job through it, I get the following error:
oozie job -config TeamMemsImportJob.properties -run
>>> Invoking Sqoop command line now >>>
4273 [main] WARN org.apache.sqoop.tool.SqoopTool – $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
4329 [main] INFO org.apache.sqoop.Sqoop – Running Sqoop version: 1.4.4.2.1.1.0-385
5172 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage – Cannot restore job: TeamMemsImportJob
5172 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage – (No such job)
5172 [main] ERROR org.apache.sqoop.tool.JobTool – I/O error performing job operation: java.io.IOException: Cannot restore missing job TeamMemsImportJob
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
it looks as if it cannot find the job. However I can see the job as below
[root#sandbox ~]# sqoop job –list
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/06/25 08:12:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.1.0-385
Available jobs:
TeamMemsImportJob
How do I resolve this?
You have to use the --meta-connect flag while creating a job to create a custom Sqoop metastore database so that Oozie can have access.
sqoop \
job \
--meta-connect \
"jdbc:hsqldb:file:/on/server/not/hdfs/sqoop-metastore/sqoop-meta.db;shutdown=true" \
--create \
jobName \
-- \
import \
--connect jdbc:oracle:thin:#server:port:sid \
--username username \
--password-file /path/on/hdfs/server.password \
--table TABLE \
--incremental append \
--check-column ID \
--last-value "0" \
--target-dir /path/on/hdfs/TABLE
When you need to execute jobs, you can do it from Oozie the regular way, but make sure to include --meta-connect to indicate where the job is stored.
If we see the log we can see that it cannot find the stored job.
Since you are using the native hsql db.
To make Sqoop jobs available across other systems you should configure other database for example mysql which can be accessed by all systems.
From documentation
Running sqoop-metastore launches a shared HSQLDB database instance on
the current machine. Clients can connect to this metastore and create
jobs which can be shared between users for execution
The location of the metastore’s files on disk is controlled by the
sqoop.metastore.server.location property in conf/sqoop-site.xml. This
should point to a directory on the local filesystem.
The metastore is available over TCP/IP. The port is controlled by the
sqoop.metastore.server.port configuration parameter, and defaults to
16000.
Clients should connect to the metastore by specifying
sqoop.metastore.client.autoconnect.url or --meta-connect with the
value jdbc:hsqldb:hsql://:/sqoop. For example,
jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop.
This metastore may be hosted on a machine within the Hadoop cluster,
or elsewhere on the network.
Can you check if that db is accessible from other systems.

Issue with sqoop import from mysql to hbase

I am trying to import data from mysql to hbase using sqoop:
sqoop import --connect jdbc:mysql://<hostname>:3306/test --username USERNAME -P --table testtable --direct --hbase-table testtable --column-family info --hbase-row-key id --hbase-create-table
The process runs smoothly, without any error, but the data goes to hdfs and not to hbase.
Here is my setup:
HBase and Hadoop is installed in distributed mode in my three server cluster. Namenode and HBase Master being one one server. Datanodes and Region server lies in two other servers. Sqoop is installed in NameNode server only.
I am using Hadoop version 0.20.2-cdh3u3, hbase version 0.90.6-cdh3u4 and sqoop version 1.3.0-cdh3u3.
Any suggestions where I am doing wrong?
Sqoop's direct connectors usually do not support HBase and this is definitely the case for MySQL direct connector. You should drop the --direct option if you need import data into HBase.
Here is an example of importing data from Mysql to HBase
http://souravgulati.webs.com/apps/forums/topics/show/8680714-sqoop-import-data-from-mysql-to-hbase

Error in sqoop import query

Scenario:
I am trying for importing data from MS SQL Server to HDFS. But I am getting certain errors as:
Errors:
hadoop#ubuntu:~/sqoop-1.1.0$ bin/sqoop import --connect 'jdbc:sqlserver://localhost;username=abcd;password=12345;database=HadoopTest' --table PersonInfo
11/12/09 18:08:15 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.1
java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.1
at com.cloudera.sqoop.shims.ShimLoader.loadShim(ShimLoader.java:190)
at com.cloudera.sqoop.shims.ShimLoader.getHadoopShim(ShimLoader.java:109)
at com.cloudera.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:173)
at com.cloudera.sqoop.tool.ImportTool.init(ImportTool.java:81)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:411)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:170)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:196)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:205)
Question:
I have configured Sqoop successfully and then what could be the problem? I am trying to connect to database by entering IP address but there is also the same problem.
How can I remove these error? Pls suggest me solution.
Thanks.
Sqoop is now an incubator project in Apache. There is no reason Sqoop should only run with CDH and not Apache Hadoop.
The Sqoop documentation says Sqoop is compatible with Apache Hadoop 0.21 and Cloudera's Distribution of Hadoop version 3.. So, I think using the the correct version of Apache will also solve the problem.
SQOOP-82 is more than an year old and there had been changes after that.
FYI, Sqoop was made part of the Hadoop 0.21 branch and has been removed from Hadoop after moving it to Apache Incubator.
Please check this issue:
Sqoop does not run with Apache Hadoop 0.20.2. The only supported platform is CDH 3 beta 2. It requires features of MapReduce not available in the Apache 0.20.2 release of Hadoop. You should upgrade to CDH 3 beta 2 if you want to run Sqoop 1.0.0.
In your sqoop import command you are missing the driver value using --driver
May be this will help.
I think you should try this one, it may solve your problem:
Add the port number of the sqlserver. For port number check with your my.conf(/etc/mysql/my.conf) file.
Try this command with port number and schema:
sqoop import --connect jdbc:mysql://localhost:3306/mydb -username root -password password --table emp --m 1

Resources