Sqoop import cannot locate needed JDBC file - hadoop

Ok, someone has already asked this question once, but it seems that didn't help, so here is my question.
I've got Hadoop 2.5.1 installed on my CentOS 7 machine, set up to run in pseudo-distributed mode. I've run a few MapReduce sample jobs, so I assume all the configuration is fine.
I've downloaded Sqoop 1.4.5, installed a MySQL database (MariaDB), and created the needed table.
Now I'm running the following command:
bin/sqoop export --connect jdbc:mysql://localhost/sqoopdb \
--table sqooptable --export-dir /user/dennis \
--fields-terminated-by '\t' --username root --password ***
It returns the following error message:
14/11/12 06:11:54 ERROR tool.ExportTool: Encountered IOException
running export job: java.io.FileNotFoundException: File does not
exist:
hdfs://localhost:9000/home/dennis/Sqoop/lib/mysql-connector-java-5.1.34-bin.jar
The file mentioned in the error does exist in the local file system; moreover, I've given it chmod 777, just so that everyone can access it.
Any ideas, anyone, please?
The way I understand it, Sqoop looks for the mentioned file somewhere in HDFS, whereas it is located in the local file system.

I've made it work. It is definitely the worst solution possible, but no one had offered me anything better: I created the folder structure in HDFS and copied the bloody JAR there. Now you can judge me :) The same thing is written on my blog.
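For anyone reproducing that workaround, a minimal sketch (the HDFS path comes from the error message; the local location of the driver jar is assumed to be the Sqoop lib directory of the same name):
# create the directory structure Sqoop expects inside HDFS
hdfs dfs -mkdir -p /home/dennis/Sqoop/lib
# copy the local driver jar into it
hdfs dfs -put /home/dennis/Sqoop/lib/mysql-connector-java-5.1.34-bin.jar /home/dennis/Sqoop/lib/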

Related

Can SQOOP work with a custom libpath?

I am trying to get some table data imported from PostgreSQL to HDFS using Sqoop. Now due to licensing constraints, Sqoop does not come packaged with JDBC drivers for all JDBC compliant databases. PostgreSQL is one of them. In order to interact with this database, Sqoop needs the relevant JDBC driver to be installed into a preset classpath (typically $SQOOP_HOME/lib).
In my case, the Hadoop administrator does not provide me write access to this predefined classpath. Is there any alternate way to instruct Sqoop client to look into some path (say, my home directory) instead of or in addition to the preset location?
I looked into the official Apache documentation and searched the internet, but could not find any answer. Could anyone please help?
Thanks!
I got this working yesterday. Below are the steps to follow.
Download the appropriate JDBC driver from here
Put the jar file under a directory of your choice. I chose
the Hadoop cluster user's home directory, i.e. /home/myuser.
export HADOOP_CLASSPATH="/home/myuser/postgresql-9.4.1209.jar"
(replace /home/myuser/postgresql-9.4.1209.jar with your path and jar file name)
To perform Sqoop import you may use the below command.
sqoop import \
--connect 'jdbc:postgresql://<postgres_server_url>:<postgres_port>/<db_name>' \
--username <db_user_name> \
--password <db_user_password> \
--table <db_table_name> \
--warehouse-dir <existing_empty_hdfs_directory>
To perform Sqoop export you may use the below command.
sqoop export \
--connect 'jdbc:postgresql://<postgres_server_url>:<postgres_port>/<db_name>' \
--username <db_user_name> \
--password <db_user_password> \
--table <db_table_name> \
--export-dir <existing_hdfs_path_containing_export_data>
As per the Sqoop docs,
-libjars <comma separated list of jars> - specify comma-separated jar files to include in the classpath.
Make sure you use -libjars as the first argument in the command.
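For example, a sketch of where -libjars goes (right after the tool name, before the tool-specific options; the driver path is the one used earlier in this answer, and the placeholders are as above):
sqoop import -libjars /home/myuser/postgresql-9.4.1209.jar \
--connect 'jdbc:postgresql://<postgres_server_url>:<postgres_port>/<db_name>' \
--username <db_user_name> --password <db_user_password> --table <db_table_name>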
EDIT:
According to the docs,
The -files, -libjars, and -archives arguments are not typically used with Sqoop, but they are included as part of Hadoop’s internal argument-parsing system.
So, JDBC client jars need to be put at $SQOOP_HOME/lib.
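In other words (assuming a standard Sqoop layout), something along these lines:
# copy the driver into Sqoop's preset classpath
cp /home/myuser/postgresql-9.4.1209.jar $SQOOP_HOME/lib/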
I recently experienced an issue with this -libjars option: it doesn't work reliably. The problem is probably inherited from the hadoop jar command-line option. A possible alternative is to specify your extra jars using the HADOOP_CLASSPATH environment variable.
You have to export the path to your driver jar file:
export HADOOP_CLASSPATH=<path_to_driver_jar>.jar
After this, Sqoop can correctly pick up the jar file you specified; the -libjars option doesn't. I noticed this in Sqoop version 1.4.6.

Datastax Enterprise Sqoop demo, got exceptions

I'm trying to run the Sqoop demo from DataStax Enterprise 4.8. I set up an Analytics cluster of 4 nodes, set up MySQL on another node, and populated the data as in the demo example. I followed all the steps of the demo, and everything seemed to work fine until the point where I actually run the Sqoop data migration command. All DBs are created correctly, and the cluster is running fine (I can see it with nodetool status and with OpsCenter), but when I run the sqoop command, I get an exception:
host# /bin/dse sqoop --options-file /usr/share/dse/demos/sqoop/import.options
/usr/share/dse/bin/dse.in.sh: line 4: /bin/dse-client-tool: No such file or directory
Unable to start sqoop: jobtracker not found
The import.options file:
cql-import
--table
npa_nxx
--cassandra-keyspace
npa_nxx
--cassandra-table
npa_nxx_data
--cassandra-column-mapping
npa:npa,nxx:nxx,latitude:lat,longitude:lon,state:state,city:city
--connect
jdbc:mysql://10.xxx.xxx.xxx/npa_nxx_demo
--username
root
--password
xxxxx
--cassandra-host
10.xxx.xxx.xxx,10.xxx.xxx.xxx
Does anyone have an idea why I get this error? I reinstalled DSE and still got the same... Thanks.
I found the reason: you need to create a symlink to dse-client-tool in the /bin directory:
# ln -s /usr/share/dse/bin/dse-client-tool /bin/dse-client-tool
Then it works. I'm not sure why the link was not created during the installation...
Start DSE as an Analytics node.
Edit /etc/default/dse, set HADOOP_ENABLED=1, and start the DSE service, or start it directly with:
bin/dse cassandra -t
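For a packaged install, a minimal sketch of what that edit looks like (the file is sourced as shell; the service name is the standard one for DSE packages):
# in /etc/default/dse
HADOOP_ENABLED=1
# then restart the service
sudo service dse restart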

hadoop sqoop error while importing data using options file

I am new to Hadoop, and while practicing Sqoop I got this error message. Here is what I did:
I created an import.txt file in which I put
import --connect jdbc:mysql://localhost/hadoopdb --username hadoop -P
and placed this file on HDFS.
While importing, I passed this file to the Sqoop tool using the --options-file option, so the final command I entered at the command prompt is as follows:
sqoop --options-file /user/cloudera/import.txt --table employee
After hitting the Enter key, I got the following error message:
sqoop --options-file /user/cloudera/import.txt --table employee
13/10/16 13:43:12 ERROR sqoop.Sqoop: Error while expanding arguments
java.lang.Exception: Unable to read options file: /user/cloudera/import.txt
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:102)
at com.cloudera.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:33)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:201)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.io.FileNotFoundException: /user/cloudera/import.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at java.io.FileReader.<init>(FileReader.java:55)
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:70)
... 4 more
Unable to read options file: /user/cloudera/import.txt
Can anyone tell me why this error occurs?
Thanks in advance.
The --options-file path should be a local filesystem path; don't use an HDFS path.
sqoop --options-file /home/cloudera/import.txt --table employee
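If the file currently exists only in HDFS, a quick sketch to copy it to the local filesystem first (paths taken from the question):
hadoop fs -get /user/cloudera/import.txt /home/cloudera/import.txt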
I got the same issue. I solved it using the following approach.
In the options file you have to specify tools, commands, and their arguments line by line.
In your case, your options file import.txt should be created like this:
$cat > import.txt
import
--connect
jdbc:mysql://localhost/hadoopdb
--username
hadoop
-P
After you have created the options file, you can use it to import the table:
sqoop --options-file /user/cloudera/import.txt --table employee
Hope this works. The key is that you have to specify tools and arguments line by line.
For more on this, refer to this link:
Sqoop User Guide by Apache.org
Correct me if I am wrong.
If you are calling Sqoop from Oozie and are facing the same issue ("Unable to read options file"):
You need to place the options file inside the workflow location and specify the file in the Sqoop action's files, and you also need to change the permissions on that file to chmod 674 (when the workflow runs in Oozie, it runs as the sqoop user, so changing the permissions is mandatory).
This will resolve the error.
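A rough command-line sketch of those steps (the workflow path /user/myuser/apps/sqoop-wf is hypothetical; use wherever your workflow.xml lives in HDFS, and reference import.txt from the Sqoop action's <file> element):
hadoop fs -put import.txt /user/myuser/apps/sqoop-wf/
hadoop fs -chmod 674 /user/myuser/apps/sqoop-wf/import.txt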
I put the options file in a local directory, and it worked.
Also, each argument and its value should be on separate lines,
like
--where
'sal > 5000'
and not like
--where 'sal > 5000'
[cloudera@quickstart sqoop]$ sqoop --options-file
/home/cloudera/Desktop/SqoopOptions.txt --table departments --username root --
password cloudera -m 1 --target-dir jan1301
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
No such sqoop tool: import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera. See 'sqoop help'.
I received the above error when I defined the SqoopOptions.txt file data on a single line.
The issue was resolved when I defined each parameter and value on a separate line, as below.
If you are trying this on a single-node cluster, the options file can be placed in the local file system.
Your options file should look like this:
import
--connect
"jdbc:mysql://localhost:3306/sakila"
--username
root
-P
Each parameter and its value should go on a separate line.
Once you have saved the options file, use the command below:
sqoop --options-file "your option file location" --table abc
Hope this works; it works perfectly for me.
Thanks,
Suresh.

Hive failed to create /user/hive/warehouse

I just got started with Apache Hive, and I am using my local Ubuntu 12.04 box, with Hive 0.10.0 and Hadoop 1.1.2.
Following the official "Getting Started" guide on the Apache website, I am now stuck at the Hadoop command from the guide that creates the Hive warehouse directory:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
The error was mkdir: failed to create /user/hive/warehouse.
Does Hive require Hadoop in a specific mode? I know I didn't have to do much to my Hadoop installation other than update JAVA_HOME, so it is in standalone mode. I am sure Hadoop itself is working, since I ran the Pi example that comes with the Hadoop installation.
Also, the other command, to create /tmp, shows the /tmp directory already exists, so it wasn't recreated, and /bin/hadoop fs -ls lists the current directory.
So, how can I get around this?
Almost all examples in the documentation have this command wrong. Just like in Unix, you will need the -p flag to create the parent directories as well, unless you have already created them. This command will work:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
When running Hive on a local system, just add this to your ~/.hiverc:
SET hive.metastore.warehouse.dir=${env:HOME}/Documents/hive-warehouse;
You can specify any folder to use as the warehouse. Obviously, any other Hive configuration method will do (hive-site.xml or hive -hiveconf, for example).
That's possibly what Ambarish Hazarnis had in mind when saying "or create the warehouse in your home directory".
This seems like a permission issue. Do you have access to the root folder /?
Try the following options:
1. Run the command as a superuser,
OR
2. Create the warehouse in your home directory.
Let us know if this helps. Good luck!
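A minimal sketch of option 1 on a cluster where HDFS runs as the hdfs user (the user names are assumptions; on a standalone setup, where these commands touch the local filesystem, a plain sudo works the same way):
sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
# hand the directory over to your own user (replace myuser)
sudo -u hdfs hadoop fs -chown -R myuser /user/hive/warehouse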
When setting Hadoop properties in the Spark configuration, prefix them with spark.hadoop. Therefore set:
conf.set("spark.hadoop.hive.metastore.warehouse.dir", "/new/location")
This works for older versions of Spark. The property has changed in Spark 2.0.0.
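For Spark 2.0.0 and later, the warehouse location is controlled by spark.sql.warehouse.dir instead; a sketch of setting it at launch time:
spark-shell --conf spark.sql.warehouse.dir=/new/location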
Adding an answer for reference for Cloudera CDH users who are seeing this same issue.
If you are using the Cloudera CDH distribution, make sure you have followed these steps:
Launch Cloudera Manager (Express / Enterprise) by clicking on the desktop icon.
Open the Cloudera Manager page in a browser.
Start all services.
Cloudera has the /user/hive/warehouse folder created by default; it's just that YARN and HDFS might not be up and running to access this path.
While this is a simple permission issue that was resolved with sudo in my comment above, there are a couple of notes:
Creating it in the home directory should work as well, but then you may need to update the Hive setting for the metastore path, which I think defaults to /user/hive/warehouse.
I ran into another error with a CREATE TABLE statement in the Hive shell; the error was something like this:
hive> CREATE TABLE pokes (foo INT, bar STRING);
FAILED: Error in metadata: MetaException(message:Got exception: java.io.FileNotFoundException File file:/user/hive/warehouse/pokes does not exist.)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
It turns out to be another permission issue: you have to create a group called "hive", add the current user to that group, and change the ownership of /user/hive/warehouse to that group. After that, it works. Details can be found at the link below:
http://mail-archives.apache.org/mod_mbox/hive-user/201104.mbox/%3CBANLkTinq4XWjEawu6zGeyZPfDurQf+j8Bw#mail.gmail.com%3E
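Roughly, that group/ownership fix looks like this (a sketch for a warehouse on the local filesystem, as in the error above; adjust names to your setup):
sudo groupadd hive
sudo usermod -aG hive $USER
sudo chgrp -R hive /user/hive/warehouse
sudo chmod -R g+w /user/hive/warehouse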
If you are running Linux, check the data directory and its permissions in Hadoop's core-site.xml. It looks like you have kept the default, which is /data/tmp, and in most cases that will require root permission.
Change the XML config file, delete /data/tmp, and run the filesystem format (of course, after you have modified the core XML config).
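In shell terms, a rough sketch of that advice (assuming you point hadoop.tmp.dir in core-site.xml at a directory your own user owns; /home/myuser/hadoop-tmp is hypothetical):
mkdir -p /home/myuser/hadoop-tmp
# after updating core-site.xml
hadoop namenode -format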
I recommend using a more recent version of Hive, i.e. 1.1.0; 0.10.0 is very buggy.
Run this command and then try to create the directory; it grants the user the needed permissions on the HDFS /user directory:
hadoop fs -chmod -R 755 /user
I am using macOS and Homebrew as the package manager. I had to set the property in hive-site.xml as:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/local/Cellar/hive/2.3.1/libexec/conf/warehouse</value>
</property>

Doubts in configuration of SQOOP

Scenario:
I have configured Sqoop on my PC, but I am facing a problem:
when I run bin/sqoop I get an error:
Error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.getInstances(Ljava/lang/String;Ljava/lang/Class;)Ljava/util/List;
at com.cloudera.sqoop.tool.SqoopTool.loadPlugins(SqoopTool.java:139)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:209)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
Question:
What could be the problem? I have also set the paths of $HBASE_HOME and $ZOOKEEPER_HOME.
Please suggest how I can fix this.
Thanks.
I am giving you the steps as I configured them on my terminal:
Download sqoop-1.3.0-cdh3u1 from the Cloudera archive.
Download mysql-connector-java-5.0.8 and copy the mysql-connector-java-5.0.8.jar file into the lib and bin directories of Sqoop (for the Sqoop-MySQL connection).
Copy all jars from lib to bin (optional).
Add these 2 lines to your .bash_profile file:
export SQOOP_HOME=/home/hadoop/Desktop/Cloudera/sqoop-1.3.0-cdh3u1
export PATH=$PATH:$SQOOP_HOME/bin
Save it and just type sqoop help in the terminal.
It worked on my terminal. Post the steps you followed.
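Condensed, those steps look roughly like this (the connector jar is assumed to sit in the current directory; the paths are from my machine, so adjust them for yours):
# add Sqoop to the environment
echo 'export SQOOP_HOME=/home/hadoop/Desktop/Cloudera/sqoop-1.3.0-cdh3u1' >> ~/.bash_profile
echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> ~/.bash_profile
source ~/.bash_profile
# drop the MySQL connector where Sqoop can see it
cp mysql-connector-java-5.0.8.jar "$SQOOP_HOME/lib/"
cp mysql-connector-java-5.0.8.jar "$SQOOP_HOME/bin/"
# sanity check
sqoop help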
Maybe this helps:
https://issues.apache.org/jira/browse/SQOOP-384
Try downgrading to a different version of Sqoop.
