passing mysql properties via sqoop eval

The sqoop eval command:
sqoop eval --connect 'jdbc:mysql://<connection url>' --driver com.mysql.jdbc.Driver --query "select max(rdate) from test.sqoop_test"
gives me this output:
Warning: /usr/hdp/2.3.2.0-2950/accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/hdp/2.3.2.0-2950/zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/10/05 18:38:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/10/05 18:38:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/05 18:38:17 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/10/05 18:38:17 INFO manager.SqlManager: Using default fetchSize of 1000
--------------
| max(rdate) |
--------------
| 2014-01-25 |
--------------
but I want the output without warnings and table boundaries, like:
max(rdate) 2014-01-25
I basically want to store this output in a file.
Thanks in advance.

You can perform a Sqoop import operation to save the output in HDFS.
The warnings are straightforward:
You can set $ACCUMULO_HOME and $ZOOKEEPER_HOME if those are available.
You can set --connection-manager to the one corresponding to MySQL, as in the sketch below.
For the sake of security, it's recommended to use -P for the password rather than writing it in the command.
These are not errors; you can live with these warnings.
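A minimal sketch of the same eval call with an explicit connection manager and a prompted password; org.apache.sqoop.manager.MySQLManager is Sqoop's built-in MySQL connection manager, and the connection URL placeholder is the question's own:

sqoop eval \
  --connect 'jdbc:mysql://<connection url>' \
  --connection-manager org.apache.sqoop.manager.MySQLManager \
  -P \
  --query "select max(rdate) from test.sqoop_test"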

You can create a .sh file, write your sqoop commands into it, then run it as
shell_file_name.sh > your_output_file.txt
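Under Hadoop's default log4j settings, most of Sqoop's warnings and INFO lines go to stderr while the eval result rows go to stdout, so a redirect like this sketch should leave mostly just the query result in the file:

# stdout (the result rows) goes to the file; stderr (warnings/log) is discarded
shell_file_name.sh > your_output_file.txt 2>/dev/null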

There are two ways to get the query results:
One way is to write to HDFS by importing the query results (--target-dir /path) and reading them from there.
The other is to change the file system option in the sqoop command so the results of the import query are stored on the local file system rather than HDFS. (With --query, Sqoop also requires the $CONDITIONS token in the WHERE clause and either --split-by or a single mapper.)
eg: sqoop import -fs local -jt local --connect "connection string" --username root --password root --query "select * from table where \$CONDITIONS" -m 1 --target-dir /home/output
https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1762587

Related

sqoop job not running with parameters

I am trying to run a sqoop job. I am using Sqoop version 1.4.6-cdh5.8.0 and it is not working on this version.
It works fine with Sqoop 1.4.5-cdh5.4.0.
sqoop job --create E8 -- import --connect jdbc:mysql://localhost/test --username root --password cloudera --table NAME --hive-import -m 1
sqoop job --exec E8 -- --table dummy1
Is there any syntax issue? If anyone can help with this.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo
imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/12/23 04:48:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-
cdh5.8.0
Enter password:
16/12/23 04:48:19 INFO manager.MySQLManager: Preparing to use a
MySQL streaming resultset.
16/12/23 04:48:19 INFO tool.CodeGenTool: Beginning code generation
16/12/23 04:48:20 INFO manager.SqlManager:
Executing SQL statement: SELECT t.* FROM `NAME` AS t LIMIT 1
16/12/23 04:48:20 ERROR manager.SqlManager: Error executing
statement:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table
'test.NAME' doesn't exist
Assuming you already did the basic checks (like manually placing the parameter in the job and executing it), I would say the syntax looks correct.
Looking at the doc, it is mentioned that one can override properties. Unfortunately, they only show an example that adds a property, and don't show overriding one.
A search led me to this open issue which leads me to believe that there is a bug that prevents you from overriding parameters properly.
Unfortunately I don't see a solution for this; some things that might help to work around the problem:
Parameterize on a different level
Play with the syntax (does it help if it is the first/last override element, what if you try to override AND add a user, what if you try to override the query parameter instead of the table parameter, ...)
This seems to be a bug in sqoop-1.4.6-cdh5.8.0 and sqoop-1.4.6-cdh5.9.0
This, however, as you mentioned, works correctly with the 1.4.5 version.
The below solution worked for me:
1) Download 'sqoop-1.4.5-cdh5.4.0.jar' from http://repo.spring.io/libs-release/org/apache/sqoop/sqoop/1.4.5-cdh5.4.0/
2) Replace 'sqoop-1.4.6-cdh5.8.0.jar' with 'sqoop-1.4.5-cdh5.4.0.jar' and modify the symbolic link 'sqoop.jar' to point to 'sqoop-1.4.5-cdh5.4.0.jar'
3) Though I don't endorse downgrading, this works like a charm.
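A sketch of step 2 on a typical CDH layout; the /usr/lib/sqoop path is an assumption, so check where sqoop.jar actually lives on your install:

# back up the 1.4.6 jar, drop in the 1.4.5 jar, and repoint the sqoop.jar symlink
cd /usr/lib/sqoop
mv sqoop-1.4.6-cdh5.8.0.jar sqoop-1.4.6-cdh5.8.0.jar.bak
cp /path/to/sqoop-1.4.5-cdh5.4.0.jar .
ln -sfn sqoop-1.4.5-cdh5.4.0.jar sqoop.jar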

sqoop import issue with mysql

I have a Hadoop HA setup based on CDH5. I tried to import tables from MySQL using Sqoop and it failed with the following error:
15/03/20 12:47:53 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@33573e93 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@33573e93 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
I used the below command:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/haddata --username root --password password --table authors --hive-import
My MySQL server version is 5.1.73-3, and I used the 5.1.34 and 5.1.17 versions of mysql-connector-java.
The sqoop version is 1.4.5-cdh5.3.2.
Please let me know any suggestions/comments.
Try including the option --driver com.mysql.jdbc.Driver in the import command.
Try using the below modified command, which can suit your purpose:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/haddata --driver com.mysql.jdbc.Driver --username root --password password --table authors --hive-import
Include the driver argument --driver com.mysql.jdbc.Driver in the sqoop command.
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/<db name> --username **** --password **** --table <table name> --hive-import --driver com.mysql.jdbc.Driver
The --driver parameter forces Sqoop to use the latest mysql-connector-java.jar installed on the Sqoop machine for the MySQL DB.
Try with mysql-connector-java-5.1.31.jar; it is compatible with sqoop 1.4.5.
The mysql-connector-java-5.1.17.jar driver does not work with sqoop 1.4.5.
Refer to:
https://issues.apache.org/jira/browse/SQOOP-1400
If you have com.mysql.jdbc_5.1.5.jar or any version of the com.mysql.jdbc_5.X.X.jar file in your $HADOOP_HOME/bin folder, remove it, and then execute your Sqoop query.
Including the option --driver com.mysql.jdbc.Driver in the import command worked for me.
Sqoop does not ship with third party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop/ directory on the server.
Note:
The JDBC drivers need to be installed only on the machine where Sqoop runs. You do not need to install them on all hosts in your Hadoop cluster.
You can download driver from here : https://dev.mysql.com/downloads/connector/j/5.1.html
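A hedged sketch of installing the downloaded connector; the 5.1.49 version and archive layout are assumptions, so adjust to the file you actually fetched:

# extract the Connector/J archive and copy the driver jar where Sqoop looks for it
tar -xzf mysql-connector-java-5.1.49.tar.gz
sudo cp mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar /var/lib/sqoop/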
Try the exact command like below:
sqoop import --connect "jdbc:mysql://localhost:3306/books" --username=root --password=root --table authors --as-textfile --target-dir=/datasqoop/authors_db --columns "id, name, email" --split-by id --driver com.mysql.jdbc.Driver
This will resolve your issue.
Find the jar locations that are being used by Sqoop; in my case, it points to the link /usr/share/java/mysql-connector-java.jar.
When I check the link /usr/share/java/mysql-connector-java.jar, it points to mysql-connector-java-5.1.17.jar:
/usr/share/java/mysql-connector-java.jar -> mysql-connector-java-5.1.17.jar
As 5.1.17 has this issue, try 5.1.37 or higher:
unlink /usr/share/java/mysql-connector-java.jar
ln -s /usr/share/java/mysql-connector-java-5.1.37.jar /usr/share/java/mysql-connector-java.jar

Sqoop job through oozie

I have created a sqoop job called TeamMemsImportJob which basically pulls data from SQL Server into Hive.
I can execute the sqoop job through the Unix command line by running the following command:
sqoop job --exec TeamMemsImportJob
If I create an oozie job with the actual sqoop import command in it, it runs through fine.
However, if I create the oozie job and run the sqoop job through it, I get the following error:
oozie job -config TeamMemsImportJob.properties -run
>>> Invoking Sqoop command line now >>>
4273 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
4329 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.4.2.1.1.0-385
5172 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - Cannot restore job: TeamMemsImportJob
5172 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - (No such job)
5172 [main] ERROR org.apache.sqoop.tool.JobTool - I/O error performing job operation: java.io.IOException: Cannot restore missing job TeamMemsImportJob
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
It looks as if it cannot find the job. However, I can see the job as below:
[root@sandbox ~]# sqoop job --list
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/06/25 08:12:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.1.0-385
Available jobs:
TeamMemsImportJob
How do I resolve this?
You have to use the --meta-connect flag while creating a job to create a custom Sqoop metastore database so that Oozie can have access.
sqoop \
job \
--meta-connect \
"jdbc:hsqldb:file:/on/server/not/hdfs/sqoop-metastore/sqoop-meta.db;shutdown=true" \
--create \
jobName \
-- \
import \
--connect jdbc:oracle:thin:@server:port:sid \
--username username \
--password-file /path/on/hdfs/server.password \
--table TABLE \
--incremental append \
--check-column ID \
--last-value "0" \
--target-dir /path/on/hdfs/TABLE
When you need to execute jobs, you can do it from Oozie the regular way, but make sure to include --meta-connect to indicate where the job is stored.
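For example, a sketch of executing the stored job against that shared metastore, reusing the file URL from the creation command above:

sqoop job \
  --meta-connect "jdbc:hsqldb:file:/on/server/not/hdfs/sqoop-metastore/sqoop-meta.db;shutdown=true" \
  --exec jobName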
If we look at the log, we can see that it cannot find the stored job, since you are using the native HSQLDB.
To make Sqoop jobs available across other systems, you should configure another database, for example MySQL, which can be accessed by all systems.
From documentation
Running sqoop-metastore launches a shared HSQLDB database instance on
the current machine. Clients can connect to this metastore and create
jobs which can be shared between users for execution
The location of the metastore’s files on disk is controlled by the
sqoop.metastore.server.location property in conf/sqoop-site.xml. This
should point to a directory on the local filesystem.
The metastore is available over TCP/IP. The port is controlled by the
sqoop.metastore.server.port configuration parameter, and defaults to
16000.
Clients should connect to the metastore by specifying
sqoop.metastore.client.autoconnect.url or --meta-connect with the
value jdbc:hsqldb:hsql://<server>:<port>/sqoop. For example,
jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop.
This metastore may be hosted on a machine within the Hadoop cluster,
or elsewhere on the network.
Can you check if that DB is accessible from other systems?
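One way to check is to run a harmless metadata command from the other host against the shared metastore; the host and port below follow the documentation's example:

sqoop job --list --meta-connect "jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop"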

hadoop sqoop error while importing data using options file

I am new to Hadoop, and while practicing Sqoop I got this error message.
I created an import.txt file containing
import --connect jdbc:mysql://localhost/hadoopdb --username hadoop -P
and placed this file on HDFS.
While importing, I gave this file to the sqoop tool using the --options-file option, so the final command I entered at the command prompt is as follows:
sqoop --options-file /user/cloudera/import.txt --table employee
After hitting the enter key I got the following error message:
sqoop --options-file /user/cloudera/import.txt --table employee
13/10/16 13:43:12 ERROR sqoop.Sqoop: Error while expanding arguments
java.lang.Exception: Unable to read options file: /user/cloudera/import.txt
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:102)
at com.cloudera.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:33)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:201)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.io.FileNotFoundException: /user/cloudera/import.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at java.io.FileReader.<init>(FileReader.java:55)
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:70)
... 4 more
Unable to read options file: /user/cloudera/import.txt
Can anyone tell me why this error is coming?
Thanks in advance.
The --options-file path should be a local path. Don't use an HDFS path.
sqoop --options-file /home/cloudera/import.txt --table employee
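If the file currently sits on HDFS, as in the question, a sketch of pulling it down to the local file system first (paths are the question's own):

# copy the options file from HDFS to a local directory, then point sqoop at the local copy
hdfs dfs -get /user/cloudera/import.txt /home/cloudera/import.txt
sqoop --options-file /home/cloudera/import.txt --table employee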
I got the same issue. I solved it using the following approach.
In the options file, you have to mention the tools, commands, and their arguments line by line.
In your case, your options file "import.txt" should be created like this:
$cat > import.txt
import
--connect
jdbc:mysql://localhost/hadoopdb
--username
hadoop
-P
After you have created the options file, you can use it to import the table:
sqoop --options-file /user/cloudera/import.txt --table employee
Hope this works. The key is that you have to mention tools and arguments line by line.
For more understanding on this, refer to the Sqoop User Guide by Apache.org.
Correct me if I am wrong.
If you are calling Sqoop from Oozie and you are facing the same issue (unable to read the options file):
You need to place the options file inside the workflow location and specify the file in the sqoop action's files, and you also need to change the permissions on that file to chmod 674 (when the workflow runs in Oozie, it runs as the sqoop user, so it is mandatory to change the permissions).
This will resolve the error.
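A hedged sketch of the staging steps; the workflow path is hypothetical, and the 674 mode is the one described above:

# stage the options file inside the workflow directory and adjust its permissions
hdfs dfs -put import.txt /user/oozie/apps/myworkflow/
hdfs dfs -chmod 674 /user/oozie/apps/myworkflow/import.txt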
I put the options file in a local directory, and it worked.
Also, an argument and its value should be on different lines,
like:
--where
'sal > 5000'
and not like:
--where 'sal > 5000'
[cloudera@quickstart sqoop]$ sqoop --options-file /home/cloudera/Desktop/SqoopOptions.txt --table departments --username root --password cloudera -m 1 --target-dir jan1301
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
No such sqoop tool: import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera. See 'sqoop help'.
I received the above error when I defined the SqoopOptions.txt file data on a single line.
The issue was resolved when I defined each parameter and value on a different line, as described above.
If you are trying this on a single-node cluster, the options file can be placed on the local file system.
Your options file should look like this:
import
--connect
"jdbc:mysql://localhost:3306/sakila"
--username
root
-P
Each parameter and its value should go on separate lines.
Once you have saved the options file, use the below command:
sqoop --options-file "your optionfile location" --table abc
Hope this works, as this option is working perfectly for me.
Thanks,
Suresh.

No suitable driver found for jdbc for SQL Server Express 2008 R2 when executing list-tables with Sqoop

Can anyone guide me? I'm using sqljdbc4.jar from Microsoft.
hduser@ubuntu:/usr/local/sqoop/lib$ sqoop list-tables --connect jdbc:sqlserver/Northwind --username sa --password xxxxxxx --driver com.microsoft.sqlserver.jdbc.SQLServerDriver
The result I'm getting is as follows:
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
13/07/08 14:45:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/07/08 14:45:32 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
13/07/08 14:45:32 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/08 14:45:33 ERROR manager.SqlManager: Error reading database metadata: java.sql.SQLException: No suitable driver found for jdbc:sqlserver/Northwind
Could not retrieve tables list from server
13/07/08 14:45:33 ERROR tool.ListTablesTool: manager.listTables() returned null
I also tried adding -libjars to the command.
I believe the issue is the invalid JDBC URL "jdbc:sqlserver/Northwind". The URL has to start at least with "jdbc:sqlserver://" (i.e. the "://" part is missing). Please take a look at the official documentation for more details.
In addition, I would strongly recommend dropping the --driver option, as it will instruct Sqoop to use a different connector. Please refer to the provided log fragment for more details (the line starting with "Parameter --driver is set to...").
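A sketch of the corrected call; the host, port, and databaseName property are assumptions to adjust for your server:

# note the "://" in the URL; --driver is dropped so Sqoop can pick an appropriate connection manager
sqoop list-tables \
  --connect "jdbc:sqlserver://localhost:1433;databaseName=Northwind" \
  --username sa -P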
