I would like to know whether I can execute a stored procedure and get its results with the Sqoop import command. I am not able to find any such scenario on the web. Please help.
I have tried something like this and it worked:
sqoop import --connect "jdbc:sqlserver://localhost;database=FADA" --username [name] --password [pdw] --query "print case when $CONDITIONS then 'yep' else 'yip' end exec dbo.ps" --target-dir /DIR/Psimport -m 1
https://issues.apache.org/jira/browse/SQOOP-769
It seems like Sqoop does not support it. Can you please let me know if there are any other tools which will help me to extract data from SQL Server to HDFS?
Have you tried the --query option in sqoop? The documentation for this option is here: http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports
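A minimal sketch of such a free-form query import is below; the host, database, credentials, query, and target directory are placeholders, and the backslash before $CONDITIONS keeps the shell from expanding it.
# Sketch only: Sqoop substitutes $CONDITIONS with a split predicate at run time.
sqoop import \
  --connect "jdbc:sqlserver://myhost;database=FADA" \
  --username myuser -P \
  --query "SELECT col1, col2 FROM dbo.some_table WHERE \$CONDITIONS" \
  --target-dir /user/myuser/some_table \
  -m 1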
Sqoop export has a stored-procedure parameter, --call, which is used in place of --table; the procedure is then invoked once for each exported record. Note that this applies to export, not import.
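A minimal sketch of that export path, assuming your JDBC driver supports procedure calls and that the procedure takes one parameter per exported column; the connection string, procedure name, and export directory are made up for illustration.
# Sketch only: --call replaces --table and invokes the procedure once per exported record.
sqoop export \
  --connect "jdbc:sqlserver://myhost;database=FADA" \
  --username myuser -P \
  --call dbo.insert_row_proc \
  --export-dir /user/myuser/rows_to_export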
If you want to execute a stored procedure in Oracle from Sqoop, you need to use sqoop eval and, in the query, wrap the call in a PL/SQL anonymous block:
'BEGIN STORED_PROCEDURE; END;'
example:
sqoop eval -Dmapred.job.queue.name=root.test.test-mis --connect jdbc:oracle:thin:@SERVER.NAME:PORT:INSTANCE --password **** --username MYSCHEMA --query "BEGIN MYSCHEMA.TEST_STORED_PROCEDURE_NAME; END;"
We are trying to export data from Hive tables into HANA, and we are able to export the data using the --hcatalog-table option in the Sqoop export command.
But we are facing issues when trying to load the data using the query option with a where clause.
Is it possible to use the query option in the sqoop export command?
My sample Sqoop command is like below:
sqoop export -D sqoop.export.records.per.statement=1 -D mapreduce.map.memory.mb=16384 -D mapreduce.map.java.opts=-Xmx16384m --connect "jdbc:xxxxxx" --driver "com.sap.db.jdbc.Driver" --username "xxxxx" --password "xxxxxx" --table "hanaschema.table1" --query "select field1,substr(field2,1),field3,field4,from "hadoopschema.table" where field1 = 2017 and field3 = 4" --input-null-string '\\N' --input-null-non-string '\\N' --num-mappers 20 --validate
Appreciate your help..
Thanks
Srini
Unfortunately, the --query argument is not supported with Sqoop export yet. We can use it with the Sqoop import command only.
Please refer to the Export control arguments section of the Sqoop user guide for detailed information about the options available for Sqoop export.
Alternatively, copy the output of your query into another table and then export that table into HANA.
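A rough sketch of that workaround, assuming a Hive staging table built with CREATE TABLE AS SELECT and reusing the hcatalog options from your working export; the table names, columns, and filter are taken from your command but are otherwise placeholders.
# Step 1: materialize the filtered rows into a staging Hive table.
hive -e "CREATE TABLE hadoopschema.table1_export AS SELECT field1, substr(field2, 1) AS field2, field3, field4 FROM hadoopschema.table1 WHERE field1 = 2017 AND field3 = 4"
# Step 2: export the staging table with the hcatalog options that already work for you.
sqoop export \
  --connect "jdbc:xxxxxx" --driver "com.sap.db.jdbc.Driver" \
  --username "xxxxx" -P \
  --table "hanaschema.table1" \
  --hcatalog-database hadoopschema --hcatalog-table table1_export \
  --num-mappers 20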
In Sqoop for Hadoop you can use a parameters file for connection string information.
--connection-param-file filename Optional properties file that provides connection parameters
What is the format of that file?
Say for example I have:
jdbc:oracle:thin:@//myhost:1521/mydb
How should that be in a parameters file?
If you want to provide your database connection string and credentials, then create a file with those details and use --options-file in your sqoop command.
Create a file database.props with the following details:
import
--connect
jdbc:mysql://localhost:5432/test_db
--username
root
--password
password
then your sqoop import command will look like:
sqoop --options-file database.props \
--table test_table \
--target-dir /user/test_data
And regarding --connection-param-file, I hope this link will be helpful to understand its usage.
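Coming back to --connection-param-file itself: as far as I can tell from the Sqoop user guide, it is a standard Java properties file whose key=value entries are passed to the JDBC driver when the connection is opened; the JDBC URL itself still goes in --connect. A sketch for the Oracle thin driver, where the property names are just driver-specific examples:
# oracle.props -- extra JDBC connection properties handed to the driver (examples only)
oracle.jdbc.timezoneAsRegion=false
defaultRowPrefetch=50
and then reference it on the command line, for example:
sqoop import --connect jdbc:oracle:thin:@//myhost:1521/mydb --username foo -P --connection-param-file oracle.props --table MYTABLE --target-dir /user/foo/mytable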
It should be the same as on the command line.
Example
import
--connect
jdbc:oracle:thin:@//myhost:1521/mydb
--username
foo
Below is a sample command for connecting to a MySQL server:
sqoop list-databases --connect jdbc:mysql://192.168.256.156/test --username root --password root
It will give you the list of databases available on your MySQL server.
I want to use a shell script to execute a Sqoop (1.4.5) command.
Shell script:
sqoop_cmd="sqoop import --connect jdbc:mysql://xx.x.xxx.xxx:3306/test --username test --password datagateway --query 'select t.name from table_name t where date(hrc.gmt_modified) = date_sub(curdate(),interval 1 day) AND $CONDITIONS' --target-dir /output -m 1 --append"
result=$sqoop_cmd 2>&1 | grep -c "successfully"
error:
WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
But when I changed --query to --table, removed the ' AND $CONDITIONS' part, and tried again, the sqoop command succeeded. I think the problem is with '$', but I have tried '\$CONDITIONS' and "'$CONDITIONS'" and it is still unsuccessful.
Please help me, thank you so much!
This message:
WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
is raised because you are entering your database password in the command itself, like --password datagateway. If you use -P instead, you won't get that warning, and you will be prompted for the password after you start executing the command. Note that this is only a warning, not an error that would stop the import.
When you use the --query parameter you have to use $CONDITIONS, and you did. But you forgot to add the --split-by parameter, which is obligatory:
From sqoop user guide:
"Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by."
Hope it helps a bit.
Pawel
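Beyond that, the shell quoting itself is probably a culprit: the double quotes at assignment time let the shell expand $CONDITIONS away, and when $sqoop_cmd is later expanded unquoted, the --query string gets split into separate words. A bash-specific sketch that keeps the command in an array so each argument, including the literal $CONDITIONS token, survives intact (the host, credentials, query, and paths are the placeholders from the question):
# Build the command as an array; each element stays a single argument.
sqoop_cmd=(sqoop import
  --connect jdbc:mysql://xx.x.xxx.xxx:3306/test
  --username test --password datagateway
  --query 'select t.name from table_name t where date(t.gmt_modified) = date_sub(curdate(),interval 1 day) AND $CONDITIONS'
  --target-dir /output -m 1 --append)
# Run it and count occurrences of "successfully" in the combined output.
result=$("${sqoop_cmd[@]}" 2>&1 | grep -c "successfully")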
Is it possible to export data from Hive to an Oracle DB using Sqoop for reporting purposes, since I don't want to make any changes in the client applications?
Regards,
Bhagwant Bhobe
Use the INSERT OVERWRITE DIRECTORY option in Hive so the output of the query is written to a file, and then use Sqoop export to insert the data from the file into the RDBMS. A workflow using Oozie or Azkaban (does Azkaban support Oozie and Hive tasks?) can also be used to automate this.
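A rough sketch of that pipeline, assuming Hive 0.11+ for the delimited directory output; the paths, query, table names, and connection details are placeholders:
# Step 1: write the query result to an HDFS directory as ^A-delimited text.
hive -e "INSERT OVERWRITE DIRECTORY '/user/hive/export/emp_report' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' SELECT * FROM emp1"
# Step 2: export that directory into the Oracle table.
sqoop export \
  --connect jdbc:oracle:thin:@myhost:1521:mydb \
  --username uname -P \
  --table EMP_REPORT \
  --export-dir /user/hive/export/emp_report \
  --fields-terminated-by '\001' -m 1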
By using the sqoop export command you can export data from Hive to an Oracle DB.
sqoop export --connect jdbc:oracle:thin:@ipaddress:portnumber:DBName --table tableName --export-dir /user/hive/warehouse/emp1 --username uname --password pwd --fields-terminated-by '\001' -m 1
In --export-dir, specify the location of the Hive output directory.
We are using Cloudera CDH 4 and we are able to import tables from our Oracle databases into our HDFS warehouse as expected. The problem is that we have tens of thousands of tables inside our databases, and sqoop only supports importing one table at a time.
What options are available for importing multiple tables into HDFS or Hive? For example, what would be the best way of importing 200 tables from Oracle into HDFS or Hive at a time?
The only solution I have seen so far is to create a sqoop job for each table import and then run them all individually. Since Hadoop is designed to work with large datasets, it seems like there should be a better way, though.
You can use the import-all-tables option to load all tables into HDFS at one time.
sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --target-dir '/Sqoop21/AllTables'
If you want to exclude some tables from being loaded into HDFS, you can use the --exclude-tables option.
Ex:
sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --target-dir '/Sqoop21/AllTables' --exclude-tables <table1>,<tables2>
If you want to store the tables in a specified directory, you can use the --warehouse-dir option.
Ex:
sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --warehouse-dir '/Sqoop'
Assuming that the sqoop configuration for each table is the same, you can list all the tables you need to import and then iterate over them launching sqoop jobs (ideally launch them asynchronously). You can run the following to fetch the list of tables from Oracle:
SELECT owner, table_name FROM dba_tables
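A hedged sketch of the iteration, assuming the query output has already been saved as one table name per line in tables.txt and that the password sits in an HDFS file for --password-file; the connection string and paths are placeholders:
# Launch one import per table in the background, then wait for all of them.
while read -r tbl; do
  sqoop import \
    --connect jdbc:oracle:thin:@myhost:1521/mydb \
    --username foo --password-file /user/foo/.oracle_pwd \
    --table "$tbl" --target-dir "/user/foo/oracle/$tbl" -m 1 &
done < tables.txt
wait   # block until every background import has finished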
Sqoop does offer an option to import all tables. Check this link. There are some limitations though.
Modify sqoop source code and recompile it to your needs. The sqoop codebase is well documented and nicely arranged.
--target-dir is not a valid option when using import-all-tables.
To import all tables into a particular directory, use --warehouse-dir instead of --target-dir.
Example:
$ sqoop import-all-tables --connect jdbc:mysql://localhost/movies --username root --password xxxxx --warehouse-dir '/user/cloudera/sqoop/allMoviesTables' -m 1
The best option in my case was a shell script:
1) Prepare an input file which has a list of entries in the form DBNAME.TABLENAME. 2) The shell script takes this file as input, iterates over it line by line, and executes a sqoop statement for each line.
while read line
do
  # split DBNAME.TABLENAME into its two parts
  DBNAME=`echo $line | cut -d'.' -f1`
  tableName=`echo $line | cut -d'.' -f2`
  # double quotes so $JDBC_URL, $DBNAME, $USERNAME and $PASSWORD are actually expanded
  sqoop import -Dmapreduce.job.queuename=$QUEUE_NAME --connect "$JDBC_URL;databaseName=$DBNAME;username=$USERNAME;password=$PASSWORD" --table $tableName --target-dir $DATA_COLLECTOR/$tableName --fields-terminated-by '\001' -m 1
done < inputFile
You can probably import multiple tables : http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal
You can use the Sqoop "import-all-tables" feature to import all the tables in the database. It also has another parameter, --exclude-tables, with which you can exclude some of the tables that you don't want to import from the database.
Note: --exclude-tables only works with import-all-tables command.
Importing multiple tables with individual sqoop commands is fine if the number of tables is very small.
Create a sqoop import for each table as below:
sqoop import --connect jdbc:mysql://localhost/XXXX --username XXXX --password XXXX --table XXTABLE_1XX
sqoop import --connect jdbc:mysql://localhost/XXXX --username XXXX --password XXXX --table XXTABLE_2XX
and so on.
But what if the number of tables is 100, 1000, or even more? The approach below would be the better solution.
In such a scenario, prepare a shell script which takes as input a text file containing the list of table names to be imported, iterates over it, and runs a sqoop import job for each table.
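For instance, a hedged sketch using GNU xargs to cap the number of concurrent imports; tables.txt is assumed to hold one table name per line, and the connection string, credentials, and paths are placeholders:
# Run at most 4 sqoop imports at a time; {} is replaced by each table name.
xargs -a tables.txt -I{} -P 4 \
  sqoop import --connect jdbc:mysql://localhost/XXXX \
    --username XXXX --password-file /user/me/.db_pwd \
    --table {} --target-dir /user/me/sqoop/{} -m 1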