Sqoop with oozie printing lastvalue to new line - sqoop

Below is my sqoop command in oozie.
<action name="sqoop_test" retry-max="${maxretry}" retry-interval="${retryinterval}">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect jdbc:mysql:loadbalance://sql01.sboxdc.com/mydb --username usr1 --password ******** --table source_table --incremental lastmodified -check-column last_modified --merge-key Id --last-value "${wf:actionData('get_last_modified_time')['last_modified_date']}" --target-dir /warehouse/external_data/sms/target_location --as-textfile </command>
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
The above action fails because it breaks the last value onto a new line.
From the logs:
Sqoop command arguments :
import
--connect
jdbc:mysql:loadbalance://sql01.sboxdc.com/mydb
--username
usr1
--password
********
--table
source_table
--incremental
lastmodified
-check-column
last_modified
--merge-key
Id
--last-value
"2019-01-01
00:00:00"
--target-dir
/warehouse/external_data/sms/target_location
--as-textfile
2019-06-18 11:19:25,768 ERROR [main] org.apache.sqoop.tool.BaseSqoopTool: Error parsing arguments for import:
2019-06-18 11:19:25,768 ERROR [main] org.apache.sqoop.tool.BaseSqoopTool: Unrecognized argument: 00:00:00"
2019-06-18 11:19:25,768 ERROR [main] org.apache.sqoop.tool.BaseSqoopTool: Unrecognized argument: --target-dir
2019-06-18 11:19:25,768 ERROR [main] org.apache.sqoop.tool.BaseSqoopTool: Unrecognized argument: /warehouse/external_data/sms/sb_subscribermacs
2019-06-18 11:19:25,768 ERROR [main] org.apache.sqoop.tool.BaseSqoopTool: Unrecognized argument: --as-textfile
How can I force Sqoop to keep the last-value on a single line?

As you discovered, when you use the command element, Oozie splits the command on every space into separate arguments. If an argument contains spaces, like the date in your last value, you should use multiple arg elements instead. So it would be something like:
<action name="sqoop_test" retry-max="${maxretry}" retry-interval="${retryinterval}">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:mysql:loadbalance://sql01.sboxdc.com/mydb</arg>
<!--All the other arguments...-->
<arg>--last-value</arg>
<arg>"${wf:actionData('get_last_modified_time')['last_modified_date']}</arg>
<!--Other arguments...-->
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
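Note that the double quotes around the EL expression in the original command are dropped here on purpose: each arg element is passed to Sqoop as exactly one argument, so no shell-style quoting is needed, and any quotes left in would end up as literal characters inside the last-value date.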

Related

Sqoop fails with unrecognized argument

I have multiple files in HDFS and am trying to load them into Oracle. I have been using this script for a long time, but recently the same process started giving me an error.
Here is the code:
#!/bin/bash
refDate=`date --date="2 days ago" "+%Y-%m-%d"`
func_data_load_to_dwh()
{
date
if [[ ! -z `hadoop fs -ls /data/dna_data/dna_daily_summary_data/pdate=${refDate}/*` ]]
then
file=`hadoop fs -ls /data/dna_data/dna_daily_summary_data/pdate=${refDate}/|grep dna_daily_summary_data |awk '{print $NF}'`
sqoop export --connect jdbc:oracle:thin:#//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir ${file} --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
fi
}
#_________________ main function ________________
func_data_load_to_dwh
Error Code:
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: /data/dna_data/dna_daily_summary_data/pdate=2020-01-12/f14873b51555a17d-946772b200000029_1797907225_data.0.
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --input-fields-terminated-by
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: |
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --lines-terminated-by
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: \n
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --direct
But if I run it manually from the shell with a single file, it runs perfectly:
sqoop export --connect jdbc:oracle:thin:#//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir /data/dna_data/dna_daily_summary_data/pdate=2020-01-12/784784ec62558fea-1d14102000000029_1144698661_data.0. --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
It's a bit strange because it worked before. All I did was change the directory in the if condition, nothing else. Could you suggest anything?
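The symptom here looks like a similar argument-splitting problem: hadoop fs -ls ... | awk '{print $NF}' prints one path per matching file, so as soon as the partition directory contains more than one file, ${file} expands to several whitespace-separated paths and everything after the first one reaches Sqoop as an extra, unrecognized argument. Below is a minimal sketch of one way around it, assuming every file under the partition directory belongs to the export (and using @ in the JDBC URL where the question shows #):
#!/bin/bash
refDate=$(date --date="2 days ago" "+%Y-%m-%d")
exportDir="/data/dna_data/dna_daily_summary_data/pdate=${refDate}"
# only run if the partition directory exists in HDFS
if hadoop fs -test -d "${exportDir}"; then
  # pass the directory itself: sqoop export reads every file under --export-dir
  sqoop export --connect jdbc:oracle:thin:@//raxdw-scan:1628/raxdw \
    --username schema_name --password 'password' \
    --table TBL_DNA_DATA_DAILY_SUMMARY \
    --export-dir "${exportDir}" \
    --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
fi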

Sqoop job Unrecognized argument: --merge-key

I am trying to create a Sqoop job with incremental lastmodified, but it throws ERROR tool.BaseSqoopTool: Unrecognized argument: --merge-key. Sqoop job:
sqoop job --create ImportConsentTransaction -- import --connect jdbc:mysql://quickstart:3306/production --table ConsentTransaction --fields-terminated-by '\b' --incremental lastmodified --merge-key transactionId --check-column updateTime --target-dir '/user/hadoop/ConsentTransaction' --last-value 0 --password-file /user/hadoop/sqoop.password --username user -m 1
Add --incremental lastmodified.

sqoop incremental import error

I want to import the latest updates on my AS400 table with a Sqoop incremental import. This is my sqoop command (I am sure about all the variables; to_porcess_ts is a string timestamp, yyyymmddhhmmss):
sqoop import --verbose --driver $SRC_DRIVER_CLASS --connect $SRC_URL --username $SRC_LOGIN --password $SRC_PASSWORD \
--table $SRC_TABLE --hive-import --hive-table $SRC_TABLE_HIVE --target-dir $DST_HDFS \
--hive-partition-key "to_porcess_ts" --hive-partition-value $current_date --split-by $DST_SPLIT_COLUMN --num-mappers 1 \
--boundary-query "$DST_QUERY_BOUNDARY" \
--incremental-append --check-column "to_porcess_ts" --last-value $(hive -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE"); \
--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec
I got this error:
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --incremental-append
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --check-column
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: to_porcess_ts
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --last-value
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: -46039745595
Remove the ; from the end of this line, or move it inside the closing ), like this:
--last-value $(hive -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE";) \
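Two side notes. First, in bash a ; ends the command even when the next physical line is joined with a trailing \, so with the original ; in place the continued --compress line is parsed as a separate command rather than as more Sqoop arguments. A minimal illustration, with echo standing in for sqoop:
# the ; terminates the first command; the continued line then runs as a new command
echo --last-value 123; \
--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec
# prints "--last-value 123", then fails with "--compress: command not found"
Second, the log also flags --incremental-append itself: Sqoop's documented option is --incremental append (two tokens), with --append being a separate flag, so that argument likely needs correcting as well.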

sqoop 1.4.4, from oracle to hive, some field in lower-case, ERROR

I am using Sqoop 1.4.4 to transfer data from Oracle to Hive, using this command:
sqoop job --create vjbkeufwekdfas -- import --split-by "Birthdate"
--check-column "Birthdate" --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL --query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
It doesn't work, because Sqoop's column check in tool.ImportTool fails:
16/10/08 14:58:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f068e5d884f3929b6415cd8318085fea/vjbkeufwekdfas.jar
16/10/08 14:58:03 INFO manager.SqlManager: Executing SQL statement: SELECT "NAME","ID","Birthdate" FROM GA_TESTER1.gmy_table1 where (1 = 0)
16/10/08 14:58:03 ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
16/10/08 14:58:03 ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
16/10/08 14:58:03 ERROR util.SqlTypeMap: correct column names in Sqoop options.
16/10/08 14:58:03 ERROR tool.ImportTool: Imported Failed: column not found: "Birthdate"
However, if I don't use the double quotation marks:
sqoop job --create vjbkeufwekdfas -- import --split-by Birthdate
--check-column Birthdate --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL --query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
Oracle then throws an error, because the field name contains lower-case letters:
16/10/08 14:37:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/1f05a7e94340dd92c9e3b11db1a5db46/vjbkeufwekdfas.jar
16/10/08 14:37:16 INFO manager.SqlManager: Executing SQL statement: SELECT "NAME","ID","Birthdate" FROM GA_TESTER1.gmy_table1 where (1 = 0)
16/10/08 14:37:16 INFO tool.ImportTool: Incremental import based on column Birthdate
16/10/08 14:37:16 INFO tool.ImportTool: Upper bound value: TO_DATE('2016-10-08 14:37:20', 'YYYY-MM-DD HH24:MI:SS')
16/10/08 14:37:16 INFO mapreduce.ImportJobBase: Beginning query import.
16/10/08 14:37:16 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/08 14:37:17 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/08 14:37:17 INFO client.RMProxy: Connecting to ResourceManager at master.huacloud.test/172.16.50.21:8032
16/10/08 14:37:25 INFO mapreduce.JobSubmitter: number of splits:1
16/10/08 14:37:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474871676561_71208
16/10/08 14:37:25 INFO impl.YarnClientImpl: Submitted application application_1474871676561_71208
16/10/08 14:37:26 INFO mapreduce.Job: The url to track the job: http://master.huacloud.test:8088/proxy/application_1474871676561_71208/
16/10/08 14:37:26 INFO mapreduce.Job: Running job: job_1474871676561_71208
16/10/08 14:37:33 INFO mapreduce.Job: Job job_1474871676561_71208 running in uber mode : false
16/10/08 14:37:33 INFO mapreduce.Job: map 0% reduce 0%
16/10/08 14:37:38 INFO mapreduce.Job: Task Id : attempt_1474871676561_71208_m_000000_0, Status : FAILED
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:266)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "BIRTHDATE": invalid identifier
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:439)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:395)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:802)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:436)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:205)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:861)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1145)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3449)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3493)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:237)
... 12 more
I am so confused, any help?
I think it could be a case-sensitivity problem. In general, table and column names are not case sensitive, but they become case sensitive when created with quotation marks.
Try the following:
sqoop job --create vjbkeufwekdfas -- import --split-by Birthdate
--check-column Birthdate --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL --query "SELECT \"NAME\",\"ID\",\"BIRTHDATE\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
If it is still not working, try sqoop eval first and make sure the query works fine:
sqoop eval --connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL --query "SELECT \"NAME\",\"ID\",\"BIRTHDATE\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS"
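To make the case-folding behaviour described above visible, two probe queries with sqoop eval can help (connection details are taken from the question, with @ in place of the # shown there; the ROWNUM filter just keeps the output small):
# unquoted identifiers are folded to upper case by Oracle, so this looks for a column named BIRTHDATE
# (and fails with ORA-00904 if the column was created as the case-sensitive "Birthdate")
sqoop eval --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --username GA_TESTER1 --password 123456 \
  --query 'SELECT Birthdate FROM GA_TESTER1.gmy_table1 WHERE ROWNUM <= 1'
# a quoted mixed-case identifier is case sensitive and only matches a column created exactly as "Birthdate"
sqoop eval --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --username GA_TESTER1 --password 123456 \
  --query 'SELECT "Birthdate" FROM GA_TESTER1.gmy_table1 WHERE ROWNUM <= 1'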
As per the Sqoop user guide:
" While the Hadoop generic arguments must precede any import arguments, you can type the import arguments in any order with respect to one another."
Please check your argument sequence.
Please try the below:
sqoop job --create vjbkeufwekdfas \
-- import \
--connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL \
--username GA_TESTER1 \
--password 123456 \
--query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" \
--target-dir /tmp/vjbkeufwekdfas \
--split-by "Birthdate" \
--check-column "Birthdate" \
--incremental lastmodified \
--hive-import \
--hive-database chinacloud \
--hive-table hive_vjbkeufwekdfas \
--class-name vjbkeufwekdfas \
--fields-terminated-by '^X' \
--hive-drop-import-delims \
--null-non-string '' \
--null-string '' \
--m 1
Use:
--append \
--last-value "0001-01-01 01:01:01" \

Got exception running Sqoop: java.lang.NullPointerException using --query and --as-parquetfile

I am trying to import table data from Redshift to HDFS (in Parquet format) and am facing the error shown below:
15/06/25 11:05:42 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:97)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Command line used:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" --connect "jdbc:postgresql://:5439/events" --username "username" --password "password" --query "SELECT * FROM mobile_og.pages WHERE \$CONDITIONS" --split-by anonymous_id --target-dir /user/huser/pq_mobile_og_pages_2 --as-parquetfile
It works fine when the --as-parquetfile option is removed from the above command.
It is a confirmed bug: SQOOP-2571.
If you want to import all the data of a table, you can instead run a command such as:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile
The --where parameter is also useful; check the user manual.
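If only a subset of rows is needed, here is a hedged sketch of combining the --table workaround with --where (the filter column and date are made up for illustration; everything else comes from the command above):
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--where "received_at >= '2015-06-01'" \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_filtered \
--as-parquetfile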
