Sqoop incremental import error

I want to import the latest updates of my AS400 table with a Sqoop incremental import. I'm sure about all the variables; to_porcess_ts is a string timestamp (yyyymmddhhmmss). This is my Sqoop command:
sqoop import --verbose --driver $SRC_DRIVER_CLASS --connect $SRC_URL --username $SRC_LOGIN --password $SRC_PASSWORD \
--table $SRC_TABLE --hive-import --hive-table $SRC_TABLE_HIVE --target-dir $DST_HDFS \
--hive-partition-key "to_porcess_ts" --hive-partition-value $current_date --split-by $DST_SPLIT_COLUMN --num-mappers 1 \
--boundary-query "$DST_QUERY_BOUNDARY" \
--incremental-append --check-column "to_porcess_ts" --last-value $(hive -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE"); \
--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec
I got this error:
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --incremental-append
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --check-column
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: to_porcess_ts
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --last-value
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: -46039745595

Remove the ; at the end of this line, or move it before the closing ):
--last-value $(hive -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE";) \
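Two more things stand out beyond the stray semicolon. Sqoop's incremental mode is spelled --incremental append (two tokens), not --incremental-append, which is exactly the first unrecognized argument in the log. And Hive's unix_timestamp() takes a Java SimpleDateFormat pattern, so a yyyymmddhhmmss string wants 'yyyyMMddHHmmss' rather than 'ddmmyyyyhhmmss'; a mismatch there would explain the negative last value -46039745595. A sketch, not a drop-in fix, that captures the boundary value first and runs hive silently:
# Capture the boundary value before splicing it into the sqoop arguments,
# so stray hive log lines cannot leak into the argument list.
# 'yyyyMMddHHmmss' assumes to_porcess_ts really is yyyymmddhhmmss.
last_value=$(hive -S -e "select max(unix_timestamp(to_porcess_ts, 'yyyyMMddHHmmss')) from $SRC_TABLE_HIVE")
echo "last value: $last_value"   # sanity check: expect a single positive number

sqoop import --verbose --driver "$SRC_DRIVER_CLASS" --connect "$SRC_URL" \
  --username "$SRC_LOGIN" --password "$SRC_PASSWORD" \
  --table "$SRC_TABLE" --hive-import --hive-table "$SRC_TABLE_HIVE" --target-dir "$DST_HDFS" \
  --hive-partition-key "to_porcess_ts" --hive-partition-value "$current_date" \
  --split-by "$DST_SPLIT_COLUMN" --num-mappers 1 \
  --boundary-query "$DST_QUERY_BOUNDARY" \
  --incremental append --check-column "to_porcess_ts" --last-value "$last_value" \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec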

Related

Sqoop fails with unrecognized argument

I have multiple files in HDFS and am trying to load them into Oracle. I've been using this script for a long time, but recently the same process started giving me an error.
Here is the code:
#!/bin/bash
refDate=`date --date="2 days ago" "+%Y-%m-%d"`
func_data_load_to_dwh()
{
date
if [[ ! -z `hadoop fs -ls /data/dna_data/dna_daily_summary_data/pdate=${refDate}/*` ]]
then
file=`hadoop fs -ls /data/dna_data/dna_daily_summary_data/pdate=${refDate}/|grep dna_daily_summary_data |awk '{print $NF}'`
sqoop export --connect jdbc:oracle:thin:@//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir ${file} --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
fi
}
#_________________ main function ________________
func_data_load_to_dwh
Error output:
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: /data/dna_data/dna_daily_summary_data/pdate=2020-01-12/f14873b51555a17d-946772b200000029_1797907225_data.0.
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --input-fields-terminated-by
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: |
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --lines-terminated-by
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: \n
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --direct
But if I run it manually from the shell with a single file, it runs perfectly:
sqoop export --connect jdbc:oracle:thin:@//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir /data/dna_data/dna_daily_summary_data/pdate=2020-01-12/784784ec62558fea-1d14102000000029_1144698661_data.0. --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
A bit strange, because it worked before. All I did was change the directory in the if condition, nothing else. Could you suggest anything?
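Judging from the error output, the ls/grep/awk pipeline now matches more than one file, so ${file} expands to several paths and everything after the first is treated as an extra argument (note that the first "Unrecognized argument" is itself a data-file path). A sketch under that assumption, exporting the partition directory instead of individual files:
#!/bin/bash
# Sketch: point --export-dir at the partition directory itself, so a
# multi-file listing can no longer split into extra arguments.
refDate=`date --date="2 days ago" "+%Y-%m-%d"`
dir=/data/dna_data/dna_daily_summary_data/pdate=${refDate}
if hadoop fs -test -e "${dir}"
then
sqoop export --connect jdbc:oracle:thin:@//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir "${dir}" --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
fi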

Sqoop job Unrecognized argument: --merge-key

I am trying to create a Sqoop job with incremental lastmodified, but it throws ERROR tool.BaseSqoopTool: Unrecognized argument: --merge-key. Sqoop job:
sqoop job --create ImportConsentTransaction -- import --connect jdbc:mysql://quickstart:3306/production --table ConsentTransaction --fields-terminated-by '\b' --incremental lastmodified --merge-key transactionId --check-column updateTime --target-dir '/user/hadoop/ConsentTransaction' --last-value 0 --password-file /user/hadoop/sqoop.password --username user -m 1
Add --incremental lastmodified.
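For reference, a sketch of the job creation with --incremental lastmodified in place and the incremental arguments grouped together; all values are taken verbatim from the question:
sqoop job --create ImportConsentTransaction \
  -- import \
  --connect jdbc:mysql://quickstart:3306/production \
  --username user \
  --password-file /user/hadoop/sqoop.password \
  --table ConsentTransaction \
  --fields-terminated-by '\b' \
  --incremental lastmodified \
  --check-column updateTime \
  --merge-key transactionId \
  --last-value 0 \
  --target-dir '/user/hadoop/ConsentTransaction' \
  -m 1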

Error in sqoop incremental import command

I'm working on a Sqoop incremental import command, but I'm getting an error message at the end and I can't understand where the problem is.
Below is my MySQL table data
+----+-----------+
| ID | NAME      |
+----+-----------+
|  1 | Sidhartha |
|  2 | Sunny     |
|  3 | Saketh    |
|  4 | Bobby     |
|  5 | Yash      |
|  6 | Nimmi     |
+----+-----------+
Hive table with 4 records (DAY is the partition column):
importedtable.id    importedtable.name    importedtable.day
1                   Sidhartha             1
2                   Sunny                 1
3                   Saketh                1
4                   Bobby                 1
My Sqoop command:
sqoop import --connect jdbc:mysql://127.0.0.1/mydb --table MYTAB --driver com.mysql.jdbc.Driver --username root --password cloudera --hive-import --hive-table importedtable --incremental append --check-column id --last-value $(hive -e "select max(id) from importedtable") --target-dir '/home/incdata';
Error message:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: WARN:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: The
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: method
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: class
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: org.apache.commons.logging.impl.SLF4JLogFactory#release()
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: was
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: invoked.
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: WARN:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: Please
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: see
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: http://www.slf4j.org/codes.html#release
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: for
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: an
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: explanation.
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: --target-dir
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: /home/incdata
Can anyone tell me what mistake I'm making in the Sqoop command?
The problem is with the Hive query passed as the value of the --last-value argument:
--last-value $(hive -e "select max(id) from importedtable")
Hive emits its log messages along with the result, and all of it ends up in --last-value. Add the -S (--silent) flag to the query:
--last-value $(hive -S -e "select max(id) from importedtable")
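A quick way to confirm the fix is to run the substitution on its own first and check that it prints a bare number, then splice it in:
last_value=$(hive -S -e "select max(id) from importedtable")
echo "[${last_value}]"    # expect something like [4], with no WARN: lines
sqoop import --connect jdbc:mysql://127.0.0.1/mydb --table MYTAB --driver com.mysql.jdbc.Driver --username root --password cloudera --hive-import --hive-table importedtable --incremental append --check-column id --last-value "${last_value}" --target-dir '/home/incdata'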

Sqoop 1.4.4, from Oracle to Hive, some fields in lower-case, ERROR

I am using Sqoop 1.4.4 to transfer data from Oracle to Hive, using this command:
sqoop job --create vjbkeufwekdfas -- import --split-by "Birthdate" \
--check-column "Birthdate" --hive-database chinacloud --hive-table hive_vjbkeufwekdfas \
--target-dir /tmp/vjbkeufwekdfas --incremental lastmodified \
--username GA_TESTER1 --password 123456 \
--connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL \
--query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" \
--m 1 --class-name vjbkeufwekdfas --hive-import \
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
It doesn't work, because of the column-checking strategy Sqoop uses in tool.ImportTool:
16/10/08 14:58:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f068e5d884f3929b6415cd8318085fea/vjbkeufwekdfas.jar
16/10/08 14:58:03 INFO manager.SqlManager: Executing SQL statement: SELECT "NAME","ID","Birthdate" FROM GA_TESTER1.gmy_table1 where (1 = 0)
16/10/08 14:58:03 ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
16/10/08 14:58:03 ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
16/10/08 14:58:03 ERROR util.SqlTypeMap: correct column names in Sqoop options.
16/10/08 14:58:03 ERROR tool.ImportTool: Imported Failed: column not found: "Birthdate"
However, if I don't use the double quotation marks:
sqoop job --create vjbkeufwekdfas -- import --split-by Birthdate \
--check-column Birthdate --hive-database chinacloud --hive-table hive_vjbkeufwekdfas \
--target-dir /tmp/vjbkeufwekdfas --incremental lastmodified \
--username GA_TESTER1 --password 123456 \
--connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL \
--query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" \
--m 1 --class-name vjbkeufwekdfas --hive-import \
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
Oracle then throws an error, because the field name contains lower-case letters:
16/10/08 14:37:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/1f05a7e94340dd92c9e3b11db1a5db46/vjbkeufwekdfas.jar
16/10/08 14:37:16 INFO manager.SqlManager: Executing SQL statement: SELECT "NAME","ID","Birthdate" FROM GA_TESTER1.gmy_table1 where (1 = 0)
16/10/08 14:37:16 INFO tool.ImportTool: Incremental import based on column Birthdate
16/10/08 14:37:16 INFO tool.ImportTool: Upper bound value: TO_DATE('2016-10-08 14:37:20', 'YYYY-MM-DD HH24:MI:SS')
16/10/08 14:37:16 INFO mapreduce.ImportJobBase: Beginning query import.
16/10/08 14:37:16 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/08 14:37:17 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/08 14:37:17 INFO client.RMProxy: Connecting to ResourceManager at master.huacloud.test/172.16.50.21:8032
16/10/08 14:37:25 INFO mapreduce.JobSubmitter: number of splits:1
16/10/08 14:37:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474871676561_71208
16/10/08 14:37:25 INFO impl.YarnClientImpl: Submitted application application_1474871676561_71208
16/10/08 14:37:26 INFO mapreduce.Job: The url to track the job: http://master.huacloud.test:8088/proxy/application_1474871676561_71208/
16/10/08 14:37:26 INFO mapreduce.Job: Running job: job_1474871676561_71208
16/10/08 14:37:33 INFO mapreduce.Job: Job job_1474871676561_71208 running in uber mode : false
16/10/08 14:37:33 INFO mapreduce.Job: map 0% reduce 0%
16/10/08 14:37:38 INFO mapreduce.Job: Task Id : attempt_1474871676561_71208_m_000000_0, Status : FAILED
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:266)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "BIRTHDATE": invalid identifier
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:439)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:395)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:802)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:436)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:205)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:861)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1145)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3449)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3493)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:237)
... 12 more
I am so confused, any help?
I think it could be a case-sensitivity problem. In general, table and column names are not case sensitive, but they become case sensitive once you wrap them in quotation marks.
Try the following:
sqoop job --create vjbkeufwekdfas -- import --split-by Birthdate \
--check-column Birthdate --hive-database chinacloud --hive-table hive_vjbkeufwekdfas \
--target-dir /tmp/vjbkeufwekdfas --incremental lastmodified \
--username GA_TESTER1 --password 123456 \
--connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL \
--query "SELECT \"NAME\",\"ID\",\"BIRTHDATE\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" \
--m 1 --class-name vjbkeufwekdfas --hive-import \
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
If it's still not working, try sqoop eval first and make sure the query works fine:
sqoop eval --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --query "SELECT \"NAME\",\"ID\",\"BIRTHDATE\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS"
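Note that eval does not substitute \$CONDITIONS, so use a literal predicate there. To see the case folding itself, here is a hypothetical pair of probes (same connection details as above; ROWNUM just limits the result to one row). The unquoted name folds to BIRTHDATE, while the quoted mixed-case name is looked up case-exactly, so exactly one of the two should raise ORA-00904 depending on how the column was created:
# Unquoted identifier: Oracle folds it to upper case (BIRTHDATE).
sqoop eval --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --username GA_TESTER1 --password 123456 --query "SELECT Birthdate FROM GA_TESTER1.gmy_table1 WHERE ROWNUM <= 1"
# Quoted identifier: case-exact lookup of "Birthdate".
sqoop eval --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --username GA_TESTER1 --password 123456 --query "SELECT \"Birthdate\" FROM GA_TESTER1.gmy_table1 WHERE ROWNUM <= 1"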
As per the Sqoop user guide:
" While the Hadoop generic arguments must precede any import arguments, you can type the import arguments in any order with respect to one another."
Please check your argument sequence.
Please try the below:
sqoop job --create vjbkeufwekdfas \
-- import \
--connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL \
--username GA_TESTER1 \
--password 123456 \
--query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" \
--target-dir /tmp/vjbkeufwekdfas \
--split-by "Birthdate" \
--check-column "Birthdate" \
--incremental lastmodified \
--hive-import \
--hive-database chinacloud \
--hive-table hive_vjbkeufwekdfas \
--class-name vjbkeufwekdfas \
--fields-terminated-by '^X' \
--hive-drop-import-delims \
--null-non-string '' \
--null-string '' \
--m 1
Use:
--append \
--last-value "0001-01-01 01:01:01" \

Got exception running Sqoop: java.lang.NullPointerException using --query and --as-parquetfile

I am trying to import table data from Redshift to HDFS (in Parquet format) and I am facing the error shown below:
15/06/25 11:05:42 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:97)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Command line used:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--query "SELECT * FROM mobile_og.pages WHERE \$CONDITIONS" \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile
It works fine when the --as-parquetfile option is removed from the above command.
This is confirmed to be a bug: SQOOP-2571.
If you want to import all the data of a table, you can run a command like this instead:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile
And --where is also a useful parameter; check the user manual.
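For instance, a sketch combining --table with --where (the filter column received_at is hypothetical, and the connect string keeps the host elided as in the question):
# --table plus --where behaves like a simple --query, while avoiding
# the code-generation path that triggers the NullPointerException.
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--where "received_at >= '2015-06-01'" \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile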
