Im working on a sqoop incremental import command. But Im getting error message in the end which I couldn't understand where is the problem.
Below is my MySQL table data
+----+-----------+
| ID | NAME |
+----+-----------+
| 1 | Sidhartha |
| 2 | Sunny |
| 3 | Saketh |
| 4 | Bobby |
| 5 | Yash |
| 6 | Nimmi |
+----+-----------+
Hive table with 4 records: DAY is the partitioned column
importedtable.id importedtable.name importedtable.day
1 Sidhartha 1
2 Sunny 1
3 Saketh 1
4 Bobby 1
My Sqoop command:
sqoop import --connect jdbc:mysql://127.0.0.1/mydb --table MYTAB --driver com.mysql.jdbc.Driver --username root --password cloudera --hive-import --hive-table importedtable --incremental append --check-column id --last-value $(hive -e "select max(id) from importedtable") --target-dir '/home/incdata';
Error message:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: WARN:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: The
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: method
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: class
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: org.apache.commons.logging.impl.SLF4JLogFactory#release()
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: was
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: invoked.
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: WARN:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: Please
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: see
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: http://www.slf4j.org/codes.html#release
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: for
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: an
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: explanation.
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: --target-dir
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: /home/incdata
Can anyone tell me what is the mistake Im doing the sqoop command.
The problem is with the hive query passed as the value for --last-value argument,
--last-value $(hive -e "select max(id) from importedtable")
This emits the log messages along with the result to --last-value.
Use -S (--silent) flag to the query,
--last-value $(hive -S -e "select max(id) from importedtable")
Related
I've multiple file in HDFS. Trying to load to Oracle. Using this script from long time. But recently same process giving me error.
Here is the code:
#!/bin/bash
refDate=`date --date="2 days ago" "+%Y-%m-%d"`
func_data_load_to_dwh()
{
date
if [[ ! -z `hadoop fs -ls /data/dna_data/dna_daily_summary_data/pdate=${refDate}/*` ]]
then
file=`hadoop fs -ls /data/dna_data/dna_daily_summary_data/pdate=${refDate}/|grep dna_daily_summary_data |awk '{print $NF}'`
sqoop export --connect jdbc:oracle:thin:#//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir ${file} --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
}
#_________________ main function ________________
func_data_load_to_dwh
Error Code:
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: /data/dna_data/dna_daily_summary_data/pdate=2020-01-12/f14873b51555a17d-946772b200000029_1797907225_data.0.
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --input-fields-terminated-by
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: |
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --lines-terminated-by
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: \n
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --direct
But If I run manually with single file from shell it runs perfectly:
sqoop export --connect jdbc:oracle:thin:#//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir /data/dna_data/dna_daily_summary_data/pdate=2020-01-12/784784ec62558fea-1d14102000000029_1144698661_data.0. --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
Bit strange because it worked before. All I did is to change the directory in if condition. Nothing else. Could you suggesnt anything
I am trying to create a Sqoop job with incremental lastmodified, but it throws ERROR tool.BaseSqoopTool: Unrecognized argument: --merge-key. Sqoop job:
sqoop job --create ImportConsentTransaction -- import --connect jdbc:mysql://quickstart:3306/production --table ConsentTransaction --fields-terminated-by '\b' --incremental lastmodified --merge-key transactionId --check-column updateTime --target-dir '/user/hadoop/ConsentTransaction' --last-value 0 --password-file /user/hadoop/sqoop.password --username user -m 1
Add --incremental lastmodified.
i want to import last updates on my as400 table with sqoop import incremantal, this is my sqoop command:
i'm sure about allvariables, to_porcess_ts it's a string timestamp (yyyymmddhhmmss)
sqoop import --verbose --driver $SRC_DRIVER_CLASS --connect $SRC_URL --username $SRC_LOGIN --password $SRC_PASSWORD \
--table $SRC_TABLE --hive-import --hive-table $SRC_TABLE_HIVE --target-dir $DST_HDFS \
--hive-partition-key "to_porcess_ts" --hive-partition-value $current_date --split-by $DST_SPLIT_COLUMN --num-mappers 1 \
--boundary-query "$DST_QUERY_BOUNDARY" \
--incremental-append --check-column "to_porcess_ts" --last-value $(hive -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE"); \
--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec
I got this error:
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --incremental-append
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --check-column
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: to_porcess_ts
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --last-value
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: -46039745595
Remove ; or replace it on ) from the end of this line:
--last-value $(hive -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE";) \
I am trying to migrate the data from oracle to Hbase with sqoop job.It looks like that it has successfully exported but it throws an error while importing the same in Hbase.
Job1:
`sqoop import --verbose --connect *** --username *** --password *** --table 'abc' --columns "MID,EID,RTIMESTAMP,VALUE,UTIMESTAMP" --split-by 'abc.ID' --hbase-table "HPVSQOOP" --column-family "cf1" --hbase-row-key MID,EID,RTIMESTAMP --num-mappers 4 --hbase-bulkload
where ID is the primary key in Oracle but I want my HBase row-key to be as MID_EID_RTIMESTAMP
Map-Reduced failed by throwing an Error:
INFO mapreduce.Job: Task Id : attempt_1492489711789_0014_m_000003_2,
Status : FAILED Error: java.io.IOException: Could not insert row with
null value for row-key column: MID,EID,RTIMESTAMP at
org.apache.sqoop.hbase.ToStringPutTransformer.getPutCommand(ToStringPutTransformer.java:146)
at
org.apache.sqoop.mapreduce.HBaseBulkImportMapper.map(HBaseBulkImportMapper.java:83)
Another Job with --query is not working with Hbase import.
Job2:
sqoop import --verbose --connect *** --username *** --password **' --query "select MID,EID,VALUE,RTIMESTAMP,UTIMESTAMP,ID from database.abc where \$CONDITIONS" --split-by 'abc.ID' --hbase-table "HPVSQOOP" --column-family "cf1" --hbase-row-key "MID,EID,RTIMESTAMP" --num-mappers 4 --hbase-bulkload
ended up throwing an error:
ERROR sqoop.Sqoop: Got exception running Sqoop:
java.lang.NullPointerException java.lang.NullPointerException
If the column name is lowercase in your database, you should use lowercase names in the command line like this --hbase-row-key "mid,eid..."
I am using the sqoop 1.4.4 to transfer data from oracle to hive, using sentence:
sqoop job --create vjbkeufwekdfas -- import --split-by "Birthdate"
--check-column "Birthdate" --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL --query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
it doesn't work, cause the sqoop use checking strategy in the tool.ImportTool
16/10/08 14:58:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f068e5d884f3929b6415cd8318085fea/vjbkeufwekdfas.jar
16/10/08 14:58:03 INFO manager.SqlManager: Executing SQL statement: SELECT "NAME","ID","Birthdate" FROM GA_TESTER1.gmy_table1 where (1 = 0)
16/10/08 14:58:03 ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
16/10/08 14:58:03 ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
16/10/08 14:58:03 ERROR util.SqlTypeMap: correct column names in Sqoop options.
16/10/08 14:58:03 ERROR tool.ImportTool: Imported Failed: column not found: "Birthdate"
However, if I don't use the double quotation marks:
sqoop job --create vjbkeufwekdfas -- import --split-by Birthdate
--check-column Birthdate --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL --query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --
null-string ''
The oracle gonna throw error, because the field has some lower-case letters:
16/10/08 14:37:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/1f05a7e94340dd92c9e3b11db1a5db46/vjbkeufwekdfas.jar
16/10/08 14:37:16 INFO manager.SqlManager: Executing SQL statement: SELECT "NAME","ID","Birthdate" FROM GA_TESTER1.gmy_table1 where (1 = 0)
16/10/08 14:37:16 INFO tool.ImportTool: Incremental import based on column Birthdate
16/10/08 14:37:16 INFO tool.ImportTool: Upper bound value: TO_DATE('2016-10-08 14:37:20', 'YYYY-MM-DD HH24:MI:SS')
16/10/08 14:37:16 INFO mapreduce.ImportJobBase: Beginning query import.
16/10/08 14:37:16 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/08 14:37:17 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/08 14:37:17 INFO client.RMProxy: Connecting to ResourceManager at master.huacloud.test/172.16.50.21:8032
16/10/08 14:37:25 INFO mapreduce.JobSubmitter: number of splits:1
16/10/08 14:37:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474871676561_71208
16/10/08 14:37:25 INFO impl.YarnClientImpl: Submitted application application_1474871676561_71208
16/10/08 14:37:26 INFO mapreduce.Job: The url to track the job: http://master.huacloud.test:8088/proxy/application_1474871676561_71208/
16/10/08 14:37:26 INFO mapreduce.Job: Running job: job_1474871676561_71208
16/10/08 14:37:33 INFO mapreduce.Job: Job job_1474871676561_71208 running in uber mode : false
16/10/08 14:37:33 INFO mapreduce.Job: map 0% reduce 0%
16/10/08 14:37:38 INFO mapreduce.Job: Task Id : attempt_1474871676561_71208_m_000000_0, Status : FAILED
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:266)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "BIRTHDATE": invalid identifier
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:439)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:395)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:802)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:436)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:205)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:861)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1145)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3449)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3493)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:237)
... 12 more
I am so confused, any help?
I think, It could be a case-sensitivity problem. In general, tables and columns are not case sensitive, but they will be if you use with quotation marks.
Try the following,
sqoop job --create vjbkeufwekdfas -- import --split-by Birthdate
--check-column Birthdate --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL --query "SELECT \”NAME\”,\”ID\”,\”BIRTHDATE\” FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --
If it’s still not working, Try the ‘eval’ first and make sure the query is working fine
sqoop eval --connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL --query "SELECT \”NAME\”,\”ID\”,\”BIRTHDATE\” FROM GA_TESTER1.gmy_table1 where \$CONDITIONS"
As per sqoop user guide
" While the Hadoop generic arguments must precede any import arguments, you can type the import arguments in any order with respect to one another."
Please check your argument sequence.
Please try below..
sqoop job --create vjbkeufwekdfas \
-- import \
--connect jdbc:oracle:thin:#172.16.50.12:1521:ORCL \
--username GA_TESTER1 \
--password 123456 \
--query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" \
--target-dir /tmp/vjbkeufwekdfas \
--split-by "Birthdate" \
--check-column "Birthdate" \
--incremental lastmodified \
--hive-import \
--hive-database chinacloud \
--hive-table hive_vjbkeufwekdfas \
--class-name vjbkeufwekdfas \
--fields-terminated-by '^X' \
--hive-drop-import-delims \
--null-non-string '' \
--null-string '' \
--m 1
use
--append \
--last-value "0001-01-01 01:01:01" \