I have multiple files in HDFS that I am trying to load into Oracle. I have been using this script for a long time, but recently the same process started giving me an error.
Here is the code:
#!/bin/bash
# Load the partition from two days ago into Oracle via sqoop export.
refDate=`date --date="2 days ago" "+%Y-%m-%d"`

func_data_load_to_dwh()
{
    date
    if [[ ! -z `hadoop fs -ls /data/dna_data/dna_daily_summary_data/pdate=${refDate}/*` ]]
    then
        # Pick the data file(s) out of the partition listing.
        file=`hadoop fs -ls /data/dna_data/dna_daily_summary_data/pdate=${refDate}/|grep dna_daily_summary_data |awk '{print $NF}'`
        sqoop export --connect jdbc:oracle:thin:#//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir ${file} --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
    fi
}
#_________________ main function ________________
func_data_load_to_dwh
Error output:
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: /data/dna_data/dna_daily_summary_data/pdate=2020-01-12/f14873b51555a17d-946772b200000029_1797907225_data.0.
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --input-fields-terminated-by
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: |
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --lines-terminated-by
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: \n
20/01/14 11:05:12 ERROR tool.BaseSqoopTool: Unrecognized argument: --direct
But if I run it manually from the shell with a single file, it runs perfectly:
sqoop export --connect jdbc:oracle:thin:#//raxdw-scan:1628/raxdw --username schema_name --password 'password' --table TBL_DNA_DATA_DAILY_SUMMARY --export-dir /data/dna_data/dna_daily_summary_data/pdate=2020-01-12/784784ec62558fea-1d14102000000029_1144698661_data.0. --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
It is a bit strange because it worked before. All I did was change the directory in the if condition, nothing else. Could you suggest anything?
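A likely cause, given that the first unrecognized argument is itself a file path: the partition now contains more than one data file, so the unquoted ${file} expands to several paths and everything after the first one reaches Sqoop as stray arguments. A minimal sketch of one way around that, assuming sqoop export accepts the partition directory itself as --export-dir (exportDir is a new helper variable; the paths are reused from the question):

# Sketch only: point --export-dir at the partition directory so the number of
# data files inside it no longer matters.
exportDir=/data/dna_data/dna_daily_summary_data/pdate=${refDate}
if [[ ! -z `hadoop fs -ls ${exportDir}/*` ]]
then
    sqoop export --connect jdbc:oracle:thin:#//raxdw-scan:1628/raxdw \
        --username schema_name --password 'password' \
        --table TBL_DNA_DATA_DAILY_SUMMARY \
        --export-dir ${exportDir} \
        --input-fields-terminated-by '|' --lines-terminated-by '\n' --direct
fi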
Related
I am trying to create a Sqoop job with incremental lastmodified, but it throws ERROR tool.BaseSqoopTool: Unrecognized argument: --merge-key. Sqoop job:
sqoop job --create ImportConsentTransaction -- import --connect jdbc:mysql://quickstart:3306/production --table ConsentTransaction --fields-terminated-by '\b' --incremental lastmodified --merge-key transactionId --check-column updateTime --target-dir '/user/hadoop/ConsentTransaction' --last-value 0 --password-file /user/hadoop/sqoop.password --username user -m 1
Add --incremental lastmodified.
I want to import the latest updates on my AS400 table with a Sqoop incremental import. I'm sure about all the variables; to_porcess_ts is a string timestamp (yyyymmddhhmmss). This is my Sqoop command:
sqoop import --verbose --driver $SRC_DRIVER_CLASS --connect $SRC_URL --username $SRC_LOGIN --password $SRC_PASSWORD \
--table $SRC_TABLE --hive-import --hive-table $SRC_TABLE_HIVE --target-dir $DST_HDFS \
--hive-partition-key "to_porcess_ts" --hive-partition-value $current_date --split-by $DST_SPLIT_COLUMN --num-mappers 1 \
--boundary-query "$DST_QUERY_BOUNDARY" \
--incremental-append --check-column "to_porcess_ts" --last-value $(hive -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE"); \
--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec
I got this error:
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --incremental-append
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --check-column
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: to_porcess_ts
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --last-value
18/05/14 16:14:18 ERROR tool.BaseSqoopTool: Unrecognized argument: -46039745595
Remove the ;, or swap its position with the ), at the end of this line:
--last-value $(hive -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE";) \
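For reference, here is a sketch of the same command with the Hive result captured into a shell variable first (LAST_VALUE is a new name, not from the original post; note also that the incremental flag is spelled --incremental append, which is the form Sqoop recognizes):

# Sketch only: precompute the value so no stray ';' can cut the sqoop call short.
# -S keeps Hive's log output out of the captured value.
LAST_VALUE=$(hive -S -e "select max(unix_timestamp(to_porcess_ts, 'ddmmyyyyhhmmss')) from $SRC_TABLE_HIVE")

sqoop import --verbose --driver $SRC_DRIVER_CLASS --connect $SRC_URL --username $SRC_LOGIN --password $SRC_PASSWORD \
    --table $SRC_TABLE --hive-import --hive-table $SRC_TABLE_HIVE --target-dir $DST_HDFS \
    --hive-partition-key "to_porcess_ts" --hive-partition-value $current_date --split-by $DST_SPLIT_COLUMN --num-mappers 1 \
    --boundary-query "$DST_QUERY_BOUNDARY" \
    --incremental append --check-column "to_porcess_ts" --last-value "$LAST_VALUE" \
    --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec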
I need some help with Sqoop.
First of all, I'm sorry, my English isn't very good.
I am using the following command:
sqoop import -D mapreduce.output.fileoutputformat.compress=false --num-mappers 1 --connection-manager "com.quest.oraoop.OraOopConnManager" --connect "jdbc:Oracle:thin:#(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=1534)))(CONNECT_DATA=(SERVICE_NAME=myservice)))" --username "rodrigo" --password pwd \
--query "SELECT column1, column2 from myTable where \$CONDITIONS" \
--null-string '' --null-non-string '' --fields-terminated-by '|' \
--lines-terminated-by '\n' --as-textfile --target-dir /data/rodrigo/myTable \
--hive-import --hive-partition-key yearmonthday --hive-partition-value '20180101' --hive-overwrite --verbose -P --m 1 --hive-table myTable
My table is already created, because I have to file a request to have tables created in my Hive database, so I can't create it dynamically inside the Sqoop command.
I have permission to create the directory in HDFS.
When I remove the directory, Sqoop logs an error saying that I have no create-table permission, and when I create the directory beforehand, it returns a FileAlreadyExistsException.
What can I do to solve that?
Thanks from Brazil.
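One thing worth trying for the FileAlreadyExistsException side of the problem (a hedged suggestion, assuming your Sqoop version supports the flag): let Sqoop remove the leftover staging directory itself with --delete-target-dir, so you neither pre-create it nor leave it behind from a failed run:

# Sketch only: same command as above, with --delete-target-dir added so the
# HDFS staging directory is dropped before the import starts.
sqoop import -D mapreduce.output.fileoutputformat.compress=false --num-mappers 1 \
    --connection-manager "com.quest.oraoop.OraOopConnManager" \
    --connect "jdbc:Oracle:thin:#(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=1534)))(CONNECT_DATA=(SERVICE_NAME=myservice)))" \
    --username "rodrigo" --password pwd \
    --query "SELECT column1, column2 from myTable where \$CONDITIONS" \
    --null-string '' --null-non-string '' --fields-terminated-by '|' \
    --lines-terminated-by '\n' --as-textfile \
    --target-dir /data/rodrigo/myTable --delete-target-dir \
    --hive-import --hive-partition-key yearmonthday --hive-partition-value '20180101' \
    --hive-overwrite --verbose -P --m 1 --hive-table myTable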
I'm working on a Sqoop incremental import command, but I'm getting an error message at the end and I can't understand where the problem is.
Below is my MySQL table data
+----+-----------+
| ID | NAME |
+----+-----------+
| 1 | Sidhartha |
| 2 | Sunny |
| 3 | Saketh |
| 4 | Bobby |
| 5 | Yash |
| 6 | Nimmi |
+----+-----------+
Hive table with 4 records (DAY is the partition column):
importedtable.id importedtable.name importedtable.day
1 Sidhartha 1
2 Sunny 1
3 Saketh 1
4 Bobby 1
My Sqoop command:
sqoop import --connect jdbc:mysql://127.0.0.1/mydb --table MYTAB --driver com.mysql.jdbc.Driver --username root --password cloudera --hive-import --hive-table importedtable --incremental append --check-column id --last-value $(hive -e "select max(id) from importedtable") --target-dir '/home/incdata';
Error message:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: WARN:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: The
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: method
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: class
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: org.apache.commons.logging.impl.SLF4JLogFactory#release()
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: was
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: invoked.
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: WARN:
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: Please
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: see
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: http://www.slf4j.org/codes.html#release
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: for
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: an
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: explanation.
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: --target-dir
17/03/08 12:15:14 ERROR tool.BaseSqoopTool: Unrecognized argument: /home/incdata
Can anyone tell me what mistake I'm making in the Sqoop command?
The problem is with the Hive query passed as the value for the --last-value argument:
--last-value $(hive -e "select max(id) from importedtable")
This emits the log messages along with the result, so the log text ends up in --last-value as well.
Use the -S (--silent) flag in the query:
--last-value $(hive -S -e "select max(id) from importedtable")
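If the parse error persists, a quick way to see exactly what reaches Sqoop is to capture the value into a shell variable first and echo it (LAST_ID is a new name, not from the original command):

# Sketch: capture the silent Hive output, inspect it, then pass it on.
LAST_ID=$(hive -S -e "select max(id) from importedtable")
echo "last value: ${LAST_ID}"

sqoop import --connect jdbc:mysql://127.0.0.1/mydb --table MYTAB --driver com.mysql.jdbc.Driver \
    --username root --password cloudera --hive-import --hive-table importedtable \
    --incremental append --check-column id --last-value "${LAST_ID}" --target-dir '/home/incdata'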
I am trying to import table data from Redshift into HDFS (in Parquet format) and am facing the error shown below:
15/06/25 11:05:42 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:97)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Command used:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
    --connect "jdbc:postgresql://:5439/events" \
    --username "username" --password "password" \
    --query "SELECT * FROM mobile_og.pages WHERE \$CONDITIONS" \
    --split-by anonymous_id \
    --target-dir /user/huser/pq_mobile_og_pages_2 \
    --as-parquetfile
It works fine when the --as-parquetfile option is removed from the above command.
This is confirmed to be a bug: SQOOP-2571.
If you want to import all the data from a table, you can instead run a command like this:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile
The --where parameter is also useful; check the user manual.
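For example, a sketch with a made-up column name (received_at) and target directory; substitute real ones from mobile_og.pages:

# Hypothetical --where example: import only a slice of the table.
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
    --connect "jdbc:postgresql://:5439/events" \
    --username "username" --password "password" \
    --table mobile_og.pages \
    --where "received_at >= '2015-06-01'" \
    --split-by anonymous_id \
    --target-dir /user/huser/pq_mobile_og_pages_filtered \
    --as-parquetfile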