Issue: Exporting a table from Hadoop to MySQL

This is my Sqoop script (options file) for exporting a table from Hadoop to MySQL:
export
## Database details
--connect
jdbc:mysql://mktgcituspoc1.cisco.com:3306/poc
--username
pocuser
--password
pocuser
## Table to export to
--table
mktg_site_pub
--export-dir
##/app/MarketingIT/warehouse/mktg_mbd.db/performance_tst
/app/dev/MarketingIt/warehouse/hddvmktg/mktg_mbd.db/mktg_site_pub
--input-fields-terminated-by
'|'
--input-null-string
'\\N'
--input-null-non-string
'\\N'
-m
I'm getting the following error after running the above Sqoop script:
15/02/19 02:16:38 INFO mapred.JobClient: Task Id : attempt_201502172305_2648_m_000016_1, Status : FAILED on node hdnprd-c01-r01-05
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
at org.apache.hadoop.mapred.Child.main(Child.java:271)
Caused by: java.lang.IllegalArgumentException
at java.sql.Date.valueOf(Date.java:140)
at mktg_site_pub.__loadFromFields(mktg_site_pub.java:3622)
at mktg_site_pub.parse(mktg_site_pub.java:3549)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Can anyone please help me figure out what the exact issue is?

It is highly likely that you have a date-typed column in the MySQL table into which you are trying to export a non-date value.
Check which columns in the MySQL table have a date type, then verify that the corresponding values in your Hadoop data set can be parsed by java.sql.Date.valueOf(), which only accepts the JDBC date escape format (yyyy-[m]m-[d]d) and throws IllegalArgumentException for anything else.
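As a quick check, here is a minimal sketch that scans the export directory for values Date.valueOf() would reject; the column position ($3) and the part-* file pattern are assumptions, so adjust them to your data layout:
# Pull the (assumed) date column and list values that Date.valueOf() would reject:
# drop the declared null token (\N), then drop anything already in yyyy-[m]m-[d]d form.
hadoop fs -cat /app/dev/MarketingIt/warehouse/hddvmktg/mktg_mbd.db/mktg_site_pub/part-* \
  | awk -F'|' '{print $3}' \
  | grep -v '^\\N$' \
  | grep -vE '^[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$' \
  | head
Anything this prints is a candidate for the IllegalArgumentException in the task logs.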

Related

Table from Sqoop 1.4.7 imports into HDFS fine, but fails when importing into Hive 3.1.1

sqoop import --connect jdbc:mysql://localhost:3306/sqoopdb --username dsa -P \
  --split-by id --columns id,name --table employee --target-dir /test1 \
  --fields-terminated-by "," --hive-import --create-hive-table \
  --hive-table employee_sqoop
The table imports from Sqoop 1.4.7 into HDFS fine, but while importing into Hive 3.1.1 I am getting:
ERROR [main] tool.ImportTool: Import failed: java.io.IOException: Hive CliDriver exited with status=1
This is a pseudo-distributed Hadoop 3.1.1 cluster with HBase, Sqoop, and Hive, all at the latest versions.
I copied the libthrift*.jar file from hive/lib into the sqoop/lib directory,
I also set HBASE_HOME to a non-existing path,
and I copied jackson-annotations-2.9.5.jar, jackson-core-2.9.5.jar, and jackson-databind-2.9.5.jar into the sqoop/lib folder.
ERROR [main] tool.ImportTool: Import failed: java.io.IOException: Hive CliDriver exited with status=1
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:355)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

Sqoop import as ORC ERROR java.io.IOException: HCat exited with status 1

I am trying to import a table from a Netezza DB in ORC format using Sqoop with HCatalog (see below), as suggested here.
Sqoop command:
sqoop import \
  -m 1 \
  --connect <jdbc_url> \
  --driver <database_driver> \
  --connection-manager org.apache.sqoop.manager.GenericJdbcManager \
  --username <db_username> \
  --password <db_password> \
  --table <table_name> \
  --hcatalog-home /usr/hdp/current/hive-webhcat \
  --hcatalog-database <hcat_db> \
  --hcatalog-table <table_name> \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")';
However, it failed with the following exception. After spending a few hours on it, I have no clue as to why it is failing. Any help/lead is much appreciated.
16/04/21 19:51:22 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: HCat exited with status 1
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.executeExternalHCatProgram(SqoopHCatUtilities.java:1148)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.launchHCatCli(SqoopHCatUtilities.java:1097)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.createHCatTable(SqoopHCatUtilities.java:644)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:340)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:802)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
UPDATE:
I can see that an empty table was created, even though the source table has 200k records.
Any suggestions to fix this issue?

Exporting the table from hive to Oracle server

Importing the table from the Oracle DB to Hive works fine, but when I export the data from Hive back to the DB I get an error. I am performing the following operation:
sqoop export
--connect jdbc:oracle:thin:#oragmp01prscan.enterprisenet.org:1521/GMP1PROD_RLB
--username GMPAPMETA
--password welcome123
--table activity/
--export-dir hdfs://user/hive/warehouse/activity/
ERROR:
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
15/11/05 01:18:28 ERROR tool.ExportTool:
Encountered IOException running export job:
java.io.IOException: No columns to generate for ClassWriter

Sqoop import --as-parquetfile with CDH5

I'm trying to import data directly from MySQL to Parquet, but it doesn't seem to work correctly...
I'm using CDH 5.3, which includes Sqoop 1.4.5.
Here is my command line:
sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database --username username --password mypass --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id --hive-import --hive-table default.pages_users3 --target-dir hive_pages_users --as-parquetfile
Then I get this error:
Warning: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/01/09 14:31:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.0
15/01/09 14:31:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/09 14:31:49 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/01/09 14:31:49 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/01/09 14:31:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/01/09 14:31:49 INFO tool.CodeGenTool: Beginning code generation
15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/b90e7b492f5b66554f2cca3f88ef7a61/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/01/09 14:31:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b90e7b492f5b66554f2cca3f88ef7a61/QueryResult.jar
15/01/09 14:31:51 INFO mapreduce.ImportJobBase: Beginning query import.
15/01/09 14:31:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/01/09 14:31:51 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:51 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:51 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
15/01/09 14:31:51 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=default.pages_users3
org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=default.pages_users3
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
at org.kitesdk.data.Datasets.create(Datasets.java:189)
at org.kitesdk.data.Datasets.create(Datasets.java:240)
at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:81)
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:70)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:112)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:262)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:721)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I have no problem importing data into Hive with the default file format, but Parquet is a problem... Do you have any idea why this occurs?
Thank you :)
Please do not use <db>.<table> with --hive-table. This doesn't work well with Parquet imports: Sqoop uses the Kite SDK to write Parquet files, and Kite doesn't like the <db>.<table> format.
Instead, please use --hive-database and --hive-table separately. For your command, it should be:
sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database \
--username username --password mypass \
--query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id \
--hive-import --hive-database default --hive-table pages_users3 \
--target-dir hive_pages_users --as-parquetfile
Here's my pipeline in CDH 5.5 to import from a JDBC source into Hive Parquet files.
The JDBC data source is Oracle, but the explanation below fits MySQL too.
1) Sqoop:
$ sqoop import --connect "jdbc:oracle:thin:#(complete TNS descriptor)" \
--username MRT_OWNER -P \
--compress --compression-codec snappy \
--as-parquetfile \
--table TIME_DIM \
--warehouse-dir /user/hive/warehouse \
--num-mappers 1
I chose --num-mappers 1 because the TIME_DIM table had only around 20k rows, and it's not advisable to split a Parquet table into multiple files for such a small dataset. Each mapper creates a separate output (Parquet) file.
(ps. For Oracle users: I had to connect as the owner of the source table; otherwise I had to specify "MRT_OWNER.TIME_DIM" and got the error org.kitesdk.data.ValidationException: Namespace MRT_OWNER.TIME_DIM is not alphanumeric (plus '_'), which seems to be a Sqoop bug.)
(ps2. The table name had to be all-uppercase. Not sure if this is Oracle-specific (it shouldn't be) or if this is another Sqoop bug.)
(ps3. The --compress --compression-codec snappy parameters were recognized but did not seem to have any effect.)
2) The above command creates a directory named
/user/hive/warehouse/TIME_DIM
It's a wise idea to move it into a specific Hive database directory, e.g.:
$ hadoop fs -mv /user/hive/warehouse/TIME_DIM /user/hive/warehouse/dwh.db/time_dim
assuming the name of the Hive database/schema is "dwh".
3) Create the Hive table, taking the schema directly from a Parquet file:
$ hadoop fs -ls /user/hive/warehouse/dwh.db/time_dim | grep parquet
-rwxrwx--x+ 3 hive hive 1216 2016-02-04 23:56 /user/hive/warehouse/dwh.db/time_dim/62679a1c-b848-426a-bb8e-9372328ddad7.parquet
If the above command returns more than one Parquet file (meaning you had more than one mapper via the --num-mappers parameter), you can pick any one of them for the command below.
This command should be run in Impala, not in Hive. Hive currently can't infer a schema from Parquet files, but Impala can:
[impala-shell] > CREATE TABLE dwh.time_dim
LIKE PARQUET '/user/hive/warehouse/dwh.db/time_dim/62679a1c-b848-426a-bb8e-9372328ddad7.parquet'
COMMENT 'sqooped from MRT_OWNER.TIME_DIM'
STORED AS PARQUET
LOCATION 'hdfs:///user/hive/warehouse/dwh.db/time_dim'
;
ps. It's also possible to inspect the schema of the Parquet data using Spark, e.g.
spark.read.parquet('hdfs:///user/hive/warehouse/dwh.db/time_dim').printSchema()
4) Since the table wasn't created through Hive (which collects stats automatically), it's a good idea to collect stats manually:
[impala-shell] > compute stats dwh.time_dim;
Per the Sqoop user guide (https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_import_literal), --as-parquetfile was added in Sqoop 1.4.6 (CDH 5.5).
(Regarding the earlier note about org.kitesdk.data.ValidationException: Namespace MRT_OWNER.TIME_DIM is not alphanumeric (plus '_'):)
This can be fixed if the database name and table name are written as db_name/table_name instead of db_name.table_name.
It seems like database support is missing in your distribution; it looks like it was added rather recently. Try setting --hive-table to just pages_users3 and removing --target-dir, as sketched below.
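A minimal sketch of the original command with those two changes applied (all other options kept as in the question):
sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database --username username --password mypass \
  --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id \
  --hive-import --hive-table pages_users3 --as-parquetfile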
If the above doesn't work, try:
This blog post.
The docs.
Check with the user@sqoop.apache.org mailing list.
I found a solution: I dropped all the Hive parts and used the target dir to store the data... Seems to work:
sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database --username username --password mypass --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id --target-dir /home/cloudera/user/hive/warehouse/soprism.db/pages_users3 --as-parquetfile -m 1
I then link to the directory by creating an external table in Impala, along the lines of the sketch below...
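A minimal sketch of that external table in Impala, assuming the database is named soprism (from the target dir above) and picking one of the Parquet files actually present in the target directory for schema inference (the file name below is a placeholder):
impala-shell -q "
  CREATE EXTERNAL TABLE soprism.pages_users3
  LIKE PARQUET '/home/cloudera/user/hive/warehouse/soprism.db/pages_users3/<one-of-the-generated-files>.parquet'
  STORED AS PARQUET
  LOCATION '/home/cloudera/user/hive/warehouse/soprism.db/pages_users3'"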

SQOOP Not able to import table

I am running the below command with Sqoop:
sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets
My version of Sqoop: Sqoop 1.4.4.2.0.6.1-101
Hadoop: Hadoop 2.2.0.2.0.6.0-101
Both are taken from the Hortonworks distribution. All the paths like HADOOP_HOME, HCAT_HOME, and SQOOP_HOME are set properly. I can get the list of databases and the list of tables from the MySQL database by running the list-databases and list-tables commands in Sqoop. I can even get data with --query 'select * from widgets', but when I use the --table option I get the error below.
14/02/06 14:02:17 WARN mapred.LocalJobRunner: job_local177721176_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:394)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:233)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:236)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:491)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:734)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 13 more
Specify --bindir, which controls where the compiled code and the .jar file are placed.
Without this argument, Sqoop puts the generated Java source file in your current working directory but the compiled .class file and .jar file in /tmp/sqoop-<username>/compile.
Use the --bindir option and point it to your current working directory:
sqoop import --bindir ./ --connect jdbc:mysql://localhost/hadoopguide --table widgets
The problem was resolved after I copied the .class file from /tmp/sqoop-hduser/compile/ to HDFS /home/hduser/ and also to the current working directory from which I am running Sqoop.
To import a specific table into HDFS, run:
sqoop import --connect jdbc:mysql://localhost/databasename --username root --password *** --table tablename --bindir /usr/lib/sqoop/lib/ --driver com.mysql.jdbc.Driver --target-dir /directory-name
Make sure that /usr/lib/sqoop/* and /usr/local/hadoop/* are owned by the same user, otherwise you will get an error like "Permission denied".
PS: Make sure that you have installed the MySQL Java connector (mysql-connector-java) before you run the command. I installed Hadoop version 2.7.3 and connector 5.0.8.
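A quick way to check and, if needed, align ownership (a minimal sketch; the user and group names here are assumptions for your setup):
# Show who owns each tree
ls -ld /usr/lib/sqoop /usr/local/hadoop
# If they differ, give both to the user that runs Sqoop (user/group assumed to be hduser:hadoop)
sudo chown -R hduser:hadoop /usr/lib/sqoop /usr/local/hadoop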
Another fix for the ClassNotFoundException is to tell Hadoop to use the user classpath first (-Dmapreduce.job.user.classpath.first=true). This can be passed on the command line or in an options file. The top of an import options file would be:
#Options file for Sqoop import
import
-Dmapreduce.job.user.classpath.first=true
This fixed the ClassNotFoundException for me when trying to import data with --as-avrodatafile.
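The command-line form is a minimal sketch along these lines, reusing the connection details from the question above (note that the -D generic option must come right after the tool name, before the Sqoop-specific arguments):
sqoop import -Dmapreduce.job.user.classpath.first=true \
  --connect jdbc:mysql://localhost/hadoopguide --table widgets \
  --as-avrodatafile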
