Using a command like:
sqoop export \
--connect jdbc:oracle:thin:'@somehostname.com:1521/prod1_adhoc' \
--username fbaggins \
--P \
--table MIDDLEEARTH \
--hcatalog-database MORDOR \
--hcatalog-table MORDOR \
--columns IS_DWARF,IS_ELF \
--verbose
Results in this error:
16/08/25 10:08:31 INFO hive.metastore: Trying to connect to metastore with URI thrift://somehostname.com:1521
16/08/25 10:08:31 INFO hive.metastore: Connected to metastore.
16/08/25 10:08:31 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@56aac163
16/08/25 10:08:31 ERROR tool.ExportTool: Encountered IOException running export job: java.io.IOException: java.lang.NullPointerException
16/08/25 10:08:31 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@somehostname.com:1521/prod1_adhoc/fbaggins
Not sure where the null is coming from as there are no nulls in the Hive table.
For reference, from hive:
hive> describe MORDOR;
OK
IS_DWARF bigint
IS_ELF string
From Oracle:
describe MORDOR
Name Null Type
----------------------- ---- -----------
IS_DWARF NUMBER(12)
IS_ELF VARCHAR2(3)
Is MORDOR a Hive view and not actually a table?
I have exactly the same problem when the object specified for sqoop --table is a view. The DESCRIBE command just prints the columns of that view, so it does not show whether it is a view or a table. You could run SHOW CREATE TABLE MORDOR to confirm which it is ("show create table" works for a view too, as there is no separate command like "show create view").
The Sqoop documentation explicitly states that sqoop import from a view is supported, but it is silent on sqoop export for a view: it does not say whether that is supported. It might be a Sqoop bug or an HMS (Hive Metastore) bug. Sqoop might not like what HMS returns for a view, so I would not exclude an HMS bug.
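The check suggested above can be scripted. This is only a rough sketch: the function name is_hive_view is made up for illustration, and it assumes that Hive prints a view's DDL starting with CREATE VIEW, which is worth verifying on your Hive version. Canned text stands in for real Hive output.

```shell
# Reads SHOW CREATE TABLE output on stdin; succeeds if it describes a view.
# Assumption: the DDL of a view starts with "CREATE VIEW".
is_hive_view() {
  grep -qiE '^[[:space:]]*CREATE[[:space:]]+VIEW[[:space:]]'
}

# Canned DDL text (not real output) to show how the filter behaves.
if printf '%s\n' 'CREATE VIEW `mordor` AS select 1' | is_hive_view; then
  echo "view"
else
  echo "table"
fi
```

Against a real metastore, the input would come from something like hive -e 'SHOW CREATE TABLE MORDOR' | is_hive_view.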
Look at the logs again: don't you see anything absurd?
Trying to connect to metastore with URI thrift://$HOSTNAME:$PORT
Caching released connection for jdbc:oracle:thin:@$HOST:$PORT/$ALIAS/$SCHEMA
Yes. The bloody environment variables (or shell variables) have not been resolved by the shell. Although I'm not sure what happened to the Metastore URI (it's defined in a Hadoop conf file, no shell/env variables there...)
That's because you have enclosed them in SINGLE QUOTES.
--connect jdbc:oracle:thin:'@$HOST:$PORT/$ALIAS'
So you have a 3-step solution:
1) use double quotes whenever you want the shell to resolve variables inside
2) learn shell scripting
3) learn about pointers, null pointers, and what happens when some buggy code fails to check whether an Object variable has been properly initialized (i.e. NullPointerException in Java)
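To see the quoting difference in isolation, here is a small sketch that needs no Sqoop at all. The values of HOST, PORT and ALIAS are placeholders taken from the question, not real settings.

```shell
# Placeholder values; substitute your own host, port and service alias.
HOST=somehostname.com
PORT=1521
ALIAS=prod1_adhoc

# Single quotes: the shell passes the text through literally, dollar signs included.
single='jdbc:oracle:thin:@$HOST:$PORT/$ALIAS'

# Double quotes: the shell expands the variables before the command sees them.
double="jdbc:oracle:thin:@$HOST:$PORT/$ALIAS"

echo "single-quoted: $single"
echo "double-quoted: $double"
```

Only the double-quoted form yields a connect string containing the actual host, port and alias.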
Related
I am running Sqoop on a CentOS 7 machine that already has Hadoop/MapReduce and Hive installed. I read in a tutorial that when importing data from an RDBMS (SQL Server in my case) to HDFS, I need to run the following command:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA
Everything works perfectly with this step. The next step is creating a Hive table that has the same structure as the RDBMS table (SQL Server in my case), using this sqoop command:
sqoop create-hive-table --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA --hive-table hivetablename --fields-terminated-by ','
However, whenever I run the above command I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
18/04/01 19:37:52 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
18/04/01 19:37:52 INFO ql.Driver: Completed executing command(queryId=hadoop_20180401193745_1f3cf07d-ca16-40dd-8f8d-1e426ecd5860); Time taken: 0.212 seconds
18/04/01 19:37:52 INFO conf.HiveConf: Using the default value passed in for log id: 0813b5c9-f374-4920-b8c6-b8541449a6eb
18/04/01 19:37:52 INFO session.SessionState: Resetting thread name to main
18/04/01 19:37:52 INFO conf.HiveConf: Using the default value passed in for log id: 0813b5c9-f374-4920-b8c6-b8541449a6eb
18/04/01 19:37:52 INFO session.SessionState: Deleted directory: /tmp/hive/hadoop/0813b5c9-f374-4920-b8c6-b8541449a6eb on fs with scheme hdfs
18/04/01 19:37:52 INFO session.SessionState: Deleted directory: /tmp/hive/java/hadoop/0813b5c9-f374-4920-b8c6-b8541449a6eb on fs with scheme file
18/04/01 19:37:52 ERROR tool.CreateHiveTableTool: Encountered IOException running create table job: java.io.IOException: Hive CliDriver exited with status=1
I am not a Java expert, but I would like to know if you have any idea what causes this result.
I've faced the same issue. It seems that there are some compatibility issues between my versions of Sqoop (1.4.7) and Hive (2.3.4).
The problem arises from the versions of the jackson-* jar files within $SQOOP_HOME/lib: some of them are too old for Hive, which needs version 2.6 or newer.
The solution that I found was to replace the following files in $SQOOP_HOME/lib with their counterparts from $HIVE_HOME/lib:
jackson-core-*.jar
jackson-databind-*.jar
jackson-annotations-*.jar
They are all versions 2.6+, and this seems to work. I am not sure it is good practice, though.
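Before swapping any files, it can help to see which Jackson jars Sqoop currently ships. The following is a sketch only: the function name jackson_is_new_enough is hypothetical, the version is parsed from the jar filename (which assumes the usual name-version.jar naming), and the 2.6 threshold comes from the answer above.

```shell
# Succeeds if a jar filename such as jackson-databind-2.6.5.jar is version 2.6 or newer.
jackson_is_new_enough() {
  ver=${1##*-}       # strip everything up to the last '-': 2.6.5.jar
  ver=${ver%.jar}    # drop the extension: 2.6.5
  major=${ver%%.*}
  rest=${ver#*.}
  minor=${rest%%.*}
  [ "$major" -gt 2 ] 2>/dev/null && return 0
  [ "$major" -eq 2 ] 2>/dev/null && [ "$minor" -ge 6 ] 2>/dev/null
}

# Report on the Jackson jars in $SQOOP_HOME/lib; prints nothing if none are found.
for jar in "${SQOOP_HOME:-/nonexistent}"/lib/jackson-core-*.jar \
           "${SQOOP_HOME:-/nonexistent}"/lib/jackson-databind-*.jar \
           "${SQOOP_HOME:-/nonexistent}"/lib/jackson-annotations-*.jar; do
  [ -e "$jar" ] || continue
  name=${jar##*/}
  case $name in *-asl-*) continue ;; esac   # skip the older Codehaus artifacts
  if jackson_is_new_enough "$name"; then
    echo "ok:      $name"
  else
    echo "too old: $name"
  fi
done
```

Any jar reported as too old is a candidate for replacement with its counterpart from $HIVE_HOME/lib; keeping a backup of the originals makes the change easy to undo.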
I was facing the same issue; downgrading Hive to 1.2.2 solved it for me.
I am not sure that helps, though, if you need to use Sqoop with Hive 2 only.
Instead of writing two different statements, you can put the whole thing in one statement, which will fetch the data from SQL Server and then create a Hive table too.
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA --hive-import --hive-overwrite --hive-table hivetablename --fields-terminated-by ',' --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'
For this, check the jackson-core, jackson-databind and jackson-annotations jars. They should be the latest version; this error usually comes from an older version. Place these jars in both the Hive lib and Sqoop lib directories. Also check the libthrift jar: it should be the same version in Hive and HBase, and the same jar should be copied into the Sqoop lib directory.
I am trying to run a sqoop job. I am using Sqoop 1.4.6-cdh5.8.0, and it does not work with this version.
It works fine with Sqoop 1.4.5-cdh5.4.0.
sqoop job --create E8 -- import --connect jdbc:mysql://localhost/test --username root --password cloudera --table NAME --hive-import -m 1
sqoop job --exec E8 -- --table dummy1
Is there a syntax issue? Any help would be appreciated.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/12/23 04:48:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
Enter password:
16/12/23 04:48:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/12/23 04:48:19 INFO tool.CodeGenTool: Beginning code generation
16/12/23 04:48:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `NAME` AS t LIMIT 1
16/12/23 04:48:20 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'test.NAME' doesn't exist
Assuming you already did the basic checks (like manually placing the parameter in the job, and executing it) I would say the syntax looks to be correct.
The documentation mentions that one can override properties. Unfortunately, it only shows an example that adds a property, not one that overrides an existing property.
A search led me to this open issue, which leads me to believe that there is a bug that prevents you from overriding parameters properly.
Unfortunately I don't see a solution for this. Some things that might help to work around the problem:
Parameterize on a different level
Play with the syntax (does it help if it is the first/last override element, what if you try to override AND add a user, what if you try to override the query parameter instead of the table parameter ...)
This seems to be a bug in sqoop-1.4.6-cdh5.8.0 and sqoop-1.4.6-cdh5.9.0
This, however, as you mentioned is working correctly with 1.4.5 version.
The below solution worked for me:
1) Download 'sqoop-1.4.5-cdh5.4.0.jar' from http://repo.spring.io/libs-release/org/apache/sqoop/sqoop/1.4.5-cdh5.4.0/
2) Replace 'sqoop-1.4.6-cdh5.8.0.jar' with 'sqoop-1.4.5-cdh5.4.0.jar' and modify the symbolic link 'sqoop.jar' to point to 'sqoop-1.4.5-cdh5.4.0.jar'
3) I don't generally recommend downgrading, but this works like a charm.
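Step 2 above can be sketched as a small helper. The function name repoint_sqoop_jar is hypothetical, and unlike the steps above this sketch leaves the 1.4.6 jar in place and only moves the sqoop.jar link, which makes it easy to roll back. The demonstration runs in a scratch directory with empty stand-in files rather than a real Sqoop lib directory.

```shell
# Point <lib_dir>/sqoop.jar at <jar_name>, which must already be in <lib_dir>.
repoint_sqoop_jar() {
  lib_dir=$1
  jar_name=$2
  if [ ! -f "$lib_dir/$jar_name" ]; then
    echo "not found: $lib_dir/$jar_name" >&2
    return 1
  fi
  # -s symbolic, -f replace an existing link, -n do not follow the old link
  ln -sfn "$jar_name" "$lib_dir/sqoop.jar"
}

# Dry run in a scratch directory with empty stand-in files.
demo=$(mktemp -d)
touch "$demo/sqoop-1.4.6-cdh5.8.0.jar" "$demo/sqoop-1.4.5-cdh5.4.0.jar"
ln -s sqoop-1.4.6-cdh5.8.0.jar "$demo/sqoop.jar"
repoint_sqoop_jar "$demo" sqoop-1.4.5-cdh5.4.0.jar
ls -l "$demo/sqoop.jar"
```

On a real installation the first argument would be the Sqoop lib directory, and the command would typically need root privileges.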
The sqoop eval command:
sqoop eval --connect 'jdbc:mysql://<connection url>' --driver com.mysql.jdbc.Driver --query "select max(rdate) from test.sqoop_test"
gives me output:
Warning: /usr/hdp/2.3.2.0-2950/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/hdp/2.3.2.0-2950/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/10/05 18:38:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/10/05 18:38:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/05 18:38:17 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/10/05 18:38:17 INFO manager.SqlManager: Using default fetchSize of 1000
--------------
| max(rdate) |
--------------
| 2014-01-25 |
But I want the output without the warnings and table boundaries, like:
max(rdate) 2014-01-25
I basically want to store this output in a file.
Thanks in advance.
You can perform a Sqoop import operation to save the output in HDFS.
The warnings are straightforward:
You can set $ACCUMULO_HOME and $ZOOKEEPER_HOME, if available.
You can set --connection-manager to the one corresponding to MySQL.
For the sake of security, it is recommended to use -P for the password rather than writing it in the command.
These are not errors; you can live with these warnings.
You can create a .sh file, write your sqoop commands into it, then run it as
shell_file_name.sh > your_output_file.txt
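A plain redirect still captures the table borders. One possible way to drop them is a small filter that keeps only the rows starting with "|" and prints their cells. This is a sketch, not part of Sqoop: the function name strip_eval_table is made up, and the sample input is copied from the output shown in the question.

```shell
# Keep only the "| ... |" rows and print their cells, tab-separated.
strip_eval_table() {
  awk -F'|' '/^\|/ {
    line = ""
    for (i = 2; i < NF; i++) {
      cell = $i
      gsub(/^ +| +$/, "", cell)          # trim padding around the cell
      line = line (i > 2 ? "\t" : "") cell
    }
    print line
  }'
}

# Sample input taken from the question; warnings and borders are dropped.
printf '%s\n' \
  'Warning: /usr/hdp/2.3.2.0-2950/accumulo does not exist! Accumulo imports will fail.' \
  '--------------' \
  '| max(rdate) |' \
  '--------------' \
  '| 2014-01-25 |' | strip_eval_table
```

In practice the filter would sit at the end of the real command, for example sqoop eval ... | strip_eval_table > your_output_file.txt. Adding 2>/dev/null before the pipe also hides the log lines on the terminal, assuming they are written to standard error.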
There are two ways to get the query results:
One way is to write to HDFS by importing the query results (--target-dir /path) and reading them from there.
The other way is to change the file system option in the sqoop command so that the import query stores its results on the local file system rather than HDFS.
eg: sqoop import -fs local -jt local --connect "connection string" --username root --password root --query "Select * from table where \$CONDITIONS" -m 1 --target-dir /home/output
https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1762587
I am attempting to run a Sqoop job to load from an Oracle db and into Parquet format to a Hadoop cluster. The job is incremental.
Sqoop version is 1.4.6. Oracle version is 12c. Hadoop version is 2.6.0 (distro is Cloudera 5.5.1).
The Sqoop command is (this creates the job, and executes it):
$ sqoop job -fs hdfs://<HADOOPNAMENODE>:8020 \
--create myJob \
-- import \
--connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
--username <USERNAME> \
-P \
--as-parquetfile \
--table <USERNAME>.<TABLENAME> \
--target-dir <HDFSPATH> \
--incremental append \
--check-column <TABLEPRIMARYKEY>
$ sqoop job --exec myJob
Error on execute:
16/02/05 11:25:30 ERROR sqoop.Sqoop: Got exception running Sqoop:
org.kitesdk.data.ValidationException: Dataset name
05112528000000918_2088_<USERNAME>.<TABLENAME>
is not alphanumeric (plus '_')
at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:103)
at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:66)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
at org.kitesdk.data.Datasets.create(Datasets.java:239)
at org.kitesdk.data.Datasets.create(Datasets.java:307)
at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:80)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:106)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:668)
at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Troubleshooting Steps:
0) HDFS is stable, other Sqoop jobs are functional, Oracle source DB is up and the connection has been tested.
1) I tried creating a synonym in Oracle, that way I could simply have the --table option as:
--table TABLENAME (without the username)
This gave me an error that the table name was not correct. It needs the full USERNAME.TABLENAME for the --table option.
Error:
16/02/05 12:04:46 ERROR tool.ImportTool: Imported Failed: There is no column found in the target table <TABLENAME>. Please ensure that your table name is correct.
2) I made sure that this is a Parquet issue. I removed the --as-parquetfile option and the job was successful.
3) I wondered if this is somehow caused by the incremental options. I removed the --incremental append & --check-column options and the job was successful. This confuses me.
4) I tried the job with MySQL and it was successful.
Has anyone run into something similar? Is there a way (or is it even advisable) to disable the Kite validation? It seems that the dataset is being created with dots ("."), which then Kite SDK complains about - but this is an assumption on my part as I am not too familiar with Kite SDK.
Thanks in advance,
Jose
Resolved. There seems to be a known issue with the JDBC connectivity to Oracle 12c. Using a specific OJDBC6 (instead of 7) did the trick. FYI - the OJDBC is installed in /usr/share/java/ and a symbolic link is created in /installpath.../lib/sqoop/lib/
As reported by user @Remya Senan,
breaking the parameter
--hive-table my_hive_db_name.my_hive_table_name
into separate params
--hive-database my_hive_db_name
--hive-table my_hive_table_name
did the trick for me
My environment was
Sqoop v1.4.7
Hive 2.3.3
Tip: I was on emr-5.19.0
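If the qualified name arrives as a single value in a script, plain shell parameter expansion can split it into the two options. A minimal sketch, using the placeholder names from above:

```shell
# Qualified name as it might arrive from a config file or script argument.
qualified=my_hive_db_name.my_hive_table_name

hive_db=${qualified%%.*}     # everything before the first dot
hive_table=${qualified#*.}   # everything after the first dot

echo "--hive-database $hive_db --hive-table $hive_table"
```

The echoed options can then be appended to the sqoop import command in place of the single dotted --hive-table value.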
I also got this error when I was sqoop importing all tables as Parquet files on CDH5.8. From the error message, I suspected that this implementation does not support directories with "-" in their name. Based on this, I removed the "-" from the directory name and re-ran the sqoop import command, and everything worked fine. Hope this helps!
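A name can be checked up front against the rule quoted in the error message ("alphanumeric (plus '_')"). The sketch below only mirrors that message; it is not Kite's actual validation code, and the function name is_kite_safe_name is made up.

```shell
# Succeeds only if the name is non-empty and made of letters, digits and '_'.
is_kite_safe_name() {
  case $1 in
    ''|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Names with a dot or a hyphen are rejected, as in the reports above.
for name in TABLENAME USERNAME.TABLENAME my-dir my_dir; do
  if is_kite_safe_name "$name"; then
    echo "ok:       $name"
  else
    echo "rejected: $name"
  fi
done
```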
When I run the command:
$ sqoop create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433;username=cloud;password=cloud123;database=hadoop' --table cluster
some errors and warnings appear, and at the end it says:
Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details [again a list of import errors displayed]
Finally it says hive exited with status 9.
What is the problem here? I am new to Sqoop and Hive. Can anyone please help me?
The correct syntax would be
sqoop import --connect 'jdbc:sqlserver://10.100.0.18:1433;database=hadoop' --username cloud --password cloud123 --table cluster --hive-import
I think you might want to check whether you have write permissions to the specified directory and whether a directory named metastore_db is being created.
This message is usually shown when you're running Sqoop with the default Hive configuration. By default, Hive uses an embedded Derby datastore, which is usable only in very basic test use cases. I would recommend reconfiguring your Hive instance to use some other relational database as the datastore back end (MySQL, PostgreSQL, Oracle).
Your syntax is all wrong. The syntax is $ sqoop tool-name [tool-arguments]
$ sqoop import --create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433;database=hadoop' --username cloud --password cloud123 --table cluster
Here is a sample call of a Hive import using Sqoop. It might help you correct your syntax further. The command below is essentially the minimum you need to make it work.
sqoop import --connect jdbc:mysql://localhost/RAWDATA --table geolocation --username root --password hadoop --hive-import --create-hive-table --driver com.mysql.jdbc.Driver --m 1 --delete-target-dir
--connect: the part which reads /RAWDATA is the name of the database in your MySQL instance that contains the geolocation table. You can execute the 'show databases' and 'show tables' commands in MySQL to check your databases and tables.
--delete-target-dir is used for safety. It makes Sqoop delete the target directory, if it already exists, before writing the files that are later moved into Hive. This avoids "directory already exists" errors in case you retry the command.
--create-hive-table is required only if you have not already created the target table in Hive. If your previous runs of the sqoop command created the table already, you can omit this option completely. Check your Hive database for the existence of the target Hive table.
--driver specifies the JDBC driver class used for the database connection. Make sure the matching driver library is available to Sqoop, or search for the right one for your database. You can first try the one pasted above to see if it does the trick, and come back to this forum for help if it does not.
Remember that we did not specify which Hive database the table will be created in, so it will go into Hive's default database. I am leaving that option out since you are just starting out with Sqoop.