Hive External Table - hadoop

I am trying to import data from Oracle to Hive using sqoop.
I ran the command below once. Now I want to overwrite the existing data with new data (a daily job), so I ran the same command again:
sqoop import --connect jdbc:oracle:thin:@UK01WRS6014:2184:WWSYOIT1
--username HIVE --password hive --table OIDS.ALLOCATION_SESSION_DIMN
--hive-overwrite --hive-database OI_DB --hive-table ALLOCATION_SESSION_DIMN
But I am getting a "file already exists" error:
14/10/14 07:43:59 ERROR security.UserGroupInformation:
PriviledgedActionException as:axchat
(auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException:
Output directory
hdfs://uslibeiadg004.aceina.com:8020/user/axchat/OIDS.ALLOCATION_SESSION_DIMN
already exists
The tables that I created in Hive were all external tables.
As with MapReduce, do we have to delete that output directory every time we execute the same command?
Any help would be highly appreciated.

When you delete an EXTERNAL table, you only delete objects in the Hive metastore; you don't delete the files over which that table is superimposed. A non-external (managed) table belongs solely to Hive and, when deleted, has both its metastore entries and its HDFS data removed.
So you can either delete the HDFS data explicitly or define the table as internal to Hive.
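As a rough sketch of the first option, the leftover output directory named in the error can be removed before each run. The `run` helper and the `DRY_RUN` variable are illustrative names, not part of Hadoop or Sqoop; by default the script only prints the command it would execute.

```shell
#!/bin/sh
# Sketch: remove the leftover output directory before re-running the import.
# The path is copied from the error message in the question.
TARGET_DIR="/user/axchat/OIDS.ALLOCATION_SESSION_DIMN"

# run (hypothetical helper): print the command when DRY_RUN=1 (the default),
# execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# -r removes recursively; -f suppresses the error if the path is absent.
clear_target_dir() {
  run hdfs dfs -rm -r -f "$1"
}

clear_target_dir "$TARGET_DIR"
```

Alternatively, in Sqoop versions that support it, adding --delete-target-dir to the sqoop import command asks Sqoop to remove that directory itself before importing.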

Related

Sqoop create hive table ERROR - Encountered IOException running create table job

I am running Sqoop on a CentOS 7 machine that already has Hadoop/MapReduce and Hive installed. I read in a tutorial that when importing data from an RDBMS (SQL Server in my case) to HDFS I need to run the following command:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA
Everything works perfectly with this step. The next step is creating a Hive table that has the same structure as the RDBMS table, using a Sqoop command:
sqoop create-hive-table --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA --hive-table hivetablename --fields-terminated-by ','
However, whenever I run the above command I get the following error:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang
/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
18/04/01 19:37:52 ERROR ql.Driver: FAILED: Execution Error, return code 1
from org.apache.hadoop.hive.ql.exec.DDLTask.
com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang
/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
18/04/01 19:37:52 INFO ql.Driver: Completed executing
command(queryId=hadoop_20180401193745_1f3cf07d-ca16-40dd-
8f8d-1e426ecd5860); Time taken: 0.212 seconds
18/04/01 19:37:52 INFO conf.HiveConf: Using the default value passed in
for log id: 0813b5c9-f374-4920-b8c6-b8541449a6eb
18/04/01 19:37:52 INFO session.SessionState: Resetting thread name to
main
18/04/01 19:37:52 INFO conf.HiveConf: Using the default value passed in
for log id: 0813b5c9-f374-4920-b8c6-b8541449a6eb
18/04/01 19:37:52 INFO session.SessionState: Deleted directory: /tmp/hive
/hadoop/0813b5c9-f374-4920-b8c6-b8541449a6eb on fs with scheme hdfs
18/04/01 19:37:52 INFO session.SessionState: Deleted directory: /tmp/hive
/java/hadoop/0813b5c9-f374-4920-b8c6-b8541449a6eb on fs with scheme file
18/04/01 19:37:52 ERROR tool.CreateHiveTableTool: Encountered IOException
running create table job: java.io.IOException: Hive CliDriver exited with
status=1
I am not a Java expert, but I would like to know if you have any idea what causes this error.
I've faced the same issue. It seems that there are some compatibility issues between my versions of Sqoop (1.4.7) and Hive (2.3.4).
The problem arises from the versions of the jackson-* jar files in $SQOOP_HOME/lib: some of them are too old for Hive, which needs version 2.6 or newer (the readerFor method named in the error does not exist in earlier versions).
The solution that I found was to replace the following files in $SQOOP_HOME/lib with their counterparts from $HIVE_HOME/lib:
jackson-core-*.jar
jackson-databind-*.jar
jackson-annotations-*.jar
They are all from versions 2.6+ and this seems to work. Not sure it's good practice though.
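A minimal sketch of that jar swap, keeping a backup so it can be undone. The function name `swap_jackson_jars` and the backup directory argument are made up for illustration. The pattern `-[0-9]*.jar` is a deliberate narrowing of the globs above so that unrelated jars such as jackson-core-asl are left alone.

```shell
#!/bin/sh
# Sketch: replace Sqoop's Jackson 2.x jars with the ones shipped with Hive.

# swap_jackson_jars (hypothetical helper):
#   $1 = Sqoop lib dir, $2 = Hive lib dir, $3 = backup dir for the old jars.
swap_jackson_jars() {
  sqoop_lib=$1
  hive_lib=$2
  backup_dir=$3
  mkdir -p "$backup_dir" || return 1

  for name in jackson-core jackson-databind jackson-annotations; do
    # Move Sqoop's existing copies out of its lib directory.
    for jar in "$sqoop_lib/$name"-[0-9]*.jar; do
      if [ -e "$jar" ]; then
        mv "$jar" "$backup_dir/" || return 1
      fi
    done
    # Copy Hive's copies in.
    for jar in "$hive_lib/$name"-[0-9]*.jar; do
      if [ -e "$jar" ]; then
        cp "$jar" "$sqoop_lib/" || return 1
      fi
    done
  done
  return 0
}

# Usage (assumes SQOOP_HOME and HIVE_HOME are set in your environment):
#   swap_jackson_jars "$SQOOP_HOME/lib" "$HIVE_HOME/lib" "$HOME/sqoop-jackson-backup"
```

Restoring the original state is then a matter of deleting the copied jars and moving the backed-up ones back.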
I faced the same issue; downgrading Hive to 1.2.2 fixed it for me.
That may not suit you, though, if you need to use Sqoop with Hive 2.
Instead of writing two separate statements, you can do the whole thing in one statement, which fetches the data from SQL Server and creates the Hive table too.
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://hostname;database=databasename' --username admin --password admin123 --table tableA --hive-import --hive-overwrite --hive-table hivetablename --fields-terminated-by ',' --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'
For this, please check the jackson-core, jackson-databind and jackson-annotations jars. They should be a recent version; this error usually comes from an older version. Place these jars in both the Hive lib and Sqoop lib directories. Also check the libthrift jar: it should be the same version in Hive and HBase, and that same jar should be copied into the Sqoop lib directory.

How can we import all tables in RDBMS into a Hive custom database?

I want to run "import-all-tables" using Sqoop from MySQL into a custom Hive database (not the Hive default database).
Steps tried:
Created a custom database in Hive under "/user/hive/warehouse/Custom.db".
Assigned all permissions on this directory, so there will be no issues when Sqoop writes into it.
Used the command below with the "--hive-database" option on a CDH 5.7 VM:
sqoop import-all-tables
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username retail_dba
--password cloudera
--hive-database "/user/hive/warehouse/sqoop_import_retail.db"
The tables are created only in the Hive default database, not in the custom database (in this case "sqoop_import_retail.db").
Otherwise it tries to create the tables in the previous HDFS directories (/user/cloudera/categories) and errors out, stating that the output directory already exists:
16/08/30 00:07:14 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
16/08/30 00:07:14 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
[cloudera@quickstart etc]$
How can I address these issues?
1. Creating tables in the custom Hive database
2. Flushing the previous directory references with Sqoop
You did not specify --hive-import in your command, so Sqoop imports the data into HDFS under /user/cloudera/ in your case.
You are executing the command again, which is why you get the exception:
Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
Modify the import command:
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --hive-database custom --hive-import
It will fetch all the tables from the MySQL database retail_db and create the corresponding tables in the Hive database named custom.
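One more detail worth illustrating: --hive-database expects a Hive database name, not a warehouse path like the one passed in the question. The sketch below derives the name from the path and prints the resulting command instead of running it. The helper name `db_name_from_path` is hypothetical, and the derivation assumes Hive's default layout, where database foo lives in a directory named foo.db. The database itself must already exist in Hive.

```shell
#!/bin/sh
# Sketch: --hive-database takes a Hive database name, not a warehouse path.

# db_name_from_path (hypothetical helper): with Hive's default layout,
# database "foo" is stored under <warehouse>/foo.db, so drop the directory
# part and the ".db" suffix.
db_name_from_path() {
  basename "$1" .db
}

HIVE_DB=$(db_name_from_path "/user/hive/warehouse/sqoop_import_retail.db")

# Print the command rather than run it; remove "echo" to execute.
echo sqoop import-all-tables \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba --password cloudera \
  --hive-import --hive-database "$HIVE_DB"
```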

Sqoop Import from Hive to Hive

Can we import tables from one Hive data source into another Hive data source using Sqoop?
With a command like:
sqoop import --connect jdbc:hive2://localhost:10000/default --driver org.apache.hive.jdbc.HiveDriver --username root --password root --table student1 -m 1 --target-dir hdfs://localhost:9000/user/dummy/hive2result
Right now it throws the exception below:
15/07/19 19:50:18 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Method not supported
java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:141)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:290)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Sqoop is not a tool for transferring data from one Hive instance to another. It seems your requirement is to transfer Hive data from one cluster to another, which can be achieved using hadoop distcp. The name Sqoop itself comes from SQL-to-Hadoop (and vice versa).
If you want to migrate multiple databases and tables from one Hive instance to another, the best approach is to transfer the data using hadoop distcp and run the DDL statements on the second Hive instance. If you don't have the DDL statements at hand, don't worry:
Take a dump of the metastore database.
Open the dump file in a text editor such as Notepad or TextPad.
Replace the old HDFS URI with the new HDFS URI.
Import the MySQL dump into the metastore of the second Hive instance.
Refresh the tables.
An example is given in the blog post below:
https://amalgjose.wordpress.com/2013/10/11/migrating-hive-from-one-hadoop-cluster-to-another-cluster-2/
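The URI replacement step can also be scripted instead of done by hand in an editor. This is a minimal sketch that assumes a MySQL-backed metastore, as the answer does. The function name `rewrite_hdfs_uri`, the file names, and the host names in the comments are placeholders.

```shell
#!/bin/sh
# Sketch of the "replace the HDFS URI" step on a metastore dump file.

# rewrite_hdfs_uri (hypothetical helper):
#   $1 = old NameNode URI, $2 = new NameNode URI,
#   $3 = input dump file,  $4 = output dump file.
# "|" is the sed delimiter because the URIs contain "/".
# Assumes the URIs contain no "|" or "&" and no regex metacharacters
# other than ".".
rewrite_hdfs_uri() {
  old=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots in the pattern
  sed "s|$old|$2|g" "$3" > "$4"
}

# Surrounding steps, with placeholder user, database, and host names:
#   hadoop distcp hdfs://old-nn:8020/user/hive/warehouse hdfs://new-nn:8020/user/hive/warehouse
#   mysqldump -u <user> -p <metastore_db> > metastore_dump.sql
#   rewrite_hdfs_uri hdfs://old-nn:8020 hdfs://new-nn:8020 metastore_dump.sql metastore_new.sql
#   mysql -u <user> -p <metastore_db> < metastore_new.sql
```

Writing to a separate output file keeps the original dump untouched, so the rewrite can be inspected with diff before it is imported.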
distcp alone works only for external tables. For managed (transactional) tables, use Hive's EXPORT and IMPORT statements.

I couldn't import the tables from my sql server to hive through sqoop

When I pass the command:
$sqoop create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433;username=cloud;password=cloud123;database=hadoop' --table cluster
Some errors and warnings appear, and at the end it says:
Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details [again a list of import errors displayed]
Finally it says Hive exited with status 9.
What is the problem here? I am new to Sqoop and Hive. Can anyone please help me?
The correct syntax would be
sqoop import --connect 'jdbc:sqlserver://10.100.0.18:1433;database=hadoop' --username cloud --password cloud123 --table cluster --hive-import
I think you might want to check whether you have write permissions on the specified directory and whether a directory named metastore_db is being created.
This message is usually shown when you're running Sqoop with the default Hive configuration. By default Hive uses an embedded Derby datastore, which is usable only in very basic test cases. I would recommend reconfiguring your Hive instance to use another relational database as the metastore back end (MySQL, PostgreSQL, Oracle).
Your syntax is all wrong. The syntax is: sqoop tool-name [tool-arguments]
$ sqoop import --create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433;database=hadoop' --username cloud --password cloud123 --table cluster
Here is a sample Hive import call using Sqoop, which might help you correct your syntax further. At a minimum, you need to give the command below to make it work.
sqoop import --connect jdbc:mysql://localhost/RAWDATA --table geolocation --username root --password hadoop --hive-import --create-hive-table --driver com.mysql.jdbc.Driver --m 1 --delete-target-dir
--connect: the part that reads /RAWDATA is the name of the database in your MySQL instance that contains the geolocation table. You can run 'show databases' and 'show tables' in MySQL to check your databases and tables.
--delete-target-dir is used for safety. It makes Sqoop delete the temporary directory it writes the files to before moving them into Hive. This avoids unnecessary "directory already exists" errors if you retry the command.
--create-hive-table is required only if you have not already created the target table in Hive. If a previous run of the sqoop command created the table, you can leave this option out. Check your Hive database for the target table.
--driver specifies the JDBC driver class used for the database connection. Make sure the driver library is available to Sqoop, or search for the right one for your database. Try the one pasted above first to see if it does the trick; you can come back to this forum for help.
Remember that we did not specify which Hive database the table will be created in, so it will go into Hive's default database. I am not giving that option since you are just getting started with Sqoop.

Moving Sqoop data from HDFS to Hive

When importing a bunch of large MySQL tables into HDFS using Sqoop, I forgot to include the --hive-import flag. So now I've got these tables sitting in HDFS, and am wondering if there's an easy way to load the data into Hive (without writing the LOAD DATA statements myself).
I tried using sqoop create-hive-table:
./bin/sqoop create-hive-table --connect jdbc:mysql://xxx:3306/dw --username xxx --password xxx --hive-import --table tweets
While this did create the correct hive table, it didn't import any data into it. I have a feeling I'm missing something simple here...
For the record, I am using Elastic MapReduce, with Sqoop 1.4.1.
Can't you create an external table in hive and point it to these files?
create external table something(a string, b string) location 'hdfs:///some/path'
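A slightly fuller sketch of that idea follows. Sqoop's default text output is comma-delimited, so the DDL declares that delimiter explicitly. The helper name `external_table_ddl` is hypothetical, and the columns a and b are the placeholders from the answer, to be replaced with the real schema.

```shell
#!/bin/sh
# Sketch: generate an external-table DDL over files Sqoop already wrote to HDFS.

# external_table_ddl (hypothetical helper): $1 = table name, $2 = HDFS directory.
# Columns a and b are placeholders; use the real column list of your table.
external_table_ddl() {
  cat <<EOF
CREATE EXTERNAL TABLE $1 (a STRING, b STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '$2';
EOF
}

# Print the statement; redirect it to a file and run "hive -f <file>" to apply it.
external_table_ddl tweets 'hdfs:///some/path'
```

Because the table is external, dropping it later removes only the metastore entry and leaves the imported files in place.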
You did not specify "import" in your command. Syntax is sqoop tool-name [tool-arguments]
It should look like this:
$ sqoop import --create-hive-table --connect jdbc:mysql://xxx:3306/dw --username xxx --password xxx --hive-import --table tweets
