Importing data to HBase using Sqoop - hadoop

When I import data into Hive using Sqoop, I can specify --hive-home <dir> and Sqoop will call that specific copy of Hive installed on the machine where the script is executed. But what about HBase? How does Sqoop know which HBase instance/database I want the data imported into?

Maybe the documentation helps?
By specifying --hbase-table, you instruct Sqoop to import to a table in HBase rather than a directory in HDFS
Every example I see just shows that option along with column families and so on, so I assume the target instance depends on whatever variables are set in sqoop-env.sh, as the Hortonworks docs describe.
When you give the Hive home directory, that's not telling it any database or table information either, but rather where the Hive configuration files exist on the machine you're running Sqoop on. By default, that's taken from the environment variable $HIVE_HOME.
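For illustration, here's a minimal sketch of an HBase-bound import (the JDBC URL, credentials, table, and column family names are hypothetical). The assumption is that Sqoop resolves the target HBase instance from the hbase-site.xml it finds on the machine running the command, via $HBASE_HOME/conf or the Hadoop classpath, in the same spirit as $HIVE_HOME for Hive imports:

export HBASE_HOME=/usr/lib/hbase   # where Sqoop looks for hbase-site.xml (assumed install path)
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username sqoop_user -P \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key order_id \
  --hbase-create-table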

Related

Hadoop distcp to copy Hive tables

I am new to Hadoop and Hive. I am trying to use
hadoop distcp -overwrite hdfs://source_cluster/apps/hive/warehouse/test.db hdfs://destination_cluster/apps/hive/warehouse/test.db
This command runs without error, but I still can't see test.db on the target HDFS cluster.
You've copied files, but haven't modified the Hive metastore that actually registers table information.
If you want to copy tables between clusters, I suggest looking into a tool called Circus Train; otherwise, use SparkSQL to interact with the HiveServer of both clusters rather than HDFS-only tooling.
After copying the files and directories, it is necessary to recreate the tables (DDL) so that information about those tables appears in the metastore, as sketched below.
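As a rough sketch of that last step (database and table names are hypothetical), you can capture the DDL on the source cluster, replay it on the destination, and then register the copied partitions:

hive -e "SHOW CREATE TABLE test.my_table;" > my_table.ddl      # on the source cluster
hive -e "CREATE DATABASE IF NOT EXISTS test;"                   # on the destination cluster
hive -f my_table.ddl                                            # recreate the table definition
hive -e "MSCK REPAIR TABLE test.my_table;"                      # pick up partitions (for partitioned tables)

You may need to edit the LOCATION clause in the generated DDL so it points at the destination path rather than the source cluster.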

How to get table name based on the HDFS location?

I would like to find out the table name based on the HDFS location using a shell script.
Is it possible?
I doubt there is a direct Hive CLI command for that, but if you have access you can query the Hive metastore database (usually MySQL/MariaDB) to look up the Hive table name for a given location.
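As a sketch against a typical metastore schema (the metastore database name and the location string below are assumptions), a join of the DBS, TBLS, and SDS tables gives you the table for a given location:

mysql -u hive -p hive_metastore -e "
  SELECT d.NAME AS db_name, t.TBL_NAME
  FROM TBLS t
  JOIN DBS d ON t.DB_ID = d.DB_ID
  JOIN SDS s ON t.SD_ID = s.SD_ID
  WHERE s.LOCATION = 'hdfs://namenode:8020/apps/hive/warehouse/test.db/my_table';"

Wrapping the query in mysql -e like this is what makes it usable from a shell script.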

How to push data from SQL to HDFS

I have the following use case:
We have several SQL databases in different locations and we need to load some of that data into HDFS.
The problem is that we cannot access those servers from our Hadoop cluster (due to security concerns), but we can push data to our cluster.
Is there any tool like Apache Sqoop to do such bulk loading?
Dump the data from your SQL databases to files in some delimited format, for instance CSV, and then use a simple hadoop fs -put to push all the files to HDFS.
That's it.
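A minimal sketch of that approach, assuming a MySQL source and hypothetical table, credential, and path names:

mysql -u export_user -p --batch -e "SELECT * FROM sales.orders" \
  | sed 's/\t/,/g' > orders.csv                 # tab-separated output converted to CSV
hadoop fs -mkdir -p /data/landing/orders        # run on a machine that can reach the cluster
hadoop fs -put orders.csv /data/landing/orders/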
Let us assume I am working in a small company with a 30-node cluster that processes 100 GB of data daily. This data comes from different RDBMS sources such as Oracle, MySQL, IBM Netezza, DB2, and so on. We do not need to install Sqoop on all 30 nodes; the minimum number of nodes Sqoop must be installed on is 1. After installing it on one machine, we can reach those source databases from there and import the data using Sqoop.
As far as security is concerned, no import will work until the database administrator has run the following two commands.
mysql> GRANT ALL PRIVILEGES ON mydb.table TO ''@'IP Address of Sqoop Machine';
mysql> GRANT ALL PRIVILEGES ON mydb.table TO '%'@'IP Address of Sqoop Machine';
These two commands must be run by the admin.
Then we can run our sqoop import commands.
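For completeness, a sketch of what the import itself might look like once the grants are in place (host, database, table, and credentials are hypothetical):

sqoop import \
  --connect jdbc:mysql://db.example.com:3306/mydb \
  --username sqoop_user -P \
  --table orders \
  --target-dir /data/mydb/orders \
  --num-mappers 4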

Sqoop imports to a different directory than the Hive warehouse directory

When I use Sqoop to import tables into Hive, the tables go to a different directory than /user/hive/warehouse in HDFS. I'm using the default Derby database for the Hive metastore. How can I make the Hive warehouse directory the default target?
Try using --hive-home /user/hive/warehouse. Generally, when you import data from a relational database, hive-home should be picked up by default. Since you mention it is not using the warehouse path, try setting the parameter explicitly with --hive-home.
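As a sketch of the command this answer is suggesting (connection string, credentials, and table name are placeholders; whether --hive-home actually changes the target directory depends on your installation):

sqoop import \
  --connect jdbc:mysql://db.example.com/mydb \
  --username sqoop_user -P \
  --table customers \
  --hive-import \
  --hive-home /user/hive/warehouse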

Can't query Hive tables after Sqoop import

I imported several tables from an Oracle DB into Hive via Sqoop. The command looked something like this:
./sqoop import --connect jdbc:oracle:thin:@//185.2.252.52:1521/orcl --username USER_NAME --password test --table TABLENAME --hive-import
I'm using an embedded metastore (at least I think so; I have not changed the default configuration in that regard). When I do SHOW TABLES in Hive, the imported tables do not show up, but some tables I've created for testing via the command line do. The tables are all in the same warehouse directory on HDFS. It seems like the Sqoop import is not using the same metastore.
But where is it? And how can I switch to it when querying from the command line?
Thanks
I think the entire problem is the embedded metastore: by default, Hive creates it in the user's current working directory if it doesn't already exist, so Sqoop ends up using a different metastore than Hive. I would recommend configuring MySQL as the backend for the metastore.
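As a rough way to confirm that diagnosis (the paths are just whatever directories you launched each tool from):

ls -d ~/hive-shell-dir/metastore_db    # Derby metastore created where hive was started (hypothetical path)
ls -d ~/sqoop-job-dir/metastore_db     # a second one created where sqoop was run (hypothetical path)

A quick workaround is to run both tools from the same working directory; the cleaner fix is the shared MySQL-backed metastore mentioned above, configured in hive-site.xml via javax.jdo.option.ConnectionURL and related properties.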
