I imported several tables from an Oracle DB into Hive via Sqoop. The command looked something like this:
./sqoop import --connect jdbc:oracle:thin:@//185.2.252.52:1521/orcl --username USER_NAME --password test --table TABLENAME --hive-import
I'm using an embedded metastore (at least I think so; I have not changed the default configuration in that regard). When I run SHOW TABLES in Hive, the imported tables do not show up, but some tables I've created for testing via the command line do. The tables are all in the same warehouse directory on HDFS. It seems like the Sqoop import is not using the same metastore.
But where is it? And how can I switch to it when using the command line for querying?
thanks
I think the entire problem is the embedded metastore: by default, Hive creates it in the current working directory if it doesn't already exist, so Sqoop can end up using a different metastore than Hive. I would recommend configuring MySQL as the backend for the metastore.
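If you want to locate the embedded metastore and check this, a rough sketch (assuming the default Derby setup; the paths are placeholders):
# Look for the Derby metastore directories that Hive/Sqoop created:
find ~ -maxdepth 3 -type d -name metastore_db 2>/dev/null
# Starting the Hive CLI from the directory where Sqoop was run should pick up
# the same embedded metastore and list the imported tables:
cd /path/where/sqoop/was/run
hive -e "SHOW TABLES;"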
When I want to import data into Hive using Sqoop, I can specify --hive-home <dir> and Sqoop will call that specific copy of Hive installed on the machine where the script is being executed. But what about HBase? How does Sqoop know which HBase instance/database I want the data to be imported into?
Maybe the documentation helps?
By specifying --hbase-table, you instruct Sqoop to import to a table in HBase rather than a directory in HDFS
Every example I see just shows that option along with column families and so on, so I assume it depends on whatever variables are part of sqoop-env.sh, as the Hortonworks docs suggest.
When you give the Hive home directory, that's not telling it any database or table information either, but rather where the Hive configuration files exist on the machine you're running Sqoop on. By default, that's set to the environment variable $HIVE_HOME.
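To make that concrete, a hedged sketch: Sqoop picks up the target HBase cluster from the HBase client configuration (hbase-site.xml) found via HBASE_HOME or the classpath on the machine running Sqoop. The path, connection string, and table names below are placeholders:
# Point Sqoop at the HBase client config for the target cluster:
export HBASE_HOME=/usr/lib/hbase
sqoop import \
--connect jdbc:mysql://db.example.com/sales \
--username user -P \
--table orders \
--hbase-table orders \
--column-family cf \
--hbase-row-key id \
--hbase-create-table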
I am planning to change the Sqoop metastore to a MySQL DB (I am using Hadoop 2.6.5, MySQL 5.7, Sqoop 1.4.6).
Where is the Sqoop metastore stored by default, i.e. where do the Sqoop job definitions live (the way Hive metadata is stored in a Derby DB)?
I created Sqoop jobs and can see them with sqoop job --list and execute them as well. How do I confirm that all the metadata is being stored in MySQL?
I searched around but didn't find a good resource; can anyone please point me to good documentation or a link?
thanks in advance
Check your sqoop-site.xml for the sqoop.metastore.server.location parameter. It will tell you how Sqoop is configured to use the metastore.
You can configure sqoop.metastore.client.autoconnect.url to point to your metastore and then create and execute saved jobs.
Generally, we have two options w.r.t. the metastore:
Internal metastore - maintained by Sqoop and built on HSQLDB
External metastore - like the Hive metastore
It would be great if you could post your observations (along with code) here for others to refer to.
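To confirm where the saved jobs actually end up, a sketch along these lines may help (host, database, and credentials are placeholders; it assumes sqoop.metastore.client.autoconnect.url points at MySQL and the Sqoop metastore tables exist there):
# List jobs through an explicit metastore URL (overrides sqoop-site.xml):
sqoop job --meta-connect "jdbc:mysql://mysql-host:3306/sqoop?user=sqoop&password=sqoop" --list
# Then check on the MySQL side that the job definitions landed there
# (SQOOP_SESSIONS is the table Sqoop 1.x normally uses for saved jobs):
mysql -u sqoop -p -e "SELECT DISTINCT job_name FROM sqoop.SQOOP_SESSIONS;"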
I have to use Sqoop to import all tables from a MySQL database into HDFS and into external tables in Hive (no filters, with the same structure).
In import I want to bring:
New data for existing tables
Updated data for existing tables (using only the id column)
New tables created in MySQL (and create the corresponding external table in Hive)
Then create a sqoop job to do it all automatically.
(I have a MySQL database with approximately 60 tables, and with each new client going into production a new table is created, so I need Sqoop to work as automatically as possible.)
The first command executed to import all the tables was:
sqoop import-all-tables \
--connect jdbc:mysql://IP/db_name \
--username user \
--password pass \
--warehouse-dir /user/hdfs/db_name \
-m 1
Here, Scoop and support for external Hive tables says that support was added for creating external tables in Hive, but I did not find documentation or examples of the commands mentioned there.
What are the best practices for a Sqoop setup that picks up all the updates from a MySQL database and passes them to HDFS and Hive?
Any ideas would be good.
Thanks in advance.
Edit: Scoop and support for external Hive tables (SQOOP-816) is still unresolved
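For the incremental part of this, a saved Sqoop job per table is one common approach. A sketch, assuming append-only ids (the job name, table, --last-value, and connection details are placeholders):
sqoop job --create db_name_orders_incr -- import \
--connect jdbc:mysql://IP/db_name \
--username user -P \
--table orders \
--target-dir /user/hdfs/db_name/orders \
--incremental append \
--check-column id \
--last-value 0 \
-m 1
# Each execution imports rows with id greater than the stored last value
# and updates that value in the Sqoop metastore:
sqoop job --exec db_name_orders_incr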
I have the following use case:
We have several SQL databases in different locations and we need to load some data from them into HDFS.
The problem is that we do not have access to those servers from our Hadoop cluster (due to security concerns), but we can push data to our cluster.
Is there any tool like Apache Sqoop to do such bulk loading?
Dump the data from your SQL databases as files in some delimited format, for instance CSV, and then do a simple hadoop fs -put to copy all the files into HDFS.
That's it.
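A minimal sketch of that flow, assuming MySQL on the source side and an edge node that can reach the cluster (database, table, and paths are placeholders):
# Dump a table as tab-delimited text on the source side:
mysql -u user -p --batch -e "SELECT * FROM mydb.mytable" > mytable.tsv
# Push the file from a machine that can reach the cluster:
hadoop fs -mkdir -p /data/mydb/mytable
hadoop fs -put mytable.tsv /data/mydb/mytable/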
Let us assume I am working at a small company with a 30-node cluster processing 100 GB of data daily. This data comes from different sources, such as RDBMSs like Oracle, MySQL, IBM Netezza, and DB2. We do not need to install Sqoop on all 30 nodes; the minimum number of nodes Sqoop has to be installed on is 1. After installing it on that one machine we can reach the source databases and import their data with Sqoop.
As far as security is concerned, no import can be done until the administrator runs the following two commands:
mysql> GRANT ALL PRIVILEGES ON mydb.table TO ''@'<IP address of the Sqoop machine>';
mysql> GRANT ALL PRIVILEGES ON mydb.table TO '%'@'<IP address of the Sqoop machine>';
These two commands have to be run by the admin.
Then we can use our sqoop import commands, as sketched below.
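A minimal sketch of such an import (connection details, table name, and target directory are placeholders):
sqoop import \
--connect jdbc:mysql://db-host/mydb \
--username sqoop_user -P \
--table mytable \
--target-dir /data/mydb/mytable \
-m 4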
When I use Sqoop to import tables into Hive, the tables go to a directory other than /user/hive/warehouse in HDFS. I'm using the default Derby database for the Hive metastore. How can I make the Hive warehouse directory the default location?
Try using --hive-home /user/hive/warehouse. Generally, when you are importing data from a relational database, hive-home should be picked up by default. Since you mention that it is not using the warehouse path, try setting the parameter explicitly with --hive-home.
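A sketch of that suggestion (the connection string and table are placeholders; whether --hive-home resolves the warehouse location depends on your Hive configuration):
sqoop import \
--connect jdbc:mysql://db-host/mydb \
--username user -P \
--table mytable \
--hive-import \
--hive-home /user/hive/warehouse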