I am using HBase to store the data, but later, to suit my requirements, I want to export the data from HBase to an RDBMS like MySQL or Postgres. I know Sqoop is an option, but it imports from MySQL into HBase and exports data saved in HDFS to an RDBMS; it cannot export data directly from HBase.
Is there any tool to export data from HBase tables to RDBMS tables?
Not sure if this is a better approach, but HBase data can be exported into a flat file and then loaded into RDBMS.
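A hedged sketch of that pipeline, assuming a Hive external table (here called `hbase_t`) has already been mapped onto the HBase table via the HBase storage handler; all table names, paths, and the connection string are placeholders:

```shell
# Dump the HBase-backed Hive table to delimited flat files in HDFS
hive -e "INSERT OVERWRITE DIRECTORY '/tmp/hbase_dump'
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         SELECT * FROM hbase_t"

# Export the flat files into MySQL with Sqoop (the target table must already exist)
sqoop export \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table mytable \
  --export-dir /tmp/hbase_dump \
  --fields-terminated-by ','
```

Note that INSERT OVERWRITE DIRECTORY with a ROW FORMAT clause requires Hive 0.11 or later; on older versions you would write to a staging table instead.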
Just like the import command can import all tables from an RDBMS to HDFS, I want to export all tables from Hive to an RDBMS (or from HDFS to an RDBMS). Is there any solution for this? I have already tried exporting a single table from Hive to an RDBMS and from HDFS to an RDBMS; I just want to know what the command is.
Can we write Sqoop data to Hive and HBase together in Hadoop?
I want a single Sqoop import to write to both Hive and HBase (NoSQL) together.
No, it cannot. If you want the data to show up in both Hive and HBase, you will have to import it into two separate locations. Create a Hive table over one location for use in Hive. Over the second location, create an external Hive table with the HBase SerDe properties.
Integrating Hive and HBase: this link gives you the steps required.
I have this environment:
Hadoop environment (1 master, 4 slaves) with several applications: Ambari, Hue, Hive, Sqoop, HDFS ... A server in production (separate from Hadoop) with a MySQL database.
My goal is:
Optimize the queries made on this MySQL server, which are slow to execute today.
What did I do:
I imported the mysql data to HDFS using Sqoop.
My doubts:
Can I run SELECTs directly on data in HDFS using Hive?
Do I have to load the data into Hive and make the queries?
If new data is entered into the MySQL database, what is the best way to get this data and insert it into HDFS, and then into Hive again? (Maybe in real time.)
Thank you in advance
Can I run SELECTs directly on data in HDFS using Hive?
You can. Create an external table in Hive specifying your HDFS location; then you can run any HQL over it.
Do I have to load the data into Hive to run the queries?
With an external table, you don't need to load data into Hive; the data stays in the same HDFS directory.
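For instance, a minimal external table over a Sqoop-imported directory might look like this (the path and columns are assumptions):

```shell
hive -e "CREATE EXTERNAL TABLE orders (
           order_id BIGINT,
           amount   DOUBLE
         )
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/data/orders'"
```

Dropping an external table removes only the metadata; the files under /data/orders are left untouched.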
If new data is entered into the mysql database, what is the best way to get this data.
You can use Sqoop incremental import for this. It fetches only newly added or updated data (depending on the incremental mode). You can create a Sqoop job and schedule it as needed.
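A hedged example of such a saved job (the connection string, table, and check column are placeholders; for unattended runs you would use a password file rather than the interactive -P prompt):

```shell
# Create a saved incremental-import job; Sqoop records the new
# --last-value after every run, so repeated executions fetch only new rows
sqoop job --create daily_orders_import -- import \
  --connect jdbc:mysql://dbhost/sales \
  --username myuser -P \
  --table orders \
  --target-dir /data/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

# Run it, e.g. from a cron entry, for near-real-time refreshes
sqoop job --exec daily_orders_import
```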
You can try Impala, which is much faster than Hive for SQL queries. You need to define tables, most likely specifying a delimiter, the storage format, and where the data is stored on HDFS (I don't know what kind of data you are storing). Then you can write SQL queries that read the data from HDFS.
I have no experience with real-time data ingestion from relational databases, however you can try scheduling Sqoop jobs with cron.
There are more than 300 tables in my hive environment.
I want to export all the tables from Hive to Oracle/MySql including metadata.
My Oracle database doesn't have any tables corresponding to these Hive tables.
Sqoop import from Oracle to Hive creates the table in Hive if it doesn't exist. But Sqoop export from Hive to Oracle doesn't create a missing table; it fails with an exception.
Is there any option in Sqoop to export metadata as well? Or is there any other Hadoop tool through which I can achieve this?
Thanks in advance
The feature you're asking for isn't in Sqoop, and unfortunately I don't know of a current Hadoop tool that can do it either. A potential workaround is the "show create table mytable" statement in Hive, which returns the CREATE TABLE statement. You can parse it manually or programmatically (e.g. with awk), collect the CREATE TABLE statements in a file, run that file against your Oracle DB after translating the data types, and then use Sqoop to populate the tables.
It won't be fun.
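As a toy illustration of the translation step, here is the kind of rewrite you would run over the collected "show create table" output. The sample DDL and the sed type map are assumptions, and nowhere near a complete Hive-to-Oracle mapping:

```shell
# Sample of what `hive -e "show create table mytable"` might emit
cat > hive_ddl.sql <<'EOF'
CREATE TABLE `mytable`(
  `id` bigint,
  `name` string,
  `score` double)
EOF

# Naive type translation toward Oracle (illustrative only):
# strip backticks, then map three common Hive types
sed -e 's/`//g' \
    -e 's/bigint/NUMBER(19)/g' \
    -e 's/string/VARCHAR2(4000)/g' \
    -e 's/double/BINARY_DOUBLE/g' hive_ddl.sql > oracle_ddl.sql
cat oracle_ddl.sql
```

A real migration would need a mapping table for every Hive type in use, plus handling for partition columns and constraints.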
Sqoop can't copy metadata or create a table in the RDBMS based on a Hive table. The table must already exist in the RDBMS before you can perform a Sqoop export.
Why is it so?
Mapping from an RDBMS to Hive is easy because Hive has only a few data types (10-15), so mapping the many RDBMS data types onto Hive's is easily achievable. But the reverse is not that easy: a typical RDBMS has hundreds of data types (and they differ from one RDBMS to another).
Also, Sqoop export is a newly added feature; this capability may come in the future.
What is the difference between Apache Sqoop and Hive? I know that Sqoop is used to import/export data between an RDBMS and HDFS, and that Hive is a SQL abstraction layer on top of Hadoop. Can I use Sqoop to import data into HDFS and then use Hive for querying?
Yes, you can. In fact, many people use Sqoop and Hive for exactly what you have described.
In my project, I had to load the historical data from my RDBMS, which was Oracle, and move it to HDFS. I had Hive external tables defined over that path, which allowed me to run Hive queries for transformations. We also wrote MapReduce programs on top of the data to produce various analyses.
Sqoop transfers data between HDFS and relational databases. You can use Sqoop to transfer data from a relational database management system (RDBMS) such as MySQL or Oracle into HDFS and use MapReduce on the transferred data. Sqoop can export this transformed data back into an RDBMS as well. More info http://sqoop.apache.org/docs/1.4.3/index.html
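A typical round trip might look like this (connection strings, tables, and paths are placeholders):

```shell
# Pull a MySQL table into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table sales \
  --target-dir /data/sales

# ...transform with MapReduce or Hive, then push the results back
sqoop export \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table sales_summary \
  --export-dir /data/sales_summary
```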
Hive is a data warehouse software that facilitates querying and managing large datasets residing in HDFS. Hive provides schema on read (as opposed to schema on write for RDBMS) onto the data and the ability to query the data using a SQL-like language called HiveQL. More info https://hive.apache.org/
Yes you can. As a matter of fact, that's exactly how it is meant to be used.
Sqoop:
Using this tool we can integrate HDFS with any external data source (SQL, NoSQL, and data warehouses), and we can export as well, since it works bi-directionally. Sqoop can also move data from a relational database into HBase.
Hive:
As per my understanding, we can import data from SQL databases into Hive, but not from NoSQL databases, and Hive itself can't export data from HDFS into SQL databases.
We can use both together using the two options below.
sqoop create-hive-table --connect jdbc:mysql://<hostname>/<dbname> --table <table name> --fields-terminated-by ','
The above command generates the Hive table; its name and schema will be the same as those of the source RDBMS table.
Load the data
hive> LOAD DATA INPATH '<hdfs path>' INTO TABLE <table name>;
These two steps can be shortened to one if you know that you want to import straight from a database directly into Hive:
sqoop import --connect jdbc:mysql://<hostname>/<dbname> --table <table name> -m 1 --hive-import