ETL tools for HBase, the Hadoop database? - hadoop

Hi, can anybody tell me which ETL tools can be used with HBase, the Hadoop database?
I mean: just as data in an Oracle database can be pulled and processed with tools like Informatica and SSIS, is there an ETL tool that can be used with HBase?
Kindly help me.

Take a look at Pentaho Data Integration for Hadoop.

Check out Cascading.HBase http://www.cascading.org/modules.html

Related

What is the best tool to query the text files(in compressed format) in HDFS?

I have imported log files in compressed format from a database into HDFS, and I am querying them with the Hive CLI.
Could you suggest tools that are better than Hive for querying these files?
Note: I am aware that I could use the Spark framework for querying, but I was wondering whether there is another option that is faster and/or offers SQL-like syntax.
Thanks in advance.
Impala, Drill, PrestoDB and Amazon Redshift all accept SQL syntax and are generally faster than Hive for interactive queries.
For more info, see:
https://impala.apache.org/
https://drill.apache.org/
https://prestodb.io/
(AWS only) https://aws.amazon.com/redshift/

What is Hive best suited for?

I need to take daily snapshots of all the enterprise's databases and load them into Hive.
If that is the best approach, how should I go about it? I have used Sqoop to import data into Hive manually, but what should my PHP application connect to: Hive or Sqoop?
I understand Hive is meant for OLAP rather than OLTP, but would Hive handle once-a-day snapshots well, or should I consider other options such as HBase?
I am open to other suggestions, given that the data is mostly structured.
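One way to automate the daily snapshot with Sqoop is to load each day's full copy of a table into its own Hive partition. The minimal sketch below only builds the command line; the connect string, table names, password-file path and the partition column `snapshot_date` are hypothetical, and it assumes one Sqoop run per source table, triggered by a scheduler such as cron.

```python
from datetime import date


def daily_snapshot_command(jdbc_url: str, user: str, password_file: str,
                           table: str, hive_table: str, day: date) -> list[str]:
    """Build a Sqoop command that loads one full copy of `table` into
    the Hive partition snapshot_date=<day> of `hive_table`."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        # Read the password from a file instead of passing it on the command line.
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
        # Hypothetical partition column: one partition per daily snapshot.
        "--hive-partition-key", "snapshot_date",
        "--hive-partition-value", day.isoformat(),
        # Overwrite rather than append, so a re-run for the same day
        # should replace that partition's data instead of duplicating it.
        "--hive-overwrite",
    ]


if __name__ == "__main__":
    import shlex

    cmd = daily_snapshot_command(
        "jdbc:mysql://db.example.com/crm", "etl", "/user/etl/.password",
        "customers", "snapshots.customers", date(2024, 1, 31))
    print(shlex.join(cmd))
```

Keeping one partition per day means older snapshots stay queryable side by side, and a query can filter on `snapshot_date` to read a single day.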

Using Hive metastore for client application performance

I am new to Hadoop. Please help me understand the following concept.
It is considered good practice to keep the Hive metastore in an external database, such as MySQL, for production use.
What exactly is the role of storing metadata in an RDBMS, and why is it needed?
If we build a client application that displays Hive data in a UI, will this metastore help improve data-retrieval performance?
If yes, what would the architecture of such a client application be? Would it hit the RDBMS metastore first? How would that differ from querying Hive directly in some other way, for example over Thrift?
Hadoop experts, please help.
Thanks
You can use PrestoDB, which lets you run SQL queries against Hive tables. Its Hive connector reads table definitions from your existing Hive metastore (which can be backed by MySQL), so the schema you have already stored is reused.
Your client application then only needs a JDBC driver, as with any other RDBMS.
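For illustration, the Presto JDBC driver expects URLs of the form `jdbc:presto://host:port/catalog/schema`, and Presto addresses tables as `catalog.schema.table`. The sketch below assumes the Hive connector is registered under the catalog name `hive` and the server listens on port 8080; both depend on your installation, and the host and table names are hypothetical.

```python
def presto_jdbc_url(host: str, port: int = 8080,
                    catalog: str = "hive", schema: str = "default") -> str:
    """Return a Presto JDBC URL: jdbc:presto://host:port/catalog/schema."""
    return f"jdbc:presto://{host}:{port}/{catalog}/{schema}"


def qualified_table(table: str, catalog: str = "hive", schema: str = "default") -> str:
    """Presto names tables as catalog.schema.table; the catalog name is the
    one given to the Hive connector's catalog file (assumed to be 'hive')."""
    return f"{catalog}.{schema}.{table}"


if __name__ == "__main__":
    # The client would hand this URL to the Presto JDBC driver.
    print(presto_jdbc_url("presto.example.com"))
    print(f"SELECT count(*) FROM {qualified_table('web_logs')}")
```

With this setup the client never talks to the metastore database itself: Presto looks up table metadata in the metastore and reads the data from HDFS on the client's behalf.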

Build an application for reporting and analysis on Hadoop framework

I have an application in SAS that pulls data from Oracle and produces reports in Excel using Base SAS and SAS macros. The problem is that my database grows every day, so fetching data from Oracle takes longer and longer, and as a result my jobs run slowly.
So I want to build the application on Hadoop for reporting and analysis. Can someone suggest an approach and the tools I would need?
The short answer is: it depends.
For unloading data from Oracle, I would recommend Sqoop (http://sqoop.apache.org/). It is designed for exactly this use case, can do incremental loads, and can create a Hive table for the unloaded data.
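As a minimal sketch of an incremental load, Sqoop's `--incremental append` mode fetches only rows whose check column is greater than `--last-value`. The example below just builds the command; the Oracle connect string, table, column and HDFS paths are hypothetical, and it assumes the table has a steadily increasing ID column.

```python
def incremental_import_command(jdbc_url: str, user: str, password_file: str,
                               table: str, target_dir: str,
                               check_column: str, last_value: int) -> list[str]:
    """Build a Sqoop import that only fetches rows whose `check_column`
    is greater than `last_value` (Sqoop's --incremental append mode)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        # Must be a column that only grows, such as a sequence-generated ID.
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


if __name__ == "__main__":
    import shlex

    cmd = incremental_import_command(
        "jdbc:oracle:thin:@//oradb.example.com:1521/ORCL", "scott",
        "/user/scott/.password", "SALES", "/data/oracle/sales", "SALE_ID", 1000)
    print(shlex.join(cmd))
```

Rather than tracking the last value yourself, you can save the import as a Sqoop job (`sqoop job --create ...`), which records the last imported value after each run.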
Once the data is unloaded, you can use Impala to build the reports you need. Impala works natively with Hive tables, so the integration is simple. Of course, you would have to rewrite your SAS code as a set of SQL statements that run on Impala.
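To replace the SAS-to-Excel step, one option is to have `impala-shell` write a query result as a CSV file that Excel can open. The sketch below only builds the command; the host, query and output path are hypothetical, and it assumes `impala-shell` is installed on the machine that runs it.

```python
def impala_report_command(impalad: str, query: str, output_csv: str) -> list[str]:
    """Build an impala-shell call that writes a query result as a
    comma-delimited file with a header row."""
    return [
        "impala-shell",
        "-i", impalad,              # impalad host[:port] to connect to
        "-B",                       # plain delimited output, no table borders
        "--output_delimiter=,",     # comma-separated instead of the default tab
        "--print_header",           # include column names as the first row
        "-q", query,                # the report query (rewritten from SAS)
        "-o", output_csv,           # write results to this file
    ]


if __name__ == "__main__":
    import shlex

    report_sql = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    print(shlex.join(impala_report_command("impala.example.com", report_sql,
                                           "/reports/sales_by_region.csv")))
```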
Next, if you need a visualization tool on top of Impala, you can try Tableau or any other tool that can connect to Impala over ODBC/JDBC.
Finally, I think Hadoop + Sqoop + Impala would cover your needs. But I would also recommend looking at MPP databases: since you use SAS, your data is probably well structured, and an MPP database may be a better fit for this case.

How to retrieve and analyse data from a MS SQL Server using Hadoop, Hive and Sqoop?

I want to analyze data stored in a database (MS SQL Server). How can I bring that data into HDFS with Sqoop/Hive? Is that possible with Hive/Sqoop?
Please suggest how we can do it.
Thanks.
Microsoft recently released a SQL Server connector for Sqoop. There are also a few ETL tools, both open source and commercial, that connect SQL Server to Hadoop (Talend, for example).
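A minimal sketch of the Sqoop route: point Sqoop at SQL Server with a Microsoft JDBC URL (`jdbc:sqlserver://host:port;databaseName=db`) and let `--hive-import` load the table into Hive. This assumes the Microsoft JDBC driver jar is on Sqoop's classpath; the host, database, table, split column and password-file path are hypothetical.

```python
def sqlserver_jdbc_url(host: str, database: str, port: int = 1433) -> str:
    """Microsoft JDBC driver URL format: jdbc:sqlserver://host:port;databaseName=db."""
    return f"jdbc:sqlserver://{host}:{port};databaseName={database}"


def sqlserver_to_hive_command(host: str, database: str, user: str, password_file: str,
                              table: str, hive_table: str, split_by: str) -> list[str]:
    """Build a Sqoop import that copies one SQL Server table into HDFS
    and loads it into a Hive table (created if it does not exist)."""
    return [
        "sqoop", "import",
        "--connect", sqlserver_jdbc_url(host, database),
        "--username", user,
        "--password-file", password_file,
        "--table", table,
        # Column used to divide the rows among parallel map tasks.
        "--split-by", split_by,
        "--hive-import",
        "--hive-table", hive_table,
    ]


if __name__ == "__main__":
    import shlex

    print(shlex.join(sqlserver_to_hive_command(
        "mssql.example.com", "Sales", "etl", "/user/etl/.password",
        "Orders", "sales.orders", "OrderID")))
```

Once the table is in Hive, the analysis itself can be done with ordinary HiveQL queries.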
