HBase as a database in a web application - hadoop

A big question about using Hadoop or related technologies in a real web application.
I just want to find out how a web app can use HBase as its database. I mean, is this what big data apps actually do, or do they use normal databases and only bring in these sorts of technologies for analysis?
Is it OK to have an online store with an HBase database, or something like this?

Yes, it is perfectly fine to have HBase as your backend.
Here is what I am doing to get this done (I have an online community and forum running on my website):
1. Writing C# code to access HBase using Thrift; it is very easy and simple to get this done. (Thrift is a cross-language binding platform; for HBase, only Java is a first-class citizen!) A minimal connection sketch appears after this list.
2. Managing the HBase cluster (I have it on Amazon) using Amazon EMR.
3. Using Ganglia to monitor HBase.
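The flow for step 1 is the same in any language Thrift can generate bindings for. Below is a minimal sketch in Java against the Thrift1 bindings generated from HBase's Hbase.thrift; the host name is a placeholder, and it assumes the HBase Thrift server is running (typically started with `hbase thrift start`, default port 9090):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.thrift.generated.Hbase;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftListTables {
    public static void main(String[] args) throws Exception {
        // Placeholder host; 9090 is the usual HBase Thrift server port.
        TTransport transport = new TSocket("hbase-thrift.example.com", 9090);
        transport.open();
        TProtocol protocol = new TBinaryProtocol(transport);
        Hbase.Client client = new Hbase.Client(protocol);

        // List the tables visible through the Thrift gateway.
        for (ByteBuffer name : client.getTableNames()) {
            System.out.println(StandardCharsets.UTF_8.decode(name).toString());
        }
        transport.close();
    }
}
```

The C# version follows exactly the same pattern, just with the C# classes that the Thrift compiler generates from the same IDL file.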
Some extra tips:
You can organize the web application like this:
You can set up your web servers on Amazon Web Services or IBM WebSphere.
You can set up your own HBase cluster using Cloudera, or use Amazon EC2 again here.
Communication between the web server and the HBase master node happens via the Thrift client.
You can generate the Thrift code in your desired programming language.
Here are some links that helped me:
A) Thrift Client
B) Filtering options
Along with this, I refer to the HBase Administration Cookbook by Yifeng Jiang and HBase: The Definitive Guide by Lars George in case I don't get answers on the web.
The filtering options provided by HBase are fast and accurate. Say you use HBase to store your product details: you can have sub-stores, keep a column in your Product table that records which store a product belongs to, and use filters to fetch the products for a specific store, as in the sketch below.
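Here is a minimal sketch of that idea with the native Java client. The `Product` table, the `info:store_id` column, and the store key are made-up names for illustration; on HBase 2.x you would pass a `CompareOperator` instead of the older `CompareOp` shown here:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ProductsByStore {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table products = conn.getTable(TableName.valueOf("Product"))) {

            // Hypothetical schema: column family "info", qualifier "store_id".
            SingleColumnValueFilter byStore = new SingleColumnValueFilter(
                    Bytes.toBytes("info"), Bytes.toBytes("store_id"),
                    CompareOp.EQUAL, Bytes.toBytes("store-42"));
            byStore.setFilterIfMissing(true); // skip rows that have no store_id cell

            Scan scan = new Scan();
            scan.setFilter(byStore);

            try (ResultScanner scanner = products.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```

The filter is evaluated server-side in each region, so only the matching rows travel back to the web tier.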

I think you should read the article below:
"Apache HBase Do’s and Don’ts"
http://blog.cloudera.com/blog/2011/04/hbase-dos-and-donts/

Related

MapR 5.2.2 clients

I have a task that requires me to create a Go program to read from an HBase table.
HBase is installed in a MapR cluster.
Every other application (Java) uses a MapR client to connect to the MapR cluster and retrieve the data.
However, I am unable to find a way to connect to HBase from a Go application.
I have found an HBase package for Go, but it does not support integration with MapR.
It would be great if anyone could guide me in this situation.
I have also seen that MapR 6 and above have Go support through OJAI, but sadly, upgrading MapR is not an option.
Can someone advise me on how to proceed in this situation?
If you are actually running HBase in MapR, then the Go package for HBase should work (assuming version match and such).
If you are actually using MapR DB binary tables (which are roughly HBase compatible), the likely best approach would be to use the Thrift API or REST.
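The REST route is language-agnostic, which is exactly what helps here: any HTTP client, including Go's net/http, just issues plain requests against the gateway. A sketch of the call shape, written in Java to match the other examples in this thread; the gateway host/port, table name, and row key are placeholders, and it assumes an HBase-style REST gateway is actually running in front of the tables (Apache HBase ships one; check whether your MapR release provides an equivalent):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestRowGet {
    public static void main(String[] args) throws Exception {
        // Placeholder gateway, table, and row key; cell values come back base64-encoded in the JSON.
        String url = "http://rest-gateway.example.com:8080/products/row-123";
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```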
The OJAI lightweight client should work well in Go since it uses gRPC to talk to the underlying table (and thus gains lots of portability). The problem in your case is not so much that you would need to upgrade the platform as that the lightweight client only works with MapR DB JSON (the document-oriented version of MapR DB).
Ping me directly if you would like more information.

Do customers usually allow third-party applications in their on-premise Hadoop cluster?

We are building a solution to validate data migrated from a traditional RDBMS to on-premise Hadoop, and also to perform validation of the data after the migration. The validation scripts will compare the data in Hadoop with the data present in the on-premise source systems. I would like to know whether customers would allow our application scripts inside the cluster, or whether we would have to execute the scripts from a remote server where our application will be hosted.
data migrated from traditional RDBMS to on-premise hadoop
Using what tools? Sqoop? Spark? Kafka? NiFi?
Each of those tools is installed alongside your Hadoop cluster, so in that sense, yes, they are "installable". Whether they are "allowed" is up to your Hadoop administrators / architects.
You won't get "24/7 vendor support" if you use a tool you aren't paying for, though.

Accessing Hadoop data using REST service

I am trying to update an HDP architecture so that data residing in Hive tables can be accessed through REST APIs. What are the best approaches to exposing data from HDP to other services?
This is my initial idea:
I am storing data in Hive tables and I want to expose some of the information through a REST API, so I thought that using HCatalog/WebHCat would be the best solution. However, I found out that it only allows querying metadata.
What are the options that I have here?
Thank you
You can very well use WebHDFS, which is basically a REST service over HDFS.
Please see documentation below:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html
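As a quick sketch of what a WebHDFS call looks like, here is a simple-auth LISTSTATUS request; the NameNode host, path, and user name are placeholders, and 50070 is the default HTTP port on the older Hadoop releases the linked doc covers. Keep in mind that WebHDFS returns HDFS file metadata and raw file bytes (e.g. the files under the Hive warehouse path), not Hive query results:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WebHdfsList {
    public static void main(String[] args) throws Exception {
        // Placeholder NameNode host, path, and user; LISTSTATUS returns a JSON FileStatuses document.
        String url = "http://namenode.example.com:50070/webhdfs/v1/apps/hive/warehouse"
                + "?op=LISTSTATUS&user.name=hive";
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }
}
```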
The REST API gateway for the Apache Hadoop ecosystem is called Knox.
I would check it out before exploring any other options. In other words, do you have any reason to avoid using Knox?
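If Knox is in place, the same kind of call goes through the gateway instead of hitting the NameNode directly. A rough sketch, assuming a topology named `default`, basic authentication, and placeholder host, credentials, and path; a self-signed gateway certificate would additionally need to be trusted by the JVM:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class KnoxWebHdfsList {
    public static void main(String[] args) throws Exception {
        // Placeholder gateway URL; Knox proxies the cluster REST APIs under /gateway/<topology>/.
        String url = "https://knox.example.com:8443/gateway/default/webhdfs/v1/apps/hive/warehouse"
                + "?op=LISTSTATUS";
        String auth = Base64.getEncoder().encodeToString("myuser:mypassword".getBytes());
        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode());
        System.out.println(resp.body());
    }
}
```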
What version of HDP are you running?
The Knox component has been available for quite a while and is manageable via Ambari.
Can you get an instance of HiveServer2 running in HTTP mode?
This would give you SQL access through J/ODBC drivers without requiring Hadoop config and binaries (other than those required for the drivers) on the client machines.
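A sketch of that client side, assuming the Hive JDBC driver is on the classpath; the host, database, table, and credentials are placeholders, and 10001 with `cliservice` are the usual HTTP-mode defaults (check your HiveServer2 configuration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveHttpQuery {
    public static void main(String[] args) throws Exception {
        // transportMode=http and the httpPath switch the driver to HiveServer2's HTTP mode.
        String url = "jdbc:hive2://hs2.example.com:10001/default;transportMode=http;httpPath=cliservice";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM my_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
        }
    }
}
```

The calling service only needs the JDBC driver jar, not a full Hadoop client configuration.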

Siebel applications Hadoop connectivity

I would like to understand whether Hadoop has support for Siebel applications; can anybody share their experience doing that? I looked for online documentation and was not able to find any proper link explaining this, so I am posting the question here.
I have a Siebel application running on an Oracle database, and I would like to replace it with Hadoop. Is that possible?
No is the answer.
Basically Hadoop isn't a database at all.
Hadoop is basically a distributed file system (HDFS): it lets you store large amounts of file data on a cluster of machines, handling data redundancy and so on.
On top of that distributed file system, it provides an API for processing all the stored data using something called MapReduce.
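To make the distinction concrete, processing data in Hadoop looks like the classic MapReduce word count below, batch jobs over files rather than the transactional SQL queries a Siebel application issues against Oracle (input and output paths are supplied on the command line):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in every input line.
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```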

What is Facebook's HiPal data analytics tool, and how does it work?

What are all the knowledge management features of Facebook's HiPal data analytics tool, and how does it work? Is it architected purely for a Hadoop environment, or can it be used with other databases?
This is just speculation, as HiPal has not been released to the public.
HiPal is a UI for a SQL-like system called Hive. Hive is a program that allows you to run SQL-like queries on files in the Hadoop file system. Hadoop is a distributed map/reduce architecture used for large (many terabytes) data sets.
But as it's not open source, we can't get our hands on it, and it wouldn't be used with other database systems.
http://www.facebook.com/note.php?note_id=89508453919
Facebook uses Hive (http://borthakur.com/ftp/hadoopworld.pdf) to process data. Hive is a SQL-like framework interface that runs on top of Hadoop, created by the Facebook team themselves and later donated to the Apache community.
They say they analyse 20 PB of data with Hive/Hadoop.
Here is a quick start guide:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
