commercial support for hbase/hdfs - hadoop

I know Cloudera is at the forefront of providing commercial support for Hadoop/HBase/HDFS.
Are there any other vendors who provide this?
-Chinmay

Cloudera provides commercial support for both Hadoop and HBase. There are other vendors, such as Karmasphere, which provide tools. No one else provides support for Hadoop on the scale of Cloudera.
UPDATE: Hortonworks, a Hadoop spinoff from Yahoo, is also entering this space.

Yes, Sematext provides support for HBase and Hadoop/MapReduce/HDFS.

Currently, only Cloudera can provide reliable 24x7 technical support for Hadoop and HBase.
Tokenizer provides commercial development subscription services (Nutch, Hadoop, HBase, Lucene, Solr), including code reviews and critical assessments of your design and architecture, custom builds, bug fixes, technical monitoring, performance tuning, and so on.
There are also some really big new players, such as Platform Computing.

Apart from the above-mentioned vendors, you also have:
Talend
WANdisco
EMC Greenplum
DataStax
Intel

Related

Hortonworks vs Cloudera Architecture Difference

What is the main difference between Hortonworks and Cloudera? As far as I can see, both of them follow the same architecture.
After a good deal of research, I found that there is no difference between Cloudera and Hortonworks. They both follow the same architecture. They take raw Hadoop, do some regression testing, and deliver a honed product for enterprises. No difference. Just two brands for the same product.
Additionally this link explains more.

Installing Hadoop tar file vs Cloudera VM on Ubuntu

I am a complete beginner with Hadoop, and I have seen various posts on the internet which describe installing the Cloudera VM using VMware. Recently I saw a YouTube video which shows how to install Hadoop on Ubuntu by downloading the Hadoop tar file from Apache, without installing the Cloudera VM. My question is:
What is the difference between the two approaches? Is there any benefit to using one over the other?
I want to learn Hadoop by myself and am looking for the best or most widely adopted way to learn it.
Cloudera is "yet another distribution of Hadoop". You can think of basic Hadoop as stock Android on Nexus phones and Cloudera Hadoop as the Android on non-Nexus phones. It's basically a custom-built version.
Cloudera is more of a plug-and-play version, meaning you can download the VM and start playing with Hadoop.
On the other hand, Hadoop on Ubuntu is a get-your-hands-dirty mode where you work on building your own Hadoop.
Personal opinion: I suggest setting up your own Hadoop, since that gives you a better understanding of Hadoop's internals and helps with the learning activities that follow.
Hope it helps. Happy Hadooping!
I spent a lot of time playing with the Cloudera software, and their Quickstart VM is good until you start trying to, e.g., add nodes. It was not designed to do that, but once you have invested time in it, it would be nice to use it as the basis for a real system.
So the next step would be to use CDH (Cloudera's 'proper' Hadoop), Hortonworks' version, HDP, or maybe even MapR (I've not used it).
CDH and HDP have nice GUI features on top of basic Hadoop and are seemingly easier to set up. HOWEVER, I spent a lot of time trying unsuccessfully to get both CDH and HDP to work.
They give red lights and cryptic messages when things go wrong, and they add a layer of obfuscation when you try to fix things. For example, in plain Hadoop you can easily change the configuration files, but in CDH you can't access them directly; you have to discover where Cloudera hides its various options.
I would recommend using plain Hadoop unless you have a big organisation with lots of people and machines.
UPDATE: I have finally got HDP to work and it is really nice. It has a good Ambari GUI, and you can use Zeppelin notebooks to do fancy graphics.
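To illustrate the point about plain Hadoop's configuration files being directly editable: they are small XML files (core-site.xml, hdfs-site.xml and so on) made of name/value properties inside a configuration element. Below is a minimal Python sketch that writes and reads back that layout. The helper names build_site_xml and read_site_xml and the NameNode host are made up for illustration; fs.defaultFS is the Hadoop 2+ name for the default filesystem setting (Hadoop 1.x used fs.default.name), so adjust it for your version.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of settings in the <configuration>/<property>
    layout used by Hadoop's *-site.xml files (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_site_xml(xml_text):
    """Parse the same layout back into a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}


# The host and port below are placeholders, not a real cluster address.
core_site = build_site_xml({"fs.defaultFS": "hdfs://namenode.example.com:8020"})
print(core_site)
print(read_site_xml(core_site))
```

In a plain Hadoop install you would edit such a file by hand in the configuration directory and restart the affected daemons; managed distributions typically generate these files from their own settings store, which is why hand edits there tend to be overwritten.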

Ambari + HDP licensing

I'm experimenting with the deployment and management of a Hadoop cluster, and I found that Ambari is a very useful and convenient tool for the purpose.
Now I'm trying to figure out whether all the licenses of the HDP stack also allow deploying a Hadoop installation in a production environment.
Does anyone have experience using Ambari in production? Is it completely free to use in a production environment as well?
As an employee of Hortonworks myself, I can assure you that there is no licensing that you have to comply with to deploy Ambari, including Ambari as bundled into HDP; http://hortonworks.com/hdp/. Now... we would LOVE for you to buy a support contract! ;-)

Recommendation on multi-node hadoop cluster installation

What would be the best way of installing Hadoop 1.0 (whether it is Apache Hadoop or CDH)? CDH seems to have some kind of installation manager, but somehow I can't find good information on the Web after a couple of hours of searching. I only found documentation about pseudo-distributed mode installation.
Just visit the Cloudera site. They have both the free Cloudera Manager, which is a very good starting point, and a standalone CDH package. They also have a complete set of documentation, such as an installation guide for every version of these products.
Of course, I'd also recommend the Cloudera blog and the official Apache Hadoop site documentation for a better understanding.
I am using Apache Hadoop and have not had many issues, except that I have to resolve any compatibility issues myself when using Hadoop ecosystem components such as Hive, Pig, Sqoop, etc.
Cloudera Manager, on the other hand, takes care of most of these compatibility issues and provides you with a more or less complete package with support.
Hope this helps!

Hadoop Hive web interface options

I've been experimenting with Hive for some data mining activities and would like to make it easily available to less command line orientated colleagues.
Hive does now ship with a web interface (http://wiki.apache.org/hadoop/Hive/HiveWebInterface) but it's very basic at this stage.
My question is: does a visually polished and fully featured interface (either desktop or, preferably, web-based) to Hive exist yet? Are there any open source efforts outside the Hive project working on this?
The new version of Cloudera's Hadoop distribution comes with HUE (Hadoop User Experience), which has a plugin called Beeswax that is most likely all you need.
It's pretty tricky to configure, but once you get past that, it provides something like a phpMyAdmin interface, only much nicer and easier. It supports writing queries, importing data, storing results, etc.
Web-based open source GUIs for Hive
HWI - Shipped with Hive, with basic features.
Hue - Nice query editor with autocompletion. Supports parameterized queries. The latest version includes basic visualization of query results. Includes many other useful tools for managing HDFS, job flows, etc. As a result, it is heavy and a little bit tricky to install and configure.
Zeppelin - Compared to Hue, it only includes the Hive tool. Supports query templates. Has a pluggable visualization architecture and an online archive, so you can easily create custom visualizations and share them. Lightweight and easier to install than Hue, but it does not include any features for non-Hive things.
Other alternatives
Excel - Microsoft Excel is capable of running Hive queries and fetching data from Hive. http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.1/bk_dataintegration/content/ch_using-hive-2.html has a guide for doing it.
Commercial BI tools - Commercial BI tools like Tableau, Datameer and Karmasphere support connections to Hadoop or Hive. They have nice dashboards and charts. They all offer a trial, community or personal edition.
HUE is useful and good, but you should also try "Karmasphere Analyst Free/Community Edition". It is very easy to use and well documented. The free version is very capable. It is not web-based, but it supports several operating systems (Windows, Linux, etc.). You can check the GUI in the documentation to see how it looks.
