MapR 5.2.2 clients - Go

I have a task which requires me to create a Go program that reads from an HBase table.
HBase is installed in a MapR cluster.
Every other application (all Java) uses a MapR client to connect to the MapR cluster and retrieve the data.
However, I am unable to find a way to connect to HBase from a Go application.
I have found an HBase package, but it does not support integration with MapR.
It would be great if anyone could guide me in this situation.
I have also seen that MapR 6 and above have Go support through OJAI, but sadly, upgrading MapR is not an option.
Can someone advise me how to proceed in this situation?

If you are actually running HBase in MapR, then the Go package for HBase should work (assuming version match and such).
If you are actually using the MapR DB Binary tables (which are roughly HBase compatible) the likely best approach would be to use the Thrift API or REST.
The OJAI lightweight client should work well in Go since it uses gRPC to talk to the underlying table (and thus gains lots of portability). The problem in your case isn't so much that you would need to upgrade the platform as that the lightweight client only works with MapR DB JSON (the document-oriented version of MapR DB).
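For the REST route, here is a minimal Go sketch of reading a cell through the HBase REST gateway using only the standard library. The gateway host, table, row key, and column below are made-up placeholders; the response shape follows the gateway's JSON format, in which keys, columns, and values are base64-encoded.

```go
// Sketch: reading a cell from HBase (or a MapR-DB binary table) through the
// HBase REST gateway from Go, standard library only.
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// cellSet mirrors the JSON the REST gateway returns for a row read;
// keys, columns, and values are all base64-encoded.
type cellSet struct {
	Row []struct {
		Key  string `json:"key"`
		Cell []struct {
			Column    string `json:"column"`
			Timestamp int64  `json:"timestamp"`
			Value     string `json:"$"`
		} `json:"Cell"`
	} `json:"Row"`
}

// firstCell decodes the first cell of the first row from a gateway response.
func firstCell(body []byte) (row, column, value string, err error) {
	var cs cellSet
	if err = json.Unmarshal(body, &cs); err != nil {
		return
	}
	if len(cs.Row) == 0 || len(cs.Row[0].Cell) == 0 {
		err = fmt.Errorf("no cells in response")
		return
	}
	dec := func(s string) string {
		b, _ := base64.StdEncoding.DecodeString(s) // sketch: decode errors ignored
		return string(b)
	}
	c := cs.Row[0].Cell[0]
	return dec(cs.Row[0].Key), dec(c.Column), dec(c.Value), nil
}

func main() {
	// In a real program this body would come from, e.g.:
	//   resp, err := http.Get("http://rest-gateway:8080/mytable/row1/cf:q")
	// (with an Accept: application/json header set via http.NewRequest).
	body := []byte(`{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2Y6cQ==","timestamp":1,"$":"dmFsdWUx"}]}]}`)
	row, col, val, err := firstCell(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(row, col, val) // row1 cf:q value1
}
```

Since the gateway speaks plain HTTP + JSON, this avoids any MapR-specific client library on the Go side; the cluster only needs the REST gateway service running.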
Ping me directly if you would like more information.

Related

Pig + Cassandra + Hadoop

I have a Hadoop (2.7.2) setup over a Cassandra (3.7) cluster. I have no problem using Hadoop MapReduce. Similarly, I have no problem creating tables and keyspaces in CQLSH. However, I have been trying to install Pig over Hadoop so as to access the tables in Cassandra (the Pig installation itself is fine), and that is where I'm having trouble.
I have come across numerous websites; most are either for outdated versions of Cassandra or just plain vague.
The one thing I gleaned from this website is that we can access the Cassandra tables in Pig using CqlStorage / CqlNativeStorage. However, in the latest version, it seems this support has been removed (since 2015).
Now my question is, are there any workarounds?
I would mostly be running MapReduce jobs over Cassandra tables and using Pig for querying.
Thanks in advance.
All Pig support was deprecated in Cassandra 2.2 and removed in 3.0: https://issues.apache.org/jira/browse/CASSANDRA-10542
So I think you are a bit out of luck here. You may be able to use the old classes with modern C*, but Pig is very niche right now. SparkSQL is definitely the current favorite child (I may be biased since I work on the Spark + Cassandra Connector) and allows for very flexible querying of C* data.

how to integrate Cassandra with Hadoop to take advantage of Hive

For almost 3 days now I have been looking for a solution (in 2015) to integrate Cassandra with Hadoop; lots of resources on the net are outdated or have vanished, and DataStax Enterprise offers no free-of-charge solution for such integration.
What are the options for doing this? I want to use the Hive query language to get data from my Cassandra, and I think the first step is to integrate Cassandra with Hadoop.
The easiest (but also paid) option is to use the DataStax Enterprise packaging of C* with Hadoop + Hive. This provides an automatic connection and registration of Hive tables with C*, and it includes and sets up a Hadoop execution platform if you need one.
http://www.datastax.com/products/datastax-enterprise
The second easiest way is to utilize Spark instead. The Spark Cassandra Connector is open source and allows HiveQL to be used to access C* tables. This runs on Spark as an execution platform instead of Hadoop but has similar (if not better) performance.
With this solution I would stand up a standalone Spark cluster (since you don't have existing Hadoop infrastructure) and then use the Spark SQL Thrift server to run queries against C* tables.
https://github.com/datastax/spark-cassandra-connector
There are other options, but these are the ones I am most familiar with (and, conflict-of-interest notice, also develop :D).

Difference between MapR-DB and Hbase

I am a bit new to MapR but I am aware of HBase. I was going through a video where I found that MapR-DB is a NoSQL DB in MapR and is similar to HBase. In addition, HBase can also be run on MapR. I am confused between MapR-DB and HBase. What is the exact difference between them?
When should I use MapR-DB and when should I use HBase?
Basically, I have Java code which does a bulk load into HBase on MapR. Now, if I use the same code that I used for Apache Hadoop, will that code work here?
Please help me clear up this confusion.
They are both NoSQL, wide-column stores.
HBase is open source and can be installed as a part of a Hadoop installation.
MapR-DB is a proprietary (not open source) NoSQL database that MapR offers. A core difference that MapR will detail with MapR-DB (along with their file system, which does not use HDFS) is that MapR-DB offers significant performance and scalability advantages over HBase (unlimited tables and columns and a re-architecture, to name a few).
MapR maintains that you can use MapR-DB or HBase interchangeably. I suggest testing both extensively before committing to one vs. the other. You also need to realize that MapR-DB is MapR's proprietary NoSQL HBase equivalent, and if you require support for MapR-DB you'll have to get it from MapR (HBase support can come from any of the other Hadoop distributions as well as the open source community).
Some links you should look at:
http://www.theregister.co.uk/2013/05/01/mapr_hadoop_m7_edition_solr/
https://www.mapr.com/blog/get-real-hadoop-enterprise-grade-nosql#.VVfHuvlVhBc
They are similar but not the same. MapR claims that MapR-DB is faster and more efficient because they have migrated the critical functionality into native C/C++ code while keeping the interface the same. But at the end of the day MapR-DB is proprietary, and you depend on MapR's support for anything that is done differently than in HBase. I didn't like MapR-DB because it's not compatible with Apache Phoenix (HBase coprocessors are not present in MapR-DB), the SQL way of accessing HBase-like NoSQL databases.
Limitations that I have taken from the MapR documentation:
- Custom HBase filters are not supported.
- User permissions for column families are not supported. User permissions for tables and columns are supported.
- HBase authentication is not supported.
- HBase replication is handled with Mirror Volumes.
- Bulk loads using the HFiles workaround are not supported and not necessary.
- HBase coprocessors are not supported.
- Filters use a different regular expression library.
So I second the previous answer: try out your solution on both (MapR-DB vs. HBase) before going too far. I didn't like the very idea of MapR-DB, as it's proprietary and the code is not open source. If a Hadoop distributor enhances Hadoop, they should also make it available to the open source community. Why should one totally rely on commercial support when using open source?

Confusion in Apache Nutch, HBase, Hadoop, Solr, Gora

I am new to all these terms and have spent some time trying to understand them. But I have some confusions. Please correct me if I am wrong.
Nutch: It's for web crawling; using it we can crawl web pages. We can store these web pages somewhere in a DB.
Solr: Solr can be used for indexing web pages crawled by Apache Nutch. It helps in searching the indexed web pages.
HBase: It's used as an interface to interact with Hadoop. It helps in getting data in real time from HDFS. It provides a simple SQL-type interface for interacting.
Hadoop: It provides two functionalities: one is HDFS (the Hadoop Distributed File System) and the other is the Map-Reduce functionality taken from Google's algorithms. It's basically used for offline data backup etc.
Gora and ZooKeeper: I am not sure about these.
Confusions:
1) Is HBase a key-value pair DB or just an interface to Hadoop? Or should I ask: can HBase exist without Hadoop?
If yes, can you explain a bit more about its usage?
2) Is there any use in crawling data using Apache Nutch without indexing it into Solr?
3) For running Apache Nutch, do we need HBase and Hadoop? If not, how can we make it work without them?
4) Is Hadoop part of HBase?
Here is a good short discussion of HBase vs. Hadoop: Difference between HBase and Hadoop/HDFS
Because HBase is built on top of Hadoop you can't really have HBase without Hadoop.
Yes, you can run Nutch without Solr; there do not seem to be many use cases, however, much less living examples in the wild.
Yes, you can run Nutch without Hadoop, but again there don't seem to be a lot of real-world examples of people doing this.
Yes Hadoop is part of HBase, in that there is no HBase without Hadoop, but of course Hadoop is used for other things as well.
ZooKeeper is used for configuration, naming, synchronization, etc. in Hadoop stack workflows. Gora is an in-memory data model and persistence framework built on top of Hadoop.

hbase as database in web application

A big question about using Hadoop or related technologies in a real web application.
I just want to find out how a web app can use HBase as its database. I mean, is that what the big data apps do, or do they use normal databases and just use these sorts of technologies for analysis?
Is it OK to have an online store with an HBase database or something like this?
Yes, it is perfectly fine to have HBase as your backend.
Here is what I am doing to get this done (I have an online community and forum running on my website):
1. Writing C# code to access HBase using Thrift; very easy and simple to get done. (Thrift is a cross-language binding platform; for HBase, Java is the only first-class citizen!)
2. Managing the HBase cluster (I have it on Amazon) using the Amazon EMI
3. Using Ganglia to monitor HBase
Some extra tips:
You can organize the web application like this:
You can set up your web servers on Amazon Web Services or IBM WebSphere.
You can set up your own HBase cluster using Cloudera, or again use Amazon EC2 here.
Communication between the web server and the HBase master node happens via the Thrift client.
You can generate Thrift code in your desired programming language.
Here are some links that helped me:
A) Thrift Client
B) Filtering options
Along with these, I refer to the HBase Administration Cookbook by Yifeng Jiang and the HBase reference guide by Lars George in case I don't get answers on the web.
The filtering options provided by HBase are fast and accurate. Say you use HBase for storing your product details: you can have sub-stores, keep a column in your Product table that tells which store a product belongs to, and use filters to get the products for a specific store.
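As a sketch of that approach, here is how the filter string for such a store column might be built in Go for the REST gateway's scanner endpoint, using a SingleColumnValueFilter. The `info`/`store` column names are invented for illustration, and the exact filter grammar can vary across HBase versions (check the REST documentation for yours); names and values are base64-encoded in the filter JSON.

```go
// Sketch: building the JSON filter string the HBase REST gateway's scanner
// accepts, to fetch only the products belonging to one sub-store.
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

type comparator struct {
	Type  string `json:"type"`
	Value string `json:"value"`
}

type columnValueFilter struct {
	Type       string     `json:"type"`
	Op         string     `json:"op"`
	Family     string     `json:"family"`
	Qualifier  string     `json:"qualifier"`
	Comparator comparator `json:"comparator"`
}

// b64 encodes a string, since the gateway expects base64 names and values.
func b64(s string) string { return base64.StdEncoding.EncodeToString([]byte(s)) }

// storeFilter returns the filter JSON selecting rows whose family:qualifier
// column equals storeID.
func storeFilter(family, qualifier, storeID string) (string, error) {
	f := columnValueFilter{
		Type:       "SingleColumnValueFilter",
		Op:         "EQUAL",
		Family:     b64(family),
		Qualifier:  b64(qualifier),
		Comparator: comparator{Type: "BinaryComparator", Value: b64(storeID)},
	}
	out, err := json.Marshal(f)
	return string(out), err
}

func main() {
	filter, err := storeFilter("info", "store", "store42")
	if err != nil {
		panic(err)
	}
	// This string goes inside the <filter> element of the scanner definition
	// POSTed to the gateway, e.g. http://rest-gateway:8080/Product/scanner
	fmt.Println(filter)
}
```

Because the filter is evaluated server-side, only the matching rows cross the network, which is what makes this pattern fast for per-store product listings.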
I think you should read the article below:
"Apache HBase Do’s and Don’ts"
http://blog.cloudera.com/blog/2011/04/hbase-dos-and-donts/
