Apache Hive on Apache Spark - hadoop

Has anyone worked with this configuration: Apache Hive on Apache Spark?
What are the latest compatible versions for this configuration?
I want to implement this in my production systems. Kindly help with a compatibility matrix for Apache Hadoop, Apache Hive, Apache Spark, and Apache Zeppelin.

You have to use Hive 2 (0.11+) with Spark 2.2.0, and in hive-site.xml you have to set Spark as the execution engine so your queries run on top of Spark.
Hive 2 also offers other options such as Tez and LLAP. For more information, kindly check the document Hive on Spark: Getting Started.
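As a minimal sketch (assuming a YARN-backed Spark deployment; the property names come from the Hive on Spark documentation, the values here are illustrative), the relevant hive-site.xml entries look something like this:

    <!-- Run Hive queries on Spark instead of MapReduce. -->
    <property>
      <name>hive.execution.engine</name>
      <value>spark</value>
    </property>
    <!-- Assumption: Spark runs on YARN; adjust spark.master for your cluster manager. -->
    <property>
      <name>spark.master</name>
      <value>yarn</value>
    </property>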

Follow the tutorial
apache hive installation
and then just copy hive-site.xml to $APACHE_HOME/conf.

Note that Hive is moving to rely on the Tez execution engine; please build all new workloads on MapReduce or Tez rather than on Hive on Spark.

Related

Apache Airflow/Azkaban workflow schedulers' compatibility with Hadoop MRv1

I'm working on a project that relies on Hadoop, but on the MRv1 architecture (Hadoop 1.1.2). I tried the Oozie scheduler for creating (mapred) workflows but eventually gave up, because it is a nightmare to configure and I couldn't get it to work. I was wondering if I should try other workflow schedulers such as Azkaban or Apache Airflow. Would they be compatible with my requirements?

Prometheus Integration with Hadoop (Ozone Cluster)

I am trying to follow the Apache documentation in order to integrate Prometheus with Apache Hadoop. One of the preliminary steps is to set up an Apache Ozone cluster. However, I am running into issues when running the Ozone cluster alongside Hadoop: it throws a class-not-found exception for "org.apache.hadoop.ozone.HddsDatanodeService" whenever I try to start the Ozone Manager or the Storage Container Manager.
I also found that the Ozone 1.0 release is fairly recent and is documented as tested with Hadoop 3.1, while my running Hadoop cluster is version 3.3.0, so I suspect the version mismatch may be the problem.
The tarball for Ozone also ships its own Hadoop config files, but I want to configure Ozone against my existing Hadoop cluster.
Please let me know what the right approach is here. If this cannot be done, then please also let me know a good way to monitor and extract metrics from Apache Hadoop in production.

Log4j issue with CDH4 Hadoop

I have a Hadoop CDH 4.5 cluster managed using Cloudera Manager. I have a custom log4j.properties file with specific configurations, and I added it through Cloudera Manager to the cluster (for each process too, i.e. NameNode, DataNode, JobTracker, TaskTracker). But Hadoop did not pick it up. Has anyone faced this problem before?
Before using CDH4, I used hadoop-0.20.2, and simply having this log4j.properties in the Hadoop configuration was enough for it to be picked up. So is this issue in any way related to Cloudera Manager?

Cassandra and Hadoop

I am new to Cassandra and Hadoop. I am trying to read Cassandra data on an hourly basis and dump it into HDFS. Cassandra and Hadoop are on different clusters. Any pointers on clients/APIs I could use to do this would be much appreciated.
I recommend Java, because Hadoop and Cassandra are both Java-based. Astyanax is a good Java Cassandra client API.
I've used org.apache.hadoop to write to HDFS from Java, but there might be something better out there.
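For illustration, here is a minimal sketch of writing a file to HDFS with the org.apache.hadoop FileSystem API; the NameNode URI and output path are placeholder assumptions, and the rows would come from whatever Cassandra client you choose (e.g. Astyanax):

    // Minimal sketch: write data to HDFS using the org.apache.hadoop FileSystem API.
    // The fs.defaultFS URI and the output path below are illustrative placeholders.
    import java.io.BufferedWriter;
    import java.io.OutputStreamWriter;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsDump {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the remote Hadoop cluster's NameNode (placeholder host/port).
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

            try (FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path("/data/cassandra/hourly-dump.csv"));
                 BufferedWriter writer = new BufferedWriter(
                         new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
                // Rows fetched from Cassandra (e.g. via Astyanax) would be written here, one per line.
                writer.write("row_key,column,value");
                writer.newLine();
            }
        }
    }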

Cassandra with Hive

I am new to Cassandra and Hive. I want to integrate Cassandra with Hadoop-Hive, but how can I integrate Cassandra with Hive?
You're in luck: DataStax just released Brisk, a Cassandra distribution integrating Hadoop and Hive.
http://www.datastax.com/products/brisk
You can look into WSO2 BAM2 to get an idea of Hive-Cassandra integration.
https://svn.wso2.org/repos/wso2/carbon/platform/branches/4.0.0/components/bam2/
You need a Cassandra storage handler library (Java) for Hive.
Here is one: https://github.com/dvasilen/Hive-Cassandra
or mine: https://github.com/2013Commons/hive-cassandra
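For an idea of how such a storage handler is used, here is a sketch of mapping a Hive external table onto a Cassandra column family; the handler class name, keyspace, and property names are assumptions based on typical Hive-Cassandra storage handlers, so check the README of whichever library you pick:

    -- Sketch only: the storage-handler class and property names differ between libraries.
    CREATE EXTERNAL TABLE users (
      user_id string,
      name    string,
      email   string
    )
    STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
    WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key,name,email")
    TBLPROPERTIES (
      "cassandra.ks.name" = "my_keyspace",
      "cassandra.host"    = "127.0.0.1"
    );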
