Spark Hive Warehouse Connector Dependency issues - maven

I am trying to extend my Spark application (Scala 2.11) to read data from HDInsight (HDP) using the Hive Warehouse Connector.
The problem is that I cannot import any version of the dependency required to run queries from Spark against Hive on HDP 3+ from Maven Central:
https://mvnrepository.com/artifact/com.hortonworks.hive/hive-warehouse-connector?repo=hortonworks-releases
Any suggestions for workarounds?
Thanks.

Related

How is HBase packaged in HDP different from Apache HBase

How is HBase packaged in Hortonworks Data Platform (HDP) different from Apache HBase. We use HDP in production but for dev purposes, test with Apache HBase.
What should we do in our code to allow for any differences?
HDP packages all open source components, so there should be no difference.

Does Hadoop 2.8 support an Apache Spark 2.1 cluster?

Could you please let me know whether Apache Hadoop 2.8 is compatible with Apache Spark 2.1.1?
I have already set up a test cluster with Apache Hadoop 2.8 installed, and now we need to install Apache Spark 2.1.1 on top of it.
If it is compatible, please let us know which package to install (and provide the URL).

Google Cloud Dataproc - Spark and Hadoop Version

In the Google Cloud Dataproc beta what are the versions of Spark and Hadoop?
What version of Scala is Spark compiled for?
According to the official announcement:
Today, we are launching with clusters that have Spark 1.5 and Hadoop 2.7.1.
Current Spark version info is listed in the docs. Spark 2.1.0 uses Scala 2.11.
The version of Spark depends on the version of Dataproc in use. Currently that is Dataproc v1.2, which has:
Spark: 2.2.1
Scala: 2.11.8
There are predefined initialization scripts for Dataproc for many frameworks, including Kafka, which has the following versions:
Kafka: 2.11.0.10.1
Kafka Client: 0.10.1

Integrating Nutch on Hortonworks or YARN

I am trying to crawl the web, preferably with Nutch.
I did not find any references on whether Hortonworks supports Nutch out of the box.
Has anyone integrated Nutch on YARN, specifically with Hortonworks HDP?
Or has anyone tried integrating Nutch on Hadoop 2.x (YARN)?
Thanks in advance.
HDP 2.3 doesn't support Nutch out of the box (there is a chart on the HDP website showing supported services: HDP 2.3 What's New). However, it does support the services that Nutch depends on. A custom Ambari Service could be defined and added to the HDP 2.3 stack definition to enable support for Nutch.

How to access Cassandra 2.0.3 from Hive 0.9.0

I have installed Cassandra 2.0.3 and Hive 0.9.0.
I followed the link below for Hive support for Cassandra.
https://github.com/milliondreams/hive
But it says "Cassandra Hive handler working with Cassandra 1.2.6 and hive 0.9", and my Cassandra version is 2.0.3.
Could anyone guide me in detail on how to access Cassandra 2.0.3 from Hive 0.9.0? I am new to Cassandra and Hive.
--
Harry
This Hive handler should also work for Cassandra 2.0, as it is using CQL3.
I have tried it with Shark, not Hive, and found that it does not work with Cassandra 2.0.x, because Spark uses Hadoop 2 and Cassandra 1.2.6 uses an older Hadoop. It can map the table between Shark and Cassandra, but it cannot read data through a Spark process (that requires all Cassandra components to be 2.0.x).
The error is java.lang.InstantiationError: org.apache.hadoop.mapreduce.JobContext.
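For anyone hitting the same InstantiationError: as far as I know, org.apache.hadoop.mapreduce.JobContext is a concrete class in Hadoop 1 and an interface in Hadoop 2, so code compiled against one fails when it runs against the other. A quick way to see which flavour is on your classpath is a small check like the one below. This is only a sketch. The class name JobContextCheck and the helper describe are made up for illustration and are not part of Hadoop or the handler.

```java
public class JobContextCheck {

    // Reports whether the named type is a class or an interface on the
    // current classpath, or "not found" if it cannot be loaded at all.
    static String describe(String typeName) {
        try {
            // initialize=false: inspect the type without running its static initializers
            Class<?> type = Class.forName(
                    typeName, false, JobContextCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // "interface" suggests Hadoop 2 jars, "class" suggests Hadoop 1 jars
        // (assumption stated above); "not found" means no Hadoop on the classpath.
        System.out.println(describe("org.apache.hadoop.mapreduce.JobContext"));
    }
}
```

Run it with the same classpath as the failing job. If it prints "interface" while the handler jar was built for Cassandra 1.2.6, the mismatch described above is the likely cause.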
I have created a project, based on my own work, for Cassandra 2.0.4, Hive 0.11 and Hadoop 2.0.
Try it:
https://github.com/2013Commons/hive-cassandra
