In the Google Cloud Dataproc beta, what are the versions of Spark and Hadoop?
What version of Scala is Spark compiled for?
According to the official announcement:
Today, we are launching with clusters that have Spark 1.5 and Hadoop 2.7.1.
Current Spark version info is listed in the docs. Spark 2.1.0 uses Scala 2.11.
The version of Spark depends on the version of Dataproc in use. Currently it uses Dataproc v1.2, which has:
Spark: 2.2.1
Scala: 2.11.8
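The Scala version matters mostly when picking dependencies, because Scala 2.x libraries are published per binary version (the first two components of the full version) and carry it as an artifact suffix. A minimal sketch in Python, assuming that suffix convention; the helper names are hypothetical and not part of any Dataproc or Spark API:

```python
def scala_binary_version(full_version: str) -> str:
    """Return the Scala 2.x binary version (major.minor) of a full version string."""
    parts = full_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"not a Scala version: {full_version!r}")
    return ".".join(parts[:2])


def artifact_name(base: str, scala_version: str) -> str:
    """Build the cross-versioned artifact name for a library (hypothetical helper)."""
    return f"{base}_{scala_binary_version(scala_version)}"


if __name__ == "__main__":
    # 2.11.8 is the Scala version quoted above for Dataproc v1.2.
    print(artifact_name("spark-sql", "2.11.8"))
```

With the versions listed above, any Scala library added to a job would need to be the `_2.11` build.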
There are predefined initialization scripts for DataProc for many frameworks including Kafka which has the following versions:
Kafka: 2.11-0.10.1 (the Scala 2.11 build of Kafka 0.10.1)
Kafka Client: 0.10.1
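The Kafka entry packs two versions into one string: the Scala binary version the broker was built with, followed by the Kafka release. A rough sketch of splitting it apart, assuming the first two numeric components are always the Scala part; the function name is made up for illustration:

```python
from typing import Tuple


def split_kafka_build(version: str) -> Tuple[str, str]:
    """Split a combined Kafka build string into (scala_binary, kafka_release).

    Assumes the first two numeric components are the Scala binary version
    and the remainder is the Kafka release. Both "2.11-0.10.1" and the
    all-dots spelling "2.11.0.10.1" are accepted.
    """
    parts = version.replace("-", ".").split(".")
    if len(parts) < 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected Kafka build string: {version!r}")
    return ".".join(parts[:2]), ".".join(parts[2:])


if __name__ == "__main__":
    scala, kafka = split_kafka_build("2.11-0.10.1")
    print(f"Scala {scala}, Kafka {kafka}")
```

The Kafka part should line up with the client version listed above.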
Related
How is HBase as packaged in Hortonworks Data Platform (HDP) different from Apache HBase? We use HDP in production but, for dev purposes, test with Apache HBase.
What should we do in our code to allow for any differences?
HDP packages only open source components, so there should be no difference.
Could you please let me know whether Apache Hadoop 2.8 is compatible with Apache Spark 2.1.1?
I have already set up a test cluster with Apache Hadoop 2.8 installed, and now we need to install Apache Spark 2.1.1 on top of it.
If it is compatible, which package should we install? (Please provide the URL here.)
Is it possible to use Elasticsearch 5.x with Apache Flink 1.2.0?
I cannot upgrade Flink to 1.3 because I need version 1.2.0 to run Kafka.
According to this page: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/elasticsearch.html
flink-connector-elasticsearch5_2.10 (Supported since) 1.2.0 (Elasticsearch version) 5.x
This connector should work (since my Flink version is 1.2.0), but when I run it, it doesn't.
Do I need to install Elasticsearch 2.x, or is there some other way to make it work?
Thanks.
The documentation was incorrect, and has been updated to reflect the fact that support for Elasticsearch 5.x was added to Flink after 1.2 -- i.e., it is currently in Flink 1.3-SNAPSHOT.
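To make the version reasoning concrete, here is a small sketch of the check a build script could run before pulling in the connector. It is only an illustration: the lookup table holds a single entry taken from the corrected documentation described above, `-SNAPSHOT` builds are treated as equal to their release (a simplification), and all names are hypothetical rather than Flink APIs.

```python
def parse_version(text: str) -> tuple:
    """Turn "1.2.0" or "1.3-SNAPSHOT" into a comparable tuple of three ints."""
    numeric = text.split("-", 1)[0]  # drop qualifiers such as -SNAPSHOT
    parts = [int(p) for p in numeric.split(".")]
    while len(parts) < 3:
        parts.append(0)  # pad "1.3" to (1, 3, 0)
    return tuple(parts)


# Minimum Flink version per connector. Assumption of this sketch: the one
# entry mirrors the corrected docs (Elasticsearch 5.x support arrives in 1.3).
SUPPORTED_SINCE = {"flink-connector-elasticsearch5": "1.3.0"}


def connector_available(connector: str, flink_version: str) -> bool:
    """Return True if the given Flink version is new enough for the connector."""
    since = SUPPORTED_SINCE[connector]
    return parse_version(flink_version) >= parse_version(since)


if __name__ == "__main__":
    for version in ("1.2.0", "1.3-SNAPSHOT"):
        ok = connector_available("flink-connector-elasticsearch5", version)
        print(version, "supported" if ok else "not supported")
```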
I am trying to crawl the web. Preferably with Nutch.
I did not find any reference saying whether Hortonworks supports Nutch out of the box.
Has anyone integrated Nutch on YARN, specifically with Hortonworks HDP?
Or has anyone tried integrating Nutch with Hadoop 2.x (YARN)?
Thanks in advance.
HDP 2.3 doesn't support Nutch out of the box (There is a chart on the HDP website showing supported services: HDP2.3 What's New). However it does support the services that Nutch depends on. A custom Ambari Service could be defined and added to the HDP 2.3 stack definition to enable support for Nutch.
I have installed Cassandra 2.0.3 and Hive 0.9.0.
I followed the link below for Hive support for Cassandra:
https://github.com/milliondreams/hive
But it says "Cassandra Hive handler working with Cassandra 1.2.6 and hive 0.9", and my Cassandra version is 2.0.3.
Could anyone explain in detail how to access Cassandra 2.0.3 from Hive 0.9.0? I am new to Cassandra and Hive.
--
Harry
This Hive handler should also work for Cassandra 2.0, as it is using CQL3.
I tried it with Shark, not Hive, and found that it does not work with Cassandra 2.0.x, because Spark uses Hadoop 2 while Cassandra 1.2.6 uses Hadoop 1. It can map the table between Shark and Cassandra, but it cannot read data when running through a Spark process (that requires cassandra-all 2.0.x).
The error is java.lang.InstantiationError: org.apache.hadoop.mapreduce.JobContext.
I have created a project, based on my own work, for Cassandra 2.0.4, Hive 0.11 and Hadoop 2.0.
Try it:
https://github.com/2013Commons/hive-cassandra