Not able to install Spark 2.0 on CDH 5.7.5

I am trying to install Spark 2.0 on my CDH 5.7.5 cluster. While doing so, I get the following error:
CDH (lower than 5.12) parcel required for SPARK2 (2.0.0.cloudera1-1.cdh5.7.0.p0.113931) is not available
P.S.: I followed the documentation.

Uncheck Validate Parcel Relations in the parcel configuration.

According to Cloudera support, this is a bug in CDH 5.7 and 5.8. It has been fixed in 5.9 and newer.
Adding to Ruslan's answer: anyone using CDH 5.7 or 5.8 needs to follow this workaround:
under the parcel configuration, uncheck Validate Parcel Relations.
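The version rule from Cloudera support (bug in 5.7 and 5.8, fixed in 5.9+) can be written as a small check. This is a hypothetical helper for illustration only, not part of Cloudera Manager; the function name is made up, and it only encodes the releases mentioned in this thread.

```python
import re


def needs_parcel_validation_workaround(cdh_version: str) -> bool:
    """Hypothetical helper: True if this CDH release is one of those reported
    as affected by the parcel-relation bug (5.7.x and 5.8.x)."""
    match = re.match(r"^(\d+)\.(\d+)", cdh_version)
    if match is None:
        raise ValueError(f"unrecognised CDH version: {cdh_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Per the answer above: affected in 5.7 and 5.8, fixed in 5.9 and newer.
    # Releases outside that range are simply reported as not affected.
    return major == 5 and minor in (7, 8)


if __name__ == "__main__":
    for version in ("5.7.5", "5.8.2", "5.9.0"):
        if needs_parcel_validation_workaround(version):
            print(f"CDH {version}: uncheck 'Validate Parcel Relations'")
        else:
            print(f"CDH {version}: no workaround needed")
```

On 5.9 and newer the check returns False, so parcel relation validation can stay enabled there.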

Related

Does Hadoop 2.8 support an Apache Spark 2.1 cluster?

Could you please let me know whether Apache Hadoop 2.8 is compatible with Apache Spark 2.1.1?
I have already set up a test cluster with Apache Hadoop 2.8 installed, and now we need Apache Spark 2.1.1 installed on top of it.
If so, which package would be best to install? (Please provide the URL.)

Is it possible to run Elasticsearch 5.x with Flink 1.2.0?

Is it possible to run Elasticsearch 5.x with Apache Flink 1.2.0?
I cannot upgrade Flink to 1.3 because I need version 1.2.0 to run Kafka.
According to this page: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/elasticsearch.html
flink-connector-elasticsearch5_2.10: supported since 1.2.0, Elasticsearch version 5.x
This connector should work (since my Flink version is 1.2.0), but when I run it, it doesn't work.
Do I need to install Elasticsearch 2.x, or is there some other way to make it work?
Thanks.
The documentation was incorrect and has been updated to reflect the fact that support for Elasticsearch 5.x was added to Flink after 1.2; i.e., it is currently in Flink 1.3-SNAPSHOT.
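To make the corrected rule concrete, here is a minimal sketch that checks whether a Flink version is new enough for the Elasticsearch 5 connector, assuming the lower bound given in the answer (1.3, including 1.3-SNAPSHOT). The function name is hypothetical, and it only models that lower bound; it does not check whether later Flink releases still ship the connector.

```python
import re


def flink_supports_es5_connector(flink_version: str) -> bool:
    """Hypothetical helper: True if the Elasticsearch 5 connector is available
    for this Flink version, per the answer above (added after 1.2, i.e. 1.3+)."""
    # Accept release and snapshot strings such as "1.2.0" or "1.3-SNAPSHOT".
    match = re.match(r"^(\d+)\.(\d+)", flink_version)
    if match is None:
        raise ValueError(f"unrecognised Flink version: {flink_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles the major/minor ordering in one step.
    return (major, minor) >= (1, 3)


if __name__ == "__main__":
    for version in ("1.2.0", "1.3-SNAPSHOT"):
        print(version, flink_supports_es5_connector(version))
```

For the asker's setup (Flink 1.2.0) the check returns False, which matches the answer: on 1.2 the options are an older Elasticsearch version or upgrading Flink.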

Cannot find Solr in HDP 2.3

I have installed HDP 2.3 but cannot find Solr when trying to add a service.
In the admin panel, under Version, I can see the HDP version as HDP-2.3.0.0-2557, but I can't find Solr when trying to add a service.
Solr is no longer a service in the HDP stack. Instead, it is an add-on that can be downloaded from the HDP-Utils repository and installed separately. Solr is now part of HDP Search; you can read more about it in the HDP documentation.

Google Cloud Dataproc - Spark and Hadoop Version

In the Google Cloud Dataproc beta what are the versions of Spark and Hadoop?
What version of Scala is Spark compiled for?
According to the official announcement:
Today, we are launching with clusters that have Spark 1.5 and Hadoop 2.7.1.
Current Spark version info is listed in the docs. Spark 2.1.0 uses Scala 2.11.
The Spark version depends on the Dataproc version in use. Currently it uses Dataproc v1.2, which has:
Spark: 2.2.1
Scala: 2.11.8
There are predefined initialization scripts for Dataproc for many frameworks, including Kafka, which has the following versions:
Kafka: 2.11.0.10.1
Kafka Client: 0.10.1

Integrating Nutch on Hortonworks or YARN

I am trying to crawl the web, preferably with Nutch.
I did not find any reference to whether Hortonworks supports Nutch out of the box.
Has anyone integrated Nutch on YARN, especially with Hortonworks HDP?
Or has anyone tried integrating Nutch on Hadoop 2.x (YARN)?
Thanks in advance.
HDP 2.3 doesn't support Nutch out of the box (there is a chart on the HDP website showing the supported services: HDP 2.3 What's New). However, it does support the services that Nutch depends on. A custom Ambari service could be defined and added to the HDP 2.3 stack definition to enable support for Nutch.
