Spring YARN @OnContainerStart - how to invoke a Mapper?

I'm using the Spring YARN package with Spring Boot and I'm trying to figure out how I can start a Mapper from the @OnContainerStart event. How do I pass arguments to the mapper? How do I configure which mapper/reducer to use? I'm trying to follow this guide.
Thanks.

I believe you're trying to create a simple Apache Hadoop MapReduce application, and Spring YARN is not meant for that.
To develop MapReduce jobs using Spring, you can check our reference documentation, which can be found under Spring for Apache Hadoop.
Spring YARN is a framework for developing applications which can then run on top of Apache Hadoop YARN, not on top of MapReduce. It is easy to misunderstand this because Hadoop YARN is so broadly used as a synonym for Apache Hadoop MapReduce V2. The new MapReduce V2 is actually just a YARN application running on YARN, which is Hadoop's new resource scheduling framework.
Having said that, if you want to run something entirely different from MapReduce jobs on YARN, then Spring YARN would be the right candidate for that.
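To illustrate the distinction, a Spring YARN container's entry point is typically a bean annotated with @YarnComponent whose @OnContainerStart method does arbitrary work, rather than a MapReduce Mapper. A minimal sketch (the class name and printed message are illustrative, not from the question):

```java
import org.springframework.yarn.annotation.OnContainerStart;
import org.springframework.yarn.annotation.YarnComponent;

// A minimal Spring YARN container component. The work done inside
// run() is up to the application -- it is not a MapReduce Mapper.
@YarnComponent
public class HelloContainer {

    // Invoked once when the YARN container starts.
    @OnContainerStart
    public void run() {
        // Arguments are usually carried in the container's launch
        // context (environment variables / command-line properties)
        // and read via standard Spring property injection or
        // System.getenv(), not via a Mapper's Context object.
        System.out.println("Container started");
    }
}
```

This sketch requires the Spring YARN dependencies on the classpath and a packaged YARN application to actually run; it is shown only to clarify the programming model.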

Related

How is HBase packaged in HDP different from Apache HBase

How is HBase packaged in Hortonworks Data Platform (HDP) different from Apache HBase? We use HDP in production, but for dev purposes we test with Apache HBase.
What should we do in our code to allow for any differences?
HDP packages the stock open-source components. There should be no difference.

Spring XD and Oracle RDBMS

I am a newbie to Spring XD. I have a requirement to parse files in a specific directory and push the output into an Oracle table.
Can I achieve that with Spring XD?
If not directly, can I create a Spring Boot/Integration/Batch application and deploy it on Spring XD?
Thanks,
Pratik
If I understood correctly, you want to deploy an ETL process on Spring XD.
Yes, this is possible; in fact, I once worked on exactly that.
Spring Integration works fine with Spring XD; what you need to do is adjust some parts of the context.
You can also use the task module in Spring XD to do ETL.
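For the file-to-Oracle case specifically, a stream built from the out-of-the-box file source and jdbc sink may already be enough. A sketch in the Spring XD shell (the stream name, directory, table name, and connection details are all assumptions/placeholders):

```shell
# Watch files dropped into /data/in and insert each payload into an
# Oracle table via the jdbc sink (URL and credentials are placeholders;
# the Oracle JDBC driver must be on the Spring XD classpath).
xd:> stream create --name fileToOracle --definition "file --dir=/data/in | jdbc --url=jdbc:oracle:thin:@//dbhost:1521/ORCL --username=scott --password=secret --tableName=payload_table" --deploy
```

If the parsing logic is more involved than what the built-in modules offer, that is where a custom module (e.g. your Spring Integration/Batch code) fits in, as discussed below.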
Thanks for the inputs folks!
Well, I have learned the Spring Batch framework and I am building tasklets to manage the workflow for my use case.
Further, I deployed the resultant JAR of the Spring Batch app to Spring XD as a custom module and scheduled it for a daily run.
I learned from my research on this use case that Spring XD is a superset that includes the Spring Batch, Integration, and Boot frameworks, so
I kept the involvement of Spring XD limited to scheduling a batch job by adding a custom module.
Thanks a lot for your support.
Cheers & Happy Architecting,
Pratik

Integration of Hortonworks HDP with Mesos

I have to integrate HDP with Mesos. I don't want to do it with Cloudbreak, because it's not a mature project. Are there any other ways to integrate HDP with Mesos?
See Apache Myriad (incubating) at http://myriad.incubator.apache.org/

Google Cloud Dataproc - Spark and Hadoop Version

In the Google Cloud Dataproc beta what are the versions of Spark and Hadoop?
What version of Scala is Spark compiled for?
According to the official announcement:
Today, we are launching with clusters that have Spark 1.5 and Hadoop 2.7.1.
Current Spark version info is listed in the docs. Spark 2.1.0 uses Scala 2.11.
The version of Spark depends on the Dataproc image version in use; the current image, Dataproc v1.2, has:
Spark: 2.2.1
Scala: 2.11.8
There are predefined initialization scripts for Dataproc for many frameworks, including Kafka, which has the following versions:
Kafka: 2.11-0.10.1 (Scala 2.11, Kafka 0.10.1)
Kafka Client: 0.10.1
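Because the bundled Spark/Hadoop/Scala versions track the Dataproc image version, the practical approach is to pin the image version at cluster creation and consult the docs' version matrix. A sketch (the cluster name is a placeholder):

```shell
# Pin the Dataproc image version so the bundled Spark/Hadoop/Scala
# versions are fixed and predictable; image 1.2 bundles Spark 2.2.x.
gcloud dataproc clusters create my-cluster --image-version 1.2

# On a cluster node, confirm the Spark version directly.
spark-submit --version
```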

Integrating Nutch on Hortonworks or YARN

I am trying to crawl the web, preferably with Nutch.
I did not find any references saying whether Hortonworks supports Nutch out of the box.
Has anyone integrated Nutch on YARN, specifically with Hortonworks HDP?
Or has someone tried integrating Nutch on Hadoop 2.x (YARN)?
Thanks in advance.
HDP 2.3 doesn't support Nutch out of the box (there is a chart on the HDP website showing supported services: HDP 2.3 What's New). However, it does support the services that Nutch depends on. A custom Ambari Service could be defined and added to the HDP 2.3 stack definition to enable support for Nutch.
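Independently of Ambari integration, Nutch 1.x already runs on Hadoop/YARN: building the `runtime/deploy` target produces a job jar that is submitted through the normal Hadoop client, so on HDP you mainly need working HDFS/YARN client configuration on the node you launch from. A sketch (the crawldb and seed-URL paths are assumptions):

```shell
# In the Nutch 1.x source tree, build the deployable runtime; this
# produces runtime/deploy with a .job jar bundling Nutch and its deps.
ant runtime

# From runtime/deploy, Nutch commands submit jobs to the cluster
# (here: inject seed URLs from HDFS into the crawl database).
cd runtime/deploy
bin/nutch inject crawl/crawldb urls/
```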
