Hadoop Ecosystem and its workflow [closed]

In what sequential order should the services be started when we have Oozie, HDFS, Hive, ZooKeeper, and other tools in the Hadoop ecosystem?
It is an administrative question posed by my superiors.

There are multiple possibilities depending on the deployment scenario. For example, if you run highly available Hadoop (NameNode high availability, ResourceManager/JobTracker high availability) or have HBase in the cluster, then the order would be something like this:
ZooKeeper
HDFS
HBase (if used)
YARN/MapReduce
Other ecosystem tools (Hive, Pig, Sqoop, Oozie)
The start order of the ecosystem tools themselves does not matter.
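To make that order concrete, here is a minimal sketch in Go that runs the service start scripts in the sequence above and stops as soon as a dependency fails to come up. The paths are placeholders for wherever your distribution installs these scripts, so adjust them to your cluster.

    // start-cluster.go: a minimal sketch, not a real init system.
    // It starts Hadoop service scripts in dependency order and refuses
    // to start dependents once a dependency fails.
    package main

    import (
        "fmt"
        "os/exec"
    )

    func main() {
        // Order mirrors the answer above: ZooKeeper, HDFS, HBase (if used),
        // YARN, then the other ecosystem services such as Oozie.
        startCommands := [][]string{
            {"/opt/zookeeper/bin/zkServer.sh", "start"},
            {"/opt/hadoop/sbin/start-dfs.sh"},
            {"/opt/hbase/bin/start-hbase.sh"},
            {"/opt/hadoop/sbin/start-yarn.sh"},
            {"/opt/oozie/bin/oozied.sh", "start"},
        }

        for _, cmd := range startCommands {
            fmt.Println("starting:", cmd[0])
            if out, err := exec.Command(cmd[0], cmd[1:]...).CombinedOutput(); err != nil {
                fmt.Printf("failed to start %s: %v\n%s\n", cmd[0], err, out)
                return // do not start dependents of a failed service
            }
        }
        fmt.Println("all services started")
    }

In practice, a management tool such as Apache Ambari or Cloudera Manager enforces this start order for you.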

Related

HortonWorks or Cloudera certification [closed]

Hortonworks and Cloudera have now merged, so which certification would be better to take? Please share your thoughts on this.
In my opinion it depends on your present situation and career goals. For example, if you are working (or looking to work) as an HDP administrator, no Cloudera certification would be helpful.
Hortonworks certification exams are being re-branded as 'Cloudera HDP certifications', since there is a definite plan to maintain HDP in production for a few years.

Writing an Apache ZooKeeper-like service in Go [closed]

I want to write a very simple (but fully functional) Apache ZooKeeper-like service in Go. Where do I start?
I think you should first understand what ZooKeeper does, e.g. storing configuration information, naming, and providing distributed synchronization; you also need to know a consensus protocol such as Paxos, and so on.
Please refer to etcd on GitHub.
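As a concrete starting point, below is a minimal single-node sketch: an in-memory key/value store exposed over HTTP, which covers only the "store small pieces of coordination data under paths" part of ZooKeeper. The hard parts that ZooKeeper and etcd actually solve (replication, watches, ephemeral nodes, consensus via ZAB or Raft/Paxos) are deliberately missing, and the port and handler layout are just illustrative choices.

    // kvserver.go: a toy, single-node coordination store.
    // GET /some/path returns the stored value; PUT /some/path stores the body.
    package main

    import (
        "io"
        "log"
        "net/http"
        "sync"
    )

    type store struct {
        mu   sync.RWMutex
        data map[string][]byte // path -> value, e.g. "/services/web1" -> "10.0.0.5"
    }

    func (s *store) handle(w http.ResponseWriter, r *http.Request) {
        switch r.Method {
        case http.MethodGet:
            s.mu.RLock()
            v, ok := s.data[r.URL.Path]
            s.mu.RUnlock()
            if !ok {
                http.NotFound(w, r)
                return
            }
            w.Write(v)
        case http.MethodPut:
            body, err := io.ReadAll(r.Body)
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            s.mu.Lock()
            s.data[r.URL.Path] = body
            s.mu.Unlock()
        default:
            http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
        }
    }

    func main() {
        s := &store{data: make(map[string][]byte)}
        http.HandleFunc("/", s.handle)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

You can exercise it with curl -X PUT -d 10.0.0.5 localhost:8080/services/web1 and then curl localhost:8080/services/web1. Getting from here to "fully functional" mostly means layering watches and a consensus protocol on top.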

Is there a good ETL framework for data warehouse in Hadoop [closed]

I have investigated Oozie and Azkaban, but I think they are only used to schedule jobs.
A data warehouse often needs a large number of jobs to be scheduled; is there a good framework for that?
You can use the Pentaho Data Integration tool. Check this out: http://www.pentaho.com/product/data-integration
You may also check Talend for data integration in a Hadoop-based warehouse. It offers graphical tools to create data integration flows between the Hadoop components, and it is open source too.
Please see http://www.talend.com/resource/hadoop-tools.html

How does a search script like InoutScripts' Inout Spider attain scalability? [closed]

I want to know more about how search engine scripts like InoutScripts' Inout Spider attain scalability.
Is it because of the technology they are using?
Do you think it is because of the combination of Hadoop and Hypertable?
Hadoop is an open-source software framework used for storing and large-scale processing of data sets on clusters of commodity hardware. It is an Apache top-level project built and used by a global community of contributors and users. Rather than relying on hardware to deliver high availability, Hadoop detects and handles failures at the application layer itself.

Hadoop Oozie Like Projects [closed]

Oozie is a workflow/coordination engine for orchestrating Hadoop jobs, where Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
What other Oozie-like workflow engines are there for managing job chains on a cluster?
Is there a generic Oozie-like workflow engine that is capable of orchestrating jobs on any cluster through a plug-in or something similar?
Although I personally prefer Oozie, you could also check out Azkaban.
Hortonworks' NiFi is an awesome workflow engine which does what you are asking.
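To show the core idea these engines share (a workflow is a DAG of named actions, and an action runs only after all of its dependencies have completed), here is a minimal, hypothetical sketch; the action names and bodies are made up and stand in for real Hadoop jobs.

    // workflow.go: a toy DAG executor, illustrating the Oozie idea only.
    package main

    import "fmt"

    type action struct {
        name string
        deps []string
        run  func() error
    }

    // runWorkflow executes actions in topological order and fails fast.
    func runWorkflow(actions []action) error {
        done := map[string]bool{}
        remaining := len(actions)
        for remaining > 0 {
            progressed := false
            for _, a := range actions {
                if done[a.name] {
                    continue
                }
                ready := true
                for _, d := range a.deps {
                    if !done[d] {
                        ready = false
                        break
                    }
                }
                if !ready {
                    continue
                }
                if err := a.run(); err != nil {
                    return fmt.Errorf("action %s failed: %w", a.name, err)
                }
                done[a.name] = true
                remaining--
                progressed = true
            }
            if !progressed {
                return fmt.Errorf("cycle or missing dependency in workflow")
            }
        }
        return nil
    }

    func main() {
        wf := []action{
            {name: "import", deps: nil, run: func() error { fmt.Println("run sqoop import"); return nil }},
            {name: "transform", deps: []string{"import"}, run: func() error { fmt.Println("run hive query"); return nil }},
            {name: "export", deps: []string{"transform"}, run: func() error { fmt.Println("export results"); return nil }},
        }
        if err := runWorkflow(wf); err != nil {
            fmt.Println(err)
        }
    }

Real engines add everything this sketch omits: persistence, retries, time- and data-based triggers, and distributing the actions across the cluster.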
