Does the Mesos master store information about frameworks or agents?

I need to know whether the Mesos master manages any state information itself, such as the number of slaves or frameworks, or whether it leverages Zookeeper for all of that information.

Mesos stores cluster data in memory and in a so-called replicated log. If you are curious what exactly is persisted across Mesos master failovers, check the Registry protobuf. Everything else, e.g. allocation information and agent state, is restored from the cluster via re-registering agents and frameworks.
Zookeeper is used for leader election only; Mesos does not store any data there. However, some Mesos frameworks, e.g. Marathon, may use Zookeeper as persistent storage. Such a Zookeeper cluster is often configured separately to avoid any interference with Mesos.
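To see what that means in practice, you can list the election znodes that Mesos masters create. Below is a minimal sketch in Java, assuming the default /mesos znode and a Zookeeper ensemble reachable at localhost:2181:

```java
import org.apache.zookeeper.ZooKeeper;
import java.util.List;

// Minimal sketch: list the znodes Mesos masters create for leader election.
// Assumes a Zookeeper ensemble at localhost:2181 and the default /mesos path.
public class InspectMesosZnode {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> {});
        // Each candidate master registers an ephemeral sequential znode here;
        // the one with the lowest sequence number is the leader. No cluster
        // state (agents, frameworks, tasks) lives under this path.
        List<String> children = zk.getChildren("/mesos", false);
        children.forEach(System.out::println);
        zk.close();
    }
}
```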

Related

Role of Zookeeper in Hadoop

I understand, based on the slides, that in the context of Hadoop, Zookeeper is used for storing information about the master, the status of different tasks, which worker is working on which partition, and also which workers are available.
Why is Zookeeper used for this metadata storage? Any data store could be used, right?
For instance, Celery can be configured with any result backend: Redis, Mongo, etc. So in practice Hadoop could use any storage backend, right? But why Zookeeper?
This doc suggests that Redis, SQLite, MySQL, and PostgreSQL can be used for Celery task result storage:
https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/index.html
Zookeeper's ZAB protocol is utilized for leader election as well as distributed locks.
It is not simply a datastore, and no, not just any datastore can be used.
Celery isn't used within the Hadoop ecosystem, so I'm not sure how that's relevant to the question.
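To make the distinction concrete, here is a minimal sketch of a distributed lock built on Zookeeper via Apache Curator's InterProcessMutex recipe (the connect string and lock path are illustrative). An ordinary key-value store does not give you this coordination primitive with the same consistency guarantees:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Minimal sketch: a cluster-wide mutual-exclusion lock backed by Zookeeper.
public class ZkLockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",          // illustrative ensemble
                new ExponentialBackoffRetry(1000, 3)); // retry policy
        client.start();

        InterProcessMutex lock = new InterProcessMutex(client, "/locks/partition-assignment");
        lock.acquire();  // blocks until this process holds the lock
        try {
            // Critical section: only one process cluster-wide runs this at a time.
            System.out.println("holding the lock");
        } finally {
            lock.release();
        }
        client.close();
    }
}
```

The recipe is built on ephemeral sequential znodes, so if a lock holder crashes, its session expires and the lock is released automatically.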

Namenode with high availability vs. Zookeeper-based leader selection

I am reading two different things in the Apache Hadoop documentation and Cloudera's documentation.
Based on Cloudera, we should set up the namenode in high-availability mode, i.e. by defining primary and secondary namenodes, but based on the Hadoop documentation, this should be automatically taken care of by Zookeeper, and it should decide the namenode from among the available datanodes.
Can anyone explain the difference and which one to use?
by defining primary and secondary namenodes
There is such a thing as a "secondary namenode", but it is actually a very different thing: it is not a standby and is not able to become active.
There is no "vs": Namenode HA needs Zookeeper.
If you read more of the Cloudera documentation it doesn't fail to mention Zookeeper.
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).
Cloudera doesn't package many extras, if any, on top of the core Hadoop functions.
Regarding your question...
this should be automatically taken care of by Zookeeper
The failover is automatic if the HDFS Zookeeper properties are (manually) configured, Zookeeper is running, and the active Namenode goes down.
among the available datanodes
The operation has nothing to do with datanodes; failover happens between the namenodes.
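As a rough sketch of what "(manually) configured" means, automatic failover is typically enabled with properties along these lines (host names are illustrative):

```xml
<!-- hdfs-site.xml: turn on automatic failover for the nameservice -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: the Zookeeper quorum the ZKFC daemons talk to -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```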

Does Hadoop itself contain fault-tolerance/failover functionality?

I just installed a new version of Hadoop 2. I wish to know: if I configure a Hadoop cluster and it is brought up, how can I know whether data transmission has failed and there is a need for failover?
Do I have to install other components like Zookeeper to track/enable any HA events?
Thanks!
High availability is not enabled by default. I would highly encourage you to read the Hadoop documentation from Apache (http://hadoop.apache.org/). It will give an overview of the architecture and the services that run on a Hadoop cluster.
Zookeeper is required for many Hadoop services to coordinate their actions across the entire Hadoop cluster, regardless of whether the cluster is HA or not. More information can be found in the Apache Zookeeper documentation (http://zookeeper.apache.org/).
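For reference, a minimal three-node Zookeeper ensemble is configured with a zoo.cfg roughly like the following on each node (host names are illustrative):

```
# zoo.cfg: minimal three-node ensemble
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

Each node additionally needs a myid file in dataDir containing its own server number (1, 2, or 3).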

What are the differences between Zookeeper, Journal Nodes, and the Quorum Journal Manager in Hadoop?

After studying the material on a number of websites and in videos, I am confused about the functionalities of, and differences in purpose between, the three Hadoop components ZooKeeper, the Journal Node, and the Quorum Journal Manager.
Could anyone please explain the reasons for inventing each of the above, and the differences in their purposes and functionalities?
Thanks in advance.
Think of it like this: Zookeeper is a group of people, each assigned to watch over a factory and coordinate it; a journal node is a place where all factory managers can check each other's status and coordinate. QJM is a combination of both, used in HA for better coordination in case of failover.
Zookeeper coordinates HBase regionservers and the other Hadoop modules that require Zookeeper.
Journal nodes coordinate the active and standby namenodes by storing the shared edit log.
QJM coordinates the namenodes' writes to the journal nodes using a majority-based technique.
In a core Hadoop setup, only the journal nodes are necessary in the case of a distributed HA setup.
Firstly, quorum means that a majority is needed for decisions. So when you see the word "quorum" you should think of a clustered, that is, multi-host configuration. You will hear this term for both Zookeeper and Journal Nodes.
A short description of their functionalities will help you distinguish their purposes.
Zookeeper: Zookeeper is the central synchronisation service for information that applications need to check frequently. Applications may need many kinds of information, such as naming structure and configuration information (or simply configurations). The most common case is application configuration. When you change a config that relates to, let's say, 80 servers, you need a synchronisation service to propagate that change to all nodes. The application itself may have this feature, but imagine you add another 12 applications to your environment: you would need to take care of each application's synchronisation service one by one. This is where Zookeeper comes in; it can handle the management of all this information by itself. If you set it up as a cluster, you will have high availability for Zookeeper (failover cases) and a Zookeeper quorum. (Why an odd number of hosts? A quorum of 2f+1 nodes tolerates f failures, so an even number adds no extra fault tolerance.)
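A minimal sketch of that configuration-synchronisation pattern in Java, assuming an illustrative /app/config znode: every application instance reads its config from the znode and registers a watch, so a single write fans out to all nodes:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch: watch a config znode so every node sees updates.
// The connect string and the /app/config path are illustrative.
public class ConfigWatcher implements Watcher {
    private final ZooKeeper zk;

    public ConfigWatcher() throws Exception {
        zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 10_000, this);
        readConfig();
    }

    private void readConfig() throws Exception {
        // Passing `this` registers a watch; Zookeeper watches fire only once,
        // so the watch must be re-armed after every read.
        byte[] data = zk.getData("/app/config", this, null);
        System.out.println("config: " + new String(data));
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                readConfig(); // re-read the new config and re-arm the watch
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
```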
Journal Node: In a high-availability Hadoop cluster you have more than one namenode, running in active/standby mode. The active namenode informs the journal nodes of changes; the standby namenode asks the journal nodes what has changed. As with Zookeeper, if you set them up in a cluster configuration (again with an odd number of hosts, for the same majority reason), you have high availability for the Journal Node features as well, and you have a Quorum Journal Manager.
Actually, I have not heard of them being set up as a single host or node except for lab purposes (a VM on a PC).
1. Zookeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
Role of Zookeeper in the Hadoop ecosystem:
During the Hadoop Namenode failover process, ZooKeeper is used to avoid a split-brain scenario, so that the namenode state does not diverge because of the failover.
Refer to this post for more details:
How does Hadoop Namenode failover process works?
2. JournalNode (used in the Namenode failover process)
In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called “JournalNodes” (JNs).
JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager.
Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This allows the system to tolerate the failure of a single machine (in general, 2f+1 JNs tolerate f failures).
3. Quorum Journal Manager (QJM): allows edit logs to be shared between the active and standby NameNodes.
Importantly, when using the Quorum Journal Manager, only one NameNode will ever be allowed to write to the JournalNodes, so there is no potential for corrupting the file system metadata in a split-brain scenario.
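For a concrete picture, the NameNodes are pointed at the JournalNode quorum through a single shared-edits URI, roughly like this (host and nameservice names are illustrative):

```xml
<!-- hdfs-site.xml: the active NN writes edits to, and the standby reads
     edits from, this JournalNode quorum via the QJM -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
```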

Need help regarding Storm

1) What happens if Nimbus fails? Can we convert some other node into a Nimbus?
2) Where is the output of a topology stored? When a bolt emits a tuple, where is it stored?
3) What happens if Zookeeper fails?
Nimbus is itself a failure-tolerant process, which means it doesn't store its state in memory but in an external database (Zookeeper). So if Nimbus crashes (an unlikely scenario), on the next start it will resume processing just where it stopped. Nimbus usually must be set up to be monitored by an external monitoring system, such as Monit, which will check the Nimbus process state periodically and restart it if any problem occurs. I suggest you read the Storm project's wiki for further information.
Nimbus is the master node of a Storm cluster, and it isn't possible to have multiple Nimbus nodes. (Update: as of 5/2014, the Storm community is actively working on making the Nimbus daemon fault tolerant in a failover manner, by having multiple Nimbuses heartbeat each other.)
The tuple is "stored" in the tuple tree, and it is passed to the next bolt in the topology execution chain as topology execution progresses. As for physical storage, tuples are probably held in an in-memory structure and serialized as necessary to be distributed among the cluster's nodes. The complete Storm cluster state itself is stored in Zookeeper. Storm doesn't concern itself with persistent storage of a topology's or a bolt's output; it is your job to persist the results of the processing.
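As a minimal sketch of that last point (class, field, and stream names are illustrative; the Storm 2.x API is assumed): a bolt anchors its emitted tuple to the input so it joins the tuple tree, and persisting the result is left to your own code:

```java
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.Map;

// Minimal sketch of a bolt: the emitted tuple is anchored to the input,
// joining the tuple tree; Storm itself never persists the output.
public class UppercaseBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext ctx, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        // Anchored emit: ties the new tuple to `input` in the tuple tree.
        collector.emit(input, new Values(word.toUpperCase()));
        // If the result must survive, write it to your own datastore here.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("upper"));
    }
}
```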
As with Nimbus, Zookeeper in a real, production Storm cluster must be configured for reliability, and for Zookeeper that means having an odd number of Zookeeper nodes running on different servers. You can find more information on configuring a Zookeeper production cluster in the Zookeeper Administrator's Guide. If Zookeeper were to fail (although that is a highly unlikely scenario in a properly configured Zookeeper cluster), the Storm cluster wouldn't be able to continue processing, since all of the cluster's state is stored in Zookeeper.
Regarding question 1), this bug report and a subsequent comment from Storm author and maintainer Nathan Marz clarify the issue:
Storm is not designed for having topologies partially running. When you bring down the master, it is unable to reassign failed workers. We are working on Nimbus failover. Nimbus is fault-tolerant to the process restarting, which has made it fault-tolerant enough for our and most people's use cases.
