OpenDaylight OSGi system bundles went down automatically when a node is quarantined

We are using the Nitrogen version of ODL and are testing a 2-node cluster. During our testing we observed the following:
A split brain occurred between the nodes.
The Akka actor system on each node quarantined its peer.
After quarantining, the system bundle was stopped automatically, and all dependent bundles were also stopped and restarted.
We noticed the following code is triggered when the nodes are quarantined (see apply() in the ActorSystemProvider source code):
bundleContext.getBundle(0).stop();
Could you please answer the following queries:
What is the reason for shutting down the system bundle, and why is it done on both nodes?
Is it possible to disable the shutdown and restart of the system bundle?

When Akka quarantines a node, it won't let it back into the cluster until the actor system is restarted. That essentially means restarting ODL, hence we restart the Karaf container. It's quirky, but unfortunately that is the way Akka is designed and works, so there's no other choice (at least not that I know of).
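For illustration, here is a minimal Java sketch (not the actual ODL code; the class and method names are made up for this example) of what that apply() hook effectively does: once the local actor system has been quarantined, stopping bundle 0 (the OSGi system bundle) shuts the whole framework down, and the Karaf launcher then restarts the container, bringing the actor system back with a fresh identity.

import org.osgi.framework.Bundle;
import org.osgi.framework.BundleContext;
import org.osgi.framework.BundleException;

public final class QuarantineRestartHandler {

    private final BundleContext bundleContext;

    public QuarantineRestartHandler(BundleContext bundleContext) {
        this.bundleContext = bundleContext;
    }

    // Invoked (by whatever detects the quarantine) once the local actor system
    // has been quarantined by its peer.
    public void onQuarantined() {
        try {
            // Bundle 0 is always the system bundle; stopping it stops every other
            // bundle and shuts the framework down, after which the Karaf launcher
            // restarts the container.
            Bundle systemBundle = bundleContext.getBundle(0);
            systemBundle.stop();
        } catch (BundleException e) {
            throw new IllegalStateException("Failed to stop the system bundle", e);
        }
    }
}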

Related

Session Persistence Hazelcast client initialization when server is offline

We are trying to replicate the WebSphere Traditional (5/6/7/8/9) behaviour around session persistence for servlets and HTTP, but with Hazelcast and Tomcat. Let me explain...
WebSphere, even when configured as a client to a replication domain, keeps a local register of session data. And this local register works fine even if the server processes that should keep the replicated data are shut down from the very first moment. That is, you start the client, and session persistence works within the servlet container. Obviously, you cannot expect to recover your session in another servlet container if the first one crashes, but your applications work anyway.
On the other hand, a Hazelcast client on a Tomcat container expects the Hazelcast server (at least one member of the cluster) to be up and running in order to initialize. If no cluster member is available, initialization fails, and ... the web applications in the Tomcat servlet container do not start right. They won't answer any request.
Furthermore, once initialization fails, the only way to recover is to shut down and restart the Tomcat web containers (once a Hazelcast cluster member is online).
This behaviour is a bit harsh on system administrators: no one can guarantee that a backup service such as distributed session persistence is online all the time. That means that launching a Tomcat client becomes a risky task, with a single point of failure by design, which is undesirable.
Now, maybe I overlooked something, maybe I got something wrong. So, did anyone ever manage to start a Hazelcast client without servers, and how? For us, the difference is decisive: if we cannot make the web container start with the Hazelcast server offline, then we must keep going with WebSphere.
We have been trying it on CentOS 7.5 on VirtualBox 5.2.22, and our Tomcat version is 8.5. The Hazelcast client and server are 3.11.1/2.
<group>
    <name>Integracion</name>
    <password></password>
</group>
<network>
    <cluster-members>
        <address>hazelcastsrv1</address>
        <address>hazelcastsrv2</address>
    </cluster-members>
</network>
Sadly, we expect exactly what we get: reading the Hazelcast manual suggests that offline servers won't allow Tomcat to serve applications. But we cannot believe what we read, because it makes the library unsafe in a distributed context. We expect to be wrong, and that there is good news around the corner.
Hazelcast is not "a single point of failure by design". The design is to avoid a single point of failure. Data is mirrored across the nodes by default.
It's a data grid, you run as many nodes as capacity and resilience requires, and they cluster together.
If you need 3 nodes to be up for successful operations, and also anticipate that 1 might go down, then you need to run 4 in total. Should that 1 failure happen, the surviving cluster is still big enough.
Power-on/power-off order is not relevant in Hazelcast, as long as you give the remaining nodes, during power-off, enough time to let repartitioning complete. For example, in a 4-node cluster, if you take out 1 node and give the other 3 room to complete repartitioning, then you don't lose the data. If you take out 2 nodes together, then the cluster will be without the data whose backups were stored on one of the 2 nodes you took out.
For starting up, the startup sequence is not relevant either, as each node owns a certain set of partitions determined by consistent hashing. This ownership continues to change as nodes leave or join a running cluster.
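To make the backup point concrete, here is a minimal member-side sketch using the Hazelcast 3.x Java config API (the map name is a placeholder, and this is illustrative rather than a recommended production setup): with a backup count of 1, every partition of the map is mirrored on one other member, so losing a single node does not lose data as long as repartitioning is allowed to finish.

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public final class SessionStoreMember {
    public static void main(String[] args) {
        Config config = new Config();
        config.getGroupConfig().setName("Integracion");

        // One synchronous backup per entry (the default); raise it if two
        // simultaneous member failures must be survivable.
        MapConfig sessions = new MapConfig("tomcat-sessions").setBackupCount(1);
        config.addMapConfig(sessions);

        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
        System.out.println("Cluster size: " + member.getCluster().getMembers().size());
    }
}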

Worker node execution in Apache Storm

A Storm topology has been deployed using the storm command on machine X. Worker nodes are running on machine Y.
Once the topology has been deployed, it is ready to process tuples, and the workers are processing requests and responses.
Can anyone please explain how a worker node identifies its work and data? I am not sure how a worker node has access to code that the developer never deployed on it.
If the topology code is accessible to the worker nodes, can you please tell me where it is located, and also explain how the worker nodes execute it?
One, you're asking a fairly complex question. I've been using Storm for a while and don't understand much about how it works internally. Here is a good article talking about the internals of Storm. It's over two years old but should still be highly relevant. I believe that Netty is now used as the internal messaging transport; it's mentioned as being experimental in the article.
As far as code being run on worker nodes goes, there is a configuration setting in storm.yaml,
storm.local.dir
When uploading the topology, I believe Storm copies the jar to that location. So every worker machine will have the necessary jar in its configured storm.local.dir. So even though you only upload it on one machine, Storm will distribute it to the necessary workers. (That's from memory and I'm not in a spot to test it at the moment.)
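For reference, a minimal storm.yaml fragment on a supervisor (worker) machine might look like the following; the path is just an example, and the exact distribution mechanics may differ between Storm versions.

# storm.yaml on a supervisor machine: the submitted topology jar ends up under
# this directory before the supervisor launches worker JVMs from it.
storm.local.dir: "/var/storm"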

CoreOS, Fleet and Etcd2 fault tolerance

I have a 23 node cluster running CoreOS Stable 681.2.0 on AWS across 4 availability zones. All nodes are running etcd2 and flannel. Of the 23 nodes, 8 are dedicated etcd2 nodes, the rest are specifically designated as etcd2 proxies.
Scheduled to the cluster are 3 nginx plus containers, a private Docker registry, SkyDNS, and 4 of our application containers. The application containers register themselves with etcd2, and the nginx containers pick up any changes, render the necessary files, and finally reload.
This all works perfectly until a single etcd2 node is unavailable for any reason.
If the cluster of voting etcd2 members loses connectivity to even a single other voting etcd2 member, all of the services scheduled to the fleet become unstable. Scheduled services begin stopping and starting without my intervention.
As a test, I began stopping the EC2 instances which host voting etcd2 nodes until quorum was lost. After the first etcd2 node was stopped, the above symptoms began. After a second node was stopped, services remained unstable, with no further observable change. Then, after the third was stopped, quorum was lost and all units were unscheduled. I then started all three etcd2 nodes again, and within 60 seconds the cluster had returned to a stable state.
Subsequent tests yield identical results.
Am I hitting a known bug in etcd2, fleet or CoreOS?
Is there a setting I can modify to keep units scheduled onto a node even if etcd is unavailable for any reason?
I've experienced the same thing. In my case, running 1 specific unit caused everything to blow up. Scheduled and perfectly healthy running units were suddenly lost without any notice, and machines even dropped out of the cluster.
I'm still not sure what the exact problem was, but I think it might have had something to do with etcd vs etcd2. I had a dependency of etcd.service in the unit file, which (I think, not sure) caused CoreOS to try and start etcd.service, while etcd2.service was already running. This might have caused the conflict in my case, and messed up the etcd registry of units and machines.
Something similar might be happening to you, so I suggest you check on each host whether you're running etcd or etcd2, and check your unit files to see which one they depend on.
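As a concrete illustration (the unit name and command below are placeholders, not taken from the question), a fleet/systemd unit on a host that runs etcd2 would declare its dependency like this, rather than on the legacy etcd.service:

[Unit]
Description=example-app (placeholder unit)
# Depend on etcd2.service, not the legacy etcd.service, on hosts running etcd2
Requires=etcd2.service
After=etcd2.service

[Service]
# Placeholder command; a real unit would start your container or application here
ExecStart=/usr/bin/example-app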

What keeps the cluster resource manager running?

I would like to use Apache Marathon to manage resources in a clustered product. Mesos and Marathon solve some of the "cluster resource manager" problems for additional components that need to be kept running with HA, failover, etc.
However, there are a number of services that need to be kept running to keep mesos and marathon running (like zookeeper, mesos itself, etc). What can we use to keep those services running with HA, failover, etc?
It seems like solving this across a cluster (managing how many instances of zookeeper, etc, and where they run and how they fail over) is exactly the problem that mesos/marathon are trying to solve.
As the Mesos HA doc explains, you can start multiple Mesos masters and let ZK elect the leader. Then if your leading master fails, you still have at least 2 left to handle things. It is common to use something like systemd to automatically restart the mesos-master on the same host if it's still healthy, or something like Amazon AutoScalingGroups to ensure you always have 3 master machines even if a host dies.
The same can be done for Marathon in its HA mode (on by default if you start multiple instances pointing to the same znode). Many users start these on the same 3 nodes as their Mesos masters, using systemd to restart failed Marathon services, and the same ASG to ensure there are 3 Mesos/Marathon master nodes.
These same 3 nodes are often configured to be the ZK quorum as well, so there are only 3 nodes you have to manage for all these services running outside of Mesos.
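As a rough sketch of that pattern (the paths, hostnames, and values below are assumptions, not taken from the answer), a systemd unit that keeps mesos-master restarted on one of those 3 nodes could look like this:

[Unit]
Description=Mesos master
After=network.target

[Service]
# zk1/zk2/zk3 are placeholder ZooKeeper hostnames; --quorum=2 matches a 3-master setup
ExecStart=/usr/sbin/mesos-master --zk=zk://zk1:2181,zk2:2181,zk3:2181/mesos --quorum=2 --work_dir=/var/lib/mesos
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target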
Conceivably, you could bootstrap both Mesos-master and Marathon into the cluster as Marathon/Mesos tasks. Spin up a single Mesos+Marathon master to get the cluster started, then create a Mesos-master app in Marathon to launch 2-3 masters as Mesos tasks, and a Marathon-master app in Marathon to launch a couple of HA Marathon instances (as Mesos tasks). Once those are healthy, you can kill the original standalone Mesos/Marathon master and the cluster would failover to the self-hosted Mesos and Marathon masters, which would be automatically restarted elsewhere on the cluster if they failed. Maybe this would work with ZK too. You'd probably need something like Mesos-DNS and/or ELB to let other services find Mesos/Marathon. I doubt anybody's running Mesos this way, but it's crazy enough it just might work!
In order to understand this, I suggest you spend a few minutes reading up on the architecture and the HA part in the official Mesos doc. There, it is clearly explained how HA/failover in Mesos core is handled (which is, BTW, nothing magic; many systems I know of use pretty much exactly this model, incl. HBase, Storm, Kafka, etc.).
Also, note that, naturally, the challenge of keeping a handful of Mesos masters/ZK nodes alive is not directly comparable to keeping potentially tens of thousands of processes across a cluster alive, evicting them, or failing them over (in terms of fan-out, memory footprint, throughput, etc.).

Checking in OSGi if a bundle is still operating normally

I'm currently making a watchdog to check if all bundles in a pipeline are still functioning properly. (This will be in a distributed environment so failure can be a network failure, software failure, one of the servers failing, ...)
Because a bundle can be bound to an arbitrary number N of services, the checking will happen recursively using the following methodology:
START at the first step in the pipeline
USE getServicesInUse() to get the service references of the next step
USE getBundle() on the gathered ServiceReference objects
REPEAT until we arrive at the bundle we want to stop at
That way I can get all the bundle objects of the pipeline (I assume; see the sketch further below). Now, to check whether they are functioning correctly (or just whether they are still reachable), I was wondering if
Bundle b = ...
if(b.getState() == Bundle.ACTIVE) ...;
will do the trick? Of course, I would also surround this with the necessary try/catch clauses to detect hardware/network failure.
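For reference, here is a minimal Java sketch of the traversal described above (the class name is made up, and it assumes each step of the pipeline consumes services provided by the next step):

import java.util.LinkedHashSet;
import java.util.Set;

import org.osgi.framework.Bundle;
import org.osgi.framework.ServiceReference;

public final class PipelineWalker {

    // Collects the bundles reachable from 'start' by following services in use,
    // stopping the descent once 'last' is reached.
    public static Set<Bundle> walk(Bundle start, Bundle last) {
        Set<Bundle> visited = new LinkedHashSet<>();
        walk(start, last, visited);
        return visited;
    }

    private static void walk(Bundle current, Bundle last, Set<Bundle> visited) {
        if (!visited.add(current) || current.equals(last)) {
            return; // already seen, or reached the end of the pipeline
        }
        ServiceReference<?>[] inUse = current.getServicesInUse();
        if (inUse == null) {
            return; // this bundle is not using any services
        }
        for (ServiceReference<?> ref : inUse) {
            Bundle provider = ref.getBundle();
            if (provider != null) {
                walk(provider, last, visited);
            }
        }
    }
}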
Can you clarify what you mean by "all bundles in a pipeline"?
You are right that a bundle can provide and consume zero or more services, but if I were to create a watchdog for an OSGi system I would use one of two approaches:
If the nodes in your distributed system provide mainly REST services, I would write a separate "watchdog" program that monitors these REST services to see if they still respond (on any of the nodes in my distributed system). You can either make "real" calls or just send a HEAD request and see if you get a response.
If the nodes in your distributed system provide mainly OSGi services, I would write a watchdog bundle and deploy that to each node. I would then add a REST endpoint to my watchdog to allow me to monitor it remotely (by another watchdog, similar to approach #1).
Checking the active state of a bundle will tell you nothing. Bundles will remain active once started, but the services they provide could be unresponsive.
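As a minimal sketch of the first approach (the endpoint URL is a placeholder), the external watchdog could send an HTTP HEAD request to each node and treat a timeout or a non-2xx status as a failure:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public final class RestWatchdog {

    public static boolean isHealthy(String endpoint) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(2000); // treat slow or unreachable nodes as down
            conn.setReadTimeout(2000);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status >= 200 && status < 300;
        } catch (IOException e) {
            return false; // network failure, server down, etc.
        }
    }

    public static void main(String[] args) {
        System.out.println(isHealthy("http://node1:8080/health")); // placeholder endpoint
    }
}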
