configuring datastax cluster dynamically - cluster-computing

I want to create a connection routine that excludes nodes that are down at connection time, so that no connection pools are created to them. If a node goes down later, it should automatically be blacklisted from the round-robin. Is there a way to do this?
The whitelist policy starts from a fixed set of nodes but, to my knowledge, does not change it dynamically. I did not find a way to construct a "Host" object, nor a way to get up/down status through Java code the way the nodetool utility reports it - and I'd like to do that without starting a session, just as nodetool works without going through cqlsh.

Are you using the DataStax Java driver? I'm guessing you are, given your mention of the whitelist policy. The driver already does automatic node discovery and transparent failover.
If a node goes down, the driver detects it and stops issuing queries to that node until it comes back up. If you are looking for different functionality, can you elaborate?
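What the driver does internally can be modeled as round-robin selection over a dynamically maintained down-list. This is a self-contained sketch of that behavior, not the driver's actual code; the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of round-robin host selection with a dynamic blacklist.
// The real driver maintains equivalent state for you via its load-balancing
// policy and host state notifications.
class RoundRobinWithBlacklist {
    private final List<String> hosts;
    private final Set<String> down = new LinkedHashSet<>();
    private int index = 0;

    RoundRobinWithBlacklist(List<String> hosts) {
        this.hosts = new ArrayList<>(hosts);
    }

    void markDown(String host) { down.add(host); }    // e.g. heartbeat failed
    void markUp(String host)   { down.remove(host); } // node rejoined

    // Returns the next host that is not currently blacklisted,
    // or null if every host is down.
    String nextHost() {
        for (int i = 0; i < hosts.size(); i++) {
            String candidate = hosts.get(index % hosts.size());
            index++;
            if (!down.contains(candidate)) return candidate;
        }
        return null;
    }
}
```

With the real driver, the up/down callbacks that would keep such a list current are exposed through the Cluster's host state listener mechanism (Host.StateListener in the 3.x driver), so you don't need to poll nodetool from Java.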

Related

ActiveMQ Artemis - how does slave become live if there's only one master in the group?

I'm trying to understand a couple of items relating to failover in the ActiveMQ Artemis documentation. Specifically, there is a section (I'm sure I'm reading it wrong) that makes it sound as if it's impossible for a slave to take over for a master:
Specifically, the backup will become active when it loses connection to its live server. This can be problematic because this can also happen because of a temporary network problem. In order to address this issue, the backup will try to determine whether it still can connect to the other servers in the cluster. If it can connect to more than half the servers, it will become active, if more than half the servers also disappeared with the live, the backup will wait and try reconnecting with the live. This avoids a split brain situation
If there is only one other server, the master, the slave will not be able to connect to it. Since that is 100% of the other servers, it will stay passive. How can this work?
I did see that a pluggable quorum vote replication could be configured, but before I delve into that, I'd like to know what I'm missing here.
When using replication with only a single primary/backup pair there is no mitigation against split brain. When the backup loses its connection with the primary it will activate since it knows there are no other primary brokers in the cluster. Otherwise it would never activate, as you note.
The documentation should be clarified to remove this ambiguity.
Lastly, the documentation you referenced does not appear to be the most recent based on the fact that the latest documentation is slightly different from what you quoted (although it still contains this ambiguity).
Generally speaking, a single primary/backup pair with replication is only recommended with the new pluggable quorum voting since the risk of split brain is so high otherwise.
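The "more than half" rule quoted above, including the degenerate single-pair case this answer describes, can be modeled as a small decision function (names invented for illustration; this is a sketch of the documented behavior, not Artemis code):

```java
// Hypothetical model of an Artemis backup's activation decision after it
// loses its connection to the live broker. "otherServers" counts cluster
// members other than the live; "reachable" is how many of those the backup
// can still reach.
class QuorumCheck {
    static boolean shouldActivate(int reachable, int otherServers) {
        // With a single live/backup pair there are no other brokers to poll,
        // so the backup activates immediately (no split-brain mitigation).
        if (otherServers == 0) return true;
        // Otherwise: activate only if more than half of the other servers
        // respond; if not, assume a network partition and keep waiting.
        return reachable * 2 > otherServers;
    }
}
```

Note how the single-pair case falls outside the quorum rule entirely, which is exactly why the pluggable quorum voting is recommended there.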

Apache Geode locators on different boxes start but do not form a cluster

I have already gone through one related post (Docker Geode remote locator); the recommendation there was to run everything in the same LAN segment, or perhaps use swarm for Docker.
My question is: if I run two locators on two different nodes (VMs or physical hardware) and specify --locators='hostN[port]' referencing one another, how do I know whether they formed a cluster?
Do I always need to configure them as a WAN setup, even if they are on the same LAN but on independent nodes (rather than processes within one node)?
list members does not show both locators.
I am able to connect from the gfsh console on one node to the locator on a different node. As long as I give no bind addresses when starting the locators, both locators start, but I don't know whether they formed a cluster (or whether they are connected to each other).
I am evaluating Apache Geode but need HA across VMs, not across processes within the same node. Please advise. Thanks in advance.
I changed the IPv4 configuration and provided a nearby LAN address; the recommendation is to keep the locators in the same LAN segment.
If I start the locators virtually simultaneously, rather than waiting for one locator to be up already, each seems to find the other.
Other options I found were to increase:
ack-wait-threshold=15
member-timeout=5000
But keep in mind: the longer you allow for acknowledgement here, the longer it will take later to report that a node is not responding if it fails for any reason.
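For reference, a gemfire.properties along these lines might look like the following (hostnames and ports are placeholders for your environment; member-timeout is in milliseconds, ack-wait-threshold in seconds, using the values mentioned above):

```properties
# Hypothetical example - adjust hosts/ports to your environment.
# Each locator lists both locators so they can find each other.
locators=host1[10334],host2[10334]
# How long before an unresponsive member is suspected (ms).
member-timeout=5000
# How long to wait for acknowledgements before warning (s).
ack-wait-threshold=15
```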

How does JanusGraph.open() work and how to scale?

I am evaluating different graph databases and libraries, and JanusGraph seems to provide most of what I need. I do have a couple of questions:
I would like to connect to it via Gremlin Server with the Cluster option, but I can't find any Java examples of handling transaction rollbacks and the like.
And if I use the JanusGraphFactory.open("...") option, how exactly does that work? Does it mean the entire graph is loaded into memory in the JVM?
If the entire graph is loaded into memory, how would one scale up, and how would different JVMs keep up to date with each other?
Thanks & regards
Tin
I would like to connect to it via Gremlin Server with the Cluster option, but I can't find any Java examples of handling transaction rollbacks and the like.
Connecting to Gremlin Server involves sessionless communication, meaning each request equals one transaction. You can connect with a session but it is not typically encouraged for most use cases.
And if I use the JanusGraphFactory.open("...") option, how exactly does that work? Does it mean the entire graph is loaded into memory in the JVM?
It just creates a reference to the data and provides a Graph instance from which you can create a GraphTraversalSource to interact with for spawning traversals. It doesn't load any of that data into memory just by virtue of calling it.
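A minimal sketch of what calling JanusGraphFactory.open() gives you, using the in-memory backend for illustration (with a storage backend like Cassandra or HBase, the data likewise stays in the backend and is fetched lazily as traversals execute):

```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OpenExample {
    public static void main(String[] args) {
        // Opens a handle to the graph; nothing is bulk-loaded into the JVM.
        JanusGraph graph = JanusGraphFactory.open("inmemory");

        // The traversal source is what you use to spawn traversals.
        GraphTraversalSource g = graph.traversal();
        g.addV("person").property("name", "tin").iterate();

        // Transactions are explicit when working with the graph directly.
        graph.tx().commit();   // or graph.tx().rollback() on failure
        graph.close();
    }
}
```

When going through Gremlin Server instead, this explicit commit/rollback disappears for sessionless requests, since each request is its own transaction as described above.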

File sync between n web servers in cluster

There are n nodes in a web cluster. Files may be uploaded to any node and then must be distributed to every other node. This distribution does not have to happen in a transaction (in fact it must not; distributed transactions don't scale) and some latency is acceptable, although it must be minimal. Conflicts can be resolved arbitrarily (typically last write wins) provided that the resolution is also distributed to all nodes, so that eventually all nodes have the same set of files. Nodes can be added and removed dynamically without having to reconfigure existing nodes. There must be no single point of failure and no additional boxes (such as RabbitMQ) required to solve this.
I am thinking along the lines of using consul.io for dynamic configuration so that each node can refer to consul to determine what other nodes are available and writing a daemon (Golang) that monitors the relevant folders and communicates with other nodes using ZeroMQ.
Feels like I would be re-inventing the wheel though. This is a common problem and I expect there are solutions available already that I don't know about? Or perhaps my approach is wrong and there is another way to solve this?
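The last-write-wins resolution I have in mind can be sketched as a merge of per-file version records, where each node eventually applies every other node's newest entry (a model only; names are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of last-write-wins file metadata merging between nodes.
// Each entry maps a file path to the timestamp of its latest write.
class LwwIndex {
    final Map<String, Long> files = new HashMap<>();

    // A write only takes effect if it is newer than what we already have.
    void recordWrite(String path, long timestamp) {
        files.merge(path, timestamp, Math::max);
    }

    // Merge another node's index; the newer timestamp wins per file,
    // so repeated merges in any order converge to the same state.
    void merge(LwwIndex other) {
        other.files.forEach(this::recordWrite);
    }
}
```

Because the merge is commutative and idempotent, it tolerates the arbitrary gossip order and node churn described above.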
Yes, there has been some stuff going on with distributed synchronization lately:
You could use syncthing (open source) or BitTorrent Sync.
Syncthing is node-based, i.e. you add nodes to a cluster and choose which folders to synchronize.
BTSync is folder-based, i.e. you obtain a "secret" for a folder and can synchronize with everyone in the swarm for that folder.
From my experience, BTSync has better discovery and connectivity, but the whole synchronization process is closed source and nobody really knows what happens. Syncthing is written in Go, but sometimes has trouble discovering peers.
Both syncthing and BTSync use LAN discovery via broadcast and a tracker for discovery, AFAIK.
EDIT: Or, if you're really cool, use IPFS to host the latest version, IPNS to "name" that and mount the IPNS on the servers. You can set the IPFS bootstrap list to some of your servers, which would even make you independent of external trackers. :)

Is there a way for Asterisk to reconnect calls when the internet connection is lost

To be specific, I am using Asterisk with a Heartbeat active/passive cluster. There are 2 nodes in the cluster; call them Asterisk1 and Asterisk2. Everything is well configured in my cluster. When one of the nodes loses its internet connection, the Asterisk service fails, or Asterisk1 is turned off, the Asterisk service and the failover IP migrate to the surviving node (Asterisk2).
The problem is that if we were actually processing a call when Asterisk1 went down, the call is dropped and I cannot redial until the Asterisk service is up on Asterisk2 (about 5 seconds, not a bad time).
But my question is: is there a way to make Asterisk behave like Skype when it loses connection during a call? That is, not dropping the call but trying to reconnect it, and reconnecting it once the Asterisk service is up on Asterisk2?
There are some commercial systems that support such behaviour.
If you want to do it on a non-commercial system, there are 2 ways:
1) Force a callback to all phones with the auto-answer flag set. Requirement: guru-level Asterisk knowledge.
2) Use Xen and a memory mapping/mirroring system to maintain, on the other node, a VPS with the same memory state (the same running Asterisk). Requirement: guru-level Xen knowledge. See for example: http://adrianotto.com/2009/11/remus-project-full-memory-mirroring/
Sorry, both methods require guru-level knowledge.
Note: if you run SIP over an OpenVPN tunnel, you very likely will not lose calls inside the tunnel if the internet goes down for up to 20 seconds. That is not exactly what you asked, but it can work.
Since there is no accepted answer after almost 2 years I'll provide one: NO. Here's why.
If you fail over from Asterisk server 1 to Asterisk server 2, then Asterisk server 2 has no idea what calls (i.e. endpoint to endpoint) were in progress, even if you share a database of called numbers, use Asterisk Realtime, etc. If Asterisk tried to bring up both legs of the call to the same numbers, these might not be the same endpoints of the original call.
Another server cannot resume the SIP TCP session of the failed server, since that session was closed with the failed server.
The source/destination addresses and ports may be identical, but even then your firewall will not know you are trying to continue the same session.
etc.....
If your goal is high availability of phone services, take a look at the VoIP Info web site. All the rest (network redundancy, disk redundancy, shared block storage devices, router failover protocols, etc.) is a distraction; focus instead on early DETECTION of failures across all trunks/routes/devices involved with providing phone service, and then on providing the highest degree of recovery without sharing ANY DEVICES. (Too many HA solutions share a disk, channel bank, etc. that create a single point of failure.)
Your solution would require a shared database that is updated in real time on both servers. The database would be managed by an event logger that keeps track of all calls in progress, flagged as LINEUP perhaps. In the event a failure was detected, all calls that were on the failed server would be flagged as DROPPEDCALL. When your fail-over server spins up and takes over -- using heartbeat monitoring or some such -- the first thing it would do is generate a set of call files for all database records flagged as DROPPEDCALL. These calls can then be conferenced together.
The hardest part about it is the event monitor, ensuring that you don't miss any RING or HANGUP events, potentially leaving a "ghost" call in the system to be erroneously dialed in a recovery operation.
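The event-logger idea above can be modeled as a small registry: calls are flagged LINEUP while in progress, and on failover everything still flagged for the dead server becomes DROPPEDCALL. This is a model only; a real implementation would sit on a replicated database and consume Asterisk's call events, and all names here are invented:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the shared call-state database.
class CallRegistry {
    enum State { LINEUP, HANGUP, DROPPEDCALL }

    private final Map<String, String> server = new HashMap<>(); // call id -> server
    private final Map<String, State> state = new HashMap<>();   // call id -> state

    void ring(String callId, String serverName) {
        server.put(callId, serverName);
        state.put(callId, State.LINEUP);
    }

    void hangup(String callId) { state.put(callId, State.HANGUP); }

    // On failover, flag every in-progress call on the failed server and
    // return the ids so call files can be generated to re-conference them.
    List<String> failover(String failedServer) {
        List<String> dropped = new ArrayList<>();
        for (Map.Entry<String, State> e : state.entrySet()) {
            if (e.getValue() == State.LINEUP
                    && failedServer.equals(server.get(e.getKey()))) {
                e.setValue(State.DROPPEDCALL);
                dropped.add(e.getKey());
            }
        }
        return dropped;
    }
}
```

Note that a missed HANGUP event would leave a call in LINEUP forever, which is exactly the "ghost call" risk described above.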
You likely should also have a mechanism to build your Asterisk config on a "management" machine that then pushes changes out to your farm of call-manager AST boxen. That way any node is replaceable with any other.
What you should likely have is 2 DB servers using replication techniques and Linux High-Availability (LHA) (1). Alternately, DNS round-robin or load balancing with a "public" IP would do well, too. These machines will likely be lightly loaded enough to host your configuration manager as well, with the benefit of getting LHA for "free".
Then, at least N+1 AST Boxen for call handling. N is the number of calls you plan on handling per second divided by 300. The "+1" is your fail-over node. Using node-polling, you can then set up a mechanism where the fail-over node adopts the identity of the failed machine by pulling the correct configuration from the config manager.
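The N+1 sizing rule above works out as follows (a sketch of the arithmetic only, not a capacity guarantee; the 300 calls-per-second-per-box figure is the one assumed in this answer):

```java
// Hypothetical helper for the N+1 sizing rule described above.
class Sizing {
    static int boxesNeeded(int callsPerSecond) {
        int n = (int) Math.ceil(callsPerSecond / 300.0); // call-handling nodes
        return n + 1;                                    // plus one fail-over node
    }
}
```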
If hardware is cheap/free, then 1:1 LHA node redundancy is always an option. Generally speaking, though, the failure rate for PC hardware and Asterisk software is fairly low; 3 or 4 "9s" out of the box. So, really, you're trying to cover the last bit of distance to the fifth "9".
I hope that gives you some ideas about which way to go. Let me know if you have any questions, and please take the time to "accept" whichever answer does what you need.
(1) http://www.linuxjournal.com/content/ahead-pack-pacemaker-high-availability-stack
