How to enable high availability in JanusGraph? - janusgraph

I have the problem of enabling high availability in JanusGraph, that is, running several JanusGraph server instances against the same graph... I have searched a lot of references and have not been able to solve it.

You can take the advanced JanusGraph deployment scenario and put an open-source load balancer in front of the JanusGraph Server instances, such as HAProxy (https://www.haproxy.org/).
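A minimal sketch of what that front end could look like, assuming two JanusGraph Server instances listening on the default 8182 Gremlin Server port; the hostnames are placeholders you would replace with your own:

    # HAProxy sketch: round-robin TCP balancing across two JanusGraph Server
    # instances. Hostnames and the 8182 port are assumptions for illustration.
    defaults
        mode tcp
        timeout connect 5s
        timeout client  60s
        timeout server  60s

    frontend gremlin_in
        bind *:8182
        default_backend janusgraph_servers

    backend janusgraph_servers
        balance roundrobin
        option tcp-check
        server janusgraph1 janusgraph1.internal:8182 check
        server janusgraph2 janusgraph2.internal:8182 check

Clients then point their Gremlin connections at the HAProxy address instead of at an individual JanusGraph Server.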

Related

How does the Titan (not backend storage) clustering work?

Context
I'm using Titan v1.0.0 on AWS infrastructure and want to support failover/fault tolerance. AWS will take care of the DynamoDB storage backend, but it seems necessary to have several Titan instances served by an (ELB) load balancer.
I'm using a Node.js library to get to Gremlin, and Gremlin to access Titan.
Question
So, how does the Titan (not backend storage) clustering work? If at all.
To be clear, I'm not talking about any backend storage clustering, as I'm using DynamoDB on AWS. The documentation on transaction locking suggests to me that a Titan cluster must exist, as other Titan nodes wouldn't know about the locking without some sort of inter-communication. But I don't see any configuration options that support this.
If clustering is possible with Titan, does anyone have any information on how to get this working in a production high-availability setup?
An illustration of the server-side architecture:
[NodeJsA]\             /[TitanA]\
          \           /          \
           [ELB (AWS)]            [DynamoDB (AWS)]
          /           \          /
[NodeJsB]/             \[TitanB]/
Further, if there is no clustering of the Titan nodes, then a change made via the TitanA node (above) could take the following amount of time to be seen on the TitanB node (worst case):
(AWS eventual consistency convergence time (~1 sec) + TitanB cache timeout + poll time from the Node.js nodes to Titan)
Another consequence of the lack of clustering would be that sessions would have to be pinned in the ELB, else a read request after an update could be served by a different node with stale information.
Titan doesn't do any "clustering" outside of what is supported by the selected backend. You referred to "locking" as something that would indicate that clustering is supported, but if you read about the locking providers in that link you supplied you'll see that locking isn't really doing anything terribly fancy at a Titan level and that it is backend dependent. So Titan instances really don't have any external clustering capabilities or knowledge of each other. You therefore need to take that into account with respect to your architecture.
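For what it's worth, the "cache timeout" term in the worst-case estimate above is governed by each instance's own database-level cache, which is configured per instance in its properties file. A sketch of the relevant settings, with illustrative values rather than recommendations:

    # Per-instance Titan cache settings (illustrative values, not recommendations).
    # Each instance caches independently, so lowering cache.db-cache-time
    # (or disabling the cache) bounds how stale a read served by another
    # instance can be.

    # enable/disable the database-level cache on this instance
    cache.db-cache = true
    # cache expiration time in milliseconds (0 = entries never expire)
    cache.db-cache-time = 10000
    # values below 1.0 are interpreted as a fraction of the JVM heap
    cache.db-cache-size = 0.25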

How to prevent being affected by data-center DDoS attacks & maintenance-related downtime?

I'm hosting a web application which should be highly available. I'm hosting on multiple Linodes and using a NodeBalancer to distribute the traffic. My question might be stupid simple - but not long ago I was affected by a DDoS hitting the data center. That made me think about how I can be better prepared next time this happens.
The nodebalancer and servers are all in the same datacenter which should, of course, be fixed. But how does one go about doing this? If I have two load balancers in two different data centers - how can I setup the domain to point to both, but ignore the one affected by DDoS? Should I look into the DNS manager? Am I making things too complicated?
Really would appreciate some insights.
Thanks everyone...
You have to look at ways to load balance across datacenters. There's a few ways to do this, each with pros and cons.
If you have a lot of DB calls, running two datacenters hot can introduce a lot of latency problems. What I would do is as follows.
Have the second datacenter (DC2) be a warm location. It is configured for everything to work and is constantly getting data from the master DB in DC1, but isn't actively getting traffic.
Use a service like CloudFlare for their extremely fast DNS switching. Have a service in DC2 that constantly pings the load balancer in DC1 to make sure that everything is up and well. When it has trouble contacting DC1, it can connect to CloudFlare via the API and switch the main 'A' record to point to DC2, which then picks up the traffic.
I forget what CloudFlare calls it, but it has a DNS feature that allows you to switch 'A' records almost instantly because the actual IP address given to the public is their own; they just route the traffic for you.
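A rough sketch of such a watchdog, assuming the Cloudflare v4 DNS API; the zone ID, record ID, API token, health-check URL, and IPs are all placeholders you'd have to fill in for your own setup:

    # Hypothetical DC1 health check that fails traffic over to DC2 by
    # repointing an A record through the Cloudflare v4 API.
    # ZONE_ID, RECORD_ID, API_TOKEN, and the addresses are placeholders.
    import time
    import requests

    ZONE_ID = "your-zone-id"
    RECORD_ID = "your-a-record-id"
    API_TOKEN = "your-api-token"
    DC1_HEALTH_URL = "http://lb.dc1.example.com/health"
    DC2_IP = "203.0.113.20"

    def dc1_is_healthy():
        try:
            return requests.get(DC1_HEALTH_URL, timeout=5).status_code == 200
        except requests.RequestException:
            return False

    def point_record_at(ip):
        # Update the public A record so traffic lands on the given IP.
        url = f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/dns_records/{RECORD_ID}"
        headers = {"Authorization": f"Bearer {API_TOKEN}"}
        payload = {"type": "A", "name": "example.com", "content": ip, "ttl": 120, "proxied": True}
        requests.put(url, json=payload, headers=headers, timeout=10).raise_for_status()

    if __name__ == "__main__":
        while True:
            if not dc1_is_healthy():
                point_record_at(DC2_IP)
                break
            time.sleep(30)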
Amazon also has a similar feature with CloudFront, I believe.
This plan is costly, however, as you're running much more infrastructure that rarely gets used. Linode is rolling out more network improvements (and will continue to), so hopefully this becomes less necessary.
For more advanced load balancing and HA, you can go with more "cloud" providers but it does come at a cost.
-Ricardo
Developer Evangelist, CircleCI, formerly Linode

GCE managed groups api for horizontally scaling kubernetes nodes

I would like to scale Kubernetes nodes according to unscheduled pods. If I have pods that can't be scheduled because of their resource requirements, I want to add a new node to the cluster.
Looking at the autoscaling feature of managed groups in GCE, this doesn't seem to be possible, as their model requires a metric per node in the cluster, while my metric is global.
Can anyone confirm that this can't be achieved with the current GCE solution?
Does anyone know of any existing tool/blog post that could help implement a solution?
Assuming I'm going to roll my own, I'm having problems finding an API that controls GCE managed groups (allows adding a node, removing a node).
Thanks,
Nathan
If you are fine with the standard per-node metrics, read the "Horizontal auto-scaling of nodes (GCE)" section of the Kubernetes cluster management guide to enable the autoscaler.
If you want custom metrics, you can check out the GCE documentation.
There is also a similar question on Stack Overflow, and the author of one of the answers said that after writing their own custom metrics, the standard per-node metrics turned out to be just as good, if not better, for their use case.
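If you do end up rolling your own, resizing a managed instance group is exposed through the Compute Engine API (instanceGroupManagers.resize). A rough sketch using the google-api-python-client, where the project, zone, and group names are placeholders and the pod-pressure detection logic is left out:

    # Rough sketch: grow or shrink a GCE managed instance group by calling
    # instanceGroupManagers.resize. PROJECT, ZONE, and GROUP are placeholders.
    from googleapiclient import discovery

    PROJECT = "my-project"
    ZONE = "us-central1-a"
    GROUP = "my-kubernetes-minion-group"

    def resize_group(target_size):
        # Uses Application Default Credentials for authentication.
        compute = discovery.build("compute", "v1")
        return compute.instanceGroupManagers().resize(
            project=PROJECT,
            zone=ZONE,
            instanceGroupManager=GROUP,
            size=target_size,
        ).execute()

    if __name__ == "__main__":
        # e.g. add one node when pods are unschedulable (detection not shown)
        resize_group(4)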

Websphere application server 8.5.5 clustering same application

I have the same application running on two WAS clusters. Each cluster has 3 application servers based in different datacenters. In front of each cluster are 3 IHS servers.
Can I specify a primary cluster, and a failover cluster within the plugin-cfg.xml? Currently I have both clusters defined within the plugin, but I'm only hitting 1 cluster for every request. The second cluster is completely ignored.
Thanks!
As noted already, the WAS HTTP server plugin doesn't provide the function you're seeking, as documented in the WAS Knowledge Center: http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/rwsv_plugincfg.html?lang=en
This assumes that by "failover cluster" what is actually meant is "BackupServers" in the plugin-cfg.xml.
The ODR alternative mentioned previously likely isn't an option either, because the ODR isn't supported for use in the DMZ (it hasn't been security hardened for DMZ deployment): http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/twve_odoecreateodr.html?lang=en
From an effective HA/DR perspective, what you're seeking to accomplish should be handled at the network layer, using the global load balancer (global site selector, global traffic manager, etc.) that routes traffic into the data centers; this is usually accomplished by setting a "site cookie" at the load balancer.
This is by design. IHS, at least at the 8.5.5 level, does not allow for what you are trying to do. You will have to implement this level of high availability at a higher level in your topology.
There are a few options.
If the environment is relatively static, you could post-process plugin-cfg.xml and combine the two clusters into a single ServerCluster, with the "dc2" servers listed as <BackupServer>'s in the cluster. The "dc1" servers are probably already listed as <PrimaryServer>'s.
BackupServers are only used when no PrimaryServers are reachable.
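A rough sketch of what that hand-merged ServerCluster might look like; the server names, hostnames, and ports below are made up, and your generated plugin-cfg.xml will differ:

    <!-- Hand-merged ServerCluster sketch: dc1 members as PrimaryServers,
         dc2 members as BackupServers. Names, hosts, and ports are placeholders. -->
    <ServerCluster Name="app_cluster" LoadBalance="Round Robin" RetryInterval="60">
      <Server Name="dc1_member1">
        <Transport Hostname="dc1-app1.example.com" Port="9080" Protocol="http"/>
      </Server>
      <Server Name="dc2_member1">
        <Transport Hostname="dc2-app1.example.com" Port="9080" Protocol="http"/>
      </Server>
      <PrimaryServers>
        <Server Name="dc1_member1"/>
      </PrimaryServers>
      <BackupServers>
        <Server Name="dc2_member1"/>
      </BackupServers>
    </ServerCluster>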
Another option is to use the Java On-Demand Router, which has first-class awareness of applications running in two cells. Rules can be written that dictate the behavior of applications residing in two clusters (load balancing, failover, etc.). I believe these are called "ODR Routing Rules".

High Performance Highly Available Tracking System

I currently run a tracking service that records visits from various sources. At times we record the visits and redirect to our clients, or we let clients call us to report visits. The architecture is two worker boxes configured behind a load balancer. This system is set up using Amazon EC2, and the load balancer used is Amazon's Elastic LB.
I did some benchmarking tests and have noticed significant network latencies. The traffic through the load balancer suffers at least 2 times more delay than hitting any of the boxes directly.
Has anyone experienced such an issue and attempted to solve it? Is this an Amazon EC2 specific issue?
Is there any other architecture in use that would lower my network latencies significantly? E.g., an HA setup such that traffic needn't go through a load balancer but instead hits the endpoint servers directly? Before I start investing time on that, I wanted to hear what others think.
Thanks a lot for your time,
Santosh
Change your LB and give it another try. HAProxy is a great session/cookie-aware L7 balancer and can be set up in the Amazon cloud, AFAIK. See this: http://agiletesting.blogspot.com/2009/02/load-balancing-in-amazon-ec2-with.html
You have to take into account that ELBs perform better after a while than initially. Don't ask me why, but that's how it is -- load balancer warming?
It also really depends on how much traffic you send the ELB. Keep in mind that the hardware the ELB is provisioned on seems like a regular small instance, so the throughput is capped at ~25 Mbit/s (last time I checked). If you require more, go dedicated.
In the end, I too would suggest that you look at HAProxy on a dedicated instance. I'd expect some delay, but 2x more delay sounds unreal. Maybe use another small instance, benchmark it directly against the ELB, and then try a c1.medium.
