Is it safe to use etcd across multiple data centers? - etcd

Is it safe to use etcd across multiple data centers? As it expose etcd port to public internet.
Do I have to use client certificates in this case or etcd has some sort of authification?

Yes, but there are two big issues you need to tackle:
Security. This all depends on what type of info you are storing in etcd. Using a point to point VPN is probably preferred over exposing the entire cluster to the internet. Client certificates can also be used.
Tuning. etcd relies on replication between machines for two things, aliveness and consensus. Since a successful write must be committed to at majority of the cluster before it returns as successful, your write performance will degrade as the distance between the machines increases. Aliveness is measured with periodic heartbeats between the machines. By default, etcd has a fairly aggressive 50ms heartbeat timeout, which is optimized for bare metal servers running on a local network. Without tuning this timeout value, your cluster will constantly think that members have disappeared and trigger frequent master elections. This gets worse if both of your environments are on cloud providers that have variable networks plus disk writes that traverse the network, a double whammy.
More info on etcd tuning: https://etcd.io/docs/latest/tuning/

Related

What is the recommended way of creating a distributed Lock with Redis on Azure?

I'm looking to create a distributed Lock within Redis on Azure for our multi-instance Worker Role. I need a way of creating "critical sections" for which only a single thread can have access at a time across multiple-instances of the Worker Role.
I am using the StackExchange.Redis client to do this and, helpfully, it has an implementation of transactional TakeLock\ReleaseLock already, and this answer on SO gives me a good idea of the pattern to use and details about how to create a lock.
Reading further around the subject, I also read this Redis article regarding distlock which describes the weaknesses of failover-based Redis nodes when trying to implement a distributed lock mechanism.
The Azure Redis cache implements master/slave failover (apart from the Basic tier) so does this mean that I will need to implement the redlock pattern in order to guarantee that only one thing will ever have the lock?
Additionally, I am wondering:
Why do Azure Redis example connection strings not seem to list the master and slave in them? Have Azure implemented the master/slave failover in a different way?
Why has one .NET implementation of redlock chosen not to support using master/slaves in its usage? (See Usage section, first para) Is this just by choice or is it because master/slave is not a valid usage of redlock (that would not seem to be the case in the redis article)
I'm the author of the RedLock.net library that you linked in your question. The reason the documentation specifies connecting to independent redis instances is based on the reasoning in the Redis Distlock documentation. By forcing writes only to master nodes, we hopefully avoid the situation where a user might misconfigure Redlock to connect to multiple replicated hosts.
According to Azure Redis Cache 103 - Failover and Monitoring there is a load balancer in front of an Azure Redis Cache (at the standard tier and above) that ensures that you are always connected to the master.
Connecting to multiple redis instances (either replicated or not) should give a fairly good guarantee that no two processes end up running at the same time (moreso than a single replicated instance).
In order for another process to 'steal' the lock before the first had finished, more than half of the independent redis instances would need to lose their lock keys (e.g. by restarting without persistence), then have process two gain the lock before the timer in process one reacquired it during its extend timer.

Redis cluster-network latency

There is a new Redis cluster setup, one team I know in my company is working on, in order to improve the application data caching based out on Redis. The setup is as follows, a Redis cluster with a Redis master and many slaves, say 40-50 (but can grow more when the application is scaled), one Redis instance per one virtual machine. I was told this setup helps the applications deployed in servers on every virtual machines query the data present in the local Redis instance than querying an instance in the network in order to avoid network latency. Periodically, the Redis master is updated only with whatever data are modified or newly created or deleted (data backed by a relational database), say every 5 seconds or so. This will initiate the data sync operation with all the Redis slave instances. The data-consumers (the application deployed on all the virtual machines) of the Redis (slaves) reads updated values to do processing. Is this approach a correct one to the network latency problem faced by the applications in querying from a Redis instance that is within a data center network? Will this setup not create lots of network traffic when Redis master syncing the data with all its slave nodes?
I couldn't find much answers on this from the internet. Your opinions on this are much appreciated.
The relevance of this kind of architecture depends a lot about the workload. Here are the important criteria:
the ratio between the write and read operations. Obviously, the more read operations, the more relevant the architecture. The main benefit IMO, is not necessarily the latency gains, but the scalability, the extra reliability it brings, and the network resource consumption.
the ratio between the cost of a local Redis access against the cost of a remote Redis access. Do not assume that the only cost of a remote Redis access is the network latency. It is not. On my systems, a local Redis access costs about 50 us (in average, very low workload), while a remote access costs 120 us (in average, very low workload). The network latency is about 60 us. Measure the same kind of figures on your own system/network, with your own data.
Here are a few advices:
do not use a single Redis master against many slave instances. It will limit the scalability of the system. If you want to scale, you need to build a hierarchy of slaves. For instance, have the master replicates to 8 slaves. Each slave replicates to 8 other slaves locally running on your 64 application servers. If you need to add more nodes, you can tune the replication factor at the master or slave level, or add one more layer in this tree for extreme scalability. It brings you flexibility.
consider using unix socket between the application and the local slaves, rather than TCP sockets. If it good for both latency and throughput.
Regarding your last questions, you really need to evaluate the average local and remote latencies to decide whether this is worth it. Note that the protocol used by Redis to synchronize master and slaves is close to the normal client server traffic. Every SET commands applied on the master, will be also applied on the slave. The network bandwidth consumption is therefore similar. So in the end, it is really a matter of how many reads and how many writes you expect.

What is the difference between failover vs high availability?

According to my reading on jboss documentation it says,
We define high availability as the ability for the system to continue
functioning after failure of one or more of the servers. A part of
high availability is failover which we define as the ability for
client connections to migrate from one server to another in event of
server failure so client applications can continue to operate.
Is failover part of high availability? How can we differentiate failover vs high availability?
Failover is a means of achieving high availability (HA). Think of HA as a feature and failover as one possible implementation of that feature. Failover is not always the only consideration when achieving HA.
For example, Cassandra achieves HA through replication, but the degree of availability is determined by data consistency settings. In essence, these settings dictate how many nodes need to respond for an action (a read or a write) to succeed. Requiring more nodes to respond means less availability, and requiring fewer nodes means more availability. That's an example of HA that has nothing to do with failover, strictly speaking.
High Availability
Refers to the fact that the server system is in some way tolerant to failure.
Most of the time this is done with hardware redundancy. Assume a machine has redundant power supplies, if one fails the machine will keep running.
Failover
Then you have application redundancy (failover), which usually refers to the ability for an application running on multiple hardware installations to respond to clients in a consistent manner from any of those hardware installations. That way, if the hardware does totally fail, or the O/S dies on a particular machine, another machine can carry on.
SQL Server deals with application redundancy in four ways:
Clustering
Mirroring
Replication
Log Shipping
High-availability (HA for short) is a broad term, so when I think about it I tend to think as HA clusters.
From Wikipedia High-availability cluster:
High-availability clusters are groups of computers that
support server applications that can be reliably utilized with a
minimum amount of down-time. They operate by using high availability
software to harness redundant computers in groups or clusters that
provide continued service when system components fail. Without
clustering, if a server running a particular application crashes, the
application will be unavailable until the crashed server is fixed.
So the takeaway from the description above is that HA clusters will provide you with the minimum amount of down-time during a failover. Let me explain the two types of failover that HA clusters can provide you:
Hot-Hot / Active-Active: The redundant computers are truly operating in parallel, producing the exact same state, and the exact same output. They are all active nodes, operating as a perfect mirror of each other. In this scenario, your failover down-time is zero, and you can simply pull the power plug from any machine in the cluster without any downtime or disruption to your service.
Hot-Warn / Active-Passive: Only one primary computer is the active one, while the other computers in the cluster are passively rebuilding the same state as the primary. When the primary computer fails, it has to be disabled or killed (automatically or by an operator) and then a passive computer from the cluster needs to be made active (automatically or by an operator).
So what is the catch? The catch is that applications that can operate in a HA cluster are not trivial to design as they need to be true deterministic finite-state machines. A classic problem is when your application needs to use the clock to build state based on time, as clocks are very non-deterministic by nature.
Disclaimer: I am one of the developers of CoralSequencer.

Best EC2 setup for redis server [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
We are deploying a large scale web application that uses only redis as a data store. I notice the the benchmark of our redis master is around 8000 transactions per second on EC2, far less than the stated benchmarks on dedicated hardware.
I understand that there is a performance penalty for running Redis on a virtual machine like EC2, but I would love some pointers from people who have deployed Redis in production environments on EC2 on what EC2 setup you have found most effective for getting more out of redis.
Thanks.
EC2 is probably not the best environment to run Redis on virtualized hardware, but it is a popular one, and there are a number of points to know to get the best from Redis on this platform.
I'm one of the authors of http://redis.io/topics/benchmarks and http://redis.io/topics/latency which cover most of the topics I present below. This is just a summary of the main points.
Virtualization toll
It is not specific to EC2, but Redis is significantly slower when running on a VM (in term of maximum supported throughput). This is due to the fact for basic operations, Redis does not add much overhead to the epoll/read/write system calls required to handle client connections (like memcached, or other efficient key/value stores). System calls are typically more expensive on a VM, and they represent a significant part of Redis activity (especially in benchmarks). In that conditions, a 50% decrease in term of maximum throughput compared to bare metal is not uncommon.
Of course, it also depends on the quality of the hypervisor. For EC2, Xen is used.
Benchmarking in good conditions
Benchmarking can be tricky, especially on a platform like EC2. One point often forgotten is to ensure a proper configuration for both the benchmark client and server. For instance, do not run redis-benchmark on a CPU starved micro-instance (which will likely be throttled down by Amazon) while targeting your Redis server. Both machines are equally important to get a good maximum throughput.
Actually, to evaluate Redis performance, you need to:
run redis-benchmark locally (on the same machine than the server), assuming you have more than one vCPU core.
run redis-benchmark remotely (from a different VM), on a machine whose QoS configuration is equivalent to the server machine
So you can evaluate and compare performance of the machines and the network.
On EC2, you will have the best results with second generation M3 instances (or high-memory, or cluster compute instances) so you can benefit of HVM (hardware virtualization) instead of relying on slower para-virtualization.
The fork issue
This is not specific to EC2, but to Xen: forking a large process can be really slow on Xen (it looks better with kvm). For Redis this is a big problem if you plan to use persistence: both persistence options (RDB or AOF) require the main thread to fork and launch background save or rewrite processes.
In some cases, fork latency can freeze Redis event loop for several seconds. The more memory managed by the Redis instance, the more latency.
On EC2, be sure to use a HVM enabled instance (M3, high-memory, cluster), it will mitigate the issue.
Then, if you have large memory requirements, and your application can tolerate it, consider running several smaller Redis instances on the same machine, and shard your data. It can decrease the latency due to fork operations to an acceptable level.
Persistence configuration
This is a key point to get good performance from Redis (both on VM and bare metal). So please take the time to carefully read http://redis.io/topics/persistence
If you use RDB, keep in mind the memory copy-on-write mechanism will start duplicating pages once the save background process has been forked off. So you need to ensure there is enough memory for Redis itself, plus some margin to cope with the COW. the amount of extra memory depends on your workload. The more you write in the instance, the more extra memory you need.
Please note writing a file may also consume some memory (because of the filesystem cache), so during a Redis background save, you need to account for Redis memory, COW overhead, and size of the dump file.
The machine running the Redis server must never swap. If it does, the result will be catastrophic. Contrary to some other stores, Redis is not virtual memory friendly.
With Linux, be sure to set sensible system parameters: vm.overcommit_memory=1 and vm.swappiness=0 (or a very low value anyway). Do not use old kernel versions: they are quite bad at enforcing a low swappiness (resulting in swapping when a large file is written).
If you use AOF, review the fsync options. It is a tradeoff between raw performance and durability of the write operations. You need to make a choice and define a strategy.
You also need to get familiar with the EC2 storage options. On some VM, you have the choice between ephemeral storage and EBS. On some others, you only have EBS.
Ephemeral storage is generally faster, and you will probably get less issues than with EBS, but you can easily loose your data in case of disk failure or reboot of the host, etc ... You can imagine putting RDB snapshots on ephemeral storage, and then copying the resulting files to EBS directories, as a tradeoff between performance and robustness.
EBS is remote storage: it may eat the standard network bandwidth allocated to the VM, and impact the maximum throughput of Redis. If you plan to use EBS, consider selecting the "EBS-optimized" option to establish a QoS between the standard network and storage links.
Finally, a very common setup for performance demanding instances with EC2 is to deactivate persistence on the master, and only activate it on a slave instance. It is probably less safe for the data, but it may prevent a lot of potential latency issues on the master.

Common Issues in Developing Cluster Aware non-web-based Enterprise Applications

I've to move a Windows based multi-threaded application (which uses global variables as well as an RDBMS for storage) to an NLB (i.e., network load balancer) cluster. The common architectural issues that immediately come to mind are
Global variables (which are both read/ written) will have to be moved to a shared storage. What are the best practices here? Is there anything available in Windows Clustering API to manage such things?
My application uses sockets, and persistent connections is a norm in the field I work. I believe persistent connections cannot be load balanced. Again, what are the architectural recommendations in this regard?
I'll answer the persistent connection part of the question first since it's easier. All good network load-balancing solutions (including Microsoft's NLB service built into Windows Server, but also including load balancing devices like F5 BigIP) have the ability to "stick" individual connections from clients to particular cluster nodes for the duration of the connection. In Microsoft's NLB this is called "Single Affinity", while other load balancers call it "Sticky Sessions". Sometimes there are caveats (for example, Microsoft's NLB will break connections if a new member is added to the cluster, although a single connection is never moved from one host to another).
re: global variables, they are the bane of load-balanced systems. Most designers of load-balanced apps will do a lot of re-architecture to minimize dependence on shared state since it impedes the scalabilty and availability of a load-balanced application. Most of these approaches come down to a two-step strategy: first, move shared state to a highly-available location, and second, change the app to minimize the number of times that shared state must be accessed.
Most clustered apps I've seen will store shared state (even shared, volatile state like global variables) in an RDBMS. This is mostly out of convenience. You can also use an in-memory database for maximum performance. But the simplicity of using an RDBMS for all shared state (transient and durable), plus the use of existing database tools for high-availability, tends to work out for many services. Perf of an RDBMS is of course orders of magnitude slower than global variables in memory, but if shared state is small you'll be reading out of the RDBMS's cache anyways, and if you're making a network hop to read/write the data the difference is relatively less. You can also make a big difference by optimizing your database schema for fast reading/writing, for example by removing unneeded indexes and using NOLOCK for all read queries where exact, up-to-the-millisecond accuracy is not required.
I'm not saying an RDBMS will always be the best solution for shared state, only that improving shared-state access times are usually not the way that load-balanced apps get their performance-- instead, they get performance by removing the need to synchronously access (and, especially, write to) shared state on every request. That's the second thing I noted above: changing your app to reduce dependence on shared state.
For example, for simple "counters" and similar metrics, apps will often queue up their updates and have a single thread in charge of updating shared state asynchronously from the queue.
For more complex cases, apps may swtich from Pessimistic Concurrency (checking that a resource is available beforehand) to Optimistic Concurrency (assuming it's available, and then backing out the work later if you ended up, for example, selling the same item to two different clients!).
Net-net, in load-balanced situations, brute force solutions often don't work as well as thinking creatively about your dependency on shared state and coming up with inventive ways to prevent having to wait for synchronous reading or writing shared state on every request.
I would not bother with using MSCS (Microsoft Cluster Service) in your scenario. MSCS is a failover solution, meaning it's good at keeping a one-server app highly available even if one of the cluster nodes goes down, but you won't get the scalability and simplicity you'll get from a true load-balanced service. I suspect MSCS does have ways to share state (on a shared disk) but they require setting up an MSCS cluster which involves setting up failover, using a shared disk, and other complexity which isn't appropriate for most load-balanced apps. You're better off using a database or a specialized in-memory solution to store your shared state.
Regarding persistent connection look into the port rules, because port rules determine which tcpip port is handled and how.
MSDN:
When a port rule uses multiple-host
load balancing, one of three client
affinity modes is selected. When no
client affinity mode is selected,
Network Load Balancing load-balances
client traffic from one IP address and
different source ports on
multiple-cluster hosts. This maximizes
the granularity of load balancing and
minimizes response time to clients. To
assist in managing client sessions,
the default single-client affinity
mode load-balances all network traffic
from a given client's IP address on a
single-cluster host. The class C
affinity mode further constrains this
to load-balance all client traffic
from a single class C address space.
In an asp.net app what allows session state to be persistent is when the clients affinity parameter setting is enabled; the NLB directs all TCP connections from one client IP address to the same cluster host. This allows session state to be maintained in host memory;
The client affinity parameter makes sure that a connection would always route on the server it was landed initially; thereby maintaining the application state.
Therefore I believe, same would happen for your windows based multi threaded app, if you utilize the affinity parameter.
Network Load Balancing Best practices
Web Farming with the
Network Load Balancing Service
in Windows Server 2003 might help you give an insight
Concurrency (Check out Apache Cassandra, et al)
Speed of light issues (if going cross-country or international you'll want heavy use of transactions)
Backups and deduplication (Companies like FalconStor or EMC can help here in a distributed system. I wouldn't underestimate the need for consulting here)

Resources