How are the clusters connected in different regions in a distributed system? - cluster-computing

A newbie here trying to understand distributed architecture. I understand that the nodes in clusters are interconnected via LAN. How are the clusters connected across different regions, let's say different continents? Are there any frameworks or patterns I can read about?

In general, this is achieved using fully redundant undersea fiber-optic cables. These cables carry very thin glass-fiber threads that transfer data across the ocean between continents at nearly the speed of light. Once the data arrives on the other continent, it is handed off to the nearest edge network, which connects it to an existing terrestrial network and carries it on to the relevant endpoints/gateways.
The routing in such scenarios depends on the underlying routing protocol and the endpoints. In general, Border Gateway Protocol (BGP) enabled gateways automatically learn routes to other sites and carry the data accordingly.
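As a rough illustration of the idea only (real BGP implementations are far more involved, with many more tie-breaking rules), a router that hears the same prefix from several peers tends to prefer the advertisement with the shortest AS path. The prefix, AS numbers, and peer names below are entirely made up:

```python
# Toy sketch of BGP-style best-path selection: among multiple
# advertisements for the same prefix, prefer the shortest AS path.
# All prefixes, peers, and AS paths here are hypothetical.
advertisements = {
    "203.0.113.0/24": [
        {"next_hop": "peer-a", "as_path": [64500, 64511, 64601]},
        {"next_hop": "peer-b", "as_path": [64500, 64601]},
    ],
}

def best_route(prefix: str) -> dict:
    """Pick the advertisement with the fewest AS hops for a prefix."""
    return min(advertisements[prefix], key=lambda adv: len(adv["as_path"]))

print(best_route("203.0.113.0/24"))  # -> the shorter route via peer-b
```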
Cloud providers such as AWS have components such as AWS Regions extended with AWS Local Zones and AWS Wavelength, which work together with internet service providers to meet the performance requirements of an application. This is achieved by placing AWS infrastructure (AWS compute and storage services inside ISP datacenters) closer to the user, or at the edge of the 5G network, so that application traffic from a particular set of 5G devices can reach servers in Wavelength Zones with minimal latency, instead of traversing the public internet, which would add latency on the way to the server.
The exact pattern/architecture depends on the software requirements and design, and on the software and hardware components in use.
A typical pattern to take as an example is the Geode pattern. It consists of a set of geographical nodes with backend services deployed such that they can service any request for any client in any region. By distributing request processing around the globe, this pattern improves both latency and availability.
Typically, if the data is geo-distributed across a far-flung user base, the geo-distributed datastores should also be co-located with the compute resources that process the data. The Geode pattern brings the compute to the data: the service is deployed as a set of satellite deployments spread around the globe, each of which is termed a geode.
This pattern relies on Azure's routing features to route traffic to a nearby geode via the shortest path, which in turn improves latency and performance. It is deployed with a global load balancer in front of the geodes and uses a geo-replicated read-write service such as Azure Cosmos DB for the data plane; the data replication service keeps data consistent across geodes so that all geodes can serve all requests.
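As a minimal sketch of the data-plane side, each geode can be pointed at its nearest Cosmos DB replicas. This assumes the preferred_locations option of the azure-cosmos Python SDK; the account URL, key, database/container names, and region names are placeholders:

```python
# Minimal sketch: a geode in West Europe reads/writes through its
# nearest Cosmos DB replicas first. URL, key, names, and regions
# below are hypothetical placeholders.
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://my-account.documents.azure.com:443/",  # placeholder account
    credential="<account-key>",
    preferred_locations=["West Europe", "North Europe"],  # nearest first
)
container = client.get_database_client("app").get_container_client("orders")
container.upsert_item({"id": "order-1", "status": "created"})
```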
There is also the Deployment Stamps pattern, which can be used for multi-region applications where each tenant's data and traffic should be directed to a specific region.
This relies on Azure Front Door to direct traffic to the closest instance, or it can utilize API Management deployed across multiple regions to enable geo-distribution of requests and geo-redundancy of the traffic-routing service. Azure Front Door can be configured with a backend pool, enabling requests to be directed to the closest available API Management instance. The global distribution features of Cosmos DB can be used to keep the mapping information updated across each region.
Azure Front Door is often described as a "scalable and secure entry point for fast delivery of your global applications". Front Door operates at Layer 7 (the HTTP/HTTPS layer) and uses anycast with split TCP over Microsoft's global network for improved latency and global connectivity. Based on the configured routing method, Front Door routes client requests to the fastest and most available application backend (an internet-facing service hosted inside or outside of Azure).
The closest equivalent of Azure Front Door in Google Cloud Platform is Google Cloud CDN, described as "low-latency, low-cost content delivery using Google's global network"; it leverages Google's globally distributed edge caches to accelerate content delivery for websites and applications served out of Google Compute Engine.
Similarly, Amazon has Amazon CloudFront. This is a CDN service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds, all within a developer-friendly environment. The AWS backbone is a private, fully redundant global fiber network linked via trans-oceanic cables across various oceans. Amazon CloudFront automatically maps network conditions and intelligently routes traffic to the most performant AWS edge location to serve up cached or dynamic content.
AWS also documents a use case for traffic between different continents over the AWS global backbone: users need access to the applications running in one data center as well as the core systems running in another data center, with the different sites interconnected by a global WAN. Traffic using inter-region Transit Gateway peering is always encrypted, stays on the AWS global network, and never traverses the public internet. Transit Gateway peering thus enables international, in this case intercontinental, communication. Once the traffic arrives at a particular continent/region's Transit Gateway, the customer routes it over AWS Direct Connect (or a VPN) to the central data center where the core systems are hosted.
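For a rough idea of how such a peering is established programmatically, here is a hedged boto3 sketch; the gateway IDs, account ID, and regions are placeholders:

```python
# Sketch: peer a Transit Gateway in Frankfurt with one in N. Virginia.
# All IDs, the account number, and regions are hypothetical placeholders.
import boto3

ec2_eu = boto3.client("ec2", region_name="eu-central-1")

attachment = ec2_eu.create_transit_gateway_peering_attachment(
    TransitGatewayId="tgw-0aaa1111bbbb22223",      # requester TGW (EU)
    PeerTransitGatewayId="tgw-0ccc3333dddd44445",  # accepter TGW (US)
    PeerAccountId="123456789012",
    PeerRegion="us-east-1",
)

# The peer side must accept the attachment before traffic can flow.
ec2_us = boto3.client("ec2", region_name="us-east-1")
ec2_us.accept_transit_gateway_peering_attachment(
    TransitGatewayAttachmentId=attachment["TransitGatewayPeeringAttachment"][
        "TransitGatewayAttachmentId"
    ]
)
```

After acceptance, static routes pointing at the peering attachment still need to be added to each Transit Gateway route table.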

Related

What if I choose us-central1-a zone for my google compute VM Instance and my traffic, calling VM is from asia? (in respect of pricing & efficiency)

I am trying to create a Google Compute VM instance which will host my website; the traffic to this website will be coming mostly from Asia, so which region should I select for my VM instance?
How will the selected region affect pricing and performance?
Have a look at the Factors to consider when selecting regions section of Best practices for Compute Engine region selection:
Latency
The main factor to consider is the latency your user experiences.
However, this is a complex problem because user latency is affected by
multiple aspects, such as caching and load-balancing mechanisms.
In enterprise use cases, latency to on-premises systems or latency for
a certain subset of users or partners is more critical. For example,
choosing the closest region to your developers or on-premises database
services interconnected with Google Cloud might be the deciding
factor.
For example, you can surf some sites located in Asia and then compare your experience to sites located in the US - you'll notice a significant difference in responsiveness caused by latency. The same goes for your site - it will be less responsive for far-away users. You should set up your VM instance as close to your customers as possible.
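To make that comparison concrete, a quick script like the sketch below can time requests against endpoints in different regions; the URLs are placeholders for whatever regional endpoints you want to compare:

```python
# Rough latency comparison between regional endpoints.
# Both URLs are hypothetical placeholders.
import time
import urllib.request

endpoints = {
    "us-central1": "https://us.example.com/ping",
    "asia-southeast1": "https://asia.example.com/ping",
}

for region, url in endpoints.items():
    start = time.monotonic()
    urllib.request.urlopen(url, timeout=5).read()
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"{region}: {elapsed_ms:.0f} ms")
```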
To estimate pricing check resources below:
Pricing
Google Cloud resource costs differ by region. The following resources
are available to estimate the price:
Compute Engine pricing
Pricing calculator
Google Cloud SKUs
Billing API
If you decide to deploy in multiple regions, be aware that there are
network egress charges for data synced between regions.
In addition, you can find a monthly cost estimate in the Create a new instance wizard as well - try setting different regions and you'll get the numbers.
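If you prefer to query prices programmatically, the Cloud Billing Catalog API mentioned above can list SKUs per service. A hedged sketch follows; the service ID used here is the one commonly documented for Compute Engine, so treat it as an assumption:

```python
# Sketch: list a Compute Engine SKU and its pricing regions via the
# Cloud Billing Catalog API (pip install google-cloud-billing).
# "services/6F81-5844-456A" is assumed to be the Compute Engine service ID.
from google.cloud import billing_v1

catalog = billing_v1.CloudCatalogClient()
for sku in catalog.list_skus(parent="services/6F81-5844-456A"):
    print(sku.description, list(sku.service_regions)[:3])
    break  # just show the first SKU in this sketch
```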
If your customers are located in different regions you can try Google Cloud CDN:
Cloud CDN (Content Delivery Network) uses Google's globally
distributed edge points of presence to cache HTTP(S) load balanced
content close to your users. Caching content at the edges of Google's
network provides faster delivery of content to your users while
reducing serving costs.

Global borderless implementation website/app on Serverless AWS

I am planning to use AWS to host a global website that has customers all around the world. We will have a website and an app, and we will use a serverless architecture. I will also consider multi-region DynamoDB so that users access the database instance closest to their region.
My question is about the best design for a solution that is not locked down to one particular region - a borderless implementation. I am also expecting high traffic and a high number of users across different countries.
I am looking at https://aws.amazon.com/getting-started/serverless-web-app/module-1/ but it requires me to choose a region. I almost need a router in front of this with multiple S3 buckets, but I don't know how. For example, how do users access the copy of the landing page closest to their region? How do mobile app users call Lambda functions in their region?
If you could point me to a posting or article or simply your response, I would be most grateful.
Note: I would also be interested to know whether Google Cloud Platform is an option.
thank you!
S3
Instead of setting up an S3 bucket per-region, you could set up a CloudFront distribution to serve the contents of a single bucket at all edge locations.
During the Create Distribution process, select the S3 bucket in the Origin Domain Name dropdown.
Caveat: when you update the bucket contents, you need to invalidate the CloudFront cache so that the updated contents get distributed. This isn't such a big deal.
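For reference, the invalidation step can be scripted; a minimal boto3 sketch with a placeholder distribution ID:

```python
# Sketch: invalidate all cached paths after updating the S3 bucket.
# The distribution ID is a hypothetical placeholder.
import time
import boto3

cloudfront = boto3.client("cloudfront")
cloudfront.create_invalidation(
    DistributionId="E1ABCDEFGHIJKL",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
)
```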
API Gateway
Setting up an API Gateway gives you the choice of Edge-Optimized or Regional.
In the Edge-Optimized case, AWS automatically serves your API via the edge network, but requests are all routed back to your original API Gateway instance in its home region. This is the easy option.
In the Regional case, you would need to deploy multiple instances of your API, one per region. From there, you could do a latency-based routing setup in Route 53. This is the harder option, but more flexible.
Refer to this SO answer for more detail
Note: you can always start developing in an Edge-Optimized configuration, and then later on redeploy to a Regional configuration.
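For the Regional option, a latency-based routing setup in Route 53 looks roughly like the sketch below; the hosted zone ID, domain, and API Gateway hostnames are placeholders:

```python
# Sketch: two latency-based CNAME records so Route 53 answers with the
# lowest-latency regional API endpoint for each client. The zone ID,
# domain, and target hostnames are hypothetical placeholders.
import boto3

route53 = boto3.client("route53")

def latency_record(region: str, target: str) -> dict:
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "api.example.com",
            "Type": "CNAME",
            "TTL": 60,
            "SetIdentifier": f"api-{region}",  # must be unique per record
            "Region": region,                  # enables latency routing
            "ResourceRecords": [{"Value": target}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0000000000000",
    ChangeBatch={"Changes": [
        latency_record("us-east-1", "abc123.execute-api.us-east-1.amazonaws.com"),
        latency_record("eu-west-1", "def456.execute-api.eu-west-1.amazonaws.com"),
    ]},
)
```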
DynamoDB / Lambda
DynamoDB and Lambda are regional services, but you could deploy instances to multiple regions.
In the case of DynamoDB, you could set up cross-region replication using DynamoDB Streams and Lambda functions.
Though I have never implemented it, AWS provides documentation on how to set up replication
Note: Like with Edge-Optimized API Gateway, you can start developing DynamoDB tables and Lambda functions in a single region and then later scale out to a multi-regional deployment.
Update
As noted in the comments, DynamoDB has a feature called Global Tables, which handles the cross-regional replication for you. Appears to be fairly simple -- create a table, and then manage its cross-region replication from the Global Tables tab (from that tab, enable streams, and then add additional regions).
For more info, here are the AWS Docs
At the time of writing, this feature is only supported in the following regions: US West (Oregon), US East (Ohio), US East (N. Virginia), EU (Frankfurt), EU West (Ireland). I imagine when enough customers request this feature in other regions it would become available.
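Programmatically, the Global Tables setup (the 2017 version of the feature) can be sketched with boto3 as follows; the table name and regions are placeholders, and each regional table must already exist with streams enabled:

```python
# Sketch: link existing per-region tables into a DynamoDB Global Table
# (2017 version of the feature). The table name and regions are
# placeholders; each regional table must already exist with streams
# enabled (NEW_AND_OLD_IMAGES).
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")
dynamodb.create_global_table(
    GlobalTableName="users",
    ReplicationGroup=[
        {"RegionName": "us-east-1"},
        {"RegionName": "eu-west-1"},
    ],
)
```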
Also noted, you can run Lambda@Edge functions to respond to CloudFront events.
The Lambda function can inspect the AWS_REGION environment variable at runtime and then invoke (and forward the request details to) a region-appropriate service (e.g. API Gateway). This means you could also use Lambda@Edge as an API Gateway replacement by inspecting the query string yourself (YMMV).
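A hedged sketch of that idea as a CloudFront origin-request handler (Lambda@Edge supports Python runtimes); the region-to-endpoint mapping and domain names are made up:

```python
# Sketch of a Lambda@Edge origin-request handler that rewrites the
# origin to a region-appropriate API endpoint. The endpoint mapping
# and domain names are hypothetical placeholders.
import os

REGIONAL_ORIGINS = {
    "us-east-1": "abc123.execute-api.us-east-1.amazonaws.com",
    "eu-west-1": "def456.execute-api.eu-west-1.amazonaws.com",
}
DEFAULT_ORIGIN = REGIONAL_ORIGINS["us-east-1"]

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    # AWS_REGION reflects the region where this replicated copy runs,
    # i.e. an edge region close to the viewer.
    origin = REGIONAL_ORIGINS.get(os.environ.get("AWS_REGION", ""), DEFAULT_ORIGIN)
    request["origin"]["custom"]["domainName"] = origin
    request["headers"]["host"] = [{"key": "Host", "value": origin}]
    return request
```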

Does Serverless Framework support any kind of multi-cloud load balancing?

Does Serverless Framework support the ability to deploy the same API to multiple cloud providers (AWS, Azure and IBM) and route requests to each provider based on traditional load balancer methods (i.e. round robin or latency)?
Does Serverless Framework support this function directly?
Does Serverless integrate with global load balancers (e.g. dyn or neustar)?
Does Serverless Framework support the ability to deploy the same API to multiple cloud providers (AWS, Azure and IBM)
Just use 3 different serverless.yml files and deploy each function 3 times.
and route requests to each provider based on traditional load balancer methods (i.e. round robin or latency)?
No, there is no such support for multi-cloud load balancing.
The serverless concept is based on trust: you trust that your cloud provider will be able to handle your traffic with proper scalability and availability. There is no multi-cloud model; a single cloud provider must be able to satisfy your needs. To achieve this, providers must implement a proper load-balancing scheme internally.
If you don't trust your cloud provider, you are not thinking in a serverless way. Serverless means that you should not worry about the infrastructure that supports your app.
However, you can implement a sort of multi-cloud load balancing
When you specify a serverless.yml file, you must say which provider (AWS, Azure, IBM) will create the resources. Multi-cloud means that you need one serverless.yml file per cloud, but the source code (the functions) can be the same. When you deploy the same function to 3 different providers, you will receive 3 different endpoints to access them.
Now, which machine will execute the load balancing? If you don't trust a single cloud to provide enough availability, where would the load balancer itself run?
The only solution that I see is to implement this load balancing in your frontend code. Your app would know the 3 different endpoints and randomize the requests. If a request returns an error, that endpoint would be marked as unhealthy. You could also measure the latency of each endpoint and select a preferred provider. All of this in the client code.
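A minimal client-side sketch of that approach, with made-up endpoint URLs standing in for the three providers:

```python
# Sketch of client-side multi-cloud failover: try each endpoint in
# preference order and fall back on errors. All URLs are hypothetical.
import urllib.error
import urllib.request

ENDPOINTS = [
    "https://abc.execute-api.us-east-1.amazonaws.com/prod/hello",  # AWS
    "https://my-app.azurewebsites.net/api/hello",                  # Azure
    "https://us-south.functions.cloud.ibm.com/api/v1/hello",       # IBM
]

def call_with_failover() -> bytes:
    last_error = None
    for endpoint in ENDPOINTS:  # could be reordered by measured latency
        try:
            return urllib.request.urlopen(endpoint, timeout=3).read()
        except (urllib.error.URLError, OSError) as exc:
            last_error = exc  # treat as unhealthy, try the next provider
    raise RuntimeError("all providers failed") from last_error
```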
However, don't follow this path. Choose just one provider for production code. The SLA (service level agreement) usually guarantees high availability. If that's not enough, you should still stick with just one provider and keep some scripts on hand to migrate quickly to another cloud in case of a mass outage of your preferred provider.

Akka, AMI - discover remote actors for database access

I am working on a prototype for a client where auto-scaling on AWS is used to create new VMs from Amazon Machine Images (AMIs), using Akka.
I want to have just one actor controlling access to the database, so it will create new children as needed and queue up requests that go beyond a set limit.
But I don't know the IP address of the VM, as it may change as Amazon adds/removes VMs based on activity.
How can I discover the actor that will be used to limit access to the database?
I am not certain whether clustering will work (http://doc.akka.io/docs/akka/2.4/scala/cluster-usage.html); this question and its answers are from 2011 (Akka remote actor server discovery), and possibly routing may solve the problem: http://doc.akka.io/docs/akka/2.4.16/scala/routing.html
I have a separate REST service that just goes to the database, so it may be that this service will need to enforce the limit before requests reach the actors.

Azure cache configs - multiple on one storage account?

While developing an Azure application I got the famous error "Cache referred to does not exist", and after a while I found this solution: datacacheexception Cache referred to does not exist (in short: don't point multiple cache clusters at one storage account via ConfigStoreConnectionString).
Well, I have 3 roles using co-located cache, plus testing and production environments. So I would have to create 6 "dummy" storage accounts just for cache configuration. This doesn't seem very nice to me.
So the question is - is there any way to point multiple cache clusters at one storage account? For example, by specifying different containers for them (they create one named "cacheclusterconfigs" by default)?
Thanks!
Given your setup, I would point each cloud service at its own storage account, which gives two per environment (one for each cloud service). There are other alternatives: you could set up Server AppFabric Cache in an IaaS VM and expose it to both of your cloud services by placing them all within a single Azure Virtual Network. However, this will introduce latency to the connections as well as increase costs (from running the virtual network).
You can also use the same storage account for the cache as the one used for diagnostics or data storage by your cloud services; just be aware of any scalability limits, as the cache will generate some traffic (mainly from the addition of new items to the cache).
But unfortunately, to my knowledge there is currently no option that allows two caches to share the same storage account.
