Multiple locations in Google Cloud SQL - location

I need to use SQL on multiple different locations. The best option will be to set some databases (or even some records, like tagging on Mongo) in different locations. Is it possible to achieve on Google SQL?

There maybe two scenarios -
One single Cloud SQL instance in multiple locations
Different Cloud SQL instances in multiple locations
When you create a Cloud SQL instance, you choose a region where the instance and its data are stored. To reduce latency and increase availability, choose the same region for your data and your Compute Engine instances, standard environment apps, and other services.
Location types are of mainly two types, regional location i.e. a specific geographic place and multi-regional location which contains at least two geographic places. Multi-regional locations are only used for backup operations in Cloud SQL.
You choose a location when you first create the instance. The location can't be changed after the instance is created.
One single region consists of many small data centers called zones. While creating a Cloud SQL instance you can specify the instance to be available in a single zone or in two different zones within the selected region. Selecting the Cloud SQL instance to be in two different zones is called High Availability (HA) configuration.
The purpose of an HA configuration is to reduce downtime when a zone or instance becomes unavailable which might happen during a zonal outage, or when an instance becomes corrupted. With HA, your data continues to be available to client applications.
The HA configuration provides data redundancy. A Cloud SQL instance configured for HA is also called a regional instance and is located in a primary and secondary zone within the configured region. Within a regional instance, the configuration is made up of a primary instance and a standby instance.
So considering the first scenario when you say if a Cloud SQL instance can be located in multiple locations then it is yes if you consider different zones as different locations (This is correct as two zones are physically separated data centers within a single GCP region). But it can only be located in two zones and for that you have to configure High Availability(HA) for the instance.
For the second scenario, you can always create different Cloud SQL instances in different regions.
You can go through instance locations in Cloud SQL and overview of HA configuration to have a brief understanding of the above.
There is another option in Cloud SQL called read replicas.
You use a read replica to offload work from a Cloud SQL instance. The read replica is an exact copy of the primary instance. Data and other changes on the primary instance are updated in almost real time on the read replica.
Read replicas are read-only; you cannot write to them. The read replica processes queries, read requests, and analytics traffic, thus reducing the load on the primary instance.
If you want the data to be available in multiple locations you may consider using cross-region read replicas.
Cross-region replication lets you create a read replica in a different region from the primary instance.
Cross-region read replicas has many advantages -
Improve read performance by making replicas available closer to your
application's region.
Provide additional disaster recovery capability to guard against a
regional failure.
Let you migrate data from one region to another.

Related

Is it Ok to have single index for multiple tenant?(Azure Search)

Is it ok to have multiple tenant QnA's to be stored in a single data source? for ex: in Azure Table Storage with all QnA's stored in a single table but each tenant data differentiated by an unique key and then filter results based on their unique key, this would help me to reduce the azure service cost but is their any drawbacks in using this method ?
Sharing a service/index in developer/test environments is fine, but there are additional concerns for production environments. These are some drawbacks, though you might not care about some of them:
competing queries: high traffic volume for one tenant can affect query latency/throughput for another tenant
harder to manage data for individual tenants: can you easily delete all documents for a particular tenant? Would the whole index need to be deleted or recreated for any reason which will affect all tenants?
flexibility in location: multiple services allow you to put data physically closer to where the queries will be issued. There can also be legal requirements for where data is stored.
susceptible to bugs/human error: people make mistakes; how bad is it to return data for the wrong tenant? How would you guard against that?
permission management: do you need to need grant permissions to view data for a subset of the tenants?

Is there someway to make distributed table still work for query when one of the shard servers down?

There is a common case that we will update the clickhouse's config which must restart clickhouse to take effect. And during the restarting, the query services depend on clickhouse's distributed table will return the exception due to disconnecting with the restarting server.
So,as the title says, what I want is the way to make distributed table still work for query when one of the shard server down. Thanks.
I see two ways:
Since this server failure is transient, you can refactor your server-side code by adding retry-policy to your request (for c# I would recommend use Polly)
Use the proxy (load-balancer) to CH (for example chproxy).
UPDATE
When one node is restarting in a cluster the distributed table created over replicated tables should be accessible (of course request shouldn't be sent to restarted node).
Availability of data is achieved by using replication, therefore, you need to create Replicated*-tables over materialized view and then create Distributed-tables over Replicated*-tables.
Please look at the articles CH Data Distribution, Distributed vs Shard vs Replicated..
and as a working example (it is not your case) to CH Circular cluster topology.

Distributed database design style for microservice-oriented architecture

I am trying to convert one monolithic application into micro service oriented architecture style. Back end I am using spring , spring boot frameworks for development. Front-end I am using angular 2. And also using PostgreSQL as database.
Here my confusion is that, when I am designing my databases as distributed, according to functionalities it may contain 5 databases. Means I am designing according to vertical partition. Then I am thinking to implement inter-microservice communication services to achieve the entire functionality.
The other way I am thinking that to horizontally partition the current structure. So my domain is based on some educational university. So half of university go under one DB and remaining will go under another DB. And deploy services according to Two region (two for two set of university).
Currently I am decided to continue with the last mentioned approach. I am new to these types of tasks, since it referring some architecture task. Also I am beginner to this microservice and distributed database world. Would someone confirm that my approach will give solution to my issue? Can I continue with my second approach - horizontal partitioning of databases according to domain object?
Can I continue with my second approach - Horizontal partitioning of
databases according to domain object?
Temporarily yes, if based on that you are able to scale your current system to meet your needs.
Now lets think about why on the first place you want to move to Microserices as a development style.
Small Components - easier to manager
Independently Deployable - Continous Delivery
Multiple Languages
The code is organized around business capabilities
and .....
When moving to Microservices, you should not have multiple services reading directly from each other databases, which will make them tightly coupled.
One service should be completely ignorant on how the other service designed its internal structure.
Now if you want to move towards microservices and take complete advantage of that, you should have vertical partition as you say and services talk to each other.
Also while moving towards microservices your will get lots and lots of other problems. I tried compiling on how one should start on microservices on this link .
How to separate services which are reading data from same table:
Now lets first create a dummy example: we have three services Order , Shipping , Customer all are three different microservices.
Following are the ways in which multiple services require data from same table:
Service one needs to read data from other service for things like validation.
Order and shipping service might need some data from customer service to complete their operation.
Eg: While placing a order one will call Order Service API with customer id , now as Order Service might need to validate whether its a valid customer or not.
One approach Database level exposure -- not recommened -- use the same customer table -- which binds order service to customer service Impl
Another approach, Call another service to get data
Variation - 1 Call Customer service to check whether customer exists and get some customer data like name , and save this in order service
Variation - 2 do not validate while placing the order, on OrderPlaced event check in async from Customer Service and validate and update state of order if required
I recommend Call another service to get data based on the consistency you want.
In some use cases you want a single transaction between data from multiple services.
For eg: Delete a customer. you might want that all order of the customer also should get deleted.
In this case you need to deal with eventual consistency, service one will raise an event and then service 2 will react accordingly.
Now if this answers your question than ok, else specify in what kind of scenario multiple service require to call another service.
If still not solved, you could email me on puneetjindal.11#gmail.com, will answer you
Currently I am decided to continue with the last mentioned approach.
If you want horizontal scalability (scaling for increasingly large number of client connections) for your database you may be better of with a technology that was designed to work as a scalable, distributed system. Something like CockroachDB or NoSQL. Cockroachdb for example has built in data sharding and replication and allows you to grow with adding server nodes as required.
when I am designing my databases as distributed, according to functionalities it may contain 5 databases
This sounds like you had the right general idea - split by domain functionality. Here's a link to a previous answer regarding general DB design with micro services.
In the Microservices world, each Microservice owns a set of functionalities and the data manipulated by these functionalities. If a microservice needs data owned by another microservice, it cannot directly go to the database maintained/owned by the other microservice rather it would call an API exposed by the other microservice.
Now, regarding the placement of data, there are various options - you can store data owned by a microservice in a NoSQL database like MongoDB, DynamoDB, Cassandra (it really depends on the microservice's use-case) OR you can have a different table for each micro-service in a single instance of a SQL database. BUT remember, if you choose a single instance of a SQL Database with multiple tables, then there would be no joins (basically no interaction) between tables owned by different microservices.
I would suggest you start small and then think about database scaling issues when the usage of the system grows.

HBase: how put/get knows which region server to write to?

In HBase, how the put/get operations know which region server the row should be written to?
In case of multiple rows to be read how multiple region servers are contacted and the results are retrieved?
I assume your question is simply curiosity, since this behavior is abstracted from the user and you shouldn't care.
In HBase, how the put/get operations know which region server the row should be written to?
From the hbase documentation book:
The HBase client HTable is responsible for finding RegionServers that are serving the particular row range of interest. It does this by querying the .META. and -ROOT- catalog tables (TODO: Explain). After locating the required region(s), the client directly contacts the RegionServer serving that region (i.e., it does not go through the master) and issues the read or write request. This information is cached in the client so that subsequent requests need not go through the lookup process. Should a region be reassigned either by the master load balancer or because a RegionServer has died, the client will requery the catalog tables to determine the new location of the user region.
So first step is looking up in meta and root to determine where it is, then it contacts that regionserver to do that work.
In case of multiple rows to be read how multiple region servers are contacted and the results are retrieved?
There are two ways to read from HBase in general: scanners and gets.
If you run multiple gets, those will each individually fetch those records separately. Each one of those is possibly going to a different region server.
The scanner will simply look for the start of the range and then move forward from there. Sometimes it needs to move to a different regionserver when it reaches the end, but the client handles that behind the scenes. If there is some way to design the table such that your multiple gets is a scan and not a series of gets, you should hypothetically have better performance.
Providing the same scenario and explanation from BigTable Paper: "The client library caches tablet locations. If the client
does not know the location of a tablet, or if it discovers
that cached location information is incorrect, then
it recursively moves up the tablet location hierarchy.
If the client's cache is empty, the location algorithm
requires three network round-trips, including one read
from Chubby. If the client's cache is stale, the location
algorithm could take up to six round-trips, because stale
cache entries are only discovered upon misses (assuming
that METADATA tablets do not move very frequently).
Although tablet locations are stored in memory, so no
GFS accesses are required, we further reduce this cost
in the common case by having the client library prefetch
tablet locations: it reads the metadata for more than one
tablet whenever it reads the METADATA table."
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf

One Instance Versus Multiple Instances In Oracle

What are the advantages and disadvantages of having a single instance compared to multiple instances when multiple databases are intended to be created?
You may want to browse the Oracle concept guide, especially if you're more familiar with other DBMS.
A database is a set of files, located on disk, that store data.
These files can exist independently of
a database instance.
An instance is a set of memory structures that manage database files.
The instance consists of a shared
memory area, called the system global
area (SGA), and a set of background
processes. An instance can exist
independently of database files.
A single instance (set of processes) can mount at most one database (set of files). If you need to access multiple databases, you will need multiple instances. More on the difference between instances and databases on askTom.
Ideally, you only want one instance per server (the server may be a logical server -- i.e a virtual server). This will allow Oracle to know exactly what is going on. This implies one database per server.
If your databases are really independent, going with multiple instances/databases would make sense, since you have greater control over DB version, administration, etc.
If however your databases are not really independent (you frequently share data across them, you need some common data accessible to all of them), it may be more efficient (and simpler) to go with a single consolidated database. Each original database would have its own set of schemas. In this case cross-schema referential integrity would be easy, you wouldn't need to duplicate the data that needs to be shared.

Resources