Why?
For educational purposes. I think it would be really nice for my audience to actually "see" it work like that.
Setup
A dockerized Spring Boot REST API (serving up customer information)
A dockerized Cassandra cluster consisting of three connected nodes, holding customer data with a replication factor of two.
Suggestions
Showing which IP address or container name served my request
Showing which IP address or container name held the data that was used to show my request.
If I were to run these nodes on three separate physical machines, maybe which machine held my data?
Something else you have in mind that really shows the distributed capabilities of Cassandra
Can this be achieved with docker logs, or with something in Spring Data Cassandra that I am not aware of?
I don't know about Spring Data, but with the plain Java driver you can get execution information from the ResultSet via getExecutionInfo and then call getQueriedHost on it. If you're using the default DCAware/TokenAware load balancing policy, the queried host is one of the nodes that hold your data. The rest of the information is available through the Metadata class: get the list of token ranges owned by each host, generate the token for your partition key, and look that token up in the token ranges.
P.S. See Java driver documentation for more details.
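To make the token lookup idea concrete, here is a minimal sketch in plain Python. It is a toy model and not the Java driver API: it assumes one token per node, uses an MD5-based token instead of Cassandra's Murmur3 partitioner, and places replicas on the owning node plus the next nodes clockwise. The node names and the key are made up for illustration.

```python
import bisect
import hashlib
from typing import Dict, List

# Toy model only: real Cassandra uses Murmur3 tokens and many vnodes per host.
RING_SIZE = 2 ** 32


def token_for(partition_key: str) -> int:
    """Map a partition key to a position on the ring (assumed hash, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # always in [0, RING_SIZE)


class Ring:
    def __init__(self, node_tokens: Dict[str, int], replication_factor: int):
        # Sort nodes by token so the ring can be searched with bisect.
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]
        self.rf = replication_factor

    def replicas(self, partition_key: str) -> List[str]:
        """Return the owning node followed by the next rf - 1 nodes clockwise."""
        token = token_for(partition_key)
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self.tokens, token) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.rf)
        ]


if __name__ == "__main__":
    # Hypothetical container names, evenly spaced on the ring.
    ring = Ring(
        {
            "cassandra-1": 0,
            "cassandra-2": RING_SIZE // 3,
            "cassandra-3": 2 * RING_SIZE // 3,
        },
        replication_factor=2,
    )
    print("customer-42 is held by:", ring.replicas("customer-42"))
```

With a replication factor of two, every key resolves to exactly two of the three nodes, which is the kind of output you could show next to the host reported by getQueriedHost.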
Related
I am mapping users to connections as described in the following link https://learn.microsoft.com/en-us/aspnet/signalr/overview/guide-to-the-api/mapping-users-to-connections so I can find which users to send messages to.
I was wondering if there is any additional work required for this to work smoothly on multi-node servers / load balancing. I'm not experienced on the infrastructure side, but I'm assuming that if multiple servers are spun up, there would be multiple static hashmaps storing the mappings of users to connections, i.e., one for each server.
Would this mean users that have made a connection from their browser to node A will not be able to communicate with users who've connected to node B?
If this is the case, how would we go about making this possible?
In that same link, just below the Introduction section, it discusses 4 different mapping methods:
The User ID Provider (SignalR 2)
In-memory storage, such as a dictionary
SignalR group for each user
Permanent, external storage, such as a database table or Azure table storage
After that there is a table that shows which of these work in different scenarios, one of those scenarios being "More than one server".
Since you did not mention which mapping method you are following, the answer depends on that choice.
From there, you can check out "scaling out" on the same site you noted, which describes several methods you can follow depending on what suits your needs. This is where sending messages to clients, regardless of which server they are connected to, is handled.
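As a rough illustration of the "permanent, external storage" option, the sketch below keeps the user-to-connection mapping in one shared table instead of a per-server dictionary. It is plain Python with an in-memory SQLite database standing in for the shared store; the class, table, and server names are assumptions for illustration and are not part of SignalR.

```python
import sqlite3
from typing import List, Tuple


class ConnectionStore:
    """Stand-in for external storage that every server reads and writes."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS connections ("
            "user_id TEXT, connection_id TEXT PRIMARY KEY, server TEXT)"
        )

    def add(self, user_id: str, connection_id: str, server: str) -> None:
        # Called when a client connects to a given server.
        self.db.execute(
            "INSERT OR REPLACE INTO connections VALUES (?, ?, ?)",
            (user_id, connection_id, server),
        )

    def remove(self, connection_id: str) -> None:
        # Called when a client disconnects.
        self.db.execute(
            "DELETE FROM connections WHERE connection_id = ?", (connection_id,)
        )

    def connections_for(self, user_id: str) -> List[Tuple[str, str]]:
        # Any server can look up connections that were registered elsewhere.
        rows = self.db.execute(
            "SELECT connection_id, server FROM connections "
            "WHERE user_id = ? ORDER BY connection_id",
            (user_id,),
        )
        return rows.fetchall()


if __name__ == "__main__":
    shared = sqlite3.connect(":memory:")  # one store shared by both "servers"
    node_a = ConnectionStore(shared)
    node_b = ConnectionStore(shared)
    node_a.add("alice", "conn-1", "node-a")
    print(node_b.connections_for("alice"))
```

Note that this only solves the lookup. Actually delivering a message to a connection that lives on another server is what the "scaling out" methods are for.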
We have configured our application servers (two or three), to work as clients with a Hazelcast cluster (one or two members), for session persistence.
At first, we configured it as two application server nodes with the embedded Hazelcast setup, then we moved on to the client-server scenario.
On the embedded model, the console showed plenty of information related to the cache objects, replicating between nodes and moving from one instance to another when necessary.
On the Client-Server model we see both clients and members registered on the Hazelcast console, and we get basic information (versions, memory consumption, etc). But we cannot see session information (maps) travel and replicate.
We are pretty sure Hazelcast is working, because we have forced some intricate combinations of client and member shutdowns that ensure that the information recovered by the next client must come from the surviving member, and that the data has traveled from cluster member to cluster member before going down to the client.
So, being convinced we are doing something wrong with the configuration, we humbly ask: Did anyone configure this before (sure, because it seems a very common configuration goal), and did you have similar problems? Did you solve them? How?
You need to enable statistics for the caches in order to monitor them in the Management Center. Use the statistics-enabled element or the setStatisticsEnabled() method, in declarative or programmatic configuration respectively, to enable statistics for the maps you want to see in Management Center.
We are building a multi tenant application which has restrictions on the regions/countries where the data is persisted.
The application is based on the Microsoft .NET microservice architecture, but we have shared domains, although we have separate DBs at the very lowest levels, say a separate DB for each city. We cannot persist the data of one country in another country's data center. Hazelcast will be used as the distributed cache. I could not find any direct way to configure data isolation, e.g. something like "Memory Regions" in Apache Ignite. Do we have "Memory Regions" in Hazelcast?
I need to write behind the data from cache to respective Database. Can I segregate a part/partition of cache specific to a database instance?
Any help would be greatly appreciated. Thanks in advance.
I am not answering your question directly. IMHO, and as I understand it, when data is stored across different clusters / nodes there will still be a network call, even if you use key formats that keep the data within the same cluster / node.
Based on my experience, you could easily set up a MemoryCache, which comes as part of System.Runtime.Caching, to store the data in every node and then use Redis Pub-Sub or Azure Service Bus as the backbone for the pub-sub.
In that case,
any data that is updated in a cache is notified to all the other instances of the application via a ServiceBus / Redis message which is typically the key.
Upon receipt of the key, each application clears out its internal cache and then gets the data cached back on the next DB access.
This method is common in multi-tenant applications, and it is also fail-safe and lightweight. The payloads / network transfers are small, and each AppDomain uses its own internal memory as a cache, which supports different regions via different instances of MemoryCache.
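The invalidation flow described above can be sketched roughly as follows. This is plain Python rather than .NET: the in-process InvalidationBus is only a stand-in for Redis Pub-Sub or a Service Bus topic, a shared dict plays the database, and all class and key names are hypothetical. It also assumes that only the notified key is evicted rather than the whole cache.

```python
from typing import Callable, Dict, List


class InvalidationBus:
    """In-process stand-in for Redis Pub-Sub or a Service Bus topic."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, key: str) -> None:
        # The message payload is just the key, as described above.
        for handler in list(self._subscribers):
            handler(key)


class AppInstance:
    """One application node with its own local cache."""

    def __init__(self, name: str, database: Dict[str, str], bus: InvalidationBus):
        self.name = name
        self.database = database  # shared dict playing the role of the DB
        self.cache: Dict[str, str] = {}
        self.db_reads = 0
        self.bus = bus
        bus.subscribe(self._on_invalidate)

    def _on_invalidate(self, key: str) -> None:
        # Assumption: evict only the notified key, not the whole cache.
        self.cache.pop(key, None)

    def get(self, key: str) -> str:
        if key not in self.cache:
            self.db_reads += 1  # cache miss: reload from the database
            self.cache[key] = self.database[key]
        return self.cache[key]

    def update(self, key: str, value: str) -> None:
        self.database[key] = value
        self.bus.publish(key)  # notify every instance, including this one


if __name__ == "__main__":
    db = {"tenant-1:name": "old"}
    bus = InvalidationBus()
    a, b = AppInstance("a", db, bus), AppInstance("b", db, bus)
    print(b.get("tenant-1:name"))
    a.update("tenant-1:name", "new")
    print(b.get("tenant-1:name"))
```

Keeping one cache object per tenant or region inside each instance would mirror the "different instances of MemoryCache" idea.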
Hope this helps if no direct response is available regarding Hazelcast.
Also, you may refer to this link for some details regarding Hazelcast
I have a small web and mobile application partly running on a webserver written in PHP (Symfony). I have a few clients using the application, and slowly expanding to more clients.
My back-end architecture looks like this at the moment:
Database is Cloud SQL running on GCP (every client has its own database instance)
Files are stored on Cloud Storage (GCP) or S3 (AWS), depending on the client (every client has its own bucket)
PHP application is running in a Compute Engine VM (GCP) (every client has its own VM)
Now the thing is, in the PHP code, the only thing client specific is a settings file with the database credentials and the Storage/S3 keys in it. All the other code is exactly the same for every client. And mostly the different VMs sit idle all day, waiting on a few hours usage per client.
I'm trying to find a way to avoid having to create and maintain a VM for every customer. How could I rearchitect my back-end so I can keep separate databases and storage buckets per client, but only scale up my VMs when capacity is needed?
I'm hearing a lot about Docker, was thinking about keeping DB credentials and keys in a Redis DB or Cloud Datastore, was looking at Heroku, App Engine, Elastic Beanstalk, ...
This is my ideal scenario as I see it now
An incoming request is done, hits a load balancer
From the request, determine which client the request is for
Find the correct settings file, or credentials from a DB
Inject the settings file in an unused "container"
Handle the request
Make the container idle again
And somewhere in there, determine, based on the amount of incoming requests or traffic, whether I need to spin up or spin down containers to handle the extra or reduced (temporary) load.
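The "determine the client, then find its settings" steps above could look roughly like this. It is a minimal Python sketch (not Symfony/PHP) that assumes each client is identified by the request's Host header; the hostnames, DSNs, and bucket names are placeholders, and in practice the registry would live in a database or secret store rather than in code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TenantSettings:
    database_dsn: str
    storage_bucket: str


# Hypothetical registry; real credentials should not be hard-coded.
SETTINGS_BY_HOST: Dict[str, TenantSettings] = {
    "acme.example.com": TenantSettings("mysql://acme-db/app", "acme-files"),
    "globex.example.com": TenantSettings("mysql://globex-db/app", "globex-files"),
}


def resolve_tenant(
    host_header: str, registry: Dict[str, TenantSettings]
) -> Optional[TenantSettings]:
    """Pick the client's settings from the Host header (port and case ignored)."""
    host = host_header.split(":", 1)[0].strip().lower()
    return registry.get(host)


def handle_request(
    host_header: str, path: str, registry: Dict[str, TenantSettings]
) -> Tuple[int, str]:
    """Same code path for every client; only the resolved settings differ."""
    settings = resolve_tenant(host_header, registry)
    if settings is None:
        return 404, "unknown client"
    body = (
        f"{path} served using {settings.database_dsn} "
        f"and bucket {settings.storage_bucket}"
    )
    return 200, body


if __name__ == "__main__":
    print(handle_request("ACME.example.com:443", "/orders", SETTINGS_BY_HOST))
    print(handle_request("unknown.example.com", "/orders", SETTINGS_BY_HOST))
```

Because the settings are resolved per request, any number of identical application instances can sit behind the load balancer and serve any client, so scaling becomes a question of total traffic rather than of client count.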
All this information overload has me stuck, I have no idea what direction to choose, and I fail seeing how implementing any of the above technologies will actually fix my problem.
There are several ways to do it with minimal effort:
Rewrite the loading of the config file so that it depends on the customer
Host several back-end websites on one VM (best choice, I think)
This is a detail that the Spanner paper glosses over with a single line, and I am hoping someone from Google may be able to shed some light on.
The per-zone location proxies are used by clients to locate the spanservers assigned to serve their data.
How do clients figure out the IP addresses of the location proxies?
After they retrieve the data, do clients cache this data somewhere or do they talk to the location proxies for every read and write?
If there is a cache on the client, how does the client discover that it needs to be updated?
Spanner is now a Google Cloud Platform service, so you can review the docs and play with the service.
https://cloud.google.com/spanner/docs/
Basically, we auto-route everything to the nearest version of the data that can respond to your request. You just address the instance and we do the routing, so you can't address a replica directly.
Going to answer the first question, since details of location proxies are not public at this point.
Naming resolution at Google is solved with Borg, see section 2.6 of the Borg paper.
How do clients figure out the IP addresses of the location proxies? - As far as I understand, the client only knows the address of each DC it is connected to.
After they retrieve the data, do clients cache this data somewhere or do they talk to the location proxies for every read and write? - I actually couldn't find any info about caching. I think caching is not supported, because Spanner is meant to store hundreds of petabytes.
If there is a cache on the client, how does the client discover that it needs to be updated? - The client doesn't have a local cache.