How is Apache Zookeeper used in sharding?

We are thinking of centralizing config information, and ZooKeeper looks like a good choice. We are also interested in sharding and have a scheme. The PoweredBy page [1] shows that Rackspace and Yahoo are using ZooKeeper for sharding. We would appreciate pointers and details.
[1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy

Solr is going to use ZooKeeper for sharding. The ZooKeeper Integration design doc might be interesting for you.

I can think of two things that they could be referencing.
They could be referencing the built-in ensemble features. Using those you can set up a group-management protocol for your service. As you add more servers to the ensemble, you effectively shard your pool out to greater numbers. The data is synced between the member servers of the ensemble. This is especially useful for applications that shard the same data set out to multiple read pools, such as index servers, search servers, read caches, etc.
They could be using ZooKeeper for configuration management. Now assume that your application has thousands of clients that all need to update their config files at the same time. Say that your application accesses a data-storage layer of 50 servers, but that pool needs to be sharded out to 200: you can set up replication so that each of the 50 servers gets 4 slaves. ZooKeeper could then be used to push out the new config and, in essence, change every config file within a second of each other.
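For illustration, a minimal sketch of the "push the new config" side, using the standard Java ZooKeeper client; the znode path and payload below are made up for the example:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ConfigPublisher {
        public static void main(String[] args) throws Exception {
            // Connect to the ZooKeeper ensemble (host list is an assumption).
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 3000, event -> { });

            String path = "/myapp/config/storage";                        // hypothetical znode
            byte[] newConfig = "storage-pool=200;replicas=4".getBytes();  // hypothetical payload

            if (zk.exists(path, false) == null) {
                // First publish: create the znode that every client will watch.
                zk.create(path, newConfig, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } else {
                // Subsequent publishes: every client watching the znode gets a
                // NodeDataChanged event and can re-read its configuration.
                zk.setData(path, newConfig, -1);   // -1 = skip the version check
            }
            zk.close();
        }
    }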

You should also take a look at how HBase uses Zookeeper; specifically to maintain information about regions. This would be analogous to using ZK to maintain DB sharding info.

For managing the lookup table: since this lookup table has to be strongly consistent, this is where ZooKeeper comes into the picture.
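A hedged sketch of the client side of that idea, again with the standard Java ZooKeeper client; the znode path and the lookup-table format are invented for illustration. Each client reads the table once and re-reads it whenever ZooKeeper fires a watch event:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ShardLookup implements Watcher {
        private final ZooKeeper zk;
        private volatile String table;   // e.g. "0-499=shardA,500-999=shardB" (made-up format)

        public ShardLookup(String connectString) throws Exception {
            zk = new ZooKeeper(connectString, 3000, this);
            refresh();
        }

        private void refresh() throws Exception {
            // Passing "true" re-registers this object as the watcher, so the next
            // change to the znode triggers process() and we re-read the table.
            table = new String(zk.getData("/myapp/shards", true, null));
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                try { refresh(); } catch (Exception e) { /* retry / log in real code */ }
            }
        }

        public String shardFor(String key) {
            // Resolve key -> shard from the cached table (parsing left out for brevity).
            return table;
        }
    }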

What are the best approaches (practices) to create stateful microservices?

I need to create a food-ordering service using microservices: scalable, clustered, with several steps to place an order. I need to store user data between steps/requests.
What is an approach to keep state and user data? Store it in a DB? A cache? Shared memory?
Are there any tutorials on best practices for this?
(I am going to use Spring / Spring Boot and its modules.)
Anything that you cannot afford to lose (usually the business data) will go in the DB and can be cached in parallel in an in-memory DB like Redis, which has a built-in cache-eviction algorithm.
Anything that, if lost, is not a big deal (usually the technical things that are not directly linked with the business data) can go only in an in-memory DB.
Since you are using Spring, you could probably use something like Redis with Spring Data Redis. There are already known Spring solutions (such as this) to fall back to API/DB calls to fetch data if the Redis server goes down. You can also run multiple Redis instances behind Redis Sentinel to provide failover. Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes. Also, you can configure Redis to persist the data to the file system once a day or so, to back up the cache data for disaster recovery.
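A minimal sketch of what this could look like with Spring's cache abstraction backed by Redis; the class and method names (OrderService, findOrder, the "orders" cache) are purely illustrative and assume spring-boot-starter-data-redis is on the classpath:

    import java.io.Serializable;

    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.cache.annotation.EnableCaching;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.stereotype.Service;

    @Configuration
    @EnableCaching          // with spring-boot-starter-data-redis on the classpath,
    class CacheConfig { }   // Spring Boot auto-configures a Redis-backed CacheManager

    @Service
    class OrderService {

        @Cacheable(value = "orders", key = "#orderId")
        public Order findOrder(String orderId) {
            // Cache miss: load from the database; the result is then stored in Redis.
            return loadFromDatabase(orderId);
        }

        @CacheEvict(value = "orders", key = "#orderId")
        public void updateOrder(String orderId, Order order) {
            // The DB remains the source of truth; evict so the next read re-caches fresh data.
            saveToDatabase(orderId, order);
        }

        private Order loadFromDatabase(String id) { return new Order(id); }   // stand-in for JPA/JDBC
        private void saveToDatabase(String id, Order o) { }                   // stand-in for JPA/JDBC
    }

    // Serializable so the default JDK serializer can write it to Redis.
    class Order implements Serializable {
        final String id;
        Order(String id) { this.id = id; }
    }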
If you are looking for a fully managed service, AWS provides "Step Functions" to satisfy your stateful requirements: https://stackoverflow.com/questions/tagged/aws-step-functions

Hazelcast data isolation ("Memory Regions")

We are building a multi-tenant application which has restrictions on the regions/countries where the data is persisted.
The application is based on a Microsoft .NET microservice architecture, but we have shared domains, although we have separate DBs at lower levels, say a separate DB for each city. We cannot persist the data of one country in another country's data center. Hazelcast will be used as the distributed cache. I could not find any direct way to configure data isolation, for example like "Memory Regions" in Apache Ignite. Do we have "Memory Regions" in Hazelcast?
I need to write-behind the data from the cache to the respective database. Can I segregate a part/partition of the cache specific to a database instance?
Any help would be greatly appreciated. Thanks in advance.
I am not directly replying to your question. IMHO, from my understanding, when you have data stored across different clusters/nodes, there will still be a network call, even if you use key formats so that related data is stored within the same cluster/node.
Based on my experience, you could easily set up a MemoryCache (part of System.Runtime.Caching) to store the data on every node, and then use Redis Pub/Sub or Azure Service Bus as the backbone for the pub/sub.
In that case,
any data that is updated in a cache is notified to all the other instances of the application via a Service Bus / Redis message, which is typically the key.
Upon receipt of the key, each application clears out its internal cache and then re-caches the data on the next DB access.
This method is commonly used in multi-tenant applications and is also fail-safe and lightweight. The payloads/network transfers are small, and each AppDomain uses its internal memory as a cache, which does support different regions via different instances of MemoryCache.
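The question's stack is .NET (MemoryCache plus Redis/Service Bus), but the invalidate-on-message pattern itself is language-neutral; here is a hedged Java/Jedis sketch of the same idea, with the channel name and local cache chosen purely for illustration:

    import java.util.concurrent.ConcurrentHashMap;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPubSub;

    public class InvalidatingCache {
        private final ConcurrentHashMap<String, Object> local = new ConcurrentHashMap<>();

        // Publisher side: after updating the DB, broadcast the key that changed.
        public void publishInvalidation(String key) {
            try (Jedis jedis = new Jedis("redis-host", 6379)) {   // hypothetical host
                jedis.publish("cache-invalidation", key);
            }
        }

        // Subscriber side: each app instance runs this on a background thread;
        // on receipt it drops the stale entry and lazily reloads on the next access.
        public void listen() {
            new Thread(() -> {
                try (Jedis jedis = new Jedis("redis-host", 6379)) {
                    jedis.subscribe(new JedisPubSub() {
                        @Override
                        public void onMessage(String channel, String key) {
                            local.remove(key);
                        }
                    }, "cache-invalidation");
                }
            }).start();
        }
    }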
Hope this helps if no direct response is available regarding Hazelcast.
Also, you may refer to this link for some details regarding Hazelcast.

How does the Titan (not backend storage) clustering work?

Context
I'm using Titan v1.0.0, on AWS infrastructure and want to support failover/fault tolerance. AWS will take care of DynamoDb storage backend but it seems necessary to have several titan instances serviced by an (ELB) load balancer.
I'm using a nodeJs library to get to gremlin, and gremlin to access Titan.
Question
So, how does the Titan (not backend storage) clustering work? If at all.
To be clear, I'm not talking about any backend storage clustering, as I'm using dynamoDb on AWS. The documentation on transaction locking suggests to me that a titan cluster must exist, as other titan nodes wouldn't know about the locking without some sort of inter-communication. But I don't see any configuration options that support this.
If clustering is possible on titan, does anyone have any information on how to get this setup in a production High-Availability setup?
An illustration of the server-side architecture:
[NodeJsA]\             /[TitanA]\
          \           /          \
           [ELB (AWS)]            [DynamoDb (AWS)]
          /           \          /
[NodeJsB]/             \[TitanB]/
Further, if there is no clustering of the Titan nodes, then a change made via the TitanA node (above) could take the following amount of time to be seen on the TitanB node (worst case):
(AWS eventual consistency convergence time (~1 sec) + TitanB cache timeout + poll time from NodeJs nodes to titanDB)
Another consequence of the lack of clustering would be that sessions would have to be pinned in the ELB, else a read request after an update could be served by a different node with stale information.
Titan doesn't do any "clustering" outside of what is supported by the selected backend. You referred to "locking" as something that would indicate that clustering is supported, but if you read about the locking providers in that link you supplied you'll see that locking isn't really doing anything terribly fancy at a Titan level and that it is backend dependent. So Titan instances really don't have any external clustering capabilities or knowledge of each other. You therefore need to take that into account with respect to your architecture.

How to achieve high availability?

I'm about to build a new system and I want maximum availability! I'll have to use Windows!
I will have clients talking to my system using webservices. I'll also get data from surrounding systems. This data is delivered using messaging, MQ-series and MSMQ.
The system will produce some data that is sent back to the surrounding systems using queues.
After new data has come into the system, different processes will use this data to do different tasks, like printing, writing to databases, etc.
To achieve high availability I'm planning to have two instances of the system running in parallel on two different machines. The clients will try to use the first server that responds correctly.
I think an ideal solution would be for the incoming data from either of the two servers to be placed in a COMMON queue (on a third machine?). Data in the queue can be picked up by processes on both servers (think producer-consumer pattern).
I think that maybe NServiceBus will suit my needs. I have a few questions about the above.
Can a queue be shared between two servers? I don't want data to be stuck on a server if it goes down. In that case I want the other server to keep processing.
Can two(or more) "consumers"/processes on different machines pick data from a common queue?
Any advice is welcome!
The purpose of the NSB distributor is not to address availability issues but to address scale issues; distributors help scale out systems at a low cost.
From the description, your system consists of web-service endpoints, multiple databases and queuing infrastructure. If you want to achieve complete high availability you will have to make sure there are no single points of failure. In order to do that you will need:
A load balanced web farm for web service endpoints (2 or more servers)
Application cluster for queues and the applications that rely on those queues.
Highly available database server, again clustered.
On top of everything a good SAN.
But if you are referring to being available to consumers, you just have to make sure target queues and web-service endpoints are available, and that the overall architecture promotes deferred execution.
Two or more applications can read an MSMQ queue remotely, but that's something you don't want to do since it's based on DTC, and that's a real performance killer.
Some references
http://blogs.msdn.com/b/clustering/archive/2012/05/01/10299698.aspx
http://msdn.microsoft.com/en-us/library/ms190202.aspx
In short you will want to use the distributor... http://support.nservicebus.com/customer/portal/articles/859556-load-balancing-with-the-distributor
The key thing here is that the distributor node is a single point of failure so you want to run it on a cluster.

hazelcast vs ehcache

Question is clear as you see in the title, it would be appreciated to hear your ideas about adv./disadv. differences between them.
UPDATE:
I have decided to use Hazelcast because of the advantages like distributed caching/locking mechanism as well as the extremely easy configuration while adapting it to your application.
We tried both of them for one of the largest online classifieds and e-commerce platforms. We started with Ehcache/Terracotta (server array) because it's well known, backed by Terracotta and has bigger community support than Hazelcast. When we got it into a production environment (distributed, beyond a one-node cluster), things changed: our backend architecture became really expensive, so we decided to give Hazelcast a chance.
Hazelcast is dead simple, it does what it says and performs really well without any configuration overhead.
Our caching layer has been on top of Hazelcast for more than a year; we are quite pleased with it.
Even though Ehcache has been popular among Java systems, I find it less flexible than other caching solutions. I played around with Hazelcast and yes it did the job, it was easy to get running etc and it is newer than Ehcache. I can say that Ehcache has much more features than Hazelcast, is more mature, and has big support behind it.
There are several other good cache solutions as well, all with different properties: good old Memcached, Membase (now Couchbase), Redis, AppFabric, even several NoSQL solutions which provide key-value stores with or without persistence. They all have different characteristics in how they sit with respect to the CAP theorem or BASE, and in how they handle transactions.
You should care more about which one has the functionality you want in your application; again, consider the CAP theorem or BASE for your application.
This test was done very recently with Cassandra on the cloud by Netflix. They reached a million writes per second with about 300 instances. Cassandra is not a memory cache, but your data model is like a cache, consisting of key-value pairs. You can also use Cassandra as a distributed memory cache.
Hazelcast has been a nightmare to scale and stability is still a major issue.
The dedicated client-to-grid component choices are:
The messy version that can't survive node loss anywhere, negating the point of backups (superclient), or
An incredibly slow native-client option that does not allow for any type of load balancing across processing nodes in the grid.
If any host could request records from this data grid it would be a sweet design, but you are stuck with those two lackluster options to get anything out of it.
Also, there are multiple issues with database thread pools locking up on individual members and not writing anything to the databases, causing permanent record loss; this is a frequent issue, and we often have to take the whole thing down for hours to refresh the JVMs. Split-brain is also still an issue, although in 1.9.6 it seems to have calmed down a little.
We are rallying to move to Ehcache and improve the database layer instead of using this as a band-aid.
Hazelcast serializes everything it moves between nodes (in the standard setup), so the data you save to Hazelcast must be serializable.
http://open.bekk.no/efficient-java-serialization/
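A minimal sketch of what that implies in practice, assuming a Hazelcast 3.x-style API; the Customer class and the map name are invented for illustration:

    import java.io.Serializable;

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class SerializationExample {

        // Must implement Serializable (or Hazelcast's own DataSerializable/Portable),
        // because entries cross the wire between cluster members.
        static class Customer implements Serializable {
            private static final long serialVersionUID = 1L;
            final String id;
            final String name;
            Customer(String id, String name) { this.id = id; this.name = name; }
        }

        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, Customer> customers = hz.getMap("customers");
            customers.put("42", new Customer("42", "Alice"));   // serialized on put
            Customer back = customers.get("42");                // deserialized on get
            System.out.println(back.name);
            hz.shutdown();
        }
    }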
Hazelcast has been a nightmare for me. I was able to get it "working" in a clustered WebSphere environment. I use the term "working" loosely.
First, all of Hazelcast's documentation is out of date and only shows examples using deprecated method calls. Trying to use the new code with no comments in the Javadocs and no examples in the documentation is very hard. Also, the J2EE container code simply does not work at this point because it does not support XA transactions in WebSphere. An error is thrown calling code that follows their only J2EE example explicitly (it does look like milestone 3.0 is addressing this). I had to forget about joining Hazelcast to a J2EE transaction. It does seem Hazelcast is definitely geared to a non-EJB / non-J2EE-container environment.
Making calls to Hazelcast.getAllInstances() fails to retain any information about Hazelcast's state when switching from one enterprise java bean to another. That forces me to create a new Hazelcast instance just to run calls that give me access to my data, which causes many Hazelcast instances to start up on the same JVM.
Also, retrieving data from Hazelcast is not fast. I tried retrieving data both using the native client and directly as a member of the cluster. I stored 51 lists, each containing only 625 objects, in Hazelcast. I could not perform a query directly on a list and did not want to store a map just to get access to that feature (SQL operations can be performed on a map). It took about half a second to retrieve each list of 625 objects, because Hazelcast serializes the entire list and sends it over the wire rather than just giving me the delta (what has changed).
Another thing: I had to switch to a TCP/IP configuration and explicitly list the IP addresses of the servers I wanted in the cluster. The default multicast configuration did not work, and from the group discussions on Google, other people are experiencing that difficulty as well.
To sum up: I did eventually get 8 machines communicating in a cluster through many hours of torturous programmatic configuration and trial and error (the documentation will be little help), but when I did, I still had no control over the number of instances and partitions being created on each JVM due to the half-finished nature of Hazelcast for EJB/J2EE, and it was VERY SLOW. I implemented a real use case in the unemployment-insurance application I work on, and the code was much faster making direct calls to the database. It would have been cool if Hazelcast worked as advertised, because I really did not want to use a separate service to implement what I am trying to do. I have used MongoDB extensively, so I may skip the whole in-memory cache and just serialize my objects as documents in a separate repository.
One advantage of Ehcache is that it is backed by a company (Terracotta) that does extensive performance, failover, and platform testing in a large performance lab. Terracotta provides support, indemnity, etc. For many companies, that sort of thing is important.
I have not used Hazelcast but I've heard that it is easy to use and that it works. I haven't heard anything with respect to scalability or performance of Hazelcast vs Terracotta/Ehcache but given the amount of scalability and failover testing that Terracotta does, it's hard for me to imagine that Hazelcast would be competitive in a production deployment. But I presume it would work fine for smaller uses.
[Bias: I'm a former employee of Terracotta.]
Developers describe Ehcache as "Java's Most Widely-Used Cache". Ehcache is an open-source, standards-based cache for boosting performance, offloading your database, and simplifying scalability. It's the most widely-used Java-based cache because it's robust, proven, and full-featured. Ehcache scales from in-process, with one or more nodes, all the way to mixed in-process/out-of-process configurations with terabyte-sized caches. On the other hand, Hazelcast is described as a "clustering and highly scalable data distribution platform for Java". With its various distributed data structures, distributed caching capabilities, elastic nature, memcache support, integration with Spring and Hibernate and, more importantly, so many happy users, Hazelcast is a feature-rich, enterprise-ready and developer-friendly in-memory data grid solution.
Ehcache and Hazelcast are primarily classified as "Cache" and "In-Memory Databases" tools respectively.

Resources