I have 10 amazon ec2 node cluster used for every day data processing and i want to use all the 10 nodes for each day batch process (2 hours process only) and once the reporting data points got generated then i want to shutdown 5 nodes and make only 5 nodes active rest of the day for cost optimization.
I have a replication factor of 3.
In some scenarios all the 3 data blocks(actual & replication blocks) got stored in those 5 nodes which i am shutting down. Because of which i am not able to read the data properly.
Can i make some settings in cloudera manager to persist specific Database or Specific tables into given nodes, so that i will not have any issue in reading the data with only 5 nodes active.
Or any other suggestions will be appreciated.
You can use rack awareness (virtually) to separate your cluster into 2 "racks", and place your 5 nodes that you shut down regularly on a separate "rack". Replication policy will require that the NN place the replicas on separate racks, if configured. Again, I'm referring to racks in the virtual sense here. That should get you what you want.
Related
We have a 13 nodes nifi cluster with around 50k processors. The size of the flow.xml.gz is around 300MB. To bring up the 13 nodes Nifi cluster, it usually takes 8-10 hours. Recently we split the cluster into two parts, 5nodes cluster and 8 nodes cluster with the same 300MB flow.xml.gz in both. Since then we are not able to get the Nifi up in both the clusters. Also we are not seeing any valid logs related to this issue. Is it okay to have the same flow.xml.gz . What are the best practices we could be missing here when splitting the Nifi Cluster.
You ask a number of questions that all boil down to "How to improve performance of our NiFi cluster with a very large flow.xml.gz".
Without a lot more details on your cluster and the flows in it, I can't give a definite or guaranteed-to-work answer, but I can point out some of the steps.
Splitting the cluster is no good without splitting the flow.
Yes, you will reduce cluster communications overhead somewhat, but you probably have a number of input processors that are set to "Primary Node only". If you load the same flow.xml.gz on two clusters, both will have a primary node executing these, leading to contention issues.
More importantly, since every node still loads all of the flow.xml.gz (probably 4 Gb unzipped), you don't have any other performance benefits and verifying the 50k processors in the flow at startup still takes ages.
How to split the cluster
Splitting the cluster in the way you did probably left references to nodes that are now in the other cluster, for example in the local state directory. For NiFi clustering, that may cause problems electing a new cluster coordinator and primary node, because a quorum can't be reached.
It would be cleaner to disconnect, offload and delete those nodes first from the cluster GUI so that these references are deleted. Those nodes can then be configured as a fresh cluster with an empty flow. Even if you use the old flow again later, test it out with an empty flow to make it a lot quicker.
Since you already split the cluster, I would try to start one node of the 8 member cluster and see if you can access the cluster menu to delete the split-off nodes (disconnecting and offloading probably doesn't work anymore). Then for the other 7 members of the cluster, delete the flow.xml.gz and start them. They should copy over the flow from the running node. You should adjust the number of candidates expected in nifi.properties (nifi.cluster.flow.election.max.candidates) so that is not larger than the number of nodes to slightly speed up this process.
If successful, you then have the 300 MB flow running on the 8 member cluster and an empty flow on the new 5 member cluster.
Connect the new cluster to your development pipeline (NiFi registry, templates or otherwise). Then you can stop process groups on the 8 member cluster, import them on the new and after verifying that the flows are running on the new cluster, delete the process group from the old, slowly shrinking it.
If you have no pipeline or it's too much work to recreate all the controllers and parameter contexts, you could take a copy of the flow.xml.gz to one new node, start only that node and delete all the stuff you don't need. Only after that should you start the others (with their empty flow.xml.gz) again.
For more expert advice, you should also try the Apache NiFi Users email list. If you supply enough relevant details in your question, someone there may know what is going wrong with your cluster.
I read documentation, but unfortunately I still don't understand one thing. While creating AWS Elasticsearch domain, I need to choose "Number of nodes" in "Data nodes" section.
If i specify 3 data nodes and 3-AZ, what it actually means?
I have suggestions:
I'll get 3 nodes with their own storages (EBS). One of node is master and other 2 are replicas in different AZ. Just copy of master, not to lose data if master node become broken.
I'll get 3 nodes with their own storages (EBS). All of them will work independent and on their storadges are different data. So at the same time data can be processed by different nodes and store on different storages.
It looks like in other AZ's should be replicas. but then I don't understand why I have different values of free space on different nodes
Please, explain how it works.
Many thanks for any info or links.
I haven't used AWS Elasticsearch, but I've used the Cloud Elasticsearch service.
When you use 3 AZ (availability zones), means that your cluster will use 3 zones in order to make it resilient. If one zone has problems, then the nodes in that zone will have problems as well.
As the description section mentions, you need to specify multiples of 3 if you choose 3 AZ. If you have 3 nodes, then every AZ will have one zone. If one zone has problems, then that node is out, the two remaining will have to pick up from there.
Now in order to answer your question. What do you get with these configurations. You can check so yourself. Use this via kibana or any HTTP client
GET _nodes
Check for the sections:
nodes.roles
nodes.attributes
In the various documentations, blog posts etc you will see that for production usage, 3 nodes and 3 AZ is a good starting point in order to have a resilient production cluster.
So let's take it step by step:
You need an even number of master nodes in order to avoid the split brain problem.
You need more than one node in your cluster in order to make it resilient (if the node is unavailable).
By combining these two you have the minimum requirement of 3 nodes (no mention of zones yet).
But having one master and two data nodes, will not cut it. You need to have 3 master-eligible nodes. So if you have one node that is out, the other two can still form a quorum and vote a new master, so your cluster will be operational with two nodes. But in order for this to work, you need to set your primary shards and replica shards in a way that any two of your nodes can hold your entire data.
Examples (for simplicity we have only one index):
1 primary, 2 replicas. Every node holds one shard which is 100% of the data
3 primaries, 1 replica. Every node will hold one primary and one replica (33% primary, 33% replica). Two nodes combined (which is the minimum to form a quorum as well) will hold all your data (and some more)
You can have more combinations but you get the idea.
As you can see, the shard configuration needs to go along with your number and type of nodes (master-eligible, data only etc).
Now, if you add the availability zones, you take care of the problem of one zone being problematic. If your cluster was as a whole in one zone (3 nodes in one node), then if that zone was problematic then your whole cluster is out.
If you set up one master node and two data nodes (which are not master eligible), having 3 AZ (or 3 nodes even) doesn't do much for resiliency, since if the master goes out, your cluster cannot elect a new one and it will be out until a master node is up again. Now for the same setup if a data node goes out, then if you have your shards configured in a way that there is redundancy (meaning that the two nodes remaining have all the data if combined), then it will work fine.
Your answers should be covered in following three points.
If i specify 3 data nodes and 3-AZ, what it actually means?
This means that your data and replica's will be available in 3AZs with none of the replica in same AZ as the data node. Check this link. For example, When you say you want 2 data nodes in 2 AZ. DN1 will be saved in (let's say) AZ1 and it's replica will be stored in AZ2. DN2 will be in AZ2 and it's replica will be in AZ1.
It looks like in other AZ's should be replicas. but then I don't understand why I have different values of free space on different nodes
It is because when you give your AWS Elasticsearch some amount of storage, the cluster divides the specified storage space in all data nodes. If you specify 100G of storage on the cluster with 2 data nodes, it divides the storage space equally on all data nodes i.e. two data nodes with 50G of available storage space on each.
Sometime you will see more nodes than you specified on the cluster. It took me a while to understand this behaviour. The reason behind this is when you update these configs on AWS ES, it takes some time to stabilize the cluster. So if you see more data or master nodes as expected hold on for a while and wait for it to stabilize.
Thanks everyone for help. To understand how much space available/allocated, run next queries:
GET /_cat/allocation?v
GET /_cat/indices?v
GET /_cat/shards?v
So, if i create 3 nodes, than I create 3 different nodes with separated storages, they are not replicas. Some data is stored in one node, some data in another.
I created a cluster from 6 nodes.
3 nodes in Eu west1 and 3 nodes in EU west2
I set the locality for every group of nodes like : --locality=region=europe,datacenter=west1
I also set the replica to 6 to have all ranges and all data on every node.
What will happen if the connection between data centers is lost the whole cluster goes down ?
I tried to kill 3 nodes in one of the datacenters and cluster is not operational because the majority of the nodes are down and quorum is less that 4.
Is it possible to make the 2 datacentes to work with their local quorum 2/3
I also played a bit with replications settings and sometimes cluster is healthy if I kill 3 nodes from 6 and was I was able to write to the cluster. Sometimes I can only read from the cluster. Cluster is working with replica of 5 and 3 nodes killed from 6. Still paying with this but if someone can give me more information will be very helpful.
To be able to replicate across datacentes is very cool feature but if I lost the whole cluster when one of the datacenters is down ruin the whole good idea at least for me.
CockroachDB requires a majority of replicas to be fully operational, which means > half, not >= half. In order to survive the loss of a full datacenter or region, you must have three DCs/regions, not two. Try running two nodes in each of three regions instead of three nodes in two regions.
Is it possible to make the 2 datacenters to work with their local quorum 2/3
Not for a single table (because it would be impossible to guarantee consistency if each datacenter were able to act in isolation from the other). You've configured the data to be replicated across all six replicas, which means four replicas are required to make a quorum. If you want each datacenter to be able to operate independently of the other, you would need two separate tables, with each one configured to be located within one of the datacenters.
Thanks for the answer just to clear few thing. But looks like you got my point and what I want to accomplish.
But as far as I understand if I have 2x3 node in 2 different DC's if one DC goes down. I have 3 live nodes for the quorum I need at least 4 . N/2 +1.
So if I have 3x3 I can lost one DC because if I have 2 DC's live I will have a quorum .
And one last question if I don't set replication to 9 if I loose 3 nodes some in one DC some ranges will be not available right ?
I planning to launch three EC2 instance across Amazon hosting region. For say, Region-A,Region-B and Region-C.
Based on the above plan, Each region act as Cluster(Or Datacenter) and have one node.(Correct me if I am wrong).
Using this infrastructure, Can I attain below configuration?
Replication Factor : 2
Write and Read Level:QUORUM.
My basic intention to do these are to achieve "If two region are went down, I can be survive with remaining one region".
Please help me with your inputs.
Note: I am very new to cassandra, hence whatever your inputs you are given will be useful for me.
Thanks
If you have a replication factor of 2 and use CL of Quorum, you will not tolerate failure i.e. if a node goes down, and you only get 1 ack - thats not a majority of responses.
If you deploy across multiple regions, each region is, as you mention, a DC in your Cluster. Each individual DC is a complete replica of all your data i.e. it will hold all the data for your keyspace. If you read/write at a LOCAL_* consistency (eg. LOCAL_ONE, LOCAL_QUORUM) level within each region, then you can tolerate the loss of the other regions.
The number of replicas in each DC/Region and the consistency level you are using to read/write in that DC will determine how much failure you can tolerate. If you are using QUORUM - this is a cross-DC consistency level. It will require a majority of acks from ALL replicas in your cluster in all DCs. If you loose 2 regions then its unlikely that you will be getting a quorum of responses.
Also, its worth remembering that Cassandra can be made aware of the AZ's it is deployed on in the Region and can do its best to ensure replicas of your data are placed in multiple AZs. This will give you even better tolerance to failure.
If this was me and I didnt need to have a strong cross-DC consistency level (like QUORUM). I would have 4 nodes in each region, deployed across each AZ and then a replication factor of 3 in each region. I would then be reading/writing at LOCAL_QUORUM or LOCAL_ONE (preferably). If you go with LOCAL_ONE than you could have fewer replicas in each DC e.g a replication factor of 2 with LOCAL_ONE means you could tolerate the loss of 1 replica.
However, this would be more expensive than what your initially suggesting but (for me) that would be the minimum setup I would need if I wanted to be in multiple regions and tolerate the loss of 2. You could go with 3 nodes in each region if you wanted to really save costs.
We have cassandra cluster of 6 nodes on EC2,we have to double its capacity to 12 nodes.
So to add 6 more nodes i followed the following steps.
1 Calculated the tokens for 12 nodes and configured the new nodes accordingly.
2 With proper configuration started the new nodes so that they new nodes will bisect the
existing token ranges.
In the beginning all the new nodes were showing the streaming in
progress.
In ring status all the node were in "Joining" state
After 12 hours 2 nodes completed the streaming and came into the normal state.
But on the remaining 4 nodes after streaming some amount of data they are not showing any progress , look like they are stuck
We have installed Cassandra-0.8.2 and have around 500 GB of data on each existing nodes and storing data on EBS volume.
How can i resolve this issue and get the balanced cluster of 12 nodes?
Can i restart the nodes?
If i cleaned the data directory of stuck Cassandra nodes and restarted with fresh installation, will it cause any data loss?
There will not be any data loss if you replication factor 2 or greater.
Version 0.8.2 of Cassandra has several known issues - please upgrade to 0.8.8 on all original nodes as well as the new the nodes that came up and then start the procedure over for the nodes that did not complete.
Also, be aware that storing data on EBS volumes is a bad idea :
http://www.mail-archive.com/user#cassandra.apache.org/msg11022.html
While this won't answer your question directly, hopefully it points you in the right direction:
There is a fairly active #cassandra IRC channel on freenode.org.
So here is the answer why our some of the nodes were stuck.
1) We have upgraded from cassandra-0.7.2 to cassandra0.8.2
2) And we are loading the sstables with sstable-loader utility
3) But some data for some of the column families are directly inserted from hadoop job.
And the data of these column families are showing some other version as we have not upgraded the cassandra api in hadoop.
4) Because of this version mismatch cassandra throws 'version mismatch exception' and terminate the streaming
5) So the solution for this is to use "nodetool scrub keyspace columnfamily". I have used this and my issue is resolved
So the main thing here is if you are upgrading the cassandra cluster capacity u must do the nodetool scrub