Consul Data Center: Leader node not automatically selected after failure of previous leader node - microservices

I'm new to Consul and I have created a single data center with 2 server nodes.
I followed the steps provided in this documentation,
https://learn.hashicorp.com/tutorials/consul/deployment-guide?in=consul/datacenter-deploy
The nodes are successfully created, and they both stay in sync when I launch a service. Everything works fine up to this step.
However, I face an issue when the leader node fails (goes offline). In that case, the follower node DOES NOT automatically assume the role of leader, and Consul as a whole becomes inaccessible to the service. The follower node stops responding to requests even though it is still running.
Can anyone help me understand what exactly is wrong with my setup and how I can keep it working, with the follower node automatically becoming the leader and responding to queries from the API gateway?
The documentation below gives some pointers and talks about fulfilling a 'quorum' for automatic selection of a leader. I'm not sure whether it applies to my case:
https://learn.hashicorp.com/tutorials/consul/recovery-outage-primary?in=consul/datacenter-operations#outage-event-in-the-primary-datacenter
Edit:
consul.hcl
First Server:
datacenter = "dc1"
data_dir = "D:/Hashicorp/Consul/data"
encrypt = "<key>"
ca_file = "D:/Hashicorp/Consul/certs/consul-agent-ca.pem"
cert_file = "D:/Hashicorp/Consul/certs/dc1-server-consul-0.pem"
key_file = "D:/Hashicorp/Consul/certs/dc1-server-consul-0-key.pem"
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
retry_join = ["<ip1>", "<ip2>"]
Second Server:
datacenter = "dc1"
data_dir = "D:/Hashicorp/Consul/data"
encrypt = "<key>"
ca_file = "D:/Hashicorp/Consul/certs/consul-agent-ca.pem"
cert_file = "D:/Hashicorp/Consul/certs/dc1-server-consul-1.pem"
key_file = "D:/Hashicorp/Consul/certs/dc1-server-consul-1-key.pem"
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
retry_join = ["<ip1>", "<ip2>"]
server.hcl:
First Server:
server = true
bootstrap_expect = 2
client_addr = "<ip1>"
ui = true
Second Server:
server = true
bootstrap_expect = 2
client_addr = "<ip2>"
ui = true

The size of the cluster and the ability to form a quorum are absolutely applicable in this case. You need a minimum of 3 nodes in the cluster in order to tolerate the failure of one node without sacrificing the availability of the cluster.
I recommend reading Consul's Raft Protocol Overview as well as reviewing the deployment table at the bottom of the page to help understand the failure tolerance provided by using various cluster sizes.
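For illustration, here is a sketch of how the server stanzas might look once a third server is added; <ip3> is a placeholder for the new node's address, and bootstrap_expect should match the number of servers you expect in the datacenter:
server.hcl (same on all three servers):
server = true
bootstrap_expect = 3
ui = true
consul.hcl (retry_join now lists all three servers):
retry_join = ["<ip1>", "<ip2>", "<ip3>"]
With 3 servers the quorum size is 2, so the remaining two servers can still elect a leader if one fails.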

Related

Hazelcast persisting and loading data on all nodes

I have a 2-node distributed cache setup which needs persistence configured for both members.
I have MapStore and MapLoader implemented, and the same code is deployed on both nodes.
The MapStore and MapLoader work absolutely fine on a single-member setup, but after another member joins they continue to work only on the first member, and all inserts or updates made by the second member are persisted to disk via the first member.
My requirement is that each member should be able to persist to disk independently, so that the distributed cache is backed up on all members and not just the first member.
Is there a setting I can change to achieve this?
Here is my Hazelcast Spring configuration.
@Bean
public HazelcastInstance hazelcastInstance(H2MapStorage h2mapStore) throws IOException {
    MapStoreConfig mapStoreConfig = new MapStoreConfig();
    mapStoreConfig.setImplementation(h2mapStore);
    mapStoreConfig.setWriteDelaySeconds(0);
    YamlConfigBuilder configBuilder = null;
    if (new File(hazelcastConfiglocation).exists()) {
        configBuilder = new YamlConfigBuilder(hazelcastConfiglocation);
    } else {
        configBuilder = new YamlConfigBuilder();
    }
    Config config = configBuilder.build();
    config.setProperty("hazelcast.jmx", "true");
    MapConfig mapConfig = config.getMapConfig("requests");
    mapConfig.setMapStoreConfig(mapStoreConfig);
    return Hazelcast.newHazelcastInstance(config);
}
Here is my Hazelcast YAML config. This is placed at /opt/hazlecast.yml, which is picked up by my Spring config above.
hazelcast:
  group:
    name: tsystems
  management-center:
    enabled: false
    url: http://localhost:8080/hazelcast-mancenter
  network:
    port:
      auto-increment: true
      port-count: 100
      port: 5701
    outbound-ports:
      - 0
    join:
      multicast:
        enabled: false
        multicast-group: 224.2.2.3
        multicast-port: 54327
      tcp-ip:
        enabled: true
        member-list:
          - 192.168.1.13
The entire code is available here: https://bitbucket.org/samrat_roy/hazelcasttest/src/master/
This might just be bad luck and low data volumes, rather than an actual error.
On each node, try running the localKeySet() method and printing the results.
This will tell you which keys are on which node in the cluster. The node that owns key "X" will invoke the map store for that key, even if the update was initiated by another node.
If you have low data volumes, it may not be a 50/50 data split. At an extreme, 2 data records in a 2-node cluster could both end up on the same node.
If you have 1,000 data records, it's pretty unlikely that they'll all be on the same node.
So the other thing to try is to add more data and update all of it, to see if both nodes participate.
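For example, a minimal sketch of that check, using the "requests" map from the configuration in the question (hazelcastInstance is the bean created above):
IMap<String, Object> requests = hazelcastInstance.getMap("requests");
// localKeySet() returns only the keys whose owning partitions live on this member
System.out.println("Keys owned by this member: " + requests.localKeySet());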
OK, after struggling a lot I noticed a teeny tiny but critical detail:
Datastore needs to be a centralized system that is accessible from all Hazelcast members. Persistence to a local file system is not supported.
This is absolutely in line with what I was observing:
https://docs.hazelcast.org/docs/latest/manual/html-single/#loading-and-storing-persistent-data
However, not to be discouraged, I found out that I could use event listeners to do the same thing I needed to do.
@Component
public class HazelCastEntryListner
        implements EntryAddedListener<String, Object>, EntryUpdatedListener<String, Object>, EntryRemovedListener<String, Object>,
        EntryEvictedListener<String, Object>, EntryLoadedListener<String, Object>, MapEvictedListener, MapClearedListener {

    @Autowired
    @Lazy
    private RequestDao requestDao;
I created this class and hooked it into the config like so:
MapConfig mapConfig = config.getMapConfig("requests");
mapConfig.addEntryListenerConfig(new EntryListenerConfig(entryListner, false, true));
return Hazelcast.newHazelcastInstance(config);
This worked flawlessly; I am able to replicate data to the embedded databases on each node.
My use case was to cover HA failover edge cases. During HA failover, the slave node needed to know the working memory of the active node.
I am not using Hazelcast as a cache; rather, I am using it as a data-syncing mechanism.
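For completeness, a sketch of what one of those callbacks might look like; save() is a hypothetical RequestDao method standing in for whatever persistence call you use:
@Override
public void entryAdded(EntryEvent<String, Object> event) {
    // registered with local=false, so every member receives the event and can
    // persist the entry to its own embedded database
    requestDao.save(event.getKey(), event.getValue()); // save() is a hypothetical DAO method
}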

dht nodes in Libtorrent

How can I increase the number of DHT nodes?
The current number is about 240:
app['lt_session'].status().dht_nodes
while uTorrent says it has about 500.
Here are the settings:
async def lt_session(app):
    ses = app['lt_session'] = lt.session({
        'active_downloads': 50,
    })
    ses.listen_on(6881, 6881)
    # ses.set_max_connections(3)
    ses.add_extension('ut_metadata')
    ses.add_extension('smart_ban')
    ses.add_extension('ut_pex')
    ses.add_extension('metadata_transfer')
    ses.add_dht_router("router.utorrent.com", 6881)
    ses.add_dht_router("router.bittorrent.com", 6881)
    ses.add_dht_router("dht.transmissionbt.com", 6881)
    ses.add_dht_router("dht.aelitis.com", 6881)
    ses.add_dht_router("router.bitcomet.com", 6881)
    ses.start_dht()
    ses.start_lsd()
    app['torrents'] = {}
If you have an "extended routing table" and a depth of 19 (which I believe is about how large the network is), you end up having 128+64+32+16+(15*8) = 360 nodes in the regular routing table buckets (once it fills up). In addition to this, each level has 8 replacement buckets, so add (19*8).
libtorrent's node_count only counts the nodes in the main routing table buckets, the ones that are in use. If you want to count the replacement nodes too, add dht_node_cache.
If you want to make sure your routing table fits as many nodes as possible, and is the least restrictive about which nodes are allowed to be added:
Make sure you set dht_extended_routing_table to true.
If you want to remove restrictions on which nodes are let into the routing table, you can set (see the sketch below):
dht_restrict_routing_ips to false
dht_restrict_search_ips to false
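A sketch of applying those settings, assuming libtorrent 2.0+ where the DHT options are part of the regular session settings; on 1.x the same fields (without the dht_ prefix) live on lt.dht_settings() and are applied with ses.set_dht_settings():
# assumes libtorrent 2.0+, where DHT options are regular session settings
ses.apply_settings({
    'dht_extended_routing_table': True,   # larger buckets near the top of the routing table
    'dht_restrict_routing_ips': False,    # don't filter nodes by IP range when adding them to the table
    'dht_restrict_search_ips': False,     # same, for nodes encountered during lookups
})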
Also, if you have multiple external IPs, you may want to make sure you enable listening on all of them, as libtorrent will run a DHT node for each, for example if you have both IPv4 and IPv6.

Ganglia seeing nodes but not metrics

I have a Hadoop cluster with 7 nodes, 1 master and 6 core nodes. Ganglia is set up on each machine, and the web front end correctly shows 7 hosts.
But it only shows metrics from the master node (which runs both gmetad and gmond). The other nodes have the same gmond.conf file as the master node, and the web front end clearly sees the nodes. I don't understand how Ganglia can recognize 7 hosts but only show metrics from the box with gmetad.
Any help would be appreciated. Is there a quick way to see whether those nodes are even sending data? Or is this a networking issue?
Update #1: when I telnet into a gmond host machine that is not the master node and look at port 8649, I see XML but no data. When I telnet to 8649 on the master machine, I see XML and data. Any suggestions on where to go from here?
Set this in the gmond.conf file of every node you want to monitor:
send_metadata_interval = 15   # or some similar value
Now all the nodes and their metrics are shown in the master (gmetad).
This extra configuration is necessary if you are running in unicast mode, i.e., if you are specifying a host in udp_send_channel rather than mcast_join. In multicast mode, the gmond daemons can query each other at any time, and proactively sending monitoring data is not required.
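For reference, send_metadata_interval belongs in the globals section of gmond.conf; a minimal sketch:
globals {
  ...
  send_metadata_interval = 15   # resend metric metadata periodically; needed in unicast setups
}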
In the gmond configuration, ensure all of the following is provided:
cluster {
  name = "my cluster"   ## Cluster name; is this the same name as given in the gmetad conf?
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  #mcast_join = 239.2.11.71   ## Comment this out
  host = 192.168.1.10         ## IP address/hostname of the gmetad node
  port = 8649
  ttl = 1
}
/* comment out this block itself
udp_recv_channel {
  ...
}
*/
tcp_accept_channel {
  port = 8649
}
Save and quit, then restart your gmond daemon. Then run "netcat localhost 8649". Are you able to see XML with metrics now?

Multiple node cassandra cluster is really slow

I had a single-node Cassandra cluster on EC2. I was running my tests on it and it worked great.
But then I had to move this cluster to a VPC, so rather than moving the data, I created a new cluster with two nodes (both seeds) and imported the data from the former cluster using sstableloader.
I thought it was really slow, so I decided to add two more instances (not seeds). It's even slower.
I use consistency level ONE, and my replication factor is 1, so I don't quite see why it is so slow.
To give you an idea, I can only do about 3 reads per second.
We use the EC2Snitch, though not the AMI recommended by Cassandra (we didn't see that part in the documentation when we installed it).
I haven't yet run a cleanup on the first two nodes after adding the two new ones.
When I request all elements of a column family which contains only a dozen rows, it times out. If I request one element, I get the result after a long time, along with a huge tracing session (~30000 lines...)!
Does anyone know what I can do to make it faster? I don't quite know where to look right now.
My Cassandra version is 2.1.3.
Here is my keyspace schema:
CREATE KEYSPACE keyspace_name WITH replication = {'class': 'NetworkTopologyStrategy', 'us-west-2': '1'} AND durable_writes = true;
And here are the options for our column family:
CREATE TABLE keyspace_name."CFName" (
// ...
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
I had to run a compaction on my nodes because I had too many tombstones.
Many thanks to the amazing IRC channel on freenode #cassandra.
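For reference, a sketch of the relevant commands (run on each node; exact table-name quoting may vary with your shell and Cassandra version):
nodetool cleanup keyspace_name          # drop data a node no longer owns after new nodes joined
nodetool compact keyspace_name CFName   # major compaction, which can purge tombstones older than gc_grace_seconds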

Connecting to elasticsearch cluster in NEST

Let's assume I have several Elasticsearch machines in a cluster: 192.168.1.1, 192.168.1.2 and 192.168.1.3.
Any of the machines can go down. It doesn't look like NEST supports providing a list of IPs to try to connect to.
So how do I make sure I connect to any of the available machines from NEST? Just try to open a connection to one and, if TryConnect didn't work, try another?
You can run a local ES instance on your application server (e.g. your web server) and configure it to work as a load balancer:
Set node.client: true (or node.master: false and node.data: false) in this local ES config to make it a load balancer; this means ES will not become master nor hold data (see the sketch after these steps).
Configure it to join the cluster (your 3 nodes don't need to know about this ES).
Configure NEST to use the local ES as your search server.
Then this ES becomes a part of your cluster and will distribute your requests to suitable nodes.
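A sketch of what that local node's elasticsearch.yml might contain, assuming the 1.x/2.x settings that were current when this question was asked; "my-cluster" is a placeholder for your actual cluster name:
node.master: false
node.data: false
cluster.name: my-cluster
discovery.zen.ping.unicast.hosts: ["192.168.1.1", "192.168.1.2", "192.168.1.3"]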
If you don't want a "load balancer", then you have to check manually on the client side to determine which node is alive.
Since you have a small set of nodes, you can use a StaticConnectionPool:
var uri1 = new Uri("http://192.168.1.1:9200"); // Uri needs a scheme; 9200 is the default ES HTTP port
var uri2 = new Uri("http://192.168.1.2:9200");
var uri3 = new Uri("http://192.168.1.3:9200");
var uris = new List<Uri> { uri1, uri2, uri3 };
var connectionPool = new StaticConnectionPool(uris);
var connectionSettings = new ConnectionSettings(connectionPool); // <-- needs to be reused
var client = new ElasticClient(connectionSettings);
An important point to keep in mind is to reuse the same ConnectionSettings instance when creating a new elastic client, since Elasticsearch caching is per ConnectionSettings. See this GitHub post:
...In any case its important to share the same ConnectionSettings
instance across any elastic client you instantiate. ElasticClient can
be a singleton or not as long as each instance shares the same
ConnectionSettings instance.
All of our caches are per ConnectionSettings, this includes
serialization caches.
Also a single ConnectionSettings holds a single IConnectionPool and
IConnection something you definitely want to reuse across requests.
I would set up one of the nodes as a load balancer, meaning that the URL you are calling should always be up.
Though if you increase the number of replicas, you can call any of the nodes by URL and still access the same data. Elasticsearch does not care which one you access while in a cluster, so you could build your own list of IPs in your application.
