I'm using Docker with Hazelcast 3.6.
I launch Hazelcast instances and want to start my application only after the cluster is ready to serve requests.
I've read in the documentation that it's possible to access Hazelcast via curl, but it doesn't work for me.
I'm getting (52) Empty reply from server when trying to POST to a Hazelcast instance that is started and ready.
Is there a way to check whether Hazelcast is ready?
E.g., for Cassandra I run
wget --spider 0.0.0.0:9042
and for RabbitMQ this works great: netcat -z -w 2 rabbit 5672. Is there a similar solution for Hazelcast?
You can use the REST API for cluster management, which is described in the documentation.
Another hint: you can use the cluster.sh script in the bin directory to interact with the management endpoints.
You're trying to hit the REST client API (for accessing maps and queues), which is disabled by default.
Let me know if you have any questions.
Thank you
(Posted on behalf of the OP).
Caution: Don't forget to hide the group name and password when using a public CI and/or code repository if you don't want everyone to see this data.
First of all, I had to add a cluster group to the configuration; see Creating Cluster Groups in the documentation.
Then use curl --data "${GROUPNAME}&${PASSWORD}" http://${ADDRESS}:${PORT}/hazelcast/rest/management/cluster/state to wait until the cluster is ready.
E.g., in my case, when Hazelcast listens on 0.0.0.0:5701 with the group name app1 and the password app1-pass, it looks as follows:
curl --data "app1&app1-pass" \
http://0.0.0.0:5701/hazelcast/rest/management/cluster/state
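If you need to block application start-up until the cluster is up, a small polling loop around this call works; this is just a sketch, reusing the group name, password, and address from above and matching on the JSON response shown in the note below:

until curl -s --data "app1&app1-pass" \
    http://0.0.0.0:5701/hazelcast/rest/management/cluster/state \
    | grep -q '"state":"active"'; do
  echo "Waiting for Hazelcast..."  # endpoint not reachable yet, or state not yet active
  sleep 2
done
echo "Hazelcast cluster is active"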
Note
As I understand it, this will not show whether all nodes are ready, so I need to check each node separately.
When I set hazelcast.initial.min.cluster.size to 2, the main node shows the message HazelcastInstance waiting for cluster size of 2, yet curl still returns {"status":"success","state":"active"}.
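For reference, hazelcast.initial.min.cluster.size is a Hazelcast property that can be passed to the member JVM as a system property; a sketch, assuming you control the member's start command (the class path and jar name may differ in your setup):

java -Dhazelcast.initial.min.cluster.size=2 -cp hazelcast-3.6.jar com.hazelcast.core.server.StartServer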
Related
I have an issue where I have multiple host dashboards for the same Elasticsearch server. Each dashboard has its own name and way of collecting data. One is connected to the installed datadog-agent and the other is somehow connected to the Elasticsearch service directly.
The weird thing is that I cannot seem to find a way to turn off the agent connected directly to the ES service, other than turning off the Elasticsearch service completely.
I have tried deleting the datadog-agent completely. This stops the dashboard connected to it from receiving data (of course), but the other dashboard keeps receiving data somehow. I cannot find what is sending this data and therefore am not able to stop it. We have multiple master and data nodes and this is an issue for all of them. The ES version is 7.17.
Another of our clusters is running ES 6.8; we have not finalized the monitoring configuration for this cluster, but for now it does not have this issue.
Just as extra information:
The dashboard connected to the agent has the same name as the host server, while the other one only has the internal IP as its host name.
Does anyone have any idea what it is that is running and how to stop it? I have tried almost everything I could think of.
I finally found the reason: all datadog-agents on the master and data nodes were configured not to use the node name as the host name, and cluster stats were turned on for the Elasticsearch integration of Datadog. This resulted in the behavior that, as long as even one of the datadog-agents in the cluster was running, data kept coming in to the dashboard that was not named correctly. Leaving the answer here in case anyone hits the same situation in the future.
I have a distributed system running on AWS EC2 instances. My cluster has around 2000 nodes. I want to introduce a stream processing model that can process metadata periodically published by each node (CPU usage, memory usage, IO, etc.). My system only cares about the latest data, and it is OK to miss a couple of data points when the processing model is down. Thus, I picked Hazelcast Jet, an in-memory processing engine with great performance. Here I have a couple of questions regarding it:
What is the best way to deploy Hazelcast Jet to multiple EC2 instances?
How to ingest data from thousands of sources? The sources push data instead of being pulled.
How to configure the client so that it knows where to submit the tasks?
It would be super useful if there is a comprehensive example where I can learn from.
What is the best way to deploy Hazelcast Jet to multiple EC2 instances?
Download and unzip the Hazelcast Jet distribution on each machine:
$ wget https://download.hazelcast.com/jet/hazelcast-jet-3.1.zip
$ unzip hazelcast-jet-3.1.zip
$ cd hazelcast-jet-3.1
Go to the lib directory of the unzipped distribution and download the hazelcast-aws module:
$ cd lib
$ wget https://repo1.maven.org/maven2/com/hazelcast/hazelcast-aws/2.4/hazelcast-aws-2.4.jar
Edit bin/common.sh to add the module to the classpath. Towards the end of the file is a line
CLASSPATH="$JET_HOME/lib/hazelcast-jet-3.1.jar:$CLASSPATH"
You can duplicate this line and replace -jet-3.1 with -aws-2.4.
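After duplicating and editing, the added line should read:

CLASSPATH="$JET_HOME/lib/hazelcast-aws-2.4.jar:$CLASSPATH"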
Edit config/hazelcast.xml to enable the AWS cluster discovery. The details are here. In this step you'll have to deal with IAM roles, EC2 security groups, regions, etc. There's also a best practices guide for AWS deployment.
Start the cluster with jet-start.sh.
How to configure the client so that it knows where to submit the tasks?
A straightforward approach is to specify the public IPs of the machines where Jet is running, for example:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getGroupConfig().setName("jet");  // "jet" is the default group name of a Jet cluster
clientConfig.getNetworkConfig().addAddress("54.224.63.209", "34.239.139.244");
However, depending on your AWS setup, these may not be stable, so you can configure the client to discover the members as well. This is explained here.
How to ingest data from thousands of sources? The sources push data instead of being pulled.
I think your best option for this is to put the data into a Hazelcast Map, and use a mapJournal source to get the update events from it.
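A minimal sketch of that approach with the Jet 3.1 pipeline API, assuming the nodes push their metrics into an IMap named node-metrics (a placeholder name) and that the event journal is enabled for that map in the member configuration:

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.JournalInitialPosition;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class NodeMetricsJob {
    public static void main(String[] args) {
        // Connects as a client, using the addresses or AWS discovery configured as above.
        JetInstance jet = Jet.newJetClient();

        Pipeline p = Pipeline.create();
        // Stream every update event of the "node-metrics" map, starting from now,
        // since only the latest data matters.
        p.drawFrom(Sources.<String, String>mapJournal("node-metrics",
                JournalInitialPosition.START_FROM_CURRENT))
         .withoutTimestamps()
         // key = node id, value = the latest metrics payload pushed by that node
         .map(e -> e.getKey() + " -> " + e.getValue())
         .drainTo(Sinks.logger());

        jet.newJob(p).join();
    }
}

The nodes themselves can push with a plain client by calling put on the same map; every put shows up as an event in the journal, and a newer put for the same node simply supersedes the older one.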
I'm brand new to NiFi and simply playing around with processors.
I'm trying to incorporate Wait and Notify processors in my testing, but I have to set up a Distributed Map Cache (server and client?).
The NiFi documentation assumes a level of understanding that I do not have.
I've installed memcached on my computer (macOS) and verified that it's running on port 11211 (the default). I've created a DistributedMapCacheClientService and a DistributedMapCacheServer under NiFi's CONTROLLER SERVICES, but I'm getting java.net.SocketTimeoutException and other errors.
Is there a good tutorial on this entire topic? Can someone suggest how to move forward?
The DistributedMapCacheClientService and DistributedMapCacheServer do not require any additional software (so memcached is not needed here).
To create these services, right-click on the canvas, select Configure and then select the Controller Services tab. You can then add new services by clicking the + button on the right and searching by name.
Create a DistributedMapCacheServer with the default parameters (port 4557) and enable it. This will start the built-in cache server.
Create a DistributedMapCacheClientService with the hostname localhost and the other default parameters, and enable it.
Create a simple flow: add a GenerateFlowFile processor and set its run schedule and a non-zero file size in its properties.
Connect it to PutDistributedMapCache, set the Entry Identifier to Key01, and choose your DistributedMapCacheClientService.
Try to run it; if port 4557 is not used by other software, the put to the cache should work.
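A quick way to check the port from a terminal is netcat, just like the RabbitMQ probe in the first question; before the server service is enabled nothing should be listening, and afterwards the check should succeed:

nc -z -w 2 localhost 4557 && echo "something is listening on 4557" || echo "port 4557 is free"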
@Darshan
Yes, it will work, because the documentation of DistributedMapCacheClientService says that it:
Provides the ability to communicate with a DistributedMapCacheServer. This can be used in order to share a Map between nodes in a NiFi cluster
Currently I'm using Redis on an EC2 machine with 60 GB of RAM and no slaves, but as my data grows I will need more memory.
I was thinking of migrating to 2 x 60 GB machines and splitting the existing data between the two.
Is there any tool for splitting the RDB file? I haven't found anything specifically designed for this.
If you want to split your data, you will need a way to shard your keys so that some keys are written to/read from server A and the others from server B.
There is no way to split an RDB file, but there is something you can do to achieve what you want.
First, what you can do is start a Redis instance on your second server and make it a slave of your current server, but set the parameter slave-read-only to false. This will cause the slave to synchronize and read all of your Redis data from the master. So far you only have a slave with all the data, but now we will do the interesting bit.
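A sketch of that first step with redis-cli, using placeholder hostnames server-a (current server) and server-b (new server):

# on server B: replicate everything from A, but keep accepting writes
redis-cli -h server-b slaveof server-a 6379
redis-cli -h server-b config set slave-read-only no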
Then you need to decide on a sharding strategy. Some Redis clients do this for you; for example, the official Ruby client knows how to handle it if you configure it. You will need to configure your client so that keys are sharded to A and B (or, alternatively, use twemproxy so that the clients won't know about the different servers and twemproxy takes care of it).
Once you have the clients configured, you need to deploy the new clients to production and immediately reconfigure the slave so that it is not a slave anymore. You can do this directly with commands on the slave server (don't forget to persist the config using CONFIG REWRITE), or you can change the slave's config file and restart, whichever is more convenient for you. Since the slave is configured with slave-read-only false, it will accept writes even in slave mode. This means that if you change the configuration directly from redis-cli, you can go from a slave to a sharded stand-alone Redis without restarting, which I think is quite cool.
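The live cut-over can then look like this (same placeholder hostname as above):

# on server B: stop replicating and become a stand-alone shard
redis-cli -h server-b slaveof no one
# persist the running configuration so a restart keeps it
redis-cli -h server-b config rewrite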
Be aware once you shard, you will have to be careful with MULTI commands or when using LUA scripts. If you are using twemproxy you won't be able to use those commands, but if you are sharding on the client side, you will still be able to use MULTI or LUA. Just be careful to use a sharding mechanism in which all the related keys will stay on the same server.
Step 1: install https://github.com/leonchen83/redis-rdb-cli/
Step 2: create a config file to set the splitting condition.
Content of nodes.conf:
34b6e1dfb871ad30398ef5edd6b9a954617e6ec1 127.0.0.1:10003#20003 master - 0 1531044047088 3 connected 8193-16383
89d020a7e727e81f003836207902ae26fe05fd51 127.0.0.1:10001#20001 myself,master - 0 1531044047000 1 connected 0-8192
vars currentEpoch 6 lastVoteEpoch 0
Step 3: run rdt -s your-dump.rdb -c nodes.conf -o /path/to
After step 3, this generates two RDB files in the /path/to directory: 34b6e1dfb871ad30398ef5edd6b9a954617e6ec1.rdb and 89d020a7e727e81f003836207902ae26fe05fd51.rdb.
I usually use Munin as my monitoring software, but it (like other monitoring software, I presume) needs an IP address to ping or otherwise contact the hosts it collects data from.
In Amazon EC2, instances are created on the fly, with IPs you don't know in advance.
How can they be monitored?
I was thinking about using the Amazon console commands to read the IPs of the running instances and change the Munin configuration file on the fly, but that may be too complicated... or not?
Any other solution or suggestion?
Thank you
I use RevealCloud to monitor my Amazon instances. You can install it once and create an AMI from that system, or bootstrap the install command if that's your method. Since the install is just one command, it's easy enough to put into rc.local (or similar). You can then see all the instances in the dashboard or top view as soon as they boot up.
Our instances are bootstrapped using Chef recipes, so it's easier for me to provide IPs/hosts, as they (= all members of my cluster) get entered into /etc/hosts on start-up. Generally, it doesn't hurt to use an Elastic IP for the master server and allow all connections (in /etc/munin/munin.conf by default).
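For reference, a node entry on the Munin master's /etc/munin/munin.conf looks like this (hypothetical host name and address):

[node1.example.com]
    address 10.0.1.23
    use_node_name yes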
I'd solve the security question at the security-group level, e.g. allow only instances in a certain security group to connect to the munin-node process (on port 4949). The remaining question is how to grant that access; using ec2-authorize you can achieve this:
ec2-authorize mygroup -o monitorgroup -u <AWS-USER-ID>
This means that all instances in the group monitorgroup can access resources on instances in mygroup.
Let me know if this helps!
If your Munin master and nodes are all hosted on EC2, then it's better to use internal host names like domU-00-00-00-00-00-00.compute-1.internal, because this way you don't have to deal with IP addresses and security groups.
You also have to set this in /etc/munin/munin-node.conf:
allow ^.*$
You can read more about it in Monitoring AWS Ubuntu Instances using Munin.
But if your Munin master is not on EC2, your best bet is to attach an Elastic IP to your EC2 instance.