Starting a node in emqtt and creating cluster - emq

I am new to emqtt and Erlang. Using the documentation provided on emqtt.io, I configured emqtt on my machine and wanted to create a cluster.
I followed the steps below to create the nodes:
erl -name node1@127.0.0.1
erl -name node2@127.0.0.1
To connect these nodes I used the command below.
(node1@127.0.0.1)1> net_kernel:connect_node('node2@127.0.0.1')
I am not getting any response (true or false) after executing this command.
I also tried the following command
./bin/emqttd_ctl cluster emqttd@192.168.0.10
but got a failure message
Failed to join the cluster: {node_down,'node1@127.0.0.1'}
When I hit the URL localhost:8080/status I get the following message
Node emq@127.0.0.1 is started
emqttd is running
But I couldn't get any details about the cluster.
Am I following the right steps? I need help creating a cluster in emqtt.
Thanks in advance!!

Each node created on a machine starts a separate process, and creating many nodes eventually uses up most of the memory, which leads to a situation where you cannot join any node to a cluster. So before joining, stop the nodes that are not in use with the ./emqttd stop command.

You need two emqx nodes running on different machines, as the ports may conflict with each other on the same machine.
Also, the node names MUST NOT use the loopback IP address 127.0.0.1, as in node1@127.0.0.1.
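As a rough illustration of the loopback rule above, here is a small shell check you could run on a node name before trying to cluster. It assumes node names have the form name@host; node_host_is_loopback is a hypothetical helper written for this answer, not something shipped with emqttd.

```shell
# Sketch only: node_host_is_loopback is a hypothetical helper, not part of emqttd.
# It assumes node names look like name@host.
# Returns 0 (true) when the host part is a loopback address, 1 otherwise.
node_host_is_loopback() {
    host="${1#*@}"   # drop everything up to and including the first '@'
    case "$host" in
        127.*|localhost|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (kept as a comment so sourcing this file has no side effects):
#   node_host_is_loopback 'node1@127.0.0.1' && echo "use a routable IP or host name instead"
```

If the check succeeds, give the node a name based on an address the other machine can actually reach, such as the 192.168.0.10 address used in the question.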

Related

Creating rethinkdb cluster

I'm writing an automation script that is supposed to create 4 instances in AWS and deploy a rethinkdb cluster on them without any human interaction. According to the documentation I need to either use the --join parameter on the command line or put join statements in the configuration file. However, what I don't understand is whether I need to specify join only once in order to create the cluster, or every time I restart any of the cluster nodes.
My current understanding is that I only need to issue it once, the cluster configuration is somehow stored in metadata, and next time I can just start rethinkdb without the --join parameter and it will reconnect to the rest of the cluster on its own. But when would I need the join option in the configuration file then?
If this is true, do I need to start rethinkdb with the --join option in my script, then shut it down, and then start it again without --join? Is this the right way to do it, or are there better alternatives?
You're right that on subsequent restarts you don't need to specify --join on the command line; the node will discover the cluster and attempt to reconnect. Part of the cluster state is stored in the system table server_config.
Even if you wipe the data directory on this node, it may still be able to rejoin the cluster, because other nodes may have information about that node and will attempt to connect to it. But if no other node stores information about this particular server, or if this node is restarted with a new IP address for some reason and its data directory is wiped as well, the cluster no longer knows about it (at its new IP address).
So I always specify --join. It doesn't hurt, and in the worst case it lets the new node join the cluster again.
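For an automation script, "always specify --join" could look roughly like the sketch below. build_rethinkdb_cmd is a hypothetical helper and the peer addresses are placeholders; I am assuming the default cluster port 29015 and that you want --bind all so the other instances can reach the node. The function only prints the command so you can inspect it before running it.

```shell
# Sketch only: build_rethinkdb_cmd is a hypothetical helper for an automation script.
# It prints a rethinkdb start command with one --join per known peer.
# Peers are given as host:port (29015 is assumed as the cluster port).
build_rethinkdb_cmd() {
    cmd="rethinkdb --bind all"
    for peer in "$@"; do
        cmd="$cmd --join $peer"
    done
    printf '%s\n' "$cmd"
}

# Example use (placeholder addresses, kept as a comment):
#   build_rethinkdb_cmd 10.0.0.11:29015 10.0.0.12:29015
```

Using the same command on every start, first boot or restart, means the script never has to track whether a node has already joined.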

(Window) MPJExpress - runtime.MPJRuntimeException: Cannot connect to the daemon at machine <x> and port <10000>

I'm new to EC2. I have successfully created a Windows Server 2012 HPC cluster using Amazon Web Services and hope to run parallel programs on it.
I have successfully run MPJ Express in the multi-core configuration. However, I am having problems with the cluster configuration with niodev. My head node is not able to connect to the compute node.
I have followed the instructions given at http://mpj-express.org/docs/guides/windowsguide.pdf. I have set up all the environment variables.
Screenshot of my error
The IP address I put in the machines file is the private IP of the compute node.
My machines file is in the directory c:\mpj-user.
My compute node has started the MPJ daemon with the same MPJ Express configuration.
I am able to ping the compute node from the head node.
Most of the solutions I found on the Internet are for Ubuntu; I can't really find a solution for Windows.
Any help or solution is much appreciated.
You need to start the daemons on each compute node separately with the same MPJ Express configuration and then try to run the hello world program.
If that does not work, use the -src switch if no network file sharing is enabled. You can find information about this switch in the Windows guide.
Let me know if you still have the problem.

Cloudant : Error with running weatherreport to check cluster health

We have a three-node cluster set up and are facing an issue running the weatherreport command.
From the error, it is clear that the machine where the weatherreport utility runs is not able to connect to the other two machines. I have checked all machines and they are accessible by FQDN. But from the message it looks like it uses the short name when connecting to a peer machine. So how can I check where it takes the peer machine names from? Then I could try changing them to the full machine names, which might work for me. If there is any other solution, please let us know.
The error is:
['cloudant_diag17506@machine2031.domain.com'] [crit] Could not run check weatherreport_check_safe_to_rebuild on cluster node 'cloudant@machine2031'
['cloudant_diag17506@machine2031.domain.com'] [crit] Could not run check weatherreport_check_safe_to_rebuild on cluster node 'cloudant@machine2032'
['cloudant_diag17506@machine2031.domain.com'] [crit] Could not run check weatherreport_check_safe_to_rebuild on cluster node 'cloudant@machine2033'
['cloudant@machine2032.domain.com'] [crit] Rebuilding this node will leave the following shard with NO live copies: default/t_alpha e0000000-ffffffff, default/t_alpha a0000000-bfffffff, default/t_alpha 60000000-7fffffff, default/t_alpha 20000000-3fffffff, default/metrics_app e0000000-ffffffff, default/metrics_app a0000000-bfffffff, default/metrics_app 60000000-7fffffff, default/metrics_app 20000000-3fffffff
I found a solution for this problem.
The problem was that when the DB was first created, the short name was used, so the database probably refers to peer hosts by their short names when connecting.
Now that the Cloudant Local installation is in a problematic state, the way to make it consistent is to remove all the files under /srv/cloudant/ on all database nodes. This removes all default Cloudant databases. Then run the configure.sh script again on each node as before, now that "hostname -f" correctly outputs the fully qualified host name, and create your databases again.
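Before re-running configure.sh, it may be worth confirming on every node that "hostname -f" really returns a fully qualified name. A minimal sketch of such a check follows; is_fqdn is a hypothetical helper (not part of Cloudant Local), and it uses the simple assumption that a fully qualified name contains at least one dot and does not start with one.

```shell
# Sketch only: is_fqdn is a hypothetical helper, not part of Cloudant Local.
# Assumption: a name counts as fully qualified if it contains a dot
# and does not begin with one. Returns 0 for such names, 1 otherwise.
is_fqdn() {
    case "$1" in
        ""|.*) return 1 ;;
        *.*)   return 0 ;;
        *)     return 1 ;;
    esac
}

# Example use on each database node (kept as a comment):
#   is_fqdn "$(hostname -f)" || echo "hostname -f still returns a short name"
```

With the host names from the error output, machine2031.domain.com would pass and machine2031 would not.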

Adding node to existing cluster in Kubernetes

I have a Kubernetes cluster running on 2 machines (a master-minion node and a minion node). I want to add a new minion node without disrupting the current setup. Is there a way to do it?
I have seen that when I try to add the new node, the services on the other nodes stop it, so I have to stop the services before deploying the new node to the existing cluster.
To do this in the latest version (tested on 1.10.0), you can issue the following command on the master node:
kubeadm token create --print-join-command
It will then print a new join command (like the one you got after kubeadm init):
kubeadm join 192.168.1.101:6443 --token tokentoken.lalalalaqyd3kavez --discovery-token-ca-cert-hash sha256:complexshaoverhere
You need to run kubelet and kube-proxy on the new minion, passing the API server address as a parameter.
Example:
kubelet --api_servers=http://<API_SERVER_IP>:8080 --v=2 --enable_server --allow-privileged
kube-proxy --master=http://<API_SERVER_IP>:8080 --v=2
After this you should see the new node in
kubectl get no
In my case the issue was due to an existing wrong Route53 "A" record.
Once it was updated to point to the internal IPs of the API servers, kube-proxy was able to reach the masters and the node appeared in the list (kubectl get nodes).

unable to create node with FQDN for chef client configuration

I created a node with the command knife node create node1 on my laptop (which is configured as a chef-client). I am able to create the node, but it has no FQDN entry, which prevents knife operations such as executing recipes from continuing. I tried to edit it, but no luck. Does anyone know how to solve this?
After you create a node, chef-client must run successfully on that node at least once for automatic attributes (such as fqdn) to reach the chef-server.
