I'm trying to follow the MonetDB docs on Cluster Management
to set up a three-node cluster on three CentOS machines. I created the three dbfarms using monetdbd create /path/to/mydbfarm. From the first node, I run monetdb discover and it returns nothing, where it should discover the other nodes. When I try to run monetdb -h [second node IP] -P mypassphrase status, it returns the following error:
status: cannot connect: Connection refused
PS: I have passwordless SSH between these three nodes; ssh [any node IP] works just fine.
Thank you
By default, MonetDB listens only for local connections, for security reasons.
To also listen for remote connections, run
monetdbd set listenaddr=0.0.0.0 .../path/to/dbfarm
on each of the nodes and restart monetdbd.
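As a minimal sketch (assuming the dbfarm path from the question, and that the same control passphrase is set on every node; check monetdbd(1) for the exact property names on your version), the per-node sequence would look roughly like:
monetdbd set listenaddr=0.0.0.0 /path/to/mydbfarm
monetdbd set passphrase=mypassphrase /path/to/mydbfarm
monetdbd stop /path/to/mydbfarm
monetdbd start /path/to/mydbfarm
After that, monetdb discover run from the first node should list the databases announced by the other two, and monetdb -h [second node IP] -P mypassphrase status should no longer be refused.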
I would like to run multiple instances of RethinkDB on the same machine.
Is that possible? If so, what is the setup?
I have found the answer:
https://rethinkdb.com/docs/start-a-server/
Multiple RethinkDB instances on a single machine
Adding a node to a RethinkDB cluster is as easy as starting a new RethinkDB process and pointing it to an existing node in the cluster. Everything else is handled by the system without any additional effort required from the user.
Now start the second RethinkDB instance on the same machine:
$ rethinkdb --port-offset 1 --directory rethinkdb_data2 --join localhost:29015
info: Creating directory /home/user/rethinkdb_data2
info: Listening for intracluster connections on port 29016
info: Attempting connection to 1 peer...
info: Connected to server "Chaosknight" e6bfec5c-861e-4a8c-8eed-604cc124b714
info: Listening for client driver connections on port 28016
info: Listening for administrative HTTP connections on port 8081
info: Server ready
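For what it's worth, the same pattern should scale to further instances on the box: --port-offset shifts the default cluster (29015), driver (28015) and web UI (8080) ports by the given amount, and each instance needs its own data directory. A third instance (the directory name here is just an example) would be started with:
$ rethinkdb --port-offset 2 --directory rethinkdb_data3 --join localhost:29015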
I am trying to configure a two-node (node1 and node2) HA cluster using Pacemaker on CentOS 7. I executed the steps below on both nodes:
yum install pcs
systemctl enable pcsd.service pacemaker.service corosync.service
systemctl start pcsd.service
passwd hacluster
After that, I executed the command below on node1:
pcs cluster auth node1 node2
I am getting the error below:
Error: Unable to communicate with node2
Error: Unable to communicate with node1
I have verified that both nodes are listening on port 2224 and used telnet to confirm that each node can connect to the other on 2224.
Need help.
The issue got resolved after using FQDNs instead of short hostnames (node1.demo.in, node2.demo.in). The command below worked fine:
pcs cluster auth node1.demo.in node2.demo.in
I don't know the exact cause of this. Any idea?
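One plausible explanation (an assumption, not something confirmed here) is name resolution: pcs cluster auth talks to pcsd at whatever address each given name resolves to, so if the short names node1/node2 resolved to the wrong address (or to 127.0.0.1 via /etc/hosts), the requests never reached the peer's pcsd on 2224, while the FQDNs resolved correctly. With /etc/hosts entries like the following on both nodes (example addresses), the short names should behave the same as the FQDNs:
192.168.1.11  node1.demo.in  node1
192.168.1.12  node2.demo.in  node2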
I installed Hadoop (HDP 2.5.3) on 4 VMs with Ambari (1 Ambari Server and 3 Ambari Clients; with the DNS entries server, node0, node1, node2) with HDFS, YARN, MapReduce and Zookeeper.
However, YARN doesn't want to start. When starting the Resource Manager on node1 I get the following error:
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs' 1>/tmp/tmpgsiRLj 2>/tmp/tmpMENUFa' returned 7. curl: (7) Failed to connect to node0 port 50070: connection refused 000
The App Timeline Server and History Server on node1 don't want to start either. ZooKeeper, the NameNode, DataNode and NodeManager on node0 are up. The nodes can reach each other (tried with ping), so that shouldn't be the problem.
Hopefully someone can help me. I'm really new to this topic and not really familiar with the system.
You should check the hosts file (/etc/hosts) on each node: verify the hostname and FQDN entries and make sure there are no duplicate names or IP addresses.
Could you also confirm the firewall status, e.g.:
sudo ufw status
And also check the port rules in iptables (or allow the required TCP/UDP ports in the firewall).
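If the VMs are RHEL/CentOS (the usual platform for HDP 2.5.3), the firewall is typically firewalld rather than ufw; a quick check, plus temporarily opening the NameNode web/WebHDFS port from the error message (50070), might look like this (just a sketch; the HDP install guides generally suggest disabling the firewall between cluster nodes instead):
sudo firewall-cmd --state
sudo firewall-cmd --permanent --add-port=50070/tcp
sudo firewall-cmd --reload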
I have a Linux instance on Amazon EC2. I manually installed Spark on this instance and it's working fine. Next I wanted to set up a Spark cluster on Amazon.
I ran the following command in the ec2 folder:
spark-ec2 -k mykey -i mykey.pem -s 1 -t t2.micro launch mycluster
which successfully launched a master and a worker node. I can ssh into the master node using ssh -i mykey.pem ec2-user@master
I've also exported the keys: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
I have a jar file (containing a simple Spark program) which I tried to submit to the master:
spark-submit --master spark://<master-ip>:7077 --deploy-mode cluster --class com.mycompany.SimpleApp ./spark.jar
But I get the following error:
Error connecting to master (akka.tcp://sparkMaster@<master>:7077).
Cause was: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@<master>:7077
No master is available, exiting.
I've also updated the EC2 security settings for the master to accept all inbound traffic:
Type: All traffic, Protocol: All, Port Range: All, Source: 0.0.0.0/0
A common beginner mistake is to assume that Spark communication follows a program-to-master and master-to-workers hierarchy, whereas currently it does not.
When you run spark-submit, your program attaches to a driver running locally, which communicates with the master to get an allocation of workers. The driver then communicates with the workers. You can see this kind of communication between the driver (not the master) and the workers in a number of diagrams in this slide presentation on Spark at Stanford.
It is important that the computer running spark-submit be able to communicate with all of the workers, and not simply the master. While you can start an additional EC2 instance in a security zone allowing access to the master and workers or alter the security zone to include your home PC, perhaps the easiest way is to simply log on to the master and run spark-submit, pyspark or spark-shell from the master node.
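As a rough sketch (key, jar and class names taken from the question; the master URL must match exactly what the master prints at the top of its web UI on port 8080, since the Akka-based RPC in older Spark rejects mismatched addresses):
ssh -i mykey.pem ec2-user@<master>
# then on the master, after copying spark.jar over:
spark-submit --master spark://<master-hostname>:7077 --class com.mycompany.SimpleApp ./spark.jar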
I'm using the Cloudera distribution of Hadoop and recently had to change the IP addresses of a few nodes in the cluster. After the change, on one of the nodes (old IP: 10.88.76.223, new IP: 10.88.69.31), the following error comes up when I try to start the DataNode service.
Initialization failed for block pool Block pool BP-77624948-10.88.65.174-13492342342 (storage id DS-820323624-10.88.76.223-50010-142302323234) service to hadoop-name-node-01/10.88.65.174:6666
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(10.88.69.31, storageID=DS-820323624-10.88.76.223-50010-142302323234, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster25;nsid=1486084428;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:656)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3593)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:899)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91)
Has anyone had success with changing the IP address of a Hadoop data node and joining it back to the cluster without data loss?
CHANGE HOST IP IN CLOUDERA MANAGER
Change the host IP on all nodes
sudo nano /etc/hosts
Edit the IP in the Cloudera agent config.ini on all nodes if the master node IP changed
sudo nano /etc/cloudera-scm-agent/config.ini
Change IP in PostgreSQL Database
To find the database password, read the Cloudera SCM server database properties file
cat /etc/cloudera-scm-server/db.properties
and look for the password line, e.g.
com.cloudera.cmf.db.password=gUHHwvJdoE
Open PostgreSQL
psql -h localhost -p 7432 -U scm
Query the hosts table in PostgreSQL
select name,host_id,ip_address from hosts;
Update the host's IP in the table
update hosts set ip_address = 'xxx.xxx.xxx.xxx' where host_id=x;
Exit the tool
\q
Restart the agent service on all nodes
service cloudera-scm-agent restart
Restart the server service on the master node
service cloudera-scm-server restart
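Before moving on, it may be worth confirming that each agent is heartbeating to the server again; a quick check on every node (the log path below is the default Cloudera agent location) could be:
service cloudera-scm-agent status
tail -n 50 /var/log/cloudera-scm-agent/cloudera-scm-agent.log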
Turns out it's better to:
Decommission the server from the cluster to ensure that all blocks are replicated to other nodes in the cluster.
Remove the server from the cluster
Connect to the server, change the IP address, then restart the Cloudera agent
Notice that Cloudera Manager now shows two entries for this server. Delete the entry with the old IP and the longest heartbeat time
Add the server to the required cluster and add required roles back to the server (e.g. HDFS datanode, HBASE RS, Yarn)
HDFS will read all data disks and recognize the block pool and cluster IDs, then register the datanode.
All data will be available and the process will be transparent to any client.
NOTE: If you run into name-resolution errors from HDFS clients, the application has likely cached the old IP and will most likely need to be restarted. This applies particularly to Java clients that previously referenced this server (e.g. HBase clients), because the JVM can cache resolved IPs indefinitely; such clients will keep throwing connectivity errors against the old IP until they are restarted.
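If restarting every Java client is impractical, one possible mitigation (my own suggestion, not part of the original procedure; the default TTL depends on whether a security manager is installed) is to cap the JVM's positive DNS cache so stale IPs eventually expire, either in $JAVA_HOME/jre/lib/security/java.security or per process:
# in java.security: cache successful lookups for 60 seconds
networkaddress.cache.ttl=60
# or as a legacy system property on the client JVM
java -Dsun.net.inetaddr.ttl=60 ...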