Cloudera fails to recognize hosts - hadoop

Initially, I set up two machines (Ubuntu 12.04, x64) on a vSphere server.
The names and IPs of these two machines were:
host              ip
vm-cluster-node1  10.211.55.100
vm-cluster-node2  10.211.55.101
I installed Cloudera Manager on vm-cluster-node1.
Then I cloned the second machine (vm-cluster-node2) to create two more hosts, and changed their IPs and names to:
host              ip
vm-cluster-node3  10.211.55.102
vm-cluster-node4  10.211.55.103
The problem is that when I add all four machines from Cloudera Manager, no matter how many times I try, I can only see two machines in the Hosts tab. Later I realized that on every refresh of the web page I still see only two machines, but the second one alternates between vm-cluster-node2, vm-cluster-node3, and vm-cluster-node4.
To illustrate, I have included images to make things clear.
So, as far as I understand, Cloudera Manager is not able to recognize hosts cloned from the same source as different machines, even though the hostnames and IPs have been changed. Is there something these machines still have in common that is causing the problem?

SOLVED
The problem was that the three cloned nodes/hosts had the same host ID. This can be changed in the /etc/default/cloudera-scm-agent file with CMF_AGENT_ARGS="--host_id new_host_id".
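For the record, a minimal sketch of the fix on each cloned node (assuming the agent's defaults file lives at /etc/default/cloudera-scm-agent, as on Ubuntu; the new ID just has to be unique across the cluster, and uuidgen is only one way to generate it):

NEW_ID="$(uuidgen)"   # any cluster-unique string works (uuidgen is in the uuid-runtime package)
echo "CMF_AGENT_ARGS=\"--host_id $NEW_ID\"" | sudo tee -a /etc/default/cloudera-scm-agent
sudo service cloudera-scm-agent restart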

Related

Strange DNS behavior

I am experiencing strange behavior from the DNS servers at OVH/Cloudflare, and I came here hoping for some help or an idea of where the problem comes from.
English is not my native language, so I will try to summarize.
I have 3 dedicated Windows Server 2016 physical servers.
On 2 of them, I have 8 CentOS 7 virtual machines (Hyper-V).
The 8 CentOS VMs communicate with an API on the 3rd Windows machine, which runs a web server (IIS 7) and has a domain name pointing to an API (.NET MVC).
Everything worked well for a year and a half, but since yesterday the 8 CentOS machines no longer resolve the DNS domain of the Windows machine that hosts the web server.
However:
The 8 virtual machines can successfully make DNS requests for any other domain.
Any other machine can resolve the domain of the 3rd Windows machine; only these 8 machines cannot.
So this is not a DNS server problem?
All machines (physical and virtual) use the hosting provider's DNS (OVH). The domain name is managed at Cloudflare but is not proxied (just a classic A record).
On the CentOS virtual machines I tried switching to the Cloudflare DNS servers 1.1.1.1 and 1.0.0.1, but the problem is still the same.
When I make a DNS request from one of the VMs I get this message: host *****.com not found: 2 (SERVFAIL)
Yet if I make the same request from any other computer (even the physical machine the virtual machines run on), with the same DNS server, it works.
It looks as if my virtual machines cannot reach the DNS server for this one domain, yet requests for other domains work, and even changing the DNS server makes no difference!
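For reference, one way to narrow this down from one of the VMs is to compare resolvers directly (dig ships in the bind-utils package on CentOS 7; example.com stands in for the redacted domain):

dig example.com A             # via the resolver configured on the VM (OVH)
dig @1.1.1.1 example.com A    # bypass it and ask Cloudflare's resolver directly
dig +trace example.com A      # resolve from the root servers, skipping recursive resolvers entirely

If the first two return SERVFAIL while the trace succeeds, the recursive resolvers (or the path to them from these VMs) are the problem for this domain; if the trace fails too, the authoritative side is.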
I am completely lost and cannot see where the problem could come from.
I should point out that before this malfunction no change was made to the domain, to Cloudflare, or to the servers. Everything had been working for a year and a half.
Thank you for your help.
Mayzz.

Why can't standalone slaves connect to master on separate Mac OS boxes?

I have two Macs (both OS X El Capitan) at home, both connected to the same Wi-Fi. I want to set up a Spark cluster (with two workers) on these two computers.
Mac1 (192.168.1.2) is my master, running Spark 1.5.2. It is up and working well, and I can see the Spark UI at http://localhost:8080/ (it also shows spark://Mac1:7077).
I have also run one slave on this machine (Mac1), and I can see it under Workers in the Spark UI.
Then I copied Spark to the second machine (Mac2), and I am trying to run another slave on Mac2 (192.168.2.9) with this command:
./sbin/start-slave.sh spark://Mac1:7077
But it does not work; the log shows:
Failed to connect to master Mac1:7077
Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@Mac1:7077/),Path(/User/Master)]
Network-wise, I can SSH from Mac1 to Mac2 and vice versa, but I cannot telnet to Mac1:7077.
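(The same port check can be done with nc, which ships with OS X, using the names from the question; success on the raw IP while the hostname fails would point at a name/binding mismatch:)

nc -vz Mac1 7077          # equivalent to the telnet test above
nc -vz 192.168.1.2 7077   # also try the raw IP the master machine actually has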
I would appreciate any help in solving this problem.
tl;dr Use -h option for ./sbin/start-master.sh, i.e. ./sbin/start-master.sh -h Mac1
Optionally, you could do ./sbin/start-slave.sh spark://192.168.1.2:7077 instead.
The reason is that binding to ports in Spark is very sensitive to which names and IPs are used. In your case, 192.168.1.2 != Mac1: to Spark they are different "names". That is why SSH works (it uses the OS name resolver) while the Spark-level connection fails: the master registered itself under one name and the worker is dialing another.
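Concretely, a sketch with the names from the question (in Spark 1.x the spark-env.sh variable is SPARK_MASTER_IP; later releases renamed it SPARK_MASTER_HOST):

# On Mac1: start the master bound to one explicit name or IP.
./sbin/start-master.sh -h 192.168.1.2
# Equivalent one-time setup in conf/spark-env.sh on Mac1:
#   SPARK_MASTER_IP=192.168.1.2
# On Mac2: connect using exactly the same string the master was bound to.
./sbin/start-slave.sh spark://192.168.1.2:7077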
Likely a networking/firewall issue on the Mac.
Also, the error message you copy/pasted references port 7070; is this the issue?
Using IP addresses in conf/slaves works somehow, but then I have to use IPs everywhere to address the cluster instead of hostnames.
SPARK + Standalone Cluster: Cannot start worker from another machine

Vagrant Remote Box Setup

I have a requirement to set up VM boxes across multiple host machines, initiated from a single master host. To elaborate a bit more: I will have VM templates with different configurations (defined as, say, a Vagrantfile), and the master host should connect to a child host and bring up a VM based on a specific template.
Can I use Vagrant for this? I would also appreciate suggestions for alternatives.
Regards
The best I found matching your description:
https://github.com/fjsanpedro/vagrant-nodemaster
https://github.com/fjsanpedro/vagrant-node
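If those plugins don't fit, the same idea can be sketched with plain SSH from the master host, keeping one Vagrantfile template per directory on each child (host names and template paths below are placeholders):

ssh user@child-host-1 "cd /vm-templates/web-node && vagrant up"   # bring up the "web" template
ssh user@child-host-2 "cd /vm-templates/db-node && vagrant up"    # bring up the "db" template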

Can't join into a cluster on MarkLogic

I'm working with the MarkLogic database and I am trying to create a cluster.
I already have a development key, and the OS is the same on all the nodes (Windows 7 x64).
When you add a node to the cluster, you need to type the host name or the IP address. For some reason, when I type the host name, MarkLogic sometimes can't find the node, but that doesn't matter much, because with the IP the connection is successful.
The main problem comes when I continue through the process. At the end, when MarkLogic tries to transfer the cluster configuration information to the new host, the process never finishes, and eventually a message like "No data received" appears in the web browser.
I know this message doesn't necessarily mean the process failed, because the same message appears when I change, for example, the host name.
When I check the summary on the first node, the second node appears, which means the node did "join" the cluster; but I'm not able to start the Admin interface, and the second node always appears disconnected, even if I restart the service.
Additionally, I'm able to ping any computer from any other.
I tried creating another network, because some ports are not allowed at my school; I also tried using a different development key, as well as the same key on all my nodes;
and finally, I already have all the services enabled, but the problem persists.
Any help or comments would be appreciated.
Make sure ports 7998-8003 are open on both computers for both inbound and outbound traffic, and that no firewall (Windows Firewall or iptables) is blocking them.
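On Windows 7 that can be done from an elevated command prompt, for instance (the rule names are arbitrary, and this is only a sketch of opening the range in both directions):

REM Allow MarkLogic's intra-cluster ports through the Windows firewall.
netsh advfirewall firewall add rule name="MarkLogic in" dir=in action=allow protocol=TCP localport=7998-8003
REM Outbound traffic is usually allowed by default; this just makes it explicit.
netsh advfirewall firewall add rule name="MarkLogic out" dir=out action=allow protocol=TCP remoteport=7998-8003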
You can also start looking into the Logs/ErrorLog.txt file and see if something obvious shows up.
Stick to IP addresses for now as it seems your DNS isn't fully working.
Your error looks like some kind of network connectivity problem between the hosts.
You might also get more detailed, or at least different, answers on the MarkLogic developer mailing list.
http://developer.marklogic.com/discuss
-David Lee
Make sure the host names in the MarkLogic configuration match the DNS names at which the hosts can see each other. If those are unreliable, simply use IP addresses as host names: go to the Admin interface on both ends, look up the host name, change the DNS name to the IP address, and try again.
Also look at DALDEI's suggestion about ports and firewalls; that could be interfering as well.
HTH!

How to change the broadcast IP in a Tomcat cluster

I set up a Tomcat 7 cluster by including the <Cluster> element in server.xml.
The docs (http://tomcat.apache.org/tomcat-7.0-doc/cluster-howto.html) say:
The IP broadcasted is java.net.InetAddress.getLocalHost().getHostAddress() (make sure you don't broadcast 127.0.0.1, this is a common error)
Unfortunately, getLocalHost().getHostAddress() returns 127.0.1.1 on all my virtual machines (Ubuntu running in VirtualBox under Windows 7) instead of the correct IP that I can reach the VMs with, i.e. 10.42.29.191.
Question:
Is there a way to tell Tomcat what IP to send to the other members of the cluster via multicast? Or can I specify (e.g. in code) a different way to obtain the IP?
Additional info:
My cluster seems to fail at session replication, and the "error" above could be the cause. GlassFish doesn't do session replication either; maybe it's the same error. I'd also be glad for information on the corresponding GlassFish configuration. Multicast between the virtual machines works, according to the tool iperf.
Since the VM is an Ubuntu machine, I had to edit the /etc/hosts file
and replace the entry
127.0.1.1 tim-VirtualBox
with one carrying the correct IP:
10.42.29.191 tim-VirtualBox
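To confirm the change took effect on each VM (hostname -i resolves the machine's own name through the same /etc/hosts entry that Java's getLocalHost() lookup consults):

hostname -i               # should now print 10.42.29.191 instead of 127.0.1.1
ping -c 1 "$(hostname)"   # should reach the routable address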
