Unable to remove a dead host from Ambari - hadoop

We had some problem with a host and we had to shutdown that host.
Now, we are not able to remove that dead host from Ambari.
Whenever we click Hosts -> Click on the host that is dead -> Host Actions -> Delete Host
This host cannot be deleted since it has the following master components: DRPC Server, Falcon Server etc.
If I go to that service, all the actions against each services are greyed out. So, there is no way I can move those services to another hosts because those are disabled.
Please suggest a way ahead. Is handling sudden death of a service not possible in Ambari?

You can try the Ambari API as explained here. Some features of the Ambari API aren't implementet in der User Interface right now.
I remeber a case in my company where we couldn't remove a Node with the Ambari UI. With an API-Call like it's explained this link it was possible.

Related

How to access the flink web UI when running inside a container (wsl2)

In the First Steps instructions for flink, it says you can connect to the web UI via a local host link, I have been searching for a way to make this work on Windows 10, when running inside wsl2. I followed all steps from the linked First Steps page, but the connection is refused every time.
I did eventually figure this out. If you edit the ./conf/flink-conf.yaml file and change:
rest.bind-address: localhost to rest.bind-address: 0.0.0.0
then stop and restart the cluster, I can now access the web UI via http://localhost:8081

amabri metrics collector error

We have 5 node hortonworks cluster with ambari monitors installed in all nodes and metrics collector installed in master node.
I am getting Connection failed: [Errno 111] Connection refused to 0.0.0.0:6188
PFA for error.
https://drive.google.com/file/d/0B85rPUe3-QPXbXJSRzJmdUwwQU0/view?usp=sharing
I followed the below document and tried removing the service and added it.
https://cwiki.apache.org/confluence/display/AMBARI/Moving+Metrics+Collector+to+a+new+host
First of all, I am not able to find the origin of the error. Please share your experience if you ever faced this problem.
This happens sometimes that port is already being use by another process when we try to move collector to new host with 'Curl' commands specified on apache wiki.
Istead of doing using this you can leverage the feature which Ambari provides from it's GUI to move components from one host to another host .
'Move Master Wizard'
Follow the steps stated at Move Master Wizard , Ambari will take care rest of things for you.
I have fixed this issue by killing the process running in that port and restart the service. You can also do a manual reboot of the machine to fix this issue.

How to restart single node hadoop cluster on ec2

I have installed a single node haodoop cluster on using Hortonworks/Ambari on Amazon's ec2 host.
Since I don't want this cluster running 24/7, I stop the instance when done. When I reboot the instance later, I get a new IP address and then ambari no longer is able to start the Hadoop related services.
Is there a way other than completely redeploying to reconfigure the cluster so the services will start?
It looks like the IP address lives in various xml files under /etc, in the postgres database table ambari, and possibly other places I haven't found yet.
I tried updating the xml files and postgres database with updated versions of the ip address, internal and external dns names as I could find them, but to no avail. I have not been able to restart the services.
The reason I am doing this is to possibly save the deployment time and data configuration on hdfs and other project specific setup each time I restart the host.
Any suggestions?
Thanks!
Elastic IP can be used. Also, since you mentioned it being a single node cluster - you can use localhost or private IP.
If you use elastic IP, your UIs will always be on the same public IP. However, if you use private IP or localhost and do not associate your instance with an elastic IP you will have to look for public IP everytime you start the instance and then connect to the web UI using the IP.
Thanks for the help, both Harman and TJ are correct. I haven't used an elastic IP because I might have more than one of these running and a time, and for now at least, I don't mind looking up the public ip address.
Harman's suggestion of using "localhost" as the fqdn when setting up ambari in the first place is a really good idea in retrospect. Unless I go through the whole setup again, that's water under the bridge for me, but I recommend this to others who might read this post.
In my case, I figured this out on my own before coming back to the page. The specific step I took was insanely simple after all, thanks to Occam's Razor.
I added the following line in /etc/hosts:
<new internal IP> <old internal dns name>
and then did
ambari-server restart. from the command line. Then I am able to restart all services after logging into ambari.

Use spark-submit to submit a application to EC2 cluster

I am new to Spark and I am trying to run it on EC2. I follow the tutorial on spark webpage by using spark-ec2 to launch a Spark cluster. Then, I try to use spark-submit to submit the application to the cluster. The command looks like this:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100
However, I got the following error:
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
Please let me know how to fix it. Thanks.
You're seeing this issue because the master node of your spark-standalone cluster cant open a TCP connection back to the drive (on your machine). The default mode of spark-submit is client which runs the driver on the machine that submitted it.
A new cluster mode was added to spark-deploy that submits the job to the master where it is then run on a client, removing the need for a direct connection. Unfortunately this mode is not supported in standalone mode.
You can vote for the JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2260
Tunneling your connection via SSH is possible but latency would be a big issue since the driver would be running locally on your machine.
I'm curious if you still having this issue ... But in case anyone is asking here is a brief answer. As clarified by jhappoldt, the master node of your spark-standalone cluster cant open a TCP connection back to the drive (on your local machine). Two workarounds are possible, tested and succeeded.
(1) From EC2 Management Console, create a new security group and add rules to enable TCP back and forth from your PC (public IP). (what I did was adding TCP rules inbound and outbound) ... Then add this security group to your master instance. (right click --> Networking --> Change security groups). Note: add it and don't remove the already established security groups.
This solution work well, but in your specific scenario, deploying your application from local machine to EC2 cluster, you will face further problems (resource related) so the next option is the best one
(2) Having your .jar file (or .egg) copy it to the master node using scp. You can check this link http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html for information about how to do that; and deploy your application from the master node. Note: spark is already pre-insalled so you will do nothing but write the same exact command you write on your local machine from ~/spark/bin. This shall work perfect.
Are you executing the command on your local machine, or on the created EC2 node? If you're doing it locally, make sure port 7077 is open in the security settings, as its closed to the outside by default.

Can't join into a cluster on marklogic

I'm working with marklogic database and I tried to create a cluster.
I already have a development key. The OS is the same in all the nodes (win 7 x64).
When you tried to add a node into the cluster, you need to type the host name or the IP adress. For some reason when I type de host name, marklogic sometimes can't find the node , but that doesn't matter, because with the IP, the connection is successfull.
The main problem is when continues trought the process. At the end when marklogic try to transfer cluster configuration information to the new host, the process never ends and finally a message like "No data received" appear in the web browser.
I know that this message doesnt mean that the process fails, because when I change for example the host name, the same message appear.
So, when I check the summary in the first node, the second node appears, so that means the node "joins" into the cluster, but I'm not able to start the admin interface and always the second node appears disconnected even if I restart the service.
Aditionally, I'm able to make a ping from any computer to another.
I tried to create another network, because in my school some ports are not allowed, furthermore I tried to use different development key and the same key in my nodes too,
and finally I already have all the services enabled, but the problem persist.
Any help or comments would be appreciated.
Make sure ports 7998 - 8003 are open on both computers for both inbound and outbound traffic and that you don't have a firewall (Windows firewall, or iptables) blocking these.
You can also start looking into the Logs/ErrorLog.txt file and see if something obvious shows up.
Stick to IP addresses for now as it seems your DNS isn't fully working.
Your error looks like a kind of networking connectivity problem between the hosts.
Also you might get more detailed, or atleast different, answers from the MarkLogic developer mailing list.
http://developer.marklogic.com/discuss
-David Lee
Make sure the host names in MarkLogic configuration match the DNS names at which the hosts can see each other. If those are unreliable, then simply use IP addresses as host names. Go to the Admin interface on both ends, lookup the host name, change the DNS name into IP name, try again.
Also look at DALDEI's suggestion about ports and firewalls, that could be interfering as well.
HTH!

Resources