Ambari remove dead host - hadoop

I have a host configured into Ambari which no longer exists. Ambari still thinks it's there. When I try to delete it through the UI I get:
400 status code received on DELETE method for API:
/api/v1/clusters/handy091015/hosts/r-hadoopeco-celeryworker-07ac46a4.hbinternal.com/host_components/ZOOKEEPER_CLIENT
Error message: Bad Request
When I try to delete it via the api, with the command below, I get the same host information as with a GET:
curl -H "X-Requested-By: ambari" -DELETE http://admin:admin#ambari.handy-internal.com//api/v1/clusters/handy091015/hosts/r-hadoopeco-celeryworker-07ac46a4.hbinternal.com
I have tried the instructions here to no avail:
https://cwiki.apache.org/confluence/display/AMBARI/Using+APIs+to+delete+a+service+or+all+host+components+on+a+host
My question is: how do I get Ambari to no longer know about/try to do things with this host.

I am not able to reproduce your behaviour with Ambari 2.1.2 and HDP 2.3 stack.
Limitation:
Note that host removing is supported only for hosts with no master components, so if they are present, then deleting is not possible.
Options:
Try to do ambari-server restart, sometimes it have intermittent issues
If this is an option, I recomend you to do ambari-server reset and install it from scratch. If you don't have much setup, it will save your time probably.
If not, you may want to post ambari-server.log file additionally. This may help to debug the core issue
Another option - just ignore that host, it will not do much harm to you. You can move it to maintenance mode, that will ease cluster operation.

Related

How to safely fix an AWOL ambari system user?

I'm a student working on a test cluster, consisting of around 25 hosts. We installed using Ambari and have FreeIpa running on a host as a dns and ldap server. The rest are typical Hadoop
infrastructure. Hive was failing and I wondered whether the db connection parameters used during the Ambari installation were incorrect and I tried to find a way to re-run the db connection process. I didn't get anywhere and it was late so I left it, ambari interface working.
Next morning, ambari webUI seems to be down. I thought that maybe the webserver needed restarted so I tried the following:
[akidd#dw ~]$ sudo ambari-server start
Using python /usr/bin/python
Starting ambari-server
ERROR: Exiting with exit code 1.
REASON: Unable to detect a system user for Ambari Server.
- If this is a new setup, then run the "ambari-server setup" command to create the user
- If this is an upgrade of an existing setup, run the "ambari-server upgrade" command.
Refer to the Ambari documentation for more information on setup and upgrade.
Can anyone help me to understand what could have happened?
If I run ambari-server setup will the existing cluster be ok assuming I create everything like for like with how it was originally?
Thanks for your help!
#user3535074 You should try to start it with the user that installed it.
If you do run ambari-server setup as current user, remember to choose No the following options:
Customize user account for ambari-server daemon [y/n] (n)? n
Do you want to change Oracle JDK [y/n] (n)? n
Enter advanced database configuration [y/n] (n)? n
More info on the following post, including how to backup ambari database before running setup again:
https://community.cloudera.com/t5/Support-Questions/Ambari-server-failed-to-start-after-system-reboot-Below-is/td-p/203806

Hadoop installation on Windows 10

I’m trying to install Hadoop 3.2.0 on Windows 10 using mostly the following tutorial:
https://wiki.apache.org/hadoop/Hadoop2OnWindows
I’ve found some relevant tutorial on the web even though they are principally related to Linux.
Every time I'm trying to verify the HDFS daemons are running with this code:
"%HADOOP_PREFIX%\bin\hdfs dfs -put myfile.txt /"
I constantly get the same error message: “Your endpoint configuration is wrong;”
I tried to change the port to 9000, change to localhost, also tried to use hostname:8820.
I checked Stack Overflow and Super User but I haven’t found already the answer.
What should I try?
Try backslashes instead of forward slashes for windows paths

VM not restarting automatically when installed with kickstart with --noautoconsole option

I created a Centos 7.3 VM using kickstart using the following command:
virt-install --name=vm1 --disk path=vm1.img,size=20 --vcpus=2 --ram=10240 --os-type=linux --os-variant=rhel7.0 --network bridge=br0 --graphics none --location=http://<IP>/centos7.3 -x "ks=http://<IP>/centos73vm-ks.cfg append ip=<VM IP> netmask=255.255.252.0 gateway=<gw> bootproto=static console=ttyS0"
This works fine. VM is created, rebooted automatically and the node is usable. However, the problem with this is that I cannot use it to automate since I don't get the control back. To do that, I added the --noautoconsole options of the virt-install command at the end of the above command.
After doing so, VM is installed, but after reboot it does not come up automatically. It remains in shut off state. I need to start it manually. There are no errors on logging to the console. May someone give any leads on how to fix this?
Any help would be greatly appreciated. Thanks in advance.
you need to add --wait=-1 so that virt-install waits for the installation to complete before exiting. The vm will then automatically start when the installation completes.
this sure sounds like an issue that was covered on the RedHat customer portal. I'm not sure if that requires a paid license but your company (or you) might have one already?
-- Jonas

cloudera host with bad health during install

Trying again & again with all required steps completed but cluster Installation when install selected Parcels, always shows every host with bad health. setup never completed at full.
i am installing cm 5.5 on CentOS 6.7 using virtualbox.
The Error
Host is in bad health cm.feuni.edu
Host is in bad health dn1.feuni.edu
Host is in bad health dn2.feuni.edu
Host is in bad health nn1.feuni.edu
Host is in bad health nn2.feuni.edu
Host is in bad health rm.feuni.edu
above error are shown on step 6 where setup says
The selected parcels are being downloaded and installed on all the hosts in the cluster
in previous step 5 all hosts were completed with heartbeat checks in the end
memory distributions
cm 8GB
all others with 1GB
i could not find proper answer anywhere else. What reason could be for the bad health?
I don't know if it will help you...
For me, after a few days I struggled with it,
I found the log files (at )
It had a comment there is a mismatch of the guid,
so I uninstalled everything from both machines (using the script they give,/usr/share/cmf/uninstall-cloudera-manager.sh , yum remove 'cloudera-manager-*' and deletion of every directory related to cloudera I found...)
and then removed the guid file:
rm /var/lib/cloudera-scm-agent/cm_guid
Afterwards I re-installed everything, and that fixed that issue for me...
I read online that there can be issues with the hostname and things like that, but I guess that if you get to this part of the installation, you already fixed all the domain/FDQN/hosname/hosts issues.
It saddens me there is no real manual/FAQ for this product.. :(
Good luck!
I faced the same problem. This is my solution:
First I edited config.ini
$ nano /etc/cloudera-scm-agent/config.ini
so that the hostname where the same as the command $ hostname returned.
then I restarted the agent and the server of cloudera:
$ service cloudera-scm-agent restart
$ service cloudera-scm-server restart
then in cloudera manager I deleted the cluster and added again. The wizard continued to run normally.

Installing Membase from source

I am trying to build and install membase from source tarball. The steps I followed are:
Un-archive the tar membase-server_src-1.7.1.1.tar.gz
Issue make (from within the untarred folder)
Once done, I enter into directory install/bin and invoke the script membase-server.
This starts up the server with a message:
The maximum number of open files for the membase user is set too low.
It must be at least 10240. Normally this can be increased by adding
the following lines to /etc/security/limits.conf:
Tried updating limits.conf as suggested, but no luck it continues to pop up the same message and continues booting
Given that the server is started I tried accessing memcached over port 11211, but I get a connection refused message. Then figured out (netstat) that memcached is listening to 11210 and tried telneting to port 11210, unfortunately the connection is closed as soon as I issue the following commands
stats
set myvar 0 0 5
Note: I am not getting any output from the commands above {Yes: stats did not show anything but still I issued set.}
Could somebody help me build and install membase from source? Also why is memcached listening to 11210 instead of 11211?
It would be great if somebody could also give me a step-by-step guide which I can follow to build from source from Git repository (I have not used autoconf earlier).
P.S: I have tried installing from binaries (debian package) on the same machines and I am able to successfully install and telnet. Hence not sure why is build from source not working.
You can increase the number of file descriptors on your machine by using the ulimit command. Try doing (you might need to use sudo as well):
ulimit -n 10240
I personally have this set in my .bash_rc so that whenever I start my terminal it is always set for me.
Also, memcached listens on port 11210 by default for Membase. This is done because Moxi, the memcached proxy server, listens on port 11211. I'm also pretty sure that the memcached version used for Membase only listens for the binary protocol so you won't be able to successfully telnet to 11210 and have commands work correctly. Telneting to 11211 (moxi) should work though.

Resources