Ambari upgrade failed (Ambari 2.4 to 2.6) - Hadoop services won't start anymore

I just worked through this Upgrade guide for the Hortonworks Data Platform:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-upgrade/bk_ambari-upgrade.pdf
I did all the steps as described in sections 1 - 4 (Ambari upgrade), but now I have the problem that my services won't start anymore!
Ambari can still see all hosts, but the services won't start!
For example, when starting HDFS I get the following error message:
2017-11-13 19:41:11,427 - Unable to load available packages
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 771, in load_available_packages
self.available_packages_in_repos = pkg_provider.get_available_packages_in_repos(repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 85, in get_available_packages_in_repos
available_packages.extend(self._get_available_packages(repo))
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 146, in _get_available_packages
return self._lookup_packages(cmd, 'Available Packages')
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 191, in _lookup_packages
if items[i + 2].find('#') == 0:
IndexError: list index out of range
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 73, in <module>
HdfsClient().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 367, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 930, in restart
self.install(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 35, in install
import params
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/params.py", line 25, in <module>
from params_linux import *
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py", line 391, in <module>
lzo_packages = get_lzo_packages(stack_version_unformatted)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_lzo_packages.py", line 45, in get_lzo_packages
lzo_packages += [script_instance.format_package_name("hadooplzo_${stack_version}"),
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 538, in format_package_name
raise Fail("Cannot match package for regexp name {0}. Available packages: {1}".format(name, self.available_packages_in_repos))
resource_management.core.exceptions.Fail: Cannot match package for regexp name hadooplzo_${stack_version}. Available packages: []
I think the most important part is the message resource_management.core.exceptions.Fail: Cannot match package for regexp name hadooplzo_${stack_version}. Available packages: [], which suggests that no matching package (version) is available at all.
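As a quick sanity check (just a sketch, not from the upgrade guide; repo and package names are the usual HDP ones, so adjust them to your cluster), it is worth confirming that yum itself still returns packages from the HDP repositories, because an empty repo is exactly what would produce "Available packages: []":
# Confirm the HDP and HDP-UTILS repos are still present and enabled
yum repolist enabled
# Check whether yum can actually see the package Ambari tries to match
yum list available 'hadooplzo*'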
I just noticed that I also upgraded the Ambari Metrics Monitor, the Ambari Metrics Hadoop Sink and the Metrics Collector before starting the services once (the manual is a little bit confusing here, see step 4.3.3). Was this a mistake?
I tried to upgrade from Ambari 2.4 to Ambari 2.6 (with HDP 2.5 installed). The operating system is CentOS 7.
In any case, I need to reset / downgrade Ambari or upgrade the services so that I can start them again. Can someone help? Any help would be appreciated, thank you!

Finally I was able to downgrade my Ambari installation back to version 2.4.2, as it was before I started the upgrade process.
To downgrade, you have to perform the following steps on the appropriate nodes:
# delete the new ambari repo file
rm /etc/yum.repos.d/ambari.repo
# download the old Ambari repo file (for me that is version 2.4.2), as described in the Ambari installation guide (here for CentOS 7)
wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.4.2.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
yum clean all
yum repolist
# check for the correct version (e.g. 2.4.2) of the Ambari repo
# Downgrade all components to this version
yum downgrade ambari-metrics-monitor
yum downgrade ambari-metrics-hadoop-sink
yum downgrade ambari-agent
...
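To verify afterwards that each node is really back on the old version (a quick check of my own, not part of any official procedure):
# List the installed Ambari packages together with their versions
rpm -qa | grep -i ambari
# Or via yum, which also shows which repo they came from
yum list installed 'ambari-*'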
Afterwards I upgraded again, but this time to Ambari version 2.5.2.0, which worked without a problem. I was then also able to upgrade my HDP installation to version 2.6.3.0 via this Ambari version.
I will skip Ambari 2.6.0 and try to upgrade Ambari again with a later release.
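For reference, the repo switch for that second attempt looks analogous to the downgrade above, just pointing at the 2.5.2.0 repo file (the URL below follows the same pattern as the 2.4.2 one; double-check it and the full procedure against the Ambari 2.5.2 upgrade guide):
# point yum at the Ambari 2.5.2.0 repository (CentOS 7)
wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.5.2.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
yum clean all
# upgrade the Ambari packages on the respective nodes
yum upgrade ambari-server   # on the Ambari server host
yum upgrade ambari-agent    # on every agent host
# upgrade the Ambari database schema afterwards (server host only)
ambari-server upgrade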

Related

Failed to import pywin32 (even though it is already installed) while starting supervisord in a Windows Nanoserver container

I am trying to bring two processes up in a Windows Nanoserver container using supervisord (pip install supervisor-win).
Everything is set up in supervisord.conf; while starting it I am facing the issue below:
C:\data>supervisord -n
C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\options.py:480: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
self.warnings.warn(
Traceback (most recent call last):
File "C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\loggers.py", line 220, in _disable_inheritance_filehandler
import win32api
ModuleNotFoundError: No module named 'win32api'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\python-3.11.1-embed-amd64\Scripts\supervisord.exe\__main__.py", line 7, in <module>
File "C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\supervisord.py", line 403, in main
go(options)
File "C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\supervisord.py", line 415, in go
d.main()
File "C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\supervisord.py", line 77, in main
self.options.make_logger()
File "C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\options.py", line 1221, in make_logger
loggers.handle_file(
File "C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\loggers.py", line 444, in handle_file
handler = RotatingFileHandler(filename, 'a', maxbytes, backups)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\loggers.py", line 211, in __init__
self._disable_inheritance_filehandler() # fix file used by others process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\python-3.11.1-embed-amd64\Lib\site-packages\supervisor\loggers.py", line 223, in _disable_inheritance_filehandler
raise ImportWarning("log rotation requires the installation of the \"pywin32\" library.\n"
ImportWarning: log rotation requires the installation of the "pywin32" library.
Download and install from https://github.com/mhammond/pywin32/releases
C:\data>
The pywin32 module is already installed (I tried reinstalling it with the whl and with pip anyway; that did not help):
C:\data>python -m pip show pywin32
Name: pywin32
Version: 305
Summary: Python for Window Extensions
Home-page: https://github.com/mhammond/pywin32
Author: Mark Hammond (et al)
Author-email: mhammond@skippinet.com.au
License: PSF
Location: C:\python-3.11.1-embed-amd64\Lib\site-packages
Requires:
Required-by: pypiwin32, supervisor-win
C:\data>
What could be the issue here? I see fewer DLL files (only around 400) in C:\Windows\System32 compared to the servercore/server image. (This setup works fine with the Windows servercore image.)
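One way to narrow this down (just a sketch, using the paths from the output above) is to check whether the embedded interpreter can import win32api at all, and whether pywin32's DLL folder made it into site-packages:
rem Can the interpreter load the extension module directly?
python -c "import win32api; print(win32api.__file__)"
rem pywin32 keeps its core DLLs (pywintypes, pythoncom) in this folder;
rem if it is missing or not loadable, the import above fails
dir C:\python-3.11.1-embed-amd64\Lib\site-packages\pywin32_system32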

Unable to init h2o - can somebody help me with it?

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
Starting server from C:\Users\Ramakanth\Anaconda2\lib\site-packages\h2o\backend\bin\h2o.jar
Ice root: c:\users\ramaka~1\appdata\local\temp\tmpeaff8n
JVM stdout: c:\users\ramaka~1\appdata\local\temp\tmpeaff8n\h2o_Ramakanth_started_from_python.out
JVM stderr: c:\users\ramaka~1\appdata\local\temp\tmpeaff8n\h2o_Ramakanth_started_from_python.err
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Ramakanth\Anaconda2\lib\site-packages\h2o\h2o.py", line 262, in init
min_mem_size=mmin, ice_root=ice_root, port=port, extra_classpath=extra_classpath)
File "C:\Users\Ramakanth\Anaconda2\lib\site-packages\h2o\backend\server.py", line 121, in start
mmax=max_mem_size, mmin=min_mem_size)
File "C:\Users\Ramakanth\Anaconda2\lib\site-packages\h2o\backend\server.py", line 317, in _launch_server
raise H2OServerError("Server process terminated with error code %d" % proc.returncode)
h2o.exceptions.H2OServerError: Server process terminated with error code 1
Assuming "build 9.0.1+11" means Java 9, that is your problem: H2O currently only supports Java 7 or Java 8. This is the ticket to follow for adding Java 9 support. In the meantime uninstall your current Java, then install Java 8.
UPDATE: It seems Java 9 is now supported, so upgrade to h2o 3.20 or later.
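A quick way to check which of the two routes applies in your environment (a sketch; the version numbers are the ones from this answer, and it assumes h2o is installed from PyPI):
# Which Java will the H2O launcher pick up?
java -version
# If you would rather keep Java 9, move to an H2O release that supports it
pip install --upgrade "h2o>=3.20"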
BTW, normally you should give a lot more information: which language you are using, what code you used to try to start H2O (or the command line if you started it that way), what OS, what versions of Java, R, Python, etc., number of cores, amount of memory, and so on.

Server install of HDFS client fails

I am getting the following errors for the HDFS client installation in Ambari. I have reset the server several times but still cannot get it resolved. Any idea how to fix this?
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 120, in <module>
HdfsClient().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 36, in install
self.configure(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 41, in configure
hdfs()
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs.py", line 61, in hdfs
group=params.user_group
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/xml_config.py", line 67, in action_create
encoding = self.resource.encoding
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 87, in action_create
raise Fail("Applying %s failed, parent directory %s doesn't exist" % (self.resource, dirname))
resource_management.core.exceptions.Fail: Applying File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] failed, parent directory /usr/hdp/current/hadoop-client/conf doesn't exist
/usr/hdp/current/hadoop-client/conf is a soft link that points to /etc/hadoop/conf.
I ran
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users
After running it, /etc/hadoop/conf is removed.
However, reinstalling does not recreate it, so you may have to create all the conf files yourself.
Hope someone can patch it.
yum -y erase hdp-select
If you have done the installation multiple times, some packages might not be cleaned up. To remove all HDP packages and start with a fresh installation, erase hdp-select.
If this does not help, remove all the versions from /usr/hdp; delete this directory if it contains multiple versions of HDP.
Remove all the installed packages like hadoop, hdfs, zookeeper, etc.:
yum remove zookeeper* hadoop* hdp*
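To confirm nothing is left over before reinstalling (a quick check, not part of the answer above), list what is still installed and what remains under /usr/hdp:
# any stack packages still installed?
rpm -qa | grep -iE 'hadoop|hdfs|zookeeper|hdp'
# any leftover version directories?
ls /usr/hdp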
I ran into the same problem; I was using HDP 2.3.2 on CentOS 7.
The first problem:
Some conf directories point to the /etc/<component>/conf directory (as they are supposed to). However, /etc/<component>/conf points back to the other conf directory, which leads to an endless loop.
I was able to fix this problem by removing the /etc/<component>/conf symbolic links and creating real directories.
The second problem:
If you run the Python scripts to clean up the installation and start over, several directories do not get recreated, such as the hadoop-client directory. This leads to exactly your error message. Also, the cleanup script does not work out well, as it does not clean up several users and directories; you have to userdel and groupdel them yourself.
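For the leftover accounts, something along these lines works (the user and group names are the HDP defaults, not taken from this thread, so check which ones actually exist on your hosts before deleting anything):
# remove service accounts the cleanup script leaves behind (HDP defaults shown)
for u in hdfs yarn mapred zookeeper ambari-qa; do
  id "$u" >/dev/null 2>&1 && userdel -r "$u"
done
# remove the shared hadoop group once no service user references it any more
groupdel hadoop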
UPDATE:
It seems this was a problem of HDP 2.3.2; with HDP 2.3.4 I did not run into it any more.
Creating /usr/hdp/current/hadoop-client/conf on the failing host should solve the problem.
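A minimal sketch of that fix, assuming the layout described above (the conf link should end up pointing at /etc/hadoop/conf, and /usr/hdp/current/hadoop-client itself must already exist):
# recreate the target directory if HostCleanup.py removed it
mkdir -p /etc/hadoop/conf
# recreate the missing conf link under the hadoop-client directory
ln -s /etc/hadoop/conf /usr/hdp/current/hadoop-client/conf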

Is there any way to run gevent-socketio 0.3.5-rc2 with gunicorn 18.0 without downgrading?

I'm running:
gevent==0.13.8
gevent-socketio==0.3.5-rc2
gunicorn==18.0
And have run into the following error:
2013-11-05 06:40:00 [5671] [ERROR] Exception in worker process:
Traceback (most recent call last):
File "/home/vagrant/server/lib/python2.7/site-packages/gunicorn/arbiter.py", line 495, in spawn_worker
worker.init_process()
File "/home/vagrant/server/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 165, in init_process
super(GeventWorker, self).init_process()
File "/home/vagrant/server/lib/python2.7/site-packages/gunicorn/workers/base.py", line 112, in init_process
self.run()
File "/home/vagrant/server/lib/python2.7/site-packages/socketio/sgunicorn.py", line 14, in run
self.socket.setblocking(1)
AttributeError: 'GeventSocketIOWorker' object has no attribute 'socket'
A previous Stack Overflow question, "GeventSocketIOWorker has no attribute 'socket'", has the solution "downgrade to version 16.0".
However, I'm reluctant to do this because the additions in v18.0 are really useful to me.
I'm asking here because I'm not sure if there's an easy solution that I'm missing. If not, I imagine I'll need to raise a ticket for gunicorn?
It was a version thing.
gevent-socketio version 0.3.5-rc2 was uploaded to PyPI in July 2012; the fix for this issue came out in January 2013.
I solved it by using the master branch of the gevent-socketio repository on GitHub. To do this, change the line for gevent-socketio in requirements.txt to
-e git+git@github.com:abourget/gevent-socketio.git#egg=gevent_socketio
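If SSH access to GitHub is not set up in your environment, the equivalent https form of that requirement should work as well (same repository, just a different URL scheme):
# requirements.txt entry using https instead of ssh
-e git+https://github.com/abourget/gevent-socketio.git#egg=gevent_socketio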

MySQL 5.5 installation on Windows doesn't show up in the registry properly

I am trying to get Django with MySQL working on my Windows 7 machine.
I need the MySQLdb interface for this. When I try to install MySQLdb, it looks for the MySQL installation in HKEY_LOCAL_MACHINE.
But in my registry it shows up under HKEY_CURRENT_USER\SOFTWARE\MySQL AB\MySQL Server 5.5.
My MySQL instance seems to work fine.
The MySQLdb installer apparently needs the entry in HKEY_LOCAL_MACHINE (I think), as it doesn't let me finish the install and throws this error:
Traceback (most recent call last):
File "setup.py", line 15, in <module>
metadata, options = get_config()
File "C:\Blah\Software\MySQL-python-1.2.3\setup_windows.
py", line 8, in get_config
mysql_root, dummy = _winreg.QueryValueEx(serverKey,'Location')
WindowsError: [Error 2] The system cannot find the file specified
Thanks a lot for your time.
You can fix the registry key in site.cfg. Or you can download a Windows MSI package from SourceForge. (No, it did not exist when the question was posted.)
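For the site.cfg route, the relevant entry in the MySQL-python-1.2.3 source tree looks roughly like this (shown from memory as an illustration; verify the option name against the site.cfg that ships with your copy of the source):
# site.cfg in the MySQL-python source directory
# registry key (relative to the hive) that setup_windows.py queries for the MySQL location
registry_key = SOFTWARE\MySQL AB\MySQL Server 5.5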
