Cloudera Host Installation failure: Failed to detect Cloudera Manager Server - hadoop

I'm trying to add a host to a Hadoop cluster using Cloudera Manager.
The two computers I am using for this are the following:
10.10.10.9 is supposed to be a DataNode and my first host
10.10.10.10 has the Cloudera Manager and will be the NameNode
The manager is having trouble with the "Cluster Installation" part of the "Add Hosts to the Cluster" scenario on the GUI.
I get the following error when the manager tries to detect the Cloudera Manager Server:
BEGIN host -t PTR 10.10.10.10
10.10.10.10.in-addr.arpa domain name pointer stardestroyer.riis.local.
END (0)
using stardestroyer.riis.local as scm server hostname
BEGIN which python
/usr/bin/python
END (0)
BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' stardestroyer.riis.local 7182
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -2] Name or service not known
END (1)
could not contact scm server at stardestroyer.riis.local:7182, giving up
waiting for rollback request
I tried to do as Cheloute instructs in the following link, but it didn't seem to fix my issue. I also had a different error than the poster.
Cloudera Manager. Failed to detect Cloudera Manager Server
If I run the same check from the command line with the IP address instead of the hostname, there is no error:
python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' 10.10.10.10 7182
I'm not really sure how to fix this in the Cloudera Manager, though.

What worked for me: I had my system administrator delete the reverse DNS (PTR) entry, after which the manager could detect the server. There is likely a better solution, but this is the one I came up with.
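The log in the question shows the reverse (PTR) lookup succeeding while the returned name fails to resolve forward. As a minimal sketch of that check, assuming Python is available on the host being added, the function below repeats both lookups and reports where they disagree. The name `check_dns` and its injectable `reverse`/`forward` parameters are hypothetical, added only for illustration and testing.

```python
import socket


def check_dns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname):
    """Describe whether forward and reverse DNS agree for ip."""
    try:
        hostname = reverse(ip)[0]      # reverse (PTR) lookup, like `host -t PTR`
    except (socket.herror, socket.gaierror):
        return "no reverse entry for %s" % ip
    try:
        resolved = forward(hostname)   # forward lookup of the name the PTR returned
    except socket.gaierror:
        return "%s points to %s, but %s does not resolve" % (ip, hostname, hostname)
    if resolved != ip:
        return "%s resolves to %s, not %s" % (hostname, resolved, ip)
    return "forward and reverse DNS agree on %s" % hostname


if __name__ == "__main__":
    print(check_dns("10.10.10.10"))
```

If this reports that the name does not resolve, adding a forward entry for that name (in DNS or in the host's /etc/hosts) may be an alternative to deleting the PTR record; that is an assumption based on the log, not something confirmed in this thread.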

Related

Error while connecting to AWS EMR cluster from mac

I'm trying to create a 3-node AWS EMR cluster. I have also created a key to connect to the cluster from macOS with the command:
ssh -i ~/Downloads/BigdataKey.pem hadoop@ec2-xx-xx-xx-xx.ap-south-1.compute.amazonaws.com
But it's giving this error:
192:Downloads nageshsinghchauhan$ ssh -i ~/Downloads/BigdataKey.pem hadoop@ec2-xx-xx-xx-xx.ap-south-1.compute.amazonaws.com
ssh: connect to host ec2-xx-xx-xx-xx.ap-south-1.compute.amazonaws.com port 22: Operation timed out
Can anyone please help me out? I'm trying this for the first time on macOS.
The solution I found:
Go to the EC2 security groups and open "ElasticMapReduce-master".
Under the Inbound tab, click Edit.
Add a rule with Type = All TCP, port range = 0-65535, source = MyIP.
Now go to the terminal and restrict the key file's permissions: chmod 400 my-key-pair.pem
As the last step, SSH to your cluster from the Mac using your key.
It's done :)
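The two conditions this answer deals with (a reachable SSH port and a key file that only its owner can read) can be checked before retrying the login. The following is a rough sketch using only the Python standard library; the helper names `port_reachable` and `key_is_private` are hypothetical.

```python
import os
import socket
import stat


def port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (e.g. blocked by a security group) and refusals.
        return False


def key_is_private(path):
    """Return True if the key file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

A timeout from `port_reachable` on port 22 is consistent with the "Operation timed out" message in the question, and `key_is_private` returns True for a file set to mode 400.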

Presto installation error

I'm trying to install Presto on a cluster, but when I try to deploy the Presto server to the nodes, every node gives this error:
Fatal error: [host1] Needed to prompt for a connection or sudo password (host: host1), but input would be ambiguous in parallel mode.
Does anyone know where the problem came from?
That error comes when you don't have SSH connectivity between the node running presto-admin and the nodes in the cluster. The default user for the connection is root, but that can be changed by modifying config.json. Either set up passwordless SSH or specify the SSH password via --password (or -I for an interactive prompt). See the docs for more on this: https://teradata.github.io/presto/docs/current/presto-admin/ssh-configuration.html.
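To see which nodes would still prompt for a password, one option is to attempt a non-interactive login to each of them. The sketch below relies on OpenSSH's `BatchMode=yes` option, which makes `ssh` fail rather than prompt; the function names, the `run` parameter, and the example host list are hypothetical and not part of presto-admin.

```python
import subprocess


def ssh_check_command(host, user="root"):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            "%s@%s" % (user, host), "true"]


def hosts_needing_password(hosts, user="root", run=subprocess.call):
    """Return the hosts where a non-interactive SSH login does not succeed."""
    return [h for h in hosts if run(ssh_check_command(h, user)) != 0]


if __name__ == "__main__":
    # Hypothetical host names; replace with the nodes of your cluster.
    print(hosts_needing_password(["host1", "host2"]))
```

An empty result suggests passwordless SSH is already in place for that user; any host listed needs either key setup or the password options mentioned above.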

HAWQ service check Fails From Ambari

We have a small Hadoop-HAWQ cluster. When I run the HAWQ service check, it fails on one of the slave hosts.
I get the following error from the Ambari UI:
**stderr:**
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/PHD/3.0/services/HAWQ/package/scripts/service_check.py", line 9, in <module>
HAWQServiceCheck().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/PHD/3.0/services/HAWQ/package/scripts/service_check.py", line 6, in service_check
hawq.verify_segments_state(env)
File "/var/lib/ambari-agent/cache/stacks/PHD/3.0/services/HAWQ/package/scripts/hawq.py", line 20, in verify_segments_state
raise Exception("gpstate command returned non-zero result: {0}. Out: {1} Error: {2}".format(retcode, out, err))
Exception: gpstate command returned non-zero result: 255. Out: Error: Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
**stdout:**
(255, '', 'Permission denied, please try again.\r\nPermission denied, please try again.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n')
Any help would be much appreciated.
HAWQ requires passwordless ssh access to all cluster nodes. Check if the system is configured to allow that, and execute the following hawq command to set up passwordless ssh on the nodes defined in your hostfile:
$ gpssh-exkeys -f hostfile (in version 1.x)
$ hawq ssh-exkeys -f hostfile (in version 2.x)

cluster installation stuck at "installation in progress" in cloudera manager

I have installed Cloudera Manager 5.4 on Ubuntu. When I try to install a cluster using Cloudera Manager, it gets stuck at "Installation in progress" with no errors.
The only two errors I can see are in /var/log/cloudera-scm-server/cloudera-scm-server.log:
2015-05-12 19:11:42,715 ERROR main:org.hibernate.engine.jdbc.spi.SqlExceptionHelper: ERROR: relation "cm_version" does not exist
Position: 21
2015-05-12 19:16:58,585 ERROR main:com.cloudera.server.web.cmf.cloud.EC2MetadataFetcher: Request to EC2 metadata failed: I/O error: The host did not accept the connection within timeout of 2000 ms; nested exception is org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 2000 ms
Can someone please help?
Based on your comments, this looks like a DNS/IP loopback issue.
I updated my /etc/hosts file to look like the following, and the inspector worked:
> 127.0.0.1 localhost.localdomain localhost
> xx.xx.xx.xx mc.domain.com mc
Substitute xx.xx.xx.xx with your IP and mc.domain.com with your machine's hostname. HTH.
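A quick way to spot the loopback problem is to check whether the machine's hostname appears on a loopback line of /etc/hosts (Ubuntu installs commonly map it to 127.0.1.1). This is a small sketch that parses the file's text; the function names are hypothetical, and `mc.domain.com` is the placeholder hostname from the answer above.

```python
def loopback_names(hosts_text):
    """Return the set of names that a hosts file maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and (fields[0].startswith("127.") or fields[0] == "::1"):
            names.update(fields[1:])
    return names


def hostname_on_loopback(hosts_text, hostname):
    """Return True if hostname is listed on a loopback line."""
    return hostname in loopback_names(hosts_text)


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        print(hostname_on_loopback(f.read(), "mc.domain.com"))
```

If this prints True for the machine's own fully qualified name, the layout shown above (hostname on its real IP, only localhost on 127.0.0.1) is the change to make.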

PostgreSQL first time install and attempt to connect

I just installed postgresql-9.1.4-1-windows-x64 on a Windows 7 64 bit machine. I'm having trouble starting the service and connecting to a database.
After a successful installation I've tried the following based on similar postings.
1) Looked for "Start Server" under Start > All Programs > PostgreSQL 9.1 and could not find it.
2) Tried starting the server from the command line
pg_ctl.exe -D "C:\Program Files\PostgreSQL\9.1\bin\data"
This gave me the error:
could not create lock file 'postmaster.pid': Permission denied
I have administrative rights, and there is no postmaster.pid file in either the bin or data directories.
3) Next I tried starting the service from Administrative Tools by right-clicking on the postgresql-9.1.4-1-windows-x64 service and selecting Start. I received the message:
The postgresql-9.1.4-1-windows-x64 Service on local computer started
and stopped. Some services stop automatically if they are not in use
by other services or programs.
The Event Viewer showed the error as Timed out waiting for server startup
4) I figured the data in the data directory was probably an initial database, but just in case I ran "initdb" and got:
If you want to create a new database system either remove or empty the
directory c:\program files\postgreSql/9.1/data or run initdb with an
argument other than c:\program files\postgreSql/9.1/data
5) And just for fun I started pgAdmin III, right-clicked on "PostgreSQL 9.1 (localhost:5432)", selected Connect, entered the password, and got:
could not connect to server: Connection refused (0x0000274D/10061) Is
the server running on host "localhost" (::1) and accepting TCP/IP
connections on port 5432?
Does anybody have a suggestion?
Thanks.
«"could not create lock file 'postmaster.pid': Permission denied"»
Do not look any further: Postgres cannot start if it can't create this file. Since it was never created, you naturally cannot find it on disk. Your data directory has already been created, so there is no need to run initdb again, and pgAdmin complains that it cannot connect because Postgres is not running.
I am not familiar with Windows, but if you find out where postmaster.pid is supposed to be created, you will probably find out why Postgres cannot create it.
Hope it helps.
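One way to test the permission theory is to try creating a file in the data directory yourself, running as the same account that starts the server. The sketch below does that with a throwaway temporary file; `can_create_file` is a hypothetical helper, and the directory path is the one from the question.

```python
import os
import tempfile


def can_create_file(directory):
    """Return True if the current user can create a file in directory."""
    try:
        fd, path = tempfile.mkstemp(prefix="write_test_", dir=directory)
    except OSError:
        # Permission denied, missing directory, and similar failures.
        return False
    os.close(fd)
    os.remove(path)
    return True


if __name__ == "__main__":
    print(can_create_file(r"C:\Program Files\PostgreSQL\9.1\data"))
```

If this prints False, the account lacks write access to that directory, which would match the "could not create lock file" error.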
