What are these warnings that fleetctl is outputting? - vagrant

Running the command fleetctl load registry@1.service registry-presence@1.service I get the following output:
2015/05/08 10:25:26 WARN fleetctl.go:772: Error retrieving Unit(registry@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry%401.service?alt=json: forwarding request denied
2015/05/08 10:30:31 WARN fleetctl.go:772: Error retrieving Unit(registry-presence@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry-presence%401.service?alt=json: forwarding request denied
2015/05/08 10:36:14 WARN fleetctl.go:772: Error retrieving Unit(registry@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry%401.service?alt=json: ssh: rejected: administratively prohibited (open failed)
2015/05/08 10:42:44 WARN fleetctl.go:772: Error retrieving Unit(registry-presence@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry-presence%401.service?alt=json: ssh: rejected: administratively prohibited (open failed)
2015/05/08 10:54:46 WARN fleetctl.go:772: Error retrieving Unit(registry@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry%401.service?alt=json: ssh: rejected: administratively prohibited (open failed)
2015/05/08 10:57:51 WARN fleetctl.go:772: Error retrieving Unit(registry-presence@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry-presence%401.service?alt=json: ssh: rejected: administratively prohibited (open failed)
2015/05/08 10:58:12 WARN fleetctl.go:772: Error retrieving Unit(registry@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry%401.service?alt=json: ssh: rejected: administratively prohibited (open failed)
2015/05/08 11:02:43 WARN fleetctl.go:772: Error retrieving Unit(registry-presence@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry-presence%401.service?alt=json: ssh: rejected: administratively prohibited (open failed)
This just repeats; I've left the command running for over 30 minutes. When I press CTRL-C and run fleetctl list-unit-files, I see the following:
UNIT                         HASH     DSTATE  STATE     TARGET
registry-presence@1.service  f54aa0d  loaded  inactive  0d8d13be.../172.17.8.101
registry@1.service           d233714  loaded  inactive  0d8d13be.../172.17.8.101
And the output of fleetctl list-units is:
UNIT MACHINE ACTIVE SUB
If I run the load command with -block-attempts=2, it gives the same errors but completes, and the output of fleetctl list-unit-files is:
UNIT                         HASH     DSTATE  STATE   TARGET
registry-presence@1.service  f54aa0d  loaded  loaded  0d8d13be.../172.17.8.101
registry@1.service           d233714  loaded  loaded  0d8d13be.../172.17.8.101
And the output of fleetctl list-units is:
UNIT                         MACHINE                   ACTIVE    SUB
registry-presence@1.service  0d8d13be.../172.17.8.101  inactive  dead
registry@1.service           0d8d13be.../172.17.8.101  inactive  dead
I'm wondering what the WARN logs are trying to tell me. Which registry are they talking about?

That also happens to me when I run fleetctl from my local machine against a remote cluster.
The registry it's talking about is the fleet registry: the units that have been submitted and/or loaded with fleetctl submit or fleetctl load.
What I usually do in those cases is connect to one of the nodes and run my fleetctl start command from there. After that, you can run the commands from your local machine without any more problems.
What I suspect is that, for some reason, the unit file is not loaded across the whole cluster when you run that command.
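
To see what the warnings in the question are actually reporting, the log lines can be grouped by unit and by failure reason. The following is a minimal Python sketch, not part of fleetctl: the function name summarize_warnings and the regular expression are assumptions made for illustration, and the sample is a shortened copy of the output in the question. It takes the unit name from the request URL and URL-decodes it (%40 is an encoded @).

```python
import re
from collections import Counter
from urllib.parse import unquote

# Shortened sample copied from the fleetctl output in the question.
SAMPLE_LOG = """\
2015/05/08 10:25:26 WARN fleetctl.go:772: Error retrieving Unit(registry@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry%401.service?alt=json: forwarding request denied
2015/05/08 10:36:14 WARN fleetctl.go:772: Error retrieving Unit(registry@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry%401.service?alt=json: ssh: rejected: administratively prohibited (open failed)
2015/05/08 10:42:44 WARN fleetctl.go:772: Error retrieving Unit(registry-presence@1.service) from Registry: Get http://domain-sock/fleet/v1/units/registry-presence%401.service?alt=json: ssh: rejected: administratively prohibited (open failed)
"""

# Hypothetical pattern: captures the unit from the request URL and the
# trailing failure reason that follows the query string.
WARN_PATTERN = re.compile(
    r"WARN .*?: Get \S+/units/(?P<unit>[^?\s]+)\?\S*: (?P<reason>.+)$"
)


def summarize_warnings(log_text):
    """Count (unit, reason) pairs found in fleetctl WARN lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARN_PATTERN.search(line)
        if match:
            # registry%401.service decodes to registry@1.service
            unit = unquote(match.group("unit"))
            counts[(unit, match.group("reason"))] += 1
    return counts


if __name__ == "__main__":
    for (unit, reason), n in sorted(summarize_warnings(SAMPLE_LOG).items()):
        print(f"{n}x {unit}: {reason}")
```

Grouping this way makes it easy to see that only two distinct reasons occur, and that they are reported for both units alike.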

Passwordless chef client bootstrapping

I am somewhat familiar with Chef and its bootstrapping techniques. I am trying to bootstrap a new chef-client node without passing a password.
I tried the command below after generating an SSH key, but it still fails:
knife bootstrap MY_NODE_IP -x SERVER_ADMIN_USERNAME -i PATH_TO_KEY_FILE --sudo --node-name THE_NODE_NAME
Running the above command from Chef DK gives the error below:
WARN: [SSH] PTY requested: stderr will be merged into stdout
WARN: [SSH] connection failed, terminating (#<Net::SSH::AuthenticationFailed: Authentication failed for user user@mynode>)
ERROR: Train::Transports::SSHFailed: SSH session could not be established
I also tried a manual installation following the instructions below, but that failed too: https://serverfault.com/questions/761167/how-to-manually-set-up-a-chef-node
I created a client manually, but I was unable to create a node on the Chef server manually. Please advise.
I get the network error below:
Networking Error:
-----------------
Error connecting to https://myserver/organizations/organization/nodes/mynode - Failed to open TCP connection to www.internet:8080 (getaddrinfo: Name or service not known)
Bootstrapping from my Chef DK also throws an error.
Is there a way to bootstrap a Linux chef-client without a password from a Windows Chef DK?
My Chef environment is:
1. Chef Infra Client: 15.14.0
2. Chef Workstation: 0.8.7.1
3. Chef Server: 12.18.14

drbdadm not creating block device

We are in the process of building an active-passive cluster with DRBD on CentOS 7.4, running kernel-3.10.0-862.el7. While creating the cluster, drbdadm is unable to create a volume and gives the error below. Can you please help me out?
open(/dev/vdb) failed: Invalid argument
could not open with O_DIRECT, retrying without
'/dev/vdb' is not a block device!
open(/dev/vdb) failed: Invalid argument
could not open with O_DIRECT, retrying without
'/dev/vdb' is not a block device!
Command 'drbdmeta 0 v08 /dev/vdb internal create-md' terminated with exit code 20
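
The message '/dev/vdb' is not a block device! can be cross-checked independently of drbdmeta by asking the operating system what kind of object the path is. The following is a minimal Python sketch under the assumption that Python is available on the node; file_kind is a hypothetical helper, and the demonstration uses a temporary file and directory so that it runs anywhere.

```python
import os
import stat
import tempfile


def file_kind(path):
    """Describe what kind of filesystem object the path refers to."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


if __name__ == "__main__":
    # On the affected node one would call file_kind("/dev/vdb") instead.
    with tempfile.NamedTemporaryFile() as tmp:
        print(file_kind(tmp.name))            # regular file
    print(file_kind(tempfile.gettempdir()))   # directory
```

If the path turns out to be a regular file or something else, the message from drbdmeta is accurate and the device node itself is what needs attention.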

greenplum initialization failed

When I tried to initialize Greenplum, I got the following error:
20180408:23:21:02:017614 gpstop:datanode3:root-[INFO]:-Starting gpstop with args:
20180408:23:21:02:017614 gpstop:datanode3:root-[INFO]:-Gathering information and validating the environment...
20180408:23:21:02:017614 gpstop:datanode3:root-[ERROR]:-gpstop error: postmaster.pid file does not exist. is Greenplum instance already stopped?
Also, when I ran the gpstate command, I got the following error:
20180408:23:21:48:017711 gpstate:datanode3:root-[INFO]:-Starting gpstate with args:
20180408:23:21:48:017711 gpstate:datanode3:root-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.7.0 build f7c6eb5-oss'
20180408:23:21:48:017711 gpstate:datanode3:root-[CRITICAL]:-gpstate failed. (Reason='could not connect to server: Connection refused
I also updated the configuration and added a permission in postgresql.conf, but the issue remains.
You have pasted the output of gpstop:
gpstop error: postmaster.pid file does not exist. is Greenplum instance already stopped?
This means that the database is not running.
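
The condition gpstop complains about can be checked by hand. The following is a minimal Python sketch, assuming the data directory of the instance is known; has_postmaster_pid is a hypothetical helper, not a Greenplum tool. The presence of the file only suggests that an instance was started from that directory; it does not prove the process is still alive.

```python
import tempfile
from pathlib import Path


def has_postmaster_pid(data_dir):
    """Return True if data_dir contains a postmaster.pid file."""
    return (Path(data_dir) / "postmaster.pid").is_file()


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real data directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(has_postmaster_pid(tmp))   # False
        (Path(tmp) / "postmaster.pid").write_text("12345\n")
        print(has_postmaster_pid(tmp))   # True
```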

Kylo service startup fails

I am trying to install Kylo on my existing HDP 2.6.2 Hadoop cluster. I followed the Kylo documentation, but when I try to start Kylo, it reports that it is starting, then fails a few seconds later and the status turns to stopped.
[root@<KYLO_EDGE_NODE> ~]# service kylo-services start
Starting kylo-services ...
using NiFi profile: nifi-v1.2
[root@<KYLO_EDGE_NODE> ~]# service kylo-services status
Running. Here are the related processes:
29294 java
[root@<KYLO_EDGE_NODE> ~]# service kylo-services status
Running. Here are the related processes:
29294 java
[root@<KYLO_EDGE_NODE> ~]# service kylo-services status
Stopped.
Due to restrictions, I can share only the important part of the Kylo service logs:
2017-10-23 17:34:58 INFO main:KyloVersionUtil:100 - finding version information from /opt/kylo/kylo-services/conf/version.txt
2017-10-23 17:34:58 INFO main:KyloVersionUtil:108 - loaded Kylo version file: 0.8.3.3 build Time: 2017-10-16 16:17
2017-10-23 17:35:00 ERROR main:ConnectionPool:182 - Unable to create initial connections of pool.
java.sql.SQLException: Access denied for user 'kylo'@'<KYLO_EDGE_NODE>' (using password: YES)
2017-10-23 17:35:01 ERROR main:ConnectionPool:182 - Unable to create initial connections of pool.
java.sql.SQLException: Access denied for user 'kylo'@'<KYLO_EDGE_NODE>' (using password: YES)
2017-10-23 17:35:16 ERROR localhost-startStop-1:TomcatStarter:63 - Error starting Tomcat context: org.springframework.beans.factory.UnsatisfiedDependencyException
On the MySQL instance, for the kylo database, I ran:
GRANT ALL PRIVILEGES ON *.* TO 'kylo'@'<KYLO_EDGE_NODE>' IDENTIFIED BY '%password%' WITH GRANT OPTION;
but it didn't work.
I can reach the MySQL instance with: mysql -u kylo -p
It appears that the MySQL user you've configured in Kylo does not exist in the MySQL server. Please ensure that you've created a user in MySQL with the same username as Kylo's spring.datasource.username, same password as spring.datasource.password, and the same host as <KYLO_EDGE_NODE>.
You can verify the MySQL user is created properly by running MySQL from the command-line on the Kylo server:
mysql -u kylo -p -h <MYSQL_INSTANCE_NODE> kylo
Additionally, you can download the MySQL JDBC driver and install the jar file to /opt/kylo/kylo-services/lib/ then delete the /opt/kylo/kylo-services/lib/mariadb-java-client-1.5.7.jar file from Kylo.
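
To compare the account Kylo uses with the account created in MySQL, it helps to list the datasource settings side by side. The following is a minimal Python sketch that parses simple key=value properties text; the sample values are hypothetical, the spring.datasource.url key is the usual Spring Boot companion to the two properties named above and is included as an assumption, and the parser deliberately ignores less common properties syntax such as line continuations.

```python
# Hypothetical sample; the real values live in Kylo's own configuration.
SAMPLE_PROPERTIES = """\
# Hypothetical values for illustration only.
spring.datasource.url=jdbc:mysql://mysql-host.example:3306/kylo
spring.datasource.username=kylo
spring.datasource.password=secret
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def datasource_settings(text):
    """Return only the spring.datasource.* entries."""
    return {
        key: value
        for key, value in parse_properties(text).items()
        if key.startswith("spring.datasource.")
    }


if __name__ == "__main__":
    for key, value in datasource_settings(SAMPLE_PROPERTIES).items():
        shown = "***" if key.endswith("password") else value  # hide the secret
        print(f"{key} = {shown}")
```

The username printed here, together with the host Kylo connects from, is what has to match the 'user'@'host' pair of the MySQL account.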
I found a more fundamental solution: I removed Kylo and ActiveMQ, then reinstalled them from scratch with rpm. I don't know what I did differently, but now it works: I can log in, and kylo-services starts and doesn't stop.
Try the steps below; they worked for me.
The syntax above is incorrect.
Drop the created user, then run the queries below:
CREATE USER 'kylo'@'<KYLO_EDGE_NODE>' IDENTIFIED BY '%password%';
GRANT ALL PRIVILEGES ON *.* TO 'kylo'@'<KYLO_EDGE_NODE>' WITH GRANT OPTION;
Then restart the MySQL server and start kylo-services.

MS MPI Permission errors

I have two machines both with MS MPI 7.1 installed, one called SERVER and one called COMPUTE.
The machines are set up on LAN in a simple windows workgroup (No DA), and both have an account with the same name and password.
Both are running the MSMPILaunchSvc service.
Both machines can execute MPI jobs locally, verified by testing with the hostname command
SERVER> mpiexec -hosts 1 SERVER 1 hostname
SERVER
or
COMPUTE> mpiexec -hosts 1 COMPUTE 1 hostname
COMPUTE
in a terminal on the machines themselves.
I have disabled the firewall on both machines to make things easier.
My problem is that I cannot get MPI to run jobs from SERVER on a remote host:
1: SERVER with MSMPILaunchSvc -> COMPUTE with MSMPILaunchSvc
SERVER> mpiexec -hosts 1 COMPUTE 1 hostname -pwd
ERROR: Failed RpcCliCreateContext error 1722
Aborting: mpiexec on SERVER is unable to connect to the smpd service on COMPUTE:8677
Other MPI error, error stack:
connect failed - The RPC server is unavailable. (errno 1722)
What's even more frustrating is that I am only sometimes prompted to enter a password. It suggests SERVER\Maarten as the user for COMPUTE, which is the account I am already logged in as on SERVER and which shouldn't exist on COMPUTE (shouldn't it be COMPUTE\Maarten?). Nonetheless, it also fails:
SERVER>mpiexec -hosts 1 COMPUTE 1 hostname.exe -pwd
Enter Password for SERVER\Maarten:
Save Credentials[y|n]? n
ERROR: Failed to connect to SMPD Manager Instance error 1726
Aborting: mpiexec on SERVER is unable to connect to the
smpd manager on COMPUTE:50915 error 1726
2: COMPUTE with MSMPILaunchSvc -> SERVER with MSMPILaunchSvc
COMPUTE> mpiexec -hosts 1 SERVER 1 hostname -pwd
ERROR: Failed RpcCliCreateContext error 5
Aborting: mpiexec on COMPUTE is unable to connect to the smpd service on SERVER:8677
Other MPI error, error stack:
connect failed - Access is denied. (errno 5)
3: COMPUTE with MSMPILaunchSvc -> SERVER with smpd daemon
Aborting: mpiexec on COMPUTE is unable to connect to the smpd service on SERVER:8677
Other MPI error, error stack:
connect failed - Access is denied. (errno 5)
4: SERVER with MSMPILaunchSvc -> COMPUTE with smpd daemon
ERROR: Failed to connect to SMPD Manager Instance error 1726
Aborting: mpiexec on SERVER is unable to connect to the smpd manager on
COMPUTE:51022 error 1726
Update:
Trying with smpd daemon on both nodes I get this error:
[-1:9796] Authentication completed. Successfully obtained Context for Client.
[-1:9796] version check complete, using PMP version 3.
[-1:9796] create manager process (using smpd daemon credentials)
[-1:9796] smpd reading the port string from the manager
[-1:9848] Launching smpd manager instance.
[-1:9848] created set for manager listener, 376
[-1:9848] smpd manager listening on port 51149
[-1:9796] closing the pipe to the manager
[-1:9848] Authentication completed. Successfully obtained Context for Client.
[-1:9848] Authorization completed.
[-1:9848] version check complete, using PMP version 3.
[-1:9848] Received session header from parent id=1, parent=0, level=0
[01:9848] Connecting back to parent using host SERVER and endpoint 17979
[01:9848] Previous attempt failed with error 5, trying to authenticate without Kerberos
[01:9848] Failed to connect back to parent error 5.
[01:9848] ERROR: Failed to connect back to parent 'ncacn_ip_tcp:SERVER:17979' error 5
[01:9848] smpd manager successfully stopped listening.
[01:9848] SMPD exiting with error code 4294967293.
and on the host:
[-1:12264] Launching SMPD service.
[-1:12264] smpd listening on port 8677
[-1:12264] Authentication completed. Successfully obtained Context for Client.
[-1:12264] version check complete, using PMP version 3.
[-1:12264] create manager process (using smpd daemon credentials)
[-1:12264] smpd reading the port string from the manager
[-1:16668] Launching smpd manager instance.
[-1:16668] created set for manager listener, 364
[-1:16668] smpd manager listening on port 18033
[-1:12264] closing the pipe to the manager
[-1:16668] Authentication completed. Successfully obtained Context for Client.
[-1:16668] Authorization completed.
[-1:16668] version check complete, using PMP version 3.
[-1:16668] Received session header from parent id=1, parent=0, level=0
[01:16668] Connecting back to parent using host SERVER and endpoint 18031
[01:16668] Authentication completed. Successfully obtained Context for Client.
[01:16668] Authorization completed.
[01:16668] handling command SMPD_CONNECT src=0
[01:16668] now connecting to COMPUTE
[01:16668] 1 -> 2 : returning SMPD_CONTEXT_LEFT_CHILD
[01:16668] using spn msmpi/COMPUTE to contact server
[01:16668] SERVER posting a re-connect to COMPUTE:51161 in left child context.
[01:16668] ERROR: Failed to connect to SMPD Manager Instance error 1726
[01:16668] sending abort command to parent context.
[01:16668] posting command SMPD_ABORT to parent, src=1, dest=0.
[01:16668] ERROR: smpd running on SERVER is unable to connect to smpd service on COMPUTE:8677
[01:16668] Handling cmd=SMPD_ABORT result
[01:16668] cmd=SMPD_ABORT result will be handled locally
[01:16668] parent terminated unexpectedly - initiating cleaning up.
[01:16668] no child processes to kill - exiting with error code -1
After some trial and error, I found that these and other unspecific errors come up when running MS MPI across different configurations (in my case, a mix of HPC Cluster 2008 and HPC Cluster 2012 with MS MPI).
The solution was to downgrade all nodes to Windows Server 2008 R2 with HPC Cluster 2008. Because I don't use AD, I had to fall back to using the SMPD daemon and add firewall rules for it (skipping the cluster management tools altogether).
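
The numeric codes in the output above are easier to read once mapped to their standard Windows meanings. The following is a minimal Python sketch: the table covers only the three codes that appear in this question, and describe and to_signed32 are hypothetical helper names. The second helper reinterprets the large exit code printed by smpd as a signed 32-bit value.

```python
# Win32 error codes that appear in the output above, with their standard
# symbolic names and system messages.
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED: Access is denied.",
    1722: "RPC_S_SERVER_UNAVAILABLE: The RPC server is unavailable.",
    1726: "RPC_S_CALL_FAILED: The remote procedure call failed.",
}


def describe(code):
    """Look up a code in the small table above."""
    return WIN32_ERRORS.get(code, "not in this table")


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


if __name__ == "__main__":
    for code in (5, 1722, 1726):
        print(code, describe(code))
    # "SMPD exiting with error code 4294967293" is -3 when read as signed.
    print(to_signed32(4294967293))   # -3
```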
