MS MPI Permission errors - windows

I have two machines both with MS MPI 7.1 installed, one called SERVER and one called COMPUTE.
The machines are on a LAN in a simple Windows workgroup (no AD), and both have an account with the same name and password.
Both are running the MSMPILaunchSvc service.
Both machines can execute MPI jobs locally, verified by testing with the hostname command
SERVER> mpiexec -hosts 1 SERVER 1 hostname
SERVER
or
COMPUTE> mpiexec -hosts 1 COMPUTE 1 hostname
COMPUTE
in a terminal on the machines themselves.
I have disabled the firewall on both machines to make things easier.
My problem is that I cannot get MPI to run jobs from SERVER on a remote host:
1: SERVER with MSMPILaunchSvc -> COMPUTE with MSMPILaunchSvc
SERVER> mpiexec -hosts 1 COMPUTE 1 hostname -pwd
ERROR: Failed RpcCliCreateContext error 1722
Aborting: mpiexec on SERVER is unable to connect to the smpd service on COMPUTE:8677
Other MPI error, error stack:
connect failed - The RPC server is unavailable. (errno 1722)
What's even more frustrating is that I am only sometimes prompted for a password. It suggests SERVER\Maarten as the user for COMPUTE, which is the account I am already logged in as on SERVER and which shouldn't exist on COMPUTE (shouldn't it be COMPUTE\Maarten?). Nonetheless, it also fails:
SERVER>mpiexec -hosts 1 COMPUTE 1 hostname.exe -pwd
Enter Password for SERVER\Maarten:
Save Credentials[y|n]? n
ERROR: Failed to connect to SMPD Manager Instance error 1726
Aborting: mpiexec on SERVER is unable to connect to the
smpd manager on COMPUTE:50915 error 1726
2: COMPUTE with MSMPILaunchSvc -> SERVER with MSMPILaunchSvc
COMPUTE> mpiexec -hosts 1 SERVER 1 hostname -pwd
ERROR: Failed RpcCliCreateContext error 5
Aborting: mpiexec on COMPUTE is unable to connect to the smpd service on SERVER:8677
Other MPI error, error stack:
connect failed - Access is denied. (errno 5)
3: COMPUTE with MSMPILaunchSvc -> SERVER with smpd daemon
Aborting: mpiexec on COMPUTE is unable to connect to the smpd service on SERVER:8677
Other MPI error, error stack:
connect failed - Access is denied. (errno 5)
4: SERVER with MSMPILaunchSvc -> COMPUTE with smpd daemon
ERROR: Failed to connect to SMPD Manager Instance error 1726
Aborting: mpiexec on SERVER is unable to connect to the smpd manager on
COMPUTE:51022 error 1726
Update:
Trying with smpd daemon on both nodes I get this error:
[-1:9796] Authentication completed. Successfully obtained Context for Client.
[-1:9796] version check complete, using PMP version 3.
[-1:9796] create manager process (using smpd daemon credentials)
[-1:9796] smpd reading the port string from the manager
[-1:9848] Launching smpd manager instance.
[-1:9848] created set for manager listener, 376
[-1:9848] smpd manager listening on port 51149
[-1:9796] closing the pipe to the manager
[-1:9848] Authentication completed. Successfully obtained Context for Client.
[-1:9848] Authorization completed.
[-1:9848] version check complete, using PMP version 3.
[-1:9848] Received session header from parent id=1, parent=0, level=0
[01:9848] Connecting back to parent using host SERVER and endpoint 17979
[01:9848] Previous attempt failed with error 5, trying to authenticate without Kerberos
[01:9848] Failed to connect back to parent error 5.
[01:9848] ERROR: Failed to connect back to parent 'ncacn_ip_tcp:SERVER:17979' error 5
[01:9848] smpd manager successfully stopped listening.
[01:9848] SMPD exiting with error code 4294967293.
and on the host:
[-1:12264] Launching SMPD service.
[-1:12264] smpd listening on port 8677
[-1:12264] Authentication completed. Successfully obtained Context for Client.
[-1:12264] version check complete, using PMP version 3.
[-1:12264] create manager process (using smpd daemon credentials)
[-1:12264] smpd reading the port string from the manager
[-1:16668] Launching smpd manager instance.
[-1:16668] created set for manager listener, 364
[-1:16668] smpd manager listening on port 18033
[-1:12264] closing the pipe to the manager
[-1:16668] Authentication completed. Successfully obtained Context for Client.
[-1:16668] Authorization completed.
[-1:16668] version check complete, using PMP version 3.
[-1:16668] Received session header from parent id=1, parent=0, level=0
[01:16668] Connecting back to parent using host SERVER and endpoint 18031
[01:16668] Authentication completed. Successfully obtained Context for Client.
[01:16668] Authorization completed.
[01:16668] handling command SMPD_CONNECT src=0
[01:16668] now connecting to COMPUTE
[01:16668] 1 -> 2 : returning SMPD_CONTEXT_LEFT_CHILD
[01:16668] using spn msmpi/COMPUTE to contact server
[01:16668] SERVER posting a re-connect to COMPUTE:51161 in left child context.
[01:16668] ERROR: Failed to connect to SMPD Manager Instance error 1726
[01:16668] sending abort command to parent context.
[01:16668] posting command SMPD_ABORT to parent, src=1, dest=0.
[01:16668] ERROR: smpd running on SERVER is unable to connect to smpd service on COMPUTE:8677
[01:16668] Handling cmd=SMPD_ABORT result
[01:16668] cmd=SMPD_ABORT result will be handled locally
[01:16668] parent terminated unexpectedly - initiating cleaning up.
[01:16668] no child processes to kill - exiting with error code -1

After some trial and error, I found that these and other nonspecific errors come up when running MS MPI across nodes with different configurations (in my case a mix of HPC Cluster 2008 and HPC Cluster 2012 with MSMPI).
The solution was to downgrade all nodes to Windows Server 2008 R2 with HPC Cluster 2008. Because I don't use AD, I had to fall back to the SMPD daemon and add firewall rules for it (skipping the cluster management tools altogether).
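Since the fix involved adding firewall rules for the SMPD daemon, a quick TCP probe can confirm whether the daemon's port is reachable before launching a job. This is only a minimal Python sketch and is not part of MS MPI: the helper name `port_is_open` is hypothetical, and 8677 is simply the smpd port that appears in the logs above. The smpd manager instances also listen on dynamic ports (51149 in the log, for example), which this check does not cover.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `port_is_open("COMPUTE", 8677)` from SERVER (and the reverse from COMPUTE) would show whether a firewall is still blocking the daemon. A successful probe only proves TCP reachability; it says nothing about the authentication errors (error 5) seen above.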

Related

Pre-login handshake error while connecting to Azure SQL Edge from ADS

I'm setting up Azure SQL Database on my local machine (Windows 11) using Azure Data Studio.
I followed the article below to create an Azure SQL Edge instance:
https://learn.microsoft.com/en-us/azure/azure-sql/database/local-dev-experience-quickstart?view=azuresql
After publishing (i.e., after step 11 in the article above), I get the following error logs:
Waiting for 2 seconds before another attempt for operation 'Validating the docker container'
Running operation 'Validating the docker container' Attempt 0 of 10
> docker ps -q -a --filter label=source=sqldbproject-choicemls -q
stdout: 142c44a8b420
stdout:
>>> docker ps -q -a --filter label=source=sqldbproject-choicemls -q … exited with code: 0
Operation 'Validating the docker container' completed successfully. Result: 142c44a8b420
Docker created id: '142c44a8b420
'
Waiting for 10 seconds before another attempt for operation 'Connecting to SQL Server'
Running operation 'Connecting to SQL Server' Attempt 0 of 3
Operation 'Connecting to SQL Server' failed. Re-trying... Current Result: undefined. Error: 'Connection failed error: 'A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)''
Waiting for 10 seconds before another attempt for operation 'Connecting to SQL Server'
Running operation 'Connecting to SQL Server' Attempt 1 of 3
Operation 'Connecting to SQL Server' failed. Re-trying... Current Result: undefined. Error: 'Connection failed error: 'A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)''
Waiting for 10 seconds before another attempt for operation 'Connecting to SQL Server'
Running operation 'Connecting to SQL Server' Attempt 2 of 3
Operation 'Connecting to SQL Server' failed. Re-trying... Current Result: undefined. Error: 'Connection failed error: 'A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)''
Please give suggestions to solve this issue.
Thanks,
Saurabh

MC can't Add a Cloud Storage Device

I am trying to create a MinIO Server and Client so I can run some tests, and I get the error below when I try to run mc:
C:\mc.exe alias set myminio http://169.254.44.236:9000 admin password
mc.exe: <ERROR> Unable to initialize new alias from the provided credentials. Get
"http://169.254.44.236:9000/probe-bucket-sign-6oubmdhiezos/?location=": dial tcp
169.254.44.236:9000: connectex: A socket operation was attempted on an unreachable network.
I just copied the command given to me when I started the MinIO Server. I am using a Windows 10 PC remotely.
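One detail worth checking is the address itself: 169.254.0.0/16 is the IPv4 link-local range, which a machine typically assigns to itself when an adapter gets no DHCP lease, and such addresses are not routable beyond the local link. The following is a minimal Python sketch using the standard `ipaddress` module to flag this; the helper name `is_link_local` is hypothetical. Whether this is the actual cause here is an assumption, since the question does not describe the network setup.

```python
import ipaddress


def is_link_local(address: str) -> bool:
    """Return True if address falls in a link-local range (169.254.0.0/16 for IPv4)."""
    return ipaddress.ip_address(address).is_link_local


print(is_link_local("169.254.44.236"))  # address from the mc command above -> True
print(is_link_local("192.168.1.20"))    # hypothetical private LAN address -> False
```

If the server's advertised address turns out to be link-local, it may be worth trying one of the other addresses MinIO lists at startup, or the address of the adapter that the remote session actually uses.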

How to connect to mongo shell from windows cmd?

This is my first time using MongoDB. It connects normally using the IDE from the MongoDB University M001 basics course, but when I try to connect from cmd it won't connect and gives me this error:
*** You have failed to connect to a MongoDB Atlas cluster. Please ensure that your IP whitelist allows connections from your network.
Error: Authentication failed. :
connect@src/mongo/shell/mongo.js:374:17
@(connect):2:6
exception: connect failed
exiting with code 1
I added the path to my environment variables and checked the network access settings in Atlas, which allow connections from anywhere.
Can anyone help me with this please??

ORA-12505, TNS:listener does not currently know of SID given in connect descriptor while connecting as JDBC remote client

I have gone through all the suggestions given by many of you for this query. But unfortunately none of them has fixed my issue.
Problem: I installed Oracle 12c on our remote machine (Host1); the default ORCL database was created and is running. I then used Database Configuration Assistant to create a new database, "YILIDB". I can connect to both databases from Host1 through SQL Developer, but when I try to access them from another machine (Host2), I can't connect. I first tried connecting with JDBC code and got the exception below.
Code:-
Class.forName("oracle.jdbc.driver.OracleDriver");
Connection conn = DriverManager.getConnection("jdbc:oracle:thin:172.26.8.188:1521:YILIDB", "WM6",
"WM6");
Error:-
java.sql.SQLException: Listener refused the connection with the following error:
ORA-12505, TNS:listener does not currently know of SID given in connect descriptor
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:825)
at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:755)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:38)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:599)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at ConnectionTest.main(ConnectionTest.java:39)
Caused by: oracle.net.ns.NetException: Listener refused the connection with the following error:
ORA-12505, TNS:listener does not currently know of SID given in connect descriptor
at oracle.net.ns.NSProtocolStream.negotiateConnection(NSProtocolStream.java:324)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:287)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1963)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:564)
... 6 more
When I try to connect from SQL Developer on Host2 to the database on Host1, I get the error below, so I can't connect to the remote database with either JDBC code or SQL Developer.
Error when connecting from SQL Developer:
Status: Test-failed: IO error the network adapter could not establish the connection
On the remote database machine I have verified the services: both ORCL and YILIDB are running, and I see only one listener service up and running.
Can someone please suggest a solution?
The output of lsnrctl status is below:
C:\Users\Administrator>lsnrctl status
LSNRCTL for 64-bit Windows: Version 12.1.0.1.0 - Production on 25-OCT-2016 14:14:41
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for 64-bit Windows: Version 12.1.0.1.0 - Production
Start Date 25-OCT-2016 14:04:56
Uptime 0 days 0 hr. 9 min. 44 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Default Service YILIDB
Listener Parameter File C:\app\Administrator\product\12.1.0\dbhome_1\network\admin\listener.ora
Listener Log File C:\app\Administrator\diag\tnslsnr\SMYB2SW12-Yili\listener\alert\log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\EXTPROC1521ipc)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.28.8.199)(PORT=1521)))
Services Summary...
Service "CLRExtProc" has 1 instance(s).
Instance "CLRExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
The problem was fixed by adding an @ before the host in the JDBC URL.
Previously:
jdbc:oracle:thin:123.45.6.78:1521:YILIDB
Now:
jdbc:oracle:thin:@123.45.6.78:1521:YILIDB
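Oracle's thin driver expects an `@` before the host in a URL of the form `jdbc:oracle:thin:@host:port:SID`, and the omission is easy to miss by eye. The sketch below is a hypothetical Python helper (not part of any Oracle tooling) that checks whether a string matches that form. The regular expression is a deliberate simplification: it covers only the SID-style URL shown here, not service-name or TNS-descriptor URLs, and not URLs that embed credentials.

```python
import re

# Simplified pattern for the SID form: jdbc:oracle:thin:@host:port:SID
_THIN_SID_URL = re.compile(
    r"^jdbc:oracle:thin:@(?P<host>[^:/@]+):(?P<port>\d+):(?P<sid>\w+)$"
)


def parse_thin_sid_url(url: str):
    """Return (host, port, sid) for a SID-style thin URL, or None if it does not match."""
    match = _THIN_SID_URL.match(url)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("sid")


# URL without the '@' is rejected.
print(parse_thin_sid_url("jdbc:oracle:thin:123.45.6.78:1521:YILIDB"))
# URL with the '@' is split into its parts.
print(parse_thin_sid_url("jdbc:oracle:thin:@123.45.6.78:1521:YILIDB"))
```

The first call prints `None` and the second prints `('123.45.6.78', 1521, 'YILIDB')`. A check like this only validates the shape of the URL; it cannot tell whether the listener actually knows the SID.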

Can't connect remotely to WAS 8.5 full profile installed on Ubuntu 14.04 from RAD 9.5 installed on OSX

WAS 8.5 full profile isn't officially supported on OSX by IBM, so the only option for developing on OSX is to install the stub runtime and connect to a WAS instance installed remotely. I'm trying to set up this scenario, but something has gone wrong and I can't connect to my WAS.
Here is my installation:
Installed on OSX El Capitan:
RAD 9.5 with WAS 8.5 stub runtime (WebSphere Application Server traditional V8.5 stub)
Installed Virtual Box with Ubuntu Desktop edition 14.04
Ubuntu's hostname is anatoly-ubuntu-vm, and it is accessible from the host (ping anatoly-ubuntu-vm works fine)
Installed on Ubuntu:
WAS 8.5 full profile at /opt/IBM/WebSphere
Created AppSrv01 profile at /opt/IBM/WebSphere/AppServer/profiles
WAS was installed as the root user; IBM Installation Manager required root permission when it was started
My connection settings in RAD:
server name: WebSphere Application Server traditional V8.5 stub at anatoly-ubuntu-vm
hostname: anatoly-ubuntu-vm
Runtime environment: WebSphere Application Server traditional V8.5 stub
Connection type:
I've tried RMI 2809 and SOAP 8880; neither option worked.
"Enable the server to start remotely" is checked, and under "Select the operating system running the remote server" I selected Linux and entered my username and password. I've tried both my regular Ubuntu account and root; neither worked.
Server profile path defined as /opt/IBM/WebSphere/AppServer/profiles/AppSrv01
When I try to start the server I get the following exception:
The following problems has occurred when starting the server.
CTGRI0001E The application could not establish a connection to
anatoly-ubuntu-vm .
What am I doing wrong?
UPDATE 1:
After writing this post I figured out that an SSH server wasn't installed or configured at all, as described in "Requirements for using Remote Execution and Access (RXA)". I've now installed one, and it seems to connect, but it gets stuck at 23% at the "Preparing launch delegate" stage and after a while throws the following error:
The following problems has occurred when starting the server. The
server may not be started in the correct mode. You can restart the
server to desired mode if it is started. CTGRI0075E A file transfer to
or from the system named [anatoly-ubuntu-vm] timed out before the
transfer could complete. The current timeout interval is set to 240000
milliseconds, and might need to be increased.
UPDATE 2:
As far as I can see, despite the error message the server is started, and I can even connect to the web console at anatoly-ubuntu-vm:9060/console/ibm, but it looks like neither the SOAP connection nor the RMI connection works. When I run Test Connection from the Settings overview page in RAD, I get the following error:
The connection failed after trying to use all the available connection
types.
Verify the port values are correct and the server has been started. If
the security of the server is enabled, verify the "Security is enabled
on this server" check box is selected, and the user ID and password
are provided. You can specify this in the server editor or when
creating a new server.
For a Technote with details on the most common server connection
problem, see http://www.ibm.com/support/docview.wss?uid=swg21266028.
The last connection attempt failed with the following exception:
ADMC0016E: The system cannot create a SOAP connector to connect to
host anatoly-ubuntu-vm at port 8880.
UPDATE 3
As @DanielBarbarian guessed, I tried running ./wsadmin.sh -port 8880; it worked and returned
Connected to process "server1" on node anatoly-ubuntu-vmNode01 using SOAP connector; The type of process is: UnManagedProcess
These are my port settings:
UPDATE 4
When I run telnet anatoly-ubuntu-vm 8880 from the OSX host, I get the following response (IP address changed for privacy):
anatoly-mac:~ anatoly$ telnet anatoly-ubuntu-vm 8880
Trying 192.168.10.10...
Connected to anatoly-ubuntu-vm
Escape character is '^]'.
HTTP/1.1 408 Request Timeout
Content-Type: text/html
Content-Length: 117
Connection: close
<HTML><TITLE>408 - Request Timeout</TITLE><BODY>
<h1>408 Connection timed out while reading request</h1></BODY>
</HTML>Connection closed by foreign host
