HPC Pack 2016: "Identity check failed for outgoing message" Error - windows

Hello Stack Overflow community, I am encountering the following errors when I try to add a node to my local computer cluster using Microsoft HPC Pack 2016:
Could not contact node 'NODE-A08' to perform change. Identity check
failed for outgoing message. The expected DNS identity of the remote
endpoint was 'HEAD-NODE01' but the remote endpoint provided DNS claim
'NODE-A08'. If this is a legitimate remote endpoint, you can fix the
problem by explicitly specifying DNS identity 'NODE-A08' as the
Identity property of EndpointAddress when creating channel proxy.
Could not contact node 'NODE-A08' to perform change. The management
service was unable to connect to the node using any of the IP
addresses resolved for the node.
Ultimately I would like to write and test my own MPI programs while using HPC Pack as my cluster manager, but I cannot seem to get past this preliminary step of setting up my cluster.
Through my research into the issue, I have found "Identity check failed for outgoing message..." to be a well-documented error related to Windows Communication Foundation (WCF). My understanding is that it occurs when the common name (CN) of the endpoint computer's certificate does not match its DNS identity.
The solutions that I found were lines of code for people writing their own programs; however, those solutions do not apply to HPC Pack because I cannot access its source code directly.
Some additional information specific to my situation:
the certificates used by both the head node and the node were issued
individually by a trusted domain
all computers are connected to one enterprise network
the head node's PC name is 'HEAD-NODE01'
the node's PC name is 'NODE-A08'
these errors occur during the provisioning stage of adding a node
the errors are displayed in the provisioning log within HPC Pack
2016's user interface
I was successful in pinging each computer from the other
both computers display the proper DNS IP address when I use command
prompt
the head node is running Windows Server 2012 R2
the node is preconfigured to be a workstation node and is running
Windows 10 Enterprise
Any help would be greatly appreciated. I have looked for a few days and in a lot of places for an answer, but I have not been very successful. Thank you very much in advance!

Subject names of both SSL certificates must be identical
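The mismatch reported in the error can be illustrated with a small sketch. This is only an assumption-laden model of the comparison (expected DNS identity versus the DNS claim in the remote certificate), not WCF's or HPC Pack's actual implementation; the function name `dns_identity_matches` is hypothetical, and the host names are the ones from the error message in the question.

```python
def dns_identity_matches(expected_dns: str, certificate_dns_claim: str) -> bool:
    """Return True when the expected DNS identity equals the certificate's DNS claim.

    DNS names are case-insensitive, so both sides are normalized first.
    This is an illustrative model only, not the real WCF check.
    """
    return expected_dns.strip().lower() == certificate_dns_claim.strip().lower()


# Values taken from the error message in the question.
expected = "HEAD-NODE01"  # identity the caller expected
claim = "NODE-A08"        # DNS claim presented by the remote certificate

if not dns_identity_matches(expected, claim):
    print(f"Identity check would fail: expected {expected!r}, got {claim!r}")
```

With the names from the question the two values differ, which is the condition the error text describes.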

Related

windows AD server has been corrupted

I have used Windows 2008 AD since 2013, and I have a secondary domain as well. Unfortunately, due to a hardware failure, the primary domain was corrupted. I configured a new AD on Windows 2012 R2. My concern now is that when I restart my primary domain, it gives many errors, such as: "Naming information cannot be located because: The specified domain either does not exist or could not be contacted. Contact your system administrator to verify that your domain is properly configured and is currently online."
The error resolves automatically when I switch on the secondary domain.
How can I resolve this error?
Which server holds the FSMO roles? Are both servers Global Catalog and DNS servers?
What is the status of Sysvol? Open Registry Editor and check the SysvolReady value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters
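As a rough sketch, the same registry value can be read with Python's standard `winreg` module on the domain controller. The helper names `sysvol_is_ready` and `read_sysvol_ready` are hypothetical; the sketch assumes the usual convention that a SysvolReady value of 1 means SYSVOL is ready to be shared, and it only reads the value, it changes nothing.

```python
import sys

# Registry path named in the answer above (under HKEY_LOCAL_MACHINE).
NETLOGON_PARAMETERS = r"SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"


def sysvol_is_ready(value: int) -> bool:
    """Interpret the SysvolReady DWORD: 1 is assumed to mean SYSVOL is shared."""
    return value == 1


def read_sysvol_ready() -> int:
    """Read SysvolReady from the local registry (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("The registry can only be read on Windows")
    import winreg  # imported here because the module exists only on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NETLOGON_PARAMETERS) as key:
        value, _value_type = winreg.QueryValueEx(key, "SysvolReady")
    return int(value)


if __name__ == "__main__" and sys.platform == "win32":
    print("SYSVOL ready:", sysvol_is_ready(read_sysvol_ready()))
```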
Hi this is a known issue.
These issues may occur if TCP/IP filtering is configured to permit only port 80 for TCP/IP traffic.
And you can consult the link for a possible solution

nltest dclist does not show same result

We have two Windows 2012 servers that reside on the same subnet in the domain "FACTORY".
We have an intermittent authentication issue (with a third-party app) for users from the domain "OFFICE".
While troubleshooting with the nltest command, I saw something that I don't understand.
Here is the output from the first Windows 2012 server:
nltest /dclist:OFFICE
Get list of DCs in domain 'OFFICE' from '\\DC01'.
You don't have access to DsBind to OFFICE (\\DC01) (Trying NetServerEnum).
I_NetGetDCList failed: Status = 6118 0x17e6 ERROR_NO_BROWSER_SERVERS_FOUND
Here is the output from the second Windows 2012 server:
nltest /dclist:OFFICE
Get list of DCs in domain 'OFFICE' from '\\DC02'.
You don't have access to DsBind to OFFICE (\\DC02) (Trying NetServerEnum).
List of DCs in Domain OFFICE
\\DC03 (PDC)
The command completed successfully
Why could the second Windows 2012 server get the list of DCs in domain OFFICE? Both servers are located on the same network subnet and both have the same network settings, with no WINS. I can see that nltest used a different DC (DC01 vs DC02) to get the result, which I also don't understand.
I have read many articles about the error ERROR_NO_BROWSER_SERVERS_FOUND, which point to the "Computer Browser" service. However, this service is disabled on both servers.
The intermittent authentication issue has never been reported from the second Windows 2012 server, so I suspect this nltest result might be related.
What's the domain topology?
What kind of trust is it?
Are there any error events from NETLOGON in the DC event logs on either side?
Does nltest /trusted_domains show the correct info on the FACTORY DCs?
Does nltest /sc_query:OtherDomain show any errors on the Trusting side?
Same with netdom trust TrustingDomainName /domain:TrustedDomainName /verify on each of the DCs on each side of the trust? (Or you can check it in AD Domains and Trusts). Unlike nltest, this requires credentials.
Are all the required ports, including all the required RPC ports, open between all the DCs in each domain? And in Windows Firewall? The most important aspect is that the Trusting domain DCs must be able to get to the PDCE in the Trusted domain. At the very least, you need these ports: LDAP (389 UDP and TCP), SMB (445 TCP), Kerberos (88 UDP), RPC portmapper (135 TCP), DNS (53 UDP and TCP)
Have you tried DNS queries from all the DCs to see if you can resolve the SRV records on each side? e.g. nslookup -q=SRV _ldap._tcp.mydomain.com (and the same for _kerberos._tcp and _kerberos._udp)
Do any of the DCs in either domain have the same hostname? Or duplicate SIDs? If the DCs were built from a custom image, were they Sysprepped?
Is the time in sync on all DCs on both sides of the trust? (Within 5 minutes, maximum)
Any errors in NETLOGON.LOG? You can enable NETLOGON debug logging for richer information, but only leave it on for a short time.
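The TCP part of the port checklist above can be probed with a short script. This is a minimal sketch using only Python's standard `socket` module; the function name `check_tcp_ports` and the dictionary `TRUST_TCP_PORTS` are hypothetical, and the sketch covers only the TCP ports listed above (the UDP ports and the dynamic RPC range need other tools).

```python
import socket
from typing import Dict, Iterable

# TCP ports taken from the checklist above; UDP ports are not covered here.
TRUST_TCP_PORTS = {53: "DNS", 135: "RPC portmapper", 389: "LDAP", 445: "SMB"}


def check_tcp_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Try a TCP connection to each port and report whether it succeeded."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # A completed connect means something is listening and reachable.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, unreachable, or the name did not resolve.
            results[port] = False
    return results
```

For example, calling `check_tcp_ports("dc03.office.example", TRUST_TCP_PORTS)` from each FACTORY DC (the host name here is a placeholder) returns a mapping of port to reachability. A successful connect shows only that the port is open, not that the service behind it is healthy.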

Can I sync two PRTG deployments using one database?

I have the following scenario:
WS1 = Windows **(Has Internet)**
WS2 = Windows (No Internet)
SERVERX = Linux (No Internet)
We want to monitor SERVERX (get CPU usage, disk space, etc., and receive alerts via email). I was thinking about using Zabbix or PRTG (monitoring tools).
But it turned out that Zabbix cannot be installed on Windows, and we need disk space usage to be reported via email when it exceeds a limit.
Please have a look at the picture to understand the challenge.
I was thinking about the following setup:
WS1 has PRTG installed
WS2 has PRTG installed
Both WS1 and WS2 share the same PRTG database (synced)
Is this even possible? Or do you have other solutions?
Use the master probe on WS1 and a remote probe on WS2. This way, the master probe stores the collected results in the database and sends the emails, since it is connected to the Internet.
The remote probe is the one that performs the checks; it stays connected to the master probe and pushes its results to it.
This is standard PRTG functionality. See the following page for details: https://www.paessler.com/manuals/prtg/install_a_prtg_remote_probe

number of lines returned from TapiManager

I'm working with the Avaya IP Office TAPI2 service provider.
I have a question about how the lines available to work with are obtained.
If I run a test piece of code from my dev PC that creates a TapiManager object and works with the collection of TapiLine objects exposed by TapiManager.Lines,
the number of lines returned is greater than the number of lines exposed on a server running the same test Windows Forms app. Both machines, i.e. my dev PC and the server, have the same TAPI2 provider installed, and both connect to the same switch IP address under Third Party.
The dev PC returns something like 460 lines, while the server returns 30 fewer, at 430. Our tech guys assure me that the missing extensions/lines are configured the same as the ones that do appear in the available list.
Thanks
The Avaya TSP exposes two types of IP Office objects:
Users (extensions and TAPI WAV users)
Hunt Groups
The devices listed as available lines, when configured in third-party mode, depend on whether the TAPI WAV and ACD queues checkboxes are selected (see Control Panel|Phone and Modem Options).
I would verify that both machines have their TAPI options configured identically in the TSP settings in Phone and Modem Options, e.g. you may not have ACD Queues checked on the server.
FYI - you can also use the Ex Directory attribute (see Users in the IP Office Manager) to control which extensions are visible via TAPI, but that will have the same effect on both machines.
The only other thing I can think of is whether the machines were restarted after new extensions were added to the IP Office. The TSP only downloads the list of lines at startup, so it will not be aware of extensions added after it was loaded by Windows.
If you can enable TSP logging for the Avaya TSP and include the logs, I can review the difference between the two machines.

Test Controller exception: No such host is known

I'm getting the following errors on the test controller machine when I try to run CodedUI tests remotely:
(QTController.exe, PID 3032, Thread 12)
ControllerDeployment.DoDeployment: System.Net.Sockets.SocketException
(0x80004005): No such host is known
No errors came up during controller and agent configuration, and when I go to the Manage Test Controller dialog in Visual Studio I can see that all the agents are active. But when I try to execute any CodedUI test remotely, it hangs forever.
Not sure if it's connected with the fact that I've upgraded client/controller/agents to 2012 versions recently, but I've started seeing the problem only after this upgrade.
From Microsoft KB 2643086:
This issue occurs because the test agent computer sends its Network
Basic Input/Output System (NetBIOS) name instead of sending its Fully
Qualified Domain Name (FQDN) name to the test controller computer.
When the DNS server of the test controller computer does not have the
IP address mapping of the NetBIOS name of the test agent computer, the
issue that is described in the "Symptoms" section occurs.
You should ensure you are using fully qualified domain names (FQDN).
A hotfix is also available from Microsoft. However, you have to contact Microsoft Customer Support Services to obtain it.
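To see whether a given agent name is affected by the condition the KB excerpt describes, a quick check from the controller machine can help. This is a minimal sketch with Python's standard `socket` module; the helper names `looks_fully_qualified` and `resolves` are hypothetical, and the dot-based test is only a heuristic for telling a NetBIOS-style short name from an FQDN.

```python
import socket


def looks_fully_qualified(name: str) -> bool:
    """Heuristic: an FQDN has at least two non-empty, dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def resolves(name: str) -> bool:
    """Return True if the local resolver can map the name to an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; malformed names may
        # raise UnicodeError during IDNA encoding.
        return False
```

Running `resolves()` on the controller for both the agent's short name and its FQDN (for example `AGENT01` and `agent01.corp.example.com`, both placeholders) shows which form the controller's DNS can actually resolve.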
I had a similar problem. It is still not completely resolved, but a workaround is to install Visual Studio on the controller box and keep the results DB on the same box.
The issue is most often a restriction or firewall on the VPN that blocks incoming traffic on the TCP ports of the machine or laptop.
