How to debug TFS Lab Management Network Isolation? - visual-studio-2010

How do I debug the TFS Lab Management network isolation process? Currently, network isolation goes into a Configuring... state and never actually completes. There are informational messages in the additional information, but other environments of mine complete successfully with the same informational messages.
What are the steps the environment goes through when configuring network isolation, and where can I look to find out why it stays in the Configuring state?

This is how my environment was structured:
A network-isolated AD domain with multiple other servers, including TMG.
The first thing to check was whether the network connections were being treated as a Public location, so the first fix was to ensure that they were treated as Private networks (a registry-based sketch of the same check follows the list below):
1. Opened secpol.msc on each of the VMs
2. Chose Network List Manager Policies
3. Changed Unidentified Networks to location type Private
4. Changed Identifying Networks to location type Private
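As a side note (not part of the original fix, just a sketch I find handy): the same per-network category can be inspected and forced via the registry. The profile key path and the Category values (0 = Public, 1 = Private, 2 = Domain) are standard, but treat this as an assumption-laden sketch and run it elevated on each VM:

    # Sketch: enumerate network profiles in the registry and flip Public ones to Private.
    # Alternative to the secpol.msc steps above; run from an elevated Python on each VM.
    import winreg

    PROFILES = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkList\Profiles"

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PROFILES) as root:
        for i in range(winreg.QueryInfoKey(root)[0]):              # number of profile subkeys
            guid = winreg.EnumKey(root, i)
            with winreg.OpenKey(root, guid, 0, winreg.KEY_READ | winreg.KEY_SET_VALUE) as prof:
                name, _ = winreg.QueryValueEx(prof, "ProfileName")
                category, _ = winreg.QueryValueEx(prof, "Category")
                print(f"{name}: category={category}")              # 0 = Public, 1 = Private, 2 = Domain
                if category == 0:                                   # treated as Public -> make it Private
                    winreg.SetValueEx(prof, "Category", 0, winreg.REG_DWORD, 1)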
When I originally created the environment, I neglected to remove the machines from the domain and add them to a workgroup before storing it in the library. Whenever the environment was then deployed from the library, it exhibited the behaviour above, i.e. it never achieved network isolation.
To resolve this, remove all servers from the domain, add them to a workgroup, store the environment in the library, and then deploy a new environment from the library.
Wait for network isolation to be achieved, and then add each server back into the domain one by one. The key here is patience: wait for network isolation to be achieved before moving on to the next server.
If network isolation is not achieved after 20 minutes, shut down the environment, wait 10 minutes and then start it up again. By then, it should resolve itself.

Related

How to keep external applications in sync with the InterSystems instance of a mirror

Our application is built on top of InterSystems IRIS (previously Caché) and consists of a large core and DB that is enhanced with several external modules that connect to the core.
We deploy IRIS and the external apps on-premises on the same server (for several reasons).
When we use mirroring, we have several servers with the same content (IRIS + external modules) that act as a high-availability mirroring system, where only one node is the 'active' one and the rest of them are waiting.
Ideally, our external modules are started up and stopped following the IRIS instance on each node, using the two available callbacks.
When configured in a mirror, they are only started on the 'active' node (by a provided callback) and initially stopped on all other nodes.
When a failover occurs and one of the 'waiting' nodes is promoted to 'active', the external apps are started on the promoted node.
On the demoting node (going from 'active' to waiting, crashed, or hung) we don't have a good way to stop those services, as there is no callback from InterSystems.
We are analyzing the following alternatives, but any others, as well as comments, would be greatly appreciated:
Implementing an additional service that keeps track of the IRIS instance (see the watchdog sketch after this list)
Making the external modules 'mirror' aware
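For the first alternative, a minimal watchdog could simply poll the node's mirror role and start or stop the external modules on transitions. This is only a sketch: is_primary() and the start/stop commands are placeholders you would need to implement yourself (for example by exposing the result of $SYSTEM.Mirror.IsPrimary() from a small routine or REST endpoint):

    # Sketch of a watchdog service (alternative 1): poll the mirror role and
    # start/stop the external modules on transitions. is_primary() and the
    # commands below are placeholders, not a real InterSystems API.
    import subprocess
    import time

    START_CMD = ["net", "start", "MyExternalModule"]   # placeholder service/command
    STOP_CMD = ["net", "stop", "MyExternalModule"]     # placeholder service/command

    def is_primary() -> bool:
        """Placeholder: ask the local IRIS instance whether this node is the mirror primary."""
        raise NotImplementedError

    def main(poll_seconds: int = 10) -> None:
        was_primary = None
        while True:
            primary = is_primary()
            if primary != was_primary:                 # role changed (or first check)
                subprocess.run(START_CMD if primary else STOP_CMD, check=False)
                was_primary = primary
            time.sleep(poll_seconds)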
I would recommend using one more server, so that you would not need to stop/start services at all. Keep the mirror alone: two servers used only for the mirror, only data, no running code, no users. Then add one more server, connected to the mirror as an ECP client via the VIP. In this case, all of your services run on that server and do not need to care about where the mirror status changed. There will be a short outage during the switch in the mirror, but nothing fatal. I have such a configuration in production, but with 10 servers behind the mirror, including one just for interoperability reasons. And we have already had a few switches, with no issues.

Which local machine components could affect an RDP session performance-wise?

I've got the following totally reproducible scenario, which I'm unable to understand:
There is a very simple application which does nothing other than call CreateObject("Word.Application"), i.e. it creates an instance of MS Word via COM interop. This application is located on a Windows Terminal Server. The test case is to connect via RDP, execute the application, and the application will output the time taken for the CreateObject call.
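For reference, a roughly equivalent timing test can be reproduced in Python with pywin32 (this is my sketch, not the original application):

    # Time the COM activation of Word, similar to CreateObject("Word.Application").
    # Assumes pywin32 is installed on the terminal server.
    import time
    import win32com.client

    start = time.perf_counter()
    word = win32com.client.Dispatch("Word.Application")   # the COM activation being measured
    elapsed = time.perf_counter() - start

    print(f"CreateObject took {elapsed:.2f} s")
    word.Quit()                                            # close the Word instance we started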
The problem is that the execution time is significantly longer if I connect from one specific notebook (an HP Spectre): it takes 1.7 s (+/- 0.1 s).
If I connect from any other machine (notebook or desktop computer), the execution time is between 0.2 and 0.4 s.
The execution times don't depend on the RDP account used, the screen resolution, or local printers. I even did a fresh install of Windows on that HP notebook to rule out any other side effects. It doesn't matter whether the HP notebook is connected via WLAN or a USB network card. I'm at a loss to understand the 4x to 8x execution time difference compared to any other machine.
What (component or setting) could explain this big difference in execution time?
Some additional information: I tried debugging the process using an API monitor and could see that >90% of the execution time is actually being spent between a call to RpcSend and RpcReceive. Unfortunately I can't make sense of this information.
It could be that credential management is somehow getting in the way.
Open the .rdp file with Notepad and add
enablecredsspsupport:i:0
This setting determines whether RDP will use the Credential Security Support Provider (CredSSP) for authentication if it is available.
Related documentation
https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/ff393716%28v%3dws.10%29
Given your information about the time spent between RpcSend and RpcReceive, it could be that some service is stopped on your client machine, such as a DCOM server or another COM-related service (they usually have "COM" or "transaction" in their names).
Some of these services may be started or stopped by the system on demand (if set to Manual) to handle your request, and starting a service introduces a delay.
I suggest you open Computer Management > Services (or Run > services.msc), compare the COM-related services running on your "slow" client and on your "fast" clients, and try setting them to Automatic instead of Manual or Disabled.
Also, try running API Monitor on those processes to pinpoint the time-consuming call more precisely.
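To make that comparison quicker, something like the following sketch (assuming psutil is installed) can dump the relevant services on each client so the output can be diffed:

    # Dump COM/transaction-related Windows services with their start type and status,
    # so the output can be compared between the "slow" and "fast" clients.
    import psutil

    for svc in psutil.win_service_iter():
        name = svc.display_name()
        if "COM" in name or "transaction" in name.lower():
            print(f"{svc.name():40} start={svc.start_type():10} status={svc.status()}")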

Azure Availability sets, Fault Domains, and Update Domains

I'm an Azure newbie and need some clarification:
When adding machines to an availability set, in order to prevent the VMs from all rebooting at the same time, what's the best strategy? Put them in:
- different update and fault domains
- the same update domain
- the same fault domain?
My logic is that it's enough to put them in different update AND fault domains.
I used this as a reference: https://blogs.msdn.microsoft.com/plankytronixx/2015/05/01/azure-exam-prep-fault-domains-and-update-domains/
Am I correct?
These update/fault domains are confusing.
My logic is that it's enough to put them in different update AND fault domains
You are right; we should put VMs in different update and fault domains.
We put them in different update domains so that when the Azure hosts need an update, Microsoft engineers update one update domain and, when it completes, update another update domain. In this way, our VMs will not all reboot at the same time.
We put them in different fault domains so that when unexpected downtime happens, the VMs in that fault domain reboot while the other VMs keep running; in this way, the application running on those VMs stays healthy.
In short, adding VMs to an availability set with different update domains and fault domains gives a higher SLA, but it does not mean a single VM will never reboot.
Hope that helps.
There are three scenarios that can lead to virtual machine in Azure being impacted: unplanned hardware maintenance, unexpected downtime, and planned maintenance.
Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure platform. For a given availability set, five non-user-configurable update domains are assigned by default (Resource Manager deployments can then be increased to provide up to 20 update domains) to indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual machines are configured within a single availability set, the sixth virtual machine is placed into the same update domain as the first virtual machine, the seventh in the same update domain as the second virtual machine, and so on. The order of update domains being rebooted may not proceed sequentially during planned maintenance, but only one update domain is rebooted at a time. A rebooted update domain is given 30 minutes to recover before maintenance is initiated on a different update domain.
Fault domains define the group of virtual machines that share a common power source and network switch. By default, the virtual machines configured within your availability set are separated across up to three fault domains for Resource Manager deployments (two fault domains for Classic). While placing your virtual machines into an availability set does not protect your application from operating system or application-specific failures, it does limit the impact of potential physical hardware failures, network outages, or power interruptions.
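For completeness, here is how the update/fault domain counts can be set explicitly when creating an availability set with the current Python management SDK (a sketch with placeholder names; the SDK shown here is newer than the question, and assumes azure-identity and azure-mgmt-compute are installed):

    # Sketch: create an availability set with explicit update/fault domain counts.
    # Subscription, resource group, name and location are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

    client.availability_sets.create_or_update(
        "my-resource-group",
        "my-avset",
        {
            "location": "westeurope",
            "platform_update_domain_count": 5,    # up to 20 for Resource Manager deployments
            "platform_fault_domain_count": 3,     # up to 3 for Resource Manager deployments
            "sku": {"name": "Aligned"},           # needed when the VMs use managed disks
        },
    )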
For more details, refer to this documentation.

Continuous deployment with Microsoft Azure

If worker roles, or for that matter web roles, are continuously serving both long- and short-running requests, how does continuous delivery work in this case? Obviously, pushing a new release to the cloud will abort currently active sessions on the servers. What should the strategy be to handle this situation?
Cloud Services have production and staging slots, so you can swap them whenever you want. Continuous delivery or integration can be implemented using Visual Studio Team Services, and I would recommend it; we use it ourselves. As you say, you need to decide when to swap the production and staging slots (for example, we did it when the user load was very low, which in our case was at night, but it may be different in your case). Slot swapping is a very fast process and, as far as I know, it is a change of settings behind the load balancers rather than a physical deployment.
https://azure.microsoft.com/en-us/documentation/articles/cloud-services-continuous-delivery-use-vso/#step6
UPD: I remember testing that, and my experience was that incoming connections (for example, RDP) were stable while outgoing ones were not. So I cannot guarantee that existing connections will be ended gracefully, but in my experience there were no issues.

Common Issues in Developing Cluster Aware non-web-based Enterprise Applications

I have to move a Windows-based multi-threaded application (which uses global variables as well as an RDBMS for storage) to an NLB (i.e., network load balancer) cluster. The common architectural issues that immediately come to mind are:
Global variables (which are both read and written) will have to be moved to shared storage. What are the best practices here? Is there anything available in the Windows Clustering API to manage such things?
My application uses sockets, and persistent connections are the norm in the field I work in. I believe persistent connections cannot be load balanced. Again, what are the architectural recommendations in this regard?
I'll answer the persistent connection part of the question first since it's easier. All good network load-balancing solutions (including Microsoft's NLB service built into Windows Server, but also including load balancing devices like F5 BigIP) have the ability to "stick" individual connections from clients to particular cluster nodes for the duration of the connection. In Microsoft's NLB this is called "Single Affinity", while other load balancers call it "Sticky Sessions". Sometimes there are caveats (for example, Microsoft's NLB will break connections if a new member is added to the cluster, although a single connection is never moved from one host to another).
Re: global variables, they are the bane of load-balanced systems. Most designers of load-balanced apps will do a lot of re-architecture to minimize dependence on shared state, since it impedes the scalability and availability of a load-balanced application. Most of these approaches come down to a two-step strategy: first, move shared state to a highly-available location, and second, change the app to minimize the number of times that shared state must be accessed.
Most clustered apps I've seen will store shared state (even shared, volatile state like global variables) in an RDBMS. This is mostly out of convenience. You can also use an in-memory database for maximum performance. But the simplicity of using an RDBMS for all shared state (transient and durable), plus the use of existing database tools for high-availability, tends to work out for many services. Perf of an RDBMS is of course orders of magnitude slower than global variables in memory, but if shared state is small you'll be reading out of the RDBMS's cache anyways, and if you're making a network hop to read/write the data the difference is relatively less. You can also make a big difference by optimizing your database schema for fast reading/writing, for example by removing unneeded indexes and using NOLOCK for all read queries where exact, up-to-the-millisecond accuracy is not required.
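As an illustration of the NOLOCK point above, a read of a small piece of shared state might look like this sketch (table, column and connection string are placeholders; assumes pyodbc and SQL Server):

    # Sketch: dirty read of shared state with a NOLOCK hint, for values where
    # up-to-the-millisecond accuracy is not required. All names are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
    )
    row = conn.execute(
        "SELECT value FROM shared_state WITH (NOLOCK) WHERE name = ?", "active_jobs"
    ).fetchone()
    print(row.value if row else None)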
I'm not saying an RDBMS will always be the best solution for shared state, only that improving shared-state access times is usually not how load-balanced apps get their performance; instead, they get it by removing the need to synchronously access (and, especially, write to) shared state on every request. That's the second thing I noted above: changing your app to reduce dependence on shared state.
For example, for simple "counters" and similar metrics, apps will often queue up their updates and have a single thread in charge of updating shared state asynchronously from the queue.
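A minimal sketch of that pattern: request threads drop updates into a local queue and a single background thread flushes them to the shared store (update_shared_counter() is a placeholder for your actual write):

    # Queue counter updates locally; one background thread owns all writes to shared state.
    import queue
    import threading

    updates: "queue.Queue[str]" = queue.Queue()

    def update_shared_counter(counter_name: str) -> None:
        """Placeholder: increment the counter in the shared store (e.g. an UPDATE statement)."""
        raise NotImplementedError

    def record_hit(counter_name: str) -> None:
        """Called from request-handling threads; returns immediately."""
        updates.put(counter_name)

    def _flush_worker() -> None:
        while True:
            name = updates.get()            # blocks until there is something to write
            update_shared_counter(name)     # the only place that touches shared state
            updates.task_done()

    threading.Thread(target=_flush_worker, daemon=True).start()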
For more complex cases, apps may switch from pessimistic concurrency (checking that a resource is available beforehand) to optimistic concurrency (assuming it's available, and then backing out the work later if you ended up, for example, selling the same item to two different clients!).
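A sketch of the optimistic approach with a version column (self-contained with sqlite3 purely for illustration; in the scenario above this would be the shared RDBMS):

    # Optimistic concurrency: read without locking, write only if the row version
    # is unchanged, and back out (return False) if someone else got there first.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
    conn.execute("INSERT INTO item VALUES (1, 5, 0)")
    conn.commit()

    def try_sell(conn, item_id):
        stock, version = conn.execute(
            "SELECT stock, version FROM item WHERE id = ?", (item_id,)).fetchone()
        if stock == 0:
            return False
        cur = conn.execute(
            "UPDATE item SET stock = ?, version = ? WHERE id = ? AND version = ?",
            (stock - 1, version + 1, item_id, version))
        conn.commit()
        return cur.rowcount == 1            # 0 rows means a concurrent writer won; retry or back out

    print(try_sell(conn, 1))                # True in this single-connection demo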
Net-net, in load-balanced situations, brute force solutions often don't work as well as thinking creatively about your dependency on shared state and coming up with inventive ways to prevent having to wait for synchronous reading or writing shared state on every request.
I would not bother with using MSCS (Microsoft Cluster Service) in your scenario. MSCS is a failover solution, meaning it's good at keeping a one-server app highly available even if one of the cluster nodes goes down, but you won't get the scalability and simplicity you'll get from a true load-balanced service. I suspect MSCS does have ways to share state (on a shared disk) but they require setting up an MSCS cluster which involves setting up failover, using a shared disk, and other complexity which isn't appropriate for most load-balanced apps. You're better off using a database or a specialized in-memory solution to store your shared state.
Regarding persistent connections, look into the port rules, because port rules determine which TCP/IP ports are handled and how.
MSDN:
When a port rule uses multiple-host load balancing, one of three client affinity modes is selected. When no client affinity mode is selected, Network Load Balancing load-balances client traffic from one IP address and different source ports on multiple cluster hosts. This maximizes the granularity of load balancing and minimizes response time to clients. To assist in managing client sessions, the default single-client affinity mode load-balances all network traffic from a given client's IP address on a single cluster host. The class C affinity mode further constrains this to load-balance all client traffic from a single class C address space.
In an ASP.NET app, what allows session state to persist is the client affinity setting being enabled: NLB directs all TCP connections from one client IP address to the same cluster host, which allows session state to be maintained in host memory.
The client affinity parameter ensures that a connection is always routed to the server it initially landed on, thereby maintaining application state.
Therefore I believe the same would apply to your Windows-based multi-threaded app if you utilize the affinity parameter.
Network Load Balancing Best Practices and Web Farming with the Network Load Balancing Service in Windows Server 2003 might give you some insight.
Other issues to consider:
- Concurrency (check out Apache Cassandra, et al.)
- Speed-of-light issues (if going cross-country or international, you'll want heavy use of transactions)
- Backups and deduplication (companies like FalconStor or EMC can help here in a distributed system; I wouldn't underestimate the need for consulting here)
