AppFabric Cache seems unstable

We're trying to use the AppFabric distributed cache. After a lot of back and forth with non-domain servers we finally put them in a domain, and installation/setup became a bit easier. We got it up and running after fighting through a ton of errors, most of which seem like things AppFabric could easily test for, or at least report with a more descriptive error message. "Temporary error" does not explain a lot...
But there are still issues.
We set up 3 servers, one of which is the "lead". We finally got the cache working, and confirmed this by pointing a Network Load Balancer at one server at a time and verifying that we could set a cache item on one server and retrieve it from another.
Then I restarted the AppFabric Caching service on all servers and suddenly it stopped working. Get-CacheHost says the hosts are up, but we get exceptions like:
ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out
ErrorCode<ERRCA0017>:SubStatus<ES0001>:There is a temporary failure. Please retry later.
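For what it's worth, this is how we check the cluster state (a minimal sketch using the standard AppFabric caching cmdlets):

# Load the caching administration cmdlets and point them at our cluster.
Import-Module DistributedCacheAdministration
Use-CacheCluster
# Lists every cache host and its service status; all of ours show UP.
Get-CacheHost
# Overall health report per named cache (healthy / under-replicated / unallocated).
Get-CacheClusterHealth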
Why would this error condition occur by simply restarting the services?
Is AppFabric Cache really ready for production use?
What happens if a server goes offline? Long timeouts?
Are we dependent on the "lead" server being up?
I suspect it will be back up after 5-10 minutes of R&R. It seems to come back by itself sometimes.
Update: It did come up after a few minutes. We have now tested removing one server from the cluster, and it resulted in a long timeout and finally an exception.

We have been debugging this for some time and I'm sharing what we have found so far.
UAC on Windows Server 2008 blocks administrative access to the local computer, so cache commands run against the local host will fail. Start PowerShell as administrator, or turn off UAC completely, to get around it.
Simply editing the config file on the share by hand will not work. You need to export the configuration, edit the copy, and import it back (see the sketch after this list).
Firewalls are a major issue: the installer opens the 222* range of ports, but the PowerShell tools also rely on other Windows services whose ports stay blocked. Turning off the firewall on all servers (not recommended) solved the problem.
If a server is removed from the cluster there is an initial timeout before the cluster can operate again.
After a restart the cluster takes 2-5 minutes to come back up.
If the cluster is restarted while one server is unreachable, the startup time increases further.
If the server hosting the shared file share for the configuration is not reachable, the services will not start. We tried to work around this by giving each server a private share.
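For reference, this is roughly what the export/import and firewall steps look like for us (a sketch only: the file path and rule names are examples, and you should confirm the exact cmdlet parameters with Get-Help on your install):

Import-Module DistributedCacheAdministration
Use-CacheCluster

# Editing ClusterConfig.xml on the share directly is ignored; export it,
# edit the copy, then import it back with the cluster stopped.
Stop-CacheCluster
Export-CacheClusterConfig C:\temp\ClusterConfig.xml
# ... edit C:\temp\ClusterConfig.xml ...
Import-CacheClusterConfig C:\temp\ClusterConfig.xml
Start-CacheCluster

# Instead of disabling the firewall, open the AppFabric cache ports
# (22233-22236 by default) and the management rules the tools depend on:
netsh advfirewall firewall add rule name="AppFabric Caching" dir=in action=allow protocol=TCP localport=22233-22236
netsh advfirewall firewall set rule group="Remote Service Management" new enable=yes
# Needed to reach the shared configuration folder:
netsh advfirewall firewall set rule group="File and Printer Sharing" new enable=yes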


How to wait for network connectivity before starting a service in Windows?

I'm attempting to use the Suricata IDS for monitoring network events on a specific interface.
I have been able to install the service and run it.
Everything seems to be working fine, except for the fact that the service might fail to start at computer startup.
The service needs the network connection to be ready and available (IP assigned, etc.) at the moment Windows starts it.
So the service works correctly at startup ONLY when the network is already up and running on that interface.
If for some reason the network connection comes up later than usual, the service fails to start, and Windows does not seem to try to start it again.
I've looked at other Stack Overflow questions about setting the right dependencies on the network connection for a case like mine, but their suggestions do not seem to work.
So, what are the correct dependencies to add to my service to instruct Windows to wait for the network connection to be up and running (IP already assigned, DNS already set, etc.) before starting it?
If you have an alternative idea on how to solve the problem you're welcome to post a reply.
Any suggestion is appreciated.
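For reference, this is the kind of thing I have tried so far from an elevated prompt (the service is registered as "Suricata" here, which is just my name for it, and the dependency list and retry settings are my own guesses at what might make Windows wait or try again):

# Delay the start until after the early-boot rush:
sc.exe config Suricata start= delayed-auto
# Make the service depend on the TCP/IP stack, the DHCP client and
# Network Location Awareness, hoping Windows waits for them first:
sc.exe config Suricata depend= Tcpip/Dhcp/NlaSvc
# Treat a failed start as a failure, and retry it (three restarts,
# 60 seconds apart, counters reset after a day):
sc.exe failureflag Suricata 1
sc.exe failure Suricata reset= 86400 actions= restart/60000/restart/60000/restart/60000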

Microservice HTTP port problem on Windows Server

We have written (in Go and Delphi) several Windows microservices, which respond to HTTP requests on specific ports in the 11000-12000 range. These are designed to run internally within the Domain or Private network of the client (i.e. not on the internet).
They run perfectly on all but one of our 50+ client systems, on OS's ranging from Windows 7/10/11 to Windows Server 2008R2/2012/2016/2019. The installation process for each of these services sets up rules in the Windows firewall to accept the requests to each service exe.
The one client system they don't work on is running Windows Server 2016 Essentials. This is the only client system running that specific OS, so that may be a factor in the problem.
Even querying the services locally with a web browser on that system does not work: the requests just wait for a while and then time out with ERR_CONNECTION_TIMED_OUT.
However, the same requests to the same ports at address 127.0.0.1 (localhost) work instantly, proving the services are actually running.
The mode of failure when the targeted service is not running, or when we address the wrong port, is different: in that case we get a quick "refused to connect" failure, ERR_CONNECTION_REFUSED.
There are no third party antivirus or firewall products installed on the system, which is only using Windows Defender with the normal Windows firewall. We've tried everything we can think of with the Windows firewall, including turning it off completely. Nothing we've tried made any difference.
We've tried many alternative port numbers, but we don't get any success until we get up into the 49000 range and above. We'd really rather not change from our normal port number range unless it's completely unavoidable.
We've spent many hours trying to find any solution without any luck. We are really hoping that some bright person out there has some idea that will lead to finding the cause of the problem.
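In case it helps anyone suggest something, these are the kinds of checks we can run on the affected machine from an elevated PowerShell prompt (port 11000 is just one example from our range, and the remote hostname is made up):

# Confirm the service is listening on all addresses (0.0.0.0 or ::) and not
# only on 127.0.0.1, which would explain why only localhost answers:
Get-NetTCPConnection -LocalPort 11000 -State Listen |
    Select-Object LocalAddress, LocalPort, OwningProcess

# Rule out Windows' reserved/excluded TCP port ranges, which can claim
# blocks of ports on some systems:
netsh interface ipv4 show excludedportrange protocol=tcp

# From another machine on the LAN, test the port directly without a browser:
Test-NetConnection -ComputerName essentials-server -Port 11000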

Why do I get a 403 when hitting a localhost iisexpress site hosted on a VM

So before I start, I know how to open up IIS Express to the network; there are plenty of articles about this, and I even wrote one. However, these approaches do have limitations, most notably the need to add bindings, which I've found to be hit-or-miss and which, worse, breaks any code that does things based on hostname.
I am on OS X running Windows 10 inside Parallels, and earlier this week I thought I should try to get localhost forwarded into the VM in the same way that I can with Docker containers.
I got really close, but I cannot figure out the last step. Here is what I did and what I am seeing:
I temporarily disabled my firewall on Windows
I forwarded the correct port (44300 in this case) to the guest VM and rebooted it.
I ran iisexpress through the command line so that I can see the full log (I tried through visual studio as well)
What I'm seeing is that when I hit localhost:44300 from inside the VM, my site loads fine and all sorts of things are logged.
But when I hit it from the host I'm getting
HTTP Error 403. The request URL is forbidden.
There is nothing logged by iisexpress in this scenario.
However, I know that the request is in some manner getting through since if I stop iisexpress and head to localhost:44300 again on the host, I get the standard
localhost refused to connect. ERR_CONNECTION_REFUSED
So the fact that iisexpress is or is not listening on that port does propagate up to the host, but nothing else is getting through.
So I'm a bit at a loss. It almost looks like a binding thing, but as far as iisexpress knows, due to the port forwarding, wouldn't this request be coming from localhost? How could it tell that it is not? Even that seems unlikely given the lack of log messages. Also, I never see anything pop up in Fiddler.
What I wonder instead is whether there might be some other Windows component that sees something bound to 44300, sees a request coming in over the network stack, and shuts it down. Is that a thing?
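One thing I plan to look at next is what HTTP.sys itself has registered on the guest, since a 403 with nothing in the iisexpress log suggests the request is being answered before it ever reaches iisexpress. These are the usual checks, shown only as a sketch (the urlacl line is the commonly suggested workaround, not something I've confirmed):

# List HTTP.sys URL reservations; the IIS Express ones are normally
# limited to localhost on this port:
netsh http show urlacl
# Show whether HTTP.sys is restricted to particular IPs (empty = all):
netsh http show iplisten
# The commonly suggested workaround (elevated prompt): reserve the port for
# any host name and add a matching binding in .vs\config\applicationhost.config:
netsh http add urlacl url=https://*:44300/ user=Everyone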

Windows 2008 R2 Failover Cluster FTP with IIS 7.5 (0x80070490 Element Not Found)

I set up a failover FTP site using a script service/application on our 3-node cluster. I followed this guide, which seems to be fairly complete: http://support.microsoft.com/kb/974603
However, the FTP site I've added, which is linked to the storage for that service, will not start. I get the following error: 0x80070490 Element Not Found. I think it may be related to this KB article, but I'm not sure: http://support.microsoft.com/kb/2720218
Failing over/moving the service around the 3 nodes seems to work fine (except the FTP doesn't start, and starting it manually fails). The IP, computer name, and 2 mount points for storage get moved successfully. The only way I can get it to start is to go into IIS on the owning node, remove the FTP site and set it up again. As soon as I fail it over to another node however, I'm back to the error.
I believe it has something to do with IIS not seeing the storage despite it being available. I've made the storage a prerequisite for the script so the storage must be online before the script tries to start the FTP site. Nevertheless, it doesn't work.
Summary: the Windows 2008 R2 clustered FTP server is set to listen on the service IP. Its root directory is the root drive of the storage assigned to the cluster service; the other storage is a mount point mounted underneath this drive. The FTP site works fine on initial setup but fails on failover with the Element Not Found error. It seems to be related to the disk not being available despite it existing: if you go to one of the other nodes without the disks, the FTP site in IIS has the red 'X' on it, and attempting to start it gives the same error.
This was my fault for not setting up Offline Files. Once I completed that, it worked. Offline Files requires two server restarts, and I didn't want to go through that process without testing how the clustered FTP would behave (this cluster is in production use). Unfortunately, once the share hosting the IIS shared configuration goes offline, it will NOT come back online until you recycle the Microsoft FTP Service (which is why Offline Files is required). I could also have modified the script to perform a recycle in the StartFTPSVC function (instead of just checking whether it was started and, if not, starting it), as sketched below.
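That change would amount to something like this inside the StartFTPSVC step (the KB script itself is VBScript; this is just a PowerShell sketch of the idea, with the recycle replacing the plain start):

# Once the share hosting the IIS shared configuration has been offline,
# ftpsvc will not pick it up again until it is recycled, so restart it
# rather than only starting it when it is stopped:
if ((Get-Service ftpsvc).Status -eq 'Running') {
    Restart-Service ftpsvc
} else {
    Start-Service ftpsvc
}
# ...then let the rest of the script start the clustered FTP site as before.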

My IP seems to be blocked by web hosting server

I have a strange problem. I just installed my PHP web site on shared hosting, and all the services were working fine. But after configuring my app I could visit my web site only once; further attempts give:
"The server is taking too long to respond.".
From another IP I can access it, though again only once. It seems every IP address gets blocked after its first visit (even FTP and the other services become unreachable, no access at all from that IP). Can anyone help me explore this problem? I don't think it's a problem with my app; the app works fine on my local PC.
Thanks.
First thing to try would be a traceroute to determine where your traffic is being blocked.
In a Windows command prompt:
tracert www.yoursharedhostingserver.com
At the moment, trying to access this address gives this:
Fatal error: Class 'mainController' not found in /home/myicms/public_html/core/application/crApplication.class.php on line 181
I have tried it multiple times and it didn't block me. It might be that you have already solved this problem.
As far as I know, the behavior you describe could only be explained by a badly configured "intelligent" firewall. It may have been misconfigured by your host.
If you visit a site at a certain host and suddenly you cannot access an FTP on that host, then it's either a (really bad) firewall or a (very mean) site that explicitly adds a firewall rule to ignore your address.
Some things that you might look into:
It might be something with identd, too. What services have you configured on your host? Was it by any chance some kind of server control panel (which might have the ability to control a firewall)?
Is the block permanent, does it go away after 24 hours, or only after rebooting the server? Does restarting some services make the block go away?
Did you install any software that "protects your server from port scanning"? It might be a bit too aggressive.
I wish you good luck in finding the source of this problem!
Chances are that if you can access it once, it's actually working. The problem is more likely in the PHP code than in the server.
