Windows Azure Caching (Preview) ErrorCode<ERRCA0017>:SubStatus<ES0006>

I'm using the role-based caching feature for a Windows Azure web role, configured as co-located. I've followed the steps given in the Windows Azure docs for Caching (Preview), but I get the following error:
ErrorCode <ERRCA0017>:SubStatus<ES0006>:There is a temporary failure.
Please retry later. (One or more specified cache servers are
unavailable, which could be caused by busy network or servers. For
on-premises cache clusters, also verify the following conditions.
Ensure that security permission has been granted for this client
account, and check that the AppFabric Caching Service is allowed
through the firewall on all cache hosts. Also the MaxBufferSize on the
server must be greater than or equal to the serialized object size
sent from the client.). Additional Information : The client was trying
to communicate with the server: net.tcp://127.255.0.4:20010/.
I'm running everything as localhost, using the local development storage; my cache client is in the same role as the server. I've changed many configuration attributes, but I always get that exception or something similar, like "cannot connect to tcp...".
I'd appreciate some help. Thanks.

There are a couple of things that could be going wrong with your application.
The very first thing to make sure is that you have SDK 1.7 on your machine along with the Windows Azure Caching Services, and then verify that your reference is set to Windows Azure Cache (not to the Windows Server AppFabric SDK). I have seen that kind of misconfiguration lead to errors like this in the past.
Next, have you changed your dataCacheClient identifier to your ROLE name, as described in the documentation link here? If you follow the documentation as described you should not hit any error, so for the sake of checking what could be wrong, you can create the exact same application described in that link and see whether it works or not.
To get a more detailed error, be sure to increase the DataCacheFactoryConfiguration.ChannelOpenTimeout value to something longer, i.e. 2 minutes instead of the default 20 seconds, as described here. This step will surface the inner exception, which may lead you to the actual root cause of your problem.
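For reference, a minimal sketch of what that client configuration might look like in web.config (the role name "WebRole1" and the 2-minute timeout are placeholders; verify the attribute schema against your SDK version):

    <dataCacheClients>
      <!-- channelOpenTimeout is in milliseconds; 120000 = 2 minutes -->
      <dataCacheClient name="default" channelOpenTimeout="120000">
        <!-- identifier must match the name of the role hosting the cache -->
        <autoDiscover isEnabled="true" identifier="WebRole1" />
      </dataCacheClient>
    </dataCacheClients>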

We use Azure co-located caching (not in preview anymore) as our session backer and have fairly regular outages, about once a month.
We tried using the Enterprise Library Transient Fault Handling block, but our instances still hang when caching experiences problems. I think the transient fault code would work for data caching (see the sketch below), but for session backing there is some activity closer to the metal that we can't seem to code against.
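For what it's worth, the data-caching retry we experimented with looked roughly like this (a sketch only; the namespaces are from the Enterprise Library 5 Windows Azure Integration Pack, and the retry counts and delays are arbitrary values, so adapt to the version you actually reference):

    // Requires the Transient Fault Handling Application Block ("Topaz")
    // and its Azure cache detection strategy.
    var retryPolicy = new RetryPolicy<CacheTransientErrorDetectionStrategy>(
        new ExponentialBackoff(retryCount: 4,
                               minBackoff: TimeSpan.FromSeconds(1),
                               maxBackoff: TimeSpan.FromSeconds(10),
                               deltaBackoff: TimeSpan.FromSeconds(2)));

    // cache is a DataCache instance; the Get is retried only on errors
    // the strategy classifies as transient.
    var value = retryPolicy.ExecuteAction(() => cache.Get("mySessionKey"));

This is exactly the limitation mentioned above: it helps explicit DataCache calls, not the session state provider's internal ones.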
The error codes have become more informative over the last year and go something like...
ErrorCode:SubStatus:The request timed out..
Additional Information : The client was trying to communicate with the
server: net.tcp://10.xx.xxx.xx:xxxxx/.
Our best guess so far, from experimenting and from MS support, is that each co-located cache role instance (or at least one of them) needs to know about all the other instances' IPs. Since Azure can destroy and re-provision instances whenever it wants, this sometimes fails to update the dependent instances. This is secret sauce for Azure, but it is no secret when our site goes down. I'm looking for any more information on this, and to see how others are working around the issue.
One possible work-around: one of our talented platform administrators found that resetting IIS on the instances and scaling up by two more instances seems to help. This makes sense to me, because it gives caching another chance to gather all the required information about the other instances. This is NOT CONFIRMED to solve the problem, but if it works again during the next outage it could be a valid work-around.

Is Plaid's development environment more prone to needing frequent reauthentication?

I'm using Python and Plaid's development environment to download bank balances and transactions. To get the initial access tokens, I'm launching Link from quickstart, and can do that in standard and update mode.
The problem I'm running into is how frequently my API calls return the ITEM_LOGIN_REQUIRED error and I have to re-authenticate. For a Regions account I've been testing, this happens a few times throughout the day. For a Pinnacle Financial Partners account, this happens almost immediately after updating the access token. As in, I can log in through Link, successfully fire an auth/get request, and by the time I can send another request (e.g., balance/get), I'm already getting ITEM_LOGIN_REQUIRED again.
As I'm evaluating Plaid for production use, is this frequent re-authentication atypical? Is it a known limitation of development, or of specific banks even in production? I've seen some banks (Bank of America) only work in production, but I'm hoping what I'm experiencing is just the nature of working in development. Thanks.
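For anyone wanting to reproduce this, a minimal sketch of the failing call pattern, going straight at Plaid's REST endpoint with requests (the credentials and token are placeholders; the endpoint and error shape are per Plaid's public API docs):

    import requests

    PLAID_BASE = "https://development.plaid.com"
    creds = {"client_id": "YOUR_CLIENT_ID", "secret": "YOUR_DEV_SECRET"}

    def get_balances(access_token):
        resp = requests.post(f"{PLAID_BASE}/accounts/balance/get",
                             json={**creds, "access_token": access_token})
        body = resp.json()
        if body.get("error_code") == "ITEM_LOGIN_REQUIRED":
            # The Item needs to be re-authenticated via Link in update mode
            raise RuntimeError("re-auth required: " + body.get("error_message", ""))
        return body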
Development vs. Production environments are virtually identical and shouldn't impact how often you hit ITEM_LOGIN_REQUIRED.
What you're seeing is atypical, though. Unless you have multi-factor auth turned on and configured not to trust known devices, this shouldn't happen.
Assuming you don't have that configured, would you mind submitting a support ticket so Plaid Support can look into this and help figure out why it's happening?

Azure web role with co-located cache giving slow response

I have a web role with co-located cache, and there are two instances of this role.
Even when there is a cache hit, the turnaround time for our requests measures a few seconds. On analysis we found that the time taken by the cache to come back with data is 1 second on average, yet the IIS logs suggest that the overall servicing of the request takes about 4 seconds. There is no intermediate operation before or after the data retrieval from the cache.
What could be wrong here? What would be a good way to analyse the problem?
For what it's worth we were having a similar problem with caching in Redis in Azure and a RESTful API.
The problem turned out to be the serialization of data.
Some ways to debug the problem:
Download the ANTS profiler (it has a free trial) and profile your worker role locally.
Enable profiling for your worker role, deploy it, run it for a bit, then download the profile file in Visual Studio. (You can use Server Explorer to find your instance and download the log.)
Download the Azure toolkit (http://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx) on your instance. It has tools like Process Explorer that can tell you how much memory your role is taking, how much CPU it's using, what it's doing on the network, etc.
You can contact Azure support and have them help you profile your application. We did that and got absolutely amazing support. They talked with us on the phone for hours and helped us profile our code.
You really should increase the log level for the client and the server (refer to In-Role Cache Troubleshooting and Diagnostics (Windows Azure Cache)) and take a look at the performance counters. If read operations (GET) are taking a long time, there may be paging on one of the instances, or the server may be overloaded. If you see a performance issue on the cache instances, reassess your capacity using Capacity Planning Considerations for In-Role Cache (Windows Azure Cache).
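If it helps, the client-side log level lives in the dataCacheClients section of your config; something like the following (the sinkType and traceLevel values shown are the ones the NuGet package generates by default, so verify against your SDK, and "WebRole1" is a placeholder):

    <dataCacheClients>
      <!-- Bump traceLevel from the default (Error) to Verbose while diagnosing -->
      <tracing sinkType="DiagnosticSink" traceLevel="Verbose" />
      <dataCacheClient name="default">
        <autoDiscover isEnabled="true" identifier="WebRole1" />
      </dataCacheClient>
    </dataCacheClients>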
If this doesn't help then please open a support ticket.

Services extremely slow when deployed on Azure

I have a rather strange scenario. We have a range of Web APIs hosted in the cloud, and we consume those services in our Windows 8 application. The problem is that when the services run locally, requests take less than 400 ms, but when they are hosted on Windows Azure some requests take up to 20 seconds. I have checked the indexes on our database tables and they are fine. I have no clue what to profile or how to improve the performance.
Thanks!
Thanks a lot, everyone!
I found a way to use dotTrace (an excellent profiling tool) on the Azure deployment. Here is the link:
http://blog.maartenballiauw.be/post/2013/03/13/Remote-profiling-Windows-Azure-Cloud-Services-with-dotTrace.aspx
You can also use Windows Azure Diagnostics and the Stopwatch class to log all timings to the WAD tables.
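A minimal sketch of that approach (the key name is a placeholder, and cache is assumed to be your DataCache instance). With Windows Azure Diagnostics configured to transfer trace logs, the line ends up in WADLogsTable:

    using System.Diagnostics;

    var stopwatch = Stopwatch.StartNew();
    var value = cache.Get("myKey");   // cache is your DataCache instance
    stopwatch.Stop();

    // With WAD trace listeners on, this row lands in WADLogsTable
    Trace.TraceInformation("Cache GET of myKey took {0} ms",
                           stopwatch.ElapsedMilliseconds);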
I also found out, in another thread, that the first request to the Azure service is always slow. I have copied the answer below:
Serkan, you first need to make clear in your post whether you have published a Cloud Service or a Website to Windows Azure. The answer to your question differs depending on whether it is a Cloud Service (a web role) or a Website. Since you want to learn more, I will explain what goes on behind the scenes.
As you suggest that your first connection is slow, I can see that happening with Windows Azure Websites. Windows Azure Websites run in a shared pool of resources and use the concept of hot (active) and cold (inactive) sites: if a website has no active connections for x amount of time, it goes into the cold state, meaning the host IIS process exits. When a new connection is then made to that website, it takes a few seconds to get the site ready and working again. Depending on what your first page's code does, the time to load the site for the first time varies. A similar discussion is logged here: Very slow opening MySQL connection using MySQL Connector for .net
With a Windows Azure Cloud Service the overall application model is different. Your web role has its own IIS server which is fully dedicated to your application, so the Website limitation above does not apply; however, there could be other reasons for slower page loads. If you are using a web role, run a page-load profiler first, then RDP to your Azure instance and collect the page-load data to see what else you could do to boost performance.
You'll obviously need to profile your app to find the real cause. Check out these two articles which should get you started:
http://msdn.microsoft.com/en-us/library/windowsazure/hh369930.aspx
http://www.windowsazure.com/en-us/develop/net/common-tasks/profiling-in-visual-studio/

Windows Azure website load time

Sometimes when I access my Windows Azure website, the initial response time is very slow. After the first page load the website is fast. Some background: the website is not visited that often at the moment. Further, I am using a keepalivecontroller to keep the website running, and the website is running in shared mode. I am wondering: are websites that are not that active removed from memory in Windows Azure? Or is it just that background tasks on the operational level of Windows Azure sometimes interfere? What is happening is not transparent to me, so is there some SLA or something for Windows Azure Websites?
There is now a new feature available for Windows Azure Websites in 'Reserved' mode that will keep your website warm. You can now turn on "Always On" under the "Configuration" tab on your Azure Website. As explained in this blog post:
When the new “Always On” feature is enabled on a site, “Windows
Azure will automatically ping your website regularly to ensure that
the website is always active and in a warm/running state,” Guthrie
writes. “This is useful to ensure that a site is always responsive
(and that the app domain or worker process has not paged out due to
lack of external HTTP requests).”
The easiest way to keep a website warm is to call it regularly using the Scheduler feature in Windows Azure Mobile Services.
You simply write a script in the Scheduler that pings your website every x minutes.
Here's a post covering how to do that: http://fabriccontroller.net/blog/posts/job-scheduling-in-windows-azure/
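The scheduler scripts are just server-side JavaScript; a minimal sketch of such a job (the URL is a placeholder, and the request module is available to Mobile Services scripts):

    function keepSiteWarm() {
        var request = require('request');
        request.get('http://yoursite.azurewebsites.net/', function (err, resp) {
            console.log(err ? 'ping failed: ' + err
                            : 'ping returned ' + resp.statusCode);
        });
    }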
The Windows Azure Web Sites are still in preview, so there is currently no SLA with that service.
The Web Sites do idle out when in Free or Shared mode, which is likely what you are seeing. When the site idles out it is actually removed from memory; indeed, the IIS process hosting the site is shut down. This is how they can get the density of hosting 100 sites on the same VM.
You can find a lot of info on the Channel9 site about why this is the case, or, as a shameless plug, here is an article that talks about how the process is handled.
Now, you mentioned that you were using a keepalivecontroller, but what exactly do you mean by that? I use pingdom.com to constantly request data from one of my websites, and that seems to do pretty well. It is still possible that a request doesn't come in and the idle timeout is met, which then recycles the site. It is also possible that even if you always have the site running, the VM the site sits on needs to have the underlying OS updated, in which case Azure would move the site process to another VM, which could also cause the slow start-up on the next request.
I'd start logging your application start-ups and then look through your logs to see how often that is happening.
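A minimal sketch of what that logging could look like (dropped into Global.asax; the message format is arbitrary):

    protected void Application_Start()
    {
        // One trace row per cold start; compare timestamps across your logs
        System.Diagnostics.Trace.TraceInformation(
            "Application_Start at {0:u} on {1}",
            DateTime.UtcNow, Environment.MachineName);

        // ...rest of your normal startup (routes, bundles, etc.)
    }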
If you only need to warm it up once (vs. keeping it warm) and are mostly trying to prevent your customers from experiencing page cold starts, I believe the correct tool is IIS Application Initialization. You can configure it with a list of URLs to hit before it deems the app ready for action.
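On plain IIS 8 (or IIS 7.5 with the module installed) the configuration is a web.config entry along these lines (the page paths are placeholders for whatever your app needs warmed):

    <system.webServer>
      <applicationInitialization doAppInitAfterRestart="true">
        <!-- Each URL is requested before the app is considered warm -->
        <add initializationPage="/" />
        <add initializationPage="/products" />
      </applicationInitialization>
    </system.webServer>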
My site suffers from page cold starts, and that is severely magnified in Azure Websites (even on an S3), but it is absolutely speedy after it has served that first hit, thanks to several layers of caching (our inefficient use of Umbraco's dynamic-nodes query language creates a lot of database churn, which we're cleaning up opportunistically).
From what I've read and my own web.config attempts this is still not available in Azure Websites. I've asked Microsoft for it here: MS IDEA: Application Initialization to warm up specific pages when app pool starts. Please consider voting for it.
For each service/site you need to go to "Configure", then switch "Always On" to ON. Also make sure you click Save; it took about 2 minutes before my website noticed the change.
Why this is not the default is kind of mind-boggling, because my setup on HostGator ran much faster than Azure. I guess Microsoft figures that if nobody is accessing your site, it's okay for it to have a long load time.

Steps to securing Amazon EC2+EBS

I have just installed a Fedora Linux AMI on Amazon EC2, from the Amazon collection, and I plan to connect it to EBS storage. Assume I have done nothing more than the most basic steps: no passwords changed, nothing extra done at this stage beyond the above.
Now, from this point, what steps should I take to stop the hackers and secure my instance/EBS?
Actually there is nothing different here from securing any other Linux server.
At some point you will need to create your own image (AMI). The reason is that changes you make to a running instance of an existing AMI will be lost if the instance goes down (which could easily happen, as Amazon doesn't guarantee that an instance will stay active indefinitely). Even if you use EBS for data storage, you would otherwise have to repeat the same mundane OS-configuration tasks every time the instance goes down. You may also want to stop and restart your instance at certain times, or start more than one of them in case of peak traffic.
You can read instructions for creating your image in the documentation. Regarding security, you need to be careful not to expose your certificate files and keys: if you do, a cracker could use them to start new instances that you will be charged for. Thankfully the process is very safe, and you only need to pay attention to a couple of points:
Start from an image you trust. Users are allowed to create public images for everyone to use, and they could, either by mistake or on purpose, have left a security hole that allows someone to steal your identifiers. Starting from an official Amazon AMI, even if it lacks some of the features you require, is always a wise solution.
In the process of creating an image, you will need to upload your certificates to a running instance. Upload them to a location that isn't bundled into the image (/mnt or /tmp). Leaving them in the image is insecure, since you may need to share the image in the future. Even if you never plan to do so, a cracker could exploit a security fault in the software you're using (OS, web server, framework) to gain access to your running instance and steal your credentials.
If you are planning to create a public image, make sure you leave no trace of your keys/identifiers in it (in the shell's command history, for example).
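For illustration, bundling from a running instance looks roughly like this with the AMI tools (the account number and key/cert file names are placeholders). Note that ec2-bundle-vol excludes /mnt and /tmp by default, which is exactly why the keys should live there:

    # Keys were uploaded to /mnt, which the bundle excludes by default
    ec2-bundle-vol -d /mnt \
        -k /mnt/pk-XXXXXXXX.pem \
        -c /mnt/cert-XXXXXXXX.pem \
        -u 111122223333 \
        -r x86_64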
What we did at work is make sure that servers could be accessed only with a private key; no passwords. We also disabled ping, so that anyone out there pinging for servers would be less likely to find ours. Additionally, we blocked port 22 from anything outside our network IP, with the exception of a few IT personnel who might need access from home on the weekends. All other non-essential ports were blocked.
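Concretely, the key-only SSH part is a couple of lines in /etc/ssh/sshd_config (followed by a restart of sshd), and inbound ping can be dropped with iptables; treat these as sketches to adapt, not a complete hardening script:

    # /etc/ssh/sshd_config -- keys only, no password logins
    PasswordAuthentication no
    ChallengeResponseAuthentication no

    # Drop inbound ping (ICMP echo requests)
    iptables -A INPUT -p icmp --icmp-type echo-request -j DROP

The port-22 restriction itself is best done in the EC2 security group, limited to your office IP range rather than 0.0.0.0/0.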
If you have more than one EC2 instance, I would recommend finding a way to ensure that intercommunication between servers is secure. For instance, you don't want server B to get hacked too just because server A was compromised. There may be a way to block SSH access from one server to another, but I have not personally done this.
What makes securing an EC2 instance more challenging than an in-house server is the lack of your corporate firewall. Instead, you rely solely on the tools Amazon provides you. When our servers were in-house, some weren't even exposed to the Internet and were only accessible within the network because the server just didn't have a public IP address.
