I have a .NET project that I am trying to deploy as a worker role in Azure. I am able to publish directly from Visual Studio, but when the worker role runs I get an uncaught exception. I am attempting to enable logging to Azure Storage from the worker role so I can get more information on the exception, but I am running into issues getting it configured. Can anyone advise on the best way to enable this logging?
I'm not a massive fan of the recommended Azure Worker Role logging process, namely using the Trace.WriteLine() method, as I don't feel it provides sufficient flexibility for my logging needs, and I think it looks crap when my code is liberally scattered with Trace.WriteLine() statements; code is art and all that. I also don't like that Trace statements aren't always logged and can be 'lost' if the Worker Role hiccups or generally goes astray.
I therefore came up with an approach that writes log files to local storage via NLog, which are then flushed to Azure Storage on a schedule. Works like a charm.
I've got it documented over in the blog post at: https://modhul.wordpress.com/2014/10/28/capturing-custom-logs-from-azure-worker-roles-using-azure-diagnostics/
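For reference, here's a minimal sketch of the idea, not the exact code from the blog post: NLog writes into a Worker Role local storage resource, and a timer periodically copies the files up to blob storage. The local resource name "CustomLogs", the "logs" container and the connection string are placeholders, and this assumes the classic Microsoft.WindowsAzure.Storage client.

    // Minimal sketch. Assumes a <LocalStorage name="CustomLogs" ... /> entry
    // in ServiceDefinition.csdef; resource and container names are placeholders.
    using System;
    using System.IO;
    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.Storage;
    using NLog;
    using NLog.Config;
    using NLog.Targets;

    public static class WorkerLogging
    {
        public static Logger Configure()
        {
            string root = RoleEnvironment.GetLocalResource("CustomLogs").RootPath;

            var fileTarget = new FileTarget
            {
                FileName = Path.Combine(root, "worker.log"),
                Layout = "${longdate}|${level:uppercase=true}|${logger}|${message}${exception:format=tostring}"
            };

            var config = new LoggingConfiguration();
            config.AddTarget("file", fileTarget);
            config.LoggingRules.Add(new LoggingRule("*", LogLevel.Debug, fileTarget));
            LogManager.Configuration = config;

            return LogManager.GetCurrentClassLogger();
        }

        // Call this from a timer (e.g. every few minutes) to flush to blob storage.
        public static void FlushToBlobStorage(string connectionString)
        {
            string root = RoleEnvironment.GetLocalResource("CustomLogs").RootPath;
            var container = CloudStorageAccount.Parse(connectionString)
                .CreateCloudBlobClient()
                .GetContainerReference("logs");
            container.CreateIfNotExists();

            foreach (string file in Directory.GetFiles(root, "*.log"))
            {
                var blob = container.GetBlockBlobReference(
                    RoleEnvironment.CurrentRoleInstance.Id + "/" + Path.GetFileName(file));

                // Open with FileShare.ReadWrite so NLog can keep writing to the file.
                using (var stream = File.Open(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                {
                    blob.UploadFromStream(stream);
                }
            }
        }
    }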
If I want to watch my log files in real-time (rather than waiting for them to be flushed to Azure Storage), I RDP into the Worker Role instance and fire up a copy of BareTail (http://www.baremetalsoft.com/baretail/), which is a great way of watching log files in real-time; it also lets you add colour-coding for errors, info, warnings etc.
I am experienced with AWS, but this is my first foray into Google Cloud and I am stuck on how to debug it properly. I am building a simple experimental setup, using Cloud Firestore to store some data, and planning to add some small API functions to query it.
I am inputting my information from a Go app, which I built using the official SDK for Go. Everything builds fine, but when I run it I see nothing other than rpc error: code = PermissionDenied desc = Missing or insufficient permissions.
I have tried setting the authentication to open in the Firestore rules console (allow read, write: if true), but I still see the same error, so it seems to be an issue with the credentials I have generated rather than Firestore itself.
The credentials in question were generated in the main Google Cloud Console, under Service Accounts. I've saved them out as a JSON file and am loading this into the app via option.WithCredentialsFile(), which is then passed into the NewFirestoreWriter() constructor.
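For context, the loading code is essentially the following sketch; "my-project-id" and "creds.json" are placeholders, and the write shown stands in for what my NewFirestoreWriter() does internally:

    package main

    import (
        "context"
        "log"

        "cloud.google.com/go/firestore"
        "google.golang.org/api/option"
    )

    func main() {
        ctx := context.Background()

        // Load the service-account key file exported from the Cloud Console.
        client, err := firestore.NewClient(ctx, "my-project-id",
            option.WithCredentialsFile("creds.json"))
        if err != nil {
            log.Fatalf("firestore.NewClient: %v", err)
        }
        defer client.Close()

        // A write like this is what fails with PermissionDenied when the
        // service account lacks a suitable role.
        _, err = client.Collection("test").Doc("doc1").Set(ctx,
            map[string]interface{}{"hello": "world"})
        if err != nil {
            log.Fatalf("write: %v", err)
        }
    }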
It's far from obvious, to me at least, exactly how to configure the permissions on the Service Account, as it seems to work quite differently from Amazon IAM. I was expecting to find a way to attach specific Firestore-related actions, but I can't find anything like that once the service account is created. Under Permissions, it looks like I can associate other accounts with the service account, which seems to be the opposite of what I want to do. Or do I need to assume another identity once I have the service account in order to do anything, à la Amazon STS? Or am I barking up the wrong tree here?
I am running locally while I am playing with the apps, planning to think about deployment later.
I guess my questions are:
Should I be using a different form of credential when making programmatic writes to Firestore?
What permissions need to be on the credential that I am using?
How do the Google Service Account permissions interact with the Firestore access rules, or are they completely separate?
Thanks in advance for your help.
I finally worked out the answer. Turns out I was reading some of the screens too fast...
The programmatic approach with the credential was fine, but the service account setup was not.
In case anyone else has a similar issue, the fix was to:
Go to "Access" under IAM (NOT identity). Coming from AWS this confused me a little because I was expecting roles to be a sublevel to identity rather than a seperate level
Click the Edit button next to the service account
Add the Cloud Datastore User and Cloud Datastore Owner roles (I'll work on trimming down the permissions now that it's working!). This confused me in particular because I was looking for "Firestore" or "Cloud Firestore", and the very similarly named "Cloud Filestore" tripped me up.
After a few seconds, it started working.
According to https://cloud.google.com/firestore/docs/reference/libraries#server_client_libraries,
"In this environment, requests are not evaluated against your Firestore security rules"
So I reset my access permissions in Firebase back to allow read, write: if false.
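For anyone else resetting their rules, that one-liner sits inside the standard rules scaffolding; this is the generic locked-down form rather than anything specific to my project:

    service cloud.firestore {
      match /databases/{database}/documents {
        match /{document=**} {
          allow read, write: if false;
        }
      }
    }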
I searched for about a week for a way to monitor different Windows events, for example the SQL event (or service), in AzureRM virtual machines.
I tried different Log Analytics queries, runbooks, PowerShell scripts to connect to the VM, etc., but nothing I tried works.
Do you have any suggestions?
The solution should notify me when a Windows service stops.
Best regards!
Azure Monitor, which now includes Log Analytics and Application Insights, provides sophisticated tools for collecting and analyzing telemetry that allow you to maximize the performance and availability of your cloud and on-premises resources and applications. It helps you understand how your applications are performing and proactively identifies issues affecting them and the resources they depend on.
Activity log alerts are alerts that get activated when a new activity log event occurs that matches the conditions specified in the alert. You can follow this document to set up an activity log alert: https://learn.microsoft.com/en-us/azure/monitoring-and-diagnostics/alert-activity-log?toc=/azure/azure-monitor/toc.json
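Since you mention trying Log Analytics queries: if the Log Analytics agent is collecting the System event log from the VM, a query along these lines can back a log alert whenever a service stops. This is illustrative, using the classic Event table and Service Control Manager event ID 7036 ("the ... service entered the stopped state"):

    Event
    | where EventLog == "System" and Source == "Service Control Manager"
    | where EventID == 7036 and RenderedDescription contains "stopped"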
I have a web role with a co-located cache. There are two instances of this role.
Even when there is a cache hit, the turn-around time for our request measures a few seconds. Upon analysis we found that the time taken by the cache to come back with data is 1 second on average. However, IIS logs suggest that the overall servicing of the request takes about 4 seconds, and there is no intermediate operation before or after the data retrieval from cache.
What could be wrong here? What would be a good way to analyse the problem?
For what it's worth, we were having a similar problem with caching in Redis in Azure and a RESTful API.
The problem turned out to be the serialization of data.
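If you want to check whether you're in the same boat, a quick sketch like this separates the network round-trip from the deserialization cost. It's illustrative only: it assumes Json.NET, and the raw-fetch delegate stands in for whatever cache client you use.

    using System;
    using System.Diagnostics;
    using Newtonsoft.Json;

    public static class CacheTimings
    {
        public static T GetTimed<T>(Func<string, byte[]> cacheGetRaw, string key)
        {
            var sw = Stopwatch.StartNew();
            byte[] raw = cacheGetRaw(key);              // network + server time
            long fetchMs = sw.ElapsedMilliseconds;

            sw.Restart();
            T value = JsonConvert.DeserializeObject<T>(
                System.Text.Encoding.UTF8.GetString(raw)); // serialization time
            long deserializeMs = sw.ElapsedMilliseconds;

            Trace.TraceInformation("fetch={0}ms deserialize={1}ms", fetchMs, deserializeMs);
            return value;
        }
    }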
Some ways to debug the problem:
Download the ANTS profiler (it has a free trial) and profile your worker role locally.
Enable profiling for your worker role, deploy it, run it for a bit, then download the profile file in Visual Studio. (You can use Server Explorer to find your instance and download the log).
Download the Azure toolkit (http://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx) onto your instance. It has things like Process Explorer that can tell you how much memory your role is taking, how much CPU it's using, what it's doing on the network, etc.
You can contact Azure support and have them help you profile your application. We did that and got absolutely amazing support. They talked with us on the phone for hours and helped us profile our code.
You really should increase the log level for client and server (see In-Role Cache Troubleshooting and Diagnostics (Windows Azure Cache)) and take a look at the performance counters. If read operations (GET) are taking a long time, there may be paging on one of the instances, or the server may be overloaded. If you see a performance issue on the cache instances, you should reassess capacity using Capacity Planning Considerations for In-Role Cache (Windows Azure Cache).
If this doesn't help then please open a support ticket.
I'm using the role-based caching feature for a Windows Azure web role, configured as co-located. I've followed the steps given in the Windows Azure docs for Caching (Preview), but I get the following error:
ErrorCode <ERRCA0017>:SubStatus<ES0006>:There is a temporary failure.
Please retry later. (One or more specified cache servers are
unavailable, which could be caused by busy network or servers. For
on-premises cache clusters, also verify the following conditions.
Ensure that security permission has been granted for this client
account, and check that the AppFabric Caching Service is allowed
through the firewall on all cache hosts. Also the MaxBufferSize on the
server must be greater than or equal to the serialized object size
sent from the client.). Additional Information : The client was trying
to communicate with the server: net.tcp://127.255.0.4:20010/.
I'm running everything on localhost, using local development storage, and my cache client is in the same role as the server. I've changed many configuration attributes, but I always get that exception or a similar one like "cannot connect to tcp....".
I'd appreciate some help. Thanks.
There are a couple of things that could be going wrong with your application.
The very first thing is to make sure that you have SDK 1.7 on your machine along with Windows Azure Caching Services, and then verify that your references come from Windows Azure Cache (not from the Windows Server App Fabric SDK). I have seen such misconfiguration in the past lead to exactly these errors.
Next, have you changed your dataCacheClient identifier to your role name, as described in the documentation linked here? If you follow the documentation as described you should not hit any errors, so for the sake of checking what could be wrong, you can create the exact same application as described in that link and see whether it works.
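As a reference point, the client section in web.config ends up looking something like this, where "WebRole1" is a placeholder for your actual role name:

    <dataCacheClients>
      <dataCacheClient name="default">
        <autoDiscover isEnabled="true" identifier="WebRole1" />
      </dataCacheClient>
    </dataCacheClients>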
To get a more detailed error, be sure to increase the DataCacheFactoryConfiguration.ChannelOpenTimeout value to something longer, e.g. 2 minutes instead of the default 20 seconds, as described here. This will surface the inner exception, which may lead you to the actual root cause of your problem.
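In code, that tweak is roughly the following; a sketch against the Windows Azure Caching client, where "default" refers to the dataCacheClient section above:

    using System;
    using Microsoft.ApplicationServer.Caching;

    public static class CacheClient
    {
        public static DataCache Create()
        {
            // Lengthen ChannelOpenTimeout so the underlying inner exception
            // surfaces instead of the generic "temporary failure" message.
            var config = new DataCacheFactoryConfiguration("default");
            config.ChannelOpenTimeout = TimeSpan.FromMinutes(2);

            var factory = new DataCacheFactory(config);
            return factory.GetDefaultCache();
        }
    }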
We use Azure co-located caching (not in preview anymore) as our session backer and have fairly regular outages, about once a month.
We tried using the Enterprise Library Transient Fault Handling block, but our instances still hang when caching experiences problems. I think the transient fault code would work for data caching, but for session backing there is some activity closer to the metal that we can't seem to code against.
The error codes have become more informative over the last year and go something like...
ErrorCode:SubStatus:The request timed out..
Additional Information : The client was trying to communicate with the
server: net.tcp://10.xx.xxx.xx:xxxxx/.
Our best guess so far, from experimenting and from MS support, is that each (or at least one) co-located cache role/instance needs to know about all the other instances' IPs; since Azure can destroy and re-provision instances whenever it wants, this information sometimes fails to propagate to the dependent instances. This is secret sauce for Azure, but it is not a secret when our site goes down. I'm looking for any more information on this and to see how others are working around this issue.
One possible work-around: one of our talented platform administrators found that resetting IIS on the instances and scaling up two more instances seems to help. This makes sense to me because it gives caching another chance to gather all the required info about the other instances. This is NOT CONFIRMED to solve the problem, but if it works again during the next outage it could be a valid workaround.
So say I've got an MVC app hosted in the cloud somewhere, meaning I don't have access to IIS or any infrastructure.
All I have control over is the App code itself, and what comes down to the client.
My goal
Is to collect data over time of how well the MVC app is performing in terms of response times.
Current Problems
I can get a lot of data from Google Analytics and other client-side tricks, but that won't tell me if, say, the App Pool is recycling too often.
Similarly, if I put stopwatches in the actions, that won't tell me about any delays in the App Startup (if it has to start up again).
Also, if I do put a stopwatch in the action, it doesn't take into account any delays in rendering the View. For example, even though it's bad practice, there might be a DB call being made from the View, and my action metrics won't take that into account.
My Question
So, if I want to get true metrics of how long requests are taking overall, from multiple clients and users, where are the best places to put stopwatches in the app? Or is it impossible to get true metrics from the app itself, so that I have to place counters outside of it (like in IIS)?
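For illustration, the closest I can get from inside the app is something like this global action filter (a sketch, registered via GlobalFilters.Filters.Add(new TimingFilter())); it covers action plus view rendering but still misses startup and queuing time:

    using System.Diagnostics;
    using System.Web.Mvc;

    public class TimingFilter : ActionFilterAttribute
    {
        public override void OnActionExecuting(ActionExecutingContext filterContext)
        {
            // Start the clock before the action runs.
            filterContext.HttpContext.Items["timer"] = Stopwatch.StartNew();
        }

        public override void OnResultExecuted(ResultExecutedContext filterContext)
        {
            var sw = (Stopwatch)filterContext.HttpContext.Items["timer"];
            sw.Stop();

            // OnResultExecuted fires after the view has rendered, so DB calls
            // made from the view are included in this figure.
            Trace.TraceInformation("{0} took {1}ms",
                filterContext.HttpContext.Request.RawUrl, sw.ElapsedMilliseconds);
        }
    }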
Add New Relic; it's available for free as part of the AppHarbor service - https://appharbor.com/addons/newrelic
Since you mention "in the cloud somewhere", are you using Microsoft Azure for hosting? If so, there are some great diagnostics you can log to your Azure storage with DiagnosticMonitorConfiguration.
Here's a tutorial on how to add diagnostics to your web and worker roles. You can find a full list of performance counters on MSDN
You can get everything from application requests/second, memory and CPU utilization, network adapter statistics, output cache hits/misses, request execution time, etc.
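For example, here's a rough sketch of wiring a couple of those counters up in a role's OnStart, using the classic Azure Diagnostics API; the counter names are standard Windows/ASP.NET counters and the sample/transfer periods are illustrative:

    using System;
    using Microsoft.WindowsAzure.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

            // Sample server-side counters the client can't see, e.g. request
            // rate and application restarts (app pool recycles show up here).
            config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
            {
                CounterSpecifier = @"\ASP.NET Applications(__Total__)\Requests/Sec",
                SampleRate = TimeSpan.FromSeconds(30)
            });
            config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
            {
                CounterSpecifier = @"\ASP.NET\Application Restarts",
                SampleRate = TimeSpan.FromSeconds(30)
            });

            // Push the samples to the storage account once a minute.
            config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            DiagnosticMonitor.Start(
                "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

            return base.OnStart();
        }
    }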