List of cached objects (including primary/secondary location) by key in AppFabric cache

I have AppFabric installed and working great caching my ASP.NET sessions. I have 3 W2k8 Enterprise servers as my cache hosts. I created my cache with the Secondaries=1 option. I'm trying to test the High Availability option. To do this, I would like to log in to my website, find the cache server that has my session, and unplug it from the network (simulating a server crash). If I can still work as a logged-in user, I can prove that High Availability is working and that the secondary copy of my session was promoted.
How can I see a list of the objects in the cache and where the primary/secondary objects "live"?

The Get-Cache PowerShell command can show you the caches running in a cluster and where their objects (and regions) are located.

Use this code to enumerate all cached objects. Be careful, though: depending on your cache size, dumping every object can take a significant amount of time:
foreach (var regionName in cache.GetSystemRegions())
{
    foreach (KeyValuePair<string, object> cacheItem in cache.GetObjectsInRegion(regionName))
    {
        // TODO: process cacheItem.Key and cacheItem.Value
    }
}
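For reference, the cache variable above is a DataCache client. A minimal sketch of obtaining one (the cache name "default" is an assumption; the cluster endpoints come from the client's DataCacheFactory configuration):
using Microsoft.ApplicationServer.Caching;

// Minimal sketch: obtain a DataCache client from the factory.
// "default" is an assumed cache name; replace it with your own.
DataCacheFactory factory = new DataCacheFactory();
DataCache cache = factory.GetCache("default");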

Related

Clean Up Azure Machine Learning Blob Storage

I manage a frequently used Azure Machine Learning workspace with several experiments and active pipelines. Everything is working well so far. My problem is getting rid of old data from runs, experiments and pipelines. Over the last year the blob storage grew to an enormous size, because every pipeline run's data is stored.
I have deleted older runs from experiments by using the GUI, but the actual pipeline data on the blob store is not deleted. Is there a smart way to clean up data on the blob store from runs which have been deleted?
On one of the countless Microsoft support pages, I found the following not very helpful post:
Azure does not automatically delete intermediate data written with OutputFileDatasetConfig. To avoid storage charges for large amounts of unneeded data, you should either:
- Programmatically delete intermediate data at the end of a pipeline run, when it is no longer needed
- Use blob storage with a short-term storage policy for intermediate data (see Optimize costs by automating Azure Blob Storage access tiers)
- Regularly review and delete no-longer-needed data
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-move-data-in-out-of-pipelines#delete-outputfiledatasetconfig-contents-when-no-longer-needed
Any ideas are welcome.
Have you tried applying an Azure Storage account management policy to the storage account in question?
You could either move blobs through the tiers (hot -> cool -> archive) and thereby reduce costs, or configure an auto-delete policy after a set number of days.
Reference: https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview#sample-rule
If you use Terraform to manage your resources, this is available as well.
Reference: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/storage_management_policy
resource "azurerm_storage_management_policy" "example" {
storage_account_id = "<azureml-storage-account-id>"
rule {
name = "rule2"
enabled = false
filters {
prefix_match = ["pipeline"]
}
actions {
base_blob {
delete_after_days_since_modification_greater_than = 90
}
}
}
}
A similar option is available via the portal settings as well.
Hope this helps!
Currently facing this exact problem. The most sensible approach is to enforce retention schedules at the storage account level. These are the steps you can follow:
Identify which storage account is linked to your AML instance and pull it up in the Azure portal.
Under Settings / Configuration, ensure you are using StorageV2 (which has the desired functionality).
Under Data management / Lifecycle management, create a new rule that targets your problem containers.
NOTE - I do not recommend a blanket enforcement policy against the entire storage account, because any registered datasets, models, compute info, notebooks, etc. will all be targeted for deletion as well. Instead, use the prefix arguments to declare relevant paths, such as: storageaccount1234 / azureml / ExperimentRun
Here is the documentation on Lifecycle management:
https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview?tabs=azure-portal
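If a one-off cleanup is wanted in addition to (or instead of) a lifecycle rule, the same prefix-targeting idea can be applied programmatically. A rough sketch with the Azure.Storage.Blobs .NET SDK; the connection string, container name ("azureml"), prefix and age cutoff are assumptions, not values from the question:
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Rough sketch: delete blobs under a given prefix that are older than a cutoff.
// The connection string, container name and prefix below are placeholders.
var container = new BlobContainerClient("<storage-connection-string>", "azureml");
DateTimeOffset cutoff = DateTimeOffset.UtcNow.AddDays(-90);

foreach (BlobItem blob in container.GetBlobs(prefix: "ExperimentRun/"))
{
    if (blob.Properties.LastModified < cutoff)
    {
        container.DeleteBlobIfExists(blob.Name);
    }
}
Unlike a lifecycle rule, this deletes immediately, so test it against a non-production container or a narrow prefix first.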

Local Cache contents get updated during execution using AppFabric caching services

We are using AppFabric Caching services with local cache enabled.
The in-memory operations performed on data acquired from the cache seem to be saved in the local cache, without our explicitly placing the updated objects back in the cache.
From what I've read in the documentation, the local cache just holds a local copy, and there are mechanisms to invalidate the local cache as well.
What can I do to override this local-cache behavior? (My initial understanding was that local-cache contents are read-only, which does not seem to be the case here.)
This is the configuration being used:
DataCacheFactoryConfiguration configuration = new DataCacheFactoryConfiguration();
configuration.TransportProperties.MaxBufferSize = int.MaxValue;
configuration.TransportProperties.MaxBufferPoolSize = long.MaxValue;
configuration.MaxConnectionsToServer = MaxConnectionsToServer;
DataCacheServerEndpoint server = new DataCacheServerEndpoint(host, port);
List<DataCacheServerEndpoint> servers = new List<DataCacheServerEndpoint>();
servers.Add(server);
configuration.Servers = servers;
DataCacheLocalCacheProperties localCacheProperties = new DataCacheLocalCacheProperties(MaxObjectCount, LocalCacheTimeout, DataCacheLocalCacheInvalidationPolicy.TimeoutBased);
configuration.LocalCacheProperties = localCacheProperties;
How can I override this behavior of the local cache? (Not using the local cache is not an option, due to the large number of read operations going on.)
Thanks in advance,
I think this is explained in the documentation (emphasis added):
Locally cached objects are stored within the same process space as the cache client process. When a cache client requests a locally cached object, the client receives a reference to the locally cached object rather than a copy.
and
After objects are stored in the local cache, your application continues to use those objects until they are invalidated, regardless of whether those objects are updated by another client on the cache cluster. For this reason, it is best to use local cache for data that changes infrequently.
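In other words, with local cache enabled, Get hands back the very instance the local cache holds, so mutating it in place also mutates the locally cached copy. One way to avoid that is to work on a private copy and call Put explicitly when the change should reach the cluster. A minimal sketch; MyDto, IncrementCounter and the key are hypothetical and used only for illustration:
using System;
using Microsoft.ApplicationServer.Caching;

// Hypothetical DTO, shown only to illustrate the copy-before-modify pattern.
[Serializable]
public class MyDto
{
    public int Counter { get; set; }
    public MyDto() { }
    public MyDto(MyDto other) { Counter = other.Counter; }  // copy constructor
}

public static class CacheUsageSketch
{
    // "cache" would be the DataCache built from the configuration in the question.
    public static void IncrementCounter(DataCache cache)
    {
        MyDto shared = (MyDto)cache.Get("some-key");   // same instance the local cache holds

        MyDto privateCopy = new MyDto(shared);         // deep-copy before touching anything
        privateCopy.Counter++;                         // mutate the copy, not the cached reference

        cache.Put("some-key", privateCopy);            // publish the change to the cluster explicitly
    }
}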

Local mongo server with mongolab mirror & fallback

How do I set up a local MongoDB instance with a mirror on MongoLab (propagating all writes from local to MongoLab so they are always synchronized; I don't care about atomicity, just that it syncs within a reasonable time frame)?
How do I use MongoLab as a fallback if the local server stops working (Ruby/Rails, mongo driver and Mongoid)?
Background: I used to have a local mongo server, but it kept crashing occasionally, all my apps stopped working, and I had to "repair" the DB to restart it. Then I switched to MongoLab, which I am very satisfied with, but it generates a lot of traffic that I'd like to reduce by having a local "cache", without having to worry about my local cache crashing and taking all my apps down. The DBs are relatively small, so size is not an issue. I'm not trying to eliminate the traffic overhead of communicating with MongoLab, just lower it a bit.
I'm assuming you don't want the MongoLab instance to just be part of a replica set (or perhaps that is not offered). The easiest way would be to add the remote mongod instance as a hidden member (priority 0) and have it replicate data from your local instance.
An alternative, more immediate solution is mongooplog, which can be used to poll the oplog on one server and then apply it to another: essentially replication on demand (you would need to seed one instance appropriately and manage any failures). More information here:
http://docs.mongodb.org/manual/reference/mongooplog/
The last option would be to write something yourself using a tailable cursor in your language of choice to feed the oplog data into the remote instance.
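The question's stack is Ruby/Mongoid, but the tailable-cursor idea looks much the same in any driver. A rough sketch (shown here in C# with the official MongoDB .NET driver, the language used elsewhere on this page); it assumes the local mongod runs as a single-node replica set so an oplog exists, only forwards inserts, does not track the last-applied timestamp, and uses a placeholder MongoLab connection string:
using MongoDB.Bson;
using MongoDB.Driver;

public static class OplogForwarderSketch
{
    public static void Run()
    {
        // Assumes the local mongod was started with --replSet so local.oplog.rs exists.
        var local = new MongoClient("mongodb://localhost:27017");
        var remote = new MongoClient("<mongolab-connection-string>");  // placeholder

        var oplog = local.GetDatabase("local").GetCollection<BsonDocument>("oplog.rs");
        var options = new FindOptions<BsonDocument>
        {
            CursorType = CursorType.TailableAwait,
            NoCursorTimeout = true
        };

        using (var cursor = oplog.FindSync(FilterDefinition<BsonDocument>.Empty, options))
        {
            while (cursor.MoveNext())
            {
                foreach (var entry in cursor.Current)
                {
                    // "i" = insert. A real mirror would also handle updates/deletes
                    // and remember the last applied "ts" value to resume after restarts.
                    if (entry["op"] == "i")
                    {
                        var ns = entry["ns"].AsString;                 // "database.collection"
                        var parts = ns.Split(new[] { '.' }, 2);
                        remote.GetDatabase(parts[0])
                              .GetCollection<BsonDocument>(parts[1])
                              .InsertOne(entry["o"].AsBsonDocument);
                    }
                }
            }
        }
    }
}
This only covers the one-way mirroring half; the fallback half would still need handling in the application (or by using a real replica set), as noted above.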

Azure Local Cache versus Distributed Cache

For the past few days I've been working with Azure Caching. Good practice is to use the local cache functionality to prevent round trips to the distributed cache.
As can be read in the documentation, when you make a call to dataCache.Get(...), the application first checks whether a version is available in the local cache and, if not, the object is retrieved from the distributed cache.
The problem is that the local version can be older than the distributed version of the object. To solve this problem, the method dataCache.GetIfNewer(...) is available, which checks whether the version of the local object differs from the distributed version and, if it does, returns the new object.
So far, so good... now my questions. I've created two separate applications (app A and app B) to test the Azure Caching mechanism. Both applications run in two different (physical) locations, so they each have their own local cache, but they both use the same distributed cache.
Is it true that something has changed in the process of invalidating the local cache? I've tested the following scenario and found out that the local cache is updated automatically:
App A stores the value "123" in the distributed cache using the key "CacheTest"
App B uses the dataCache.Get(...) method to retrieve the object for the key "CacheTest"; it cannot be found in the local cache, so it is retrieved from the distributed cache and returned with the value "123"
App A changes the object with key "CacheTest" to the value "456"
App B uses the dataCache.Get(...) method to (again) retrieve the object. Because the object should be in the local cache, I would expect the value "123", but it returns the new value "456"!
How strange is that? Has something changed in Azure Caching lately? And yes... I'm sure that I've turned on local caching, and yes, I've set the time-out on the local cache to 3600 seconds (1 hour).
Can somebody confirm that Azure Caching has been changed?
Edit for Nick:
So what you're saying is that the following lines of code, which I found on a Dutch Microsoft site, are nonsense? If the local cache is updated automatically, there's no need to call the GetIfNewer method: http://www.dotnetmag.nl/Artikel/1478/Windows-Azure-AppFabric-Caching
/// <summary>
/// Ensures that the newest version of a cached object is returned
/// from either the local cache or the distributed cache.
/// </summary>
public TCachedObjectType GetNewest<TCachedObjectType>(string key)
    where TCachedObjectType : class
{
    DataCacheItemVersion version;

    // Gets the cached object from the local cache if it exists.
    // Otherwise gets it from the distributed cache and adds it to the local cache.
    object cachedObject = cache.Get(key, out version);

    // Gets the cached object from the distributed cache if it is newer
    // than the given version. If newer, it is also added to the local cache.
    object possiblyNewerCachedObject = cache.GetIfNewer(key, ref version);

    if (possiblyNewerCachedObject != null)
    {
        // The object from the distributed cache is newer than the local copy.
        cachedObject = possiblyNewerCachedObject;
    }

    return cachedObject as TCachedObjectType;
}
If Azure Caching behaves the same as AppFabric (codename "Velocity"), then the behaviour you describe is expected. With local caching enabled, when a given node requests a cache item, it asks the distributed cache what the current version is.
If the locally cached version matches the distributed version, it returns the data from the local cache. If not, it retrieves the latest value from the distributed cache, caches it locally and then returns it.
The idea is that if any node updates a key, all nodes are still guaranteed to get the latest version, even if AppFabric had already cached it locally. The distributed cache keeps track of the latest versions and where their data is stored.

When is OpenAFS cache cleared?

Let's say I have a bunch of users who all access the same set of files, that have permission system:anyuser. User1 logs in and accesses some files, and then logs out. When User2 logs in and tries to access the same files, will the cache serve the files, or will it be cleared between users?
The cache should serve the files (in the example above).
How long a file persists in the OpenAFS cache manager depends on how the client is configured; variables include the configured size of the cache, whether or not the memcache feature is enabled, and how "busy" the client is.
If OpenAFS memcache (cache chunks stored in RAM) is enabled, the cache is cleared upon reboot. With the more traditional disk cache, the cache can persist across reboots. Aside from that key difference, files persist in the cache following the same basic rules: the cache is a fixed-size store; recently accessed files stay in the cache, and older files are purged as needed when newer files are requested.
More details are available in the OpenAFS wiki:
http://wiki.openafs.org/
