Do I need the garbage collector when I delete an object from a branch via the API? - lakeFS

Do I need a garbage collector in lakeFS when I delete an object from a branch via the API (using the appropriate method, of course)?
Do I understand correctly that the garbage collector is only needed for objects that are deleted by a commit, and that those objects are soft-deleted (by the commit)? And that if I use the delete API method, the object is hard-deleted and I don't need to invoke the garbage collector?

lakeFS manages versions of your data, so a deletion only affects subsequent versions. The object itself remains and can still be accessed through an older version.
Garbage collection is what removes the underlying files. Once the underlying file is gone, its key is still visible in older versions, but trying to access the object itself returns HTTP status code 410 Gone.
For full information, please see the Garbage collection docs.
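For illustration, here is a minimal C# sketch of that flow. The server address, repository, branch, path, commit id, and credentials are all made up, and the endpoint paths follow my reading of the lakeFS OpenAPI deleteObject/getObject operations - verify them against your server's API docs before relying on them.

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class LakeFsDeleteSketch
{
    static async Task Main()
    {
        // Hypothetical endpoint and credentials (lakeFS uses basic auth with an access key pair).
        var client = new HttpClient { BaseAddress = new Uri("http://localhost:8000/api/v1/") };
        var creds = Convert.ToBase64String(Encoding.ASCII.GetBytes("ACCESS_KEY_ID:SECRET_ACCESS_KEY"));
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", creds);

        // Delete the object from the branch head. Older commits still reference it.
        // (URL-encode the path if it contains special characters.)
        var deleted = await client.DeleteAsync(
            "repositories/my-repo/branches/main/objects?path=datasets/file.parquet");
        Console.WriteLine($"delete: {(int)deleted.StatusCode}");

        // Reading the same path from an older commit keeps working until garbage
        // collection removes the underlying file; after GC, expect 410 Gone.
        var old = await client.GetAsync(
            "repositories/my-repo/refs/abc123commit/objects?path=datasets/file.parquet");
        Console.WriteLine($"read old version: {(int)old.StatusCode}");
    }
}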

Related

FileSystemPersistentAcceptOnceFileListFilter: flushing policy that differs from flushOnUpdate?

For my spring-integration setup, please see this question, which I posted yesterday. The relevant point is that I am using FileSystemPersistentAcceptOnceFileListFilter; the metadataStore is PostgreSQL.
So far, the filter works as expected; filenames and timestamps are properly archived in the metadataStore.
My concern now is that over time the metadataStore will grow, and that without a flushing policy it will grow unbounded. The built-in flushing support for FileSystemPersistentAcceptOnceFileListFilter seems rather limited: i.e., you can request that the metadataStore be flushed whenever it is updated. I fear, however, that that policy may result in missed files for my use case.
Is there any way to support a different flushing policy for the metadataStore within my Spring app? I suppose one option is to use the Spring scheduler and just periodically flush records in the metadataStore with a timestamp beyond a certain age. But I was really hoping there might be a pre-packaged way to do this in Spring.
I think you have misunderstood the meaning of Flushable - it has nothing to do with expiring store entries. It is for stores that keep data in memory, such as the PropertiesPersistingMetadataStore; by default, that store only writes its entries to disk when the application context is closed normally, whereas flushing on each update persists to disk whenever the store changes.
There is no out-of-the-box mechanism for removing old entries from metadata stores.

Xamarin Realm Invalidate Method

A Realm holds a read lock on the version of the data accessed by it, so that changes made to the Realm on different threads do not modify or delete the data seen by this Realm. Calling this method releases the read lock, allowing the space used on disk to be reused by later write transactions rather than growing the file.
Is there a matching function in Xamarin.Realm like Objc/Swift's RLMRealm invalidate?
If not, is this a backlog item, or is it not required with the C# wrapper?
I think calling Realm.Refresh() would be a workaround - it will cause the Realm instance to relinquish the read lock it holds at the moment and move to the latest version, which would free up the old version for compaction.
Ordinarily moving the read lock to the latest version would happen automatically if the thread you run on has a running CFRunLoop or ALooper, but on a dedicated worker thread you'd be responsible for calling Refresh() on your own to advance the read lock.
Please open an issue on https://github.com/realm/realm-dotnet for Invalidate() if Refresh() doesn't work for you.
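As an illustration, a minimal sketch of the Refresh() approach on a dedicated worker thread. The class, the work loop, and the sleep interval are made up for the example; the exact Realm .NET surface may differ slightly between SDK versions.

using System;
using System.Threading;
using Realms;

class WorkerSketch
{
    static void Main()
    {
        new Thread(() =>
        {
            // Each thread needs its own Realm instance.
            using (var realm = Realm.GetInstance())
            {
                while (true)
                {
                    DoSomeWork(realm);

                    // No run loop on this worker thread, so advance the read lock manually;
                    // this lets Realm reclaim the space held by older versions.
                    realm.Refresh();

                    Thread.Sleep(TimeSpan.FromSeconds(1));
                }
            }
        }).Start();
    }

    static void DoSomeWork(Realm realm) { /* read/write as needed */ }
}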
I think you would use Realm.Close(). See:
https://realm.io/docs/xamarin/latest/api/class_realms_1_1_realm.html#a7f7a3199c392465d0767c6506c1af5b4
Closes the Realm if not already closed. Safe to call repeatedly. Note that this will close the file. Other references to the same database on the same thread will be invalidated.

How to fix this Realm Exception?

After deleting some items from my DB I get this -> Realms.RealmInvalidObjectException: This object is detached. Was it deleted from the realm?
In Realm Xamarin, you have to use RealmResults notifications to be notified when there is a change to your database.
Because Realm is zero-copy and the objects you obtain from it are just proxies to the underlying database, deleting an object on any thread deletes it from the latest snapshot that every other thread sees.
So it's best if you always make sure you're notified of changes in your result set and update the UI accordingly, and handle the case when your objects could have been deleted due to some operation (by making sure they're still valid).
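A rough sketch of that pattern is below. The Dog model and presenter class are hypothetical, and the notification callback signature varies between Realm .NET versions; older Xamarin-era releases pass an extra Exception parameter, as shown here.

using System;
using Realms;

public class Dog : RealmObject
{
    public string Name { get; set; }
}

public class DogListPresenter
{
    private IDisposable _token;

    public void Start(Realm realm)
    {
        var dogs = realm.All<Dog>();

        // Keep the UI in sync with the latest snapshot instead of holding on
        // to objects that may have been deleted on another thread.
        _token = dogs.SubscribeForNotifications((sender, changes, error) =>
        {
            // changes == null is the initial notification; afterwards it lists
            // inserted/modified/deleted indices.
            RefreshUi(sender);
        });
    }

    public void ShowDetails(Dog dog)
    {
        // A deleted object throws RealmInvalidObjectException when accessed,
        // so guard with IsValid before touching its properties.
        if (dog.IsValid)
            Console.WriteLine(dog.Name);
    }

    public void Stop() => _token?.Dispose();

    private void RefreshUi(IRealmCollection<Dog> dogs) { /* rebind the list, etc. */ }
}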

AppFabric and CachingPolicy/ChangeMonitors

We're investigating moving to a distributed cache using Windows AppFabric. Our ASP.NET 4.0 application currently has a cache implementation that uses MemoryCache.
One key feature is that when items are added to the cache, a CacheItemPolicy is included that contains a ChangeMonitor:
CacheItemPolicy policy = new CacheItemPolicy();
policy.Priority = CacheItemPriority.Default;
policy.ChangeMonitors.Add(new LastPublishDateChangeMonitor(key, item, GetLastPublishDateCallBack));
The change monitor internally uses a timer to periodically trigger the delegate passed into it - which is usually a method to get a value from a DB for comparison.
The policy and its change monitor are then included when an item is added to the cache:
Cache.Add(key, item, policy);
An early look at AppFabric's DataCache class seems to indicate that whilst a TimeSpan can be included when adding items to the cache, a CacheItemPolicy itself can't be.
Is there another way to implement the same ChangeMonitor-type functionality in AppFabric? Notifications, perhaps?
Cheers
Neil
There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.
Phil Karlton
Unfortunately AppFabric has no support for this sort of monitoring to invalidate a cached item, and similarly no support for things like SqlCacheDependency.
However, AppFabric 1.1 brought in support for read-through and write-behind. Write-behind means that your application updates the cached data first rather than the underlying database, so that the cache always holds the latest version (and therefore the underlying data doesn't need to be monitored); the cache then updates the underlying database asynchronously. To implement read-through/write-behind, you'll need to create an object that inherits from DataCacheStoreProvider (MSDN) and write Read, Write and Delete methods that understand the structure of your database and how to update it.
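If read-through/write-behind is more than you need, the TimeSpan overload mentioned in the question can approximate the ChangeMonitor when combined with explicit invalidation from whatever code path changes the underlying data. A rough sketch (cache configuration, key names, and the TTL are assumptions, not part of the answer above):

using System;
using Microsoft.ApplicationServer.Caching;

class CatalogCache
{
    // Assumes a dataCacheClient section is configured in app.config/web.config.
    private readonly DataCache _cache = new DataCacheFactory().GetDefaultCache();

    public void AddItem(string key, object item)
    {
        // No CacheItemPolicy/ChangeMonitor in AppFabric: cap staleness with a TTL...
        _cache.Add(key, item, TimeSpan.FromMinutes(5));
    }

    public void OnLastPublishDateChanged(string key)
    {
        // ...and evict eagerly whenever the last-publish date changes, so readers
        // fall back to the database on their next lookup.
        _cache.Remove(key);
    }

    public object GetItem(string key)
    {
        return _cache.Get(key);
    }
}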

How to emulate shm_open on Windows?

My service needs to store a few bits of information (at least 20 bits or so, but I can easily make use of more) such that:
it persists across service restarts, even if the service crashed or was otherwise terminated abnormally
it does not persist across a reboot
it can be read and updated with very little overhead
If I store this information in the registry or in a file, it will not get automatically emptied when the system reboots.
Now, if I were on a modern POSIX system, I would use shm_open, which would create a shared memory segment which persists across process restarts but not system reboots, and I could use shm_unlink to clean it up if the persistent data somehow got corrupted.
I found MSDN: Creating Named Shared Memory and started reimplementing pieces of it within my service; this basically uses CreateFileMapping(INVALID_HANDLE_VALUE, ..., PAGE_READWRITE, ..., "Global\\my_service") instead of shm_open("/my_service", O_RDWR | O_CREAT, 0600).
However, I have a few concerns, especially centered around the lifetime of this pagefile-backed mapping. I haven't found answers to these questions in the MSDN documentation:
Does the mapping persist across reboots?
If not, does the mapping disappear when all open handles to it are closed?
If not, is there a way to remove or clear the mapping? Doesn't need to be while it's in use.
If it does persist across reboots, or does disappear when unreferenced, or is not able to be reset manually, this method is useless to me.
Can you verify or find faults in these points, and/or recommend a different approach?
If there were a directory that were guaranteed to be cleaned out upon reboot, I could save data in a temporary file there, but it still wouldn't be ideal: under certain system loads, we are encountering file open/write failures (rare, under 0.01% of the time, but still happening), and this functionality is to be used in the logging path. I would like not to introduce any more file operations here.
The shared memory mapping will not persist across reboots, and it will disappear when all of its handles are closed. A file-mapping object is a kernel object - kernel objects are always deleted when the last reference to them goes away, either explicitly via CloseHandle or when the process holding the reference exits.
Try creating a registry key with RegCreateKeyEx with REG_OPTION_VOLATILE - the data will not be preserved when the corresponding hive is unloaded. That happens at system shutdown for HKLM, or at user logoff for HKCU.
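For a managed service that would look roughly like the sketch below; the key path and value name are made up, and a native service would simply call RegCreateKeyEx with REG_OPTION_VOLATILE instead.

using Microsoft.Win32;

static class VolatileState
{
    // A volatile key lives only until its hive is unloaded, i.e. until reboot for HKLM.
    private const string SubKeyPath = @"SOFTWARE\MyService\VolatileState";

    public static void Save(int bits)
    {
        // Writing under HKLM assumes the service runs with sufficient privileges (e.g. LocalSystem).
        using (var key = Registry.LocalMachine.CreateSubKey(
            SubKeyPath, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryOptions.Volatile))
        {
            key.SetValue("State", bits, RegistryValueKind.DWord);
        }
    }

    public static int Load()
    {
        using (var key = Registry.LocalMachine.OpenSubKey(SubKeyPath))
        {
            // A missing key means this is the first run since boot (or nothing was ever written).
            return key == null ? 0 : (int)key.GetValue("State", 0);
        }
    }
}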
Sounds like maybe you want serialization instead of shared memory? If that is indeed appropriate for your application, the way you serialize will depend on your language. If you're using C++, check out Boost.Serialization. C# undoubtedly has lots of serialization options (like Java), if that's what you're using.
