What open source cloud storage systems offer an append-only mode for buckets, directories, etc. - cloud-storage

I'm curious whether any cloud storage system could be configured to provide the following workflow:
Anonymous users may upload messages/files into identifiable locations which we'll call buckets.
All users should have read access to all messages/files, but no anonymous user should have permissions to modify or delete them.
Buckets have associated public keys which a moderator uses to authenticate approvals or deletions of uploads.
Unapproved messages/files are eventually culled by the system to save space.
I suspect the answer might be "Tahoe-LAFS would love for someone to implement append-only mutable files, but nobody has done so yet."

I've surveyed a number of OSS projects in the storage space and not encountered anything that would provide this workflow purely by configuration and without writing code.
While not OSS, the lowest level of Windows Azure storage is actually implemented via an append-only mechanism. A video, presentation, and whitepaper can all be found here, and the details in the whitepaper would be useful to anyone looking to implement something like this for Tahoe-LAFS or any other OSS cloud storage system.

Related

NEFilterProvider record network activity

NEFilterProvider, or more specifically its two subclasses NEFilterDataProvider and NEFilterPacketProvider, has the functionality to allow or deny network activity. However, I couldn't find any way to log the activity for debugging purposes.
I know the documentation says this:
it runs in a very restrictive sandbox. The sandbox prevents the Filter
Data Provider extension from moving network content outside of its
address space by blocking all network access, IPC, and disk write
operations.
but is there any trick to log this anyway in debug mode? Maybe using os_log or something like that?
Yes, you can use os_log and read the output in the Console app. If you want to work around the privacy restrictions (while developing/testing), use the %{public} prefix, like so...
import os.log
// ...somewhere in the provider class
os_log("something I want to log: %{public}@", someVar)
You're right, the documentation is really, really lacking for this area, other than the SimpleFirewall sample code and the WWDC video. I have an app in production using NEFilterDataProvider, but it about cost me my sanity to figure out how to put it all together. At some point I'm going to try to write some blog posts or put together a demo repo, to help create a central community resource that fills in the gaps in the documentation with hard-won knowledge.
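For reference, here is roughly how it looks in context - a minimal Swift sketch, with a made-up subclass name and log message. Note that %{public} only bypasses redaction in the Console output; it does not lift the sandbox restrictions.
import NetworkExtension
import os.log

class DemoFilterDataProvider: NEFilterDataProvider {

    override func startFilter(completionHandler: @escaping (Error?) -> Void) {
        os_log("Filter starting")
        completionHandler(nil)
    }

    override func handleNewFlow(_ flow: NEFilterFlow) -> NEFilterNewFlowVerdict {
        // While debugging, log the flow's URL unredacted so it shows up in Console.
        let url = flow.url?.absoluteString ?? "<no url>"
        os_log("New flow: %{public}@", url)
        return .allow()
    }
}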

How to detect Windows file closures locally and on network drives

I'm working on a Win32 based document management system that employs an automatic check in/check out model. The model it currently uses for tracking documents in use (monitoring the processes of the applications that open the documents) is not particularly robust so I'm researching alternatives.
Check-outs are easy, as the DocMgt application is responsible for launching the other application (Word, Adobe, Notepad, etc.) and passing it the document.
It's the automatic check-in requirement that is more difficult. When the user closes the document in Word/Adobe/Notepad, ideally the DocMgt system would be notified automatically so it can perform an automatic check-in of the updated document.
To complicate things further, the document is likely to be stored on a network drive, not a local drive.
Anyone got any tips on API calls, techniques or architectures to support this sort of functionality?
I'm not expecting a magic three-line solution; the research I've done so far leads me to believe that this is far from a trivial problem and will require some significant work to implement. I'm interested in all suggestions, whether they're for a full or partial solution.
What you describe is a common task. It is perfectly doable, though not without its share of hassle. Here I assume that the files are closed on the computer where your code can run (even if the files are stored on a mounted network share).
There exist two approaches to controlling the files when they are used: the filter and the virtual filesystem.
The filter sits in the middle, between the process and the filesystem (any filesystem: local, network, or fully virtual) and intercepts file requests that go to that filesystem. This requires that the filter code runs on the computer through which the requests pass (a requirement that seems to be met in your scenario).
The virtual filesystem is an endpoint for the requests that come from the applications. When you implement the virtual filesystem, you handle all requests, so you always fully control the lifetime of the files. As the filesystem is virtual, you are free to keep the files anywhere including the real disk (local or network) or even in the cloud.
The benefit of the filter approach is that you can control individual files that reside on real disks, while a virtual filesystem can be mounted only to a new drive letter or into an empty directory on an NTFS drive, which is not always feasible. At the same time, sitting in the middle, the filter is to some extent more restricted in what it can do, and the files can be altered while the filter is not running. Finally, filters are more complicated and potentially error-prone, as they sit in the middle and must play nicely with other filters and with endpoints.
I don't have specific recommendations, but if a separate drive letter is an option, I would recommend the virtual filesystem.
Our company developed (and continues to maintain for the new owner) two products, CBFS Filter and CBFS Connect, which let you create a filter and a virtual filesystem respectively, all in user mode. These products are used in many software titles, including some document management systems (which is close to what you do). You will find both products on their website.

How to implement a Definitive Media Library (ITIL DML)?

I would like to know some way to implement a DML based on ITIL.
Given a library of heterogeneous software, the only solution that comes to mind is to use a file system structure (with proper security and access permissions); however, this seems very simplistic, and if the library gets too big it will be hard to find the software you are searching for.
Is there any specific software for DML?
Many tools that offer CMDB management also offer DML management. Some options for this are ServiceNow and IBM's Change and Configuration Management Database.
If you are only looking for DML functionality, a binary repository manager, such as Sonatype Nexus or Artifactory, provides metadata tagging, version control, and many other useful features. Implementing a binary repository manager and proper procedures for maintaining it serves as an excellent DML solution.
There is nothing wrong with a file system for storing software in the form of completed, tested (software) configuration items that have passed the appropriate quality assurance tests, etc. But because of the controlled IT environment in ITIL, it is mandatory to establish a control structure for tracking the appropriate information for every CI or piece of software in the DML. These records have to contain all relevant information like version, build date, development and release dates, etc. Only tested, confirmed, quality-checked, and deployable CIs should be held in this system.
Because of the extensive metadata, some years ago we built a DML on top of PostgreSQL, because handling and managing the CIs, together with the mandatory tracking (time of insertion, time and logging of access, access control, possibly licenses, etc.) and metadata, is much easier in a SQL database. Of course, we had to build the structure for the metadata into the SQL DB, but that was not too complicated, and for our straightforward DML it was sufficient. Building and managing the metadata with each CI turned out to be easier for our system than installing, configuring, learning and managing a whole third-party DML/CML system. A caveat was when we had to store whole system images from deployed, tested, already integrated and checked systems, because their sizes were several hundred GB (with the newest DB versions this should now also be possible; the question is whether it is useful within a SQL DB). But we stored the disk images on separate disks and tracked the metadata in our tailored DML system (our PostgreSQL) along with the information on where to access them.
The advantage was that we could easily duplicate the DML and take it, for example, to a customer site, where we only had to set up and run our (Postgres-based) DML and were able to access all the relevant CIs we needed to set up a heterogeneous network for the target system.
In other cases it may be easier to rely on an existing third-party solution, but the idea of a DML can be fulfilled with any storage system as long as the appropriate procedures, metadata, information and access points to the overall life-cycle management are provided and maintained.
regards

Core Data cloud sync - need help with logic

I'm in the middle of brainstorming a cloud sync solution for a Core Data app that I am currently developing. I'm planning to open source the code for this once it's done, for anyone to use with their Core Data apps, so input from the community on how this system should work is much appreciated :-) Here's what I'm thinking:
Server Side
Storage Provider
As with all cloud sync systems, storage is a major piece of the puzzle. There are many ways to handle this. I could set up my own server for storage, or use a service like Amazon S3, but because I'm starting out with $0 capital, a paid storage solution isn't a viable option at this moment. After some thought, I decided to settle on Dropbox (an already well-established cloud sync application and storage provider). The pros of using Dropbox are:
It's free (for a limited amount of space)
In addition to being a storage service, it also handles cloud sync
They recently released an Objective-C SDK which makes it much easier to interface with it in Mac and iPhone apps
In case I decide to switch to a different storage provider in the future, I intend to add "services" to this cloud sync framework, basically allowing anyone to create a service class to interface with their choice of storage provider, which can then simply be plugged into the framework.
Storage Structure
This is a really difficult part to figure out, so I need as much input as I can here. I've been thinking about a structure like this:
CloudSyncFramework
    [app name]
        devices
            (device id)
                deviceinfo
                changeset
        entities
            (entity name)
                (object id)
A quick explanation of this structure:
The master "CloudSyncFramework" (name undecided) folder will contain separate folders for each app that uses the framework
Each app folder contains a devices folder and an entities folder
The devices folder will contain a folder for each device that is registered with the account. The device folder will be named according to the device ID, obtained using something like [[UIDevice currentDevice] uniqueIdentifier] (on iOS) or a serial number (on Mac OS).
Each device folder contains two files: deviceinfo and changeset. deviceinfo contains information about the device (e.g. OS version, last sync date, model, etc.) and the changeset file contains information about objects that have changed since the device last synchronized. Both files will just be simple NSDictionaries archived into files using NSKeyedArchiver.
Each Core Data entity has a subfolder under the entities folder
Under each entity folder, every object that belongs to that entity will have a separate file. This file will contain a JSON dictionary with the key-value pairs.
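To make that concrete, here is a rough Swift sketch of how a single object might get written out as a JSON file. The "syncID" attribute and the local mirror of the cloud folder layout are assumptions I haven't finalized yet.
import Foundation
import CoreData

// Sketch only: "syncID" is a hypothetical string attribute used as the file name,
// and baseURL points at a local mirror of the cloud folder structure.
func writeObjectFile(for object: NSManagedObject, baseURL: URL) throws {
    var json = [String: Any]()
    for (key, _) in object.entity.attributesByName {
        guard let value = object.value(forKey: key) else { continue }
        switch value {
        case let date as Date: json[key] = date.timeIntervalSince1970
        case let blob as Data: json[key] = blob.base64EncodedString()
        default:               json[key] = value   // strings, numbers, booleans
        }
    }
    let entityName = object.entity.name ?? "UnknownEntity"
    let fileName = (object.value(forKey: "syncID") as? String) ?? UUID().uuidString
    let folder = baseURL.appendingPathComponent("entities/\(entityName)", isDirectory: true)
    try FileManager.default.createDirectory(at: folder, withIntermediateDirectories: true)
    let data = try JSONSerialization.data(withJSONObject: json, options: [.prettyPrinted])
    try data.write(to: folder.appendingPathComponent(fileName))
}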
Simultaneous Sync
This is one of the areas where I am almost completely clueless. How would I handle 2 devices connecting and syncing with the cloud at the same time? There seems to be a high risk of things getting out of sync here, or even data corruption.
Handling migrations
Once again, another clueless area here. How would I handle migrations of the Core Data managed object model? The easiest thing to do here seems to be just to wipe the cloud data store clean and upload a new copy of the data from a device which has undergone the migration process, but this seems somewhat risky, and there may be a better way.
Client Side
Converting NSManagedObjects into JSON
Converting attributes into JSON isn't a very hard task (there's lots of code for it floating around the web). Relationships are the key problem here. In this Stack Overflow post, Marcus Zarra posts code in which the relationship objects themselves are added to the JSON dictionary. However, he mentions that this can cause an infinite loop depending on the structure of the model, and I'm not sure if this would work with my method, because I store each object as an individual file.
I've been trying to find a way to get an ID as a string for an NSManagedObject. Then I could save relationships in JSON as an array of IDs. The closest thing I found was [[managedObject objectID] URIRepresentation], but this isn't really an ID for an object, it's more of a location for the object in the persistent store, and I don't know if it's concrete enough to use as a reference for an object.
I suppose I could generate a UUID string for each object and save it as an attribute, but I'm open to suggestions.
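Something like this is what I have in mind for the UUID approach - just a sketch, where "syncID" is a hypothetical attribute added to every syncable entity in the model.
import CoreData

// Assign a stable identifier before the object is first saved.
class SyncableManagedObject: NSManagedObject {
    override func awakeFromInsert() {
        super.awakeFromInsert()
        setPrimitiveValue(UUID().uuidString, forKey: "syncID")
    }
}

// A to-many relationship could then be serialized as an array of those IDs.
func relationshipIDs(of object: NSManagedObject, relationship: String) -> [String] {
    let related = (object.value(forKey: relationship) as? Set<NSManagedObject>) ?? []
    return related.compactMap { $0.value(forKey: "syncID") as? String }
}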
Syncing changes to the cloud
The first (and still best) solution that popped into my head for this was to listen for the NSManagedObjectContextObjectsDidChangeNotification to get a list of changed objects, then update/delete/insert those objects in the cloud data store. After the changes have been saved, I would need to update the changeset file for every other registered device to reflect the newly changed objects.
One problem that comes up here is: how would I handle a failed or interrupted sync? One idea I have is to first push changes to a temporary directory on the cloud and then, once that has been confirmed as successful, merge it with the master data on the cloud, so that an interruption in the middle of the sync won't corrupt data. I would also save records of the objects that still need to be updated in the cloud into a plist file or something, to be pushed the next time the app is connected to the internet.
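To illustrate the stage-then-commit idea with plain FileManager calls (the real Dropbox SDK calls would obviously differ, and this assumes the master file already exists):
import Foundation

// Sketch: write the new data to a staging location first, then swap it into
// place atomically, so a failure mid-upload never leaves a half-written master file.
func commitChangeset(_ data: Data, to masterURL: URL) throws {
    let stagingURL = masterURL.appendingPathExtension("staging")
    try data.write(to: stagingURL, options: .atomic)
    _ = try FileManager.default.replaceItemAt(masterURL, withItemAt: stagingURL)
}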
Retrieving changed objects
This is fairly simple, the device downloads its changeset file, figures out which objects need to be updated/inserted/deleted, then acts accordingly.
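For example, applying the deletions from a changeset might look roughly like this, again assuming the hypothetical "syncID" attribute and a changeset that is just arrays of those IDs grouped by entity name.
import CoreData

// Sketch: delete every local object whose syncID appears in the changeset's deleted list.
func applyDeletions(_ deletedIDs: [String], entityName: String, in context: NSManagedObjectContext) throws {
    let request = NSFetchRequest<NSManagedObject>(entityName: entityName)
    request.predicate = NSPredicate(format: "syncID IN %@", deletedIDs)
    for object in try context.fetch(request) {
        context.delete(object)
    }
    try context.save()
}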
And that sums up my thoughts for the logic that this system will use :-) Any insight, suggestions, answers to problems, etc. is greatly appreciated.
UPDATE
After lots of thinking, and reading TechZen's suggestions, I have come up with some modifications to my concept.
The largest change I've thought up is to make each device have a separate data store in the cloud. Basically, every time the managed object context saves (thanks TechZen), it will upload the changes to that device's data store. After those changes are updated, it will create a "changeset" file with change details, and save it into the changeset folders of the OTHER devices that are using the application. When the other devices connect to sync, they will go through the changeset folder and apply each changeset to the local data store, then update their respective data stores in the cloud as well.
Now, if a new device is registered with the account, it will find the newest copy of the data out of all the devices and download that for use as its local storage. This solves the problem of simultaneous sync and reduces the chances of data corruption, because there is no "central" data store; each device touches only its own data and just updates it with changes, rather than every device accessing and modifying the same data at the same time.
There are some obvious conflict situations to deal with, mainly in relation to deleting objects. If a downloaded changeset instructs the app to delete an object that is currently being edited, for example, there need to be ways to deal with this.
You want to look at this pessimistic take on cloud sync: Why Cloud Sync Will Never Work.
It covers a lot of the issues that you are wrestling with. Many of them are largely intractable.
It is very, very, very difficult to synchronize information, period. Adding in different devices, different operating systems, different data structures, etc. snowballs the complexity, often fatally. People have been working on variants of this problem since the 70s and things really haven't improved much.
The fundamental problem is that if you leave the system flexible and customizable, then the complexity of synchronizing all the variations explodes exponentially as a function of the number of customizations. If you make it rigid, you can sync, but you are limited in what you can sync.
How would I handle 2 devices connecting and syncing with the cloud at the same time?
If you figure that out, you will be rich. It's a big issue for current cloud sync providers. The real problem here is that you're not "syncing", you're merging. Software sucks at merging because it's very hard to establish a predefined rule set that describes all the possible merges.
The simplest system is to establish either a canonical device or a device hierarchy such that the system always knows which input to choose. This however, destroys flexibility.
How would I handle migrations of the Core Data managed object model?
The migration of the Core Data model is largely irrelevant to the server. That's something that Core Data manages internally. Model migration updates the model, i.e. the entity graph, not the actual data.
Converting NSManagedObjects into JSON
Modeling relationships is hard, especially with tools that don't support it as easily as Core Data does. However, the URI of a permanent managed object ID is supposed to serve as a UUID that nails the object down to a specific location in a specific store on a specific device. It's not technically guaranteed to be universally unique, but it's close enough for all practical purposes.
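A minimal Swift sketch of what I mean - obtain permanent IDs before relying on the URI, then you can round-trip it through the persistent store coordinator:
import CoreData

// Get a stable URI for an object (temporary IDs change once the object is saved).
func stableURI(for object: NSManagedObject, in context: NSManagedObjectContext) throws -> URL {
    if object.objectID.isTemporaryID {
        try context.obtainPermanentIDs(for: [object])
    }
    return object.objectID.uriRepresentation()
}

// Turn the URI back into an object on the same store.
func object(for uri: URL, in context: NSManagedObjectContext) -> NSManagedObject? {
    guard let coordinator = context.persistentStoreCoordinator,
          let objectID = coordinator.managedObjectID(forURIRepresentation: uri) else { return nil }
    return try? context.existingObject(with: objectID)
}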
Syncing changes to the cloud
I think you're confusing implementation details of Core Data with the cloud itself. If you use NSManagedObjectContextObjectsDidChangeNotification, you will trigger network traffic every time the observed context changes, regardless of whether those changes are persisted or not. Depending on the app, this could drive connections thousands of times in a few minutes. Instead, you only want to sync when the context is saved, at most.
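In other words, hang the sync off the did-save notification, something like the sketch below; the upload layer itself is whatever you end up building, so it is only hinted at in a comment.
import CoreData

// Observe saves only, and hand the changed objects to the sync layer.
func startSyncObserver(for context: NSManagedObjectContext) -> NSObjectProtocol {
    return NotificationCenter.default.addObserver(
        forName: .NSManagedObjectContextDidSave,
        object: context,
        queue: .main
    ) { notification in
        let inserted = notification.userInfo?[NSInsertedObjectsKey] as? Set<NSManagedObject> ?? []
        let updated  = notification.userInfo?[NSUpdatedObjectsKey]  as? Set<NSManagedObject> ?? []
        let deleted  = notification.userInfo?[NSDeletedObjectsKey]  as? Set<NSManagedObject> ?? []
        // Hand the three sets to the (hypothetical) upload layer here.
        _ = (inserted, updated, deleted)
    }
}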
One problem that comes up here is, how would I handle a failed or interrupted sync?
You don't commit changes until the sync completes; an interrupted sync is a big problem and leads to corrupt data. Again, you can have flexibility, complexity and fragility, or inflexibility, simplicity and robustness.
Retrieving changed objects: This is fairly simple, the device downloads its changeset file, figures out which objects need to be updated/inserted/deleted, then acts accordingly.
It's only simple if you have an inflexible data structure. Describing changes to a flexible data structure is a nightmare.
Not sure if I have helped any. None of these problems have elegant solutions. Most designers end up with rigidity and/or slow, brute-force iterative merging.
Take a serious look at RestKit.
It is an open source project that aims to help with integrating iOS apps with cloud data, including but not limited to the scenario where there is a Core Data model for that data on the client.
I have recently started to use it in one of my projects and found it to be quite useful. In the Core Data scenario, you implement declarative mappings between your data model and the content you GET from and POST to the server, and it takes care of things like injecting objects from the cloud into your client model, posting new objects to the server and incorporating server-generated object IDs into your client-side model, doing all of this in a background thread and taking care of all the Core Data context threading issues, and so on.
RestKit is by no means a mature product, but it has a fairly good foundation and quite a few areas that could use help from other contributors. Especially if your goal is to create an open source solution, it would be great to contribute to and improve something like this rather than reinvent a new solution. Unless, of course, you see serious differences between what you have in mind and the existing solutions :-)
Since this post was written, several new options have become available. It is possible to develop a solution, and there are apps shipping with these solutions.
Here is a short list of the main Core Data sync options:
Apple's native Core Data/iCloud sync. (Had a rocky start. Seems better now.)
TICDS
Wasabi Sync, a paid service.
Simperium (Seems abandoned.)
ParcelKit with Dropbox Datastore API
Ensembles, the most recent. (Disclosure: I am the founder of the project)
It's like Apple answered my question for me with the announcement of the iCloud SDKs, which come complete with Core Data integration. Win!

Windows Registry best practices

In what way is the Windows registry meant to be used? I know it's alright to store a small amount of user preferences, but is it considered bad practice to store all your users data there? I would think it would depend on the data set, so how about for small amounts of data, say, less than 2KB, in 100 or so different key/value pairs. Is this bad practice? Would a flat file or SQLite db be a better practice?
I'm going to take a contrarian view.
The registry is a fine place to put configuration data of all types. In general it is faster than most configuration files and more reliable (individual operations on the registry are transacted so if your app crashes during a write the registry isn't corrupted - in general that isn't the case with ini files).
Marcelo MD is totally right: storing things like an operation's percentage complete in the registry (or any other non-volatile storage) is a horrible idea. On the other hand, storing data like the most recently used files is just fine - the registry was built for just that kind of problem.
A number of the other commenters on this post talking about the MRU list have discussed the problem of what happens when the MRU list gets out of sync due to application crashes. I'm wondering why storing the MRU list in a flat file in per-user storage would be any better.
I'm also not sure what the "security implications" of storing your data in the registry are. The registry is just as secure as the filesystem - the registry and the filesystem use the same ACL mechanism to protect their data.
If you ARE going to store your user data in a file, you should absolutely put your data in %APPDATA%\CompanyName\ApplicationName at least - that way if two different developers create an application with the same name (how many "Media Manager" applications are there out there?) you won't have collisions.
For me, simple user configuration items and user data are better stored in either a simple XML configuration file, a SQLite db, or an MS SQL Server Compact db. The exact storage medium depends on the specifics of the implementation.
I only use the registry for things that I need to set infrequently and that users don't need to be able to change/see. For example, I have stored encrypted license information in the registry before to avoid accidental user removal of the data.
Using the registry to store data has mainly one problem: it's not very user-friendly. Users have virtually no chance of backing up their settings, copying them to another computer, troubleshooting them (or resetting them) if they get corrupted, or generally just seeing what their software is doing.
My rule of thumb is to use the registry only to communicate with the OS. Filetype associations, uninstaller entries, processes to run at startup, those things obviously have to be in the registry.
But data that is for use in your application only belongs in a file in your App Data folder (whichever of the 3+ App Data folders Microsoft currently wants you to use, anyway).
As each user already has directory space in Windows dedicated to storing application user data, I use it to store user-level data (preferences, for instance).
In C#, I would get it by doing something like this:
string appDataPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
Typically, I'll store SQLite files there or whatever is appropriate for the application.
If your app is going to be deployed "in the enterprise", keep in mind that administrators can tweak the registry using group policy tools. For example, if Firefox used the registry for things like the proxy server, it would make deployment a snap, because an admin can use the standard tools in Active Directory to set it up. If you use anything else, I don't think such things can be done very easily.
So don't dismiss the registry altogether. If there is a chance an admin might want to standardize parts of your configuration across a network, put the setting in the registry.
I think Microsoft is encouraging use of isolated storage instead of the Windows registry.
Here's an article that explains how to use it in .Net.
You can find those files in Windows XP under Documents & Settings\\Local Settings\App Data\Isolated Storage. The data is in .dat files.
I would differentiate:
On the one hand, there is application-specific configuration data that is needed for the app to run, e.g. IP addresses to connect to, which folders to use for which sort of files, etc., as well as non-trivial per-user settings.
Those I put in a config file: ini format for simple stuff, XML if it gets more complex.
On the other hand, there are trivial per-user settings (best example: window positions and layout). To avoid cluttering the config files (which some users will want to edit themselves, so few and clearly arranged entries are a must), I like to put those in the registry (with conservative defaults being set in the app if no settings can be found in the registry).
I mainly do it like istmatt says: I store config files inside the %APPDATA% folder, usually in %APPDATA%\ApplicationName. I don't like the .NET default of %APPDATA%\CompanyName\ApplicationName\Version; that level of detail and complexity is counterproductive for most small to medium-sized applications.
I disagree with the example of Marcelo MD of not storing recently used files in the registry. IMO this is exactly the volatile sort of user specific information that can be stored there.
(His example of what not to do is very good, though!)
To me it seems easier to think of what you should NOT put there.
E.g.: dynamic data, such as an editor's "last file opened" and per-project options. It is really annoying when your app loses sync with the registry (file deletion, system crash, etc.) and retrieves information that is not valid anymore, possibly deadlocking the user.
At an earlier job I saw a guy who stored a data-transfer completeness percentage there, writing the new values every 10k or so and having the GUI retrieve this value every second so it could be shown in the title bar.

Resources