For some web-hosting work I have to write a file backup/synchronisation tool for the common OSes in the server sector (Windows/Linux). Most Linux root servers offer an SSH interface for secure communication, so there I could use the SSH File Transfer Protocol, but what is the best out-of-the-box solution on the Windows side? And are there good D libraries (or C alternatives)?
I'm asking here rather than on the admin or Windows stacks for one reason: it's important that existing libraries are available. An easy implementation matters more to me than the mere existence of an interface or protocol; simplicity and language support, not raw capability, have priority.
All in all, I am looking for an easy way to implement an OS-independent tool for file exchange. For the synchronisation work it has to be possible to access some file information (last write time, modification time, file size, etc.).
edit:"My Version" of a synconisation tool should work on a new system without extra sotfware installation (maybe some automated installation over ssh-windows-equivalent [if there is one])
You just enter your access data and it should work. Furthermore, I also need a protocol, and this is the biggest problem: SSH isn't available on Windows out of the box. Is there an equivalent?
rsync is a popular file synchronization tool, best suited to files being added, deleted, or extended. It's been very well debugged and is quite simple to set up. (rsync -avzP username@hostname:/path/to/source/ /path/to/dest/ or rsync -avzP /path/to/source/ username@hostname:/path/to/dest/ are common invocations.)
rsync is frequently tunneled over ssh; it does have its own protocol if you don't mind it being publicly open.
But if you've got a lot of data that gets moved around slightly, or frequent renames, a tool like git can make much better use of bandwidth. It does carry the downside of keeping history on both sides, which may be less disk-efficient than you'd like, but it can more than compensate if your bandwidth is less than amazing.
git is also frequently tunneled over ssh; it also has its own protocol if you don't mind it being publicly open.
I doubt either one has D library bindings, but C bindings ought to be easy to come by. :)
Related
I want to set system-wide proxy settings on my Windows machine. I know about the settings in Internet Explorer, but I don't want to do it that way. Is there a way to set up a proxy that will be used by all the applications on my machine (especially Firefox; I don't want to have to set "Use System Proxy Settings" in the Firefox options menu)?
In Windows, that is the preferred way to set up proxy settings.
But you can have a look at this for command-line options:
https://superuser.com/questions/419696/in-windows-7-how-to-change-proxy-settings-from-command-line
How can this be achieved, theoretically?
I am going to provide a somewhat unusual answer, because I've noticed that this particular way of solving the problem has, for some reason, not crossed people's minds so far.
If you really want to make all apps, without exception, send their internet traffic through your proxy, you are going to have to use a technology known as TUN/TAP devices.
In short, these are special drivers which, when installed, appear to the system as a network adapter (just like your local Ethernet or wireless card), but are in fact built to be easy to control from software.
Basically, when you install such a driver, the system regards the device as a fully functional network adapter. If you then set this adapter as the default gateway, all apps (without knowing it, or being able to prevent it) will automatically pass through it, the same way all apps pass through a generic wireless or Ethernet adapter.
Practical ways of achieving this / How can I use this with proxies?
Now that you have a basic idea of what redirecting system traffic through a TAP/TUN device means, there are a couple of ways of doing this.
Before I start: even if you stray from the resources suggested here, I really recommend you stick to OpenVPN's open-source TAP device, since it has been extensively tested, confirmed to work on many systems, and is very widely used (some basics are available at https://openvpn.net/tuntap, and you should find it bundled with any recent version of OpenVPN; the only files you need are the compiled drivers (.inf), so you don't need the entirety of OpenVPN installed to use them).
The project that instantly comes to mind for using SOCKS proxies as the endpoint of a TAP device is badvpn/tun2socks. It does exactly what is outlined here, so I definitely recommend you read the source code, or use it as a standalone utility (if you need help with usage, check out its wiki page).
What, if any, are the drawbacks of using this approach?
First of all, speaking of compatibility, performance, and bugs, there are no real drawbacks to this approach; if anything, it is more reliable and easier to use than the mechanisms the system itself provides.
The only two drawbacks I can see at this point would be:
You have to make sure that whatever proxy/intermediate host you use can handle at least the majority of system traffic, because even incompatible traffic will still be redirected through the TAP device (that is its purpose).
The code base may be larger than in other cases
An alternative, 'unclean' way of doing this for Firefox in particular
If you are only interested in setting this proxy for Firefox, there are a couple of unclean ways of doing it, for instance via the command line. In my opinion, however, that is a cheap and dirty way of achieving it, as it provides no general compatibility whatsoever (it's basically a hack).
Conclusion
While implementing this may take a while, and the code base may be large:
It is not really possible, through any other means, to achieve the same effect that VPNs achieve when they tunnel the entirety of your machine's traffic through an OpenVPN server.
If you want this kind of behavior, the approach outlined above is recommended, as it is a lot cleaner than 'alternative' methods of doing so (e.g., socksifying traffic by intercepting it at the software level).
Whiteboard Overview
(Two whiteboard diagrams, "Internal" and "Global", went here: 1000 x 750 px, ~130 kB JPEGs hosted on ImageShack.)
Additional Information
I should mention that each user (of the client boxes) will be working straight off the /Foo share. Due to the nature of the business, users will never need to see or work on each other's documents concurrently, so conflicts of this nature will never be a problem. Access needs to be as simple as possible for them, which probably means mapping a drive to their respective /Foo/username sub-directory.
Additionally, no one but my applications (in-house and the ones on the server) will be using the FTP directory directly.
Possible Implementations
Unfortunately, it doesn't look like I can use off-the-shelf tools such as WinSCP, because some other logic needs to be intimately tied into the process.
I figure there are two simple ways for me to accomplish the above on the in-house side.
Method one (slow):
Walk the /Foo directory tree every N minutes.
Diff with the previous tree using a combination of timestamps (these can be faked by file-copying tools, but that's not relevant here) and checksums; a sketch of this step follows the list.
Merge changes with off-site FTP server.
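For concreteness, here is a minimal C# sketch of what that walk-and-diff could look like. FileState, ScanTree, and Changed are illustrative names rather than an existing API, and error handling plus the FTP merge step are omitted:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

class FileState
{
    public DateTime LastWriteUtc;
    public long Length;
    public string Md5;
}

static class TreeDiff
{
    // Walk the tree and record timestamp, size, and a content hash per file.
    public static Dictionary<string, FileState> ScanTree(string root)
    {
        var tree = new Dictionary<string, FileState>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
        {
            var info = new FileInfo(path);
            tree[path] = new FileState
            {
                LastWriteUtc = info.LastWriteTimeUtc,
                Length = info.Length,
                // Hashing every file on every pass is what makes this method slow.
                Md5 = Hash(path)
            };
        }
        return tree;
    }

    // Files that are new or whose content changed since the previous scan;
    // anything present in 'previous' but absent from 'current' was deleted.
    public static IEnumerable<string> Changed(Dictionary<string, FileState> previous,
                                              Dictionary<string, FileState> current)
    {
        foreach (var entry in current)
        {
            FileState old;
            if (!previous.TryGetValue(entry.Key, out old) || old.Md5 != entry.Value.Md5)
                yield return entry.Key;
        }
    }

    static string Hash(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
            return BitConverter.ToString(md5.ComputeHash(stream));
    }
}
```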
Method two:
Register for directory change notifications (e.g., using ReadDirectoryChangesW from the WinAPI, or FileSystemWatcher if using .NET); see the sketch after this list.
Log changes.
Merge changes with off-site FTP server every N minutes.
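A minimal sketch of that approach using .NET's FileSystemWatcher; the D:\Foo path is an assumption for illustration, and a real tool would run as a service and batch the logged changes instead of printing them:

```csharp
using System;
using System.IO;

class ChangeLogger
{
    static void Main()
    {
        var watcher = new FileSystemWatcher(@"D:\Foo")
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite | NotifyFilters.Size
        };
        watcher.Changed += (s, e) => Log(e.ChangeType, e.FullPath);
        watcher.Created += (s, e) => Log(e.ChangeType, e.FullPath);
        watcher.Deleted += (s, e) => Log(e.ChangeType, e.FullPath);
        watcher.Renamed += (s, e) => Log(e.ChangeType, e.FullPath);
        watcher.EnableRaisingEvents = true;

        // A real service would flush the accumulated log to the off-site
        // FTP server every N minutes rather than block on the console.
        Console.ReadLine();
    }

    static void Log(WatcherChangeTypes change, string path)
    {
        Console.WriteLine("{0}: {1}", change, path);
    }
}
```

Note that Changed typically fires more than once per save, so the log needs de-duplicating before the merge step.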
I'll probably end up using something like the second method due to performance considerations.
Problem
Since this synchronization must take place during business hours, the first problem that arises is during the off-site upload stage.
While I'm transferring a file off-site, I effectively need to prevent users from writing to it (e.g., open it with CreateFile and only FILE_SHARE_READ) while I'm reading from it. The internet upstream speeds at their office are nowhere near symmetrical to the file sizes they'll be working with, so it's quite possible that they'll come back to a file and attempt to modify it while I'm still reading from it.
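For illustration, the managed analogue of that CreateFile call is a FileStream opened with FileShare.Read: other processes can still read the file, but any attempt to open it for writing fails with a sharing violation until the stream is disposed. UploadLocked and the destination stream are illustrative stand-ins, not part of any existing tool:

```csharp
using System.IO;

static class Uploader
{
    // Hold a read-only handle that shares reads but denies writes for the
    // whole duration of the transfer.
    public static void UploadLocked(string path, Stream destination)
    {
        using (var source = new FileStream(path, FileMode.Open,
                                           FileAccess.Read, FileShare.Read))
        {
            source.CopyTo(destination); // stand-in for the real FTP transfer
        }
    }
}
```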
Possible Solution
The easiest solution to the above problem would be to create a copy of the file(s) in question elsewhere on the file-system and transfer those "snapshots" without disturbance.
The files (some will be binary) that these guys will be working with are relatively small, probably ≤20 MB, so copying (and therefore temporarily locking) them will be almost instant. The chance of someone attempting to write to a file in the very instant I'm copying it should be close to nil.
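A minimal sketch of that snapshot idea, assuming a local staging directory; the write-denying lock is held only for the duration of the fast local copy, and the slow FTP upload then reads from the snapshot:

```csharp
using System.IO;

static class Snapshotter
{
    // Copy the file to a staging area while briefly denying writers, then
    // release the original; the upload works from the returned snapshot.
    public static string TakeSnapshot(string path, string stagingDir)
    {
        string snapshot = Path.Combine(stagingDir, Path.GetFileName(path));
        using (var source = new FileStream(path, FileMode.Open,
                                           FileAccess.Read, FileShare.Read))
        using (var target = File.Create(snapshot))
        {
            source.CopyTo(target);
        }
        return snapshot; // upload this copy at leisure, then delete it
    }
}
```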
This solution seems kind of ugly, though, and I'm pretty sure there's a better way to handle this type of problem.
One thing that comes to mind is something like a file-system filter driver that takes care of the replication and synchronization at the IRP level, kind of like what some antivirus products do. That is overkill for my project, however.
Questions
This is the first time that I've had to deal with this type of problem, so perhaps I'm overthinking it.
I'm interested in clean solutions that don't require going overboard with the complexity of their implementations. Perhaps I've missed something in the WinAPI that handles this problem gracefully?
I haven't decided what I'll be writing this in, but I'm comfortable with: C, C++, C#, D, and Perl.
After the discussions in the comments, my proposal would be as follows:
Create a partition on your data server, about 5GB for safety.
Create a Windows Service project in C# that would monitor your data drive/location.
When a file has been modified, create a local copy of it, preserving the same directory structure, and place it on the new partition (a path-mapping sketch appears after the service descriptions below).
Create another service that would do the following:
Monitor bandwidth usage.
Monitor file creations on the temporary partition.
Transfer several files at a time (use threading) to your FTP server, abiding by the bandwidth available at the time, decreasing/increasing the number of worker threads depending on network traffic.
Remove the files from the partition that have successfully transferred.
So basically you have your drives:
C: Windows Installation
D: Share Storage
X: Temporary Partition
Then you would have the following services:
LocalMirrorService - Watches D: and copies to X: with the dir structure
TransferClientService - Moves files from X: to the FTP server and removes them from X:; uses multiple threads to move several files at once and monitors bandwidth.
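As a small illustration of the LocalMirrorService's copy step, here is one way to map a path on the share to the same relative location on the staging partition. The drive letters are the ones from the proposal above, and Mirror/ToStagingPath are illustrative names:

```csharp
using System.IO;

static class Mirror
{
    // Translate D:\some\dir\file.doc into X:\some\dir\file.doc and make
    // sure the target directory chain exists before the copy.
    public static string ToStagingPath(string sourcePath,
                                       string sourceRoot = @"D:\",
                                       string stagingRoot = @"X:\")
    {
        string relative = sourcePath.Substring(sourceRoot.Length);
        string target = Path.Combine(stagingRoot, relative);
        Directory.CreateDirectory(Path.GetDirectoryName(target));
        return target;
    }
}

// Usage from the mirror service's change handler:
//   File.Copy(changedPath, Mirror.ToStagingPath(changedPath), true);
```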
I would bet this is the idea you had in mind, but it seems like a reasonable approach, as long as you're really good with application development and are able to create a solid system that handles most issues.
When a user edits a document in Microsoft Word, for instance, the file will change on the share and may be copied to X: even though the user is still working on it. Within Windows you can check whether the file handle is still open; if it is, you can watch for the moment the user actually closes the document, so that all their edits are complete, and only then migrate it to drive X:.
That being said, if the user is working on the document and their PC crashes for some reason, the file handle may not be released until the document is opened again at a later date, which can cause issues.
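As far as I know there is no single official "is this file still open elsewhere?" call; the usual managed workaround is to attempt an exclusive open and treat a sharing violation as "still in use". A minimal, admittedly racy sketch (the user can reopen the file right after this returns false):

```csharp
using System.IO;

static class FileProbe
{
    // Try to open the file exclusively; failure with IOException almost
    // always means another process (e.g. Word) still holds a handle.
    public static bool IsInUse(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open,
                                  FileAccess.ReadWrite, FileShare.None))
            {
                return false;
            }
        }
        catch (IOException)
        {
            return true;
        }
    }
}
```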
For anyone in a similar situation (I'm assuming the person who asked the question implemented a solution long ago), I would suggest an implementation of rsync.
rsync.net's Windows Backup Agent does what is described in method 1, and can be run as a service as well (see "Advanced Usage"). Though I'm not entirely sure if it has built-in bandwidth limiting...
Another (probably better) solution that does have bandwidth limiting is Duplicati. It also properly backs up currently-open or locked files. Uses SharpRSync, a managed rsync implementation, for its backend. Open source too, which is always a plus!
So I am interested in a way (ideally cross-platform) to upload a zip file over annoyingly slow uplink connections (think ADSL) such that only the delta is transferred (assuming a recent version is already on the server and there are minimal changes to upload).
Now rsync can work, with rsync-aware gzip support (i.e., you compress the file with gzip, but tell it to produce rsync-friendly output via its --rsyncable option), but this is, well, a bit of a pain on Windows.
Has this been solved before? Or is the rsync/gzip combo the state of the art?
(Note that this network is asymmetric; downloads are an order of magnitude faster, so it is not a bidirectional sync issue.)
rdiff-backup is available for Windows as well, and is pretty much intended for this kind of problem. It also seems to handle binary diffs quite well. Only use it for non-mission-critical data, though, as the Windows variant is not that well tested.
I use it on Linux; I have no experience with it on Windows whatsoever. It would probably be a good idea to compare hashes on the local/remote locations to be sure.
You have to set up some kind of cron job (a scheduled task via Task Scheduler on Windows) to clean out older files if you don't want incremental backups, though; rdiff-backup's --remove-older-than option is meant for that.
Not sure if it fits your needs, but I think it comes very close and is definitely worth checking out!
The best solution seems to be "use rsync, even on Windows".
At work I use ClearCase and SourceSafe, but I have found some time to code for myself en route, thanks to a disposable laptop.
However, I wish I had a lightweight VCS on my system with which I could make changes to my code during the commute and then push/grab them from my Linux systems.
I use git on my home system, but I can't really get it working on Windows, and I don't want all that Cygwin hackery.
If it does not run natively on Windows, it just won't do.
What have you guys tried on your Windows system? Something that YOU use.
The big player at the moment seems to be Mercurial?
What would be best for a one (or maybe two) man team?
I just need to maintain:
Versioned copies of source code.
Checking in and out should be as unobtrusive as possible.
I am looking for a multiple-undo kind of feature (like that in an Emacs buffer), but persistent.
I really like the way git keeps track of lines moving between files in a source code set
I should be able to move part(s)/sub-tree(s) of the source tree (each sub-tree is a module/plugin of the main software I am building) to an archival system, either completely or partially, and restore them from the archive as and when required, and the system should track changes to such a tree as well.
I actually want to experiment with my code as much as possible without manually keeping track of what I modified and what I need to undo once I've tried out an idea, so that I can get back to where I want to continue from.
Note: a similar topic came up a year ago: DVCS Choices - What's good for Windows?
I hope things have changed, and I really want people to share their own, real-life experiences; not something they recommend without having used it, or merely think will work.
Bazaar and Mercurial both work very well on Windows. I posted in the question you linked, and since then, both have improved their Windows support even more. Using them is easy and flawless, and they even have GUIs if you swing that way.
I for one have switched from Bazaar to git, and I've been pleased.
If you have a ClearCase background, why don't you take a look at Plastic SCM? Check this link; it shows how it works in a distributed setup (and of course all the basic operations): http://codicesoftware.blogspot.com/2010/03/distributed-development-for-windows.html
You won't miss any of the "good" ClearCase features, but all the shortcomings are simply gone (faster, installs in 45 seconds, no cumbersome setup for a mixed Windows/Linux scenario, built-in ACLs, excellent branching and merging, a much better common-ancestor algorithm, visualizations, a better GUI, and you still have "selectors" in case you miss config specs, though they're not mandatory).
Despite primarily being a Windows user, I am a huge fan of rsync. Now, I don't want to argue the virtues of rsync versus any other tool... that is not my point.
The only way I've ever found of running rsync on Windows is via a version built to run on top of Cygwin, and since Cygwin has issues with Unicode, so does rsync.
Is anyone familiar enough with the workings of rsync to say if there are any real technical programming hurdles to porting rsync to a native Win32 binary?
Or is it maybe that there has just never been enough interest from Windows users to bother porting it?
Partly I ask because I'm considering trying to take on the task of starting a port, but I want to make sure there's not something I'm missing in terms of why it may not be possible.
The way that Windows locks open files might cause an issue, requiring you to hook into the Volume Shadow Copy Service.
About two years ago this fellow ported the algorithm to C#. I haven't taken a look at the code (or the provided binary), but it might be a place to start looking, or someone to try contacting.
http://www.russiantequila.com/wordpress/?p=8
I've been evaluating an effort to undertake a Win32 port as well. I don't believe anything major would block it, but evidence from both the rsync mailing list and another discussion points to a heavy reliance on Unix fork() calls. Using threads appears to be the way to go for Win32.
Threads vs. Fork discussion
(Disclaimer: I promise I don't google myself, but Google Analytics brought me here.)
I went through porting rsync to .NET (sig11's link is my blog). There are no technical hurdles, just practical ones. As was already said, the code is rather... dense: difficult to follow, and completely lacking comments. I'm more than happy to make my work available, but unfortunately, since it was part of a commercial effort, it's not in significantly better shape.
I have, on a number of occasions, toyed with the idea of reverse-engineering the protocol and doing a ground-up implementation that's wire-compatible with the existing one, but a bit cleaner to work with. I've even started a wiki to that effect, but, as you can see from the lack of content there, other items have taken priority. If anyone would like to work with me on this, that may be the impetus I need to get going.
The concept of the tool is great, as is the functionality it offers, but it's rather limited outside the *nix space and could definitely benefit from an API.
Wiki link for reference:
http://www.russiantequila.com/wiki/index.php?title=Main_Page
Have you seen this:
http://www.itefix.no/i2/taxonomy/term/39
I have used cwRsync without any problem (albeit with much of the usual Cygwin misery), but I haven't had any need for Unicode filenames, so I've not seen that problem.
I don't really know why there isn't a native Win32 port, but I did look at the source a while back, because I implemented a similar delta-copy system in C#. As one would expect from the world of brilliant *nix hackers, the source is largely single-character variable names and a total absence of comments, which isn't terribly helpful and might be rather off-putting to would-be porters.
I would really appreciate a port of rsync to MS-Windows that can be built using Visual Studio. I am encountering various protocol errors at random, somewhat intermittently. I am using rsync to distribute software to a grid of around 200 machines and typically get around a dozen failures. I am using GCC 4.4.2 and the latest Cygwin to build rsync v3.0.7. It would help me a lot if I could experiment with a version that does not require Cygwin, because the machines in the grid already run another Cygwin-based app of a different version than the one I have.
Having spent some time on the rsync mailing list, opinion seems divided as to the cause of the protocol errors on MS-Windows. Some say it is a bug in rsync, where it failed to do a clean socket shutdown, which was fixed a while ago. Others say it is a fundamental protocol flaw: the client does not tell the server that it is finished, it just shuts down, causing MS-Windows servers to get an RST on the socket, something that does not happen on Unix.
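For what it's worth, the "clean socket shutdown" being debated is the generic TCP pattern of announcing end-of-data and draining the peer before closing; closing with unread data pending is what tends to surface as an RST on the other end. A minimal C# illustration of the pattern (not rsync's actual code):

```csharp
using System.Net.Sockets;

static class CleanClose
{
    // Graceful disconnect: send a FIN ("I'm finished sending"), then read
    // until the peer closes its side, and only then close the socket.
    public static void Disconnect(Socket socket)
    {
        socket.Shutdown(SocketShutdown.Send);
        var buffer = new byte[4096];
        // A production version would set a receive timeout here.
        while (socket.Receive(buffer) > 0) { }
        socket.Close();
    }
}
```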