Limiting Dropbox cache size [closed]

I am having an issue with the Dropbox cache: periodically I find that a particular machine I sync with Dropbox has run out of disk space, and the Dropbox cache is the culprit. This is a problem because the machine Dropbox is installed on is headless (or nearly so), so the only indication that something is wrong is that data which should be available on the machine suddenly isn't.
I have read that it is possible to clear the cache, but this is a pain because the machine is running OS X and Dropbox offers no command-line interface there, meaning that I have to VNC into the machine simply to restart Dropbox. This also seems to limit my options for automatically clearing the cache, and having to create a periodic task to clean out the cache folder seems kludgy and error-prone. (For instance, the disk could fill up before the script runs.)
(Update: It appears that deleting the files in a low-disk condition causes Dropbox to start syncing again without a restart, but I am not sure whether there are any undesirable side effects to this; everywhere I have read about the cache says to stop Dropbox before the delete and restart it afterwards.)
In addition, it appears that the reason Dropbox is running out of space so fast is that I have a single large, append-only log file (on the order of half a gigabyte), and Dropbox creates a new cached copy of the entire old version every time the file changes. From a performance standpoint it is rather undesirable that it keeps duplicating this large file for every tiny addition of a few bytes.
Disk space is rather tight on this machine, so I would rather simply have Dropbox limit how much caching it does. Is there some way to do this? My searches so far have turned up empty.
Update: I tried opening a Dropbox support request, only to get an e-mail reply stating: "Thanks for writing in. While we'd love to answer every question we get, we unfortunately can't respond to your inquiry due to a large volume of support requests." ಠ_ಠ

I just have a command file that I run now and then on my MacBook Air to clear space, which also contains these lines:
# wipe everything Dropbox has stashed under old_files in its cache folder
rm -rf /Users/MYUSERNAME/Dropbox/".dropbox.cache"/old_files/{*,.*}
# quit Terminal once the cleanup has run
osascript -e 'tell application "Terminal" to quit' & exit
Should be easy enough to automate, no?

I have the same issue with the exact same cause (it took a while to figure out, too): a log file inside a Dropbox folder that is actually not that big (several MB), but it updates every minute by a couple of hundred bytes. My cache is killing me: my total local Dropbox folder is 150 GB, of which 50 GB is the cache!
I just cleared it, and my understanding is that there are no consequences other than a resync, but this is unsustainable.
I see several solutions here:
1. Accept that Dropbox is not suitable for this use case and do not keep frequently updated logs in Dropbox. I think this would be a bummer, because there should be a fairly simple technical solution, such as:
2. Dropbox either has, or should have, a setting for the maximum size of the cache, the way browsers do. This should not be too hard to implement if it does not exist (apparently it doesn't); otherwise, tell us where it is.
3. A script can be written (talking about Linux here) that periodically (every hour should be enough, though it could run every minute in theory) checks the disk usage of .dropbox.cache and, if it is over some limit, deletes some files. You could delete the ten most recent files, or 10% of the files, or, if you really wanted to get fancy, calculate how much you have to delete, oldest files first, to stay under a certain cache size. The catch might be having to stop Dropbox, but it seems that simply pausing syncing should be enough. A rough sketch of this follows below.
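For illustration, a minimal Python sketch of that idea, assuming the default cache location ~/Dropbox/.dropbox.cache and an arbitrary 5 GB cap (both placeholders you would tune); run it from cron every hour or so, ideally with syncing paused:

import os

CACHE = os.path.expanduser("~/Dropbox/.dropbox.cache")
LIMIT = 5 * 1024**3  # 5 GB cap, adjust to taste

def cache_files():
    # Collect (mtime, size, path) for every file in the cache, oldest first.
    entries = []
    for root, _dirs, names in os.walk(CACHE):
        for name in names:
            path = os.path.join(root, name)
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))
    return sorted(entries)

entries = cache_files()
total = sum(size for _mtime, size, _path in entries)
if total > LIMIT:
    for _mtime, size, path in entries:
        os.remove(path)  # safest with Dropbox syncing paused
        total -= size
        if total <= LIMIT:
            break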
Options 2 and 3 are really one and the same; it's just a question of who is going to do it. Given that Dropbox isn't an open-source platform, it would probably be best for Dropbox to write and maintain this feature; any third-party tool for this may stop working whenever something inside the Dropbox codebase changes.
Dropbox does have an incentive NOT to provide this feature, because frequent syncing means more bandwidth. But I thought we paid for bandwidth.
Thank you Dropbox, we all love you, especially since you gave us all that extra space for free.

Related

How to detect unnecessary Windows\Installer file [closed]

I have searched a lot to find out how to detect unnecessary Windows\Installer files.
I am sure a lot of you have faced this issue previously and solved it somehow.
Now when I look at my
C:\Windows\Installer
directory on Windows Server 2008 R2, I can see it already takes up 42 GB out of a total of 126 GB.
What I would like to know is: can I just delete all the files from that Installer directory, or do I have to detect which files can be removed?
Does anyone know a solution for this issue?
How do you define unnecessary?
Specialized system case: You want the minimum footprint and are willing to sacrifice functionality that you don't expect to use.
If all is well, each of the files in C:\Windows\Installer is a local cache of an installed Windows Installer package, patch, transform, etc. They are necessary for uninstallation, auto-repair or on-demand installation to succeed. If you will never need any of those things on these machines (e.g. if you are bringing them up on demand as VMs and would rebuild them rather than uninstall something), then, unless the app itself invokes Windows Installer APIs, it may be relatively safe to remove files from C:\Windows\Installer. In addition, you could call the Windows Installer API MsiSourceListEnum to find other caches of files that are used for the same purposes. It may be similarly safe (or unsafe) to remove those files.
More usual case: You'd rather not rebuild the system
If you suspect there are unreferenced files in that folder left over from prior upgrades or uninstallations, you can try to use Windows Installer API calls to verify this. At a very low level, you can call MsiEnumProducts (or possibly MsiEnumProductsEx) to find the product codes of all installed products, then MsiGetProductInfo/Ex(szProduct, INSTALLPROPERTY_LOCALPACKAGE, ...) to find each product's cached .msi file and INSTALLPROPERTY_TRANSFORMS for a list of its transforms. Then call MsiEnumPatches/Ex to find all patch codes and MsiGetPatchInfo/Ex (again with INSTALLPROPERTY_LOCALPACKAGE and/or INSTALLPROPERTY_TRANSFORMS) to list the .msp and .mst files each patch references. In theory, the full set of files referenced here should match up with the full set of files in C:\Windows\Installer. (Or there are more references to look for...)
(Before you write anything to do this, consider that there are probably apps out there that automate this, or are even smarter about it, such as the one referenced in another answer.)
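If you did want to script the check yourself, a rough Python/ctypes sketch of the product-level part of that enumeration might look like the following; the 1024-character buffer and the lack of error handling are simplifications, and the patch/transform queries described above would need to be added for a complete picture:

import ctypes
from ctypes import wintypes

msi = ctypes.WinDLL("msi")

def referenced_local_packages():
    # Yield the cached package path ("LocalPackage") for every installed product.
    index = 0
    product = ctypes.create_unicode_buffer(39)        # room for a GUID string
    while msi.MsiEnumProductsW(index, product) == 0:  # 0 == ERROR_SUCCESS
        buf = ctypes.create_unicode_buffer(1024)
        size = wintypes.DWORD(len(buf))
        if msi.MsiGetProductInfoW(product, "LocalPackage", buf, ctypes.byref(size)) == 0:
            yield buf.value
        index += 1

for path in referenced_local_packages():
    print(path)  # cached packages still referenced; unreferenced files are candidates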
You should not simply delete them all.
There is a good answer about your problem; I tested it in my lab and it works for me.
Note: if possible, you should copy this folder to another disk (such as E:) first.

Make deleted files unrecoverable [closed]

I'm cleaning a Windows 7 machine that I use, which will be reassigned to another co-worker, and I would like to wipe all the deleted files so that they are unrecoverable.
I tried using cipher /w:F:\, then I installed Recuva and I can still see a lot of files that can be recovered.
Then I created a little program that creates a file full of zeroes the size of the free space on the disk (after creating the file, Windows Explorer shows the disk has only about 100 KB of free space).
Then I delete the file, run Recuva again, and I can still see all those files as recoverable.
I'm just curious about what's happening under the hood. If I leave only about 100 KB of free space on the disk, why are there more than 100 KB of recoverable files left?
To make files unrecoverable, you need to use a "digital file shredder" application. Such a tool writes a series of zeroes and ones over the file to be shredded, multiple times. While 3 passes seem sufficient for many users, the US government has set a standard of 7 passes to meet most of its security needs.
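As a rough illustration of the overwrite idea (a sketch only, not a substitute for a vetted shredder, and ineffective on SSDs, journaling file systems, or volumes with shadow copies), overwriting a single file before deleting it might look like this in Python:

import os

def shred(path, passes=3):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))  # overwrite the file's contents in place
            f.flush()
            os.fsync(f.fileno())       # force this pass to disk before the next one
    os.remove(path)                    # the directory entry itself still lingers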
There are several free file shredder applications, and even more commercial file shredder tools. Some security suite software (such as Antivirus with personal security protection tools) may also provide a file shredder.
For recommendations on digital file shredder applications, please ask for Windows digital file shredder recommendations at https://softwarerecs.stackexchange.com/
As for why "deleted" files are still listed by recovery tools as "recoverable": when a file is deleted, all that normally happens is that a flag is set in the master file index maintained by the file system. The raw data of the file is left on the hard disk as "noise/garbage". If no other files are written into the area occupied by the deleted file, it is trivial to recover the data. If other data has been written over it, recovery becomes non-trivial, but still possible. Large-scale recovery vendors are capable of recovering a file even if it has been overwritten a few times. This is why the "security" standards of the US government call for the file area to be overwritten 7 times, as only the most serious (and expensive) recovery operations can recover that data.
To make a file "disappear", the master file index also needs to have the information "erased" and overwritten ("shredding" the file's meta-data to be hidden and very hard to recover).
If you are interested in the details and how to more permanently hide or delete a file, you might want to consider asking at https://security.stackexchange.com/ about how the Windows 7 file system works and what it takes to truly delete a file, or overwrite it thoroughly enough that recovery is impractical.

Possible to bypass caching and download/open file to RAM?

Preamble:
Recently I came across an interesting story about people who seem to be sending emails with documents that contain child pornography. This is an example (this one is a JPEG, but I'm hearing about it being done with PDFs, which generally can't be previewed):
https://www.youtube.com/watch?v=zislzpkpvZc
This can pose a real threat to people in investigative journalism, because even if you delete the file after it has been opened, the copy in Temp may still be recovered by forensics software. Even just having opened the file already puts you in the realm of committing a felony.
This can also pose a real problem for security consultants. Let's say person A emails criminal files, and person B is suspicious of the email and forwards it to the security manager for their program. In order to analyze the file, the consultant may have to download it to a hard drive, even if they load it in a VM or sandbox. Even if they figure out what it is, they are still in a legal minefield where bad timing could land them in jail for 20 years. Thinking about this, if the data only ever entered RAM, then upon a power-down all traces of the opened file would disappear.
Question: I have an OK understanding of how computer architecture works, but this problem made me start wondering. Is there a limitation, at the OS, hardware, or firmware level, that prevents a program from streaming downloaded data directly into RAM? If not, say you try to open a PDF: is it possible for the file being opened to instead be passed to the program as a stream of downloaded bytes, so that no copy of the final file is ever retained on the hard drive?
Unfortunately I can only give a Linux/Unix based answer to this, but hopefully it is helpful and extends to Windows too.
There are many ways to pass data between programs without writing to the hard disk, it is usually more of a question of whether the software applications support it (web browser and pdf reader for your example). Streams can be passed via pipes and sockets, but the problem here is that it may be more convenient for the receiving program to seek back in the stream at certain points rather than store all the data in memory. This may be a more efficient use of resources too. Hence many programs do not do this. Indeed a pipe can be made to look like a file, but if the application tries to seek backward, it will cause an error.
If there was more demand for streaming data to applications, it would probably be seen in more cases though as there are no major barriers. Currently it is more common just to store pdfs in a temporary file if they are viewed in a plugin and not downloaded. Video can be different though.
An alternative is to use a RAM drive; it is common for a Linux system to have at least one set up by default (tmpfs), although on Windows it seems you have to install additional software. Using one of these removes the above limitations, and it is fairly easy to point a web browser at it for temporary files.
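To illustrate the streaming idea, here is a small Python sketch that pulls a document straight into memory using only the standard library; the URL is a placeholder, and whether the viewer you hand the bytes to later spills them to a temp file is up to that viewer:

import io
import urllib.request

def fetch_to_ram(url):
    # Download the response body and keep it only in this process's memory.
    with urllib.request.urlopen(url) as resp:
        return io.BytesIO(resp.read())

buf = fetch_to_ram("https://example.com/report.pdf")  # placeholder URL
# Pass `buf` to any library that accepts file-like objects; nothing is written
# to disk unless that library chooses to create its own temporary files.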

Graceful File Reading without Locking

Whiteboard Overview
[Two 1000 x 750 px, ~130 kB whiteboard JPEGs, "Internal" and "Global", hosted on ImageShack.]
Additional Information
I should mention that each user (of the client boxes) will be working straight off the /Foo share. Due to the nature of the business, users will never need to see or work on each other's documents concurrently, so conflicts of this nature will never be a problem. Access needs to be as simple as possible for them, which probably means mapping a drive to their respective /Foo/username sub-directory.
Additionally, no one but my applications (in-house and the ones on the server) will be using the FTP directory directly.
Possible Implementations
Unfortunately, it doesn't look like I can use off-the-shelf tools such as WinSCP, because some other logic needs to be intimately tied into the process.
I figure there are two simple ways of accomplishing the above on the in-house side.
Method one (slow):
Walk the /Foo directory tree every N minutes.
Diff against the previous tree using a combination of timestamps (which can be faked by file-copying tools, but that is not relevant in this case) and checksums.
Merge changes with off-site FTP server.
Method two:
Register for directory change notifications (e.g., using ReadDirectoryChangesW from the WinAPI, or FileSystemWatcher if using .NET).
Log changes.
Merge changes with off-site FTP server every N minutes.
I'll probably end up using something like the second method due to performance considerations.
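As a sketch of the second method, and assuming the third-party watchdog package (which uses ReadDirectoryChangesW under the hood on Windows), the change-logging half might look like this in Python; the share path and the five-minute interval are placeholders:

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class ChangeLogger(FileSystemEventHandler):
    # Collect paths that changed; a separate pass merges them off-site.
    def __init__(self):
        self.pending = set()
    def on_any_event(self, event):
        if not event.is_directory:
            self.pending.add(event.src_path)

logger = ChangeLogger()
observer = Observer()
observer.schedule(logger, r"D:\Foo", recursive=True)  # placeholder share path
observer.start()
try:
    while True:
        time.sleep(300)  # every N minutes
        batch, logger.pending = logger.pending, set()
        # merge `batch` with the off-site FTP server here
finally:
    observer.stop()
    observer.join()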
Problem
Since this synchronization must take place during business hours, the first problem that arises is during the off-site upload stage.
While I'm transferring a file off-site, I effectively need to prevent the users from writing to it (e.g., open it with CreateFile and FILE_SHARE_READ or something similar) while I'm reading from it. The internet upstream speeds at their office are nowhere near fast enough for the file sizes they'll be working with, so it's quite possible that they'll come back to a file and attempt to modify it while I'm still reading from it.
Possible Solution
The easiest solution to the above problem would be to create a copy of the file(s) in question elsewhere on the file-system and transfer those "snapshots" without disturbance.
The files (some will be binary) that these guys will be working with are relatively small, probably ≤20 MB, so copying (and therefore temporarily locking) them will be almost instant. The chances of them attempting to write to the file in the same instant that I'm copying it should be close to nil.
This solution seems kind of ugly, though, and I'm pretty sure there's a better way to handle this type of problem.
One thing that comes to mind is something like a file system filter that takes care of the replication and synchronization at the IRP level, kind of like what some A/Vs do. This is overkill for my project, however.
Questions
This is the first time that I've had to deal with this type of problem, so perhaps I'm thinking too much into it.
I'm interested in clean solutions that don't require going overboard with the complexity of their implementations. Perhaps I've missed something in the WinAPI that handles this problem gracefully?
I haven't decided what I'll be writing this in, but I'm comfortable with: C, C++, C#, D, and Perl.
After the discussions in the comments, my proposal would be as follows:
Create a partition on your data server, about 5 GB for safety.
Create a Windows Service project in C# that monitors your data drive / location.
When a file has been modified, create a local copy of the file, preserving the same directory structure, and place it on the new partition.
Create another service that does the following:
Monitor bandwidth usage.
Monitor file creation on the temporary partition.
Transfer several files at a time (using threading) to your FTP server, respecting current bandwidth usage and decreasing/increasing the number of worker threads depending on network traffic.
Remove files from the partition once they have transferred successfully.
So basically you have your drives:
C: Windows Installation
D: Share Storage
X: Temporary Partition
Then you would have the following services:
LocalMirrorService - watches D: and copies to X: with the same directory structure
TransferClientService - moves files from X: to the FTP server and removes them from X: (a rough sketch follows below)
It also uses multiple threads to move several files at once and monitors bandwidth.
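A hedged Python sketch of the TransferClientService idea, draining a staging partition to an FTP server and deleting each file only after a successful upload; the host, credentials and staging path are placeholders, and retries, server-side directory structure, and bandwidth throttling are left out for brevity:

import os
from ftplib import FTP

STAGING = r"X:\staging"  # placeholder temporary partition path

def drain_staging(host, user, password):
    with FTP(host) as ftp:
        ftp.login(user, password)
        for root, _dirs, names in os.walk(STAGING):
            for name in names:
                local = os.path.join(root, name)
                with open(local, "rb") as fh:
                    ftp.storbinary("STOR " + name, fh)  # flat layout for brevity
                os.remove(local)  # only remove once the transfer succeeded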
I would bet that this is the idea you had in mind, but it seems like a reasonable approach as long as you're really good with application development and able to create a solid system that handles most issues.
When a user edits a document in Microsoft Word, for instance, the file will change on the share and may be copied to X: even though the user is still working on it. Within Windows there should be an API to see whether the file handle is still held open by the user; if it is, you can hook in and watch for when the user actually closes the document, so that all their edits are complete, and only then migrate it to drive X:.
That being said, if the user is working on the document and their PC crashes for some reason, the document's file handle may not get released until the document is opened again at a later date, which can cause issues.
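One rough way to approximate that "is the user done with it?" check from user code, without enumerating handles, is simply to try opening the file for writing: if Word (or anything else) still holds it open without write sharing, the open fails. A hedged Python sketch, with a placeholder path:

import os

def looks_in_use(path):
    # On Windows, opening for read/write fails with a sharing violation while
    # another process holds the file without write sharing (e.g. Word).
    try:
        fd = os.open(path, os.O_RDWR)
        os.close(fd)
        return False
    except OSError:
        return True

if not looks_in_use(r"D:\Foo\alice\report.docx"):  # placeholder path
    pass  # safe enough to snapshot the file to X: now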
For anyone in a similar situation (I'm assuming the person who asked the question implemented a solution long ago), I would suggest an implementation of rsync.
rsync.net's Windows Backup Agent does what is described in method 1, and can be run as a service as well (see "Advanced Usage"). Though I'm not entirely sure if it has built-in bandwidth limiting...
Another (probably better) solution that does have bandwidth limiting is Duplicati. It also properly backs up currently-open or locked files. Uses SharpRSync, a managed rsync implementation, for its backend. Open source too, which is always a plus!

Can anyone recommend a good backup "system" for a developer? [closed]

I'm known around the office as "the backup guy". As a developer, I often jump back and forth between projects, and as a result I don't always remember exactly what changes were present in each when I return to them. I usually have to compare my local changes versus those in our source control system, and then I'll eventually remember it all. Thing is, I don't always have the luxury of doing this. Sometimes I have to build something for a client quickly, and so I make a backup of the working directory, and that way I can get the latest files from source control, and build the DLL quickly - all while knowing that the other (in-progress) changes are safe.
The problem is that I've now accumulated a bunch of backup folders in each project directory, which makes it harder to find the specific change I was looking for. While my practices have evolved to the point that I always take the time to give each backup folder an informative name, I'm starting to think I'd be better off writing my own tool.
For example: If I select a few folders in windows explorer, I'd like to have my own context menu item that triggers my own backup application. This application would prompt me for a backup name, and description. It would then move the selected folders to a specific, centralized backup directory - where it would also generate a 'readme.txt' file, outlining the backup details. Also, the backups would also be organized by date/time. I feel this would refine my backup procedure, and facilitate future lookups.
But yet, I can't help but wonder if such tools already exist. Surely, someone must be as obsessive as me when it comes to backups.
Do you know of any tools that could help me improve my backups?
I'm aware of this post, but it isn't exactly aligned with what I want. I'd prefer to keep the backups on the same machine - I'll handle moving them over to other machines myself.
Update
To clarify: If I'm working on Task A, and suddenly I need build something for a client (Task B), I have to backup what I have so far for Task A, and get the latest from source control into the working directory. I then start and finish Task B, and then restore Task A. This is an ideal, neat scenario. But sometimes, I only get back to Task A a week down the line, or further - because I get hit with Task C, Task D, etc - all of which affect the same project. Now, if these changes are scheduled to be checked in, then I would probably benefit from checking them in as I progress (but to be honest, we usually wait until it is complete before we check it in, at this company - that means less checkins of unfinished code). So I'm not sure if each of my backups should equal a branch - because I'm sometimes excessive with my backups.
I think what you want is a distributed version control system, such as git.
First, your existing source control system can probably already support this, in the form of branches. Instead of just copying the working directory, commit it as a separate branch, where you can keep that client's version of the application.
However, as skiphoppy said, a distributed source control system would be much better suited for this. I quite like Bazaar, but git is very popular too (although I don't know how good its Windows support is, since it is primarily a *nix tool developed for the Linux kernel)
Subversion using TortoiseSVN will provide you with this functionality. The concepts are different (revisions, not "backup names") . The readme.txt that you make mention of is summarized in the Subversion log. Any comment that you provide can be used to guide others looking at the revision. Check out the Wikipedia page on Subversion as well as the homepage to download it and TortoiseSVN.
CloneZilla, backs up your entire hard drive partition, its free and reliable. I use it in place of Acronis Echo Server, and it restores my entire system in 8 minutes.
As skiphoppy says, a DVCS can really help. Git offers the ability to shelve (stash) the stuff you're working on now so that your working copy is clean, yet you can pull your current working set off the shelf when you're done. That seems like what you really want.
If you're using Perforce, there's a couple of tar-based utilities that do this, too, but I haven't yet used them.
How about changing the way you work? It sounds like one day things will go tits up if you carry on as is. Fair enough on the need to build a DLL midway through a change and having to back up your work in progress, but once the release is done, re-integrate your changes with the release version immediately. I'd never allow myself to have multiple backups of the same app, but hey, that's just me.
I use Hybrid Backup (www.hybridbackup.com.au). Based in Australia, they were the only real people I could speak to who could handle exactly what I wanted. I don't have DLL problems; I have over 1000 files that change every day, every time anyone in my office does anything, and well over 250 GB of live data that I need backed up every night with every single change I have ever made. Basically I can be fairly lazy and copy files and directories all over the place to make sure everything is backed up again, knowing that every single thing I change each day (including my directory backups) is backed up, and if I remember a file I know I had, I can see my backups exactly as they were 5 months ago. The big thing is that it syncs to two different places - Brisbane and Sydney - so I know everything is safe. They even sent me an external backup vault/server to store everything on. It costs a bit, but business is data where I'm from, and I'm sure that's true for most other people too.
Anyway, I'm just trying to point out that you should have an awesome backup system so you don't have to worry about these things in the first place.
I think it's a pretty reasonable practice to check in every night. Sometimes I check in 3 or 4 times a day, sometimes 20 (every time my code is working, actually).
If your code is always checked in, you should easily be able to just sync to a different branch without backing anything up.
If you can't check in your changes by the end of the day, a very reasonable answer is to discard them. You are most likely in some hole that you will have trouble digging yourself out of, and the next day you will replicate the work in an hour and do it MUCH BETTER than you did the first time. Also, if you go that long with broken code, how do you test?
Finally, if your code REALLY can't be checked into the build every day (it actually does happen in some situations, regardless of what I said in the previous paragraph), branch.
No more backups.
I use:
ZenOK Online Backup for my documents and small files (photos, videos and large files)
Love it.
