Fastest way to move files on a Windows system [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 4 years ago.
I want to move about 800 GB of data from an NTFS storage device to a FAT32 device (both are external hard drives) on a Windows system.
What is the best way to achieve this?
Simply using cut-paste?
Using the command prompt? (move)
Writing a batch file to copy small chunks of data at a given interval?
Using a specific application that does the job for me?
Or any better idea...?
What is the safest, most efficient and fastest way to carry out such a time-consuming process?

Robocopy
You can restart the command and it'll resume. I use it all the time over the network. Works on large files as well.
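For example, a minimal restartable copy between two external drives might look like the following (the drive letters and paths are placeholders; /E copies subdirectories including empty ones, and /Z enables restartable mode so an interrupted transfer can resume where it left off):
robocopy "E:\Data" "F:\Data" /E /Z /R:2 /W:5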

I would physically move the hard disk if possible.

I've found FastCopy to be quite good for this sort of thing. It's a GUI tool:
http://www.ipmsg.org/tools/fastcopy.html.en

If you have to move it over a network, you want to use FTP between the servers. Windows file sharing will get bogged down by its chatty protocol.

I've found Teracopy to be pretty fast and handy. Allegedly Fastcopy (as suggested by benlumley) is even faster, but I don't have any experience with it.

Try using WinRAR or another archiving tool. A few big files move faster than lots of small ones.
Most archiving tools let you split the archive into multiple volumes.
You might even reduce the size a bit if you turn on compression.
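With 7-Zip, for example, a split archive can be created from the command line; something along these lines (the paths and volume size are just examples, and keeping each volume under 4 GB also stays within FAT32's per-file size limit):
7z a -v3900m "F:\Archive\data.7z" "E:\Data\*"
At the destination, extracting the first volume (7z x data.7z.001) restores the whole set.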

Command Line: xcopy is probably your best bet
Command Reference:
http://www.computerhope.com/xcopyhlp.htm
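A typical invocation for a full tree copy might look like this (the drive letters are placeholders; /E copies subdirectories including empty ones, /C continues on errors, /H includes hidden and system files, /Y suppresses overwrite prompts):
xcopy E:\Data F:\Data /E /C /H /Y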

I used TeraCopy and copied 50+ GB to a 128 GB flash drive. It took almost 48 hours, and I had to do it twice because of a power hiccup. I had to reformat and start over... not my favorite thing to do.

One of the fastest ways to copy files is to use Robocopy, as pointed out by Pyrolistical above. It's very flexible and powerful.
If the command doesn't work directly from your command prompt, try running it through PowerShell, as in the example below.
Check the documentation for this command before using it: robocopy /?
powershell "robocopy 'Source' 'destination' /E /R:3 /W:10 /FP /MT:25 /V"
/E - copy subdirectories, including empty ones.
/R:3 - retry 3 times on failure.
/W:10 - wait 10 seconds between retries.
/FP - include full path names in the output.
/MT:25 - multi-threaded copy with 25 threads.
/V - verbose output.
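Since the question is about moving rather than copying, note that Robocopy also has /MOV (move files) and /MOVE (move files and directories), which delete the source after a successful copy; for example, with the same placeholder paths:
powershell "robocopy 'Source' 'destination' /E /MOVE /R:3 /W:10"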

I wanted to reply to a comment about multithreading from #hello_earth (2015-10-13 11:24), but I don't have enough reputation points on Stack Overflow (I've mostly posted on Super User up until now):
Multithreading is typically not efficient when copying files from one storage device to another, because the fastest throughput is achieved with sequential reads, and using multiple threads makes a HDD rattle and grind like crazy as it reads or writes several files at the same time. Since a HDD can only access one location at a time, it must read or write a chunk from one file, then move to a chunk of another file located in a different area, which slows down the process considerably (I don't know how a SSD would behave in such a case). It is both inefficient and potentially harmful: the mechanical stress is considerably higher when the heads move repeatedly across the platters to reach several areas in short succession, rather than staying in the same spot to read a large contiguous file.
I discovered this when batch-checking the MD5 checksums of a very large folder full of video files with md5deep: with the default options the analysis was multithreaded, so there were 8 threads on an i7 6700K CPU, and it was excruciatingly slow. When I added the -j1 option, meaning 1 thread, it proceeded much faster, since the files were now read sequentially.
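For reference, the single-threaded run looked something like this (the folder path is just an example; -r recurses into subdirectories and -j1 restricts md5deep to a single worker thread):
md5deep -r -j1 "D:\Videos" > hashes.md5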
Another consideration that follows from this is that the transfer speed will be significantly higher if the files are not fragmented, and also, more marginally, if they are located at the beginning of a hard disk drive, corresponding to the outermost parts of the platters, where the linear velocity is highest (that aspect is irrelevant for a solid state drive or other flash-memory-based device).
Also, the original poster wanted “the most safe, efficient and fast way to achieve such a time consuming process”. I'd say one has to choose a compromise favoring either speed/efficiency or safety: if you want safety, you have to check that each file was copied flawlessly (by comparing MD5 checksums, or with something like WinMerge); if you don't, you can never be 100% sure there weren't some SNAFUs in the process (hardware or software issues); if you do, you have to spend twice as much time on the task.
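As a rough sketch of that kind of verification in PowerShell (the folder paths are placeholders; Get-FileHash needs PowerShell 4.0 or later):
# Compare the MD5 of every source file against its copy; report missing or mismatched files.
$src = 'E:\Data'; $dst = 'F:\Data'
Get-ChildItem $src -Recurse -File | ForEach-Object {
    $copy = Join-Path $dst $_.FullName.Substring($src.Length).TrimStart('\')
    if (-not (Test-Path $copy)) {
        Write-Warning "Missing: $copy"
    } elseif ((Get-FileHash $_.FullName -Algorithm MD5).Hash -ne (Get-FileHash $copy -Algorithm MD5).Hash) {
        Write-Warning "Mismatch: $($_.FullName)"
    }
}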
For instance: I relied on a little tool called SynchronizeIt! for my file copying purposes, because it has the huge advantage over most similar tools of preserving all timestamps (including directory timestamps, like Robocopy does with the /DCOPY:T switch), and it has a streamlined interface with just the options I need. But I discovered that some files were always corrupted after a copy, truncated after exactly 25000 bytes (so the copy of a 1 GB video, for instance, had 25000 good bytes followed by 1 GB of zeroes; the copy process was abnormally fast, taking only a split second, which is what triggered my suspicion in the first place). I reported this issue to the author in 2010, but he chalked it up to a hardware malfunction and didn't think twice about it.
I still used SynchronizeIt!, but started to check files thoroughly every time I made a copy (with WinMerge or Total Commander); when files ended up corrupted I used Robocopy instead. Files which were corrupted by SynchronizeIt!, once copied with Robocopy and then copied again with SynchronizeIt!, were copied flawlessly, so there was something in the way they were recorded on the NTFS partition which confused that software, and which Robocopy somehow fixed.
Then in 2015 I reported the issue again, after having identified more patterns regarding which files got corrupted: they had all been downloaded with particular download managers. That time the author did some digging and found the explanation: it turned out that his tool had trouble copying files with the little-known “sparse” attribute, and that some download managers set this attribute to save space when downloading files in multiple chunks. He provided me with an updated version which correctly copies sparse files, but hasn't released it on his website (the currently available version is 3.5 from 2009; the version I now use is a 3.6 beta from October 2015). So if you want to try that otherwise excellent software, be aware of that bug, and whenever you copy important files, thoroughly verify that each copied file is identical to the source (using a different tool) before deleting them from the source.

Related

Why slow simulations when results saved into one directory?

Would love some help figuring out why a script is running much slower than it used to.
The script starts sequential MATLAB simulations and saves each simulation's output to a file in a directory on computer #1. The script runs on computers #2, #3, and #4, which have the C: drive of computer #1 mounted as drive K:, and these computers read and write K: drive files during the simulations.
Prior to starting each simulation, the script saves a 'placeholder' version of the simulation's output file, which later gets overwritten with that simulation's results once the simulation is complete. The output filename is unique to each simulation. The script checks for the output file before starting a simulation; if the file is found, it moves on to the next simulation. The intent is to divide many simulations among the different computers.
The directory on computer #1 has many files in it (~4000, 6 GB), and computer #1 is an old Windows XP machine. Computers #2-4 are also Windows machines and are 2+ years old.
This scheme used to work fine, saving ~3 files per minute. Now it is taking ~15 minutes per file. What might be the leading cause for the slowdown? Could it be the number of files in the directory or the number of computers accessing computer #1? If that is unlikely, I would like to know so I can redirect my troubleshooting.
The number of items in a single directory absolutely leads to decreased performance. I've read that it depends on the OS, the filesystem, local/remote drives... and maybe the phase of the moon.
My personal rule of thumb is that at about 5,000 items per directory performance starts to degrade, and at about 10,000 performance has degraded enough that whatever you are doing will not work correctly anymore.
It turns out the problem was an old network switch that the various computers were plugged into. When we tried a newer switch, the script ran like lightning.
However, everyone's suggestions (subdirectories to reduce the number of files; defragging computer #1, which turned out to be badly fragmented) were very helpful, and it was great to have some other eyes on the problem, so thanks.

Downloading multiple files simultaneously from big file lists on Windows

I am looking for a program that can download multiple files simultaneously (about 100 files in parallel). The catch is that this program should be able to handle very big lists of files (around 200 MB of links) and should work on Windows.
So far, I have tested aria2, but when I load my file list I get an out-of-memory exception (aria2 tries to use over 4 GB of memory!). I also tried mulk, but it just isn't working: it has been loading my file list for about two hours now, while generating that list and writing it to disk took me about half a minute. I haven't tried wget yet, but as far as I know it cannot download in parallel, am I right?
Is there any software that could handle my requirements?
With aria2, you can use the --deferred-input option to reduce the memory footprint for list input. Setting the --max-download-result option to a low value, such as 100, may reduce memory usage too.
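For example (the list file name and the connection counts are just placeholders):
aria2c --input-file=links.txt --deferred-input=true --max-download-result=100 --max-concurrent-downloads=100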

CopyFile and cluster reservation on Windows

What is the OS (XP, Vista, Win7) behavior when copying files with CopyFile?
When does it reserve the destination clusters? Which of the following:
it reserves all destination clusters before starting to copy, or
it reserves some clusters, copies a portion of the file to them, then reserves additional clusters, copies the next portion of the file to those newly reserved clusters, and so on?
The copy operation used by Explorer and cmd.exe reserves most of the disk space immediately, at least on my Windows 7 32-bit, as you can see by watching the free space on the volume. To the best of my recollection this behaviour has been the same in all versions of Windows since at least NT 4.
However, there are several caveats:
Explorer and cmd.exe don't (necessarily) use CopyFile.
This behaviour might be different in different versions of Windows, or depending on circumstances.
It might be only most of the destination clusters; for example, it might sometimes need to expand the MFT to complete the operation. I don't think this is likely, but I can't rule it out.
My recommendation:
If a slim possibility of the occasional failure is acceptable, test CopyFile and if it behaves as expected go ahead and use it.
If it isn't, consider doing the copy yourself. Unfortunately that last caveat might apply even then, but as I said I think it's probably not a significant risk.
You need to be prepared to cope with an unexpected failure either way since hardware faults, or perhaps even file system corruption, could cause the copy to fail part way through.
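One rough way to run that kind of test is to start a copy of a large file in the background and sample the destination volume's free space while it runs; a minimal PowerShell sketch (the paths and drive letter are placeholders, and keep in mind the caveat above that a given copy mechanism may or may not go through CopyFile):
# Start a large copy in the background, then watch free space on the destination volume.
$src = 'D:\test\bigfile.bin'   # placeholder: a multi-GB test file
$dst = 'E:\bigfile.bin'        # placeholder destination
$job = Start-Job { param($s, $d) Copy-Item -Path $s -Destination $d } -ArgumentList $src, $dst
while ($job.State -eq 'Running') {
    '{0:HH:mm:ss}  free on E: {1:N0} bytes' -f (Get-Date), (Get-PSDrive -Name E).Free
    Start-Sleep -Seconds 1
}
Receive-Job $job; Remove-Job $job
If free space drops by roughly the whole file size right away, the space is being reserved up front.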

A Linux Kernel Module for Self-Optimizing Hard Drives: Advice?

I am a computer engineering student studying Linux kernel development. My 4-man team was tasked to propose a kernel development project (to be implemented in 6 weeks), and we came up with a tentative "Self-Optimizing Hard Disk Drive Linux Kernel Module". I'm not sure if that title makes sense to the pros.
We based the proposal on this project.
The goal of the project is to minimize hard disk access times. The plan is to create a special partition where the "most commonly used" files are to be placed. An LKM will profile, analyze, plan, and redirect I/O operations to the hard disk. This LKM should primarily be able to predict and redirect all file access (on files with sizes of < 10 MB) with minimal overhead, and lessen average read/write access times to the hard disk. I believe Apple's HFS has this feature.
Can anybody suggest a starting point? I recently found a way to redirect I/O operations by intercepting system calls (by hijacking all the read/write ones). However, I'm not convinced that this is the best way to go. Is there a way to write a driver that redirects these read/write operations? Can we perhaps tap into the read/write cache to achieve the same effect?
Any feedback at all is appreciated.
You may want to take a look at Unionfs. You don't even need an LKM: just a user-space daemon which subscribes to inotify events, keeps statistics, and migrates files between partitions. Unionfs will combine both partitions into a single logical filesystem.
There are many ways in which such optimizations might be useful:
accessing file A implies file B access is imminent. Example: opening an icon file for a media file by a media player
accessing any file in some group G of files means that other files in the group will be accessed shortly. Example: mysql receives a use somedb command, which implies that all of its table files, indexes, etc. will be accessed.
a program which stops reading a sequential file suggests the program has stalled or exited, so predictions of future accesses associated with that file should be abandoned.
having multiple (yet transparent) copies of some frequently referenced files strategically sprinkled about can use the copy nearest the disk heads. Example: uncached directories or small, frequently accessed settings files.
There are so many possibilities that I think at least 50% of an efficient solution would be a sensible, limited specification of which features you will attempt to implement and which you won't. It might be valuable to study how the aggressive file-caching mechanism in Microsoft's Vista disappointed.
Another problem you might encounter with a modern Linux distribution is how well the system already does much of what you plan to improve. In fact, measuring the improvement might be a big challenge. I suggest writing a benchmark program which opens and reads a series of files and precisely times the complete sequence. Run it several times with your improvements enabled and disabled. But you'll have to reboot in between for the timings to be valid.

Performance issues using Copyfile() to copy files from different computers

Using VC++ VisualStudio 2003.
I'm trying to copy several image files (30 KB or so per file) from another computer's shared folder to a local folder.
The problem is that there can be 2000 or more files in one transfer, and it seems to take its toll, taking substantially more time to complete.
Is there any alternate method of copying files from another computer that could possibly speed up the copy?
Thanks in advance.
EDIT:
Due to client requirements, it is not possible to change the code base dramatically. I hate having to deviate from best practice because of non-technical issues, but is there a more subtle approach, such as another function call? I know I'm asking for some magical voodoo; asking just in case somebody knows of such.
A few things to try:
is copying files using the OS any faster?
if not, then there may be some inherent limitations in your network or the way it's set up (maybe authentication troubles, or the remote server has some hardware issues, or it's too busy, or the network card loses too many packets because of collisions, a faulty switch, bad wiring...)
make some tests transferring files of various sizes.
Small files are always slower to transfer because there is a lot of overhead to fetch their details, then transfer the data, then create directory entries etc.
if large files are fast, then your network is OK and you probably won't be able to improve the system much (the bottleneck is elsewhere).
Eventually, from code, you could try to open, read the files into a large buffer in one go then save them on the local drive. This may be faster as you'll be bypassing a lot of checks that the OS does internally.
You could even do this over a few threads to open, load, write files concurrently to speed things up a bit.
A couple of references you can check for multi-threaded file copy:
MTCopy: A Multi-threaded Single/Multi file copying tool on CodeProject
Good parallel/multi-thread file copy util? discussion thread on Channel 9.
McTool, a command-line tool for parallel file copy.
If implementing this yourself in code is too much trouble, you could always simply execute a utility like McTool in the background of your application and let it do the work for you.
Well, for a start, 2000 is not "several". If it's taking most of the time because you're sending lots of small files, then come up with a solution that packages them into a single file at the source and unpacks it at the destination. This will require some code running at the source; you'll have to design your solution to allow that, since I assume at the moment you're just copying from a network share.
If it's the network speed (unlikely), compress them as well.
My own belief is that it will be the number of files, basically all the repeated startup costs of each copy. That's because 2000 files of 30 KB is only 60 MB, and on a 10 Mb link the theoretical minimum time would be about a minute.
If your times are substantially above that, then I'd say I'm right.
A solution that uses 7-Zip or similar to compress them all into a single 7z file, transmit it, then unzip it at the other end sounds like what you're looking for.
But measure, don't guess! Test it out to see if it improves performance. Then make a decision.
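As a concrete sketch of that approach with 7-Zip (the paths are placeholders; the first command runs on the source machine, the last on the destination; -mx1 selects the fastest compression level):
7z a -mx1 C:\staging\images.7z C:\images\*
Copy the single images.7z file across, then at the destination:
7z x images.7z -oC:\incoming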
