I have an external hard drive that I suspect is on its way out. At the moment I can still transfer files from it, but only for a while at a time. Unfortunately, I have a single file that's >50GB in size. My plan is to use rsync to transfer this one file a bit at a time, leave the drive to rest (switch it off), and resume a little while later.
I'm using rsync --partial --progress --inplace --append -a /Volumes/Backup\ Drive/chris/Desktop/Recording\ Sessions/S1/Session\ 1/untitled ~/Desktop/temp to transfer it. (The file is in the untitled folder, which I'm moving into the temp folder.) However, after stopping it and resuming it, it seems to overwrite the previous attempt at the file, meaning I don't really get any further.
Is there something I'm missing? :X
Thank you ^_^
EDIT: Still don't know :\
Well, since this is a programming site, here's a program to do it. I tested it on OS X, but you should definitely test it on some small files first to make sure it does what you want:
#!/usr/bin/env python
import os
import sys

# usage: script SOURCE TARGET BEGIN END
source = sys.argv[1]
target = sys.argv[2]
begin = int(sys.argv[3])  # byte offset to start copying from (inclusive)
end = int(sys.argv[4])    # byte offset to stop at (exclusive)

# update the target in place if it already exists, otherwise create it
mode = 'r+b' if os.path.exists(target) else 'w+b'

with open(source, 'rb') as source_file, open(target, mode) as target_file:
    source_file.seek(begin)
    target_file.seek(begin)
    # note: this reads the whole range into memory, so keep chunks modest
    buffer = source_file.read(end - begin)
    target_file.write(buffer)
You run this with four arguments: the source file, the destination, and two numbers. The first number is the byte offset to start copying from (so on the first run you'd use 0). The second number is the byte offset to copy up to (exclusive). On subsequent runs you always use the previous fourth argument as the new third argument (the new begin equals the old end), and just go on like that until it's done, using whatever chunk sizes you like along the way.
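For example, if the script were saved as copy_chunk.py (the name and file paths here are just placeholders), copying the first gigabyte and then the next would look like:

python copy_chunk.py /Volumes/BadDrive/bigfile ~/Desktop/temp/bigfile 0 1000000000
python copy_chunk.py /Volumes/BadDrive/bigfile ~/Desktop/temp/bigfile 1000000000 2000000000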
I know this is related to macOS, but the best way to get all the files off a dying drive is with GNU ddrescue. I have no idea whether it runs nicely on macOS, but you can always use a Linux live USB to do this. You'll want to open a terminal and either be root (preferred) or use sudo.
First, find the disk that you want to back up by running the following. Make a note of the name of the partition or disk that you want to back up. Hard drives and flash drives typically use the format sdX, where X is the drive letter; partitions are listed as sdX1, sdX2, etc. NVMe drives/partitions follow a similar naming convention.
lsblk -o name,size,label,fstype,model
Mount and change directory (cd) to a writable location that is bigger than the drive/partition you want to back up.
Now we are going to do a first pass over the drive/partition, without stopping on problematic sections. This ensures that ddrescue does not cause any more damage by hammering on a bad section. Think of it like a hole in a sweater: you wouldn't want to keep picking at the hole or it would get bigger. Run the following, with sdX replaced by the drive/partition name from earlier:
ddrescue -d /dev/sdX backup.img backup.logfile
The -d flag uses direct disk access, bypassing the kernel cache. The logfile is important: if the drive gets disconnected or the process stops somehow, re-running the same command will resume from where it left off rather than starting over.
Run ddrescue again with the -r flag, which retries bad sections, here 3 times. Feel free to run this a few times, but note that ddrescue cannot restore everything. In my experience it usually restores into the high 90% range, and many of the unrecovered files are system files (i.e., not your personal files).
ddrescue -d -r3 /dev/sdX backup.img backup.logfile
Finally, you can use the image however you want. You can mount it to copy the files off, use it in a virtual machine, or burn it to a working drive with dd. Do note that the latter options will not always work if system-critical files were damaged.
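For example, if you imaged a single partition, you could mount the image read-only to copy files out (the mount point here is arbitrary); for a whole-disk image, losetup --partscan can expose the individual partitions first:

mkdir -p /mnt/rescue
mount -o loop,ro backup.img /mnt/rescue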
Good luck and remember to make backups!
I accidentally ran this command while trying to remove an errant directory named \\ from my project directory. Quite a mistake, I know. It pretty quickly began hitting permissioned files, at which point I realized my mistake, so I Ctrl-C'ed out of there. I have all my important projects backed up, but the command killed my development environment. Opening vim anywhere now crashes with a segfault like so:
Vim: Caught deadly signal SEGV
Error detected while processing function <SNR>130_PollServerReady[7]..<SNR>130_Pyeval:Vim: Finished.
line 4:
Exception MemoryError: MemoryError() in <module 'threading' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.pyc'> ignored
[1] 6921 segmentation fault vim ~/dotfiles/.vimrc
My primary question, for myself and anyone who commits a similar gaffe, is:
What, precisely, does the double slash // point to? What would be deleted first? Is there a logical first place to begin replacing utils, configs, $PATH stuff, etc.?
Hopefully, this is clear and specific enough for SO.
cd // will take you to the root directory /.
rm performs a depth-first search, walking the results of the xfts_open call (rm's wrapper around the fts(3) file-tree-traversal functions). find also traverses filesystems in this manner.
find / will list the files that still exist. You can then use your knowledge of the expected structure to work out which files are missing.
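As a sketch of that comparison, assuming you have a file listing saved from a backup or from a comparable healthy machine (the expected.txt file here is hypothetical; -xdev keeps find on the root file system):

find / -xdev | sort > /tmp/present.txt
sort /tmp/expected.txt > /tmp/expected.sorted
# comm -23 prints lines unique to the first file:
# files that should exist but are now missing
comm -23 /tmp/expected.sorted /tmp/present.txt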
Alternatively, you can use debugfs to help you get at the files.
This assumes that these commands will actually work. Realistically, your system is probably hosed; deleting things in / will break your computer. Restoring from backup is probably the easiest way to return to a functional system. You can also try various utilities to recover recently erased files from your hard drive. If you plan on doing this, stop using the computer now: the drive treats many areas that recently held files in / as free space (since you told it to), and it could start writing to those areas at any moment.
In Linux, and I believe other *nix flavours, an extra slash in a path is simply ignored. Thus, a//b is the same as a/b and // is the same as /. I hope you didn't run this as a superuser...
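A quick way to convince yourself of the slash behaviour on Linux (readlink -f canonicalizes a path):

$ readlink -f //
/
$ readlink -f /usr//local
/usr/local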
I ran this command to find files matching ".*large_files.*":
[root@iz2ze9wve43n2nyuvmsfx5z ~]# find / -iregex ".*large_files.*"
/root/search_large_files.py
It found the file, but then the cursor just keeps blinking endlessly, even if I leave it alone for over half an hour.
What's the bug in my command that causes this problem?
Well, it may be that you just have massive file systems :-)
But, if you think it shouldn't be taking that long, you may well have mount points that are slower than normal, such as NFS-mounts where you have to go out over the network to get file information.
You could probably see a slow-down in that case if you just run find / on its own. If it goes out to an external location (like, I don't know, a ZX80 running in Antarctica), the output rate may show that, and you'll be able to identify where in the hierarchical structure it happens.
Another possibility is to restrict find to the file system you're on, to minimise the chance it will go external. That's done with the -xdev flag, which prevents find from crossing into other file systems. On my VM, with one root file system but mounts for my C and D host drives, it cut the time down from two minutes to seventeen seconds.
Of course, that won't search other local file systems, but you could, if necessary, write a script to run find (with -xdev) on every file system marked ext4 (and whatever other types you deem to be local).
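A rough sketch of such a script, reading mount points from /proc/mounts (note that mount points containing spaces are escaped there, which this simple version ignores):

awk '$3 == "ext4" { print $2 }' /proc/mounts | while read -r mp; do
    find "$mp" -xdev -iregex ".*large_files.*"
done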
I am trying to write a Ruby video transcoding script (using ffmpeg) that depends on mov files being FTPed to a server.
The problem I've run into is that when a large file is uploaded by a user, the watch script (using rb-inotify) attempts to execute (and run the transcoder) before the mov is completely uploaded.
I'm a complete noob, but I'm trying to find out whether there is a way to ensure my watch script doesn't run until the file(s) is/are completely uploaded.
My watch script is here:
require 'rb-inotify'

watch_me = INotify::Notifier.new
# fires whenever a file opened for writing is closed in the watched directory
watch_me.watch("/directory_to_my/videos", :close_write) do |event|
  load '/directory_to_my/videos/.transcoder.rb'
end
watch_me.run
Thank you for any help you can provide.
Just relying on inotify(7) to tell you when a file has been updated isn't a great fit for telling when an upload is 'complete': an FTP session might time out and be re-started, for example, allowing a user to upload a file in chunks over several days as connectivity becomes available. inotify(7) only ever sees file opens, closes, renames, and accesses; it never sees the higher-level event "I'm done modifying this file" as the user would understand it.
There are two mechanisms I can think of: one is to have uploads go initially into one directory and ask the user to move the file into another directory when the upload is complete. The other creates some file metadata on the client and uses that to "know" when the upload is complete.
Move completed files manually
If your users upload into the directory ftp/incoming/temporary/, they can upload the file in as many connections as required. Once the file is "complete", they rename it (rename ftp/incoming/temporary/hello.mov ftp/incoming/complete/hello.mov); your rb-inotify interface then looks for file renames in the ftp/incoming/complete/ directory and starts the ffmpeg(1) command.
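As a sketch of the same idea at the shell level (this uses inotifywait from the inotify-tools package rather than rb-inotify, and passing the path to the transcoder is just illustrative):

inotifywait -m -e moved_to --format '%w%f' ftp/incoming/complete/ |
while read -r path; do
    ruby /directory_to_my/videos/.transcoder.rb "$path"
done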
Generate metadata
For a transfer to be "complete", you're really looking for two things:
The file is the same size on both systems.
The file is identical on both systems.
Since "identical" is otherwise difficult to check, most people content themselves with checking if the contents of the file, when run through a cryptographic hash function such as MD5 or SHA-1 (or better, SHA-224, SHA-256, SHA-384, or SHA-512) functions. MD5 is quite fine if you're guarding against incomplete transmission but if you intend on using the output of the function for other means, using a stronger function would be wise.
MD5 is really tempting though, since tools to create and validate MD5 hashes are very widespread: md5sum(1) on most Linux systems, md5(1) on most BSD systems (including OS X).
$ md5sum /etc/passwd
c271aa0e11f560af419557ef49a27ac8 /etc/passwd
$ md5sum /etc/passwd > /tmp/sums
$ md5sum -c /tmp/sums
/etc/passwd: OK
The md5sum -c command asks the md5sum(1) program to check the file of hashes and filenames for correctness. It looks a little silly when used on just a single file, but when you've got dozens or hundreds of files, it's nice to let the software do the checking for you. For example: http://releases.mozilla.org/pub/mozilla.org/firefox/releases/3.0.19-real-real/MD5SUMS -- Mozilla has published such files with 860 entries -- checking them by hand would get tiring.
Because checking hashes can take a long time (five minutes on my system to check a high-definition hour-long video that wasn't recently used), it'd be a good idea to only check the hashes when the filesizes match. Modify your upload tool to send along some metadata about how long the file is and what its cryptographic hash is. When your rb-inotify script sees file close requests, check the file size, and if the sizes match, check the cryptographic hash. If the hashes match, then start your ffmpeg(1) command.
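A rough shell sketch of that check (the file names and the one-line metadata format are made up for illustration):

# metadata file assumed to contain one line: "<size> <md5>"
read -r expected_size expected_md5 < hello.mov.meta
actual_size=$(stat -c %s hello.mov)   # GNU stat; on OS X use: stat -f %z
if [ "$actual_size" -eq "$expected_size" ]; then
    actual_md5=$(md5sum hello.mov | awk '{print $1}')
    if [ "$actual_md5" = "$expected_md5" ]; then
        echo "upload complete; safe to start ffmpeg"
    fi
fi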
It seems easier to upload the file to a temporary directory on the server and move it to the location your script is watching once the transfer is completed.
I want to upgrade my file management productivity by replacing a two-panel file manager with the command line (bash or Cygwin). Can the command line give the same speed? Please advise a guru way of doing, e.g., a copy of some file in directory A to directory B. Is it heavy use of pushd/popd? Or creation of links to most often used directories? What are the best practices and the day-to-day routine for managing files as a command line master?
Can the command line give the same speed?
My experience is that command-line copying is significantly faster (especially in the Windows environment). Of course the basic laws of physics still apply: a file that is 1000 times bigger than a file that copies in 1 second will still take 1000 seconds to copy.
(How to) copy some file in directory A to directory B.
Because I often have 5-10 projects that use similar directory structures, I set up variables for each subdirectory using a naming convention:
project=NewMatch
NM_scripts=${project}/scripts
NM_data=${project}/data
NM_logs=${project}/logs
NM_cfg=${project}/cfg
proj2=AlternateMatch
altM_scripts=${proj2}/scripts
altM_data=${proj2}/data
altM_logs=${proj2}/logs
altM_cfg=${proj2}/cfg
You can make this sort of thing as spartan or baroque as needed to match your theory of living/programming.
Then you can easily copy the cfg files from one project to another:
cp -p $NM_cfg/*.cfg ${altM_cfg}
Is it heavy use of pushd/popd?
Some people seem to really like that. You can try it and see what you think.
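For example, reusing the hypothetical project variables from above:

pushd $NM_cfg    # jump to the cfg dir; the old directory goes onto a stack
# ... work on the configs ...
popd             # return to wherever you were before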
Or creation of links to most often used directories?
Links to dirs are, in my experience, used more for software development, where source code expects a certain set of dir names and your installation has different ones; making links to supply the expected dir paths is helpful there. For production data, it's just one more thing that can get messed up or blow up. That's not always true; maybe you'll have a really good reason to have links, but I wouldn't start out that way just because it is possible to do.
What are the best practices and the day-to-day routine for managing files as a command line master?
Per the above, use a standardized directory structure for all projects.
Have scripts save any small files to a directory your department keeps in /tmp, e.g. /tmp/MyDeptsTmpFile (named to fit your local conventions).
It depends. If you're talking about data and logfiles, dated filenames can save you a lot of time. I recommend date formats like YYYYMMDD, or YYYYMMDD_HHMMSS if you need the extra resolution.
Dated logfiles are very handy: when a current process seems to be taking a long time, you can look at the logfile from a week, a month, or six months ago (as far back as you can afford the disk space) and quantify exactly how long the process took then. Logfiles should also capture all STDERR messages, so you never have to re-run a bombed program just to see what the error message was.
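For example, a dated logfile that also captures STDERR might look like this (the job name and path are just placeholders):

logfile="logs/myjob_$(date +%Y%m%d_%H%M%S).log"
./myjob > "$logfile" 2>&1    # capture STDOUT and STDERR together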
This is Linux/Unix you're using, right? Read the man page for the cp command installed on your machine. I recommend using an alias like alias CP='/bin/cp -pi' so you always copy a file with the same permissions and the original file's time stamp. Then it is easy to use /bin/ls -ltr to see a sorted list of files with the most recent showing up at the bottom of the list (no need to scroll back to the top when you sort by time, reversed). Also, the -i option will warn you before you overwrite a file, and this has saved me more than a couple of times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.
To restrict the scope, let's assume we are in the Windows world only.
Also assume we don't want to play with permission policies.
Is it possible for us to create a file that cannot be copied?
Thank you in advance.
"Trying to make digital files uncopyable is like trying to make water not wet." ~ Bruce Schneier
No. You can't create a file that a SYSADMIN can't copy. You could encrypt it, though.
Well, how about creating a file that uses up more than 50% of the total space on that machine and that is not compressible?
For instance, let us assume that you want to save a boolean (true or false) in such a fashion.
Depending on its value, you could then write a bit stream of all ones or all zeroes and encrypt said stream using some kind of encryption algorithm, such as AES in CBC mode. This gives you the added advantage of error tolerance: even in case of massive data corruption, you should be able to recover your boolean by checking whether ones or zeroes are prevalent in the decrypted stream.
In that case you cannot copy it around (completely) on the machine...
Of course, any type of external memory that can be added to the system would pose a problem in this scenario. But the file would be already encrypted, so don't worry about it too much...
Any file that can be read can have its contents written to another location (such as another file, i.e. copied).
The only thing you can do is limit who/what can read the file.
What is the motivation behind this? If it is a read-only file, you can have it as an embedded resource within your assembly.
Nice try, RIAA.
But seriously, no, you cannot. It is always possible to copy; you can just make it more difficult for people to make sense of the file, or try to hide it using something like encryption. Spotify does this.
If you really try hard, though, you could make a rootkit for Windows and use it to prevent Windows from even knowing about the file, and also prevent copies. The file would still be there and copyable by other tools, or by Linux accessing the NTFS volume.
If a running process opens a file and holds an exclusive lock, then other processes cannot read the file until it closes the handle or the process terminates. However, as admin you could forcibly close the lock handle.
Short answer: No.
You can, of course, use security settings to limit who can read the file. But if someone can read it, then they can copy it. Even if you found some operating system trick to disable "ordinary" copying, if someone can read the file, they can extract the contents, store it in memory, and then write it somewhere else.
You can encrypt the contents so it's only useful to your own program, that knows how to decrypt it.
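For what it's worth, a quick way to do that kind of encryption from the command line (assuming the OpenSSL tool is available; the file names are illustrative, and -pbkdf2 needs OpenSSL 1.1.1 or newer):

openssl enc -aes-256-cbc -salt -pbkdf2 -in secret.dat -out secret.enc
openssl enc -d -aes-256-cbc -pbkdf2 -in secret.enc -out secret.dat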
That's about it.
When using Windows 7 to copy some files from a hard drive, certain files popped up a message saying they could not be copied in their entirety; certain data would be omitted from the copy. I suspect that had something to do with slack space at the end of the files, though I thought the message was curious. I would have expected the copy operation to just ignore the slack space.
If you are running old (OLD) versions of Windows, there are certain characters you can put in the filename that make it invalid, not listed in folders, etc. They were used a lot in the old pub FTP days of file sharing ;)
In the old DOS days, you used to be able to flag disk sectors as bad and still read from them. This meant the OS ignored the sector in question but your application would know where to look and be able to get the data. Not sure this would work these days.
Another old MS-DOS trick was to put a space character in the middle of the filename (yes, spaces were valid characters for filenames). Since there was no method on the command line to escape a space, the file couldn't be copied using the DOS commands.
This answer is outside Windows so yeah
Don't know if it's already been said, but what about a file that is an inseparable part of the firmware, so that it is always on AND running; perhaps it has firmware that generates a sequence that is required for the other . An incidental effect of its running is to prevent any 80% or more of its code from being replicated. Let's say it's on an entirely different board, protected by surge protectors, heavy EM-proof shielding and anything else required to make it completely unerasable.
If it's possible to make a program that is ALWAYS on and running as long as the copying software is running, then yes.
I have another way, and this IS with Windows: I will come to your house and give you a disk, then proceed to destroy every single computer you put the disk into. This doesn't work on XP.
Well technically you could create and write to a write-only network share.