I have a situation with a failing LaCie 500GB hard drive. It stays on for only about 10 minutes, then becomes unusable. For those 10 minutes or so I do have complete control.
I can't get my main .mov file (160 GB) transferred off that quickly, so I was thinking that if I split it into small chunks, I could move them all off. I tried splitting the movie file using the split command, but it of course took longer than 10 minutes. I ended up with about 14 GB of files, 2 GB each, before the drive failed.
Is there a way I can use a split command and have it skip any existing chunks, so that as I'm splitting this file it will see xaa, xab, xac and pick up after that point, continuing the split starting with xad?
Or is there a better option that can split a file in multiple stages? I looked at csplit as well, but that didn't seem like an option either.
Thanks!
-------- UPDATE ------------
Now, with the help of bcat and Mark, I was able to do this using the following:
dd if=/Volumes/badharddrive/file.mov of=/Volumes/mainharddrive/movieparts/moviepart1 bs=1g count=4
dd if=/Volumes/badharddrive/file.mov of=/Volumes/mainharddrive/movieparts/moviepart2 bs=1g count=4 skip=4
dd if=/Volumes/badharddrive/file.mov of=/Volumes/mainharddrive/movieparts/moviepart3 bs=1g count=4 skip=8
etc
cat /Volumes/mainharddrive/movieparts/moviepart[1-3] > newmovie.mov
You can always use the dd command to copy chunks of the old file into a new location. This has the added benefit of not doing unnecessary writes to the failing drive. Using dd like this could get tedious with such a large mov file, but you should be able to write a simple shell script to automate part of the process.
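For instance, here's a rough sketch of such a script (the paths and the 4 GB chunk size are taken from the update above, and the 40-chunk count assumes a 160 GB file; re-run it after each crash and it skips chunks that already exist, which is what the question asks for):
#!/bin/sh
# Pull the file off the failing drive in 4GB pieces, one dd call per chunk.
SRC=/Volumes/badharddrive/file.mov
DEST=/Volumes/mainharddrive/movieparts
i=0
while [ $i -lt 40 ]; do                        # 40 chunks x 4GB = 160GB
    part="$DEST/moviepart$((i + 1))"
    if [ ! -e "$part" ]; then                  # skip chunks already copied
        dd if="$SRC" of="$part" bs=1g count=4 skip=$((i * 4))
    fi
    i=$((i + 1))
done
One caveat: a chunk that was only partially written when the drive gave out should be deleted before re-running, since the script only checks that the file exists, not that it is complete.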
Duh! bcat's answer is way better than mine, but since I wrote some code I figured I'd go ahead and post it.
# Copy `length` bytes starting at `offset` from the input file into a new
# file named <input>-<offset>-<length>.
input = ARGV[0]
length = ARGV[1].to_i
offset = ARGV[2].to_i

# Binary-safe read and write so the movie data isn't mangled.
File.open "#{input}-#{offset}-#{length}", 'wb' do |file|
  file.write(File.binread(input, length, offset))
end
Use it like this:
$ ruby test.rb input_file length offset
Related
I'm trying to set up a script that will create empty .txt files with the size of 24MB in the /tmp/ directory. The idea behind this script is that Zabbix, a monitoring service, will notice that the directory is full and wipe it completely with the usage of a recovery expression.
However, I'm new to Linux and seem to be stuck on the script that generates the files. This is what I've currently written out.
today="$( date +¨%Y%m%d" )"
number=0
while test -e ¨$today$suffix.txt¨; do
(( ++number ))
suffix=¨$( printf -- %02d ¨$number¨ )
done
fname=¨$today$suffix.txt¨
printf ´Will use ¨%s¨ as filename\n´ ¨$fname¨
printf -c 24m /tmp/testf > ¨$fname¨
I suspect that what I'm doing wrong has to do with the printf command, but any input, advice, and/or pointers to a scripting guide would be very welcome.
Many thanks,
Melanchole
I guess that it doesn't matter what bytes are actually in that file, as long as it fills up the temp dir. For that reason, the right tool to create the file is dd, which is available in every Linux distribution, often installed by default.
Check the manpage for the different options, but the most important ones (put together in a sketch after this list) are:
if: the input file; /dev/zero is probably what you want, since it is just an endless stream of zero-valued bytes
of: the output file; you can keep the code you already have to generate its name
count: number of blocks to copy, just use 24 here
bs: size of each block, use 1MB for that
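Put together, the broken printf line can be replaced with something along these lines (a sketch assuming GNU dd; /tmp and the $fname variable come from the question and the script above):
dd if=/dev/zero of="/tmp/$fname" bs=1M count=24    # 24 blocks of 1 MiB = a 24 MB file of zeroes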
I am running a program (pianobar) with its output redirected to a text file, and it writes output every second. The resulting file ("pianobarout.txt") needs to be cleared regularly, or it grows to massive proportions. However, I do not want to stop pianobar to clear the file.
I have tried running > pianobarout.txt as well as echo "" > pianobarout.txt, but both cause the system's resources to spike heavily for almost 30 seconds, causing the audio from pianobar to skip. I tried removing the file, but it appears that the file is not recreated after being deleted, and I just lose the pipe.
I'm working from python, so if any library there can help, those are available to me.
Any ideas?
If you are currently redirecting with truncation, like yourprogram > file.txt, try redirecting with appending: yourprogram >> file.txt.
There is a big difference between the two when the output file is truncated.
With appending redirection, data is written to the current end of the file. If you truncate it to 0 bytes, the next write will happen at position 0.
With truncating redirection, data is written wherever the last write left off in the file. If you truncate it to 0 bytes, writes will continue at byte 1073741824 where it last left off.
This results in a sparse file if the filesystem supports it (ext2-4 and most Unix fs do), or a long wait while the file is written out if it doesn't (like fat32). A long wait could also be caused by anything following the file, such as tail -f, which has to potentially catch up by reading a GB of zeroes.
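In practice that looks something like this (a sketch using the filename from the question; truncate here is the GNU coreutils tool, which is an assumption about the system):
# start the writer with appending redirection instead of truncation
pianobar >> pianobarout.txt
# later, from another shell, empty the log without stopping pianobar;
# with O_APPEND the next write lands back at offset 0, so no sparse file and no big catch-up
truncate -s 0 pianobarout.txt    # ": > pianobarout.txt" does the same thing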
Alternatives include yourprogram | split -b 1G - output-, which will write 1 GB each to output-aa, output-ab, etc., letting you delete old files at your leisure.
I have pretty much no experience with Cygwin & UNIX but need to use it for extracting a large set of data from an even larger set of files...
I had some help yesterday writing this short script, but (after running for ~7-8 hours) the script simply wrote to the same output file 22 times. At least that's what I think happened.
I've now changed the code to this (see below) but it would be really awesome if someone who knows how this is done properly could tell me if it's likely to work before I waste another 8 hours...
for chr in {1..22}
do
zcat /cygdrive/g/data/really_long_filename$chr | sed '/^#/d' | cut -f1-3 >> db_to_rs_$chr
done
I want it to read file 1..22, remove rows starting with #, and send columns 1 to 3 to a file ending with the same number 1..22
Yesterday the last part was just ...-f1-3 >> db_to_rs, which I suspect just rewrote that file 22 times?
Help is much appreciated
~L
Yes, the code would work as expected.
When the command ended in ...-f1-3 >> db_to_rs, it essentially appended all the output to the file db_to_rs.
Saying ... >> db_to_rs_$chr would create filenames ending in {1 .. 22}.
However, note that saying >> would append the output to a file. So if db_to_rs1 already exists, the output would be appended. If you want to create a new file instead, say > instead of >>.
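For example, to get a fresh file per chromosome on each run, the loop from the question only needs the redirection changed:
for chr in {1..22}
do
    # '>' creates or truncates db_to_rs_$chr instead of appending to whatever is already there
    zcat /cygdrive/g/data/really_long_filename$chr | sed '/^#/d' | cut -f1-3 > db_to_rs_$chr
done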
I'm on a shared server with restricted disk space, and I've got a gz file that super expands into a HUGE file, more than what I've got room for. How can I extract it "portion" by "portion" (let's say 10 MB at a time), and process each portion, without extracting the whole thing even temporarily?
No, this is just ONE super huge compressed file, not a set of files please...
Hi David, your solution looks quite elegant, but if I'm reading it right, it seems like every time gunzip extracts from the beginning of the file (and the output of that is thrown away). I'm sure that'll be causing a huge strain on the shared server I'm on (I don't think it's "reading ahead" at all) - do you have any insights on how I can make gunzip "skip" the necessary number of blocks?
If you're doing this with (Unix/Linux) shell tools, you can use gunzip -c to uncompress to stdout, then use dd with the skip and count options to copy only one chunk.
For example:
gunzip -c input.gz | dd bs=10485760 skip=0 count=1 >output
then skip=1, skip=2, etc.
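If you want several consecutive chunks, the skip value can be driven by a small loop; a rough sketch (the chunk count of 5 and the output names are made up for illustration):
# extract the first 5 ten-megabyte portions into output.0 .. output.4
i=0
while [ $i -lt 5 ]; do
    # iflag=fullblock (GNU dd) guards against short reads from the pipe
    gunzip -c input.gz | dd bs=10485760 skip=$i count=1 iflag=fullblock > output.$i
    i=$((i + 1))
done
Note that each pass decompresses the stream from the beginning and throws the skipped part away, which is exactly the strain mentioned in the comment above; plain gzip data doesn't support random access, so gunzip can't jump straight to an offset in the uncompressed stream.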
Unfortunately I don't know of an existing Unix command that does exactly what you need. You could do it easily with a little program in any language, e.g. in Python, cutter.py (any language would do just as well, of course):
import sys
try:
    size = int(sys.argv[1])
    N = int(sys.argv[2])
except (IndexError, ValueError):
    print("Use: %s size N" % sys.argv[0], file=sys.stderr)
    sys.exit(2)
# stdin is a pipe here, so it can't be seeked: read and discard the first (N-1)*size bytes
to_skip = (N - 1) * size
while to_skip > 0:
    skipped = sys.stdin.buffer.read(min(to_skip, 1 << 20))
    if not skipped:
        break
    to_skip -= len(skipped)
sys.stdout.buffer.write(sys.stdin.buffer.read(size))
Now gunzip <huge.gz | python3 cutter.py 1000000 5 > fifthone will put in the file fifthone exactly a million bytes, skipping the first 4 million bytes in the uncompressed stream.
I am looking for a reliable method to replace a sequence of chars in a text file. I know that the file will always follow a specific format and that I need to replace a specific range of chars (i.e. start at char 20, replace the next 11 chars with '#').
I have found several examples using sed and awk which accomplish this on most files. However, the hangup in my case is that the range of chars in the file contains random gibberish chars, including several NULL chars. This causes those commands to stop processing the file.
I know that the simplest fix would be to go to the process that creates the file and not pad the file with NULL chars. However, the file is generated by a process buried within ancient COBOL running on a mainframe and any changes there require nearly an act of congress.
So, knowing that I am stuck with what I have, is there any way to manipulate the file, from the command line, that can successfully overwrite the NULL chars?
Thanks in advance.
GNU dd can do that:
echo '###########'|dd of=FILENAME seek=20 bs=1 count=11 conv=notrunc
Make sure the echo command provides enough characters as input.
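As a quick, hypothetical demonstration on a throwaway file (printf is used instead of echo just to sidestep the trailing newline, although count=11 caps the write either way):
# 20 A's, 11 B's, 5 C's; seek=20 leaves the first 20 bytes untouched and overwrites the next 11
printf 'AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBCCCCC\n' > sample.txt
printf '###########' | dd of=sample.txt seek=20 bs=1 count=11 conv=notrunc
cat sample.txt    # -> AAAAAAAAAAAAAAAAAAAA###########CCCCC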