Random generation of variable-sized test files - bash

Here is the script I am planning to use to generate 500 test files populated with random data.
for((counter=1;counter<=500;counter++));
do
echo Creating file$counter;
dd bs=1M count=10 if=/dev/urandom of=file$counter;
done
But what I need the script to do is make those 500 files variable in size, say between 1M and 10M; i.e., file1=1M, file2=10M, file3=9M, etc.
Any help?

This will generate 500 files each containing between 1 and 10 megabytes of random bytes.
#!/bin/bash
max=10 # number of megabytes
for ((counter=1; counter<=500; counter++))
do
echo Creating file$counter
dd bs=1M count=$(($RANDOM%max + 1)) if=/dev/urandom of=file$counter
done
The loop header could instead be:
for counter in {1..500}

MAX=10
for ((counter=1; counter<=500; counter++))
do
echo "Creating file$counter"
dd bs=$(( (RANDOM % MAX) + 1 ))M count=1 if=/dev/urandom of=file$counter
done

Try $((1+$RANDOM%$MAX))
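If whole-megabyte steps are too coarse, head -c with a random byte count gives byte-level size variation. A sketch assuming GNU coreutils (shuf); the loop bound is shortened here for illustration:

```shell
# Create test files whose sizes are uniformly random between 1 MiB and
# 10 MiB, with byte-level (not megabyte-level) granularity.
for counter in {1..5}; do
  size=$(shuf -i 1048576-10485760 -n 1)   # random size in bytes
  head -c "$size" /dev/urandom > "file$counter"
done
```

For the full run, change the brace expansion to {1..500}.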

Related

dd: reading binary file as blocks of size N returned less data than N

I need to process large binary files in segments. In concept this would be similar to split, but instead of writing each segment to a file, I need to take that segment and send it as the input of another process. I thought I could use dd to read/write the file in chunks, but the results aren't at all what I expected. For example, if I try:
dd if=some_big_file bs=1M |
while : ; do
dd bs=1M count=1 | processor
done
... the output sizes are actually 131,072 bytes and not 1,048,576.
Could anyone tell me why I'm not seeing output blocked into 1M chunks, and how I could better accomplish what I'm trying to do?
Thanks.
According to dd's manual:
bs=bytes
[...] if no data-transforming conv option is specified, input is copied to the output as soon as it's read, even if it is smaller than the block size.
So try with dd iflag=fullblock:
fullblock
Accumulate full blocks from input. The read system call may
return early if a full block is not available. When that
happens, continue calling read to fill the remainder of the
block. This flag can be used only with iflag. This flag is
useful with pipes for example as they may return short reads.
In that case, this flag is needed to ensure that a count=
argument is interpreted as a block count rather than a count
of read operations.
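The effect is easy to demonstrate: reading from a pipe, a plain dd bs=1M count=1 may copy only a short first read, while iflag=fullblock keeps reading until the block is full. A minimal sketch, assuming GNU dd:

```shell
# Push 3 MiB of zeroes through a pipe; with iflag=fullblock the single
# counted block is guaranteed to be the full 1 MiB.
head -c 3145728 /dev/zero |
  dd bs=1M count=1 iflag=fullblock 2>/dev/null |
  wc -c          # prints 1048576
```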
First of all, you don't need the first dd; a cat file | in front of the while, or a done < file redirection, would do the trick as well.
dd bs=1M count=1 might return less than 1M, see
When is dd suitable for copying data? (or, when are read() and write() partial)
Instead of dd count=… use head with the (non-POSIX) option -c ….
file=some_big_file
(( m = 1024 ** 2 ))
(( blocks = ($(stat -c %s "$file") + m - 1) / m ))
for ((i=0; i<blocks; ++i)); do
  head -c "$m" | processor
done < "$file"
Or, POSIX-conformant but very inefficient:
(( octM = 4 * 1024 * 1024 ))
someCommand | od -v -to1 -An | tr -d \\n | tr ' ' '\\' |
while IFS= read -rN $octM block; do
printf %b "$block" | processor
done

bash script: Appending bytes from an input file to a preexisting file at a given byte offset location

What I am trying to accomplish:
I have a file that I need to copy certain bytes from a certain location
and append them to a file at a given location of that file.
I am thinking something along these lines:
xxd -s $startOffset -l $numBytes inFile | dd of=fileToModify seek=$location conv=notrunc
I have this as well, but it will only work for appending at the end of a file.
read -p "Enter target file :> " targetFile
read -p "Enter source file to append at the end of target file :> " inputFile
dd if=$inputFile of=$targetFile oflag=append conv=notrunc
Thank you in advance!
contents of first file
$ cat first
fskasfdklsgdfksdjhgf sadjfsdjfhf
dsfghkasdfg sadfhsdfh hskdjfksdfgkfg
jhfksjdafhksdjfh
ksdjhfsdjfh
contents of second file
$ cat second
jfhasjdhfjskdhf dshfjsdfh3821349832749832]
87348732642364
]yfisdfhshf936494
sdfisdfsdfsa;dlf
9346934623984
contents of shell script
$ cat cppaste.sh
dd if=$1 of=$2 bs=1 count=$3 status=noxfer
dd if=$4 of=$2 bs=1 seek=$3 status=noxfer
finsize=$(stat -c%s $2)
dd if=$1 of=$2 bs=1 skip=$3 seek=$finsize oflag=append status=noxfer
executing shell script with proper arguments
$ bash cppaste.sh first third 10 second
10+0 records in
10+0 records out
107+0 records in
107+0 records out
92+0 records in
92+0 records out
contents of the resultant file
$ cat third
fskasfdklsjfhasjdhfjskdhf dshfjsdfh3821349832749832]
87348732642364
]yfisdfhshf936494
sdfisdfsdfsa;dlf
9346934623984
gdfksdjhgf sadjfsdjfhf
dsfghkasdfg sadfhsdfh hskdjfksdfgkfg
jhfksjdafhksdjfh
ksdjhfsdjfh
Try this:
# copy certain bytes from a certain location
file=$1
certainlocation=$2
certainbytes=$3
# Append them to a file at a given location of that file
givenlocation=$4
dd if=$file of=$file iflag=skip_bytes oflag=seek_bytes,append conv=notrunc skip=$certainlocation seek=$givenlocation count=1 bs=$certainbytes
Usage:
> printf "1\n2\n3\n4\n" > /tmp/1; ./1.sh /tmp/1 4 2 2; cat /tmp/1;
1+0 records in
1+0 records out
2 bytes copied, 0.000378992 s, 5.3 kB/s
1
2
3
4
3
{
  dd if=inFile iflag=count_bytes count="$targetByteLocation" status=none
  cat -- "$fileToAppend"
  dd if=inFile iflag=skip_bytes skip="$targetByteLocation" status=none
} >outFile
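The same group trick can be wrapped in a small function using only head and tail, with no dd at all. A sketch; insert_at and its argument names are illustrative, not from the question:

```shell
insert_at() {
  # $1=target file  $2=file to insert  $3=byte offset  $4=output file
  {
    head -c "$3" "$1"             # bytes before the offset
    cat "$2"                      # the inserted bytes
    tail -c "+$(($3 + 1))" "$1"   # the remainder of the original
  } > "$4"
}
```

For example, insert_at target.bin extra.bin 1024 combined.bin inserts extra.bin after the first 1024 bytes of target.bin.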

Create files of matching pattern from a single data file

I have a data file (file.dat) containing ASCII data in two columns, sorted by the first column. I want to write a script, in either shell or awk, that creates a new file for each group of similar records in that sorted file. Suppose the file consists of four records, as given below...
100.00 321342
100.00 434243
100.00 543231
100.50 743893
Hence, two files should be created here: one containing the top three records and the other containing the last record, according to the data in the first column.
File 1 contains
100.00 321342
100.00 434243
100.00 543231
File 2 contains
100.50 743893
your file
100.00 321342
100.00 434243
100.00 543231
100.50 743893
what you need
perl -a -nE 'qx( echo "$F[0] $F[1]" >> "Timestep_$F[0]" )' file
This simply creates two files, one named Timestep_100.00 and the other named Timestep_100.50, so the output is separated by the unique values of the first column. That's it.
$ cat Timestep_100.00
100.00 321342
100.00 434243
100.00 543231
and other file
$ cat Timestep_100.50
100.50 743893
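An awk one-liner would avoid spawning a shell per input line, which the perl/qx approach above does. A sketch, using file.dat from the question:

```shell
# Write each record to a file named after its first field; awk keeps
# each output file open, so repeated keys append within the same run.
awk '{ print > ("Timestep_" $1) }' file.dat
```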
This script should do the work:
#!/bin/sh
exec 0<file.dat
makeit=yes
while read stp num; do
  if [ -f "Timestep_$stp" ]; then
    echo "File Timestep_$stp exists, exiting."
    makeit=no
    break
  fi
done
if [ $makeit = yes ]; then
  exec 0<file.dat
  while read stp num; do
    echo "$stp $num" >> "Timestep_$stp"
  done
  echo "Processing done."
fi
The first loop checks that no target file already exists; otherwise the appends would produce wrong results.

While loop computed hash compare in bash?

I am trying to write a script to count the number of zero-fill sectors in a dd image file. This is what I have so far, but it is throwing an error saying it cannot open file #hashvalue#. Is there a better way to do this, or what am I missing? Thanks in advance.
count=1
zfcount=0
while read Stuff; do
count+=1
if [ $Stuff == "bf619eac0cdf3f68d496ea9344137e8b" ]; then
zfcount+=1
fi
echo $Stuff
done < "$(dd if=test.dd bs=512 2> /dev/null | md5sum | cut -d ' ' -f 1)"
echo "Total Sector Count Is: $count"
echo "Zero Fill Sector Count is: $zfcount"
Doing this in bash is going to be extremely slow -- on the order of 20 minutes for a 1GB file.
Use another language, like Python, which can do this in a few seconds (if storage can keep up):
python3 -c '
import sys
total = 0
zero = 0
with open(sys.argv[1], "rb") as f:
    while True:
        a = f.read(512)
        if not a:
            break
        total += 1
        if all(x == 0 for x in a):
            zero += 1
print("Total sectors:", total)
print("Zeroed sectors:", zero)
' yourfilehere
Your error message comes from this line:
done < "$(dd if=test.dd bs=512 2> /dev/null | md5sum | cut -d ' ' -f 1)"
What that does is read your entire test.dd, calculate the md5sum of that data, and parse out just the hash value; then, by merit of being included inside $( ... ), it substitutes that hash value in place, so you end up with that line essentially acting like this:
done < e6e8c42ec6d41563fc28e50080b73025
(except, of course, you have a different hash). So, your shell attempts to read from a file named like the hash of your test.dd image, can't find the file, and complains.
Also, it appears that you are under the assumption that dd if=test.dd bs=512 ... will feed you 512-byte blocks one at a time to iterate over. This is not the case: dd reads the file in bs-sized blocks and writes it in same-sized blocks, but it does not insert a separator or synchronize in any way with whatever is on the other side of its pipeline.
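If it has to stay in shell, the count can be done correctly (if slowly) by hashing one sector at a time; the zero-sector hash below is the md5 of 512 zero bytes, as used in the question. A sketch assuming GNU coreutils (stat -c, md5sum):

```shell
# Count 512-byte sectors of an image that are entirely zero-filled.
img=test.dd
zerohash=bf619eac0cdf3f68d496ea9344137e8b   # md5 of 512 x 0x00
total=0 zfcount=0
sectors=$(( ($(stat -c %s "$img") + 511) / 512 ))
while [ "$total" -lt "$sectors" ]; do
  h=$(dd if="$img" bs=512 skip="$total" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
  [ "$h" = "$zerohash" ] && zfcount=$((zfcount + 1))
  total=$((total + 1))
done
echo "Total Sector Count Is: $total"
echo "Zero Fill Sector Count is: $zfcount"
```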

Bash: substitute some bytes in a binary file

I have a binary file zero.bin which contains 10 bytes of 0x00, and a data file data.bin which contains 5 bytes of 0x01. I want to substitute the first 5 bytes of the zero.bin with data.bin. I have tried
dd if=data.bin of=zero.bin bs=1 count=5
but zero.bin is truncated and finally becomes 5 bytes of 0x01. I want to keep the trailing 5 bytes of 0x00.
No problem, just add conv=notrunc:
dd if=data.bin of=zero.bin bs=1 count=5 conv=notrunc
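The notrunc behaviour is easy to verify on the exact example from the question (10 zero bytes, 5 bytes of 0x01); a sketch assuming GNU dd:

```shell
# Build the two files, overwrite in place, and check the size survives.
head -c 10 /dev/zero > zero.bin
printf '\001\001\001\001\001' > data.bin
dd if=data.bin of=zero.bin bs=1 count=5 conv=notrunc 2>/dev/null
wc -c < zero.bin    # still 10: first 5 bytes 0x01, last 5 bytes 0x00
```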
You have half of the solution; do that into a temporary file tmp.bin instead of zero.bin, then
dd if=zero.bin bs=1 seek=5 skip=5 of=tmp.bin
mv zero.bin old.bin # paranoia
mv tmp.bin zero.bin
Don't get stuck on using dd(1). There are other tools, e.g.:
(cat data.bin && tail -c +6 zero.bin) > updated.bin
