How to substitute/replace bytes in a binary file with the shell

Is it possible to substitute bytes in a binary file myfile from one specific position to another with dd in a loop, or is it more convenient to use another command?
The idea is to replace a block B at position position2 with block A at position1 in a loop.
PSEUDO CODE
l = 0
while (l <= bytelength of myfile)
    copy myfile (position1 to position1+A) onto myfile (position2 to position2+B)
    position1 = position1 + steplength
    position2 = position2 + steplength
    l = l + steplength
end

The following script does approximately what (I think) you are asking for: it copies a file, replacing a block of bytes at one location with a block from another location, using dd as requested. It does create a separate output file; this is necessary to avoid any conflict regardless of whether the "input" block occurs before or after the "replacement" block. Note that it does nothing if the distance between A and B is less than the size of the block being replaced: that would mean the blocks overlap, and it's not clear whether you would want the bytes in the overlapping area to be "the end of A" or "the start of the copy of A".
Save this in a file called blockcpy.sh, and change permissions to include execute (e.g. chmod 755 blockcpy.sh). Run it with
./blockcpy.sh inputFile outputFile from to length
Note that the "from" and "to" offsets are zero-based: if you want to copy bytes starting at the very beginning of the file, the from argument is 0.
Here is the file content:
#!/bin/bash
# blockcpy file1 file2 from to length
# copy contents of file1 to file2,
# replacing the block of bytes at "to" with the block at "from";
# length of the replaced block is "length"
blockdif=$(($3 - $4))
absdif=${blockdif#-}
#echo 'block dif: ' $blockdif '; abs dif: ' $absdif
if [ "$absdif" -ge "$5" ]
then
    # copy bytes up to "to":
    dd if="$1" of="$2" bs="$4" count=1 status=noxfer 2>/dev/null
    # copy "length" bytes from "from":
    dd bs=1 if="$1" skip="$3" count="$5" status=noxfer 2>/dev/null >> "$2"
    # copy the rest of the file:
    rest=$(($(wc -c < "$1") - $4 - $5))
    skip=$(($4 + $5))
    dd bs=1 if="$1" skip="$skip" count="$rest" status=noxfer 2>/dev/null >> "$2"
    echo "file \"$2\" created successfully!"
else
    echo 'blocks A and B overlap!'
fi
The 2>/dev/null redirections are the 'nix magic that suppresses dd's status output on stderr (lines of the type "16+0 records in"), which would otherwise show up in the output.
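A quick smoke test (a sketch; the file names and byte values are made up for illustration):
$ printf 'ABCDEFGHIJ' > in.bin        # ten bytes at offsets 0-9
$ ./blockcpy.sh in.bin out.bin 0 5 3  # replace 3 bytes at offset 5 with the 3 bytes at offset 0
file "out.bin" created successfully!
$ cat out.bin                         # ABCDE + ABC + IJ
ABCDEABCIJ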

Related

delete entries at certain indices in space delimited text file

I have a .txt file, referenced by $outlier_file, with numeric indices of certain 'outlier' data points, each on its own line:
1
7
30
43
48
49
56
57
65
Using the following code, I can successfully remove certain files (volumes of neuroimaging data in this case) by using while + read.
while read outlier; do
# Remove current outlier vol from eddy unwarped DWI data
rm $DWI_path/$1/vol000*"$outlier".nii.gz;
done < $outlier_file
However, I also need to remove the numbers located at these 'outlier' indices from another text file, stored in $bvec_file, which has 69 columns and 3 rows. Within each row, the numbers are space-delimited. So e.g., for this example, I need to remove all 3 rows of columns 1, 7, 30, etc., and then save this version with the outliers removed into a new *.txt file.
0 0.9988864166 -0.0415925034 -0.06652866169 -0.6187155495 0.2291534462 0.8892356214 0.7797364286 0.1957395685 0.9236669465 -0.5400265342 -0.3845263463 -0.4903989539 0.4863306385 -0.6496130843 0.5571164636 0.8110081715 0.9032142094 -0.3234596075 -0.1551409525 -0.806059879 0.4811597826 -0.7820757748 -0.9528881463 0.1916556621 -0.007136403284 -0.2459431735 -0.7915263574 -0.1938049261 -0.1578786349 0.8688043633 -0.5546072294 -0.4019951732 0.2806154851 0.3478762022 0.9548067252 -0.9696777541 -0.4816255837 -0.7962240023 0.6818610905 0.7097978218 0.6739686799 0.1317547111 -0.7648252249 -0.1456021218 -0.5948047487 0.0934205064 0.5268769564 -0.8618324858 -0.3721029232 -0.1827616535 0.691353613 0.4159071597 0.4605505287 0.1312199424 0.426674893 -0.4068291509 0.7167859082 0.2330824665 0.01909161256 -0.06375254731 -0.5981122948 -0.2672253674 0.6875472994 0.2302943724 0 0 0 0
0 0.04258194557 0.9988207007 0.6287131425 0.7469024143 0.5528476637 0.3024964957 0.1446931241 0.9305823612 0.1675139932 0.8208211337 0.8238722992 0.5983722761 0.4238174961 0.639429196 0.1072148887 0.5551578885 0.003337599176 0.511740508 0.9516619405 0.3851404227 0.8526321065 0.1390947346 0.2030449535 0.7759459569 0.165587903 0.9523372297 0.5801228933 0.3277276562 0.7413928896 0.442482978 0.2320585706 0.1079269171 0.1868672655 0.1606136006 0.2968573235 0.1682337977 0.8745679247 0.5989061899 0.4172933119 0.01746934331 0.5641480832 0.7455469091 0.3471016571 0.8035001467 0.5870623128 0.361107261 0.8192579877 0.4160218909 0.5651330299 0.4070513153 0.7221181184 0.714223583 0.6971767133 0.4937978446 0.4232911691 0.8011701162 0.2870385494 0.9016941521 0.09688949547 0.9086826131 0.2631932421 0.152678096 0.6295753848 0.9712458578 0 0 0 0
0 -0.02031513434 -0.02504539005 -0.7747862425 0.2435730944 0.8011542666 0.343155766 -0.6091592581 -0.3093581909 -0.3446424728 -0.1860752773 -0.4163819443 -0.6336083058 0.7641081337 -0.4112580017 -0.8234841915 0.1845683194 0.4291770641 -0.7959243273 -0.2650864686 0.449371034 -0.203724703 0.6074620459 0.2253373638 -0.6009791836 -0.9861692137 0.1804598471 0.1922068008 -0.9246806119 0.6522353256 -0.2222336438 0.7990992685 -0.9092588527 -0.9414539684 0.9236803664 0.0148272357 -0.1772637652 0.05628269894 -0.08566629406 -0.6007759525 0.7041888058 0.4769729119 0.6532997034 -0.5427364139 -0.5772239915 0.5491494803 0.9278330427 0.2263117816 -0.290121617 0.7363179158 0.8949343019 -0.02399176716 0.5629439653 -0.5493977074 -0.8596191107 -0.7992328333 0.4388809483 0.6354737076 0.3641705918 0.9951120218 0.412591228 -0.75696169 0.9514620339 -0.3618197699 0.06038199928 0 0 0 0
The farthest I've gotten with one approach is using awk to index the right columns (just printing them for now), but I can only get this to work if I call $1 (i.e., the numeric index of the first outlier column):
awk -F ' ' '{print $1}' $bvec_file
If I try to refer to the value in $outlier, it doesn't work; instead, this prints the entire contents of $bvec_file:
while read outlier; do
# Remove current outlier vol from eddy unwarped DWI data
rm $DWI_path/$1/vol000*"$outlier".nii.gz;
# Remove outlier #'s from bvec file
awk -F ' ' '{print $1}' $bvec_file
done < $outlier_file
I am completely stuck on how to get this done. Any advice would be greatly appreciated.
To delete the outliers from bvec_file after the loop and only delete the ones where the associated file was successfully removed:
#!/usr/bin/env bash
tmp=$(mktemp) || exit 1
while IFS= read -r outlier; do
    # Remove current outlier vol from eddy unwarped DWI data
    rm "$DWI_path/$1"/vol000*"$outlier".nii.gz &&
        echo "$outlier"
done < "$outlier_file" |
awk '
    NR==FNR { os[$0]; next }
    {
        for (o in os) {
            $o=""
        }
        $0=$0; $1=$1
    }
1' - "$bvec_file" > "$tmp" &&
mv "$tmp" "$bvec_file"
Or to delete the outliers one at a time as the files are removed:
#!/usr/bin/env bash
tmp=$(mktemp) || exit 1
while IFS= read -r outlier; do
    # Remove current outlier vol from eddy unwarped DWI data
    rm "$DWI_path/$1"/vol000*"$outlier".nii.gz &&
        # Remove outlier #'s from bvec file
        awk -v o="$outlier" '{$o=""; $0=$0; $1=$1} 1' "$bvec_file" > "$tmp" &&
        mv "$tmp" "$bvec_file"
done < <(sort -rnu "$outlier_file")
Always quote your shell variables (see https://mywiki.wooledge.org/Quotes). The && at the end of each line ensures the next command only runs if the previous commands succeeded, and the sort -rnu in the second version processes the outlier indices from highest to lowest, so deleting one column never shifts the position of a column that is still to be deleted.
The magical incantation in the awk script does the following. Let's say your input is a b c and the outlier field is field number 2, b:
$ echo 'a b c'
a b c
$
$ echo 'a b c' | awk -v o=2 '{$o=""; print NF ":", $0}'
3: a c
$
$ echo 'a b c' | awk -v o=2 '{$o=""; $0=$0; print NF ":", $0}'
2: a c
$
$ echo 'a b c' | awk -v o=2 '{$o=""; $0=$0; $1=$1; print NF ":", $0}'
2: a c
The $o="" sets the field's value to null, the $0=$0 forces awk to resplit $0 into fields so it effectively deletes field 2 (as opposed to the previous step, which set it to null but left the field in existence), and the $1=$1 recombines $0 from its fields, replacing every FS (any contiguous run of whitespace characters, including the 2 blanks now between a and c) with OFS (a single blank char).
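The NR==FNR version in the first script applies the same trick for all saved indices at once; a minimal sketch of that behaviour (the echo and process substitution just fake the two inputs):
$ printf '1\n3\n' | awk 'NR==FNR { os[$0]; next } { for (o in os) $o=""; $0=$0; $1=$1 } 1' - <(echo 'a b c d')
b d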

Split large csv file into multiple files and keep header in each part

How do I split a large csv file (1GB) into multiple files (say, one part with 1000 rows, a 2nd part with 10000 rows, a 3rd part with 100000 rows, etc.) and preserve the header in each part?
How can I turn this:
h1 h2
a aa
b bb
c cc
.
.
12483720 rows
into
h1 h2
a aa
b bb
.
.
.
1000 rows
And
h1 h2
x xx
y yy
.
.
.
10000 rows
Another awk. First some test records:
$ seq 1 1234567 > file
Then the awk:
$ awk 'NR==1{n=1000;h=$0}{print > n}NR==n+c{n*=10;c=NR-1;print h>n}' file
Explained:
$ awk '
NR==1 { # first record:
n=1000 # set first output file size and
h=$0 # store the header
}
{
print > n # output to file
}
NR==n+c { # once target NR has been reached. close(n) goes here if needed
n*=10 # grow target magnitude
c=NR-1 # set the correction factor.
print h > n # first the head
}' file
Count the records:
$ wc -l 1000*
1000 1000
10000 10000
100000 100000
1000000 1000000
123571 10000000
1234571 total
Here is a small adaptation of the solution from: Split CSV files into smaller files but keeping the headers?
awk -v l=1000 '(NR==1){
    header=$0
    c=sprintf("%0.5d",c+1)
    file=FILENAME; sub(/csv$/,c".csv",file)
    print header > file
    next
}
(n==l) {
    c=sprintf("%0.5d",c+1)
    close(file); file=FILENAME; sub(/csv$/,c".csv",file)
    print header > file
    n=0; l*=10
}
{print $0 > file; n++}' file.csv
This works in the following way:
(NR==1){...}: If the record/line is the first line, save that line as the header, name the first output file, and write the header to it.
(n==l){...}: Every time we have written the requested number of records/lines, we need to start writing to a new file. This happens each time n==l, and we perform the following actions:
c=sprintf("%0.5d",c+1): increase the counter by one and format it as 000xx
close(file): close the file we just wrote to.
file=FILENAME; sub(/csv$/,c".csv",file): define the new filename
print header > file: open the new file and write the header to it.
n=0: reset the current record count
l*=10: increase the maximum record count for the next file
{print $0 > file; n++}: write the record to the current file and increment the record count
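A quick sanity check (a sketch; the header and row count are made up for the test):
$ { echo 'h1 h2'; seq 1 111000; } > file.csv
$ awk -v l=1000 '...' file.csv   # the script above
$ head -1 file.00002.csv
h1 h2
$ wc -l file.00001.csv
1001 file.00001.csv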
Hacky, but it utilizes the split utility, which does most of the heavy lifting of splitting the files. Then, with the split files following a well-defined naming convention, I loop over the files without the header and, for each one, spit out a file with the header concatenated with the file body to tmp.txt, then move that file back to the original filename.
# Use the `split` utility to split the csv file, with 5000 lines per file,
# adding numerical suffixes, and adding the additional suffix '.split' to
# help identify the files.
split -l 5000 -d --additional-suffix=.split repro-driver-table.csv
# This identifies all files that should NOT have headers
# ls -1 *.split | egrep -v -e 'x0+\.split'
# This identifies files that do have headers
# ls -1 *.split | egrep -e 'x0+\.split'
# Walk the files that do not have headers. For each one, cat the header from
# file with header, with rest of body, output to tmp.txt, then mv tmp.txt to
# original filename.
for f in $(ls -1 *.split | egrep -v -e 'x0+\.split'); do
cat <(head -1 $(ls -1 *.split | egrep -e 'x0+\.split')) $f > tmp.txt
mv tmp.txt $f
done
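If your split is GNU split, its --filter option can fold the header handling into the split itself, making the second pass unnecessary; a sketch under the same file name assumption as above:
# strip the header, split the body, and let the filter prepend the header
# to every chunk ($FILE is the name split gives each output file)
tail -n +2 repro-driver-table.csv |
    split -l 5000 -d --additional-suffix=.split \
        --filter='{ head -n 1 repro-driver-table.csv; cat; } > "$FILE"' -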
Here's a first approach:
#!/bin/bash
head -1 "$1" > header
tail -n +2 "$1" | split - y   # strip the header so it isn't duplicated in the first chunk
for f in y*; do
    cp header "h$f"
    cat "$f" >> "h$f"
done
rm -f header
rm -f y*
The following bash solution should work nicely:
IFS='' read -r header
for ((curr_file_max_rows=1000; 1; curr_file_max_rows*=10)) {
    curr_file_name="file_with_${curr_file_max_rows}_rows"
    echo "$header" > "$curr_file_name"
    for ((curr_file_row_count=0; curr_file_row_count < curr_file_max_rows; curr_file_row_count++)) {
        IFS='' read -r row || break 2
        echo "$row" >> "$curr_file_name"
    }
}
We have a first iteration level, which produces the number of rows we're going to write for each successive file. It generates the file names and writes the header to them. It is an infinite loop because we don't check how many lines the input has and therefore don't know beforehand how many files we're going to write, so we'll have to break out of this loop to end it.
Inside this loop we iterate a second time, this time over the number of lines we're going to write to the current file. In this loop we try to read a line from the input. If that works, we write it to the current output file; if it doesn't (we've reached the end of the input), we break out of both levels of loop.
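For example (a sketch; the script is assumed to be saved as split_growing.sh and fed the csv on stdin):
$ { echo 'h1 h2'; seq 1 2500; } > file.csv
$ bash split_growing.sh < file.csv
$ wc -l file_with_*_rows
 1001 file_with_1000_rows
 1501 file_with_10000_rows
 2502 total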

bash script: Appending bytes from an input file to a preexisting file at a given byte offset location

What I am trying to accomplish:
I have a file that I need to copy certain bytes from a certain location
and append them to a file at a given location of that file.
I am thinking something along this lines:
xxd -s $startOffset -l $numBytes inFile | dd of=fileToModify seek=$location conv=notrunc
I have this as well, but it will only work for appending at the end of a file.
read -p "Enter target file :> " targetFile
read -p "Enter source file to append at the end of target file :> " inputFile
dd if=$inputFile of=$targetFile oflag=append conv=notrunc
Thank you in advance!
contents of first file
$ cat first
fskasfdklsgdfksdjhgf sadjfsdjfhf
dsfghkasdfg sadfhsdfh hskdjfksdfgkfg
jhfksjdafhksdjfh
ksdjhfsdjfh
contents of second file
$ cat second
jfhasjdhfjskdhf dshfjsdfh3821349832749832]
87348732642364
]yfisdfhshf936494
sdfisdfsdfsa;dlf
9346934623984
contents of shell script
$ cat cppaste.sh
# usage: cppaste.sh file1 outfile offset file2
# paste file2 into file1 at byte "offset", writing the result to outfile
dd if="$1" of="$2" bs=1 count="$3" status=noxfer        # first $3 bytes of file1
dd if="$4" of="$2" bs=1 seek="$3" status=noxfer         # all of file2, at offset $3
finsize=$(stat -c%s "$2")
dd if="$1" of="$2" bs=1 skip="$3" seek="$finsize" oflag=append status=noxfer  # rest of file1
executing shell script with proper arguments
$ bash cppaste.sh first third 10 second
10+0 records in
10+0 records out
107+0 records in
107+0 records out
92+0 records in
92+0 records out
contents of the resultant file
$ cat third
fskasfdklsjfhasjdhfjskdhf dshfjsdfh3821349832749832]
87348732642364
]yfisdfhshf936494
sdfisdfsdfsa;dlf
9346934623984
gdfksdjhgf sadjfsdjfhf
dsfghkasdfg sadfhsdfh hskdjfksdfgkfg
jhfksjdafhksdjfh
ksdjhfsdjfh
Try this:
# copy certain bytes from a certain location
file=$1
certainlocation=$2
certainbytes=$3
# Append them to a file at a given location of that file
givenlocation=$4
dd if="$file" of="$file" iflag=skip_bytes oflag=seek_bytes,append conv=notrunc skip="$certainlocation" seek="$givenlocation" count=1 bs="$certainbytes"
Usage:
> printf "1\n2\n3\n4\n" > /tmp/1; ./1.sh /tmp/1 4 2 2; cat /tmp/1;
1+0 records in
1+0 records out
2 bytes copied, 0.000378992 s, 5.3 kB/s
1
2
3
4
3
An alternative builds the output from three pieces: the head of the original file up to the target offset, the file to append, then the remainder of the original:
{
dd if=inFile iflag=count_bytes count="$targetByteLocation" status=none
cat -- "$fileToAppend"
dd if=inFile iflag=skip_bytes skip="$targetByteLocation" status=none
} >outFile
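For example, to reproduce the cppaste.sh result above (a sketch; first, second and third are the sample files from that answer):
$ targetByteLocation=10 fileToAppend=second
$ { dd if=first iflag=count_bytes count="$targetByteLocation" status=none
    cat -- "$fileToAppend"
    dd if=first iflag=skip_bytes skip="$targetByteLocation" status=none
} > third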

How to write from a particular column in shell?

I need to write values to a particular column in a text file, say 10th column.
touch test.txt
echo "232" >> test.txt # I want 232 to start from 10th column of text file
How do I go about this?
printf is another alternative for providing an offset. It has the advantage of taking the amount of offset (the field width) as an argument to the format specifier. The following takes the column offset as its first argument and the data to write at that offset as its second (your defaults of 10/232 are used when no arguments are given):
#!/bin/sh
col="${1:-10}" # column offset (default: 10)
stuff="${2:-232}" # variable to write at offset
printf "%*s%s\n" "$col" "" "$stuff" # write $stuff at $col offset
exit 0
To create the offset, the printf command format specifier just says use a minimum field width of $col to write the empty-string ("") and thereafter write your data (in $stuff) followed by a newline. With the script saved as prncol.sh:
output:
$ bash prncol.sh
          232
$ bash prncol.sh 5 501
     501
$ bash prncol.sh 15 anything
               anything
Of course to write the output to test.txt, just redirect/append the output of printf to test.txt
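For example (a sketch, reusing the test.txt name from the question):
$ bash prncol.sh 10 232 >> test.txt
$ cat -A test.txt   # -A marks each line end with $, which makes the leading blanks visible
          232$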
Alternatively, pad with a loop, sending the padding to the same file:
for x in $(seq 10)
do
    echo -n ' ' >> test.txt
done
echo "232" >> test.txt

While loop computed hash compare in bash?

I am trying to write a script to count the number of zero fill sectors for a dd image file. This is what I have so far, but it is throwing an error saying it cannot open file #hashvalue#. Is there a better way to do this or what am I missing? Thanks in advance.
count=1
zfcount=0
while read Stuff; do
count+=1
if [ $Stuff == "bf619eac0cdf3f68d496ea9344137e8b" ]; then
zfcount+=1
fi
echo $Stuff
done < "$(dd if=test.dd bs=512 2> /dev/null | md5sum | cut -d ' ' -f 1)"
echo "Total Sector Count Is: $count"
echo "Zero Fill Sector Count is: $zfcount"
Doing this in bash is going to be extremely slow -- on the order of 20 minutes for a 1GB file.
Use another language, like Python, which can do this in a few seconds (if storage can keep up):
python3 -c '
import sys
total = 0
zero = 0
with open(sys.argv[1], "rb") as f:   # read the image as raw bytes
    while True:
        a = f.read(512)              # one 512-byte sector at a time
        if not a:
            break
        total += 1
        if all(x == 0 for x in a):   # in Python 3, iterating bytes yields ints
            zero += 1
print("Total sectors:", total)
print("Zeroed sectors:", zero)
' yourfilehere
Your error message comes from this line:
done < "$(dd if=test.dd bs=512 2> /dev/null | md5sum | cut -d ' ' -f 1)"
What that does is read your entire test.dd, calculate the md5sum of that data, and parse out just the hash value; then, by merit of being included inside $( ... ), it substitutes that hash value in place, so you end up with that line essentially acting like this:
done < e6e8c42ec6d41563fc28e50080b73025
(except, of course, you have a different hash). So, your shell attempts to read from a file named like the hash of your test.dd image, can't find the file, and complains.
Also, it appears that you are under the assumption that dd if=test.dd bs=512 ... will feed you 512-byte blocks one at a time to iterate over. This is not the case: dd will read the file in bs-sized blocks and write it in same-sized blocks, but it does not insert a separator or synchronize in any way with whatever is on the other side of its pipeline.
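If you want to stay in the shell anyway, you have to run dd once per sector so that each md5sum sees exactly one sector; a minimal (and, as the other answer notes, slow) sketch reusing the zero-fill hash from the question:
#!/usr/bin/env bash
zerohash="bf619eac0cdf3f68d496ea9344137e8b"      # md5 of a 512-byte zero-fill sector, per the question
emptyhash=$(printf '' | md5sum | cut -d' ' -f1)  # md5 of empty input, marks end of file
count=0
zfcount=0
while :; do
    hash=$(dd if=test.dd bs=512 skip="$count" count=1 status=none | md5sum | cut -d' ' -f1)
    [ "$hash" = "$emptyhash" ] && break          # dd read nothing, so we are past EOF
    [ "$hash" = "$zerohash" ] && zfcount=$((zfcount + 1))
    count=$((count + 1))
done
echo "Total Sector Count Is: $count"
echo "Zero Fill Sector Count is: $zfcount"
Note that a partial final sector is counted as a sector here; its hash simply never matches the zero-fill hash.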
