Check if a file's hex dump has modulo 4 length - bash

I am trying to write a script that takes a file as its argument and checks whether the length of the file's hex dump is a multiple of 4. If it is not, the script must append 00 to the end of the dump until the length is a multiple of 4.
I tried with hexdump:
D=$(hexdump filename)
if [ $((${#D}%4)) != 0 ]
then
D+=00
fi
and with od
D=$(od -t x filename)
if [ $((${#D}%4)) != 0 ]
then
D+=00
fi
but neither method works.
I think the problem is as follows:
The offset columns end up in the variable D, when only the hex digits of the dump should be counted.
For example:
cat file.txt
// Hello World!
//This is a file .txt
od -t x file.txt
//0000000 6c6c6548 6f77206f 21646c72 6968540a
//0000020 73692073 66206120 20656c69 7478742e
//0000040 0000000a
//0000041
The columns 0000000 0000020 0000040 0000041 must not be inside D.
In practice, inside D there must be 6c6c65486f77206f21646c726968540a736920736620612020656c697478742e0000000a
Also, I have to modify that dump so that its length is a multiple of 4, and I don't know whether it is enough to simply add 00 at the end.
Any idea how this can be done?

D=$(xxd -p "$1" | tr -d '\n')
while (( ${#D} % 4 != 0 )); do D+=00; done
xxd -p prints a plain hex dump, so D does not include the offset columns.

Why do you need to examine the hex dump? Just pad the raw file with zeros if necessary.
padump () {
local size=$(stat -f '%z' "$1")
local i
( cat "$1"
for ((i=size; i%2>0; i++)); do
printf '\0'
done ) |
hexdump
}
Usage: padump filename
The argument to stat is unfortunately not portable; on Linux, try stat -c '%s' "$1" (the code above was written on macOS).
The modulo in the code above is 2 because four hex digits correspond to two bytes of the file.
Anyway, I'll note that in your attempt, od already adds more padding than you bargained for, and outputs the bytes in the wrong order. The final value, at octal offset 0000040, is the single byte 0x0a; the zeros before it actually belong after it and are padding (the last line, 0000041, is just the end offset). The other hex values are similarly swapped: 0x6c is the character l, yet it appears first in the dump; read as a regular hex dump, 6c6c6548 spells lleH. You can fix this with options:
#!/bin/sh
od -t x2 --endian=big "$1" |
sed 's/^[^ ]* //;s/ //g;$d'
Because od already adds the padding, this is probably the simplest solution. The above also discards the offsets (the first column; sed 's/^[^ ]* //'), removes any remaining spaces between the hex digits (s/ //g), and deletes the final line, which contains just an offset ($d). However, the --endian=big option is specific to GNU od, so in practice it is Linux only. Alternatively, use -t x1 and add padding as above.
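The last alternative (one byte per unit with -t x1, then padding by hand) might look like the sketch below. hexpad4 is a made-up name, not something from the answers above, and the sketch assumes an od that supports -An, -v and -t x1.

```bash
#!/usr/bin/env bash
# hexpad4 (hypothetical helper): print FILE as one continuous hex string,
# padded with a trailing "00" if needed so its length is a multiple of 4.
hexpad4() {
    local hex
    # -An drops the offset column, -v disables "*" run abbreviation,
    # -t x1 prints one byte per unit, so the byte order is preserved.
    hex=$(od -An -v -t x1 "$1" | tr -d '[:space:]')
    # Every byte is two hex digits, so the length is always even and the
    # remainder modulo 4 is either 0 or 2: a single "00" is enough.
    if [ $(( ${#hex} % 4 )) -ne 0 ]; then
        hex="${hex}00"
    fi
    printf '%s\n' "$hex"
}
```

Usage would be hexpad4 file.txt; a 3-byte file containing abc should come out as 61626300.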

Related

read file line by line and sum each line individually

Im trying to make a script that creates a file say file01.txt that writes a number on each line.
001
002
...
998
999
then I want to read the file line by line and sum each line and say whether the number is even or odd.
sum each line like 0+0+1 = 1 which is odd
9+9+8 = 26 so even
001 odd
002 even
..
998 even
999 odd
I tried
while IFS=read -r line; do sum+=line >> file02.txt; done <file01.txt
but that sums the whole file not each line.
You can do this fairly easily in bash itself making use of built-in parameter expansions to trim leading zeros from the beginning of each line in order to sum the digits for odd / even.
When reading input, you can use parameter expansion with a default value to take the filename from the first argument (positional parameter) if one is given, and otherwise read from stdin, e.g.
#!/bin/bash
infile="${1:-/dev/stdin}" ## read from file provided as $1 or stdin
You then use infile with your while loop, e.g.
while read -r line; do ## loop reading each line
...
done < "$infile"
To trim the leading zeros, first obtain the substring of leading zeros trimming all digits from the right until only zeros remain, e.g.
leading="${line%%[1-9]*}" ## get leading 0's
Now, using the same kind of parameter expansion with # instead of %%, trim the leading-zeros substring from the front of line, saving the resulting number in value, e.g.
value="${line#$leading}" ## trim from front
Now zero your sum and loop over the digits in value to obtain the sum of digits:
for ((i=0;i<${#value};i++)); do ## loop summing digits
sum=$((sum + ${value:$i:1}))
done
All that remains is your even / odd test. Putting it all together in a short example script that intentionally outputs the sum of digits in addition to your wanted "odd" / "even" output, you could do:
#!/bin/bash
infile="${1:-/dev/stdin}" ## read from file provided as $1 or stdin
while read -r line; do ## read each line
[ "$line" -eq "$line" 2>/dev/null ] || continue ## validate integer
leading="${line%%[1-9]*}" ## get leading 0's
value="${line#$leading}" ## trim from front
sum=0 ## zero sum
for ((i=0;i<${#value};i++)); do ## loop summing digits
sum=$((sum + ${value:$i:1}))
done
printf "%s (sum=%d) - " "$line" "$sum" ## output line w/sum
## (temporary output)
if ((sum % 2 == 0)); then ## check odd / even
echo "even"
else
echo "odd"
fi
done < "$infile"
(note: you can actually loop over the digits in line directly and skip removing the leading zeros. The removal ensures that if the whole value is used in arithmetic, it isn't interpreted as an octal value -- up to you)
Example Use/Output
Using a quick process substitution to provide input of 001 - 020 on stdin you could do:
$ ./sumdigitsoddeven.sh < <(printf "%03d\n" {1..20})
001 (sum=1) - odd
002 (sum=2) - even
003 (sum=3) - odd
004 (sum=4) - even
005 (sum=5) - odd
006 (sum=6) - even
007 (sum=7) - odd
008 (sum=8) - even
009 (sum=9) - odd
010 (sum=1) - odd
011 (sum=2) - even
012 (sum=3) - odd
013 (sum=4) - even
014 (sum=5) - odd
015 (sum=6) - even
016 (sum=7) - odd
017 (sum=8) - even
018 (sum=9) - odd
019 (sum=10) - even
020 (sum=2) - even
You can simply remove the output of "(sum=X)" when you have confirmed it operates as you expect and redirect the output to your new file. Let me know if I understood your question properly and if you have further questions.
Would you please try the bash version:
parity=("even" "odd")
while IFS= read -r line; do
mapfile -t ary < <(fold -w1 <<< "$line")
sum=0
for i in "${ary[@]}"; do
(( sum += i ))
done
echo "$line" "${parity[sum % 2]}"
done < file01.txt > file02.txt
fold -w1 <<< "$line" breaks the string $line into one character (one digit) per line.
mapfile reads those lines into the array ary, one element per digit.
Please note that this bash script is slow and not suitable for large inputs.
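To try the fold-based splitting on a single value, something like the following sketch may help. digit_parity is a hypothetical name, and the sketch assumes the argument contains only decimal digits.

```bash
#!/usr/bin/env bash
# digit_parity (hypothetical helper): print "even" or "odd" for the sum
# of the decimal digits of its argument, e.g. 998 -> 9+9+8 = 26 -> even.
digit_parity() {
    local sum=0 d
    # fold -w1 emits one character per line; the unquoted command
    # substitution then splits on whitespace, one digit per iteration.
    for d in $(printf '%s' "$1" | fold -w1); do
        sum=$(( sum + d ))
    done
    if [ $(( sum % 2 )) -eq 0 ]; then
        echo even
    else
        echo odd
    fi
}
```

Because each digit is added on its own, leading zeros such as in 008 never reach the arithmetic as a multi-digit octal number.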
With GNU awk:
awk -vFS='' '{sum=0; for(i=1;i<=NF;i++) sum+=$i;
print $0, sum%2 ? "odd" : "even"}' file01.txt
The FS awk variable defines the field separator. If it is set to the empty string (this is what the -vFS='' option does) then each character is a separate field.
The rest is trivial: the block between curly braces is executed for each line of the input. It computes the sum of the fields with a for loop (NF is another awk variable; its value is the number of fields of the current record). It then prints the original line ($0) followed by the string even if the sum is even, else odd.
pure awk:
BEGIN {
for (i=1; i<=999; i++) {
printf ("%03d\n", i) > ARGV[1]
}
close(ARGV[1])
ARGC = 2
FS = ""
result[0] = "even"
result[1] = "odd"
}
{
printf("%s: %s\n", $0, result[($1+$2+$3) % 2])
}
Processing a file line by line, and doing math, is a perfect task for awk.
pure bash:
set -e
printf '%03d\n' {1..999} > "${1:?no path provided}"
result=(even odd)
mapfile -t num_list < "$1"
for i in "${num_list[@]}"; do
echo "$i: ${result[(${i:0:1} + ${i:1:1} + ${i:2:1}) % 2]}"
done
A similar method can be applied in bash, but it's slower.
comparison:
bash is about 10x slower.
$ cd ./tmp.Kb5ug7tQTi
$ bash -c 'time awk -f ../solution.awk numlist-awk > result-awk'
real 0m0.108s
user 0m0.102s
sys 0m0.000s
$ bash -c 'time bash ../solution.bash numlist-bash > result-bash'
real 0m0.931s
user 0m0.929s
sys 0m0.000s
$ diff --report-identical result*
Files result-awk and result-bash are identical
$ diff --report-identical numlist*
Files numlist-awk and numlist-bash are identical
$ head -n 5 *
==> numlist-awk <==
001
002
003
004
005
==> numlist-bash <==
001
002
003
004
005
==> result-awk <==
001: odd
002: even
003: odd
004: even
005: odd
==> result-bash <==
001: odd
002: even
003: odd
004: even
005: odd
read is a bottleneck in a while IFS= read -r line loop. More info in this answer.
mapfile (combined with for loop) can be slightly faster, but still slow (it also copies all the data to an array first).
Both solutions create a number list in a new file (which was in the question), and print the odd/even results to stdout. The path for the file is given as a single argument.
In awk, you can set the field separator to empty (FS="") to process individual characters.
In bash it can be done with substring expansion (${var:index:length}).
Modulo 2 (number % 2) to get odd or even.

dd: reading binary file as blocks of size N returned less data than N

I need to process large binary files in segments. In concept this would be similar to split, but instead of writing each segment to a file, I need to take that segment and send it as the input of another process. I thought I could use dd to read/write the file in chunks, but the results aren't at all what I expected. For example, if I try:
dd if=some_big_file bs=1M |
while : ; do
dd bs=1M count=1 | processor
done
... the output sizes are actually 131,072 bytes and not 1,048,576.
Could anyone tell me why I'm not seeing output blocked to 1M chunks, and how I could better accomplish what I'm trying to do?
thanks.
According to dd's manual:
bs=bytes
[...] if no data-transforming conv option is specified, input is copied to the output as soon as it's read, even if it is smaller than the block size.
So try with dd iflag=fullblock:
fullblock
Accumulate full blocks from input. The read system call may
return early if a full block is not available. When that
happens, continue calling read to fill the remainder of the
block. This flag can be used only with iflag. This flag is
useful with pipes for example as they may return short reads.
In that case, this flag is needed to ensure that a count=
argument is interpreted as a block count rather than a count
of read operations.
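As a rough sketch of how the flag could be used for the chunking in the question: process_chunks below is a hypothetical helper, it assumes GNU dd (iflag=fullblock is not in POSIX), and it buffers each chunk in a temporary file so that an empty read can be detected and the loop can stop.

```bash
#!/usr/bin/env bash
# process_chunks SIZE CMD [ARGS...] (hypothetical helper):
# read stdin in SIZE-byte chunks and run CMD once per chunk,
# with the chunk on its standard input.
process_chunks() {
    local size=$1 tmp
    shift
    tmp=$(mktemp) || return 1
    while :; do
        # iflag=fullblock (GNU dd) keeps reading until SIZE bytes or EOF,
        # so a short read from a pipe does not end the chunk early.
        dd bs="$size" count=1 iflag=fullblock 2>/dev/null > "$tmp"
        # An empty chunk means the input is exhausted.
        [ -s "$tmp" ] || break
        "$@" < "$tmp"
    done
    rm -f "$tmp"
}
```

With the names from the question, this would be called as process_chunks 1M processor < some_big_file. Only the last chunk should be shorter than the requested size.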
First of all, you don't need the first dd. cat file | while ... or while ...; done < file would do the trick as well.
dd bs=1M count=1 might return less than 1M; see
When is dd suitable for copying data? (or, when are read() and write() partial)
Instead of dd count=… use head with the (non-POSIX) option -c ….
file=some_big_file
(( m = 1024 ** 2 ))
(( blocks = ($(stat -c %s "$file") + m - 1) / m ))
for ((i=0; i<blocks; ++i)); do
head -c "$m" | processor
done < "$file"
Or, using only POSIX utilities for the conversion, but very inefficient (note that (( )) and read -N are still bash features):
(( octM = 4 * 1024 * 1024 ))
someCommand | od -v -to1 -An | tr -d \\n | tr ' ' '\\' |
while IFS= read -rN $octM block; do
printf %b "$block" | processor
done

How to calculate crc32 checksum from a string on linux bash

I used crc32 to calculate checksums from strings a long time ago, but I cannot remember how I did it.
echo -n "LongString" | crc32 # no output
I found a solution [1] to calculate them with Python, but is there not a direct way to calculate that from a string?
# signed
python -c 'import binascii; print binascii.crc32("LongString")'
python -c 'import zlib; print zlib.crc32("LongString")'
# unsigned
python -c 'import binascii; print binascii.crc32("LongString") % (1<<32)'
python -c 'import zlib; print zlib.crc32("LongString") % (1<<32)'
[1] How to calculate CRC32 with Python to match online results?
I came up against this problem myself and I didn't want to go to the "hassle" of installing crc32. I came up with this, and although it's a little nasty it should work on most platforms, or most modern linux anyway ...
echo -n "LongString" | gzip -1 -c | tail -c8 | hexdump -n4 -e '"%u"'
Just to provide some technical details: gzip stores the CRC-32 of the uncompressed data in the first 4 of its last 8 bytes, the -c option makes it write to standard output, and tail -c8 extracts those last 8 bytes. (-1 as suggested by @MarkAdler so we don't waste time actually doing the compression.)
hexdump was a little trickier and I had to futz about with it for a while before I came up with something satisfactory, but the format here seems to correctly parse the gzip crc32 as a single 32-bit number:
-n4 takes only the relevant first 4 bytes of the gzip footer.
'"%u"' is your standard fprintf format string that formats the bytes as a single unsigned 32-bit integer. Notice that there are double quotes nested within single quotes here.
If you want a hexadecimal checksum you can change the format string to '"%08x"' (or '"%08X"' for upper case hex) which will format the checksum as 8 character (0 padded) hexadecimal.
Like I say, not the most elegant solution, and perhaps not an approach you'd want to use in a performance-sensitive scenario but an approach that might appeal given the near universality of the commands used.
The weak point here for cross-platform usability is probably the hexdump configuration, since I have seen variations on it from platform to platform and it's a bit fiddly. I'd suggest if you're using this you should try some test values and compare with the results of an online tool.
EDIT As suggested by @PedroGimeno in the comments, you can pipe the output into od instead of hexdump for identical results without the fiddly options: ... | od -t x4 -N 4 -A n for hex, or ... | od -t d4 -N 4 -A n for decimal (d4 prints a signed value; use u4 for an unsigned one).
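Wrapped up as a function, the gzip approach could look roughly like this. crc32_hex is a made-up name. The sketch reads the four CRC bytes one at a time and reverses them itself, relying on the gzip format storing the value least significant byte first, so the result should not depend on the machine's byte order.

```bash
#!/usr/bin/env bash
# crc32_hex (hypothetical helper): print the CRC-32 of a string as
# 8 lowercase hex digits, using the trailer that gzip appends.
crc32_hex() {
    # The last 8 bytes of a gzip stream are the CRC-32 followed by the
    # input size, both little-endian. od -t x1 prints the first four as
    # separate bytes; the unquoted substitution splits them into $1..$4.
    set -- $(printf '%s' "$1" | gzip -c | tail -c 8 | od -An -N4 -t x1)
    # Reverse the byte order to get the usual most-significant-first form.
    printf '%s%s%s%s\n' "$4" "$3" "$2" "$1"
}
```

For the common check value, crc32_hex 123456789 should print cbf43926.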
Or just use the process substitution:
crc32 <(echo -n "LongString")
Your question already has most of the answer.
echo -n 123456789 | python3 -c 'import sys;import zlib;print(zlib.crc32(sys.stdin.buffer.read())%(1<<32))'
correctly gives 3421780262
I prefer hex:
echo -n 123456789 | python3 -c 'import sys;import zlib;print("%08x"%(zlib.crc32(sys.stdin.buffer.read())%(1<<32)))'
cbf43926
Be aware that there are several CRC-32 algorithms:
http://reveng.sourceforge.net/crc-catalogue/all.htm#crc.cat-bits.32
On Ubuntu, at least, /usr/bin/crc32 is a short Perl script, and you can see quite clearly from its source that all it can do is open files. It has no facility to read from stdin -- it doesn't have special handling for - as a filename, or a -c parameter or anything like that.
So your easiest approach is to live with it, and make a temporary file.
tmpfile=$(mktemp)
echo -n "LongString" > "$tmpfile"
crc32 "$tmpfile"
rm -f "$tmpfile"
If you really don't want to write a file (e.g. it's more data than your filesystem can take -- unlikely if it's really a "long string", but for the sake of argument...) you could use a named pipe. To a simple non-random-access reader this is indistinguishable from a file:
fifo=$(mktemp -u)
mkfifo "$fifo"
echo -n "LongString" > "$fifo" &
crc32 "$fifo"
rm -f "$fifo"
Note the & to background the process which writes to fifo, because it will block until the next command reads it.
To be more fastidious about temporary file creation, see: https://unix.stackexchange.com/questions/181937/how-create-a-temporary-file-in-shell-script
Alternatively, use what's in the script as an example from which to write your own Perl one-liner (the presence of crc32 on your system indicates that Perl and the necessary module are installed), or use the Python one-liner you've already found.
Here is a pure Bash implementation:
#!/usr/bin/env bash
declare -i -a CRC32_LOOKUP_TABLE
__generate_crc_lookup_table() {
local -i -r LSB_CRC32_POLY=0xEDB88320 # The CRC32 polynomial, LSB order
local -i index byte lsb
for index in {0..255}; do
((byte = 255 - index))
for _ in {0..7}; do # 8-bit lsb shift
((lsb = byte & 0x01, byte = ((byte >> 1) & 0x7FFFFFFF) ^ (lsb == 0 ? LSB_CRC32_POLY : 0)))
done
((CRC32_LOOKUP_TABLE[index] = byte))
done
}
__generate_crc_lookup_table
typeset -r CRC32_LOOKUP_TABLE
crc32_string() {
[[ ${#} -eq 1 ]] || return
local -i i byte crc=0xFFFFFFFF index
for ((i = 0; i < ${#1}; i++)); do
byte=$(printf '%d' "'${1:i:1}") # Get byte value of character at i
((index = (crc ^ byte) & 0xFF, crc = (CRC32_LOOKUP_TABLE[index] ^ (crc >> 8)) & 0xFFFFFFFF))
done
echo $((crc ^ 0xFFFFFFFF))
}
printf 'The CRC32 of: %s\nis: %08x\n' "${1}" "$(crc32_string "${1}")"
# crc32_string "The quick brown fox jumps over the lazy dog"
# yields 414fa339
Testing:
bash ./crc32.sh "The quick brown fox jumps over the lazy dog"
The CRC32 of: The quick brown fox jumps over the lazy dog
is: 414fa339
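I use cksum and convert to hex using the shell builtin printf. Note that cksum computes the POSIX CRC, whose result differs from the zlib/gzip CRC-32 used in the other answers:
$ printf '%X\n' "$(echo -n "LongString" | cksum | cut -d' ' -f1)"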
5751BDB2
The cksum command first appeared on 4.4BSD UNIX and should be present in all modern systems.
You can try to use rhash.
http://rhash.sourceforge.net/
https://github.com/rhash/RHash
http://manpages.ubuntu.com/manpages/bionic/man1/rhash.1.html
Testing:
## install 'rhash'...
$ sudo apt-get install rhash
## test CRC32...
$ echo -n 123456789 | rhash --simple -
cbf43926 (stdin)

Length of string in bash

How do you get the length of a string stored in a variable and assign that to another variable?
myvar="some string"
echo ${#myvar}
# 11
How do you set another variable to the output 11?
To get the length of a string stored in a variable, say:
myvar="some string"
size=${#myvar}
To confirm it was properly saved, echo it:
$ echo "$size"
11
Edit 2023-02-13: Use of printf %n instead of locales...
UTF-8 string length
In addition to fedorqui's correct answer, I would like to show the difference between string length and byte length:
myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
will render:
Généralités is 11 char len, but 14 bytes len.
you could even have a look at stored chars:
myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"
will answer:
Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s').
Note: Following Isabell Cowan's comment, I've added saving and restoring $LC_ALL along with $LANG.
Same, but without having to play with locales
I recently learned about the %n format of the printf command (builtin):
myvar='Généralités'
chrlen=${#myvar}
printf -v _ %s%n "$myvar" bytlen
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
Généralités is 11 char len, but 14 bytes len.
The syntax is a little counter-intuitive, but this is very efficient! (The strU8DiffLen function further down is about twice as fast using printf as the previous version using local LANG=C.)
Length of an argument, working sample
Arguments work the same as regular variables:
showStrLen() {
local -i chrlen=${#1} bytlen
printf -v _ %s%n "$1" bytlen
printf "String '%s' is %d bytes, but %d chars len: %q.\n" "$1" $bytlen $chrlen "$1"
}
will work as
showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'
Useful printf correction tool:
If you run:
for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
printf " - %-14s is %2d char length\n" "'$string'" ${#string}
done
- 'Généralités' is 11 char length
- 'Language' is 8 char length
- 'Théorème' is 8 char length
- 'Février' is 7 char length
- 'Left: ←' is 7 char length
- 'Yin Yang ☯' is 10 char length
Not really pretty output: printf pads by bytes, so strings with multi-byte characters are misaligned.
To fix this, here is a little function:
strU8DiffLen() {
local -i bytlen
printf -v _ %s%n "$1" bytlen
return $(( bytlen - ${#1} ))
}
or written in one line:
strU8DiffLen() { local -i _bl;printf -v _ %s%n "$1" _bl;return $((_bl-${#1}));}
Then now:
for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
strU8DiffLen "$string"
printf " - %-$((14+$?))s is %2d chars length, but uses %2d bytes\n" \
"'$string'" ${#string} $((${#string}+$?))
done
- 'Généralités' is 11 chars length, but uses 14 bytes
- 'Language' is 8 chars length, but uses 8 bytes
- 'Théorème' is 8 chars length, but uses 10 bytes
- 'Février' is 7 chars length, but uses 8 bytes
- 'Left: ←' is 7 chars length, but uses 9 bytes
- 'Yin Yang ☯' is 10 chars length, but uses 12 bytes
Unfortunately, this is not perfect!
Some strange UTF-8 behaviour remains, such as double-width characters, zero-width characters, reverse displacement and others, which cannot be handled this simply...
Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.
I wanted the simplest case; in the end, this is the result:
echo -n 'Tell me the length of this sentence.' | wc -m;
36
You can use:
MYSTRING="abc123"
MYLENGTH=$(printf "%s" "$MYSTRING" | wc -c)
wc -c or wc --bytes gives the byte count: a non-ASCII Unicode character counts as 2, 3 or more bytes.
wc -m or wc --chars gives the character count: each Unicode character counts once, however many bytes it uses.
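A small sketch of the two counts side by side follows. byte_len and char_len are hypothetical names, and the character count assumes the current locale matches the string's encoding (e.g. a UTF-8 locale for UTF-8 text).

```bash
#!/usr/bin/env bash
# byte_len / char_len (hypothetical helpers): length of a string in
# bytes and in characters, using wc.
byte_len() {
    # printf '%s' avoids the trailing newline that echo would add;
    # tr strips the padding some wc implementations print.
    printf '%s' "$1" | wc -c | tr -d '[:space:]'
    echo
}

char_len() {
    # wc -m counts characters according to the current locale.
    printf '%s' "$1" | wc -m | tr -d '[:space:]'
    echo
}
```

For plain ASCII such as abc123 both functions print 6; for UTF-8 text with accented letters, byte_len reports more than char_len does in a UTF-8 locale.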
In response to the post starting:
If you want to use this with command line or function arguments...
with the code:
size=${#1}
There might be the case where you just want to check for a zero length argument and have no need to store a variable. I believe you can use this sort of syntax:
if [ -z "$1" ]; then
#zero length argument
else
#non-zero length
fi
See GNU and wooledge for a more complete list of Bash conditional expressions.
If you want to use this with command line or function arguments, make sure you use size=${#1} instead of size=${#$1}. The second one may be more instinctual but is incorrect syntax.
Using your example provided
#KISS (Keep it simple stupid)
size=${#myvar}
echo $size
Here are a couple of ways to calculate the length of a variable:
echo ${#VAR}
echo -n "$VAR" | wc -m
echo -n "$VAR" | wc -c
printf '%s' "$VAR" | wc -m
expr length "$VAR"
expr "$VAR" : '.*'
To store the result in another variable, wrap any of the above commands in a command substitution:
otherVar=$(echo -n "$VAR" | wc -m)
echo $otherVar
http://techopsbook.blogspot.in/2017/09/how-to-find-length-of-string-variable.html
I know this Q&A is quite old, but today I faced this task for the first time. Usually I used the ${#var} form, but it fails with Unicode: most of the text I process with bash is in Cyrillic...
Based on @atesin's answer, I made a short (and easily shortened further) function which may be useful for scripting. This was the task that led me to this question: showing a message of variable length in a pseudo-graphics box. So, here it is:
$ cat draw_border.sh
#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
local BPAR="$1"
local BPLEN=`echo "$BPAR"|wc -m`
local OUTLINE=\|\ "$1"\ \|
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(($BPLEN+1))</dev/zero|tr '\0' '-'`\+
echo "$OUTBORDER"
echo "$OUTLINE"
echo "$OUTBORDER"
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"
And what this sample produces:
$ draw_border.sh
+-------------+
| Généralités |
+-------------+
+----------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+----------------------------------+
+--------------+
| pure ENGLISH |
+--------------+
The first example (in French?) was taken from someone's example above.
The second one combines Cyrillic and the value of a variable. The third one is self-explanatory: only characters from the first half of the ASCII table.
I used echo $BPAR|wc -m instead of printf ... so as not to rely on whether printf is built in.
Above I saw discussion of the trailing newline and the -n parameter for echo. I did not use it, so I add only one to $BPLEN. Had I used -n, I would have to add 2.
To illustrate the difference between wc -m and wc -c, here is the output of the same script with one minor change: -m replaced with -c
$ draw_border.sh
+----------------+
| Généralités |
+----------------+
+---------------------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+---------------------------------------------+
+--------------+
| pure ENGLISH |
+--------------+
Accented Latin characters and most Cyrillic characters take two bytes, so the horizontal lines are drawn longer than the real length of the message.
Hope, it will save some one some time :-)
p.s. Russian text says "here is one more"
p.p.s. Working "two-liner"
#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(( $(echo "$1"|wc -m) +1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER"\n"\|\ "$1"\ \|"\n"$OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"
To avoid cluttering the code with repeated OUTBORDER drawing, I build OUTBORDER in a separate command.
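As a possible variation on the function above that avoids head and /dev/zero, the border can also be built with printf padding. This is only a sketch: it reuses the name border for illustration and assumes that ${#1} counts characters, which in bash holds only when the locale matches the text's encoding.

```bash
#!/usr/bin/env bash
# border (illustrative variant): draw a box around a one-line message.
border() {
    local len=${#1} line
    # printf '%*s' prints an empty string padded to the given width,
    # i.e. that many spaces; tr then turns each space into a dash.
    line=$(printf '%*s' "$(( len + 2 ))" '' | tr ' ' '-')
    printf '+%s+\n| %s |\n+%s+\n' "$line" "$1" "$line"
}
```

The width is the message length plus 2, for the space on each side of the text.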
Maybe just use wc -c to count the number of characters (strictly speaking it counts bytes, which is the same thing only for ASCII text):
myvar="Hello, I am a string."
echo -n "$myvar" | wc -c
Result:
21
Length of string in bash
str="Welcome to Stackoveflow"
length=`expr length "$str"`
echo "Length of '$str' is $length"
OUTPUT
Length of 'Welcome to Stackoveflow' is 23

How to use "cmp" to compare two binaries and find all the byte offsets where they differ?

I would love some help with a Bash script loop that will show all the differences between two binary files, using just
cmp file1 file2
It only shows the first change. I would like to use cmp because it gives an offset and a line number for each change, but if you think there's a better command I'm open to it :) Thanks
I think cmp -l file1 file2 might do what you want. From the manpage:
-l --verbose
Output byte numbers and values of all differing bytes.
The output is a table of the offset, the byte value in file1 and the value in file2 for all differing bytes. It looks like this:
4531 66 63
4532 63 65
4533 64 67
4580 72 40
4581 40 55
[...]
So the first difference is at offset 4531 (decimal, counting from 1), where file1's byte value is octal 66 and file2's is octal 63.
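If hexadecimal values and zero-based offsets are easier to work with, the cmp -l table can be converted with a short loop. cmp_hex is a hypothetical name, and the sketch assumes the three-column output format shown above.

```bash
#!/usr/bin/env bash
# cmp_hex (hypothetical helper): list all differing bytes of two files
# as "0xOFFSET BYTE1 BYTE2", with a zero-based offset and hex values.
cmp_hex() {
    cmp -l "$1" "$2" | while read -r off a b; do
        # cmp -l prints a 1-based decimal offset and octal byte values;
        # a leading 0 makes the shell read the values as octal.
        printf '0x%08x %02x %02x\n' "$(( off - 1 ))" "$(( 0$a ))" "$(( 0$b ))"
    done
}
```

If one file is shorter than the other, cmp additionally reports the end of file on stderr; the loop only sees the table of differences.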
Method that works for single byte addition/deletion
diff <(od -An -tx1 -w1 -v file1) \
<(od -An -tx1 -w1 -v file2)
Generate a test case with a single removal of byte 64:
for i in `seq 128`; do printf "%02x" "$i"; done | xxd -r -p > file1
for i in `seq 128`; do if [ "$i" -ne 64 ]; then printf "%02x" $i; fi; done | xxd -r -p > file2
Output:
64d63
< 40
If you also want to see the ASCII version of the character:
bdiff() (
f() (
od -An -tx1c -w1 -v "$1" | paste -d '' - -
)
diff <(f "$1") <(f "$2")
)
bdiff file1 file2
Output:
64d63
< 40 @
Tested on Ubuntu 16.04.
I prefer od over xxd because:
it is POSIX, xxd is not (comes with Vim)
has the -An to remove the address column without awk.
Command explanation:
-An removes the address column. This is important otherwise all lines would differ after a byte addition / removal.
-w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, this is not POSIX, but present in GNU.
-tx1 is the representation you want, change to any possible value, as long as you keep 1 byte per line.
-v prevents asterisk repetition abbreviation * which might interfere with the diff
paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: Concatenating every other line with the next
we use parentheses () to define bdiff instead of {} to limit the scope of the inner function f, see also: How to define a function inside another function in Bash?
See also:
https://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux
https://unix.stackexchange.com/questions/59849/diff-binary-files-of-different-sizes
The more efficient workaround I've found is to translate binary files to some form of text using od.
Then any flavour of diff works fine.

Resources