Replace range of numbers with certain number - bash

I need to replace a range of numbers with a certain number. I tried hard to code it myself with sed (like sed "s/[33-64]/64/") or awk, but I always get wrong results. It tends to replace single digits instead of whole numbers. What I need is: replace 0-32 -> 32, 33-64 -> 64, 65-128 -> 128, 129-255 -> 255. In between these numbers are IPs, which should stay untouched. I think this command selects everything but the IPs:
sed '/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/! ... '
So I have a file like this:
65.74.16.161
232
10.128.8.72
63
10.128.14.13
100
10.128.8.58
32
10.128.4.129
60
10.128.240.18
59
and it should look like this:
65.74.16.161
255
10.128.8.72
64
10.128.14.13
128
10.128.8.58
32
10.128.4.129
64
10.128.240.18
64

The [33-64] defines a character class and is a funny way of writing [3-6] and does indeed only match a single character — any single digit from 3, 4, 5 or 6. If you really want to do it with sed, and you're concerned with values from 33 to 64, then you have to write it out differently — and much more verbosely.
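A quick way to see this, as a small sketch (the function name demo_char_class is made up for illustration): the substitution only ever touches the first single digit between 3 and 6 on each line, never a whole number.

```shell
# [33-64] is a bracket expression: the characters 3, 3-6 and 4, i.e. [3-6].
demo_char_class() {
    printf '%s\n' "$@" | sed 's/[33-64]/X/'
}

demo_char_class 232 63 100
# 2X2
# X3
# 100
```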
In part it depends on which version of sed you have. A solution that will work with classic sed is:
sed -e 's/^[0-9]$/32/' \
-e 's/^[12][0-9]$/32/' \
-e 's/^3[012]$/32/' \
-e 's/^3[3-9]$/64/' \
-e 's/^[45][0-9]$/64/' \
-e 's/^6[0-4]$/64/' \
-e 's/^6[5-9]$/128/' \
-e 's/^[7-9][0-9]$/128/' \
-e 's/^1[01][0-9]$/128/' \
-e 's/^12[0-8]$/128/' \
-e 's/^129$/255/' \
-e 's/^1[3-9][0-9]$/255/' \
-e 's/^2[0-4][0-9]$/255/' \
-e 's/^25[0-5]$/255/'
But, as you can see, it is quite painful. If you have GNU sed, you can use the -r option to enable extended regular expressions; if you have Mac OS X or BSD sed, you can use the -E option to enable extended regular expressions. Then you can reduce the code above to:
sed -E \
-e 's/^([0-9]|[12][0-9]|3[012])$/32/' \
-e 's/^(3[3-9]|[45][0-9]|6[0-4])$/64/' \
-e 's/^(6[5-9]|[7-9][0-9]|1[01][0-9]|12[0-8])$/128/' \
-e 's/^(129|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$/255/'
However, you might do better using awk:
awk '/^[0-9][0-9]*$/ { if ($1 <= 32) print 32
else if ($1 <= 64) print 64
else if ($1 <= 128) print 128
else if ($1 <= 255) print 255
else print $1
next
}
{ print }'
The final else clause accurately prints any unexpected values, such as 256 or 999 or, indeed, 123456789. There are those who would write 1 in place of { print } — the part of the awk script that matches and prints the IP addresses.
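If the limits are likely to change, the same idea can be driven by a list instead of a chain of else-if branches. This is only a sketch: the function name bucketize and the default limit list are assumptions taken from the question, not part of the answer above.

```shell
# Hypothetical helper: replace each numeric-only line with the first limit
# that is greater than or equal to it. Limits default to the question's.
bucketize() {
    awk -v limits="${1:-32 64 128 255}" '
        BEGIN { n = split(limits, lim, " ") }
        /^[0-9]+$/ {
            for (i = 1; i <= n; i++)
                if ($1 + 0 <= lim[i] + 0) { print lim[i]; next }
        }
        { print }   # IP addresses and numbers above every limit pass through
    '
}

printf '%s\n' 65.74.16.161 232 10.128.8.72 63 | bucketize
# 65.74.16.161
# 255
# 10.128.8.72
# 64
```

A different set of limits can be passed as a single argument, for example bucketize "16 32 64".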

You can use this awk with some arithmetic:
awk '$1 == ($1+0) && $1<=255{$1 = ($1>128)?255:($1>64?128:32 * int(($1+31)/32))} 1' file
65.74.16.161
255
10.128.8.72
64
10.128.14.13
128
10.128.8.58
32
10.128.4.129
64
10.128.240.18
64
$1 == ($1+0) is a check that $1 is a number, so the dotted IP lines are left alone.
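To see what that test does on its own, here is a small sketch (classify_lines is a made-up name). A field that looks like a number is compared numerically with itself plus zero, while a dotted address is compared as a string against "65.74" and fails.

```shell
# Label each line according to the $1 == ($1+0) test.
classify_lines() {
    awk '{
        kind = ($1 == ($1 + 0)) ? "number" : "other"
        print $1, kind
    }'
}

printf '%s\n' 232 65.74.16.161 | classify_lines
# 232 number
# 65.74.16.161 other
```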

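Using awk and logarithms (note that this rounds to the nearest power of two on a log scale rather than rounding up, so it matches the sample data but maps, for example, 40 to 32 instead of 64):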
awk -vFS=. ' NF == 1 { v=2^int((log($1)/log(2))+0.5); $1 = v>255?255:v; }1' input
Gives:
65.74.16.161
255
10.128.8.72
64
10.128.14.13
128
10.128.8.58
32
10.128.4.129
64
10.128.240.18
64

Related

Is there a command for substituting a set of characters by a set of strings?

I would like to substitute a set of single-byte characters with a set of literal strings in a stream, without any constraint on the line size.
#!/bin/bash
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'
The expected output would be:
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>...
I can think of a bash function that would do that, something like:
chars_to_strings() {
local delim buffer
while true
do
delim=''
IFS='' read -r -d '.' -n 4096 buffer && (( ${#buffer} != 4096 )) && delim='.'
if [[ -n "${delim:+_}" ]] || [[ -n "${buffer:+_}" ]]
then
# Do the replacements in "$buffer"
# ...
printf "%s%s" "$buffer" "$delim"
else
break
fi
done
}
But I'm looking for a more efficient way, any thoughts?
Since you seem to be okay with using ANSI C quoting via $'...' strings, then maybe use sed?
sed $'s/\a/<bell>/g; s/\b/<backspace>/g; s/\t/<horizontal-tab>/g; s/\v/<vertical-tab>/g'
Or, via separate commands:
sed -e $'s/\a/<bell>/g' \
-e $'s/\b/<backspace>/g' \
-e $'s/\t/<horizontal-tab>/g' \
-e $'s/\v/<vertical-tab>/g'
Or, using awk, which replaces newline characters too (by customizing the Output Record Separator, i.e., the ORS variable):
$ printf '\a,\b,\t,\v\n' | awk -vORS='<newline>' '
{
gsub(/\a/, "<bell>")
gsub(/\b/, "<backspace>")
gsub(/\t/, "<horizontal-tab>")
gsub(/\v/, "<vertical-tab>")
print $0
}
'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><newline>
For a simple one-liner with reasonable portability, try Perl.
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
perl -pe 's/\a/<bell>/g;
s/\b/<backspace>/g;s/\t/<horizontal-tab>/g;s/\v/<vertical-tab>/g'
Perl internally does some intelligent optimizations so it's not encumbered by lines which are longer than its input buffer or whatever.
Perl by itself is not POSIX, of course; but it can be expected to be installed on any even remotely modern platform (short of perhaps embedded systems etc).
Assuming the overall objective is to provide the ability to process a stream of data in real time without having to wait for an EOL/end-of-buffer occurrence to trigger processing ...
A few items:
continue to use the while/read -n loop to read a chunk of data from the incoming stream and store in buffer variable
push the conversion code into something that's better suited to string manipulation (ie, something other than bash); for sake of discussion we'll choose awk
within the while/read -n loop printf "%s\n" "${buffer}" and pipe the output from the while loop into awk; NOTE: the key item is to introduce an explicit \n into the stream so as to trigger awk processing for each new 'line' of input; OP can decide if this additional \n must be distinguished from a \n occurring in the original stream of data
awk then parses each line of input as per the replacement logic, making sure to append anything leftover to the front of the next line of input (ie, for when the while/read -n breaks an item in the 'middle')
General idea:
chars_to_strings() {
while read -r -n 15 buffer # using '15' for demo purposes otherwise replace with '4096' or whatever OP wants
do
printf "%s\n" "${buffer}"
done | awk '{print NR,FNR,length($0)}' # replace 'print ...' with OP's replacement logic
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1 # add some delay to data being streamed to chars_to_strings()
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
A variation on this idea using a named pipe:
mkfifo /tmp/pipeX
sleep infinity > /tmp/pipeX & # keep pipe open so awk does not exit; must run in the background or the shell blocks here
awk '{print NR,FNR,length($0)}' < /tmp/pipeX &
chars_to_strings() {
while read -r -n 15 buffer
do
printf "%s\n" "${buffer}"
done > /tmp/pipeX
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
# kill background 'awk' and/or 'sleep infinity' when no longer needed
Don't waste FS/OFS: use the built-in variables to take 2 out of the 5 needed:
echo $' \t abc xyz \t \a \n\n ' |
mawk 'gsub(/\7/, "<bell>", $!(NF = NF)) + gsub(/\10/,"<bs>") +\
gsub(/\11/,"<h-tab>")^_' OFS='<v-tab>' FS='\13' ORS='<newline>'
<h-tab> abc xyz <h-tab> <bell> <newline><newline> <newline>
To have NO constraint on the line length you could do something like this with GNU awk:
awk -v RS='.{1,100}' -v ORS= '{
$0 = RT
gsub(foo,bar)
print
}'
That will read and process the input 100 chars at a time no matter which chars are present, whether it has newlines or not, and even if the input was one multi-terabyte line.
Replace gsub(foo,bar) with whatever substitution(s) you have in mind, e.g.:
$ printf '\a,\b,\t,\v' |
awk -v RS='.{1,100}' -v ORS= '{
$0 = RT
gsub(/\a/,"<bell>")
gsub(/\b/,"<backspace>")
gsub(/\t/,"<horizontal-tab>")
gsub(/\v/,"<vertical-tab>")
print
}'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab>
and of course it'd be trivial to pass a list of old and new strings to awk rather than hardcoding them, you'd just have to sanitize any regexp or backreference metachars before calling gsub().
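One way to sidestep the sanitizing step is to avoid regexps altogether and look each character up in a table. The following is a rough sketch under stated assumptions, not the approach above: it uses portable awk, reads ordinary newline-terminated records (so it does not address the unbounded-line requirement and always ends the output with a newline), and the calling convention simply mirrors the question's chars_to_strings.

```shell
# Sketch only: chars_to_strings CHARS STRING...
# The i-th character of CHARS is replaced by the i-th STRING using a plain
# array lookup, so nothing needs escaping for regexps or backreferences.
chars_to_strings() {
    local chars=$1
    shift
    CHARS=$chars LC_ALL=C awk '
        BEGIN {
            chars = ENVIRON["CHARS"]
            for (i = 1; i < ARGC; i++)
                map[substr(chars, i, 1)] = ARGV[i]
            ARGC = 1    # keep awk from treating the strings as input files
        }
        {
            out = ""
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = substr($0, j, 1)
                out = out ((c in map) ? map[c] : c)
            }
            print out
        }
    ' "$@"
}

printf '\a,\b,\t,\v\n' |
    chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'
# <bell>,<backspace>,<horizontal-tab>,<vertical-tab>
```

The character list travels through the environment rather than -v so that awk does not interpret backslashes in it.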

How to generate a NUL-delimited stream of timestamped filenames with BSD `stat` command

Let's suppose that you need to generate a NUL-delimited stream of timestamped filenames.
On Linux & Solaris I can do it with:
stat --printf '%.9Y %n\0' -- *
On BSD, I can get the same info, but delimited by newlines, with:
stat -f '%.9Fm %N' -- *
The man page mentions a few escape sequences, but the NUL byte doesn't seem to be supported:
If the % is immediately followed by one of n, t, %, or #, then a newline character, a tab character, a percent character, or the current file number is printed.
Is there a way to work around that? edit: (accurately and efficiently?)
Update:
Sorry, the glob * is misleading. The arguments can contain any path.
I have a working solution that forks a stat call for each path. I want to improve it because of the massive number of files to process.
You may try this work-around solution if running stat command for files:
stat -nf "%.9Fm %N/" * | tr / '\0'
Here:
-n: To suppress newlines in stat output
Added / as terminator for each entry from stat output
tr / '\0': To convert / into NUL byte
Another work-around is to use a control character in stat and use tr to replace it with \0 like this:
stat -nf "%.9Fm %N"$'\1' * | tr '\1' '\0'
This will work with directories also.
Unfortunately, stat out of the box does not offer this option, and so what you ask is not directly achievable.
However, you can easily implement the required functionality in a scripting language like Perl or Python.
#!/usr/bin/env python3
from pathlib import Path
from sys import argv
for arg in argv[1:]:
    print(
        Path(arg).stat().st_mtime,
        arg, end="\0")
Demo: https://ideone.com/vXiSPY
The demo exhibits a small discrepancy in the mtime which does not seem to be a rounding error, but the result could be different on MacOS (the demo platform is Debian Linux, apparently). If you want to force the result to a particular number of decimal places, Python has formatting facilities similar to those of stat and printf.
With any command that can't produce NUL-terminated (or any other character/string terminated) output, you can just wrap it in a function that calls the command and then printfs its output with a terminating NUL instead of a newline, for example:
nulstat() {
local fmt=$1 file
shift
for file in "$@"; do
printf '%s\0' "$(stat -f "$fmt" "$file")"
done
}
nulstat '%.9Fm %N' *
For example:
$ > foo
$ > $'foo\nbar'
$ nulstat '%.9Fm %N' foo* | od -c
0000000 1 6 6 3 1 6 2 5 3 6 . 4 7 7 9 8
0000020 0 1 4 0 f o o \0 1 6 6 3 1 6 2
0000040 5 3 9 . 3 8 8 0 6 9 9 3 0 f o
0000060 o \n b a r \0
0000066
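On the consuming side, a NUL-delimited stream like this can be read back in bash with read -d ''. A minimal sketch, assuming the "mtime name" record layout used above (the function name read_records is made up):

```shell
# Hypothetical consumer for NUL-terminated "mtime name" records.
read_records() {
    local rec mtime name
    while IFS= read -r -d '' rec; do
        mtime=${rec%% *}    # text before the first space
        name=${rec#* }      # everything after it, embedded newlines included
        printf 'mtime=%s name=%q\n' "$mtime" "$name"
    done
}

printf '%s\0' '1.5 foo' $'2.5 foo\nbar' | read_records
# mtime=1.5 name=foo
# mtime=2.5 name=$'foo\nbar'
```

The %q format is used only to make the embedded newline visible in the output.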
1. What you can do (accurate but slow):
Fork a stat command for each input path:
for p in "$@"
do
stat -nf '%.9Fm' -- "$p" &&
printf '\t%s\0' "$p"
done
2. What you can do (accurate but twisted):
In the input paths, replace each occurrence of (possibly overlapping) /././ with a single /./, make stat output /././\n at the end of each record, and use awk to substitute each /././\n by a NUL byte:
#!/bin/bash
shopt -s extglob
stat -nf '%.9Fm%t%N/././%n' -- "${@//\/.\/+(.\/)//./}" |
awk -F '/\\./\\./' '{
if ( NF == 2 ) {
printf "%s%c", record $1, 0
record = ""
} else
record = record $1 "\n"
}'
N.B. If you wonder why I chose /././\n as record separator then take a look at Is it "safe" to replace each occurrence of (possibly overlapped) /./ with / in a path?
3. What you should do (accurate & fast):
You can use the following perl one‑liner on almost every UNIX/Linux:
LANG=C perl -MTime::HiRes=stat -e '
foreach (@ARGV) {
my @st = stat($_);
if ( @st > 0 ) {
printf "%.9f\t%s\0", $st[9], $_;
} else {
printf STDERR "stat: %s: %s\n", $_, $!;
}
}
' -- "$@"
note: for perl < 5.8.9, remove the -MTime::HiRes=stat from the command line.
ASIDE: There's a bug in BSD's stat:
When %N is at the end of the format string and the filename ends with a newline character, then its trailing newline might get stripped:
For example:
stat -f '%N' -- $'file1\n' file2
file1
file2
For getting the output that one would expect from stat -f '%N' you can use the -n switch and add an explicit %n at the end of the format string:
stat -nf '%N%n' -- $'file1\n' file2
file1

file2
Is there a way to work around that?
If all you need is to replace every newline with a NUL byte, then the following tr should suffice:
stat -f '%.9Fm %N' * | tr '\n' '\000'
Explanation: \000 is the NUL byte expressed as an octal value.

How can a "grep | sed | awk" script merging line pairs be more cleanly implemented?

I have a little script to extract specific data and clean up the output a little. It seems overly messy and I'm wondering if the script can be trimmed down a bit.
The input file consists of pairs of lines -- names, followed by numbers.
Line pairs where the numeric value is not between 80 and 199 should be discarded.
Pairs may sometimes, but will not always, be preceded or followed by blank lines, which should be ignored.
Example input file:
al12t5682-heapmemusage-latest.log
38
al12t5683-heapmemusage-latest.log
88
al12t5684-heapmemusage-latest.log
100
al12t5685-heapmemusage-latest.log
0
al12t5686-heapmemusage-latest.log
91
Example/wanted output:
al12t5683 88
al12t5684 100
al12t5686 91
Current script:
grep --no-group-separator -PxB1 '([8,9][0-9]|[1][0-9][0-9])' inputfile.txt \
| sed 's/-heapmemusage-latest.log//' \
| awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'
Extra input example
al14672-heapmemusage-latest.log
38
al14671-heapmemusage-latest.log
5
g4t5534-heapmemusage-latest.log
100
al1t0000-heapmemusage-latest.log
0
al1t5535-heapmemusage-latest.log
al1t4676-heapmemusage-latest.log
127
al1t4674-heapmemusage-latest.log
53
A1t5540-heapmemusage-latest.log
54
G4t9981-heapmemusage-latest.log
45
al1c4678-heapmemusage-latest.log
81
B4t8830-heapmemusage-latest.log
76
a1t0091-heapmemusage-latest.log
88
al1t4684-heapmemusage-latest.log
91
Extra Example expected output:
g4t5534 100
al1t4676 127
al1c4678 81
a1t0091 88
al1t4684 91
Another awk:
$ awk -F- 'NR%2{p=$1; next} 80<=$1 && $1<=199 {print p,$1}' file
al12t5683 88
al12t5684 100
al12t5686 91
UPDATE
for the empty line record delimiter
$ awk -v RS= '80<=$2 && $2<=199{sub(/-.*/,"",$1); print}' file
al12t5683 88
al12t5684 100
al12t5686 91
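The two awk versions above assume strictly alternating lines, or a blank line between every pair. As a hedged sketch for the case where blank lines are only sometimes present and a name occasionally has no number (as in the extra input example), one can key on whether a line is purely numeric. The function name filter_pairs is an assumption for illustration.

```shell
# Sketch: a numeric line closes the current pair; any other non-blank line
# (re)starts one, so a name with no number is simply overwritten.
filter_pairs() {
    awk '
        /^[0-9]+$/ {
            if (name != "" && $1 >= 80 && $1 <= 199) print name, $1
            name = ""
            next
        }
        NF { name = $1; sub(/-.*/, "", name) }
    ' "$@"
}

printf '%s\n' a1-x.log 38 '' b2-x.log 88 c3-x.log d4-x.log 127 | filter_pairs
# b2 88
# d4 127
```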
Consider implementing this in native bash, as in the following (which can be seen running with your sample input -- including sporadically-present blank lines -- at http://ideone.com/Qtfmrr):
#!/bin/bash
name=; number=
while IFS= read -r line; do
[[ $line ]] || continue # skip blank lines
[[ -z $name ]] && { name=$line; continue; } # first non-blank line becomes name
number=$line # second one becomes number
if (( number >= 80 && number < 200 )); then
name=${name%%-*} # prune everything after first "-"
printf '%s %s\n' "$name" "$number" # emit our output
fi
name=; number= # clear the variables
done <inputfile.txt
The above uses no external commands whatsoever -- so whereas it might be slower to run over large input than a well-implemented awk or perl script, it also has far shorter startup time since no interpreter other than the already-running shell is required.
See:
BashFAQ #1 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?, describing the while read idiom.
BashFAQ #100 - How do I do string manipulations in bash?; or The Bash-Hackers' Wiki on parameter expansion, describing how name=${name%%-*} works.
The Bash-Hackers' Wiki on arithmetic expressions, describing the (( ... )) syntax used for numeric comparisons.
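With perl (the number line read with <> must be chomped explicitly, since -l only chomps the line read by -n):
perl -nle 's/-.*//; chomp($n = <>); print "$_ $n" if 80<=$n && $n<=199' inputfile.txt
With GNU sed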
sed -E '
N
/\n[8-9][0-9]$/bA
/\n1[0-9]{2}$/!d
:A
s/([^-]*).*\n([0-9]+$)/\1 \2/
' infile

Prevent bc from auto truncating leading zeros when converting from hex to binary

I'm trying to convert a hex string to binary. I'm using:
echo "ibase=16; obase=2; $line" | BC_LINE_LENGTH=9999 bc
It is truncating the leading zeroes. That is, if the hex string is 4F, it is converted to 1001111 and if it is 0F, it is converted to 1111. I need it to be 01001111 and 00001111
What can I do?
The output from bc is correct; it simply isn't what you had in mind (but it is what the designers of bc had in mind). If you converted hex 4F to decimal, you would not expect to get 079 out of it, would you? Why should you get leading zeroes if the output base is binary? Short answer: you shouldn't, so bc doesn't emit them.
If you must make the binary output a multiple of 8 bits, you can add an appropriate number of leading zeroes using some other tool, such as awk:
awk '{ len = (8 - length % 8) % 8; printf "%.*s%s\n", len, "00000000", $0}'
Pure Bash solution (beside bc):
paddy()
{
how_many_bits=$1
read number
zeros=$(( $how_many_bits - ${#number} ))
for ((i=0;i<$zeros;i++)); do
echo -en 0
done && echo $number
}
Usage:
>bc <<< "obase=2;ibase=16; 20" | paddy 8
00100000
You can pipe to awk like this:
echo "ibase=16; obase=2; $line" | BC_LINE_LENGTH=9999 bc | awk '{ printf "%08d\n", $0 }'
You can do it in python:
line=4F
python3 -c "print(''.join(bin(int(i, 16))[2:].zfill(4) for i in '$line'))"
result:
01001111
What's frustrating is that bc expects the input to be zero-padded but doesn't provide a similar output option. Here's another alternative using sed:
sed 's_0_0000_g; s_1_0001_g; s_2_0010_g; s_3_0011_g;
s_4_0100_g; s_5_0101_g; s_6_0110_g; s_7_0111_g;
s_8_1000_g; s_9_1001_g; s_[aA]_1010_g; s_[bB]_1011_g;
s_[cC]_1100_g; s_[dD]_1101_g; s_[eE]_1110_g; s_[fF]_1111_g;'
You can use seq and sed to help you pad:
function paddington(){
PADDLE=8; while read IN; do
seq -f '0' -s '' 1 $PADDLE | \
sed "s/0\{${#IN}\}\$/$IN/"
done
}
bc <<< "ibase=16; obase=2; 4F; 1E; 0F" | paddington
The output:
01001111
00011110
00001111
You can use printf to left-pad the result with zeros if the result's length is not a multiple of four:
hex_nr=2C8B; hex_len=${#hex_nr}; binary_nr=$(bc <<< "obase=2;ibase=16;$hex_nr"); \
bin_length=$(( hex_len * 4 )); printf "%0${bin_length}d\n" $binary_nr
will result in
0010110010001011
instead of bc's output of
10110010001011
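One caveat with the printf approach: %d makes the shell parse the binary digits as a decimal integer, so long values can exceed what printf can represent. A minimal string-only sketch (the function name pad_to_hex_width is made up here) avoids the conversion by prepending zeros until the width is four bits per hex digit:

```shell
# Pad the binary string $2 to 4 bits per hex digit of $1, as plain text.
pad_to_hex_width() {
    local hex=$1 bin=$2 width
    width=$(( ${#hex} * 4 ))
    while [ "${#bin}" -lt "$width" ]; do
        bin=0$bin
    done
    printf '%s\n' "$bin"
}

pad_to_hex_width 2C8B 10110010001011
# 0010110010001011
```

The second argument would normally be bc's output for the hex value given as the first.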

How to use "cmp" to compare two binaries and find all the byte offsets where they differ?

I would love some help with a Bash script loop that will show all the differences between two binary files, using just
cmp file1 file2
It only shows the first change. I would like to use cmp because it gives an offset and a line number for where each change is, but if you think there's a better command I'm open to it :) Thanks.
I think cmp -l file1 file2 might do what you want. From the manpage:
-l --verbose
Output byte numbers and values of all differing bytes.
The output is a table of the offset, the byte value in file1 and the value in file2 for all differing bytes. It looks like this:
4531 66 63
4532 63 65
4533 64 67
4580 72 40
4581 40 55
[...]
So the first difference is at offset 4531 (decimal, counting from 1), where file1's octal byte value is 66 and file2's is 63.
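If hexadecimal is easier to read than the decimal offsets and octal values that cmp -l prints, the output can be post-processed. This is a sketch in portable awk; the name cmp_hex and the zero-based offset convention are choices made here, not something cmp provides.

```shell
# Print zero-based hex offsets and hex byte values for every differing byte.
cmp_hex() {
    cmp -l "$1" "$2" | awk '
        # Convert a string of octal digits to a number.
        function oct(s,   i, n) {
            n = 0
            for (i = 1; i <= length(s); i++)
                n = n * 8 + substr(s, i, 1)
            return n
        }
        { printf "%08x  %02x  %02x\n", $1 - 1, oct($2), oct($3) }
    '
}

tmp=$(mktemp -d)
printf 'abc' > "$tmp/file1"
printf 'abd' > "$tmp/file2"
cmp_hex "$tmp/file1" "$tmp/file2"
# 00000002  63  64
rm -r "$tmp"
```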
Method that works for single byte addition/deletion
diff <(od -An -tx1 -w1 -v file1) \
<(od -An -tx1 -w1 -v file2)
Generate a test case with a single removal of byte 64:
for i in `seq 128`; do printf "%02x" "$i"; done | xxd -r -p > file1
for i in `seq 128`; do if [ "$i" -ne 64 ]; then printf "%02x" $i; fi; done | xxd -r -p > file2
Output:
64d63
< 40
If you also want to see the ASCII version of the character:
bdiff() (
f() (
od -An -tx1c -w1 -v "$1" | paste -d '' - -
)
diff <(f "$1") <(f "$2")
)
bdiff file1 file2
Output:
64d63
< 40 @
Tested on Ubuntu 16.04.
I prefer od over xxd because:
it is POSIX, xxd is not (comes with Vim)
has the -An to remove the address column without awk.
Command explanation:
-An removes the address column. This is important otherwise all lines would differ after a byte addition / removal.
-w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, this is not POSIX, but present in GNU.
-tx1 is the representation you want, change to any possible value, as long as you keep 1 byte per line.
-v prevents asterisk repetition abbreviation * which might interfere with the diff
paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: Concatenating every other line with the next
We use parentheses () to define bdiff instead of {} to limit the scope of the inner function f, see also: How to define a function inside another function in Bash?
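Since -w1 is a GNU extension, one possible workaround where it is missing is to let od use its default width and split the bytes afterwards. A sketch, assuming only POSIX od, tr and sed options (the names bytes_per_line and bdiff_portable are made up; the process substitution still needs bash):

```shell
# One hex byte per line without relying on od -w1.
bytes_per_line() {
    od -A n -v -t x1 "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

bdiff_portable() {
    diff <(bytes_per_line "$1") <(bytes_per_line "$2")
}

tmp=$(mktemp -d)
printf 'abc' > "$tmp/file1"
printf 'ac' > "$tmp/file2"
bdiff_portable "$tmp/file1" "$tmp/file2"
# 2d1
# < 62
rm -r "$tmp"
```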
See also:
https://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux
https://unix.stackexchange.com/questions/59849/diff-binary-files-of-different-sizes
The more efficient workaround I've found is to translate binary files to some form of text using od.
Then any flavour of diff works fine.
