How to increment the last number in a string - bash

I have a string that looks something like
/foo/bar/baz59_ 5stuff.thing
I would like to increase the last number (5 in the example) by one, if it's greater than another variable. A 9 would be increased to 10. Note that the last number could be multiple digits, and that "stuff.thing" could be anything other than a number, so it can't be hard-coded.
The above example would result in /foo/bar/baz59_ 6stuff.thing
I've found multiple questions (and answers) that would extract the last number from the string, and obviously that could then be used in a comparison.
The issue I'm having is how to ensure that when I do the replace, I only replace the last number (since obviously I can't just replace "5" for "6"). Can anyone make any suggestions?
awk/sed/bash/grep are all viable.

Updated Answer
Thanks to @EdMorton for pointing out the further requirement that the number must exceed a threshold. That can be done like this, using Perl's -s option to populate $thresh from the -thresh=500 switch after the --:
perl -spe 's/(\d+)(?!.*\d+)/$1>$thresh? $1+1 : $1/e' <<< "abc123_456.txt" -- -thresh=500
Original Answer
You can evaluate/calculate a replacement with /e in Perl regexes. Here I just add 1 to the captured string of digits but you can do more complicated stuff:
perl -pe 's/(\d+)(?!.*\d+)/$1+1/e' <<< "abc123_456.txt"
abc123_457.txt
The (?!.*\d+) is (hopefully) a negative look-ahead for any more digits.
The $1 represents any sequence of digits captured in the capture group (\d+).
Note that this would need modification to handle decimal numbers, negative numbers and scientific notation - but that is possible.
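For instance, a variant that also copes with negative and decimal numbers might look like this (a sketch, still not handling scientific notation):
perl -pe 's/(-?\d+(?:\.\d+)?)(?!.*\d)/$1+1/e' <<< "abc123_-4.5stuff"
abc123_-3.5stuff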

Using bash regular expression matching:
$ f="/foo/bar/baz59_ 99stuff.thing"
$ [[ $f =~ ([0-9]+)([^0-9]+)$ ]]
OK, what do we have now?
$ declare -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]="99stuff.thing" [1]="99" [2]="stuff.thing")
So we can construct the new filename
if [[ $f =~ ([0-9]+)([^0-9]+)$ ]]; then
    prefix=${f%"${BASH_REMATCH[0]}"}        # remove "99stuff.thing" from $f
    number=$(( 10#${BASH_REMATCH[1]} + 1 )) # use "10#" to force base 10
    new=${prefix}${number}${BASH_REMATCH[2]}
    echo "$new"
fi
# => /foo/bar/baz59_ 100stuff.thing
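The question also asks for the increment only when the number exceeds another variable; that folds into the same test (a sketch; thresh is an assumed name for that variable):
thresh=50
if [[ $f =~ ([0-9]+)([^0-9]+)$ ]] && (( 10#${BASH_REMATCH[1]} > thresh )); then
    prefix=${f%"${BASH_REMATCH[0]}"}
    echo "${prefix}$(( 10#${BASH_REMATCH[1]} + 1 ))${BASH_REMATCH[2]}"
fi
# => /foo/bar/baz59_ 100stuff.thing (99 > 50, so it is incremented)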

With GNU awk for the 3rd arg to match():
$ awk -v t=3 'match($0,/(.*)([0-9]+)([^0-9]*)$/,a) && a[2]>t{a[2]++; $0=a[1] a[2] a[3]} 1' file
/foo/bar/baz59_ 6stuff.thing
Just set t to whatever your threshold value is for incrementing, e.g.:
$ awk -v t=7 'match($0,/(.*)([0-9]+)([^0-9]*)$/,a) && a[2]>t{a[2]++; $0=a[1] a[2] a[3]} 1' file
/foo/bar/baz59_ 5stuff.thing

"if it's greater than a script argument."
If I understand correctly (I am assuming you are passing an argument to the script and, if its value is greater than the number in the string's 2nd field, that number should be increased by 1), could you please try the following.
cat script.ksh
value=$1
echo "/foo/bar/baz59_ 5stuff.thing" |
awk -v arg="$value" '
match($2,/[0-9]+/){
  val=substr($2,RSTART,RLENGTH)
  val=val<arg?val+1:val
  $2=val substr($2,RSTART+RLENGTH)
}
1'
Here is an example: when I run script.ksh it gives the following output.
./script.ksh 7
/foo/bar/baz59_ 6stuff.thing

Here is a shorter gnu awk approach:
cat incr.awk
{
  n = split($0, a, /[0-9]+/, b)
  for(i=1; i<n; i++)
    s = s a[i] b[i] + (b[i] < max && i == n-1 ? 1 : 0)
  print s a[i]
}
Then use it as:
awk -v max=80 -f incr.awk <<< '/foo/bar/baz59_ 5stuff.thing'
/foo/bar/baz59_ 6stuff.thing
awk -v max=80 -f incr.awk <<< '/foo/bar/baz59_ 79stuff.thing'
/foo/bar/baz59_ 80stuff.thing
awk -v max=80 -f incr.awk <<< '/foo/bar/baz59_ 90stuff.thing'
/foo/bar/baz59_ 90stuff.thing
awk -v max=80 -f incr.awk <<< '/foo/bar/baz59_ 80stuff.thing'
/foo/bar/baz59_ 80stuff.thing
awk -v max=80 -f incr.awk <<< '/foo/bar/stuff.thing'
/foo/bar/stuff.thing

An awk:
$ echo /foo/bar/baz59_ 99stuff.thing |
awk '
/[0-9]/ {
    rstart=1                                   # keep track of the start
    while(match(substr($0,rstart),/[0-9]+/)) { # while numbers last
        rstart+=RSTART+RLENGTH-1               # increase rstart
        rlength=RLENGTH                        # remember length too
    }
    v=substr($0,rstart-rlength,rlength)+1      # increase last number
    print substr($0,1,rstart-rlength-1) v substr($0,rstart)  # print in parts
    next
}1'                                            # in case there was no number
/foo/bar/baz59_ 100stuff.thing
Edit:
Whoops, I missed the argument requirement ("increase the last number by one, if it's greater than a script argument"):
$ echo /foo/bar/baz59_ 99stuff.thing |
awk -v arg=100 '
/[0-9]/ {
    rstart=1
    while(match(substr($0,rstart),/[0-9]+/)) {
        rstart+=RSTART+RLENGTH-1
        rlength=RLENGTH
    }
    v=substr($0,rstart-rlength,rlength)
    if(0+v>arg) {                              # test if v is greater than the argument
        print substr($0,1,rstart-rlength-1) v+1 substr($0,rstart)
        next
    }
}1'
Output now:
/foo/bar/baz59_ 99stuff.thing

If the threshold number is in the 'bound' variable:
perl -pe 'BEGIN{$bound=6} s{(\d+)_(\d+)(?!.*\d+)}{ $i=$2+1;($i>$bound? $1+1:$1)."_".$i}e' <<<"/foo/bar/baz59_5stuff.thing"
/foo/bar/baz59_6stuff.thing

Related

Computing the size of array in text file in bash

I have a text file that sometimes (not always) will have an array with a unique name, like this:
unique_array=(1,2,3,4,5,6)
I would like to find the size of the array (6 in the above example) when it exists, and skip it or return -1 if it doesn't exist.
grepping the file will tell me if the array exists but not how to find its size.
The array can fill multiple lines like
unique_array=(1,2,3,
4,5,6,
7,8,9,10)
Some of the elements in the array can be negative as in
unique_array=(1,2,-3,
4,5,6,
7,8,-9,10)
awk -v RS=\) -F, '/unique_array=\(/ {print /[0-9]/?NF:0}' file.txt
-v RS=\) - delimit records by ) instead of newlines
-F, - delimit fields by , instead of whitespace
/unique_array=\(/ - look for a record containing the unique identifier
/[0-9]/?NF:0 - if the record contains a digit, print the number of fields (i.e. commas + 1), otherwise 0
There is a bad bug in the code above: commas preceding the array may be erroneously counted. A fix is to truncate the prefix:
awk -v RS=\) -F, 'sub(/.*unique_array=\(/,"") {print /[0-9]/?NF:0}' file.txt
Your specifications are woefully incomplete, but guessing a bit as to what you are actually looking for, try this at least as a starting point.
awk '/^unique_array=\(/ { in_array = 1; sub(/^unique_array=\(/, ""); n = gsub(/,/, ","); next }
     in_array && /\)/ { sub(/\).*/, ""); quit = 1 }
     in_array { n += gsub(/,/, ",");
         if (quit) { print n + 1; in_array = quit = n = 0 } }' file
We keep a state variable in_array which tells us whether we are currently in the region that contains the array. It is set to 1 when we see the beginning of the array, and back to 0 when we see the closing parenthesis. At that point, we remove the closing parenthesis and everything after it, and set a second variable quit to trigger the finishing logic in the next condition. The last condition performs two tasks: it adds this line's commas to the running count in n (gsub returns the number of replacements, so it works as a comma counter), and then checks whether quit is true; if it is, we are at the end of the array, and print the number of elements (the comma count plus one).
This will simply print nothing if the array was not found. You could embellish the script to set a different exit code or print -1 if you like, but these details seem like unnecessary complications for a simple script.
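If you do want the -1 behaviour, one way is a found flag plus an END block (a sketch layered on the script above; found is my addition):
awk '/^unique_array=\(/ { in_array = found = 1; sub(/^unique_array=\(/, ""); n = gsub(/,/, ","); next }
     in_array && /\)/ { sub(/\).*/, ""); quit = 1 }
     in_array { n += gsub(/,/, ",");
         if (quit) { print n + 1; in_array = quit = n = 0 } }
     END { if (!found) print -1 }' file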
I think what you probably want is this, using GNU awk for multi-char RS and RT and word boundaries:
$ awk -v RS='\\<unique_array=[(][^)]*[)]' 'RT{exit} END{print (RT ? gsub(/,/,"",RT)+1 : -1)}' file
With your shown samples, please try the following awk.
awk -v RS= '
{
  while(match($0,/\<unique_array=[(][^)]*\)/)){
    line=substr($0,RSTART,RLENGTH)
    gsub(/[[:space:]]*\n[[:space:]]*|(^|\n)unique_array=\(|(\)$|\)\n)/,"",line)
    print gsub(/,/,"&",line)+1
    $0=substr($0,RSTART+RLENGTH)
  }
}
' Input_file
Using sed and declare -a. The test file is like this:
$ cat f
saa
dfsaf
sdgdsag unique_array=(1,2,3,
4,5,6,
7,8,9,10) sdfgadfg
sdgs
sdgs
sfsaf(sdg)
Testing:
$ declare -a "$(sed -n '/unique_array=(/,/)/s/,/ /gp' f | \
sed 's/.*\(unique_array\)/\1/;s/).*/)/;
s/`.*`//g')"
$ echo ${unique_array[@]}
1 2 3 4 5 6 7 8 9 10
And then you can do whatever you want with ${unique_array[@]}
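In particular, the size the question asks for is then just the array length expansion:
$ echo ${#unique_array[@]}
10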
With GNU grep or similar that support -z and -o options:
grep -zo 'unique_array=([^)]*)' file.txt | tr -dc =, | wc -c
-z - (effectively) treat file as a single line
-o - only output the match
tr -dc =, - strip everything except = and ,
wc -c - count the result
Note: both one- and zero-element arrays will be treated as being size 1. Will return 0 rather than -1 if not found.
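If the -1 convention matters, a tiny wrapper can translate the zero (a sketch; count is my variable name):
count=$(grep -zo 'unique_array=([^)]*)' file.txt | tr -dc =, | wc -c)
(( count == 0 )) && count=-1
echo "$count"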
Here's an awk solution that works with gawk, mawk 1/2, and nawk:
TEST INPUT
saa
dfsaf
sdgdsag unique_array=(1,2,3,
4,5,6,
7,8,9,10) sdfgadfg
sdgs
sdgs
sfsaf(sdg)
CODE
{m,n,g}awk '
BEGIN { __ = "-1:_ERR_NOT_FOUND_"
RS = "^$" (_ = OFS = "")
FS = "(^|[ \t-\r]?)unique[_]array[=][(]"
___ = "[)].*$|[^0-9,.+-]"
} $!NF = NR < NF ? $(gsub(___,_)*_) : __'
OUTPUT
1,2,3,4,5,6,7,8,9,10

Replace hex char by a different random one (awk possible?)

I have a MAC address and I need to replace only one hex char (one at a very specific position) with a different random one (it must be different from the original). I have it done this way using xxd and it works:
#!/bin/bash
mac="00:00:00:00:00:00" #This is a PoC mac address obviously :)
different_mac_digit=$(xxd -p -u -l 100 < /dev/urandom | sed "s/${mac:10:1}//g" | head -c 1)
changed_mac=${mac::10}${different_mac_digit}${mac:11:6}
echo "${changed_mac}" #This echo stuff like 00:00:00:0F:00:00
The problem for my script is that using xxd means another dependency... I want to avoid it (not all Linux distributions include it by default). I have another workaround using the hexdump command, but with it I'm at the same stage... My script already has a mandatory awk dependency, so can this be done using awk? I need an awk master here :) Thanks.
Something like this may work, with a seed value from $RANDOM:
mac="00:00:00:00:00:00"
awk -v seed=$RANDOM 'BEGIN{ FS=OFS=":"; srand(seed) } {
  while ((s = sprintf("%x", rand() * 16)) == substr($4, 2, 1));
  $4 = substr($4, 1, 1) s
} 1' <<< "$mac"
00:00:00:03:00:00
Inside the while loop (note the empty body; the semicolon is essential) we keep generating a hex digit until it is not equal to substr($4, 2, 1), which is the 2nd char of the 4th column; the differing digit is then spliced back into $4.
You don't need xxd or hexdump. urandom also generates bytes that match the encodings of the digits and letters used to represent hexadecimal numbers, therefore you can just use
old="${mac:10:1}"
different_mac_digit=$(tr -dc 0-9A-F < /dev/urandom | tr -d "$old" | head -c1)
Of course, you can replace your whole script with an awk script too. The following GNU awk script will replace the 11th symbol of each line with a random hexadecimal symbol different from the old one. With <<< macaddress we can feed macaddress to its stdin without having to use echo or something like that.
awk 'BEGIN { srand(); pos=11 } {
    old=strtonum("0x" substr($0,pos,1))
    new=(old + 1 + int(rand()*15)) % 16
    print substr($0,1,pos-1) sprintf("%X",new) substr($0,pos+1)
}' <<< 00:00:00:00:00:00
The trick here is to add a random number between 1 and 15 (both inclusive) to the digit to be modified. If we end up with a number greater than 15 we wrap around using the modulo operator % (16 becomes 0, 17 becomes 1, and so on). That way the resulting digit is guaranteed to be different from the old one.
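A quick brute-force check of that claim, if you want to convince yourself (throwaway bash, not part of the answer):
# for every old digit d and every offset k in 1..15,
# (d + k) % 16 can never land back on d
for d in {0..15}; do
    for k in {1..15}; do
        (( (d + k) % 16 == d )) && echo "collision at d=$d k=$k"
    done
done
# prints nothing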
However, the same approach would be shorter if written completely in bash.
mac="00:00:00:00:00:00"
old="${mac:10:1}"
(( new = (16#$old + 1 + RANDOM % 15) % 16 ))
printf %s%X%s\\n "${mac::10}" "$new" "${mac:11}"
"One-liner" version:
mac=00:00:00:00:00:00
printf %s%X%s\\n "${mac::10}" "$(((16#${mac:10:1}+1+RANDOM%15)%16))" "${mac:11}"
bash has a printf builtin and the RANDOM variable (if you trust it):
different_mac_digit() {
    new=$1
    while [[ $new = $1 ]]; do
        new=$( printf "%X" $(( RANDOM%16 )) )
    done
    echo $new
}
Invoke with the character to be replaced as argument.
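For example, wired into the question's own variables (a usage sketch; new_digit is my variable name):
mac="00:00:00:00:00:00"
new_digit=$(different_mac_digit "${mac:10:1}")
changed_mac=${mac::10}${new_digit}${mac:11}
echo "${changed_mac}"   # e.g. 00:00:00:0A:00:00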
Another awk:
$ awk -v n=11 -v s=$RANDOM '   # set n to the char # you want to replace
BEGIN { FS=OFS="" }{           # each char is a field
    srand(s)
    while((r=sprintf("%x",rand()*16))==$n);
    $n=r
}1' <<< $mac
Output:
00:00:00:07:00:00
or oneliner:
$ awk -v n=11 -v s=$RANDOM 'BEGIN{FS=OFS=""}{srand(s);while((r=sprintf("%x",rand()*16))==$n);$n=r}1' <<< $mac
$ mac="00:00:00:00:00:00"
$ awk -v m="$mac" -v p=11 'BEGIN{srand(); printf "%s%X%s\n", substr(m,1,p-1), int(rand()*16), substr(m,p+1)}'
00:00:00:01:00:00
$ awk -v m="$mac" -v p=11 'BEGIN{srand(); printf "%s%X%s\n", substr(m,1,p-1), int(rand()*16), substr(m,p+1)}'
00:00:00:0D:00:00
And to ensure you get a different digit than you started with:
$ awk -v mac="$mac" -v pos=11 'BEGIN {
    srand()
    new = old = toupper(substr(mac,pos,1))
    while (new==old) {
        new = sprintf("%X", int(rand()*16))
    }
    print substr(mac,1,pos-1) new substr(mac,pos+1)
}'
00:00:00:0D:00:00

Counting lines in a file matching specific string

Suppose I have more than 3000 files named like file.gz, with many lines like below. The fields are separated by commas. I want to count only the lines in which the 21st field has today's date (e.g. 20171101).
I tried this:
awk -F',' '{if { $21 ~ "TZ=GMT+30 date '+%d-%m-%y'" } { ++count; } END { print count; }}' file.txt
but it's not working.
Using awk, something like below
awk -F"," -v toSearch="$(date '+%Y%m%d')" '$21 ~ toSearch{count++}END{print count}' file
The date '+%Y%m%d' produces the date in the format as you requested, e.g. 20170111. Then matching that pattern on the 21st field and counting the occurrence and printing it in the END clause.
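Since the 3000+ files are gzipped, the same awk can be fed from zcat (a sketch; prints one grand total across all files):
zcat *.gz | awk -F"," -v toSearch="$(date '+%Y%m%d')" '$21 ~ toSearch{count++}END{print count}'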
I'm not sure whether the Solaris version of grep supports the -c flag for counting the number of pattern matches; if so, you can do it as
grep -c "$(date '+%Y%m%d')" file
Another solution using gnu-grep
grep -Ec "([^,]*,){20}$(date '+%Y%m%d')" file
explanation: ([^,]*,){20} means 20 fields before the date to be checked
Using awk and process substitution to uncompress a bunch of gzs and feed them to awk for analyzing and counting:
$ awk -F\, 'substr($21,1,8)==strftime("%Y%m%d"){i++}; END{print i}' <(zcat *gz)
Explained:
substr($21,1,8) == strftime("%Y%m%d") { # if the first 8 bytes of $21 match today's date
    i++                                 # increment counter
}
END {                                   # in the end
    print i                             # output counter
}' <(zcat *gz)                          # zcat all gzs to awk
If Perl is an option, this solution works on all 3000 gzipped files:
zcat *.gz | perl -F, -lane 'BEGIN{chomp($date=`date "+%Y%m%d"`); $count=0}; $count++ if $F[20] =~ /^$date/; END{print $count}'
These command-line options are used:
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-n loop around each line of the input file
-e execute the perl code
-F autosplit modifier, in this case splits on ,
BEGIN{} executes before the main loop.
The $date and $count variables are initialized.
The $date variable is set to the result of the shell command date "+%Y%m%d"
$F[20] is the 21st element in @F
If the 21st element starts with $date, increment $count
END{} executes after the main loop
Using grep and cut instead of awk and avoiding regular expressions:
cut -f21 -d, file | grep -Fc "$(date '+%Y%m%d')"

Subtracting numbers in a column

Inside a file I have a column of 10 numbers. I want to subtract the 1st number from the 3rd, the 2nd from the 4th, the 3rd from the 5th, the 4th from the 6th, and so on till the 8th from the 10th.
For example:
10.3456
6.3452
11.2456
5.6666
10.5678
6.4568
14.7777
7.5434
16.5467
8.9999
and get a file with the subtraction
3rd-1st
4th-2nd
5th-3rd
6th-4th
7th-5th
8th-6th
9th-7th
10th-8th
quick and dirty:
$ awk '{a[NR]=0+$0}END{for(i=3;i<=NR;i++)print a[i]-a[i-2]}' file
0.9
-0.6786
-0.6778
0.7902
4.2099
1.0866
1.769
1.4565
Update: came up with another funny way:
$ awk 'NF>1{print $1-$2}' <(paste <(sed -n '3,$p' file) file)
0.9
-0.6786
-0.6778
0.7902
4.2099
1.0866
1.769
1.4565
Update 2: make the result CSV:
kent$ awk '{a[NR]=0+$0}END{for(i=3;i<=NR;i++)
printf "%s%s", a[i]-a[i-2],NR==i?RS:","}' file
0.9,-0.6786,-0.6778,0.7902,4.2099,1.0866,1.769,1.4565
#!/bin/bash
# Create an array from the input file
mapfile -t lines < inputFile
output=()
for index in "${!lines[@]}"; do
    # Check if the element two positions ahead exists
    if [[ ${lines[index+2]} ]]; then
        # It does exist, do the math (bc handles the decimal values)
        output+=("$(bc <<< "${lines[index+2]} - ${lines[index]}")")
    fi
done
printf "%s\n" "${output[@]}" > output
A perly way:
perl -ne '$a{$.}=$_;print $_-$a{$.-2}."\n" if $a{$.-2}' file
This builds a hash of the values keyed by line number ($.).
If the entry from two lines before exists, it prints the current line's value minus that stored value.
0.9
-0.6786
-0.6778
0.7902
4.2099
1.0866
1.769
1.4565
For everything on one row, as asked for in Kent's answer:
perl -ne '$a{$.}=$_;print $_-$a{$.-2}.(eof()?"\n":",") if $a{$.-2}' file
0.9,-0.6786,-0.6778,0.7902,4.2099,1.0866,1.769,1.4565
With awk, I'd write
awk -v ORS="" '
{a=b; b=c; c=$0} # remember the last 3 lines
NR >= 3 {print sep c-a; sep=","} # print the difference
END {print "\n"} # optional, add a trailing newline.
' file
Or let paste do the gruntwork
awk '{a=b;b=c;c=$0} NR >= 3 {print c-a}' file | paste -sd,

How to efficiently sum two columns in a file with 270,000+ rows in bash

I have two columns in a file, and I want to automate summing both values per row
for example
read write
5 6
read write
10 2
read write
23 44
I want to then sum the "read" and "write" of each row. Eventually, after summing, I'm finding the max sum and putting that max value in a file. I feel like I have to use grep -v to get rid of the column headers, which, as stated in the answers, makes the code inefficient since I'm grepping the entire file just to read a line.
I currently have this in a bash script (within a for loop where $x is the file name) to sum the columns line by line
lines=`grep -v READ $x | wc -l | awk '{print $1}'`
line_num=1
arr_num=0
while [ $line_num -le $lines ]
do
    arr[$arr_num]=`grep -v READ $x | sed $line_num'q;d' | awk '{print $2 + $3}'`
    echo $line_num
    line_num=$[$line_num+1]
    arr_num=$[$arr_num+1]
done
However, the file to be summed has 270,000+ rows. The script has been running for a few hours now, and it is nowhere near finished. Is there a more efficient way to write this so that it does not take so long?
Use awk instead and take advantage of modulus function:
awk '!(NR%2){print $1+$2}' infile
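Since the stated end goal is the maximum sum, that can be folded into the same pass (a sketch; the output filename is mine):
awk '!(NR%2){ s = $1 + $2; if (s > max) max = s } END{ print max }' infile > max.txt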
awk is probably faster, but the idiomatic bash way to do this is something like:
while read -r -a line; do # read each line, one by one, into an array
    # use arithmetic expansion to add col 1 and 2
    echo "$(( ${line[0]} + ${line[1]} ))"
done < <(grep -v READ input.txt)
Note the input file is only read once (by grep) and the number of externally forked programs is kept to a minimum (just grep, called only once for the whole input file). The rest of the commands are bash builtins.
The <( ) process substitution is used in case variables set in the while loop are required outside its scope; otherwise a | pipe could be used.
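The same loop can track the maximum directly, since that is the stated end goal (a sketch; integer values assumed, as in the sample, and maxfile is my name for the output):
max=0
while read -r -a line; do
    sum=$(( line[0] + line[1] ))
    (( sum > max )) && max=$sum
done < <(grep -v READ input.txt)
echo "$max" > maxfile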
Your question is pretty verbose, yet your goal is not clear. The way I read it, your numbers are on every second line, and you want only to find the maximum sum. Given that:
awk '
NR%2 == 1 {next}
NR == 2 {max = $1+$2; next}
$1+$2 > max {max = $1+$2}
END {print max}
' filename
You could also use a pipeline with tools that implicitly loop over the input like so:
grep -v read INFILE | tr -s ' ' + | bc | sort -rn | head -1 > OUTFILE
This assumes there are spaces between your read and write data values.
Why not run:
awk 'NR==1 { print "sum"; next } { print $1 + $2 }'
You can afford to run it on the file while the other script it still running. It'll be complete in a few seconds at most (prediction). When you're confident it's right, you can kill the other process.
You can use Perl or Python instead of awk if you prefer.
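For instance, a rough Perl equivalent of the awk above (a sketch, not a drop-in replacement):
perl -lane 'print $. == 1 ? "sum" : $F[0] + $F[1]' file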
Your code is running grep, sed and awk on each line of the input file; that's damnably expensive. And it isn't even writing the data to a file; it is creating an array in Bash's memory that'll need to be printed to the output file later.
Assuming that it's always one 'header' row followed by one 'data' row:
awk '
BEGIN{ max = 0 }
{
    if( NR%2 == 0 ){
        sum = $1 + $2;
        if( sum > max ) { max = sum }
    }
}
END{ print max }' input.txt
Or simply trim out all lines that do not conform to what you want:
grep '^[0-9]\+\s\+[0-9]\+$' input.txt | awk '
BEGIN{ max = 0 }
{
    sum = $1 + $2;
    if( sum > max ) { max = sum }
}
END{ print max }'