How to replace all matches with an incrementing number in BASH? - bash

I have a text file like this:
AAAAAA this is some content.
This is AAAAAA some more content AAAAAA. AAAAAA
This is yet AAAAAA some more [AAAAAA] content.
I need to replace all occurrence of AAAAAA with an incremented number, e.g., the output would look like this:
1 this is some content.
This is 2 some more content 3. 4
This is yet 5 some more [6] content.
How can I replace all of the matches with an incrementing number?

Here is one way of doing it:
$ awk '{for(x=1;x<=NF;x++)if($x~/AAAAAA/){sub(/AAAAAA/,++i)}}1' file
1 this is some content.
This is 2 some more content 3. 4
This is yet 5 some more [6] content.

A perl solution:
perl -pe 'BEGIN{$A=1;} s/AAAAAA/$A++/ge' test.dat

This might work for you (GNU sed):
sed -r ':a;/AAAAAA/{x;:b;s/9(_*)$/_\1/;tb;s/^(_*)$/0\1/;s/$/:0123456789/;s/([^_])(_*):.*\1(.).*/\3\2/;s/_/0/g;x;G;s/AAAAAA(.*)\n(.*)/\2\1/;ta}' file
This is a toy example, perl or awk would be a better fit for a solution.
The solution only acts on lines which contain the required string (AAAAAA).
The hold buffer is used as a place to keep the incremented integer.
In overview: when a required string is encountered, the integer in the hold space is incremented, appended to the current line, swapped for the required string and the process is then repeated until all occurences of the string are accounted for.
Incrementing an integer simply swaps the last digit (other than trailing 9's) for the next integer in sequence i.e. 0 to 1, 1 to 2 ... 8 to 9. Where trailing 9's occur, each trailing 9 is replaced by a non-integer character e.g '_'. If the number being incremented consists entirely of trailing 9's a 0 is added to the front of the number so that it can be incremented to 1. Following the increment operation, the trailing 9's (now _'s) are replaced by '0's.
As an example say the integer 9 is to be incremented:
9 is replaced by _, a 0 is prepended (0_), the 0 is swapped for 1 (1_), the _ is replaced by 0. resulting in the number 10.
See comments directed at #jaypal for further notes.

Maybe something like this
#!/bin/bash
NR=1
cat filename while read line
do
line=$(echo $line | sed 's/AAAAA/$NR/')
echo ${line}
NR=$((NR + 1 ))
done

Perl did the job for me
perl -pi -e 's/\b'DROP'\b/$&.'_'.++$A /ge' /folder/subfolder/subsubfolder/*
Input:
DROP
drop
$drop
$DROP
$DROP="DROP"
$DROP='DROP'
$DROP=$DROP
$DROP="DROP";
$DROP='DROP';
$DROP=$DROP;
$var="DROP_ACTION"
drops
DROPS
CODROP
'DROP'
"DROP"
/DROP/
Output:
DROP_1
drop
$drop
$DROP_2
$DROP_3="DROP_4"
$DROP_5='DROP_6'
$DROP_7=$DROP_8
$DROP_9="DROP_10";
$DROP_11='DROP_12';
$DROP_13=$DROP_14;
$var="DROP_ACTION"
drops
DROPS
CODROP
'DROP_15'
"DROP_16"
/DROP_17/

Related

Parse output of command and store in variables

I need to parse the output of the mmls command and store multiple values in variables using a BASH script.
Specifically, I need to store: sector size (512 in the example below), and start values (0,0,63,224910,240975 in the example below). Since the second set of values represent partitions, the number of values captured could vary.
mmls /mnt/E01Mnt/RAW/ewf1
DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors
Slot Start End Length Description
000: Meta 0000000000 0000000000 0000000001 Primary Table (#0)
001: ------- 0000000000 0000000062 0000000063 Unallocated
002: 000:000 0000000063 0000224909 0000224847 NTFS / exFAT (0x07)
003: 000:001 0000224910 0000240974 0000016065 DOS FAT12 (0x01)
004: ------- 0000240975 0000250878 0000009904 Unallocated
Here's a start:
$ awk '/^Units/{print $4+0} /^[0-9]/{print $3+0}' file
512
0
0
63
224910
240975
Try to solve the rest yourself and then let us know if you have questions.
Explanation: file is a file containing your sample input. You can replace awk '{script}' file with command | awk '{script}' if you're input is coming from the output of some command rather then being stored in a file.
^ is the universal regexp metacharacter for start of string while /.../ in awk means "find this regexp". So the above is looking for lines that start with the text shown (i.e. Units or digits) and then printing the 4th or 3rd space-separated field after adding zero to it to remove any trailing non-digits or leading zeros. man awk.
You need a bit of awk to start with.
values=( $(mmls /mnt/E01Mnt/RAW/ewf1 | awk '
/^Units are in/{match($4,/^[[:digit:]]+/,ss); print ss[0]}
NR>6{print $4}'
) )
Now you have a values array which contains both the sector size(first element) and the start values(subsequent elements) . We could do some array manipulation to separate individual elements.
secsize=${values[0]} # size of sector
declare -a sv # sv for start values
for((i=1;i<${#values[#]};i++))
do
sv+=( ${values[i]} )
done
echo "${sv[#]}" # print start values
unset values # You don't need values anymore.
Note: Requires GNU awk.

how to add one to all fields in a file

suppose I have file containing numbers like:
1 4 7
2 5 8
and I want to add 1 to all these numbers, making the output like:
2 5 8
3 6 9
is there a simple one-line command (e.g. awk) to realize this?
try following once.
awk '{for(i=1;i<=NF;i++){$i=$i+1}} 1' Input_file
EDIT: As per OP's request without loop, here is a solution(written as per shown sample only).
With hardcoding of number of fields.
awk -v RS='[ \n]' '{ORS=NR%3==0?"\n":" ";print $0+1}' Input_file
OR
Without hardcoding number of fields.
awk -v RS='[ \n]' -v col=$(awk 'FNR==1{print NF}' Input_file) '{ORS=NR%col==0?"\n":" ";print $0+1}' Input_file
Explanation: So in EDIT section 1st solution I have hardcoded the number of fields by mentioning 3 there, in OR solution of EDIT, I am creating a variable named col which will read the very first line of Input_file to get the number of fields. Then it will not read all the Input_file, Now coming onto the code I have set Record separator as space or new line to it will add them without using a loop and it will add space each time after incrementing 1 in their values. It will print new line only when number of lines are completely divided by value of col(which is why we have taken number of fields in -v col section).
In native bash (no awk or other external tool needed):
#!/usr/bin/env bash
while read -r -a nums; do # read a line into an array, splitting on spaces
out=( ) # initialize an empty output array for that line
for num in "${nums[#]}"; do # iterate over the input array...
out+=( "$(( num + 1 ))" ) # ...and add n+1 to the output array.
done
printf '%s\n' "${out[*]}" # then print that output array with a newline following
done <in.txt >out.txt # with input from in.txt and output to out.txt
You can do this using gnu awk:
awk -v RS="[[:space:]]+" '{$0++; ORS=RT} 1' file
2 5 8
3 6 9
If you don't mind Perl:
perl -pe 's/(\d+)/$1+1/eg' file
Substitute any number composed of multiple digits (\d+) with that number ($1) plus 1. /e means to execute the replacement calculation, and /g means globally throughout the file.
As mentioned in the comments, the above only works for positive integers - per the OP's original sample file. If you wanted it to work with negative numbers, decimals and still retain text and spacing, you could go for something like this:
perl -pe 's/([-]?[.0-9]+)/$1+1/eg' file
Input file
Some column headers # words
1 4 7 # a comment
2 5 cat dog # spacing and stray words
+5 0 # plus sign
-7 4 # minus sign
+1000.6 # positive decimal
-21.789 # negative decimal
Output
Some column headers # words
2 5 8 # a comment
3 6 cat dog # spacing and stray words
+6 1 # plus sign
-6 5 # minus sign
+1001.6 # positive decimal
-20.789 # negative decimal

Process a line based on lines before and after in bash

I am trying to figure out how to write a bash script which uses the lines immediately before and after a line as a condition. I will give an example in a python-like pseudocode which makes sense to me.
Basically:
for line in FILE:
if line_minus_1 == line_plus_one:
line = line_minus_1
What would be the best way to do this?
So if I have an input file that reads:
3
1
1
1
2
2
1
2
1
1
1
2
2
1
2
my output would be:
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
Notice that it starts from the first line until the last line and respects changes made in earlier lines so if I have:
2
1
2
1
2
2
I would get:
2
2
2
2
2
2
and not:
2
1
1
1
2
2
$ awk 'minus2==$0{minus1=$0} NR>1{print minus1} {minus2=minus1; minus1=$0} END{print minus1}' file
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
How it works
minus2==$0{minus1=$0}
If the line from 2 lines ago is the same as the current line, then set the line from 1 line ago equal to the current line.
NR>1{print minus1}
If we are past the first line, then print the line from 1 line ago.
minus2=minus1; minus1=$0
Update the variables.
END{print minus1}
After we have finished reading the file, print the last line.
Multiple line version
For those who like their code spread over multiple lines:
awk '
minus2==$0{
minus1=$0
}
NR>1{
print minus1
}
{
minus2=minus1
minus1=$0
}
END{
print minus1
}
' file
Here is a (GNU) sed solution:
$ sed -r '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
This works with a moving three line window. A bit more readable:
sed -r ' # -r for extended regular expressions: () instead of \(\)
1N # On first line, append second line to pattern space
N # On all lines, append third line to pattern space
/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/ # See below
P # Print first line of pattern space
D # Delete first line of pattern space
' infile
N;P;D is the idiomatic way to get a moving two line window: append a line, print first line, delete first line of pattern space. To get a moving three line window, we read an additional line, but only once, namely when processing the first line (1N).
The complicated bit is checking if the first and third line of the pattern space are identical, and if they are, replacing the second line with the first line. To check if we have to make the substitution, we use the address
/^(.*)\n.*\n\1$/
The anchors ^ and $ are not really required as we'll always have exactly to newlines in the pattern space, but it makes it more clear that we want to match the complete pattern space. We put the first line into a capture group and see if it is repeated on the third line by using a backreference.
Then, if this is the case, we perform the substitution
s/^(.*\n).*\n/\1\1/
This captures the first line including the newline, matches the second line including the newline, and substitutes with twice the first line. P and D then print and remove the first line.
When reaching the end, the whole pattern space is printed so we're not swallowing any lines.
This also works with the second input example:
$ sed -r '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile2
2
2
2
2
2
2
To use with BSD sed (as found in OS X), you'd either have to use the -E instead of the -r option, or use no option, i.e., basic regular expressions and escape all parentheses (\(\)) in the capture groups. The newline matching should work, but I didn't test it. If in doubt, check this great answer lining out all the differences.

How can I find the missing integers in a unique and sequential list (one per line) in a unix terminal?

Suppose I have a file as follows (a sorted, unique list of integers, one per line):
1
3
4
5
8
9
10
I would like the following output (i.e. the missing integers in the list):
2
6
7
How can I accomplish this within a bash terminal (using awk or a similar solution, preferably a one-liner)?
Using awk you can do this:
awk '{for(i=p+1; i<$1; i++) print i} {p=$1}' file
2
6
7
Explanation:
{p = $1}: Variable p contains value from previous record
{for ...}: We loop from p+1 to the current row's value (excluding current value) and print each value which is basically the missing values
Using seq and grep:
seq $(head -n1 file) $(tail -n1 file) | grep -vwFf file -
seq creates the full sequence, grep removes the lines that exists in the file from it.
perl -nE 'say for $a+1 .. $_-1; $a=$_'
Calling no external program (if filein contains the list of numbers):
#!/bin/bash
i=0
while read num; do
while (( ++i<num )); do
echo $i
done
done <filein
To adapt choroba's clever answer for my own use case, I needed my sequence to deal with zero-padded numbers.
The -w switch to seq is the magic here - it automatically pads the first number with the necessary number of zeroes to keep it aligned with the second number:
-w, --equal-width equalize width by padding with leading zeroes
My integers go from 0 to 9999, so I used the following:
seq -w 0 9999 | grep -vwFf "file.txt"
...which finds the missing integers in a sequence from 0000 to 9999. Or to put it back into the more universal solution in choroba's answer:
seq -w $(head -n1 "file.txt") $(tail -n1 "file.txt") | grep -vwFf "file.txt"
I didn't personally find the - in his answer was necessary, but there may be usecases which make it so.
Using Raku (formerly known as Perl_6)
raku -e 'my #a = lines.map: *.Int; say #a.Set (^) #a.minmax.Set;'
Sample Input:
1
3
4
5
8
9
10
Sample Output:
Set(2 6 7)
I'm sure there's a Raku solution similar to #JJoao's clever Perl5 answer, but in thinking about this problem my mind naturally turned to Set operations.
The code above reads lines into the #a array, mapping each line so that elements in the #a array are Ints, not strings. In the second statement, #a.Set converts the array to a Set on the left-hand side of the (^) operator. Also in the second statement, #a.minmax.Set converts the array to a second Set, on the right-hand side of the (^) operator, but this time because the minmax operator is used, all Int elements from the min to max are included. Finally, the (^) symbol is the symmetric set-difference (infix) operator, which finds the difference.
To get an unordered whitespace-separated list of missing integers, replace the above say with put. To get a sequentially-ordered list of missing integers, add the explicit sort below:
~$ raku -e 'my #a = lines.map: *.Int; .put for (#a.Set (^) #a.minmax.Set).sort.map: *.key;' file
2
6
7
The advantage of all Raku code above is that finding "missing integers" doesn't require a "sequential list" as input, nor is the input required to be unique. So hopefully this code will be useful for a wide variety of problems in addition to the explicit problem stated in the Question.
OTOH, Raku is a Perl-family language, so TMTOWTDI. Below, a #a.minmax array is created, and grepped so that none of the elements of #a are returned (none junction):
~$ raku -e 'my #a = lines.map: *.Int; .put for #a.minmax.grep: none #a;' file
2
6
7
https://docs.raku.org/language/setbagmix
https://docs.raku.org/type/Junction
https://raku.org

Grep specific lines in a file

my file looks like this
Tree:0,pos:0,len:2.29276,TMRCA:0.795328,ARG:,len:2.29276,TMRCA:0.795328
NEWICK_TREE: [169]((2:0.147398,(6:0.136844,(((9:0.00903981,4:0.00903981):0.084126,5:0.0931658):0.0077254,(7:0.0053182,8:0.0053182):0.095573):0.0359525):0.0105546):0.647929,(0:0.199142,(1:0.0103058,3:0.0103058):0.188836):0.596186);
SITE: 0 0.0123617064 0.648849164 0010111111
iHistoryMax: 0
Tree:1,pos:0.0169589,len:2.28476,TMRCA:0.795328,ARG:,len:2.28476,TMRCA:0.795328
NEWICK_TREE: [303]((2:0.147398,((6:0.00230499,1:0.00230499):0.134539,(((9:0.00903981,4:0.00903981):0.084126,5:0.0931658):0.0077254,(7:0.0053182,8:0.0053182):0.095573):0.0359525):0.0105546):0.647929,(0:0.199142,3:0.199142):0.596186);
iHistoryMax: 1
Tree:2,pos:0.0472255,len:2.77342,TMRCA:0.795328,ARG:,len:2.77342,TMRCA:0.795328
NEWICK_TREE: [67](((6:0.00230499,1:0.00230499):0.134539,(((9:0.00903981,4:0.00903981):0.084126,5:0.0931658):0.0077254,(7:0.0053182,8:0.0053182):0.095573):0.0359525):0.658484,((0:0.199142,3:0.199142):0.436921,2:0.636062):0.159266);
iHistoryMax: 2
Tree:3,pos:0.0539094,len:2.96385,TMRCA:0.795328,ARG:,len:2.96385,TMRCA:0.795328
NEWICK_TREE: [40](((6:0.00230499,1:0.00230499):0.134539,(((9:0.00903981,4:0.00903981):0.084126,5:0.0931658):0.0077254,(7:0.0053182,8:0.0053182):0.095573):0.0359525):0.658484,((0:0.389568,3:0.389568):0.246494,2:0.636062):0.159266);
iHistoryMax: 3
However, what I only need is the pos of each Tree (in the line Tree:1,pos) and the output should be only the number followed by pos in 1 column with 3 rows (or more). The position of the Tree line is not always in each 3 line as the part in between can change in length. This can be done in bash?
Use awk with a delimiter of : and , and then print the fields you want. For example, this will print the the Tree and pos numbers:
awk -F[:,] '/^Tree:/{print $2,$4}' file
using grep with -P
grep -Po "(?<=Tree.*pos:)[0-9.]+" file
0
0.0169589
0.0472255
0.0539094

Resources