use bash or awk to replace part of a string - bash

I have the following example lines in a file:
sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0
I want to process the file and have the following output:
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
Notice that the required effect happened in lines 2 and 3: an underscore and the text following it are removed on lines where the same text appears before and after the underscore.
I have not succeeded with the following:
sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g' file.txt >out.txt
Any bash or awk advice will be welcome. Thanks.

If you want to replace the whole word after the underscore, you have to repeat the character class one or more times using [a-zA-Z]+ and use \1 in the replacement.
sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g' file.txt >out.txt
If the words should be the same before and after the underscore, you can use a repeating capture group with a backreference.
If you only want to do this for the start of the string you can prepend ^ to the pattern and omit the /g at the end of the sed command.
sed -E 's/([a-zA-Z]+)(_\1)+/\1/g' file.txt >out.txt
The pattern matches:
([a-zA-Z]+) Capture group 1, match 1 or more occurrences of a char a-zA-Z
(_\1)+ Capture group 2, repeat matching _ and the same text captured by group 1
The file out.txt will contain:
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
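The anchored variant described above (prepend ^ and drop /g) is not spelled out in the answer, so here is a minimal sketch of it. The function name dedupe_first_word and the sample input are illustrative assumptions, and it assumes a sed whose -E mode accepts backreferences, as GNU sed does.

```shell
# Hypothetical helper: collapse word_word (the same word repeated) at line start only.
dedupe_first_word() {
    # ^ anchors the match to the start of the line, so the /g flag is not needed.
    sed -E 's/^([a-zA-Z]+)(_\1)+/\1/'
}

# Sample input taken from the question.
printf '%s\n' 'sweet_25 2 0 4' 'guy_guy 2 4 6' 'ging_ging 0 0 3' 'moat_2 0 1 0' |
    dedupe_first_word
```

Because of the anchor, a repeated word later in the line (for example in "x guy_guy") is left untouched.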

With your shown samples, please try following awk code.
awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1' Input_file
Explanation: awk's split function splits the 1st field into an array named arr, using _ as the delimiter. If the 1st element of arr is equal to the 2nd element, only the 1st element is saved back to the first field ($1). The trailing 1 prints every line, edited or not.
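The same idea can be written out over several lines. The sketch below is a slightly stricter variant, not the answer's exact code: it additionally requires that the field splits into exactly two pieces, so something like guy_guy_x is left alone. The function name collapse_repeated_name is a hypothetical choice.

```shell
# Hypothetical helper: shorten field 1 only when it is exactly "word_word".
collapse_repeated_name() {
    awk '{
        n = split($1, parts, "_")             # n is the number of pieces
        if (n == 2 && parts[1] == parts[2])
            $1 = parts[1]                     # keep a single copy of the word
        print                                 # print the edited or untouched line
    }' "$@"
}

# Sample input taken from the question.
printf '%s\n' 'sweet_25 2 0 4' 'guy_guy 2 4 6' 'ging_ging 0 0 3' 'moat_2 0 1 0' |
    collapse_repeated_name
```

Assigning to $1 makes awk rebuild the line with single spaces between fields, which matches the sample data.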

You can do it more simply, like this:
sed -E 's/_[a-zA-Z]+//' file.txt >out.txt
This just replaces the first underscore that is followed by one or more alphabetic characters, together with those characters, with nothing.

$ awk 'NR~/^[23]$/{sub(/_[^ ]+/,"")} 1' file
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0

I would do:
awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1' file
Prints:
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0

Related

Increment last number of first line in file

I want to write a shell script which can increment the last value of the first line of a certain file structure:
File-structure:
p cnf integer integer
integer integer ... 0
For Example:
p cnf 11 9
1 -2 0
3 -1 5 0
To:
p cnf 11 10
1 -2 0
3 -1 5 0
The dots should stay the same.
If you could use perl:
perl -pe 's/(-*\d+)$/$1+1/e if $. == 1' inputfile
Here (-*\d+)$ captures the integer (optionally negative) at the end of the line, and the e flag evaluates the replacement as Perl code, so the value is incremented. The condition $. == 1 restricts the substitution to the first line.
With GNU awk:
awk 'NR==1{$NF++} {print}' file
or
awk 'NR==1{$NF++}1' file
Output:
p cnf 11 10
1 -2 0
3 -1 5 0
$NF contains last column.
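The awk commands above print the result to standard output. If the file itself should be updated, one possible approach is to write to a temporary file and copy it back. This is only a sketch: the function name bump_first_line and the use of mktemp are assumptions, not part of the answers above.

```shell
# Hypothetical helper: increment the last field of line 1 and write the result back.
bump_first_line() {
    bump_file=$1
    bump_tmp=$(mktemp) || return 1
    # Only line 1 is modified; every other line is printed unchanged.
    if awk 'NR == 1 { $NF = $NF + 1 } 1' "$bump_file" > "$bump_tmp"; then
        cat "$bump_tmp" > "$bump_file"    # copy back, keeping the file's permissions
        rm -f "$bump_tmp"
    else
        rm -f "$bump_tmp"
        return 1
    fi
}

# Demo on a throwaway copy of the example from the question.
demo=$(mktemp)
printf '%s\n' 'p cnf 11 9' '1 -2 0' '3 -1 5 0' > "$demo"
bump_first_line "$demo" && cat "$demo"
rm -f "$demo"
```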

Sed: delete lines after a pattern for all occurrences

I needed some help with sed. I am trying to delete 3 lines after a pattern for all occurrences in a file. I do
sed '/pattern/,+3d' file.
For the first occurrence this deletes the pattern line and the 3 lines after it, but for the second occurrence it deletes only the pattern line and not the lines after it, which is really confusing. Can anyone please tell me what I am doing wrong?
I think awk is better for the task. For example,
$ cat file
1
2
4
a0
1
a1
1
2
3
4
5
Run
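awk '
flag { i++ }                 # count lines seen since the last match
i > 3 { flag = 0 }           # stop skipping once 3 lines have been dropped
!flag                        # print lines that are not being skipped
/a/ { flag = 1; i = 0 }      # a match (re)starts the count
' file
Output
1
2
4
a0
4
5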
This might work for you (GNU sed):
sed -n '/regexp/{p;:a;n;//ba;n;//ba;n;//ba;d};p' file
If the regexp is encountered, print the current line and then delete the following 3 lines. If the regexp occurs again at any time whilst reading these 3 lines, the count is reset.
If the regexp is also to be deleted, use:
sed -n '/regexp/{:a;n;//ba;n;//ba;n;//ba;d};p' file
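For comparison, the "delete the match and the n lines after it, for every occurrence" behaviour can also be sketched in awk with a countdown. This is an illustrative alternative, not taken from the answers above; the function name delete_with_following and its argument order are assumptions. Note that awk -v interprets backslash escapes in the regex string.

```shell
# Hypothetical helper: drop every line matching the regex plus the n lines after it.
# $1 = regex, $2 = number of following lines to drop; remaining args = files (or stdin)
delete_with_following() {
    dwf_re=$1
    dwf_n=$2
    shift 2
    awk -v re="$dwf_re" -v n="$dwf_n" '
        $0 ~ re  { skip = n + 1 }     # (re)start the countdown on every match
        skip > 0 { skip--; next }     # swallow the match and the n lines after it
        1                             # print everything else
    ' "$@"
}

# Sample data from the awk answer above, with /a/ as the pattern.
printf '%s\n' 1 2 4 a0 1 a1 1 2 3 4 5 | delete_with_following a 3
```

As in the sed version, a match that falls inside the skipped lines restarts the count.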

Replace the nth field of every mth line using awk or bash

For a file that contains entries similar to as follows:
foo 1 6 0
fam 5 11 3
wam 7 23 8
woo 2 8 4
kaz 6 4 9
faz 5 8 8
How would you replace the nth field of every mth line with the same element using bash or awk?
For example, if n = 1 and m = 3 and the element = wot, the output would be:
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
I understand you can call / print every mth line using e.g.
awk 'NR%7==0' file
So far I have tried to keep this in memory but to no avail... I need to keep the rest of the file as well.
I would prefer answers using bash or awk, but sed solutions would also be helpful. I'm a beginner in all three. Please explain your solution.
awk -v m=3 -v n=1 -v el='wot' 'NR % m == 0 { $n = el } 1' file
Note, however, that the inter-field whitespace is not guaranteed to be preserved as-is, because awk splits a line into fields by any run of whitespace; as written, the output fields of modified lines will be separated by a single space.
If your input fields are consistently separated by 2 spaces, however, you can effectively preserve the input whitespace by adding -F'  ' -v OFS='  ' (two spaces each) to the awk invocation.
-v m=3 -v n=1 -v el='wot' defines Awk variables m, n, and el
NR % m == 0 is a pattern (condition) that evaluates to true for every m-th line.
{ $n = el } is the associated action that replaces the nth field of the input line with variable el, causing the line to be rebuilt, implicitly using OFS, the output-field separator, which defaults to a space.
1 is a common Awk shorthand for printing the (possibly modified) input line at hand.
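Putting the pieces together, the command can be wrapped in a small shell function. The sketch below adds one guard that is not in the answer's one-liner: lines with fewer than n fields are left untouched instead of being padded with empty fields. The function name replace_nth_of_mth and its argument order are hypothetical.

```shell
# Hypothetical wrapper around the awk one-liner.
# $1 = n (field number), $2 = m (line step), $3 = replacement; remaining args = files (or stdin)
replace_nth_of_mth() {
    rnm_n=$1
    rnm_m=$2
    rnm_el=$3
    shift 3
    awk -v n="$rnm_n" -v m="$rnm_m" -v el="$rnm_el" '
        NR % m == 0 && n <= NF { $n = el }    # every m-th line that has an n-th field
        1                                     # print the (possibly modified) line
    ' "$@"
}

# Sample input taken from the question: n = 1, m = 3, element = wot.
printf '%s\n' 'foo 1 6 0' 'fam 5 11 3' 'wam 7 23 8' 'woo 2 8 4' 'kaz 6 4 9' 'faz 5 8 8' |
    replace_nth_of_mth 1 3 wot
```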
Great little exercise. While I would probably lean toward an awk solution, in bash you can also rely on parameter expansion with substring replacement to replace the nth field of every mth line. Essentially, you can read every line, preserving whitespace, then check your line count, e.g. if c is your line counter and m your variable for mth line, you could use:
if (( c % m == 0 )) ## test for mth line
If the line is a replacement line, you can read each word into an array after restoring default word-splitting and then use your array element index n-1 to provide the replacement (e.g. ${line/find/replace} with ${line/"${array[$((n-1))]}"/replace}).
If it isn't a replacement line, simply output the line unchanged. A short example could be similar to the following (to which you can add additional validations as required)
#!/bin/bash
[[ -n $1 && -r $1 ]] || {    ## filename given and readable
    printf "error: insufficient or unreadable input.\n" >&2
    exit 1
}
n=${2:-1}    ## variables with default n=1, m=3, e=wot
m=${3:-3}
e=${4:-wot}
c=1          ## line count
while IFS= read -r line; do
    if (( c % m == 0 ))    ## test for mth line
    then
        read -r -a a <<< "$line"         ## split into array on default whitespace
        echo "${line/"${a[n-1]}"/$e}"    ## nth replaced with e
    else
        echo "$line"    ## otherwise just output line
    fi
    ((c++))    ## advance counter
done <"$1"
Example Use/Output
n=1, m=3, e=wot
$ bash replmn.sh dat/repl.txt
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
n=1, m=2, e=baz
$ bash replmn.sh dat/repl.txt 1 2 baz
foo 1 6 0
baz 5 11 3
wam 7 23 8
baz 2 8 4
kaz 6 4 9
baz 5 8 8
n=3, m=2, e=99
$ bash replmn.sh dat/repl.txt 3 2 99
foo 1 6 0
fam 5 99 3
wam 7 23 8
woo 2 99 4
kaz 6 4 9
faz 5 99 8
An awk solution is shorter (and avoids problems when the text of the nth field also occurs earlier in $line), but both would need similar validation of field existence, etc. Learn from both and let me know if you have any questions.

Process a line based on lines before and after in bash

I am trying to figure out how to write a bash script which uses the lines immediately before and after a line as a condition. I will give an example in a python-like pseudocode which makes sense to me.
Basically:
for line in FILE:
    if line_minus_1 == line_plus_one:
        line = line_minus_1
What would be the best way to do this?
So if I have an input file that reads:
3
1
1
1
2
2
1
2
1
1
1
2
2
1
2
my output would be:
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
Notice that it starts from the first line until the last line and respects changes made in earlier lines so if I have:
2
1
2
1
2
2
I would get:
2
2
2
2
2
2
and not:
2
1
1
1
2
2
$ awk 'minus2==$0{minus1=$0} NR>1{print minus1} {minus2=minus1; minus1=$0} END{print minus1}' file
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
How it works
minus2==$0{minus1=$0}
If the line from 2 lines ago is the same as the current line, then set the line from 1 line ago equal to the current line.
NR>1{print minus1}
If we are past the first line, then print the line from 1 line ago.
minus2=minus1; minus1=$0
Update the variables.
END{print minus1}
After we have finished reading the file, print the last line.
Multiple line version
For those who like their code spread over multiple lines:
awk '
    minus2==$0{
        minus1=$0
    }
    NR>1{
        print minus1
    }
    {
        minus2=minus1
        minus1=$0
    }
    END{
        print minus1
    }
' file
Here is a (GNU) sed solution:
$ sed -r '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
This works with a moving three line window. A bit more readable:
sed -r ' # -r for extended regular expressions: () instead of \(\)
1N # On first line, append second line to pattern space
N # On all lines, append third line to pattern space
/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/ # See below
P # Print first line of pattern space
D # Delete first line of pattern space
' infile
N;P;D is the idiomatic way to get a moving two line window: append a line, print first line, delete first line of pattern space. To get a moving three line window, we read an additional line, but only once, namely when processing the first line (1N).
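To see the two-line window on its own, here is a small sketch that uses it to collapse runs of identical adjacent lines, much like uniq. It is an illustration of the N;P;D idiom only, not part of the solution above; the function name squeeze_adjacent is hypothetical, and $!N is used instead of a bare N so that the last line is handled the same way by sed implementations that do not print the pattern space when N runs out of input.

```shell
# Hypothetical helper: collapse runs of identical adjacent lines (similar to uniq).
squeeze_adjacent() {
    # $!N : append the next line to the pattern space, except on the last line
    # P   : print the first line of the window, but only if the two lines differ
    # D   : delete the first line and restart the cycle with the remainder
    sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

printf '%s\n' 1 1 2 2 2 3 | squeeze_adjacent
```

The three-line version in the answer follows the same pattern, with one extra N on the first line to widen the window.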
The complicated bit is checking if the first and third line of the pattern space are identical, and if they are, replacing the second line with the first line. To check if we have to make the substitution, we use the address
/^(.*)\n.*\n\1$/
The anchors ^ and $ are not really required, as we'll always have exactly two newlines in the pattern space, but they make it clearer that we want to match the complete pattern space. We put the first line into a capture group and see if it is repeated on the third line by using a backreference.
Then, if this is the case, we perform the substitution
s/^(.*\n).*\n/\1\1/
This captures the first line including the newline, matches the second line including the newline, and substitutes with twice the first line. P and D then print and remove the first line.
When reaching the end, the whole pattern space is printed so we're not swallowing any lines.
This also works with the second input example:
$ sed -r '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile2
2
2
2
2
2
2
To use this with BSD sed (as found in OS X), you'd either have to use the -E option instead of -r, or use no option, i.e., basic regular expressions, and escape all parentheses (\(\)) in the capture groups. The newline matching should work, but I didn't test it. If in doubt, check this great answer outlining all the differences.

replace exact number in shell

I have the following matrix:
0.380451 0.381955 0 0.237594
0.317293 0.362406 0 0.320301
0.261654 0.38797 0 0.350376
0 0 0 1
0 1 0 0
0 0 0 1
0 0.001504 0 0.998496
0.270677 0.35188 0.018045 0.359398
0.36391 0.305263 0 0.330827
0.359398 0.291729 0.037594 0.311278
0.359398 0.276692 0.061654 0.302256
I want to replace only the standalone zeros, not the zeros followed by a decimal point, with 0.001. How can I do that with sed or gsub?
This is not elegant, and not super portable, but it works on your specific example:
sed -e 's=^0 =X =g
s= 0$= X=g
s= 0 = X =g' data.txt
First of all, it assumes that the fields in the input file are separated by single spaces. The first expression matches a "0" at the beginning of the line, the second a "0" at the end of the line, and the third a "0" with a space on both sides. X stands in for the replacement value (0.001 in your case).
Any particular reason to use only sed for this? I am sure that a simple awk script could do a better job, and also be more robust.
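A minimal sketch of what such an awk script could look like follows. It is an assumption about the approach, not the answerer's code, and the function name replace_exact_zeros is hypothetical. It compares each field with the string "0", so values such as 0.001504 are left alone, and it also handles runs of adjacent zeros, which a space-delimited substitution can miss because matches cannot overlap.

```shell
# Hypothetical helper: replace fields that are exactly "0" with 0.001.
replace_exact_zeros() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "0")        # string comparison: "0.001504" does not match
                $i = "0.001"
        print                     # fields are rejoined with single spaces
    }' "$@"
}

# Two rows from the matrix in the question.
printf '%s\n' '0 0.001504 0 0.998496' '0 0 0 1' | replace_exact_zeros
```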
Match the whitespace after the zero in your pattern and put it back in the replacement.
echo 0 0.001504 0 0.998496 | sed 's/0[\t ]/Z /g'
