Replace a word of a line if matched - bash

I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx

The following simple awk command should do this.
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: awk programs are built from condition { action } pairs. The first pair checks whether the third field ($3) equals the string xxx and, if so, sets $3 to yyy. The trailing 1 is a condition that is always true and has no action attached, so awk falls back to its default action and prints the current line, whether or not its third field was changed.
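The same condition-then-action idea can be wrapped in a small shell function. This is only a sketch: the name replace_field and its argument order are invented for illustration and are not part of the answer above. Note that assigning to a field makes awk rebuild the line with single spaces between fields.

```shell
# Hypothetical helper (not from the answer above):
#   replace_field N OLD NEW [FILE...]
replace_field() {
    n=$1 old=$2 new=$3
    shift 3
    awk -v n="$n" -v old="$old" -v new="$new" '
        $n == old { $n = new }   # rewrite field n only when it equals OLD
        1                        # always-true condition: print every line
    ' "$@"
}

printf 'abc xyz mno\nabc xyz xxx\n' | replace_field 3 xxx yyy
# prints:
# abc xyz mno
# abc xyz yyy
```

Passing the values with -v keeps them out of the awk program text, so the program itself never changes.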

sed solution (\> is a GNU sed word-boundary extension):
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx\>/\1yyy/' file
The output:
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
To modify the file in place, add the -i option: sed -Ei ....
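Where -i is not available or behaves differently (GNU and BSD sed disagree on its argument), a temporary file gives the same effect. The following is a minimal sketch under that assumption; the function name replace_third_in_place is hypothetical, and the word boundary is written as ([[:space:]]|$) rather than \> so that it does not depend on GNU extensions.

```shell
# Hypothetical helper: rewrite FILE so that a third word "xxx" becomes "yyy".
replace_third_in_place() {
    file=$1
    tmp=$(mktemp) || return 1
    if sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx([[:space:]]|$)/\1yyy\3/' "$file" > "$tmp"
    then
        # Copy the result back instead of using mv, so the original
        # file keeps its permissions and inode.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be replace_third_in_place file.txt; the original file is only overwritten if sed succeeds.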

In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
"My final output must have all the original lines with the modified lines"
In your case,
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}' file
should solve that.
Explanation
$3 is the third whitespace-delimited field in awk. If it equals "xxx", it is replaced with "yyy". Each line is printed unmodified first, while the (possibly modified) line is stored in an array; at the end, the stored lines are printed. The BEGIN and END blocks run only once, before the first record and after the last one respectively. NR is the awk built-in variable that holds the number of records processed so far; in the END block it equals the total number of records.
All good :-)

Ravinder has already provided you with the shortest awk solution possible.
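In sed, the following would work (the leading ^ anchors the match to the start of the line, so that only the third word is tested):
sed -E 's/^(([^ ]+ ){2})xxx/\1yyy/'
Or if your sed doesn't include -E, you can use the more painful BRE notation:
sed 's/^\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/'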
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
read -r -a a <<<"$line"
[[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
printf '%s ' "${a[@]}"
printf '\n'
done < input.txt

Related

Interchanging of values with conditions?

I have the following file and I would like to swap the values that have the form digits (up to 3 digits)@digits (up to 4 digits) followed by # or a space/end of line.
If @ is followed by non-digits, the values shouldn't be interchanged.
Sample input:
cat file1
xyz xyz xyz 123@456#1@34#123@2
xyz xyz xyz xyz xyz
test test
123@456#1@34#123@212#3@456#1@34#123@2#123@xyzxyz xyz
xyz xyz xyz
Sample output:
xyz xyz xyz 456@123#34@1#2@123
xyz xyz xyz xyz xyz
test test
456@123#34@1#212@123#456@3#34@1#2@123#123@xyzxyz xyz
xyz xyz xyz
I have tried the following logic. It seems a split is required in order to interchange the values, but I am not able to check the condition or to save the result back into the same field.
awk '{for(a=1;a<=NF;a++){if($a~/#/){split($a,b,"[@#]");val1=b[1];val2=b[2];print val1,val2}}}' file1
123 456
123 456
This simple GNU sed command should be able to do the job:
sed -E 's/\<([0-9]{1,3})@([0-9]{1,4})(#|$)/\2@\1\3/g' file
xyz xyz xyz 456@123#34@1#2@123
xyz xyz xyz xyz xyz
test test
456@123#34@1#212@123#456@3#34@1#2@123#123@xyzxyz xyz
xyz xyz xyz
Here, \< is used for the word boundary.
Note that on BSD sed you have to use [[:<:]] for the word boundary:
sed -E 's/[[:<:]]([0-9]{1,3})@([0-9]{1,4})(#|$)/\2@\1\3/g' file
Explanation:
\<: Word boundary
([0-9]{1,3}): Match 1 to 3 digits
@: Match a @
([0-9]{1,4}): Match 1 to 4 digits
(#|$): Match a # or end of line
With your shown samples, please try the following. Written and tested in GNU awk.
awk -v RS='([0-9]{1,3}@[0-9]{1,4}#)+[0-9]{1,3}@[0-9]{1,4}' '
{
  val=""
  delete arr
  delete arr2
  num=split(RT,arr,"#")
  for(i=1;i<=num;i++){
    valTemp=""
    split(arr[i],arr2,"@")
    valTemp=arr2[2]"@"arr2[1]
    val=(val?val "#":"")valTemp
  }
  ORS=val
}
1
' Input_file
Using GNU sed with \b for the word boundary:
sed -E 's/\b([[:digit:]]{1,3})@([[:digit:]]{1,4})(#|[[:blank:]]*|[[:blank:]]*$)/\2@\1\3/g' infile
Input:
xyz123@456#1@34#1234@2
0123@456# 123@456#
123@456#1@34#123@212#3@456#1@34#123@2#123@xyzx
5678@124 111@110# 002@001 01@010 1111@000
1111@000
Output:
xyz123@456#34@1#1234@2
0123@456# 456@123#
456@123#34@1#212@123#456@3#34@1#2@123#123@xyzx
5678@124 110@111# 001@002 010@01 1111@000
1111@000

BASH: grep characters and replace by the same plus tab

Basically, the only thing I need is to replace two spaces by a tab; this is the query:
abc def ghi K00001  jkl
all the columns are separated by a tab; only K00001 and jkl are separated by two spaces. I want these two spaces to be replaced by a tab.
I cannot just grep all two spaces since other contents have two spaces and they should stay.
My approach would be to grep:
grep '[0-9][0-9][0-9][0-9][0-9]  ' file
but I want to replace it to have the same K00001<TAB>jkl
How do I replace by the same string? Can I use variables to store the grep result and then print the modified (tab not spaces) by the same string?
sed -r "s/([A-Z][0-9]{5})  /\1\t/" File
or
sed -r "s/([A-Z][0-9]{5})\s{2}/\1\t/" File
Example (GNU sed; \1 puts back the captured code, and the two spaces are replaced by the tab):
AMD$ echo "abc def ghi K00001  jkl" | sed -r "s/([A-Z][0-9]{5})  /\1\t/"
abc def ghi K00001	jkl
You can use this sed:
sed -E $'s/([^[:blank:]]) {2}([^[:blank:]])/\\1\t\\2/g' file
The regex ([^[:blank:]]) {2}([^[:blank:]]) matches exactly two spaces surrounded by non-blank characters. The replacement puts the surrounding characters back using the back-references \1 and \2, with a tab between them.
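The $'...' quoting used above is a bash feature. As a hedged sketch for plain sh, the tab can be produced with printf instead; the function name two_spaces_to_tab is made up for this example.

```shell
# Hypothetical helper: turn each run of exactly two spaces that sits
# between non-blank characters into a single tab.
two_spaces_to_tab() {
    tab=$(printf '\t')
    # \\1 and \\2 reach sed as \1 and \2 (the surrounding characters).
    sed -E "s/([^[:blank:]]) {2}([^[:blank:]])/\\1${tab}\\2/g" "$@"
}

printf 'abc\tdef\tghi\tK00001  jkl\n' | two_spaces_to_tab
```

Runs of three or more spaces are left alone, because the character after the second space must be non-blank.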
I would use awk, since with awk, no matter whether fields are separated by one, two, or more spaces, I can force the output to be tab-separated:
$ echo "abc def ghi K00001 jkl" |awk -v OFS="\t" '{$1=$1}1'
abc def ghi K00001 jkl

Using awk, eliminate any empty fields in a file and print in proper format

How can I use awk on the following file named "awk.txt" to print all fields with a consistent amount of space or tab between them?
# cat /root/awk.txt
abc hij klm
def pqr hij
mmm fgf hgt
yyt ghf jkw
I want to use awk on this and print it in the following format.
abc hij klm
def pqr hij
mmm fgf hgt
yyt ghf jkw
Please help!!
Use the column command (shipped with util-linux on Linux, and also available on the BSDs):
column -t file
In this special case, where all entries have the same length, the following awk command would do the trick as well; column, however, can do the job even if the entries have different lengths:
awk '{$1=$1}1' OFS=' ' file
This line of awk will format the output using printf:
awk '{printf "%3s\t%3s\t%3s\n",$1,$2,$3}' awk.txt
If you want to skip the first line, which starts with #:
awk '!/^#/{printf "%3s\t%3s\t%3s\n",$1,$2,$3}' awk.txt
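The fixed %3s widths above only line up when every entry has three characters. As a rough sketch of what column -t does, the widths can be measured first and applied afterwards; the function name align_columns and the two-space gap are choices made for this example only.

```shell
# Hypothetical helper: align whitespace-separated columns.
align_columns() {
    awk '
        {
            nf[NR] = NF
            for (i = 1; i <= NF; i++) {
                cell[NR, i] = $i
                # remember the widest entry seen in each column
                if (length($i) > width[i]) width[i] = length($i)
            }
        }
        END {
            for (r = 1; r <= NR; r++) {
                line = ""
                # pad every column except the last to its width
                for (i = 1; i < nf[r]; i++)
                    line = line sprintf("%-" width[i] "s", cell[r, i]) "  "
                print line cell[r, nf[r]]
            }
        }
    ' "$@"
}

printf 'abc   hij klm\nde pqr    hij\n' | align_columns
# prints:
# abc  hij  klm
# de   pqr  hij
```

The whole input is held in memory until the END block, which is fine for small files like the one in the question.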

finding contents of one file in another in Unix

I am trying to search for the contents of one file(f1) in another file(f2) and print successful matches.
I have tried various posted answers as shown below but none of them help.
1.
awk 'FNR==NR{a[$0]++}FNR!=NR && !a[$0]{print}' f1 f2
2.
while read name
do
awk '$1 ~ '$name'' f2| awk '{print $NF, $4}' >> f3
done < f1
3.
grep -F -f f1 f2 > f3
All the above solutions print non matching entries also from f2. Is there any other way of doing it?
I am looking for exact matches in my scenario.
Say for example
$cat f1
abc
def
ghi
$cat f2
this line has abc
bc
abc
de
this line has ghi
i
ghi
Expected output :
abc
ghi
Thank you for your help.
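Try the command below. -F treats the patterns as fixed strings, -x matches whole lines only, -f reads the patterns from a file, and -i makes the search case-insensitive.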
grep -i -Fx -f search_this.txt search_in.txt
Demo session is below
$ cat search_this.txt
xxxx yyyy
kkkkkk
zzzzzzzz
$ cat search_in.txt
line doesnot contain any name
This person is xxxx yyyy good
xxxx yyyy
Another line which doesnot contain any name
Is kkkkkk a good name ?
kkkkkk
This name itself is sleeping ...zzzzzzzz
I can't find any other name
Let's try the command now
$ grep -i -Fx -f search_this.txt search_in.txt
xxxx yyyy
kkkkkk
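The first awk attempt in the question printed the non-matching lines because of the negation in !a[$0]. A minimal sketch of the matching variant, which behaves like grep -Fx -f without -i, could look like this (the function name exact_lines is hypothetical):

```shell
# Hypothetical helper: exact_lines PATTERN_FILE SEARCH_FILE
# Prints the lines of SEARCH_FILE that equal some whole line of PATTERN_FILE.
exact_lines() {
    awk '
        NR == FNR { want[$0]; next }   # first file: remember each line
        $0 in want                     # second file: print exact matches
    ' "$1" "$2"
}
```

With the f1 and f2 from the question, exact_lines f1 f2 prints abc and ghi. An empty pattern file is not handled specially in this sketch.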
For me, that works, but I'm unsure if this is safe from a variable expansion point of view
PATTERN=`cat f1`; pcregrep -M "$PATTERN" f2
To use f2 as a set of patterns, each of which should be found, there seems to be a solution here: finding contents of one file into another file in unix shell script

Cut first appearing pattern from line

I have a file, say xyz, containing records like:
$cat xyz
ABC
ABCABC
ABCABCABC
I want to cut the first occurrence of a pattern, so the result should look like:
AC
ACABC
ACABCABC
I am trying to cut pattern using awk like:
$ cat xyz|awk -F 'B' '{print $1,$2}'
A C
A CA
A CA
Of course, B is the delimiter, so I am getting the above result. How could I do that?
Thanks
I understand you want to delete the first B in each line. If so, this will work:
sed 's/B//' xyz
Output:
AC
ACABC
ACABCABC
If you want the file to be modified in place, add -i:
sed -i 's/B//' xyz
I see you tried to edit my answer to add a new question. Please note that you should do that by updating your question or by writing in the comments.
Thanks. I have one more case: I want to delete the first pattern only if the line contains the pattern more than once, like:
$cat xyz
ABC
ABCABC
ABCABCABC
Output should be:
ABC
ACABC
ACABCABC
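This can be a way to do it:
# Read each line verbatim; count the B's and drop the first one only if there are at least two.
while IFS= read -r line
do
  if [ "$(printf '%s\n' "$line" | grep -o "B" | wc -l)" -ge 2 ]
  then
    printf '%s\n' "$line" | sed 's/B//'
  else
    printf '%s\n' "$line"
  fi
done < xyz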
Output:
ABC
ACABC
ACABCABC
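The loop starts several processes per line. As a sketch of an alternative, a sed address can select only the lines that contain at least two B's and apply the substitution there; the function name drop_first_b_if_repeated is made up for this example.

```shell
# Hypothetical helper: delete the first B, but only on lines that
# contain two or more B characters.
drop_first_b_if_repeated() {
    # /B.*B/ is an address: the s command runs only on matching lines.
    sed '/B.*B/s/B//' "$@"
}

printf 'ABC\nABCABC\nABCABCABC\n' | drop_first_b_if_repeated
# prints:
# ABC
# ACABC
# ACABCABC
```

Lines with a single B fall through untouched because sed prints every line by default.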
