replace last word in specific line sed - bash

I have file like
abc dog 1.0
abc cat 2.4
abc elephant 1.2
and I want to replace last word from a line which contains 'elephant' with string which I know.
The result should be
abc dog 1.0
abc cat 2.4
abc elephant mystring
I have sed '/.*elephant.*/s/%/%/' $file but what should be instead of '%'?
EDIT:
odd example
abc dogdogdogdog 1.0
abc cat 2.4
abc elephant 1.2
and now try to change last line.

EDIT: To preserve spaces could you please try following.
awk '
match($0,/elephant[^0-9]*/){
val=substr($0,RSTART,RLENGTH-1)
sub("elephant","",val)
$NF=val "my_string"
val=""
}
1
' Input_file
Could you please try following(if you are ok with awk).
awk '/elephant/{$NF="my_string"} 1' Input_file
In case you want to save output into Input_file itself try following.
awk '/elephant/{$NF="my_string"} 1' Input_file > temp_file && mv temp_file Input_file

basic
sed '/elephant/ s/[^[:blank:]]\{1,\}$/mstring/' $file
if some space could be at the end
sed '/elephant/ s/[^[:blank:]]\{1,\}[[:blank:]*$/mystring/' $file

an alternative to do the substitution and preserve the space:
awk '/elephant/{sub(".{"length($NF)"}$","new")}7' file
with your example:
kent$ cat f
abc dog 1.0
abc cat 2.4
abc elephant 1.2
kent$ awk '/elephant/{sub(".{"length($NF)"}$","new")}7' f
abc dog 1.0
abc cat 2.4
abc elephant new

Robustly in any awk:
$ awk '$2=="elephant"{sub(/[^[:space:]]+$/,""); $0=$0 "mystring"} 1' file
abc dog 1.0
abc cat 2.4
abc elephant mystring
Note that unlike the other answers you have so far it will not fail when the target string (elephant) is part of some other string or appears in some other location than the 2nd field or contains any regexp metachars, or when the replacement string contains &, etc.

Related

Partial string search between two files using AWK

I have been trying to re-write an egrep command using awk to improve performance but haven't been successful. The egrep command performs a simple case insensitive search of the records in file1 against (partial matches in) file2. Below is the command and sample output.
file1 contains:
Abc
xyz
123
blah
hh
a,b
file2 contains:
abc de
xyz
123
456
blah
test1
abdc
abc,def,123
kite
a,b,c
Original command :
egrep -i -f file1 file2
Original (egrep) command output :
$ egrep -i -f file1 file2
abc de
xyz
123
blah
abc,def,123
a,b,c
I would like to use AWK to rewrite the command to do the same operation. I have tried the below but it is performing a full record match and not partial like grep does.
Modified command in awk :
awk 'NR==FNR{a[tolower($0)];next} tolower($0) in a' file1 file2
Modified command (awk) output:
$ awk 'NR==FNR{a[tolower($0)];next} tolower($0) in a' file1 file2
xyz
123
blah
This excludes the records which had partial matches for the string "abc". Any help to fix the awk command please? Thanks in advance.
Use index like this for a partial literal match:
awk '
NR == FNR {
needles[tolower($0)]
next
}
{
haystack = tolower($0)
for (needle in needles) {
if (index(haystack, needle)) {
print
break
}
}
}' file1 file2
I would be a bit surprised that it's significantly faster than egrep but you can try this:
$ awk 'NR==FNR {r=r ((r=="")?"":"|") tolower($0);next} tolower($0)~r' file1 file2
abc de
xyz
123
blah
abc,def,123
Explanation: first build the r1|r2|...|rn regular expression from the content of file1 and store it in awk variable r. Then print all lines of file2 that match it, thanks to the ~ match operator.
If you have GNU awk you can use its IGNORECASE variable instead of tolower:
$ awk -v IGNORECASE=1 'NR==FNR{r=r ((r=="")?"":"|") $0;next} $0~r' file1 file2
abc de
xyz
123
blah
abc,def,123
And with GNU awk it could be that forcing the type of r to regexp instead of string leads to better performance. The manual says:
Given that you can use both regexp and string constants to describe
regular expressions, which should you use? The answer is "regexp
constants," for several reasons:
...
It is more efficient to use regexp constants. 'awk' can note that
you have supplied a regexp and store it internally in a form that
makes pattern matching more efficient. When using a string
constant, 'awk' must first convert the string into this internal
form and then perform the pattern matching.
In order to do this you can try:
$ awk -v IGNORECASE=1 'NR==FNR {s=s ((s=="")?"":"|") $0;next}
FNR==1 && NR!=FNR {r=#//;sub(//,s,r);print typeof(r),r} $0~r' file1 file2
regexp Abc|xyz|123|blah|hh
abc de
xyz
123
blah
abc,def,123
(r=#// forces variable r to be of type regexp and sub(//,s,r) does not change this)
Note: just like with your egrep attempts, the lines of file1 are considered as regular expressions, not simple text strings to search for. So, if one line in file1 is .*, all lines in file2 will match, not just the lines containing substring .*.

Sed to remove substring | Can I make a flexible pattern to remove numbers before tab?

I wanted to ask some advice on an issue that I'm having in removing a substring from a string. I have a file with many lines like the following:
DOG; CSQ| 0.1234 | abcd | \t CAT
where \t represents a literal tab.
My aim is to remove a substring by using sed 's/CSQ.*|//g' so that I can get the following output:
DOG; CAT
However I face a problem where all the rows aren't formatted the same. For example, I also get lines such as:
DOG; CSQ| 0.1234 | abcd | 0 \t CAT
DOG; CSQ| 0.1234 | abcd | 0.9187 \t CAT
My code fails at this point because instead of getting DOG; CAT for all lines, I get:
DOG; CAT
DOG; 0 CAT
DOG; 0.9187 CAT
I've searched for possible solutions but I'm having difficulty (I'm also quite new to bash). I imagine there's something that I can do with sed that will handles all cases but I'm not sure.
You can find and replace all text from CSQ till the last | and all chars after that till the tab including it using
sed 's/CSQ.*|.*\t//' file > newfile
See the online demo.
The CSQ.*|.*\t is a POSIX BRE pattern that matches
CSQ - a CSQ string
.* - any text
| - a pipe char
.* - any text
\t - TAB char.
If the \t are two-char combinations double the backslash before t:
sed 's/CSQ.*|.*\\t//' file > newfile
See this online demo.
So optionally match it.
sed 's/CSQ.*|\( [0-9.]*\)\?//g'
You can learn regex online with fun with regex crosswords.
awk makes this pretty easy.
$: awk '/CSQ.*\t/{print $1" "$NF}' file
DOG; CAT
DOG; CAT
DOG; CAT
Note that the file has to have actual tabs, not \t sequences. awk will read the \t correctly.
If there are no other formatted lines in the file that you want, then maybe just
$: awk '{print $1" "$NF}' file
DOG; CAT
DOG; CAT
DOG; CAT

Replace a word of a line if matched

I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
Following simple awk may help you in same.
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: Checking condition here if $3 3rd field is equal to string xxx then setting $3's value to string yyy. Then mentioning 1 there, since awk works on method of condition then action. I am making condition TRUE here by mentioning 1 here and NOT mentioning any action here so be default print of current line will happen(either with changed 3rd field or with new 3rd field).
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})apathy\>/\1empathy/' file
The output:
abc xyz mno
apathy xyz abc
abc xyz empathy
abc apathy empathy apathy
To modify the file inplace add -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
My final output must have all the original lines with the modified
lines
In your case
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}'file
should solve that.
Explanation
$3 is the the third space-delimited field in awk. If it matches "xxx", then it is replaced. Print the unmodified lines first while storing the modified lines in an array. At the end, print the modified lines. BEGIN and END blocks are executed only at the beginning and the end respectively. NR is the awk built-in variable which denotes that number of records processed till the moment. Since it is used in the END block it should give us the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/(([^ ]+ ){2})xxx/\1yyy/'
Or if your sed doesn't include -E, you can use the more painful BRE notation:
sed 's/\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/'
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
read -r -a a <<<"$line"
[[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
printf '%s ' "${a[#]}"
printf '\n'
done < input.txt

BASH: grep characters and replace by the same plus tab

Basically, the only thing I need is to replace two spaces by a tab; this is the query:
abc def ghi K00001 jkl
all the columns are separated by a tab; the K00001 jkl is separated by two spaces. But I want these two spaces to be replaced by a tab.
I cannot just grep all two spaces since other contents have to spaces and they should stay.
My approach would be to grep:
grep '[0-9][0-9][0-9][0-9][0-9] ' file
but I want to replace it to have the same K00001<TAB>jkl
How do I replace by the same string? Can I use variables to store the grep result and then print the modified (tab not spaces) by the same string?
sed -r "s/([A-Z][0-9]{5}) /&\t/" File
or
sed -r "s/([A-Z][0-9]{5})\s{2}/&\t/" File
Example :
AMD$ echo "abc def ghi K00001 jkl" | sed -r "s/([A-Z][0-9]{5}) /&\t/"
abc def ghi K00001 jkl
You can use this sed:
sed -E $'s/([^[:blank:]]) {2}([^[:blank:]])/\\1\t\\2/g' file
Regex ([^[:blank:]]) {2}([^[:blank:]]) makes sure to match 2 spaces surrounded by 2 non-space characters. In replacement we put back surrounding characters using back-references \1 and \2
I would use awk , since with awk no matter if fields are separated by one - two or more spaces i can force output to be with tabs:
$ echo "abc def ghi K00001 jkl" |awk -v OFS="\t" '{$1=$1}1'
abc def ghi K00001 jkl

how to get the lines from one pattern to other pattern when those pattern are repeated by sed command

if i have file like this
test.txt
abc naveen
abc cde
naveen cde
kumar
naveen
abc
cde
abc
naveen
cde
Question 1: In this we have repeated patterns like abc, navee, cdf etc
Now we have to get the lines from first occurrence of one pattern to any second occurrence of another pattern
For example, I want to get the lines from the 2nd occurrence of abc to the 3rd occurrence of naveen i.e we get output as
abc cde
naveen cde
kumar
naveen
Question 2 (this question is continue to above question):
I want to get only the lines between them (exclude those abc and naveen )
So, I want output as
cde
naveen cde
kumar
this can be done by using sed command ....
so any one please give me the answer for this
try this
a=2
b=3
abcocc=`awk '$0~/abc/{print NR}' txt | awk -v occ=$a 'NR==occ{print $0}' `
naveenocc=`awk '$0~/naveen/{print NR}' txt | awk -v occ=$b 'NR==occ{print $0}'`
1) awk -v abc=$abcocc -v naveen=$naveenocc 'NR>=abc&&NR<=naveen{print $0}' txt
2) awk -v abc=$abcocc -v naveen=$naveenocc 'NR>abc&&NR<naveen{print $0}' txt
a is occurrence of abc and b is occurrence of Naveen and txt is input file. try and let me know if modification is needed.

Resources