searching multi-word patterns from one file in another using awk - bash

patterns file:
wicked liquid
movie
guitar
balance transfer offer
drive car
bigfile file:
wickedliquidbrains
drivelicense
balanceofferings
using awk on command line:
awk '/balance/ && /offer/' bigfile
i get the result i want which is
balanceofferings
awk '/wicked/ && /liquid/' bigfile
gives me
wickedliquidbrains, which is also good..
awk '/drive/ && /car/' bigfile
does not give me drivelicense, which is also good, since I am using &&.
Now I am trying to pass a shell variable containing those '/regex1/ && /regex2/ ...' expressions to awk:
awk -v search="$out" '$0 ~ search' "$bigfile"
awk does not run. What may be the problem?

Try this:
awk "$out" "$bigfile"
When you do $0 ~ search, the value of search has to be a regular expression. But you were setting it to a string containing a bunch of regexps with && between them -- that's not a valid regexp.
To perform an action on the lines that match, do:
awk "$out"' { /* do stuff */ }' "$bigfile"
I switched from double quotes to single quotes for the action in case the action uses awk variables with $.
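A minimal sketch of the difference, assuming the three-line bigfile from the question and a hypothetical value for out (the temporary file exists only for this demonstration):

```shell
# Sample data taken from the question; the temp file is an assumption of this sketch
bigfile=$(mktemp)
printf '%s\n' wickedliquidbrains drivelicense balanceofferings > "$bigfile"

# Hypothetical content of the shell variable
out='/balance/ && /offer/'

# Passed with -v, the whole string (slashes and && included) is ONE regex,
# so no line of bigfile matches it
as_regex=$(awk -v search="$out" '$0 ~ search' "$bigfile")

# Passed as the program text, awk parses it as two regex tests joined by &&
as_program=$(awk "$out" "$bigfile")

printf 'as regex:   [%s]\nas program: [%s]\n' "$as_regex" "$as_program"
rm -f "$bigfile"
```

The first result is empty and the second contains balanceofferings, which is why handing the string over as the awk program works where $0 ~ search does not.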

UPDATED
An alternative to Barmar's solution, with the pattern passed with -v (here search must hold a single regular expression):
awk -v search="$out" 'match($0,search)' "$bigfile"
Test:
$ echo -e "one\ntwo"|awk -v luk=one 'match($0,luk)'
one
Passing two (real) regexs (EREs) to awk:
echo -e "one\ntwo\nnone"|awk -v re1=^o -v re2=e$ 'match($0,re1) && match($0,re2)'
Output:
one
If you want to read the patterns file and match every row of it against the lines of bigfile, you could try something like this:
awk 'NR==FNR{N=NR;re[N,0]=split($0,a);for(i in a)re[N,i]=a[i];next}
{
for(i=1;i<=N;++i) {
#for(j=1;j<=re[i,0]&&match($0,re[i,j]);++j);
for(j=1;j<=re[i,0]&&$0~re[i,j];++j);
if(j>re[i,0]){print;break}
}
}' patterns_file bigfile
Output:
wickedliquidbrains
The first line reads the patterns file and stores it in the 2D array re. Each row holds the words of one pattern line, and element 0 of each row holds the number of words in that row.
Then it reads bigfile. Each line of bigfile is tested against the rows of re, and the line is printed as soon as all items of some row match it.

Related

How to find content in a file and replace the adjacent value

Using bash, how do I find a string and update the string next to it? For example, given the value
my.site.com|test2.spin:80
proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
Expected output is to update proxy_pass.map with
my.site2.com test2.spin:80
my.site.com test2.spin:80;
I tried using awk
awk '{gsub(/^my\.site\.com\s+[A-Za-z0-9]+\.spin:8080;$/,"my.site2.comtest2.spin:80"); print}' proxy_pass.map
but it does not seem to work. Is there a better way to approach the problem?
One awk idea, assuming spacing needs to be maintained:
awk -v rep='my.site.com|test2.spin:80' '
BEGIN { split(rep,a,"|") # split "rep" variable and store in
site[a[1]]=a[2] # associative array
}
$1 in site { line=$0 # if 1st field is in site[] array then make copy of current line
match(line,$1) # find where 1st field starts (in case 1st field does not start in column #1)
newline=substr(line,1,RSTART+RLENGTH-1) # save current line up through matching 1st field
line=substr(line,RSTART+RLENGTH) # strip off 1st field
match(line,/[^[:space:];]+/) # look for string that does not contain spaces or ";" and perform replacement, making sure to save everything after the match (";" in this case)
newline=newline substr(line,1,RSTART-1) site[$1] substr(line,RSTART+RLENGTH)
$0=newline # replace current line with newline
}
1 # print current line
' proxy_pass.map
This generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
If the input looks like:
$ cat proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
This awk script generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
NOTES:
if multiple replacements need to be performed I'd suggest placing them in a file and having awk process said file first
the 2nd match() is hardcoded based on OP's example; depending on actual file contents it may be necessary to expand on the regex used in the 2nd match()
once satisfied with the result the original input file can be updated in a couple ways ... a) if using GNU awk then awk -i inplace -v rep.... or b) save result to a temp file and then mv the temp file to proxy_pass.map
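The first note (keeping the replacements in a file that awk reads first) could be sketched as follows. The file name replacements.txt, its one "site|target" pair per line layout, and the extra spaces in the sample map are assumptions made for this illustration:

```shell
workdir=$(mktemp -d)

# Hypothetical replacements file: one "site|new_target" pair per line
printf '%s\n' 'my.site.com|test2.spin:80' > "$workdir/replacements.txt"

# Sample map; the column spacing here is made up to show that it survives
printf '%s\n' 'my.site2.com    test2.spin:80' \
              'my.site.com     test.spin:8080;' > "$workdir/proxy_pass.map"

result=$(awk '
    # First file: load the site -> target pairs, then skip to the next line
    NR == FNR { split($0, a, "|"); site[a[1]] = a[2]; next }

    # Second file: for known sites, swap the second token only
    $1 in site {
        match($0, /^[[:space:]]*[^[:space:]]+[[:space:]]+/)   # 1st field plus spacing
        head = substr($0, 1, RLENGTH)
        rest = substr($0, RLENGTH + 1)
        sub(/^[^[:space:];]+/, site[$1], rest)                # keeps a trailing ";"
        $0 = head rest
    }
    1
' "$workdir/replacements.txt" "$workdir/proxy_pass.map")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Note that sub() treats & in the replacement as "the matched text", so a target containing & would need escaping first.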
If the number of spaces between the columns is not significant, a simple
proxyf=proxy_pass.map
tmpf=$$.txt
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" } {print}' <$proxyf >$tmpf && mv $tmpf $proxyf
should do. If you need the columns to be lined up nicely, you can replace the print by a suitable printf .... statement.
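As a rough sketch of the printf variant, assuming a first column padded to 14 characters (the width is an arbitrary choice for this example, not something the question specifies):

```shell
aligned=$(printf '%s\n' 'my.site2.com test2.spin:80' 'my.site.com test.spin:8080;' |
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" }
     { printf "%-14s %s\n", $1, $2 }')   # left-justify column 1 in 14 characters

printf '%s\n' "$aligned"
```

Both second-column values then start in the same column regardless of the length of the site name.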
With your shown samples and attempts, please try the following awk code. The shell variable var stores the value my.site.com|test2.spin:80 and is passed to the awk program as the awk variable var1.
In the BEGIN section, split breaks var1 on | into the array arr, and num receives the number of pieces. The for loop then walks arr two elements at a time and builds arr2, using each odd-numbered element as the key and the element after it as the value.
In the main block, if $1 is a key in arr2 the program prints $1 followed by the stored value; otherwise it prints $1 followed by the original $2.
##Shell variable named var is being created here...
var="my.site.com|test2.spin:80"
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
print $1,(($1 in arr2)?arr2[$1]:$2)
}
' Input_file
Or, in case you want to maintain the spacing between the 1st and 2nd fields, try the following small tweak of the above code. Written and tested with your shown samples only.
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
match($0,/[[:space:]]+/)
print $1 substr($0,RSTART,RLENGTH) (($1 in arr2)?arr2[$1]:$2)
}
' Input_file
NOTE: This program can take multiple values separated by | in shell variable to be passed and checked on in awk program. But it considers that it will be in format of key|value|key|value... only.
#!/bin/sh -x
f1=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f1)
f2=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f2)
echo "${f1}%${f2};" >> proxy_pass.map
tr '%' '\t' < proxy_pass.map >> p1
cat > ed1 <<EOF
$
-1
d
wq
EOF
ed -s p1 < ed1
mv -v p1 proxy_pass.map
rm -v ed1
This might work for you (GNU sed):
<<<'my.site.com|test2.spin:80' sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#' |
sed -Ef - file
Build a sed script from the input arguments and apply it to the input file.
The input arguments are first prepared so that their metacharacters (in this case the .'s) are escaped.
Then the first argument is used to prepare a match command and the second is used as the value to be replaced in a substitution command.
The result is piped into a second sed invocation that takes the sed script and applies it to the input file.

AWK Finding a way to print lines containing a word from a comma separated string

I want to write a bash script that only prints lines that, on their second column, contain a word from a comma separated string. Example:
words="abc;def;ghi;jkl"
>cat log1.txt
hello;abc;1234
house;ab;987
mouse;abcdef;654
What I want is to print only lines that contain a whole word from the "words" variable. That means that "ab" won't match, and neither will "abcdef". It seems so simple, yet after trying for many hours I was unable to find a solution.
For example, I tried this as my awk command, but it matched any substring.
-F \; -v b="TSLA;NVDA" 'b ~ $2 { print $0 }'
I will appreciate any help. Thank you.
EDIT:
A sample input would look like this
1;UNH;buy;344.74
2;PG;sell;138.60
3;MSFT;sell;237.64
4;TSLA;sell;707.03
A variable like this would be set
filter="PG;TSLA"
And according to this filter, I want to echo these lines
2;PG;sell;138.60
4;TSLA;sell;707.03
Grep is a good choice here:
grep -Fw -f <(tr ';' '\n' <<<"$words") log1.txt
With awk I'd do
awk -F ';' -v w="$words" '
BEGIN {
n = split(w, a, /;/)
# next line moves the words into the _index_ of an array,
# to make the file processing much easier and more efficient
for (i=1; i<=n; i++) words[a[i]]=1
}
$2 in words
' log1.txt
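For illustration, here is the same array lookup run against the sample from the question's EDIT. Feeding the rows through printf instead of a file is just a convenience of this sketch:

```shell
filter="PG;TSLA"

matched=$(printf '%s\n' '1;UNH;buy;344.74' '2;PG;sell;138.60' \
                        '3;MSFT;sell;237.64' '4;TSLA;sell;707.03' |
awk -F ';' -v w="$filter" '
    BEGIN {
        n = split(w, a, ";")                         # a[1]="PG", a[2]="TSLA"
        for (i = 1; i <= n; i++) words[a[i]] = 1     # words become array keys
    }
    $2 in words                                      # exact key lookup, no substrings
')

printf '%s\n' "$matched"
```

Because `in` tests for an exact array key, a second column such as "TSL" or "TSLAX" would not be printed.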
You may use this awk:
words="abc;def;ghi;jkl"
awk -F';' -v s=";$words;" 'index(s, FS $2 FS)' log1.txt
hello;abc;1234
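Why the index() version only matches whole words can be seen from this small sketch, which uses the three sample lines from the question (piped in via printf here rather than read from log1.txt):

```shell
words="abc;def;ghi;jkl"

whole=$(printf '%s\n' 'hello;abc;1234' 'house;ab;987' 'mouse;abcdef;654' |
awk -F';' -v s=";$words;" '
    # s is ";abc;def;ghi;jkl;" and each probe is ";" $2 ";"
    # ";ab;" and ";abcdef;" are not substrings of s, but ";abc;" is
    index(s, FS $2 FS)
')

printf '%s\n' "$whole"
```

Wrapping both the list and the candidate in the separator means a candidate can only be found when it lines up with a complete list entry.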

Prepend text to specific line numbers with variables

I have spent hours trying to solve this. There are a bunch of answers as to how to prepend to all lines or specific lines but not with a variable text and a variable number.
while [ $FirstVariable -lt $NextVariable ]; do
#sed -i "$FirstVariables/.*/$FirstVariableText/" "$PWD/Inprocess/$InprocessFile"
cat "$PWD/Inprocess/$InprocessFile" | awk 'NR==${FirstVariable}{print "$FirstVariableText"}1' > "$PWD/Inprocess/Temp$InprocessFile"
FirstVariable=$[$FirstVariable+1]
done
Essentially I am looking for a particular string delimiter and then figuring out where the next one is and appending the first result back into the following lines... Note that I already figured out the logic I am just having issues prepending the line with the variables.
Example:
This >
Line1:
1
2
3
Line2:
1
2
3
Would turn into >
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
You can do all of that with the awk one-liner below.
Assuming your pattern starts with Line, then the below script can be used.
> awk '{if ($1 ~ /Line/ ){var=$1;print $0;}else{ if ($1 !="")print var $1}}' $PWD/Inprocess/$InprocessFile
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
Here is how the above script works:
If the first field of a record matches Line, it is saved in the awk variable var and the record is printed as is. For every following record whose first field is not empty, var is prepended to that field and the result is printed, producing the desired output.
If you need to pass the variables dynamically from shell to awk you can use -v option. Like below:
awk -v var1="$FirstVariable" -v var2="$FirstVariableText" 'NR==var1{print var2}1' "$PWD/Inprocess/$InprocessFile" > "$PWD/Inprocess/Temp$InprocessFile"
Your approach parses the file with both bash and awk: bash picks out a line, and awk then manipulates that one line. The whole thing can actually be done with a single awk script:
awk '/^Line/{str=$1; print; next}{print (NF ? str $0 : "")}' inputfile > outputfile
or
awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"}{gsub(FS,OFS $1)}1' inputfile > outputfile
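The second variant relies on awk's paragraph mode (RS=""), which splits records only at blank lines, so it assumes each LineN: block in the input is followed by an empty line. A small sketch with made-up input laid out that way:

```shell
prefixed=$(printf '%s\n' 'Line1:' '1' '2' '' 'Line2:' '1' '2' |
awk 'BEGIN { RS = ""; ORS = "\n\n"; FS = OFS = "\n" }
     { gsub(FS, OFS $1) }   # every newline in the block becomes newline + header
     1')

printf '%s\n' "$prefixed"
```

Each block is one record whose first field is the header, so replacing every newline inside the record with a newline plus $1 prefixes all remaining lines in a single gsub call.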

Updating a specific field with sed

I'm trying to update a specific field on a specific line with the sed command in Bourne Shell.
Lets say I have a file TopScorer.txt
Player:Games:Goals:Assists
Salah:9:9:3
Kane:10:8:4
I need to update the 3rd column (Goals) of a player. I tried this command, and it works unless Games and Goals have the same value, in which case it updates the first one:
player="Salah"
NewGoals="10"
OldGoals=$(awk -F':' '$1=="'$player'"' TopScorer.txt | cut -d':' -f3)
sed -i '/^'$player'/ s/'$OldGoals'/'$NewGoals'/' TopScorer.txt
Output> Salah:10:9:3 instead of Salah:9:10:3
Is there any solution? Should I use delimiters and $3==... to specify that field?
I also tried the option /2 for second occurrence but it's not very convenient in my case.
You can do this with awk alone, without sed. Also note that awk has its own syntax for importing variables from the shell. So your code just becomes
awk -F: -v pl="$player" -v goals="$NewGoals" \
'BEGIN { OFS = FS } $1 == pl { $3 = goals } 1' TopScorer.txt
The -F: sets the input delimiter to :, and the -v options import your shell variables into awk. BEGIN { OFS = FS } sets the output field separator to the same value as the input one. The program then matches on the imported player name and updates $3 to the required value.
To make the modifications in-place, use a temporary file
awk -F: -v pl="$player" -v goals="$NewGoals" \
'BEGIN { OFS = FS } $1 == pl { $3 = goals } 1' TopScorer.txt > tmpfile && mv tmpfile TopScorer.txt
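To check that only the third field changes even when Games and Goals hold the same value, here is a self-contained run on the question's sample. The mktemp files stand in for TopScorer.txt and tmpfile and are an assumption of this sketch:

```shell
scores=$(mktemp)
tmpfile=$(mktemp)
printf '%s\n' 'Player:Games:Goals:Assists' 'Salah:9:9:3' 'Kane:10:8:4' > "$scores"

player="Salah"
NewGoals="10"

# Compare field 1 exactly, assign field 3, and write back through a temp file
awk -F: -v pl="$player" -v goals="$NewGoals" \
    'BEGIN { OFS = FS } $1 == pl { $3 = goals } 1' "$scores" > "$tmpfile" &&
    mv "$tmpfile" "$scores"

updated=$(cat "$scores")
printf '%s\n' "$updated"
rm -f "$scores"
```

Salah's row becomes Salah:9:10:3, because the assignment targets the field by position rather than by its old value.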
This might work for you (GNU sed):
(player=Salah;newGoals=10;sed -i "/^$player/s/[^:]*/$newGoals/3" /tmp/file)
Use a sub shell so as not to pollute the current shell (...). Use sed and pattern matching to match the first field of each record to the variable player and replace the third field of the matching record with the contents of newGoals.
P.S. If the variables are needed in further processes the sub shell is not necessary i.e. remove the ( and )
You can try it with Perl
$ player="Salah"
$ NewGoals="10"
$ perl -F: -lane "\$F[2]=$NewGoals if ( \$F[0] eq '$player' ) ; print join(':',@F) " TopScorer.txt
Player:Games:Goals:Assists
Salah:9:10:3
Kane:10:8:4
$
or export them and call Perl one-liner within single quotes
$ export NewGoals="10"
$ export player="Salah"
$ perl -F: -lane '$F[2]=$ENV{NewGoals} if $F[0] eq $ENV{player} ; print join(":",@F) ' TopScorer.txt
Player:Games:Goals:Assists
Salah:9:10:3
Kane:10:8:4
$
Note that Perl has -i switch and you can do the replacement in-place, so
$ perl -i.bak -F: -lane '$F[2]=$ENV{NewGoals} if $F[0] eq $ENV{player} ; print join(":",@F) ' TopScorer.txt
$ cat TopScorer.txt
Player:Games:Goals:Assists
Salah:9:10:3
Kane:10:8:4
$
This will work.
The first part of the sed command matches the full line for the player and captures the fields to keep with \( ... \).
The second part rebuilds the line from the constants and the captured values \1 and \2.
player="Salah"
NewGoals="10"
sed "s/^$player:\([^:]*\):[^:]*:\([^:]*\)\$/$player:\1:$NewGoals:\2/"
Could you please try the following. The advantage of this approach is that the field number for Goals is not hardcoded: the program looks up which header field is named Goals (it could be the 4th or 5th field, for example) and changes only that column.
1st solution: to change every row with the player's name, use the following.
NewGoals=10
awk -v newgoals="$NewGoals" 'BEGIN{FS=OFS=":"} FNR==1{for(i=1;i<=NF;i++){if($i=="Goals"){field=i}}} FNR>1{if($1=="Salah"){$field=newgoals}} 1' Input_file
2nd solution: to change the player's goals on one specific row only, try the following.
NewGoals=10
awk -v newgoals="$NewGoals" 'BEGIN{FS=OFS=":"} FNR==1{for(i=1;i<=NF;i++){if($i=="Goals"){field=i}}} FNR>1{if($1=="Salah" && FNR==2){$field=newgoals}} 1' Input_file
The above makes changes only for row 2; you could change that by editing FNR==2 in the 2nd condition, where FNR is the row number in awk. In case you want to save the output into Input_file itself, you could append > temp_file && mv temp_file Input_file to the above commands.

bash grep for string and ignore above one line

One of my scripts returns the following output:
NameComponent=Apache
Fixed=False
NameComponent=MySQL
Fixed=True
In the above output, I am trying to exclude the lines below using grep -vB1 'False', which does not seem to work:
NameComponent=Apache
Fixed=False
Is it possible to perform this using grep or is any better way with awk..
<some-command> |tac |sed -e '/False/ { N; d}' |tac
NameComponent=MySQL
Fixed=True
For every line that matches "False", the code in the {} gets executed. N takes the next line into the pattern space as well, and then d deletes the whole thing before moving on to the next line. Note: using multiple pipes is not considered as good practice.
@Karthi1234: If your Input_file is the same as the provided samples then try:
awk -F' |=' '($2 != "Apache" && $2 != "False")' Input_file
This sets the field separator to a space or =, then checks that the 2nd field is neither the string Apache nor False. No action is given, so awk performs its default action of printing the line.
EDIT: as per OP's request following is the code changed one, try:
awk '!/Apache/ && !/False/' Input_file
You could change strings too in case if these are not the ones which you want, logic should be same.
EDIT2: for example, you could change the values of string1 and string2 and add more conditions as needed.
awk '!/string1/ && !/string2/' Input_file
If I understand the question correctly you will always have a line before "Fixed=..." and you want to print both lines if and only if "Fixed=True"
The following awk should do the trick:
< command > | awk 'BEGIN {prev="NA"} {if ($0=="Fixed=True") {print prev; print $0;} prev=$0;}'
Note that if the first line is "Fixed=True" it will print the string "NA" as the first line.
