awk print "matched" after successful match - bash

I'm trying to use awk to search a file for specific lines and print "matched" after each line that matches.
For example, I have a file that contains a list of names and emails, and I only want to match emails ending in "#yahoo.com". If a line matches, I want it printed with "matched" at the end; if the line does NOT contain #yahoo.com, I just want it printed as is before continuing.
awk -F, '{if($3~/yahoo.com/){ print $1,$2,$3 " matched"}else{ print $1,$2,$3 }}' emails.txt
This returns:
Joe Smith joe.smith#yahoo.com matched
John Doe john.doe#gmail.com
Sally Sue sally.sue#yahoo.com
So it's only matching the first #yahoo.com, then printing the other lines regardless of their email address. What am I missing?

Could you please try the following.
awk '/#yahoo\.com/{print $0,"matched"}' Input_file
Explanation: Your sample Input_file does not contain commas, so I removed the field separator option.
Also, a person's name can span more than 2 fields, so rather than testing the 3rd field I test the whole line. Note that this prints only the matching lines.

This is probably what you want:
awk -F, '{print $0 ($NF ~ /#yahoo\.com$/ ? " matched" : "")}' emails.txt
but without seeing your input file it's just an untested guess.
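For a quick check of the ternary form, here is a minimal sketch. The sample data is an assumption (comma-separated, as the -F, in the question implies), and the variable names emails and tagged are mine:

```shell
# Hypothetical sample data; the real emails.txt was never shown.
emails='Joe,Smith,joe.smith#yahoo.com
John,Doe,john.doe#gmail.com
Sally,Sue,sally.sue#yahoo.com'

# Print every line; append " matched" only when the last field ends in #yahoo.com.
tagged=$(printf '%s\n' "$emails" |
  awk -F, '{ print $0 ($NF ~ /#yahoo\.com$/ ? " matched" : "") }')

printf '%s\n' "$tagged"
```

Because the whole record $0 is printed, the commas are kept in the output; printing $1,$2,$3 instead would join the fields with spaces.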

I tested the example with the following input. The delimiter given in your command is a comma, so after splitting you can access the email as $3 and test it against your condition.
Input:
Joe,Smith,joe.smith#yahoo.com
John,Doe,john.doe#gmail.com
Sally,Sue,sally.sue#yahoo.com
Script
awk -F, '{if($3~/yahoo.com/){ print $1,$2,$3 " matched"}else{ print $1,$2,$3 }}' emails.txt
Output:
Joe Smith joe.smith#yahoo.com matched
John Doe john.doe#gmail.com
Sally Sue sally.sue#yahoo.com matched

If the line ends with #yahoo.com, reassign the entire line to itself followed by the string " matched". The trailing 1 then prints every line, changed or not.
awk '/#yahoo.com$/{$0=$0 " matched"}1' input
Joe Smith joe.smith#yahoo.com matched
John Doe john.doe#gmail.com
Sally Sue sally.sue#yahoo.com matched
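One detail worth checking in the pattern: an unescaped . matches any character, so /#yahoo.com$/ would also tag an address such as eve#yahooXcom. A small sketch with the dot escaped (tag_yahoo and the second input line are made up for illustration):

```shell
tag_yahoo() {
  # Escaped dot: only a literal "." before "com" qualifies.
  # The trailing 1 is an always-true pattern, so every line is printed.
  awk '/#yahoo\.com$/ { $0 = $0 " matched" } 1'
}

printf '%s\n' 'Joe Smith joe.smith#yahoo.com' 'Eve Roe eve#yahooXcom' | tag_yahoo
```

Only the first line gets the suffix; the second is printed unchanged.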

Related

Find a line with a single word and merge it with the next line

I have an issue with grep that I can't sort out.
What I have.
A listing of firstnames and lastnames, like:
John Doe
Alice Smith
Bob Smith
My problem.
Sometimes, firstname and lastname are disjointed, like:
Alice
Smith
Bob Doolittle
Mark
Von Doe //sometimes, there are more than one word on the next line
What I'd like to achieve.
Concatenate the "orphan" name with the next line.
Alice Smith
Bob Doolittle
Mark Von Doe
What I already tried
grep -ozP "^\w+\n\w.+" file | tr '\n' ' '
So, here I ask grep to find a line with just one word and concatenate it with the following line, even if this next line has more than one word.
It works correctly, but only if the isolated word is at the very beginning of the file. If it appears below the first line, grep does not spot it. So a quick and dirty solution where I would loop through the file and remove a line after each pass doesn't work for me.
If awk is acceptable:
awk '
NF==1 {printf "%s ",$1; getline; print; next}
1' names.dat
Where:
NF==1 - if only one name/field in the current record ...
printf / getline / print / next - print field #1, read next line and print it, then skip to next line
1 - print all other lines as is
As a one-liner:
awk 'NF==1{printf "%s ",$1;getline;print;next}1' names.dat
This generates:
Alice Smith
Bob Doolittle
Mark Von Doe //sometimes, there are more than one word on the next line
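One caveat with the getline version: if the last line of the file is a lone word, getline fails, $0 is left unchanged, and print emits that word a second time. A sketch that holds the lone word in a variable instead of calling getline (join_orphans, held, and the trailing Zed line are my own additions, not from the answer):

```shell
join_orphans() {
  awk '
    held != "" { print held, $0; held = ""; next }  # previous line was a lone word
    NF == 1    { held = $1; next }                  # lone word: keep it for the next line
               { print }                            # ordinary line: print as is
    END        { if (held != "") print held }       # lone word on the last line
  '
}

printf '%s\n' 'Alice' 'Smith' 'Bob Doolittle' 'Mark' 'Von Doe' 'Zed' | join_orphans
```

Like the original, this treats the line after a lone word as its continuation without inspecting it further.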
You can use GNU sed like this:
sed -E -i '/^[^[:space:]]+$/{N;s/\n/ /}' file
See the sed demo:
s='Alice
Smith
Bob Doolittle
Mark
Von Doe //sometimes, there are more than one word on the next line'
sed -E '/^[^[:space:]]+$/{N;s/\n/ /}' <<< "$s"
Output:
Alice Smith
Bob Doolittle
Mark Von Doe //sometimes, there are more than one word on the next line
Details:
/^[^[:space:]]+$/ finds a line with no whitespace
{N;s/\n/ /} - reads in the next line, and appends a newline char with this new line to the current pattern space, and then s/\n/ / replaces this newline char with a space.
Use this Perl one-liner:
perl -lane 'BEGIN { $is_first_name = 1; } if ( @F == 1 && $is_first_name ) { @prev = @F; $is_first_name = 0; } else { print join " ", @prev, @F; $is_first_name = 1; @prev = (); }' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
Using awk:
awk '
{f=$2 ? 1 : 0}
v==1{v=0; print; next}
f==0{v=1; printf "%s ", $1; next}
1
' file
Output
Alice Smith
Bob Doolittle
Mark Von Doe
This might work for you (GNU sed):
sed -E 'N;s/^(\S+)\n/\1 /;P;D' file
Append the next line.
If the first line in the pattern space contains one word only, replace the following newline with a space.
Print/delete the first line and repeat.

Print the characters from string

I have a file which contains words-
abfiuf.com abdbhj.co.in abcahjkl.org.in.2 abciuf zasdg cbhjk asjk
among other contents. My requirement: for each word that starts with abfiuf,
abdbhj, abcahjkl, abciuf, ..., cut the two characters from the middle, as below.
abfiuf - fi
abdbhj - db
abjcahjkl - ca
abciuf - ci
I have tried below-
First to get the matching word-
cat /etc/xyz.txt|grep -Eo '\<(abfiuf|abdbhj|abjcahjkl|abciuf)\S*'|cut -f1 -d"."
But I am unable to cut what comes before and after the matching "fi", "db", "ca", "ci" in the words.
I tried the sed command sed 's/^.*fi/fi/', which works for only one word and only removes what comes before. How can I cut the characters both before and after, for multiple words?
EDIT2: Since the OP says they only want to print the matching strings, try the following.
awk 'match($0,/fi|db|ca|ci/){print substr($0,RSTART,RLENGTH)}' Input_file
Or, if you also want a message with the line number for lines that have NO match, try the following.
awk 'match($0,/fi|db|ca|ci/){print substr($0,RSTART,RLENGTH);next} {print "Line number " FNR " is NOT having any matching value in it."}' Input_file
If instead you only need to print the 3rd and 4th characters, try the following.
awk '{print substr($0,3,2)}' Input_file
EDIT: If you do NOT want to hard-code the position, try the following, which first calculates the length of the line and then prints 2 characters starting from its middle.
awk '{len=length($0)/2;print substr($0,len,2)}' Input_file
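If the requirement is per word rather than per line, a sketch along these lines may be closer. It assumes every word of interest starts with ab and that the wanted pair always sits at positions 3 and 4, which matches the sample but may not hold for the real file; middle_pairs is a hypothetical name:

```shell
middle_pairs() {
  # Walk every whitespace-separated word; for words starting with "ab",
  # print the 3rd and 4th characters.
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^ab/) print substr($i, 3, 2) }'
}

printf '%s\n' 'abfiuf.com abdbhj.co.in abcahjkl.org.in.2 abciuf zasdg cbhjk asjk' | middle_pairs
```

For the sample line this prints fi, db, ca and ci, one per line, and skips the words that do not start with ab.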

Formatting based on condition in bash

Here is my code:
grep -E -o "\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b|first_name.{0,40}|[(]?[2-9]{1}[0-9]{2}[)-. ]?[2-9]{1}[0-9]{2}[-. ]?[0-9]{4}" file.txt | awk -v ORS= '
NR>1 && !/,/ {print "\n"}
{print}
END {if (NR) print "\n"}' | sed -e :a -e '$!N;s/\n[0-9]{3}/,/;ta' -e 'P;D' | sed '$!N;s/\n\s*[0-9]//;P;D'
I'm pretty close. The above code works, but it removes the first digit from the phone number.
I'm looking for a bash solution to do the following:
Combine two lines if the lines do not start with a number.
If the line starts with a number, combine the previous two lines + the line with the number for 3 fields in one line.
Here's an example:
jim.bob3#email.com
Jim Bob
jane.bob#email.com
Jane Bob
joebob1122#email.com
Joe Bob
555 555 5555
jbob44#email.com
Jeff Bob
....
Results:
jim.bob3#email.com Jim Bob
jane.bob#email.com Jane Bob
joebob1122#email.com Joe Bob 555 555 5555
jbob44#email.com Jeff Bob
Thanks!
If your Input_file is the same as the sample shown, then the following awk solution may help.
awk '{printf("%s",$0~/^name/&&FNR>1?RS $0:FNR==1?$0:FS $0)} END{print ""}' Input_file
Output will be as follows.
name1#email.com Jim Bob
name2#email.com Jane Bob
name3#email.com Joe Bob 555 555 5555
name4#email.com Jeff Bob
Explanation: The following annotated code is for understanding only, NOT for running; use the command above to run it.
awk '{printf(\ ##Using printf keyword from awk here to print the values etc.
"%s",\ ##Mentioning %s means it tells printf that we are going to print a string here.
$0~/^name/&&FNR>1\ ##Checking here condition if a line starts from string name and line number is greater than 1 then:
?\ ##? means following statement will be printed as condition is TRUE.
RS $0\ ##printing RS(record separator) and current line here.
:\ ##: means in case mentioned above condition was NOT TRUE then perform following steps:
FNR==1\ ##Checking again condition here if a line number is 1 then do following:
?\ ##? means execute statements in case above condition is TRUE following ?
$0\ ##printing simply current line here.
:\ ##: means in case above mentioned conditions NOT TRUE then perform actions following :
FS $0)} ##Printing FS(field separator) and current line here.
END{print ""}' file24 ##Printing a NULL value here to print a new line and mentioning the Input_file name here too.
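For anyone who wants a runnable form of that breakdown, here is a sketch that spells the nested ternary out as if/else. It keeps the answer's assumption that each record starts with a line beginning with name; join_records and the sample lines are mine:

```shell
join_records() {
  awk '{
    if (FNR == 1)          printf "%s", $0        # first line: nothing before it
    else if ($0 ~ /^name/) printf "%s%s", RS, $0  # new record: start a fresh line
    else                   printf "%s%s", FS, $0  # continuation: join with a space
  }
  END { print "" }'                               # terminate the last line
}

printf '%s\n' 'name1#email.com' 'Jim Bob' 'name3#email.com' 'Joe Bob' '555 555 5555' | join_records
```

With the default RS (newline) and FS (space), the second record comes out as a single line with three parts.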
Using awk
awk '/#/{if(s)print s;s=""}{s=(s?s OFS:"")$0}END{if(s)print s}' infile
Input:
$ cat infile
jim.bob3#email.com
Jim Bob
jane.bob#email.com
Jane Bob
joebob1122#email.com
Joe Bob
555 555 5555
jbob44#email.com
Jeff Bob
Output:
$ awk '/#/{if(s)print s;s=""}{s=(s?s OFS:"")$0}END{if(s)print s}' infile
jim.bob3#email.com Jim Bob
jane.bob#email.com Jane Bob
joebob1122#email.com Joe Bob 555 555 5555
jbob44#email.com Jeff Bob
Explanation:
awk '/#/{ # if row/line/record contains #
if(s)print s; # if variable s was set before print it.
s="" # nullify or reset variable s
}
{
s=(s?s OFS:"")$0 # concatenate variable s with its previous content if it was set before, with
# OFS o/p field separator and
# current row/line/record ($0),
# otherwise s will be just current record
}
END{ # end block
if(s)print s # if s was set before print it
}
' infile

Using grep inside of awk

I have a quite untidy CSV-file with ; as field separator. In field 1 I have a name, and in field 3 OR 4 there are address details, separated by comma, with an unspecified number of entries, mostly including an e-mail-address. So it looks like this:
Doe, Jon; Some information ; some more information; di: address details, p: (01234) 56789, F: 252470, info#my-domain.com
Miller, Mariella; Some information ; di: other address, p: (09876) 54321, mailme#the-millers.com
Brown, Sam; Other information ; di: other address with no e-mail, p: (09876) 54321
I want to extract the e-mail-addresses from the file together with the names. I can get the names with
BEGIN {FS = ";"}
/#/ {print $1}
I can find the e-mail-addresses with this nice grep:
grep -i -o "[A-Z0-9._%+-]\+#[A-Z0-9.-]\+\.[A-Z]\{2,4\}" mylist.csv
I would like to have the grep called when there is an # in the line, resulting in an output like this:
Doe, Jon, info#my-domain.com
Miller, Mariella, mailme#the-millers.com
But I have no clue how I can call the grep from the awk.
You can use gawk:
$ gawk -F\; 'match($0, /(\w+#[^#]+.)/, a){print $1", "a[1]}' file
Doe, Jon, info#my-domain.com
Miller, Mariella, mailme#the-millers.com
From the documentation:
If regexp contains parentheses, the integer-indexed elements of array
are set to contain the portion of string matching the corresponding
parenthesized subexpression.
Explanation
match($0, /(\w+#[^#]+.)/, a) serves us in two ways: it is true only if the regex captures a mail address, so it guards the print part, and it stores the captured text in a[1], which we then print after the name.
Using awk you can do this:
awk -F ';' '$NF ~ /#/{sub(/ *$/, "", $NF); sub(/.* /, "", $NF); print $1 ",", $NF}' file
Doe, Jon, info#my-domain.com
Miller, Mariella, mailme#the-millers.com
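If gawk is not available, the two-argument match() with RSTART and RLENGTH does a similar job in any POSIX awk. This is only a sketch: the address pattern is a simplified version of the one in the question's grep, and names_and_emails is a hypothetical name:

```shell
names_and_emails() {
  # match() sets RSTART/RLENGTH to the position and length of the first match,
  # so substr() can cut the address out without calling grep.
  awk -F';' 'match($0, /[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]+/) {
    print $1 ", " substr($0, RSTART, RLENGTH)
  }'
}

printf '%s\n' \
  'Doe, Jon; Some information ; some more information; di: address details, p: (01234) 56789, F: 252470, info#my-domain.com' \
  'Miller, Mariella; Some information ; di: other address, p: (09876) 54321, mailme#the-millers.com' \
  'Brown, Sam; Other information ; di: other address with no e-mail, p: (09876) 54321' |
  names_and_emails
```

Lines without an address fail the match and are skipped, so the Brown line produces no output.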

awk pattern match for a line with two specific words

Using awk, I want to print the last line containing two specific words.
Suppose I have log.txt, which contains the logs below:
log1|Amy|Call to Bob for Food
log2|Jaz|Call to Mary for Toy and Cookies
log3|Ron|Call to Jerry then Bob for Book
log4|Amy|Message to John for Cycle
Now I need to extract the last line with "Call" and "Bob".
I tried with:
#!/bin/bash
log="log.txt"
var="Bob"
check=$(awk -F'|' '$3 ~ "/Call.*$var/" {print NR}' $log | tail -1)
echo "Value:$check"
So Value:3 (3rd record) should be printed.
But it's not printed. Please suggest a fix. I have to use awk.
With GNU awk for word delimiters to avoid "Bob" incorrectly matching "Bobbing":
$ awk -v var="Bob" -F'|' '$3 ~ "Call.*\\<" var "\\>"{nr=NR; rec=$0} END{if (nr) print nr, rec}' file
3 log3|Ron|Call to Jerry then Bob for Book
See http://cfajohnson.com/shell/cus-faq-2.html#Q24. If Bob is already saved in a shell variable named var then use awk -v var="$var" ....
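As for why the original attempt prints nothing: inside single quotes the shell never expands $var, and because "/Call.*$var/" is a string rather than a regex literal, the slashes become part of the pattern. A portable sketch of the script with the variable passed via -v (no GNU word boundaries here, so Bob would also match Bobbing; log_data stands in for log.txt):

```shell
# Sample log from the question, held in a variable instead of log.txt.
log_data='log1|Amy|Call to Bob for Food
log2|Jaz|Call to Mary for Toy and Cookies
log3|Ron|Call to Jerry then Bob for Book
log4|Amy|Message to John for Cycle'

var="Bob"

# Build the regex from a string plus the awk variable; remember the last hit.
check=$(printf '%s\n' "$log_data" |
  awk -v var="$var" -F'|' '$3 ~ ("Call.*" var) { nr = NR } END { if (nr) print nr }')

echo "Value:$check"
```

Keeping the last matching NR inside awk also removes the need for tail -1.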
