Using grep inside of awk - bash

I have a quite untidy CSV-file with ; as field separator. In field 1 I have a name, and in field 3 OR 4 there are address details, separated by comma, with an unspecified number of entries, mostly including an e-mail-address. So it looks like this:
Doe, Jon; Some information ; some more information; di: address details, p: (01234) 56789, F: 252470, info#my-domain.com
Miller, Mariella; Some information ; di: other address, p: (09876) 54321, mailme#the-millers.com
Brown, Sam; Other information ; di: other address with no e-mail, p: (09876) 54321
I want to extract the e-mail-addresses from the file together with the names. I can get the names with
BEGIN {FS = ";"}
/#/ {print $1}
I can find the e-mail-addresses with this nice grep:
grep -i -o "[A-Z0-9._%+-]\+#[A-Z0-9.-]\+\.[A-Z]\{2,4\}" mylist.csv
I would like to have the grep called when there is a # in the line, resulting in an output like this:
Doe, Jon, info#my-domain.com
Miller, Mariella, mailme#the-millers.com
But I have no clue how I can call the grep from the awk.

You can use gawk:
$ gawk -F\; 'match($0, /(\w+#[^#]+.)/, a){print $1", "a[1]}' file
Doe, Jon, info#my-domain.com
Miller, Mariella, mailme#the-millers.com
From the documentation:
If regexp contains parentheses, the integer-indexed elements of array
are set to contain the portion of string matching the corresponding
parenthesized subexpression.
Explanation
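match($0, /(\w+#[^#]+.)/, a) serves two purposes. First, match returns a non-zero (true) value only when the regex finds a mail address on the line, so it acts as the condition for the print action. Second, it stores the captured text in a[1], which print then outputs after the name in $1. Note that the three-argument form of match is a gawk extension.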

Using awk you can do this:
awk -F ';' '$NF ~ /#/{sub(/ *$/, "", $NF); sub(/.* /, "", $NF); print $1 ",", $NF}' file
Doe, Jon, info#my-domain.com
Miller, Mariella, mailme#the-millers.com
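If gawk is not available, a similar result can probably be had with the POSIX two-argument match(), which sets RSTART and RLENGTH instead of filling an array. This is only a sketch: the regex is a loosened version of the grep pattern from the question (no {2,4} interval, since not every awk supports intervals), the sample lines are copied from the question, and the temporary file and the result variable are made up for the illustration.

```shell
# Sample data copied from the question ("#" is kept exactly as it appears there).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Doe, Jon; Some information ; some more information; di: address details, p: (01234) 56789, F: 252470, info#my-domain.com
Miller, Mariella; Some information ; di: other address, p: (09876) 54321, mailme#the-millers.com
Brown, Sam; Other information ; di: other address with no e-mail, p: (09876) 54321
EOF

# match() returns 0 when nothing matches, so lines without an address are skipped.
# On a match, RSTART and RLENGTH locate the address anywhere on the line,
# regardless of whether it sits in field 3 or field 4.
result=$(awk -F';' '
match($0, /[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]+/) {
    print $1 ", " substr($0, RSTART, RLENGTH)
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the whole line is searched, this does not depend on which field holds the address details.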

Related

awk print "matched" after successful match

I am trying to use awk to search a file for specific lines and write "matched" after each printed line that matches.
For example, I have a file that contains a list of names and emails, and I only want to match emails ending in "#yahoo.com". If a line matches, I want to print it with "matched" at the end; if the line does NOT contain #yahoo.com, I just want to print the line and continue.
awk -F, '{if($3~/yahoo.com/){ print $1,$2,$3 " matched"}else{ print $1,$2,$3 }}' emails.txt
This returns:
Joe Smith joe.smith#yahoo.com matched
John Doe john.doe#gmail.com
Sally Sue sally.sue#yahoo.com
So it's only matching the first #yahoo.com, then printing the other lines regardless of their email address. What am I missing?
Could you please try the following:
awk '/#yahoo\.com/{print $0,"matched"}' Input_file
Explanation: The sample Input_file you showed does not contain commas, so I removed the field separator option.
Also, a person's name can span more than 2 fields, so I am not matching against the 3rd field; instead the condition is checked against the whole line.
This is probably what you want:
awk -F, '{print $0 ($NF ~ /#yahoo\.com$/ ? " matched" : "")}' emails.txt
but without seeing your input file it's just an untested guess.
I tested the example with the following input. The field delimiter given in your example is a comma, so after splitting you can access the $3 field (the email) and match it against your condition.
Input:
Joe,Smith,joe.smith#yahoo.com
John,Doe,john.doe#gmail.com
Sally,Sue,sally.sue#yahoo.com
Script
awk -F, '{if($3~/yahoo.com/){ print $1,$2,$3 " matched"}else{ print $1,$2,$3 }}' emails.txt
Output:
Joe Smith joe.smith#yahoo.com matched
John Doe john.doe#gmail.com
Sally Sue sally.sue#yahoo.com matched
If the line ends with #yahoo.com, reassign the entire line to itself followed by the string " matched". The trailing 1 is an always-true pattern, so every line is printed.
awk '/#yahoo\.com$/{$0=$0 " matched"}1' input
Joe Smith joe.smith#yahoo.com matched
John Doe john.doe#gmail.com
Sally Sue sally.sue#yahoo.com matched
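To see why escaping the dot and anchoring with $ can matter, here is a small sketch that counts matches for a loose and a strict pattern. The first three input lines come from the tested input above; the fourth line (Ann Lee) is a hypothetical address invented purely to show the difference, and the variable names are made up as well.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Joe,Smith,joe.smith#yahoo.com
John,Doe,john.doe#gmail.com
Sally,Sue,sally.sue#yahoo.com
Ann,Lee,ann.lee#yahooXcom.example.org
EOF

# Loose: the unescaped "." matches any character and the pattern is unanchored,
# so the hypothetical yahooXcom.example.org address is counted too.
loose=$(awk -F, '$3 ~ /yahoo.com/ {n++} END {print n+0}' "$tmp")

# Strict: a literal dot, and the field has to end with #yahoo.com.
strict=$(awk -F, '$3 ~ /#yahoo\.com$/ {n++} END {print n+0}' "$tmp")

printf 'loose=%s strict=%s\n' "$loose" "$strict"
rm -f "$tmp"
```

On real data the two patterns will usually agree, so whether the stricter form is worth it depends on how messy the input is.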

Match a particular letter and print word after that using SED

I have a file "Log.txt" which look like this:
bla bla.. line1
bla bla.. line2
bla bla.. lineN
:000000 ... 239e670... A bla1.txt
:000000 ... 76fd777... M bla2.txt
:000000 ... e69de29... A bla3.txt
Let's say that I am looking for the letter 'A' and 'M'.
How would I look for it ONLY in the 4th field, i.e. only on lines where that field is exactly this letter? I need to match the words "A" and "M" only and print the file name after them, so the final output should look like this:
A bla1.txt
M bla2.txt
A bla3.txt
I used awk to match the 4th column against A and M and print the next word, but I am not getting the expected output: I also get the extra "bla bla" lines.
Does anyone have an idea how to achieve this using sed?
awk for this:
awk '$4 ~ /^[AM]$/ { print $4, $5 }' Log.txt
sed for it:
sed -En '/^([^ ]+ ){3}[AM]/ { s/^([^ ]+ ){3}([AM] .*)/\2/; p; }' Log.txt
Both of these confirm that the A or M is in the 4th field.
Awk can actually do the job; you just need to add a condition. Keep the program in single quotes so the shell does not expand $4 and $5:
awk '/ (A|M) /{print $4,$5}' Log.txt
As for sed, you can do this:
sed -nr "/ (A|M) /{s/.*((A|M)\s+.*)$/\1/;p}" Log.txt
I am not sure what your real data looks like, but you should be able to adjust the command to suit it.
Based on your input file and your expected output, please try the following awk:
awk '{if ($4 == "A" || $4 == "M") {print $4,$5}}' Log.txt
Output:
A bla1.txt
M bla2.txt
A bla3.txt
This might work for you (GNU sed):
sed 's/^\(\S*\s\)\{3\}\([AM]\s\)/\2/p;d' file
Match the fourth field to be A or M and if so, remove the first three fields and print the remainder.
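The difference between testing the 4th field and testing the whole line can be sketched as follows. The "note about A here" line is a hypothetical log line invented for the illustration; the other lines are taken from the question, and the variable names are assumptions.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
bla bla.. line1
note about A here
:000000 ... 239e670... A bla1.txt
:000000 ... 76fd777... M bla2.txt
EOF

# Field test: only lines whose 4th field is exactly A or M are printed.
by_field=$(awk '$4 ~ /^[AM]$/ { print $4, $5 }' "$tmp")

# Whole-line test: any line containing " A " or " M " is selected, so the
# hypothetical "note about A here" line slips through as well.
by_line=$(awk '/ (A|M) / { print $4, $5 }' "$tmp")

printf 'by field:\n%s\n' "$by_field"
printf 'by line:\n%s\n' "$by_line"
rm -f "$tmp"
```

If a stray A or M can never occur outside the 4th field in the real log, the whole-line test gives the same result.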

How to reorder the elements of each line in a file using sed and/or awk following a dynamic format

I currently have a file with each line containing ordered data. For example:
Peter:Connor:14:40kg
George:Head:56:60kg
I have a listing function that takes as an argument a "format" string.
That string contains abbreviations representing each possible element of the list. In this example, the abbreviations would be:
%N, %S, %A, %W
Those abbreviations can be preceded or followed by any amount of characters.
I want to print the data so that it fits the received string format, replacing each abbreviation with their corresponding element in the list. For example, I might receive:
{%A} [%W] %S %N
or
%S|%N|%A[[%W]]
And I would need to reorder the data so that it fits the demanded format. Since it's an argument in the function, I have no way to know what I will receive beforehand.
{14} [40kg] Connor Peter
and for the 2nd example
Connor|Peter|14[[40kg]]
How can I use awk to do this?
Assuming that "I might receive..." means you are able to pass a string with that value to awk:
$ cat tst.awk
BEGIN {
    FS = ":"
    tmp = fmt
    sub(/^[^[:alpha:]]+/,"",tmp)
    split(tmp,flds,/[^[:alpha:]]+/)
    gsub(/[[:alpha:]]+/,"%s",fmt)
    fmt = fmt ORS
}
NR==1 {
    for (i=1; i<=NF; i++) {
        f[$i] = i
    }
    next
}
{ printf fmt, $(f[flds[1]]), $(f[flds[2]]), $(f[flds[3]]), $(f[flds[4]]) }
$ awk -v fmt='{age} [kilo] surname name' -f tst.awk file
{14} [40kg] Connor Peter
{56} [60kg] Head George
$ awk -v fmt='surname|name|age[[kilo]]' -f tst.awk file
Connor|Peter|14[[40kg]]
Head|George|56[[60kg]]
For the above to work, something has to name the columns. You could hard-code that in the script if you like, but I added it as a header line to your CSV instead:
$ cat file
name:surname:age:kilo
Peter:Connor:14:40kg
George:Head:56:60kg
awk 'BEGIN{FS=":"; OFS=" "}{print "{"$3"}","["$4"]",$2,$1}' inputFile
gives:
{14} [40kg] Connor Peter
{56} [60kg] Head George
and
awk 'BEGIN{FS=":"; OFS="|"}{print $2,$1,$3"[["$4"]]"}' inputFile
yields
Connor|Peter|14[[40kg]]
Head|George|56[[60kg]]
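If the format string really uses the %N, %S, %A and %W abbreviations from the question, one possible approach is to substitute each abbreviation per line with gsub. This is only a sketch under a few assumptions: the function name format_people is made up, the mapping %N=name, %S=surname, %A=age, %W=weight is inferred from the example output, and field values are assumed not to contain "&", backslashes or the abbreviations themselves (gsub treats "&" in the replacement specially, and awk -v interprets backslash escapes in the format).

```shell
# format_people FORMAT FILE
# Hypothetical helper: prints each colon-separated record of FILE using FORMAT.
format_people() {
    awk -F: -v fmt="$1" '{
        out = fmt
        gsub(/%N/, $1, out)   # name
        gsub(/%S/, $2, out)   # surname
        gsub(/%A/, $3, out)   # age
        gsub(/%W/, $4, out)   # weight
        print out
    }' "$2"
}

tmp=$(mktemp)
printf 'Peter:Connor:14:40kg\nGeorge:Head:56:60kg\n' > "$tmp"

out1=$(format_people '{%A} [%W] %S %N' "$tmp")
out2=$(format_people '%S|%N|%A[[%W]]' "$tmp")

printf '%s\n' "$out1" "$out2"
rm -f "$tmp"
```

Since the format is only known at run time, it is passed in as a variable rather than being written into the awk program.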

print 1st string of a line if last 5 strings match input

I have a requirement to print the first string of a line if last 5 strings match specific input.
Example: Specified input is 2
India;1;2;3;4;5;6
Japan;1;2;2;2;2;2
China;2;2;2;2
England;2;2;2;2;2
Expected Output:
Japan
England
As you can see, China is excluded as it doesn't meet the requirement (last 5 digits have to be matched with the input).
grep ';2;2;2;2;2$' file | cut -d';' -f1
$ in a regex stands for "end of line", so grep will print all the lines that end in the given string
-d';' tells cut to delimit columns by semicolons
-f1 outputs the first column
You could use awk:
awk -F';' -v v="2" -v count=5 '
{
c=0;
for(i=2;i<=NF;i++){
if($i == v) c++
if(c>=count){print $1;next}
}
}' file
where
v is the value to match
count is the minimum number of fields that must equal v before the first field is printed
the for loop walks over all fields after the first (delimited by ;) and counts the matches
Note that this script does not require the five 2 values to be consecutive, nor to be the last five fields of the line.
With sed:
sed -n 's/^\([^;]*\).*;2;2;2;2;2$/\1/p' file
It captures the leading non-; characters (the first field) of lines ending with ;2;2;2;2;2 and prints them.
It can be shortened with GNU sed to:
sed -nE 's/^([^;]*).*(;2){5}$/\1/p' file
awk -F\; '/;2;2;2;2;2$/{print $1}' file
Japan
England
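If the value and the number of trailing fields need to stay configurable while still requiring that exactly the last fields match, a sketch along these lines might do. The variable names v and count are borrowed from the awk answer above; tmp and result are made up for the illustration, and the data is copied from the question.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
India;1;2;3;4;5;6
Japan;1;2;2;2;2;2
China;2;2;2;2
England;2;2;2;2;2
EOF

# NF > count makes sure there are count value fields besides the name in $1.
# The loop then checks only the last count fields and gives up on the first
# field that differs from v.
result=$(awk -F';' -v v=2 -v count=5 '
NF > count {
    ok = 1
    for (i = NF - count + 1; i <= NF; i++)
        if ($i != v) { ok = 0; break }
    if (ok) print $1
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

China is skipped here because it has only four value fields, which matches the expected output in the question.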

awk pattern match for a line with two specific words

Using awk, I want to print the last line containing two specific words.
Suppose I have log.txt, which contains the logs below:
log1|Amy|Call to Bob for Food
log2|Jaz|Call to Mary for Toy and Cookies
log3|Ron|Call to Jerry then Bob for Book
log4|Amy|Message to John for Cycle
Now I need to extract the last line with "Call" and "Bob".
I tried this:
#!/bin/bash
log="log.txt"
var="Bob"
check=$(awk -F'|' '$3 ~ "/Call.*$var/" {print NR}' $log | tail -1)
echo "Value:$check"
so Value:3 (3rd record) should be printed.
But it is not printed. Please suggest a fix. I have to use awk.
With GNU awk for word delimiters to avoid "Bob" incorrectly matching "Bobbing":
$ awk -v var="Bob" -F'|' '$3 ~ "Call.*\\<" var "\\>"{nr=NR; rec=$0} END{if (nr) print nr, rec}' file
3 log3|Ron|Call to Jerry then Bob for Book
See http://cfajohnson.com/shell/cus-faq-2.html#Q24. If Bob is already saved in a shell variable named var then use awk -v var="$var" ....
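Without GNU awk, the \< and \> word anchors may not be available. A portable sketch could split the third field into words and compare them exactly. The log5 line is a hypothetical record added only to show that "Bobby" is not treated as "Bob", and the tmp and result variables are made up; the rest of the data comes from the question.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
log1|Amy|Call to Bob for Food
log2|Jaz|Call to Mary for Toy and Cookies
log3|Ron|Call to Jerry then Bob for Book
log4|Amy|Message to John for Cycle
log5|Tom|Call to Bobby for Pen
EOF

# Split field 3 on whitespace and look for the exact word "Call" followed,
# somewhere later in the field, by the exact word held in var.
# nr and rec remember the most recent matching record.
result=$(awk -F'|' -v var="Bob" '
$3 ~ /Call/ {
    n = split($3, w, " ")
    seen_call = 0
    for (i = 1; i <= n; i++) {
        if (w[i] == "Call") seen_call = 1
        else if (seen_call && w[i] == var) { nr = NR; rec = $0; break }
    }
}
END { if (nr) print nr, rec }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

This compares whole whitespace-separated words, so punctuation attached to a name (for example "Bob,") would need extra handling.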
