awk pattern match for a line with two specific words - bash

Using AWK, I want to print the last line containing two specific words.
Suppose I have log.txt, which contains the logs below:
log1|Amy|Call to Bob for Food
log2|Jaz|Call to Mary for Toy and Cookies
log3|Ron|Call to Jerry then Bob for Book
log4|Amy|Message to John for Cycle
Now I need to extract the last line containing "Call" and "Bob".
I tried this:
#!/bin/bash
log="log.txt"
var="Bob"
check=$(awk -F'|' '$3 ~ "/Call.*$var/" {print NR}' $log | tail -1)
echo "Value:$check"
so Value:3 (the 3rd record) should be printed, but it isn't. Please suggest a fix; I have to use awk.

With GNU awk for word delimiters to avoid "Bob" incorrectly matching "Bobbing":
$ awk -v var="Bob" -F'|' '$3 ~ "Call.*\\<" var "\\>"{nr=NR; rec=$0} END{if (nr) print nr, rec}' file
3 log3|Ron|Call to Jerry then Bob for Book
See http://cfajohnson.com/shell/cus-faq-2.html#Q24. If Bob is already saved in a shell variable named var then use awk -v var="$var" ....
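Putting those pieces together, a corrected version of the script from the question might look like this (log.txt is recreated inline so the sketch is self-contained):

```shell
#!/bin/bash
# Recreate the sample log from the question so the sketch runs on its own.
cat > log.txt <<'EOF'
log1|Amy|Call to Bob for Food
log2|Jaz|Call to Mary for Toy and Cookies
log3|Ron|Call to Jerry then Bob for Book
log4|Amy|Message to John for Cycle
EOF

log="log.txt"
var="Bob"
# Pass the shell variable with -v (inside single quotes, $var is never
# expanded by the shell), and build the dynamic regex by concatenation;
# wrapping it in "/.../" would make the slashes literal parts of the pattern.
check=$(awk -F'|' -v var="$var" '$3 ~ ("Call.*" var) {print NR}' "$log" | tail -1)
echo "Value:$check"   # Value:3
```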

How to split a name from middle name and last name in a list with Bash Shell Script

First of all, thank you so much for taking some time to help me out!
I've been trying to figure out a way to split a full name into first, middle, and last names using awk and sed in a bash shell, without any success. This is an example of the names:
Kelly D. Hynes
Aketzalli Gamez Lizarraga
Shervin Rahimi
Theresa M. Collins
Vanessa L. Dawson
Behzad Garagozloo
James M. Skaalen
Shannon Marie Wenaa
These are the bash commands that I've been trying to use.
awk -F"." '{print $1}' listnames.csv > output.csv
sed -e 's/*.//g' < listnames.csv > output.csv
The outputs of these commands are:
awk -F"." '{print $1}' listnames.csv > output.csv returns an empty output.csv.
sed -e 's/*.//g' < listnames.csv > output.csv returns the exact same list:
Kelly D. Hynes
Aketzalli Gamez Lizarraga
Shervin Rahimi
Theresa M. Collins
Vanessa L. Dawson
Behzad Garagozloo
James M. Skaalen
Shannon Marie Wenaa
The desired output is to have at least two lists:
First name
Aketzalli
Shervin
Theresa
Vanessa
Behzad
James
Shannon
Last name
Hynes
Lizarraga
Rahimi
Collins
Dawson
Garagozloo
Skaalen
Wenaa
I was thinking that maybe I could use the "." in the middle name to differentiate them, but that would not work for distinguishing last names from middle names.
Any help, insights, or feedback would be much appreciated.
Thanks! 🙏🏼
$ awk -v OFS=',' '{print $1, (NF>2 ? $2 : ""), $NF}' file
Kelly,D.,Hynes
Aketzalli,Gamez,Lizarraga
Shervin,,Rahimi
Theresa,M.,Collins
Vanessa,L.,Dawson
Behzad,,Garagozloo
James,M.,Skaalen
Shannon,Marie,Wenaa
To print the first and last names to 2 separate files, as your desired output now shows, would just be:
awk '{print $1 > "first"; print $NF > "last"}' file
You could use awk to separate the names into comma-separated values and then use sed to trim whitespace characters.
awk '{if (NF == 3) {print $1",",$2",",$3} else {print $1",,"$2}}' listnames.csv |sed 's/[ .]//g'
if (NF == 3) ... This tests if the name contains a middle name or not in order to separate the values properly.
sed 's/[ .]//g' We use sed to remove whitespace and periods.

How to use awk to find and replace a string in the 2nd, 3rd, 4th and 5th columns if a matching pattern exists

My text file consists of 5 columns. Based on the user input, I want to search only in the 2nd, 3rd, 4th and 5th columns and replace matches there.
for example
oldstring="Old_word"
newstring="New_word"
So I want to find every exact match of oldstring and replace it with newstring.
Column one should be untouched even if there is a match.
I browsed and found that awk will do this, but I am only able to change one particular column.
bash script
$ cat testfile
a,b,c Old_word,d,e
f,gOld_wordOld_wordOld_words,h,i,j
$ awk -F, -v OFS=, -v oldstring="Old_word" -v newstring="New_word" '{
for (i=2; i<=5; i++) { gsub(oldstring,newstring,$i) }
print
}' testfile
a,b,c New_word,d,e
f,gNew_wordNew_wordNew_words,h,i,j
For more information about awk, read the awk info page.
Another way, similar to Glenn Jackman's answer, is:
$ awk -F, -v OFS=, -v old="Old_word" -v new="New_word" '{
s=$1; $1=""; gsub(old,new,$0); $1=s
print }' <file>
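Note that gsub does a regular-expression replacement, which is why the embedded gOld_word... occurrences above were also changed. If only fields exactly equal to oldstring should be replaced, as the question's "exact match" wording suggests, a per-field string comparison would do it. A minimal sketch with made-up sample data:

```shell
# Sketch: replace a field only when it equals oldstring exactly,
# leaving embedded occurrences (like gOld_words) untouched.
printf '%s\n' 'a,b,Old_word,d,e' 'f,gOld_words,h,Old_word,j' > testfile
awk -F, -v OFS=, -v oldstring="Old_word" -v newstring="New_word" '{
    for (i = 2; i <= 5; i++) if ($i == oldstring) $i = newstring
    print
}' testfile
```

This prints a,b,New_word,d,e and f,gOld_words,h,New_word,j; the embedded gOld_words is left alone.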

awk or shell command to count occurence of value in 1st column based on values in 4th column

I have a large file with records like below :
jon,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
fred,1,2,apple
tom,1,2,apple
tom,1,2,oranges
mary,1,2,apple
I want to find the number of people (names in col 1) that have both apple and oranges. The command should use as little memory as possible and be fast. Any help appreciated!
Output :
awk/sed file => 2 (jon and tom)
Using awk is pretty easy:
awk -F, \
'$4 == "apple" { apple[$1]++ }
$4 == "oranges" { orange[$1]++ }
END { for (name in apple) if (orange[name]) print name }' data
It produces the required output on the sample data file:
jon
tom
Yes, you could squish all the code onto a single line, and shorten the names, and otherwise obfuscate the code.
Another way to do this avoids the END block:
awk -F, \
'$4 == "apple" { if (apple[$1]++ == 0 && orange[$1]) print $1 }
$4 == "oranges" { if (orange[$1]++ == 0 && apple[$1]) print $1 }' data
When it encounters an apple entry for the first time for a given name, it checks to see if the name also (already) has an entry for oranges and prints it if it has; likewise and symmetrically, if it encounters an orange entry for the first time for a given name, it checks to see if the name also has an entry for apple and prints it if it has.
As noted by Sundeep in a comment, it could use the in operator:
awk -F, \
'$4 == "apple" { if (apple[$1]++ == 0 && $1 in orange) print $1 }
$4 == "oranges" { if (orange[$1]++ == 0 && $1 in apple) print $1 }' data
The first answer could also use the in operator in the END loop.
Note that all these solutions could be embedded in a script that would accept data from standard input (a pipe or a redirected file); they have no need to read the input file twice. You'd replace data with "$@" to process file names if they're given, or standard input if no file names are specified. This flexibility is worth preserving when possible.
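That flexibility can be sketched as a small wrapper; the function name both_fruits and the sample data file here are made up for illustration:

```shell
# Hypothetical wrapper: reads the named files if any are given,
# or standard input otherwise ("$@" expands to nothing when empty).
both_fruits() {
    awk -F, \
        '$4 == "apple"   { apple[$1]++ }
         $4 == "oranges" { orange[$1]++ }
         END { for (name in apple) if (name in orange) print name }' "$@"
}

# Sample data for the demo.
printf '%s\n' jon,1,2,apple jon,1,2,oranges tom,1,2,apple \
              tom,1,2,oranges mary,1,2,apple > data

both_fruits data         # file argument
cat data | both_fruits   # pipe; same result
```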
With awk
$ awk -F, 'NR==FNR{if($NF=="apple") a[$1]; next}
$NF=="oranges" && ($1 in a){print $1}' ip.txt ip.txt
jon
tom
This processes the input twice
In first pass, add key to an array if last field is apple (-F, would set , as input field separator)
In second pass, check if last field is oranges and if first field is a key of array a
To print only number of matches:
$ awk -F, 'NR==FNR{if($NF=="apple") a[$1]; next}
$NF=="oranges" && ($1 in a){c++} END{print c}' ip.txt ip.txt
2
Further reading: idiomatic awk for details on two file processing and awk idioms
I did a workaround and used only grep and comm commands.
grep "apple" file | cut -d"," -f1 | sort > file1
grep "orange" file | cut -d"," -f1 | sort > file2
comm -12 file1 file2 > "names.having.both.apple&orange"
comm -12 shows only the names common to the 2 files. (Note the quotes around the output file name; an unquoted & would be treated by the shell as the background operator.)
Solution from Jonathan also worked.
For the input:
jon,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
fred,1,2,apple
tom,1,2,apple
tom,1,2,oranges
mary,1,2,apple
the command:
sed -n "/apple\|oranges/p" inputfile | cut -d"," -f1 | uniq -d
will output a list of people with both apples and oranges:
jon
tom
Edit after comment: For an input file where lines are not ordered by the 1st column and where each person can have two or more repeated fruits, like:
jon,1,2,apple
fred,1,2,apple
fred,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
jon,1,2,oranges
tom,1,2,apple
mary,1,2,apple
tom,1,2,oranges
This command will work:
sed -n "/\(apple\|oranges\)$/ s/,.*,/,/p" inputfile | sort -u | cut -d, -f1 | uniq -d

awk delete all lines not containing substring using if condition

I want to delete lines where the first column does not contain the substring 'cat'.
So if the string in col 1 is 'caterpillar', I want to keep it.
awk -F"," '{if($1 != cat) ... }' file.csv
How can I go about doing it?
I want to delete lines where the first column does not contain the substring 'cat'
That can be handled by this awk:
awk -F, 'index($1, "cat")' file.csv
If that doesn't work then I would suggest you to provide your sample input and expected output in question.
This awk does the job too
awk -F, '$1 ~ /cat/{print}' file.csv
Explanation
-F, : set the field delimiter to ,
$1 ~ /cat/ : match the pattern cat in field 1
{print} : print the line
A shorter command is:
awk -F, '$1 ~ "cat"' file.csv
-F sets the field delimiter (,).
$1 ~ "cat" is a (not anchored) regular expression match; it matches at any position.
As no action has been given, the default {print} is assumed by awk.
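One practical difference between the two forms: index does a literal substring search, so it is the safer choice when the needle contains regex metacharacters. A small illustration with made-up data:

```shell
printf '%s\n' 'a.b,keep' 'axb,drop' > file.csv
# The regex form treats . as "any character", so both lines match:
awk -F, '$1 ~ "a.b"' file.csv
# index() looks for the literal three characters a.b, so only the first matches:
awk -F, 'index($1, "a.b")' file.csv
```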

Searching within column using awk

I am making a contact system through a bash script. The text file I am inputting contacts into looks like this:
Sally May,may#yahoo.com,344-555-4930,Friend
Bill,Bill#yahoo.com,344-555-6543,Co-Worker
In the search option provided, I ask (after they pick the column):
echo -e"What would you like to search for:\c";;
read search
From here I would like to use the variable $search to go through the FIRST column and give me those lines in a formatted fashion. For example:
If they type in (Bill), then it should return
Name Email Phone Category
Bill Bill#yahoo.com 344-555-6543 Co-Worker
If they type in (ll), then it should return
Name Email Phone Category
Bill Bill#yahoo.com 344-555-6543 Co-Worker
Sally May may#yahoo.com 344-555-4930 Friend
The line of code I have been working on so far is this:
awk -F, '{ if ($1 ~/$search/) print $0 }' contacts.txt | awk -F, 'BEGIN{printf "%-25s %-25s %-25s %-25s\n","Name","Email","Phone","Category"} {printf "%-25s %-25s %-25s %-25\n",$1,$2,$3,$4}' ;;
It is giving me an error when I run it. Could someone help me fix this? I appreciate it!
You need to pass the variable to awk using the -v option, match it against the first column, and simplify your formatting:
s='ll'
awk -F, -v s="$s" '$1 ~ s{$1=$1; print}' file | column -t
Sally May may#yahoo.com 344-555-4930 Friend
Bill Bill#yahoo.com 344-555-6543 Co-Worker
s='Bill'
awk -F, -v s="$s" '$1 ~ s{$1=$1; print}' file | column -t
Bill Bill#yahoo.com 344-555-6543 Co-Worker
read -p "What would you like to search for? :" patt
awk -F, 'BEGIN{printf "%-25s%-25s%-25s%-25s\n","Name","Email","Phone","Category"} $1~/'$patt'/{printf("%-25s%-25s%-25s%-25s\n",$1,$2,$3,$4)}' file
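A caveat on that last one-liner: splicing '$patt' into the program text breaks if the pattern contains spaces, quotes, or slashes. Passing it with -v keeps the program static; a self-contained sketch (contacts.txt recreated inline):

```shell
# Recreate the sample contacts file so the sketch runs on its own.
cat > contacts.txt <<'EOF'
Sally May,may#yahoo.com,344-555-4930,Friend
Bill,Bill#yahoo.com,344-555-6543,Co-Worker
EOF

patt='ll'
# Pass the user's pattern with -v and match it against the first column only.
awk -F, -v patt="$patt" '
    BEGIN { printf "%-25s%-25s%-25s%-25s\n", "Name", "Email", "Phone", "Category" }
    $1 ~ patt { printf "%-25s%-25s%-25s%-25s\n", $1, $2, $3, $4 }
' contacts.txt
```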
