Search for a part of a string in a file in shell - shell

I have a file with a list of names in it. I am trying to search for names with a certain group of strings and print them to a new file in shell. Can you please help?
Here's what I am looking at.
Text file (names.txt) contains names like
New York_USA
Delhi
Moscow
Tokyo
Austin_USA
Beijing
Chicago_USA
I am trying to get the names with _USA in a separate file.
Here's what I have tried
#/bin/ksh
for city in "cat names.txt"
do
if ["$city" =~ "*_USA"]
then
echo "$city in USA" > USAnames.txt
fi
done

You just need to execute the cat command instead of quoting it, fix your test operator ([ --> [[), append with >> instead of overwriting with >, and write a regex instead of a glob pattern (the shebang should also be #!/bin/ksh, with an exclamation mark):
#!/bin/ksh
for city in `cat names.txt` # notice the back quotes
do
if [[ "$city" =~ .*_USA ]]
then
echo "$city in USA" >> USAnames.txt
fi
done
A more idiomatic way, which also copes with names containing spaces (such as New York_USA), would perhaps be:
#!/bin/ksh
while read city
do
if [[ "$city" =~ .*_USA ]]
then
echo "$city in USA" >> USAnames.txt
fi
done < names.txt

Simply use grep :
grep '_USA$' your_file > new_file
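For example, with the sample names from the question (file names assumed):

```shell
# Recreate the sample names.txt from the question
printf '%s\n' 'New York_USA' Delhi Moscow Tokyo Austin_USA Beijing Chicago_USA > names.txt

# Keep only the lines ending in _USA
grep '_USA$' names.txt > USAnames.txt
cat USAnames.txt
# New York_USA
# Austin_USA
# Chicago_USA
```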

Using awk or sed will be easier
if you want the output to be like this
New York_USA in USA
Austin_USA in USA
Chicago_USA in USA
use
awk '/_USA$/ {print $0, "in USA"}' names.txt > USAnames.txt
or sed
sed -n 's/.*_USA$/& in USA/p' names.txt > USAnames.txt
or if you want output to be
New York in USA
Austin in USA
Chicago in USA
use
awk -F_ '/_USA$/ {print $1,"in USA"}' names.txt > USAnames.txt
or sed
sed -n 's/\(.*\)_USA$/\1 in USA/p' names.txt > USAnames.txt
Edit
you can also use perl to print out the USA cities
#!/usr/bin/env perl
open(NAMES, "<", "names.txt") or die "cannot open names.txt: $!";
while (<NAMES>) {
chomp;
print "$_ in USA\n" if /_USA$/; # Chicago_USA in USA
# print "$1 in USA\n" if /(.*)_USA$/; # Chicago in USA
}
close(NAMES);

Related

Key Matching using shell

I wanted to see the different types of answers I receive from you guys for the below problem. I am curious to see the problem solved completely through an array or any other matching technique (if there is any).
Below is the problem. Keeping Name as the key we need to print their various phone numbers in a line.
$cat input.txt
Name1, Phone1
Name2, Phone2
Name3, Phone1
Name4, Phone5
Name1, Phone2
Name2, Phone1
Name4, Phone1
O/P:
$cat output.txt
Name1,Phone1,Phone2
Name2,Phone2,Phone1
Name3,Phone1
Name4,Phone5,Phone1
I solved the above problem, but I wanted to see a solving technique that is perhaps more effective than mine. I am not an expert in shell and am still at a beginner level. My code below:
$cat keyMatchingfunction.sh
while read LINE; do
var1=$(echo "$LINE"|awk -F\, '{ print $1 }')
matching_line=$(grep "$var1" output.txt|wc -l)
if [[ $matching_line -eq 0 ]]; then
echo "$LINE" >> output.txt
else
echo $LINE is already present in output.txt
grep -q -n "$var1" output.txt
line_no=$(grep -n "$var1" output.txt|cut -d: -f1)
keymatching=$(echo "$LINE"|awk -F\, '{ print $2 }')
sed -i "$line_no s/$/,$keymatching/" output.txt
fi
done
Try this:
awk -F', ' '{a[$1]=a[$1]","$2}END{for(i in a) print i a[i]}' input.txt
Output:
Name1,Phone1,Phone2
Name2,Phone2,Phone1
Name3,Phone1
Name4,Phone5,Phone1
With bash and sort:
#!/bin/bash
declare -A array # define associative array
# read file input.txt to array
while IFS=", " read -r line number; do
array["$line"]+=",$number"
done < input.txt
# print array
for i in "${!array[@]}"; do
echo "$i${array[$i]}"
done | sort
Output:
Name1,Phone1,Phone2
Name2,Phone2,Phone1
Name3,Phone1
Name4,Phone5,Phone1

How to add multiple lines of output one by one to a variable in Bash?

This might be a very basic question but I was not able to find a solution. I have a script:
If I run w | awk '{print $1}' in command line in my server I get:
f931
smk591
sc271
bx972
gaw844
mbihk988
laid640
smk59
ycc951
Now I need to use this list in my bash script and manipulate each entry one by one. I need to check each user's group and print those that are in a specific group. The command to check a user's group is id username. How can I save them or iterate through them one by one in a loop?
what I have so far is
tmp=$(w | awk '{print $1}')
But it only returns the first record! Appreciate any help.
Populate an array with the output of the command:
$ tmp=( $(printf "a\nb\nc\n") )
$ echo "${tmp[0]}"
a
$ echo "${tmp[1]}"
b
$ echo "${tmp[2]}"
c
Replace the printf with your command (i.e. tmp=( $(w | awk '{print $1}') )) and man bash for how to work with bash arrays.
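If your bash is 4.0 or newer, mapfile (also known as readarray) is a more robust way to do the same thing, since it splits strictly on newlines and performs no globbing; a sketch, with printf standing in for w | awk '{print $1}':

```shell
# mapfile reads stdin into an array, one line per element
mapfile -t tmp < <(printf 'f931\nsmk591\nsc271\n')
echo "${tmp[0]}"   # f931
echo "${#tmp[@]}"  # 3 (number of elements)
```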
For a lengthier, more robust and complete example:
$ cat ./tstarrays.sh
# saving multi-line awk output in a bash array, one element per line
# See http://www.thegeekstuff.com/2010/06/bash-array-tutorial/ for
# more operations you can perform on an array and its elements.
oSET="$-"; set -f # save original set flags and turn off globbing
oIFS="$IFS"; IFS=$'\n' # save original IFS and make IFS a newline
array=( $(
awk 'BEGIN{
print "the quick brown"
print " fox jumped\tover\tthe"
print "lazy dogs back "
}'
) )
IFS="$oIFS" # restore original IFS value
set +f -$oSET # restore original set flags
for (( i=0; i < ${#array[@]}; i++ ));
do
printf "array[%d] of length=%d: \"%s\"\n" "$i" "${#array[$i]}" "${array[$i]}"
done
printf -- "----------\n"
printf -- "array[@]=\n\"%s\"\n" "${array[@]}"
printf -- "----------\n"
printf -- "array[*]=\n\"%s\"\n" "${array[*]}"
$ ./tstarrays.sh
array[0] of length=22: "the quick brown"
array[1] of length=23: " fox jumped over the"
array[2] of length=21: "lazy dogs back "
----------
array[@]=
"the quick brown"
array[@]=
" fox jumped over the"
array[@]=
"lazy dogs back "
----------
array[*]=
"the quick brown fox jumped over the lazy dogs back "
A couple of non-obvious key points to make sure your array gets populated with exactly what your command outputs:
If your command output can contain globbing characters then you should disable globbing before the command (oSET="$-"; set -f) and re-enable it afterwards (set +f -$oSET).
If your command output can contain spaces then set IFS to a newline before the command (oIFS="$IFS"; IFS=$'\n') and set it back to its old value after the command (IFS="$oIFS").
tmp=$(w | awk '{print $1}')
while read i
do
echo "$i"
done <<< "$tmp"
You can use a for loop, i.e.
for user in $(w | awk '{print $1}'); do echo $user; done
which in a script would look nicer as:
for user in $(w | awk '{print $1}')
do
echo $user
done
You can use the xargs command to do this:
w | awk '{print $1}' | xargs -I '{}' id '{}'
With the -I switch, xargs will take each line of its standard input separately, then construct and execute a command line by replacing the specified string '{}' in the command line template with the input line.
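As a self-contained illustration (printf stands in for w here, and echo for id, since the real output depends on who is logged in):

```shell
# xargs substitutes each input line for {} in the command template
printf 'alice\nbob\n' | xargs -I '{}' echo 'id {}'
# id alice
# id bob
```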
I guess you should use who instead of w. Try this out,
who | awk '{print $1}' | xargs -n 1 id

Unix bash cutting and grep

I have a text file called db.txt.
Some sample lines from the file goes as such:
Harry Potter and the Sorcerer's Stone:J.K. Rowling:21.95:100:200
Harry Potter and the Chamber of Secrets:J.K. Rowling:21.95:150:300
Lord of the Rings, The Fellowship of the Ring:J.R.R. Tolkien:32.00:500:500
A Game of Thrones:George R.R. Martin:44.50:300:250
Then in my script, I have the following lines:
echo "Enter title:"
read TITLE
cut -d ":" -f 1 db.txt | grep -iw "$TITLE" | while read LINE
do
STRING="`echo $LINE | cut -d ":" -f 1`,"
STRING="$STRING `echo $LINE | cut -d ":" -f 2`, "
STRING=" \$$STRING`echo $LINE | cut -d ":" -f 3`,"
STRING=" $STRING`echo $LINE | cut -d ":" -f 4`,"
STRING=" $STRING`echo $LINE | cut -d ":" -f 5`"
done
Is there a way to grep a specific field from cut and then pass in the full line into the while loop?
For example, if I entered "Harry Potter",
it should display:
Harry Potter and the Sorcerer's Stone, J.K. Rowling, $21.95, 100, 200
Harry Potter and the Chamber of Secrets, J.K. Rowling, $21.95, 150, 300
You can do this without cut, and without grep if you're ok with bash's regular expression matching (or can use shell pattern matching instead).
The idea would be to read the file line by line, then split the line into an array.
Once you've got that, do the comparisons and output you want.
Here's a demo of the technique:
#! /bin/bash
echo "Title:"
read title
# shopt -s nocasematch # if you want case-insensitive matching
while read line ; do # this read takes data from input.txt, see
# end of loop
IFS=: read -a parts <<< "$line" # this splits the line on ":" into
# an array called parts
if [[ ${parts[0]} =~ $title ]] ; then # regex matching
printf "%s -- %s\n" "${parts[1]}" "${parts[2]}"
fi
done < input.txt
The next step up from grep and cut is awk. Unless you must do this using bash (is this homework?), awk would make things considerably easier (note that IGNORECASE is specific to GNU awk):
awk -F: '/harry potter/ { sub(/^/,"$",$(NF-2)); print }' IGNORECASE=1 OFS=", " db.txt
Test input:
Harry Potter and the Sorcerer's Stone:J.K. Rowling:21.95:100:200
Harry Potter and the Chamber of Secrets:J.K. Rowling:21.95:150:300
Lord of the Rings, The Fellowship of the Ring:J.R.R. Tolkien:32.00:500:500
A Game of Thrones:George R.R. Martin:44.50:300:250
Test output:
Harry Potter and the Sorcerer's Stone, J.K. Rowling, $21.95, 100, 200
Harry Potter and the Chamber of Secrets, J.K. Rowling, $21.95, 150, 300
read -p "Enter title: " TITLE
while IFS=: read title author price x y; do
if [[ ${title,,} == *${TITLE,,}* ]]; then
printf "%s, %s, $%s, %s, %s\n" "$title" "$author" "$price" "$x" "$y"
fi
done < db.txt
The test in the if command does a simple glob-match but case insensitively, so it will match if the user enters "potter".
Or, use sed to change the separators:
read -p "Enter title: " TITLE
sed '/'"$TITLE"'/I!d; s/:/, /g' db.txt
which means delete all lines that do not match the TITLE, then transform the separator.
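A quick demo with two of the sample lines (the I address flag is specific to GNU sed):

```shell
# Recreate a couple of the sample db.txt lines
printf '%s\n' \
  "Harry Potter and the Sorcerer's Stone:J.K. Rowling:21.95:100:200" \
  'A Game of Thrones:George R.R. Martin:44.50:300:250' > db.txt

TITLE='harry potter'
sed '/'"$TITLE"'/I!d; s/:/, /g' db.txt
# Harry Potter and the Sorcerer's Stone, J.K. Rowling, 21.95, 100, 200
```

Note that, unlike the answers above, this variant does not insert a $ before the price.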
The easiest method of doing this is to look over the grep results
#!/bin/bash
read -p "Enter title: " TITLE
FILENAME="db.txt"
IFS=$'\n'
for LINE in `grep -iw "$TITLE" "$FILENAME"`; do
echo $LINE | awk 'BEGIN { FS = ":" } ; { print $1, $2, $3, $4, $5 }'
done
The IFS change sets the loop delimiter to a newline rather than a space, and FS in the awk command sets the field delimiter to : to allow access to the fields.
I know you didn't specify it, but awk is probably the best tool to use for this task. It combines cut, sed, and grep into one convenient and easy to use tool. Well, convenient tool...
To understand awk, you have to understand a few things:
Awk is a programming language. It has built in logic and variables.
Awk assumes a read loop reading each and every line.
Awk programs must be surrounded by curly braces.
Not only curly braces, but Awk parsing variables start with dollar signs. Therefore, you need to put your Awk programs surrounded by single quotes to keep the shell out of it.
Awk automatically parses each line based upon the field separator. The default field separator is white space, but you can change that via the -F parameter.
Each field gets a special variable. The first field is $1, the next field is $2, etc. The entire line is $0.
Here's your Awk statement:
awk -F: '{
title = $1
author = $2
price = $3
pages_read_until_i_got_bored = $4
pages = $5
print "I read " pages_read_until_i_got_bored " pages out of " pages " pages of " title " by " author "."
}' $file
Of course, the whole thing could be a single line too:
awk -F: '{ print "I read " $4 " pages out of " $5 " of " $1 " by " $2 "." }' $file
Just wanted to emphasize the programmability of Awk and how it can be used to do this type of parsing.
If your question is how to enter this information and put it into environment variables, Glenn Jackman's answer is the best.
If you can use sed this would be a solution
read -p "Enter title: " TITLE
sed -n -e 's/^\([^:]\+:\)\{2\}/&$/' -e 's/:/, /g' -e "/^$TITLE/Ip" db.txt
Short explanation of what it does:
-n tells sed not to print any lines
-e 's/^\([^:]\+:\)\{2\}/&$/' matches everything up to the 2nd : and adds a $ after it (& stands for the whole match; the GNU-only \0 would also work)
-e 's/:/, /g' replaces every : with a comma followed by a space
-e "/^$TITLE/Ip" tells sed to print all lines which start with $TITLE (that's the p) and I tells sed to match case-insensitive
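For instance, on one of the sample lines (using & for the whole match; note that \+ and the I flag are GNU sed extensions):

```shell
echo 'A Game of Thrones:George R.R. Martin:44.50:300:250' |
  sed -n -e 's/^\([^:]\+:\)\{2\}/&$/' -e 's/:/, /g' -e '/^a game/Ip'
# A Game of Thrones, George R.R. Martin, $44.50, 300, 250
```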

Get any string between 2 string and assign a variable in bash

I cannot get this to work. I only want to get the string between 2 others in bash. Like this:
FOUND=$(echo "If show <start>THIS WORK<end> then it work" | **the magic**)
echo $FOUND
It seems so simple...
sed -n 's/.*<start>\(.*\)<end>.*/\1/p'
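In context, that looks like:

```shell
# -n suppresses output; the p flag prints only lines where the substitution matched
FOUND=$(echo "If show <start>THIS WORK<end> then it work" |
  sed -n 's/.*<start>\(.*\)<end>.*/\1/p')
echo "$FOUND"
# THIS WORK
```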
This can be done in bash without any external commands such as awk and sed. When doing a regex match in bash, the results of the match are put into a special array called BASH_REMATCH. The second element of this array contains the match from the first capture group.
data="If show <start>THIS WORK<end> then it work"
regex="<start>(.*)<end>"
[[ $data =~ $regex ]] && found="${BASH_REMATCH[1]}"
echo $found
This can also be done using perl regex in grep (GNU specific):
found=$(grep -Po '(?<=<start>).*(?=<end>)' <<< "If show <start>THIS WORK<end> then it work")
echo "$found"
If you have <start> and <end> in your string then this will work. Set the FS to < and >.
[jaypal:~/Temp] FOUND=$(echo "If show <start>THIS WORK<end> then it work" |
awk -v FS="[<>]" '{print $3}')
[jaypal:~/Temp] echo $FOUND
THIS WORK

Linux bash - break a files into 2-word terms

I have put together this one-liner that prints all the words in a file on different lines:
sed -e 's/[^a-zA-Z]/\n/g' test_input | grep -v "^$"
If test_input contains "My bike is fast and clean", the one-liner's output will be:
My
bike
is
fast
and
clean
What I would need now is a different version that prints all the 2-word terms in the text, like this (still with the Bash):
My bike
bike is
is fast
fast and
and clean
Would you know how to do it?
Pipe your word file to this script's standard input.
#!/bin/bash
last_word=""
while read -r word
do
if [ "$last_word" != "" ] ; then
echo "$last_word $word"
fi
last_word=$word
done
This also works:
paste <(head -n -1 test.dat) <(tail -n +2 test.dat)
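With the sample words in test.dat (paste joins with a tab by default, so -d' ' is added here for a space; head -n -1 requires GNU coreutils):

```shell
printf '%s\n' My bike is fast and clean > test.dat
# Pair each line with the next by offsetting the file against itself
paste -d' ' <(head -n -1 test.dat) <(tail -n +2 test.dat)
# My bike
# bike is
# is fast
# fast and
# and clean
```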
use awk for this, no need anything else
$ echo "My bike is fast and clean" | awk '{for(i=1;i<NF;i++){printf "%s %s\n",$i,$(i+1) } }'
My bike
bike is
is fast
fast and
and clean
This probably requires GNU sed and there's probably a simpler way:
sed 's/[[:blank:]]*\<\(\w\+\)\>/\1 \1\n/g; s/[^ ]* \([^\n]*\)\n\([^ ]*\)/\1 \2\n/g; s/ \n//; s/\n[^ ]\+$//' inputfile
to your command add:
| awk '(PREV!="") {printf "%s %s\n", PREV, $1} {PREV=$1}'
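Putting it together with the one-liner from the question (assuming test_input contains the sample sentence):

```shell
echo 'My bike is fast and clean' > test_input
# Split into one word per line, drop blanks, then print each word with its successor
sed -e 's/[^a-zA-Z]/\n/g' test_input | grep -v "^$" |
  awk '(PREV!="") {printf "%s %s\n", PREV, $1} {PREV=$1}'
# My bike
# bike is
# is fast
# fast and
# and clean
```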
