Key Matching using shell - bash

I wanted to see the different types of answers you come up with for the problem below. I am curious to see it solved entirely with arrays or with any other matching technique (if there is one).
Here is the problem: keeping Name as the key, print each name's phone numbers on a single line.
$cat input.txt
Name1, Phone1
Name2, Phone2
Name3, Phone1
Name4, Phone5
Name1, Phone2
Name2, Phone1
Name4, Phone1
Output:
$cat output.txt
Name1,Phone1,Phone2
Name2,Phone2,Phone1
Name3,Phone1
Name4,Phone5,Phone1
I solved the problem, but I wanted to see other techniques, perhaps one more effective than mine. I am not an expert in shell; I am still at a beginner level. My code is below:
$cat keyMatchingfunction.sh
while read LINE; do
    var1=$(echo "$LINE" | awk -F, '{ print $1 }')
    matching_line=$(grep "$var1" output.txt | wc -l)
    if [[ $matching_line -eq 0 ]]; then
        echo "$LINE" >> output.txt
    else
        echo "$LINE is already present in output.txt"
        line_no=$(grep -n "$var1" output.txt | cut -d: -f1)
        keymatching=$(echo "$LINE" | awk -F, '{ print $2 }')
        sed -i "$line_no s/$/,$keymatching/" output.txt
    fi
done < input.txt

Try this:
awk -F', ' '{a[$1]=a[$1]","$2}END{for(i in a) print i a[i]}' input.txt
Output:
Name1,Phone1,Phone2
Name2,Phone2,Phone1
Name3,Phone1
Name4,Phone5,Phone1
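A note on how that one-liner works: it collects each phone number under its name in an awk associative array and prints everything in the END block. Since awk's for (i in a) loop does not guarantee any particular order, pipe the result to sort if the output order matters. A commented sketch of the same logic:

awk -F', ' '
    { a[$1] = a[$1] "," $2 }            # append ",PhoneN" to the entry for this Name
    END { for (i in a) print i a[i] }   # print each Name followed by its collected phones
' input.txt | sort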

With bash and sort:
#!/bin/bash
declare -A array # define associative array
# read file input.txt to array
while IFS=", " read -r line number; do
array["$line"]+=",$number"
done < input.txt
# print array
for i in "${!array[#]}"; do
echo "$i${array[$i]}"
done | sort
Output:
Name1,Phone1,Phone2
Name2,Phone2,Phone1
Name3,Phone1
Name4,Phone5,Phone1

Related

Adding data to line in CSV if value exists in external file

Here is my sample data:
1,32425,New Zealand,number,21004
1,32425,New Zealand,number,20522
1,32434,Australia,number,1542
1,32434,Australia,number,986
1,32434,Fiji,number,1
Here is my expected output:
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes
Basically I am trying to append Yes or No based on whether field 3 is contained in an external file. Here is what I have currently, but as I understand it, grep is eating all the stdin in the while loop, so I am only getting No added to the end of each line, as the first value is not contained in the external file.
while IFS=, read -r type id country number volume
do
if grep $country externalfile.csv
then
echo "${country}"
sed 's/$/,Yes/' >> file2.csv
else
echo "${country}"
sed 's/$/,No/' >> file2.csv
fi
done < file1.csv
I added the echo "${country}" as I was trying to troubleshoot and that's how I discovered it was only parsing the first line.
Assuming there are no headers -
awk -F, 'NR==FNR{lookup[$1]=$1; next;}
{ if ( lookup[$3] == $3 ) { print $0 ",Yes" } else { print $0 ",No" } }
' externalfile.csv file2.csv
This will parse both files in one pass.
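The NR==FNR idiom works because FNR restarts at 1 for each input file while NR keeps counting, so the first block only runs while awk is reading externalfile.csv. A commented sketch of the same idea, using the in operator instead of the value comparison:

awk -F, '
    NR == FNR { lookup[$1]; next }                   # first file: remember every country
    { print $0 ($3 in lookup ? ",Yes" : ",No") }     # second file: append Yes or No
' externalfile.csv file2.csv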
If you just prefer to do it in pure bash,
declare -A lookup
while read c; do lookup["$c"]="$c"; done < externalfile.csv
declare -p lookup # this is just to show you what my example loaded
declare -A lookup='([USA]="USA" [Fiji]="Fiji" )'
while IFS=, read a b c d; do
[[ -n "${lookup[$c]}" ]] && echo "$a,$b,$c,$d,Yes" || echo "$a,$b,$c,$d,No"
done < file2.csv
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes
No grep needed.
awk -F, -v OFS=, 'NR == FNR { ++a[$1]; next } { $(++NF) = $3 in a ? "Yes" : "No" } 1' externalfile.csv file2.csv
Try this:
while read -r line
do
country=`echo $line | cut -d',' -f3`
if grep "$country" externalfile.csv
then
echo "$line,Yes" >> file2.csv
else
echo "$line,No" >> file2.csv
fi
done < test.txt
You need to put $country inside double quotes, because some countries contain more than one word, for example New Zealand. You can also set the country variable more easily with the cut command.
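To illustrate the quoting point (a hypothetical run): without the quotes the shell word-splits the value, so grep receives two arguments instead of one pattern:

country="New Zealand"
grep $country externalfile.csv    # runs: grep New Zealand externalfile.csv
                                  # "New" is the pattern, "Zealand" is taken as a file name
grep "$country" externalfile.csv  # runs: grep 'New Zealand' externalfile.csv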

Comparing Columns in Different CSV then Print Non-Matches

I'm a scripting newbie and am looking for help in building a BASH script to compare different columns in different CSV documents and then print the non-matches. I've included an example below.
File 1
Employee ID Number,Last Name,First Name,Preferred Name,Email Address
File 2
Employee Name,Email Address
I want to compare the email address column in both files. If File 1 does not contain an email address found in File 2, I want to output that line from File 2 to a new file.
Thanks in advance!
This is how I would do it:
#!/bin/bash
#
>output.txt
# Read file2.txt, line per line...
while read -r line2
do
    # Extract the email from the line
    email2=$(echo "$line2" | cut -d',' -f2)
    # Verify whether the email is in file1
    if [ "$(grep -c "$email2" file1.txt)" -eq 0 ]
    then
        # It is not, so output the line from file2 to the output file
        echo "$line2" >>output.txt
    fi
done < file2.txt
Email addresses are case-insensitive, which should be taken into account when comparing them. This version uses a little awk to handle the case conversion and to select the last field on each line ($NF):
#!/bin/bash
our_addresses=( $(awk -F, '{print tolower($NF)}' file1) )
while read -r line; do
this_address=$(awk -F, '{print tolower($NF)}' <<< "$line")
if [[ ! " ${our_addresses[#]} " =~ " $this_address " ]]; then
echo "$line"
fi
done < file2

Merge two csv files if Id columns match

I have the following:
file1.csv
"Id","clientName1","clientName2"
file2.csv
"Id","Name1","Name2"
I want to read file1 sequentially. For each record, I want to check if there is a matching Id in file2. There may be more than one match. For each match, I want to append Name1, Name2 to the end of the record of file1.csv
So, possible result, if a record has more than one match in file2:
"Id","clientName1","clientName2","Name1","Name2","Name1","Name2"
A solution using join and a GNU sed regex:
join -t , -a 1 file[12].csv | sed -r '$!N;/^(.*,)(.*)\n\1/!P;s//\n\1\2,/;D'
This assumes that both file1.csv and file2.csv are sorted by id and have no header row (if not, see the sort sketch after the example output below).
file1.csv
1,c11,c12
2,c21,c22
3,c31,c32
file2.csv
1,n11,n12
1,n21,n22
1,n31,n32
2,n41,n42
gives a result of
1,c11,c12,n11,n12,n21,n22,n31,n32
2,c21,c22,n41,n42
3,c31,c32
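If your real files are not already in that shape, pre-sorting both files on the first field would satisfy the assumption (a sketch; adjust the field separator if your data differs):

sort -t, -k1,1 file1.csv -o file1.csv
sort -t, -k1,1 file2.csv -o file2.csv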
UPDATE
In case file1.csv contains duplicate ids and rows with varying numbers of fields, I would suggest a pre-processing step to make sure file1.csv is clean before joining it with file2.csv:
awk -F, '{for(i=2;i<=NF;i++) print $1 FS $i}' file1.csv |\
sort -u |\
sed -r '$!N;/^(.*,)(.*)\n\1/!P;s//\n\1\2,/;D'
The first awk process splits all the data into (id, name) pairs.
sort -u sorts the pairs and removes duplicates.
The last sed process merges all pairs with the same id into a single row.
input
1,c11,c12
1,c12,c14,c13
1,c15,c12
2,c21,c22
output
1,c11,c12,c13,c14,c15
2,c21,c22
I'm afraid bash may not be the most efficient solution, but the following bash script would work:
#!/bin/bash
declare -A id_hash
while read line; do
id=$(echo $line | cut -d ',' -f 1)
name=$(echo $line | cut -d ',' -f 2-)
if [ -z "${id_hash[$id]}" ]; then
id_hash[$id]=$name
else
id_hash[$id]=${id_hash[$id]},$name
fi
done < file1.csv
while read line; do
id=$(echo $line | cut -d ',' -f 1)
name=$(echo $line | cut -d ',' -f 2-)
if [ -z "${id_hash[$id]}" ]; then
id_hash[$id]=$name
else
id_hash[$id]=${id_hash[$id]},$name
fi
done < file2.csv
for id in "${!id_hash[@]}"; do
echo "$id,${id_hash[$id]}"
done
Thanks to all but this has been completed. The code I wrote is below:
#!/bin/bash
echo
echo 'Merging files into one'
IFS=","
while read id lname fname dnaid status type program startdt enddt ref email dob age add1 add2 city postal phone1 phone2
do
var="$dnaid,$lname,$fname,$status,$type,$program,$startdt,$enddt,$ref,$email,$dob,$age,$add1,$add2,$city,$postal,$phone1,$phone2"
while read id2 cwlname cwfname
do
if [ "$id" == "$id2" ]
then
var="$var,$cwlname,$cwfname"
fi
done < file2.csv
echo "$var" >> /root/scijoinedfile.csv
done < file1.csv
echo
echo "Merging completed"
In response to the OP's clarification in the comments, here is a revised version of the single awk command. It merges correctly even when there are duplicated Ids in file1, file2, or both, and when rows have different numbers of fields (a commented breakdown follows the example output below).
awk -F',' '{one=$1;$1="";a[one]=a[one]$0} END{for (i in a) print i""a[i]}' OFS=, file[12]
For the inputs:
file1
"Id1","clientN1","clientN2"
"Id2","Name3","Name4"
"Id3","client00","client01","client02"
"Id1","client1","client2","client3"
file2
"Id1","Name1","Name2"
"Id1","Name3","Name4"
"Id2","Name0","Name1"
"Id2","Name00","Name11","Name22"
The output is merged file1 and file2 on same IDs:
"Id1","clientN1","clientN2","client1","client2","client3","Name1","Name2","Name3","Name4"
"Id2","Name3","Name4","Name0","Name1","Name00","Name11","Name22"
"Id3","client00","client01","client02"

How to pass filename through variable to be read it by awk

Good day,
I was wondering how to pass a filename to awk as a variable, so that awk reads that file.
So far I have done:
echo file1 > Aenumerar
echo file2 >> Aenumerar
echo file3 >> Aenumerar
AE=`grep -c '' Aenumerar`
r=1
while [ $r -le $AE ]; do
lista=`awk "NR==$r {print $0}" Aenumerar`
AEList=`grep -c '' $lista`
s=1
while [ $s -le $AEList ]; do
word=`awk -v var=$s 'NR==var {print $1}' $lista`
echo $word
let "s = s + 1"
done
let "r = r + 1"
done
Thanks so much in advance for any clue, or for another simple way to do it from the bash command line.
Instead of:
awk "NR==$r {print $0}" Aenumerar
You need to use:
awk -v r="$r" 'NR==r' Aenumerar
Judging by what you've posted, you don't actually need all the NR stuff; you can replace your whole script with this:
while IFS= read -r lista ; do
awk '{print $1}' "$lista"
done < Aenumerar
(This will print the first field of each line in each of file1, file2, file3. I think that's what you're trying to do?)

Setting variables in shell script by running commands

>cat /tmp/list1
john
jack
>cat /tmp/list2
smith
taylor
It is guaranteed that list1 and list2 will have an equal number of lines.
f(){
i=1
while read line
do
var1=`sed -n '$ip' /tmp/list1`
var2=`sed -n '$ip' /tmp/list2`
echo $i,$var1,$var2
i=`expr $i+1`
echo $i,$var1,$var2
done < $INFILE
}
So the output of f() should be:
1,john,smith
2,jack,taylor
But I am getting:
1,p,p
1+1,p,p
If I replace the following:
var1=`sed -n '$ip' /tmp/list1`
var2=`sed -n '$ip' /tmp/list2`
with this:
var1=`head -$i /tmp/vip_list|tail -1`
var2=`head -$i /tmp/lb_list|tail -1`
Then the output is:
1,john,smith
1,john,smith
If you can use the paste and awk commands, you can achieve the same with a one-liner:
paste -d, /tmp/list1 /tmp/list2 | awk '{print NR "," $0}'
Replace the while script with this line :)
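For the sample lists above, the paste step alone joins the files line by line, and awk then prefixes each joined line with its line number:

>paste -d, /tmp/list1 /tmp/list2
john,smith
jack,taylor
>paste -d, /tmp/list1 /tmp/list2 | awk '{print NR "," $0}'
1,john,smith
2,jack,taylor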
The $ip is the problem there: the shell takes ip as the variable name. You should use ${i}p instead (in double quotes), letting the shell know that the variable is i, not ip. Your code should look like:
var1=`sed -n "${i}p" /tmp/list1`
var2=`sed -n "${i}p" /tmp/list2`
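Putting the fixes together, the loop from the question would become something like this (a sketch; it assumes $INFILE has the same number of lines as the two lists, as the question guarantees):

f(){
i=1
while read -r line
do
var1=`sed -n "${i}p" /tmp/list1`   # double quotes so ${i} is expanded; "p" prints line i
var2=`sed -n "${i}p" /tmp/list2`
echo "$i,$var1,$var2"
i=`expr $i + 1`                    # spaces around + so expr actually adds
done < "$INFILE"
}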
