I have the following:
file1.csv
"Id","clientName1","clientName2"
file2.csv
"Id","Name1","Name2"
I want to read file1 sequentially. For each record, I want to check whether there is a matching Id in file2. There may be more than one match. For each match, I want to append Name1 and Name2 to the end of the corresponding record of file1.csv.
So, a possible result, if a record has more than one match in file2:
"Id","clientName1","clientName2","Name1","Name2","Name1","Name2"
A regex solution using join and GNU sed:
join -t , -a 1 file[12].csv | sed -r '$!N;/^(.*,)(.*)\n\1/!P;s//\n\1\2,/;D'
This assumes that both file1.csv and file2.csv are sorted by Id and have no header line.
file1.csv
1,c11,c12
2,c21,c22
3,c31,c32
file2.csv
1,n11,n12
1,n21,n22
1,n31,n32
2,n41,n42
gives a result of
1,c11,c12,n11,n12,n21,n22,n31,n32
2,c21,c22,n41,n42
3,c31,c32
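If the real files still have their header row and are not yet sorted, a small pre-processing step could be run first (a sketch; the sorted_ file names are only illustrative):
# strip the header line and sort each file by Id before joining
for f in file1.csv file2.csv; do
    tail -n +2 "$f" | sort -t, -k1,1 > "sorted_$f"
done
join -t , -a 1 sorted_file1.csv sorted_file2.csv |
    sed -r '$!N;/^(.*,)(.*)\n\1/!P;s//\n\1\2,/;D'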
UPDATE
In case file1.csv contains duplicate ids or rows with varying numbers of fields, I would suggest a pre-processing step to clean it up before joining it with file2.csv:
awk -F, '{for(i=2;i<=NF;i++) print $1 FS $i}' file1.csv |\
sort -u |\
sed -r '$!N;/^(.*,)(.*)\n\1/!P;s//\n\1\2,/;D'
The first awk step splits all the data into (id, name) pairs.
sort -u sorts the pairs and removes duplicates.
The final sed step merges all pairs with the same id into a single row.
input
1,c11,c12
1,c12,c14,c13
1,c15,c12
2,c21,c22
output
1,c11,c12,c13,c14,c15
2,c21,c22
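For reference, the intermediate (id, name) pairs coming out of the awk and sort -u stages for this input are:
1,c11
1,c12
1,c13
1,c14
1,c15
2,c21
2,c22
and the final sed stage folds the pairs sharing an id into the single rows shown above.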
I'm afraid bash may not be the most efficient solution, but the following bash script would work:
#!/bin/bash
declare -A id_hash   # maps id -> accumulated comma-separated names

# collect names from file1.csv
while IFS= read -r line; do
    id=$(echo "$line" | cut -d ',' -f 1)
    name=$(echo "$line" | cut -d ',' -f 2-)
    if [ -z "${id_hash[$id]}" ]; then
        id_hash[$id]=$name
    else
        id_hash[$id]=${id_hash[$id]},$name
    fi
done < file1.csv

# append matching names from file2.csv
while IFS= read -r line; do
    id=$(echo "$line" | cut -d ',' -f 1)
    name=$(echo "$line" | cut -d ',' -f 2-)
    if [ -z "${id_hash[$id]}" ]; then
        id_hash[$id]=$name
    else
        id_hash[$id]=${id_hash[$id]},$name
    fi
done < file2.csv

for id in "${!id_hash[@]}"; do
    echo "$id,${id_hash[$id]}"
done
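Note that the associative array (declare -A) requires bash 4 or newer. Assuming the script is saved as merge.sh (the name is just illustrative), it could be run as:
bash merge.sh > merged.csv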
Thanks to all, but this has been completed. The code I wrote is below:
#!/bin/bash
echo
echo 'Merging files into one'
IFS=","
while read id lname fname dnaid status type program startdt enddt ref email dob age add1 add2 city postal phone1 phone2
do
var="$dnaid,$lname,$fname,$status,$type,$program,$startdt,$enddt,$ref,$email,$dob,$age,$add1,$add2,$city,$postal,$phone1,$phone2"
while read id2 cwlname cwfname
do
if [ "$id" == "$id2" ]
then
var="$var,$cwlname,$cwfname"
fi
done < file2.csv
echo "$var" >> /root/scijoinedfile.csv
done < file1.csv
echo
echo "Merging completed"
In response to the OP's clarification in the comments, here is a revised version of the single awk command. It merges rows even when there are duplicated IDs in file1, in file2, or in both, and even when the rows have different numbers of fields.
awk -F',' '{one=$1;$1="";a[one]=a[one]$0} END{for (i in a) print i""a[i]}' OFS=, file[12]
For the inputs:
file1
"Id1","clientN1","clientN2"
"Id2","Name3","Name4"
"Id3","client00","client01","client02"
"Id1","client1","client2","client3"
file2
"Id1","Name1","Name2"
"Id1","Name3","Name4"
"Id2","Name0","Name1"
"Id2","Name00","Name11","Name22"
The output is file1 and file2 merged on matching IDs:
"Id1","clientN1","clientN2","client1","client2","client3","Name1","Name2","Name3","Name4"
"Id2","Name3","Name4","Name0","Name1","Name00","Name11","Name22"
"Id3","client00","client01","client02"
I need to read a file that has lines like
user=username1
pass=password1
How can I read multiple lines like this into separate variables like username and password?
Would I use awk or grep? I have found ways to read lines into variables with grep but would I need to read the file for each individual item?
The end result is to use these variables to access a database via the command line. So I need to be able to read, store and use these values in other commands.
If the process which generates the file is trusted and the file already uses shell syntax, just source it:
. ./file
Otherwise, the file can be pre-processed first to add quotes:
perl -ne 'if (/^([A-Za-z_]\w*)=(.*)/) {$k=$1;$v=$2;$v=~s/\x27/\x27\\\x27\x27/g;print "$k=\x27$v\x27\n";}' <file >file2
. ./file2
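For example (the values here are hypothetical), an input line such as
pass=pa'ss
would be rewritten in file2 as
pass='pa'\''ss'
so the embedded single quote survives sourcing.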
If you want to use awk then
Input
$ cat file
user=username1
pass=password1
Reading
$ user=$(awk -F= '$1=="user"{print $2;exit}' file)
$ pass=$(awk -F= '$1=="pass"{print $2;exit}' file)
Output
$ echo $user
username1
$ echo $pass
password1
You could use a loop for your file perhaps, but this is probably the functionality you're looking for.
$ echo 'user=username1' | awk -F= '{print $2}'
username1
Using the -F flag sets the delimiter to = and we select the 2nd item from the row.
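If you do want a loop rather than one awk call per key, a minimal sketch (assuming the same file and key=value format as above) could be:
while IFS='=' read -r key value; do
    case "$key" in
        user) user=$value ;;
        pass) pass=$value ;;
    esac
done < file
echo "user is $user, pass is $pass"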
file.txt:
user=username1
pass=password1
user=username2
pass=password2
user=username3
pass=password3
To avoid reading the file file.txt several times:
#!/usr/bin/env bash
func () {
echo "user:$1 pass:$2"
}
i=0
while IFS='' read -r line; do
if [ $i -eq 0 ]; then
i=1
user=$(echo ${line} | cut -f2 -d'=')
else
i=0
pass=$(echo ${line} | cut -f2 -d'=')
func "$user" "$pass"
fi
done < file.txt
Output:
user:username1 pass:password1
user:username2 pass:password2
user:username3 pass:password3
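An alternative sketch that reads two lines per iteration; it relies on the same assumption that user= and pass= lines strictly alternate in file.txt:
#!/usr/bin/env bash
while IFS='=' read -r _ user && IFS='=' read -r _ pass; do
    echo "user:$user pass:$pass"
done < file.txt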
I have a problem creating a script that reads specific values from all the files in a folder.
I have a number of email files in a directory, and I need to extract two specific values from each file.
After that, I have to put them into a new file that looks like this:
--------------
To: value1
value2
--------------
This is what I want to do, but I don't know how to create the script:
# I am putting the name of the files into a temp file
ls -l | awk '{print $9 }' > tmpfile
# used for the name of a file
date=`date +"%T"`
# The first specific value from file (phone number)
var1=`cat tmpfile | grep "To: 0" | awk '{print $2 }' | cut -b -10 `
# The second specific value from file(subject)
var2=`cat file | grep Subject | awk '{print $2$3$4$5$6$7$8$9$10 }'`
# Put the first value in a new file on the first row
echo "To: 4"$var1"" > sms-$date
# Put the second value in the same file on the second row
echo ""$var2"" >>sms-$date
.......
and do the same for every file in the directory
I tried using while and for loops but I couldn't finish the script.
Thank You
I've made a few changes to your script, hopefully they will be useful to you:
#!/bin/bash
for file in *; do
    var1=$(awk '/To: 0/ {print substr($2,1,10)}' "$file")
    var2=$(awk '/Subject/ {for (i=2; i<=10; ++i) s=s$i; print s}' "$file")
    stamp=$(date +"%T")
    outfile="sms-$stamp"
    i=0
    while [ -f "$outfile" ]; do outfile="sms-$stamp-"$((i++)); done
    echo "To: 4$var1" > "$outfile"
    echo "$var2" >> "$outfile"
done
The for loop just goes through every file in the folder that you run the script from.
I have added an additional suffix $i to the end of the file name. If no file with the same timestamp already exists, the file is created without the suffix. Otherwise the value of $i keeps increasing until there is no file with the same name.
I'm using $( ) rather than backticks, this is just a personal preference but it can be clearer in my opinion, especially when there are other quotes about.
There's not usually any need to pipe the output of grep to awk. You can do the search in awk using the / / syntax.
I have removed the cut -b -10 and replaced it with substr($2, 1, 10), which prints the first 10 characters of column 2.
It's not much shorter but I used a loop rather than the $2$3..., I think it looks a bit neater.
There's no need for all the extra " in the two output lines.
I suggest trying the following:
#!/bin/sh
RESULT_FILE=sms-`date +"%T"`
DIR=.
fgrep -l 'To: 0' "$DIR"/* | while read FILE; do
var1=`fgrep 'To: 0' "$FILE" | awk '{print $2 }' | cut -b -10`
var2=`fgrep 'Subject' "$FILE" | awk '{print $2$3$4$5$6$7$8$9$10 }'`
echo "To: 4$var1" >>"$RESULT_FIL"
echo "$var2" >>"$RESULT_FIL"
done
Here is a small script I made that asks the user for a pattern to search for in a file and displays the required number of lines from that file wherever the pattern is found. At the moment it searches line-wise, as grep normally does, but if the pattern occurs twice on the same line, I want that line printed twice. Hope that makes sense.
#!/bin/sh
cat /dev/null>copy.txt
echo "Please enter the sentence you want to search:"
read "inputVar"
echo "Please enter the name of the file in which you want to search:"
read "inputFileName"
echo "Please enter the number of lines you want to copy:"
read "inputLineNumber"
[[-z "$inputLineNumber"]] || inputLineNumber=20
cat /dev/null > copy.txt
for N in `grep -n "$inputVar" "$inputFileName" | cut -d ":" -f1`
do
LIMIT=`expr $N + $inputLineNumber`
sed -n $N,${LIMIT}p $inputFileName >> copy.txt
echo "-----------------------" >> copy.txt
done
cat copy.txt
As I understand it, the task is to count the number of pattern occurrences in each line. It can be done like so:
count=$((`echo "$line" | sed -e "s|$pattern|\n|g" | wc -l` - 1))
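If your grep supports -o (GNU grep does), the same per-line count can be obtained without sed:
count=$(echo "$line" | grep -o "$pattern" | wc -l)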
Suppose you have one file to read. Then the code would be the following:
#!/bin/bash
file=$1
pattern="an."
#reading file line by line
cat -n $file | while read input
do
#storing line to $tmp
tmp=`echo $input | grep "$pattern"`
#counting occurrences count
count=$((`echo "$tmp" | sed -e "s|$pattern|\n|g" | wc -l` - 1))
#printing $tmp line $count times
for i in `seq 1 $count`
do
echo $tmp
done
done
I checked this for the pattern "an." and this input:
I pass here an example of many 'an' letters
an
ananas
an-an-as
Output is:
$ ./test.sh input
1 I pass here an example of many 'an' letters
1 I pass here an example of many 'an' letters
1 I pass here an example of many 'an' letters
3 ananas
4 an-an-as
4 an-an-as
Adapt this to your needs.
How about using awk?
Assume the pattern you are searching for is in variable $pattern and the file you are checking is $file
Then
count=`awk 'BEGIN{n=0}{n+=split($0,a,"'$pattern'")-1}END {print n}' $file`
or, for a single line:
count=`echo $line | awk '{n=split($0,a,"'$pattern'")-1;print n}'`
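Embedding $pattern directly inside the awk program breaks if the pattern itself contains quotes; a slightly safer variant (a sketch) passes it in with -v instead:
count=$(awk -v pat="$pattern" '{ n += split($0, a, pat) - 1 } END { print n }' "$file")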