Linux: Appending values into files, to the end of particular lines, and at the bottom of the file if there is no "key" - bash

I have one file, file1, that has values like so:
key1|value1|
key2|value2|
key3|value3|
I have another file, file2, that has key-based values I would like to add to file1:
key2 value4
key3 value5
key4 value6
I would like to add values to file1 on lines where the "key" matches, and if there is no matching "key" in file1, simply add the new key & value at the bottom:
key1|value1|
key2|value2|value4|
key3|value3|value5|
key4|value6|
It seems like this is something that could be done with 2 calls to awk, but I am not familiar enough with it. I'm also open to using bash or shell commands.
UPDATE
I found this to work:
awk -F'|' 'NR==FNR {split($0,f," "); a[f[1]]=f[2]; next} $1 in a {print $0 a[$1] "|"; delete a[$1]; next} {print} END {for (k in a) print k "|" a[k] "|"}' file2 file1
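Spelled out over multiple lines with comments, the same two-file idiom looks like this (NR==FNR is only true while awk reads the first file named on the command line):

awk -F'|' '
    NR == FNR {                 # true only while reading the first file (file2)
        split($0, f, " ")       # file2 is space-separated: key value
        a[f[1]] = f[2]          # remember the value for each key
        next
    }
    $1 in a {                   # file1 line whose key also appears in file2
        print $0 a[$1] "|"      # append the extra value and a trailing pipe
        delete a[$1]            # so the key is not repeated in END
        next
    }
    { print }                   # no match: pass the line through unchanged
    END {
        for (k in a) print k "|" a[k] "|"   # keys only in file2 go at the bottom
    }
' file2 file1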

The only deviation from the desired output is that keys from file1 that are not in file2 are not known ahead of time, so they are printed at the end to keep things semi-online:
awk -v first=data1.txt -f script.awk data2.txt
BEGIN {
    OLD = FS
    FS = "|"
    while (getline < first)     # preload the pipe-delimited file into table
        table[$1] = $0
    OFS = FS                    # emit pipe-delimited output
    FS = OLD                    # the main input is space-separated
}
!($1 in table) {                # key not seen before: hold it back until END
    queue[$1] = $0
}
$1 in table {                   # known key: append the new values
    id = $1
    gsub(FS, OFS)               # turn space separators into pipes
    sub(/[^|]*\|/, "")          # strip the key, keeping only the values
    print table[id] $0 OFS
    delete table[id]
}
END {
    for (id in table)           # keys that never appeared in the main input
        print table[id]
    for (id in queue) {         # keys only in the main input
        gsub(FS, OFS, queue[id])
        print queue[id] OFS
    }
}
Output:
key2|value2|value4|
key3|value3|value5|
key1|value1|
key4|value6|

This is the LOL answer, ha ha. I basically loop over both files, keeping track of the keys I have handled, and sort at the end. It's silly-ish, and probably not something you would want to use bash for.
declare -a checked
checked=()
file="/tmp/file.txt"
> "${file}"
while IFS= read -r line1; do
    key1=$(echo "$line1" | cut -d'|' -f1)
    # keys that never appear in file2 pass through unchanged
    if ! grep -qi "${key1}" "/tmp/file2.txt"; then
        echo "$line1" >> "${file}"
        continue
    fi
    while IFS= read -r line2; do
        key2=$(echo "$line2" | cut -d' ' -f1)
        # keys that only exist in file2 are appended once, pipe-delimited
        if ! grep -qi "${key2}" "/tmp/file1.txt"; then
            if ! [[ " ${checked[*]} " =~ " ${key2} " ]]; then
                echo "$(echo "$line2" | awk '{print $1"|"$2}')|" >> "${file}"
                checked+=("${key2}")
                continue
            fi
        fi
        # matching keys: tack file2's value onto file1's line
        if [[ "$key2" == "$key1" ]]; then
            echo "${line1}$(echo "$line2" | cut -d' ' -f2-)|" >> "${file}"
            continue
        fi
    done < "/tmp/file2.txt"
done < "/tmp/file1.txt"
sort -t'|' -k1,1 "${file}"
[[ -f "${file}" ]] && rm -f "${file}"
Output:
key1|value1|
key2|value2|value4|
key3|value3|value5|
key4|value6|

Related

Adding data to line in CSV if value exists in external file

Here is my sample data:
1,32425,New Zealand,number,21004
1,32425,New Zealand,number,20522
1,32434,Australia,number,1542
1,32434,Australia,number,986
1,32434,Fiji,number,1
Here is my expected output:
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes
Basically I am trying to append Yes/No based on whether field 3 is contained in an external file. Here is what I have currently, but as I understand it grep is eating all the stdin in the while loop, so I am only getting No added to the end of each line, as the first value is not contained in the external file.
while IFS=, read -r type id country number volume
do
    if grep $country externalfile.csv
    then
        echo "${country}"
        sed 's/$/,Yes/' >> file2.csv
    else
        echo "${country}"
        sed 's/$/,No/' >> file2.csv
    fi
done < file1.csv
I added the echo "${country}" as I was trying to troubleshoot and that's how I discovered it was only parsing the first line.
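As it turns out, the command swallowing the input is actually sed, not grep: called with no file argument, each sed reads the loop's stdin (file1.csv) and consumes everything after the current line. A minimal sketch of the corrected loop, assuming the five-field layout above:

while IFS=, read -r type id country number volume
do
    if grep -q "$country" externalfile.csv    # -q: test silently, reads only the file
    then
        echo "$type,$id,$country,$number,$volume,Yes" >> file2.csv
    else
        echo "$type,$id,$country,$number,$volume,No" >> file2.csv
    fi
done < file1.csv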
Assuming there are no headers -
awk -F, 'NR==FNR{lookup[$1]=$1; next;}
{ if ( lookup[$3] == $3 ) { print $0 ",Yes" } else { print $0 ",No" } }
' externalfile.csv file2.csv
This will parse both files in one pass.
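For example, assuming externalfile.csv holds one country per line (here just Fiji, to match the expected output above):

$ cat externalfile.csv
Fiji
$ awk -F, 'NR==FNR{lookup[$1]=$1; next;}
  { if ( lookup[$3] == $3 ) { print $0 ",Yes" } else { print $0 ",No" } }
  ' externalfile.csv file2.csv
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes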
If you just prefer to do it in pure bash,
declare -A lookup
while read c; do lookup["$c"]="$c"; done < externalfile.csv
declare -p lookup # this is just to show you what my example loaded
declare -A lookup='([USA]="USA" [Fiji]="Fiji" )'
while IFS=, read a b c d; do
    [[ -n "${lookup[$c]}" ]] && echo "$a,$b,$c,$d,Yes" || echo "$a,$b,$c,$d,No"
done < file2.csv
1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes
No grep needed.
awk -F, -v OFS=, 'NR == FNR { ++a[$1]; next } { $(++NF) = $3 in a ? "Yes" : "No" } 1' externalfile.csv file2.csv
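The same one-liner unpacked with comments (identical logic, just spread out):

awk -F, -v OFS=, '
    NR == FNR { ++a[$1]; next }              # first file: remember each country
    { $(++NF) = $3 in a ? "Yes" : "No" }     # grow the record by one field
    1                                        # the 1 pattern prints the record
' externalfile.csv file2.csv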
Try this:
while read -r line
do
    country=$(echo $line | cut -d',' -f3)
    if grep -q "$country" externalfile.csv
    then
        echo "$line,Yes" >> file2.csv
    else
        echo "$line,No" >> file2.csv
    fi
done < test.txt
You need to put $country inside the quotes because some countries contain more than one word, for example New Zealand. You can also set the country variable more easily using the cut command.
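A quick illustration of why the quotes matter; unquoted, the shell splits the value and grep takes the second word as a file name:

$ country='New Zealand'
$ grep $country externalfile.csv     # actually runs: grep 'New' 'Zealand' externalfile.csv
grep: Zealand: No such file or directory
$ grep "$country" externalfile.csv   # searches for the whole string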

Shell awk - Print a position from variable

Here is my string that needs to be parsed.
line='aaa vvv ccc'
I need to print the values one by one.
no_of_users=$(echo $line| wc -w)
If the no_of_users is greater than 1 then I need to print the values one by one.
aaa
vvv
ccc
I used this script.
if [ $no_of_users -gt 1 ]
then
    for ((n=1;n<=$no_of_users;n++))
    do
        # here is my issue:
        echo 'user:'$n $line | awk -F ' ' -v no="${n}" 'BEGIN { print no }'
    done
fi
In the { print no } part I need to print the value at that position.
You may use this awk:
awk 'NF>1 {OFS="\n"; $1=$1} 1' <<< "$line"
aaa
vvv
ccc
What it does:
NF>1: If the number of fields is greater than 1
OFS="\n": Set output field separator to \n
$1=$1: Force restructure of a record
1: Print a record
1st solution: Within a single awk, you could try the following, where var is an awk variable carrying the value of the shell variable line.
awk -v var="$line" '
BEGIN{
num=split(var,arr," ")
if(num>1){
for(i=1;i<=num;i++){ print arr[i] }
}
}'
Explanation: a detailed breakdown of the above.
awk -v var="$line" ' ##Starting awk program and creating var variable which has line shell variable value in it.
BEGIN{ ##Starting BEGIN section of program from here.
num=split(var,arr," ") ##Splitting var into array arr here. Saving its total length into variable num to check it later.
if(num>1){ ##Checking condition if num is greater than 1 then do following.
for(i=1;i<=num;i++){ print arr[i] } ##Running for loop from i=1 to till value of num here and printing arr value with index i here.
}
}'
2nd solution: one more approach, tested with GNU awk.
echo "$line" | awk -v RS= -v OFS="\n" 'NF>1{$1=$1;print}'
Another option:
if [ $no_of_users -gt 1 ]
then
    for ((n=1;n<=$no_of_users;n++))
    do
        echo 'user:'$n $(echo $line | awk -F ' ' -v x=$n '{print $x}')
    done
fi
You can use grep:
echo $line | grep -o '[a-z][a-z]*'
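Note that the character class above only matches lowercase ASCII letters; a slightly more general sketch that keeps any non-blank token:

echo "$line" | grep -Eo '[^[:space:]]+'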
Also with awk:
awk '{print $1, $2, $3}' OFS='\n' <<< "$line"
aaa
vvv
ccc
the key is setting OFS='\n'
Or a really toughie:
printf "%s\n" $line
(note: $line is unquoted)
printf will consume all words in line after word-splitting is applied, so each word is taken as a separate argument.
Example Use/Output
$ line='aaa vvv ccc'; printf "%s\n" $line
aaa
vvv
ccc
Using bash:
$ line='aaa vvv ccc'
$ [[ $line =~ \ ]] && echo -e ${line// /\\n}
aaa
vvv
ccc
$ line=aaa
$ [[ $line =~ \ ]] && echo -e ${line// /\\n}
$
If you are on another shell:
$ line="foo bar baz" bash -c '[[ $line =~ \ ]] && echo -e ${line// /\\n}'
grep -Eq '[[:space:]]' <<< "$line" && xargs printf "%s\n" <<< $line
Do a silent grep for a space in the variable, if true, print with names on separate lines.
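For example, with the sample variable from the question:

$ line='aaa vvv ccc'
$ grep -Eq '[[:space:]]' <<< "$line" && xargs printf "%s\n" <<< $line
aaa
vvv
ccc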
awk -v OFS='\n' 'NF>1{$1=$1; print}'
e.g.
$ line='aaa vvv ccc'
$ echo "$line" | awk -v OFS='\n' 'NF>1{$1=$1; print}'
aaa
vvv
ccc
$ line='aaa'
$ echo "$line" | awk -v OFS='\n' 'NF>1{$1=$1; print}'
$
another golfed awk variation
$ echo "$line" | awk 'gsub(FS,RS)'
aaa
vvv
ccc
It only prints if there was a substitution, i.e. if the line held more than one word.

To extract records from file based on the latest date and store in a new file

I have a file with different balances for each account, and every day these balances change. What I want to do is extract the balance record of a particular account based on the latest date.
I am following an approach where I prepend the date as the first column of each record using an awk script, taking the date from the file name since the records themselves carry no date. Next I want to sort the records by account number and extract the record with the latest date into another file.
Can anybody help me with this?
So far I have written this code, but I am unable to sort and extract the data into the other file:
#!/usr/bin/ksh
f=mainfile_20151201.dat
s=`echo $f | cut -c 10-17`
echo "$f -> $s"
awk -F "~" 'BEGIN { OFS = "~"; ORS = "\n" ; date='$s' ; IFS = "~"} { $1=date"~"$1 ; print }' mainfile_20151201.dat > tempdate
awk -F "~" 'BEGIN { OFS = "~"; ORS = "\n" ; IFS = "~"} { $1 ; print }' tempdate > newfile
Sample data:
AccountNumber~~0~149038.40000000~149038.4~0.00000000~0.00000000~0.00000000
Please note that the data in the 4th field changes every day.
If your main file is always going to be in the format "something_date" this should do the job ok.
#!/bin/bash
f=mainfile_20151201.dat
s=$(echo $f | cut -d"_" -f2 | cut -d"." -f1)   # 20151201, without the .dat
echo "$f -> $s"
awk -F "~" 'BEGIN { OFS = "~"; date='$s' } { $1 = date "~" $1; print }' "$f" > tempdate
awk -F "~" '{ print }' tempdate > newfile
rm tempdate
sort -u -t~ -k 2 < newfile > newfile.s
# sort unique, arranged by field 2 then field 1 (default action)
> output.txt                                   # sed -i below needs the file to exist
d=$(head -1 newfile.s | cut -d"~" -f1)         # get first date
a=$(head -1 newfile.s | cut -d"~" -f2)         # get first account number
while read line; do
    d2=$(echo "$line" | cut -d"~" -f1)         # get date from line
    a2=$(echo "$line" | cut -d"~" -f2)         # get account from line
    if [[ $a2 == $a ]] && [[ $d2 > $d || $d2 == $d ]]; then
        # same account, but the date is 'bigger' or the same
        sed -i '$ d' output.txt                # remove last line of file
        echo "$line" >> output.txt             # append to file
        a="$a2"                                # set new account for later
        d="$d2"                                # set new date for later
    else
        a="$a2"
        d="$d2"
        echo "$line" >> output.txt
    fi
done < newfile.s
Note: This worked with the sample from your comment but will definitely need to be tweaked for your needs. At any rate, it should be enough to get you going. Hope this helps!
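For comparison, a more compact sketch of the same idea using sort and awk, assuming file names like mainfile_YYYYMMDD.dat and the account number in field 1 of each record:

for f in mainfile_*.dat; do
    d=${f#*_}; d=${d%.dat}                     # pull 20151201 out of the name
    awk -F'~' -v OFS='~' -v d="$d" '{ print d, $0 }' "$f"
done |
sort -t'~' -k2,2 -k1,1n |                      # by account, then by date
awk -F'~' '{ rec[$2] = $0 } END { for (a in rec) print rec[a] }' > latest.dat

Because the input is sorted ascending by date within each account, the last record stored for an account is the newest one.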

Best way to merge two lines with same pattern

I have a text file like below
Input:
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
I am wondering about the best way to merge the two lines into:
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0
With this as the input file:
$ cat file
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
We can get the output you want with:
$ awk -F, -v OFS=, 'NR==1{first=$0;next;} {print first,$6,$7;}' file
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0
This is a more general solution that reads both files, item by item, where items are separated by comma. After the first mismatch, remaining items from the first line are appended to the output, followed by remaining items from the second line.
The most complicated tool this uses is sed. Looking at it again, even sed can be replaced.
#!/bin/bash
inFile="$1"
tmp=$(mktemp -d)
sed -n '1p' <"$inFile" | tr "," "\n" > "$tmp/in1"
sed -n '2p' <"$inFile" | tr "," "\n" > "$tmp/in2"
{ while true; do
    read -r f1 <&3; r1=$?
    read -r f2 <&4; r2=$?
    [ $r1 -ne 0 ] && [ $r2 -ne 0 ] && break      # both lines exhausted
    [ $r1 -ne 0 ] && { echo "$f2"; continue; }   # only line 2 has items left
    [ $r2 -ne 0 ] && { echo "$f1"; continue; }   # only line 1 has items left
    if [ "$f1" == "$f2" ]; then
        echo "$f1"
    else
        # first mismatch: flush the rest of line 1, then the rest of line 2
        while echo "$f1"; do
            read -r f1 <&3 || break
        done
        while echo "$f2"; do
            read -r f2 <&4 || break
        done
    fi
done; } 3<"$tmp/in1" 4<"$tmp/in2" | tr '\n' ',' | sed 's/.$/\n/'
rm -rf "$tmp"
Assuming your input file looks like this:
$ cat in.txt
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
You can then run the script as:
$ ./merge.sh in.txt
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0
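If the field positions are fixed, as in this sample, a hedged one-liner alternative is paste with bash process substitution, keeping line 1 whole and appending fields 6 onward of line 2:

paste -d, <(sed -n '1p' in.txt) <(sed -n '2p' in.txt | cut -d, -f6-)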

how to map one csv file content to second csv file and write it another csv using unix

After writing some unix scripts I managed to get data from different xml files into csv format, and now I am stuck with the following problem.
file1.csv : contains
1,5,6,7,8
2,3,4,5,9
1,6,10,11,12
1,5,11,12
file2.csv : contains
1,Mango,Tuna,Webby,Through,Franky,Sam,Sumo
2,Franky
3,Sam
4,Sumo
5,Mango,Tuna,Webby
6,Tuna,Webby,Through
7,Through,Sam,Sumo
8,Nothing
9,Sam,Sumo
10,Sumo,Mango,Tuna
11,Mango,Tuna,Webby,Through
12,Mango,Tuna,Webby,Through,Franky
The output I want is:
1,5,6,7,8
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Tuna,Webby,Through
Through,Sam,Sumo
Nothing
Common word:None
2,3,4,5,9
Franky
Sam
Sumo
Mango,Tuna,Webby
Sam, Sumo
Common Word:None
1,6,10,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Tuna,Webby,Through
Sumo,Mango,Tuna
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Tuna
1,5,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Mango,Tuna,Webby
I appreciate any help.
Thanks
I got some solution, but it is not complete:
#!/bin/bash
count=1
count_2=1
for i in `cat file1.csv`
do
    echo $i > $count.txt
    cat $count.txt | tr "," "\n" > $count_2.txt
    count=`expr $count + 1`
    count_2=`expr $count_2 + 1`
done
# this code will create separate files for each line in file1.csv
bash file3_search.sh
##########################
file3_search.sh
================
#!/bin/bash
cat file2.csv | sed '/^$/d' | sed 's/[ ]*$//' > trim.txt
dos2unix -q 1.txt 1.txt
dos2unix -q 2.txt 2.txt
dos2unix -q 3.txt 3.txt
echo "1st Combination results"
for i in `cat 1.txt`
do
    cat trim.txt | egrep -w $i
done > Combination1.txt
echo "2nd Combination results"
for i in `cat 2.txt`
do
    cat trim.txt | egrep -w $i
done > Combination2.txt
echo "3rd Combination results"
for i in `cat 3.txt`
do
    cat trim.txt | egrep -w $i
done > Combination3.txt
Guys, I am not good at programming (I am a software tester). Could someone please re-factor my code, and also tell me how to get the common word in those Combination .txt files?
IMHO it works:
for line in $(cat 1.csv); do
    echo $line
    grepline=`echo $line | sed 's/ \+//g;s/,/,|/g;s/^\(.*\)$/^(\1,)/'`
    egrep $grepline 2.csv
    egrep $grepline 2.csv | \
    awk -F "," '
        { for (i=2;i<=NF;i++)
            {s[$i]+=1}
        }
        END { for (key in s)
                {if (s[key]==NR) { tp=tp key "," }
            }
            if (tp!="") {print "Common word(s): " gensub(/,$/,"","g",tp)}
            else {print "Common word: None"}}'
    echo
done
HTH
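Note that gensub() is GNU awk only; a hedged portable variant of the END block strips the trailing comma with sub() instead:

END { for (key in s)
        { if (s[key]==NR) { tp=tp key "," } }
      if (tp!="") { sub(/,$/,"",tp); print "Common word(s): " tp }
      else { print "Common word: None" } }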
Here's an answer for you. It depends on the associative-array capabilities of bash version 4:
IFS=,
declare -a words
# read and store the words in file2
while read line; do
    set -- $line
    n=$1
    shift
    words[$n]="$*"
done < file2.csv
# read file1 and process
while read line; do
    echo "$line"
    set -- $line
    indexes=( "$@" )
    NF=${#indexes[@]}
    declare -A common
    for (( i=0; i<$NF; i++ )); do
        echo "${words[${indexes[$i]}]}"
        set -- ${words[${indexes[$i]}]}
        for word; do
            common[$word]=$(( ${common[$word]} + 1 ))
        done
    done
    printf "Common words: "
    n=0
    for word in "${!common[@]}"; do
        if [[ ${common[$word]} -eq $NF ]]; then
            printf "%s " $word
            (( n++ ))
        fi
    done
    [[ $n -eq 0 ]] && printf "None"
    unset common
    printf "\n\n"
done < file1.csv
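Since declare -A requires bash 4 or newer, a small hedged guard at the top of such a script can fail fast on older shells:

# sketch: abort early if associative arrays are unavailable (bash < 4)
if (( BASH_VERSINFO[0] < 4 )); then
    echo "this script requires bash 4+ for declare -A" >&2
    exit 1
fi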
