Replace a number in one file with the number in another file - bash

I have a problem. I have two files (file1 and file2). Both files contain a number (with different values) that characterizes the same variable from different estimations. In file1 this number1 is, for example, in the row beginning with the name var1, in field $3; in file2 this number2 is in the row beginning with the name var2, in field $2. I want to take number1 from file1 and replace number2 in file2 with it. I tried the following script, but it is not working; in the output nothing is changed compared to the original file2:
#! /bin/bash
Var1=$(cat file1 | grep 'var1' | awk '{printf "%s", $3}' )
Var2=$(cat file2 | grep 'var2' | awk '{printf "%s", $2}' )
cat file2 | awk '{gsub(/'$Var2'/,'$Var1'); print}'
Thanks in advance!
Addition: For example, in file1 I have:
Tomato 2.154 3.789
Apple 1.458 3.578
Orange 2.487 4.045
In file2:
Banana 2.892
Apple 1.687
Mango 2.083
I want to change file2 so that it becomes:
Banana 2.892
Apple 3.578
Mango 2.083

Using this as file1:
var1 junk 101
var2 junk 102
var3 junk 103
And this as file2:
var1 201
var2 202
var3 203
This will extract field 3 from file1 where field 1 is var1:
awk '$1=="var1"{print $3}' file1
101
This will replace field 2 in file2 with x (101) where the first field is var2:
awk -v x=101 '$1=="var2"{$2=x}1' file2
var1 201
var2 101
var3 203
And combining them, you get:
awk -v x=$(awk '$1=="var1"{print $3}' file1) '$1=="var2"{$2=x}1' file2
var1 201
var2 101
var3 203
Assuming you want to overwrite file2 in place, you can do a conditional mv that runs only when the command succeeded:
awk -v x=$(awk '$1=="var1"{print $3}' file1) '$1=="var2"{$2=x}1' file2 > /tmp/a && mv /tmp/a file2
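Applied to the fruit example from the question (a sketch assuming the shared key is the name Apple, so field 3 of the Apple line in file1 replaces field 2 of the Apple line in file2):
awk -v x="$(awk '$1=="Apple"{print $3}' file1)" '$1=="Apple"{$2=x}1' file2 > /tmp/a && mv /tmp/a file2
cat file2
Banana 2.892
Apple 3.578
Mango 2.083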


Comparing the data from one file to another and printing the output

I have three files named File1, File2 and File3. The data in the three files is shown below:
File1:
// Class of "A2" of type "ONE".
// Class of "A3" of type "ONE".
// Class of "D1" of type "TWO".
// Class of "D2" of type "TWO".
// Class of "D3" of type "FOUR".
// Class of "D6" of type "FIVE."
File2:
#CLASS_NAMES = ("one",
"two",
"three",);
#CLASS_LIST_NAMES = ("ONE.A1",
"ONE.A2",
"ONE.A3",
"TWO.D1",
"TWO.D2");
File3:
D3
D4
D5
I need to check whether class "D3" from File1 is present in the #CLASS_LIST_NAMES list of File2 or not.
If it is not present in the #CLASS_LIST_NAMES list of File2, then I need to check whether D3 is present in File3.
If D3 is present in File3, the output should be PASS; if it is present in neither File2 nor File3, the output should be FAIL.
Similarly, I need to check all the classes (A2, A3, D1, D2, ...) from File1 against the #CLASS_LIST_NAMES list of File2; if they are not present in File2, I need to verify against File3 and print PASS or FAIL.
I tried the below code:
#!/bin/bash
sed -n '/#CLASS_LIST_NAMES =/,/)/p' File2
I'm stuck here; can anyone tell me what needs to be done next?
Desired output: since D6 from File1 is not found in either File2 or File3, it should print FAIL. The output should look like this:
Fail: D6 is not found
You can achieve this with grep and awk.
Use GNU grep, which supports the -P (Perl-compatible regex) option:
awk 'NR==FNR{a[$0]; next} !($0 in a){print "Fail: "$0 " is not found"}' <(cat file3 <(grep -Po '(?<=\.)[^"]+' file2)) <(grep -Po '(?<=of ")\w+' file1)
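To see what each process substitution feeds into awk, you can run the two extraction steps on their own (a sketch using the sample files above):
grep -Po '(?<=of ")\w+' file1     # class names from File1: A2 A3 D1 D2 D3 D6, one per line
grep -Po '(?<=\.)[^"]+' file2     # names after the dot in File2: A1 A2 A3 D1 D2, one per line
awk then reports every name from the first list that appears neither in the second list nor in file3.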
If you want to extract only the class names present in the #CLASS_LIST_NAMES statement, use the one below:
awk 'NR==FNR{a[$0]; next} !($0 in a){print "Fail: "$0 " is not found"}' <(cat file3 <(sed -n '/#CLASS_LIST_NAMES/,/;$/p' file2 | grep -Po '(?<=\.)[^"]+')) <(grep -Po '(?<=of ")\w+' file1)
If the number of spaces in file1 is not consistent, you can do the whole check in awk:
# expects the class name in the 4th column; the input format shouldn't change
awk 'NR==FNR{a[$0]; next} {gsub("\"","",$4)} !($4 in a){print "Fail: "$4" is not found"}' <(cat file3 <(sed -n '/#CLASS_LIST_NAMES/,/;$/p' file2 | grep -Po '(?<=\.)[^"]+')) file1
# alternative using FPAT, if the position of the field can change but the class name is always the first double-quoted token
awk 'NR==FNR{a[$0]; next} {gsub("\"","",$1)} !($1 in a){print "Fail: "$1" is not found"}' <(cat file3 <(sed -n '/#CLASS_LIST_NAMES/,/;$/p' file2 | grep -Po '(?<=\.)[^"]+')) FPAT="\"[^ \"]+" file1
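With the sample files above, each of these variants should print a single failure line, since D6 is the only class missing from both File2 and File3:
Fail: D6 is not found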

print the full line of the file if a string matched from another file in unix shell scripting

File1 id.txt
101
102
103
File2 emp_details.txt
101 john USA
103 Alex USA
104 Nike UK
105 phil UK
If an id from id.txt matches the first column of emp_details.txt, then output the full line to a new file matched.txt. If it does not match, output only the id to a new file notmatched.txt.
example:
matched.txt
101 john USA
103 Alex USA
notmatched.txt
102
grep -f f1 f2 > matched
grep -vf <(awk '{print $1}' matched) f1 > not_matched
Explanation:
use file1's lines as patterns to search file2 and store the matching lines in the matched file
use column 1 of the matched file as patterns to search file1 and store the non-matching ids in the not_matched file
-v means "invert the match" in grep
Output :
$ cat matched
101 john USA
103 Alex USA
$ cat not_matched
102
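The same approach with the file names from the question (a sketch; the -w flag is an extra safeguard not in the original answer, so that an id such as 10 cannot match inside 104):
grep -wf id.txt emp_details.txt > matched.txt
grep -vwf <(awk '{print $1}' matched.txt) id.txt > notmatched.txt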
Using awk:
One-liner:
awk 'FNR==NR{ arr[$1]; next }($1 in arr){ print >"matched.txt"; delete arr[$1] }END{for(i in arr)print i >"unmatched.txt"}' file1 file2
More readable:
awk '
FNR==NR{
    arr[$1];
    next
}
($1 in arr){
    print >"matched.txt";
    delete arr[$1]
}
END{
    for(i in arr)
        print i >"unmatched.txt"
}
' file1 file2
Test Results:
$ cat file1
101
102
103
$ cat file2
101 john USA
103 Alex USA
104 Nike UK
105 phil UK
$ awk 'FNR==NR{arr[$1];next }($1 in arr){print >"matched.txt";delete arr[$1]}END{for(i in arr)print i >"unmatched.txt"}' file1 file2
$ cat matched.txt
101 john USA
103 Alex USA
$ cat unmatched.txt
102
Usually we expect you to explain what you have tried and where you are stuck. We usually don't provide complete answers on this site. As it's just a few lines, I hacked up a not-very-efficient version. Simply loop over the id file and use egrep to find the matched and unmatched lines.
#!/bin/bash
while read p; do
    egrep "^$p" emp_details.txt >> matched.txt
done <id.txt

while read p; do
    if ! egrep -q "^$p" emp_details.txt; then
        echo $p >> unmatched.txt
    fi
done <id.txt
Here is another approach, compared to Akshay Hegde's answer: map $1 to the full line ($0) of emp_details.txt in array a.
awk 'NR==FNR{a[$1]=$0;next} {if($1 in a){print a[$1]>>"matched.txt"}else{print $1 >> "unmatched.txt"}}' emp_details.txt id.txt
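Run against the sample files shown above, this should fill the two output files the same way:
$ cat matched.txt
101 john USA
103 Alex USA
$ cat unmatched.txt
102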

How to print columns one after the other in bash?

Are there any better methods to print two or more columns into one column? For example:
input.file
AAA 111
BBB 222
CCC 333
output:
AAA
BBB
CCC
111
222
333
I can only think of:
cut -f1 input.file >output.file;cut -f2 input.file >>output.file
But it's not good if there are many columns, or when I want to pipe the output to other commands like sort.
Any other suggestions? Thank you very much!
With awk:
awk '{ if (maxc < NF) maxc = NF                      # track the largest number of fields
       for (i = 1; i <= NF; i++)                     # append each field to its column buffer
           a[i] = (a[i] != "" ? a[i] RS $i : $i)
     }
     END { for (i = 1; i <= maxc; i++) print a[i] }  # print the columns one after another
' input.file
You can use a GNU awk array of arrays to store all the data and print it later on.
If the number of columns is constant, this works for any number of columns:
gawk '{for (i=1; i<=NF; i++)          # loop over the columns
           data[i][NR]=$i             # store in data[column][line]
      }
      END {for (i=1; i<=NF; i++)      # loop over the columns
               for (j=1; j<=NR; j++)  # loop over the lines
                   print data[i][j]   # print the given field
      }' file
Note that NR stands for the number of records (that is, the number of lines here) and NF stands for the number of fields in the current line; inside the END block it holds the field count of the last line read, which equals the column count when every row has the same number of columns.
If the number of columns changes over rows, then we should use yet another array, in this case to store the number of columns for each row. But in the question I don't see a request for this, so I am leaving it for now.
See a sample with three columns:
$ cat a
AAA 111 123
BBB 222 234
CCC 333 345
$ gawk '{for (i=1; i<=NF; i++) data[i][NR]=$i} END {for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) print data[i][j]}' a
AAA
BBB
CCC
111
222
333
123
234
345
If the number of columns is not constant, using an array to store the number of columns for each row helps to keep track of it:
$ cat sc.wk
{for (i=1; i<=NF; i++)          # store in data[column][line]
     data[i][NR]=$i
 if (NF>maxc) maxc=NF           # track the widest row
 columns[NR]=NF                 # remember each row's width
}
END {for (i=1; i<=maxc; i++)        # loop over the columns
         for (j=1; j<=NR; j++)      # loop over the lines
             print (i<=columns[j] ? data[i][j] : "-")
}
$ cat a
AAA 111 123
BBB 222
CCC 333 345
$ gawk -f sc.wk a
AAA
BBB
CCC
111
222
333
123
-
345
awk '{print $1;list[i++]=$2}END{for(j=0;j<i;j++){print list[j];}}' input.file
Output
AAA
BBB
CCC
111
222
333
A simpler solution, if the order of the values does not matter (this relies on RS being treated as a regular expression, as in GNU awk), would be
awk -v RS="[[:blank:]\t\n]+" '1' input.file
Expects tab as delimiter:
$ cat <(cut -f 1 asd) <(cut -f 2 asd)
AAA
BBB
CCC
111
222
333
Since the order is of no importance:
$ awk 'BEGIN {RS="[ \t\n]+"} 1' file
AAA
111
BBB
222
CCC
333
Ugly, but it works:
for i in {1..2} ; do awk -v p="$i" '{print $p}' input.file ; done
Change the {1..2} to {1..n}, where n is the number of columns in the input file.
Explanation:
We define an awk variable p whose value is the shell loop variable i; i runs from 1 to n, and on each pass we print the i-th column of the file.
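To avoid hard-coding the column count, n can be read from the first line of the file (a small sketch, assuming whitespace-separated columns and the same number of columns on every line):
n=$(awk '{print NF; exit}' input.file)    # number of columns in the first line
for i in $(seq 1 "$n"); do awk -v p="$i" '{print $p}' input.file; done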
This will work for an arbitrary number of space-separated columns (note that sort -u also sorts the values and removes duplicates):
awk '{for (A=1;A<=NF;A++) printf("%s\n",$A);}' input.file | sort -u > output.file
If space is not the separator, let's suppose ":" is the separator:
awk -F: '{for (A=1;A<=NF;A++) printf("%s\n",$A);}' input.file | sort -u > output.file

comparing 2 files and extracting elements from file

I have two files. One has a list of names (only one column) and the second file has three columns: name, phone number, country.
What I want is to extract the data of the people whose names are not present in file1 but only in file2.
#!/bin/bash
for i in `cat file1 `
do
cat file2 | awk '{ if ($1 != "'$i'") {print $1 "\t" $2 "\t" $3 }}'>>NonResp
done
What I get is a weird result with more data than expected.
Kindly help.
You can do this with grep:
grep -v -F -f file1 file2
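Note that -F -f matches the names as fixed strings anywhere in a line of file2; if a name could also occur in another column, adding -w (match whole words only) is a safer variant, an extra option not part of the original answer:
grep -v -w -F -f file1 file2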
awk '{print $1}' file2 | comm -1 -3 file1 - | join file2 -
The files must already be sorted for this to work properly.
Explanation:
=> awk '{print $1}' file2 |
print only the first field of file2 and feed it to the next command (|)
=> comm -1 -3 file1 - |
compare file1 and the output of the last command (-) and suppress lines only in file1 (-1) as well as lines in both files (-3); that leaves only the lines unique to file2, which are fed to the next command (|)
=> join file2 -
join the original file2 and the output from the last command (-) and write out the fields of the matching lines (whitespace between fields is collapsed to a single space, however)
Testcase:
cat <<EOF >file1
alan
bert
cindy
dave
fred
sunny
ted
EOF
cat <<EOF >file2
bert 01 AU
cindy 03 CZ
ginny 05 CN
ted 07 CH
zorro 09 AG
EOF
awk '{print $1}' file2 | comm -1 -3 file1 - | join file2 -
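For this test data, the pipeline should print the two people present only in file2:
ginny 05 CN
zorro 09 AG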
Assuming the field delimiter in file2 is ",":
awk -F, 'FNR==NR{a[$1];next}!($1 in a)' file1 file2
if "," is not the delimiter ,then simply
awk 'FNR==NR{a[$1];next}!($1 in a)' file1 file2
would be sufficient.
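Applied to the whitespace-delimited testcase above (a quick check using the second form):
$ awk 'FNR==NR{a[$1];next}!($1 in a)' file1 file2
ginny 05 CN
zorro 09 AG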

bash process data from two files

file1:
456
445
2323
file2:
433
456
323
I want to get the line-by-line difference of the data in the two files and output it to output.txt, that is:
23
-11
2000
How do I accomplish this? Thank you.
$ paste file1 file2 | awk '{ print $1 - $2 }'
23
-11
2000
Use paste to create the formulae, and use bc to perform the calculations:
paste -d - file1 file2 | bc
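Here paste joins each pair of lines with a literal "-", so every line becomes an expression bc can evaluate. With the sample files the two stages should produce:
$ paste -d - file1 file2
456-433
445-456
2323-323
$ paste -d - file1 file2 | bc
23
-11
2000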
In pure bash, with no external tools:
while read -u 4 line1 && read -u 5 line2; do
    printf '%s\n' "$(( line1 - line2 ))"
done 4<file1 5<file2
This works by opening both files (attaching them to file descriptors 4 and 5), then looping, reading one line from each descriptor per iteration (exiting the loop when either file runs out of lines), and calculating and printing the difference.
You could use paste and awk to operate between columns:
paste -d" " file1 file2 | awk -F" " '{print ($1-$2)}'
Or redirect the output to a file:
paste -d" " file1 file2 | awk -F" " '{print ($1-$2)}' > output.txt
Hope it helps!
