How to grep a Line and update a particular column value - bash

I have to update a particular column value in a file for particular Unique IDs.
My file-name and sample contents are given below:
Names.txt
J017 0001 Amit 10th
J011 2341 Kuldeep 11th
J004 1254 Ramand 12th
I have to update the 4th column value to something. I tried the logic below, but it did not work:
stu="";
for i in `echo "J017, J058 and J107. " |egrep -o '[jJ][0-9]{3}' `
do
stu="$stu|$i ";
awk -v I=$i '/$I/{$4="LEFT";print $0}' Names.txt >tmp
done
egrep -v `echo "$stu" | sed "s/^|//g" ` Names.txt >>tmp
mv tmp Names.txt
The above awk command did not give the result. Please help me to fix the error.

To answer your specific question about why this:
awk -v I=$i '/$I/{$4="LEFT";print $0}'
doesn't work: you don't access awk variables by prefixing them with a "$", just as you don't in C or most other languages (shell being an exception). This is how you would write the above to execute the way you are trying to get it to execute:
awk -v I=$i '$0 ~ I{$4="LEFT";print $0}'
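As a minimal, self-contained sketch of that fix (the sample lines and the J017 value are just illustrative):

```shell
# Pass a shell variable into awk with -v, then use the bare variable
# name (no "$") on the awk side of the pattern.
id="J017"
printf '%s\n' "J017 0001 Amit 10th" "J011 2341 Kuldeep 11th" |
awk -v I="$id" '$0 ~ I {$4="LEFT"; print $0}'
# prints: J017 0001 Amit LEFT
```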
Having said that, your shell script is completely the wrong way to do what you want. Try this instead (uses GNU awk for patsplit() but match()/substr() in other awks would work just as well):
$ cat tst.sh
awk -v ids="J017, J058 and J107. " '
BEGIN{
patsplit(ids,idsA,/[jJ][0-9]{3}/)
for (i=1;i in idsA;i++)
stu = stu (i==1?"^":"|") idsA[i]
stu = stu "$"
}
$1 ~ stu { $4 = "LEFT" }
{ print }
' "$@"
$ ./tst.sh file
J017 0001 Jagdeep LEFT
J011 2341 Kuldeep 11th
J004 1254 Ramand 12th

#!/bin/bash
FILE='Names.txt'
COLUMNS=(J017 J011 J004)
REPLACE='LEFT'
OUT=$(
IFS="|"
awk -v R="$REPLACE" -v E="${COLUMNS[*]}" '$1 ~ E{$4 = R;print $0}' "$FILE"
)
echo "$OUT" > "$FILE"
Run with:
bash script.sh
Input:
J017 0001 Jagdeep 10th
J011 2341 Kuldeep 11th
J004 1254 Ramand 12th
Result:
J017 0001 Jagdeep LEFT
J011 2341 Kuldeep LEFT
J004 1254 Ramand LEFT
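The subshell around IFS="|" is what joins the array into an alternation for awk's ~ operator; a small sketch of just that step (array values copied from above):

```shell
# Build "J017|J011|J004" from the array. The IFS change happens inside
# the $( ) subshell, so the caller's IFS is left untouched.
COLUMNS=(J017 J011 J004)
pattern=$(IFS='|'; echo "${COLUMNS[*]}")
echo "$pattern"
# prints: J017|J011|J004
```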

Names.txt
J017 0001 Jagdeep 10th
J011 2341 Kuldeep 11th
J004 1254 Ramand 12th
awk
awk '/[jJ][0-9][0-9][0-9]/ {$4="LEFT"}1' Names.txt
J017 0001 Jagdeep LEFT
J011 2341 Kuldeep LEFT
J004 1254 Ramand LEFT

This might work for you:
echo "J017, J111 and J004. " |
grep -o "J[0-9]\{3\}" |
awk 'FNR==NR{key[$1];next};$1 in key{$4="LEFT"}1' - Names.txt
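The FNR==NR idiom above works because FNR equals NR only while awk is reading its first input; a runnable sketch with made-up file paths:

```shell
# The first input (a key list) populates the key[] array; for the
# second file, rows whose first field was seen get column 4 rewritten,
# and the trailing 1 prints every row.
printf 'J017\nJ004\n' > keys.txt
printf '%s\n' "J017 0001 Amit 10th" "J011 2341 Kuldeep 11th" > names.txt
awk 'FNR==NR{key[$1]; next} $1 in key{$4="LEFT"} 1' keys.txt names.txt
# prints:
# J017 0001 Amit LEFT
# J011 2341 Kuldeep 11th
```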

Related

Using bash to query a large tab delimited file

I have a list of names and IDs (50 entries)
cat input.txt
name ID
Mike 2000
Mike 20003
Mike 20002
And there is a huge zipped file (13GB)
zcat clients.gz
name ID comment
Mike 2000 foo
Mike 20002 bar
Josh 2000 cake
Josh 20002 _
My expected output is
NR name ID comment
1 Mike 2000 foo
3 Mike 20002 bar
each $1"\t"$2 of clients.gz is a unique identifier. Some entries from input.txt might be missing from clients.gz, so I would like to add the NR column to my output to find out which ones are missing. I would like to use zgrep; awk takes a very long time (because it has to zcat to uncompress the zipped file, I assume?).
I know that zgrep 'Mike\t2000' does not work. I imagine I can fix the NR issue with awk's FNR.
So far I have:
awk -v q="'"
'
NR > 1 {
print "zcat clients.gz | zgrep -w $" q$0q
}' input.txt |
bash > subset.txt
$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ key = $1 FS $2 }
NR == FNR { map[key] = (NR>1 ? NR-1 : "NR"); next }
key in map { print map[key], $0 }
$ zcat clients.gz | awk -f tst.awk input.txt -
NR name ID comment
1 Mike 2000 foo
3 Mike 20002 bar
With GNU awk and bash:
awk 'BEGIN{FS=OFS="\t"}
# process input.txt
NR==FNR{
a[$1,$2]=$1 FS $2
line[$1,$2]=NR-1
next
}
# process <(zcat clients.gz)
{
$4=a[$1,$2]
if(FNR==1)
line[$1,$2]="NR"
if($4!="")
print line[$1,$2],$1,$2,$3
}' input.txt <(zcat clients.gz)
Output:
NR name ID comment
1 Mike 2000 foo
3 Mike 20002 bar
As one line:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1,$2]=$1 FS $2; line[$1,$2]=NR-1; next} {$4=a[$1,$2]; if(FNR==1) line[$1,$2]="NR"; if($4!="")print line[$1,$2],$1,$2,$3}' input.txt <(zcat clients.gz)
See: Joining two files based on two key columns awk and 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
[EDIT]
I've misunderstood where the prepended line numbers come from. Corrected.
Would you try the following:
declare -A num # associates each pattern with its line number
mapfile -t ary < <(tail -n +2 input.txt)
pat=$(IFS='|'; echo "${ary[*]}")
for ((i=0; i<${#ary[@]}; i++)); do num[${ary[i]}]=$((i+1)); done
printf "%s\t%s\t%s\t%s\n" "NR" "name" "ID" "comment"
zgrep -E -w "$pat" clients.gz | while IFS= read -r line; do
printf "%d\t%s\n" "${num[$(cut -f 1-2 <<<"$line")]}" "$line"
done
Output:
NR name ID comment
1 Mike 2000 foo
3 Mike 20002 bar
The second and third lines generate a search pattern such as Mike 2000|Mike 20003|Mike 20002 from input.txt.
The line for ((i=0; i<${#ary[@]}; i++)); do .. creates a map from
each pattern to its line number.
The expression "${num[$(cut -f 1-2 <<<"$line")]}" retrieves the line
number from the 1st and 2nd fields of the output.
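A stripped-down sketch of that associative-array lookup (keys shown space-separated for readability; the script above uses the tab-separated fields from input.txt):

```shell
# Map each key string to its 1-based position, then look one up.
declare -A num
keys=("Mike 2000" "Mike 20003" "Mike 20002")
for ((i=0; i<${#keys[@]}; i++)); do num["${keys[i]}"]=$((i+1)); done
echo "${num["Mike 20002"]}"
# prints: 3
```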
If the performance is still not satisfactory, please consider ripgrep, which is much faster than grep or zgrep.

Sort and join files in AIX

I have two sample files, test1.txt and test2.txt. How do I join them appropriately?
$> cat test1.txt
1 USA
2 CANADA
3 MEXICO
4 BAHAMAS
5 CUBA
$> cat test2.txt
MEXICO Mexico-city
USA Washington
CANADA Ottawa
CUBA Havanna
BAHAMAS Nassau
$> join -j 2 -o '1.1,1.2,2.2' < (sort -k2 test1.txt) < (sort -k1 test2.txt)
ksh: 0403-057 Syntax error: `(' is not expected.
Expected output:
1 USA Washington
2 CANADA Ottawa
3 MEXICO Mexico-city
4 BAHAMAS Nassau
5 CUBA Havanna
The first solution below is bad! When test1.txt is large, you will have too much overhead from calling grep and cut for each line:
while read -r line; do
key=${line#* }
printf "%s %s\n" "${line}" $(grep "${key}" test2.txt | cut -d" " -f2-)
done < test1.txt
You should get your join working or use awk:
awk 'FNR==NR { towns[$1]=$2; next;} $2 in towns {print $0 " " towns[$2];}' test2.txt test1.txt
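If the original join is preferred, one way to sidestep the missing process substitution in AIX ksh is to sort into temporary files first; a sketch (the temp-file names are arbitrary, and the trailing sort -n restores the numeric order of the expected output):

```shell
# ksh rejects <(...), so materialize the sorted inputs instead.
sort -k2 test1.txt > t1.sorted
sort -k1 test2.txt > t2.sorted
join -1 2 -2 1 -o '1.1,1.2,2.2' t1.sorted t2.sorted | sort -n
rm -f t1.sorted t2.sorted
```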

Bind two files by column in bash

When I have two files, such as file A
012
658
458
895
235
and file B
1
2
3
4
5
how could they be joined in bash? The output should just be
1012
2658
3458
4895
5235
Really, I just want to bind by column, like cbind in R.
Assuming both files have the same number of lines, you can use the paste command:
paste --delimiters='' fileB fileA
The default delimiter for the paste command is TAB, so '' makes sure no delimiter is inserted.
Like this maybe:
paste -d'\0' B A
Or, if you like awk:
awk 'FNR==NR{A[FNR]=$0;next} {print $0,A[FNR]}' OFS='' A B
Using pure Bash and no external commands:
while read -u 3 A && read -u 4 B; do
echo "${B}${A}"
done 3< File_A.txt 4< File_B.txt
grep "run complete" *.err | awk -F: '{print $1}'|sort > a
ls ../bam/*bam | grep -v temp | awk -F[/_] '{print $3".err"}' | sort > b
diff <(grep "run complete" *.err | awk -F: '{print $1}'|sort) <(ls ../bam/*bam | grep -v temp | awk -F[/_] '{print $3".err"}' )
paste a b

Find difference in second field, report using first field (awk)

I have 2 (dummy) files
file1.txt
Tom 25
John 27
Bob 22
Justin 37
Nick 19
Max 42
file2.txt
Tom 25
John 40
Bob 22
Justin 37
Nick 19
Max 24
I want to compare the Second field of these files (the numbers). Then If they are different, report using the First field (Names). So the expected output would be the following.
John's age in file1.txt is different from file2.txt
Max's age in file1.txt is different from file2.txt
I don't know if my approach is good, but I first parse the ages into another file and compare them. If they differ, I find the line number where the difference is, then go back to the original file and parse the name of the person from THAT line.
I run the following code in shell.
$ awk '{print $2}' file1.txt > tmp1.txt
$ awk '{print $2}' file2.txt > tmp2.txt
$
$ different=$(diff tmp1.txt tmp2.txt | awk '{$1=""; print $0}')
$
$ if [ "${different}" ]; then
$ #This is to get the line number where the ages are different
$ #so that I can go to THAT line in file1.txt and get the first field.
$ awk 'NR==FNR{a[$0];next}!($0 in a){print FNR}' tmp1.txt tmp2.txt > lineNumber.txt
$ fi
However, I am blocked here. I don't know if my approach is right or whether there's an easier approach.
Thanks a lot
awk 'NR==FNR{a[$1]=$2;next} $2!=a[$1]{print "Age of "$1" is different"}' file1 file2
awk '
NR==FNR{a[$1]=$2;next}
a[$1] != $2 {print $1"\047s age in "ARGV[1]" is different from "ARGV[2]}
' file1.txt file2.txt
If both files list the same names, something like this works:
join file{1,2}.txt | awk '$2 != $3 { print "Age of " $1 " is different" }'

bash process data from two files

file1:
456
445
2323
file2:
433
456
323
I want to get the difference between the data in the two files and output it to output.txt, that is:
23
-11
2000
How do I achieve this? Thank you.
$ paste file1 file2 | awk '{ print $1 - $2 }'
23
-11
2000
Use paste to create the formulae, and use bc to perform the calculations:
paste -d - file1 file2 | bc
In pure bash, with no external tools:
while read -u 4 line1 && read -u 5 line2; do
printf '%s\n' "$(( line1 - line2 ))"
done 4<file1 5<file2
This works by opening both files (attaching them to file descriptors 4 and 5); going into a loop in which we read one line from each descriptor per iteration (exiting the loop if either has no value), and calculate and print the result.
You could use paste and awk to operate between columns:
paste -d" " file1 file2 | awk -F" " '{print ($1-$2)}'
Or even pipe to a file:
paste -d" " file1 file2 | awk -F" " '{print ($1-$2)}' > output.txt
Hope it helps!
