Linux - loop through each element on each line - for-loop

I have a text file with the following information:
cat test.txt
a,e,c,d,e,f,g,h
d,A,e,f,g,h
I wish to iterate through each line and then for each line print the index of all the characters different from e. So the ideal output would be either with a tab seperator or comma seperator
1 3 4 6 7 8
1 2 4 5 6
or
1,3,4,6,7,8
1,2,4,5,6
I have managed to iterate through each line and print the index, but the results are printed to the same line and not seperated.
while read line;do echo "$line" | awk -F, -v ORS=' ' '{for(i=1;i<=NF;i++) if($i!="e") {print i}}' ;done<test.txt
With the result being
1 3 4 6 7 8 1 2 4 5 6
If I do it only using awk
awk -F, -v ORS=' ' '{for(i=1;i<=NF;i++) if($i!="e") {print i}}'
I get the same output.
Could anyone help me with this specific issue with seperating the lines?

If you don't mind some trailing whitespace, you can just do:
while read line;do echo "$line" | awk -F, '{for(i=1;i<=NF;i++) if($i!="e") {printf i " "}; print ""}' ;done<test.txt
but it would be more typical to omit the while loop and do:
awk -F, '{for(i=1;i<=NF;i++) if($i!="e") {printf i " "}; print ""}' <test.txt
You can avoid the trailing whitespace with the slightly cryptic:
awk -F, '{m=0; for(i=1;i<=NF;i++) if($i!="e") {printf "%c%d", m++ ? " " : "", i }; print ""}' <test.txt

Related

Finding difference of values between corresponding fields in two CSV files

I have been trying to find difference of values between corresponding fields in two CSV files
$ cat f1.csv
A,B,25,35,50
C,D,30,40,36
$
$ cat f2.csv
E,F,20,40,50
G,H,22,40,40
$
Desired output:
5 -5 0
8 0 -4
I could able to achieve it like this:
$ paste -d "," f1.csv f2.csv
A,B,25,35,50,E,F,20,40,50
C,D,30,40,36,G,H,22,40,40
$
$ paste -d "," f1.csv f2.csv | awk -F, '{print $3-$8 " " $4-$9 " " $5-$10 }'
5 -5 0
8 0 -4
$
Is there any better way to achieve it with awk alone without paste command?
As first step replace only paste with awk:
awk -F ',' 'NR==FNR {file1[FNR]=$0; next} {print file1[FNR] FS $0}' f1.csv f2.csv
Output:
A,B,25,35,50,E,F,20,40,50
C,D,30,40,36,G,H,22,40,40
Then split file1[FNR] FS $0 to an array with , as field separator:
awk -F ',' 'NR==FNR {file1[FNR]=$0; next} {split(file1[FNR] FS $0, arr, FS); print arr[3]-arr[8], arr[4]-arr[9], arr[5]-arr[10]}' f1.csv f2.csv
Output:
5 -5 0
8 0 -4
From man awk:
FNR: The input record number in the current input file.
NR: The total number of input records seen so far.
Another way using nl and awk
$ (nl f1.csv;nl f2.csv) | sort | awk -F, ' {a1=$3;a2=$4;a3=$5; getline; print a1-$3,a2-$4,a3-$5 } '
5 -5 0
8 0 -4
$

Compare two files(file1 & file2) and add one column from from file2 to file1 if first column of two files matches

I have two files (file1 and file2)
file1
ABC=14.2.0.7.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/abc/patch142007
DEF=14.3.0.5.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/def/patch143005
DEF=14.3.0.5.SAMPLE2=git.calypso/plugins/gitiles/+/refs/heads/clientpatch/def/patch14300-calib
HIJ=12.0.0.0.Sp3.SAMPLE3=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/hij/patch120000sp3
MNO=16.1.0.28.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/mno/patch161028
.......(150 lines)
file2
IJK = open
ABC = closed
PQR = closed
DEF = open
HIJ = open
LMN = closed
MNO = closed
PQR = open
......(> 150 lines)
output file
ABC=14.2.0.7.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/client/abc/patch142007=closed
DEF=14.3.0.5.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/client/def/patch143005=open
DEF=14.3.0.5.SAMPLE2=git.xyz/plugins/gitiles/+/refs/heads/client/def/patch14300-calib=open
HIJ=12.0.0.0.Sp3.SAMPLE3=git.xyz/plugins/gitiles/+/refs/heads/client/hij/patch120000sp3=open
MNO=16.1.0.28.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/client/mno/patch161028=closed
I have tried the following script. But it is not giving me any output. Not even printing anything. No errors
while IFS= read -r line
do
key1=`echo $line | awk -F "=" '{print $1}'` < file1
key2=`echo $line | awk -F "=" '{print $2}'` < file1
key3=`echo $line | awk -F "=" '{print $3}'` < file1
key4=`echo $line | awk -F "=" '{print $1}'` < file2
value3=`echo $line | awk -F "=" '{print $2}'` < file2
if [ "$key1" == "$key4" ]; then
echo "$key1=$key2=$key3=$value3"
fi
done
Giving a brief description for how the code should work.
The code should compare first columns of two files(file1 and file2). If each name matches it should give me output file as listed above. Else go to the next line. I should get output if my two files are either in sorted or unsorted format.
Helps will be appreciated. Thank you
Or another approach with awk that stores the file2 values in an array and then appends the correct state to the appropriate line in file1:
awk -F' = ' 'NR==FNR {a[$1]=$2; next} {print $0"="a[$1]}' file2 FS="=" file1
Example Use/Output
$ awk -F' = ' 'NR==FNR {a[$1]=$2; next} {print $0"="a[$1]}' file2 FS="=" file1
ABC=14.2.0.7.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/abc/patch142007=closed
DEF=14.3.0.5.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/def/patch143005=open
DEF=14.3.0.5.SAMPLE2=git.calypso/plugins/gitiles/+/refs/heads/clientpatch/def/patch14300-calib=open
HIJ=12.0.0.0.Sp3.SAMPLE3=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/hij/patch120000sp3=open
MNO=16.1.0.28.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/mno/patch161028=closed
Could you please try following.
awk '
BEGIN{
OFS="="
}
FNR==NR{
a[$1]=$NF
next
}
($1 in a){
print $0,a[$1]
}
' Input_file2 FS="=" Input_file1
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
OFS="=" ##Setting OFS as = here for all lines.
}
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when file2 is being read.
a[$1]=$NF ##Creating an array a with index $1 and value is last field.
next ##next will skip all further statements from here.
}
($1 in a){ ##Checking condition if $1 of current line is present in array a then do following.
print $0,a[$1] ##Printing current line and value of array a with index $1.
}
' file2 FS="=" file1 ##Mentioning Input_file file2 and file1 and setting FS="=" for file1 here.

using awk to print header name and a substring

i try using this code for printing a header of a gene name and then pulling a substring based on its location but it doesn't work
>output_file
cat input_file | while read row; do
echo $row > temp
geneName=`awk '{print $1}' tmp`
startPos=`awk '{print $2}' tmp`
endPOs=`awk '{print $3}' tmp`
for i in temp; do
echo ">${geneName}" >> genes_fasta ;
echo "awk '{val=substr($0,${startPos},${endPOs});print val}' fasta" >> genes_fasta
done
done
input_file
nad5_exon1 250405 250551
nad5_exon2 251490 251884
nad5_exon3 195620 195641
nad5_exon4 154254 155469
nad5_exon5 156319 156548
fasta
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgc............
and this is my wrong output file
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
output should look like that:
>name1
atgcatgcatgcatgcatgcat
>name2
tgcatgcatgcatgcat
>name3
gcatgcatgcatgcatgcat
>namen....
You can do this with a single call to awk which will be orders of magnitude more efficient than looping in a shell script and calling awk 4-times per-iteration. Since you have bash, you can simply use command substitution and redirect the contents of fasta to an awk variable and then simply output the heading and the substring containing the beginning through ending characters from your fasta file.
For example:
awk -v fasta=$(<fasta) '{print ">" $1; print substr(fasta,$2,$3-$2+1)}' input
or using getline within the BEGIN rule:
awk 'BEGIN{getline fasta<"fasta"}
{print ">" $1; print substr(fasta,$2,$3-$2+1)}' input
Example Input Files
Note: the beginning and ending values have been reduced to fit within the 129 characters of your example:
$ cat input
rad5_exon1 1 17
rad5_exon2 23 51
rad5_exon3 110 127
rad5_exon4 38 62
rad5_exon5 59 79
and the first 129-characters of your example fasta
$ cat fasta
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgc
Example Use/Output
$ awk -v fasta=$(<fasta) '{print ">" $1; print substr(fasta,$2,$3-$2+1)}' input
>rad5_exon1
atgcatgcatgcatgca
>rad5_exon2
gcatgcatgcatgcatgcatgcatgcatg
>rad5_exon3
tgcatgcatgcatgcatg
>rad5_exon4
tgcatgcatgcatgcatgcatgcat
>rad5_exon5
gcatgcatgcatgcatgcatg
Look thing over and let me know if I understood your question requirements. Also let me know if you have further questions on the solution.
If I'm understanding correctly, how about:
awk 'NR==FNR {fasta = fasta $0; next}
{
printf(">%s %s\n", $1, substr(fasta, $2, $3 - $2 + 1))
}' fasta input_file > genes_fasta
It first reads fasta file and stores the sequence in a variable fasta.
Then it reads input_file line by line, extracts the substring of fasta starting at $2 and of length $3 - $2 + 1. (Note that the 3rd argument to substr function is length, not endpos.)
Hope this helps.
made it work!
this is the script for pulling substrings from a fasta file
cat genes_and_bounderies1 | while read row; do
echo $row > temp
geneName=`awk '{print $1}' temp`
startPos=`awk '{print $2}' temp`
endPos=`awk '{print $3}' temp`
length=$(expr $endPos - $startPos)
for i in temp; do
echo ">${geneName}" >> genes_fasta
awk -v S=$startPos -v L=$length '{print substr($0,S,L)}' unwraped_${fasta} >> genes_fasta
done
done

How to color the output of awk depending on a condition

I have an input file test.txt containing the fallowing:
a 1 34
f 2 1
t 3 16
g 4 11
j 5 16
I use awk to print only string 2 and 3:
awk '{print $2 " " $3}' test.txt
Is there a way to color only the second string of my output depending on a condition, if the value is higher than 15 then print in orange, if the value is higher than 20, print in red. It will give the same but colored:
1 34(red)
2 1
3 16(orange)
4 11
5 16(orange)
The input could contain many more lines in a different order.
This awk command should do what you want:
awk -v red="$(tput setaf 1)" -v yellow="$(tput setaf 3)" -v reset="$(tput sgr0)" '{printf "%s"OFS"%s%s%s\n", $1, ($3>20)?red:($3>15?yellow:""), $3, reset}'
The key bits here are
the use of tput to get the correct representation of setting the color for the current terminal (as opposed to hard-coding a specific escape sequence)
the use of -v to set the values of the variables the awk command uses to construct its output
The above script is tersely written but could be written less tersely like this:
{
printf "%s"OFS, $1
if ($3 > 20) {
printf "%s", red
} else if ($3 > 15) {
printf "%s", yellow
}
printf "%s%s\n", $3, reset
}
Edit: Ed Morton correctly points out that the awk programs above could be simplified by using a color variable and separating the color choice from the printing. Like this:
awk -v red="$(tput setaf 1)" -v yellow="$(tput setaf 3)" -v reset="$(tput sgr0)" \
'{
if ($3>20) color=red; else if ($3>15) color=yellow; else color=""
printf "%s %s%s%s\n", $1, color, $3, reset
}'
You can do the following:
awk '{printf "%s ", $2}
$3 > 15 {printf "\033[33m"}
$3 > 20 {printf "\033[31m"}
{printf "%s\033[0m\n", $3}' test.txt
Unfortunately, I don't know the orange ansi escape code...
Another approach:
awk -v color1="\033[33m" -v color2="\033[31m" -v reset="\033[0m" '
$3 > 15 && $3 <= 20 {$3=color1 $3 reset}
$3 > 20 {$3=color2 $3 reset}
{print $2, $3}' test.txt
awk '{if($2>15 && $2<=20){$2="\033[1;33m" $2 "\033[0m"};if($2>20){$2="\033[1;31m" $2 "\033[0m"};print}' file
breakdown
($2>15 && $2<=20){$2="\033[1;33m" $2 "\033[0m"} # if field 2>15 and =<20, then add colour code to field 2.
if($2>20){$2="\033[1;31m" $2 "\033[0m"} ## if field 2>20, then add colour code to field 2.
print #print line afterwards

Bash - only printing certain parts of a matrix using awk

I want to read a matrix of numbers
1 3 4 5
2 4 9 0
And only want my awk statement to print out the first and last, so 1 and 0. I have this so far, but nothing will print. What is wrong with my logic?
awk 'BEGIN {for(i=1;i<NF;i++)
if(i==1)printf("%d ", $i);
else if(i==NF && i==NR)printf("%d ", $i);}'
$ awk '{ if (NR==1) { print $1}} END{print $NF}' matrix
1
0
The above awk program has two parts. The first is:
{ if (NR==1) { print $1}}
This prints the first field (column) of the first record (line) of the file.
The second part is:
END{print $NF}
This parts runs only at the end after the last record (line) has been read. It prints the last field (column) of that line.
Borrowing from unix.com, you can use the following:
awk 'NR == 1 {print $1} END { print $NF }'
This will print the first column of the first line (NR == 1) and end input has finished (END), print the final column of the last line.
If I understand the output format you're looking for, this code should capture those values and print them:
awk 'NR == 1 {F = $1} END { L = $NF ; printf("%d %d", F, L) }'
awk is line based, NR is the current record (line) number.
and awk is essentially match => action,
echo "1 3 4 5
2 4 9 0" |
awk 'NR == 1 {print $1;}
END {print $NF;}'
for the first record print the first field;
for the last record print the last field.
Since so many solutions with awk, here is another way with sed.
sed -r ':a;$!{N;ba};s/\s+.*\s+/ /' file
Yet another sed variant:
$ echo $'1 3 4 5\n2 4 9 0' | sed -n '1s/ .*//p;$s/.* //p'
awk 'NR==1{print $1;} END{print $NF;}'

Resources