Divide the first entry and last entry of each row in a file using awk - bash

I have a file with varying row lengths:
120 2 3 4 5 9 0.003
220 2 3 4 0.004
320 2 3 5 6 7 8 8 0.009
I want the output to consist of a single column with entries like:
120/0.003
220/0.004
320/0.009
That is, I want to divide the first column by the last column of each row.
How can I achieve this using awk?

To show the output of the division operation:
$ awk '{ printf "%s/%s=", $1, $NF; print $1/$NF }' infile
120/0.003=40000
220/0.004=55000
320/0.009=35555.6

awk will split its input based on the value of FS, which by default is any sequence of whitespace. That means you can get at the first and last column by referring to $1 and $NF. NF is the number of fields in the current line or record.
So to tell awk to print the first and last column do something like this:
awk '{ print $1 "/" $NF }' infile
Output:
120/0.003
220/0.004
320/0.009
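If the input might contain blank lines or a last field of zero, a slightly more defensive variant guards the division before performing it. This is a sketch, assuming the sample data above is saved as infile:

```shell
# Recreate the sample input (the filename "infile" is an assumption)
printf '120 2 3 4 5 9 0.003\n220 2 3 4 0.004\n320 2 3 5 6 7 8 8 0.009\n' > infile

# Only divide when the row has at least two fields and the divisor is non-zero
awk 'NF >= 2 && $NF != 0 { printf "%s/%s=%s\n", $1, $NF, $1/$NF }' infile
```

Rows with a single field, blank lines, or a zero last field are silently skipped instead of triggering a division-by-zero error.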

Related

Modify values of one column based on values of another column on a line-by-line basis

I'm looking to use bash/awk/sed in order to modify a document.
The document contains multiple columns. Column 5 currently has the value "A" in every row. Column 6 is composed of increasing numbers. I'm attempting a script that goes through the document line by line, checks the value of Column 6, and if the value is greater than a certain integer (specifically 275), changes the value of Column 5 on that same line to "B".
while IFS="" read -r line ; do
awk 'BEGIN {FS = " "}'
Num=$(awk '{print $6}' original.txt)
if [ $Num > 275 ] ; then
awk '{ gsub("A","B",$5) }'
fi
done < original.txt >> edited.txt
For the above, I've tried setting the Num variable both inside and outside of the while loop.
I've also tried using a for loop and cat:
awk 'BEGIN {FS = " "}' original.txt
Num=$(awk '{print $6}' heterodimer_P49913/unrelaxed_model_1.pdb)
integer=275
for data in $Num ; do
if [ $data > $integer ] ; then
##Change value in other column to "B" for all lines containing column 6 values greater than "integer"
fi
done
Thanks in advance.
GNU AWK does not need an external while loop (it has an implicit loop over the input lines); if you need further explanation, read the awk info page. Let file.txt content be
1 2 3 4 A 100
1 2 3 4 A 275
1 2 3 4 A 300
and task to be
checks the value of Column 6, if the value is greater than a certain
integer (specifically 275) the value of Column 5 in that same line is
changed to "B".
then it might be done using GNU AWK following way
awk '$6>275{$5="B"}{print}' file.txt
which gives output
1 2 3 4 A 100
1 2 3 4 A 275
1 2 3 4 B 300
Explanation: the action setting the value of the 5th field ($5) to B is applied conditionally, to rows where the value of the 6th field is greater than 275. The print action is applied unconditionally, to all lines. Observe that the change, if applied, is done before printing.
(tested in GNU Awk 5.0.1)
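To write the modified rows back to the original file, a common pattern (a sketch, using the file name from the example above) is to redirect to a temporary file and move it into place; GNU Awk 4.1+ also offers gawk -i inplace for the same purpose. Note that assigning to $5 makes awk rebuild the line with the output field separator, so runs of whitespace collapse to single spaces:

```shell
# Recreate the sample input
printf '1 2 3 4 A 100\n1 2 3 4 A 275\n1 2 3 4 A 300\n' > file.txt

# Edit conditionally, then replace the original file only if awk succeeded
awk '$6 > 275 { $5 = "B" } { print }' file.txt > file.txt.tmp && mv file.txt.tmp file.txt
cat file.txt
```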

How to count number of values in a row and store total count to array

I have a scenario where I want to get the count of values in each row and store the counts in a dynamic array.
Data in file :
"A","B","C","B"
"P","W","R","S"
"E","U","C","S"
"Y","F","C"
first row as : 4 -> values
second row as : 4 -> values
third row as : 4 -> values
fourth row as : 3 -> values
Expected Output :
store to array : array_list=(4,4,4,3)
I have written a script, but it is not working:
array_list=()
while read -r line
do
var_comma_count=`echo "$line" | tr -cd , | wc -c`
array_list=+($( var_comma_count))
done < demo.txt
When I print the array it should give me all the values: echo "${array_list[@]}"
Note:
The file might contain empty lines at the end, which should not be read.
When I count with the while loop it gave me a count of 5; it should have ignored the last line, which is empty.
Whereas when I use awk it gives me the proper count: awk '{print NF}' demo.txt -> 4
I know processing a file using a while loop is not best practice, but any better solution will be appreciated.
Perhaps this might be easier using awk, set the FS to a comma and check if the number of fields is larger than 0:
#!/bin/bash
array_list=($(awk -v FS=, 'NF>0 {print NF}' demo.txt))
echo "${array_list[@]}"
Output
4 4 4 3
The awk command explained:
awk -v FS=, ' # Start awk, set the Field Separator (FS) to a comma
NF>0 {print NF} # If the Number of Fields (NF) is greater than 0, print the NF
' demo.txt # Close awk and set demo.txt as the input file
Another option could be first matching the format of the whole line. If it matches, there is at least a single occurrence.
Then split the line on a comma.
array_list=($(awk '/^"[A-Z]"(,"[A-Z]")*$/{print(split($0,a,","));}' demo.txt))
echo "${array_list[@]}"
Output
4 4 4 3
The awk command explained:
awk '/^"[A-Z]"(,"[A-Z]")*$/{ # Regex pattern for the whole line, match a single char A-Z between " and optionally repeat preceded by a comma
print(split($0,a,",")); # Split the whole line `$0` on a comma and print the number of parts
}
' demo.txt
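If the script runs under bash 4+, mapfile (also known as readarray) is a safer way to fill the array than word-splitting a command substitution, since each awk output line becomes exactly one element:

```shell
# Recreate the sample input, including a trailing empty line
printf '"A","B","C","B"\n"P","W","R","S"\n"E","U","C","S"\n"Y","F","C"\n\n' > demo.txt

# One array element per awk output line; empty input lines are skipped by NF>0
mapfile -t array_list < <(awk -v FS=, 'NF>0 {print NF}' demo.txt)
echo "${array_list[@]}"   # 4 4 4 3
```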

How to print and store specific named columns from csv file with new row numbers

I'll start by saying I'm very new to using bash and any sort of script writing in general.
I have a csv file that has basic column headers and values underneath which looks something like this as an example:
a b c d
3 3 34 4
2 5 4 94
4 5 8 3
9 8 5 7
Is there a way to extract only the numerical values from a specific column and add a number for each row? For example, the first numbered row (starting from 1, after the column header) is 1, then 2, then 3, and so on. For column b the output would be:
1 3
2 5
3 5
4 8
I would like to be able to do this for various different named column headers.
Any help would be appreciated,
Chris
Like this? Using awk:
$ awk 'NR>1{print NR-1, $2}' file
1 3
2 5
3 5
4 8
Explained:
$ awk ' # using awk for the job
NR>1 { # for the records or rows after the first
print NR-1, $2 # output record number minus one and the second field or column
}' file # state the file
As for "I would like to be able to do this for various different named column headers": with awk you don't specify the column by its header name but by its number, i.e. not b but $2.
awk 'NR>1 {print i=1+i, $2}' file
NR>1 skips the first line, in your case the header.
print print following
i=1+i increments and prints i: i starts at 0 and 1 is added, so it prints 1, then 2, and so on.
$2 prints the second column.
file is the path to your file.
If you have a simple multi-space delimited file (as in your example) awk is the best tool for the job. To select the column by name in awk you can do something like:
$ awk -v col="b" 'FNR==1 { for (i=1;i<=NF;i++) if ($i==col) x=i; next }
{print FNR-1 OFS $x}' file
1 3
2 5
3 5
4 8
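The header lookup can be wrapped in a small shell function so the column name becomes a parameter (col_by_name is a hypothetical name, not a standard utility). If the header is not found, nothing is printed rather than falling back to the whole line:

```shell
# Print "row-number value" pairs for the column whose header matches $1 in file $2
col_by_name() {
    awk -v col="$1" 'FNR==1 { for (i=1; i<=NF; i++) if ($i==col) x=i; next }
                     x     { print FNR-1, $x }' "$2"
}

# Recreate the sample input and select column "b"
printf 'a b c d\n3 3 34 4\n2 5 4 94\n4 5 8 3\n9 8 5 7\n' > file
col_by_name b file
```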

How to use awk to count the number of a specified digit in a certain column

How can I count the matched values in a certain column?
I have a file (wm.csv).
I executed this command to get the targeted values in a certain column: tail -n +497 wm.csv | awk -F"," '$2=="2" {print $3" "$4}'
then I get the following output data I want:
hit 2
hit 2
hit 2
hit 2
miss
hit 2
hit 2
hit 2
hit 2
hit 2
hit 2
hit 2
incorrect 1
hit 2
hit 2
hit 2
I want to count the number of "2"s in the second column and divide by the total number of rows. Specifically, in this case, that would look like: 14 (fourteen "2"s in the second column) / 16 (total number of rows).
Here is the command I tried, but it does not work:
tail -n +497 wm.csv | awk -F"," '$2=="2" {count=0;} { if ($4 == "2") count+=1 } {print $3,$4,$count }'
thanks
Taking the posted data as the input file:
$ awk '$2==2{c++} END{print NR,c,c/NR}' file
16 14 0.875
awk '($0 ~ "hit 2"){count += 1} END{print count, FNR, count/FNR}' sample.csv
14 16 0.875
I use ~ to check whether the whole line ($0) matches "hit 2"; if it does, the counter is increased by 1. FNR is the per-file record number, which in the END block equals the total number of lines.
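The count, the total, and the ratio can also be formatted in one pass. As a compact sketch (using a shortened version of the extracted data above, saved under the assumed name sample.txt):

```shell
# A shortened sample of the extracted rows
printf 'hit 2\nhit 2\nmiss\nhit 2\n' > sample.txt

# c counts rows whose 2nd field is 2; NR holds the total row count in END
awk '$2 == 2 { c++ } END { printf "%d/%d = %.3f\n", c, NR, c/NR }' sample.txt
```

For this shortened sample the output is 3/4 = 0.750; on the full 16-row data it would report 14/16 = 0.875.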

Exclude a define pattern using awk

I have a file with two columns and want to print the first column only if a given pattern is not found in the second column. The file can be, for example:
3 0.
5 0.
4 1.
3 1.
10 0.
and I want to print the values in the first column only if the number 1. is not in the second column, i.e.
3
5
10
I know that to print the first column I can use
awk '{print $1}' fileInput >> fileOutput
Is it possible to have an if block somewhere?
In general, you just need to indicate what pattern you don't want to match:
awk '! /pattern/' file
In this specific case, where you want to print the 1st column of lines where the 2nd column is not "1.", you can say:
$ awk '$2 != "1." {print $1}' file
3
5
10
When the condition is accomplished, {print $1} will be performed, so that you will have the first column of the file.
In this special case, because 1. evaluates to true and 0. to false, you can do:
awk '!$2 { print $1 }' file
3
5
10
The part before the { } is the condition under which the commands are executed. In this case, !$2 is true when column 2 evaluates to false (zero).
Edit: this remains the case even with the trailing dot. In fact, all three of these solutions work:
bash-4.2$ cat file
3 0.
5 0.
4 1.
3 1.
10 0.
bash-4.2$ awk '!$2 { print $1 }' file # treat column 2 as a boolean
3
5
10
bash-4.2$ awk '$2 != "1." {print $1}' file # treat column 2 as a string
3
5
10
bash-4.2$ awk '$2 != 1 {print $1}' file # treat column 2 as a number
3
5
10
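A fourth variant, as a sketch over the same sample data, treats column 2 with a regular expression, which is handy when the unwanted values share a prefix rather than being one exact string:

```shell
# Recreate the sample input
printf '3 0.\n5 0.\n4 1.\n3 1.\n10 0.\n' > file

# Print column 1 unless column 2 starts with "1."
awk '$2 !~ /^1\./ { print $1 }' file
```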

Resources