awk space delimiter with empty content - bash

I have a text file which is delimited by space
1 dsfsdf 2
2  3
4 sdfsdf 4
5 sdfsdf 5
When I run
awk -F' ' '{s+=$3} END {print s}' test
It returns 11, but it should return 14. I believe awk gets confused by the second line, where there is nothing between the two spaces. How should I modify my command?
Thanks

try
awk -F' {1}' '{s+=$3} END {print s}' test
you get
14
Note
If the test file contains
1 dsfsdf 2 1
2  3 1
4 sdfsdf 4 1
5 sdfsdf 5 1
it also works. I use GNU awk.
edit
As #Ed_Morton and #"(9 )*" say, it is better to use a literal space inside a bracket expression, [ ]:
awk -F'[ ]' '{s+=$3} END {print s}' test
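To see why the bracket expression matters, here is a minimal sketch that runs both forms on the question's data. It assumes the second line really contains two consecutive spaces; the variable names are only illustrative.

```shell
# Sample data from the question; line 2 has an empty second field (two spaces).
data=$(printf '1 dsfsdf 2\n2  3\n4 sdfsdf 4\n5 sdfsdf 5\n')

# Default splitting: runs of whitespace collapse, so "3" becomes $2 on line 2
# and $3 is empty there.
default_sum=$(printf '%s\n' "$data" | awk '{s+=$3} END {print s}')

# Bracket expression: every single space separates fields, so the empty
# field is kept and "3" stays in $3.
strict_sum=$(printf '%s\n' "$data" | awk -F'[ ]' '{s+=$3} END {print s}')

echo "default: $default_sum, single-space: $strict_sum"
```

A field separator that is exactly one space is special-cased by awk to mean "split on runs of whitespace", which is why -F' ' behaves like the default.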

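This should work too if only the second column has missing values. Note that it relies on the value being the second-to-last field, as in the four-column file above; for the three-column file, use $NF instead.
awk '{s+=$(NF-1)} END{print s}' test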

Related

How to add number to beginning of each line?

This is what I normally use to add numbers to the beginning of each line:
awk '{ print FNR " " $0 }' file
However, what I need to do is start the number at 1000001. Is there a way to start with a specific number like this instead of having to use line numbers?
There is a dedicated command for this, nl:
nl -v1000001 file
You can just add an offset to FNR (or NR). Since FNR starts at 1, the offset is 1000000:
awk '{ print (1000000 + FNR), $0 }' file
$ seq 5 | awk -v n=1000000 '{print ++n, $0}'
1000001 1
1000002 2
1000003 3
1000004 4
1000005 5
$ seq 5 | awk -v n=30 '{print ++n, $0}'
31 1
32 2
33 3
34 4
35 5

Remove comma using awk command with multiple record

Let's say I have records like this.
Input
1,1,1,1.213,1,1,1.23
2,2,2,2.345,2,2,2.33
3,3,3,3.456,3,3,3.44
I want them to look like this
Output
1,1,1,1,1,1,1.23
2,2,2,2,2,2,2.33
3,3,3,3,3,3,3.44
How can I remove the decimal part only in the 4th column? I don't want to remove it from the last column.
You can use:
awk -F"," -v OFS="," '{print $1,$2,$3,int($4),$5,$6,$7}'
The int() function is what you are looking for, I guess. Setting OFS to a comma keeps the output comma-separated; without it, print joins the fields with spaces.
Example:
$ cat test
1,1,1,1.213,1,1,1.23
2,2,2,2.345,2,2,2.33
3,3,3,3.456,3,3,3.44
$ awk -F"," -v OFS="," '{print $1,$2,$3,int($4),$5,$6,$7}' test
1,1,1,1,1,1,1.23
2,2,2,2,2,2,2.33
3,3,3,3,3,3,3.44
Edit (good suggestion from ccf):
You could use this shorter form instead of the long awk command above. FS and OFS are both set to a comma so that the rebuilt record keeps its commas.
$ awk 'BEGIN{FS=OFS=","} {$4=int($4); print}'
1,1,1,1.213,1,1,1.23
1,1,1,1,1,1,1.23
If temp.txt has the input, then
$ cat temp.txt | sed 's/\.[0-9]\+//1'
1,1,1,1,1,1,1.23
2,2,2,2,2,2,2.33
3,3,3,3,3,3,3.44
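The 1 at the end means only the first match on each line is replaced (this is also what sed does by default without the g flag). It works here because the first decimal on each line happens to be in column 4. Note that \+ is a GNU sed extension.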
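If a decimal could appear in an earlier column, the sed approach would strip the wrong one. A minimal sketch of an awk alternative that edits only field 4 as text, assuming the field always ends in a fractional part such as .213 (the variable names are illustrative):

```shell
# Sample input from the question.
input=$(printf '1,1,1,1.213,1,1,1.23\n2,2,2,2.345,2,2,2.33\n3,3,3,3.456,3,3,3.44\n')

# FS and OFS are both commas, so the record is rebuilt with commas
# after sub() modifies field 4. The trailing 1 prints every record.
result=$(printf '%s\n' "$input" |
    awk 'BEGIN { FS = OFS = "," } { sub(/\.[0-9]+$/, "", $4) } 1')

printf '%s\n' "$result"
```

Unlike int(), sub() leaves a non-numeric fourth field untouched instead of turning it into 0.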

awk issue with RS in mac command line

I have the following text file records.text
IronMan
1
2
3

Batman
1
2
3
I have the following awk command
awk 'BEGIN{ FS="\n"; RS="\n\n"} {print NR, ":", $1, $2}' records.text
I get the following output
1: Ironman
2: 1
3: 2
4: 3
5:
6: Batman
7: 1
8: 2
9: 3
Expected output:
1: Ironman 1
2: Batman 1
The output I get is wrong. Does this mean the RS variable is not picked up and awk is still using the default "\n" as the record separator? Does anyone else have the same issue? Any solutions?
Unlike GNU awk, OSX's BSD awk does not handle multiple-character record separators. You'll have to try it a different way, handling one line at a time.
From your expression, I do get (after adding the missing }):
awk 'BEGIN{ FS="\n"; RS="\n\n"} {print NR, ";", $1, $2}' file
1 ; IronMan 1
2 ; Batman
A 1 is missing here compared to what you want.
PS: this also needs GNU awk, due to the multiple characters in RS.
When you are working with records separated by empty lines, you should set the record separator to the empty string (paragraph mode):
awk -v RS="" '{print NR, ";", $1, $2}' file
1 ; IronMan 1
2 ; Batman 1
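For an awk where neither approach is available, the "one line at a time" idea mentioned above can be sketched as follows. This is only an illustration, assuming records are separated by a single blank line; the variables n, first and rec are made up for the example.

```shell
# Sample records separated by a blank line, as in the question.
records=$(printf 'IronMan\n1\n2\n3\n\nBatman\n1\n2\n3\n')

out=$(printf '%s\n' "$records" | awk '
    # A blank line ends the current record: reset the line counter.
    /^$/   { n = 0; next }
    # Count the lines seen so far within the current record.
           { n++ }
    # Remember the first line, then print once the second line arrives.
    n == 1 { first = $0 }
    n == 2 { rec++; print rec ": " first " " $0 }
')

printf '%s\n' "$out"
```

This uses only the default single-character RS, so it does not depend on multi-character record separator support.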

AWK doesn't recognise more than one field when changing value of an element

I have a csv file, a simplified version of which is:
#data
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
When I do:
awk -F',' '{print NF}' myfile.csv
I get:
1
5
5
5
5
5
5
5
5
5
5
I am trying to change the 5th element of the 10th line in this dataset, but I noticed strange behavior when doing so. More specifically, when I run:
awk -F',' 'NR==10{$5="abnormal"}1' myfile.csv | awk -F',' '{print NF}'
I get:
1
5
5
5
5
5
5
5
5
1
5
Does anyone have an explanation or any thoughts on this?
Thanks to #EdMorton for the valuable comment that assigning a value to any field causes the record to be rebuilt using the OFS value, which by default is a space.
Updating as per the updated question: #drDoom, you are missing OFS=','.
See the difference in the two outputs below with your sample data:
/home/amit/$ awk -F',' 'NR==10{$5="abnormal"}1' OFS=',' myfile.csv | awk -F',' '{print NF}'
1
5
5
5
5
5
5
5
5
5
5
/home/amit/$ awk -F',' 'NR==10{$5="abnormal"}1' myfile.csv | awk -F',' '{print NF}'
1
5
5
5
5
5
5
5
5
1
5
To change the 150th field on the 100th line, you can do as below:
awk -F',' 'NR==100{ $150 = "NewValue"}1' OFS=',' myfile.csv
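The effect of OFS on a rebuilt record can be seen on a single line. A minimal sketch using one row of the sample data (the variable names are illustrative):

```shell
line='1,2,3,4,normal'

# Assigning to $5 rebuilds $0 with the default OFS (a single space).
without_ofs=$(printf '%s\n' "$line" | awk -F',' '{$5="abnormal"} 1')

# With OFS set to a comma, the rebuilt record keeps its commas.
with_ofs=$(printf '%s\n' "$line" | awk -F',' -v OFS=',' '{$5="abnormal"} 1')

echo "without OFS: $without_ofs"
echo "with OFS:    $with_ofs"
```

The first result no longer contains any commas, which is why the second awk in the question's pipeline reports NF as 1 for that line.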
Any or all of these are the issue:
a) Your csv file was created on Windows and so has extraneous control-M characters in it.
b) Your separator is not a comma [on every line].
c) You are miscounting which line is the 100th one.
Do this and update your question with the output:
dos2unix file
awk -F',' -v OFS=':' 'NR>98 && NR<102{print NR, NF, $1, $0}' file
Note that I said update your question with the output - do NOT post the output as a comment, as we will not be able to see the format.
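If dos2unix is not installed, point (a) can be checked with awk itself. A rough sketch, assuming carriage returns occur only at the ends of lines; the sample data and variable names are hypothetical.

```shell
# Hypothetical two-line sample with Windows (CRLF) line endings.
crlf=$(printf '1,2,3,4,normal\r\n1,2,3,4,normal\r\n')

# Count the lines that end in a carriage return.
cr_lines=$(printf '%s\n' "$crlf" | awk '/\r$/ {c++} END {print c+0}')

# Strip the trailing carriage return from every line.
cleaned=$(printf '%s\n' "$crlf" | awk '{sub(/\r$/, "")} 1')

echo "lines ending in CR: $cr_lines"
```

A nonzero count suggests the file came from Windows and that the last field on each line carries an invisible trailing character.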

Computing differences between columns of tab delimited file

I have a tab delimited file of 4 columns and n number of rows.
I want to find the difference in values present in column 3 and 2 and want to store them in another file.
This is what I am doing
cat filename | awk '{print $3 - $2}'>difference
and it is not working. How can I improve the code?
Solution:
I was missing the closing single quote, and my eyes were so tuned to the screen that I couldn't figure out what was going wrong in 35 lines of code. Out of frustration I posted the question on the forum, and [to complete] the comedy of errors, the syntax I wrote here [in the] question is correct (as it contains both single quotes).
Thank you all for your help.
Set the field separator if you have other whitespace in the lines.
BEGIN {
FS="\t"
}
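When it matters: the default splitting only goes wrong if a field itself contains spaces. A minimal sketch with a made-up row whose first column is "a b":

```shell
# Hypothetical tab-delimited row; the first field contains a space.
row=$(printf 'a b\t2\t7\tx')

# Default splitting sees five fields (a, b, 2, 7, x), so $3 - $2 is 2 - "b".
default_diff=$(printf '%s\n' "$row" | awk '{print $3 - $2}')

# With a tab separator there are four fields, so $3 - $2 is 7 - 2.
tab_diff=$(printf '%s\n' "$row" | awk -F'\t' '{print $3 - $2}')

echo "default: $default_diff, tab: $tab_diff"
```

A non-numeric string such as "b" counts as 0 in arithmetic, so the default form silently prints a wrong number rather than failing.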
Try using -F to force the delimiter as tab and enclose your
cat filename | awk -F"\t" '{print $3 - $2}' > difference
Does anyone test before they give their answers? By default, awk splits on whitespace (spaces and tabs), not just spaces.
I just did this:
awk '{print $3 - $2}' temp.txt
And it works perfectly.
Here's my file:
1 2 7 4
11 12 13 14
1 12 3 4
1 2 3 4
1 2 3 4
And here's my results:
$ awk '{print $3 - $2}' temp.txt
5
1
-9
1
1
$
In fact, I used your command, and got the same results.
Can you explain what's not working for you? What data are you using, and what results are you getting?
Try this:
cat filename | awk -F '\t' '{print $3 - $2}' > difference
where \t is the tab delimiter (to type a literal tab instead, press Ctrl+V and then Tab).
