awk issue with RS in mac command line - macos

I have the following text file records.text
IronMan
1
2
3
Batman
1
2
3
I have the following awk command
awk 'BEGIN{ FS="\n"; RS="\n\n"} {print NR, ":", $1, $2}' records.text
I get the following output
1: Ironman
2: 1
3: 2
4: 3
5:
6: Batman
7: 1
8: 2
9: 4
Expected output:
1: Ironman 1
2: Batman 1
The actual output is wrong. Does this mean the RS variable is not being picked up and awk is still using the default "\n" as the record separator? Has anyone else hit the same issue? Any solutions?

Unlike GNU awk, OS X's BSD awk does not handle multiple-character record separators, so RS="\n\n" does not do what you expect there. You'll have to try it a different way, handling one line at a time.
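A minimal sketch of the one-line-at-a-time approach, assuming the records in records.text are separated by a blank line (the function name print_records is made up for illustration). It does not touch RS at all, so it should not depend on GNU awk:

```shell
# Treat a blank line as the end of a record and remember the first
# two lines of each record. print_records is a hypothetical helper name.
print_records() {
  awk '
    NF == 0 { if (n) print ++rec, ":", f1, f2; n = 0; f1 = f2 = ""; next }
    { n++; if (n == 1) f1 = $0; else if (n == 2) f2 = $0 }
    END { if (n) print ++rec, ":", f1, f2 }   # flush the final record
  ' "$@"
}

# Sample input modelled on the question, with a blank line between records.
printf 'IronMan\n1\n2\n3\n\nBatman\n1\n2\n3\n' | print_records
```

With the sample input this prints "1 : IronMan 1" and "2 : Batman 1".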

With your expression (after adding the missing }), I get:
awk 'BEGIN{ FS="\n"; RS="\n\n"} {print NR, ";", $1, $2}' file
1 ; IronMan 1
2 ; Batman
The 1 after Batman is missing here, compared to what you want.
PS: this also needs GNU awk, due to the multiple characters in RS.
When you are working with records separated by empty lines, you should set the record separator to the empty string:
awk -v RS="" '{print NR, ";", $1, $2}' file
1 ; IronMan 1
2 ; Batman 1
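With RS set to the empty string awk is in paragraph mode, where a newline always acts as a field separator in addition to whatever FS is. If the lines themselves may contain spaces, one option is to also set FS to a newline so that each line becomes exactly one field. A small sketch with made-up sample data (first_two_lines is a hypothetical name):

```shell
# Paragraph mode: blank lines separate records; FS="\n" makes each line a field.
first_two_lines() {
  awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1, $2 }' "$@"
}

# Made-up input where the first line of each record contains a space.
printf 'Iron Man\n1\n2\n\nBat Man\n1\n2\n' | first_two_lines
```

Here the output should be "1: Iron Man 1" and "2: Bat Man 1".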

Related

Linux - loop through each element on each line

I have a text file with the following information:
cat test.txt
a,e,c,d,e,f,g,h
d,A,e,f,g,h
I wish to iterate through each line and, for each line, print the index of every element that differs from e. The ideal output would use either a tab separator or a comma separator:
1 3 4 6 7 8
1 2 4 5 6
or
1,3,4,6,7,8
1,2,4,5,6
I have managed to iterate through each line and print the indices, but the results are all printed on the same line and not separated.
while read line;do echo "$line" | awk -F, -v ORS=' ' '{for(i=1;i<=NF;i++) if($i!="e") {print i}}' ;done<test.txt
With the result being
1 3 4 6 7 8 1 2 4 5 6
If I do it only using awk
awk -F, -v ORS=' ' '{for(i=1;i<=NF;i++) if($i!="e") {print i}}'
I get the same output.
Could anyone help me with this specific issue of separating the lines?
If you don't mind some trailing whitespace, you can just do:
while read line;do echo "$line" | awk -F, '{for(i=1;i<=NF;i++) if($i!="e") {printf i " "}; print ""}' ;done<test.txt
but it would be more typical to omit the while loop and do:
awk -F, '{for(i=1;i<=NF;i++) if($i!="e") {printf i " "}; print ""}' <test.txt
You can avoid the trailing whitespace with the slightly cryptic:
awk -F, '{m=0; for(i=1;i<=NF;i++) if($i!="e") {printf "%s%d", (m++ ? " " : ""), i }; print ""}' <test.txt
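If the comma-separated form from the question is wanted, one way is to build each output line in a variable and print it once per input line. This is only a sketch; the function name indices_not_matching and its two arguments (the value to skip and the output separator) are made up for illustration:

```shell
# $1: value to skip, $2: output separator; input is read from stdin.
indices_not_matching() {
  awk -F, -v skip="$1" -v sep="$2" '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i != skip)
        out = out (out == "" ? "" : sep) i   # prepend sep only after the first index
    print out
  }'
}

printf 'a,e,c,d,e,f,g,h\nd,A,e,f,g,h\n' | indices_not_matching e ,
```

For the sample data this gives 1,3,4,6,7,8 and 1,2,4,5,6, with no trailing separator.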

awk print the last row of file failed

$cat file
1
2
3
4
5
6
7
8
9
0
I want to print the value of the last row.
$awk '{print $NR}' file
1
Why is the output not 0?
Unlike sed, awk does not have a way to specify the last line. A work-around is:
$ awk '{line=$0} END{print line}' file
0
Discussion
Let's look at your command and see what it actually does. Consider this test file:
$ cat testfile
a b c
A B C
i ii iii
Now, let's run your command:
$ awk '{print $NR}' testfile
a
B
iii
As you can see, print $NR prints the diagonal. In other words, on line number NR, it prints field number NR. So, on the first line, NR=1, the command print $NR prints the first field. On the second line, NR=2, the command print $NR prints the second field. And so on.
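To see the difference between the counters and the fields they select, it can help to print them side by side. A small sketch on the same three-line test data (show_fields is just an illustrative name):

```shell
# Print the record number, the field count, field number NR, and the last field.
show_fields() {
  awk '{ print NR, NF, $NR, $NF }' "$@"
}

printf 'a b c\nA B C\ni ii iii\n' | show_fields
```

This prints "1 3 a c", "2 3 B C" and "3 3 iii iii": NR and NF are plain numbers, while $NR and $NF are the fields at those positions.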
Use the following code, which prints the last line of any Input_file. END is a built-in awk pattern whose action runs after all the input has been read, so printing the current line in the END section prints the last line.
awk 'END{print $0}' Input_file
OR
awk 'END{print}' Input_file

Add new field at the end of each line based on value of existing field (sed or awk)

I have a set of CSV files to which I wish to add a field at the end of each line.
The first field is an ID, some ten-digit number:
id,2nd_field,...,last_field
1234567890,Smith,...,Arkansas
1234567891,Jones,...,California
1234567892,White,...,
I want to add another field at the end where the value is based on modulo 3 (id % 3) of the ID:
id,2nd_field,...,last_field,added_field
1234567890,Smith,...,Arkansas,x
1234567891,Jones,...,California,y
1234567892,White,...,,z
Please take into account the fact that the last_field could be null or blank.
How can I do this using sed or awk? I'm new to these tools, so please also provide some explanation of your script. Thanks.
Using awk:
awk 'BEGIN{FS=OFS=","} NR==1{print $0, "added_field"; next}
($1%3)==0{p="x"} ($1%3)==1{p="y"} ($1%3)==2{p="z"} {print $0, p}' file
Output:
id,2nd_field,...,last_field,added_field
1234567890,Smith,...,Arkansas,x
1234567891,Jones,...,California,y
1234567892,White,...,,z
$ cat tst.awk
BEGIN { FS=OFS=","; split("y,z,x",map) }
{ print $0, (NR>1 ? map[($1-1)%3+1] : "added_field") }
$ awk -f tst.awk file
id,2nd_field,...,last_field,added_field
1234567890,Smith,...,Arkansas,x
1234567891,Jones,...,California,y
1234567892,White,...,,z
The above just uses split() to create a mapping of:
map[1] = y
map[2] = z
map[3] = x
and then accesses it when needed via the common (VALUE-1)%N+1 idiom, which maps the values 1,2,..,N-1,N to 1,2,..,N-1,N instead of the 1,2,..,N-1,0 that a plain mod N would give:
map[($1-1)%3+1]
e.g.:
$ awk 'BEGIN{ for (i=1;i<=6;i++) print i, i%3, (i-1)%3+1 }'
1 1 1
2 2 2
3 0 3
4 1 1
5 2 2
6 0 3
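For this particular mapping, a lookup string can stand in for the array. A sketch, assuming the letters x, y, z for remainders 0, 1, 2 as in the question; add_mod3_field is a hypothetical name and the sample rows are shortened to three columns:

```shell
# substr() is 1-based, so remainder 0 -> "x", 1 -> "y", 2 -> "z".
add_mod3_field() {
  awk 'BEGIN { FS = OFS = "," }
       NR == 1 { print $0, "added_field"; next }        # header row
       { print $0, substr("xyz", $1 % 3 + 1, 1) }' "$@"
}

printf 'id,name,state\n1234567890,Smith,Arkansas\n1234567891,Jones,California\n1234567892,White,\n' | add_mod3_field
```

The empty last field on the third row is preserved, so that row ends in ",,z".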

awk space delimiter with empty content

I have a text file which is delimited by space
1 dsfsdf 2
2  3
4 sdfsdf 4
5 sdfsdf 5
When I run
awk -F' ' '{s+=$3} END {print s}' test
It returns 11. It should return 14. I believe awk gets confused by the second line, where there is nothing between the two spaces. How should I modify my command?
Thanks
Try:
awk -F' {1}' '{s+=$3} END {print s}' test
You get:
14
Note: it also works if the test file contains
1 dsfsdf 2 1
2  3 1
4 sdfsdf 4 1
5 sdfsdf 5 1
(I use GNU awk.)
Edit: as #Ed_Morton and #"(9 )*" say, it is better to use a literal space in a bracket expression, [ ]:
awk -F'[ ]' '{s+=$3} END {print s}' test
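This should work too if only the second column has missing values, because the field to sum can then be counted from the end of the line. For the three-column file in the question that is the last field, $NF (for the four-column variant it would be $(NF-1)):
awk '{s+=$NF} END{print s}' test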
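A quick way to check why the separator matters, assuming the second line of the file really contains two consecutive spaces. With the default separator, runs of blanks collapse into one, so the second line has only two fields; with a single literal space as the separator, the empty field is kept. The two function names are made up for this sketch:

```shell
# Default field splitting: "2  3" has two fields, so $3 is empty there.
sum_default() {
  awk '{ s += $3 } END { print s + 0 }' "$@"
}

# One literal space per separator: "2  3" splits into "2", "", "3".
sum_single_space() {
  awk -F'[ ]' '{ s += $3 } END { print s + 0 }' "$@"
}

data='1 dsfsdf 2\n2  3\n4 sdfsdf 4\n5 sdfsdf 5\n'
printf "$data" | sum_default
printf "$data" | sum_single_space
```

The first call prints 11 and the second prints 14.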

Bash - only printing certain parts of a matrix using awk

I want to read a matrix of numbers
1 3 4 5
2 4 9 0
I only want my awk statement to print the first and last numbers, so 1 and 0. I have this so far, but nothing will print. What is wrong with my logic?
awk 'BEGIN {for(i=1;i<NF;i++)
if(i==1)printf("%d ", $i);
else if(i==NF && i==NR)printf("%d ", $i);}'
$ awk '{ if (NR==1) { print $1}} END{print $NF}' matrix
1
0
The above awk program has two parts. The first is:
{ if (NR==1) { print $1}}
This prints the first field (column) of the first record (line) of the file.
The second part is:
END{print $NF}
This part runs only at the end, after the last record (line) has been read. It prints the last field (column) of that line.
Borrowing from unix.com, you can use the following:
awk 'NR == 1 {print $1} END { print $NF }'
This will print the first column of the first line (NR == 1) and, once the input has finished (END), print the final column of the last line.
If I understand the output format you're looking for, this code should capture those values and print them:
awk 'NR == 1 {F = $1} END { L = $NF ; printf("%d %d", F, L) }'
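As for why the original attempt prints nothing: a BEGIN block runs before any input is read, so NF is 0 there and the loop body never executes. A sketch that does the work in per-record actions instead and remembers the last field as it goes, so it does not rely on $NF still being set inside END (first_and_last is just an illustrative name):

```shell
# Remember the first field of line 1 and the last field of every line;
# after the final line, "last" holds the last field of the last line.
first_and_last() {
  awk 'NR == 1 { first = $1 }
       { last = $NF }
       END { if (NR) print first, last }' "$@"
}

printf '1 3 4 5\n2 4 9 0\n' | first_and_last
```

For the matrix in the question this prints "1 0" on a single line.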
awk is line based; NR is the current record (line) number.
An awk program is essentially a list of match => action pairs:
echo "1 3 4 5
2 4 9 0" |
awk 'NR == 1 {print $1;}
END {print $NF;}'
for the first record print the first field;
for the last record print the last field.
Since there are already so many solutions with awk, here is another way with sed.
sed -r ':a;$!{N;ba};s/\s+.*\s+/ /' file
Yet another sed variant:
$ echo $'1 3 4 5\n2 4 9 0' | sed -n '1s/ .*//p;$s/.* //p'
awk 'NR==1{print $1;} END{print $NF;}'
