awk print the last row of file failed - shell

$cat file
1
2
3
4
5
6
7
8
9
0
I want to print the value of the last row.
$awk '{print $NR}' file
1
Why is the output not 0?

Unlike sed, awk has no address that refers directly to the last line. A work-around is:
$ awk '{line=$0} END{print line}' file
0
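As a side note (not part of the original answer), outside awk the most direct tool for just the last line is tail:

```shell
# tail -n 1 prints only the final line of its input.
printf '8\n9\n0\n' > /tmp/file    # /tmp/file is an illustrative path
tail -n 1 /tmp/file
# prints: 0
```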
Discussion
Let's look at your command and see what it actually does. Consider this test file:
$ cat testfile
a b c
A B C
i ii iii
Now, let's run your command:
$ awk '{print $NR}' testfile
a
B
iii
As you can see, print $NR prints the diagonal. In other words, on line number NR, it prints field number NR. So, on the first line, NR=1, the command print $NR prints the first field. On the second line, NR=2, the command print $NR prints the second field. And so on.

Use the following code, which will print the last line of any Input_file. The END section is a built-in awk block whose statements run after the main input loop has finished, so simply printing there outputs the last line read.
awk 'END{print $0}' Input_file
OR
awk 'END{print}' Input_file
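In POSIX-conformant awks, $0 inside END still holds the last record read, which is why END{print} works; saving the line yourself, as in the first answer, is the defensive form for very old awks that reset $0. NR is also still available in END, so the two can be combined (a small sketch, with /tmp/file as an illustrative path):

```shell
printf 'a\nb\nc\n' > /tmp/file
# In END, NR is the total record count and $0 is the last record read.
awk 'END {print "line " NR ": " $0}' /tmp/file
# prints: line 3: c
```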


apply dictionary mapping to the column of a file with awk

I have a text file file.txt with several columns (tab separated), and the first column can contain indexes such as 1, 2, and 3. I want to update the first column so that 1 becomes "one", 2 becomes "two", and 3 becomes "three". I created a bash file a.sh containing:
declare -A DICO=( [1]="one" [2]="two" [3]="three" )
awk '{ $1 = ${DICO[$1]}; print }'
But now when I run cat file.txt | ./a.sh I get:
awk: cmd. line:1: { $1 = ${DICO[$1]}; print }
awk: cmd. line:1: ^ syntax error
I'm not able to fix the syntax. Any ideas? Also there is maybe a better way to do this with bash, but I could not think of another simple approach.
For instance, if the input is a file containing:
2 xxx
2 yyy
1 zzz
3 000
4 bla
The expected output would be:
two xxx
two yyy
one zzz
three 000
UNKNOWN bla
EDIT: Since the OP has now added samples, the solution was changed to match them.
awk 'BEGIN{split("one,two,three",array,",")} {$1=$1 in array?array[$1]:"UNKNOWN"} 1' OFS="\t" Input_file
Explanation:
awk '
BEGIN{ ##BEGIN block: runs once, before any input is read.
split("one,two,three",array,",") ##Create an array whose indices 1, 2, 3 hold the strings one, two, three (comma is the delimiter).
}
{
$1=$1 in array?array[$1]:"UNKNOWN" ##Rebuild the first column: if $1 is an index of array, use array[$1]; otherwise use the string UNKNOWN.
}
1 ##awk works on condition-then-action pairs; 1 is an always-true condition with no action, so the default action (print the current line) runs.
' Input_file ##Input file name.
Since you hadn't shown samples, this couldn't be tested completely; could you please try the following and let me know if it helps.
awk 'function check(value){gsub(value,array[value],$1)} BEGIN{split("one,two,three",array,",")} check(1) check(2) check(3); 1' Input_file
A non-one-liner form of the same solution:
awk '
function check(value){
gsub(value,array[value],$1)
}
BEGIN{
split("one,two,three",array,",")
}
check(1)
check(2)
check(3);
1' OFS="\t" Input_file
The code was tested as follows. Say we have the following Input_file:
cat Input_file
1213121312111122243434onetwothree wguwvrwvrwvbvrwvrvr
vkewjvrkmvr13232424
Then after running the code, the output is:
onetwoonethreeonetwoonethreeonetwooneoneoneonetwotwotwo4three4three4onetwothree wguwvrwvrwvbvrwvrvr
vkewjvrkmvronethreetwothreetwo4two4
Given a dico file containing this:
$ cat dico
1 one
2 two
3 three
You could use this awk script:
awk 'NR==FNR{a[$1]=$2;next}($1 in a){$1=a[$1]}1' dico file.txt
This fills the array a with the content of the dico file, then replaces the first field of each file.txt line whenever that field is present in the array.
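The question's expected output maps unknown keys to "UNKNOWN"; a small extension of the same two-file idea handles that with a ternary (a sketch using the sample dico and input from above, with illustrative /tmp paths):

```shell
printf '1 one\n2 two\n3 three\n' > /tmp/dico
printf '2 xxx\n2 yyy\n1 zzz\n3 000\n4 bla\n' > /tmp/file.txt
# First pass fills the lookup table; second pass replaces $1,
# falling back to "UNKNOWN" for keys not in the mapping (e.g. "4 bla").
awk 'NR==FNR {a[$1]=$2; next} {$1 = ($1 in a) ? a[$1] : "UNKNOWN"} 1' /tmp/dico /tmp/file.txt
```

This produces exactly the expected output from the question, including the UNKNOWN bla line.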

UNIX: How to print out specific lines in a file using sed/awk/grep?

I have data in Unix which was fetched by a command and prints:
line01
line02
line03
line04
line05
line06
line07
line08
line09
line10
line11
line12
and I want to reorder it so that lines 10 to 12 come above lines 1 to 9, like this:
line10
line11
line12
line01
line02
line03
line04
line05
line06
line07
line08
line09
I tried using
<command that fetches the data> | awk 'NR>=10 || NR<=9'
and
<command that fetches the data> | sed -n -e '4,5p' -e '1,3p'
but it still displays in the original order. I'm new to Unix, so I don't know how to properly use awk/sed.
PS. The data is stored in a variable which will then be processed by another command, so I need it reordered this way so that lines 10-12 are processed first. :)
Use head and tail:
$ tail -n 2 file && head -n 3 file
name4
name5
name1
name2
name3
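For the 12-line input in the question, the same idea becomes tail -n 3 file && head -n 9 file. For example (building a sample input first, with /tmp/file as an illustrative path):

```shell
printf 'line%02d\n' $(seq 12) > /tmp/file   # line01 .. line12
tail -n 3 /tmp/file && head -n 9 /tmp/file  # line10-12 first, then line01-09
```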
Your awk and sed approaches do not work because they just say: print lines number X, Y and Z, and those lines are printed as soon as they are found, i.e. in input order. To reorder them, you need to read the whole input first, storing its content, and only then print it.
$ awk -v OFS="\n" '{a[NR]=$0} END {print a[4], a[5], a[1], a[2], a[3]}' file
name4
name5
name1
name2
name3
Or even give the order as a variable:
awk -v order="4 5 1 2 3" '
BEGIN {n=split(order,lines)}
{a[NR]=$0}
END {for (i=1;i<=n;i++) print a[lines[i]]}' file
If you want to give the order of the lines as an argument, you can use process substitution, saying awk '...' <(command) file and using FNR/NR to distinguish between the two inputs.
Or you can use - to read from stdin as first file:
echo "4 5 1 2 3" | awk 'FNR==NR {n=split($0,lines); next}
{a[FNR]=$0}
END {for (i=1;i<=n;i++) print a[lines[i]]}' - file
As one-liner:
$ echo "4 5 1 2 3" | awk 'FNR==NR {n=split($0,lines); next} {a[FNR]=$0} END {for (i=1;i<=n;i++) print a[lines[i]]}' - file
This might work for you (GNU sed):
sed '1h;2,9H;1,9d;12G' file
Replace the hold space with line 1, then append lines 2 to 9 to the hold space and delete lines 1 thru 9. Print all other lines normally but on line 12 append the lines stored in the hold space to the pattern space.
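The same buffer-and-flush idea can be sketched in awk, where the hold space becomes an ordinary string variable (a sketch, assuming the 12-line layout above):

```shell
# Buffer lines 1-9, print lines 10-12 as they arrive,
# then flush the buffered lines at the end.
awk 'NR <= 9 {buf = buf $0 ORS; next} {print} END {printf "%s", buf}' file
```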
Using sort actually:
$ sort -r -s -k 1.5,1.5 /tmp/lines
line10
line11
line12
line01
line02
line03
line04
line05
line06
line07
line08
line09
The -k 1.5,1.5 means only the 5th character of the first field is used for sorting. -r means reverse order, and -s means stable sorting, leaving lines that share the same 5th character in their original order.

Getting repeated lines with awk in Bash

I'm trying to find out which lines are repeated X times in a text file. I'm using awk, but my command doesn't work on lines that begin with the same characters or words; that is, it does not treat the full line as the key.
Using this command I try to get the lines that are repeated 3 times:
awk '++A[$1]==3' ./textfile > ./log
This is what you need hopefully:
awk '{a[$0]++}END{for(i in a){if(a[i]==3)print i}}' File
Increment array a using the whole line ($0) as the index, once per line. At the end, for each index i (an original line), check whether the count a[i] equals 3; if so, print the line i. Hope it's clear.
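The repeat count can also be passed in with -v instead of being hard-coded (a sketch; n=3 reproduces the answer above, and /tmp/textfile is an illustrative path):

```shell
printf 'a\nb\na\nb\na\n' > /tmp/textfile   # sample: "a" three times, "b" twice
# Count whole lines ($0) and print those seen exactly n times.
awk -v n=3 '{c[$0]++} END {for (i in c) if (c[i] == n) print i}' /tmp/textfile
# prints: a
```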
This returns lines repeated 3 times but adds a space at the beginning of each 3x-repeated line:
sort ./textfile | uniq -c | awk '$1 == 3 {$1 = ""; print}' > ./log
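The leading space appears because blanking $1 rebuilds the record, leaving the OFS that followed the count. One way to sketch a fix is to strip the count prefix from $0 with sub instead, which also preserves any internal spacing that a field rebuild would collapse:

```shell
# Remove the "   3 " prefix that uniq -c adds, then print the line as-is.
sort ./textfile | uniq -c | awk '$1 == 3 {sub(/^[ \t]*3[ \t]/, ""); print}' > ./log
```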

Print a comma except on the last line in Awk

I have the following script
awk '{printf "%s", $1"-"$2", "}' $a >> positions;
where $a stores the name of the file. I am actually writing multiple column values into one row. However, I would like to print a comma only if I am not on the last line.
Single pass approach:
cat "$a" | # look, I can use this in a pipeline!
awk 'NR > 1 { printf(", ") } { printf("%s-%s", $1, $2) }'
Note that I've also simplified the string formatting.
Enjoy this one:
awk '{printf t $1"-"$2} {t=", "}' $a >> positions
Yeah, it looks a bit tricky at first sight, so I'll explain. First, let's change printf to print for clarity:
awk '{print t $1"-"$2} {t=", "}' file
and look at what it does for a file with this simple content:
1 A
2 B
3 C
4 D
so it will produce the following:
1-A
, 2-B
, 3-C
, 4-D
The trick is the preceding t variable, which is empty at the beginning. It is only set {t=", "} after it has already been printed {print t ...} for the current line. So as awk keeps iterating, we get the desired sequence.
I would do it by finding the number of lines before running the script, e.g. with coreutils and bash:
awk -v nlines=$(wc -l < $a) '{printf "%s", $1"-"$2} NR != nlines { printf ", " }' $a >>positions
If your file only has 2 columns, the following coreutils alternative also works. Example data:
paste <(seq 5) <(seq 5 -1 1) | tee testfile
Output:
1 5
2 4
3 3
4 2
5 1
Now, replacing tabs with newlines, paste easily assembles the data into the desired format:
<testfile tr '\t' '\n' | paste -sd-,
Output:
1-5,2-4,3-3,4-2,5-1
You might think that awk's ORS and OFS would be a reasonable way to handle this:
$ awk '{print $1,$2}' OFS="-" ORS=", " input.txt
But this results in a final ORS because the input contains a newline on the last line. The newline is a record separator, so from awk's perspective there is an empty last record in the input. You can work around this with a bit of hackery, but the resultant complexity eliminates the elegance of the one-liner.
So here's my take on this. Since you say you're "writing multiple column values", it's possible that mucking with ORS and OFS would cause problems. So we can achieve the desired output entirely with formatting.
$ cat input.txt
3 2
5 4
1 8
$ awk '{printf "%s%d-%d",t,$1,$2; t=", "} END{print ""}' input.txt
3-2, 5-4, 1-8
This is similar to Michael's and rook's single-pass approaches, but it uses a single printf and correctly uses the format string for formatting.
This will likely perform negligibly better than Michael's solution because an assignment should take less CPU than a test, and noticeably better than any of the multi-pass solutions because the file only needs to be read once.
Here's a better way, without resorting to coreutils: read the file twice, count the lines (c) on the first pass, then on the second pass switch ORS to a newline only for the last line:
awk 'FNR==NR { c++; next } { ORS = (FNR==c ? "\n" : ", "); print $1, $2 }' OFS="-" file file
awk '{a[NR]=$1"-"$2} END{for(i=1;i<NR;i++) printf "%s, ", a[i]; printf "%s", a[NR]}' $a > positions

Deleting the first two lines of a file using BASH or awk or sed or whatever

I'm trying to delete the first two lines of a file by just not printing it to another file. I'm not looking for something fancy. Here's my (failed) attempt at awk:
awk '{ (NR > 2) {print} }' myfile
That throws out the following error:
awk: { NR > 2 {print} }
awk: ^ syntax error
Example:
contents of 'myfile':
blah
blahsdfsj
1
2
3
4
What I want the result to be:
1
2
3
4
Use tail:
tail -n+3 file
from the man page:
-n, --lines=K
output the last K lines, instead of the last 10; or use -n +K
to output lines starting with the Kth
How about:
tail +3 file
OR
awk 'NR>2' file
OR
sed '1,2d' file
You're nearly there. Try this instead:
awk 'NR > 2 { print }' myfile
awk is rule based; the rule (pattern) appears bare, without braces, before the action block it triggers.
Also, as Jaypal has pointed out, if all you want to do is print the lines that match the rule, you can omit the action entirely, simplifying the command to:
awk 'NR > 2' myfile
awk is based on pattern{action} statements. In your case, the pattern is NR>2 and the action you want to perform is print. This action is also the default action of awk.
So even though
awk 'NR>2{print}' filename
would work fine, you can shorten it to
awk 'NR>2' filename
