unix command: how to get top n records - shell

I want to get the top n records using a unix command.
e.g.
input:
1 a
2 b
3 c
4 d
5 e
output (get top 3):
5 e
4 d
3 c
Currently I am doing:
cat myfile.txt | sort -k1nr | head -3 > my_output.txt
It works fine, but when the file gets large it becomes very slow.
It is slow because it sorts the file completely, while all I need is the top 3 records.
Is there any command I can use to get the top 3 records?

perl -ane '
BEGIN {@top = ([-1]) x 3}
if ($F[0] > $top[0][0]) {
@top = sort {$a->[0] <=> $b->[0]} @top[1,2], [$F[0], $_];
}
END {print for reverse map {$_->[1]} @top}
' << END_DATA
1 a
2 b
3 c
4 d
5 e
END_DATA
5 e
4 d
3 c

Have you tried changing the order of your command? Like this:
sort -k1nr myfile.txt | head -3 > my_output.txt
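Reordering avoids the useless cat, but sort still has to order the entire file. If the full sort is the real bottleneck, a single pass that only tracks the current top 3 is enough. A minimal awk sketch (assuming a numeric key in the first field and at least n input lines):
awk -v n=3 '
NR <= n { top[NR] = $0; next }                 # seed the pool with the first n lines
{
    min = 1                                    # find the smallest key currently kept
    for (i = 2; i <= n; i++) if (top[i] + 0 < top[min] + 0) min = i
    if ($1 + 0 > top[min] + 0) top[min] = $0   # replace it if the new line ranks higher
}
END { for (i = 1; i <= n; i++) print top[i] | "sort -k1nr" }
' myfile.txt > my_output.txt
Only the n surviving lines are sorted at the end, so the cost is proportional to lines × n rather than a full sort of the file.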

Related

How to sort a file based on the last column in unix using the sort command?

a 1
b 2 4
c 3
d 4 5 7
e 4 6
f 5
How can we print the output like below using sort, so that the lines are ordered by the last column?
a 1
c 3
b 2 4
f 5
e 4 6
d 4 5 7
We can achieve the result using awk:
$ awk '{print $NF,$0}' file.txt | sort -n | cut -f2- -d' '
a 1
c 3
b 2 4
f 5
e 4 6
d 4 5 7
Could you please try the following and let me know if this helps you:
rev Input_file | sort -nk1.1 | rev
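Note that the rev trick compares the reversed line character by character, so it only behaves when the last column is a single digit. For multi-digit values, a decorate-sort-undecorate with a tab delimiter is safer (a sketch, assuming the data itself contains no tabs):
awk -v OFS='\t' '{print $NF, $0}' file.txt | sort -k1,1n | cut -f2-
The tab keeps the sort key cleanly separated from the original line, so cut can strip it off afterwards.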

join command leaving out a row of numbers

I have two files, and I want to pull out the rows that have common data in the third column. But join is leaving out a row that should be matched.
File1
b b b
4 5 3
c c c
File2
1 2 3 4
a b c d
e f g h
i j k l
l m n o
The output is:
c c c a b d
The command used is:
join -1 3 -2 3 --nocheck-order File1.txt File2.txt
It misses the row with 3 as the common field, even with --nocheck-order in place.
Edit:
Expected output:
c c c a b d
3 4 5 1 2 4
As an alternative to two sort commands (which can be very expensive for big files) followed by a join, you can use this single awk command to get your output:
awk 'FNR == NR{a[$3]=$0; next} $3 in a{print $3, a[$3], $1, $2, $4}' file1 file2
3 4 5 3 1 2 4
c c c c a b d
Explanation:
NR == FNR { # While processing the first file
a[$3] = $0 # store the whole line in array a using $3 as key
next
}
$3 in a { # while processing the 2nd file, when $3 is found in array
print $3,a[$3],$1,$2,$4 # print relevant fields from file2 and the remembered
# value from the first file.
}
Note that --nocheck-order only suppresses join's warning; join still requires input that is sorted on the join field. So you need to sort your inputs (e.g. using process substitution):
$ join -1 3 -2 3 <(sort -k3 1.txt) <(sort -k3 2.txt)
3 4 5 1 2 4
c c c a b d
This is equivalent to:
$ sort -k3 1.txt > 1-sorted.txt
$ sort -k3 2.txt > 2-sorted.txt
$ join -1 3 -2 3 1-sorted.txt 2-sorted.txt
3 4 5 1 2 4
c c c a b d

bash print complete lines where just the first n characters match

I have created a sorted list of hashes for certain files
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/camera/London 170713/P9110042.JPG
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/london/P9110042.JPG
Where there are duplicate hashes (comparing just the hashes), I want to print the whole line of every match.
So say there were hashes A, B and C:
A 1
B 2
B 3
C 4
C 5
C 6
In this example, all the lines except the first one should be printed:
B 2
B 3
C 4
C 5
C 6
Before you continue, look up fdupes.
If you don't want to use a robust tool specifically intended to find duplicate files, you can use sort | uniq:
$ cat file
A 1
B 2
B 3
C 4
C 5
C 6
$ sort file | uniq -w 1 -D
B 2
B 3
C 4
C 5
C 6
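For the original listing, where the key is a 40-character SHA-1 hash, the same idea applies with a wider comparison window (hashes.txt stands in for your list; -w and -D are GNU uniq extensions):
sort hashes.txt | uniq -w 40 -D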
Using awk you can do this (it works with an unsorted file as well):
awk 'FNR==NR{seen[$1]++; next} seen[$1]>1' file file
B 2
B 3
C 4
C 5
C 6

Unix Command (Mac OS): cut and move rows

Could you please give me a hint which unix command I can use to do the following:
I want to convert these lines...
1 a i
2 b ii
3 c iii
4 d iv
5 e v
6 f vi
7 g vii
8 h viii
9 i xi
...into those:
1 a i 4 d iv 7 g vii
2 b ii 5 e v 8 h viii
3 c iii 6 f vi 9 i xi
rs and perl -pne just transpose them, but I need a completely new arrangement, as you can see. Perl code would be favored, but I am thankful for any help.
cheers
marsch
Using a perl one-liner
perl -lne 'push @{$l[($.-1) % 3]}, $_; }{ print "@$_" for @l' data.txt | column -t
Explanation:
Switches:
-l: Enable line ending processing, specifies line terminator
-n: Creates a while(<>){..} loop for each line in your input file.
-e: Tells perl to execute the code on command line.
Code:
push @{$l[($.-1) % 3]}, $_;: Push each line onto one of three arrays, chosen by line number modulo 3
}{ print "@$_" for @l: Print the three arrays at the end of processing (the }{ closes the implicit loop, giving an END-like block)
| column -t: Even out the columns
I would go with split and paste from coreutils. Try the following commands:
split -l3 infile
paste -d' ' xaa xab xac | column -t
(split writes the three-line chunks to xaa, xab and xac by default.)
Output:
1 a i 4 d iv 7 g vii
2 b ii 5 e v 8 h viii
3 c iii 6 f vi 9 i xi
Here is a one-liner:
perl -ne 'chomp; push @a,$_ if $_; unless($. % 3) {push @f,[@a]; @a = ()} END {for my $i (@f) { for (@$i) {print "$_ "} print "\n"}}' filename.txt
output
1 a i 2 b ii 3 c iii
4 d iv 5 e v 6 f vi
7 g vii 8 h viii 9 i xi
I use Ruby:
string = "1 a i
2 b ii
3 c iii
4 d iv
5 e v
6 f vi
7 g vii
8 h viii
9 i xi "
ary = string.split("\n")
length = ary.size / 3
new_ary = Array.new(3, "")
ary.each_with_index do |e, i|
position = i % 3
new_ary[position] += e + " " # add a separator so the fields don't run together
end
puts new_ary.join("\n")
Hope to help:)
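If GNU pr is available, it lays columns out down-then-across by default, which is exactly the requested arrangement; a minimal sketch:
pr -3 -t -s' ' infile | column -t
Here -3 requests three columns, -t suppresses page headers and trailers, and -s' ' joins the columns with a single space before column -t evens them out.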

Get n last records and change particular columns on them

I have a file like this:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
* a
0 b
I want to delete the a and b from the last two records, in the END{} section.
Result:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
How can I get the last n lines and change fields on them with awk?
Here's one way using any awk (the command substitution is quoted because BSD wc pads its count with blanks):
awk -v count="$(wc -l <file.txt)" 'NR > count - 2 { $2 = "" }1' file.txt
Results:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
Or, to apply the awk operation to all records except the last 2 lines of the input file, as a shell script, try ./script.sh file.txt. Contents of script.sh:
command=$(awk -v count="$(wc -l <"$1")" 'NR <= count - 2 { $2 = "" }1' "$1")
printf '%s\n' "$command"
Results:
1 "45554323" p b
2 "34534567" f a
3 "76546787" u b
2 "56765435" f a
* a
0 b
If you know the value of n, the line number after which you want to delete the last item on each line (here 4), this will work:
awk '{if (NR>4) NF=NF-1}1' data.txt
will give:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
NF = NF - 1 makes awk think there is one field fewer on the line than there really is, so the last column is no longer printed once that condition is met. NR is the current line number in the file being read. (Most awks, including gawk, rebuild the record when NF is assigned; strictly speaking, POSIX leaves decrementing NF undefined.)
awk can't know the number of lines in a file unless it goes through it once, or is given that information (e.g., wc -l). An alternative approach would be to save the last n lines in a buffer (sort of a sliding window/tape-delay type analogy, you are always printing n lines behind) and then process the final n lines in the END block.
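A minimal sketch of that buffering approach (assuming GNU awk, where decrementing NF rebuilds the record, and a file with at least n lines):
awk -v n=2 '
NR > n { print buf[NR % n] }       # a line leaving the window cannot be one of the last n
{ buf[NR % n] = $0 }               # remember the most recent n lines
END {
    for (i = NR - n + 1; i <= NR; i++) {
        $0 = buf[i % n]; NF--; print   # strip the last field from each buffered line
    }
}
' file.txt
This reads the file only once and needs neither wc -l nor a second pass.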
This doesn't exactly answer your question but it produces the output you require:
$ gawk '{if (NF < 3) print $1; else print}' input.txt
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
$ cat file
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
* a
0 b
The BEGIN block appends a second copy of the file name to ARGV, so awk reads the file twice: the first pass counts the lines into nr, the second strips the last field from the final two lines.
$ awk 'BEGIN{ARGV[ARGC]=ARGV[ARGC-1]; ARGC++} NR==FNR{nr++; next} FNR>(nr-2) {NF--} 1' file
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
or if you don't mind manually specifying the file name twice:
awk 'NR==FNR{nr++; next} FNR>(nr-2) {NF--} 1' file file
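Where GNU tac is available, another way to touch only the last n lines without counting them first (a sketch): reverse the file, trim the first two lines, and reverse back:
tac file | awk 'FNR <= 2 { NF-- } 1' | tac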
