bash: print complete lines where just the first n characters match

I have created a sorted list of hashes for certain files:
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/camera/London 170713/P9110042.JPG
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/london/P9110042.JPG
Where there are duplicate hashes (comparing just the hashes), I want to print the whole line of every match.
So say there were hashes A, B and C:
A 1
B 2
B 3
C 4
C 5
C 6
In this example, all the lines except the first one should be printed:
B 2
B 3
C 4
C 5
C 6

Before you continue, look up fdupes.
If you don't want to use a robust tool specifically intended to find duplicate files, you can use sort | uniq:
$ cat file
A 1
B 2
B 3
C 4
C 5
C 6
$ sort file | uniq -w 1 -D
B 2
B 3
C 4
C 5
C 6
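Applied to the original hash list, the key is a 40-character SHA-1 hex digest rather than a single letter, so (assuming GNU uniq, and using hashes.txt as a placeholder file name) the comparison width would be 40 instead of 1:
$ sort hashes.txt | uniq -w 40 -D
This prints every line whose first 40 characters, i.e. the hash, occur more than once.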

Using awk you can do the following (this will also work with an unsorted file):
awk 'FNR==NR{seen[$1]++; next} seen[$1]>1' file file
B 2
B 3
C 4
C 5
C 6
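Run against the actual hash list (again using hashes.txt as a placeholder name), the hash is the first field, so the same two-pass idea applies directly; a minimal sketch:
# pass 1: count how many times each hash ($1) appears
# pass 2: print only the lines whose hash was seen more than once
awk 'FNR==NR{seen[$1]++; next} seen[$1]>1' hashes.txt hashes.txt
Because the file is read twice, no sorting is needed, at the cost of keeping one counter per distinct hash in memory.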

Related

Use shell to convert three formats in one script to another script at one time

cat file1.txt
set A B 1
set C D E 2
set E F 3 3 3 3 3 3
cat file2.txt
A;B;1;
C;D.E;2;
E;F;3 3 3 3 3 3;
Please help me convert the format in file1.txt to that of file2.txt (file2.txt is the desired output). I only put 3 lines in file1.txt as an example; in reality there are many command lines matching these 3 formats, so the shell command should adapt to any content in file1.txt that follows them.
echo "set A B 1
set C D E 2
set E F 3 3 3 3 3 3 " | sed -r 's/set (.) /\1;/;s/([A-Z])*( ([A-Z]))/\1.\3/g;s/([A-Z]) ([0-9])/\1;\2/;s/ ?$/;/'
A;B;1;
C;D.E;2;
E;F;3 3 3 3 3 3;
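If the sed is hard to follow, here is an awk sketch of the same conversion. It assumes every line is the word set, then one or more single-letter fields, then the remaining fields as the value (that layout is my reading of the examples, not something stated in the question):
awk '{
    # fields 2..i are the single uppercase letters following "set"
    i = 2
    while (i + 1 <= NF && $(i + 1) ~ /^[A-Z]$/) i++
    # first letter, then any further letters joined by ".", then the rest joined by spaces
    key = $2; mid = ""; val = ""
    for (j = 3; j <= i; j++) mid = (mid == "" ? $j : mid "." $j)
    for (j = i + 1; j <= NF; j++) val = (val == "" ? $j : val " " $j)
    print key ";" mid ";" val ";"
}' file1.txt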

Find and replace entries in one csv file using another with bash

Main file:
A B
C D
D A
G H
Ref file:
1 A
2 B
3 C
4 D
5 G
6 H
New file:
1 2
3 4
4 1
5 6
I want to do the above replacement; how can I do that using awk or some simple command line?
awk solution:
awk 'NR==FNR{ a[$2]=$1; next }{ $1=a[$1]; $2=a[$2] }1' reffile mainfile
The output:
1 2
3 4
4 1
5 6
a[$2]=$1 - capturing numbers from reffile into array indexed by letters (e.g. a["A"]=1)
$1=a[$1]; $2=a[$2] - replacing letters in mainfile with respective numbers
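One caveat (my addition, not from the original answer): if a letter in mainfile has no entry in reffile, a[$1] is empty and that field is silently blanked. A defensive variant keeps the original value when no mapping exists:
# fall back to the original letter when it has no mapping in reffile
awk 'NR==FNR{ a[$2]=$1; next }
     { if ($1 in a) $1=a[$1]; if ($2 in a) $2=a[$2] }1' reffile mainfile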

join command leaving out a row of numbers

I have two files and I want to pull out the rows which have common data in the third column, but join is leaving out a row which should be matched.
File1
b b b
4 5 3
c c c
File2
1 2 3 4
a b c d
e f g h
i j k l
l m n o
The output is:
c c c a b d
The command used is:
join -1 3 -2 3 --nocheck-order File1.txt File2.txt
It is missing the row with 3 as the common field, even with --nocheck-order in place.
Edit:
Expected output:
c c c a b d
3 4 5 1 2 4
As an alternative to two sort commands (which can be very expensive for big files) followed by a join, you can use this single awk command to get your output:
awk 'FNR == NR{a[$3]=$0; next} $3 in a{print $3, a[$3], $1, $2, $4}' file1 file2
3 4 5 3 1 2 4
c c c c a b d
Explanation:
FNR == NR {                     # while processing the first file
    a[$3] = $0                  # store the whole line in array a, using $3 as the key
    next
}
$3 in a {                       # while processing the 2nd file, when $3 is found in the array
    print $3, a[$3], $1, $2, $4 # print the relevant fields from file2 and the remembered
                                # line from the first file
}
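The key appears twice in each output line because a[$3] stores the whole file1 line, key included. If you want it shaped like the expected output above (key printed once), a small variant of the same command (my tweak, not part of the original answer) stores only the non-key fields of file1:
awk 'FNR==NR{a[$3]=$1" "$2; next} $3 in a{print $3, a[$3], $1, $2, $4}' file1 file2
3 4 5 1 2 4
c c c a b d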
You need to sort your inputs (e.g. using process substitution); --nocheck-order only suppresses join's ordering warning, it does not let join find matches in unsorted input:
$ join -1 3 -2 3 <(sort -k3 1.txt) <(sort -k3 2.txt)
3 4 5 1 2 4
c c c a b d
This is equivalent to:
$ sort -k3 1.txt > 1-sorted.txt
$ sort -k3 2.txt > 2-sorted.txt
$ join -1 3 -2 3 1-sorted.txt 2-sorted.txt
3 4 5 1 2 4
c c c a b d

unix command: how to get top n records

I want to get the top n records using a Unix command:
e.g.
input:
1 a
2 b
3 c
4 d
5 e
output(get top 3):
5 e
4 d
3 c
Currently I am doing:
cat myfile.txt | sort -k1nr | head -3 > my_output.txt
It works fine but when the file gets large, it becomes very slow.
It is slow because it sorts the file completely, while what I need is just the top 3 records.
Is there any command I can use to get the top 3 records?
perl -ane '
    BEGIN {@top = ([-1]) x 3}
    if ($F[0] > $top[0][0]) {
        @top = sort {$a->[0] <=> $b->[0]} @top[1,2], [$F[0], $_];
    }
    END {print for reverse map {$_->[1]} @top}
' << END_DATA
1 a
2 b
3 c
4 d
5 e
END_DATA
5 e
4 d
3 c
Have you tried dropping the unnecessary cat and letting sort read the file directly?
Like this:
sort -k1nr myfile.txt | head -3 > my_output.txt
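If you want to avoid sorting the whole file with shell tools alone, a rough single-pass awk sketch (my own, not from the answers above; it assumes the first field is numeric) keeps just the three largest lines seen so far:
awk '
{
    if (n < 3) { n++; key[n] = $1 + 0; line[n] = $0; next }
    # find the smallest of the three kept keys and replace it if the current line is larger
    min = 1
    for (i = 2; i <= 3; i++) if (key[i] < key[min]) min = i
    if ($1 + 0 > key[min]) { key[min] = $1 + 0; line[min] = $0 }
}
END {
    # print the kept lines from largest to smallest key
    for (out = 1; out <= n; out++) {
        max = 1
        for (i = 2; i <= n; i++) if (key[i] > key[max]) max = i
        print line[max]
        key[max] = -1e308   # mark as used (assumes real keys are larger than this)
    }
}' myfile.txt > my_output.txt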

Add to the end of a predetermined line using sed in bash

I have a file in the format:
C 1 1 2
H 2 2 1
C 3 1 2
C 3 3 2
H 2 3 1
I need to add " f" to the end of specific lines, for example the third line, so the output would be:
C 1 1 2
H 2 2 1
C 3 1 2 f
C 3 3 2
H 2 3 1
From Googling, it seems that I need to use sed, but I couldn't find any examples on how to do specifically what I want.
Thanks in advance.
You are looking for sed's line addressing; specifically, restricting a substitution to a given line number. An example:
sed '3 s/$/ f/' < yourFile
awk 'NR==3{$0=$0" f"}1' your_file
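For either approach, if the line number comes from a shell variable, or you want to modify the file in place, a sed sketch (LINE is a placeholder name; -i is GNU sed's in-place option):
LINE=3
sed "${LINE} s/$/ f/" yourFile > newFile   # write the result to a new file
sed -i "${LINE} s/$/ f/" yourFile          # or edit yourFile in place (GNU sed)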
