Tail a file from its last duplicate line using bash - bash

Hey everyone! What is a simple way to find the line number of the last duplicate line in a file?
I need to take the tail of the file from that point. Example:
hhhh
str1
str2
hhhh
str1
hhh
**str1
str2
str3**
I need only the bold part after hhh (str1, str2, str3). Thanks in advance!

Give this a try:
awk '{if (a[$0]) accum = nl = ""; else {a[$0]=1;accum = accum nl $0; nl = "\n"}} END { print accum}' inputfile
Given this input:
aaa
b
c
aaa
d
e
f
aaa
b
aaa
g
h
i
This is the output:
g
h
i
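Written out with comments, the same logic reads as follows (a sketch of the one-liner above, assuming the file fits comfortably in memory):
awk '
{
    if (a[$0]) {                 # this line was seen before: a duplicate,
        accum = nl = ""          # so discard everything accumulated so far
    } else {
        a[$0] = 1                # remember the line
        accum = accum nl $0      # and append it to the running block
        nl = "\n"                # separator for the lines that follow
    }
}
END { print accum }              # whatever survived after the last duplicate
' inputfile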

Taking the sample from Dennis:
$ gawk -vRS="aaa" 'END{print}' file
g
h
i
Here's another way if you don't know the repeated line beforehand, although it's not as elegant as a single awk script.
var=$(sort file| uniq -c|sort -n | tail -1| awk '{print $2}')
gawk -vRS="$var" 'END{print}' file
Still, this will only get the duplicate that occurs most frequently; it does not get the "last duplicate", whatever that means.
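If "last duplicate" is taken literally — the last line whose content already appeared earlier in the file — a two-pass sketch could do it (it reads the file twice by name, so it won't work on a pipe):
# pass 1: remember the line number of the last line whose content was already seen
# pass 2: print only the lines after that point
awk 'NR==FNR { if (seen[$0]++) last = NR; next } FNR > last' file file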

Related

If there's a match, append text to the beginning of the next line

I have a file like this
from a
b
to c
d
from e
f
from g
h
to i
j
If there's a match for from, add to to the beginning of the next line. If there's a match for to, add from to the beginning of the next line. The output should be like this:
from a
to b
to c
from d
from e
to f
from g
to h
to i
from j
Can this be done using any Unix commands?
I have tried the paste command to merge every 2 lines and then using sed, something like this. But it's definitely wrong. Also, I don't know how to split it back again.
paste -d - - <file> | sed "s/\(^from.*\)/\1 to/" | sed "s/\(^to.*\)/\1 from/"
I think there should be an easier solution to this compared to what I'm doing.
Using sed:
sed '/^from/{n;s/^/to /;b};/^to/{n;s/^/from /}'
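Spread out with comments, the same sed program looks like this (a sketch; behaviour should be identical to the one-liner):
sed '
  # a line starting with "from": pull in the next line (n), prefix it with "to ",
  # then branch to the end of the script (b) so the second block is skipped
  /^from/ {
    n
    s/^/to /
    b
  }
  # a line starting with "to": pull in the next line and prefix it with "from "
  /^to/ {
    n
    s/^/from /
  }
' file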
$ awk '{if ($1 ~ /^(from|to)$/) dir=$1; else $0=(dir=="from" ? "to" : "from") OFS $0} 1' file
from a
to b
to c
from d
from e
to f
from g
to h
to i
from j
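The same one-liner, written out with comments (a sketch):
awk '{
  if ($1 ~ /^(from|to)$/)
    dir = $1                                      # remember the last direction word seen
  else
    $0 = (dir == "from" ? "to" : "from") OFS $0   # prefix with the opposite word
} 1' file                                         # the trailing 1 prints every (possibly modified) line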
Could you please try the following.
awk '
{
  val = (prev=="from" && $0 !~ /to/) ? "to " $0 : (prev=="to" && $0 !~ /from/) ? "from " $0 : $0
  prev = $1
  $0 = val
}
1
' Input_file
Something like this should work:
awk '
#Before reading the file, I build a dictionary that links the "from" keyword to the "to" value and vice versa
BEGIN{kw["from"]="to"; kw["to"]="from"}
#If the first word of the line is a key of my dictionary (to or from), I save it in the variable k and print the line
$1 in kw{k=$1;print;next}
#Else I add the "opposite" of k at the beginning of the line
{print kw[k], $0}
' <input>

How to get lines from the last match to the end of file?

I need to print the lines after the last match, up to the end of the file. The number of matches could be anything and is not fixed. I have some text as shown below.
MARKER
aaa
bbb
ccc
MARKER
ddd
eee
fff
MARKER
ggg
hhh
iii
MARKER
jjj
kkk
lll
Output desired is
jjj
kkk
lll
Do I use awk with RS and FS to get the desired output?
You can actually do it with awk (gawk) without using any pipe.
$ awk -v RS='(^|\n)MARKER\n' 'END{printf "%s", $0}' file
jjj
kkk
lll
Explanations:
You define your record separator as (^|\n)MARKER\n via RS='(^|\n)MARKER\n'; by default RS is the newline character (a regexp RS like this is a gawk feature).
'END{printf "%s", $0}' => at the end of the file you print the current record; since RS is set to (^|\n)MARKER\n, $0 holds all the lines after the last MARKER up to EOF.
Another option is to use grep (GNU):
$ grep -zoP '(?<=MARKER\n)(?:(?!MARKER)[^\0])+\Z' file
jjj
kkk
lll
Explanations:
-z to use the ASCII NUL character as delimiter
-o to print only the matching part
-P to activate Perl-compatible regular expressions (PCRE)
PCRE regex: (?<=MARKER\n)(?:(?!MARKER)[^\0])+\Z explained here https://regex101.com/r/RpQBUV/2/
Last but not least, the following sed approach can also be used:
sed -n '/^MARKER$/{n;h;b};H;${x;p}' file
jjj
kkk
lll
Explanations:
n jump to next line
h replace the hold space with the current line
H do the same but instead of replacing, append
${x;p} at the end of the file exchange (x) hold space and pattern space and print (p)
that can be turned into:
tac file | sed -n '/^MARKER$/q;p' | tac
if we use tac.
Could you please try the following.
tac file | awk '/MARKER/{print val;exit} {val=(val?val ORS:"")$0}' | tac
The benefit of this approach is that awk reads just the last block of the Input_file (which is actually the first block awk sees, since tac prints the file in reverse) and exits right after that.
Explanation:
tac file | ##Print the Input_file in reverse order.
awk '
/MARKER/{ ##Search for the string MARKER in the current line.
print val ##Print the variable val: we need the last occurrence of MARKER, which has become the first one after the Input_file was reversed.
exit ##Exit the awk program right after printing.
}
{
val=(val?val ORS:"")$0 ##Keep appending the current line to val (newline separated) to collect the lines that precede MARKER.
}
' | ##Send the output of awk to tac again, since the lines are still in reverse order.
tac ##Restore the original order (the lines were reversed by the first tac).
You can try Perl as well
$ perl -0777 -ne ' /.*MARKER(.*)/s and print $1 ' input.txt
jjj
kkk
lll
$
This might work for you (GNU sed):
sed -nz 's/.*MARKER.//p' file
This uses the regex's greediness to delete all lines up to and including the last occurrence of MARKER.
Simplest to remember:
tac fun.log | sed "/MARKER/Q" | tac
This awk solution would work with any version of awk on any OS:
awk '/^MARKER$/ {s=""; next} {s = s $0 RS} END {printf "%s", s}' file
jjj
kkk
lll
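The same portable script, spread out with comments (a sketch):
awk '
  /^MARKER$/ { s = ""; next }   # a marker line: throw away what has been collected so far
  { s = s $0 RS }               # any other line: append it plus a record separator (newline)
  END { printf "%s", s }        # at EOF, s holds everything after the last MARKER
' file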

How to print lines with the specified word in the path?

Let's say I have file abc.txt which contains the following lines:
a b c /some/path/123/path/120
a c b /some/path/312/path/098
a p t /some/path/123/path/321
a b c /some/path/098/path/123
and numbers.txt:
123
321
123
098
I want to print only the whole lines that have "123" as the third path component, i.e. under "/some/path/123/path". I don't want to print the line
"a c b /some/path/312/path/098" or
"a b c /some/path/098/path/123". I want to save all lines with "123" in that position to a new file.
I tried several methods and the best way seems to be to use awk. Here is my example code, which is not working correctly:
for i in `cat numbers.txt | xargs`
do
cat abc.txt | awk -v i=$i '$4 ~ /i/ {print $0}' > ${i}_number.txt;
done
because it also catches, for example, "a b c /some/path/098/path/123".
Example:
For number "123" I want to save only one line from abc.txt in 123_number.txt:
a b c /some/path/123/path/120
For number "312" I want to save only one line from abc.txt in 312_number.txt:
a c b /some/path/312/path/098
this can be accomplished in a single awk call:
$ awk -F'/' 'NR==FNR{a[$0];next} ($4 in a){f=$4"_number.txt";print >>f;close(f)}' numbers.txt abc.txt
$ cat 098_number.txt
a b c /some/path/098/path/123
$ cat 123_number.txt
a b c /some/path/123/path/120
a p t /some/path/123/path/321
Keep the numbers in an array and use it to match lines; append matching lines to the corresponding files.
If your files are huge, you may speed up the process using sort:
sort -t'/' -k4 abc.txt | awk -F'/' 'NR==FNR{a[$0];next} ($4 in a){if($4!=p){close(f);f=(p=$4)"_number.txt"};print >>f}' numbers.txt -
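For what it's worth, the loop from the question could also be fixed by comparing the fourth /-separated field exactly instead of doing a regex match — a sketch (it rereads abc.txt once per number, so the single awk call above remains the better option):
while read -r n; do
  # with -F'/' the third path component ("123" in /some/path/123/path/120) is $4;
  # an exact comparison avoids matching the number elsewhere in the path
  awk -F'/' -v n="$n" '$4 == n' abc.txt > "${n}_number.txt"
done < numbers.txt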

Bash: Separating a file by blank lines and assigning to a list

So I have a file, for example:
a

b
c

d
I'd like to make a list of the lines with data out of this. The empty line would be the separator. So the above file's list would be:
First element = a
Second element = b
c
Third element = d
Replace blank lines with a comma, then remove the newline characters:
cat <file> | sed 's/^$/, /' | tr -d '\n'
The following awk would do:
awk 'BEGIN{RS="";ORS=",";FS="\n";OFS=""}($1=$1)' file
This adds an extra , at the end. You can get rid of that in the following way:
awk 'BEGIN{RS="";ORS=",";FS="\n";OFS=""}
{$1=$1;s=s $0 ORS}END{sub(ORS"$","",s); print s}' file
But by making this slight modification to eliminate the last ORS (i.e. the comma), you now have to store the full thing in memory. So you could just as well do it in a more boring and less elegant way by storing the full file in memory:
awk '{s=s $0}END{gsub(/\n\n/,",",s);gsub(/\n/,"",s); print s}' file
The following sed does exactly the same. Store the full file in memory and process it.
sed ':a;N;$!ba;s/\n\n/,/g;s/\n//g' <file>
There is, however, a way to play it a bit more cleverly with awk.
awk 'BEGIN{RS=OFS="";FS="\n"}{$1=$1; print (NR>1?",":"")$0}' file
It depends on what you need to do with that data.
With perl, you have a one-liner:
$ perl -00 -lnE 'say "element $. = $_"' file.txt
element 1 = a
element 2 = b
c
element 3 = d
But clearly you need to process the elements in some way, and I suspect Perl is not your cup of tea.
With bash you could do:
elements=()
n=0
while IFS= read -r line; do
[[ $line ]] && elements[n]+="$line"$'\n' || ((n++))
done < file.txt
# strip the trailing newline from each element
elements=("${elements[#]/%$'\n'/}")
# and show what's in the array
declare -p elements
declare -a elements='([0]="a" [1]="b
c" [2]="d")'
$ awk -v RS= '{print "Element " NR " = " $0}' file
Element 1 = a
Element 2 = b
c
Element 3 = d
If you really want to say First Element instead of Element 1 then enjoy the exercise :-).

awk find and replace variable file2 into file1 when matched

I tried a couple of answers from similar questions but am not quite getting the correct results. I'm trying to look up each value from the first file in the second file and replace it with the second file's value when there's a match, otherwise keep the original...
File1.txt
a
2
c
4
e
f
File2.txt
2 b
4 d
Wanted Output.txt
a
b
c
d
e
f
So far what I have seems to sort of work, but anywhere the replacement is happening I'm getting a blank row instead of the new variable...
Current Output.txt
a
c
e
f
Current code...
awk -F'\t' 'NR==FNR{a[$1]=$2;next} {print (($1 in a) ? a[$1] : $1)}' file2.txt file1.txt > output.txt
Also tried this and got the same results...
awk -F'\t' 'NR==FNR{a[$1]=$2;next} {$1 = a[$1]}1' file2.txt file1.txt > output.txt
Sorry, I wrote it incorrectly at first; I've fixed the key/value issue.
I did try what you suggested, but the values are still missing in output.txt:
awk -F'\t' 'NR==FNR{a[$1]=$2;next} $1 in a{$1 = a[$1]}1' file2.txt file1.txt > output.txt
Your key/value pair is not right... $1 is the key, $2 is the value.
$ awk -F'\t' 'NR==FNR{a[$1]=$2;next} $1 in a{$1=a[$1]}1' file.2 file.1
a
b
c
d
e
f
Try the below solution:
awk 'NR==FNR{a[$1]=$NF;next} {print (a[$NF]?a[$NF]:$1)}' file2.txt file1.txt
a
b
c
d
e
f
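One more thing worth checking: File2.txt in the question looks space-separated ("2 b"), while both attempts force a tab separator with -F'\t'. If the real file uses spaces, $2 is empty and the replacement produces exactly the blank rows described. A sketch that simply lets awk split on its default whitespace:
awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]} 1' file2.txt file1.txt > output.txt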
