Remove duplicate entries in a Bash script [duplicate]


This question already has answers here:
How to delete duplicate lines in a file without sorting it in Unix
(9 answers)
Closed 7 years ago.
I want to remove duplicate entries from a text file, e.g.:
kavitha= Tue Feb 20 14:00 19 IST 2012 (duplicate entry)
sree=Tue Jan 20 14:05 19 IST 2012
divya = Tue Jan 20 14:20 19 IST 2012
anusha=Tue Jan 20 14:45 19 IST 2012
kavitha= Tue Feb 20 14:00 19 IST 2012 (duplicate entry)
Is there a way to remove the duplicate entries using a Bash script?
Desired output:
kavitha= Tue Feb 20 14:00 19 IST 2012
sree=Tue Jan 20 14:05 19 IST 2012
divya = Tue Jan 20 14:20 19 IST 2012
anusha=Tue Jan 20 14:45 19 IST 2012

You can sort and remove duplicates in one step with sort -u (note that this reorders the lines):
$ sort -u input.txt
Or use awk, which keeps the first occurrence of each line and preserves the original order:
$ awk '!a[$0]++' input.txt
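To see the difference between the two commands, here is a small sketch run on the question's data. The function names (sample_entries, dedupe_sorted, dedupe_keep_order) are made up for illustration, and LC_ALL=C is only there to make the sort order predictable.

```bash
#!/usr/bin/env bash
# Sample data taken from the question; the last line repeats the first.
sample_entries() {
  printf '%s\n' \
    'kavitha= Tue Feb 20 14:00 19 IST 2012' \
    'sree=Tue Jan 20 14:05 19 IST 2012' \
    'divya = Tue Jan 20 14:20 19 IST 2012' \
    'anusha=Tue Jan 20 14:45 19 IST 2012' \
    'kavitha= Tue Feb 20 14:00 19 IST 2012'
}

# sort -u removes duplicates but also reorders the lines.
dedupe_sorted() { LC_ALL=C sort -u; }

# awk prints a line only the first time it is seen, so input order is kept.
dedupe_keep_order() { awk '!seen[$0]++'; }

echo '--- sort -u ---'
sample_entries | dedupe_sorted
echo '--- awk ---'
sample_entries | dedupe_keep_order
```

If the order of the entries matters, as in the desired output above, the awk variant is probably the one you want.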

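This sed command deletes consecutive duplicate lines from a file (it emulates "uniq").
The first line in each run of duplicates is kept and the rest are deleted. Because only adjacent lines are compared, duplicates that are not next to each other (as in the question's example) survive unless the input is sorted first.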
sed '$!N; /^\(.*\)\n\1$/!P; D'
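A quick sketch of that limitation, using made-up single-letter lines. The function name dedupe_adjacent is hypothetical; the sed script is the one shown above.

```bash
#!/usr/bin/env bash
# Wrap the sed one-liner so it can be reused in a pipeline.
dedupe_adjacent() { sed '$!N; /^\(.*\)\n\1$/!P; D'; }

# The two adjacent "a" lines collapse, but the later "a" is kept
# because it is not next to another "a".
printf '%s\n' a a b a | dedupe_adjacent

# Sorting first makes all duplicates adjacent, at the cost of reordering.
printf '%s\n' a a b a | LC_ALL=C sort | dedupe_adjacent
```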

Perl one-liner similar to @kev's awk solution:
perl -ne 'print if ! $a{$_}++' input
This variation removes trailing whitespace before comparing:
perl -lne 's/\s*$//; print if ! $a{$_}++' input
This variation edits the file in-place:
perl -i -ne 'print if ! $a{$_}++' input
This variation edits the file in place and keeps a backup named input.bak:
perl -i.bak -ne 'print if ! $a{$_}++' input
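If Perl is not available, the whitespace-trimming variation can be approximated in awk. This is only a sketch: the function name is made up, and it strips trailing spaces and tabs only, which is narrower than Perl's \s.

```bash
#!/usr/bin/env bash
# Strip trailing spaces/tabs from each line, then print the line only
# the first time the trimmed text is seen.
dedupe_ignoring_trailing_space() {
  awk '{ sub(/[ \t]+$/, "") } !seen[$0]++'
}

# "sree=Tue " (trailing space) and "sree=Tue" count as the same entry.
printf 'sree=Tue \nsree=Tue\ndivya=Tue\t\n' | dedupe_ignoring_trailing_space
```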

This might work for you:
cat -n file.txt |
sort -u -k2,7 |
sort -n |
sed 's/.*\t/ /;s/\([0-9]\{4\}\).*/\1/'
or this:
awk '{line=substr($0,1,match($0,/[0-9][0-9][0-9][0-9]/)+3);sub(/^/," ",line);if(!dup[line]++)print line}' file.txt

Related

convert multiple variable output to a table in bash

I have 3 variables: $commonName, $expiryDate and $daysRemInUnixEpoch. Each variable holds 3 lines, as shown in the output below. I want to display the contents of the 3 variables in 3 separate columns. I tried to find a solution using printf, but had no luck. Has anyone done this with printf before, and if so, how? Any help will be much appreciated.
Below, the 3 variables are printed one after another in a single column. I want to split this into 3 columns with 3 rows each.
bash-4.1$ echo -e "$commonName\n$expiryDate\n$daysRemInUnixEpoch"
mycertificate_mycert.mycomp.net
PSIN0P551
ROOTROOTCA
Feb 6 2022 11:57:32 GMT
Jan 9 2023 18:51:25 GMT
Mar 12 2035 18:24:54 GMT
682
1020
5465
bash-4.1$
The desired output is something like this:
mycertificate_mycert.mycomp.net Feb 6 2022 11:57:32 GMT 682
PSIN0P551 Jan 9 2023 18:51:25 GMT 1020
ROOTROOTCA Mar 12 2035 18:24:54 GMT 5465

With bash (Process Substitution), paste and column:
paste -d ';' <(echo "$commonName") <(echo "$expiryDate") <(echo "$daysRemInUnixEpoch") | column -s ';' -t
Output:
mycertificate_mycert.mycomp.net Feb 6 2022 11:57:32 GMT 682
PSIN0P551 Jan 9 2023 18:51:25 GMT 1020
ROOTROOTCA Mar 12 2035 18:24:54 GMT 5465
This assumes that your variables do not contain ;.
See man paste and man column.
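Since the question asks about printf specifically, here is a minimal sketch of a printf-only alternative. It assumes bash 4 or newer (for mapfile); the function name print_table and the column widths 33 and 26 are arbitrary choices for this sample data, not something from the original answer.

```bash
#!/usr/bin/env bash
# Sample values copied from the question.
commonName=$'mycertificate_mycert.mycomp.net\nPSIN0P551\nROOTROOTCA'
expiryDate=$'Feb 6 2022 11:57:32 GMT\nJan 9 2023 18:51:25 GMT\nMar 12 2035 18:24:54 GMT'
daysRemInUnixEpoch=$'682\n1020\n5465'

print_table() {
  local -a names dates days
  local i
  # Split each multi-line variable into an array, one element per line.
  mapfile -t names <<< "$1"
  mapfile -t dates <<< "$2"
  mapfile -t days  <<< "$3"
  # Print row i of every array side by side, left-aligned in fixed widths.
  for i in "${!names[@]}"; do
    printf '%-33s %-26s %s\n' "${names[i]}" "${dates[i]}" "${days[i]}"
  done
}

print_table "$commonName" "$expiryDate" "$daysRemInUnixEpoch"
```

Unlike column -t, this does not compute the widths from the data, so the paste and column pipeline is likely the more robust choice when the field lengths vary.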

Prepend to lines of a program as they come in

I'm running xinput test and trying to timestamp the data.
From another question, I'm using:
xinput test $KEYBOARD_ID | (echo -n $(date +"$date_format") && cat) > $LOGFILE_NAME
However, that dates the first line, not every line.
If I do a while loop:
while IFS= read -r line
do
(echo -n $(date +"$date_format") && cat)
done < $(xinput test $KEYBOARD_ID)
The loop exits right away, since xinput test is yet to generate any text.
Process substitution fails as well, only dating the first line of the file.
while IFS= read -r line
do
(echo -n $(date +"$date_format") && cat) > $LOGFILE_NAME
done < <(xinput test $KEYBOARD_ID)
Writing to file and post-processing won't work, because I need the timestamp when each line was processed.
I feel like I'm making a small error, but I can't find it, any input?

The following GNU awk command is equivalent to @karakfa's answer, but launches fewer processes, so it could be faster if the device is generating a lot of events:
xinput test "$KEYBOARD_ID" | gawk '{print strftime(), $0}' > "$LOGFILE_NAME"

perhaps this will help...
$ seq 10 | xargs -n1 -I {} echo $(date) {}
Wed May 10 14:43:09 EDT 2017 1
Wed May 10 14:43:09 EDT 2017 2
Wed May 10 14:43:09 EDT 2017 3
Wed May 10 14:43:09 EDT 2017 4
Wed May 10 14:43:09 EDT 2017 5
Wed May 10 14:43:09 EDT 2017 6
Wed May 10 14:43:09 EDT 2017 7
Wed May 10 14:43:09 EDT 2017 8
Wed May 10 14:43:09 EDT 2017 9
Wed May 10 14:43:09 EDT 2017 10
Note that, as pointed out in the comments, $(date) is expanded once by the shell before xargs starts, so the timestamp is not updated for each line. If you want to timestamp each new line as it arrives, use the gawk solution by user000001.
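If you still want to use xargs, one way to get a fresh timestamp per line is to defer the date call to a child shell. This is a sketch, not part of the original answer; the function name stamp_each is made up.

```bash
#!/usr/bin/env bash
# The single quotes keep $(date) from being expanded by the calling shell.
# xargs starts one "sh -c" per input line, and that child shell runs date
# itself; the input line is passed in as $1.
stamp_each() {
  xargs -I {} sh -c 'echo "$(date) $1"' _ {}
}

printf '%s\n' 1 2 3 | stamp_each
```

This starts a shell and a date process for every line, so it is slower than the gawk approach.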

I feel like I'm making a small error, but I can't find it
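Yep. It's the cat. It inherits the loop's standard input, so after read has taken the first line, cat consumes everything that follows and copies it through without a timestamp. Instead, you should just write the current line, and append it to the file: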
while IFS= read -r line
do
(echo "$(date +"$date_format") $line") >> $LOGFILE_NAME
done < <(xinput test $KEYBOARD_ID)
Which can more canonically be written as:
while IFS= read -r line
do
echo "$(date +"$date_format") $line"
done < <(xinput test $KEYBOARD_ID) > "$LOGFILE_NAME"
I would go for @user000001's shorter and more efficient solution though.
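As a middle ground, the loop can avoid forking date for every line by using the printf builtin's %(...)T time format. This is a sketch that assumes bash 4.2 or newer; timestamp_lines is a hypothetical name, and the printf input below merely stands in for xinput test output.

```bash
#!/usr/bin/env bash
# Prefix every line read from stdin with the time at which it was read.
# $1 is a strftime-style format, like date_format in the question.
timestamp_lines() {
  local date_format="${1:-%F %T}" line
  while IFS= read -r line; do
    # -1 means "the current time"; no external date process is started.
    printf "%(${date_format})T %s\n" -1 "$line"
  done
}

printf 'key press 38\nkey release 38\n' | timestamp_lines '%Y-%m-%d %H:%M:%S'
```

With the real device this would be used as xinput test "$KEYBOARD_ID" | timestamp_lines "$date_format" > "$LOGFILE_NAME".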

Find and Echo only the date (with format) in String Output on Bash

I am trying to extract a date in the format "+%a %b %d %R:%S %Y" from a command's output in bash.
Here's the sample command and its output:
$ xscreensaver-command --time
XScreenSaver 5.32: screen non-blanked since Thu Oct 29 12:15:05 2015 (hacks: #184, #60)
I am trying to get the value Thu Oct 29 12:15:05 2015 out of that string.
How can I achieve this?

Try appending this to your command (it needs GNU grep for the -P option):
2>&1 | grep -Po 'since \K.*(?= \()'
Output:
Thu Oct 29 12:15:05 2015
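Where grep -P is not available, the same substring can be cut out with plain shell parameter expansion. A minimal sketch, assuming the output always has the form "... since DATE (hacks: ...)"; the function name extract_since is made up.

```bash
#!/usr/bin/env bash
# Print the text between "since " and the following " (".
extract_since() {
  local text=$1
  text=${text#*since }   # drop everything up to and including "since "
  text=${text% (*}       # drop the trailing " (hacks: ...)" part
  printf '%s\n' "$text"
}

extract_since 'XScreenSaver 5.32: screen non-blanked since Thu Oct 29 12:15:05 2015 (hacks: #184, #60)'
```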

sed: convert time(3) seconds in a table into printable date (spamdb)

I get the following from spamdb, where the third field represents the time in seconds since the Epoch.
Cns# spamdb | fgrep TRAPPED
TRAPPED|113.163.117.129|1360836903
TRAPPED|113.171.216.201|1360837481
TRAPPED|122.177.159.61|1360844596
TRAPPED|36.231.9.231|1360865649
TRAPPED|37.146.207.209|1360832096
TRAPPED|212.156.98.210|1360837015
TRAPPED|59.99.160.62|1360839785
TRAPPED|86.127.116.162|1360840492
TRAPPED|92.83.139.194|1360843056
TRAPPED|219.71.12.150|1360844704
I want to sort this table by the time, and print the time field with date -r, such that it's presentable and clear when the event has occurred.
How do I do this in tcsh on OpenBSD?
Sorting with sort is easy, and so is editing with sed; but how do I make sed execute date -r or equivalent?

There are indeed a few obstacles here: first, you basically have to separate the data, and then one part of it is presented as-is, whereas another part has to be passed down to date -r for date formatting, prior to being presented to the user.
Another obstacle is making sure the output is aligned: apparently, it's quite difficult to handle the tab character in the shell, possibly only on the BSDs:
sed replace literal TAB
Replacing / with TAB using sed
Also, as we end up piping this to sh for execution, we have to use a different separator for the fields other than the pipe character, |.
So far, this is the best snippet I could come up with; it seems to work great in my tcsh:
Cns# spamdb | fgrep TRAPPED | sort -n -t '|' -k 3 | sed -E -e 's#\|#@#g' \
-e 's#^([A-Z]+)@([0-9.]+)@([0-9]+)$#"echo -n \2_"; "date -r \3"#g' | \
xargs -n1 sh -c | awk '{gsub("_","\t",$0); print;}'
37.146.207.209 Thu Feb 14 00:54:56 PST 2013
113.163.117.129 Thu Feb 14 02:15:03 PST 2013
212.156.98.210 Thu Feb 14 02:16:55 PST 2013
113.171.216.201 Thu Feb 14 02:24:41 PST 2013
59.99.160.62 Thu Feb 14 03:03:05 PST 2013
86.127.116.162 Thu Feb 14 03:14:52 PST 2013
92.83.139.194 Thu Feb 14 03:57:36 PST 2013
122.177.159.61 Thu Feb 14 04:23:16 PST 2013
219.71.12.150 Thu Feb 14 04:25:04 PST 2013
36.231.9.231 Thu Feb 14 10:14:09 PST 2013
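For comparison, here is a sketch of the same sort-then-format idea as a read loop. Note the assumptions: it is written for bash 4.2 or newer rather than the tcsh the question asks about, it uses the printf builtin's %(...)T format instead of date -r, and the function name format_trapped and the timestamp format are made up for this example.

```bash
#!/usr/bin/env bash
# Read "TRAPPED|ip|epoch" lines from stdin, sort them by the epoch field,
# and print "ip<TAB>formatted time" for each line.
format_trapped() {
  local kind ip epoch
  sort -t '|' -k 3,3n | while IFS='|' read -r kind ip epoch; do
    # %(...)T formats the epoch given as the matching argument.
    printf '%s\t%(%Y-%m-%d %H:%M:%S)T\n' "$ip" "$epoch"
  done
}

# Two sample lines from the question, deliberately out of order.
printf '%s\n' \
  'TRAPPED|113.163.117.129|1360836903' \
  'TRAPPED|37.146.207.209|1360832096' | format_trapped
```

With the real data this would be run as spamdb | fgrep TRAPPED | format_trapped. The times are shown in the local time zone, just as with date -r.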

Remove lines where next line matches certain pattern

I have the following simple script for parsing out dates from IRC logs (created by irssi):
#!/bin/bash
query=$1
grep -n $query logfile > matches.log
grep -n "Day changed" logfile >> matches.log
cat matches.log | sort -n
It produces output like:
--- Day changed Tue Jul 03 2012
--- Day changed Wed Jul 04 2012
--- Day changed Thu Jul 05 2012
16:54 <@Hamatti> who let the dogs out
--- Day changed Fri Jul 06 2012
--- Day changed Sat Jul 07 2012
--- Day changed Sun Jul 08 2012
12:11 <@Hamatti> dogs are fun
But since I'm only interested in the dates of actual matches, I'd like to filter out all those
--- Day changed XXX XXX dd dddd
lines that are not followed by a timestamped line. So the example should output:
--- Day changed Thu Jul 05 2012
16:54 <@Hamatti> who let the dogs out
--- Day changed Sun Jul 08 2012
12:11 <@Hamatti> dogs are fun
to get rid of all the disinformation that's not useful.
Edit: After the answer by T. Zelieke, I realised that I could turn this into a one-liner, so I now use the following, which avoids reading the logfile twice:
query=$1
egrep "$query|Day changed" logfile |grep -B1 "^[^-]" |sed '/^--$/d'

grep -B1 "^[^-]" data |sed '/^--$/d'
This uses grep to select the lines that do NOT start with a dash ("^[^-]"). -B1 asks it to also print the line immediately before each match.
Unfortunately, grep then separates each match (a pair of lines) with a -- line. Therefore I pipe the output through sed to get rid of those superfluous lines.

Here's one using awk.
awk -v query="$1" '/^--- Day changed/{day=$0;next} $0 ~ query {if (day!=p) {print day;p=day}; print}'
Every time it finds a "Day changed" line, it stores it in the variable day. Then when it finds a match to the query, it outputs the currently stored day line first. In case there are multiple matches in the same day, the variable p is used to determine if the day-line has been printed already.
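Here is a small self-contained run of that awk program. The log lines are made-up sample data in the same shape as the question's, and the names sample_log and matches_with_day are hypothetical; the variable p from the one-liner is spelled printed here.

```bash
#!/usr/bin/env bash
# Print every line matching the query, preceded (once per day) by the
# most recent "Day changed" line.
matches_with_day() {
  awk -v query="$1" '
    /^--- Day changed/ { day = $0; next }   # remember the latest day line
    $0 ~ query {
      if (day != printed) { print day; printed = day }   # once per day
      print
    }'
}

sample_log() {
  printf '%s\n' \
    '--- Day changed Wed Jul 04 2012' \
    '--- Day changed Thu Jul 05 2012' \
    '16:54 <nick> who let the dogs out' \
    '17:02 <nick> unrelated chatter' \
    '--- Day changed Fri Jul 06 2012' \
    '--- Day changed Sun Jul 08 2012' \
    '12:11 <nick> dogs are fun'
}

sample_log | matches_with_day dogs
```

Because it reads the raw logfile directly, this approach needs neither the intermediate matches.log nor the sort step from the original script.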
