How to format the output of a cat expression - sorting

I have a file in this format:
event: event1
event: event2
event: event2
event: event3
With this command:
cat event.txt | cut -d : -f2 |sort |uniq -c |sort -n
I get this result:
1 event1
1 event3
2 event2
Instead, I would like output like this:
event1 1
event2 2
event3 1
Do you have any idea of a way to do that?

If you want your output to be sorted by event ID, like event[1] event[2] ..., just take fedorqui's solution.
If you want your output to keep the original order of column 2 in the input file:
awk -F': *' 'NR==FNR{c[$2]++;next}$2 in c{print $2,c[$2];delete c[$2]}' file file
An example:
kent$ cat f
event: event1
event: event2
event: event2
event: event4
event: event4
event: event3
kent$ awk -F': *' 'NR==FNR{c[$2]++;next}$2 in c{print $2,c[$2];delete c[$2]}' f f
event1 1
event2 2
event4 2
event3 1
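If the output should instead be sorted by event name, as in the question, one possible sketch is to keep the original pipeline and let awk swap the two columns that uniq -c prints. This is only an illustration on the question's sample data, not necessarily the solution referred to above; the temporary file and the variable names are made up for the example.

```shell
# Sample input in the question's format (hypothetical temp file).
events=$(mktemp)
printf 'event: %s\n' event1 event2 event2 event3 > "$events"

# sort groups identical names together, which uniq -c requires;
# awk then prints the name first and the count second.
sorted_counts=$(cut -d: -f2 "$events" | sort | uniq -c | awk '{print $2, $1}')

printf '%s\n' "$sorted_counts"
rm -f "$events"
```

The trailing sort -n from the question is dropped here because it would reorder the lines by count rather than by event name.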

Related

Google Sheets Query get Even/Odd rows in grouped results

I have a long list of rows with dates on the side, and a text field after:
01/01/2019 | ABC | ...
The list is ordered by date, and may have between 1 and 4 rows per date:
01/01/2019 | ABC | ...
01/01/2019 | DEF | ...
05/01/2019 | ABC | ...
05/01/2019 | DEF | ...
05/01/2019 | ABC | ...
05/01/2019 | GHI | ...
10/01/2019 | ABC | ...
10/01/2019 | XYZ | ...
I can happily run a QUERY() which groups by the date and COUNT()s the number of rows matching that date:
01/01/2019 | 2
05/01/2019 | 4
10/01/2019 | 2
I'm looking for a series of functions valid in Google Sheets that will group the items by date and then return only the Nth row of each group. EVEN/ODD rows would also be fine.
Importantly, I don't want EVEN/ODD based on the actual spreadsheet ROW(); I need EVEN/ODD/Nth based on the row's position within its aggregated group, if that makes sense.
So I would like this output:
EVENS
01/01/2019 | DEF | (row 2 in group)
05/01/2019 | DEF | (row 2 in group)
05/01/2019 | GHI | (row 4 in group)
10/01/2019 | XYZ | (row 2 in group)
ODDS
01/01/2019 | ABC | (row 1 in group)
05/01/2019 | ABC | (row 1 in group)
05/01/2019 | ABC | (row 3 in group)
10/01/2019 | ABC | (row 1 in group)
Ultimately, my aim is to count all the occurrences of the text field (ABC/DEF/GHI/etc.) that happen as the FIRST, SECOND, THIRD or FOURTH event on any particular day, and sort them in descending order. An occurrence should only be included if, for example, ABC was on an EVEN row of its group, or XYZ was on an ODD row of its group (e.g. row 2 of the group, regardless of the fact that it happens to be on row 35 of the whole spreadsheet):
ABC | 156
DEF | 30
GHI | 10
JKL | 8
MNO | 7
XYZ | 1
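You could do it with one formula if you wanted to. For the even rows of each group:
=filter(A2:B,ISEVEN(row(A2:A)-match(A2:A,A2:A,0)))
and for the odd rows:
=filter(A2:B,ISODD(row(A2:A)-match(A2:A,A2:A,0)))
assuming the data starts in row 2, so that row(A2:A)-match(A2:A,A2:A,0) is the row's position within its date group.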
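If the data started in a different row, you could do a lookup on the row instead. Here the difference is the position within the group minus one, so ISODD selects the even rows:
=filter(A2:B,ISODD(row(A2:A)-vlookup(A2:A,{A2:A,row(A2:A)},2,false)))
and ISEVEN selects the odd rows:
=filter(A2:B,ISEVEN(row(A2:A)-vlookup(A2:A,{A2:A,row(A2:A)},2,false)))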
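You can add a helper column (column C here) that numbers each row within its date group, assuming the dates are in column A:
=ARRAYFORMULA(IF(LEN(A1:A), COUNTIFS(A1:A, A1:A, ROW(A1:A), "<="&ROW(A1:A)), ))
and then filter for even and odd rows like this:
=FILTER(A1:B, ISEVEN(C1:C))
=FILTER(A1:B, ISODD(C1:C))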
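For readers who want to check the arithmetic outside the spreadsheet, the same "position within the date group" idea can be sketched on the command line. This is not a Sheets formula, only an illustration of what the helper column computes; it assumes the data has been exported as date|text lines, and the file and variable names are hypothetical.

```shell
# Hypothetical export of the sample rows as "date|text".
rows=$(mktemp)
cat > "$rows" <<'EOF'
01/01/2019|ABC
01/01/2019|DEF
05/01/2019|ABC
05/01/2019|DEF
05/01/2019|ABC
05/01/2019|GHI
10/01/2019|ABC
10/01/2019|XYZ
EOF

# pos[$1] is a running count per date: the row's 1-based position in its group.
even_rows=$(awk -F'|' '{ p = ++pos[$1] } p % 2 == 0 { print $0 "|" p }' "$rows")
odd_rows=$(awk -F'|' '{ p = ++pos[$1] } p % 2 == 1 { print $0 "|" p }' "$rows")

# Tally the text field over the even-position rows, most frequent first.
even_tally=$(awk -F'|' '(++pos[$1]) % 2 == 0 { n[$2]++ }
                        END { for (k in n) print k "|" n[k] }' "$rows" |
             sort -t'|' -k2,2nr -k1,1)

printf '%s\n' "$even_rows" "$even_tally"
rm -f "$rows"
```

On the sample rows the even-position tally is DEF twice and GHI and XYZ once each, which matches the EVENS list in the question.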

How to use awk and sed to count number of elements in a column

There are some emails in my email account's inbox:
12:00 <harry#hotmail.com>
12:20 <harry#hotmail.com>
12:22 <jim#gmail.com>
12:30 <clare#bbc.org>
12:40 <harry#hotmail.com>
12:50 <jim#gmail.com>
12:55 <harry#hotmail.com>
I would like to use the command line (awk, sed, grep, etc.) to count the number of emails I received from each sender, with all the minutes changed to :00. How can I do that?
I would prefer a result like:
Number of email time From
3 12:00 <jim#gmail.com>
4 12:00 <harry#hotmail.com>
1 12:00 <clare#bbc.org>
I appreciate your help!
Here is how to do it with awk (the address is the second field, so that is what gets counted):
awk '{a[$2]++} END {for (i in a) print a[i]"\t"i}' file
4 <harry#hotmail.com>
1 <clare#bbc.org>
2 <jim#gmail.com>
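You may want to use uniq after sort, once the address column has been extracted (the times differ on every line, so whole lines never repeat):
$ awk '{print $2}' file | sort | uniq -c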
1 <clare#bbc.org>
4 <harry#hotmail.com>
2 <jim#gmail.com>
You can also get the header using printf:
$ printf "Number of email\temail\n%s\n" "$(awk '{print $2}' file | sort | uniq -c)"
Number of email email
1 <clare#bbc.org>
4 <harry#hotmail.com>
2 <jim#gmail.com>
We have to sort the input first for uniq to work properly, because it only compares neighbouring lines. From man uniq:
Filter adjacent matching lines from INPUT
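To get closer to the three-column layout asked for, one possible sketch is to build the grouping key in awk from the hour (with the minutes replaced by :00) and the sender. The header text follows the question; the temporary file, the variable names and the tab-separated layout are assumptions made for the example.

```shell
# Sample inbox listing from the question (hypothetical temp file).
inbox=$(mktemp)
cat > "$inbox" <<'EOF'
12:00 <harry#hotmail.com>
12:20 <harry#hotmail.com>
12:22 <jim#gmail.com>
12:30 <clare#bbc.org>
12:40 <harry#hotmail.com>
12:50 <jim#gmail.com>
12:55 <harry#hotmail.com>
EOF

# split() puts the hour in t[1]; the key is "HH:00<TAB>sender".
# for-in order is unspecified in awk, so sort by count afterwards.
email_report=$(
  printf 'Number of email\ttime\tFrom\n'
  awk '{ split($1, t, ":"); count[t[1] ":00\t" $2]++ }
       END { for (k in count) print count[k] "\t" k }' "$inbox" |
    sort -k1,1nr -k3,3
)

printf '%s\n' "$email_report"
rm -f "$inbox"
```

Because the hour is part of the key, mail from the same sender in different hours would be counted on separate lines.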

bash- get all lines with the same column value in two files

I have two text files, each with 3 fields. I need to get the lines that have the same value in the third field. The third field value is unique within each file. Example:
file1:
1 John 300
2 Eli 200
3 Chris 100
4 Ann 600
file2:
6 Kevin 250
7 Nancy 300
8 John 100
output:
1 John 300
7 Nancy 300
3 Chris 100
8 John 100
When I use the following command:
cat file1 file2 | sort -k 3 | uniq -c -f 2
I get only one row from an input file with the duplicate value. I need both!
This one-liner gives you that output:
awk 'NR==FNR{a[$3]=$0;next}$3 in a{print a[$3];print}' file1 file2
My solution is
join -1 3 -2 3 <(sort -k3 file1) <(sort -k3 file2) | awk '{print $2, $3, $1; print $4, $5, $1}'
or
join -1 3 -2 3 <(sort -k3 file1) <(sort -k3 file2) -o "1.1 1.2 0 2.1 2.2 0" | xargs -n3
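If the lines do not need to come out in pairs, a single awk pass can also do it: buffer every line, count how often each third field occurs, and print the lines whose key occurs more than once. This is a sketch under the assumption that both files fit in memory; the temporary directory and the array names are made up for the example.

```shell
# Recreate the two sample files from the question in a temp directory.
dir=$(mktemp -d)
printf '%s\n' '1 John 300' '2 Eli 200' '3 Chris 100' '4 Ann 600' > "$dir/file1"
printf '%s\n' '6 Kevin 250' '7 Nancy 300' '8 John 100' > "$dir/file2"

# Remember each line and its key, then print (in input order)
# every line whose third field was seen more than once overall.
shared_rows=$(awk '{ line[NR] = $0; key[NR] = $3; seen[$3]++ }
                   END { for (i = 1; i <= NR; i++)
                           if (seen[key[i]] > 1) print line[i] }' \
                  "$dir/file1" "$dir/file2")

printf '%s\n' "$shared_rows"
rm -rf "$dir"
```

Unlike the two-file one-liner, this keeps the input order (all matching lines of file1 first, then those of file2) and works unchanged with more than two files.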

Group Unique ID

In Stata, if I have a list of groups:
XYZ
ABC
ABC
BCH
JSA
BCH
XYZ
How do I get each group to have a unique ID in a second column after sorting, for example:
ABC 1
BCH 2
JSA 3
XYZ 4
You need sort, then group(), which is part of egen.
sysuse auto,clear
sort make
egen make_gp = group(make)
This yields:
. list make make_gp in 1/5
     +-------------------------+
     | make            make_gp |
     |-------------------------|
  1. | AMC Concord           1 |
  2. | AMC Pacer             2 |
  3. | AMC Spirit            3 |
  4. | Buick Century         7 |
  5. | Buick Electra         8 |
     +-------------------------+

Loop bash shell script

I have a bash shell script that outputs an iCal event using iCal Buddy which displays 2 events like:
Event1 Title
Event1 Date
Event2 Title
Event2 Date
I would like to have the script output like:
Event Title
Event Date
then (wait 10 seconds) clear the Event Title and Event Date, output the next Event Title and Event Date, (wait 10 seconds), and then loop back to the first event and keep looping. I've tried running the command followed by sleep 10, then repeating the command with | head -n 4 | tail -n 2, but then it only outputs the second event.
How can I do this? (my shell script is below) Thanks!
/usr/local/bin/icalBuddy -npn -nc -n -iep "title,datetime" -b "★ " -ps "| ★\n|" -po "title,datetime" -nrd -df "%a, %b %e" eventsToday+2 | cut -c 1-33
Unless I misunderstand you, this should do what you want:
while true
do
    clear
    command | pipeline | head -n 2
    sleep 10
    clear
    command | pipeline | head -n 4 | tail -n 2
    sleep 10
done
Here "command | pipeline" stands for the icalBuddy and cut commands in your question.
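The same loop can be generalised to any number of events by picking the Nth pair of lines with sed instead of hard-coding head and tail. The following is a minimal sketch: list_events is a hypothetical stand-in for the icalBuddy and cut pipeline, and it assumes each event occupies exactly two lines (title, then date).

```shell
# Hypothetical stand-in for the icalBuddy | cut pipeline from the question.
list_events() {
  printf '%s\n' 'Event1 Title' 'Event1 Date' 'Event2 Title' 'Event2 Date'
}

# Print the two lines of the Nth event read from stdin (N starts at 1).
event_pair() {
  sed -n "$((2 * $1 - 1)),$((2 * $1))p"
}

# Show one event at a time, 10 seconds each, wrapping around after
# the last one. $1 is the number of events to cycle through.
cycle_events() {
  total=$1
  i=1
  while true; do
    clear
    list_events | event_pair "$i"
    sleep 10
    i=$(( i % total + 1 ))
  done
}

# Usage (runs until interrupted with Ctrl-C):
#   cycle_events 2
```

Re-running list_events on every iteration keeps the display current if the calendar changes while the script is looping.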
