Editing the output of uniq -c (Bash)

I have a list of words that I sorted using sort -f. Now, I want to use uniq -c in order to obtain a list without repeated words but with a counter on the left side. I also want the column with the numbers to be separated by a tab from the column with the words.
This is my list:
Monday day
Tuesday day
Easter holiday
Monday day
christmas holiday
Tuesday day
Friday day
Thursday day
thanksgiving holiday
And this is my desired output:
1 christmas holiday
1 Easter holiday
1 Friday day
2 Monday day
1 thanksgiving holiday
1 Thursday day
2 Tuesday day
I tried using the following command, though I get a tab before the numbers instead of between the numbers and the words.
sort -f | uniq -c | sed $'s/\t */\t/g'
What do I have to modify in order to get the output that I want?

You need to get the number in a capture group and copy it to the replacement so you can put the tab after it.
sort -f days.txt | uniq -c | sed $'s/^ *\([0-9]*\) */\\1\t/'
uniq -c doesn't put a tab before the count; it just pads the count with spaces, which is why your sed expression had no tab to match.
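Putting it all together with the sample data (using days.txt as in the command above):
sort -f days.txt | uniq -c | sed $'s/^ *\([0-9]*\) */\\1\t/'
This prints each count/word pair separated by a single tab:
1	christmas holiday
1	Easter holiday
1	Friday day
2	Monday day
1	thanksgiving holiday
1	Thursday day
2	Tuesday day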

Related

Finding the number of unique values that contain another set of unique values

For example my text file looks something like this:
year, user, tweet
2009, Katie, I love playing football
2010, James, I play football
2013, Bob, I play basketball
2013, James, I play Baseball
The delimiter is ',' and I want to count how many unique users have mentioned the exact word 'play' in their tweet using BASH in a one liner.
The output of this should be 2, as James mentions 'play' twice and Bob once (No Katie as her word is 'playing'), so 2 people.
I have tried this:
$ cut -d ',' -f 2,3 Dataset.txt | grep "\<play\>" | sort | uniq -c
The problem with your pipeline is that while uniq -c will provide a count of the unique occurrences, "James, I play Baseball" and "James, I play football" will be considered unique. You can limit the check to the first N characters with the -w N option to uniq (in your case -w3), but you are much better off (and much, much more efficient) using a single call to awk.
Here you are concerned with the 2nd field (the name) and whether play occurs in the record. You can use /play[[:blank:]]/ (or /[[:blank:]]play[[:blank:]]/) as the test for "play" alone. Then, each time a record containing "play" alone is encountered, you increment a counter in the array a[] indexed by the name (e.g. a[$2]++), and in the END rule you output each name and its number of occurrences.
That makes the task quite simple, e.g.
awk -F, '/[[:blank:]]play[[:blank:]]/{a[$2]++} END {for (i in a) print i, a[i]}' Dataset.txt
Output
James 2
Bob 1
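If you only want the number of distinct users (2 here) rather than the per-user counts, a small variation on the same awk approach works; a sketch that counts each name only the first time it is seen:
awk -F, '/[[:blank:]]play[[:blank:]]/ && !seen[$2]++ {n++} END {print n}' Dataset.txt
Output
2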
How about
grep -wF play Dataset.txt | cut -f 2 -d , | sort -u | wc -l
?

Last Day of Month in csvfile

I'm trying to delete all rows of a CSV file that don't match the last day of a month, but I can't find the right solution.
date,price
2018-07-02,162.17
2018-06-29,161.94
2018-06-28,162.22
2018-06-27,162.32
2018-06-12,163.01
2018-06-11,163.53
2018-05-31,164.87
2018-05-30,165.59
2018-05-29,165.42
2018-05-25,165.96
2018-05-02,164.94
2018-04-30,166.16
2018-04-27,166.69
The output I want is:
date,price
2018-06-29,161.94
2018-05-31,164.87
2018-04-30,166.16
I tried it with cut + grep:
cut -d, -f1 file.csv | grep -E "28|29|30"
That works, but returns nothing when I combine it with -f1,2.
I found csvkit, which seems to be the right tool, but I can't find a solution for matching multiple patterns.
csvgrep -c 1 -m 30 file.csv
This brings me the right result, but how can I combine multiple search options? I tried -m 28,29,30 and -m 28 -m 29 -m 30, but neither works. Best of all would be matching the last day of every month.
Maybe someone here has an idea.
Thank you, and have a nice Sunday,
Silvio
You want to get all records of the LAST day of the month. But months vary in length (28-29-30-31).
I don't see why you used cut to extract the first field (the date part): the data in the second field is a price (xxx.xx) and doesn't look like a date at all, so you can grep the whole line directly.
I suggest using grep to display the lines that match a pattern of the form mm-dd, where mm is the month number and dd is the last day of that month.
This command should do the trick:
grep -E "01-31|02-(28|29)|03-31|04-30|05-31|06-30|07-31|08-30|09-31|10-30|11-31|12-30" file.csv
On this data, the command gives the following output (note that 2018-06-29 from the desired output cannot be matched this way, since it is only the last date present for June, not the month's last calendar day):
2018-05-31,164.87
2018-04-30,166.16
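Regarding the csvgrep part of the question: csvgrep also accepts a regular expression via its -r option, instead of the fixed string taken by -m, so the same alternation can be used there; a sketch with the month-end pattern anchored to the end of the date field:
csvgrep -c 1 -r '(01-31|02-(28|29)|03-31|04-30|05-31|06-30|07-31|08-31|09-30|10-31|11-30|12-31)$' file.csv
Unlike the plain grep above, this also keeps the date,price header row.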

Shell script to count number of logins per day

I'm new to shell programming. I'm trying to write a shell script to count the number of logins per day of the week for users on some machine.
Output should look like this:
123 Mon
231 Tue
555 Wed
21 Thu
44 Fri
123 Sat
10 Sun
I've tried to do it using commands last, uniq and sort like this
last -s -7days | awk '{print $1, $4,$5,$6}' | uniq -cd |sort -u
but I think I'm missing something because I'm somehow getting duplicated results. Also, I'm not sure how to get overall counts separated by days.
The problem with uniq is that it only collapses adjacent duplicate lines. In your case, -d on uniq is hiding the lines that break up the runs of duplicates; I am guessing you have some lines similar to reboot 4.4.5-1-ARCH Wed Mar between login attempts for the day. You will also have problems with multiple users logging in and breaking up the counts for other users.
Typically you sort | uniq to get a true list of unique rows, but if you remove the -d you end up with lines you do not want. These are best filtered out separately, either before or after the sort | uniq.
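A quick illustration of why the sort matters (a minimal sketch):
printf 'a\nb\na\n' | uniq -c
      1 a
      1 b
      1 a
printf 'a\nb\na\n' | sort | uniq -c
      2 a
      1 b
Without the sort, the two a lines are not adjacent, so uniq counts them separately.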
Finally, the last sort -u will delete data if two rows happen to match exactly, which I do not think is what you want. Instead, it is better to sort on the date column (this will cause a small issue on the month rollover) or on another column you care about, with the -k FIELDNUM argument, if you need to sort the counts at all.
Combine this together and you get:
last -s -7days | awk '/reboot/ {next}; /wtmp/ {next}; /^$/ {next}; {print $1, $4,$5,$6}' | sort | uniq -c | sort -k 5
Note that /reboot/ {next}; causes awk to skip lines that match the pattern between the slashes.
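To get the overall per-day counts asked for in the question ("123 Mon" style), you can apply the same filtering but keep only the weekday field before counting; a sketch, assuming the weekday is field 4 as in the pipeline above:
last -s -7days | awk '/reboot/ {next}; /wtmp/ {next}; /^$/ {next}; {print $4}' | sort | uniq -c
Note the days come out in alphabetical order (Fri, Mon, Sat, ...), not Monday through Sunday.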

Shell Script to get exception from logs for last one hour

I am developing a script for the Solaris platform that greps the last hour of logs, checks for any exception, and sends an email.
I did following steps
grep -n -h "$(date +'%Y-%m-%d %H:%M')" test.logs
The above command gives me a line number, and then I do the following:
tail +6183313 test.log | grep 'exception'
Sample logs:
2014-02-17 10:15:02,625 | WARN | m://mEndpoint | oSccMod | 262 - com.sm.sp-client - 0.0.0.R2D03-SNAPSHOT | 1201 or 101 is returned as exception code from SP, but it is ignored
2014-02-17 10:15:02,625 | WARN | m://mEndpoint | oSccMod | 262 - com.sm.sp-client - 0.0.0.R2D03-SNAPSHOT | SP error ignored and mock success returned
2014-02-17 10:15:02,626 | INFO | 354466740-102951 | ServiceFulfill | 183 - org.apache.cxf | Outbound Message
Please suggest any better alternative to perform the above task.
With GNU date, one can use:
grep "^$(date -d -1hour +'%Y-%m-%d %H')" test.logs | grep 'exception'| mail -s "exceptions in last hour of test.logs" ImranRazaKhan
The first step above is to select all log entries from the last hour. This is done with grep by looking for all lines beginning with the year-month-day and hour that matches one hour ago:
grep "^$(date -d -1hour +'%Y-%m-%d %H')" test.logs
The next step in the pipeline is to select from those lines the ones that have exceptions:
grep 'exception'
The last step in the pipeline is to send out the mail:
mail -s "exceptions in last hour of test.logs" ImranRazaKhan
The above sends mail to ImranRazaKhan (or whatever email address you chose) with the subject line of "exceptions in last hour of test.logs".
The convenience of having the -d option to date should not be underestimated. It might seem simple to subtract 1 from the current hour but, if the current hour is 12am, then we need to adjust both the day and the hour. If the hour was 12am on the first of the month, we would also have to change the month. And likewise for year. And, of course, February requires special consideration during leap years.
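For example, one hour before 2014-03-01 00:30 crosses both the day and the month boundary, and GNU date handles it directly:
date -d '2014-03-01 00:30 1 hour ago' +'%Y-%m-%d %H'
2014-02-28 23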
Adapting the above to Solaris:
Consider three cases:
Under Solaris 11 or better, the GNU date utility is available at /usr/gnu/bin/date. Thus, we need simply to specify a path for date:
grep "^$(/usr/gnu/bin/date -d -1hour +'%Y-%m-%d %H')" test.logs | grep 'exception'| mail -s "exceptions in last hour of test.logs" ImranRazaKhan
Under Solaris 10 or earlier, one can download & install GNU date
If GNU date is still not available, we need to find another way to find the date and time for one hour ago. The simplest workaround is likely to select a timezone that is one hour behind your timezone. If that timezone was, say, Hong Kong, then use:
grep "^$(TZ=HongKong date +'%Y-%m-%d %H')" test.logs | grep 'exception'| mail -s "exceptions in last hour of test.logs" ImranRazaKhan
You can also do it like this:
dt="$(date -d '1 hour ago' '+%Y-%m-%d %H')"
awk -v dt="$dt" '$0 ~ "^" dt && /exception/' test.logs
Scanning through millions of lines of log sounds terribly inefficient. I would suggest changing the log4j configuration (which is what this looks like) of your application to cut a new log file every hour. This way, tailing the most recent file becomes a breeze.
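With hourly rotation in place, the check reduces to scanning just the newest file; a hypothetical sketch (the log directory and naming scheme are assumptions):
# hypothetical location and naming scheme for the rotated logs
newest=$(ls -t /var/log/myapp/*.log | head -n 1)
grep 'exception' "$newest" | mail -s "exceptions in last hour" ImranRazaKhan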

Loops in Shell Scripting [closed]

I need help with this shell script.
Must use a loop of some sort.
Must use input data exactly as shown.
Output redirection should be accomplished within the script, not on the command line.
Here's the input files I have:
http://pastebin.com/m3f783597
Here's what the output needs to be:
http://pastebin.com/m2c53b25a
Here's my failed attempt:
http://pastebin.com/m2c60b41
And that failed attempt's output:
http://pastebin.com/m3460e78c
Here's the help. Try to follow these hints as much as possible before looking at my solution below. That will help you out more in the long run, and in the short run, since it's a certainty that your educator can see this as easily as you can.
If he finds you've plagiarized code, it will probably mean an instant fail.
Your "failed attempt" as you put it is here. It's actually not too bad for a first attempt.
echo -e "Name\t\t On-Call\t\t Phone"
for daycount in 2 1 4 5 7 6 3
do
for namecount in 3 2 6 1 7 4 5
do
day=`head -n $daycount p2input2|tail -n 1|cut -f 2 -d " "`
name=`head -n $namecount p2input1|tail -n 1|cut -f 1 -d " "`
phone=`head -n $namecount p2input1|tail -n 1|cut -f 2 -d " "`
echo -e "$name\c"
echo -e "\t\t$day\c"
echo -e "\t\t$phone"
continue
done
done
And here are the hints:
You have two loops, one inside the other, each occurring 7 times. That means 49 lines of output rather than 7. You want to process each day and look up the name and phone for that day (actually the name for that day, then the phone for that name).
It's not really suitable to hardcode line numbers (although I admit it is sneaky): what if the order of the data changes? Better to search on values.
Tabs make things messy; use spaces instead, since then the output doesn't rely on terminal settings and you don't need to worry about misaligned tabs.
And, for completeness, here's the two input files and the expected output:
p2input1              p2input2
========              ========
Dave  734.838.9801    Bob    Tuesday
Bob   313.123.4567    Carol  Monday
Carol 248.344.5576    Ted    Sunday
Mary  313.449.1390    Alice  Wednesday
Ted   248.496.2204    Dave   Thursday
Alice 616.556.4458    Mary   Saturday
Frank 634.296.3357    Frank  Friday
Expected output
===============
Name            On-Call         Phone
carol           monday          248.344.5576
bob             tuesday         313.123.4567
alice           wednesday       616.556.4458
dave            thursday        734.838.9801
frank           friday          634.296.3357
mary            saturday        313.449.1390
ted             sunday          248.496.2204
Having said all that, and assuming you've gone away for at least two hours to try and get your version running, here's mine:
1 #!/bin/bash
2 spc20="                    "
3 echo "Name            On-Call         Phone"
4 echo
5 for day in monday tuesday wednesday thursday friday saturday sunday
6 do
7     name=`grep -i " ${day}$" p2input2 | awk '{print $1}'`
8     name=`echo ${name} | tr '[A-Z]' '[a-z]'`
9     bigname=`echo "${name}${spc20}" | cut -c1-15`
10
11     bigday=`echo "${day}${spc20}" | cut -c1-15`
12
13     phone=`grep -i "^${name} " p2input1 | awk '{print $2}'`
14
15     echo "${bigname} ${bigday} ${phone}"
16 done
And the following description should help:
Line 1 selects the right shell; not always necessary.
Line 2 gives us enough spaces to make formatting easier.
Lines 3-4 give us the title and blank line.
Lines 5-6 cycles through the days, one at a time.
Line 7 gives us a name for the day. 'grep -i " ${day}$"' searches for the given day (regardless of upper or lower case) at the end of a line in p2input2, while the awk statement gives you field 1 (the name).
Line 8 simply makes the name all lowercase.
Line 9 creates a string of the right size for output by appending 20 spaces and then cutting off everything past the 15th character.
Line 11 does the same for the day.
Line 13 is very similar to line 7 except it searches p2input1, looks for the name at the start of the line, and returns the phone number as the second field.
Line 15 just outputs the individual items.
Line 16 ends the loop.
So there you have it, enough hints to (hopefully) fix up your own code, and a sample as to how a professional would do it :-).
It would be wise to read up on the tools used, grep, tr, cut and awk.
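As an aside, printf can produce fixed-width columns directly, which avoids the pad-and-cut dance on lines 9 and 11; a sketch using the same variables:
printf '%-15s %-15s %s\n' "${name}" "${day}" "${phone}"
The %-15s conversion left-justifies each string in a 15-character field, matching the layout produced by line 15.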
This is homework, I assume?
Read up on the sort and paste commands: man sort, man paste
Pax has given a good answer, but this code invokes fewer processes (11 vs a minimum of 56 = 7 * 8). It uses an auxiliary data file to give the days of the week and their sequence numbers.
cat <<! >p2input3
1 Monday
2 Tuesday
3 Wednesday
4 Thursday
5 Friday
6 Saturday
7 Sunday
!
sort -k2 p2input3 > p2.days
sort -k2 p2input2 > p2.call
join -1 2 -2 2 p2.days p2.call | sort -k3 > p2.duty
sort p2input1 > p2.body
join -1 3 -2 1 p2.duty p2.body | sort -k3n | tr '[A-Z]' '[a-z]' |
awk 'BEGIN { printf("%-14s %-14s %s\n", "Name", "On-Call", "Phone");
printf "\n"; }
{ printf("%-14s %-14s %s\n", $1, $2, $4);}'
rm -f p2input3 p2.days p2.call p2.duty p2.body
The join command is powerful, but it requires the data in the two files to be in sorted order on the joining keys. The cat command gives a list of days and their day numbers. The first sort places that list in alphabetical order of day name. The second sort places the names of the people on duty in alphabetical order of day name too. The first join then combines those two files on day name; the result is then sorted on user name, yielding the output:
Wednesday 3 Alice
Tuesday 2 Bob
Monday 1 Carol
Thursday 4 Dave
Friday 5 Frank
Saturday 6 Mary
Sunday 7 Ted
The last sort puts the names and phone numbers into alphabetic name order. The second join then combines the name + phone number list with the name + duty list, yielding a 4 column output. This is run through tr to make the data all lower case, and then formatted with awk, which demonstrates its power and simplicity nicely here (you could use Perl or Python instead, but frankly, that would be messier).
Perl has a motto: TMTOWTDI "There's more than one way to do it".
That often applies to shell scripting too.
I suppose my code does not use a loop...oh dear. Replace the initial cat command with:
for day in "1 Monday" "2 Tuesday" "3 Wednesday" "4 Thursday" \
"5 Friday" "6 Saturday" "7 Sunday"
do echo $day
done > p2input3
This now meets the letter of the rules.
Try this one:
sort file1.txt > file1sort.txt
sort file2.txt > file2sort.txt
join file2sort.txt file1sort.txt | column -t > result.txt
rm file1sort.txt file2sort.txt
