Shell script to count number of logins per day - bash

I'm new to shell programming. I'm trying to write a shell script to count the number of logins per day of the week for users on some machine.
Output should look like this:
123 Mon
231 Tue
555 Wed
21 Thu
44 Fri
123 Sat
10 Sun
I've tried to do it using commands last, uniq and sort like this
last -s -7days | awk '{print $1, $4,$5,$6}' | uniq -cd |sort -u
but I think I'm missing something, because I'm getting duplicated results. Also, I'm not sure how to get overall counts separated by day.

The problem with uniq is that it only collapses adjacent duplicate lines. In your case, -d on uniq is hiding the lines that break up the runs of duplicates; I am guessing you have some lines similar to reboot 4.4.5-1-ARCH Wed Mar between login entries for the day. You will also have multiple users logging in, which breaks up the counts for other users.
Typically you sort | uniq to get a true list of unique rows, but if you remove the -d you end up with lines you do not want. These are best filtered out separately, either before or after the sort | uniq.
Finally, the trailing sort -u will delete data if two rows happen to match exactly, which I do not think is what you want. Instead it is better to sort on the date column (this will cause a small issue at the month rollover) or on another column you care about with the -k FIELD argument, if you need to sort the counts at all.
Combine this together and you get:
last -s -7days | awk '/reboot/ {next}; /wtmp/ {next}; /^$/ {next}; {print $1, $4,$5,$6}' | sort | uniq -c | sort -k 5
Note that .../reboot/ {next};... causes awk to ignore lines that match the pattern between the slashes.
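If what you ultimately want is one total per weekday, as in your example output, you can also let awk do the counting itself. Here is a minimal sketch along the same lines, assuming the weekday name is in field 4 of last's output (it may shift if a login record has no host column):
last -s -7days | awk '/reboot|wtmp|^$/ {next} {count[$4]++} END {for (d in count) print count[d], d}'
awk's for (d in count) does not guarantee any particular order, so pipe the result through an extra sort if you need the days in calendar order.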

Related

How does the UNIX sort command handle expressions with different character sizes?

I am trying to sort and join two files which contain IP addresses; the first file only has IPs, while the second contains IPs and an associated number. But sort acts differently on these files. Here are the commands and outcomes:
cat file | grep '180.76.15.15' | sort
cat file | grep '180.76.15.15' | sort -k 1
cat file | grep '180.76.15.15' | sort -t ' ' -k 1
outcome:
180.76.15.150 987272
180.76.15.152 52219
180.76.15.154 52971
180.76.15.156 65472
180.76.15.158 35475
180.76.15.15 99709
cat file | grep '180.76.15.15' | cut -d ' ' -f 1 | sort
outcome:
180.76.15.15
180.76.15.150
180.76.15.152
180.76.15.154
180.76.15.156
180.76.15.158
As you can see, the first three commands all produce the same outcome, but when the lines contain only the IP address, the sort order changes, which causes me a problem when trying to join the files.
Explicitly, the IP 180.76.15.15 appears on the bottom row in the first case (even when I sort explicitly on the first field), but on the top row in the second case, and I can't understand why.
Can anyone please explain why this is happening?
P.S. I am connecting via ssh from Windows 10 PowerShell to Ubuntu 20.04 installed on VMware.
sort uses your locale settings to determine the order of the characters. From man sort:
*** WARNING *** The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses native byte values.
That way you get the traditional ASCII byte order. For example:
> cat file
#a
b#
152
153
15 4
15 1
Here everything is sorted in alphabetical order, ignoring special characters and spaces: first the numbers, then the letters.
thanasis@basis:~/Documents/development/temp> sort file
15 1
152
153
15 4
#a
b#
Here every character counts: # comes first, then the numbers (the space counts as well, which is why 15 1 and 15 4 come before 152), then the letters.
thanasis@basis:~/Documents/development/temp> LC_ALL=C sort file
#a
15 1
15 4
152
153
b#
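Applied to the original question, the fix is to force the C locale for every sort and for the join, so that both files are ordered by raw byte values. A minimal sketch with hypothetical file names (ips.txt holding only the IPs, counts.txt holding IPs plus a number):
LC_ALL=C sort ips.txt > ips.sorted
LC_ALL=C sort -k1,1 counts.txt > counts.sorted
LC_ALL=C join ips.sorted counts.sorted
join compares lines according to the locale as well, so it needs the same LC_ALL=C setting as the sorts, otherwise it may complain that its input is not sorted.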

Sort list of files with multiple sort keys

I want to be able to list all files within a directory sorted by multiple sort keys. For example:
Level_5_10_1.jpg
Level_5_1_1.jpg
I want Level_5_1_1.jpg to show up first. The sort order should start from the last number, so:
Level_4_2_1.jpg > Level_4_1_10.jpg
Level_3_2_1.jpg > Level_3_1_10.jpg
and so on..
I tried:
ls | sort -h -k3,3n -k2,2n -k1,1n -t_
but didn't get the result I wanted. For example, it listed Level_5_1_2.jpg < Level_1_2_1.jpg, which is incorrect.
Any ideas?
PS: This is a pastebin of the file list.
I've taken a small sample of filenames. When you split the filenames on _ with the -t option, field 1 would be "Level", field 2 the first number, and so on. I'm not entirely sure of the exact order you are after, but I think this solution should at least give you something to work with. Note that I have truncated some of the results so that the overall pattern is easier to see.
me#machine:~$ ls Level*.jpg | sort -t_ -k2n -k3n -k4n
Level_1_1_1.jpg
Level_1_1_2.jpg
Level_1_1_3.jpg
Level_1_1_4.jpg
Level_1_1_5.jpg
Level_1_2_1.jpg
Level_1_2_2.jpg
Level_1_2_3.jpg
Level_1_2_4.jpg
Level_1_2_5.jpg
Level_1_3_1.jpg
...
Level_1_10_5.jpg
Level_2_1_1.jpg
...
Level_2_1_5.jpg
Level_2_2_1.jpg
...
Level_2_2_5.jpg
Level_2_3_1.jpg
...
Level_2_10_5.jpg
Level_3_1_1.jpg
From your description, I think I'm getting the right results from this:
$ ls | sort -nt_ -k4,4 -k3,3 -k2,2
Remember that your first field (-k1) is the word "Level" in the files you've included in your question.
If you have really complex sorting needs, of course, you can always "map" your criteria onto simpler sortable items. For example, if your sort didn't include a -k option, you might do this:
$ ls | awk '{printf "%2d %2d %2d %s\n", $4, $3, $2, $0}' FS="[_.]" - | sort -n | awk '{print $NF}'
This takes the important fields, translates them into prefixed digits, sorts, then prints only the filename. You could use this technique if you wanted to map weekdays, months, or anything else that doesn't sort naturally.
Of course, all this suffers from the standard set of ParsingLS issues.
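As a concrete illustration of that mapping idea, here is a hedged sketch for weekday names (the input format, lines starting with a three-letter day name, is an assumption and not taken from the question): prefix each line with the day's rank, sort numerically, then strip the prefix again.
awk 'BEGIN { n = split("Mon Tue Wed Thu Fri Sat Sun", d, " "); for (i = 1; i <= n; i++) rank[d[i]] = i }
     { print rank[$1], $0 }' input.txt | sort -n | cut -d" " -f2-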

tail a log file from a specific line number

I know how to tail a text file with a specific number of lines,
tail -n50 /this/is/my.log
However, how do I make that line count a variable?
Let's say I have a large log file which is appended to daily by some program; all lines in the log file start with a datetime in this format:
Day Mon YY HH:MM:SS
Every day I want to output the tail of the log file, but only the previous day's records. Let's say this output runs just after midnight, so I'm not worried about the tail spilling over into the next day.
I just want to be able to work out how many rows to tail, based on the first occurrence of yesterday's date...
Is that possible?
Answering the question in the title, for anyone who comes here that way: head and tail can both accept an offset for how much of the file to exclude.
For tail, use -n +num to start output at line number num.
For head, use -n -num to print everything except the last num lines.
This is relevant to the actual question if you remember the number of lines from the previous time you ran the command and then use tail -n +$prevlines to get the next portion of the growing log, regardless of how often the log is checked.
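In practice that means storing the current line count each time you look at the log and starting from there on the next run. A minimal sketch (the offset file and variable names are assumptions):
# after each check, remember how far we got
prevlines=$(wc -l < /this/is/my.log)
echo "$prevlines" > /tmp/my.log.offset
# next run: print only what was appended since then
tail -n "+$(( $(cat /tmp/my.log.offset) + 1 ))" /this/is/my.log
The +1 skips the last line that was already seen; drop it if one line of overlap is acceptable.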
Answering the actual question: one way to print everything after a certain line that you can grep for is to use the -A option with a ridiculously large count. This may be more useful than the other answers here, as you can get several days of results. So, to get everything from yesterday and so far today:
grep "^`date -d yesterday '+%d %b %y'`" -A1000000 log_file.txt
You can combine two greps to print everything between two dates.
Note that this relies on the dates actually occurring in the log file. It has the weakness that if no events were logged on a particular day used as a range marker, it will fail to find anything.
To work around that you could inject dummy records for the start and end dates and sort the file before grepping. That is probably overkill, though, and the sort may be expensive, so I won't give an example of it.
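For completeness, here is a hedged sketch of the two-grep combination mentioned above (the start date of two days ago is an assumption, and both dates must actually appear in the log):
grep "^$(date -d '2 days ago' '+%d %b %y')" -A1000000 log_file.txt | grep "^$(date -d yesterday '+%d %b %y')" -B1000000
The first grep prints everything from the first line of the start date onward, and the second keeps everything up to the last line of the end date.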
I don't think tail has any functionality like this.
You could work out the beginning and ending line numbers using awk, but if you just want to extract those lines from the log file, the simplest way is probably to use grep combined with date. Matching yesterday's date at the beginning of the line should work:
grep "^`date -d yesterday '+%d %b %y'`" < log_file.txt
You may need to adjust the date format to match exactly what you've got in the log file.
You can do it without tail; just grep the rows with the previous day's date (the format string must match the log's timestamp format):
grep "$(date -d "yesterday 13:00" '+%d %b %y')" my.log
And if you need the line count, you can append
| wc -l
I worked this out through trial and error, by getting the line number of the first line containing the date and the total number of lines, as follows:
lines=$(wc -l < myfile.log)
start=$(grep -n "$datestring" myfile.log | head -n1 | cut -f1 -d:)
n=$((lines - start + 1))   # +1 so the first matching line is included
and then a tail, based on that:
tail -n$n myfile.log
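An alternative that avoids counting lines entirely is to let sed print from the first matching line to the end of the file; a hedged sketch reusing the same $datestring variable (which must not contain any / characters):
sed -n "/$datestring/,\$p" myfile.log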

Last Day of Month in csvfile

I am trying to delete all rows of a CSV file that do not correspond to the last day of a month, but I have not found the right solution.
date,price
2018-07-02,162.17
2018-06-29,161.94
2018-06-28,162.22
2018-06-27,162.32
2018-06-12,163.01
2018-06-11,163.53
2018-05-31,164.87
2018-05-30,165.59
2018-05-29,165.42
2018-05-25,165.96
2018-05-02,164.94
2018-04-30,166.16
2018-04-27,166.69
The output I want is:
date,price
2018-06-29,161.94
2018-05-31,164.87
2018-04-30,166.16
I tried it with cut + grep:
cut -d, -f1 file.csv | grep -E "28|29|30"
That works, but returns nothing when I combine it with -f1,2.
I found csvkit, which seems to be the right tool, but I could not find how to do a multiple-value grep.
csvgrep -c 1 -m 30 file.csv
That brings me the right result, but how can I combine multiple search options? I tried -m 28,29,30 and -m 28 -m 29 -m 30, but neither works. Ideally it would match the last day of every month.
Maybe someone here has an idea.
Thank you and have a nice Sunday,
Silvio
You want to get all records for the LAST day of each month, but months vary in length (28, 29, 30, or 31 days).
I don't see why you used cut to extract the first field (the date part), because the data in the second field does not look like dates at all.
I suggest using grep directly to display the lines that match the pattern mm-dd, where mm is the month number and dd is the last day of that month.
This command should do the trick:
grep -E "01-31|02-(28|29)|03-31|04-30|05-31|06-30|07-31|08-31|09-30|10-31|11-30|12-31" file.csv
This command will give the following output:
2018-05-31,164.87
2018-04-30,166.16
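Note that this only matches calendar month-ends, so 2018-06-29 from the desired output (the last date actually present for June) will not be found. A hedged awk sketch that instead keeps the newest row of each month, relying on the file being sorted newest-first as shown:
awk -F, 'NR == 1 { print; next }                 # keep the header
         { month = substr($1, 1, 7) }            # YYYY-MM
         !(month in seen) { print; seen[month] = 1 }' file.csv
Be aware that this also prints the newest row of a month that is not over yet (2018-07-02 in the sample), so the current month may need to be dropped separately.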

How to find which line from first file appears most frequently in second file?

I have two lists. I need to determine which word from the first list appears most frequently in the second list. The first file, list1.txt, contains a list of words, sorted alphabetically, with no duplicates. I have used some scripts to ensure that each word appears on its own line, e.g.:
canyon
fish
forest
mountain
river
The second file, list2.txt, is in UTF-8 and also contains many items. I have used some scripts to ensure that each item appears on its own line, but some items are not words, and some may appear many times, e.g.:
fish
canyon
ocean
ocean
ocean
ocean
1423
fish
109
fish
109
109
ocean
The script should output the most frequently matching item. For example, if run with the two files above, the output would be "fish", because that word from list1.txt occurs most often in list2.txt.
Here is what I have so far. First, it searches for each word and creates a CSV file with the matches:
#!/bin/bash
while read -r line
do
  count=$(grep -c "^$line" list2.txt)
  echo "$line,$count" >> found.csv
done < ./list1.txt
After that, found.csv is sorted in descending order by the second column, and the output is the word on the first line.
I do not think, though, that this is a good script, because it is not very efficient, and there might not be a single most frequent matching item, e.g.:
If there is a tie between two or more words, e.g. "fish", "canyon", and "forest" each appear 5 times while no other word appears as often, the output should be those words in alphabetical order, separated by commas, e.g. "canyon,fish,forest".
If none of the words from list1.txt appears in list2.txt, the output should simply be the first word from list1.txt, e.g. "canyon".
How can I create a more efficient script which finds which word from the first list appears most often in the second?
You can use the following pipeline:
grep -Ff list1.txt list2.txt | sort | uniq -c | sort -n | tail -n1
-F tells grep to treat the patterns as literal strings rather than regular expressions, and -f tells it to read them from list1.txt. The rest sorts the matches, counts duplicates, and orders them by the number of occurrences. The last part selects the last line, i.e. the most common one (plus its number of occurrences).
Another option is a single awk pass that loads the words from file1 into an array, counts how often each of them occurs in file2, and then prints the highest count:
awk 'FNR==NR{a[$1]=0;next}($1 in a){a[$1]++}END{for(i in a)print a[i],i}' file1 file2 | sort -rn | head -1
Assuming list1.txt is sorted, I would use Unix join:
sort list2.txt | join -1 1 -2 1 list1.txt - | sort |\
uniq -c | sort -n | tail -n1
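None of the pipelines above handles the tie and no-match cases from the question. Here is a hedged sketch of one way to cover them, assuming whole-line matches are wanted (drop the -x if prefix matches like the original ^$line are intended):
matches=$(grep -Fxf list1.txt list2.txt | sort | uniq -c | sort -k1,1nr -k2,2)
if [ -z "$matches" ]; then
    # nothing matched: fall back to the first word of list1.txt
    head -n1 list1.txt
else
    # keep every word tied for the top count, comma-separated in alphabetical order
    printf '%s\n' "$matches" | awk 'NR==1 {max=$1} $1==max {out = out (out ? "," : "") $2} END {print out}'
fi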
