Show with star symbols how many times a user has logged in - bash

I'm trying to create a simple shell script that shows how many times a user has logged in to their Linux machine over at least the last week. The output of the shell script should be like this:
2021-12-16
****
2021-12-15
**
2021-12-14
*******
I have tried this so far, but it only shows the count as a number; I want * symbols instead.
user="$1"
last -F | grep "${user}" | sed -E "s/${user}.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) //" | awk '{print $1"-"$2"-"$4}' | uniq -c
Any help?

You might want to refactor all of this into a simple Awk script, where repeating a string n times is also easy.
user="$1"
last -F |
awk -v user="$1" 'BEGIN { split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":");
for(i=1; i<=12; i++) mon[m[i]] = sprintf("%02i", i) }
$1 == user { ++count[$8 "-" mon[$5] "-" sprintf("%02i", $6)] }
END { for (date in count) {
padded = sprintf("%-" count[date] "s", "*");
gsub(/ /, "*", padded);
print date, padded } }'
The BEGIN block creates an associative array mon which maps English month abbreviations to month numbers.
sprintf("%02i", number) produces the value of number with zero padding to two digits (i.e. adds a leading zero if number is a single digit).
The $1 == user condition matches the lines where the first field is equal to the user name we passed in. (Your original attempt had two related bugs here; it would look for the user name anywhere in the line, so if the user name happened to match on another field, it would erroneously match on that; and the regex you used would match a substring of a longer field).
When that matches, we just update the value in the associative array count whose key is the current date.
Finally, in the END block, we simply loop over the values in count and print them out. Again, we use sprintf to produce a field with a suitable length. We play a little trick here by space-padding to the specified width, because sprintf does that out of the box, and then replace the spaces with more asterisks.
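Here is that padding trick in isolation, as a one-off you can run to convince yourself:
awk 'BEGIN { s = sprintf("%-5s", "*"); gsub(/ /, "*", s); print s }'
This prints *****: sprintf produces one star followed by four padding spaces, and gsub then turns the spaces into stars.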
Your desired output shows the asterisks on a separate line from the date; obviously, it's easy to change that if you like, but I would advise against it in favor of a format which is easy to sort, grep, etc (perhaps to then reformat into your desired final human-readable form).
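For instance, if the script above were saved as logins.sh (a hypothetical name) and made executable, a run might look like:
./logins.sh alice
2021-12-16 ****
2021-12-15 **
2021-12-14 *******
Note that for (date in count) enumerates keys in no particular order, so pipe the output through sort if you need the dates ordered.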

If you have GNU sed you're almost there. Just pipe the output of uniq -c to this GNU sed command:
sed -En 's/^\s*(\S+)\s+(\S+).*/printf "\2\n%\1s" ""/e;s/ /*/g;p'
Explanation: in the output of uniq -c we substitute a line like:
6 Dec-15-2021
by:
printf "Dec-15-2021\n%6s" ""
and we use the e GNU sed flag (this is a GNU sed extension so you need GNU sed) to pass this to the shell. The output is:
Dec-15-2021
where the second line contains 6 spaces (invisible above). This output is copied back into the sed pattern space. We finish with a global substitution of spaces by stars, and print:
Dec-15-2021
******
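For completeness, here is a sketch of the whole pipeline, with your original extraction stage feeding the sed command above:
user="$1"
last -F | grep "${user}" | sed -E "s/${user}.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) //" |
awk '{print $1"-"$2"-"$4}' | uniq -c |
sed -En 's/^\s*(\S+)\s+(\S+).*/printf "\2\n%\1s" ""/e;s/ /*/g;p'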

A simple solution, using a temp file:
#!/bin/bash
user="$1"
tempfile="/tmp/last.txt"
IFS='
'
last -F | grep "${user}" | sed -E "s/${user}.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) //" | awk '{print $1"-"$2"-"$4}' | uniq -c > "$tempfile"
for LINE in $(cat "$tempfile")
do
    qtde=$(echo "$LINE" | awk '{print $1}')
    data=$(echo "$LINE" | awk '{print $2}')
    echo "$data"
    for ((i=1; i<=qtde; i++))
    do
        printf '*'
    done
    echo
done

Related

Error in bash script: arithmetic error

I wrote a simple script to extract text from a bunch of files (*.out) and add two lines at the beginning and a line at the end. Then I combine the extracted text with another file to create a new file. The script is below.
#!/usr/bin/env bash
#A simple bash script to extract text from *.out and create another file
for f in *.out; do
#In the following line, n is a number which is extracted from the file name
n=$(((ls $f) | cut -d_ -f6))
t=$((2 * $n ))
#To extract the necessary text/data
grep " B " $f | tail -${t} | awk 'BEGIN {OFS=" ";} {print $1, $4, $5, $6}' | rev | column -t | rev > xyz.xyz
#To add some text as the first, second and last lines.
sed -i '1i -1 2' xyz.xyz
sed -i '1i $molecule' xyz.xyz
echo '$end' >> xyz.xyz
#To combine the extracted info with another file (ea_input.in)
cat xyz.xyz ./input_ea.in > "${f/abc.out/pqr.in}"
done
./script.sh: line 4: (ls file*.out | cut -d_ -f6: syntax error: invalid arithmetic operator (error token is ".out) | cut -d_ -f6")
How can I correct this error?
In bash, when you use:
$(( ... ))
it treats the contents of the brackets as an arithmetic expression, returning the result of the calculation, and when you use:
$( ... )
it executes the contents of the brackets and returns their output.
So, to fix your issue, it should be as simple as to replace line 4 with:
n=$(ls $f | cut -d_ -f6)
This replaces the outer double brackets with single, and removes the additional brackets around ls $f which should be unnecessary.
The arithmetic error can be avoided by adding spaces between parentheses. You are already using var=$((arithmetic expression)) correctly elsewhere in your script, so it should be easy to see why $( ((ls "$f") | cut -d_ -f6)) needs a space. But the subshells are completely superfluous too; you want $(ls "$f" | cut -d_ -f6). Except ls isn't doing anything useful here, either; use $(echo "$f" | cut -d_ -f6). Except the shell can easily, albeit somewhat clumsily, extract a substring with parameter substitution; "${f#*_*_*_*_*_}". Except if you're using Awk in your script anyway, it makes more sense to do this - and much more - in Awk as well.
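To illustrate the parameter-substitution variant with a made-up filename (the real filenames in the question are not shown, so the shape here is an assumption):
f="calc_run_opt_step_batch_12_final.out"   # hypothetical; the number is the 6th _-separated field
echo "$f" | cut -d_ -f6                    # -> 12
n=${f#*_*_*_*_*_}                          # strip through the 5th underscore -> 12_final.out
n=${n%%_*}                                 # trim the remaining fields -> 12
Note that the # expansion keeps everything after the fifth underscore, so a second trim is needed whenever the number is not the last component of the name.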
Here is an attempt at refactoring most of the processing into Awk.
for f in *.out; do
    awk 'BEGIN { OFS=" " }
    # Extract 6th _-separated field from input filename
    FNR==1 { split(FILENAME, f, "_"); t=2*f[6] }
    # If input matches regex, add to array b
    / B / { b[++i] = $1 OFS $4 OFS $5 OFS $6 }
    # If array size reaches t, start overwriting old values
    i==t { i=0; m=t }
    END {
        # Print two prefix lines
        print "$molecule"; print -1, 2;
        # Handle array smaller than t
        if (!m) m=i
        # Print starting from oldest values (index i + 1)
        for(j=1; j<=m; j++) {
            # Wrap to beginning of array at end
            if(i+j > t) i-=t
            print b[i+j]; }
        print "$end" }' "$f" |
    rev | column -t | rev |
    cat - ./input_ea.in > "${f/foo.out/bar.in}"
done
Notice also how we avoid using a temporary file (this would certainly have been avoidable without the Awk refactoring, too) and how we take care to quote all filename variables in double quotes.
The array b contains (up to) the latest t values from matching lines; we collect these into an array which is constrained to never contain more than t values by wrapping the index i back to the beginning of the array when we reach index t. This "circular array" avoids keeping too many values in memory, which would make the script slow if the input file contains many matches.
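To see the circular-array bookkeeping on its own, here is a minimal sketch that keeps only the last t=3 lines of its input using the same wrap-around index:
printf '%s\n' a b c d e |
awk -v t=3 '{ b[++i] = $0 } i==t { i=0; m=t }
    END { if (!m) m=i
          for (j=1; j<=m; j++) { if (i+j > t) i-=t; print b[i+j] } }'
This prints c, d and e, one per line; the two oldest entries were overwritten in place.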

Counting lines in a file matching specific string

Suppose I have more than 3000 files named file.gz with many lines like below. The fields are separated by commas. I want to count only the lines in which the 21st field has today's date (e.g. 20171101).
I tried this:
awk -F',' '{if { $21 ~ "TZ=GMT+30 date '+%d-%m-%y'" } { ++count; } END { print count; }}' file.txt
but it's not working.
Using awk, something like below:
awk -F"," -v toSearch="$(date '+%Y%m%d')" '$21 ~ toSearch{count++}END{print count}' file
The date '+%Y%m%d' produces the date in the format you requested, e.g. 20170111. We then match that pattern against the 21st field, count the occurrences, and print the count in the END clause.
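A quick way to sanity-check this against a synthetic line whose 21st field holds today's date:
line=$(printf 'f%d,' $(seq 1 20); date '+%Y%m%d')
echo "$line" | awk -F"," -v toSearch="$(date '+%Y%m%d')" '$21 ~ toSearch{count++}END{print count}'
which should print 1.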
I am not sure whether the Solaris version of grep supports the -c flag for counting the number of pattern matches; if it does, you can do it as
grep -c "$(date '+%Y%m%d')" file
Another solution using GNU grep:
grep -Ec "([^,]*,){20}$(date '+%Y%m%d')" file
Explanation: ([^,]*,){20} matches the 20 comma-separated fields before the date to be checked.
Using awk and process substitution to uncompress a bunch of gzs and feed them to awk for analyzing and counting:
$ awk -F\, 'substr($21,1,8)==strftime("%Y%m%d"){i++}; END{print i}' <(zcat *gz)
Explained:
substr($21,1,8) == strftime("%Y%m%d") { # if the 8 first bytes of $21 match date
i++ # increment counter
}
END { # in the end
print i # output counter
}' <(zcat *gz) # zcat all gzs to awk
If Perl is an option, this solution works on all 3000 gzipped files:
zcat *.gz | perl -F, -lane 'BEGIN{chomp($date=`date "+%Y%m%d"`); $count=0}; $count++ if $F[20] =~ /^$date/; END{print $count}'
These command-line options are used:
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-n loop around each line of the input file
-e execute the perl code
-F autosplit modifier, in this case splits on ,
BEGIN{} executes before the main loop.
The $date and $count variables are initialized.
The $date variable is set to the result of the shell command date "+%Y%m%d"
$F[20] is the 21st element in @F
If the 21st element starts with $date, increment $count
END{} executes after the main loop
Using grep and cut instead of awk and avoiding regular expressions:
cut -f21 -d, file | grep -Fc "$(date '+%Y%m%d')"
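Since the files in the question are gzipped, the same idea extends naturally with zcat, counting across all of them at once:
zcat *.gz | cut -f21 -d, | grep -Fc "$(date '+%Y%m%d')"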

How can I find unique characters per line of input?

Is there any way to extract the unique characters of each line?
I know I can find the unique lines of a file using
sort -u file
I would like to determine the unique characters of each line (something like sort -u for each line).
To clarify: given this input:
111223234213
111111111111
123123123213
121212122212
I would like to get this output:
1234
1
123
12
Using sed
sed ':a;s/\(.\)\(.*\)\1/\1\2/;ta' file
Basically, it captures a character and checks whether it appears anywhere else on the line, also capturing all the characters between the two occurrences.
It then replaces all of that, including the second occurrence, with just the first occurrence followed by what was in between.
t is test: it jumps back to the :a label if the previous s/// command was successful. This repeats until the s/// command fails, meaning only unique characters remain.
; just separates commands.
1234
1
123
12
Keeps order as well.
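A short trace makes the loop easier to follow; take the made-up input line 1213:
printf '1213\n' | sed ':a;s/\(.\)\(.*\)\1/\1\2/;ta'
On the first pass \1 captures 1, \2 captures 2, and the second 1 is dropped, leaving 123; on the next pass s/// finds no repeated character and fails, t stops branching, and 123 is printed.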
It doesn't get things in the original order, but this awk one-liner seems to work:
awk '{for(i=1;i<=length($0);i++){a[substr($0,i,1)]=1} for(i in a){printf("%s",i)} print "";delete a}' input.txt
Split apart for easier reading, it could be stand-alone like this:
#!/usr/bin/awk -f
{
# Step through the line, assigning each character as a key.
# Repeated keys overwrite each other.
for(i=1;i<=length($0);i++) {
a[substr($0,i,1)]=1;
}
# Print items in the array.
for(i in a) {
printf("%s",i);
}
# Print a newline after we've gone through our items.
print "";
# Get ready for the next line.
delete a;
}
Of course, the same concept can be implemented pretty easily in pure bash as well:
#!/usr/bin/env bash
while read s; do
declare -A a
while [ -n "$s" ]; do
a[${s:0:1}]=1
s=${s:1}
done
printf "%s" "${!a[#]}"
echo ""
unset a
done < input.txt
Note that this depends on bash 4, due to the associative array. And this one does get things in the original order, because bash does a better job of keeping array keys in order than awk.
And I think you've got a solution using sed from Jose, though it has a bunch of extra pipe-fitting involved. :)
The last tool you mentioned was grep. I'm pretty sure you can't do this in traditional grep, but perhaps some brave soul might be able to construct a perl-regexp variant (i.e. grep -P) using -o and lookarounds. They'd need more coffee than is in me right now though.
One way using perl:
perl -F -lane 'print do { my %seen; grep { !$seen{$_}++ } @F }' file
Results:
1234
1
123
12
Another solution:
while read line; do
    grep -o . <<< "$line" | sort -u | paste -s -d '\0' -
done < file
grep -o . prints each character of the line on its own line
sort -u sorts those characters and removes the repeated ones
paste -s -d '\0' - joins the lines back into a single row
- as a filename argument tells paste to read standard input
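For example:
printf '1\n2\n3\n' | paste -s -d '\0' -
prints 123 on a single line.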
This awk should work:
awk -F '' '{delete a; for(i=1; i<=NF; i++) a[$i]; for (j in a) printf "%s", j; print ""}' file
1234
1
123
12
Here:
-F '' will break the record char by char, giving us a single character in $1, $2, etc.
Note: for non-GNU awk, use:
awk 'BEGIN{FS=""} {delete a; for(i=1; i<=NF; i++) a[$i];
for (j in a) printf "%s", j; print ""}' file
This might work for you (GNU sed):
sed 's/\B/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g' file
Split each line into a series of lines. Unique sort those lines. Combine the result back into a single line.
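A quick test on a single made-up line:
printf '1213\n' | sed 's/\B/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g'
which prints 123. This relies on two GNU extensions: the e flag, and . being able to match the newlines embedded in the pattern space.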
Unique and sorted alternative to the others, using sed and GNU tools:
sed 's/\(.\)/\1\n/g' file | sort | uniq
which produces one character per line; if you want those on one line, just do:
sed 's/\(.\)/\1\n/g' file | sort | uniq | sed ':a;N;$!ba;s/\n//g;'
This has the advantage of showing the characters in sorted order, rather than order of appearance.

Bash Text file formatting

I have some files with the following format:
555584280113;01-04-2013 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
552185022741;01-04-2013 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511965271852;01-04-2013 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511980644500;01-04-2013 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
553186398559;01-04-2013 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
555584280113;01-04-2013 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
558487839822;01-04-2013 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
I need to prepend a sequence number, 10 digits long, at the beginning of each line, remove the prefix 55 from the second column (which I have done with a simple sed 's/^55//g'), and reformat the date so the result looks like this:
0000000001;555584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;552185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;5511965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;5511980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;553186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;555584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
I have the date part working separately:
cat file.txt | cut -d\; -f2 | awk '{print $1}' | awk -v OFS="-" -F"-" '{print $3$2$1}'
And it works, but I don't know how to put it all together: the sequence number, the sed for the prefix, and the date format change. I'm not even sure how to do the sequence part.
Thanks for the help.
awk is one of the best tools out there for text parsing and formatting. Here is one way of meeting your requirements:
awk '
BEGIN { FS = OFS = ";" }
{
    printf "%010d;", NR
    $1 = substr($1,3)
    split($2, tmp, /[- ]/)
    $2 = tmp[3] tmp[2] tmp[1] " " tmp[4]
}1' file
We set the input and output field separator to ;
We use printf to produce the zero-padded first-column sequence number
We use the substr function to remove the first two characters of column 1
We use the split function to reformat the date
The trailing 1 prints the modified record as-is.
Output:
0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
If the name of the input file is input, then the following command removes the 55, adds a 10-digit line number, and rearranges the date. With GNU sed:
nl -nrz -w10 -s\; input | sed -r 's/;55/;/; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
If one is using Mac OSX (or another OS without GNU sed), then a slight change is required:
nl -nrz -w10 -s\; input | sed -E 's/;55/;/; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
Sample output:
0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
How it works: nl is a handy *nix utility for adding line numbers. -w10 tells nl that we want 10 digit line numbers. -nrz tells nl to pad the line numbers with zeros, and -s\; tells nl to add a semicolon after the line number. (We have to escape the semicolon so that the shell ignores it.)
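You can see what nl contributes by itself:
printf 'foo\nbar\n' | nl -nrz -w10 -s\;
0000000001;foo
0000000002;bar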
The remaining changes are handled by sed. The sed command s/;55/;/ removes the 55 at the start of the second field (anchoring on the preceding semicolon, so a 55 that happens to appear inside the line number can never match). The rearrangement of the date is handled by s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/.
You could actually use a Bash loop to do this.
i=0
while read f1 f2; do
    ((++i))
    IFS=\; read n d <<< "$f1"
    d=${d:6:4}${d:3:2}${d:0:2}
    printf "%010d;%d;%d %s\n" "$i" "$n" "$d" "$f2"
done < file.txt

Oneliner to calculate complete size of all messages in maillog

Ok guys, I'm really at a dead end here; I don't know what else to try...
I am writing a script for some e-mail statistics. One of the things it needs to do is calculate the complete size of all messages in the maillog; this is what I wrote so far:
egrep ' HOSTNAME sendmail\[.*.from=.*., size=' maillog | awk '{print $8}' |
tr "," "+" | tr -cd '[:digit:][=+=]' | sed 's/^/(/;s/+$/)\/1048576/' |
bc -ql | awk -F "." '{print $1}'
And here is a sample line from my maillog:
Nov 15 09:08:48 HOSTNAME sendmail[3226]: oAF88gWb003226:
from=<name.lastname@domain.com>, size=40992, class=0, nrcpts=24,
msgid=<E08A679A54DA4913B25ADC48CC31DD7F@domain.com>, proto=ESMTP,
daemon=MTA1, relay=[1.1.1.1]
So I'll try to explain it step by step:
First I grep through the file to find all the lines containing the actual "size"; next I print the 8th field, in this case "size=40992,".
Next I replace all the comma characters with a plus sign.
Then I delete everything except the digits and the plus sign.
Then I replace the beginning of the line with a "(", and I replace the last extra plus sign with a ")" followed by "/1048576". So I get a huge expression looking like this:
"(1+2+3+4+5...+n)/1048576"
Because I want to add up all the individual message sizes and divide it so I get the result in MB.
The last awk command is there because I get a decimal number; I really don't care about precision, so I just print the part before the decimal point.
The problem is, this doesn't work... and I could swear it was working at one point. Could it be that my expression is too long for bc to handle?
Thanks if you took the time to read through :)
I think a one-line awk script will work too. It matches any line that your egrep pattern matches; for those lines it splits the eighth field on the = sign and adds the second part (the number) to the SUM variable. At the END of the file it prints out the value of SUM/1048576 (i.e. the byte count in mebibytes).
awk '/ HOSTNAME sendmail\[.*.from=.*., size=/{ split($8,a,"=") ; SUM += a[2] } END { print SUM/1048576 }' maillog
bc chokes if there is no newline in its input, as happens with your expression. You have to change the sed part to:
sed 's/^/(/;s/+$/)\/1048576\n/'
The final awk will happily eat all your output if the total size is less than 1MB and bc outputs something like .03333334234. If you are not interested in the decimal part, remove that last awk command and the -l parameter from bc.
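Putting those two fixes together, the original pipeline becomes something like this (a sketch; the grep stage is unchanged):
egrep ' HOSTNAME sendmail\[.*.from=.*., size=' maillog | awk '{print $8}' |
tr "," "+" | tr -cd '[:digit:][=+=]' | sed 's/^/(/;s/+$/)\/1048576\n/' | bc -q
Without -l, bc performs integer division, so the result is already a whole number of megabytes.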
I'd do it with this one-liner:
grep ' HOSTNAME sendmail[[0-9][0-9]*]:..*:.*from=..*, size=' maillog | sed 's|.*, size=\([0-9][0-9]*\), .*|\1+|' | tr -d '\n' | sed 's|^|(|; s|$|0)/1048576\n|' | bc
