How to split string with unequal amount of spaces [duplicate] - bash

I am currently struggling to split a string that contains a varying number of spaces; the string comes from a log file.
An excerpt of the log file:
ProcessA Mon Nov 9 09:59 - 10:48 (00:48)
ProcessB Sun Nov 8 11:16 - 11:17 (00:00)
ProcessC Sat Nov 7 12:52 - 12:53 (00:00)
ProcessD Fri Nov 6 09:31 - 11:25 (01:54)
ProcessE Thu Nov 5 16:41 - 16:41 (00:00)
ProcessF Thu Nov 5 11:39 - 11:40 (00:00)
As you can see, the number of spaces between the process name and the date of execution varies between 2 and 5.
I would like to split each line into three parts: process, date of execution, and execution time.
However, I don't see a solution because of the unequal number of spaces. Am I wrong, or is splitting such a string incredibly hard?
Hopefully somebody out there is way smarter than me and can provide me with a solution 😊
Thanks in advance to everybody who is willing to help me with this!

You can also assign fields directly in a read.
while read -r prc wd mon md start _ end dur _; do
echo "prc='$prc' wd='$wd' mon='$mon' md='$md' start='$start' end='$end' dur='${dur//[)(]/}'"
done < file
Output:
prc='ProcessA' wd='Mon' mon='Nov' md='9' start='09:59' end='10:48' dur='00:48'
prc='ProcessB' wd='Sun' mon='Nov' md='8' start='11:16' end='11:17' dur='00:00'
prc='ProcessC' wd='Sat' mon='Nov' md='7' start='12:52' end='12:53' dur='00:00'
prc='ProcessD' wd='Fri' mon='Nov' md='6' start='09:31' end='11:25' dur='01:54'
prc='ProcessE' wd='Thu' mon='Nov' md='5' start='16:41' end='16:41' dur='00:00'
prc='ProcessF' wd='Thu' mon='Nov' md='5' start='11:39' end='11:40' dur='00:00'
read splits on runs of whitespace (the default IFS), so it generally doesn't care how many spaces separate the fields.
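If only the three parts from the question are needed (process, date of execution, execution time), the fields that read produces can be glued back together. The following is a minimal sketch rather than a definitive implementation; the function name split_log_line and the | separator are assumptions made for illustration, and it sticks to POSIX shell constructs so it also runs under bash.

```shell
#!/bin/bash
# Hypothetical helper: split one log line into process, date, and duration.
split_log_line() (
    # A here-document feeds the line to read, which splits on runs of whitespace.
    read -r prc wd mon md start dash end dur <<EOF
$1
EOF
    dur=${dur#\(}    # drop the leading parenthesis
    dur=${dur%\)}    # drop the trailing parenthesis
    printf '%s|%s|%s\n' "$prc" "$wd $mon $md $start $dash $end" "$dur"
)

split_log_line 'ProcessA     Mon Nov  9 09:59 - 10:48  (00:48)'
# prints: ProcessA|Mon Nov 9 09:59 - 10:48|00:48
```

The function body runs in a subshell, so the field variables do not leak into the calling script.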

In bash, you can use a regex to parse each line:
#! /bin/bash
while IFS=' ' read -r line ; do
if [[ "$line" =~ ([^\ ]+)\ +(.+[^\ ])\ +'('([^\)]+)')' ]] ; then
process=${BASH_REMATCH[1]}
date=${BASH_REMATCH[2]}
time=${BASH_REMATCH[3]}
echo "$process $date $time."
fi
done
Or, use parameter expansions:
#! /bin/bash
shopt -s extglob   # enable +( ) patterns once, before the loop
while IFS=' ' read -r process datetime ; do
date=${datetime%%+( )\(*}
time=${datetime#*\(}
time=${time%\)}
echo "$process $date $time."
done

Using awk:
awk '{printf "%s", $1; for (i=2; i<NF; i++) printf " %s",$i; print "",$NF}' < file.txt
produces:
ProcessA Mon Nov 9 09:59 - 10:48 (00:48)
ProcessB Sun Nov 8 11:16 - 11:17 (00:00)
ProcessC Sat Nov 7 12:52 - 12:53 (00:00)
ProcessD Fri Nov 6 09:31 - 11:25 (01:54)
ProcessE Thu Nov 5 16:41 - 16:41 (00:00)
ProcessF Thu Nov 5 11:39 - 11:40 (00:00)

Related

Restart Apache if average server load past minute is higher than X

I wrote a shell script and added it to my cron. It's supposed to run every minute and check for the average server load, past 1 minute, and if it's over 40 it should log the load, date and then restart Apache httpd. Here is my script:
#!/bin/bash
LOGFILE=/home/user/public_html/domain.com/cron/restart.log
function float_to_int() {
echo $1 | cut -d. -f1
}
check=$(uptime | awk -F' *,? *' '{print $12}')
now=$(date)
checkk=$(float_to_int $check)
if [[ $checkk > 40 ]]; then
echo $now $checkk >> $LOGFILE 2>&1
/usr/bin/systemctl restart httpd.service
fi
If I look at the log file I see the following:
Wed Jul 3 20:02:01 EDT 2019 70
Wed Jul 3 23:03:01 EDT 2019 43
Wed Jul 3 23:12:01 EDT 2019 9
Wed Jul 3 23:13:01 EDT 2019 7
Wed Jul 3 23:14:01 EDT 2019 6
Wed Jul 3 23:15:02 EDT 2019 5
Wed Jul 3 23:16:01 EDT 2019 5
Something is clearly wrong as it should only log and restart Apache if the load is over 40 but as you can see from the logs the load was 9, 7, 6, 5 and 5. Could someone point me in the right direction?
From man bash, section CONDITIONAL EXPRESSIONS (emphasis mine) :
string1 > string2
True if string1 sorts after string2 lexicographically.
You will either want to use [['s -gt operator, or use arithmetic evaluation instead of [[ :
if (( checkk > 40 )); then
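To see the difference between the three comparison styles side by side, here is a small sketch. The helper names string_gt, integer_gt and arith_gt are made up for illustration; the load values are taken from the log in the question.

```shell
#!/bin/bash
# Hypothetical helpers that report the verdict of each comparison style.
string_gt()  { [[ $1 > $2 ]]   && echo yes || echo no; }   # lexicographic
integer_gt() { [[ $1 -gt $2 ]] && echo yes || echo no; }   # numeric
arith_gt()   { (( $1 > $2 ))   && echo yes || echo no; }   # numeric

for load in 9 43 70; do
    echo "load=$load string=$(string_gt "$load" 40) integer=$(integer_gt "$load" 40) arith=$(arith_gt "$load" 40)"
done
```

For a load of 9 the string test reports yes while both numeric tests report no, which matches the spurious log entries in the question.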
Here's one in GNU awk (GNU awk due to strftime()):
awk '
$1 > 0.4 { # interval above 0.4
logfile="./log.txt" # my logpath, change it
print strftime("%c"), $1 >> logfile # date and load to log
cmd="/usr/bin/systemctl restart httpd.service" # command to use for restarting
if((ret=(cmd|getline res)) !=0 ) # store return value and result
print "failed: " ret # if failed
else
print "success"
}' /proc/loadavg # getting load avg from /proc
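For comparison, the same check can be done in the shell by reading /proc/loadavg directly instead of parsing uptime. This is only a sketch, assuming a Linux system; load_int and the sample line are hypothetical, and the threshold of 40 is the one from the question.

```shell
#!/bin/bash
# Hypothetical helper: print the integer part of the 1-minute load average,
# given a line in /proc/loadavg format.
load_int() {
    set -- $1            # split the line on whitespace; $1 is now the 1-minute load
    echo "${1%%.*}"      # keep the digits before the decimal point
}

load_int "43.17 12.50 6.02 3/467 12345"    # prints 43

if [ -r /proc/loadavg ]; then
    load=$(load_int "$(cat /proc/loadavg)")
    if [ "$load" -gt 40 ]; then
        echo "load $load is above the threshold"
    fi
fi
```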

Number of logins on Linux using Shell script and AWK

How can I get the number of logins of each day from the beginning of the wtmp file using AWK?
I thought about using an associative array but I don't know how to implement it in AWK..
myscript.sh
#!/bin/bash
awk 'BEGIN{numberoflogins=0}
#code goes here'
The output of the last command:
[fnorbert@localhost Documents]$ last
fnorbert tty2 /dev/tty2 Mon Apr 24 13:25 still logged in
reboot system boot 4.8.6-300.fc25.x Mon Apr 24 16:25 still running
reboot system boot 4.8.6-300.fc25.x Mon Apr 24 13:42 still running
fnorbert tty2 /dev/tty2 Fri Apr 21 16:14 - 21:56 (05:42)
reboot system boot 4.8.6-300.fc25.x Fri Apr 21 19:13 - 21:56 (02:43)
fnorbert tty2 /dev/tty2 Tue Apr 4 08:31 - 10:02 (01:30)
reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:30 - 10:02 (00:-27)
fnorbert tty2 /dev/tty2 Tue Apr 4 08:14 - 08:26 (00:11)
reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:13 - 08:26 (-1:-47)
wtmp begins Mon Mar 6 09:39:43 2017
The shell script's output should be:
Apr 4: 4
Apr 21: 2
Apr 24: 3
using an associative array, if possible.
In awk, arrays can be indexed by strings or numbers, so you can use it like an associative array.
However, what you're asking will be hard to do with awk reliably because the delimiters are whitespace, therefore empty fields will throw off the columns, and if you use FIELDWIDTHS you'll also get thrown off by columns longer than their assigned width.
If all you're looking for is just the number of logins per day you might want to use a combination of sed and awk (and sort):
last | \
sed -E 's/^.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([ 0-9]{2}).*$/\2 \3/p;d' | \
awk '{arr[$0]++} END { for (a in arr) print a": " arr[a]}' | \
sort -M
sed -E enables extended regular expressions; the pattern prints only the date from each line emitted by last (it matches on the day of the week but prints only the month and the day of the month).
We could have used uniq -c to get the counts, but with awk we can use an associative array, as you hinted.
Finally, sort -M sorts on the abbreviated month names in dates such as Apr 24 or Mar 16.
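For reference, the uniq -c variant mentioned above might look like the sketch below. The wrapper name logins_per_day is hypothetical, and dropping the trailing "wtmp begins" line is an assumption on my part (its timestamp would otherwise be counted as a login).

```shell
#!/bin/bash
# Hypothetical wrapper: read `last` output on stdin, print "Mon D: count" per day.
logins_per_day() {
    sed -nE \
        -e '/^wtmp begins/d' \
        -e 's/^.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([ 0-9]{2}).*$/\2 \3/p' |
        sort |
        uniq -c |                          # prefix each distinct date with its count
        awk '{print $2, $3 ": " $1}' |     # reorder to "Apr 24: 3"
        sort -k1,1M -k2,2n                 # month order, then day of month
}

if command -v last >/dev/null 2>&1; then
    last | logins_per_day
fi
```

uniq -c only counts adjacent duplicates, which is why the dates are sorted before it.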
Try the following awk script (it assumes that all entries fall in the same month, and it relies on the three-argument match() of GNU awk):
myscript.awk:
#!/bin/awk -f
{
    a[NR]=$0;                  # save each line in an array indexed by line number
}
END {
    for (i=NR-1;i>=1;i--) {    # iterate in reverse order, skipping the final "wtmp begins" line
        if (match(a[i],/[A-Z][a-z]{2} ([A-Z][a-z]{2}) *([0-9]{1,2}) [0-9]{2}:[0-9]{2}/, b)) {
            m=b[1];            # save the month name
            c[b[2]]++;         # count only lines that actually matched
        }
    }
    for (i in c) print m,i": "c[i]
}
Usage:
last | awk -f myscript.awk
The output:
Apr 4: 4
Apr 21: 2
Apr 24: 3

How to check on FTP if there files on the list older than 7 days

I have a list of files from remote FTP Server:
drwxrwxrwx 2 test-backup everyone 4096 Jul 8 02:30 .
drwxrwxrwx 5 0 0 4096 Jul 23 07:02 ..
-rw-rw-rw- 1 test-backup everyone 352696 Jul 18 02:30 expdp_TEST11P2_custom_Fri.dmp.gz
-rw-rw-rw- 1 test-backup everyone 352796 Jul 21 02:30 expdp_TEST11P2_custom_Mon.dmp.gz
-rw-rw-rw- 1 test-backup everyone 352615 Jul 19 02:30 expdp_TEST11P2_custom_Sat.dmp.gz
-rw-rw-rw- 1 test-backup everyone 352626 Jul 20 02:30 expdp_TEST11P2_custom_Sun.dmp.gz
-rw-rw-rw- 1 test-backup everyone 10511523642 Jul 24 03:08 expdp_TEST11P2_custom_Thu.dmp.gz
-rw-rw-rw- 1 test-backup everyone 10496881744 Jul 22 03:03 expdp_TEST11P2_custom_Tue.dmp.gz
-rw-rw-rw- 1 test-backup everyone 10504557195 Jul 23 03:03 expdp_TEST11P2_custom_Wed.dmp.gz
I need to check whether any of these files are older than 7 days. Do you have any ideas how I can do this in Bash?
As I understand the issue, you have a file listing received via ftp (and you do not have access to find on the remote server). Assuming that the directory listing is stored in a file called ftptimes, you can identify files older than 7 days via:
$ awk -v cutoff="$(date -d "7 days ago" +%s)" '{line=$0; "date -d \""$6" " $7" " $8 "\" +%s" |getline; fdate=$1} fdate < cutoff {print line} ' ftptimes
From your sample data, the output would be:
drwxrwxrwx 2 test-backup everyone 4096 Jul 8 02:30 .
Addressing the parts of the awk command, one by one:
-v cutoff="$(date -d "7 days ago" +%s)"
This defines an awk variable called cutoff that holds the Unix time (seconds since 1970-01-01 00:00:00 UTC) corresponding to seven days ago.
line=$0;
This saves for later use the current input line into the variable line.
"date -d \""$6" " $7" " $8 "\" +%s" |getline; fdate=$1
This converts the date given by ftp into Unix time, reads that time in, and saves it in a variable called fdate.
fdate < cutoff {print line}
If the file date is less than the cutoff date, then the line is printed.
In the sample data that you provided, the only file older than seven days is the current directory (.) which dates to Jul 8.
As an example, if we wanted files older than 5 days, then more files would be printed:
$ awk -v cutoff="$(date -d "5 days ago" +%s)" '{line=$0; "date -d \""$6" " $7" " $8 "\" +%s" |getline; fdate=$1} fdate < cutoff {print line} ' ftptimes
drwxrwxrwx 2 test-backup everyone 4096 Jul 8 02:30 .
-rw-rw-rw- 1 test-backup everyone 352696 Jul 18 02:30 expdp_TEST11P2_custom_Fri.dmp.gz
-rw-rw-rw- 1 test-backup everyone 352615 Jul 19 02:30 expdp_TEST11P2_custom_Sat.dmp.gz
In the above, I assumed that the info from ftp was stored in a file. It is also possible to pipe it in:
echo ls | ftp host port | awk -v cutoff="$(date -d "5 days ago" +%s)" '{line=$0; "date -d \""$6" " $7" " $8 "\" +%s" |getline; fdate=$1} fdate < cutoff {print line} '
where host and port are replaced by the host and port of your server.
Bash version
The above can also be accomplished in bash although it requires explicit looping. Again, assuming the ftp information in the file ftptimes:
$ cutoff="$(date -d "7 days ago" +%s)"; while read line; do set -- $line; fdate=$(date -d "$6 $7 $8" +%s) ; [ $fdate -lt $cutoff ] && echo $line ; done <ftptimes
drwxrwxrwx 2 test-backup everyone 4096 Jul 8 02:30 .
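One caveat: both versions pass only the month, day and time to date -d, so GNU date assumes the current year. A listing fetched in January that shows Dec 30 would then be read as a date in the future. Below is a hedged sketch of one way to handle that, assuming GNU date; listing_epoch is a hypothetical helper and the "future means last year" rule is my assumption, not something the ftp listing guarantees.

```shell
#!/bin/bash
# Hypothetical helper: convert fields 6-8 of a listing line to Unix time.
# $1 = month name, $2 = day of month, $3 = time (HH:MM)
listing_epoch() {
    local epoch now year
    epoch=$(date -d "$1 $2 $3" +%s) || return 1
    now=$(date +%s)
    if [ "$epoch" -gt "$now" ]; then
        # Assumption: a timestamp in the future really belongs to last year.
        year=$(( $(date +%Y) - 1 ))
        epoch=$(date -d "$2 $1 $year $3" +%s) || return 1
    fi
    echo "$epoch"
}

cutoff=$(date -d "7 days ago" +%s)
fdate=$(listing_epoch Jul 8 02:30)
if [ "$fdate" -lt "$cutoff" ]; then
    echo "older than 7 days"
fi
```

In the loop above, fdate=$(listing_epoch "$6" "$7" "$8") would take the place of the plain date -d call.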
The find command is the most flexible for date ranges. You have 3 basic tests to choose from: -atime +n (last access time was more than n*24 hours ago); -ctime +n (file status changed more than n*24 hours ago); and -mtime +n (file was modified more than n*24 hours ago). Note: n alone means exactly n*24 hours ago, +n means more than n*24 hours ago, and -n means less than n*24 hours ago. Also note that any fractional part of a 24-hour period is ignored, so you may have to use +6 rather than +7 to get all files that are 7 or more days old. Example:
find /path/to/files -type f -mtime +6
This finds all files (not directories) in /path/to/files that were modified more than 6 days ago (that is, at least 7 days). You can test with -atime, -ctime, and -mtime to see which fits your needs.
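A quick way to convince yourself of the +6 behaviour is to try it on a scratch directory. This is only a sketch; older_than_a_week and the file names are made up, and touch -d is a GNU extension.

```shell
#!/bin/bash
# Hypothetical wrapper around the find command shown above.
older_than_a_week() {
    find "$1" -type f -mtime +6
}

demo_dir=$(mktemp -d)
touch -d '8 days ago' "$demo_dir/old.dmp.gz"   # backdated modification time
touch "$demo_dir/new.dmp.gz"                   # modified just now
older_than_a_week "$demo_dir"                  # lists only old.dmp.gz
rm -r "$demo_dir"
```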

Bash script doesn't cater to new line, appends text to current line

I have a script that redirects stdout to a log file after running command xyz. I tried doing this in the console:
xyz > temp.log &
and when I look at temp.log it's perfect, just as I expect it to be. But when I run the above command inside a script, temp.log is just one big line containing all of stdout. How do I make it print the same as when running it from the console? I have shown the logs below.
Running xyz > temp.log & from console
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 18
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 6
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 3
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 2
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 19
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 11
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 1
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 12
/source_id:int64/(Green-aeon-GlobalMillennium-Spout): 3
Running from within the script
ESC[?1049hESC[1;48rESC(BESC[mESC[4lESC[?7hESC[HESC[2JEvery 5.0s: Green-aeon-GlobalMillennium-SpoutESC[1;178HTue Feb 12 11:36:08 2013ESC[3;1H/source_id:36/sch_event:1797777 for Green-aeon-GlobalMillennium-Spout at Tue Feb 12 11:16:41 CST 2013^MESC[4d/source_id:36/sch_event:1797779 for Green-aeon-GlobalMillennium-Spout at Tue Feb 12 11:16:41 CST 2013^MESC[5d/source_id:36/sch_event:1797781 for Green-aeon-GlobalMillennium-Spout at Tue Feb 12 11:16:41 CST 2013^MESC[6d/source_id:36/sch_event:1797783 for Green-aeon-GlobalMillennium-Spout at Tue Feb 12 11:16:41 CST 2013^MESC[7d/source_id:36/sch_event:1797785 for Green-aeon-GlobalMillennium-Spout at Tue Feb 12 11:16:41 CST 2013^MESC[8d/source_id:36/sch_event:1797787 for Green-aeon-GlobalMillennium-Spout at Tue Feb 12 11:16:41 CST` 2013^MESC[9d/source_id:36/sch_event:1797789 for Green-aeon-GlobalMillennium-Spout at Tue Feb 12 11:16:41 CST 2013^MESC[10d/source_id:36/sch_event:1797791 for Green-aeon-GlobalMillennium-Spout at Tue Feb 12 11:16:41 CST 2013^MESC
This is the script
#!/bin/sh
die () {
echo >&2 "$@"
exit 1
}
[ "$#" -ge 4 ] || die "usage notification_stats <table_name> <listener_name> <time_interval> <time_to_run>"
echo $3 | grep -E -q '^[0-9]+$' || die "Numeric argument required, $3 provided for time_interval"
echo $4 | grep -E -q '^[0-9]+$' || die "Numeric argument required, $4 provided for time_to_run"
watch -n $3 xyz -v $1 $2 > notify.log &
my_pid=$!
sleep $4
kill -9 $my_pid
Don't use watch. It's meant for interactive display, not for running a program repeatedly and capturing its output.
You can use this instead:
#!/bin/bash
while (( SECONDS <= $4 ))
do
xyz -v "$1" "$2"
sleep "$3"
done > notify.log
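The same idea can be wrapped in a reusable function. This is a sketch under assumptions: run_every is a hypothetical name, it uses date +%s rather than SECONDS so it is not tied to bash, and date stands in for the xyz command from the question.

```shell
#!/bin/bash
# Hypothetical helper: run a command every <interval> seconds for about <duration> seconds.
run_every() {
    local interval=$1 duration=$2 end
    shift 2
    end=$(( $(date +%s) + duration ))
    while [ "$(date +%s)" -lt "$end" ]; do
        "$@"                 # each run writes ordinary newline-terminated output
        sleep "$interval"
    done
}

# date stands in for "xyz -v table listener"; append "> notify.log" to capture the output.
run_every 1 2 date
```

Because the command writes plain text instead of terminal control sequences, the redirected log keeps one line per line of output.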

Converting Month String into Integer Shell

Okay, so I run an openssl command to get the expiry date of a certificate. Doing so gives me this:
enddate=Jun 26 23:59:59 2012 GMT
Then I cut everything out and leave just the month, which is "Jun".
The next part of my script tells the user whether the certificate has expired, and to do that I use an if statement that looks like this:
if [ $exp_year -lt $cur_year && $exp_month -lt $cur_month ]; then
echo ""
echo "Certificate is still valid until $exp_date"
echo ""
else
echo ""
echo "Certificate has expired on $exp_date, please renew."
echo ""
fi
I can't figure out how to convert the month into an integer so that I can do the comparison.
I thought of doing it the brute-force way, which is this:
Jan=01
Feb=02
Mar=03
...
Clearly that's a terrible way to do it. Does anyone know what I can do?
Well, you can use:
now=$(date +%s)
cert=$(date --date="$enddate" +%s)
if [ "$cert" -lt "$now" ]; then
echo "Old!"
fi
i.e. convert both dates into seconds since the epoch and compare those.
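Put together with the openssl output line from the question, that might look like the sketch below. It assumes GNU date (for --date); is_expired is a hypothetical helper name, and the sample value is the one shown in the question.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given date string lies in the past.
is_expired() {
    [ "$(date --date="$1" +%s)" -lt "$(date +%s)" ]
}

line='enddate=Jun 26 23:59:59 2012 GMT'    # as printed by the openssl command
exp_date=${line#enddate=}                  # strip the "enddate=" prefix

if is_expired "$exp_date"; then
    echo "Certificate has expired on $exp_date, please renew."
else
    echo "Certificate is still valid until $exp_date"
fi
```

This avoids comparing year and month separately, so no month-name conversion is needed.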
I would recommend using Petesh's answer, but here's a way to set up an associative array if you have Bash 4:
months=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
declare -A mlookup
for monthnum in "${!months[@]}"
do
mlookup[${months[monthnum]}]=$((monthnum + 1))
done
echo "${mlookup["Jun"]}" # outputs 6
If you have Bash before version 4, you can use AWK to help you out:
month=Feb
awk -v "month=$month" 'BEGIN {months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"; print (index(months, month) + 3) / 4}'
Another way in pure Bash (any version):
months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
month=Aug
string="${months%$month*}"
echo "$((${#string}/4 + 1))"
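The string-offset trick can be wrapped in a function that also rejects unknown names and zero-pads the result, as in the 01, 02 style from the question. A minimal sketch; month_number is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the two-digit month number for an abbreviated name.
month_number() {
    local months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec" prefix
    case " $months " in
        *" $1 "*) ;;             # known three-letter abbreviation
        *) return 1 ;;           # anything else is rejected
    esac
    prefix=${months%%"$1"*}      # text before the month name
    printf '%02d\n' $(( ${#prefix} / 4 + 1 ))   # each earlier month takes 4 characters
}

month_number Jun    # prints 06
```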
