I have a simple question, but I'm not that good with bash. I'm using a command line to get the number of queries and cached queries my Pi-hole makes to Unbound (recursive DNS), and I want to display the cached queries as a percentage of total queries. Here is the line to get the total queries:
sudo unbound-control stats_noreset | awk -F '=' '/total.num.queries/ {print $NF}'
This gives me, for example, 1500, and I want to divide the result of the following line by this number:
sudo unbound-control stats_noreset | awk -F '=' '/total.num.cachehits/ {print $NF}'
This gives me, for example, 500, and I want to display it as:
500 33.3%
with a single line of code only, without variables.
Thanks a lot! I've been trying for days.
Edit: as asked, the full output of sudo unbound-control stats_noreset is:
sudo unbound-control stats_noreset
thread0.num.queries=1294
thread0.num.queries_ip_ratelimited=0
thread0.num.cachehits=327
thread0.num.cachemiss=967
thread0.num.prefetch=134
thread0.num.zero_ttl=0
thread0.num.recursivereplies=967
thread0.requestlist.avg=0.334242
thread0.requestlist.max=9
thread0.requestlist.overwritten=0
thread0.requestlist.exceeded=0
thread0.requestlist.current.all=0
thread0.requestlist.current.user=0
thread0.recursion.time.avg=0.080698
thread0.recursion.time.median=0.0325689
thread0.tcpusage=0
thread1.num.queries=1309
thread1.num.queries_ip_ratelimited=0
thread1.num.cachehits=342
thread1.num.cachemiss=967
thread1.num.prefetch=132
thread1.num.zero_ttl=0
thread1.num.recursivereplies=967
thread1.requestlist.avg=0.374886
thread1.requestlist.max=9
thread1.requestlist.overwritten=0
thread1.requestlist.exceeded=0
thread1.requestlist.current.all=0
thread1.requestlist.current.user=0
thread1.recursion.time.avg=0.075309
thread1.recursion.time.median=0.0322503
thread1.tcpusage=0
thread2.num.queries=1338
thread2.num.queries_ip_ratelimited=0
thread2.num.cachehits=336
thread2.num.cachemiss=1002
thread2.num.prefetch=156
thread2.num.zero_ttl=0
thread2.num.recursivereplies=1002
thread2.requestlist.avg=0.360104
thread2.requestlist.max=9
thread2.requestlist.overwritten=0
thread2.requestlist.exceeded=0
thread2.requestlist.current.all=0
thread2.requestlist.current.user=0
thread2.recursion.time.avg=0.073632
thread2.recursion.time.median=0.031425
thread2.tcpusage=0
thread3.num.queries=1258
thread3.num.queries_ip_ratelimited=0
thread3.num.cachehits=339
thread3.num.cachemiss=919
thread3.num.prefetch=127
thread3.num.zero_ttl=0
thread3.num.recursivereplies=919
thread3.requestlist.avg=0.315488
thread3.requestlist.max=9
thread3.requestlist.overwritten=0
thread3.requestlist.exceeded=0
thread3.requestlist.current.all=0
thread3.requestlist.current.user=0
thread3.recursion.time.avg=0.073834
thread3.recursion.time.median=0.0308651
thread3.tcpusage=0
total.num.queries=5199
total.num.queries_ip_ratelimited=0
total.num.cachehits=1344
total.num.cachemiss=3855
total.num.prefetch=549
total.num.zero_ttl=0
total.num.recursivereplies=3855
total.requestlist.avg=0.34673
total.requestlist.max=9
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.075873
total.recursion.time.median=0.0317773
total.tcpusage=0
time.now=1613041718.040611
time.up=14305.501526
time.elapsed=14305.501526
thread0, thread1, etc. are the cores, but I'm only interested in the totals.
I assume when you say without variables, you mean without variables in the shell.
With this in mind, you can use awk variables to store the intermediate result:
sudo unbound-control stats_noreset | awk -F '=' '$1 == "total.num.queries" {queries=$NF} $1 == "total.num.cachehits" {hits=$NF}END{print hits, hits/queries*100"%"}'
or in a more readable multi-line format:
sudo unbound-control stats_noreset |
awk -F '=' '$1 == "total.num.queries" { queries = $NF }
$1 == "total.num.cachehits" { hits = $NF }
END { print hits, hits / queries * 100 "%" }'
The output is:
1344 25.8511%
If you need only a single decimal place in the output, you can use printf, like
END { printf "%d %.1f%%\n", hits, hits / queries * 100 }
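With the totals from the stats output above (1344 cache hits out of 5199 queries), that printf version would print:
1344 25.9%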
I have a requirement to read certain parameters from a log file and then write them to a database. I am trying to achieve the first part, i.e. reading from the log file using awk commands in a shell script.
The log file may consist of the lines below, or more:
[2018-05-22T11:35:17,857] [RQST: rqst_3ADE-5439-598D-1B8B | TB: 9000042] - [588455375] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 769 - requestType="TESTING",partnerName="Test Merchant 123",testId="123456",lob="TEST1_TO_TEST2",tranType="TEST1",paymentType="P2M",amount="110.00",currency="840",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
[2018-05-22T11:35:17,857] [RQST: rqst_2AEF-2339-598D-1B8B | TB: 9000043] - [588455376] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 770 - requestType="TESTING",partnerName="Test Merchant 234",testId="234567",lob="TEST2_TO_TEST3",tranType="TEST2",paymentType="P2M",amount="120.00",currency="850",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
[2018-05-22T11:35:17,857] [RQST: rqst_4EDA-4539-598D-1B8B | TB: 9000044] - [588455377] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 771 - requestType="TESTING",partnerName="Test Merchant 345",testId="345678",lob="TEST3_TO_TEST4",tranType="TEST3",paymentType="P2M",amount="130.00",currency="860",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
I need to apply filters on processor and paymentType and retrieve the values of amount, currency, network and responseCode into variables in a shell script, which will then be inserted into an Oracle DB table.
I am new to shell scripting and AWK and unable to wrap my head around this. I have tried
awk '/amount/{print}' testAPI.log
however, it returns all rows which contain amount.
Since you didn't specify the expected output, here is a template you can tailor to your needs:
$ awk -F' - ' '{n=split($NF,a,",");
for(i=1;i<=n;i++) {split(a[i],b,"="); kv[b[1]]=b[2]}}
kv["processor"]=="\"CBN\""
&& kv["paymentType"]=="\"P2M\""{print kv["amount"],kv["currency"]}' file
"110.00" "840"
"120.00" "850"
"130.00" "860"
You can trim the double quotes as well, but I'm not sure it's needed as is...
I tried with the three entries in the question; the command below gives you the output you want.
It checks whether $5 is paymentType="P2M" and whether $8 has the value processor="CBN", which is basically the filter you were looking for; substitute the filters you need.
cat testAccelAPI.log | grep -i "\[RQST: rqst" | cut -d ' ' -f 19 | awk -F, '{ if($5=="paymentType=\"P2M\"" && $8=="processor=\"CBN\"") print $5 "=" $6 "="$7 "="$8 "=" $9 "="$10}' | cut -d= -f 4,6,8,9 | tr = " "
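If you then need those values in shell variables for the DB insert (as the question asks), a minimal bash sketch along these lines could work. It reuses the awk idea from the answer above; the file name testAPI.log is taken from the question, the echo is a placeholder for your Oracle insert, and it assumes none of the extracted values contain spaces:
while read -r amount currency network responseCode; do
    # placeholder: replace this echo with your sqlplus / Oracle insert logic
    echo "amount=$amount currency=$currency network=$network responseCode=$responseCode"
done < <(
  awk -F' - ' '{
      n = split($NF, a, ",")
      for (i = 1; i <= n; i++) { split(a[i], b, "="); gsub(/"/, "", b[2]); kv[b[1]] = b[2] }
    }
    kv["processor"] == "CBN" && kv["paymentType"] == "P2M" {
      print kv["amount"], kv["currency"], kv["network"], kv["responseCode"]
    }' testAPI.log
)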
My server is having unusually high CPU usage, and I can see Apache is using way too much memory.
I have a feeling I'm being DoS'd by a single IP; maybe you can help me find the attacker?
I've used the following line, to find the 10 most "active" IPs:
cat access.log | awk '{print $1}' |sort |uniq -c |sort -n |tail
The top 5 IPs have about 200 times as many requests to the server as the "average" user. However, I can't find out if these 5 are just very frequent visitors, or if they are attacking the servers.
Is there a way to restrict the above search to a time interval, e.g. the last two hours or between 10 and 12 today?
Cheers!
UPDATED 23 OCT 2011 - The commands I needed:
Get entries within last X hours [Here two hours]
awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date) print Date FS $4}' access.log
Get most active IPs within the last X hours [Here two hours]
awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date) print $1}' access.log | sort |uniq -c |sort -n | tail
Get entries within relative timespan
awk -vDate=`date -d'now-4 hours' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print Date FS Date2 FS $4}' access.log
Get entries within absolute timespan
awk -vDate=`date -d '13:20' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'13:30' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print $0}' access.log
Get most active IPs within absolute timespan
awk -vDate=`date -d '13:20' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'13:30' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print $1}' access.log | sort |uniq -c |sort -n | tail
Yes, there are multiple ways to do this. Here is how I would go about it. For starters, there is no need to pipe the output of cat; just open the log file with awk.
awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print Date, $0}' access_log
Assuming your log looks like mine (they're configurable), the date is stored in field 4 and is bracketed. What I am doing above is finding everything within the last 2 hours. Note the -d'now-2 hours', translated literally as "now minus 2 hours", which for me produces something like this: [10/Oct/2011:08:55:23
So what I am doing is storing the formatted value of two hours ago and comparing against field four. The conditional expression should be straightforward. I am then printing the Date, followed by the output field separator (OFS, a space in this case), followed by the whole line $0. You could use your previous expression and just print $1 (the IP addresses):
awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print $1}' access_log | sort |uniq -c |sort -n | tail
If you want to use a range, specify two date variables and construct your expression appropriately.
So if you wanted to find something between 2 and 4 hours ago, your expression might look something like this:
awk -vDate=`date -d'now-4 hours' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date && $4 < Date2 {print Date, Date2, $4}' access_log
Here is a question I answered regarding dates in bash that you might find helpful:
Print date for the monday of the current week (in bash)
Introduction
The accepted answer from matchew is wrong, as Antoine's comment points out: awk will do alphanumeric comparisons. So if your logfile lists events across the end of one month and the beginning of the next:
[27/Feb/2023:00:00:00
[28/Feb/2023:00:00:00
[01/Mar/2023:00:00:00
awk will consider:
[01/Mar/2023:00:00:00 < [27/Feb/2023:00:00:00 < [28/Feb/2023:00:00:00
Which is wrong! You have to compare actual dates, not raw strings!
For this, you could use date libraries, depending on the language you use.
I will present two different ways here: one using perl with the Date::Parse library, and another (quicker) using bash with GNU date.
As this is a common perl task, and because it is not exactly the same as extracting the last 10 minutes from a logfile (where the range runs up to the end of the logfile), and because I've needed it myself, I (quickly) wrote this:
#!/usr/bin/perl -ws
# This script parses logfiles for a specific period of time
sub usage {
printf "Usage: %s -s=<start time> [-e=<end time>] <logfile>\n";
die $_[0] if $_[0];
exit 0;
}
use Date::Parse;
usage "No start time submited" unless $s;
my $startim=str2time($s) or die;
my $endtim=str2time($e) if $e;
$endtim=time() unless $e;
usage "Logfile not submited" unless $ARGV[0];
open my $in, "<" . $ARGV[0] or usage "Can't open '$ARGV[0]' for reading";
$_=<$in>;
exit unless $_; # empty file
# Determining regular expression, depending on log format
my $logre=qr{^(\S{3}\s+\d{1,2}\s+(\d{2}:){2}\d+)};
$logre=qr{^[^\[]*\[(\d+/\S+/(\d+:){3}\d+\s\+\d+)\]} unless /$logre/;
while (<$in>) {
/$logre/ && do {
my $ltim=str2time($1);
print if $endtim >= $ltim && $ltim >= $startim;
};
};
This could be used like:
./timelapsinlog.pl -s=09:18 -e=09:24 /path/to/logfile
for printing logs between 09:18 and 09:24.
./timelapsinlog.pl -s='2017/01/23 09:18:12' /path/to/logfile
for printing from January 23rd, 09:18:12 up to now.
In order to reduce the perl code, I've used the -s switch to permit auto-assignment of variables from the command line: -s=09:18 will populate a variable $s which will contain 09:18. Take care not to miss the equals sign = and not to add spaces!
Note: This holds two different kinds of regex for two different log standards. If you require different date/time format parsing, either post your own regex or post a sample of the formatted date from your logfile:
^(\S{3}\s+\d{1,2}\s+(\d{2}:){2}\d+) # ^Jan 1 01:23:45
^[^\[]*\[(\d+/\S+/(\d+:){3}\d+\s\+\d+)\] # ^... [01/Jan/2017:01:23:45 +0000]
Quicker** bash version:
In answer to Gilles Quénot's comment, I've tried to create a bash version.
As this version seems quicker than the perl version, I post it here:
#!/bin/bash
prog=${0##*/}
usage() {
cat <<EOUsage
Usage: $prog <start date> <end date> <logfile>
Each argument is required. End date could be "now".
EOUsage
}
die() {
echo >&2 "ERROR $prog: $*"
exit 1
}
(($#==3))|| { usage; die 'Wrong number of arguments.';}
[[ -f $3 ]] || die "File not found."
# Conversion of argument to EPOCHSECONDS by asking `date` for the two conversions
{
read -r start
read -r end
} < <(
date -f - +%s <<<"$1"$'\n'"$2"
)
# Determining which kind of log format we have, between "apache logs" and "system logs":
read -r oline <"$3" # read one log line
if [[ $oline =~ ^[^\ ]{3}\ +[0-9]{1,2}\ +([0-9]{2}:){2}[0-9]+ ]]; then
# Look like syslog format
sedcmd='s/^\([^ ]\{3\} \+[0-9]\{1,2\} \+\([0-9]\{2\}:\)\{2\}[0-9]\+\).*/\1/'
elif [[ $oline =~ ^[^\[]+\[[0-9]+/[^\ ]+/([0-9]+:){3}[0-9]+\ \+[0-9]+\] ]]; then
# Look like apache logs
sedcmd='s/^[0-9.]\+ \+[^ ]\+ \+[^ ]\+ \[\([^]]\+\)\].*$/\1/;s/:/ /;y|/|-|'
else
die 'Log format not recognized'
fi
# Print lines beginning with `1<tabulation>`
sed -ne s/^1\\o11//p <(
# paste `bc` tests with log file
paste <(
# bc will do comparison against EPOCHSECONDS returned by date and $start - $end
bc < <(
# Create a bc function for testing against $start - $end.
cat <<EOInitBc
define void f(x) {
if ((x>$start) && (x<$end)) { 1;return ;};
0;}
EOInitBc
# Run sed to extract date strings from logfile, then
# run date to convert string to EPOCHSECONDS
sed "$sedcmd" <"$3" |
date -f - +'f(%s)'
)
) "$3"
)
Explanation
The script runs sed to extract the date strings from the logfile.
It passes the date strings to date -f - +%s to convert all of them to EPOCH (Unix timestamp) in one run.
It runs bc for the tests: print 1 if start < date < end, else print 0.
It runs paste to merge the bc output with the logfile.
Finally it runs sed to find lines that start with 1<tab>, strip that prefix, and print them.
So this script forks 5 subprocesses to do dedicated jobs with specialised tools, but won't run a shell loop over each line of the logfile!
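For illustration, the intermediate output of paste looks roughly like this (hypothetical Apache-style lines, assuming a range that matches only the second one):
0	172.16.0.4 - - [17/Feb/2023:17:25:41 +0200] "GET / HTTP/1.1" 200 123 ...
1	172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123 ...
The final sed keeps only the lines beginning with 1<tab> and removes that prefix, leaving just the matching log lines.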
** Note:
Of course, this is quicker on my host because I run it on a multicore processor; each task runs in parallel!
Conclusion:
This is not a program! It is an aggregation script!
If you consider bash not as a programming language but as a super-language or a tool aggregator, you can harness the full power of all your tools!
If someone encounters the awk: invalid -v option error, here's a script to get the most active IPs in a predefined time range:
cat <FILE_NAME> | awk '$4 >= "[04/Jul/2017:07:00:00" && $4 < "[04/Jul/2017:08:00:00"' | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20
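The same filter can also run in a single awk process without the cat (a small rearrangement of the line above, keeping the same hard-coded timestamps and the <FILE_NAME> placeholder):
awk '$4 >= "[04/Jul/2017:07:00:00" && $4 < "[04/Jul/2017:08:00:00" {print $1}' <FILE_NAME> | sort | uniq -c | sort -nr | head -20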
A very quick and readable way to do it in Python. This seems to be faster than the bash version. (Computed time is displayed using an internal module which has been stripped from this code.)
./ext_lines.py -v -s 'Feb 12 00:23:00' -e 'Feb 15 00:23:00' -i /var/log/syslog.1
Total time : 445 ms 187 musec
Time per line : 7 musec 58 ns
Number of lines : 63,072
Number of extracted lines : 29,265
I can't compare this code with the daemon.log file used by others... But here is my config:
Operating System: Kubuntu 22.10
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.6
Kernel Version: 6.2.0-060200rc8-generic (64-bit)
Graphics Platform: X11
Processors: 16 × AMD Ryzen 7 5700U with Radeon Graphics
Memory: 14.9 GiB of RAM
The essential code could fit in just one line (dts = ...), but to make it more readable it's split across three. It's not only rather fast, it's also very compact :-)
from argparse import ArgumentParser, FileType
from datetime import datetime
from os.path import basename
from sys import argv, float_info
from time import mktime, localtime, strptime
__version__ = '1.0.0' # Workaround (internal use)
now = datetime.now
progname = basename(argv[0])
parser = ArgumentParser(description = 'Is Python strptime faster than sed and Perl ?',
prog = progname)
parser.add_argument('--version',
dest = 'version',
action = 'version',
version = '{} : {}'.format(progname,
str(__version__)))
parser.add_argument('-i',
'--input',
dest = 'infile',
default = '/var/log/syslog.1',
type = FileType('r',
encoding = 'UTF-8'),
help = 'Input file (stdin not yet supported)')
parser.add_argument('-f',
'--format',
dest = 'fmt',
default = '%b %d %H:%M:%S',
help = 'Date input format')
parser.add_argument('-s',
'--start',
dest = 'start',
default = None,
help = 'Starting date : >=')
parser.add_argument('-e',
'--end',
dest = 'end',
default = None,
help = 'Ending date : <=')
parser.add_argument('-v',
dest = 'verbose',
action = 'store_true',
default = False,
help = 'Verbose mode')
args = parser.parse_args()
verbose = args.verbose
start = args.start
end = args.end
infile = args.infile
fmt = args.fmt
############### Start code ################
lines = tuple(infile)
# Use default values if start or end are undefined
if not start :
start = lines[0][:14]
if not end :
end = lines[-1][:14]
# Convert start and end to timestamp
start = mktime(strptime(start,
fmt))
end = mktime(strptime(end,
fmt))
# Extract matching lines
t1 = now()
dts = [(x, line) for x, line in [(mktime(strptime(line[:14],
fmt)),
line) for line in lines] if start <= x <= end]
t2 = now()
# Print stats
if verbose :
total_time = 'Total time'
time_p_line = 'Time per line'
n_lines = 'Number of lines'
n_ext_lines = 'Number of extracted lines'
print(f'{total_time:<25} : {(t2 - t1).total_seconds() * 1000} ms')
print(f'{time_p_line:<25} : {(t2 - t1).total_seconds() / len(lines) * 1000} ms')
print(f'{n_lines:<25} : {len(lines):,}')
print(f'{n_ext_lines:<25} : {len(dts):,}')
# Print extracted lines
print(''.join([x[1] for x in dts]))
To parse the access.log precisely within a specified range, in this case only the last 10 minutes (based on EPOCH, aka the number of seconds since 1970/01/01):
Input file:
172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
172.16.0.4 - - [17/Feb/2023:17:25:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
172.16.0.5 - - [17/Feb/2023:17:15:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
Perl one-liner:
With the reliable Time::Piece time parser, using strptime() to parse the date and strftime() to format the new one. This module is installed in core (by default), which is not the case for the less reliable Date::Parse.
$ perl -MTime::Piece -sne '
BEGIN{
my $t = localtime;
our $now = $t->epoch;
our $monthsRe = join "|", $t->mon_list;
}
m!\[(\d{2}/(?:$monthsRe)/\d{4}:\d{2}:\d{2}:\d{2})\s!;
my $d = Time::Piece->strptime("$1", "%d/%b/%Y:%H:%M:%S");
my $old = $d->strftime("%s");
my $diff = (($now - $old) + $gap);
if ($diff > $min and $diff < $max) {print}
' -- -gap=$({ echo -n "0"; date "+%:::z*3600"; } | bc) \
-min=0 \
-max=600 access.log
Explanations of arguments: -gap, -min, -max switches
-gap is the offset from UTC: $((7*3600)), aka 25200 seconds, i.e. +7 hours in my current case 🇹🇭 (Thai TZ)¹. It can be computed as { echo -n "0"; date "+%:::z*3600"; } | bc if you have GNU date; if not, use another way to set the gap.
-min is the minimum age in seconds from which we print matching log line(s).
-max is the maximum age in seconds up to which we print matching log line(s).
¹ To know the gap from UTC, take a look at:
$ LANG=C date
Fri Feb 17 15:50:13 +07 2023
The +07 is the gap.
This way, you can filter on an exact range of seconds with this snippet.
Sample output
172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
Is there a way to get the size of a remote file like
http://api.twitter.com/1/statuses/public_timeline.json
in a shell script?
You can download the file and get its size. But we can do better.
Use curl to get only the response header using the -I option.
In the response header, look for Content-Length:, which will be followed by the size of the file in bytes.
$ URL="http://api.twitter.com/1/statuses/public_timeline.json"
$ curl -sI $URL | grep -i Content-Length
Content-Length: 134
To get the size use a filter to extract the numeric part from the output above:
$ curl -sI $URL | grep -i Content-Length | awk '{print $2}'
134
Two caveats to the other answers:
Some servers don't return the correct Content-Length for a HEAD request, so you might need to do the full download.
You'll likely get an unrealistically large response (compared to a modern browser) unless you specify gzip/deflate headers.
Also, you can do this without grep/awk or piping:
curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent --write-out 'size_download=%{size_download}\n' --output /dev/null
And the same request with compression:
curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent -H 'Accept-Encoding: gzip,deflate' --write-out 'size_download=%{size_download}\n' --output /dev/null
Similar to codaddict's answer, but without the call to grep:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk '/Content-Length/ { print $2 }'
The preceding answers won't work when there are redirections. For example, if you want the size of the Debian ISO DVD, you must use the --location option; otherwise, the reported size may be that of the 302 Moved Temporarily response body, not that of the real file.
Suppose you have the following url:
$ url=http://cdimage.debian.org/debian-cd/8.1.0/amd64/iso-dvd/debian-8.1.0-amd64-DVD-1.iso
With curl, you could obtain:
$ curl --head --location ${url}
HTTP/1.0 302 Moved Temporarily
...
Content-Type: text/html; charset=iso-8859-1
...
HTTP/1.0 200 OK
...
Content-Length: 3994091520
...
Content-Type: application/x-iso9660-image
...
That's why I prefer using HEAD, which is an alias for the lwp-request command from the libwww-perl package (on Debian). Another advantage it has is that it strips the extra \r characters, which eases subsequent string processing.
So to retrieve the size of the Debian ISO DVD, one could do, for example:
$ size=$(HEAD ${url})
$ size=${size##*Content-Length: }
$ size=${size%%[[:space:]]*}
Please note that:
this method will require launching only one process
it will work only with bash, because of the special expansion syntax used
For other shells, you may have to resort to sed, awk, grep et al., as in the sketch below.
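For example, a rough sketch for plain POSIX sh using sed (same url variable and HEAD command as above; the tail keeps the last header in case several are printed):
size=$(HEAD ${url} | sed -n 's/^Content-Length: *//p' | tail -n 1)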
I think the easiest way to do this would be to:
use cURL to run in silent mode -s,
pull only the headers -I (so as to avoid downloading the whole file)
then do a case-insensitive grep -i,
and return the second field using awk $2.
The output is returned in bytes.
Examples:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | grep -i content-length | awk '{print $2}'
//output: 52
or
curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length | awk '{print $2}'
//output: 86709
or
curl -sI http://download.thinkbroadband.com/1GB.zip | grep -i content-length | awk '{print $2}'
//output: 1073741824
Show as Kilobytes/Megabytes
If you would like to show the size in Kilobytes then change the awk to:
awk '{print $2/1024}'
or Megabytes
awk '{print $2/1024/1024}'
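If GNU coreutils is available, numfmt can also turn the byte count into a human-readable size directly (a small variant of the above; the tr strips the trailing carriage return from the header value, and on BSD/macOS the tool may be installed as gnumfmt, as in the function further down):
curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length | awk '{print $2}' | tr -d '\r' | numfmt --to=iec-i --suffix=B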
The accepted solution was not working for me; this did:
curl -s https://code.jquery.com/jquery-3.1.1.min.js | wc -c
I have a shell function, based on codaddict's answer, which gives a remote file's size in a human-readable format thusly:
remote_file_size () {
printf "%q" "$*" |
xargs curl -sI |
grep Content-Length |
awk '{print $2}' |
tr -d '\040\011\012\015' |
gnumfmt --to=iec-i --suffix=B # the `g' prefix on `numfmt' is only for systems
# ^ # that lack the GNU coreutils by default, i.e.,
# | # non-Linux systems
# |
# | # in other words, if you're on Linux, remove this
# | # letter `g'; if you're on BSD or Mac, install the GNU coreutils
} # | |
# +----------------------------------------+
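A hypothetical run (output is illustrative, based on the 86709-byte jQuery example above, and assumes numfmt/gnumfmt is installed):
$ remote_file_size https://code.jquery.com/jquery-3.1.1.min.js
85KiB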
This will show you detailed info about the ongoing download; you just need to specify a URL, as in the example below.
$ curl -O -w 'We downloaded %{size_download} bytes\n' https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz
output
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7328k 100 7328k 0 0 244k 0 0:00:29 0:00:29 --:--:-- 365k
We downloaded 7504706 bytes
For automated purposes, you'll just need to add the command to your script file.
Combining all the above, this works for me:
URL="http://cdimage.debian.org/debian-cd/current/i386/iso-dvd/debian-9.5.0-i386-DVD-1.iso"
curl --head --silent --location "$URL" | grep -i "content-length:" | tr -d " \t" | cut -d ':' -f 2
This will return just the content length in bytes:
3767500800
You can kind of do it like this, including auto-following 301/302 redirections:
curl -ILs 'https://twitter.com/i/csp_report?a=ORTGK%3D%3D%3D&ro=fals' |
mawk 'NF*=!_<NF' \
OFS= FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '
1 41
It's very brute force but gets the job done; note that it's whatever raw value is being reported by the server, so you may have to make adjustments as you see fit.
You may also have to add the -g flag so it can automatically handle the switchover from vanilla http to https:
curl -gILs 'http://apple.com' |
mawk 'NF *= !_<NF' OFS= \
FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '
1 304
2 106049
(I'm guessing this might be the main site, and the first item was the redirection page?)
The question is old and has been sufficiently answered, but let me expand upon the existing answers. If you want to automate this task (checking the file sizes of multiple files), then here's a one-liner.
First, write the URLs of the files into a file:
cat url_of_files.txt
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg002_nis_x1dints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg003_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04102_00001-seg001_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_02101_00002-seg001_nis_cal.fits
...
then from the command line (from the same directory as your url_of_files.txt):
eval $(sed -rn '/^https/s/(https.*$)/curl -sI \1/p' url_of_files.txt) | awk '/[Cc]ontent-[Ll]ength/{kb=$2/1024;mb=kb/1024;gb=mb/1024;print ( $2>1024 ? ( kb>1024 ? ( mb>1024 ? gb " G" : mb " M") : kb " K" ) : $2 " B" ) }'
This is for checking file sizes ranging from bytes to GBs. I use this line to check the FITS data files being made available by the JWST team.
It checks the file size and, depending on its size, roughly converts it to an appropriate number with B, K, M, G suffixes denoting the size in bytes, kilobytes, megabytes, and gigabytes.
result:
...
177.188 K
177.188 K
236.429 M
177.188 K
5.95184 M
1.83608 G
1.20326 G
130.059 M
1.20326 G
...
My solution uses awk END to make sure we grab only the last Content-Length:
function curl2contentlength() {
curl -sI -L -H 'Accept-Encoding: gzip,deflate' "$1" | grep -i Content-Length | awk 'END{print $2}'
}
curl2contentlength "$1"
./curl2contentlength.sh "https://chrt.fm/track/B63133/stitcher.simplecastaudio.com/ec74d48c-cbf1-4764-923e-7d584dce50fa/episodes/a85954a3-24c3-48ed-bced-ef0607b7149a/audio/128/default.mp3?aid=rss_feed&awCollectionId=ec74d48c-cbf1-4764-923e-7d584dce50fa&awEpisodeId=a85954a3-24c3-48ed-bced-ef0607b7149a&feed=qm_9xx0g"
10806508
In fact, without the END it would have been:
0
0
10806508
I use it like this ([Cc]ontent-[Ll]ength:), because I encountered a server that gives multiple Content-Length headers in the response:
curl -sI "http://someserver.com/hls/125454.ts" | grep [Cc]ontent-[Ll]ength: | awk '{ print $2 }'
Accept-Ranges: bytes
Access-Control-Expose-Headers: Date, Server, Content-Type, Content-Length
Server: WowzaStreamingEngine/4.5.0
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS, GET, POST, HEAD
Access-Control-Allow-Headers: Content-Type, User-Agent, If-Modified-Since, Cache-Control, Range
Date: Tue, 10 Jan 2017 01:56:08 GMT
Content-Type: video/MP2T
Content-Length: 666460
A different solution:
ssh userName@IP ls -s PATH | grep FILENAME | awk '{print $1}'
This gives you the size in KB.
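If the remote host has GNU stat, the exact size in bytes is also available (a sketch using the same userName/IP/PATH placeholders as above):
ssh userName@IP stat -c %s PATH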