Manually calculate requests per minute from a HTTP access log

I have a log file:
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /pics/a2hlogo.jpg HTTP/1.0" 200 4282 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" 200 36 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
I want to calculate the requests per minute and output something like this.
2012/01/01 00:00 2
2012/01/01 00:01 33
I was thinking of looping over the whole file and extracting the timestamps into an array, using a regex like this:
timestamps = []
File.open("log.txt") do |f|
  f.each_line do |line|
    # capture the full timestamp, e.g. [26/Apr/2000:00:23:48 -0400]
    timestamps << line[/\[\d{2}\/[a-zA-Z]{3}\/\d{4}:\d{2}:\d{2}:\d{2}\s[+-]\d{4}\]/]
  end
end
Then I would use that array to somehow calculate the requests per minute. Is there a better way to do this in Ruby, without using CLI tools?

It's not the prettiest, but this is what you're going to want to do.
require 'time' # also loads 'date', which provides DateTime
log = File.read("log.txt") # read the whole log into one string
TIMESTAMP_REGEX = %r{\[(.*?)\]} # extract everything between the []
# build an array of DateTime objects; the first ":" separates the date from the time, so swap it for a space
datetimes = log.scan(TIMESTAMP_REGEX).flatten.map { |log_time| DateTime.parse(log_time.sub(":", ' ')) }
results = Hash.new(0)
datetimes.each do |datetime|
  time = datetime.strftime('%Y/%m/%d %H:%M') # truncate to the minute
  results[time] += 1
end
results.each do |k,v|
  puts "#{k}: #{v} requests"
end
There are more optimal ways to do this, including a moderately lengthy one-liner, but if you're looking for straightforwardness this is the way to go.
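For large logs, a streaming variant may be easier on memory than reading the whole file into a string. The sketch below is one possible approach, not the method from the answer above: it walks the log line by line and parses each timestamp with Time.strptime. The helper name requests_per_minute and the inline sample lines are assumptions made for illustration.

```ruby
require 'time'

# Hypothetical helper: tallies requests per minute from any enumerable of log lines.
def requests_per_minute(lines)
  counts = Hash.new(0)
  lines.each do |line|
    stamp = line[/\[([^\]]+)\]/, 1] # text between the first [ and ]
    next unless stamp
    begin
      time = Time.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')
    rescue ArgumentError
      next # skip lines whose timestamp does not parse
    end
    counts[time.strftime('%Y/%m/%d %H:%M')] += 1
  end
  counts.sort.to_h # chronological, because the keys sort lexicographically
end

# Illustrative lines in the question's format; with a real file, pass File.foreach("log.txt").
sample = [
  '123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248',
  '123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130',
  '123.123.123.123 - - [26/Apr/2000:00:24:05 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031'
]
per_minute = requests_per_minute(sample)
per_minute.each { |minute, count| puts "#{minute} #{count}" }
```

Sorting the keys at the end matters because access logs are not always written in strict chronological order, as the second sample line shows.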

Related

Logstash Grok filter

Can anybody help me create a grok filter for the log pattern below?
The current grok filter works only if the first field is removed from the log line:
"message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} [%{HTTPDATE:timestamp}] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}'
The combined Apache pattern didn't work.
Log example:
16387 172.16.8.104 10.100.6.1 [11/Mar/2016:04:10:30 +0100] "GET /test/theme/test_Test_displaytag.css;jsessionid=1fjeyhu11wnj41wkuouxhos9nr HTTP/1.0" 200 5737 0 38933 + 1754 6073 Test.com "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"

You can use https://grokdebug.herokuapp.com/ to build the pattern up step by step, starting like this:
%{NUMBER:number:int} %{IP:ip1} %{IP:ip2} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}"
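To see which parts of the log line that partial pattern covers, it can help to try a rough equivalent outside Logstash. The sketch below is a plain Ruby regex with named captures, not grok itself: the IP and timestamp sub-patterns are simplified stand-ins for grok's IP and HTTPDATE, and the capture names are chosen for illustration.

```ruby
LINE = '16387 172.16.8.104 10.100.6.1 [11/Mar/2016:04:10:30 +0100] ' \
       '"GET /test/theme/test_Test_displaytag.css;jsessionid=1fjeyhu11wnj41wkuouxhos9nr HTTP/1.0" ' \
       '200 5737 0 38933 + 1754 6073 Test.com ' \
       '"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"'

IPV4 = /\d{1,3}(?:\.\d{1,3}){3}/ # simplified stand-in for grok's IP pattern

# Leading number, two addresses, bracketed timestamp, then the quoted request.
PATTERN = /\A(?<number>\d+) (?<ip1>#{IPV4}) (?<ip2>#{IPV4}) \[(?<timestamp>[^\]]+)\] "(?<method>\w+) (?<request>.*?) HTTP\/(?<http_version>[\d.]+)"/

match = PATTERN.match(LINE)
fields = match ? match.named_captures : {}
fields.each { |name, value| puts "#{name}: #{value}" }
```

Everything after the closing quote of the request (status, byte counts, host, user agent) is left unmatched here, just as in the partial grok pattern.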

Read in and parse textfile

I need to parse a text file and display the top 10 requests, sorted by number of bytes. I have a file log.txt:
164.94.76.83.cust.bluewin.ch - - [17/Oct/2006:07:56:45 -0700] "GET /example/serif.css HTTP/1.1" 200 4824 "http://www.example.org/example/When/200x/2003/07/25/NotGaming" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
164.94.76.83.cust.bluewin.ch - - [03/Oct/2006:07:56:45 -0700] "GET /example/example.js HTTP/1.1" 200 6685 "http://www.example.org/example/When/200x/2003/07/25/NotGaming" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
164.94.76.83.cust.bluewin.ch - - [06/Oct/2006:07:56:46 -0700] "GET /example/When/200x/2003/07/25/Nuke.png HTTP/1.1" 200 19757 "http://www.example.org/example/When/200x/2003/07/25/NotGaming" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
164.94.76.83.cust.bluewin.ch - - [15/Oct/2006:07:56:46 -0700] "GET /example/When/200x/2003/07/25/diablo.png HTTP/1.1" 200 12597 "http://www.example.org/example/When/200x/2003/07/25/NotGaming" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
164.94.76.83.cust.bluewin.ch - - [19/Oct/2006:07:56:46 -0700] "GET /example/When/200x/2003/07/25/-big/Nuke.jpg HTTP/1.1" 403 322 "
The output must be sorted in descending order and show the link, the byte count, and its share in percent:
1. http://www.example.org/example/When/200x/2006/09/25/ - 3100 - 74%
2. http://www.example.org/example/ - 1000 - 24%
3. http://www.example.org/example/genx/docs/Guide.html - 91 - 2%
That is, I need to pick out the requests with the largest number of bytes, sort them, and show each one's percentage of the total.

Since you insist on a shell-only approach, the closest solution to what you ask seems to be something like this:
sort -t ' ' -k 10 -r -n log.txt | head -n 10 | awk '{print $1 $7, $10}'
You can probably do (much) better by either setting a more useful LogFormat when logging the requests, or by allowing a Perl or Python parser when processing the logs.
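As an illustration of what a small parser could look like, here is a sketch in Ruby (the answer suggests Perl or Python; Ruby is used here only as an example). It assumes the goal is to sum bytes per request path and report each path's share of all bytes, which is one reading of the question. The helper name top_by_bytes, the regex, and the sample lines are assumptions, not part of the original thread.

```ruby
# host ident user [time] "METHOD path PROTOCOL" status bytes ...
LOG_LINE = /\A\S+ \S+ \S+ \[[^\]]+\] "\S+ (?<path>\S+) [^"]*" (?<status>\d{3}) (?<bytes>\d+|-)/

# Hypothetical helper: returns [path, bytes, percent] rows, largest first.
def top_by_bytes(lines, limit = 10)
  totals = Hash.new(0)
  lines.each do |line|
    m = LOG_LINE.match(line)
    next unless m # ignore truncated or malformed lines
    totals[m[:path]] += m[:bytes].to_i # a "-" byte count becomes 0
  end
  grand_total = totals.values.sum
  return [] if grand_total.zero?
  totals.sort_by { |_, bytes| -bytes }.first(limit).map do |path, bytes|
    [path, bytes, (100.0 * bytes / grand_total).round]
  end
end

# Made-up lines for the demonstration; with a real file, pass File.foreach("log.txt").
sample = [
  'host - - [17/Oct/2006:07:56:45 -0700] "GET /example/When/200x/2006/09/25/ HTTP/1.1" 200 3100 "-" "-"',
  'host - - [17/Oct/2006:07:56:46 -0700] "GET /example/ HTTP/1.1" 200 600 "-" "-"',
  'host - - [17/Oct/2006:07:56:47 -0700] "GET /example/ HTTP/1.1" 200 400 "-" "-"',
  'host - - [17/Oct/2006:07:56:48 -0700] "GET /example/genx/docs/Guide.html HTTP/1.1" 200 91 "-" "-"'
]
rows = top_by_bytes(sample)
rows.each_with_index do |(path, bytes, percent), i|
  puts "#{i + 1}. #{path} - #{bytes} - #{percent}%"
end
```

Percentages are taken against the total of all parsed lines, not only the rows that make the top 10.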

Parsing access log file [duplicate]

This question already has answers here:
Save modifications in place with awk
(7 answers)
Closed 8 years ago.
My task is to extract the following information from an access log file (in common log format) using the Unix command line:
Count # of requests with status_code == 200
Here is a piece of that log file (which contains data for one day):
127.0.0.1 "1.1.1.1" - [02/Dec/2014:14:30:00 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 7
127.0.0.1 "1.1.1.1" - [02/Dec/2014:14:32:30 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:20:12 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:22:20 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:30:10 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:35:15 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 7
127.0.0.1 "1.1.1.1" - [02/Dec/2014:16:25:11 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 7
127.0.0.1 "1.1.1.1" - [02/Dec/2014:16:27:10 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:16:33:12 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 10
I use this:
$ awk -F'[: ]' '$12 == 200 { hour[$5]++ } END { for (i in hour) print i, hour[i] }' acces_log.log
And receive this:
14 2
15 4
16 3
But there is one catch: all results should be stored in a file. How can I do this from the command line?
Regards

Unix/Linux shells understand the symbols <, <<, >, and >> for redirection, and the Windows command prompt supports <, >, and >> as well.
To redirect output to a file, use
awk '{....}' > outputFile
#------------^ -- redirection
This redirection always starts outputFile afresh: the file is created if it does not exist and overwritten if it does.
To append (extra) data to a file, use
awk '{ .... }' >> outputFile
#--------------^^ -- append
I hope this helps.
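The same overwrite-versus-append distinction shows up when a script writes the file itself. As a rough illustration (in Ruby, not awk), the sketch below counts status-200 requests per hour and writes them with File.open in 'w' mode, which truncates like >, then adds a line in 'a' mode, which appends like >>. The helper name, the output file name, and the temporary directory are assumptions for the demonstration.

```ruby
require 'tmpdir'

# Hypothetical helper: counts requests with status 200, grouped by hour.
def count_ok_by_hour(lines)
  counts = Hash.new(0)
  lines.each do |line|
    hour = line[/\[\d{2}\/\w{3}\/\d{4}:(\d{2}):/, 1] # hour inside [dd/Mon/yyyy:HH:...
    status = line[/" (\d{3}) /, 1]                   # three digits after the quoted request
    counts[hour] += 1 if hour && status == '200'
  end
  counts.sort.to_h
end

sample = [
  '127.0.0.1 "1.1.1.1" - [02/Dec/2014:14:30:00 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 7',
  '127.0.0.1 "1.1.1.1" - [02/Dec/2014:14:32:30 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8',
  '127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:20:12 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8'
]
counts = count_ok_by_hour(sample)

# A temporary directory keeps the demonstration from leaving files behind.
written = Dir.mktmpdir do |dir|
  path = File.join(dir, 'result.txt') # hypothetical output file
  File.open(path, 'w') { |f| counts.each { |hour, n| f.puts "#{hour} #{n}" } } # truncates, like >
  File.open(path, 'a') { |f| f.puts 'done' }                                   # appends, like >>
  File.read(path)
end
puts written
```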

Search for particular fields and print them in new file in unix

I am looking for a one-line command that searches 'myfile.txt' for a few fields and, where the pattern matches, cuts those fields out and prints them to a new file.
myfile.txt:
162.23.55.222 - - [07/Dec/2013:00:40:35 +0000] 0.033 POST /view/SBEventListComponentController?componentUid=comp_00008OM8 HTTP/1.1 200 77282 Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11 http://sportsbeta.ladbrokes.com/homepage 6476CC940C83EDF031FF2564EE108993.ecomprodsw012
162.16.87.1973 - - [07/Dec/2013:00:40:34 +0000] 0.131 POST /view/SBEventListComponentController?componentUid=comp_000080KW HTTP/1.1 200 82707 Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11 http://sportsbeta.ladbrokes.com/homepage 6476CC940C83EDF031FF2564EE108993.ecomprodsw012
162.23.22.542, 10.32.30.1 - - [07/Dec/2013:00:40:35 +0000] 0.224 GET /view/content/homepage HTTP/1.1 200 66233 Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11 - 6476CC940C83EDF031FF2564EE108993.ecomprodsw012
My output.txt must contain:
162.23.55.222 07/Dec/2013:00:40:35 http://sportsbeta.ladbrokes.com/homepage 6476CC940C83EDF031FF2564EE108993.ecomprodsw012
162.16.87.1973 07/Dec/2013:00:40:34 http://sportsbeta.ladbrokes.com/homepage
162.23.22.542, 10.32.30.1 07/Dec/2013:00:40:35
How can I search for more patterns and redirect the matches to another file?
I don't understand the downvote. I tried the cut, grep, and sed commands but didn't get the expected results.
Thanks in advance.

Here's how to more or less do what I think you are asking. The first row of your output example includes the field at the end of the line but the second doesn't, and I can't immediately see why.
perl -l -n -a -e '($u)=/(http:\S+)/; if ($F[0] =~ /,$/) { print "@F[0..1] ", substr($F[4],1) } else { print $F[0]," ", substr($F[3],1)," ", $u }' infile.txt > outfile.txt
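For comparison, the same extraction can be sketched in Ruby. This is an illustrative alternative, not a translation the answer's author provided: it treats everything before " - - [" as the client field (so the comma-separated address pair stays together), takes the timestamp without its UTC offset, and appends the first http(s) URL if the line has one. The helper name extract_fields is hypothetical.

```ruby
# Hypothetical helper: returns "client timestamp [url]" or nil for lines that do not match.
def extract_fields(line)
  m = line.match(/\A(?<client>.+?) - - \[(?<time>\S+) [^\]]*\]/)
  return nil unless m
  url = line[%r{https?://\S+}] # nil when the line carries no URL
  [m[:client], m[:time], url].compact.join(' ')
end

lines = [
  '162.23.55.222 - - [07/Dec/2013:00:40:35 +0000] 0.033 POST /view/SBEventListComponentController?componentUid=comp_00008OM8 HTTP/1.1 200 77282 Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11 http://sportsbeta.ladbrokes.com/homepage 6476CC940C83EDF031FF2564EE108993.ecomprodsw012',
  '162.23.22.542, 10.32.30.1 - - [07/Dec/2013:00:40:35 +0000] 0.224 GET /view/content/homepage HTTP/1.1 200 66233 Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11 - 6476CC940C83EDF031FF2564EE108993.ecomprodsw012'
]
extracted = lines.map { |line| extract_fields(line) }.compact
puts extracted
```

To write the result to a file instead of the terminal, File.write("output.txt", extracted.join("\n") + "\n") would do.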

Timing an http server response using bash simple tools

It is what the title says.
I need to measure a server's load time and, if this value is higher than a threshold, restart the web server automatically.
How can I time an HTTP server response using simple GNU bash tools?

You could script against the output of ab (Apache Benchmark). Alternatively, make sure %D is included in your LogFormat; then, rather than scripting a test, you can script a tail of the log files and restart the server if the time taken is above the threshold.
Here is a script:
#!/bin/bash
# alert threshold - number of times the limit must be exceeded before it counts as an issue
ALERT_THRESHOLD=3
# alert limit in seconds:
# any page that takes longer than this to load counts against the threshold
ALERT_LIMIT=60
ALERT_LIMIT_MILI=$((ALERT_LIMIT * 1000))
TAIL_LIMIT=10
LOG_FILE="/var/log/apache2/access.log"
RESULT=$(tail -n "$TAIL_LIMIT" "$LOG_FILE" | awk -v alimit="$ALERT_LIMIT_MILI" -v athreshold="$ALERT_THRESHOLD" 'BEGIN{QUERY=""; i=0; SENDALERT=0} {
  if ($1 > alimit) {
    i++; QUERY=QUERY" TIME_TAKEN:"($1/1000)"seconds,"$1"ms|DATE:"$5"|STATUS:"$10"|URL:"$12"\n";
    if (i >= athreshold) {
      SENDALERT++;
    }
  }
} END { print "QUERY:"QUERY"\nSENDALERT:"SENDALERT; }')
# $RESULT is left unquoted on purpose so that its lines collapse into one before splitting
SENDALERT=$(echo -e $RESULT | awk -F"SENDALERT:" '{print $2}')
echo "ALERT LEVEL: $SENDALERT"
if [[ $SENDALERT -ge 1 ]]; then
  echo "restarting apache"
  content=$(echo -e $RESULT | awk -F"QUERY:" '{print $2}')
  (for lines in $(echo $content); do echo $lines; done;)
  #(for lines in $(echo $content); do echo $lines; done;) | mail -s "Restarting apache $(date)" root@localhost
fi
For my tests the alert limit was set to 0 seconds, so all 10 tailed lines exceed it and you will see alert level 8: once the counter i reaches the threshold of 3, every slow line increments the SENDALERT variable. It reports 8 rather than 10 because the first two lines only count toward the threshold.
Running it:
./script.sh
ALERT LEVEL: 8
restarting apache
TIME_TAKEN:0.108seconds,108ms|DATE:[07/Mar/2013:22:12:51|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.299seconds,299ms|DATE:[07/Mar/2013:22:12:51|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:3.432seconds,3432ms|DATE:[07/Mar/2013:22:12:58|STATUS:200|URL:"-"
TIME_TAKEN:0.217seconds,217ms|DATE:[07/Mar/2013:22:12:58|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.117seconds,117ms|DATE:[07/Mar/2013:22:12:58|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.101seconds,101ms|DATE:[07/Mar/2013:22:12:58|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:3.255seconds,3255ms|DATE:[07/Mar/2013:22:13:03|STATUS:200|URL:"-"
TIME_TAKEN:0.351seconds,351ms|DATE:[07/Mar/2013:22:13:03|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.242seconds,242ms|DATE:[07/Mar/2013:22:13:03|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.112seconds,112ms|DATE:[07/Mar/2013:22:13:03|STATUS:304|URL:"http://localhost/"
SENDALERT:8
---- apache access log:
108 127.0.0.1 - - [07/Mar/2013:22:12:51 +0000] "GET /icons/folder.gif HTTP/1.1" 304 186 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
299 127.0.0.1 - - [07/Mar/2013:22:12:51 +0000] "GET /icons/compressed.gif HTTP/1.1" 304 188 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
3432 127.0.0.1 - - [07/Mar/2013:22:12:58 +0000] "GET / HTTP/1.1" 200 783 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
217 127.0.0.1 - - [07/Mar/2013:22:12:58 +0000] "GET /icons/blank.gif HTTP/1.1" 304 186 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
117 127.0.0.1 - - [07/Mar/2013:22:12:58 +0000] "GET /icons/folder.gif HTTP/1.1" 304 186 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
101 127.0.0.1 - - [07/Mar/2013:22:12:58 +0000] "GET /icons/compressed.gif HTTP/1.1" 304 187 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
3255 127.0.0.1 - - [07/Mar/2013:22:13:03 +0000] "GET / HTTP/1.1" 200 782 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
351 127.0.0.1 - - [07/Mar/2013:22:13:03 +0000] "GET /icons/folder.gif HTTP/1.1" 304 187 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
242 127.0.0.1 - - [07/Mar/2013:22:13:03 +0000] "GET /icons/compressed.gif HTTP/1.1" 304 188 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
112 127.0.0.1 - - [07/Mar/2013:22:13:03 +0000] "GET /icons/blank.gif HTTP/1.1" 304 186 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
Here I have put %D as the first column of the log format, and the awk statement in the script compares $1 against the limit. The other field numbers ($5, $10, $12) reflect where things appear in my log. Note that Apache documents %D as the time taken in microseconds, so adjust the unit conversion in the script to match.
You could then put it in a scripts folder, remove the verbose output or send it to /dev/null, and run it from cron every 10 minutes or so.
Enjoy.
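The counting rule in the awk part of the script can be easier to follow in isolation. The sketch below restates it in Ruby under the same assumption as the script, namely that the time taken is the first column of each log line. The helper name alert_level is hypothetical, and the sample lines reuse only the first-column values from the access log shown above.

```ruby
# Hypothetical helper mirroring the script's logic: every line over the limit
# bumps a counter, and each over-limit line seen once the counter has reached
# the threshold raises the alert level by one.
def alert_level(lines, limit, threshold)
  over = 0
  level = 0
  lines.each do |line|
    taken = line[/\A\d+/] # leading time-taken column
    next unless taken && taken.to_i > limit
    over += 1
    level += 1 if over >= threshold
  end
  level
end

# First-column values from the access log above, attached to a shortened request line.
times = [108, 299, 3432, 217, 117, 101, 3255, 351, 242, 112]
lines = times.map { |t| "#{t} 127.0.0.1 - - [07/Mar/2013:22:12:51 +0000] \"GET / HTTP/1.1\" 200 783" }

level_with_zero_limit = alert_level(lines, 0, 3)
puts level_with_zero_limit
```

With a limit of 0 and a threshold of 3, all ten lines are over the limit and the level comes out as 8, matching the run shown above; the limit is compared in whatever unit the log column uses.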

This one-liner solves the issue:
(time wget -p --no-cache --delete-after www.example.com -q) 2>&1 >/dev/null | grep real | awk -F"[m\t]" '{ printf "%s\n", $2*60+$3 }'
It returns the load time of the page in seconds, including the fractional part, using a dot as the decimal separator.
time(5)
wget
awk
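If a scripting language is acceptable, the measurement itself is small. The following Ruby sketch is an alternative to the shell pipeline, not an equivalent: it times a single request with a monotonic clock and, unlike wget -p, does not fetch images, stylesheets, or other page requisites. The helper names and the THRESHOLD_SECONDS value are assumptions for illustration.

```ruby
require 'net/http'
require 'uri'

# Runs the block and returns the elapsed wall-clock time in seconds.
def seconds_taken
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical helper: fetches one URL and returns how long the response took.
def response_seconds(url)
  seconds_taken { Net::HTTP.get_response(URI(url)) }
end

THRESHOLD_SECONDS = 5.0 # assumed limit, pick your own

# Only touches the network when a URL is passed, e.g. ruby timing.rb http://www.example.com/
if ARGV[0]
  elapsed = response_seconds(ARGV[0])
  puts format('%.3f', elapsed)
  puts 'over threshold' if elapsed > THRESHOLD_SECONDS
end
```

A monotonic clock is used so that system clock adjustments during the request do not distort the measurement.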
