I have a task to extract the following information from an access log file (in common log format) using the Unix command line:
Count # of requests with status_code == 200
Here is a piece of that log file (which contains data for one day):
127.0.0.1 "1.1.1.1" - [02/Dec/2014:14:30:00 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 7
127.0.0.1 "1.1.1.1" - [02/Dec/2014:14:32:30 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:20:12 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:22:20 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:30:10 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:15:35:15 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 7
127.0.0.1 "1.1.1.1" - [02/Dec/2014:16:25:11 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 7
127.0.0.1 "1.1.1.1" - [02/Dec/2014:16:27:10 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 8
127.0.0.1 "1.1.1.1" - [02/Dec/2014:16:33:12 +0000] "POST /server/ad?uid=abc&type=INV HTTP/1.1" 200 3966 10
I use this:
$ awk -F'[: ]' '{count[$5]++}; $12 == 200 { hour[$5]++} END { for (i in hour) print i, count[i] }' acces_log.log
And receive this:
14 2
15 4
16 3
But there is one more requirement: all results should be stored in a file. How can I do this from the command line?
Regards
All Linux/Unix and DOS command lines understand the symbols <, <<, >, >> for redirection.
To redirect output to a file, use
awk '{....}' > outputFile
#------------^ -- redirection
This redirection always creates a new outputFile; if the file already exists, it is truncated (overwritten).
To append (extra) data to a file use
awk '{ .... }' >> outputFile
#--------------^^ -- append
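Applied to the command in the question, a minimal sketch (keeping the acces_log.log name from the question; results.txt is an assumed output name):

# Count status-200 requests per hour and store the result in a file;
# $5 is the hour and $12 the status code when splitting on ':' and ' '
awk -F'[: ]' '$12 == 200 { hour[$5]++ } END { for (i in hour) print i, hour[i] }' acces_log.log > results.txt

Note this prints hour[i] rather than count[i], so only the status-200 requests are counted.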
IHTH
I have a logfile
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /pics/a2hlogo.jpg HTTP/1.0" 200 4282 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" 200 36 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
I want to calculate the requests per minute and output something like this:
2012/01/01 00:00 2
2012/01/01 00:01 33
I was thinking of looping over the whole file and extracting the timestamps into an array using a regex like this:
timestamps = []
File.open("log.txt") do |f|
  f.each_line do |line|
    # Capture "[dd/Mon/yyyy:HH:MM:SS -zzzz]" from each line
    timestamps << line[/\[(\d{2})\/([a-zA-Z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})\]/]
  end
end
Then I would use that array to somehow calculate the requests per minute. Is there a better way to do this in Ruby, without using CLI tools?
It's not the prettiest, but this is what you're going to want to do.
require 'date'

# Read the whole log file (file name as in the question)
log = File.read("log.txt")

TIMESTAMP_REGEX = %r{\[(.*?)\]} # extract everything between the [ ]

# Build an array of DateTime objects from the regex matches; the first ":"
# (between date and hour) is replaced with a space so DateTime.parse accepts it
datetimes = log.scan(TIMESTAMP_REGEX).flatten.map do |log_time|
  DateTime.parse(log_time.sub(":", " "))
end

results = Hash.new(0)
datetimes.each do |datetime|
  time = datetime.strftime('%Y/%m/%d %H:%M')
  results[time] += 1
end

results.each do |k, v|
  puts "#{k}: #{v} requests"
end
There are more optimal ways to do this, including a moderately lengthy one-liner, but if you're looking for straightforwardness, this is the way to go.
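For comparison only, since the question asked for pure Ruby: the same per-minute counting can be sketched as a shell one-liner (log.txt and the timestamp in field 4 are as in the sample above; reformatting the date is left out):

# substr($4, 2, 17) extracts "dd/Mon/yyyy:HH:MM" from the "[timestamp" token
awk '{ count[substr($4, 2, 17)]++ } END { for (t in count) print t, count[t] }' log.txt | sort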
My pumaconf.rb has this line:
stdout_redirect '/var/log/puma/puma.log', '/var/log/puma/puma-err.log', true
puma.log contains:
=== puma startup: 2016-09-07 23:33:01 +0000 ===
[4433] - Worker 0 (pid: 4436) booted, phase: 0
puma-err.log contains:
10.10.10.1 - - [08/Sep/2016:00:03:27 +0000] "GET /health HTTP/1.0" 200 - 0.0005
10.10.10.1 - - [08/Sep/2016:00:03:27 +0000] "GET /health HTTP/1.0" 200 - 0.0005
Any tips on what I may have set up incorrectly?
Edit:
A little more info on my side -- I set up a new test in a clone of the puma code, with the following config:
log_requests
stdout_redirect "t4-stdout", "t4-stderr", true
pidfile "t4-pid"
bind "tcp://0.0.0.0:10103"
rackup File.expand_path('../hello.ru', File.dirname(__FILE__))
daemonize false
I wrote a test asserting that "200 OK" should appear only in stdout, not in stderr, and everything passes.
I'm pretty stumped as to why spinning up a Puma server from the base repo behaves so differently, but I will keep digging.
I need to parse a text file and display the top 10 requests sorted by the number of bytes. I have a file log.txt:
164.94.76.83.cust.bluewin.ch - - [17/Oct/2006:07:56:45 -0700] "GET /example/serif.css HTTP/1.1" 200 4824 "http://www.example.org/example/When/200x/2003/07/25/NotGaming" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
164.94.76.83.cust.bluewin.ch - - [03/Oct/2006:07:56:45 -0700] "GET /example/example.js HTTP/1.1" 200 6685 "http://www.example.org/example/When/200x/2003/07/25/NotGaming" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
164.94.76.83.cust.bluewin.ch - - [06/Oct/2006:07:56:46 -0700] "GET /example/When/200x/2003/07/25/Nuke.png HTTP/1.1" 200 19757 "http://www.example.org/example/When/200x/2003/07/25/NotGaming" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
164.94.76.83.cust.bluewin.ch - - [15/Oct/2006:07:56:46 -0700] "GET /example/When/200x/2003/07/25/diablo.png HTTP/1.1" 200 12597 "http://www.example.org/example/When/200x/2003/07/25/NotGaming" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
164.94.76.83.cust.bluewin.ch - - [19/Oct/2006:07:56:46 -0700] "GET /example/When/200x/2003/07/25/-big/Nuke.jpg HTTP/1.1" 403 322 "
The output must be like this (byte count and link, with a percentage, sorted in descending order):
1. http://www.example.org/example/When/200x/2006/09/25/ - 3100 - 74%
2. http://www.example.org/example/ - 1000 - 24%
3. http://www.example.org/example/genx/docs/Guide.html - 91 - 2%
That is, I need to sort the lines by the number of bytes in the request, highest first, and show each one's share as a percentage.
Since you insist on a shell-only approach, the closest solution to what you ask seems to be something like this:
sort -t ' ' -k 10 -r -n log.txt | head -n 10 | awk '{print $1 $7, $10}'
You can probably do (much) better by either setting a more useful LogFormat when logging the requests, or by using a Perl or Python parser when processing the logs.
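If the numbering and percentages are needed as well, a sketch along the same lines (assuming the byte count is field 10 and the quoted referrer is field 11, as in the sample lines; the percentages here are taken against the total of all requests in the file):

# Total bytes across all requests, used as the denominator for the percentages
total=$(awk '{ sum += $10 } END { print sum }' log.txt)

# Top 10 lines by byte count, printed as: rank. referrer - bytes - percent%
sort -k10,10 -rn log.txt | head -n 10 |
  awk -v total="$total" '{ gsub(/"/, "", $11)
                           printf "%d. %s - %d - %d%%\n", NR, $11, $10, $10 * 100 / total }'

Note this ranks individual log lines; if the task is to aggregate per URL, the bytes would first have to be summed per referrer.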
I want the output of the script to be written into a file, i.e. QR_bar_code_$fdt.sh.
Here is my script file, qr_script.sh:
dt=$(date "+%d/%B/%Y")
prev=$(date --date yesterday "+%d/%B/%Y")
fdt=$(date --date yesterday "+%d-%m-%Y")
echo "$fdt"
echo "Date :$prev"
grep -F "$prev" access_log|grep -F 'GET /?p='
echo "count :$(grep -F "$prev" access_log|grep -F 'GET /?p='|wc -l)"
cat >QR_bar_code_$fdt.sh
exec >>QR_bar_code_$fdt.sh
I am not sure about the last two lines. My problem: when I run the script, the file is created, but the output is not appended to it, and execution waits after the file is created; whatever I type then is appended to the file, but not the output generated by the script. The newly created file should contain the output of qr_script.sh, like this:
Date :13/May/2014
183.82.99.35 - - [13/May/2014:03:03:16 -0400] "GET /?p=135873 HTTP/1.1" 301 281
183.82.99.35 - - [13/May/2014:03:03:49 -0400] "GET /?p=134201 HTTP/1.1" 301 281
183.82.99.35 - - [13/May/2014:03:04:06 -0400] "GET /?p=134201 HTTP/1.1" 301 281
183.82.99.35 - - [13/May/2014:03:04:25 -0400] "GET /?p=134201 HTTP/1.1" 301 281
count :4
If anybody has any idea about my problem, please help me resolve it. Thanks in advance.
Try
prev=$(date --date yesterday "+%d/%B/%Y")
fdt=$(date --date yesterday "+%d-%m-%Y")
echo "$fdt" >QR_bar_code_$fdt.sh
echo "Date :$prev" >>QR_bar_code_$fdt.sh
grep -F "$prev" access_log|grep -F 'GET /?p=' >>QR_bar_code_$fdt.sh
echo "count :$(grep -F "$prev" access_log|grep -F 'GET /?p='|wc -l)" >>QR_bar_code_$fdt.sh
My question is what the title says: how can I time an HTTP server response using plain GNU bash? I need it to measure the server's load time and, if that value is higher than a threshold, restart the web server automatically.
You could script against the output of ab (Apache Benchmark). Also ensure you have %D enabled in your LogFormat; then, rather than scripting a test, you can tail the log files and restart when the time taken goes above the threshold.
Here is a script:
#!/bin/bash
# Alert threshold - number of times to go over the limit before treating it as an issue
ALERT_THRESHOLD=3
# Alert limit in seconds - any page load time exceeding this counts against the threshold
ALERT_LIMIT=60
ALERT_LIMIT_MILI=$(echo $ALERT_LIMIT | awk '{$3=$1*1000; print $3}')
TAIL_LIMIT=10
LOG_FILE="/var/log/apache2/access.log"

RESULT=$(tail -n $TAIL_LIMIT $LOG_FILE | awk -v alimit=$ALERT_LIMIT_MILI -v athreshold=$ALERT_THRESHOLD '
    BEGIN { QUERY=""; i=0; SENDALERT=0 }
    {
        if ($1 > alimit) {
            i++
            QUERY = QUERY" TIME_TAKEN:"($1/1000)"seconds,"$1"ms|DATE:"$5"|STATUS:"$10"|URL:"$12"\n"
            if (i >= athreshold) {
                SENDALERT++
            }
        }
    }
    END { print "QUERY:"QUERY"\nSENDALERT:"SENDALERT }')

SENDALERT=$(echo -e $RESULT | awk -F"SENDALERT:" '{print $2}')
echo "ALERT LEVEL: $SENDALERT"

if [[ $SENDALERT -ge 1 ]]; then    # -ge: [[ ]] has no numeric >= operator
    echo "restarting apache"
    content=$(echo -e $RESULT | awk -F"QUERY:" '{print $2}')
    (for lines in $(echo $content); do echo $lines; done;)
    #(for lines in $(echo $content); do echo $lines; done;) | mail -s "Restarting apache $(date)" root@localhost
fi
For my tests the alert limit in seconds was set to 0. You will see alert level 8: there are 10 lines with these time values, and once the variable i hits the threshold of 3, every further match increments the SENDALERT variable. That is why it reports 8; the first two matches were consumed as part of the threshold.
Running it:
./script.sh
ALERT LEVEL: 8
restarting apache
TIME_TAKEN:0.108seconds,108ms|DATE:[07/Mar/2013:22:12:51|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.299seconds,299ms|DATE:[07/Mar/2013:22:12:51|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:3.432seconds,3432ms|DATE:[07/Mar/2013:22:12:58|STATUS:200|URL:"-"
TIME_TAKEN:0.217seconds,217ms|DATE:[07/Mar/2013:22:12:58|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.117seconds,117ms|DATE:[07/Mar/2013:22:12:58|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.101seconds,101ms|DATE:[07/Mar/2013:22:12:58|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:3.255seconds,3255ms|DATE:[07/Mar/2013:22:13:03|STATUS:200|URL:"-"
TIME_TAKEN:0.351seconds,351ms|DATE:[07/Mar/2013:22:13:03|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.242seconds,242ms|DATE:[07/Mar/2013:22:13:03|STATUS:304|URL:"http://localhost/"
TIME_TAKEN:0.112seconds,112ms|DATE:[07/Mar/2013:22:13:03|STATUS:304|URL:"http://localhost/"
SENDALERT:8
---- apache access log:
108 127.0.0.1 - - [07/Mar/2013:22:12:51 +0000] "GET /icons/folder.gif HTTP/1.1" 304 186 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
299 127.0.0.1 - - [07/Mar/2013:22:12:51 +0000] "GET /icons/compressed.gif HTTP/1.1" 304 188 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
3432 127.0.0.1 - - [07/Mar/2013:22:12:58 +0000] "GET / HTTP/1.1" 200 783 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
217 127.0.0.1 - - [07/Mar/2013:22:12:58 +0000] "GET /icons/blank.gif HTTP/1.1" 304 186 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
117 127.0.0.1 - - [07/Mar/2013:22:12:58 +0000] "GET /icons/folder.gif HTTP/1.1" 304 186 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
101 127.0.0.1 - - [07/Mar/2013:22:12:58 +0000] "GET /icons/compressed.gif HTTP/1.1" 304 187 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
3255 127.0.0.1 - - [07/Mar/2013:22:13:03 +0000] "GET / HTTP/1.1" 200 782 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
351 127.0.0.1 - - [07/Mar/2013:22:13:03 +0000] "GET /icons/folder.gif HTTP/1.1" 304 187 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
242 127.0.0.1 - - [07/Mar/2013:22:13:03 +0000] "GET /icons/compressed.gif HTTP/1.1" 304 188 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
112 127.0.0.1 - - [07/Mar/2013:22:13:03 +0000] "GET /icons/blank.gif HTTP/1.1" 304 186 "http://localhost/" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0"
Here I have put %D as the first column of the output, and in the awk statement in the script I compare $1's value against the limit. The rest ($10 and so on) depend on where things appear in my log.
You could then put it in a script folder, remove the verbosity or send the output to /dev/null, and run it from cron every 10 minutes or so.
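If you run it from cron, a minimal crontab entry (the script path here is an assumption) could be:

# Check the latest log entries every 10 minutes, discarding normal output
*/10 * * * * /usr/local/bin/apache_load_check.sh >/dev/null 2>&1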
enjoy
This is a one-liner that solves the issue:
(time wget -p --no-cache --delete-after www.example.com -q) 2>&1 >/dev/null | grep real | awk -F"[m\t]" '{ printf "%s\n", $2*60+$3 }'
It returns the page load time in seconds, with the fraction of a second after a dot separator.
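To tie this back to the goal of restarting the server above a threshold, a minimal sketch (the URL, the 5-second threshold, and the apache2 service name are all assumptions):

#!/bin/bash
THRESHOLD=5   # assumed threshold in seconds

# Page load time in seconds, measured with the one-liner above
loadtime=$( (time wget -p --no-cache --delete-after www.example.com -q) 2>&1 >/dev/null |
            grep real | awk -F"[m\t]" '{ printf "%s\n", $2*60+$3 }')

# Compare with awk, since bash cannot compare floating-point numbers natively
if awk -v t="$loadtime" -v max="$THRESHOLD" 'BEGIN { exit !(t > max) }'; then
    echo "Load time ${loadtime}s exceeds ${THRESHOLD}s, restarting the web server"
    sudo systemctl restart apache2   # assumed service name
fi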
See also: time, wget, awk.