Can't extract filename from Apache logs - bash

I have an ownCloud server that I set up myself.
I need to log which files are downloaded by users.
I made a bash script which greps the Apache logs and puts the matches into a file.
Example of a line in the file:
/var/log/httpd/ssl_access_log-20200621-46.63.46.133 - - [18/Jun/2020:13:07:33 +0000] "GET /ocs/v2.php/apps/files_sharing/api/v1/shares?format=json&path=%2FHJC%2FMaster-Schedule%20Draft%20for%20SOP%20of%20HJC%20(10.10.2019).xlsx&shared_with_me=true HTTP/1.1" 200 108
How can I get the file name "Master-Schedule Draft for SOP of HJC (10.10.2019).xlsx"?

OK. I finally found a solution using 'sed':
sed 's#+# #g;s#%#\\x#g' <my-log-file> | xargs -0 printf "%b" > <result-file>
It decodes the URL, so all that remains to be done is to extract the 'path' value.
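For example, a sketch of that last step, assuming GNU grep for the -P flag and the field layout shown in the example line above:
# decode the percent-encoded line, then keep only the file name
sed 's#+# #g;s#%#\\x#g' <my-log-file> | xargs -0 printf '%b' |
  grep -oP 'path=\K[^&"]+' |   # value of the path query parameter
  awk -F/ '{print $NF}'        # strip leading directories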

Related

Getting a list of urls with wget using regex

I'm starting with the page:
https://mysite/a
I'd like to spider the page getting the full urls of any nested urls below this that begin with the same stem (like https://mysite/a/b ).
I've tried:
$ wget -r --spider --accept-regex "https://...*" 'https://.../' 2>test.txt
which produces a large amount of output, including what appear to be the urls I'm after, like:
--2018-04-21 15:04:48-- https://mysite/a/
Reusing existing connection to mysite:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'a/index.html.tmp.tmp'
How do I just print out a list of the urls?
Edit:
I changed it to
$ wget -r --spider 'https://mysite/a/' | grep 'https://mysite/a*' 2>test.txt
as a test. No output is being saved in test.txt; the file is empty.
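One likely cause: wget writes its progress messages to stderr, so the pipe passes nothing to grep, and the trailing 2>test.txt redirects grep's stderr rather than wget's. A sketch that merges the streams before filtering:
wget -r --spider 'https://mysite/a/' 2>&1 | grep -o 'https://mysite/a[^ ]*' > test.txt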

filter a log file data from a certain time range

I want to write a script that asks the user for the first and last date and time of the interval we want to filter our log data by, and I need some help.
I don't know exactly how to find the data in that range, as I can't do it with a single regex.
my log file looks like this:
108.162.221.147 - - [04/Aug/2016:18:59:59 +0200] "GET / HTTP/1.1" 200 10254 "-"...
141.101.99.235 - - [04/Aug/2016:19:00:00 +0200] "GET / HTTP/1.1" 200 10255 ...
108.162.242.219 - - [04/Aug/2016:19:00:00 +0200] "GET / HTTP/1.1" 200 10255...
185.63.252.237 - - [04/Aug/2016:19:00:00 +0200] "CONNECT...
108.162.221.147 - - [04/Aug/2016:19:00:00 +0200] "GET /?...
185.63.252.237 - - [04/Aug/2016:19:00:01 +0200] "CONNECT....
etc...
my script:
#!/bin/bash
echo "enter the log file name "
read fname
echo "enter the start date and time "
read startdate
echo "enter the end fate and time "
read enddate
result=$(some code for filtering rows from this range)
echo "$result" > 'log_results'
echo "results written into /root/log_results file"
I tried using
sed -n "/$startdate/,/$enddate/p" "$fname"
but it didn't work: the slashes in the dates collide with sed's /.../ address delimiters, and a plain regex doesn't work either, as it only matches those two exact dates from the log (maybe I've been writing it wrong).
How do I do this?
Usually it's best to use some kind of dedicated log parsing software for this kind of task, so that you don't have to do what you're trying to do. It's also decidedly not a job for regular expressions. However, if you must do this with text processing tools such as grep, I would suggest a two-phase approach:
Generate a list of every timestamp you want to find.
Use grep -F to find all lines in your log that contain one of those timestamps.
For example, if you only wanted to find the middle four lines of your file (the ones with the timestamp [04/Aug/2016:19:00:00 +0200]), that would make step 1 very simple, as you would be generating a single-item list with just one timestamp in it.
echo '[04/Aug/2016:19:00:00 +0200]' > interesting_times
Then find all the lines with that timestamp:
grep -F -f interesting_times logfile
You could generate a shorter list by reducing the precision of the timestamp. For example to find two entire hours of log data:
echo '[04/Aug/2016:19' > interesting_times
echo '[04/Aug/2016:20' >> interesting_times
I leave it to you to determine how to generate the list of interesting times, but seriously look into purpose-built log parsing software.
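For illustration, one way to generate that list for a contiguous range is to enumerate every second between the two endpoints. A sketch, assuming GNU date, that the machine's timezone matches the log's +0200 offset, and LC_ALL=C for English month names:
start=$(date -d '2016-08-04 18:59:59 +0200' +%s)
end=$(date -d '2016-08-04 19:00:01 +0200' +%s)
for ((t = start; t <= end; t++)); do
    LC_ALL=C date -d "@$t" '+[%d/%b/%Y:%H:%M:%S %z]'
done > interesting_times
grep -F -f interesting_times logfile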

Parse download speed from wget output in terminal

I have the following command
sudo wget --output-document=/dev/null http://speedtest.pixelwolf.ch which outputs
--2016-03-27 17:15:47-- http://speedtest.pixelwolf.ch/
Resolving speedtest.pixelwolf.ch (speedtest.pixelwolf.ch)... 178.63.18.88, 2a02:418:3102::6
Connecting to speedtest.pixelwolf.ch (speedtest.pixelwolf.ch)|178.63.18.88|:80... connected.
HTTP Request sent, awaiting response... 200 OK
Length: 85 [text/html]
Saving to: `/dev/null`
100%[======================>]85 --.-K/s in 0s
2016-03-27 17:15:47 (8.79 MB/s) - `/dev/null` saved [85/85]
I'd like to be able to parse the (8.79 MB/s) from the last line and store it in a file (or any other way I can get this into a local PHP file easily). I tried to store the full output by changing my command to --output-document=/dev/speedtest, however this just saved "Could not reach website" in the file, not the terminal output of the command.
Not quite sure where to start with this, so any help would be awesome.
Not sure if it helps, but my intention is for this stored value (8.79 in this instance) to be read by a PHP file and handled there every 30 seconds, which I'll achieve with: while true; do (run speed test and save speed variable to a file cmd); php handleSpeedTest.php; sleep 5; done where handleSpeedTest.php will read that stored value and handle it accordingly.
I changed the URL to one that works. Redirected stderr onto stdout. Used grep --only-matching (-o) and a regex.
sudo wget -O /dev/null http://www.google.com 2>&1 | grep -o '\([0-9.]\+ [KM]B/s\)'
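Building on that, a sketch of the polling loop from the question that stores just the number (speed.txt is an assumed filename; handleSpeedTest.php is the asker's script):
while true; do
    # merge stderr into stdout, isolate e.g. "8.79 MB/s", then keep the number
    wget -O /dev/null http://www.google.com 2>&1 |
        grep -o '[0-9.]\+ [KM]B/s' | awk '{print $1}' > speed.txt
    php handleSpeedTest.php   # reads speed.txt and handles the value
    sleep 30
done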

How to fix the error in the bash shell script?

I am trying out some code in a shell script. While converting the code from a batch script to a shell script, I am getting an error.
BATCH FILE CODE
:: Create a file with all latest snapshots
FOR /F "tokens=5" %%a in (' ec2-describe-snapshots ^|find "SNAPSHOT" ^|sort /+64') do set "var=%%a"
set "latestdate=%var:~0,10%"
call ec2-describe-snapshots |find "SNAPSHOT"|sort /+64 |find "%latestdate%">"%EC2_HOME%\Working\SnapshotsLatest_%date-today%.txt"
CODE IN SHELL SCRIPT
#Create a file with all latest snapshots
FOR snapshot_date in $(' ec2-describe-snapshots | grep -i "SNAPSHOT" |sort /+64') do set "var=$snapshot_date"
set "latestdate=$var:~0,10"
ec2-describe-snapshots |grep -i "SNAPSHOT" |sort /+64 | grep "$latestdate">"$EC2_HOME%/SnapshotsLatest_$today_date"
I want to sort the snapshots by date and save the snapshots created on the latest date to a file.
SAMPLE OUTPUT OF ec2-describe-snapshots:
SNAPSHOT snap-5e20 vol-f660 completed 2013-12-10T08:00:30+0000 100% 109030037527 10 2013-12-10: Daily Backup for i-2111 (VolID:vol-f9a0 InstID:i-2601)
It will contain records like this.
I got this code:
latestdate=$(ec2-describe-snapshots | grep ^SNAPSHOT | sort -k 5 | awk '{print $5}')
ec2-describe-snapshots | grep SNAPSHOT.*$latestdate | > "$EC2_HOME/SnapshotsLatest_$today_date"
but getting this error :
grep: 2013-12-10T09:55:34+0000: No such file or directory
grep: 2013-12-11T04:16:49+0000: No such file or directory
grep: 2013-12-11T04:17:57+0000: No such file or directory
I have some snapshots made on Amazon. I want to find the latest snapshots made on a date and then store them in a file, e.g. for date 2013-12-10, the snapshots made on that date should be stored in the file. The contents of the SnapshotsLatest file should be
SNAPSHOT snap-c17f3 vol-f69a0 completed 2013-12-04T09:24:50+0000 100% 109030037527 10 2013-12-04: Daily Backup for Sanjay_Test_Machine (VolID:vol-f66409a0 InstID:i-26048111)
SNAPSHOT snap-c7d617f9 vol-3d335f6b completed 2013-12-04T09:24:54+0000 100% 109030037527 10 2013-12-04: Daily Backup for sacht_VPC (VolID:vol-3db InstID:i-ed6)
Please note that if there are snapshots created on 2013-12-10, 2013-12-11, and 2013-12-12, then latest_date should be 2013-12-12 and all the snapshots created on 2013-12-12 should be saved in the file.
Any suggestion or lead is appreciated.
Neither the batch script nor the shell script you posted is a good starting point, so let's start from scratch. Sorry, this is too big for a comment.
You want to find the latest snapshots made on a date and then want to store them in a file.
What does that mean?
Do the snapshot files have a timestamp in their name or in their content?
If not - UNIX does not store file creation timestamps so is a last-modified timestamp adequate?
Do you literally want to concatenate all of your snapshot files into one single file, or do you want to create a file that lists the snapshot file names?
Post some sample input (e.g. some snapshot file names and contents if that's where the timestamp is stored) and the expected output given that input.
Update your question to address all of the above, do not try to reply in a comment.
Minor issue, you don't need a pipe when re-directing output, so your line to save should be
ec2-describe-snapshots | grep SNAPSHOT.*$latestdate > "$EC2_HOME/SnapshotsLatest_$today_date"
Now the main issue here, is that the grep is messed up. I haven't worked with amazon snapshots, but judging by your example descriptions, you should be doing something like
latestdate=$(ec2-describe-snapshots | grep -oP "\d+-\d+-\d+" | sort -r | head -1)
This will get all the dates containing the form dddd-dd-dd from the file (I'm assuming the two dates in each snapshot line always match up), sort them in reverse order (latest first) and take the head which is the latest date, storing it in $latestdate.
Then to store all snapshots with the given date do something like
ec2-describe-snapshots | grep -oP "SNAPSHOT(.*?)$lastdateT(.*?)\)" > "$EC2_HOME/SnapshotsLatest_$today_date"
This will grab all text starting with SNAPSHOT, containing the given date, and ending in a closing ")", and save it. Note: you may have to adjust it if ")" can appear elsewhere.
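Putting both parts together, a minimal sketch, assuming ec2-describe-snapshots output like the sample above and that $today_date is set elsewhere in the script:
#!/bin/bash
# most recent YYYY-MM-DD date anywhere in the snapshot listing
latestdate=$(ec2-describe-snapshots | grep -oP '\d{4}-\d{2}-\d{2}' | sort -r | head -1)
# keep only the SNAPSHOT lines whose timestamp begins with that date
ec2-describe-snapshots | grep "^SNAPSHOT.*${latestdate}T" > "$EC2_HOME/SnapshotsLatest_$today_date"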

AWSTATS issue: all records dropped

I have a problem when using AWSTATS to analyse my apache logs.
In the past everything worked well.
But now the log format has changed for my server.
Old format example:
194.206.22.25 - - [14/Dec/2009:12:23:33 +0100] "GET /gPM-Systems/css/default.css HTTP/1.1" 404 1036
New format example:
356652,mics,194.206.22.24,194.206.22.24,-,[05/Jul/2011:15:11:18 +0200],"GET /index.html HTTP/x.x",302,-
For the old format, the correct LogFormat to choose was 4.
Now it is this custom format:
LogFormat="%other %other %host %other %logname %time1 %methodurl %code"
I also changed the LogSeparator to "," instead of " ".
My problem is that all records are dropped.
The -showdropped option shows this:
Dropped record (method/protocol 'GET /apache_pb.gif' not qualified when LogType=W):
356652,mics,194.206.22.24,194.206.22.24,-,[05/Jul/2011:15:11:18 +0200],"GET /apache_pb.gif HTTP/1.0",302,-
I had a similar issue when I changed the format of my logs: the format changed and tab became the field separator, which caused the same error.
For the LogFile configuration option, I was already using a pipe. So I switched the tab out for a space by adding tr '\t' ' ' | to the end. Then I modified the AWStats config to separate on spaces.
I was able to get AWStats to parse the logs after this. Perhaps it will work for you as well.
If you are not already using a pipe for the LogFile configuration option, you can use cat to get the files into tr.
LogFile="cat /log/file/path/*.log | tr '\t' ' ' |"
Replacing HTTP/1.x with nothing solved this issue.
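Presumably the replacement can be done via the same LogFile pipe trick shown above; a sketch, assuming the protocol field looks like HTTP/1.0 or HTTP/x.x:
LogFile="cat /log/file/path/*.log | sed 's| HTTP/[0-9.x]*||' |"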
