AWStats issue: all records dropped

I have a problem using AWStats to analyse my Apache logs.
In the past everything worked well, but now the log format has changed on my server.
Old format example:
194.206.22.25 - - [14/Dec/2009:12:23:33 +0100] "GET /gPM-Systems/css/default.css HTTP/1.1" 404 1036
New format example:
356652,mics,194.206.22.24,194.206.22.24,-,[05/Jul/2011:15:11:18 +0200],"GET /index.html HTTP/x.x",302,-
For the old format, the correct LogFormat to choose was 4.
Now it is this custom format:
LogFormat="%other %other %host %other %logname %time1 %methodurl %code"
I also changed LogSeparator to "," instead of " ".
My problem is that all records are dropped.
The -showdropped option shows this:
Dropped record (method/protocol 'GET /apache_pb.gif' not qualified when LogType=W):
356652,mics,194.206.22.24,194.206.22.24,-,[05/Jul/2011:15:11:18 +0200],"GET /apache_pb.gif HTTP/1.0",302,-

I had a similar issue when I changed the format of my logs. The format was changed and tab was used as the field separator, which caused the same error.
For the LogFile configuration option, I was already using a pipe. So I switched the tab out for a space by adding tr '\t' ' ' | to the end. Then I modified the AWStats config to separate on spaces.
I was able to get AWStats to parse the logs after this. Perhaps it will work for you as well.
If you are not already using a pipe for the LogFile configuration option, you can use cat to get the files into tr.
LogFile="cat /log/file/path/*.log | tr '\t' ' ' |"

Replacing HTTP/1.x with nothing solved this issue.
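One hedged way to apply that fix, reusing the pipe trick from the previous answer (the log path is a placeholder), is to strip the protocol token before AWStats sees each line:
LogFile="cat /log/file/path/*.log | sed 's# HTTP/[0-9x.]*##' |"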

Can't take filename from Apache logs

I have an ownCloud instance that I set up myself.
I need to log the files downloaded by users.
I made a bash script which greps the Apache logs and puts the results into a file.
Example of a line in the file:
/var/log/httpd/ssl_access_log-20200621-46.63.46.133 - - [18/Jun/2020:13:07:33 +0000] "GET /ocs/v2.php/apps/files_sharing/api/v1/shares?format=json&path=%2FHJC%2FMaster-Schedule%20Draft%20for%20SOP%20of%20HJC%20(10.10.2019).xlsx&shared_with_me=true HTTP/1.1" 200 108
How can I get the file name "Master-Schedule Draft for SOP of HJC (10.10.2019).xlsx"?
OK. I finally found a solution using sed:
sed 's#+# #g;s#%#\\x#g' <my-log-file> | xargs -0 printf "%b" > <result-file>
It decodes the URL, so all that remains to be done is to get the 'path' value.
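To go the rest of the way, here is a sketch building on the same idea (the log path is a placeholder, and bash's printf %b is assumed for decoding the \xHH escapes): isolate the 'path' query parameter, URL-decode it, and keep only the part after the last slash.
# Pull the 'path' query parameter, URL-decode it, keep the file name only
grep -o 'path=[^& ]*' /var/log/httpd/ssl_access_log-20200621 \
  | sed 's/^path=//; s/+/ /g; s/%/\\x/g' \
  | while IFS= read -r enc; do printf '%b\n' "$enc"; done \
  | awk -F/ '{print $NF}'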

oracle sqlldr not recognizing special characters

I am facing a scenario where sqlldr is not able to recognize special characters. I usually don't bother about this, as it's not important for me to have the exact same names; however, this led to another issue which is causing the system to malfunction.
unittesting.txt
8888888,John SMITÉ,12345678
unittesting.ctl
load data
CHARACTERSET UTF8
infile 'PATH/unittesting.txt'
INSERT
into table temp_table_name
Fields Terminated By ',' TRAILING NULLCOLS
(
  ID_NO  CHAR(50) "TRIM(:ID_NO)",
  NAME   CHAR(50) "TRIM(:NAME)",
  ID_NO2 CHAR(50) "TRIM(:ID_NO2)"
)
SQLLDR command
sqlldr DB_ID/DB_PASS#TNS
control=PATH/unittesting.ctl
log=PATH/unittesting.log
bad=PATH/unittesting.bad
errors=100000000
OUTPUT from table
|ID_NO |NAME |ID_NO2 |
|8888888 |John SMIT�12345678 | |
Other information about the system [RHEL 7.2, Oracle 11g]:
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
select userenv('language') from dual
OUTPUT: AMERICAN_AMERICA.AL32UTF8
file -i unittesting.txt
OUTPUT: unittesting.txt: text/plain; charset=iso-8859-1
echo $LANG
OUTPUT: en_US.UTF-8
Edit:
I tried changing the encoding of my file as advised by [Cyrille MODIANO], and the issue got resolved.
iconv -f iso-8859-1 -t UTF-8 unittesting.txt -o unittesting_out.txt
My challenge now is that I don't know the character set of the incoming files, and they come from many different sources. The output of file -i for my source data file is:
: inode/x-empty; charset=binary
From my understanding, charset=binary means that the character set is unknown. Please advise what I can do in this case. Any small advice or idea is much appreciated.
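One possible approach, as a sketch (the incoming/ and converted/ directory names are hypothetical): ask file -bi for each file's charset and run iconv only when a concrete charset is reported. charset=binary gives iconv nothing to work with (and inode/x-empty means the file is in fact empty), so such files are passed through untouched.
#!/bin/bash
# Convert incoming files to UTF-8 when their charset can be detected.
mkdir -p converted
for f in incoming/*; do
  cs=$(file -bi "$f" | sed 's/.*charset=//')
  case "$cs" in
    utf-8|us-ascii|binary)
      # already usable, or undetectable: copy through unchanged
      cp "$f" "converted/${f##*/}" ;;
    *)
      # a concrete charset was detected: convert it
      iconv -f "$cs" -t UTF-8 "$f" -o "converted/${f##*/}" ;;
  esac
done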

Can you view historic logs for parse.com cloud code?

On the Parse.com cloud-code console, I can see logs, but they only go back maybe 100-200 lines. Is there a way to see or download older logs?
I've searched their website and googled, but haven't found anything.
Using the parse command-line tool, you can retrieve an arbitrary number of log lines:
Usage:
parse logs [flags]
Aliases:
logs, log
Flags:
-f, --follow=false: Emulates tail -f and streams new messages from the server
-l, --level="INFO": The log level to restrict to. Can be 'INFO' or 'ERROR'.
-n, --num=10: The number of the messages to display
Not sure if there is a limit, but I've been able to fetch 5000 lines of log with this command:
parse logs prod -n 5000
To add on to Pascal Bourque's answer, you may also wish to filter the logs by a given range of dates. To achieve this, I used the following:
parse logs -n 5000 | sed -n '/2016-01-10/, /2016-01-15/p' > filteredLog.txt
This will get up to 5000 logs, use the sed command to keep all of the logs which are between 2016-01-10 and 2016-01-15, and store the results in filteredLog.txt.
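The --level flag from the usage text above composes with the same pipeline, e.g. to keep only errors within that date range:
parse logs prod -n 5000 -l ERROR | sed -n '/2016-01-10/, /2016-01-15/p' > filteredErrors.txt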

How to fix the error in the bash shell script?

I am trying out some code in a shell script. While converting the code from a batch script to a shell script, I am getting an error.
BATCH FILE CODE
:: Create a file with all latest snapshots
FOR /F "tokens=5" %%a in (' ec2-describe-snapshots ^|find "SNAPSHOT" ^|sort /+64') do set "var=%%a"
set "latestdate=%var:~0,10%"
call ec2-describe-snapshots |find "SNAPSHOT"|sort /+64 |find "%latestdate%">"%EC2_HOME%\Working\SnapshotsLatest_%date-today%.txt"
CODE IN SHELL SCRIPT
#Create a file with all latest snapshots
FOR snapshot_date in $(' ec2-describe-snapshots | grep -i "SNAPSHOT" |sort /+64') do set "var=$snapshot_date"
set "latestdate=$var:~0,10"
ec2-describe-snapshots |grep -i "SNAPSHOT" |sort /+64 | grep "$latestdate">"$EC2_HOME%/SnapshotsLatest_$today_date"
I want to sort the snapshots by date and save the snapshots created on the latest date to a file.
SAMPLE OUTPUT OF ec2-describe-snapshots:
SNAPSHOT snap-5e20 vol-f660 completed 2013-12-10T08:00:30+0000 100% 109030037527 10 2013-12-10: Daily Backup for i-2111 (VolID:vol-f9a0 InstID:i-2601)
It will contain records like this.
I got this code:
latestdate=$(ec2-describe-snapshots | grep ^SNAPSHOT | sort -k 5 | awk '{print $5}')
ec2-describe-snapshots | grep SNAPSHOT.*$latestdate | > "$EC2_HOME/SnapshotsLatest_$today_date"
but I am getting this error:
grep: 2013-12-10T09:55:34+0000: No such file or directory
grep: 2013-12-11T04:16:49+0000: No such file or directory
grep: 2013-12-11T04:17:57+0000: No such file or directory
I have some snapshots made on Amazon. I want to find the latest snapshots made on a date and store them in a file; for example, for the date 2013-12-10, the snapshots made on this date should be stored in the file. The contents of the SnapshotsLatest file should be:
SNAPSHOT snap-c17f3 vol-f69a0 completed 2013-12-04T09:24:50+0000 100% 109030037527 10 2013-12-04: Daily Backup for Sanjay_Test_Machine (VolID:vol-f66409a0 InstID:i-26048111)
SNAPSHOT snap-c7d617f9 vol-3d335f6b completed 2013-12-04T09:24:54+0000 100% 109030037527 10 2013-12-04: Daily Backup for sacht_VPC (VolID:vol-3db InstID:i-ed6)
Please note that if there are snapshots created on 2013-12-10, 2013-12-11, and 2013-12-12, the latest date should be 2013-12-12, and all the snapshots created on 2013-12-12 should be saved in the file.
Any suggestion or lead is appreciated.
Neither the batch script nor the shell script you posted is a good starting point, so let's start from scratch. Sorry, this is too big for a comment.
You want to find the latest snapshots made on a date and then want to store them in a file.
What does that mean?
Do the snapshot files have a timestamp in their name or in their content?
If not - UNIX does not store file creation timestamps so is a last-modified timestamp adequate?
Do you literally want to concatenate all of your snapshot files into one single file, or do you want to create a file that has a list of the snapshot file names?
Post some sample input (e.g. some snapshot file names and contents if that's where the timestamp is stored) and the expected output given that input.
Update your question to address all of the above, do not try to reply in a comment.
Minor issue: you don't need a pipe when redirecting output, so your line to save should be:
ec2-describe-snapshots | grep SNAPSHOT.*$latestdate > "$EC2_HOME/SnapshotsLatest_$today_date"
Now, the main issue here is that the grep is messed up. I haven't worked with Amazon snapshots, but judging by your example descriptions, you should be doing something like:
latestdate=$(ec2-describe-snapshots | grep -oP "\d+-\d+-\d+" | sort -r | head -1)
This will get all the dates of the form dddd-dd-dd from the output (I'm assuming the two dates in each snapshot line always match up), sort them in reverse order (latest first), and take the head, which is the latest date, storing it in $latestdate.
Then, to store all snapshots with the given date, do something like:
ec2-describe-snapshots | grep -oP "SNAPSHOT(.*?)${latestdate}T(.*?)\)" > "$EC2_HOME/SnapshotsLatest_$today_date"
This will get all text starting with SNAPSHOT, containing the given date, and ending in a closing ")", and save it. Note: you may have to mess around with it a bit if ")" can appear elsewhere.
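Putting the two pieces together, a minimal end-to-end sketch (untested against real ec2-describe-snapshots output; $EC2_HOME and $today_date are taken from the question, and the braces around latestdate keep the shell from reading $latestdateT as one variable name):
# Find the most recent snapshot date, then save that date's snapshot lines
latestdate=$(ec2-describe-snapshots | grep -oP '\d{4}-\d{2}-\d{2}' | sort -r | head -1)
ec2-describe-snapshots | grep "SNAPSHOT.*${latestdate}T" > "$EC2_HOME/SnapshotsLatest_$today_date"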

creating a script which finds two alternating patterns

So my issue is that I need to make a script which finds a pattern where Time to live and User-Agent occur in that order, and increments a count (or grabs whatever data I want, etc.; it will likely evolve from there).
For example:
Time to live: 64
Some other data: ________
...
User-Agent: Mozilla/Chrome/IE:Windows/Unix/Mac
So basically the data appears in that order: TTL, then User-Agent. From that information I can grab the data I want, but I don't know how to identify this pattern. If it helps, I'm getting this data from a Wireshark capture saved as a text file.
Thanks to Shellter I got to the point where I have:
egrep ' User-Agent:| Time to live:' ../*.txt
which finds if both (TTL and UA) are in the file.
I'd appreciate any assistance.
Fragment offset: 0
Time to live: 128
Protocol: TCP (6)
Header checksum: 0x7e4d [correct]
[Good: True]
[Bad: False]
Source: 1.1.1.3 (1.1.1.3)
Destination: 1.1.1.4 (1.1.1.4)
//packet 2
Fragment offset: 0
Time to live: 128
Protocol: TCP (6)
Hypertext Transfer Protocol
GET / HTTP/1.1\r\n
[Expert Info (Chat/Sequence): GET / HTTP/1.1\r\n]
[Message: GET / HTTP/1.1\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Method: GET
Request URI: /
Request Version: HTTP/1.1
Host: mail.yahoo.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
I apologize for the slow reply; I had to do some editing.
So basically I just need to identify when a TTL occurs alone, and when a TTL occurs together with User-Agent data; I use this to identify clients behind a gateway.
So if the TTL is 126 (Windows) and I see 125, we assume it's behind a gateway and count++.
If we get that same count but with a different user-agent but same OS, count doesn't change.
If we get that same count but with a different user-agent and OS, count++.
so output could be as simple as:
1 (ttl)
1 (ttl+os)
2 (ttl+os+ua)
from the example (not the data) above.
It's still a little unclear what you're looking to report, but maybe this will help.
We're going to use awk as that tool was designed to solve problems of this nature (among many others).
And while my output doesn't match your output exactly, I think the code is self-documenting enough that you can work with this, and make a closer approximation to your final need. Feel free to update your question with your new code, new output, and preferably an exact example of the output you hope to achieve.
awk '
/Time to live/{ttl++}
/User-Agent/{agent++}
/Windows|Linux|Solaris/{os++}
END{print "ttl="ttl; print "os="os; print"agent="agent}
' ttlTest.txt
output
ttl=2
os=1
agent=1
The key thing to understand is that awk (and most Unix based reg-ex utilities, grep included) read each line of input and decide if it will print (or do something else) with the current line of data.
awk normally will print every line of input if you give it something like
awk '{print $1}' file
In this example, we print just the first field from each line of data.
In the solution above, we're filtering the data with regular expressions and then applying an action when we match some data, i.e.
/Time to live/ { ttl++ }
      |        |   |   |
      |        |   |   > block end
      |        |   > action (in this case, increment the value of the ttl variable)
      |        > block begin
      > regex to match
So we have 2 other 'regular expressions' that we're scanning each line for, and every time we match that regular expression, we increment the related variable.
Finally, awk allows for END blocks that execute after all data has been read from files.
This is how we create your summary report. awk also has BEGIN blocks that execute before any data has been read.
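As a toy illustration of both (using the same ttlTest.txt input as above): BEGIN runs before the first line is read, the middle rule runs once per matching line, and END runs after the last line.
awk '
BEGIN { print "scanning..." }            # runs before any input is read
/Time to live/ { ttl++ }                 # runs for each line matching the regex
END { print "saw " ttl+0 " TTL lines" }  # runs after all input is read
' ttlTest.txt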
Another idiom of awk scanning that allows for more complex patterns to be matched looks like:
awk '{
  if ( /Time to live/ && /User-Agent/ ) {
    ttl_agent++
  }
}' ttlTest.txt
Where the first and last { } block-definition characters indicate that this logic will be applied to each line that is read from the data. This block can be quite complex and can use other variable values inside the if test, like if ( var == 5 ) { print "found var=5" }.
IHTH
