creating a script which finds two alternating patterns - shell

So my issue is that I need to make a script which finds a pattern where 'Time to live' and 'User-Agent' occur in that order, and then increments a count (or grabs whatever data I want, etc.; it will likely evolve from there).
For example:
Time to live: 64
Some other data: ________
...
User-Agent: Mozilla/Chrome/IE:Windows/Unix/Mac
So basically the data appears in that order, TTL then User-Agent. From that information I can grab the data I want, but I don't know how to build the pattern to identify this. If it helps, I'm getting this data from a Wireshark capture saved as a text file.
Thanks to Shellter I got to the point where I have:
egrep ' User-Agent:| Time to live:' ../*.txt
which tells me whether both (TTL and UA) appear in the file.
I'd appreciate any assistance.
Fragment offset: 0
Time to live: 128
Protocol: TCP (6)
Header checksum: 0x7e4d [correct]
[Good: True]
[Bad: False]
Source: 1.1.1.3 (1.1.1.3)
Destination: 1.1.1.4 (1.1.1.4)
//packet 2
Fragment offset: 0
Time to live: 128
Protocol: TCP (6)
Hypertext Transfer Protocol
GET / HTTP/1.1\r\n
[Expert Info (Chat/Sequence): GET / HTTP/1.1\r\n]
[Message: GET / HTTP/1.1\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Method: GET
Request URI: /
Request Version: HTTP/1.1
Host: mail.yahoo.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
I apologize for the slow reply, I had to do some editing.
So basically I just need to identify when only a TTL occurs, and when a TTL occurs together with User-Agent data; I use this to identify clients behind a gateway.
So if the expected TTL is 126 (Windows) and I see 125, we assume the host is behind a gateway and count++.
If we get that same count but with a different user-agent and the same OS, the count doesn't change.
If we get that same count but with a different user-agent and a different OS, count++.
so output could be as simple as:
1 (ttl)
1 (ttl+os)
2 (ttl+os+ua)
from the example (not the data) above.

It's still a little unclear what you're looking to report, but maybe this will help.
We're going to use awk as that tool was designed to solve problems of this nature (among many others).
And while my output doesn't match your output exactly, I think the code is self-documenting enough that you can work with this, and make a closer approximation to your final need. Feel free to update your question with your new code, new output, and preferably an exact example of the output you hope to achieve.
awk '
/Time to live/{ttl++}
/User-Agent/{agent++}
/Windows|Linux|Solaris/{os++}
END{print "ttl="ttl; print "os="os; print"agent="agent}
' ttlTest.txt
output
ttl=2
os=1
agent=1
The key thing to understand is that awk (and most Unix based reg-ex utilities, grep included) read each line of input and decide if it will print (or do something else) with the current line of data.
awk will normally apply its action to every line of input if you give it something like
awk '{print $1}' file
in this example, printing just the first field from each line of data.
In the solution above, we're filtering the data with regular expressions and then applying an action when we have matched some data, i.e.
/Time to live/ { ttl++ }
|              | |     |
|              | |     > block end
|              | > action (in this case, increment the value of the ttl var)
|              > block begin
> /regex to match/
So we have 2 other 'regular expressions' that we're scanning each line for, and every time we match that regular expression, we increment the related variable.
Finally, awk allows for END blocks that execute after all data has been read from files.
This is how we create your summary report. awk also has BEGIN blocks that execute before any data has been read.
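As a small sketch of BEGIN and END in action (assuming the field layout in the sample data above, where the TTL value is the last field on its line), this variant also pairs each User-Agent with the most recent TTL seen before it:

```shell
# BEGIN runs before any input is read, END after all input is read.
awk '
BEGIN { pairs = 0 }
/Time to live/ { ttl = $NF }    # remember the last TTL value seen
/User-Agent/   { pairs++ }      # a UA following a TTL: count the pair
END   { print "pairs=" pairs " last_ttl=" ttl }
' ttlTest.txt
```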
Another idiom of awk scanning that allows for more complex patterns to be matched looks like
awk '{
    if ( /Time to live/ && /User-Agent/ ) {
        ttl_agent++
    }
}' ttlTest.txt
Where the first and last { } block-definition characters indicate that this logic will be applied to each line read from the data. This block can be quite complex, and can use variable values inside the if test, like if ( var == 5 ) { print "found var=5" } (note the == comparison; a single = would be an assignment).
IHTH

Related

How do I test the speed between my site and a proxy server?

I'm getting complaints from employees in the field that our site is slow. When I check it -- the speed is acceptable. They are all going through a proxy server that is not controlled by me.
I'd like to run a continuous ping to the proxy server, but I haven't found anything to do that.
How do I check the speed from my site to a proxy server?
You can set up a cronjob to ping a site of your choice, at the frequency you choose. Here I ping google.com every 15 minutes. I can adjust the number of times I ping with the flag -c count and the time between pings with -i interval. This time is in seconds, I can use shorter intervals if required, for example 0.5.
I then pipe to tail -n to only use the last line with the results. At this stage my output is as follows:
rtt min/avg/max/mdev = 12.771/17.448/23.203/4.022 ms
We then use awk to only take the 4th field and use tr to replace the slashes with commas. Finally we store the result in a CSV file.
Here is the whole line in crontab:
*/15 * * * * ping -c 5 -i 1 google.com | tail -n 1 | awk '{ print $4 }' | tr "/" "," >> /home/john/pingLog.csv
It is important to run this as root. To do so we edit the crontab using sudo:
sudo crontab -e
The end result is a comma separated file that you can open in Excel or equivalent, or process as you wish.
As noted in the ping output the 4 figures are min/avg/max/mdev.
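Once the CSV has accumulated some rows, a quick awk pass can summarise it (a sketch: the path matches the crontab line above, and field 2 is the avg figure):

```shell
# Average the 'avg' column (2nd comma-separated field) across all samples.
awk -F, '{ sum += $2; n++ } END { if (n) printf "mean avg rtt: %.3f ms\n", sum/n }' /home/john/pingLog.csv
```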
Here is a version for Windows. The result is not as refined as in the Linux version but we're still getting the essentials. You could put it in a .bat file and run it with a scheduled task, or put it directly in the scheduled task.
ping google.com | findstr Minimum >> TotalPings.txt
Which adds the following line every time it is run:
Minimum = 23ms, Maximum = 23ms, Moyenne = 23ms
You can change the server pinged to suit your needs.

Extract a string that is located above and nearest to the matching pattern in a multiline output

Below is the HP ssacli command to see configured hardware RAID details:
ssacli ctrl slot=0 show config
and its output is as below:
HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
Internal Drive Cage at Port 1I, Box 1, OK
Internal Drive Cage at Port 2I, Box 0, OK
Port Name: 1I (Mixed)
Port Name: 2I (Mixed)
Array A (Solid State SAS, Unused Space: 0 MB)
logicaldrive 1 (447.10 GB, RAID 1, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS SSD, 480 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS SSD, 480 GB, OK)
SEP (Vendor ID HPE, Model Smart Adapter) 379 (Port: Unknown)
I have to figure out the Array name in order to delete it, by searching for the matching disk info which I get as input from the user. For example, if the disk input is 1I:1:1, I have to search for this string in the output of the above command. Since this disk is present and matches, I have to extract the Array name (here it is A); once I have this Array parameter I can go ahead and delete the existing RAID configuration.
ssacli ctrl slot=0 show config | grep -B 4 '1I:1:1' | grep Array | awk '{print $2}'
The problems with the above command are:
- the value 4 in grep -B cannot always be constant, as the matching disk may come first, second, third, or so on under an Array in the output;
- there may be multiple RAID array configurations in the output (Array A, B, C, etc.), so I have to find and retrieve the nearest Array string above my input disk.
I think your requirement can be solved with a single invocation of awk. You pass the disk name in as a variable, and store the most recent array name as you go through the lines. When you match the disk name, you print the array you just stored. So pipe your command output to
| awk -v disk="1I:1:1" '/^[[:space:]]*Array/{ array=$2; } $0 ~ disk { print array; exit }'
This answer assumes that the array names do not contain spaces or else it would be broken and would print the first part of the array name only.
You can process the file from the end:
tac infile \
| awk -v input='1I:1:1' '$0 ~ input {flag=1} flag && /Array/ {print $2; exit}'
This sets a flag when encountering a line matching the user input; after that, if a line matches Array and the flag is set, the second field is printed.

How to verify AB responses?

Is there a way to make sure that AB gets proper responses from server? For example:
To force it to output the response of a single request to STDOUT OR
To ask it to check that some text fragment is included into the response body
I want to make sure that authentication worked properly and i am measuring response time of the target page, not the login form.
Currently I just replace ab -n 100 -c 1 -C "$MY_COOKIE" $MY_REQUEST with curl -b "$MY_COOKIE" $MY_REQUEST | lynx -stdin .
If it's not possible, is there an alternative more comprehensive tool that can do that?
You can use the -v option as listed in the man doc:
-v verbosity
Set verbosity level - 4 and above prints information on headers, 3 and above prints response codes (404, 200, etc.), 2 and above prints warnings and info.
https://httpd.apache.org/docs/2.4/programs/ab.html
So it would be:
ab -n 100 -c 1 -C "$MY_COOKIE" -v 4 $MY_REQUEST
This will spit out the response headers and HTML content. The 3 value will be enough to check for a redirect header.
I didn't try piping it to Lynx but grep worked fine.
Apache Benchmark is good for a cursory glance at your system but is not very sophisticated. I am currently attempting to tune a web service and am finding that AB does not measure complete response time when considering the transfer of the body. Also as you mention you can not verify what is returned.
My current recommendation is Apache JMeter. http://jmeter.apache.org/
I am having much better success with it. You may find the Response Assertion useful for your situation. http://jmeter.apache.org/usermanual/component_reference.html#Response_Assertion

How to resume reading a file?

I'm trying to find the best and most efficient way to resume reading a file from a given point.
The given file is being written frequently (this is a log file).
This file is rotated on a daily basis.
In the log file I'm looking for the pattern 'slow transaction'. Such lines end with a number in parentheses, and I want the sum of those numbers.
Example of log line:
Jun 24 2015 10:00:00 slow transaction (5)
Jun 24 2015 10:00:06 slow transaction (1)
That is the easy part, which I could do with an awk command to get a total of 6 with the above example.
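Something like this awk one-liner handles that easy part (a sketch, assuming the log line format shown above):

```shell
# Strip the parentheses off the last field of each matching line and sum.
awk '/slow transaction/ { n = $NF; gsub(/[()]/, "", n); sum += n }
     END { print sum }' /path/to/logfile
```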
Now my challenge is that I want to get the values from this file on a regular basis. I've an external system that polls a custom OID using SNMP. When hitting this OID the Linux host runs a couple of basic commands.
I want this SNMP polling event to get the number of events since the last polling only. I don't want to have the total every time, just the total of the newly added lines.
Just to mention that only bash can be used, or basic commands such as awk sed tail etc. No perl or advanced programming language.
I hope my description is clear enough. Apologies if this is a duplicate; I did some research before posting but did not find something that precisely corresponds to my need.
Thank you for any assistance
In addition to the methods in the comment link, you can also simply use dd and stat to read the logfile size, save it, sleep 300, then check the logfile size again. If the filesize has changed, skip over the old information with dd and read the new information only.
Note: you can add a test to handle the case where the logfile is deleted and then restarted at size 0 (e.g. if ((newsize < size)), then read the whole file).
Here is a short example with 5 minute intervals:
#!/bin/bash

lfn=${1:-/path/to/logfile}

size=$(stat -c "%s" "$lfn")    ## save original log size

while :; do
    newsize=$(stat -c "%s" "$lfn")    ## get new log size
    if ((size != newsize)); then      ## if changed, use new info
        ## use dd to skip over existing text to new text
        newtext=$(dd if="$lfn" bs="$size" skip=1 2>/dev/null)
        ## process newtext however you need
        printf "\nnewtext:\n\n%s\n" "$newtext"
        size=$newsize                 ## update size to newsize
    fi
    sleep 300
done
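And a sketch of the rotation test mentioned in the note above: if the file shrank since the last check, assume it was rotated and read it from the beginning rather than skipping.

```shell
#!/bin/bash
lfn=${1:-/path/to/logfile}
size=$(stat -c "%s" "$lfn")
while :; do
    newsize=$(stat -c "%s" "$lfn")
    if ((newsize < size)); then
        newtext=$(cat "$lfn")         ## rotated/truncated: read it all
    elif ((newsize > size)); then
        ## unchanged logic: skip the first $size bytes, read the rest
        newtext=$(dd if="$lfn" bs="$size" skip=1 2>/dev/null)
    else
        newtext=""
    fi
    [ -n "$newtext" ] && printf "\nnewtext:\n\n%s\n" "$newtext"
    size=$newsize
    sleep 300
done
```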

processing a pcap/dmp file for time-to-live, user-agent, and OS

I'm trying to generate a report of the number of clients/devices behind a given NAT gateway using the techniques discussed in this paper.
Basically I need to write a script which looks for both 'User-Agent' and 'Time to live' at the same time:
grep " User-Agent:" *.txt
grep " Time to live:" *.txt
Those are exactly how the lines are formatted in my output files and I'm happy having the text to the end of the line. They work separately but I haven't been successful in combining them.
My most recent attempts have been:
egrep -w ' User-Agent:'|' Time to live:' ../*.txt
grep ' User-Agent:' ../*.txt' && grep ' Time to live:' ../*.txt
(I've been manually exporting text format files from Wireshark, if anyone has a suggestion for doing that via script I would be most grateful, I have a HUGE number of files to do this for.)
I looked for a similar thread but didn't find one; if one already exists (as I expect) I apologize. Whether someone can supply a link to assistance or provide it directly, I would be most grateful.
EDIT: I thought I should mention, the two phrases I'm looking for are on lines separated by other data so a solution would need to search for both in an example like so:
User-Agent:
blahblahblah:
halbhalbhalb:
Time to live:
egrep ' User-Agent:| Time to live:' ../*.txt gives me:
desktop:~/Documents/scripts$ ./pcap_ttl_OS_useragent
../scripttextfile1p.txt: Time to live: 128
../scripttextfile1p.txt: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
../scripttextfile2p.txt: Time to live: 55
../scripttextfile3p.txt: Time to live: 128
../scripttextfile3p.txt: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
egrep ' User-Agent:| Time to live:' ../*.txt
should work.
I don't think the -w is getting you any "functionality".
Also, you want to quote the whole "extended" regular expression, including the | alternation character, as one string.
Finally, it's not clear whether the leading white space for each field is the result of a tab char or a group of spaces. That would affect the correct text string to put into the search patterns. To confirm the white-space type, I like to use
grep 'User-Agent' ../*.txt | head -1 | cat -vet
which will show either
..... User-Agent ....
OR
.....^IUser-Agent .....
The ^I is the representation of the tab character.
IHTH
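Once the spacing question is settled, a single awk pass can also report the two fields as pairs rather than as separate grep hits (a sketch, pairing each User-Agent with the TTL that preceded it):

```shell
awk '
/Time to live:/ { ttl = $NF }                          # remember the last TTL
/User-Agent:/   { print FILENAME ": TTL=" ttl, $0 }    # pair it with the UA line
' ../*.txt
```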
