processing a pcap/dmp file for time-to-live, user-agent, and OS - shell

I'm trying to generate a report of the number of clients/devices behind a given NAT gateway using the techniques discussed in this paper.
Basically I need to write a script which looks for both 'User-Agent' and 'Time to live' at the same time:
grep " User-Agent:" *.txt
grep " Time to live:" *.txt
Those are exactly how the lines are formatted in my output files and I'm happy having the text to the end of the line. They work separately but I haven't been successful in combining them.
My most recent attempts have been:
egrep -w ' User-Agent:'|' Time to live:' ../*.txt
grep ' User-Agent:' ../*.txt' && grep ' Time to live:' ../*.txt
(I've been manually exporting text format files from Wireshark, if anyone has a suggestion for doing that via script I would be most grateful, I have a HUGE number of files to do this for.)
I looked for a similar thread but didn't find one; if one already exists (as I suspect) I apologize. Whether someone can supply a link to assistance or provide it here, I would be most grateful.
EDIT: I thought I should mention, the two phrases I'm looking for are on lines separated by other data so a solution would need to search for both in an example like so:
User-Agent:
blahblahblah:
halbhalbhalb:
Time to live:
egrep ' User-Agent:| Time to live:' ../*.txt gives me:
desktop:~/Documents/scripts$ ./pcap_ttl_OS_useragent
../scripttextfile1p.txt: Time to live: 128
../scripttextfile1p.txt: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
../scripttextfile2p.txt: Time to live: 55
../scripttextfile3p.txt: Time to live: 128
../scripttextfile3p.txt: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n

egrep ' User-Agent:| Time to live:' ../*.txt
should work.
I don't think the -w is getting you any "functionality".
Also, you want to quote the whole "extended" regular expression, including the | alternation character, as one string.
Finally, it's not clear whether the leading white space for each field is a tab character or a group of spaces. That affects the correct text string to put into the search patterns. To confirm the white-space type, I like to use
grep 'User-Agent' ../*.txt | head -1 | cat -vet
which will show either
..... User-Agent ....
OR
.....^IUser-Agent .....
The ^I being the representation for the tab character.
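An equivalent way to pass both patterns is plain grep with two -e options. A sketch on a synthetic two-line sample (the file name and contents below are made up for illustration); cat -A is the GNU shorthand for -vet, showing tabs as ^I and line ends as $:

```shell
# Synthetic sample resembling two lines of a Wireshark text export.
printf '    Time to live: 128\n    User-Agent: Mozilla/5.0\n' > sample.txt

# Two -e options behave like the egrep alternation: either pattern matches.
matches=$(grep -e 'User-Agent:' -e 'Time to live:' sample.txt)
printf '%s\n' "$matches"

# cat -A (GNU shorthand for -vet) makes the leading white space visible.
cat -A sample.txt
```

Here both lines match, and cat -A shows whether the indentation is spaces or a tab.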
IHTH

Related

Get link to an image on google image Wget

I am currently trying to make a Wallpaper randomiser.
The rule is to take the 9th Google Images result for a randomly selected word and set it as the wallpaper. I am doing it in bash.
But when I wget the Google page, the usual href for these links disappears and gets replaced (without the -k option they are replaced by a #; with it, by something I can't read).
Here is my command:
wget -q -p -k --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" -e robots=off $address
where $address is:
address="https://www.google.fr/search?q=wallpaper+$word&safe=off&biw=1920&bih=880&tbs=isz:ex,iszw:1920,iszh:1080&tbm=isch&source=lnt"
the link that I want to obtain is like
href="/imgres/imgurl="<Paste here an url image>"
I have some new information.
In fact Google seems to build these URLs with JavaScript and other client-side technologies, so I would need a wget-like tool that interprets JavaScript first. Does anyone know of one?

Xpath expression returns empty output

My xidel command is the following:
xidel "https://www.iec-iab.be/nl/contactgegevens/c360afae-29a4-dd11-96ed-005056bd424d" -e '//div[@class="consulentdetail"]'
This should extract all the data in the divs with class consulentdetail.
Nothing special, I thought, but it won't print anything.
Can anyone help me find my mistake?
//EDIT: When I use the same expression in Firefox it finds the desired tags
The site you are connecting to evidently checks the User-Agent string and delivers different pages depending on which one it receives.
If you instruct xidel to send a user agent string, impersonating as e.g. Firefox on Windows 10, your query starts to work:
> ./xidel --silent --user-agent="Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0" "http://www.iec-iab.be/nl/contactgegevens/c360afae-29a4-dd11-96ed-005056bd424d" -e '//div[@class="consulentdetail"]'
Lidnummer11484 2 N 73
TitelAccountant, Belastingconsulent
TaalNederlands
Accountant sinds4/04/2005
Belastingconsulent sinds4/04/2005
AdresStationsstraat 2419550 HERZELE
Telefoon+32 (53) 41.97.02
Fax+32 (53) 41.97.03
AdresStationsstraat 2419550 HERZELE
Telefoon+32 (53) 41.97.02
Fax+32 (53) 41.97.03
GSM+32 (474) 29.00.67
Websitehttp://abbeloosschinkels.be
E-mail
<!--
document.write("");document.write(decrypt(unescCtrlCh("5yÿÃ^à(pñ_!13!­[îøû!13!5ãév¦Ãçj|°W"),"Iate1milrve%ster"));document.write("");
-->
As a rule of thumb, when doing Web scraping and getting weird results:
Check the page in a browser with Javascript disabled.
Send a user agent string simulating a Web browser.

Bash expr index command

I am trying to get the index position using Bash's expr index.
e.g.
$ echo `expr index "Info.out.2014-02-08:INFO|SID:sXfzRjbmKbwX7jyaW1sog7n|Browser[Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0]" Mozilla`
I am trying to get the index position of the word "Mozilla", and then get the substring using the index value.
The result I got back is 4. Is the period after "Info" causing the issue? How do I fix it?
I followed the Advanced Bash-Scripting Guide (www.tldp.org/LDP/abs/html/); see Table B-5, String Operations:
expr index "$string" $substring Numerical position in $string of first character in $substring* that matches [0 if no match, first character counts as position 1]
I tried with something simple, and it works.
I am running bash in cygwin.
$ ./bash --version
GNU bash, version 4.1.10(4)-release (i686-pc-cygwin)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Thanks.
In general, you shouldn't be using expr index unless you have a very good reason to.
For instance, let's say you want to get the browser name.
s="Info.out.2014-02-08:INFO|SID:sXfzRjbmKbwX7jyaW1sog7n|Browser[Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0]"
# strip everything up to and including the first instance of 'Browser['
browser="${s#*Browser[}"
# strip everything after the first ']', again, inclusive
browser="${browser%%]*}"
# ...and show the result...
echo "$browser"
This would return:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0
If you really do want to know how many characters precede Mozilla, well, you can do that too:
s="Info.out.2014-02-08:INFO|SID:sXfzRjbmKbwX7jyaW1sog7n|Browser[Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0]"
# strip everything after the first instance of 'Mozilla'
prefix=${s%%Mozilla*}
# count number of characters in the string
index=${#prefix}
# ...and show the result...
echo "$index"
This should return 61.
For the "why" and "how" of the above examples, see BashFAQ #73.
To split by | separators, by contrast, I'd personally choose to use read, as documented in BashFAQ #1:
s="Info.out.2014-02-08:INFO|SID:sXfzRjbmKbwX7jyaW1sog7n|Browser[Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0]"
IFS='|' read -r _ _ browser _
echo "$browser"
...which would emit...
Browser[Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0]
The expr index command searches through your first string looking for the first occurrence of any character from your second string. In this case, it is recognizing that the 'o' in 'Mozilla' matches the 4th character in "Info.out...".
Try using this as a test to see what happens. It will return 4, since 'd' (position 4) is the first character of the alphabet string that appears in "xyzd":
echo `expr index "abcdefghijklmnopqrstuvwxyz" xyzd`
This one should do what you want:
echo "Info.out.2014-02-08:INFO|SID:sXfzRjbmKbwX7jyaW1sog7n|Browser[Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0]" | grep -o -b Mozilla
The echo puts your string into stdout, so it can be piped into grep.
The -b prints the byte offset of the string shown.
The -o ensures that only the matching portion gets printed.
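If you want just the number (to feed into a substring expansion, say), the offset can be split off the grep -b output. A sketch on a short made-up string:

```shell
# Short sample string; 'Mozilla' starts at byte offset 5 (0-based).
s="abcd Mozilla/5.0"

# grep -b prefixes each match with its byte offset, e.g. "5:Mozilla";
# cut keeps only the number before the colon.
offset=$(printf '%s' "$s" | grep -o -b Mozilla | cut -d: -f1)
echo "$offset"
```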
GNU expr does not match a substring using index; rather, it looks for the first occurrence of any character from the second string in the first. Your example returns 4 because the 4th character of the string is "o", the first character in "Mozilla" that is found in "Info.out...".
There is no built-in function of this kind in either bash or expr, but you can indirectly get the index of a given substring by first removing the substring and everything after it from the original string, then computing the remaining length.
string="Info.out..."
substring=Mozilla
tmp=${string%%"$substring"*}
index=${#tmp}
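On a small concrete string (sample data, not the original log line), the same technique looks like this; the substring is quoted inside the expansion so any glob characters in it are taken literally:

```shell
# Find the 0-based offset of "Mozilla" without expr index.
string="abcd Mozilla/5.0"
substring="Mozilla"

# Remove the substring and everything after it, then measure what's left.
tmp=${string%%"$substring"*}   # -> "abcd "
index=${#tmp}                  # -> 5
echo "$index"
```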

creating a script which finds two alternating patterns

So my issue is that I need to make a script which finds a pattern where Time to live and User-Agent occur in that order and I increment a count (or grab what data I want, etc; it will likely evolve from there.)
For example:
Time to live: 64
Some other data: ________
...
User-Agent: Mozilla/Chrome/IE:Windows/Unix/Mac
So basically the data appears in that order, TTL then User-Agent. From that information I can grab the data I want, but I don't know how to identify the pattern. If it helps, I'm getting this data from a Wireshark capture saved as a text file.
Thanks to Shellter I got to the point where I have:
egrep ' User-Agent:| Time to live:' ../*.txt
which finds if both (TTL and UA) are in the file.
I'd appreciate any assistance.
Fragment offset: 0
Time to live: 128
Protocol: TCP (6)
Header checksum: 0x7e4d [correct]
[Good: True]
[Bad: False]
Source: 1.1.1.3 (1.1.1.3)
Destination: 1.1.1.4 (1.1.1.4)
//packet 2
Fragment offset: 0
Time to live: 128
Protocol: TCP (6)
Hypertext Transfer Protocol
GET / HTTP/1.1\r\n
[Expert Info (Chat/Sequence): GET / HTTP/1.1\r\n]
[Message: GET / HTTP/1.1\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Method: GET
Request URI: /
Request Version: HTTP/1.1
Host: mail.yahoo.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
I apologize for the slow reply, I had to do some editing.
So basically I just need to identify when a TTL only occurs, when a TTL occurs and there's user-agent data; basically I use this to identify clients behind a gateway.
So if TTL is 126 (windows) and I see 125, we assume it's behind a gateway and count++.
If we get that same count but with a different user-agent but same OS, count doesn't change.
If we get that same count but with a different user-agent and OS, count++.
so output could be as simple as:
1 (ttl)
1 (ttl+os)
2 (ttl+os+ua)
from the example (not the data) above.
It's still a little unclear what you're looking to report, but maybe this will help.
We're going to use awk as that tool was designed to solve problems of this nature (among many others).
And while my output doesn't match your output exactly, I think the code is self-documenting enough that you can work with this, and make a closer approximation to your final need. Feel free to update your question with your new code, new output, and preferably an exact example of the output you hope to achieve.
awk '
/Time to live/{ttl++}
/User-Agent/{agent++}
/Windows|Linux|Solaris/{os++}
END{print "ttl="ttl; print "os="os; print"agent="agent}
' ttlTest.txt
output
ttl=2
os=1
agent=1
The key thing to understand is that awk (and most Unix based reg-ex utilities, grep included) read each line of input and decide if it will print (or do something else) with the current line of data.
awk normally will print every line of input if you give it something like
awk '{print $1}' file
In this example, printing just the first field from each line of data.
In the solution above, we're filtering the data with regular expressions and then applying an action when we match some data, i.e.
/Time to live/ { ttl++ }
|            | |       |
|            | |       > block end
|            | > action (in this case, increment the value of the ttl variable)
|            > block begin
> regex to match
So we have 2 other 'regular expressions' that we're scanning each line for, and every time we match that regular expression, we increment the related variable.
Finally, awk allows for END blocks that execute after all data has been read from files.
This is how we create your summary report. awk also has BEGIN blocks that execute before any data has been read.
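A minimal illustration of both blocks, run on two lines of throwaway input:

```shell
# BEGIN runs before any input line is read, END after the last one;
# the middle block runs once per line, here just counting them.
result=$(printf 'a\nb\n' | awk 'BEGIN { print "start" } { n++ } END { print "lines=" n }')
printf '%s\n' "$result"
```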
Another idiom of awk scanning, which allows for more complex patterns to be matched, looks like
awk '{
if ( /Time to live/ && /User-Agent/ ) {
ttl_agent++
}
}' ttlTest.txt
Where the outermost { } block-definition characters indicate that this logic will be applied to each line read from the data. This block can be quite complex, and can evaluate other variable values inside the if test, like if ( var == 5 ) { print "found var=5" }.
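To count the TTL-then-User-Agent ordering the question asks about, a small state machine is enough: remember that a 'Time to live' line was seen, and only count when a 'User-Agent' line follows it. A sketch on synthetic input (the sample below is made up; adapt the patterns to your real export):

```shell
# Synthetic test input: two TTL lines, only one followed by a User-Agent.
cat > ttlTest.txt <<'EOF'
Time to live: 128
Protocol: TCP (6)
User-Agent: Mozilla/5.0 (Windows)
Time to live: 55
EOF

pairs=$(awk '
  /Time to live:/ { have_ttl = 1; next }               # remember we saw a TTL line
  have_ttl && /User-Agent:/ { pairs++; have_ttl = 0 }  # count the pair once
  END { print pairs + 0 }                              # +0 prints 0 if no pairs
' ttlTest.txt)
echo "$pairs"
```

The second TTL line never gets a matching User-Agent, so only one pair is counted.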
IHTH

BASH- trouble pinging from text file lines

Have a text file with around 3 million URLs of sites I want to block.
Trying to ping them one by one (yes, I know it is going to take some time).
Have a script (yes, I am a bit slow in BASH) which reads the lines one at a time from text file.
Obviously I can't post the text file here; it was created with Python (via >> appends) some time ago.
The problem is that ping returns "unknown host" for every entry. If I make a smaller file by hand using the same entries, the script works. I thought it might be a white-space or end-of-line issue, so I tried addressing that in the script. What could the issue possibly be?
#!/bin/bash
while read line
do
li=$(echo $line|tr -d '\n')
li2=$(echo $li|tr -d ' ')
if [ ${#line} -lt 2 ]
then
continue
fi
ping -c 2 -- $li2>>/dev/null
if [ $? -gt 0 ]
then
echo 'bad'
else
echo 'good'
fi
done<'temp_file.txt'
Does the file contain URLs or hostnames?
If it contains URLs you must extract the hostname from URLs before pinging:
hostname=$(echo "$li2"|cut -d/ -f3);
ping -c 2 -- "$hostname"
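For example (URL made up), the cut picks the third /-separated field, which for an http(s) URL is the host part:

```shell
# "https://www.google.com/search?q=x" splits on "/" into:
#   f1="https:"  f2=""  f3="www.google.com"  f4="search?q=x"
url="https://www.google.com/search?q=x"
hostname=$(printf '%s\n' "$url" | cut -d/ -f3)
echo "$hostname"
```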
Ping is used to ping hosts. If you have website URLs, it will not work. Check that your file contains hosts, for example www.google.com or an IP address, and not full website URLs. If you want to check actual URLs, use a tool like wget, plus grep/awk to look for errors like 404. Last but not least, security-conscious people will sometimes block pinging from the outside, so take note.
Check if the file contains Windows-style \r\n line endings: head file | od -c
If so, fix it with: dos2unix filename
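A sketch of both steps on a generated file (if dos2unix isn't installed, tr -d '\r' does the same job):

```shell
# Fabricate a one-line file with Windows line endings.
printf 'www.example.com\r\n' > crlf_test.txt

# od -c renders the stray \r before the \n if it is there.
head crlf_test.txt | od -c | head -1

# Strip every carriage return; the cleaned name then resolves/pings normally.
tr -d '\r' < crlf_test.txt > crlf_clean.txt
host_line=$(head -1 crlf_clean.txt)
echo "$host_line"
```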
I wouldn't use ping for this. It can easily be blocked, and it's not the best way to check for either ip addresses or if a server presents web pages.
If you just want to find the corresponding IP, use host:
$ host www.google.com
www.google.com is an alias for www.l.google.com.
www.l.google.com has address 209.85.149.106
www.l.google.com has address 209.85.149.147
www.l.google.com has address 209.85.149.99
www.l.google.com has address 209.85.149.103
www.l.google.com has address 209.85.149.104
www.l.google.com has address 209.85.149.105
As you can see, you get all the IPs registered to a host. (Note that this requires you to parse the hostname from your URLs!)
If you want to see if a URL points at a web server, use wget:
wget --spider $url
The --spider flag makes wget check that the page exists without saving it. You can look at the return code, or add the -S flag (which prints the returned HTTP headers).
