Capturing Data from Tshark - ruby

Tshark is a command-line packet sniffer. I am trying to find a way to get information from the packets, put it in a variable, and run regular expressions against it.
Right now, I am getting this from tshark:
Capturing on eth0
0.000000 74.125.71.116 -> 112.204.184.111 TCP http > 55828 [ACK] Seq=1 Ack=1 Win=6434 Len=0 TSV=2558834852 TSER=542043
0.000035 112.204.184.111 -> 74.125.71.116 HTTP Continuation or non-HTTP traffic
0.000043 112.204.184.111 -> 74.125.71.116 HTTP Continuation or non-HTTP traffic
Note: I am using Ruby.

You can use tshark itself without another utility. This command prints out all URIs from packets as they arrive:
$ tshark -R http.request.full_uri -T fields -e http.request.full_uri -i en0
You can refine the display filter (the -R parameter; recent tshark releases use -Y for display filters instead) to better match your requirements. It even supports Perl regular expression matching:
# Mac OS X
$ tshark -R 'http.request.full_uri matches "\\.jpg\|\\.js"' -T fields -e http.request.full_uri -i en0
Example output from visiting youtube.com:
$ tshark -R 'http.request.full_uri matches "\\.jpg\|\\.js"' -T fields -e http.request.full_uri -i en0
Capturing on en0
http://s.ytimg.com/yt/jsbin/www-core-vfl3_mVgh.js
http://s.ytimg.com/yt/jsbin/www-subscriptions-vfl5HwfxW.js
http://i2.ytimg.com/i/QMbqH7xJu5aTAPQ9y_U7WQ/1.jpg?v=95416b
http://i1.ytimg.com/vi/4R0BAjrZqyY/default.jpg
http://i4.ytimg.com/i/KVtW8ExxO21F2sNLtwrq_w/1.jpg?v=a1fa0c
http://i3.ytimg.com/vi/z3U0udLH974/default.jpg
http://i2.ytimg.com/vi/arKyyDRsE_8/default.jpg
http://i2.ytimg.com/vi/y1TGz-fEyiE/default.jpg
http://i2.ytimg.com/vi/-tc983PZK3o/default.jpg
http://i2.ytimg.com/vi/1yT2rrTyMK8/default.jpg
http://i4.ytimg.com/vi/cciUXpITsu0/default.jpg
http://i2.ytimg.com/vi/uG0dimAxHpI/default.jpg
http://i2.ytimg.com/vi/eP9P50kbzTk/default.jpg
http://i1.ytimg.com/vi/ppBe0T412uU/default.jpg
http://i1.ytimg.com/vi/8360wVLtEuk/default.jpg
http://i4.ytimg.com/vi/G_yB7wdTxa0/default.jpg
http://i4.ytimg.com/vi/gcZxoLs3NIU/default.jpg
http://i1.ytimg.com/i/po2fJvnalYlwN97ehhyfBQ/1.jpg?v=b8e52a
http://i1.ytimg.com/vi/D2Xjj_ra8lQ/default.jpg
http://i1.ytimg.com/vi/PewewGu9gp8/default.jpg
http://i1.ytimg.com/vi/P9FkRD6ppGo/default.jpg
http://i3.ytimg.com/vi/vpZ4SMU4znQ/default.jpg
http://i3.ytimg.com/vi/jrrSGulNOLc/default.jpg
http://i3.ytimg.com/vi/FJtTzQfdnoQ/default.jpg
http://i3.ytimg.com/vi/68sEHPpQXes/default.jpg
http://i2.ytimg.com/vi/iWYqsaJk_U8/default.jpg
http://i4.ytimg.com/vi/7Prb8DbdfwY/default.jpg
http://i1.ytimg.com/vi/HJFlxLJSX8E/default.jpg
http://i1.ytimg.com/vi/ta6Vu_v7VLg/default.jpg
http://i1.ytimg.com/vi/Hq7NtDSIErE/default.jpg
http://i4.ytimg.com/vi/Sjdj7qhcTuw/default.jpg
http://i3.ytimg.com/vi/Nm3Acf3_oMY/default.jpg
http://i3.ytimg.com/vi/BpsrThXh_gM/default.jpg
http://i3.ytimg.com/vi/Z3yapgewktY/default.jpg
http://i3.ytimg.com/vi/2UFc1pr2yUU/default.jpg
http://i2.ytimg.com/vi/q_Bt6NwD4FY/default.jpg
http://i2.ytimg.com/vi/uTAAlzABzBA/default.jpg
http://i2.ytimg.com/vi/iRLUY6dMF8k/default.jpg
http://i2.ytimg.com/vi/-cDH6CYzTAw/default.jpg
http://i1.ytimg.com/vi/8p6Fn8R1Rc4/default.jpg
http://i1.ytimg.com/vi/T8gDQWdlW6A/default.jpg
http://i2.ytimg.com/vi/ERTcZV7uTFU/default.jpg
http://i1.ytimg.com/vi/PyxgwA6PvnI/default.jpg
http://i1.ytimg.com/vi/xUGlezOCvu4/default.jpg
http://i1.ytimg.com/vi/Ljb6Mne8Mfc/default.jpg
Note: On Windows, I've seen tshark print all URIs in a particular packet on one line without delimiters (e.g., "http://www.google.comhttp://www.google.com/logos/classicplus.png"). Only some packets were affected by this.

You could either pipe this data into a file which you then open and parse with Ruby, or you could use a Ruby lib that can access the same data, such as: http://sourceforge.net/apps/trac/rubypcap/
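If you go the parse-it-in-Ruby route, a minimal sketch could look like the following. The names PACKET_LINE, parse_packet_line and each_packet are made up for illustration, and the pattern assumes the "time src -> dst PROTO info" layout shown in the question (other tshark versions or column settings may print a different layout, so treat the regular expression as a starting point).

```ruby
# Sketch: parse tshark's default one-line packet summaries.
# PACKET_LINE and parse_packet_line are hypothetical names, and the pattern
# assumes the "time src -> dst PROTO info" layout shown in the question.
PACKET_LINE = /\A\s*(?<time>\d+\.\d+)\s+(?<src>\S+)\s+->\s+(?<dst>\S+)\s+(?<proto>\S+)\s+(?<info>.*)\z/

def parse_packet_line(line)
  m = PACKET_LINE.match(line.chomp)
  return nil unless m # banner lines such as "Capturing on eth0" do not match

  { time: m[:time].to_f, src: m[:src], dst: m[:dst], proto: m[:proto], info: m[:info] }
end

# Yields one hash per parsable line read from any IO-like object
# (a File, a StringIO, or a pipe opened with IO.popen).
def each_packet(io)
  io.each_line do |line|
    packet = parse_packet_line(line)
    yield packet if packet
  end
end
```

For a live capture, the io could be a pipe such as IO.popen(["tshark", "-l", "-i", "eth0"]), where -l asks tshark to flush its output after each packet. Inside the block you can then run whatever regular expression you like against packet[:info].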

Related

tcpdump: Using AND and OR in a compound filter

I'm trying to add a filter to a tcpdump stream.
The expression I'm trying to run is:
tcpdump -i eth0 -U -w - host 192.168.2.29 and (port 22222 or port 22221 or port 80)
This particular format throws:
bash: syntax error near unexpected token '('
I expected this to work based on THIS.
The following work without throwing an error:
a) tcpdump -i eth0 -U -w - host 192.168.2.29
b) tcpdump -i eth0 -U -w - port 22222
I've tried every permutation of grouping, all throwing the same error.
Summarizing the comments for an answer:
The easiest way to deal with the tcpdump expression is to put it all in quotes, because otherwise the shell gets in the way anytime there are special characters. Parentheses are the most common troublesome metacharacters, but many others get to play as well: [ ] & and others, and anytime you refine your expression you have to check that you didn't add something dangerous.
So quotes are the easy way:
tcpdump -i eth0 -U -w - 'host 192.168.2.29 and (port 22222 or port 22221 or port 80)'
But escaping the metacharacters works too and is directly responsive to the OP's question:
tcpdump -i eth0 -U -w - host 192.168.2.29 and \(port 22222 or port 22221 or port 80\)
Personally, I prefer the quotes.
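The same quoting concern shows up if you launch tcpdump from a script rather than typing it. As a rough sketch in Ruby (the variable names are only illustrative), you can keep the whole filter as a single argument and let Shellwords do the escaping whenever a shell string is needed:

```ruby
require "shellwords"

# The filter stays one argument, so the parentheses never reach the shell unescaped.
filter = "host 192.168.2.29 and (port 22222 or port 22221 or port 80)"
argv = ["tcpdump", "-i", "eth0", "-U", "-w", "-", filter]

# Shellwords.join backslash-escapes each element, giving a string that is
# safe to paste into bash or hand to a shell.
command_line = Shellwords.join(argv)
puts command_line
```

Passing argv directly to something like IO.popen(argv) avoids the shell altogether, which sidesteps the problem in the same way the quotes do.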

Running tcpdump inside bash script

I am trying to get some numbers from tcpdump inside a shell script and print that number.
Here is my script
while true
do
{
b=`tcpdump -n -i eth1 | awk -F'[, ]' '{print $10}'`
echo $b
}
done
When I execute this script, I get this
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
Is there anything special I need to do to capture tcpdump output inside a shell script?
By default, tcpdump runs forever (or until it's interrupted by Control-C or something similar). The
b=`tcpdump ...`
construct runs until tcpdump exits... which is never ... and then puts its output into $b. If you want to capture the output from a single packet, you can use tcpdump -c1 ... (or -c5 to capture groups of 5, or similar). Alternately, you could let it run forever but capture its output one line at a time with a while read loop (although you need to use tcpdump -l to prevent excessive buffering):
tcpdump -l -n -i eth1 | awk -F'[, ]' '{print $10}' | while read b; do
echo $b
done
I'm not entirely sure what your script is supposed to do, but I see some other issues. First, unless your version of tcpdump is much more consistent than mine, printing the 10th comma- or space-delimited field of each packet will not get you anything meaningful. Here's some sample output from my computer:
00:05:02:ac:54:1e
1282820004:1282820094
90
73487384:73487474
1187212630:1187212720
90
90
host
2120673524
Second, what's the point of capturing the output into a variable, then printing it? Why not just run the command and let it output directly? Finally, echo $b may garble the output due to word splitting and wildcard expansion (for example, if $b happened to be "*", it would print a list of files in the current directory). For this reason, you should double-quote variables when you use them (in this case, echo "$b").
It's been a long time since this question was asked, but the simplest way to accomplish what the script was intended to do would be to record a pcap matching only the packets you're interested in seeing. As an example, to write a pcap file consisting only of packets where the ACK flag is set and the acknowledgment number is a value between 19000 and 20000:
tcpdump -c2500 -iany 'tcp[8:4]>=19000&&tcp[8:4]<=20000&&tcp[13]&16!=0' -Uw./TCP_ACKs.pcap

Grepping a tcpdump with tshark

I'm trying to program a little "dirty" website filter, e.g. for when a user wants to visit an erotic website (based on domain name).
So basically, I got something like
#!/bin/bash
sudo tshark -i any tcp port 80 or tcp port 443 -V | grep "Host.*keyword"
It works great, but now I need to take some actions after I find something (iptables and DROPing packets...). The problem is that the capture is still running. If I had a complete file with data, what I'm trying to achieve would be easy to solve.
In pseudocode, I'd like to have something like:
if (tshark and grep found something)
iptables - drop packets
sleep 600 # a punishment for a user
iptables accept packets I was dropping
else
still look for a match in the tcp dump that's still running
Thanks for your help.
Maybe you could try something like the following:
tshark OPTIONS 2>&1 | grep --line-buffered PATTERN | while read line; do
# actions for when the pattern is found, the matched input is in $line
break
done
The 2>&1 is important so that when PATTERN is matched and the while loop terminates, tshark has nowhere to write to and terminates because of the broken pipe.
If you want to keep tshark running and analyze future output, just remove the break. This way, the while loop never terminates and it keeps reading the filtered output from tshark.
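For comparison, the same "read until the pattern shows up, then act" idea can be sketched in Ruby. The helper name wait_for_match is hypothetical, and the io argument is assumed to be any line-oriented stream, for example a pipe opened on tshark with IO.popen:

```ruby
# Sketch: return the first line that matches pattern, or nil at end of input.
# Reading stops as soon as a match is found, like the `break` in the shell loop.
def wait_for_match(io, pattern)
  io.each_line do |line|
    return line.chomp if line.match?(pattern)
  end
  nil
end
```

The caller can run the iptables steps once this returns a line, then call it again on the same io to keep watching the rest of the capture.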

tcpdump - ignore unknown host error

I've got a tcpdump command running from a bash script. It looks something like this:
tcpdump -nttttAr /path/to/file -F /my/filter/file
The filter file has a combination of ip addresses and host names. i.e.
host 111.111.111.111 or host 112.112.112.112 and not (host abc.com or host def.com or host zyx.com).
And it works great - as long as the host names are all valid. My problem is sometimes these hostnames will not be valid and upon encountering one - tcpdump spits out
tcpdump: Unknown Host
I thought the -n option would skip the DNS lookup, but in any case I need it to ignore the unknown host and continue along the filter file.
Any ideas?
Thank you in advance.
The -n option prevents conversion of IP addresses into names, but not the other way around. If you supply a hostname as an argument, it has to be looked up to get the IP address since packets only contain the numeric address and not the hostname. However, there ought to be a way to ignore invalid hostnames, but I can't find one. Perhaps you could pre-process your filter file using dig.
dig +short non-existent-domain.com # prints nothing
dig +short google.com # returns multiple IP addresses
This could probably be better, but it should show you hostnames in your filter file that aren't valid:
grep -Po '(?<=host )[^ )]*' filterfile | grep -v '[0-9]$' | xargs -I % sh -c 'echo -n "% "; echo $(dig +short %)' | grep -v ' [0-9]'
Any hostnames it prints didn't have IP addresses returned by dig.
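If the script calling tcpdump can run Ruby, the same pre-check could be sketched with the standard resolv library instead of dig. The method names and the resolver: keyword below are made up for illustration, and the host-matching pattern only assumes the simple "host NAME" form used in the filter above:

```ruby
require "resolv"

# Pull the arguments of every "host ..." term out of the filter text,
# dropping the ones that are already dotted-quad IP addresses.
def filter_hostnames(filter_text)
  filter_text.scan(/\bhost\s+([^\s()]+)/).flatten
             .reject { |h| h.match?(/\A\d{1,3}(\.\d{1,3}){3}\z/) }
             .uniq
end

# Return the hostnames for which the resolver finds no address.
# The resolver is injectable so the check can be exercised without DNS.
def unresolvable_hosts(filter_text, resolver: ->(name) { Resolv.getaddresses(name) })
  filter_hostnames(filter_text).select { |name| resolver.call(name).empty? }
end
```

Running unresolvable_hosts(File.read("/my/filter/file")) before tcpdump would list the names to strip from the filter, under the assumption that an empty answer means the name is invalid.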

Sniffing and displaying TCP packets in UTF-8

I am trying to use tcpdump to display the content of tcp packets flowing on my network.
I have something like:
tcpdump -i wlan0 -l -A
The -A option displays the content as ASCII text, but my text seems to be UTF-8. Is there a way to display UTF-8 properly using tcpdump? Do you know any other tools which could help?
Many thanks
Make sure your terminal supports outputting UTF-8 and pipe the output to something which replaces non-printable characters:
tcpdump -lnpi lo tcp port 80 -s 16000 -w - | tr -t '[^[:print:]]' ''
tcpdump -lnpi lo tcp port 80 -s 16000 -w - | strings -e S -n 1
If your terminal does not support UTF-8, you have to convert the output to a supported encoding, e.g.:
tcpdump -lnpi lo tcp port 80 -s 16000 -w - | tr -t '[^[:print:]]' '' | iconv -c -f utf-8 -t cp1251
The -c option tells iconv to omit characters that have no valid representation in the target encoding.
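If the captured payload ends up in a Ruby script rather than a terminal, roughly the same conversion as iconv -c can be sketched with String#encode. The method name to_cp1251 is hypothetical, and the input is assumed to be raw bytes that are mostly UTF-8:

```ruby
# Sketch: treat raw bytes as UTF-8 and convert them to Windows-1251,
# silently dropping invalid byte sequences and characters that the
# target encoding cannot represent (similar in spirit to `iconv -c`).
def to_cp1251(bytes)
  bytes.dup.force_encoding("UTF-8")
       .encode("Windows-1251", invalid: :replace, undef: :replace, replace: "")
end
```

Swapping "Windows-1251" for whatever encoding your terminal uses should work the same way, as long as Ruby knows that encoding by name.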
tcpdump -i wlan0 -w packet.ppp
This command stores the packets in packet.ppp.
After that, open it in Wireshark:
wireshark packet.ppp
Right-click on a packet and select Follow TCP Stream.
Wireshark then lets you view the data in several different formats.
There are many options that you can explore to sniff packets.
Wireshark is the most useful sniffer, and it's available for free on all platforms. It has a feature-rich GUI that helps you sniff packets and analyze protocols, and its many filters let you drop unwanted packets and look only at the ones you are interested in.
It is available for download for Windows and OS X from the Wireshark webpage, and it can also be downloaded for Linux distros.
If you prefer an alternative closer to tcpdump, you can also explore tcpflow, which is a good option for analyzing packets. It can also store the captured data in files for later analysis.
Another option is justniffer, which probably best addresses your problem: it provides text-mode logging and is customizable.

Resources