My router sends its 'DROP' packet log entries to my server via syslog, where they are logged to a file in the following manner:
Oct 30 13:01:02 192.168.1.1 kernel: DROP IN=vlan2 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=93.108.197.92 DST=192.168.2.10 LEN=60 TOS=0x00 PREC=0x00 TTL=51 ID=44828 DF PROTO=TCP SPT=55552 DPT=33248 WINDOW=7300 RES=0x00 SYN URGP=0 OPT (020405840402080A0035BAC40000000001030300)
Oct 30 13:01:06 192.168.1.1 kernel: DROP IN=vlan2 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=93.108.197.92 DST=192.168.2.10 LEN=60 TOS=0x00 PREC=0x00 TTL=51 ID=44829 DF PROTO=TCP SPT=55552 DPT=33248 WINDOW=7300 RES=0x00 SYN URGP=0 OPT (020405840402080A0035BEAE0000000001030300)
Oct 30 13:01:07 192.168.1.1 kernel: DROP IN=vlan2 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=189.175.171.76 DST=192.168.2.10 LEN=44 TOS=0x00 PREC=0x00 TTL=53 ID=260 PROTO=TCP SPT=14779 DPT=23 WINDOW=50523 RES=0x00 SYN URGP=0 OPT (020405AC)
Oct 30 13:01:09 192.168.1.1 kernel: DROP IN=vlan2 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=125.211.218.39 DST=192.168.1.1 LEN=88 TOS=0x00 PREC=0x00 TTL=48 ID=39896 DF PROTO=ICMP TYPE=8 CODE=0 ID=29389 SEQ=1
Oct 30 13:01:14 192.168.1.1 kernel: DROP IN=vlan2 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=93.108.197.92 DST=192.168.2.10 LEN=60 TOS=0x00 PREC=0x00 TTL=51 ID=44830 DF PROTO=TCP SPT=55552 DPT=33248 WINDOW=7300 RES=0x00 SYN URGP=0 OPT (020405840402080A0035C6800000000001030300)
I'd like to put each field into a MySQL database so I can view and analyze it later. I'm trying to think of the best way to parse/filter the log to do this. I'd like to do it in bash, but I'm open to other alternatives/languages if that makes it much more efficient or easier.
My iptables log files are rotated every so often, and I was going to write a bash/sed/awk script to walk through each line of the log(s) and produce a data file I can bulk-load with a single 'LOAD DATA INFILE' statement.
As you can see above, ICMP and TCP packets are written to the file differently (the number of fields after ID varies).
I can think of a few different ways to do this:
Match on PROTO and use per-protocol awk 'print' commands to grab the relevant fields.
Grab every [PARAM]=[VALUE] pair in each line, regardless of PROTO, shove them all into MySQL, and analyze later (see the sketch below).
So far I have (I know this is basic, but I'm wondering if I should approach it differently before investing more time into it):
cat "$fw_file" | while read line; do
type=$(grep -oP 'PROTO=\w+\s' | cut -d= -f2)
df=$(grep -oP 'ID=\w+\sDF\s' | cut -d' ' -f2)
# continuing on for all fields....
# ......
done
Is there a better, more efficient way for me to do this than just grabbing all the fields one by one?
Before you begin working on your script (and reinventing the wheel): for parsing and analyzing log data there are open-source tools that do the job efficiently. Please consider them!
For your exact use case, you would be better off with Elasticsearch, Logstash and Kibana (the ELK stack).
Some advantages of the ELK stack compared to the script-plus-relational-database approach:
Easier to store logs from multiple sources (one of which can be your router).
Scalable (a script-plus-MySQL setup will probably slow down fairly soon, depending on your log input rate).
Very easy to visualize the data in a web interface with various charts using Kibana.
Elasticsearch has a REST API, so your developers can do their own thing too!
There are many tutorials online to get you going pretty fast.
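For example, once the logs are indexed, even a one-line curl against the REST API can answer questions (the index pattern logstash-* and the field name SRC below are assumptions that depend on how Logstash names things in your setup):

# Hypothetical example: how many dropped packets came from one source IP?
# size=0 returns only the hit count, not the documents themselves.
curl -s 'http://localhost:9200/logstash-*/_search?q=SRC:93.108.197.92&size=0'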
Related: ELK runs in containers.
I have set up iptables to send all input/forward/output logs to Logstash.
Here is an example log as seen in the Kibana Discover pane:
#version:1 host:3.3.3.3 #timestamp:March 3rd 2018, 12:14:45.220 message:<4>Mar 3 20:14:47 myhost kernel: [2242132.946331] LOG_ALL_TRAF public INPUT IN=public OUT= MAC=00:1e:67:f2:db:28:00:1e:67:f2:d9:7c:08:00 SRC=1.1.1.1 DST=2.2.2.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=17722 DF PROTO=TCP SPT=3504 DPT=8080 WINDOW=512 RES=0x00 ACK URGP=0 type:rsyslog tags:_jsonparsefailure _id:AWHtgJ_qYRe3mIjckQsb _type:rsyslog _index:logstash-2018.03.03 _score: -
The entire log line ends up in the 'message' field.
I want SRC, DST, SPT, DPT, etc. to be individual fields so I can also use them in visualizations.
Any guidance is much appreciated.
You will need to learn about the grok filter plugin, which lets you split the message into named fields.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
The list of common patterns is available here.
And you can test your patterns here.
I have a problem that's been bugging me for a while now. I've been searching for solutions for two weeks without any result. These guys have the same problem as me, but there are no answers there.
I'm running gammu (1.31) and gammu-smsd on a Raspberry Pi with Raspbian, using a Huawei E367.
I don't know why I get three devices: /dev/ttyUSB0, /dev/ttyUSB1 and /dev/ttyUSB2.
Since I don't know the difference between them, I tried different settings and got it running with ttyUSB0 in gammurc and ttyUSB2 in gammu-smsdrc, both as root and as a normal user.
Sending SMS works great. Then comes the problem: receiving SMS works for a while, then just stops. If I reboot the system it starts working again for a while, but then the same thing happens again.
# Configuration file for Gammu SMS Daemon
# Gammu library configuration, see gammurc(5)
[gammu]
# Please configure this!
port = /dev/ttyUSB2
connection = at
# Debugging
#logformat = textall
# SMSD configuration, see gammu-smsdrc(5)
[smsd]
service = files
logfile = /home/pi/gammu/log/log_smsdrc.txt
# Increase for debugging information
debuglevel = 0
# Paths where messages are stored
inboxpath = /home/pi/gammu/inbox/
outboxpath = /home/pi/gammu/outbox/
sentsmspath = /home/pi/gammu/sent/
errorsmspath = /home/pi/gammu/error/
ReceiveFrequency = 2
LoopSleep = 1
GammuCoding = utf8
CommTimeout = 0
#RunOnReceive =
Log
Tue 2015/03/31 11:05:19 gammu-smsd[7379]: Starting phone communication...
Tue 2015/03/31 11:07:07 gammu-smsd[7379]: Terminating communication...
Tue 2015/03/31 11:07:26 gammu-smsd[2091]: Warning: No PIN code in /etc/gammu-smsdrc file
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Created POSIX RW shared memory at 0xb6f6d000
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Starting phone communication...
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Error at init connection: Error opening device, it doesn't exist. (DEVICENOTEXIST[4])
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Starting phone communication...
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Error at init connection: Error opening device, it doesn't exist. (DEVICENOTEXIST[4])
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Starting phone communication...
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Error at init connection: Error opening device, it doesn't exist. (DEVICENOTEXIST[4])
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Starting phone communication...
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Error at init connection: Error opening device, it doesn't exist. (DEVICENOTEXIST[4])
Tue 2015/03/31 11:07:26 gammu-smsd[2116]: Going to 30 seconds sleep because of too much connection errors
Tue 2015/03/31 11:08:14 gammu-smsd[2116]: Starting phone communication...
Tue 2015/03/31 11:08:21 gammu-smsd[2116]: Soft reset return code: Function not supported by phone. (NOTSUPPORTED[21])
Tue 2015/03/31 11:08:27 gammu-smsd[2116]: Read 2 messages
Tue 2015/03/31 11:08:27 gammu-smsd[2116]: Received IN20150331_110600_00_+xxxxxx_00.txt
Tue 2015/03/31 11:08:27 gammu-smsd[2116]: Received IN20150331_110820_00_+xxxxxx_00.txt
Tue 2015/03/31 11:09:38 gammu-smsd[2116]: Read 1 messages
Tue 2015/03/31 11:09:38 gammu-smsd[2116]: Received IN20150331_110934_00_+xxxxxx_00.txt
Tue 2015/03/31 11:13:57 gammu-smsd[2116]: Read 1 messages
Tue 2015/03/31 11:13:57 gammu-smsd[2116]: Received IN20150331_111352_00_+xxxxxx_00.txt
I guess the early warnings are before my modeswitch command kicks in.
in rc.local:
sudo usb_modeswitch -v 0x12d1 -p 0x1446 -V 0x12d1 -P 0x1506 -m 0x01 -M 55534243123456780000000000000011062000000100000000000000000000 -I
I had the same problem, so I wrote a shell script that quickly reactivates the /dev/ttyUSB[0-2] devices when they disappear, and added it as a cron job:
*/5 * * * * /home/sysadmin/scripts/reanimate-usb-stick.sh >/dev/null 2>&1
reanimate-usb-stick.sh
#!/bin/bash
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"

# Count the ttyUSB serial devices that are currently present
USBDEVICES=$(ls /dev/ttyUSB[0-7] 2>/dev/null | wc -l)
DEVICEINFO=""
DEVICEPORT=""

if [ "$USBDEVICES" -eq 0 ]
then
    # No serial devices left: look up the Huawei stick's vendor and product ID via lsusb
    datas=$(lsusb | grep -i hua | awk '/Bus/ {print $6}' | tr ":" "\n")
    counter=0
    for line in $datas
    do
        counter=$((counter+1))
        if [ "$counter" -eq 1 ]
        then
            DEVICEINFO="$line"      # vendor ID
        fi
        if [ "$counter" -eq 2 ]
        then
            DEVICEPORT="$line"      # product ID
        fi
    done
    # Ask the stick to switch back into modem mode
    usb_modeswitch -v "$DEVICEINFO" -p "$DEVICEPORT" -J
    echo "$DEVICEINFO - $DEVICEPORT"
else
    echo "ALLES OK : $USBDEVICES"
    exit
fi
This looks pretty much the same as https://github.com/gammu/gammu/issues/4, and even though there have been some attempts to fix this in Gammu, it seems that the Huawei modem firmware is simply not stable enough for this usage. Simply asking it several times to list received messages makes it unresponsive.
Also, which device you use might make a slight difference; see the Gammu manual and the dd-wrt wiki for more information on that topic.
I had a similar problem with a Huawei 3G modem (E1750). I added the following lines to the /etc/gammu-smsdrc file:
ReceiveFrequency = 60
StatusFrequency = 60
CommTimeout = 60
SendTimeout = 60
LoopSleep = 10
CheckSecurity = 0
The idea is to minimize the amount of communication between gammu-smsd and the 3G modem. In particular, the default LoopSleep=1 means that gammu sends commands to the modem every second, which can be too much for the modem firmware, so I used 10.
The next thing is standard in all Raspberry Pi/ARM embedded projects: use a sufficiently powerful power source. I'm using a charger with a fixed cable (I believe some detachable cables may be inadequate for currents above 2 A) that looks like this:
http://botland.com.pl/9240-thickbox_default/zasilacz-extreme-microusb-5v-21a-raspberry-pi.jpg
With that the modem still hangs after about 50-100 hours of operation, but it's enough for my project.
I'm shipping Windows DNS debug logs into Elasticsearch as JSON and I need to parse them.
As usual with Microsoft, nothing is easy. The DNS debug log is not a CSV; the only helpful thing about the file is that its columns have fixed widths.
Here is a sample of the DNS logs:
11/21/2014 5:59:13 PM 0458 PACKET 00000000039ED750 UDP Rcv 192.168.1.98 600c Q [0001 D NOERROR] A (9)grokdebug(9)herokuapp(3)com(0)
11/21/2014 5:59:13 PM 0458 PACKET 00000000039EF460 UDP Snd 192.168.1.1 e044 Q [0001 D NOERROR] A (9)grokdebug(9)herokuapp(3)com(0)
11/21/2014 5:59:13 PM 0458 PACKET 00000000039F85B0 UDP Rcv 192.168.1.1 e044 R Q [8081 DR NOERROR] A (9)grokdebug(9)herokuapp(3)com(0)
11/21/2014 5:59:13 PM 0458 PACKET 00000000039F85B0 UDP Snd 192.168.1.98 600c R Q [8081 DR NOERROR] A (9)grokdebug(9)herokuapp(3)com(0)
I looked at this Stack Overflow answer: Logstash grok filter help - fixed position file,
and tried to set up a grok filter to parse the columns, but it's not working for me.
I understand I have a syntax issue, but I can't find a good example to steer me in the correct direction.
Here is my grok filter:
grok {
match => [ "message", "(?<dns_date_n_time>.{21}) (?<dns_field_1>.{5}) (?dns_type>.{8}) (?<dns_field_2>.{19}) (?<dns_protocol>.{4}) (?<dns_direction>.{4}) (?<dns_ip>.{16}) (?<dns_field_3>.{4}) (?<dns_query_type>.{5}) (?<dns_field_5>.{7}) (?<dns_field_6>.{3}) (?<dns_flag>.{9}) (?<dns_field_7>.{2}) (?<dns_record>.{5}) (?<dns_domain>.{255})" ]
}
Can anyone help?
Don't get hung up on the fact that the logfile happens to have a fixed-width format; that doesn't really help here. Parse the file like any old logfile, using the relevant grok patterns. This works for the input you provided:
(?<timestamp>%{DATE_US} %{TIME} (?:AM|PM))\s+%{NUMBER}\s+%{WORD:dns_type}\s+
%{BASE16NUM}\s+%{WORD:dns_protocol}\s+%{WORD:dns_direction}\s+%{IP:dns_ip}\s+
%{BASE16NUM}\s+%{WORD:dns_query_type}\s+\[%{BASE16NUM}\s+%{WORD}\s+
%{WORD:dns_result}\]\s+%{WORD:dns_record}\s+%{GREEDYDATA:dns_domain}
That said, since I don't know what each column in the logfile means, some of the patterns used here might be too sloppy or too strict. I've inserted line breaks to make the answer more readable, but make sure you concatenate things correctly when you insert the pattern into your configuration file.
In my bash script, I use many echo "......." | wall lines to broadcast event notifications as they occur.
However, the resulting output on the console gets unwieldy:
Broadcast Message from root#BIGFOOT
(somewhere) at 16:07 ...
Photo backup started on Mon Oct 7 16:07:55 PHT 2013
Broadcast Message from root#BIGFOOT
(somewhere) at 16:08 ...
Photo backup successfully finished on Mon Oct 7 16:08:05 PHT 2013
Broadcast Message from root#BIGFOOT
(somewhere) at 16:08 ...
You may now unplug the Photo Backup HDD.
Instead, I'd like it to appear more like the following,
Broadcast Message from root#BIGFOOT
(somewhere) at 16:07 ...
Photo backup started on Mon Oct 7 16:07:55 PHT 2013
Photo backup successfully finished on Mon Oct 7 16:08:05 PHT 2013
You may now unplug the Photo Backup HDD.
which is kind of like what would appear in an open write chat session.
Is this possible? If so, how should I modify my script in order to achieve the desired console output?
Each wall invocation will add the "Broadcast Message" banner and a blank line at the top of its output.
As a result, if you want to notify your users at timely intervals (e.g. actually at the start and end of the backup), then you will have to live with the banner message.
As @devnull suggested, you could batch up the messages. One approach would be to declare a script-wide variable, say $logmsg, and then have two functions, depending on whether it is something you want the user to know eventually or something they need to know now:
function log_message
{
    logmsg="${logmsg}"$'\n'"$1"
}

function log_message_now
{
    log_message "$1"
    echo "$logmsg" | wall
    logmsg=""
}
(Note: I've not actually tested the above, so it may need a touch of debugging!)
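Usage in the backup script might then look something like this (the messages are just the ones from the question):

log_message "Photo backup started on $(date)"
# ... run the backup here ...
log_message "Photo backup successfully finished on $(date)"
log_message_now "You may now unplug the Photo Backup HDD."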
Use a compound command:
{
echo "line1"
echo "line2"
echo "line3"
} | wall
I would like to parse logfiles produced by the Fidonet mailer binkd, which are multi-line and, much worse, mixed: several instances can write into one logfile. For example:
27 Dec 16:52:40 [2484] BEGIN, binkd/1.0a-545/Linux -iq /tmp/binkd.conf
+ 27 Dec 16:52:40 [2484] session with 123.45.78.9 (123.45.78.9)
- 27 Dec 16:52:41 [2484] SYS BBSName
- 27 Dec 16:52:41 [2484] ZYZ First LastName
- 27 Dec 16:52:41 [2484] LOC City, Country
- 27 Dec 16:52:41 [2484] NDL 115200,TCP,BINKP
- 27 Dec 16:52:41 [2484] TIME Thu, 27 Dec 2012 21:53:22 +0600
- 27 Dec 16:52:41 [2484] VER binkd/0.9.6a-173/Win32 binkp/1.1
+ 27 Dec 16:52:43 [2484] addr: 2:1234/56.78#fidonet
- 27 Dec 16:52:43 [2484] OPT NDA CRYPT
+ 27 Dec 16:52:43 [2484] Remote supports asymmetric ND mode
+ 27 Dec 16:52:43 [2484] Remote requests CRYPT mode
- 27 Dec 16:52:43 [2484] TRF 0 0
*+ 27 Dec 16:52:43 [1520] done (from 2:456/78#fidonet, OK, S/R: 0/0 (0/0 bytes))*
+ 27 Dec 16:52:43 [2484] Remote has 0b of mail and 0b of files for us
+ 27 Dec 16:52:43 [2484] pwd protected session (MD5)
- 27 Dec 16:52:43 [2484] session in CRYPT mode
+ 27 Dec 16:52:43 [2484] done (from 2:1234/56.78#fidonet, OK, S/R: 0/0 (0/0 bytes))
So the logfile is not only multi-line with an unpredictable number of lines per session; records from several sessions can also be interleaved, as when session 1520 finishes in the middle of session 2484.
What would be the right way to parse such a file in Hadoop? Or should I just parse it line by line, merge the lines into records somehow, and write those records to a SQL database with another set of jobs later on?
Thanks.
The right direction in Hadoop is to develop your own input format, whose record reader reads the input line by line and produces logical records.
It should be said that you could also do this in the mapper itself; that might be a bit simpler. The drawback is that it is not the standard way to package such code for Hadoop, so it is less reusable.
The other direction you mentioned is not "natural" for Hadoop, in my view. Specifically, why use all the complicated (and expensive) shuffle machinery to join together lines that are already in hand?
First of all, parsing the file is not what you are trying to do; you are trying to extract some information from your data.
In your case you can consider a multi-step MR job, where the first MR job essentially (partially) sorts your input by session_id (with some filtering? some aggregation? multiple reducers?), and then the reducer, or the next MR job, does the actual calculation.
Without an explanation of what you are trying to extract from your log files, it is hard to give a more definitive answer.
Also, if your data is small, maybe you can process it without the MR machinery at all?
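For example, if the logs fit comfortably on a single machine, a plain awk pass can group the lines by the session id in square brackets, with no MapReduce involved (a rough sketch; binkd.log is a placeholder filename):

awk '
match($0, /\[[0-9]+\]/) {
    id = substr($0, RSTART + 1, RLENGTH - 2)   # session id, e.g. 2484 or 1520
    sessions[id] = sessions[id] $0 "\n"        # collect this session's lines
}
END {
    for (id in sessions)
        printf "=== session %s ===\n%s\n", id, sessions[id]
}' binkd.log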