Compare files with Bash

I'm trying to compare Bro/Zeek logs against a second file to find out whether IP addresses or domain names from the second file appear in the Zeek logs. I want to pass conn.log or dns.log (compressed or uncompressed) as the first parameter to the script, have it parsed with duplicates removed, and compare it against the second file, given as the second parameter. The final result should show only the file name and the IPs/domains that match between the two files.
I've made an attempt below. The cut works, but the sort doesn't seem to, because I still get duplicates, and I'm not sure how to do the comparison against the second parameter.
If there is a better or more efficient way, I'm all for it. Thanks.
Usage: compare.sh <conn.log|dns.log> indicators.txt
#!/bin/bash
# Compare files to see if they have matching strings.
clog=conn.log
dlog=dns.log
if [ $1 == $clog ]
then
cut -f3 $1;cut -f5 $1 | sort -u | grep -Fwf $2
echo "We have a match in $1"
elif
[ $1 == $dlog ]
then
cut -f10 $1|sort -u|grep -Fwf $2
echo "We have a match in $1"
else
echo "No matches"
fi
echo "Comparison complete"
Below is some example data and expected output:
Example: conn.log
1.2.3.4 1.2.3.5
172.3.2.4 10.2.20.50
...
Example: indicators
1.2.3.4
10.20.20.50
172.3.2.4
...
Expected Output:
1.2.3.4
172.3.2.4
We have a match in conn.log

For conn.log you're only piping the second cut to sort and grep. You need to group the two cut commands to get both of them piped.
{ cut -f3 "$1";cut -f5 "$1"; } | sort -u | grep -Fwf "$2"
Another option would be to use awk instead of running cut twice:
awk -F'\t' '{print $3; print $5}' "$1" | sort -u | grep -Fwf "$2"
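Putting the pieces together, here is a minimal sketch of a complete compare.sh. It assumes Zeek's default tab-separated logs (fields 3 and 5 of conn.log are id.orig_h and id.resp_h, field 10 of dns.log is query, as in the question). It picks the fields from the file name, uses gzip -cdf so the same pipeline reads compressed and uncompressed logs, drops the '#' header lines, and prints the "match" line only when grep actually found something. compare_log is a hypothetical helper name, and grep -x (whole-line match) is used instead of -w so an indicator has to equal the whole field.

```shell
#!/bin/bash
# compare.sh <conn.log|dns.log>[.gz] indicators.txt
# Sketch: assumes Zeek's default tab-separated log layout.

compare_log() {
    local log=$1 indicators=$2 name fields matches
    name=${log##*/}                  # file name without its directory
    case "$name" in
        conn*) fields='3,5' ;;       # id.orig_h and id.resp_h
        dns*)  fields='10' ;;        # query
        *) echo "Unsupported log: $name" >&2; return 2 ;;
    esac
    # gzip -cdf decompresses .gz input and copies plain text through unchanged.
    # Zeek header lines start with '#', so drop them before cutting fields.
    # "|| true" keeps a "no match" exit status from grep from aborting under set -e.
    matches=$(gzip -cdf -- "$log" | grep -v '^#' | cut -f "$fields" \
        | tr '\t' '\n' | sort -u | grep -Fxf "$indicators" || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
        echo "We have a match in $name"
    else
        echo "No matches in $name"
    fi
}

if [ "$#" -eq 2 ]; then
    compare_log "$1" "$2"
fi
```

Because both cut fields are split onto separate lines before sort -u, an address seen as both source and destination is reported only once.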

Related

Loop through a comma-separated text file and use the values as variables

I have a comma-separated text file:
2012,wp_fronins.pdf
2013,test789.pdf
2014,ok09report.pdf
I'm trying to extract each value from the file and pass it to a curl command, subject to a condition.
For example:
if $value1=2012 do
curl "https://onlinesap.org/reports/$value1/$value2"
Any ideas?
Another way to achieve this is to read the file line by line and use cut to extract the fields from each row.
while IFS= read -r p; do
value1=$(echo "$p" | cut -d',' -f1)
value2=$(echo "$p" | cut -d',' -f2)
if [ "$value1" = "2012" ]; then
curl "https://onlinesap.org/reports/$value1/$value2"
fi
# Add more conditional statements here for other values of value1
done < filename.txt
Since the name of the pdf file (value2) is unique, you may try something like this:
#!/bin/bash
FILENAME=myFile.txt
awk -F',' '{print $2}' "$FILENAME" | while read -r value2; do
# -F: treat the file name as a fixed string, so the "." is not a regex wildcard
value1=$(grep -Fw -- "$value2" "$FILENAME" | awk -F',' '{print $1}')
if [ "$value1" = "2012" ]; then
curl "https://onlinesap.org/reports/$value1/$value2"
fi
done
Note that the whole file is scanned again for every line, so the complexity is O(n^2).
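A single-pass alternative is to let read split each row on the comma itself (IFS=, read -r value1 value2). That avoids both the extra cut processes per line and the O(n^2) rescan. In this sketch the hypothetical list_report_urls helper only prints the URLs it would fetch, so the filter can be checked without network access, and the year is passed as a parameter instead of being hard-coded.

```shell
#!/bin/bash
# Usage (hypothetical script name): ./fetch.sh filename.txt 2012

# Print the report URL for every row whose first field equals $2.
list_report_urls() {
    local file=$1 wanted=$2 value1 value2
    # IFS=, splits each row on the comma; -r keeps backslashes literal.
    # The "|| [ -n ... ]" test also processes a last line with no trailing newline.
    while IFS=, read -r value1 value2 || [ -n "$value1" ]; do
        if [ "$value1" = "$wanted" ]; then
            printf 'https://onlinesap.org/reports/%s/%s\n' "$value1" "$value2"
        fi
    done < "$file"
}

if [ "$#" -eq 2 ]; then
    list_report_urls "$1" "$2" | while read -r url; do
        curl "$url"
    done
fi
```

Any characters after the first comma end up in value2, so a row with more than two fields keeps the rest of the line there.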

Bash Shell: Infinite Loop

I have a file in which each line has this form:
id|lastName|firstName|gender|birthday|joinDate|IP|browser
I want to sort all the first names in that file alphabetically and print them one per line, each name only once.
I have written the following program, but for some reason it creates an infinite loop:
array1=()
while read LINE
do
if [ ${LINE:0:1} != '#' ]
then
IFS="|"
array=($LINE)
if [[ "${array1[@]}" != "${array[2]}" ]]
then
array1+=("${array[2]}")
fi
fi
done < $3
echo ${array1[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
NOTES
if [ ${LINE:0:1} != '#' ] : skips the comment lines in the file, which I don't want to print
$3 : the file name
array1 : holds the distinct names
There's a much simpler and cleaner way to do this, without changing the IFS variable or using arrays. You can use a "for" loop:
First I created a file with the same structure as yours:
$ cat file
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Sasha|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Madson|gender|birthday|joinDate|IP|browser
Here's the script I wrote using "for":
#!/bin/bash
for LINE in `cat file | grep -v "^#" | awk -F'|' '{print$3}' | sort -u`
do
echo $LINE
done
And here's the output of this script:
$ ./script.sh
Andrew
Douglas
Madson
Sasha
Tim
Explanation:
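for LINE in `cat file`
Creates a loop over the output of the command, split into words (so a name containing a space would be split in two). The commands between backticks are run by the shell and replaced by their output; for example, to store the date in a variable you could use VARDATE=`date` (or the equivalent VARDATE=$(date)).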
grep -v "^#"
The option -v is used to exclude results matching the pattern, in this case the pattern is "^#". The "^" character means "line begins with". So grep -v "^#" means "exclude lines beginning with #".
awk -F'|' '{print$3}'
The -F option changes the field separator from the default (runs of spaces and tabs) to the character given after it, in this case "|".
The '{print$3}' prints the 3rd column.
sort -u
The "sort -u" command sorts the names alphabetically and, because of -u (unique), prints each name only once.
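Since awk is already in the pipeline, it can skip the comment lines itself, and piping straight into sort -u removes the need for the for loop. As a side effect, a first name containing a space stays on one line instead of being split by the loop. A minimal sketch, with unique_first_names as a hypothetical name and the file passed as the first argument (the question reads it from $3, so adjust accordingly):

```shell
#!/bin/bash
# Print each first name (field 3) once, alphabetically, skipping '#' comment lines
# and lines with fewer than three fields.
unique_first_names() {
    awk -F'|' '!/^#/ && NF >= 3 { print $3 }' "$1" | sort -u
}

if [ "$#" -ge 1 ]; then
    unique_first_names "$1"
fi
```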

Exiting an if statement after the first match in bash scripting

I have a script which iterates through a file and finds matches in another file. How do I get the process to stop once I've found a match?
For example:
I take the first line in name.txt, and then try to find a match for it in file.txt.
name.txt:
7,7,FRESH,98,135,
65,10,OLD,56,45,
file.txt:
7,7,Dave,S
8,10,Frank,S
31,7,Gregg
45,5,Jake,S
Script:
while read line
do
name_id=`echo $line | cut -f1,2 -d ','`
identiferOne=`echo $name_id | cut -f1 -d ','`
identiferTwo=`echo $name_id | cut -f2 -d ','`
while IFS= read line
do
CHECK=`echo $line | cut -f4 -d','`
if [ $CHECK = "S" ]
then
symbolName=`echo $line | cut -f3 -d ','`
numberOne=`echo $line | awk -F',' '{print $1}'`
numberTwo=`echo $line | cut -f2 -d ','`
if [ "$numberOne" == "$identiferOne" ] && [ "$numberTwo" == "$identiferTwo" ]
then
echo "WE HAVE A MATCH with $symbolName"
break
fi
fi
done < /tmp/file.txt
done < /tmp/name.txt
My question is: how do I stop the script from iterating through file.txt once it has found the first match, store the matched record in a variable, leave the if statement, and then do other work inside the loop using that variable? I tried using break, but that exits the loop, which is not what I want.
You can tell grep different things:
Stop searching after the first match (option -m 1).
Read the searchkeys from a file (option -f file).
Treat the output of a command as a file with <(command) (process substitution; this is a bash feature, not a grep option).
Combining these will give you
grep -m1 -f <(cut -d"," -f1-2 name.txt) file.txt
Close, but not what you want. The strings produced by cut -d"," -f1-2 name.txt can match anywhere in a line, but you want to match the first two fields. Matching at the start of a line is done with ^, so we use sed to build patterns like ^field1,field2, (the trailing comma keeps 7,7 from also matching 7,70):
grep -m1 -f <(sed 's/\([^,]*,[^,]*,\).*/^\1/' name.txt) file.txt
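If you would rather keep the original nested loops, note that break with no argument leaves only the innermost enclosing loop. After the inner loop over file.txt ends early, the outer loop over name.txt carries on, with the matched record still in a variable. (grep -m1, by contrast, stops after the first matching line for all patterns combined, so it reports one match overall rather than one per line of name.txt.) A sketch under those assumptions, with first_matches as a hypothetical helper and read splitting the fields instead of cut:

```shell
#!/bin/bash
# For each line of the names file, find the first line of the data file whose
# 4th field is "S" and whose first two fields match, then use it in the outer loop.
first_matches() {
    local names=$1 data=$2
    local id1 id2 rest f1 f2 sym flag match
    while IFS=, read -r id1 id2 rest; do
        match=""
        while IFS=, read -r f1 f2 sym flag; do
            if [ "$flag" = "S" ] && [ "$f1" = "$id1" ] && [ "$f2" = "$id2" ]; then
                match=$sym
                break          # leaves only this inner loop
            fi
        done < "$data"
        # Back in the outer loop: the matched record (if any) is in $match.
        if [ -n "$match" ]; then
            echo "$id1,$id2 -> $match"
        fi
    done < "$names"
}

if [ "$#" -eq 2 ]; then
    first_matches "$1" "$2"     # e.g. first_matches /tmp/name.txt /tmp/file.txt
fi
```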

How to compare two columns of IP and hostnames grepped from multiple files in bash

I'm attempting to grep IPs from a number of files, look them up in DNS and compare them to the hostnames already in the same files to ensure both are correct. Then print out anything that is wrong.
I've gathered I need to put the information into arrays and diff them somehow.
Here is my horrible bash code which does not work. I'm pretty sure at least my for loop is wrong:
declare -a ipaddr=(`grep -h address *test.com.cfg | awk '{print $2}'`)
declare -a host_names=(`grep -h address *test.com.cfg | awk '{print $2}'`)
for i in "${ipaddr[@]}"
do
lookedup_host_names=( $(/usr/sbin/host ${ipaddr[@]} | awk '{print $5}' | cut -d. -f1-4 | tr '[:upper:]' '[:lower:]'))
done
if [[ -z diff <(printf "%s\n" "${lookedup_host_names[@]}"| sort ) <(printf "%s\n" "${host_names[@]}"| sort) ]]
then
printf "%s\n" "${lookedup_host_names[@]}"
fi
I don't see a difference between your arrays ipaddr and host_names. Suppose your files contain lines like
address 1.2.3.4 somehost.tld
a script like this may do what you want.
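grep -h address *test.com.cfg | while read -r line; do
# Feed $line to awk; a bare awk would read (and swallow) the rest of the loop's input.
IP=$(awk '{print $2}' <<< "$line")
CO=$(awk '{print $3}' <<< "$line")
CN=$(host "$CO" | cut -d ' ' -f 4)
[ "$CN" = "$IP" ] || echo "Error with IP $IP"
done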
The two principal problems are that your for loop overwrites the array each time rather than appending, and your diff check is invalid.
To quickly fix the for loop, look up "$i" rather than the whole "${ipaddr[@]}" array on each pass, and use += instead of = so the results are appended, e.g. lookedup_host_names+=( ... ).
To do the diff, you don't really need a condition. You could just run
diff <(printf "%s\n" "${host_names[@]}"| sort ) <(printf "%s\n" "${lookedup_host_names[@]}"| sort)
and it would show any differences between the two in diff format, which most Unix users are familiar with (note that I switched the arguments around, since the first argument is supposed to be the original).
If, as in your example, you actually want to compare them and show the entire final list when there is a difference, negate the test: diff exits with a non-zero status when its inputs differ.
if ! diff <(printf "%s\n" "${host_names[@]}"| sort ) <(printf "%s\n" "${lookedup_host_names[@]}"| sort) > /dev/null
then
printf "%s\n" "${lookedup_host_names[@]}"
fi
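Diffing two sorted lists tells you that something differs but loses which IP it belongs to. The sketch below swaps in a per-line comparison instead: each configured name is compared with the reverse lookup of its IP, and only the disagreeing lines are printed. It assumes the "address 1.2.3.4 somehost.tld" line format from the first answer and the usual host(1) PTR output ("... domain name pointer somehost.tld."). lookup_name and check_hosts are hypothetical names, and ${name,,} (lowercasing) needs bash 4 or later.

```shell
#!/bin/bash
# Assumes config lines of the form: address 1.2.3.4 somehost.tld

# Reverse-resolve one IP with host(1); print the PTR name in lowercase,
# without the trailing dot (empty if there is no PTR record).
lookup_name() {
    host "$1" | awk '/domain name pointer/ { sub(/\.$/, "", $5); print tolower($5); exit }'
}

# Compare each configured name with the reverse lookup of its IP,
# printing only the entries that disagree.
check_hosts() {
    local ip name actual
    while read -r _ ip name _; do
        actual=$(lookup_name "$ip")
        if [ "$actual" != "${name,,}" ]; then
            printf '%s: config says %s, DNS says %s\n' "$ip" "$name" "${actual:-<none>}"
        fi
    done < <(grep -h '^[[:space:]]*address[[:space:]]' "$@")
}

if [ "$#" -ge 1 ]; then
    check_hosts "$@"            # e.g. ./check.sh *test.com.cfg
fi
```

Keeping the DNS call in its own function also makes it easy to replace with a stub when testing without network access.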

How to pass array of arguments in shell script?

Right now I have a working script that takes 2 arguments: a ticket number and an svn URL. It prints all the revisions whose svn log comments mention that ticket.
#!/bin/sh
jira_ticket=$1
src_url=$2
revs=(`svn log $2 --stop-on-copy | grep -B 2 $1 | grep "^r" | cut -d"r" -f2 | cut -d" " -f1| sort`)
for revisions in ${!revs[*]}
do
printf "%s %s\n" ${revs[$revisions]}
done
Output:
4738
4739
4743
4744
4745
I need help passing several arguments instead, i.e. more than one ticket number, and printing the revisions associated with all of the tickets passed to the script.
POSIX sh has no arrays (apart from the positional parameters), so be explicit and use #!/bin/bash.
I would put the URL as the first argument and treat all the remaining arguments as tickets:
#!/bin/bash
# Usage: script.sh <svn-url> <ticket> [<ticket>...]
revs=()
src_url=$1
svn_log=$(svn log "$src_url" --stop-on-copy)
shift
for jira_ticket in "$@"; do
revs+=( $(grep -B 2 "$jira_ticket" <<< "$svn_log" | grep "^r" | cut -d"r" -f2 | cut -d" " -f1) )
done
# Print each revision once, in numeric order
printf "%s\n" "${revs[@]}" | sort -n -u
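The grep -B 2 trick depends on the ticket appearing exactly two lines below the revision header. As an alternative (a swapped-in technique, not the answer's), awk can remember the most recent header line and print its revision whenever a message line mentions any of the tickets. This sketch assumes svn log's default format, where each entry starts with a line like "r4738 | user | date | 1 line". Like the original grep, it is a plain substring match, so ABC-1 would also match ABC-12. revs_for_tickets is a hypothetical name.

```shell
#!/bin/bash
# Usage (hypothetical script name): ./revs.sh <svn-url> <ticket> [<ticket>...]

# Read `svn log` output on stdin and print, once each and in numeric order,
# the revisions whose log message mentions any of the given tickets.
revs_for_tickets() {
    local IFS=' '                    # so "$*" joins the tickets with spaces
    awk -v tickets="$*" '
        BEGIN { n = split(tickets, t, " ") }
        /^r[0-9]+ [|] / { rev = substr($1, 2); next }   # header line: remember revision
        {
            for (i = 1; i <= n; i++)
                if (index($0, t[i]) > 0) { print rev; break }
        }
    ' | sort -n -u
}

if [ "$#" -ge 2 ]; then
    src_url=$1
    shift
    svn log "$src_url" --stop-on-copy | revs_for_tickets "$@"
fi
```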
