How to compare two columns of IP and hostnames grepped from multiple files in bash

How to compare two columns of IP and hostnames grepped from multiple files in bash - bash

I'm attempting to grep IPs from a number of files, look them up in DNS and compare them to the hostnames already in the same files to ensure both are correct. Then print out anything that is wrong.
I've gathered I need to put the information into arrays and diff them somehow.
Here is my horrible bash code which does not work. I'm pretty sure at least my for loop is wrong:
declare -a ipaddr=(`grep -h address *test.com.cfg | awk '{print $2}'`)
declare -a host_names=(`grep -h address *test.com.cfg | awk '{print $2}'`)
for i in "${ipaddr[#]}"
do
lookedup_host_names=( $(/usr/sbin/host ${ipaddr[#]} | awk '{print $5}' | cut -d. -f1-4 | tr '[:upper:]' '[:lower:]'))
done
if [[ -z diff <(printf "%s\n" "${lookedup_host_names[#]}"| sort ) <(printf "%s\n" "${host_names[#]}"| sort) ]]
then
printf "%s\n" "${lookedup_host_names[#]}"
fi

I don't see a difference between your arrays ipaddr and host_names. Supposed your files contain lines like
address 1.2.3.4 somehost.tld
a script like this may do what you want.
cat *test.com.cfg | grep address | while read line; do
IP=$(awk {'print $2'});
CO=$(awk {'print $3'});
CN=$(host $CO | cut -d ' ' -f 4)
[ "$CN" = "$IP" ] || echo "Error with IP $IP";
done

The two principal problems are that your for loop overwrites the array each time rather than appending, and your diff check is invalid.
To quickly fix the for loop, you could use += instead of =, .e.g lookedup_host_names+=( ... ).
To do the diff, you don't really need a condition. You could just run
diff <(printf "%s\n" "${host_names[#]}"| sort ) <(printf "%s\n" "${lookedup_host_names[#]}"| sort)
and it would show any differences between the two in diff format which most Unix users are familiar with (note that I switched the arguments around, since the first argument is supposed to be original).
If, like in your example, you actually do want to compare them and show the entire final list if there is a difference, you could do
if diff <(printf "%s\n" "${host_names[#]}"| sort ) <(printf "%s\n" "${lookedup_host_names[#]}"| sort) > /dev/null
then
printf "%s\n" "${lookedup_host_names[#]}"
fi

Related

Compare files with Bash

I'm trying to compare Bro/Zeek logs against a second file to determine if IP addresses or domain names from second file exist in the zeek logs. I want to be able to pass conn/dns.log as a parameter (compressed/uncompressed) to the script and have it parsed with duplicates removed and compared to the second file as a second parameter. The final result should only show the file name and the matching IP/Domains between the two files.
I've made an attempt below to accomplish this however,I can only cut successfully I see the sort isn't working as I'm still getting duplicates and I'm not sure how to do the comparison against the second parameter.
If there is a better or more efficient way I'm all for it. Thanks.
compare.sh <conn.log/dns.log> indicators.txt
#!/bin/bash
# Compare files to see if they have matching strings.
clog=conn.log
dlog=dns.log
if [ $1 == $clog ]
then
cut -f3 $1;cut -f5 $1 | sort -u | grep -Fwf $2
echo "We have a match in $1"
elif
[ $1 == $dlog ]
then
cut -f10 $1|sort -u|grep -Fwf $2
echo "We have a match in $1"
else
echo "No matches"
fi
echo "Comparison complete"
Below is some example data and expected output:
Example: conn.log
1.2.3.4 1.2.3.5
172.3.2.4 10.2.20.50
...
Example: indicators
1.2.3.4
10.20.20.50
172.3.2.4
...
Expected Output:
1.2.3.4
172.3.2.4
We have a match in conn.log

For conn.log you're only piping the second cut to sort and grep. You need to group the two cut commands to get both of them piped.
{ cut -f3 "$1";cut -f5 "$1"; } | sort -u | grep -Fwf "$2"
Another option would be to use grep instead of running cut twice:
awk -F'\t' '{print $3; print $5}' "$1" | sort -u | grep -Fwf "$2"

How to process values from for loop in shell script

I have below for loop in shell script
#!/bin/bash
#Get the year
curr_year=$(date +"%Y")
FILE_NAME=/test/codebase/wt.properties
key=wt.cache.master.slaveHosts=
prop_value=""
getproperty(){
prop_key=$1
prop_value=`cat ${FILE_NAME} | grep ${prop_key} | cut -d'=' -f2`
}
#echo ${prop_value}
getproperty ${key}
#echo "Key = ${key}; Value="${prop_value}
arr=( $prop_value )
for i in "${arr[#]}"; do
echo $i | head -n1 | cut -d "." -f1
done
The output I am getting is as below.
test1
test2
test3
I want to process the test2 from above results to below script in place of 'ABCD'
grep test12345 /home/ptc/storage/**'ABCD'**/apache/$curr_year/logs/access.log* | grep GET > /tmp/test.access.txt
I tried all the options but could not able to succeed as I am new to shell scripting.

Ignoring the many bugs elsewhere and focusing on the one piece of code you say you want to change:
for i in "${arr[#]}"; do
val=$(echo "$i" | head -n1 | cut -d "." -f1)
grep test12345 /dev/null "/home/ptc/storage/$val/apache/$curr_year/logs/access.log"* \
| grep GET
done > /tmp/test.access.txt
Notes:
Always quote your expansions. "$i", "/path/with/$val/"*, etc. (The * should not be quoted on the assumption that you want it to be expanded).
for i in $prop_value would have the exact same (buggy) behavior; using arr buys you nothing. If you want using arr to increase correctness, populate it correctly: read -r -a arr <<<"$prop_value"
The redirection is moved outside the loop -- that way the second iteration through the loop doesn't overwrite the file written by the first one.
The extra /dev/null passed to grep ensures that its behavior is consistent regardless of the number of matches; otherwise, it would display filenames only if more than one matching log file existed, and not otherwise.

Bash Script to batch-convert IP Addresses to CIDR?

Ok, here's the problem.
I have a plaintext list of IP addresses that I'm blocking on my servers, growing more and more unwieldy every day (added 3000+ entries today alone).
It's already been sorted for duplicates so that's not a problem. What I'd like to do is write a script to go through it and consolidate the entries a bit better for mass blocking.
For example, take this:
2.132.35.104
2.132.79.240
2.132.99.87
2.132.236.34
2.132.245.30
And turn it into this:
2.132.0.0/16
Any suggestions on how to code that in a bash script?
UPDATE: I've worked out part-way how to do what I'm needing. Converting it to /24 is easy, as follows:
cat /usr/local/blocks/blocks.txt | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
echo "$oc1.$oc2.$oc3.0/24" >> twentyfour.srt
done
sort -u twentyfour.srt > twentyfour.txt
rm -f twentyfour.srt
ori=`cat /usr/local/blocks/blocks.txt | wc -l`
new=`cat twentyfour.txt | wc -l`
echo "$ori"
echo "$new"
That reduced it down from 4,452 entries to 4,148 entries.
Instead of having:
109.86.9.93
109.86.26.77
109.86.55.225
109.86.70.224
109.86.87.199
109.86.89.202
109.86.95.248
109.86.100.19
109.86.110.43
109.86.145.216
109.86.152.86
109.86.155.238
109.86.156.54
109.86.187.91
109.86.228.86
109.86.234.51
109.86.239.61
I now have:
109.86.100.0/24
109.86.110.0/24
109.86.145.0/24
109.86.152.0/24
109.86.155.0/24
109.86.156.0/24
109.86.187.0/24
109.86.228.0/24
109.86.234.0/24
109.86.239.0/24
109.86.26.0/24
109.86.55.0/24
109.86.70.0/24
109.86.87.0/24
109.86.89.0/24
109.86.9.0/24
109.86.95.0/24
All well and good. BUT, there's 17 entries from the 109.86.. area. In a case where the first 2 octets match more than say 5 entries on /24, I'd like to reduce that to /16.
That's where I'm stuck.
UPDATE 2:
For Steve: Here's the block list for today. And here's the result so far. Apparently it's not removing the near-duplicate entries from twentyfour that are in sixteen.

I wish I could tell you this is a simple filter. However, all of the 2.0.0.0/8 network is registered to RIPE NCC. There's just way too many different ranges of blocked IP addresses, its easier to just narrow down the scope of visitors you do want versus what you don't want.
You could also use various tools you can use to block attacks automatically.
Map to identify which is which. https://www.iana.org/numbers
Here's a script I just made for you. Then you can create the major block lists for each of the primary registries. Afrinic, Lacnic, Apnic, Ripe, and Arin.
create_tables_by_registry.sh
Just run this script... Then run the following registry.sh files. (E.g; ripe.sh)
#!/bin/bash
# Author: Steve Kline
# Date: 03-04-2014
# Designed and tested to run on properly on CentOS 6.5
#Grab Updated IANA Address Space Assignments only if Newer Version
wget -N https://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.txt
assigned=ipv4-address-space.txt
arrayregistry=( afrinic apnic arin lacnic ripe )
for registry in "${arrayregistry[#]}"
do
#Clean up the ipv4-address-space.txt file and keep useable IPs
grep "$registry" $assigned | sed 's/\/8/\.0\.0\.0\/8/g'| colrm 15 > $registry-tmp1.txt
ip=($(cat $registry-tmp1.txt))
echo "#!/bin/bash" > $registry.sh
for ip in "${ip[#]}"
do
echo $ip | sed -e 's/" "//g' > $registry-tmp2.txt
#INSERT OR MODIFY YOUR COMPATIBLE FIREWALL RULES HERE
#This section creates the country to block.
echo "iptables -A INPUT -s $ip -j DROP" >> $registry.sh
chmod +x $registry.sh
done
rm $registry-tmp1.txt -f
rm $registry-tmp2.txt -f
done
Ok! Well I'm back, a little insane here and a little nutty there... I think I helped figure this out for you. I'm sure you can piece together a modification to better fit your needs.
#MODIFY FOR YOUR LIST OF IP ADDRESSES
BADIPS=block.ip
twentyfour=./twentyfour.ips #temp file for all IPs converted to twentyfour net ids
sixteen=./sixteen.ips #temp file for sixteen bit
twentyfourlst1=./twentyfour1.txt #temp file for 24 bit IDs
twentyfourlst2=./twentyfour2.txt #temp file for 24 bit IDs filtered by 16 bit IDs that match
sixteenlst=./sixteen.txt #temp file for parsed sixteenbit
#MODIFY FOR YOUR OUTPUT OF CIDR ADDRESSES
finalfile=./blockips.list #Final file post-merge
cat $BADIPS | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
echo "$oc1.$oc2.$oc3.0/24" >> $twentyfour
echo "$oc1.$oc2.0.0/16" >> $sixteen
done
awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>4){print i,a[i]}}}' $sixteen | sed 's/ [0-9]\| [0-9][0-9]\| [0-9][0-9][0-9]//g' > $sixteenlst
sort -u $twentyfour > twentyfour.txt
# THIS FINDS NEAR DUPLICATES MATCHING FIRST TWO OCTETS
cat $sixteenlst | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
grep "\b$oc1.$oc2\b" twentyfour.txt >> duplicates.txt
done
#THIS REMOVES THE NEAR DUPLICATES FROM THE TWENTYFOUR FILE
fgrep -vw -f duplicates.txt twentyfour.txt > twentyfourfinal.txt
#THIS MERGES BOTH RESULTS
cat twentyfourfinal.txt $sixteenlst > $finalfile
sort -u $finalfile
ori=`cat $BADIPS | wc -l`
new=`cat $finalfile | wc -l`
echo "$ori"
echo "$new"
#LAST MIN CLEANUP
rm -f $twentyfour $twentyfourlst $sixteen $sixteenlst duplicates.txt twentyfourfinal.txt
Going Back to fix: I noted a problem... Originally unsuccessful.
`grep "$oc1.$oc1" twentyfour.txt > duplicates.txt
For Example: The old script had bad results with this test IP range... the updated version now above... Does exactly as its intended. match the octet exactly.. and not a similar.
192.168.1.1
192.168.2.50
192.168.5.23
192.168.14.10
192.168.10.5
192.168.24.25
192.165.20.10
10.192.168.30
5.76.10.20
5.76.20.30
5.76.250.10
5.76.34.10
5.76.50.30
95.76.30.1 - Old script matched this to 5.76
20.20.5.5
20.20.10.10
20.20.16.50
20.20.205.20
20.20.60.20
205.20.16.20 - not a problem
20.205.150.150 - Old script matched this to 20.20
220.20.16.0 - Also failed without adding -w parameter to the last grep to only match exact strings.

How to pass array of arguments in shell script?

Right now i a have a working script to pass 2 arguments to a shell script. The script basically takes a ticket# and svn URL as arguments on command line and gives an output of all the revisions that have been changed associated with that ticket# (in svn comments).
#!/bin/sh
jira_ticket=$1
src_url=$2
revs=(`svn log $2 --stop-on-copy | grep -B 2 $1 | grep "^r" | cut -d"r" -f2 | cut -d" " -f1| sort`)
for revisions in ${!revs[*]}
do
printf "%s %s\n" ${revs[$revisions]}
done
Output:
4738
4739
4743
4744
4745
I need some help to pass an array of arguments - meaning more than one ticket# and give the output of revisions associated with those ticket numbers that get passed as args to the script.

I don't think POSIX shell has arrays, so be plain and use #!/bin/bash
I would put the url as the first arg, and all the reset are tickets
#!/bin/bash
revs=()
src_url=$1
svn_log=$(svn log "$src_url" --stop-on-copy)
shift
for jira_ticket in "$#"; do
revs+=( $(grep -B 2 "$jira_ticket" <<< "$svn_log" | grep "^r" | cut -d"r" -f2 | cut -d" " -f1) )
done
for revisions in $( printf "%s\n" "${!revs[#]}" | sort )
do
printf "%s %s\n" ${revs[$revisions]}
done

shell - Characters contained in both strings - edited

I want to compare two string variables and print the characters that are the same for both. I'm not really sure how to do this, I was thinking of using comm or diff but I'm not really sure the right parameters to print only matching characters. also they say they take in files and these are strings. Can anyone help?
Input:
a=$(echo "abghrsy")
b=$(echo "cgmnorstuvz")
Output:
"grs"

You don't need to do that much work to assign $a and $b shell variables, you can just...
a=abghrsy
b=cdgmrstuvz
Now, there is a classic computer science problem called the longest common subsequence1 that is similar to yours.
However, if you just want the common characters, one way would let Ruby do the work...
$ ruby -e "puts ('$a'.chars.to_a & '$b'.chars.to_a).join"
1. Not to be confused with the different longest common substring problem.

Use Character Classes with GNU Grep
The isn't a widely-applicable solution, but it fits your particular use case quite well. The idea is to use the first variable as a character class to match against the second string. For example:
a='abghrsy'
b='cgmnorstuvz'
echo "$b" | grep --only-matching "[$a]" | xargs | tr --delete ' '
This produces grs as you expect. Note that the use of xargs and tr is simply to remove the newlines and spaces from the output; you can certainly handle this some other way if you prefer.
Set Intersection
What you're really looking for is a set intersection, though. While you can "wing it" in the shell, you'd be better off using a language like Ruby, Python, or Perl to do this.
A Ruby One-Liner
If you need to integrate with an existing shell script, a simple Ruby one-liner that uses Bash variables could be called like this inside your current script:
a='abghrsy'
b='cgmnorstuvz'
ruby -e "puts ('$a'.split(//) & '$b'.split(//)).join"
A Ruby Script
You could certainly make things more elegant by doing the whole thing in Ruby instead.
string1_chars = 'abghrsy'.split //
string2_chars = 'cgmnorstuvz'.split //
intersection = string1_chars & string2_chars
puts intersection.join
This certainly seems more readable and robust to me, but your mileage may vary. At least now you have some options to choose from.

Nice question +1.
You can use an awk trick to get this done.
a=abghrsy
b=cdgmrstuvz
comm -12 <(echo $a|awk -F"\0" '{for (i=1; i<=NF; i++) print $i}') <(echo $b|awk -F"\0" '{for (i=1; i<=NF; i++) print $i}')|tr -d '\n'
OUTPUT:
grs
Note use of awk -F"\0" that breaks input string character by character into different awk fiedls. Rest is pretty straightforward use of comm and tr.
PS: If you input string is not sorted then you need to pipe awk's output to sort or do sort of an array inside awk.
UPDATE: awk only solution (without comm):
echo "$a;$b" | awk -F"\0" '{scnd=0; for (i=1; i<=NF; i++) {if ($i!=";") {if (!scnd) arr1[$i]=$i; else if ($i in arr1) arr2[$i]=$i} else scnd=1}} END { for (a in arr2) printf("%s", a)}'
This assumes semicolon doesn't appear in your string (you can use any other character if that's not the case).
UPDATE 2: I think simplest solution is using grep -o
(thanks to answer from #CodeGnome)
echo "$b" | grep -o "[$a]" | tr -d '\n'

Using gnu coreutils(inspired by #DigitalRoss)..
a="abghrsy"
b="cgmnorstuvz"
echo "$(comm -12 <(echo "$a" | fold -w1 | sort | uniq) <(echo "$b" | fold -w1 | sort | uniq) | tr -d '\n')"
will print grs. I assumed you only want uniq characters.
UPDATE:
Modified for dash..
#!/bin/dash
string1=$(printf "$1" | fold -w1 | sort | uniq | tr -d '\n');
string2=$(printf "$2" | fold -w1 | sort | uniq | tr -d '\n');
while [ "$string1" != "" ]; do
c1=$(printf '%s\n' "$string1" | cut -c 1-1 )
string2=$(printf "$2" | fold -w1 | sort | uniq | tr -d '\n');
while [ "$string2" != "" ]; do
c2=$(printf '%s\n' "$string2" | cut -c 1-1 )
if [ "$c1" = "$c2" ]; then
echo "$c1\c"
fi
string2=$(printf '%s\n' "$string2" | cut -c 2- )
done
string1=$(printf '%s\n' "$string1" | cut -c 2- )
done
echo;
Note: I am just a beginner. There might be a better way of doing this.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to compare two columns of IP and hostnames grepped from multiple files in bash - bash

Related

Compare files with Bash

How to process values from for loop in shell script

Bash Script to batch-convert IP Addresses to CIDR?

How to pass array of arguments in shell script?

shell - Characters contained in both strings - edited

Categories

Resources