Find all distinct IP addresses and print them in lexicographical order - algorithm

I have a hierarchy of directories and some files in some of those directories:
/root/development/dir1/file1.txt, file2.txt, ...
/root/development/dir2/file3.txt, file4, ...
/root/development/file6.in, file7.out, ...
...
Some of these files contain IP addresses inside the text, in the form x.x.x.x, where each x is a number from 0 to 255 (inclusive).
For example, say we have file1.txt that looks like this:
hello world 127.0.0.1
this is some example 128.99.107.55
file with some correct and incorrect 128.128.4.11 ip 0.11.1115.78 addresses
This file contains only 3 IP addresses, namely 127.0.0.1, 128.99.107.55, and 128.128.4.11, since 0.11.1115.78 is not a valid IP address.
I need to write a program (in Java or Python) to find all distinct IP addresses in all the files under the /root/development/ directory and print them in lexicographical order.
The input will be a setup shell script, and the code should print the required data to stdout.
An example shell script is as follows:
#!/bin/bash
rm -rf /root/development
mkdir /root/development
mkdir /root/development/dir1
mkdir /root/development/dir2
touch /root/development/dir1/file1.txt
echo -e "hello world 127.0.0.1\nthis is some example 128.99.107.55 \nfile with some correct and incorrect 128.128.4.11 ip 0.11.1115.78 addressesaddresses" >> /root/development/dir1/file1.txt
touch /root/development/dir1/file2.txt
echo -e "hello from 74.0.65.76 and 8.dd.99.88.907 good\nthis is some example 16.1215.76.35 \nfile with some correct and incorrect 15.128.4.65 ip addresses\n0.0.0.0" >> /root/development/dir1/file2.txt
touch /root/development/dir2/file3.txt
echo -e "127.65.64.1 127.0.64.1 127.0.0.1\nexample 128.57.107.76 128.57.907.70 \nfile with some correct and incorrect 67.128.4.11 ip addresses 7.7.7.8" >> /root/development/dir2/file3.txt
touch /root/development/dir2/file4.txt
echo -e "hello world 127.98.0.1\nthis is some example 128.96.107.55 \nfile with some correct and incorrect 128.68.4.11 ip addresses" >> /root/development/dir2/file4.txt
touch /root/development/f.inp
echo -e "hello world 127.0.49.1 \nthis is some example 128.99.58.55 8.88.888.88 77.255.255.254\n7.7.257.25 file with some correct and incorrect 26.56.4.23 ip addresses" >> /root/development/f.inp
Example
For the following /root/development/ directory:
/root/development/dir1/file1.txt
hello world 127.0.0.1
this is some example 128.99.107.55
file with some correct and incorrect 128.128.4.11 ip 0.11.1115.78 addressesaddresses
/root/development/dir1/file2.txt
hello from 74.0.65.76 and 8.dd.99.88.907 good
this is some example 16.1215.76.35
file with some correct and incorrect 15.128.4.65 ip addresses
0.0.0.0
/root/development/dir2/file3.txt
127.65.64.1 127.0.64.1 127.0.0.1
example 128.57.107.76 128.57.907.70
file with some correct and incorrect 67.128.4.11 ip addresses 7.7.7.8
/root/development/dir2/file4.txt
hello world 127.98.0.1
this is some example 128.96.107.55
file with some correct and incorrect 128.68.4.11 ip addresses
/root/development/f.inp
hello world 127.0.49.1
this is some example 128.99.58.55 8.88.888.88 77.255.255.254
7.7.257.25 file with some correct and incorrect 26.56.4.23 ip addresses
The output should be
0.0.0.0
127.0.0.1
127.0.49.1
127.0.64.1
127.65.64.1
127.98.0.1
128.128.4.11
128.57.107.76
128.68.4.11
128.96.107.55
128.99.107.55
128.99.58.55
15.128.4.65
26.56.4.23
67.128.4.11
7.7.7.8
74.0.65.76
77.255.255.254

From your comment, I assume that you have little or no programming experience, so I'll broadly explain the steps you could take to tackle this project.
For each of the following steps, if you don't know how to do it (which is expected if you're new to programming), try an internet search to find out how to do that step in your chosen language.
Step 0. First, choose between Java and Python. Both can be used for this; it's just a matter of which language you know best at this time, or already have installed on your computer.
Step 1. Write the code to read the content of just one file. You can temporarily write the whole file content to stdout to ensure that this works well.
Step 2. Change the code to print only the IP addresses from the file content (maybe using a regex to extract them), keeping only those whose four numbers are all between 0 and 255.
Step 3. Remove duplicate IP addresses. You'll probably do that by putting the IP addresses in a list and applying an existing Java/Python function to that list (or by collecting them in a set), and then printing the result to stdout.
Step 4. Sort the list before printing it.
Now you want to do the same thing, but on several files:
Step 5. Write code to list all the files of one folder.
Step 6. Write code to list all the files of one folder and all its subfolders, recursively.
Step 7. Combine your code from Steps 1-4 with the code from Steps 5-6 to achieve the result you want.
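The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive solution: the names CANDIDATE, is_valid, and find_ips are made up for illustration, files are assumed to be readable as text, and the range check accepts leading zeros such as 01, which is an assumption the question does not settle.

```python
import os
import re

# Four dot-separated groups of 1-3 digits, not glued to neighbouring digits.
# (Hypothetical pattern for this sketch; the question does not prescribe one.)
CANDIDATE = re.compile(r"(?<!\d)\d{1,3}(?:\.\d{1,3}){3}(?!\d)")


def is_valid(ip):
    """Step 2: keep only addresses whose four parts are all in 0..255."""
    # Assumption: leading zeros ("01") are accepted by this check.
    return all(0 <= int(part) <= 255 for part in ip.split("."))


def find_ips(root):
    """Return the distinct valid IPs found under root, sorted as strings."""
    found = set()                                   # Step 3: a set drops duplicates
    for folder, _subdirs, files in os.walk(root):   # Steps 5-6: recursive listing
        for name in files:
            path = os.path.join(folder, name)
            with open(path, errors="ignore") as f:  # Step 1: read one file
                text = f.read()
            for ip in CANDIDATE.findall(text):      # Step 2: extract candidates
                if is_valid(ip):
                    found.add(ip)
    return sorted(found)                            # Step 4: lexicographical sort


if __name__ == "__main__":
    # Step 7: everything combined, using the directory named in the question.
    for ip in find_ips("/root/development"):
        print(ip)
```

Sorting plain strings is what gives the lexicographical order the question asks for, which is why 7.7.7.8 comes after 67.128.4.11 in the expected output.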

To approach this problem you need to be comfortable with regex and file handling. The Python code below should help you move forward.
Code
from pathlib import Path
import re

# Root folder to scan (the directory named in the question)
myDir = "/root/development"

# Candidate: four dot-separated groups of 1-3 digits, not glued to other digits
candidatePattern = re.compile(r"(?<!\d)\d{1,3}(?:\.\d{1,3}){3}(?!\d)")
# Valid address: every group must be in the range 0-255
octet = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
myPattern = re.compile(rf"{octet}(?:\.{octet}){{3}}")


def check(ip):
    # fullmatch returns None unless the whole string is a valid address
    return myPattern.fullmatch(ip) is not None


def getListOfIps():
    ips = set()  # a set drops duplicates automatically
    # rglob("*") walks every file in every subfolder, not only .txt files
    for path in Path(myDir).rglob("*"):
        if not path.is_file():
            continue
        with open(path, errors="ignore") as f:
            for text in f:
                ips.update(ip for ip in candidatePattern.findall(text) if check(ip))
    return ips


# sorted() on strings gives the lexicographical order the question asks for
for eachIP in sorted(getListOfIps()):
    print(eachIP)

# May be required for future improvements:
# a pattern that also matches 8.dd.99.88 is (?:[\d]{1,3})\.(?:[\d]|[a-z]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})
If you are new to programming and need some reference programs, you may look at my repo: https://github.com/Kapil987/Python_prac/tree/master
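As an alternative to the hand-written validation pattern above, the range check could probably be delegated to Python's standard ipaddress module. This is only a sketch: is_ipv4 is a hypothetical helper name, and how leading zeros (for example 01.2.3.4) are treated depends on the Python version, so it is worth testing against the inputs you expect.

```python
import ipaddress


def is_ipv4(text):
    """Return True if text parses as a dotted-quad IPv4 address."""
    try:
        # IPv4Address raises a ValueError subclass on malformed input
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


for candidate in ["127.0.0.1", "0.11.1115.78", "1.2.3.256"]:
    print(candidate, is_ipv4(candidate))
```

The regex is still needed to pull candidates out of the surrounding text; this helper only replaces the 0 to 255 check on each candidate.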

Related

How can I create a custom :host_role fact from the hostname?

I'm looking to create a role based on host name prefix and I'm running into some problems. Ruby is new to me and although I've done extensive searching for a solution, I'm still confused.
Host names look like this:
work-server-01
home-server-01
Here's what I've written:
require 'facter'
Facter.add('host_role') do
setcode do
hostname_array = Facter.value(:hostname).split('-')
first_in_array = hostname_array.first
first_in_array.each do |x|
if x =~ /^(home|work)/
role = '"#{x}" server'
end
role
end
end
I'd like to use variable interpolation within my role assignment, but I feel like using a case statement along with 'when' is incorrect. Please keep in mind that I'm new to Ruby.
Would anybody have any ideas on how I might achieve my goal?
Pattern-Matching the Hostname Fact
The following is a relatively DRY refactoring of your code:
require 'facter'
Facter.add :host_role do
  setcode do
    location = case Facter.value(:hostname)
               when /home/ then $&
               when /work/ then $&
               else 'unknown'
               end
    '%s server' % location
  end
end
Mostly, it just looks for a regex match and assigns the value of the match to location, which is then returned as part of a formatted string.
On my system the hostname doesn't match either "home" or "work", so I correctly get:
Facter.value :host_role
#=> "unknown server"

How to find most recently modified file in a remote directory (via ssh)?

I found this answer helpful:
How can you find the most recently modified folder in a directory using Ruby?
But what I need is to do the same for a remote directory (via SSH). What is the easiest way to do this in Ruby?
Here's what I have so far:
paths = (IO.popen("ssh -A user@yo.mammas.house.com ls /install/")).read.split("\n")
I only want these folders:
if p =~ /^release-MC-.*$/
I'm currently parsing the result of the ls command, splitting on new lines, matching on the regex and the next step is to build a hash of the date string embedded in the folder name. I really don't want to have to do this last step but it will work.
Is there a better way?
This is less a Net::SSH question than it is "What command can I issue to find the most recently modified file?"
SSH connections can issue a command, so once you know what command to send, or execute, you're done. I'd look at:
ls -Alt path/to/files | sed -n '2p'
Fleshing out something more usable results in:
require 'net/ssh'
HOST = 'hostname.domain'
USER = 'user'
PASSWORD = "password"
output = Net::SSH.start(HOST, USER, :password => PASSWORD) { |ssh|
  ssh.exec!('ls -alt . | grep pattern_to_find')
}
puts output
Which, after filling in the fields with the right values and running it, connected to one of my hosts at work and returned something like:
drwxr-xr-x 11 xxxxxxxxxxxx xxxxxxxxx 4096 Oct 2 16:20 development
If you have multiple hits you need to retrieve, either expand the pattern after grep or discard the pipe to grep and parse your resulting output in Ruby once the command returns. You can also discard the t flag from ls if you want to sort locally, though it's a better idea to offload as much of the processing to the far-side host, rather than have it return a huge glob of data and process it locally. The less you return, the faster your overall code will be.

Regex issue with building a file system crawler

I am building a crawler to search my file system for specific documents containing specific information. However, the regex part is leaving me a little perplexed. I have a testfile on my desktop containing 'teststring' and a test credit card number '4060324066583245' and the code below will run properly and find the file containing teststring:
require 'find'
count = 0
Find.find('/') do |f| # '/' for root directory on OS X
  if f.match(/\.doc\Z/) # check if filename ends in desired format
    contents = File.read(f)
    if /teststring/.match(contents)
      puts f
      count += 1
    end
  end
end
puts "#{count} sensitive files were found"
Running this confirms that the crawler is working and properly finding matches. However, when I try to run it for finding the test credit card number it fails to find a match:
require 'find'
count = 0
Find.find('/') do |f| # '/' for root directory on OS X
  if f.match(/\.doc\Z/) # check if filename ends in desired format
    contents = File.read(f)
    if /^4[0-9]{12}(?:[0-9]{3})?$/.match(contents)
      puts f
      count += 1
    end
  end
end
puts "#{count} sensitive files were found"
I checked the regex on rubular.com with 4060324066583245 as a piece of test data, which is contained in my test document, and Rubular verifies that the number is a match for the regex. To sum things up:
The crawler works on the first case using teststring - verifying that the crawler is properly scanning my file system and reading contents of the desired file type
Rubular verifies that my regex successfully matches my test credit card number 4060324066583245
The crawler fails to find the test credit card number.
Any suggestions? I'm at a loss why Rubular shows the regex as working but the script won't work when run on my machine.
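In Ruby, ^ and $ are anchors that tie the match to the start and end of a line, respectively (\A and \z anchor to the whole string).
Therefore, ^[0-9]{4}$ will match a line consisting of "1234", but not "12345" or " 1234 " etc. If the number shares a line with any other text, the anchored pattern cannot match.
You should be using word boundaries instead: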
if contents =~ /\b4[0-9]{12}(?:[0-9]{3})?\b/

Checking a URL for hostname

I am trying to evaluate user-submitted urls to find out whether they contain valid hostnames (formatting) and if so, extract the hostname. Know of any libraries/methods that could help?
Example:
user_input = "www.google.com"
if user_input.has_valid_host?
hostname = user_input.get_hostname #=> "google.com"
url = "http://" + hostname #=> "http://google.com"
else
puts "Invalid URL"
end
This example is very simple, but I need the URL checked against all valid domain extensions and the hostname extracted from any string (assuming that it's present).
I don't know ruby, but I wouldn't think of this as a ruby question.
I would use regex to split out the hostname as you suggest.
Then I would do a system call to the nslookup routine.
On a Windows system from the command prompt it is nslookup.
C:\Users\xyz>nslookup www.google.com
Server: UnKnown
Address: 192.168.237.2
Name: www.l.google.com
Address: 173.194.73.99
Aliases: www.google.com.localdomain
From Ruby you should do an API call instead of using the command line, but both will eventually interface to the DNS service on the local machine.
See: Is there a good DNS server library in ruby?

Ruby : Use external script from within a script to do a comparison

So I have a script (A) that finds a suitable IP address for a new virtual server. First, it looks in the database to check that the IP it chose isn't already taken by another server. If the IP is not already in use, the script pings it. If there is no response from the ping, we get to the next step, and this is where I'm having a problem.
In the next step, I have to check whether the IP address is already registered in the netscaler (router). To do this, I must use another script on the same machine (B). This other script returns the list of all the IPs defined in the netscaler. When I run it, the output looks like this:
x.x.x.x
x.x.x.x
x.x.x.x (and so on..).
I found many ways to execute script B from within script A, but none of them lets me do what I'd like to.
My goal is to compare the ip my script found with all of those that are listed, without having those last ones printed on the screen.
So, to make it a bit clearer, let's say that script A found the IP 1.2.3.4.
It would then call script B that would return to script A this list
1.2.3.5
1.2.4.5
1.2.5.1
and so on.
and then A would compare 1.2.3.4 with all those returned by script B without actually showing them on screen.
Thank you very much!
I would separate scriptB business logic from scriptB ui (CLI) logic:
scriptA.rb
scriptB.rb
netscaler.rb # extract logic here
Extract the logic that lists all the IPs defined in the netscaler into a separate class/method:
#netscaler.rb
class Netscaler
  def self.list_ips
    # return array of ips here
  end
end
#scriptB.rb
require_relative 'netscaler'
ips = Netscaler.list_ips
puts ips # script B may show these ips on the screen
...
#scriptA.rb
require_relative 'netscaler'
ips = Netscaler.list_ips
# script A will not show them. Instead it will operate on the returned result.
...
You can use backticks to execute script B and return the output:
ip_list = `scriptB`.split("\n")
This can be plugged into Alex's organizational suggestion. I would do this if script B is a non-Ruby script that you don't control.
Note that if there is any leading or trailing whitespace you can add a .map(&:strip) to the end.

Resources