writing the same result for the duplicated values of a column

I'm really new to bash. I have a list of domains in a .txt file (URLs.txt). I also want to have a .csv file (myFile.csv) which consists of 3 columns separated by commas. My code reads each line of URLs.txt (each domain), finds its IP address and then inserts both into myFile.csv (domain in the first column, its IP in the 2nd column):
Name, IP
ex1.com, 10.20.30.40
ex2.com, 20.30.40.30
ex3.com, 10.45.60.20
ex4.com, 10.20.30.40
Here is my code:
echo "Name,IP" > myFile.csv # let's overwrite, not appending
while IFS= read -r line; do
ipValue= # initialize the value
while IFS= read -r ip; do
if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
ipValue+="${ip}-" # append the results with "-"
fi
done < <(dig +short "$line") # assuming the result has multi-line
ipValue=${ipValue%-} # remove trailing "-" if any
if [[ -n $ipValue ]]; then
# if the IP is not empty
echo "$line,$ipValue" >> myFile.csv
fi
done < URLs.txt
I want to add another column to myFile.csv for keeping open ports of each IP. So output would be like this:
Name, IP, Port
ex1.com, 10.20.30.40, 21/tcp
ex2.com, 20.30.40.30, 20/tcp
ex3.com, 10.45.60.20, 33/tcp
ex4.com, 10.20.30.40, 21/tcp
I want to use Nmap to do this. After I choose an IP address from the 2nd column of myFile.csv and find its open ports using Nmap, I want to write the Nmap result to the corresponding cell of the 3rd column.
Also, if there is another similar IP in the 2nd column I want to write the Nmap result for that line too. I mean I don't want to run Nmap again for the duplicated IP. For example, in my example, there are two "10.20.30.40" in the 2nd column. I want to use Nmap just once and for the 1st "10.20.30.40" (and write the result for the 2nd "10.20.30.40" as well, Nmap should not be run for the duplicated IP).
For this to happen, I changed the first line of my code to this:
echo "Name,IP,Port" > myFile.csv
and also here is the Nmap code to find the open ports:
nmap -v -Pn -p 1-100 $ipValue -oN out.txt
port=$(grep '^[0-9]' out.txt | tr '\n' '*' | sed 's/*$//')
but I don't know what to do next and how to apply these changes to my code.
I updated my code to something like this:
echo "Name,IP,Port" > myFile.csv # let's overwrite, not appending
while IFS= read -r line; do
ipValue= # initialize the value
while IFS= read -r ip; do
if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
ipValue+="${ip}-" # append the results with "-"
fi
done < <(dig +short "$line") # assuming the result has multi-line
ipValue=${ipValue%-} # remove trailing "-" if any
if [[ -n $ipValue ]]; then
# if the IP is not empty
nmap -v -Pn -p 1-100 $ipValue -oN out.txt
port=$(grep '^[0-9]' out.txt | tr '\n' '*' | sed 's/*$//')
echo "$line,$ipValue,$port" >> myFile.csv
fi
done < URLs.txt
But this way Nmap is also run for the duplicated IPs, which is what I want to avoid. What should I do?

Here's a modified version of your script that roughly does what you want:
#!/usr/bin/env bash
# cache maps from IP addresses to open ports
declare -A cache
getports() {
local ip=$1
nmap -v -Pn -p 1-100 "$ip" -oG - \
| awk -F '\t' '
/Ports:/ {
n = split($2, a, /,? /)
printf "%s", a[2]
for (i = 3; i <= n; ++i)
printf ":%s", a[i]
}
'
}
{
echo 'Name,IP,Port'
while IFS= read -r url; do
# Read filtered dig output into array
readarray -t ips < <(dig +short "$url" | grep -E '^([0-9]+\.){3}[0-9]+$')
# Skip domains that resolved to no IPv4 address
(( ${#ips[@]} )) || continue
# Build array of open ports
unset ports
for ip in "${ips[@]}"; do
ports+=("${cache["$ip"]:=$(getports "$ip")}")
done
# Output
printf '%s,%s,%s\n' \
"$url" \
"$(IFS='-'; echo "${ips[*]}")" \
"$(IFS='-'; echo "${ports[*]}")"
done < URLs.txt
} > myFile.csv
The readarray line reads the filtered output from dig into an array of IP addresses; if that array has length zero, the rest of the loop is skipped.
Then, for each element in the ips array, we get the ports. To avoid calling nmap if we've seen the IP address before, we use the ${parameter:=word} parameter expansion: if ${cache["$ip"]} is non-empty, use it, otherwise call the getports function and store the output in the cache associative array.
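The caching behaviour can be tried without running nmap at all. The following is only a sketch: slow_lookup and calls_log are hypothetical names standing in for getports and for a way of counting real calls, and the IP addresses are the ones from the question.

```bash
#!/usr/bin/env bash
declare -A cache
calls_log=$(mktemp)              # one line is appended per real lookup

# slow_lookup is a hypothetical stand-in for getports/nmap
slow_lookup() {
    echo "$1" >> "$calls_log"    # record that the expensive call happened
    echo "ports-of-$1"
}

results=()
for ip in 10.20.30.40 20.30.40.30 10.20.30.40; do
    # Cached value if non-empty; otherwise run slow_lookup,
    # assign its output to cache[$ip], and expand to that value.
    results+=("${cache[$ip]:=$(slow_lookup "$ip")}")
done

printf '%s\n' "${results[@]}"
call_count=$(wc -l < "$calls_log")
echo "lookups performed: $((call_count))"
rm -f "$calls_log"
```

Three results are collected, but the stand-in is only invoked for the two distinct addresses. Note that an empty result is not cached by := and would trigger a new lookup the next time.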
getports is called for IP addresses we haven't seen before; I've used -oG ("grepable output") to make parsing easier. The awk command filters for lines containing Ports:, which look something like
Host: 52.94.225.242 () Ports: 80/open/tcp//http/// Ignored State: closed (99)
with tab-separated fields. We then split the second field on the regular expression /,? / (an optional comma followed by a blank) and print all but the first element of the resulting array, colon separated.
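To see what the awk part produces, it can be fed a canned line instead of live nmap output. This is a sketch under assumptions: the sample line below is made up for illustration (the second port, 443, is not from the output shown above), with fields joined by real tab characters.

```bash
#!/usr/bin/env bash
# Made-up grepable-format line; \t marks the tab field separators
sample=$'Host: 52.94.225.242 ()\tPorts: 80/open/tcp//http///, 443/open/tcp//https///\tIgnored State: closed (98)'

ports=$(printf '%s\n' "$sample" | awk -F '\t' '
    /Ports:/ {
        n = split($2, a, /,? /)   # a[1] is the literal "Ports:"
        printf "%s", a[2]
        for (i = 3; i <= n; ++i)
            printf ":%s", a[i]
    }
')
echo "$ports"
```

For this sample the two port entries come out joined by a colon, which is the form that ends up in the third CSV column.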
Finally, we print the line of CSV data; if ips or ports contain more than one element, we want to join the elements with -, which is achieved by setting IFS in the command substitution and then printing the arrays with [*].
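The joining trick works in isolation too. A minimal sketch, using two of the addresses from the question as sample data:

```bash
#!/usr/bin/env bash
ips=(10.20.30.40 10.45.60.20)

# IFS is changed only inside the command substitution's subshell;
# "${ips[*]}" joins the elements with the first character of IFS.
joined=$(IFS='-'; echo "${ips[*]}")
echo "$joined"
```

Because the assignment happens in a subshell, IFS in the rest of the script keeps its default value.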
The initial echo and the loop are grouped within curly braces so output redirection has to happen just once.

Related

how to echo out or print the input during a bash loop against an array of IPs

I am running a loop like so:
for i in $(cat ips_uniq.txt)
do
whois $i | grep 'netname|country|org-name' | sed ':a;N;$!ba;s/\\n//g'
done
Output:
netname: NETPLUScountry: INcountry: INcountry: IN
netname: NETPLUScountry: INcountry: INcountry: IN
This is good; however, my ips_uniq.txt contains over 300 unique IP addresses, so ideally I want each IP address to appear on the same line as its output.

My version:
#!/bin/bash
while IFS= read -r i
do
if [[ "$i" != "" ]]
then
results=$(whois "$i" | grep -iE 'netname|country|org-name' | tr '\n' ' ')
echo "$i $results"
fi
done < ips_uniq.txt
Using the while read method is a safe way to read files line by line, avoiding all problems with spaces or weird line formats. Read https://mywiki.wooledge.org/BashFAQ/001 for details. Not required in your case, but a good construct to learn about.
The if is there to skip empty lines.
The result of the whois | grep combination is then stored in a variable.
Note that I replaced your sed with a simple tr, which replaces each \n with a space to separate the fields. Modify as required.
The final echo then prefixes the results line with the IP address.
I tested with ips_uniq.txt equal to
8.8.8.8
74.6.231.20
and the result is
8.8.8.8 NetName: LVLT-ORG-8-8 Country: US NetName: LVLT-GOGL-8-8-8 Country: US
74.6.231.20 NetName: INKTOMI-BLK-6 Country: US
Using printf you could format the output to be better (i.e. fixed length columns for example).
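As a rough illustration of the printf idea, the sketch below pads the IP address to a fixed-width column. format_row is a hypothetical helper, the width of 15 is an assumption (the longest dotted IPv4 address), and the sample values are the ones from the test run above.

```bash
#!/usr/bin/env bash
# format_row is a hypothetical helper: left-align the IP in a 15-character column
format_row() {
    printf '%-15s %s\n' "$1" "$2"
}

format_row 8.8.8.8 'NetName: LVLT-ORG-8-8 Country: US'
format_row 74.6.231.20 'NetName: INKTOMI-BLK-6 Country: US'
```

In the loop above, the final echo could be replaced by a call such as format_row "$i" "$results".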

Split a string with two consecutive delimiters in Bash

Here is an extract of a Windows DHCP lease file:
10.11.1.3 Infinite DHCP 5c497d1ee201 xxxx yyyyy
10.11.1.4 PC-name Infinite DHCP 0002025e611e xxxx yyyyy
I would like to get the IP and MAC into variables. Here is how I parse each line:
IFS=$'\t' read -r -a array <<< "$line"
ip=${array[0]}
mac=${array[3]}
The problem is that the first line has no name, so there are two consecutive tabs between the IP and Infinite. With this code the first line is parsed correctly, but on the second line I get "DHCP" in the variable mac.
How should I correct that?
Thanks

You may be able to use awk with tab as input field separator:
awk -F '\t' '{print $1, $5}' file
10.11.1.3 5c497d1ee201
10.11.1.4 0002025e611e

I'd use readarray instead.
$ line=$'10.11.1.3\t\tInfinite\tDHCP\t5c497d1ee201\txxxx\tyyyyy'
$ readarray -d $'\t' -t array <<< "$line"
$ declare -p array
declare -a array=([0]="10.11.1.3" [1]="" [2]="Infinite" [3]="DHCP" [4]="5c497d1ee201" [5]="xxxx" [6]=$'yyyyy\n')
If the trailing line break is a problem, either trim it manually, or append a tab to the input and use -n to limit the number of fields read.
$ readarray -d $'\t' -t -n 7 array <<< "$line"$'\t'
$ declare -p array
declare -a array=([0]="10.11.1.3" [1]="" [2]="Infinite" [3]="DHCP" [4]="5c497d1ee201" [5]="xxxx" [6]="yyyyy")
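For context, the behaviour in the question comes from tab being one of the IFS whitespace characters: read treats a run of IFS whitespace as a single delimiter, so the empty name field disappears. The sketch below compares the two approaches on the same sample line; the array names collapsed and kept are only illustrative.

```bash
#!/usr/bin/env bash
line=$'10.11.1.3\t\tInfinite\tDHCP\t5c497d1ee201\txxxx\tyyyyy'

# read: the two adjacent tabs count as one delimiter, so the empty field vanishes
IFS=$'\t' read -r -a collapsed <<< "$line"
echo "read:      ${#collapsed[@]} fields, index 3 = ${collapsed[3]}"

# readarray -d: every tab delimits, so the empty field is preserved
readarray -d $'\t' -t kept <<< "$line"
echo "readarray: ${#kept[@]} fields, index 4 = ${kept[4]}"
```

With readarray the MAC address is always at index 4, whether or not the name field is empty.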

Bash script with long command as a concatenated string

Here is a sample bash script:
#!/bin/bash
array[0]="google.com"
array[1]="yahoo.com"
array[2]="bing.com"
pasteCommand="/usr/bin/paste -d'|'"
for val in "${array[@]}"; do
pasteCommand="${pasteCommand} <(echo \$(/usr/bin/dig -t A +short $val)) "
done
output=`$pasteCommand`
echo "$output"
Somehow it shows an error:
/usr/bin/paste: invalid option -- 't'
Try '/usr/bin/paste --help' for more information.
How can I fix it so that it works fine?
//EDIT:
Expected output is to get result from the 3 dig executions in a string delimited with | character. Mainly I am using paste that way because it allows to run the 3 dig commands in parallel and I can separate output using a delimiter so then I can easily parse it and still know the dig output to which domain (e.g google.com for first result) is assigned.

First, you should read BashFAQ/050 to understand why your approach failed. In short, do not put complex commands inside variables.
A simple bash script that gives the intended output could look something like this:
#!/bin/bash
sites=(google.com yahoo.com bing.com)
iplist=
for site in "${sites[@]}"; do
# Capture command's output into ips variable
ips=$(/usr/bin/dig -t A +short "$site")
# Prepend a '|' character, replace each newline character in ips variable
# with a space character and append the resulting string to the iplist variable
iplist+=\|${ips//$'\n'/' '}
done
iplist=${iplist:1} # Remove the leading '|' character
echo "$iplist"
outputs
172.217.18.14|98.137.246.7 72.30.35.9 98.138.219.231 98.137.246.8 72.30.35.10 98.138.219.232|13.107.21.200 204.79.197.200
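The reason a command stored in a string misbehaves can be shown without dig or paste. When an unquoted variable is expanded, the shell only performs word splitting on it; quotes and process substitutions inside the value are not parsed again. The sketch below uses a made-up echo command and contrasts it with the array form, which keeps each argument intact.

```bash
#!/usr/bin/env bash
# Command kept in a string: the single quotes stay literal after expansion
cmd="echo 'a   b'"
from_string=$($cmd)              # runs: echo "'a" "b'"
echo "string: $from_string"

# Command kept in an array: one element per argument, quoting is preserved
cmd_arr=(echo 'a   b')
from_array=$("${cmd_arr[@]}")    # runs: echo "a   b"
echo "array:  $from_array"
```

This is the same effect that turned -t in the original script into an argument of paste: the <( ... ) text was passed along as plain words instead of being executed.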

It's easier to answer a question when it specifies the input and the desired output, then shows your attempt and explains why it doesn't work.
What I want is https://i.postimg.cc/13dsXvg7/required.png
$ array=("google.com" "yahoo.com" "bing.com")
$ printf "%s\n" "${array[@]}" | xargs -n1 sh -c '/usr/bin/dig -t A +short "$1" | paste -sd" "' _ | paste -sd '|'
172.217.16.14|72.30.35.9 98.138.219.231 98.137.246.7 98.137.246.8 72.30.35.10 98.138.219.232|204.79.197.200 13.107.21.200

I might try a recursive function like the following instead.
array=(google.com yahoo.com bing.com)
paster () {
dn=$1
shift
if [ "$#" -eq 0 ]; then
dig -t A +short "$dn"
else
paster "$@" | paste -d "|" <(dig -t A +short "$dn") -
fi
}
output=$(paster "${array[@]}")
echo "$output"

Now that the expected output is finally clear:
domains_arr=("google.com" "yahoo.com" "bing.com")
out_arr=()
for domain in "${domains_arr[@]}"
do
mapfile -t ips < <(dig -tA +short "$domain")
IFS=' '
# Join the ips array into a string with space as delimiter
# and add it to the out_arr
out_arr+=("${ips[*]}")
done
IFS='|'
# Join the out_arr array into a string with | as delimiter
echo "${out_arr[*]}"

If the array is big (and not just 3 sites) you may benefit from parallelization:
array=("google.com" "yahoo.com" "bing.com")
parallel -k 'echo $(/usr/bin/dig -t A +short {})' ::: "${array[@]}" |
paste -sd '|'

How to skip repeated entries in a .csv file

I'm new to bash scripting. I have a text file containing a list of subdomains (URLs) and I'm creating a .csv file (subdomainIP.csv) that has 2 columns: the 1st column contains subdomains (Subdomain) and the 2nd one contains IP addresses (IP). The columns are separated by ",". My code is intended to read each line of URLs.txt, find its IP address and enter the subdomain and its IP address in the .csv file.
Whenever I find the IP address of a domain and I want to add it as a new entry to .csv file, I want to check the previous entries of the 2nd column. If there is a similar IP address, I don't want to add the new entry, but if there isn't any similar case, I want to add the new entry. I have done this by adding these lines to my code:
awk '{ if ($IP ~ $ipValue) print "No add"
else echo "${line}, ${ipValue}" >> subdomainIP.csv}' subdomainIP.csv
but I receive this error:
awk: cmd. line:2: else echo "${line}, ${ipValue}" >> subdomainIP.csv}
awk: cmd. line:2: ^ syntax error
What's wrong?

Would you please try the following:
declare -A seen # memorize the appearance of IPs
echo "Subdomain,IP" > subdomainIP.csv # let's overwrite, not appending
while IFS= read -r line; do
ipValue= # initialize the value
while IFS= read -r ip; do
if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
ipValue+="${ip}-" # append the results with "-"
fi
done < <(dig +short "$line") # assuming the result has multi-line
ipValue=${ipValue%-} # remove trailing "-" if any
if [[ -n $ipValue ]] && (( seen[$ipValue]++ == 0 )); then
# if the IP is not empty and not in the previous list
echo "$line,$ipValue" >> subdomainIP.csv
fi
done < URLs.txt
The associative array seen is the key here. It is indexed
by an arbitrary string (the IP address in this case) and remembers a value
associated with that string. That makes it suitable for checking whether
an IP address has already appeared in earlier input lines.
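The counting idiom can be tried on a fixed list before wiring it to dig. A minimal sketch, assuming three sample addresses of which one is repeated; the unique array is only there to make the effect visible.

```bash
#!/usr/bin/env bash
declare -A seen
unique=()

for ip in 10.20.30.40 20.30.40.30 10.20.30.40; do
    # The post-increment yields the old count: 0 the first time
    # a key is used, so only first occurrences pass the test.
    if (( seen[$ip]++ == 0 )); then
        unique+=("$ip")
    fi
done

printf '%s\n' "${unique[@]}"
```

After the loop, seen also holds how often each address occurred, which can be handy for reporting.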

There are some issues in your code. Here are a few of them.
If the awk script is in single quotes, as in awk 'script' file, any variables $var in script will not expand. If you want to perform variable expansion, use double quotes. Compare echo hello | awk "{ print \"$PATH\" }" vs echo hello | awk '{ print "$PATH" }'.
However, if you do so, then the shell will try to expand $0, $1, $NF, ... and this is certainly not what you want. Therefore you can concatenate single- and double-quoted strings as needed, e.g. echo hello | awk '{ print "$0:"$0 >> "log"; print "$PATH:'"$PATH"'" >> "log" }'
Based on what I see from O'Reilly's sed & awk, when you redirect to file from within an awk script, you have to quote the file name, as I've done in the command above for the file named log.
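A different way to hand a shell value to awk, not used in the answer above, is awk's -v option, which assigns an awk variable before the script runs and avoids splicing quotes. The sketch below applies it to the duplicate check from the question; ip_in_csv, the temporary file and the sample row are all hypothetical.

```bash
#!/usr/bin/env bash
csv=$(mktemp)
printf '%s\n' 'Subdomain,IP' 'ex1.com,10.20.30.40' > "$csv"

# ip_in_csv IP FILE: succeeds if IP already appears in column 2 of FILE
ip_in_csv() {
    awk -F',' -v ip="$1" '$2 == ip { found = 1 } END { exit !found }' "$2"
}

if ip_in_csv 10.20.30.40 "$csv"; then first=present; else first=missing; fi
if ip_in_csv 20.30.40.30 "$csv"; then second=present; else second=missing; fi
echo "10.20.30.40 is $first, 20.30.40.30 is $second"
rm -f "$csv"
```

The decision to append a new row then stays in the shell, where echo and >> work as expected.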

process every line from command output in bash

From every line of nmap network scan output I want to store the hosts and their IPs in variables (and, for further use, additionally the "Host is up" string).
The nmap output to be processed looks like:
Nmap scan report for samplehostname.mynetwork (192.168.1.45)
Host is up (0.00047s latency).
This is my script so far:
#!/bin/bash
while IFS='' read -r line
do
host=$(grep report|cut -f5 -d' ')
ip=$(grep report|sed 's/^.*(//;s/)$//')
printf "Host:$host - IP:$ip"
done < <(nmap -sP 192.168.1.1/24)
The output does something I do not understand. It puts the "Host:" at the very beginning, and then it puts "IP:" at the very end, while it completely omits the output of $ip.
The generated output of my script is:
Host:samplehostname1.mynetwork
samplehostname2.mynetwork
samplehostname3.mynetwork
samplehostname4.mynetwork
samplehostname5.mynetwork - IP:
Separately, the extraction of $host and $ip basically works (although there is surely a better solution). I can printf either $host or $ip alone.
What's wrong with my script? Thanks!

Your two grep commands are reading from standard input, which they inherit from the loop, so they also read from nmap. read gets one line, the first grep consumes the rest, and the second grep exits immediately because nothing is left to read on standard input. I suspect you meant to grep the contents of $line:
while IFS='' read -r line
do
host=$(grep report <<< "$line" |cut -f5 -d' ')
ip=$(grep report <<< "$line" |sed 's/^.*(//;s/)$//')
printf "Host:$host - IP:$ip"
done < <(nmap -sP 192.168.1.1/24)
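The stdin inheritance described above can be reproduced without nmap. In this sketch cat plays the role of the first grep, and three made-up lines replace the scan output:

```bash
#!/usr/bin/env bash
reads=0
while IFS= read -r line; do
    reads=$((reads + 1))
    # cat inherits the loop's standard input and drains the remaining lines
    swallowed=$(cat)
done < <(printf '%s\n' first second third)

echo "loop iterations: $reads"
echo "consumed by cat: $swallowed"
```

The loop body runs only once, because the command inside it reads everything that read had not yet consumed.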
However, this is inefficient and unnecessary. You can use bash's built-in regular expression support to extract the fields you want.
regex='Nmap scan report for (.*) \((.*)\)'
while IFS='' read -r line
do
[[ $line =~ $regex ]] || continue
host=${BASH_REMATCH[1]}
ip=${BASH_REMATCH[2]}
printf "Host:%s - IP:%s\n" "$host" "$ip"
done < <(nmap -sP 192.168.1.1/24)
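The regular expression can be checked against the two sample lines from the question by feeding them through a here-document instead of running a scan. This is only a test harness sketch; the hosts array is a hypothetical addition for collecting the matches.

```bash
#!/usr/bin/env bash
regex='Nmap scan report for (.*) \((.*)\)'
hosts=()

while IFS= read -r line; do
    [[ $line =~ $regex ]] || continue     # skip "Host is up" and other lines
    hosts+=("${BASH_REMATCH[1]}=${BASH_REMATCH[2]}")
    printf 'Host:%s - IP:%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
done <<'EOF'
Nmap scan report for samplehostname.mynetwork (192.168.1.45)
Host is up (0.00047s latency).
EOF
```

Only the report line matches, so one host and IP pair is printed.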

Try this:
#!/bin/bash
while IFS='' read -r line
do
if [[ $(echo "$line" | grep report) ]];then
host=$(echo "$line" | cut -f5 -d' ')
ip=$(echo "$line" | sed 's/^.*(//;s/)$//')
echo "Host:$host - IP:$ip"
fi
done < <(nmap -sP it-50)
Output:
Host:it-50 - IP:10.0.0.10
I added an if clause to skip unwanted lines.
