How to skip repeated entries in a .csv file - bash

I'm new to bash scripting. I have a text file containing a list of subdomains (URLs) and I'm creating a .csv file (subdomainIP.csv) that has 2 columns: the 1st column contains subdomains (Subdomain) and the 2nd one contains IP addresses (IP). The columns are separated by ",". My code is meant to read each line of URLs.txt, find its IP address, and enter the subdomain and its IP address into the .csv file.
Whenever I find the IP address of a domain and want to add it as a new entry to the .csv file, I want to check the existing entries in the 2nd column first: if the same IP address is already there, I don't want to add the new entry; if it isn't, I do. I tried to do this by adding these lines to my code:
awk '{ if ($IP ~ $ipValue) print "No add"
else echo "${line}, ${ipValue}" >> subdomainIP.csv}' subdomainIP.csv
but I receive this error:
awk: cmd. line:2: else echo "${line}, ${ipValue}" >> subdomainIP.csv}
awk: cmd. line:2: ^ syntax error
What's wrong?

Would you please try the following:
declare -A seen                          # memorize the appearance of IPs
echo "Subdomain,IP" > subdomainIP.csv    # overwrite, don't append
while IFS= read -r line; do
    ipValue=                             # initialize the value
    while IFS= read -r ip; do
        if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
            ipValue+="${ip}-"            # append the results with "-"
        fi
    done < <(dig +short "$line")         # dig may return multiple lines
    ipValue=${ipValue%-}                 # remove trailing "-" if any
    if [[ -n $ipValue ]] && (( seen[$ipValue]++ == 0 )); then
        # if the IP is not empty and not in the previous list
        echo "$line,$ipValue" >> subdomainIP.csv
    fi
done < URLs.txt
The associative array seen may be the key for this purpose. It is indexed
by an arbitrary string (the IP address in this case) and memorizes the value
associated with that string, which makes it well suited to checking whether
an IP address has already appeared among the input lines.
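To see the idiom in isolation, here is a minimal standalone sketch (the IPs are made up):

declare -A seen
for ip in 10.0.0.1 10.0.0.2 10.0.0.1; do
    if (( seen[$ip]++ == 0 )); then
        echo "first occurrence: $ip"   # runs only the first time each IP is seen
    fi
done
# prints "first occurrence" for 10.0.0.1 and 10.0.0.2 only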

There are some issues in your code. Here are a few of them.
If the awk script is in single quotes, as in awk 'script' file, any variables $var in script will not expand. If you want to perform variable expansion, use double quotes. Compare echo hello | awk "{ print \"$PATH\" }" vs echo hello | awk '{ print "$PATH" }'.
However, if you do so, then the shell will try to expand $0, $1, $NF, ... and this is certainly not what you want. Therefore you can concatenate single- and double-quoted strings as needed, e.g. echo hello | awk '{ print "$0:"$0 >> "log"; print "$PATH:'"$PATH"'" >> "log" }'
Based on what I see from O'Reilly's sed & awk, when you redirect to file from within an awk script, you have to quote the file name, as I've done in the command above for the file named log.
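As a small illustration of both points (a sketch; var is just an example shell variable, and log is the output file as above):

var=world
echo hello | awk '{ print "$var" }'              # single quotes: awk prints the literal string $var
echo hello | awk "{ print \"$var\" }"            # double quotes: the shell expands $var first
echo hello | awk '{ print "'"$var"'" > "log" }'  # concatenated quoting; note the quoted file name

Passing shell values in with awk -v var="$var" is another common way to avoid the quoting juggling entirely.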

Related

writing the same result for the duplicated values of a column

I'm really new to bash. I have a list of domains in a .txt file (URLs.txt). I also want to have a .csv file which consists of 3 columns separated by , (myFile.csv). My code reads each line of URLs.txt (each domain), finds its IP address and then inserts them into myFile.csv (the domain in the first column, its IP in the 2nd column):
Name, IP
ex1.com, 10.20.30.40
ex2.com, 20.30.40.30
ex3.com, 10.45.60.20
ex4.com, 10.20.30.40
Here is my code:
echo "Name,IP" > myFile.csv # let's overwrite, not appending
while IFS= read -r line; do
ipValue= # initialize the value
while IFS= read -r ip; do
if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
ipValue+="${ip}-" # append the results with "-"
fi
done < <(dig +short "$line") # assuming the result has multi-line
ipValue=${ipValue%-} # remove trailing "-" if any
if [[ -n $ipValue ]]; then
# if the IP is not empty
echo "$line,$ipValue" >> myFile.csv
fi
done < URLs.txt
I want to add another column to myFile.csv for keeping open ports of each IP. So output would be like this:
Name, IP, Port
ex1.com, 10.20.30.40, 21/tcp
ex2.com, 20.30.40.30, 20/tcp
ex3.com, 10.45.60.20, 33/tcp
ex4.com, 10.20.30.40, 21/tcp
I want to use Nmap to do this. After I choose an IP address from the 2nd column of myFile.csv and find its open ports using Nmap, I want to write the Nmap result to the corresponding cell of the 3rd column.
Also, if the same IP appears again in the 2nd column, I want to write the Nmap result for that line too, without running Nmap again for the duplicated IP. For example, there are two "10.20.30.40" entries in the 2nd column of my example: I want to run Nmap just once, for the 1st "10.20.30.40", and write the same result for the 2nd one as well.
For this to happen, I changed the first line of my code to this:
echo "Name,IP,Port" > myFile.csv
and also here is the Nmap code to find the open ports:
nmap -v -Pn -p 1-100 $ipValue -oN out.txt
port=$(grep '^[0-9]' out.txt | tr '\n' '*' | sed 's/*$//')
but I don't know what to do next and how to apply these changes to my code.
I updated my code to something like this:
echo "Name,IP" > myFile.csv # let's overwrite, not appending
while IFS= read -r line; do
ipValue= # initialize the value
while IFS= read -r ip; do
if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
ipValue+="${ip}-" # append the results with "-"
fi
done < <(dig +short "$line") # assuming the result has multi-line
ipValue=${ipValue%-} # remove trailing "-" if any
if [[ -n $ipValue ]]; then
# if the IP is not empty
nmap -v -Pn -p 1-100 $ipValue -oN out.txt
port=$(grep '^[0-9]' out.txt | tr '\n' '*' | sed 's/*$//')
echo "$line,$ipValue,$port" >> myFile.csv
fi
done < URLs.txt
But this way Nmap is run to find the open ports of the duplicated IPs too, which is what I wanted to avoid. What should I do?
Here's a modified version of your script that roughly does what you want:
#!/usr/bin/env bash

# cache maps from IP addresses to open ports
declare -A cache

getports() {
    local ip=$1
    nmap -v -Pn -p 1-100 "$ip" -oG - \
        | awk -F '\t' '
            /Ports:/ {
                n = split($2, a, /,? /)
                printf "%s", a[2]
                for (i = 3; i <= n; ++i)
                    printf ":%s", a[i]
            }
        '
}

{
    echo 'Name,IP,Port'
    while IFS= read -r url; do
        # Read filtered dig output into array
        readarray -t ips < <(dig +short "$url" | grep -E '^([0-9]+\.){3}[0-9]+$')

        # Skip this URL if no IP addresses were found
        (( ${#ips[@]} > 0 )) || continue

        # Build array of open ports
        unset ports
        for ip in "${ips[@]}"; do
            ports+=("${cache["$ip"]:=$(getports "$ip")}")
        done

        # Output
        printf '%s,%s,%s\n' \
            "$url" \
            "$(IFS='-'; echo "${ips[*]}")" \
            "$(IFS='-'; echo "${ports[*]}")"
    done < URLs.txt
} > myFile.csv
The readarray line reads the filtered output from dig into an array of IP addresses; if that array has length zero, the rest of the loop is skipped.
Then, for each element of the ips array, we get the ports. To avoid calling nmap if we've seen the IP address before, we use the ${parameter:=word} parameter expansion: if ${cache["$ip"]} is non-empty, use it; otherwise call the getports function and store the output in the cache associative array.
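As a standalone sketch of that caching idiom (slow_lookup is a hypothetical stand-in for an expensive command like nmap):

declare -A cache
slow_lookup() { sleep 1; echo "result for $1"; }   # hypothetical expensive command
a=${cache[x]:=$(slow_lookup x)}   # cache[x] is unset: slow_lookup runs and its output is stored
b=${cache[x]:=$(slow_lookup x)}   # cache[x] is now non-empty: no call, the cached value is used
echo "$a"; echo "$b"              # both print "result for x"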
getports is called for IP addresses we haven't seen before; I've used -oG ("grepable output") to make parsing easier. The awk command filters for lines containing Ports:, which look something like
Host: 52.94.225.242 () Ports: 80/open/tcp//http/// Ignored State: closed (99)
with tab-separated fields. We then split the second field on the regular expression /,? / (an optional comma followed by a blank) and print all but the first element of the resulting array, colon separated.
Finally, we print the line of CSV data; if ips or ports contain more than one element, we want to join the elements with -, which is achieved by setting IFS in the command substitution and then printing the arrays with [*].
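That join trick can be seen in isolation (a small demo):

arr=(10.20.30.40 10.20.30.41)
joined=$(IFS='-'; echo "${arr[*]}")   # with [*], elements are joined by the first character of IFS
echo "$joined"                        # -> 10.20.30.40-10.20.30.41

Setting IFS inside the command substitution keeps the change local to that subshell.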
The initial echo and the loop are grouped within curly braces so output redirection has to happen just once.

Shell script to print the lines which contain a word entered by the user

I have a file named data.txt, which contains the following:
1440;150;1000000;pizza;hamburger
1000;180;56124;coke;sprite;water;juice
566;40;10000;cake;pizza;coke
I want to make a program which asks for an input from the user and then prints out the lines which contain the given word.
For example:
If I enter coke, it should print out the second and third lines. If I enter hamburger, it should only print out the first line.
Here is the code that I tried but it doesn't work. Can anybody help me please?
echo "Enter a word"`
read word
while read line; do
numbersinthefile=$(echo $line | cut -d';' -f4);
if [ $numbersinthefile -eq $num ]; then
echo $line;
fi
done
Earlier I forgot to mention that I want the program to allow multiple inputs from the user. Example:
If I type in "pizza sprite", it gives me the first and second line.
That's a simple grep, isn't it?
read -p "Enter a word: " word
grep -F "$word" file
Add -w to match whole words only, so that coke is matched by coke and not by partial inputs like co or ok.
read -p "Enter a word: " word
grep -Fw "$word" file
Could you please try the following.
cat script.ksh
echo "Please enter word which you want to look for in Input_file:"
read value
awk -v val="$value" '$0 ~ val' Input_file
After running the above code, this is how it will work:
./script.ksh
Please enter word which you want to look for in Input_file:
coke
1000;180;56124;coke;sprite;water;juice
566;40;10000;cake;pizza;coke
EDIT: In case you want to pass multiple values to the script, how about passing them as arguments to the program itself?
cat script.ksh
for var in "$#"
do
awk -v val="$var" '$0 ~ val' Input_file
done
Then run the script in the following fashion:
script.ksh test coke cake etc
Here is one in awk that accepts partial matches:
$ awk '
BEGIN {
    FS=";"                       # file field separator
    printf "Feed me text: "      # text for prompt
    if((getline s < "-")<=0)     # read user input
        exit                     # exit if unsuccessful
}
{
    for(i=4;i<=NF;i++)           # iterate fields starting from the 4th
        if($i~s) {               # partial match with ~ (also tolerates stray spaces in the data)
            print
            next                 # only output each matching record once
        }
}' file
Output
Feed me text: coke
1000;180;56124;coke;sprite;water;juice
566;40;10000;cake;pizza;coke

Extract value of column from a line (variable)

Okay, so I have a variable ($line) that is defined in the bash/shell script as
$line = "abc:123:def:345"
and I need to get the 2nd column from it: column2 = "123"
How do I extract the value of the 2nd column, i.e. "123", and assign it to a different variable which can be summed later on? I know you have to separate it based on the delimiter ':', but I don't know how to transfer it to a different variable while taking input from the $line variable. I only ask because, for some weird reason, my code reads the first line of the text file but doesn't perform the awk on just that first line, so the sum is wrong.
FILE=$1
while read line
do
    awk -F: '{summation += $3;}END{print summation;}'
done < $FILE
-code via shell script
Thanks.
You can use awk to get the second field:
line="abc:123:def:345"
awk -F: '{print $2}' <<< "$line"
123
To assign a variable in the shell: no $ on the left-hand side, no spaces around the =, and note that < and > are not valid quote characters.
line="abc:123:def:345"
In bash, you would do this:
IFS=: read -ra fields <<< "$line"
temporarily set IFS to a colon
use the $line variable as input to the read command (a here-string)
and read the values into the fields array.
Bash arrays are indexed starting from zero, so to extract the 2nd field:
echo "${fields[1]}" # => 123
Another way in bash is using expr:
Line2=$(expr "$line" : "[^:]*:\([^:]*\)")
or if the fields are always 3 characters
Line2=${line:4:3}
Using plain POSIX features:
#!/usr/bin/env sh
line='abc:123:def:345'
IFS=:
# Split line on IFS into arguments
set -- $line
printf %s\\n "$2"
Alternate method:
#!/usr/bin/env sh
line='abc:123:def:345'
# Strip out first column
a="${line#*:}"
# Strip out remaining columns except 1st
b="${a%%:*}"
printf %s\\n "$b"

Set variable from awk while parsing lines from a multiline file

I've got a txt file with several lines, each one describing a remote server, like this:
user@server:port:remote_working_path:whether_using_VPN
The : char separates the 4 fields.
I need to operate batch actions within each server, hence I need to parse each line and set appropriate variables. Right now, what I've coded is this:
while read server;
do
    echo "$server" | awk -F ':' '{print $1}' &&
    echo "$server" | awk -F ':' '{print $2}' &&
    echo "$server" | awk -F ':' '{print $3}'
    echo "$VPN"
    declare $( echo "$server" | awk -F ':' '{print $VPN=$4}' )
    echo 'VPN: '$VPN
done < $CUSTOMER_SERVERS_FILE
This script only prints the first 3 fields; my intention is that it should also set the $VPN variable to the 4th field. However this seems badly broken, and I've been unable to fix it. How should I modify it so that $VPN = $4?
First, you don't need to use awk in this case. You could try something like:
while IFS=':' read -ra array; do
    # "${array[0]}" => first field
    # "${array[1]}" => second field
    # ...
    # "${array[@]}" => all fields
done < "$CUSTOMER_SERVERS_FILE"
Then if you want to set the VPN variable to the 4th field, you could use:
while IFS=':' read -ra array; do
    # ...
    VPN="${array[3]}"
done < "$CUSTOMER_SERVERS_FILE"
Another solution:
while IFS=':' read -r address port path vpn trash; do
    # The variables $address, $port, $path and $vpn are assigned.
    # $trash is set with the remaining fields if there are more than 4
done
Finally, when you want to assign the output of a command to a variable, you can do:
var="$(command)"
# or
var="`command`"

how to prevent for loop from using space as delimiter, bash script

I am trying to write a bash script to do multiple checks and searches for a CMS my company uses. I'm trying to implement a function that lets a user search for a certain macro call, and that returns all the files that contain the call, the line the macro is called on, and the actual code in the macro call. What I have seems to be getting screwed up by the fact that I am using a for loop to format the output. Here's the snippet of the script I am working on:
elif [ "$choice" = "2" ]
then
echo -e "\n What macro call are we looking for $name?"
read macrocall
for i in $(grep -inR "$macrocall" $sitepath/templates/macros/); do
file=$(echo $i | cut -d\: -f1 | awk -F\/ '{ print $NF }')
line=$(echo $i | cut -d\: -f2)
calltext=$(echo $i | cut -d\: -f3-)
echo -e "\nFile: $file"
echo -e "\nLine: $line"
echo -e "\nMacro Call from file: $calltext"
done
fi
The current script handles the first few fields fine until it hits a space, and then everything gets screwy. Does anybody have any idea how I can make the for loop's delimiter be each result of the grep? Any suggestions would be helpful. Let me know if any of you need more info. Thanks!
The right way to do this would be more like:
printf "\n What macro call are we looking for %s?" "$name"
read macrocall
# ensure globbing is off and set IFS to a newline after saving original values
oSET="$-"; set -f; oIFS="$IFS"; IFS=$'\n'
awk -v macrocall="$macrocall" '
BEGIN { lc_macrocall = "\\<" tolower(macrocall) "\\>" }
tolower($0) ~ lc_macrocall {
file=FILENAME
sub(/.*\//,"",file)
printf "\n%s\n", file
printf "\n%d\n", FNR
printf "\nMacro Call from file: %s\n", $0
}
' $(find "$sitepath/templates/macros" -type f -print)
# restore original IFS and globbing values
IFS="$oIFS"; set +f -"$oSET"
This solves the problem of having spaces in your file names as originally requested, but also handles globbing characters in your file names, and the various typical echo issues.
You can set the internal field separator $IFS (which is normally set to space, tab and newline) to just a newline to get around this problem:
IFS=$'\n'
Note the $'\n' quoting: plain IFS="\n" would set IFS to the two literal characters backslash and n, not to a newline.
