Hosts file: make a unique file for all servers - sorting

I have many hosts files. I collect them from all the servers, put them together in host_files.txt, and then I must make one hosts file for all servers.
I run this command to make a unique file, but some rows share the same IP address or hostname.
awk '!a[$0]++' host_files.txt
Here is my host_files.txt
#backup server IPs
95.23.23.56
95.23.23.57
#ftp server IPs
45.89.67.5
45.89.67.3
#apache
12.56.35.36
12.56.35.35
#ftp server IPs
95.23.23.50
#apache
12.56.35.37
I want this output file, but I need to keep the comment lines:
#backup server IPs <= comment line, I need to keep these
95.23.23.56
95.23.23.57
#ftp server IPs <= comment line, I need to keep these
45.89.67.5
45.89.67.3
95.23.23.50
#apache <= comment line, I need to keep these
12.56.35.36
12.56.35.35
12.56.35.37
I already tried:
sort -ur host_files.txt
cat host_files.txt | uniq > ok_host.txt
For the lines without #, I just need the IP address. Please help me.
Thanks in advance

In GNU awk, using true multidimensional arrays:
$ awk '
/^#/ { k=$0; next }    # group within identical comments, k is the key to the hash
/./  { a[k][$1]=$0 }   # skip empty records and hash the ips
END {                  # after everything, output
    for(k in a) {
        print k
        for(i in a[k])
            print a[k][i]
    }
}' file*
#apache
12.56.35.35 #apacheprivate
12.56.35.36 #apachepub
12.56.35.37 #apachepub
#ftp server IPs
45.89.67.3 #ftpssh
45.89.67.5 #ftpmain
95.23.23.50 #ftp
#backup server IPs
95.23.23.56 #masterbasckup
95.23.23.57 #agentbasckup
The output is in random order because of for(k in a), i.e. comment groups and the ips within groups come out in no particular order.
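With GNU awk you can also make the order deterministic by setting PROCINFO["sorted_in"]; a sketch, assuming gawk 4.0+ (groups and ips then come out in sorted, not input, order):
awk '
BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }   # iterate arrays in sorted key order
/^#/ { k=$0; next }
/./  { a[k][$1]=$0 }
END {
    for(k in a) {
        print k
        for(i in a[k])
            print a[k][i]
    }
}' file*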

This will work in any awk:
$ cat tst.awk
/^#/ { key = $0; next }
NF && !seen[$0]++ {
    ips[key] = ips[key] $0 ORS
}
END {
    for (key in ips) {
        print key ORS ips[key]
    }
}
$ awk -f tst.awk file
#apache
12.56.35.36 #apachepub
12.56.35.35 #apacheprivate
12.56.35.37 #apachepub
#ftp server IPs
45.89.67.5 #ftpmain
45.89.67.3 #ftpssh
95.23.23.50 #ftp
#backup server IPs
95.23.23.56 #masterbasckup
95.23.23.57 #agentbasckup
Output order will be random due to the use of the in operator; if that's a problem, it's just a couple more lines of code to change, as shown below.
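For example, a sketch that remembers the order in which each comment group first appears (still any awk), assuming that is the order to keep:
$ cat tst2.awk
/^#/ { key = $0; if (!seenkey[key]++) order[++n] = key; next }
NF && !seen[$0]++ {
    ips[key] = ips[key] $0 ORS
}
END {
    for (j = 1; j <= n; j++) {
        key = order[j]
        print key ORS ips[key]
    }
}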

If awk is not a requirement:
#!/bin/ksh
while read line ; do
    [[ $line =~ ^$ ]] && { continue; }              # skip empty lines
    [[ $line =~ ^# ]] && { group=$line; continue; } # remember the group name
    print "$group|$line"                            # print with the group name in front
done < host_files.txt | sort \
| while read line ; do
    if [[ ${line%\|*} != $last ]]; then # if the group name changed
        print "\n${line%\|*}"           # print the group name
        last=${line%\|*}                # remember the new group name
    fi
    print "${line#*\|}"                 # print the entry without the group name
done
The steps:
put the group name in front of the line
sort
detect a changing group name and print it
print the entry without the group name
Using the same concept with awk (avoiding the while loop in shell).
awk '
/^#/ { k=$0; next }
/./  { print k "|" $0 }
' host_files.txt | sort | awk -F '|' '{
    if ( k != $1 ) { print "\n" $1; k = $1 }
    print $2
}' -
Because it does not use an array, it will not lose lines due to duplicate keys.
And, thinking a bit more, the second awk can be avoided by adding the sort key to each line: the header gets the bare comment as its key, the entries get the comment plus 'x', so the header sorts above the rest of its group. In the output, just remove the added sort key.
awk '
/^#/ { k=$0; print k "|" $0; next }
/./  { print k "x|" $0 }
' t18.dat | sort -u | cut -d '|' -f 2

How to assign awk result variable to an array and is it possible to use awk inside another awk in loop

I've started to learn bash and totally stuck with the task. I have a comma separated csv file with records like:
id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
2,1,,,name Surname,department1,,
3,2,,,Name Surname,"department1 department2, department3",,
etc.
I need to format it this way: name and surname must start with a capital letter;
add an email record that consists of the first letter of the name and the full surname, in lowercase;
create a new csv with records from the old csv with the corrected fields.
I split the csv into records using awk (because some fields contain commas between quotes: "department1 department2, department3").
#!/bin/bash
input="$HOME/test.csv"
exec 0<$input
while read line; do
awk -v FPAT='"[^"]*"|[^,]*' '{
...
}' $input
done
inside awk {...} (NF=8 for each record), I tried to use certain field values ($1 $2 $3 $4 $5 $6 $7 $8):
#it doesn't work
IFS=' ' read -a name_surname<<<$5 # Field 5 match to *name* in heading of csv
# Could I use inner awk with field values of outer awk ($5) to separate the field value of outer awk $5 ?
# as an example:
# $5="${awk '{${1^}${2^}}' $5}"
# where ${1^} and ${2^} fields of inner awk
name_surname[0]=${name_surname[0]^}
name_surname[1]=${name_surname[1]^}
$5="${name_surname[0]}' '${name_surname[1]}"
email_name=${name_surname[0]:0:1}
email_surname=${name_surname[1]}
domain='@domain'
$7="${email_name,}${email_surname,,}$domain" # match to field 7 *email* in heading of csv
How do I add the field values ($1 $2 $3 $4 $5 $6 $7 $8) to an array and call the function join on each loop iteration to add the record to a new csv file?
function join { local IFS="$1"; shift; echo "$*"; }
result=$(join , ${arr[@]})
echo $result >> new.csv
This may be what you're trying to do (using gawk for FPAT, as you already were), but without more representative sample input and expected output it's a guess:
$ cat tst.sh
#!/usr/bin/env bash
awk '
BEGIN {
    OFS = ","
    FPAT = "[^"OFS"]*|\"[^\"]*\""
}
NR > 1 {
    n = split($5,name,/\s+/)
    $7 = tolower(substr(name[1],1,1) name[n]) "@example.com"
    print
}
' "${@:--}"
$ ./tst.sh test.csv
1,1,,,Name surname,department1 department2 department3,nsurname@example.com,
2,1,,,name Surname,department1,nsurname@example.com,
3,2,,,Name Surname,"department1 department2, department3",nsurname@example.com,
I put the awk script inside a shell script since that looks like what you want; obviously you don't need to do that, you could just save the awk script in a file and invoke it with awk -f.
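The question also asks for the name and surname to be capitalized; a hedged fragment that could be added inside the same NR > 1 block, assuming plain ASCII names and that only the first and last words of field 5 matter:
n = split($5, name, /\s+/)
name[1] = toupper(substr(name[1],1,1)) substr(name[1],2)   # capitalize the first name
name[n] = toupper(substr(name[n],1,1)) substr(name[n],2)   # capitalize the surname
$5 = name[1] " " name[n]   # assumes field 5 holds exactly two words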
Ed Morton's answer works completely.
In case it's helpful for someone, I added one more check: if the CSV file contains more than one email address with the same name, an index number is appended to the email local part, and the output is sent to a file.
#!/usr/bin/env bash
input="$HOME/test.csv"
exec 0<$input
awk '
BEGIN {
    OFS = ","
    FPAT = "[^"OFS"]*|\"[^\"]*\""
}
(NR == 1) { print } # header of csv
(NR > 1) {
    if (length($0) > 1) { # exclude empty lines
        count = 0
        n = split($5,name,/\s+/)
        email_local_part = tolower(substr(name[1],1,1) name[n])
        # array stores emails from the csv file
        a[i++] = email_local_part
        # find the number of occurrences of the same email address
        for (el in a) {
            ret = match(a[el], email_local_part)
            if (ret == 1) { count++ }
        }
        # add the occurrence number to the email address
        if (count == 1) { $7 = email_local_part "@abc.com" }
        else { --count; $7 = email_local_part count "@abc.com" }
        print
    }
}
' "${@:--}" > new.csv

Grep returns 'grep: node_list.txt: No such file or directory' error

I have a script that reads from a file.
################################################
#         IP TABLES FOR INSTALL_CONFIG         #
#                                              #
#  m = master                                  #
#  k = kibana                                  #
#  d = data                                    #
#  i = ingest                                  #
#  c = coordinator                             #
#  Format: xxx.xxx.xxx.xxx m                   #
################################################
#
10.1.7.93 m
10.1.7.94 k
10.1.7.95 d
This is the function that the script uses.
function readIpFile () {
initMasterVar=0
grep "^[^# ]" node_list.txt | awk '$2 ~ /m/ { print $1 }' > tmp_master_list.txt
grep "^[^# ]" node_list.txt | awk '$2 ~ /k/ { print $1 }' > tmp_kibana_list.txt
grep "^[^# ]" node_list.txt | awk '$2 ~ /i/ { print $1 }' > tmp_ingest_list.txt
grep "^[^# ]" node_list.txt | awk '$2 ~ /d/ { print $1 }' > tmp_data_list.txt
grep "^[^# ]" node_list.txt | awk '$2 !~ /k/ { print $1 }' > tmp_all_nodes.txt
}
The function's purpose is to read from a master node list; it then sorts the list into tmp files according to the role each IP or FQDN is assigned. The grep filters out all lines that begin with # or a space, and awk searches the second field for the role and prints the IP with that role, redirected into a tmp file which is used later in the script.
My problem is that this function was working fine before. The commands individually work in my terminal, and grep is able to locate the file and filter it accordingly. However, when put inside this function in this script, it breaks.
I am unsure what I am doing wrong. My script, when put into shellcheck, turns up no errors that would cause this.
A couple of us mentioned doing all this sorting in a single awk script instead of 5 different pipelines as an optimization - that way, the file only has to be read once. One way to do that is using in-awk output redirection:
awk '/^[# ]/ { next } # Skip lines starting with a # or space.
$2 ~ /m/ { print $1 > "/path/to/tmp_master_list.txt" }
$2 ~ /k/ { print $1 > "/path/to/tmp_kibana_list.txt" }
$2 ~ /i/ { print $1 > "/path/to/tmp_ingest_list.txt" }
$2 ~ /d/ { print $1 > "/path/to/tmp_data_list.txt" }
$2 !~ /k/ { print $1 > "/path/to/tmp_all_nodes.txt" }' /path/to/node_list.txt
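For example, readIpFile could become something like this sketch. It keeps the bare node_list.txt path from the question, which only resolves if the script runs from the directory containing that file - a likely cause of the 'No such file or directory' error; use absolute paths otherwise:
function readIpFile () {
    initMasterVar=0
    awk '/^[# ]/  { next }                               # skip comments and indented lines
         $2 ~ /m/  { print $1 > "tmp_master_list.txt" }
         $2 ~ /k/  { print $1 > "tmp_kibana_list.txt" }
         $2 ~ /i/  { print $1 > "tmp_ingest_list.txt" }
         $2 ~ /d/  { print $1 > "tmp_data_list.txt" }
         $2 !~ /k/ { print $1 > "tmp_all_nodes.txt" }' node_list.txt
}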

Turning multi-line string into single comma-separated list in Bash

I have this format:
host1,app1
host1,app2
host1,app3
host2,app4
host2,app5
host2,app6
host3,app1
host4... and so on.
I need it like this format:
host1;app1,app2,app3
host2;app4,app5,app6
I have tried this: awk -vORS=, '{ print $2 }' data | sed 's/,$/\n/'
and it gives me this:
app1,app2,app3 - without the host in front.
I do not want to show duplicates.
I do not want this:
host1;app1,app1,app1,app1...
host2;app1,app1,app1,app1...
I want this format:
host1;app1,app2,app3
host2;app2,app3,app4
host3;app2,app3
With input sorted on the first column (as in your example; otherwise just pipe it to sort), you can use the following awk command:
awk -F, 'NR == 1 { currentHost=$1; currentApps=$2 }
NR > 1 && currentHost == $1 { currentApps=currentApps "," $2 }
NR > 1 && currentHost != $1 { print currentHost ";" currentApps; currentHost=$1; currentApps=$2 }
END { print currentHost ";" currentApps }'
It has the advantage over other solutions posted as of this edit to avoid holding the whole data in memory. This comes at the cost of needing the input to be sorted (which is what would need to put lots of data in memory if the input wasn't sorted already).
Explanation:
the first line initializes the currentHost and currentApps variables to the values from the first line of the input
the second line handles a line with the same host as the previous one: the app mentioned in the line is appended to the currentApps variable
the third line handles a line with a different host than the previous one: the info for the previous host is printed, then we reinitialize the variables to the values of the current line of input
the last line prints the info for the current host when we have reached the end of the input
It probably can be refined (so much redundancy!), but I'll leave that to someone more experienced with awk.
$ awk '
BEGIN { FS=","; ORS="" }
$1!=prev { print ors $1; prev=$1; ors=RS; OFS=";" }
{ print OFS $2; OFS=FS }
END { print ors }
' file
host1;app1,app2,app3
host2;app4,app5,app6
host3;app1
Maybe something like this:
#!/bin/bash
declare -A hosts
while IFS=, read host app
do
    [ -z "${hosts["$host"]}" ] && hosts["$host"]="$host;"
    hosts["$host"]+=$app,
done < testfile
printf "%s\n" "${hosts[@]%,}" | sort
The script reads the sample data from testfile and outputs to stdout.
You could try this awk script:
awk -F, '{a[$1]=($1 in a?a[$1]",":"")$2}END{for(i in a) printf "%s;%s\n",i,a[i]}' file
The script creates an entry in the array a for each unique element in the first column. It appends all the elements from the second column to that array entry.
When the file has been parsed, the content of the array is printed.
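Since the question also says duplicates should not be shown, a hedged variant of the same script that skips host/app pairs it has already seen:
awk -F, '!seen[$0]++ { a[$1] = ($1 in a ? a[$1] "," : "") $2 }
END { for (i in a) printf "%s;%s\n", i, a[i] }' file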

nslookup for IP & replace the IP with the resulting FQDN

Requirement
I have a txt file in which the last column has URLs.
Some of the URL entries have IPs instead of FQDNs.
So, for entries with IPs (e.g. url=https://174.37.243.85:443*), I need to do a reverse nslookup for the IP and replace the IP with the resulting FQDN.
Text File Input
httpMethod=SSL-SNI destinationIPAddress=174.37.243.85 url=https://174.37.243.85:443*
httpMethod=SSL-SNI destinationIPAddress=183.3.226.92 url=https://pingtas.qq.com:443/*
httpMethod=SSL-SNI destinationIPAddress=184.173.136.86 url=https://v.whatsapp.net:443/*
Expected Output
httpMethod=SSL-SNI destinationIPAddress=174.37.243.85 url=https://55.f3.25ae.ip4.static.sl-reverse.com:443/*
httpMethod=SSL-SNI destinationIPAddress=183.3.226.92 url=https://pingtas.qq.com:443/*
httpMethod=SSL-SNI destinationIPAddress=184.173.136.86 url=https://v.whatsapp.net:443/*
Here's a quick and dirty attempt in pure Awk.
awk '$3 ~ /^url=https?:\/\/[0-9.]*([:\/?*].*)?$/ {
    # Parse out the hostname part
    split($3, n, /[\/:?\*]+/);
    cmd = "dig +short -x " n[2]
    cmd | getline reverse;
    sub(/\.$/, "", reverse);
    close(cmd)
    # Figure out the tail after the hostname part
    match($3, /^url=https?:\/\/[0-9.]*/); # update RSTART/RLENGTH
    $3 = n[1] "://" reverse substr($3, RSTART+RLENGTH) } 1' file
If you don't have dig, you might need to resort to nslookup or host instead; but the only one of these which portably offers properly machine-readable output is dig so you might want to install it for that feature alone.
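For instance, the dig lines above could be swapped for a host-based lookup; a sketch, assuming the usual "domain name pointer" output format of host:
cmd = "host " n[2]
cmd | getline reverse    # e.g. "85.243.37.174.in-addr.arpa domain name pointer foo.example.com."
close(cmd)
sub(/^.* /, "", reverse) # keep only the last whitespace-separated word
sub(/\.$/, "", reverse)  # strip the trailing dot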
Solution 1: within a single awk, added after discussion in the comments:
awk '
{
    if(match($0,/\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)){
        val_match=substr($0,RSTART+1,RLENGTH-1)
        system("nslookup " val_match " > temp")
    }
    val=$0
    while(getline < "temp"){
        if($0 ~ /name/){
            num=split($0, array," ")
            sub(/\./,"",array[num])
            sub(val_match,array[num],val)
            print val
        }
    }
}
NF
' Input_file
Solution 2: my initial solution with awk and shell.
The following simple script may help:
cat script.ksh
CHECK_IP () {
    fqdn=$(echo "$1" | awk '{if(match($0,/\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)){system("nslookup " substr($0,RSTART+1,RLENGTH-1))}}')
    actual_fqdn=$(echo "$fqdn" | awk '/name/{sub(/\./,"",$NF);print $NF}')
    echo "$actual_fqdn"
}
while read line
do
    val=$(CHECK_IP "$line")
    if [[ -n "$val" ]]
    then
        echo "$line" | awk -v var="$val" '{if(match($0,/\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)){ip_val=substr($0,RSTART+1,RLENGTH-1);sub(ip_val,var)}} 1'
    else
        echo "$line"
    fi
done < "Input_file"

Extract specific string between two strings and list the required content

How do I find the block name in which the string is available?
server.conf file
server_pool odd {
0:server1:yes:profile_server1:192.168.1.1,192.168.1.2;
1:server3:yes:profile_server3:192.168.1.5,192.168.1.6;
}
server_pool even {
0:server2:yes:profile_server2:192.168.1.3,192.168.1.4;
1:server4:yes:profile_server4:192.168.1.7,192.168.1.8;
}
#server_pool even {
# 0:server1:yes:profile_server1:192.168.1.1,192.168.1.2;
# 1:server3:yes:profile_server3:192.168.1.5,192.168.1.6;
#}
Notes:
"server_pool" is a static string
"pool_name" can be any string without spaces
if a line has # in it, ignore it
Requirement
Need to find the "pool_name" for a server hostname provided as input, i.e. server{1,2,3,...}, and store it in a variable.
For example:
if I need to find which block/stanza server1 belongs to: in the given use case it belongs to odd, so store the variable as POOLNAME=odd
grep -oP '^server\s\K[^ ]+|^[^#]\s+\d+:\K[^:]+' inputfile
pool0
server1
server2
pool1
server3
server4
Using awk
awk -F'[: ]+' '/}/{p=0}/^#|}/||!NF{next}/pool[0-9]+[ \t]+?{/{if(h)print "";p=1;print $2;next}p{print $3;h=1}' file
More readable:
awk -F'[: ]+' '
# if the line contains }, set variable p=0
/}/{
    p=0
}
# if the line starts with #, contains a closing }, or is empty, skip it
/^#|}/ || !NF{
    next
}
# if the line matches pool[0-9]+, an optional space or tab, then {:
#   if variable h was set before, print an empty line;
#   set p=1, print the 2nd field, and go to the next line
/pool[0-9]+[ \t]+?{/{
    if(h)print ""
    p=1
    print $2
    next
}
# as long as p is set, print the 3rd field of the record,
# and set h=1 so a newline separates the next pool
p{
    print $3
    h=1
}
' file
Here are the test results:
Input:
$ cat file
server pool0 {
0:server1:yes:profile_server1:192.168.1.1,192.168.1.2;
1:server2:yes:profile_server2:192.168.1.3,192.168.1.4;
}
server pool1 {
0:server3:yes:profile_server3:192.168.1.5,192.168.1.6;
1:server4:yes:profile_server4:192.168.1.7,192.168.1.8;
}
#server pool2 {
# 0:server5:yes:profile_server5:192.168.1.9,192.168.1.10;
# 1:server6:yes:profile_server6:192.168.1.11,192.168.1.12;
#}
Output:
$ awk -F'[: ]+' '/\}/{p=0}/^#|\}/||!NF{next}/pool[0-9]+[ \t]+?\{/{if(h)print "";p=1;print $2;next}p{print $3;h=1}' file
pool0
server1
server2
pool1
server3
server4
GRPNAME="server pool0 {"
GRPNAME=${GRPNAME%\{*}; GRPNAME=${GRPNAME#*\ }
where:
${GRPNAME%\{*} = delete everything from the end ("%") until the 1st "{"; the "\" is an escape character
${GRPNAME#*\ } = delete everything from the beginning ("#") through the 1st space; the "\" is an escape character
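A usage sketch against the answers' "server pool0 {" test format (note the original spacing leaves a trailing space worth trimming):
GRPNAME="server pool0 {"
GRPNAME=${GRPNAME%\{*}   # -> "server pool0 "
GRPNAME=${GRPNAME#*\ }   # -> "pool0 "
GRPNAME=${GRPNAME% }     # trim the trailing space
echo "$GRPNAME"          # -> pool0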
The following awk may help:
awk -F' +|:' '/^$/{flag="";next} /^server pool/{print $2;flag=1;next} flag && NF && !/}/{print $3}' Input_file
EDIT: If your pool block could have many entries other than servers, I have added an additional check for that; try it and let me know.
awk -F' +|:' '/^$/{flag="";next} /^server pool/{print $2;flag=1;next} flag && NF && !/}/ && $3~/server/{print $3}' Input_file
EDIT 2: Showing that the code produces exactly the output the OP expects.
awk -F' +|:' '/^$/{flag="";next} /^server pool/{print $2;flag=1;next} flag && NF && !/}/ && $3~/server/{print $3}' Input_file
pool0
server1
server2
pool1
server3
server4
egrep -o "^server pool[0-9]|^[^#][ ]+[0-9]:server[0-9]" file.txt | cut -d ':' -f2 | sed 's/\(server pool[1-9]\)/\n\1/g'
Output
server pool0
server1
server2
server pool1
server3
server4
Note
I supposed that pools always start at 0 and that you cannot have a pool with an index > 9. If that's not the case you can change, for example, [0-9] to [0-9]{1,2} to accept numbers between 0 and 99.
This might work for you (GNU sed):
sed -nr '/^(server \S+).*/{s//\1/p;:a;n;s/^(([^:]*):){2}.*/\2/p;ta}' file
Focus on lines that begin with server and extract the first two words from those lines. From subsequent lines, extract the second field (using : as the separator) until a failure to match occurs.
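To land directly on the question's requirement (POOLNAME=odd for server1, against the server_pool format from the question), a minimal awk sketch, assuming one hostname per entry line and the colon-separated layout shown:
POOLNAME=$(awk -v host="server1" '
    /^[ \t]*#/ { next }                          # ignore commented lines
    /^server_pool[ \t]/ { pool = $2; next }      # remember the current pool name
    { n = split($0, f, ":") }                    # entries look like idx:host:flag:profile:ips;
    n >= 2 && f[2] == host { print pool; exit }  # the second colon field is the hostname
' server.conf)
echo "$POOLNAME"   # -> odd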
