Bash compare previous variable - bash

I am trying to parse a CSV file and rewrite it with an extra field; the CSV file is shown below.
The file contains 192.168 IP addresses and 10.0 IP addresses.
The 192.168 addresses are endpoints and the 10.0 addresses are VoIP phones.
If Switch ID, Switch Port ID, and Description are equal, the 10.0 IP is the VoIP phone for the 192.168 IP address.
For example:
192.168.205.76,189,FC3F.DB02.ED78,basement-k001a-asw1,GigabitEthernet3/1,basement-access,K022E-NB1-C1
10.0.40.46,1640,F025.7279.6DAA,basement-k001a-asw1,GigabitEthernet3/1,basement-access,K022E-NB1-C1
Both are on basement-k001a-asw1, GigabitEthernet3/1, K022E-NB1-C1.
So 10.0.40.46 is the VoIP phone for the 192.168.205.76 endpoint.
What I am trying to do: if an IP starts with 10.0 and its switch ID, port ID, and description match the previous line, I would like to append the VoIP IP to the end of the 192.168 line.
I can keep the previous and current values in global variables and update them as the for loop continues.
Example:
#!/bin/bash
previous_ip=
current_ip=
previous_location=
current_location=
for systems in $(cat list.csv)
do
    previous_ip=$current_ip
    current_ip=$(echo "$systems" | cut -d, -f1)
    previous_location=$current_location
    current_location=$(echo "$systems" | cut -d, -f4,5,7)
    # pass the values as arguments so they are never interpreted as a format string
    printf '%s,%s,%s,%s\n' "$previous_ip" "$current_ip" "$previous_location" "$current_location"
done
previous_ip=$current_ip
current_ip=
previous_location=$current_location
current_location=
I hope I explained it well. Any help is appreciated.
Thanks
Here is sample csv I have.
192.168.205.76,189,FC3F.DB02.ED78,basement-k001a-asw1,GigabitEthernet3/1,basement-access,K022E-NB1-C1
10.0.40.46,1640,F025.7279.6DAA,basement-k001a-asw1,GigabitEthernet3/1,basement-access,K022E-NB1-C1
10.68.194.185,1189,9C93.4E2D.EE1A,basement-k001a-asw1,GigabitEthernet3/3,basement-access,K022D-NB2-C2
192.168.189.26,189,9C8E.99DD.A49F,basement-k001a-asw1,GigabitEthernet3/4,basement-access,K013-EB3-C1
10.0.40.29,1640,1CDE.A783.EA7B,basement-k001a-asw1,GigabitEthernet3/4,basement-access,K013-EB3-C1
192.168.189.230,189,EC9A.7435.2177,basement-k001a-asw1,GigabitEthernet3/6,basement-access,K024-SB1-C1
192.168.189.34,189,70F3.95C1.11F8,basement-k001a-asw1,GigabitEthernet3/8,basement-access,K020-CF7-C1
10.0.40.45,1640,0008.2FB7.6F84,basement-k001a-asw1,GigabitEthernet3/11,basement-access,K002A-NB1-C1
192.168.189.22,189,8851.FB82.5DE3,basement-k001a-asw1,GigabitEthernet3/12,basement-access,K022D-NB1-C2
10.0.40.28,1640,3CCE.73AC.ED44,basement-k001a-asw1,GigabitEthernet3/12,basement-access,K022D-NB1-C2
192.168.189.225,189,9C93.4E4D.1DDA,basement-k001a-asw1,GigabitEthernet3/13,basement-access,K022D-NB2-C1
10.68.189.182,1189,001C.9B09.0504,basement-k001a-asw1,GigabitEthernet3/15,basement-access,K006-NW1-C1
10.0.40.42,1640,1CDE.A783.B19B,basement-k001a-asw1,GigabitEthernet3/16,basement-access,K005-NB1-C1
10.68.189.181,1189,9C93.4E16.D940,basement-k001a-asw1,GigabitEthernet3/17,basement-access,K004-WB1-C2
192.168.189.233,1189,9C93.4E67.2017,basement-k001a-asw1,GigabitEthernet3/27,basement-access,K013-SB1-C1
10.68.189.52,1189,0040.580D.157E,basement-k001a-asw1,GigabitEthernet3/28,basement-access,K009HALL-EW5-C1(KRONOS)
192.168.189.31,189,984B.E17D.5BE1,basement-k001a-asw1,GigabitEthernet3/34,basement-access,K013-WB1-C1
192.168.189.222,189,68B5.9941.32CE,basement-k001a-asw1,GigabitEthernet3/35,basement-access,K004-NB3-C1
10.0.40.56,1640,0CD9.9691.B9C3,basement-k001a-asw1,GigabitEthernet3/36,basement-access,K024HALL-WW1-C1
192.168.189.223,189,3CD9.2B0F.E714,basement-k001a-asw1,GigabitEthernet3/39,basement-access,K006-EB1-C2
10.0.40.44,1640,1CDE.A782.1A7E,basement-k001a-asw1,GigabitEthernet3/41,basement-access,K011-NB1-C2
192.168.189.224,189,1458.D039.9735,basement-k001a-asw1,GigabitEthernet3/42,basement-access,K013-WB2-C2
192.168.189.23,189,D4C9.EFD8.1490,basement-k001a-asw1,GigabitEthernet3/43,basement-access,K013-WB2-C1
10.0.40.30,1640,1CDE.A783.A7CD,basement-k001a-asw1,GigabitEthernet3/43,basement-access,K013-WB2-C1
192.168.189.25,189,8851.FB81.72E4,basement-k001a-asw1,GigabitEthernet3/44,basement-access,K002A-WB1-C2
192.168.189.29,189,D4C9.EFD3.E39B,basement-k001a-asw1,GigabitEthernet3/45,basement-access,K002A-WB1-C1
10.0.40.22,1640,3820.5618.1630,basement-k001a-asw1,GigabitEthernet3/45,basement-access,K002A-WB1-C1
10.0.40.39,1640,3820.5618.169B,basement-k001a-asw1,GigabitEthernet3/46,basement-access,K002A-SB1-C2
192.168.189.221,189,001A.4B1C.F810,basement-k001a-asw1,GigabitEthernet3/46,basement-access,K002A-SB1-C2
192.168.189.27,189,F4CE.4613.FF62,basement-k001a-asw1,GigabitEthernet3/47,basement-access,K002A-SB1-C1
10.0.40.25,1640,1CDE.A783.A92C,basement-k001a-asw1,GigabitEthernet3/47,basement-access,K002A-SB1-C1
172.16.45.183,45,0040.1135.7FC6,zph-04721-asw1,GigabitEthernet1/0/15,zph-access,04740-WB1-C1(SECURITY)
10.50.10.183,1045,0040.1935.7AC2,zph-04721-asw1,GigabitEthernet1/0/15,zph-access,04740-WB1-C1(SECURITY)
172.16.45.241,45,00C0.B792.8CD1,zph-04721-asw1,GigabitEthernet1/0/25,zph-access,04721-NETBOTZ
10.50.10.241,1045,1AD1.B792.8AD1,zph-04721-asw1,GigabitEthernet1/0/25,zph-access,04721-NETBOTZ
192.168.189.2,189,00C0.B7B6.3A1A,basement-k001a-asw1,GigabitEthernet3/48,basement-access,Connectiontobasement-k001a-ups1
192.168.x.x and 172.16.x.x are endpoints.
10.0.x.x and 10.50.x.x are phones.
Locations match on $4,$5,$7.
Expected result:
192.168.205.76,189,FC3F.DB02.ED78,basement-k001a-asw1,GigabitEthernet3/1,basement-access,K022E-NB1-C1,10.0.40.46
10.0.40.46,1640,F025.7279.6DAA,basement-k001a-asw1,GigabitEthernet3/1,basement-access,K022E-NB1-C1,N/A
10.68.194.185,1189,9C93.4E2D.EE1A,basement-k001a-asw1,GigabitEthernet3/3,basement-access,K022D-NB2-C2,N/A
192.168.189.26,189,9C8E.99DD.A49F,basement-k001a-asw1,GigabitEthernet3/4,basement-access,K013-EB3-C1,10.0.40.29
10.0.40.29,1640,1CDE.A783.EA7B,basement-k001a-asw1,GigabitEthernet3/4,basement-access,K013-EB3-C1,N/A
192.168.189.230,189,EC9A.7435.2177,basement-k001a-asw1,GigabitEthernet3/6,basement-access,K024-SB1-C1,N/A
192.168.189.34,189,70F3.95C1.11F8,basement-k001a-asw1,GigabitEthernet3/8,basement-access,K020-CF7-C1,N/A
10.0.40.45,1640,0008.2FB7.6F84,basement-k001a-asw1,GigabitEthernet3/11,basement-access,K002A-NB1-C1,N/A
192.168.189.22,189,8851.FB82.5DE3,basement-k001a-asw1,GigabitEthernet3/12,basement-access,K022D-NB1-C2,10.0.40.28
10.0.40.28,1640,3CCE.73AC.ED44,basement-k001a-asw1,GigabitEthernet3/12,basement-access,K022D-NB1-C2,N/A
192.168.189.225,189,9C93.4E4D.1DDA,basement-k001a-asw1,GigabitEthernet3/13,basement-access,K022D-NB2-C1,N/A
10.68.189.182,1189,001C.9B09.0504,basement-k001a-asw1,GigabitEthernet3/15,basement-access,K006-NW1-C1,N/A
10.0.40.42,1640,1CDE.A783.B19B,basement-k001a-asw1,GigabitEthernet3/16,basement-access,K005-NB1-C1,N/A
10.68.189.181,1189,9C93.4E16.D940,basement-k001a-asw1,GigabitEthernet3/17,basement-access,K004-WB1-C2,N/A
192.168.189.233,1189,9C93.4E67.2017,basement-k001a-asw1,GigabitEthernet3/27,basement-access,K013-SB1-C1,N/A
10.68.189.52,1189,0040.580D.157E,basement-k001a-asw1,GigabitEthernet3/28,basement-access,K009HALL-EW5-C1(KRONOS),N/A
192.168.189.31,189,984B.E17D.5BE1,basement-k001a-asw1,GigabitEthernet3/34,basement-access,K013-WB1-C1,N/A
192.168.189.222,189,68B5.9941.32CE,basement-k001a-asw1,GigabitEthernet3/35,basement-access,K004-NB3-C1,N/A
10.0.40.56,1640,0CD9.9691.B9C3,basement-k001a-asw1,GigabitEthernet3/36,basement-access,K024HALL-WW1-C1,N/A
192.168.189.223,189,3CD9.2B0F.E714,basement-k001a-asw1,GigabitEthernet3/39,basement-access,K006-EB1-C2,N/A
10.0.40.44,1640,1CDE.A782.1A7E,basement-k001a-asw1,GigabitEthernet3/41,basement-access,K011-NB1-C2,N/A
192.168.189.224,189,1458.D039.9735,basement-k001a-asw1,GigabitEthernet3/42,basement-access,K013-WB2-C2,N/A
192.168.189.23,189,D4C9.EFD8.1490,basement-k001a-asw1,GigabitEthernet3/43,basement-access,K013-WB2-C1,10.0.40.30
10.0.40.30,1640,1CDE.A783.A7CD,basement-k001a-asw1,GigabitEthernet3/43,basement-access,K013-WB2-C1,N/A
192.168.189.25,189,8851.FB81.72E4,basement-k001a-asw1,GigabitEthernet3/44,basement-access,K002A-WB1-C2,N/A
192.168.189.29,189,D4C9.EFD3.E39B,basement-k001a-asw1,GigabitEthernet3/45,basement-access,K002A-WB1-C1,10.0.40.22
10.0.40.22,1640,3820.5618.1630,basement-k001a-asw1,GigabitEthernet3/45,basement-access,K002A-WB1-C1,N/A
10.0.40.39,1640,3820.5618.169B,basement-k001a-asw1,GigabitEthernet3/46,basement-access,K002A-SB1-C2,N/A
192.168.189.221,189,001A.4B1C.F810,basement-k001a-asw1,GigabitEthernet3/46,basement-access,K002A-SB1-C2,10.0.40.39
192.168.189.27,189,F4CE.4613.FF62,basement-k001a-asw1,GigabitEthernet3/47,basement-access,K002A-SB1-C1,10.0.40.25
10.0.40.25,1640,1CDE.A783.A92C,basement-k001a-asw1,GigabitEthernet3/47,basement-access,K002A-SB1-C1,N/A
172.16.45.183,45,0040.1135.7FC6,zph-04721-asw1,GigabitEthernet1/0/15,zph-access,04740-WB1-C1(SECURITY),10.50.10.183
10.50.10.183,1045,0040.1935.7AC2,zph-04721-asw1,GigabitEthernet1/0/15,zph-access,04740-WB1-C1(SECURITY),N/A
172.16.45.241,45,00C0.B792.8CD1,zph-04721-asw1,GigabitEthernet1/0/25,zph-access,04721-NETBOTZ,10.50.10.241
10.50.10.241,1045,1AD1.B792.8AD1,zph-04721-asw1,GigabitEthernet1/0/25,zph-access,04721-NETBOTZ,N/A
192.168.189.2,189,00C0.B7B6.3A1A,basement-k001a-asw1,GigabitEthernet3/48,basement-access,Connectiontobasement-k001a-ups1,N/A

You can use awk to simplify this:
awk 'BEGIN{FS=OFS=","} {k=$4 FS $5 FS $7}
$1~/^1[79]2\./{if (pr) print pr, "N/A"; pr=$0; pk=k}
$1~/^10\./{if (k == pk) { print pr, $1; pr=""} print $0, "N/A"}
END{if (pr) print pr, "N/A"}' file
Output:
192.168.205.76,189,FC3F.DB02.ED78,basement-k001a-asw1,GigabitEthernet3/1,basement-access,K022E-NB1-C1,10.0.40.46
10.0.40.46,1640,F025.7279.6DAA,basement-k001a-asw1,GigabitEthernet3/1,basement-access,K022E-NB1-C1,N/A
10.68.194.185,1189,9C93.4E2D.EE1A,basement-k001a-asw1,GigabitEthernet3/3,basement-access,K022D-NB2-C2,N/A
192.168.189.26,189,9C8E.99DD.A49F,basement-k001a-asw1,GigabitEthernet3/4,basement-access,K013-EB3-C1,10.0.40.29
10.0.40.29,1640,1CDE.A783.EA7B,basement-k001a-asw1,GigabitEthernet3/4,basement-access,K013-EB3-C1,N/A
192.168.189.230,189,EC9A.7435.2177,basement-k001a-asw1,GigabitEthernet3/6,basement-access,K024-SB1-C1,N/A
10.0.40.45,1640,0008.2FB7.6F84,basement-k001a-asw1,GigabitEthernet3/11,basement-access,K002A-NB1-C1,N/A
192.168.189.34,189,70F3.95C1.11F8,basement-k001a-asw1,GigabitEthernet3/8,basement-access,K020-CF7-C1,N/A
192.168.189.22,189,8851.FB82.5DE3,basement-k001a-asw1,GigabitEthernet3/12,basement-access,K022D-NB1-C2,10.0.40.28
10.0.40.28,1640,3CCE.73AC.ED44,basement-k001a-asw1,GigabitEthernet3/12,basement-access,K022D-NB1-C2,N/A
10.68.189.182,1189,001C.9B09.0504,basement-k001a-asw1,GigabitEthernet3/15,basement-access,K006-NW1-C1,N/A
10.0.40.42,1640,1CDE.A783.B19B,basement-k001a-asw1,GigabitEthernet3/16,basement-access,K005-NB1-C1,N/A
10.68.189.181,1189,9C93.4E16.D940,basement-k001a-asw1,GigabitEthernet3/17,basement-access,K004-WB1-C2,N/A
192.168.189.225,189,9C93.4E4D.1DDA,basement-k001a-asw1,GigabitEthernet3/13,basement-access,K022D-NB2-C1,N/A
10.68.189.52,1189,0040.580D.157E,basement-k001a-asw1,GigabitEthernet3/28,basement-access,K009HALL-EW5-C1(KRONOS),N/A
192.168.189.233,1189,9C93.4E67.2017,basement-k001a-asw1,GigabitEthernet3/27,basement-access,K013-SB1-C1,N/A
192.168.189.31,189,984B.E17D.5BE1,basement-k001a-asw1,GigabitEthernet3/34,basement-access,K013-WB1-C1,N/A
10.0.40.56,1640,0CD9.9691.B9C3,basement-k001a-asw1,GigabitEthernet3/36,basement-access,K024HALL-WW1-C1,N/A
192.168.189.222,189,68B5.9941.32CE,basement-k001a-asw1,GigabitEthernet3/35,basement-access,K004-NB3-C1,N/A
10.0.40.44,1640,1CDE.A782.1A7E,basement-k001a-asw1,GigabitEthernet3/41,basement-access,K011-NB1-C2,N/A
192.168.189.223,189,3CD9.2B0F.E714,basement-k001a-asw1,GigabitEthernet3/39,basement-access,K006-EB1-C2,N/A
192.168.189.224,189,1458.D039.9735,basement-k001a-asw1,GigabitEthernet3/42,basement-access,K013-WB2-C2,N/A
192.168.189.23,189,D4C9.EFD8.1490,basement-k001a-asw1,GigabitEthernet3/43,basement-access,K013-WB2-C1,10.0.40.30
10.0.40.30,1640,1CDE.A783.A7CD,basement-k001a-asw1,GigabitEthernet3/43,basement-access,K013-WB2-C1,N/A
192.168.189.25,189,8851.FB81.72E4,basement-k001a-asw1,GigabitEthernet3/44,basement-access,K002A-WB1-C2,N/A
192.168.189.29,189,D4C9.EFD3.E39B,basement-k001a-asw1,GigabitEthernet3/45,basement-access,K002A-WB1-C1,10.0.40.22
10.0.40.22,1640,3820.5618.1630,basement-k001a-asw1,GigabitEthernet3/45,basement-access,K002A-WB1-C1,N/A
10.0.40.39,1640,3820.5618.169B,basement-k001a-asw1,GigabitEthernet3/46,basement-access,K002A-SB1-C2,N/A
192.168.189.221,189,001A.4B1C.F810,basement-k001a-asw1,GigabitEthernet3/46,basement-access,K002A-SB1-C2,N/A
192.168.189.27,189,F4CE.4613.FF62,basement-k001a-asw1,GigabitEthernet3/47,basement-access,K002A-SB1-C1,10.0.40.25
10.0.40.25,1640,1CDE.A783.A92C,basement-k001a-asw1,GigabitEthernet3/47,basement-access,K002A-SB1-C1,N/A
172.16.45.183,45,0040.1135.7FC6,zph-04721-asw1,GigabitEthernet1/0/15,zph-access,04740-WB1-C1(SECURITY),10.50.10.183
10.50.10.183,1045,0040.1935.7AC2,zph-04721-asw1,GigabitEthernet1/0/15,zph-access,04740-WB1-C1(SECURITY),N/A
172.16.45.241,45,00C0.B792.8CD1,zph-04721-asw1,GigabitEthernet1/0/25,zph-access,04721-NETBOTZ,10.50.10.241
10.50.10.241,1045,1AD1.B792.8AD1,zph-04721-asw1,GigabitEthernet1/0/25,zph-access,04721-NETBOTZ,N/A
192.168.189.2,189,00C0.B7B6.3A1A,basement-k001a-asw1,GigabitEthernet3/48,basement-access,Connectiontobasement-k001a-ups1,N/A
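The one-pass command above pairs a phone only with an endpoint line that came before it, and it prints an unmatched phone line ahead of an endpoint that is still waiting for its match, so the line order can change. If the original order matters, or a phone line can precede its endpoint (as 10.0.40.39 does in the sample), a two-pass variant may fit better. The following is only a sketch: the function name annotate_voip is hypothetical, and it assumes that phones are exactly the 10.0.x.x and 10.50.x.x addresses, that endpoints are 192.168.x.x and 172.16.x.x, and that each location ($4,$5,$7) has at most one phone.

```shell
# Hypothetical helper: annotate_voip FILE
# Reads FILE twice. Pass 1 remembers the phone IP for every location key,
# pass 2 prints each line in its original order with the extra field.
annotate_voip() {
  awk 'BEGIN { FS = OFS = "," }
       { key = $4 FS $5 FS $7 }                      # switch, port, description
       NR == FNR {                                   # pass 1: collect phones
         if ($1 ~ /^10\.(0|50)\./) phone[key] = $1
         next
       }
       # pass 2: endpoints get the phone found at the same location, if any
       $1 ~ /^(192\.168|172\.16)\./ && (key in phone) { print $0, phone[key]; next }
       { print $0, "N/A" }' "$1" "$1"
}
```

It would be called as annotate_voip list.csv > list_with_voip.csv. Because the lookup is by location rather than by neighbouring line, the input does not need to be sorted.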

Related

Block IPs that requested more than N times per minute from a log file

I want to block IPs that made more than N requests per minute, using iptables.
I've sorted the log file using this script:
cat $log_path | awk '{print $1, $4}' | sort -n -k 1,4 | sed "s/\[//g"
10.200.3.120 20/May/2021:21:05:04
10.200.3.120 20/May/2021:21:05:17
10.200.3.120 20/May/2021:21:05:18
10.200.3.120 20/May/2021:21:05:19
10.200.3.120 20/May/2021:21:05:20
10.200.3.120 20/May/2021:22:05:39
104.131.19.181 20/May/2021:19:05:31
107.23.7.76 20/May/2021:20:05:16
119.252.76.162 20/May/2021:22:05:00
119.252.76.162 20/May/2021:22:05:01
119.252.76.162 20/May/2021:22:05:01
119.252.76.162 20/May/2021:22:05:04
119.252.76.162 20/May/2021:22:05:04
119.252.76.162 20/May/2021:21:05:10
119.252.76.162 20/May/2021:21:05:44
⋮
In the example log above, two IPs requested more than 4 times in a minute (10.200.3.120, 119.252.76.162) and they should be blocked.
How can I get the number of requests in a time interval for each IP and block those IPs?
You can try this solution:
awk '
{
    gsub(/\[|:[0-9]+$/, "", $4)
    ++fq[$4,$1]
}
END {
    for (i in fq)
        if (fq[i] > 4) {
            sub(".*" SUBSEP, "", i)
            print "iptables -A INPUT -s", i, "-j DROP"
        }
}' "$log_path" | sh
Here:
The gsub function strips the leading [ and the seconds value from the timestamp, so requests are grouped per minute.
++fq[$4,$1] increments the array element fq by 1, where each element has the composite key $4,$1, i.e. the string $4 SUBSEP $1.
In the END block we loop through the fq array. When fq[i] > 4 (more than 4 requests in one minute), we remove everything up to and including SUBSEP from the array index to leave only the IP.
Then we print the full iptables command line using the IP we just extracted.
Finally we pipe the awk output to sh to run all the commands.
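Because the END loop above visits every (minute, IP) pair, an IP that exceeds the limit in several different minutes would be printed several times. A possible variation, shown here only as a sketch, reports each offending IP once and leaves the blocking step to the caller. The function name offenders and its two arguments are hypothetical; it assumes, like the answer, that field 1 is the client IP and field 4 looks like [20/May/2021:21:05:04.

```shell
# Hypothetical helper: offenders LOGFILE LIMIT
# Prints every IP that made more than LIMIT requests within a single minute.
offenders() {
  awk -v limit="$2" '
    {
      ts = $4
      gsub(/\[|:[0-9]+$/, "", ts)          # drop the leading [ and the seconds
      if (++hits[ts, $1] == limit + 1) {   # true once, when this minute crosses the limit
        if (!seen[$1]++)                   # report each IP only the first time
          print $1
      }
    }' "$1"
}
```

For a dry run, the result can be turned into the commands from the answer without executing them, for example with offenders "$log_path" 4 | sed 's/.*/iptables -A INPUT -s & -j DROP/'.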
You can block the IP like this:
iptables -A INPUT -s <ip-address-to-block> -j DROP
Adapt your bash script to use this command whenever you see an IP making more requests than you allow. The idea is to read your log file at a given frequency and count how many times each IP appears. If it appears more often than you want, you drop it.
To unblock the IP, you can use this command:
iptables -D INPUT -s <ip-address-to-unblock> -j DROP

read from the nth column on using awk

The nmcli -c no device displays:
DEVICE TYPE STATE CONNECTION
wlp3s0 wifi connected My Test Connection
p2p-dev-wlp3s0 wifi-p2p disconnected --
enp4s0f1 ethernet unavailable --
lo loopback unmanaged --
In order to separate the info on wifi, I have this command:
wf_info="$(nmcli -c no device | grep "wifi[^-]" | awk '{print "wf_devc="$1, "wf_state="$3, "wf_conn="$4}')"
eval "$wf_info"
echo "$wf_devc" # returns wlp3s0
echo "$wf_state" # returns connected
echo "$wf_conn" # returns My (while should be My Test Connection)
The problem with the above command is that for wf_conn it gives me My, while it should be the full name My Test Connection. How can I tell the command to read from the 4th column onward, and not just the 4th column, for wf_conn?
You can "collect" the rest of the fields into a single variable and then print it:
read wf_devc wf_state wf_conn < <(nmcli -c no device | awk '/wifi[^-]/{r=""; for(i=4;i<=NF;i++){r=r (i==4 ? "":" ") $i}; print $1" "$3" "r}')
Note that the grep part is incorporated into awk: /wifi[^-]/ makes sure that only lines containing wifi followed by a character other than - are printed.
The r=""; for(i=4;i<=NF;i++){r=r (i==4 ? "":" ") $i} part initializes r as an empty string and then concatenates all fields, starting with field 4, using a space.
See the online demo:
#!/bin/bash
s='DEVICE TYPE STATE CONNECTION
wlp3s0 wifi connected My Test Connection
p2p-dev-wlp3s0 wifi-p2p disconnected --
enp4s0f1 ethernet unavailable --
lo loopback unmanaged --'
read wf_devc wf_state wf_conn < <(awk '
/wifi[^-]/{
r="";
for(i=4;i<=NF;i++){
r=r (i==4 ? "":" ") $i
};
print $1" "$3" "r
}' <<< "$s")
echo "wf_devc=$wf_devc wf_state=$wf_state wf_conn=$wf_conn"
Output:
wf_devc=wlp3s0 wf_state=connected wf_conn=My Test Connection
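Another way to read "from the 4th column on" is to save the leading fields and then delete the first three columns from the record, which leaves the rest of the line untouched. The sketch below assumes the column layout shown in the question; the function name wifi_fields is hypothetical, and it selects rows whose TYPE column is exactly wifi instead of using the /wifi[^-]/ pattern.

```shell
# Hypothetical helper: reads "nmcli -c no device" style output on stdin and
# prints "DEVICE STATE CONNECTION..." for rows whose TYPE column is exactly wifi.
wifi_fields() {
  awk '$2 == "wifi" {
         dev = $1; state = $3                 # save fields before editing $0
         # remove the first three columns together with their separators
         sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
         sub(/[ \t]+$/, "")                   # remove trailing column padding
         print dev, state, $0                 # $0 now holds column 4 onward
       }'
}
```

In bash, the output can be consumed the same way as in the answer: read -r wf_devc wf_state wf_conn < <(nmcli -c no device | wifi_fields).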

Store variables from lines in a text file using awk and cut in a for loop

I have a tab separated text file, call it input.txt
cat input.txt
Begin Annotation Diff End Begin,End
6436687 >ENST00000422706.5|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-205|APOL1|2901|protein_coding| 50 6436736 6436687,6436736
6436737 >ENST00000426053.5|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-206|APOL1|2808|protein_coding| 48 6436784 6436737,6436784
6436785 >ENST00000319136.8|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000075315.5|APOL1-201|APOL1|3000|protein_coding| 51 6436835 6436785,6436835
6436836 >ENST00000422471.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319151.1|APOL1-204|APOL1|561|nonsense_mediated_decay| 11 6436846 6436836,6436846
6436847 >ENST00000475519.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319153.1|APOL1-212|APOL1|600|retained_intron| 11 6436857 6436847,6436857
6436858 >ENST00000438034.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319152.2|APOL1-210|APOL1|566|protein_coding| 11 6436868 6436858,6436868
6436869 >ENST00000439680.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319252.1|APOL1-211|APOL1|531|nonsense_mediated_decay| 10 6436878 6436869,6436878
6436879 >ENST00000427990.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319154.2|APOL1-207|APOL1|624|protein_coding| 12 6436890 6436879,6436890
6436891 >ENST00000397278.8|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319100.4|APOL1-202|APOL1|2795|protein_coding| 48 6436938 6436891,6436938
6436939 >ENST00000397279.8|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-203|APOL1|1564|protein_coding| 28 6436966 6436939,6436966
6436967 >ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding| 11 6436977 6436967,6436977
6436978 >ENST00000431184.1|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319254.1|APOL1-208|APOL1|550|nonsense_mediated_decay| 11 6436988 6436978,6436988
Using the information in input.txt I want to obtain information from a file called Other_File.fa. This file is an annotation file filled with ENST#'s (transcript IDs) and sequences of A's,T's,C's,and G's. I want to store the sequence in a file called Output.log (see example below) and I want to store the command used to retrieve the text in a file called Input.log (see example below).
I have tried to do this using awk and cut so far using a for loop. This is the code I have tried.
for line in `awk -F "\\t" 'NR != 1 {print substr($2,2,17)"#"$5}' input.txt`
do
transcript=`cut -d "#" -f 1 $line`
range=`cut -d "#" -f 2 $line` #Range is the string location in Other_File.fa
echo "Our transcript is ${transcript} and our range is ${range}" >> Input.log
sed -n '${range}' Other_File.fa >> Output.log
done
Here is an example of the 11 lines between ENST00000433768.5 and ENST00000431184.1 in Other_File.fa.
grep -A 11 ENST00000433768.5 Other_File.fa
>ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding|
ATCCACACAGCTCAGAACAGCTGGATCTTGCTCAGTCTCTGCCAGGGGAAGATTCCTTGG
AGGAGCACACTGTCTCAACCCCTCTTTTCCTGCTCAAGGAGGAGGCCCTGCAGCGACATG
GAGGGAGCTGCTTTGCTGAGAGTCTCTGTCCTCTGCATCTGGATGAGTGCACTTTTCCTT
GGTGTGGGAGTGAGGGCAGAGGAAGCTGGAGCGAGGGTGCAACAAAACGTTCCAAGTGGG
ACAGATACTGGAGATCCTCAAAGTAAGCCCCTCGGTGACTGGGCTGCTGGCACCATGGAC
CCAGGCCCAGCTGGGTCCAGAGGTGACAGTGGAGAGCCGTGTACCCTGAGACCAGCCTGC
AGAGGACAGAGGCAACATGGAGGTGCCTCAAGGATCAGTGCTGAGGGTCCCGCCCCCATG
CCCCGTCGAAGAACCCCCTCCACTGCCCATCTGAGAGTGCCCAAGACCAGCAGGAGGAAT
CTCCTTTGCATGAGAGCAGTATCTTTATTGAGGATGCCATTAAGTATTTCAAGGAAAAAG
T
>ENST00000431184.1|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319254.1|APOL1-208|APOL1|550|nonsense_mediated_decay|
The range value in input.txt for this transcript is 6436967,6436977. In my file Input.log for this transcript I hope to get
Our transcript is ENST00000433768.5 and our range is 6436967,6436977
And in Output.log for this transcript I hope to get
>ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding|
ATCCACACAGCTCAGAACAGCTGGATCTTGCTCAGTCTCTGCCAGGGGAAGATTCCTTGG
AGGAGCACACTGTCTCAACCCCTCTTTTCCTGCTCAAGGAGGAGGCCCTGCAGCGACATG
GAGGGAGCTGCTTTGCTGAGAGTCTCTGTCCTCTGCATCTGGATGAGTGCACTTTTCCTT
GGTGTGGGAGTGAGGGCAGAGGAAGCTGGAGCGAGGGTGCAACAAAACGTTCCAAGTGGG
ACAGATACTGGAGATCCTCAAAGTAAGCCCCTCGGTGACTGGGCTGCTGGCACCATGGAC
CCAGGCCCAGCTGGGTCCAGAGGTGACAGTGGAGAGCCGTGTACCCTGAGACCAGCCTGC
AGAGGACAGAGGCAACATGGAGGTGCCTCAAGGATCAGTGCTGAGGGTCCCGCCCCCATG
CCCCGTCGAAGAACCCCCTCCACTGCCCATCTGAGAGTGCCCAAGACCAGCAGGAGGAAT
CTCCTTTGCATGAGAGCAGTATCTTTATTGAGGATGCCATTAAGTATTTCAAGGAAAAAG
T
But I am getting the following error, and I am unsure as to why or how to fix it.
cut: ENST00000433768.5#6436967,6436977: No such file or directory
cut: ENST00000433768.5#6436967,6436977: No such file or directory
Our transcript is and our range is
My thought was each line from the awk would be read as a string then cut could split the string along the "#" symbol I have added, but it is reading each line as a file and throwing an error when it can't locate the file in my directory.
Thanks.
EDIT2: This is a generic solution that compares the two files (the input file and other_file.fa) and prints the requested range regardless of the line on which the transcript is found. For example, if the transcript is found on line 300 but the range says to print lines 1 to 20, it still works. Note that it calls sed through system() for each match (like the range you were using within sed). There are other ways too, such as loading the whole Input_file into an array and then printing, but I am going with this one here. Fair warning: this has not been tested with very large files.
awk -F'[>| ]' '
FNR==NR{
arr[$2]=$NF
next
}
($2 in arr){
split(arr[$2],lineNum,",")
print arr[$2]
start=lineNum[1]
end=lineNum[2]
print "sed -n \047" start","end"p \047 " FILENAME
system("sed -n \047" start","end"p\047 " FILENAME)
start=end=0
}
' file1 FS="[>|]" other_file.fa
EDIT: With the OP's edited samples, please try the following to print lines based on the other file. It assumes that the range always lies after the line on which the transcript is found (e.g. the transcript is found on line 3 and the range is 4 to 10).
awk -F'[>| ]' '
FNR==NR{
arr[$2]=$NF
next
}
($2 in arr){
split(arr[$2],lineNum,",")
start=lineNum[1]
end=lineNum[2]
}
FNR>=start && FNR<=end{
print
if(FNR==end){
start=end=0
}
}
' file1 FS="[>|]" other_file.fa
You do not need a for loop that calls awk again for each line. This can be done in a single awk, considering that you only have to print the values. Written and tested with your shown samples.
awk -F'[>| ]' 'FNR>1{print "Our transcript is:"$3" and our range is:"$NF}' Input_file
NOTE: This prints the transcript and range values for each line of your Input_file. If you want to perform further operations on these values, please mention it.
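The loop from the question can also be repaired directly. The two errors it reports come from cut being handed $line as a file name rather than as text, and the sed call would additionally fail because single quotes keep ${range} from being expanded and the p command is missing. The following is a minimal sketch under the assumption that input.txt is tab-separated with the five columns shown; the function name extract_ranges and its four arguments are hypothetical.

```shell
# Hypothetical helper: extract_ranges TABLE FASTA COMMAND_LOG SEQUENCE_LOG
# TABLE is the tab-separated file with a header row; column 2 holds the
# annotation (">ENST...|...") and column 5 the "start,end" line range.
extract_ranges() {
  tail -n +2 "$1" |                                  # skip the header row
  while IFS="$(printf '\t')" read -r begin annotation diff end range; do
    transcript=${annotation#>}                       # drop the leading ">"
    transcript=${transcript%%|*}                     # keep the text before the first "|"
    echo "Our transcript is ${transcript} and our range is ${range}" >> "$3"
    sed -n "${range}p" "$2" >> "$4"                  # double quotes let the shell expand ${range}
  done
}
```

With the names from the question it would be called as extract_ranges input.txt Other_File.fa Input.log Output.log. Splitting the line with read avoids the extra awk and cut processes per row, and taking the text before the first | does not depend on the transcript ID having a fixed width.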

Get package name and corr. data from file

I've been banging my head lately, trying to parse dumpsys output.
Here is the output:
NotificationRecord(0x4297d448: pkg=com.android.systemui user=UserHandle{0} id=273 tag=null score=0: Notification(pri=0 icon=7f020148 contentView=com.android.systemui/0x1090069 vibrate=null sound=null defaults=0x0 flags=0x2 when=0 ledARGB=0x0 contentIntent=N deleteIntent=N contentTitle=6 contentText=15 tickerText=6 kind=[null]))
uid=10012 userId=0
icon=0x7f020148 / com.android.systemui:drawable/stat_sys_no_sim
pri=0 score=0
contentIntent=null
deleteIntent=null
tickerText=No SIM
contentView=android.widget.RemoteViews#429c1f58
defaults=0x00000000 flags=0x00000002
sound=null
vibrate=null
led=0x00000000 onMs=0 offMs=0
extras={
android.title=No SIM
android.subText=null
android.showChronometer=false
android.icon=2130837832
android.text=Insert SIM card
android.progress=0
android.progressMax=0
android.showWhen=true
android.infoText=null
android.progressIndeterminate=false
android.scoreModified=false
}
NotificationRecord(0x427e1878: pkg=jackpal.androidterm user=UserHandle{0} id=1 tag=null score=0: Notification(pri=0 icon=7f02000d contentView=jackpal.androidterm/0x1090069 vibrate=null sound=null defaults=0x0 flags=0x62 when=1456782124817 ledARGB=0x0 contentIntent=Y deleteIntent=N contentTitle=17 contentText=27 tickerText=27 kind=[null]))
uid=10094 userId=0
icon=0x7f02000d / jackpal.androidterm:drawable/ic_stat_service_notification_icon
pri=0 score=0
contentIntent=PendingIntent{42754f78: PendingIntentRecord{42802aa0 jackpal.androidterm startActivity}}
deleteIntent=null
tickerText=Terminal session is running
contentView=android.widget.RemoteViews#4279b510
defaults=0x00000000 flags=0x00000062
sound=null
vibrate=null
led=0x00000000 onMs=0 offMs=0
extras={
android.title=Terminal Emulator
android.subText=null
android.showChronometer=false
android.icon=2130837517
android.text=Terminal session is running
android.progress=0
android.progressMax=0
android.showWhen=true
android.infoText=null
android.progressIndeterminate=false
android.scoreModified=false
}
NotificationRecord(0x429381f8: pkg=com.droidsail.dsapp2sd user=UserHandle{0} id=128 tag=null score=0: Notification(pri=0 icon=7f020000 contentView=com.droidsail.dsapp2sd/0x1090069 vibrate=null sound=null defaults=0x0 flags=0x10 when=1456786729004 ledARGB=0x0 contentIntent=Y deleteIntent=N contentTitle=13 contentText=35 tickerText=35 kind=[null]))
uid=10107 userId=0
icon=0x7f020000 / com.droidsail.dsapp2sd:drawable/appicon
pri=0 score=0
contentIntent=PendingIntent{42955a60: PendingIntentRecord{4286db18 com.droidsail.dsapp2sd startActivity}}
deleteIntent=null
tickerText=Detected new app can be moved to SD
contentView=android.widget.RemoteViews#42a891a8
defaults=0x00000000 flags=0x00000010
sound=null
vibrate=null
led=0x00000000 onMs=0 offMs=0
extras={
android.title=New app to SD
android.subText=null
android.showChronometer=false
android.icon=2130837504
android.text=Detected new app can be moved to SD
android.progress=0
android.progressMax=0
android.showWhen=true
android.infoText=null
android.progressIndeterminate=false
android.scoreModified=false
}
NotificationRecord(0x423708b0: pkg=android user=UserHandle{-1} id=17041135 tag=null score=0: Notification(pri=0 icon=1080399 contentView=android/0x1090069 vibrate=null sound=null defaults=0x0 flags=0x1002 when=0 ledARGB=0x0 contentIntent=Y deleteIntent=N contentTitle=19 contentText=17 tickerText=N kind=[android.system.imeswitcher]))
uid=1000 userId=-1
icon=0x1080399 / android:drawable/ic_notification_ime_default
pri=0 score=0
contentIntent=PendingIntent{425a8960: PendingIntentRecord{426f84b0 android broadcastIntent}}
deleteIntent=null
tickerText=null
contentView=android.widget.RemoteViews#428846b8
defaults=0x00000000 flags=0x00001002
sound=null
vibrate=null
led=0x00000000 onMs=0 offMs=0
extras={
android.title=Choose input method
android.subText=null
android.showChronometer=false
android.icon=17302425
android.text=Hacker's Keyboard
android.progress=0
android.progressMax=0
android.showWhen=true
android.infoText=null
android.progressIndeterminate=false
android.scoreModified=false
}
I want to get the package name and the corresponding extras={} for each of them.
For example:
pkg:com.android.systemui
extras={
.....
}
So far I've tried:
dumpsys notification | awk '/pkg=/,/\n}/'
But without any success.
I'm a newbie to awk, and if possible I want to do it with awk or perl. Of course, any other tool like sed or grep is fine by me too; I just want to parse it somehow.
Can anyone help me?
If you have GNU awk, try the following:
awk -v RS='(^|\n)NotificationRecord\\([^=]+=' \
'NF { print "pkg:" $1; print gensub(/^.*\n\s*(extras=\{[^}]+\}).*$/, "\\1", 1) }' file
-v RS='(^|\n)NotificationRecord\\([^=]+=' breaks the input into records by lines starting with NotificationRecord( up to and including the following = char.
In effect, that means you get records starting with the package names (com.android.systemui, ...)
NF is a condition that only executes the following block if it evaluates to nonzero; NF is the count of fields in the record, so as long as at least 1 field is present, the block is evaluated - in effect, this skips the implied empty record before the very first line.
print "pkg:" $1 prints the package name, prefixed with literal pkg:.
gensub(/^.*\n\s*(extras=\{[^}]+\}).*$/, "\\1", 1) matches the entire record and replaces it with the extras property captured via a capture group, effectively returning the extras property only.
I would suggest perl over awk, because you'll be storing whether you're inside the extras=... block in a variable:
dumpsys notification | perl -lne '
print $1 if /^Notif.*?: pkg=(\S+)/;
$in_extras = 0 if /^ \}/;
print if $in_extras;
$in_extras = 1 if /^ extras=\{/'
Oh, if you want the extra pkg: and extras= text, slight modification:
dumpsys notification | perl -lne '
print "pkg: $1" if /^Notif.*?: pkg=(\S+)/;
$in_extras = 1 if /^ extras=\{/;
print if $in_extras;
$in_extras = 0 if /^ \}/;'
Sed version:
dumpsys notification |\
sed -n 's/.*pkg=\([^ ]*\).*/pkg:\1/p;/^  extras={$/,/^  }$/s/^  //p'
I'm assuming you always have two spaces in front of extras={ and } and you also want to remove these spaces.
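If GNU awk is not available, the same state-flag idea used in the perl answer can be written with POSIX awk features only. This is a sketch, not a tested parser for every dumpsys version: the function name pkg_extras is hypothetical, it tolerates any amount of leading whitespace, and it strips that whitespace from the printed lines.

```shell
# Hypothetical helper: reads "dumpsys notification" style text on stdin and
# prints "pkg:<name>" followed by that record's extras={ ... } block.
pkg_extras() {
  awk '
    /^NotificationRecord[(]/ {
      # pick out the first "pkg=..." token and print it without the "pkg=" prefix
      if (match($0, /pkg=[^ ]+/))
        print "pkg:" substr($0, RSTART + 4, RLENGTH - 4)
      next
    }
    /^[ \t]*extras=[{]/ { inside = 1 }            # the block starts on this line
    inside { sub(/^[ \t]+/, ""); print }          # print block lines without indentation
    /^[ \t]*[}]/ { inside = 0 }                   # a line starting with } ends the block
  '
}
```

It would be used as dumpsys notification | pkg_extras. Bracket expressions such as [(] and [{] are used instead of backslash escapes because their meaning is the same across awk implementations.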

Conditional Sort using Awk or sort

Alright, so I asked a question a week or so ago about how I could use sed or awk to extract a block of text between two blank lines, as well as omit part of the extracted text. The answers I got pretty much satisfied my needs, but now I'm doing something extra for fun (and for OCD's sake).
I want to sort the output from awk in this round. I found this question & answer but it doesn't quite help me to solve the problem. I've also tried wrapping my head around a lot of awk documentation as well to try and figure out how I could do this, to no avail.
So here's the block of code in my script that does all the dirty work:
# This block of stuff fetches the nameservers as reported by the registrar and DNS zone
# Then it gets piped into awk to work some more formatting magic...
# The following is a step-for-step description since I can't put comments inside the awk block:
# BEGIN:
# Set the record separator to a blank line
# Set the input/output field separators to newlines
# FNR == 3:
# The third block of dig's output is the nameservers reported by the registrar
# Also blanks the last field & strips it since it's just a useless dig comment
dig +trace +additional $host | \
awk -v host="$host" '
BEGIN {
RS = "";
FS = "\n"
}
FNR == 3 {
print "Nameservers of",host,"reported by the registrar:";
OFS = "\n";
$NF = ""; sub( /[[:space:]]+$/, "" );
print
}
'
And here's the output if I pass google.com in as the value of $host (other hostnames may produce output of differing line counts):
Nameservers of google.com reported by the registrar:
google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns3.google.com.
google.com. 172800 IN NS ns4.google.com.
ns2.google.com. 172800 IN A 216.239.34.10
ns1.google.com. 172800 IN A 216.239.32.10
ns3.google.com. 172800 IN A 216.239.36.10
ns4.google.com. 172800 IN A 216.239.38.10
The idea is, using either the existing block of awk, or piping awk's output into a combination of more awk, sort, or whatever else, sort that block of text using a conditional algorithm:
if ( column 4 == 'NS' )
sort by column 5
else // This will ensure that the col 1 sort includes A and AAAA records
sort by column 1
I've pretty much got the same preferences for answers as the previous question:
Most important of all, it must be portable since I've encountered different behaviour between OS X (my home system) and Fedora (what I use at work) when using sed (had to replace it with gsed on OS X) and grep's -m flag (used in another script)
An explanation of how the solution works would be very much appreciated, as a learning opportunity moreso than anything else. I already learned quite a bit from the awk solution already provided in the previous question.
If the solution can be implemented within the same block of awk, that would also be awesome
If not, then something simple and eloquent that I can pipe awk's output through would suffice
Here's a solution based on @shellter's idea. Pipe the output of your nameserver records to this:
awk '$4 == "NS" {print $1, $5, $0} $4 == "A" {print $1, $1, $0}' | sort | cut -f3- -d' '
Explanation:
With awk, we take only the NS and A records, and re-print the same line with prefix: primary search column + secondary search column
sort will sort the lines, thanks to the way we set the first and second column, the order should be as you wanted
With cut we get rid of the prefix that we used for sorting
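The three steps above (decorate, sort, undecorate) can be extended so that every non-NS record, including AAAA, is kept and sorted by column 1. The sketch below is one way to do that, not the answer's exact command: the function name sort_ns_records is hypothetical, a tab is used as the prefix separator so that it cannot collide with spaces in the record, and LC_ALL=C is set on the assumption that plain byte order is acceptable and should be the same on OS X and Fedora.

```shell
# Hypothetical helper: reads DNS records on stdin and prints them sorted by
# owner name (column 1), with NS records ordered by their target (column 5).
sort_ns_records() {
  # decorate: prefix each line with two tab-separated sort keys
  awk '{ print $1 "\t" ($4 == "NS" ? $5 : $1) "\t" $0 }' |
    # sort on the two keys only, in byte order
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 |
    # undecorate: drop the two key columns again
    cut -f3-
}
```

It can be appended to the existing pipeline, for example ... | awk '...' | sort_ns_records, applied to the record lines only so that the heading line is not sorted along with them.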
I know you asked about awk solution, but since you tagged it with bash too, I thought I'd provide such a version. It should also be more portable than awk ;)
# the whole line
declare -a lines
# the key to use for sorting
declare -a keys
# insert into the arrays at the appropriate position
function insert
{
    local key="$1"
    local line="$2"
    local count=${#lines[*]}
    local i
    # go from the end backwards
    for((i=count; i>0; i-=1))
    do
        # if we have the insertion point, break
        [[ "${keys[i-1]}" > "$key" ]] || break
        # shift the current item to make room for the new one
        lines[i]=${lines[i-1]}
        keys[i]=${keys[i-1]}
    done
    # insert the new item
    lines[i]=$line
    keys[i]=$key
}
# This block of stuff fetches the nameservers as reported by the registrar and DNS zone
# The third block of dig's output is the nameservers reported by the registrar
# Also blanks the last field & strips it since it's just a useless dig comment
block=0
dig +trace +additional $host |
while read f1 f2 f3 f4 f5
do
    # empty line begins new block
    if [ -z "$f1" ]
    then
        # increment block counter
        block=$((block+1))
        # and read next line
        continue
    fi
    # if we are not in block #3, read next line
    [[ $block == 3 ]] || continue
    # ;; ends the block
    if [[ "$f1" == ";;" ]]
    then
        echo "Nameservers of $host reported by the registrar:"
        # print the lines collected so far
        for((i=0; i<${#lines[*]}; i+=1))
        do
            echo ${lines[i]}
        done
        # don't bother reading the rest
        break
    fi
    # figure out what key to use for sorting
    if [[ "$f4" == "NS" ]]
    then
        key=$f5
    else
        key=$f1
    fi
    # add the line to the arrays
    insert "$key" "$f1 $f2 $f3 $f4 $f5"
done
