Speeding up echo in ksh - performance

I've got the code below working in ksh, but the job takes a while to run when generating .tmp1: it's slow in the echo $LINE | cut -f 2,4 -d " " >> [file] command, and I don't know why.
I'm guessing the echo is to blame, but I don't know for sure, and I don't know how to rewrite it to speed it up.
echo "Generating on zTempDay$count.tmp"
while read LINE
do
#Use cut to trim down to the right columns
#cut -b 11-26 $LINE
#mac= cut -b 39-52 $LINE
#vlan= cut -b 62 $LINE
#This line pegs out the CPU - want to know why
echo $LINE | cut -f 2,4 -d " " >> zTempDay$count.tmp1
update_spinner
done < zTempDay$count.tmp
#Remove 'Incomplete' Entries
#numOfIncomplete=grep "Incomplete" zTempDay$count.tmp1 | wc -l
sed -e "/Incomplete/d" zTempDay$count.tmp1 > zTempDay$count.tmp2
#Use sort to sort by MAC
#Use uniq to remove duplicates
sort +1 -2 zTempDay$count.tmp2 | uniq -f 1 > zTempDay$count.tmp3
#Format Nicely
tr ' ' '\t' < zTempDay$count.tmp3 > zTempDay$count.tmp4
##Want to put a proper progress bar in if program remains slow
#dialog --gauge "Formatting Data: Please wait" 10 70 0
#bc 100*$count/$maxDaysInMonth
Example Data
Internet 10.174.199.193 - 8843.e1a3.1b40 ARPA Vlan####
Internet 10.1.103.206 110 f4ce.46bd.e2e8 ARPA Vlan####
Intended Product (using a tab between IP and MAC)
10.174.199.193 8843.e1a3.1b40
10.1.103.206 f4ce.46bd.e2e8

awk '/Incomplete/ {next} ;
{print $2 "\t" $4}' zTempDay01.tmp | sort +1 -2 | uniq -f 1 > outfile
works like a charm thanks to Shellter's help. Thank you! :)
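For the record, the loop was slow because each iteration forks a new cut process (plus a pipe) for every input line, while awk reads the whole file in a single process. If a pure-shell version is ever needed, letting read split the fields avoids the per-line forks. A minimal sketch, assuming the fields are space-separated as in the sample data and that incomplete entries carry 'Incomplete' in the fourth column (it also folds in the sed filter and the tr tab-formatting):
while read -r f1 f2 f3 f4 rest; do
[[ $f4 == *Incomplete* ]] && continue #same filter as the sed step
print "$f2\t$f4" #ksh print expands \t to a tab
done < zTempDay$count.tmp > zTempDay$count.tmp1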

Related

While Read Line - Limit Number of Lines

I am trying to limit the number of lines found during a while read line loop. For example:
File: order.csv
123456,ORDER1,NEW
123456,ORDER-2,NEW
123456,ORDER-3,SHIPPED
I am doing the following:
cat order.csv | while read line;
do
order=$(echo $line | cut -d "," -f 1)
status=$(echo $line | cut -d "," -f 3)
echo "$order:$status"
done
Which outputs:
123456:NEW
123456:NEW
123456:SHIPPED
How can I limit the number of lines read? In this case there are three; how can I limit it to two, so that only the first two are displayed?
Desired output:
123456:NEW
123456:NEW
There are several ways to meet your requirement:
Method 1
Use head to pass only the first few lines of the file into the loop.
head -n 2 order.csv | while read line;
do
order=$(echo $line | cut -d "," -f 1)
status=$(echo $line | cut -d "," -f 3)
echo "$order:$status"
done
Method 2
Use a for loop.
for i in {1..2}
do
read line
order=$(echo $line | cut -d "," -f 1)
status=$(echo $line | cut -d "," -f 3)
echo "$order:$status"
done < order.csv
Method 3
Use awk.
awk -F, 'NR <= 2 { print $1":"$3 }' order.csv
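For the sample order.csv above, all three methods print the desired output:
123456:NEW
123456:NEW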

Print Unique Values while using Do-While loop

I have a file named textfile.txt like below:
a 1 xxx
b 1 yyy
c 2 zzz
d 2 aaa
e 3 bbb
f 3 ccc
I am trying to extract the unique values from the second column. I had the code below:
while read LINE
do
compname=`echo ${LINE} | cut -d' ' -f2 | uniq`
echo -e "${compname}"
done < textfile.txt
It is displaying:
1
1
2
2
3
3
But I am looking for an output like:
1
2
3
I also tried another command: echo ${LINE} | cut -d' ' -f2 | sort -u | uniq
but still did not get the expected output.
Can anyone help me?
There's no need to loop; sort -u already processes the whole input.
cut -d' ' -f2 textfile.txt | sort -u
Maybe you wanted to get the output in the original order, showing the first occurrence only? You can use an associative array to remember which values have already been seen:
#! /bin/bash
declare -A seen
while read x ; do
[[ ${seen[$x]} ]] || printf '%s\n' "$x"
seen[$x]=1
done < <(cut -d' ' -f2 textfile.txt)
For the last occurrence only, change the last line to
done < <(cut -d' ' -f2 textfile.txt | tac) | tac
(i.e. the last occurrence is the first occurrence in the reversed order)
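For comparison, the same first-occurrence filter is the classic awk one-liner idiom:
awk '!seen[$2]++ { print $2 }' textfile.txt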
Just pipe the output of the loop to sort -u. There's no need for cut; the read command can handle this type of splitting.
while read -r _ compname _; do
echo "$compname"
done < textfile.txt | sort -u
Try moving the sort -u or sort | uniq after the done statement like this:
while read LINE;
do
compname=$(echo ${LINE} | cut -d' ' -f2)
echo "${compname}"
done < textfile.txt | sort -u

bash calculations with numbers from files

I am trying to do a simple thing:
To get the second number in the line with the second occurrence of the word TER, lower it by one, and process it further. The tr -s ' ' is there because the file is not delimited by tabs but by varying amounts of whitespace.
My script:
first_res_atombumb= grep 'TER' tata_sbox_cuda.pdb | head -n 2 | tail -1 |tr -s ' '| cut -f 2 -d ' '
echo $((first_res_atombumb-1))
but this only returns:
255
-1
Of course I want to get 254.
Adding | tr -d '\n' does not help either. What on earth is going on? I have already asked several people at work and no one seems to know.
The lines in question look like this:
TER 128 DA3 4
TER 255 DA3 8
and if I run grep 'TER' tata_sbox_cuda.pdb | head -n 2 | tail -1 | tr -s ' '| cut -f 2 -d ' ' on the command line, I get what I expect: just 255.
With bash, I'd write
n_ter=0
while read -a words; do
if [[ ${words[0]} == TER ]] && (( ++n_ter == 2 )); then
echo $(( ${words[1]} - 1 ))
fi
done < file
but I'd use awk
awk '$1 == "TER" && ++n == 2 {print $2 - 1}' file
The problem with your code: you forgot to use the $() command substitution syntax
first_res_atombumb= grep 'TER' tata_sbox_cuda.pdb | head -n 2 | tail -1 |tr -s ' '| cut -f 2 -d ' '
# .................^...............................................................................^
echo $((first_res_atombumb-1))
You're setting the variable to an empty string in the environment of the grep command. Then, since you're not capturing the output of that pipeline, "255" is printed to the terminal. Because the variable is unset in your current shell, you then get echo $((-1)), which prints the -1.
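You can see the same mechanism in isolation (x here is just a throwaway variable for this demonstration):
$ x= echo hello #x is empty only in echo's environment
hello
$ echo $((x-1)) #x is unset in the current shell, so this is 0 - 1
-1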
All you need is:
first_res_atombumb=$(grep 'TER' tata_sbox_cuda.pdb | head -n 2 | tail -1 |tr -s ' '| cut -f 2 -d ' ')
# .................^^...............................................................................^
But I'd still use awk.
If I understand your problem correctly, you can solve it using awk:
awk 'BEGIN{v=0} $1 == "TER" {v++;if (v==2) {print $2-1 ;exit}}' tata_sbox_cuda.pdb
Explanation:
BEGIN{v=0} declares the counter and initializes it to zero.
$1 == "TER" runs the block in {} only on lines whose first field is TER.
{v++;if (v==2) {print $2-1;exit}} increments v and checks whether this is the second occurrence; if so, it subtracts 1 from the second field, prints the result, and exits (which speeds up processing by skipping the remaining lines).
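Run against the sample lines from the question, either awk version prints the expected result, since TER 255 DA3 8 is the second occurrence:
$ awk 'BEGIN{v=0} $1 == "TER" {v++;if (v==2) {print $2-1;exit}}' tata_sbox_cuda.pdb
254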

Infinite loop in bash

I have written the following command to loop over the set of strings in the second column of my file, sort the file on column 11 for each string, and then take the second and eleventh columns and count the number of unique occurrences. Very simple, but it seems to enter an infinite loop and I can't see why. I would appreciate your help very much.
for item in $(cat file.txt | cut -f2 -d " "| uniq)
do
sort -k11,11 file.txt | cut -f2,11 -d " " | uniq -c | sort -k2,2 > output
done
There's no infinite loop here, but it is a very silly loop (that takes a long time to run, while not accomplishing the script's stated purpose). Let's look at how one might accomplish that purpose more sanely:
Using a temporary file for counts.txt to avoid needing to rerun the sort, cut and uniq steps on each iteration:
sort -k11,11 file.txt | cut -f2,11 -d " " | uniq -c >counts.txt
while read -r item; do
fgrep -e " ${item}" counts.txt
done < <(cut -f2 -d' ' <file.txt | uniq)
Even better, using bash 4 associative arrays and no temporary file:
# reads counts into an array
declare -A counts=( )
while read -r count item; do
counts[$item]=$count
done < <(sort -k11,11 file.txt | cut -f2,11 -d " " | sort | uniq -c)
# reads counts back out
while read -r item; do
echo "$item ${counts[$item]}"
done < <(cat file.txt | cut -f2 -d " "| sort | uniq)
...that said, that's only if you want to use sort for ordering on pulling data back out. If you don't need to do that, the latter part could be replaced as such:
# read counts back out
for item in "${!counts[#]}"; do
echo "$item ${counts[$item]}"
done
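To spell out why the original loop merely wasted time rather than looping forever: its body never references $item, so every iteration rewrites the same output file, and a single run of the pipeline produces the identical result:
sort -k11,11 file.txt | cut -f2,11 -d " " | uniq -c | sort -k2,2 > output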

Bash Script to batch-convert IP Addresses to CIDR?

Ok, here's the problem.
I have a plaintext list of IP addresses that I'm blocking on my servers, growing more and more unwieldy every day (added 3000+ entries today alone).
It's already been sorted for duplicates so that's not a problem. What I'd like to do is write a script to go through it and consolidate the entries a bit better for mass blocking.
For example, take this:
2.132.35.104
2.132.79.240
2.132.99.87
2.132.236.34
2.132.245.30
And turn it into this:
2.132.0.0/16
Any suggestions on how to code that in a bash script?
UPDATE: I've worked out partway how to do what I need. Converting to /24 is easy, as follows:
cat /usr/local/blocks/blocks.txt | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
echo "$oc1.$oc2.$oc3.0/24" >> twentyfour.srt
done
sort -u twentyfour.srt > twentyfour.txt
rm -f twentyfour.srt
ori=`cat /usr/local/blocks/blocks.txt | wc -l`
new=`cat twentyfour.txt | wc -l`
echo "$ori"
echo "$new"
That reduced it down from 4,452 entries to 4,148 entries.
Instead of having:
109.86.9.93
109.86.26.77
109.86.55.225
109.86.70.224
109.86.87.199
109.86.89.202
109.86.95.248
109.86.100.19
109.86.110.43
109.86.145.216
109.86.152.86
109.86.155.238
109.86.156.54
109.86.187.91
109.86.228.86
109.86.234.51
109.86.239.61
I now have:
109.86.100.0/24
109.86.110.0/24
109.86.145.0/24
109.86.152.0/24
109.86.155.0/24
109.86.156.0/24
109.86.187.0/24
109.86.228.0/24
109.86.234.0/24
109.86.239.0/24
109.86.26.0/24
109.86.55.0/24
109.86.70.0/24
109.86.87.0/24
109.86.89.0/24
109.86.9.0/24
109.86.95.0/24
All well and good. BUT there are 17 entries in the 109.86.x.x range. In a case where the first two octets match more than, say, 5 of the /24 entries, I'd like to reduce them to a single /16.
That's where I'm stuck.
UPDATE 2:
For Steve: Here's the block list for today. And here's the result so far. Apparently it's not removing the near-duplicate entries from twentyfour that are in sixteen.
I wish I could tell you this is a simple filter. However, all of the 2.0.0.0/8 network is registered to RIPE NCC. There are just too many different ranges of blocked IP addresses; it's easier to narrow down the scope of visitors you do want versus what you don't want.
You could also use various tools to block attacks automatically.
A map to identify which registry is which: https://www.iana.org/numbers
Here's a script I just made for you, with which you can create the major block lists for each of the primary registries: Afrinic, Lacnic, Apnic, Ripe, and Arin.
create_tables_by_registry.sh
Just run this script... Then run the resulting registry.sh files (e.g., ripe.sh).
#!/bin/bash
# Author: Steve Kline
# Date: 03-04-2014
# Designed and tested to run on properly on CentOS 6.5
#Grab Updated IANA Address Space Assignments only if Newer Version
wget -N https://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.txt
assigned=ipv4-address-space.txt
arrayregistry=( afrinic apnic arin lacnic ripe )
for registry in "${arrayregistry[#]}"
do
#Clean up the ipv4-address-space.txt file and keep useable IPs
grep "$registry" $assigned | sed 's/\/8/\.0\.0\.0\/8/g'| colrm 15 > $registry-tmp1.txt
ip=($(cat $registry-tmp1.txt))
echo "#!/bin/bash" > $registry.sh
for ip in "${ip[#]}"
do
echo $ip | sed -e 's/" "//g' > $registry-tmp2.txt
#INSERT OR MODIFY YOUR COMPATIBLE FIREWALL RULES HERE
#This section creates the country to block.
echo "iptables -A INPUT -s $ip -j DROP" >> $registry.sh
chmod +x $registry.sh
done
rm $registry-tmp1.txt -f
rm $registry-tmp2.txt -f
done
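A usage sketch (file names follow the script above; run from the directory where it was saved):
chmod +x create_tables_by_registry.sh
./create_tables_by_registry.sh #builds afrinic.sh, apnic.sh, arin.sh, lacnic.sh and ripe.sh
./ripe.sh #adds iptables DROP rules for all RIPE-assigned /8 blocks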
Ok! Well I'm back, a little insane here and a little nutty there... I think I helped figure this out for you. I'm sure you can piece together a modification to better fit your needs.
#MODIFY FOR YOUR LIST OF IP ADDRESSES
BADIPS=block.ip
twentyfour=./twentyfour.ips #temp file for all IPs converted to twentyfour net ids
sixteen=./sixteen.ips #temp file for sixteen bit
twentyfourlst1=./twentyfour1.txt #temp file for 24 bit IDs
twentyfourlst2=./twentyfour2.txt #temp file for 24 bit IDs filtered by 16 bit IDs that match
sixteenlst=./sixteen.txt #temp file for parsed sixteenbit
#MODIFY FOR YOUR OUTPUT OF CIDR ADDRESSES
finalfile=./blockips.list #Final file post-merge
cat $BADIPS | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
echo "$oc1.$oc2.$oc3.0/24" >> $twentyfour
echo "$oc1.$oc2.0.0/16" >> $sixteen
done
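#TALLY EACH /16 NET ID AND KEEP THOSE SEEN MORE THAN 4 TIMES; THE sed STRIPS THE TRAILING COUNT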
awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>4){print i,a[i]}}}' $sixteen | sed 's/ [0-9]\| [0-9][0-9]\| [0-9][0-9][0-9]//g' > $sixteenlst
sort -u $twentyfour > twentyfour.txt
# THIS FINDS NEAR DUPLICATES MATCHING FIRST TWO OCTETS
cat $sixteenlst | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
grep "\b$oc1.$oc2\b" twentyfour.txt >> duplicates.txt
done
#THIS REMOVES THE NEAR DUPLICATES FROM THE TWENTYFOUR FILE
fgrep -vw -f duplicates.txt twentyfour.txt > twentyfourfinal.txt
#THIS MERGES BOTH RESULTS
cat twentyfourfinal.txt $sixteenlst > $finalfile
sort -u $finalfile
ori=`cat $BADIPS | wc -l`
new=`cat $finalfile | wc -l`
echo "$ori"
echo "$new"
#LAST MIN CLEANUP
rm -f $twentyfour $twentyfourlst1 $twentyfourlst2 $sixteen $sixteenlst duplicates.txt twentyfourfinal.txt
Going back to fix: I noted a problem. The original version was unsuccessful; it used
grep "$oc1.$oc2" twentyfour.txt > duplicates.txt
For example, the old script gave bad results with the test IP range below. The updated version above does exactly what's intended: it matches the octets exactly, not merely a similar substring.
192.168.1.1
192.168.2.50
192.168.5.23
192.168.14.10
192.168.10.5
192.168.24.25
192.165.20.10
10.192.168.30
5.76.10.20
5.76.20.30
5.76.250.10
5.76.34.10
5.76.50.30
95.76.30.1 - Old script matched this to 5.76
20.20.5.5
20.20.10.10
20.20.16.50
20.20.205.20
20.20.60.20
205.20.16.20 - not a problem
20.205.150.150 - Old script matched this to 20.20
220.20.16.0 - Also matched incorrectly until the -w parameter was added to the last grep so that it matches only exact strings.
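As an aside, both scripts in this thread fork four cut processes per address just to split the octets. Letting read split on dots does the same job inside the shell; a minimal sketch, assuming well-formed dotted-quad lines in block.ip:
while IFS=. read -r oc1 oc2 oc3 oc4; do
echo "$oc1.$oc2.$oc3.0/24" >> $twentyfour
echo "$oc1.$oc2.0.0/16" >> $sixteen
done < block.ip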
