Ok, here's the problem.
I have a plaintext list of IP addresses that I'm blocking on my servers, and it grows more unwieldy every day (3000+ entries added today alone).
It has already been de-duplicated, so that's not a problem. What I'd like to do is write a script that goes through it and consolidates the entries a bit better for mass blocking.
For example, take this:
2.132.35.104
2.132.79.240
2.132.99.87
2.132.236.34
2.132.245.30
And turn it into this:
2.132.0.0/16
Any suggestions on how to code that in a bash script?
UPDATE: I've worked out part of what I need. Converting the list to /24 is easy, as follows:
cat /usr/local/blocks/blocks.txt | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
echo "$oc1.$oc2.$oc3.0/24" >> twentyfour.srt
done
sort -u twentyfour.srt > twentyfour.txt
rm -f twentyfour.srt
ori=`cat /usr/local/blocks/blocks.txt | wc -l`
new=`cat twentyfour.txt | wc -l`
echo "$ori"
echo "$new"
That reduced it down from 4,452 entries to 4,148 entries.
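For reference, the same /24 conversion can be sketched as a single awk pass over the same input, with the same output file name as above:
awk -F. '{print $1"."$2"."$3".0/24"}' /usr/local/blocks/blocks.txt | sort -u > twentyfour.txt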
Instead of having:
109.86.9.93
109.86.26.77
109.86.55.225
109.86.70.224
109.86.87.199
109.86.89.202
109.86.95.248
109.86.100.19
109.86.110.43
109.86.145.216
109.86.152.86
109.86.155.238
109.86.156.54
109.86.187.91
109.86.228.86
109.86.234.51
109.86.239.61
I now have:
109.86.100.0/24
109.86.110.0/24
109.86.145.0/24
109.86.152.0/24
109.86.155.0/24
109.86.156.0/24
109.86.187.0/24
109.86.228.0/24
109.86.234.0/24
109.86.239.0/24
109.86.26.0/24
109.86.55.0/24
109.86.70.0/24
109.86.87.0/24
109.86.89.0/24
109.86.9.0/24
109.86.95.0/24
All well and good. BUT there are 17 entries in the 109.86.x.x area. In cases where the first two octets match more than, say, 5 of the /24 entries, I'd like to reduce those to a single /16.
That's where I'm stuck.
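(For what it's worth, here is a minimal awk sketch of that /16 promotion, assuming twentyfour.txt from above as input and a threshold of more than 5 /24s per first-two-octet pair; the output file name consolidated.txt is made up.)
awk -F. '{count[$1"."$2]++; line[NR]=$0; key[NR]=$1"."$2}
END {
    for (i = 1; i <= NR; i++) {
        if (count[key[i]] > 5) big[key[i]] = 1   # enough /24s: collapse into one /16
        else print line[i]                        # otherwise keep the /24 as-is
    }
    for (k in big) print k ".0.0/16"
}' twentyfour.txt | sort -u > consolidated.txt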
UPDATE 2:
For Steve: Here's the block list for today. And here's the result so far. Apparently it's not removing the near-duplicate entries from twentyfour that are in sixteen.
I wish I could tell you this is a simple filter. However, the whole of the 2.0.0.0/8 network is registered to RIPE NCC. There are just too many different ranges of blocked IP addresses; it's easier to narrow down the scope of visitors you do want than to enumerate the ones you don't.
You could also use various tools to block attacks automatically.
Use this map to identify which range belongs to which registry: https://www.iana.org/numbers
Here's a script I just made for you. With it you can create the major block lists for each of the primary registries: AFRINIC, LACNIC, APNIC, RIPE, and ARIN.
create_tables_by_registry.sh
Just run this script, then run the generated registry .sh files (e.g. ripe.sh).
#!/bin/bash
# Author: Steve Kline
# Date: 03-04-2014
# Designed and tested to run properly on CentOS 6.5
#Grab Updated IANA Address Space Assignments only if Newer Version
wget -N https://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.txt
assigned=ipv4-address-space.txt
arrayregistry=( afrinic apnic arin lacnic ripe )
for registry in "${arrayregistry[@]}"
do
#Clean up the ipv4-address-space.txt file and keep useable IPs
grep "$registry" $assigned | sed 's/\/8/\.0\.0\.0\/8/g'| colrm 15 > $registry-tmp1.txt
ip=($(cat $registry-tmp1.txt))
echo "#!/bin/bash" > $registry.sh
for ip in "${ip[@]}"
do
echo $ip | sed -e 's/" "//g' > $registry-tmp2.txt
#INSERT OR MODIFY YOUR COMPATIBLE FIREWALL RULES HERE
#This section creates the block rules for the registry's ranges.
echo "iptables -A INPUT -s $ip -j DROP" >> $registry.sh
chmod +x $registry.sh
done
rm $registry-tmp1.txt -f
rm $registry-tmp2.txt -f
done
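A hypothetical run, assuming the script above is saved as create_tables_by_registry.sh (the generated ripe.sh calls iptables, so run it as root):
chmod +x create_tables_by_registry.sh
./create_tables_by_registry.sh   # fetches ipv4-address-space.txt and writes afrinic.sh ... ripe.sh
./ripe.sh                        # adds the iptables DROP rules for the RIPE-assigned /8 blocks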
Ok! Well I'm back, a little insane here and a little nutty there... I think I helped figure this out for you. I'm sure you can piece together a modification to better fit your needs.
#MODIFY FOR YOUR LIST OF IP ADDRESSES
BADIPS=block.ip
twentyfour=./twentyfour.ips #temp file for all IPs converted to twentyfour net ids
sixteen=./sixteen.ips #temp file for sixteen bit
twentyfourlst1=./twentyfour1.txt #temp file for 24 bit IDs
twentyfourlst2=./twentyfour2.txt #temp file for 24 bit IDs filtered by 16 bit IDs that match
sixteenlst=./sixteen.txt #temp file for parsed sixteenbit
#MODIFY FOR YOUR OUTPUT OF CIDR ADDRESSES
finalfile=./blockips.list #Final file post-merge
cat $BADIPS | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
echo "$oc1.$oc2.$oc3.0/24" >> $twentyfour
echo "$oc1.$oc2.0.0/16" >> $sixteen
done
awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>4){print i,a[i]}}}' $sixteen | sed 's/ [0-9]\| [0-9][0-9]\| [0-9][0-9][0-9]//g' > $sixteenlst
sort -u $twentyfour > twentyfour.txt
# THIS FINDS NEAR DUPLICATES MATCHING FIRST TWO OCTETS
cat $sixteenlst | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
grep "\b$oc1.$oc2\b" twentyfour.txt >> duplicates.txt
done
#THIS REMOVES THE NEAR DUPLICATES FROM THE TWENTYFOUR FILE
fgrep -vw -f duplicates.txt twentyfour.txt > twentyfourfinal.txt
#THIS MERGES BOTH RESULTS
cat twentyfourfinal.txt $sixteenlst > $finalfile
sort -u "$finalfile" -o "$finalfile"
ori=`cat $BADIPS | wc -l`
new=`cat $finalfile | wc -l`
echo "$ori"
echo "$new"
#LAST MIN CLEANUP
rm -f $twentyfour $twentyfourlst $sixteen $sixteenlst duplicates.txt twentyfourfinal.txt
Going back to fix: I noted a problem... the original attempt was unsuccessful:
grep "$oc1.$oc2" twentyfour.txt > duplicates.txt
For example, the old script had bad results with this test IP range; the updated version above does exactly what's intended: it matches the octets exactly, not anything merely similar.
192.168.1.1
192.168.2.50
192.168.5.23
192.168.14.10
192.168.10.5
192.168.24.25
192.165.20.10
10.192.168.30
5.76.10.20
5.76.20.30
5.76.250.10
5.76.34.10
5.76.50.30
95.76.30.1 - Old script matched this to 5.76
20.20.5.5
20.20.10.10
20.20.16.50
20.20.205.20
20.20.60.20
205.20.16.20 - not a problem
20.205.150.150 - Old script matched this to 20.20
220.20.16.0 - Also failed without adding the -w parameter to the last grep so that only exact strings match.
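A quick illustration of the -w difference, using two of the test values above (demo.txt is just a throwaway file):
printf '%s\n' 20.20.16.0/24 220.20.16.0/24 > demo.txt
fgrep -v -f <(echo 20.20.16.0/24) demo.txt    # drops 220.20.16.0/24 too, because of the substring match
fgrep -vw -f <(echo 20.20.16.0/24) demo.txt   # keeps 220.20.16.0/24; only the exact entry is removed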
Related
I'm still new to the shell and need some help.
I have a file stapel_old.
I also have, in the same directory, files like english_old_sync, math_old_sync and vocabulary_old_sync.
The content of stapel_old is:
english
math
vocabulary
The content of e.g. english is:
basic_grammar.md
spelling.md
orthography.md
I want to manipulate all files that are listed in stapel_old, as in this example:
take the first line of stapel_old, 'english' (after that math, and so on),
convert english in this case to english_old_sync (or, for the next line, e.g. math to math_old_sync),
search english_old_sync line by line for the pattern '.md',
and append :::#a1 to each line after the .md.
The result should be e.g. of english_old_sync:
basic_grammar.md:::#a1
spelling.md:::#a1
orthography.md:::#a1
of math_old_sync:
geometry.md:::#a1
fractions.md:::#a1
and so on. stapel_old should stay unchanged.
How can I realize that?
I tried with sed -n and a while loop (while read -r line), and I feel it's somehow the right way, but after 4 hours of inspecting and reading I still get errors and not the expected result.
Thank you!
EDIT
Here is the working code (The files are stored in folder 'olddata'):
clear
echo -e "$(tput setaf 1)$(tput setab 7)Learning directories:$(tput sgr 0)\n"
# put here directories which should not become flashcards, command: | grep -v 'name_of_directory_which_not_to_learn1' | grep -v 'directory2'
ls ../ | grep -v 00_gliederungsverweise | grep -v 0_weiter | grep -v bibliothek | grep -v notizen | grep -v Obsidian | grep -v z_nicht_uni | tee olddata/stapel_old
# count folders
echo -ne "\nHow many different folders: " && wc -l olddata/stapel_old | cut -d' ' -f1 | tee -a olddata/stapel_old
echo -e "Are these learning directories correct? [j OR y]--> yes; [Other]-->no\n"
read lernvz_korrekt
if [ "$lernvz_korrekt" = j ] || [ "$lernvz_korrekt" = y ];
then
read -n 1 -s -r -p "Learning directories correct. Press any key to continue..."
else
read -n 1 -s -r -p "Learning directories not correct, please change in line 4. Press any key to continue..."
exit
fi
echo -e "\n_____________________________\n$(tput setaf 6)$(tput setab 5)Found cards:$(tput sgr 0)$(tput setaf 6)\n"
#GET && WRITE FOLDER NAMES into olddata/stapel_old
anzahl_zeilen=$(cat olddata/stapel_old |& tail -1)
#GET NAMES of .md files of every stapel and write All to 'stapelname'_old_sync
i=0
name="var_$i"
for (( num=1; num <= $anzahl_zeilen; num++ ))
do
i="$((i + 1))"
name="var_$i"
name=$(cat olddata/stapel_old | sed -n "$num"p)
find ../$name/ -name '*.md' | grep -v trash | grep -v Obsidian | rev | cut -d'/' -f1 | rev | tee olddata/$name"_old_sync"
done
(tput sgr 0)
I tried to add:
input="olddata/stapel_old"
while IFS= read -r line
do
sed -n "$line"p olddata/stapel_old
done < "$input"
The code to change only the english_old_sync is:
lines=$(wc -l olddata/english_old_sync | cut -d' ' -f1)
for ((num=1; num <= $lines; num++))
do
content=$(sed -n "$num"p olddata/english_old_sync)
sed -i "s/"$content"/""$content":::#a1/g"" olddata/english_old_sync
done
So now, this needs to be an inner for-loop inside an outer for-loop that holds the variable for english, right?
stapel_old should stay unchanged.
You could try a while + read loop and embed sed inside the loop.
#!/usr/bin/env bash
while IFS= read -r files; do
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/staple_old
convert english in this case to english_old_sync (or, for the next line, e.g. math to math_old_sync),
cp copies the file under a new name; if the goal is to rename the original files named in staple_old, change cp to mv.
The -n and -i flags for sed were omitted; include them if needed.
The script also assumes that there are no empty/blank lines in staple_old. If there are, add an additional test right after the line with the do:
[[ -n $files ]] || continue
It also assumes that the entries in staple_old are existing files. Just in case, add an additional test:
[[ -e $files ]] || { printf >&2 '%s no such file or directory.\n' "$files"; continue; }
Or an if statement.
if [[ ! -e $files ]]; then
printf >&2 '%s no such file or directory\n' "$files"
continue
fi
See also help test
See also help continue
Combining them all together should be something like:
#!/usr/bin/env bash
while IFS= read -r files; do
[[ -n $files ]] || continue
[[ -e $files ]] || {
printf >&2 '%s no such file or directory.\n' "$files"
continue
}
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/staple_old
Remove the echos if you're satisfied with the output, so that the script actually copies/renames and edits the files.
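To see just what the sed expression does, here is a tiny sketch using the english example from the question:
printf '%s\n' basic_grammar.md spelling.md orthography.md > english_old_sync
sed '/^.*\.md$/s/$/:::#a1/' english_old_sync
# basic_grammar.md:::#a1
# spelling.md:::#a1
# orthography.md:::#a1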
I wrote a script to change specific lines in a text file (fasta format) and I want to parallelize it because there are a lot of lines (~800k).
>CTC14_37541|M00842:336:000000000-C7WWK:1:2101:20913:9309:0:66|o:97|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=75;p=71|CO
And I want to transform it to:
>Sample-CTC14_Read37541
I have two problems.
I tried to run my script with and without a function:
Without a function, it works: all the lines I want to change are modified.
When I use a function, only one line is modified. Is something wrong in my function header()?
The second problem is the parallelization. I tried something with "&", but I'm not sure that's the best solution. Any ideas?
My code without function and parallel:
#!/bin/bash
TMP_PATH="/path/where/is/my/fasta"
cd $TMP_PATH
for fasta in *.fasta
do
echo $fasta
lines=$(grep ">" $fasta)
for line in $lines
do
if [[ $line = *">"* ]]; then
read_nb="_Read"$(echo $line | cut -d'|' -f1 | cut -d'_' -f2)
sample=$(echo $line | cut -d'_' -f1 | cut -d'>' -f2)
newheader=$(echo ">Sample-$sample$read_nb")
sed -i -e "s/$line/$newheader/g" $fasta
sed -i -e "s/ /\n/g" $fasta
fi
done
done
echo "END"
My code with function and parallel:
#!/bin/bash
TMP_PATH="/path/where/is/my/fasta"
cd $TMP_PATH
n=0
maxjobs=500
header(){
if [[ $line = *">"* ]]; then
read_nb="_Read"$(echo $line | cut -d'|' -f1 | cut -d'_' -f2)
sample=$(echo $line | cut -d'_' -f1 | cut -d'>' -f2)
newheader=$(echo ">Sample-$sample$read_nb")
sed -i -e "s/$line/$newheader/g" $fasta
sed -i -e "s/ /\n/g" $fasta
fi
}
for fasta in *.fasta
do
lines=$(grep ">" $fasta)
for line in $lines
do
header $line &
#limit jobs
if (( $(($((++n)) % $maxjobs)) == 0 )) ; then
wait
echo $n wait
fi
done
done
I have a fasta file as input that contains several headers and sequences, and I want to transform the headers in order to use the file in a specific workflow. I need to go from this:
>CTC14_18758|M00842:336:000000000-C7WWK:1:1108:17474:5670:0:66|o:98|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=66;p=62|CO:0|
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAACTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTAAATATTAGGAGGAACACCAGCGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCTTGGGGAGCAAACAGG
>CTC14_20535|M00842:336:000000000-C7WWK:1:1108:28568:20175:0:66|o:97|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=77;p=64|CO:0|
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACCCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAACTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTAAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCGTGGGGAGCAAACAGG
>CTC14_24700|M00842:336:000000000-C7WWK:1:1110:7911:9824:0:66|o:97|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=77;p=71|CO:0|
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAACTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTAAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCGTGGGGAGCAAACAGG
To this:
>Sample-CTC14_Read18758
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAACTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTAAATATTAGGAGGAACACCAGCGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCTTGGGGAGCAAACAGG
>Sample-CTC14_Read20535
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACCCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAACTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTAAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCGTGGGGAGCAAACAGG
>Sample-CTC14_Read24700
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAACTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTAAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCGTGGGGAGCAAACAGG
And I want to make this parallel because I have a lot of lines to change (~700-800k) and it takes a very long time to run the script line by line.
With my script without the function, the job works but it is too slow.
With my script with the function and parallelization, the job doesn't work properly: only one header is changed in my fasta instead of all of them, and I don't understand why. I tried different ways to write and call my function, but the result is always the same.
Moreover, I tried with GNU parallel and it behaves the same way. I think my function or my call has a problem, but I don't understand where.
I think using awk as you suggested is a good idea, but I'm not comfortable with it. Can you help me, please?
Proper format of my fasta file is:
>CTC14_1600|M00842:336:000000000-C7WWK:1:1101:26089:18004:0:66|o:97|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=77;p=71|CO:0| TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAACTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGACGTAGAGGTAAAATTCGTAAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCGTGGGGAGCAAACAGG$
>CTC14_11169|M00842:336:000000000-C7WWK:1:1105:11636:11876:0:66|o:97|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=76;p=65|CO:0| TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAACTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTAAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCGTGGGGAGCAAACAGG$
>CTC14_16471|M00842:336:000000000-C7WWK:1:1107:6941:10486:0:66|o:97|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=77;p=70|CO:0| TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGGCGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAGCTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTGAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCGTGGGGAGCAAACAGG$
Assuming that >CTC14_18758|M00842:336:000000000- is on a separate line, this code will convert the input to the output.
#!/bin/sed -f
#skip blank lines
/^[[:space:]]*$/n
#change >CTC14_18758|M00842:336:000000000-
# to >Sample-CTC14_Read18758
s/^>/>Sample-/
s/_/_Read/
/^>/s/|.*$//
# remove 2ndary header
# C7WWK:1:1108:17474:5670:0:66|o:98|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=66;p=62|CO:0| TGGGGAATATTGGAC...
# to
# TGGGGAATATTGGAC...
s/^[^>].*| //
Save that as a file/script.
Then mark it as executable with
chmod +x mySed
and run it like
./mySed -i fileIn
Or, if you get a warning/error message about -i, then run
./mySed fileIn > fileOut && mv fileOut fileIn
Now you can eliminate your function header() and the secondary loop in your code.
Just
for file in *.fasta ; do
echo "processing file=$file"
/path/to/mySed -i "$file"
# run other processing if needed
# don't think you need wait any more
#uncomment? wait
done
-------------- version 2 sed ---------------
#!/bin/sed -f
#skip blank lines
/^[[:space:]]*$/n
#>CTC14_18758|M00842:336:000000000-C7WWK:1:1108:17474:5670:0:66|o:98|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=66;p=62|CO:0| TGGGGA...
#change >CTC14_18758|M00842:336:000000000-
# to >Sample-CTC14_Read18758
s/^>/>Sample-/
s/_/_Read/
s/|.*| / /
# /^>/s/-.*| / /
# s/-.*| / /
works with data like
>CTC14_16471|M00842:336:000000000-C7WWK:1:1107:6941:10486:0:66|o:97|mo:0.000000|MR:n=0;r1=0;r2=0|Q30:p=77;p=70|CO:0| TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGGCGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCTGGTTAATAAGTTAAAAGTGAAATCCCGAGGCTTAACCTTGGAATTGCTTTTAAAGCTATTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTGATGTAGAGGTAAAATTCGTGAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAAGCGCGAAGGCGTGGGGAGCAAACAGG
IHTH
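Since the question also asked about awk: here is a hedged awk sketch of the same header rewrite (an alternative, not the sed approach above), assuming headers and sequences are on separate lines as in the first example; the input/output file names are placeholders.
awk -F'|' '
/^>/ {
    split($1, p, "_")               # $1 is ">CTC14_18758"
    sub(/^>/, "", p[1])             # p[1] = "CTC14", p[2] = "18758"
    print ">Sample-" p[1] "_Read" p[2]
    next
}
{ print }                           # sequence lines pass through unchanged
' input.fasta > output.fasta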
I'm writing a script to handle multiple uses for searching lists. Long story short, I am querying a DB and have a basic list such as this:
sdc 10:0 KQJWBE11
sdd 10:1 KSDJFBQK
sde 10:2 13KN13DD
sdf 10:3 123DJN1O
sdg 10:4 213JBDKJ
sdh 10:5 N2QQWMNE
sdi 10:6 QKEWJDQJ
sdj 10:7 QKWJEDWE
sdk 20:0 QEDQWEDQ
sdl 20:1 1234E13L
sdm 20:2 KQNE2OUN
sdn 20:3 QN2NK3JN
sdo 20:4 23J23EN2
sdp 20:5 2WBNEKNW
sdq 20:6 QWEDKJNW
sdr 20:7 QWEDQEDD
These exist in the variable "${TABLE_FORMAT}" and are formatted into a table just as above.
#... other logic above this
# Query via primary and secondary location. Example: DISK_ARG="10:1"
elif [[ ${DISK_ARG} =~ ([[:digit:]]:[[:digit:]])+$ ]]; then
DISK_ARG_PRIMARY=$(echo "${DISK_ARG}" | cut -d: -f1)
DISK_ARG_SECONDARY=$(echo "${DISK_ARG}" | cut -d: -f2)
echo -e "${HEADER}"
echo -e "${TABLE_FORMAT}" | grep -Ei "($DISK_ARG_PRIMARY):($DISK_ARG_SECONDARY)"
fi
# Query secondary location. Example: DISK_ARG="5"
elif [[ ${DISK_ARG} =~ ([[:digit:]])+$ ]]; then
DISK_ARG_F2="$(echo "${F2}" | grep -Ei "([[:digit:]]):(${DISK_ARG})")"
DISK_ARG_PRIMARY=$(echo "${DISK_ARG_F2}" | cut -d: -f1)
DISK_ARG_SECONDARY=$(echo "${DISK_ARG_F2}" | cut -d: -f2)
echo -e "${TABLE_FORMAT}" | grep -Ei "($DISK_ARG_PRIMARY):($DISK_ARG_SECONDARY)"
fi
else :
fi
The offending line that doesn't work is in the second elif:
echo -e "${TABLE_FORMAT}" | grep -Ei "($DISK_ARG_PRIMARY):($DISK_ARG_SECONDARY)"
grep: Unmatched ( or \(
The current variables at this point are:
DISK_ARG_PRIMARY="10 20"
DISK_ARG_SECONDARY="5 5"
I want the following rendered as output:
sdh 10:5 N2QQWMNE
sdp 20:5 2WBNEKNW
I'm not sure if this could be accomplished with building some type of array in grep or modifying the IFS somehow. I want the script to handle many inputs and look for matches off of the relevant fields.
In your case
DISK_ARG_PRIMARY=$(echo "${DISK_ARG}" | cut -d: -f1)
DISK_ARG_SECONDARY=$(echo "${DISK_ARG}" | cut -d: -f2)
could be replaced with
DISK_ARG_PRIMARY=${DISK_ARG%:*} # remove :* suffix
DISK_ARG_SECONDARY=${DISK_ARG#*:} # remove *: prefix
to avoid the subshell, the pipe, and cut.
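For example, with the question's DISK_ARG="10:1":
DISK_ARG="10:1"
echo "${DISK_ARG%:*}"   # 10
echo "${DISK_ARG#*:}"   # 1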
Also add the -w option to grep to avoid matching substrings; for example, 5 will then not match 51.
DISK_ARG_PRIMARY="10 20"
DISK_ARG_SECONDARY="5 5"
To have
grep -Ew '(10|20):(5|5)'
grep -Ew "($(set -- $DISK_ARG_PRIMARY;IFS=\|;echo "$*")):($(set -- $DISK_ARG_SECONDARY;IFS=\|;echo "$*"))"
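Putting it together with the sample values from the question (a sketch; TABLE_FORMAT holds the table shown above):
DISK_ARG_PRIMARY="10 20"
DISK_ARG_SECONDARY="5 5"
pattern="($(set -- $DISK_ARG_PRIMARY; IFS=\|; echo "$*")):($(set -- $DISK_ARG_SECONDARY; IFS=\|; echo "$*"))"
echo "$pattern"                                  # (10|20):(5|5)
echo -e "${TABLE_FORMAT}" | grep -Ew "$pattern"  # prints the sdh 10:5 and sdp 20:5 rows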
#!/bin/bash
echo -n "Enter the domain name > "
read name
dig -t ns "$name" | cut -d ";" -f3 | cut -d ":" -f2| grep ns | cut -f6 > registerfile.txt;
cat registerfile.txt | while read line; do dig axfr "@$line" "$name"; done | cut -d"." -f-4 > nmap.txt
It works up to this point. Below, however, the line and name parameters don't get matched as intended. How should this be changed?
cat nmap.txt | while read line; do if [ "$line" == "$name" ]; then host "$line"; fi; done > ping.txt
cat ping.txt | cut -d " " -f4 | while read line; do if [[ "$line" =~ ^[0-9]+$ ]]; then nmap -sS "$line";fi ;done
It's not clear where exactly things are going wrong, but here is a refactoring which might hopefully at least nudge you in the right direction.
#!/bin/bash
read -p "Enter the domain name > " name
dig +short -t ns "$name" |
tee registerfile.txt |
while read line; do
dig axfr "@$line" "$name"
done |
cut -d"." -f-4 |
tee nmap.txt |
while read line; do
if [ "$line" = "$name" ]; then
host "$line"
fi
done > ping.txt
cut -d " " -f4 ping.txt |
grep -E '^[0-9]+$' |
xargs -r -n 1 nmap -sS
Your remark in comments that if [ "$line" = "$name" ]; then host "$line"; fi isn't working suggests that the logic there is somehow wrong. It currently checks whether each line is identical to the original domain name, and then looks it up over and over again in those cases, which seems like a curious thing to do; but given only the code and the "does not work", it's hard to say what it's really supposed to accomplish. If you actually want something else, you need to be more specific about what you require. Perhaps you are actually looking for something like
... tee nmap.txt |
# Extract the lines which contain $name at the end
grep "\.$name\$" |
xargs -n 1 dig +short |
tee ping.txt |
grep -E '^[0-9]+$' ...
The use of multiple statically-named files is an antipattern; obviously, if these files serve no external purpose, just take out the tee commands and run the entire pipeline with no in-between output files. If you do need these files, having them overwritten on each run seems problematic -- maybe add a unique date stamp suffix to the file names?
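A small sketch of the date-stamp idea (the stamp format is just an example):
read -p "Enter the domain name > " name
stamp=$(date +%Y%m%d-%H%M%S)                         # one stamp per run, e.g. 20240101-093000
dig +short -t ns "$name" | tee "registerfile.$stamp.txt"
# ... and likewise "nmap.$stamp.txt" and "ping.$stamp.txt" further down the pipeline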
I have the below for loop in a shell script
#!/bin/bash
#Get the year
curr_year=$(date +"%Y")
FILE_NAME=/test/codebase/wt.properties
key=wt.cache.master.slaveHosts=
prop_value=""
getproperty(){
prop_key=$1
prop_value=`cat ${FILE_NAME} | grep ${prop_key} | cut -d'=' -f2`
}
#echo ${prop_value}
getproperty ${key}
#echo "Key = ${key}; Value="${prop_value}
arr=( $prop_value )
for i in "${arr[@]}"; do
echo $i | head -n1 | cut -d "." -f1
done
The output I am getting is as below.
test1
test2
test3
I want to substitute test2 from the above results into the command below in place of 'ABCD':
grep test12345 /home/ptc/storage/'ABCD'/apache/$curr_year/logs/access.log* | grep GET > /tmp/test.access.txt
I tried all the options I could think of but could not succeed, as I am new to shell scripting.
Ignoring the many bugs elsewhere and focusing on the one piece of code you say you want to change:
for i in "${arr[@]}"; do
val=$(echo "$i" | head -n1 | cut -d "." -f1)
grep test12345 /dev/null "/home/ptc/storage/$val/apache/$curr_year/logs/access.log"* \
| grep GET
done > /tmp/test.access.txt
Notes:
Always quote your expansions. "$i", "/path/with/$val/"*, etc. (The * should not be quoted on the assumption that you want it to be expanded).
for i in $prop_value would have the exact same (buggy) behavior; using arr buys you nothing. If you want using arr to increase correctness, populate it correctly: read -r -a arr <<<"$prop_value"
The redirection is moved outside the loop -- that way the second iteration through the loop doesn't overwrite the file written by the first one.
The extra /dev/null passed to grep ensures that its behavior is consistent regardless of the number of matches; otherwise, it would display filenames only if more than one matching log file existed, and not otherwise.
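A tiny demonstration of that last point, using a throwaway file:
printf 'test12345 GET /index\n' > access.log1
grep test12345 access.log1              # one file: the matching line is printed without a file name
grep test12345 /dev/null access.log1    # prints "access.log1:test12345 GET /index" - the name is always shown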