How to find files in UNIX which have a multiple-line pattern? - bash

I'm trying to search all files for a pattern that spans multiple lines, and then return a list of file names that match the pattern.
I'm using this line:
find . -name "$file_to_check" 2>/dev/null | xargs grep "$2" >> $grep_out
This will create a list of files and the line the matched pattern is found on within $grep_out. The problem with this is that the search doesn't span multiple lines. I've read that grep cannot span multiple lines, so I'm looking to replace grep with sed or awk.
The only thing I think needs to change is the grep. When I run sed or awk from the terminal, I get a large printout of the file contents matching the pattern I gave. All I want is the filename, not the context of the match. Is there a way to retrieve this, perhaps by having sed print the filename rather than the context? Or could sed return true/false when it finds a match, so that I can save the name of the file that was being searched?

Most text processing tools are line-oriented by default. If we choose to read records as paragraphs, using blank lines as record separators:
awk -v RS= -v pattern="$2" '$0 ~ pattern {print FILENAME; exit}' file
or
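find . -options ... -print0 | xargs -0 -n 1 awk -v RS= -v pattern="$2" '$0 ~ pattern {print FILENAME; exit}'
The -n 1 makes xargs run awk once per file; without it, exit would end the whole search at the first matching file.
I'm assuming your pattern does not contain consecutive newlines (i.e. blank lines).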
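As a rough end-to-end sketch of the paragraph-mode idea, assuming bash and any POSIX awk: the helper name list_paragraph_matches and the demo files are made up for illustration. It runs awk once per file through find -exec, so exit only stops the scan of the current file.

```shell
#!/bin/bash
# Sketch only: list_paragraph_matches is a hypothetical helper name.
# Prints the name of every regular file under DIR in which at least one
# blank-line-separated paragraph matches the awk regex PATTERN.
list_paragraph_matches() {
    local pattern=$1 dir=$2
    # One awk process per file, so "exit" only stops reading that file.
    find "$dir" -type f -exec awk -v RS= -v pattern="$pattern" '
        $0 ~ pattern { print FILENAME; exit }
    ' {} \;
}

# Demo with throwaway files: only a.txt has "foo" and "bar" on adjacent lines.
demo_dir=$(mktemp -d)
printf 'foo\nbar\n'   > "$demo_dir/a.txt"
printf 'foo\n\nbar\n' > "$demo_dir/b.txt"
list_paragraph_matches 'foo\nbar' "$demo_dir"
```

The demo should list only a.txt, because the blank line in b.txt splits "foo" and "bar" into separate records.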
To check if a file contains "word1[anything]word2[anything]word3":
1. Brute force: read the entire file and then do a regex comparison with bash
contents=$(< "$file")
if [[ $contents =~ "$word1".*"$word2".*"$word3" ]]; then
echo "match"
else
echo "no match"
fi
2. line-by-line with awk, use a state machine
awk -v w1="$word1" -v w2="$word2" -v w3="$word3" '
$0 ~ w1 {have_w1 = 1}
have_w1 && $0 ~ w2 {have_w2 = 1}
have_w2 && $0 ~ w3 {have_w3 = 1; exit}
END {exit (! have_w3)}
' filename
Ah, strike #2: that would match the line "word3word2word1" -- does not enforce order of the words
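One possible way to repair the line-by-line idea so that the order is enforced is to consume the text left to right and only look for the next word after the previous match. This is a sketch under my own assumptions, not the approach from the answer above; contains_words_in_order is a hypothetical name, and the three words are treated as awk regexes.

```shell
#!/bin/bash
# Sketch only: contains_words_in_order is a hypothetical helper name.
# Usage: contains_words_in_order FILE WORD1 WORD2 WORD3
# Exit status 0 if the three words occur in that order, possibly on
# different lines; exit status 1 otherwise.
contains_words_in_order() {
    local file=$1; shift
    awk -v w1="$1" -v w2="$2" -v w3="$3" '
        BEGIN { w[1] = w1; w[2] = w2; w[3] = w3; want = 1 }
        {
            rest = $0
            # Consume matches left to right, so a later word only counts
            # if it appears after the previous one.
            while (want <= 3 && match(rest, w[want])) {
                rest = substr(rest, RSTART + RLENGTH)
                want++
            }
            if (want > 3) exit
        }
        END { exit (want > 3 ? 0 : 1) }
    ' "$file"
}
```

With this version a line such as "word3word2word1" no longer counts as a match, because word2 is only searched for in the text that follows word1.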

I'm trying to search all files for a pattern that spans multiple lines, and then return a list of file names that match the pattern.
pattern=$( echo "whatever your search pattern is" | tr '\n' ' ' )
for FILE in *
do
tr '\n' ' ' <"$FILE" | if grep -q "$pattern"; then echo "$FILE"; fi
done
Just replace the newlines with spaces, both in your pattern and in the input to grep.
With 'find' , you could do it like this:
#!/bin/bash
find . -name "$file_to_check" 2>/dev/null | while IFS= read -r FILE
do
tr '\n' ' ' <"$FILE" | if grep -q "word1.*word2.*word3" ; then echo "$FILE" ; fi
done >grep_out
As for the search pattern: ".*" means "any amount of any character".
Remember that characters with a special meaning in a grep pattern must be escaped when you want to match them literally: "." becomes "\." and "^" becomes "\^".
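If the search text is a fixed string rather than a regex, the escaping concern can be sidestepped with grep -F. A minimal sketch of the same flatten-then-grep idea, assuming bash; file_has_literal is a hypothetical name:

```shell
#!/bin/bash
# Sketch only: file_has_literal is a hypothetical helper name.
# Succeeds if TEXT occurs in FILE, where a newline in the file (or in TEXT)
# is treated like a space. grep -F takes TEXT literally, so characters such
# as "." or "^" need no escaping.
file_has_literal() {
    local text=$1 file=$2
    text=${text//$'\n'/ }
    tr '\n' ' ' < "$file" | grep -qF -- "$text"
}
```

It can be dropped into the loop above in place of the grep -q call, e.g. `if file_has_literal "$text" "$FILE"; then echo "$FILE"; fi`.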

Related

how to change words with the same words but with number at the back bash

I have a file for example with the name file.csv and content
adult,REZ
man,BRB
women,SYO
animal,HIJ
and a line that is neither a directory nor a file:
file.csv BRB1 REZ3 SYO2
What I want to do is look up each capitalized word from that line in the file, and then take the nth letter of the matching first-column word, where n is the number at the end of the capitalized word.
The output should then be
umo
I know that I can get over the line with
for i in "${@:2}"
do
words+=$(echo "$i ")
done
and then the output is
REZ3 BRB1 SYO2
Using awk:
Pass the string of values as an awk variable and then split them into an array a. For each record in file.csv, iterate this array and if the second field of current record matches the first three characters of the current array value, then strip the target character from the first field of the current record and append it to a variable. Print the value of the aggregated variable.
awk -v arr="BRB1 REZ3 SYO2" -F, 'BEGIN{split(arr,a," ")} {for (v in a) { if ($2 == substr(a[v],1,3)) {n=substr(a[v],length(a[v]),1); w=w""substr($1,n,1) }}} END{print w}' file.csv
umo
You can also put this into a script:
#!/bin/bash
words="${2}"
src_file="${1}"
awk -v arr="$words" -F, 'BEGIN{split(arr,a," ")}
{for (v in a) {
if ($2 == substr(a[v],1,3)) {
n=substr(a[v],length(a[v]),1)
w=w""substr($1,n,1)
}
}
} END{print w}' "$src_file"
Script execution:
./script file.csv "BRB1 REZ3 SYO2"
umo
This is a way using sed.
Create a pattern string from command arguments and convert lines with sed.
#!/bin/bash
file="$1"
pat='s/^/ /;Te;'
for i in ${@:2}; do
pat+=$(echo $i | sed 's#^\([^0-9]*\)\([0-9]*\)$#s/.\\{\2\\}\\(.\\).*,\1$/\\1/;#')
done
pat+='Te;H;:e;${x;s/\n//g;p}'
eval "sed -n '$pat' $file"
Try this code:
#!/bin/bash
declare -A idx_dic
filename="$1"
pattern_string=""
for i in "${@:2}";
do
pattern_words=$(echo "$i" | grep -oE '[A-Z]+')
index=$(echo "$i" | grep -oE '[0-9]+')
pattern_string+=$(echo "$pattern_words|")
idx_dic["$pattern_words"]="$index"
done
pattern_string=${pattern_string%|*}
while IFS= read -r line
do
line_pattern=$(echo "$line" | grep -oE "$pattern_string")
[[ -n $line_pattern ]] && line_index="${idx_dic[$line_pattern]}" && echo "$line" | awk -v i="$line_index" '{split($0, chars, ""); printf("%s", chars[i]);}'
done < "$filename"
First, find the capital-letter word in each argument and record the index that goes with it.
Then build the whole pattern string by joining those words with |.
Finally, go through every line, match it against the pattern string, and pick the letter at the recorded index.
Execute this script.sh like:
bash script.sh file.csv BRB1 REZ3 SYO2
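The same lookup can also be sketched in bash alone, without starting grep and awk for every line. This assumes bash 4 or later (for associative arrays) and arguments of the form capital letters followed by digits; pick_letters is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: pick_letters is a hypothetical helper name.
# Usage: pick_letters FILE CODE1 [CODE2 ...]
#   e.g. pick_letters file.csv BRB1 REZ3 SYO2
pick_letters() {
    local file=$1; shift
    local -A index_of=()
    local arg word code pos out=
    for arg in "$@"; do
        # Split e.g. "REZ3" into the key "REZ" and the 1-based position "3".
        [[ $arg =~ ^([A-Z]+)([0-9]+)$ ]] || continue
        index_of[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
    done
    while IFS=, read -r word code; do
        [[ -n $code ]] || continue
        pos=${index_of[$code]:-}
        [[ -n $pos ]] || continue
        # Bash substrings are 0-based, hence pos-1.
        out+=${word:pos-1:1}
    done < "$file"
    printf '%s\n' "$out"
}
```

The letters come out in the order of the lines in the file, not in the order of the arguments.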

Optimal way to recursively find files that match one or more patterns

I have to optimize a shell script, but after a week I still haven't managed to make it fast enough.
I have to search recursively for .c, .h and .cpp files in a directory, and check whether words like these occur in them:
"float short unsigned continue for signed void default goto sizeof volatile do if static while"
words=$(echo $@ | sed 's/ /\\|/g')
files=$(find $dir -name '*.cpp' -o -name '*.c' -o -name '*.h' )
for file in $files; do
(
test=$(grep -woh "$words" "$file" | sort -u | awk '{print}' ORS=' ')
if [ "$test" != "" ] ; then
echo "$(realpath $file) contains : $test"
fi
)&
done
wait
I have tried xargs and -exec, but without success. I have to keep this output format:
/usr/include/c++/6/bits/stl_set.h contains : default for if void
Maybe you can help me (to optimize it)..
EDIT: I have to keep only one occurrence of each word
YES: while, for, volatile...
NOPE: while, for, for, volatile...
If you are interested in finding all files that have at least one match of any of your patterns, you can use globstar:
shopt -s globstar
oldIFS=$IFS; IFS='|'; patterns="$*"; IFS=$oldIFS # make a | delimited string from arguments
grep -lwE "$patterns" **/*.c **/*.h **/*.cpp # list files with matching patterns
globstar
If set, the pattern ‘**’ used in a filename expansion context
will match all files and zero or more directories and subdirectories.
If the pattern is followed by a ‘/’, only directories and
subdirectories match.
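One caveat worth sketching: if one of the globs matches nothing, bash passes it to grep unchanged and grep complains about a missing file. A possible wrapper, assuming bash 4 or later, that also sets nullglob; list_files_with_words is a hypothetical name:

```shell
#!/bin/bash
# Sketch only: list_files_with_words is a hypothetical helper name.
# Usage: list_files_with_words DIR WORD [WORD ...]
# Prints (relative to DIR) the .c, .h and .cpp files containing any WORD.
list_files_with_words() {
    local dir=$1; shift
    local patterns
    patterns=$(IFS='|'; printf '%s' "$*")   # join the words with "|"
    (
        cd "$dir" || exit 1
        # globstar makes ** recurse; nullglob drops globs that match nothing
        # instead of passing them to grep as literal file names.
        shopt -s globstar nullglob
        files=( **/*.c **/*.h **/*.cpp )
        (( ${#files[@]} )) || exit 0
        grep -lwE -- "$patterns" "${files[@]}"
    )
}
```

The subshell keeps the cd and the shopt settings from leaking into the calling script. On very large trees the expanded file list can exceed the argument-length limit, in which case grep -r with --include is the safer route.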
Here is an approach that keeps your desired format while eliminating the use of find and bash looping:
words='float|short|unsigned|continue|for|signed|void|default|goto|sizeof|volatile|do|if|static|while'
grep -rwoE --include '*.[ch]' --include '*.cpp' "$words" path | awk -F: '$1!=last{printf "%s%s: contains %s",r,$1,$2; last=$1; r=ORS; delete a; a[$2]} $1==last && !($2 in a){printf " %s",$2; a[$2]} END{print""}'
How it works
grep -rwoE --include '*.[ch]' --include '*.cpp' "$words" path
This searches recursively through directories starting with path looking only in files whose names match the globs *.[ch] or *.cpp.
awk -F: '$1!=last{printf "%s%s: contains %s",r,$1,$2; last=$1; r=ORS; delete a; a[$2]} $1==last && !($2 in a){printf " %s",$2; a[$2]} END{print""}'
This awk command reformats the output of grep to match your desired output. The script uses a variable last and array a. last keeps track of which file we are on and a contains the list of words seen so far. In more detail:
-F:
This tells awk to use : as the field separator. In this way, the first field is the file name and the second is the word that is found. (limitation: file names that include : are not supported.)
'$1!=last{printf "%s%s: contains %s",r,$1,$2; last=$1; r=ORS; delete a; a[$2]}
Every time that the file name, $1, does not match the variable last, we start the output for a new file. Then, we update last to contain the name of this new file. We then delete array a and then assign key $2 to a new array a.
$1==last && !($2 in a){printf " %s",$2; a[$2]}
If the current file name is the same as the previous and the current word has not been seen before, we print out the new word found. We also add this word, $2 as a key to array a.
END{print""}
This prints out a final newline (record separator) character.
Multiline version of code
For those who prefer their code spread out over multiple lines:
grep -rwoE \
--include '*.[ch]' \
--include '*.cpp' \
"$words" path |
awk -F: '
$1!=last{
printf "%s%s: contains %s",r,$1,$2
last=$1
r=ORS
delete a
a[$2]
}
$1==last && !($2 in a){
printf " %s",$2; a[$2]
}
END{
print""
}'
You should be able to do most of this with a single grep command:
grep -Rw $dir --include \*.c --include \*.h --include \*.cpp -oe "$words"
This will put it in file:word format, so all that's left is to change it to produce the output that you want.
echo "$output" | sed 's/:/ /g' | awk '{print $1 " contains : " $2}'
Then you can add | sort -u to get only single occurrences for each word in each file.
#!/bin/bash
#dir=.
words=$(echo $@ | sed 's/ /\\|/g')
grep -Rw $dir --include \*.c --include \*.h --include \*.cpp -oe "$words" \
| sort -u \
| sed 's/:/ /g' \
| awk '{print $1 " contains : " $2}'

Using grep and pipes in Unix to find specific words

Let's say I'm using grep, and I use the -v option on a text file to find all the words that do not contain vowels. If I then wanted to see how many words there are in this file that do not contain vowels, what could I do?
I was thinking of using a pipe and using the rc command by itself. Would that work? Thanks.
Actually, I believe that you want wc, not rc, as in:
grep -civ '[aeiouy]' words.txt
For example, consider the file:
$ cat words.txt
the
words
mph
tsk
hmmm
Then, the following correctly counts the three "words" without vowels:
$ grep -civ '[aeiouy]' words.txt
3
I included y in the vowel list. You can decide whether y or not it should be removed.
Also, I assumed above that your file has one word per line.
The grep options used above are as follows:
-v means exclude matching lines
-i makes the matching case-insensitive
-c tells grep to return a count, not the actual matches
Multiple words per line
$ echo the tsk hmmm | grep -io '\b[bcdfghjklmnpqrstvwxz]*\b' | wc -l
2
Because \b matches at word boundaries, the above regex matches only words that lack vowels. -o tells grep to print only the matching portion of the line, not the entire line. Because -c counts the number of lines with matches, it is not useful here. wc -l is used instead to count matches.
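Since \b is not available in every grep, a more portable variant is to put each word on its own line first and then reuse the one-word-per-line count. A small sketch; count_vowelless_words is a hypothetical name, and y is again treated as a vowel:

```shell
#!/bin/bash
# Sketch only: count_vowelless_words is a hypothetical helper name.
# Reads text on stdin, puts each whitespace-separated word on its own line,
# then counts the non-empty lines that contain none of a, e, i, o, u, y.
count_vowelless_words() {
    tr -s '[:space:]' '\n' | grep -v '^$' | grep -civ '[aeiouy]'
}
```

For example, `echo the tsk hmmm | count_vowelless_words` counts tsk and hmmm. Note that anything separated by whitespace counts as a word here, so tokens made of digits or punctuation are counted too.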
The following script will count the number of words that don't contain vowels (if there are several words per line):
#!/bin/bash
# File can be a script parameter
FILE="$1"
let count=0
while read line; do
for word in $line; do
grep -qv "[aeiou]" <<< "$word"
if [ $? -eq 0 ]; then
let count++
fi
done
done < "$FILE"
echo "words without vowels: $count"
If there is only one word per line, then the following will be enough:
grep -cv "[aeiou]" < file
If multiple words can be on the same line, and you want to count them too, you can use grep -o with wc -l to count all the matches correctly, like so:
$ echo "word work no-match wonder" | grep -o "wo[a-z]*" | wc -l
3
You could, alternatively, do this all within awk:
awk '!/[aeiou]/ {n++} END {print n}' file
For lines with multiple fields:
awk '{for(i=1; i<=NF; i++) if($i !~ /[aeiou]/) n++} END {print n}' file

I want to re-arrange a file in an order in shell

I have a file test.txt like the one below, with spaces between the records:
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
Now I want to rearrange it as below and write it to output.txt:
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service/2.2
I've tried:
final_str=''
COMPOSITES=''
# Re-arranging the composites and preparing the composite property file
while read line; do
partition_val="$(echo $line | cut -d ',' -f 2)"
composite_temp1_val="$(echo $line | cut -d ',' -f 1)"
composite_val="$(echo $composite_temp1_val | cut -d '[' -f 1)"
version_temp1_val="$(echo $composite_temp1_val | cut -d '[' -f 2)"
version_val="$(echo $version_temp1_val | cut -d ']' -f 1)"
final_str="$partition_val/$composite_val/$version_val,"
COMPOSITES=$COMPOSITES$final_str
done <./temp/test.txt
We start with the file:
$ cat test.txt
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
We can rearrange that file as follows:
$ awk -F, -v RS=" " 'BEGIN{printf "COMPOSITES=";} {gsub(/[[]/, "/"); gsub(/[]]/, ""); if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1;}' test.txt
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service2/2.2
The same command split over multiple lines is:
awk -F, -v RS=" " '
BEGIN{
printf "COMPOSITES=";
}
{
gsub(/[[]/, "/")
gsub(/[]]/, "")
if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1
}
' test.txt
Here's what I came up with.
awk -F '[],[]' -v RS=" " 'BEGIN{printf("COMPOSITES=")}/../{printf("%s/%s/%s,",$4,$1,$2);}' test.txt
Broken out for easier reading:
awk -F '[],[]' -v RS=" " '
BEGIN {
printf("COMPOSITES=");
}
/../ {
printf("%s/%s/%s,",$4,$1,$2);
}' test.txt
More detailed explanation of the script:
-F '[],[]' - use commas or square brackets as field separators
-v RS=" " - use just the space as a record separator
'BEGIN{printf("COMPOSITES=")} - starts your line
/../ - run the following code on any line that has at least two characters. This avoids the empty field at the end of a line terminating with a space.
printf("%s/%s/%s,",$4,$1,$2); - print the elements using a printf() format string that matches the output you specified.
As concise as this is, the format string does leave a trailing comma at the end of the line. If this is a problem, it can be avoided with a bit of extra code.
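The "bit of extra code" could look something like the following sketch, which prints the comma before every item except the first instead of after each one. composites_line is a hypothetical wrapper name; the awk program is otherwise the one above.

```shell
#!/bin/bash
# Sketch only: composites_line is a hypothetical helper name.
# Usage: composites_line test.txt > output.txt
composites_line() {
    awk -F '[],[]' -v RS=' ' '
        BEGIN { printf "COMPOSITES=" }
        # sep is empty for the first record and "," afterwards, so the comma
        # goes between items rather than after each one.
        /../  { printf "%s%s/%s/%s", sep, $4, $1, $2; sep = "," }
        END   { print "" }
    ' "$1"
}
```

The END block also terminates the line with a newline, which the one-liner above leaves out.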
You could also do this in sed, if you like writing code in line noise.
sed -e 's:\([^[]*\).\([^]]*\).,\([^,]*\), :\3/\1/\2,:g;s/^/COMPOSITES=/;s/,$//' test.txt
Finally, if you want to avoid external tools like sed and awk, you can do this in bash alone:
a=($(<test.txt))
echo -n "COMPOSITES="
for i in "${a[@]}"; do
i="${i%,}"
t="${i%]*}"
printf "%s/%s/%s," "${i#*,}" "${i%[*}" "${t#*[}"
done
echo ""
This slurps the contents of test.txt into an array, which means your input data must be separated by whitespace, per your example. It then adds the prefix, then steps through the array, using Parameter Expansion to massage the data into the fields you need. The last line (echo "") is helpful for testing; you may want to eliminate it in practice.

parse output in bash

My file looks like
//
[297]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,(((23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0468499,4:0.0855423):0.0451632,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.123648,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,(47:0.0275497,39:0.0275497):0.0125652):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[2271]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,(((23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0468499,4:0.0855423):0.0451632,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713)
;
[687]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,((4:0.128716,(23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0900232):0.0019898,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[186]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,((4:0.128716,(23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0900232):0.0019898,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0665766):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.0339623,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.00916857):0.0793167):0.0576977):0.0378275):0.552713)
;
So after the first line, every line starts with a number in brackets. I need to grep the number in brackets and output it into a new file (without the [ ]). How would that be done?
grep -Po '(?<=\[)\d+(?=\])' file > new_file
-P for Perl regexs so it is possible to use:
\d for a digit
positive lookbehind and positive lookahead ((?<=\[) and (?=\]))
-o for only matching
Another possibility if your grep doesn't support the -P option but awk is available could be this:
awk -F '[][]' '{ if ($2 != "") print $2 }' file > new_file
-F tells awk to accept both ] and [ as a field delimiter, $2 then contains the number you want and is printed.
In three steps using simple commands:
grep -v "//" inputfile | cut -d"[" -f2 | cut -d"]" -f1
With sed you can remove everything outside the []:
grep -v "//" inputfile | sed 's/.*\[\(.*\)].*/\1/'
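If you would rather stay inside bash, the same extraction can be sketched with a regex match per line. This assumes the number is always at the very start of the line, as in the sample; bracket_numbers is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: bracket_numbers is a hypothetical helper name.
# Prints the digits of a leading "[number]" for each input line that has one;
# lines such as "//" or ";" are skipped.
bracket_numbers() {
    local line
    local re='^\[([0-9]+)\]'
    # The "|| [[ -n $line ]]" part also handles a last line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}
```

Usage would be `bracket_numbers < file > new_file`. For very large inputs the grep or awk versions are likely to be faster, since a bash read loop is slow.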

Resources