parse output in bash - bash

My file looks like
//
[297]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,(((23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0468499,4:0.0855423):0.0451632,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.123648,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,(47:0.0275497,39:0.0275497):0.0125652):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[2271]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,(((23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0468499,4:0.0855423):0.0451632,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713)
;
[687]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,((4:0.128716,(23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0900232):0.0019898,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[186]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,((4:0.128716,(23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0900232):0.0019898,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0665766):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.0339623,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.00916857):0.0793167):0.0576977):0.0378275):0.552713)
;
So after the first line, every line starts with a number in brackets. I need to grep the number in brackets and output it into a new file (without the brackets). How would that be done?

grep -Po '(?<=\[)\d+(?=\])' file > new_file
-P enables Perl-compatible regular expressions, which makes it possible to use:
\d for a digit
a positive lookbehind and a positive lookahead ((?<=\[) and (?=\]))
-o prints only the matching part of the line

Another possibility if your grep doesn't support the -P option but awk is available could be this:
awk -F '[][]' '{ if ($2 != "") print $2 }' file > new_file
-F tells awk to accept both ] and [ as a field delimiter, $2 then contains the number you want and is printed.

In three steps using simple commands:
grep -v "//" inputfile | cut -d"[" -f2 | cut -d"]" -f1
With sed you can remove everything outside the []:
grep -v "//" inputfile | sed 's/.*\[\(.*\)].*/\1/'
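Note that the lone ; lines in the sample contain no brackets, so both cut (without -s) and a non-matching sed substitution pass them through unchanged. A minimal sketch that prints only lines actually starting with a bracketed number, assuming a POSIX sed; the function name extract_ids is made up for illustration:

```shell
# extract_ids FILE: print the number inside the leading [...] of each line.
extract_ids() {
    # -n suppresses automatic printing; the trailing p prints a line only
    # when the substitution matched, so the "//" and ";" lines are skipped.
    sed -n 's/^\[\([0-9][0-9]*\)].*/\1/p' "$1"
}
```

It could be used as extract_ids file > new_file.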

Related

Count how many words in file test.txt start with “tol”?

I'm new to Linux shell. I know there are tools to do this thing, such as awk. But I'm wondering if I could do it using grep or wc or other commands? awk seems intimidating to me. Thanks.
I tried grep and wc, like this:
grep tol test.txt | wc -w
But grep will give me the whole line.
If I tried the following:
grep '^tol$*' test.txt | wc -w
That only counts the words on lines that begin with tol.
How can I grep the words starting with tol?
Something like this:
grep -o '\<tol[[:alpha:]]*\>' test.txt | wc -w
\< - the beginning of the word,
\> - the end of the word.
[[:alpha:]] - to avoid match of combinations like tol123 (You said you need only words).
-o - to show only matches, not the entire line.
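As a rough sketch of the same idea wrapped in a function, assuming a grep that supports -o and the \< and \> word anchors (GNU grep does); the name count_tol_words is only illustrative:

```shell
# count_tol_words FILE: count the words in FILE that start with "tol".
count_tol_words() {
    # grep -o prints each match on its own line, so counting lines counts
    # matches; tr strips the padding some wc implementations add.
    grep -o '\<tol[[:alpha:]]*\>' "$1" | wc -l | tr -d ' '
}
```

Because every match is printed on a line of its own, wc -l and wc -w give the same count here.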
You can do the same fairly simply with awk, e.g.
awk '{for(i=1;i<=NF;i++) $i~/^tol/ && n++} END {print n}'
Example
$ echo -e "tolerance topaz tolstoy\nbats toluene toledo" |
> awk '{for(i=1;i<=NF;i++) $i~/^tol/ && n++} END {print n}'
4
Another option is to translate all whitespace characters into linefeeds so that each word starts on a new line, then grep can count them itself:
echo -e "tolerance topaz\ttolstoy\nbats toluene toledo" | tr '[:space:]' '\n' | grep -c "^tol"
4
Or, if using a file called words.txt:
tr '[:space:]' '\n' < words.txt | grep -c "^tol"

Print line based on 2nd field value, without using a loop

I try to retrieve a line from a file without using a loop.
myFile.txt
val1;a;b;c
val2;b;d;e
val3;c;r;f
I would like to get the line where the second column is b.
If I do grep "b" myFile.txt then both the first and second lines will be output.
If I do cat myFile.txt | cut -d ';' -f2 | grep "b" then the output will just be b, whereas I'd like to get the full line val2;b;d;e.
Is there a way of reaching the desired result without using a loop like the one below? My file is huge, so it wouldn't be nice to loop through it again and again.
while read line; do
if [ `echo $line | cut -d ';' -f2` = "b" ]; then
echo $line
fi
done < myFile.txt
Given your input file, the one-liner below should work:
awk -F";" '$2 == "b" {print}' myFile.txt
Explanation:
awk -F";" ##Field Separator as ";"
'$2 == "b" ##Searches for "b" in the second column($2)
{print}' ##prints the searched line
Using:
grep:
grep '^[^;]*;b;' myFile.txt
sed:
sed '/^[^;]*;b;/!d' myFile.txt
Output is the same for both:
val2;b;d;e
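If the value to look for is held in a shell variable, one possible sketch is to pass it to awk with -v instead of hard-coding "b". The function name lines_with_field2 is hypothetical, and awk interprets backslash escapes in values passed with -v:

```shell
# lines_with_field2 VALUE FILE: print lines whose 2nd ;-separated field is VALUE.
lines_with_field2() {
    # Appending "" to both sides forces a string comparison, so values
    # such as 1 and 1.0 are not treated as equal numbers.
    awk -F';' -v want="$1" '($2 "") == (want "")' "$2"
}
```

For the sample file, lines_with_field2 b myFile.txt prints val2;b;d;e in a single pass over the file.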

How can I print the first matched line using sed or grep?

I have a config file where each line is in a format say UniqueOption = SomeValue:
$ cat somefile
option1sub1 = yes
option1sub2 = 1234
...
option1subn = xxxx
option2 = 2345
option3 = no
...
I want to deal with each value of "option1" in a loop, but sed and grep give me all of the option1 lines at once.
How could I achieve that using sed or grep, getting a single option1 line at a time?
Pipe the output of grep to a while loop:
grep 'option1' somefile | while read line
do
echo "single option is in var $line"
done
Solution 1: The following awk command prints the value (the last field) of every line that starts with option1.
awk -F" = " '/^option1/{print $NF}' Input_file
Solution 2: The command above prints all values for option1. If you need only the very first value, use the following.
awk -F" = " '/^option1/{print $NF;exit}' Input_file
The following will parse out all sub-options for option1 in the file file.conf and save them in a bash array. The options are then easily accessed from that array.
#!/bin/bash
while IFS= read -r data; do
opt1+=( "$data" )
done < <( awk -F ' *= *' '$1 ~ /^option1/ { print $2 }' file.conf )
printf 'Option 1, sub-option 1 is "%s"\n' "${opt1[0]}"
Output:
Option 1, sub-option 1 is "yes"
The awk script will return everything after the = (and any spaces), which allows you to store data that contains multiple words. Only the lines starting with option1 in the configuration file are processed.
This could be adapted to parse the whole configuration file into a single structure, possibly using an associative array in a sufficiently recent version of bash.
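The associative-array idea could look roughly like the sketch below. It assumes bash 4.0 or later and a file made of key = value lines; the names conf, conf_re and load_conf are invented for illustration and are not part of the answer above:

```shell
#!/bin/bash
# Requires bash 4.0+ (associative arrays); this is not POSIX sh.
declare -A conf

# key, optional spaces, "=", optional spaces, value (rest of the line)
conf_re='^([^=[:space:]]+)[[:space:]]*=[[:space:]]*(.*)$'

# load_conf FILE: store every "key = value" line of FILE in the conf array.
load_conf() {
    local line
    while IFS= read -r line; do
        # Lines that do not look like "key = value" are silently skipped.
        if [[ $line =~ $conf_re ]]; then
            conf["${BASH_REMATCH[1]}"]=${BASH_REMATCH[2]}
        fi
    done < "$1"
}
```

After load_conf somefile, a single value is available as "${conf[option1sub1]}", and the option1 entries can be visited one at a time by looping over "${!conf[@]}" and testing each key against option1*.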
There are already some good answers, but since you asked for something with grep, you can use one of the following.
For all values
grep option1 m | cut -d "=" -f2 | awk '{$1=$1};1'
For first value
grep option1 m | cut -d "=" -f2 | awk '{$1=$1};1' | head -1
Here, cut extracts the second field using = as the delimiter, awk trims the surrounding spaces from the output, and head prints only the first occurrence.
With sed
sed '/^option1.* = /!d;s///' somefile
With GNU grep 2.20 (built with PCRE support):
grep -oP '^option1.* = \K.*' somefile
If you want to get only the first match
sed '/^option1.* = /!d;s///;q' somefile
grep -m1 -oP '^option1.* = \K.*' somefile

I want to re-arrange a file in an order in shell

I have a file test.txt like the one below, with spaces between the records:
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
Now I want to rearrange it as below and write it to an output.txt file:
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service/2.2
I've tried:
final_str=''
COMPOSITES=''
# Re-arranging the composites and preparing the composite property file
while read line; do
partition_val="$(echo $line | cut -d ',' -f 2)"
composite_temp1_val="$(echo $line | cut -d ',' -f 1)"
composite_val="$(echo $composite_temp1_val | cut -d '[' -f 1)"
version_temp1_val="$(echo $composite_temp1_val | cut -d '[' -f 2)"
version_val="$(echo $version_temp1_val | cut -d ']' -f 1)"
final_str="$partition_val/$composite_val/$version_val,"
COMPOSITES=$COMPOSITES$final_str
done <./temp/test.txt
We start with the file:
$ cat test.txt
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
We can rearrange that file as follows:
$ awk -F, -v RS=" " 'BEGIN{printf "COMPOSITES=";} {gsub(/[[]/, "/"); gsub(/[]]/, ""); if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1;}' test.txt
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service2/2.2
The same command split over multiple lines is:
awk -F, -v RS=" " '
BEGIN{
printf "COMPOSITES=";
}
{
gsub(/[[]/, "/")
gsub(/[]]/, "")
if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1
}
' test.txt
Here's what I came up with.
awk -F '[],[]' -v RS=" " 'BEGIN{printf("COMPOSITES=")}/../{printf("%s/%s/%s,",$4,$1,$2);}' test.txt
Broken out for easier reading:
awk -F '[],[]' -v RS=" " '
BEGIN {
printf("COMPOSITES=");
}
/../ {
printf("%s/%s/%s,",$4,$1,$2);
}' test.txt
More detailed explanation of the script:
-F '[],[]' - use commas or square brackets as field separators
-v RS=" " - use just the space as a record separator
'BEGIN{printf("COMPOSITES=")} - starts your line
/../ - run the following code on any line that has at least two characters. This avoids the empty field at the end of a line terminating with a space.
printf("%s/%s/%s,",$4,$1,$2); - print the elements using a printf() format string that matches the output you specified.
As concise as this is, the format string does leave a trailing comma at the end of the line. If this is a problem, it can be avoided with a bit of extra code.
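One way that extra code might look, as a sketch rather than the only option: print the separator before every record except the first, so no comma is left over at the end. The function name rearrange and the awk variable sep are made up for this example:

```shell
# rearrange FILE: print COMPOSITES=partition/service/version,... without a trailing comma.
rearrange() {
    awk -F '[],[]' -v RS=' ' '
        BEGIN { printf("COMPOSITES=") }
        # sep is empty for the first record and "," for all later ones
        /../  { printf("%s%s/%s/%s", sep, $4, $1, $2); sep = "," }
        END   { print "" }   # finish the line with a newline
    ' "$1"
}
```

It could be called as rearrange test.txt > output.txt.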
You could also do this in sed, if you like writing code in line noise.
sed -e 's:\([^[]*\).\([^]]*\).,\([^,]*\), :\3/\1/\2,:g;s/^/COMPOSITES=/;s/,$//' test.txt
Finally, if you want to avoid external tools like sed and awk, you can do this in bash alone:
a=($(<test.txt))
echo -n "COMPOSITES="
for i in "${a[@]}"; do
i="${i%,}"
t="${i%]*}"
printf "%s/%s/%s," "${i#*,}" "${i%[*}" "${t#*[}"
done
echo ""
This slurps the contents of test.txt into an array, which means your input data must be separated by whitespace, per your example. It then adds the prefix, then steps through the array, using Parameter Expansion to massage the data into the fields you need. The last line (echo "") is helpful for testing; you may want to eliminate it in practice.
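To make the individual expansions easier to follow, here is the same massaging applied to one record, step by step. This is only an illustrative sketch; the function name to_composite and its variable names are hypothetical:

```shell
# to_composite RECORD: turn "service[1.1],parttion," into "parttion/service/1.1".
# The body runs in a subshell ( ... ) so its variables do not leak out.
to_composite() (
    rec=${1%,}               # drop the trailing comma      -> service[1.1],parttion
    partition=${rec#*,}      # remove up to the first comma -> parttion
    name=${rec%%\[*}         # remove from the first "["    -> service
    version=${rec%\]*}       # remove from the last "]"     -> service[1.1
    version=${version#*\[}   # remove up to the first "["   -> 1.1
    printf '%s/%s/%s\n' "$partition" "$name" "$version"
)
```

Here # and ## remove a matching prefix, while % and %% remove a matching suffix; the doubled forms remove the longest match.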

Using grep to get the line number of first occurrence of a string in a file

I am using a bash script for testing purposes. During my testing I have to find the line number of the first occurrence of a string in a file. I have tried both "awk" and "grep", but neither of them returns the value.
Awk example
#!/bin/bash
....
VAR=searchstring
...
cpLines=$(awk '/$VAR/{print NR}' $MYDIR/Configuration.xml
this does not expand $VAR. If I use the value of VAR it works, but I want to use VAR
Grep example
#!/bin/bash
...
VAR=searchstring
...
cpLines=grep -n -m 1 $VAR $MYDIR/Configuration.xml |cut -f1 -d:
This gives the error: line 20: -n: command not found
grep -n -m 1 SEARCH_TERM FILE_PATH |sed 's/\([0-9]*\).*/\1/'
grep switches
-n = include line number
-m 1 = match one
sed options (stream editor):
's/X/Y/' - replace X with Y
\([0-9]*\) - an escaped (capturing) group that matches zero or more digits; the text it matches is available as \1 in the replacement string Y
\([0-9]*\).* - the trailing .* matches the rest of the line (any character, zero or more times), so everything after the line number is dropped
You need $() (command substitution) to capture the output of grep in a variable:
cpLines=$(grep -n -m 1 "$VAR" "$MYDIR/Configuration.xml" | cut -f1 -d:)
Try something like:
awk -v search="$var" '$0~search{print NR; exit}' inputFile
In awk, the text between / / is taken as a literal regular expression, so a variable name placed there is not expanded. You need to use the match operator (~) instead. Here we test each input line against the variable; if it matches, we print the line number stored in NR and exit.
-v allows you to create an awk variable (search) in above example. You then assign it your bash variable ($var).
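If the search term should be treated as a fixed string rather than a regular expression, a possible variant (a sketch, with the hypothetical function name first_match_line) uses awk's index() function instead of ~:

```shell
# first_match_line STRING FILE: print the number of the first line containing STRING.
first_match_line() {
    # index() returns the position of search within the line, or 0 if it is
    # absent, so regex metacharacters in STRING have no special meaning.
    # Backslash escapes in the value passed with -v are still interpreted by awk.
    awk -v search="$1" 'index($0, search) { print NR; exit }' "$2"
}
```

The result could then be captured with cpLines=$(first_match_line "$VAR" "$MYDIR/Configuration.xml").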
grep -n -m 1 SEARCH_TERM FILE_PATH | grep -Po '^[0-9]+'
explanation:
-Po = -P -o
-P use perl regex
-o only print matched string (not the whole line)
Try piping:
grep -P 'SEARCH TERM' fileName.txt | wc -l
Note that this counts the matching lines; it does not report the line number of the first match.
