Using grep to pull a series of random numbers from a known line - bash

I have a simple scalar file producing strings like...
bpred_2lev.ras_rate.PP 0.9413 # RAS prediction rate (i.e., RAS hits/used RAS)
Once I use grep to find this line in the output.txt, is there a way I can directly grab the "0.9413" portion? I am attempting to make a CSV file and just need whatever value is generated.
Thanks in advance.

There are several ways to combine finding and extracting into a single command:
awk (POSIX-compliant)
awk '$1 == "bpred_2lev.ras_rate.PP" { print $2 }' file
sed (GNU sed or BSD/OSX sed)
sed -En 's/^bpred_2lev\.ras_rate\.PP +([^ ]+).*$/\1/p' file
GNU grep
grep -Po '^bpred_2lev\.ras_rate\.PP +\K[^ ]+' file

You can use awk like this:
grep <your_search_criteria> output.txt | awk '{ print $2 }'
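Since the goal is a CSV file, the extraction can be wrapped in a small helper. This is only a sketch: the function name get_stat and the file name results.csv are made up for illustration, and it assumes the key is the first whitespace-separated field and the value the second, as in the sample line.

```shell
# get_stat KEY FILE (hypothetical helper, not part of any tool):
# print the second field of the first line whose first field equals KEY.
get_stat() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}
```

With that in place, something like printf '%s,%s\n' bpred_2lev.ras_rate.PP "$(get_stat bpred_2lev.ras_rate.PP output.txt)" >> results.csv would append one key,value row per run.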

Related

Extract field from xml file

xml file:
<head>
<head2>
<dict type="abc" file="/path/to/file1"></dict>
<dict type="xyz" file="/path/to/file2"></dict>
</head2>
</head>
I need to extract the list of files from this. So the output would be
/path/to/file1
/path/to/file2
So far, I've managed the following:
grep "<dict*file=" /path/to/xml.file | awk '{print $3}' | awk -F= '{print $NF}'
Quick and dirty, based on your sample rather than on everything XML allows:
# sed, a bit safer
sed -e '/<head>/,/<\/head>/!d' -e '/.*[[:blank:]]file="\([^"]*\)".*/!d' -e 's//\1/' YourFile
# sed, brute force
sed -n 's/.*[[:blank:]]file="\([^"]*\)".*/\1/p' YourFile
# awk, quick and unsafe, based on your sample
awk -F 'file="|">' '/<head>/{h=1} /<\/head>/{h=0} h && /[[:blank:]]file/ { print $2 }' YourFile
That said, I don't recommend this kind of extraction on XML unless you really know the format and content of your source and no more appropriate tool is available. Extra fields, escaped quotes, and string content that looks like a tag are common causes of failures and unexpected results.
Now, to adapt your own script:
#grep "<dict*file=" /path/to/xml.file | awk '{print $3}' | awk -F= '{print $NF}'
awk '! /<dict.*file=/ {next} {$0=$3;FS="\"";$0=$0;print $2;FS=OFS}' YourFile
There is no need for grep in front of awk; use a leading pattern filter such as /<dict.*file=/ instead.
The second awk, which only exists to use a different separator (FS), can be folded into the same script by changing FS. Because a new FS only takes effect at the next evaluation (the next line by default), you can force the current record to be re-split with $0=$0.
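The re-splitting behaviour described here is easy to check in isolation. The sketch below uses a made-up function name, resplit_demo, and a made-up input line; it prints the first field three times: before the FS change, after it, and after $0=$0.

```shell
# resplit_demo (hypothetical name): read lines on stdin and show when a
# change to FS actually affects the fields of the current record.
resplit_demo() {
    awk '{
        print $1      # split with the default FS (whitespace)
        FS = ":"      # the current record has already been split
        print $1      # so this field is unchanged
        $0 = $0       # force a re-split using the new FS
        print $1
    }'
}
```

For the input line a:b c:d this prints a:b, then a:b again, then a.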
Use an xmllint solution with --xpath and the expression //head/head2/dict/@file:
xmllint --xpath "//head/head2/dict/@file" input-xml | awk 'BEGIN{FS="file="}{printf "%s\n%s\n", gensub(/"/,"","g",$2), gensub(/"/,"","g",$3)}'
/path/to/file1
/path/to/file2
Unfortunately I couldn't provide a pure xmllint solution. I expected that
xmllint --xpath "string(//head/head2/dict/@file)" input-xml
would return the file attributes from both nodes, but it returned only the first instance.
So I coupled it with GNU Awk to extract the required values. Running
xmllint --xpath "//head/head2/dict/@file" input-xml
returns the values as
file="/path/to/file1" file="/path/to/file2"
On that output, setting the field delimiter to file= and removing the double quotes with the gensub() function solved the requirement.
Also PE [perl everywhere :) ] solution:
perl -MXML::LibXML -E 'say $_->to_literal for XML::LibXML->load_xml(location=>q{file.xml})->findnodes(q{/head/head2/dict/@file})'
it prints
/path/to/file1
/path/to/file2
For the above you need to have installed the XML::LibXML module.
With xmlstarlet it would be:
xmlstarlet sel -t -v "//head/head2/dict/@file" -nl input.xml
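This command:
awk -F'[=" ">]' '{print $12}' file
puts the path in field 12 only when each <dict> line is indented by four spaces, because every leading space counts as a separator (with the unindented sample above it would be field 8). It also prints an empty line for every other line of the file. On the <dict> lines it produces: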
/path/to/file1
/path/to/file2
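The awk command above selects the path by field number, which shifts with the indentation of the line. A sketch that does not depend on indentation is to split on double quotes instead, so that attribute values land in the even-numbered fields. The function name list_dict_files is made up, and the code assumes each <dict> element sits on one line with double-quoted attributes; it is still not a real XML parser.

```shell
# list_dict_files FILE (hypothetical helper): print the value of the
# file="..." attribute of every <dict ...> line.
list_dict_files() {
    awk -F'"' '/<dict[ \t]/ {
        # Odd fields hold the text between quoted values, even fields the values.
        for (i = 1; i < NF; i += 2)
            if ($i ~ /(^|[ \t])file=$/) print $(i + 1)
    }' "$1"
}
```

Calling list_dict_files /path/to/xml.file prints one path per line, whether or not the <dict> lines are indented.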

Need to capture particular output

This is the exact output I got from a program:
#Meaningless output
[TABL]
BSSID
4c:e6:78:e3:4e:58
a0:8b:16:e3:3a:42
ADMAC=a1:3c:24:e5:2e:22
ADMAC=.......
#Meaningless output
I just want to capture the BSSID column along with its MAC addresses ONLY, and not the ADMAC values or any other values. How can I do that using bash (or grep or sed or awk, anything)? Thanks.
awk to the rescue!
$ awk '/BSSID/{p=1} p&&!NF{exit} p' file
BSSID
4c:e6:78:e3:4e:58
a0:8b:16:e3:3a:42
This prints from the pattern match up to the first empty line.
Or, simpler, but you get the empty line at the end:
$ awk '/BSSID/,/^$/' file
BSSID
4c:e6:78:e3:4e:58
a0:8b:16:e3:3a:42
<- empty line here ...
To filter out the last empty line, you can add a condition:
$ awk '/BSSID/,/^$/{if(NF) print}' file
Note that the first alternative is the most flexible and the preferred one.
Try this. It worked on Mac using your example.
cat output.txt | awk '/BSSID/,/ADMAC/'| grep -v ADMAC
Tell grep to show the two lines after the match and stop after 1 match.
grep -m1 -A2 "^BSSID$" output.txt
sed to the rescue!
Since the requirement is to include only the MAC addresses, which must include a colon, period, or dash, the following would be reasonable, given the example input:
sed -n '/^BSSID/,/^ *$/ {/[:.-]/p;}' output.txt
If you have awk, try:
awk '/BSSID/,/ADMAC/' output.txt
Note that this range also prints the first ADMAC line; pipe the result through grep -v ADMAC to drop it.
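The range-based answers above depend on an empty line or an ADMAC line ending the block. A stricter sketch is to take the range and then keep only the header and lines that look exactly like a MAC address. The function name bssid_block is made up, and the code assumes MAC addresses are written as six colon-separated hex pairs, as in the sample.

```shell
# bssid_block FILE (hypothetical helper): print the BSSID header and the
# MAC addresses listed under it, and nothing else.
bssid_block() {
    sed -n '/^BSSID$/,/^ADMAC/p' "$1" |
        grep -E '^(BSSID|([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})$'
}
```

Because the grep pattern is anchored at both ends, lines such as ADMAC=a1:3c:24:e5:2e:22 are dropped even though they contain a MAC address.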

Extract string between two patterns (inclusive) while conserving the format

I have a file in the following format
cat test.txt
id1,PPLLTOMaaaaaaaaaaaJACK
id2,PPLRTOMbbbbbbbbbbbJACK
id3,PPLRTOMcccccccccccJACK
I am trying to identify and print the string between TOM and JACK including these two strings, while maintaining the first column FS=,
Desired output:
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
So far I have tried gsub:
awk -F"," 'gsub(/.*TOM|JACK.*/,"",$2) && !_[$0]++' test.txt > out.txt
and have the following output
id1 aaaaaaaaaaa
id2 bbbbbbbbbbb
id3 ccccccccccc
As you can see I am getting close but not able to include TOM and JACK patterns in my output. Plus I am also losing the original FS. What am I doing wrong?
Any help will be appreciated.
You are changing a field ($2) which causes awk to reconstruct the record using the value of OFS as the field separator and so in this case changing the commas to spaces.
Never use _ as a variable name. A name with no meaning is only slightly better than a name with the wrong meaning, so pick one that means something, which in this case is seen. That said, I don't know what you are trying to do with it in this context.
gsub() and sub() do not support capture groups so you either need to use match()+substr():
$ awk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/){$2=substr($2,RSTART,RLENGTH)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
or use GNU awk for the 3rd arg to match()
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=a[0]} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
or use GNU awk for gensub():
$ gawk 'BEGIN{FS=OFS=","} {$2=gensub(/.*(TOM.*JACK).*/,"\\1","",$2)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
The main difference between the match() and gensub() solutions is how they would behave if TOM appeared twice on the line:
$ cat file
id1,PPLLfooTOMbarTOMaaaaaaaaaaaJACK
id2,PPLRTOMbbbbbbbbbbbJACKfooJACKbar
id3,PPLRfooTOMbarTOMcccccccccccJACKfooJACKbar
$
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=a[0]} 1' file
id1,TOMbarTOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACKfooJACK
id3,TOMbarTOMcccccccccccJACKfooJACK
$
$ gawk 'BEGIN{FS=OFS=","} {$2=gensub(/.*(TOM.*JACK).*/,"\\1","",$2)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACKfooJACK
id3,TOMcccccccccccJACKfooJACK
and just to show one way of stopping at the first instead of the last JACK on the line:
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=gensub(/(JACK).*/,"\\1","",a[0])} 1' file
id1,TOMbarTOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMbarTOMcccccccccccJACK
Use capture groups to save the parts of the line you want to keep. Here's how to do it with sed
sed 's/^\([^,]*,\).*\(TOM.*JACK\).*/\1\2/' <test.txt > out.txt
Do you mean to do the following?
$ cat test.txt
id1,PPLLTOMaaaaaaaaaaaJACKABCD
id2,PPLRTOMbbbbbbbbbbbJACKDFCC
id3,PPLRTOMcccccccccccJACKSDER
$ cat test.txt | sed -e 's/,.*TOM/,TOM/g' | sed -e 's/JACK.*/JACK/g'
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
$
This should work as long as the TOM and JACK do not repeat themselves.
sed 's/\(.*,\).*\(TOM.*JACK\).*/\1\2/' <oldfile >newfile
Output:
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
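The answers above that stop at the first JACK rely on GNU awk. A portable sketch of the same idea uses index() and substr(), which every POSIX awk provides: find the first TOM, then the first JACK after it. The function name trim_to_tom_jack is made up, and TOM and JACK are treated as fixed strings rather than regular expressions.

```shell
# trim_to_tom_jack FILE (hypothetical helper): in the second comma-separated
# field, keep only the text from the first TOM to the first JACK after it.
trim_to_tom_jack() {
    awk 'BEGIN { FS = OFS = "," }
    {
        tom = index($2, "TOM")
        if (tom) {
            rest = substr($2, tom)            # text starting at the first TOM
            jack = index(rest, "JACK")
            if (jack) $2 = substr(rest, 1, jack + 3)   # keep through the end of JACK
        }
        print
    }' "$1"
}
```

Lines without both markers are printed unchanged, so the first column and the comma separator survive in every case.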

Reading numbers from a text line in bash shell

I'm trying to write a bash shell script, that opens a certain file CATALOG.dat, containing the following lines, made of both characters and numbers:
event_0133_pk.gz
event_0291_pk.gz
event_0298_pk.gz
event_0356_pk.gz
event_0501_pk.gz
What I wanna do is print the numbers (only the numbers) inside a new file NUMBERS.dat, using something like > ./NUMBERS.dat, to get:
0133
0291
0298
0356
0501
My problem is: how do I extract the numbers from the text lines? Is there something to make the script read just the number as a variable, like event_0%d_pk.gz in C/C++?
A grep solution:
grep -oP '[0-9]+' CATALOG.dat >NUMBERS.dat
A sed solution:
sed 's/[^0-9]//g' CATALOG.dat >NUMBERS.dat
And an awk solution:
awk -F"[^0-9]+" '{print $2}' CATALOG.dat >NUMBERS.dat
There are many ways that you can achieve your result. One way would be to use awk:
awk -F_ '{print $2}' CATALOG.dat > NUMBERS.dat
This sets the field separator to an underscore, then prints the second field which contains the numbers.
Awk
awk 'gsub(/[^[:digit:]]/,"")' infile
Bash
while IFS= read -r line; do echo "${line//[!0-9]/}"; done < infile
tr
tr -cd '[:digit:]\n' <infile
You can use grep command to extract the number part.
grep -oP '(?<=_)\d+(?=_)' CATALOG.dat
gives output as
0133
0291
0298
0356
0501
Or, more simply:
grep -oP '\d+' CATALOG.dat
You don't need perl mode in grep for this. BREs can do this (\+ is a GNU extension; [[:digit:]][[:digit:]]* is the portable spelling).
grep -o '[[:digit:]]\+' CATALOG.dat > NUMBERS.dat
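For the part of the question about reading the number into a variable, much like a scanf template, plain parameter expansion is one option. The sketch below strips the fixed prefix and suffix and keeps what is left if it is all digits. The function name extract_event_numbers is made up, and the event_ prefix and _pk.gz suffix are taken from the sample lines.

```shell
# extract_event_numbers FILE (hypothetical helper): for each line of the
# form event_<digits>_pk.gz, print just the digits.
extract_event_numbers() {
    while IFS= read -r line; do
        num=${line#event_}      # drop the fixed prefix
        num=${num%_pk.gz}       # drop the fixed suffix
        case $num in
            ''|*[!0-9]*) continue ;;   # skip lines that do not fit the template
        esac
        printf '%s\n' "$num"
    done < "$1"
}
```

Running extract_event_numbers CATALOG.dat > NUMBERS.dat keeps the leading zeros, since the value is handled as a string. If the value is later used in bash arithmetic, a leading zero would make it octal, so something like $((10#$num)) may be needed there.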

awk sed filter values in all lines greater/smaller than

is there a way to construct a filter in awk (or something similar) that for a given file, say:
0.99,0.98,1.1,0.85,0.92
0.76,1.4,0.99,0.99,0.82
1.0,1.45,0.78,0.91,0.95
would replace any value in a line that is greater than 1.0 with 1.0?
Here is something you can do with awk
awk -F, '{for(i=1;i<=NF;i++) if($i>1) {$i="replacement"}}1' OFS=, file
Test:
$ cat file
0.99,0.98,1.1,0.85,0.92
0.76,1.4,0.99,0.99,0.82
1.0,1.45,0.78,0.91,0.95
$ awk -F, '{for(i=1;i<=NF;i++) if($i>1) {$i="replacement"}}1' OFS=, file
0.99,0.98,replacement,0.85,0.92
0.76,replacement,0.99,0.99,0.82
1.0,replacement,0.78,0.91,0.95
Here’s a sed solution:
sed -e 's/[1-9][0-9]*\.[0-9]*/1.0/g' in-file > out-file
The pattern [1-9][0-9]*\.[0-9]* matches any sequence that begins with a digit from 1 to 9, followed by zero or more digits, a decimal point, and zero or more further digits. It also rewrites 1.0 itself, harmlessly, to 1.0. If you want an in-place replacement, you can use the -i option (GNU sed syntax; BSD/macOS sed needs -i ''):
sed -i -e 's/[1-9][0-9]*\.[0-9]*/1.0/g' in-file
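The awk answer above inserts the placeholder string replacement, and the sed pattern only recognises numbers written with a decimal point. A sketch that writes 1.0 directly and compares numerically follows; the function name clamp_to_one is made up, and the input is assumed to be plain comma-separated numbers.

```shell
# clamp_to_one FILE (hypothetical helper): replace every comma-separated
# value greater than 1 with 1.0 and leave the others untouched.
clamp_to_one() {
    awk 'BEGIN { FS = OFS = "," }
    {
        for (i = 1; i <= NF; i++)
            if ($i + 0 > 1) $i = "1.0"   # adding 0 forces a numeric comparison
        print
    }' "$1"
}
```

Because the comparison is numeric, a value such as 2 (no decimal point) is clamped as well, and 1.0 itself is left as written.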
