This is my sample data
Apple 13
Apple 37
Apple 341
Apple 27B
Apple 99
Banana 00
Banana 988
Banana 507
Banana 11
Banana 11A
I would like to get the output like this
Apple 13
Apple 37
Apple 341
Banana 00
Banana 988
The problem is that I can only use grep with the -A switch once per command:
root#Ubuntu:/tmp# grep -A 2 'e 1' data.txt
Apple 13
Apple 37
Apple 341
root#Ubuntu:/tmp#
Another grep -A 1
root#Ubuntu:/tmp# grep -A 1 'a 0' data.txt
Banana 00
Banana 988
root#Ubuntu:/tmp#
I've been trying to use egrep but I did not get the output that I wanted.
root#Ubuntu:/tmp# egrep 'e 1|a 0' data.txt
Apple 13
Banana 00
root#Ubuntu:/tmp#
I would like to get 2 more lines after Apple 13 and 1 more line after Banana 00.
Please advise
With GNU sed:
sed -n '/e 1/{N;N;p}; /a 0/{N;p}' file
Output:
Apple 13
Apple 37
Apple 341
Banana 00
Banana 988
See: man sed
I'd recommend awk or sed for this kind of problem.
Using awk:
$ awk ' /e 1/{i=1; t=3} /a 0/{i=1; t=2} i++<=t' file
Apple 13
Apple 37
Apple 341
Banana 00
Banana 988
i : iterator
t : threshold
/e 1/{i=1; t=3} : If the line contains e 1, set i=1 and t=3.
t=3 because 3 lines in total need to be printed (including the matched line).
/a 0/{i=1; t=2} : If the line contains a 0, set i=1 and t=2.
i++<=t : If this is true, print the line. i++ increments i after each check.
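A countdown variant of the same idea may be easier to follow. This is only a sketch: the temporary file and the variable names n and result are assumptions for illustration. n holds the number of lines still to print, and each match resets it. It also prints nothing before the first match, whereas i++<=t with both variables still unset is true for the very first input line.

```shell
# Sample data from the question, written to a temporary file.
data=$(mktemp)
printf '%s\n' 'Apple 13' 'Apple 37' 'Apple 341' 'Apple 27B' 'Apple 99' \
  'Banana 00' 'Banana 988' 'Banana 507' 'Banana 11' 'Banana 11A' > "$data"

# n counts down the lines still to print; a match resets the counter.
# The bare condition "n-- > 0" prints the line while the counter is positive.
result=$(awk '/e 1/ {n = 3} /a 0/ {n = 2} n-- > 0' "$data")
printf '%s\n' "$result"

rm -f "$data"
```

With the sample data this should print the three Apple lines followed by Banana 00 and Banana 988.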
I don't know of a way to do this with a single grep command. If you don't mind running two commands, use the following (with ; rather than &&, so that the second grep still runs when the first finds no match):
grep -A 2 "e 1" test.txt; grep -A 1 "a 0" test.txt
I want to convert the output of command:
dmidecode -s system-serial-number
which is a string looking like this:
VMware-56 4d ad 01 22 5a 73 c2-89 ce 3f d8 ba d6 e4 0c
to:
564dad01-225a-73c2-89ce-3fd8bad6e40c
I suspect I first need to extract all letters and numbers after the "VMware-" part at the start and then insert "-" at the known positions, after characters 8, 12, 16 and 20.
To try the first extraction I have tried:
$ echo `dmidecode -s system-serial-number | grep -oE '(VMware-)?[a0-Z9]'`
VMware-5 6 4 d a d 0 1 2 2 5 a 7 3 c 2 8 9 c e 3 f d 8 b a d 6 e 4 0 c
However this isn't going the right way.
EDIT:
This gets me to a single long string; however, it's not elegant:
$ echo `dmidecode -s system-serial-number | sed -s "s/VMware-//" | sed -s "s/-//" | sed -s "s/ //g"`
564dad01225a73c289ce3fd8bad6e40c
Like this :
dmidecode -s system-serial-number |
sed -E 's/VMware-//;
s/ +//g;
s/(.)/\1-/8;
s/(.)/\1-/13;
s/(.)/\1-/23'
You can use Bash substring extraction:
$ s="VMware-56 4d ad 01 22 5a 73 c2-89 ce 3f d8 ba d6 e4 0c"
$ s1=$(echo "${s:7}" | tr -d '[:space:]')
$ echo "${s1:0:8}-${s1:8:4}-${s1:12:9}-${s1:21}"
564dad01-225a-73c2-89ce-3fd8bad6e40c
Or, with built-ins only (i.e., no tr):
$ s1=${s:7}
$ s1="${s1// /}"
$ echo "${s1:0:8}-${s1:8:4}-${s1:12:9}-${s1:21}"
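If you would rather not depend on where the original hyphen sits, here is a sketch that first strips everything that is not a hex digit and then regroups the result as 8-4-4-4-12. The function name to_uuid is hypothetical, and the sketch assumes the serial always starts with "VMware-" and contains exactly 32 hex digits.

```shell
# to_uuid is a hypothetical helper name, not part of dmidecode or the shell.
to_uuid() {
  # Drop the "VMware-" prefix first: it contains the hex letters a and e.
  hex=$(printf '%s' "${1#VMware-}" | tr -cd '0-9a-fA-F')
  # Regroup the 32 remaining hex digits as 8-4-4-4-12.
  printf '%s\n' "$hex" |
    sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

uuid=$(to_uuid "VMware-56 4d ad 01 22 5a 73 c2-89 ce 3f d8 ba d6 e4 0c")
echo "$uuid"
```

In real use the argument would be "$(dmidecode -s system-serial-number)" instead of the literal string.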
$ cat grades.dat
santosh 65 65 65 65
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
santosh 99 99 99 99 99
Script:
#!/usr/bin/bash
filename="$1"
while read line
do
a=`grep -w "santosh" $1 | awk '{print$1}' |wc -l`
echo "total is count of the file is $a";
done <"$filename"
Output:
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
The real output should be
total is count of the file is 2
printed just once, right? Please let me know what I am missing in the above script.
Whilst others have shown you better ways to solve your problem, the answer to your question is in the following line:
a=`grep -w "santosh" $1 | awk '{print$1}' |wc -l`
Each pass through the while loop stores a whole line of the file in the variable "line", but that variable is never used. Instead, the loop always searches for "santosh", which appears twice, and because the same query runs once for each of the 5 lines in the file, you get 5 lines of identical output.
You could alter your current script like so, searching for the first field (the name) of each line instead:
a=$(grep -w "${line%% *}" "$filename" | awk '{print$1}' | wc -l)
The above is not meant to be a solution as others have pointed out, but it does solve your issue.
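If the goal really is a single count for one name, a loop is not needed at all. A minimal sketch, assuming the grades.dat contents from the question (written to a temporary file here so the example is self-contained; the variable names grades and count are mine):

```shell
# Recreate the sample grades file from the question.
grades=$(mktemp)
cat > "$grades" <<'EOF'
santosh 65 65 65 65
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
santosh 99 99 99 99 99
EOF

# grep -c counts matching lines itself, so no loop, awk, or wc is needed;
# -w restricts the match to the whole word "santosh".
count=$(grep -cw "santosh" "$grades")
echo "total count in the file is $count"

rm -f "$grades"
```

This runs grep once over the file and prints the message once.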
I am trying to parse an input file (my test file is 4 lines) and then query an online biological database. However my loop seems to stop after returning the first result.
#!/bin/bash
if [ "$1" = "" ]; then
echo "No input file to parse given. Give me a BLAST output file"
else
file=$1
#Extracts GI from each result and stores it on temp file.
rm -rf /home/chris/TEMP/tempfile.txt
awk -F '|' '{printf("%s\n",$2);}' "$file" >> /home/chris/TEMP/tempfile.txt
#gets the species from each gi.
input="/home/chris/TEMP/tempfile.txt"
while read -r i
do
echo GI:"$i"
/home/chris/EntrezDirect/edirect/esearch -db protein -query "$i" | /home/chris/EntrezDirect/edirect/efetch -format gpc | /home/chris/EntrezDirect/edirect/xtract -insd source organism | cut -f2
done < "$input"
rm -rf /home/chris/TEMP/tempfile.txt
fi
For example, my only output is
GI:751637161
Pseudomonas stutzeri group
whereas I should have 4 results. Any help appreciated and thanks in advance.
This is the format of the sample input:
TARA042SRF022_1 gi|751637161|ref|WP_041104882.1| 40.4 151 82 2 999 547 1 143 2.8e-21 110.9
TARA042SRF022_2 gi|1057355277|ref|WP_068715547.1| 62.7 263 96 1 915 133 80 342 7.1e-96 358.6
TARA042SRF022_3 gi|950462516|ref|WP_057369049.1| 38.3 47 29 0 184 44 152 198 5.1e+01 36.2
TARA042SRF022_4 gi|918428433|ref|WP_052479609.1| 37.5 48 29 1 525 668 192 238 6.1e+01 37.0
It would appear that read -r i is returning with a non-zero exit status on its second call, indicating that there is no more data to be read from the input file. This usually means that a command inside the while loop is also reading from standard input, and is consuming the remainder of the file before read has a chance.
The only candidate here is esearch, as echo does not read from standard input and the other commands are all reading from the previous command in the pipeline. Redirect standard input for esearch so that it does not consume your input data inadvertently.
while read -r i
do
echo GI:"$i"
/home/chris/EntrezDirect/edirect/esearch -db protein -query "$i" < /dev/null |
/home/chris/EntrezDirect/edirect/efetch -format gpc |
/home/chris/EntrezDirect/edirect/xtract -insd source organism |
cut -f2
done < "$input"
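The effect is easy to reproduce without EntrezDirect. In the sketch below, greedy is a hypothetical stand-in for any command that reads standard input (it is just cat > /dev/null), and the counters broken and fixed are names I made up for illustration.

```shell
# Four GI numbers from the question, one per line.
input=$(mktemp)
printf '%s\n' 751637161 1057355277 950462516 918428433 > "$input"

# greedy is a hypothetical stand-in for any command that reads stdin.
greedy() { cat > /dev/null; }

broken=0
while read -r i; do
  greedy                # swallows the remaining lines of "$input"
  broken=$((broken + 1))
done < "$input"

fixed=0
while read -r i; do
  greedy < /dev/null    # stdin redirected, the loop's input is untouched
  fixed=$((fixed + 1))
done < "$input"

echo "without redirect: $broken iteration(s); with redirect: $fixed iteration(s)"
rm -f "$input"
```

The first loop should stop after one iteration, while the second visits all four lines.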
Use cut to extract columns from an ASCII file, use the -d option to denote the delimiter and -f to specify the column. Wrap everything in a loop like so
$ cat data.txt
TARA042SRF022_1 gi|751637161|ref|WP_041104882.1| 40.4 151 82 2 999 547 1 143 2.8e-21 110.9
TARA042SRF022_2 gi|1057355277|ref|WP_068715547.1| 62.7 263 96 1 915 133 80 342 7.1e-96 358.6
TARA042SRF022_3 gi|950462516|ref|WP_057369049.1| 38.3 47 29 0 184 44 152 198 5.1e+01 36.2
TARA042SRF022_4 gi|918428433|ref|WP_052479609.1| 37.5 48 29 1 525 668 192 238 6.1e+01 37.0
$ cat t.sh
#!/bin/bash
for gi in $(cut -d"|" -f 2 data.txt); do
echo $gi
done
$ bash t.sh
751637161
1057355277
950462516
918428433
Edit:
I cannot reproduce the problem, but I suspect it is linked to newlines and/or the use of a temp file. My suggestion avoids both; it does not answer your actual question, but I think it solves your problem.
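Another way to avoid the temp file is to let read do the splitting. This is a sketch under the assumption that the GI number is always the second |-separated field; the variable names prefix, gi, rest and ids are mine.

```shell
# Two sample rows from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
TARA042SRF022_1 gi|751637161|ref|WP_041104882.1| 40.4 151 82 2 999 547 1 143 2.8e-21 110.9
TARA042SRF022_2 gi|1057355277|ref|WP_068715547.1| 62.7 263 96 1 915 133 80 342 7.1e-96 358.6
EOF

ids=""
# IFS='|' applies only to this read: field 1 goes to prefix, field 2 to gi,
# and everything after the second | goes to rest.
while IFS='|' read -r prefix gi rest; do
  echo "GI:$gi"
  ids="${ids:+$ids }$gi"
done < "$data"

rm -f "$data"
```

Any command inside such a loop that reads standard input still needs its own redirection (for example < /dev/null), otherwise it can consume the rest of the file.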
I have bash running a command from another program (AFNI). The command outputs two numbers, like this:
70.0 13.670712
I need to make a bash variable that will be whatever the last # is (in this case 13.670712). I've figured out how to make it print only the last number, but I'm having trouble setting it to be a variable. What is the best way to do this?
Here is the code that prints only 13.670712:
test="$(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]')"; echo "${test}" | awk '{print $2}'
Just pipe (|) the command output to awk. In your example, awk reads the stdout of the previous command and prints the 2nd column; by default, fields are separated by runs of whitespace.
test="$(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]' | awk '{print $2}')"
printf "%s\n" "$test"
13.670712
(or) using echo
echo "$test"
13.670712
This is the simplest way to do it. If you are looking for a Bash-specific alternative, use the read command with process substitution:
read _ val2 < <(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]')
printf "%s\n" "$val2"
13.670712
Another more portable version using set, which will work irrespective of the shell available.
set -- $(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]');
printf "%s\n" "$2"
13.670712
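One thing to keep in mind with set -- is that it replaces the script's own positional parameters. A sketch of how to confine that to a function follows; two_numbers is a hypothetical stand-in for the real 3dBrickStat call, and second_field, val and argc are names invented for this example.

```shell
# two_numbers is a hypothetical stand-in for the real 3dBrickStat call.
two_numbers() { echo "70.0 13.670712"; }

second_field() {
  # Inside a function, set -- replaces only the function's own positional
  # parameters; the caller's are restored when the function returns.
  # The command substitution is left unquoted on purpose so it is word-split.
  set -- $(two_numbers)
  val=$2
}

set -- keep these args   # pretend the script was called with three arguments
second_field
argc=$#                  # still 3: the caller's arguments were not touched
echo "$val (caller still has $argc arguments)"
```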
You can use cut to print the second column:
$ echo "70.0 13.670712" | cut -d ' ' -f2
13.670712
And assign that to a variable with command substitution:
$ sc="$(echo '70.0 13.670712' | cut -d ' ' -f2)"
$ echo "$sc"
13.670712
Just replace echo '70.0 13.670712' with the command that is actually producing the two numbers.
If you want to grab the last value of some delimited field (or delimited output from a command), you can use parameter expansion. This is completely internal to Bash:
$ echo "$s"
$ echo ${s##*' '}
10
$ echo "$s2"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ echo ${s2##*' '}
20
And then just assign directly:
$ echo "$s2"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ lf=${s2##*' '}
$ echo "$lf"
20
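Applied to command output, the same expansion could look like the sketch below. two_numbers is a hypothetical stand-in for the command that prints the two numbers, and the sketch assumes the output has no trailing space (otherwise the result would be empty).

```shell
# two_numbers is a hypothetical stand-in for the real command.
two_numbers() { echo "70.0 13.670712"; }

out=$(two_numbers)   # command substitution strips the trailing newline
last=${out##* }      # remove the longest prefix that ends in a space
echo "$last"
```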
I have the following test file
Kmax Event File - Text Format
1 4 1000
65 4121 9426 12312
56 4118 8882 12307
1273 4188 8217 12309
1291 4204 8233 12308
1329 4170 8225 12303
1341 4135 8207 12306
63 4108 8904 12300
60 4106 8897 12307
731 4108 8192 12306
...
ÿÿÿÿÿÿÿÿ
In this file I want to delete the first two lines and apply some mathematical calculations. For instance each column i will be $i-(i-1)*number. A script that does this is the following
#!/bin/bash
if test $1 ; then
if [ -f $1.evnt ] ; then
rm -f $1.dat
sed -n '2p' $1.evnt | (read v1 v2 v3
for filename in $1*.evnt ; do
echo -e "Processing file $filename"
sed '$d' < $filename > $1_tmp
sed -i '/Kmax/d' $1_tmp
sed -i '/^'"$v1"' '"$v2"' /d' $1_tmp
cat $1_tmp >> $1.dat
done
v3=`wc -l $1.dat | awk '{print $1}' `
echo -e "$v1 $v2 $v3" > .$1.dat
rm -f $1_tmp)
else
echo -e "\a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
echo -e " Event file $1.evnt doesn't exist !!!!!!"
echo -e "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
fi
else
echo -e "\a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
echo -e "!!!!! Give name for event files !!!!!"
echo -e "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
fi
awk '{print $1, $2-4096, $3-(2*4096), $4-(3*4096)}' $1.dat >$1_Processed.dat
rm -f $1.dat
exit 0
The file won't always have 4 columns. Is there a way to read the number of columns, print this number and apply those calculations?
EDIT The idea is to have an input file (*.evnt), convert it to *.dat or any other ascii file(it doesn't matter really) which will only include the number in columns and then apply the calculation $i=$i-(i-1)*number. In addition it will keep the number of columns in a variable, that will be called in another program. For instance in the above file, number=4096 and a sample output file is the following
65 25 1234 24
56 22 690 19
1273 92 25 21
1291 108 41 20
1329 74 33 15
1341 39 15 18
63 12 712 12
60 10 705 19
731 12 0 18
while in the console I will get the message There are 4 detectors.
Finally a new file_processed.dat will be produced, where file is the initial name of awk's input file.
The way it should be executed is the following
./myscript <filename>
where <filename> is the name without the format. For instance, the files will have the format filename.evnt so it should be executed using
./myscript filename
Let's start with this to see if it's close to what you're trying to do:
$ numdet=$( awk -v num=4096 '
NR>2 && NF>1 {
out = FILENAME "_processed.dat"
for (i=1;i<=NF;i++) {
$i = $i-(i-1)*num
}
nf = NF
print > out
}
END {
printf "There are %d detectors\n", nf | "cat>&2"
print nf
}
' file )
There are 4 detectors
$ cat file_processed.dat
65 25 1234 24
56 22 690 19
1273 92 25 21
1291 108 41 20
1329 74 33 15
1341 39 15 18
63 12 712 12
60 10 705 19
731 12 0 18
$ echo "$numdet"
4
Is that it?
Using awk
awk 'NR<=2{next}{for (i=1;i<=NF;i++) $i=$i-(i-1)*4096}1' file
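Note that the sample file ends with a non-numeric trailer line, which a plain NR<=2 filter would also try to process. Here is a sketch that skips any line whose first field is not a whole number and also reports the column count. The shortened input, the xxxxxxxx trailer standing in for the real one, and the variable names evnt, result and ndet are assumptions for illustration.

```shell
# Shortened copy of the sample file; "xxxxxxxx" stands in for the trailer.
evnt=$(mktemp)
cat > "$evnt" <<'EOF'
Kmax Event File - Text Format
1 4 1000
65 4121 9426 12312
56 4118 8882 12307
xxxxxxxx
EOF

# Skip the two header lines and anything whose first field is not a whole
# number, then subtract (i-1)*num from column i.
result=$(awk -v num=4096 '
  NR > 2 && $1 ~ /^[0-9]+$/ {
    for (i = 1; i <= NF; i++) $i -= (i - 1) * num
    print
  }' "$evnt")
printf '%s\n' "$result"

# The number of columns on the first data line is the detector count.
ndet=$(awk 'NR == 3 { print NF; exit }' "$evnt")
echo "There are $ndet detectors"

rm -f "$evnt"
```

ndet is then an ordinary shell variable that can be passed on to another program.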