I have a repetitive text, and I want to carefully replace one label with another several times; I don't mind repeating sed, awk, or another method. I want to first replace the first two matches, then the first four, six, etc. I don't want a for loop; I just need something like the code below, where I skip the first two matches and then increase that number.
sed 's/foo/bar/2g' fileX
awk '{ sub(/foo/,"bar"); print }' fileX
Here is an example, with two occurrences per line:
blastx -q specie.fa -db pep -num 6 -max 1 -o 6 > specie.x.outfmt6
blastp -q specie.pep -db pep -num 6 -max 1 -o 6 > specie.p.outfmt6
blastx -q specie.fa -db pep -num 6 -max 1 -o 6 > specie.x.outfmt6
blastp -q specie.pep -db pep -num 6 -max 1 -o 6 > specie.p.outfmt6
Desired output
blastx -q dog.fa -db pep -num 6 -max 1 -o 6 > dog.x.outfmt6
blastp -q dog.pep -db pep -num 6 -max 1 -o 6 > dog.p.outfmt6
blastx -q worm.fa -db pep -num 6 -max 1 -o 6 > worm.x.outfmt6
blastp -q worm.pep -db pep -num 6 -max 1 -o 6 > worm.p.outfmt6
Is this what you're trying to do?
$ awk -v animals='monkey worm dog' 'BEGIN{split(animals,a)} NR%2{c++} {$NF=a[c]} 1' file
here some text -t monkey
and then do something -t monkey
here some text -t worm
and then do something -t worm
here some text -t dog
and then do something -t dog
Given your new sample input/output maybe this is what you want:
$ awk -v animals='dog worm' 'BEGIN{split(animals,a)} NR%2{c++} {gsub(/specie/,a[c])} 1' file
blastx -q dog.fa -db pep -num 6 -max 1 -o 6 > dog.x.outfmt6
blastp -q dog.pep -db pep -num 6 -max 1 -o 6 > dog.p.outfmt6
blastx -q worm.fa -db pep -num 6 -max 1 -o 6 > worm.x.outfmt6
blastp -q worm.pep -db pep -num 6 -max 1 -o 6 > worm.p.outfmt6
Since your sample input/output didn't include any regexp metacharacters, backreference characters, or partial-match cases (e.g. the word species appearing somewhere and NOT to be changed), I assume they can't happen, so the script doesn't need to guard against them.
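If each name should cover more than two lines, a hedged generalization of the same idea might look like this (the names and the block size n here are assumptions; adjust both for your data):

```shell
# Sketch: rotate through the names every n lines (n=2 matches the sample).
printf 'blastx -q specie.fa > specie.x\nblastp -q specie.pep > specie.p\nblastx -q specie.fa > specie.x\nblastp -q specie.pep > specie.p\n' |
awk -v animals='dog worm' -v n=2 '
  BEGIN { split(animals, a) }
  (NR - 1) % n == 0 { c++ }   # move to the next name every n lines
  { gsub(/specie/, a[c]) }
  1'
```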
Related
I have a case where I have to replace the number 1 with the number 3 at the 10th position of various lines in a stored text file. I am unable to find a way to do that. Below are a sample file and my code.
Sample file:
$ cat testdata.txt
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 1 1 John Carter 19880712
#!/bin/sh
filename=testdata.txt
echo "reading number of line"
nol=$(cat $filename | wc -l)
flag[$nol]=''
echo "reading content of file"
for i in (1..$nol)
do
flag=($cut -c10-11 $filename)
if($flag==1)
sed 's/1/3/2'
fi
done
But this is not working.
Please help me resolve this.
Updated:
Sample Output:
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 3 1 John Carter 19880712
try this
sed "s/^\(.\{8\}\) 1 \(.*\)$/\1 3 \2/g" testdata.txt > new_testdata.txt
If your sed supports the -i option, you can also edit in place:
sed -i "s/^\(.\{8\}\) 1 \(.*\)$/\1 3 \2/g" testdata.txt
output
1 43515 8 1 victor Samuel 20190112
3215736 4 6 Michael pristine 20180923
1 56261 3 1 John Carter 19880712
explanation
s # substitute
/^\( # from start of line, save into arg1
.\{8\} # the first 8 characters
\) 1 \( # search pattern ' 1 '
.* # save the rest into arg2
\)$/ # to the end of the line
\1 3 \2 # output: arg1 3 arg2
/g # global on whole line
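If the target column is really the third whitespace-separated field (as it is in the sample), an awk sketch avoids counting characters, at the cost of awk rejoining rewritten lines with single spaces:

```shell
# Sketch: treat the "10th position" as the third whitespace-separated field.
# Caveat: when awk modifies a line it rejoins fields with single spaces.
printf '1 43515 8 1 victor Samuel 20190112\n3215736 4 6 Michael pristine 20180923\n1 56261 1 1 John Carter 19880712\n' |
awk '$3 == 1 { $3 = 3 } 1'
```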
I am currently doing parallel cascade simulations in GROMACS 4.6.5 and I am inputting the commands using a bash script:
#!/bin/bash
pdb2gmx -f step_04_01.pdb -o step_04_01.gro -water none -ff amber99sb -ignh
grompp -f minim.mdp -c step_04_01.gro -p topol.top -o em.tpr
mdrun -v -deffnm em
grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr
mdrun -v -deffnm nvt
grompp -f md.mdp -c nvt.gro -t nvt.cpt -p topol.top -o step_04_01.tpr
mdrun -v -deffnm step_04_01
trjconv -s step_04_01.tpr -f step_04_01.xtc -pbc mol -o step_04_01_pbc.xtc
g_rms -s itasser_2znh.tpr -f step_04_01_pbc.xtc -o step_04_01_rmsd.xvg
Commands such as trjconv and g_rms require user interaction to select options. For instance when running trjconv you are given:
Select group for output
Group 0 ( System) has 6241 elements
Group 1 ( Protein) has 6241 elements
Group 2 ( Protein-H) has 3126 elements
Group 3 ( C-alpha) has 394 elements
Group 4 ( Backbone) has 1182 elements
Group 5 ( MainChain) has 1577 elements
Group 6 ( MainChain+Cb) has 1949 elements
Group 7 ( MainChain+H) has 1956 elements
Group 8 ( SideChain) has 4285 elements
Group 9 ( SideChain-H) has 1549 elements
Select a group:
And the user is expected to enter e.g. 0 into the terminal to select Group 0. I have tried using expect and send, e.g.:
trjconv -s step_04_01.tpr -f step_04_01.xtc -pbc mol -o step_04_01_pbc.xtc
expect "Select group: "
send "0"
However this does not work. I have also tried using -flag as described at http://www.gromacs.org/Documentation/How-tos/Using_Commands_in_Scripts#Within_Script, but it says it is not a recognised input.
Is my expect/send formatted correctly? Is there another way around this in GROMACS?
I don't know gromacs, but I think they are just asking you to use the bash syntax:
yourcomand ... <<EOF
1st answer to a question
2nd answer to a question
EOF
so you might have
trjconv -s step_04_01.tpr -f step_04_01.xtc -pbc mol -o step_04_01_pbc.xtc <<EOF
0
EOF
You can use
echo 0 | trjconv -s step_04_01.tpr -f step_04_01.xtc -pbc mol -o step_04_01_pbc.xtc
And if you need to have multiple inputs, just use
echo 4 4 | g_rms -s itasser_2znh.tpr -f step_04_01_pbc.xtc -o step_04_01_rmsd.xvg
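If the tool expects its answers on separate lines (I can't verify GROMACS's exact reading behaviour here), printf gives precise control over the newlines; cat stands in for the interactive command in this sketch:

```shell
# Sketch: printf sends "4" and "4" as two separate stdin lines, exactly like
# the here-document form above; cat stands in for the GROMACS tool.
printf '4\n4\n' | cat
```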
I am currently building a bash script for class, and I am trying to use the grep command to grab the values from a simple calculator program and store them in the variables I assign, but I keep receiving a syntax error message when I try to run the script. Any advice on how to fix it? my script looks like this:
#!/bin/bash
addanwser=$(grep -o "num1 + num2" Lab9 -a 5 2)
echo "addanwser"
subanwser=$(grep -o "num1 - num2" Lab9 -s 10 15)
echo "subanwser"
multianwser=$(grep -o "num1 * num2" Lab9 -m 3 10)
echo "multianwser"
divanwser=$(grep -o "num1 / num2" Lab9 -d 100 4)
echo "divanwser"
modanwser=$(grep -o "num1 % num2" Lab9 -r 300 7)
echo "modawser"`
You want to grep the output of a command.
grep reads from either a file or standard input, so any of these equivalent forms works:
grep X file # 1. from a file
... things ... | grep X # 2. from stdin
grep X <<< "content" # 3. using here-strings
For this case, you want to use the last one, so that you execute the program and its output feeds grep directly:
grep <something> <<< "$(Lab9 -s 10 15)"
Which is the same as saying:
Lab9 -s 10 15 | grep <something>
So that grep will act on the output of your program. Since I don't know how Lab9 works, let's use a simple example with seq, which returns numbers from 5 to 15:
$ grep 5 <<< "$(seq 5 15)"
5
15
grep is usually used for finding matching lines of a text file. To actually grab a part of the matched line other tools such as awk are used.
Assuming the output looks like "num1 + num2 = 54" (i.e. fields are separated by space), this should do your job:
addanwser=$(Lab9 -a 5 2 | awk '{print $NF}')
echo "$addanwser"
Make sure you don't miss the '$' sign before addanwser when echoing it.
$NF selects the last field. You may select nth field using $n.
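As a quick illustration of the field selectors, assuming Lab9 prints something like num1 + num2 = 7 (an assumed format, not verified):

```shell
# "num1 + num2 = 7" is an assumed Lab9 output format, not verified.
line='num1 + num2 = 7'
echo "$line" | awk '{print $NF}'   # last field: 7
echo "$line" | awk '{print $1}'    # first field: num1
```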
I want to output only a specific value from a text. In this case:
echo "ote -C -pname ap01 -HLS 134 -db 1 -instance 43 -log ap01"
I want to get only the value that follows "-pname".
Expected result:
ap01
-log takes a string. That string could be -pname. The existing solutions so far fail to handle that and treat the value of the -log parameter as the start of another argument.
You'll have to recreate the argument parsing ote performs if you want a robust solution. The following goes a long way toward that:
echo ... | perl -MGetopt::Long -nle'
local @ARGV = split;
GetOptions(\%args, "C", "pname=s", "HLS=i", "db=i", "instance=i", "log=s")
&& defined($args{pname})
&& print($args{pname});
'
This will deal with a doubled -pname.
echo "ote -C -pname ap01 -HLS 134 -db 1 -instance 43 -log ap01" |\
perl -ne 'print ( /\s+-pname\s+([^-]\S*)/ )'
As ikegami notes below, if you should happen to want to use dashes as the first character of this value, the only way I know that you can be sure you're getting a value and not another switch is more complicated. One way is to do a negative lookahead for all known switches:
echo "ote -C -pname -pname -ap01- -HLS 134 -db 1 -instance 43 -log ap01" |\
perl -ne 'print ( /\s+-pname\s+(?!-(?:pname|other|known|switches))(\S+)/ )'
echo "ote -C -pname ap01 -HLS 134 -db 1 -instance 43 -log ap01" | \
awk '{for (i = 1; i <= NF; i++) {if ($i == "-pname") { print $(i+1); break; } } }'
echo "ote -C -pname ap01 -HLS 134 -db 1 -instance 43 -log ap01" | \
perl -pe '($_)= /-pname\s+([^-]\S*)/'
grep with look behind?
$ grep -Po '(?<=-pname )[^ ]*' <<< "ote -C -pname ap01 -HLS 134 -db 1 -instance 43 -log ap01"
ap01
As there might be many -pname in the string (see comments below), you can then "play" with head and tail to get the value you want.
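For instance, a sketch with an assumed doubled -pname, using tail to keep the second value:

```shell
# Assumed input with two -pname occurrences; tail -n1 picks the second value.
echo 'ote -pname first -db 1 -pname second -log x' |
grep -Po '(?<=-pname )[^ ]*' | tail -n1   # -> second
```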
Explanation
This uses -P for Perl regex and -o to "print only the matched parts of a matching line".
(?<=-pname ) is a look-behind: match strings that are preceded by -pname (note the space).
[^ ]* matches any sequence of non-space characters.
You can simply use (GNU) grep:
$ echo "ote -C -pname ap01 -HLS 134 -pname foo -db 1 -instance 43 -log ap01" |
grep -Po -- '-pname \K[^ ]+'
ap01
Explanation
The -P enables Perl Compatible Regular Expressions (PCREs), which gives us \K (meaning "discard anything matched up to this point"). The -o means "print only the matched portion of the line". So we then look for the string -pname followed by a space and then as many consecutive non-space characters as possible ([^ ]+). Because of the \K, everything before that is discarded, and because of the -o, only the matched portion is printed.
This will work for an arbitrary number of -pname flags as long as none of their values contain spaces.
This looks simple
xargs -n1 | sed -n "/-pname/,//p" | tail -n1
I have a text file with an unknown number of lines. I need to grab some of those lines at random, but I don't want there to be any risk of repeats.
I tried this:
jot -r 3 1 `wc -l<input.txt` | while read n; do
awk -v n=$n 'NR==n' input.txt
done
But this is ugly, and doesn't protect against repeats.
I also tried this:
awk -vmax=3 'rand() > 0.5 {print;count++} count>max {exit}' input.txt
But that obviously isn't the right approach either, as I'm not guaranteed even to get max lines.
I'm stuck. How do I do this?
This might work for you:
shuf -n3 file
shuf is one of GNU coreutils.
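A quick sanity check of the guarantee: shuf samples without replacement, so the lines are always distinct.

```shell
# Exactly 3 lines, no repeats (sampling without replacement).
seq 100 | shuf -n 3
seq 100 | shuf -n 3 | sort -u | wc -l   # always prints 3
```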
If you have Python accessible (change the 10 to what you'd like):
python -c 'import random, sys; print("".join(random.sample(sys.stdin.readlines(), 10)).rstrip("\n"))' < input.txt
(This will work in Python 2.x and 3.x.)
Also, (again change the 10 to the appropriate value):
sort -R input.txt | head -10
If jot is on your system, then I guess you're running FreeBSD or OSX rather than Linux, so you probably don't have tools like rl or sort -R available.
No worries. I had to do this a while ago. Try this instead:
$ printf 'one\ntwo\nthree\nfour\nfive\n' > input.txt
$ cat rndlines
#!/bin/sh
# default to 3 lines of output
lines="${1:-3}"
# default to "input.txt" as input file
input="${2:-input.txt}"
# First, put a random number at the beginning of each line.
while IFS= read -r line; do
printf '%8d%s\n' $(jot -r 1 1 99999999) "$line"
done < "$input" |
sort -n | # Next, sort by the random number.
sed 's/^.\{8\}//' | # Last, remove the number from the start of each line.
head -n "$lines" # Show our output
$ ./rndlines input.txt
two
one
five
$ ./rndlines input.txt
four
two
three
$
Here's a 1-line example that also inserts the random number a little more cleanly using awk:
$ printf 'one\ntwo\nthree\nfour\nfive\n' | awk 'BEGIN{srand()} {printf("%8d%s\n", rand()*10000000, $0)}' | sort -n | head -n 3 | cut -c9-
Note that different versions of sed (on FreeBSD and OSX) may require the -E option instead of -r to use the ERE dialect instead of BRE in the regular expression, if you want to use that explicitly, though everything I've tested works with escaped bounds in BRE. (Ancient versions of sed (HP/UX, etc.) might not support this notation, but you'd only be using those if you already knew how to do this.)
This should do the trick, at least with bash and assuming your environment has the other commands available:
cat chk.c | while read x; do
echo $RANDOM:$x
done | sort -t: -k1 -n | tail -10 | sed 's/^[0-9]*://'
It basically outputs your file, placing a random number at the start of each line.
Then it sorts on that number, grabs the last 10 lines, and removes that number from them.
Hence, it gives you ten random lines from the file, with no repeats.
For example, here's a transcript of it running three times with that chk.c file:
====
pax$ testprog chk.c
} else {
}
newNode->next = NULL;
colm++;
====
pax$ testprog chk.c
}
arg++;
printf (" [%s] \n", currNode->value);
free (tempNode->value);
====
pax$ testprog chk.c
char tagBuff[101];
}
return ERR_OTHER;
#define ERR_MEM 1
===
pax$ _
sort -Ru filename | head -5
will ensure no duplicates. Not all implementations of sort have the -R option.
To get N random lines from FILE with Perl:
perl -MList::Util=shuffle -e 'print shuffle <>' FILE | head -N
Here's an answer using ruby if you don't want to install anything else:
cat filename | ruby -e 'puts ARGF.read.split("\n").uniq.shuffle.join("\n")'
for example, given a file (dups.txt) that looks like:
1 2
1 3
2
1 2
3
4
1 3
5
6
6
7
You might get the following output (or some permutation):
cat dups.txt | ruby -e 'puts ARGF.read.split("\n").uniq.shuffle.join("\n")'
4
6
5
1 2
2
3
7
1 3
Further example from the comments:
printf 'test\ntest1\ntest2\n' | ruby -e 'puts ARGF.read.split("\n").uniq.shuffle.join("\n")'
test1
test
test2
Of course if you have a file with repeated lines of test you'll get just one line:
printf 'test\ntest\ntest\n' | ruby -e 'puts ARGF.read.split("\n").uniq.shuffle.join("\n")'
test