Bash grep sth. then to find the position - bash

I've long been wondering about this question;
say I first try to grep some lines from a file:
cat 101127_2.bam |grep 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA'
Then it'll pop out the whole line containing this string.
However, can we use some simple bash code to locate at which line this string locates? (100th? 1000th?...)

grep -n 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA' 101127_2.bam
I found it using man grep and writing /line number
// EDIT: Thanks #Keith Thompson I'm editing post from cat file | grep -n pattern to grep -n pattern file, I was in a hurry sorry

try this:
cat 101127_2.bam |grep -n 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA'

This might work for you too:
sed '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/=;d' 101127_2.bam
or
sed -n '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/=' 101127_2.bam
The above solutions only output the matching line numbers, to see the lines matched too:
sed '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/!d;=' 101127_2.bam
or
sed -n '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/{=;p}' 101127_2.bam

Related

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
There are many other ways to delete lines with specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /test/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while read -r line
do
[[ ! $line =~ pattern ]] && echo "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file
You can use sed to replace lines in place in a file. However, it seems to be much slower than using grep for the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes 3 times longer on my machine anyway.
The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile
You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+ executes given Ex command (man ex), same as -c which executes wq (write and quit)
g/match/d - Ex command to delete lines with given match, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
The difference with sed is that:
sed is a Stream EDitor, not a file editor.BashFAQ
Unless you enjoy unportable code, I/O overhead and some other bad side effects. So basically some parameters (such as in-place/-i) are non-standard FreeBSD extensions and may not be available on other operating systems.
I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" $file
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is use of double quotes in "/$pattern/d". Variable won't work when we use single quotes.
You can also use this:
grep -v 'pattern' filename
Here -v will print only other than your pattern (that means invert match).
To get a inplace like result with grep you can do this:
echo "$(grep -v "pattern" filename)" >filename
I have made a small benchmark with a file which contains approximately 345 000 lines. The way with grep seems to be around 15 times faster than the sed method in this case.
I have tried both with and without the setting LC_ALL=C, it does not seem change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s
Delete lines from all files that match the match
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'
SED:
'/James\|John/d'
-n '/James\|John/!p'
AWK:
'!/James|John/'
/James|John/ {next;} {print}
GREP:
-v 'James\|John'
perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) inplace (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).
You can also delete a range of lines in a file.
For example to delete stored procedures in a SQL file.
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This will remove all lines between CREATE PROCEDURE and END ;.
I have cleaned up many sql files withe this sed command.
echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt
Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep - w for whole. That is, for example if you want to delete the lines that have number 11, but keep the lines with number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with several patterns on each line that you want to delete from "file":
grep -w -v -f blacklist file
to show the treated text in console
cat filename | sed '/text to remove/d'
to save treated text into a file
cat filename | sed '/text to remove/d' > newfile
to append treated text info an existing file
cat filename | sed '/text to remove/d' >> newfile
to treat already treated text, in this case remove more lines of what has been removed
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
the | more will show text in chunks of one page at a time.
Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to replace a string, but the answer seems to presuppose knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself:
sed 's/\([\[/({.*+^$?]\)/\\\1/g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's/\([\[({.*+^$?]\)/\\\1/g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")/d"
with variations as described elsewhere on this page.
cat filename | grep -v "pattern" > filename.1
mv filename.1 filename
You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accomodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF
This solution is for doing the same operation on multiple file.
for file in *.txt; do grep -v "Matching Text" $file > temp_file.txt; mv temp_file.txt $file; done
I found most of the answers not useful for me, If you use vim I found this very easy and straightforward:
:g/<pattern>/d
Source

How to pass output of grep to sed?

I have a command like this :
cat error | grep -o [0-9]
which is printing only numbers like 2,30 and so on. Now I wish to pass this number to sed.
Something like :
cat error | grep -o [0-9] | sed -n '$OutPutFromGrep,$OutPutFromGrepp'
Is it possible to do so?
I'm new to shell scripting. Thanks in advance
If the intention is to print the lines that grep returns, generating a sed script might be the way to go:
grep -E -o '[0-9]+' error | sed 's/$/p/' | sed -f - error
You are probably looking for xargs, particularly the -I option:
themel#eristoteles:~$ xargs -I FOO echo once FOO, twice FOO
hi
once hi, twice hi
there
once there, twice there
Your example:
themel#eristoteles:~$ cat error
error in line 123
error in line 234
errors in line 345 and 346
themel#eristoteles:~$ grep -o '[0-9]*' < error | xargs -I OutPutFromGrep echo sed -n 'OutPutFromGrep,OutPutFromGrepp'
sed -n 123,123p
sed -n 234,234p
sed -n 345,345p
sed -n 346,346p
For real-world use, you'll probably want to pass sed an input file and remove the echo.
(Fixed your UUOC, by the way. )
Yes you can pass output from grep to sed.
Please note that in order to match whole numbers you need to use [0-9]* not only [0-9] which would match only a single digit.
Also note you should use double quotes to get variables expanded(in the sed argument) and it seems you have a typo in the second variable name.
Hope this helps.

how to insert new line after a specific character in scripts

I have the following (example.txt) file:
blue(4) red(8) green(5) yellow(19) brown(60) black(5)
how can I achieve in unix the following result?
blue(4)
red(8)
green(5)
yellow(19)
brown(60)
black(5)
If you need to insert newline after closing brackets, try
sed 's/) \?/)\n/g' example.txt
The following in-line sed script will replace a space with a newline, and should solve your problem.
sed -i 's/ /\n/g' example.txt > example_out.txt
xargs -n 1 < example.txt
By passing example.txt into xargs taking one argument at a time -n 1, xargs will place each entry on a separate line.
E.g., to put two entries per line one would simply change the -n 1 to -n 2
The option -n is also referred to as max-args on the man page.
Pass your data to this sed command, like so:
sed 's/ /\n/g' example.txt
I needed to achieve something like this and used sed command. It can be used to perform functions on streams.
For your requirement, you can use it like this:
sed -i 's/ /\n/g' example.txt
You can read more about this in the sed man page.

Output specific line huge text file

I have a sql dump with 300mb that gives me an error on specific line.
But that line is in the middle of the file. What is the best approach?
head -n middleLine dump.sql > output?
Or can i output only the line i need?
You could use sed -n -e 123456p your.dump to print line 123456
If the file is long, consider using
sed -n 'X{p;q}' file
Where X is the line number. It will stop reading the file after reaching that line.
If sed is too slow for your taste you may also use
cat $THE_FILE | head -n $DESIRED_LINE | tail -n 1
You can use sed:
sed -n "x p" dump.sql
where x is the line number.
This might work for you:
sed 'X!d;q' file
where X is the line number.
This can also be done with Perl:
perl -wnl -e '$. == 4444444 and print and exit;' FILENAME.sql
4444444 being the line number you wish to print.
You can also try awk like:
awk 'NR==YOUR_LINE_NO{print}' file_name
If you know a phrase on that line I would use grep. If the phrase is "errortext" use:
$ cat dump.sql | grep "errortext"

bash grep newline

[Editorial insertion: Possible duplicate of the same poster's earlier question?]
Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
How should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question abstract "bash grep newline", implies that you would want to match on the second\nthird sequence of characters - i.e. something containing newline within it.
Since the grep works on "lines" and these two are different lines, you would not be able to match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple first's in it will not show those other lines either and may not be the behavior that you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what do you want to match. I would not use grep, but one of the following:
tail -2 file # to get last two lines
head -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. you're going to have to use either Perl, sed or awk to perform the pattern match across lines.
BTW -E tell grep that the regexp is extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
this will print a string with match and one string before and after. Since "third" is in the last string you get last two strings.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.

Resources