Sed Capturing Repeating Number Groups - bash

I am trying to use sed to capture a group like these examples:
123123 (I would want the first group, 123)
144144 (I would want the group 144)
however sed does not seem to realize what \1 is.
Is there any way to do this using sed? I want to replace the first group with a specific string afterwards.
([0-9]+)\1
I have tried the regex above, yet sed does not seem to do what I intend.
I also tried this:
~/Desktop$ cat file
123123
23231
12323
123231
12345
144144
~/Desktop$ sed -n 's/.*\b\([[:digit:]]\{1,\}\)\1\b.*/\1/p' file
~/Desktop$
~/Desktop$ sed -n -E 's/([0-9]+)\1/specificstring\1/p' file
specificstring12323
specificstring2323
specificstring12323
specificstring14444
~/Desktop$ sed -nE 's/^([0-9]+)\1([^0-9]|$)/\1/p' file
2323
12323

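Use a BRE, and avoid + since it is not part of POSIX BRE syntax (it is an ERE operator); the portable BRE spelling is \{1,\}. POSIX also specifies back-references such as \1 for BREs only.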
$ cat file
123123
23231
12323
123231
12345
144144
$
$ sed -n 's/^\([0-9]\{1,\}\)\1$/\1/p' file
123
144
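As a minimal sketch of the follow-up replacement, assuming the whole line must consist of the repeated group: the temporary file and the text specificstring are only illustrative.

```shell
# Sample data from the question, written to a temporary file (illustrative).
tmp=$(mktemp)
printf '%s\n' 123123 23231 12323 123231 12345 144144 > "$tmp"

# BRE with the interval \{1,\} instead of +; ^ and $ force the line to be
# exactly two copies of the same digit run. The first copy is replaced by
# the literal text and the second copy is kept via \1.
result=$(sed -n 's/^\([0-9]\{1,\}\)\1$/specificstring\1/p' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Dropping the ^ and $ anchors would let the pattern match a repeated run anywhere inside a longer number, which is probably not what the question wants.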

I want to replace the first group with a specific string afterwards.
With GNU sed :
sed -n -E 's/([0-9]+)\1/specificstring\1/p' file
Takeaways
-n suppresses the default output; the p flag of the s command then prints only the lines where a substitution was made.
-E enables extended regular expressions.
Note
This does not, however, print the lines that contain no identical groups; the question does not say whether such lines exist.

Given that the file only contains 6 digit numbers and nothing else it could be done like this:
sed -n 's/\([0-9]\{3\}\)\1/\1/p' file

Related

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
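If you would rather not depend on -i at all, a temporary file gives the same effect with any sed. This is only a sketch: delete_lines is a hypothetical helper name, and it assumes mktemp is available (it is not required by POSIX, but GNU and BSD systems ship it).

```shell
# delete_lines PATTERN FILE: remove lines matching the BRE PATTERN from FILE.
# Hypothetical helper; writes to a temporary file, then replaces the original.
delete_lines() {
    pattern=$1
    file=$2
    tmp=$(mktemp) || return 1
    # The pattern must not contain an unescaped "/" (the sed delimiter here).
    if sed "/$pattern/d" "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Illustrative usage on a throwaway file.
demo=$(mktemp)
printf '%s\n' keep 'drop me' keep2 > "$demo"
delete_lines 'drop' "$demo"
remaining=$(cat "$demo")
printf '%s\n' "$remaining"
rm -f "$demo"
```

Note that mv replaces the file, so the original's permissions and ownership are not preserved; using cat "$tmp" > "$file" followed by rm would keep them.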
There are many other ways to delete lines with specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /test/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while read -r line
do
[[ ! $line =~ pattern ]] && echo "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file
You can use sed to replace lines in place in a file. However, it seems to be much slower than using grep for the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes 3 times longer on my machine anyway.
The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile
You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+ executes given Ex command (man ex), same as -c which executes wq (write and quit)
g/match/d - Ex command to delete lines with given match, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
The difference with sed is that:
sed is a Stream EDitor, not a file editor. (BashFAQ)
Unless you enjoy unportable code, I/O overhead and some other bad side effects. So basically some parameters (such as in-place/-i) are non-standard FreeBSD extensions and may not be available on other operating systems.
I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" "$file"
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is use of double quotes in "/$pattern/d". Variable won't work when we use single quotes.
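One thing that bites with variable patterns is a "/" inside the variable, because it ends the address early. A sketch of one way around it, using a different address delimiter (the sample paths are made up):

```shell
# Illustrative data: paths, one per line.
list=$(mktemp)
printf '%s\n' /usr/bin/env /opt/tool /usr/bin/awk > "$list"

pattern='/usr/bin'   # contains the default "/" delimiter

# \|...| selects "|" as the address delimiter, so "/" in $pattern is literal.
# Regex metacharacters in $pattern are still interpreted by sed.
kept=$(sed "\|$pattern|d" "$list")
printf '%s\n' "$kept"
rm -f "$list"
```

This only moves the problem: the pattern must now be free of "|", so pick a delimiter that cannot occur in your data.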
You can also use this:
grep -v 'pattern' filename
Here -v inverts the match: grep prints only the lines that do not match your pattern.
To get an in-place-like result with grep you can do this:
echo "$(grep -v "pattern" filename)" >filename
I made a small benchmark with a file of approximately 345 000 lines. In this case the grep approach is around 15 times faster than the sed method.
I tried both with and without the setting LC_ALL=C; it does not seem to change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s
Delete the matching lines from all files that contain the pattern:
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'
SED:
'/James\|John/d'
-n '/James\|John/!p'
AWK:
'!/James|John/'
/James|John/ {next;} {print}
GREP:
-v 'James\|John'
perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) inplace (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).
You can also delete a range of lines in a file.
For example to delete stored procedures in a SQL file.
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This removes every line from CREATE PROCEDURE through END ;, inclusive.
I have cleaned up many SQL files with this sed command.
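Here is a small sketch of the range delete on made-up SQL, so you can see exactly which lines go (the table and procedure names are invented for the demo):

```shell
sql=$(mktemp)
printf '%s\n' \
    'CREATE TABLE t (id INT);' \
    'CREATE PROCEDURE p()' \
    'BEGIN' \
    '  SELECT 1;' \
    'END ;' \
    'INSERT INTO t VALUES (1);' > "$sql"

# A "/start/,/end/" address pair selects every line from a match of the
# start pattern through the next match of the end pattern, inclusive.
cleaned=$(sed '/CREATE PROCEDURE/,/END ;/d' "$sql")
printf '%s\n' "$cleaned"
rm -f "$sql"
```

Be aware that if the end pattern never matches after a start line, the range runs to the end of the file and everything after CREATE PROCEDURE is deleted.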
echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt
Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep - w for whole. That is, for example if you want to delete the lines that have number 11, but keep the lines with number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with several patterns on each line that you want to delete from "file":
grep -w -v -f blacklist file
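A quick sketch of the blacklist variant, with invented sample data, assuming a grep that supports -w (GNU and BSD grep both do):

```shell
data=$(mktemp)
blacklist=$(mktemp)
printf '%s\n' 1 11 111 22 222 > "$data"
printf '%s\n' 11 22 > "$blacklist"

# -f reads one pattern per line; -w requires whole-word matches; -v inverts.
filtered=$(grep -w -v -f "$blacklist" "$data")
printf '%s\n' "$filtered"
rm -f "$data" "$blacklist"
```

Only the lines that are exactly 11 or 22 as whole words are removed; 111 and 222 survive.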
to show the treated text in console
cat filename | sed '/text to remove/d'
to save treated text into a file
cat filename | sed '/text to remove/d' > newfile
to append treated text to an existing file
cat filename | sed '/text to remove/d' >> newfile
to process already treated text further, in this case removing more lines on top of those already removed
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
the | more will show text in chunks of one page at a time.
Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to replace a string, but the answer seems to presuppose knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself:
sed 's/\([\[/({.*+^$?]\)/\\\1/g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")/d"
with variations as described elsewhere on this page.
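If the goal is only to drop lines containing a literal string, one way to sidestep escaping entirely is grep's fixed-string mode. This is a different technique from the sed escaping above, sketched here with made-up data:

```shell
log=$(mktemp)
printf '%s\n' 'price is $5.00 (net)' 'price is 5x00' 'a.b*c' > "$log"

STRING='$5.00 (net)'   # contains regex metacharacters

# -F treats the pattern as a fixed string, -v inverts the match, and "--"
# protects strings that begin with "-".
literal_kept=$(grep -v -F -- "$STRING" "$log")
printf '%s\n' "$literal_kept"
rm -f "$log"
```

With a regex match, the "." in the string would also have matched the "x" in the second line; with -F only the first line is removed.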
cat filename | grep -v "pattern" > filename.1
mv filename.1 filename
You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accommodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF
This solution does the same operation on multiple files.
for file in *.txt; do grep -v "Matching Text" "$file" > temp_file.txt; mv temp_file.txt "$file"; done
I found most of the answers not useful for me. If you use vim, I found this very easy and straightforward:
:g/<pattern>/d

Matching a pattern with sed and getting an integer out at the same time

I have an xml file with these lines (among others):
#Env=DEV2,DEV3,DEV5,DEV6
#Enter your required DEV environment after the ENV= in the next line:
Env=DEV6
I need to:
Verify that the text after ENV= is of the pattern DEV{1..99}
extract the number (in this case, 6) from the line ENV=DEV6 to some environment variable
I know a bit of awk and grep, and can use those to get the number, but I'm thinking of Sed, which I'm told matches patterns nicer than awk and takes less time. Also, I'm concerned about long long lines of greps matching the beginning of the line for that particular Env= .
How would I go about doing it with Sed? would I get away with a shorter line?
I'm a sed newbie, read a bunch of tutorials and examples and got my fingers twisted trying to do both things at the same time...
Can use grep also if pcre regex is available
$ cat ip.txt
#Env=DEV2,DEV3,DEV5,DEV6
#Enter your required DEV environment after the ENV= in the next line:
Env=DEV6
foo
Env=DEV65
bar
Env=DEV568
$ grep -xoP 'Env=DEV\K[1-9][0-9]?' ip.txt
6
65
-x match whole line
-o output only matching text
-P use pcre regex
Env=DEV\K match Env=DEV but not part of output
[1-9][0-9]? range of 1 to 99
I suggest with GNU sed:
var=$(sed -nE 's/^Env=DEV([0-9]{1,2})$/\1/p' file)
echo "$var"
Output:
6
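To cover both requirements (validate the 1..99 range and extract the number), here is a sketch that sticks to POSIX BRE syntax. The temporary file stands in for the XML file, and env_num is just an illustrative variable name.

```shell
conf=$(mktemp)
printf '%s\n' \
    '#Env=DEV2,DEV3,DEV5,DEV6' \
    '#Enter your required DEV environment after the ENV= in the next line:' \
    'Env=DEV6' > "$conf"

# [1-9][0-9]\{0,1\} accepts 1..99 only; ^ and $ reject the commented line and
# values such as DEV0 or DEV100. \{0,1\} is the POSIX BRE spelling of "?".
env_num=$(sed -n 's/^Env=DEV\([1-9][0-9]\{0,1\}\)$/\1/p' "$conf")

if [ -n "$env_num" ]; then
    echo "DEV environment number: $env_num"
else
    echo "no valid Env=DEV<1..99> line found" >&2
fi
rm -f "$conf"
```

An empty env_num means the line was missing or did not match the pattern, which doubles as the verification step.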
awk -F'Env=DEV' '/Env=DEV[0-9]$|Env=DEV[0-9][0-9]$/{print $2}' input
Input:
echo '
Env=DEV6
Env=DEVasd
Env=DEV62
Env=DEV622'
Output:
awk -F'Env=DEV' '/Env=DEV[0-9]$|Env=DEV[0-9][0-9]$/{print $2}' input
6
62
To store it into any variable:
var=$(awk command)
In awk. First some test cases:
$ cat file
foo
Env=DEV0
Env=DEV1
Env=DEV99
Env=DEV100
$ awk 'sub(/^Env=DEV/,"") && /^[1-9][0-9]?$/' file
1
99
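You can use sed as
$ sed -n 's/^Env=DEV\([1-9][0-9]\?\)$/\1/p' file
6
Here -n together with the p flag prints only the line where the substitution happened, and \? is a GNU extension. You can use the above command directly in an export command:
export YOUR_EXPORT_VARIABLE=$(sed -n 's/^Env=DEV\([1-9][0-9]\?\)$/\1/p' file)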
Or it's pretty straightforward with perl (the ^ anchor skips the commented #Env= line):
$ perl -nle 'print $1 if /^Env=DEV(\d+)/' file
6

Extract all characters after a match - shell script

I am in need to extract all characters after a pattern match.
For example ,
NAME=John
Age=16
I need to extract all characters after "=". Output should be like
John
16
I cant go with perl or Jython for this purpose because of some restrictions.
I tried with grep , but to my knowledge I came as shown below only
echo "NAME=John" |grep -o -P '=.{0,}'
You were pretty close:
grep -oP '(?<=\w=)\w+' file
makes it.
Explanation
it looks for any word after word= and prints it.
-o stands for "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line".
-P stands for "Interpret PATTERN as a Perl regular expression".
(?<=\w=)\w+ means: match only \w+ that follows word=. The (?<=...) part is a lookbehind, so it must match but is not included in the output. More info in the Regex tutorial on lookaround and in the explanation by sudo_O.
Test
$ cat file
NAME=John
Age=16
$ grep -oP '(?<=\w=)\w+' file
John
16
One sed solution
sed -ne 's/.*=//gp' <filename>
another awk solution
awk -F= '$0=$2' <filename>
Explanation:
in sed we remove everything from the beginning of the line up to the last = (.* is greedy) and print the rest.
in awk we split the line into fields separated by =; $0=$2 then replaces the whole line with the second field. Because the assignment doubles as the pattern, a line whose second field is empty or 0 is not printed.
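Since perl is ruled out, it may be worth noting that plain shell parameter expansion can do this too. A sketch with invented sample data (the third line is added to show a value containing "="):

```shell
kv=$(mktemp)
printf '%s\n' 'NAME=John' 'Age=16' 'URL=a=b' > "$kv"

values=$(
    while IFS= read -r line; do
        # ${line#*=} strips the shortest prefix ending in "=", so a value that
        # itself contains "=" is kept whole.
        printf '%s\n' "${line#*=}"
    done < "$kv"
)
printf '%s\n' "$values"
rm -f "$kv"
```

Lines without any "=" are printed unchanged by this loop, so filter them first if that matters.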

awk and cat - How to ignore multiple lines?

I need to extract Voip log from a D-Link router, so I've setup a little python script that executes a command in this router via telnet.
My script does a "cat /var/log/calls.log" and returns the result, however...
it also sends non-important stuff, like the BusyBox banner, etc...
How can I ignore lines from 1 to 6 and the last 2 ?
This is my current output:
yaba#foobar:/stuff$ python calls.py
BusyBox v1.00 (2009.04.09-11:17+0000) Built-in shell (msh)
Enter 'help' for a list of built-in commands.
DVA-G3170i/PT # cat /var/call.log
1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1
DVA-G3170i/PT # exit
And I just need:
1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1
(it can have multiple lines)
So that I can save it to a CSV and later to a sql db.
Thanks, and sorry my bad english.
Why not use a pattern in AWK to match the text you want?
python calls.py | awk '/^[0-9]/{print}'
The whole POINT of AWK is matching lines based on patterns and manipulating/printing those matched lines.
Edited to add example run.
Here's a junk data file based on your sample above.
$ cat junk.dat
BusyBox v1.00 (2009.04.09-11:17+0000) Built-in shell (msh)
Enter 'help' for a list of built-in commands.
DVA-G3170i/PT # cat /var/call.log
1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1
DVA-G3170i/PT # exit
Here's running it through AWK with a filter.
$ cat junk.dat | awk '/^[0-9]/ {print}'
1 ,1294620563,2 ,+351xxx080806 ,xxx530802 ,1 ,3 ,1
No need for SED, no need for counting lines, no need for anything but AWK. Why make things more complicated than they need to be?
In one call to sed:
sed -n '1,6d;7,${N;$q;P;D}'
or for picky versions of sed:
sed -ne '1,6d' -e '7,${N' -e '$q' -e 'P' -e 'D}'
You could also do it based on matches:
sed -n '/^[0-9]/p'
or something similar.
But why doesn't your Python script read the file and do the filtering (instead of calling an external utility)?
python calls.py | sed -e 1,6d -e '$d'
So that might work. It will filter out the first 6 and the last, which is what your example indicates you need. If you really want to clobber the last two lines then you could do:
python calls.py | sed -e 1,6d -e '$d' | sed -e '$d'
But wait ... you said awk, so...
python calls.py | awk '{ if(NR > 7) { print t }; t = $0 }'
This might work for you:
sed '1,6d;$!N;$d;P;D' file
I'm not sure this is the best way to do it (maybe D-Link router has FTP or SSH support) but you can do it with awk:
awk '/cat/, /exit/' | sed -e '1d' -e '$d'
awk will print everything between lines containing "cat" and "exit", unfortunately including these two lines. That's what the remaining commands are for, I couldn't figure out how to do it nicer than that...
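For the line-counting version of the question (skip the first 6 lines and the last 2), a small buffer in awk avoids the second sed pass. This is a sketch: head and tail are illustrative variable names, tail must be at least 1, and seq output stands in for the telnet session.

```shell
# Twelve numbered lines stand in for the telnet output (illustrative).
trimmed=$(
    seq 1 12 |
    awk -v head=6 -v tail=2 '
        NR > head {
            k = NR - head                      # position among the kept lines
            if (k > tail) print buf[k % tail]  # emit the line read "tail" lines ago
            buf[k % tail] = $0                 # remember the current line
        }'
)
printf '%s\n' "$trimmed"
```

Each line is held back until tail more lines have arrived, so the final tail lines are never printed.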

bash grep newline

Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
What should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question title, "bash grep newline", implies that you want to match the second\nthird sequence of characters, i.e. something containing a newline.
Since grep works on lines and these are two different lines, you cannot match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n/'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
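One simpler alternative, if awk is acceptable, is to remember the previous line and print the pair when the two lines are adjacent. A sketch with the sample data from the question:

```shell
sample=$(mktemp)
printf '%s\n' first second third > "$sample"

# Remember the previous line; when "second" is immediately followed by
# "third", print both.
pair=$(awk 'prev == "second" && $0 == "third" { print prev; print }
            { prev = $0 }' "$sample")
printf '%s\n' "$pair"
rm -f "$sample"
```

This compares whole lines exactly; use a regex match such as prev ~ /second/ if the lines contain other text.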
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple first's in it will not show those other lines either and may not be the behavior that you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what do you want to match. I would not use grep, but one of the following:
tail -n 2 file # to get last two lines
tail -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. you're going to have to use either Perl, sed or awk to perform the pattern match across lines.
BTW, -E tells grep that the regexp is an extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
This prints the matching line plus one line of context before and after it. Since "third" is the last line, you get the last two lines.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.
