Unix command to find a string in a file and print only the pattern text with the string - shell

I want to print a block of text only if it contains a specific string, using Unix commands.
For example: find 25487 in the xyz.txt file and print the block that contains it, from "please" through "till here", to a new file.
xyz.txt file...
.........
..
....
...
please print 25487 this
sadf
sdfa
sdfasgda
till here
.....
.........
..
please print 45862 this
qret
ret
ASF
H
till here
.........
..
....
...
And finally print only
please print 25487 this
sadf
sdfa
sdfasgda
till here

sed can do this very easily.
sed -n '/25487/, /till here/ p'
Test
$ sed -n '/25487/, /till here/ p' input
please print 25487 this
sadf
sdfa
sdfasgda
till here
What does it do?
-n suppresses the automatic printing of the pattern space.
/25487/,/till here/ is an address range. It selects every line from the first pattern through the second and applies the following command to them.
Here it selects the lines from the one containing 25487 through the next one containing till here.
p prints the pattern space.
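A minimal, self-contained sketch of the range address in action. The temporary file and its contents are made up for illustration; only the sed command comes from the answer above.

```shell
# Hypothetical sample data, not the asker's real file.
tmp=$(mktemp)
printf '%s\n' 'intro' 'please print 25487 this' 'sadf' 'till here' \
  'please print 45862 this' 'qret' 'till here' 'outro' > "$tmp"

# Print from the first line matching 25487 through the next "till here".
result=$(sed -n '/25487/,/till here/p' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Note that the range starts at the line containing 25487, not at "please"; in this data the two happen to be the same line.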

You could use grep.
$ grep -oPz '(?s)\bplease\b.*?25487.*?\btill here\b' file
please print 25487 this
sadf
sdfa
sdfasgda
till here
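(?s) is the DOTALL modifier, which makes the dot in your regex match newline characters too. By default the dot won't match line breaks. -P enables Perl-compatible regexes, -z makes grep treat the input as NUL-separated so the whole file is searched as one record, and -o prints only the matched text; these are GNU grep options.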

If you know how many lines follow the match, you can use grep's context options. The -A option prints a given number of lines after each matching line:
grep -A 4 25487 xyz.txt > newfile.txt
-A 4 prints the 4 lines after each matching line. To get the 4 lines before the match use -B, and to get lines both before and after use -C.
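A small sketch of the trade-off, using made-up data: a fixed -A count only fits when every block has the same length, whereas a pattern range adapts to the block. The -A option is available in GNU and BSD grep but is not part of POSIX.

```shell
# Hypothetical sample: the block after the match is shorter than 4 lines.
tmp=$(mktemp)
printf '%s\n' 'please print 25487 this' 'a' 'b' 'till here' 'unrelated' > "$tmp"

fixed=$(grep -A 4 25487 "$tmp")                 # match plus up to 4 following lines
ranged=$(sed -n '/25487/,/till here/p' "$tmp")  # stops at "till here"

printf 'grep -A 4:\n%s\n\nsed range:\n%s\n' "$fixed" "$ranged"
rm -f "$tmp"
```

Here the grep output also contains the line "unrelated", because the count overshoots the end of the block.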

A simple awk
awk '/25487/,/till here/' xyz.txt
please print 25487 this
sadf
sdfa
sdfasgda
till here
The flag-based form is better if you have other tests to do too:
awk '/25487/{f=1} f; /till here/{f=0}' xyz.txt
or this:
awk '/25487/{f=1} f && /till here/{f=0; print} f' xyz.txt

sed -n '/please/,/till here/!d
H
/till here/!b
s/.*//
x
/25487/!d
s/.//p' YourFile
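Delete every line outside a section starting with please and ending with till here, and start the next cycle (read the next line).
Append the line to the hold space.
If the line does not contain till here, start the next cycle.
Empty the pattern space.
Exchange it with the hold space content (the collected section).
If the section does not contain 25487, delete it and start the next cycle.
Remove the first character (an extra newline) and print, then start the next cycle.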
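The same buffering idea can be sketched in awk, which may be easier to read. This is an illustrative alternative, not the answer's own code; the variable names (id, buf, inblock) and the sample data are assumptions.

```shell
# Hypothetical sample: the id is NOT on the "please" line.
tmp=$(mktemp)
printf '%s\n' 'x' 'please print' 'id 25487' 'till here' \
  'please print' 'id 45862' 'till here' > "$tmp"

result=$(awk -v id=25487 '
  /please/ { buf = ""; inblock = 1 }       # start collecting a new block
  inblock  { buf = buf $0 "\n" }           # append the current line to the buffer
  inblock && /till here/ {
      if (index(buf, id)) printf "%s", buf # print only blocks that contain id
      inblock = 0
  }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Unlike the plain /25487/,/till here/ range, this prints the whole block from "please" even when the id appears on a later line of the block.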

Related

Remove first character of a text file from shell

I have a text file and I would like to only delete the first character of the text file, is there a way to do this in shell script?
I'm new to writing scripts so I really don't know where to start. I understand that the main command most people use is "sed" but I can only find how to use that as a find and replace tool.
All help is appreciated.
You can use the tail command, telling it to start from character 2:
tail -c +2 infile > outfile
You can use sed
sed '1s/^.//' startfile > endfile
1s means match line 1, in substitution mode (s)
^. means at the beginning of the line (^), match any character (.)
There's nothing between the last slashes, which means substitute with nothing (remove)
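You can also use the cut command, but note that cut works on every line, so it removes the first character of each line rather than just the first character of the file.
For example:
cut -c2-80 file
shows only columns 2 to 80 of each line.
To keep everything from column 2 to the end of each line:
cut -c2- file > newfile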
With GNU sed you can also use the 0,addr2 address range to limit the substitution to the first match, e.g.
sed '0,/./s/^.//' file
This removes the first character of the first non-empty line; the range ends at that line, so no later line is changed.
To edit the file in place, use the -i option, e.g.
sed -i '0,/./s/^.//' file
or simply redirect the output to a new file:
sed '0,/./s/^.//' file > newfile
A few other ideas:
awk '{print (NR == 1 ? substr($0,2) : $0)}' file
perl -0777 -pe 's/.//' file
perl -pe 's/.// unless $done; $done = 1' file
ed file <<END
1s/.//
w
q
END
dd allows you to specify an offset at which to start reading:
dd ibs=1 skip=1 if="$input" of="$output"
(where the variables are set to point to your input and output files, respectively; skip applies to the input, whereas seek would apply to the output)
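A quick comparison on made-up data may help when choosing between these tools. It shows that tail -c and sed with a line-1 address touch only the start of the file, while cut strips a character from every line.

```shell
# Hypothetical two-line sample file.
tmp=$(mktemp)
printf 'Xhello\nworld\n' > "$tmp"

by_tail=$(tail -c +2 "$tmp")    # start output at byte 2 of the file
by_sed=$(sed '1s/^.//' "$tmp")  # drop the first character of line 1 only
by_cut=$(cut -c2- "$tmp")       # drop the first character of EVERY line

printf 'tail:\n%s\nsed:\n%s\ncut:\n%s\n' "$by_tail" "$by_sed" "$by_cut"
rm -f "$tmp"
```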

How do I delete every line from a file matching the first line

I want to delete every line of a file that matches the first line, but not delete the first line.
I've tried this code so far, but it deletes all matching patterns including the first line.
sed -i "1,/$VARIABLE_CONTAINING_PATTERN/d" $MY_FILE.txt
Following your description literally - to delete every line of a file that matches the first line, but not delete the first line, awk solution:
Let's say we have the following myfile.txt:
my pattern
some text
another pattern
regex
awk sed
my pattern
text text
my pattern
our patterns
awk 'NR==1{ pat=$0; print }NR>1 && $0!~pat' myfile.txt > tmp && mv tmp myfile.txt
Final myfile.txt contents:
my pattern
some text
another pattern
regex
awk sed
text text
our patterns
Using awk. Solution depends a bit on your definition of a match:
$ cat file
1
2
3
1
12
$ awk 'NR==1{p=$0;print;next} $0 != p' file
1
2
3
12
$ awk 'NR==1{p=$0;print;next} $0 !~ p' file
1
2
3
Therefore you should provide proper sample data with the expected output.
This might work for you (GNU sed):
sed -i '1h;1!G;/^\(.*\)\n\1$/!P;d' file
Copy the first line into the hold space (HS). For every line except the first, append the HS to the pattern space (PS). Compare the current line to the first line and print the current line if it is not the same. Delete the pattern space.
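The difference between an exact comparison and a regex match becomes visible as soon as the first line contains a regex metacharacter. A minimal sketch with made-up data:

```shell
# Hypothetical sample: the first line "a.c" is also a regex that matches "abc".
tmp=$(mktemp)
printf '%s\n' 'a.c' 'abc' 'a.c' 'xyz' > "$tmp"

# Exact string comparison: only literal duplicates of line 1 are dropped.
exact=$(awk 'NR==1{p=$0; print; next} $0 != p' "$tmp")

# Regex match: line 1 is used as a pattern, so "abc" is dropped as well.
regex=$(awk 'NR==1{p=$0; print; next} $0 !~ p' "$tmp")

printf 'exact:\n%s\nregex:\n%s\n' "$exact" "$regex"
rm -f "$tmp"
```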

grep (awk) a file from A to first empty line

I need to grep a file from a line containing Pattern A to a first empty line.
I used awk but I don't know how to code this empty line.
cat ${file} | awk '/Pattern A/,/Pattern B/'
sed might be best:
sed -n '/PATTERN/,/^$/p' file
To avoid printing the empty line:
sed -n '/PATTERN/,/^$/{/^$/d; p}' file
or even better - thanks jthill!:
sed -n '/PATTERN/,/^$/{/./p}' file
Above solutions will give more output than needed if PATTERN appears more than once. For that, it is best to quit after empty line is found, as jaypal's answer suggests:
sed -n '/PATTERN/,/^$/{/^$/q; p}' file
Explanation
^$ matches empty lines, because ^ stands for the beginning of the line and $ for the end of the line. So ^$ means: lines containing nothing between beginning and end.
/PATTERN/,/^$/{/^$/d; p}
/PATTERN/,/^$/ matches lines from PATTERN to the first empty line.
{/^$/d; p} deletes (d) the lines matching ^$ and prints (p) the rest.
{/./p} just prints the lines that have at least one character.
With awk you can use:
awk '!NF{f=0} /PATTERN/ {f=1} f' file
As with sed, this prints more than needed if PATTERN appears on several lines. To handle that, exit once the empty line is found:
awk 'f && !NF{exit} /PATTERN/ {f=1} f' file
Explanation
!NF{f=0} if there are no fields (that is, line is empty), unset the flag f.
/PATTERN/ {f=1} if PATTERN is found, set the flag f.
f if flag f is set, this is True, so it performs the default awk behaviour: print the line.
Test (in the file a, an empty line follows bbb)
$ cat a
aa
bb
hello
aaaaaa
bbb
ttt
$ awk '!NF{f=0} /hello/ {f=1} f' a
hello
aaaaaa
bbb
$ sed -n '/hello/,/^$/{/./p}' a
hello
aaaaaa
bbb
Using sed:
sed -n '/PATTERN/,/^$/{/^$/q;p;}' file
Using regex range, you define your range from the PATTERN to blank line (/^$/). When you encounter a blank line, you quit else you keep printing.
Using awk:
awk '/PATTERN/{p=1}/^$/&&p{exit}p' file
You enable a flag when you encounter your PATTERN. When you reach a blank line and flag is enabled, you exit. If not, you keep printing.
Another alternative, suggested by devnull in the comments, is to use pcregrep:
pcregrep -M 'PATTERN(.|\n)*?(?=\n\n)' file
I think this is a nice, readable Perl one-liner:
perl -wne '$f=1 if /Pattern A/; exit if /^\s*$/; print if $f' file
Set the flag $f when the pattern is matched
Exit if a blank line (only whitespace between start and end of line) is found; note that this also exits on a blank line that comes before the pattern
Print the line if the flag is set
Testing it out (in this file, an empty line follows 6):
$ cat file
1
2
Pattern A
3
4
5
6
7
8
9
$ perl -wne '$f=1 if /Pattern A/; exit if /^$/; print if $f' file
Pattern A
3
4
5
6
Alternatively, based on the suggestion by @jaypal, you could do this:
perl -lne '/Pattern A/ .. 1 and !/^$/ ? print : exit' file
Rather than using a flag $f, the range operator .. takes care of this for you. It evaluates to true when "Pattern A" is found on the line and remains true indefinitely. When it is true, the other part will be evaluated and will print until a blank line is found.
Avoid the range pattern
/foo/,/bar/
in awk unless all you need is every block from a "foo" line through the next "bar" line: it makes trivial jobs marginally briefer, but even slightly more interesting requirements then require a complete rewrite.
Just use:
/foo/{f=1} f{print; if (/bar/) f=0}
or similar instead.
In this case the awk solution is:
awk '/pattern/{f=1} f{print; if (!NF) exit}' file
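To see why exiting at the first empty line matters when the pattern occurs more than once, here is a small sketch on made-up data. The file contents and the pattern hello are assumptions for illustration.

```shell
# Hypothetical sample with two "hello" blocks, each ended by an empty line.
tmp=$(mktemp)
printf '%s\n' 'hello' 'a' '' 'skip' 'hello' 'b' '' 'tail' > "$tmp"

# Range without quitting: every hello..empty-line block is printed.
all_blocks=$(sed -n '/hello/,/^$/{/./p;}' "$tmp")

# Flag plus exit: only the first block is printed, and reading stops early.
first_only=$(awk '/hello/{f=1} f{ if (!NF) exit; print }' "$tmp")

printf 'all blocks:\n%s\nfirst only:\n%s\n' "$all_blocks" "$first_only"
rm -f "$tmp"
```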

How to get the part of a file after the first line that matches a regular expression

I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.
That is:
cat file | grep 'TERMINATE' # It is found on line 534
So, I want the file from line 535 to line 1000 for further processing.
How can I do that?
The following will print the line matching TERMINATE till the end of the file:
sed -n -e '/TERMINATE/,$p'
Explained: -n disables sed's default behavior of printing each line after executing its script on it, -e introduces a script, /TERMINATE/,$ is an address (line) range selecting from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($), and p is the print command, which prints the current line.
This will print from the line that follows the line matching TERMINATE till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)
sed -e '1,/TERMINATE/d'
Explained: 1,/TERMINATE/ is an address (line) range selecting from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command, which deletes the current line and skips to the next line. Since sed prints lines by default, it prints the lines after TERMINATE up to the end of the input. Note that the end pattern is only checked from line 2 onwards, so if TERMINATE is on line 1 the range extends to the next match.
If you want the lines before TERMINATE:
sed -e '/TERMINATE/,$d'
And if you want both lines before and after TERMINATE in two different files in a single pass:
sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file
The before and after files will both contain the line with TERMINATE, so to process each without it you need to use:
head -n -1 before
tail -n +2 after
If you do not want to hard-code the filenames in the sed script, you can:
before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file
But then you have to escape the $ that means the last line, so the shell will not try to expand $w as a variable (note that we now use double quotes around the script instead of single quotes).
Note that the newline after each filename in the script is important, so that sed knows where the filename ends.
How would you replace the hardcoded TERMINATE by a variable?
You would make a variable for the matching text and then do it the same way as the previous example:
matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file
To use a variable for the matching text with the previous examples:
## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"
## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"
## Print all the lines before the line containing the matching text:
## (from line-1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"
The important points about replacing text with variables in these cases are:
Variables ($variablename) enclosed in single quotes ['] won't "expand" but variables inside double quotes ["] will. So, you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
The sed ranges also contain a $ and are immediately followed by a letter like: $p, $d, $w. They will also look like variables to be expanded, so you have to escape those $ characters with a backslash [\] like: \$p, \$d, \$w.
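The quoting rules above can be checked with a short, self-contained sketch. The sample file is made up; the three sed commands are the ones listed above, with the shell variable expanded inside double quotes and the sed $ escaped.

```shell
# Hypothetical five-line sample file.
tmp=$(mktemp)
printf '%s\n' 'one' 'two' 'TERMINATE' 'three' 'four' > "$tmp"
matchtext=TERMINATE

from_match=$(sed -n "/$matchtext/,\$p" "$tmp")  # matching line through EOF
after_match=$(sed "1,/$matchtext/d" "$tmp")     # only the lines after the match
before_match=$(sed "/$matchtext/,\$d" "$tmp")   # only the lines before the match

printf 'from:\n%s\nafter:\n%s\nbefore:\n%s\n' \
  "$from_match" "$after_match" "$before_match"
rm -f "$tmp"
```

This sketch assumes the variable holds plain text; a value containing / or regex metacharacters would need escaping before being placed in the sed script.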
As a simple approximation you could use
grep -A100000 TERMINATE file
which greps for TERMINATE and outputs up to 100,000 lines following that line.
From the man page:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
A tool to use here is AWK:
cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1} {if (found) print }'
How does this work:
We set the variable 'found' to zero, evaluating false
if a match for 'TERMINATE' is found with the regular expression, we set it to one.
If our 'found' variable evaluates to True, print :)
The other solutions might consume a lot of memory if you use them on very large files.
If I understand your question correctly you do want the lines after TERMINATE, not including the TERMINATE-line. AWK can do this in a simple way:
awk '{if(found) print} /TERMINATE/{found=1}' your_file
Explanation:
Although not best practice, you can rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
After the printing is done, we check if this is the starter-line (that should not be included).
This will print all lines after the TERMINATE-line.
Generalization:
You have a file with start- and end-lines and you want the lines between those lines excluding the start- and end-lines.
start- and end-lines could be defined by a regular expression matching the line.
Example:
$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$
Explanation:
If the end-line is found no printing should be done. Note that this check is done before the actual printing to exclude the end-line from the result.
Print the current line if found is set.
If the start-line is found then set found=1 so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.
Notes:
The code relies on the fact that all AWK variables default to 0 or the empty string if not defined. This is valid, but it may not be best practice, so you could add a BEGIN{found=0} to the start of the AWK expression.
If multiple start-end-blocks are found, they are all printed.
grep -A 10000000 'TERMINATE' file
can be much faster than sed, especially on a really big file. It works up to 10M lines (or whatever number you put in), so there isn't any harm in making this big enough to handle about anything you hit.
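Use Bash parameter expansion like the following. Note that this reads the whole file into a variable and prints everything after the first occurrence of TERMINATE, starting with the rest of that line: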
content=$(cat file)
echo "${content#*TERMINATE}"
There are many ways to do it with sed or awk:
sed -n '/TERMINATE/,$p' file
This looks for TERMINATE in your file and prints from that line up to the end of the file.
awk '/TERMINATE/,0' file
This is exactly the same behaviour as sed.
In case you know the number of the line from which you want to start printing, you can specify it together with NR (number of record, which eventually indicates the number of the line):
awk 'NR>=535' file
Example
$ seq 10 > a #generate a file with one number per line, from 1 to 10
$ sed -n '/7/,$p' a
7
8
9
10
$ awk '/7/,0' a
7
8
9
10
$ awk 'NR>=7' a
7
8
9
10
If for any reason, you want to avoid using sed, the following will print the line matching TERMINATE till the end of the file:
tail -n "+$(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)" file
And the following will print from the line after the first line matching TERMINATE till the end of the file:
tail -n "+$(($(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)+1))" file
It takes several processes to do what sed can do in one process, and if the file changes between the execution of grep and tail, the result can be incoherent, so I recommend using sed. Moreover, if the file does not contain TERMINATE, the first command fails.
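If the grep and tail combination is used anyway, the missing-match case can be guarded explicitly. A minimal sketch, assuming made-up file contents; the variable names line and rest are illustrative.

```shell
# Hypothetical sample file.
tmp=$(mktemp)
printf '%s\n' 'a' 'TERMINATE' 'b' 'c' > "$tmp"

# Line number of the first match; empty if there is no match.
line=$(grep -n 'TERMINATE' "$tmp" | head -n 1 | cut -d: -f1)

if [ -n "$line" ]; then
  rest=$(tail -n "+$((line + 1))" "$tmp")  # everything after the matching line
else
  rest=''                                  # no match: print nothing
fi
printf '%s\n' "$rest"
rm -f "$tmp"
```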
Alternatives to the excellent sed answer by jfg956, and which don't include the matching line:
awk '/TERMINATE/ {y=1;next} y' (Hai Vu's answer to 'grep +A': print everything after a match)
awk '/TERMINATE/ ? c++ : c' (Steven Penny's answer to 'grep +A': print everything after a match)
perl -ne 'print unless 1 .. /TERMINATE/' (tchrist's answer to 'grep +A': print everything after a match)
This could be one way of doing it. If you know in what line of the file you have your grep word and how many lines you have in your file:
grep -A466 'TERMINATE' file
sed is a much better tool for the job:
sed -n '/re/,$p' file
where re is a regular expression.
Another option is grep's --after-context flag. You need to pass in the number of lines to print; running wc -l on the file gives a value large enough to reach the end of the file. Combine this with -n (to show line numbers) and your match expression.
This will print all lines from the last line containing TERMINATE till the end of the file:
LINE_NUMBER=$(grep -n TERMINATE "$YOUR_FILE_NAME" | tail -n 1 | cut -d: -f1)
tail -n "+$LINE_NUMBER" "$YOUR_FILE_NAME"

sed 's/this/that/' -- ignoring g but still replace entire file

As the title says, I'm trying to change only the first occurrence of a word, using
sed 's/this/that/' file.txt
Even though I'm not using the g option, it replaces throughout the entire file. How can I fix this?
UPDATE:
$ cat file.txt
first line
this
this
this
this
$ sed -e '1s/this/that/;t' file.txt
first line
this // ------> I want to change only this "this" to "that" :)
this
this
this
http://www.faqs.org/faqs/editor-faq/sed/
4.3. How do I change only the first occurrence of a pattern?
sed -e '1s/LHS/RHS/;t' -e '1,/LHS/s//RHS/'
Where LHS=this and RHS=that for your example.
If you know the pattern won't occur on the first line, omit the first -e and the statement following it.
Without the g flag, sed applies the substitution to the first occurrence on every line of the file; with the g flag, it applies it to all occurrences on each line.
e.g.
$ cat file.txt
first line
this this
this
this
this
$ sed 's/this/that/' file.txt
first line
that this
that
that
that
$ sed 's/this/that/g' file.txt
first line
that that <-- Both occurrences of "this" have changed
that
that
that
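To change only the first occurrence in the whole file, the FAQ command above can be tried on the question's sample data. The awk line is an additional illustrative alternative, not taken from the answers; its flag name done is an assumption.

```shell
# Sample data modeled on the question's file.txt.
tmp=$(mktemp)
printf '%s\n' 'first line' 'this' 'this' 'this' 'this' > "$tmp"

# sed FAQ approach: handle line 1 separately, then the range 1,/this/.
by_sed=$(sed -e '1s/this/that/;t' -e '1,/this/s//that/' "$tmp")

# awk alternative: substitute once, then remember that it has been done.
by_awk=$(awk '!done && sub(/this/, "that") { done = 1 } 1' "$tmp")

printf 'sed:\n%s\nawk:\n%s\n' "$by_sed" "$by_awk"
rm -f "$tmp"
```

Both commands leave the later lines untouched, because the substitution is limited to the part of the file up to and including the first match.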

Resources