How to search the content of a file using sed - shell

I tried several commands and read a few tutorials, but none helped. I'm using gnu sed 4.2.2.
I just want to search in a file for:
func1(int a)
{
I don't know if it is the fact of a newline followed by "{" but it just doesn't work.
A correct command would help and with an explanation even more. Thanks!
Note: After "int a)" I typed enter, but I think stackoverflow doesn't put a newline. I don't want confusion, I just want to search func1(int a)'newline'{.

sed -n '/func1/{N;/func1(int a)\n{/p}'
Explanation:
sed -n '/func1/{ # look for a line with "func1"
N # append the next line
/func1(int a)\n{/p # try to match the whole pattern "func1(int a)\n{"
# and print if matched
}'
With grep it would be for example:
grep -Pzo 'func1\(int a\)\n{'
but notice that thanks to -z the input to grep will be one large "line" that includes the newline characters too (that is unless the input contains null characters). Parenthesis have to be escaped in this case because it is a Perl regular expression (-P). -o makes grep print just the matched pattern.

Related

Append text to top of file using sed doesn't work for variable whose content has "/" [duplicate]

This question already has answers here:
Using different delimiters in sed commands and range addresses
(3 answers)
Closed 1 year ago.
I have a Visual Studio project, which is developed locally. Code files have to be deployed to a remote server. The only problem is the URLs they contain, which are hard-coded.
The project contains URLs such as ?page=one. For the link to be valid on the server, it must be /page/one .
I've decided to replace all URLs in my code files with sed before deployment, but I'm stuck on slashes.
I know this is not a pretty solution, but it's simple and would save me a lot of time. The total number of strings I have to replace is fewer than 10. A total number of files which have to be checked is ~30.
An example describing my situation is below:
The command I'm using:
sed -f replace.txt < a.txt > b.txt
replace.txt which contains all the strings:
s/?page=one&/pageone/g
s/?page=two&/pagetwo/g
s/?page=three&/pagethree/g
a.txt:
?page=one&
?page=two&
?page=three&
Content of b.txt after I run my sed command:
pageone
pagetwo
pagethree
What I want b.txt to contain:
/page/one
/page/two
/page/three
The easiest way would be to use a different delimiter in your search/replace lines, e.g.:
s:?page=one&:pageone:g
You can use any character as a delimiter that's not part of either string. Or, you could escape it with a backslash:
s/\//foo/
Which would replace / with foo. You'd want to use the escaped backslash in cases where you don't know what characters might occur in the replacement strings (if they are shell variables, for example).
The s command can use any character as a delimiter; whatever character comes after the s is used. I was brought up to use a #. Like so:
s#?page=one&#/page/one#g
A very useful but lesser-known fact about sed is that the familiar s/foo/bar/ command can use any punctuation, not only slashes. A common alternative is s#foo#bar#, from which it becomes obvious how to solve your problem.
add \ before special characters:
s/\?page=one&/page\/one\//g
etc.
In a system I am developing, the string to be replaced by sed is input text from a user which is stored in a variable and passed to sed.
As noted earlier on this post, if the string contained within the sed command block contains the actual delimiter used by sed - then sed terminates on syntax error. Consider the following example:
This works:
$ VALUE=12345
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345
This breaks:
$ VALUE=12345/6
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
sed: -e expression #1, char 21: unknown option to `s'
Replacing the default delimiter is not a robust solution in my case as I did not want to limit the user from entering specific characters used by sed as the delimiter (e.g. "/").
However, escaping any occurrences of the delimiter in the input string would solve the problem.
Consider the below solution of systematically escaping the delimiter character in the input string before having it parsed by sed.
Such escaping can be implemented as a replacement using sed itself, this replacement is safe even if the input string contains the delimiter - this is since the input string is not part of the sed command block:
$ VALUE=$(echo ${VALUE} | sed -e "s#/#\\\/#g")
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345/6
I have converted this to a function to be used by various scripts:
escapeForwardSlashes() {
# Validate parameters
if [ -z "$1" ]
then
echo -e "Error - no parameter specified!"
return 1
fi
# Perform replacement
echo ${1} | sed -e "s#/#\\\/#g"
return 0
}
this line should work for your 3 examples:
sed -r 's#\?(page)=([^&]*)&#/\1/\2#g' a.txt
I used -r to save some escaping .
the line should be generic for your one, two three case. you don't have to do the sub 3 times
test with your example (a.txt):
kent$ echo "?page=one&
?page=two&
?page=three&"|sed -r 's#\?(page)=([^&]*)&#/\1/\2#g'
/page/one
/page/two
/page/three
replace.txt should be
s/?page=/\/page\//g
s/&//g
please see this article
http://netjunky.net/sed-replace-path-with-slash-separators/
Just using | instead of /
Great answer from Anonymous. \ solved my problem when I tried to escape quotes in HTML strings.
So if you use sed to return some HTML templates (on a server), use double backslash instead of single:
var htmlTemplate = "<div style=\\"color:green;\\"></div>";
A simplier alternative is using AWK as on this answer:
awk '$0="prefix"$0' file > new_file
You may use an alternative regex delimiter as a search pattern by backs lashing it:
sed '\,{some_path},d'
For the s command:
sed 's,{some_path},{other_path},'

ignore spaces within/around brackets to count occurrences

(to LaTeX users) I want to search for manually labeled items
(to whom it may concern) script file on GitHub
I tried to find solution, but what I've found suggested to remove spaces first. In my case, I think there should be simpler solution. It could be using grep or awk or some other tool.
Consider the following lines:
\item[a)] some text
\item [i) ] any text
\item[ i)] foo and faa
\item [ 1) ] foo again
I want to find (or count) if there are items with a single ) inside brackets. The format could have blank spaces inside the brackets and/or around it. Also, the char before the closing parentheses could be any letter or number.
Edit: I tried grep "\[a)\]" but it missed [ a) ].
Since there are many possible ways to write an item, I can not decide about a possible pattern. I think that it is enough for me such as
\item<blank spaces>[<blank spaces><letter or number>)<blank spaces>]
Replace blank space could not work because the patter above in general contains text around it (for example: \item[ a)] consider the function...)
The output should indicate is there are such patterns or not. It could be zero or the number of occurrences.
So to do it all in the grep itself:
grep -c -E '\\item\s*\[\s*\w+\)\s*\]' file.txt
Note all the \s* checks for spaces. Also -c to get the count.
Breaking it down:
\\ a backslash (needs escape in grep)
item "item"
\s* optional whitespaces
\[ "[" (needs escape in -E)
\s* optional whitespaces
\w+ at least one 'word' char
\) ")" (needs escape in -E)
\s* optional whitespaces
\] "]" (needs escape in -E)
Following awk may also help here(I am simply removing the spaces between [ to ] and then looking for pattern of either digit or character in it.
awk '
match($0,/\[.*\]/){
val=substr($0,RSTART+1,RLENGTH-1);
gsub(/[[:space:]]+/,"",val);
if(val ~ /[a-z0-9]+\)/){ count++ }
}
END{
print count
}' Input_file
So I am thinking something like this:
tr -d " \t" < file.txt | grep -c '\\item\[[0-9A-Za-z])\]'
This will count the number of matches for you.
Edit: Added \t to tr call. Now removes all spaces and tabs.
Here is a grep only version. This could be useful for printing out all of the matches (by removing -c) as well since the above version modifies the input:
grep -c '\\item *\[ *[0-9A-Za-z]) *\]' file.txt
Here is a more versatile answer if this is what you looking for. Here, we output the matches to a file and count the lines from the file to get the number of matches...
grep '\\item *\[ *[0-9A-Za-z]) *\]' file.txt > matches.txt
wc -l < matches.txt

Bash script output text between first match and 2nd match only [duplicate]

I'm trying to use sed to clean up lines of URLs to extract just the domain.
So from:
http://www.suepearson.co.uk/product/174/71/3816/
I want:
http://www.suepearson.co.uk/
(either with or without the trailing slash, it doesn't matter)
I have tried:
sed 's|\(http:\/\/.*?\/\).*|\1|'
and (escaping the non-greedy quantifier)
sed 's|\(http:\/\/.*\?\/\).*|\1|'
but I can not seem to get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.
Neither basic nor extended Posix/GNU regex recognizes the non-greedy quantifier; you need a later regex. Fortunately, Perl regex for this context is pretty easy to get:
perl -pe 's|(http://.*?/).*|\1|'
In this specific case, you can get the job done without using a non-greedy regex.
Try this non-greedy regex [^/]* instead of .*?:
sed 's|\(http://[^/]*/\).*|\1|g'
With sed, I usually implement non-greedy search by searching for anything except the separator until the separator :
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'
Output:
http://www.suon.co.uk
this is:
don't output -n
search, match pattern, replace and print s/<pattern>/<replace>/p
use ; search command separator instead of / to make it easier to type so s;<pattern>;<replace>;p
remember match between brackets \( ... \), later accessible with \1,\2...
match http://
followed by anything in brackets [], [ab/] would mean either a or b or /
first ^ in [] means not, so followed by anything but the thing in the []
so [^/] means anything except / character
* is to repeat previous group so [^/]* means characters except /.
so far sed -n 's;\(http://[^/]*\) means search and remember http://followed by any characters except / and remember what you've found
we want to search untill the end of domain so stop on the next / so add another / at the end: sed -n 's;\(http://[^/]*\)/' but we want to match the rest of the line after the domain so add .*
now the match remembered in group 1 (\1) is the domain so replace matched line with stuff saved in group \1 and print: sed -n 's;\(http://[^/]*\)/.*;\1;p'
If you want to include backslash after the domain as well, then add one more backslash in the group to remember:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'
output:
http://www.suon.co.uk/
Simulating lazy (un-greedy) quantifier in sed
And all other regex flavors!
Finding first occurrence of an expression:
POSIX ERE (using -r option)
Regex:
(EXPRESSION).*|.
Sed:
sed -r ‍'s/(EXPRESSION).*|./\1/g' # Global `g` modifier should be on
Example (finding first sequence of digits) Live demo:
$ sed -r 's/([0-9]+).*|./\1/g' <<< 'foo 12 bar 34'
12
How does it work?
This regex benefits from an alternation |. At each position engine tries to pick the longest match (this is a POSIX standard which is followed by couple of other engines as well) which means it goes with . until a match is found for ([0-9]+).*. But order is important too.
Since global flag is set, engine tries to continue matching character by character up to the end of input string or our target. As soon as the first and only capturing group of left side of alternation is matched (EXPRESSION) rest of line is consumed immediately as well .*. We now hold our value in the first capturing group.
POSIX BRE
Regex:
\(\(\(EXPRESSION\).*\)*.\)*
Sed:
sed 's/\(\(\(EXPRESSION\).*\)*.\)*/\3/'
Example (finding first sequence of digits):
$ sed 's/\(\(\([0-9]\{1,\}\).*\)*.\)*/\3/' <<< 'foo 12 bar 34'
12
This one is like ERE version but with no alternation involved. That's all. At each single position engine tries to match a digit.
If it is found, other following digits are consumed and captured and the rest of line is matched immediately otherwise since * means
more or zero it skips over second capturing group \(\([0-9]\{1,\}\).*\)* and arrives at a dot . to match a single character and this process continues.
Finding first occurrence of a delimited expression:
This approach will match the very first occurrence of a string that is delimited. We can call it a block of string.
sed 's/\(END-DELIMITER-EXPRESSION\).*/\1/; \
s/\(\(START-DELIMITER-EXPRESSION.*\)*.\)*/\1/g'
Input string:
foobar start block #1 end barfoo start block #2 end
-EDE: end
-SDE: start
$ sed 's/\(end\).*/\1/; s/\(\(start.*\)*.\)*/\1/g'
Output:
start block #1 end
First regex \(end\).* matches and captures first end delimiter end and substitues all match with recent captured characters which
is the end delimiter. At this stage our output is: foobar start block #1 end.
Then the result is passed to second regex \(\(start.*\)*.\)* that is same as POSIX BRE version above. It matches a single character
if start delimiter start is not matched otherwise it matches and captures the start delimiter and matches the rest of characters.
Directly answering your question
Using approach #2 (delimited expression) you should select two appropriate expressions:
EDE: [^:/]\/
SDE: http:
Usage:
$ sed 's/\([^:/]\/\).*/\1/g; s/\(\(http:.*\)*.\)*/\1/' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
Output:
http://www.suepearson.co.uk/
Note: this will not work with identical delimiters.
sed does not support "non greedy" operator.
You have to use "[]" operator to exclude "/" from match.
sed 's,\(http://[^/]*\)/.*,\1,'
P.S. there is no need to backslash "/".
sed - non greedy matching by Christoph Sieghart
The trick to get non greedy matching in sed is to match all characters excluding the one that terminates the match. I know, a no-brainer, but I wasted precious minutes on it and shell scripts should be, after all, quick and easy. So in case somebody else might need it:
Greedy matching
% echo "<b>foo</b>bar" | sed 's/<.*>//g'
bar
Non greedy matching
% echo "<b>foo</b>bar" | sed 's/<[^>]*>//g'
foobar
Non-greedy solution for more than a single character
This thread is really old but I assume people still needs it.
Lets say you want to kill everything till the very first occurrence of HELLO. You cannot say [^HELLO]...
So a nice solution involves two steps, assuming that you can spare a unique word that you are not expecting in the input, say top_sekrit.
In this case we can:
s/HELLO/top_sekrit/ #will only replace the very first occurrence
s/.*top_sekrit// #kill everything till end of the first HELLO
Of course, with a simpler input you could use a smaller word, or maybe even a single character.
HTH!
This can be done using cut:
echo "http://www.suepearson.co.uk/product/174/71/3816/" | cut -d'/' -f1-3
another way, not using regex, is to use fields/delimiter method eg
string="http://www.suepearson.co.uk/product/174/71/3816/"
echo $string | awk -F"/" '{print $1,$2,$3}' OFS="/"
sed certainly has its place but this not not one of them !
As Dee has pointed out: Just use cut. It is far simpler and much more safe in this case. Here's an example where we extract various components from the URL using Bash syntax:
url="http://www.suepearson.co.uk/product/174/71/3816/"
protocol=$(echo "$url" | cut -d':' -f1)
host=$(echo "$url" | cut -d'/' -f3)
urlhost=$(echo "$url" | cut -d'/' -f1-3)
urlpath=$(echo "$url" | cut -d'/' -f4-)
gives you:
protocol = "http"
host = "www.suepearson.co.uk"
urlhost = "http://www.suepearson.co.uk"
urlpath = "product/174/71/3816/"
As you can see this is a lot more flexible approach.
(all credit to Dee)
sed 's|(http:\/\/[^\/]+\/).*|\1|'
There is still hope to solve this using pure (GNU) sed. Despite this is not a generic solution in some cases you can use "loops" to eliminate all the unnecessary parts of the string like this:
sed -r -e ":loop" -e 's|(http://.+)/.*|\1|' -e "t loop"
-r: Use extended regex (for + and unescaped parenthesis)
":loop": Define a new label named "loop"
-e: add commands to sed
"t loop": Jump back to label "loop" if there was a successful substitution
The only problem here is it will also cut the last separator character ('/'), but if you really need it you can still simply put it back after the "loop" finished, just append this additional command at the end of the previous command line:
-e "s,$,/,"
sed -E interprets regular expressions as extended (modern) regular expressions
Update: -E on MacOS X, -r in GNU sed.
Because you specifically stated you're trying to use sed (instead of perl, cut, etc.), try grouping. This circumvents the non-greedy identifier potentially not being recognized. The first group is the protocol (i.e. 'http://', 'https://', 'tcp://', etc). The second group is the domain:
echo "http://www.suon.co.uk/product/1/7/3/" | sed "s|^\(.*//\)\([^/]*\).*$|\1\2|"
If you're not familiar with grouping, start here.
I realize this is an old entry, but someone may find it useful.
As the full domain name may not exceed a total length of 253 characters replace .* with .\{1, 255\}
This is how to robustly do non-greedy matching of multi-character strings using sed. Lets say you want to change every foo...bar to <foo...bar> so for example this input:
$ cat file
ABC foo DEF bar GHI foo KLM bar NOP foo QRS bar TUV
should become this output:
ABC <foo DEF bar> GHI <foo KLM bar> NOP <foo QRS bar> TUV
To do that you convert foo and bar to individual characters and then use the negation of those characters between them:
$ sed 's/#/#A/g; s/{/#B/g; s/}/#C/g; s/foo/{/g; s/bar/}/g; s/{[^{}]*}/<&>/g; s/}/bar/g; s/{/foo/g; s/#C/}/g; s/#B/{/g; s/#A/#/g' file
ABC <foo DEF bar> GHI <foo KLM bar> NOP <foo QRS bar> TUV
In the above:
s/#/#A/g; s/{/#B/g; s/}/#C/g is converting { and } to placeholder strings that cannot exist in the input so those chars then are available to convert foo and bar to.
s/foo/{/g; s/bar/}/g is converting foo and bar to { and } respectively
s/{[^{}]*}/<&>/g is performing the op we want - converting foo...bar to <foo...bar>
s/}/bar/g; s/{/foo/g is converting { and } back to foo and bar.
s/#C/}/g; s/#B/{/g; s/#A/#/g is converting the placeholder strings back to their original characters.
Note that the above does not rely on any particular string not being present in the input as it manufactures such strings in the first step, nor does it care which occurrence of any particular regexp you want to match since you can use {[^{}]*} as many times as necessary in the expression to isolate the actual match you want and/or with seds numeric match operator, e.g. to only replace the 2nd occurrence:
$ sed 's/#/#A/g; s/{/#B/g; s/}/#C/g; s/foo/{/g; s/bar/}/g; s/{[^{}]*}/<&>/2; s/}/bar/g; s/{/foo/g; s/#C/}/g; s/#B/{/g; s/#A/#/g' file
ABC foo DEF bar GHI <foo KLM bar> NOP foo QRS bar TUV
Have not yet seen this answer, so here's how you can do this with vi or vim:
vi -c '%s/\(http:\/\/.\{-}\/\).*/\1/ge | wq' file &>/dev/null
This runs the vi :%s substitution globally (the trailing g), refrains from raising an error if the pattern is not found (e), then saves the resulting changes to disk and quits. The &>/dev/null prevents the GUI from briefly flashing on screen, which can be annoying.
I like using vi sometimes for super complicated regexes, because (1) perl is dead dying, (2) vim has a very advanced regex engine, and (3) I'm already intimately familiar with vi regexes in my day-to-day usage editing documents.
Since PCRE is also tagged here, we could use GNU grep by using non-lazy match in regex .*? which will match first nearest match opposite of .*(which is really greedy and goes till last occurrence of match).
grep -oP '^http[s]?:\/\/.*?/' Input_file
Explanation: using grep's oP options here where -P is responsible for enabling PCRE regex here. In main program of grep mentioning regex which is matching starting http/https followed by :// till next occurrence of / since we have used .*? it will look for first / after (http/https://). It will print matched part only in line.
echo "/home/one/two/three/myfile.txt" | sed 's|\(.*\)/.*|\1|'
don bother, i got it on another forum :)
sed 's|\(http:\/\/www\.[a-z.0-9]*\/\).*|\1| works too
Here is something you can do with a two step approach and awk:
A=http://www.suepearson.co.uk/product/174/71/3816/
echo $A|awk '
{
var=gensub(///,"||",3,$0) ;
sub(/\|\|.*/,"",var);
print var
}'
Output:
http://www.suepearson.co.uk
Hope that helps!
Another sed version:
sed 's|/[:alnum:].*||' file.txt
It matches / followed by an alphanumeric character (so not another forward slash) as well as the rest of characters till the end of the line. Afterwards it replaces it with nothing (ie. deletes it.)
#Daniel H (concerning your comment on andcoz' answer, although long time ago): deleting trailing zeros works with
s,([[:digit:]]\.[[:digit:]]*[1-9])[0]*$,\1,g
it's about clearly defining the matching conditions ...
You should also think about the case where there is no matching delims. Do you want to output the line or not. My examples here do not output anything if there is no match.
You need prefix up to 3rd /, so select two times string of any length not containing / and following / and then string of any length not containing / and then match / following any string and then print selection. This idea works with any single char delims.
echo http://www.suepearson.co.uk/product/174/71/3816/ | \
sed -nr 's,(([^/]*/){2}[^/]*)/.*,\1,p'
Using sed commands you can do fast prefix dropping or delim selection, like:
echo 'aaa #cee: { "foo":" #cee: " }' | \
sed -r 't x;s/ #cee: /\n/;D;:x'
This is lot faster than eating char at a time.
Jump to label if successful match previously. Add \n at / before 1st delim. Remove up to first \n. If \n was added, jump to end and print.
If there is start and end delims, it is just easy to remove end delims until you reach the nth-2 element you want and then do D trick, remove after end delim, jump to delete if no match, remove before start delim and and print. This only works if start/end delims occur in pairs.
echo 'foobar start block #1 end barfoo start block #2 end bazfoo start block #3 end goo start block #4 end faa' | \
sed -r 't x;s/end//;s/end/\n/;D;:x;s/(end).*/\1/;T y;s/.*(start)/\1/;p;:y;d'
If you have access to gnu grep, then can utilize perl regex:
grep -Po '^https?://([^/]+)(?=)' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
http://www.suepearson.co.uk
Alternatively, to get everything after the domain use
grep -Po '^https?://([^/]+)\K.*' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
/product/174/71/3816/
The following solution works for matching / working with multiply present (chained; tandem; compound) HTML or other tags. For example, I wanted to edit HTML code to remove <span> tags, that appeared in tandem.
Issue: regular sed regex expressions greedily matched over all the tags from the first to the last.
Solution: non-greedy pattern matching (per discussions elsewhere in this thread; e.g. https://stackoverflow.com/a/46719361/1904943).
Example:
echo '<span>Will</span>This <span>remove</span>will <span>this.</span>remain.' | \
sed 's/<span>[^>]*>//g' ; echo
This will remain.
Explanation:
s/<span> : find <span>
[^>] : followed by anything that is not >
*> : until you find >
//g : replace any such strings present with nothing.
Addendum
I was trying to clean up URLs, but I was running into difficulty matching / excluding a word - href - using the approach above. I briefly looked at negative lookarounds (Regular expression to match a line that doesn't contain a word) but that approach seemed overly complex and did not provide a satisfactory solution.
I decided to replace href with ` (backtick), do the regex substitutions, then replace ` with href.
Example (formatted here for readability):
printf '\n
<a aaa h href="apple">apple</a>
<a bbb "c=ccc" href="banana">banana</a>
<a class="gtm-content-click"
data-vars-link-text="nope"
data-vars-click-url="https://blablabla"
data-vars-event-category="story"
data-vars-sub-category="story"
data-vars-item="in_content_link"
data-vars-link-text
href="https:example.com">Example.com</a>\n\n' |
sed 's/href/`/g ;
s/<a[^`]*`/\n<a href/g'
apple
banana
Example.com
Explanation: basically as above. Here,
s/href/` : replace href with ` (backtick)
s/<a : find start of URL
[^`] : followed by anything that is not ` (backtick)
*` : until you find a `
/<a href/g : replace each of those found with <a href
Unfortunately, as mentioned, this it is not supported in sed.
To overcome this, I suggest to use the next best thing(actually better even), to use vim sed-like capabilities.
define in .bash-profile
vimdo() { vim $2 --not-a-term -c "$1" -es +"w >> /dev/stdout" -cq! ; }
That will create headless vim to execute a command.
Now you can do for example:
echo $PATH | vimdo "%s_\c:[a-zA-Z0-9\\/]\{-}python[a-zA-Z0-9\\/]\{-}:__g" -
to filter out python in $PATH.
Use - to have input from pipe in vimdo.
While most of the syntax is the same. Vim features more advanced features, and using \{-} is standard for non-greedy match. see help regexp.

How to remove all lines in a file containing a variable, only when located on a line somewhere between braces in BASH?

I am trying to remove all of the matches of $word from a file, but only on lines where $word is placed somewhere within { and } which also appear on the same line, e.g.:
{The cat liked} the fish.
The mouse {did not like} the cat.
The {cat did not} like the spider.
If $word is set to "cat", then lines 1 and 3 are deleted, because "cat" appears between the { and }. If $word is set to "like", then lines 1 and 2 are deleted, because this search term appears on those lines between the { and }. Line 3 is not deleted, because like appears outside of the braces.
The braces are never nested.
The braces never appear split across lines.
I have tried various things, but these all returned errors:
sed -i "/\{*$word*\}/d" ./file.txt
sed -i "/\{.*$word.*\}/d" ./file.txt
sed -i "/\{(.*)$word(.*)\}/d" ./file.txt
How can I remove all of the lines in a file containing a variable, but only when the found variable was on a line and found between two braces?
sed -i "/{.*$word.*}/d" ./file.txt
\{ in sed actually have a special meaning, not the literal {, you should just write a { to represent the literal character. (which would be confusing if you are well familiar with perl regex ...)
Edit:
Be careful with -i, if this is in a script, and accidently $word is not defined or set to empty string, this command will delete all lines containing { no matter what between }.
I would take the answer that #cybeliak gave a little further. If you really want to match cat and not, say scat, then you need to delimit your expression with word boundaries:
sed '/{.*[[:<:]]'$word'[[:>:]].*}/d'
Note - I prefer to use ' ' style quotes to prevent any unintended side-effects...
As an aside, I am a big fan of not using the -i flag. Pipe the result into a different file and confirm for yourself that it's good, before deleting the original.
Much easier to do with awk:
awk -v s="cat" -F '[{}]' '!($2 ~ s)' file
The mouse {did not like} the cat.
awk -v s="like" -F '[{}]' '!($2 ~ s)' file
The {cat did not} like the spider.
This might work for you (GNU sed):
sed -i '/{[^}]*'"$word"'[^}]*}/d' file
N.B. $wordshould not contain } or /.

Grep characters before and after match?

Using this:
grep -A1 -B1 "test_pattern" file
will produce one line before and after the matched pattern in the file. Is there a way to display not lines but a specified number of characters?
The lines in my file are pretty big so I am not interested in printing the entire line but rather only observe the match in context. Any suggestions on how to do this?
3 characters before and 4 characters after
$> echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}'
23_string_and
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
This will match up to 5 characters before and after your pattern. The -o switch tells grep to only show the match and -E to use an extended regular expression. Make sure to put the quotes around your expression, else it might be interpreted by the shell.
You could use
awk '/test_pattern/ {
match($0, /test_pattern/); print substr($0, RSTART - 10, RLENGTH + 20);
}' file
You mean, like this:
grep -o '.\{0,20\}test_pattern.\{0,20\}' file
?
That will print up to twenty characters on either side of test_pattern. The \{0,20\} notation is like *, but specifies zero to twenty repetitions instead of zero or more.The -o says to show only the match itself, rather than the entire line.
I'll never easily remember these cryptic command modifiers so I took the top answer and turned it into a function in my ~/.bashrc file:
cgrep() {
# For files that are arrays 10's of thousands of characters print.
# Use cpgrep to print 30 characters before and after search pattern.
if [ $# -eq 2 ] ; then
# Format was 'cgrep "search string" /path/to/filename'
grep -o -P ".{0,30}$1.{0,30}" "$2"
else
# Format was 'cat /path/to/filename | cgrep "search string"
grep -o -P ".{0,30}$1.{0,30}"
fi
} # cgrep()
Here's what it looks like in action:
$ ll /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
-rw-r--r-- 1 rick rick 25780 Jul 3 19:05 /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
$ cat /tmp/rick/scp.Mf7UdS/Mf7UdS.Source | cgrep "Link to iconic"
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
$ cgrep "Link to iconic" /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
The file in question is one continuous 25K line and it is hopeless to find what you are looking for using regular grep.
Notice the two different ways you can call cgrep that parallels grep method.
There is a "niftier" way of creating the function where "$2" is only passed when set which would save 4 lines of code. I don't have it handy though. Something like ${parm2} $parm2. If I find it I'll revise the function and this answer.
With gawk , you can use match function:
x="hey there how are you"
echo "$x" |awk --re-interval '{match($0,/(.{4})how(.{4})/,a);print a[1],a[2]}'
ere are
If you are ok with perl, more flexible solution : Following will print three characters before the pattern followed by actual pattern and then 5 character after the pattern.
echo hey there how are you |perl -lne 'print "$1$2$3" if /(.{3})(there)(.{5})/'
ey there how
This can also be applied to words instead of just characters.Following will print one word before the actual matching string.
echo hey there how are you |perl -lne 'print $1 if /(\w+) there/'
hey
Following will print one word after the pattern:
echo hey there how are you |perl -lne 'print $2 if /(\w+) there (\w+)/'
how
Following will print one word before the pattern , then the actual word and then one word after the pattern:
echo hey there how are you |perl -lne 'print "$1$2$3" if /(\w+)( there )(\w+)/'
hey there how
If using ripgreg this is how you would do it:
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
You can use regexp grep for finding + second grep for highlight
echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}' | grep string
23_string_and
With ugrep you can specify -ABC context with option -o (--only-matching) to show the match with extra characters of context before and/or after the match, fitting the match plus the context within the specified -ABC width. For example:
ugrep -o -C30 pattern testfile.txt
gives:
1: ... long line with an example pattern to match. The line could...
2: ...nother example line with a pattern.
The same on a terminal with color highlighting gives:
Multiple matches on a line are either shown with [+nnn more]:
or with option -k (--column-number) to show each individually with context and the column number:
The context width is the number of Unicode characters displayed (UTF-8/16/32), not just ASCII.
I personally do something similar to the posted answers.. but since the dot key, like any keyboard key, can be tapped or held down.. and I often don't need a lot of context(if I needed more I might do the lines like grep -C but often like you I don't want lines before and after), so I find it much quicker for entering the command, to just tap the dot key for how many dots / how many characters, if it's a few then tapping the key, or hold it down for more.
e.g. echo zzzabczzzz | grep -o '.abc..'
Will have the abc pattern with one dot before and two after. ( in regex language, Dot matches any character). Others used dot too but with curly braces to specify repetition.
If I wanted to be strict re between (0 or x) characters and exactly y characters, then i'd use the curlies.. and -P, as others have done.
There is a setting re whether dot matches new line but you can look into that if it's a concern/interest.

Resources