Print all characters up to a matching pattern from a file


Maybe a silly question, but I have a text file and I need to display everything up to the first match of a pattern, which is a '/'. (No line contains blank spaces.)
Example.txt:
somename/for/example/
something/as/another/example
thisfile/dir/dir/example
Preferred output:
somename
something
thisfile
I know this grep code will display everything after a matching pattern:
grep -o '/[^\n]*' '/my/file.txt'
So is there any way to do the complete opposite, maybe rm everything after matching pattern or invert to display my preferred output?
Thanks.

If you're calling an external command like grep anyway, you can get the results you require with the sed command, for example:
echo "something/as/another/example" | sed 's:/.*::'
something
Instead of focusing on what you want to keep, think about what you want to remove, in this case everything after the first '/' char. This is what this sed command does.
The leading s means substitute, and :/.*: is the pattern to match (the colons are the delimiters), with /.* meaning match the first '/' char and all characters after it. The second half of the sed command is the replacement. With ::, this means replace with nothing.
The traditional idiom for sed is to use s/str/rep/, using / chars to delimit the search from the replacement, but you can use any character you want after the initial s (substitute) command.
POSIX sed accepts any character other than backslash or newline as the s delimiter, so s:/.*:: should work in any compliant sed. The backslash prefix form (as in \:/foo:d) is only needed when a custom delimiter is used in an address, not in the s command.
IHTH.
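A quick way to convince yourself that the delimiter is interchangeable is to run the same substitution with three different delimiters over the sample data. The function names below are made up for illustration; only the sed expressions come from the answer.

```shell
#!/usr/bin/env bash
# The same substitution written with three different delimiters.
# In each case the regex is /.* (the first "/" and everything after it)
# and the replacement is empty.
strip_colon() { sed 's:/.*::'; }
strip_pipe()  { sed 's|/.*||'; }
strip_slash() { sed 's/\/.*//'; }   # with "/" as delimiter, a literal "/" must be escaped

printf '%s\n' somename/for/example/ something/as/another/example thisfile/dir/dir/example | strip_colon
```

All three print somename, something and thisfile for the sample lines; pick whichever delimiter avoids escaping in your pattern.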

You can use a much simpler regexp:
/[^/]*/
The forward slash after the caret is the character you're matching up to.

Assuming the file is named "file.txt":
cut -d "/" -f 1 file.txt
Here, we are cutting the input line with "/" as the delimiter (-d "/"). Then we select the first field (-f 1).

You just need to include the start-of-line anchor ^ and put the / in a negated character class:
grep -o '^[^/]*' file
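If you'd rather not start an external process at all, bash's own parameter expansion can strip everything from the first / onward. This is a sketch: first_component is a made-up name, and for large files sed, cut or grep will be much faster than a read loop.

```shell
#!/usr/bin/env bash
# first_component (made-up name): print each input line up to, but not
# including, the first "/". ${line%%/*} deletes the longest suffix that
# matches the glob "/*", i.e. the first "/" and everything after it.
first_component() {
  local line
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "${line%%/*}"
  done
}

printf '%s\n' somename/for/example/ something/as/another/example thisfile/dir/dir/example | first_component
```

A line that contains no / at all is printed unchanged, which matches what the sed and cut versions do.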

Related

grep for a variable content with a dot

I found many similar questions about my issue, but I still haven't found the right one for me.
I need to grep for the content of a variable plus a dot, but it doesn't work when I escape the dot after the variable. For example:
The file content is
item.
newitem.
My variable contains item. and I want to grep for the exact word, so I think I must use -w and not -F, but with this command I can't get the correct output:
cat file | grep -w "$variable\."
Do you have suggestions please?
Hi, I need to correct my scenario. My file contains some FQDNs and, for various reasons, I have to look for the hostname followed by the dot.
Unfortunately, grep -wF doesn't work:
My file is
hostname1.domain.com
hostname2.domain.com
and the command
cat file | grep -wF hostname1.
doesn't show any output. I have to find another solution and I'm not sure that grep could help.

If $variable contains item., you're searching for item.\. which is not what you want. In fact, you want -F which interprets the pattern literally, not as a regular expression.
var=item.
echo $'item.\nnewitem.' | grep -F "$var"

Try:
grep "\b$word\."
\b: word boundary
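\.: a literal dot; since . is not a word character, it also marks the end of the word, so no closing \b is needed (here $word should hold the name without the trailing dot, e.g. item)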

The following awk solution may help you with this:
awk -v var="item." '$0==var' Input_file
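For the FQDN case from the question's update, awk can also do a literal prefix match without any escaping by using index(). This is a variation on the awk answer above rather than part of it; note that awk -v still processes backslash escapes in the value, so it assumes the prefix contains no backslashes.

```shell
#!/usr/bin/env bash
# Print lines that START WITH the literal string in $prefix.
# index() does a plain substring search, so "." needs no escaping;
# "== 1" means the substring must begin at the first character.
prefix='hostname1.'
printf '%s\n' hostname1.domain.com hostname10.domain.com hostname2.domain.com |
  awk -v p="$prefix" 'index($0, p) == 1'
```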

You are expanding the variable and appending \. to it, which results in calling
cat file | grep -w "item.\."
Since grep accepts files as arguments, calling grep "item\." file should do.

From man grep:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
and
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [[:alnum:]] and \W is a synonym for [^[:alnum:]].
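As the last character of the pattern is a ., with -w the match must be followed by a non-word character (anything other than [A-Za-z0-9_]); however, the next character is d, so the line does not match. Instead, use
grep '\<hostname1\.'
which should work, as \< only ensures that the previous character is not a word constituent.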
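The difference between -w and \< can be checked on the two sample hostnames. This assumes GNU grep, where \< behaves as described in the man page excerpt; other grep implementations may differ.

```shell
#!/usr/bin/env bash
# The two lines from the question's update.
hosts=$'hostname1.domain.com\nhostname2.domain.com'

# -w: after "hostname1." comes "d", a word character, so the match is not a
# whole word and nothing is printed.
grep -w 'hostname1\.' <<< "$hosts" || echo 'no match with -w'

# \< only anchors the START of the match to a word boundary,
# so this prints hostname1.domain.com.
grep '\<hostname1\.' <<< "$hosts"
```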

You can dynamically construct the search pattern and then call grep
rexp='^hostname1\.'
grep "$rexp" file.txt
The single quotes stop bash from interpreting special characters in the value being assigned. The double quotes let bash replace $rexp with its value while still passing it to grep as a single argument. The caret (^) anchors the match, so grep looks only for lines starting with 'hostname1.'
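When the search string really has to come from a variable, one option is to escape its regex metacharacters before splicing it into the pattern. bre_escape below is a hypothetical helper name, and its character list assumes a POSIX basic regular expression as used by plain grep (no -E).

```shell
#!/usr/bin/env bash
# bre_escape (hypothetical helper): put a backslash in front of every character
# that is special in a POSIX basic regular expression: ] [ \ . * ^ $
bre_escape() {
  printf '%s\n' "$1" | sed 's/[][\.*^$]/\\&/g'
}

var='hostname1.'
pattern="^$(bre_escape "$var")"   # becomes ^hostname1\.
printf '%s\n' hostname1.domain.com hostname10.domain.com hostname1xdomain.com |
  grep -- "$pattern"              # prints only hostname1.domain.com
```

For a plain literal search, grep -F (see the next question) avoids the escaping step entirely; escaping is only worth it when the literal has to be combined with regex parts such as the ^ anchor.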

Grepping for exact string while ignoring regex for dot character

So here's my issue. I need to develop a small bash script that can grep a file containing account names (let's call it file.txt). The contents would be something like this:
accounttest
account2
account
accountbtest
account.test
Matching an exact line SHOULD be easy but apparently it's really not.
I tried:
grep "^account$" file.txt
The output is:
account
So in this situation the output is OK, only "account" is displayed.
But if I try:
grep "^account.test$" file.txt
The output is:
accountbtest
account.test
So the next obvious solution that comes to mind, in order to stop interpreting the dot character as "any character", is using fgrep, right?
fgrep account.test file.txt
The output, as expected, is correct this time:
account.test
But what if I try now:
fgrep account file.txt
Output:
accounttest
account2
account
accountbtest
account.test
This time the output is completely wrong, because I can't use the beginning/end line characters with fgrep.
So my question is, how can I properly grep a whole line, including the beginning and end of line special characters, while also matching exactly the "." character?
EDIT: Please note that I do know that the "." character needs to be escaped, but in my situation, escaping is not an option, because of further processing that needs to be done to the account name, which would make things too complicated.

The . is a special character in regex notation, so it needs to be escaped to match it literally when passing it to grep:
grep "^account\.test$" file.txt
Or, if you cannot modify the search string, use grep's -F flag to treat it as a literal string with no regex processing, combined with -x to match the whole line:
grep -Fx 'account.test' file.txt
From man grep
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is like parenthesizing the pattern and then surrounding it with ^ and $.

fgrep is the same as grep -F. grep also has the -x option which matches against whole lines only. You can combine these to get what you want:
grep -Fx account.test file.txt
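The cases from the question can be replayed against a temporary copy of the sample file. The file name comes from mktemp; everything else uses only the flags quoted from the man page.

```shell
#!/usr/bin/env bash
# Recreate the sample file from the question in a temporary file.
file=$(mktemp)
printf '%s\n' accounttest account2 account accountbtest account.test > "$file"

grep -F  'account'      "$file"   # substring match: prints all five lines
grep -Fx 'account'      "$file"   # whole-line literal match: prints only "account"
grep -Fx 'account.test' "$file"   # the dot is literal: prints only "account.test"

rm -f "$file"
```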

Use sed to count periods, commas, and numbers?

I have a file that looks like this:
19.217.179.33,175.176.12.8
253.149.205.57,174.210.221.195
222.118.178.218,255.99.100.202
241.55.199.243,167.98.204.104
38.224.198.117,21.11.184.68
Each line is 2 IP addresses, separated by a comma. So, each line should meet these requirements:
Has 1 comma.
Has 6 periods.
Has ONLY numbers, commas, and periods.
If a line is missing a period, has more or fewer than one comma, has a letter, is blank, or anything like that, it isn't correct. Basically I just want to use sed or something similar to loop through each line in the file and make sure each of them meets the above requirements.
Is this something that can be done with sed? I know you can use it to delete lines that do/don't match a pattern, but I wasn't sure about counting specific characters or verifying that a line only has certain characters.
Any help would be greatly appreciated. Thanks!

I think grep is a better tool for this. You just want to ensure that each line matches a particular regex, so invert the grep with -v and label the input invalid if any line gets output. Something like:
grep -qvE '^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$' input || echo input is valid
You can simplify that a bit:
IP='([0-9]{1,3}\.){3}[0-9]{1,3}'
grep -qvE "^$IP,$IP$" input || echo input is valid
Or if you are more interested in invalid data:
grep -qvE "^$IP,$IP$" input && echo input is invalid

What I'd do is think up a regular expression that fits the 'proper' lines, and omit them from printing. Like this:
sed -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/d' file
Everything that remains is a wrong line.
Here's the recipe in more detail:
[0-9]{1,3} between one and three digits
\. literal period (an unescaped period is a wildcard and matches any character)
(...){3} three repetitions of something, so together
([0-9]{1,3}\.){3}[0-9]{1,3} makes up something that looks like an IP address. (Though note that it doesn't enforce the <256 rule, so 999.999.999.999 matches.)
/^ ... $/ the match needs to start at the beginning of the line and run until its end.
'/ ... /d' delete the lines that match what's inside the two slashes, so everything else is printed
-r is needed to recognise the {1,3} syntax.
This will find and print the lines that are wrong. If you want to delete the wrong lines, you can easily invert this:
sed -i.bak -n -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/p' file
-i.bak means keep a backup, but overwrite the input file
-n means don't output anything unless expressly directed to output, and
/ ... /p output all the lines that match this regex.
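As noted in the breakdown, the regex accepts octets such as 999. If the 0-255 range matters, awk (a different tool from the sed and grep answers above) can check each octet numerically. A minimal sketch, with validate_ip_pairs as a hypothetical function name:

```shell
#!/usr/bin/env bash
# validate_ip_pairs: print every line that is NOT "IPv4,IPv4" (prefixed with its
# line number) and exit with status 1 if any such line exists, 0 otherwise.
# Unlike the regex versions, each octet is also checked to be in the range 0-255.
validate_ip_pairs() {
  awk -F, '
    # Return 1 if s is a dotted-quad IPv4 address, 0 otherwise.
    function valid_ip(s,    parts, n, i) {
      n = split(s, parts, /\./)
      if (n != 4) return 0
      for (i = 1; i <= 4; i++) {
        if (parts[i] !~ /^[0-9]+$/) return 0   # digits only, rejects empty octets
        if (length(parts[i]) > 3) return 0
        if (parts[i] + 0 > 255) return 0
      }
      return 1
    }
    NF != 2 || !valid_ip($1) || !valid_ip($2) { print NR ": " $0; bad = 1 }
    END { exit bad }
  ' "$@"
}

if validate_ip_pairs <<'EOF'
19.217.179.33,175.176.12.8
253.149.205.57,174.210.221.195
EOF
then echo "input is valid"; else echo "input is invalid"; fi
```

Like the sed -r '/.../d' version it prints the offending lines (blank lines included), and like the grep -q versions its exit status can drive an if.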

If you would like to display only information about file contents correctness , you can use this command:
sed -n -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/!{a \
FILE IS INCORRECT
;q;};$aFILE IS OK'
It's a modified version of @chw21's answer, but it displays only an informational message:
FILE IS INCORRECT, or
FILE IS OK.

Bash script output text between first match and 2nd match only [duplicate]

I'm trying to use sed to clean up lines of URLs to extract just the domain.
So from:
http://www.suepearson.co.uk/product/174/71/3816/
I want:
http://www.suepearson.co.uk/
(either with or without the trailing slash, it doesn't matter)
I have tried:
sed 's|\(http:\/\/.*?\/\).*|\1|'
and (escaping the non-greedy quantifier)
sed 's|\(http:\/\/.*\?\/\).*|\1|'
but I can not seem to get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.

Neither basic nor extended POSIX/GNU regex recognizes the non-greedy quantifier; you need a later regex flavor. Fortunately, Perl regex for this context is pretty easy to get:
perl -pe 's|(http://.*?/).*|$1|'

In this specific case, you can get the job done without using a non-greedy regex.
Instead of .*?, use the negated character class [^/]*; it is still greedy, but it cannot run past a /:
sed 's|\(http://[^/]*/\).*|\1|g'
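The difference is easy to see side by side on the URL from the question: the greedy version captures everything, while the negated class stops at the first / after the host.

```shell
#!/usr/bin/env bash
url='http://www.suepearson.co.uk/product/174/71/3816/'

# Greedy: .*/ runs to the LAST "/", so the group captures the whole URL
# and the output is unchanged.
echo "$url" | sed 's|\(http://.*/\).*|\1|'

# Negated class: [^/]* cannot cross a "/", so the group ends at the first
# "/" after the host. Prints http://www.suepearson.co.uk/
echo "$url" | sed 's|\(http://[^/]*/\).*|\1|'
```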

With sed, I usually implement a non-greedy search by searching for anything except the separator, up to the separator:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'
Output:
http://www.suon.co.uk
Breakdown:
-n: don't print anything by default
s/<pattern>/<replace>/p: search for the pattern, replace it, and print the result
use ; as the s command separator instead of / to make it easier to type, so s;<pattern>;<replace>;p
\( ... \) remembers the match between the brackets, later accessible as \1, \2, ...
match http://
followed by anything in brackets []; [ab/] would mean either a or b or /
a leading ^ inside [] means not, so it matches anything but the characters in the []
so [^/] means any character except /
* repeats the previous item zero or more times, so [^/]* means a run of characters other than /
so far sed -n 's;\(http://[^/]*\) means: search for and remember http:// followed by any characters except /
we want to search until the end of the domain, so stop at the next / by adding another / at the end: sed -n 's;\(http://[^/]*\)/', but we also want to match the rest of the line after the domain, so add .*
now the match remembered in group 1 (\1) is the domain, so replace the matched line with what was saved in group \1 and print it: sed -n 's;\(http://[^/]*\)/.*;\1;p'
If you want to include the slash after the domain as well, add one more slash inside the group to remember:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'
output:
http://www.suon.co.uk/

Simulating lazy (un-greedy) quantifier in sed
And all other regex flavors!
Finding first occurrence of an expression:
POSIX ERE (using -r option)
Regex:
(EXPRESSION).*|.
Sed:
sed -r 's/(EXPRESSION).*|./\1/g' # Global `g` modifier should be on
Example (finding the first sequence of digits):
$ sed -r 's/([0-9]+).*|./\1/g' <<< 'foo 12 bar 34'
12
How does it work?
This regex benefits from an alternation |. At each position the engine tries to pick the longest match (this is a POSIX requirement that a couple of other engines follow as well), which means it goes with . until a match is found for ([0-9]+).*. But order is important too.
Since the global flag is set, the engine keeps matching character by character up to the end of the input string or our target. As soon as the first and only capturing group on the left side of the alternation, (EXPRESSION), is matched, the rest of the line is consumed immediately by .*. We now hold our value in the first capturing group.
POSIX BRE
Regex:
\(\(\(EXPRESSION\).*\)*.\)*
Sed:
sed 's/\(\(\(EXPRESSION\).*\)*.\)*/\3/'
Example (finding first sequence of digits):
$ sed 's/\(\(\([0-9]\{1,\}\).*\)*.\)*/\3/' <<< 'foo 12 bar 34'
12
This one is like the ERE version but with no alternation involved. That's all. At each single position the engine tries to match a digit. If one is found, the following digits are consumed and captured and the rest of the line is matched immediately; otherwise, since * means zero or more, it skips over the second capturing group \(\([0-9]\{1,\}\).*\)* and arrives at a dot . to match a single character, and this process continues.
Finding first occurrence of a delimited expression:
This approach will match the very first occurrence of a string that is delimited. We can call it a block of string.
sed 's/\(END-DELIMITER-EXPRESSION\).*/\1/; \
s/\(\(START-DELIMITER-EXPRESSION.*\)*.\)*/\1/g'
Input string:
foobar start block #1 end barfoo start block #2 end
-EDE: end
-SDE: start
$ sed 's/\(end\).*/\1/; s/\(\(start.*\)*.\)*/\1/g' <<< 'foobar start block #1 end barfoo start block #2 end'
Output:
start block #1 end
The first regex \(end\).* matches and captures the first end delimiter, end, and substitutes the whole match with the captured characters, which is the end delimiter. At this stage our output is: foobar start block #1 end.
Then the result is passed to the second regex \(\(start.*\)*.\)*, which is the same as the POSIX BRE version above. It matches a single character if the start delimiter start is not matched; otherwise it matches and captures the start delimiter and matches the rest of the characters.
Directly answering your question
Using approach #2 (delimited expression) you should select two appropriate expressions:
EDE: [^:/]\/
SDE: http:
Usage:
$ sed 's/\([^:/]\/\).*/\1/g; s/\(\(http:.*\)*.\)*/\1/' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
Output:
http://www.suepearson.co.uk/
Note: this will not work with identical delimiters.

sed does not support a non-greedy operator.
You have to use a [] bracket expression to exclude "/" from the match.
sed 's,\(http://[^/]*\)/.*,\1,'
P.S. there is no need to backslash-escape "/" here, because , is the delimiter.

sed - non greedy matching by Christoph Sieghart
The trick to get non greedy matching in sed is to match all characters excluding the one that terminates the match. I know, a no-brainer, but I wasted precious minutes on it and shell scripts should be, after all, quick and easy. So in case somebody else might need it:
Greedy matching
% echo "<b>foo</b>bar" | sed 's/<.*>//g'
bar
Non greedy matching
% echo "<b>foo</b>bar" | sed 's/<[^>]*>//g'
foobar

Non-greedy solution for more than a single character
This thread is really old, but I assume people still need it.
Let's say you want to kill everything up to the very first occurrence of HELLO. You cannot say [^HELLO]...
So a nice solution involves two steps, assuming that you can spare a unique word that you are not expecting in the input, say top_sekrit.
In this case we can:
s/HELLO/top_sekrit/ #will only replace the very first occurrence
s/.*top_sekrit// #kill everything till end of the first HELLO
Of course, with a simpler input you could use a smaller word, or maybe even a single character.
HTH!
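The two steps can be combined into a single sed invocation. This sketch keeps the answer's placeholder top_sekrit, which is assumed never to occur in the input; cut_through_first is a made-up name.

```shell
#!/usr/bin/env bash
# cut_through_first (made-up name): delete everything up to and including
# the first HELLO. Assumes the placeholder top_sekrit never occurs in the input.
cut_through_first() {
  # 1) without the g flag, only the first HELLO is replaced
  # 2) .* is greedy, but there is now only one top_sekrit for it to run up to
  sed 's/HELLO/top_sekrit/; s/.*top_sekrit//'
}

echo 'xxHELLOyyHELLOzz' | cut_through_first   # prints yyHELLOzz
```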

This can be done using cut:
echo "http://www.suepearson.co.uk/product/174/71/3816/" | cut -d'/' -f1-3

Another way, not using regex, is to use the fields/delimiter method, e.g.
string="http://www.suepearson.co.uk/product/174/71/3816/"
echo "$string" | awk -F"/" '{print $1,$2,$3}' OFS="/"

sed certainly has its place, but this is not one of them!
As Dee has pointed out: just use cut. It is far simpler and much safer in this case. Here's an example where we extract various components from the URL using Bash syntax:
url="http://www.suepearson.co.uk/product/174/71/3816/"
protocol=$(echo "$url" | cut -d':' -f1)
host=$(echo "$url" | cut -d'/' -f3)
urlhost=$(echo "$url" | cut -d'/' -f1-3)
urlpath=$(echo "$url" | cut -d'/' -f4-)
gives you:
protocol = "http"
host = "www.suepearson.co.uk"
urlhost = "http://www.suepearson.co.uk"
urlpath = "product/174/71/3816/"
As you can see, this is a much more flexible approach.
(all credit to Dee)

sed -E 's|(http://[^/]+/).*|\1|'

There is still hope of solving this using pure (GNU) sed. Although this is not a generic solution, in some cases you can use "loops" to eliminate all the unnecessary parts of the string like this:
sed -r -e ":loop" -e 's|(http://.+)/.*|\1|' -e "t loop"
-r: Use extended regex (for + and unescaped parenthesis)
":loop": Define a new label named "loop"
-e: add commands to sed
"t loop": Jump back to label "loop" if there was a successful substitution
The only problem here is it will also cut the last separator character ('/'), but if you really need it you can still simply put it back after the "loop" finished, just append this additional command at the end of the previous command line:
-e "s,$,/,"
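Put together, the loop plus the extra command might look like this (GNU sed, because of -r); the URL is the one from the question.

```shell
#!/usr/bin/env bash
url='http://www.suepearson.co.uk/product/174/71/3816/'

# GNU sed: each pass removes the last "/segment"; "t loop" repeats while the
# substitution keeps succeeding, then s,$,/, appends one trailing slash.
echo "$url" | sed -r -e ':loop' -e 's|(http://.+)/.*|\1|' -e 't loop' -e 's,$,/,'
# prints http://www.suepearson.co.uk/
```

Each pass runs the whole substitution again, so this is slower than a single [^/]* match on long paths; its appeal is that the regex itself stays simple.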

sed -E interprets regular expressions as extended (modern) regular expressions.
Update: -E on macOS, -r in GNU sed (recent GNU sed versions also accept -E).

Because you specifically stated you're trying to use sed (instead of perl, cut, etc.), try grouping. This sidesteps the non-greedy quantifier potentially not being recognized. The first group is the protocol (i.e. 'http://', 'https://', 'tcp://', etc). The second group is the domain:
echo "http://www.suon.co.uk/product/1/7/3/" | sed "s|^\(.*//\)\([^/]*\).*$|\1\2|"

I realize this is an old entry, but someone may find it useful.
As the full domain name may not exceed a total length of 253 characters, replace .* with .\{1,255\}

This is how to robustly do non-greedy matching of multi-character strings using sed. Let's say you want to change every foo...bar to <foo...bar>, so for example this input:
$ cat file
ABC foo DEF bar GHI foo KLM bar NOP foo QRS bar TUV
should become this output:
ABC <foo DEF bar> GHI <foo KLM bar> NOP <foo QRS bar> TUV
To do that you convert foo and bar to individual characters and then use the negation of those characters between them:
$ sed 's/#/#A/g; s/{/#B/g; s/}/#C/g; s/foo/{/g; s/bar/}/g; s/{[^{}]*}/<&>/g; s/}/bar/g; s/{/foo/g; s/#C/}/g; s/#B/{/g; s/#A/#/g' file
ABC <foo DEF bar> GHI <foo KLM bar> NOP <foo QRS bar> TUV
In the above:
s/#/#A/g; s/{/#B/g; s/}/#C/g is converting { and } to placeholder strings that cannot exist in the input so those chars then are available to convert foo and bar to.
s/foo/{/g; s/bar/}/g is converting foo and bar to { and } respectively
s/{[^{}]*}/<&>/g is performing the op we want - converting foo...bar to <foo...bar>
s/}/bar/g; s/{/foo/g is converting { and } back to foo and bar.
s/#C/}/g; s/#B/{/g; s/#A/#/g is converting the placeholder strings back to their original characters.
Note that the above does not rely on any particular string not being present in the input, as it manufactures such strings in the first step, nor does it care which occurrence of any particular regexp you want to match, since you can use {[^{}]*} as many times as necessary in the expression to isolate the actual match you want, and/or use sed's numeric match operator, e.g. to only replace the 2nd occurrence:
$ sed 's/#/#A/g; s/{/#B/g; s/}/#C/g; s/foo/{/g; s/bar/}/g; s/{[^{}]*}/<&>/2; s/}/bar/g; s/{/foo/g; s/#C/}/g; s/#B/{/g; s/#A/#/g' file
ABC foo DEF bar GHI <foo KLM bar> NOP foo QRS bar TUV

Have not yet seen this answer, so here's how you can do this with vi or vim:
vi -c '%s/\(http:\/\/.\{-}\/\).*/\1/ge | wq' file &>/dev/null
This runs the vi :%s substitution globally (the trailing g), refrains from raising an error if the pattern is not found (e), then saves the resulting changes to disk and quits. The &>/dev/null prevents the GUI from briefly flashing on screen, which can be annoying.
I like using vi sometimes for super complicated regexes, because (1) perl is dying, (2) vim has a very advanced regex engine, and (3) I'm already intimately familiar with vi regexes in my day-to-day usage editing documents.

Since PCRE is also tagged here, we could use GNU grep with the lazy (non-greedy) quantifier .*?, which stops at the nearest match, unlike .* (which is greedy and runs to the last occurrence).
grep -oP '^http[s]?:\/\/.*?/' Input_file
Explanation: -o makes grep print only the matched part of the line, and -P enables PCRE. The regex matches http or https at the start of the line followed by ://, and then, because .*? is lazy, everything up to the first / after that.

echo "/home/one/two/three/myfile.txt" | sed 's|\(.*\)/.*|\1|'
Don't bother, I got it on another forum :)

sed 's|\(http:\/\/www\.[a-z.0-9]*\/\).*|\1|' works too

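Here is something you can do with a two-step approach and awk (gensub is a GNU awk extension, hence gawk):
A=http://www.suepearson.co.uk/product/174/71/3816/
echo "$A" | gawk '
{
var = gensub(/\//, "||", 3, $0)   # replace the 3rd "/" with "||"
sub(/\|\|.*/, "", var)            # drop "||" and everything after it
print var
}'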
Output:
http://www.suepearson.co.uk
Hope that helps!

Another sed version:
sed 's|\([^/]\)/[[:alnum:]].*|\1|' file.txt
It matches a / that is preceded by a character other than / and followed by an alphanumeric character (so the // after http: is skipped), as well as the rest of the characters until the end of the line. It then replaces all of that with the preceding character, i.e. it deletes everything from that / onward.

@Daniel H (concerning your comment on andcoz' answer, although it was a long time ago): deleting trailing zeros works with the following extended regex (sed -E):
s,([[:digit:]]\.[[:digit:]]*[1-9])[0]*$,\1,g
it's about clearly defining the matching conditions ...

You should also think about the case where there are no matching delimiters: do you want to output the line or not? My examples here do not output anything if there is no match.
You need the prefix up to the 3rd /, so match twice a string of any length not containing / followed by a /, then a string of any length not containing /, then a / followed by anything, and print the selection. This idea works with any single-character delimiter.
echo http://www.suepearson.co.uk/product/174/71/3816/ | \
sed -nr 's,(([^/]*/){2}[^/]*)/.*,\1,p'
Using sed commands you can do fast prefix dropping or delimiter selection, like:
echo 'aaa @cee: { "foo":" @cee: " }' | \
sed -r 't x;s/ @cee: /\n/;D;:x'
This is a lot faster than consuming one character at a time.
How it works: jump to label x if a substitution succeeded earlier. Replace the 1st delimiter with \n. D removes everything up to the first \n and restarts the cycle without reading new input; since a substitution has now succeeded, t x jumps to the end and the line is printed.
If there are start and end delimiters, it is easy to remove end delimiters until you reach the nth-2 element you want and then do the D trick: remove everything after the end delimiter, jump to delete if there is no match, remove everything before the start delimiter, and print. This only works if start/end delimiters occur in pairs.
echo 'foobar start block #1 end barfoo start block #2 end bazfoo start block #3 end goo start block #4 end faa' | \
sed -r 't x;s/end//;s/end/\n/;D;:x;s/(end).*/\1/;T y;s/.*(start)/\1/;p;:y;d'

If you have access to GNU grep, you can use Perl regex:
grep -Po '^https?://[^/]+' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
http://www.suepearson.co.uk
Alternatively, to get everything after the domain use
grep -Po '^https?://([^/]+)\K.*' <<< 'http://www.suepearson.co.uk/product/174/71/3816/'
/product/174/71/3816/

The following solution works for matching multiple adjacent (chained; tandem; compound) HTML or other tags. For example, I wanted to edit HTML code to remove <span> tags that appeared in tandem.
Issue: regular sed regex expressions greedily matched over all the tags from the first to the last.
Solution: non-greedy pattern matching (per discussions elsewhere in this thread; e.g. https://stackoverflow.com/a/46719361/1904943).
Example:
echo '<span>Will</span>This <span>remove</span>will <span>this.</span>remain.' | \
sed 's/<span>[^>]*>//g' ; echo
This will remain.
Explanation:
s/<span> : find <span>
[^>]* : followed by any run of characters that are not >
> : up to the next >
//g : replace every such string with nothing.
Addendum
I was trying to clean up URLs, but I was running into difficulty matching / excluding a word - href - using the approach above. I briefly looked at negative lookarounds (Regular expression to match a line that doesn't contain a word) but that approach seemed overly complex and did not provide a satisfactory solution.
I decided to replace href with ` (backtick), do the regex substitutions, then replace ` with href.
Example (formatted here for readability):
printf '\n
<a aaa h href="apple">apple</a>
<a bbb "c=ccc" href="banana">banana</a>
<a class="gtm-content-click"
data-vars-link-text="nope"
data-vars-click-url="https://blablabla"
data-vars-event-category="story"
data-vars-sub-category="story"
data-vars-item="in_content_link"
data-vars-link-text
href="https:example.com">Example.com</a>\n\n' |
sed 's/href/`/g ;
s/<a[^`]*`/\n<a href/g'
<a href="apple">apple</a>
<a href="banana">banana</a>
<a href="https:example.com">Example.com</a>
Explanation: basically as above. Here,
s/href/`/g : replace href with ` (backtick)
s/<a : find the start of a link
[^`]* : followed by any run of characters that are not ` (backtick)
` : up to the next `
/\n<a href/g : replace each of those found with a newline and <a href

Unfortunately, as mentioned, this is not supported in sed.
To overcome this, I suggest using the next best thing (actually even better): vim's sed-like capabilities.
Define in .bash_profile:
vimdo() { vim "$2" --not-a-term -c "$1" -es +"w >> /dev/stdout" -cq! ; }
This runs a headless vim to execute a command.
Now you can do for example:
echo $PATH | vimdo "%s_\c:[a-zA-Z0-9\\/]\{-}python[a-zA-Z0-9\\/]\{-}:__g" -
to filter out python in $PATH.
Use - to have input from pipe in vimdo.
Most of the syntax is the same, but Vim has more advanced features, and \{-} is its standard non-greedy quantifier; see :help regexp.

grep for a specific pattern in a file?

I have a file textFile.txt
abc_efg@qwe.asd
abc_aer@
@avret
afd_wer_asd@qweasd.zxcasd
wqe_a@qwea.cae
qwe.caer
I want to grep to get specific lines:
abc_efg@qwe.asd
afd_wer_asd@qweasd.zxcasd
wqe_a@qwea.cae
That is, the ones that have
[a-z]_[a-z]@[a-z].[a-z]
but the part before the @ can have any number of "_".
So far this is what I have:
grep "[a-z]_[a-z]@[a-z].[a-z]" textFile.txt
But I got only one line as the output:
wqe_a@qwea.cae
Is there a better way to do this? :)

You can simply add the _ inside the bracket expression, [a-z_], so the new command is:
grep "[a-z_]@[a-z].[a-z]" textFile.txt
or, if you want it to start with a character other than _, you can use
grep "[a-z][a-z_]@[a-z].[a-z]" textFile.txt

I would suggest keeping it simple by checking that exactly one @ is present in each line:
grep -E '^[^@]+@[^@]+$' file
abc_efg@qwe.asd
afd_wer_asd@qweasd.zxcasd
wqe_a@qwea.cae

The following selects lines that have at least one underscore followed by letters before the at-sign, and one or more letters followed by a literal period after the at-sign:
$ grep '_[a-z]\+@[a-z]\+\.' textFile.txt
abc_efg@qwe.asd
afd_wer_asd@qweasd.zxcasd
wqe_a@qwea.cae
Notes
An unescaped period matches any character. If you want to match a literal period, it must be escaped with a backslash: \.
Thus, @[a-z].[a-z] matches an at-sign, followed by a letter, followed by any character at all, followed by a letter.
[a-z] matches a single letter. Thus _[a-z]@ would match only if there was exactly one character between the underscore and the at-sign. To match one or more letters, use [a-z]\+.
@[a-z]\+\. will match an at-sign, followed by one or more letters, followed by a literal period character.

When you write [a-z], it only matches one character of that set. That's why you only get wqe_a@qwea.cae back from your grep call: there is only one character between the _ and the @.
To match more than one character, you can use + or *: + means one or more of the set and * means zero or more. Also, an unescaped . means any character.
So something like:
grep "[a-z]\+_[a-z]\+@[a-z]\+\.[a-z]\+" textFile.txt would work for this. There are shorter, less specific ways of doing this as well (as other answers have shown).
Note the escapes before the + signs and the . (in a basic regex, \+ is a GNU extension).

This regex should get all valid email addresses from a text file:
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" file
abc_efg@qwe.asd
afd_wer_asd@qweasd.zxcasd
wqe_a@qwea.cae
This greps for a pattern like text@text.some_more_text
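Going back to the rule stated in the question (lowercase words joined by underscores, then the separator, then letters, a dot, and letters), an anchored extended regex can express it directly. This sketch assumes the separator is an @ sign, as the answers' references to the at-sign suggest, and lowercase letters only.

```shell
#!/usr/bin/env bash
# One or more lowercase words joined by "_", then "@", then letters,
# a literal dot, and letters; anchored so the whole line must match.
pattern='^[a-z]+(_[a-z]+)+@[a-z]+\.[a-z]+$'

# The sample lines from the question, with "@" as the separator.
printf '%s\n' abc_efg@qwe.asd abc_aer@ @avret afd_wer_asd@qweasd.zxcasd wqe_a@qwea.cae qwe.caer |
  grep -E "$pattern"
```

As written, (_[a-z]+)+ requires at least one underscore; change the second + to * if names without any underscore should also match.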
