REGEX strategy to replace line breaks inside group - regexp-replace

I'm trying to make a replacement in one pass with a single regular expression, but I'm starting to think this is not possible. I'm using RegexBuddy, but I always get a catastrophic backtracking result and the expression cannot be evaluated.
For this text:
3 bla bla! !
4 yep yep! ?
FROM HERE
5 something randdom here!
6 perhaps some HTML there
TO HERE
7 what ever you like over here
8 and that's all folks!
I want to find a REGEX that replaces the line breaks by something else, let's say $$, but only on the section "from here" "to here". So basically the end result would be this:
3 bla bla! !
4 yep yep! ?
FROM HERE$$
$$
5 something randdom here!$$
$$
6 perhaps some HTML there$$
$$
TO HERE
7 what ever you like over here
8 and that's all folks!
I have this expression
((FROM HERE))((.*)(\n))+(TO HERE)
But I'm stuck so far trying to replace just the \n group by something else. I have done similar things in the past so I would say this should be possible in one go.
If this is not possible with a regex, I would simply write a C# console app that first reads the text into a string, replaces each \n with $$, and then writes it back. That shouldn't be too difficult.

If you are using .NET, one option could be
(?s)(?<=^FROM HERE\b.*?)\r?\n\r?\n(?=.*?^TO HERE\b)
(?s) Inline modifier, dot matches a newline
(?<=^FROM HERE\b.*?) Assert FROM HERE at the left at the start of the line
\r?\n\r?\n Match 2 newlines
(?=.*?^TO HERE\b) Assert TO HERE at the right at the start of the line
In the replacement, use the following. A literal $ has to be escaped as $$ in a .NET replacement string, so $$$$ produces $$:
\n$$$$\n
See a .NET regex demo and a C# demo.
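For engines without variable-length lookbehind, a two-step variant is possible: match the whole FROM HERE ... TO HERE block first, then replace the line breaks inside that match. The following is a minimal Python sketch of that idea, not a port of the .NET pattern above. The function name replace_breaks and the marker parameter are made up for illustration, and it replaces every line break in the block with the marker rather than keeping the newline.

```python
import re

# Lazily match from a line starting with FROM HERE up to the next line
# starting with TO HERE. DOTALL lets "." cross line breaks, MULTILINE
# makes "^" match at the start of every line.
BLOCK = re.compile(r"^FROM HERE\b.*?^TO HERE\b", re.MULTILINE | re.DOTALL)


def replace_breaks(text, marker="$$"):
    """Replace line breaks with `marker`, but only inside the block."""
    def fix_block(match):
        # A function is used as the replacement so that the marker is
        # inserted literally, whatever characters it contains.
        return re.sub(r"\r?\n", lambda _: marker, match.group(0))

    return BLOCK.sub(fix_block, text)


if __name__ == "__main__":
    sample = "3 bla\nFROM HERE\n5 x\n6 y\nTO HERE\n7 z\n"
    print(replace_breaks(sample))
```

Text outside the block is left untouched, because only the matched span is passed to the inner substitution.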

Related

How to write a hashtag matching regex

I have a problem writing a regex (in Ruby, but I don't think that changes anything) that selects all proper hashtags.
I used ( /(^|\s)(#+)(\w+)(\s|$)/ ), which doesn't work and I have no idea why.
In this example:
#start #middle #middle2 #middle3 bad#example #another#bad#example #end
it should mark #start, #middle, #middle2, #middle3 and #end.
Why doesn't my code work and how should a proper regex look?
As for why the original does not work, let's look at each part:
1. (^|\s) start of line or white space
2. (#+) one or more #
3. (\w+) one or more word characters (letters, digits or underscore)
4. (\s|$) white space or end of line
The main problem is a conflict between parts 1 and 4. The white space that follows a hashtag is consumed by part 4 of that match, so it is no longer available to part 1 when the next hashtag is tried. That hashtag therefore fails to match and the search moves on to the next possible position.
Part 4 is not really needed, since part 3 will never match white space.
So here is the result
(?:^|\s)#(\w+)
https://regex101.com/r/iU4dZ3/3
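The overlap problem described above can be reproduced outside Ruby. Here is a small Python sketch (the helper names are made up for illustration) that runs the original pattern on the example string and compares it with a lookaround variant. The lookaround variant is an assumption of this sketch, not one of the patterns posted in this thread: it checks the surrounding white space without consuming it, so it also rejects #another in #another#bad#example.

```python
import re

TEXT = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# The pattern from the question: the trailing (\s|$) consumes the space
# that the next match would need for its leading (^|\s).
ORIGINAL = re.compile(r"(^|\s)(#+)(\w+)(\s|$)")

# Lookaround variant: "not preceded by a non-space" and "not followed by
# a non-space". Neither assertion consumes any characters.
LOOKAROUND = re.compile(r"(?<!\S)#\w+(?!\S)")


def original_matches(text):
    """Hashtags found by the question's pattern (groups 2 and 3 joined)."""
    return [m.group(2) + m.group(3) for m in ORIGINAL.finditer(text)]


def lookaround_matches(text):
    """Hashtags delimited by white space or the string boundaries."""
    return LOOKAROUND.findall(text)


if __name__ == "__main__":
    print(original_matches(TEXT))    # ['#start', '#middle2', '#end']
    print(lookaround_matches(TEXT))  # ['#start', '#middle', '#middle2', '#middle3', '#end']
```

Note that ^ and $ are line anchors in Ruby but string anchors in Python unless re.MULTILINE is set. For a single-line input like this one the difference does not matter.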
Does [^#\w](#[\w]*)|^(#[\w]*) work?
It finds a # that does not follow a word character or another #, and captures everything up to the first non-word character.
The alternative after the | handles the case where the first character of the text is a #.
Live demo: http://regexr.com/3al01
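How does this work for you?
(#[^\s+]+)
This says: find a hash sign, then take everything up to the next whitespace character (the + inside the brackets is a literal plus sign, so a + also ends the match).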
One more regex:
\B#\w+\b
This one doesn't capture whitespaces...
https://regex101.com/r/iU4dZ3/4

How to escape period in ed

I'm studying the ed text editor.
To exit from input mode, a user should enter a line containing a single period (.).
Let's say I want to enter the period as text.
I thought of a workaround: first, I insert something like "..". Then I replace ".." with ".".
But my approach is a little unwieldy. Is there a better way to do this?
Reading through the C source for GNU ed(1), there is no escape character. On the occasions that I've wanted to do this, I tend to add a blank line and then use a quick substitution:
a↵
↵
.↵
s/^/.↵
Or you can add a placeholder character and then substitute it (which, if you're playing ed(1) golf, is one character more than the above):
a↵
x↵
.↵
s/./.↵
I didn't find a magic escape sequence.
It seems it doesn't exist.
But this link offers two solutions. The first is the one I described in my question. The second is closer to a solution with an escape:
r ! echo .

Block Indent Regex

I'm having problems with a regexp.
I'm trying to write a regex that selects just the tab-indented blocks, but I can't find a way to make it work:
Example:
INDENT(1)
INDENT(2)
CONTENT(a)
CONTENT(b)
INDENT(3)
CONTENT(c)
So I need blocks like:
INDENT(2)
CONTENT(a)
CONTENT(b)
AND
INDENT(3)
CONTENT(c)
How can I do this?
Thanks a lot, that's almost it. Here is my original need:
table
	tr
		td
			"joao"
			"joao"
		td
			"marcos"
I need the "td" blocks separated. Could I adapt your example to that?
It depends on exactly what you are trying to do, but maybe something like this:
^(\t+)(\S.*)\n(?:\1\t.*\n)*
Working example: http://www.rubular.com/r/qj3WSWK9JR
The pattern searches for:
^(\t+)(\S.*)\n - a line that begins with a tab (I've also captured the first line in a group, just to see the effect), followed by
(?:\1\t.*\n)* - lines with more tabs.
Similarly, you can use ^( +)(\S.*)\n(?:\1 .*\n)* for spaces (example). Mixing spaces and tabs may be a little problematic though.
For the updated question, consider using ^(\t{2,})(\S.*)\n(?:\1\t.*\n)*, for at least 2 tabs at the beginning of the line.
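As a rough check of the pattern above, here is a Python sketch that applies it to the table example. The indentation of the sample (td at two tabs, its content at three) is an assumption, since the original whitespace is not visible here, and the function name indent_blocks and its min_tabs parameter are made up for illustration.

```python
import re


def indent_blocks(text, min_tabs=1):
    """Return blocks that start at `min_tabs` or more tabs.

    A block is a header line plus all following lines that are indented
    at least one tab deeper than the header (the \\1\\t part).
    """
    pattern = re.compile(
        rf"^(\t{{{min_tabs},}})(\S.*)\n(?:\1\t.*\n)*",
        re.MULTILINE,  # "^" must match at the start of every line
    )
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Assumed layout: one extra tab per nesting level.
    sample = (
        "table\n"
        "\ttr\n"
        "\t\ttd\n"
        '\t\t\t"joao"\n'
        '\t\t\t"joao"\n'
        "\t\ttd\n"
        '\t\t\t"marcos"\n'
    )
    for block in indent_blocks(sample, min_tabs=2):
        print(repr(block))
```

Because every line in the pattern ends with \n, the last line of the input needs a trailing newline to be included in a block.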
You could use the following regex to get the groups...
[^\s]*.*\r\n(?:\s+.*\r*\n*)*
this requires that your lines not begin with white space for the beginning of the blocks.

TEXTMATE: delete comments from document

I know that you can use this to remove blank lines
sed /^$/d
and this to remove comments starting with #
sed /^#/d
but how do you delete all the comments starting with // ?
You just need to "escape" the slashes with the backslash.
/\/\//
the ^ operator binds it to the front of the line, so your example will only affect comments starting in the first column. You could try adding spaces and tabs in there, too, and then use the alternation operator | to choose between two comment identifiers.
/^[ \t]*(\/\/|$)/
Edit:
If you simply want to remove comments from the file, then you can do something like:
/(\/\/|$).*/
I don't know what the 'd' operator at the end does, but the above expression should match for you modulo having to escape the parentheses or the alternation operator (the '|' character)
Edit 2:
I just realized that using a Mac you may be "shelling" that command and using the system sed. In that case, you could try putting quotation marks around the search pattern so that the shell doesn't do anything crazy to all of your magic characters. :) In this case, 'd' means "delete the pattern space," so just stick a 'd' after the last example I gave and you should be set.
Edit 3:
Oh, I just realized: you'll want to be careful not to catch things inside quotes (i.e. you don't want to delete from # to the end of the line if it's in a string!). The regexp becomes quite a bit more complicated in that case, unfortunately, unless you just forgo checking lines with strings for comments. But then you'd need to use sed's substitution operation rather than search-and-delete-match, and you'd need to put in more escapes, and it becomes madness. I suggest searching for an online sed helper (there are good regex testers out there, maybe there's one for sed?).
Sorry to sort of abandon the project at this point. This "problem" is one that sed can do but it becomes substantially more complex at every stage, as opposed to just whipping up a bit of Python to do it.
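For what it's worth, the "bit of Python" could look roughly like the sketch below. It only covers the simple case from the question, dropping lines whose first non-blank characters are //, and deliberately leaves trailing comments alone so that a // inside a string is never touched. The function name drop_comment_lines is made up for illustration.

```python
import re

# A line counts as a comment line if it starts with optional spaces or
# tabs followed by "//". Trailing comments are not handled on purpose.
COMMENT_LINE = re.compile(r"^[ \t]*//")


def drop_comment_lines(text):
    """Remove whole-line // comments, keeping every other line as is."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not COMMENT_LINE.match(line)
    ]
    return "".join(kept)


if __name__ == "__main__":
    source = (
        "int a; // keep this line\n"
        "// drop this line\n"
        "    // and this one\n"
        'url = "http://example.com";\n'
    )
    print(drop_comment_lines(source), end="")
```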

regex to match trailing whitespace, but not lines which are entirely whitespace (indent placeholders)

I've been trying to construct a ruby regex which matches trailing spaces - but not indentation placeholders - so I can gsub them out.
I had this /\b[\t ]+$/ and it was working a treat until I realised it only works when the line ends are [a-zA-Z]. :-( So I evolved it into this /(?!^[\t ]+)[\t ]+$/ and it seems like it's getting better, but it still doesn't work properly. I've spent hours trying to get this to work to no avail. Please help.
Here's some text test so it's easy to throw into Rubular, but the indent lines are getting stripped so it'll need a few spaces and/or tabs. Once lines 3 & 4 have spaces back in, it shouldn't match on lines 3-5, 7, 9.
some test test
some test test
some other test (text)
some other test (text)
likely here{ dfdf }
likely here{ dfdf }
and this ;
and this ;
Alternatively, is there a simpler / more elegant way to do this?
If you're using 1.9, you can use look-behind:
/(?<=\S)[\t ]+$/
Unfortunately, look-behind is not supported in older versions of Ruby, so there you'll have to capture the preceding character and put it back:
str.gsub(/(\S)[\t ]+$/) { $1 }
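The same look-behind idea can be tried in other engines too. Below is a minimal Python sketch (the function name strip_trailing is made up for illustration). Unlike Ruby, Python needs re.MULTILINE for $ to match at the end of every line.

```python
import re

# Spaces/tabs at the end of a line, but only when a non-whitespace
# character comes directly before them. Whitespace-only lines have no
# such character, so their indentation is left alone.
TRAILING = re.compile(r"(?<=\S)[ \t]+$", re.MULTILINE)


def strip_trailing(text):
    """Strip trailing spaces and tabs, keeping whitespace-only lines."""
    return TRAILING.sub("", text)


if __name__ == "__main__":
    sample = "foo  \n    \n\tbar\t\n"
    print(repr(strip_trailing(sample)))  # 'foo\n    \n\tbar\n'
```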
Your first expression is close, and you just need to change the \b to a negated character class. This should work better:
/([^\t ])[\t ]+$/
In plain words, this matches all tabs and spaces on lines that follow a character that is not a tab or a space.
Wouldn't this help?
/([^\t ])([\t ]+)$/
You need to do something with the matched last non-space character, though.
Edit: oh, you meant non-blank lines. Then you would need something like /([^\s])\s+$/ and substitute each match with the first captured group.
I'm not entirely sure what you are asking for, but wouldn't something like this work if you just want to capture the trailing whitespaces?
([\s]+)$
or if you only wanted to capture tabs
([ \t]+)$
Since regexes are greedy, they'll capture as much as they can. You don't really need to give them context beforehand if you know what you want to capture.
I still am not sure what you mean by trailing indentation placeholders, so I'm sorry if I'm misunderstanding.
perhaps this...
[\t|\s]+?$
or
[ ]+$

Resources