gsub same pattern from a string - ruby

I have big problems with figuring out how regex works.
I want this text:
This is an example\e[213] text\e[123] for demonstration
to become this:
This is an example text for demonstration.
So this means that I want to remove all strings that begin with \e[ and end with ]
I just cant find a proper regex for this.
My current regex looks like this:
/.*?(\\e\[.*\])?.*/ig
But it dont work. I appreciate every help.

You only need to do this:
txt.gsub(/\\e\[[^\]]*\]/i, "")
There is no need to match what is before or after with .*
The second problem is that you use .* to describe the content between brackets. Since the * quantifier is by default greedy, it will match all until the last closing bracket in the same line.
To prevent this behaviour a way is to use a negated character class in place of the dot that excludes the closing square brackets [^\]]. In this way you keep the advantage of using a greedy quantifier.

gsub can do the global matching for you.
re = /\\e\[.+?\]/i
'This is an example\e[213] text\e[123] for demonstration'.gsub re, ''
=> "This is an example text for demonstration"

You can make the search less greedy by using .+? in the regex
puts 'This is an example\e[213] text\e[123] for demonstration'.gsub(/\\e\[.+?\]/, '')
This is an example text for demonstration
=> nil

Related

delete matched characters using regex in ruby

I need to write a regex for the following text:
"How can you restate your point (something like: \"<font>First</font>\") as a clear topic?"
that keeps whatever is between the
\" \"
characters (in this case <font>First</font>
I came up with this:
/"How can you restate your point \(something like: |\) as a clear topic\?"/
but how do I get ruby to remove the unwanted surrounding text and only return <font>First</font>?
lookbehind, lookahead and making what is greedy, lazy.
str[/(?<=\").+?(?=\")/] #=> "<font>First</font>"
If you have strings just like that, you can .split and get the first:
> str.split(/"/)[1]
=> "<font>First</font>"
You certainly can use a regular expression, but you don't need to:
str = "How can you restate (like: \"<font>First</font>\") as a clear topic?"
str[str.index('"')+1...str.rindex('"')]
#=> "<font>First</font>"
or, for those like me who never use three dots:
str[str.index('"')+1..str.rindex('"')-1]

Regular expression to clean string

I'm struggling to figure out even where to start with this. I believe there is a regular expression to make this a fairly straight forward task. I want to trim off the extra asterisks in a string.
Example string:
test="AM*BE*3***LAST****~"
I would like it to trim asterisks off only the end that don't have repeating symbols. So the resulting value in the variable would be:
test="AM*BE*3***LAST~"
In Perl I was able to use this:
s/\*+~+/~/;
Is there something similar I can do in Ruby? I'm sure there is, just struggling to find it for some reason. Any help would be greatly appreciated.
You could use this regex:
/\*+~$/
Then use the gsub method to replace all matches with a tilde ~:
test = "AM*BE*3***LAST****~"
test.gsub!(/\*+~$/, '~')
# => "AM*BE*3***LAST~"
Or you could use this more flexible regex, which matches any amount of characters after * until end of line:
/\*+([^*])+$/
Then use the first capture group ($1) as the replacement:
test.gsub(/\*+([^*])+$/) { $1 }
Ruby's String class has the [] method, which lets us use regexp as a parameter. We can also assign to that, allowing us to do things like:
foo = "AM*BE*3***LAST****~"
foo[/\*+~+$/] = '~'
foo # => "AM*BE*3***LAST~"
That reuses the match pattern from your Perl search/replace. (I'm assuming you only want to match at the end of the line because of your examples. If it needs to be anywhere in the string remove the trailing $ from the pattern.)
You can use Rubular and try to test the regex and achieve what you need based on the references down the page.
http://rubular.com/

Replacing partial regex matches in place with Ruby

I want to transform the following text
This is a ![foto](foto.jpeg), here is another ![foto](foto.png)
into
This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)
In other words I want to find all the image paths that are enclosed between brackets (the text is in Markdown syntax) and replace them with other paths. The string containing the new path is returned by a separate real_path function.
I would like to do this using String#gsub in its block version. Currently my code looks like this:
re = /!\[.*?\]\((.*?)\)/
rel_content = content.gsub(re) do |path|
real_path(path)
end
The problem with this regex is that it will match ![foto](foto.jpeg) instead of just foto.jpeg. I also tried other regexen like (?>\!\[.*?\]\()(.*?)(?>\)) but to no avail.
My current workaround is to split the path and reassemble it later.
Is there a Ruby regex that matches only the path inside the brackets and not all the contextual required characters?
Post-answers update: The main problem here is that Ruby's regexen have no way to specify zero-width lookbehinds. The most generic solution is to group what the part of regexp before and the one after the real matching part, i.e. /(pre)(matching-part)(post)/, and reconstruct the full string afterwards.
In this case the solution would be
re = /(!\[.*?\]\()(.*?)(\))/
rel_content = content.gsub(re) do
$1 + real_path($2) + $3
end
A quick solution (adjust as necessary):
s = 'This is a ![foto](foto.jpeg)'
s.sub!(/!(\[.*?\])\((.*?)\)/, '\1(/folder1/\2)' )
p s # This is a [foto](/folder1/foto.jpeg)
You can always do it in two steps - first extract the whole image expression out and then second replace the link:
str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"
str.gsub(/\!\[[^\]]*\]\(([^)]*)\)/) do |image|
image.gsub(/(?<=\()(.*)(?=\))/) do |link|
"/a/new/path/" + link
end
end
#=> "This is a ![foto](/a/new/path/foto.jpeg), here is another ![foto](/a/new/path/foto.png)"
I changed the first regex a bit, but you can use the same one you had before in its place. image is the image expression like ![foto](foto.jpeg), and link is just the path like foto.jpeg.
[EDIT] Clarification: Ruby does have lookbehinds (and they are used in my answer):
You can create lookbehinds with (?<=regex) for positive and (?<!regex) for negative, where regex is an arbitrary regex expression subject to the following condition. Regexp expressions in lookbehinds they have to be fixed width due to limitations on the regex implementation, which means that they can't include expressions with an unknown number of repetitions or alternations with different-width choices. If you try to do that, you'll get an error. (The restriction doesn't apply to lookaheads though).
In your case, the [foto] part has a variable width (foto can be any string) so it can't go into a lookbehind due to the above. However, lookbehind is exactly what we need since it's a zero-width match, and we take advantage of that in the second regex which only needs to worry about (fixed-length) compulsory open parentheses.
Obviously you can put real_path in from here, but I just wanted a test-able example.
I think that this approach is more flexible and more readable than reconstructing the string through the match group variables
In your block, use $1 to access the first capture group ($2 for the second and so on).
From the documentation:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
As a side note, some people think '\1' inappropriate for situations where an unconfirmed number of characters are matched. For example, if you want to match and modify the middle content, how can you protect the characters on both sides?
It's easy. Put a bracket around something else.
For example, I hope replace a-ruby-porgramming-book-531070.png to a-ruby-porgramming-book.png. Remove context between last "-" and last ".".
I can use /.*(-.*?)\./ match -531070. Now how should I replace it? Notice
everything else does not have a definite format.
The answer is to put brackets around something else, then protect them:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1.')
# => "a-ruby-porgramming-book.png"
If you want add something before matched content, you can use:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1-2019\2.')
# => "a-ruby-porgramming-book-2019-531070.png"

regular expression gsub only if it does not have anything before

Is there anyway to scan only if there is nothing before what I am scanning for.
For example I have a post and I am scanning for a forward slash and what follows it but I do not want to scan for a forward slash if it is not the beginning character.
I want to scan for /this but I do not want to scan for this/this or http://this.com.
The regular expression I am currently using is..
/\/(\w+)/
I am using this with gsub to link each /forwardslash.
I think what you are asking for is to only match words that begin with '/', not strings or lines beginning with '/'. If that is true, I believe the following regex will work: %r{(?:^|\s+)/(\w+)}:
For example:
"/foo /this this/that http://this".scan %r{(?:^|\s+)/(\w+)} # => [["foo"], ["this"]]
The caret (^) character means "beginning of string" -- a dollar sign ($) means "end of string."
So
/^\/(\w+)/
...will get you what you want -- only matching at the beginning of the string.
First thing, since you're using a regex with slashes change the delimiter to something else, then you won't have to escape the backslashes and it will be easier to read.
Secondly, if you want to replace the slash as well then include it in the capture.
On to the regex.
...if it is not the beginning
character...
...of a line:
!^(/\w+)!
if it is not the beginning
character...
...of a word:
!\s(/\w+)!
but that won't match if it's at the very beginning of a line. For that you'll need something a lot more complex, so I'd just run both the regexes here instead of creating that monster.

Strip words beginning with a specific letter from a sentence using regex

I'm not sure how to use regular expressions in a function so that I could grab all the words in a sentence starting with a particular letter. I know that I can do:
word =~ /^#{letter}/
to check if the word starts with the letter, but how do I go from word to word. Do I need to convert the string to an array and then iterate through each word or is there a faster way using regex? I'm using ruby so that would look like:
matching_words = Array.new
sentance.split(" ").each do |word|
matching_words.push(word) if word =~ /^#{letter}/
end
Scan may be a good tool for this:
#!/usr/bin/ruby1.8
s = "I think Paris in the spring is a beautiful place"
p s.scan(/\b[it][[:alpha:]]*/i)
# => ["I", "think", "in", "the", "is"]
\b means 'word boundary."
[:alpha:] means upper or lowercase alpha (a-z).
You can use \b. It matches word boundaries--the invisible spot just before and after a word. (You can't see them, but oh they're there!) Here's the regex:
/\b(a\w*)\b/
The \w matches a word character, like letters and digits and stuff like that.
You can see me testing it here: http://rubular.com/regexes/13347
Similar to Anon.'s answer:
/\b(a\w*)/g
and then see all the results with (usually) $n, where n is the n-th hit. Many libraries will return /g results as arrays on the $n-th set of parenthesis, so in this case $1 would return an array of all the matching words. You'll want to double-check with whatever library you're using to figure out how it returns matches like this, there's a lot of variation on global search returns, sadly.
As to the \w vs [a-zA-Z], you can sometimes get faster execution by using the built-in definitions of things like that, as it can easily have an optimized path for the preset character classes.
The /g at the end makes it a "global" search, so it'll find more than one. It's still restricted by line in some languages / libraries, though, so if you wish to check an entire file you'll sometimes need /gm, to make it multi-line
If you want to remove results, like your title (but not question) suggests, try:
/\ba\w*//g
which does a search-and-replace in most languages (/<search>/<replacement>/). Sometimes you need a "s" at the front. Depends on the language / library. In Ruby's case, use:
string.gsub(/(\b)a\w*(\b)/, "\\1\\2")
to retain the non-word characters, and optionally put any replacement text between \1 and \2. gsub for global, sub for the first result.
/\ba[a-z]*\b/i
will match any word starting with 'a'.
The \b indicates a word boundary - we want to only match starting from the beginning of a word, after all.
Then there's the character we want our word to start with.
Then we have as many as possible letter characters, followed by another word boundary.
To match all words starting with t, use:
\bt\w+
That will match test but not footest; \b means "word boundary".
Personally i think that regex is overkill for this application, simply running a select is more than capable of solving this particular problem.
"this is a test".split(' ').select{ |word| word[0,1] == 't' }
result => ["this", "test"]
or if you are determined to use regex then go with grep
"this is a test".split(' ').grep(/^t/)
result => ["this", "test"]
Hope this helps.

Resources