I want to match a pattern like this:
ssfd
or this:
oifdsofijsdf d
So a first name alone or first name and middle initial.
"dsfsf m" =~ /^[A-Za-z]+\s[A-Za-z]$/
To make the middle initial optional, I added the ?:
"dsfsf" =~ /^[A-Za-z](+\s[A-Za-z])?$/
But it gives me error:
target of repeat operator is not specified
What am I doing wrong?
The problem is you misplaced the opening parenthesis and have a "1 or more" operator (+) right next to it, so the regex doesn't know what you can have one or more of:
^[A-Za-z]+(\s[A-Za-z])?$
is the regex you likely intended to use (it works on Rubular for your test cases).
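A quick Ruby check of the corrected pattern against the inputs from the question could look like the sketch below. The constant name NAME_PATTERN is an assumption made for illustration; it does not appear in the original post.

```ruby
# Corrected pattern: one or more letters, optionally followed by a
# whitespace character and a single-letter middle initial.
NAME_PATTERN = /^[A-Za-z]+(\s[A-Za-z])?$/

["ssfd", "oifdsofijsdf d", "dsfsf m"].each do |name|
  puts "#{name.inspect} matches: #{NAME_PATTERN.match?(name)}"
end

# In the original attempt the + comes right after an opening parenthesis,
# so it has nothing to repeat and compiling the pattern raises RegexpError.
begin
  Regexp.new('^[A-Za-z](+\s[A-Za-z])?$')
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end
```

Building the broken pattern from a string with Regexp.new lets the error be rescued at run time; written as a regex literal it would stop the script from parsing at all.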
/^[A-Za-z]+\s*[A-Za-z]*$/
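* means the preceding element may repeat zero or more times. Note that this pattern is looser than what was asked for: it also accepts a multi-letter second word, such as "dsfsf mm".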
I'm trying to censor letters in a word with word.gsub(/[^#{guesses}]/i, '-'), where word and guesses are strings.
When guesses is "", I get this error RegexpError: empty char-class: /[^]/i. I could sort such cases with an if/else statement, but can I add something to the regex to make it work in one line?
Since you are only matching (or not matching) letters, you can add a non-letter character to your regex, e.g. # or %:
word.gsub(/[^%#{guesses}]/i, '-')
If #{guesses} is empty, the regex will still be valid, and since % does not appear in a word, there is no risk of censoring a guessed percent sign.
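As a minimal sketch, the one-liner can be wrapped in a small helper to see both cases side by side. The method name censor and the sample word are assumptions made for illustration.

```ruby
# Hypothetical helper wrapping the one-liner from this answer.
# The % is a filler that keeps the character class non-empty; it is
# assumed never to appear in the word being guessed.
def censor(word, guesses)
  word.gsub(/[^%#{guesses}]/i, '-')
end

puts censor("hangman", "")    # no guesses yet: every letter is hidden
puts censor("hangman", "an")  # only a and n stay visible
```

Note that guesses is interpolated unescaped, so this relies on it containing only letters.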
You have two options. One is to skip the substitution when guesses is empty, that is:
unless guesses.empty?
  word.gsub(/[^#{Regexp.escape(guesses)}]/i, '-')
end
Although that's not your intention, it's really the safest plan here and is the most clear in terms of code.
Or you could use the tr function instead, though only for non-empty strings, so this could be substituted inside the unless block:
word.tr('^' + guesses.downcase + guesses.upcase, '-')
Generally tr performs better than gsub if used frequently. It also doesn't require any special escaping.
Edit: Added a note about tr not working on empty strings.
Since tr treats ^ as a special case on empty strings, you can use an embedded ternary, but that ends up confusing what's going on considerably:
word.tr(guesses.empty? ? '' : ('^' + guesses.downcase + guesses.upcase), '-')
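A sketch of the tr variant with the empty case handled by a guard clause instead of an embedded ternary. The helper name censor_with_tr is hypothetical, and hiding the whole word when nothing has been guessed is an assumption about the desired behaviour.

```ruby
# Hypothetical helper: hide every character that has not been guessed.
def censor_with_tr(word, guesses)
  # Assumption: with no guesses yet, the whole word should be hidden.
  return '-' * word.length if guesses.empty?

  # A leading ^ negates the set, so every character outside it becomes '-'.
  word.tr('^' + guesses.downcase + guesses.upcase, '-')
end

puts censor_with_tr("Hangman", "an")
puts censor_with_tr("Hangman", "")
```

Unlike the regex version, tr has no case-insensitive flag, which is why both the lower-case and upper-case forms of the guesses go into the set.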
This may look somewhat similar to tadman's answer.
You should probably keep the string that represents what you want to hide, instead of what you want to show. Let's say this is remains. Then it would be as easy as:
word.tr(remains.upcase + remains.downcase, "-")
I have a text with many expressions like this <.....>, e.g.:
<..> Text1 <.sdfdsvd> Text 2 <....dgdfg> Text3 <...something> Text4
How can I eliminate now all brackets <...> and all commands/texts between these brackets? But the other "real" text between these (like text1, text2 above) should not be touched.
I tried with the regular expression:
<.*>
But this finds also a block like this, including the inbetween text:
<..> Text1 <.sdfdsvd>
My second try was to search for all expressions <.> without a third bracket between these two, so I tried:
<.*[^>^<]>
But that does not work either, no change in behavior. How to construct the needed expression correctly?
This works in Notepad++:
Find what: <[^>]+?>
Replace with: nothing
Try it out: http://regex101.com/r/lC9mD4
There are a few problems with your attempt: <.*[^>^<]>
.* matches all characters up through the final possible match. This means that all tags except the last will be bypassed. This is called greedy. In my solution, I have changed it to lazy (also called reluctant), which goes up to the first possible match: .*?...although I apply this to the character class itself: [^>]+?.
[^>^<] is incorrect for two reasons, one small, one big. The small reason is that the first caret ^ says "do not match any of the following characters", and the characters following it are >, ^, and <. So you are saying you don't want to match the caret character, which is incorrect (but not harmful). The larger problem is that this is attempting to match exactly one character, when it needs to be one or more, which is signified by the plus sign: [^><]+.
Otherwise, your attempt is not that far off from my solution.
This seems to work:
<[^\s]*>
It looks for a left bracket, then anything that isn't whitespace between the brackets, then a right bracket. It would need some adjusting if there's whitespace between the brackets (<text1 text2>), though, and at that point a modification of one of your attempts would work better:
<[^<^>]*>
This one looks for a left bracket, then anything that isn't a left bracket, a right bracket, or a caret (the second ^ inside the class is a literal caret, so [^<>] would be enough), then a right bracket.
Try <.*?>. If you don't use the "?", the regular expression will try to find the longest string that matches. Using "*?" forces it to find the shortest.
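The question was about Notepad++, but the difference between the greedy, lazy, and negated-class patterns is easy to see in a few lines of Ruby. This is only an illustrative sketch; the variable names are assumptions.

```ruby
text = "<..> Text1 <.sdfdsvd> Text 2 <....dgdfg> Text3 <...something> Text4"

greedy  = text.gsub(/<.*>/, '')     # runs from the first < to the last >
lazy    = text.gsub(/<.*?>/, '')    # stops at the first > after each <
negated = text.gsub(/<[^>]+>/, '')  # cannot cross a > in the first place

p greedy
p lazy
p negated
```

The lazy and negated-class versions give the same result here; the greedy one swallows everything up to the final tag.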
I have 2 questions regarding the following regex from Why's Poignant Guide to Ruby:
1: What does the minus sign mean here? It doesn't seem to be designating a range because there is nothing to the left of it other than the bracket.
2: Why is it necessary to escape the closing parenthesis? After you escape the opening one, what special meaning could the closing parenthesis have?
/\([-\w]+\)/
1) When the minus sign is at the beginning or at the end of a character class, it is seen as a literal.
2) escaping closing parenthesis is a convention. The goal is IMO, to avoid an ambiguity with a possible opening parenthesis before. Consider these examples:
/(\([-\w]+\))/ or /(\([-\w]+)\)/
1) The minus sign is a literal minus sign. Since it cannot possibly designate a range, it has no special meaning and so the character class is equivalent to [\-\w] - escaping the hyphen is optional, as you observe in your second point...
2) ...however, it isn't always good form to not escape something just because the regular expression engine allows it. For example, this regex: ([([^)-]+) is perfectly valid (I think...) but entirely unclear because of the fact that characters which normally have special meanings are used as literal characters without being escaped. Valid, yes, but not obvious, and someone who doesn't know all the rules will become very confused trying to understand it.
The minus sign -, or hyphen, means exactly the literal character -. The hyphen can be placed right after the opening bracket, right before the closing bracket, or right after the negating caret. In those positions it cannot designate a range, so there is no ambiguity. You can also write \- if you like.
As to why to escape ), I think it means to reduce the regex engine's work so that it doesn't have to remember if an opening parenthesis is before.
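The hyphen rules can be checked with a short Ruby sketch. The constant names and sample strings are assumptions chosen for illustration.

```ruby
LEADING_HYPHEN = /\A[-\w]+\z/   # hyphen right after the opening bracket: literal
ESCAPED_HYPHEN = /\A[\w\-]+\z/  # explicitly escaped hyphen: literal
LETTER_RANGE   = /\A[a-c]+\z/   # hyphen between two characters: a range

p LEADING_HYPHEN.match?("foo-bar")
p ESCAPED_HYPHEN.match?("foo-bar")
p LETTER_RANGE.match?("abc")
p LETTER_RANGE.match?("a-c")

# The pattern from the question matches a parenthesised word that may
# contain hyphens.
p "see (foo-bar) here"[/\([-\w]+\)/]
```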
- sign in this regex actually means a - sign that you want to see in the text.
Non-escaped parentheses create a match group, which is then available to you, for example through the $1 variable.
> "(-w)" =~ /\([-\w]+\)/
> $1 # => nil
and
> "(-w)" =~ /([-\w]+)/
> $1 # => -w
You can go to Rubular and try both regexes \([-\w]+\) and ([-\w]+) - and you will see different results by passing (-w) as a test. You can notice match groups appearing.
I wanted to create a RE for currency like $123.45. It should match $123.4 and $123.45. It should not match $123.456 or 123.45. I found solutions for this on this site, and one of them was:
^[$][0-9]+(.[0-9]{1,2})?$
As expected, the pattern matches $123.4 and $123.45. But when I put the currency inside a sentence like
"The cost of one ticket is $123.45 and the cost of 2 is $246.90"
the pattern doesn't find any match.
I think it's because of ^ and $, which are the start-of-line and end-of-line anchors respectively.
How can I get the result as 2 matches? Please help me.
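Try removing ^ and $ from your RE. These anchors require the whole string to start with the dollar sign and end with the digits, which is why nothing matches inside a sentence. Without them the pattern can match anywhere in the text; use parentheses () if you want to capture a group.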
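In Ruby, one way to collect both amounts is String#scan with the anchors removed. The sketch below also escapes the dot and adds a negative lookahead so that $123.456 is still rejected. That lookahead and the constant name PRICE are assumptions added here, not part of the answer above.

```ruby
# \$           a literal dollar sign
# \d+          the whole-number part
# (?:\.\d{1,2})?  optional decimal point with one or two digits
# (?!\.?\d)    assumption: refuse to stop in the middle of a longer number
PRICE = /\$\d+(?:\.\d{1,2})?(?!\.?\d)/

text = "The cost of one ticket is $123.45 and the cost of 2 is $246.90"
p text.scan(PRICE)
```

Because the group is non-capturing, scan returns the full matched strings and not just the decimal part.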
I want to transform the following text
This is a ![foto](foto.jpeg), here is another ![foto](foto.png)
into
This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)
In other words I want to find all the image paths that are enclosed between brackets (the text is in Markdown syntax) and replace them with other paths. The string containing the new path is returned by a separate real_path function.
I would like to do this using String#gsub in its block version. Currently my code looks like this:
re = /!\[.*?\]\((.*?)\)/
rel_content = content.gsub(re) do |path|
  real_path(path)
end
The problem with this regex is that it will match ![foto](foto.jpeg) instead of just foto.jpeg. I also tried other regexen like (?>\!\[.*?\]\()(.*?)(?>\)) but to no avail.
My current workaround is to split the path and reassemble it later.
Is there a Ruby regex that matches only the path inside the brackets and not all the contextual required characters?
Post-answers update: The main problem here is that Ruby's regexen have no way to specify variable-width zero-width lookbehinds. The most generic solution is to group the part of the regexp before the real matching part and the part after it, i.e. /(pre)(matching-part)(post)/, and reconstruct the full string afterwards.
In this case the solution would be
re = /(!\[.*?\]\()(.*?)(\))/
rel_content = content.gsub(re) do
  $1 + real_path($2) + $3
end
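To try the grouped version end to end, the sketch below supplies a stand-in for real_path. The REAL_PATHS lookup table is hypothetical and only mirrors the folders from the example in the question; the actual function is not shown in the post.

```ruby
# Hypothetical stand-in for the real_path function from the question.
REAL_PATHS = {
  "foto.jpeg" => "/folder1/foto.jpeg",
  "foto.png"  => "/folder2/foto.png"
}.freeze

def real_path(path)
  # Fall back to the unchanged path when no mapping is known.
  REAL_PATHS.fetch(path, path)
end

content = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"
re = /(!\[.*?\]\()(.*?)(\))/

# $1 is the "![alt](" prefix, $2 the path, $3 the closing parenthesis.
rel_content = content.gsub(re) do
  $1 + real_path($2) + $3
end

puts rel_content
```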
A quick solution (adjust as necessary):
s = 'This is a ![foto](foto.jpeg)'
s.sub!(/(!\[.*?\])\((.*?)\)/, '\1(/folder1/\2)')
p s # => "This is a ![foto](/folder1/foto.jpeg)"
You can always do it in two steps - first extract the whole image expression out and then second replace the link:
str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"
str.gsub(/\!\[[^\]]*\]\(([^)]*)\)/) do |image|
  image.gsub(/(?<=\()(.*)(?=\))/) do |link|
    "/a/new/path/" + link
  end
end
#=> "This is a ![foto](/a/new/path/foto.jpeg), here is another ![foto](/a/new/path/foto.png)"
I changed the first regex a bit, but you can use the same one you had before in its place. image is the image expression like ![foto](foto.jpeg), and link is just the path like foto.jpeg.
[EDIT] Clarification: Ruby does have lookbehinds (and they are used in my answer):
You can create lookbehinds with (?<=regex) for positive and (?<!regex) for negative, where regex is an arbitrary regular expression subject to the following condition. Expressions in lookbehinds have to be fixed-width due to limitations of the regex implementation, which means that they can't include expressions with an unknown number of repetitions or alternations with different-width choices. If you try to do that, you'll get an error. (The restriction doesn't apply to lookaheads, though.)
In your case, the [foto] part has a variable width (foto can be any string) so it can't go into a lookbehind due to the above. However, lookbehind is exactly what we need since it's a zero-width match, and we take advantage of that in the second regex which only needs to worry about (fixed-length) compulsory open parentheses.
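The restriction can be seen in a short sketch: a lookbehind for a single open parenthesis compiles and matches only the path, while one that tries to cover the variable-width [foto] part is expected to be rejected, as described above. The sample string and variable name are assumptions for illustration.

```ruby
image = "![foto](foto.jpeg)"

# Fixed-width lookbehind: exactly one "(" must precede the match, and the
# lookahead requires a ")" right after it. Neither is part of the match.
p image[/(?<=\()[^)]*(?=\))/]

# Variable-width lookbehind: .* has no fixed length, so compiling this
# pattern is expected to raise RegexpError, per the explanation above.
begin
  Regexp.new('(?<=!\[.*\]\()[^)]*')
  puts "compiled without error"
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end
```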
Obviously you can put real_path in from here, but I just wanted a test-able example.
I think that this approach is more flexible and more readable than reconstructing the string through the match group variables.
In your block, use $1 to access the first capture group ($2 for the second and so on).
From the documentation:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
As a side note, some people think '\1' is unsuitable when the amount of matched text is not known in advance. For example, if you want to match and modify only the middle of a string, how do you protect the characters on both sides?
It's easy: put parentheses around the parts you want to keep.
For example, say I want to turn a-ruby-porgramming-book-531070.png into a-ruby-porgramming-book.png, that is, remove everything between the last "-" and the last ".".
I can use /.*(-.*?)\./ to match -531070. Now how should I replace it? Notice that everything else has no definite format.
The answer is to capture the surrounding text in a group as well, then put it back in the replacement:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1.')
# => "a-ruby-porgramming-book.png"
If you want to add something before the matched content, you can use:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1-2019\2.')
# => "a-ruby-porgramming-book-2019-531070.png"