Regular expression for not matching two underscores

I don't know whether it's really easy and I'm out of my mind....
In Ruby's regular expressions, how to match strings which do not contain two consecutive underscores, i.e., "__".
Ex:
Matches: "abcd", "ab_cd", "a_b_cd", "%*##_#+"
Does not match: "ab__cd", "a_b__cd"
-thanks
EDIT: I can't use reverse logic (i.e., checking for "__" strings and excluding them), since I need to use this with Ruby on Rails' validates_format_of(), which expects a regular expression that the value must match.

You could use negative lookahead:
^((?!__).)*$
The anchors ^ and $ are important: they force the check "not followed by a double underscore" at every position. (In Ruby, ^ and $ match at the start and end of a line; \A and \z match the start and end of the whole string.)
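A minimal Ruby check of the lookahead approach against the examples from the question. This sketch swaps ^ and $ for \A and \z so that the whole value is tested, and uses a non-capturing group; the constant name NO_DOUBLE_UNDERSCORE is only for illustration.

```ruby
# Negative lookahead: at each position, assert that "__" does not start here,
# then consume one character. \A and \z anchor the check to the whole string.
NO_DOUBLE_UNDERSCORE = /\A(?:(?!__).)*\z/

should_match     = ["abcd", "ab_cd", "a_b_cd", "%*##_#+"]
should_not_match = ["ab__cd", "a_b__cd"]

(should_match + should_not_match).each do |s|
  puts "#{s.inspect} => #{s.match?(NO_DOUBLE_UNDERSCORE)}"
end
```

Because . does not match a newline without the /m flag, a value that contains a line break is rejected by this pattern as well.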

/^([^_]*(_[^_])?)*_?$/
Tests:
regex=/^([^_]*(_[^_])?)*_?$/
# Matches
puts "abcd" =~ regex
puts "ab_cd" =~ regex
puts "a_b_cd" =~ regex
puts "%*##_#+" =~ regex
puts "_" =~ regex
puts "_a_" =~ regex
# Non-matches
puts "__" =~ regex
puts "ab__cd" =~ regex
puts "a_b__cd" =~ regex
But regex is overkill for this task. A simple string test is much easier:
puts 'a_b'['__'].nil?   # => true (no "__" found)

Would altering your logic still be valid?
You could check if the string contains two underscores with the regular expression [_]{2} and then just ignore it?

Negative lookahead
\b(?!\w*__\w*)\w+\b
Search for two consecutive underscores in the next word from the beginning of the word, and match that word if it is not found.
Edit: To accommodate anything other than whitespace in the match:
(?!\S*__\S*)\S+
If you wish to accommodate only a subset of symbols, you can write something like the following, but then it will match _cd from a_b__cd, among other things.
(?![a-zA-Z0-9_%*#+]*__[a-zA-Z0-9_%*#+]*)[a-zA-Z0-9_%*#+]+
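To see what the \b anchoring buys, here is a small String#scan comparison on a made-up sample text. Without \b, matching can restart partway through a token, after the "__", which is the same mechanism that picks _cd out of a_b__cd; as this run shows, it applies to the \S variant too.

```ruby
# \b ties the lookahead to the start of a \w+ word, so a word that contains
# "__" is skipped as a whole.
WORD_PATTERN  = /\b(?!\w*__\w*)\w+\b/
# Without \b, matching may restart inside a token, after the "__".
TOKEN_PATTERN = /(?!\S*__\S*)\S+/

text = "abcd ab_cd ab__cd" # sample input, made up for this demo
p text.scan(WORD_PATTERN)   # => ["abcd", "ab_cd"]
p text.scan(TOKEN_PATTERN)  # => ["abcd", "ab_cd", "_cd"]
```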

Related

How to remove strings that end with a particular character in Ruby

Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I want to also remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions" I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically, I swapped the caret (^) for the dollar sign ($) and reversed the order of the part after the closing parenthesis (making sure to escape the period). However, it is not doing the trick.
An example I'd like to match and remove is:
"a8&23q2aas."

You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference lies in the relationship between the string you are examining and the regex tokens that describe the condition you want to trigger. Here, you are trying to find a period (\.) that is followed only by whitespace (\s) or the end of the line ($), so those tokens have to come after the period. I would read the regex above as "any characters of any length, followed by a period, followed by any amount of whitespace, followed by the end of the line."
As commenters pointed out, though, there's a simpler way: String#end_with?.
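A quick comparison on the string from the question, plus one made-up longer string. Because .* is greedy and can start at the beginning of the line, the gsub pattern removes everything up to the last period on the line, not just the final word.

```ruby
token = "a8&23q2aas."
line  = "keep this a8&23q2aas."   # made-up example with leading words

# The regex from this answer: anything, then a period, then optional
# whitespace, then the end of the line.
p token.gsub(/.*\.\s*$/, "")   # => ""
p line.gsub(/.*\.\s*$/, "")    # => ""  (.* also swallows "keep this ")

# String#end_with? answers the same question without a regex.
p token.end_with?(".")         # => true
```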

I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill if you're only concerned with the starting or ending character; in fact, regular expressions will slow your code compared with the built-in methods.
You wrote: "I would really like to stick to using gsub...."
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.

# Returns str with suffix removed if str ends with it; otherwise returns str unchanged.
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end
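For comparison, Ruby 2.5 and later include String#delete_suffix, which performs the same trim as replace_suffix. This sketch reproduces replace_suffix (with standard indentation) so that it runs on its own.

```ruby
# replace_suffix from the answer above, reproduced so this snippet is standalone.
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end

["a8&23q2aas.", "day"].each do |s|
  # String#delete_suffix (Ruby 2.5+) returns a copy without the suffix,
  # or an unchanged copy if s does not end with it.
  puts "#{replace_suffix(s, '.')} | #{s.delete_suffix('.')}"
end
```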

Ruby, True/false regex

So I've got an issue where my regex looks like this: /true|false/.
When I check the word falsee, this regex still matches. Is there a way to limit it to exactly the words true or false?

Use this regex:
/^(true|false)$/
It anchors the pattern to the beginning and end of the line with ^ and $, respectively, so nothing else can be on that line (exact match).
See live example at Regex101.
UPDATE (see @w0lf's comment): The parentheses group the true|false alternation so that both anchors apply to both words; without them, /^true|false$/ would mean "^true" or "false$". (This also puts the true or false match in the first capturing group, but since it seems that you are only matching and not capturing output, this should not make a difference.)
Alternatively, if you simply want to match two values, there are easier ways in Ruby. @SimoneCarletti suggests one. You can also use the basic == operator or the eql? method. Try running the following script to see that these all work:
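A quick check of the question's original pattern, the ungrouped anchored pattern, and the grouped one, on a few made-up words:

```ruby
unanchored = /true|false/       # original pattern from the question
ungrouped  = /^true|false$/     # parsed as (^true) | (false$)
grouped    = /^(true|false)$/   # both anchors apply to both alternatives

%w[true false falsee truex xfalse].each do |word|
  puts "#{word}: unanchored=#{word.match?(unanchored)} " \
       "ungrouped=#{word.match?(ungrouped)} grouped=#{word.match?(grouped)}"
end
```

All three patterns accept true and false. The unanchored pattern also accepts falsee, truex, and xfalse; the ungrouped one still accepts truex and xfalse; only the grouped pattern rejects all three.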
values = ["true", "false", "almosttrue", "falsealmost"]
values.each do |value|
  puts value
  # these three checks are equivalent for these inputs
  puts "match with ==" if value == "true" || value == "false"
  puts "match with eql?" if value.eql?("true") || value.eql?("false")
  puts "match with regex" if /^(true|false)$/.match(value)
  puts
end

You need to use the ^ and $ anchors:
/^(true|false)$/
Edit: As Cary pointed out in the comments, the pattern above will also match multiline strings that happen to contain a line with true or false. To avoid this, use the \A and \z anchors, which match the beginning and end of the string, respectively:
/\A(true|false)\z/
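A short sketch of the difference, on two made-up inputs: one where a later line happens to be "true", and one where "true" is followed by a trailing newline.

```ruby
line_anchors   = /^(true|false)$/
string_anchors = /\A(true|false)\z/

# Hypothetical inputs for the demo.
multiline = "hello\ntrue"
trailing  = "true\n"

p multiline.match?(line_anchors)    # => true  ("true" is a whole line)
p multiline.match?(string_anchors)  # => false (the whole string is not "true")
p trailing.match?(line_anchors)     # => true
p trailing.match?(string_anchors)   # => false (\z does not allow the final "\n")
```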

Try out
/^(true|false)$/
where ^ is the start of a line and $ the end.

You can use
/^(true|false)$/
or even better
/\A(true|false)\z/
that will match the beginning and end of the string (instead of the line). If you only need to match those words, it may be more efficient to use a simple array and include?:
%w( true false ).include?(value)

Regex to find strings with only letters or numbers or both

I am searching for strings with only letters or numbers or both. How could I write a regex for that?

You can use the following regex to check whether the string contains only letters and/or numbers:
^[a-zA-Z0-9]+$
Explanation
^: Start of the line
[]: Character class
a-zA-Z: Matches any ASCII letter
0-9: Matches any digit
+: Matches the preceding element one or more times
$: End of the line
RegEx101 Demo

"abc&#*(2743438" !~ /[^a-z0-9]/i # => false
"abc2743438" !~ /[^a-z0-9]/i # => true
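The !~ form reports whether the string contains no character outside a-z and 0-9 (case-insensitively). One difference from an anchored + pattern, shown in this sketch, is the empty string: it contains no disallowed characters, so the !~ test accepts it. ONLY_ALNUM is a name chosen for the example.

```ruby
ONLY_ALNUM = /\A[a-z0-9]+\z/i   # one or more letters/digits, whole string

["abc&#*(2743438", "abc2743438", ""].each do |s|
  negated  = s !~ /[^a-z0-9]/i   # true when no disallowed character is found
  anchored = s.match?(ONLY_ALNUM)
  puts "#{s.inspect}: negated=#{negated} anchored=#{anchored}"
end
```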

The examples below avoid the multiline anchors ^ and $, which may present a security risk because they match at line boundaries rather than at the ends of the string. Use \A and \z instead, or, in Rails, add the :multiline => true option if you really mean ^ and $.
Only letters and numbers:
/\A[a-zA-Z0-9]+\z/
Or, if you also want to allow the - and _ characters:
/\A[a-zA-Z0-9_\-]+\z/
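A small sketch of the risk mentioned above, using a made-up input whose first line is alphanumeric and whose second line is not:

```ruby
line_anchored   = /^[a-zA-Z0-9]+$/
string_anchored = /\A[a-zA-Z0-9]+\z/

# Made-up input: an alphanumeric first line followed by something else.
input = "abc123\n<script>alert(1)</script>"

p input.match?(line_anchored)    # => true:  the first line alone satisfies ^...$
p input.match?(string_anchored)  # => false: every character must be alphanumeric
```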

How the Anchor \z and \G works in Ruby?

I am using Ruby 1.9.3. I am new to this platform.
From the docs I just became familiar with two anchors, \z and \G. I played a little with \z to see how it works, because its definition ("End" or "End of String") confused me: I can't understand what "End" means here. So I tried the small snippets below, but I still don't get it.
CODE
irb(main):011:0> str = "Hit him on the head me 2\n" + "Hit him on the head wit>
=> "Hit him on the head me 2\nHit him on the head with a 24\n"
irb(main):012:0> str =~ /\d\z/
=> nil
irb(main):013:0> str = "Hit him on the head me 24 2\n" + "Hit him on the head >
=> "Hit him on the head me 24 2\nHit him on the head with a 24\n"
irb(main):014:0> str =~ /\d\z/
=> nil
irb(main):018:0> str = "Hit1 him on the head me 24 2\n" + "Hit him on the head>
=> "Hit1 him on the head me 24 2\nHit him on the head with a11 11 24\n"
irb(main):019:0> str =~ /\d\z/
=> nil
irb(main):020:0>
Every time, I got nil as the output. So how is \z evaluated? What does "End" mean? I think I have misunderstood the word "End" in the docs. Could anyone help me understand why this output happens?
I also didn't find any example of the \G anchor. Could you give an example that shows how \G is used in real-world programming?
EDIT
irb(main):029:0>
irb(main):030:0* ("{123}{45}{6789}").scan(/\G(?!^)\{\d+\}/)
=> []
irb(main):031:0> ('{123}{45}{6789}').scan(/\G(?!^)\{\d+\}/)
=> []
irb(main):032:0>
Thanks

\z matches the end of the input. You are trying to find a match where 4 occurs at the end of the input. The problem is that there is a newline at the end of the input, so you don't find a match. \Z matches either at the end of the input or just before a newline at the end of the input.
So:
/\d\z/
matches the "4" in:
"24"
and:
/\d\Z/
matches the "4" in the above example and the "4" in:
"24\n"
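Applied to the first string from the question (as shown in the irb output above), String#[] with a regex makes the difference visible:

```ruby
str = "Hit him on the head me 2\nHit him on the head with a 24\n"

p str[/\d\z/]        # => nil  (the last character is "\n", not a digit)
p str[/\d\Z/]        # => "4"  (\Z may also match just before a final "\n")
p str.chomp[/\d\z/]  # => "4"  (with the trailing newline removed, \z works)
```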
Check out this question for an example of using \G:
Examples of regex matcher \G (The end of the previous match) in Java would be nice
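A small Ruby illustration of \G with String#scan, using the brace string from the question's EDIT. In Ruby, \G matches where the current search started: position 0 for the first search, then the end of the previous match, so consecutive matches must touch. The second input, with an x in it, is made up for the demo.

```ruby
contiguous = /\G\{\d+\}/

p "{123}{45}{6789}".scan(contiguous)   # => ["{123}", "{45}", "{6789}"]
p "{123}{45}x{6789}".scan(contiguous)  # => ["{123}", "{45}"]  (stops at the "x")

# The pattern from the question's EDIT: (?!^) rejects position 0, where the
# first search starts, so \G never gets a first match to continue from.
p "{123}{45}{6789}".scan(/\G(?!^)\{\d+\}/)  # => []
```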
UPDATE: Real-World uses for \G
I came up with a more real-world example. Say you have a list of words that are separated by arbitrary characters that cannot be well predicted (or there are too many possibilities to list). You'd like to match these words, where each word is its own match, up until a particular word, after which you don't want to match any more words. For example:
foo,bar.baz:buz'fuzz*hoo-har/haz|fil^bil!bak
You want to match each word until 'har'. You don't want to match 'har' or any of the words that follow. You can do this relatively easily using the following pattern:
/(?<=^|\G\W)\w+\b(?<!har)/
rubular
The first attempt matches the beginning of the input, followed by zero non-word characters, followed by 3 word characters ('foo'), followed by a word boundary. Finally, a negative lookbehind ensures that the word just matched is not 'har'.
On the second attempt, matching picks back up at the end of the last match. One non-word character (',') is matched inside the lookbehind, so it is not part of the match (a lookbehind is a zero-width assertion), followed by 3 word characters ('bar').
This continues until 'har' is matched, at which point the negative lookbehind is triggered and the match fails. Because all matches are supposed to be "attached" to the last successful match, no additional words will be matched.
The result is:
foo
bar
baz
buz
fuzz
hoo
If you want to reverse it and have all words after 'har' (but, again, not including 'har'), you can use an expression like this:
/(?!^)(?<=har\W|\G\W)\w+\b/
rubular
This will match a word that is immediately preceded either by 'har' plus a non-word character or by the end of the last match plus a non-word character (except that we have to make sure not to match at the beginning of the input). The list of matches is:
haz
fil
bil
bak
If you do want to match 'har' and all following words, you could use this:
/\bhar\b|(?!^)(?<=\G\W)\w+\b/
rubular
This produces the following matches:
har
haz
fil
bil
bak

Sounds like you want to know how Regex works? Or do you want to know how Regex works with ruby?
Check these out.
Regexp Class description
The Regex Coach - Great for testing regex matching
Regex cheat sheet
I understand \G to be a boundary match character, so it tells the next match to start at the end of the last match. Perhaps, since you haven't made a match yet, you can't have a second.
Here is the best example I can find. It's not in Ruby, but the concept should be the same.
I take it back; this might be more useful.

Remove Duplicate Numbers and Operators from String

I'm trying to take a string which is a simple math expression, remove all spaces, remove all duplicate operators, convert to single digit numbers, and then evaluate.
For example, a string like "2 7+*3*95" should be converted to "2+3*9" and then evaluated as 29.
Here's what I have so far:
expression.slice!(/ /) # Remove whitespace
expression.slice!(/\A([\+\-\*\/]+)/) # Remove operators from the beginning
expression.squeeze!("0123456789") # Single digit numbers (doesn't work)
expression.squeeze!("+-*/") # Removes duplicate operators (doesn't work)
expression.slice!(/([\+\-\*\/]+)\Z/) # Removes operators from the end
puts eval expression
Unfortunately this doesn't make single digit numbers nor remove duplicate operators quite like I expected. Any ideas?

"2 7+*3*95".gsub(/([0-9])[0-9 ]*/, '\1').gsub(/([\+\*\/\-])[ +\*\/\-]+/, '\1')
The first regex handles the single-digit thing and the second handles repeat operators. You could probably condense it into a single regex if you really wanted to.
This works for a quick-and-dirty solution, but you might be better served by a proper parser.
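As a rough sketch of the "proper parser" idea (not code from either answer): a cleaning step built from the same regexes as the DATA-based answer, plus a small recursive-descent evaluator in place of eval. It assumes the cleaned string alternates single digits and the operators + - * /, and it uses Ruby's integer division, as eval on integers would. The method names clean and evaluate are hypothetical.

```ruby
# Normalise the input: drop whitespace, collapse operator runs, keep the first
# digit of each number, and strip operators from both ends.
def clean(expr)
  expr.gsub(/\s+/, "")
      .gsub(%r{([*/+-])[*/+-]+}, '\1')
      .gsub(/(\d)\d+/, '\1')
      .sub(%r{\A[*/+-]+}, "")
      .sub(%r{[*/+-]+\z}, "")
end

# Recursive-descent evaluation with the usual precedence:
#   expression := term (('+' | '-') term)*
#   term       := digit (('*' | '/') digit)*
# Assumes a cleaned string such as "2+3*9" (single digits, one operator between them).
def evaluate(expr)
  tokens = expr.chars
  pos = 0

  term = lambda do
    acc = Integer(tokens[pos])
    pos += 1
    while pos < tokens.size && %w[* /].include?(tokens[pos])
      op  = tokens[pos]
      rhs = Integer(tokens[pos + 1])
      pos += 2
      acc = op == "*" ? acc * rhs : acc / rhs   # Integer#/ floors, like eval
    end
    acc
  end

  total = term.call
  while pos < tokens.size
    op = tokens[pos]
    pos += 1
    rhs = term.call
    total = op == "+" ? total + rhs : total - rhs
  end
  total
end

["2 7+*3*95", "+-2 888+*3*95+8*-2/+435+-"].each do |input|
  cleaned = clean(input)
  puts "#{cleaned} = #{evaluate(cleaned)}"
end
```

Keeping multiplication and division inside term is what gives them higher precedence than addition and subtraction, and the while loops make each level left-associative.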

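# Each line after __END__ is read through DATA and cleaned step by step.
DATA.each do |expr|
  expr.gsub!(%r'\s+', '')                # remove all whitespace (including the newline)
  expr.gsub!(%r'([*/+-])[*/+-]+', '\1')  # collapse runs of operators to the first one
  expr.gsub!(%r'(\d)\d+', '\1')          # keep only the first digit of each number
  expr.sub!(%r'\A[*/+-]+', '')           # drop leading operators
  expr.sub!(%r'[*/+-]+\Z', '')           # drop trailing operators
  puts expr + ' = ' + eval(expr).to_s
end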
__END__
2 7+*3*95
+-2 888+*3*95+8*-2/+435+-
