When using gsub with [a-zA-Z] in Ruby

I've seen this [a-zA-Z] for the gsub method:
string.gsub(/[a-zA-Z]/,"-")
where it will find any lowercase letter a-z or uppercase letter A-Z.
My question is: why does a-z work back to back with A-Z, as in a-zA-Z?
Where might I find more info on using [a-zA-Z] in Ruby?

Inside a character class (the [] inside the regex), you can list all the characters you want:
/[abcdefg]/
To save space, you can define a range with a hyphen (-) and a character on each side of it:
/[a-g]/
Since it's clear that this range goes from a to g, you can write another character directly after it:
/[a-gm]/
You could also define another range:
/[a-gm-z]/
From the documentation:
A range can be followed by another range, so [abcdwxyz] is equivalent to [a-dw-z].
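To tie this back to the question: [a-zA-Z] is just one range (a-z) followed by another (A-Z) in the same class, so it matches the union of both. A small sketch comparing a range form with its spelled-out equivalent; the sample strings are made up for illustration:

```ruby
# Two ranges placed back to back inside one class match the union of both ranges.
ranges  = /[a-dw-z]/
spelled = /[abcdwxyz]/

sample = "a b c d e v w x y z"
p sample.scan(ranges) == sample.scan(spelled)   # => true

# [a-zA-Z] works the same way: the range a-z followed by the range A-Z.
p "Hello, World 42!".gsub(/[a-zA-Z]/, "-")      # => "-----, ----- 42!"
```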
Note that for your example, you could also use a case-insensitive regex:
string.gsub(/[a-z]/i,"-")
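A quick check that the two forms agree, assuming ASCII input. With /i Ruby applies Unicode case folding, so on non-ASCII text the two forms are not guaranteed to replace exactly the same characters; the word list is a made-up example:

```ruby
# For ASCII input, the explicit two-range class and the case-insensitive
# single range replace exactly the same characters.
words = ["Ruby", "gsub", "ABC-xyz", "123"]
words.each do |w|
  explicit = w.gsub(/[a-zA-Z]/, "-")
  folded   = w.gsub(/[a-z]/i, "-")
  puts "#{w}: #{explicit} / #{folded} same=#{explicit == folded}"
end
```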
Finally, you can use ranges with Unicode characters:
arrows = /[\u2190-\u21FF]/
"a⇸b⇙c↺d↣e↝f".scan(arrows)
# => ["⇸", "⇙", "↺", "↣", "↝"]

I frequently use http://rubular.com/ as a reference:
[a-zA-Z] Any single character in the range a-z or A-Z

Related

Defining a Template in Flex

I want to define a "KEYER" in flex, which is a "KEY" in "[]". A "KEY" starts with a letter, followed by a string of letters, numbers and the following characters: "~_'?$. -".
I defined:
keyChar ([a-zA-z0-9~_'?$. \-])
letter ([a-zA-Z])
key ({letter}{keyChar}+)
keyer ("["{key}"]")
and:
<*>{keyer} print("KEYER");
Somehow the input:
[keyer1] [keyer2] [keyer 3]
is read as one KEYER rather than three. What did I do wrong?
You wrote A-z instead of A-Z in the pattern for keyChar. [A-z] includes the characters between Z and a, which include brackets.
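The extra characters that A-z lets in can be listed directly. This sketch uses Ruby rather than flex, on the assumption that both treat an ASCII range as a span of character codes; it lists what A-z adds beyond the letters, then reproduces the over-long match with the question's keyChar class and with the typo fixed:

```ruby
# Character codes from "A" (65) to "z" (122) include six non-letters between "Z" and "a".
between = ("A".ord.."z".ord).map(&:chr).grep_v(/[a-zA-Z]/)
p between   # => ["[", "\\", "]", "^", "_", "`"]

input = "[keyer1] [keyer2] [keyer 3]"

# The question's class, with the A-z typo: "]" and "[" count as key characters,
# so one greedy match runs across all three bracketed keys.
buggy = /\[[a-zA-Z][a-zA-z0-9~_'?$. \-]+\]/
p input.scan(buggy)   # => ["[keyer1] [keyer2] [keyer 3]"]

# With A-Z the class no longer contains brackets, so each key matches separately.
fixed = /\[[a-zA-Z][a-zA-Z0-9~_'?$. \-]+\]/
p input.scan(fixed)   # => ["[keyer1]", "[keyer2]", "[keyer 3]"]
```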
On the whole, it is better to avoid range expressions when not necessary. I would have written:
keyChar ([[:alnum:]~_'?$. -])
key ([[:alpha:]]{keyChar}+)
keyer ("["{key}"]")
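Ruby regexes accept the same POSIX bracket expressions, so the suggested definitions can be tried out in Ruby as well. A sketch, translating the flex definitions by hand; note that in Ruby [[:alpha:]] and [[:alnum:]] are Unicode-aware, so they also accept letters such as é, unlike a plain a-z range:

```ruby
# Ruby version of the flex definitions: a key is a letter followed by key characters,
# and a keyer is a key wrapped in square brackets. "-" is last in the class, so it is literal.
key_char = "[[:alnum:]~_'?$. -]"
keyer    = /\[[[:alpha:]]#{key_char}+\]/

p "[keyer1] [keyer2] [keyer 3]".scan(keyer)   # => ["[keyer1]", "[keyer2]", "[keyer 3]"]
p "[café] [1abc]".scan(keyer)                 # => ["[café]"]
```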

Splitting the content of brackets without separating the brackets ruby

I am currently working on a Ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content, or at least put the content into an array, but I have tried for an hour without coming up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array in place of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention that my program already splits the term; I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it with regular expressions alone. You'll need to parse the string token by token and keep a stack of expressions.
Pseudocode:
expressions = new stack
push a new array onto expressions
for each word in the string:
    if word is "(": push a new array onto expressions
    else if word is ")": pop the top array off expressions and append it to the array that is now on top
    else: append word to the array on top of expressions
When the loop exits, there should be exactly one array left on the stack (if not, the opening and closing parentheses are unbalanced).
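Here is one way the pseudocode might look in Ruby. It is a sketch, not the answerer's code: the helper name nest and the tokenizer regex are assumptions, and it only handles non-negative integers, the operators + - * / and parentheses:

```ruby
# Hypothetical helper: turns "1-(2+3)" into ["1", "-", ["2", "+", "3"]].
# Assumes tokens are integers, the operators + - * /, and parentheses.
def nest(term)
  tokens = term.scan(%r{\d+|[-+*/()]})
  stack = [[]]                 # bottom array collects the top-level expression
  tokens.each do |tok|
    case tok
    when "("
      stack.push([])           # open a new sub-expression
    when ")"
      raise ArgumentError, "unbalanced ')'" if stack.size < 2
      inner = stack.pop        # close the current sub-expression...
      stack.last << inner      # ...and append it to the enclosing one
    else
      stack.last << tok        # numbers and operators go into the current array
    end
  end
  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p nest("1-(2+3)")      # => ["1", "-", ["2", "+", "3"]]
p nest("(1+(2*3))-4")  # => [["1", "+", ["2", "*", "3"]], "-", "4"]
```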
Note: if your ultimate goal is to evaluate the expression, you could save time by converting the string to postfix notation, also known as Reverse Polish Notation.
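One common way to get postfix notation is the shunting-yard algorithm; the answer does not name a specific method, so this is a swapped-in technique, shown as a minimal sketch. It assumes integer operands, left-associative + - * /, parentheses, and no unary minus; the names to_rpn, eval_rpn and PRECEDENCE are hypothetical:

```ruby
# Operator precedence; all four operators are treated as left-associative.
PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

# Shunting-yard: convert infix tokens to Reverse Polish Notation.
def to_rpn(term)
  output = []
  ops = []
  term.scan(%r{\d+|[-+*/()]}).each do |tok|
    case tok
    when /\A\d+\z/
      output << tok
    when "("
      ops.push(tok)
    when ")"
      output << ops.pop until ops.empty? || ops.last == "("
      raise ArgumentError, "unbalanced ')'" if ops.empty?
      ops.pop # drop the matching "("
    else
      # Pop operators that bind at least as tightly before pushing this one.
      output << ops.pop while !ops.empty? && ops.last != "(" && PRECEDENCE[ops.last] >= PRECEDENCE[tok]
      ops.push(tok)
    end
  end
  raise ArgumentError, "unbalanced '('" if ops.include?("(")
  output.concat(ops.reverse)
end

# Evaluate RPN with a value stack; "/" is Integer division here.
def eval_rpn(tokens)
  stack = []
  tokens.each do |tok|
    if tok.match?(/\A\d+\z/)
      stack.push(tok.to_i)
    else
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(tok, right))
    end
  end
  stack.pop
end

rpn = to_rpn("1-(2+3)")
p rpn            # => ["1", "2", "3", "+", "-"]
p eval_rpn(rpn)  # => -4
```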
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
\(\s* # match a left parenthesis followed by >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
(\+) # match a plus sign in a capture group
\s* # match >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
\) # match a right parenthesis
/x
str.scan(r).first
#=> ["2", "+", "3"]
Suppose instead that the operator could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
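A sketch of the full pattern with the widened operator group, run on the question's string and on a second made-up string. Note that the slash in the class is escaped because it would otherwise end the regex literal, and the /x comments below avoid the slash character for the same reason:

```ruby
# The same pattern as r above, with the operator group widened to the four operators.
r2 = /
  \(\s*        # left parenthesis and optional whitespace
  (\d+)        # first operand
  \s*
  ([-+*\/])    # operator; "-" comes first so it is literal
  \s*
  (\d+)        # second operand
  \s*
  \)           # right parenthesis
/x

p "1-( 2+ 3)".scan(r2).first    # => ["2", "+", "3"]
p "8*(12 / 4)".scan(r2).first   # => ["12", "/", "4"]
```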
Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain variable-length matches (e.g., .*). For a positive lookbehind you can get around that by using \K instead. For example:
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"
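A short sketch contrasting the three cases: a variable-length lookbehind is rejected when the pattern is compiled, a fixed-length lookbehind is accepted, and \K tolerates a variable-length prefix. The sample strings are made up:

```ruby
# A variable-length lookbehind such as (?<=.*) is rejected when the pattern is compiled.
begin
  Regexp.new("(?<=.*)abc")
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end

# A fixed-length lookbehind is accepted: it requires one digit right before the letters.
p "123abc"[/(?<=\d)[a-z]+/]      # => "abc"

# \K allows a variable-length prefix, which a lookbehind cannot.
p "x123abc"[/[a-z]\d+\K[a-z]+/]  # => "abc"
```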

Username Regular Expression

I need the username to be two or more characters from a-z and 0-9, all lowercase. This is the current regex I am using:
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/i
With this regex, users are able to use uppercase characters in their username. How do I modify the current regex to prevent that?
The regular expression that accepts two to twenty lowercase letters or digits is
/^[a-z0-9]{2,20}$/
which means:
^ at the front of input
a-z accept lower-case 'a' through 'z'
0-9 accept '0' through '9'
{2,20} accept 2 to 20 elements from preceding [] block
$ until the end of input
You can make a regular expression case-insensitive with a trailing i, as in your example; that appears to be the root of the problem. That said, I don't know Ruby's peculiarities with respect to regular expressions.
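One Ruby peculiarity is relevant here: ^ and $ match at the start and end of any line, not of the whole string, so a value containing a newline can slip through. \A and \z, as used in the question's own pattern, anchor to the whole string. A comparison with a made-up input:

```ruby
line_anchored   = /^[a-z0-9]{2,20}$/
string_anchored = /\A[a-z0-9]{2,20}\z/

name = "bob\nBAD NAME!"
p name.match?(line_anchored)       # => true, the first line alone satisfies the pattern
p name.match?(string_anchored)     # => false
p "bob42".match?(string_anchored)  # => true
```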
If you must keep the regex, remove the "i" from the end. Before:
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/i
After:
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/
The "i" makes the regex case-insensitive, but you want it to be case-sensitive and match only lowercase letters.
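A quick check of the pattern without the i flag; the sample usernames are hypothetical:

```ruby
# Hyphens are still allowed after the first character, as in the original USER_REGEX.
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/

%w[bob bob-42 Bob b].each do |name|
  puts "#{name}: #{USER_REGEX.match?(name)}"
end
```

Here "Bob" is rejected because of the uppercase letter and "b" because it is shorter than two characters, while the /i version of the same pattern would accept "Bob".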

Why won't my simple regex pattern match and remove a file extension?

I have a string:
app_copy--28.ipa
The result I want is:
app_copy
The number after -- can vary in length, so I want to match -- and everything after it.
I've tried a few patterns, but none are matching for some reason:
gsub("--\*", "")
gsub("--*", "")
gsub("--*.ipa", "")
gsub("--\[0-9].ipa", "")
What am I missing?
Let's take a look at your test patterns:
"--\*" is actually equivalent to "--*" (since the \* is an escape sequence).
"--*" will match a single - character, followed by zero or more - characters.
"--*.ipa" will match a single - character, followed by zero or more - characters, followed by any single character, followed by a literal ipa.
"--\[0-9].ipa" is actually equivalent to "--[0-9].ipa" (since the \[ is an escape sequence), which will match a literal --, followed by a single decimal digit, followed by any single character, followed by a literal ipa.
However, none of these patterns would work as you used them, because gsub does not treat a String pattern as a regular expression:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally…
You'd need to convert your pattern to a Regexp (using Regexp.new), or use a regular expression literal.
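A sketch of the difference between a String pattern and a Regexp pattern, using the file name from the question:

```ruby
name = "app_copy--28.ipa"

# A String pattern is matched literally: "--*" does not occur in the name, so nothing changes.
p name.gsub("--*", "")                # => "app_copy--28.ipa"

# The same characters are removed when they literally occur in the string.
p "a--*b".gsub("--*", "")             # => "ab"

# Converting the pattern with Regexp.new makes gsub treat it as a regular expression.
p name.gsub(Regexp.new("--.*"), "")   # => "app_copy"
```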
Try this pattern
--.*
This pattern matches a literal -- followed by zero or more characters of any kind (except newline).
For example:
"app_copy--28.ipa".gsub(/--.*/, "") # => "app_copy"
Don't use gsub to change the string; simply use a pattern to match the part you want:
"app_copy--28.ipa"[/^(.+?)--/, 1] # => "app_copy"
String#[] accepts many different types of arguments. You can pass in a pattern and the index of the capture you want, to extract just that part. From the documentation:
str[regexp, capture] → new_str or nil
If a Regexp is supplied, the matching portion of the string is returned. If a capture follows the regular expression, which may be a capture group index or name, follows the regular expression that component of the MatchData is returned instead.
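Since the documentation allows the capture to be a name as well as an index, the same extraction can use a named group. A small sketch; the group name base is arbitrary:

```ruby
name = "app_copy--28.ipa"
p name[/\A(?<base>.+?)--/, "base"]        # => "app_copy"
p name[/\A(?<base>.+?)--/, 1]             # => "app_copy", named groups are numbered too
p "app.ipa"[/\A(?<base>.+?)--/, "base"]   # => nil, there is no "--" to match
```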
How about this?
str = "app_copy--28.ipa"
str[0..str.index("-")-1]
# => "app_copy"
str = "app_copy--28.ipa"
str.split("--").first
# => "app_copy"
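The approaches differ when the suffix is missing. The index("-") version also stops at the first single hyphen (so "my-app--28.ipa" would give "my") and raises NoMethodError when there is no hyphen at all, because index returns nil. A quick comparison of the other three on two hypothetical file names:

```ruby
# Hypothetical inputs: one with the "--NN" suffix, one without.
["app_copy--28.ipa", "plain.ipa"].each do |str|
  results = [
    str.gsub(/--.*/, ""),   # returns the string unchanged when "--" is absent
    str[/^(.+?)--/, 1],     # returns nil when "--" is absent
    str.split("--").first   # returns the whole string when "--" is absent
  ]
  p results
end
```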

Using Regexp to check whether a string starts with a consonant

Is there a better way to write the following regular expression in Ruby? The first regex matches a string that begins with a (lower case) consonant, the second with a vowel.
I'm trying to figure out if there's a way to write a regular expression that matches the negative of the second expression, versus writing the first expression with several ranges.
string =~ /\A[b-df-hj-np-tv-z]/
string =~ /\A[aeiou]/
The statement
string =~ /\A[^aeiou]/
will test whether the string starts with a non-vowel character, which includes digits, punctuation, whitespace and control characters. That is fine if you know beforehand that the string begins with a letter, but to check that it starts with a consonant you can use lookaheads to test that it starts with both a letter and a non-vowel, like this:
string =~ /\A(?=[^aeiou])(?=[a-z])/i
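A sketch applying the lookahead version to a made-up word list; both lookaheads inspect the first character without consuming it, so each condition rules out a different class of strings:

```ruby
# (?=[^aeiou]) rejects vowels; (?=[a-z]) with /i rejects digits, "_" and punctuation.
starts_with_consonant = /\A(?=[^aeiou])(?=[a-z])/i

words = %w[ball area 1word _word Xylophone]
p words.select { |w| w =~ starts_with_consonant }   # => ["ball", "Xylophone"]
```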
To match an arbitrary number of consonants, you can use the subexpression (?i:(?![aeiou])[a-z]) to match a single consonant. Because it is a single group, you can put a repetition count like {3} right after it. For example, this program finds all the strings in a list that start with three consonants in a row:
list = %w/ aab bybt xeix axei AAsE SAEE eAAs xxsa Xxsr /
puts list.select { |word| word =~ /\A(?i:(?![aeiou])[a-z]){3}/ }
output
bybt
xxsa
Xxsr
I modified the answer provided by Alexander Cherednichenko to get rid of the if statement:
/^[^aeiou\W]/i.match(s) != nil
If you want to match a string that doesn't start with a vowel but does start with a consonant, you can use the code below. It returns true if a string starts with any letter other than A, E, I, O, U; s is the string passed to the function.
if /^[^aeiou\W]/i.match(s) == nil
  return false
else
  return true
end
The i at the end makes the regular expression case-insensitive.
\W matches any non-word character (anything other than a letter, digit or underscore); listing it inside [^...] excludes such characters, for example a string that starts with "!".
[^aeiou] is a negated character class: any character except a, e, i, o, u.
The ^ before the [ anchors the match, so the class [^aeiou\W] applies to the first character (of a line).
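As the next answer points out, digits and the underscore are word characters, so excluding \W does not exclude them. A quick check with made-up inputs:

```ruby
pattern = /^[^aeiou\W]/i

%w[ball apple 1something _under].each do |s|
  puts "#{s}: #{pattern.match?(s)}"
end
```

Here "apple" is rejected, but "1something" and "_under" are accepted along with "ball".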
Note that the ^[^aeiou\W] pattern is not correct, because it also matches a line that starts with a digit or an underscore. Borodin's solution works well, but there is another possible solution without lookaheads, based on character class subtraction (intersection with a negated class, &&) and using the more contemporary Regexp#match?:
/\A[a-z&&[^aeiou]]/i.match?(word)
Details
\A - start of a string (^ in Ruby is start of any line)
[a-z&&[^aeiou]] - an a-z character range matching any ASCII letter (/i flag makes it case insensitive) except for the aeiou chars.
See the Ruby demo:
test = %w/ 1word _word ball area programming /
p test.select { |w| /\A[a-z&&[^aeiou]]/i.match?(w) }
# => ["ball", "programming"]

Resources