Why is the IRB prompt changing to an asterisk when I try to match this regex? - ruby

I am a newbie in Ruby, I'm using version 1.9.3. I have the following regular expression:
/\\\//
As far as I know, it should match a string which has the characters '\' and '/', one following the other, right?
I am using the following code in order to get true in case the regex matches the string or symbol in the far right:
!(regex !~ :"string or symbol to match")
I do it this way because =~ gives me the index of the match and I simply want a boolean. Besides, I'm trying to see how ugly or hackish Ruby can look compared to C :P
When I try to match the symbol :\/ the IRB prompt changes to an asterisk, and returns nothing. Why?
When I try to match the string "\/" my little ugly snippet returns false. Why?

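The symbol :\/ is not a valid symbol literal, so IRB cannot finish parsing the line; the asterisk prompt means it is still waiting for the rest of the expression. You could write :'\/' if you want a symbol version of the string '\/'. When you feed it "\/" the result is false because, inside double quotes, \/ is just an escaped slash, so the string is actually '/'. You want either '\/' or "\\/".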
Finally, it's better code and convention to do your test like so:
!!(regex =~ :'\/')
!!(regex =~ '\/')
!!(regex =~ "\\/")
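A minimal sketch of the quoting difference described above. The helper name backslash_slash? is made up for illustration, and the !! idiom is used instead of newer methods so the snippet should also suit an older Ruby such as 1.9.3:

```ruby
# Hypothetical helper: true when the text contains a backslash
# immediately followed by a forward slash.
def backslash_slash?(text)
  !!(/\\\// =~ text)
end

double_quoted = "\/"    # in double quotes, \/ is just an escaped slash
single_quoted = '\/'    # in single quotes, the backslash is kept

puts double_quoted.length               # one character: /
puts single_quoted.length               # two characters: \ and /
puts backslash_slash?(double_quoted)
puts backslash_slash?(single_quoted)
puts backslash_slash?("\\/")            # escaped backslash, then a slash
```

Only the last two calls pass a string that really contains a backslash, which is why the double-quoted "\/" never matches.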

Related

What is wrong with this extremely simple regex?

I'm trying to test that a regex will match a 2-digit number. I get:
11 =~ /^\d{1,2}$/
# => nil
Yet the regex works flawlessly on Rubular. What am I doing wrong?
The problem is that you are testing the regex against a number and not a string. Regexes are intended for matching strings. Simply:
'11' =~ /^\d{1,2}$/
or
11.to_s =~ /^\d{1,2}$/
You are calling Kernel#=~, which always returns nil.
Rubular does not interpret your input as Ruby code; it interprets it as a string literal. That is why it works there.
You are applying the regex to a number instead of a string, so convert the number to a string and try again.
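A short sketch of the fix, assuming a made-up helper named one_or_two_digits? that converts its argument with to_s before matching. It swaps in \A and \z, which anchor to the whole string rather than to a line; that is a choice of this example, not part of the original question. Calling =~ directly on an Integer is avoided here because the fallback =~ is, as far as I know, no longer available in recent Ruby versions.

```ruby
# Hypothetical helper: converts any value to a string, then checks
# that the whole string is one or two digits.
def one_or_two_digits?(value)
  !!(value.to_s =~ /\A\d{1,2}\z/)
end

puts one_or_two_digits?(11)      # Integer, converted with to_s first
puts one_or_two_digits?('11')    # already a String
puts one_or_two_digits?(123)     # three digits, so no match
```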

What does this variable assignment do?

I'm having to code a subversion hook script, and I found a few examples online, mostly python and perl. I found one or two shell scripts (bash) as well. I am confused by a line and am sorry this is so basic a question.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
The script later uses this to perform a test, such as (assume EXT=ex):
if [[ "$FILTER" == *"$EXT"* ]]; then blah
My problem is the above test is true. However, I'm not asking you to assist in writing the script, just explaining the initial assignment of FILTER. I don't understand that line.
Editing in a closer example FILTER line. Of course the script as written does not work, because 'ex' returns true, and not just 'exe'. My problem here, however, is only that I don't understand the layout of the variable assignment itself.
Why is there a period at the beginning? ".(sh..."
Why is there a dollar sign at the end? "...BAT)$"
Why are there pipes between each pattern? "sh|SH|exe"
You are probably looking for something like this:
FILTER="\.(sh|SH|exe|EXE|bat|BAT)$"
for EXT
do
    if [[ "$EXT" =~ $FILTER ]]
    then
        echo "$EXT extension disallowed"
    else
        echo "$EXT is allowed"
    fi
done
Save it as myscript.sh and run it as
bash myscript.sh bash ba.sh
and you will get
bash is allowed
ba.sh extension disallowed
If you don't escape the dot, e.g. with FILTER=".(sh|SH|exe|EXE|bat|BAT)$", you will get
bash extension disallowed
ba.sh extension disallowed
which is (of course) wrong.
For the questions:
Why is there a period at the beginning? ".(sh..."
Because you want to match .sh (as an extension) and not, for example, bash (without the dot). Therefore the . must be escaped as \. because an unescaped . in a regex means "any character".
Why is there a dollar sign at the end? "...BAT)$"
The $ means end of string. You want to match file.sh and not file.sh.jpg; the .sh should be at the end of the string.
Why are there pipes between each pattern? "sh|SH|exe"
In a regex, the (...|...|...) construction delimits the alternatives, as you surely guessed.
You really need to read a regex tutorial; the topic is more complicated than can be explained in one answer.
Ps: NEVER use UPPERCASE variable names, they can collide with environment variables.
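For readers who want to experiment with the pattern outside the shell, here is a rough Ruby sketch of the same comparison. The two patterns are copied from the answer; the constant names and the verdict helper are invented for this example. Ruby's $ anchors at the end of a line rather than the end of the string, which makes no difference for single-line file names.

```ruby
# Both patterns come from the answer; only the leading dot differs.
UNESCAPED = /.(sh|SH|exe|EXE|bat|BAT)$/
ESCAPED   = /\.(sh|SH|exe|EXE|bat|BAT)$/

# Hypothetical helper mirroring the two branches of the shell loop.
def verdict(name, pattern)
  name =~ pattern ? "#{name} extension disallowed" : "#{name} is allowed"
end

%w[bash ba.sh file.sh.jpg].each do |name|
  puts "escaped:   #{verdict(name, ESCAPED)}"
  puts "unescaped: #{verdict(name, UNESCAPED)}"
end
```

With the unescaped dot, the a in bash satisfies "any character" and the trailing sh completes the match, which is why bash is wrongly rejected.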
This just assigns a string to FILTER; the contents of that string have no special meaning to the shell. When you match it against the pattern *ex*, the result is true if the value of $FILTER contains the string ex with anything on either side. That is the case here: ex is a substring of exe.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
                ^^
                |
                +---- here is the "ex" from the pattern.
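The same substring behaviour can be sketched in Ruby. The helper glob_contains? is a made-up stand-in for the shell test [[ "$FILTER" == *"$EXT"* ]] and simply checks whether one string contains another:

```ruby
# Hypothetical stand-in for the shell glob test: is ext a substring of filter?
def glob_contains?(filter, ext)
  filter.include?(ext)
end

filter = '.(sh|SH|exe|EXE|bat|BAT)$'   # just a string, as in the script

puts glob_contains?(filter, 'ex')    # substring of "exe"
puts glob_contains?(filter, 'exe')
puts glob_contains?(filter, '|')     # even the pipe character "matches"
puts glob_contains?(filter, 'jpg')   # not present anywhere in the string
```

Nothing in the string is treated as a pattern here, so any fragment of it, including punctuation, passes the test.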
As far as I can tell, this is similar to a regular expression pattern:
In regular expressions the start of a string is marked with ^; here the leading . seems meant to mark where the extension starts, although an unescaped . actually matches any single character.
Inside the parentheses you have the exact strings, i.e. the file extensions to be matched; they are ORed together using '|'.
The '$' at the end means the match must finish at the end of the string, with nothing after the extension.
I would guess that is how the original author looked at it and implemented it.

Replace specific characters between brackets in ruby

I have a string
str = "'${1:textbox}',[${2:x},${3:y},${4:w},${5:h}]"
and I would like to replace all , between [ and ] with a single space.
I have attempted to use something like
str.gsub!(/(?<=\[)\,*?(?=\])/," ")
without success. However, if I replace \, in my expression with ., I get the expected output:
str.gsub!(/(?<=\[).*?(?=\])/," ")
== "'${1:textbox}',[ ]"
Could someone please explain the proper regex technique to use in this situation, and perhaps also explain why the examples I have posted above have failed and succeeded?
I am using ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin10.8.0]
It may be possible to do this with a single regex, but even if it is, I can guarantee it'll be ugly beyond description. It's a lot simpler to use "nested" substitution - use one gsub to find bracketed substrings, and then use another to swap out the commas:
str.gsub(/\[.*?\]/) do |substr|
  substr.gsub(',', ' ')
end
I'm afraid I can't explain why your attempts have failed - neither of them would run for me (ruby 1.8.7 / irb 0.9.5). IRB gave errors that vaguely said "Bad regexp syntax." And I can't quite grok how they're supposed to work (edit: mu is too short has an awesome breakdown in his answer - check that out). Hope this is helpful anyway!
This regex:
/(?<=\[)\,*?(?=\])/
is looking for an opening bracket followed by a sequence of commas (of any length) followed by a closing bracket. That means things like this:
[]
[,]
[,,,,,,,,,,,]
Your string doesn't look like that so your first gsub! doesn't do anything. If you do this:
'[,,,,,,]'.gsub(/(?<=\[),*?(?=\])/, " ")
You'll get a '[ ]' for your troubles.
Your second regex:
/(?<=\[).*?(?=\])/
works because .*? matches anything (subject to newlines and /m and /s modifiers of course) and the portion of your string between [ and ] certainly qualifies as anything.
If you're trying to produce this:
"'${1:textbox}',[${2:x} ${3:y} ${4:w} ${5:h}]"
then I'd go with Xavier Holt's nested gsub approach, that's simple and clean.
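Putting the two answers together, a small runnable sketch: the helper name spaces_inside_brackets is invented for this example, and it wraps the nested gsub approach. The last two lines show the behaviour of the original lookaround pattern described above.

```ruby
str = "'${1:textbox}',[${2:x},${3:y},${4:w},${5:h}]"

# Hypothetical helper: find each bracketed substring, then replace
# the commas inside it with spaces.
def spaces_inside_brackets(text)
  text.gsub(/\[.*?\]/) { |bracketed| bracketed.gsub(',', ' ') }
end

puts spaces_inside_brackets(str)

# The original pattern only matches runs made up purely of commas.
puts '[,,,]'.gsub(/(?<=\[),*?(?=\])/, ' ')
puts str.gsub(/(?<=\[),*?(?=\])/, ' ') == str   # nothing was replaced
```

The comma between the quoted textbox placeholder and the opening bracket is left alone because it sits outside the bracketed substring.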

Ruby string containing ${...}

In the Ruby string:
"${0} ${1} ${2:hello}"
is ${i} the ith argument of the command that called this particular file?
I tried searching the web for "Ruby ${0}", but the search engines don't like non-alphanumeric characters.
I consulted a Ruby book, which says #{...} will substitute the result of the code in the braces, but it does not mention ${...}. Is this a special syntax to substitute argument values into a string? Thanks very much,
Joel
As mentioned above, ${0} does nothing special inside a string. $0 gives the name of the script, and $1 gives the first capture group of the last regular expression match.
To interpolate a command line argument you'd normally do this:
puts "first argument = #{ARGV[0]}"
However, ARGV is also aliased as $* so you could also write
puts "first argument = #{$*[0]}"
Perhaps that's where the confusion arose?
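A brief sketch of the difference, assuming a made-up helper describe_first that takes any argument list (so it can be tried without real command-line arguments):

```ruby
# Hypothetical helper: formats the first element of an argument list.
def describe_first(args)
  "first argument = #{args[0]}"
end

literal = "${0} ${1} ${2:hello}"   # no interpolation: only #{...} is special
puts literal

puts describe_first(ARGV)            # ARGV holds the command-line arguments
puts describe_first(%w[alpha beta])  # stand-in list for experimenting

"key=value" =~ /(\w+)=(\w+)/
first_capture = $1                   # first capture group of the last match
puts first_capture
```

The string with ${...} is printed exactly as written, which suggests the syntax belongs to whatever tool consumes the string (for example a snippet or template format) rather than to Ruby itself; that last part is a guess, since the question does not say where the string came from.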

Ruby RegEx problem text.gsub[^\W-], '') fails

I'm trying to learn RegEx in Ruby, based on what I'm reading in "The Rails Way". But, even this simple example has me stumped. I can't tell if it is a typo or not:
text.gsub(/\s/, "-").gsub([^\W-], '').downcase
It seems to me that this would replace all spaces with -, then anywhere a string starts with a non letter or number followed by a dash, replace that with ''. But, using irb, it fails first on ^:
syntax error, unexpected '^', expecting ']'
If I take out the ^, it fails again on the W.
>> text = "I love spaces"
=> "I love spaces"
>> text.gsub(/\s/, "-").gsub(/[^\W-]/, '').downcase
=> "--"
Missing //
Although this makes a little more sense :-)
>> text.gsub(/\s/, "-").gsub(/([^\W-])/, '\1').downcase
=> "i-love-spaces"
And this is probably what is meant
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
\W means "a non-word character"
\w means "a word character"
The // generate a regexp object
/[^\W-]/.class
=> Regexp
Step 1: Add this to your bookmarks. Whenever I need to look up regexes, it's my first stop
Step 2: Let's walk through your code
text.gsub(/\s/, "-")
You're calling the gsub function, and giving it 2 parameters.
The first parameter is /\s/, which is Ruby for "create a new regexp containing \s" (the // are like special "" for regexes).
The second parameter is the string "-".
This will therefore replace all whitespace characters with hyphens. So far, so good.
.gsub([^\W-], '').downcase
Next you call gsub again, passing it 2 parameters.
The first parameter is [^\W-]. Because we didn't wrap it in forward slashes, Ruby will literally try to run that code. [] creates an array, then it tries to put ^\W- into the array, which is not valid code, so it breaks.
Changing it to /[^\W-]/ gives us a valid regex.
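Looking at the regex, the [] says "match any character in this group", and the leading ^ negates the group. The group contains \W (which means non-word character) and -, so the negated class matches any character that is neither a non-word character nor a hyphen, i.e. any word character.
As the second thing you pass to gsub is an empty string, it ends up replacing all the word characters with the empty string (thereby stripping them out), which is almost certainly not what the book intended. /[^\w-]/ with a lowercase w strips everything except word characters and hyphens.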
.downcase
Which just converts the string to lower case.
Hope this helps :-)
You forgot the slashes. It should be /[^\W-]/
Well, .gsub(/[^\W-]/,'') says: replace anything that is neither a non-word character nor a - with nothing. In other words, it removes every word character.
You probably want
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
Lower case \w (\W is just the opposite)
The slashes are to say that the thing between them is a regular expression, much like quotes say the thing between them is a string.
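To compare the two character classes side by side, here is a small sketch. The helper name slugify is made up for this example; it runs the chain from the question with the stripping pattern passed in as an argument.

```ruby
# Hypothetical helper: the chain from the question, with the
# character class to strip supplied by the caller.
def slugify(text, strip_pattern)
  text.gsub(/\s/, "-").gsub(strip_pattern, '').downcase
end

puts slugify("I love spaces", /[^\W-]/)    # prints "--"
puts slugify("I love spaces!", /[^\w-]/)   # prints "i-love-spaces"
```

With the uppercase \W every letter is removed and only the hyphens survive; with the lowercase \w only the punctuation is removed.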
