How do I print a Ruby regex variable? - ruby

How do I print/display just the part of a regular expression that is between the slashes?
irb> re = /\Ahello\z/
irb> puts "re is /#{re}/"
The result is:
re is /(?-mix:\Ahello\z)/
Whereas I want:
re is /\Ahello\z/
...but not by doing this craziness:
puts "re is /#{re.to_s.gsub( /.*:(.*)\)/, '\1' )}/"

If you want to see the original pattern between the delimiters, use source:
IP_PATTERN = /(?:\d{1,3}\.){3}\d{1,3}/
IP_PATTERN # => /(?:\d{1,3}\.){3}\d{1,3}/
IP_PATTERN.inspect # => "/(?:\\d{1,3}\\.){3}\\d{1,3}/"
IP_PATTERN.to_s # => "(?-mix:(?:\\d{1,3}\\.){3}\\d{1,3})"
Here's what source shows:
IP_PATTERN.source # => "(?:\\d{1,3}\\.){3}\\d{1,3}"
From the documentation:
Returns the original string of the pattern.
/ab+c/ix.source #=> "ab+c"
Note that escape sequences are retained as is.
/\x20\+/.source #=> "\\x20\\+"
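Applied to the regex from the question, this could look like the minimal sketch below. The variable name shown is only illustrative and not part of the original post.

```ruby
re = /\Ahello\z/

# Regexp#source returns only the text between the delimiters,
# so the slashes are added back by hand in the string.
shown = "re is /#{re.source}/"
puts shown
# prints: re is /\Ahello\z/
```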
NOTE:
It's common to build a complex pattern from small patterns, and it's tempting to use interpolation to insert the simple ones, but that doesn't work as most people think it will. Consider this:
foo = /foo/
bar = /bar/imx
foo_bar = /#{ foo }_#{ bar }/
foo_bar # => /(?-mix:foo)_(?mix:bar)/
Notice that foo_bar has the pattern flags for each of the sub-patterns. Those can REALLY mess you up when trying to match things if you're not aware of their existence. Inside the (?-...) block the pattern can have totally different settings for i, m or x in relation to the outer pattern. Debugging that can make you nuts, worse than trying to debug a complex pattern normally would. How do I know this? I'm a veteran of that particular war.
This is why source is important. It injects the exact pattern, without the flags:
foo_bar = /#{ foo.source}_#{ bar.source}/
foo_bar # => /foo_bar/
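To make the effect on matching concrete, here is a small sketch under a few assumptions: the outer /i flag and the test string "FOO_BAR" are made up for illustration, and Regexp#match? needs Ruby 2.4 or later.

```ruby
foo = /foo/
bar = /bar/

# Interpolating the Regexp objects embeds their flags: each piece becomes
# (?-mix:...), which switches case-insensitivity OFF inside that group.
embedded = /#{foo}_#{bar}/i

# Interpolating the sources leaves the outer /i in charge of everything.
plain = /#{foo.source}_#{bar.source}/i

p embedded.source             # "(?-mix:foo)_(?-mix:bar)"
p plain.source                # "foo_bar"
p embedded.match?("FOO_BAR")  # false, the groups are still case-sensitive
p plain.match?("FOO_BAR")     # true
```

The outer flag on embedded is silently overridden by the flags baked into each interpolated piece, which is the kind of surprise described above.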

Use .inspect instead of .to_s:
> puts "re is #{re.inspect}"
re is /\Ahello\z/
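One difference worth knowing, sketched below: inspect includes the delimiters and any trailing flags, while source gives only the pattern body. The /i flag here is added purely for illustration and is not in the original question.

```ruby
re = /\Ahello\z/i

inspected = re.inspect
body = re.source

puts inspected   # prints: /\Ahello\z/i
puts body        # prints: \Ahello\z

# The flags are still available separately through Regexp#options.
p re.options & Regexp::IGNORECASE   # 1
```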

Related

How do I lookup a key/symbol based on which Regex match?

I am extracting files from a zip archive in Ruby using RubyZip, and I need to label files based on characteristics of their filenames:
Example:
I have the following hash:
labels = {
:data_file=>/.\.dat/i,
:metadata=>/.\.xml/i,
:text_location=>/.\.txt/i
}
So, I have the file name of each file in the zip, let's say an example is
filename = "382582941917841df.xml"
Assume that each file will match only one Regex in the labels hash, and if not it doesn't matter, just choose the first match. (In this case the regular expressions are all for detecting extensions, but they could detect any filename mask, like DSC****.jpg for example.)
I am doing this now:
label_match = labels.find {|key,value| filename =~ value}
---> label_match = [:metadata, /.\.xml/i]
label_sym = label_match.nil? ? nil: label_match.first
This works fine, but it doesn't seem very Ruby-like. Is there something I am missing to clean this up nicely?
A case when does this effortlessly:
filename = "382582941917841df.xml"
category = case filename
when /.\.dat/i ; :data_file
when /.\.xml/i ; :metadata
when /.\.txt/i ; :text_location
end
p category # => :metadata ; nil if nothing matched
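If the patterns need to stay in a hash (for example because they are loaded at run time), a hash-driven lookup is also possible. This is only a sketch: the constant LABELS and the method label_for are hypothetical names, and the patterns are anchored with \z here, which is an assumption about the intent rather than what the question used.

```ruby
LABELS = {
  data_file:     /\.dat\z/i,
  metadata:      /\.xml\z/i,
  text_location: /\.txt\z/i
}.freeze

# Returns the first label whose pattern matches, or nil when none does.
def label_for(filename, labels = LABELS)
  pair = labels.find { |_label, pattern| pattern =~ filename }
  pair && pair.first
end

p label_for("382582941917841df.xml")  # :metadata
p label_for("notes.md")               # nil
```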
I think you're doing it backwards and the hard way. Ruby makes it easy to get the extension of a file, which then makes it easy to map it to something.
Starting with something like:
FILENAMES = %w[ foo.bar foo.baz 382582941917841df.xml DSC****.jpg]
FILETYPES = {
'.bar' => 'bar',
'.baz' => 'baz',
'.xml' => 'metadata',
'.dat' => 'data',
'.jpg' => 'image'
}
FILENAMES.each do |fn|
puts "#{ fn } is a #{ FILETYPES[File.extname(fn)] } file"
end
# >> foo.bar is a bar file
# >> foo.baz is a baz file
# >> 382582941917841df.xml is a metadata file
# >> DSC****.jpg is a image file
File.extname is built into Ruby. The File class contains many similar methods useful for finding out things about files known by the OS and/or tearing apart file paths and file names so it's a really good thing to become very familiar with.
It's also important to understand that an improperly written regexp, such as /.\.dat/i can be the source of a lot of pain. Consider these:
'foo.xml.dat'[/.\.dat/] # => "l.dat"
'foo.database.20010101.csv'[/.\.dat/] # => "o.dat"
Are the files really "data" files?
Why is the character in front of the delimiting . important or necessary?
Do you really want to slow your code down with unanchored regexp patterns when a method such as extname will be faster and need less maintenance?
Those are things to consider when writing code.
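For comparison, here is how File.extname treats the two troublesome names above. This is a sketch only: EXTENSION_LABELS and label_for_extension are hypothetical names, and downcasing the extension is an assumption meant to mirror the /i flag in the question's patterns.

```ruby
EXTENSION_LABELS = {
  ".dat" => :data_file,
  ".xml" => :metadata,
  ".txt" => :text_location
}.freeze

# Looks up a label by the file's real (last) extension; nil if unknown.
def label_for_extension(filename)
  EXTENSION_LABELS[File.extname(filename).downcase]
end

p File.extname("foo.xml.dat")                        # ".dat"
p File.extname("foo.database.20010101.csv")          # ".csv"
p label_for_extension("foo.xml.dat")                 # :data_file
p label_for_extension("foo.database.20010101.csv")   # nil
p label_for_extension("REPORT.XML")                  # :metadata
```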
Rather than using nil to indicate the label when there is no match, consider using another symbol like :unknown.
Then you can do:
labels = {
:data_file=>/.\.dat/i,
:metadata=>/.\.xml/i,
:text_location=>/.\.txt/i,
:unknown=>/.*/
}
label = labels.find {|key,value| filename =~ value}.first
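A quick sketch of why this never raises on nil. It relies on hashes preserving insertion order (true since Ruby 1.9), so the catch-all must stay last. The method name first_label and the sample file names are made up for illustration.

```ruby
labels = {
  data_file:     /.\.dat/i,
  metadata:      /.\.xml/i,
  text_location: /.\.txt/i,
  unknown:       /.*/      # catch-all; must remain the last entry
}

# find always returns a pair here, because /.*/ matches any string.
def first_label(labels, filename)
  labels.find { |_key, pattern| filename =~ pattern }.first
end

p first_label(labels, "382582941917841df.xml")  # :metadata
p first_label(labels, "README")                 # :unknown
```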

inserting variable value in regex in ruby script

I have a Ruby script that does pattern matching. My input string looks like this:
this.plugin = document.getElementById("pluginPlayer");
My regex looks like this:
regxPlayerVariable = '(.*?)=.*?document\.getElementById\("#{Regexp.escape(pluginPlayeVariable)}"\)'
Here pluginPlayeVariable is a variable, but the pattern is not matching the input string.
If I change my regex and replace the variable with its value, it works fine, but I can't do that because it's a run-time value that changes.
I also tried this variation:
regxPlayerVariable = '(.*?)=.*?document\.getElementById\("#{pluginPlayeVariable}"\)'
How can I solve this issue?
First of all, regxPlayerVariable is not a Regexp, it's a String. And the reason why your interpolation does not work is because you are using single quotes. Look:
foo = "bar"
puts '#{foo}' # => #{foo}
puts "#{foo}" # => bar
puts %q{#{foo}} # => #{foo}
puts %Q{#{foo}} # => bar
puts %{#{foo}} # => bar
puts /#{foo}/ # => (?-mix:bar)
puts %r{#{foo}} # => (?-mix:bar)
Only the last two are actually regular expressions, but here you can see which quoting expressions do interpolation, and which don't.
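Putting that together for the question's case, a minimal sketch might look like this. The snake_case variable plugin_player_variable stands in for the question's pluginPlayeVariable, and its value is hard-coded here only so the example runs.

```ruby
# Hypothetical stand-in for the run-time value from the question.
plugin_player_variable = "pluginPlayer"
input = 'this.plugin = document.getElementById("pluginPlayer");'

# A regex literal interpolates like a double-quoted string.
# Regexp.escape keeps metacharacters in the run-time value literal.
pattern = /(.*?)=.*?document\.getElementById\("#{Regexp.escape(plugin_player_variable)}"\)/

match = pattern.match(input)
p match[1].strip                 # "this.plugin"

# What Regexp.escape does to a value containing a metacharacter:
p Regexp.escape("player.v2")     # "player\\.v2"
```

Using a regex literal (or Regexp.new with a double-quoted string) gives a real Regexp object, rather than a String that merely looks like one.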

Ruby on Rails - How do I split string and Number?

I have a string "FooFoo2014".
I want the result to be => "Foo Foo 2014"
Any idea?
This works fine:
puts "FooFoo2014".scan(/(\d+|[A-Z][a-z]+)/).join(' ')
# => Foo Foo 2014
Of course, this assumes that every word starts with a capital letter and that numbers are separate from the words.
"FooFoo2014"
.gsub(/(?<=\d)(?=\D)|(?<=\D)(?=\d)|(?<=[a-z])(?=[A-Z])/, " ")
# => "Foo Foo 2014"
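The two approaches above agree on the sample input but differ on other inputs. The sketch below wraps each in a method to show that; the method names, the BOUNDARY constant, and the extra test strings are made up for illustration.

```ruby
# Zero-width boundaries: digit/non-digit transitions and lower-to-upper case.
BOUNDARY = /(?<=\d)(?=\D)|(?<=\D)(?=\d)|(?<=[a-z])(?=[A-Z])/

# Collects capitalized words and digit runs; anything else is dropped.
def split_by_scan(text)
  text.scan(/\d+|[A-Z][a-z]+/).join(" ")
end

# Inserts a space at each boundary; nothing is dropped.
def split_by_boundary(text)
  text.gsub(BOUNDARY, " ")
end

p split_by_scan("FooFoo2014")         # "Foo Foo 2014"
p split_by_boundary("FooFoo2014")     # "Foo Foo 2014"
p split_by_scan("fooBar2014")         # "Bar 2014"
p split_by_boundary("fooBar2014")     # "foo Bar 2014"
```

The scan version silently discards text that does not fit its pattern (the leading "foo"), whereas the lookaround version only inserts separators.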
Your example is a little generic, so this might be guessing in the wrong direction. That being said, it seems like you want to reformat the string a little:
"FooFoo2014".scan(/^([A-Z].*)([A-Z]\D*)(\d+)$/).flatten.join(" ")
As "FooFoo2014" is a string with some internal structure important to you, you need to come up with the right regular expression yourself.
From your question, I extract two tasks:
split the FooFoo at the capital letter.
/([A-Z].*)([A-Z].*)/ would do that, given you only have standard latin letters
split the letter from the digits
/(.*\D)(\d+)/ achieves that.
The result of scan is an array in my version of ruby. Please verify that in your setup.
If you think that regular expressions are too complicated for this, I suggest that you take a good look into ActiveSupport. http://api.rubyonrails.org/v3.2.1/ might help you.
If it's only letters followed by digits:
target = "FooFoo2014"
match_data = target.match(/([A-Za-z]+)(\d+)/)
p match_data[1] # => "FooFoo"
p match_data[2] # => "2014"
If it is two words each made of one capitalized letter then lowercase letters, then digits:
target = "FooBar2014"
match_data = target.match(/([A-Z][a-z]+)([A-Z][a-z]+)(\d+)/)
p match_data[1] # => "Foo"
p match_data[2] # => "Bar"
p match_data[3] # => "2014"
Better regex are probably possible.
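One possible refinement is to use named captures, so the pieces are referred to by name instead of by index. This is only a sketch: the constant WORDS_AND_YEAR, the method split_label, and the group names first, second and year are all hypothetical.

```ruby
WORDS_AND_YEAR = /(?<first>[A-Z][a-z]+)(?<second>[A-Z][a-z]+)(?<year>\d+)/

# Returns the pieces joined by spaces, or nil when the input
# does not have the two-words-then-digits shape.
def split_label(target)
  m = WORDS_AND_YEAR.match(target)
  return nil unless m

  "#{m[:first]} #{m[:second]} #{m[:year]}"
end

p split_label("FooBar2014")   # "Foo Bar 2014"
p split_label("FooFoo2014")   # "Foo Foo 2014"
p split_label("Foo2014")      # nil
```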

Couldn't understand why the Regexp option i got disabled in my code

I have just started playing with Ruby and I'm stuck on something. Is
there some trick to modify the casefold attribute of a Regexp object after
it's been instantiated?
The best idea I have tried is the following:
irb(main):001:0> a = Regexp.new('a')
=> /a/
irb(main):002:0> aA = Regexp.new(a.to_s, Regexp::IGNORECASE)
=> /(?-mix:a)/i
But none of the below seems to work:
irb(main):003:0> a =~ 'a'
=> 0
irb(main):004:0> a =~ 'A'
=> nil
irb(main):005:0> aA =~ 'a'
=> 0
irb(main):006:0> aA =~ 'A'
=> nil
Something I don't understand is happening here. Where did the 'i' go on line 8?
irb(main):07:0> aA = Regexp.new(a.to_s, Regexp::IGNORECASE)
=> /(?-mix:a)/i
irb(main):08:0> aA.to_s
=> "(?-mix:a)"
irb(main):09:0>
I am using Ruby 1.9.3.
I am also unable to understand why the code below returns false:
/(?i:a)/.casefold? #=> false
As your console output shows, a.to_s includes the case sensitiveness as an option for your subexpression, so aA is being defined as
/(?-mix:a)/i
So you're asking Ruby for a regular expression that is case-insensitive, but the only thing inside that case-insensitive regexp is a group in which case insensitivity has been explicitly turned off, so the net effect is that 'a' is matched case-sensitively.
Since the result of to_s is just the regular expression string itself - no delimiters or external flags - the flags are translated into the (?i:...) syntax that sets or clears them temporarily inside the expression itself. This lets you get a Regexp object back out via a simple Regexp.new(s) call that will match the same strings.
The wrapping, unfortunately, includes explicitly clearing the flags that are not set on the object. So your first regex gets stringified into something of the form (?-mix:...) - that is, the casefold option (i) is explicitly turned off between the parentheses. Turning it back on for the object doesn't have any effect.
You can use a.source instead of a.to_s to get just the original expression, without the flag settings:
irb(main):001:0> a=/a/
=> /a/
irb(main):002:0> aA = Regexp.new(a.source, Regexp::IGNORECASE)
=> /a/i
irb(main):003:0> a =~ 'a'
=> 0
irb(main):004:0> a =~ 'A'
=> nil
irb(main):005:0> aA =~ 'a'
=> 0
irb(main):006:0> aA =~ 'A'
=> 0
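If the original regex already carries other flags that should be kept, one option is to combine its options with the new flag. This is a sketch under assumptions: the /x pattern is invented for illustration, and it assumes options holds only the i, m and x bits, which is the case for a plain ASCII literal like this one.

```ruby
a = /a b/x   # extended mode: whitespace inside the pattern is ignored

# Keep whatever flags a already has and add IGNORECASE on top.
combined = Regexp.new(a.source, a.options | Regexp::IGNORECASE)

p combined             # /a b/ix
p combined.casefold?   # true
p combined =~ "AB"     # 0
```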
As Frederick already explains, calling to_s on a regex will add modifiers around it that ensure that its properties like case-sensitiveness are preserved. So if you insert a case-sensitive regex into a case-insensitive regex, the inserted part will still be case-sensitive. Likewise the modifiers given to Regexp.new will have no effect if the first argument is a regex or the result of calling to_s on one.
To solve this issue, call source on the regex instead of to_s. Unlike to_s, source simply returns the source of the regex without adding anything:
aA = Regexp.new(a.source, Regexp::IGNORECASE)
I am also unable to understand why the code below returns false:
/(?i:a)/.casefold?
Because (?i:...) sets the i flag locally, not globally. It only applies to the part of the regex within the parentheses, not the whole regex. Of course in this case the whole regex is within the parentheses, but that doesn't matter as far as methods like casefold? are concerned.
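A short sketch of the distinction: the inline group still matches case-insensitively, but the object-level flag that casefold? reports is not set. The variable names inline and global are just for illustration.

```ruby
inline = /(?i:a)/   # flag set locally, inside the group
global = /a/i       # flag set on the Regexp object itself

p inline =~ "A"                           # 0, the group does ignore case
p inline.casefold?                        # false
p inline.options & Regexp::IGNORECASE     # 0
p global.casefold?                        # true
```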

How does Ruby's replace work?

I'm looking at ruby's replace: http://www.ruby-doc.org/core/classes/String.html#M001144
It doesn't seem to make sense to me, you call replace and it replaces the entire string.
I was expecting:
replace(old_value, new_value)
Is what I am looking for gsub then?
replace seems to be different than in most other languages.
I agree that replace is generally used as some sort of pattern replace in other languages, but Ruby is different :)
Yes, you are thinking of gsub:
ruby-1.9.2-p136 :001 > "Hello World!".gsub("World", "Earth")
=> "Hello Earth!"
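For completeness, here is a small sketch of the closely related methods. The sample string is made up: sub replaces only the first occurrence, gsub replaces all of them, and both also accept a regex and a block.

```ruby
greeting = "Hello World, World!"

first_only = greeting.sub("World", "Earth")
all        = greeting.gsub("World", "Earth")
with_block = greeting.gsub(/o/) { |m| m.upcase }

p first_only   # "Hello Earth, World!"
p all          # "Hello Earth, Earth!"
p with_block   # "HellO WOrld, WOrld!"
p greeting     # "Hello World, World!" (sub and gsub return new strings)
```

The bang variants sub! and gsub! modify the receiver in place instead of returning a new string.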
One thing to note is that String#replace may seem pointless; however, it does remove "taintedness". You can read more about tainted objects here.
I suppose the reason you feel that replace does not make sense is that there is already the assignment operator = (which has little to do with gsub).
The important point is that String instances are mutable objects. By using replace, you can change the content of the string while retaining its identity as an object. Compare:
a = 'Hello' # => 'Hello'
a.object_id # => 84793190
a.replace('World') # => 'World'
a.object_id # => 84793190
a = 'World' # => 'World'
a.object_id # => 84768100
Notice that replace has not changed the string object's id, whereas simple assignment did change it. This difference has some consequences. For example, suppose you had set some instance variables on the string instance. With replace, that information is retained, but if you simply assign the variable to a different string, all that information is gone.
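The same point can be sketched with a second reference to the string. The variable names and the @origin instance variable are invented for illustration, and String.new is used so that the string is unfrozen even when frozen string literals are enabled.

```ruby
original = String.new("Hello")
other = original                                   # second reference, same object
original.instance_variable_set(:@origin, "greeting.txt")
id_before = original.object_id

original.replace("World")                          # mutates the shared object
p other                                            # "World"
p original.object_id == id_before                  # true
p original.instance_variable_get(:@origin)         # "greeting.txt"

original = "Again"                                 # rebinds the variable only
p other                                            # "World"
p other.equal?(original)                           # false
```

Every reference to the object sees the change made by replace, while plain assignment only changes what one variable points to.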
Yes, it is gsub and it is taken from awk syntax. I guess replace stands for the internal representation of the string, since, according to documentation, tainted-ness is removed too.
