How can I match everything between a pair of / characters with treetop? I would also like to match escaped / characters as well. For example, if I were to parse a "regex":
/blarg: dup\/md5 [0-9a-zA-Z]{32}/
The result would return:
blarg: dup\/md5 [0-9a-zA-Z]{32}
This should match everything inside two / characters including escaped slashes. I'm using Ruby's DATA __END__ feature so that everything can run in a single file.
Also, note that you can tag parts of a parsed expression and then use them as functions. In the example below I tagged inside. This could also have been accessed as elements[1] instead of being tagged.
This works similarly to matching a string, which you can find in the treetop docs.
require 'treetop'
Treetop.load_from_string DATA.read
parser = RegexParser.new
puts parser.parse('/blarg: dup\/md5 [0-9a-zA-Z]{32}/').inside.text_value
# => blarg: dup\/md5 [0-9a-zA-Z]{32}
__END__
grammar Regex
rule regex
'/' inside:('\/' / !'/' .)* '/'
end
end
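For comparison, the same idea (repeat "an escaped slash, or any character that is not a slash") can be sketched as a plain Ruby regex without treetop. This is only an illustration: SLASH_DELIMITED and inside_slashes are hypothetical names, not part of treetop or of the grammar above.

```ruby
# Hypothetical plain-regex counterpart of the grammar's `inside` part:
# a backslash-slash pair, or any single character that is not a slash.
SLASH_DELIMITED = %r{\A/((?:\\/|[^/])*)/\z}

# Returns the text between the outer slashes, or nil if the input
# is not a single /.../ expression.
def inside_slashes(str)
  match = SLASH_DELIMITED.match(str)
  match && match[1]
end

puts inside_slashes('/blarg: dup\/md5 [0-9a-zA-Z]{32}/')
# prints: blarg: dup\/md5 [0-9a-zA-Z]{32}
```

The escaped pair is listed first in the alternation, which mirrors the ordered choice in the grammar.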
Working on a Ruby challenge to convert dash/underscore delimited words into camel casing. The first word within the output should be capitalized only if the original word was capitalized (known as Upper Camel Case).
My solution so far..:
def to_camel_case(str)
str.split('_,-').collect.camelize(:lower).join
end
However, .camelize(:lower) is a Rails method, I believe, and doesn't work in plain Ruby. Is there an equally simple alternative? I can't seem to find one. Or do I need to approach the challenge from a completely different angle?
main.rb:4:in `to_camel_case': undefined method `camelize' for #<Enumerator: []:collect> (NoMethodError)
from main.rb:7:in `<main>'
I assume that:
Each "word" is made up of one or more "parts".
Each part is made up of characters other than spaces, hyphens and underscores.
The first character of each part is a letter.
Each successive pair of parts is separated by a hyphen or underscore.
It is desired to return a string obtained by modifying each part and removing the hyphen or underscore that separates each successive pair of parts.
For each part all letters but the first are to be converted to lowercase.
All characters in each part of a word that are not letters are to remain unchanged.
The first letter of the first part is to remain unchanged.
The first letter of each part other than the first is to be capitalized (if not already capitalized).
Words are separated by spaces.
If this describes the problem correctly, the following method could be used.
R = /(?:(?<=^| )|[_-])[A-Za-z][^ _-]*/
def to_camel_case(str)
str.gsub(R) do |s|
c1 = s[0]
case c1
when /[A-Za-z]/
c1 + s[1..-1].downcase
else
s[1].upcase + s[2..-1].downcase
end
end
end
to_camel_case "Little Miss-muffet sat_on_HE$R Tuffett eating-her_cURDS And_whey"
# => "Little MissMuffet satOnHe$r Tuffett eatingHerCurds AndWhey"
The regular expression can be written in free-spacing mode to make it self-documenting.
R = /
(?: # begin non-capture group
(?<=^|[ ]) # use a positive lookbehind to assert that the next character
# is preceded by the beginning of a line or a space (the space sits in a
# character class because free-spacing mode ignores bare spaces)
| # or
[_-] # match '_' or '-'
) # end non-capture group
[A-Za-z] # match a letter
[^ _-]* # match 0+ characters other than ' ', '_' and '-'
/x # free-spacing regex definition mode
Most Rails methods can be added into basic Ruby projects without having to pull in the whole Rails source.
The trick is to figure out the minimum number of files to require in order to define the method you need. If we go to APIDock, we can see that camelize is defined in active_support/inflector/methods.rb.
Therefore active_support/inflector seems like a good candidate to try. Let's test it:
irb(main)> require 'active_support/inflector'
=> true
irb(main)> 'foo_bar'.camelize
=> "FooBar"
Seems to work. Note that this assumes you already ran gem install activesupport earlier. If not, then do it first (or add it to your Gemfile).
In pure Ruby, no Rails, given str = 'my-var_name' you could do:
delimiters = Regexp.union(['-', '_'])
str.split(delimiters).then { |first, *rest| [first, rest.map(&:capitalize)].join }
#=> "myVarName"
Where str = 'My-var_name' the result is "MyVarName", since the first element of the split result is left untouched while the rest are capitalized.
This works only for "dash/underscore delimited words" without spaces; otherwise you need to split by spaces first, then map each word through the method above.
The method splits the string by multiple delimiters (as explained in "Split string by multiple delimiters")
and chains the result with Object#then.
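A minimal sketch of the "split by spaces, then map" variant mentioned above, assuming words are separated by single spaces. The names DELIMITERS, camelize_word and camelize_sentence are made up for this example.

```ruby
DELIMITERS = Regexp.union('-', '_')

# Camel-case one dash/underscore delimited word, leaving its first part untouched.
def camelize_word(word)
  first, *rest = word.split(DELIMITERS)
  [first, *rest.map(&:capitalize)].join
end

# Apply camelize_word to every space-separated word of the input.
def camelize_sentence(str)
  str.split(' ').map { |word| camelize_word(word) }.join(' ')
end

puts camelize_sentence('my-var_name Another_example-here')
# prints: myVarName AnotherExampleHere
```

Note that String#capitalize also downcases the remaining letters of each part, so 'my_VAR' becomes "myVar".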
Consider:
regex1 = /\A[a-z0-9\-\_]+\z/
regex2 = remove_anchors(regex1) # => /[a-z0-9\-\_]+/
How to implement a remove_anchors function that programmatically removes any anchors (\A, \z, ^, $) from regex1, producing regex2? Is it even possible to modify an existing regular expression like this in Ruby?
You can use the following function:
def remove_anchors(regex)
  pattern = regex.source.gsub(/\A(?:\\A|\^)|(?:\\[zZ]|\$)\z/, '')
  Regexp.new(pattern)
end
And here is an IDEONE demo
The regex literal notation /.../ compiles the regex, and its string pattern can be obtained via the source method. With gsub, a leading ^ or \A and a trailing $, \z or \Z can then be removed from that string pattern.
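As a rough usage sketch, here is a variant that also carries over the original flags by passing regex.options to Regexp.new. The name strip_outer_anchors is hypothetical, and the helper only looks at the very start and end of the pattern, so anchors inside groups or alternations are left alone.

```ruby
# Hypothetical variant: strips one leading (\A or ^) and one trailing
# (\z, \Z or $) anchor and keeps the flags of the original regex.
def strip_outer_anchors(regex)
  pattern = regex.source.gsub(/\A(?:\\A|\^)|(?:\\[zZ]|\$)\z/, '')
  Regexp.new(pattern, regex.options)
end

anchored   = /\A[a-z0-9\-_]+\z/i
unanchored = strip_outer_anchors(anchored)

puts anchored.source                 # prints: \A[a-z0-9\-_]+\z
puts unanchored.source               # prints: [a-z0-9\-_]+
puts anchored.match?('***ABC***')    # prints: false
puts unanchored.match?('***ABC***')  # prints: true (unanchored, still case-insensitive)
```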
Is it even possible to modify an existing regular expression like this in Ruby?
No, it is not possible to modify an existing Regexp at all in Ruby.
You can just look at the available methods and you will immediately see that there are no mutating methods.
There is exactly one method, which allows you to build a new Regexp from one or more existing Regexps, namely Regexp::union, but that won't help you here.
Pretty much the only thing you can do is get a String representation of the Regexp using Regexp#to_s (or Regexp#source, which omits the option flags), then parse that String, remove the anchors textually, and create a new Regexp from the String via Regexp::new. Note, however, that the syntax of Ruby Regexps is anything but trivial to parse, so this is not a simple endeavor.
It appears there is no documentation for the syntax of Ruby's Regexps, so you will have to look at the parser: regparse.c
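To make the "string representation" step concrete, here is a small sketch of what the two string forms look like and how a new Regexp can be built from one of them. The constants are named for this example only, and the textual edit shown is deliberately naive: it handles just this one pattern, not Regexp syntax in general.

```ruby
ANCHORED = /\A\d+\z/i

AS_STRING = ANCHORED.to_s    # flags embedded in the text
AS_SOURCE = ANCHORED.source  # pattern text only

puts AS_STRING  # prints: (?i-mx:\A\d+\z)
puts AS_SOURCE  # prints: \A\d+\z

# Naive textual edit for this one pattern, then a brand-new Regexp object.
REBUILT = Regexp.new(
  AS_SOURCE.delete_prefix('\A').delete_suffix('\z'),
  ANCHORED.options
)

puts REBUILT.inspect  # prints: /\d+/i
```

ANCHORED itself is never modified; REBUILT is a separate object.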
According to your comments, you're actually trying to use the regular expression from the Semantic gem in your routes:
module Semantic
class Version
SemVerRegexp = /\A(\d+\.\d+\.\d+)(-([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?(\+([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?\Z/
# ...
end
end
According to the routing docs: (you have already tried this)
:constraints takes regular expressions with the restriction that regexp anchors can't be used.
But there's another way: you can specify advanced constraints as a lambda. Here's an example:
Rails.application.routes.draw do
get '/some/path/*version_str' => 'versions#show',
format: false,
constraints: lambda { |request|
Semantic::Version::SemVerRegexp =~ request.params[:version_str]
}
end
format: false prevents Rails from extracting trailing dots.
Testing the route in rails console:
r = Rails.application.routes
r.recognize_path '/some/path/1.6.5'
#=> {:controller=>"versions", :action=>"show", :version_str=>"1.6.5"}
r.recognize_path '/some/path/3.7.9-pre.1+revision.15723'
#=> {:controller=>"versions", :action=>"show", :version_str=>"3.7.9-pre.1+revision.15723"}
r.recognize_path '/some/path/123'
#=> ActionController::RoutingError: No route matches "/some/path/123"
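The check performed inside the lambda can also be tried outside Rails. The sketch below copies the regular expression quoted above into a local constant (SEM_VER is a name chosen for this example) and runs it against the same three version strings.

```ruby
# Copy of Semantic::Version::SemVerRegexp as quoted above, so the
# example runs without the gem or Rails.
SEM_VER = /\A(\d+\.\d+\.\d+)(-([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?(\+([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?\Z/

%w[1.6.5 3.7.9-pre.1+revision.15723 123].each do |version_str|
  puts "#{version_str}: #{SEM_VER.match?(version_str)}"
end
# prints:
# 1.6.5: true
# 3.7.9-pre.1+revision.15723: true
# 123: false
```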
How can I write a regular expression that matches whitespace only?
I have a Ruby application that allows me to match my email subject and body based on regular expressions. I do not have access to the code.
I am trying to take action on empty email bodies. The regex should match if an email body contains only newlines and space characters. It should not match if there is a single alphanumeric character. The regex should allow whitespace at the start and end of the email body.
The following email body would not be matched. I've highlighted it to show the line breaks.
The following email body would be matched.
I've tried various things using Rubular
The best I can get is ^\s*$ but this matches newlines.
It is not clear what you mean, but this is a regex that matches strings that include only whitespaces:
/\A\s*\z/
I'd say sawa has what you want but it's simpler to negate it:
str !~ /\S/ # true if str contains no non-whitespace chars
str !~ /\w/ # true if str contains no alphanumeric chars (plus _)
No regex:
p " \n \t".strip.empty? #=> true
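To see why ^\s*$ behaves differently from the suggestions above, here is a small comparison. The helper name blank_body? and the sample bodies are invented for this sketch. In Ruby, ^ and $ match at every line boundary, while \A and \z match only at the start and end of the whole string.

```ruby
# True only when the whole body consists of whitespace (or is empty).
def blank_body?(body)
  body.match?(/\A\s*\z/)
end

blank_body     = " \n\t\n  "
body_with_text = "Hello\n\nWorld"

puts blank_body?(blank_body)           # prints: true
puts blank_body?(body_with_text)       # prints: false
puts body_with_text.match?(/^\s*$/)    # prints: true (the empty middle line matches)
puts body_with_text !~ /\S/            # prints: false
puts blank_body.strip.empty?           # prints: true
```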
If I have:
["eaacbf7e-37b3-509e-b2d1-ddce7f0e1f6e", "f9e52e06-697a-57af-9566-d05fabb001a4",
"19edb822-eccb-5289-8fee-a39cdda66cd5", "83d3ad63-b468-5a1e-ba6c-6b69eb4a3dc5"]
(where the entire thing is a string)
Is there a simple regular expression that I can use to select content within the quotes (quotes included)?
Since the above comes out as a string. I want to use regex to select out each id within the quotes (along with the quotes) and store them into a ruby array.
Simply use this regex
"[^"]*"
[^"]* matches any character except ", zero or more times: [^"] is the negated character class and * is the repetition.
Try using the String#scan method with the regular expression /"[^"]+"/:
ids = str.scan(/"[^"]+"/) # => [ "eaacbf7e-...", "f9e52e06-...", ...]
puts ids
"eaacbf7e-37b3-509e-b2d1-ddce7f0e1f6e"
"f9e52e06-697a-57af-9566-d05fabb001a4"
"19edb822-eccb-5289-8fee-a39cdda66cd5"
"83d3ad63-b468-5a1e-ba6c-6b69eb4a3dc5"
That expression breaks down like so:
str.scan(/"[^"]+"/)
#         │├──┘│└─ Another literal quotation mark (").
#         ││   └─ Match one or more of the previous thing.
#         │└─ A class matching any character except (^) quotation marks.
#         └─ A literal quotation mark (").
Why are you getting the string in that format? It looks like JSON output, which, if it is, should be parsed by the JSON module.
require 'json'
require 'pp'
foo = [
"eaacbf7e-37b3-509e-b2d1-ddce7f0e1f6e",
"f9e52e06-697a-57af-9566-d05fabb001a4",
"19edb822-eccb-5289-8fee-a39cdda66cd5",
"83d3ad63-b468-5a1e-ba6c-6b69eb4a3dc5"
]
foo.to_json
=> "[\"eaacbf7e-37b3-509e-b2d1-ddce7f0e1f6e\",\"f9e52e06-697a-57af-9566-d05fabb001a4\",\"19edb822-eccb-5289-8fee-a39cdda66cd5\",\"83d3ad63-b468-5a1e-ba6c-6b69eb4a3dc5\"]"
That's probably the string you're getting. If you parse it using the JSON parser, you'll get back a Ruby array:
pp JSON[ foo.to_json ]
=> ["eaacbf7e-37b3-509e-b2d1-ddce7f0e1f6e",
"f9e52e06-697a-57af-9566-d05fabb001a4",
"19edb822-eccb-5289-8fee-a39cdda66cd5",
"83d3ad63-b468-5a1e-ba6c-6b69eb4a3dc5"]
I need to parse some strings which contain paths to directories. The problem is that they contain escaped whitespace and other escaped symbols. For example:
"/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/"
Note that there is a whitespace before P\&G/.
Here is my treetop grammar (alpha_digit_special contains a space at the beginning):
rule alpha_digit_special
[ a-zA-Z0-9.+&\\]
end
rule path_without_quotes
([/] alpha_digit_special*)+
end
rule quot_mark
["]
end
rule path_with_quotes
quot_mark path_without_quotes quot_mark
end
rule path
path_with_quotes / path_without_quotes
end
I get nil after parsing this string. So how can I specify the rule so that the string may contain escaped whitespace?
You cannot use alpha_digit_special* to handle back-slash escaped spaces.
Instead, you must use a repetition of character units, where a character unit is either a backslashed character pair, or a single non-backslash character. Something like this should work:
rule alpha_digit_special
[a-zA-Z0-9.+&\\]
end
rule path_character
'\\' (alpha_digit_special / ' ')
/
alpha_digit_special
end
rule path_without_quotes
([/] path_character* )+
end
Note that the above won't accept a backslash followed by a character that is neither a space nor in the alpha_digit_special set. I think you can see how to change that, though.
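The "character unit" idea can be tried quickly as a plain Ruby regex before putting it into the grammar. This is only a sketch under two assumptions that are not in the grammar above: the underscore is added to the allowed set (the sample path contains dir_1), and the backslash is left out of it so that it only ever appears as the start of an escaped pair. ALLOWED, UNIT and PATH are names invented for this example.

```ruby
# Assumed set of plain path characters (underscore added, backslash removed).
ALLOWED = 'a-zA-Z0-9.+&_'

# One character unit: a backslash followed by a space or an allowed
# character, or a single allowed character.
UNIT = /\\[ #{ALLOWED}]|[#{ALLOWED}]/

# One or more segments, each a slash followed by zero or more units.
PATH = /\A(?:\/(?:#{UNIT})*)+\z/

puts PATH.match?('/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/')  # prints: true
puts PATH.match?('/dest_dir P')                                      # prints: false
```

The second call fails because the space is not escaped, which is the behaviour the two-alternative rule is meant to give.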
Did you try \s?
test = "dest_dir P&G"
test.match(/[a-zA-Z0-9_\s\&]+/)
=> #<MatchData "dest_dir P&G">