How to programmatically remove anchors from a regular expression in Ruby? - ruby

Consider:
regex1 = /\A[a-z0-9\-\_]+\z/
regex2 = remove_anchors(regex1) # => /[a-z0-9\-\_]+/
How to implement a remove_anchors function that programmatically removes any anchors (\A, \z, ^, $) from regex1, producing regex2? Is it even possible to modify an existing regular expression like this in Ruby?

You can use the following function:
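def remove_anchors(regex)
  # Strip a leading \A or ^ and a trailing \z, \Z or $ from the pattern text
  pattern = regex.source.gsub(/\A(?:\\A|\^)|(?:\\[zZ]|\$)\z/, '')
  # Build a new Regexp, keeping the original flags (i, m, x)
  Regexp.new(pattern, regex.options)
end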
The regex literal notation /.../ compiles the regex, and its pattern string can be obtained with Regexp#source. gsub then removes a leading ^ or \A and a trailing $, \z or \Z from that string; anchors elsewhere in the pattern are left untouched.
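One pitfall with the gsub above: a pattern that ends in an escaped dollar, such as /price\$/, also loses its $, which leaves the invalid pattern price\ and makes Regexp.new raise. The sketch below (remove_end_anchors is a hypothetical name) adds a negative lookbehind so an escaped $ is kept, and passes regex.options through so flags like i survive. It still assumes that only anchors at the very start and end of the pattern need removing; a pattern ending in a literal backslash followed by a $ anchor is left unchanged.

```ruby
# Hypothetical helper: strips one leading \A or ^ and one trailing
# \z, \Z or unescaped $ from a Regexp.
# Assumes anchors only matter at the very ends of the pattern.
def remove_end_anchors(regex)
  # Drop a leading \A or ^
  pattern = regex.source.sub(/\A(?:\\A|\^)/, '')
  # Drop a trailing \z, \Z or $, unless it is preceded by a backslash
  pattern = pattern.sub(/(?<!\\)(?:\\[zZ]|\$)\z/, '')
  # Keep the original flags (IGNORECASE, MULTILINE, EXTENDED)
  Regexp.new(pattern, regex.options)
end

p remove_end_anchors(/\A[a-z0-9\-\_]+\z/)  # => /[a-z0-9\-\_]+/
p remove_end_anchors(/^abc$/i)              # => /abc/i
p remove_end_anchors(/price\$/)             # => /price\$/
```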

Is it even possible to modify an existing regular expression like this in Ruby?
No, it is not possible to modify an existing Regexp at all in Ruby.
You can just look at the available methods and you will immediately see that there are no mutating methods.
There is exactly one method that builds a new Regexp from one or more existing Regexps, namely Regexp::union, but it won't help you here.
Pretty much the only thing you can do is get a String representation of the Regexp using Regexp#to_s (or only the pattern text using Regexp#source), parse that String, remove the anchors textually, and create a new Regexp from the String via Regexp::new. Note, however, that the syntax of Ruby Regexps is anything but trivial to parse, so this is not a simple endeavor.
It appears there is no documentation for the syntax of Ruby's Regexps, so you will have to look at the parser: regparse.c

According to your comments, you're actually trying to use the regular expression from the Semantic gem in your routes:
module Semantic
  class Version
    SemVerRegexp = /\A(\d+\.\d+\.\d+)(-([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?(\+([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?\Z/
    # ...
  end
end
According to the routing docs (and as you have already found out):
:constraints takes regular expressions with the restriction that regexp anchors can't be used.
But there's another way: you can specify advanced constraints as a lambda. Here's an example:
Rails.application.routes.draw do
  get '/some/path/*version_str' => 'versions#show',
      format: false,
      constraints: lambda { |request|
        Semantic::Version::SemVerRegexp =~ request.params[:version_str]
      }
end
format: false prevents Rails from treating the text after the last dot as the request format, so a version like 1.6.5 reaches the constraint intact.
Testing the route in the Rails console:
r = Rails.application.routes
r.recognize_path '/some/path/1.6.5'
#=> {:controller=>"versions", :action=>"show", :version_str=>"1.6.5"}
r.recognize_path '/some/path/3.7.9-pre.1+revision.15723'
#=> {:controller=>"versions", :action=>"show", :version_str=>"3.7.9-pre.1+revision.15723"}
r.recognize_path '/some/path/123'
#=> ActionController::RoutingError: No route matches "/some/path/123"
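To check what the constraint accepts without booting Rails, the regex can be exercised directly. In this sketch the lambda receives the version string itself rather than a request object (an assumption made so it runs in plain Ruby), and SEMVER is a local copy of the gem's SemVerRegexp. Note that the pattern ends in \Z rather than \z, so a version followed by a single trailing newline also matches.

```ruby
# Local copy of Semantic::Version::SemVerRegexp from the question
SEMVER = /\A(\d+\.\d+\.\d+)(-([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?(\+([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?\Z/

# Stand-in for the route constraint: takes the string, not a request
version_ok = lambda { |version_str| SEMVER.match?(version_str) }

%w[1.6.5 3.7.9-pre.1+revision.15723 123].each do |v|
  puts "#{v}: #{version_ok.call(v)}"
end
# 1.6.5: true
# 3.7.9-pre.1+revision.15723: true
# 123: false
```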

Related

Find youtube url in json file with ruby

For testing purpose my json file (test.json) consists of only the string I want to find:
"https://www.youtube.com/watch?v=hBIZF3sDFTI"
Somehow I cannot find the string in the file with this Ruby code:
if not File.foreach("test.json").grep(/https://www.youtube.com/watch?v=hBIZF3sDFTI/).any?
puts("string not in file")
end
Output: "string not in file"
But the string is in the file.
Searching for other strings works fine, so it must be a problem with this particular string.
Any help is much appreciated!
Problems
Your regex pattern isn't valid because it contains unescaped forward slashes, and the first one after https: ends the regex literal. Specifically:
/https://www.youtube.com/watch?v=hBIZF3sDFTI/
is not a valid regular expression. Your String is also not a valid JSON object.
Solution
You need to escape characters with special meaning before trying to use your pattern: / inside a /.../ literal, and regular expression metacharacters such as . and ?. For example, you could call Regexp.escape on the String like so:
Regexp.escape 'https://www.youtube.com/watch?v=hBIZF3sDFTI'
#=> "https://www\\.youtube\\.com/watch\\?v=hBIZF3sDFTI"
Then, assuming you have a valid JSON object, you could match the expression as follows:
require 'json'
str = 'https://www.youtube.com/watch?v=hBIZF3sDFTI'
json = str.to_json
#=> "\"https://www.youtube.com/watch?v=hBIZF3sDFTI\""
pattern = Regexp.escape str
json.match pattern
#=> #<MatchData "https://www.youtube.com/watch?v=hBIZF3sDFTI">
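Applied to the original File.foreach/grep loop, the fix might look like the sketch below. A Tempfile stands in for test.json (an assumption so the example runs on its own), the pattern is built with Regexp.new(Regexp.escape(...)) instead of a /.../ literal, and a plain String#include? check is shown as an alternative, since searching for a fixed substring does not need a regex at all.

```ruby
require 'tempfile'

url = 'https://www.youtube.com/watch?v=hBIZF3sDFTI'

# Stand-in for test.json: a file containing the quoted URL
file = Tempfile.new(['test', '.json'])
file.write(%("#{url}"\n))
file.close

# Escape . and ? so they match literally; no /.../ literal, so / needs no escaping
pattern = Regexp.new(Regexp.escape(url))
found = File.foreach(file.path).grep(pattern).any?

# A fixed substring can be found without any regex
found_plain = File.foreach(file.path).any? { |line| line.include?(url) }

puts(found ? 'string in file' : 'string not in file')
file.unlink
```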

I want to match all punctuation in my regexp except apostrophes. How do i do that in Ruby?

This is my code so far:
def alternate_words(string)
  string.gsub(/[\p{P}]/, "")
end
I am looking for a way to add exceptions to my regular expressions. Is it possible or do I have to list them all out?
string = "jack. o'reilly? mike??!?"
puts string.gsub(/[\p{P}&&[^']]/, '')
# => jack o'reilly mike
Docs:
A character class may contain another character class. By itself this isn’t useful because [a-z[0-9]] describes the same set as [a-z0-9]. However, character classes also support the && operator which performs set intersection on its arguments.
So, [\p{P}&&[^']] is "any character that is punctuation and also not an apostrophe".
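A negative lookahead is another way to express the same exception, and the intersection form extends naturally to more excluded characters. The sketch below compares the two on the question's string; the last part assumes you might also want to keep the typographic apostrophe ’ (U+2019), which \p{P} otherwise counts as punctuation.

```ruby
string = "jack. o'reilly? mike??!?"

# Set intersection: punctuation AND not an apostrophe
p string.gsub(/[\p{P}&&[^']]/, '')   # => "jack o'reilly mike"

# Negative lookahead: skip any position that starts with an apostrophe
p string.gsub(/(?!')\p{P}/, '')      # => "jack o'reilly mike"

# Excluding several characters: straight and curly apostrophes
curly = "rock ’n’ roll!"
p curly.gsub(/[\p{P}&&[^'’]]/, '')   # => "rock ’n’ roll"
```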

opposite of sub in ruby

I want to replace the content (or delete it) that does not match with my filter.
I think the perfect description would be an opposite sub. I cannot find anything similar in the docs, and I'm not sure how to invert the regex, but I think a method would probably be more convenient.
An example of how it would work (I've just changed the words to make it more clear)
"bird.cats.dogs".opposite_sub(/(dogs|cats)\.(dogs|cats)/, '')
#"cats.dogs"
I hope it's easy enough to understand.
Thanks in advance.
String#[] can take a regular expression as its parameter:
▶ "bird.cats.dogs"[/(dogs|cats)\.(dogs|cats)/]
#⇒ "cats.dogs"
For multiple matches one can use String#scan:
▶ "bird.cats.dogs.bird.cats.dogs".scan(/(?:dogs|cats)\.(?:dogs|cats)/)
#⇒ ["cats.dogs", "cats.dogs"]
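The scan example uses non-capturing groups (?:...) for a reason: with capturing groups, scan returns the groups instead of the whole match, e.g. [["cats", "dogs"]]. If you want to keep the original regex with its capturing groups, a small helper can collect the full match from $~ in scan's block form. The method name keep_matching and its separator parameter are hypothetical, not part of Ruby.

```ruby
class String
  # Hypothetical helper: returns only the parts of the string that match
  # regex, joined by separator; everything that does not match is dropped.
  def keep_matching(regex, separator = '')
    matches = []
    # In block form, scan sets $~ for each match, so Regexp.last_match(0)
    # is the whole match even when regex has capturing groups.
    scan(regex) { matches << Regexp.last_match(0) }
    matches.join(separator)
  end
end

r = /(dogs|cats)\.(dogs|cats)/
p "bird.cats.dogs".keep_matching(r)                      # => "cats.dogs"
p "bird.cats.dogs.bird.cats.dogs".keep_matching(r, ' ')  # => "cats.dogs cats.dogs"
```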
So you want to extract the part that matches your regex?
You can use String#slice, for example:
"bird.cats.dogs".slice(/(dogs|cats)\.(dogs|cats)/)
#=> "cats.dogs"
And String#[] does the same.
"bird.cats.dogs"[/(dogs|cats)\.(dogs|cats)/]
#=> "cats.dogs"
You cannot have a single replacement string, because the part of the string that matches the regex might not be at the beginning or end of the string, in which case it's not clear whether the replacement should precede or follow the match. I've therefore written the following with two replacement strings, one for the pre-match, the other for the post-match. I've made this a method of the String class, as that's what you've asked for (though I've given the method a less-perfect name :-) ).
class String
  def replace_non_matching(regex, replace_before, replace_after)
    # partition splits at the first match: [pre-match, match, post-match]
    _before, match, _after = partition(regex)
    replace_before + match + replace_after
  end
end
r = /(dogs|cats)\.(dogs|cats)/
"birds.cats.dogs.pigs".replace_non_matching(r, "", "")
#=> "cats.dogs"
"birds.cats.dogs".replace_non_matching(r, "snakes.", ".hens")
#=> "snakes.cats.dogs.hens"
"birds.cats.dogs.mice.cats.dogs.bats".replace_non_matching(r, "snakes.", ".hens")
#=> "snakes.cats.dogs.hens"
Regarding the last example, the method could be modified to replace "birds.", ".mice." and ".bats", but in that case three replacement strings would be needed. In general, determining in advance the number of replacement strings needed could be problematic.
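One way around having to know the number of replacement strings in advance is to pass a block that computes the replacement for each gap. The sketch below (replace_gaps is a hypothetical name, not part of Ruby) walks the matches with Regexp#match(str, pos) and calls the block once per non-matching run; unlike replace_non_matching, it handles every match, not just the first.

```ruby
class String
  # Hypothetical helper: keeps every match of regex and replaces each
  # non-empty run of text between matches with the block's return value.
  def replace_gaps(regex)
    result = +''
    pos = 0
    while (m = regex.match(self, pos))
      # An empty match would never advance pos, so reject it explicitly
      raise ArgumentError, 'regex must not match the empty string' if m[0].empty?

      gap = self[pos...m.begin(0)]
      result << yield(gap) unless gap.empty?
      result << m[0]
      pos = m.end(0)
    end
    rest = self[pos..-1]
    result << yield(rest) unless rest.empty?
    result
  end
end

r = /(dogs|cats)\.(dogs|cats)/
p "birds.cats.dogs.mice.cats.dogs.bats".replace_gaps(r) { |gap| gap.upcase }
# => "BIRDS.cats.dogs.MICE.cats.dogs.BATS"
```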

Generate string for Regex pattern in Ruby

In Python I found rstr, which can generate a string for a regex pattern.
Python also has this method, which can return the character ranges in a pattern:
re.sre_parse.parse(pattern)
#..... ('range', (97, 122)) ....
But in Ruby I didn't find anything.
So how can I generate a string for a regex pattern in Ruby (reverse regex)?
I want to do something like this:
"/[a-z0-9]+/".example
#tvvd
"/[a-z0-9]+/".example
#yt
"/[a-z0-9]+/".example
#bgdf6
"/[a-z0-9]+/".example
#564fb
"/[a-z0-9]+/" is my input.
The outputs must be valid strings that match my regex pattern.
Here the outputs tvvd, yt, bgdf6 and 564fb were generated by the example method.
I need that method.
Thanks for your advice.
You can also use the Faker gem https://github.com/stympy/faker and then use this call:
Faker::Base.regexify(/[a-z0-9]{10}/)
In Ruby:
/qweqwe/.to_s
# => "(?-mix:qweqwe)"
When you declare a Regexp, you get an object of the Regexp class. To convert it to a String, you can use Regexp#to_s, which embeds the options in the resulting string, as the example shows. From the docs, it returns a string containing the regular expression and its options
(using the (?opts:source) notation). This string can be fed back in to Regexp::new to a regular expression with the same semantics as the original.
Also, you can use Regexp's method #inspect, which:
produces a generally more readable version of rxp.
/ab+c/ix.inspect #=> "/ab+c/ix"
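The three conversions differ in how they treat flags. A short sketch comparing them on the docs' /ab+c/ix example, plus a check that the to_s form round-trips through Regexp.new with the same case-insensitive behavior, as the docs describe:

```ruby
re = /ab+c/ix

puts re.source   # ab+c          (pattern text only, flags dropped)
puts re.to_s     # (?ix-m:ab+c)  (flags embedded with the (?opts:source) notation)
puts re.inspect  # /ab+c/ix      (literal notation)

# Feeding to_s back to Regexp.new keeps the i flag via the inline group
copy = Regexp.new(re.to_s)
puts copy.match?('xxABBBCxx')  # true
```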
Note that the above methods are only for plain conversion of a Regexp into a String; to match or select a set of strings from another one, we use other methods. For example, if you have a source array (or a string, which you can split with #split), you can grep it and get the resulting array:
array = "test,ab,yr,OO".split( ',' )
# => ["test", "ab", "yr", "OO"]
array = array.grep(/[a-z]/)
# => ["test", "ab", "yr"]
And then convert the array into string as:
array.join(',')
# => "test,ab,yr"
Or just use #scan method, with slightly changed regexp:
"test,ab,yr,OO".scan( /[a-z]+/ )
# => ["test", "ab", "yr"]
However, if you really need a random string that matches the regexp, you have to write your own method (please refer to the post) or use the ruby-string-random library. The library:
generates a random string based on Regexp syntax or Patterns.
The code would look like this:
pattern = '[aw-zX][123]'
result = StringRandom.random_regex(pattern)
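If you would rather write your own method, a deliberately tiny generator for the question's simplest case might look like the sketch below. It works only under strong assumptions: the hypothetical random_from_class understands a single bracket class of literal characters and ranges (no escapes such as \d, no leading ^, no literal -) followed by an optional +, * or {n}, and it caps + and * at max_repeat characters. Checking every generated string against the real Regexp is a cheap way to validate such a generator.

```ruby
# Hypothetical helper: generates a random string for a pattern of the form
# "[chars]", "[chars]+", "[chars]*" or "[chars]{n}", where chars are single
# characters or ranges like a-z. Anything else raises ArgumentError.
def random_from_class(pattern, max_repeat: 8, rng: Random.new)
  m = /\A\[([^\]]+)\](\+|\*|\{(\d+)\})?\z/.match(pattern)
  raise ArgumentError, "unsupported pattern: #{pattern}" unless m

  # Expand "a-z0-9" into the list of allowed characters
  chars = m[1].scan(/(.)-(.)|(.)/).flat_map do |from, to, single|
    single ? [single] : (from..to).to_a
  end

  # Decide how many characters to emit from the quantifier
  count =
    case m[2]
    when nil then 1
    when '+' then rng.rand(1..max_repeat)
    when '*' then rng.rand(0..max_repeat)
    else m[3].to_i
    end

  Array.new(count) { chars.sample(random: rng) }.join
end

4.times { puts random_from_class('[a-z0-9]+') }
puts random_from_class('[aw-zX]{3}')
```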
A bit late to the party, but, originally inspired by this Stack Overflow thread, I have created a powerful Ruby gem which solves the original problem:
https://github.com/tom-lord/regexp-examples
/this|is|awesome/.examples #=> ['this', 'is', 'awesome']
/https?:\/\/(www\.)?github\.com/.examples #=> ['http://github.com', 'http://www.github.com', 'https://github.com', 'https://www.github.com']
UPDATE: Regular expressions are now supported in the string_pattern gem, and it is 30 times faster than other gems.
require 'string_pattern'
/[a-z0-9]+/.generate
For a speed comparison, see https://repl.it/@tcblues/Comparison-generating-random-string-from-regular-expression
I created a simple way to generate strings using a pattern without the mess of regular expressions; take a look at the string_pattern gem project: https://github.com/MarioRuiz/string_pattern
To install it: gem install string_pattern
This is an example of use:
# four characters. optional: capitals and numbers, required: lower
"4:XN/x/".gen # aaaa, FF9b, j4em, asdf, ADFt

Replacing regex capture with the same capture and an extra string

I am trying to escape certain characters in a string. In particular, I want to turn
abc/def.ghi into abc\/def\.ghi
I tried to use the following syntax:
1.9.3p125 :076 > "abc/def.ghi".gsub(/([\/.])/, '\\\1')
=> "abc\\1def\\1ghi"
Hmm. This behaves as if capture replacements didn't work. Yet, when I tried this:
1.9.3p125 :075 > "abc/def.ghi".gsub(/([\/.])/, '\1')
=> "abc/def.ghi"
... I got the replacement to work, but, of course, my prefixes weren't part of it.
What is the correct syntax to do something like this?
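This should be easier: a lookahead matches the empty position in front of each . or /, and the replacement inserts a backslash there:
"abc/def.ghi".gsub(/(?=[.\/])/, "\\")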
If you are trying to prepare a string to be used as a regex pattern, use the right tool:
Regexp.escape('abc/def.ghi')
=> "abc/def\\.ghi"
You can then use the resulting string to create a regex:
/#{ Regexp.escape('abc/def.ghi') }/
=> /abc\/def\.ghi/
or:
Regexp.new(Regexp.escape('abc/def.ghi'))
=> /abc\/def\.ghi/
From the docs:
Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.new(Regexp.escape(str))=~str will be true.
Regexp.escape('\*?{}.') #=> \\\*\?\{\}\.
You can pass a block to gsub:
>> "abc/def.ghi".gsub(/([\/.])/) {|m| "\\#{m}"}
=> "abc\\/def\\.ghi"
Not nearly as elegant as @sawa's answer, but it was the only way I could find to make it work if you need the replacement string to contain the captured group/backreference (rather than inserting the replacement before the lookahead).
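The original attempt failed because the replacement goes through two rounds of backslash processing: first Ruby's string literal, then gsub's own handling of \\ and \1. To get a literal backslash followed by group 1 without a block, gsub must receive the four characters \\\1, which takes five backslashes in a single-quoted literal and six in a double-quoted one. A hash argument, where each matched text is looked up as a key and the value is inserted literally, sidesteps backreferences entirely. A sketch of all three:

```ruby
s = 'abc/def.ghi'

# Single-quoted: '\\\\\1' becomes the four characters \\\1; gsub then
# reads \\ as a literal backslash and \1 as capture group 1.
escaped = s.gsub(/([\/.])/, '\\\\\1')

# The same replacement written as a double-quoted literal
escaped_dq = s.gsub(/([\/.])/, "\\\\\\1")

# Hash form: the matched text is looked up as a key and the value is
# inserted as-is, so no backreference syntax is involved.
escaped_hash = s.gsub(/[\/.]/, { '/' => '\/', '.' => '\.' })

puts escaped  # abc\/def\.ghi
```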

Resources