For testing purposes, my JSON file (test.json) consists of only the string I want to find:
"https://www.youtube.com/watch?v=hBIZF3sDFTI"
Somehow I cannot find the string in the file with this Ruby code:
if not File.foreach("test.json").grep(/https://www.youtube.com/watch?v=hBIZF3sDFTI/).any?
puts("string not in file")
end
Output: "string not in file"
But the string is in the file.
Searching for other strings works fine, so it must be a problem with this particular string.
Any help is much appreciated!
Problems
Your regex pattern isn't valid because it contains unescaped forward slashes, and the first of them ends the /.../ literal early. Specifically:
/https://www.youtube.com/watch?v=hBIZF3sDFTI/
is not a valid regular expression literal. Your String is also not a valid JSON object.
Solution
You need to escape characters that are special in a regular expression, like . and ?, before using the string as a pattern (a / must also be escaped, but only inside a /.../ literal). For example, you could call Regexp.escape on the String like so:
Regexp.escape 'https://www.youtube.com/watch?v=hBIZF3sDFTI'
#=> "https://www\\.youtube\\.com/watch\\?v=hBIZF3sDFTI"
Then, assuming you have a valid JSON object, you could match the expression as follows:
require 'json'
str = 'https://www.youtube.com/watch?v=hBIZF3sDFTI'
json = str.to_json
#=> "\"https://www.youtube.com/watch?v=hBIZF3sDFTI\""
pattern = Regexp.escape str
json.match pattern
#=> #<MatchData "https://www.youtube.com/watch?v=hBIZF3sDFTI">
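Applied to the original file search, the same idea could look like the sketch below. The helper name file_contains? is hypothetical (it is not part of the question or of Ruby), and a temporary file stands in for test.json.

```ruby
require 'tempfile'

# Hypothetical helper: true if any line of the file at `path` contains
# `needle` as a literal substring. Regexp.escape neutralises '.', '?', etc.
def file_contains?(path, needle)
  pattern = Regexp.new(Regexp.escape(needle))
  File.foreach(path).grep(pattern).any?
end

url = 'https://www.youtube.com/watch?v=hBIZF3sDFTI'

# A temporary file stands in for test.json from the question.
Tempfile.create('test.json') do |f|
  f.write(%("#{url}"\n))
  f.flush
  puts file_contains?(f.path, url)           # prints true
  puts file_contains?(f.path, 'example.org') # prints false
end
```

Building the pattern with Regexp.new avoids the /.../ literal entirely, so the slashes in the URL need no special treatment.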
Related
I'm parsing a YAML file in Ruby and some of the input is causing a Psych syntax error:
require 'yaml'
example = "my_key: [string] string"
YAML.load(example)
Resulting in:
Psych::SyntaxError: (<unknown>): did not find expected key
while parsing a block mapping at line 1 column 1
from [...]/psych.rb:456:in `parse'
I received this YAML from an external API that I do not have control over. I can see that editing the input to force parsing as a string, using my_key: '[string] string', as noted in "Do I need quotes for strings in YAML?", fixes the issue however I don't control how the input is received.
Is there a way to force the input to be parsed as a string for some keys such as my_key? Is there a workaround to successfully parse this YAML?
One approach would be to process the response before reading it as YAML. Assuming it's a string, you could use a regex to replace the problematic pattern with something valid. Note that the replacement must use backreferences such as \\1, because "#{$1}" would be interpolated before gsub! runs, when $1 is not yet set. For example:
resp_str = "---\nmy_key: [string] string\n"
re = /(\: )(\[[a-z]*?\] [a-z]*?)(\n)/
resp_str.gsub!(re, "\\1'\\2'\\3")
#=> "---\n" + "my_key: '[string] string'\n"
Then you can do
YAML.load(resp_str)
#=> {"my_key"=>"[string] string"}
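Since the question asks about forcing only some keys to be parsed as strings, the preprocessing step could be narrowed to a list of key names. The following is a minimal sketch under that assumption; quote_values is a hypothetical helper, it only handles simple single-line `key: value` entries, and it leaves values that are already quoted untouched.

```ruby
require 'yaml'

# Hypothetical helper: wrap the values of the given keys in single quotes so
# that YAML parses them as plain strings. Only single-line values are handled.
def quote_values(yaml_text, keys)
  key_union = Regexp.union(keys) # escapes each key name
  line = /^([ \t]*(?:#{key_union}):[ \t]+)(?=[^\s'"])(.+)$/
  yaml_text.gsub(line) do
    prefix = Regexp.last_match(1)
    value  = Regexp.last_match(2)
    # A single quote inside a single-quoted YAML scalar is written as ''.
    "#{prefix}'#{value.gsub("'", "''")}'"
  end
end

raw = "---\nmy_key: [string] string\nother: [a, b]\n"
fixed = quote_values(raw, ['my_key'])
puts fixed
data = YAML.safe_load(fixed)
puts data['my_key']
```

Here other is left alone and still parses as an array, while my_key comes back as the string "[string] string".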
It does not work because square brackets have a special meaning in YAML, denoting arrays:
YAML.load "my_key: [string]"
#⇒ {"my_key"=>["string"]}
and [foo] bar is therefore not valid YAML. One option is to escape the square brackets explicitly, although the backslashes then remain part of the parsed string:
YAML.load "my_key: \\[string\\] string"
#⇒ {"my_key"=>"\\[string\\] string"}
Also, one might implement a custom Psych parser.
There is a simple, built-in solution. If you want a value to be treated as a string, you can always put quotes around it:
YAML.load "my_key: '[string]'"
=> {"my_key"=>"[string]"}
I have a string as given below,
./component/unit
and need to split it to get component/unit, which I will use as a key when inserting into a hash.
I tried .split(/.\//).last, but it returns only unit instead of component/unit.
I think, this should help you:
string = './component/unit'
string.split('./')
#=> ["", "component/unit"]
string.split('./').last
#=> "component/unit"
Your regex was almost right:
split(/\.\//)
You need to escape both . (any character) and / (regex delimiter).
As an alternative, you could just remove the first './' substring:
'./component/unit'.sub('./','')
#=> "component/unit"
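If only a leading './' should ever be removed, String#delete_prefix is another option. This sketch assumes Ruby 2.5 or later (where that method was added); the components hash is just a hypothetical stand-in for the hash mentioned in the question.

```ruby
path = './component/unit'

# delete_prefix removes the prefix only when the string starts with it.
key = path.delete_prefix('./')
puts key # prints component/unit

# Hypothetical hash keyed by the cleaned-up path.
components = { key => :registered }
puts components.key?('component/unit') # prints true

# A string without the prefix is returned unchanged.
puts 'component/unit'.delete_prefix('./') # prints component/unit
```

Unlike sub('./', ''), this cannot accidentally remove a './' that appears later in the string.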
All the other answers are fine, but I think you are not really dealing with a String here but with a URI or Pathname, so I would advise you to use these classes if you can. If so, please adjust the title, as it is not about do-it-yourself-regexes, but about proper use of the available libraries.
Link to the ruby doc:
https://docs.ruby-lang.org/en/2.1.0/URI.html
and
https://ruby-doc.org/stdlib-2.1.0/libdoc/pathname/rdoc/Pathname.html
An example with Pathname is:
require 'pathname'
pathname = Pathname.new('./component/unit')
puts pathname.cleanpath # => "component/unit"
# pathname.to_s # => "component/unit"
Whether this is a good idea (and/or using URI would be cool too) also depends on what your real problem is, i.e. what you want to do with the extracted String. As stated, I doubt a bit that you are really interested in Strings.
Using a positive lookbehind, you could use this regex:
reg = /(?<=\.\/)[\w+\/]+\w+\z/
Demo
str = './component'
str2 = './component/unit'
str3 = './component/unit/ruby'
str4 = './component/unit/ruby/regex'
[str, str2, str3, str4].each { |s| puts s[reg] }
#component
#component/unit
#component/unit/ruby
#component/unit/ruby/regex
Consider:
regex1 = /\A[a-z0-9\-\_]+\z/
regex2 = remove_anchors(regex1) # => /[a-z0-9\-\_]+/
How to implement a remove_anchors function that programmatically removes any anchors (\A, \z, ^, $) from regex1, producing regex2? Is it even possible to modify an existing regular expression like this in Ruby?
You can use the following function:
def remove_anchors(regex)
  # Strip a leading \A or ^ and a trailing \z, \Z or $ from the pattern source
  pattern = regex.source.gsub(/\A(?:\\A|\^)|(?:\\[zZ]|\$)\z/, '')
  Regexp.new(pattern)
end
And here is an IDEONE demo
The regex literal notation /.../ compiles the regex, and its string pattern can be obtained via the source method. With gsub, anchors like ^, $, \A and \z at the very start or end can be removed from the string pattern.
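One thing a source-based rewrite can lose is the flags of the original regex (such as /i), because Regexp#source does not include them. A variant that passes Regexp#options along might look like this; the function name is hypothetical, and like any purely textual approach it would mishandle a pattern that ends in an escaped dollar sign (\$).

```ruby
# Hypothetical variant: strip a leading \A or ^ and a trailing \z, \Z or $,
# then rebuild the regex with the same option flags as the original.
def remove_anchors_keep_options(regex)
  pattern = regex.source.gsub(/\A(?:\\A|\^)|(?:\\[zZ]|\$)\z/, '')
  Regexp.new(pattern, regex.options)
end

anchored = /\A[a-z0-9\-\_]+\z/i
unanchored = remove_anchors_keep_options(anchored)

puts unanchored.source                 # prints [a-z0-9\-\_]+
puts anchored.match?('xx ABC-1 xx')    # prints false
puts unanchored.match?('xx ABC-1 xx')  # prints true
```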
Is it even possible to modify an existing regular expression like this in Ruby?
No, it is not possible to modify an existing Regexp at all in Ruby.
You can just look at the available methods and you will immediately see that there are no mutating methods.
There is exactly one method, which allows you to build a new Regexp from one or more existing Regexps, namely Regexp::union, but that won't help you here.
Pretty much the only thing you can do is get a String representation of the Regexp using Regexp#to_s, parse that String, remove the anchors textually, and create a new Regexp from the String via Regexp::new. Note, however, that the syntax of Ruby Regexps is anything but trivial to parse, so this is not a simple endeavor.
It appears there is no documentation for the syntax of Ruby's Regexps, so you will have to look at the parser: regparse.c
According to your comments, you're actually trying to use the regular expression from the Semantic gem in your routes:
module Semantic
  class Version
    SemVerRegexp = /\A(\d+\.\d+\.\d+)(-([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?(\+([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?\Z/
    # ...
  end
end
According to the routing docs: (you have already tried this)
:constraints takes regular expressions with the restriction that regexp anchors can't be used.
But there's another way: you can specify advanced constraints as a lambda. Here's an example:
Rails.application.routes.draw do
  get '/some/path/*version_str' => 'versions#show',
    format: false,
    constraints: lambda { |request|
      Semantic::Version::SemVerRegexp =~ request.params[:version_str]
    }
end
format: false prevents Rails from treating the part after the last dot as a format extension.
Testing the route in rails console:
r = Rails.application.routes
r.recognize_path '/some/path/1.6.5'
#=> {:controller=>"versions", :action=>"show", :version_str=>"1.6.5"}
r.recognize_path '/some/path/3.7.9-pre.1+revision.15723'
#=> {:controller=>"versions", :action=>"show", :version_str=>"3.7.9-pre.1+revision.15723"}
r.recognize_path '/some/path/123'
#=> ActionController::RoutingError: No route matches "/some/path/123"
I see in the documentation I'm able to do:
/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ "$3.67" #=> 0
puts dollars #=> prints 3
I was wondering if this would be possible:
string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
/#{Regexp.escape(string)}/ =~ "$3.67"
I get:
`<main>': undefined local variable or method `dlr' for main:Object (NameError)
There are a few mistakes in your approach. First of all, let's look at your string:
string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
You escape the dollar sign with "\$", but that is the same as just writing "$", consider:
"\$" == "$"
#=> true
To actually end up with the string "backslash followed by dollar" you would need to write "\\$". The same thing applies to the decimal character classes, you would have to write "\\d" to end up with the correct string.
The question marks on the other hand are actually part of the regex syntax, so you do not want to escape these at all. I recommend using single quotes for your original string, because that makes the input much easier:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
#=> "\\$(?<dlr>\\d+)\\.(?<cts>\\d+)"
The next issue is with Regexp.escape. Take a look at what regular expression it produces with the above string:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
Regexp.escape(string)
#=> "\\\\\\$\\(\\?<dlr>\\\\d\\+\\)\\\\\\.\\(\\?<cts>\\\\d\\+\\)"
That's one level too much escaping. Regexp.escape can be used when you want to match the literal characters that are contained in the string. For example, the escaped regex above will match the source string itself:
/#{Regexp.escape(string)}/ =~ string
#=> 0 # matches at offset 0
Instead, you can use Regexp.new to treat the source as an actual regular expression.
The last issue is how you access the match result. You are getting a NameError because the named captures are not stored in local variables called dlr and cts: Ruby only assigns named captures to local variables when a regexp literal without interpolation appears on the left-hand side of =~. You have two options to access the match data:
Use Regexp#match, which returns a MatchData object as its result
Use regexp =~ string and then access the last match data with the global variable $~
I prefer the former, because it is easier to read. The full code would then look like this:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
regexp = Regexp.new(string)
result = regexp.match("$3.67")
#=> #<MatchData "$3.67" dlr:"3" cts:"67">
result[:dlr]
#=> "3"
result[:cts]
#=> "67"
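For comparison, here is a short sketch of both access styles side by side, plus the literal form from the documentation that does create local variables. MatchData#named_captures assumes Ruby 2.4 or later; the variable names amounts and last are only illustrative.

```ruby
source = '\$(?<dlr>\d+)\.(?<cts>\d+)'
regexp = Regexp.new(source)

# Option 1: Regexp#match returns a MatchData (or nil when nothing matches).
match = regexp.match('$3.67')
amounts = match ? match.named_captures : {}
puts amounts['dlr'] # prints 3
puts amounts['cts'] # prints 67

# Option 2: =~ returns the match offset; the MatchData is then in $~.
if regexp =~ '$3.67'
  last = $~
  puts "#{last[:dlr]} dollars, #{last[:cts]} cents" # prints 3 dollars, 67 cents
end

# Only a regexp literal without interpolation on the left-hand side of =~
# assigns its named captures to local variables.
if /\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ '$3.67'
  puts dollars # prints 3
end
```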
In Python, I found rstr, which can generate a string for a regex pattern.
Python also has this method, which can return the character range of a pattern:
re.sre_parse.parse(pattern)
#..... ('range', (97, 122)) ....
But in Ruby I didn't find anything.
So how can I generate a string for a regex pattern in Ruby (reverse regex)?
I want to do something like this:
"/[a-z0-9]+/".example
#tvvd
"/[a-z0-9]+/".example
#yt
"/[a-z0-9]+/".example
#bgdf6
"/[a-z0-9]+/".example
#564fb
"/[a-z0-9]+/" is my input.
The outputs must be valid strings that match my regex pattern.
Here the outputs were tvvd, yt, bgdf6 and 564fb, all generated by the example method.
I need that method.
Thanks for your advice.
You can also use the Faker gem https://github.com/stympy/faker and then use this call:
Faker::Base.regexify(/[a-z0-9]{10}/)
In Ruby:
/qweqwe/.to_s
# => "(?-mix:qweqwe)"
When you write a Regexp literal, you get a Regexp object. To convert it to a String, you can use Regexp#to_s. The conversion makes the option flags explicit, as you can see in the example: according to the documentation, it returns the regular expression and its options
using the (?opts:source) notation. This string can be fed back into Regexp::new to create a regular expression with the same semantics as the original.
Also, you can use Regexp's method #inspect, which:
produces a generally more readable version of rxp.
/ab+c/ix.inspect #=> "/ab+c/ix"
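As a small illustration of the difference between these conversions, the sketch below round-trips a regex through to_s and compares it with source and inspect. The variable names are only for illustration.

```ruby
original = /ab+c/ix

as_string = original.to_s
puts as_string        # prints (?ix-m:ab+c)
puts original.source  # prints ab+c
puts original.inspect # prints /ab+c/ix

# Feeding the to_s form back into Regexp.new keeps the flags, because they
# are embedded in the (?opts:source) group.
rebuilt = Regexp.new(as_string)
puts rebuilt.match?('ABBC') # prints true

# Feeding only the source back in drops the flags.
plain = Regexp.new(original.source)
puts plain.match?('ABBC')   # prints false
```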
Note that the above methods only convert a Regexp into a String. To match or select strings, use other methods. For example, if you have a source array (or a string that you split with the #split method), you can grep it and get a result array:
array = "test,ab,yr,OO".split( ',' )
# => ['test', 'ab', 'yr', 'OO']
array = array.grep(/[a-z]/)
# => ["test", "ab", "yr"]
And then convert the array into a string:
array.join(',')
# => "test,ab,yr"
Or just use #scan method, with slightly changed regexp:
"test,ab,yr,OO".scan( /[a-z]+/ )
# => ["test", "ab", "yr"]
However, if you really need a random string that matches the regexp, you have to write your own method (please refer to the post) or use the ruby-string-random library. The library:
generates a random string based on Regexp syntax or Patterns.
And the code will look like the following:
pattern = '[aw-zX][123]'
result = StringRandom.random_regex(pattern)
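If pulling in a library is not an option, writing your own method for one fixed pattern can be quite small. The sketch below is not a general regex reverser: it hard-codes the character set for /[a-z0-9]+/ from the question, and both random_alnum and the max_length cap are assumptions made for illustration.

```ruby
# Characters allowed by the character class [a-z0-9].
CHARSET = [*'a'..'z', *'0'..'9'].freeze

# Hypothetical helper: returns a random string of 1..max_length characters
# drawn from CHARSET, so it always matches /\A[a-z0-9]+\z/.
def random_alnum(max_length = 8, random: Random.new)
  length = random.rand(1..max_length)
  Array.new(length) { CHARSET[random.rand(CHARSET.size)] }.join
end

sample = random_alnum
puts sample                           # e.g. a short string such as "bgdf6"
puts sample.match?(/\A[a-z0-9]+\z/)   # prints true
```

Passing a seeded Random (for example random: Random.new(42)) makes the output reproducible in tests.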
A bit late to the party, but - originally inspired by this stackoverflow thread - I have created a powerful ruby gem which solves the original problem:
https://github.com/tom-lord/regexp-examples
/this|is|awesome/.examples #=> ['this', 'is', 'awesome']
/https?:\/\/(www\.)?github\.com/.examples #=> ['http://github.com', 'http://www.github.com', 'https://github.com', 'https://www.github.com']
UPDATE: Regular expressions are now supported in the string_pattern gem, and it is 30 times faster than other gems
require 'string_pattern'
/[a-z0-9]+/.generate
To see a comparison of speed https://repl.it/#tcblues/Comparison-generating-random-string-from-regular-expression
I created a simple way to generate strings using a pattern without the mess of regular expressions, take a look at the string_pattern gem project: https://github.com/MarioRuiz/string_pattern
To install it: gem install string_pattern
This is an example of use:
# four characters. optional: capitals and numbers, required: lower
"4:XN/x/".gen # aaaa, FF9b, j4em, asdf, ADFt
Maybe you can find what you are looking for over here.