Regex string with grouping?

Regex string with grouping? - ruby

I see in the documentation I'm able to do:
/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ "$3.67" #=> 0
puts dollars #=> prints 3
I was wondering if this would be possible:
string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
/#{Regexp.escape(string)}/ =~ "$3.67"
I get:
`<main>': undefined local variable or method `dlr' for main:Object (NameError)

There are a few mistakes in your approach. First of all, let's look at your string:
string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
You escape the dollar sign with "\$", but that is the same as just writing "$", consider:
"\$" == "$"
#=> true
To actually end up with the string "backslash followed by dollar" you would need to write "\\$". The same thing applies to the decimal character classes, you would have to write "\\d" to end up with the correct string.
The question marks on the other hand are actually part of the regex syntax, so you do not want to escape these at all. I recommend using single quotes for your original string, because that makes the input much easier:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
#=> "\\$(?<dlr>\\d+)\\.(?<cts>\\d+)"
The next issue is with Regexp.escape. Take a look at what regular expression it produces with the above string:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
Regexp.escape(string)
#=> "\\\\\\$\\(\\?<dlr>\\\\d\\+\\)\\\\\\.\\(\\?<cts>\\\\d\\+\\)"
That's one level too much escaping. Regexp.escape can be used when you want to match the literal characters that are contained in the string. For example, the escaped regex above will match the source string itself:
/#{Regexp.escape(string)}/ =~ string
#=> 0 # matches at offset 0
Instead, you can use Regexp.new to treat the source as an actual regular expression.
The last issue is then how you access the match result. Obviously, you are getting a NoMethodError. You might think that the match result is stored in local variables called dlr and cts, but that is not the case. You have two options to access the match data:
Use Regexp.match, it will return a MatchData object as result
Use regexp =~ string and then access the last match data with the global variable $~
I prefer the former, because it is easier to read. The full code would then look like this:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
regexp = Regexp.new(string)
result = regexp.match("$3.67")
#=> #<MatchData "$3.67" dlr:"3" cts:"67">
result[:dlr]
#=> "3"
result[:cts]
#=> "67"

Related

Multiple Ruby chomp! statements

I'm writing a simple method to detect and strip tags from text strings. Given this input string:
{{foobar}}
The function has to return
foobar
I thought I could just chain multiple chomp! methods, like so:
"{{foobar}}".chomp!("{{").chomp!("}}")
but this won't work, because the first chomp! returns a NilClass. I can do it with regular chomp statements, but I'm really looking for a one-line solution.
The String class documentation says that chomp! returns a Str if modifications have been made - therefore, the second chomp! should work. It doesn't, however. I'm at a loss at what's happening here.
For the purposes of this question, you can assume that the input string is always a tag which begins and ends with double curly braces.

You can definitely chain multiple chomp statements (the non-bang version), still having a one-line solution as you wanted:
"{{foobar}}".chomp("{{").chomp("}}")
However, it will not work as expected because both chomp! and chomp removes the separator only from the end of the string, not from the beginning.
You can use sub
"{{foobar}}".sub(/{{(.+)}}/, '\1')
# => "foobar"
"alfa {{foobar}} beta".sub(/{{(.+)}}/, '\1')
# => "alfa foobar beta"
# more restrictive
"{{foobar}}".sub(/^{{(.+)}}$/, '\1')
# => "foobar"

Testing this out, it's clear that chomp! will return nil if the separator it's provided as an argument is not present at the end of the string.
So "{{text}}".chomp!("}}") returns a string, but "{{text}}".chomp!("{{") reurns nil.
See here for an answer of how to chomp at the beginning of a string. But recognize that chomp only looks at the end of the string. So you can call str.reverse.chomp!("{{").reverse to remove the opening brackets.
You could also use a regex:
string = "{{text}}"
puts [/^\{\{(.+)\}\}$/, 1]
# => "text"

Try tr:
'{{foobar}}'.tr('{{', '').tr('}}', '')
You can also use gsub or sub but if the replacement is not needed as pattern, then tr should be faster.
If there are always curly braces, then you can just slice the string:
'{{foobar}}'[2...-2]
If you plan to make a method which returns the string without curly braces then DO NOT use bang versions. Modifying the input parameter of a method will be suprising!
def strip(string)
string.tr!('{{', '').tr!('}}', '')
end
a = '{{foobar}}'
b = strip(a)
puts b #=> foobar
puts a #=> foobar

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1]),'')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.

Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backlash followed by ‘d’, instead of a digit.
That means that the regular expression, though it appears to be a regex, it isn't, it's a string because ARGV only can return strings because the command-line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. It can find /o/ though and returns the result after replacing the two "o".
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.

Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')

I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

How to validate that a string is a proper hexadecimal value in Ruby?

I am writing a 6502 assembler in Ruby. I am looking for a way to validate hexadecimal operands in string form. I understand that the String object provides a "hex" method to return a number, but here's a problem I run into:
"0A".hex #=> 10 - a valid hexadecimal value
"0Z".hex #=> 0 - invalid, produces a zero
"asfd".hex #=> 10 - Why 10? I guess it reads 'a' first and stops at 's'?
You will get some odd results by typing in a bunch of gibberish. What I need is a way to first verify that the value is a legit hex string.
I was playing around with regular expressions, and realized I can do this:
true if "0A" =~ /[A-Fa-f0-9]/
#=> true
true if "0Z" =~ /[A-Fa-f0-9]/
#=> true <-- PROBLEM
I'm not sure how to address this issue. I need to be able to verify that letters are only A-F and that if it is just numbers that is ok too.
I'm hoping to avoid spaghetti code, riddled with "if" statements. I am hoping that someone could provide a "one-liner" or some form of elegent code.
Thanks!

!str[/\H/] will look for invalid hex values.

String#hex does not interpret the whole string as hex, it extracts from the beginning of the string up to as far as it can be interpreted as hex. With "0Z", the "0" is valid hex, so it interpreted that part. With "asfd", the "a" is valid hex, so it interpreted that part.

One method:
str.to_i(16).to_s(16) == str.downcase
Another:
str =~ /\A[a-f0-9]+\Z/i # or simply /\A\h+\Z/ (see hirolau's answer)
About your regex, you have to use anchors (\A for begin of string and \Z for end of string) to say that you want the full string to match. Also, the + repeats the match for one or more characters.
Note that you could use ^ (begin of line) and $ (end of line), but this would allow strings like "something\n0A" to pass.

This is an old question, but I just had the issue myself. I opted for this in my code:
str =~ /^\h+$/
It has the added benefit of returning nil if str is nil.

Since Ruby has literal hex built-in, you can eval the string and rescue the SyntaxError
eval "0xA" => 10
eval "0xZ" => SyntaxError
You can use this on a method like
def is_hex?(str)
begin
eval("0x#{str}")
true
rescue SyntaxError
false
end
end
is_hex?('0A') => true
is_hex?('0Z') => false
Of course since you are using eval, make sure you are sending only safe values to the methods

String containment

Is there a way to check in Ruby whether the string "1:/2" is contained within a larger string str, beside iterating over all positions of str?

You can use the include? method
str = "wdadwada1:/2wwedaw"
# => "wdadwada1:/2wwedaw"
str.include? "1:/2"
# => true

A regular expression will do that.
s =~ /1:\/2/
This will return either nil if s does not contain the string, or the integer position if it does. Since nil is falsy and an integer is truthy, you can use this expression in an if statement:
if s =~ /1:\/2/
...
end
The regular expression is normally delimited by /, which is why the slash within the regular expression is escaped as \/
It is possible to use a different delimiter to avoid having to escape the /:
s =~ %r"1:/2"
You could use other characters than " with this syntax, if you want.

The simplest and most straight-forward is to simply ask the string if it contains the sub-string:
"...the string 1:/2 is contained..."['1:/2']
# => "1:/2"
!!"...the string 1:/2 is contained..."['1:/2']
# => true
The documentation has the full scoop; Look at the last two examples.

Remove all non-alphabetical, non-numerical characters from a string?

If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.

You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you wanted also punctuation, you can also build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡MarcAndré!"
For all character properties, you can refer to the doc.

string.gsub(/[^[:alnum:]]/, "")

The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.

You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using ruby since you tagged that in your post. You could go through the array, put it through a test using a regexp, and if it passes remove/keep it based on the regexp you use.
A regexp you might use might go something like this:
[^.!,^-#]
That will tell you if its not one of the characters inside the brackets. However, I suggest that you look up regular expressions, you might find a better solution once you know their syntax and usage.

If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string the array) you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Regex string with grouping? - ruby

Related

Multiple Ruby chomp! statements

Use ARGV[] argument vector to pass a regular expression in Ruby

How to validate that a string is a proper hexadecimal value in Ruby?

String containment

Remove all non-alphabetical, non-numerical characters from a string?

Categories

Resources