Composing regexp? - ruby

I would like to compose regexps, that is reusing a regex in a new regex.
Is that possible in Ruby ?
For instance, in order to simplify this assembly-like parsing :
LABELED_INSTR = /(\w+):(movi|addi)\s+(\w+),(\w+),(\w+)/
NON_LABELED_INSTR = /(movi|addi)\s+(\w+),(\w+),(\w+)/
I would like to resort to :
IMMEDIATE = /(movi|addi)/
But then I don't know how to share this regex in the two previous ones.
Any hint ?

Sure, regular expressions can be reused (or composed) within other regexes. Here's an example that combines two regexes to make a third:
>> a = /boo/
=> /boo/
>> b = /foo/
=> /foo/
>> c = /#{a}|#{b}/
=> /(?-mix:boo)|(?-mix:foo)/
>> if "boo" =~ c
>> puts "match!"
>> end
match!
=> nil
Your example is pretty similar. Here, it would be:
IMMEDIATE = /(movi|addi)/
LABELED_INSTR = /(\w+):#{IMMEDIATE}\s+(\w+),(\w+),(\w+)/
NON_LABELED_INSTR = /#{IMMEDIATE}\s+(\w+),(\w+),(\w+)/
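As a minimal sketch of how the composed patterns behave, the captures can be inspected directly. The OPERANDS constant and the sample instructions are illustrative assumptions, not part of the question, and the sketch assumes the last operand is meant to be \w+:

```ruby
# Shared building blocks. OPERANDS is a hypothetical helper constant.
IMMEDIATE = /(movi|addi)/
OPERANDS  = /(\w+),(\w+),(\w+)/

# Interpolating a Regexp embeds it as a (?-mix:...) group, so the capture
# groups it contains are still numbered as part of the larger pattern.
LABELED_INSTR     = /(\w+):#{IMMEDIATE}\s+#{OPERANDS}/
NON_LABELED_INSTR = /#{IMMEDIATE}\s+#{OPERANDS}/

labeled   = LABELED_INSTR.match("loop:addi r1,r2,5")
unlabeled = NON_LABELED_INSTR.match("movi r3,r0,42")

p labeled.captures    # => ["loop", "addi", "r1", "r2", "5"]
p unlabeled.captures  # => ["movi", "r3", "r0", "42"]
```

Neither pattern is anchored here, so NON_LABELED_INSTR would also match the tail of a labeled line; adding \A and \z is one way to tighten that if needed.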

You can also use strings:
IMMEDIATE = "(movi|addi)"
LABELED_INSTR = Regexp.new("(\\w+):#{IMMEDIATE}\\s+(\\w+),(\\w+),(\\w+)")
NON_LABELED_INSTR = Regexp.new(IMMEDIATE + "\\s+(\\w+),(\\w+),(\\w+)")
Note that you must escape the backslashes inside double-quoted strings.
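A possible variation on the string approach, sketched under the assumption that no interpolation is needed inside the fragments: single-quoted strings keep backslashes as they are, and Regexp#source gives back the text of an existing pattern for reuse. The *_SRC and *_RE names are hypothetical:

```ruby
# Single-quoted strings leave \w and \s untouched, so no doubling is needed.
IMMEDIATE_SRC = '(movi|addi)'
OPERANDS_SRC  = '(\w+),(\w+),(\w+)'

NON_LABELED_RE = Regexp.new(IMMEDIATE_SRC + '\s+' + OPERANDS_SRC)

# Regexp#source returns the pattern text without a (?-mix:...) wrapper,
# so it can be concatenated like any other string fragment.
LABELED_RE = Regexp.new('(\w+):' + NON_LABELED_RE.source)

p NON_LABELED_RE.source
p LABELED_RE.match("loop:movi r1,r2,7").captures
# => ["loop", "movi", "r1", "r2", "7"]
```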

Related

How do I use 'gsub' to make multiple substitutions?

I have a string that only contains one number on either side of "-", like:
"1-3"
I want to get a result like
"01-03"
If the string had two numbers on one side of the dash like:
"1-10"
then I don't want to make any substitutions. I could do a gsub expression like
str.gsub!(/(^|[^\d]])\d[[:space:]]*\-[[:space:]]*\d([^\d]|$)/, '\1')
but I'm not clear how to do it if there are multiple (e.g. two) things to substitute.
You could probably get away with this:
def dashreplace(str)
str.sub(/\b(\d)\-(\d)\b/) do |s|
'%02d-%02d' % [ $1.to_i, $2.to_i ]
end
end
dashreplace('1-2')
# => "01-02"
dashreplace('1-20')
# => "1-20"
dashreplace('99-1,2-3')
# => "99-1,02-03"
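If every single-digit pair in a longer string should be padded, the same pattern can be used with gsub instead of sub. This is only a sketch: the method name is made up, and allowing optional whitespace around the dash is an assumption based on the [[:space:]]* in the question's attempt:

```ruby
# Hypothetical helper: pads every "d-d" pair made of single digits.
def pad_single_digit_pairs(str)
  # \b on both sides rejects digits that belong to a longer number,
  # so inputs like "1-10" or "99-1" are left untouched.
  str.gsub(/\b(\d)\s*-\s*(\d)\b/) { format('%02d-%02d', $1.to_i, $2.to_i) }
end

p pad_single_digit_pairs('1-3')      # => "01-03"
p pad_single_digit_pairs('1-10')     # => "1-10"
p pad_single_digit_pairs('1-2,3-4')  # => "01-02,03-04"
```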
Is there really a need to use regex here, at all? Seems like an over-complication to me. Assuming you know the string will be in the format: <digits><hyphen><digits>, you could do:
def pad_digits(string)
left_digits, right_digits = string.split('-')
if left_digits.length > 1 || right_digits.length > 1
string
else
"%02d-%02d" % [left_digits, right_digits]
end
end
pad_digits("1-3") # => "01-03"
pad_digits("1-10") # => "1-10"
This is a variant of @TomLord's answer.
def pad_single_digits(str)
str.size > 3 ? str : "0%d-0%d" % str.split('-')
end
pad_single_digits "1-3" #=> "01-03"
pad_single_digits "1-10" #=> "1-10"
"0%s-0%s" also works.
You can do:
def nums(s)
rtr=s[/^(\d)(\D+)(\d)$/] ? '0%s%s0%s' % [$1,$2,$3] : s
end

regex to catch everything but the first segment of a URL

Assuming I have the following URL: /members/some/path/route
I want a regex that will return only /some/path/route
So far I have [^\/]\/.*
But that doesn't quite work as it returns s/some/path/route
Can someone give me a hand here? I'd also like it if someone can tell me why my regex doesn't quite work so I can learn why it doesn't work.
I'm using ruby.
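Assuming you are using JavaScript, your regex would be:
result = url.match(/(?:\/.*?)(\/.*)/);
and your expected string will be held in result[1]. (Leave out the g flag here: with it, match returns only the full matches and no capture groups.)
If using Ruby: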
if url =~ /(?:\/.*?)(\/.*)/
result = $~[1]
end
OR
regexp = /(?:\/.*?)(\/.*)/
match = regexp.match(url)
if match
result = match[1]
else
result = ""
end
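On the "why" part of the question: [^\/]\/.* needs one non-slash character immediately before a slash, and the first place that happens is the final "s" of "members", so the match starts there. A rough sketch of that behaviour next to two anchored alternatives (the variable names are illustrative only):

```ruby
url = "/members/some/path/route"

# The pattern from the question: starts at the "s" that precedes the second "/".
from_question = url[/[^\/]\/.*/]

# Anchor at the start, skip the first segment, capture everything after it.
rest_by_capture = url[%r{\A/[^/]+(/.*)\z}, 1]

# Or simply delete the first segment.
rest_by_sub = url.sub(%r{\A/[^/]+}, "")

p from_question    # => "s/some/path/route"
p rest_by_capture  # => "/some/path/route"
p rest_by_sub      # => "/some/path/route"
```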
You need to find the second "/" and take the rest
preg_match("/\/(.*?)(\/.*)/", $url, $preg);
print_r($preg);
returns
Array
(
[0] => /members/some/path/route
[1] => members
[2] => /some/path/route
)
But, I recommend not using RegExp, since it is a simple string, an explode() function would do
$path_array = array_slice(explode("/", $url), 2); // we are slicing at 2, because there is a leading '/' at the beginning
$new_path = "/".implode("/", $path_array);
echo $new_path;
I would not use a regex for this. Instead I would use Pathname#each_filename to construct an array whose elements are each component of the path.
str = "/members/some/path/route"
require "pathname"
Pathname(str).each_filename.to_a[1..-1].join('/')
#=> "some/path/route"
We see that:
Pathname(str).each_filename.to_a
#=> ["members", "some", "path", "route"]
This uses the class method Kernel::Pathname. I don't find the doc for that (which would be with the docs for Object), but we can verify:
method(:Pathname).owner
#=> Kernel
This is similar to the class methods Kernel::Array, Kernel::String and others:
Array(21) # => [21]
String(19) # => "19"
You could instead write:
Pathname.new(str).each_filename.to_a[1..-1].join('/')
#=> "some/path/route"

Ruby String ASCII operation?

Is it possible to do some ASCII operations in Ruby, like what we do in C++?
char *s = "test string";
for(int i = 0 ; i < strlen(s) ; i++) printf("%c",s[i] + 2);
// expected output: vguv"uvtkpi
How do I achieve a similar goal in Ruby? From some research I think String#each_byte might help here, but I would like to use higher-order functions (something like Array#map) to translate the string directly, without an explicit for loop.
The task I'm trying to solve: Referring to this page, I'm trying to solve it using Ruby, and it seems a character-by-character translation is needed to apply to the string.
Pay close attention to the hint given by the question in the Challenge, then use String's tr method:
"test string".tr('a-z', 'c-zab')
# => "vguv uvtkpi"
An additional hint to solve the problem is, you should only be processing characters. Punctuation and spaces should be left alone.
Use the above tr on the string in the Python Challenge, and you'll see what I mean.
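A small sketch of the tr approach generalized to any shift, assuming only lowercase ASCII letters should move; shift_letters is a made-up name and the sample strings are illustrative:

```ruby
# Hypothetical helper: shifts lowercase letters by `offset`, wrapping z -> a.
# Everything that is not a lowercase letter is left untouched by tr.
def shift_letters(text, offset = 2)
  alphabet = ('a'..'z').to_a
  text.tr(alphabet.join, alphabet.rotate(offset).join)
end

p shift_letters("test string")  # => "vguv uvtkpi"
p shift_letters("xyz, ok?")     # => "zab, qm?"
```

Unlike adding 2 to each byte, this keeps the space as a space and wraps "y" and "z" around to "a" and "b".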
Use String#each_char and String#ord and Integer#chr:
s = "test string"
s.each_char.map { |ch| (ch.ord + 2).chr }.join
# => "vguv\"uvtkpi"
or String#each_byte:
s.each_byte.map { |b| (b + 2).chr }.join
# => "vguv\"uvtkpi"
or String#next:
s.each_char.map { |ch| ch.next.next }.join
# => "vguv\"uvtkpi"
You can use codepoints or each_codepoint methods, for example:
old_string = 'test something'
new_string = ''
old_string.each_codepoint {|x| new_string << (x+2).chr}
p new_string #=> "vguv\"uqogvjkpi"

How to replace the last occurrence of a substring in ruby?

I want to replace the last occurrence of a substring in Ruby. What's the easiest way?
For example, in abc123abc123, I want to replace the last abc to ABC. How do I do that?
How about
new_str = old_str.reverse.sub(pattern.reverse, replacement.reverse).reverse
For instance:
irb(main):001:0> old_str = "abc123abc123"
=> "abc123abc123"
irb(main):002:0> pattern="abc"
=> "abc"
irb(main):003:0> replacement="ABC"
=> "ABC"
irb(main):004:0> new_str = old_str.reverse.sub(pattern.reverse, replacement.reverse).reverse
=> "abc123ABC123"
"abc123abc123".gsub(/(.*(abc.*)*)(abc)(.*)/, '\1ABC\4')
#=> "abc123ABC123"
But probably there is a better way...
Edit:
...which Chris kindly provided in the comment below.
So, as * is a greedy operator, the following is enough:
"abc123abc123".gsub(/(.*)(abc)(.*)/, '\1ABC\3')
#=> "abc123ABC123"
Edit2:
There is also a solution which neatly illustrates parallel array assignment in Ruby:
*a, b = "abc123abc123".split('abc', -1)
a.join('abc')+'ABC'+b
#=> "abc123ABC123"
Since Ruby 2.0 we can use \K which removes any text matched before it from the returned match. Combine with a greedy operator and you get this:
'abc123abc123'.sub(/.*\Kabc/, 'ABC')
#=> "abc123ABC123"
This is about 1.4 times faster than using capturing groups as Hirurg103 suggested, but that speed comes at the cost of lowering readability by using a lesser-known pattern.
more info on \K: https://www.regular-expressions.info/keep.html
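For an arbitrary literal substring, the \K idea could be wrapped roughly as follows. This is a sketch, not a standard method: replace_last is a hypothetical name, Regexp.escape guards against regex metacharacters in the substring, and the /m flag is an added assumption so that the search also crosses newlines:

```ruby
# Hypothetical helper built on the greedy .* plus \K technique.
def replace_last(str, substring, replacement)
  pattern = /.*\K#{Regexp.escape(substring)}/m
  # Block form: the replacement is inserted literally, with no
  # backslash sequences such as \1 being interpreted.
  str.sub(pattern) { replacement }
end

p replace_last("abc123abc123", "abc", "ABC")  # => "abc123ABC123"
p replace_last("a.c abc a.c", "a.c", "X")     # => "a.c abc X"
p replace_last("xyz", "abc", "ABC")           # => "xyz"
```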
Here's another possible solution:
>> s = "abc123abc123"
=> "abc123abc123"
>> s[s.rindex('abc')...(s.rindex('abc') + 'abc'.length)] = "ABC"
=> "ABC"
>> s
=> "abc123ABC123"
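The rindex version above calls rindex twice and raises if the substring is missing, because rindex then returns nil. A hedged sketch of a guarded variant (the method name is hypothetical, and returning a copy rather than mutating the argument is a design choice of this sketch):

```ruby
# Hypothetical helper: replaces the last occurrence found via rindex.
def replace_last_by_index(str, substring, replacement)
  index = str.rindex(substring)
  return str.dup if index.nil?   # nothing to replace

  result = str.dup               # leave the caller's string untouched
  result[index, substring.length] = replacement
  result
end

p replace_last_by_index("abc123abc123", "abc", "ABC")  # => "abc123ABC123"
p replace_last_by_index("no match here", "abc", "ABC") # => "no match here"
```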
When searching in huge streams of data, using reverse will definitely* lead to performance issues. I use string.rpartition*:
sub_or_pattern = "!"
replacement = "?"
string = "hello!hello!hello"
array_of_pieces = string.rpartition sub_or_pattern
( array_of_pieces[(array_of_pieces.find_index sub_or_pattern)] = replacement ) rescue nil
p array_of_pieces.join
# "hello!hello?hello"
The same code must work with a string with no occurrences of sub_or_pattern:
string = "hello_hello_hello"
# ...
# "hello_hello_hello"
*rpartition uses rb_str_subseq() internally. I didn't check if that function returns a copy of the string, but I think it preserves the chunk of memory used by that part of the string. reverse uses rb_enc_cr_str_copy_for_substr(), which suggests that copies are done all the time -- although maybe in the future a smarter String class may be implemented (having a flag reversed set to true, and having all of its functions operating backwards when that is set), as of now, it is inefficient.
Moreover, regex patterns can't simply be reversed. The question only asks about replacing the last occurrence of a substring, so that's OK, but readers in need of something more robust won't benefit from the most voted answer (as of this writing).
You can achieve this with String#sub and greedy regexp .* like this:
'abc123abc123'.sub(/(.*)abc/, '\1ABC')
Simple and efficient (note that this removes the last occurrence rather than substituting a replacement):
s = "abc123abc123abc"
p = "123"
s.slice!(s.rindex(p), p.size)
s == "abc123abcabc"
string = "abc123abc123"
pattern = /abc/
replacement = "ABC"
matches = string.scan(pattern).length
index = 0
string.gsub(pattern) do |match|
index += 1
index == matches ? replacement : match
end
#=> abc123ABC123
I've used this handy helper method quite a bit:
def gsub_last(str, source, target)
return str unless str.include?(source)
top, middle, bottom = str.rpartition(source)
"#{top}#{target}#{bottom}"
end
If you want to make it more Rails-y, extend it on the String class itself:
class String
def gsub_last(source, target)
return self unless self.include?(source)
top, middle, bottom = self.rpartition(source)
"#{top}#{target}#{bottom}"
end
end
Then you can just call it directly on any String instance, eg "fooBAR123BAR".gsub_last("BAR", "FOO") == "fooBAR123FOO"
.gsub /abc(?=[^abc]*$)/, 'ABC'
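Matches an "abc" and then asserts ((?=) is a positive lookahead) that none of the characters a, b, or c appears between it and the end of the string. Note that [^abc] is a character class, so this is stricter than "no further abc": a later lone a, b, or c also prevents the match.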
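If the intent is "no further abc substring" rather than "no further a, b, or c", a negative lookahead is one possible alternative. A minimal sketch, with a made-up method name and sample inputs chosen to show the difference:

```ruby
# Hypothetical helper: replaces the last "abc" using a negative lookahead.
def replace_last_abc(str)
  # (?!.*abc) succeeds only when no later "abc" exists; /m lets . cross newlines.
  str.sub(/abc(?!.*abc)/m, 'ABC')
end

p replace_last_abc("abc123abc123")  # => "abc123ABC123"
p replace_last_abc("abc123abc1a3")  # => "abc123ABC1a3"

# The character-class version leaves the second input unchanged,
# because a lone "a" follows the last "abc".
p "abc123abc1a3".gsub(/abc(?=[^abc]*$)/, 'ABC')  # => "abc123abc1a3"
```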

What is the canonical way to trim a string in Ruby without creating a new string?

This is what I have now - which looks too verbose for the work it is doing.
@title = tokens[Title].strip! || tokens[Title] if !tokens[Title].nil?
Assume tokens is an array obtained by splitting a CSV line.
Now, functions like strip!, chomp!, et al. return nil if the string was not modified:
"abc".strip! # => nil
" abc ".strip! # => "abc"
What is the Ruby way to say trim it if it contains extra leading or trailing spaces without creating copies?
Gets uglier if I want to do tokens[Title].chomp!.strip!
I guess what you want is:
@title = tokens[Title]
@title.strip!
The #strip! method will return nil if it didn't strip anything, and the variable itself if it was stripped.
According to Ruby standards, a method suffixed with an exclamation mark changes the variable in place.
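A short sketch of what "in place" means here, using object identity as the check. The variable names are illustrative, and String.new is used only to make sure the string is not frozen:

```ruby
title = String.new(" abc ")   # an unfrozen string to mutate
id_before = title.object_id

result = title.strip!

p result                          # => "abc"
p title.object_id == id_before    # => true, the same object was modified
p result.equal?(title)            # => true, strip! returned the receiver
p title.strip!                    # => nil, nothing left to strip
```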
Update: This is output from irb to demonstrate:
>> @title = "abc"
=> "abc"
>> @title.strip!
=> nil
>> @title
=> "abc"
>> @title = " abc "
=> " abc "
>> @title.strip!
=> "abc"
>> @title
=> "abc"
Btw, Ruby also supports plain strip, without the "!".
Compare:
p "abc".strip! == " abc ".strip! # false, because "abc".strip! will return nil
p "abc".strip == " abc ".strip # true
Also, the non-bang strip cannot avoid creating a duplicate. See the source in string.c:
static VALUE
rb_str_strip(VALUE str)
{
str = rb_str_dup(str);
rb_str_strip_bang(str);
return str;
}
ruby 1.9.3p0 (2011-10-30) [i386-mingw32]
Update 1:
As I see now, it was created in 1999 (see rev #372 in SVN).
Update2:
strip! will not create duplicates in 1.9.x, 2.x, or trunk.
There's no need to both strip and chomp as strip will also remove trailing carriage returns - unless you've changed the default record separator and that's what you're chomping.
Olly's answer already has the canonical way of doing this in Ruby, though if you find yourself doing this a lot you could always define a method for it:
def strip_or_self!(str)
str.strip! || str
end
Giving:
@title = strip_or_self!(tokens[Title]) if tokens[Title]
Also keep in mind that the if statement will prevent @title from being assigned if the token is nil, which will result in it keeping its previous value. If you want, or don't mind, @title always being assigned, you can move the check into the method and further reduce duplication:
def strip_or_self!(str)
str.strip! || str if str
end
As an alternative, if you're feeling adventurous you can define a method on String itself:
class String
def strip_or_self!
strip! || self
end
end
Giving one of:
@title = tokens[Title].strip_or_self! if tokens[Title]
@title = tokens[Title] && tokens[Title].strip_or_self!
If you are using Ruby on Rails, there is squish:
> @title = " abc "
=> " abc "
> @title.squish
=> "abc"
> @title
=> " abc "
> @title.squish!
=> "abc"
> @title
=> "abc"
If you are using just Ruby, you want to use strip.
Herein lies the gotcha: in your case you want to use strip without the bang (!).
While strip! does return nil if there was nothing to strip, it still updates the variable in place, so strip! cannot be used inline. If you want to strip inline, use the version without the bang.
strip! using the multi-line approach:
> tokens["Title"] = " abc "
=> " abc "
> tokens["Title"].strip!
=> "abc"
> @title = tokens["Title"]
=> "abc"
strip using the single-line approach (your answer):
> tokens["Title"] = " abc "
=> " abc "
> @title = tokens["Title"].strip if tokens["Title"].present?
=> "abc"
If you want to call another method afterwards, you need something like this:
( str.strip! || str ).split(',')
This way you can strip and still do something after :)
I think your example is a sensible approach, although you could simplify it slightly as:
@title = tokens[Title].strip! || tokens[Title] if tokens[Title]
Alternatively, you could put it on two lines:
@title = tokens[Title] || ''
@title.strip!
If you have Ruby 1.9 (for tap) and ActiveSupport (for try), you can simply do:
@title = tokens[Title].try :tap, &:strip!
This is really cool, as it leverages the :try and the :tap methods, which are the most powerful functional constructs in Ruby, in my opinion.
An even cuter form, passing functions as symbols altogether:
@title = tokens[Title].send :try, :tap, &:strip!
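Without ActiveSupport, the safe navigation operator (&., available since Ruby 2.3) can play the role of try. A minimal sketch with made-up sample data and local variable names:

```ruby
# Sample tokens: one string with padding, one missing value.
tokens = [String.new(" abc "), nil]

# &. skips the call when the receiver is nil; tap runs strip! on the
# string in place and then returns the same string object.
title   = tokens[0]&.tap(&:strip!)
missing = tokens[1]&.tap(&:strip!)

p title                     # => "abc"
p missing                   # => nil
p title.equal?(tokens[0])   # => true, no copy was made
```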
My way:
> (@title = " abc ").strip!
=> "abc"
> @title
=> "abc"
@title = tokens[Title].strip! || tokens[Title]
It's entirely possible I'm not understanding the topic, but wouldn't this do what you need?
" success ".strip! || "rescue" #=> "success"
"failure".strip! || "rescue" #=> "rescue"

Resources