Is there a different way to replace these brackets using Ruby regex? - ruby

I have a string which contains a 2-D array.
b= "[[1, 2, 3], [4, 5, 6]]"
c = b.gsub(/(\[\[)/,"[").gsub(/(\]\])/,"]")
The above is how I decided to flatten it to:
"[1, 2, 3], [4, 5, 6]"
Is there a way to replace the leftmost and rightmost brackets without doing a double gsub call? I'm doing a deeper dive into regular expressions and would like to see different alternatives.
Sometimes, the string may be in the correct format as comma delimited 1-D arrays.

The gsub method accepts a hash, and anything that matches your regular expression will be replaced using the keys/values in that hash, like so:
b = "[[1, 2, 3], [4, 5, 6]]"
c = b.gsub(/\[\[|\]\]/, '[[' => '[', ']]' => ']')
That may look a little jumbled, and in practice I'd probably define the list of swaps on a different line. But this does what you were looking for with one gsub, in a more intuitive way.
Another option is to take advantage of the fact that gsub also accepts a block:
c = b.gsub(/\[\[|\]\]/) { |matched_value| matched_value[0] }
Here we match any double opening/closing square brackets, and just take the first character of each match. (String#first comes from ActiveSupport, so in plain Ruby use matched_value[0] or String#chr instead.) We can clean up the regex:
c = b.gsub(/\[{2}|\]{2}/) { |matched_value| matched_value[0] }
This is a more succinct way to specify that we want to match exactly two opening brackets, or exactly two closing brackets. We can also refine the block:
c = b.gsub(/\[{2}|\]{2}/, &:chr)
Here we're using some Ruby shorthand. If you only need to call a simple method on the object passed into a block, you can use the &: notation to do this. I think I've gotten it about as short and sweet as I can. Happy coding!
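A minimal sketch that runs the three gsub variants side by side. The names swaps, double_brackets, nested and flat are introduced here for illustration only, and the sketch sticks to core String#[] and String#chr because String#first is only available with ActiveSupport.

```ruby
# Hypothetical names: the swap table and the pattern kept on their own lines.
swaps = { '[[' => '[', ']]' => ']' }
double_brackets = /\[{2}|\]{2}/

nested = "[[1, 2, 3], [4, 5, 6]]"
flat   = "[1, 2, 3], [4, 5, 6]"

# 1. Hash form: each match is looked up in the hash.
via_hash = nested.gsub(double_brackets, swaps)

# 2. Block form: keep only the first character of each match.
via_block = nested.gsub(double_brackets) { |match| match[0] }

# 3. Symbol-to-proc shorthand: String#chr returns the first character.
via_chr = nested.gsub(double_brackets, &:chr)

puts via_hash
puts flat.gsub(double_brackets, swaps) # an already flat string passes through unchanged
```

All three variants leave a string without doubled brackets untouched, which covers the case where the input is already in the comma-delimited 1-D form.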

\[(?=\[)|(?<=\])\]
You can try this. Replace each match with an empty string. See the demo:
http://regex101.com/r/hQ1rP0/25
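As a rough illustration of how that lookaround pattern could be used from Ruby (the variable names are made up for this sketch): the lookahead removes a "[" only when another "[" follows, and the lookbehind removes a "]" only when another "]" precedes it.

```ruby
# Matches a "[" that is followed by "[", or a "]" that is preceded by "]".
outer_brackets = /\[(?=\[)|(?<=\])\]/

nested = "[[1, 2, 3], [4, 5, 6]]"
flat   = "[1, 2, 3], [4, 5, 6]"

stripped  = nested.gsub(outer_brackets, '') # outer pair removed
untouched = flat.gsub(outer_brackets, '')   # nothing matches, so no change

puts stripped
puts untouched
```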

Don't even bother with a regular expression, just do a simple string slice:
b= "[[1, 2, 3], [4, 5, 6]]"
b[1 .. -2] # => "[1, 2, 3], [4, 5, 6]"
the string may be in the correct format as comma delimited 1D arrays
Then sense whether it is and conditionally modify it:
b= "[[1, 2, 3], [4, 5, 6]]"
b = b[1 .. -2] if b[0, 2] == '[[' # => "[1, 2, 3], [4, 5, 6]"
Regular expressions aren't universal hammers, and not everything is a nail to be hit with one.
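One way the conditional slice might be wrapped up, as a sketch only: strip_outer_brackets is a hypothetical helper name, and it additionally checks the closing "]]", which the one-liner above does not.

```ruby
# Hypothetical helper: remove one outer pair of brackets only when the
# string is wrapped in "[[" ... "]]"; otherwise return it unchanged.
def strip_outer_brackets(str)
  return str unless str.start_with?('[[') && str.end_with?(']]')

  str[1..-2]
end

puts strip_outer_brackets("[[1, 2, 3], [4, 5, 6]]")
puts strip_outer_brackets("[1, 2, 3], [4, 5, 6]")
```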

To "squeeze" consecutive occurrences of a specific character set, you can use tr_s:
"[[1,2],[3,4]]".tr_s('[]','[]')
=> "[1,2],[3,4]"
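You're saying "translate all runs of square bracket characters to one of that character". To do the same thing with regular expressions and gsub, match a bracket followed by repeats of that same bracket (the backreference keeps a mixed run such as "[]" from collapsing):
"[[1,2],[3,4]]".gsub(/(\[|\])\1+/,'\1')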

Related

How is the splat operator understood when applied to a range expression?

I found that the expression [*1..4] returns the same as (1..4).to_a, but I don't understand the syntax here. My understanding is that *, being a unary operator in this case, is the splat operator, and to the right of it we have a Range. However, if I just write the expression *1..4, this is a syntax error, and *(1..4) is a syntax error too. Why does the first form, [*1..4], work, and how is it understood in detail?
The splat * converts the object to a list of values (usually an argument list) by calling its to_a method, so *1..4 is equivalent to:
1, 2, 3, 4
On its own, the above isn't valid. But wrapped within square brackets, [*1..4] becomes:
[1, 2, 3, 4]
Which is valid.
You could also write a = *1..4 which is equivalent to:
a = 1, 2, 3, 4
#=> [1, 2, 3, 4]
Here, the list of values becomes an array due to Ruby's implicit array assignment.
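A short sketch of the contexts in which a splatted range is accepted. The method name total is hypothetical and only serves to show the argument-list case.

```ruby
# Inside an array literal the splatted values become the elements.
from_splat = [*1..4]
from_to_a  = (1..4).to_a

# On the right-hand side of an assignment the values are gathered into an array.
a = *1..4

# In an argument list the values become individual arguments.
def total(*numbers) # hypothetical helper
  numbers.sum
end
sum = total(*1..4) # same as total(1, 2, 3, 4)

# Splat also works on the left-hand side to collect the remainder.
first, *rest = *1..4

p from_splat, a, sum, first, rest
```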

Block with two parameters

I found this code by user Hirolau:
def sum_to_n?(a, n)
a.combination(2).find{|x, y| x + y == n}
end
a = [1, 2, 3, 4, 5]
sum_to_n?(a, 9) # => [4, 5]
sum_to_n?(a, 11) # => nil
How can I know when I can send two parameters to a predefined method like find? It's not clear to me because sometimes it doesn't work. Is this something that has been redefined?
If you look at the documentation of Enumerable#find, you see that it passes only one parameter to the block. The reason you can declare two is that Ruby conveniently lets you do this with blocks, based on its "parallel assignment" structure:
[[1,2,3], [4,5,6]].each {|x,y,z| puts "#{x}#{y}#{z}"}
# 123
# 456
So basically, each yields an array element to the block, and because Ruby block syntax allows "expanding" array elements to their components by providing a list of arguments, it works.
You can find more tricks with block arguments here.
a.combination(2) yields arrays of 2 elements each (called without a block it returns an Enumerator, so to_a is used here to show the contents). So:
a = [1,2,3,4]
a.combination(2).to_a
# => [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
As a result, you are sending one array like [1,2] to find's block, and Ruby performs the parallel assignment to assign 1 to x and 2 to y.
Also see this SO question, which brings other powerful examples of parallel assignment, such as this statement:
a,(b,(c,d)) = [1,[2,[3,4]]]
find does not take two parameters, it takes one. The reason the block in your example takes two parameters is that it is using destructuring. The preceding code a.combination(2) yields arrays of two elements, and find iterates over them. Each element (an array of two elements) is passed one at a time to the block as its single parameter. However, when you write more parameters than there are arguments, Ruby tries to adjust the parameters by destructuring the array. The part:
find{|x, y| x + y == n}
is a shorthand for writing:
find{|(x, y)| x + y == n}
The find function iterates over elements, it takes a single argument, in this case a block (which does take two arguments for a hash):
h = {foo: 5, bar: 6}
result = h.find {|k, v| k == :foo && v == 5}
puts result.inspect #=> [:foo, 5]
The block takes only one argument for arrays though unless you use destructuring.
Update: It seems that it is destructuring in this case.
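To make the destructuring concrete, here is a small sketch comparing the implicit and explicit forms. The variable names are illustrative only.

```ruby
a = [1, 2, 3, 4, 5]
pairs = a.combination(2) # an Enumerator yielding two-element arrays

# Implicit destructuring: the single yielded array is spread over x and y.
implicit = pairs.find { |x, y| x + y == 9 }

# Explicit destructuring with parentheses behaves the same way here.
explicit = pairs.find { |(x, y)| x + y == 9 }

# With one parameter, the block receives the whole pair.
whole = pairs.find { |pair| pair.sum == 9 }

# When the block already receives two arguments (element and index),
# the parentheses are required to destructure the element itself.
indexed = [[1, 2], [3, 4]].each_with_index.map { |(x, y), i| (x + y) * i }

# The same rules apply to nested parallel assignment.
n1, (n2, (n3, n4)) = [1, [2, [3, 4]]]

p implicit, explicit, whole, indexed, [n1, n2, n3, n4]
```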

% notation in Ruby

I've read about % notation, but I could not find an explanation of the following.
Example 1: The following code with % outputs i. Obviously % changes i to a string. But I am not sure what actually % is doing.
irb(main):200:0> [[1,2,3],[4,5,6]].each{ |row| p row.map{ |i| % i } }
["i", "i", "i"]
["i", "i", "i"]
=> [[1, 2, 3], [4, 5, 6]]
irb(main):201:0> [[1,2,3],[4,5,6]].each{ |row| p row.map{ |i| i } }
[1, 2, 3]
[4, 5, 6]
=> [[1, 2, 3], [4, 5, 6]]
Example 2: It seems %2d adding 2 spaces in front of a number. Again, I am not sure what %2d is doing.
irb(main):194:0> [[1,2,3],[4,5,6],[7,8,9]].each{ |row| p row.map{|i| "%2d" % i } }
[" 1", " 2", " 3"]
[" 4", " 5", " 6"]
[" 7", " 8", " 9"]
=> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Where can I find the documentation about these?
Here is the doc - You may also create strings using %:.
There are two different types of % strings: %q(...) behaves like a single-quote string (no interpolation or character escaping) while %Q behaves as a double-quote string.....
In your first example p row.map{|i| % i } as per the above doc % i creates a string "i".
Examples :-
[1, 2, 3].map { |i| % i } # => ["i", "i", "i"]
% i # => "i"
Just remember what the doc says:
Any combination of adjacent single-quote, double-quote, percent strings will be concatenated as long as a percent-string is not last.
From the wikipedia link
Any single non-alpha-numeric character can be used as the delimiter, %[including these], %?or these?,...
Now in your case it is %<space>i<space>, which follows the same pattern as the %[..], %?..? etc. forms from the link mentioned just above, with a space as the delimiter. That is why %<space>i<space> gives "i". (I used <space> to show there is a space.)
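A brief sketch of percent-string literals with visible delimiters, which may be easier to read than the space-delimited form. The variable names are illustrative only.

```ruby
# Any non-alphanumeric delimiter works; paired brackets close with their partner.
bracketed = %[i]
piped     = %|i|

# %q behaves like single quotes: no interpolation.
single = %q(1 + 1 = #{1 + 1})

# %Q (and bare %) behaves like double quotes: interpolation happens.
double = %Q(1 + 1 = #{1 + 1})

p bracketed, piped, single, double
```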
Read Kernel#format
Returns the string resulting from applying format_string to any additional arguments. Within the format string, any characters other than format sequences are copied to the result.
The syntax of a format sequence is as follows.
%[flags][width][.precision]type
A format sequence consists of a percent sign, followed by optional flags, width, and precision indicators, then terminated with a field type character. The field type controls how the corresponding sprintf argument is to be interpreted, while the flags modify that interpretation.
Your last question actually points to a method str % arg → new_str.
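A few format sequences in action, as a sketch: String#% and Kernel#format take the same format strings, and the width in %2d pads the number to two characters rather than adding a fixed number of spaces.

```ruby
padded   = "%2d" % 5                     # width 2, right-aligned
wide     = "%2d" % 42                    # already two characters, no padding
zero_pad = "%03d" % 7                    # the 0 flag pads with zeros
rounded  = format("%.2f", 3.14159)       # precision 2
left     = "%-4s|" % "ab"                # the - flag left-aligns within the width
several  = "%s has %d items" % ["cart", 3] # an array supplies several arguments

p padded, wide, zero_pad, rounded, left, several
```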
If IRB fooled you, as it fooled me while I was trying to understand % i, don't worry; have a look at "why in IRB modulo string literal(%) is behaving differently ?". Matthew Kerwin gives a good answer there.

Marking an unused block variable

When there is a block or local variable that is not to be used, sometimes people mark it with *, and sometimes with _.
{[1, 2] => 3, [4, 5] => 6}.each{|(x, *), *| p x}
{[1, 2] => 3, [4, 5] => 6}.each{|(x, _), _| p x}
[[1, 2, 3], [4, 5, 6]].each{|*, x, *| p x}
[[1, 2, 3], [4, 5, 6]].each{|_, x, _| p x}
def foo((x, *), *); p x; end
def foo((x, _), _); p x; end
def foo(*, x, *); p x; end
def foo(_, x, _); p x; end
What are the differences between them, and when should I use which? When there is need to mark multiple variables as unused as in the above examples, is either better?
A * means "all remaining parameters". An _ is just another variable name, although it is a bit special. So they are different, for example the following does not make sense:
[[1, 2, 3], [4, 5, 6]].each{|*, x, *| p x} # Syntax error
Indeed, how is Ruby supposed to know if the first star should get 0, 1 or 2 of the values (and the reverse)?
There are very few cases where you want to use a star to ignore parameters. An example would be if you only want to use the last of a variable number of parameters:
[[1], [2, 3], [4, 5, 6]].each{|*, last| p last} # => prints 1, 3 and 6
Ruby allows you to not give a name to the "rest" of the parameters, but you can use _:
[[1], [2, 3], [4, 5, 6]].each{|*_, last| p last} # => prints 1, 3 and 6
Typically, the number of parameters is known and your best choice is to use a _:
[[1, 2, 3], [4, 5, 6]].each{|_, mid, _| p mid} # prints 2 and 5
Note that you could leave the last parameter unnamed too (like you can when using a *), although it is less obvious:
[[1, 2, 3], [4, 5, 6]].each{|_, mid, | p mid} # prints 2 and 5
Now _ is the designated variable name to use when you don't want to use a value. It is a special variable name for two reasons:
Ruby won't complain if you don't use it (if warnings are on)
Ruby will allow you to repeat it in the argument list.
Example of point 1:
> ruby -w -e "def foo; x = 42; end; foo"
-e:1: warning: assigned but unused variable - x
> ruby -w -e "def foo; _ = 42; end; foo"
no warning
Example of point 2:
[[1, 2, 3], [4, 5, 6]].each{|unused, mid, unused| p mid}
# => SyntaxError: (irb):23: duplicated argument name
[[1, 2, 3], [4, 5, 6]].each{|_, mid, _| p mid}
# => prints 2 and 5
Finally, as @DigitalRoss notes, _ holds the last result in irb.
Update: In Ruby 2.0, you can use any variable starting with _ to signify it is unused. This way the variable name can be more explicit about what is being ignored:
_scheme, _domain, port, _url = parse_some_url
# ... do something with port
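The cases above can be sketched in one place. The sample data is made up for illustration, and each block collects its value with map so the results are easy to compare.

```ruby
rows = [[1, 2, 3], [4, 5, 6]]

# Known arity: _ may be repeated to skip positions.
mids = rows.map { |_, mid, _| mid }

# Variable arity: an anonymous splat swallows everything before the last value.
lasts = [[1], [2, 3], [4, 5, 6]].map { |*, last| last }

# Nested destructuring of a hash entry, ignoring the rest of the key and the value.
firsts = { [1, 2] => 3, [4, 5] => 6 }.map { |(x, _), _| x }

# Underscore-prefixed names document what is being ignored (hypothetical data).
_scheme, _host, port = ["https", "example.com", 443]

p mids, lasts, firsts, port
```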
I think it's mostly stylistic and programmer's choice. Using * makes more sense to me in Ruby because its purpose is to accumulate all parameters passed from that position onward. _ is a vestigial variable that rarely sees use in Ruby, and I've heard comments that it needs to go away. So, if I was to use either, I'd use *.
SOME companies might define it in their programming style document, if they have one, but I doubt it's worth most of their time because it is a throw-away variable. I've been developing professionally for over 20 years, and have never seen anything defining the naming of a throw-away.
Personally, I don't worry about this and I'd be more concerned with the use of single-letter variables. Instead of either, I would use unused or void or blackhole for this purpose.
IMO the practice makes code less readable, and less obvious.
Particularly in API methods taking blocks it may not be clear what the block actually expects. This deliberately removes information from the source, making maintenance and modification more difficult.
I'd rather the variables were named appropriately; in a short block it will be obvious it's not being used. In longer blocks, if the non-use is remarkable, a comment may elaborate on the reason.
What are the differences between them?
In the _ case a local variable _ is being created. It's just like using x but named differently.
In the * case the assignment of an expression to * creates [expression]. I'm not quite sure what it's useful for, as it doesn't seem to do anything that just surrounding the expression with brackets doesn't.
When should I use which?
In the second case you don't end up with an extra symbol being created but it looks like slightly more work for the interpreter. Also, it's obvious that you will never use that result, whereas with _ one would have to read the loop to know if it's used.
But I predict that the quality of your code will depend on other things than which trick you use to get rid of unused block parameters. The * does have a certain obscure cool-factor that I kind of like.
Note: when experimenting with this, be aware that in irb, _ holds the value of the last expression evaluated.

Rspec: "array.should == another_array" but without concern for order

I often want to compare arrays and make sure that they contain the same elements, in any order. Is there a concise way to do this in RSpec?
Here are methods that aren't acceptable:
#to_set
For example:
expect(array.to_set).to eq another_array.to_set
or
array.to_set.should == another_array.to_set
This fails when the arrays contain duplicate items.
#sort
For example:
expect(array.sort).to eq another_array.sort
or
array.sort.should == another_array.sort
This fails when the array elements don't implement #<=>
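The two failure modes can be reproduced in plain Ruby, without RSpec. The sketch below also includes same_elements?, a hypothetical helper that compares element counts with Array#tally (Ruby 2.7 or later); it only illustrates the idea of an order-insensitive, duplicate-aware comparison and is not how RSpec implements its matchers.

```ruby
require 'set'

# to_set ignores duplicates, so these different arrays compare as equal.
dup_a = [1, 1, 2]
dup_b = [1, 2, 2]
sets_equal = dup_a.to_set == dup_b.to_set

# sort needs every pair of elements to be comparable with <=>.
mixed = [1, :a]
sort_error = begin
  mixed.sort
  nil
rescue ArgumentError => e
  e.class
end

# Hypothetical helper: same elements with the same multiplicities, any order.
def same_elements?(left, right)
  left.tally == right.tally
end

p sets_equal, sort_error
p same_elements?(dup_a, dup_b), same_elements?(mixed, [:a, 1])
```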
Try array.should =~ another_array
The best documentation on this I can find is the code itself, which is here.
Since RSpec 2.11 you can also use match_array.
array.should match_array(another_array)
Which could be more readable in some cases.
[1, 2, 3].should =~ [2, 3, 1]
# vs
[1, 2, 3].should match_array([2, 3, 1])
I've found =~ to be unpredictable and it has failed for no apparent reason. Past 2.14, you should probably use
expect([1, 2, 3]).to match_array([2, 3, 1])
Use match_array, which takes another array as an argument, or contain_exactly, which takes each element as a separate argument, and is sometimes useful for readability. (docs)
Examples:
expect([1, 2, 3]).to match_array [3, 2, 1]
or
expect([1, 2, 3]).to contain_exactly 3, 2, 1
For RSpec 3 use contain_exactly:
See https://relishapp.com/rspec/rspec-expectations/v/3-2/docs/built-in-matchers/contain-exactly-matcher for details, but here's an extract:
The contain_exactly matcher provides a way to test arrays against each other in a way
that disregards differences in the ordering between the actual and expected array.
For example:
expect([1, 2, 3]).to contain_exactly(2, 3, 1) # pass
expect([:a, :c, :b]).to contain_exactly(:a, :c ) # fail
As others have pointed out, if you want to assert the opposite, that the arrays should match both contents and order, then use eq, i.e.:
expect([1, 2, 3]).to eq([1, 2, 3]) # pass
expect([1, 2, 3]).to eq([2, 3, 1]) # fail
This is not documented very well, but I added links anyway:
Rspec3 docs
expect(actual).to eq(expected)
Rspec2 docs
expect([1, 2, 3]).to match_array([2, 3, 1])
