Is order preserved in arrays when intersection is done? - ruby

When I do an intersection of two arrays, is there any guarantee that the resulting order is based on the order of the first array?
For example, if I have
a = [1,2,3]
b = [3,2,1]
could a & b return [3,2,1], instead of [1,2,3] (which is what I'd expect)?
I can't find anything directly addressing this in either RDoc's or the Pickaxe's documentation of Array.
RubySpec has a specification that it creates an array with elements in order they are first encountered, but should I assume YARV Ruby will honour that specification?

Looks like it is a guaranteed feature. They upgraded the RDoc description in revision 39415. It should be reflected sometime.

YARV has been the official Ruby interpreter since version 1.9, so I assume it does honour that specification. And BTW, RubySpec was created as an executable specification of the language, allowing developers to create their own interpreters that pass this spec, and I believe that YARV was created with RubySpec in mind.

There is no guarantee in the documentation but intersection follows the left-hand set's order.
The source shows that array a sets the order:
...
for (i=0; i<RARRAY_LEN(ary1); i++) {
    vv = (st_data_t)(v = rb_ary_elt(ary1, i));
    if (st_delete(RHASH_TBL(hash), &vv, 0)) {
        rb_ary_push(ary3, v);
    }
}
...
It's simple to test:
a = [1,2,3]
b = [3,2,1]
a.size.times do
puts "a = #{ a.join(',') }\ta & b = #{ (a & b).join(',') }\tb & a = #{ (b & a).join(',') }"
a.rotate!
end
Which outputs:
a = 1,2,3 a & b = 1,2,3 b & a = 3,2,1
a = 2,3,1 a & b = 2,3,1 b & a = 3,2,1
a = 3,1,2 a & b = 3,1,2 b & a = 3,2,1
That's using Ruby 1.9.3-p374
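If you would rather not lean on the behaviour of & at all, the left-hand ordering can be made explicit. The following is only a sketch: ordered_intersection is a hypothetical helper name, not something from Ruby's core library, and it assumes the elements are usable as hash keys.

```ruby
require 'set'

# Hypothetical helper (not part of Ruby core): keeps elements of `left`
# that also occur in `right`, in the order they first appear in `left`.
def ordered_intersection(left, right)
  lookup = right.to_set                    # fast membership checks
  left.select { |x| lookup.include?(x) }.uniq
end

a = [3, 1, 2, 1]
b = [1, 2, 3, 4]

p ordered_intersection(a, b)  # follows a's order, duplicates dropped
p a & b                       # built-in intersection, for comparison
p b & a                       # swapping the operands swaps the order
```

With these inputs the helper and a & b both give [3, 1, 2], while b & a gives [1, 2, 3].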


How to do pre increment in ruby?

I want to pre-increment a variable's value in Ruby, but I can't.
In Java we can do this:
int a=50;
++a;
System.out.println(a);
But how do I do this in Ruby?
If I do this, it gives me an error:
a=50
1-=a
puts a
Use Ruby's Abbreviated Assignment
Ruby doesn't implement a ++ operator as such, either pre or post. However, you don't really need it, either. Since everything in Ruby is an expression, the following is idiomatic Ruby code that does what you likely expect in a more Ruby-centric way:
a = 50
p a+=1
#=> 51
This works because a+=1 increments the value of a, assigns the result back to a, and then returns the result. Under the hood, this is largely equivalent to writing:
a = 50
a = a + 1
Kernel.p(a)
but is shorter and easier to read because the abbreviated assignment is evaluated and passed as an argument to Kernel#p, where it's both sent to standard output and returned as a value.
For the sake of completeness, here are equivalent pieces of code for pre- and post-increment.
Java:
public class PreProIncrement
{
public static void main(String[] args)
{
int a = 50;
System.out.println(++a);
System.out.println(a);
System.out.println(a++);
System.out.println(a);
}
}
Ruby:
a = 50
p(a = a + 1) # or p(a+=1)
p(a)
a = p(a) + 1
p(a)
Both have the same output:
51
51
51
52
In Ruby, p a is used instead of puts a because p a displays and returns a, while puts a displays a but returns nil.
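A small sketch of the difference, in case it helps. The variable names (pre, post, shown, printed) are just for illustration:

```ruby
a = 50

# "Pre-increment": the assignment expression evaluates to the new value.
pre = (a += 1)          # a is 51, pre is 51

# "Post-increment": capture the old value first, then bump the variable.
post = a                # post is 51
a += 1                  # a is 52

shown   = p(a)          # p prints 52 and returns its argument
printed = puts(a)       # puts prints 52 but returns nil

p [pre, post, shown, printed]
```

Because puts returns nil, something like a = puts(a) + 1 would raise a NoMethodError, which is why p is used in the post-increment line above.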

Ruby: evaluate string with dynamic binding of variables

I have a database of "formulas" stored as strings. Let's assume for simplicity that each formula contains 2 variables denoted by a and b, that the formulas are all well-formed, and that each is guaranteed to consist only of characters from the set ()ab+-*.
At runtime, formulas are fetched from this database, and from another source, numeric values for a and b are fetched, and the formulas are evaluated. The evaluation can be programmed like this:
# This is how it works right now
formula = fetch_formula(....)
a = fetch_left_arg(....)
b = fetch_right_arg(....)
result = eval(formula)
This design works, but I'm not entirely happy with it. It requires that my program names the free variables exactly the same as they are named in the formula, which is ugly.
If my "formula" would not be a string, but a Proc object or Lambda which accepts two parameters, I could do something like
# No explicitly named variables
result = fetch_proc(...).call(fetch_left_arg(....),fetch_right_arg(....))
but unfortunately, the formulas have to be strings.
I tried to experiment in the following way: What if the method, which fetches the formula from the database, would wrap the string into something, which behaves like a block, and where I could pass parameters to it?
# This does not work of course, but maybe you get the idea:
block_string = "|a,b| #{fetch_formula(....)}"
Of course I can't eval such a block_string, but is there something similar which I could use? I know that instance_eval can pass parameters, but what object should I apply it to? So this is perhaps not an option either....
This is a very nasty approach, but for simple formulas like the ones you've mentioned it should work:
▶ formula = 'a + b'
▶ vars = formula.scan(/[a-z]+/).uniq.join(',') # getting vars names
#⇒ "a,b"
▶ pr = eval("proc { |#{vars}| #{formula} }") # preparing proc
▶ pr.call 3, 5
#⇒ 8
Here we rely on the fact that the proc's parameters are declared in the order in which the variable names first appear in the formula, so the arguments must be passed in that same order.
If I get your question correctly, it is something that I have done recently, and is fairly easy. Given a string:
s = "{|x, y| x + y}"
You can create a proc by doing:
eval("Proc.new#{s}")
One way to avoid creating the variables in the local scope could be to use a Binding:
bind = binding
formula = fetch_formula(....)
bind.local_variable_set :a, fetch_left_arg(....)
bind.local_variable_set :b, fetch_right_arg(....)
result = bind.eval(formula)
The variables a and b now only exist in the binding, and do not pollute the rest of your code.
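To keep the caller completely free of the formula's variable names, the same idea can be wrapped in a method that takes the values as a hash. This is a sketch only: evaluate_formula is a made-up name, and it assumes the formula text is trusted, since it is still passed to eval.

```ruby
# Hypothetical helper: evaluates `formula` with the given variable values.
# `values` maps variable names (symbols) to numbers.
def evaluate_formula(formula, values)
  scope = binding   # a binding private to this method call
  values.each { |name, value| scope.local_variable_set(name, value) }
  scope.eval(formula)
end

p evaluate_formula("a * b + a", a: 2, b: 5)
p evaluate_formula("(a - b) * (a + b)", a: 7, b: 3)
```

The two calls print 12 and 40. Nothing named a or b ever exists in the calling code.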
You can create a lambda from string, as shown below:
formula = "a + b"
lambda_template = "->(a,b) { %s }"
formula_lambda = eval(lambda_template % formula)
p formula_lambda.call(1,2)
#=> 3
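Since every approach here ends up calling eval, it may be worth checking the string against the character set from the question before compiling it. A minimal sketch, assuming whitespace is also acceptable (the question does not say so) and using a hypothetical helper name, compile_formula:

```ruby
# Hypothetical helper: turns a formula string into a two-argument lambda,
# refusing anything outside the characters ( ) a b + - * and whitespace.
def compile_formula(formula)
  allowed = /\A[()ab+\-*\s]+\z/   # whitespace is an added assumption
  unless formula.match?(allowed)
    raise ArgumentError, "unexpected characters in formula: #{formula.inspect}"
  end
  eval("->(a, b) { #{formula} }")
end

f = compile_formula("(a + b) * a")
p f.call(2, 3)

begin
  compile_formula("system('ls')")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

The first call prints 10; the second string never reaches eval. The check only restricts characters, so a malformed formula such as "a +" would still fail at eval time with a SyntaxError.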

Sort array of hashes based on value of key in hash?

I'm attempting to work with Vagrant to perform some automation in spinning up Docker containers. Vagrantfiles are essentially Ruby and thus I should be able to apply Ruby logic to assist with this issue.
I am reading through a conf.d directory filled with YAML files containing configuration data and then pushing a hash of configuration items into an array. Once done, I iterate through the array with .each and apply the configuration to each entry based on the values of some of the keys inside the hash. One of these keys is "link". The value of link will correlate to the value of another key "name".
I essentially need to ensure that the hash with link => 'name' is in the array prior to the hash with name => 'value'.
Example of input and expected output:
Input
containers = [{"name"=>"foo", "ports"=>["80:80", "443:443"], "links"=>["bar", "baz"]}, {"name"=>"bar", "ports"=>["8888:8888"]}, {"name"=>"baz","ports"=>"80:80"}]
Expected Output
containers = [{"name"=>"bar", "ports"=>["8888:8888"]}, {"name"=>"baz", "ports"=>"80:80"}, {"name"=>"foo", "ports"=>["80:80", "443:443"], "links"=>["bar", "baz"]}]
The end result is that any entry with "link" appears after an entry in the array where the hash's name key matches it. (Basically dependency ordering based on the link key.)
Note it may occur that a linked container links to another linked container.
It's been puzzling me a bit as I have the ideas of what I need to do but lack the technical chops to actually figure out "How?" :)
Thanks in advance for any assistance.
This should work for you:
def order_containers(containers)
  unordered = containers.dup
  ordered = []
  names_from_ordered = {}
  name_is_ordered = names_from_ordered.method(:[])
  until unordered.empty?
    container = unordered.find do |c|
      c.fetch('links', []).all?(&name_is_ordered)
    end
    raise 'container ordering impossible' if !container
    ordered << container
    unordered.delete(container)
    names_from_ordered[container.fetch('name')] = true
  end
  ordered
end
containers = [
{ 'name'=>'foo', 'links'=>['bar'] },
{ 'name'=>'a', 'links'=>['goo'] },
{ 'name'=>'bar' },
{ 'name'=>'goo', 'links'=>['foo'] },
]
containers = order_containers(containers)
require 'pp'
pp containers
# => [{"name"=>"bar"},
# {"name"=>"foo", "links"=>["bar"]},
# {"name"=>"goo", "links"=>["foo"]},
# {"name"=>"a", "links"=>["goo"]}]
The basic idea is that we use a loop, and each iteration of the loop will find one container from the input list that is suitable for adding to the output list. A container is suitable for adding to the output list if all the containers it depends on have already been added to the output list. The container is then removed from the input list and added to the output list.
This loop can terminate in two main ways:
when the input list is empty, which would indicate success, or
when we cannot find a container that we are able to start, which would be an error caused by a circular dependency.
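What the loop above does is a topological sort, and Ruby's standard library ships a TSort module for that. As an alternative sketch (not the method used in this answer), something like the following should give an equivalent ordering. The helper name order_with_tsort is made up, and it assumes every name listed under "links" exists in the array; a missing one raises KeyError.

```ruby
require 'tsort'

# Hypothetical helper: orders containers so that every container appears
# after all containers named in its "links" list.
def order_with_tsort(containers)
  by_name = containers.map { |c| [c.fetch('name'), c] }.to_h

  each_node  = ->(&blk) { by_name.each_key(&blk) }
  each_child = ->(name, &blk) { by_name.fetch(name).fetch('links', []).each(&blk) }

  # TSort.tsort yields dependencies before the nodes that depend on them
  # and raises TSort::Cyclic when there is a circular dependency.
  TSort.tsort(each_node, each_child).map { |name| by_name.fetch(name) }
end

containers = [
  { 'name' => 'foo', 'ports' => ['80:80', '443:443'], 'links' => ['bar', 'baz'] },
  { 'name' => 'bar', 'ports' => ['8888:8888'] },
  { 'name' => 'baz', 'ports' => '80:80' },
]

p order_with_tsort(containers).map { |c| c['name'] }
```

In the printed list, bar and baz come before foo. A circular set of links makes TSort.tsort raise TSort::Cyclic, which plays the same role as the 'container ordering impossible' error.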
Seems to me the simplest thing would be something like:
linkless_configs = []
linked_configs = []
if config_hash.has_key?("links")
linked_configs.push(config_hash)
else
linkless_configs.push(config_hash)
end
Then you can iterate over linkless_configs + linked_configs and be guaranteed that each linked config comes after the corresponding link-less config.
Alternatively, if you must sort, you could
containers.sort_by { |config| config.has_key?("links") ? 1 : 0 }
[Edit: @DavidGrayson has pointed out a flaw with my answer. I'll see if I can find a fix, but if I cannot, and I fear that may be the case, I'll delete the answer. [Edit#2: Oh, my! Someone upvoted my answer after my initial edit. I'm not sure I can delete it now, but to be truthful, I'd already decided not to do so, mainly because my explanation has implications for any proposed solution to the OP's problem. With 10 points in the balance, leaving it up is now even more compelling. 2#tidE]
I believe I understand the problem. sort requires a total order, which is a partial order in which a <= b or b <= a for every pair of elements. The latter is not a problem, but the partial order requirement is. A partial order must satisfy axioms of:
reflexivity (x ≤ x),
antisymmetry (if x ≤ y and y ≤ x then x = y) and
transitivity (if x ≤ y and y ≤ z, then x ≤ z).
My ordering only satisfies the reflexivity axiom. David gives the counter-example:
containers = [h0, h1, h2]
where
h0 = {'name'=>'foo', 'links'=>['bar']},
h1 = {'name'=>'a'},
h2 = {'name'=>'bar'},
containers.sort
#=> [{"name"=>"foo", "links"=>["bar"]},
# {"name"=>"a"}, {"name"=>"bar"}]
My method Hash#<=> establishes:
h0 = h1
h0 > h2
h1 = h2
If sort were to find that h0 = h1 = h2, it would conclude, by transitivity, that h0 = h2 (and not check h0 <=> h2), which may result in an incorrect result.
David also points out that o.follows?(self) should raise an exception, because I have defined follows? as private. As I have not yet encountered an exception, I conclude that the statement has not been executed. I have not traced the reason for that; it's a minor point (though no doubt a useful clue).
I'm grateful to David for identifying the problem. Incorrect answers need to be exposed, of course, but I feel I've learned something useful as well.
tidE]
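The counter-example can be checked directly without patching Hash. This is a sketch using two lambdas (follows and compare are names chosen here) that apply the same rule as the comparison in this answer:

```ruby
# Same rule as the answer's comparison, written as plain lambdas.
follows = ->(x, y) { x.fetch('links', []).include?(y['name']) }

compare = lambda do |x, y|
  if follows.call(x, y)    then 1
  elsif follows.call(y, x) then -1
  else 0
  end
end

h0 = { 'name' => 'foo', 'links' => ['bar'] }
h1 = { 'name' => 'a' }
h2 = { 'name' => 'bar' }

p compare.call(h0, h1)  # 0: treated as "equal"
p compare.call(h1, h2)  # 0: treated as "equal"
p compare.call(h0, h2)  # 1: yet h0 must come after h2
```

The first two comparisons say h0 = h1 and h1 = h2, but the third says h0 > h2, so "equality" under this comparison is not transitive and sort is free to return an ordering that violates a link.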
If I understand the question correctly, and the data provides a valid ordering, I think you could do it as follows.
class Hash
  def <=>(o)
    case
    when follows?(o) then 1
    when o.follows?(self) then -1
    else 0
    end
  end
  private
  def follows?(o)
    key?("links") && self["links"].include?(o["name"])
  end
end
containers = [{"name"=>"foo", "ports"=>["80:80", "443:443"],
"links"=>["bar", "baz"]},
{"name"=>"bar", "ports"=>["8888:8888"]},
{"name"=>"baz","ports"=>"80:80"}]
containers.sort
#=> [{"name"=>"baz", "ports"=>"80:80"},
# {"name"=>"bar", "ports"=>["8888:8888"]},
# {"name"=>"foo", "ports"=>["80:80", "443:443"],
# "links"=>["bar", "baz"]}]
Addendum
Although I prefaced with the assumption that the data provides a valid ordering, @Ajedi32 asks what happens when there is a circular reference. Let's find out:
containers = [{"name"=>"foo", "links"=>["bar"]},
{"name"=>"bar", "links"=>["baz"]},
{"name"=>"baz", "links"=>["foo"]}]
containers.sort
#=> [{ "name"=>"baz", "links"=>["foo"] },
# { "name"=>"bar", "links"=>["baz"] },
# { "name"=>"foo", "links"=>["bar"] }]
containers = [{"name"=>"foo", "links"=>["bar"]},
{"name"=>"bar", "links"=>["foo"]}]
containers.sort
#=> [{ "name"=>"bar", "links"=>["foo"] },
# { "name"=>"foo", "links"=>["bar"] }]
This shows that if one were not certain that there were no circular references, one should check for that before sorting.
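One way to do such a check, sketched with the standard library's TSort module (circular_groups is a hypothetical helper name): it reports every group of two or more containers that link to each other in a loop. A container that links only to itself would not be caught by this particular check.

```ruby
require 'tsort'

# Hypothetical helper: returns the groups of container names that form
# circular references (an empty array means no cycles were found).
def circular_groups(containers)
  links = containers.map { |c| [c.fetch('name'), c.fetch('links', [])] }.to_h

  each_node  = ->(&blk) { links.each_key(&blk) }
  each_child = ->(name, &blk) { links.fetch(name, []).each(&blk) }

  TSort.strongly_connected_components(each_node, each_child)
       .select { |group| group.size > 1 }
end

cyclic = [{ "name" => "foo", "links" => ["bar"] },
          { "name" => "bar", "links" => ["baz"] },
          { "name" => "baz", "links" => ["foo"] }]

acyclic = [{ "name" => "foo", "links" => ["bar"] },
           { "name" => "bar" }]

p circular_groups(cyclic)
p circular_groups(acyclic)
```

The first call returns one group containing foo, bar and baz; the second returns an empty array.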

Pig - how to iterate on a bag of maps

Let me explain the problem. I have this line of code:
u = FOREACH persons GENERATE FLATTEN($0#'experiences') as j;
dump u;
which produces this output:
([id#1,date_begin#12 2012,description#blabla,date_end#04 2013],[id#2,date_begin#02 2011,description#blabla2,date_end#04 2013])
([id#1,date_begin#12 2011,description#blabla3,date_end#04 2012],[id#2,date_begin#02 2010,description#blabla4,date_end#04 2011])
Then, when I do this:
p = foreach u generate j#'id', j#'description';
dump p;
I have this output:
(1,blabla)
(1,blabla3)
But that's not what I wanted. I would like to have an output like this:
(1,blabla)
(2,blabla2)
(1,blabla3)
(2,blabla4)
How could I have this ?
Thank you very much.
I'm assuming that the $0 you are FLATTENing in u is a tuple.
The overall problem is that j is only referencing the first map in the tuple. In order to get the output you want, you'll have to convert each tuple into a bag, then FLATTEN it.
If you know that each tuple will have up to two maps, you can do:
-- My B is your u
B = FOREACH A GENERATE (tuple(map[],map[]))$0#'experiences' AS T ;
B2 = FOREACH B GENERATE FLATTEN(TOBAG(T.$0, T.$1)) AS j ;
C = foreach B2 generate j#'id', j#'description' ;
If you don't know how many fields will be in the tuple, then this will be much harder.
NOTE: This works for pig 0.10.
For tuples with an undefined number of maps, the best answer I can think of is using a UDF to parse the bytearray:
myudf.py
@outputSchema('vals: {(val:map[])}')
def foo(the_input):
    # This converts the indeterminate number of maps into a bag.
    foo = [chr(i) for i in the_input]
    foo = ''.join(foo).strip('()')
    out = []
    for f in foo.split('],['):
        f = f.strip('[]')
        out.append(dict((k, v) for k, v in [ i.split('#') for i in f.split(',')]))
    return out
myscript.pig
register 'myudf.py' using jython as myudf ;
B = FOREACH A GENERATE FLATTEN($0#'experiences') ;
T1 = FOREACH B GENERATE FLATTEN(myudf.foo($0)) AS M ;
T2 = FOREACH T1 GENERATE M#'id', M#'description' ;
However, this relies on the fact that #, ,, or ],[ will not appear in any of the keys or values in the map.
NOTE: This works for pig 0.11.
So it seems that how pig handles the input to the python UDFs changed in this case. Instead of a bytearray being the input to foo, the bytearray is automatically converted to the appropriate type. In that case it makes everything much easier:
myudf.py
@outputSchema('vals: {(val:map[])}')
def foo(the_input):
    # This converts the indeterminate number of maps into a bag.
    out = []
    for map in the_input:
        out.append(map)
    return out
myscript.pig
register 'myudf.py' using jython as myudf ;
-- This time you should pass in the entire tuple.
B = FOREACH A GENERATE $0#'experiences' ;
T1 = FOREACH B GENERATE FLATTEN(myudf.foo($0)) AS M ;
T2 = FOREACH T1 GENERATE M#'id', M#'description' ;

Ruby 1.9 bug? -- Array.permutation

While trying problem 41 from Project Euler, I ran across what seems to be a bug in the Ruby 1.9 implementation of Array#permutation. Here's the problem code isolated:
n = 4
slice = '987654321'.chars.to_a[-n..-1]
puts "slice = #{slice.join}"
slice.permutation(n) {|perm| puts perm.join}
slice2 = slice.dup
puts "slice2 = #{slice2.join}"
slice2.permutation(n) {|perm| puts perm.join}
slice3 = []
(0...n).each {|i| slice3[i] = slice[i]}
puts "slice3 = #{slice3.join}"
slice3.permutation(n) {|perm| puts perm.join}
My output for slice and slice2 is:
slice = 4321
9876
9867
9786
9768
9687
...
However, slice3 comes out right, with the digits 1 to 4 being permuted. Also n = 4 is the first value that has this problem. When I set n = 3, I get the expected output. Is this a bug, or am I mis-coding something? A quick Google search didn't turn up anything.
It is a known bug which is fixed in 1.9.2p136 and newer.
The easiest way around it, besides updating to a more recent Ruby, is to ensure your array is not "shared", either by building a new one (like your slice3), or simply "modifying" it, e.g. slice += [].
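Applied to the code from the question, the workaround might look like this (a sketch; on a Ruby without the bug the extra line is harmless and simply unnecessary):

```ruby
n = 4
slice = '987654321'.chars.to_a[-n..-1]
slice += []   # builds a fresh array, so it no longer shares storage

perms = slice.permutation(n).map(&:join)
puts "slice = #{slice.join}"
puts perms.first(5)
puts "#{perms.size} permutations"
```

Every permutation should now be made only of the digits 4, 3, 2 and 1, and there should be 24 of them.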
