Prevent duplicate objects in Ruby - ruby

This isn't exactly a singleton, but it's close, so I imagine it's common. I have a class (Foo) in which instances correspond to external data structures with unique IDs. I want to ensure that no two instances of Foo can have the same ID - if a constructor is called with an ID that is already in use, the original Foo instance with that ID should be returned, with all its other values simply updated. In other words, something like:
class Foo
  def initialize(id, var1, var2)
    if Foo.already_has? id
      t = Foo.get_object(id)
      t.var1 = var1
      t.var2 = var2
      return t
    else
      @var1 = var1
      @var2 = var2
    end
  end
end
I can think of two ways to do this:
1. I could keep an array of all instances of Foo as a class-level variable, and call foo_instances.push(self) at the end of the initialize method. This strikes me as kind of ugly.
2. I believe Ruby already keeps track of instances of each class in some array - if so, is this accessible, and would it be any better than #1?
3. ??? (Ruby seems to support some slick [meta-]programming tricks, so I wouldn't be surprised if there is already a tidy way of doing this that I'm missing.)

You can override Foo.new in your object, and do whatever you want in there:
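class Foo
  def self.new(id, var1, var2)
    # Look up by id first; assign before testing so `instance` is a defined local
    instance = lookup(id)
    return instance if instance
    instance = allocate
    # initialize is private, hence send; id is passed along so store can key on it
    instance.send(:initialize, id, var1, var2)
    store(instance)
  end
end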
You can also, obviously, use a different class method to obtain the object; make the initialize method private to help discourage accidental allocation.
That way you only have one instance with every ID, which is generally much less painful than the pattern you propose.
You still need to implement the store and lookup bits, and there isn't anything better than a Hash or Array in your class to do that with.
You may want to consider wrapping the things you store in your class in a WeakRef instance, while still returning the real object. That way the class can enforce uniqueness without also requiring that every single ID ever used remain in memory all the time.
That isn't appropriate to every version of your circumstances, but certainly to some. An example:
require 'weakref'
# previous code omitted ...
    return store(instance)
  end
  def self.store(instance)
    @instances ||= {}
    @instances[instance.id] = WeakRef.new(instance)
    instance
  end
  def self.lookup(id)
    @instances ||= {}
    if (weakref = @instances[id]) && weakref.weakref_alive?
      return weakref.__getobj__ # the wrapped instance
    else
      return nil
    end
  end
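Putting the pieces above together, a minimal self-contained sketch might look like the following. The accessor names, the registry helper, and the choice to refresh var1/var2 on a repeated ID (which is what the question asked for) are assumptions for illustration, not part of the original answer.

```ruby
require 'weakref'

# Hypothetical Foo: at most one live instance per id.
class Foo
  attr_reader :id
  attr_accessor :var1, :var2

  def self.new(id, var1, var2)
    existing = lookup(id)
    if existing
      # Reuse the live instance and refresh its values.
      existing.var1 = var1
      existing.var2 = var2
      return existing
    end
    # Bare super forwards id, var1, var2 to Class#new (allocate + initialize).
    store(super)
  end

  def self.store(instance)
    registry[instance.id] = WeakRef.new(instance)
    instance
  end

  def self.lookup(id)
    ref = registry[id]
    return nil unless ref && ref.weakref_alive?
    ref.__getobj__
  rescue WeakRef::RefError
    # The object was collected between the liveness check and the dereference.
    nil
  end

  def self.registry
    @registry ||= {}
  end
  private_class_method :store, :lookup, :registry

  def initialize(id, var1, var2)
    @id = id
    @var1 = var1
    @var2 = var2
  end
end

a = Foo.new(1, 'x', 'y')
b = Foo.new(1, 'p', 'q')
puts a.equal?(b)
puts a.var1
```

Because the registry only holds weak references, an instance that nothing else refers to can be garbage collected, and a later Foo.new with the same ID will simply build a fresh one.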

Related

How to create "@property" and "@property=" that are actually methods

Is there a way to make @property into a method with set and get, so that @property would call a method instead of returning an actual property, and @property = someval would also call a method instead of assigning to an actual property?
In my project, objects store values in a database. Consider this simple database module that stores records in memory. In my real-life project it's a DBMS like PostgreSQL:
module MyDB
  RECORDS = {}
  def self.create(pk)
    RECORDS[pk] ||= {}
  end
  def self.set(pk, key, val)
    return RECORDS[pk][key] = val
  end
  def self.get(pk, key)
    return RECORDS[pk][key]
  end
end
Objects have fields that are stored in that database. So, in this class, the species field is stored in and retrieved from the database:
class Pet
  def initialize(pk)
    @pk = pk
    MyDB.create(@pk)
  end
  def species=(val)
    MyDB.set @pk, 'breed', val
  end
  def species()
    return MyDB.get(@pk, 'breed')
  end
end
A simple use of the Pet class could look like this:
motley = Pet.new('motley')
motley.species = 'cat'
It works currently, but here's where I ran into an annoyance. I did something like this within the class:
def some_method(newval)
  @species = newval
end
Then, when I ran the code I got this result:
motley.some_method 'whatever'
puts motley.species #=> cat
Then I realized that wasn't correct and what I should have done is:
def some_method(newval)
  self.species = newval
end
I think @species = newval makes sense. It feels like I'm setting a property of the object.
Is there a way to assign a method to the property, something like:
def :@species=(val)
  return MyDB.set(@pk, 'breed', val)
end
def :@species
  return MyDB.get(@pk, 'breed')
end
Is there a way to do such a thing? Should there be?
Is there a way to do such a thing?
No. In Ruby setter and getter methods are the way to get/set the internal state of an object. Instance variables are just lexical variables that are scoped to an instance.
Ruby is a language based on message passing, but @foo = bar is not a message send: it assigns directly to the instance variable @foo of the current self, without calling any method. If it called self.foo= instead, that would break the entire model of the language.
Should there be?
Hell no.
Do we really need a completely new language feature just because you find it hard to remember to call self.foo= instead of @foo =? No.
Would this feature add anything to the language that cannot already be done? No.
Would it break existing code? Yes.
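To make the difference concrete, here is a small sketch based on the MyDB and Pet code from the question. The method names assign_ivar and assign_via_setter are made up for the illustration.

```ruby
# In-memory stand-in for the database, following the question's MyDB module.
module MyDB
  RECORDS = {}

  def self.create(pk)
    RECORDS[pk] ||= {}
  end

  def self.set(pk, key, val)
    RECORDS[pk][key] = val
  end

  def self.get(pk, key)
    RECORDS[pk][key]
  end
end

class Pet
  def initialize(pk)
    @pk = pk
    MyDB.create(@pk)
  end

  def species=(val)
    MyDB.set(@pk, 'breed', val)
  end

  def species
    MyDB.get(@pk, 'breed')
  end

  # Plain instance variable assignment: the setter above is never called.
  def assign_ivar(newval)
    @species = newval
  end

  # Sends species= to self, so the database record is updated.
  def assign_via_setter(newval)
    self.species = newval
  end
end

motley = Pet.new('motley')
motley.species = 'cat'
motley.assign_ivar('dog')
puts motley.species        # the database still holds the old value
motley.assign_via_setter('dog')
puts motley.species        # the database now holds the new value
```

The explicit self. receiver is needed because a bare species = newval inside a method would just create a local variable.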

Copy object without any pointers in Ruby

I am trying to write a greedy algorithm for a certain problem. Simplified, it looks like this:
There's an object called Foo with a randomized attribute called value and a method, change_value, that changes this value in a way that depends on an integer input:
class Foo
  def initialize
    value = rand(1..10)
  end
  def change_value(input)
    #changes the value in a certain way
  end
end
Now the greedy algorithm just gets the new value of Foo for all possible inputs and returns the best input.
foo = Foo.new
best_value = 0
best_input = 0
(1..inputs).each do |k|
  temp_foo = foo.clone
  temp_foo.change_value(k)
  if temp_foo.value > best_value
    best_value = temp_foo.value
    best_input = k
  end
end
foo.change_value(best_input)
The code works nearly as intended. The big problem is that the change_value method within the each block alters both temp_foo and foo. What do I need to change to make those objects completely independent of each other? I also tried .dup, by the way.
I think #clone or #dup won't work because they will share a reference to @value inside Foo.
In any case, you can do it more readably by changing Foo#change_value so it doesn't actually mutate the object but returns a copy:
class Foo
  def initialize(value = nil)
    @value = value || rand(10)
  end
  def change_value(input)
    # returns a new Foo instance
    Foo.new(@value + 1)
  end
  def value
    @value
  end
end
Because you're copying data in any case, using an immutable object (Value Object) is more general than some kind of deep clone.
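With a non-mutating change_value, the greedy search no longer needs clone or dup at all. A rough sketch, assuming a made-up rule where the new value is simply the old value plus the input, and a fixed input range of 1..3:

```ruby
class Foo
  attr_reader :value

  def initialize(value = nil)
    @value = value || rand(10)
  end

  # Hypothetical rule for illustration: returns a new Foo, leaves self untouched.
  def change_value(input)
    Foo.new(@value + input)
  end
end

foo = Foo.new(5)

# Try every candidate input against the same, unchanged foo.
best_input = (1..3).max_by { |k| foo.change_value(k).value }

# Only now "commit" the best choice by rebinding the variable.
foo_after = foo.change_value(best_input)

puts best_input
puts foo.value
puts foo_after.value
```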
I assume you assign value to the instance variable @value in Foo#initialize, not the local variable value.
I also assume you don't have a simple primitive like in your code above but rather another object that contains a pointer, otherwise you most probably would not have such a problem. In other words, I assume your change_value method performs an operation that relies on the @value pointer, such as @value[key] = some_new_value, and not pure assignment, such as @value = some_new_object. When your object gets copied with clone or dup, that particular pointer is copied instead of the underlying structure, and therefore any calls to temp_foo.change_value will result in changes to foo's underlying @value.
To avoid this, you need to duplicate the object @value refers to. There is a trick you can use with Marshal, as discussed in this post, but I recommend against it since it causes a great deal of overhead. Instead, I would define a deep_dup method, such as below:
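class Foo
  def deep_dup
    # Start from a shallow copy, then replace the shared reference in the copy
    copy = dup
    # Either this
    copy.instance_variable_set(:@value, @value.dup)
    # OR this, and define the method #deep_dup in the class of @value
    # to dup its internal structure too:
    # copy.instance_variable_set(:@value, @value.deep_dup)
    copy
  end
end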
Then instead of doing temp_foo = foo.clone do temp_foo = foo.deep_dup.
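Here is a small sketch of the difference between a shallow copy and deep_dup. It assumes, purely for illustration, that @value is a flat Hash with a :score key and that change_value mutates it in place.

```ruby
class Foo
  attr_reader :value

  def initialize
    @value = { score: 0 }
  end

  # Mutates the hash in place, so every object sharing the hash sees the change.
  def change_value(input)
    @value[:score] += input
  end

  def deep_dup
    copy = dup
    # Give the copy its own hash instead of a second reference to ours.
    copy.instance_variable_set(:@value, @value.dup)
    copy
  end
end

foo = Foo.new

shallow = foo.dup
shallow.change_value(5)      # foo changes too: both share one hash

deep = foo.deep_dup
deep.change_value(10)        # foo is unaffected

puts foo.value[:score]
puts deep.value[:score]
```

Hash#dup is itself only one level deep, so a nested structure would need the recursive variant, or Marshal.load(Marshal.dump(obj)) if the overhead is acceptable.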

Ruby semantics for accepting an object or its id as an argument

I'm trying to work on the principle of least surprise here...
Let's say you've got a method that accepts two objects. The method needs these to be object instances, but in the place where you initialize the class you may only have reference IDs. This would be common in a router / controller in a web service, for example. The setup might look something like this:
post "/:foo_id/add_bar/:bar_id" do
  AddFooToBar.call(...)
end
There are many different ways that this could be solved. To me the most 'idiomatic' here is something like this:
def AddFooToBar.call(foo: nil, foo_id: nil, bar: nil, bar_id: nil)
  @foo = foo || Foo[foo_id]
  @bar = bar || Bar[bar_id]
  ...
end
Then when you call the method, you could call it like:
AddFooToBar.call(foo: a_foo, bar: a_bar)
AddFooToBar.call(foo_id: 1, bar_id: 2)
This creates a pretty clear interface, but the implementation is a little verbose, particularly if there are more than 2 objects and their names are longer than foo and bar.
You could use a good old fashioned hash instead...
def AddFooToBar.call(input={})
  @foo = input[:foo] || Foo[ input[:foo_id] ]
  @bar = input[:bar] || Bar[ input[:bar_id] ]
end
The method signature is super simple now, but it loses a lot of clarity compared to what you get using keyword arguments.
You could just use a single key instead, especially if both inputs are required:
def AddFooToBar.call(foo:, bar:)
  @foo = foo.is_a?(Foo) ? foo : Foo[foo]
  @bar = bar.is_a?(Bar) ? bar : Bar[bar]
end
The method signature is simple, though it's a little weird to pass just an ID using the same argument name you'd pass an object instance to. The lookup in the method definition is also a little uglier and less easy to read.
You could just decide not to internalize this at all and require the caller to initialize instances before passing them in.
post "/:foo_id/add_bar/:bar_id" do
  foo = Foo[ params[:foo_id] ]
  bar = Bar[ params[:bar_id] ]
  AddFooToBar.call(foo: foo, bar: bar)
end
This is quite clear, but it means that every place that calls the method needs to know how to initialize the required objects first, rather than having the option to encapsulate that behavior in the method that needs the objects.
Lastly, you could do the inverse, and only allow object ids to be passed in, ensuring the objects will be looked up in the method. This may cause double lookups though, in case you sometimes have instances already existing that you want to pass in. It's also harder to test since you can't just inject a mock.
I feel like this is a pretty common issue in Ruby, particularly when building web services, but I haven't been able to find much writing about it. So my questions are:
Which of the above approaches (or something else) would you expect as more conventional Ruby? (POLS)
Are there any other gotchas or concerns around one of the approaches above that I didn't list which should influence which one works best, or experiences you've had that led you to choose one option over the others?
Thanks!
I would go with allowing either the objects or the ids interchangeably. However, I would not do it the way you did:
def AddFooToBar.call(foo:, bar:)
  @foo = foo.is_a?(Foo) ? foo : Foo[foo]
  @bar = bar.is_a?(Bar) ? bar : Bar[foo]
end
In fact, I do not understand why you have Bar[foo] and not Bar[bar]. But besides this, I would put the conditions built-in within the [] method:
def Foo.[](arg)
  case arg
  when Foo then arg
  else ...what_you_originally_had...
  end
end
Then, I would define the method in question like this:
def AddFooToBar.call(foo:, bar:)
  @foo, @bar = Foo[foo], Bar[bar]
end
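As a runnable sketch of that idea: the Lookup module and the in-memory store below are stand-ins invented for the example (the question does not say what Foo[] is backed by), and call simply returns the resolved pair so the result is easy to inspect.

```ruby
# Hypothetical mixin: gives a class a [] that accepts an instance or an id.
module Lookup
  def store
    @store ||= {}
  end

  def [](arg)
    # self is the extended class, so this passes instances straight through.
    arg.is_a?(self) ? arg : store.fetch(arg)
  end
end

class Foo
  extend Lookup
  attr_reader :id

  def initialize(id)
    @id = id
    self.class.store[id] = self
  end
end

class Bar
  extend Lookup
  attr_reader :id

  def initialize(id)
    @id = id
    self.class.store[id] = self
  end
end

class AddFooToBar
  def self.call(foo:, bar:)
    # Callers may pass objects, ids, or a mix of both.
    [Foo[foo], Bar[bar]]
  end
end

a_foo = Foo.new(1)
a_bar = Bar.new(2)
p AddFooToBar.call(foo: a_foo, bar: 2) == [a_foo, a_bar]
p AddFooToBar.call(foo: 1, bar: a_bar) == [a_foo, a_bar]
```

Using fetch means an unknown id raises KeyError instead of silently yielding nil, which tends to surface routing mistakes earlier.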

How can I change the return value of a class constructor in Ruby?

I have a class, Foo. I want to be able to pass the constructor a Foo instance, foo and get the same instance back out.
In other words, I want this test to pass:
class Foo; end
foo = Foo.new
bar = Foo.new(foo)
assert_equal foo, bar
Anyone know how I can do that? I tried this:
class Foo
  def initialize(arg = nil)
    return arg if arg
  end
end
foo = Foo.new
bar = Foo.new(foo)
assert_equal foo, bar # => fails
but it doesn't work.
Help?
EDIT
Because a number of people have asked for my rationale:
I'm doing rapid analysis of lots of data (many TB) and I am going to have a lot of instances of a lot of objects. For some of these objects, it doesn't make sense to have two different instances with the same data. For example, one such object is a "window" (as in temporal window) object that has two properties: start time and end time. I want to be able to use the constructor in any of these ways and get a window object back:
window = Window.new(time_a, time_b)
window = Window.new([time_a, time_b])
window = Window.new(seconds_since_epoch_a, seconds_since_epoch_b)
window = Window.new(window_obj)
window = Window.new(:end => time_b, :start => time_a)
...
Some other object that needs a window might be instantiated this way:
obj = SomeObj.new(:data => my_data, :window => window_arg)
I don't necessarily know what's in window_arg, and I don't really care -- it will accept any single argument that can be interpreted by the Window constructor. In the case of already having a Window instance, I'd rather just use that instance. But the job of interpreting that seems like a concern of the Window constructor. Anyway, as I mentioned I'm churning through many TB of data and creating lots of instances of things. If a window object gets passed around, I want it just to be recognized as a window object and used.
By definition, constructors are meant to return a newly created object of the class they are a member of, so, no you should not override this behavior.
Besides, in Ruby, new calls initialize somewhere within its method body and ignores initialize's return value, so whatever you return from initialize will not be returned from new.
With that said, I think that in your case, you might want to create a factory method that will return different Foo objects based on arguments passed to the factory method:
class Foo
  def self.factory(arg = nil)
    return arg if arg.kind_of? Foo
    Foo.new
  end
end
foo = Foo.factory
bar = Foo.factory(foo)
assert_equal foo, bar #passes
def Foo.new(arg=nil)
  arg || super
end
initialize is called by new which ignores its return value. Basically the default new method looks like this (except that it's implemented in C, not in ruby):
class Class
  def new(*args, &blk)
    o = allocate
    o.send(:initialize, *args, &blk)
    o
  end
end
So the newly allocated object is returned either way, no matter what you do in initialize. The only way to change that is overriding the new method, for example like this:
class Foo
  def self.new(arg=nil)
    if arg
      return arg
    else
      super
    end
  end
end
However I'd strongly advise against this since it runs counter to many expectations that people have when calling new:
People expect new to return a new object. I mean it's even called new. If you want a method that does not always create a new object, you should probably call it something else.
At the very least people expect Foo.new to return a Foo object. Your code will return whatever the argument is. I.e. Foo.new(42) would return 42, an Integer, not a Foo object. So if you're going to do this, you should at the very least only return the given object, if it is a Foo object.
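If you do go down this road anyway, a sketch that at least respects the second point (only hand back the argument when it already is an instance of the class) could look like this. The Window class and its two-argument initialize are assumptions loosely based on the question's description.

```ruby
class Window
  attr_reader :start_time, :end_time

  def self.new(*args)
    # Return an existing Window as-is; everything else takes the normal path.
    return args.first if args.length == 1 && args.first.is_a?(Window)
    super
  end

  def initialize(start_time, end_time)
    @start_time = start_time
    @end_time = end_time
  end
end

window = Window.new(0, 60)
same = Window.new(window)
puts same.equal?(window)
```

With this version, Window.new(42) no longer returns 42. It falls through to initialize and fails with an ArgumentError, which is arguably the less surprising outcome.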
Does not work for:
class Some
  def self.new( str )
    SomeMore.new( str )
  end
end
# the Some is parent of SomeMore
class SomeMore < Some
  def initialize( str )
    @str = str
  end
end
For this particular use case, it might be better to use one of these approaches.
class Foo
  def self.new(args=nil)
    @@obj ||= super(args)
  end
end
class Foo
  def self.new(*args)
    # Without arguments, return the stored object; with arguments, build and store one
    return @@obj if args.empty?
    @@obj = super(*args)
  end
end
This allows you to have only a single object that gets created and can be used universally, while still returning an object of the Foo class, which brings it more in line with standard expectations of a new method, as Jacob pointed out.

How to access instance variables from one class while inside another class

I'm really new to Ruby. And by new - less than 16 hours, but my boss gave me some Ruby code to add to. However, I found it was one giant file and not modular at all, so I decided to clean it up. Now that I've broken it up into several files/classes (generally speaking, 1 class per file,) I'm having problems piecing it together for it to work again. Originally everything was part of the same class, so the calls worked, but it looked ugly and it took an entire work day just to figure it out. I want to avoid that for the future as this code will grow much larger before it is done.
My main issue looks like the following (simplified, obviously):
class TestDevice
  def initialize
    @loghash = { }
    ....
  end
end
class Log
  def self.msg(identifier, level, section, message)
    ...
    @loghash[identifier] = { level => { section => message }}
    ...
  end
end
device = TestDevice.new
After that, it calls out to other class methods, and those class methods reference back to the class Log for their logging needs. Of course, Log needs to access "device.loghash" somehow to log the information in that hash. But I can't figure out how to make that happen outside of passing the contents of "loghash" to every method, so that they, in turn, can pass it, and then return the value back to the origination point and then logging it at the end, but that seems really clumsy and awkward.
I'm hoping I am really just missing something.
To create accessors for instance variables the simple way, use attr_accessor.
class TestDevice
  attr_accessor :loghash
  def initialize
    @loghash = { }
    ....
  end
end
You can also manually define an accessor.
class TestDevice
  def loghash
    @loghash
  end
  def loghash=(val)
    @loghash = val
  end
end
This is effectively what attr_accessor does behind the scenes.
How about passing the device object as a parameter to the msg method? (I'm assuming that there can be many devices in your program; otherwise you can use the singleton pattern.)
class TestDevice
  attr_accessor :loghash
  def initialize
    @loghash = { }
    ....
  end
end
class Log
  def self.msg(device, identifier, level, section, message)
    ...
    device.loghash[identifier] = { level => { section => message }}
    ...
  end
end
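Stripped of the elided parts, that approach can be tried out with a sketch like this. The identifier, level, and section values are invented for the example.

```ruby
class TestDevice
  # A reader is enough here: Log mutates the hash but never replaces it.
  attr_reader :loghash

  def initialize
    @loghash = {}
  end
end

class Log
  def self.msg(device, identifier, level, section, message)
    # Goes through the public reader instead of touching @loghash directly.
    device.loghash[identifier] = { level => { section => message } }
  end
end

device = TestDevice.new
Log.msg(device, :boot, :info, :network, 'link up')
p device.loghash
```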
So you need to learn the rules of Ruby scoping.
Ruby variables have different scope, depending on their prefix:
$global_variables start with a $, and are available to everyone.
@instance_variables start with a single @, and are stored with the current value of self. If two scopes share the same value of self (they're both instance methods, for example), then both share the same instance variables.
@@class_variables start with @@, and are stored with the class. They're shared between all instances of a class - and all instances of subclasses of that class.
Constants start with a capital letter, and may be all caps. Like class variables, they're stored with the current self.class, but they also trickle up the hierarchy - so if you have a class defined in a module, the instances of the class can access the module's constants as well. Constants defined outside of a class have global scope. Note that a constant variable means that which object is bound to the constant won't change, not that the object itself won't change internal state.
local_variables start with a lowercase letter or an underscore.
You can read more about scope here.
Local variable scoping rules are mainly standard - they're available in all subscopes of the one in which they are defined, except when we move into a module, class, or method definition. So if we look at the code from your answer:
class TestDevice
  attr_accessor :loghash
  def initialize
    @loghash = { }
  end
end
device = TestDevice.new
class Somethingelse
  def self.something
    device.loghash='something here' # doesn't work
  end
end
The scope of the device local variable defined at the top level does not include the Somethingelse.something method definition, so device is not visible there: Ruby looks for a local variable or method named device inside the method, finds neither, and raises a NameError. If you want the scoping to work that way, you should use a constant or a global variable.
class TestDevice
  attr_accessor :loghash
  def initialize
    @loghash = { }
  end
end
DEVICE = TestDevice.new
$has_logged = false
class Somethingelse
  def self.something
    DEVICE.loghash='something here'
    $has_logged = true
  end
end
p DEVICE.loghash # prints `{}`
p $has_logged # prints `false`
Somethingelse.something
p DEVICE.loghash # prints `"something here"`
p $has_logged # prints `true`
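For contrast, the failing case with a top-level local variable can be checked with a sketch along these lines. ScopeDemo and its method names are invented for the example.

```ruby
device = :top_level_local   # a local variable in the top-level scope

class ScopeDemo
  # def starts a fresh local scope, so the top-level `device` is invisible here.
  def self.read_local
    device
  end

  # Rescue the failure and report the exception class that was raised.
  def self.probe
    read_local
    :no_error
  rescue NameError => e
    e.class
  end
end

p ScopeDemo.probe
```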

Resources