How do I make a persistent hash in Ruby? - ruby

I would like a persistent hash: an object that acts as a hash, but that can persist between program runs.
Ideally, it would only load into memory the values that are accessed.

Persistent key/value storage is something nearly everyone needs, so there are a large number of solutions.
YAML is probably the easiest way to persist Ruby objects.
JSON works as well but doesn't directly handle symbols.
MySQL and other SQL databases such as sqlite3 also solve this problem, of course. Usually, access is encapsulated within the ActiveRecord ORM library.
Ruby core also includes the Marshal module for serializing objects.
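As a minimal sketch of the YAML option, the wrapper below saves a hash to a file on every write and reloads it on startup. The class name YamlBackedHash and the file name store.yml are made up for illustration, and the example sticks to string keys and simple values. Note that it loads the whole file into memory, so it does not meet the "only load the values that are accessed" wish.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper (not part of any library): a small Hash-like
# wrapper that persists its contents to a YAML file.
class YamlBackedHash
  def initialize(path)
    @path = path
    # A missing or empty file yields an empty hash.
    @data = (File.exist?(path) && YAML.load_file(path)) || {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
    # Rewrite the whole file on every assignment; simple, not fast.
    File.write(@path, YAML.dump(@data))
  end
end

# Usage: write with one instance, read back with a fresh one.
Dir.mktmpdir do |dir|
  path = File.join(dir, "store.yml")
  store = YamlBackedHash.new(path)
  store["visits"] = 3

  reloaded = YamlBackedHash.new(path)
  puts reloaded["visits"]
end
```

Rewriting the file on each assignment keeps the sketch short; a real implementation would probably batch writes or use a store that supports partial reads.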

Using sdbm
require 'sdbm'
SDBM.open("/mypath/myfile.dbm") do |myMap|
[...]
myMap[key] = avalue
[...]
myvar = myMap[anotherKey]
[...]
end
This creates two files: myfile.dbm.dir and myfile.dbm.pag

I would consider using redis-rb, which has a hash datatype. This would persist your hash not only across program runs, but across multiple machines. It's very fast, keeps its data in memory, and you can have it up and running in under five minutes.
In IRB (assuming you've installed and are running redis-server and have installed redis-rb):
require "redis"
redis = Redis.new
The important operations are:
redis.hset(key, field, value)
and
redis.hget(key,field)

Related

Chef Ruby object and collections

I've dipped my toe in Chef and am having a few difficulties with what should be simple concepts.
I'm obtaining data from a node by running a search; my plan is to iterate over the results and create an object of type X, setting its variables as I go.
I'd like to store these objects in a collection so that I can access them later in the recipe to carry out other tasks and so on.
My GoogleFu has so far come up short and I'm worried that I'm tackling this in the wrong way. My search is fine and returning the values, and my separate class is also fine, but storing these objects in a collection and then persisting it is proving more difficult. Many posts frown on using arrays for my purpose (if it's possible) and I've not found anything similar to an ArrayList or Map. Additionally, if I use a Ruby collection, does it need to be maintained inside a ruby block?
Thanks for any help / advice.
With Chef, you have several ways of storing persistent data:
1) set node attributes
2) chef data bags
3) chef vault
4) environments
5) environment recipes
6) roles
IMHO, you should decide where this data should reside by working out which of the items I listed it belongs to.
What does it apply to? What does it describe?
You need to be more specific about where you are actually facing the issue, but as far as I understand it, why not just define a Ruby class and initialize all the variables you expect to get? In the recipe, instantiate the object and set its properties from the search results. There should be no issue with this approach.
More importantly, what is your use case here? You could just define an array attribute and keep syncing the results into that attribute. Because Ruby passes objects by reference, any change you make through a resource attribute that holds that object persists in the object itself.
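The "define a class and collect the instances" suggestion can be sketched in plain Ruby. The results array below is made-up data standing in for what a Chef search might return, and the Server struct and its fields are hypothetical names. Ruby's Array and Hash play the roles of ArrayList and Map.

```ruby
# Made-up data standing in for Chef search results (an assumption,
# not the real shape of a node object).
results = [
  { "name" => "web1", "ipaddress" => "10.0.0.1" },
  { "name" => "web2", "ipaddress" => "10.0.0.2" }
]

# Hypothetical value class for one result.
Server = Struct.new(:name, :ip)

# Array: an ordered collection, comparable to an ArrayList.
servers = results.map { |r| Server.new(r["name"], r["ipaddress"]) }

# Hash: keyed lookup, comparable to a Map.
servers_by_name = servers.each_with_object({}) do |server, index|
  index[server.name] = server
end

puts servers_by_name["web2"].ip
```

Both collections hold references to the same Server objects, so a change made through one is visible through the other.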

Ruby - Sequel Model to access multiple databases

I'm trying to use the Ruby Sequel::Model ORM functionality for a web service, in which every user's data is stored in a separate MySQL database. There may be thousands of users and thus databases.
On every web request I want to construct the connection string to connect to the user's data, do the work, and then close the connection.
When using Sequel, I can specify the database to use for a particular block of code:
Sequel.connect(:adapter=>'mysql', :host=>'localhost', :database=>'test1') do |db|
db.do_something()
end
This is all very good, I can perform Sequel operations on the particular user's database. However, when using Sequel::Model, when I come to do my db operations it looks like this:
Supplier.create(:field1 => 'TEST')
I.e. it doesn't take db as a parameter, so just uses some shared database configuration.
I can configure the database Model uses in two ways, either set the global DB variable:
DB = Sequel.connect(:adapter=>'mysql', :host=>'localhost', :database=>'test1')
class Supplier < Sequel::Model
end
Or, I can set the database just for Model:
Sequel::Model.db = Sequel.connect(:adapter=>'mysql', :host=>'localhost', :database=>'test1')
class Supplier < Sequel::Model
end
In either case, setting a shared variable like this is no good - there may be multiple requests processed concurrently, each of which needs its own database configuration.
Is there any way around this? Is there a way of specifying per-request db configuration using Sequel::Model?
As an aside, I've run into a similar problem with DataMapper, I'm now wondering whether having a single multi-tenanted database is going to be the only option if using Ruby, although I'd prefer to avoid this as it limits scalability.
A solution, or any pertinent discussion would be much appreciated.
Thanks
Pete
Use Sequel's sharding support for this: http://sequel.jeremyevans.net/rdoc/files/doc/sharding_rdoc.html
Actually, in your case it's probably better to use the arbitrary_servers extension than sharding:
DB.with_server(:host=>'hash_host_b', :database=>'backup') do
DB.synchronize do
# All queries here default to the backup database on hash_host_b
end
end
See:
http://sequel.jeremyevans.net/rdoc/files/doc/sharding_rdoc.html#label-arbitrary_servers+Extension

Store a class instance in session server side w/ Padrino?

I have a class that reads from a DB on startup. I'd prefer to be able to store it in the session, but I get the following error when trying to do so:
ERROR TypeError: no marshal_dump is defined for class Mutex
Is what I'm doing possible/reasonable? If so, how should I go about doing it? If not, what's a good alternative to storing the class instance in the session? Currently my workaround is just instantiating it whenever I need to use it, but that doesn't strike me as a good solution or one that will scale.
A good alternative is to store the id of the record in the session. Then, when you need that data again, you'd use a helper to return the data either from memory or from the database. A perfect example is the pattern used in the current_user helper methods found in many Ruby authentication gems. You could modify this helper to use a cache layer if you find it to be a bottleneck, but I'd leave that as an optimization after the fact.
Besides the problem of getting the object into a marshaled format that will live happily in a session, storing the instance itself raises issues with storage space, stale data and possibly unintentional exposure of confidential data.
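The id-in-session pattern can be sketched without any framework. Everything named here is hypothetical: RECORDS stands in for the database, a plain Hash stands in for the session, and RequestContext stands in for wherever your helpers live in Padrino.

```ruby
# Hypothetical stand-ins: RECORDS plays the database, and a plain
# Hash plays the session. Neither is Padrino API.
Record = Struct.new(:id, :name)
RECORDS = { 1 => Record.new(1, "alice") }

class RequestContext
  attr_reader :lookups

  def initialize(session)
    @session = session
    @lookups = 0
  end

  # Same shape as a typical current_user helper: only the id lives
  # in the session, and the object is memoized for this request.
  def current_record
    @current_record ||= find_record(@session[:record_id])
  end

  private

  def find_record(id)
    return nil if id.nil?
    @lookups += 1 # counts simulated database hits
    RECORDS[id]
  end
end

context = RequestContext.new({ record_id: 1 })
context.current_record
context.current_record
puts context.lookups
```

The session only ever holds an integer, so nothing that contains a Mutex needs to be marshaled.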

Does Dalli only cache strings? (newbie to memcache)

If that's the case, then is it best to store stuff as JSON?
I looked in the documentation, but it's not explicitly acknowledged.
Dalli uses Marshal.dump to serialize values, so you can store anything that can be dumped (procs, for example, can't be dumped on most Ruby implementations).
Personally I prefer only to store arrays, hashes, strings, numbers and combinations thereof.
Storing arbitrary objects can be inefficient (for example an activerecord object has several copies of its attributes in its instance variables).
Another potential problem is if you store an instance of a class and you later rename that class - you'll no longer be able to retrieve that value from the cache because the cached data still has the old class name in it.
Memcache can cache anything that is serializable, and so can Dalli.
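To check what Marshal can and cannot handle without involving a cache at all, a small sketch like the following may help. The helper name dumpable? is made up for illustration. It only exercises Marshal itself, not Dalli.

```ruby
# Hypothetical helper: reports whether Marshal can serialize a value.
def dumpable?(value)
  Marshal.dump(value)
  true
rescue TypeError
  false
end

# Plain hashes, arrays, strings and numbers round-trip cleanly.
original = { "name" => "widget", "sizes" => [1, 2, 3] }
restored = Marshal.load(Marshal.dump(original))

puts restored == original      # the copy equals the original
puts dumpable?(original)       # plain data can be dumped
puts dumpable?(proc { 1 })     # procs cannot be dumped
```

Sticking to plain data structures, as suggested above, also sidesteps the renamed-class problem, because no custom class name ends up in the cached bytes.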

Ruby: marshal and unmarshal a variable, not an instance

OK, Ruby gurus, this is a hard one to describe in the title, so bear with me for this explanation:
I'm looking to pass a string that represents a variable: not an instance, not the collection of properties that make up an object, but the actual variable: the handle to the object.
The reason for this is that I am dealing with resources that can be located on the filesystem, on the network, or in memory. I want to create a URI handler that can handle each of these in a consistent manner, so I can have schemes like, e.g.,
file://
http://
ftp://
inmemory://
you get the idea. It's the last one that I'm trying to figure out: is there some way to get a string representation of a reference to an object in Ruby, and then use that string to create a new reference? I'm truly interested in marshalling the reference, not the object. Ideally there would be something like taking Object#object_id, which is easy enough to get, and using it to create a new variable elsewhere that refers to the same object. I'm aware that this could be really fragile and so is an unusual use case: it only works within one Ruby process for as long as there is an existing variable to keep the object from being garbage collected, but those are both true for the inmemory scheme I'm developing.
The only alternatives I can think of are:
Marshal the whole object and cram it into the URI, but that won't work because the data in the object is an image buffer - very large
Create a global or singleton purgatory area to store a variable for retrieval later using e.g. a hash of object_id:variable pairs. This is a bit smelly, but would work.
Any other thoughts, StackOverflowers?
There's ObjectSpace._id2ref :
f = Foo.new #=> #<Foo:0x10036c9b8>
f.object_id #=> 2149278940
ObjectSpace._id2ref(2149278940) #=> #<Foo:0x10036c9b8>
In addition to the caveats about garbage collection, ObjectSpace carries a large performance penalty in JRuby (so much so that it's disabled by default).
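The registry alternative mentioned in the question avoids both caveats, since the registry itself keeps the object alive and ObjectSpace is never consulted. A minimal sketch follows. The module name InMemoryRegistry and its methods are made up for illustration.

```ruby
# Hypothetical registry: maps object ids to objects and hands out
# inmemory:// URIs. Holding a strong reference here keeps the object
# from being garbage collected until it is released.
module InMemoryRegistry
  PREFIX = "inmemory://"
  @objects = {}

  def self.register(object)
    @objects[object.object_id] = object
    "#{PREFIX}#{object.object_id}"
  end

  def self.fetch(uri)
    # Raises KeyError if the id was never registered or was released.
    @objects.fetch(Integer(uri.delete_prefix(PREFIX)))
  end

  def self.release(uri)
    @objects.delete(Integer(uri.delete_prefix(PREFIX)))
  end
end

buffer = "pretend this is a large image buffer"
uri = InMemoryRegistry.register(buffer)
same = InMemoryRegistry.fetch(uri)
puts same.equal?(buffer) # identical object, not a copy
```

Callers need to release entries when they are done, otherwise large buffers stay in memory for the life of the process.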
Variables aren't objects in Ruby. You not only cannot marshal/unmarshal them, you can't do anything with them. You can only do something with objects, which variables aren't.
(It would be really nice if they were objects, though!)
You could look into MagLev, which is an alternative Ruby implementation built on top of VMware's GemStone. It has a distributed object model which might suit your use case.
Objects are saved in the central GemStone instance (with some nifty caching) and can be accessed by any number of remote worker instances. That way, all of the workers act on the same object space and can access the very same objects simultaneously. You can even do things like having the global garbage collector run on a single Ruby instance, or seamlessly move execution at any point to different nodes (while preserving all the stack frames) using continuations.
