How do I completely avoid using a database in RSpec tests? - rails-activerecord

I want to use FactoryGirl to build in-memory stubs of models, then have all ActiveRecord queries run against only those. For example:
# Assume we start with an empty database, a Foo model,
# and a Foo factory definition.
#foo_spec.rb
stubbed_foo = FactoryGirl.build_stubbed(:foo)
# Elsewhere, deep in the guts of application
Foo.first() # Ideally would return the stubbed_foo we created
# in the test. Currently this returns nil.
The solution might be to use an in-memory database. But is the above scenario possible?

If your reason for avoiding the database is to speed up your tests, then there are better ways.
Use FactoryGirl.build as much as possible instead of create. This works as long as the record won't be fetched from the database by your code, and it works well for unit tests of well-structured code. (For example, it helps to use Service Objects and unit test them independently.)
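For instance, a unit test that never touches the database might look like this (a minimal sketch, assuming a Foo model with a name attribute and a matching :foo factory):
describe Foo do
  it 'is valid with a name' do
    # build instantiates the object in memory only; no INSERT is issued
    foo = FactoryGirl.build(:foo, name: 'example')
    expect(foo).to be_valid
    expect(foo).to be_new_record
  end
end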
For tests that actually need to read from the database (as in your Foo.first example call), you can use FactoryGirl.create and use transactional fixtures. This creates a database transaction at the beginning of each test example, and then rolls back the transaction at the end of the example. This can cause problems when you use callbacks in your ActiveRecord models such as after_commit.
If you use after_commit or other callbacks in your models that require the database transaction to be committed (or you use explicit transactions in your code), I recommend setting up DatabaseCleaner. Here's an example of how to configure and use it: https://gist.github.com/RobinDaugherty/9f4e5f782d9fdbe191a23de30ad8b539
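A minimal sketch of such a setup in spec/rails_helper.rb (strategy choices vary per app; examples that rely on after_commit need a non-transactional strategy such as :truncation or :deletion):
require 'database_cleaner'

RSpec.configure do |config|
  # Let DatabaseCleaner manage cleanup instead of RSpec's built-in transactions.
  config.use_transactional_fixtures = false

  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation)   # start the suite from a clean database
    DatabaseCleaner.strategy = :transaction   # fast default for most examples
  end

  config.before(:each) do
    DatabaseCleaner.start
  end

  config.after(:each) do
    DatabaseCleaner.clean
  end
end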

Related

In Ruby, what are the use cases for adding methods to an instance's singleton class?

Thanks to some other posts and reading, I understand singleton/meta classes. And I understand why we'd want to use them on a class. But I still don't understand why we'd want to use them on instance objects. And I've yet to see it in practice.
I'm referring to something like this:
class Vehicle
  def odometer_reading
    # some code
  end
end

my_car = Vehicle.new

def my_car.open_door
  # some code
end
At first thought, this seems like a bad idea as it would lead to difficulties in understanding the code and debugging.
Why would we want to do this? What are some examples of when this is a good idea?
One example is using it for testing purposes: creating mock and double objects, and stubbing methods. Debugging is somewhere nearby: re-defining the logging method for a specific object that you suspect is misbehaving, so that the log info is printed directly to the console (or more info is printed) during the debug session.
Another example is dealing with special cases: instead of inheritance you can do just that. Starting from a classic example: if you have two types of Employees, say Engineers and SalesPersons, for which the rules of compensation calculation differ, you can put the common logic into the Employee class, inherit the other two classes from it, and implement their own calculate_salary methods there. Now, if there is an outlier (a star salesman you have agreed a different compensation scheme with, a CEO with a very special scheme, etc.), instead of creating a whole subclass for this special employee, you can just define this method on the specific object representing that employee.
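A small sketch of that idea (the classes and salary figures here are made up):
class Employee
  def initialize(name, base)
    @name = name
    @base = base
  end

  def calculate_salary
    @base
  end
end

class SalesPerson < Employee
  def calculate_salary
    @base + commission
  end

  def commission
    500
  end
end

star = SalesPerson.new('Ada', 3000)

# Per-object override: only this one instance gets the special scheme.
def star.calculate_salary
  @base * 2 + commission
end

star.calculate_salary  # => 6500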
The third example is dealing with an object's lifecycle and performance considerations, instead of having a long case statement over various states in some processing method. E.g. for a file-reading class that transparently caches the entire file in the background (a too-simplistic-for-real-life approach, I know, but it works as a model), all read requests made while the file is not yet entirely read should check whether the requested data is already in the cache or should be read from disk. Once the file is fully read, they always go to the cache. Instead of having the if (or a case, if there are more states) to deal with this, you could simply re-define the read method at the object level once the file is fully read into the cache. For this simple example it doesn't lead to any sizable performance benefit (if any benefit at all), but for more complex cases it may be worth it.
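A rough sketch of that lifecycle trick (the CachedReader class and its methods are hypothetical):
class CachedReader
  def initialize(path)
    @path = path
    @cache = {}
  end

  def read(offset, length)
    @cache[offset] ||= read_from_disk(offset, length)
  end

  # Called once the whole file has been cached in the background.
  def cache_complete!
    # From now on this object skips the cache check entirely.
    define_singleton_method(:read) do |offset, _length|
      @cache[offset]
    end
  end

  private

  def read_from_disk(offset, length)
    File.open(@path, 'rb') do |f|
      f.seek(offset)
      f.read(length)
    end
  end
end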
You wouldn't normally add them using def (that's a rather rigid way of doing it), but rather with something like define_method or extend.
Although this is not the sort of thing you'd do on a routine basis, it does mean you can do some rather unusual things. ActiveRecord in Rails produces results in the form of an Array with additional methods added on to perform other operations.
An Object-Relational Mapper would be a case where you'd probably want to do this. Sometimes, depending on how you fetch a record, the methods available differ significantly. Being able to add those dynamically means each fetched object can be completely customized even if they have the same class and general-purpose methods.
Another example: You have an array of hashes and you want each hash to have a method-call getter and setter. Something like:
user = HashOnSteroids.new(name: 'John')
user[:name] # => 'John'
user[:name] = 'Joe'
user.name # => 'Joe'
user.name = 'John'
user.set(name: 'Jim', age: 5)
This means you cannot write standard method definitions in the class, as each hash will have a different set of keys (method names), so you have to resort to defining singleton methods so that each object has its own set of methods (not a pack of shared methods).
Warning: Using singleton methods for this use case is highly inefficient. A sneaky method_missing is faster and uses far less memory, as it doesn't have to allocate a billion proc objects.
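A rough method_missing-based sketch of that HashOnSteroids idea (the class itself is hypothetical):
class HashOnSteroids
  def initialize(attributes = {})
    @attributes = attributes
  end

  def [](key)
    @attributes[key]
  end

  def []=(key, value)
    @attributes[key] = value
  end

  def set(new_attributes)
    @attributes.merge!(new_attributes)
  end

  def method_missing(name, *args)
    key = name.to_s.chomp('=').to_sym
    if name.to_s.end_with?('=')
      @attributes[key] = args.first   # setter: user.name = 'Joe'
    elsif @attributes.key?(key)
      @attributes[key]                # getter: user.name
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s.chomp('=').to_sym) || super
  end
end

user = HashOnSteroids.new(name: 'John')
user[:name]        # => "John"
user.name = 'Joe'
user.name          # => "Joe"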

Ruby - Sequel Model to access multiple databases

I'm trying to use the Ruby Sequel::Model ORM functionality for a web service, in which every user's data is stored in a separate MySQL database. There may be thousands of users and thus databases.
On every web request I want to construct the connection string to connect to the user's data, do the work, and then close the connection.
When using Sequel, I can specify the database to use for a particular block of code:
Sequel.connect(:adapter=>'mysql', :host=>'localhost', :database=>'test1') do |db|
  db.do_something()
end
This is all very good: I can perform Sequel operations on the particular user's database. However, when using Sequel::Model, my db operations look like this:
Supplier.create(:field1 => 'TEST')
I.e. it doesn't take db as a parameter, so it just uses some shared database configuration.
I can configure the database Model uses in two ways, either set the global DB variable:
DB = Sequel.connect(:adapter=>'mysql', :host=>'localhost', :database=>'test1')
class Supplier < Sequel::Model
end
Or, I can set the database just for Model:
Sequel::Model.db = Sequel.connect(:adapter=>'mysql', :host=>'localhost', :database=>'test1')
class Supplier < Sequel::Model
end
In either case, setting a shared variable like this is no good - there may be multiple requests processed concurrently, each of which needs its own database configuration.
Is there any way around this? Is there a way of specifying per-request db configuration using Sequel::Model?
As an aside, I've run into a similar problem with DataMapper, I'm now wondering whether having a single multi-tenanted database is going to be the only option if using Ruby, although I'd prefer to avoid this as it limits scalability.
A solution, or any pertinent discussion would be much appreciated.
Thanks
Pete
Use Sequel's sharding support for this: http://sequel.jeremyevans.net/rdoc/files/doc/sharding_rdoc.html
Actually, in your case it's probably better to use the arbitrary_servers extension rather than sharding proper:
DB.with_server(:host=>'hash_host_b', :database=>'backup') do
  DB.synchronize do
    # All queries here default to the backup database on hash_host_b
  end
end
See:
http://sequel.jeremyevans.net/rdoc/files/doc/sharding_rdoc.html#label-arbitrary_servers+Extension
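A rough sketch of how that could look for the per-user case in the question (server_block and arbitrary_servers are real Sequel extensions; the connection options, database names, and the with_user_db helper are placeholders):
require 'sequel'

# One shared Database object; the physical database is chosen per request.
# The empty :servers hash enables Sequel's sharding support.
DB = Sequel.connect(:adapter=>'mysql', :host=>'localhost',
                    :database=>'master', :servers=>{})
DB.extension :server_block       # provides DB.with_server
DB.extension :arbitrary_servers  # allows passing an options hash to with_server

class Supplier < Sequel::Model
end

# Hypothetical per-request helper: run every query in the block against
# the given user's database.
def with_user_db(database_name, &block)
  DB.with_server(:host=>'localhost', :database=>database_name) do
    DB.synchronize(&block)
  end
end

with_user_db('user_1234') do
  Supplier.create(:field1 => 'TEST')
end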

Specify table name mid application Ruby-Datamapper

I'm wanting to dynamically create and query tables using Datamapper.
While DataMapper allows you to work with legacy tables and schemas, and in this way lets you set the table name used, this only happens during initialisation, not within the application.
Is there an easy way to tell DataMapper to migrate/upgrade a Model with an assigned table name in the application, and to then tell it to query this table?
This should not be a problem.
All Ruby classes can be created and re-defined at run-time. Even initialization happens at run-time; it just happens to be executed first, before other code is executed.
That is why monkey-patches work so easily: they're just additional code at initialization that re-defines classes to add extra methods, variables, etc.
There is no Ruby code that is "special" in the sense that it only runs at compile time. Ruby is an interpreted language.
To dynamically create a class, see Dynamically creating class in Ruby.
Assuming you don't need to dynamically create classes from an array of strings, you can define additional methods with define_method, or call Datamapper methods at runtime to add attributes.
To define new methods in a class:
Post.send :define_method, :new_method_name do
end
To define a new property using the Datamapper property:
class Post
  include DataMapper::Resource
  property :title, String # the static way
end

Post.send :property, :title, String # add property the dynamic way (at run-time)
Do note that any tables or properties you define at run-time will not be available if you restart your server, unless the code that dynamically generates these is re-executed.
To update your tables at runtime, you simply do the same thing as normal, that is, call:
DataMapper.auto_upgrade!
To upgrade only a single table, you can also do:
Post.auto_upgrade!
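Putting those pieces together, a rough sketch of defining a model at run-time with an assigned table name and then upgrading and querying it (the DynamicReport name, table name, properties, and the in-memory SQLite connection are made up for illustration):
require 'dm-core'
require 'dm-migrations'

DataMapper.setup(:default, 'sqlite::memory:')  # any adapter; in-memory SQLite for the sketch

# Build a model class at run-time and point it at a specific table name.
model = Class.new do
  include DataMapper::Resource

  storage_names[:default] = 'reports_2015'     # the assigned table name

  property :id,    DataMapper::Property::Serial
  property :title, String
end

# Give the anonymous class a constant name so DataMapper can refer to it.
Object.const_set(:DynamicReport, model)

DataMapper.finalize
DynamicReport.auto_upgrade!                    # create/upgrade the table

DynamicReport.create(:title => 'hello')
DynamicReport.all(:title => 'hello')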
2nd warning: If you have multiple processes, the dynamic code will need to be run in each process, or the additional table Models and Properties will not be available.
This is a problem if you have multiple worker processes, as might happen in production (e.g. Nginx with multiple Unicorn workers, or multiple Mongrel workers behind HAProxy).
If you have a single-process server, then that is not a problem. However, if you have multiple worker processes, you must run the dynamic code to generate these extra classes and properties in EACH process to make them available.
This is actually the same as for initialization, because each process goes through initialization (or, if forked, inherits any initialization).
The easiest way without changing anything under the hood is to use separate databases instead of tables (assuming that any relationships will also be stored in the separate database) and open a connection to an additional repository in the block.
DataMapper.setup(:external, "adapter://username:password@hostname/dbname")

DataMapper.repository(:external) do
  # ...
end

Factory Girl - what's the purpose?

What's the purpose of Factory Girl in rspec tests when I could use before(:each) blocks? It feels like the only difference between Factory Girl and a before(:each) is that the factory prepares object creation outside of the test. Is this right?
Gems like Factory Girl and Sham allow you to create templates for valid and re-usable objects. They were created in response to fixtures, which were fixed records that had to be loaded into the database. They allow more customization when you instantiate the objects, and they aim to ensure that you have a valid object to work with. They can be used anywhere in your tests and in your before and after test hooks.
before(:each), before(:all), after(:each) and after(:all) all aim to give you a place to do setup and teardown that will be shared amongst a test group. For example, if you are going to be creating a new valid user for every single test, then you'll want to do that in your before(:each) hook. If you are going to be clearing some files out of the filesystem, you want to do that in a before hook. If your tests all create a tmp file and you want to remove it after your test, you'll do that in your after(:each) or after(:all) hook.
The way these two concepts differ is that Factories are not aimed at creating hooks around your tests; they are aimed at creating valid Ruby objects and records, so that you can keep your object creation flexible and DRY. Before and after hooks are aimed at setup and teardown tasks that are shared in an example group, so that you can keep your setup and teardown code DRY.
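As a small illustration of how the two work together rather than compete (the User model and its attributes here are assumed):
# spec/factories/users.rb
FactoryGirl.define do
  factory :user do
    name  'Jane Doe'
    email 'jane@example.com'
  end
end

# spec/models/user_spec.rb
describe User do
  before(:each) do
    # The factory supplies a valid template; the hook decides when it runs.
    @user = FactoryGirl.create(:user, name: 'Custom Name')
  end

  it 'is valid' do
    expect(@user).to be_valid
  end
end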
FactoryGirl replaces fixtures in tests. This way, you won't have to keep the fixtures up-to-date as you change the data model. Fixtures can also tend to get unwieldy as you add more edge cases.
FactoryGirl generates data on the fly and adding and removing fields is much easier. Also, you can use it in setup just how you'd use fixtures.
Does that make it clearer?
Also, fixture definitions are global across your entire application. Factories can be local, so data specific to an isolated test case lives in the setup for that particular context and not in a single global fixtures file.
This book covers the subject very well if you want some more reading matter

Creating mock data for unit testing

I consider myself still pretty new to the TDD scene, but I find that no matter which method I use (a mock framework or stubbing my own objects), I have to write a lot of code to create mock data. I like the idea of loading up objects to create an in-memory database. But what I don't like is cluttering up my tests with a ton of code for the sole purpose of creating mock data. This is especially the case when the data needs to account for all the different cases.
I'd love some suggestions for a better way of doing this.
It would seem to me that I should be able to load the data once into a known state from some data store and then I could use a snapshot of that state which is loaded in the test setup/initialize before each test method is executed. This would satisfy proper testing practices while providing convenience and let me focus on writing tests instead of writing code to create test data "by hand".
Maybe you could try the NBuilder library. It provides a very fluent interface and is easy to use. You can use it to generate single instances of a class with default values, or to generate lists with default or overridden values. You can have a look at this one.
If you are using .NET, try NDbUnit.
You populate your store and then it reverts your DB to a known state at test time, for each test. The Autumn of Agile screencast series shows this in pretty good detail.
Or you can do this manually: build a stored procedure or whatever to truncate your tables and copy in the data in your teardown method.
You can have Builder class(es) that help you build the instances you need; in this case, the ones you would use related to the repository.
Have the Builder use appropriate defaults, and in your tests override only what you need. This helps you avoid having every single case of "data" mixed up for all the different tests (which introduces problems, because usually there are cases that aren't compatible across different tests); see the sketch below.
**Update 1:** Take a look at www.markhneedham.com/blog/2009/01/21/c-builder-pattern-still-useful-for-test-data
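The answers above mention .NET tools, but the builder pattern itself is language-agnostic; here is a rough sketch in Ruby (the Customer class and its fields are hypothetical):
# Hypothetical Customer with the fields the tests care about.
Customer = Struct.new(:name, :age, :active, keyword_init: true)

class CustomerBuilder
  def initialize
    # Sensible defaults so most tests don't have to specify anything.
    @attributes = { name: 'Default Name', age: 30, active: true }
  end

  # Override only the fields that matter for the test at hand.
  def with(overrides)
    @attributes.merge!(overrides)
    self
  end

  def build
    Customer.new(**@attributes)
  end
end

# In a test: defaults everywhere except the one field under test.
minor = CustomerBuilder.new.with(age: 17).build
minor.age     # => 17
minor.active  # => true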
I know exactly what you mean. I think a good approach to solving this problem is to actually have a separate MockFramework project that houses all your mock data, outside the test project. This way you can generate mock data separately, store it in memory if you want to, or not, and then reference the mock framework from the test project. If you use a third party framework to do this, all the better, but you can still wrap that third party framework in your own mock framework so you can get all that "glue" that creates the mock data the way you need it out of your tests so the tests can really be only what they need to be.
Thanks for all the suggestions; I think the solution requires a little bit of everything. I don't want these tests to end up being regression tests, but without some kind of existing data store everything still boils down to creating the data by "manually" building the objects.
What would really be nice would be a framework that allowed me to use my existing DAL to either script the data to code for me, or to get the data in memory and access it like an in-memory database.
Untils.org covers this way better than I ever could.
Their whole guide is actually very good.
But basically, if your units require "a lot of data" they may not be unit tests anymore. I'd recommend attempting to test the smaller pieces individually.

Resources