I just ran into an interesting situation regarding relationships and databases. I am writing a Ruby app, and for my database I am using PostgreSQL. I have a parent object "user" and a related object "thingies", where a user can have one or more thingies. What would be the advantage of using a separate table vs. just embedding the data in a field on the parent table?
Example from ActiveRecord:
using a related table:
def change
  create_table :users do |i|
    i.text :name
  end

  create_table :thingies do |i|
    i.integer :user_id # foreign key linking each thingie back to its user
    i.integer :thingie
    i.text :description
  end
end

class User < ActiveRecord::Base
  has_many :thingies
end

class Thingie < ActiveRecord::Base
  belongs_to :user
end
using an embedded data structure (multidimensional array) method:
def change
  create_table :users do |i|
    i.text :name
    i.text :thingies, array: true # example contents: [[thingie, description], [thingie, description]]
  end
end

class User < ActiveRecord::Base
end
Relevant Information
I am using Heroku and Heroku Postgres as my database. I am on their free plan, which limits me to 10,000 rows. This seems to push me toward the multidimensional-array approach, but I don't really know.
Embedding a data structure in a field can work for simple cases, but it prevents you from taking advantage of what relational databases do best. Relational databases are designed to find, update, delete and protect your data. With an embedded field containing its own wad-o-data (array, JSON, XML, etc.), you wind up writing all of that code yourself.
There are cases where an embedded field might be more suitable, but for this question I will use an example that highlights the advantages of the related-table approach.
Imagine a User and Post example for a blog.
For an embedded post solution, you would have a table something like this (approximate Postgres DDL):
create table users (
  id serial primary key,
  name varchar(200),
  posts text[][]
);
With related tables, you would do something like
create table users (
  id serial primary key,
  name varchar(200)
);

create table posts (
  id serial primary key,
  user_id integer references users(id),
  content text
);
Object Relational Mapping (ORM) tools: With the embedded post, you will be writing the code to add posts to a user, navigate through the existing posts, validate them, delete them, etc. by hand. With the separate-table design, you can leverage ActiveRecord (or whatever ORM you are using) for all of this, which should keep your code much simpler.
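For instance, with the separate-table design, the everyday operations are one-liners (a sketch assuming the users/posts schema above; names are illustrative):

user = User.create(name: 'Alice')          # hypothetical user
user.posts.create(content: 'First post!')  # INSERT with user_id filled in automatically
user.posts.count                           # SQL COUNT, no array parsing
user.posts.first.destroy                   # delete a single post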
Flexibility: Imagine you want to add a date field to the post. You can do it with an embedded field, but you will have to write code to parse your array, validate the fields, update the existing embedded posts, etc. With the separate table this is much simpler. In addition, let's say you want to add an Editor to your system who approves all the posts. With the relational example this is easy. For example, to find all posts edited by 'Bob' with ActiveRecord, you would just need:
Editor.where(name: 'Bob').first.posts
For the embedded side, you would have to write code that walks through every user in the database, parses every one of their posts and looks for 'Bob' in the editor field.
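For the date-field case above, the separate table needs only an ordinary migration (a sketch; published_on is an illustrative column name):

class AddPublishedOnToPosts < ActiveRecord::Migration
  def change
    add_column :posts, :published_on, :date # existing rows just get NULL; no array rewriting
  end
end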
Performance: Imagine that you have 10,000 users with an average of 100 posts each. Now you want to find all posts made on a certain date. With the embedded field, you must loop through every record, parse the entire array of posts, extract the dates and check each against the one you want. This will chew up both CPU and disk I/O. With the separate table, you can simply index the date field and pull out the exact records you need without parsing every post from every user.
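Concretely, the index and the query might look like this (a sketch, reusing the hypothetical published_on column from the migration above):

# In a migration: index the date column so date lookups don't scan the table.
add_index :posts, :published_on

# The query is then a straightforward indexed lookup:
Post.where(published_on: Date.new(2013, 1, 15))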
Standards: Using a vendor-specific data structure means that moving your application to another database could be a pain. Postgres has a rich set of data types, but they are not the same as those in MySQL, Oracle, SQL Server, etc. If you stick with standard data types, you will have a much easier time swapping backends.
These are the main issues I see off the top of my head. I have made this mistake and paid the price for it, so unless there is a super-compelling reason to do otherwise, I would use the separate table.
What if users John and Ann have the same thingies? The records will be duplicated, and if you decide to change the name of a thingie you will have to change two or more records. If the thingie is stored in a separate table, you only have to change one record. FYI: https://en.wikipedia.org/wiki/Database_normalization
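If two users really can share one thingie, the fully normalized version is a many-to-many join table (a sketch; the Ownership model name is illustrative):

class User < ActiveRecord::Base
  has_many :ownerships
  has_many :thingies, through: :ownerships
end

class Ownership < ActiveRecord::Base # join table: user_id, thingie_id
  belongs_to :user
  belongs_to :thingie
end

class Thingie < ActiveRecord::Base
  has_many :ownerships
  has_many :users, through: :ownerships
end

# Renaming a thingie shared by John and Ann now touches exactly one row.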
Benefits of one to many:
Easier ORM (Object Relational Mapping) integration. You can use an ORM either way, but with the embedded approach you end up defining your tables with native SQL. Having distinct tables is easier, and you can make use of the auto-generated mappings.
Your space limitation of 10,000 rows will go further with the one-to-many relationship in the case that two or more people can have the same "thingies."
Handle users and thingies separately. In some cases, you might only care about users or thingies, not their relationship with each other. Some examples: updating a username or thingie description, or getting a list of all thingies (or all users). Selecting from a single combined table makes these harder to work with.
Maintenance and manipulation is easier. In the case that a user or a thingie is updated (name change, email address update, etc.), you only need to update one record in its table instead of writing update statements "where user_id=?".
Enforceable database constraints. What if a thingie is not owned by anyone? Is the user column now nullable? It would have to be in the single-table case, so you could not enforce a simple "not null" constraint, for example.
There are a lot of reasons, of course. If you are using a relational database, you should make use of the one-to-many relationship by separating your objects (users and thingies) into separate tables. Given your limitation on the number of records and that your dataset is small (under 10,000 rows), you shouldn't feel the downside of normalized data.
The short truth is that there are benefits to both. You could, for example, get faster read times from the single-table approach because you don't need complicated joins.
Here is a good reference with the pros and cons of both (normalized is the multiple-table approach and denormalized is the single-table approach):
http://www.ovaistariq.net/199/databases-normalization-or-denormalization-which-is-the-better-technique/
Besides the benefits others mentioned, there is also the matter of standards. If you are working on this app alone, that's not a problem, but if someone else ever needs to change something, the nightmare starts.
It may take that person a lot of time just to understand how the embedded structure works, and modifying it will take even longer. This way, a simple improvement can become really time-consuming. And at some point, you will be working with other people. So always code as if the person who ends up maintaining your code is a brutal psychopath who knows where you live.
I'm moving a legacy app from MS SQL to Postgres, using Rails to access the data.
The columns in MS SQL are capitalised, and while using the activerecord-sqlserver-adapter they are read like this:
something = my_model.SomeAttribute
Even though SomeAttribute is capitalised like a Ruby constant, it doesn't appear to matter, since the app only reads data from the MS SQL db.
The issue I'm having now is that after moving to Postgres, all column names are converted to lowercase (as unquoted identifiers in SQL are case-insensitive). Now when I try to access SomeAttribute on my model, it raises an ActiveModel::MissingAttributeError since the attribute is now lowercase.
Some examples of the symptom:
p.SomeAttribute
=> ActiveModel::MissingAttributeError: missing attribute: SomeAttribute
p.read_attribute(:SomeAttribute)
=> nil
p.has_attribute?(:SomeAttribute)
=> false
p.read_attribute(:someattribute)
=> 'expected value'
Is there some way I can get ActiveRecord/ActiveModel to convert attribute names to lowercase before attempting to retrieve them?
Disclaimer: this was a very temporary solution - definitely not the "right" thing to do!
Created views like this:
CREATE VIEW mymodel AS
SELECT
    at.someattribute
  , at.someattribute AS "SomeAttribute"
FROM actual_table at;
Using the double quotes in SQL preserves the case.
I am currently working on two different projects that use Rails to connect to a legacy MS SQL Server database... with table and column names that don't match up to what Rails expects.
In my most recent project I think I have just found the golden egg :-) And that is to not change anything on the Rails side at all. The trick is to use SQL views to "transform" the legacy tables into something Rails understands. I currently have a project in development where this is working fantastically, and I don't have to alias any table or column names on the Rails side, since everything looks peachy by the time it reaches my Rails app. I'm posting this now because I had many issues to work through on my first project, and I think this may make many other people's lives much easier; I haven't found this solution posted anywhere else.
So, for example, let's say you have a legacy table in Microsoft SQL Server 2008 R2 named "Contact-Table" with weird column names like these:
Contact-Table:
ID_Primary
First Name
Last Name
Using MS SQL table views you can 'recreate' this same table. Create a view based on Contact-Table; name the view whatever you want the 'table' to be called in Rails, and use column aliases to rename the columns. So here we could create a view called "contacts" (in line with Rails conventions) with the following columns:
contacts:
id (alias for ID_Primary)
first_name (alias for First Name)
last_name (alias for Last Name)
Then in your Rails model, all you need to do is link to your 'contacts' view in MS SQL, and your column names are available as expected. So far I've done this and it works with the tiny_tds gem and FreeTDS. I can query, create and update records, and Rails associations (has_many/belongs_to, etc.) work as well. I'm very excited about using MS SQL table views instead of the other methods I've used before to get Rails to talk to legacy databases! I'd love to hear what others think.
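A sketch of how this might look end to end, using the hypothetical names from above (the view could equally be created directly in SQL Server rather than via a migration):

class CreateContactsView < ActiveRecord::Migration
  def up
    # Bracket-quoting handles the spaces and hyphen in the legacy names.
    execute <<-SQL
      CREATE VIEW contacts AS
      SELECT [ID_Primary] AS id,
             [First Name] AS first_name,
             [Last Name]  AS last_name
      FROM [Contact-Table]
    SQL
  end

  def down
    execute "DROP VIEW contacts"
  end
end

class Contact < ActiveRecord::Base
  # Nothing special needed: the model sees an ordinary 'contacts' table.
end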
So I've been doing some research and have yet to come across a good solution for this. I am trying to avoid loading rarely used columns in an ActiveRecord model.
Here's my real world problem: I have an Accounts table:
create_table "accounts", :force => true do |t|
t.string "name"
t.text "policies" # this can be a lot of data
end
I pull accounts from the database all the time and rarely need the policies field. My concern is overhead: that's extra data I'm transferring that I rarely need.
How do I get Rails to pull only the name column by default, and grab the policies column only when I need it?
I know DataMapper has a solution for this called "lazy load" for attributes. Is there a standard or generally accepted solution for this in ActiveRecord?
Thanks for your help.
The activerecord-lazy-attributes library may provide the functionality you require.
Excerpt from the README:
This ActiveRecord extension allows you to define attributes to be lazy-loaded. Its main purpose is to avoid loading large columns (such as BLOBs) with every SELECT.
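If you would rather not pull in a gem, plain ActiveRecord's select offers a manual version of the same idea, at the cost of remembering it at each call site (a sketch; account_id stands in for whatever id you have on hand, and reading a column you did not select raises ActiveModel::MissingAttributeError):

# Load only the cheap columns; policies is never fetched.
accounts = Account.select("id, name").all

# Fetch the heavy column explicitly when it is actually needed.
policies = Account.select("id, policies").find(account_id).policies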
According to this (older) post, these Rails 3 finders have race conditions. Something like
User.find_or_create_by_username(:username => 'uuu', :password => 'xxx')
could possibly create two records under some conditions according to the post.
Is this still relevant for Rails 3.0+? Thanks.
Yes, it is. In the window between the find statement executing and the object being created, a second statement can execute in parallel.
There's no exclusive lock.
The best way to prevent this is to add a uniqueness validation in your model and a unique index in your database. That way, the database will raise an error if you try to create two records with the same fields.
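Concretely, for the username example above (a sketch; hash-rocket syntax to match the question's Rails 3 era):

# In a migration: the database-level guarantee.
add_index :users, :username, :unique => true

# In the model: the friendly application-level check.
class User < ActiveRecord::Base
  validates :username, :uniqueness => true
end

# If two requests race past the validation, the loser raises
# ActiveRecord::RecordNotUnique, which you can rescue and retry.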
I am working on a Sinatra app with DataMapper connected to a MySQL database, and am having problems retrieving/finding records from one specific table. I can insert into it with DataMapper fine, but when I try to do @sleepEntries = Sleep_Log.all I get the following error: ArgumentError: argument out of range. When I load everything into irb I get the same error.
I also turned on query logging, and I see SELECT id, start_time, length, timestamp FROM sleep_logs ORDER BY id when I call Sleep_Log.all. When I connect to the database through the mysql command-line tool, I can confirm that there are entries in that table, and when I run the query DataMapper is erroring out on, I have no problem getting results. Here is my DataMapper model for Sleep_Log:
class Sleep_Log
  include DataMapper::Resource

  property :id, Serial
  property :start_time, Time, :required => true
  property :length, Integer, :required => true
  property :timestamp, DateTime, :writer => :private

  belongs_to :user
end
This is what the table looks like in the database, as shown by describe sleep_logs:
What is weird is that I can retrieve results from all the other tables.
The backtrace from irb:
If you try Sleep_Log.first, do you get the error? If so, could you paste in the record, or one which also shows the error?
How was the table constructed? Are you using DM to inspect already entered records? Or are you entering them through DM too?
We just encountered the exact same problem. In our case it turned out that you have to use DateTime. MySQL doesn't have a matching Time database type and saves the column as DATETIME, but DataMapper doesn't realize that and blows up when reading the value back. If you switch your model to use DateTime, DM will get it.
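Applied to the Sleep_Log model from the question, only the start_time property changes (a sketch based on this answer):

class Sleep_Log
  include DataMapper::Resource

  property :id, Serial
  property :start_time, DateTime, :required => true # was Time; stored as DATETIME in MySQL
  property :length, Integer, :required => true
  property :timestamp, DateTime, :writer => :private

  belongs_to :user
end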