How to get around strategic eager loading in Datamapper? - ruby

I'm processing a ton of book records (12.5 million) with Ruby and Datamapper. On rare occasions I need to grab the associated identifiers for a particular book record, but Datamapper creates a select statement that grabs all the associated identifiers for all the book records. The query takes more than 2 minutes.
http://datamapper.org/why.html
The help document says this is "Strategic Eager Loading" and...
"The idea is that you aren't going to load a set of objects and use only an association in just one of them. This should hold up pretty well against a 99% rule.
When you don't want it to work like this, just load the item you want in it's own set. So DataMapper thinks ahead. We like to call it "performant by default". This feature single-handedly wipes out the "N+1 Query Problem"."
However, how do you load an item in its own set? I can't seem to find a way to specify that I really only want to query the identifiers for one of the book records.

If you are experiencing this issue, it might be because you are using Model.first() rather than Model.get(). See my comments under the question too.
As of DM 1.1.0...
Example using Model.first:
# this will create a select statement for one book record
book = Books.first(:author => 'Jane Austen')
# this will create a select statement for all isbns associated with all books
# if there are a lot of books and identifiers, it will take forever
book.isbns.each do |isbn|
  # however, as expected, it only iterates through the related isbns
  puts isbn
end
This is the same behavior as using Books.all and then accessing the association on just one of the books.
Example using Model.get:
# this will create a select statement for one book record
book = Books.get(2345)
# this will create a select statement for only the isbns of the book with a primary key of 2345
book.isbns.each do |isbn|
  puts isbn
end

Related

Iterate through items on a given date within date range rails

I kind of have the feeling this has been asked before, but I have been searching, but cannot come to a clear description.
I have a rails app that holds items that occur on a specific date (like birthdays). Now I would like to make a view that creates a table (or something else, divs are all right as well) that states a specified date once and then iterates over the related items one by one.
Items have a date field and are, of course, not related to a date in a separate table or something.
I can of course query the database ~30 times (as I want a representation of one month's worth of items), but I think it looks ugly and would be massively repetitive. I would like the outcome to look like this (consider it a table with two columns for the time being):
Jan/1 | jan1.item1.desc
| jan1.item2.desc
| jan1.item3.desc
Jan/2 | jan2.item1.desc
| etc.
So I think I need to know two things: how to construct a correct query (but it could be that this is as simple as Item.where("date > ? < ?", lower_bound, upper_bound)) and how to translate that into the view.
I have also thought about a hash with a key for each individual day and an array for the values, but I'd have to construct that like above(repetition) which I expect is not very elegant.
Using GROUP BY does not seem to get me anything different (apart from the grouping, of course, of the items) to work with than other queries. Just an array of objects, but I might do this wrong.
Sorry if it is a basic question. I am relatively new to the field (and programming in general).
If you're making a calendar, you probably want to GROUP BY date:
SELECT COUNT(*) AS instances, DATE(`date`) AS on_date FROM items GROUP BY DATE(`date`)
This is presuming your column is literally called date, which, seeing as that's a SQL reserved word, is probably a bad idea. If that's the case, you'll need to escape it whenever it's used, using backticks (`) here in MySQL notation. Postgres and others use a different approach.
For instances in a range, what you want is probably the BETWEEN operator:
@items = Item.where("`date` BETWEEN ? AND ?", lower_bound, upper_bound)
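For the view side of the question, one possible approach is to sort the fetched items by date and walk them in runs of equal dates, printing the date only on the first row of each run. The following is a minimal sketch in plain Ruby, not Rails view code: the Item struct, its desc field and the calendar_rows helper are hypothetical stand-ins for the real model and template.

```ruby
require 'date'

# Hypothetical stand-in for the Item model; only the two fields used here.
Item = Struct.new(:date, :desc)

# Builds one text row per item, showing the date only on the first row of each day.
def calendar_rows(items)
  sorted = items.sort_by { |item| [item.date, item.desc] }
  sorted.chunk { |item| item.date }.flat_map do |date, day_items|
    day_items.each_with_index.map do |item, index|
      label = index.zero? ? date.strftime('%b/%-d') : ''
      format('%-6s | %s', label, item.desc)
    end
  end
end

items = [
  Item.new(Date.new(2012, 1, 2), 'jan2.item1'),
  Item.new(Date.new(2012, 1, 1), 'jan1.item1'),
  Item.new(Date.new(2012, 1, 1), 'jan1.item2')
]
puts calendar_rows(items)
```

This needs only one query for the whole month; the grouping happens in memory after the records are loaded.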

how to chop a DataMapper Collection into one Collection per day?

I have a DataMapper::Collection object. Each of its entries has a created_at property. I want to render the entries into HTML tables, one table per day (I use Sinatra for that).
It was no problem to render everything into one table, but I didn't get it to do so for every day. I thought of an array of DataMapper::Collection objects over which I would just iterate and do the job. But I don't know how to build such an array :/
Does anyone know how to solve my problem, or does anyone have a different/better approach?
Thanks in advance!
You have (at least) two options. The first is to let the database do the work for you. I don't know about datamapper but most database mappers (!) have functionality to group using SQL's GROUP BY. In this case you would have to use a database function to get the date out of the timestamp and then group on that. This is the fastest option and if you and future maintainers are familiar with relational databases probably also the best.
The second option is to do the mapping in your code. I can't come up with an elegant Ruby thing right now but you could at least do:
# the block gives each new key its own empty array
# (Hash.new [] would share one array between all keys and never store the keys)
mapped_result = Hash.new { |hash, key| hash[key] = [] }
mapper_collection.each do |one_record|
  mapped_result[one_record.created_at.strftime('%Y-%m-%d')] << one_record
end
and then you can get the records for a day with
mapped_result['2012-11-19']
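A more compact way to do the same mapping in code is Ruby's Enumerable#group_by. The sketch below uses a hypothetical Entry struct in place of the DataMapper resources, so it runs without a database. Note that the result is a plain Hash of Arrays, not an array of DataMapper::Collection objects.

```ruby
# Hypothetical stand-in for a DataMapper resource with a created_at property.
Entry = Struct.new(:title, :created_at)

# Groups entries by calendar day, keyed by a 'YYYY-MM-DD' string.
def group_by_day(entries)
  entries.group_by { |entry| entry.created_at.strftime('%Y-%m-%d') }
end

entries = [
  Entry.new('first',  Time.new(2012, 11, 19, 9, 0, 0)),
  Entry.new('second', Time.new(2012, 11, 19, 17, 30, 0)),
  Entry.new('third',  Time.new(2012, 11, 20, 8, 15, 0))
]

by_day = group_by_day(entries)
by_day.each do |day, day_entries|
  # In a Sinatra view, this is where one table per day would be rendered.
  puts "#{day}: #{day_entries.map(&:title).join(', ')}"
end
```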

Ruby on Rails: Search one table where multiple rows must be present in another table

I'm trying to create a search where a single record must have multiple records in another table (linked by id's and has_many statements) in order to be included as a result.
I have tables users, skill_lists, skill_maps.
Users are mapped to individual skills through single entries in the skill_maps table. Many users can share a single skill, and a single user can have many skills through multiple entries in the skill_maps table.
e.g.
User_id | Skill_list_id
2 | 9
2 | 15
3 | 9
user 2 has skills 9 and 15
user 3 has only skill 9
I'm trying to create a search that returns a hash of all users which have a set of skills. The set of required skill_ids appear as an array in the params.
Here's the code that I'm using:
skill_selection_user_ids = SkillMap.find_all_by_skill_list_id(params[:skill_ids]).map(&:user_id)
@results = User.find(:all, :conditions => {:id => skill_selection_user_ids})
The problem is that this returns all users that have ANY of these skills not users that have ALL of them.
Also, my users table is linked to the skill_lists table :through => :skill_maps and vice versa, so that I can call @user.skill_list etc...
I'm sure this is a real newbie question, I'm totally new to rails (and programming). I searched and searched for a solution but couldn't find anything. I don't really know how to explain the problem in a single search term.
I personally don't know how to do this using ActiveRecord's query interface. The easiest thing to do would be to retrieve lists of users who have each individual skill, and then take the intersection of those lists, perhaps using Set:
require 'set'
skills = [5, 10, 19] # for example
user_ids = skills.map { |s| Set.new(SkillMap.find_all_by_skill_list_id(s).map(&:user_id)) }.reduce(:&)
users = User.where(:id => user_ids.to_a)
For (likely) higher performance, you could "roll your own" SQL and let the DB engine do the work. I may be able to come up with some SQL for you, if you need high performance here. (Or if anyone else can, please edit this answer!)
By the way, you should probably put an index on skill_maps.skill_list_id to ensure good performance even if the skill_maps table gets very large. See the ActiveRecord::Migration documentation: http://api.rubyonrails.org/classes/ActiveRecord/Migration.html
You'll probably have to use some custom SQL to get the user IDs. I tested this query on a similar HABTM relationship and it seems to work:
SELECT DISTINCT(user_id) FROM skill_maps AS t1 WHERE (SELECT COUNT(skill_list_id) FROM skill_maps AS t2 WHERE t2.user_id = t1.user_id AND t2.skill_list_id IN (1,2,3)) = 3
The trick is in the subquery. For each row in the outer query, it finds a count of records for that row that match any of the skills that you're interested in. Then it checks whether that count matches the total number of skills you're interested in. If there's a match, then the user must possess all of the skills you searched for.
You could execute this in Rails using find_by_sql:
sql = 'SELECT DISTINCT(user_id) FROM skill_maps AS t1 WHERE (SELECT COUNT(skill_list_id) FROM skill_maps AS t2 WHERE t2.user_id = t1.user_id AND t2.skill_list_id IN (?)) = ?'
skill_ids = params[:skill_ids]
# find_by_sql returns SkillMap objects, so pull the ids out of them
user_ids = SkillMap.find_by_sql([sql, skill_ids, skill_ids.size]).map(&:user_id)
Sorry if the table and column names aren't exactly right, but hopefully this is in the ballpark.
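To make the counting trick easier to follow, here is the same logic in plain Ruby over an in-memory copy of the example rows from the question. This is only an illustration of the idea, not a replacement for the SQL: SKILL_MAPS and users_with_all_skills are hypothetical names, and on a large table the database should do this work.

```ruby
# Hypothetical in-memory copy of skill_maps rows as [user_id, skill_list_id] pairs.
SKILL_MAPS = [[2, 9], [2, 15], [3, 9]].freeze

# Returns the ids of users that have every skill in required_skill_ids.
def users_with_all_skills(skill_maps, required_skill_ids)
  required = required_skill_ids.uniq
  skill_maps
    .select { |_user_id, skill_id| required.include?(skill_id) } # rows matching ANY skill
    .uniq                                                        # ignore duplicate rows
    .group_by { |user_id, _skill_id| user_id }                   # one bucket per user
    .select { |_user_id, rows| rows.size == required.size }      # keep users matching ALL
    .keys
end

p users_with_all_skills(SKILL_MAPS, [9, 15]) # => [2]
p users_with_all_skills(SKILL_MAPS, [9])     # => [2, 3]
```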

How to quickly search book titles?

I have a database of about 200k books. I wish to give my users a way to quickly search a book by the title. Now, some titles might have prefix like A, THE, etc. and also can have numbers in the title, so search for 12 should match books with "12", "twelve" and "dozen" in the title. This will work via AJAX, so I need to make sure database query is really fast.
I assume that most of the users will try to search using some words of the title, so I'm thinking to split all the titles into words and create a separate database table which would map words to titles. However, I fear this might not give the best results. For example, the book title could be some 2 or 3 commonly used words, and I might get a list of books with longer titles that contain all 2-3 words and the one I'm looking for lost like a needle in a haystack. Also, searching for a book with many words in the title might slow down the query because of a lot of OR clauses.
Basically, I'm looking for a way to:
find the results quickly
sort them by relevance.
I assume this is not the first time someone needs something like this, and I'd hate to reinvent the wheel.
P.S. I'm currently using MySQL, but I could switch to anything else if needed.
Using SOUNDEX is the best way, I think.
SELECT
id,
title
FROM products AS p
WHERE p.title SOUNDS LIKE 'Shaw'
-- This will match 'Saw' etc.
For the best database performance, calculate the SOUNDEX value of your titles once and store it in a new column. You can calculate the soundex with SOUNDEX('Hello').
Example usage:
UPDATE `books` SET `soundex_title` = SOUNDEX(title);
You might want to have a look at Apache Lucene. This is a high-performance, Java-based information retrieval system.
You would create an IndexWriter and index all your titles; you can add parameters (have a look at the class) linking to the actual book.
When searching, you would need an IndexReader and an IndexSearcher, and use the search() operation on them.
Have a look at the sample in src/demo and at http://lucene.apache.org/java/2_4_0/demo2.html
Using information retrieval techniques makes the indexing take longer, but a search no longer has to go through most of the titles, so overall you can expect better search performance.
Also, choosing a good Analyzer enables you to ignore words such as "the" and "a".
One solution that would easily accommodate your volume of data and speed requirement is to use the Redis key-value store.
The way I see it, you can go ahead with your solution of mapping titles to keywords and storing them under the form:
keyword : set of book titles
Redis already has a built-in set data-type that you can use.
Next, to get the titles of the books that contain the search keywords, you can use the SINTER command, which will perform the set intersection for you.
Everything is done in memory; therefore the response time is very fast.
Also, if you want to save your index, Redis has a number of different persistence/caching mechanisms.
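The keyword-to-titles idea can be sketched without a Redis server by letting Ruby's Set play the role of a Redis set. In the sketch below the KeywordIndex class and its method names are hypothetical; with a real Redis instance, add would correspond roughly to one SADD per keyword and search to a single SINTER over the keyword keys.

```ruby
require 'set'

# In-memory stand-in for Redis: each keyword maps to a Set of book titles.
class KeywordIndex
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Adds the title to the set of every word it contains (SADD in Redis terms).
  def add(title)
    title.downcase.scan(/[a-z0-9]+/).each { |word| @sets[word] << title }
  end

  # Intersects the sets of all query words (SINTER in Redis terms).
  def search(query)
    words = query.downcase.scan(/[a-z0-9]+/)
    return Set.new if words.empty?
    words.map { |word| @sets.fetch(word, Set.new) }.reduce(:&)
  end
end

index = KeywordIndex.new
['A Dime in a Dozen', 'Cheaper by the Dozen', 'The Dirty Dozen'].each { |t| index.add(t) }
p index.search('dozen dime').to_a
```

This sketch does no ranking and does not drop stop words such as "a" or "the"; both would have to be layered on top.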
Apache Lucene with Solr is definitely a very good option for your problem
You can link Solr/Lucene to your MySQL database and index it directly. Here is a simple tutorial on how to link your MySQL database with Lucene/Solr: http://www.cabotsolutions.com/2009/05/using-solr-lucene-for-full-text-search-with-mysql-db/
Here are the advantages and pains of using Lucene-Solr instead of MySQL full text search: http://jayant7k.blogspot.com/2006/05/mysql-fulltext-search-versus-lucene.html
Keep it simple. Create an index on the title field and use wildcard pattern matching. You can not possibly make it any faster as your bottleneck is not the string matching but the number of strings you want to match against the title.
And just came up with a different idea. You say that some words can be interpreted differently. Like 12, Twelve, dozen. Instead of creating a query with different interpretations, why not store different interpretations of the titles in a separate table with a one to many to the books. You can then GROUP BY book_id to get unique book titles.
Say the book "A dime in a dozen". In books table it will be:
book_id=356
book_title='A dime in a dozen'
In titles table will be stored:
titles_id=123
titles_book_id=356
titles_title='A dime in a dozen'
--
titles_id=124
titles_book_id=356
titles_title='A dime in a 12'
--
titles_id=125
titles_book_id=356
titles_title='A dime in a twelve'
The query for this:
SELECT b.book_id, b.book_title
FROM books b JOIN titles t on b.book_id=t.titles_book_id
WHERE t.titles_title LIKE '%twelve%'
GROUP BY b.book_id
Now, insertion becomes a much bigger task, but the variants can be created outside the database and inserted in one swoop.
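Creating the variants outside the database could look roughly like the sketch below. The synonym list and the title_variants helper are hypothetical; a real list would be much longer, and this version only substitutes the first matching word in a title.

```ruby
# Hypothetical synonym groups; each group lists words that should match each other.
SYNONYM_GROUPS = [%w[12 twelve dozen]].freeze

# Returns the original title plus one variant per synonym substitution.
def title_variants(title)
  variants = [title]
  SYNONYM_GROUPS.each do |group|
    pattern = /\b(#{group.map { |word| Regexp.escape(word) }.join('|')})\b/i
    next unless title =~ pattern
    group.each { |word| variants << title.sub(pattern, word) }
  end
  variants.uniq
end

puts title_variants('A dime in a dozen')
```

Each returned string would become one row in the titles table, all pointing at the same book_id.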

WCF Data Services - neither .Expand or .LoadProperty seems to do what I need

I am building a school management app where they track student tardiness and absences. I've got three entities to help me in this. A Students entity (first name, last name, ID, etc.); a SystemAbsenceTypes entity with SystemAbsenceTypeID values for Late, Absent-with-Reason, Absent-without-Reason; and a cross-reference table called StudentAbsences (matching the student IDs with the absence-type ID, plus a date, and a Notes field).
What I want to do is query my entities for a given student, and then add up the number of each kind of Absence, for a given date range. I prepare my currentStudent object without a problem, then I do this...
Me.Data.LoadProperty(currentStudent, "StudentAbsences") 'Loads the cross-ref data
lblDaysLate.Text = (From ab In currentStudent.StudentAbsences Where ab.SystemAbsenceTypes.SystemAbsenceTypeID = Common.enuStudentAbsenceTypes.Late).Count.ToString
...and this second line fails, complaining "Object reference not set to an instance of an object."
I presume the problem is that while it DOES see that there are (let's say) four absences for the currentStudent (ie, currentStudent.StudentAbsences.Count = 4) -- it can't yet "peer into" each one of the absences to look at its type. In fact, each of the four StudentAbsence objects has a property called SystemAbsenceType, which then finally has the SystemAbsenceTypeID.
How do I use .Expand or .LoadProperty to make this happen? Do I need to blindly loop through all these collections, firing off .LoadProperty on everything before I can do my query?
Is there some other technique?
When you load the student, try expanding the related properties.
var currentStudent = context.Students.Expand("StudentAbsences")
.Expand("StudentAbsences/SystemAbsenceTypes")
.Where(....).First();
