Renaming multiple items in a database - ruby

I have seeded a lot of data for my database in Rails 4. The data I imported was entered manually by a user of Gigabot (via the Gigabot API).
The problem I have is that when I list "club nights" I get lots of duplicates back, because the names are similar but not identical. Is there any way I could group the items so that records whose names contain a certain word end up grouped together?
Currently these are my only validations:
class Club < ActiveRecord::Base
  has_many :events

  validates :name, presence: true, uniqueness: true
  validates :location, presence: true
  validates :description, presence: true, uniqueness: true
end
Here is some example data that the table currently displays:
Name
DC10
Amnesia
Circo Loco # DC10
Sankeys
Sankeys Ibiza
Cocoon
Privilege Ibiza
Circoloco at Dc 10
Space
Space Ibiza
If you look at the example above you will see that some of the clubs are repeated. I would like to clean up the table so that there is only one "DC10" club, with all the clubs that have DC10 in their name grouped under it.
So in the example above, instead of having 10 separate clubs there would be 6:
DC10,
Amnesia,
Space,
Sankeys,
Privilege,
Cocoon.

Have a look at the update_all method from ActiveRecord.
This will allow you to update the values of a field across a whole collection, so you just have to build a collection that you're certain belongs together.
I suggest using something like SIMILAR TO in Postgres. You could do something like:
pattern = '%DC10%' # This can be as advanced as you need it
collection = Club.where('name SIMILAR TO ?', pattern)
collection.update_all(name: 'DC10')
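If several clusters need folding together, the same call can be driven from a mapping of canonical names to patterns. This is only a sketch; the canonical_names hash and its patterns are invented for illustration, and note that update_all writes straight to SQL, skipping validations and callbacks:
# Hypothetical mapping of canonical club names to SIMILAR TO patterns
canonical_names = {
  'DC10'    => '%(DC10|DC 10|Dc 10)%',
  'Sankeys' => '%Sankeys%',
  'Space'   => '%Space%'
}

canonical_names.each do |canonical, pattern|
  # update_all issues one UPDATE per pattern and bypasses the uniqueness
  # validation, so any duplicate rows still need merging afterwards
  Club.where('name SIMILAR TO ?', pattern).update_all(name: canonical)
end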

This sounds like a very difficult task. Most likely you won't be able to come up with a regex that can capture your intention.
For example, let's imagine you have a club Space and these other entries:
Void # Space
Outer Space
Inner space
Alien in Outer Space
They all end in Space, but which ones should be grouped together? My example is a bit exaggerated, but it sounds like you are dealing with a lot of data, and cases like this one may occur.
Do you not have any other field which could help you group records together, like GPS coordinates, city, etc.?
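If you do have such a field, a rough de-duplication sketch might look like the following. It assumes the :location column from the question's model is reliable enough to identify duplicates, which may well not hold, and the "keep the shortest name" rule is just one arbitrary choice:
# Group clubs sharing a location, keep the shortest name as canonical,
# re-point associated events and remove the duplicates.
Club.group(:location).count.each do |location, count|
  next if count < 2
  canonical, *duplicates = Club.where(location: location)
                               .order('LENGTH(name)').to_a
  duplicates.each do |duplicate|
    duplicate.events.update_all(club_id: canonical.id)
    duplicate.destroy
  end
end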

Related

Going .pluck crazy. What is a realistic limit on query with array

I've used .pluck(:id) quite often (and map before it) to get a set of record ids. This is usually to get a set of related model records (e.g., :people has_many :scores, as: :assessed).
Let's say I have 10,000 people, but a query on People limits it to, say, 1,000.
people_ids = people.pluck(:id) # people is a relation/scope
scores = Score.where(:assessed_type => 'People', :assessed_id => people_ids)
There would be more to the Score query, but my basic question is: is querying with an array of, say, 1,000 ids a bad idea?
I should point out that the filtered Score query would be used to get a new set of People. This is a filter on People.
I only have a few hundred records in my test DB, and that works fine, but there must be a point where PostgreSQL or Rails is going to blow up. In production, I don't see going beyond 1,000 ids, since People is automatically filtered before this Score option is used.
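For what it's worth, here is a sketch of the pattern described above, plus an alternative that avoids materializing the id array in Ruby by letting ActiveRecord build a subquery. The Person model and its active filter are placeholders:
# Pattern from the question: pluck the ids, then query with the array
people     = Person.where(active: true)   # hypothetical filter
people_ids = people.pluck(:id)
scores     = Score.where(assessed_type: 'People', assessed_id: people_ids)

# Alternative: pass the relation itself so the ids become a subquery
# and never travel through Ruby as a long IN (...) list
scores = Score.where(assessed_type: 'People', assessed_id: people.select(:id))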

Iterate through items on a given date within date range rails

I have the feeling this has been asked before, but I have been searching and cannot come up with a clear description.
I have a rails app that holds items that occur on a specific date (like birthdays). Now I would like to make a view that creates a table (or something else, divs are all right as well) that states a specified date once and then iterates over the related items one by one.
Items have a date field and are, of course, not related to a date in a separate table or something.
I can of course query the database ~30 times (as I want a representation of one month's worth of items), but that looks ugly and would be massively repetitive. I would like the outcome to look like this (consider it a table with two columns for the time being):
Jan/1 | jan1.item1.desc
| jan1.item2.desc
| jan1.item3.desc
Jan/2 | jan2.item1.desc
| etc.
So I think I need to know two things: how to construct a correct query (but it could be that this is as simple as Item.where("date > ? < ?", lower_bound, upper_bound)) and how to translate that into the view.
I have also thought about a hash with a key for each individual day and an array for the values, but I'd have to construct that as above (repetition), which I expect is not very elegant.
Using GROUP BY does not seem to give me anything different to work with than other queries (apart from the grouping of the items, of course). I just get an array of objects, but I might be doing this wrong.
Sorry if it is a basic question. I am relatively new to the field (and programming in general).
If you're making a calendar, you probably want to GROUP BY date:
SELECT COUNT(*) AS instances, DATE(`date`) AS on_date FROM items GROUP BY DATE(`date`)
This presumes your column is literally called date, which, seeing as that's a SQL reserved word, is probably a bad idea. If that's the case, you'll need to escape it whenever it's used, with backticks as shown here in MySQL notation. Postgres and others use a different quoting style.
For instances in a range, what you want is probably the BETWEEN operator:
#items = Item.where("`date` BETWEEN ? AND ?", lower_bound, upper_bound)
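To translate that into the view, one common approach is to group the records in Ruby rather than in SQL. This is only a sketch, assuming a date column and a description attribute as in the question:
# In the controller: fetch one month's worth of items and group them by day
@items_by_date = Item.where(date: lower_bound..upper_bound)
                     .order(:date)
                     .group_by(&:date)

# In the view, iterate over the hash: print each date once,
# then loop over its items and print item.description, e.g.
# @items_by_date.each do |date, items| ... end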

How to get around strategic eager loading in Datamapper?

I'm processing a ton of book records (12.5 million) with Ruby and DataMapper. On rare occasions I need to grab the associated identifiers for a particular book record, but DataMapper creates a select statement grabbing all the associated identifiers for all the book records. The query takes more than 2 minutes.
http://datamapper.org/why.html
The help document says this is "Strategic Eager Loading" and...
"The idea is that you aren't going to load a set of objects and use only an association in just one of them. This should hold up pretty well against a 99% rule.
When you don't want it to work like this, just load the item you want in it's own set. So DataMapper thinks ahead. We like to call it "performant by default". This feature single-handedly wipes out the "N+1 Query Problem"."
However, how do you load an item in its own set? I can't seem to find a way to specify that I really only want to query the identifiers for one of the book records.
If you are experiencing this issue, it might be because you are using Model.first() rather than Model.get(). See my comments under the question too.
As of DM 1.1.0...
Example using Model.first:
# this will create a select statement for one book record
book = Books.first(:author => 'Jane Austen')

# this will create a select statement for all isbns associated with all books;
# if there are a lot of books and identifiers, it will take forever
book.isbns.each do |isbn|
  # however, as expected, it only iterates through the related isbns
  puts isbn
end
This is the same behavior as using Book.all and then selecting the associations on one of them.
Example using Model.get:
# this will create a select statement for one book record
book = Books.get(2345)

# this will create a select statement only for the isbns of the book
# with primary key 2345
book.isbns.each do |isbn|
  puts isbn
end

How can I fetch documents in a random order using MongoMapper?

I cannot use Array#shuffle since I don't fetch all documents (I only fetch up to twenty documents). How can I fetch random documents from a MongoDB database using MongoMapper (i.e. in MySQL one would use ORDER BY RAND())?
There's no technique similar to ORDER BY RAND(). And even in MySQL it is advised to avoid it (on large tables).
You could apply some common tricks, however.
For example, if you know the min and max values for your id, pick a random value within that range and get the next object.
db.collection.find({_id: {$gte: random_id}}).limit(1);
Repeat 20 times.
Or you could add a "random" field to each document yourself (and recalculate it every once in a while). This way you won't get truly random results with each query, but it'll be pretty cheap.
db.collection.find().sort({pseudo_random_field: 1}).limit(20)
// you can also skip some records here, but don't skip a lot.
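In MongoMapper terms, that second trick might look roughly like this. The Article model and its rand_key field are invented for the example, and the key needs to be refreshed periodically as noted above:
# Hypothetical model with a pre-computed pseudo-random key
class Article
  include MongoMapper::Document

  key :title,    String
  key :rand_key, Float
end

# Refresh the pseudo-random key every once in a while (e.g. from a cron job)
Article.all.each { |article| article.update_attributes(:rand_key => rand) }

# Fetch up to twenty documents in a pseudo-random order
Article.sort(:rand_key).limit(20).all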
Use skip and Random class.
class Book
  include MongoMapper::Document

  key :title
  key :author
end

rand = Random.rand(0..(Book.count - 1))
Book.skip(rand).first

Stop Activerecord from loading Blob column

How can I tell ActiveRecord not to load blob columns unless explicitly asked for? There are some pretty large blobs in my legacy DB that must be excluded for 'normal' objects.
I just ran into this using Rails 3.
Fortunately it wasn't that difficult to solve. I set a default_scope that removed the particular columns I didn't want from the result. For example, the model I had contained an XML text field that could be quite long and wasn't used in most views.
default_scope select((column_names - ['data']).map { |column_name| "`#{table_name}`.`#{column_name}`"})
You'll see from the solution that I had to map the columns to fully qualified versions so I could continue to use the model through relationships without ambiguities in attributes. Later, where you do want the field, just tack on another .select(:data) to have it included.
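For instance, a usage sketch under that default_scope (Document is a hypothetical model name; data is the blob column from the answer above):
doc = Document.find(42)
doc.attributes.key?('data')   #=> false, the blob column was not loaded

# When the blob is really needed, tack the column back on:
doc_with_blob = Document.select(:data).find(42)
doc_with_blob.data            # the blob content is available here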
fd's answer is mostly right, but ActiveRecord doesn't currently accept an array as a :select argument, so you'll need to join the desired columns into a comma-delimited string, like so:
desired_columns = (MyModel.column_names - ['column_to_exclude']).join(', ')
MyModel.find(id, :select => desired_columns)
I believe you can ask AR to load specific columns in your invocation to find:
MyModel.find(id, :select => 'every, attribute, except, the, blobs')
However, this would need to be updated as you add columns, so it's not ideal. I don't think there is any way to specifically exclude one column in rails (nor in a single SQL select).
I guess you could write it like this:
MyModel.find(id, :select => (MyModel.column_names - ['column_to_exclude']).join(', '))
Test these out before you take my word for it though. :)
A clean approach requiring NO CHANGES to the way you code elsewhere in your app, i.e. no messing with :select options:
For whatever reason you need or choose to store blobs in databases.
Yet, you do not wish to mix blob columns in the same table as your
regular attributes. BinaryColumnTable helps you store ALL blobs in
a separate table, managed transparently by an ActiveRecord model.
Optionally, it helps you record the content-type of the blob.
http://github.com/choonkeat/binary_column_table
Usage is simple:
Member.create(:name => "Michael", :photo => IO.read("avatar.png"))
#=> creates a record in "members" table, saving "Michael" into the "name" column
#=> creates a record in "binary_columns" table, saving "avatar.png" binary into "content" column
m = Member.last #=> only the columns in the "members" table are fetched (no blobs)
m.name #=> "Michael"
m.photo #=> binary content of the "avatar.png" file

Resources