I am trying to filter an ActiveRecord::AssociationRelation so that it is unique by parent_id.
So, I'd like a list like this:
[#<Message id: 25, posted_by_id: 3, posted_at: "2014-10-30 06:02:47", parent_id: 20, content: "This is a comment", created_at: "2014-10-30 06:02:47", updated_at: "2014-10-30 06:02:47">,
#<Message id: 23, posted_by_id: 3, posted_at: "2014-10-28 16:11:02", parent_id: 20, content: "This is another comment", created_at: "2014-10-28 16:11:02", updated_at: "2014-10-28 16:11:02">]
to return this:
[#<Message id: 25, posted_by_id: 3, posted_at: "2014-10-30 06:02:47", parent_id: 20, content: "This is a comment", created_at: "2014-10-30 06:02:47", updated_at: "2014-10-30 06:02:47">]
I've tried various techniques including:
#messages.uniq(&:parent_id) # returns the same list (with duplicate parent_ids)
#messages.select(:parent_id).distinct # returns [#<Message id: nil, parent_id: 20>]
and uniq_by has been removed from Rails 4.1.
Have you tried
group(:parent_id)
It sounds like that is what you are after. It returns the first entry for each parent_id; if you want the last entry instead, you will have to reorder the results in a subquery and then apply the group.
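For records already loaded into a plain Array (rather than a Relation), Array#uniq with a block keeps the first element seen per key, which is exactly the behavior asked for; on a Relation, uniq just adds DISTINCT and ignores the block, which is likely why the attempt in the question returned the full list. A minimal sketch, with a Struct standing in for the Message model:

```ruby
# Array#uniq with a block keeps the first element per parent_id.
# Message here is a stand-in Struct, not the real ActiveRecord model.
Message = Struct.new(:id, :parent_id)
messages = [Message.new(25, 20), Message.new(23, 20)]
unique = messages.uniq(&:parent_id)
unique.map(&:id)  # => [25]
```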
For me, on Rails 3.2 and PostgreSQL, Foo.group(:bar) works on simple queries but raises an error as soon as there is a where clause, for instance:
irb> Message.where(receiver_id: 434).group(:sender_id)
=> PG::GroupingError: ERROR: column "messages.id" must appear in the
GROUP BY clause or be used in an aggregate function
I ended up specifying an SQL 'DISTINCT ON' clause to select. In a Message class I have the following scope:
scope :latest_from_each_sender, -> { order("sender_id ASC, created_at DESC").select('DISTINCT ON ("sender_id") *') }
Usage:
irb> Message.where(receiver_id: 434).latest_from_each_sender
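The same "latest message per sender" selection can be expressed in plain Ruby over already-loaded records. This is only an in-memory sketch of what the DISTINCT ON scope does (a Struct stands in for Message, and created_at is simplified to an integer):

```ruby
# Stand-in for the Message model; created_at simplified to an integer.
Message = Struct.new(:sender_id, :created_at, :content)
messages = [
  Message.new(1, 10, "older"),
  Message.new(1, 20, "newest from sender 1"),
  Message.new(2, 5,  "only message from sender 2")
]

# Group by sender, then keep the newest message in each group --
# the in-memory equivalent of DISTINCT ON (sender_id) with the
# ORDER BY from the scope above.
latest = messages.group_by(&:sender_id)
                 .map { |_, msgs| msgs.max_by(&:created_at) }
latest.map(&:content)
# => ["newest from sender 1", "only message from sender 2"]
```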
Related
I have some records in a database tracking the price development on some items. These records often contains duplicates and repetitive sequences of price changes. I need to clean those up. Consider the following:
Record = Struct.new(:id, :created_at, :price)
records = [
  Record.new(1,  Date.parse('2017-01-01'), 150_000),
  Record.new(2,  Date.parse('2017-01-02'), 150_000),
  Record.new(3,  Date.parse('2017-01-03'), 130_000),
  Record.new(4,  Date.parse('2017-01-04'), 140_000),
  Record.new(5,  Date.parse('2017-01-05'), 140_000),
  Record.new(6,  Date.parse('2017-01-06'), 137_000),
  Record.new(7,  Date.parse('2017-01-07'), 140_000),
  Record.new(8,  Date.parse('2017-01-08'), 140_000),
  Record.new(9,  Date.parse('2017-01-09'), 137_000),
  Record.new(10, Date.parse('2017-01-10'), 140_000),
  Record.new(11, Date.parse('2017-01-11'), 137_000),
  Record.new(12, Date.parse('2017-01-12'), 140_000),
  Record.new(13, Date.parse('2017-01-13'), 132_000),
  Record.new(14, Date.parse('2017-01-14'), 130_000),
  Record.new(14, Date.parse('2017-01-15'), 132_000)
]
The policy, in plain words, should be:
Remove any duplicates of exactly the same price immediately following each other.
Remove any records belonging to a sequence where the same two prices alternate two or more times (e.g. [120, 110, 120, 110] but not [120, 110, 120]), so that only the initial price change is preserved.
In the above example the output that I would expect should be:
[
  Record#<id: 1,  created_at: Date#<'2017-01-01'>, price: 150_000>,
  Record#<id: 3,  created_at: Date#<'2017-01-03'>, price: 130_000>,
  Record#<id: 4,  created_at: Date#<'2017-01-04'>, price: 140_000>,
  Record#<id: 6,  created_at: Date#<'2017-01-06'>, price: 137_000>,
  Record#<id: 13, created_at: Date#<'2017-01-13'>, price: 132_000>,
  Record#<id: 14, created_at: Date#<'2017-01-14'>, price: 130_000>,
  Record#<id: 14, created_at: Date#<'2017-01-14'>, price: 132_000>
]
Note: This is the most complicated example I can think of for the time being, if I find more, I'll update the question.
Happy to help with your challenge; here you go:
records_to_delete = []

# Pass 1: clean up consecutive duplicates of the same price
records.each_with_index do |record, i|
  if i != 0 && record.price == records[i - 1].price
    records_to_delete << record.id
  end
end
records = records.delete_if { |record| records_to_delete.include?(record.id) }

# Pass 2: remove A/B/A/B repetitions, keeping only the initial A/B pair
records_to_delete = []
records.each_with_index do |record, i|
  if record.price == records[i + 2]&.price && records[i + 1]&.price == records[i + 3]&.price
    records_to_delete << records[i + 2].id
    records_to_delete << records[i + 3].id
  end
end
records = records.delete_if { |record| records_to_delete.uniq.include?(record.id) }
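As a sanity check, here is the two-pass cleanup run end-to-end against the question's sample data (dates dropped, since they don't affect the logic; note this requires Ruby 2.3+ for the `&.` operator):

```ruby
# Sample data from the question, reduced to id and price.
Record = Struct.new(:id, :price)
rows = [[1, 150_000], [2, 150_000], [3, 130_000], [4, 140_000], [5, 140_000],
        [6, 137_000], [7, 140_000], [8, 140_000], [9, 137_000], [10, 140_000],
        [11, 137_000], [12, 140_000], [13, 132_000], [14, 130_000], [14, 132_000]]
records = rows.map { |id, price| Record.new(id, price) }

# Pass 1: drop consecutive duplicates of the same price.
records_to_delete = []
records.each_with_index do |record, i|
  records_to_delete << record.id if i != 0 && record.price == records[i - 1].price
end
records = records.delete_if { |record| records_to_delete.include?(record.id) }

# Pass 2: drop A/B/A/B repetitions, keeping only the first A/B pair.
records_to_delete = []
records.each_with_index do |record, i|
  if record.price == records[i + 2]&.price && records[i + 1]&.price == records[i + 3]&.price
    records_to_delete << records[i + 2].id
    records_to_delete << records[i + 3].id
  end
end
records = records.delete_if { |record| records_to_delete.uniq.include?(record.id) }

records.map(&:id)  # => [1, 3, 4, 6, 13, 14, 14]
```

The surviving ids match the expected output listed in the question.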
Very similar to these two CouchDB questions: 3311225 and 8924793, except that these approaches don't allow partial matching. Having e.g. these entries:
[{_id: 1, status: 'NEW', name: 'a'},
{_id: 2, status: 'NEW', name: 'aab'},
{_id: 3, status: 'NEW', name: 'ab'},
{_id: 4, status: 'NEW', name: 'aaa'},
{_id: 5, status: 'NEW', name: 'aa'}]
and key
[status, name, _id]
There seems to be no way to
- filter these entries by status (full string match) and name (partial string match ~ startsWith),
- order them by _id, and
- paginate them,
because of the partial string match on name. The high unicode character \uffff that enables the partial match also causes the _id part of the key to be ignored, so the resulting entries are sorted by status and name rather than by _id.
var status = 'NEW';
var name = 'aa';
var query = {
  startkey: [status, name],
  endkey: [status, name + '\uffff', {}],
  skip: 0,
  limit: 10
};
results in
[{_id: 5, status: 'NEW', name: 'aa'},
{_id: 4, status: 'NEW', name: 'aaa'},
{_id: 2, status: 'NEW', name: 'aab'}]
There is no option to sort in memory, as this would only sort the individual pages, and not the entire data set. Any ideas about this?
I've got the following array
[#<Attachment id: 73, container_id: 1, container_type: "Project", filename: "Eumna.zip", disk_filename: "140307233750_Eumna.zip", filesize: 235303, content_type: nil, digest: "9a10843635b9e9ad4241c96b90f4d331", downloads: 0, author_id: 1, created_on: "2014-03-07 17:37:50", description: "", disk_directory: "2014/03">, #<Attachment id: 74, container_id: 1, container_type: "Project", filename: "MainApp.cs", disk_filename: "140307233750_MainApp.cs", filesize: 1160, content_type: nil, digest: "6b985033e19c5a88bb5ac4e87ba4c4c2", downloads: 0, author_id: 1, created_on: "2014-03-07 17:37:50", description: "", disk_directory: "2014/03">]
I need to extract the values 73 and 74 (the Attachment ids) from this.
Is there any way to extract these values?
Just in case the author meant they have an actual String instance:
string = '[#<Attachment id: 73, container_id: 1, container_type: "Project", filename: "Eumna.zip", disk_filename: "140307233750_Eumna.zip", filesize: 235303, content_type: nil, digest: "9a10843635b9e9ad4241c96b90f4d331", downloads: 0, author_id: 1, created_on: "2014-03-07 17:37:50", description: "", disk_directory: "2014/03">, #<Attachment id: 74, container_id: 1, container_type: "Project", filename: "MainApp.cs", disk_filename: "140307233750_MainApp.cs", filesize: 1160, content_type: nil, digest: "6b985033e19c5a88bb5ac4e87ba4c4c2", downloads: 0, author_id: 1, created_on: "2014-03-07 17:37:50", description: "", disk_directory: "2014/03">]'
string.scan(/\sid: (\d+)/).flatten
=> ["73", "74"]
Do as below using Array#collect:
array.collect(&:id)
In case it is a string, you first need to get the array back. Note that the inspect output shown above is not itself valid JSON; JSON.parse only applies if the data was actually serialized as JSON, and it returns hashes rather than Attachment objects, so index into them by key:
require 'json'
array = JSON.parse(string)
array.collect { |h| h['id'] }
The elements of the array (I'll call it a) look like instances of the class Attachment (not strings). You can confirm that by executing e.class in IRB, where e is any element of a (e.g., a.first). My assumption is correct if it returns Attachment. The following assumes that is the case.
@Arup shows how to retrieve the value of the instance variable @id when it has a (read) accessor:
a.map(&:id)
(aka collect). You can see whether @id has an accessor by executing
e.class.instance_methods(false)
for any element e of a. This returns an array of the instance methods defined on the class Attachment itself. (The argument false excludes inherited methods.) If @id does not have an accessor, you will need to use Object#instance_variable_get:
a.map { |e| e.instance_variable_get(:@id) }
(You could alternatively write the argument as a string: "@id".)
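A runnable sketch of the no-accessor case, using a minimal stand-in Attachment class (not the real one from the question):

```ruby
# A minimal Attachment with an @id instance variable but no reader for it.
class Attachment
  def initialize(id)
    @id = id
  end
end

a = [Attachment.new(73), Attachment.new(74)]

Attachment.instance_methods(false)  # => [] -- no accessor defined
ids = a.map { |e| e.instance_variable_get(:@id) }
ids  # => [73, 74]
```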
If
s = '[#<Attachment id: 73, container_id: 1,..]'
is in fact a string, but you neglected to enclose it in (single) quotes, then you must execute
a = eval(s)
to convert it to an array of Attachment instances before you can extract the values of @id.
Hear that 'click'? That was me starting my stop watch. I want to see how long it will take for a comment to appear that scolds me for suggesting the use of (the much-maligned) eval.
Two suggestions: shorten code to the essentials and avoid the need for readers to scroll horizontally to read it. Here, for example, you could have written this:
a = [#<Attachment id: 73, container_id: 1>, #<Attachment id: 74, container_id: 1>]
All the instance variables I've removed are irrelevant to the question.
If that had been too long to fit on one line (without scrolling horizontally), write it as:
a = [#<Attachment id: 73, container_id: 1>,
#<Attachment id: 74, container_id: 1>]
Lastly, being new to SO, have a look at this guide.
I have a Hash @estate:
[#<Estate id: 1, Name: "Thane ", Address: "Thane St.", created_at: "2013-06-21 16:40:50", updated_at: "2013-06-21 16:40:50", user_id: 2, asset_file_name: "DSC02358.JPG", asset_content_type: "image/jpeg", asset_file_size: 5520613, asset_updated_at: "2013-06-21 16:40:49", Mgmt: "abc">,
#<Estate id: 2, Name: "Mumbai", Address: "Mumbai St.", created_at: "2013-06-21 19:13:59", updated_at: "2013-06-21 19:14:28", user_id: 2, asset_file_name: "DSC02359.JPG", asset_content_type: "image/jpeg", asset_file_size: 5085580, asset_updated_at: "2013-06-21 19:13:57", Mgmt: "abc">]
Is it possible to make a new Hash with values unique by user_id? Currently two elements share the same user_id (2), and I only want it to appear once. What should I do?
It sounds like something of a has_many relation between the User and Estate models, right? If I understood you correctly, then you in fact need to group your Estate records by user_id:
PostgreSQL:
Estate.select('DISTINCT ON (user_id) *').all
MySQL:
Estate.group(:user_id).all
P.S. I would not recommend selecting all records from the database and then processing them with Ruby, as databases handle these operations much more efficiently.
Here is a small sample to get you off to a good start:
h = [ { a: 2, b: 3}, { a: 2, c: 3 } ]
h.uniq { |i| i[:a] }
# => [{:a=>2, :b=>3}]
It looks like this:
nearbys(20, :units => :km).joins(:interests)
  .where(["users.id NOT IN (?)", blocked_ids])
  .where("interests.language_id IN (?)",
         interests.collect { |interest| interest.language_id })
This produces the following SQL:
SELECT
*,
(111.19492664455873 * ABS(latitude - 47.4984056) * 0.7071067811865475) +
(96.29763124613503 * ABS(longitude - 19.0407578) * 0.7071067811865475)
AS distance,
CASE
WHEN (latitude >= 47.4984056 AND longitude >= 19.0407578) THEN 45.0
WHEN (latitude < 47.4984056 AND longitude >= 19.0407578) THEN 135.0
WHEN (latitude < 47.4984056 AND longitude < 19.0407578) THEN 225.0
WHEN (latitude >= 47.4984056 AND longitude < 19.0407578) THEN 315.0
END AS bearing
FROM
"users"
INNER JOIN "interests" ON "interests"."user_id" = "users"."id"
WHERE
(latitude BETWEEN 47.38664309234778 AND 47.610168107652214
AND longitude BETWEEN 18.875333386667762 AND 19.20618221333224
AND users.id != 3)
AND (users.id NOT IN (3))
AND (interests.language_id IN (1,1))
GROUP BY
users.id,users.name,users.created_at,users.updated_at,users.location,
users.details,users.hash_id,users.facebook_id,users.blocked,users.locale,
users.latitude,users.longitude
ORDER BY
(111.19492664455873 * ABS(latitude - 47.4984056) * 0.7071067811865475) +
(96.29763124613503 * ABS(longitude - 19.0407578) * 0.7071067811865475)
The result it returns is correct, except it replaces the id of the user with the id of the interest. What am I missing here?
Thanks for the help!
Edit:
I narrowed the problem down to the geocoder gem.
This works perfectly:
User.where(["users.id NOT IN (?)", blocked_ids]).joins(:interests)
.where("interests.language_id IN (?)", interests
.collect{|interest| interest.language_id})
and returns:
[#<User id: 8,
name: "George Supertramp",
created_at: "2011-08-13 15:51:46",
updated_at: "2011-08-21 16:11:05",
location: "Budapest",
details: "{\"image\":\"http://graph.facebook.com/...",
hash_id: 1908133256,
facebook_id: nil,
blocked: nil,
locale: "de",
latitude: 47.4984056,
longitude: 19.0407578>]
but when I add .near([latitude, longitude], 20, :units => :km) it returns
[#<User id: 5,
name: "George Supertramp",
created_at: "2011-08-13 15:52:53",
updated_at: "2011-08-13 15:52:53",
location: "Budapest",
details: "{\"image\":\"http://graph.facebook.com/...",
hash_id: 1908133256,
facebook_id: nil,
blocked: nil,
locale: "de",
latitude: 47.4984056,
longitude: 19.0407578>]
because it somehow merges with the interest result:
[#<Interest id: 5,
user_id: 8,
language_id: 1,
classification: 1,
created_at: "2011-08-13 15:52:53",
updated_at: "2011-08-13 15:52:53">]
It seems the problem is with the grouping. How can I circumvent it without forking the gem?
I've solved the problem temporarily by using includes instead of joins. It is a crude solution, but it works on small sets of data when aggressively cached.
Here is the code:
User.where(["users.id NOT IN (?)", blocked_ids])
    .includes(:interests)
    .near([latitude, longitude], 20, :units => :km)
    .select do |user|
      ([user.interests.find_by_classification(1).language_id,
        user.interests.find_by_classification(2).language_id] -
       [self.interests.find_by_classification(1).language_id,
        self.interests.find_by_classification(2).language_id]).size < 2
    end
I think your join table has an id field, which is causing the issue.