Which is the most used name? - ruby

I am working on a Ruby on Rails site and I want to check its database to find the most frequent name among the registered users.
There is a column called "First Name" that I will go through. I don't mind about case sensitivity right now.
Is there a convenient way to find, for example, the most popular name, then the second most popular, the third most popular and so on?
What I thought of is to get all users in an array and then do @users.each do |user|, record the names in an array, and after that count the duplicates of each name that was recorded more than once. I am not sure if it's the proper way though.

Here is how you can do it using ActiveRecord:
User.group(:first_name).order('popularity desc').pluck(:first_name, 'count(*) as popularity')
This code translates to the SQL:
SELECT "users"."first_name", count(*) as popularity FROM "users"
GROUP BY first_name
ORDER BY popularity desc
and you get something like:
[["John", 2345], ["James", 1986], ["Sam", 1835], ...]
If you want only the top ten names, you can limit the number of results simply by adding limit:
User.group(:first_name).order('popularity desc').limit(10).pluck(:first_name, 'count(*) as popularity')
Another option is to use the count API:
User.group(:first_name).count
=> {"Sam" => 1835, "Stefanos" => 2, ...}
# ordered
User.group(:first_name).order('count_all desc').count
=> {"John" => 2345, "James" => 1986, "Sam" => 1835, ...}
# top 3
User.group(:first_name).order('count_all desc').limit(3).count
=> {"John" => 2345, "James" => 1986, "Sam" => 1835 }

You could use the following SQL statement:
select first_name, count(*) as count from users group by users.first_name order by count desc
This returns every name with its count, most common first. As Boris said, using just SQL is the right way to go here.
Otherwise, if you want to load all the users, you could do it map-reduce style in Ruby:
@users.group_by(&:first_name).sort_by { |_name, users| -users.size }
This gives you an array of [name, users] pairs sorted in descending order by the number of users sharing each name.
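The in-memory approach can be sketched in plain Ruby as follows. This is only an illustration: the User struct stands in for the real model, name_ranking is a hypothetical helper, and normalizing with capitalize is one assumed way to ignore case.

```ruby
# Hypothetical stand-in for the ActiveRecord model; only first_name matters here.
User = Struct.new(:first_name)

# Returns [[name, count], ...] ordered from most to least common.
# Names are compared case-insensitively by normalizing them with capitalize.
def name_ranking(users)
  users
    .map { |user| user.first_name.capitalize }
    .tally                                      # name => count (needs Ruby 2.7+)
    .sort_by { |name, count| [-count, name] }   # ties are broken alphabetically
end

users = %w[John sam JOHN James john Sam].map { |name| User.new(name) }

# Print the names with their rank: most popular, second most popular, ...
name_ranking(users).each.with_index(1) do |(name, count), rank|
  puts "#{rank}. #{name} (#{count})"
end
```

For a large users table the SQL versions above are likely the better choice, since this sketch loads every record into memory.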

Another way using ActiveRecord:
User.group(:first_name).count
Generated SQL is:
SELECT COUNT(*) AS count_all, first_name AS first_name FROM `users` GROUP BY first_name
This outputs a hash of { name => number_of_occurrences }, e.g.
{"John" => 29, "Peter" => 87, "Sarah" => 2}

Related

Identifying duplicates in specific CSV output

Ruby newbie here. I've got a product csv where first col is a unique SKU and second col is a product ID that can be duplicated across multiple products (+ many other cols but these are the pertinent ones). Like:
SKU | Prod ID
99 | 10384
100 | 10385
101 | 10385
102 | 10386
103 | 10386
104 | 10387
In the script I'm writing, the first time a product ID is used will become a 'parent', and any subsequent instances of the product ID get treated differently (ie, different sizes).
Currently I am reading in the whole CSV rather than using foreach to go line by line, as I assumed I'd need all the data available to find the duplicates.
The issue is that I'm not sure how to identify the first time a product ID is used and then identify any further instances of its use.
My first thought was to somehow identify the duplicates (uniq?) and then create a new column and put a 1 if it's the first time it's occurred and 0 if it's occurred previously. After looking at uniq I'm not sure how I then go back to the main list and mark my 1's and 0's.
Can someone please point me in the direction of the classes/methods I need to be looking at?
Thanks,
Liam
Edit for John D: This gives me the hashes, but in a 1:1 format rather than one key mapped to all instances of the prod ID
CSV.foreach(INPUT, :headers => true , :header_converters => :symbol, :col_sep => "|", :quote_char => "\x00") do |csv_obj|
  items[csv_obj.fields[0]] = [csv_obj.fields[1]]
end
so it gives:
"230709"=>["88507"], "109064"=>["9019"]
You're thinking of the Sku as the unique identifier, which it may in fact be. But if you turn that on its head and think of the ProductID as the unique identifier, then you can build a Hash where the key is the ProductID and the value is an Array of Skus. Then you'll be able to track which Skus are associated with which ProductID.
Of course you'll read this in some other way, but the end result would be similar to:
products =
{
10384 => [99],
10385 => [100, 101],
10386 => [102, 103],
10387 => [104]
}
Here's an example of how to construct this Hash:
#!/usr/bin/env ruby
require 'csv'
# Sample data standing in for the real file: SKU|product ID
source = [
  "99|10384",
  "100|10385",
  "101|10385",
  "102|10386",
  "103|10386",
  "104|10387"
].join("\n")
source = CSV.parse(source, :col_sep => "|")
# Build a Hash of product ID => Array of SKUs
hh = source.inject({}) do |memo, row|
  sku = row[0]
  prod = row[1]
  memo[prod] = [] unless memo.include?(prod)
  memo[prod] << sku
  memo
end
puts hh
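The 1/0 flag from the question can also be produced in a single pass, without finding the duplicates first. The following is a minimal sketch under a few assumptions: flag_parents is a hypothetical helper, rows arrive as [sku, product_id] pairs, and parsing is simplified to String#split to keep the example dependency-free.

```ruby
require 'set'

# Takes rows of [sku, product_id] and appends 1 to the first row seen with a
# given product ID (the "parent") and 0 to every later row with that ID.
def flag_parents(rows)
  seen = Set.new
  rows.map do |sku, prod_id|
    # Set#add? returns nil when the value is already in the set.
    first_time = seen.add?(prod_id)
    [sku, prod_id, first_time ? 1 : 0]
  end
end

# Sample data from the question, split on the pipe instead of using CSV.
lines = ["99|10384", "100|10385", "101|10385", "102|10386", "103|10386", "104|10387"]
rows  = lines.map { |line| line.split("|") }

flag_parents(rows).each { |row| puts row.join("|") }
```

Because only the set of product IDs seen so far is kept, the same idea should also work row by row inside CSV.foreach, so the whole file does not have to be read in up front.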
.group_by() is relatively new (though it has an older counterpart in Rails), but is awfully convenient and should do most of your heavy lifting.
If you create a class to hold each row and put them in an Array, then you can call the group_by method with a block that just checks each object's Product ID field.
That gives you a Hash, which you can iterate through with .each.
Assuming a whole bunch of things about your program that are hopefully semi-obvious, something like:
transactionHash = transactions.group_by { |x| x.productId }
Then, you can go through your transaction lists per product with:
transactionHash.each do |prodId,transList|
# transList has all of your transaction objects per product
end
Again, that assumes you're keeping your transactions in a list of objects. The x.productId would be something like x[1] if you store each transaction in an array, for example.
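A self-contained version of the group_by idea might look like the sketch below. The Transaction struct, its field names, and the group_by_product helper are all assumptions made for illustration; the sample values come from the table in the question.

```ruby
# Hypothetical row class; the field names are assumptions.
Transaction = Struct.new(:sku, :product_id)

# Groups the rows into a Hash of product_id => Array of rows.
# group_by keeps the rows of each group in their original order.
def group_by_product(transactions)
  transactions.group_by { |t| t.product_id }
end

transactions = [
  Transaction.new(99, 10384),
  Transaction.new(100, 10385),
  Transaction.new(101, 10385),
  Transaction.new(102, 10386),
  Transaction.new(103, 10386),
  Transaction.new(104, 10387)
]

group_by_product(transactions).each do |prod_id, trans_list|
  # The first row of each group is the "parent"; the rest are its variants.
  parent, *children = trans_list
  puts "#{prod_id}: parent #{parent.sku}, children #{children.map(&:sku).inspect}"
end
```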

Select value via index using watir and ruby

I have such code:
total_terms = @driver.select_list(:name => 'ctl00$cp$cbRodzajUslugi').length
if (1...5).include?(total_terms)
  @driver.select_list(:name => 'ctl00$cp$cbRodzajUslugi').option(:index, total_terms).select
else
  @driver.select_list(:name => 'ctl00$cp$cbRodzajUslugi').option(:index, (total_terms-2)).select
end
and I am trying to select some value via index. First, I calculate how long my select_list is, and then I select. But in the browser, I see that nothing is selected. What did I do wrong?
Your code is probably throwing exceptions.
Select lists do not have a method length
The line
@driver.select_list(:name => 'ctl00$cp$cbRodzajUslugi').length
is not valid since select lists do not have a method length. Assuming you want the number of options, you need to add the options method to get a collection of the options in the select list:
@driver.select_list(:name => 'ctl00$cp$cbRodzajUslugi').options.length
With fewer than 5 options, a non-existent option is selected
The lines
if (1...5).include?(total_terms)
  @driver.select_list(:name => 'ctl00$cp$cbRodzajUslugi').option(:index, total_terms).select
will throw an exception due to there being nothing at the specified index. The :index locator is 0-based, i.e. 0 means the first option, 1 means the second option, etc. This means that when there are two options, you will try to select :index => 2, which does not exist. You need to subtract 1:
if (1...5).include?(total_terms)
  @driver.select_list(:name => 'ctl00$cp$cbRodzajUslugi').option(:index, total_terms-1).select
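The index arithmetic can be checked on its own, without a browser. The sketch below uses a hypothetical helper, index_to_select, that mirrors the corrected branch logic; it assumes the select list has at least one option.

```ruby
# Hypothetical helper: given the number of options in the select list,
# return the 0-based index that the corrected code would select.
def index_to_select(option_count)
  if (1...5).include?(option_count)  # 1 to 4 options: pick the last one
    option_count - 1
  else                               # otherwise keep the original offset of 2
    option_count - 2
  end
end

# Show the chosen index for a few list sizes.
[1, 2, 4, 5, 10].each do |count|
  puts "#{count} options -> index #{index_to_select(count)}"
end
```

The result would then be passed to the existing call, e.g. option(:index, index_to_select(total_terms)).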

Linq - group by a name, but still get the id

I'm trying to find duplicates in linq by a particular column (the name column), but I also wish to return the unique id, as I wish to bind to the ID to display additional information about the row.
I've dug around on stackoverflow, but can only find ways of finding duplicates in the fashion of:
By the whole object
By a particular property
Getting the number of duplicates
The closest thing I could find was by specifying "Key" in my group by, but I'm unsure if that is working.
Ideally I'm hoping to output something that has the ID, Number of Duplicates.
Thanks
Assuming you have a people collection:
from p in people
group p by p.Name into g
select new {
    Name = g.Key,
    NumberOfDuplicates = g.Count(),
    IDs = g.Select(x => x.ID)
}

Using Ruby to tag records that contain repeat phrases in a table

I'm trying to use Ruby to 'tag' records in a CSV table, based on whether or not a particular field contains a certain phrase that is repeated. I'm not sure if there are libraries to assist with this kind of job, and I recognize that Ruby might not be the most efficient language to do this sort of thing.
My CSV table contains a unique ID and a text field that I want to search:
ID,NOTES
1,MISSING DOB; ID CANNOT BE BLANK
2,INVALID MEMBER ID - unable to verify
3,needs follow-up
4,ID CANNOT BE BLANK-- additional info needed
From this CSV table, I've extracted keywords and assigned them a tag, which I've stored in another CSV table.
PHRASE,TAG
MISSING DOB,BLANKDOB
ID CANNOT BE BLANK,BLANKID
INVALID MEMBER ID,INVALIDID
Note that the NOTES column in my source contains punctuation and other phrases in addition to the phrases I have identified and want to map. Additionally, not all records have phrases that will match.
I want to create a table that looks something like this:
ID, TAG
1, BLANKDOB
1, BLANKID
2, INVALIDID
4, BLANKID
Or, alternately with the tags delimited with another character:
ID, TAG
1, BLANKDOB; BLANKID
2, INVALIDID
4, BLANKID
I have loaded the mapping table into a hash, with the phrase as the key.
phrase_hash = {}
CSV.foreach("phrase_lookup.csv") do |row|
  phrase, tag = row
  next if phrase == "PHRASE" # skip the header row
  phrase_hash[phrase] = tag
end
The keys of the hash are then the search phrases that I want to iterate through. I'm having trouble expressing what I want to do next in Ruby, but here's the idea:
Load the NOTES table into an array. For each phrase (i.e. key), select the records from the array that contain the phrase, gather the IDs associated with these rows, and output them with the associated tag for that phrase, as above.
Can anyone help?
I'll give you an example using hash inputs instead of CSV:
notes = { 1 => "MISSING DOB; ID CANNOT BE BLANK",
          2 => "INVALID MEMBER ID - unable to verify",
          3 => "needs follow-up",
          4 => "ID CANNOT BE BLANK-- additional info needed"
}
tags = { "MISSING DOB" => "BLANKDOB",
         "ID CANNOT BE BLANK" => "BLANKID",
         "INVALID MEMBER ID" => "INVALIDID"
}
output = {}
tags.each_pair do |tags_key, tags_value|
  notes.each_pair do |notes_key, notes_value|
    # include? does a literal substring check, so punctuation in a phrase is not treated as a regex
    if notes_value.include?(tags_key)
      output[notes_key] ||= []
      output[notes_key] << tags_value
    end
  end
end
puts output.map {|k,v| "#{k}, #{v.join("; ")}"}.sort
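The first output format from the question, with one row per ID and tag, could be produced along the following lines. This is a sketch only: tag_rows is a hypothetical helper, and it assumes the same notes and tags hashes as in the example above.

```ruby
# Returns one [id, tag] pair for every phrase found in a note, ordered by ID.
def tag_rows(notes, tags)
  notes.sort.flat_map do |id, text|
    tags.select { |phrase, _tag| text.include?(phrase) }
        .map { |_phrase, tag| [id, tag] }
  end
end

notes = { 1 => "MISSING DOB; ID CANNOT BE BLANK",
          2 => "INVALID MEMBER ID - unable to verify",
          3 => "needs follow-up",
          4 => "ID CANNOT BE BLANK-- additional info needed" }
tags = { "MISSING DOB" => "BLANKDOB",
         "ID CANNOT BE BLANK" => "BLANKID",
         "INVALID MEMBER ID" => "INVALIDID" }

puts "ID, TAG"
tag_rows(notes, tags).each { |id, tag| puts "#{id}, #{tag}" }
```

Notes that match no phrase, such as ID 3, simply contribute no rows.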

ActiveRecord count of distinct days from created_at?

Is it possible to retrieve a count of distinct records based on a field value if the field needs to be interrogated (ideally, using ActiveRecord alone)?
For example, the following returns a count of unique records based on the 'created_at' field:
Record.count('created_at', :distinct => true)
However, is it possible to get a count of, say, unique days based on the 'created_at' field in a similar way?
A naive ActiveRecord example to explain my intent being:
Record.count('created_at'.day, :distinct => true)
(I know the string 'created_at' isn't a 'Time', but that's the sort of query I'd like to ask ActiveRecord.)
You need to group the records. For example
Record.group('DATE(created_at)').count('created_at')
tells you the number of rows created on each particular date, or
Record.group('DAYOFWEEK(created_at)').count('created_at')
would tell you the number of rows created on individual days of the week.
Beware that with the usual Active Record setup this will do the date calculations in UTC. If you want to do your calculations in a specific time zone, you'll have to add that conversion to the group statement.
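The time zone caveat can be illustrated in plain Ruby, outside the database. In this sketch distinct_days is a hypothetical helper and the timestamps are made-up sample values; it only shows how the same rows can fall on a different number of calendar days depending on the offset used.

```ruby
# Counts the distinct calendar days covered by the timestamps, as seen from
# the given UTC offset (for example "+00:00" or "+02:00").
def distinct_days(timestamps, utc_offset = "+00:00")
  timestamps.map { |t| t.getlocal(utc_offset).strftime("%Y-%m-%d") }.uniq.size
end

# Made-up created_at values, all on 1 January when viewed in UTC.
created_at = [
  Time.utc(2024, 1, 1, 10, 0),
  Time.utc(2024, 1, 1, 12, 0),
  Time.utc(2024, 1, 1, 23, 30)  # already 2 January at +02:00
]

puts distinct_days(created_at)            # days as seen in UTC
puts distinct_days(created_at, "+02:00")  # days as seen two hours ahead of UTC
```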
