Loop through a multi-dimensional array in Ruby - ruby

This is the question I'm having trouble with:
"Loop through the multi-dimensional Array and print out the full information of even items in the Array (ie the 2nd and 4th array in your multidimensional array)". I'm tasked with outputting all the data in the even-numbered arrays, which are at indexes [1] and [3], so only the information from the "derrick" and "andrew" arrays should be printed.
kristopher = ["kris", "palos hills", "708-200", "green"]
derrick = ["D-Rock", "New York", "773-933", "green"]
willie = ["William", "Humbolt Park", "773-987", "Black"]
andrew = ["drew", "northside", "773-123","blue"]
friends = [kristopher, derrick, willie, andrew]
friends.each do |arr|
print arr [0..4]
end
#Output
["kris", "palos hills", "708-200", "green"]["D-Rock", "New York", "773-933", "green"]["William", "Humbolt Park", "773-987", "Black"]["drew", "northside", "773-123", "blue"]

Something like this:
kristopher = ["kris", "palos hills", "708-200", "green"]
derrick = ["D-Rock", "New York", "773-933", "green"]
willie = ["William", "Humbolt Park", "773-987", "Black"]
andrew = ["drew", "northside", "773-123","blue"]
friends = [kristopher, derrick, willie, andrew]
(1...friends.length).step(2).each do |friendIndex|
friend = friends[friendIndex]
print friend
end

You can check Enumerable#partition and Enumerator#with_index, which are helpful for splitting an array by a condition on the index of its elements. With Integer#even? you can partition the elements into even and odd positions (hence the + 1, since indexes start at 0).
friends.partition.with_index { |_, i| (i + 1).even? }
#=> [[["D-Rock", "New York", "773-933", "green"], ["drew", "northside", "773-123", "blue"]], [["kris", "palos hills", "708-200", "green"], ["William", "Humbolt Park", "773-987", "Black"]]]
So, for your case, take the first element:
friends.partition.with_index { |_, i| (i + 1).even? }.first
Or you can go straight with Enumerable#select:
friends.select.with_index { |_, i| (i + 1).even? }
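To actually print each selected friend's details, one option is to join the fields of each sub-array. This is a minimal sketch, assuming the friends array from the question; even_positioned is a name introduced here only for illustration:

```ruby
kristopher = ["kris", "palos hills", "708-200", "green"]
derrick    = ["D-Rock", "New York", "773-933", "green"]
willie     = ["William", "Humbolt Park", "773-987", "Black"]
andrew     = ["drew", "northside", "773-123", "blue"]
friends    = [kristopher, derrick, willie, andrew]

# Keep the 2nd, 4th, ... sub-arrays, i.e. those at odd zero-based indexes.
even_positioned = friends.select.with_index { |_, i| i.odd? }

# Print one friend per line, fields separated by ", ".
even_positioned.each { |friend| puts friend.join(", ") }
# D-Rock, New York, 773-933, green
# drew, northside, 773-123, blue
```

Using i.odd? on the zero-based index is equivalent to (i + 1).even? above.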

Related

Detecting Multiple Anagrams

I was wondering if anybody could provide me with some assistance on detecting multiple anagrams inside of one large array.
I know that I can do a basic check using something like:
x = "Red"
y = "der"
x.downcase.split("").sort == y.downcase.split("").sort
However, I need help with something a little more complex than that. What I currently have is a large array containing well over 10,000 words, and I'm looking for the cleanest way to iterate through the array and return all anagrams separated into different lists.
For example, let's pretend the array is:
["Red", "Blue", "uLeB", "der"]
It should return:
[["Red", "der"], ["Blue", "uLeB"]]
They don't have to be returned in an array -- I just need a way of separating them.
Thanks in advance for any help!
Instead of comparing each word to every other word, you can group the entries via group_by, using the same logic:
words = %w(Red Blue uLeB der)
words.group_by { |w| w.downcase.chars.sort }
#=> {
# ["d", "e", "r"] => ["Red", "der"],
# ["b", "e", "l", "u"] => ["Blue", "uLeB"]
# }
I would suggest another approach using Hash
h = Hash.new { |hash, key| hash[key] = [] }
array = ["Red", "Blue", "uLeB", "der"]
array.each { |e| h[e.downcase.split('').sort.join] << e }
h
#=> {"der"=>["Red", "der"], "belu"=>["Blue", "uLeB"]}
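Both approaches build a hash keyed by the sorted letters. To get the nested-array shape asked for in the question, one possibility is to take the hash values and, optionally, drop words that have no anagram partner. A sketch, where anagram_groups is a hypothetical helper name:

```ruby
# Group words that are anagrams of each other, ignoring case.
# Words without any anagram partner are dropped.
def anagram_groups(words)
  words.group_by { |w| w.downcase.chars.sort }
       .values
       .select { |group| group.size > 1 }
end

anagram_groups(%w[Red Blue uLeB der green])
#=> [["Red", "der"], ["Blue", "uLeB"]]
```

Dropping the select step keeps single words as one-element groups, if that is preferred.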

How to efficiently check text against a word list in Ruby?

Given a text of 1,000 words, what is an efficient way to check against a dictionary of 10,000 words? I would like to count the number of non-unique matches.
One idea was to store the dictionary as a hash. But then I would have to check each word against the hash, which would be 1,000 operations. That doesn't seem efficient.
Another idea is Postgres text search. But is it possible to do this check in one query?
Another idea is to store the words in a Memcache or Redis database, but that would require 1,000 queries and be very slow.
So, is there a more efficient solution?
Working in Ruby.
EDIT: Added benchmarks:
Cary's assertion that dict_set is faster is true:
aw.length
=> 250
dw.length
=> 1233
dict_set.length
=> 1223
t = Time.now; 1000.times{ aw & dw }; Time.now - t
=> 0.682465
t = Time.now; 1000.times{ aw.count{ |w| dict_set.include? w }}; Time.now - t
=> 0.063375
So, Set#include? seems quite efficient.
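Note that the two benchmarked expressions do not compute the same thing: Array#& returns the unique common elements, whereas counting with Set#include? counts every occurrence. A small sketch with made-up data (the contents of aw and dw are not shown in the post, so the values below are assumptions):

```ruby
require 'set'

aw = %w[the quick the dog]   # words from the text (sample data)
dw = %w[dog the sloth]       # dictionary words (sample data)
dict_set = dw.to_set

# Array#& removes duplicates, so "the" is counted once.
unique_matches = (aw & dw).size

# count visits every word, so "the" is counted twice.
all_matches = aw.count { |w| dict_set.include?(w) }

unique_matches #=> 2
all_matches    #=> 3
```

For the non-unique count the question asks about, only the second form gives the intended number.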
Suppose:
text = "The quick brown fox and the quick brown bear jumped over the lazy dog"
and
dictionary = ["dog", "lazy", "quick", "sloth", "the"]
Let's first convert dictionary to a set:
require 'set'
dict_set = dictionary.to_set
#=> #<Set: {"dog", "lazy", "quick", "sloth", "the"}>
and convert text to an array of downcased words:
words = text.downcase.split
#=> ["the", "quick", "brown", "fox", "and", "the", "quick",
# "brown", "bear", "jumped", "over", "the", "lazy", "dog"]
Here are a couple of ways of counting the number of words in text that are in dictionary.
#1: Simply count
words.count { |w| dict_set.include?(w) }
#=> 7
#2: group same words and count
words.group_by(&:itself).reduce(0) { |tot,(k,v)|
tot + ((dict_set.include?(k)) ? v.size : 0) }
#=> 7
Object#itself was introduced in v2.2. For earlier versions, replace:
group_by(&:itself)
with
group_by { |w| w }
The steps:
h = words.group_by(&:itself)
#=> {"the" =>["the", "the", "the"],
# "quick"=>["quick", "quick"],
# "brown"=>["brown", "brown"],
# "fox"=>["fox"],
# ...
# "dog"=>["dog"]}
h.reduce(0) { |tot,(k,v)| tot + ((dict_set.include?(k)) ? v.size : 0) }
#=> 7
I would expect #1 to generally be fastest, considering that Set#include? is very fast. That is, I doubt that the time to group same words is less than the savings in dictionary look-ups.
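On Ruby 2.7 or later, Enumerable#tally offers another way to express the grouped count in #2. This is a sketch rather than a recommendation; it has not been benchmarked against the two approaches above:

```ruby
require 'set'

text = "The quick brown fox and the quick brown bear jumped over the lazy dog"
dict_set = ["dog", "lazy", "quick", "sloth", "the"].to_set

# tally maps each distinct word to its number of occurrences,
# so each distinct word is looked up in the set only once.
counts = text.downcase.split.tally
total = counts.sum { |word, n| dict_set.include?(word) ? n : 0 }
total #=> 7
```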

Searching through two multidimensional arrays and grouping together similar subarrays

I am trying to search through two multidimensional arrays to find any elements in common in a given subarray and then put the results in a third array where the entire subarrays with similar elements are grouped together (not just the similar elements).
The data is imported from two CSVs:
require 'csv'
array = CSV.read('primary_csv.csv')
#=> [["account_num", "account_name", "primary_phone", "second_phone", "status"],
#=> ["11111", "John Smith", "8675309", " ", "active"],
#=> ["11112", "Tina F.", "5551234", "5555678" , "disconnected"],
#=> ["11113", "Troy P.", "9874321", " ", "active"]]
# and so on...
second_array = CSV.read('customer_service.csv')
#=> [["date", "name", "agent", "call_length", "phone", "second_phone", "complaint"],
#=> ["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", " ", "rude"],
#=> ["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309" , "says shes not a customer"]]
# and so on...
If any number is present as an element in a subarray of both primary_csv.csv and customer_service.csv, I want that entire subarray (as opposed to just the common elements) put into a third array, results_array. The desired output based upon the above sample is:
results_array = [["11111", "John Smith", "8675309", " ", "active"],
["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309" , "says shes not a customer"]] # and so on...
I then want to export the array into a new CSV, where each subarray is its own row of the CSV. I intend to iterate over each subarray by joining it with a , to make it comma delimited and then put the results into a new CSV:
rows = results_array.map { |j| j.join(",") }
File.open("results.csv", "w") { |f| f.puts rows }
#=> 11111,John Smith,8675309, ,active
#=> 3/2/15,Mrs. Smith,Stew,1:45,9995678,8675309,says shes not a customer
# and so on...
How can I achieve the desired output? I am aware that the final product will look messy because similar data (for example, phone number) will be in different columns. But I need to find a way to generally group the data together.
Suppose a1 and a2 are the two arrays (excluding header rows).
Code
def combine(a1, a2)
  h2 = a2.each_with_index
         .with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
           arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
  a1.each_with_object([]) do |arr, b|
    d = arr.each_with_object([]) do |str, d|
      s = str.strip
      d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
    end
    b << d.uniq.unshift(arr) if d.any?
  end
end
def number?(str)
  str =~ /^\d+$/
end
Example
Here is your example, modified somewhat:
a1 = [
["11111", "John Smith", "8675309", "", "active" ],
["11112", "Tina F.", "5551234", "5555678", "disconnected"],
["11113", "Troy P.", "9874321", "", "active" ]
]
a2 = [
["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", "", "rude"],
["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309", "surly"],
["3/7/15", "Cher", "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]
combine(a1, a2)
#=> [[["11111", "John Smith", "8675309", "",
# "active"],
# ["3/2/15", "Mrs. Smith", "Stew", "1:45",
# "9995678", "8675309", "surly"],
# ["3/7/15", "Cher", "Sonny", "7:45",
# "9874321", "8675309", "Hey Jude"]
# ],
# [["11112", "Tina F.", "5551234", "5555678",
# "disconnected"],
# ["3/1/15", "Mary ?", "Bob X", "5:00",
# "5551234", "", "rude"]
# ],
# [["11113", "Troy P.", "9874321", "",
# "active"],
# ["3/7/15", "Cher", "Sonny", "7:45",
# "9874321", "8675309", "Hey Jude"]
# ]
# ]
Explanation
First, we define a helper:
def number?(str)
str =~ /^\d+$/
end
For example:
number?("8675309") #=> 0 ("truthy")
number?("3/1/15") #=> nil
Now index a2 on the values that represent numbers:
h2 = a2.each_with_index
.with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
#=> {"5551234"=>[0], "9995678"=>[1], "8675309"=>[1, 2], "9874321"=>[2]}
This says, for example, that the "numeric" field "8675309" is contained in elements at offsets 1 and 2 of a2 (i.e., for Mrs. Smith and Cher).
We can now simply run through the elements of a1 looking for matches.
The code:
arr.each_with_object([]) do |str, d|
s = str.strip
d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
end
steps through the elements of arr, assigning each to the block variable str. For example, if arr holds the first element of a1, str will in turn equal "11111", "John Smith", and so on. After s = str.strip, the last line says that if s is numeric and there is a matching key in h2, the (initially empty) array d is concatenated with the elements of a2 at the offsets given by h2[s].
After completing this loop we see if d contains any elements of a2:
b << d.uniq.unshift(arr) if d.any?
If it does, we remove duplicates, prepend the array with arr and save it to b.
Note that this allows one element of a2 to match multiple elements of a1.
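For the export step mentioned in the question, Ruby's CSV library can write the rows directly, which also takes care of quoting fields that contain commas. A sketch, assuming a nested result shaped like the one combine returns; write_groups, the sample rows and the output path are hypothetical:

```ruby
require 'csv'
require 'tmpdir'

# groups: array of groups, each group an array of rows (arrays of strings).
# Writes every row of every group as one CSV line.
def write_groups(path, groups)
  CSV.open(path, "w") do |csv|
    groups.flatten(1).each { |row| csv << row }
  end
end

groups = [
  [["11111", "Smith, John", "8675309", "active"],
   ["3/2/15", "Mrs. Smith", "Stew", "1:45", "8675309", "surly"]]
]

# Written to the system temp directory here; any writable path works.
out_path = File.join(Dir.tmpdir, "results.csv")
write_groups(out_path, groups)
```

Unlike join(","), this keeps a field such as "Smith, John" intact as a single column when the file is read back.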

Ruby - Sort array into sub categories

Let's say that I have an array of words, such as this:
words = ["apple", "zebra", "boat", "dog", "ape", "bingo"]
and I want to sort them alphabetically, but group them like so:
sorted = [["ape", "apple"], ["bingo", "boat"], ["dog"], ["zebra"]]
How would I be able to do this in Ruby? Help appreciated.
I'm assuming that you are trying to group by the first letter of each word. In that case, you can use sort to sort the array and group_by to group by the first character of each word (as returned by chr).
words = ["apple", "zebra", "boat", "dog", "ape", "bingo"]
sorted = words.sort.group_by(&:chr).values
You could do something like this:
words.sort.chunk { |s| s[0] }.map(&:last)
This first sorts the array alphabetically (.sort), then "chunks" together consecutive elements with the same first character (.chunk { |s| s[0] }), and finally grabs the last element from each sub-array (.map(&:last)).
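The intermediate values make the chain easier to follow. A sketch using the words array from the question; the variable names are introduced only for illustration:

```ruby
words = ["apple", "zebra", "boat", "dog", "ape", "bingo"]

sorted = words.sort
#=> ["ape", "apple", "bingo", "boat", "dog", "zebra"]

# chunk yields [key, elements] pairs for consecutive runs with the same key.
pairs = sorted.chunk { |s| s[0] }.to_a
#=> [["a", ["ape", "apple"]], ["b", ["bingo", "boat"]],
#    ["d", ["dog"]], ["z", ["zebra"]]]

# Keep only the elements of each pair, discarding the key.
grouped = pairs.map(&:last)
#=> [["ape", "apple"], ["bingo", "boat"], ["dog"], ["zebra"]]
```

Because chunk only merges consecutive elements, the sort has to come first.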
Something like this should do the trick
words = ["apple", "zebra", "boat", "dog", "ape", "bingo"]
sorted = words.sort.group_by { |s| s[0] }.map { |k,v| v }
You're going to need an array that also represents your categories. In this case, the letters of the alphabet. Below does what you're looking for using nested loops.
words = ["apple", "zebra", "boat", "dog", "ape", "bingo"]
results = []
('a'..'z').to_a.each_with_index do |letter, index|
letter_result = []
words.each do |word|
if(word[0]==letter)
letter_result.push(word)
end
end
unless letter_result.empty?
results.push(letter_result)
end
end

Ruby: Sorting an array of strings, in alphabetical order, that includes some arrays of strings

Say I have:
a = ["apple", "pear", ["grapes", "berries"], "peach"]
and I want to sort by:
a.sort_by do |f|
f.class == Array ? f.to_s : f
end
I get:
[["grapes", "berries"], "apple", "peach", "pear"]
Where I actually want the items in alphabetical order, with array items being sorted on their first element:
["apple", ["grapes", "berries"], "peach", "pear"]
or, preferably, I want:
["apple", "grapes, berries", "peach", "pear"]
If the example isn't clear enough, I'm looking to sort the items in alphabetical order.
Any suggestions on how to get there?
I've tried a few things so far yet can't seem to get it there. Thanks.
I think this is what you want:
a.sort_by { |f| f.class == Array ? f.first : f }
I would do
a = ["apple", "pear", ["grapes", "berries"], "peach"]
a.map { |e| Array(e).join(", ") }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
Array#sort_by clearly is the right method, but here's a reminder of how Array#sort would be used here:
a.sort do |s1,s2|
t1 = (s1.is_a? Array) ? s1.first : s1
t2 = (s2.is_a? Array) ? s2.first : s2
t1 <=> t2
end.map {|e| (e.is_a? Array) ? e.join(', ') : e }
#=> ["apple", "grapes, berries", "peach", "pear"]
@theTinMan pointed out that sort is quite a bit slower than sort_by here, and gave a reference that explains why. I've been meaning to see how the Benchmark module is used, so took the opportunity to compare the two methods for the problem at hand. I used @Rafa's solution for sort_by and mine for sort.
For testing, I constructed an array of 100 random samples (each with 10,000 random elements to be sorted) in advance, so the benchmarks would not include the time needed to construct the samples (which was not insignificant). 8,000 of the 10,000 elements were random strings of 8 lowercase letters. The other 2,000 elements were 2-tuples of the form [str1, str2], where str1 and str2 were each random strings of 8 lowercase letters. I benchmarked with other parameters, but the bottom-line results did not vary significantly.
require 'benchmark'
# n: total number of items to sort
# m: number of two-tuples [str1, str2] among n items to sort
# n-m: number of strings among n items to sort
# k: length of each string in samples
# s: number of sorts to perform when benchmarking
def make_samples(n, m, k, s)
s.times.with_object([]) { |_, a| a << test_array(n,m,k) }
end
def test_array(n,m,k)
a = ('a'..'z').to_a
r = []
(n-m).times { r << a.sample(k).join }
m.times { r << [a.sample(k).join, a.sample(k).join] }
r.shuffle!
end
# Here's what the samples look like:
make_samples(6,2,4,4)
#=> [["bloj", "izlh", "tebz", ["lfzx", "rxko"], ["ljnv", "tpze"], "ryel"],
# ["jyoh", "ixmt", "opnv", "qdtk", ["jsve", "itjw"], ["pnog", "fkdr"]],
# ["sxme", ["emqo", "cawq"], "kbsl", "xgwk", "kanj", ["cylb", "kgpx"]],
# [["rdah", "ohgq"], "bnup", ["ytlr", "czmo"], "yxqa", "yrmh", "mzin"]]
n = 10000 # total number of items to sort
m = 2000 # number of two-tuples [str1, str2] (n-m strings)
k = 8 # length of each string
s = 100 # number of sorts to perform
samples = make_samples(n,m,k,s)
Benchmark.bm('sort_by'.size) do |bm|
bm.report 'sort_by' do
samples.each do |s|
s.sort_by { |f| f.class == Array ? f.first : f }
end
end
bm.report 'sort' do
samples.each do |s|
s.sort do |s1,s2|
t1 = (s1.is_a? Array) ? s1.first : s1
t2 = (s2.is_a? Array) ? s2.first : s2
t1 <=> t2
end
end
end
end
              user     system      total        real
sort_by   1.360000   0.000000   1.360000 (  1.364781)
sort      4.050000   0.010000   4.060000 (  4.057673)
Though it was never in doubt, @theTinMan was right! I did a few other runs with different parameters, but sort_by consistently thumped sort by similar performance ratios.
Note the "system" time is zero for sort_by. In other runs it was sometimes zero for sort. The values were always zero or 0.010000, leading me to wonder what's going on there. (I ran these on a Mac.)
For readers unfamiliar with Benchmark, Benchmark.bm takes an argument giving the label width, i.e. the amount of left-padding for the header row (user system...). bm.report takes a row label as an argument.
You are really close. Just switch .to_s to .first.
irb(main):005:0> b = ["grapes", "berries"]
=> ["grapes", "berries"]
irb(main):006:0> b.to_s
=> "[\"grapes\", \"berries\"]"
irb(main):007:0> b.first
=> "grapes"
Here is one that works:
a.sort_by do |f|
f.class == Array ? f.first : f
end
Yields:
["apple", ["grapes", "berries"], "peach", "pear"]
a.map { |b| b.is_a?(Array) ? b.join(', ') : b }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
Replace to_s with join.
a.sort_by do |f|
f.class == Array ? f.join : f
end
# => ["apple", ["grapes", "berries"], "peach", "pear"]
Or more concisely:
a.sort_by {|x| [*x].join }
# => ["apple", ["grapes", "berries"], "peach", "pear"]
The problem with to_s is that it converts your Array to a string that starts with "[":
"[\"grapes\", \"berries\"]"
which comes alphabetically before the rest of your strings.
join actually creates the string that you had expected to sort by:
"grapesberries"
which is alphabetized correctly, according to your logic.
If you don't want the arrays to remain arrays, then it's a slightly different operation, but you will still use join.
a.map {|x| [*x].join(", ") }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
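The effect of each sort key can be checked directly. A sketch using the array from the question; the keys_* names are introduced only for illustration:

```ruby
a = ["apple", "pear", ["grapes", "berries"], "peach"]

# Sort keys produced by to_s: the array's key starts with "[".
keys_to_s = a.map { |f| f.class == Array ? f.to_s : f }
#=> ["apple", "pear", "[\"grapes\", \"berries\"]", "peach"]

# "[" (ASCII 91) sorts before any lowercase letter ("a" is ASCII 97).
bracket_first = "[" < "a"
#=> true

# Sort keys produced by join: plain concatenated text.
keys_join = a.map { |x| [*x].join }
#=> ["apple", "pear", "grapesberries", "peach"]
```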
Sort a Flattened Array
If you just want all elements of your nested array flattened and then sorted in alphabetical order, all you need to do is flatten and sort. For example:
["apple", "pear", ["grapes", "berries"], "peach"].flatten.sort
#=> ["apple", "berries", "grapes", "peach", "pear"]
