extracting from 2 dimensional array and creating a hash with array values - ruby

I have a 2 dimensional array
v = [ ["ab","12"], ["ab","31"], ["gh","54"] ]
The first elements of the subarrays of v can repeat, as "ab" does here. I want to create a hash whose keys are the first elements of the subarrays and whose values are arrays of the corresponding second elements from v.
Please advise.
Further, I want h = {"ab"=>["12","31"], "gh"=>["54"]}, and then I want to return h.values, so that the array [["12","31"],["54"]] is returned.

v.inject(Hash.new{|h,k|h[k]=[]}) { |h, (k, v)| h[k] << v ; h}
What it does:
inject (also called reduce) is a fold. Wikipedia defines folds like this: "a family of higher-order functions that analyze a recursive data structure and recombine through use of a given combining operation the results of recursively processing its constituent parts, building up a return value".
The block form of Hash.new takes two arguments, the hash itself and the key. If your default argument is a mutable object, you have to set the default this way, otherwise all keys will point to the same array instance.
In inject's block, we get two arguments, the hash and the current value of the iteration. Since this is a two element array, (k, v) is used to destructure the latter into two variables.
Finally we add each value to the array for its key and return the entire hash for the next iteration.
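The point about mutable defaults can be seen in a small sketch. The names `pairs`, `shared` and `separate` are illustrative only and do not come from the answer above.

```ruby
pairs = [["ab", "12"], ["ab", "31"], ["gh", "54"]]

# Hash.new([]) hands out the SAME array object for every missing key
# and never stores it in the hash.
shared = Hash.new([])
pairs.each { |key, value| shared[key] << value }
shared        #=> {}  (nothing was ever assigned)
shared["zz"]  #=> ["12", "31", "54"]  (the one shared default array)

# The block form creates and stores a fresh array per key.
separate = Hash.new { |hash, key| hash[key] = [] }
pairs.each { |key, value| separate[key] << value }
separate         #=> {"ab"=>["12", "31"], "gh"=>["54"]}
separate.values  #=> [["12", "31"], ["54"]]
```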

v.inject({}) do |res, ar|
  res[ar.first] ||= []
  res[ar.first] << ar.last
  res
end

v = [ ["ab","12"], ["ab","31"], ["gh","54"] ]
This gets you a hash, where the keys are the
unique first elements from the sub arrays.
h = v.inject({}) { |c,i| (c[i.first] ||= []) << i.last; c }
This turns that hash back into an array, just in case you need the array of arrays format.
arr = h.collect { |k,v| [k,v] }
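If only the grouped values are needed, as the question asks, one possible alternative is Enumerable#group_by followed by Hash#transform_values. This is a sketch rather than what the answers above use, and it assumes Ruby 2.4 or later for transform_values.

```ruby
v = [["ab", "12"], ["ab", "31"], ["gh", "54"]]

# Group the pairs by their first element, then keep only the second elements.
h = v.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
h         #=> {"ab"=>["12", "31"], "gh"=>["54"]}
h.values  #=> [["12", "31"], ["54"]]
```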

Related

Why do I get a 'typeerror' when using inject in Ruby?

I am using this inject method to make a running total of values into an array. I am trying to figure out why I am getting an error.
def running_totals(myarray)
results = []
myarray.inject([]) do |sum,n|
results << sum + n
end
results
end
p running_totals([1,2,3,4,5])
I am getting the error
in `+': no implicit conversion of Fixnum into Array (TypeError)
When breaking this method down, isn't this the same as adding two integers and adding that into an array? I'm a bit confused here. Thanks for the help.
In the first iteration sum will be an array (you passed [] as the initial value when calling inject([])), and in the statement results << sum + n you try to add a number to it.
Instead, set the initial value to 0, add n to sum, push the result onto the array, and make sure sum is the last expression in the block so that it is passed into the next iteration of inject.
def running_totals(myarray)
results = []
myarray.inject(0) do |sum,n| # First iteration sum will be 0.
sum += n # Add value to sum.
results << sum # Push to the result array.
sum # Make sure sum is passed to next iteration.
end
results
end
p running_totals([1,2,3,4,5]) #=> [1, 3, 6, 10, 15]
The result of results << sum + n is the array results, and that array is what replaces sum. So on the next iteration you are trying to add a Fixnum n to an array sum. It also doesn't help that you're initializing sum to an array.
Make sure that the last executed statement in your inject block is what you want the accumulated value to be.
def running_totals(myarray)
results = []
results << myarray.inject do |sum, n|
results << sum
sum + n
end
results
end
p running_totals([1,2,3,4,5])
=> [1, 3, 6, 10, 15]
Note that I moved the result of the inject into results array as well, so that the final value is also included, otherwise you'd only have the four values and would be missing the final (15) value.
The return value of the inject block is passed as the first argument the next time the block is called, so those have to match. In your code, you're passing an array as an initial value, and then returning an array; so far, so good. But inside the code block you treat that array parameter (sum) as a number, which won't work. Try this:
def running_totals(myarray)
myarray.inject([]) do |results,n|
results << n + (results.last || 0)
end
end
The [] passed as an argument to inject becomes the first value of results; the first array element (1 in your example) becomes the first value of n. Since results is empty, results.last is nil and the result of (results.last || 0) is 0, which we add to n to get 1, which we push onto results and then return that newly-modified array value from the block.
The second time into the block, results is the array we just returned from the first pass, [1], and n is 2. This time results.last is 1 instead of nil, so we add 1 to 2 to get 3 and push that onto the array, returning [1,3].
The third time into the block, results is [1,3], and n is 3, so it returns [1,3,6]. And so on.
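The walk-through above can be reproduced with a short sketch that records what the block sees on each pass. The `trace` array is a hypothetical helper added here for illustration only.

```ruby
def running_totals(myarray)
  myarray.inject([]) do |results, n|
    results << n + (results.last || 0)
  end
end

trace = []  # hypothetical: stores [results before the push, n] for each pass
[1, 2, 3].inject([]) do |results, n|
  trace << [results.dup, n]
  results << n + (results.last || 0)
end

trace                            #=> [[[], 1], [[1], 2], [[1, 3], 3]]
running_totals([1, 2, 3, 4, 5])  #=> [1, 3, 6, 10, 15]
```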
According to ri, you have to return the result of the computation from inject's block.
From: enum.c (C Method):
Owner: Enumerable
Visibility: public
Signature: inject(*arg1)
Number of lines: 31
Combines all elements of enum by applying a binary
operation, specified by a block or a symbol that names a
method or operator.
If you specify a block, then for each element in enum
the block is passed an accumulator value (memo) and the element.
If you specify a symbol instead, then each element in the collection
will be passed to the named method of memo.
In either case, the result becomes the new value for memo.
At the end of the iteration, the final value of memo is the
return value for the method.
If you do not explicitly specify an initial value for memo,
then the first element of collection is used as the initial value
of memo.
Examples:
# Sum some numbers
(5..10).reduce(:+) #=> 45
# Same using a block and inject
(5..10).inject {|sum, n| sum + n } #=> 45
# Multiply some numbers
(5..10).reduce(1, :*) #=> 151200
# Same using a block
(5..10).inject(1) {|product, n| product * n } #=> 151200
# find the longest word
longest = %w{ cat sheep bear }.inject do |memo,word|
memo.length > word.length ? memo : word
end
longest #=> "sheep"
So your sample would work if you return computation result for each iteration, something like this:
def running_totals(myarray)
results = []
myarray.inject(0) do |sum,n| # start at 0 so the first element is recorded too
results << sum + n
results.last # return the running total so it becomes the next sum
end
results
end
Hope it helps.

Finding the indexes of specific strings in an array, using a differently ordered equivalent array, ruby

I have two arrays: fasta_ids & frags_by_density. Both contain the same set of ≈1300 strings.
fasta_ids is ordered numerically e.g. ['frag1', 'frag2', 'frag3'...]
frags_by_density contains the same strings ordered differently e.g. ['frag14', 'frag1000'...]
The way in which frags_by_density is ordered is irrelevant to the question (but for any bioinformaticians, the 'frags' are contigs ordered by SNP density).
What I want to do is find the indexes in the frags_by_density array that contain each of the strings in fasta_ids. I want to end up with a new array of those positions (indexes), which will be in the same order as the fasta_ids array.
For example, if the order of the 'frag' strings was identical in both the fasta_ids and frags_by_density arrays, the output array would be: [0, 1, 2, 3...].
In this example, the value at index 2 of the output array (2), corresponds to the value at index 2 of fasta_ids ('frag3') - so I can deduce from this that the 'frag3' string is at index 2 in frags_by_density.
Below is the code I have come up with. At the moment it gets stuck in what I think is an infinite loop. I have annotated what each part should do:
x = 0 #the value of x will represent the position (index) in the density array
position_each_frag_id_in_d = [] #want to get positions of the values in frag_ids in frags_by_density
iteration = []
fasta_ids.each do |i|
if frags_by_density[x] == i
position_each_frag_id_in_d << x #if the value at position x matches the value at i, add it to the new array
iteration << i
else
until frags_by_density[x] == i #otherwise increment x until they do match, and add the position
x +=1
end
position_each_frag_id_in_d << x
iteration << i
end
x = iteration.length # x should be incremented, however I cannot simply do: x += 1, as x may have been incremented by the until loop
end
puts position_each_frag_id_in_d
This was quite a complex question to put into words. Hopefully there is a much easier solution, or at least someone can modify what I have started.
Update: I renamed the array to fasta_ids, as it is in the code (sorry for any confusion). fasta_ids was previously called frag_ids.
A non-optimized version: array.index(x) returns the index of x in array, or nil if it is not found. compact then removes nil elements from the array.
position_of_frag_id_in_d = fasta_ids.map { |x| frags_by_density.index(x) }.compact
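Because index scans the array on every call, a possible optimization is to build a lookup hash once and then read each position from it. The sketch below uses the array names from the question with made-up sample data. `position_of` is a hypothetical name, and Array#to_h on an enumerator of pairs assumes Ruby 2.1 or later.

```ruby
# Illustrative sample data, not from the question.
fasta_ids        = %w[frag1 frag2 frag3 frag4]
frags_by_density = %w[frag3 frag1 frag4 frag2]

# One pass to build: string => its index in frags_by_density.
position_of = frags_by_density.each_with_index.to_h

# One hash lookup per id; the result follows the order of fasta_ids.
positions = fasta_ids.map { |id| position_of[id] }
positions  #=> [1, 3, 0, 2]
```

Here positions[2] is 0, meaning 'frag3' (index 2 of fasta_ids) sits at index 0 of frags_by_density.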

Replace near working array code with hash code in ruby to find mode

I was attempting to find a mode without using a hash, but I no longer know if it's possible, so I am wondering if someone can help me translate my nearly working array code into hash-based code to make it work.
I have seen a shorter solution, which I will post, but I do not quite follow it. I'm hoping this translation will help me understand hashes better.
Here is my code, with my comments. I have bolded the part that I know will not work, as I'm comparing a frequency value to the value of an element itself.
@new = [0]
def mode(arr)
  arr.each do |x| #do each element in the array
    freq = arr.count(x) #set freq equal to the result of the count of each element
    if freq > @new[0] && @new.include?(x) == false #if **frequency of element is greater than the frequency of the first element in @new array** and is not already there
      @new.unshift(x) #send that element to the front of the array
      @new.pop #and get rid of the element at the end (which was the former most frequent element)
    elsif freq == @new[0] && @new.include?(x) == false #else **if frequency of element is equal to the frequency of the first element in @new array** and is not already there
      @new << x #send that element to @new array
    end
  end
  if @new.length > 1 #if @new array has multiple elements
    @new.inject(:+)/@new.length.to_f #find the average of the elements
  end
  @new #return the final value
end
mode([2,2,6,9,9,15,15,15])
mode([2,2,2,3,3,3,4,5])
Now I have read this post:
Ruby: How to find item in array which has the most occurrences?
And looked at this code
arr = [1, 1, 1, 2, 3]
freq = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
arr.sort_by { |v| freq[v] }.last
But I don't quite understand it.
What I'd like my code to do is, as it finds the most frequent element,
to store that element as a key, and its frequency as its value.
And then I'd like to compare the next element's frequency to the frequency of the existing pair,
and if it is equal to the most frequent, store it as well,
if it is greater, replace the existing,
and if it is less, disregard it and move to the next element.
Then of course, I'd like to return the element with the highest frequency, not the frequency itself,
and if two or more elements share the highest frequency, then to find the average of those elements.
I'd love to see it with some hint of my array attempt, and maybe an explanation of that hash method that I posted above, or one that is broken down a little more simply.
This seems to fit your requirements:
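def mode(array)
  # Count how often each element occurs.
  histogram = array.each_with_object(Hash.new(0)) do |element, counts|
    counts[element] += 1
  end
  # Compute the highest frequency once, then keep only the elements that reach it.
  max_frequency = histogram.values.max
  most_frequent = histogram.select do |_element, frequency|
    frequency == max_frequency
  end
  # Average the remaining keys.
  most_frequent.keys.reduce(:+) / most_frequent.size.to_f
end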
It creates a hash of frequencies histogram, where the keys are the elements of the input array and the values are the frequency of that element in the array. Then, it removes all but the most frequent elements. Finally, it averages the remaining keys.
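The shorter snippet quoted in the question can be broken down step by step. This is only an annotated sketch of that snippet. The last line, using max_by, is an alternative added here and is not from the linked post.

```ruby
arr = [1, 1, 1, 2, 3]

# Hash.new(0) returns 0 for a missing key, so h[v] += 1 works the first
# time a value is seen. The trailing "; h" returns the hash to inject.
freq = arr.inject(Hash.new(0)) { |h, v| h[v] += 1; h }
freq  #=> {1=>3, 2=>1, 3=>1}

# sort_by orders the elements by their count, ascending, so the last
# element is one with the highest count.
arr.sort_by { |v| freq[v] }.last  #=> 1

# Alternative sketch: ask the hash directly for the pair with the highest count.
freq.max_by { |_value, count| count }.first  #=> 1
```

Note that this picks a single element. It does not handle ties, which is why the mode method above keeps every element that reaches the maximum frequency.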

Ruby Exercise: Count the numbers in an array between a given range

I was working through the above exercise and found this solution on GitHub.
def count_between arr, lower, upper
return 0 if arr.length == 0 || lower > upper
return arr.length if lower == upper
range = (lower..upper).to_a
arr.select { |value| range.include?(value) }.length
end
I understand what the first three lines mean and why they return the values they do. What I'd like to understand are the following lines of code.
Line 4 (below) defines range as a variable and uses lower..upper as the endpoints of the range (I just discovered you don't need to put literal integer values in a range). What does '.to_a' mean, and what does it do? I can't seem to find it in the Ruby docs.
range = (lower..upper).to_a
Line 5 (below) uses the Array#select method. It says: select this value if it is included in the range, then give me the Array#length of all selected values. But I don't quite understand (A) what |value| is doing and what it means, and (B) range.include?(value), which I assume asks whether the value is included in the range.
arr.select { |value| range.include?(value) }.length
Actually, I'd simplify to this:
def count_between arr, lower, upper
return 0 if lower > upper
arr.count{|v| (lower..upper).include?(v)}
end
to_a is documented here; it returns an Array containing each element in the Range. However, there's no reason to call to_a on the Range before calling include?.
There's also no reason to special-case the empty array.
Returning the length of the array when lower equals upper makes no sense.
value is the name given to the value the block is called with. I think a simple v is better for such a trivial case.
select calls the block for each value in arr and returns a new Array containing the elements for which the block returns true, so the length of that new Array is the number of matching values. However, count exists, and makes more sense to use, since the count is all we care about.
Update: As @steenslag points out in the comments, Comparable#between? can be used instead of creating a Range on which to call include?, and this eliminates the need to ensure that lower is less than or equal to upper:
def count_between arr, lower, upper
arr.count{|v| v.between?(lower, upper)}
end
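A few sample calls show how the between? version behaves in the cases the original solution special-cased. The input arrays are made up for illustration.

```ruby
def count_between(arr, lower, upper)
  arr.count { |v| v.between?(lower, upper) }
end

count_between([1, 3, 5, 7, 9], 3, 7)  #=> 3  (3, 5 and 7; both bounds are inclusive)
count_between([1, 3, 5, 7, 9], 7, 3)  #=> 0  (lower > upper matches nothing)
count_between([], 1, 10)              #=> 0  (empty array needs no special case)
count_between([4, 4, 4, 8], 4, 4)     #=> 3  (not arr.length, which would be 4)
```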
to_a means convert to array
irb(main):001:0> (1..5).to_a
=> [1, 2, 3, 4, 5]
The select method passes each element to the block and returns a new array containing all elements of ary for which the given block returns a true value. In your case it simply checks whether the value is contained in the range array. Note that range here is an array, not a Range.
## if arr is [1,5] for eg:
irb(main):005:0> [1,5].select {|value| range.include?(value)}
=> [1, 5]
irb(main):006:0> [1,5].select {|value| range.include?(value)}.length
=> 2
So each element of arr is passed, one at a time, into the block as the |value| variable.
It's a block.
As the documentation says: select "Returns a new array containing all elements of ary for which the given block returns a true value."
So each object in arr is passed to the block, in which you provide whatever code you want that returns true or false, and select uses this result to decide whether to add the value to the array that it returns. After that, length is called on the array.
So you have an array, you filter the array to contain only the numbers that are in the range, and then you take the length - effectively counting the number of elements.

Ensuring unique elements in a nested ruby array

I have an array that can contain N nested arrays, each of which can contain M arrays, where both N and M are >= 1. Some examples include the following:
[[[1,2,3],[3,4,5]],[[2,1,1]]]
or
[[[1,2,3]],[]]]
and finally
[[[1,2,3],[3,4,5]],[[2,1,1]], [[1,1,1],[2,2,2]]]
I need something that returns a boolean, true or false, depending on whether there is a duplicate value for the 0th element of a nested array. The issue is that the composite array is not the unique identifier. Only the 0th element of each value array, such as [1,2,3] or [3,4,5] (here the integers 1 and 3), is what I need uniqueness against. So in the case of the last array, [1,1,1] and [1,2,3] would clash because the 1 is repeated.
What's the best way to iterate through this type of nesting and signal true or false on whether there are duplicates?
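def uniq_prime_elements?(arr)
  # flatten(1) removes one level of nesting so that every value array is
  # inspected, not just the first one in each group.
  prime_elements = arr.flatten(1).map(&:first).compact
  prime_elements.length == prime_elements.uniq.length
end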
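As a sanity check, the sketch below collects the 0th element of every value array via flatten(1) and runs on the well-formed examples from the question. The method name `first_elements_unique?` is hypothetical, chosen only for this illustration.

```ruby
def first_elements_unique?(arr)
  # Remove one nesting level, take each value array's 0th element,
  # and drop the nils that empty arrays produce.
  firsts = arr.flatten(1).map(&:first).compact
  firsts.length == firsts.uniq.length
end

first_elements_unique?([[[1, 2, 3], [3, 4, 5]], [[2, 1, 1]]])
#=> true   (0th elements are 1, 3, 2)

first_elements_unique?([[[1, 2, 3], [3, 4, 5]], [[2, 1, 1]], [[1, 1, 1], [2, 2, 2]]])
#=> false  (1 and 2 each appear twice)

first_elements_unique?([[[1, 2, 3], [1, 4, 5]]])
#=> false  (the clash is inside a single group)
```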
