Ruby Array Manipulation - Scanning and Counting - ruby

I'm trying to read through each string in the array and count the number of times each letter occurs in each position (i.e. 1, 2, 3, 4). How am I misusing the multidimensional array and the += operator?
def scan_str(arr)
position = [[]]
x = 0
arr.select do |word|
word.length.times do |i|
if word.index('G') == x
position[x+1,0] += 1
x += 1
elsif word.index('A') == x
position[x+1,1] += 1
x += 1
elsif word.index('T') == x
position[x+1,2] += 1
x += 1
elsif word.index('C') == x
position[x+1,3] += 1
x += 1
else
x += 1
end
end
end
p position
end
input = ["CTAGATA","CCCGAT","AAATT","TTCAAATGA"]
scan_str(input)
Thanks, this is helpful. But now how do I manipulate the array without getting the error message "`[]': no implicit conversion from nil to integer (TypeError)"? There must be something I'm not getting about the index or the position[][] syntax.
def scan_str(arr)
position = [[]]
z=arr.count
x = 0
arr.select do |word|
if word.index('G') == x
position[y][0] += (countG =+ 1)/z
x += 1
y += 1
elsif word.index('A') == x
position[y][1] += (countA =+ 1)/z
x += 1
y += 1
elsif word.index('T') == x
position[y][2] += (countT =+ 1)/z
x += 1
y += 1
elsif word.index('C') == x
position[y][3] += (countC =+ 1)/z
x += 1
y += 1
else
x += 1
y += 1
end
end
p position
end
input = ["CTAGATA","CCCGAT","AAATT","TTCAAATGA"]
scan_str(input)

As was almost answered in the comments:
position[1,3] returns a slice of up to 3 elements starting at index 1 (the 2nd position, counting from 0).
Correct syntax is: position[1][3].
ps. Example:
arr=[[1,2,3], [4,5,6]]
arr[1][2]
# => 6 (the 3rd element of the 2nd array, counting from 0)
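To make the difference concrete, here is a small sketch that contrasts the two index forms and shows one way to pre-fill a grid of counters so that += has a number to add to. The names rows and counts and the 3x4 size are illustrative assumptions, not taken from the question.

```ruby
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# rows[start, length] returns a slice: up to 2 elements starting at index 1.
slice = rows[1, 2]   # => [[4, 5, 6], [7, 8, 9]]

# rows[i][j] indexes twice: element 2 of the array stored at index 1.
cell = rows[1][2]    # => 6

# A grid of counters needs every cell initialised before += can be used.
# The block form gives each row its own inner array.
counts = Array.new(3) { Array.new(4, 0) }
counts[0][2] += 1
p counts             # prints [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

With position = [[]] as in the question, position[1] is nil, so there is nothing to index into or add to.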

It is not very beautiful and should probably be split into a few functions:
a = ["CTAGATA","CCCGAT","AAATT","TTCAAATGA"]
p Hash[
a.map{|sub| sub.chars.with_index(1).to_a}
.flatten(1).group_by(&:last)
.map{|pos, values|
[pos, Hash[values.group_by{|char,|char}.map{|char,s|[char, s.size.to_f/values.length]}]]
}
] #=> {1=>{"C"=>0.5, "A"=>0.25, "T"=>0.25}, 2=>{"T"=>0.5, "C"=>0.25, "A"=>0.25}, 3=>

As the problem with your code has been explained, I would like to suggest a more "Ruby-like" approach:
TEST = ['G', 'A', 'T', 'C']
def scan_str(arr)
TEST.each_with_object({}) {|c,h| h[c] = arr.each_with_object(Hash.new(0)) {|line, hh| \
line.chars.each_with_index {|s,i| hh[i] += 1 if s==c}}}
end
arr = ["CTAGATA","CCCGAT","AAATT","TTCAAATGA"]
scan_str(arr)
# => {"G"=>{3=>2, 7=>1}, \
# => "A"=>{2=>2, 4=>3, 6=>1, 0=>1, 1=>1, 3=>1, 5=>1, 8=>1}, \
# => "T"=>{1=>2, 5=>2, 3=>1, 4=>1, 0=>1, 6=>1}, \
# => "C"=>{0=>2, 1=>1, 2=>2}}
A few points:
It is probably most convenient to put the results in a hash. Here I have scan_str returning a hash whose keys are the elements of TEST. The value of each key is itself a hash, with each key being a line offset position and the associated value being the number of times the letter given by the outer key is located at that position.
I first iterate over the elements of TEST using Enumerable#each_with_object, with the object being an initially empty hash {}. Inside the block the hash is referenced by h. The alternative would be to define an empty hash (h = {}) on the line above and then use TEST.each {|c|... instead. Had I done that, it would also have been necessary to add the line h at the end of the method so that the hash would be returned.
For each element c of TEST, I iterate over the lines of the array, again using each_with_object. This time, however, the object is Hash.new(0), which creates a hash with a default value of zero. That way, when hh[i] += 1 is executed in the inner loop, we don't have to check whether hh has a key i: hh[i] += 1 expands to hh[i] = hh[i] + 1, and if the key is missing, hh[i] on the right-hand side returns the default 0, so the key is stored with the value 1.
For each line, line.chars converts the line to an array of characters. I then iterate with Enumerable#each_with_index. Inside the block, the character (string of length one) and line offset are referenced by s and i respectively.
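The default-value behaviour described in these points can be tried in isolation. This is only a sketch; the variable names tally and plain are mine, and the single input line is taken from the question's data.

```ruby
# Count, per offset, how often "A" appears in one line.
tally = Hash.new(0)               # missing keys read as 0
"CTAGATA".chars.each_with_index do |ch, i|
  tally[i] += 1 if ch == "A"      # same as tally[i] = tally[i] + 1
end
# tally now maps offsets 2, 4 and 6 to a count of 1 each.

# A plain {} has no default, so the same += fails on a missing key.
plain = {}
error_class = nil
begin
  plain[0] += 1                   # plain[0] is nil, and nil has no + method
rescue NoMethodError => e
  error_class = e.class
end
```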
There are a couple of ways to obtain the result you want. The first, and probably easiest, would be just to change the code I've already offered. I'll do that later today.
The second is to use the code above as a "helper method".
Use "helper" method
To use the code we already have, rename the scan_str method above to scan_str_helper and add this:
def scan_str(arr)
h = scan_str_helper(arr)
posh = Hash[h.values.map(&:keys).flatten.uniq.map {|e| \
[e,Hash[TEST.zip([0]*TEST.size)]]}]
h.each {|k,v| v.each {|kk,vv| posh[kk][k] += vv}}
posh.each_with_object({}) {|(k,v),hp| tot = 1.0 * v.values.reduce(&:+); \
hp[k] = Hash[v.keys.zip(v.values.map {|e| e/tot})]}
end
scan_str(arr)
# {3=>{"G"=>0.5, "A"=>0.25, "T"=>0.25, "C"=>0.0}, 7=>{"G"=>1.0, "A"=>0.0, "T"=>0.0, "C"=>0.0},
# 2=>{"G"=>0.0, "A"=>0.5, "T"=>0.0, "C"=>0.5}, 4=>{"G"=>0.0, "A"=>0.75, "T"=>0.25, "C"=>0.0},
# 6=>{"G"=>0.0, "A"=>0.5, "T"=>0.5, "C"=>0.0}, 0=>{"G"=>0.0, "A"=>0.25, "T"=>0.25, "C"=>0.5},
# 1=>{"G"=>0.0, "A"=>0.25, "T"=>0.5, "C"=>0.25},
# 5=>{"G"=>0.0, "A"=>0.3333333333333333, "T"=>0.6666666666666666, "C"=>0.0},
# 8=>{"G"=>0.0, "A"=>1.0, "T"=>0.0, "C"=>0.0}}
A few more notes:
h.values.map(&:keys).flatten.uniq => [3, 7, 2, 4, 6, 0, 1, 5, 8] merely constructs an array of the position offsets that contain one or more TEST elements.
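h.keys.zip([0]*TEST.size) => h.keys.zip([0, 0, 0, 0]) => [["G",0], ["A",0], ["T",0], ["C",0]], and Hash[[["G",0], ["A",0], ["T",0], ["C",0]]] => {"G"=>0, "A"=>0, "T"=>0, "C"=>0}, so for e = 3 (say), Hash[3, {"G"=>0, "A"=>0, "T"=>0, "C"=>0}] => {3=>{"G"=>0, "A"=>0, "T"=>0, "C"=>0}}. (The code writes TEST.zip rather than h.keys.zip; the two are equivalent here because h.keys equals TEST.)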
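Instead of building the zero-filled hash inside the block, you may be tempted to build it once, outside the block, and reuse that one object for every position. That doesn't work, because every position would then refer to the same hash object, so incrementing a count for one position would change the count for all of them.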
h.each {|k,v| v.each {|kk,vv| posh[kk][k] += vv}} fills the hash posh =>
# {3=>{"G"=>2, "A"=>1, "T"=>1, "C"=>0}, 7=>{"G"=>1, "A"=>0, "T"=>0, "C"=>0},
# 2=>{"G"=>0, "A"=>2, "T"=>0, "C"=>2}, 4=>{"G"=>0, "A"=>3, "T"=>1, "C"=>0},
# 6=>{"G"=>0, "A"=>1, "T"=>1, "C"=>0}, 0=>{"G"=>0, "A"=>1, "T"=>1, "C"=>2},
# 1=>{"G"=>0, "A"=>1, "T"=>2, "C"=>1}, 5=>{"G"=>0, "A"=>1, "T"=>2, "C"=>0},
# 8=>{"G"=>0, "A"=>1, "T"=>0, "C"=>0}}.
The last line merely converts the numbers of occurrences to fractions. For example, 3=>{"G"=>2, "A"=>1, "T"=>1, "C"=>0} is converted to 3=>{"G"=>0.5, "A"=>0.25, "T"=>0.25, "C"=>0.0}
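One thing worth checking when building a hash of zero-filled hashes like posh is that each position gets its own inner hash. The sketch below is my own illustration of that point (the names shared, bad and good are not from the answer); it contrasts reusing one inner hash with building a fresh one per position.

```ruby
letters = %w[G A T C]

# One inner hash, reused for every position: all positions alias the same object.
shared = Hash[letters.zip([0] * letters.size)]
bad = Hash[[0, 1].map { |pos| [pos, shared] }]
bad[0]["G"] += 1
bad_other = bad[1]["G"]    # also 1, because bad[0] and bad[1] are the same hash

# A fresh inner hash built inside the block: positions are independent.
good = Hash[[0, 1].map { |pos| [pos, Hash[letters.zip([0] * letters.size)]] }]
good[0]["G"] += 1
good_other = good[1]["G"]  # still 0
```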
Modification of Initial Code
def scan_str(arr)
a = Array.new(arr.map(&:size).max).map {|e| \
Hash[TEST.zip(Array.new(TEST.size,0))]}
arr.each {|s| s.chars.each_with_index {|c,i| TEST.each \
{|ss| a[i][ss] += 1 if c == ss}}}
Hash[a.map.with_index {|h,i| tot = 1.0 * h.values.reduce(&:+); tot > 0.0 ? \
[i, Hash[h.keys.zip(h.values.map {|e| e/tot})]] : nil}.compact]
end
The first statement creates an array a, the ith element corresponding to character offset i in each line. Each element starts out as a hash that maps every element of TEST to zero.
The second statement fills the array a:
# => [{"G"=>0, "A"=>1, "T"=>1, "C"=>2}, {"G"=>0, "A"=>1, "T"=>2, "C"=>1},
# => {"G"=>0, "A"=>2, "T"=>0, "C"=>2}, {"G"=>2, "A"=>1, "T"=>1, "C"=>0},
# => {"G"=>0, "A"=>3, "T"=>1, "C"=>0}, {"G"=>0, "A"=>1, "T"=>2, "C"=>0},
# => {"G"=>0, "A"=>1, "T"=>1, "C"=>0}, {"G"=>1, "A"=>0, "T"=>0, "C"=>0},
# => {"G"=>0, "A"=>1, "T"=>0, "C"=>0}].
The last statement converts each element of a to an [offset, hash] pair if the sum of the values is positive; else to nil. compact removes all elements that are nil. Putting Hash[ at the beginning and ] at the end converts the array of pairs to a hash, which is returned by scan_str.
Note this approach gives the same result as the method that used the "helper" method, though the order of the elements of the hash is different.
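For comparison, here is a more spread-out sketch of the same per-position frequency calculation. It is not the code from this answer: the names LETTERS and position_frequencies are hypothetical, and it swaps in a nested Hash.new with a block instead of a pre-sized array.

```ruby
LETTERS = %w[G A T C].freeze

# Returns { offset => { letter => fraction of lines with that letter there } }.
def position_frequencies(words)
  # Outer hash creates a zero-default inner hash the first time an offset is seen.
  counts = Hash.new { |hash, pos| hash[pos] = Hash.new(0) }
  words.each do |word|
    word.each_char.with_index do |ch, pos|
      counts[pos][ch] += 1 if LETTERS.include?(ch)
    end
  end
  # Convert counts to fractions, ordered by offset.
  counts.sort_by { |pos, _| pos }.map do |pos, tally|
    total = tally.values.reduce(0, :+).to_f
    [pos, LETTERS.map { |letter| [letter, tally[letter] / total] }.to_h]
  end.to_h
end

result = position_frequencies(["CTAGATA", "CCCGAT", "AAATT", "TTCAAATGA"])
# result[0] is {"G"=>0.0, "A"=>0.25, "T"=>0.25, "C"=>0.5}
```

Offsets are only created when a listed letter is seen there, so the total used for the division is never zero.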

Related

Ruby select multiple elements from a .split

String:
string = "this;is;a;string;yes"
I can split the string and append each element to an array like this
arr = []
string.split(";").each do |x|
arr << x
end
Is there an easy way to take the first, third, and fourth values, other than something like this?
i = 0
string.split(";").each do |x|
if i == 0 or i == 2 or i == 3 then arr << x end
i += 1
end
Sure. Use Array#values_at:
string = "this;is;a;string;yes"
string.split(";").values_at(0, 2, 3)
# => ["this", "a", "string"]
See it on repl.it: https://repl.it/#jrunning/FussyRecursiveSpools
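A couple of further behaviours of Array#values_at that may be useful here, shown as a short sketch (the variable names are mine): it also accepts ranges, and it returns nil for an index that is out of range instead of raising.

```ruby
parts = "this;is;a;string;yes".split(";")

picked     = parts.values_at(0, 2, 3)   # => ["this", "a", "string"]
with_range = parts.values_at(0, 2..3)   # => ["this", "a", "string"]
missing    = parts.values_at(0, 10)     # => ["this", nil]
from_end   = parts.values_at(-1)        # => ["yes"]
```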

How to change a value in an array via a hash?

I want to change the value of an array via a hash, for example:
arr = ['g','g','e','z']
positions = {1 => arr[0], 2 => arr[1]}
positions[1] = "ee"
The problem is that only the hash changes, not the array. When I do p arr it still outputs ['g','g','e','z']. Is there a way around this?
You're going to need to add another line of code to do what you want:
arr = ['g','g','e','z']
positions = {1 => arr[0], 2 => arr[1]}
positions[1] = "ee"
arr[0] = positions[1]
Another option would be to make a method that automatically updated the array for you, something like this:
def update_hash_and_array(hash, array, val, index)
# Assumes index is 1-based, as in your hash
hash[index] = val
array[index - 1] = val
end
update_hash_and_array(positions, arr, "ee", 1) # Does what you want
This is possible to code into your hash using procs.
arr = ['g','g','e','z']
positions = {1 => -> (val) { arr[0] = val } }
positions[1].('hello')
# arr => ['hello', 'g', 'e', 'z']
You can generalize this a bit if you want to generate a hash that can modify any array.
def remap_arr(arr, idx)
(idx...arr.length+idx).zip(arr.map.with_index{|_,i| -> (val) {arr[i] = val}}).to_h
end
arr = [1,2,3,4,5,6]
positions = remap_arr(arr, 1)
positions[2].('hello')
# arr => [1,'hello',3,4,5,6]
positions[6].('goodbye')
# arr => [1,'hello',3,4,5,'goodbye']
But I'm hoping this is just a thought experiment; there is no reason to change array indexing to start from 1 rather than 0. In such cases you would normally just offset the index you have to match the array's zero-based indexing. If that is not sufficient, it's a sign you need a different data structure.
#!/usr/bin/env ruby
a = %w(q w e)
h = {
1 => a[0]
}
puts a[0].object_id # 70114787518660
puts h[1].object_id # 70114787518660
puts a[0] === h[1] # true
# Assigning to h[1] below binds the key to a NEW string object. Compare the object_ids.
# That is why you cannot change a value in the array via the hash.
h[1] = 'Z'
puts a[0].object_id # 70114787518660
puts h[1].object_id # 70114574058580
puts a[0] === h[1] # false
h[2] = a
puts a.object_id # 70308472111520
puts h[2].object_id # 70308472111520
puts h[2] === a # true
puts a[0] === h[2][0] # true
# Here we can change value in the array via the hash.
# Why?
# Because 'h[2]' and 'a' are associated with the same object '%w(q w e)'.
# We will change the VALUE without creating a new object.
h[2][0] = 'X'
puts a[0] # X
puts h[2][0] # X
puts a[0] === h[2][0] # true
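The same distinction applies to the strings themselves. As a sketch (not part of the answer above): mutating the shared string in place with String#replace is visible through the array, while assigning a new value to the hash key is not. The dup calls are only there so the strings are mutable regardless of any frozen-string-literal setting.

```ruby
arr = ["g", "g", "e", "z"].map(&:dup)
positions = { 1 => arr[0], 2 => arr[1] }

# In-place mutation: the hash value and arr[0] are the same String object.
positions[1].replace("ee")
first = arr[0]        # "ee"

# Rebinding: positions[2] now points at a new object; arr[1] is untouched.
positions[2] = "zz"
second = arr[1]       # "g"

p arr                 # prints ["ee", "g", "e", "z"]
```

This only works for mutable objects such as strings and arrays; an Integer stored in the array cannot be changed this way.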

Double index numbers in array (Ruby)

The array counts is as follows:
counts = ["a", 1]
What does this:
counts[0][0]
refer to?
I've only seen this before:
array[idx]
but never this:
array[idx][idx]
where idx is an integer.
This is the entire code where the snippet of code before was from:
def num_repeats(string) #abab
counts = [] #array
str_idx = 0
while str_idx < string.length #1 < 4
letter = string[str_idx] #b
counts_idx = 0
while counts_idx < counts.length #0 < 1
if counts[counts_idx][0] == letter #if counts[0][0] == b
counts[counts_idx][1] += 1
break
end
counts_idx += 1
end
if counts_idx == counts.length #0 = 0
# didn't find this letter in the counts array; count it for the
# first time
counts.push([letter, 1]) #counts = ["a", 1]
end
str_idx += 1
end
num_repeats = 0
counts_idx = 0
while counts_idx < counts.length
if counts[counts_idx][1] > 1
num_repeats += 1
end
counts_idx += 1
end
return counts
end
The statement
arr[0]
gets the first item of the array arr. In some cases this item may itself be an array (or another indexable object), which means you can take that object and index into it in turn:
# if arr = [["item", "another"], "last"]
item = arr[0]
inner_item = item[0]
puts inner_item # => "item"
This can be shortened to
arr[0][0]
So any two-dimensional array, or any array containing indexable objects, can be indexed like this, e.g. an array of strings:
arr = ["String 1", "Geoff", "things"]
arr[0] # => "String 1"
arr[0][0] # => "S"
arr[1][0] # => "G"
It's for nested indexing
a = [ "item 0", [1, 2, 3] ]
a[0] #=> "item 0"
a[1] #=> [1, 2, 3]
a[1][0] #=> 1
Since the value at index 1 is another array you can use index referencing on that value as well.
EDIT
Sorry I didn't thoroughly read the original question. The array in question is
counts = ["a", 1]
In this case counts[0] returns "a", and since we can use indexes to reference characters of a string, the 0th index of the string "a" is simply "a".
str = "hello"
str[2] #=> "l"
str[1] #=> "e"
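Note that the posted method calls counts.push([letter, 1]), which appends a two-element array, so inside num_repeats counts looks like [["a", 1]] rather than ["a", 1]. A small sketch of both shapes (the names pairs and flat are mine):

```ruby
# Shape built by counts.push([letter, 1]): an array of [letter, count] pairs.
pairs = [["a", 1], ["b", 1]]
first_pair   = pairs[0]      # => ["a", 1]
first_letter = pairs[0][0]   # => "a"
first_count  = pairs[0][1]   # => 1
pairs[1][1] += 1             # bump the count stored for "b"

# Shape as written in the question: a flat array.
flat = ["a", 1]
flat_letter = flat[0][0]     # => "a" (character 0 of the string "a")

p pairs                      # prints [["a", 1], ["b", 2]]
```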

How can I convert a human-readable number to a computer-readable number in Ruby?

I'm working in Ruby with an array that contains a series of numbers in human-readable format (e.g., 2.5B, 1.27M, 600,000, where "B" stands for billion, "M" stands for million). I'm trying to convert all elements of the array to the same format.
Here is the code I've written:
array.each do |elem|
if elem.include? 'B'
elem.slice! "B"
elem = elem.to_f
elem = (elem * 1000000000)
else if elem.include? 'M'
elem.slice! "M"
elem = elem.to_f
elem = (elem * 1000000)
end
end
When I inspect the elements of the array using puts(array), however, the numbers appear with the "B" and "M" sliced off but the multiplication conversion does not appear to have been applied (e.g., the numbers now read 2.5, 1.27, 600,000, instead of 2500000000, 1270000, 600,000).
What am I doing wrong?
First thing to note is that else if in ruby is elsif. See http://www.tutorialspoint.com/ruby/ruby_if_else.htm
Here is a working function for you to try out:
def convert_array_items_from_human_to_integers(array)
array.each_with_index do |elem,i|
if elem.include? 'B'
elem.slice! "B"
elem = elem.to_f
elem = (elem * 1000000000)
elsif elem.include? 'M'
elem.slice! "M"
elem = elem.to_f
elem = (elem * 1000000)
end
array[i] = elem
end
return array
end
Calling convert_array_items_from_human_to_integers(["2.5B", "1.2M"])
returns [2500000000.0, 1200000.0]
Another variation:
array = ['2.5B', '1.27M', '$600000']
p array.each_with_object([]) { |i, a|
i = i.gsub('$', '')
a << if i.include? 'B'
i.to_f * 1E9
elsif i.include? 'M'
i.to_f * 1E6
else
i.to_f
end
}
#=> [2500000000.0, 1270000.0, 600000.0]
Try this:
array.map do |elem|
elem = elem.gsub('$','')
if elem.include? 'B'
elem.to_f * 1000000000
elsif elem.include? 'M'
elem.to_f * 1000000
else
elem.to_f
end
end
This uses map instead of each to return a new array. Your attempt reassigns the block variable, which leaves the original array's elements in place (except for the slice!, which modifies them in place). You can dispense with the slicing in the first place, since to_f simply ignores any trailing non-numeric characters.
EDIT:
If you have leading characters such as $2.5B, as your question title indicates (but not your example), you'll need to strip those explicitly. But your sample code doesn't handle those either, so I assume that's not an issue.
Expanding a bit on pjs' answer:
array.each do |elem|
elem is a local variable pointing to each array element, one at a time. When you do this:
elem.slice! "B"
you are sending a message to that array element telling it to slice the B. And you're seeing that in the end result. But when you do this:
elem = elem.to_f
now you've reassigned your local variable elem to something completely new. You haven't reassigned what's in the array, just what elem is.
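A minimal sketch of that difference, using made-up values rather than the question's array: reassigning the block variable inside each leaves the array alone, while map builds a new array and map! replaces the elements in place.

```ruby
values = ["2.5", "1.27"]

# Rebinding elem only changes the block-local variable.
values.each { |elem| elem = elem.to_f }
after_each = values.dup               # still ["2.5", "1.27"]

# map returns a new array built from the block's return values.
converted = values.map { |elem| elem.to_f }

# map! stores the block's return values back into the same array.
values.map! { |elem| elem.to_f }

p after_each                          # prints ["2.5", "1.27"]
p converted                           # prints [2.5, 1.27]
```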
Here's how I'd go about it:
ARY = %w[2.5B 1.27M 600,000]
def clean_number(s)
s.gsub(/[^\d.]+/, '')
end
ARY.map{ |v|
case v
when /b$/i
clean_number(v).to_f * 1_000_000_000
when /m$/i
clean_number(v).to_f * 1_000_000
else
clean_number(v).to_f
end
}
# => [2500000000.0, 1270000.0, 600000.0]
The guts of the code are in the case statement. A simple check for the multiplier allows me to strip the undesired characters and multiply by the right value.
Normally we could use to_f to find the floating-point number to be multiplied for strings like '1.2', but it breaks down for things like '$1.2M' because of the "$". The same thing is true for embedded commas marking thousands:
'$1.2M'.to_f # => 0.0
'1.2M'.to_f # => 1.2
'6,000'.to_f # => 6.0
'6000'.to_f # => 6000.0
To fix the problem for simple strings containing just the value, it's not necessary to do anything fancier than stripping undesirable characters using gsub(/[^\d.]+/, ''):
'$1.2M'.gsub(/[^\d.]+/, '') # => "1.2"
'1.2M'.gsub(/[^\d.]+/, '') # => "1.2"
'6,000'.gsub(/[^\d.]+/, '') # => "6000"
'6000'.gsub(/[^\d.]+/, '') # => "6000"
[^\d.] means "anything NOT a digit or '.'".
Be careful how you convert your decimal values to integers. You could end up throwing away important precision:
'0.2M'.gsub(/[^\d.]+/, '').to_f * 1_000_000 # => 200000.0
'0.2M'.gsub(/[^\d.]+/, '').to_i * 1_000_000 # => 0
('0.2M'.gsub(/[^\d.]+/, '').to_f * 1_000_000).to_i # => 200000
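If an exact integer result matters, one option (my suggestion, not part of the answer above) is to parse the digits with String#to_r, which reads the decimal as an exact Rational, and only convert to an integer at the end.

```ruby
digits = "0.2M".gsub(/[^\d.]+/, "")   # => "0.2"
exact  = digits.to_r * 1_000_000      # a Rational equal to 200000
whole  = exact.to_i                   # => 200000

# The same steps for another value from the question.
other = "1.27M".gsub(/[^\d.]+/, "").to_r * 1_000_000
other_whole = other.to_i              # => 1270000
```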
Of course, all this breaks down if your string is more complex than a simple number and multiplier. It's easy to break down a string and identify those sorts of substrings, but that's a different question.
I would do it like this:
Code
T, M, B = 1_000, 1_000_000, 1_000_000_000
def convert(arr)
arr.map do |n|
m = n.gsub(/[^\d.TMB]/,'')
m.to_f * (m[-1][/[TMB]/] ? Object.const_get(m[-1]) : 1)
end
end
Example
arr = %w[$2.5B 1.27M 22.5T, 600,000]
convert(arr)
# => [2500000000.0, 1270000.0, 22500.0, 600000.0]
Explanation
The line
m = n.gsub(/[^\d.TMB]/,'')
# => "2.5B", "1.27M", "22.5T", "600000" (for each element of arr in turn)
merely eliminates unwanted characters.
m.to_f * (m[-1][/[TMB]/] ? Object.const_get(m[-1]) : 1)
returns the product of the string converted to a float and a constant given by the last character of the string, if that character is T, M or B, else 1.
Actual implementation might be like this:
class A
T, M, B = 1_000, 1_000_000, 1_000_000_000
def doit(arr)
c = self.class.constants.map(&:to_s).join
arr.map do |n|
m = n.gsub(/[^\d.#{c}]/,'')
m.to_f * (m[-1][/[#{c}]/] ? self.class.const_get(m[-1]) : 1)
end
end
end
If we wished to change the reference for 1,000 from T to K and add T for trillion, we would need only change
T, M, B = 1_000, 1_000_000, 1_000_000_000
to
K, M, B, T = 1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000
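An alternative to looking constants up by name is a plain hash of multipliers. The sketch below is a different technique from the const_get approach above; MULTIPLIERS and to_number are hypothetical names, and using K for thousand is an assumption.

```ruby
# Suffix => multiplier; extend this table to support more suffixes.
MULTIPLIERS = { "K" => 1_000, "M" => 1_000_000, "B" => 1_000_000_000 }.freeze

def to_number(text)
  # Keep digits, the decimal point and a known suffix; drop "$", "," and the rest.
  cleaned = text.gsub(/[^\d.KMB]/i, "").upcase
  # fetch falls back to 1 when the last character is not a known suffix.
  cleaned.to_f * MULTIPLIERS.fetch(cleaned[-1], 1)
end

numbers = ["$2.5B", "1.27M", "22.5K", "600,000"].map { |s| to_number(s) }
p numbers   # prints [2500000000.0, 1270000.0, 22500.0, 600000.0]
```

Adding a suffix then means adding one entry to the table and one letter to the character class.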

Use Each on an array to iterate over and return the index of greater values

a = [2,2,4,8,9]
ind = 1
a.each do |x|
if a[ind] < a[x]
puts x
end
end
How can I use "each" on an array to iterate over and return the index of all values greater than a certain value in Ruby?
I would like to iterate over the given array a = [2,2,4,8,9]. I want to iterate over the entire array and, using a conditional, put out all values where a[ind] < a[x].
I receive the error "comparison of Fixnum with nil failed". How can I resolve this?
I did try this as well, setting a range for the process:
a = [ 2,2,3,4,5]
x = 0
while x >= 0 && x <= 4
a.each do |x|
if a[1] < a[x]
puts x
end
end
x += 1
end
You want to select all elements that are greater than their own index. You can say exactly that in Ruby:
a.select.with_index {|el, idx| idx < el }
or even
a.select.with_index(&:>)
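With the array from the question every element happens to be greater than its index, so both forms return the whole array. A sketch with a different, made-up array shows the filtering more clearly:

```ruby
a = [0, 2, 1, 8, 3]

# The block receives the element first and its index second.
kept = a.select.with_index { |el, idx| idx < el }

# &:> calls el > idx with the same two arguments.
short = a.select.with_index(&:>)

p kept    # prints [2, 8]
```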
When you are iterating over an array using each the x denotes the value of the item, not its position:
a = [2,2,4,8,9]
ind = 1
a.each do |x|
if a[ind] < x
puts x
end
end
# prints:
# 4
# 8
# 9
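This also explains the error in the question. With a[x], the element's value is used as an index, and values such as 8 and 9 are past the end of the array. A short sketch (variable names mine; on Rubies older than 2.4 the message says Fixnum where newer ones say Integer):

```ruby
a = [2, 2, 4, 8, 9]

out_of_range = a[8]          # => nil, there is no index 8

message = nil
begin
  result = a[1] < a[8]       # 2 < nil cannot be compared
rescue ArgumentError => e
  message = e.message        # mentions a comparison with nil
end
```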
Update:
If you want to print the indexes of the elements whose value is greater than a[ind], you should use each_with_index:
a = [2,2,4,8,9]
ind = 1
a.each_with_index do |x, i|
if a[ind] < x
puts i
end
end
# prints:
# 2
# 3
# 4
def filtered_index(array,n)
array.each_with_index{|e,i| puts i if e > n}
end
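filtered_index prints the indexes. If you would rather get them back as an array, a sketch along these lines should work (indexes_greater_than is a hypothetical name):

```ruby
# Returns the indexes of all elements strictly greater than n.
def indexes_greater_than(array, n)
  array.each_index.select { |i| array[i] > n }
end

a = [2, 2, 4, 8, 9]
found = indexes_greater_than(a, a[1])
p found   # prints [2, 3, 4]
```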

Resources