I have the following data:
gene strain
A1 S1
A1 S4
A1 S8
A2 S5
A2 S4
A2 S9
A3 S4
A3 S1
A3 S10
I need to produce a matrix of genes versus strains, i.e., I need to show which genes are present in which strains, so the matrix will look like this:
S1 S4 S5 S8 S9 S10
A1
A2
A3
Can anyone guide me through the best and quickest way to do this in Ruby? I have the array of strains and genes.
There are many ways you could represent the gene-strain matrix you need. The best way will depend on what you want to do with the matrix. Do you want to compare which strains are present in different genes? Or compare which genes have a given strain? Do you just want to be able to look up whether a given gene has a given strain?
One simple way would be a Hash, keyed by gene, whose values are Sets of strains:
require 'set'
h = Hash.new { |h,k| h[k] = Set.new }
# assuming you already have the data in an array of arrays...
data.each do |gene,strain|
h[gene] << strain
end
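As a rough sketch of how such a hash of sets might be queried, here it is built from the sample data in the question. The names `by_gene`, `a1_in_s4` and `shared` are only illustrative, not part of the original answer.

```ruby
require 'set'

# Sample data from the question, as an array of [gene, strain] pairs.
data = [%w[A1 S1], %w[A1 S4], %w[A1 S8],
        %w[A2 S5], %w[A2 S4], %w[A2 S9],
        %w[A3 S4], %w[A3 S1], %w[A3 S10]]

by_gene = Hash.new { |hash, key| hash[key] = Set.new }
data.each { |gene, strain| by_gene[gene] << strain }

# Is a given gene present in a given strain?
a1_in_s4 = by_gene['A1'].include?('S4')

# Strains common to every gene (set intersection across all values).
shared = by_gene.values.reduce(:&)

p a1_in_s4     # true
p shared.to_a  # ["S4"]
```

Sets make membership tests and intersections cheap, which is why they suit the "compare genes" style of question better than plain arrays.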
If you only want to print a matrix out on the screen, here is a little script to do so:
require 'set'
genes, strains = Set.new, Set.new
h = Hash.new { |h,k| h[k] = Set.new }
# again assuming you already have the data in an array of arrays
data.each { |g,s| h[g] << s; genes << g; strains << s }
genes, strains = genes.sort, strains.sort
FIELD_WIDTH = 5
BLANK = " "*FIELD_WIDTH
X = "X" + (" " * (FIELD_WIDTH - 1))
def print_fixed_width(str)
str = str[0,FIELD_WIDTH]
print str
print " "*(FIELD_WIDTH-str.length)
end
# now print the matrix
print BLANK
strains.each { |s| print_fixed_width(s) }
puts
genes.each do |g|
print_fixed_width(g)
strains.each { |s| print(h[g].include?(s) ? X : BLANK) }
puts
end
Please post more details on what you want to do with the matrix and I will provide a more appropriate option if necessary.
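If an actual matrix object is wanted rather than printed text, one possible sketch is an array of 0/1 rows. This assumes strain names always look like "S" followed by a number, so they can be sorted numerically; the names `matrix`, `genes` and `strains` are illustrative.

```ruby
# Sample data from the question, as an array of [gene, strain] pairs.
data = [%w[A1 S1], %w[A1 S4], %w[A1 S8],
        %w[A2 S5], %w[A2 S4], %w[A2 S9],
        %w[A3 S4], %w[A3 S1], %w[A3 S10]]

genes = data.map(&:first).uniq.sort
# Sort strains by numeric suffix; a plain string sort would place "S10" before "S4".
strains = data.map(&:last).uniq.sort_by { |s| s[1..-1].to_i }

# One row per gene, one column per strain; 1 marks "gene present in strain".
matrix = genes.map do |gene|
  strains.map { |strain| data.include?([gene, strain]) ? 1 : 0 }
end

puts "    #{strains.join(' ')}"
genes.zip(matrix).each { |gene, row| puts "#{gene}: #{row.join(' ')}" }
```

`data.include?` is a linear scan, which is fine for a handful of pairs; for large inputs a Set of pairs would be a better lookup structure.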
You can represent this in a 2d array:
arr = [[1,1],[1,4],[1,8],[2,5],[2,4],[2,9],[3,4],[3,1],[3,10]]
A quick and dirty table:
s = "  1234567890\n"
(1..3).each do |i|
  s << i.to_s << ' '
  (1..10).each do |j|
    s << ( arr.include?( [i,j] ) ? 'x' : ' ' )
  end
  s << "\n"
end
puts s
  1234567890
1 x  x   x
2    xx   x
3 x  x     x
If you "need to check which genes are present in which strains", then a Hash would be sufficient:
str = <<DOC
A1 S1
A1 S4
A1 S8
A2 S5
A2 S4
A2 S9
A3 S4
A3 S1
A3 S10
DOC
ar = str.lines.map{|line| line.split(/\s+/) } #string to array of arrays
genes_from_strain = Hash.new{|h,k| h[k]=[] } #This hash will give an empty array if key is not present
ar.each{|pair| genes_from_strain[pair.last] << pair.first }
p genes_from_strain['S1'] #=>["A1", "A3"]
How do I iterate 9 times and produce three arrays like this:
1 a
2 b
3 c
["a","b","c"]
4 d
5 e
6 f
["d","e","f"]
7 g
8 h
9 i
["g","h","i"] ?
-------------------------------------
1.upto(9) do
xxx = gets.chomp
wn << xxx
if wn.length ==3
puts wn.inspect
end
end
------------------------------------
I get the following output:
a
b
c
["a", "b", "c"]
d
e
f
g
h
i
Not the results I hoped for :(
A simple solution:
a1 = []
a2 = []
a3 = []
1.upto(9) do
  line = gets.chomp # chomp! would return nil if the line had no trailing newline
  if a1.size < 3
    a1 << line
  elsif a2.size < 3
    a2 << line
  else
    a3 << line
  end
end
puts a1
puts a2
puts a3
Create the 3 arrays, iterate 9 times, create conditions to populate them.
Do you have to iterate? You could always break your string by length, like so:
"abcdefghi".scan(/.{3}/).map{|i| i.split('')} # => [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
If you really must iterate:
wn = []
1.upto(9) do
  xxx = gets.chomp
  wn << xxx
  if wn.length == 3
    puts wn.inspect
    wn = [] # start a fresh group of three
  end
end
I think nested loops are a clean solution for you.
a = [[],[],[]]
3.times do |i|
3.times { |j| a[i][j] = gets.strip }
puts a[i].inspect
end
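Another option, if the nine values can be collected first, is Enumerable#each_slice. A small sketch, with a hard-coded `letters` array standing in for the nine lines that would normally come from gets:

```ruby
# Stand-in for nine lines read from standard input.
letters = %w[a b c d e f g h i]

# each_slice(3) yields consecutive groups of three elements.
groups = letters.each_slice(3).to_a

groups.each { |group| puts group.inspect }
# ["a", "b", "c"]
# ["d", "e", "f"]
# ["g", "h", "i"]
```

This avoids tracking the group size by hand, at the cost of reading all the input before printing anything.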
I am building a base converter. Here is my code so far:
def num_to_s(num, base)
remainders = [num]
while base <= num
num /= base #divide initial value of num
remainders << num #shovel results into array to map over for remainders
end
return remainders.map{|i| result = i % base}.reverse.to_s #map for remainders and shovel to new array
end
puts num_to_s(40002, 16)
Now it's time to account for bases over 10 where letters replace numbers. The instructions (of the exercise) suggest using a hash. Here is my hash:
conversion = {10 => 'A', 11 => 'B', 12 => 'C', 13 => 'D', 14 => 'E', 15 => 'F',}
The problem is now, how do I incorporate it so that it modifies the array? I have tried:
return remainders.map{|i| result = i % base}.map{|i| [i, i]}.flatten.merge(conversion).reverse.to_s
This was an attempt to convert the 'remainders' array into a hash and merge them so the values in 'conversion' override the ones in 'remainders', but I get an 'odd list for Hash' error. After some research, it seems to be due to the version of Ruby I am running (1.8.7), which I am unable to update. I also tried converting the array into a hash outside of the return:
Hashes = Hash[remainders.each {|i, i| [i, i]}].merge(conversion)
and I get a 'dynamic constant assignment' error. I have tried a bunch of different ways to do this... Can a hash even be used to modify an array? I was also thinking maybe I could accomplish this by using a conditional statement within an enumerator (each? map?) but haven't been able to make that work. CAN one put a conditional inside an enumerator?
Yes, you could use a hash:
def digit_hash(base)
digit = {}
(0...[10,base].min).each { |i| digit.update({ i=>i.to_s }) }
if base > 10
s = ('A'.ord-1).chr
(10...base).each { |i| digit.update({ i=>s=s.next }) }
end
digit
end
digit_hash(40)
#=> { 0=>"0", 1=>"1", 2=>"2", 3=>"3", 4=>"4",
# 5=>"5", 6=>"6", 7=>"7", 8=>"8", 9=>"9",
# 10=>"A", 11=>"B", 12=>"C", ..., 34=>"Y", 35=>"Z",
# 36=>"AA", 37=>"AB", 38=>"AC", 39=>"AD" }
There is a problem in displaying digits after 'Z'. Suppose, for example, the base were 65. Then one would not know whether "ABC" was 10-11-12, 37-12 or 10-64. That's a detail we needn't worry about.
For variety, I've done the base conversion from high to low, as one might do with paper and pencil for base 10:
def num_to_s(num, base)
digit = digit_hash(base)
str = ''
fac = base**((0..Float::INFINITY).find { |i| base**i > num } - 1)
until fac.zero?
d = num/fac
str << digit[d]
num -= d*fac
fac /= base
end
str
end
Let's try it:
num_to_s(134562,10) #=> "134562"
num_to_s(134562, 2) #=> "100000110110100010"
num_to_s(134562, 8) #=> "406642"
num_to_s(134562,16) #=> "20DA2"
num_to_s(134562,36) #=> "2VTU"
Let's check the last one:
digit_inv = digit_hash(36).invert
digit_inv["2"] #=> 2
digit_inv["V"] #=> 31
digit_inv["T"] #=> 29
digit_inv["U"] #=> 30
So
36*36*36*digit_inv["2"] + 36*36*digit_inv["V"] +
36*digit_inv["T"] + digit_inv["U"]
#=> 36*36*36*2 + 36*36*31 + 36*29 + 30
#=> 134562
The expression:
(0..Float::INFINITY).find { |i| base**i > num }
computes the smallest integer i such that base**i > num. Suppose, for example,
base = 10
num = 12345
then i is found to equal 5 (10**5 = 100_000). We then raise base to this number less one to get the initial factor:
fac = base**(5-1) #=> 10000
Then the first (base-10) digit is
d = num/fac #=> 1
the remainder is
num -= d*fac #=> 12345 - 1*10000 => 2345
and the factor for the next digit is:
fac /= base #=> 10000/10 => 1000
I made a couple of changes from my initial answer to make it 1.8.7-friendly (I removed Enumerator#with_object and Integer#times), but I haven't tested with 1.8.7, as I don't have that version installed. Let me know if there are any problems.
As an aside, you can use Fixnum#to_s(base) to convert a number to another base.
255.to_s(16) # => "ff"
I would do a
def get_symbol_in_base(blah)
if blah < 10
return blah
else
return (blah - 10 + 65).chr
end
end
and after that do something like:
remainders << get_symbol_in_base(num)
return remainders.reverse.to_s
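To tie the pieces together, here is a minimal sketch of the low-to-high approach with a digit lookup table instead of a hash. The method name `to_base` and the limit of base 36 are my own assumptions for illustration, not part of the exercise.

```ruby
# Convert a non-negative integer to a string in the given base (2..36).
def to_base(num, base)
  raise ArgumentError, 'base must be between 2 and 36' unless (2..36).include?(base)
  digits = ('0'..'9').to_a + ('A'..'Z').to_a # index 10 is "A", index 35 is "Z"
  return '0' if num.zero?

  out = ''
  while num > 0
    out = digits[num % base] + out # prepend the least significant digit
    num /= base
  end
  out
end

puts to_base(40002, 16)  # 9C42
puts to_base(134562, 36) # 2VTU
```

Indexing into an array of digit characters plays the same role as the conversion hash, and it answers the "conditional inside an enumerator" question by not needing a conditional at all.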
I have the following array:
array = [["Group EX (Instructor)", 0.018867924528301886], ["Personal Reasons", 0.018867924528301886]]
and I need to split this array up, dynamically, into two arrays:
text_array = ["Group EX (Instructor)", "Personal Reasons"]
number_array = [0.018867924528301886,0.018867924528301886]
I'm currently doing this, which can't be the right way:
array.each do |array|
text_array << array[0]
number_array << array[1]
end
Simply use #transpose.
array = [["Group EX (Instructor)", 0.018867924528301886], ["Personal Reasons", 0.018867924528301886]]
a1, a2 = array.transpose
#=> [["Group EX (Instructor)", "Personal Reasons"],
#    [0.018867924528301886, 0.018867924528301886]]
Repairing your existing code,
text_array = array.map { |x| x[0] } #give back first element of each subarray
number_array = array.map { |x| x[1] } #give back second element of each subarray
I would do it as below:
array = [["Group EX (Instructor)", 0.018867924528301886], ["Personal Reasons", 0.018867924528301886]]
text_array,number_array = array.flatten.partition{|e| e.is_a? String }
text_array # => ["Group EX (Instructor)", "Personal Reasons"]
number_array # => [0.018867924528301886, 0.018867924528301886]
This works too, but only when there are exactly two subarrays:
text_array, number_array = array.first.zip(array.last)
but transpose is clearly what you want.
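A short sketch of why transpose generalizes better than zip on first and last. The `rows` data, including the third "Other" entry and the rounded numbers, is made up for illustration.

```ruby
# Hypothetical data with three rows instead of two.
rows = [['Group EX (Instructor)', 0.25],
        ['Personal Reasons', 0.5],
        ['Other', 0.25]]

# transpose turns N pairs into 2 arrays of N elements each.
labels, values = rows.transpose

# The zip equivalent has to pass every remaining row explicitly.
head, *rest = rows
zipped = head.zip(*rest)

p labels # ["Group EX (Instructor)", "Personal Reasons", "Other"]
p values # [0.25, 0.5, 0.25]
```

Note that transpose raises an IndexError if the subarrays differ in length, which is usually the behaviour you want for malformed input.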
Two files with the same structure (first field = unique field/index)
File X
1,'a1','b1'
2,'a2','b20'
3,'a3','b3'
4,'a4','b4'
File Y
1,'a1','b1'
2,'a2','b2'
3,'a30','b3'
5,'a5','b5'
Goal: identify differences between these files. There are a lot of fields to compare in each file.
Requested output (maybe there is a better way to present it):
Index  X:a   X:b   Y:a   Y:b   Result
=====  ===   ===   ===   ===   ======
1      a1    b1    a1    b1    No diff
2      a2    b20   a2    b2    Diff in field b (Xb=b20, Yb=b2)
3      a3    b3    a30   b3    Diff in field a (Xa=a3, Ya=a30)
4      a4    b4    null  null  missing entries in file Y
5      null  null  a5    b5    missing entries in file X
Ruby code - what I have so far:
x = [[1,'a1','b1'], [2,'a2','b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1,'a1','b1'], [2,'a2','b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]
h = Hash.new(0)
x.each {|e|
h[e[0]] = 1
}
y.each {|e|
h[e[0]] = 1
}
x.each {|e|
p e[0]
}
I already have all keys (index) from both arrays in hash = h
It seems to be some kind of SQL join using index as a common key.
Can you give me some direction on how to iterate over both arrays to find the differences?
The problem of comparing two files is old. At the time of punched cards, forty years ago, we already had to solve it to print bills for items sold every day. One file was the customer file (primary file), the second was the deck of cards punched from delivery forms (secondary file). Each record (card) in this secondary file contained both the customer number and the item number. Both files were sorted on customer number, and the algorithm was called matching. It consists of reading one record from each file, comparing the common key, and selecting one of three possible cases :
primary key < secondary key : skip this customer (normal, there are
    more customers in the customer file than in today's sales)
    Read next primary record
primary key = secondary key : print a bill
    Read next customer record
    Read and print items from secondary file until the customer number changes
primary key > secondary key : typo in the secondary file or new customer,
    not yet added in the customer file
    Print error message (not a valid customer)
    Read next secondary record
The read loop continues as long as there are records to read, that is as long as both files are not at EOF (end of file). The core part of a bigger Matching module I have written in Ruby is :
def matching(p_actionSmaller, p_actionEqual, p_actionGreater)
  read_primary
  read_secondary
  while ! @eof_primary || ! @eof_secondary
    case
    when @primary_key < @secondary_key
      p_actionSmaller.call(self)
      read_primary
    when @primary_key == @secondary_key
      p_actionEqual.call(self)
      read_primary
      read_secondary
    when @primary_key > @secondary_key
      p_actionGreater.call(self)
      read_secondary
    end
  end
end
Here is a simplified version adapted to your array problem :
# input "files" :
x = [ [2,'a2','b20'], [3, 'a3', 'b3'], [4,'a4','b4'] ]
y = [[1,'a1','b1'], [2,'a2','b2' ], [3, 'a30', 'b3'], [5, 'a5', 'b5']]
puts '--- input --- :'
print 'x='; p x
print 'y='; p y
xh = Hash.new
yh = Hash.new
# converted to hash for easy extraction of data :
x.each do |a|
key, *value = a
xh[key] = value
end
y.each do |a|
key, *value = a
yh[key] = value
end
puts '--- as hash --- :'
print 'xh='; p xh
print 'yh='; p yh
# sort keys for matching both "files" on the same key :
@xkeys = xh.keys.sort
@ykeys = yh.keys.sort
print '@xkeys='; p @xkeys
print '@ykeys='; p @ykeys
# simplified algorithm, where EOF is replaced by HIGH_VALUE :
@x_index = -1
@y_index = -1
HIGH_VALUE = 255
def read_primary
  @x_index += 1 # read next record
  # The primary key is extracted from the record.
  # At EOF it is replaced by HIGH_VALUE, usually x'FFFFFF'
  @primary_key = @xkeys[@x_index] || HIGH_VALUE
  # @xkeys[@x_index] returns nil if key does not exist, nil || H returns H
end
def read_secondary
  @y_index += 1
  @secondary_key = @ykeys[@y_index] || HIGH_VALUE
end
puts '--- matching --- :'
read_primary
read_secondary
while @x_index < @xkeys.length || @y_index < @ykeys.length
  case
  when @primary_key < @secondary_key
    puts "case < : #{@primary_key} < #{@secondary_key}"
    puts "x #{xh[@primary_key].inspect} has no equivalent in y"
    read_primary
  when @primary_key == @secondary_key
    puts "case = : #{@primary_key} = #{@secondary_key}"
    puts "compare #{xh[@primary_key].inspect} with #{yh[@primary_key].inspect}"
    read_primary
    read_secondary
  when @primary_key > @secondary_key
    puts "case > : #{@primary_key} > #{@secondary_key}"
    puts "y #{yh[@secondary_key].inspect} has no equivalent in x"
    read_secondary
  end
end
Execution :
$ ruby -w t.rb
--- input --- :
x=[[2, "a2", "b20"], [3, "a3", "b3"], [4, "a4", "b4"]]
y=[[1, "a1", "b1"], [2, "a2", "b2"], [3, "a30", "b3"], [5, "a5", "b5"]]
--- as hash --- :
xh={2=>["a2", "b20"], 3=>["a3", "b3"], 4=>["a4", "b4"]}
yh={5=>["a5", "b5"], 1=>["a1", "b1"], 2=>["a2", "b2"], 3=>["a30", "b3"]}
@xkeys=[2, 3, 4]
@ykeys=[1, 2, 3, 5]
--- matching --- :
case > : 2 > 1
y ["a1", "b1"] has no equivalent in x
case = : 2 = 2
compare ["a2", "b20"] with ["a2", "b2"]
case = : 3 = 3
compare ["a3", "b3"] with ["a30", "b3"]
case < : 4 < 5
x ["a4", "b4"] has no equivalent in y
case > : 255 > 5
y ["a5", "b5"] has no equivalent in x
I leave the presentation of the differences to you.
HTH
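For the presentation step, one possible sketch follows. It swaps the sorted matching loop for a simpler hash lookup over the union of keys, which is fine when both inputs fit in memory. The field names in `fields` are assumed from the requested output, and `report` is an illustrative name.

```ruby
x = [[1, 'a1', 'b1'], [2, 'a2', 'b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1, 'a1', 'b1'], [2, 'a2', 'b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]
fields = %w[a b] # assumed field names, taken from the requested output

# Index each "file" by its first field.
xh = {}
x.each { |key, *values| xh[key] = values }
yh = {}
y.each { |key, *values| yh[key] = values }

# Walk the union of keys and describe each record's status.
report = (xh.keys | yh.keys).sort.map do |key|
  xv = xh[key]
  yv = yh[key]
  result =
    if xv.nil?
      'missing entries in file X'
    elsif yv.nil?
      'missing entries in file Y'
    else
      diffs = fields.each_index.select { |i| xv[i] != yv[i] }
      if diffs.empty?
        'No diff'
      else
        diffs.map { |i| "Diff in field #{fields[i]} (X#{fields[i]}=#{xv[i]}, Y#{fields[i]}=#{yv[i]})" }.join('; ')
      end
    end
  [key, result]
end

report.each { |key, result| puts "#{key}  #{result}" }
```

For files too large to load at once, the sorted matching loop above remains the better fit, since it only needs one record from each file at a time.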
If I had a list of balls, each of which has a color property, how could I cleanly get the list of balls with the most frequent color?
[m1,m2,m3,m4]
say,
m1.color = blue
m2.color = blue
m3.color = red
m4.color = blue
[m1,m2,m4] is the list of balls with the most frequent color
My Approach is to do:
[m1,m2,m3,m4].group_by{|ball| ball.color}.each do |samecolor|
my_items = samecolor.count
end
where count is defined as
class Array
def count
k =Hash.new(0)
self.each{|x|k[x]+=1}
k
end
end
my_items will be a hash of frequencies for each same-color group. My implementation could be buggy, and I feel there must be a better and smarter way.
Any ideas, please?
You found group_by but missed max_by
max_color, max_balls = [m1,m2,m3,m4].group_by {|b| b.color}.max_by {|color, balls| balls.length}
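Here is a self-contained sketch of the same idea that can be run as is. The Struct with `name` and `color` members is a hypothetical stand-in for whatever class the balls really are.

```ruby
# Hypothetical stand-in for the ball class; only #color matters here.
ball = Struct.new(:name, :color)
balls = [ball.new('m1', 'blue'), ball.new('m2', 'blue'),
         ball.new('m3', 'red'),  ball.new('m4', 'blue')]

# group_by builds { color => [balls] }; max_by picks the pair with the largest group.
color, most_common = balls.group_by(&:color).max_by { |_color, group| group.size }

puts color               # blue
p most_common.map(&:name) # ["m1", "m2", "m4"]
```

If two colors tie for the largest group, max_by returns just one of them, so a tie would need extra handling.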
Your code isn't bad, but it is inefficient. If I were you I would seek a solution that iterates through your array only once, like this:
balls = [m1, m2, m3, m4]
most_idx = nil
groups = balls.inject({}) do |hsh, ball|
hsh[ball.color] = [] if hsh[ball.color].nil?
hsh[ball.color] << ball
most_idx = ball.color if hsh[most_idx].nil? || hsh[ball.color].size > hsh[most_idx].size
hsh
end
groups[most_idx] # => [m1,m2,m4]
This does basically the same thing as group_by, but at the same time it counts up the groups and keeps a record of which group is largest (most_idx).
How about:
color, balls = [m1,m2,m3,m4].group_by { |b| b.color }.max_by { |_color, group| group.size }
Here's how I'd do it. The basic idea uses inject to accumulate the values into a hash, and comes from "12 - Building a Histogram" in "The Ruby Cookbook".
#!/usr/bin/env ruby
class M
attr_reader :color
def initialize(c)
@color = c
end
end
m1 = M.new('blue')
m2 = M.new('blue')
m3 = M.new('red')
m4 = M.new('blue')
hash = [m1.color, m2.color, m3.color, m4.color].inject(Hash.new(0)){ |h, x| h[x] += 1; h } # => {"blue"=>3, "red"=>1}
hash = [m1, m2, m3, m4].inject(Hash.new(0)){ |h, x| h[x.color] += 1; h } # => {"blue"=>3, "red"=>1}
There are two different ways to do it, depending on how much you want the inject() block to know about your objects.
This produces a list of [color, count] pairs, sorted from most to least frequent:
balls.group_by { |b| b.color }
.map { |k, v| [k, v.size] }
.sort_by { |k, count| -count}
Two parts: I'll use your strange balls example, but will also include my own Rails example.
ary = [m1,m2,m3,m4]
colors = ary.map(&:color) # or ary.map { |t| t.color }
Hash[colors.group_by { |c| c }.map { |c, cs| [c, cs.length] }]
#=> {"blue" => 3, "red" => 1 }
My ActiveRecord example:
stocks = Sp500Stock.all
Hash[stocks.group_by(&:sector).map { |w, s| [w, s.length] }].sort_by { |k,v| v }
#=> an array of [sector, count] pairs in ascending order of count, e.g. ["Health Care", 36]
myhash = {}
mylist.each do |ball|
if myhash[ball.color]
myhash[ball.color] += 1
else
myhash[ball.color] = 1
end
end
puts myhash.sort{|a,b| b[1] <=> a[1]}
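The counting loop above can also be written with each_with_object and a default-valued hash. A small sketch, using a plain `colors` array of strings as stand-in data:

```ruby
# Stand-in data: the colors of the four balls from the question.
colors = %w[blue blue red blue]

# Hash.new(0) supplies 0 for unseen keys, so no nil check is needed.
counts = colors.each_with_object(Hash.new(0)) { |color, tally| tally[color] += 1 }

# Sort by descending count.
ranked = counts.sort_by { |_color, count| -count }

p ranked # [["blue", 3], ["red", 1]]
```

On Ruby 2.7 and later, Enumerable#tally produces the same counts in a single call.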