Ruby: identify differences in two arrays with multiple fields

Two files with the same structure (first field = unique key/index)
File X
1,'a1','b1'
2,'a2','b20'
3,'a3','b3'
4,'a4','b4'
File Y
1,'a1','b1'
2,'a2','b2'
3,'a30','b3'
5,'a5','b5'
Goal: identify differences between these files. There are a lot of fields to compare in each file.
Requested output (maybe there is a better way to present it):
Index  X:a   X:b   Y:a   Y:b   Result
=====  ===   ===   ===   ===   ======
1      a1    b1    a1    b1    No diff
2      a2    b20   a2    b2    Diff in field b (Xb=b20, Yb=b2)
3      a3    b3    a30   b3    Diff in field a (Xa=a3, Ya=a30)
4      a4    b4    null  null  missing entries in file Y
5      null  null  a5    b5    missing entries in file X
Ruby code - what I have so far:
x = [[1,'a1','b1'], [2,'a2','b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1,'a1','b1'], [2,'a2','b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]
h = Hash.new(0)
x.each {|e|
h[e[0]] = 1
}
y.each {|e|
h[e[0]] = 1
}
x.each {|e|
p e[0]
}
I already have all keys (index) from both arrays in hash = h
It seems to be some kind of SQL join using index as a common key.
Can you give me some direction on how to iterate over both arrays to find the differences?

The problem of comparing two files is old. In the days of punched cards, forty years ago, we already had to solve it to print bills for items sold every day. One file was the customer file (primary file); the second was the deck of cards punched from delivery forms (secondary file). Each record (card) in the secondary file contained both the customer number and the item number. Both files were sorted on customer number, and the algorithm was called matching. It consists of reading one record from each file, comparing the common key, and selecting one of three possible cases:
primary key < secondary key : skip this customer (normal, there are
                              more customers in the customer file than in today's sales)
    Read next primary record
primary key = secondary key : print a bill
    Read next customer record
    Read and print items from secondary file until the customer number changes
primary key > secondary key : typo in the secondary file or new customer,
                              not yet added in the customer file
    Print error message (not a valid customer)
    Read next secondary record
The read loop continues as long as there are records to read, that is, until both files have reached EOF (end of file). The core part of a bigger Matching module I have written in Ruby is:
def matching(p_actionSmaller, p_actionEqual, p_actionGreater)
  read_primary
  read_secondary
  while !@eof_primary || !@eof_secondary
    case
    when @primary_key < @secondary_key
      p_actionSmaller.call(self)
      read_primary
    when @primary_key == @secondary_key
      p_actionEqual.call(self)
      read_primary
      read_secondary
    when @primary_key > @secondary_key
      p_actionGreater.call(self)
      read_secondary
    end
  end
end
Here is a simplified version adapted to your array problem :
# input "files" :
x = [ [2,'a2','b20'], [3, 'a3', 'b3'], [4,'a4','b4'] ]
y = [[1,'a1','b1'], [2,'a2','b2' ], [3, 'a30', 'b3'], [5, 'a5', 'b5']]
puts '--- input --- :'
print 'x='; p x
print 'y='; p y
xh = Hash.new
yh = Hash.new
# converted to hash for easy extraction of data :
x.each do |a|
key, *value = a
xh[key] = value
end
y.each do |a|
key, *value = a
yh[key] = value
end
puts '--- as hash --- :'
print 'xh='; p xh
print 'yh='; p yh
# sort keys for matching both "files" on the same key :
@xkeys = xh.keys.sort
@ykeys = yh.keys.sort
print '@xkeys='; p @xkeys
print '@ykeys='; p @ykeys
# simplified algorithm, where EOF is replaced by HIGH_VALUE :
@x_index = -1
@y_index = -1
HIGH_VALUE = 255
def read_primary
  @x_index += 1 # read next record
  # The primary key is extracted from the record.
  # At EOF it is replaced by HIGH_VALUE, usually x'FFFFFF'
  @primary_key = @xkeys[@x_index] || HIGH_VALUE
  # @xkeys[@x_index] returns nil if key does not exist, nil || H returns H
end
def read_secondary
  @y_index += 1
  @secondary_key = @ykeys[@y_index] || HIGH_VALUE
end
puts '--- matching --- :'
read_primary
read_secondary
while @x_index < @xkeys.length || @y_index < @ykeys.length
  case
  when @primary_key < @secondary_key
    puts "case < : #{@primary_key} < #{@secondary_key}"
    puts "x #{xh[@primary_key].inspect} has no equivalent in y"
    read_primary
  when @primary_key == @secondary_key
    puts "case = : #{@primary_key} = #{@secondary_key}"
    puts "compare #{xh[@primary_key].inspect} with #{yh[@primary_key].inspect}"
    read_primary
    read_secondary
  when @primary_key > @secondary_key
    puts "case > : #{@primary_key} > #{@secondary_key}"
    puts "y #{yh[@secondary_key].inspect} has no equivalent in x"
    read_secondary
  end
end
Execution :
$ ruby -w t.rb
--- input --- :
x=[[2, "a2", "b20"], [3, "a3", "b3"], [4, "a4", "b4"]]
y=[[1, "a1", "b1"], [2, "a2", "b2"], [3, "a30", "b3"], [5, "a5", "b5"]]
--- as hash --- :
xh={2=>["a2", "b20"], 3=>["a3", "b3"], 4=>["a4", "b4"]}
yh={5=>["a5", "b5"], 1=>["a1", "b1"], 2=>["a2", "b2"], 3=>["a30", "b3"]}
@xkeys=[2, 3, 4]
@ykeys=[1, 2, 3, 5]
--- matching --- :
case > : 2 > 1
y ["a1", "b1"] has no equivalent in x
case = : 2 = 2
compare ["a2", "b20"] with ["a2", "b2"]
case = : 3 = 3
compare ["a3", "b3"] with ["a30", "b3"]
case < : 4 < 5
x ["a4", "b4"] has no equivalent in y
case > : 255 > 5
y ["a5", "b5"] has no equivalent in x
I leave the presentation of the differences to you.
HTH
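For the presentation step, here is a minimal sketch that is not part of the answer above. It swaps the sorted matching loop for a lookup over the union of keys from both hashes, which is simpler when both inputs fit in memory. The method name diff_rows and the field names in FIELDS are made up for illustration, and Array#to_h with a block assumes Ruby 2.6 or later.

```ruby
x = [[1, 'a1', 'b1'], [2, 'a2', 'b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1, 'a1', 'b1'], [2, 'a2', 'b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]
FIELDS = %w[a b] # hypothetical names for the fields after the key

# Returns one [key, result] pair per key found in either input.
def diff_rows(x, y, fields)
  xh = x.to_h { |key, *rest| [key, rest] } # key => remaining fields
  yh = y.to_h { |key, *rest| [key, rest] }
  (xh.keys | yh.keys).sort.map do |key|
    xv = xh[key]
    yv = yh[key]
    result =
      if xv.nil?
        'missing in X'
      elsif yv.nil?
        'missing in Y'
      else
        # indexes of the fields whose values differ
        changed = fields.each_index.select { |i| xv[i] != yv[i] }
        if changed.empty?
          'No diff'
        else
          changed.map { |i| "field #{fields[i]}: X=#{xv[i]}, Y=#{yv[i]}" }.join('; ')
        end
      end
    [key, result]
  end
end

diff_rows(x, y, FIELDS).each { |key, result| puts "#{key}  #{result}" }
```

Because every field index is compared in a loop, the same code should cope with records that have many fields, as long as FIELDS lists a name for each of them.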

Related

Ruby select multiple elements from a .split

String:
string = "this;is;a;string;yes"
I can split the string and append each element to an array like this
arr = []
string.split(";").each do |x|
arr << x
end
Is there an easy way to take the first, third and fourth values, other than something like this?
i = 0
string.split(";").each do |x|
  arr << x if i == 0 or i == 2 or i == 3
  i += 1
end
Sure. Use Array#values_at:
string = "this;is;a;string;yes"
string.split(";").values_at(0, 2, 3)
# => ["this", "a", "string"]
See it on repl.it: https://repl.it/#jrunning/FussyRecursiveSpools
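A few more variations on the same call, as a small sketch (the variable names are only for illustration): values_at also accepts ranges, returns nil for indexes that do not exist, and combines well with parallel assignment.

```ruby
parts = "this;is;a;string;yes".split(";")

picked       = parts.values_at(0, 2, 3)  # individual indexes
with_range   = parts.values_at(0, 2..3)  # an index plus a range
out_of_range = parts.values_at(0, 10)    # a missing index yields nil

# Parallel assignment gives each selected value its own name.
first, third, fourth = parts.values_at(0, 2, 3)

p picked, with_range, out_of_range
```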

Why am I getting an IndexError

I'm trying to write some code that will take an array of numbers and print a string representation of the range of the numbers.
def rng (arr)
str = arr[0].to_s
idx = 1
arr.each do |i|
next if arr.index(i) == 0
if arr[arr.index(i)-1] == i - 1
unless str[idx - 1] == "-"
str[idx] = "-"
#else next
end
#puts "if statement str: #{str}, idx: #{idx}"
else
str[idx] = arr[arr.index(i)-1].to_s
idx += 1
str[idx] = ","+ i.to_s
end
idx += 1
end
puts "str = #{str} and idx = #{idx}"
end
rng [0, 1, 2, 3, 8] #"0-3, 8"
I get this error:
arrayRange_0.rb:9:in `[]=': index 3 out of string (IndexError)
Can anyone explain why? When I uncomment the else next it works. Not sure why.
When you get that error, str contains the value 0- which is only 2 characters long - therefore it can't be indexed to the position of 3.
Add this line before line 9, which is causing your error:
puts "str = #{str}, idx = #{idx}"
It will output:
str = 0, idx = 1
str = 0-, idx = 3
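The rule behind the error can be reproduced in isolation. In this small sketch (variable names are illustrative), assigning at an index equal to the string's length appends, while any index beyond that raises IndexError:

```ruby
str = "0-".dup        # dup so the literal can be modified safely
str[2] = ","          # index 2 == length, so this appends

error = nil
begin
  short = "0-".dup
  short[3] = ","      # index 3 is past the end of a 2-character string
rescue IndexError => e
  error = e
end

puts str
puts error.message
```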
Here is how you could do it:
def rng(arr)
ranges = []
arr.each do |v|
if ranges.last && ranges.last.include?(v-1)
# If this is the next consecutive number
# store it in the second element
ranges.last[1] = v
else
# Add a new array with current value as the first number
ranges << [v]
end
end
# Make a list of strings from the ranges
# [[0,3], [8]] becomes ["0-3", "8"]
range_strings = ranges.map{|range| range.join('-') }
range_strings.join(', ')
end
p rng [0, 1, 2, 3, 8]
# results in "0-3, 8"
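As an alternative sketch, not part of the answer above, the same grouping can be expressed with Enumerable#slice_when (Ruby 2.2 or later), which cuts the array wherever two neighbours are not consecutive. The method name rng_by_runs is made up, and the input is assumed to be sorted.

```ruby
def rng_by_runs(arr)
  arr.slice_when { |prev, curr| curr != prev + 1 } # split at each gap
     .map { |run| run.size > 1 ? "#{run.first}-#{run.last}" : run.first.to_s }
     .join(', ')
end

p rng_by_runs([0, 1, 2, 3, 8])
```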
Like the previous answer says, your index is outside of the string

I have a issue.. Appending

I have this code:
#!/local/usr/bin/ruby

users = (1..255).to_a

x = " "
y = " "
z = " "
#a = " "

count = 1
users.each do |i|
  x << i if count == 1
  y << i if count == 2
  z << i if count == 3
  # if x.length == 60
  #   a << i if count == 1
  #   a << i if count == 2
  #   a << i if count == 3
  # else
  # end
  if count == 3
    count = 1
  else
    count += 1
  end
end

puts x.length
puts y.length
puts z.length
#puts a.length
What this code does is append the numbers 1-255 into three different strings and output how many numbers are in each string.
IT WORKS
Example of working code:
[user@server ruby]$ ruby loadtest.rb
86
86
86
[user@server ruby]$
Now I want to add a failsafe string called a, as seen above (commented out). Once each string contains 60 numbers, I want the remaining numbers to be appended to a until there are no more numbers.
When I try to do it with the commented out section it outputs this:
[user@server ruby]$ ruby loadtest.rb
86
86
86
4
[user@server ruby]$ ruby loadtest.rb
WHY?! What am I doing wrong?
What this code does is append the numbers 1-255 into three different strings and output how many numbers are in each string.
After reducing the number of values being iterated for readability, here's what it's doing:
users = (1..5).to_a
x = " "
y = " "
z = " "
count = 1
users.each do |i|
x << i if count == 1 # => " \u0001", nil, nil, " \u0001\u0004", nil
y << i if count == 2 # => nil, " \u0002", nil, nil, " \u0002\u0005"
z << i if count == 3 # => nil, nil, " \u0003", nil, nil
if count == 3
count = 1
else
count += 1
end
end
x # => " \u0001\u0004"
y # => " \u0002\u0005"
z # => " \u0003"
puts x.length
puts y.length
puts z.length
# >> 3
# >> 3
# >> 2
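Your code is not appending digits. String#<< with an Integer argument appends the character with that codepoint, so the strings fill up with binary characters rather than "numbers" as we normally think of them.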
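A two-line comparison makes the difference visible. This is only an illustrative sketch; the variable names are made up:

```ruby
as_codepoint = " ".dup
as_codepoint << 65        # Integer argument: appends the character "A"

as_digits = " ".dup
as_digits << 65.to_s      # String argument: appends the digits "65"

p as_codepoint, as_digits
```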
Moving on, you can clean up your logic using each_with_index and case/when. To make the results more readable I switched from accumulating into strings to accumulating into arrays:
users = (1..5).to_a
x = []
y = []
z = []
users.each_with_index do |i, count|
case count % 3
when 0
x << i
when 1
y << i
when 2
z << i
end
end
x # => [1, 4]
y # => [2, 5]
z # => [3]
puts x.length
puts y.length
puts z.length
# >> 2
# >> 2
# >> 1
The real trick in this is the use of %, which does a modulo on the value.
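To see why the modulo works, note that index % 3 cycles through 0, 1, 2 as the index grows. The sketch below uses that result directly as an array index, which removes the case/when entirely; the name buckets is made up for illustration.

```ruby
users = (1..5).to_a
buckets = Array.new(3) { [] }   # three separate empty arrays

users.each_with_index do |value, index|
  buckets[index % 3] << value   # index % 3 is always 0, 1 or 2
end

remainders = (0..6).map { |n| n % 3 }

p buckets
p remainders
```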
... if each string contains 60 numbers I want it to append into the a string until there are no more numbers
As written, you are unconditionally appending to x,y,z even after they hit your limit.
You need to add a conditional around this code:
x << i if count == 1
y << i if count == 2
z << i if count == 3
so that it stops appending once it hits your limit.
By the looks of the else block that does nothing, I think you were headed in that direction:
if x.length == 60
a << i if count == 1
a << i if count == 2
a << i if count == 3
else
x << i if count == 1
y << i if count == 2
z << i if count == 3
end
Even that, though, won't do exactly what you want.
You'll want to check the string you are appending to to see if it has hit your limit yet.
I'd suggest refactoring to make it cleaner:
users.each do |i|
target_string = case count
when 1 then x
when 2 then y
when 3 then z
end
target_string = a if target_string.length == 60
target_string << i
if count == 3
count = 1
else
count += 1
end
end
It may be better to use an array instead of string as you are pushing numbers into those variables.
Let me propose a solution which achieves more or less what you are trying to do, but uses a few Ruby tricks that may be useful in the future.
x, y, z = r = Array.new(3) {[]}
a = []
iter = [0,1,2].cycle
(1..255).each do |i|
  r.all? {|arr| arr.size == 60} ? a << i : r[iter.next] << i
end
p x.size, y.size, z.size
p a.size
Let's define our arrays. Even though I have arrays x, y, and z, they are there only because they were present in your code. I think we just need three arrays, each collecting numbers as they are picked one by one from the range 1 to 255. x, y, z = r uses parallel assignment and is equivalent to x, y, z = r[0], r[1], r[2]. Using Array.new(3) {[]} creates an array of three arrays, each initialized to its own separate empty array ([]).
x, y, z = r = Array.new(3) {[]}
a = []
In order to determine which array the next number picked from the range has to be placed in, we will use an Enumerator generated from Enumerable#cycle. This enumerator is special because it is sort of infinite in nature: we can keep asking it for an element by calling next, and it will cycle through the elements of [0,1,2], returning 0,1,2,0,1,2,0,1,2... indefinitely.
iter = [0,1,2].cycle
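A quick way to watch the enumerator wrap around is to call next a few more times than the array has elements. This is only a sketch; the name first_seven is made up:

```ruby
iter = [0, 1, 2].cycle

# Each call to next advances the enumerator; after 2 it starts over at 0.
first_seven = Array.new(7) { iter.next }

p first_seven
```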
Next, we iterate through the range 1..255. On each iteration we check, with the help of Enumerable#all?, whether all three collecting arrays have reached the desired size of 60. If so, we append the number to array a; otherwise we append it to one of the sub-arrays of r, chosen by the index that the iter enumerator returns.
(1..255).each do |i|
  r.all? {|arr| arr.size == 60} ? a << i : r[iter.next] << i
end
Finally, we print the size of each array.
p x.size, y.size, z.size
#=> 60, 60, 60
p a.size
#=> 75

Double index numbers in array (Ruby)

The array counts is as follows:
counts = ["a", 1]
What does this:
counts[0][0]
refer to?
I've only seen this before:
array[idx]
but never this:
array[idx][idx]
where idx is an integer.
This is the entire code where the snippet of code before was from:
def num_repeats(string) #abab
counts = [] #array
str_idx = 0
while str_idx < string.length #1 < 4
letter = string[str_idx] #b
counts_idx = 0
while counts_idx < counts.length #0 < 1
if counts[counts_idx][0] == letter #if counts[0][0] == b
counts[counts_idx][1] += 1
break
end
counts_idx += 1
end
if counts_idx == counts.length #0 = 0
# didn't find this letter in the counts array; count it for the
# first time
counts.push([letter, 1]) #counts = ["a", 1]
end
str_idx += 1
end
num_repeats = 0
counts_idx = 0
while counts_idx < counts.length
if counts[counts_idx][1] > 1
num_repeats += 1
end
counts_idx += 1
end
return counts
end
The statement
arr[0]
gets the first item of the array arr. In some cases that item may itself be an array (or another indexable object), which means you can index into it in turn:
# if arr = [["item", "another"], "last"]
item = arr[0]
inner_item = item[0]
puts inner_item # => "item"
This can be shortened to
arr[0][0]
So any 2 dimensional array or array containing indexable objects can work like this, e.g. with an array of strings:
arr = ["String 1", "Geoff", "things"]
arr[0] # => "String 1"
arr[0][0] # => "S"
arr[1][0] # => "G"
It's for nested indexing
a = [ "item 0", [1, 2, 3] ]
a[0] #=> "item 0"
a[1] #=> [1, 2, 3]
a[1][0] #=> 1
Since the value at index 1 is another array you can use index referencing on that value as well.
EDIT
Sorry I didn't thoroughly read the original question. The array in question is
counts = ["a", 1]
In this case counts[0] returns "a", and since we can use indexes to reference the characters of a string, the 0th index in the string "a" is simply "a".
str = "hello"
str[2] #=> "l"
str[1] #=> "e"
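One more point, offered as a sketch: inside num_repeats the code calls counts.push([letter, 1]), so counts is actually built as an array of [letter, count] pairs rather than a flat ["a", 1]. Assuming the pairs below (what the method would hold after scanning "abab"), double indexing reads as "pick the pair, then pick the item within it". Array#dig requires Ruby 2.3 or later.

```ruby
counts = [["a", 2], ["b", 2]]   # assumed contents after scanning "abab"

pair         = counts[0]        # the first [letter, count] pair
letter       = counts[0][0]     # the letter inside that pair
occurrences  = counts[0][1]     # the count inside that pair
second_char  = counts.dig(1, 0) # same as counts[1][0]

p pair, letter, occurrences, second_char
```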

Printing a readable Matrix in Ruby

Is there a built in way of printing a readable matrix in Ruby?
For example
require 'matrix'
m1 = Matrix[[1,2], [3,4]]
print m1
and have it show
=> 1 2
3 4
in the REPL instead of:
=> Matrix[[1, 2], [3, 4]]
The Ruby Docs for matrix make it look like that's what should happen, but that's not what I'm seeing. I know that it would be trivial to write a function to do this, but if there is a 'right' way I'd rather learn!
You could convert it to an array:
m1.to_a.each {|r| puts r.inspect}
=> [1, 2]
[3, 4]
EDIT:
Here is a "point free" version:
puts m1.to_a.map(&:inspect)
I couldn't get it to look like the documentation so I wrote a function for you that accomplishes the same task.
require 'matrix'
m1 = Matrix[[1,2],[3,4],[5,6]]
class Matrix
def to_readable
i = 0
self.each do |number|
print number.to_s + " "
i+= 1
if i == self.column_size
print "\n"
i = 0
end
end
end
end
m1.to_readable
=> 1 2
3 4
5 6
Disclaimer: I'm the lead developer for NMatrix.
It's trivial in NMatrix. Just do matrix.pretty_print.
The columns aren't cleanly aligned, but that'd be easy to fix and we'd love any contributions to that effect.
Incidentally, nice to see a fellow VT person on here. =)
You can use the each_slice method combined with the column_size method.
m1.each_slice(m1.column_size) {|r| p r }
=> [1,2]
[3,4]
Ok, I'm a total newbie in ruby programming. I'm just making my very first incursions, but it happens I got the same problem and made this quick'n'dirty approach.
Works with the standard Matrix library and will print columns formatted with same size.
require 'matrix'
class Matrix
  def to_readable
    column_counter = 0
    columns_arrays = []
    while column_counter < self.column_size
      maximum_length = 0
      self.column(column_counter).each do |column_element| # Get maximum size
        length = column_element.to_s.size
        if length > maximum_length
          maximum_length = length
        end
      end # now we've got the maximum size
      column_array = []
      self.column(column_counter).each do |column_element| # Add needed spaces to equalize each column
        element_string = column_element.to_s
        element_size = element_string.size
        space_needed = maximum_length - element_size + 1
        if space_needed > 0
          space_needed.times {element_string.prepend " "}
          if column_counter == 0
            element_string.prepend "["
          else
            element_string.prepend ","
          end
        end
        column_array << element_string
      end
      columns_arrays << column_array # Now columns contains equal size strings
      column_counter += 1
    end
    row_counter = 0
    while row_counter < self.row_size
      columns_arrays.each do |column|
        element = column[row_counter]
        print element # Each column yields the corresponding row in order
      end
      print "]\n"
      row_counter += 1
    end
  end
end
Any correction or upgrades welcome!
This is working for me
require 'matrix'
class Matrix
def print
matrix = self.to_a
field_size = matrix.flatten.collect{|i|i.to_s.size}.max
matrix.each do |row|
puts (row.collect{|i| ' ' * (field_size - i.to_s.size) + i.to_s}).join(' ')
end
end
end
m = Matrix[[1,23,3],[123,64.5, 2],[0,0,0]]
m.print
Here is my answer:
require 'matrix'
class Matrix
  def to_pretty_s
    s = ""
    i = 0
    while i < self.row_size # i walks the rows
      s += "\n" if i != 0
      j = 0
      while j < self.column_size # j walks the columns
        s += ' ' if j != 0
        s += self.element(i, j).to_s
        j += 1
      end
      i += 1
    end
    s
  end
end
m = Matrix[[0, 3], [3, 4]]
puts m # same as 'puts m.to_s'
# Matrix[[0, 3], [3, 4]]
puts m.to_pretty_s
# 0 3
# 3 4
p m.to_pretty_s
# "0 3\n3 4"
You can then call Matrix#to_pretty_s to get a nicely formatted string.
There is no built-in Ruby way of doing this. However, I have created a module that can be included into Matrix and provides a method called readable. The code is in the following block.
require 'matrix'
module ReadableArrays
def readable(factor: 1, method: :rjust)
repr = to_a.map { |row|
row.map(&:inspect)
}
column_widths = repr.transpose.map { |col|
col.map(&:size).max + factor
}
res = ""
repr.each { |row|
row.each_with_index { |el, j|
res += el.send method, column_widths[j]
}
res += "\n"
}
res.chomp
end
end
## example usage ##
class Matrix
include ReadableArrays
end
class Array
include ReadableArrays
end
arr = [[1, 20, 3], [20, 3, 19], [-32, 3, 5]]
mat = Matrix[*arr]
p arr
#=> [[1, 20, 3], [20, 3, 19], [-32, 3, 5]]
p mat
#=> Matrix[[1, 20, 3], [20, 3, 19], [-32, 3, 5]]
puts arr.readable
#=>
#    1 20  3
#   20  3 19
#  -32  3  5
puts mat.readable
#=>
#    1 20  3
#   20  3 19
#  -32  3  5
puts mat.readable(method: :ljust)
#=>
# 1   20 3
# 20  3  19
# -32 3  5
puts mat.readable(method: :center)
#=>
#  1  20  3
#  20  3 19
# -32  3  5
I just ran into this problem and haven't seen anyone post this approach here, so I'll share my solution in case it helps someone. I know two nested for loops are not the best idea, but for smaller matrices it should be okay, and it prints nicely, just how you want it, without needing require 'matrix' or 'pp'.
matrix = Array.new(numRows) { Array.new(numCols) { arrToTakeValuesFrom.sample } }
for i in 0..numRows-1 do
for j in 0..numCols-1 do
print " #{matrix[i][j]} "
end
puts ""
end
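If the nested for loops feel clumsy, the same idea can be sketched with map and join on a plain nested array. The sample data below is made up, and padding every cell to the widest element is one possible choice among several:

```ruby
matrix = [[1, 23, 3], [123, 64.5, 2], [0, 0, 0]] # sample data, made up

# Width of the widest element once converted to a string.
width = matrix.flatten.map { |e| e.to_s.size }.max

# Right-align every cell to that width and join the cells of each row.
lines = matrix.map { |row| row.map { |e| e.to_s.rjust(width) }.join(' ') }

puts lines
```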
