pure ruby: calculate sparse matrix rank fast(er)

How do I speed up the rank calculation of a sparse matrix in pure ruby?
I'm currently calculating the rank of a matrix (std-lib Matrix) to determine the rigidity of a graph.
That means I have a sparse matrix ranging from about 2 rows x 9 columns up to about 300 rows x 300 columns.
That translates to several seconds to determine the rank of the matrix, which is far too slow for a GUI application.
Because I use Sketchup I am bound to Ruby 2.0.0.
I'd like to avoid the hassle of setting up gcc on windows, so nmatrix is (I think) not a good option.
Edit:
Example matrix:
[[12, -21, 0, -12, 21, 0, 0, 0, 0],
[12, -7, -20, 0, 0, 0, -12, 7, 20],
[0, 0, 0, 0, 14, -20, 0, -14, 20]]
Edit2:
I am using integers instead of floats, which speeds it up considerably.
I have also added a fail-fast mechanism earlier in the code to avoid calling the slow rank function at all.
Edit3:
Part of the code
require 'matrix'

def rigid?(proto_matrix, nodes)
  matrix_base = Array.new(proto_matrix.size) { |index|
    # initialize the row with 0
    arr = Array.new(nodes.size * 3, 0.to_int)
    proto_row = proto_matrix[index]
    # ids of the nodes in the graph
    node_ids = proto_row.map { |hash| hash[:id] }
    # set the values of both of the nodes' positions
    [0, 1].each { |i|
      vertex_index = vertices.find_index(node_ids[i])
      # predetermined vector associated with the node
      vec = proto_row[i][:vec]
      arr[vertex_index * 3] = vec.x.to_int
      arr[vertex_index * 3 + 1] = vec.y.to_int
      arr[vertex_index * 3 + 2] = vec.z.to_int
    }
    arr
  }
  matrix = Matrix::rows(matrix_base, false)
  rank = matrix.rank
  # the graph is rigid if the rank of the matrix is greater than or equal
  # to the number of node coordinates minus the degrees of freedom
  # of the whole graph
  rank >= nodes.size * 3 - 6
end
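Not part of the original question, but for illustration: one pure-Ruby direction that matches the Edit2 note about integer arithmetic is fraction-free (Bareiss-style) elimination, which keeps every intermediate value an exact Integer and so avoids both Float precision issues and Rational overhead. A rough, hypothetical sketch (integer_rank is my name, not from the post):

def integer_rank(rows)
  a = rows.map(&:dup)
  m = a.size
  n = a.first.size
  rank = 0
  prev_pivot = 1
  n.times do |col|
    break if rank == m
    # find a row at or below the current rank with a non-zero entry in this column
    pivot_row = (rank...m).find { |r| a[r][col] != 0 }
    next unless pivot_row
    a[rank], a[pivot_row] = a[pivot_row], a[rank]
    pivot = a[rank][col]
    ((rank + 1)...m).each do |r|
      factor = a[r][col]
      (col...n).each do |c|
        # Bareiss update: the division by the previous pivot is exact,
        # so every intermediate value stays an Integer
        a[r][c] = (a[r][c] * pivot - a[rank][c] * factor) / prev_pivot
      end
    end
    prev_pivot = pivot
    rank += 1
  end
  rank
end

integer_rank([[12, -21, 0, -12, 21, 0, 0, 0, 0],
              [12, -7, -20, 0, 0, 0, -12, 7, 20],
              [0, 0, 0, 0, 14, -20, 0, -14, 20]]) # => 3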

Related

Algorithm to find some rows from a matrix, whose sum is equal to a given row

For example, here is a matrix:
[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 1],
I want to find some rows, whose sum is equal to [4, 3, 2, 1].
The expected answer is rows: {0,1,3,4}.
Because:
[1, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 0] + [1, 1, 1, 1] = [4, 3, 2, 1]
Are there any well-known or related algorithms to solve this problem?
Thanks to @sascha and @N. Wouda for the comments.
To clarify, here are some more details.
In my problem, the matrix will have about 50 rows and 25 columns, but each row will have fewer than 4 non-zero elements (the others are zero), and every solution has 8 rows.
If I try all combinations, C(50, 8) is about 0.54 billion attempts. Too complex. So I want to find a more effective algorithm.
If you want to make the jump to using a solver, I'd recommend it. This is a pretty straightforward Integer Program. The solutions below use Python with the pyomo math-programming package to formulate the problem, and COIN-OR's cbc solver for Integer Programs and Mixed Integer Programs, which needs to be installed separately (freeware), available at: https://www.coin-or.org/downloading/
Here is an example with your data, followed by an example with 100,000 rows. The first solves instantly; the 100,000-row example takes about 2 seconds on my machine.
# row selection Integer Program
import pyomo.environ as pyo

data1 = [[1, 0, 0, 0],
         [1, 1, 0, 0],
         [1, 0, 1, 0],
         [1, 1, 1, 0],
         [1, 1, 1, 1]]

data_dict = {(i, j): data1[i][j] for i in range(len(data1)) for j in range(len(data1[0]))}

model = pyo.ConcreteModel()

# sets
model.I = pyo.Set(initialize=range(len(data1)))     # a simple row index
model.J = pyo.Set(initialize=range(len(data1[0])))  # a simple column index

# parameters
model.matrix = pyo.Param(model.I, model.J, initialize=data_dict)  # holds the sparse matrix of values
magic_sum = [4, 3, 2, 1]

# variables
model.row_select = pyo.Var(model.I, domain=pyo.Boolean)  # row selection variable

# constraints
# ensure the columnar sum is at least the magic sum for all j
def min_sum(model, j):
    return sum(model.row_select[i] * model.matrix[(i, j)] for i in model.I) >= magic_sum[j]
model.c1 = pyo.Constraint(model.J, rule=min_sum)

# objective function
# minimize the overage
def objective(model):
    delta = 0
    for j in model.J:
        delta += sum(model.row_select[i] * model.matrix[i, j] for i in model.I) - magic_sum[j]
    return delta
model.OBJ = pyo.Objective(rule=objective)

model.pprint()  # verify everything

solver = pyo.SolverFactory('cbc')  # need to have the cbc solver installed
result = solver.solve(model)
result.write()  # solver details
model.row_select.display()  # output
Output:
# ----------------------------------------------------------
# Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
User time: -1.0
System time: 0.0
Wallclock time: 0.0
Termination condition: optimal
Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
Statistics:
Branch and bound:
Number of bounded subproblems: 0
Number of created subproblems: 0
Black box:
Number of iterations: 0
Error rc: 0
Time: 0.01792597770690918
# ----------------------------------------------------------
# Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
number of solutions displayed: 0
row_select : Size=5, Index=I
Key : Lower : Value : Upper : Fixed : Stale : Domain
0 : 0 : 1.0 : 1 : False : False : Boolean
1 : 0 : 1.0 : 1 : False : False : Boolean
2 : 0 : 0.0 : 1 : False : False : Boolean
3 : 0 : 1.0 : 1 : False : False : Boolean
4 : 0 : 1.0 : 1 : False : False : Boolean
A more stressful rendition with 100,000 rows:
# row selection Integer Program stress test
import pyomo.environ as pyo
import numpy as np

# make a large matrix, 100,000 x 8
data1 = np.random.randint(0, 1000, size=(100_000, 8))
# inject "the right answer" into 3 rows
data1[42602] = [8, 0, 0, 0, 0, 0, 0, 0]
data1[3] = [0, 0, 0, 0, 4, 3, 2, 1]
data1[10986] = [0, 7, 6, 5, 0, 0, 0, 0]

data_dict = {(i, j): data1[i][j] for i in range(len(data1)) for j in range(len(data1[0]))}

model = pyo.ConcreteModel()

# sets
model.I = pyo.Set(initialize=range(len(data1)))     # a simple row index
model.J = pyo.Set(initialize=range(len(data1[0])))  # a simple column index

# parameters
model.matrix = pyo.Param(model.I, model.J, initialize=data_dict)  # holds the sparse matrix of values
magic_sum = [8, 7, 6, 5, 4, 3, 2, 1]

# variables
model.row_select = pyo.Var(model.I, domain=pyo.Boolean)  # row selection variable

# constraints
# ensure the columnar sum is at least the magic sum for all j
def min_sum(model, j):
    return sum(model.row_select[i] * model.matrix[(i, j)] for i in model.I) >= magic_sum[j]
model.c1 = pyo.Constraint(model.J, rule=min_sum)

# objective function
# minimize the overage
def objective(model):
    delta = 0
    for j in model.J:
        delta += sum(model.row_select[i] * model.matrix[i, j] for i in model.I) - magic_sum[j]
    return delta
model.OBJ = pyo.Objective(rule=objective)

solver = pyo.SolverFactory('cbc')
result = solver.solve(model)
result.write()

print('\n\n======== row selections =======')
for i in model.I:
    if model.row_select[i].value > 0:
        print(f'row {i} selected')
Output:
# ----------------------------------------------------------
# Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
User time: -1.0
System time: 2.18
Wallclock time: 2.61
Termination condition: optimal
Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
Statistics:
Branch and bound:
Number of bounded subproblems: 0
Number of created subproblems: 0
Black box:
Number of iterations: 0
Error rc: 0
Time: 2.800779104232788
# ----------------------------------------------------------
# Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
number of solutions displayed: 0
======== row selections =======
row 3 selected
row 10986 selected
row 42602 selected
This one picks or skips each element, recursively. As soon as a branch cannot lead to a solution (no elements left, or any target value going negative) it returns false. When the remaining target sums to 0, a solution has been found and is returned as the list of picked indices.
Feel free to add time and memory complexity in the comments. Worst case should be 2^(n+1).
Please let me know how it performs on your 8/50 data.
const elements = [
[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 1]
];
const target = [4, 3, 2, 1];
let iterations = 0;
console.log(iter(elements, target, [], 0));
console.log(`Iterations: ${iterations}`);
function iter(elements, target, picked, index) {
  iterations++;
  const sum = target.reduce(function(acc, element) {
    return acc + element;
  });
  if (sum === 0) return picked;
  if (elements.length === 0) return false;
  // branch 1: skip the current element
  const result = iter(
    removeElement(elements, 0),
    target,
    picked,
    index + 1
  );
  if (result !== false) return result;
  // branch 2: pick the current element
  const newTarget = matrixSubtract(target, elements[0]);
  const hasNegatives = newTarget.some(function(element) {
    return element < 0;
  });
  if (hasNegatives) return false;
  return iter(
    removeElement(elements, 0),
    newTarget,
    picked.concat(index),
    index + 1
  );
}

function removeElement(target, i) {
  return target.slice(0, i).concat(target.slice(i + 1));
}

function matrixSubtract(minuend, subtrahend) {
  return minuend.map(function(element, i) {
    return element - subtrahend[i];
  });
}

Is there a common name for a function that maps by an index?

Names such as map, filter or sum are generally understood by every reasonably good programmer.
I wonder whether the following function f also has such a standard name:
def f(data, idx): return [data[i] for i in idx]
Example usages:
r = f(['world', '!', 'hello'], [2, 0, 1, 1, 1])
piecePrice = [100, 50, 20, 180]
pieceIdx = [0, 2, 3, 0, 0]
totalPrice = sum(f(piecePrice, pieceIdx))
I started with map, but map is generally understood as a function that applies a function on each element of a list.

Sort algorithms with time complexity in the order of number of elements [duplicate]

This question already has answers here:
How to sort a list with given range in O(n)
(4 answers)
Is there an O(n) integer sorting algorithm?
(6 answers)
Closed 5 years ago.
I am looking for an O(n) sort algorithm where n is the number of elements to sort. I know that highly optimized sorting algorithms are O(n log n) but I was told that under the following condition we can do better. The condition is:
We are sorting numbers in a small enough range, say 0 to 100.
Say we have the following
unsortedArray = [4, 3, 4, 2]
Here is the algorithm:
Step 1) Iterate over the unsortedArray and use each element as the index into a new array we call countingArray. The value held in each position is the count of times that number appears. Each time we visit a position we increment it by 1.
countingArray = [0, 0, 0, 0, 0, ..., 0, 0, 0, 0] // before iteration
countingArray = [0, 0, 0, 0, 1, ..., 0, 0, 0, 0] // after handling 4
countingArray = [0, 0, 0, 1, 1, ..., 0, 0, 0, 0] // after handling 3
countingArray = [0, 0, 0, 1, 2, ..., 0, 0, 0, 0] // after the second 4
countingArray = [0, 0, 1, 1, 2, ..., 0, 0, 0, 0] // after handling 2
We can allocate countingArray in advance because the range of the numbers we wish to sort is limited and known a priori. In your example countingArray will have 101 elements.
Time complexity of this step is O(n) because you are iterating over n elements from unsortedArray. Inserting them into countingArray has constant time complexity.
Step 2) As shown in the example above countingArray is going to have positions with value 0 where there were no numbers to count in unsortedArray. We are going to skip these positions in the following iteration we will describe.
In countingArray non-zero positions define a number that we want to sort, and the content in that position define the count of how many times that number should appear in the final sortedArray.
We iterate over countingArray and, starting at the first position of sortedArray, put each number into count adjacent positions. This builds sortedArray and takes O(n + k), where k is the size of the range (101 here).
countingArray = [0, 0, 1, 1, 2, ..., 0, 0, 0, 0]
// After skipping the first 2 0s and seeing a count of 1 in position 2
sortedArray = [2, 0, 0, 0]
// After seeing a count of 1 in position 3
sortedArray = [2, 3, 0, 0]
// In position 4 we have a count of 2 so we fill 4 in 2 positions
sortedArray = [2, 3, 4, 4]
Total time complexity is O(n) + O(n + k) = O(n + k), which reduces to O(n) here because the range size k is a fixed constant (101).
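A compact Ruby sketch of the two steps described above (my illustration; counting_sort and the variable names are not from the answer):

def counting_sort(unsorted_array, max_value)
  # Step 1: tally how many times each value occurs, O(n)
  counting_array = Array.new(max_value + 1, 0)
  unsorted_array.each { |x| counting_array[x] += 1 }
  # Step 2: emit each value count times, O(n + k) for range size k
  sorted_array = []
  counting_array.each_with_index do |count, value|
    count.times { sorted_array << value }
  end
  sorted_array
end

counting_sort([4, 3, 4, 2], 100) # => [2, 3, 4, 4]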

Quantizing an array so that a subset of quantized values is still consistently quantized

Given an array of ints, I want to quantize each value so that the sum of the quantized values is 100. Each quantized value should also be an integer. This works when the whole array is quantized, but when a subset of the quantized values is added up it doesn't remain consistent with the rest of the values.
For example, the values 44, 40, 7, 2, 0, 0 are quantized to 47, 43, 8, 2, 0, 0 (the sum of which is 100). If you take the last 5 quantized values, the sum is 53, which is consistent with the first value (i.e. 47 + 53 = 100).
But with the values 78, 7, 7, 1, 0, 0, the sum of the last 5 quantized values (8, 8, 1, 0, 0) is 17. The first quantized value is 84, which when added to 17 does not equal 100. Clearly the reason for this is the rounding. Is there a way to adjust the rounding so that subsets remain consistent?
Here is the Ruby code:
class Quantize
  def initialize(array)
    @array = array.map { |a| a.to_i }
  end

  def values
    @array.map { |a| quantize(a) }
  end

  def sub_total(i, j)
    @array[i..j].map { |a| quantize(a) }.reduce(:+)
  end

  private

  def quantize(val)
    (val * 100.0 / total).round(0)
  end

  def total
    @array.reduce(:+)
  end
end
And the (failing) tests:
require 'quantize'

describe Quantize do
  context 'first example' do
    let(:subject) { described_class.new([44, 40, 7, 2, 0, 0]) }
    context '#values' do
      it 'quantizes array to add up to 100' do
        expect(subject.values).to eq([47, 43, 8, 2, 0, 0])
      end
    end
    context '#sub_total' do
      it 'adds a subset of array' do
        expect(subject.sub_total(1, 5)).to eq(53)
      end
    end
  end
  context 'second example' do
    let(:subject) { described_class.new([78, 7, 7, 1, 0, 0]) }
    context '#values' do
      it 'quantizes array to add up to 100' do
        expect(subject.values).to eq([84, 8, 8, 1, 0, 0])
      end
    end
    context '#sub_total' do
      it 'adds a subset of array' do
        expect(subject.sub_total(1, 5)).to eq(16)
      end
    end
  end
end
As noted in the comments on the question, the quantization routine does not perform correctly: the second example [78, 7, 7, 1, 0, 0] is quantized as [84, 8, 8, 1, 0, 0] — which adds to 101 and not to 100.
Here is an approach that will yield correct results:
def quantize(array, value)
  quantized = array.map(&:to_i)
  total = array.reduce(:+)
  remainder = value - total
  index = 0
  if remainder > 0
    while remainder > 0
      quantized[index] += 1
      remainder -= 1
      index = (index + 1) % quantized.length
    end
  else
    while remainder < 0
      quantized[index] -= 1
      remainder += 1
      index = (index + 1) % quantized.length
    end
  end
  quantized
end
This solves your problem, as stated in the question. The troublesome result becomes [80, 8, 8, 2, 1, 1], which adds to 100 and maintains the subset relationship that you described. The solution can, of course, be made more performant — but it has the advantage of working and being dead simple to understand.
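For instance, a quick check against the troublesome input (my illustration, not part of the original answer):

quantized = quantize([78, 7, 7, 1, 0, 0], 100)
quantized                  # => [80, 8, 8, 2, 1, 1]
quantized[1..5].reduce(:+) # => 20, and 80 + 20 == 100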

Mark M cells on a NxN board randomly with equal probability [duplicate]

This question already has answers here:
Algorithm to select a single, random combination of values?
(7 answers)
Closed 8 years ago.
An interview question:
Given an NxN board with all cells set to 0, mark M (M < NxN) cells as 1. The M cells should be chosen from all cells with equal probability.
E.g. mark 30 cells on a 10x10 board; then the probability of a cell being chosen is 0.3.
My idea is to iterate over all cells and, for each cell, compute a random number in the range [1-100], marking the cell as 1 if the number is less than or equal to 30.
The interviewer was not impressed by this solution. Any good ideas? (You can use any language.)
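For illustration only (not part of the original question), the proposed per-cell approach in Ruby; note that it marks M cells only in expectation, since the count of 1s is binomially distributed:

n, m = 10, 30
# each cell is marked independently with probability m / (n * n) = 0.3,
# so exactly M marked cells is not guaranteed
board = Array.new(n) { Array.new(n) { rand(1..100) <= m ? 1 : 0 } }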
Put NxN - M zeros (70 here) and M ones (30 here) into a vector. Shuffle the vector. Iterate through it and map each index k to 2-d indices via i = k / 10 and j = k % 10 for your example (use N as the divisor more generally).
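A short Ruby sketch of this shuffle approach (my own illustration, assuming the 10x10, M = 30 example):

n, m = 10, 30
flat = Array.new(n * n - m, 0) + Array.new(m, 1) # 70 zeros, 30 ones
flat.shuffle!
# map each flat index k to 2-d indices i = k / n, j = k % n
board = Array.new(n) { |i| Array.new(n) { |j| flat[i * n + j] } }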
ADDENDUM
After checking out @candu's link, I decided to give that approach a try. Here's an implementation in Ruby:
require 'set'
# implementation of Floyd's uniform subset algorithm for
# values in the range [0,n).
def generateMfromN(m, n)
s = Set.new
((n-m)...n).each {|j| s.add?(rand(j+1)) || s.add(j)}
s.to_a
end
# initialize a 10x10 array of zeros
a = Array.new(10)
10.times {|i| a[i] = Array.new(10,0)}
# create an array of 10 random indices between 0 and 99,
# map each index to 2-d indices, and set the corresponding
# element to 1.
generateMfromN(10,100).each {|index| a[index/10][index%10] = 1}
# show the results
a.each {|v| puts v.to_s}
This produces results such as...
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
and appears to require only O(M) work for Floyd's algorithm, since on each of M iterations an element always gets added to the set.
If M is bigger than N*N/2, initialize the array with 1's and randomize the placement of zeros instead, as suggested by @btilly.
This can be done in expected running time O(m).
First let's deal with the case where we need at most half the board, i.e. m <= n*n/2. For this case we can keep choosing random points and changing their values, throwing away any we chose before, until we have m of them. The probability of throwing away the next random choice is never more than half, so the expected number of random choices needed is at worst 2m = O(m).
In the case where we need more than half the board, it takes time O(m) to flip every cell to 1, and then we use the previous solution to find n*n - m cells to turn back to 0.
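A rough Ruby sketch of the rejection-sampling idea for the m <= n*n/2 case (names are mine, not from the answer):

require 'set'

def mark_cells(n, m)
  board = Array.new(n) { Array.new(n, 0) }
  chosen = Set.new
  while chosen.size < m
    k = rand(n * n)            # uniform random cell index
    next unless chosen.add?(k) # throw away indices chosen before
    board[k / n][k % n] = 1
  end
  board
end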
