What would be the idiomatic way of extracting a submatrix from a matrix in Ruby?
I have a Matrix object like this:
[131, 673, 234, 103, 18]
[201, 96, 342, 965, 150]
[630, 803, 746, 422, 111]
[537, 699, 497, 121, 956]
[805, 732, 524, 37, 331]
I'm looking for a method with a signature like
matrix.submatrix(1, 1), which should return
[96, 342, 965, 150]
[803, 746, 422, 111]
[699, 497, 121, 956]
[732, 524, 37, 331]
matrix.submatrix(2,2) would return
[746, 422, 111]
[497, 121, 956]
[524, 37, 331]
I browsed through the RubyDoc but couldn't find any method that would give me what I wanted. How would I do this in Ruby?
For a 2D array I have come up with
def submatrix(matrix)
  # Drop the first column of each row...
  submatrix = matrix.collect { |row| row.slice(1..-1) }
  # ...then pop off the first row
  submatrix[1..-1]
end
I am wondering if I should reinvent the wheel or could I use something from the Matrix class.
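For what it's worth, the Array-based approach above generalizes naturally if you parameterize the row and column offsets (a sketch of that idea, not something taken from the Matrix class):

```ruby
# Drop the first i rows and the first j columns of a 2D array.
# (i and j are assumed to be non-negative and within bounds.)
def submatrix(matrix, i, j)
  matrix[i..-1].map { |row| row[j..-1] }
end

a = [[131, 673, 234, 103, 18],
     [201, 96, 342, 965, 150],
     [630, 803, 746, 422, 111],
     [537, 699, 497, 121, 956],
     [805, 732, 524, 37, 331]]

submatrix(a, 2, 2)
#=> [[746, 422, 111], [497, 121, 956], [524, 37, 331]]
```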
Take a look at Matrix#minor:
require 'matrix'

a = [[131, 673, 234, 103, 18],
[201, 96, 342, 965, 150],
[630, 803, 746, 422, 111],
[537, 699, 497, 121, 956],
[805, 732, 524, 37, 331]]
m = Matrix[*a]
m1 = m.minor(1..4, 1..4)
=> Matrix[[96, 342, 965, 150], [803, 746, 422, 111],
[699, 497, 121, 956], [732, 524, 37, 331]]
m2 = m1.minor(1..3, 1..3)
=> Matrix[[746, 422, 111], [497, 121, 956], [524, 37, 331]]
You can also do:
m1 = m.minor(1..-1, 1..-1)
m2 = m1.minor(1..-1, 1..-1)
Or:
class Matrix
  def submatrix(x, y)
    minor(x..-1, y..-1)
  end
end
m.submatrix(2, 2)
=> Matrix[[746, 422, 111], [497, 121, 956], [524, 37, 331]]
If you are using Ruby 2.2.0 or later, you can use Matrix#first_minor, which removes a specified row and column. I'm not sure how efficient it is, but here is some code that solves your problem:
require 'matrix'
def my_submatrix(matrix, n)
  matrix = matrix.first_minor(0, 0) while matrix.row_count > n
  matrix
end
m = Matrix[[131, 673, 234, 103, 18],
[201, 96, 342, 965, 150],
[630, 803, 746, 422, 111],
[537, 699, 497, 121, 956],
[805, 732, 524, 37, 331]]
p my_submatrix(m, 3)
# => Matrix[[746, 422, 111], [497, 121, 956], [524, 37, 331]]
Here are a couple of ways, one using a Matrix object, the other just manipulating an array.
Manipulating a Matrix object
Code
require 'matrix'
def doit(matrix, i, j)
  selection_matrix(matrix.row_count, i) * matrix *
    selection_matrix(matrix.column_count, j)
end

def selection_matrix(n, m)
  Matrix.diagonal(*(0...n).map { |i| i < m ? 0 : 1 })
end
Use doit(matrix,i,j).to_a to return an Array object.
Examples
a = [[131, 673, 234, 103, 18],
[201, 96, 342, 965, 150],
[630, 803, 746, 422, 111],
[537, 699, 497, 121, 956],
[805, 732, 524, 37, 331]]
matrix = Matrix[*a]
doit(matrix,2,2)
#=> Matrix[[0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0],
# [0, 0, 746, 422, 111],
# [0, 0, 497, 121, 956],
# [0, 0, 524, 37, 331]]
doit(matrix,1,1)
#=> Matrix[[0, 0, 0, 0, 0],
# [0, 96, 342, 965, 150],
# [0, 803, 746, 422, 111],
# [0, 699, 497, 121, 956],
# [0, 732, 524, 37, 331]]
Explanation
selection_matrix(n,m) returns a diagonal matrix whose diagonal elements are ones and zeros; the zeros knock out the first m rows (or columns) of the matrix, and the ones keep the rest. The matrix is pre-multiplied (post-multiplied) by a diagonal matrix whose order equals the number of rows (columns) of the matrix.
selection_matrix(5,2)
#=> Matrix[[0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0],
# [0, 0, 1, 0, 0],
# [0, 0, 0, 1, 0],
# [0, 0, 0, 0, 1]]
a = selection_matrix(5,2) * matrix
#=> Matrix[[ 0, 0, 0, 0, 0],
# [ 0, 0, 0, 0, 0],
# [630, 803, 746, 422, 111],
# [537, 699, 497, 121, 956],
# [805, 732, 524, 37, 331]]
b = a * selection_matrix(5,2)
#=> Matrix[[0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0],
# [0, 0, 746, 422, 111],
# [0, 0, 497, 121, 956],
# [0, 0, 524, 37, 331]]
b.to_a
#=> [[0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0],
# [0, 0, 746, 422, 111],
# [0, 0, 497, 121, 956],
# [0, 0, 524, 37, 331]]
Manipulating an Array object
Without creating a Matrix object, you could just do this:
Code
def doit(a, i, j)
  a[i..-1].transpose[j..-1].transpose
end
Examples
a = [[131, 673, 234, 103, 18],
[201, 96, 342, 965, 150],
[630, 803, 746, 422, 111],
[537, 699, 497, 121, 956],
[805, 732, 524, 37, 331]]
doit(a,1,1)
#=> [[ 96, 342, 965, 150],
# [803, 746, 422, 111],
# [699, 497, 121, 956],
# [732, 524, 37, 331]]
doit(a,2,2)
#=> [[746, 422, 111],
# [497, 121, 956],
# [524, 37, 331]]
I want to use FindFit to fit the logistic population model, which I define as
model2 = L/(1 + (L/P0 - 1) e^(-kt))
on the data
data = {19, 39, 46, 73, 92, 109, 137, 160, 177, 202, 230, 257, 299, 342,
384, 419, 464, 511, 553, 597, 646, 684, 734, 779, 814, 851, 895, 929,
962, 988, 1011, 1040, 1069, 1110, 1141, 1165, 1195, 1212, 1226, 1247,
1269, 1288, 1303, 1318, 1332, 1341, 1354, 1367}
but I get an error. I am using FindFit as follows:
fit = FindFit[data, model2, {P0, L, k}, t]
The data is supposed to represent population size at different days, so 19 corresponds to population at day 1, 39 is population at day 2, etc.
Use capital E, as in E^(-k t). In the Wolfram Language, E is the exponential constant, while a lowercase e is just an undefined symbol.
See https://reference.wolfram.com/language/ref/E.html
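With the built-in constant, the fit would look like this (a sketch; the starting values for P0, L, and k are my own guesses, not from the question):

```mathematica
model2 = L/(1 + (L/P0 - 1) E^(-k t));
fit = FindFit[data, model2, {{P0, 20}, {L, 1500}, {k, 0.1}}, t]
```

Starting values often help FindFit converge for logistic models, though the plain {P0, L, k} form may also work.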
Homework: return the maximal sum of k consecutive elements in a list. I have tried the following three methods, which pass 6 of the 7 tests that verify the solution. The 7th test is a very long input with a very large k value; I cannot paste that input list here because it is shown truncated due to its length. Here are the three methods I tried. Each timed out on that test, and the last one also raised an error (shown below).
Method 1: [verbose]
def arrayMaxConsecutiveSum(inputArray, k):
    sum_array = []
    for i in range(len(inputArray)-(k+1)):
        sum_array.append(sum(inputArray[i:i+k]))
    return max(sum_array)
Method 2: [one line = efficiency??]
def arrayMaxConsecutiveSum(inputArray, k):
    return max([sum(inputArray[i:i+k]) for i in range(len(inputArray)-(k+1))])
Method 3: Lambda call
def arrayMaxConsecutiveSum(inputArray, k):
    f = lambda data, n: [data[i:i+n] for i in range(len(data) - n + 1)]
    sum_array = [sum(val) for val in f(inputArray, k)]
    return max(sum_array)
Some examples of inputs and (correct) outputs:
IN:[2, 3, 5, 1, 6]
k: 2 OUT: 8
IN:[2, 4, 10, 1]
k: 2 OUT: 14
IN: [1, 3, 4, 2, 4, 2, 4]
k: 4 OUT: 13
Again, I would like to mention that I passed the other tests (test 6 was also very long with a large k value, though its k was an order of magnitude smaller than test 7's) and just need to identify a method or revision that would be more efficient. Lastly, I attempted both 6 and 7 with the (truncated) inputs in IDLE3, and each produced a ValueError:
Traceback (most recent call last):
File "/Users/ryanflynn/arrmaxconsecsum.py", line 15, in <module>
962, 244, 390, 854, 406, 457, 160, 612, 693, 896, 800, 670, 776, 65, 81, 336, 305, 262, 877, 217, 50, 835, 307, 865, 774, 163, 556, 186, 734, 404, 610, 621, 538, 370, 153, 105, 816, 172, 149, 404, 634, 105, 74, 303, 304, 145, 592, 472, 778, 301, 480, 693, 954, 628, 355, 400, 327, 916, 458, 599, 157, 424, 957, 340, 51, 60, 688, 325, 456, 148, 189, 365, 358, 618, 462, 125, 863, 530, 942, 978, 898, 858, 671, 527, 877, 614, 826, 163, 380, 442, 68, 825, 978, 965, 562, 724, 553, 18, 554, 516, 694, 802, 650, 434, 520, 685, 581, 445, 441, 711, 757, 167, 594, 686, 993, 543, 694, 950, 812, 765, 483, 474, 961, 566, 224, 879, 403, 649, 27, 205, 841, 35, 35, 816, 723, 276, 984, 869, 502, 248, 695, 273, 689, 885, 157, 246, 684, 642, 172, 313, 683, 968, 29, 52, 915, 800, 608, 974, 266, 5, 252, 6, 15, 725, 788, 137, 200, 107, 173, 245, 753, 594, 47, 795, 477, 37, 904, 4, 781, 804, 352, 460, 244, 119, 410, 333, 187, 231, 48, 560, 771, 921, 595, 794, 925, 35, 312, 561, 173, 233, 669, 300, 73, 977, 977, 591, 322, 187, 199, 817, 386, 806, 625, 500, 1, 294, 40, 271, 306, 724, 713, 600, 126, 263, 591, 855, 976, 515, 850, 219, 118, 921, 522, 587, 498, 420, 724, 716],6886)
File "/Users/ryanflynn/arrmaxconsecsum.py", line 6, in arrayMaxConsecutiveSum
return max(sum_array)
ValueError: max() arg is an empty sequence
(Note: this used method 3.) I checked both f(inputArray, k) and sum_array with print statements; each was []. Any help would be appreciated :)
Try:
def arrayMaxConsecutiveSum(inputArray, k):
    S = sum(inputArray[:k])
    M = S
    for i in range(len(inputArray) - k):
        S += inputArray[i+k] - inputArray[i]
        if M < S:
            M = S
    return M
S stands for sum and M stands for max.
This solution has a complexity of O(n), while yours have O(n*k): you are summing k numbers n-k times, whereas I am combining just 3 numbers (the running sum, the element entering the window, and the element leaving it) n-k times.
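As a quick check, the sliding-window version reproduces the question's examples (the same logic as above, just renamed and exercised):

```python
def array_max_consecutive_sum(input_array, k):
    # Running sum of the current window of k elements.
    s = sum(input_array[:k])
    best = s
    for i in range(len(input_array) - k):
        # Slide the window right: add the entering element,
        # subtract the leaving one.
        s += input_array[i + k] - input_array[i]
        best = max(best, s)
    return best

print(array_max_consecutive_sum([2, 3, 5, 1, 6], 2))        # 8
print(array_max_consecutive_sum([2, 4, 10, 1], 2))          # 14
print(array_max_consecutive_sum([1, 3, 4, 2, 4, 2, 4], 4))  # 13
```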
I want to add an array to a two dimensional array like this:
arrays = [[8300, 6732, 4101, 3137, 3097], [1088, 647, 410, 138, 52], [623, 362, 191, 25, 0]]
new_array = [10, 100, 1000]
arrays.map.with_index{|v,i| v << new_array[i]}
# => [[8300, 6732, 4101, 3137, 3097, 10], [1088, 647, 410, 138, 52, 100], [623, 362, 191, 25, 0, 1000]]
It works well, but I want to know if there is a simpler way to accomplish this behavior.
I appreciate any suggestion.
arrays.zip(new_array).map(&:flatten)
# => [[8300, 6732, 4101, 3137, 3097, 10], [1088, 647, 410, 138, 52, 100], [623, 362, 191, 25, 0, 1000]]
You can use zip:
arrays.zip(new_array).each { |arr, item| arr << item }
arrays
# => [[8300, 6732, 4101, 3137, 3097, 10], [1088, 647, 410, 138, 52, 100], [623, 362, 191, 25, 0, 1000]]
Just a little extension to Santosh's answer. If there are nested arrays and you want the result to stay as nested as in the original arrays, like
arrays = [[8300, [6732], 4101, [3137], 3097], [1088, [647], 410, 138, 52], [623, [362], 191, 25, 0]]
new_array = [10, [100], 1000]
required_answer = [[8300, [6732], 4101, [3137], 3097, 10], [1088, [647], 410, 138, 52, 100], [623, [362], 191, 25, 0, 1000]]
then you can use
arrays.zip(new_array).map{|x| x.flatten(1)}
this will flatten each array only one level deep, leaving the inner arrays intact.
I am trying to process some data and write the output in such a way that the result is partitioned by a key and sorted by another parameter, say in ascending order. For example,
>>> data =sc.parallelize(range(10000))
>>> mapped = data.map(lambda x: (x%2,x))
>>> grouped = mapped.groupByKey().partitionBy(2).map(lambda x: x[1] ).saveAsTextFile("mymr-output")
$ hadoop fs -cat mymr-output/part-00000 |cut -c1-1000
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420,
$ hadoop fs -cat mymr-output/part-00001 |cut -c1-1000
[2049, 2051, 2053, 2055, 2057, 2059, 2061, 2063, 2065, 2067, 2069, 2071, 2073, 2075, 2077, 2079, 2081, 2083, 2085, 2087, 2089, 2091, 2093, 2095, 2097, 2099, 2101, 2103, 2105, 2107, 2109, 2111, 2113, 2115, 2117, 2119, 2121, 2123, 2125, 2127, 2129, 2131, 2133, 2135, 2137, 2139, 2141, 2143, 2145, 2147, 2149, 2151, 2153, 2155, 2157, 2159, 2161, 2163, 2165, 2167, 2169, 2171, 2173, 2175, 2177, 2179, 2181, 2183, 2185, 2187, 2189, 2191, 2193, 2195, 2197, 2199, 2201, 2203, 2205, 2207, 2209, 2211, 2213, 2215, 2217, 2219, 2221, 2223, 2225, 2227, 2229, 2231, 2233, 2235, 2237, 2239, 2241, 2243, 2245, 2247, 2249, 2251, 2253, 2255, 2257, 2259, 2261, 2263, 2265, 2267, 2269, 2271, 2273, 2275, 2277, 2279, 2281, 2283, 2285, 2287, 2289, 2291, 2293, 2295, 2297, 2299, 2301, 2303, 2305, 2307, 2309, 2311, 2313, 2315, 2317, 2319, 2321, 2323, 2325, 2327, 2329, 2331, 2333, 2335, 2337, 2339, 2341, 2343, 2345, 2347, 2349, 2351, 2353, 2355, 2357, 2359, 2361, 2363, 2365, 2367, 2369, 2371, 2373, 2375, 2377, 2379, 238
$
Which is perfect- satisfies my first criteria, which is to have results partitioned by key. But I want the result sorted. I tried sorted(), but it didn't work.
>>> grouped= sorted(mapped.groupByKey().partitionBy(2).map(lambda x: x[1] ))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'PipelinedRDD' object is not iterable
I don't want to use parallelize again, and go recursive. Any help would be greatly appreciated.
PS: I did go through this: Does groupByKey in Spark preserve the original order? but it didn't help.
Thanks,
Jeevan.
Yes, that's an RDD, not a Python object that you can sort as if it were a local collection. After groupByKey(), though, the value in each key-value tuple is a collection of numbers, and that is what you want to sort? You can use mapValues(), passing sorted as the function to apply.
I realize it's a toy example, but be careful with groupByKey, as it has to hold all values for a key in memory. Also, it is not even guaranteed that an RDD with 2 keys and 2 partitions puts one key in each partition. It's probable, but not guaranteed.
PS you should be able to replace map(lambda x: x[1]) with values(). It may be faster.
Similar to what is said above, the value in each key-value pair is not a plain Python list; you can test this by checking type(value). However, you can access a Python list via the .data member and call sort or sorted on that.
grouped = mapped.groupByKey().partitionBy(2).map(lambda x: sorted(x[1].data) )
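Seen locally, groupByKey() followed by a per-key sort amounts to collecting each key's values into a list and sorting it. A minimal pure-Python stand-in for the pipeline above (no Spark; the small reversed input is just illustrative):

```python
# Stand-in for sc.parallelize(...).map(lambda x: (x % 2, x)):
data = range(9, -1, -1)  # reversed so sorting visibly matters
mapped = [(x % 2, x) for x in data]

# Stand-in for groupByKey(): collect values per key.
grouped = {}
for k, v in mapped:
    grouped.setdefault(k, []).append(v)

# Stand-in for mapValues(sorted): sort each key's value list.
result = {k: sorted(vs) for k, vs in grouped.items()}

print(result[0])  # [0, 2, 4, 6, 8]
print(result[1])  # [1, 3, 5, 7, 9]
```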
This is just a curiosity - I don't have a real question.
The output of AbsoluteTiming has a definite pattern; can anyone confirm/explain ?
xxx = RandomReal[NormalDistribution[0, 1], 10^6];
Sin[#] & /@ xxx; // AbsoluteTiming
(* {0.0890089, Null} *)
Max[Exp[#] - 0.5, 0] & /@ xxx; // AbsoluteTiming
(* {0.1560156, Null} *)
$Version
8.0 for Microsoft Windows (64-bit) (February 23, 2011)
According to the Documentation, "AbsoluteTiming is always accurate down to a granularity of $TimeUnit seconds, but on many systems is much more accurate." So evaluating $TimeUnit probably can elucidate this issue.
Yep. Let's check whether the time quantum is consistent:
Differences@
 Round[10^5 Sort@
   Union[AbsoluteTiming[
        Sin[#] & /@
          RandomReal[NormalDistribution[0, 1], #];][[1]] & /@
     RandomInteger[10^6, 100]]]
(*
-> {1562, 1563, 1563, 1562, 1562, 1563, 1563, 1562, 1562, 1563, 1563, \
1562, 1562, 1563, 1563, 1562, 1562}
*)
Edit
Better code
Differences@
 Sort@Union[
   Round[10^5 AbsoluteTiming[
         Sin[#] & /@
           RandomReal[NormalDistribution[0, 1], #];][[1]] & /@
      RandomInteger[10^6, 100]]]
Presumably your system's clock only has granularity to some fraction of a second that happens to produce a repeating decimal. I have never noticed this on my Macs.
It's cool, though.
EDIT
Now that I am home I can confirm this must be system-specific: here is my output from the code in belisarius's answer:
{56, 119, 28, 25, 33, 397, 35, 82, 185, 67, 41, 67, 218, 192, 115, \
28, 74, 16, 187, 222, 194, 8, 129, 399, 68, 75, 71, 34, 5, 37, 62, \
64, 137, 173, 24, 98, 135, 308, 63, 155, 208, 861, 22, 72, 72, 184, \
609, 564, 112, 1011, 118, 81, 158, 90, 351, 33, 35, 68, 10, 126, 39, \
194, 7, 108, 278, 75, 37, 214, 34, 166, 119, 10, 335, 141, 4, 988, \
90, 121, 71, 130, 117, 186, 33, 123, 111, 110, 57, 64, 213, 217, 210, \
204, 98, 247, 20, 1421, 28, 2003, 353}