Given the following float values:
n00.0, n0.0, n.0, 0.n, 0.0n, 0.00n, 0.000n
where n can be 1, 2 or 5, what is the smartest and fastest way to extract the corresponding integer values, to be used as rounding parameters
-2, -1, 0, 1, 2, 3, 4
At the moment, I am using a hash table: fast, but not so smart, I think!
I thought using a hash was a good approach and was skeptical that it would be outperformed by -Math.log10(k).floor, so I ran the following benchmark.
Construct the hash
h = { 100.0=>-2, 10.0=>-1, 1.0=>0, 0.1=>1, 0.01=>2,
200.0=>-2, 20.0=>-1, 2.0=>0, 0.2=>1, 0.02=>2,
500.0=>-2, 50.0=>-1, 5.0=>0, 0.5=>1, 0.05=>2 }
Construct test array for benchmark (15 million elements)
n = 1_000_000
(arr = h.keys.flat_map { |k| [k]*n }.shuffle).size
#=> 15_000_000
arr.first(10)
#=> [20.0, 0.02, 5.0, 0.5, 0.02, 0.05, 500.0, 50.0, 50.0, 20.0]
arr.last(10)
#=> [500.0, 0.5, 0.1, 20.0, 0.01, 0.1, 500.0, 50.0, 0.05, 0.5]
Perform benchmark
require 'fruity'
compare(
hash: -> { arr.each { |k| h[k] } },
log10: -> { arr.each { |k| -Math.log10(k).floor } }
)
Running each test once. Test will take about 42 seconds.
hash is faster than log10 by 60.0% ± 10.0%
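As a quick correctness check (my addition, not part of the original benchmark), the two approaches agree on every key in the table. Note that -Math.log10(k).floor parses as -(Math.log10(k).floor), i.e. the floor is taken before the negation, which is what makes 0.02 map to 2 rather than 1:
h.all? { |k, v| -Math.log10(k).floor == v }
#=> true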
It seems that if lightgbm.train is used with an initial score (init_score), it cannot boost on top of this score.
Here is a simple example:
import lightgbm as lgb
import pandas as pd
import scipy.special
params = {"learning_rate": 0.1, "metric": "binary_logloss", "objective": "binary",
          "boosting_type": "gbdt", "num_iterations": 5, "num_leaves": 2 ** 2,
          "max_depth": 2, "num_threads": 1, "verbose": 0, "min_data_in_leaf": 1}
x = pd.DataFrame([[1, 0.1, 0.3], [1, 0.1, 0.3], [1, 0.1, 0.3],
[0, 0.9, 0.3], [0, 0.9, 0.3], [0, 0.9, 0.3]], columns=["a", "b", "prob"])
y = pd.Series([0, 1, 0, 0, 1, 0])
d_train = lgb.Dataset(x, label=y)
model = lgb.train(params, d_train)
y_pred_default = model.predict(x, raw_score=False)
In the case above, no init_score is used. The predictions are correct:
y_pred_default = [0.33333333, ... ,0.33333333]
d_train = lgb.Dataset(x, label=y, init_score=scipy.special.logit(x["prob"]))
model = lgb.train(params, d_train)
y_pred_raw = model.predict(x, raw_score=True)
In this part, we assume column "prob" from x to be our initial guess (maybe produced by some other model). We apply logit and use it as the initial score. However, the model cannot improve on it, and the boosting always returns 0: y_pred_raw = [0, 0, 0, 0, 0, 0]
y_pred_raw_with_init = scipy.special.logit(x["prob"]) + y_pred_raw
y_pred = scipy.special.expit(y_pred_raw_with_init)
The part above shows the way I suppose is correct to translate the initial scores together with the boosting back to probabilities. Since the boosting is zero, y_pred yields [0.3, ..., 0.3], which is just our initial probability.
How do I speed up the rank calculation of a sparse matrix in pure Ruby?
I'm currently calculating the rank of a matrix (std lib) to determine the rigidity of a graph.
That means I have sparse matrices ranging from about 2 rows * 9 columns up to about 300 rows * 300 columns.
That translates to times of several seconds to determine the rank of the matrix, which is very slow for a GUI application.
Because I use Sketchup I am bound to Ruby 2.0.0.
I'd like to avoid the hassle of setting up gcc on windows, so nmatrix is (I think) not a good option.
Edit:
Example matrix:
[[12, -21, 0, -12, 21, 0, 0, 0, 0],
[12, -7, -20, 0, 0, 0, -12, 7, 20],
[0, 0, 0, 0, 14, -20, 0, -14, 20]]
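For reference, a minimal sketch (my addition) of the baseline being discussed, computing the rank of that example with the stdlib Matrix:
require 'matrix'

rows = [[12, -21, 0, -12, 21, 0, 0, 0, 0],
        [12, -7, -20, 0, 0, 0, -12, 7, 20],
        [0, 0, 0, 0, 14, -20, 0, -14, 20]]
Matrix.rows(rows, false).rank   # second argument false: reuse the arrays instead of copying
#=> 3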
Edit2:
I am using integers instead of floats, which speeds it up considerably.
I have also added a fail-fast mechanism earlier in the code so that the slow rank calculation is skipped entirely whenever possible.
Edit3:
Part of the code
def rigid?(proto_matrix, nodes)
matrix_base = Array.new(proto_matrix.size) { |index|
# initialize the row with 0
arr = Array.new(nodes.size * 3, 0.to_int)
proto_row = proto_matrix[index]
# ids of the nodes in the graph
node_ids = proto_row.map { |hash| hash[:id] }
# set the values of both of the nodes' positions
[0, 1].each { |i|
vertex_index = vertices.find_index(node_ids[i])
# predetermined vector associated to the node
vec = proto_row[i][:vec]
arr[vertex_index * 3] = vec.x.to_int
arr[vertex_index * 3 + 1] = vec.y.to_int
arr[vertex_index * 3 + 2] = vec.z.to_int
}
arr
}
matrix = Matrix::rows(matrix_base, false)
rank = matrix.rank
# graph is rigid if the rank of the matrix is bigger or equal
# to the amount of node coordinates minus the degrees of freedom
# of the whole graph
rank >= nodes.size * 3 - 6
end
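Tying this back to the example matrix from the first edit (my addition, assuming that matrix comes from a 3-node graph, since it has 9 = 3 * 3 columns):
# threshold: nodes.size * 3 - 6 = 3 * 3 - 6 = 3
# the example matrix has rank 3, so rigid? returns true for that input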
I'm doing an exercise now where I'm looking for all of the zeros in an array.
The input is:
numbers = [1, 3, 500, 200, 4000, 3000, 10000, 90, 20, 500000]
I want to group them into a hash by the number of zeros they contain. The expected output is:
expected = {0=>[1, 3], 2=>[500, 200], 3=>[4000, 3000], 4=>[10000], 1=>[90, 20], 5=>[500000]}
I have the structure built but I'm not sure how to count the number of zeros:
grouped = Hash.new {|hash, key| hash[key] = []}
numbers.each do |num|
grouped[num] << num
end
EDITED for clarity:
Any advice would be appreciated. Also, a lot of the advice I read on this recommended converting the integers to strings in order to solve the problem. Is there a way to count the number of digits (not just zeros) without converting the integers to strings? The expected output in this case would look like:
expected = {1=>[1, 3], 2=>[90, 20], 3=>[500, 200], 4=>[4000, 3000], 5=>[10000], 6=>[500000]}
Thanks in advance.
Like many transformations you'll want to do, this one's found in Enumerable.
Grouping by number of digits:
grouped = numbers.group_by { |n| Math.log10(n).to_i + 1 }
# => {1=>[1, 3], 3=>[500, 200], 4=>[4000, 3000], 5=>[10000], 2=>[90, 20], 6=>[500000]}
Grouping by number of zeroes:
grouped = numbers.group_by { |n| n.to_s.match(/0+$/) ? $&.length : 0 }
# => {0=>[1, 3], 2=>[500, 200], 3=>[4000, 3000], 4=>[10000], 1=>[90, 20], 5=>[500000]}
The group_by method is a handy way to convert an Array to a Hash with things organized into pigeon-holes.
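One subtlety worth noting (my addition, not part of the original answer): the regex /0+$/ counts only trailing zeroes, whereas to_s.count('0') counts every zero digit. The two agree on the sample input, but differ for numbers with interior zeroes:
304070.to_s[/0+$/].length  #=> 1 (trailing zeroes only)
304070.to_s.count('0')     #=> 3 (all zero digits)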
I wound up using
grouped = Hash.new {|hash, key| hash[key] = []}
numbers.each do |num|
grouped[num.to_s.count('0')] << num
end
but I really liked the variation in responses. I didn't realize there were so many ways to go about this. Thank you everyone.
If you wish to group non-negative integers by the number of zero digits they contain, you can do this:
def nbr_zeroes(n)
return 1 if n == 0
m = n
i = 0
while m > 0
i += 1 if m % 10 == 0
m /= 10
end
i
end
numbers = [1, 3, 500, 200, 4000, 3000, 10000, 90, 20, 500000]
numbers.group_by { |i| nbr_zeroes(i) }
#=> { 0=>[1, 3], 2=>[500, 200], 3=>[4000, 3000], 4=>[10000], 1=>[90, 20], 5=>[500000] }
numbers = [100000, 100001, 304070, 3500040, 314073, 2000, 314873, 0]
numbers.group_by { |i| nbr_zeroes(i) }
#=> { 5=>[100000], 4=>[100001, 3500040], 3=>[304070, 2000],
# 1=>[314073, 0], 0=>[314873] }
Group by floor of log base 10?
1.9.3-p484 :014 > numbers.each {|n| grouped[Math.log10(n).floor] << n}
=> [1, 3, 500, 200, 4000, 3000, 10000, 90, 20, 500000]
1.9.3-p484 :016 > grouped
=> {0=>[1, 3], 2=>[500, 200], 3=>[4000, 3000], 4=>[10000], 1=>[90, 20], 5=>[500000]}
Or try 1 + Math.log10(n).floor if you need the keys to be the actual number of digits.
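If you are on Ruby 2.4 or later, Integer#digits also gives a digit count without going through strings or logarithms (my addition, not part of the answers above; it raises Math::DomainError for negative numbers):
numbers.group_by { |n| n.digits.size }
#=> {1=>[1, 3], 3=>[500, 200], 4=>[4000, 3000], 5=>[10000], 2=>[90, 20], 6=>[500000]}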
I'm doing a bit o' matrix algebra in Ruby. When testing the results, I'm seeing what I can only assume is a rounding error.
All I'm doing is multiplying 3 matrices, but the values are fairly small:
c_xy:
[0.9702957262759965, 0.012661213742314235, -0.24159035004964077]
[0, 0.9986295347545738, 0.05233595624294383]
[0.24192189559966773, -0.050781354673095955, 0.9689659697053497]
i2k = Matrix[[8.1144E-06, 0.0, 0.0],
[0.0, 8.1144E-06, 0.0],
[0.0, 0.0, 8.1144E-06]]
c_yx:
[0.9702957262759965, 0, 0.24192189559966773]
[0.012661213742314235, 0.9986295347545738, -0.050781354673095955]
[-0.24159035004964077, 0.05233595624294383, 0.9689659697053497]
What I'm trying to do is c_xy * i2k * c_yx. Here's what I expect (this was done in Excel):
8.1144E-06 0 2.11758E-22
0 8.1144E-06 0
2.11758E-22 -5.29396E-23 8.1144E-06
And what I get:
[8.1144e-06, 1.3234889800848443e-23, 6.352747104407253e-22]
[0.0, 8.114399999999998e-06, -5.293955920339377e-23]
[2.117582368135751e-22, 0.0, 8.1144e-06]
As you can see, the first column matches, as does the diagonal. But then (in r,c indexing) (0,1) is wrong (though approaching 0), (0,2) is very wrong, and (1,2) and (2,1) seem to be transposed. I thought it had something to do with the 8.1144e-6 value, and tried wrapping it in a BigDecimal to no avail.
Any ideas on places I can look? I'm using the standard Ruby Matrix library.
Edit:
Here's the code:
phi1 = 0.24434609527920614
phi2 = 0.05235987755982988
i2k = Matrix[[8.1144E-06, 0.0, 0.0],
[0.0, 8.1144E-06, 0.0],
[0.0, 0.0, 8.1144E-06]]
c_x = Matrix[[1, 0, 0],
[0, Math.cos(phi2), Math.sin(phi2)],
[0, -Math.sin(phi2), Math.cos(phi2)]]
c_y = Matrix[[Math.cos(phi1), 0, -Math.sin(phi1)],
[0, 1, 0],
[Math.sin(phi1), 0, Math.cos(phi1)]]
c_xy = c_y * c_x
c_yx = c_xy.transpose
c_xy * i2k * c_yx
i2k is equal to the identity matrix times 8.1144E-06. This simplifies the answer to:
c_xy * i2k * c_yx = 8.1144E-06 * c_xy * c_yx
However since c_yx = c_xy.transpose and c_xy is a rotation matrix, the transpose of any rotation matrix is its inverse. So c_xy * c_yx is the identity matrix, and thus the exact answer is 8.1144E-06 times the identity matrix.
Here is one way to calculate c_xy * c_yx without using the matrix algebra a priori:
require 'matrix'
require 'pp'
phi1 = 14 * Math::PI/180
phi2 = 3 * Math::PI/180
c_x = Matrix[
[1,0,0],
[0, Math.cos(phi2), Math.sin(phi2) ],
[0, -Math.sin(phi2), Math.cos(phi2) ] ]
c_y = Matrix[
[Math.cos(phi1), 0, -Math.sin(phi1) ],
[0,1,0],
[Math.sin(phi1), 0, Math.cos(phi1) ] ]
c_xy = c_y * c_x
c_yx = c_xy.transpose
product = c_xy * c_yx
pp *product
clone = *product
puts "\nApplying EPSILON:"
product.each_with_index do |e,i,j|
clone[i][j] = 0 if e.abs <= Float::EPSILON
end
pp clone
Output:
[1.0, 0.0, 2.7755575615628914e-17]
[0.0, 0.9999999999999999, -6.938893903907228e-18]
[2.7755575615628914e-17, -6.938893903907228e-18, 0.9999999999999999]
Applying EPSILON:
[1.0, 0, 0]
[0, 0.9999999999999999, 0]
[0, 0, 0.9999999999999999]
which one can then surmise should be the identity matrix. This uses Float::EPSILON which is about 2.220446049250313e-16 in order to set values that have an absolute value no more than this equal to 0. These kinds of approximations are inevitable in floating point calculations; one must evaluate the appropriateness of these approximations on a case-by-case basis.
An alternative is to do symbolic computation where possible rather than numeric.
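In this particular case, "symbolic" can be as light as carrying the scalar through the algebra by hand and only then building the matrix (a sketch of the idea, not code from the answer):
require 'matrix'

s = 8.1144e-06
# c_xy * (s * I) * c_xy.transpose == s * (c_xy * c_xy.transpose) == s * I,
# because the transpose of a rotation matrix is its inverse
exact = Matrix.scalar(3, s)   # exact result, no floating-point matrix products involved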
Floating point numbers have limited precision:
puts Float::DIG # => 15
That's the number of decimal digits a Float can reliably carry on my, and probably your, system. It isn't that numbers smaller than 1E-15 cannot be represented (they can, down to roughly 1E-308); the problem is that a term roughly 15 or more orders of magnitude smaller than the value it is added to or subtracted from simply disappears from the result. You could try BigDecimal for arbitrary precision.
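A minimal illustration of the loss of significance described above (my sketch, not from the original answer):
a = 8.1144e-06
b = 1.0e-22           # ~16 orders of magnitude smaller than a
a + b == a            #=> true: b vanishes when added to a
1.0 + 1.0e-16 == 1.0  #=> true: below the relative precision of a Float
1.0e-22 == 0.0        #=> false: tiny values are representable on their own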
I have thought of a couple of different ways to generate the following array: [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]
It seems like it might be possible to generate this array with the step function in an elegant manner, but I was not able to figure it out. Something that passes in a second argument to the step function and says you want the last value times 10:
0.step(1_000_000, ???).to_a
Here are the solutions I have come up with so far:
I don't really like the inject solution because I would prefer to specify 1_000_000 as the upper bound:
(0..6).inject([]) { |memo, number| memo << 10**number; memo }
This is the ugly step solution I came up with:
result = []
0.step(6) {|number| result << 10 ** number}
result
A while loop does not feel right either, but at least it lets me specify the upper_bound (instead of Math.log10(upper_bound)):
result = [1]
while result.last < 1_000_000
result << result.last * 10
end
result
Thanks for the help.
You already have many solutions. What about using map this way?
7.times.map { |i| 10**i }
#=> [1, 10, 100, 1000, 10000, 100000, 1000000]
If you want to set the upper bound, you could always do something like this:
1_000_000.to_s.size.times.map { |i| 10**i }
#=> [1, 10, 100, 1000, 10000, 100000, 1000000]
How about this?
0.upto(Math.log10(1_000_000)).map { |i| 10**i }
It's only going to properly work for powers of 10, but it lets you specify the upper bound, and then computes the powers of 10 to iterate through.
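For what it's worth (my addition), with a bound that is not a power of ten this still returns every power of ten not exceeding the bound, because Integer#upto accepts a Float limit:
0.upto(Math.log10(500_000)).map { |i| 10**i }
#=> [1, 10, 100, 1000, 10000, 100000]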
If you want to lead with the upper bound, you can do so easily via:
Math.log10(10_000_000).to_i.downto(0).map {|i| 10 ** i }.reverse
If terseness is really important, you can always reopen Fixnum with a generalized solution:
class Fixnum
def by_powers_of(base = 10)
0.upto(Math.log(self, base)).map {|i| base ** i }
end
end
10_000_000.by_powers_of(10)
# => [1, 10, 100, 1000, 10000, 100000, 1000000, 10000000]
(64**2).by_powers_of(2)
# => [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
class Integer
def powers_upto(max)
results = []
exp = 0
loop do
result = self**exp
break if result > max
results << result
exp += 1
end
results
end
end
p 10.powers_upto(1_000_000)
p 2.powers_upto(11)
--output:--
[1, 10, 100, 1000, 10000, 100000, 1000000]
[1, 2, 4, 8]