Getting accurate file size in megabytes? - ruby

How can I get the accurate file size in MB? I tried this:
compressed_file_size = File.size("Compressed/#{project}.tar.bz2") / 1024000
puts "file size is #{compressed_file_size} MB"
But it chopped off the 0.9 and showed 2 MB instead of 2.9 MB.

Try:
compressed_file_size = File.size("Compressed/#{project}.tar.bz2").to_f / 2**20
formatted_file_size = '%.2f' % compressed_file_size
One-liner:
compressed_file_size = '%.2f' % (File.size("Compressed/#{project}.tar.bz2").to_f / 2**20)
or:
compressed_file_size = (File.size("Compressed/#{project}.tar.bz2").to_f / 2**20).round(2)
Further information on String's % operator:
http://ruby-doc.org/core-1.9/classes/String.html#M000207
BTW: I prefer "MiB" over "MB" when using base-2 calculations (see: http://en.wikipedia.org/wiki/Mebibyte).
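To see the difference concretely (a quick IRB check, not part of the original answer; 3,000,000 bytes is an arbitrary example):
bytes = 3_000_000
bytes / (2**20).to_f  # => 2.86102294921875 ("MiB", base 2)
bytes / (10**6).to_f  # => 3.0 ("MB", base 10)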

You're doing integer division (which drops the fractional part). Try dividing by 1024000.0 so Ruby knows you want to do floating-point math.
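For example (illustrative numbers):
3_000_000 / 1_024_000    # => 2          (integer division truncates)
3_000_000 / 1_024_000.0  # => 2.9296875  (a Float operand forces float math)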

Try:
compressed_file_size = File.size("Compressed/#{project}.tar.bz2").to_f / 1024000

You might find a formatting function useful (to pretty-print a file size); here is my example:
def format_mb(size)
  conv = ['b', 'kb', 'mb', 'gb', 'tb', 'pb', 'eb']
  scale = 1024
  ndx = 1
  if size < 2 * (scale**ndx)
    return "#{size} #{conv[ndx-1]}"
  end
  size = size.to_f
  [2, 3, 4, 5, 6, 7].each do |ndx|
    if size < 2 * (scale**ndx)
      return "#{'%.3f' % (size / (scale**(ndx-1)))} #{conv[ndx-1]}"
    end
  end
  ndx = 7
  return "#{'%.3f' % (size / (scale**(ndx-1)))} #{conv[ndx-1]}"
end
Test it out:
tries = [1, 2, 3, 500, 1000, 1024, 3000, 99999, 999999, 999999999, 9999999999, 999999999999, 99999999999999, 3333333333333333, 555555555555555555555]
tries.each do |x|
  print "size #{x} -> #{format_mb(x)}\n"
end
Which produces:
size 1 -> 1 b
size 2 -> 2 b
size 3 -> 3 b
size 500 -> 500 b
size 1000 -> 1000 b
size 1024 -> 1024 b
size 3000 -> 2.930 kb
size 99999 -> 97.655 kb
size 999999 -> 976.562 kb
size 999999999 -> 953.674 mb
size 9999999999 -> 9.313 gb
size 999999999999 -> 931.323 gb
size 99999999999999 -> 90.949 tb
size 3333333333333333 -> 2.961 pb
size 555555555555555555555 -> 481.868 eb

Related

Decoding a .xwd image in Julia

I am trying to write a decoder for a .xwd image (X Window Dump), since ImageMagick is quite slow.
The only specifications I found are:
http://www.opensource.apple.com/source/X11/X11-0.40.80/xc/include/XWDFile.h?txt
https://formats.kaitai.io/xwd/index.html
From which I managed to read the header:
xwd_data = read(`xwd -id $id`)
function get_header(data)
    args = [reinterpret(Int32, reverse(data[4*i-3:4*i]))[1] for i in 1:25]
    xwd = XwdHeader(args...)
    return xwd
end
struct XwdHeader
    header_size::Int32
    file_version::Int32
    pixmap_format::Int32
    pixmap_depth::Int32
    pixmap_width::Int32
    pixmap_height::Int32
    xoffset::Int32
    byte_order::Int32
    bitmap_unit::Int32
    bitmap_bit_order::Int32
    bitmap_pad::Int32
    bits_per_pixel::Int32
    bytes_per_line::Int32
    visual_class::Int32
    red_mask::Int32
    green_mask::Int32
    blue_mask::Int32
    bits_per_rgb::Int32
    colormap_entries::Int32
    ncolors::Int32
    window_width::Int32
    window_height::Int32
    window_x::Int32
    window_y::Int32
    window_bdrwidth::Int32
end
and the colormap, which is stored in blocks of 12 bytes and in little-endian byte order:
function read_colormap_entry(n, data, header)
    offset = header.header_size + 1
    poff = 12*n
    px = Pixel(reinterpret(UInt32, reverse(data[offset+poff:offset+poff+3]))[1],
               reinterpret(UInt16, reverse(data[offset+poff+4:offset+poff+5]))[1],
               reinterpret(UInt16, reverse(data[offset+poff+6:offset+poff+7]))[1],
               reinterpret(UInt16, reverse(data[offset+poff+8:offset+poff+9]))[1],
               reinterpret(UInt8, data[offset+poff+10])[1],
               reinterpret(UInt8, data[offset+poff+11])[1])
    println("Pixel number ", px.entry_number >> 16)
    println("R ", px.red >> 8)
    println("G ", px.green >> 8)
    println("B ", px.blue >> 8)
    println("flags ", px.flags)
    println("padding ", px.padding)
end
struct Pixel
    entry_number::UInt32
    red::UInt16
    green::UInt16
    blue::UInt16
    flags::UInt8
    padding::UInt8
end
julia> read_colormap_entry(0, data, header)
Pixel number 0
R 0
G 0
B 0
flags 7
padding 0
julia> read_colormap_entry(1, data, header)
Pixel number 1
R 1
G 1
B 1
flags 7
padding 0
julia> read_colormap_entry(2, data, header)
Pixel number 2
R 2
G 2
B 2
flags 7
padding 0
Now I have the actual image data, stored in 4-byte blocks per pixel in the "DirectColor" visual class. Does anybody know how to extract the RGB values from this?
Edit:
By playing around with the data, I found out how to extract the R and G values:
function read_pixel(i, j, data, header::XwdHeader)
    w = header.window_width
    h = header.window_height
    offset = header.header_size + header.colormap_entries * 12 + 1
    poff = 4*((i-1)*w + (j-1))
    px = reinterpret(UInt32, reverse(data[offset+poff:offset+poff+3]))[1]
    println("Px value ", px)
    # for a typical 24-bit DirectColor/TrueColor visual the masks are
    # red = 0x00ff0000, green = 0x0000ff00, blue = 0x000000ff, so each
    # channel is masked out and shifted down to the low byte
    r = (px & header.red_mask) >> 16
    g = (px & header.green_mask) >> 8
    b = (px & header.blue_mask)
    println("r ", r)
    println("g ", g)
    println("b ", b)
end
which gives the correct R and G values, but the B value should be nonzero.
julia> read_pixel(31, 31, data, xwd_header)
Px value 741685248
r 53
g 56
b 0
I basically have no idea what I am doing with the color masks and the bit shifts. Can anyone explain this? Thanks!

Convert human readable file size to bytes in ruby

I went through this link. My requirement is the exact reverse of this: for example, the string "10KB" needs to be converted to 10240 (its equivalent size in bytes). Is there a gem for this, or a built-in method in Ruby? I did my research, but I wasn't able to spot one.
There's filesize (rubygems)
It's quite trivial to write your own:
module ToBytes
  def to_bytes
    md = match(/^(?<num>\d+)\s?(?<unit>\w+)?$/)
    md[:num].to_i *
      case md[:unit]
      when 'KB' then 1024
      when 'MB' then 1024**2
      when 'GB' then 1024**3
      when 'TB' then 1024**4
      when 'PB' then 1024**5
      when 'EB' then 1024**6
      when 'ZB' then 1024**7
      when 'YB' then 1024**8
      else 1
      end
  end
end
size_string = "10KB"
size_string.extend(ToBytes).to_bytes
=> 10240
String.include(ToBytes)
"1024 KB".to_bytes
=> 1048576
If you need KiB, MiB, etc., you just add multipliers.
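For example, here is a minimal sketch of such a variant (the UNITS table, regex, and method name are illustrative, not from the code above); it maps both "KB"- and "KiB"-style suffixes to base-1024 multipliers and allows fractional numbers:
UNITS = { 'B' => 0, 'KB' => 1, 'KiB' => 1, 'MB' => 2, 'MiB' => 2, 'GB' => 3, 'GiB' => 3 }

def to_bytes(str)
  md = str.match(/\A(?<num>\d+(?:\.\d+)?)\s?(?<unit>[A-Za-z]+)?\z/)
  exp = md[:unit] ? UNITS.fetch(md[:unit], 0) : 0
  (md[:num].to_f * 1024**exp).round
end

to_bytes("10KB")    # => 10240
to_bytes("1.5 MiB") # => 1572864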
Here is a method using while:
def number_format(n)
  n2, n3 = n, 0
  while n2 >= 1e3
    n2 /= 1e3
    n3 += 1
  end
  return '%.3f' % n2 + ['', ' k', ' M', ' G'][n3]
end
s = number_format(9012345678)
puts s == '9.012 G'
https://ruby-doc.org/core/doc/syntax/control_expressions_rdoc.html#label-while+Loop

Filling a matrix using parallel processing in Julia

I'm trying to speed up the solution time for a dynamic programming problem in Julia (v. 0.5.0), via parallel processing. The problem involves choosing the optimal values for every element of a 1073 x 19 matrix at every iteration, until successive matrix differences fall within a tolerance. I thought that, within each iteration, filling in the values for each element of the matrix could be parallelized. However, I'm seeing a huge performance degradation using SharedArray, and I'm wondering if there's a better way to approach parallel processing for this problem.
I construct the arguments for the function below:
est_params = [.788,.288,.0034,.1519,.1615,.0041,.0077,.2,0.005,.7196]
r = 0.015
tau = 0.35
rho =est_params[1]
sigma =est_params[2]
delta = 0.15
gamma =est_params[3]
a_capital =est_params[4]
lambda1 =est_params[5]
lambda2 =est_params[6]
s =est_params[7]
theta =est_params[8]
mu =est_params[9]
p_bar_k_ss =est_params[10]
beta = (1+r)^(-1)
sigma_range = 4
gz = 19
gp = 29
gk = 37
lnz=collect(linspace(-sigma_range*sigma,sigma_range*sigma,gz))
z=exp(lnz)
gk_m = fld(gk,2)
# Need to add mu somewhere to k_ss
k_ss = (theta*(1-tau)/(r+delta))^(1/(1-theta))
k=cat(1,map(i->k_ss*((1-delta)^i),collect(1:gk_m)),map(i->k_ss/((1-delta)^i),collect(1:gk_m)))
insert!(k,gk_m+1,k_ss)
sort!(k)
p_bar=p_bar_k_ss*k_ss
p = collect(linspace(-p_bar/2,p_bar,gp))
#Tauchen
N = length(z)
Z = zeros(N,1)
Zprob = zeros(Float32,N,N)
Z[N] = lnz[length(z)]
Z[1] = lnz[1]
zstep = (Z[N] - Z[1]) / (N - 1)
for i=2:(N-1)
    Z[i] = Z[1] + zstep * (i - 1)
end
for a = 1 : N
    for b = 1 : N
        if b == 1
            Zprob[a,b] = 0.5*erfc(-((Z[1] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2))
        elseif b == N
            Zprob[a,b] = 1 - 0.5*erfc(-((Z[N] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
        else
            Zprob[a,b] = 0.5*erfc(-((Z[b] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2)) -
                         0.5*erfc(-((Z[b] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
        end
    end
end
# Collecting tauchen results in a 2 element array of linspace and array; [2] gets array
# Zprob=collect(tauchen(gz, rho, sigma, mu, sigma_range))[2]
Zcumprob=zeros(Float32,gz,gz)
# 2 in cumsum! denotes the 2nd dimension, i.e. columns
cumsum!(Zcumprob, Zprob,2)
gm = gk * gp
control=zeros(gm,2)
for i=1:gk
    control[(1+gp*(i-1)):(gp*i),1]=fill(k[i],(gp,1))
    control[(1+gp*(i-1)):(gp*i),2]=p
end
endog=copy(control)
E=Array(Float32,gm,gm,gz)
for h=1:gm
    for m=1:gm
        for j=1:gz
            # set the nonzero net debt indicator
            if endog[h,2]<0
                p_ind=1
            else
                p_ind=0
            end
            # set the investment indicator
            if (control[m,1]-(1-delta)*endog[h,1])!=0
                i_ind=1
            else
                i_ind=0
            end
            E[m,h,j] = (1-tau)*z[j]*(endog[h,1]^theta) + control[m,2]-endog[h,2]*(1+r*(1-tau)) +
                       delta*endog[h,1]*tau-(control[m,1]-(1-delta)*endog[h,1]) -
                       (i_ind*gamma*endog[h,1]+endog[h,1]*(a_capital/2)*(((control[m,1]-(1-delta)*endog[h,1])/endog[h,1])^2)) +
                       s*endog[h,2]*p_ind
            elem = E[m,h,j]
            if E[m,h,j]<0
                E[m,h,j]=elem+lambda1*elem-.5*lambda2*elem^2
            else
                E[m,h,j]=elem
            end
        end
    end
end
I then constructed the function with serial processing. The two for loops iterate through each element to find the largest value in a 1073-element array (1073 = the gm scalar argument to the function):
function dynam_serial(E,gm,gz,beta,Zprob)
    v = Array(Float32,gm,gz)
    fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
    Tv = Array(Float32,gm,gz)
    # Set parameters for the loop
    convcrit = 0.0001 # chosen convergence criterion
    diff = 1 # arbitrary initial value greater than convcrit
    while diff>convcrit
        exp_v=v*Zprob'
        for h=1:gm
            for j=1:gz
                Tv[h,j]=findmax(E[:,h,j] + beta*exp_v[:,j])[1]
            end
        end
        diff = maxabs(Tv - v)
        v=copy(Tv)
    end
end
Timing this, I get:
@time dynam_serial(E,gm,gz,beta,Zprob)
> 106.880008 seconds (91.70 M allocations: 203.233 GB, 15.22% gc time)
Now, I try using Shared Arrays to benefit from parallel processing. Note that I reconfigured the iteration so that I only have one for loop, rather than two. I also use v=deepcopy(Tv); otherwise, v is copied as an Array object, rather than a SharedArray:
function dynam_parallel(E,gm,gz,beta,Zprob)
    v = SharedArray(Float32,(gm,gz),init = S -> S[Base.localindexes(S)] = myid())
    fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
    # Set parameters for the loop
    convcrit = 0.0001 # chosen convergence criterion
    diff = 1 # arbitrary initial value greater than convcrit
    while diff>convcrit
        exp_v=v*Zprob'
        Tv = SharedArray(Float32,gm,gz,init = S -> S[Base.localindexes(S)] = myid())
        @sync @parallel for hj=1:(gm*gz)
            j=cld(hj,gm)
            h=mod(hj,gm)
            if h==0; h=gm; end
            @async Tv[h,j]=findmax(E[:,h,j] + beta*exp_v[:,j])[1]
        end
        diff = maxabs(Tv - v)
        v=deepcopy(Tv)
    end
end
Timing the parallel version on a 4-core 2.5 GHz i7 processor with 16 GB of memory, I get:
addprocs(3)
@time dynam_parallel(E,gm,gz,beta,Zprob)
> 164.237208 seconds (2.64 M allocations: 201.812 MB, 0.04% gc time)
Am I doing something incorrect here? Or is there a better way to approach parallel processing in Julia for this particular problem? I've considered using Distributed Arrays, but it's difficult for me to see how to apply them to the present problem.
UPDATE:
Per @DanGetz and his helpful comments, I turned instead to trying to speed up the serial processing version. I was able to get performance down to 53.469780 seconds (67.36 M allocations: 103.419 GiB, 19.12% gc time) through:
1) Upgrading to 0.6.0 (saved about 25 seconds), which includes the helpful @views macro.
2) Preallocating the main array I'm trying to fill in (Tv), per the section on Preallocating Outputs in the Julia Performance Tips: https://docs.julialang.org/en/latest/manual/performance-tips/. (saved another 25 or so seconds)
The biggest remaining slow-down seems to be coming from the add_vecs function, which sums together subarrays of two larger matrices. I've tried devectorizing and using BLAS functions, but haven't been able to produce better performance.
In any event, the improved code for dynam_serial is below:
function add_vecs(r::Array{Float32},h::Int,j::Int,E::Array{Float32},exp_v::Array{Float32},beta::Float32)
    @views r=E[:,h,j] + beta*exp_v[:,j]
    return r
end
function dynam_serial(E::Array{Float32},gm::Int,gz::Int,beta::Float32,Zprob::Array{Float32})
    v = Array{Float32}(gm,gz)
    fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
    Tv = Array{Float32}(gm,gz)
    r = Array{Float32}(gm)
    # Set parameters for the loop
    convcrit = 0.0001 # chosen convergence criterion
    diff = 1 # arbitrary initial value greater than convcrit
    while diff>convcrit
        exp_v=v*Zprob'
        for h=1:gm
            for j=1:gz
                @views Tv[h,j]=findmax(add_vecs(r,h,j,E,exp_v,beta))[1]
            end
        end
        diff = maximum(abs,Tv - v)
        v=copy(Tv)
    end
    return Tv
end
If add_vecs seems to be the critical function, writing an explicit for loop could offer more optimization. How does the following benchmark?
function add_vecs!(r::Array{Float32},h::Int,j::Int,E::Array{Float32},
                   exp_v::Array{Float32},beta::Float32)
    @inbounds for i=1:size(E,1)
        r[i]=E[i,h,j] + beta*exp_v[i,j]
    end
    return r
end
UPDATE
To continue optimizing dynam_serial I have tried to remove more allocations. The result is:
function add_vecs_and_max!(gm::Int,r::Array{Float64},h::Int,j::Int,E::Array{Float64},
                           exp_v::Array{Float64},beta::Float64)
    @inbounds for i=1:gm
        r[i] = E[i,h,j]+beta*exp_v[i,j]
    end
    return findmax(r)[1]
end
function dynam_serial(E::Array{Float64},gm::Int,gz::Int,
                      beta::Float64,Zprob::Array{Float64})
    v = Array{Float64}(gm,gz)
    fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
    r = Array{Float64}(gm)
    exp_v = Array{Float64}(gm,gz)
    # Set parameters for the loop
    convcrit = 0.0001 # chosen convergence criterion
    diff = 1.0 # arbitrary initial value greater than convcrit
    while diff>convcrit
        A_mul_Bt!(exp_v,v,Zprob)
        diff = -Inf
        for h=1:gm
            for j=1:gz
                oldv = v[h,j]
                newv = add_vecs_and_max!(gm,r,h,j,E,exp_v,beta)
                v[h,j]= newv
                diff = max(diff, oldv-newv, newv-oldv)
            end
        end
    end
    return v
end
Switching the functions to use Float64 should increase speed (as CPUs are inherently optimized for 64-bit word lengths). Using the mutating A_mul_Bt! directly saves another allocation, and swapping the arrays v and Tv avoids the copy(...).
How do these optimizations improve your running time?
2nd UPDATE
Updated the code in the UPDATE section to use findmax. Also, changed dynam_serial to use v without Tv, as there was no need to save the old version except for the diff calculation, which is now done inside the loop.
Here's the code I copied and pasted, as provided by Dan Getz above. I include the array and scalar definitions exactly as I ran them. Performance was 39.507005 seconds (11 allocations: 486.891 KiB) when running @time dynam_serial(E,gm,gz,beta,Zprob).
using SpecialFunctions
est_params = [.788,.288,.0034,.1519,.1615,.0041,.0077,.2,0.005,.7196]
r = 0.015
tau = 0.35
rho =est_params[1]
sigma =est_params[2]
delta = 0.15
gamma =est_params[3]
a_capital =est_params[4]
lambda1 =est_params[5]
lambda2 =est_params[6]
s =est_params[7]
theta =est_params[8]
mu =est_params[9]
p_bar_k_ss =est_params[10]
beta = (1+r)^(-1)
sigma_range = 4
gz = 19 #15 #19
gp = 29 #19 #29
gk = 37 #25 #37
lnz=collect(linspace(-sigma_range*sigma,sigma_range*sigma,gz))
z=exp.(lnz)
gk_m = fld(gk,2)
# Need to add mu somewhere to k_ss
k_ss = (theta*(1-tau)/(r+delta))^(1/(1-theta))
k=cat(1,map(i->k_ss*((1-delta)^i),collect(1:gk_m)),map(i->k_ss/((1-delta)^i),collect(1:gk_m)))
insert!(k,gk_m+1,k_ss)
sort!(k)
p_bar=p_bar_k_ss*k_ss
p = collect(linspace(-p_bar/2,p_bar,gp))
#Tauchen
N = length(z)
Z = zeros(N,1)
Zprob = zeros(Float64,N,N)
Z[N] = lnz[length(z)]
Z[1] = lnz[1]
zstep = (Z[N] - Z[1]) / (N - 1)
for i=2:(N-1)
    Z[i] = Z[1] + zstep * (i - 1)
end
for a = 1 : N
    for b = 1 : N
        if b == 1
            Zprob[a,b] = 0.5*erfc(-((Z[1] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2))
        elseif b == N
            Zprob[a,b] = 1 - 0.5*erfc(-((Z[N] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
        else
            Zprob[a,b] = 0.5*erfc(-((Z[b] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2)) -
                         0.5*erfc(-((Z[b] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
        end
    end
end
# Collecting tauchen results in a 2 element array of linspace and array; [2] gets array
# Zprob=collect(tauchen(gz, rho, sigma, mu, sigma_range))[2]
Zcumprob=zeros(Float64,gz,gz)
# 2 in cumsum! denotes the 2nd dimension, i.e. columns
cumsum!(Zcumprob, Zprob,2)
gm = gk * gp
control=zeros(gm,2)
for i=1:gk
    control[(1+gp*(i-1)):(gp*i),1]=fill(k[i],(gp,1))
    control[(1+gp*(i-1)):(gp*i),2]=p
end
endog=copy(control)
E=Array(Float64,gm,gm,gz)
for h=1:gm
    for m=1:gm
        for j=1:gz
            # set the nonzero net debt indicator
            if endog[h,2]<0
                p_ind=1
            else
                p_ind=0
            end
            # set the investment indicator
            if (control[m,1]-(1-delta)*endog[h,1])!=0
                i_ind=1
            else
                i_ind=0
            end
            E[m,h,j] = (1-tau)*z[j]*(endog[h,1]^theta) + control[m,2]-endog[h,2]*(1+r*(1-tau)) +
                       delta*endog[h,1]*tau-(control[m,1]-(1-delta)*endog[h,1]) -
                       (i_ind*gamma*endog[h,1]+endog[h,1]*(a_capital/2)*(((control[m,1]-(1-delta)*endog[h,1])/endog[h,1])^2)) +
                       s*endog[h,2]*p_ind
            elem = E[m,h,j]
            if E[m,h,j]<0
                E[m,h,j]=elem+lambda1*elem-.5*lambda2*elem^2
            else
                E[m,h,j]=elem
            end
        end
    end
end
function add_vecs_and_max!(gm::Int,r::Array{Float64},h::Int,j::Int,E::Array{Float64},
                           exp_v::Array{Float64},beta::Float64)
    maxr = -Inf
    @inbounds for i=1:gm
        r[i] = E[i,h,j]+beta*exp_v[i,j]
        maxr = max(r[i],maxr)
    end
    return maxr
end
function dynam_serial(E::Array{Float64},gm::Int,gz::Int,
                      beta::Float64,Zprob::Array{Float64})
    v = Array{Float64}(gm,gz)
    fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
    Tv = Array{Float64}(gm,gz)
    r = Array{Float64}(gm)
    exp_v = Array{Float64}(gm,gz)
    # Set parameters for the loop
    convcrit = 0.0001 # chosen convergence criterion
    diff = 1.0 # arbitrary initial value greater than convcrit
    while diff>convcrit
        A_mul_Bt!(exp_v,v,Zprob)
        diff = -Inf
        for h=1:gm
            for j=1:gz
                Tv[h,j]=add_vecs_and_max!(gm,r,h,j,E,exp_v,beta)
                diff = max(abs(Tv[h,j]-v[h,j]),diff)
            end
        end
        (v,Tv)=(Tv,v)
    end
    return v
end
Now, here's another version of the algorithm and inputs. The functions are similar to what Dan Getz suggested, except that I use findmax rather than an iterated max function to find the array maximum. In the input construction, I am using both Float32 and mixing different bit-types together. However, I've consistently achieved better performance this way: 24.905569 seconds (1.81 k allocations: 46.829 MiB, 0.01% gc time). But it's not clear at all why.
using SpecialFunctions
est_params = [.788,.288,.0034,.1519,.1615,.0041,.0077,.2,0.005,.7196]
r = 0.015
tau = 0.35
rho =est_params[1]
sigma =est_params[2]
delta = 0.15
gamma =est_params[3]
a_capital =est_params[4]
lambda1 =est_params[5]
lambda2 =est_params[6]
s =est_params[7]
theta =est_params[8]
mu =est_params[9]
p_bar_k_ss =est_params[10]
beta = Float32((1+r)^(-1))
sigma_range = 4
gz = 19
gp = 29
gk = 37
lnz=collect(linspace(-sigma_range*sigma,sigma_range*sigma,gz))
z=exp(lnz)
gk_m = fld(gk,2)
# Need to add mu somewhere to k_ss
k_ss = (theta*(1-tau)/(r+delta))^(1/(1-theta))
k=cat(1,map(i->k_ss*((1-delta)^i),collect(1:gk_m)),map(i->k_ss/((1-delta)^i),collect(1:gk_m)))
insert!(k,gk_m+1,k_ss)
sort!(k)
p_bar=p_bar_k_ss*k_ss
p = collect(linspace(-p_bar/2,p_bar,gp))
#Tauchen
N = length(z)
Z = zeros(N,1)
Zprob = zeros(Float32,N,N)
Z[N] = lnz[length(z)]
Z[1] = lnz[1]
zstep = (Z[N] - Z[1]) / (N - 1)
for i=2:(N-1)
    Z[i] = Z[1] + zstep * (i - 1)
end
for a = 1 : N
    for b = 1 : N
        if b == 1
            Zprob[a,b] = 0.5*erfc(-((Z[1] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2))
        elseif b == N
            Zprob[a,b] = 1 - 0.5*erfc(-((Z[N] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
        else
            Zprob[a,b] = 0.5*erfc(-((Z[b] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2)) -
                         0.5*erfc(-((Z[b] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
        end
    end
end
# Collecting tauchen results in a 2 element array of linspace and array; [2] gets array
# Zprob=collect(tauchen(gz, rho, sigma, mu, sigma_range))[2]
Zcumprob=zeros(Float32,gz,gz)
# 2 in cumsum! denotes the 2nd dimension, i.e. columns
cumsum!(Zcumprob, Zprob,2)
gm = gk * gp
control=zeros(gm,2)
for i=1:gk
    control[(1+gp*(i-1)):(gp*i),1]=fill(k[i],(gp,1))
    control[(1+gp*(i-1)):(gp*i),2]=p
end
endog=copy(control)
E=Array(Float32,gm,gm,gz)
for h=1:gm
    for m=1:gm
        for j=1:gz
            # set the nonzero net debt indicator
            if endog[h,2]<0
                p_ind=1
            else
                p_ind=0
            end
            # set the investment indicator
            if (control[m,1]-(1-delta)*endog[h,1])!=0
                i_ind=1
            else
                i_ind=0
            end
            E[m,h,j] = (1-tau)*z[j]*(endog[h,1]^theta) + control[m,2]-endog[h,2]*(1+r*(1-tau)) +
                       delta*endog[h,1]*tau-(control[m,1]-(1-delta)*endog[h,1]) -
                       (i_ind*gamma*endog[h,1]+endog[h,1]*(a_capital/2)*(((control[m,1]-(1-delta)*endog[h,1])/endog[h,1])^2)) +
                       s*endog[h,2]*p_ind
            elem = E[m,h,j]
            if E[m,h,j]<0
                E[m,h,j]=elem+lambda1*elem-.5*lambda2*elem^2
            else
                E[m,h,j]=elem
            end
        end
    end
end
function add_vecs!(gm::Int,r::Array{Float32},h::Int,j::Int,E::Array{Float32},
                   exp_v::Array{Float32},beta::Float32)
    @inbounds @views for i=1:gm
        r[i]=E[i,h,j] + beta*exp_v[i,j]
    end
    return r
end
function dynam_serial(E::Array{Float32},gm::Int,gz::Int,beta::Float32,Zprob::Array{Float32})
    v = Array{Float32}(gm,gz)
    fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
    Tv = Array{Float32}(gm,gz)
    # Set parameters for the loop
    convcrit = 0.0001 # chosen convergence criterion
    diff = 1.00000 # arbitrary initial value greater than convcrit
    iter=0
    exp_v=Array{Float32}(gm,gz)
    r=Array{Float32}(gm)
    while diff>convcrit
        A_mul_Bt!(exp_v,v,Zprob)
        for h=1:gm
            for j=1:gz
                Tv[h,j]=findmax(add_vecs!(gm,r,h,j,E,exp_v,beta))[1]
            end
        end
        diff = maximum(abs,Tv - v)
        (v,Tv)=(Tv,v)
    end
    return v
end

How do I get HSV values of an average pixel of an image?

In this code
im = Vips::Image.new_from_file "some.jpg"
r = (im * [1,0,0]).avg
g = (im * [0,1,0]).avg
b = (im * [0,0,1]).avg
p [r,g,b] # => [57.1024, 53.818933333333334, 51.9258]
p Vips::Image.sRGB2HSV [r,g,b]
the last line throws
/ruby-vips-1.0.3/lib/vips/argument.rb:154:in `set_property': invalid argument Array (expect #<Class:0x007fbd7c923600>) (ArgumentError)
P.S.: for now, I took and refactored the ChunkyPNG implementation:
def to_hsv(r, g, b)
  r, g, b = [r, g, b].map { |component| component.fdiv 255 }
  min, max = [r, g, b].minmax
  chroma = max - min
  [
    60.0 * (chroma.zero? ? 0 : case max
                               when r then (g - b) / chroma
                               when g then (b - r) / chroma + 2
                               when b then (r - g) / chroma + 4
                               else 0
                               end % 6),
    chroma / max,
    max,
  ]
end
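A quick sanity check of the port (pure red should give hue 0, full saturation, and full value):
to_hsv(255, 0, 0) # => [0.0, 1.0, 1.0]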
Pixel averaging should really be in a linear colorspace. XYZ is an easy one, but scRGB would work well too. Once you have a 1x1 pixel image, convert to HSV and read out the value.
#!/usr/bin/ruby
require 'vips'
im = Vips::Image.new_from_file ARGV[0]
# xyz colourspace is linear, i.e. the value in each channel is proportional
# to the number of photons of that frequency
im = im.colourspace "xyz"
# 'shrink' is a fast box filter, so each output pixel is the simple average of
# the corresponding input pixels ... this will shrink the whole image to a
# single pixel
im = im.shrink im.width, im.height
# now convert the one pixel image to hsv and read out the values
im = im.colourspace "hsv"
h, s, v = im.getpoint 0, 0
puts "h = #{h}"
puts "s = #{s}"
puts "v = #{v}"
I wouldn't use HSV myself, LCh is generally much better.
https://en.wikipedia.org/wiki/Lab_color_space#Cylindrical_representation:_CIELCh_or_CIEHLC
For LCh, just change the end to:
im = im.colourspace "lch"
l, c, h = im.getpoint 0, 0
I realised that it is obviously wrong to calculate the average hue as an arithmetic average, so I solved it by adding vectors with length equal to the saturation. But I didn't find a way to iterate over pixels in vips, so I used chunky_png as a crutch:
require "vips"
require "chunky_png"
def get_average_hsv_by_filename(filename)
  im = Vips::Image.new_from_file filename
  im.write_to_file "temp.png"
  y, x = 0, 0
  ChunkyPNG::Canvas.from_file("temp.png").to_rgba_stream.unpack("N*").each do |rgba|
    h, s, v = ChunkyPNG::Color.to_hsv(rgba)
    a = h * Math::PI / 180
    y += Math::sin(a) * s
    x += Math::cos(a) * s
  end
  h = Math::atan2(y, x) / Math::PI * 180
  _, s, v = im.colourspace("hsv").bandsplit.map(&:avg)
  [h, s, v]
end
For large images I used .resize, which seems to introduce only up to ~2% error when resizing down to an area of 10,000 square pixels with the default kernel.
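For reference, a sketch of that pre-shrink step (assuming ruby-vips; the 10,000-pixel target is the figure mentioned above):
im = Vips::Image.new_from_file "some.jpg"
# scale factor that brings the image down to roughly 10,000 pixels of area
scale = Math.sqrt(10_000.0 / (im.width * im.height))
small = im.resize(scale) # default kernel (lanczos3)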

Pretty file size in Ruby?

I'm trying to make a method that converts an integer that represents bytes to a string with a 'prettied up' format.
Here's my half-working attempt:
class Integer
  def to_filesize
    {
      'B'  => 1024,
      'KB' => 1024 * 1024,
      'MB' => 1024 * 1024 * 1024,
      'GB' => 1024 * 1024 * 1024 * 1024,
      'TB' => 1024 * 1024 * 1024 * 1024 * 1024
    }.each_pair { |e, s| return "#{s / self}#{e}" if self < s }
  end
end
What am I doing wrong?
If you use it with Rails, what about the standard Rails number helper?
http://api.rubyonrails.org/classes/ActionView/Helpers/NumberHelper.html#method-i-number_to_human_size
number_to_human_size(number, options = {})
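For example (values as given in the Rails docs):
number_to_human_size(1234)        # => "1.21 KB"
number_to_human_size(1234567890)  # => "1.15 GB"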
How about the Filesize gem? It seems to be able to convert from bytes (and other formats) into pretty-printed values:
example:
Filesize.from("12502343 B").pretty # => "11.92 MiB"
http://rubygems.org/gems/filesize
I agree with @David that it's probably best to use an existing solution, but to answer your question about what you're doing wrong:
- The primary error is dividing s by self rather than the other way around.
- You really want to divide by the previous s, so divide s by 1024.
- Doing integer arithmetic will give you confusing results, so convert to float.
- Perhaps round the answer.
So:
class Integer
  def to_filesize
    {
      'B'  => 1024,
      'KB' => 1024 * 1024,
      'MB' => 1024 * 1024 * 1024,
      'GB' => 1024 * 1024 * 1024 * 1024,
      'TB' => 1024 * 1024 * 1024 * 1024 * 1024
    }.each_pair { |e, s| return "#{(self.to_f / (s / 1024)).round(2)}#{e}" if self < s }
  end
end
lets you:
1.to_filesize
# => "1.0B"
1020.to_filesize
# => "1020.0B"
1024.to_filesize
# => "1.0KB"
1048576.to_filesize
# => "1.0MB"
Again, I don't recommend actually doing that, but it seems worth correcting the bugs.
This is my solution:
def filesize(size)
  units = %w[B KiB MiB GiB TiB PiB EiB ZiB]
  return '0.0 B' if size == 0
  exp = (Math.log(size) / Math.log(1024)).to_i
  exp += 1 if size.to_f / 1024**exp >= 1024 - 0.05
  exp = units.size - 1 if exp > units.size - 1
  '%.1f %s' % [size.to_f / 1024**exp, units[exp]]
end
Compared to other solutions it's simpler, more efficient, and generates a more proper output.
Format
All the other methods have the problem that they report values like 1023.95 GiB wrong (rather than rounding up to 1.0 TiB; see the table below). Moreover, to_filesize simply errors out with big numbers (it returns an array).
- method: [ filesize, Filesize, number_to_human, to_filesize ]
- 0 B: [ 0.0 B, 0.00 B, 0 Bytes, 0.0B ]
- 1 B: [ 1.0 B, 1.00 B, 1 Byte, 1.0B ]
- 10 B: [ 10.0 B, 10.00 B, 10 Bytes, 10.0B ]
- 1000 B: [ 1000.0 B, 1000.00 B, 1000 Bytes, 1000.0B ]
- 1 KiB: [ 1.0 KiB, 1.00 KiB, 1 KB, 1.0KB ]
- 1.5 KiB: [ 1.5 KiB, 1.50 KiB, 1.5 KB, 1.5KB ]
- 10 KiB: [ 10.0 KiB, 10.00 KiB, 10 KB, 10.0KB ]
- 1000 KiB: [ 1000.0 KiB, 1000.00 KiB, 1000 KB, 1000.0KB ]
- 1 MiB: [ 1.0 MiB, 1.00 MiB, 1 MB, 1.0MB ]
- 1 GiB: [ 1.0 GiB, 1.00 GiB, 1 GB, 1.0GB ]
- 1023.95 GiB: [ 1.0 TiB, 1023.95 GiB, 1020 GB, 1023.95GB ]
- 1 TiB: [ 1.0 TiB, 1.00 TiB, 1 TB, 1.0TB ]
- 1 EiB: [ 1.0 EiB, 1.00 EiB, 1 EB, ERROR ]
- 1 ZiB: [ 1.0 ZiB, 1.00 ZiB, 1020 EB, ERROR ]
- 1 YiB: [ 1024.0 ZiB, 1024.00 ZiB, 1050000 EB, ERROR ]
Performance
Also, it has the best performance (seconds to process 1 million numbers):
- filesize: 2.15
- Filesize: 15.53
- number_to_human: 139.63
- to_filesize: 2.41
Here is a method using log10:
def number_format(d)
  e = Math.log10(d).to_i / 3
  return '%.3f' % (d / 1000**e) + ['', ' k', ' M', ' G'][e]
end
s = number_format(9012345678.0)
puts s == '9.012 G'
https://ruby-doc.org/core/Math.html#method-c-log10
You get points for adding a method to Integer, but this seems more File specific, so I would suggest monkeying around with File, say by adding a method to File called .prettysize().
But here is an alternative solution that uses iteration, and avoids printing single bytes as float :-)
def format_mb(size)
  conv = ['b', 'kb', 'mb', 'gb', 'tb', 'pb', 'eb']
  scale = 1024
  ndx = 1
  if size < 2 * (scale**ndx)
    return "#{size} #{conv[ndx-1]}"
  end
  size = size.to_f
  [2, 3, 4, 5, 6, 7].each do |ndx|
    if size < 2 * (scale**ndx)
      return "#{'%.3f' % (size / (scale**(ndx-1)))} #{conv[ndx-1]}"
    end
  end
  ndx = 7
  return "#{'%.3f' % (size / (scale**(ndx-1)))} #{conv[ndx-1]}"
end
@Darshan Computing's solution is only partial here. Since the hash keys are not guaranteed to be ordered, this approach will not work reliably. You could fix this by doing something like this inside the to_filesize method:
conv = {
  1024 => 'B',
  1024*1024 => 'KB',
  ...
}
conv.keys.sort.each do |s|
  next if self >= s
  e = conv[s]
  return "#{(self.to_f / (s / 1024)).round(2)}#{e}"
end
This is what I ended up doing for a similar method inside Float:
class Float
  def to_human
    conv = {
      1024 => 'B',
      1024*1024 => 'KB',
      1024*1024*1024 => 'MB',
      1024*1024*1024*1024 => 'GB',
      1024*1024*1024*1024*1024 => 'TB',
      1024*1024*1024*1024*1024*1024 => 'PB',
      1024*1024*1024*1024*1024*1024*1024 => 'EB'
    }
    conv.keys.sort.each do |mult|
      next if self >= mult
      suffix = conv[mult]
      return "%.2f %s" % [self / (mult / 1024), suffix]
    end
  end
end
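Hypothetical usage of the Float patch above:
1536.0.to_human                # => "1.50 KB"
(3.25 * 1024 * 1024).to_human  # => "3.25 MB"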
FileSize may be dead, but now there is ByteSize.
require 'bytesize'
ByteSize.new(1210000000) #=> (1.21 GB)
ByteSize.new(1210000000).to_s #=> 1.21 GB
