Round up to the nearest tenth? - ruby

I need to round up to the nearest tenth. What I need is ceil but with precision to the first decimal place.
Examples:
10.38 would be 10.4
10.31 would be 10.4
10.4 would be 10.4
So if it is any amount past a full tenth, it should be rounded up.
I'm running Ruby 1.8.7.

This works in general:
ceil(number*10)/10
So in Ruby it should be like:
(number*10).ceil/10.0
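The multiply, ceil, divide idea above can be wrapped in a small helper. This is only a sketch: the name `ceil_to` and its `places` parameter are made up for illustration, and it goes through BigDecimal (parsing the number's decimal string) on the assumption that you want to avoid cases where binary floating point leaves the product a hair above an integer and `ceil` jumps a whole step. It assumes a Ruby recent enough to provide `Kernel#BigDecimal`.

```ruby
require 'bigdecimal'

# Hypothetical helper: round +number+ up to +places+ decimal places.
# Going through the decimal string avoids binary floating-point noise.
def ceil_to(number, places = 1)
  BigDecimal(number.to_s).ceil(places).to_f
end

puts ceil_to(10.38)     # 10.4
puts ceil_to(10.31)     # 10.4
puts ceil_to(10.4)      # 10.4 (already on a tenth, left alone)
puts ceil_to(10.333, 2) # 10.34
```

If plain floats are good enough for your data, `(number * 10).ceil / 10.0` remains the shorter option.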

Ruby's round method accepts a precision argument (Ruby 1.9 and later):
10.38.round(1) # => 10.4
In this case 1 gets you rounding to the nearest tenth.

If you have ActiveSupport available, it adds a round method:
3.14.round(1) # => 3.1
3.14159.round(3) # => 3.142
The source is as follows:
def round_with_precision(precision = nil)
  precision.nil? ? round_without_precision : (self * (10 ** precision)).round / (10 ** precision).to_f
end

To round up to the nearest ten (rather than the nearest tenth) in Ruby you could do
(number/10.0).ceil*10
(12345/10.0).ceil*10 # => 12350

(10.33 + 0.05).round(1) # => 10.4
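This rounds up like ceil for values that are not already on an exact tenth, is concise, supports precision, and avoids the goofy *10 /10.0 thing. A value that already sits on an exact tenth lands on the halfway point after the addition, so it may be pushed up to the next tenth.
E.g. round up to the nearest hundredth: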
(10.333 + 0.005).round(2) # => 10.34
To nearest thousandth:
(10.3333 + 0.0005).round(3) # => 10.334
etc.

Ruby - Unpack array with mixed types

I am trying to use unpack to decode a binary file. The binary file has the following structure:
ABCDEF\tFFFABCDEF\tFFFF....
where
ABCDEF -> String of fixed length
\t -> tab character
FFF -> 3 Floats
.... -> repeat thousands of times
I know how to do it when types are all the same or with only numbers and fixed length arrays, but I am struggling in this situation. For example, if I had a list of floats I would do
s.unpack('F*')
Or if I had integers and floats like
[1, 3.4, 5.2, 4, 2.3, 7.8]
I would do
s.unpack('CF2CF2')
But in this case I am a bit lost. I was hoping to use a format string such as '(CF2)*' with brackets, but it does not work.
I need to use Ruby 2.0.0-p247 if that matters
Example
ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
s = ary.pack('P7fffP7fff')
then
s.scan(/.{19}/)
["\xA8lf\xF9\xD4\x7F\x00\x00\x9A\x99Y#33\xB3#\x9A\x99\x11", "A\x80lf\xF9\xD4\x7F\x00\x00\x00\x00 #ff\x0EAff"]
Finally
s.scan(/.{19}/).map{ |item| item.unpack('P7fff') }
Error: #<ArgumentError: no associated pointer>
<main>:in `unpack'
<main>:in `block in <main>'
<main>:in `map'
<main>:in `<main>'
You could read the file in small chunks of 19 bytes and use 'A7fff' to pack and unpack. Do not use pointers to structure ('p' and 'P'), as they need more than 19 bytes to encode your information.
You could also use 'A6xfff' to ignore the 7th byte and get a string with 6 chars.
Here's an example, which is similar to the documentation of IO.read:
data = [["ABCDEF\t", 3.4, 5.6, 9.1],
        ["FEDCBA\t", 2.5, 8.9, 3.1]]
binary_file = 'data.bin'
chunk_size = 19
pattern = 'A7fff'
File.open(binary_file, 'wb') do |o|
  data.each do |row|
    o.write row.pack(pattern)
  end
end
raise "Something went wrong. Please check data, pattern and chunk_size." unless File.size(binary_file) == data.length * chunk_size
File.open(binary_file, 'rb') do |f|
  while record = f.read(chunk_size)
    puts '%s %g %g %g' % record.unpack(pattern)
  end
end
# =>
# ABCDEF 3.4 5.6 9.1
# FEDCBA 2.5 8.9 3.1
You could use a multiple of 19 to speed up the process if your file is large.
When dealing with mixed formats that repeat and are of a known fixed size, it is often easier to split the string first.
A quick example would be:
binary.scan(/.{LENGTH_OF_DATA}/).map { |item| item.unpack(FORMAT) }
Considering your example above, take the length of the string including the tab character (in bytes), plus the size of the 3 floats. If your strings are literally 'ABCDEF\t', you would use a size of 19 (7 for the string, 12 for the 3 floats).
Your final product would look like this:
str.scan(/.{19}/m).map { |item| item.unpack('A7fff') }
Per example:
irb(main):001:0> ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
=> ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
irb(main):002:0> s = ary.pack('pfffpfff')
=> "\xE8Pd\xE4eU\x00\x00\x9A\x99Y#33\xB3#\x9A\x99\x11A\x98Pd\xE4eU\x00\x00\x00\x00 #ff\x0EAffF#"
irb(main):003:0> s.unpack('pfffpfff')
=> ["ABCDEF\t", 3.4000000953674316, 5.599999904632568, 9.100000381469727, "FEDCBA\t", 2.5, 8.899999618530273, 3.0999999046325684]
The minor differences in precision are unavoidable, but do not worry about them: they come from the difference between a 32-bit float and a 64-bit double (which Ruby uses internally), and the difference is smaller than what a 32-bit float can meaningfully represent.
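As a rough sketch of the fixed-size-record idea from the answers above, the whole string can also be unpacked in one call by repeating the record format once per record, then regrouping the flat result. The constant names are made up for illustration, and the sample values are chosen to be exactly representable as 32-bit floats so that the round trip is exact.

```ruby
RECORD_SIZE   = 19        # 7 bytes of text (6 chars + tab) + 3 * 4-byte floats
RECORD_FORMAT = 'A6xfff'  # 6-char string, skip the tab, three single-precision floats
FIELDS        = 4         # values produced per record

rows = [["ABCDEF\t", 3.5, 5.25, 9.0],
        ["FEDCBA\t", 2.5, 8.75, 3.0]]

# Build the binary string the same way the file would be laid out.
binary = rows.map { |row| row.pack('A7fff') }.join

# Repeat the format once per record, then regroup the flat array.
count   = binary.bytesize / RECORD_SIZE
records = binary.unpack(RECORD_FORMAT * count).each_slice(FIELDS).to_a

records.each { |name, a, b, c| puts "#{name} #{a} #{b} #{c}" }
# ABCDEF 3.5 5.25 9.0
# FEDCBA 2.5 8.75 3.0
```

This avoids running a regular expression over binary data, at the cost of building a long format string for very large inputs.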

Error in setting max features parameter in Isolation Forest algorithm using sklearn

I'm trying to train a dataset with 357 features using Isolation Forest sklearn implementation. I can successfully train and get results when the max features variable is set to 1.0 (the default value).
However when max features is set to 2, it gives the following error:
ValueError: Number of features of the model must match the input.
Model n_features is 2 and input n_features is 357
It also gives the same error when the feature count is 1 (int) and not 1.0 (float).
My understanding was that when the feature count is 2 (int), two features should be considered in creating each tree. Is this wrong? How can I change the max features parameter?
The code is as follows:
from sklearn.ensemble import IsolationForest

def isolation_forest_imp(dataset):
    estimators = 10
    samples = 100
    features = 2
    contamination = 0.1
    bootstrap = False
    random_state = None
    verbosity = 0
    estimator = IsolationForest(n_estimators=estimators, max_samples=samples, contamination=contamination,
                                max_features=features,
                                bootstrap=bootstrap, random_state=random_state, verbose=verbosity)
    model = estimator.fit(dataset)
In the documentation it states:
max_features : int or float, optional (default=1.0)
The number of features to draw from X to train each base estimator.
- If int, then draw `max_features` features.
- If float, then draw `max_features * X.shape[1]` features.
So, 2 should mean take two features and 1.0 should mean take all of the features, 0.5 take half and so on, from what I understand.
I think this could be a bug, since, taking a look in IsolationForest's fit:
# Isolation Forest inherits from BaseBagging
# and when _fit is called, BaseBagging takes care of the features correctly
super(IsolationForest, self)._fit(X, y, max_samples,
                                  max_depth=max_depth,
                                  sample_weight=sample_weight)
# however, after _fit, decision_function is called with X (the whole sample),
# without taking max_features into account
self.threshold_ = -sp.stats.scoreatpercentile(
    -self.decision_function(X), 100. * (1. - self.contamination))
then:
# when the decision function _validate_X_predict is called, with X unmodified,
# it calls the base estimator's (dt) _validate_X_predict with the whole X
X = self.estimators_[0]._validate_X_predict(X, check_input=True)
...
# from tree.py:
def _validate_X_predict(self, X, check_input):
    """Validate X whenever one tries to predict, apply, predict_proba"""
    if self.tree_ is None:
        raise NotFittedError("Estimator not fitted, "
                             "call `fit` before exploiting the model.")
    if check_input:
        X = check_array(X, dtype=DTYPE, accept_sparse="csr")
        if issparse(X) and (X.indices.dtype != np.intc or
                            X.indptr.dtype != np.intc):
            raise ValueError("No support for np.int64 index based "
                             "sparse matrices")
    # so, this check fails because X is the original X, not with the max_features applied
    n_features = X.shape[1]
    if self.n_features_ != n_features:
        raise ValueError("Number of features of the model must "
                         "match the input. Model n_features is %s and "
                         "input n_features is %s "
                         % (self.n_features_, n_features))
    return X
So, I am not sure how you can handle this. Maybe figure out the percentage that leads to just the two features you need, even though I am not sure it will work as expected.
Note: I am using scikit-learn v.0.18
Edit: as @Vivek Kumar commented, this is a known issue and upgrading to 0.20 should do the trick.

Add 1 millisecond to a Time/DateTime object

Is there a way to add 1 millisecond to a Time/DateTime object in Ruby?
For a web service request I need a time range with millisecond precision:
irb(main):034:0> time_start = Date.today.to_time.utc.iso8601(3)
=> "2016-09-27T22:00:00.000Z"
irb(main):035:0> time_end = ((Date.today + 1).to_time).utc.iso8601(3)
=> "2016-09-28T22:00:00.000Z"
-- or --
irb(main):036:0> time_end = ((Date.today + 1).to_time - 1).utc.iso8601(3)
=> "2016-09-28T21:59:59.000Z"
So I'm near my preferred solution, but time_end should be 2016-09-28T21:59:59.999Z.
I couldn't find a way to do arithmetic with milliseconds in Ruby. I only managed it with strftime, but it would be great if it could be calculated instead.
-- This works, but hard coded --
time_end = ((Date.today + 1).to_time - 1).utc.strftime("%Y-%m-%dT%H:%M:%S.999Z")
=> "2016-09-28T21:59:59.999Z"
FYI: I'm on plain Ruby, no Rails.
OK, I found a solution. With real calculation it looks like this:
time_end = ((Date.today + 1).to_time - 1/1001.0).utc.iso8601(3)
=> "2016-09-28T21:59:59.999Z"
EXAMPLE
Formatting in iso8601(3) is only to show behavior.
irb(main):055:0> Date.today.to_time.iso8601(3)
=> "2016-09-28T00:00:00.000+02:00"
Adding a millisecond
irb(main):058:0> (Date.today.to_time + 1/1000.0).iso8601(3)
=> "2016-09-28T00:00:00.001+02:00"
Subtract a millisecond
Don't use this! Note that the result has 2 milliseconds subtracted:
irb(main):060:0> (Date.today.to_time - 1/1000.0).iso8601(3)
=> "2016-09-27T23:59:59.998+02:00"
USE
irb(main):061:0> (Date.today.to_time - 1/1001.0).iso8601(3)
=> "2016-09-27T23:59:59.999+02:00"
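The off-by-one-millisecond result above most likely comes from 0.001 not being exactly representable as a Float, combined with iso8601(3) truncating rather than rounding. As a hedged alternative to the 1/1001.0 workaround, a Rational offset keeps the arithmetic exact. The fixed timestamps below are stand-ins for Date.today.to_time so that the example is reproducible.

```ruby
require 'time' # provides Time#iso8601

# Stand-in for Date.today.to_time.utc (fixed so the output is reproducible).
day_start = Time.utc(2016, 9, 27, 22, 0, 0)

one_ms = Rational(1, 1000) # exactly one millisecond, no Float rounding

day_end        = day_start + 24 * 60 * 60 - one_ms
just_after     = day_start + one_ms

puts day_end.iso8601(3)    # 2016-09-28T21:59:59.999Z
puts just_after.iso8601(3) # 2016-09-27T22:00:00.001Z
```

Time stores its value as an exact rational internally, so adding or subtracting a Rational does not introduce rounding error.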

How to convert negative integers to binary in Ruby

Question 1: I cannot find a way to convert negative integers to binary in the following way. I am supposed to convert it like this.
-3 => "11111111111111111111111111111101"
I tried below:
sprintf('%b', -3) => "..101" # .. appears and does not show 111111 bit.
-3.to_s(2) => "-11" # This just adds - to the binary of the positive integer 3.
Question 2: Interestingly, if I use an online converter, it tells me that the binary of -3 is "00101101 00110011".
What is the difference between "11111111111111111111111111111101" and "00101101 00110011"?
Packing then unpacking will convert -3 to 4294967293 (2^32 - 3):
[-3].pack('L').unpack('L')
=> [4294967293]
sprintf('%b', [-3].pack('L').unpack('L')[0])
# => "11111111111111111111111111111101"
sprintf('%b', [3].pack('L').unpack('L')[0])
# => "11"
Try:
> 31.downto(0).map { |n| (-3)[n] }.join
#=> "11111111111111111111111111111101"
Note: This applies to negative numbers only.
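Another way to get the same 32-bit two's complement string is to mask the integer to the desired width before converting. The helper name `twos_complement` is invented for this sketch. The second half is a guess at Question 2: "00101101 00110011" matches the ASCII codes of the two characters "-" and "3", so the online converter was probably encoding the text "-3" rather than the number.

```ruby
# Hypothetical helper: fixed-width two's complement representation.
def twos_complement(n, bits = 32)
  mask = (1 << bits) - 1              # e.g. 0xFFFFFFFF for 32 bits
  (n & mask).to_s(2).rjust(bits, '0') # pad positives with leading zeros
end

puts twos_complement(-3)    # 11111111111111111111111111111101
puts twos_complement(3)     # 00000000000000000000000000000011
puts twos_complement(-3, 8) # 11111101

# The bytes of the *string* "-3", shown in binary:
ascii_bits = '-3'.bytes.map { |b| format('%08b', b) }.join(' ')
puts ascii_bits             # 00101101 00110011
```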

How do I calculate a String's width in Ruby?

String.length will only tell me how many characters are in the String. (In fact, before Ruby 1.9, it will only tell me how many bytes, which is even less useful.)
I'd really like to be able to find out how many 'en' wide a String is. For example:
'foo'.width
# => 3
'moo'.width
# => 3.5 # m's, w's, etc. are wide
'foi'.width
# => 2.5 # i's, j's, etc. are narrow
'foo bar'.width
# => 6.25 # spaces are very narrow
Even better would be if I could get the first n en of a String:
'foo'[0, 2.en]
# => "fo"
'filial'[0, 3.en]
# => "fili"
'foo bar baz'[0, 4.5.en]
# => "foo b"
And better still would be if I could strategize the whole thing. Some people think a space should be 0.25en, some think it should be 0.33, etc.
You should use the RMagick gem to render a "Draw" object using the font you want (you can load .ttf files and such)
The code would look something like this:
require 'rmagick' # older RMagick versions use require 'RMagick'
the_text = "TheTextYouWantTheWidthOf"
label = Magick::Draw.new
label.font = "Vera" # you can also specify a file name... check the rmagick docs to be sure
label.text_antialias(true)
label.font_style = Magick::NormalStyle
label.font_weight = Magick::BoldWeight
label.gravity = Magick::CenterGravity
label.text(0, 0, the_text)
metrics = label.get_type_metrics(the_text)
width = metrics.width
height = metrics.height
You can see it in action in my button maker here: http://risingcode.com/button/everybodywangchungtonite
Use the ttfunk gem to read the metrics from the font file. You can then get the width of a string of text in em. Here's my pull request to get this example added to the gem.
require 'rubygems'
require 'ttfunk'
require 'valuable'
# Everything you never wanted to know about glyphs:
# http://chanae.walon.org/pub/ttf/ttf_glyphs.htm
# this code is a substantial reworking of:
# https://github.com/prawnpdf/ttfunk/blob/master/examples/metrics.rb
class Font
  attr_reader :file

  def initialize(path_to_file)
    @file = TTFunk::File.open(path_to_file)
  end

  def width_of( string )
    string.split('').map{|char| character_width( char )}.inject{|sum, x| sum + x}
  end

  def character_width( character )
    width_in_units = ( horizontal_metrics.for( glyph_id( character )).advance_width )
    width_in_units.to_f / units_per_em
  end

  def units_per_em
    @u_per_em ||= file.header.units_per_em
  end

  def horizontal_metrics
    @hm = file.horizontal_metrics
  end

  def glyph_id(character)
    character_code = character.unpack("U*").first
    file.cmap.unicode.first[character_code]
  end
end
Here it is in action:
>> din = Font.new("#{File.dirname(__FILE__)}/../../fonts/DIN/DINPro-Light.ttf")
>> din.width_of("Hypertension")
=> 5.832
# which is correct! Hypertension in that font takes up about 5.832 em! It's over by maybe ... 0.015.
You could attempt to create a standardized "width proportion table" to calculate an approximation. Basically, you need to store the width of each character and then traverse the string, adding up the widths.
I found this table here:
Left, Width, Advance values for ArialBD16 'c' through 'm'
Letter Left Width Advance
c 1 7 9
d 1 8 10
e 1 8 9
f 0 6 5
g 0 9 10
h 1 8 10
i 1 2 4
j -1 4 4
k 1 8 9
l 1 2 4
m 1 12 14
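A lookup-table approach along these lines might look like the sketch below. The advance values are copied from the ArialBD16 table above. The constant names, the method name `approximate_width`, and the fallback width of 9 for characters missing from the table are assumptions made for illustration, and kerning is ignored entirely. It uses `sum` with a block, so it assumes Ruby 2.4 or later.

```ruby
# Advance widths (in pixels) for ArialBD16, 'c' through 'm', from the table above.
ADVANCE = {
  'c' => 9, 'd' => 10, 'e' => 9, 'f' => 5, 'g' => 10, 'h' => 10,
  'i' => 4, 'j' => 4, 'k' => 9, 'l' => 4, 'm' => 14
}.freeze

# Assumed fallback for characters not in the table (not from the source).
DEFAULT_ADVANCE = 9

# Approximate rendered width: sum of per-character advances, no kerning.
def approximate_width(string, table = ADVANCE, default = DEFAULT_ADVANCE)
  string.each_char.sum { |ch| table.fetch(ch, default) }
end

puts approximate_width('milk')  # 14 + 4 + 4 + 9 = 31
puts approximate_width('hedge') # 10 + 9 + 10 + 10 + 9 = 48
```

Passing the table as a parameter makes it easy to swap in a different font or a different opinion about how wide a space should be.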
If you want to get serious, I'd start by looking at webkit, gecko, and OO.org, but I guess the algorithms for kerning and size calculation are not trivial.
If you have ImageMagick installed you can access this information from the command line.
$ convert xc: -font ./.fonts/HelveticaRoundedLTStd-Bd.otf -pointsize 24 -debug annotate -annotate 0 'MyTestString' null: 2>&1
2010-11-02T19:17:48+00:00 0:00.010 0.010u 6.6.5 Annotate convert[22496]: annotate.c/RenderFreetype/1155/Annotate
Font ./.fonts/HelveticaRoundedLTStd-Bd.otf; font-encoding none; text-encoding none; pointsize 24
2010-11-02T19:17:48+00:00 0:00.010 0.010u 6.6.5 Annotate convert[22496]: annotate.c/GetTypeMetrics/736/Annotate
Metrics: text: MyTestString; width: 157; height: 29; ascent: 18; descent: -7; max advance: 24; bounds: 0,-5 20,17; origin: 158,0; pixels per em: 24,24; underline position: -1.5625; underline thickness: 0.78125
2010-11-02T19:17:48+00:00 0:00.010 0.010u 6.6.5 Annotate convert[22496]: annotate.c/RenderFreetype/1155/Annotate
Font ./.fonts/HelveticaRoundedLTStd-Bd.otf; font-encoding none; text-encoding none; pointsize 24
To do it from Ruby, use backticks:
result = `convert xc: -font #{path_to_font} -pointsize #{size} -debug annotate -annotate 0 '#{string}' null: 2>&1`
if result =~ /width: (\d+);/
  $1
end
This is a good problem!
I'm trying to solve it using pango/cairo in ruby for SVG output. I am probably going to use pango to calculate the width and then use a simple svg element.
I use the following code:
require "cairo"
require "pango"
paper = Cairo::Paper::A4_LANDSCAPE
TEXT = "Don't you love me anymore?"
def pac(surface)
  cr = Cairo::Context.new(surface)
  cr.select_font_face("Calibri",
                      Cairo::FONT_SLANT_NORMAL,
                      Cairo::FONT_WEIGHT_NORMAL)
  cr.set_font_size(12)
  extents = cr.text_extents(TEXT)
  puts extents
end
Cairo::ImageSurface.new(*paper.size("pt")) do |surface|
  cr = pac(surface)
end
Once I had to display a string array (containing the upcoming world days, current name days, etc.) in two lines. To put the line break after the appropriate string, I had to determine the cumulative widths of the strings as printed in Arial. I opened my word processor, typed the alphabet, and sorted the characters into two classes based on their width in the given font:
w="023456789AÁBCDEFGHJKLMNOÓÖŐPQRSTUÚÜŰWZYaábcdeghksoóöőpqwuúüűzymn".chars.yield_self{|z| z.zip(Array.new(z.size){1.5})}.to_h.merge("1rfiíjltIÍ ".chars.yield_self{|z| z.zip(Array.new(z.size){1})}.to_h)
w.default=1
nntd=["01-21:A vallások világnapja", "01-19:Kanut", "Kenéz", "Margaréta", "Márió", "Máriusz", "Megyer", "Sára", "Szultána", "Vázsony"]
nntd.sort_by!{|z| z.chars.map{|q| w[q]}.sum}.reverse!
Then I was able to determine the position of the linebreak:
ind=nntd.collect.with_index.find_index{|z,i| nntd[0..i].join.chars.map{|q| w[q]}.sum >=nntd.join.chars.map{|q| w[q]}.sum/2}
t=[nntd[0..ind],nntd[ind+1..-1]].map{|z| z.join(",")}.join("\n")
After all I got a nice, balanced output, divided into two lines:
01-21:A vallások világnapja,01-19:Margaréta,Szultána
Vázsony,Máriusz,Megyer,Kenéz,Kanut,Márió,Sára
This way I can check the upcoming world days and current name days at a glance.
