How do I calculate a String's width in Ruby?

String.length will only tell me how many characters are in the String. (In fact, before Ruby 1.9, it will only tell me how many bytes, which is even less useful.)
I'd really like to be able to find out how many 'en' wide a String is. For example:
'foo'.width
# => 3
'moo'.width
# => 3.5 # m's, w's, etc. are wide
'foi'.width
# => 2.5 # i's, j's, etc. are narrow
'foo bar'.width
# => 6.25 # spaces are very narrow
Even better would be if I could get the first n en of a String:
'foo'[0, 2.en]
# => "fo"
'filial'[0, 3.en]
# => "fili"
'foo bar baz'[0, 4.5.en]
# => "foo b"
And better still would be if I could make the whole thing configurable through a strategy. Some people think a space should be 0.25 en, some think it should be 0.33 en, etc.

You should use the RMagick gem to render a "Draw" object using the font you want (you can load .ttf files and such).
The code would look something like this:
require 'rmagick'
the_text = "TheTextYouWantTheWidthOf"
label = Magick::Draw.new
label.font = "Vera" # you can also specify a file name... check the RMagick docs to be sure
label.text_antialias(true)
label.font_style = Magick::NormalStyle
label.font_weight = Magick::BoldWeight
label.gravity = Magick::CenterGravity
label.text(0, 0, the_text)
metrics = label.get_type_metrics(the_text)
width = metrics.width
height = metrics.height
You can see it in action in my button maker here: http://risingcode.com/button/everybodywangchungtonite

Use the ttfunk gem to read the metrics from the font file. You can then get the width of a string of text in em. Here's my pull request to get this example added to the gem.
require 'rubygems'
require 'ttfunk'
require 'valuable'
# Everything you never wanted to know about glyphs:
# http://chanae.walon.org/pub/ttf/ttf_glyphs.htm
# this code is a substantial reworking of:
# https://github.com/prawnpdf/ttfunk/blob/master/examples/metrics.rb
class Font
  attr_reader :file
  def initialize(path_to_file)
    @file = TTFunk::File.open(path_to_file)
  end
  # Width of the whole string in em: the sum of each character's advance width.
  def width_of( string )
    string.split('').map{|char| character_width( char )}.inject{|sum, x| sum + x}
  end
  def character_width( character )
    width_in_units = ( horizontal_metrics.for( glyph_id( character )).advance_width )
    width_in_units.to_f / units_per_em
  end
  def units_per_em
    @u_per_em ||= file.header.units_per_em
  end
  def horizontal_metrics
    @hm = file.horizontal_metrics
  end
  def glyph_id(character)
    character_code = character.unpack("U*").first
    file.cmap.unicode.first[character_code]
  end
end
Here it is in action:
>> din = Font.new("#{File.dirname(__FILE__)}/../../fonts/DIN/DINPro-Light.ttf")
>> din.width_of("Hypertension")
=> 5.832
# which is correct! Hypertension in that font takes up about 5.832 em! It's over by maybe ... 0.015.

You could attempt to create a standardized "width proportion table" to calculate an approximation. Basically, you need to store the width of each character and then traverse the string, adding up the widths.
I found this table here:
Left, Width, Advance values for ArialBD16 'c' through 'm'
Letter  Left  Width  Advance
c          1      7        9
d          1      8       10
e          1      8        9
f          0      6        5
g          0      9       10
h          1      8       10
i          1      2        4
j         -1      4        4
k          1      8        9
l          1      2        4
m          1     12       14
If you want to get serious, I'd start by looking at webkit, gecko, and OO.org, but I guess the algorithms for kerning and size calculation are not trivial.
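As a rough illustration of the table idea, here is a minimal sketch of a pluggable width strategy that reproduces the examples from the question. The class name `EnWidthStrategy`, the character classes, and the widths are all assumptions made up for illustration; they are not measurements of any real font, and the sketch ignores kerning entirely.

```ruby
# A sketch of a pluggable width strategy. The character classes and the
# widths below are illustrative assumptions, not measurements of any font.
class EnWidthStrategy
  WIDE   = %w[m w M W].freeze
  NARROW = %w[i j l I].freeze

  def initialize(space: 0.25, wide: 1.5, narrow: 0.5, normal: 1.0)
    @space = space
    @wide = wide
    @narrow = narrow
    @normal = normal
  end

  # Width of a single character, in en.
  def char_width(char)
    return @space if char == ' '
    return @wide if WIDE.include?(char)
    return @narrow if NARROW.include?(char)

    @normal
  end

  # Total width of the string, in en.
  def width_of(string)
    string.each_char.sum { |char| char_width(char) }
  end

  # Longest prefix of the string whose width does not exceed max_en.
  def prefix_within(string, max_en)
    used = 0.0
    string.each_char.take_while { |char| (used += char_width(char)) <= max_en }.join
  end
end

strategy = EnWidthStrategy.new
puts strategy.width_of('foo bar')
puts strategy.prefix_within('foo bar baz', 4.5)
```

Someone who prefers a wider space could pass `EnWidthStrategy.new(space: 0.33)` instead of changing the class.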

If you have ImageMagick installed you can access this information from the command line.
$ convert xc: -font ./.fonts/HelveticaRoundedLTStd-Bd.otf -pointsize 24 -debug annotate -annotate 0 'MyTestString' null: 2>&1
2010-11-02T19:17:48+00:00 0:00.010 0.010u 6.6.5 Annotate convert[22496]: annotate.c/RenderFreetype/1155/Annotate
Font ./.fonts/HelveticaRoundedLTStd-Bd.otf; font-encoding none; text-encoding none; pointsize 24
2010-11-02T19:17:48+00:00 0:00.010 0.010u 6.6.5 Annotate convert[22496]: annotate.c/GetTypeMetrics/736/Annotate
Metrics: text: MyTestString; width: 157; height: 29; ascent: 18; descent: -7; max advance: 24; bounds: 0,-5 20,17; origin: 158,0; pixels per em: 24,24; underline position: -1.5625; underline thickness: 0.78125
2010-11-02T19:17:48+00:00 0:00.010 0.010u 6.6.5 Annotate convert[22496]: annotate.c/RenderFreetype/1155/Annotate
Font ./.fonts/HelveticaRoundedLTStd-Bd.otf; font-encoding none; text-encoding none; pointsize 24
To do it from Ruby, use backticks:
result = `convert xc: -font #{path_to_font} -pointsize #{size} -debug annotate -annotate 0 '#{string}' null: 2>&1`
if result =~ /width: (\d+);/
$1
end
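If more than the width is needed, the same "Metrics:" line can be parsed into a hash. The following is only a sketch: the helper name `parse_type_metrics` is hypothetical, and it assumes the debug output keeps the `name: value;` layout shown in the log above, which may differ between ImageMagick versions.

```ruby
# Pull the numeric fields out of the "Metrics:" line printed by
# `convert ... -debug annotate`. Returns nil if no such line is present.
# Assumes the "name: value;" layout seen in the sample log line below.
def parse_type_metrics(debug_output)
  line = debug_output.each_line.find { |l| l.include?('Metrics:') }
  return nil unless line

  %w[width height ascent descent].each_with_object({}) do |field, metrics|
    value = line[/\b#{field}: (-?\d+(?:\.\d+)?);/, 1]
    metrics[field.to_sym] = value.to_f if value
  end
end

# Sample line copied from the debug output shown earlier.
SAMPLE_DEBUG_LINE = 'Metrics: text: MyTestString; width: 157; height: 29; ' \
                    'ascent: 18; descent: -7; max advance: 24; bounds: 0,-5 20,17;'

p parse_type_metrics(SAMPLE_DEBUG_LINE)
```

Note that interpolating the string into a shell command, as in the backtick example, breaks on text containing a single quote; passing the arguments as an array to something like `Open3.capture2e` avoids the shell altogether.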

This is a good problem!
I'm trying to solve it using pango/cairo in ruby for SVG output. I am probably going to use pango to calculate the width and then use a simple svg element.
I use the following code:
require "cairo"
require "pango"
paper = Cairo::Paper::A4_LANDSCAPE
TEXT = "Don't you love me anymore?"
def pac(surface)
cr = Cairo::Context.new(surface)
cr.select_font_face("Calibri",
Cairo::FONT_SLANT_NORMAL,
Cairo::FONT_WEIGHT_NORMAL)
cr.set_font_size(12)
extents = cr.text_extents(TEXT)
puts extents
end
Cairo::ImageSurface.new(*paper.size("pt")) do |surface|
cr = pac(surface)
end

Once I had to display a string array (containing the upcoming world days, current name days, etc.) in two lines, putting the line break after the appropriate string. To do that, I had to determine the cumulative widths of the strings as printed in Arial. I opened my word processor, typed the alphabet, and sorted the characters into two classes based on their width in the given font:
w="023456789AÁBCDEFGHJKLMNOÓÖŐPQRSTUÚÜŰWZYaábcdeghksoóöőpqwuúüűzymn".chars.yield_self{|z| z.zip(Array.new(z.size){1.5})}.to_h.merge("1rfiíjltIÍ ".chars.yield_self{|z| z.zip(Array.new(z.size){1})}.to_h)
w.default=1
nntd=["01-21:A vallások világnapja", "01-19:Kanut", "Kenéz", "Margaréta", "Márió", "Máriusz", "Megyer", "Sára", "Szultána", "Vázsony"]
nntd.sort_by!{|z| z.chars.map{|q| w[q]}.sum}.reverse!
Then I was able to determine the position of the linebreak:
ind=nntd.collect.with_index.find_index{|z,i| nntd[0..i].join.chars.map{|q| w[q]}.sum >=nntd.join.chars.map{|q| w[q]}.sum/2}
t=[nntd[0..ind],nntd[ind+1..-1]].map{|z| z.join(",")}.join("\n")
After all I got a nice, balanced output, divided into two lines:
01-21:A vallások világnapja,01-19:Margaréta,Szultána
Vázsony,Máriusz,Megyer,Kenéz,Kanut,Márió,Sára
This way I can check the upcoming world days and current name days at a glance.

Related

Get unicode block element based on matrix

A unique question I guess, given these Unicode block elements:
https://en.wikipedia.org/wiki/Block_Elements
I want to get the relevant block element based on the matrix I get, so
11
01 will give ▜
00
10 will give ▖
and so on
I managed to do this in python, but I wonder if anyone got a more elegant solution.
from itertools import product
elements = [0, 1]
a = product(elements, repeat=2)
b = product(a, repeat=2)
matrices = list(b)
"""
Generated matrices (top row above bottom row):
00 00 00 00 01 01 01 01 10 10 10 10 11 11 11 11
00 01 10 11 00 01 10 11 00 01 10 11 00 01 10 11
"""
blocks = [' ', '▗', '▖', '▄', '▝', '▐', '▞', '▟', '▘', '▚', '▌', '▙', '▀', '▜', '▛', '█']
given = (
(0,1),
(1,0)
)
print(blocks[matrices.index(given)])
output: ▞
These characters, although they exist, were not meant to have a direct correlation between numbers and the set 1/4 blocks.
So, I have a solution in a published package, and it is not necessarily more "elegant" than yours, as it is far more verbose. However, the code around it allows one to "draw" on a text terminal using these 1/4 blocks as pixels, in a somewhat clean API.
So, this is the class I use to set/reset pixels in a character block. The relevant methods can be used straight from the class, and they take the "pixel coordinates" and the current character block upon which to set or reset the addressed pixel. The code instantiates the class just to be able to use the in operator to check for block-characters.
The project can be installed with "pip install terminedia".
The function and class below, extracted from the project, will work standalone to do the same as you do:
# Snippets from jsbueno/terminedia, v. 0.2.0
def _mirror_dict(dct):
"""Creates a new dictionary exchanging values for keys
Args:
- dct (mapping): Dictionary to be inverted
"""
return {value: key for key, value in dct.items()}
class BlockChars_:
"""Used internaly to emulate pixel setting/resetting/reading inside 1/4 block characters
Contains a listing and other mappings of all block characters used in order, so that
bits in numbers from 0 to 15 will match the "pixels" on the corresponding block character.
Although this class is purposed for internal use in the emulation of
a higher resolution canvas, its functions can be used by any application
that decides to manipulate block chars.
The class itself is stateless, and it is used as a single-instance which
uses the name :any:`BlockChars`. The instance is needed so that one can use the operator
``in`` to check if a character is a block-character.
"""
EMPTY = " "
QUADRANT_UPPER_LEFT = '\u2598'
QUADRANT_UPPER_RIGHT = '\u259D'
UPPER_HALF_BLOCK = '\u2580'
QUADRANT_LOWER_LEFT = '\u2596'
LEFT_HALF_BLOCK = '\u258C'
QUADRANT_UPPER_RIGHT_AND_LOWER_LEFT = '\u259E'
QUADRANT_UPPER_LEFT_AND_UPPER_RIGHT_AND_LOWER_LEFT = '\u259B'
QUADRANT_LOWER_RIGHT = '\u2597'
QUADRANT_UPPER_LEFT_AND_LOWER_RIGHT = '\u259A'
RIGHT_HALF_BLOCK = '\u2590'
QUADRANT_UPPER_LEFT_AND_UPPER_RIGHT_AND_LOWER_RIGHT = '\u259C'
LOWER_HALF_BLOCK = '\u2584'
QUADRANT_UPPER_LEFT_AND_LOWER_LEFT_AND_LOWER_RIGHT = '\u2599'
QUADRANT_UPPER_RIGHT_AND_LOWER_LEFT_AND_LOWER_RIGHT = '\u259F'
FULL_BLOCK = '\u2588'
# This depends on Python 3.6+ ordered behavior for local namespaces and dicts:
block_chars_by_name = {key: value for key, value in locals().items() if key.isupper()}
block_chars_to_name = _mirror_dict(block_chars_by_name)
blocks_in_order = {i: value for i, value in enumerate(block_chars_by_name.values())}
block_to_order = _mirror_dict(blocks_in_order)
def __contains__(self, char):
"""True if a char is a "pixel representing" block char"""
return char in self.block_chars_to_name
#classmethod
def _op(cls, pos, data, operation):
number = cls.block_to_order[data]
index = 2 ** (pos[0] + 2 * pos[1])
return operation(number, index)
#classmethod
def set(cls, pos, data):
""""Sets" a pixel in a block character
Args:
- pos (2-sequence): coordinate of the pixel inside the character
(0,0) is top-left corner, (1,1) bottom-right corner and so on)
- data: initial character to be composed with the bit to be set. Use
space ("\x20") to start with an empty block.
"""
op = lambda n, index: n | index
return cls.blocks_in_order[cls._op(pos, data, op)]
#classmethod
def reset(cls, pos, data):
""""resets" a pixel in a block character
Args:
- pos (2-sequence): coordinate of the pixel inside the character
(0,0) is top-left corner, (1,1) bottom-right corner and so on)
- data: initial character to be composed with the bit to be reset.
"""
op = lambda n, index: n & (0xf - index)
return cls.blocks_in_order[cls._op(pos, data, op)]
#classmethod
def get_at(cls, pos, data):
"""Retrieves whether a pixel in a block character is set
Args:
- pos (2-sequence): The pixel coordinate
- data (character): The character were to look at blocks.
Raises KeyError if an invalid character is passed in "data".
"""
op = lambda n, index: bool(n & index)
return cls._op(pos, data, op)
#: :any:`BlockChars_` single instance: enables ``__contains__``:
BlockChars = BlockChars_()
After pasting only this in the terminal it is possible to do:
In [131]: pixels = BlockChars.set((0,0), " ")
In [132]: print(BlockChars.set((1,1), pixels))
# And this internal "side-product" is closer to what you have posted:
In [133]: BlockChars.blocks_in_order[0b1111]
Out[133]: '█'
In [134]: BlockChars.blocks_in_order[0b1010]
Out[134]: '▐'
The project at https://github.com/jsbueno/terminedia has a complete drawing API to use these as pixels in an ANSI text terminal, including Bézier curves, filled ellipses, and RGB image display (check the "examples" folder).

Combine remote images to one image using ruby-vips

I have a template image and need to place other images on it at specific X, Y positions. Is there a ruby-vips equivalent of this RMagick call
ImageList.new("https://365psd.com/images/istock/previews/8479/84796157-football-field-template-with-goal-on-top-view.jpg")
so that I can draw other images on it and generate a single image?
You can read and write URIs in ruby-vips like this:
#!/usr/bin/ruby
require "vips"
require "down"
def new_from_uri(uri)
byte_source = Down.open uri
source = Vips::SourceCustom.new
source.on_read do |length|
puts "reading #{length} bytes from #{uri} ..."
byte_source.read length
end
source.on_seek do |offset, whence|
puts "seeking to #{offset}, #{whence} in #{uri}"
byte_source.seek(offset, whence)
end
return Vips::Image.new_from_source source, "", access: :sequential
end
a = new_from_uri "https://upload.wikimedia.org/wikipedia/commons/a/a6/Big_Ben_Clock_Face.jpg"
b = new_from_uri "https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png"
out = a.composite b, "over", x: 100, y: 100
out.write_to_file "x.jpg"
If you watch the console output you can see it loading the two source images and interleaving the pixels. The composite is written to x.jpg.
The docs on Vips::Source have more details.

Improving an algorithm for substring search when reading ZIP files

So I have a ZIP reader library, and I read ZIP files by first figuring out where the EOCD record is (the standard way "from the tail"). I have to look for a pattern that is roughly this:
4byte_magic_number, fixed_n_bytes, 2_bytes_of_comment_size, comment
The bytesize of comment is provided in the 2_bytes_of_comment_size. Just scanning for the magic number is insufficient, because I eager-read a substantial portion at the tail of the file - basically the maximum size the ZIP EOCD record can be, and then look for this pattern in there.
So far, I came up with this
def locate_eocd_signature(in_str)
# We have to scan from the _very_ tail. We read the very minimum size
# the EOCD record can have (up to and including the comment size), using
# a sliding window. Once our end offset matches the comment size we found our
# EOCD marker.
eocd_signature_int = 0x06054b50
unpack_pattern = 'VvvvvVVv'
minimum_record_size = 22
end_location = minimum_record_size * -1
loop do
# If the window is nil, we have rolled off the start of the string, nothing to do here.
# We use negative values because if we used positive slice indices
# we would have to detect the rollover ourselves
break unless window = in_str[end_location, minimum_record_size]
window_location = in_str.bytesize + end_location
unpacked = window.unpack(unpack_pattern)
# If we found the signature, pick up the comment size, and check if the size of the window
# plus that comment size is where we are in the string. If we are - bingo.
if unpacked[0] == 0x06054b50 && comment_size = unpacked[-1]
assumed_eocd_location = in_str.bytesize - comment_size - minimum_record_size
# if the comment size is where we should be at - we found our EOCD
return assumed_eocd_location if assumed_eocd_location == window_location
end
end_location -= 1 # Shift the window back, by one byte, and try again.
end
end
but it just screams ugly at me. Is there a better way to do something like this? Is there a pack specifier that says "all the bytes in binary until the end of the string" that I do not know of? Then I could tack that onto the end of the pack specifier, for example... A bit at a loss here.
In the end I opted for the following optimization. First, I made a method for finding all the indices of a given substring in a string - there is no stdlib builtin for this.
def all_indices_of_substr_in_str(of_substring, in_string)
last_i = 0
found_at_indices = []
while last_i = in_string.index(of_substring, last_i)
found_at_indices << last_i
last_i += of_substring.bytesize
end
found_at_indices
end
Then, we use it to "latch" onto the offsets in our buffer where our signature was found.
def locate_eocd_signature(in_str)
eocd_signature = 0x06054b50
eocd_signature_str = [eocd_signature].pack('V')
unpack_pattern = 'VvvvvVVv'
minimum_record_size = 22
str_size = in_str.bytesize
indices = all_indices_of_substr_in_str(eocd_signature_str, in_str)
indices.each do |check_at|
maybe_record = in_str[check_at..str_size]
# If the record is smaller than the minimum - we will never recover anything
break if maybe_record.bytesize < minimum_record_size
# Now we check if the record ends with the combination
# of the comment size and an arbitrary byte string of that size.
# If it does - we found our match
*_unused, comment_size = maybe_record.unpack(unpack_pattern)
if (maybe_record.bytesize - minimum_record_size) == comment_size
return check_at # Found the EOCD marker location
end
end
# If we haven't caught anything, return nil deliberately instead of returning the last statement
nil
end
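For comparison, here is a sketch of a variant that searches backwards from the tail with `String#rindex`, so the candidate closest to the end of the buffer is checked first. This is a swapped-in technique, not the code above: the method name `locate_eocd_from_tail` and the two constants are hypothetical, and it assumes the buffer ends exactly where the ZIP file ends.

```ruby
# Hypothetical names; only the signature value and the 22-byte minimum
# record size come from the code above.
EOCD_SIGNATURE = [0x06054b50].pack('V')
MIN_EOCD_SIZE = 22

# Returns the byte offset of the EOCD record inside buf, or nil.
def locate_eocd_from_tail(buf)
  buf = buf.b # binary copy, so character indices are byte offsets
  search_from = buf.bytesize - MIN_EOCD_SIZE
  while search_from >= 0 && (at = buf.rindex(EOCD_SIGNATURE, search_from))
    # The comment size is the last 2 bytes of the fixed 22-byte part.
    comment_size = buf.byteslice(at + 20, 2).unpack1('v')
    # A real record plus its comment must end exactly at the end of the buffer.
    return at if at + MIN_EOCD_SIZE + comment_size == buf.bytesize

    search_from = at - 1
  end
  nil
end

# Synthetic buffer: 7 bytes of filler, then an EOCD record with a 5-byte comment.
record = EOCD_SIGNATURE + ("\x00" * 16) + [5].pack('v') + 'hello'
p locate_eocd_from_tail('filler.'.b + record)
```

A comment that itself contains the signature bytes can still produce an ambiguous match in either approach, so the length check is a heuristic rather than a guarantee.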

Getting the dimensions of the image in ruby

To get the image dimensions in ruby, I tried to use identify to get image dimensions. I wanted to retrieve the output of this system call and get the output as a string
str = system('identify -format "%[fx:w]x%[fx:h]" image.png')
output = `ls`
print output
But, I'm getting the last lines of output and not the output to this particular system call.
Also, if there is a simpler way to get the image dimensions without external gems or libraries, please suggest as it would be great !
Since you already use an external library (ImageMagick), you could use its Ruby wrapper RMagick:
require 'rmagick'
img = Magick::Image.read('image.png').first
arr = [img.columns, img.rows]
Here's an example of a very simple PNG parser:
data = File.binread('image.png', 100) # read first 100 bytes
if data[0, 8] == [137, 80, 78, 71, 13, 10, 26, 10].pack("C*")
  # file has a PNG file signature, let's get the image header chunk
  # "N" is a big-endian unsigned 32-bit integer, "a4" the 4-byte chunk type
  length, chunk_type = data[8, 8].unpack("Na4")
  raise "unknown format, expecting image header" unless chunk_type == "IHDR"
  chunk_data = data[16, length].unpack("NNCCCCC")
  width = chunk_data[0]
  height = chunk_data[1]
  bit_depth = chunk_data[2]
  color_type = chunk_data[3]
  compression_method = chunk_data[4]
  filter_method = chunk_data[5]
  interlace_method = chunk_data[6]
  puts "image size: #{width}x#{height}"
else
  # handle other formats
end
Okay, I finally found a solution after some experiments.
str = `identify -format "%[fx:w]x%[fx:h]" image.png`
arr = str.split('x')
The array arr now contains the dimensions as strings, [width, height].
This worked for me! Please suggest other approaches that might be easier or simpler.
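The original attempt failed because `Kernel#system` returns true, false or nil rather than the command's output, whereas backticks return whatever the command wrote to standard output. A small sketch for turning that captured text into integers follows; the helper name `parse_dimensions` is hypothetical, and the sample string stands in for real `identify` output.

```ruby
# Turn the "WIDTHxHEIGHT" text printed by identify into two integers.
# Raises ArgumentError if the text does not have that shape.
def parse_dimensions(identify_output)
  match = identify_output.strip.match(/\A(\d+)x(\d+)\z/)
  raise ArgumentError, "unexpected output: #{identify_output.inspect}" unless match

  [match[1].to_i, match[2].to_i]
end

# Stand-in for the string returned by the backtick call.
width, height = parse_dimensions("640x480\n")
puts "image size: #{width}x#{height}"
```

Validating the shape first means an error message from `identify` (for example, a missing file) is not silently split into a bogus array.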

Encoding issue with Sqlite3 in Ruby

I have a list of SQL queries beautifully encoded in UTF-8. I read them from files, perform the inserts and then do a select.
# encoding: utf-8
def exec_sql_lines(file_name)
puts "----> #{file_name} <----"
File.open(file_name, 'r') do |f|
# sometimes a query doesn't fit one line
previous_line=""
i = 0
while line = f.gets do
puts i+=1
if(line[-2] != ')')
previous_line += line[0..-2]
next
end
puts (previous_line + line) # <---- (1)
$db.execute((previous_line + line))
previous_line =""
end
a = $db.execute("select * from Table where _id=6")
puts a # <---- (2)
end
end
$db=SQLite3::Database.new($DBNAME)
exec_sql_lines("creates.txt")
exec_sql_lines("inserts.txt")
$db.close
The text in (1) is different from the one in (2). Polish letters get broken. If I use IRB and call $db.open ; $db.encoding it says UTF-8.
Why do Polish letters come out broken? How to fix it?
I need this database properly encoded in UTF-8 for my Android app, so I'm not interested in manipulating the database output. I need to fix its content.
EDIT
Significant lines from the output:
6
INSERT INTO 'Leki' VALUES (NULL, '6', 'Acenocoumarolum', 'Acenocumarol WZF', 'tabl. ', '4 mg', '60 tabl.', '5909990055715', '2012-01-01', '2 lata', '21.0, Leki przeciwzakrzepowe z grupy antagonistów witaminy K', '8.32', '12.07', '12.07', 'We wszystkich zarejestrowanych wskazaniach na dzień wydania decyzji', '', 'ryczałt', '5.12')
out:
6
6
Acenocoumarolum
Acenocumarol WZF
tabl.
4 mg
60 tabl.
5909990055715
2012-01-01
2 lata
21.0, Leki przeciwzakrzepowe z grupy antagonistĂł[<--HERE]w witaminy K
8.32
12.07
12.07
We wszystkich zarejestrowanych wskazaniach na dzieĹ[<--HERE] wydania decyzji
ryczaĹ[<--HERE]t
5.12
There are three default encodings.
In your code you set the source encoding.
Perhaps there is a problem with the external and internal encodings?
A quick test in Windows:
#encoding: utf-8
File.open(__FILE__,'r'){|f|
p f.external_encoding
p f.internal_encoding
p f.read.encoding
}
Result:
#<Encoding:CP850>
nil
#<Encoding:CP850>
Even though the source encoding is UTF-8, the data are read as CP850.
In your case:
Does File.open(filename,'r:utf-8') help?
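To check the suggestion without touching the real data, something like the following sketch could be used. It writes one UTF-8 line to a temporary file and reads it back with an explicit external encoding; the helper names and the sample SQL text are made up for illustration.

```ruby
require 'tempfile'

# Read every line of a file, telling Ruby that the bytes on disk are UTF-8
# regardless of the platform's default external encoding.
def read_utf8_lines(path)
  File.open(path, 'r:utf-8') { |io| io.readlines }
end

# Hypothetical sample line containing a Polish letter.
SAMPLE_SQL = "INSERT INTO 'Leki' VALUES (NULL, 'ryczałt')\n"

# Write the sample as raw bytes, then read it back through read_utf8_lines.
def utf8_round_trip_demo
  Tempfile.create('inserts') do |tmp|
    tmp.binmode
    tmp.write(SAMPLE_SQL)
    tmp.flush
    read_utf8_lines(tmp.path)
  end
end

line = utf8_round_trip_demo.first
puts line.encoding
puts line.valid_encoding?
```

If the strings come back tagged as UTF-8 and valid, the file-reading side is fine and any remaining breakage would have to come from a later step, such as the console used to print the result.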
