Ruby: building a string with a length constraint from several variable-length strings

I thought I'd throw out this problem to see what elegant solutions folk
could come up with and, in the process, hopefully learn some new ruby
tricks.
I'll set the problem in the context of producing a twitter message,
which has a maximum length of 140 characters. I'm looking for a concise
function that will deliver a tweet no longer than 140 characters from
three inputs: text_a (mandatory), text_b (optional), and an optional boolean
that triggers a function returning a string.
(I've used the twitter-text gem to take byte, char, and encoding issues
out of play, as that is not the focus of the problem.)
The main constraint is that to achieve the required maximum length, it
is text_a that must be truncated.
Here's some long-winded sample code (working, I think) that hopefully
makes the requirement clear.
# encoding: utf-8
require 'twitter-text'
def tweet(text_a, text_b=nil, suffix=false)
  text = "fixed preamble #{text_a}"
  text << " #{text_b}" if text_b
  text << get_suffix if suffix
  return text unless Twitter::Validation.tweet_invalid?(text) == :too_long
  excess_length = Twitter::Validation.tweet_length(text) - Twitter::Validation::MAX_LENGTH
  text_a = text_a[0..-(excess_length + 1)]
  text = "fixed preamble #{text_a}"
  text << " #{text_b}" if text_b
  text << get_suffix if suffix
  text
end
def get_suffix
  " some generated suffix"
end
It's ugly, especially with the duplication. Ideas?

Why not build the string properly in the first place?
def tweet(text_a, text_b=nil, suffix=false)
  text = ""
  text << " #{text_b}" if text_b
  text << get_suffix if suffix
  space = Twitter::Validation::MAX_LENGTH - Twitter::Validation.tweet_length(text)
  raise "too long" unless space > 0
  "fixed preamble #{text_a}"[0, space] + text
end
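For comparison, here is a self-contained sketch of the same "measure the fixed tail first, then cut text_a to fit" idea that runs without the gem. It assumes a plain character count (String#length) is an acceptable stand-in for Twitter::Validation.tweet_length, and it takes the suffix as a string rather than a boolean. build_tweet, PREAMBLE and MAX_LENGTH are illustrative names, not part of twitter-text.

```ruby
# Assumption: String#length stands in for Twitter::Validation.tweet_length,
# which also accounts for URLs and Unicode normalization.
MAX_LENGTH = 140
PREAMBLE = "fixed preamble " # hypothetical fixed prefix from the question

# Builds the optional tail first, then truncates only the preamble + text_a
# part so that the whole message fits in `max` characters.
def build_tweet(text_a, text_b = nil, suffix = nil, max: MAX_LENGTH)
  tail = [text_b, suffix].compact.map { |part| " #{part}" }.join
  room = max - tail.length # characters left for preamble + text_a
  if room < PREAMBLE.length
    raise ArgumentError, "text_b and suffix leave no room for the preamble"
  end
  (PREAMBLE + text_a)[0, room] + tail
end

puts build_tweet("hi", "there")                          # fits, nothing is cut
puts build_tweet("a" * 200, "b", "some generated suffix").length
```

Because the tail is measured before text_a is touched, the message is assembled only once. The guard clause mirrors the `raise "too long"` in the answer, but it also refuses inputs where even the bare preamble would not fit.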

Related

Parsing PDF removing month

I'm parsing a pdf that has some dates by splitting the lines and then searching them. The following are example lines:
Posted Date: 02/11/2015
Effective Date: 02/05/2015
When I find Posted Date, I split on the : and pull out 02/11/2015. But when I do the same for effective date, it only returns /05/2015. When I write all lines, it displays that date as /05/2015 while the PDF has the 02. Would 02 be converted to nil for some reason? Am I missing something?
lines = reader.pages[0].text.split(/\r?\n/)
lines.each_with_index do |line, index|
  values_to_insert = []
  if line.include? "Legal Name:"
    name_line = line.split(":")
    values_to_insert.push(name_line[1])
  end
  if line.include? "Active/Pending Insurance"
    topLine = lines[index+2].split(" ")
    middleLine = lines[index+5].split(" ")
    insuranceLine = lines[index + 7]
    insurance_line_split = insuranceLine.split(" ")
    insurance_line_split.each_with_index do |word, i|
      if word.include? "Insurance"
        values_to_insert.push(insuranceLine.split(":")[1])
      end
    end
    topLine.each_with_index do |word, i|
      if word.include? "Posted"
        values_to_insert.push(topLine[i + 2])
      end
    end
    middleLine.each_with_index do |word, i|
      if word.include? "Effective" or word.include? "Cancellation"
        #puts middleLine[0]
        puts middleLine[1]
        #puts middleLine[i + 1].split(":")[1]
      end
    end
  end
end
Here is what happens when I print all lines:
Active/Pending Insurance:
Form: 91X Type: BIPD/Primary Posted Date: 02/11
/2015
Policy/Surety Number:A 3491819 Coverage From: $0
To: $1,000,000
Effective Date:/05/2015 Cancellation Date:
Insurance Carrier: PROGRESSIVE EXPRESS INSURANCE COMPANY
Attn: CUSTOMER SERVICE
Address: P. O. BOX 94739
CLEVELAND, OH 44101 US
Telephone: (800) 444 - 4487 Fax: (440) 603 - 4555
Edited to show the code and even add a picture. I'm splitting by lines and then splitting again on colons and sometimes spaces. It's not amazingly clean but I don't think there's a much better way.
The problem occurs where several pieces of text share a line but do not use exactly the same baseline. In the case of the PDF at hand,
(at least) the policy number and the effective date are positioned slightly higher than their respective labels.
The cause is the way the pdf-reader library used by the OP assembles the text pieces drawn on the page:
It determines a number of columns and rows to arrange the letters in, and
creates an array of strings, one per row, each filled with one space per column.
It then combines consecutive text pieces from the PDF that lie on exactly the same baseline, and
finally writes these combined pieces into the string array, starting at the column that best matches their starting position in the PDF.
As fonts used in PDFs usually are not monospaced, this procedure can result in overlapping strings, i.e. one string partially erasing the other. The step combining strings on the same baseline prevents erasure in that case, but for strings on slightly different baselines this overlapping effect can still occur.
What one can do is increase the number of columns used here.
The library in page_layout.rb defines
def col_count
  @col_count ||= ((@page_width / @mean_glyph_width) * 1.05).floor
end
As you see there already is some magic number 1.05 in use to slightly increase the number of columns. By increasing this number even more, no erasures as observed by the OP should occur anymore. One should not increase the factor too much, though, because that can introduce unwanted space characters where none belong.
The OP reported that increasing the magic number to 1.10 sufficed in his case.
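To see why a larger factor helps, here is a toy model of the grid placement described above. It is not pdf-reader's actual PageLayout code: the Run struct, layout_row, the 600-point page and the 6-point mean glyph width are all assumptions chosen for illustration. Runs are written in the order given, so a later run overwrites an earlier one, and the label is drawn with narrower-than-average glyphs (75 points for 15 characters), which is what makes the overlap possible.

```ruby
# Toy model (assumption): a single text row, filled the way the answer describes.
Run = Struct.new(:x, :text) # x = starting position in PDF points

def layout_row(runs, page_width:, mean_glyph_width:, factor:)
  col_count = ((page_width / mean_glyph_width) * factor).floor
  row = " " * col_count # one row of the character grid, filled with spaces
  runs.each do |run|
    col = ((run.x / page_width) * col_count).round # best-matching column
    row[col, run.text.length] = run.text           # may overwrite earlier text
  end
  row.rstrip
end

# The value sits on a slightly higher baseline, so it was not merged with its
# label and is placed as a separate run; the label is placed afterwards.
runs = [Run.new(75.0, "02/05/2015"), Run.new(0.0, "Effective Date:")]

[1.05, 1.25].each do |factor|
  row = layout_row(runs, page_width: 600.0, mean_glyph_width: 6.0, factor: factor)
  puts "#{factor}: #{row}"
end
```

In this toy, 1.05 reproduces the OP's `Effective Date:/05/2015`, while 1.25 yields `Effective Date: 02/05/2015`. With the OP's real font metrics 1.10 was enough; the threshold depends on how far the glyphs deviate from the mean width. Pushing the factor much higher (2.0 here) inserts a run of extra spaces between label and value, which is the downside the answer warns about.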

Ruby + Prawn PDF: How to add space prefix in string inside of a table?

I have a two dimensional array:
line_items = []
line_item.product.book_versions.each do |book_version|
  data = []
  data << ""
  data << " #{book_version.book.title} - #{book_version.isbn}" #<-- notice the extra spaces at the beginning of the string
  data << "#{line_item.quantity}"
  line_items << data
end
And I load this data into my table with pdf.table line_items ... do ... end
However, the extra spaces in my 2nd column don't show up. How would I escape these spaces so that they aren't stripped?
Depending on what you want to do, you can also use the constant Prawn::Text::NBSP. If it is purely blank space, then the column padding is what you want. However, I had a situation where I had to simulate a "checkmark space" such that an X character was underlined. My table looked like:
table([["<u>X</u>", "Do you agree to these terms and conditions?"]]) do
  columns(0).style(:inline_format => true)
end
However, this produced a simple X with an underline. I wanted the underlined section to be wider, i.e., space (blank) characters that still received an underline. So I changed the table data to be
table([["<u>#{Prawn::Text::NBSP*3} X #{Prawn::Text::NBSP*3}</u>", ...]]) do
Then in the PDF, it looked like I wanted: ___X___ (with the X obviously underlined too).
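A self-contained sketch of applying the NBSP idea to the question's data. It defines NBSP locally as U+00A0, assumed to be the character Prawn::Text::NBSP holds, so that it runs without the prawn gem, and it replaces the question's ActiveRecord objects with hypothetical Book/BookVersion structs. indented_rows is an illustrative helper, not a Prawn API.

```ruby
# U+00A0 NO-BREAK SPACE, assumed to be the same character as Prawn::Text::NBSP.
# Defined here so the sketch runs without the prawn gem.
NBSP = "\u00A0"

# Hypothetical stand-ins for the question's book_version / book records.
Book = Struct.new(:title)
BookVersion = Struct.new(:book, :isbn)

# Builds the table rows, indenting the title column with non-breaking spaces
# instead of ordinary spaces.
def indented_rows(book_versions, quantity, indent: 2)
  book_versions.map do |version|
    ["", "#{NBSP * indent}#{version.book.title} - #{version.isbn}", quantity.to_s]
  end
end

rows = indented_rows([BookVersion.new(Book.new("Ruby"), "978-1")], 3)
p rows
# String#strip only removes ASCII whitespace and NUL, so the NBSP indent survives it.
p rows.first[1] == rows.first[1].strip
```

With prawn installed, rows would be passed to pdf.table exactly like line_items in the question.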
You can do a hack with a character that Prawn does not understand.
I was able to do it by adding the ⁔ character before the text.
Prawn did not understand it and ended up giving me a space :)
Your best bet will probably be to use a custom padding for that column. Something like this:
pdf.table line_items do
  column(1).padding = [5, 5, 5, 30] # Default padding is 5; just increase the left padding
  ...
end
I don't think it is an escaping problem. Maybe you should use a more formal way of spacing: try the \t character instead of spaces, which is intended for that use.
line_items = []
line_item.product.book_versions.each do |book_version|
  data = []
  data << ""
  data << "\t\t#{book_version.book.title} - #{book_version.isbn}"
  data << "#{line_item.quantity}"
  line_items << data
end

Can't convert String into Integer (TypeError)

I am learning Ruby.
I am trying to create a simple script that will convert a given number to Roman numerals (old-style Roman numerals).
I am unable to understand why I get "can't convert String into Integer (TypeError)".
def convert_to_roman number
  romans_array = [[1000,'M'],[500,'D'],[100,'C'],[50,'L'],[10,'X'],[5,'V'][1,'I']]
  converted_array = []
  romans_array.each do |rom_num|
    num = rom_num[0]
    letter = rom_num[1]
    if number > num
      times = number / num
      roman_letter = letter*times
      converted_array.push(roman_letter)
      number = number % num
    end
  end
  converted_array.join()
end
number = ''
puts 'please write a number and I will convert it to old style Roman numerals :)'
puts 'p.s. to exit this program simply hit enter on an empty line, or type 0 and enter :)'
while number != 0
  number = gets.chomp.to_i
  puts convert_to_roman number
end
My code is at:
https://github.com/stefanonyn/ruby-excercises/blob/master/roman_numerals.rb
You will see that at the end of the file commented out there is an old revision of the code, which actually does work but has a lot of repetition.
I would appreciate if someone could clarify why I get the error described above.
Please don't write the code for me, I am trying to learn Ruby, I would appreciate just some support in moving to the next step.
Thank you very much!
You are missing a comma in your array
romans_array = [[1000,'M'],[500,'D'],[100,'C'],[50,'L'],[10,'X'],[5,'V'][1,'I']]
                                                                        ^ here (missing comma between [5,'V'] and [1,'I'])
This error is definitely not all that helpful, but the reason it appears is that, to the interpreter, the last element looks like a call to [] on the [5,'V'] array with two arguments: a start index and a length (array[start, length]). The start is 1 and the length is 'I', which makes no sense; Ruby cannot convert the String 'I' into an Integer. If it had been written [5,'V'][1,1], the last element of the array would be ['V'], which might have been even more confusing to debug!
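A few lines you can paste into irb to see the parse described above. This only illustrates where the error comes from; it does not write the converter for you.

```ruby
pair = [5, 'V']

# array[start, length]: start at index 1 and take 1 element
p pair[1, 1] # => ["V"]

# A String where the length should be: Ruby cannot convert 'I' to an Integer.
# (The exact message wording differs between Ruby versions.)
begin
  pair[1, 'I']
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# With the comma restored, the literal is a 7-element array of pairs
romans_array = [[1000,'M'],[500,'D'],[100,'C'],[50,'L'],[10,'X'],[5,'V'],[1,'I']]
p romans_array.length # => 7
```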

Ruby: Fuzzing through all Unicode characters (UTF8/Encoding/String Manipulation)

I can't iterate over the entire range of unicode characters.
I searched everywhere...
I am building a fuzzer and want to embed all Unicode characters into a URL, one at a time.
For example:
http://www.example.com?a=\uff1c
I know that there are some existing tools, but I need more flexibility.
If I could do something like "\u" + "ff1c", it would be great.
This is the closest I got:
char = "\u0000"
...
#within iteration
char.succ!
...
but after the character "\u0039", which is the digit 9, succ! gives me "10" instead of ":"
You could use pack to convert numbers to UTF-8 characters, but I'm not sure if this solves your problem.
You can either create an array with the numeric values of all the characters and use pack to get a UTF-8 string, or you can just loop from 0 to whatever you need and use pack within the loop.
I've written a small example to explain myself. The code below prints out the hex value of each character followed by the character itself.
0.upto(100) do |i|
  puts "%04x" % i + ": " + [i].pack("U*")
end
Here's some simpler code, albeit slightly obfuscated, that takes advantage of the fact that Ruby converts an integer on the right-hand side of the << operator to a codepoint. In Ruby 1.8 this only works for integer values up to 255; in 1.9 and later it also works for larger values.
0.upto(100) do |i|
  puts "" << i
end
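A sketch aimed at the two things the question asks for, assuming Ruby 1.9 or later: turning a hex string such as "ff1c" into its character, and walking every codepoint one at a time. It uses Integer#chr with an explicit UTF-8 encoding and skips the surrogate range U+D800..U+DFFF, for which chr raises RangeError because UTF-8 cannot encode them. char_from_hex and unicode_chars are illustrative names; URI.encode_www_form_component is just one option, for when the fuzzer needs the character percent-encoded rather than raw.

```ruby
require 'uri'

# "ff1c" -> "\uff1c": what "\u" + "ff1c" was meant to do
def char_from_hex(hex)
  hex.to_i(16).chr(Encoding::UTF_8)
end

# Lazily yields every Unicode scalar value as a one-character UTF-8 string,
# skipping the surrogates U+D800..U+DFFF, which UTF-8 cannot encode.
def unicode_chars
  Enumerator.new do |yielder|
    (0..0x10FFFF).each do |cp|
      next if (0xD800..0xDFFF).cover?(cp)
      yielder << cp.chr(Encoding::UTF_8)
    end
  end
end

base = "http://www.example.com?a=" # example URL from the question
unicode_chars.first(3).each do |char|
  puts base + URI.encode_www_form_component(char) # percent-encoded variant
end
puts base + char_from_hex("ff1c")                  # raw variant
```

Incrementing the integer codepoint rather than the string avoids the String#succ behaviour from the question: "9".succ is "10", whereas (0x39 + 1).chr(Encoding::UTF_8) is ":".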

problem with parsing string from excel file

I have Ruby code that parses data in an Excel file using the Parseexcel gem. I need to save 2 columns from that file into a Hash; here is my code:
worksheet.each { |row|
  if row != nil
    key = row.at(1).to_s.strip
    value = row.at(0).to_s.strip
    if !parts.has_key?(key) and key.length > 0
      parts[key] = value
    end
  end
}
However, it still saves duplicate keys into the hash: "020098-10". I checked the Excel file at the specified row and found that the difference is " 020098-10" versus "020098-10": the first one has a leading space while the second doesn't. I don't understand; isn't it true that .strip already removes all leading and trailing whitespace?
Also, when I tried to print out key.length, it gave me these weird numbers:
020098-10 length 18
020098-10 length 17
which should be 9....
If you inspect the strings you receive, you will probably see something like:
" \x000\x002\x000\x000\x009\x008\x00-\x001\x000\x00"
This happens because of the string encoding. Excel stores text as Unicode, while Ruby uses ISO-8859-1 by default. The encodings will differ between platforms.
You need to convert the data you receive from Excel to a printable encoding.
However, you should not convert strings created in Ruby, as you will end up with garbage.
Consider this code:
@enc = Encoding::Converter.new("UTF-16LE", "UTF-8")
def convert(cell)
  if cell.numeric
    cell.value
  else
    @enc.convert(cell.value).strip
  end
end
parts = {}
worksheet.each do |row|
  next unless row
  key = convert row.at(1)
  value = convert row.at(0)
  parts[key] = value unless parts.has_key?(key) or key.empty?
end
You may want to change the encodings to different ones.
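To check this diagnosis without an .xls file at hand, the sketch below fakes the raw cell bytes by encoding " 020098-10" and "020098-10" as UTF-16LE and tagging the result as binary. That is an assumption about what the gem hands back. Decoding to UTF-8 before calling strip makes the two keys collapse into one. decode_cell is a hypothetical helper standing in for the convert method above, and it uses String#encode instead of Encoding::Converter.

```ruby
# Hypothetical helper: interpret raw bytes as UTF-16LE, convert to UTF-8, strip.
def decode_cell(raw)
  raw.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8).strip
end

# Simulated raw cell values (assumption: the gem returns UTF-16LE bytes).
with_space = " 020098-10".encode(Encoding::UTF_16LE).force_encoding(Encoding::BINARY)
no_space   = "020098-10".encode(Encoding::UTF_16LE).force_encoding(Encoding::BINARY)

p with_space.bytesize            # => 20 (two bytes per character)
p decode_cell(with_space).length # => 9

parts = {}
[[with_space, "first"], [no_space, "second"]].each do |raw_key, value|
  key = decode_cell(raw_key)
  parts[key] = value unless parts.key?(key) || key.empty?
end
p parts # a single "020098-10" entry
```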
The newer Spreadsheet-gem handles charset conversion automatically for you, to UTF-8 I think as standard but you can change it, so I'd recommend using it instead.
