ruby open pathname with specified encoding - ruby

I am trying to open files telling Ruby 1.9.3 to treat them as UTF-8 encoding.
require 'pathname'
Pathname.glob("/Users/Wes/Desktop/uf2/*.ics").each { |f|
  puts f.read(["encoding:UTF-8"])
}
The class documentation goes through several levels of indirection, so I am not sure I am specifying the encoding properly. When I try it, however, I get this error message
ICS_scanner_strucdoc.rb:4:in `read': can't convert Array into Integer (TypeError)
from ICS_scanner_strucdoc.rb:4:in `read'
from ICS_scanner_strucdoc.rb:4:in `block in <main>'
from ICS_scanner_strucdoc.rb:3:in `each'
from ICS_scanner_strucdoc.rb:3:in `<main>'
This error message leads me to believe that read is trying to interpret the open_args as the optional leading argument, which would be the length of the read.
If I put the optional parameters in, as in puts f.read(100000, 0, ["encoding:UTF-8"]) I get an error message that says there are too many arguments.
What is the appropriate way to specify only the encoding? Would it be correct to say that this is an inconsistency between the documentation and the behavior of the class?
Mac OS 10.8
rvm current reports "ruby-1.9.3-p484"

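I'm not sure whether you want to specify the encoding for the path name or for the file itself.
If it is the latter, this may be what you want. Note that the block is needed so that puts prints the file's contents rather than the File object:
Pathname.glob("/Users/Wes/Desktop/uf2/*.ics").each { |f|
  puts File.open(f, "r:UTF-8") { |io| io.read }
}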
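With Pathname#read you can pass the encoding as an options hash (not as an array of strings). Pathname.glob already yields Pathname objects, so no conversion is needed:
Pathname.glob("/Users/Wes/Desktop/uf2/*.ics").each do |f|
  puts f.read(:encoding => "UTF-8")
end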
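To check that the encoding option really takes effect, here is a small self-contained sketch. It assumes nothing about the asker's files: the temporary directory and sample.ics are made up for illustration, and the only calls used are Pathname.glob and Pathname#read, which forwards its arguments to IO.read.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical sample data: a temporary directory stands in for the asker's folder.
contents = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "sample.ics"), "SUMMARY:Caf\u00E9\n")

  Pathname.glob(File.join(dir, "*.ics")).map do |path|
    # The options hash is forwarded to IO.read, so no length or offset is needed.
    path.read(encoding: "UTF-8")
  end
end

# Each string is tagged with the requested encoding.
contents.each do |text|
  puts "#{text.encoding} valid=#{text.valid_encoding?}"
end
```

Passing the option as a hash is what avoids the TypeError in the question: an Array in the first position is taken for the length argument.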

Related

ASCII incompatible encoding with normal run, not in debug mode

I'm really confused on this one, and maybe it's a bug in Ruby 2.6.2. I have files that were written as UTF-8 with BOM, so I'm using the following:
filelist = Dir.entries(@input_dirname).join(' ')
filelist = filelist.split(' ').grep(/xml/)
filelist.each do |indfile|
  filecontents_tmp = File.read("#{@input_dirname}/#{indfile}", :encoding =>'bom|utf-8')
  puts filecontents_tmp
end
If I put a debug breakpoint at the puts line, my file is read in properly. If I just run the simple script, I get the following error:
in `read': ASCII incompatible encoding needs binmode (ArgumentError)
I'm confused as to why this would work in debug, but not when run normally. Ideas?
Have you tried printing the default encoding when you run the file as opposed to when you debug the file? There are 3 ways to set / change the encoding in Ruby (that I'm aware of), so I wonder if it's different between running the file and debugging. You should be able to tell by printing the default encoding: puts Encoding.default_external.
As for actually fixing the issue, I ran into a similar problem and found an answer that suggested adding binary mode as an option to the File.open call. That worked for me.
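A minimal sketch of that suggestion, assuming the file really is UTF-8 with a BOM. The temporary file and its contents are hypothetical stand-ins for the asker's XML files; the "rb:BOM|UTF-8" mode string combines binary mode with BOM detection.

```ruby
require 'tempfile'

# Hypothetical sample file: the three UTF-8 BOM bytes followed by a tiny XML document.
sample = Tempfile.new(["bom_demo", ".xml"])
sample.binmode
sample.write("\xEF\xBB\xBF<root/>")
sample.close

# "rb" opens in binary mode; "BOM|UTF-8" strips a leading BOM and tags the result as UTF-8.
filecontents = File.open(sample.path, "rb:BOM|UTF-8") { |f| f.read }
puts filecontents

sample.unlink
```

If the error only appears outside the debugger, comparing Encoding.default_external in both runs, as suggested above, is still worth doing.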

How to declare a ruby constant (RMagick) in a YAML file

I am looking for a possibility to declare a Magick::StyleType constant in a .yml file and then to load this constant into a ruby file.
Or if that's not possible then I need to know how to convert a String into a Magick::StyleType constant in ruby.
Here are the details:
I am trying to write a ruby program, which places some text on a picture and I use the RMagick interface for it.
In my ruby program I have a method which specifies different properties of the text like font-family or font-style. This method includes the line:
self.font_style = ItalicStyle
Now I want to store all changeable parameters in a YAML-config file (config.yml), so this config.yml includes these lines:
#font style (like bold, italic and so on)
:font_style: ItalicStyle
Now I load the config.yml in my ruby file and the above mentioned line in my method reads now
self.font_style = config_file[:font_style]
When I run my ruby file now I get the error message:
`font_style=': wrong enumeration type - expected Magick::StyleType, got String (TypeError)
So after having searched a little about the topic I changed my config.yml first to
:font_style: !/ruby/constant ItalicStyle
which got me the same error message as above and then I tried this:
:font_style: !/ruby/symbol :ItalicStyle
and got this error message:
`font_style=': wrong enumeration type - expected Magick::StyleType, got Symbol (TypeError)
Then I checked in irb:
require 'rmagick' => true
Magick.const_get(ItalicStyle) => ItalicStyle=2
Magick.const_get(ItalicStyle).class => Magick::StyleType
So, finally I get to my question: How do I need to change the line
:font_style: !/ruby/symbol :ItalicStyle
in my config.yml file so that when loaded into my ruby file ItalicStyle will be recognized as a Magick::StyleType constant?
Or when I leave
:font_style: ItalicStyle
in the config.yml and load ItalicStyle as a String into my ruby file: is there a way to convert ItalicStyle from a String to the Magick::StyleType constant directly in the ruby file?
I would be really happy if someone could help. I have tried for days to find a solution and I really need it for my project.
I would just store a String in the YAML file, because that is easier to write and to read:
:font_style: ItalicStyle
Then I would look up the constant by its name in the Magick module to configure Magick (Object.const_get only finds it if Magick has been included at the top level):
self.font_style = Magick.const_get(config_file[:font_style])
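The lookup can be tried without RMagick installed. In this sketch DemoMagick is a hypothetical stand-in for the Magick module, and its constant values are placeholders rather than real Magick::StyleType objects; only the const_get mechanism is the point.

```ruby
require 'yaml'

# Hypothetical stand-in for the Magick module, so the sketch runs without RMagick.
module DemoMagick
  NormalStyle = :normal_style_placeholder
  ItalicStyle = :italic_style_placeholder
end

# Same shape as the config.yml line in the question.
config_file = YAML.load(":font_style: ItalicStyle")
name = config_file[:font_style]

# Reject names that are not constants of the module before looking them up.
unless DemoMagick.const_defined?(name)
  raise ArgumentError, "unknown font style: #{name}"
end

font_style = DemoMagick.const_get(name)
puts font_style.inspect
```

Checking const_defined? first turns a typo in the config file into a clear error message instead of a NameError deep inside the drawing code.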

Encoding problems with ruby while reading in command line arguments with optparse

I'm writing a small program in Ruby which essentially changes some files within a zip file. The zip file is specified as a parameter on the command line and interpreted via OptionParser.
The problem is that when I specify a file whose name contains non-ASCII characters, the file cannot be opened, and the error says that it could not be found. This problem occurs using cmd.exe under Windows.
Here is a minimal example:
# example.rb
require "zip"
require "optparse"
zip_file_name = String.new
# read and interprete command line arguments:
OptionParser.new do |opts|
  opts.on("-f", "--file FILE", String, "The zip-file, which will be modified") do |f|
    zip_file_name = f
  end
end.parse!
# Open the zip file:
Zip::File.open(zip_file_name) do |zipfile|
end
If you create a zip-file test.zip and run example.rb -f test.zip everything is okay (it does finish without errors). Doing the same with a zip-file täst.zip gives me an error. I tried doing zip_file_name.encode!(Encoding::UTF_8), but this didn't solve the problem.
It seems to be an encoding problem (the encoding of zip_file_name is cp850) but the transcoding does not seem to work correctly.
So my question would be: How can I change my program to also allow non-ascii characters for specifying files on the command line?
Adding zip_file_name.force_encoding(Encoding::Windows_1252) before opening the file solves the issue (on a Western European Windows).
Apparently, Ruby wrongly assumes that the file name is encoded in CP850. On my Windows system, file names seem to be encoded in Windows_1252 (a custom version of Latin1 or ISO 8859-1).
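The difference between encode (tried in the question) and force_encoding (used here) can be seen in isolation. This is only a sketch: the byte string below is a hypothetical reconstruction of what such a console might hand over, not output captured from the asker's system.

```ruby
# Hypothetical input: Windows-1252 bytes for "täst.zip", labelled CP850
# the way the question describes.
zip_file_name = "t\xE4st.zip".b.force_encoding(Encoding::CP850)

# encode trusts the CP850 label and transcodes, so byte 0xE4 becomes a different character.
transcoded = zip_file_name.encode(Encoding::UTF_8)

# force_encoding keeps the bytes and only swaps the label; 0xE4 is "ä" in Windows-1252.
relabelled = zip_file_name.dup.force_encoding(Encoding::Windows_1252)

puts transcoded
puts relabelled.encode(Encoding::UTF_8)
```

So when the label is wrong but the bytes are right, relabelling is the fix; transcoding only helps when the label was correct in the first place.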

How to avoid undefined method error for Nilclass

I use the dbf gem to read data out of a dbf file. I wrote some code:
# encoding: UTF-8
require 'dbf'
widgets = DBF::Table.new("patient.dbf")
widgets.each do |record|
puts record.vorname
end
Basically the code works, but after Ruby writes about 400 record.vorname values to the console I get this error:
...
Gisela
G?nter
mycode.rb:5:in `block in <main>': undefined method `vorname' for nil:NilClass (NoMethodError)
from C:/RailsInstaller/Ruby1.9.3/lib/ruby/gems/1.9.1/gems/dbf-2.0.6/lib/dbf/table.rb:101:in `block in each'
......
My question is: how can I avoid this error? It would also be interesting to know why (as you can see in the output above) record.vorname values containing ä, ö, ü are displayed with ? instead, e.g.:
Günter is transformed to G?nter
Thanks
For some reason, your DBF driver returns nil records. You can pretend that this problem doesn't exist by skipping those.
widgets.each do |record|
puts record.vorname if record
end
As for your question about the wrong characters, according to the dbf documentation:
Encodings (Code Pages)
dBase supports encoding non-english characters in different formats.
Unfortunately, the format used is not always set, so you may have to
specify it manually. For example, you have a DBF file from Russia and
you are getting bad data. Try using the 'Russion OEM' encoding:
table = DBF::Table.new('dbf/books.dbf', nil, 'cp866')
See doc/supported_encodings.csv for a full list of supported
encodings.
So make sure you use the right encoding to read from the DB.
To avoid the NoMethodError for nil:NilClass you can try this (nil? is plain Ruby, whereas blank? is only available with ActiveSupport):
require 'dbf'
widgets = DBF::Table.new("patient.dbf")
widgets.each do |record|
  puts record.vorname unless record.nil?
end

Invalid characters before my XML in Ruby

When I look in an XML file, it looks fine, and starts with <?xml version="1.0" encoding="utf-16le" standalone="yes"?>
But when I read it in Ruby and print it to stdout, there are two ?s in front of that: ??<?xml version="1.0" encoding="utf-16le" standalone="yes"?>
Where do these come from, and how do I remove them? Parsing it like this with REXML fails immediately. Removing the first two characters and then parsing it gives me this error:
REXML::ParseException: #<REXML::ParseException: malformed XML: missing tag start
Line:
Position:
Last 80 unconsumed characters:
<?xml version="1.0" encoding="utf-16le" s>
What is the right way to handle this?
Edit: Below is my code. The ftp.get downloads the xml from an ftp server. (I wonder if that might be relevant.)
xml = ftp.get
puts xml
until xml[0,1] == "<" # to remove the 2 invalid characters
puts xml[0,2]
xml.slice! 0
end
puts xml
document = REXML::Document.new(xml)
The last puts prints the correct xml. But because of the two invalid characters, I've got the feeling something else went wrong. It shouldn't be necessary to remove anything. I'm at a loss what the problem might be, though.
Edit 2: I'm using Net::FTP to download the XML, but with this new method that lets me read the contents into a string instead of a file:
class Net::FTP
  def gettextcontent(remotefile, &block) # :yield: line
    f = StringIO.new()
    begin
      retrlines("RETR " + remotefile) do |line|
        f.puts(line)
        yield(line) if block
      end
    ensure
      f.close
      return f
    end
  end
end
Edit 3: It seems to be caused by StringIO (in Ruby 1.8.7) not supporting unicode. I'm not sure if there's a workaround for that.
Those 2 characters are most likely a Unicode BOM (byte order mark): bytes that tell whoever is reading the file what the byte order is.
As long as you know what the encoding of the file is, it should be safe to strip them, since they aren't actual content.
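Stripping the BOM could look roughly like this. It is a sketch that assumes Ruby 1.9 or later (it relies on String encodings, which 1.8.7 lacks) and a UTF-16LE document as declared in the question; the raw string is built by hand here rather than downloaded.

```ruby
# Hypothetical download: a UTF-16LE document preceded by its BOM (bytes FF FE).
declaration = '<?xml version="1.0" encoding="utf-16le" standalone="yes"?>'
raw = "\xFF\xFE".b + declaration.encode(Encoding::UTF_16LE).b

# Drop the two BOM bytes only if they are actually present.
bom = "\xFF\xFE".b
body = raw.start_with?(bom) ? raw.byteslice(bom.bytesize..-1) : raw

# Label the remaining bytes as UTF-16LE, then transcode for printing or parsing.
xml = body.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
puts xml
```

Working on bytes rather than characters matters here: slicing off "the first two characters" of a string that is still labelled with the wrong encoding can cut in the wrong place.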
To answer my own question, the real problem here is that encoding support in Ruby 1.8.7 is lacking. StringIO in particular seems to make a mess of it. REXML also has trouble handling Unicode in Ruby 1.8.7.
The most attractive solution would of course be to upgrade to 1.9.3, but that's not practical for this project right now.
So what I ended up doing was to avoid StringIO and simply download to a file on disk, and then to process the XML with nokogiri instead of REXML.
Together, that solves all my problems.

Resources