Encoding problems with ruby while reading in command line arguments with optparse - ruby

I'm writing a small program in Ruby that modifies some files inside a zip file. The zip file is given as a command-line parameter and parsed with OptionParser.
The problem is that when I specify a file whose name contains non-ASCII characters, the file cannot be opened; the error says it could not be found. This occurs in cmd.exe on Windows.
Here is a minimal example:
# example.rb
require "zip"
require "optparse"

zip_file_name = String.new

# Read and interpret the command line arguments:
OptionParser.new do |opts|
  opts.on("-f", "--file FILE", String, "The zip-file, which will be modified") do |f|
    zip_file_name = f
  end
end.parse!

# Open the zip file:
Zip::File.open(zip_file_name) do |zipfile|
end
If you create a zip file test.zip and run example.rb -f test.zip, everything is fine (it finishes without errors). Doing the same with a zip file named täst.zip gives me an error. I tried zip_file_name.encode!(Encoding::UTF_8), but this didn't solve the problem.
It seems to be an encoding problem (the encoding of zip_file_name is CP850), but the transcoding does not seem to work correctly.
So my question would be: How can I change my program to also allow non-ascii characters for specifying files on the command line?

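Adding zip_file_name.force_encoding(Encoding::Windows_1252) before opening the file solves the issue (on a Western European Windows system).
Apparently Ruby wrongly assumes that the argument is encoded in CP850. On my Windows system, file names seem to be encoded in Windows-1252 (Microsoft's variant of Latin-1, ISO 8859-1). Note that force_encoding only relabels the existing bytes, whereas encode! converts them from the (wrong) CP850 label, which is why the transcoding attempt did not help.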
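The difference between relabelling the bytes (force_encoding) and transcoding them (encode) can be sketched as follows. This is only an illustration, assuming the argument bytes really are Windows-1252; reinterpret_argument is a hypothetical helper, not part of optparse or rubyzip, and the sample bytes are made up to mimic täst.zip.

```ruby
# Hypothetical helper (not part of optparse or rubyzip): reinterpret the raw
# bytes of an argument under a different code page, then convert to UTF-8.
# The Windows-1252 default is an assumption for Western European systems.
def reinterpret_argument(arg, actual_encoding = Encoding::Windows_1252)
  # dup leaves the caller's string untouched; force_encoding only changes
  # the label on the bytes, encode then transcodes them.
  arg.dup.force_encoding(actual_encoding).encode(Encoding::UTF_8)
end

# Simulated input: the byte 0xE4 is "ä" in Windows-1252, but the string is
# labelled CP850, as described in the question.
raw = "t\xE4st.zip".b.force_encoding(Encoding::CP850)

wrong = raw.encode(Encoding::UTF_8)  # transcodes using the wrong label
right = reinterpret_argument(raw)    # relabels first, then transcodes

puts wrong == "t\u00E4st.zip"  # false: 0xE4 is a different letter in CP850
puts right == "t\u00E4st.zip"  # true
```

On a system with a different ANSI code page, the second argument would have to be changed accordingly.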

Related

ASCII incompatible encoding with normal run, not in debug mode

I'm really confused on this one, and maybe it's a bug in Ruby 2.6.2. I have files that were written as UTF-8 with BOM, so I'm using the following:
filelist = Dir.entries(@input_dirname).join(' ')
filelist = filelist.split(' ').grep(/xml/)
filelist.each do |indfile|
  filecontents_tmp = File.read("#{@input_dirname}/#{indfile}", :encoding => 'bom|utf-8')
  puts filecontents_tmp
end
If I put a debug breakpoint at the puts line, my file is read in properly. If I just run the simple script, I get the following error:
in `read': ASCII incompatible encoding needs binmode (ArgumentError)
I'm confused as to why this would work in debug, but not when run normally. Ideas?
Have you tried printing the default encoding when you run the file as opposed to when you debug the file? There are 3 ways to set / change the encoding in Ruby (that I'm aware of), so I wonder if it's different between running the file and debugging. You should be able to tell by printing the default encoding: puts Encoding.default_external.
As for actually fixing the issue: I ran into a similar problem, and adding binary mode as an option to the File.open call worked for me.
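A minimal sketch of that suggestion, assuming the mode string "rb:BOM|UTF-8" (binary mode plus BOM-aware UTF-8 decoding) is what is wanted. The helper name read_utf8_with_bom and the sample file contents are made up for illustration.

```ruby
require "tempfile"

# Hypothetical helper: read a UTF-8 file in binary mode and drop a leading
# byte order mark if there is one.
def read_utf8_with_bom(path)
  File.open(path, "rb:BOM|UTF-8") { |io| io.read }
end

# Build a small sample file that starts with the UTF-8 BOM (EF BB BF).
Tempfile.create(["sample", ".xml"]) do |tmp|
  tmp.binmode
  tmp.write("\xEF\xBB\xBF<root/>")
  tmp.flush

  text = read_utf8_with_bom(tmp.path)
  puts Encoding.default_external  # worth comparing between a normal run and a debug run
  puts text.encoding
  puts text
end
```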

Ruby writing zip file works on Mac but not on Windows / How to receive zip file in Net::HTTP

I'm writing a Ruby script that accesses an API based on HTTP POST calls.
The API returns a zip file containing text documents when I call it with specific POST parameters. At the moment I'm doing that with the Net::HTTP package.
Now my problem:
It seems to return the zip file as a string, as far as I can tell. I can see "PK" (which I suppose is part of the PK header of zip files) and the text from the documents.
The Content-Type header tells me "application/x-zip-compressed; name="somename.zip"".
When I save the zip file like so:
result = comodo.get_cert("<somenumber>")
puts result['Content-Type']
puts result.inspect
puts result.body
File.open("test.zip", "w") do |file|
  file.write result.body
end
I can unzip it on my MacBook without further problems. But when I run the same code on my Windows 10 PC, it tells me that the file is corrupt or not a zip file.
Does it have something to do with the encoding? Can I change it so that it works on both?
Or is this a completely wrong approach to receiving a zip file from a POST request?
PS:
My ruby-version on Mac:
ruby 2.2.3p173
My ruby-version on Windows:
ruby 2.2.4p230
Many thanks in advance!
The problem is due to the way Windows handles line endings (\r\n on Windows, whereas OS X and other Unix-based operating systems use just \n). When File.open is called with the mode w on Windows, the file is opened in text mode and is subject to line ending conversion: every occurrence of byte 0x0A (\n) is written as bytes 0x0D 0x0A (\r\n), which effectively breaks the zip.
When opening the file for writing, use the mode wb instead, as this suppresses any line ending changes.
http://ruby-doc.org/core-2.2.0/IO.html#method-c-new-label-IO+Open+Mode
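One way to check that nothing was altered on the way to disk is to read the file back in binary mode and compare it with the original bytes. The sketch below does that; save_binary is a hypothetical helper and the payload is made-up data containing a lone 0x0A byte, not a real zip archive.

```ruby
require "tempfile"

# Hypothetical helper: write bytes to disk without newline translation.
def save_binary(path, bytes)
  File.open(path, "wb") { |file| file.write(bytes) }
end

# Made-up payload with a lone 0x0A byte, the byte that text mode on
# Windows would expand to 0x0D 0x0A.
payload = "PK\x03\x04\x0A\x00".b

Tempfile.create(["download", ".zip"]) do |tmp|
  save_binary(tmp.path, payload)
  written = File.binread(tmp.path)

  puts written.bytesize == payload.bytesize  # true when no bytes were added
  puts written == payload                    # true when the content is identical
end
```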
Many thanks! Just as you posted the solution i found it out myself..
So much trouble because of one missing 'b' :/
Thank you very much!
The solution (see Ben Y's answer):
result = comodo.get_cert("<somenumber>")
puts result['Content-Type']
puts result.inspect
puts result.body
File.open("test.zip", "wb") do |file|
  file.write result.body
end

Copy yaml formatting (indent) from one file to another

A translator completely messed up a yaml file by copying everything into word (don't ask).
I have already cleaned up the file using regexes, but the indent (spacing) is now missing; everything starts at the first character:
es:
default_blocks:
thank_you_html: "thank you text"
instead of
en:
  default_blocks:
    thank_you_html: "thank you text"
Do you have a good idea on how to automatically copy the format/structure/indent from the correct file (say en.yml) to the corrupt one (say es.yml)? (I'm using textmate 2.0 as editor)
Thanks!
Assuming the original and the translation contain exactly the same strings per line (except for the indentation problem), a quick&dirty script scanning the leading whitespace may solve this:
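#!/usr/bin/env ruby
# encoding: UTF-8
# ARGV[0] is the correctly indented file, ARGV[1] the file to repair.
# Collect the leading spaces/tabs of every line in the reference file ...
indented = File.readlines(ARGV[0]).map do |l|
  l[/\A[ \t]*/]
# ... and prepend them to the corresponding line of the other file.
end.zip(File.readlines(ARGV[1])).map { |e| e.join }.join
# Overwrite the repaired file in place.
File.open(ARGV[1], "w") { |io| io.write(indented) }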
Save it, make it executable and call
./script_name.rb en.yml es.yml
I wouldn't bother with TextMate integration unless this is a regular task, but you could easily turn this into a command and either prompt for the two files via a dialog, or select both in the file browser, open one of them in the current tab, and tell them apart via environment variables ($TM_FILEPATH, $TM_SELECTED_FILES).
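To make the effect of the script easier to see, here is the same idea applied to in-memory lines taken from the question. This is a sketch: copy_indentation is a hypothetical name, and it additionally strips any leading spaces or tabs already present in the target lines, which the script above does not do.

```ruby
# Hypothetical helper: give each target line the leading whitespace of the
# line at the same position in the reference.
def copy_indentation(reference_lines, target_lines)
  reference_lines.zip(target_lines).map do |ref, line|
    indent = ref[/\A[ \t]*/]                 # leading spaces/tabs of the reference line
    body = line.to_s.sub(/\A[ \t]+/, "")     # target line without its own indentation
    "#{indent}#{body}"
  end
end

reference = [
  "en:\n",
  "  default_blocks:\n",
  "    thank_you_html: \"thank you text\"\n"
]
flattened = [
  "es:\n",
  "default_blocks:\n",
  "thank_you_html: \"thank you text\"\n"
]

puts copy_indentation(reference, flattened).join
```

As with the script, this only works when both files have the same keys on the same line numbers.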

How do I run a non-ASCII/Unicode shell command from Ruby on Windows?

I cannot figure out the proper way to encode a shell command to run from Ruby on Windows. The following script reproduces the problem:
# encoding: utf-8
def test(word)
  returned = `echo #{word}`.chomp
  puts "#{word} == #{returned}"
  raise "Cannot roundtrip #{word}" unless word == returned
end
test "good"
test "bÃd"
puts "Success"
# win7, cmd.exe font set to Lucida Console, chcp 65001
# good == good
# bÃd == bÃd
Is this a bug in Ruby, or do I need to encode the command string manually to a specific encoding, before it gets passed to the cmd.exe process?
Update: I want to make it clear that the problem is not with reading the output back into Ruby; it's purely with sending the command to the shell. To demonstrate:
# encoding: utf-8
File.open("bbbÃd.txt", "w") do |f|
  f.puts "nothing to see here"
end
filename = Dir.glob("bbb*.txt").first
command = "attrib #{filename}"
puts command.encoding
# File.exist? replaces File.exists?, which was removed in Ruby 3.2
puts "#{filename} exists?: #{ File.exist?(filename) }"
system command
File.delete(filename)
#=>
# UTF-8
# bbbÃd.txt exists?: true
# File not found - bbbÃd.txt
You can see that the file gets created correctly and the existence check confirms that Ruby can see it, but when I try to run the attrib command on it, it's trying to use a different filename.
Try setting the environment variable LC_CTYPE like this:
LC_CTYPE=en_US.UTF-8
Set this globally in the command shell or inside your Ruby script:
ENV['LC_CTYPE']='en_US.UTF-8'
I had the same issue using drag-and-drop in Windows.
When I dropped a file with Unicode characters in its name, the Unicode characters were replaced by question marks.
I tried everything with encoding, changing the drop handler, etc.
The only thing that worked was creating a batch file with the following contents:
ruby.exe -Eutf-8 C:\Users\user\myscript.rb %*
The batch file does receive the Unicode characters correctly, as you can see if you first do an echo %* followed by a pause.
I needed to add the -Eutf-8 parameter to have the filename come in as UTF-8 in the script itself; having the following lines in my script was not enough:
#encoding: UTF-8
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
Hope this helps people with similar problems.
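When trying out suggestions like the ones above, it can help to print which encodings Ruby has actually picked up, with and without -Eutf-8. The following is a diagnostic sketch only; encoding_report is a hypothetical helper, and the values it prints depend entirely on the system and on how the script was started.

```ruby
# Hypothetical diagnostic helper: collect the encodings Ruby is using for
# external data, the locale, the file system, and each given argument.
def encoding_report(args = ARGV)
  {
    default_external: Encoding.default_external.name,
    default_internal: Encoding.default_internal&.name,  # nil unless set
    locale: Encoding.find("locale").name,
    filesystem: Encoding.find("filesystem").name,
    arguments: args.map { |arg| [arg, arg.encoding.name] }
  }
end

encoding_report.each do |key, value|
  puts "#{key}: #{value.inspect}"
end
```

Running it once as `ruby script.rb <file>` and once as `ruby -Eutf-8 script.rb <file>` shows which settings the switch changes.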

Automatically open a file as binary with Ruby

I'm using Ruby 1.9 to open several files and copy them into an archive. Now there are some binary files, but some are not. Since Ruby 1.9 does not open binary files automatically as binaries, is there a way to open them automatically anyway? (So ".class" would be binary, ".txt" not)
Actually, the previous answer by Alex D is incomplete. While it's true that there is no "text" mode in Unix file systems, Ruby does distinguish between opening files in binary and non-binary mode:
s = File.open('/tmp/test.jpg', 'r') { |io| io.read }
s.encoding
=> #<Encoding:UTF-8>
is different from (note the "rb")
s = File.open('/tmp/test.jpg', 'rb') { |io| io.read }
s.encoding
=> #<Encoding:ASCII-8BIT>
The latter, as the docs say, sets the external encoding to ASCII-8BIT, which tells Ruby not to attempt to interpret the result as UTF-8. You can achieve the same thing for a string that has already been read by setting the encoding explicitly with s.force_encoding('ASCII-8BIT'). This is key if you want to read binary data into strings and move them around (e.g. saving them to a database).
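A small sketch of what that relabelling does: the bytes stay exactly the same, only the way Ruby counts and interprets them changes. The helper name as_binary and the sample string are made up for illustration.

```ruby
# Hypothetical helper: return a copy of the string labelled as binary.
def as_binary(str)
  str.dup.force_encoding(Encoding::ASCII_8BIT)
end

text = "caf\u00E9"        # four characters, five bytes in UTF-8
binary = as_binary(text)

puts text.encoding               # UTF-8
puts binary.encoding             # ASCII-8BIT (shown as BINARY (ASCII-8BIT) on newer Rubies)
puts text.length                 # 4, counted in characters
puts binary.length               # 5, counted in bytes
puts text.bytes == binary.bytes  # true, the underlying bytes are unchanged
```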
Since Ruby 1.9.1 there is a separate method for binary reading (IO.binread) and since 1.9.3 there is one for writing (IO.binwrite) as well:
For reading:
content = IO.binread(file)
For writing:
IO.binwrite(file, content)
Since IO is the parent class of File, you could also do the following which is probably more expressive:
content = File.binread(file)
File.binwrite(file, content)
On Unix-like platforms, there is no difference between opening files in "binary" and "text" modes. On Windows, "text" mode converts line breaks to DOS style, and "binary" mode does not.
Unless you need linebreak conversion on Windows platforms, just open all the files in "binary" mode. There is no harm in reading a text file in "binary" mode.
If you really want to distinguish, you will have to match File.extname(filename) against a list of known extensions like ".txt" and ".class".
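Matching on the extension could look roughly like this. It is a sketch under assumptions: the list of binary extensions is made up and would need to be adapted, and open_mode_for and read_for_archive are hypothetical helper names.

```ruby
# Assumed list of extensions to treat as binary; adjust as needed.
BINARY_EXTENSIONS = %w[.class .jpg .png .zip].freeze

# Hypothetical helper: pick the File.open mode from the file extension.
def open_mode_for(filename)
  BINARY_EXTENSIONS.include?(File.extname(filename).downcase) ? "rb" : "r"
end

# Hypothetical helper: read a file using the mode chosen above.
def read_for_archive(filename)
  File.open(filename, open_mode_for(filename)) { |io| io.read }
end

puts open_mode_for("Main.class")  # rb
puts open_mode_for("notes.txt")   # r
```

Files with an unknown extension fall back to text mode here; defaulting to "rb" instead would follow the advice to read everything as binary.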
