Unicode filenames on Windows in Ruby - ruby

I have a piece of code that looks like this:
Dir.new(path).each do |entry|
  puts entry
end
The problem comes when I have a file named こんにちは世界.txt in the directory that I list.
On a Windows 7 machine I get the output:
???????.txt
From googling around, properly reading this filename on Windows seems to be an impossible task. Any suggestions?

I had the same problem & just figured out how to get the entries of a directory in UTF-8 in Windows. The following worked for me (using Ruby 1.9.2p136):
entries = Dir.entries(path, encoding: "UTF-8")
entries.each do |entry|
  # example: entries are bare names, so join them with the directory
  stat = File.stat(File.join(path, entry))
  puts "Size: " + String(stat.size)
end
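For readers on a more recent Ruby, the same idea can be tried end to end. This is only a minimal sketch, assuming a Ruby that accepts the encoding: option of Dir.entries and a filesystem that accepts the name; the utf8_entries helper and the temporary directory are made up for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: list a directory and ask Ruby to return names as UTF-8.
def utf8_entries(path)
  Dir.entries(path, encoding: "UTF-8") - [".", ".."]
end

Dir.mktmpdir do |dir|
  # "Hello world" in Japanese, as in the question; \u escapes keep the source ASCII-only.
  name = "\u3053\u3093\u306B\u3061\u306F\u4E16\u754C.txt"
  File.write(File.join(dir, name), "hello")

  utf8_entries(dir).each do |entry|
    # Join with the directory so File.stat does not depend on the working directory.
    stat = File.stat(File.join(dir, entry))
    puts "#{entry}: #{stat.size} bytes"
  end
end
```

Whether the name is then displayed correctly still depends on the console's code page and font, which is a separate issue from reading it.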

You're out of luck with pure Ruby (either 1.8 or 1.9.1) since it uses the ANSI versions of the Windows API.
It seems like Ruby 1.9.2 will support Unicode filenames on Windows. This bug report has 1.9.2 as target. According to this announcement Ruby 1.9.2 will be released at the end of July 2010.
If you really need it earlier you could try to use FindFirstFileW etc. directly via Win32API.new or win32-api.

My solution was to use Dir.glob instead of Dir.entries. However, it only works with a bare * pattern; it does not work when the pattern includes a path (c:/dir/*). Tested in 1.9.2p290 and 1.9.3p0 on Windows 7.
There are many other issues with Unicode paths on Windows. It is still an open issue. The patches are currently targeted at Ruby 2.0, which is rumored to be released in 2013.

Related

Ruby writing zip file works on Mac but not on Windows / How to receive zip file in Net::HTTP

I'm writing a Ruby script that accesses an API based on HTTP POST calls.
The API returns a zip file containing text documents when I call it with specific POST parameters. At the moment I'm doing that with the Net::HTTP package.
Now my problem:
As far as I can tell, it returns the zip file as a string. I can see "PK" (which I suppose is part of the PK header of zip files) and the text from the documents.
And the Content-Type header tells me "application/x-zip-compressed; name="somename.zip"".
When I save the zip file like so:
result = comodo.get_cert("<somenumber>")
puts result['Content-Type']
puts result.inspect
puts result.body
File.open("test.zip", "w") do |file|
  file.write result.body
end
I can unzip it on my MacBook without further problems. But when I run the same code on my Win10 PC, it tells me that the file is corrupt or not a ZIP file.
Does it have something to do with the encoding? Can I change it so that it works on both?
Or is this a completely wrong approach to receiving a zip file from a POST request?
PS:
My ruby-version on Mac:
ruby 2.2.3p173
My ruby-version on Windows:
ruby 2.2.4p230
Many thanks in advance!
The problem is due to the way Windows handles line endings (\r\n for Windows, whereas OS X and other Unix based operating systems use just \n). When using File.open, using the mode of just w makes the file subject to line ending changes, so any occurrences of byte 0x0A (or \n) are converted into bytes 0x0D 0x0A (or \r\n), which effectively breaks the zip.
When opening the file for write, use the mode wb instead, as this will suppress any line ending changes.
http://ruby-doc.org/core-2.2.0/IO.html#method-c-new-label-IO+Open+Mode
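To see the effect in isolation, here is a small sketch that writes a few bytes containing 0x0A in binary mode and reads them back. The save_binary helper and the sample payload are invented for the example; they stand in for the response body in the question.

```ruby
require "tmpdir"

# Hypothetical helper: write raw bytes without any newline translation.
def save_binary(path, data)
  File.open(path, "wb") { |file| file.write(data) }
end

# Stand-in for a zip response body: binary data that happens to contain 0x0A bytes.
payload = "PK\x03\x04\n\x00\n\x00".b

Dir.mktmpdir do |dir|
  path = File.join(dir, "test.zip")
  save_binary(path, payload)

  # With "wb" the file on disk should have exactly as many bytes as the payload.
  puts File.size(path) == payload.bytesize
  puts File.binread(path) == payload
end
```

File.binwrite(path, data) is a shorter way to do the same write.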
Many thanks! Just as you posted the solution, I found it out myself.
So much trouble because of one missing 'b' :/
Thank you very much!
The solution (see Ben Y's answer):
result = comodo.get_cert("<somenumber>")
puts result['Content-Type']
puts result.inspect
puts result.body
File.open("test.zip", "wb") do |file|
  file.write result.body
end

JSON Parser Acts Differently

I am trying to parse the following string called result:
{
"status":0,
"id":"faxxxxx-1",
"hypotheses":[
{"utterance":"skateboard","confidence":0.90466744},
{"utterance":"skate board"},
{"utterance":"skateboarding"},
{"utterance":"skateboards"},
{"utterance":"skate bored"}
]
}
Using obj = JSON.parse(result) in Ruby 1.8 with the json gem.
The command in question is:
puts "#{obj['hypotheses'][0]}"
My old workstation (whose hard drive died) gave me:
{"utterance" => "skateboard", "confidence" => 0.90466744}
My current workstation gives me:
confidence0.90466744utteranceskateboard
The old workstation was not set up by me, so I don't know what kind of packages were installed, while this current one was.
Why is there a difference in the output of the exact same script?
How can I make the current one look like the old one?
I am completely new to this btw.
In Ruby 1.8, Hash#to_s simply joins all of the elements together without spaces, equivalent to to_a.flatten.join('').
In Ruby 1.9, Hash#to_s is an alias to inspect and produces well-formatted output.
To get the equivalent thing in both cases:
puts obj['hypotheses'][0].inspect
The same thing applies to Array.
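A short sketch of the difference, using a trimmed copy of the JSON from the question. The exact spacing around => in the inspect output varies between Ruby versions, so no output is shown here.

```ruby
require "json"

result = '{"status":0,"hypotheses":[{"utterance":"skateboard","confidence":0.90466744},{"utterance":"skate board"}]}'
obj = JSON.parse(result)
first = obj["hypotheses"][0]

# inspect gives a structured, readable representation on every Ruby version,
# whereas string interpolation calls to_s, which is what differed on 1.8.
puts first.inspect

# p prints the inspect form directly, for hashes and arrays alike.
p obj["hypotheses"]
```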

What is the correct way to detect if ruby is running on Windows?

What is the correct way to detect from within Ruby whether the interpreter is running on Windows? "Correct" includes that it works on all major flavors of Ruby, including 1.8.x, 1.9.x, JRuby, Rubinius, and IronRuby.
The currently top ranked Google results for "ruby detect windows" are all incorrect or outdated. For example, one incorrect way to do it is:
RUBY_PLATFORM =~ /mswin/
This is incorrect because it fails to detect the mingw version, or JRuby on Windows.
What's the right way?
It turns out, there's this way:
Gem.win_platform?
Preferred Option (Updated based on @John's recommendations):
require 'rbconfig'
is_windows = (RbConfig::CONFIG['host_os'] =~ /mswin|mingw|cygwin/)
This could also work, but is less reliable (it won't work with much older versions, and the environment variable can be modified):
is_windows = (ENV['OS'] == 'Windows_NT')
(I can't easily test either on all of the rubies listed, or anything but Windows 7, but I know that both will work for 1.9.x, IronRuby, and JRuby).
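The RbConfig check can be wrapped in a small predicate. This is a sketch rather than a canonical implementation: windows_host? is a made-up name, and it takes the host_os string as a parameter only so that sample values can be tried on any machine.

```ruby
require "rbconfig"

# Hypothetical helper: true if the given host_os string looks like Windows.
# Defaults to the value reported by the running interpreter.
def windows_host?(host_os = RbConfig::CONFIG["host_os"])
  !!(host_os =~ /mswin|mingw|cygwin/i)
end

puts windows_host?                # depends on where this runs
puts windows_host?("mingw32")     # a typical MinGW build value
puts windows_host?("darwin21")    # macOS
```

Matching on mswin rather than a bare win matters here, because "darwin" also contains "win".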
This works perfectly for me.
Also, etc does not need to be installed; it comes with Ruby.
require "etc"
def check_system
  return "windows" if Etc.uname[:sysname] == "Windows_NT"
  return "linux" if Etc.uname[:sysname] == "Linux"
end
(File::ALT_SEPARATOR || File::SEPARATOR) == '\\'

Ruby: How to determine if file being read is binary or text

I am writing a program in Ruby which will search for strings in text files within a directory - similar to Grep.
I don't want it to attempt to search in binary files but I can't find a way in Ruby to determine whether a file is binary or text.
The program needs to work on both Windows and Linux.
If anyone could point me in the right direction that would be great.
Thanks,
Xanthalas
libmagic is a library which detects file types. For this solution I assume that all MIME types which start with text/ represent text files. Everything else is a binary file. This assumption is not correct for all MIME types (e.g. application/x-latex, application/json), but libmagic detects these as text/plain.
require "filemagic"
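def binary?(filename)
  fm = FileMagic.new(FileMagic::MAGIC_MIME)
  # Anything whose MIME type does not start with text/ counts as binary.
  !(fm.file(filename) =~ /^text\//)
ensure
  # fm is nil if FileMagic.new itself raised
  fm.close if fm
end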
gem install ptools
require 'ptools'
File.binary?(file)
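If installing a gem or libmagic is not an option on both Windows and Linux, a rough pure-Ruby heuristic is to look for a NUL byte in the first block of the file. To be clear, this is a different and much cruder technique than what filemagic or ptools do; probably_binary? and its sample size are assumptions made for this sketch.

```ruby
require "tmpdir"

# Hypothetical helper, not part of filemagic or ptools: treats a file as binary
# if the first sample_size bytes contain a NUL byte.
def probably_binary?(filename, sample_size = 4096)
  # read returns nil for an empty file, so fall back to an empty string.
  sample = File.open(filename, "rb") { |f| f.read(sample_size) }.to_s
  sample.include?("\x00")
end

Dir.mktmpdir do |dir|
  text = File.join(dir, "notes.txt")
  blob = File.join(dir, "blob.bin")
  File.binwrite(text, "plain text\n")
  File.binwrite(blob, "\x00\x01\x02".b)

  puts probably_binary?(text)
  puts probably_binary?(blob)
end
```

The obvious limitation is that UTF-16 text contains NUL bytes and would be reported as binary, so this only suits files expected to be ASCII or UTF-8.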
An alternative to using the ruby-filemagic gem is to rely on the file command that ships with most Unix-like operating systems. I believe it uses the same libmagic library under the hood but you don't need the development files required to compile the ruby-filemagic gem. This is helpful if you're in an environment where it's a bit of work to install additional libraries (e.g. Heroku).
According to man file, text files will usually contain the word text in their description:
$ file Gemfile
Gemfile: ASCII text
You can run the file command through Ruby and capture the output:
require "open3"
def text_file?(filename)
  file_type, status = Open3.capture2e("file", filename)
  status.success? && file_type.include?("text")
end
The answer above gives a false positive when the file name itself contains "text", for example:
file /tmp/ball-texture.png
/tmp/ball-texture.png: PNG image data, 11 x 18, 8-bit/color RGBA, non-interlaced
So the updated code looks like this:
def text_file?(filename)
  file_type, status = Open3.capture2e('file', filename)
  # Only look at the description after the last colon, not the file name.
  status.success? && file_type.split(':').last.include?('text')
end
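Another way to keep the file name out of the comparison is the --brief flag of file, which omits the name from the output. A minimal sketch, assuming the file command is on the PATH; text_description? is a hypothetical helper split out so the decision can be tried on sample descriptions without running the command.

```ruby
require "open3"

# Hypothetical helper: decides from a `file --brief` description alone.
def text_description?(description)
  description.include?("text")
end

# Runs `file --brief`, so the output contains only the description.
def text_file?(filename)
  description, status = Open3.capture2e("file", "--brief", filename)
  status.success? && text_description?(description)
end

# Sample descriptions taken from the outputs shown above.
puts text_description?("ASCII text\n")
puts text_description?("PNG image data, 11 x 18, 8-bit/color RGBA, non-interlaced\n")
```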

Determine file type in Ruby

How does one reliably determine a file's type? File extension analysis is not acceptable. There must be a rubyesque tool similar to the UNIX file(1) command?
This is regarding MIME or content type, not file system classifications, such as directory, file, or socket.
There is a ruby binding to libmagic that does what you need. It is available as a gem named ruby-filemagic:
gem install ruby-filemagic
Requires libmagic-dev.
The documentation seems a little thin, but this should get you started:
$ irb
irb(main):001:0> require 'filemagic'
=> true
irb(main):002:0> fm = FileMagic.new
=> #<FileMagic:0x7fd4afb0>
irb(main):003:0> fm.file('foo.zip')
=> "Zip archive data, at least v2.0 to extract"
irb(main):004:0>
If you're on a Unix machine try this:
mimetype = `file -Ib #{path}`.gsub(/\n/,"")
I'm not aware of any pure Ruby solutions that work as reliably as 'file'.
Edited to add: depending what OS you are running you may need to use 'i' instead of 'I' to get file to return a mime-type.
I found shelling out to be the most reliable. For compatibility on both Mac OS X and Ubuntu Linux I used:
file --mime -b myvideo.mp4
video/mp4; charset=binary
Ubuntu also prints video codec information if it can which is pretty cool:
file -b myvideo.mp4
ISO Media, MPEG v4 system, version 2
You can use this reliable method based on the magic header of the file:
def get_image_extension(local_file_path)
  png = Regexp.new("\x89PNG".force_encoding("binary"))
  jpg = Regexp.new("\xff\xd8\xff\xe0\x00\x10JFIF".force_encoding("binary"))
  jpg2 = Regexp.new("\xff\xd8\xff\xe1(.*){2}Exif".force_encoding("binary"))
  case IO.read(local_file_path, 10)
  when /^GIF8/
    'gif'
  when /^#{png}/
    'png'
  when /^#{jpg}/
    'jpg'
  when /^#{jpg2}/
    'jpg'
  else
    mime_type = `file #{local_file_path} --mime-type`.gsub("\n", '') # Works on linux and mac
    raise UnprocessableEntity, "unknown file type" if !mime_type
    mime_type.split(':')[1].split('/')[1].gsub('x-', '').gsub(/jpeg/, 'jpg').gsub(/text/, 'txt').gsub(/x-/, '')
  end
end
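The magic-header idea can also be sketched without regular expressions or a shell fallback, by comparing the leading bytes of the file against known signatures. This is a simplified variant and not the code above: the IMAGE_SIGNATURES table and sniff_image_extension are names invented here, only three formats are covered, and unknown files simply yield nil.

```ruby
require "tmpdir"

# Leading "magic" bytes for a few image formats.
IMAGE_SIGNATURES = {
  "png" => "\x89PNG\r\n\x1A\n".b,
  "gif" => "GIF8".b,
  "jpg" => "\xFF\xD8\xFF".b
}.freeze

# Hypothetical helper: returns an extension string, or nil if no signature matches.
def sniff_image_extension(path)
  # Read in binary mode so the bytes are not altered on Windows.
  header = File.open(path, "rb") { |f| f.read(8) }.to_s
  match = IMAGE_SIGNATURES.find { |_ext, magic| header.start_with?(magic) }
  match && match.first
end

Dir.mktmpdir do |dir|
  sample = File.join(dir, "unknown.dat")
  File.binwrite(sample, IMAGE_SIGNATURES["png"] + "rest of file".b)
  p sniff_image_extension(sample)
end
```

Because it only reads a few bytes and never shells out, this works the same way on Windows and Unix-like systems, at the cost of recognising only the formats listed in the table.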
This was added as a comment on this answer but should really be its own answer:
path = "/path/to/your/file" # placeholder: set this to your file's path
IO.popen(
  ["file", "--brief", "--mime-type", path],
  in: :close, err: :close
) { |io| io.read.chomp }
I can confirm that it worked for me.
If you're using the File class, you can augment it with the following methods based on @PatrickRichie's answer:
class File
  def mime_type
    `file --brief --mime-type #{self.path}`.strip
  end
  def charset
    # [1] instead of .second, which is only available with ActiveSupport
    `file --brief --mime #{self.path}`.split(';')[1].split('=')[1].strip
  end
end
And, if you're using Ruby on Rails, you can drop this into config/initializers/file.rb and have it available throughout your project.
For those who came here via a search engine, a modern approach to finding the MIME type in pure Ruby is to use the mimemagic gem.
require 'mimemagic'
MimeMagic.by_magic(File.open('tux.jpg')).type # => "image/jpeg"
If you feel that it is safe to use only the file extension, then you can use the mime-types gem:
MIME::Types.type_for('tux.jpg') # => [#<MIME::Type: image/jpeg>]
You could give shared-mime a try (gem install shared-mime-info). It requires the Freedesktop shared-mime-info library, but does both filename/extension checks and "magic" checks. I tried giving it a whirl myself just now, but I don't have the Freedesktop shared-mime-info database installed and have to do "real work," unfortunately; still, it might be what you're looking for.
Pure Ruby solution using magic bytes and returning a symbol for the matching type:
https://github.com/SixArm/sixarm_ruby_magic_number_type
I wrote it, so if you have suggestions, let me know.
I recently found mimetype-fu.
It seems to be the easiest reliable solution to get a file's MIME type.
The only caveat is that on a Windows machine it only uses the file extension, whereas on *Nix based systems it works great.
The best I found so far:
http://bogomips.org/mahoro.git/
The Ruby gem works well.
mime-types for ruby
You could give a go with MIME::Types for Ruby.
This library allows for the identification of a file’s likely MIME content type. The identification of MIME content type is based on a file’s filename extensions.
