File.exist? not working when directory name has special characters - ruby

File.exist? in not working with directory name having special characters. for something like given below
path = "/home/cis/Desktop/'El%20POP%20que%20llevas%20dentro%20Vol.%202'/*.mp3"
it works fine but if it has letters like ñ its returns false.
Plz help with this.

Try the following:
Make sure you're running 1.9.2 or greater and put # encoding: UTF-8 at the top of your file (which must be in UTF-8 and your editor must support it).
If you're running MRI(i.e. not JRuby or other implementation) you can add environment variable RUBYOPT=-Ku instead of # encoding: UTF-8 to the top of each file.

Related

ASCII incompatible encoding with normal run, not in debug mode

I'm really confused on this one, and maybe it's a bug in Ruby 2.6.2. I have files that were written as UTF-8 with BOM, so I'm using the following:
filelist = Dir.entries(#input_dirname).join(' ')
filelist = filelist.split(' ').grep(/xml/)
filelist.each do |indfile|
filecontents_tmp = File.read("#{#input_dirname}/#{indfile}", :encoding =>'bom|utf-8')
puts filecontents_tmp
end
If I put a debug breakpoint at the puts line, my file is read in properly. If I just run the simple script, I get the following error:
in `read': ASCII incompatible encoding needs binmode (ArgumentError)
I'm confused as to why this would work in debug, but not when run normally. Ideas?
Have you tried printing the default encoding when you run the file as opposed to when you debug the file? There are 3 ways to set / change the encoding in Ruby (that I'm aware of), so I wonder if it's different between running the file and debugging. You should be able to tell by printing the default encoding: puts Encoding.default_external.
As for actually fixing the issue, I ran into a similar problem and found this answer which said to add bin mode as an option to the File.open call and it worked for me.

Encoding problems with ruby while reading in command line arguments with optparse

I'm writing a small programm in ruby, which essentially changes some files within a zip-file. The zip-file is specified as a parameter on the command line and interpreted via the OptionParser.
The problem is, that when specifiying a file, which contains non-ascii characters, the file cannot be opened, saying that it could not be found. This problem occurs using cmd.exe under Windows.
Here is a minimal example:
# example.rb
require "zip"
require "optparse"
zip_file_name = String.new
# read and interprete command line arguments:
OptionParser.new do |opts|
opts.on("-f", "--file FILE", String, "The zip-file, which will be modified") do |f|
zip_file_name = f
end
end.parse!
# Open the zip file:
Zip::File.open(zip_file_name) do |zipfile|
end
If you create a zip-file test.zip and run example.rb -f test.zip everything is okay (it does finish without errors). Doing the same with a zip-file täst.zip gives me an error. I tried doing zip_file_name.encode!(Encoding::UTF_8), but this didn't solve the problem.
It seems to be an encoding problem (the encoding of zip_file_name is cp850) but the transcoding does not seem to work correctly.
So my question would be: How can I change my program to also allow non-ascii characters for specifying files on the command line?
Adding zip_file_name.force_encoding(Encoding::Windows_1252) before opening the file solves the issue (on Western Europe Windows).
Apparently, the CP850 file names encoding is a wrong assumption from Ruby. On my Windows system, it seems that filenames are encoded in Windows_1252 (a custom version of Latin1 or ISO 8859-1).

ruby 1.9 wrong file encoding on windows

I have a ruby file with these contents:
# encoding: iso-8859-1
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
puts File.read('foo.txt').encoding
When I run it from windows command prompt ruby 1.9.3 I get: IBM437
When I run it from cygwin ruby 1.9.3 I get: UTF-8
What I expect to get is: iso-8859-1
Can someone explain what's happening here?
UPDATE
Here's a better description of what I'm looking for:
I understand now thanks to Darshan that by default ruby will load files in
Encoding.default _external, but shouldn't the # encoding: iso-8859-1
line override that?
Should ruby be able to auto-detect a file's encoding? Is there any
filesystem where the encoding is an attribute?
What is my best option to 'remember' the encoding I saved the file
in?
You're not specifying the encoding when you read the file. You're being very careful to specify it everywhere except there, but then you're reading it with the default encoding.
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'.force_encoding('iso-8859-1')}
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding }
# => ISO-8859-1
Also note that you probably mean 'fòo'.encode('iso-8859-1') rather than 'fòo'.force_encoding('iso-8859-1'). The latter leaves the bytes unchanged, while the former transcodes the string.
Update: I'll elaborate a bit since I wasn't as clear or thorough as I could have been.
If you don't specify an encoding with File.read(), the file will be read with Encoding.default_external. Since you're not setting that yourself, Ruby is using a value depending on the environment it's run in. In your Windows environment, it's IBM437; in your Cygwin environment, it's UTF-8. So my point above was that of course that's what the encoding is; it has to be, and it has nothing to do with what bytes are contained in the file. Ruby doesn't auto-detect encodings for you.
force_encoding() doesn't change the bytes in a string, it only changes the Encoding attached to those bytes. If you tell Ruby "pretend this string is ISO-8859-1", then it won't transcode them when you tell it "please write this string as ISO-8859-1". encode() transcodes for you, as does writing to the file if you don't trick it into not doing so.
Putting those together, if you have a source file in ISO-8859-1:
# encoding: iso-8859-1
# Write in ISO-8859-1 regardless of default_external
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
# Read in ISO-8859-1 regardless of default_external,
# transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1
puts File.read('foo.txt').encoding # -> Whatever is specified by default_external
If you have a source file in UTF-8:
# encoding: utf-8
# Write in ISO-8859-1 regardless of default_external, transcoding from UTF-8
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
# Read in ISO-8859-1 regardless of default_external,
# transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1
puts File.read('foo.txt').encoding # -> Whatever is specified by default_external
Update 2, to answer your new questions:
No, the # encoding: iso-8859-1 line does not change Encoding.default_external, it only tells Ruby that the source file itself is encoded in ISO-8859-1. Simply add
Encoding.default_external = "iso-8859-1"
if you expect all files that your read to be stored in that encoding.
No, I don't personally think Ruby should auto-detect encodings, but reasonable people can disagree on that one, and a discussion of "should it be so" seems off-topic here.
Personally, I use UTF-8 for everything, and in the rare circumstances that I can't control encoding, I manually set the encoding when I read the file, as demonstrated above. My source files are always in UTF-8. If you're dealing with files that you can't control and don't know the encoding of, the charguess gem or similar would be useful.

Does Ruby auto-detect a file's codepage?

If a save a text file with the following character б U+0431, but save it as an ANSI code page file.
Ruby returns ord = 63. Saving the file with UTF-8 as the codepage returns ord = 208, 177
Should I be specifically telling Ruby to handle the input encoded with a certain code page? If so, how do you do this?
Is that in ruby source code or in a file which is read with File.open? If it's in the ruby source code, you can (in ruby 1.9) add this to the top of the file:
# encoding: utf-8
Or you could specify most other encodings (like iso-8859-1).
If you are reading a file with File.open, you could do something like this:
File.open("file.txt", "r:utf-8") {|f| ... }
As with the encoding comment, you can pass in different types of encodings here too.

Recursive directory listing using Ruby with Chinese characters in file names

I would like to generate a list of files within a directory. Some of the filenames contain Chinese characters.
eg: [试验].Test.txt
I am using the following code:
require 'find'
dirs = ["TestDir"]
for dir in dirs
Find.find(dir) do |path|
if FileTest.directory?(path)
else
p path
end
end
end
Running the script produces a list of files but the Chinese characters are escaped (replaced with backslashes followed by numbers). Using the example filename above would produce:
"TestDir/[\312\324\321\351]Test.txt" instead of "TestDir/[试验].Test.txt".
How can the script be altered to output the Chinese characters?
Ruby needs to know that you are dealing with unicode in your code. Set appropriate character encoding using KCODE, as below:
$KCODE = 'utf-8'
I think utf-8 is good enough for chinese characters.
The following code is more elegant and doesn't require 'find.' It produces a list of files (but not directories) in whatever the working directory is (or whatever directory you put in).
Dir.entries(Dir.pwd).each do |x|
p x.encode('UTF-8') unless FileTest.directory?(x)
end
And to get a recursive digging down one level use:
Dir.glob('*/*').each do |x|
p x.encode('UTF-8') unless FileTest.directory?(x)
end
I'm sure there is a way to get it to go all the way down but Dir.glob('**/*') will go through the whole file system if I remember right.

Resources