Create a UTF-8 Encoded File Using Chef on Windows

I am using Chef to create a UDL file on Windows 12. The file must be encoded as UTF-8, but I can't get Chef to produce anything other than ANSI.
On the workstation, the template file is UTF-8 encoded, but the template file that ends up on the target nodes is ANSI when it lands in the chef cache -- so it seems the encoding is being lost during the file transfer to the target node. The target node's ruby is defaulting to UTF-8 during the chef-client run, although I expect that because the source ERB template is ANSI, the target is being created as ANSI as well.
The recipe is pretty straightforward:
log "Encoding is #{Encoding.default_external}"
template 'C:/file.udl' do
  rights :full_control, 'Everyone'
  action :create
  source 'file.udl.erb'
end
And the template (UTF-8 encoded at the source but not after being transferred to the target node):
[oledb]
; Everything after this line is an OLE DB initstring
Provider=SQLOLEDB.1;Password=MyPassword;Persist Security Info=True;User ID=MyUsername;Initial Catalog=MyCat;Data Source=MyServer
The resultant ANSI file can't be read by OLE. Is there any way to convince Chef/Ruby to write the file in UTF-8?

This is my approach: monkey-patching Chef::Mixin::Template::TemplateContext.
In libraries/template_hacking.rb:
require 'chef/mixin/template'
# Patch Chef so that it renders a template with a specific encoding.
# Usage in your recipe:
#
#   template 'my_utf8_file' do
#     source 'my_utf8_file'
#     variables(
#       __encoding__: 'utf-8',
#       name: '你好'
#     )
#   end
#
# Guard against applying the patch twice. The alias is an instance method,
# so check with method_defined? rather than the class's public_methods.
unless Chef::Mixin::Template::TemplateContext.method_defined?(:_origin_render_template)
  class Chef::Mixin::Template::TemplateContext
    alias_method :_origin_render_template, :_render_template
    def _render_template(template, context)
      encoding = context[:__encoding__] && context[:__encoding__].upcase
      if template && encoding && template.encoding.name != encoding
        Chef::Log.info("Encoding template to #{encoding}")
        # force_encoding only changes the encoding label; the bytes are untouched.
        template.force_encoding encoding
      end
      _origin_render_template(template, context)
    end
  end
end
In recipes/my_recipe.rb:
template 'my_utf8_file' do
  source 'my_utf8_file'
  variables(
    __encoding__: 'utf-8',
    name: '你好'
  )
end
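One thing worth checking before patching Chef: a file that contains only ASCII characters is byte-for-byte identical in UTF-8 and in the Windows ANSI code pages, so an editor will typically report it as ANSI unless there is a byte order mark or at least one non-ASCII character. A minimal sketch for inspecting the bytes yourself (describe_bytes is a name made up for this illustration and is not part of Chef):

```ruby
# Hypothetical helper: classify raw bytes without trusting an editor's guess.
UTF8_BOM = "\xEF\xBB\xBF".b

def describe_bytes(bytes)
  raw = bytes.b                          # binary copy; the input is not modified
  return :utf8_with_bom if raw.start_with?(UTF8_BOM)
  return :ascii_only if raw.ascii_only?  # same bytes in UTF-8 and in ANSI
  raw.force_encoding(Encoding::UTF_8).valid_encoding? ? :utf8 : :not_utf8
end

udl = "[oledb]\nProvider=SQLOLEDB.1;Data Source=MyServer\n"
puts describe_bytes(udl)                    # ascii_only
puts describe_bytes("name: \u4F60\u597D")   # utf8
puts describe_bytes("Andr\xE9".b)           # not_utf8
puts describe_bytes(UTF8_BOM + udl.b)       # utf8_with_bom
```

On the node you would pass it File.binread('C:/file.udl'). If the answer is ascii_only, the encoding was not lost in transfer; there was simply nothing in the file to distinguish UTF-8 from ANSI.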

Related

How to pass file url to helper method in middleman

I'm writing a helper method to convert images to base64 strings when needed. Below is the code
# config.rb
helpers do
  def base64_url(img_link, file_type: "jpg")
    require "base64"
    if file_type == "jpg"
      "data:image/jpg;base64,#{Base64.encode64(open(img_link).to_a.join)}"
    elsif file_type == "png"
      "data:image/png;base64,#{Base64.encode64(open(img_link).to_a.join)}"
    else
      img_link
    end
  end
end
In page.html.erb
<%= image_tag base64_url('/images/balcozy-logo.jpg') %>
Now the problem is when ruby reads '/images/balcozy-logo.jpg' it reads the file from system root not from the root of the project.
Error message as follows
Errno::ENOENT at /
No such file or directory # rb_sysopen - /images/balcozy-logo.jpg
How do I get around this and pass the proper image URL from project_root/source/images?
In Middleman app.root returns the root directory of the application. There's also app.root_path, which does the same but returns a Pathname object, which is slightly more convenient:
full_path = app.root_path.join("source", img_link.gsub(/^\//, ''))
The gsub is necessary if img_link starts with a /, since it would be interpreted as the root of your filesystem.
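To see why the leading slash matters, here is a small sketch using Ruby's Pathname directly. The project root /srv/site is a made-up stand-in for whatever app.root_path returns, and source_path is a hypothetical helper; nothing here touches the disk.

```ruby
require "pathname"

# Hypothetical stand-in for Middleman's app.root_path.
ROOT = Pathname.new("/srv/site")

def source_path(root, img_link)
  # Drop one leading slash so the link is treated as relative to source/.
  root.join("source", img_link.sub(%r{\A/}, ""))
end

# An absolute component makes Pathname#join discard everything before it.
puts ROOT.join("source", "/images/logo.jpg")   # /images/logo.jpg
puts source_path(ROOT, "/images/logo.jpg")     # /srv/site/source/images/logo.jpg
puts source_path(ROOT, "images/logo.jpg")      # /srv/site/source/images/logo.jpg
```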
I've taken the liberty of making a few more revisions to your method:
require "base64"
helpers do
  def base64_url(path, file_type: "jpg")
    return path unless ["jpg", "png"].include?(file_type)
    full_path = app.root_path.join("source", path.gsub(/^\//, ''))
    # Open in binary mode so the image bytes are not altered on Windows.
    data_encoded = File.open(full_path, 'rb') do |file|
      Base64.strict_encode64(file.read)
    end
    "data:image/#{file_type};base64,#{data_encoded}"
  end
end
I've done a few things here:
Moved require "base64" to the top of the file; it doesn't belong inside a method.
Check file_type at the very beginning of the method and return early if it's not among the listed types.
Instead of open(filename).to_a.join (or the more succinct open(filename).read), use File.open. OpenURI (which supplies the open method you were using) is overkill for reading from the local filesystem.
Use Base64.strict_encode64 instead of encode64. encode64 inserts a line feed after every 60 encoded characters, which does not belong in a data URI; strict_encode64 produces a single line in the standard Base64 alphabet. (Don't use urlsafe_encode64 here: it swaps + and / for - and _, which is not what a data URI expects.)
Remove the unnecessary if; since we know file_type will be either jpg or png we can use it directly in the data URI.
There may be a more elegant way to get full_path or determine the file's MIME type using Middleman's built-in asset system, but a very brief search of the docs didn't turn anything up.
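For the MIME type, one option is a small lookup table rather than interpolating the extension, since the registered type for JPEG is image/jpeg. The sketch below is independent of Middleman; data_uri and MIME_TYPES are names invented for this example, and the bytes are a stand-in rather than a real image.

```ruby
require "base64"

# Hypothetical mapping from file extension to registered MIME type.
MIME_TYPES = { "jpg" => "image/jpeg", "jpeg" => "image/jpeg", "png" => "image/png" }.freeze

def data_uri(bytes, file_type)
  mime = MIME_TYPES.fetch(file_type) { raise ArgumentError, "unsupported type: #{file_type}" }
  # strict_encode64 uses the standard alphabet and adds no line breaks.
  "data:#{mime};base64,#{Base64.strict_encode64(bytes)}"
end

sample = "\xFF\xD8\xFF\xE0 not a real image".b
uri = data_uri(sample, "jpg")
puts uri.start_with?("data:image/jpeg;base64,")                 # true
puts Base64.strict_decode64(uri.split(",", 2).last) == sample   # true
```

Raising on an unknown type is a design choice for the sketch; returning the original path, as the helper above does, is equally reasonable.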

Why does a file written out encoded as UTF-8 end up being ISO-8859-1 instead?

I am reading an ISO-8859-1 encoded text file, transcoding it to UTF-8, and writing out a different file as UTF-8. However, when I inspect the output file, it is still encoded as ISO-8859-1! What am I doing wrong?
Here is my ruby class:
module EF
  class Transcoder
    # app_path ......... Path to the java console application (InferEncoding.jar) that infers the character encoding.
    # target_encoding .. Transcodes the text loaded from the file into this encoding.
    attr_accessor :app_path, :target_encoding
    def initialize(consoleAppPath)
      @app_path = consoleAppPath
      @target_encoding = "UTF-8"
    end
    def detect_encoding(filename)
      encoding = `java -jar #{@app_path} \"#{filename}\"`
      encoding = encoding.strip
    end
    def transcode(filename)
      original_encoding = detect_encoding(filename)
      content = File.open(filename, "r:#{original_encoding}", &:read)
      content = content.force_encoding(original_encoding)
      content.encode!(@target_encoding, :invalid => :replace)
    end
    def transcode_file(input_filename, output_filename)
      content = transcode(input_filename)
      File.open(output_filename, "w:#{@target_encoding}") do |f|
        f.write content
      end
    end
  end
end
By way of explanation, @app_path is the path to a Java jar file. This console application will read a text file and tell me what its current encoding is (printing it to stdout). It uses the ubiquitous ICU library. (I tried using the Ruby gem charlock-holmes, but I cannot get it to compile on Windows for MINGW. The Java bindings to ICU are good, so I wrote a Java application instead.)
To call the above class, I do this in irb:
require_relative 'transcoder'
tc = EF::Transcoder.new("C:/Users/paul.chernoch/Documents/java/InferEncoding.jar")
tc.detect_encoding "C:/temp/infer-encoding-test/ISO-8859-1.txt"
tc.transcode_file("C:/temp/infer-encoding-test/ISO-8859-1.txt", "C:/temp/infer-encoding-test/output-utf8.txt")
tc.detect_encoding "C:/temp/infer-encoding-test/output-utf8.txt"
The file ISO-8859-1.txt is encoded like it sounds. I used Notepad++ to write the file using that encoding.
I used my Java application to test the file. It concurs that it is in ISO-8859-1 format.
I also created a file in Notepad++ and saved it as UTF-8. I then verified using my java app that it was in UTF-8.
After I perform the above in irb, I used my java app to test the output file and it says the format is still ISO-8859-1.
What am I doing wrong? If you hard-code the method detect_encoding to return "ISO-8859-1", you do not need my java application to replicate the part that reads the file.
Any solution must NOT use charlock-holmes.
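One way to check the result independently of any detector is to look at the bytes. A non-ASCII character such as é is the single byte 0xE9 in ISO-8859-1 and the two bytes 0xC3 0xA9 in UTF-8, whereas pure ASCII text is identical in both, so a detector has nothing to go on for it. A minimal sketch, with the source encoding hard-coded and a helper name (transcode_bytes) invented here:

```ruby
require "tempfile"

# Hypothetical helper: decode raw bytes as `from`, then re-encode as `to`.
def transcode_bytes(raw, from: "ISO-8859-1", to: "UTF-8")
  raw.dup.force_encoding(from).encode(to, invalid: :replace, undef: :replace)
end

latin1 = "caf\xE9".b                 # "café" as ISO-8859-1 bytes
utf8   = transcode_bytes(latin1)

Tempfile.create("out") do |f|
  f.binmode                          # write the bytes exactly as they are
  f.write(utf8)
  f.flush
  written = File.binread(f.path)
  puts written.bytes.map { |b| format("%02X", b) }.join(" ")   # 63 61 66 C3 A9
  puts written.force_encoding("UTF-8").valid_encoding?         # true
end
```

If the output file shows C3 A9 where the input had E9, the transcoding worked and the question becomes what the detector is basing its answer on.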

Problems with Ruby encoding in Windows

I wrote a simple script that reads email from MS Outlook using 'win32ole' and then saves the subjects to a CSV file. Everything goes well except the encoding. When I open my CSV file, words such as "André" are printed as "Andr\x82". I want the output to match the input.
# encoding: 'CP850'
require 'win32ole'
require 'csv'
Encoding.default_external = 'CP850'
ol = WIN32OLE.new('Outlook.Application')
inbox = ol.GetNamespace("MAPI").GetDefaultFolder(6)
email_subjects = []
inbox.Items.each do |m|
  email_subjects << m.Subject
end
CSV.open('MyFile.csv',"w") do |csv|
  csv << email_subjects
end
OS: Windows 7 64-bit
Encoding.default_external -> CP850
Language -> PT
ruby -v -> 1.9.2p290 (2011-07-09) [i386-mingw32]
It seems like a simple problem related to the external Windows encoding. I have tried many of the solutions posted here, but I really can't solve it.
1) Your file name is missing a closing quote.
2) The default open mode for CSV.open() is 'rb', so you can't possibly write to a file with the code you posted.
3) You didn't post the encoding of the text you are trying to write to the file.
4) You didn't post the encoding that you want the data to be written in.
5)
When I open my CSV file the words such as "é" are printed as "\x82"
Tell your viewing device not to do that.
The magic comment only sets the encoding that the current (.rb) file should be read as. It does not set default_external. Try running set RUBYOPT=-E utf-8 before starting Ruby, opening your file with CSV.open('MyFile.csv', 'w', encoding: 'UTF-8'), or setting Encoding.default_external at the top of your file (discouraged).
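The byte 0x82 is é in CP850, so "Andr\x82" suggests CP850 bytes being shown by something that does not decode them. A hedged sketch of one way to make the output encoding explicit: label the incoming bytes with the encoding they are really in, transcode to UTF-8, and open the CSV for writing with an explicit encoding. Whether Outlook's strings actually arrive as CP850 on your machine is an assumption here, and to_utf8 is a made-up helper.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: interpret bytes as `source_encoding`, return UTF-8.
def to_utf8(bytes, source_encoding = "CP850")
  bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

subjects = ["Andr\x82".b, "Plain ASCII".b].map { |s| to_utf8(s) }

Dir.mktmpdir do |dir|
  path = File.join(dir, "MyFile.csv")
  # "w:UTF-8" opens for writing with UTF-8 as the external encoding.
  CSV.open(path, "w:UTF-8") { |csv| csv << subjects }
  puts File.read(path, encoding: "UTF-8")   # André,Plain ASCII
end
```

Whatever opens the CSV afterwards still has to be told it is UTF-8; the file itself carries no encoding label.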

ruby 1.9 wrong file encoding on windows

I have a ruby file with these contents:
# encoding: iso-8859-1
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
puts File.read('foo.txt').encoding
When I run it from windows command prompt ruby 1.9.3 I get: IBM437
When I run it from cygwin ruby 1.9.3 I get: UTF-8
What I expect to get is: iso-8859-1
Can someone explain what's happening here?
UPDATE
Here's a better description of what I'm looking for:
I understand now, thanks to Darshan, that by default Ruby will load files in Encoding.default_external, but shouldn't the # encoding: iso-8859-1 line override that?
Should Ruby be able to auto-detect a file's encoding? Is there any filesystem where the encoding is an attribute?
What is my best option to 'remember' the encoding I saved the file in?
You're not specifying the encoding when you read the file. You're being very careful to specify it everywhere except there, but then you're reading it with the default encoding.
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'.force_encoding('iso-8859-1')}
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding }
# => ISO-8859-1
Also note that you probably mean 'fòo'.encode('iso-8859-1') rather than 'fòo'.force_encoding('iso-8859-1'). The latter leaves the bytes unchanged, while the former transcodes the string.
Update: I'll elaborate a bit since I wasn't as clear or thorough as I could have been.
If you don't specify an encoding with File.read(), the file will be read with Encoding.default_external. Since you're not setting that yourself, Ruby is using a value depending on the environment it's run in. In your Windows environment, it's IBM437; in your Cygwin environment, it's UTF-8. So my point above was that of course that's what the encoding is; it has to be, and it has nothing to do with what bytes are contained in the file. Ruby doesn't auto-detect encodings for you.
force_encoding() doesn't change the bytes in a string, it only changes the Encoding attached to those bytes. If you tell Ruby "pretend this string is ISO-8859-1", then it won't transcode them when you tell it "please write this string as ISO-8859-1". encode() transcodes for you, as does writing to the file if you don't trick it into not doing so.
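The difference is easy to see by comparing bytes. A small sketch (the string is written with a \u escape so the result does not depend on how the snippet itself is saved; hex is just a helper for printing):

```ruby
def hex(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

utf8 = "f\u00F2o"                                    # "fòo", encoded as UTF-8

relabelled = utf8.dup.force_encoding("ISO-8859-1")   # same bytes, new label
transcoded = utf8.encode("ISO-8859-1")               # new bytes, same characters

puts hex(utf8)          # 66 C3 B2 6F
puts hex(relabelled)    # 66 C3 B2 6F
puts hex(transcoded)    # 66 F2 6F
puts relabelled.length  # 4, because each byte now counts as one character
puts transcoded.length  # 3
```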
Putting those together, if you have a source file in ISO-8859-1:
# encoding: iso-8859-1
# Write in ISO-8859-1 regardless of default_external
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
# Read in ISO-8859-1 regardless of default_external,
# transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1
puts File.read('foo.txt').encoding # -> Whatever is specified by default_external
If you have a source file in UTF-8:
# encoding: utf-8
# Write in ISO-8859-1 regardless of default_external, transcoding from UTF-8
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
# Read in ISO-8859-1 regardless of default_external,
# transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1
puts File.read('foo.txt').encoding # -> Whatever is specified by default_external
Update 2, to answer your new questions:
No, the # encoding: iso-8859-1 line does not change Encoding.default_external, it only tells Ruby that the source file itself is encoded in ISO-8859-1. Simply add
Encoding.default_external = "iso-8859-1"
if you expect all files that you read to be stored in that encoding.
No, I don't personally think Ruby should auto-detect encodings, but reasonable people can disagree on that one, and a discussion of "should it be so" seems off-topic here.
Personally, I use UTF-8 for everything, and in the rare circumstances that I can't control encoding, I manually set the encoding when I read the file, as demonstrated above. My source files are always in UTF-8. If you're dealing with files that you can't control and don't know the encoding of, the charguess gem or similar would be useful.
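As a concrete illustration of setting the encoding at read time: the encoding reported for a file's contents comes from how the file was opened, not from the bytes on disk. A minimal sketch using a temporary file (read_encodings is a made-up name, and the last entry assumes default_internal is not set):

```ruby
require "tmpdir"

# Returns the encoding names reported by two explicit reads and one default read.
def read_encodings(path)
  [
    File.read(path, encoding: "ISO-8859-1").encoding.name,
    File.read(path, encoding: "ISO-8859-1:UTF-8").encoding.name,  # transcode on read
    File.read(path).encoding.name                                 # default_external
  ]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.txt")
  File.binwrite(path, "f\xF2o".b)   # "fòo" as ISO-8859-1 bytes
  p read_encodings(path)
  p File.read(path, encoding: "ISO-8859-1:UTF-8") == "f\u00F2o"   # true
end
```

The "external:internal" form is handy when the rest of the program works in UTF-8: the file is decoded as ISO-8859-1 and handed over already transcoded.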

How to copy files with Unicode characters in file names in Ruby?

I can not copy files that have Unicode characters in their names from Ruby 1.9.2p290, on Windows 7.
For example, I have two files in a dir:
file
ハリー・ポッターと秘密の部屋
(The second name contains Japanese characters if you can not see it)
Here is the code:
> entries = Dir.entries(path) - %w{ . .. }
> entries[0]
=> "file"
> entries[1]
=> "???????????????" # <--- what?
> File.file? entries[0]
=> true
> File.file? entries[1]
=> false # <--- !!! Ruby can not see it and will not copy
> entries[1].encoding.name
=> "Windows-1251"
> Encoding.find('filesystem').name
=> "Windows-1251"
As you see my Ruby file system encoding is "windows-1251" which is 8 bit and can not handle Japanese. Setting default_external and default_internal encodings to 'utf-8' does not help.
How can I copy those files from Ruby?
Update
I found a solution. It works if I use Dir.glob or Dir[] instead of Dir.entries. File names are now returned in utf-8 encoding and can be copied.
Update #2
My Dir.glob solution appears to be quite limited. It only works with "*" parameter:
Dir.glob("*") # <--- Shows Unicode names correctly
Dir.glob("c:/test/*") # <--- Does not work for Unicode names
Not so much a real solution, but as a workaround, given:
Dir.glob("*") # <--- Shows Unicode names correctly
Dir.glob("c:/test/*") # <--- Does not work for Unicode names
is there any reason you can't do this:
Dir.chdir("c:/test/")
Dir.glob("*")
?
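If you go this route, the block form of Dir.chdir restores the previous working directory afterwards, which avoids surprising the rest of the script. On Ruby 2.5 and later, Dir.glob also accepts a base: keyword that lists relative to a directory without changing into it. Whether either one sidesteps the Windows encoding issue on a given Ruby version is something to verify on the affected machine; the sketch only shows the two call shapes, and list_names is a name made up here.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list entry names in `dir`, relative to it, two ways.
def list_names(dir)
  via_chdir = Dir.chdir(dir) { Dir.glob("*") }   # cwd is restored after the block
  via_base  = Dir.glob("*", base: dir)           # Ruby 2.5+
  [via_chdir.sort, via_base.sort]
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "file"))
  FileUtils.touch(File.join(dir, "\u30CF\u30EA\u30FC"))   # "ハリー"
  before = Dir.pwd
  a, b = list_names(dir)
  puts a == b              # true
  puts Dir.pwd == before   # true
end
```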
It's been a while, but I was looking into the same problem and it was anything but obvious how to do it.
It turns out that you can specify an encoding when you call Dir.entries in Ruby >= 2.1.
Dir.entries(path, encoding: Encoding::UTF_8)
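A short sketch of how that looks end to end, copying every regular file from one directory to another. The directories are temporary ones created for the example, copy_all is a hypothetical helper, and this has only been reasoned through for a UTF-8 filesystem, not the Windows-1251 setup from the question.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: copy each regular file in src to dst, returning the names.
def copy_all(src, dst)
  names = Dir.entries(src, encoding: Encoding::UTF_8) - %w[. ..]
  names.select { |n| File.file?(File.join(src, n)) }
       .each   { |n| FileUtils.cp(File.join(src, n), File.join(dst, n)) }
end

Dir.mktmpdir do |src|
  Dir.mktmpdir do |dst|
    name = "\u30CF\u30EA\u30FC.txt"               # "ハリー.txt"
    File.write(File.join(src, name), "hello")
    copied = copy_all(src, dst)
    puts copied.include?(name)                    # true
    puts File.read(File.join(dst, name))          # hello
  end
end
```

Note that the names are joined with the source directory before calling File.file?; a bare entry name is resolved against the current working directory, which can also make File.file? return false.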

Resources