Word Document.SaveAs ignores encoding, when calling through OLE, from Ruby or VBS - ruby

I have a script, VBS or Ruby, that saves a Word document as 'Filtered HTML', but the encoding parameter is ignored. The HTML file is always encoded in Windows-1252. I'm using Word 2007 SP3 on Windows 7 SP1.
Ruby Example:
require 'win32ole'
word = WIN32OLE.new('Word.Application')
word.visible = false
word_document = word.documents.open('C:\whatever.doc')
word_document.saveas({'FileName' => 'C:\whatever.html', 'FileFormat' => 10, 'Encoding' => 65001})
word_document.close()
word.quit
VBS Example:
Option Explicit
Dim MyWord
Dim MyDoc
Set MyWord = CreateObject("Word.Application")
MyWord.Visible = False
Set MyDoc = MyWord.Documents.Open("C:\whatever.doc")
MyDoc.SaveAs "C:\whatever2.html", 10, , , , , , , , , , 65001
MyDoc.Close
MyWord.Quit
Set MyDoc = Nothing
Set MyWord = Nothing
Documentation:
Document.SaveAs: http://msdn.microsoft.com/en-us/library/bb221597.aspx
msoEncoding values: http://msdn.microsoft.com/en-us/library/office/aa432511(v=office.12).aspx
Any suggestions on how to make Word save the HTML file in UTF-8?

Hi Bo Frederiksen and kardeiz,
I also ran into the "Word Document.SaveAs ignores encoding" problem today, in Word 2003 (11.8411.8202) SP3.
Luckily, I managed to make msoEncodingUTF8 (that is, 65001) work in VBA code. However, I had to change the Word document's settings first. The steps are:
1) From Word's 'Tools' menu, choose 'Options'.
2) Then click 'General'.
3) Press the 'Web Options' button.
4) In the 'Web Options' dialog that pops up, click 'Encoding'.
5) There is a combo box where you can change the encoding, for example from 'GB2312' to 'Unicode (UTF-8)'.
6) Save the changes and try to rerun the VBA code.
I hope my answer can help you. Below is my code.
Public Sub convert2html()
With ActiveDocument.WebOptions
.Encoding = msoEncodingUTF8
End With
ActiveDocument.SaveAs FileName:=ActiveDocument.Path & "\" & "file_name.html", FileFormat:=wdFormatFilteredHTML, Encoding:=msoEncodingUTF8
End Sub
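The same idea can be tried from the Ruby script in the question. This is only a sketch, assuming that setting WebOptions.Encoding through OLE behaves the way it does in the VBA above (not verified here). The helper name save_filtered_html_utf8 is hypothetical, and it accepts any object that responds like a Word Document.

```ruby
# Hypothetical helper: set the document's web encoding first, then save.
# `doc` is expected to be a Word Document object obtained through WIN32OLE.
def save_filtered_html_utf8(doc, html_path)
  filtered_html = 10    # wdFormatFilteredHTML, as used in the question
  utf8          = 65001 # msoEncodingUTF8, as used in the question

  # Mirrors ActiveDocument.WebOptions.Encoding = msoEncodingUTF8
  doc.weboptions.encoding = utf8
  doc.saveas('FileName' => html_path,
             'FileFormat' => filtered_html,
             'Encoding' => utf8)
  html_path
end
```

With the question's script, it would be called as save_filtered_html_utf8(word_document, 'C:\whatever.html') in place of the direct saveas call.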

Word can't do this as far as I know.
However, you could add the following lines to the end of your Ruby script
text_as_utf8 = File.read('C:\whatever.html', encoding: 'windows-1252').encode('UTF-8')
File.open('C:\whatever.html','wb') {|f| f.print text_as_utf8}
If you have an older version of Ruby, you may need to use Iconv. If you have special characters in 'C:\whatever.html', you'll want to look into your invalid/undefined replacement options.
You'll also probably want to update the charset in the HTML meta tag:
text_as_utf8.gsub!('charset=windows-1252', 'charset=UTF-8')
before you write to the file.
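Put together, the conversion might look like the sketch below. It assumes the file really is Windows-1252, as described in the question; word_html_to_utf8 is a hypothetical helper name, and the replacement options are one possible choice for bytes that cannot be converted.

```ruby
# Convert the bytes Word wrote (Windows-1252) to a UTF-8 string and
# update the charset declared in the HTML meta tag to match.
def word_html_to_utf8(raw_bytes)
  text = raw_bytes.dup.force_encoding(Encoding::Windows_1252)
  # Bytes that cannot be converted are replaced instead of raising an error.
  utf8 = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  utf8.sub(/charset=windows-1252/i, 'charset=UTF-8')
end
```

A possible call is File.binwrite(path, word_html_to_utf8(File.binread(path))), which reads and writes raw bytes so Ruby's default encodings do not interfere.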

My solution was to open the HTML file using the same character set, as Word used to save it.
I also added a whitelist filter (Sanitize) to clean up the HTML. Further cleaning is done using Nokogiri, which Sanitize also relies on.
require 'nokogiri'
require 'sanitize'
# ... add some code converting a Word file to HTML.
# Post export cleanup.
# Read the file in the encoding Word used (windows-1252), transcoding to UTF-8.
html = '<!DOCTYPE html>' + File.read(html_file_name, encoding: 'windows-1252:utf-8')
html_document = Nokogiri::HTML::Document.parse(html)
Sanitize.new(Sanitize::Config::RESTRICTED).clean_node!(html_document)
html_document.css('html').first['lang'] = 'en-US'
html_document.css('meta[name="Generator"]').first.remove()
# ... add more cleaning up of Word's HTML noise.
sanitized_html = html_document.to_html({:encoding => 'utf-8', :indent => 0})
# writing output to (new) file
sanitized_html_file_name = word_file_name.sub(/(.*)\..*$/, '\1.html')
File.open(sanitized_html_file_name, 'w:UTF-8') do |f|
f.write sanitized_html
end
HTML Sanitizer: https://github.com/rgrove/sanitize/
HTML parser and modifier: http://nokogiri.org/
In Word 2010 there is a new method, SaveAs2: http://msdn.microsoft.com/en-us/library/ff836084(v=office.14).aspx
I haven't tested SaveAs2, since I don't have Word 2010.

Related

How to temporarily block all macros from running and edit xlsm file with VBScript?

I have xlsm file which I need to edit. However, macros there block my script from editing. My code is following:
xlsm_file_name = "webADI_template_Bankbuchungen_GL.xlsm"
'opening xlsm file and setting readonly to false
set xlobj = createobject("Excel.Application")
set excel_file = xlobj.workbooks.open("C:\Users\oleynikov nikolay\Desktop\VBS Automation Scripts\processed_data\Excel Datei\" & xlsm_file_name, , False) ' ReadOnly is the third positional argument; VBScript has no named arguments
'making changes invisible for the user
excel_file.application.enableevents = false
xlobj.Visible = false
'defining the sheet where we will be inserting our data into
set excel_sheet = excel_file.worksheets(1)
excel_sheet.cells(13,4).value = "EUR"
excel_file.application.enableevents = TRUE
xlobj.DisplayAlerts = FALSE
excel_file.save
At the end of the day, no values are added. This happens because double clicking on the cell runs the macro. I need to disable this macro, insert necessary values and then enable the macros again.
Is there a possibility to do it?
Thank you.
Try Application.AutomationSecurity (it seems it should work). From the documentation:
Returns or sets an MsoAutomationSecurity constant that represents the security mode that Microsoft Excel uses when programmatically opening files. Read/write.
MsoAutomationSecurity can be one of these MsoAutomationSecurity constants:
msoAutomationSecurityByUI. Uses the security setting specified in the Security dialog box.
msoAutomationSecurityForceDisable. Disables all macros in all files opened programmatically without showing any security alerts.
VB
Sub Security()
Dim secAutomation As MsoAutomationSecurity
secAutomation = Application.AutomationSecurity
Application.AutomationSecurity = msoAutomationSecurityForceDisable
Application.FileDialog(msoFileDialogOpen).Show
Application.AutomationSecurity = secAutomation
End Sub
https://learn.microsoft.com/en-us/office/vba/api/excel.application.automationsecurity
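Since this page is mostly about Ruby and OLE, here is how the same save-and-restore pattern might look through win32ole. It is a sketch, not tested against Excel: with_macros_disabled is a hypothetical helper, and 3 is the numeric value of msoAutomationSecurityForceDisable. Because the setting applies to files opened programmatically, the workbook has to be opened inside the block.

```ruby
# Hypothetical helper: run a block with macros force-disabled, then restore
# whatever security mode was active before. `app` is an Excel Application
# object obtained through WIN32OLE.
def with_macros_disabled(app)
  force_disable = 3 # msoAutomationSecurityForceDisable
  previous = app.automationsecurity
  app.automationsecurity = force_disable
  yield app
ensure
  # Restore the earlier mode even if the block raised an error.
  app.automationsecurity = previous if previous
end

# Intended use (requires Windows and Excel, so it is left commented out):
#   require 'win32ole'
#   excel = WIN32OLE.new('Excel.Application')
#   with_macros_disabled(excel) do |xl|
#     book = xl.workbooks.open(path_to_xlsm) # path_to_xlsm is a placeholder
#     book.worksheets(1).cells(13, 4).value = 'EUR'
#     book.save
#   end
```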

Word document mysteriously write protected?

I am trying to do a find and replace operation on several Word documents in a folder. I wrote the following VBScript to do that:
Option Explicit
Dim Word, Document, FolderPath, FileSystem, FileList, File, Doc, InfoString
Const ReadOnly = 1
Const wdFindContinue = 1
Const wdReplaceAll = 2
Const wdOriginalDocumentFormat = 1
Set FileSystem = CreateObject("Scripting.FileSystemObject")
FolderPath = FileSystem.GetAbsolutePathName(".")
Set FileList = FileSystem.GetFolder(FolderPath).files
Set Word = CreateObject("Word.Application")
Word.Visible = False
Word.DisplayAlerts = False
For Each File in FileList
If LCase(Right(File.Name,3)) = "doc" Or LCase(Right(File.Name,4)) = "docx" Then
If File.Attributes And ReadOnly Then
File.Attributes = File.Attributes - ReadOnly
End If
Set Doc = Word.Documents.Open(File.Path,,True)
' find and replace stuff
End If
Next
Word.Documents.Save True, wdOriginalDocumentFormat
Word.Quit
MsgBox("Done")
Problem is, when it reaches the line Word.Documents.Save, a Save As dialog box always pops up. If I click Cancel, I get an error from Windows Script Host saying the file is write protected, even though it is not shown as write protected if I open the Properties dialog in File Explorer. If I click save, I am prompted to save all the other files too. What is the problem here?
I have a suspicion that it is caused by the Word documents being very old, like from the 1990s.
Look at this line of your script:
Set Doc = Word.Documents.Open(File.Path,,True)
and compare it with the signature shown in the Object Browser:
Function Open(FileName, [ConfirmConversions], [ReadOnly], [AddToRecentFiles], [PasswordDocument], [PasswordTemplate], [Revert], [WritePasswordDocument], [WritePasswordTemplate], [Format], [Encoding], [Visible], [OpenAndRepair], [DocumentDirection], [NoEncodingDialog]) As Document
Member of Word.Documents
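So the True in the third position tells Word to open the document read-only. This is Word's own read-only mode and has nothing to do with the file's attributes. Pass False (or omit the argument) to open the document for editing.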
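For comparison, a Ruby version of the open, edit, save cycle could name the argument explicitly, which avoids counting commas. This is a sketch under assumptions: edit_document is a hypothetical helper, named arguments are passed as a Hash the way the Ruby saveas call at the top of this page does, and 0 is the value of wdDoNotSaveChanges.

```ruby
# Hypothetical helper: open a document for editing (ReadOnly => false),
# let the caller modify it, then save and close that one document.
# `word` is a Word Application object obtained through WIN32OLE.
def edit_document(word, path)
  doc = word.documents.open('FileName' => path, 'ReadOnly' => false)
  yield doc
  doc.save
ensure
  # 0 = wdDoNotSaveChanges: on success the changes were saved just above;
  # on failure the half-finished edits are discarded.
  doc.close(0) if doc
end
```

Saving each document individually also sidesteps the Documents.Save call that triggered the Save As dialog.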

Launching browser windows from Ruby command prompt

I've got this code below. What I'd like it to do is launch each of the search queries I've specified into browser windows, instead of listing the search results as it's currently written to do. But I'm a beginner and having a difficult time finding documentation on this. Is it possible?
The issue is the actual list of search_criteria I will be using is actually 40 terms long and I need to do it for dozens and dozens of cities - which is why I was looking to automate the search process.
If it's not possible to launch each query as a browser window (or better tabs in a browser window) is there a way to specify each URL that results in some systematic way so as to be called by Ruby from command prompt to launch as a browser window?
require "google-search"
search_criteria = ["makers", "makerspaces", "fablabs", "smartlabs"]
#City name
search_1 = search_criteria.map do |noun|
"#{noun} new york city"
end
#City acronym 1
search_2 = search_criteria.map do |noun|
"#{noun} new york"
end
#City acronym 2
search_3 = search_criteria.map do |noun|
"#{noun} nyc"
end
#Replace "search_1" for other acronyms
search_1.each do |query|
puts "Just one moment please! I am searching for #{query}"
Google::Search::Web.new do |search|
search.query = query
search.size = :large
end.each { |item| puts item.title }
end
search_criteria = ["makers", "makerspaces", "fablabs", "smartlabs"]
names = ["new+york+city", "new+york", "nyc"]
query_strings = names.map do |name|
"#{name}+#{search_criteria.join('+')}"
end
urls = query_strings.map do |q|
"google.com/search?q=" + q
end
cmd_line = urls.join(' ')
Then you pass cmd_line to the Google Chrome executable via the system() call. On Mac OS X it would look like this:
system("/Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome --url #{cmd_line}")
Chrome will then open each URL in the string passed to it in its own tab in a new window.
If you are using Windows you will have to find where the chrome.exe executable is buried, and on Linux you would just find the chrome binary and call it. Everything before the last part, ...Chrome --url #{cmd_line}"), is just the path to the executable, which is buried inside the "Google Chrome.app" container on OS X.
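If you would rather have one tab per search term and city name, as the question asks, a variation could build one URL per pair and let URI handle the escaping. This is a sketch rather than the code above: search_urls is a hypothetical helper, and chrome_path stands for wherever the browser executable lives on your machine.

```ruby
require 'uri'

# Build one Google search URL per (search term, place name) pair.
def search_urls(criteria, places)
  criteria.product(places).map do |term, place|
    'https://www.google.com/search?' + URI.encode_www_form(q: "#{term} #{place}")
  end
end

urls = search_urls(%w[makers makerspaces], ['new york city', 'nyc'])
puts urls

# To open them, pass the URLs as separate arguments so no shell quoting is needed.
# chrome_path is an assumption; point it at the browser executable on your system:
#   system(chrome_path, *urls)
```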

How can I measure the length of a long string in Ruby? SciTE and command prompt aren't working.

I've written a program that measures my typing speed. As part of this, I need it to count how many characters I've typed. I did that with
text = gets.chomp
puts text.length.to_s
Unfortunately, I can't get this working for a long string.
In the SciTE editor, .length doesn't work properly, so instead of giving me the length of the string, it gives me the character count of everything I've typed, including corrected mistakes - if I typo "Hrello" and correct it to "Hello", it'll still return 6 instead of 5.
I googled this, and the suggested fix was to run the program from the command prompt instead. In the command prompt, .length works fine, but it turned out that I can't type in more than 264 characters.
So I tried to put a GUI on the program with Shoes:
Shoes.app :width => 300, :height => 300 do
button "Start." do
text = ask "Type here."
para text.length.to_s
end
end
and discovered that Shoes' input box has an even shorter character limit.
I'm running Windows 7, Ruby 1.9.2, SciTe version 2.29 and Shoes Policeman Revision 1514.
How can I run this program so it'll correctly measure the length of a really long string? I'd be happy with any solution that fixes the command prompt or Shoes character limit, the SciTE bug, or just a suggestion for a different way to execute ruby programs where this will work.
I'd be happy with [...] a suggestion for a different way to execute ruby programs where this will work.
What about a simple web app? Here is a simple Sinatra app that accomplishes exactly what you have asked with a very large character limit.
require 'sinatra'
get '/' do
%{<html>
<body>
<form method="post">
<textarea name="typed"></textarea>
<input type="submit">
</form>
</body>
</html>
}
end
post '/' do
"You typed #{params['typed'].length} characters."
end
To run the app you can use something as simple as ruby sinatra_example.rb to use a built-in web server. Or, you can deploy this app using any of several web servers.
If you need timers, they should be easy to implement in JavaScript and include in the form submission.
OK, your question is not accurately titled, but let's see:
There are plenty of ways to use the command prompt, and you should consider simply running your Ruby script from it.
At the Windows command line, try typing ruby C:/path_to_folder_program/program.rb
If that won't execute, find the ruby executable in your Ruby installation folder and run it from the command prompt in that folder, as above.
But let me ask: why Ruby? Other, more accessible languages such as JavaScript would behave better here and would make your program easier to distribute.
- EDIT -
Seems shoes can handle more chars, use edit_box instead of ask:
In Shoes:
Shoes.app do
@txt = edit_box
button("How many"){ alert(@txt.text.size) }
end
Anyway, before trying Shoes I did the exercise with what I knew. Here it is:
In javascript:
<script>
function start_stop(){
var txt = document.getElementById('txt');
var btn = document.getElementById('btn');
if( txt.disabled ){
txt.value = '';
txt.disabled = false;
btn.value = 'Stop';
txt.focus();
startTime = Date.now(); // milliseconds since the epoch; getSeconds() would wrap around every minute
} else {
txt.disabled = true;
btn.value = 'Start again';
var elapsed = Math.round((Date.now() - startTime) / 1000);
alert(txt.value.length + " characters in " + elapsed + " seconds.");
}
}
</script>
<input type='button' id='btn' onclick='start_stop()' value='Start'>
<textarea id='txt' rows='8' cols='80' disabled></textarea>
In Ruby using Qt: (replicating the same idea as in the javascript one)
require 'Qt'
class MyWidget < Qt::Widget
slots :start_stop
def initialize
super
setFixedSize(400, 120)
@btn = Qt::PushButton.new("Start")
@txt = Qt::TextEdit.new ; @txt.readOnly = true
vbox = Qt::VBoxLayout.new
vbox.addWidget @btn
vbox.addWidget @txt
setLayout vbox
connect(@btn, SIGNAL("clicked()"), self, SLOT(:start_stop))
end
def start_stop
if @txt.readOnly
@txt.plainText = ''
@txt.readOnly = false
@btn.text = "Stop"
@txt.setFocus
@startTime = Time.now
else
@txt.readOnly = true
@btn.text = "Start again (#{@txt.plainText.size} chars in #{(Time.now - @startTime).to_i} seconds)"
end
end
end
app = Qt::Application.new(ARGV)
widget = MyWidget.new
widget.show
app.exec

Is it practical to call vbscript functions from Ruby using win32ole?

Is it practical to call a VBScript function, such as VBScript's Chr(charcode), from Ruby using win32ole?
Background: While working out how to add some nicely formatted headers to an excel worksheet, I followed my standard operating procedure: record an excel macro and copy and paste the code.
VBScript:
ActiveWindow.View = xlPageLayoutView
With ActiveSheet.PageSetup
' Irrelevant options snipped
.CenterHeader = "&F" & Chr(10) & "&A"
' More irrelevant options snipped
End With
The following Ruby code
# workbook is an existing workbook object
worksheet = workbook.Worksheets.Add
worksheet.PageSetup.CenterHeader = "&F \n &A"
works, but I had to look up the Chr(charcode) documentation to check it was the exact same thing. I tried doing
worksheet.PageSetup.CenterHeader = "&F" + workbook.Chr(10) + "&A"
but got
WIN32OLERuntimeError: unknown property or method: `Chr'
HRESULT error code:0x80020006
Unknown name.
from (irb):6:in `method_missing'
from (irb):6
from c:/Ruby19/bin/irb:12:in `<main>'
Is there any practical way to do the latter approach?
I wouldn't call a vbscript script for something that is easier done in Ruby but it is possible.
How you requested it:
require 'win32ole'
sc = WIN32OLE.new("ScriptControl")
sc.language="VBScript"
center_header = sc.eval('"&F" & Chr(10) & "&A"') #"&F\n&A"
and how you could do it in Ruby itself
"&F" + 10.chr + "&A" #"&F\n&A"
or
"&F\n&A" #"&F\n&A"
EDIT: On 64-bit Windows this no longer works out of the box. You need to install a 64-bit DLL, which you can find at https://tablacus.github.io/scriptcontrol.html (the site is in Japanese, so translate the page in your browser).
how 'bout
worksheet.PageSetup.CenterHeader = "&F" + 10.chr + "&A"
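To check that 10.chr and "\n" really are the same thing without looking up the Chr documentation, a quick comparison in irb is enough. The header_lines helper below is hypothetical and only wraps the join.

```ruby
# Hypothetical helper: join Excel header codes with a line feed (character 10),
# which is what VBScript's Chr(10) produces.
def header_lines(*codes)
  codes.join(10.chr)
end

header = header_lines('&F', '&A')
puts header == "&F\n&A" # compares against the escaped form
puts header.bytes.inspect # shows the raw byte values, including the 10
```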

Resources