I have some VB6 code that stubbornly writes its output as Windows-1252:
Open fileName For Binary Access Write As #fileNo
Put #fileNo, , contents
Close #fileNo
I managed to make it write UTF-16 (LE) by doing this:
contents = ChrW$(&HFEFF&) & contents
Is there any way I could easily make it write to ISO-8859-1? Examples/suggestions would be greatly appreciated here.
If your files are not huge then ADO can come to the rescue for quick and dirty handling of odd encodings.
Example:
Option Explicit
Private Sub Main()
Const contents As String = "Hello World. (4 × 6) ÷ 8 = 3 €€€ ƒƒƒ"
Dim Stm As ADODB.Stream
Set Stm = New ADODB.Stream
With Stm
.Open
.Type = adTypeText
.Charset = "iso-8859-1"
.LineSeparator = adLF
.WriteText contents, adWriteLine
.SaveToFile "ISO-8859-1.txt", adSaveCreateOverWrite
.Close
.Open
.Type = adTypeText
.Charset = "windows-1252"
.LineSeparator = adCRLF
.WriteText contents, adWriteLine
.SaveToFile "Windows-1252.txt", adSaveCreateOverWrite
.Close
End With
MsgBox "Done"
End Sub
Windows-1252 is essentially a superset of ISO-8859-1; just write your data as Windows-1252, and don't use any of the Windows-1252 characters that aren't also ISO-8859-1 characters.
VB6 character output is encoded in the default machine code page for non-Unicode programs.
If you can manage to set your machine code page to 28591, which is the Windows code page for ISO-8859-1, then you can be sure that your output will be ISO-8859-1.
Alternatively, just avoid the characters where Windows-1252 differs from ISO-8859-1. According to Wikipedia, those are the byte values 128 to 159. You could detect them and substitute question marks, or throw an error.
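The "detect and substitute" idea could be sketched roughly like this. It is a minimal sketch, not something from the original answers: ToLatin1Safe is a hypothetical helper name, and it assumes that "not ISO-8859-1" can be approximated as "code point above 255". That holds for the characters Windows-1252 defines in the 128 to 159 byte range, because in a VB string they are stored as Unicode code points such as U+20AC for the euro sign. It is written without type declarations so that it runs as VBScript and should also paste into a VB6 module.

```vbscript
Option Explicit

' Hypothetical helper: returns a copy of text in which every character
' outside the ISO-8859-1 range (code point above 255) is replaced by "?".
Function ToLatin1Safe(text)
    Dim i, ch, code, result
    result = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        ' AscW returns a signed 16-bit value; mask it to get 0..65535.
        code = AscW(ch) And &HFFFF&
        If code > 255 Then
            result = result & "?"
        Else
            result = result & ch
        End If
    Next
    ToLatin1Safe = result
End Function
```

For example, ToLatin1Safe("5 " & ChrW(8364)) returns "5 ?", since ChrW(8364) is the euro sign. To throw an error instead of substituting, the "?" branch could call Err.Raise.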
My system is Windows 10 English-US.
I need to write some non-printable ASCII characters to a text file. For example, for the ASCII value 28, I want to write \u001Cw to the file. I don't have to do anything special when this is coded in Java. Below is my code in VBS:
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
objStream.Open
objStream.Type = 2
objStream.Position = 0
objStream.CharSet = "utf-16"
objStream.WriteText ChrW(28) 'Need this to appear as \u001Cw in the output file
objStream.SaveToFile "C:\temp\test.txt", 2
objStream.Close
You need a read-write stream so that writing to it and saving it to file both work.
Const adModeReadWrite = 3
Const adTypeText = 2
Const adSaveCreateOverWrite = 2
Sub SaveToFile(text, filename)
With CreateObject("ADODB.Stream")
.Mode = adModeReadWrite
.Type = adTypeText
.Charset = "UTF-16"
.Open
.WriteText text
.SaveToFile filename, adSaveCreateOverWrite
.Close
End With
End Sub
text = Chr(28) & "Hello" & Chr(28)
SaveToFile text, "C:\temp\test.txt"
Other notes:
I like to explicitly define with Const all the constants in the code. Makes reading so much easier.
A With block saves quite some typing here.
Setting the stream type to adTypeText is not really necessary, that's the default anyway. But explicit is better than implicit, I guess.
Setting the Position to 0 on a new stream is superfluous.
It's unnecessary to use ChrW() for ASCII-range characters. The stream's Charset decides the byte width when you save the stream to file. In RAM, everything is Unicode anyway (yes, even in VBScript).
There are two UTF-16 encodings supported by ADODB.Stream: little-endian UTF-16LE (which is the default and synonymous with UTF-16) and big-endian UTF-16BE, with the byte order reversed.
You can achieve the same result with the FileSystemObject and its CreateTextFile() method:
Set FSO = CreateObject("Scripting.FileSystemObject")
Sub SaveToFile(text, filename)
' CreateTextFile(filename [, Overwrite [, Unicode]])
With FSO.CreateTextFile(filename, True, True)
.Write text
.Close
End With
End Sub
text = Chr(28) & "Hello" & Chr(28)
SaveToFile text, "C:\temp\test.txt"
This is a little bit simpler, but it only offers a Boolean Unicode parameter, which switches between UTF-16 and ANSI (not ASCII, as the documentation incorrectly claims!). The solution with ADODB.Stream gives you fine-grained encoding choices, for example UTF-8, which is impossible with the FileSystemObject.
For the record, there are two ways to create a UTF-8-encoded text file:
The way Microsoft likes to do it, with a 3-byte Byte Order Mark (BOM) at the start of the file. Most, if not all, Microsoft tools do that when they offer "UTF-8" as an option, and ADODB.Stream is no exception.
The way everyone else does it: without a BOM. This is correct for most uses.
To create a UTF-8 file with BOM, the first code sample above can be used with the Charset set to "UTF-8". To create a UTF-8 file without BOM, we can use two stream objects:
Const adModeReadWrite = 3
Const adTypeBinary = 1
Const adTypeText = 2
Const adSaveCreateOverWrite = 2
Sub SaveToFile(text, filename)
Dim iStr: Set iStr = CreateObject("ADODB.Stream")
Dim oStr: Set oStr = CreateObject("ADODB.Stream")
' one stream for converting the text to UTF-8 bytes
iStr.Mode = adModeReadWrite
iStr.Type = adTypeText
iStr.Charset = "UTF-8"
iStr.Open
iStr.WriteText text
' one stream to write bytes to a file
oStr.Mode = adModeReadWrite
oStr.Type = adTypeBinary
oStr.Open
' switch first stream to binary mode and skip UTF-8 BOM
iStr.Position = 0
iStr.Type = adTypeBinary
iStr.Position = 3
' write remaining bytes to file and clean up
oStr.Write iStr.Read
oStr.SaveToFile filename, adSaveCreateOverWrite
oStr.Close
iStr.Close
End Sub
I'm new to PL/SQL and I have to convert a 2 MB text file that I get from another division of our administration every month.
My problem is to convert the file from one single line in UNIX format with lots of LF terminators into almost 1500 smaller lines in DOS/Windows format, with a CR/LF terminator after each line.
I know that I cannot do it with the UTL_FILE tools because the file to convert is longer than 32767 bytes.
So I decided to use the DBMS_LOB package, but frankly I do not know how to use the procedures and functions in DBMS_LOB for my purpose.
In the past I used Access VBA for this task.
Here is my source code, in case any of you knows Access VBA and can show me how to code this in PL/SQL, or at least point me to a site with similar code that can solve my problem.
'File conversion from Unix/Linux format to DOS/Windows format and, if necessary, the opposite too; in that case the parameter "treatment" should not have the value "unix2dos".
Sub ConvertToWindows(SourceFolder As String, SourceFile As String, TargetFolder As String, TargetFile As String, treatment As Integer, Optional DeleteSource As Boolean)
Dim Contents As String
Dim ReadFile As String
Dim d As Integer
ReadFile = SourceFolder & SourceFile
d = FreeFile
Open ReadFile For Binary As #d
Contents = Space(LOF(d))
Get #d, , Contents
Close #d
Open TargetFolder & TargetFile For Output As #d
If treatment = unix2dos Then ' check conversion type
Contents = Replace(Contents, Chr(10), vbCrLf)
Else
Contents = Contents & Chr(10)
End If
Contents = Left(Contents, Len(Contents) - 2)
Print #d, Contents
Close #d
If DeleteSource = True Then
Kill SourceFolder & SourceFile
End If
End Sub
Thanks for helping me.
I have a script that I put together after going over many different ways that I could do an encoding conversion using ADODB in VBScript.
Option Explicit
Sub UTFConvert()
Dim objFSO, objStream, file
file = "FileToConvert.csv"
Set objStream = CreateObject( "ADODB.Stream" )
objStream.Open
objStream.Type = 2
objStream.Position = 0
objStream.Charset = "utf-8"
objStream.LoadFromFile file
objStream.SaveToFile file, 2
objStream.Close
Set objStream = Nothing
End Sub
UTFConvert
The file is supposed to be converted from UCS-2 Little Endian, or whichever readable format it is in (within limitations), to UTF-8. The issue however is that once this file has finished converting to UTF-8 there are many NUL symbols throughout the entire file before and after every letter, and xFF xFE (UCS-2 LE BOM) at the start of the file. These are visible without needing to use any symbol visualization toggles. Any help would be appreciated in understanding where I may be limited with this conversion. Or any alternative approach I can take.
Your Stream object is loading the file as a UTF-8 encoded file, thus misinterpreting the byte sequences. Read the file using a FileSystemObject instance and write it with the ADODB.Stream object:
Sub UTFConvert(filename)
Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile(filename, 1, False, -1).ReadAll
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = 2 'text
stream.Position = 0
stream.Charset = "utf-8"
stream.WriteText txt
stream.SaveToFile filename, 2
stream.Close
End Sub
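The same principle (decode with the source encoding, then encode with the target encoding) can also be sketched with two ADODB.Stream objects instead of the FileSystemObject. This is only a sketch under assumptions: Utf16ToUtf8 and its parameter names are hypothetical, the input is assumed to really be UTF-16 LE, and the whole file is held in memory, so it suits modest file sizes. The output will start with a UTF-8 BOM, as ADODB.Stream normally writes one.

```vbscript
Option Explicit

Const adTypeText = 2
Const adSaveCreateOverWrite = 2

' Hypothetical helper: reads sourceFile as UTF-16 and writes it to
' targetFile as UTF-8. Both names are illustrative, not from the thread.
Sub Utf16ToUtf8(sourceFile, targetFile)
    Dim inStream, outStream, text

    ' First stream: decode the file using the encoding it really has.
    Set inStream = CreateObject("ADODB.Stream")
    inStream.Type = adTypeText
    inStream.Charset = "utf-16"
    inStream.Open
    inStream.LoadFromFile sourceFile
    text = inStream.ReadText   ' reads the whole stream by default
    inStream.Close

    ' Second stream: encode the decoded text as UTF-8 and save it.
    Set outStream = CreateObject("ADODB.Stream")
    outStream.Type = adTypeText
    outStream.Charset = "utf-8"
    outStream.Open
    outStream.WriteText text
    outStream.SaveToFile targetFile, adSaveCreateOverWrite
    outStream.Close
End Sub
```

Because the input is read completely and closed before anything is written, the same path can be passed for both arguments to convert a file in place, e.g. Utf16ToUtf8 "FileToConvert.csv", "FileToConvert.csv".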
I need source code for reading the .txt content from a URL.
A sample of my text file content, which I then want to load in Visual Basic 6.0:
My source code:
Dim data As String
data = Inet1.OpenURL("http://test.com/sample.txt")
Text1.Text = data
Nothing will "download" just one line at a time, because the line breaks can't be located until the data has been downloaded.
If you only want to read/process a line at a time, you can split on the line breaks after downloading it:
Dim Data As String
Dim DataLines() As String
Dim Index As Long
Data = Inet1.OpenURL("http://test.com/sample.txt")
DataLines = Split(Data, vbCrLf)
For Index = LBound(DataLines) To UBound(DataLines)
MsgBox DataLines(Index)
Next
You will need to be careful to make sure you have the correct line break for the data being read.
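If you cannot be sure which line break the server uses, one way around it is to normalize all of them before splitting. The following is a minimal sketch, with SplitLines as a hypothetical helper name. It maps CR/LF pairs and lone CRs to LF and then splits on LF. It is written VBScript-style, without type declarations, so the result comes back as a Variant holding the array.

```vbscript
Option Explicit

' Hypothetical helper: splits text into lines regardless of whether
' the lines end in CR/LF (Windows), LF (Unix) or CR (classic Mac).
Function SplitLines(text)
    Dim normalized
    ' Replace CR/LF first so the pair does not turn into two breaks.
    normalized = Replace(text, vbCrLf, vbLf)
    normalized = Replace(normalized, vbCr, vbLf)
    SplitLines = Split(normalized, vbLf)
End Function
```

For example, SplitLines("a" & vbLf & "b" & vbCrLf & "c") gives a three-element array containing "a", "b" and "c". Note that a trailing line break in the input produces an empty last element.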
When dealing with HTTP you have to consider both line separators and character encoding. If you can make assumptions after testing then you can bypass some checking and just hard-code to fit your needs.
However, the creaky old Internet Transfer Control ("Inet") is usually not the best choice available. More modern alternatives have shipped as part of Windows since at least the advent of IE 5.5, and are installed with IE 5.5 on more ancient versions of Windows. Thus they'll be available and work even on nearly any Win95 system still running today.
'References to MSXML 3.0 or later,
' ADO 2.5 or later.
Private Function GetHttpText(ByVal URL As String) As ADODB.Stream
Dim Req As MSXML2.XMLHTTP
Dim CharSet As String
Dim CharsetPos As Long
Dim LineSeparator As LineSeparatorEnum
Set Req = New MSXML2.XMLHTTP
Set GetHttpText = New ADODB.Stream
With GetHttpText
.Open
.Type = adTypeBinary
With Req
.Open "GET", URL, False
.send
CharSet = LCase$(.getResponseHeader("CONTENT-TYPE"))
End With
.Write Req.responseBody
CharsetPos = InStr(CharSet, "charset")
If CharsetPos Then
CharSet = Split(Mid$(CharSet, CharsetPos), "=")(1)
Else
'UTF-8 is a reasonable "default" these days:
CharSet = "utf-8"
End If
If CharSet = "utf-8" Then
LineSeparator = adLF
Else
'Your mileage may vary here, since there is no line-end
'header defined for HTTP:
LineSeparator = adCRLF
End If
.Position = 0
.Type = adTypeText
.CharSet = CharSet
.LineSeparator = LineSeparator
End With
End Function
Private Sub DumpTextLineByLine()
With GetHttpText("http://textfiles.com/art/simpsons.txt")
'Read text line by line to populate a multiline TextBox
'just as a demonstration:
Do Until .EOS
Text1.SelText = .ReadText(adReadLine)
Text1.SelText = vbNewLine
Loop
.Close
End With
End Sub
I'm having a problem trying to output the content of some variables in vb6 into a text file. The thing is that when a special character from extended ASCII appears as ä, ü, á it is transformed in the output to the matching basic ASCII char like a, u, a.
I've tried exporting it as UTF-8, and then the characters are shown correctly, but I need the output to be ASCII. It also looks strange to me that the filename can contain these characters (ä, ü, á...) without any substitution.
Can this be because "ASCII" charset is just the basic and not the extended? Maybe because of the CodePages configured in Windows? I've tried with a couple of them (German, English) with the same result.
This is the code I'm using:
Set fileStream = New ADODB.Stream
If Not fileStream Is Nothing Then
inputString = textPreAppend + inputString
fileStream.charSet = "ASCII"
fileStream.Open
fileStream.WriteText inputString
fileStream.Flush
fileStream.SaveToFile fileName, adSaveCreateOverWrite
fileStream.Flush
fileStream.Close
End If
Set fileStream = Nothing
Thanks in advance!
Both PRB: Charset Property of ADO Stream Object May Require Microsoft Internet Explorer Upgrade and Charset Property (ADO) suggest that ADO CharSet values are those listed under HKEY_CLASSES_ROOT\MIME\Database\Charset but that clearly is not the entire story.
For example, both values "ascii" and "us-ascii" are listed there as aliases of "iso-8859-1"; however, running with my locale set to U.S. English, they act like a 7-bit ASCII MIME type. They'd almost have to, since there is nothing else provided for requesting 7-bit ASCII encoding anyway.
This produces the result you seem to want:
Option Explicit
Private Sub Main()
Dim fileStream As ADODB.Stream
Dim inputString As String
Const FileName As String = "outputfile.txt"
Set fileStream = New ADODB.Stream
inputString = "The thing is that when a special character" & vbNewLine _
& "from extended ASCII appears as ä, ü, á it" & vbNewLine _
& "is transformed in the output to the matching" & vbNewLine _
& "basic ASCII char like a, u, a." & vbNewLine
With fileStream
.Type = adTypeText
.Charset = "iso-8859-1"
.Open
.WriteText inputString
.SaveToFile FileName, adSaveCreateOverWrite
.Close
End With
End Sub
Output:
The thing is that when a special character
from extended ASCII appears as ä, ü, á it
is transformed in the output to the matching
basic ASCII char like a, u, a.
We have to assume that asking for "ascii" (the values are all lowercased though clearly are not case-sensitive) means 7-bit ASCII, perhaps localized.
Using UTF-8 is a bad idea unless you want UTF-8. While a lot of *nix systems pretend there is no difference, the Stream will write a BOM and of course those extended (non-ASCII) characters are multibyte encoded.
Why can't you just do something like the following? This works ok for me:
Dim FileNo As Integer
Dim strFile As String
strFile = "C:\Test.txt"
FileNo = FreeFile
Dim strVariable As String
strVariable = "some text with extended chars in: ÙÑáêôü"
Open strFile For Append As #FileNo
Print #FileNo, strVariable
Close #FileNo