I am trying to split an RTF file into lines (in my code) and I am not quite getting it right, mostly because I am not really grokking the entirety of the RTF format. It seems that lines can be split by \par or \pard or \par\pard or any number of fun combinations.
I am looking for a piece of code that splits the file into lines in any language really.
You could try the specification (1.9.1) (see External Links on the Wikipedia page - which also has a couple of links to examples or modules in several programming languages).
That would most likely give you an idea of the line insertion "words", so you can split the file into lines using a well-defined set of rules rather than taking a guess at it.
Have you come across O'Reilly's RTF Pocket Guide, by Sean M. Burke?
On page 13, it says
Here are some rules of thumb for putting linebreaks in RTF:
Put a newline before every \pard or \par (commands that are explained in the "Paragraphs" section).
Put a newline before and after the RTF font-table, stylesheet, and other similar constructs (like the color table, described later).
You can put a newline after every Nth space, {, or }. (Alternatively: put a newline after every space, {, or } that's after the 60th column.)
Or were you thinking of extracting the plaintext as lines, regardless of the language the plaintext is written in?
I coded up a quick and dirty routine and it seems to work for pretty much anything I've been able to throw at it. It's in VB6, but easily translatable into anything else.
Private Function ParseRTFIntoLines(ByVal strSource As String) As Collection
    Dim colReturn As Collection
    Dim lngPosStart As Long
    Dim strLine As String
    Dim sSplitters(1 To 4) As String
    Dim nIndex As Long

    ' Return a collection of lines.
    ' The lines can be split by the following:
    '   "\par \pard"
    '   "\par\pard"
    '   "\par "
    '   "\par"
    ' Add these splitters in order so that we do not miss
    ' any possible split combos; for instance, "\par\pard" is added before "\par"
    ' because if we look for "\par" first, we will miss "\par\pard".
    sSplitters(1) = "\par \pard"
    sSplitters(2) = "\par\pard"
    sSplitters(3) = "\par "
    sSplitters(4) = "\par"

    Set colReturn = New Collection

    ' We have to find each variation.
    ' We look for \par and then evaluate which type of separator is there.
    Do
        lngPosStart = InStr(1, strSource, "\par", vbTextCompare)
        If lngPosStart > 0 Then
            strLine = Left$(strSource, lngPosStart - 1)
            For nIndex = 1 To 4
                If StrComp(sSplitters(nIndex), Mid$(strSource, lngPosStart, Len(sSplitters(nIndex))), vbTextCompare) = 0 Then
                    ' Remove the first line from strSource
                    strSource = Mid$(strSource, lngPosStart + Len(sSplitters(nIndex)))
                    ' Add it to the collection
                    colReturn.Add strLine
                    ' Stop at the first (longest) matching splitter
                    Exit For
                End If
            Next
        End If
    Loop While lngPosStart > 0

    ' Check whether there is a last line
    If Len(strSource) > 0 Then colReturn.Add strSource

    Set ParseRTFIntoLines = colReturn
End Function
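For comparison, here is a minimal Python sketch of the same idea using a regular expression. It assumes that a paragraph break is the control word \par (optionally followed by one delimiter space and an immediately following \pard), and it deliberately does not treat a standalone \pard as a break, since \pard only resets paragraph formatting. The function name split_rtf_paragraphs is made up for this example. It is not a full RTF parser: an escaped backslash followed by the letters "par" would be misread, and groups, font tables and other control words are left in the output.

```python
import re

# "\par" must not be followed by another letter, otherwise it is a longer
# control word such as "\pard". One delimiter space after it is consumed,
# and so is a directly following "\pard" (with its own delimiter space).
PAR_BREAK = re.compile(r"\\par(?![A-Za-z]) ?(?:\\pard(?![A-Za-z]) ?)?")


def split_rtf_paragraphs(rtf):
    """Split raw RTF text at paragraph breaks (rough sketch, not a parser)."""
    parts = PAR_BREAK.split(rtf)
    # A trailing "\par" leaves an empty last piece; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


sample = r"one\par two\par\pard three\pard four\par"
print(split_rtf_paragraphs(sample))
```

Unlike a plain search for the text "\par", the lookahead keeps "three\pard four" in one piece instead of cutting it after "\par" and leaving a stray "d".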
I tried almost all the methods (CLEAN, TRIM, SUBSTITUTE) to remove the characters hiding at the beginning and the end of a text. In my case, I downloaded the bill of materials report from Oracle ERP and found that the item codes are affected by hidden characters.
After so many findings, I was able to trace which character is hidden and found out that it's a question mark'?' (via VBA code in another thread) both at the front and the end. You can take this item code: 11301-21
If you paste the above into Excel and check its length with =LEN(), you will understand my problem much better.
I need a good solution for this problem. Therefore please help!
Thank you very much in advance.
Thanks to Gary's Student, because his answer inspired me.
Also, I used this answer for this code.
This function checks every single character of your data, so it should work for you. You need two functions: one to strip the non-ASCII characters, and another one to clean your item codes:
Public Function CLEAN_ITEM_CODE(ByRef ThisCell As Range) As String
    If ThisCell.Count <> 1 Then
        CLEAN_ITEM_CODE = "Only single cells allowed"
        Exit Function
    End If
    ' Long instead of Byte, so that values longer than 255 characters do not overflow
    Dim ZZ As Long
    For ZZ = 1 To Len(ThisCell.Value) Step 1
        CLEAN_ITEM_CODE = CLEAN_ITEM_CODE & GetStrippedText(Mid(ThisCell.Value, ZZ, 1))
    Next ZZ
End Function
Private Function GetStrippedText(txt As String) As String
    If txt = "–" Then
        ' Keep this one non-ASCII character as it is
        GetStrippedText = "–"
    Else
        Dim regEx As Object
        Set regEx = CreateObject("vbscript.regexp")
        ' Match anything outside the ASCII range
        regEx.Pattern = "[^\u0000-\u007F]"
        GetStrippedText = regEx.Replace(txt, "")
    End If
End Function
And this is what I get when using it as a formula in Excel. Note the difference in the Len of the strings:
Hope this helps
You have characters that look like a space character, but are not. They are Unicode 8236 and 8237 (U+202C and U+202D, two directional-formatting characters).
Just replace them with a space character (ASCII 32).
EDIT#1:
Based on the string in your post, the following VBA macro will replace Unicode characters 8236 and 8237 with simple space characters:
Sub Kleanup()
Dim N1 As Long, N2 As Long
Dim Bad1 As String, Bad2 As String
N1 = 8237
Bad1 = ChrW(N1)
N2 = 8236
Bad2 = ChrW(N2)
Cells.Replace what:=Bad1, replacement:=" ", lookat:=xlPart
Cells.Replace what:=Bad2, replacement:=" ", lookat:=xlPart
End Sub
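The same cleanup can be sketched outside Excel, for example when the report is pre-processed before import. This Python sketch is only an illustration: the function name clean_item_code and the sample value are made up, and it assumes the two characters sit at the front and the end of the code. It also removes the characters rather than replacing them with spaces, which keeps the length of the item code correct.

```python
# The two invisible characters named above, mapped to None so that
# str.translate() deletes them: 8236 = U+202C, 8237 = U+202D.
HIDDEN = {0x202C: None, 0x202D: None}


def clean_item_code(text):
    """Remove the hidden directional-formatting characters from text."""
    return text.translate(HIDDEN)


# Hypothetical cell value: the visible code wrapped in hidden characters.
raw = "\u202d11301-21\u202c"
print(len(raw), len(clean_item_code(raw)))  # prints: 10 8
```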
A VBScript is in use to shorten the system path by replacing entries with the 8.3 versions since it gets cluttered with how much software is installed on our builds. I'm currently adding the ability to remove duplicates, but it's not working correctly.
Here is the relevant portion of code:
original = "apple;orange;apple;lemon\banana;lemon\banana"
shortArray=Split(original, ";")
shortened = shortArray(1) & ";"
For n=2 to Ubound(shortArray)
'If the shortArray element is not in the shortened string, add it
If NOT (InStr(1, shortened, shortArray(n), 1)) THEN
shortened = shortened & ";" & shortArray(n)
ELSE
'If it already exists in the string, ignore the element
shortened=shortened
End If
Next
(Normally "original" is the system path, I'm just using fruit names to test...)
The output should be something like
apple;orange;lemon\banana
The issue is that entries with punctuation, such as lemon\banana, seem to be skipped(?). I've tested it with other punctuation marks, and it still skips over them. That is a problem, seeing how the system path has punctuation in every entry.
I know the basic structure works, since there is only one of each entry without punctuation. However, the real output is something like
apple;orange;lemon\banana;lemon\banana
I thought maybe it was just a character escape issue. But no. It still will not do anything with entries containing punctuation.
Is there something I am doing wrong, or is this just a "feature" of VBScript?
Thanks in advance.
This code:
original = "apple;orange;apple;lemon\banana;lemon\banana"
shortArray = Split(original, ";")
shortened = shortArray(0) ' array indices start with 0; & ";" not here
For n=1 to Ubound(shortArray)
'If the shortArray element is not in the shortened string, add it
'i.e. If InStr() returns *number* 0; Not applied to a number will negate bitwise
' If 0 = InStr(1, shortened, shortArray(n), 1) THEN
If Not CBool(InStr(1, shortened, shortArray(n), 1)) THEN ' if you insist on Not
WScript.Echo "A", shortArray(n), shortened, InStr(1, shortened, shortArray(n), vbTextCompare)
shortened = shortened & ";" & shortArray(n)
End If
Next
WScript.Echo 0, original
WScript.Echo 1, shortened
WScript.Echo 2, Join(unique(shortArray), ";")
Function unique(a)
Dim d : Set d = CreateObject("Scripting.Dictionary")
Dim e
For Each e In a
d(e) = Empty
Next
unique = d.Keys()
End Function
output:
0 apple;orange;apple;lemon\banana;lemon\banana
1 apple;orange;lemon\banana
2 apple;orange;lemon\banana
demonstrates/explains your errors (indices, Not) and shows how to use the proper tool for uniqueness (dictionary).
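To see both points outside VBScript, here is a small Python sketch. The first part removes duplicates while keeping the original order, much like the dictionary approach above; comparing entries case-insensitively is my own assumption (Windows paths usually are), and the name unique_entries is made up. The second part mimics what VBScript's Not does to a number: it flips the bits, so any InStr() result other than -1 stays non-zero and therefore counts as true.

```python
def unique_entries(path, sep=";"):
    """Drop repeated entries from a separator-delimited list, keeping order."""
    seen = set()
    kept = []
    for entry in path.split(sep):
        key = entry.lower()  # assumption: entries compare case-insensitively
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return sep.join(kept)


print(unique_entries("apple;orange;apple;lemon\\banana;lemon\\banana"))
# prints: apple;orange;lemon\banana

# Bitwise negation, as VBScript's Not applies to numbers:
# "not found" (0) becomes -1, and a found position such as 15 becomes -16.
# Both results are non-zero, so the original If branch always ran.
print(~0, ~15)  # prints: -1 -16
```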
I would like to clean an auto-generated Word document.
This document contains several tables and there are many blank lines between each of them. I would like to develop a macro that only keeps one blank line between each table.
I don't know if it can be done. Now I'm stuck with:
Dim i As Integer
Dim tTable As Table
For i = 1 To ActiveDocument.Tables.Count ' the Tables collection is 1-based
Set tTable = ActiveDocument.Tables.Item(i)
' ???
Next
Any idea?
I found how to do that:
Dim ParagraphToTrim As Range
Dim NextTable As Range
Dim tTable As Table
For Each tTable In ActiveDocument.Tables
    ' Skip the last table: no table follows it, so Next is expected to return Nothing
    Set NextTable = tTable.Range.Next(Unit:=wdTable)
    If Not NextTable Is Nothing Then
        ' Supply a Start and End value for the Range.
        Set ParagraphToTrim = ActiveDocument.Range(tTable.Range.Next(Unit:=wdParagraph).Start, NextTable.Start)
        ' Keep at least a paragraph between each table
        If ParagraphToTrim.Paragraphs.Count > 1 Then
            With ParagraphToTrim
                ' Change the start of the range
                .MoveStart Unit:=wdParagraph
                .Delete
            End With
        End If
    End If
Next
Hi I have a text file that I would like to assign to an array and then assign each item in the array to a custom defined variable. When I open the file in notepad, it seems as if the data is on one line and there's about 10 tabs worth of space until the next piece of information.
I use the following code to successfully view the information in a msgbox as MyArray(i).
In my code example, all the information is listed in MyArray(0) and MyArray(1) gives me an error of subscript out of range. The information in the text file seems to appear as if it were delimited by vbCrLf but that does not work either...
Is there a way to trim the spaces from MyArray(0) and then re-assign the individual data to a new array? Here's what the first two pieces of information look like from my file:
967042
144890
Public Function ReadTextFile()
Dim TextFileData As String, myArray() As String, i As Long
Dim strCustomVariable1 As String
Dim strCustomVariable2 As String
'~~> Open file as binary
Open "C:\textfile\DATA-SND" For Binary As #1
'~~> Read entire file's data in one go
TextFileData = Space$(LOF(1))
Get #1, , TextFileData
'~~> Close File
Close #1
'~~> Split the data into separate lines
myArray() = Split(TextFileData, vbCrLf)
For i = 0 To UBound(myArray())
MsgBox myArray(i)
Next
End Function
Under normal circumstances, I'd suggest that you use Line Input instead:
Open "C:\textfile\DATA-SND" For Input As #1
Do Until EOF(1)
Redim Preserve myArray(i)
Line Input #1, myArray(i)
i = i + 1&
Loop
Close #1
However, you're likely dealing with different end-line characters. You can use your existing code and just change it to use vbCr or vbLf instead of vbCrLf. My method assumes that your end-line characters are vbCrLf.
So for UNIX files:
myArray() = Split(TextFileData, vbLf)
And for old Mac files:
myArray() = Split(TextFileData, vbCr)
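If you do not know in advance which line ending a file uses, you can split on all three at once. The following Python sketch shows the idea; the function name split_records is made up, and stripping the surrounding tabs and spaces and dropping blank entries are assumptions based on the padding described in the question.

```python
import re


def split_records(data):
    """Split text on CRLF, CR or LF and trim whitespace around each record."""
    # "\r\n" comes first in the pattern so that a Windows line ending
    # is treated as one separator, not two.
    lines = re.split(r"\r\n|\r|\n", data)
    return [line.strip() for line in lines if line.strip()]


# Mixed line endings and tab padding, similar to the file described above.
print(split_records("967042\t\t\r\n144890\n  7\r"))
# prints: ['967042', '144890', '7']
```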
I was doing research for my programming class (it is actually very basic scripting) and found this site. I looked through it and found a lot of useful material about scripting, but the thing I was searching for wasn't listed, or I wasn't using the right keywords.
Anyway, my question is this:
My teacher asked me to write a VBScript that prints a multiplication table. I did some research, and this is where I am right now:
dim sum, arraynum(), arrayline1, count, arraynum2(), arrayline2, arraynum3(), arrayline3, arraynum4(), arrayline4, arraynum5(), arrayline5
count=1
sum=1
arrayline1=1
for count=1 to 5
redim preserve arraynum(arrayline1)
redim preserve arraynum2(arrayline2)
redim preserve arraynum3(arrayline3)
redim preserve arraynum4(arrayline4)
redim preserve arraynum5(arrayline5)
arraynum(arrayline1)=sum
arraynum2(arrayline2)=sum*2
arrayline2=arrayline2+1
arraynum3(arrayline3)=sum*3
arrayline3=arrayline3+1
arraynum4(arrayline4)=sum*4
arrayline4=arrayline4+1
arraynum5(arrayline5)=sum*5
arrayline5=arrayline5+1
sum=sum+1
arrayline1=arrayline1+1
next
wscript.echo join(arraynum) & vbcrlf & join(arraynum2) & vbcrlf & join(arraynum3) & vbcrlf & join(arraynum4) & vbcrlf & join(arraynum5)
' Its printing like;
' 1 2 3 4 5
' 2 4 6 8 10
' 3 6 9 12 15
' 4 8 12 16 20
' 5 10 15 20 25
As you can see, the columns are not aligned. I also wasn't able to make this work with input, that is, to take a number from the user and show the multiplication table for it. I hope I have made myself clear. If it's not too much to ask: how can I put a border between the cells, if that is possible at all?
The trick is to left-pad the values you print: count the number of characters a value contains, subtract that from a fixed width, and prepend that many spaces to the value.
This is an example that replaces each vbTab character with left padding. If you join your arrays with vbTab instead of the default space, you can use such a function.
Because this is a homework assignment, I also added some code that recursively gets the multiples of a number, starting with 0, just to trigger some curiosity. I would not recommend copying and pasting it, because it does not comply with your requirement to start from 1.
dim multiple
' Get the numbers 0 to 5
for each multiple in split(getMultiples(1,5), vbTab)
' print the multiplication table for each of these numbers
wscript.echo trim(TabToLpad(getMultiples(multiple, 5), 10))
next
' Does the calculation and returns a Tab delimited string of all multiples
function getMultiples(nr, amount)
getMultiples = 0
' As long as the amount is larger than 0, get the next multiple
if amount > getMultiples then getMultiples = getMultiples(nr, amount-1) & vbTab & (nr * amount)
End function
' Pads each value in a tab delimited string with the nrPadChars spaces. Returns a string.
function tabToLpad(str, nrPadChars)
dim part
for each part in split(str, vbTab)
tabToLpad = tabToLpad & string(nrPadChars - len(part), " ") & part
next
End Function
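The padding idea is not specific to VBScript. As a rough illustration, here is a Python sketch that right-aligns every value to a fixed width and also draws the border asked about in the question. The function name multiplication_table, the default width of 4 and the border characters are my own choices, and the table starts at 1.

```python
def multiplication_table(size, width=4):
    """Return a bordered size x size multiplication table as one string."""
    # One horizontal rule, reused above, between and below the rows.
    border = "+" + "+".join(["-" * width] * size) + "+"
    lines = [border]
    for i in range(1, size + 1):
        # rjust() left-pads each number with spaces up to the fixed width.
        cells = [str(i * j).rjust(width) for j in range(1, size + 1)]
        lines.append("|" + "|".join(cells) + "|")
        lines.append(border)
    return "\n".join(lines)


print(multiplication_table(5))
```

To build the table from user input, the size could be read with int(input("Size: ")) and passed to the function. The width has to be at least as large as the longest product, otherwise the columns drift apart again.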