Parse word document in VBScript - vbscript

I got a weird mission from a friend, to parse through a bunch of Word files and write certain parts of them to a text file for further processing.
VBscript is not my cup of tea so I'm not sure how to fit the pieces together.
The documents look like this:
Header
A lot of not interesting text
Table
Header
More boring text
Table
I want to parse the documents and get all the headers and table of contents out of it. I'm stepping step through the document with
For Each wPara In wd.ActiveDocument.Paragraphs
And I think I know how to get the headers
If Left(wPara.Range.Style, Len("Heading")) = "Heading" Then
But I'm unsure of how to do the
Else if .. this paragraph belongs to a table..
So, any hint on how I could determine if a paragraph is part of a table or not would be nice.

Untested, because I have no access to MS Word right now.
Option Explicit
Dim FSO, Word, textfile, doc, para
' start Word instance, open doc ...
' start FileSystemObject instance, open textfile for output...
For Each para In doc.Paragraphs
If IsHeading(para) Or IsInTable(para) Then
SaveToFile(textfile, para)
End If
Next
Function IsHeading(para)
IsHeading = para.OutlineLevel < 10
End Function
Function IsInTable(para)
Dim p, dummy
IsInTable = False
Set p = para.Parent
' at some point p and p.Parent will both be the Word Application object
Do While p Is Not p.Parent
' dirty check: if p is a table, calling a table object method will work
On Error Resume Next
Set dummy = obj.Cell(1, 1)
If Err.Number = 0 Then
IsInTable = True
Exit Do
Else
Err.Clear
End If
On Error GoTo 0
Set p = p.Parent
Loop
End Function
Obviously SaveToFile is something you'd implement yourself.
Since "is in table" is naturally defined as "the object's parent is a table", this is a perfect situation to use recursion (deconstructed a little further):
Function IsInTable(para)
IsInTable = IsTable(para.Parent)
If Not (IsInTable Or para Is para.Parent) Then
IsInTable = IsInTable(para.Parent)
End If
End Function
Function IsTable(obj)
Dim dummy
On Error Resume Next
Set dummy = obj.Cell(1, 1)
IsTable = (Err.Number = 0)
Err.Clear
On Error GoTo 0
End Function

Related

Assigning CSV values to structure

I'm creating what should be a simple program but I'm having some difficulty assigning values from a file into a structure and it's variables. Visual Basic.
Structure:
Public Structure Teams
Dim teamName As String
End Structure
Function:
Function getAvailableTeams() As Teams()
Dim rec As Teams
Dim index As Integer
Dim recCount As Integer = 0
'Count how many teams exist
FileOpen(1, "teamConfig.csv", OpenMode.Input)
Do Until EOF(1)
LineInput(1) 'Read document line by line
recCount += 1 'Increment team count by 1
Loop
'store team names in array
Dim teamNames(recCount - 1) As Teams
index = 0
Do Until EOF (1)
Input(1, rec.teamName)
teamNames(index).teamName = rec.teamName
index +=1
Loop
FileClose(1)
Return teamNames
End Function
Simple subroutine to test values are available and being picked up.
Dim availableTeams() As Teams
availableTeams = getAvailableTeams()
lbltest.text = availableTeams(1).toString
The file is stored as a CSV file and there are 11 available team names.
team1 \r\n
team2 \r\n
etc...
I appreciate this is probably something simple but I can't work out where I'm going wrong with this.
One of the comments was on the right track. You need to close and re-open the file for input to start at the beginning again. Since you were already at end-of-file, the second attempt fails immediately unless you re-start from the beginning.

Modifying an Excel VBA Script to Work in Word

I have the following VBA code:
Sub test2()
Dim w1 As Worksheet
Dim w2 As Worksheet
Dim k As Long
Dim c As Range
Dim d As Range
Dim strFA As String
Set w1 = Sheets("Sheet1")
Set w2 = Sheets("Sheet2")
w2.Cells.Clear
k = 1
With w1.Range("A:A")
Set c = .Cells.Find("FirstThing", After:=.Cells(.Cells.Count), lookat:=xlWhole)
strFA = ""
While Not c Is Nothing And strFA <> c.Address
If strFA = "" Then strFA = c.Address
If IsError(Application.Match(c.Offset(0, 1).value, w2.Range("A:A"), False)) Then
Set d = .Cells.Find("SecondThing", c, , xlWhole)
w2.Range("A" & k).value = c.Offset(1, 0).value
w2.Range("B" & k).value = d.Offset(0, 1).value
k = k + 1
End If
Set c = .Cells.Find("FirstThing", After:=c, lookat:=xlWhole)
Wend
End With
End Sub
The code works essentially like this:
Look through Sheet1 for a certain phrase.
Once the phrase is found, place the value from the cell one row over in Sheet2
Search for a second phrase.
Place the value from the cell one row over in the cell beside the other value in Sheet2
Repeat
Now. I have the same data that, don't ask me why, is in .doc files. I'd like to create something similar to this code that will go through and look for the first phrase, and place the next n characters in an Excel sheet, and then look for the second phrase and place the next m characters in the row beside the cell housing the previous n characters.
I'm not sure whether it's better to do this with a bash script or whether it's possible to do this with VBA, so I've attached both as tags.
Your question seems to be: "I'm not sure whether it's better to do this with a bash script or whether it's possible to do this with VBA"
The answer to that is: You'd need VBA, especially since this is a *.doc file - docx would be a different matter.
In order to figure out what that is, start by trying to do the task manually in Word. More specifically, how to use Word's "Find" functionality. When you get that figured out, record those actions in a macro to get the starting point for your syntax. The code on the Excel side for writing the data across will essentially stay the same.
You'll also need to decide where the code should reside: in Word or in Excel. That will mean researching how to run the other application from within the one you choose - lots of examples here on SO and on the Internet...

Optimize performance of Removing Hidden Rows in VBA

I am using the following code to remove hidden/filtered lines after applying autofilters to a big sheet in VBA (big means roughly 30,000 rows):
Sub RemoveHiddenRows()
Dim oRow As Range, rng As Range
Dim myRows As Range
With Sheets("Sheet3")
Set myRows = Intersect(.Range("A:A").EntireRow, .UsedRange)
If myRows Is Nothing Then Exit Sub
End With
For Each oRow In myRows.Columns(1).Cells
If oRow.EntireRow.Hidden Then
If rng Is Nothing Then
Set rng = oRow
Else
Set rng = Union(rng, oRow)
End If
End If
Next
If Not rng Is Nothing Then rng.EntireRow.Delete
End Sub
The code comes from here: Delete Hidden/Invisible Rows after Autofilter Excel VBA
Moreover I read this thread: Speeding Up Code that Removes Hidden Rows on a Sheet
The situation: I have applied 5 different filters to a table consisting of 12 columns, therefore a lot of rows are filtered out (hidden) after the process. When I try to delete those, the code above takes a very long time. In my case I don't know if Excel was still working, so I had to force an exit. That leads to the following question:
Is there any other way than looping through all the hidden rows and deleting them?
An idea which came to my mind was to copy only the remaining unfiltered (that is non-hidden) content to a new sheet and afterwards delete the old sheet, which contains the full information. If so, how can that be done?
I don't think you need to involve another worksheet. Simply copy the rows below the existing Range.CurrentRegion property and then remove the filter and delete the original data.
Sub RemoveHiddenRows()
With Sheets("Sheet10")
With .Cells(1, 1).CurrentRegion
With .Offset(1, 0).Resize(.Rows.Count - 1, .Columns.Count)
If CBool(Application.Subtotal(103, .Columns(1))) Then
.Cells.Copy Destination:=.Cells(.Rows.Count + 1, 1)
End If
.AutoFilter
.Cells(1, 1).Resize(.Rows.Count, 1).EntireRow.Delete
End With
End With
End With
End Sub
You may also receive some good, focused help on this subject by posting on Code Review (Excel).
You can improve performance significantly with a function like this:
Option Explicit
Public Sub deleteHiddenRows(ByRef ws As Worksheet)
Dim rngData As Range, rngVisible As Range, rngHidden As Range
With ws
Set rngData = .UsedRange
With rngData
Set rngVisible = .SpecialCells(xlCellTypeVisible)
Set rngHidden = .Columns(1)
End With
End With
If Not (rngVisible Is Nothing) Then
ws.AutoFilterMode = False
' invert hidden / visible
rngHidden.Rows.Hidden = False
rngVisible.Rows.Hidden = True
' delete hidden and show visible
rngData.SpecialCells(xlCellTypeVisible).Delete
rngVisible.Rows.Hidden = False
End If
End Sub
I tested it on a file with 2 filters applied to it
The function was adapted from the code in this suggestion

Lotusscript : How to sort the field values (an array of words) by their frequency

I would like to sort the field values (strings) by their frequency in lotusscript.
Has anyone an idea to solve this?
Thanks a lot.
Personally I would avoid LotusScript if you can help it. You are going to run into limitations that cannot be worked around.
Regardless of which route you do take, from a performance point of view it is better to have the View indexes do the work.
So you would create a view. The first column would be as follows.
Column Value: The field you want to check.
Sort: Ascending
Type: Categorized
After this you can access the data using the NotesViewNavigator. The related method call is getNextCategory. This will give you a view entry object which you can call ChildCount on to get totals.
For example (Disclaimer: Code written from memory, not guaranteed to run):
Dim sess As New NotesSession
Dim db As NotesDatabase
Dim vw As NotesView
Dim nav as NotesViewNavigator
Dim entryA As NotesViewEntry
Dim entryB As NotesViewEntry
Set db = sess.CurrentDatabase
Set vw = db.GetView("testView")
vw.AutoUpdate = False
Set nav = vw.CreateViewNav
Set entryA = nav.GetFirst
while entryA not Nothing
Set entryB = nav.GetNextCategory(entryA)
if entryB not nothing then
' Do your processing.
' entryB.childCount will give total.
end if
set EntryA = EntryB
Wend
view.AutoUpdate = True
This way the heavy lifting (string sorting, counting) is handled by the View index. So you only need to process the final results.
To answer the op's (old) question directly, the way to do this in LotusScript is both simple and easy:
dim para as string
dim words as variant
dim fq list as long
'get the text to freq-count
para = doc.text '(or $ from somewhere)
'tidy up para by removing/replacing characters you don't want
para = replace(para, split(". , : ; - [ ] ()"), "")
words = split(para) 'or split(words, "delim") - default is space
forall w in words
if iselement(words(w)) then
fq(w) = fq(w) + 1
Else
fq(w) = 1
End forall
'you now have a count of each individual word in the FQ list
'to get the words out and the word frequencies (not sorted):
forall x in fq
print listtag(x) = x
End forall
Thats it. No issue with LotusScript - quick and easy (and lists can be massive). To get a sorted list, you would have to move to an array and do a sort on it or move to a field and let #sort do the job somehow.

LotusScript - Setting element in for loop

I have an array set up
Dim managerList(1 To 50, 1 To 100) As String
what I am trying to do, is set the first, second, and third elements in the row
managerList(index,1) = tempManagerName
managerList(index,2) = tempIdeaNumber
managerList(index,3) = 1
But get an error when I try to do that saying that the object variable is not set. I maintain index as an integer, and the value corresponds to a single manager, but I can't seem to manually set the third element. The first and second elements set correctly.
On the flip side, I have the following code that will allow for the element to be set,
For x=1 To 50
If StrConv(tempManagerName,3) = managerList(x,1) Then
found = x
For y=3 to 100
If managerList(x,y) = "" Then
managerList(x,y) = tempIdeaNumber
Exit for
End If
Next
Exit For
End If
Next
It spins through the array (laterally) trying to find an empty element. Ideally I would like to set the index of the element the y variable is on into the 3rd element in the row, to keep a count of how many ideas are on the row.
What is the best way to keep a count like this? Any idea why I am getting a Object variable not set error when I try to manually set the element?
object variable not set means that you are trying to call methods or access properties on an un-initialized object. I don't see anything like that in the code snippets you have published, are you sure the error occurs in those lines?
A good way to pin-point errors is to include the module and line number in the error message. Add this around your subroutine to get a more detailed message:
Sub Initialize
On Error Goto errorthrower
//
// your code goes here...
//
Exit sub
ErrorThrower:
Error Err, Str$(Err) & " " & Error & Chr(13) + "Module: " & Cstr( Getthreadinfo(1) ) & ", Line: " & Cstr( Erl )
End sub
(I originally found this on Ferdy Christants blog)
It's not quite clear what problem you are trying to resolve here, but it looks like you have 1..50 "managers" that can have 1..100 "ideas" ? I'd make a class for managers instead:
Class manager
Private managername As String
Private ideas(1 To 100) As String
Sub new(managername As String)
Me.managername=managername
End Sub
// whatever methods you need....
End Class
Then, I'd keep track of them with a list of these objects:
Dim managerlist List As manager
Dim key As String
key = Strconv(tempmanagername,3)
if not iselement(managerlist(key)) then
set managerlist(key) = new manager(key)
end if
Dim currentmanager As manager
Set currentmanager = managerlist(key)
This is only an example to get you started, you will have to adapt this to solve your actual problem.

Resources